Improved "ArchiveQuery"
The existing ArchiveQuery has some obvious flaws.
- #682 (closed) describes a failure due to a 502. This might be unavoidable if the API is under high load. ArchiveQuery should deal with it (e.g. by retrying) instead of erroring out.
- #679 (closed) describes a JSON decode error. This should not happen, but obviously can. The ArchiveQuery should deal with it instead of erroring out. Proper logging should also help to better identify the cause (e.g. the specific calculation).
- #680 (closed) describes that some required things are missing. This should be fixed in v1, which adds everything required to the search.
- The last point is implemented poorly, because references are treated like sub-sections and not followed.
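The 502 and JSON decode failures above could both be absorbed by a small retry layer. A minimal sketch, assuming a hypothetical `fetch` coroutine standing in for the actual httpx request (`fetch_with_retry` and `TransientAPIError` are illustrative names, not existing ArchiveQuery API):

```python
import asyncio
import json
import logging

logger = logging.getLogger("archive_query")


class TransientAPIError(Exception):
    """Raised for responses that may succeed on retry (e.g. HTTP 502)."""


async def fetch_with_retry(fetch, *, retries=3, backoff=1.0):
    """Call the async `fetch` callable, retrying transient failures.

    Retries on 502-like errors and malformed JSON instead of erroring out,
    and logs each failed attempt so the cause can be identified later.
    """
    for attempt in range(1, retries + 1):
        try:
            return await fetch()
        except (TransientAPIError, json.JSONDecodeError) as error:
            logger.warning("attempt %d failed: %s", attempt, error)
            if attempt == retries:
                raise  # give up only after the last attempt
            await asyncio.sleep(backoff * attempt)  # linear backoff


# usage sketch: a fake endpoint that fails twice with a 502, then succeeds
attempts = 0


async def flaky_fetch():
    global attempts
    attempts += 1
    if attempts < 3:
        raise TransientAPIError("502 Bad Gateway")
    return {"data": "ok"}


result = asyncio.run(fetch_with_retry(flaky_fetch, backoff=0))
```

With a real httpx client, `fetch` would wrap the request and raise `TransientAPIError` on a 502 status; the retry logic stays the same.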
Long-running queries might always exhibit problems. The ArchiveQuery should be reimplemented with the explicit assumption that API failures will occur. As a consequence:
- results should be cached explicitly, locally, and somewhat permanently
- errors should be handled properly
- the implementation should be more modern, e.g. with asyncio
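Explicit, local, somewhat permanent caching could look like the following sketch: results are written to disk keyed by a hash of the query, so an interrupted long-running query can resume without refetching (the class and file layout are hypothetical, not the actual implementation):

```python
import hashlib
import json
import tempfile
from pathlib import Path


class LocalResultCache:
    """Sketch of a local, persistent cache for query results.

    Results are stored as JSON files, keyed by a stable hash of the query,
    so they survive process restarts and API failures.
    """

    def __init__(self, directory: Path):
        self.directory = directory
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, query: dict) -> Path:
        # sort_keys makes the hash stable for equivalent queries
        key = hashlib.sha256(
            json.dumps(query, sort_keys=True).encode()
        ).hexdigest()
        return self.directory / f"{key}.json"

    def get(self, query: dict):
        path = self._path(query)
        return json.loads(path.read_text()) if path.exists() else None

    def put(self, query: dict, result) -> None:
        self._path(query).write_text(json.dumps(result))


# usage: a miss before the result is stored, a hit afterwards
cache = LocalResultCache(Path(tempfile.mkdtemp()))
query = {"archive": {"results": "*"}}
assert cache.get(query) is None
cache.put(query, [{"entry_id": "abc"}])
cached = cache.get(query)
```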
Steps to take:
- get familiar with asyncio
- rework the ArchiveQuery implementation based on httpx + asyncio
- evaluate how much parallelism (asyncio again) we can use in the archive query API
- rework the API accordingly
- discuss the documentation examples with luca/luigi and martin/simon to make them more meaningful (this should finally also address the bugs above)
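The parallelism evaluation could start from a bounded-concurrency pattern like this sketch, where a semaphore caps the number of in-flight requests so the API is not overloaded (`fetch_page` is a hypothetical stand-in for the real API call):

```python
import asyncio


async def fetch_pages(fetch_page, page_count, *, max_parallel=4):
    """Fetch many result pages concurrently with bounded parallelism.

    `fetch_page` is an async callable taking a page number; at most
    `max_parallel` requests are in flight at any time.
    """
    semaphore = asyncio.Semaphore(max_parallel)

    async def bounded(page):
        async with semaphore:
            return await fetch_page(page)

    # gather preserves the order of the pages
    return await asyncio.gather(*(bounded(p) for p in range(page_count)))


# usage with a fake API call
async def fake_fetch(page):
    await asyncio.sleep(0)  # simulate network I/O
    return {"page": page, "entries": []}


pages = asyncio.run(fetch_pages(fake_fetch, 5, max_parallel=2))
```

Tuning `max_parallel` against real API behavior would be the actual evaluation step; the structure above makes that a one-parameter experiment.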