ArchiveQuery in practice
On paper, the ArchiveQuery works. However, in a practical setting, e.g. when running the "Query and analyze the Archive" tutorial, we run into several problems.
With this fix !1694 (merged) (#1905 (closed)), I can run the query from the "Query and analyze the Archive" tutorial, with these settings:
    archive_query = ArchiveQuery(
        query=query,
        required=required,
        page_size=100,
        batch_size=10,
        semaphore=8,
        results_max=max_entries,
    )
I had runs where this was successful with an acceptable 2:30 runtime. However:
- if I increase the batch size, the individual requests take too long and a timeout causes a 503
- if I lower the batch size, too many requests are sent too quickly, and I get a 503 because of rate limiting
- if I run this too often on the same deployment, I start getting 502s because of OOM kills
To fix this, we need to:
- Fix the timeouts and probes on the k8s deployment. (@mscheidg)
- Implement a max_requests_per_second on the archive query to throttle the ArchiveQuery. (@thchang)
- Figure out the memory issues. This might be related to #1899 (closed).
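
The max_requests_per_second throttle proposed above could look roughly like the following. This is only a minimal sketch of the idea, not the ArchiveQuery implementation: RequestThrottle, fetch_all, and the request callable are hypothetical names, and it assumes all requests are issued as coroutines on a single asyncio event loop. The only things taken from this issue are the semaphore-style concurrency limit and the idea of capping the request rate.

```python
import asyncio
import time


class RequestThrottle:
    """Hypothetical helper: spaces out request starts so that no more than
    max_requests_per_second requests begin within any one second."""

    def __init__(self, max_requests_per_second: float):
        if max_requests_per_second <= 0:
            raise ValueError("max_requests_per_second must be positive")
        self._interval = 1.0 / max_requests_per_second
        self._next_slot = 0.0  # earliest monotonic time the next request may start

    async def wait(self) -> None:
        # There is no await between reading and updating _next_slot, so slot
        # assignment is atomic among coroutines on the same event loop.
        now = time.monotonic()
        slot = max(now, self._next_slot)
        self._next_slot = slot + self._interval
        if slot > now:
            await asyncio.sleep(slot - now)


async def fetch_all(n_requests, throttle, semaphore_size, request):
    """Run request(i) for i in range(n_requests), limiting both the number of
    concurrent requests (semaphore) and the rate of request starts (throttle)."""
    semaphore = asyncio.Semaphore(semaphore_size)

    async def one(i):
        async with semaphore:
            # Take the rate-limit slot only once a concurrency slot is held,
            # so the request starts right after its throttle wait ends.
            await throttle.wait()
            return await request(i)

    return await asyncio.gather(*(one(i) for i in range(n_requests)))
```

With something like this in place, a small batch_size would no longer imply a burst of requests: the semaphore still bounds the number of concurrent requests, while the throttle bounds how fast new ones are started, which is what the rate limiter on the deployment reacts to.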
Edited by Markus Scheidgen