Improved app and helm chart with respect to timeouts and rate limiting. (!1710) · Merge requests · nomad-lab / nomad-FAIR

This partially addresses #1914 and includes the changes of !1701 (closed).

Changes:

Timeouts are now consistently applied to ingress and proxy rules based on shared values
Separate ingresses for the API and for everything else (GUI, docs), allowing tighter rate limits on the API
Limit on concurrent connections in addition to the connections-per-second limit
ArchiveQuery defaults fit the timeout and rate limiting settings
Increased the HPC cloud load balancer timeouts to be slightly longer than the nomad timeouts (not part of this MR)
Removed the joblib-based threading for multi entry archive APIs. This was a no-op due to the GIL.
Added an await call to the multi entry archive loop, allowing other requests (e.g. probes) to be served during a running multi entry archive call.
Multi entry archive APIs stop computing the requested archive list after a client disconnect.
Refactored the main app, because HTTP middlewares prevent the app from recognising client disconnects (https://github.com/encode/starlette/discussions/2094). The API no longer uses any HTTP middleware.
More consistent use of parameter-free events in API logging
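
The await call and the disconnect handling in the multi entry archive loop can be sketched roughly as follows. This is a minimal illustration, not the code from this MR: `FakeRequest`, `collect_archives`, and `load_archive` are hypothetical names, and `FakeRequest` only imitates the async `is_disconnected()` method that Starlette's `Request` provides.

```python
import asyncio


class FakeRequest:
    """Hypothetical stand-in for a request object. It imitates the async
    is_disconnected() method found on Starlette's Request."""

    def __init__(self, disconnect_after: int):
        self._polls_left = disconnect_after

    async def is_disconnected(self) -> bool:
        # Report a disconnect once the allowed number of polls is used up.
        self._polls_left -= 1
        return self._polls_left < 0


async def collect_archives(request, entry_ids, load_archive):
    """Hypothetical multi entry loop: stops on client disconnect and
    yields to the event loop after every entry."""
    results = []
    for entry_id in entry_ids:
        if await request.is_disconnected():
            break  # the client is gone, so stop computing further archives
        results.append(load_archive(entry_id))
        # Give other coroutines (e.g. health probes) a chance to run.
        await asyncio.sleep(0)
    return results


async def demo():
    probe_log = []

    async def probe():
        probe_log.append("probe served")

    request = FakeRequest(disconnect_after=3)
    archive_task = asyncio.create_task(
        collect_archives(request, range(10), lambda i: {"entry_id": i})
    )
    probe_task = asyncio.create_task(probe())
    results = await archive_task
    await probe_task
    return results, probe_log


results, probe_log = asyncio.run(demo())
```

In this sketch the loop handles three of the ten entries before the simulated disconnect, and the probe coroutine gets scheduled while the loop is still running.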

Solutions:

The app now stops when a request is canceled (e.g. via timeout).
Timeouts are consistently 60 s, and the rate limits are 10 concurrent API requests and 32 requests per second.
The long-running multi entry archive API calls allow concurrent requests. This already worked for all downloads, which use StreamingResponses.
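
As an illustration of how a client could stay within the limit of 10 concurrent API requests, here is a minimal sketch using an `asyncio.Semaphore`. It is not the ArchiveQuery implementation: `fetch_all`, `fetch_page`, and `MAX_CONCURRENT` are hypothetical names, and the sketch covers only the concurrency limit, not the requests-per-second limit.

```python
import asyncio

# Assumption: mirrors the limit of 10 concurrent API requests stated above.
MAX_CONCURRENT = 10


async def fetch_all(page_ids, fetch_page, max_concurrent=MAX_CONCURRENT):
    """Run fetch_page for every id, with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(page_id):
        async with semaphore:
            return await fetch_page(page_id)

    # gather returns the results in the order of the submitted awaitables.
    return await asyncio.gather(*(guarded(p) for p in page_ids))


async def demo():
    in_flight = 0
    peak = 0

    async def fake_fetch(page_id):
        # Hypothetical request that records how many calls overlap.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return page_id

    pages = await fetch_all(range(25), fake_fetch)
    return pages, peak


pages, peak = asyncio.run(demo())
```

A per-request timeout slightly below 60 s could be added on the client side in the same way, so that the client gives up before the server-side timeout cancels the request.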

Edited Mar 07, 2024 by Markus Scheidgen
