Resolve "In-api processing of simple and individual entries"

Closes #745 (closed).

The PUT uploads/{upload_id}/raw/{path} endpoint is extended with two query arguments: wait_for_processing and include_archive. When wait_for_processing is specified, the upload and the processing are run locally, and the endpoint call blocks until processing is complete. Only the uploaded file is matched against the parsers and, if a parser matches, processed; nothing else is processed. The response then contains an additional processing key. If include_archive is specified, the archive is also included in the response.

The response now has the following format:

{
  "upload_id": ...,
  "data": <upload proc data>,
  "processing": {
    "upload_id": ...,
    "path": <full raw path to the uploaded file>,
    "entry_id": ...,
    "parser_name": ...,
    "entry": <entry proc data>,
    "archive": <archive data>
  }
}

This is identical to the previous format except for the added processing key. The processing key is null if wait_for_processing is not specified. If wait_for_processing is specified but the file does not match any parser, the entry-related values inside the processing dict are all null.
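To illustrate how a client might use the two query arguments and interpret the three possible shapes of the processing key, here is a minimal sketch. The helper names (build_raw_put_url, describe_processing), the base URL, and the "true" encoding of the boolean flags are assumptions made for illustration; they are not part of this merge request.

```python
from typing import Any, Dict
from urllib.parse import quote, urlencode


def build_raw_put_url(base_url: str, upload_id: str, path: str,
                      wait_for_processing: bool = False,
                      include_archive: bool = False) -> str:
    """Build the URL for PUT uploads/{upload_id}/raw/{path} (hypothetical helper)."""
    params = {}
    # Assumption: the boolean query arguments are passed as the string "true".
    if wait_for_processing:
        params["wait_for_processing"] = "true"
    if include_archive:
        params["include_archive"] = "true"
    # quote() keeps "/" by default, so nested paths stay readable.
    url = f"{base_url.rstrip('/')}/uploads/{quote(upload_id, safe='')}/raw/{quote(path)}"
    return f"{url}?{urlencode(params)}" if params else url


def describe_processing(response: Dict[str, Any]) -> str:
    """Summarize the processing key of a response in the format shown above."""
    processing = response.get("processing")
    if processing is None:
        # wait_for_processing was not specified.
        return "not processed in-api"
    if processing.get("entry_id") is None:
        # wait_for_processing was specified, but no parser matched the file.
        return "no parser matched"
    return f"entry {processing['entry_id']} parsed by {processing['parser_name']}"
```

A client would send the file with a PUT request to the URL returned by build_raw_put_url and pass the decoded JSON body to describe_processing.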

I introduced a new decorator, @process_local. This was maybe a bit ambitious, but I think it is the best way to get consistent handling of the process attributes (process_status, sync_counter, etc.) and a unified error boundary. A local process cannot be started (i.e. the endpoint call fails) if something else is running, and nothing else can be run or scheduled to run while a local process is running.
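The general idea of such a decorator can be sketched as follows. This is a simplified in-memory illustration, not the implementation in this merge request: the Upload stand-in class, the status strings, the ProcessAlreadyRunningError exception, and the use of a threading lock are all assumptions, and sync_counter handling is left out.

```python
import functools
import threading


class ProcessAlreadyRunningError(Exception):
    """Hypothetical error raised when another process is already running."""


def process_local(func):
    """Run the decorated method as a blocking, local process (illustrative sketch)."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        # Check and claim the process status atomically, so that no second
        # process can start while this one is running.
        with self._lock:
            if self.process_status == "RUNNING":
                raise ProcessAlreadyRunningError("another process is running")
            self.process_status = "RUNNING"
        try:
            result = func(self, *args, **kwargs)
        except Exception as exc:
            # Unified error boundary: record the error and mark the failure,
            # then re-raise so that the calling endpoint fails as well.
            self.errors.append(str(exc))
            self.process_status = "FAILURE"
            raise
        self.process_status = "SUCCESS"
        return result
    return wrapper


class Upload:
    """Hypothetical stand-in for an object that carries process attributes."""

    def __init__(self):
        self.process_status = "READY"
        self.errors = []
        self._lock = threading.Lock()

    @process_local
    def put_file(self, path: str) -> str:
        if not path:
            raise ValueError("empty path")
        return f"processed {path}"
```

In this sketch the decorator owns all status transitions, so the decorated method only contains the actual work. Re-raising after recording the error is one possible design choice; an error boundary could also swallow the exception and report the failure through the status alone.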

Edited by David Sikter