Refactor the tasks functionality
The aim of this branch is to simplify the process/task related code and use more consistent terminology, by:
- Using only one status (`process_status`) on the Proc object instead of two as before (`process_status` and `tasks_status`).
- Changing to only use the "task" terminology when referring to celery tasks, to avoid confusion.
- Getting rid of our own @task decorator and the related validation logic (as this logic was not designed for a case where there are many different types of processes on the same Proc object, as is the case now).
The `process_status` is changed to take on the following values:
- READY: The process is ready to start.
- PENDING: The process has been called, but is still waiting for a celery worker to start running it.
- RUNNING: Currently running the main process function.
- WAITING_FOR_RESULT: Waiting for the result from some other process (used when the upload waits for the entries to finish processing).
- SUCCESS: The last process completed successfully.
- FAILURE: The last process completed with a fatal failure.
These are almost exactly the same values as `tasks_status` used previously; the only difference is the new statuses WAITING_FOR_RESULT and READY, which are usually not used in logical checks etc. Thus, to adapt to the new philosophy, the code usually just needs to read `process_status` where it previously read `tasks_status`.
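As a rough illustration of the status values listed above, the following sketch defines them as plain string constants and shows the kind of logical check that now reads `process_status`. The class name `ProcessStatus`, the grouping `STATUSES_PROCESSING`, and the helper `is_processing` are hypothetical names chosen for this example, not necessarily what the branch uses.

```python
class ProcessStatus:
    """Hypothetical container for the process_status values listed above."""
    READY = 'READY'
    PENDING = 'PENDING'
    RUNNING = 'RUNNING'
    WAITING_FOR_RESULT = 'WAITING_FOR_RESULT'
    SUCCESS = 'SUCCESS'
    FAILURE = 'FAILURE'


# Assumed grouping for illustration: statuses in which a process is "busy".
STATUSES_PROCESSING = {
    ProcessStatus.PENDING,
    ProcessStatus.RUNNING,
    ProcessStatus.WAITING_FOR_RESULT,
}


def is_processing(process_status: str) -> bool:
    """Return True if the given status means a process is queued or active."""
    return process_status in STATUSES_PROCESSING
```

A check that previously inspected the old status field would, under this sketch, simply call `is_processing(proc.process_status)`.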
Additional information about what the process is doing is stored in a "free" text field, `current_process_step`, roughly replacing the old `current_task` field (the main difference being a better name and that we don't have any validation logic on it).
Error handling is done at the "top" level, i.e. the process level.
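A minimal sketch of what process-level error handling could look like: a single wrapper catches any exception raised by the main process function and records the outcome in `process_status`. The stand-in `Proc` class, its `errors` list, the `run_process` helper, and the use of the function name as `current_process_step` are all assumptions made for this example; the WAITING_FOR_RESULT case is left out for brevity.

```python
from typing import Any, Callable, List, Optional


class Proc:
    """Stand-in for the real Proc document; only the fields needed here."""

    def __init__(self) -> None:
        self.process_status: str = 'READY'
        self.current_process_step: Optional[str] = None
        self.errors: List[str] = []  # hypothetical field for this sketch


def run_process(proc: Proc, func: Callable[[Proc], Any]) -> None:
    """Run func on proc and handle all failures in one place."""
    proc.process_status = 'RUNNING'
    # Free-text description of the current step; no validation is applied.
    proc.current_process_step = func.__name__
    try:
        func(proc)
    except Exception as exc:
        # Any uncaught exception is treated as a fatal failure of the process.
        proc.errors.append(str(exc))
        proc.process_status = 'FAILURE'
    else:
        proc.process_status = 'SUCCESS'
```

With this shape, the individual steps inside a process function do not need their own status bookkeeping; they just raise, and the wrapper decides the final status.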
Mongo fields removed from the `Proc` class:
- tasks: List[str]
- current_task = StringField(default=None)
- tasks_status = StringField(default=CREATED)
Mongo fields added to the `Proc` class:
- current_process_step = StringField(default=None)
Note: because of these changes, a datafix is needed to migrate existing data!
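To give an idea of what such a datafix has to do per document, here is a sketch that operates on a plain dictionary rather than on a live database. The function name `migrate_proc_document` is hypothetical, and the status mapping is an assumption made purely for illustration (carry over an old `tasks_status` value if it is one of the four shared values, otherwise fall back to READY); the actual datafix has to decide how old statuses translate.

```python
from typing import Any, Dict

# Fields removed from the Proc class by this branch.
REMOVED_FIELDS = ('tasks', 'current_task', 'tasks_status')

# Values that exist both in the old tasks_status and the new process_status.
SHARED_STATUSES = {'PENDING', 'RUNNING', 'SUCCESS', 'FAILURE'}


def migrate_proc_document(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Return a migrated copy of a stored Proc document (input is not modified)."""
    migrated = {k: v for k, v in doc.items() if k not in REMOVED_FIELDS}

    # ASSUMPTION: derive the new process_status from the old tasks_status.
    old_status = doc.get('tasks_status')
    migrated['process_status'] = (
        old_status if old_status in SHARED_STATUSES else 'READY'
    )

    # New free-text field; nothing is known about the step for old data.
    migrated['current_process_step'] = None
    return migrated
```

In a real datafix this function would be applied to every stored document of each Proc subclass, and the result written back to the database.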