A more flexible and more celery-tonic processing module
This is a pre-requisite for #251 (closed)
@himanel1 already started the implementation. I think this looks good and retains all features we had before. Here are some further todos that I see:
- add a redis to helm
- run some large scale processing
@himanel1 We should do the refactoring all the way. I think you should organize the submodules based on the processed entities `Calc` and `Upload` rather than trying to separate mongo from celery. A typical submodule structure that we also use in other modules would be:
- `processing/__init__.py` - only docs and imports to expose to other modules
- `processing/processing.py` - all celery setup stuff
- `processing/common.py` - common stuff: the mongoengine base, our custom celery tasks/request, shared constants, etc., `Pipeline`, `PipelineContext`, `Stage`, `empty_task`
- `processing/calc.py` - including the "celery task" `comp_process` (don't like the name, btw.)
- `processing/upload.py` - including `upload_cleanup`, `pipelines`, `get_pipeline`, `run_pipeline`

I think we can move the tests into a single module, or rename `test_base` -> `test_common` and `test_data` -> `test_upload`.
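To make the `common.py` split concrete, here is a minimal sketch of what the shared pieces could look like. Only the names `Pipeline`, `PipelineContext`, `Stage`, and `empty_task` come from the branch; all fields, signatures, and behavior here are my assumptions, not what's actually implemented:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class PipelineContext:
    """Shared state passed between stages (fields are illustrative)."""
    upload_id: str
    results: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Stage:
    """A named processing step; `function` receives the shared context."""
    name: str
    function: Callable[[PipelineContext], Any]


@dataclass
class Pipeline:
    """Runs its stages in order against a single context."""
    context: PipelineContext
    stages: List[Stage] = field(default_factory=list)

    def run(self) -> PipelineContext:
        for stage in self.stages:
            # each stage's return value is recorded under its name
            self.context.results[stage.name] = stage.function(self.context)
        return self.context


def empty_task(context: PipelineContext) -> None:
    """No-op stage for pipelines that need a placeholder step."""
    return None
```

With this layout, `calc.py` and `upload.py` would only import these building blocks from `common`, which keeps the dependency direction clean.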
`upload` can depend on `calc`; `upload` and `calc` can depend on `common`; all can depend on `processing`; no other dependencies between submodules should be necessary.
In the future we could think about replacing `@process`, `current_process`, and `process_status` with celery. But at the moment it's very convenient to use a mongodb query to check on the processing status of all entities. I feel celery wasn't really designed with persistent tasks in mind. Also, we would need to be far more rigid with the celery infrastructure and add persistence to rabbitmq and redis.
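For context, the convenience we'd give up is roughly this pattern: the process decorator records which process ran and its status on the entity itself, so the status survives in the (mongo-persisted) document and can be queried across all entities. This is a simplified, self-contained stand-in using plain attributes instead of mongoengine fields; the real `@process` implementation surely differs:

```python
import functools


class Proc:
    """Simplified stand-in for the mongo-backed process base class."""

    def __init__(self):
        self.process_status = None
        self.current_process = None

    @staticmethod
    def process(func):
        """Track the running process and its outcome on the entity."""
        @functools.wraps(func)
        def wrapper(self, *args, **kwargs):
            self.current_process = func.__name__
            self.process_status = 'RUNNING'
            try:
                result = func(self, *args, **kwargs)
                self.process_status = 'SUCCESS'
                return result
            except Exception:
                self.process_status = 'FAILURE'
                raise
        return wrapper


class Upload(Proc):
    @Proc.process
    def cleanup(self):
        return 'done'
```

Because the status lives on the document rather than in celery's result backend, a single mongo query answers "what is currently processing?" without making rabbitmq/redis durable.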