A more flexible and more celery-tonic processing module
This is a prerequisite for #251 (closed)

@himanel1 already started the implementation. I think this looks good and retains all the features we had before. Here are some further todos that I see:
- add redis to the helm chart
- run some large-scale processing
@himanel1 We should do the refactoring all the way. I think you should organize the submodules based on the processed entities (Calc and Upload) rather than trying to separate mongo from celery. A typical submodule structure that we also use in other modules would be:
- `processing/__init__.py` - only docs and imports to expose to other modules
- `processing/processing.py` - all celery setup stuff
- `processing/common.py` - common stuff: the mongoengine base, our custom celery tasks/requests, shared constants, etc., `Pipeline`, `PipelineContext`, `Stage`, `empty_task`
- `processing/calc.py` - including the celery task `comp_process` (don't like the name, btw.)
- `processing/upload.py` - including `upload_cleanup`, pipelines, `get_pipeline`, `run_pipeline`
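To make the `common.py` part more concrete, here is a minimal sketch of what the `Pipeline`/`PipelineContext`/`Stage` abstractions could look like. The names come from the list above, but the interfaces are my assumption; the celery and mongoengine specifics are deliberately left out.

```python
# Hypothetical sketch of processing/common.py. Interfaces are assumed;
# in the real module, Stage.function would be a celery task and the
# context would likely be backed by a mongoengine document.
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PipelineContext:
    """Shared state passed through all stages of one pipeline run."""
    upload_id: str
    results: dict = field(default_factory=dict)


@dataclass
class Stage:
    """One named step of a pipeline."""
    name: str
    function: Callable[[PipelineContext], None]


class Pipeline:
    """Runs its stages in order against a shared context."""
    def __init__(self, stages: List[Stage]):
        self.stages = stages

    def run(self, context: PipelineContext) -> PipelineContext:
        for stage in self.stages:
            stage.function(context)
        return context


def empty_task(context: PipelineContext) -> None:
    """Placeholder stage that intentionally does nothing."""
```

With this shape, `upload.py` could define its pipelines declaratively as lists of `Stage`s and `get_pipeline`/`run_pipeline` would just select and execute one of them.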
- I think we can move the tests into a single module, or rename `test_base` -> `test_common` and `test_data` -> `test_upload`
`upload` can depend on `calc`; `upload` and `calc` can depend on `common`; all can depend on `processing`; no other dependencies between submodules should be necessary
In the future we could think about replacing `@process`, `current_process`, and `process_status` with celery. But at the moment it's very convenient to use a mongodb query to check on the processing status of all entities. I feel celery wasn't really designed with persistent tasks in mind. Also, we would need to be far more rigid with the celery infrastructure and add persistence to rabbitmq and redis.
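To illustrate the convenience argument: the status check is essentially a count/group-by over the stored `process_status` field. A pure-Python stand-in for that aggregation (entities as plain dicts here; mongoengine documents and a real query in our code, field name assumed):

```python
# Hypothetical in-memory equivalent of the mongodb status query:
# summarize how many entities are in each process_status.
from collections import Counter


def status_summary(entities):
    """Count entities per process_status value."""
    return Counter(e['process_status'] for e in entities)
```

Doing the same through celery's result backend would require all task results to be persisted and queryable, which is exactly the extra rigidity mentioned above.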