nomad-lab / nomad-FAIR / Issues / #319

Closed
Open
Created Apr 17, 2020 by Markus Scheidgen (@mscheidg, Owner) · 1 of 2 tasks completed

A more flexible and more celery-tonic processing module

This is a prerequisite for #251 (closed).

@himanel1 already started the implementation. I think this looks good and retains all the features we had before. Here are some further todos that I see:

@mscheidg

  • add a Redis instance to helm
  • run some large-scale processing

@himanel1 We should do the refactoring all the way. I think you should organize the submodules around the processed entities, Calc and Upload, rather than trying to separate mongo from celery. A typical submodule structure, which we also use in other modules, would be:

  • processing/__init__.py - Only docs and imports to expose to other modules
  • processing/processing.py - All celery setup stuff
  • processing/common.py - Common stuff: the mongoengine base, our custom celery tasks/request, shared constants, etc., plus Pipeline, PipelineContext, Stage, empty_task
  • processing/calc.py - Including the "celery task" comp_process (don't like the name, btw.)
  • processing/upload.py - Including upload_cleanup, pipelines, get_pipeline, run_pipline
  • I think we can move the tests into a single module, or rename test_base -> test_common and test_data -> test_upload

upload can depend on calc; upload and calc can depend on common; all can depend on processing; no other dependencies between submodules should be necessary
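
To make the dependency rule easier to check, here is a minimal sketch that encodes it as a table and reports imports that break it. The names `ALLOWED` and `violations` are made up for illustration and are not part of the code base. The `actual` mapping would have to be collected by hand or by some import-scanning tool.

```python
# Allowed imports between the proposed processing submodules.
# Each key may import from the submodules listed in its value.
# (Hypothetical helper, only meant to illustrate the rule stated above.)
ALLOWED = {
    "processing": set(),
    "common": {"processing"},
    "calc": {"common", "processing"},
    "upload": {"calc", "common", "processing"},
}


def violations(actual):
    """Return sorted (importer, imported) pairs that break the rule.

    `actual` maps a submodule name to the set of sibling submodules
    it really imports from.
    """
    return sorted(
        (src, dst)
        for src, dsts in actual.items()
        for dst in dsts
        if dst not in ALLOWED.get(src, set())
    )
```

For example, `violations({"calc": {"upload"}})` flags the pair `("calc", "upload")`, because calc must not depend on upload.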

In the future we could think about replacing @process, current_process, and process_status with celery. But at the moment it's very convenient to use MongoDB queries to check on the processing status of all entities. I feel celery wasn't really designed with persistent tasks in mind. Also, we would need to be far more rigid with the celery infrastructure and add persistence to rabbitmq and redis.
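
A rough sketch of why the persisted status is handy: once every entity stores its own process_status, a status overview is just a filter or group-by over the stored documents. The list below is an in-memory stand-in for the MongoDB collection, so the sketch runs without a database. The status values, ids, and helper names are assumptions for illustration, not the actual schema.

```python
from collections import Counter

# In-memory stand-in for the persisted entities (assumed field values).
# In MongoDB the same overview would be a query or a $group aggregation.
entities = [
    {"id": "calc-1", "current_process": "comp_process", "process_status": "RUNNING"},
    {"id": "calc-2", "current_process": "comp_process", "process_status": "SUCCESS"},
    {"id": "calc-3", "current_process": "comp_process", "process_status": "FAILURE"},
    {"id": "upload-1", "current_process": "upload_cleanup", "process_status": "RUNNING"},
]


def status_summary(docs):
    """Count entities per process_status."""
    return Counter(doc["process_status"] for doc in docs)


def still_running(docs):
    """Return the ids of all entities that have not finished yet."""
    return [doc["id"] for doc in docs if doc["process_status"] == "RUNNING"]
```

With celery alone, the same overview would depend on the result backend keeping task state around, which is the persistence concern raised above.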

Edited Apr 20, 2020 by Markus Scheidgen