Commit 6f4f3160 authored by Markus Scheidgen's avatar Markus Scheidgen

Refactored dependencies, added documentations, improved docker, added processing tests.

parent 8b569d93
......@@ -30,12 +30,12 @@ RUN mkdir /install
WORKDIR /install
COPY requirements.txt requirements.txt
COPY requirements-worker.txt requirements-worker.txt
COPY requirements-dep.txt requirements-dep.txt
COPY nomad/dependencies.py nomad/dependencies.py
COPY nomad/config.py nomad/config.py
RUN pip install -r requirements.txt
RUN pip install -r requirements-worker.txt
RUN pip install -r requirements-dep.txt
RUN python nomad/dependencies.py
# second stage is used to install the actual code and run the celery worker as nomad user
......
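# A hypothetical sketch of the second stage described in the comment above; the
# actual stage is truncated in this diff and all names below are assumptions,
# except the celery command, which matches the docker-compose change in this commit.
FROM python:3.6
RUN useradd -ms /bin/bash nomad
WORKDIR /app
COPY . /app
RUN pip install -e .
USER nomad
CMD ["python", "-m", "celery", "worker", "-l", "info", "-A", "nomad.processing"]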
......@@ -15,10 +15,10 @@ virtualenv -p `which python3` .pyenv
source .pyenv/bin/activate
```
Third, install the documentation system [sphinx](http://www.sphinx-doc.org/en/master/index.html):
Third, install the development dependencies, including the documentation system
[sphinx](http://www.sphinx-doc.org/en/master/index.html):
```
pip install sphinx
pip install recommonmark
pip install -r requirements-dev.txt
```
Fourth, generate the documentation:
......
# Modules
## Dependencies
```eval_rst
.. automodule:: nomad.dependencies
```
## Files
```eval_rst
.. automodule:: nomad.files
```
## Processing
```eval_rst
.. automodule:: nomad.processing
```
# Setup
### Install the legacy NOMAD submodules.
This has to be done differently in the future. For now, init the submodules and check out
working branches/tags (see the example below):
- submodules/parsers/parser-vasp master
- submodules/python-common master
- submodules/nomad-meta-info 1.6.0
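For example (a sketch; the tag matches the nomad-meta-info version listed above):
```
git submodule update --init
cd submodules/nomad-meta-info
git fetch --all --tags --prune
git checkout tags/1.6.0 -b 1.6.0
cd ../..
```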
### Install intra nomad dependencies.
This includes parsers, normalizers, python-common, meta-info, etc.
Those dependencies are managed and configured via python scripts.
To checkout a tag use:
This step is somewhat optional. Those dependencies are only needed for processing.
If you do not develop on the processing, do not need to run the workers from
your environment, and only use the docker image for processing, you can skip this step.
Install some prerequisite requirements
```
git fetch --all --tags --prune
git checkout tags/1.6.0 -b 1.6.0
pip install -r requirements-dep.txt
```
`pip install -r requirements` in `python-common`, and `pip install -e .` in `python-common` and
`parsers/parser-vasp`. Furthermore, there are some dependency issues in `python-common`'s requirements.
Run the dependency installation
```
python nomad/dependencies.py
```
### Install the python in your own virtual environment.
### Install the actual code
```
pip install -r requirements.txt
......@@ -24,35 +26,54 @@ pip install -e .
```
### Run dev infrastructure with docker.
You can do it with or without the ELK stack.
To run it without (default):
First, build the docker images
```
cd ./infrastructure
docker-compose build
sh up-wo-elk.sh
```
To run with ELK, enable `logstash` in nomad/config.py, and start the docker compose with
This will download images for services like redis, minio, rabbitmq. It will configure
an existing image for the ELK stack. It will build the processing image that contains
all intra nomad dependencies (parsers, normalizers, etc.) and will run the celery workers
that do the processing.
You can run all containers, including ELK and processing workers:
```
docker-compose up
```
You can reach Kibana at [localhost:5601](http://localhost:5601).
You can also run services selectively, e.g.
```
docker-compose up redis rabbitmq minio
```
If you run the ELK stack (and enable logstash in nomad/config.py),
you can reach Kibana at [localhost:5601](http://localhost:5601).
The index prefix for logs is `logstash-`.
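To verify that logs arrive, you can also query elasticsearch directly on its API port (the index pattern here follows the `logstash-` prefix above):
```
curl 'http://localhost:9200/logstash-*/_search?size=1'
```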
Optionally register the infrastructure minio host to the minio client (mc).
If you want to access the minio object storage via the mc client, register the
infrastructure's minio host to the minio client (mc).
```
mc config host add minio http://localhost:9007 AKIAIOSFODNN7EXAMPLE wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```
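Afterwards you can browse the object storage with mc, e.g. (bucket names depend on your nomad/config.py; `uploads` is an assumed example):
```
mc ls minio
mc ls minio/uploads
```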
### Run the celery worker (should be moved to docker TODO)
### Run the celery worker
You can run the worker as part of the docker infrastructure.
```
cd infrastructure
docker-compose up nomad-worker
```
You can also run the worker yourself, e.g. to develop on the processing. To simply
run a worker, do (from the project root):
```
celery -A nomad.processing worker -l info
```
You can use a different log level (e.g. switch `info` to `debug`).
Use watchdog during development. Install (i.e. the [fixed](https://github.com/gorakhargosh/watchdog/issues/330) version for MacOS)
Use watchdog during development to reload the worker on code changes.
Watchdog is part of requirements-dev.txt. On MacOS (there is currently a bug in watchdog),
uninstall it and install this [fixed](https://github.com/gorakhargosh/watchdog/issues/330) version
```
pip install git+https://github.com/gorakhargosh/watchdog.git
```
......@@ -63,6 +84,10 @@ watchmedo auto-restart -d ./nomad -p '*.py' -- celery worker -l info -A nomad.pr
```
### Run tests.
You need to have the infrastructure running (including the nomad-worker service)
```
cd infrastructure
docker-compose up -d
cd ..
python tests/test_files.py
```
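The processing tests added with this commit run the same way (the test file name is an assumption, as it is not shown in this diff):
```
python tests/test_processing.py
```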
......@@ -19,7 +19,6 @@ services:
minio:
restart: always
image: minio/minio:RELEASE.2018-06-08T03-49-38Z
hostname: "minio"
# image: minio/minio
ports:
- 9007:9000
......@@ -35,7 +34,6 @@ services:
rabbitmq:
restart: always
image: rabbitmq
hostname: "rabbitmq"
environment:
- "RABBITMQ_ERLANG_COOKIE=SWQOKODSQALRPCLNMEQG"
- "RABBITMQ_DEFAULT_USER=rabbitmq"
......@@ -57,7 +55,6 @@ services:
elk:
restart: always
build: ./elk/
hostname: "elk"
ports:
- 5601:5601 # kibana web
- 9200:9200 # elastic search api
......@@ -75,4 +72,4 @@ services:
- elk
volumes:
- '../.volumes/fs:/app/.volumes/fs'
command: celery worker -l debug -A nomad.processing
command: python -m celery worker -l debug -A nomad.processing
......@@ -13,8 +13,8 @@
# limitations under the License.
"""
Integration of nomad projects into the processing
=================================================
This module allows to configure and install all necessary legacy nomad GIT repositories
to process uploaded calculations (parsers, normalizers, etc.).
Parsers are developed as independent, individual python programs in their own GIT repositories.
They are built on a common module called *python-common*, also in a separate GIT.
......@@ -34,11 +34,23 @@ Preparing dependencies and parsers
To make GIT maintained python modules available, we use:
.. autoclass:: nomad.parsers.PythonGit
.. autoclass:: PythonGit
Parsers, as a special case of GIT maintained python modules, can be used via:
.. autoclass:: nomad.parsers.Parser
.. autoclass:: Parser
General dependencies are configured in
.. autodata:: dependencies
Parsers are configured in
.. autodata:: parsers
To install all dependencies use
.. autofunction:: prepare
"""
import re
import sys
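# A minimal usage sketch based on the docstring above; `prepare` and
# `parser_dict` are defined further down in this module:
#
#     from nomad.dependencies import prepare, parser_dict
#     prepare()                           # clone and install all dependencies
#     vasp = parser_dict['parsers/vasp']  # look up a configured parser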
......@@ -73,17 +85,15 @@ class PythonGit():
This is only useful if you want to use the respective module in a different
python process, because it will not try to reload any already loaded modules into
the current python process.
Args:
name: A name that determines the download path, can contain '/' for sub dirs.
Names are important, because modules might use relative paths between
them.
git_url: A publicly available and fetchable url to the GIT repository.
git_commit: The full commit SHA of the desired commit.
"""
def __init__(self, name, git_url, git_commit):
"""
Args:
name: A name that determines the download path, can contain '/' for sub dirs.
Names are important, because modules might use relative paths between
them.
git_url: A publicly available and fetchable url to the GIT repository.
git_commit: The full commit SHA of the desired commit.
"""
super().__init__()
self.name = name
self.git_url = git_url
self.git_commit = git_commit
......@@ -158,6 +168,13 @@ class Parser():
"""
Instances specify a parser. A parser allows to find *main files* among uploaded
and extracted files, and to run the parser on those *main files*.
Args:
python_git: The :class:`PythonGit` that describes the parser code.
parser_class_name: Fully qualified name of the main parser class. We assume it has one
parameter for the backend.
main_file_re: A regexp that matches main file paths that this parser can handle.
main_contents_re: A regexp that matches main file headers that this parser can parse.
"""
def __init__(self, python_git, parser_class_name, main_file_re, main_contents_re):
self.name = python_git.name
......@@ -185,6 +202,9 @@ class Parser():
parser = Parser(backend=JsonParseEventsWriterBackend)
parser.parse(mainfile)
def __repr__(self):
return self.python_git.__repr__()
class VASPRunParser(Parser):
def __init__(self):
......@@ -204,11 +224,24 @@ class VASPRunParser(Parser):
)
parsers = [
VASPRunParser()
Parser(
python_git=PythonGit(
name='parsers/vasp',
git_url='https://gitlab.mpcdf.mpg.de/nomad-lab/parser-vasp.git',
git_commit='nomad-xt'),
parser_class_name='vaspparser.VASPParser',
main_file_re=r'^.*\.xml$',
main_contents_re=(
r'^\s*<\?xml version="1\.0" encoding="ISO-8859-1"\?>\s*'
r'?\s*<modeling>'
r'?\s*<generator>'
r'?\s*<i name="program" type="string">\s*vasp\s*</i>'
r'?')
)
]
parser_dict = {parser.name: parser for parser in parsers}
others = [
dependencies = [
PythonGit(
name='nomad-meta-info',
git_url='https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-meta-info.git',
......@@ -221,7 +254,10 @@ others = [
def prepare():
for python_git in others:
"""
Installs all dependencies from :data:`dependencies` and :data:`parsers`.
"""
for python_git in dependencies:
python_git.prepare()
for parser in parsers:
......
......@@ -13,9 +13,6 @@
# limitations under the License.
"""
NOMAD's file storage implementation
===================================
This file storage abstraction currently uses the object storage API
http://minio.io to manage and organize files. Object storage
organizes files in *buckets/ids*, with small amounts of *buckets* and virtually
......
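# A minimal usage sketch based on the description above; this mirrors how the
# new processing test in this commit uses the module's minio client:
#
#     import nomad.files as files
#     import nomad.config as config
#     files._client.fput_object(config.s3.uploads_bucket, 'an_upload_id', 'example.zip')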
......@@ -12,6 +12,13 @@
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This module allows to (1) run a celery worker that can perform all processing
tasks, (2) start a processing canvas (a series of tasks), and (3) read and render
the current state and results of a processing canvas run.
"""
from celery import Celery, group, subtask
from celery.result import result_from_tuple
from celery.signals import after_setup_task_logger, after_setup_logger
......
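# A minimal usage sketch based on the docstring above, using the functions that
# the new processing test below imports:
#
#     from nomad.processing import start_process_upload, get_process_upload_state
#     task = start_process_upload(upload_id)    # start the processing canvas
#     state = get_process_upload_state(task)    # poll current state and results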
watchdog
\ No newline at end of file
watchdog
sphinx
recommonmark
gitpython
\ No newline at end of file
import unittest
from unittest import TestCase
import time
import logging
from minio import ResponseError
import nomad.files as files
import nomad.config as config
from nomad.processing import start_process_upload, get_process_upload_state
test_upload_id = '__test_upload_id'
class ProcessingTests(TestCase):
    def setUp(self):
        files._client.fput_object(config.s3.uploads_bucket, test_upload_id, 'data/examples_vasp.zip')

    def tearDown(self):
        try:
            files._client.remove_object(config.s3.uploads_bucket, test_upload_id)
        except ResponseError:
            pass

    def test_processing(self):
        task = start_process_upload(test_upload_id)
        result = None
        while True:
            time.sleep(0.0001)
            new_result = get_process_upload_state(task)
            if result != new_result:
                result = new_result
                if result['close'] in ('SUCCESS', 'FAILURE'):
                    break
        self.assertEqual(result['close'], 'SUCCESS')


if __name__ == '__main__':
    logging.basicConfig(level=logging.DEBUG)
    unittest.main()