From a9ea7db5a98e473b0295516e93c1e9edfb46012a Mon Sep 17 00:00:00 2001
From: temok-mx <temok.mx@gmail.com>
Date: Thu, 10 Sep 2020 17:23:12 +0200
Subject: [PATCH] Updated README.md; added metadata.yml; the lead branch is now master, inactive branches became tags

---
 .gitlab-ci.yml |  19 --------
 README.md      | 120 ++++++++++++++++++++++++++-----------------------
 metadata.yml   |  32 +++++++++++++
 3 files changed, 95 insertions(+), 76 deletions(-)
 delete mode 100644 .gitlab-ci.yml
 create mode 100644 metadata.yml

diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
deleted file mode 100644
index e25c269..0000000
--- a/.gitlab-ci.yml
+++ /dev/null
@@ -1,19 +0,0 @@
-stages:
-  - test
-
-testing:
-  stage: test
-  script:
-    - cd .. && rm -rf nomad-lab-base
-    - git clone --recursive git@gitlab.mpcdf.mpg.de:nomad-lab/nomad-lab-base.git
-    - cd nomad-lab-base
-    - git submodule foreach git checkout master
-    - git submodule foreach git pull
-    - sbt cp2k/test
-    - export PYTHONEXE=/labEnv/bin/python
-    - sbt cp2k/test
-  only:
-    - master
-  tags:
-    - test
-    - spec2
\ No newline at end of file
diff --git a/README.md b/README.md
index af9a84a..feaee4a 100644
--- a/README.md
+++ b/README.md
@@ -1,72 +1,78 @@
-This is the main repository of the [NOMAD](https://www.nomad-coe.eu/) parser for
-[CP2K](https://www.cp2k.org/).
+This is a NOMAD parser for [CP2K](https://www.cp2k.org/). It will read CP2K input and
+output files and provide all information in NOMAD's unified Metainfo-based Archive format.
 
-# Example
-```python
-    from cp2kparser import CP2KParser
-    import matplotlib.pyplot as mpl
+## Preparing code input and output files for uploading to NOMAD
+
+NOMAD accepts `.zip` and `.tar.gz` archives as uploads. Each upload can contain arbitrary
+files and directories. NOMAD will automatically try to choose the right parser for your files.
+For each parser (i.e. for each supported code) there is one type of file that the respective
+parser can recognize. We call these files `mainfiles` as they typically are the main
+output file of a code. For each `mainfile` that NOMAD discovers it will create an entry
+in the database that users can search, view, and download. NOMAD will associate all files
+in the same directory as files that also belong to that entry. Parsers
+might also read information from these auxiliary files. This way you can add more files
+to an entry, even if the respective parser/code might not directly support it.
+
+For CP2K please provide at least the files from this table if applicable to your
+calculations (remember that you can provide more files if you want):
 
-    # 1. Initialize a parser with a set of default units.
-    default_units = ["eV"]
-    parser = CP2KParser(default_units=default_units)
 
-    # 2. Parse a file
-    path = "path/to/main.file"
-    results = parser.parse(path)
 
-    # 3. Query the results with using the id's created specifically for NOMAD.
-    scf_energies = results["energy_total_scf_iteration"]
-    mpl.plot(scf_energies)
-    mpl.show()
+To create an upload with all calculations in a directory structure:
+
+```
+zip -r <upload-file>.zip <directory>/*
 ```
 
-# Installation
-The code is python 2 and python 3 compatible. First download and install
-the nomadcore package:
+Go to the [NOMAD upload page](https://nomad-lab.eu/prod/rae/gui/uploads) to upload files
+or find instructions about how to upload files from the command line.
+
+## Using the parser
 
-```sh
-git clone https://gitlab.mpcdf.mpg.de/nomad-lab/python-common.git
-cd python-common
-pip install -r requirements.txt
-pip install -e .
+You can use NOMAD's parsers and normalizers locally on your computer. You need to install
+NOMAD's pypi package:
+
+```
+pip install nomad-lab
 ```
 
-Then download the metainfo definitions to the same folder where the
-'python-common' repository was cloned:
+To parse code input/output from the command line, you can use NOMAD's command line
+interface (CLI) and print the processing results output to stdout:
 
-```sh
-git clone https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-meta-info.git
 ```
+nomad parse --show-archive <path-to-file>
+```
+
+To parse a file in Python, you can program something like this:
+```python
+import sys
+from nomad.cli.parse import parse, normalize_all
 
-Finally download and install the parser:
+# match and run the parser
+backend = parse(sys.argv[1])
+# run all normalizers
+normalize_all(backend)
 
-```sh
-git clone https://gitlab.mpcdf.mpg.de/nomad-lab/parser-cp2k.git
-cd parser-cp2k
-pip install -e .
+# get the 'main section' section_run as a metainfo object
+section_run = backend.resource.contents[0].section_run[0]
+
+# get the same data as JSON serializable Python dict
+python_dict = section_run.m_to_dict()
+```
+
+## Developing the parser
+
+Also install NOMAD's pypi package:
+
+```
+pip install nomad-lab
+```
+
+Clone the parser project and install it in development mode:
+
+```
+git clone https://gitlab.mpcdf.mpg.de/nomad-lab/parser-cp2k parser-cp2k
+pip install -e parser-cp2k
 ```
 
-# Notes
-The parser is based on CP2K 2.6.2.
-
-The CP2K input setting
-[PRINT_LEVEL](https://manual.cp2k.org/trunk/CP2K_INPUT/GLOBAL.html#PRINT_LEVEL)
-controls the amount of details that are outputted during the calculation. The
-higher this setting is, the more can be parsed from the upload.
-
-The parser will try to find the paths to all the input and output files, but if
-they are located very deep inside some folder structure or outside the folder
-where the output file is, the parser will not be able to locate them. For this
-reason it is recommended to keep the upload structure as flat as possible.
-
-Here is a list of features/fixes that would make the parsing of CP2K results
-easier:
- - The pdb trajectory output doesn't seem to conform to the actual standard as
-   the different configurations are separated by the END keyword which is
-   supposed to be written only once in the file. The [format
-   specification](http://www.wwpdb.org/documentation/file-format) states that
-   different configurations should start with MODEL and end with ENDMDL tags.
- - The output file should contain the paths/filenames of different input and
-   output files that are accessed during the program run. This data is already
-   available for some files (input file, most files produced by MD), but many
-   are not mentioned.
+Running the parser now will use the parser's Python code from the cloned project.
diff --git a/metadata.yml b/metadata.yml
new file mode 100644
index 0000000..d46378a
--- /dev/null
+++ b/metadata.yml
@@ -0,0 +1,32 @@
+code-label: CP2K
+code-label-style: all in capitals
+code-url: https://www.cp2k.org/
+parser-dir-name: dependencies/parsers/cp2k/
+parser-git-url: https://gitlab.mpcdf.mpg.de/nomad-lab/parser-cp2k
+parser-specific: |
+  ## Usage notes
+  The parser is based on CP2K 2.6.2.
+
+  The CP2K input setting
+  [PRINT_LEVEL](https://manual.cp2k.org/trunk/CP2K_INPUT/GLOBAL.html#PRINT_LEVEL)
+  controls the amount of details that are outputted during the calculation. The
+  higher this setting is, the more can be parsed from the upload.
+
+  The parser will try to find the paths to all the input and output files, but if
+  they are located very deep inside some folder structure or outside the folder
+  where the output file is, the parser will not be able to locate them. For this
+  reason it is recommended to keep the upload structure as flat as possible.
+
+  Here is a list of features/fixes that would make the parsing of CP2K results
+  easier:
+   - The pdb trajectory output doesn't seem to conform to the actual standard as
+     the different configurations are separated by the END keyword which is
+     supposed to be written only once in the file. The [format
+     specification](http://www.wwpdb.org/documentation/file-format) states that
+     different configurations should start with MODEL and end with ENDMDL tags.
+   - The output file should contain the paths/filenames of different input and
+     output files that are accessed during the program run. This data is already
+     available for some files (input file, most files produced by MD), but many
+     are not mentioned.
+
+table-of-files: ''
-- 
GitLab