This is the main repository of the NOMAD parser for CP2K.

Installation

This parser is a submodule of the nomad-lab-base repository. Developers within the NoMaD project will automatically get a copy of this repository when they download and install the base repository.

Structure

The Scala layer can access the parser functionality through the scalainterface.py file by calling the following command:

    python scalainterface.py path/to/main/file

This Scala interface lives in its own file to keep it apart from the rest of the code. Some parsers will have the interface in the same file as the parsing code, but I feel that this is a cleaner approach.

The parser is designed to support multiple versions of CP2K with a DRY approach: the initial parser class is based on CP2K 2.6.2, and other versions will be subclassed from it. By subclassing, all the previous functionality is preserved, new functionality can be added easily, and old functionality is overridden only where necessary.
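The subclassing idea can be sketched roughly as follows. This is only an illustration: the class names, the method names and the line formats are hypothetical and are not taken from the actual parser code.

```python
class CP2KParserBase:
    """Hypothetical base parser, standing in for the CP2K 2.6.2 implementation."""

    def parse_energy(self, line):
        # Assumed format, for illustration only: "<label>: <value>"
        return float(line.split(":")[1])

    def parse_version(self, line):
        # The version string is assumed to be the last token on the line.
        return line.split()[-1]


class CP2KParserNewer(CP2KParserBase):
    """Hypothetical parser for a later version; only the changed part is overridden."""

    def parse_energy(self, line):
        # Assume the newer version writes "=" instead of ":".
        return float(line.split("=")[1])


old = CP2KParserBase()
new = CP2KParserNewer()
print(old.parse_energy("Total energy: -17.5"))
print(new.parse_energy("Total energy = -17.5"))
print(new.parse_version("CP2K version 3.0"))  # inherited unchanged from the base class
```

Only parse_energy is redefined in the subclass; parse_version is inherited, which is the point of the DRY approach described above.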

Upload Folder Structure, File Naming and CP2K Settings

CP2K Settings

The CP2K input setting PRINT_LEVEL controls the amount of detail that is written to the output during the calculation. The higher this setting is, the more can be parsed from the upload.

Structure

The following upload structure will maximize the amount of parsed content. If the parser cannot find certain files in their assumed locations, they are simply ignored.

- The input file is assumed to be in the same folder as the output file. The name of the input file is read from the output file, where it is stated without the full path.
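The lookup described above can be sketched as follows. This is a minimal sketch, not the parser's own code: the function name find_input_file is hypothetical, and the header label "CP2K| Input file name" is an assumption about how the output file states the input file name.

```python
import os
import tempfile

# Assumed label in the output header; adjust if the output uses a different one.
MARKER = "CP2K| Input file name"


def find_input_file(output_path, marker=MARKER):
    """Return the path of the input file next to the output file, or None."""
    folder = os.path.dirname(os.path.abspath(output_path))
    with open(output_path) as f:
        for line in f:
            if marker in line:
                # The name is stated without a path, so look in the same folder.
                name = line.split(marker, 1)[1].strip()
                candidate = os.path.join(folder, name)
                return candidate if os.path.isfile(candidate) else None
    return None


# Small demonstration with temporary files.
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "run.out")
    with open(out_path, "w") as f:
        f.write(" CP2K| Input file name    run.inp\n")
    missing = find_input_file(out_path)  # run.inp does not exist yet
    open(os.path.join(tmp, "run.inp"), "w").close()
    found = find_input_file(out_path)
    found_name = os.path.basename(found)
```

Returning None when the file is absent mirrors the behaviour described above, where files that cannot be found are simply ignored.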

Standalone Mode

The parser is designed to be usable outside the NoMaD project as a separate Python package. This standalone Python-only mode is primarily for people who want to easily access the parser without the need to set up the whole "NOMAD Stack". It is also used when running the custom unit tests found in the folder "cp2k/test/unittests". Here is an example of the call syntax:

    from cp2kparser import CP2KParser
    import matplotlib.pyplot as mpl

    # 1. Initialize a parser by giving a path to the CP2K output file and a list of
    # default units
    path = "path/to/main.file"
    default_units = ["eV"]
    parser = CP2KParser(path, default_units=default_units)

    # 2. Parse
    results = parser.parse()

    # 3. Query the results using the ids created specifically for NOMAD.
    scf_energies = results["energy_total_scf_iteration"]
    mpl.plot(scf_energies)
    mpl.show()

Tools and Methods

This section describes some of the guidelines that are used in the development of this parser.

Documentation

This parser follows the Google style guide for documenting Python code. Documenting makes it much easier to follow the logic behind your parser.
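As an illustration of this docstring style, here is a small helper documented with the Args and Returns sections that the Google style guide uses. The function itself is hypothetical and not part of the parser.

```python
def parse_float(text, default=None):
    """Convert a text token to a float.

    Args:
        text (str): Token read from an output file, e.g. "-17.25".
        default (float, optional): Value returned when the token is not a
            valid number. Defaults to None.

    Returns:
        float: The parsed value, or ``default`` if the token cannot be parsed.
    """
    try:
        return float(text)
    except ValueError:
        return default


print(parse_float("-17.25"))
print(parse_float("n/a", default=0.0))
```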

Testing

Parsers can become quite complicated, and maintaining them without systematic testing is impractical. The Scala layer automatically runs general tests for all parsers. These are essential, but they can only check that the data is output in the correct format and follows some general rules. They cannot verify that the contents are correct.

To truly test the parser output, unit testing is needed. The unit tests for this parser are located in test/unittests. Unit tests provide a way to test each parseable quantity, and Python has a very good library for unit testing. When the parser supports a new quantity, it is quick to create unit tests for it. These tests validate the parsing and also make it easy to detect bugs that may arise when the code is modified in the future.
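A unit test for a single quantity could look roughly like the sketch below, written with the standard library's unittest module. The function parse_scf_energies and the "SCF energy" line format are hypothetical stand-ins so that the example runs on its own; the real tests in test/unittests exercise the actual parser.

```python
import unittest


def parse_scf_energies(lines):
    """Hypothetical stand-in for the parser: collect the SCF energies."""
    return [float(line.split()[-1]) for line in lines if line.startswith("SCF energy")]


class TestScfEnergies(unittest.TestCase):
    def test_values(self):
        lines = ["SCF energy -1.0", "unrelated line", "SCF energy -1.5"]
        self.assertEqual(parse_scf_energies(lines), [-1.0, -1.5])

    def test_no_energies(self):
        self.assertEqual(parse_scf_energies([]), [])


# Run the test case explicitly so the example also works inside a larger script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestScfEnergies)
result = unittest.TextTestRunner().run(suite)
```

When such tests live in their own files, they can also be collected and run from the command line with python -m unittest.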

Profiling

The parsers have to be reasonably fast. For some codes there is already a significant amount of data in the NoMaD repository, and the time taken to parse it depends on the performance of the parser. Also, each time the parser evolves after system deployment, the existing data may have to be reparsed, at least partially.

By profiling which functions take the most computational time and memory during parsing, you can identify the bottlenecks in the parser. Existing profiling tools such as cProfile can be plugged into your scripts very easily.
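A minimal sketch of timing a parsing call with cProfile and pstats from the standard library is shown below. The function fake_parse is a hypothetical workload that stands in for a real call to the parser. Note that cProfile reports time only; memory use needs a separate tool.

```python
import cProfile
import io
import pstats


def fake_parse(n=20000):
    """Hypothetical workload standing in for a real parsing run."""
    return sum(float(str(i)) for i in range(n))


# Collect timing data only around the call of interest.
profiler = cProfile.Profile()
profiler.enable()
total = fake_parse()
profiler.disable()

# Sort by cumulative time and print the five most expensive entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
report = stream.getvalue()
print(report)
```

Sorting by cumulative time puts the functions whose calls, including everything they call in turn, take the longest at the top, which is usually the quickest way to spot a bottleneck.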

Notes for CP2K Developers

Here is a list of features/fixes that would make the parsing of CP2K results easier:

  • The PDB trajectory output does not seem to conform to the actual standard: the different configurations are separated by the END keyword, which is supposed to be written only once in the file. The format specification states that each configuration should start with a MODEL tag and end with an ENDMDL tag.
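Until this is changed, a reader has to accept both conventions. The sketch below shows one possible way to split a trajectory into configurations regardless of whether END or MODEL/ENDMDL is used. The function name is hypothetical, the sample records are schematic rather than column-exact PDB lines, and this is not necessarily how the parser handles the files.

```python
def split_pdb_frames(lines):
    """Split PDB-like lines into configurations at END or ENDMDL records."""
    frames, current = [], []
    for line in lines:
        record = line[:6].strip()  # PDB record names occupy the first six columns
        if record in ("END", "ENDMDL"):
            if current:
                frames.append(current)
                current = []
        elif record == "MODEL":
            continue  # start-of-configuration tag carries no coordinates
        else:
            current.append(line)
    if current:  # tolerate a missing terminator after the last configuration
        frames.append(current)
    return frames


# Schematic sample: two configurations separated by END, as described above.
sample = [
    "ATOM      1  H   first frame",
    "ATOM      2  H   first frame",
    "END",
    "ATOM      1  H   second frame",
    "ATOM      2  H   second frame",
    "END",
]
frames = split_pdb_frames(sample)
print(len(frames))
```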