Commit 3156aa6f authored by Lauri Himanen's avatar Lauri Himanen

Restructured the folders and files to use a common project structure, updated readme.

parent fbdb8625
This is the main repository of the [NOMAD]( parser for BigDFT.
# Standalone Installation
The parser is designed to be usable as a separate Python package. Here is an
example of the call syntax:
# Example

    from bigdftparser import BigDFTParser
    import matplotlib.pyplot as mpl

    # 1. Initialize a parser with a set of default units.
    default_units = ["eV"]
    parser = BigDFTParser(default_units=default_units)

    # 2. Parse a file.
    path = "path/to/main.file"
    results = parser.parse(path)

    # 3. Query the results with the ids created specifically for NOMAD.
    scf_energies = results["energy_total_scf_iteration"]
# Installation
The code is Python 2 and Python 3 compatible. First download and install
the nomadcore package:

    git clone
    cd python-common
    pip install -r requirements.txt
    pip install -e .
Then download the metainfo definitions to the same folder where the
'python-common' repository was cloned:

    git clone

# Scala access
The Scala layer in the NOMAD infrastructure can access the parser functionality
through the file, by calling the following command:

    python path/to/main/file

This Scala interface is in its own file to separate it from the rest of the
parser code.
# Support of different versions
The parser is designed to support multiple versions of BigDFT with a
[DRY]( approach: the
initial parser class is based on BigDFT 1.8.0, and other versions will be
subclassed from it. By subclassing, all the previous functionality is
preserved, new functionality can be easily added, and old functionality
overridden only where necessary.
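The subclassing scheme can be sketched roughly as follows. This is a minimal illustration, not the parser's actual API: the class names, method names, and output formats here are hypothetical.

```python
class MainParser180(object):
    """Hypothetical parser for BigDFT 1.8.0: the base implementation."""

    def parse_total_energy(self, line):
        # Assume 1.8.0 prints "Energy (Hartree): <value>".
        return float(line.split(":")[1])

    def parse_version(self, line):
        # Assume the version is the last token on the line.
        return line.split()[-1]


class MainParser190(MainParser180):
    """Hypothetical subclass for a newer release: everything is inherited,
    and only the quantities whose format changed are overridden."""

    def parse_total_energy(self, line):
        # Assume the label changed to "Total energy (Hartree): <value>".
        return float(line.split(":")[1])
```

Only `parse_total_energy` is redefined; `parse_version` and everything else comes from the base class for free, which is exactly the DRY benefit described above.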
Finally download and install the parser:

    git clone
    cd parser-big-dft
    pip install -e .
# Developer Info
This section describes some of the guidelines that are used in the development
of this parser.
## Documentation
This parser tries to follow the [Google style]( for documenting Python code.
Documenting makes it much easier to follow the logic behind your parser.
## Testing
The parsers can become quite complicated, and maintaining them without
systematic testing is impossible. There are general tests that are
performed automatically in the Scala layer for all parsers. This is essential,
but it can only verify that the data is output in the correct format and
according to some general rules; these tests cannot verify that the contents
are correct.
In order to truly test the parser output, regression testing is needed. The
tests for this parser are located in the **regtest** folder. Tests provide one
way to validate each parseable quantity, and Python has a very good [library for
unit testing]( When the parser
supports a new quantity, it is quite fast to create unit tests for it. These
tests will validate the parsing and also easily detect bugs that may arise when
the code is modified in the future.
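A regression test for a single parsed quantity might look roughly like this. The helper function is hypothetical; the real tests in the **regtest** folder exercise the actual parser against reference output files.

```python
import unittest


def parse_scf_energies(output):
    """Hypothetical helper: extract SCF energies from lines of the
    form 'Energy (Hartree): <value>' in a captured output string."""
    energies = []
    for line in output.splitlines():
        if line.startswith("Energy (Hartree):"):
            energies.append(float(line.split(":")[1]))
    return energies


class TestScfEnergies(unittest.TestCase):
    """One small test per parseable quantity keeps regressions visible."""

    def test_two_iterations(self):
        output = "Energy (Hartree): -1.0\nEnergy (Hartree): -1.5\n"
        self.assertEqual(parse_scf_energies(output), [-1.0, -1.5])
```

Running `python -m unittest` in the folder containing such a file executes the checks.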
## Profiling
The parsers have to be reasonably fast. For some codes there is already a
significant amount of data in the NOMAD repository, and the time taken to parse
it will depend on the performance of the parser. Also, each time the parser
evolves after system deployment, the existing data may have to be reparsed, at
least partially.
By profiling which functions take the most computational time and memory during
parsing, you can identify the bottlenecks in the parser. There are already
existing profiling tools, such as
which you can plug into your scripts very easily.
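The tool name is elided above; as one example, Python's built-in `cProfile` module can be plugged into a script like this. The workload function here is a stand-in, not the parser's real entry point.

```python
import cProfile
import pstats


def run_parser():
    # Stand-in workload for a real parsing run.
    return sum(i * i for i in range(100000))


profiler = cProfile.Profile()
profiler.enable()
run_parser()
profiler.disable()

# Print the five most expensive calls, sorted by cumulative time.
stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(5)
```

The printed table shows call counts and cumulative times per function, which is usually enough to spot the parsing bottleneck.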
# Notes
The parser is based on BigDFT 1.8.
@@ -6,7 +6,6 @@ from nomadcore.baseclasses import ParserInterface
logger = logging.getLogger("nomad")
class BigDFTParser(ParserInterface):
"""This class handles the initial setup before any parsing can happen. It
determines which version of BigDFT was used to generate the output and then
@@ -15,8 +14,8 @@ class BigDFTParser(ParserInterface):
After the implementation has been setup, you can parse the files with
def __init__(self, metainfo_to_keep=None, backend=None, default_units=None, metainfo_units=None, debug=True, log_level=logging.ERROR, store=True):
super(BigDFTParser, self).__init__(metainfo_to_keep, backend, default_units, metainfo_units, debug, log_level, store)
def setup_version(self):
"""Sets up the version by looking at the output file and the version
@@ -78,4 +77,4 @@ class BigDFTParser(ParserInterface):
except AttributeError:
logger.exception("A parser class '{}' could not be found in the module '{}'.".format(class_name, parser_module))
self.main_parser = parser_class(self.parser_context)
@@ -13,5 +13,5 @@ if __name__ == "__main__":
# Initialise the parser with a JSON backend; the main filename is passed to parse()
main_file = sys.argv[1]
parser = BigDFTParser(backend=JsonParseEventsWriterBackend)
@@ -7,15 +7,14 @@ from bigdftparser.generic.libxc_codes import LIB_XC_MAPPING
LOGGER = logging.getLogger("nomad")
class BigDFTMainParser(AbstractBaseParser):
"""The main parser class that is called for all run types. Parses the BigDFT
output file.
def __init__(self, parser_context):
super(BigDFTMainParser, self).__init__(parser_context)
# Map keys in the output to functions that handle the values
self.key_to_funct_map = {
@@ -30,7 +29,7 @@ class BigDFTMainParser(AbstractBaseParser):
"Energy (Hartree)": lambda x: self.backend.addRealValue("energy_total", float(x), unit="hartree"),
def parse(self, filepath):
"""The output file of a BigDFT run is a YAML document. Here we directly
parse this document with an existing YAML library, and push its
contents into the backend. This function will read the document in
@@ -39,7 +38,7 @@ class BigDFTMainParser(AbstractBaseParser):
with open(filepath, "r") as fin:
# Open default sections and output default information
section_run_id = self.backend.openSection("section_run")