Commit ea1ca62d authored by Lauri Himanen

Initial push with the correct code structure.

parent 4877388a
Pipeline #8373 failed with stage
# use glob syntax.
syntax: glob
*.ser
*.class
*~
*.bak
#*.off
*.old
*.pyc
*.bk
*.swp
.DS_Store
**/__pycache__
# logging files
detailed.log
# eclipse conf file
.settings
.classpath
.project
.manager
.scala_dependencies
# idea
.idea
*.iml
# building
target
build
null
tmp*
temp*
dist
test-output
build.log
# other scm
.svn
.CVS
.hg*
# switch to regexp syntax.
# syntax: regexp
# ^\.pc/
#emacs TAGS
TAGS
lib/
env/
# Egg
parser/parser-big-dft/bigdftparser.egg-info/
stages:
  - test

testing:
  stage: test
  script:
    - cd .. && rm -rf nomad-lab-base
    - git clone --recursive git@gitlab.mpcdf.mpg.de:nomad-lab/nomad-lab-base.git
    - cd nomad-lab-base
    - git submodule foreach git checkout master
    - git submodule foreach git pull
    - export PYTHONEXE=/labEnv/bin/python
    - sbt bigdft/test
  only:
    - master
  tags:
    - test
    - spec2
# [NOMAD Laboratory CoE](http://nomad-coe.eu) parser for [BigDFT](http://bigdft.org/)
This is the main repository of the [NOMAD](http://nomad-lab.eu) parser for
[BigDFT](http://bigdft.org/).
The original repository lives at
https://gitlab.mpcdf.mpg.de/nomad-lab/parser-big-dft
but you probably want to check out
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-lab-base
to also get all dependencies.
# Standalone Installation
The parser is designed to be usable as a separate python package. Here is an
example of the call syntax:
```python
from bigdftparser import BigDFTParser
import matplotlib.pyplot as mpl

# 0. Initialize a parser by giving a path to the BigDFT output file and a list
#    of default units
path = "path/to/main.file"
default_units = ["eV"]
parser = BigDFTParser(path, default_units=default_units)

# 1. Parse
results = parser.parse()

# 2. Query the results using the metainfo names created specifically for NOMAD
scf_energies = results["energy_total_scf_iteration"]
mpl.plot(scf_energies)
mpl.show()
```
To install this standalone version, you need to first clone the
*git@gitlab.mpcdf.mpg.de:nomad-lab/python-common.git* repository and the
*git@gitlab.mpcdf.mpg.de:nomad-lab/nomad-meta-info.git* repository into the
same folder. Then install the *python-common* package according to the
instructions found in the README. After that, you can install this package by
running either of the following two commands depending on your python version:
```sh
python setup.py develop --user # for python2
python3 setup.py develop --user # for python3
```
# Scala access
The scala layer in the Nomad infrastructure can access the parser functionality
through the scalainterface.py file, by calling the following command:
```sh
python scalainterface.py path/to/main/file
```
This Scala interface is in its own file to separate it from the rest of the
code.
# Support of different versions
The parser is designed to support multiple versions of BigDFT with a
[DRY](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself) approach: The
initial parser class is based on BigDFT 1.8.0, and other versions are
subclassed from it. By subclassing, all the previous functionality is
preserved, new functionality can easily be added, and old functionality
overridden only where necessary.
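As a sketch of this scheme (the class and method names below are hypothetical, chosen only to illustrate the subclassing idea):

```python
# Hypothetical sketch of the versioning scheme: a base main parser for
# BigDFT 1.8.0, and a subclass for a newer version that overrides only
# what changed. Names and output formats here are illustrative only.
class BigDFTMainParser180(object):
    def parse_energy(self, line):
        # Baseline behaviour: the energy is the last token on the line.
        return float(line.split()[-1])

    def parse_forces(self, line):
        # The last three tokens are the force components.
        return [float(x) for x in line.split()[-3:]]


class BigDFTMainParser190(BigDFTMainParser180):
    """Inherits all 1.8.0 functionality and overrides only parse_energy."""

    def parse_energy(self, line):
        # Hypothetical format change: the energy now follows an equals sign.
        return float(line.split("=")[-1])
```

Only the changed quantity needs new code; everything else is reused from the 1.8.0 implementation.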
# Developer Info
This section describes some of the guidelines that are used in the development
of this parser.
## Documentation
This parser tries to follow the [google style
guide](https://google.github.io/styleguide/pyguide.html?showone=Comments#Comments)
for documenting python code. Documenting makes it much easier to follow the
logic behind your parser.
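For example, a parser helper documented in this style might look as follows (the function and the marker string are hypothetical, for illustration only):

```python
def parse_total_energy(line):
    """Extract the total energy from a line of BigDFT output.

    Args:
        line: A string containing one line of the output file.

    Returns:
        The total energy as a float, or None if the line does not
        contain an energy value.
    """
    marker = "Total energy:"  # hypothetical marker, for illustration only
    if marker in line:
        return float(line.split(marker)[1].split()[0])
    return None
```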
## Testing
The parsers can become quite complicated and maintaining them without
systematic testing is impossible. There are general tests that are run
automatically in the Scala layer for all parsers. These are essential, but
they can only verify that the data is output in the correct format and
follows some general rules; they cannot verify that the contents are
correct.
In order to truly test the parser output, regression testing is needed. The
tests for this parser are located in the **regtest** folder. Tests provide one
way to test each parseable quantity and python has a very good [library for
unit testing](https://docs.python.org/2/library/unittest.html). When the parser
supports a new quantity, it is quick to create unit tests for it. These
tests validate the parsing and also easily detect bugs that may arise when
the code is modified in the future.
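A regression test for a single parsed quantity could be sketched like this (the `parse` helper and the expected values are hypothetical stand-ins; real tests run the parser on a reference output stored in **regtest**):

```python
import unittest


def parse(path):
    # Hypothetical stand-in for running the real parser on a reference
    # output file; returns a results mapping like the one in the README.
    return {"energy_total_scf_iteration": [-1.0, -1.5, -1.6]}


class TestScfEnergies(unittest.TestCase):
    def test_energy_total_scf_iteration(self):
        results = parse("regtest/single_point/output.out")
        energies = results["energy_total_scf_iteration"]
        # The number of SCF iterations and the converged value are fixed
        # by the reference output, so they make good regression checks.
        self.assertEqual(len(energies), 3)
        self.assertAlmostEqual(energies[-1], -1.6)


if __name__ == "__main__":
    unittest.main()
```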
## Profiling
The parsers have to be reasonably fast. For some codes there is already a
significant amount of data in the NoMaD repository, and the time taken to
parse it depends on the performance of the parser. Each time the parser
evolves after deployment, the existing data may also have to be reparsed, at
least partially.
By profiling which functions take the most computational time and memory
during parsing, you can identify the bottlenecks in the parser. There are
existing profiling tools such as
[cProfile](https://docs.python.org/2/library/profile.html#module-cProfile)
which you can plug into your scripts very easily.
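A minimal sketch of using cProfile programmatically (the profiled function here is a dummy stand-in for a real `parser.parse()` call):

```python
import cProfile
import io
import pstats


def parse_dummy():
    # Stand-in for a real parsing run; profile your actual entry point.
    return sum(i * i for i in range(100000))


profiler = cProfile.Profile()
profiler.enable()
parse_dummy()
profiler.disable()

# Print the five most expensive calls by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```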
from bigdftparser.parser import BigDFTParser
import os
import re
import logging
import importlib
from nomadcore.baseclasses import ParserInterface
logger = logging.getLogger("nomad")
#===============================================================================
class BigDFTParser(ParserInterface):
    """This class handles the initial setup before any parsing can happen. It
    determines which version of BigDFT was used to generate the output and
    then sets up the correct main parser.

    After the implementation has been set up, you can parse the files with
    parse().
    """
    def __init__(self, main_file, metainfo_to_keep=None, backend=None, default_units=None, metainfo_units=None, debug=True, log_level=logging.ERROR, store=True):
        super(BigDFTParser, self).__init__(main_file, metainfo_to_keep, backend, default_units, metainfo_units, debug, log_level, store)
    def setup_version(self):
        """Set up the version by looking at the output file and the version
        specified in it.
        """
        # Search for the version specification. The correct parser is
        # initialized based on this information. NOTE: this regular expression
        # is inherited from the NWChem parser scaffolding.
        regex_version = re.compile(" Northwest Computational Chemistry Package \(NWChem\) (\d+\.\d+)")
        version_id = None
        with open(self.parser_context.main_file, 'r') as outputfile:
            for line in outputfile:
                # Look for the version
                result_version = regex_version.match(line)
                if result_version:
                    version_id = result_version.group(1).replace('.', '')
                    break
        if version_id is None:
            msg = "Could not find a version specification from the given main file."
            logger.exception(msg)
            raise RuntimeError(msg)

        # Setup the root folder to the fileservice that is used to access files
        dirpath, filename = os.path.split(self.parser_context.main_file)
        dirpath = os.path.abspath(dirpath)
        self.parser_context.file_service.setup_root_folder(dirpath)
        self.parser_context.file_service.set_file_id(filename, "output")

        # Setup the correct main parser based on the version id. If no match
        # for the version is found, default to the main parser for BigDFT 1.8.0.
        self.setup_main_parser(version_id)
    def get_metainfo_filename(self):
        return "big_dft.nomadmetainfo.json"

    def get_parser_info(self):
        return {'name': 'big-dft-parser', 'version': '1.0'}
    def setup_main_parser(self, version_id):
        # Currently the version id is a pure integer, so it can be directly
        # mapped into a package name.
        base = "bigdftparser.versions.bigdft{}.mainparser".format(version_id)
        parser_module = None
        parser_class = None
        try:
            parser_module = importlib.import_module(base)
        except ImportError:
            logger.warning("Could not find a parser for version '{}'. Defaulting to the base implementation for BigDFT 1.8.0.".format(version_id))
            base = "bigdftparser.versions.bigdft180.mainparser"
            try:
                parser_module = importlib.import_module(base)
            except ImportError:
                logger.exception("Could not find the module '{}'".format(base))
                raise
        try:
            class_name = "BigDFTMainParser"
            parser_class = getattr(parser_module, class_name)
        except AttributeError:
            logger.exception("A parser class '{}' could not be found in the module '{}'.".format(class_name, parser_module))
            raise
        self.main_parser = parser_class(self.parser_context.main_file, self.parser_context)
"""
This is the access point to the parser for the scala layer in the
nomad project.
"""
from __future__ import absolute_import
import sys
import setup_paths
from nomadcore.parser_backend import JsonParseEventsWriterBackend
from bigdftparser import BigDFTParser
if __name__ == "__main__":
    # Initialise the parser with the main filename and a JSON backend
    main_file = sys.argv[1]
    parser = BigDFTParser(main_file, backend=JsonParseEventsWriterBackend)
    parser.parse()
"""
Setups the python-common library in the PYTHONPATH system variable.
"""
import sys
import os
import os.path
baseDir = os.path.dirname(os.path.abspath(__file__))
commonDir = os.path.normpath(os.path.join(baseDir, "../../../../../python-common/common/python"))
parserDir = os.path.normpath(os.path.join(baseDir, "../../parser-nwchem"))
# Using sys.path.insert(1, ...) instead of sys.path.insert(0, ...) based on
# this discusssion:
# http://stackoverflow.com/questions/10095037/why-use-sys-path-appendpath-instead-of-sys-path-insert1-path
if commonDir not in sys.path:
sys.path.insert(1, commonDir)
sys.path.insert(1, parserDir)
from __future__ import absolute_import
from nomadcore.simple_parser import SimpleMatcher as SM
from nomadcore.caching_backend import CachingLevel
from nomadcore.baseclasses import MainHierarchicalParser, CacheService
import re
import logging
import numpy as np
LOGGER = logging.getLogger("nomad")
#===============================================================================
class BigDFTMainParser(MainHierarchicalParser):
    """The main parser class that is called for all run types. Parses the
    NWChem output file.
    """
    def __init__(self, file_path, parser_context):
        """
        """
        super(BigDFTMainParser, self).__init__(file_path, parser_context)

        # Cache for storing current method settings
        # self.method_cache = CacheService(self.parser_context)
        # self.method_cache.add("single_configuration_to_calculation_method_ref", single=False, update=False)

        #=======================================================================
        # Cache levels
        # self.caching_levels.update({
        #     'x_nwchem_section_geo_opt_module': CachingLevel.Cache,
        #     'x_nwchem_section_geo_opt_step': CachingLevel.Cache,
        #     'x_nwchem_section_xc_functional': CachingLevel.Cache,
        #     'x_nwchem_section_qmd_module': CachingLevel.ForwardAndCache,
        #     'x_nwchem_section_qmd_step': CachingLevel.ForwardAndCache,
        #     'x_nwchem_section_xc_part': CachingLevel.ForwardAndCache,
        # })

        #=======================================================================
        # Main Structure
        self.root_matcher = SM("",
            forwardMatch=True,
            sections=['section_run'],
            subMatchers=[
                self.input(),
                self.header(),
                self.system(),
                # This repeating submatcher supports multiple different tasks
                # within one run
                SM("(?:\s+NWChem DFT Module)|(?:\s+NWChem Geometry Optimization)|(?:\s+NWChem QMD Module)|(?:\s+\* NWPW PSPW Calculation \*)",
                    repeats=True,
                    forwardMatch=True,
                    subFlags=SM.SubFlags.Unordered,
                    subMatchers=[
                        self.energy_force_gaussian_task(),
                        self.energy_force_pw_task(),
                        self.geo_opt_module(),
                        self.dft_gaussian_md_task(),
                    ]
                ),
            ]
        )
    #=======================================================================
    # onClose triggers
    def onClose_section_run(self, backend, gIndex, section):
        backend.addValue("program_name", "NWChem")
        backend.addValue("program_basis_set_type", "gaussians+plane_waves")

    #=======================================================================
    # onOpen triggers
    def onOpen_section_method(self, backend, gIndex, section):
        self.method_cache["single_configuration_to_calculation_method_ref"] = gIndex

    #=======================================================================
    # adHoc
    def adHoc_forces(self, save_positions=False):
        def wrapper(parser):
            match = True
            forces = []
            positions = []
            while match:
                line = parser.fIn.readline()
                if line == "" or line.isspace():
                    match = False
                    break
                components = line.split()
                position = np.array([float(x) for x in components[-6:-3]])
                force = np.array([float(x) for x in components[-3:]])
                forces.append(force)
                positions.append(position)
            forces = -np.array(forces)
            positions = np.array(positions)

            # If anything was found, push the results to the correct section
            if forces.size != 0:
                self.scc_cache["atom_forces"] = forces
            if save_positions and positions.size != 0:
                self.system_cache["atom_positions"] = positions
        return wrapper

    #=======================================================================
    # SimpleMatcher specific onClose
    def save_geo_opt_sampling_id(self, backend, gIndex, section):
        backend.addValue("frame_sequence_to_sampling_ref", gIndex)

    #=======================================================================
    # Start match transforms
    def transform_dipole(self, backend, groups):
        dipole = groups[0]
        components = np.array([float(x) for x in dipole.split()])
        backend.addArrayValues("x_nwchem_qmd_step_dipole", components)

    #=======================================================================
    # Misc
    def debug_end(self):
        def wrapper():
            print("DEBUG")
        return wrapper
"""
This is a setup script for installing the parser locally on python path with
all the required dependencies. Used mainly for local testing.
"""
from setuptools import setup, find_packages
#===============================================================================
def main():
    # Start package setup
    setup(
        name="bigdftparser",
        version="0.1",
        description="NoMaD parser implementation for BigDFT.",
        author="Lauri Himanen",
        author_email="lauri.himanen@aalto.fi",
        license="GPL3",
        package_dir={'': 'parser/parser-big-dft'},
        packages=find_packages(),
        install_requires=[
            'pint',
            'numpy',
            'nomadcore',
        ],
    )

# Run main function by default
if __name__ == "__main__":
    main()
package eu.nomad_lab.parsers
import eu.{ nomad_lab => lab }
import eu.nomad_lab.DefaultPythonInterpreter
import org.{ json4s => jn }
import scala.collection.breakOut
object BigDFTParser extends SimpleExternalParserGenerator(
  name = "BigDFTParser",
  parserInfo = jn.JObject(
    ("name" -> jn.JString("BigDFTParser")) ::
      ("parserId" -> jn.JString("BigDFTParser" + lab.BigdftVersionInfo.version)) ::
      ("versionInfo" -> jn.JObject(
        ("nomadCoreVersion" -> jn.JObject(lab.NomadCoreVersionInfo.toMap.map {
          case (k, v) => k -> jn.JString(v.toString)
        }(breakOut): List[(String, jn.JString)])) ::
          (lab.BigdftVersionInfo.toMap.map {
            case (key, value) =>
              (key -> jn.JString(value.toString))
          }(breakOut): List[(String, jn.JString)])
      )) :: Nil
  ),
  mainFileTypes = Seq("text/.*"),
mainFileRe = """ Northwest Computational Chemistry Package \(NWChem\) \d+\.\d+
------------------------------------------------------
Environmental Molecular Sciences Laboratory
Pacific Northwest National Laboratory
Richland, WA 99352""".r,
  cmd = Seq(DefaultPythonInterpreter.pythonExe(), "${envDir}/parsers/big-dft/parser/parser-big-dft/bigdftparser/scalainterface.py",
    "${mainFilePath}"),
  cmdCwd = "${mainFilePath}/..",
  resList = Seq(
    "parser-big-dft/bigdftparser/__init__.py",
    "parser-big-dft/bigdftparser/setup_paths.py",
    "parser-big-dft/bigdftparser/parser.py",
    "parser-big-dft/bigdftparser/scalainterface.py",
    "parser-big-dft/bigdftparser/versions/__init__.py",
    "parser-big-dft/bigdftparser/versions/bigdft180/__init__.py",
    "parser-big-dft/bigdftparser/versions/bigdft180/mainparser.py",
    "nomad_meta_info/public.nomadmetainfo.json",
    "nomad_meta_info/common.nomadmetainfo.json",
    "nomad_meta_info/meta_types.nomadmetainfo.json",
    "nomad_meta_info/big_dft.nomadmetainfo.json"
  ) ++ DefaultPythonInterpreter.commonFiles(),
  dirMap = Map(
    "parser-big-dft" -> "parsers/big-dft/parser/parser-big-dft",
    "nomad_meta_info" -> "nomad-meta-info/meta_info/nomad_meta_info"
  ) ++ DefaultPythonInterpreter.commonDirMapping()
)
package eu.nomad_lab.parsers
import org.specs2.mutable.Specification
object BigDFTParserSpec extends Specification {
  "BigDFTParserTest" >> {
    "test with json-events" >> {
      ParserRun.parse(BigDFTParser, "parsers/big-dft/test/examples/single_point/output.out", "json-events") must_== ParseResult.ParseSuccess
    }
    "test single_point with json" >> {
      ParserRun.parse(BigDFTParser, "parsers/big-dft/test/examples/single_point/output.out", "json") must_== ParseResult.ParseSuccess
    }
  }
}