Commit 33124133 authored by Lauri Himanen's avatar Lauri Himanen
Browse files

Added parsing for external force file, added detection of CP2K run mode and...

Added parsing for external force file, added detection of CP2K run mode and optimization based on it, started working on GEO_OPT parsing.
parent 68bf2b9c
......@@ -72,18 +72,10 @@ This section describes some of the guidelines that are used in the development
of this parser.
## Documentation
The [google style
This parser follows the [google style
guide](https://google.github.io/styleguide/pyguide.html?showone=Comments#Comments)
provides a good template on how to document your code. Documenting makes it
much easier to follow the logic behind your parser.
## Logging
Python has a great [logging
package](https://docs.python.org/2/library/logging.html) which helps in
following the program flow and catching different errors and warnings. In
cp2kparser the file cp2kparser/generics/logconfig.py defines the behaviour of
the logger. There you can setup the log levels even at a modular level. A more
easily readable formatting is also provided for the log messages.
for documenting python code. Documenting makes it much easier to follow the
logic behind your parser.
## Testing
The parsers can become quite complicated and maintaining them without
......@@ -93,19 +85,13 @@ but can only test that the data is outputted in the correct format and
according to some general rules. These tests cannot verify that the contents
are correct.
In order to truly test the parser output, unit testing is needed. Unit tests
provide one way to test each parseable quantity and python has a very good
[library for unit testing](https://docs.python.org/2/library/unittest.html).
When the parser supports a new quantity it is quite fast to create unit tests
for it. These tests will validate the parsing, and also easily detect bugs that
may rise when the code is modified in the future.
## Unit Conversion
You can find unit conversion tools from the python-common repository and its
nomadcore package. The unit conversion is currenlty done by
[Pint](https://pint.readthedocs.org/en/0.6/) and it has a very natural syntax,
support for numpy arrays and an easily reconfigurable constant/unit declaration
mechanisms.
In order to truly test the parser output, unit testing is needed. This unit
tests for this parser are located in test/unittests. Unit tests provide one way
to test each parseable quantity and python has a very good [library for unit
testing](https://docs.python.org/2/library/unittest.html). When the parser
supports a new quantity it is quite fast to create unit tests for it. These
tests will validate the parsing, and also easily detect bugs that may rise when
the code is modified in the future.
## Profiling
The parsers have to be reasonably fast. For some codes there is already
......
......@@ -83,8 +83,8 @@ class CP2KInput(object):
if keyword.value is not None:
return keyword.get_value()
else:
if section.accessed:
return keyword.default_value
# if section.accessed:
return keyword.default_value
def get_default_keyword(self, path):
return self.get_section(path).default_keyword
......
......@@ -38,6 +38,7 @@ class CP2KParser(ParserInterface):
# Setup the root folder to the fileservice that is used to access files
dirpath, filename = os.path.split(self.parser_context.main_file)
dirpath = os.path.abspath(dirpath)
self.parser_context.file_service.setup_root_folder(dirpath)
self.parser_context.file_service.set_file_id(filename, "output")
......
import os
import re
import logging
import cPickle as pickle
import numpy as np
......@@ -11,8 +12,19 @@ logger = logging.getLogger("nomad")
class CP2KInputParser(BasicParser):
"""Used to parse out a CP2K input file.
When given a filepath to a CP2K input file, this class attemts to parse
it's contents.
CP2K offers a complete structure for the input in an XML file, which can be
printed with the command cp2k --xml. This XML file has been preparsed into
a native python object ('CP2KInput' class found in generic.inputparsing)
and stored in a python pickle file. It e.g. contains all the default values
that are often needed as they are used if the user hasn't specified a
settings in the input. This XML file is used to get the default values
because it is rather cumbersome to hard code them in the parser itself,
especially if there will be lot's of them. Hard coded values will also be
more error prone, and would have to be checked for each parser version.
CP2K input supports including other input files and also
supports variables. This is currently not supported, but may be added at
some point.
"""
def __init__(self, file_path, parser_context):
super(CP2KInputParser, self).__init__(file_path, parser_context)
......@@ -98,29 +110,36 @@ class CP2KInputParser(BasicParser):
#=======================================================================
# Cell periodicity
periodicity = self.input_tree.get_keyword("FORCE_EVAL/SUBSYS/CELL/PERIODIC").upper()
periodicity_list = None
if periodicity == "NONE":
periodicity_list = (False, False, False)
elif periodicity == "X":
periodicity_list = (True, False, False)
elif periodicity == "XY":
periodicity_list = (True, True, False)
elif periodicity == "XYZ":
periodicity_list = (True, True, True)
elif periodicity == "XZ":
periodicity_list = (True, False, True)
elif periodicity == "Y":
periodicity_list = (False, True, False)
elif periodicity == "YZ":
periodicity_list = (False, True, True)
elif periodicity == "Z":
periodicity_list = (False, False, True)
if periodicity_list is not None:
periodicity = self.input_tree.get_keyword("FORCE_EVAL/SUBSYS/CELL/PERIODIC")
if periodicity is not None:
periodicity = periodicity.upper()
periodicity_list = ("X" in periodicity, "Y" in periodicity, "Z" in periodicity)
self.backend.addArrayValues("configuration_periodic_dimensions", np.asarray(periodicity_list))
else:
logger.warning("Could not determine cell periodicity from FORCE_EVAL/SUBSYS/CELL/PERIODIC")
#=======================================================================
# Single point force file name
force_file = self.input_tree.get_keyword("FORCE_EVAL/PRINT/FORCES/FILENAME")
if force_file != "__STD_OUT__":
force_file_path = self.normalize_cp2k_path(force_file, "xyz")
self.file_service.set_file_id(force_file_path, "force_file_single_point")
def normalize_cp2k_path(self, path, extension, name=""):
"""The paths in CP2K input can be given in many ways. This function
tries to normalize these forms into a valid path.
"""
if name:
name = "-" + name
project_name = self.input_tree.get_keyword("GLOBAL/PROJECT_NAME")
if path.startswith("="):
normalized_path = path[1:]
elif re.match(r"./", path):
normalized_path = "{}{}-1_0.{}".format(path, name, extension)
else:
normalized_path = "{}-{}{}-1_0.{}".format(project_name, path, name, extension)
return normalized_path
def fill_input_tree(self, file_path):
"""Parses a CP2K input file into an object tree.
......
......@@ -3,6 +3,7 @@ from nomadcore.simple_parser import SimpleMatcher as SM
from nomadcore.caching_backend import CachingLevel
from nomadcore.baseclasses import MainHierarchicalParser
from inputparser import CP2KInputParser
from singlepointforceparser import CP2KSinglePointForceParser
import numpy as np
import logging
logger = logging.getLogger("nomad")
......@@ -10,7 +11,8 @@ logger = logging.getLogger("nomad")
#===============================================================================
class CP2KMainParser(MainHierarchicalParser):
"""The main parser class.
"""The main parser class. Used to parse the CP2K output file and to
instantiate all the other subparsers for each other file.
"""
def __init__(self, file_path, parser_context):
"""Initialize an output parser.
......@@ -19,10 +21,29 @@ class CP2KMainParser(MainHierarchicalParser):
self.regex_f = "-?\d+\.\d+(?:E(?:\+|-)\d+)?" # Regex for a floating point value
self.regex_i = "-?\d+" # Regex for an integer
# Define the output parsing tree for this version
self.root_matcher = SM("",
# Find out the run type for this calculation. The run type can be used
# to optimize the parsing, because not all parsing functionality will
# be necessary for all run types.
run_type = None
with open(file_path) as f:
counter = 0
for line in f:
counter += 1
# Define the regex that searches for the
regex_string = r"\s+GLOBAL\| Run type\s+(.+)"
regex_compiled = re.compile(regex_string)
match = regex_compiled.match(line)
if match is not None:
run_type = match.groups()[0]
break
if counter > 50:
break
if run_type is None:
logger.error("Could not determine the CP2K run type from the output file.")
# SimpleMatcher for the header that is common to all run types
self.header = SM(r" DBCSR\| Multiplication driver",
forwardMatch=True,
sections=['section_run', "section_system_description", "section_method"],
subMatchers=[
SM( r" DBCSR\| Multiplication driver",
sections=['cp2k_section_dbcsr'],
......@@ -67,7 +88,15 @@ class CP2KMainParser(MainHierarchicalParser):
SM( "\s+- Atoms:\s+(?P<number_of_atoms>\d+)"),
SM( "\s+- Shell sets:\s+(?P<cp2k_shell_sets>\d+)")
]
),
)
]
)
# Simple matcher for run type ENERGY_FORCE, ENERGY
self.energy_force = SM(
" MODULE QUICKSTEP: ATOMIC COORDINATES IN angstrom",
forwardMatch=True,
subMatchers=[
SM( " MODULE QUICKSTEP: ATOMIC COORDINATES IN angstrom",
adHoc=self.adHoc_cp2k_section_quickstep_atom_information(),
otherMetaInfo=["atom_label", "atom_position"]
......@@ -90,38 +119,41 @@ class CP2KMainParser(MainHierarchicalParser):
adHoc=self.adHoc_atom_forces()
),
]
),
SM( " MD| Molecular Dynamics Protocol",
sections=["cp2k_section_md"],
forwardMatch=True,
)
]
)
# Simple matcher for run type MD, MOLECULAR_DYNAMICS
self.md = SM(
" MD| Molecular Dynamics Protocol",
sections=["cp2k_section_md"],
forwardMatch=True,
subMatchers=[
SM( " ENERGY\| Total FORCE_EVAL",
repeats=True,
sections=["cp2k_section_md_step"],
subMatchers=[
SM( " ENERGY\| Total FORCE_EVAL",
repeats=True,
sections=["cp2k_section_md_step"],
SM( " ATOMIC FORCES in \[a\.u\.\]",
sections=["cp2k_section_md_forces"],
subMatchers=[
SM( "\s+\d+\s+\d+\s+[\w\W\d]+\s+(?P<cp2k_md_force_atom_string>{0}\s+{0}\s+{0})".format(self.regex_f),
sections=["cp2k_section_md_force_atom"],
repeats=True,
)
]
),
SM( " STEP NUMBER\s+=\s+(?P<cp2k_md_step_number>\d+)"),
SM( " TIME \[fs\]\s+=\s+(?P<cp2k_md_step_time>\d+\.\d+)"),
SM( " TEMPERATURE \[K\]\s+=\s+(?P<cp2k_md_temperature_instantaneous>{0})\s+(?P<cp2k_md_temperature_average>{0})".format(self.regex_f)),
SM( " i =",
sections=["cp2k_section_md_coordinates"],
otherMetaInfo=["cp2k_md_coordinates"],
dependencies={"cp2k_md_coordinates": ["cp2k_md_coordinate_atom_string"]},
subMatchers=[
SM( " ATOMIC FORCES in \[a\.u\.\]",
sections=["cp2k_section_md_forces"],
subMatchers=[
SM( "\s+\d+\s+\d+\s+[\w\W\d]+\s+(?P<cp2k_md_force_atom_string>{0}\s+{0}\s+{0})".format(self.regex_f),
sections=["cp2k_section_md_force_atom"],
repeats=True,
)
]
),
SM( " STEP NUMBER\s+=\s+(?P<cp2k_md_step_number>\d+)"),
SM( " TIME \[fs\]\s+=\s+(?P<cp2k_md_step_time>\d+\.\d+)"),
SM( " TEMPERATURE \[K\]\s+=\s+(?P<cp2k_md_temperature_instantaneous>{0})\s+(?P<cp2k_md_temperature_average>{0})".format(self.regex_f)),
SM( " i =",
sections=["cp2k_section_md_coordinates"],
otherMetaInfo=["cp2k_md_coordinates"],
dependencies={"cp2k_md_coordinates": ["cp2k_md_coordinate_atom_string"]},
subMatchers=[
SM( " \w+\s+(?P<cp2k_md_coordinate_atom_string>{0}\s+{0}\s+{0})".format(self.regex_f),
endReStr="\n",
sections=["cp2k_section_md_coordinate_atom"],
repeats=True,
)
]
SM( " \w+\s+(?P<cp2k_md_coordinate_atom_string>{0}\s+{0}\s+{0})".format(self.regex_f),
endReStr="\n",
sections=["cp2k_section_md_coordinate_atom"],
repeats=True,
)
]
)
......@@ -129,6 +161,37 @@ class CP2KMainParser(MainHierarchicalParser):
)
]
)
# Compose root matcher according to the run type. This way the
# unnecessary regex parsers will not be compiled and searched. Saves
# computational time.
if run_type in ("ENERGY_FORCE", "FORCE", "WAVEFUNCTION_OPTIMIZATION", "WFN_OPT"):
self.root_matcher = SM("",
forwardMatch=True,
sections=['section_run', "section_system_description", "section_method"],
subMatchers=[
self.header,
self.energy_force
]
)
elif run_type in ("GEO_OPT", "GEOMETRY_OPTIMIZATION"):
self.root_matcher = SM("",
forwardMatch=True,
sections=['section_run', "section_system_description", "section_method"],
subMatchers=[
self.header,
self.geo_opt
]
)
elif run_type in ("MD", "MOLECULAR_DYNAMICS"):
self.root_matcher = SM("",
forwardMatch=True,
sections=['section_run', "section_system_description", "section_method"],
subMatchers=[
self.header,
self.md
]
)
#=======================================================================
# The cache settings
self.caching_level_for_metaname = {
......@@ -168,12 +231,26 @@ class CP2KMainParser(MainHierarchicalParser):
else:
logger.warning("Unknown self-interaction correction method used.")
def onClose_section_single_configuration_calculation(self, backend, gIndex, section):
"""
"""
# If the force file for a single point calculation is available, and
# the forces were not parsed fro the output file, parse the separate
# file
if section["atom_forces"] is None:
force_file = self.file_service.get_file_by_id("force_file_single_point")
if force_file is not None:
force_parser = CP2KSinglePointForceParser(force_file, self.parser_context)
force_parser.parse()
else:
logger.warning("The file containing the forces printed by ENERGY_FORCE calculation could not be found.")
def onClose_cp2k_section_filenames(self, backend, gIndex, section):
"""
"""
# If the input file is available, parse it
input_file = section["cp2k_input_filename"][0]
filepath = self.parser_context.file_service.get_absolute_path_to_file(input_file)
filepath = self.file_service.get_absolute_path_to_file(input_file)
if filepath is not None:
input_parser = CP2KInputParser(filepath, self.parser_context)
input_parser.parse()
......
import logging
import numpy as np
from nomadcore.baseclasses import BasicParser
logger = logging.getLogger("nomad")
#===============================================================================
class CP2KSinglePointForceParser(BasicParser):
"""Used to parse out a force file printed out by a CP2K single point
calculation. It is not exactly an ZYX file, so here we define separate
parser.
"""
def __init__(self, file_path, parser_context):
super(CP2KSinglePointForceParser, self).__init__(file_path, parser_context)
def parse(self):
start = False
forces = []
with open(self.file_path) as f:
for line in f:
if line.startswith(" # Atom"):
start = True
continue
elif line.startswith(" SUM"):
forces = np.array(forces)
self.backend.addArrayValues("atom_forces", forces, unit="forceAu")
break
elif start:
components = [float(comp) for comp in line.split()[-3:]]
forces.append(components)
ATOMIC FORCES in [a.u.]
# Atom Kind Element X Y Z
1 1 Si 0.00000000 0.00000000 0.00000000
2 1 Si 0.00000000 0.00000001 0.00000001
3 1 Si 0.00000001 0.00000001 0.00000000
4 1 Si 0.00000001 0.00000000 0.00000001
5 1 Si -0.00000001 -0.00000001 -0.00000001
6 1 Si -0.00000001 -0.00000001 -0.00000001
7 1 Si -0.00000001 -0.00000001 -0.00000001
8 1 Si -0.00000001 -0.00000001 -0.00000001
SUM OF ATOMIC FORCES -0.00000000 -0.00000000 -0.00000000 0.00000000
&GLOBAL
PROJECT Si_bulk8
RUN_TYPE ENERGY_FORCE
PRINT_LEVEL MEDIUM
&END GLOBAL
&FORCE_EVAL
METHOD Quickstep
&SUBSYS
&KIND Si
ELEMENT Si
BASIS_SET DZVP-GTH-PADE
POTENTIAL GTH-PADE-q4
&END KIND
&CELL
A 5.430697500 0.000000000 0.000000000
B 0.000000000 5.430697500 0.000000000
C 0.000000000 0.000000000 5.430697500
&END CELL
&COORD
Si 0.000000000 0.000000000 0.000000000
Si 0.000000000 2.715348700 2.715348700
Si 2.715348700 2.715348700 0.000000000
Si 2.715348700 0.000000000 2.715348700
Si 4.073023100 1.357674400 4.073023100
Si 1.357674400 1.357674400 1.357674400
Si 1.357674400 4.073023100 4.073023100
Si 4.073023100 4.073023100 1.357674400
&END COORD
&END SUBSYS
&DFT
BASIS_SET_FILE_NAME ../../BASIS_SET
POTENTIAL_FILE_NAME ../../GTH_POTENTIALS
&QS
EPS_DEFAULT 1.0E-10
&END QS
&MGRID
NGRIDS 4
CUTOFF 300
REL_CUTOFF 60
&END MGRID
&XC
&XC_FUNCTIONAL PADE
&END XC_FUNCTIONAL
&END XC
&SCF
SCF_GUESS ATOMIC
EPS_SCF 1.0E-7
MAX_SCF 300
&DIAGONALIZATION ON
ALGORITHM STANDARD
&END DIAGONALIZATION
&MIXING T
METHOD BROYDEN_MIXING
ALPHA 0.4
NBROYDEN 8
&END MIXING
&END SCF
&END DFT
&PRINT
&FORCES ON
FILENAME forces
&END FORCES
&END PRINT
&END FORCE_EVAL
This diff is collapsed.
3
i = 1, E = -17.1643447508
O 12.2353220000 1.3766420000 10.8698800000
H 12.4189404412 2.2491702452 11.2652097702
H 11.9149490419 1.5748472981 9.9693615826
3
i = 2, E = -17.1643578234
O 12.2353220000 1.3766420000 10.8698800000
H 12.4200101767 2.2468980462 11.2643381000
H 11.9115895109 1.5703371433 9.9671556911
3
i = 3, E = -17.1643766833
O 12.2353220000 1.3766420000 10.8698800000
H 12.4233662348 2.2477209582 11.2631307672
H 11.9069167231 1.5604096790 9.9689943947
3
i = 4, E = -17.1644565615
O 12.2353220000 1.3766420000 10.8698800000
H 12.4423251455 2.2644837735 11.2309777118
H 11.8922856584 1.5514786478 9.9754905025
3
i = 5, E = -17.1645203350
O 12.2353220000 1.3766420000 10.8698800000
H 12.4431348352 2.2577511412 11.2267313861
H 11.8919233389 1.5518052357 9.9721549140
3
i = 6, E = -17.1646190312
O 12.2353220000 1.3766420000 10.8698800000
H 12.4736086753 2.2567157451 11.2132013565
H 11.9047281262 1.5469224568 9.9655629396
3
i = 7, E = -17.1646198446
O 12.2353220000 1.3766420000 10.8698800000
H 12.4742379375 2.2563137117 11.2125606428
H 11.9051180271 1.5469372555 9.9662564829
3
i = 8, E = -17.1646203294
O 12.2353220000 1.3766420000 10.8698800000
H 12.4749724176 2.2565225180 11.2119388324
H 11.9039582865 1.5481084051 9.9667841313
3
i = 9, E = -17.1646204547
O 12.2353220000 1.3766420000 10.8698800000
H 12.4752917518 2.2561254118 11.2121163217
H 11.9031806595 1.5481019598 9.9671120050
3
i = 10, E = -17.1646204237
O 12.2353220000 1.3766420000 10.8698800000
H 12.4755864059 2.2562726297 11.2125118440
H 11.9026358443 1.5480115451 9.9673498890
3
i = 11, E = -17.1646204766
O 12.2353220000 1.3766420000 10.8698800000
H 12.4754906682 2.2560912723 11.2123985687
H 11.9026556670 1.5480524898 9.9673305122
3
i = 12, E = -17.1646347711
O 12.2353220000 1.3766420000 10.8698800000
H 12.4754916182 2.2560930719 11.2123996927
H 11.9026554703 1.5480520836 9.9673307045
&GLOBAL
PROJECT H2O
RUN_TYPE GEO_OPT
PRINT_LEVEL LOW
&END GLOBAL
&FORCE_EVAL
METHOD QS
&SUBSYS
&CELL
ABC 12.4138 12.4138 12.4138
&END CELL
&COORD
O 12.235322 1.376642 10.869880
H 12.415139 2.233125 11.257611
H 11.922476 1.573799 9.986994
&END COORD
&KIND H
BASIS_SET DZVP-GTH-PADE
POTENTIAL GTH-PADE-q1
&END KIND
&KIND O
BASIS_SET DZVP-GTH-PADE
POTENTIAL GTH-PADE-q6
&END KIND
&END SUBSYS
&DFT
BASIS_SET_FILE_NAME ./BASIS_SET
POTENTIAL_FILE_NAME ./POTENTIAL
&QS
EPS_DEFAULT 1.0E-7
&END QS
&MGRID
CUTOFF 200
NGRIDS 4
REL_CUTOFF 30
&END MGRID
&SCF
SCF_GUESS ATOMIC
EPS_SCF 1.0E-05
MAX_SCF 200
&DIAGONALIZATION T
ALGORITHM STANDARD
&END DIAGONALIZATION
&MIXING T
ALPHA 0.5
METHOD PULAY_MIXING
NPULAY 5
&END MIXING
&PRINT
&RESTART OFF
&END RESTART
&END PRINT
&END SCF
&XC
&XC_FUNCTIONAL PADE
&END XC_FUNCTIONAL
&END XC
&END DFT
&END FORCE_EVAL
&MOTION
&GEO_OPT
TYPE MINIMIZATION
MAX_DR 1.0E-03
MAX_FORCE 1.0E-03
RMS_DR 1.0E-03
RMS_FORCE 1.0E-03
MAX_ITER 200
OPTIMIZER CG
&CG
MAX_STEEP_STEPS 0
RESTART_LIMIT 9.0E-01
&END CG
&END GEO_OPT
&CONSTRAINT
&FIXED_ATOMS
COMPONENTS_TO_FIX XYZ
LIST 1
&END FIXED_ATOMS
&END CONSTRAINT