Commit a8950983 authored by Lauri Himanen

Added unit conversion routines.

parent 912790d4
@@ -37,12 +37,6 @@ Currently the python package is divided into three subpackages:
 - Generics: Generic utility classes and base classes
 - Implementation: The classes that actually define the parser functionality.
-# Reusable components and ideas for other parsers
-Some components and ideas could be reused in other parsers as well. If you find
-any of the following ideas useful in your parser, you are welcome to reuse
-them.
 ## Engines
 Basically all the "engines", that is the modules that parse certain types of
 files, are reusable as is in other parsers. They could be put into a common
@@ -64,7 +58,7 @@ flexible nature as you can specify comments, column delimiters, column
 indices and the patterns used to separate different configurations.
 - XMLEngine: For parsing XML files using XPath syntax.
-## NomadParser base class
+## Generics
 In the generics folder there is a module called nomadparser.py that defines a
 class called NomadParser. This acts as a base class for the cp2k parser defined
 in the implementation folder.
@@ -80,6 +74,14 @@ parsers:
 - Time measurement for performance analysis
 - Providing file contents, sizes and handles
+# Tools and Methods
+The following is a list of tools/methods that can help the development process.
+## Documentation
+The [Google style guide](https://google.github.io/styleguide/pyguide.html?showone=Comments#Comments) provides a good template on how to document your code.
+Documenting makes it much easier to follow the logic behind your parser.
 ## Logging
 Python has a great [logging package](https://www.google.com) which helps in
 following the program flow and catching different errors and warnings. In
@@ -89,9 +91,17 @@ easily readable formatting is also provided for the log messages.
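As a rough illustration of the logging setup described in this section, the sketch below attaches a single handler with a compact format to the root logger. The function name `configure_logging`, the format string and the logger name `cp2kparser.example` are assumptions made for this example; they are not taken from `cp2kparser.generics.logconfig`.

```python
import io
import logging


def configure_logging(stream=None, level=logging.DEBUG):
    """Attach one handler with a short, readable format to the root logger.

    Hypothetical helper: with stream=None the handler writes to stderr.
    """
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)-8s %(name)s: %(message)s"))
    root = logging.getLogger()
    root.setLevel(level)
    root.addHandler(handler)
    return handler


# Write to an in-memory buffer so the example stays self-contained.
buffer = io.StringIO()
configure_logging(buffer)

# Module-level loggers inherit the root configuration.
logger = logging.getLogger("cp2kparser.example")
logger.warning("Quantity '%s' not found", "energy_total")
```

Each module can then call `logging.getLogger(__name__)`, as the parser modules in this commit do, and share the same formatting.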
 ## Testing
 The parsers can become quite complicated and maintaining them without
-systematic testing is perhaps not a good idea. Unittests provide one way to
+systematic testing is perhaps not a good idea. Unit tests provide one way to
 test each parseable quantity and python has a very good [library for
-unittesting](https://docs.python.org/2/library/unittest.html).
+unit testing](https://docs.python.org/2/library/unittest.html).
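A minimal sketch of what one test per parseable quantity could look like with `unittest`. The helper `parse_total_energy` and the sample lines are hypothetical stand-ins for a real parser call; they are not part of the cp2k parser.

```python
import unittest


def parse_total_energy(line):
    """Hypothetical parser helper: return the last whitespace-separated token as a float."""
    return float(line.split()[-1])


class TestTotalEnergy(unittest.TestCase):
    """One test case per quantity keeps failures easy to localize."""

    def test_value_is_parsed(self):
        self.assertAlmostEqual(parse_total_energy("Total energy: -17.1643"), -17.1643)

    def test_malformed_value_raises(self):
        # float("n/a") raises ValueError, which the test expects.
        with self.assertRaises(ValueError):
            parse_total_energy("Total energy: n/a")
```

Saved as a module (for example `test_energy.py`, a hypothetical file name), the tests can be run with `python -m unittest test_energy`.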
+## Unit conversion
+The NoMaD parsers need a unified approach to unit conversion. The parsers
+should use the same set of physical constants, and a system that does the
+conversion semiautomatically. I would propose using
+[Pint](https://pint.readthedocs.org/en/0.6/) as it has a very natural syntax
+and an easily reconfigurable constant/unit declaration mechanism. The
+constants and units can be shared as simple text files across all parsers.
 ## Profiling
 The parsers have to be reasonably fast. For some codes there is already
@@ -105,3 +115,6 @@ parsing you can identify the bottlenecks in the parser. There are already
 existing profiling tools such as
 [cProfile](https://docs.python.org/2/library/profile.html#module-cProfile)
 which you can plug into your scripts very easily.
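A small sketch of plugging `cProfile` into a script. The function `parse_lines` is a hypothetical stand-in for a real parsing routine, and `profile_call` is a helper invented for this example.

```python
import cProfile
import io
import pstats


def parse_lines(lines):
    """Hypothetical stand-in for a parsing routine: convert all tokens to floats."""
    return [float(token) for line in lines for token in line.split()]


def profile_call(func, *args):
    """Run func under cProfile and return its result plus a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    stream = io.StringIO()
    # Sort by cumulative time and show only the ten most expensive entries.
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


lines = ["1.0 2.0 3.0"] * 1000
values, report = profile_call(parse_lines, lines)
```

Printing `report` lists the calls sorted by cumulative time, which is usually enough to spot the bottleneck.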
 #! /usr/bin/env python
 import cp2kparser.generics.logconfig
+from pint import UnitRegistry
+import os
+ureg = UnitRegistry()
+ureg.load_definitions(os.path.dirname(__file__)+"/unit_registry.txt")
+# Pint constants for NomadParsers
+# source: http://physics.nist.gov/cuu/Constants/Table/allascii.txt
+# Bohr radius
+bohr = 0.52917721067e-10 m
-#! /usr/bin/env python
-# -*- coding: utf-8 -*-
 import os
 import logging
 import cPickle as pickle
@@ -13,21 +11,33 @@ class CP2KInputEngine(object):
     When given a file handle to a CP2K input file, this class attempts to parse
     out its structure into an accessible object tree.
     """
-    def __init__(self, parser):
-        """
-        Args:
-            parser: Instance of a NomadParser or it's subclass. Allows
-            access to e.g. unified file reading methods.
-        """
-        self.parser = parser
+    def __init__(self):
         self.root_section = None
         self.input_tree = None
-    def parse(self):
-        """Parses the CP2K input file into an object tree.
+    def parse(self, inp):
+        """Parses a CP2K input file into an object tree.
+        Return an object tree representation of the input augmented with the
+        default values and lone keyword values from the cp2k_input.xml file
+        which is version specific. Keyword aliases are also mapped to the same data.
+        To query the returned tree use the following functions:
+            get_keyword("GLOBAL/PROJECT_NAME")
+            get_parameter("GLOBAL/PRINT")
+            get_default_keyword("FORCE_EVAL/SUBSYS/COORD")
+        Args:
+            inp: A string containing the contents of a CP2K input file. The
+                input file can be stored as string as it isn't that big.
+        Returns:
+            The input as an object tree.
         """
-        # The input file should be quite small, so just get the entire contents
-        inp = self.parser.get_file_contents("input")
+        # See if version is setup
+        if self.input_tree is None:
+            logger.error("Please setup the CP2K version before parsing")
+            return
         section_stack = []
@@ -60,383 +70,13 @@ class CP2KInputEngine(object):
                 keyword_value = split[1]
                 self.input_tree.set_keyword(path + "/" + keyword_name, keyword_value)
-    def get_input_tree(self):
-        if self.input_tree is not None:
-            return self.input_tree
-        else:
-            logger.error("Input tree not yet created.")
+        return self.input_tree
     def setup_version_number(self, version_number):
+        """ The pickle file which contains preparsed data from the
+        cp2k_input.xml is version specific. By calling this function before
+        parsing the correct file can be found.
+        """
         pickle_path = os.path.dirname(__file__) + "/cp2kinputenginedata/cp2k_{}/cp2k_input_tree.pickle".format(version_number)
         input_tree_pickle_file = open(pickle_path, 'rb')
         self.input_tree = pickle.load(input_tree_pickle_file)
#===============================================================================
# Run main function by default
# if __name__ == "__main__":
# input_file = open("../tests/cp2k_2.6.2/functionals/lda/lda.inp", 'r').read()
# engine = CP2KInputEngine()
# engine.setup_version_number(262)
# engine.parse(input_file)
#===============================================================================
# class InputSection(object):
# """Represents a section in a CP2K input file"""
# def __init__(self, name, params=None):
# self.name = name.upper()
# self.params = params
# self.keywords = defaultdict(list)
# self.subsections = defaultdict(list)
# def write(self):
# """Outputs input section as string"""
# output = []
# for name, k_list in self.keywords.iteritems():
# for value in k_list:
# output.append(value)
# for name, s_list in self.subsections.iteritems():
# for s in s_list:
# if s.params:
# output.append('&%s %s' % (s.name, s.params))
# else:
# output.append('&%s' % s.name)
# for l in s.write():
# output.append(' %s' % l)
# output.append('&END %s' % s.name)
# return output
# def get_subsection(self, path, index=0):
# """Finds a subsection specified by a string where subsections are
# separated by a slash. If multiple subsections are found with the same
# path, the one specified by the given index (default 0) is returned.
# Example: get_subsection("FORCE_EVAL/PRINT/FORCES")
# Args:
# path: String indicating the path to the subsection
# index: In case of repeating subsections, return the one specified
# by this index.
# Returns:
# The InputSection object if found.
# """
# parts = path.upper().split('/', 1)
# candidates = self.subsections.get(parts[0]) # [s for s in self.subsections if s.name == parts[0]]
# if not candidates:
# logger.debug("Subsection '{}' not found.".format(parts[0]))
# return None
# elif len(candidates) > 1:
# logger.warning("Multiple subsections with the same name found with name '{}' If no index is given, the first occurence in the input file is returned.".format(parts[0]))
# try:
# subsection = candidates[index]
# except IndexError:
# logger.error("Invalid subsection index given.")
# if len(parts) == 1:
# return subsection
# return subsection.get_subsection(parts[1])
# def get_keyword(self, keyword, section_path, engine, index=0):
# """Finds a keyword specified by a string. If multiple keywords are
# found with the same name, the one specified by the given index (default
# 0) is returned. If the keyword is not explicitly set, returns the
# default specified by the cp2k version specific XML file.
# Args:
# keyword: String indicating the name of the keyword. The name is the
# first word in the line.
# index: In case of repeating keywords, return the one specified
# by this index.
# Returns:
# The keyword value (everything else than the first word on the line).
# """
# candidates = self.keywords.get(keyword)
# if not candidates:
# logger.debug("No keywords with name '{}' found in subsection '{}'. Using the default XML value.".format(keyword, self.name))
# # Form a XPath from the given path
# xpath = "."
# sections = section_path.split("/")
# for section in sections:
# xpath += "/SECTION[NAME='{}']".format(section)
# xpath += "/KEYWORD[NAME='{}']/DEFAULT_VALUE".format(keyword)
# xml_file = engine.get_xml_file()
# xmlengine = engine.parser.xmlengine
# result = xmlengine.parse(xml_file, xpath)
# return result[0].text
# elif len(candidates) > 1:
# logger.warning("Multiple keywords found with name '{}'. If no index is given, the first occurence in the input file is returned.".format(keyword))
# try:
# result = candidates[index]
# except IndexError:
# logger.error("Invalid keyword index given.")
# return result
# def get_parameter(self, engine, path):
# """Return the SECTION_PARAMETER for this InputSection. If none is
# explicitly set, return the default specified by the cp2k version
# specific XML file.
# """
# if self.params is None:
# # Form a XPath from the given path
# xpath = "."
# sections = path.split("/")
# for section in sections:
# xpath += "/SECTION[NAME='{}']".format(section)
# xpath += "/SECTION_PARAMETERS/LONE_KEYWORD_VALUE"
# xml_file = engine.get_xml_file()
# xmlengine = engine.parser.xmlengine
# result = xmlengine.parse(xml_file, xpath)
# return result[0].text
# return self.params
#===============================================================================
# class CP2KInputEngine(object):
# """Used to parse out a CP2K input file.
# When given a file handle to a CP2K input file, this class attemts to parse
# out it's structure into an accessible object tree. Because the input file
# has such a clearly defined structure (unlike the output file of CP2K), it
# is better to use a dedicated parser instead of regular expressions.
# """
# def __init__(self, parser):
# """
# Args:
# parser: Instance of a NomadParser or it's subclass. Allows
# access to e.g. unified file reading methods.
# """
# self.parser = parser
# self.root_section = None
# self.xml_file = None
# def parse_input(self):
# """Parses the given CP2K input string. Default any aliases used for
# keywords to the default names.
# """
# # The input file should be quite small, so just get the entire contents
# inp = self.parser.get_file_contents("input")
# root_section = InputSection('CP2K_INPUT')
# section_stack = [root_section]
# for line in inp.split('\n'):
# line = line.split('!', 1)[0].strip()
# if len(line) == 0:
# continue
# if line.upper().startswith('&END'):
# s = section_stack.pop()
# elif line[0] == '&':
# parts = line.split(' ', 1)
# name = parts[0][1:]
# if len(parts) > 1:
# s = InputSection(name=name, params=parts[1].strip())
# else:
# s = InputSection(name=name)
# section_stack[-1].subsections[name.upper()].append(s)
# section_stack.append(s)
# else:
# split = line.split(' ', 1)
# keyword_name = split[0]
# normalized_keyword = self.normalize_keyword(keyword_name)
# keyword_value = split[1]
# section_stack[-1].keywords[normalized_keyword].append(keyword_value)
# self.root_section = root_section
# def get_subsection(self, path, index=0):
# return self.root_section.get_subsection(path, index)
# def get_keyword(self, path, index=0):
# split = path.rsplit('/', 1)
# section_path = split[0]
# normalized_keyword = self.normalize_keyword(path)
# section = self.root_section.get_subsection(section_path, index)
# if section is not None:
# return section.get_keyword(normalized_keyword, section_path, self)
# def get_parameter(self, path, index=0):
# section = self.root_section.get_subsection(path, index)
# if section is not None:
# return section.get_parameter(self, path)
# def setup_version_number(self, version_number):
# xml_file_path = os.path.dirname(__file__) + "/cp2kinputenginedata/xml/cp2k_{}/cp2k_input.xml".format(version_number)
# self.xml_file = open(xml_file_path, 'r')
# def get_xml_file(self):
# """Return the file handle that has been reset to the beginning.
# """
# self.xml_file.seek(os.SEEK_SET)
# return self.xml_file
# def create_section_xpath(self, path):
# """Strip the last part of the path and get the xpart for the remaining
# part.
# """
# # Form a XPath from the given path
# xpath = "."
# splitted_path = path.split("/")
# sections = splitted_path[:-1]
# keyword = splitted_path[-1]
# for section in sections:
# xpath += "/SECTION[NAME='{}']".format(section)
# return xpath, keyword
# def normalize_keyword(self, path):
# """Translate every section and keyword in the input file to the default
# name (=remove aliases).
# """
# xml_file = self.get_xml_file()
# # See if already normalized
# section_xpath, keyword = self.create_section_xpath(path)
# xml_engine = self.parser.xmlengine
# section = xml_engine.parse(xml_file, section_xpath)[0]
# # Find if default
# default_xpath = section_xpath + "/KEYWORD/[NAME='{}'][@type='default']".format(keyword)
# default_name = xml_engine.parse(section, default_xpath)
# if default_name:
# return keyword
# # If alias, find default
# # default_xpath = section_xpath + "/KEYWORD/[NAME='{}'][@type='alias']../KEYWORD/[@type='default']".format(keyword)
# # default_name = xml_engine.parse(section, default_xpath)
# return None #default_name[0].text
# #===============================================================================
# class InputSection(object):
# """Represents a section in a CP2K input file"""
# def __init__(self, name, params=None):
# self.name = name.upper()
# self.params = params
# self.keywords = defaultdict(list)
# self.subsections = defaultdict(list)
# def write(self):
# """Outputs input section as string"""
# output = []
# for name, k_list in self.keywords.iteritems():
# for value in k_list:
# output.append(value)
# for name, s_list in self.subsections.iteritems():
# for s in s_list:
# if s.params:
# output.append('&%s %s' % (s.name, s.params))
# else:
# output.append('&%s' % s.name)
# for l in s.write():
# output.append(' %s' % l)
# output.append('&END %s' % s.name)
# return output
# def get_subsection(self, path, index=0):
# """Finds a subsection specified by a string where subsections are
# separated by a slash. If multiple subsections are found with the same
# path, the one specified by the given index (default 0) is returned.
# Example: get_subsection("FORCE_EVAL/PRINT/FORCES")
# Args:
# path: String indicating the path to the subsection
# index: In case of repeating subsections, return the one specified
# by this index.
# Returns:
# The InputSection object if found.
# """
# parts = path.upper().split('/', 1)
# candidates = self.subsections.get(parts[0]) # [s for s in self.subsections if s.name == parts[0]]
# if not candidates:
# logger.debug("Subsection '{}' not found.".format(parts[0]))
# return None
# elif len(candidates) > 1:
# logger.warning("Multiple subsections with the same name found with name '{}' If no index is given, the first occurence in the input file is returned.".format(parts[0]))
# try:
# subsection = candidates[index]
# except IndexError:
# logger.error("Invalid subsection index given.")
# if len(parts) == 1:
# return subsection
# return subsection.get_subsection(parts[1])
# def get_keyword(self, keyword, section_path, engine, index=0):
# """Finds a keyword specified by a string. If multiple keywords are
# found with the same name, the one specified by the given index (default
# 0) is returned. If the keyword is not explicitly set, returns the
# default specified by the cp2k version specific XML file.
# Args:
# keyword: String indicating the name of the keyword. The name is the
# first word in the line.
# index: In case of repeating keywords, return the one specified
# by this index.
# Returns:
# The keyword value (everything else than the first word on the line).
# """
# candidates = self.keywords.get(keyword)
# if not candidates:
# logger.debug("No keywords with name '{}' found in subsection '{}'. Using the default XML value.".format(keyword, self.name))
# # Form a XPath from the given path
# xpath = "."
# sections = section_path.split("/")
# for section in sections:
# xpath += "/SECTION[NAME='{}']".format(section)
# xpath += "/KEYWORD[NAME='{}']/DEFAULT_VALUE".format(keyword)
# xml_file = engine.get_xml_file()
# xmlengine = engine.parser.xmlengine
# result = xmlengine.parse(xml_file, xpath)
# return result[0].text
# elif len(candidates) > 1:
# logger.warning("Multiple keywords found with name '{}'. If no index is given, the first occurence in the input file is returned.".format(keyword))
# try:
# result = candidates[index]
# except IndexError:
# logger.error("Invalid keyword index given.")
# return result
# def get_parameter(self, engine, path):
# """Return the SECTION_PARAMETER for this InputSection. If none is
# explicitly set, return the default specified by the cp2k version
# specific XML file.
# """
# if self.params is None:
# # Form a XPath from the given path
# xpath = "."
# sections = path.split("/")
# for section in sections:
# xpath += "/SECTION[NAME='{}']".format(section)
# xpath += "/SECTION_PARAMETERS/LONE_KEYWORD_VALUE"
# xml_file = engine.get_xml_file()
# xmlengine = engine.parser.xmlengine
# result = xmlengine.parse(xml_file, xpath)
# return result[0].text
# return self.params
-#! /usr/bin/env python
-# -*- coding: utf-8 -*-
 import os
 import logging
 logger = logging.getLogger(__name__)
......
@@ -6,6 +6,7 @@ import time
 from abc import ABCMeta, abstractmethod
 import logging
 logger = logging.getLogger(__name__)
+from cp2kparser import ureg
 #===============================================================================
@@ -106,12 +107,12 @@ class NomadParser(object):
     def get_quantity(self, name):
         """Given a unique quantity id which is present in the metainfo
-        declaration, parses the corresponding quantity (if available) and
-        return the value as json.
+        declaration, parses the corresponding quantity (if available), converts
+        it to SI units and returns the value as json.
         """
         # Start timing
         logger.debug(74*'-')
-        logger.debug("Getting quantity '{}'".format(name))
+        logger.info("Getting quantity '{}'".format(name))
         start = time.clock()
         #Check availability
@@ -124,7 +125,7 @@ class NomadParser(object):
         result = self.results.get(name)
         if not result:
             # Ask the engine for the quantity
-            result = self.get_unformatted_quantity(name)
+            result = self.get_quantity_json(name)
             self.results[name] = result
         else:
             logger.debug("Using cached result.")
@@ -135,14 +136,28 @@ class NomadParser(object):
         if result is None:
             logger.info("There was an issue in parsing quantity '{}'. It is either not present in the files or could not be successfully parsed.".format(name))
         else:
-            logger.info("Successfully parsed quantity '{}'. Result:\n{}".format(name, result))
+            logger.info("Successfully parsed quantity '{}'.".format(name))
+        # Do the conversion to SI units based on the given units
         stop = time.clock()
-        logger.debug("Elapsed time: {} ms".format((stop-start)*1000))
+        logger.info("Elapsed time: {} ms".format((stop-start)*1000))
+        # logger.info("Result: {}".format(result))
         return result
+    def get_quantity_json(self, name):
+        result_si = self.get_quantity_SI(name)
+        return result_si
+    def get_quantity_SI(self, name):
+        result = self.get_quantity_unformatted(name)
+        # Do the conversion to SI units based on the given units