Commit 17208913 authored by Lauri Himanen

Working on reading different atomic coordinate files

parent 7bccc7dc
@@ -39,7 +39,7 @@ Currently the python package is divided into three subpackages:
## Engines
Basically all the "engines", that is the modules that parse certain types of
files, are reusable as is in other parsers. They could be put into a common
files, are reusable in other parsers. They could be put into a common
repository where other developers can improve and extend them. One should also
write tests for the engines that would validate their behaviour and ease the
performance analysis.
@@ -93,17 +93,21 @@ easily readable formatting is also provided for the log messages.
## Testing
The parsers can become quite complicated and maintaining them without
systematic testing is perhaps not a good idea. Unit tests provide one way to
systematic testing can become troublesome. Unit tests provide one way to
test each parseable quantity and Python has a very good [library for
unit testing](https://docs.python.org/2/library/unittest.html).
unit testing](https://docs.python.org/2/library/unittest.html). When the parser
supports a new quantity it is quite fast to create unit tests for it. These
tests will validate the parsing, and also easily detect bugs that may arise when
the code is modified in the future.
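
As a minimal sketch of what such a test could look like, the example below checks one parsed quantity with `unittest`. The function `parse_total_energy` and its regular expression are hypothetical stand-ins written for this illustration and are not part of the parser; the sample line is the total energy line from the CP2K test output in this commit.

```python
import re
import unittest

# Hypothetical stand-in for a parser routine, used only to illustrate the test.
ENERGY_RE = re.compile(
    r"ENERGY\| Total FORCE_EVAL \( QS \) energy \(a\.u\.\):\s+(-?\d+\.\d+)"
)


def parse_total_energy(output_text):
    """Return the total energy in atomic units, or None if the line is absent."""
    match = ENERGY_RE.search(output_text)
    return float(match.group(1)) if match else None


class TestTotalEnergy(unittest.TestCase):
    # One line copied from the CP2K output file used in the tests.
    SAMPLE = (
        " ENERGY| Total FORCE_EVAL ( QS ) energy (a.u.):"
        "             -13.936479267846462\n"
    )

    def test_energy_is_parsed(self):
        self.assertAlmostEqual(
            parse_total_energy(self.SAMPLE), -13.936479267846462
        )

    def test_missing_energy_returns_none(self):
        self.assertIsNone(parse_total_energy("no energy line here"))
```

Saved in a test module, the cases can be run with `python -m unittest`. Each new quantity would get its own small test case in the same style.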
## Unit conversion
The NoMaD parsers need a unified approach to unit conversion. The parsers
should use the same set of physical constants, and a system that does the
conversion semiautomatically. I would propose using
[Pint](https://pint.readthedocs.org/en/0.6/) as it has a very natural syntax
and an easily reconfigurable constant/unit declaration mechanisms. The
constants and units can be shared as simple text files across all parsers.
[Pint](https://pint.readthedocs.org/en/0.6/) as it has a very natural syntax,
support for NumPy arrays and an easily reconfigurable constant/unit declaration
mechanism. The constants and units can be shared as simple text files across
all parsers.
## Profiling
The parsers have to be reasonably fast. For some codes there is already
......
import ase.io
import logging
import MDAnalysis
logger = logging.getLogger(__name__)
#===============================================================================
class AtomsEngine(object):
"""Used to parse various different atomic coordinate files.
Initially use ASE for all file types, if needed add new types or make
own implementations.
Supports the following file formats:
- xyz (.xyz):
- cif (.cif): Crystallographic Information File
- pdb (.pdb): Protein Data Bank
Reading is primarily done by ASE or MDAnalysis, but in some cases own
implementation has to be made.
"""
def parse_atoms(self, contents, index=None, format=None):
atoms = ase.io.read(contents, index=index, format=format)
return atoms
def determine_tool(self, format):
ASE = "ASE"
# MDAnalysis = "MDAnalysis"
# custom = "custom"
formats = {
"xyz": ASE,
"cif": ASE,
"pdb": ASE,
}
result = formats.get(format)
if result:
return result
else:
logger.warning("The format '{}' is not supported by AtomsEngine.".format(format))
def parse_number(self, contents, format=None):
atoms = ase.io.read(contents, index=0, format=format)
n_atoms = atoms.get_number_of_atoms()
return n_atoms
def parse_n_atoms(self, contents, format):
"""Read the first configuration of the coordinate file to extract the
number of atoms in it.
"""
# Figure out which tool to use
tool = self.determine_tool(format)
n_atoms = None
def parse_coordinates(self, contents, index, format=None):
atoms = ase.io.read(contents, index=index, format=format)
coordinates = atoms.get_positions()
return coordinates
if tool == "ASE":
atoms = ase.io.read(contents, index=0, format=format)
n_atoms = atoms.get_number_of_atoms()
return n_atoms
if tool == "MDAnalysis":
u = MDAnalysis.Universe(contents.name)
n_atoms = len(u.atoms)
return n_atoms
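
The docstring above mentions that some formats may need an own implementation instead of ASE or MDAnalysis. A minimal sketch of that idea, using only the standard library: a registry maps a format name to a reader function, and the XYZ reader relies on the atom count stored on the first line of the file. The names `CUSTOM_READERS` and `n_atoms_from_xyz` are assumptions made for this illustration and do not exist in the commit.

```python
import io
import logging

logger = logging.getLogger(__name__)


def n_atoms_from_xyz(handle):
    """Return the atom count declared on the first line of an XYZ file."""
    first_line = handle.readline()
    return int(first_line.split()[0])


# Hypothetical registry: format name -> function that reads the atom count.
CUSTOM_READERS = {
    "xyz": n_atoms_from_xyz,
}


def parse_n_atoms(handle, format):
    """Dispatch to the reader registered for the format, or warn and return None."""
    reader = CUSTOM_READERS.get(format)
    if reader is None:
        logger.warning("The format '%s' is not supported.", format)
        return None
    return reader(handle)


# Usage with an in-memory XYZ file holding the NaCl test system.
xyz_text = (
    "2\n"
    "NaCl\n"
    "Na 0.000000 0.000000 -0.065587\n"
    "Cl 0.000000 -0.757136 0.520545\n"
)
print(parse_n_atoms(io.StringIO(xyz_text), "xyz"))
```

Mapping formats to callables rather than to tool names would remove the chain of `if tool == ...` checks, at the cost of one small function per format.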
@@ -11,6 +11,8 @@ def scan_path_for_files(path):
".inp",
".out",
".xyz",
".cif",
".pdb",
}
files = []
for filename in os.listdir(path):
......
@@ -138,6 +138,7 @@ class CP2KParser(NomadParser):
# Determine the presence of an initial coordinate file
init_coord_file = self.input_tree.get_keyword("FORCE_EVAL/SUBSYS/TOPOLOGY/COORD_FILE_NAME")
if init_coord_file is not None:
logger.debug("Initial coordinate file found.")
# Check against the given files
file_path = self.search_file(init_coord_file)
self.file_ids["initial_coordinates"] = file_path
@@ -382,26 +383,34 @@ class CP2KImplementation(object):
# Check where the coordinates are specified
coord_format = self.input_tree.get_keyword("FORCE_EVAL/SUBSYS/TOPOLOGY/COORD_FILE_FORMAT")
# Check if the unit cell is multiplied programmatically
multiples = self.input_tree.get_keyword("FORCE_EVAL/SUBSYS/TOPOLOGY/MULTIPLE_UNIT_CELL")
factors = [int(x) for x in multiples.split()]
factor = np.prod(np.array(factors))
# See if the coordinates are provided in the input file
if coord_format == "OFF":
logger.debug("Using coordinates from the input file.")
coords = self.input_tree.get_default_keyword("FORCE_EVAL/SUBSYS/COORD")
coords.strip()
n_particles = coords.count("\n")
result.value = n_particles
return result
elif coord_format == "CP2K":
msg = "Unsupported coordinate file format: '{}'".format(coord_format)
result.value = factor*n_particles
elif coord_format in ["CP2K", "G96", "XTL"]:
msg = "Tried to read the number of atoms from the initial configuration, but the parser does not yet support the '{}' format that is used by file '{}'.".format(coord_format, self.parser.file_ids["initial_coordinates"])
logger.warning(msg)
result.error_message = msg
result.code = ResultCode.fail
return result
# External file
init_coord_file = self.parser.get_file_handle("initial_coordinates")
n_particles = self.atomsengine.parse_number(init_coord_file, format="xyz")
result.value = n_particles
else:
# External file, use AtomsEngine
init_coord_file = self.parser.get_file_handle("initial_coordinates")
if coord_format == "XYZ":
n_particles = self.atomsengine.parse_n_atoms(init_coord_file, format="xyz")
if coord_format == "CIF":
n_particles = self.atomsengine.parse_n_atoms(init_coord_file, format="cif")
if coord_format == "PDB":
n_particles = self.atomsengine.parse_n_atoms(init_coord_file, format="pdb")
result.value = factor*n_particles
return result
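
The atom count for inline coordinates is the number of coordinate lines multiplied by the product of the `MULTIPLE_UNIT_CELL` factors. A standalone sketch of that arithmetic follows, written without the parser classes. The helper name `count_atoms` is hypothetical, and treating a missing keyword as "no replication" is an assumption of this sketch, not something the diff states.

```python
from functools import reduce
from operator import mul


def count_atoms(coord_section, multiple_unit_cell=None):
    """Count atoms in a COORD section, scaled by the unit cell multiples.

    coord_section: text of the COORD section, one atom per line.
    multiple_unit_cell: keyword value such as "2 1 3", or None if absent
        (assumed here to mean that the cell is not replicated).
    """
    # Count non-blank lines so that leading or trailing whitespace
    # cannot shift the result by one.
    n_in_cell = sum(1 for line in coord_section.splitlines() if line.strip())
    if multiple_unit_cell is None:
        return n_in_cell
    factors = [int(x) for x in multiple_unit_cell.split()]
    return n_in_cell * reduce(mul, factors, 1)


# Values taken from the NaCl test input with MULTIPLE_UNIT_CELL 2 1 3.
coords = (
    "Na 0.000000 0.000000 -0.065587\n"
    "Cl 0.000000 -0.757136 0.520545\n"
)
print(count_atoms(coords, "2 1 3"))  # 2 atoms * (2 * 1 * 3) = 12
```

Counting non-blank lines instead of newline characters keeps the result independent of whether the section ends with a newline.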
......
from ase import Atoms
import ase.io
sys = Atoms("NaCl", positions=[(0.000000, 0.000000, -0.065587), (0.000000, -0.757136, 0.520545)])
ase.io.write("./pdb/n/coords.pdb", sys)
&FORCE_EVAL
METHOD Quickstep
&DFT
BASIS_SET_FILE_NAME ../../../data/BASIS_SET
POTENTIAL_FILE_NAME ../../../data/POTENTIAL
&MGRID
CUTOFF 50
&END MGRID
&QS
EPS_DEFAULT 1.0E-6
&END QS
&SCF
EPS_SCF 1.0E-4
SCF_GUESS ATOMIC
&END SCF
&XC
&XC_FUNCTIONAL Pade
&END XC_FUNCTIONAL
&END XC
&END DFT
&SUBSYS
&CELL
ABC 6.0 6.0 6.0
&END CELL
&KIND Na
BASIS_SET DZVP-GTH-PADE
POTENTIAL GTH-PADE-q1
&END KIND
&KIND Cl
BASIS_SET DZVP-GTH-PADE
POTENTIAL GTH-PADE-q7
&END KIND
&TOPOLOGY
COORD_FILE_NAME coords.cif
COORD_FILE_FORMAT CIF
CONN_FILE_FORMAT OFF
&END
&END SUBSYS
&PRINT
&FORCES ON
&END FORCES
&END PRINT
&END FORCE_EVAL
&GLOBAL
PROJECT_NAME NaCl
RUN_TYPE ENERGY_FORCE
PRINT_LEVEL LOW
&END GLOBAL
DBCSR| Multiplication driver SMM
DBCSR| Multrec recursion limit 512
DBCSR| Multiplication stack size 1000
DBCSR| Multiplication size stacks 3
DBCSR| Use subcommunicators T
DBCSR| Use MPI combined types F
DBCSR| Use MPI memory allocation T
DBCSR| Use Communication thread T
DBCSR| Communication thread load 87
**** **** ****** ** PROGRAM STARTED AT 2015-11-24 15:57:27.799
***** ** *** *** ** PROGRAM STARTED ON lauri-Lenovo-Z50-70
** **** ****** PROGRAM STARTED BY lauri
***** ** ** ** ** PROGRAM PROCESS ID 13910
**** ** ******* ** PROGRAM STARTED IN /home/lauri/Dropbox/nomad-dev/gitlab/
parser-cp2k/cp2kparser/tests/cp2k_2.6
.2/particle_number/cif/n
CP2K| version string: CP2K version 2.6.2
CP2K| source code revision number: svn:15893
CP2K| is freely available from http://www.cp2k.org/
CP2K| Program compiled at ke 4.11.2015 08.48.42 +0200
CP2K| Program compiled on lauri-Lenovo-Z50-70
CP2K| Program compiled for Linux-x86-64-gfortran_basic
CP2K| Input file name NaCl.inp
GLOBAL| Force Environment number 1
GLOBAL| Basis set file name ../../../data/BASIS_SET
GLOBAL| Geminal file name BASIS_GEMINAL
GLOBAL| Potential file name ../../../data/POTENTIAL
GLOBAL| MM Potential file name MM_POTENTIAL
GLOBAL| Coordinate file name coords.cif
GLOBAL| Method name CP2K
GLOBAL| Project name NaCl
GLOBAL| Preferred FFT library FFTW3
GLOBAL| Preferred diagonalization lib. SL
GLOBAL| Run type ENERGY_FORCE
GLOBAL| All-to-all communication in single precision F
GLOBAL| FFTs using library dependent lengths F
GLOBAL| Global print level LOW
GLOBAL| Total number of message passing processes 1
GLOBAL| Number of threads for this process 1
GLOBAL| This output is from process 0
MEMORY| system memory details [Kb]
MEMORY| rank 0 min max average
MEMORY| MemTotal 8070396 8070396 8070396 8070396
MEMORY| MemFree 4005012 4005012 4005012 4005012
MEMORY| Buffers 204292 204292 204292 204292
MEMORY| Cached 2021788 2021788 2021788 2021788
MEMORY| Slab 214568 214568 214568 214568
MEMORY| SReclaimable 175888 175888 175888 175888
MEMORY| MemLikelyFree 6406980 6406980 6406980 6406980
*** 15:57:27 WARNING in topology_cif:read_coordinate_cif :: The field ***
*** (_symmetry_equiv_pos_as_xyz) was not found in CIF file! ***
*** topology_cif.F line 314 ***
*******************************************************************************
*******************************************************************************
** **
** ##### ## ## **
** ## ## ## ## ## **
** ## ## ## ###### **
** ## ## ## ## ## ##### ## ## #### ## ##### ##### **
** ## ## ## ## ## ## ## ## ## ## ## ## ## ## **
** ## ## ## ## ## ## ## #### ### ## ###### ###### **
** ## ### ## ## ## ## ## ## ## ## ## ## **
** ####### ##### ## ##### ## ## #### ## ##### ## **
** ## ## **
** **
** ... make the atoms dance **
** **
** Copyright (C) by CP2K Developers Group (2000 - 2014) **
** **
*******************************************************************************
SCF PARAMETERS Density guess: ATOMIC
--------------------------------------------------------
max_scf: 50
max_scf_history: 0
max_diis: 4
--------------------------------------------------------
eps_scf: 1.00E-04
eps_scf_history: 0.00E+00
eps_diis: 1.00E-01
eps_eigval: 1.00E-05
--------------------------------------------------------
level_shift [a.u.]: 0.00
--------------------------------------------------------
Mixing method: DIRECT_P_MIXING
--------------------------------------------------------
No outer SCF
Number of electrons: 8
Number of occupied orbitals: 4
Number of molecular orbitals: 4
Number of orbital functions: 30
Number of independent orbital functions: 30
Extrapolation method: initial_guess
SCF WAVEFUNCTION OPTIMIZATION
Step Update method Time Convergence Total energy Change
------------------------------------------------------------------------------
1 P_Mix/Diag. 0.40E+00 0.0 14.58121016 -13.4539324305 -1.35E+01
2 P_Mix/Diag. 0.40E+00 0.0 6.57076641 -13.6696217686 -2.16E-01
3 P_Mix/Diag. 0.40E+00 0.0 3.16009711 -13.7773622218 -1.08E-01
4 P_Mix/Diag. 0.40E+00 0.0 2.45355010 -13.8411453117 -6.38E-02
5 P_Mix/Diag. 0.40E+00 0.0 1.25066477 -13.8792699947 -3.81E-02
6 P_Mix/Diag. 0.40E+00 0.0 0.91325473 -13.9021310128 -2.29E-02
7 P_Mix/Diag. 0.40E+00 0.0 0.49828656 -13.9158560001 -1.37E-02
8 P_Mix/Diag. 0.40E+00 0.0 0.35039713 -13.9240981610 -8.24E-03
9 P_Mix/Diag. 0.40E+00 0.0 0.20187320 -13.9290474183 -4.95E-03
10 P_Mix/Diag. 0.40E+00 0.0 0.13751199 -13.9320188235 -2.97E-03
11 P_Mix/Diag. 0.40E+00 0.0 0.08201192 -13.9338024646 -1.78E-03
12 DIIS/Diag. 0.13E-03 0.0 0.05796513 -13.9348729755 -1.07E-03
13 DIIS/Diag. 0.25E-05 0.0 0.00032765 -13.9364792678 -1.61E-03
14 DIIS/Diag. 0.36E-05 0.0 0.00020918 -13.9364792678 -1.74E-11
15 DIIS/Diag. 0.28E-05 0.0 0.00007961 -13.9364792678 -6.43E-12
*** SCF run converged in 15 steps ***
Electronic density on regular grids: -7.9999701621 0.0000298379
Core density on regular grids: 7.9999557490 -0.0000442510
Total charge density on r-space grids: -0.0000144131
Total charge density g-space grids: -0.0000144131
Overlap energy of the core charge distribution: 0.24643842291154
Self energy of the core charge distribution: -34.03233561623406
Core Hamiltonian energy: 9.17629879768523
Hartree energy: 13.84903265236433
Exchange-correlation energy: -3.17591352456281
Total energy: -13.93647926783578
ENERGY| Total FORCE_EVAL ( QS ) energy (a.u.): -13.936479267846462
ATOMIC FORCES in [a.u.]
# Atom Kind Element X Y Z
1 1 Na 0.00000002 1.58232685 -1.22548119
2 2 Cl 0.00000025 -1.59619348 1.24072651
SUM OF ATOMIC FORCES 0.00000027 -0.01386663 0.01524532 0.02060833
-------------------------------------------------------------------------------
- -
- DBCSR STATISTICS -
- -
-------------------------------------------------------------------------------
COUNTER CPU ACC ACC%
number of processed stacks 48 0 0.0
matmuls inhomo. stacks 0 0 0.0
matmuls total 48 0 0.0
flops 13 x 13 x 4 21632 0 0.0
flops 13 x 17 x 4 28288 0 0.0
flops 17 x 17 x 4 36992 0 0.0
flops total 86912 0 0.0
marketing flops 115200
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
- -
- MESSAGE PASSING PERFORMANCE -
- -
-------------------------------------------------------------------------------
ROUTINE CALLS TOT TIME [s] AVE VOLUME [Bytes] PERFORMANCE [MB/s]
MP_Group 5 0.000
MP_Bcast 19 0.000 6. 0.45
MP_Allreduce 233 0.000 11. 22.97
MP_Sync 4 0.000
MP_Alltoall 361 0.000 2028. 3262.04
MP_Wait 384 0.000
MP_ISend 128 0.001 1000. 197.92
MP_IRecv 128 0.000 1000. 1869.84
MP_Memory 384 0.000
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
- -
- R E F E R E N C E S -
- -
-------------------------------------------------------------------------------
CP2K version 2.6.2, the CP2K developers group (2015).
CP2K is freely available from http://www.cp2k.org/ .
Borstnik, U; VandeVondele, J; Weber, V; Hutter, J.
PARALLEL COMPUTING, 40 (5-6), 47-58 (2014).
Sparse matrix multiplication: The distributed block-compressed sparse
row library.
http://dx.doi.org/10.1016/j.parco.2014.03.012
Hutter, J; Iannuzzi, M; Schiffmann, F; VandeVondele, J.
WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL MOLECULAR SCIENCE, 4 (1), 15-25 (2014).
CP2K: atomistic simulations of condensed matter systems.
http://dx.doi.org/10.1002/wcms.1159
Krack, M.
THEORETICAL CHEMISTRY ACCOUNTS, 114 (1-3), 145-152 (2005).
Pseudopotentials for H to Kr optimized for gradient-corrected
exchange-correlation functionals.
http://dx.doi.org/10.1007/s00214-005-0655-y
VandeVondele, J; Krack, M; Mohamed, F; Parrinello, M; Chassaing, T;
Hutter, J. COMPUTER PHYSICS COMMUNICATIONS, 167 (2), 103-128 (2005).
QUICKSTEP: Fast and accurate density functional calculations using a
mixed Gaussian and plane waves approach.
http://dx.doi.org/10.1016/j.cpc.2004.12.014
Frigo, M; Johnson, SG.
PROCEEDINGS OF THE IEEE, 93 (2), 216-231 (2005).
The design and implementation of FFTW3.
http://dx.doi.org/10.1109/JPROC.2004.840301
Hartwigsen, C; Goedecker, S; Hutter, J.
PHYSICAL REVIEW B, 58 (7), 3641-3662 (1998).
Relativistic separable dual-space Gaussian pseudopotentials from H to Rn.
http://dx.doi.org/10.1103/PhysRevB.58.3641
Lippert, G; Hutter, J; Parrinello, M.
MOLECULAR PHYSICS, 92 (3), 477-487 (1997).
A hybrid Gaussian and plane wave density functional scheme.
http://dx.doi.org/10.1080/002689797170220
Goedecker, S; Teter, M; Hutter, J.
PHYSICAL REVIEW B, 54 (3), 1703-1710 (1996).
Separable dual-space Gaussian pseudopotentials.
http://dx.doi.org/10.1103/PhysRevB.54.1703
-------------------------------------------------------------------------------
- -
- T I M I N G -
- -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.002 0.002 0.385 0.385
qs_forces 1 2.0 0.000 0.000 0.323 0.323
qs_energies_scf 1 3.0 0.000 0.000 0.305 0.305
scf_env_do_scf 1 4.0 0.000 0.000 0.288 0.288
scf_env_do_scf_inner_loop 15 5.0 0.001 0.001 0.288 0.288
rebuild_ks_matrix 16 6.8 0.000 0.000 0.167 0.167
qs_ks_build_kohn_sham_matrix 16 7.8 0.001 0.001 0.167 0.167
qs_ks_update_qs_env 15 6.0 0.000 0.000 0.155 0.155
sum_up_and_integrate 16 8.8 0.000 0.000 0.143 0.143
integrate_v_rspace 16 9.8 0.134 0.134 0.142 0.142
qs_rho_update_rho 16 6.0 0.000 0.000 0.119 0.119
calculate_rho_elec 16 7.0 0.106 0.106 0.119 0.119
quickstep_create_force_env 1 2.0 0.000 0.000 0.059 0.059
create_qs_kind_set 1 3.0 0.000 0.000 0.029 0.029
read_qs_kind 2 4.0 0.016 0.016 0.029 0.029
qs_init_subsys 1 3.0 0.001 0.001 0.025 0.025
qs_env_setup 1 4.0 0.000 0.000 0.022 0.022
qs_env_rebuild_pw_env 3 3.7 0.000 0.000 0.022 0.022
pw_env_rebuild 1 6.0 0.000 0.000 0.022 0.022
fft_wrap_pw1pw2 161 10.1 0.001 0.001 0.022 0.022
compute_max_radius 1 7.0 0.018 0.018 0.018 0.018
fft_wrap_pw1pw2_30 65 10.6 0.002 0.002 0.018 0.018
parser_read_line 6920 5.0 0.003 0.003 0.017 0.017
parser_read_line_low 20 7.1 0.014 0.014 0.014 0.014
qs_vxc_create 16 8.8 0.000 0.000 0.014 0.014
xc_vxc_pw_create 16 9.8 0.002 0.002 0.014 0.014
density_rs2pw 16 8.0 0.000 0.000 0.013 0.013
fft3d_s 162 12.0 0.011 0.011 0.013 0.013
qs_ks_update_qs_env_forces 1 3.0 0.000 0.000 0.013 0.013
xc_rho_set_and_dset_create 16 10.8 0.000 0.000 0.012 0.012
xc_functional_eval 16 11.8 0.012 0.012 0.012 0.012
init_scf_run 1 4.0 0.000 0.000 0.011 0.011
scf_env_initial_rho_setup 1 5.0 0.000 0.000 0.010 0.010
cp_dbcsr_plus_fm_fm_t_native 16 6.9 0.000 0.000 0.010 0.010
calculate_dm_sparse 15 6.0 0.000 0.000 0.009 0.009
-------------------------------------------------------------------------------
**** **** ****** ** PROGRAM ENDED AT 2015-11-24 15:57:28.237
***** ** *** *** ** PROGRAM RAN ON lauri-Lenovo-Z50-70
** **** ****** PROGRAM RAN BY lauri
***** ** ** ** ** PROGRAM PROCESS ID 13910
**** ** ******* ** PROGRAM STOPPED IN /home/lauri/Dropbox/nomad-dev/gitlab/
parser-cp2k/cp2kparser/tests/cp2k_2.6
.2/particle_number/cif/n
data_image0
_cell_length_a 1
_cell_length_b 1
_cell_length_c 1
_cell_angle_alpha 90
_cell_angle_beta 90
_cell_angle_gamma 90
loop_
_atom_site_label
_atom_site_occupancy
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
_atom_site_thermal_displace_type
_atom_site_B_iso_or_equiv
_atom_site_type_symbol
Na1 1.0000 0.00000 0.00000 -0.06559 Biso 1.000 Na
Cl1 1.0000 0.00000 -0.75714 0.52055 Biso 1.000 Cl
&FORCE_EVAL
METHOD Quickstep
&DFT
BASIS_SET_FILE_NAME ../../../data/BASIS_SET
POTENTIAL_FILE_NAME ../../../data/POTENTIAL
&MGRID
CUTOFF 50
&END MGRID
&QS
EPS_DEFAULT 1.0E-6
&END QS
&SCF
EPS_SCF 1.0E-4
SCF_GUESS ATOMIC
&END SCF
&XC
&XC_FUNCTIONAL Pade
&END XC_FUNCTIONAL
&END XC
&END DFT
&SUBSYS
&CELL
ABC 6.0 6.0 6.0
MULTIPLE_UNIT_CELL 2 1 3
&END CELL
&COORD
Na 0.000000 0.000000 -0.065587
Cl 0.000000 -0.757136 0.520545
&END COORD
&KIND Na
BASIS_SET DZVP-GTH-PADE
POTENTIAL GTH-PADE-q1
&END KIND
&KIND Cl
BASIS_SET DZVP-GTH-PADE
POTENTIAL GTH-PADE-q7
&END KIND
&TOPOLOGY
MULTIPLE_UNIT_CELL 2 1 3
&END
&END SUBSYS
&PRINT
&FORCES ON
&END FORCES
&END PRINT
&END FORCE_EVAL
&GLOBAL
PROJECT_NAME NaCl
RUN_TYPE ENERGY_FORCE
PRINT_LEVEL LOW
&END GLOBAL
DBCSR| Multiplication driver SMM
DBCSR| Multrec recursion limit 512
DBCSR| Multiplication stack size 1000
DBCSR| Multiplication size stacks 3
DBCSR| Use subcommunicators T
DBCSR| Use MPI combined types F
DBCSR| Use MPI memory allocation T
DBCSR| Use Communication thread T
DBCSR| Communication thread load 87
**** **** ****** ** PROGRAM STARTED AT 2015-11-24 16:40:05.023
***** ** *** *** ** PROGRAM STARTED ON lauri-Lenovo-Z50-70
** **** ****** PROGRAM STARTED BY lauri
***** ** ** ** ** PROGRAM PROCESS ID 14804
**** ** ******* ** PROGRAM STARTED IN /home/lauri/Dropbox/nomad-dev/gitlab/