Commit b9b8f841 authored by Markus Scheidgen

Clean up for next FAIRmat.

parent c6bf6c2c
# parser-skeleton
## About
This is not a real parser; it's a skeleton for parsers. To write your own parser, it's
best to fork this skeleton and use it as a template.
## Setup and run example
We are currently targeting Python 3.6. Some nomad dependencies might still have problems
with 3.7+. It will definitely not work with 2.x. If you run into trouble, you could
try skipping some dependencies; most of them are only used in the DFT context.
It is best to use a virtual environment:
```
virtualenv -p python3 .pyenv
source .pyenv/bin/activate
```
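If you want your parser to complain early about an unexpected interpreter, a guard along these lines could sit at the top of your package's `__main__.py`. This is only a sketch: `check_python_version` is a hypothetical helper, and its rules simply encode the statement above (Python 2.x is unsupported, 3.6 is the target, other 3.x versions may cause dependency problems).

```python
import sys
import warnings


def check_python_version(version=None) -> None:
    """Raise on Python 2.x; warn on any Python 3 version other than the 3.6 target."""
    if version is None:
        version = sys.version_info
    major, minor = version[0], version[1]
    if major < 3:
        raise RuntimeError(
            'Python %d.%d is not supported; use Python 3.' % (major, minor))
    if (major, minor) != (3, 6):
        # Some nomad dependencies might still have problems outside the 3.6 target.
        warnings.warn(
            'Python %d.%d: the skeleton currently targets Python 3.6; '
            'some nomad dependencies may cause problems.' % (major, minor))
```

Calling `check_python_version()` with no argument checks the running interpreter; passing a tuple such as `(3, 8, 0)` makes the rule easy to test.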
Clone and install the nomad infrastructure and the necessary dependencies (including this parser):
```
git clone https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR nomad
cd nomad
git submodule update --init
pip install --upgrade pip
pip install --upgrade setuptools
pip install -r requirements.txt
./dependencies.sh -e
pip install -e .
```
The parsers (among other things) are git submodules. `./dependencies.sh` runs
through all the submodules and installs them as pip packages (make sure your virtual env is active!).
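To see roughly what that step amounts to, here is a Python sketch of the behaviour described above. It is not a transcription of `dependencies.sh`: it assumes that every top-most directory below `dependencies/` containing a `setup.py` is a package to install, and it installs in editable mode (`-e`) to mirror the `./dependencies.sh -e` call. The function names are hypothetical. By default it only prints the pip commands.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def find_installable_dirs(nomad_root: Path) -> List[Path]:
    """Return each top-most directory under <nomad_root>/dependencies with a setup.py."""
    found: List[Path] = []
    # Visit shallow paths first so that nested setup.py files can be skipped.
    candidates = sorted((nomad_root / 'dependencies').rglob('setup.py'),
                        key=lambda p: (len(p.parts), str(p)))
    for setup_py in candidates:
        pkg_dir = setup_py.parent
        if any(parent in found for parent in pkg_dir.parents):
            continue  # already covered by an enclosing package
        found.append(pkg_dir)
    return found


def install_dependencies(nomad_root: Path, dry_run: bool = True) -> List[List[str]]:
    """Build (and, unless dry_run, run) one editable pip install per package."""
    commands = [[sys.executable, '-m', 'pip', 'install', '-e', str(d)]
                for d in find_installable_dirs(nomad_root)]
    for cmd in commands:
        if dry_run:
            print(' '.join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands
```

Run `install_dependencies(Path('nomad'))` first to review the commands, then pass `dry_run=False` from inside your virtual environment.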
Fork this project on [gitlab](https://gitlab.mpcdf.mpg.de/nomad-lab/parser-skeleton).
Rename your fork in its settings/advanced and move it to the nomad-lab namespace.
Choose a name that starts with `parser-`, e.g. `parser-your-parser-name`.
You'll need an [MPCDF](https://www.mpcdf.mpg.de/userspace/forms/onlineregistrationform) account.
Add your parser to the nomad project on a separate branch:
```
git checkout -b your-parser-name
git submodule add https://gitlab.mpcdf.mpg.de/nomad-lab/parser-your-parser-name dependencies/parsers/your-parser-name
```
Make the necessary changes:
- [setup.py](setup.py): Change the project metadata
- [skeletonparser](skeletonparser): Change the directory name, i.e. the Python package name (lowercase only, no `_` please)
- [skeletonparser/__init__.py](skeletonparser/__init__.py): Implement your parser, change the class names
- [skeletonparser/__main__.py](skeletonparser/__main__.py): Change the module/class names
- [skeletonparser/skeleton.nomadmetainfo.json](skeletonparser/skeleton.nomadmetainfo.json): Change the name, add your metadata definitions.
- [README.md](README.md): Change this readme accordingly.
- probably some other things I forgot.
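The naming rules in these steps (fork name starting with `parser-`, package name without uppercase letters or `_`) can be checked before you push. A minimal sketch: the regular expressions are our reading of those rules, and they additionally assume that the fork name uses only lowercase letters, digits and hyphens. `check_names` is a hypothetical helper.

```python
import re
from typing import List

# Python package name: lowercase letters and digits only, no '_'.
PACKAGE_NAME = re.compile(r'[a-z][a-z0-9]*')
# Fork/project name: 'parser-' followed by lowercase, hyphen-separated words (assumed).
PROJECT_NAME = re.compile(r'parser-[a-z0-9]+(?:-[a-z0-9]+)*')


def check_names(project_name: str, package_name: str) -> List[str]:
    """Return a list of problems; an empty list means both names look fine."""
    problems = []
    if not PROJECT_NAME.fullmatch(project_name):
        problems.append("project name %r should look like 'parser-your-parser-name'"
                        % project_name)
    if not PACKAGE_NAME.fullmatch(package_name):
        problems.append('package name %r should use only lowercase letters and digits'
                        % package_name)
    return problems
```

For example, `check_names('parser-aptfim', 'aptfimparser')` returns an empty list, while `'aptfim_parser'` as package name yields one problem.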
General metadata quantities (those that we can agree on) go into
`dependencies/nomad-meta-info/meta_info/nomad_meta_info/general.experimental.nomadmetainfo.json`.
But we should agree first. In the meantime, just put them in [skeletonparser/skeleton.nomadmetainfo.json](skeletonparser/skeleton.nomadmetainfo.json).
You can browse around `dependencies/nomad-meta-info/meta_info/nomad_meta_info/` to
see example definitions. **You cannot add values or open sections without defining them first!**
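Because every value must be defined before you can add it, it can help to compare the keys you intend to write against the names in your `*.nomadmetainfo.json` before running the parser. The helpers below are hypothetical and only check names; they do not verify that a quantity belongs to the section you write it to (its `superNames`). The excerpt is illustrative, and `experiment_colour` is a made-up, deliberately undefined key.

```python
import json
from typing import Dict, Iterable, List, Set


def defined_names(metainfo_json: str) -> Set[str]:
    """Names of all definitions in a *.nomadmetainfo.json document."""
    return {entry['name'] for entry in json.loads(metainfo_json).get('metaInfos', [])}


def undefined_keys(values: Dict[str, object], names: Iterable[str]) -> List[str]:
    """Keys in `values` that have no definition among `names`, sorted."""
    known = set(names)
    return sorted(key for key in values if key not in known)


# Trimmed, illustrative excerpt in the same format as skeleton.nomadmetainfo.json.
EXAMPLE = '''
{
  "type": "nomad_meta_info_1_0",
  "metaInfos": [
    {"name": "experiment_location", "dtypeStr": "C", "shape": [],
     "superNames": ["section_experiment"]},
    {"name": "experiment_facility_institution", "dtypeStr": "C", "shape": [],
     "superNames": ["section_experiment"]}
  ]
}
'''

if __name__ == '__main__':
    names = defined_names(EXAMPLE)
    print(undefined_keys({'experiment_location': 'Country, City',
                          'experiment_colour': 'blue'}, names))  # ['experiment_colour']
```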
To run the parser:
```
cd nomad/dependencies/parsers/your-parser-name
python -m yourparserpythonpackage tests/example.metadata.json
```
## Docs
Click through the [nomad archive page](https://metainfo.nomad-coe.eu/nomadmetainfo_public/archive.html)
to learn about the *meta-info* metadata format and how to define your metadata.
Here is a more involved tutorial (but it's fairly specific to DFT and to parsing text files):
[nomad@fairdi docs](http://enc-staging-nomad.esc.rzg.mpg.de/fairdi/nomad/docs/parser_tutorial.html)
## FAQ
For any questions, **please open issues** (regarding parser development and using this skeleton)
in this [parser-skeleton project](https://gitlab.mpcdf.mpg.de/nomad-lab/parser-skeleton/issues).
We will compile a FAQ from your issues.
@@ -20,11 +20,8 @@ import numpy as np
from datetime import datetime
from nomad.parsing.parser import FairdiParser
from nomad.datamodel.metainfo.general_experimental import (
section_experiment as SectionExperiment,
section_data as SectionData,
section_method as SectionMethod,
section_sample as SectionSample)
from nomad.datamodel.metainfo.common_experimental import (
Experiment, Data, Method, Sample, Location, Material)
class APTFIMParser(FairdiParser):
@@ -38,54 +35,66 @@ class APTFIMParser(FairdiParser):
with open(filepath, 'rt') as f:
data = json.load(f)
section_experiment = archive.m_create(SectionExperiment)
experiment = archive.m_create(Experiment)
experiment.raw_metadata = data
# Read general tool environment details
section_experiment.experiment_location = data.get('experiment_location')
section_experiment.experiment_facility_institution = data.get('experiment_facility_institution')
section_experiment.experiment_summary = '%s of %s.' % (data.get('experiment_method').capitalize(), data.get('specimen_description'))
location = experiment.m_create(Location)
location.address = data.get('experiment_location')
location.facility = data.get('experiment_facility_institution')
experiment.experiment_summary = '%s of %s.' % (
data.get('experiment_method').capitalize(), data.get('specimen_description'))
try:
section_experiment.experiment_time = int(datetime.strptime(data.get('experiment_date_global_start'), '%d.%m.%Y %M:%H:%S').timestamp())
experiment.experiment_time = datetime.strptime(
data.get('experiment_date_global_start'), '%d.%m.%Y %M:%H:%S')
except ValueError:
pass
try:
section_experiment.experiment_end_time = int(datetime.strptime(data.get('experiment_date_global_end'), '%d.%m.%Y %M:%H:%S').timestamp())
experiment.experiment_end_time = datetime.strptime(
data.get('experiment_date_global_end'), '%d.%m.%Y %M:%H:%S')
except ValueError:
pass
# Read data parameters
section_data = section_experiment.m_create(SectionData)
section_data.data_repository_name = data.get('data_repository_name')
section_data.data_preview_url = data.get('data_repository_url')
section_data = experiment.m_create(Data)
section_data.repository_name = data.get('data_repository_name')
section_data.entry_repository_url = data.get('data_repository_url')
section_data.repository_url = '/'.join(data.get('data_repository_url').split('/')[0:3])
preview_url = data.get('data_preview_url')
# TODO: This is a little hack to correct the preview url and should be removed
# after urls are corrected
preview_url = '%s/files/%s' % tuple(preview_url.rsplit('/', 1))
section_data.data_preview_url = preview_url
section_data.preview_url = preview_url
# Read parameters related to method
section_method = section_experiment.m_create(SectionMethod)
section_method.experiment_method_name = data.get('experiment_method')
section_method.experiment_method_abbreviation = 'APT/FIM'
section_method.probing_method = 'electric pulsing'
# backend.addValue('experiment_tool_info', data.get('instrument_info')) ###test here the case that input.json keyword is different to output.json
# measured_pulse_voltage for instance should be a conditional read
# backend.addValue('measured_number_ions_evaporated', data.get('measured_number_ions_evaporated'))
# backend.addValue('measured_detector_hit_pos', data.get('measured_detector_hit_pos'))
# backend.addValue('measured_detector_hit_mult', data.get('measured_detector_hit_mult'))
# backend.addValue('measured_detector_dead_pulses', data.get('measured_detector_dead_pulses'))
# backend.addValue('measured_time_of_flight', data.get('measured_time_of_flight'))
# backend.addValue('measured_standing_voltage', data.get('measured_standing_voltage'))
# backend.addValue('measured_pulse_voltage', data.get('measured_pulse_voltage'))
# backend.addValue('experiment_operation_method', data.get('experiment_operation_method'))
# backend.addValue('experiment_imaging_method', data.get('experiment_imaging_method'))
method = experiment.m_create(Method)
method.data_type = 'image'
method.method_name = data.get('experiment_method')
method.method_abbreviation = 'APT/FIM'
method.probing_method = 'electric pulsing'
method.instrument_description = data.get('instrument_info')
method.measured_number_ions_evaporated = data.get('measured_number_ions_evaporated')
method.measured_detector_hit_pos = data.get('measured_detector_hit_pos') == 'yes'
method.measured_detector_hit_mult = data.get('measured_detector_hit_mult') == 'yes'
method.measured_detector_dead_pulses = data.get('measured_detector_dead_pulses') == 'yes'
method.measured_time_of_flight = data.get('measured_time_of_flight') == 'yes'
method.measured_standing_voltage = data.get('measured_standing_voltage') == 'yes'
method.measured_pulse_voltage = data.get('measured_pulse_voltage') == 'yes'
method.experiment_operation_method = data.get('experiment_operation_method') == 'yes'
method.experiment_imaging_method = data.get('experiment_imaging_method') == 'yes'
# Read parameters related to sample
section_sample = section_experiment.m_create(SectionSample)
section_sample.sample_description = data.get('specimen_description')
section_sample.sample_microstructure = data.get('specimen_microstructure')
section_sample.sample_constituents = data.get('specimen_constitution')
sample = experiment.m_create(Sample)
sample.sample_description = data.get('specimen_description')
sample.sample_microstructure = data.get('specimen_microstructure')
sample.sample_constituents = data.get('specimen_constitution')
material = sample.m_create(Material)
atom_labels = data.get('specimen_chemistry')
formula = ase.Atoms(atom_labels).get_chemical_formula()
section_sample.sample_atom_labels = np.array(atom_labels)
section_sample.sample_chemical_formula = formula
material.atom_labels = np.array(atom_labels)
material.chemical_formula = formula
material.chemical_name = formula
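Two details of the parser code above are easy to trip over. First, `data.get(...)` returns `None` for a missing key, and `datetime.strptime(None, ...)` raises `TypeError`, which the `except ValueError` clauses do not catch. Second, the format string `'%d.%m.%Y %M:%H:%S'` reads minutes before hours, while the metainfo describes the dates as `'DD.MM.YYYY - HH.MM.SS'`; which of the two matches the actual input files is not stated here, so this sketch keeps the parser's format as the default but takes it as a parameter. It also spells out the two URL manipulations. The helper names and example URLs are hypothetical.

```python
from datetime import datetime
from typing import Optional

# Format string used by the parser above; adjust it if your input files differ.
PARSER_DATE_FORMAT = '%d.%m.%Y %M:%H:%S'


def parse_date(value: Optional[str], fmt: str = PARSER_DATE_FORMAT) -> Optional[datetime]:
    """Parse a date string; return None if it is missing or malformed."""
    try:
        return datetime.strptime(value, fmt)
    except (TypeError, ValueError):  # TypeError covers value=None (missing key)
        return None


def repository_base_url(entry_url: str) -> str:
    """Keep only scheme and host, as the parser does for repository_url."""
    return '/'.join(entry_url.split('/')[0:3])


def corrected_preview_url(preview_url: str) -> str:
    """Insert 'files' before the last path segment (the TODO hack above).

    Requires at least one '/' in the URL, otherwise the % formatting fails.
    """
    return '%s/files/%s' % tuple(preview_url.rsplit('/', 1))
```

With these definitions, `repository_base_url('https://example.org/records/42')` gives `'https://example.org'`, and `corrected_preview_url('https://example.org/records/42/preview.png')` gives `'https://example.org/records/42/files/preview.png'`.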
@@ -3,9 +3,9 @@ from nomad.metainfo import Environment
from nomad.metainfo.legacy import LegacyMetainfoEnvironment
import aptfimparser.metainfo.aptfim
import nomad.datamodel.metainfo.general
import nomad.datamodel.metainfo.general_experimental
import nomad.datamodel.metainfo.common_experimental
m_env = LegacyMetainfoEnvironment()
m_env.m_add_sub_section(Environment.packages, sys.modules['aptfimparser.metainfo.aptfim'].m_package) # type: ignore
m_env.m_add_sub_section(Environment.packages, sys.modules['nomad.datamodel.metainfo.general'].m_package) # type: ignore
m_env.m_add_sub_section(Environment.packages, sys.modules['nomad.datamodel.metainfo.general_experimental'].m_package) # type: ignore
m_env.m_add_sub_section(Environment.packages, sys.modules['nomad.datamodel.metainfo.common_experimental'].m_package) # type: ignore
import numpy as np # pylint: disable=unused-import
import typing # pylint: disable=unused-import
from nomad.metainfo import ( # pylint: disable=unused-import
MSection, MCategory, Category, Package, Quantity, Section, SubSection, SectionProxy,
Reference
)
from nomad.metainfo.legacy import LegacyDefinition
from nomad.metainfo import Package, Quantity, Section
from nomad.datamodel.metainfo import common_experimental
from nomad.datamodel.metainfo import general_experimental
m_package = Package(name='aptfim')
m_package = Package(
name='aptfim_nomadmetainfo_json',
description='None',
a_legacy=LegacyDefinition(name='aptfim.nomadmetainfo.json'))
class Method(common_experimental.Method):
class section_experiment(general_experimental.section_experiment):
m_def = Section(validate=False, extends_base_section=True, a_legacy=LegacyDefinition(name='section_experiment'))
none_shape = Quantity(
type=int,
shape=[],
description='''
Shape of the None/Null object
''',
a_legacy=LegacyDefinition(name='none_shape'))
experiment_tool_info = Quantity(
type=str,
shape=[],
unit='dimensionless',
description='''
Name of the equipment, instrument with which the experiment was performed e.g.
LEAP5000XS
''',
a_legacy=LegacyDefinition(name='experiment_tool_info'))
m_def = Section(validate=False, extends_base_section=True)
experiment_operation_method = Quantity(
type=str,
shape=[],
unit='dimensionless',
description='''
Operation mode of the instrument (APT, FIM or combination)
''',
a_legacy=LegacyDefinition(name='experiment_operation_method'))
description='Operation mode of the instrument (APT, FIM or combination)')
experiment_imaging_method = Quantity(
type=str,
shape=[],
unit='dimensionless',
description='''
Pulsing method to enforce a controlled ion evaporation sequence
''',
a_legacy=LegacyDefinition(name='experiment_imaging_method'))
description='Pulsing method to enforce a controlled ion evaporation sequence')
specimen_description = Quantity(
type=str,
shape=[],
unit='dimensionless',
description='''
Sample description e.g. pure W wire samples trial 2
''',
a_legacy=LegacyDefinition(name='specimen_description'))
number_of_disjoint_elements = Quantity(
number_ions_evaporated = Quantity(
type=int,
shape=[],
unit='dimensionless',
description='''
Number of elements (disjoint element names) expected
''',
a_legacy=LegacyDefinition(name='number_of_disjoint_elements'))
specimen_chemistry = Quantity(
type=str,
shape=['number_of_elements'],
unit='dimensionless',
description='''
List of periodic table names expected contained in dataset
''',
a_legacy=LegacyDefinition(name='specimen_chemistry'))
specimen_microstructure = Quantity(
type=str,
shape=[],
unit='dimensionless',
description='''
Qualitative type of specimen and microstructure analyzed (e.g. thin films, nano
objects, single crystal, polycrystal)
''',
a_legacy=LegacyDefinition(name='specimen_microstructure'))
specimen_constitution = Quantity(
type=str,
shape=[],
unit='dimensionless',
description='''
Qualitative information how many phases in the specimen
''',
a_legacy=LegacyDefinition(name='specimen_constitution'))
measured_number_ions_evaporated = Quantity(
type=np.dtype(np.int32),
shape=[],
unit='dimensionless',
description='''
Number of ions successfully evaporated
''',
a_legacy=LegacyDefinition(name='measured_number_ions_evaporated'))
description='Number of ions successfully evaporated')
measured_detector_hit_pos = Quantity(
type=str,
shape=[],
unit='millimeter ** 2',
description='''
Detector hit positions x and y
''',
a_legacy=LegacyDefinition(name='measured_detector_hit_pos'))
type=bool,
description='Detector hit positions x and y was measured')
measured_detector_hit_mult = Quantity(
type=str,
shape=[],
unit='dimensionless',
description='''
Detector hit multiplicity
''',
a_legacy=LegacyDefinition(name='measured_detector_hit_mult'))
type=bool,
description='Detector hit multiplicity was measured')
measured_detector_dead_pulses = Quantity(
type=str,
shape=[],
unit='dimensionless',
description='''
Detector number of dead pulses
''',
a_legacy=LegacyDefinition(name='measured_detector_dead_pulses'))
type=bool,
description='Detector number of dead pulses was measured')
measured_time_of_flight = Quantity(
type=str,
shape=[],
unit='nanosecond',
description='''
Raw ion time of flight
''',
a_legacy=LegacyDefinition(name='measured_time_of_flight'))
type=bool,
description='Raw ion time of flight was measured')
measured_standing_voltage = Quantity(
type=str,
shape=[],
unit='volt',
description='''
Standing voltage
''',
a_legacy=LegacyDefinition(name='measured_standing_voltage'))
type=bool,
description='Standing voltage was measured')
measured_pulse_voltage = Quantity(
type=str,
shape=[],
unit='volt',
description='''
Pulse voltage
''',
a_legacy=LegacyDefinition(name='measured_pulse_voltage'))
type=bool,
description='Pulse voltage was measured')
m_package.__init_metainfo__()
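This commit turns the `measured_*` quantities from `type=str` into `type=bool`, and the parser fills them with `data.get(key) == 'yes'`. That comparison is strict: `'Yes'`, `' yes'` and a missing key all become `False`, so "not recorded" and "not measured" are indistinguishable. If your input files are less uniform, a more tolerant conversion might look like the sketch below; ignoring case and returning `None` for missing or unrecognized values are assumptions, not what the parser currently does. (The parser also applies `== 'yes'` to `experiment_operation_method` and `experiment_imaging_method`, which appear to remain free-text `type=str` quantities; that may be worth double-checking.)

```python
from typing import Dict, Mapping, Optional

# Input keys that the parser converts to booleans (taken from the code above).
MEASURED_FLAGS = (
    'measured_detector_hit_pos',
    'measured_detector_hit_mult',
    'measured_detector_dead_pulses',
    'measured_time_of_flight',
    'measured_standing_voltage',
    'measured_pulse_voltage',
)


def yes_no(value: Optional[object]) -> Optional[bool]:
    """'yes' -> True, 'no' -> False (case and surrounding spaces ignored), else None."""
    if value is None:
        return None
    normalized = str(value).strip().lower()
    if normalized == 'yes':
        return True
    if normalized == 'no':
        return False
    return None


def read_flags(data: Mapping[str, object]) -> Dict[str, Optional[bool]]:
    """Convert every yes/no flag in the input JSON, keeping None for unknown values."""
    return {key: yes_no(data.get(key)) for key in MEASURED_FLAGS}
```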
{
"type": "nomad_meta_info_1_0",
"description": "Metadata for an atom probe tomography or field ion microscopy experiment.",
"dependencies":[
{
"metainfoPath":"general.nomadmetainfo.json"
},
{
"metainfoPath":"general.experimental.nomadmetainfo.json"
}
],
"metaInfos": [
{
"description": "String identifier aka name of the repository where the raw data to the experiment is available",
"name": "experiment_typpe",
"dtypeStr": "C",
"shape": [],
"superNames": ["section_context"]
},
{
"description": "String identifier aka name of the repository where the raw data to the experiment is available",
"name": "data_repository_name",
"dtypeStr": "C",
"shape": [],
"superNames": ["section_experiment"],
"units": ""
},
{
"description": "URL of this repository",
"name": "where",
"dtypeStr": "C",
"shape": [],
"superNames": ["section_experiment"],
"units": ""
},
{
"description": "Thumbnail image informing about the experiment",
"name": "data_preview_url",
"dtypeStr": "C",
"shape": [],
"superNames": ["section_experiment"],
"units": ""
},
{
"description": "Shape of the None/Null object",
"name": "none_shape",
"dtypeStr": "i",
"kindStr": "type_dimension",
"shape": [],
"superNames": ["section_experiment"]
},
{
"description": "Full name of the experimental method in use",
"name": "experiment_method",
"dtypeStr": "C",
"shape": [],
"superNames": ["section_experiment"],
"units": ""
},
{
"description": "Name of the city and country the experiment took place, format 'Country, City'",
"name": "experiment_location",
"dtypeStr": "C",
"shape": [],
"superNames": ["section_experiment"],
"units": ""
},
{
"description": "Name of the institution hosting the experimental facility",
"name": "experiment_facility_institution",
"dtypeStr": "C",
"shape": [],
"superNames": ["section_experiment"],
"units": ""
},
{
"description": "Name of the equipment, instrument with which the experiment was performed e.g. LEAP5000XS",
"name": "experiment_tool_info",
"dtypeStr": "C",
"shape": [],
"superNames": ["section_experiment"],
"units": ""
},
{
"description": "UTC start time of the experiment, format 'DD.MM.YYYY - HH.MM.SS'",
"name": "experiment_date_global_start",
"dtypeStr": "C",
"shape": [],
"superNames": ["section_experiment"],
"units": "C"
},
{
"description": "UTC end time of the experiment, format 'DD.MM.YYYY - HH.MM.SS'",
"name": "experiment_date_global_end",
"dtypeStr": "C",
"shape": [],
"superNames": ["section_experiment"],
"units": "C"
},
{
"description": "Local start time of the experiment, format 'DD.MM.YYYY - HH.MM.SS'",
"name": "experiment_date_local_start",
"dtypeStr": "C",
"shape": [],
"superNames": ["section_experiment"],
"units": "C"
},
{
"description": "Operation mode of the instrument (APT, FIM or combination)",
"name": "experiment_operation_method",
"dtypeStr": "C",
"shape": [],
"superNames": ["section_experiment"],
"units": ""
},
{
"description": "Pulsing method to enforce a controlled ion evaporation sequence",
"name": "experiment_imaging_method",
"dtypeStr": "C",
"shape": [],
"superNames": ["section_experiment"],
"units": ""
},
{
"description": "Sample description e.g. pure W wire samples trial 2",
"name": "specimen_description",
"dtypeStr": "C",
"shape": [],
"superNames": ["section_experiment"],
"units": ""
},
{
"description": "Number of elements (disjoint element names) expected",
"name": "number_of_elements",
"dtypeStr": "i",
"kindStr": "type_dimension",
"shape": [],
"superNames": ["section_experiment"],
"units": ""
},
{
"description": "List of periodic table names expected contained in dataset",
"name": "specimen_chemistry",
"dtypeStr": "C",
"shape": ["number_of_elements"],
"superNames": ["section_experiment"],
"units": ""
},
{
"description": "Qualitative type of specimen and microstructure analyzed (e.g. thin films, nano objects, single crystal, polycrystal)",
"name": "specimen_microstructure",
"dtypeStr": "C",
"shape": [],
"superNames": ["section_experiment"],
"units": ""
},
{
"description": "Qualitative information how many phases in the specimen",
"name": "specimen_constitution",
"dtypeStr": "C",
"shape": [],
"superNames": ["section_experiment"],
"units": ""
},
{
"description": "Number of ions successfully evaporated",
"name": "measured_number_ions_evaporated",
"dtypeStr": "i",
"shape": [],
"superNames": ["section_experiment"],
"units": "1"
},
{
"description": "Detector hit positions x and y",
"name": "measured_detector_hit_pos",
"dtypeStr": "C",
"shape": [],
"superNames": ["section_experiment"],
"units": "mm, mm"
},
{
"description": "Detector hit multiplicity",
"name": "measured_detector_hit_mult",
"dtypeStr": "C",
"shape": [],
"superNames": ["section_experiment"],
"units": "1"
},
{
"description": "Detector number of dead pulses",
"name": "measured_detector_dead_pulses",
"dtypeStr": "C",
"shape": [],
"superNames": ["section_experiment"],
"units": "1"
},
{
"description": "Raw ion time of flight",
"name": "measured_time_of_flight",
"dtypeStr": "C",
"shape": [],
"superNames": ["section_experiment"],
"units": "ns"
},
{
"description": "Standing voltage",
"name": "measured_standing_voltage",
"dtypeStr": "C",
"shape": [],
"superNames": ["section_experiment"],
"units": "V"
},
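The legacy JSON definitions above and the Python `Quantity` definitions describe the same quantities, so when porting it can help to list each legacy entry together with the Python type it roughly corresponds to and whether it is a dimension (`kindStr: type_dimension`, such as `number_of_elements`, which `specimen_chemistry` uses as its shape). The `dtypeStr` mapping is our assumption: `'C'` (strings) and `'i'` (integers) are the only codes used in this file, and `'f'` and `'b'` are included on the assumption that they denote floats and booleans. Since the file is truncated in this view, the sketch runs on a small excerpt of it; `summarize` is a hypothetical helper.

```python
import json
from typing import List, Tuple

# Assumed meaning of legacy dtypeStr codes ('C' and 'i' are the ones used above).
DTYPES = {'C': str, 'i': int, 'f': float, 'b': bool}


def summarize(metainfo_json: str) -> List[Tuple[str, str, bool]]:
    """(name, Python type name, is_dimension) for each legacy definition."""
    rows = []
    for entry in json.loads(metainfo_json)['metaInfos']:
        py_type = DTYPES.get(entry.get('dtypeStr'), object)
        is_dimension = entry.get('kindStr') == 'type_dimension'
        rows.append((entry['name'], py_type.__name__, is_dimension))
    return rows


# Excerpt of the legacy file above.
EXCERPT = '''
{
  "type": "nomad_meta_info_1_0",
  "metaInfos": [
    {"name": "experiment_location", "dtypeStr": "C", "shape": [],
     "superNames": ["section_experiment"]},
    {"name": "number_of_elements", "dtypeStr": "i", "kindStr": "type_dimension",
     "shape": [], "superNames": ["section_experiment"]},
    {"name": "measured_number_ions_evaporated", "dtypeStr": "i", "shape": [],
     "superNames": ["section_experiment"], "units": "1"}
  ]
}
'''

for row in summarize(EXCERPT):
    print(row)
```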