nomad-FAIR issues
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues

# Issue #470: Create a new section for entry results
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/470 · 2021-06-18 · Markus Scheidgen

This is one step of refactoring the metainfo: #419
Requires metainfo references to quantities: #471
- create the definitions for a new top-level section `results` (in `nomad.datamodel.metainfo.results`)
- create a normalizer that runs at the end and creates this section `results`
- in the beginning this is supposed to be completely independent of the existing systems, a pure add-on
- we do not use the `section_`-prefix for section results
- it will not contain any "entry" metadata
- this is what an example instance should look like:
```
results
material
material_id
material_type
formulas
elements
simulated_structure
species
sites
lattice
pbc
conventional_structure?
species
sites
lattice
pbc
symmetry
-> system
method
name: "electronic structure dft simulation"
simulation
code_version
code_name
dft
xc_functional_class # e.g. GGA
xc_functionals # e.g. [GGX_X_FFM, GGX_C_FFM] full lib-xc name
basis_set
md
experiment
properties (search=properties: ['electronic_dos', 'energies'])
-> eigenvalues
-> electronic_band_structure
electronic_dos
-> fermi
-> values
-> energies
smearing_width (search)
-> phonon_dos
energies
forces
```
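As a rough illustration of the normalizer idea, here is a minimal sketch in plain Python that assembles such a `results` structure from already-parsed archive data. The archive keys and helper names are hypothetical, not the actual `nomad.datamodel` definitions:

```python
# Hypothetical sketch of a results normalizer: it runs after all other
# normalizers and assembles the independent, add-on `results` section from
# data that parsers/normalizers already produced. All archive keys below
# are illustrative assumptions, not the real NOMAD metainfo definitions.

def normalize_results(archive: dict) -> dict:
    run = archive.get("section_run", {})
    system = run.get("section_system", {})
    method = run.get("section_method", {})

    results = {
        "material": {
            "elements": sorted(set(system.get("atom_species", []))),
            "formulas": system.get("chemical_composition"),
        },
        "method": {
            "name": "electronic structure dft simulation",
            "simulation": {
                "code_name": run.get("program_name"),
                "code_version": run.get("program_version"),
                "dft": {
                    "xc_functional_class": method.get("xc_functional_class"),
                },
            },
        },
        # searchable list of which properties are present
        "properties": [],
    }
    if "section_dos" in run:
        results["properties"].append("electronic_dos")
    return results


example_archive = {
    "section_run": {
        "program_name": "VASP",
        "program_version": "5.4.4",
        "section_system": {"atom_species": ["Ga", "As", "Ga"]},
        "section_method": {"xc_functional_class": "GGA"},
        "section_dos": {},
    }
}
results = normalize_results(example_archive)
```

Because the normalizer only reads from the archive and writes a self-contained structure, it stays a pure add-on that does not touch the existing sections.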
- We plan to release this (as an add-on) in the next-next major release (e.g. late-spring)
- 0.9.x – Now
- 0.10.0 – Jan/Feb
- parsers, fastapi, optimade based on OPT, oasis uploads, analytics datasets
- overview page, enc-search
- 0.11.0 – May/Jun
- **this refactor**
- new search interface
- rewrite encyclopedia

Assignee: Lauri Himanen

# Issue #461: Optimade based on optimade-python-tools
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/461 · 2021-03-02 · Markus Scheidgen

If we have fastapi available in deployments (#408), we should replace "our" optimade implementation with the official optimade-python-tools. This will require adding more ES support to optimade-python-tools.

Milestone: v0.10.0 · Assignee: Markus Scheidgen

# Issue #451: Restructure the Sphinx documentation
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/451 · 2020-12-15 · Markus Scheidgen

It is just not intuitive.

# Issue #419: NOMAD Metainfo refactor
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/419 · 2021-09-04 · Markus Scheidgen

There should be some coordinated effort to clean up the NOMAD Metainfo:
- [ ] parser specific metainfo should really be parser specific ... there are some that miss the `x_...` prefix
- [x] only have one common code file/package for computational
- [x] have an extra package for workflows
- [x] refactor names, make use of aliases
- [x] get rid of sampling method and frame sequence
- [x] remove repeats where applicable
- [x] add derived for normalised quantities
- [x] analyse which quantities are not used at all via ES
- [ ] how to deal with "external" references (to other archive objects, to binary files, and files in general)
- [x] a set of public metadata for "elastic" properties (#407)
- [x] a set of public md properties
- [ ] further classification of non crystalline systems (e.g. amorphous solids/glass)
Related to: #308, #289, #358, #407

Milestone: v1.0.0-beta · Assignee: Alvin Noe Ladines

# Issue #408: Rewrite/refactor of our API with FastAPI
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/408 · 2021-05-06 · Markus Scheidgen
## Motivation
### Flask + REST Plus is bad
- outdated and not really maintained
- cumbersome, restrictive REST Plus specific request/response models
- argparser (Flask) and models (REST Plus) are unrelated
- no manipulation of generated OpenAPI spec
- optimade Python tools are not based on Flask REST Plus
### Fast API seems to solve a lot of problems
- models based on type annotations (pydantic)
- object oriented models
- query parameter parsing from function parameter annotations
- based on more modern uvicorn; prod==dev server
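The annotation-based mechanism can be illustrated without FastAPI itself. A stdlib-only sketch (not FastAPI's actual implementation) of deriving query-parameter parsing and type conversion from a handler's signature:

```python
# Sketch of the idea behind FastAPI's parameter handling: query parameters
# are parsed and converted based on the handler's type annotations. This is
# a stdlib illustration of the mechanism, not FastAPI's real code.
import inspect

def parse_query_params(handler, raw_params: dict) -> dict:
    parsed = {}
    for name, param in inspect.signature(handler).parameters.items():
        if name in raw_params:
            # convert the raw string using the annotated type
            parsed[name] = param.annotation(raw_params[name])
        elif param.default is not inspect.Parameter.empty:
            parsed[name] = param.default
    return parsed

def get_entries(page: int = 1, per_page: int = 10, owner: str = "public"):
    return {"page": page, "per_page": per_page, "owner": owner}

# "?page=3&owner=user" arrives as strings; annotations drive the conversion
args = parse_query_params(get_entries, {"page": "3", "owner": "user"})
result = get_entries(**args)
```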
### our API is messy
- lots of inconsistencies between repo/archive/raw/... related to queries, pagination, etc.
- lots of doubled CnC code pieces
- lots of functionality/parameter crammed into GET requests, e.g. `/repo/`
- *grown* tests
## Operations
Before we look at API operations (Fast API and Open API call the basic API building blocks *operations*),
we should look at our datamodel, its core *entities*, and principal *views* on data.
### NOMAD entities
- (info)
- users
- uploads
- entries
- datasets
- materials
- (metainfo)
#### views
- metadata (repo)
- archive
- raw
- encyclopedia, currently a mix of metadata and archive
- processing/upload (should be just part of metadata)
In contrast to the old API, we structure the API based on entities. The *metadata* view is the default, and there is access to the other views via subpaths `raw`, `archive`.
#### models
Many operations use the same input and output structures over and over again:
- Query
- Pagination, includes response size, order, aggregation
- (Metadata|Archive)Required, a list or structure of things to include in return
- Statistics, additional histogram information on metadata results
- FileFilter, for file downloads
- Models for our entities: User, UserMetadata, Dataset, Entry, Upload
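A hedged sketch of how these shared models might look, using plain dataclasses in place of pydantic models; all field names are illustrative assumptions, not the final API models:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative sketch of the shared request models that many operations
# reuse; names and fields are assumptions, not the actual implementation.

@dataclass
class Pagination:
    page: int = 1
    per_page: int = 10
    order_by: Optional[str] = None
    order: str = "desc"

@dataclass
class MetadataRequired:
    include: List[str] = field(default_factory=list)
    exclude: List[str] = field(default_factory=list)

@dataclass
class Query:
    # quantity name -> value(s) to match
    filters: dict = field(default_factory=dict)

@dataclass
class EntriesRequest:
    query: Query = field(default_factory=Query)
    pagination: Pagination = field(default_factory=Pagination)
    required: MetadataRequired = field(default_factory=MetadataRequired)

request = EntriesRequest(query=Query(filters={"elements": ["Si"]}))
```

Composing operations from a small set of such models is what keeps the entity-based endpoints consistent with each other.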
#### operations
|http |path |input |output |
|- |- |- |- |
|GET |`/entries` |SimpleQuery, Pagination, MetadataRequired |Metadata* |
|POST |`/entries/edit` |Query, UserMetadata, EditActions |Edit or EditActions |
|POST |`/entries/query` |Query, Statistics, Pagination, MetadataRequired, |Metadata*, Stat. |
|GET |`/entries/<id>` |MetadataRequired |Metadata |
|GET |`/entries/raw` |SimpleQuery, FileFilter |.zip+manifest |
|POST |`/entries/raw` |Query, FileFilter |.zip+manifest |
|GET |`/entries/<id>/raw` |FileFilter |.zip+manifest |
|GET |`/entries/<id>/raw/<path>` | |file |
|GET |`/entries/archive` |SimpleQuery, Pagination |Archive* or .zip+manifest |
|POST |`/entries/archive` |Query, ArchiveRequired, Pagination |Archive* or .zip+manifest |
|GET |`/entries/<id>/archive` |ArchiveRequired |Archive* or .json |
|GET |`/entries/<id>/archive/<path>` | |Archive* or json-value |
|GET |`/uploads` |SimpleQuery, Pagination, MetadataRequired |(Upload, Metadata*)* |
|POST |`/uploads/query` |Query, Statistics, Pagination, MetadataRequired |(Upload, Metadata*)* |
|PUT |`/uploads` |file |Upload |
|GET |`/uploads/<id>` |Pagination, MetadataRequired |Upload, Metadata* |
|POST |`/uploads/<id>` |Upload+Actions |Upload |
|GET |`/uploads/<id>/query` |Query, Statistics, Pagination, MetadataRequired |Upload, Metadata* |
|DELETE |`/uploads/<id>` | |Upload |
|GET |`/uploads/<id>/raw` |FileFilter |.zip+manifest |
|GET |`/uploads/<id>/<glob>` |FileFilter |.zip+manifest |
|GET |`/uploads/<id>/<path>` | |file |
|GET |`/datasets` |SimpleQuery, Pagination, MetadataRequired |(Dataset, Metadata*)* |
|POST |`/datasets/query` |Query, Statistics, Pagination, MetadataRequired |(Dataset, Metadata*)* |
|PUT |`/datasets` |Dataset |Dataset |
|GET |`/datasets/<id>` |SimpleQuery, Pagination, MetadataRequired |Dataset, Metadata* |
|GET |`/datasets/<id>/query` |Query, Statistics, Pagination, MetadataRequired |Dataset, Metadata* |
|POST |`/datasets/<id>` |Dataset+Actions |Dataset |
|DELETE |`/datasets/<id>` | |Dataset |
|GET |`/material` |SimpleQuery, Pagination, MetadataRequired |(Material, Metadata*)* |
|POST |`/material/query` |Query, Statistics, Pagination, MetadataRequired |(Material, Metadata*)* |
|GET |`/material/<id>` |SimpleQuery, Pagination, MetadataRequired |Material, Metadata* |
|GET |`/material/<id>/query` |Query, Statistics, Pagination, MetadataRequired |Material, Metadata* |
|GET |`/materials/suggestions` |? |Suggestion* |
|GET |`/groups` |SimpleQuery, Pagination, MetadataRequired |(Group, Metadata*)* |
|POST |`/groups/query` |Query, Statistics, Pagination, MetadataRequired |(Group, Metadata*)* |
|GET |`/groups/<id>` |SimpleQuery, Pagination, MetadataRequired |Group, Metadata* |
|GET |`/groups/<id>/query` |Query, Statistics, Pagination, MetadataRequired |Group, Metadata* |
|GET |`/users` |Pagination |User* |
|GET |`/users/me` | |User, Auth |
|GET |`/users/<id>` | |User |
|GET |`/info` | |Info |
|GET |`/metainfo` |MetainfoQuery |Archive* |
|GET |`/metainfo/<path>` | |Archive or json-value |
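For illustration, a client call to the GET `/entries` operation would encode SimpleQuery, Pagination, and MetadataRequired inputs as query parameters. A stdlib-only sketch; the base URL and parameter names are assumptions, not the final API:

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Sketch: building a GET /entries request with SimpleQuery + Pagination +
# MetadataRequired inputs encoded as query parameters. The base URL and
# parameter names are illustrative assumptions only.
base = "https://example.org/api/v1/entries"
params = {
    "elements": "Si",       # SimpleQuery
    "page": 2,              # Pagination
    "per_page": 10,
    "include": "entry_id",  # MetadataRequired
}
url = base + "?" + urlencode(params)

# a server would parse the same parameters back out of the query string
parsed = parse_qs(urlparse(url).query)
```

The POST variants exist for the same reason noted under "our API is messy": complex Query and Statistics inputs do not fit comfortably into GET query strings.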
## Tasks
- [x] learn FastAPI and build a skeleton API with a few models and static example responses
- [x] think of a documentation scheme: swagger, reference, tutorials
- [x] restructure the tests based on the skeleton (test first)
- [ ] move the implementation step by step (search, upload, edit, archive, raw, metainfo, ...)
- [ ] implement everything
Ideally we would already have end-to-end GUI tests that use the API for "real".

Milestone: v0.10.0

# Issue #403: Clean up the experimental data
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/403 · 2020-11-10 · Markus Scheidgen

For show-casing purposes, the experimental section of NOMAD needs to be "cleaned":
- [x] rename CMS/EMS -> computational/experimental
- [x] fix the uploader and co-author names
- [x] fix other metadata like locations and dates
- [x] better metadata and experiment names
- [x] more data (e.g. automated EELS indexing)
- [ ] EELS preview?
- [ ] maybe Markus Kühbach has real preview figs for his set
- [x] a disclaimer (in the search) about the "show-case" nature of the experimental section
- [ ] more databases to "index"
## show-case the indexing of an external database
We could use [EELS](https://eelsdb.eu/spectra/) to show that NOMAD could crawl and index web-based databases. Simple Python web-scraping techniques should suffice to create an upload consisting of the respective web pages, which "parsers" can convert into respective NOMAD metainfo data.
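Such a crawler step could be sketched with the stdlib alone. The HTML below is an invented stand-in for illustration, not EELSdb's actual markup:

```python
from html.parser import HTMLParser

# Sketch of one scraping step: collect links to individual spectrum pages,
# which a "parser" could later convert into NOMAD metainfo data. The HTML
# snippet is an invented stand-in, not the real EELSdb page structure.
class SpectrumLinkParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            if "/spectra/" in href:
                self.links.append(href)

page = """
<html><body>
  <a href="/spectra/123/">TiO2 core-loss</a>
  <a href="/about/">About</a>
  <a href="/spectra/456/">GaAs low-loss</a>
</body></html>
"""
parser = SpectrumLinkParser()
parser.feed(page)
```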
## show-case the indexing of external repositories
We could use Zenodo and its API to improve the metadata in NOMAD/experimental by downloading titles, descriptions, authors, etc.
## authors
There is a difference between the person uploading the metadata, i.e. the person providing the reference to the data, and the authors of the data. The latter are usually given by an external database or repository. We need to reflect this in the NOMAD user datamodel and add support for non-NOMAD-user authors: #404

Milestone: v0.10.0 · Assignee: Markus Scheidgen

# Issue #396: Phonopy Workflows
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/396 · 2020-12-18 · Markus Scheidgen

- the Phonopy parser should be an actual Phonopy parser and not just an FHI-aims phonopy parser
- the Phonopy parser should use the new workflow system

Assignee: Alvin Noe Ladines

# Issue #363: VASP geometry optimizations as workflows
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/363 · 2020-12-18 · Markus Scheidgen

- [x] metainfo for workflows
- [x] adapt VASP parser
- [ ] normalizer (system, dos, encyclopedia, ?) and section_metadata use the workflow result system/calculation

![image](/uploads/aaa6b494cf0afc03d96372e3a338c973/image.png)

Assignee: Alvin Noe Ladines

# Issue #352: Example FAIRmat parsers
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/352 · 2020-07-14 · Markus Scheidgen

Most parsers that we will write in FAIRmat will probably just be "simple" converters that transform already structured data (e.g. JSON) into archive/metainfo data. As working examples, the eels, mpes, and aptfim parsers, which simply convert json/dict data, should be rewritten to use the new metainfo instead of the nomadcore/python_common backend-based libraries.
Make sure to work on the `nomad-fair-metainfo` branches of the parsers. Do not use the older parser base classes; use nomad.parsing.MatchingParser as a base class. No nomadcore imports should remain. This will need a new parser base class, i.e. MetainfoParser, that returns a metainfo section instead of a backend. We later need to adapt the processing to handle both backend-based and metainfo-based parsers.

Milestone: Version 0.8.2 · Assignee: Alvin Noe Ladines

# Issue #350: Parallel archive access
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/350 · 2020-12-18 · Markus Scheidgen

The performance when accessing large archive query result sets is poor. We identified that this is mostly due to many small serial file system accesses to GPFS. The endpoint `nomad.app.api.archive.ArchiveQueryResource` should use threading to read many calcs simultaneously.

Assignee: Markus Scheidgen

# Issue #298: Move data from mongo to the archive
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/298 · 2020-03-24 · Markus Scheidgen

Currently, all of `EntryMetadata` is stored in mongodb (in `proc.Calc.metadata`). This makes the entries quite big, and some bulk operations on mongo are becoming quite slow. This might become even worse if we think about adding encyclopedia metadata as well.
Only some quantities of `EntryMetadata` should be stored in mongo (especially what is editable by the user). The rest should be part of the archive. The downside is that the optimade API and elastic indexing will need to read from the archive to work.
Tasks:
- [x] allow to store filtered `EntryMetadata` in archive with working references to optimade (encyclopedia)
- [x] new index function
- [x] adopt processing
- [x] adopt optimade API
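The intended split could be sketched as follows; which keys stay in mongo is a purely illustrative assumption, not the actual quantity names:

```python
# Sketch: split an EntryMetadata dict into the user-editable part that
# stays in mongo and the rest that moves into the archive. The key lists
# and example values are illustrative assumptions only.
MONGO_KEYS = {"comment", "references", "coauthors", "datasets", "with_embargo"}

def split_metadata(metadata: dict):
    mongo = {k: v for k, v in metadata.items() if k in MONGO_KEYS}
    archive = {k: v for k, v in metadata.items() if k not in MONGO_KEYS}
    return mongo, archive

mongo, archive = split_metadata({
    "comment": "test upload",
    "with_embargo": False,
    "formula": "GaAs",
    "n_atoms": 2,
})
```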
![image](/uploads/2351ef278c25372040f5b046ff6619fa/image.png)

# Issue #222: Parsers use the new Metainfo
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/222 · 2020-03-22 · Markus Scheidgen

After #59 and #221, we should move all parsers to use the new Metainfo. It already works (but is not used) for the VASP parser.
This includes an updated API that serves the metainfo (moved as part of #221).

Milestone: multi-domain support/metainfo 2.0 · Assignee: Markus Scheidgen

# Issue #221: Refactor the datamodel to use the new metainfo
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/221 · 2020-03-06 · Markus Scheidgen

- [x] optimize the Metainfo
- [x] implement adapters for elastic, flask argparse, flask restplus models, mongo
- [x] replace the old datamodel
- [ ] API for loading the definitions to JSON
- [ ] GUI uses the loaded definitions for tooltips, column specs, search, etc.

Milestone: multi-domain support/metainfo 2.0 · Assignee: Markus Scheidgen

# Issue #605: Skip matching for very large uploads
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/605 · 2021-09-14 · Markus Scheidgen

Parser matching becomes increasingly stressful with more and more parsers added. An upload with a very large number of files (>100k) takes more time to parser match than its processing timeout.
We need to optimise parser matching: #604
As a quickfix, we can add a "skip_matching" option to the nomad.json that allows predefining mainfiles and skipping the actual matching of all files.

Assignee: Markus Scheidgen

# Issue #610: Move POST datasets/{id}/doi to datasets/{id}/action/doi
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/610 · 2021-10-06 · Markus Scheidgen

This makes it consistent with `uploads/{id}/action/publish`.

Milestone: v1.0.0-beta · Assignee: Mohammad Nakhaee

# Issue #575: Optionally exclude aggregation field from aggregation context
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/575 · 2021-11-15 · Markus Scheidgen
We need to add a flag `exclude_from_search` to the aggregation model. If this flag is set, the API will use ES `post_filter` see [ES docs](https://www.elastic.co/guide/en/elasticsearch/referenc...This is a solution to the problem in #573
We need to add a flag `exclude_from_search` to the aggregation model. If this flag is set, the API will use ES `post_filter` (see [ES docs](https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-request-post-filter.html)) to exclude the aggregation field from the search context that the aggregation is used in.

Milestone: v1.0.0-beta · Assignee: Markus Scheidgen

# Issue #573: Search aggregation optimization
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/573 · 2021-11-24 · Lauri Himanen
The new search GUI requires a slightly different set of aggregated data than the old GUI. The new search API endpoint (`entries/query`) is not optimal for some of the queries that are now required. Consider e.g. the following case:
Imagine a dropdown for `structure_name`. Without any filters applied, we can populate the options in it by simply doing the search and aggregations together in a single API call. The resulting query would look something like this:
```
{
  query: {},
  aggregations: {
    "structure_name": {terms: ...}
  }
}
```
Let's say the resulting aggregation data would contain the entries: `["diamond", "perovskite"]`.
Now let's apply a filter by selecting "diamond" from the dropdown. We can combine the aggregation and query in a single API call like this:
```
{
  query: {
    "structure_name": ["diamond"]
  },
  aggregations: {
    "structure_name": {terms: ...}
  }
}
```
When executing this query, the aggregation data will only contain `["diamond"]`, as the filters are applied _before_ the aggregation. If we always used a fixed set of search options to populate the GUI and didn't allow OR queries, this would not be a problem (as in the old GUI, where e.g. clicking `system_type="bulk"` simply makes all other fixed options unavailable). But if we want to update the available options in the dropdown based on the aggregation results (like the new GUI does for dropdowns, checkboxes, etc., in order to also show "perovskite" if the other filters allow it), we have to do a separate aggregation query for each quantity with a modified list of filters: any filters targeting the aggregated quantity are removed, while filters targeting other quantities still affect the returned results.
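The modified-filters behaviour can be simulated in memory. A toy sketch (invented data and quantity names, not the ES implementation) that, for each aggregated quantity, drops only the filters targeting that same quantity:

```python
from collections import Counter

# Toy simulation of the desired aggregation behaviour: when aggregating a
# quantity, drop any filter that targets that quantity itself, but keep
# all other filters. Entries and quantity names are invented examples.
entries = [
    {"structure_name": "diamond", "system_type": "bulk"},
    {"structure_name": "perovskite", "system_type": "bulk"},
    {"structure_name": "perovskite", "system_type": "2D"},
]

def aggregate(quantity, filters):
    # remove filters targeting the aggregated quantity itself
    effective = {k: v for k, v in filters.items() if k != quantity}
    matching = [
        e for e in entries
        if all(e[k] in v for k, v in effective.items())
    ]
    return Counter(e[quantity] for e in matching)

# "diamond" is selected, yet the dropdown should still offer "perovskite"
agg = aggregate("structure_name", {"structure_name": ["diamond"],
                                   "system_type": ["bulk"]})
```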
In order to minimize the number of API calls and stress on Elasticsearch, we should think about combining these queries in the API endpoint or changing the GUI behaviour.

Milestone: v1.0.0-beta · Assignee: Lauri Himanen

# Issue #545: Unify `entry/raw` and `upload/raw` v1 API endpoints
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/545 · 2021-06-08 · Markus Scheidgen

The `entries/id/raw/*` endpoints should work like the `uploads/id/raw/path` endpoint. The only difference is that entries need a search first to determine access rights. For uploads, this is determined by published/owner.

Assignee: David Sikter

# Issue #543: Unify download stream generator
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/543 · 2021-05-19 · David Sikter

A single, unified generator for download streams should be possible, with roughly the following input arguments:
- a generator returning pairs (upload_id, path)
- files params (compress flag and pattern for filtering by filename)
- include_subdirs (recursive include subdir content)
- single_raw_file (if we should just stream one file, without encapsulating it in a zip bundle)
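A minimal sketch of such a generator using only the stdlib; the real implementation would stream chunks incrementally and handle caching, which this toy version (with invented names) does not:

```python
import io
import zipfile

# Sketch of a unified download-stream generator: it consumes (upload_id,
# path) pairs, optionally streams a single raw file directly, and otherwise
# bundles everything into a zip. File contents come from a read_file lookup
# function; all names are illustrative, and real code would stream chunks
# instead of buffering the whole zip in memory.
def download_stream(pairs, read_file, single_raw_file=False, compress=True):
    pairs = list(pairs)
    if single_raw_file and len(pairs) == 1:
        yield read_file(*pairs[0])
        return
    buffer = io.BytesIO()
    method = zipfile.ZIP_DEFLATED if compress else zipfile.ZIP_STORED
    with zipfile.ZipFile(buffer, "w", method) as zf:
        for upload_id, path in pairs:
            zf.writestr(f"{upload_id}/{path}", read_file(upload_id, path))
    yield buffer.getvalue()

files = {("u1", "a.txt"): b"hello", ("u1", "b.txt"): b"world"}
chunks = b"".join(download_stream(files, lambda u, p: files[(u, p)]))
```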
The generator also needs to handle caching and orderly closing.

Assignee: David Sikter

# Issue #534: Reduce the spectra metainfo complexity
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/534 · 2021-06-18 · Markus Scheidgen

The spectra metainfo (used by the eels and xps parsers, on branch `experiment`, 205aab5a2d9e908d8bbc18b737a81fbae38f6be7) in `nomad.datamodel.metainfo.common_experimental` defines too many details. We should try a different approach. A spectrum is basically always a set of channels (counts, energies, etc.), which are arrays with the same number of values. Currently, the metainfo defines a quantity for every possible channel (probably not exhaustive). We should reduce this to the most common channels (count, energy) and provide a higher-dimensional quantity to store all other channels. The description of those other channels (name, id, unit) could be stored in a repeating sub-section.
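The reduced model could be sketched roughly like this; a hedged illustration where all class and field names are hypothetical, not actual metainfo definitions:

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch of the reduced spectra model: explicit quantities only
# for the most common channels, everything else in one 2D array plus a
# repeating sub-section describing each additional channel. All names are
# illustrative assumptions, not the real metainfo definitions.
@dataclass
class ChannelDescription:
    name: str
    channel_id: str
    unit: str

@dataclass
class Spectrum:
    count: List[float] = field(default_factory=list)
    energy: List[float] = field(default_factory=list)
    # shape: (n_additional_channels, n_values)
    additional_channels: List[List[float]] = field(default_factory=list)
    channel_descriptions: List[ChannelDescription] = field(default_factory=list)

spectrum = Spectrum(
    count=[10.0, 20.0, 15.0],
    energy=[1.0, 2.0, 3.0],
    additional_channels=[[0.1, 0.2, 0.3]],
    channel_descriptions=[ChannelDescription("background", "bg", "counts")],
)
```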