nomad-FAIR issues (https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues)

Issue 352: Example FAIRmat parsers
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/352
Markus Scheidgen, updated 2020-07-14

Most parsers that we will write in FAIRmat will probably just be "simple" converters that transform already structured data (e.g. JSON) into archive/metainfo data. As working examples, the eels, mpes, and aptfim parsers, which simply convert JSON/dict data, should be rewritten to use the new metainfo instead of the nomadcore/python_common backend-based libraries.
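As a rough illustration of what such a "simple" converter amounts to, here is a sketch with a hypothetical section class and field names (not the actual metainfo API):

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a metainfo section; the real classes are
# generated from the NOMAD metainfo definitions.
@dataclass
class EELSMeasurement:
    title: str = ''
    edges: list = field(default_factory=list)

def parse(data: dict) -> EELSMeasurement:
    # A "simple" converter: map already-structured JSON keys onto
    # metainfo quantities instead of writing to a backend.
    return EELSMeasurement(
        title=data.get('title', ''),
        edges=data.get('edges', []),
    )

section = parse({'title': 'example spectrum', 'edges': ['K', 'L3']})
```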
Make sure to work on the `nomad-fair-metainfo` branches of the parsers. Do not use the older parser base classes; use `nomad.parsing.MatchingParser` as a base class. No nomadcore imports should remain. This will need a new parser base class, i.e. a `MetainfoParser` that returns a metainfo section instead of a backend. We later need to adapt the processing to handle both backend-based and metainfo-based parsers.

Milestone: Version 0.8.2, Assignee: Alvin Noe Ladines

Issue 350: Parallel archive access
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/350
Markus Scheidgen, updated 2020-12-18

The performance when accessing large archive query result sets is poor. We identified that this is mostly due to many small serial file system accesses to GPFS. The endpoint `nomad.app.api.archive.ArchiveQueryResource` should use threading to read many calcs simultaneously.

Assignee: Markus Scheidgen

Issue 319: A more flexible and more celery-tonic processing module
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/319
Markus Scheidgen, updated 2023-12-21

This is a pre-requisite for #251
@himanel1 already started the implementation. I think this looks good and retains all features we had before. Here are some further todos that I see:
@mscheidg
- [x] add a redis to helm
- [ ] run some large scale processing
@himanel1
We should do the refactoring all the way. I think you should organize the submodules based on the processed entities Calc and Upload rather than trying to separate mongo from celery. A typical submodule structure that we also use in other modules would be:
* `processing/__init__.py` - Only docs and imports to expose to other modules
* `processing/processing.py` - All celery setup stuff
* `processing/common.py` - Common stuff: the mongoengine base, our custom celery tasks/request, shared constants, etc., Pipeline, PipelineContext, Stage, empty_task
* `processing/calc.py` - Including the "celery task" `comp_process` (don't like the name, btw.)
* `processing/upload.py` - Including upload_cleanup, pipelines, get_pipeline, run_pipeline
* I think we can move the tests into a single module, or rename test_base->test_common, test_data->test_upload
upload can depend on calc; upload and calc can depend on common; all can depend on processing; no other dependencies between submodules should be necessary
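For orientation, the existing `@process`/`process_status` mechanics that this refactoring keeps can be reduced to a sketch like this (greatly simplified; in NOMAD the status lives on mongoengine documents so it can be queried for all entities at once):

```python
import functools

RUNNING, SUCCESS, FAILURE = 'RUNNING', 'SUCCESS', 'FAILURE'

def process(func):
    # Record which process runs and its status on the entity itself,
    # so processing state can be queried per entity (simplified sketch).
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        self.current_process = func.__name__
        self.process_status = RUNNING
        try:
            result = func(self, *args, **kwargs)
            self.process_status = SUCCESS
            return result
        except Exception:
            self.process_status = FAILURE
            raise
    return wrapper

class Upload:
    process_status = None
    current_process = None

    @process
    def process_upload(self):
        pass  # extract files, run parsers, ...

upload = Upload()
upload.process_upload()
```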
In the future we could think about replacing @process, current_process, and process_status with celery. But at the moment it is very convenient to use a mongodb query to check on the processing status of all entities. I feel celery wasn't really designed with persistent tasks in mind. Also, we would need to be far more rigid with the celery infrastructure and add persistence to rabbitmq and redis.

Issue 308: Band structure: normalizer and metainfo update
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/308
Lauri Himanen, updated 2023-01-10
The band structure processing should be refactored. This would consist of the following steps:
* [ ] Update band structure metainfo so that the duplicate normalized values are removed (`section_k_band` vs `section_k_band_normalized`) and add new metainfo for the reciprocal cell and band gaps. The metainfo for `k_band_path_normalized_is_standard` should be renamed and moved to `section_k_band`. Also, the whole section itself could be renamed, as `section_k_band` to me implies a single band, whereas in reality it contains all the bands. I would suggest "electronic_band_structure" (I think phonon band structures should be put under a "phonon_band_structure" instead of using a flag to separate between different kinds).
* [ ] The shape of the energy values is currently [number_of_spin_channels, number_of_k_points_per_segment, number_of_band_segment_eigenvalues]. This makes sense from a parser perspective, as the output is typically stratified over k-points. However, when the band should be analyzed or visualized, it makes more sense to store it in the shape [number_of_spin_channels, number_of_band_segment_eigenvalues, number_of_k_points_per_segment]. This way the bands are stored as a contiguous block of memory and can be easily and efficiently looped over for visualization or band gap analysis. On the parser side, this would only require swapping axes 1 and 2 of the numpy array before storing the values.
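The axis swap described above is a one-liner with numpy (illustrative shapes):

```python
import numpy as np

# Parser output shape: [n_spin_channels, n_k_points_per_segment, n_eigenvalues]
energies = np.zeros((2, 100, 20))

# Swap axes 1 and 2 before storing, so each band is a contiguous block:
# [n_spin_channels, n_eigenvalues, n_k_points_per_segment]
energies = np.swapaxes(energies, 1, 2)
```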
* [ ] Create a BandStructureNormalizer that will:
* [x] Calculate band gaps
* [ ] Add labels for high-symmetry points according to the Setyawan/Curtarolo standard. There is a partial implementation in the VASP parser. It, however, supports only a subset of Bravais lattices for some reason.
* [ ] Check if the path follows the Setyawan/Curtarolo standard.

Issue 307: Parsers re-compile "submatchers" all the time
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/307
Markus Scheidgen, updated 2023-12-21

Many of the legacy NOMAD CoE parsers use `SimpleMatcher`s (SM). In order to use a parse tree of SMs, the tree has to be "compiled". This takes quite a while and should only be done once for each parse tree. Unfortunately, most parsers do not allow that: the SM tree is built and compiled for each parser run.
I managed to add a cache to the compile function, so that each SM tree is only compiled once. While some parsers only create the SM tree once, some parsers don't. In principle this should be avoidable, but the code structure does not allow it.
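A compile cache along these lines might look like the following (greatly simplified, hypothetical names; the real compile step lives in the nomadcore simple parser code):

```python
# Cache keyed on the root SimpleMatcher object: the same tree object
# yields the same compiled result. Parsers that rebuild the tree on
# every run get a new object id and miss the cache, which is exactly
# the problem described in this issue.
compile_cache = {}
compile_calls = 0

def compile_tree(root):
    global compile_calls
    key = id(root)
    if key not in compile_cache:
        compile_calls += 1  # the expensive "compilation" happens only once
        compile_cache[key] = ('compiled', root)
    return compile_cache[key]

tree = object()
compile_tree(tree)
compile_tree(tree)  # cache hit, no re-compilation
```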
Examples of such parsers are:
- quantum espresso
- crystal
- cp2k
- cpmd
Besides this, the parsers are not designed for reuse at all. While the legacy/nomadcore modules suggest reusability in some places (e.g. parser vs. context, interface vs. parsers), it is not thought through and lots of initialisation is done again and again.
Tasks:
- replace `simple_parser.mainFunction` and `baseclasses.ParserInterface` with a unified interface that really promotes parser reuse
- rewrite parsers, one by one, to use this interface
- cleanup the parser code in the process: pep8, dead-code, unnecessary imports
- test

Issue 304: GUI Tests
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/304
Markus Scheidgen, updated 2021-06-18

While having component tests is nice (especially for new components), we should start with end-to-end tests based on some example data. These end-to-end tests should simply cover common end-user workflows like uploading, publishing, editing, searching, and downloading raw/archive data.

Milestone: v0.10.0, Assignee: Lauri Himanen

Issue 298: Move data from mongo to the archive
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/298
Markus Scheidgen, updated 2020-03-24

Currently all of `EntryMetadata` is stored in mongodb (in `proc.Calc.metadata`). This makes the entries quite big, and some bulk operations on mongo are becoming quite slow. This might become even worse if we think about adding encyclopedia metadata as well.
Only some quantities of `EntryMetadata` should be stored in mongo (especially what is editable by the user). The rest should be part of the archive. The downside is that the optimade API and elastic indexing will need to read from the archive to work.
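Schematically, the split could look like this (the quantity names are illustrative, not the actual `EntryMetadata` definition):

```python
# Quantities that stay in mongo (user-editable); everything else goes
# to the archive. Names are hypothetical examples.
MONGO_QUANTITIES = {'comment', 'references', 'coauthors', 'datasets'}

def split_metadata(entry_metadata: dict):
    mongo = {k: v for k, v in entry_metadata.items() if k in MONGO_QUANTITIES}
    archive = {k: v for k, v in entry_metadata.items() if k not in MONGO_QUANTITIES}
    return mongo, archive

mongo, archive = split_metadata({
    'comment': 'user comment',
    'dft.code_name': 'VASP',
})
```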
Tasks:
- [x] allow to store filtered `EntryMetadata` in archive with working references to optimade (encyclopedia)
- [x] new index function
- [x] adapt processing
- [x] adapt optimade API
![image](/uploads/2351ef278c25372040f5b046ff6619fa/image.png)

Issue 263: Optimized use of Elasticsearch
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/263
Markus Scheidgen, updated 2020-02-21

In all our dealings with elasticsearch we are requesting the whole source document. This is then also used in the API/GUI communication. This is regardless of whether we need all the information or not.
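With elasticsearch, restricting what is fetched comes down to source filtering. For example, a scan request body might carry something like this (a sketch of the request body only, not the actual NOMAD code):

```python
# When scanning to stream download data, only fetch the ids needed to
# locate files, not the whole source document.
scan_body = {
    'query': {'term': {'upload_id': 'some_upload'}},
    '_source': ['calc_id', 'upload_id'],
}
```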
- [x] only transfer calc_id/upload_id when scanning to stream download data
- [x] do not transfer quantities to clients if not explicitly requested

Issue 259: Material classification and labeling
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/259
Lauri Himanen, updated 2020-02-21

In order to perform meaningful search queries on the computations within the archive, it is critical that we can identify different materials and material categories and provide these as searchable entries. In the NOMAD-coe Encyclopedia, this was referred to as going from *calculation oriented* (Archive) to *materials oriented* views (Encyclopedia).
Currently, such material categories and labels are identified from several sources (e.g. AFLOW prototypes and Springer Materials) and they are spread over different sections (`section_springer_material`, `section_prototype`). The classification currently depends on static data files that are outdated. **Both the AFLOW prototype library data and the Springer data should thus be updated, and some mechanism for automatically updating this information from the data source should be added.**
To make searching more intuitive and to bring together these categorizations under a more generic framework we should also consider creating a separate section for this information. One possibility is to include this kind of information in the material section that is created by EncyclopediaNormalizer.
For example something like this:
```json
{
  "material": {
    "material_hash": "...",
    "labels": [
      {"label": "superconductor", "source": "springer", "type": "electronical"},
      {"label": "ceramic", "source": "ref1", "type": "chemical"},
      {"label": "Afewfwf", "source": "aflow_prototype_library", "type": "symmetry"},
      {"label": "high entropy alloy", "source": "ref2", "type": "miscellaneous"}
    ],
    "sources": {
      "springer": {"link": "..."},
      "ref1": {"link": "..."},
      "ref2": {"link": "..."},
      "aflow_prototype_library": {"link": "..."}
    }
  }
}
```

Milestone: Version 0.8

Issue 229: Instable uploading GUI for new uploads
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/229
Markus Scheidgen, updated 2020-06-04

Uploads provided through the UI use a fake upload object, because the upload does not yet exist on the server. This might lead to unexpected bugs and makes handling in the respective components hard. If possible, there should be a different solution for displaying uploading files.

Assignee: Markus Scheidgen

Issue 223: A new archive format and query ("graphQL") support
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/223
Markus Scheidgen, updated 2020-03-02
- [x] experimental backend to HDF5 storage
- [x] benchmarks JSON vs HDF5 for storage, complete read, partial read
- [x] HDF5 integrated into processing, whole download in one HDF5
- [x] graph-QL style queries for HDF5 files
- [x] more sophisticated /archive API endpoint

Milestone: Version 0.8, Assignee: Alvin Noe Ladines

Issue 222: Parsers use the new Metainfo
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/222
Markus Scheidgen, updated 2020-03-22
After #59 and #221, we should move all parsers to use the new Metainfo. It already works (but is not used) for the VASP parser.
This includes an updated API that serves the metainfo (moved here from #221).

Milestone: multi-domain support/metainfo 2.0, Assignee: Markus Scheidgen

Issue 221: Refactor the datamodel to use the new metainfo
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/221
Markus Scheidgen, updated 2020-03-06
- [x] implement adapters for elastic, flask argparse, flask restplus models, mongo
- [x] replace the old datamodel
- [ ] API for loading the definitions to JSON
- [ ] GUI uses the loaded definitions for toolti...- [x] optimize the Metainfo
- [x] implement adapters for elastic, flask argparse, flask restplus models, mongo
- [x] replace the old datamodel
- [ ] API for loading the definitions to JSON
- [ ] GUI uses the loaded definitions for tooltips, column specs, search, etc.

Milestone: multi-domain support/metainfo 2.0, Assignee: Markus Scheidgen

Issue 214: Refactor the search python interface (and API)
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/214
Markus Scheidgen, updated 2019-09-05

We use individual search functions, where each function only supports a subset of the features (query, entry search, quantity search, search with statistics, scrolling or pagination). Furthermore, these functions became rather complex and hard to document/use. We also have no classes for result objects and rely on dictionaries.
We should use a `Search` class that allows configuring complex requests based on all available features. Results should be typed POPOs that either are dicts or can be transformed to dicts, and support JSON serialization.
This search class should also be offered by the API via complex POST requests for searches.
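A minimal sketch of what such a Search class and typed results could look like (all names illustrative):

```python
from dataclasses import dataclass, field, asdict
import json

# Hypothetical request/result objects; the real feature set (statistics,
# scrolling, pagination, quantity search) would be additional fields.
@dataclass
class SearchRequest:
    query: dict = field(default_factory=dict)
    statistics: list = field(default_factory=list)
    page: int = 1
    per_page: int = 10

@dataclass
class SearchResult:
    total: int = 0
    results: list = field(default_factory=list)

    def to_dict(self) -> dict:
        return asdict(self)

request = SearchRequest(query={'atoms': ['Si', 'O']}, statistics=['code_name'])
result = SearchResult(total=1, results=[{'calc_id': 'abc'}])
serialized = json.dumps(result.to_dict())  # JSON serialization via to_dict
```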
This is a pre-requisite for #211.

Assignee: Markus Scheidgen

Issue 212: Show calculation by PIDs
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/212
Markus Scheidgen, updated 2019-09-10
- [x] add an API endpoint that resolves PIDs
The path routing should work with the following paths
- `/gui/search`
- `/gui/dataset/doi/<doi>`
- `/gui/dataset/pid/<pid>`
- `/gui/entry/pid/<pid>...- [x] refactor the path routing, see below
- [x] add an API endpoint that resolves PIDs
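A resolver for such PID/ID paths could be sketched as follows (illustrative only, not the actual GUI router):

```python
import re

# Route table matching the planned path scheme; target names are hypothetical.
ROUTES = [
    (re.compile(r'^/gui/dataset/doi/(?P<doi>.+)$'), 'dataset_by_doi'),
    (re.compile(r'^/gui/dataset/pid/(?P<pid>.+)$'), 'dataset_by_pid'),
    (re.compile(r'^/gui/entry/pid/(?P<pid>.+)$'), 'entry_by_pid'),
    (re.compile(r'^/gui/entry/id/(?P<upload_id>[^/]+)/(?P<calc_id>[^/]+)$'), 'entry_by_id'),
]

def resolve(path):
    # Return the matched target and the path parameters, or (None, {}).
    for pattern, target in ROUTES:
        match = pattern.match(path)
        if match:
            return target, match.groupdict()
    return None, {}

target, params = resolve('/gui/entry/id/upload123/calc456')
```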
The path routing should work with the following paths
- `/gui/search`
- `/gui/dataset/doi/<doi>`
- `/gui/dataset/pid/<pid>`
- `/gui/entry/pid/<pid>`
- `/gui/entry/id/<upload_id>/<calc_id>`

Milestone: A replacement for the NOMAD CoE Repository, Assignee: Markus Scheidgen

Issue 208: Overhaul the file handling
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/208
Markus Scheidgen, updated 2022-05-13
There should be the following concepts
- `mainfile`, the main output file, as usual
- `codefiles`, other files generated by or input to the code
- `auxfiles`, other files in directory (and subdirectories?) respecting some heuristics
- `allfiles`, files in same directory (currently `auxfiles` + `mainfile`)
`auxfiles` and `allfiles` should not be part of processing, but deduced by the API on request.
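A sketch of how the API could deduce these file sets on request (the codefile patterns are hypothetical examples of per-parser filename regexps):

```python
import re

# Filename regexps that a parser could declare for its codefiles
# (hypothetical VASP-style examples).
CODEFILE_PATTERNS = [re.compile(r'^INCAR$'), re.compile(r'^POSCAR$'), re.compile(r'^OUTCAR$')]

def classify(mainfile, directory_files):
    # codefiles: files matching the parser's declared patterns
    codefiles = [f for f in directory_files
                 if f != mainfile and any(p.match(f) for p in CODEFILE_PATTERNS)]
    # auxfiles: everything else next to the mainfile
    auxfiles = [f for f in directory_files if f != mainfile and f not in codefiles]
    # allfiles: all files in the same directory
    allfiles = [mainfile] + codefiles + auxfiles
    return codefiles, auxfiles, allfiles

codefiles, auxfiles, allfiles = classify(
    'vasprun.xml', ['vasprun.xml', 'INCAR', 'POSCAR', 'notes.txt'])
```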
- [ ] "codefiles" based on filename regexps from parser
- [ ] allow to load more files in GUI
- [ ] include sub directories
- [ ] download all, dataset download, etc. use "allfiles"
- [ ] heuristics to detect meaningful `auxfiles`

Issue 191: A better and more "standard" help system
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/191
Markus Scheidgen, updated 2019-08-07

The current help on the GUI is very intrusive and confusing. It should be replaced with more traditional help buttons, or simple hovers.

Milestone: Production uploads, Assignee: Markus Scheidgen

Issue 171: Raw file API to read files and file sizes of calculation
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/171
Markus Scheidgen, updated 2019-07-19

The current auxfiles strategy is not good. Using a simple cutoff on auxfiles does not seem to be a good idea, since many calcs have >50 auxfiles. Storing all the auxfile paths also seems to be a bad idea. Maybe we just store the last bit? Or do not store them at all, but read them via a separate API.
This is related to #169.

Milestone: migration finished, Assignee: Markus Scheidgen

Issue 164: Improved upload view
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/164
Markus Scheidgen, updated 2019-07-04
The upload view should not look like just the staging area but present published and unpublished uploads.
- [x] api allows to selectively get published and unpublished uploads
- [x] visualise state of upload
- [x] pagination for uploads
- [x] uploads should be ordered by upload time

Assignee: Markus Scheidgen

Issue 131: Make domain specific metadata configurable
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/131
Markus Scheidgen, updated 2019-04-12
- [x] elastic mapping
- [x] datamodel
- [x] search api
- [x] gui

Milestone: Experimental nomad, Assignee: Markus Scheidgen