# nomad-FAIR issues
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues

---
**#1899 Memory leak in the archive packer** — Markus Scheidgen (assignee: Theodore Chang), updated 2024-03-21
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/1899

When packing the archive during publishing, the memory necessary scales linearly with the amount of archive data. An upload with `n` entries and a size of `s` each requires roughly `10 * s * n` total memory instead of the expected `O(s)`.
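Constant memory would mean serializing one entry at a time and flushing it before loading the next. A minimal sketch of that pattern, assuming a msgpack-style serialization and a hypothetical `load_entry_archive` helper (this is not the actual packer code):

```python
import msgpack

def pack_upload(entry_ids, out_path, load_entry_archive):
    # Stream one entry at a time: serialize, write, and drop the
    # reference, so peak memory stays at O(s) instead of O(s * n).
    packer = msgpack.Packer()
    with open(out_path, 'wb') as f:
        for entry_id in entry_ids:
            archive = load_entry_archive(entry_id)  # hypothetical loader
            f.write(packer.pack(archive))
            del archive  # nothing retained across iterations
```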
---
**#1920 CP2K parser failure causes failed processing without logs** — Markus Scheidgen (assignee: Markus Scheidgen), updated 2024-03-05
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/1920

There currently is a published upload https://nomad-lab.eu/prod/v1/gui/upload/id/Dkal0CV1QwSET0ndy4WS_w with CP2K entries that failed processing.
There are two separate issues.
1. A parser problem: #1919
2. The processing does not even produce an archive, and the log entries should also look different. At least the entry page should not fail in that way.
This issue is about problem 2.
Here is an example entry: https://nomad-lab.eu/prod/v1/gui/upload/id/Dkal0CV1QwSET0ndy4WS_w/entry/id/-4_2g5SFbcIrbLUKwraKKALpfjrm

---
**#1300 Track the symmetry-reduced axes** — Nathan Daelman, updated 2023-12-21
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/1300

When a system is categorized as `2D` or `1D` (not sure whether `surface` also belongs to this class), the symmetry-reduced (real and reciprocal) axes should be saved alongside the original ones. These will be used for computing the hypervolume via `abs(np.linalg.det())` and in other analyses, such as `k_line_density`.
@himanel1 Do you have any suggestion where these should be saved (maybe `run.system.atoms` or `run.system.symmetry`?) and whether this is the responsibility of MatID or the normalizer?
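A minimal illustration of the intended computation (the function name and example values are made up): `abs(np.linalg.det())` works directly on square inputs, and the Gram determinant generalizes it when only one or two reduced axes remain:

```python
import numpy as np

def hypervolume(reduced_axes):
    """Hypervolume spanned by the symmetry-reduced lattice vectors."""
    a = np.atleast_2d(reduced_axes)
    if a.shape[0] == a.shape[1]:
        return abs(np.linalg.det(a))
    # Non-square (1D/2D systems): Gram determinant gives length/area.
    return np.sqrt(np.linalg.det(a @ a.T))

# e.g. a 2D system: two in-plane vectors, the third axis was reduced away
axes_2d = np.array([[2.46, 0.0, 0.0], [-1.23, 2.13, 0.0]])
area = hypervolume(axes_2d)  # an area (~5.24), since only 2 axes remain
```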
---
**#1491 Processing parser issues 2023-05-12 (corrected)** — Markus Scheidgen, updated 2023-12-21
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/1491

**This should be the correct list now.**

These are the results from Friday. We roughly more than halved the total number of erroneous entries. I guess it's the right direction.

For some problems you indicated previously that it's just a parser mismatch. The re-processing simply attempts to reprocess all existing matches, including the mismatched ones. I am not sure yet how to deal with these entries. Because I list all errors here, those problems will be listed again. Simply comment "mismatch" again and check them off.

---
**#1516 Processing parser issues 2023-05-24** — Markus Scheidgen, updated 2023-12-21
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/1516

These are the results from yesterday, same data as before. The numbers are going further down, but a lot of the top errors are still there. Some of them are probably in a not-fixable category, and I have started to ignore those. In this issue, these are ignored:
```
Trajectory is not ASAP.
```
Let me know if other problems should be ignored (because unfixable) as well.

---
**#1299 Cleaning the normalizers** — Nathan Daelman, updated 2023-12-21
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/1299

This is a super-issue for collecting minor corrections to the normalizers.
- [ ] @ladinesa stated that some of the imports in `nomad/normalizing/method.py` from `nomad.datamodel.results` are misleading, since they do not overwrite those in `Method`, and should be imported from there.
- [ ] the `normalize()` method in `nomad/normalizing/normalizer.py/SystemBasedNormalizer` strongly relies on the `__normalize_system` in `nomad/normalizing/system.py/SystemNormalizer` via several interlinking methods. This connection could be made more direct and traceable.
- [ ] the `MethodNormalizer` normalizes the `method` sections both in `run` and `results`. The distinction could be made clearer by separating these into 2 methods that are called by `normalize()`.
- [ ] as suggested by @ladinesa, simplify the `kpoints` terminology in `run.method.eigenvalues`.

---
**#604 Optimize parser matching** — Markus Scheidgen, updated 2023-12-21
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/604

Parser matching becomes increasingly expensive as more and more parsers are added. An upload with a very large number of files (>100k) takes more time to match parsers than its processing timeout allows.
We need to optimise parser matching:
- consolidate regex patterns into a single regex. One for paths, one for contents.
- read file contents (the first few kB) for regex matching in parallel
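A minimal sketch of the consolidation idea (the parser names and patterns here are made up for illustration): join all per-parser patterns into one alternation with named groups, so each path or file head is scanned only once:

```python
import re

# Hypothetical per-parser mainfile path patterns.
parser_patterns = {
    'cp2k': r'.*\.out$',
    'vasp': r'.*vasprun\.xml$',
    'ams': r'.*\.rkf$',
}

# One combined regex; the name of the matching group identifies the parser.
combined = re.compile(
    '|'.join(f'(?P<{name}>{pattern})' for name, pattern in parser_patterns.items())
)

def match_parser(path: str):
    m = combined.match(path)
    return m.lastgroup if m else None  # name of the group that matched

assert match_parser('upload/calc/vasprun.xml') == 'vasp'
```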
---
**#784 NOMAD Oasis (GUI)** — Markus Scheidgen, updated 2023-12-21
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/784

The NOMAD Oasis needs to be different:
- [ ] revise the used "terms" in menus, breadcrumbs, tabs, etc. Potentially for all of NOMAD?
- [x] different logo and/or colors to make it distinguishable. Maybe prominently show an installation name.
- [ ] oasis installations register with central installation
- [ ] show network, allow navigation between installations.
- [ ] "publish": bundle/archive, allow metadata index, allow copy, trigger copy
- [ ] no embargo on Oasis

---
**#641 Reducing the "resolution" on large visualization data** — Markus Scheidgen, updated 2023-12-21
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/641

In this example here, DOS and BS are highly resolved. The browser takes a very long time to render all the resulting SVG elements in the charts. We need a mechanism that reduces the resolution if necessary.
https://nomad-lab.eu/prod/rae/gui/entry/id/XLDSR5laTRGdUFQFQ9e09Q/dTbzAFHYfZZ7DlVKJxOMPJ2cqYSV
This does not have to be a GUI problem. Maybe we decide to do something during processing and have a reduced normalised version, e.g. in section results. Maybe we do this on the fly in the UI.
This is not just a DOS, BS problem, as it may apply to all kinds of visualised data.
For crystals we have a threshold at which we stop rendering. This is a possibility, but not ideal, especially if there should be a way to canonically reduce the resolution (e.g. only draw every other value, etc.).
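A minimal sketch of the "only draw every other value" style of canonical reduction (plain numpy decimation; note it can miss sharp peaks, which smarter schemes like min/max binning or LTTB would preserve):

```python
import numpy as np

def decimate(x, y, max_points=1000):
    """Reduce a curve to at most max_points by keeping every k-th sample."""
    n = len(x)
    if n <= max_points:
        return x, y
    step = int(np.ceil(n / max_points))
    return x[::step], y[::step]

# e.g. a DOS curve with 50k energy points, reduced before plotting
energies = np.linspace(-10, 10, 50_000)
dos = np.exp(-energies**2)
e_red, dos_red = decimate(energies, dos)  # <= 1000 points to render
```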
---
**#1189 Cannot delete uploads if .volumes lies on an nfs share** — Florian Dobener, updated 2023-12-21
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/1189

I created a NOMAD deployment at FHI via an adapted docker-compose file. For backup and storage reasons we moved the .volumes folder to an NFS share. However, this makes it impossible to delete any uploads via the NOMAD GUI. The upload persists and the status changes to `Process delete_upload failed: OSError: [Errno 39] Directory not empty: 'archive'`. A second try changes the status again to `Process delete_upload failed: OSError: [Errno 16] Device or resource busy: '.nfs000000000c01329400000001'`.

I tried it via `nomad admin uploads rm`, too. It shows a similar error message:
```
1 uploads selected, deleting ...
ERROR nomad.cli 2022-11-22T07:29:36 could not delete files
- exception: Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/nomad/cli/admin/uploads.py", line 424, in delete_upload
upload_files.delete()
File "/usr/local/lib/python3.7/site-packages/nomad/files.py", line 671, in delete
shutil.rmtree(self.os_path)
File "/usr/local/lib/python3.7/shutil.py", line 494, in rmtree
_rmtree_safe_fd(fd, path, onerror)
File "/usr/local/lib/python3.7/shutil.py", line 432, in _rmtree_safe_fd
_rmtree_safe_fd(dirfd, fullname, onerror)
File "/usr/local/lib/python3.7/shutil.py", line 452, in _rmtree_safe_fd
onerror(os.unlink, fullname, sys.exc_info())
File "/usr/local/lib/python3.7/shutil.py", line 450, in _rmtree_safe_fd
os.unlink(entry.name, dir_fd=topfd)
OSError: [Errno 16] Device or resource busy: '.nfs000000000c01329400000001'
- exception_hash: JM34EDxYaKn6rnUZaYL8erVJcKaF
- nomad.commit: 88fba0386
- nomad.deployment: oasis
- nomad.service: nomad_oasis_app
- nomad.version: 1.1.5
```
This is the error message after a first try of deletion with `nomad admin uploads rm`:
```
ERROR nomad.cli 2022-11-22T07:32:02 could not delete files
- exception: Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/nomad/cli/admin/uploads.py", line 424, in delete_upload
upload_files.delete()
File "/usr/local/lib/python3.7/site-packages/nomad/files.py", line 671, in delete
shutil.rmtree(self.os_path)
File "/usr/local/lib/python3.7/shutil.py", line 494, in rmtree
_rmtree_safe_fd(fd, path, onerror)
File "/usr/local/lib/python3.7/shutil.py", line 436, in _rmtree_safe_fd
onerror(os.rmdir, fullname, sys.exc_info())
File "/usr/local/lib/python3.7/shutil.py", line 434, in _rmtree_safe_fd
os.rmdir(entry.name, dir_fd=topfd)
OSError: [Errno 39] Directory not empty: 'archive'
- exception_hash: i4sNv8DQkHjsy-qHI4I0XBUjcmeu
- nomad.commit: 88fba0386
- nomad.deployment: oasis
- nomad.service: nomad_oasis_app
- nomad.version: 1.1.5
```
Running the CLI removes the uploads from the NOMAD GUI, but they still persist in `.volumes/fs`.
This seems to be related to NFS creating hidden files while files are still open (see https://bugzilla.redhat.com/show_bug.cgi?id=1362667). Therefore `rmdir` is not able to remove the folder. So probably the access to the upload should be closed before deleting (if possible?). Alternatively, `os.remove(...)` could be used recursively to delete these hidden NFS files before calling `os.rmdir(...)`.
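A minimal sketch of such a workaround (assuming the `.nfs*` silly-rename files disappear once the last open handle is released, so a short retry loop is enough):

```python
import shutil
import time

def rmtree_nfs(path, retries=5, delay=1.0):
    """shutil.rmtree with retries for NFS silly-rename (.nfs*) leftovers."""
    for attempt in range(retries):
        try:
            shutil.rmtree(path)
            return
        except OSError:
            # .nfs* files vanish once the process holding them lets go;
            # wait briefly and try again instead of failing outright.
            if attempt == retries - 1:
                raise
            time.sleep(delay)
```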
---
**#1172 Entry_id missing from metadata** — Sascha Klawohn, updated 2023-12-21
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/1172

I found some entries that show an `entry_id` in the GUI but not in the metadata: [T5VGijnmKSqVgEPeQPwVaPIOQiNU](https://nomad-lab.eu/prod/v1/staging/gui/search/entries/entry/id/T5VGijnmKSqVgEPeQPwVaPIOQiNU) (and other entries in that upload 2BDahabrRruTFIuNtEWuZg).

How do they differ? Is there a workaround to get the entry_id?

---
**#1498 Bundle format test and documentation** — Markus Scheidgen, updated 2023-12-21
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/1498

We should try our bundle import and export CLI functionality on our first two example uploads. David implemented this, but I am not sure what we actually have here. I want to see what this can already do and what it cannot. We need this as a starting point for further discussions on data transfer.
As a side effect, please extend the documentation with a brief "how to import and export uploads". This should cover the commands and a rundown of the bundle format, e.g. what files it contains, what's in the nomad.json, etc. We can put this under "NOMAD Oasis" for now.

---
**#1181 Bundle format and functionality** — Markus Scheidgen, updated 2023-12-21
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/1181

An upload "bundle" is a data artifact that contains everything necessary to interpret the data of a single upload.
It needs to contain:
- raw files (optionally)
- archive (optionally based on a required)
- all upload and entry metadata
- all related DOI records
- all necessary schema
- a directory of all external references
- hashes/checksums, etc. ideally following some long-term archive standard
We should use these "bundles":
- to internally store immutable/published uploads
- transfer uploads from installation to installation
- import/export functionality
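A hypothetical sketch of the metadata such a bundle might carry, purely illustrative (none of these keys are the actual nomad.json format):

```python
# Illustrative bundle manifest structure -- not the real nomad.json keys.
bundle_manifest = {
    'upload': {...},               # upload metadata
    'entries': [...],              # all entry metadata
    'dois': [...],                 # related DOI records
    'schemas': [...],              # schemas needed to interpret the archives
    'external_references': [...],  # directory of all external references
    'checksums': {...},            # hashes, ideally per a long-term archive standard
}
```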
---
**#319 A more flexible and more celery-tonic processing module** — Markus Scheidgen, updated 2023-12-21
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/319

This is a pre-requisite for #251.
@himanel1 already started the implementation. I think this looks good and retains all features we had before. Here are some further todos that I see:
@mscheidg
- [x] add a redis to helm
- [ ] run some large scale processing
@himanel1
We should do the refactoring all the way. I think you should organize the submodules based on the processed entities Calc and Upload rather than trying to separate mongo from celery. A typical submodule structure that we also use in other modules would be:
* `processing/__init__.py` - Only docs and imports to expose to other modules
* `processing/processing.py` - All celery setup stuff
* `processing/common.py` - Common stuff: the mongoengine base, our custom celery tasks/request, shared constants, etc., Pipeline, PipelineContext, Stage, empty_task
* `processing/calc.py` - Including the "celery task" `comp_process` (don't like the name, btw.)
* `processing/upload.py` - Including upload_cleanup, pipelines, get_pipeline, run_pipline
* I think we can move the tests into a single module, or rename test_base->test_common, test_data->test_upload
upload can depend on calc; upload and calc can depend on common; all can depend on processing; no other dependencies between submodules should be necessary
In the future we could think about replacing @process, current_process, and process_status with celery. But at the moment it's very convenient to use a mongodb query to check on the processing status of all entities. I feel celery wasn't really designed with persistent tasks in mind. Also, we would need to be far more rigid with the celery infrastructure and add persistence to rabbitmq and redis.

---
**#702 Exclude Calc from queuing (and slotting)** — Markus Scheidgen, updated 2023-12-21
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/702

The Calc processing does not need any queuing (or slotting). Having to store additional keys, do additional updates, and most importantly add the additional risk of failure, should not be applied to 12M+ entries.
If we add queuing (and slotting) as additional features on top of Base, it should happen in another extending base class that is not used by Calc. I understand that `@process` does need to behave differently for those classes, but putting an "if" in `@process` is less harmful than the alternative. The class can be determined from the `self` argument (or probably even from the function object).
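A minimal sketch of that class check inside the decorator (`Base`, `QueuedBase`, and the queuing hook are placeholders, not the real `@process` implementation):

```python
import functools

class Base:              # placeholder for the mongoengine process base class
    pass

class QueuedBase(Base):  # hypothetical subclass that opts into queuing/slotting
    pass

def process(func):
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        # The "if": only do queuing bookkeeping for classes that extend
        # QueuedBase, determined from the self argument at call time.
        if isinstance(self, QueuedBase):
            pass  # enqueue/slot handling would go here
        return func(self, *args, **kwargs)
    return wrapper
```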
---
**#1665 Processing MOF data problems** — Markus Scheidgen (assignee: Dinga Wonanke), updated 2023-12-21
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/1665

entry_id: `USJgCEf2AjGWqJosqsb3WdEhc3ZF`, upload_id: `fTR0JNA9T4aMfpopTSKHhQ`,\
parser: `parsers/ams`, normalizer: `ResultsNormalizer`\
[process installation](https://nomad-lab.eu/prod/v1/process/gui/entry/id/USJgCEf2AjGWqJosqsb3WdEhc3ZF),
[production installation](https://nomad-lab.eu/prod/v1/gui/entry/id/USJgCEf2AjGWqJosqsb3WdEhc3ZF)
```
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/nomad/processing/data.py", line 1228, in normalizing
normalizer(self._parser_results).normalize(logger=logger)
File "/usr/local/lib/python3.9/site-packages/nomad/normalizing/results.py", line 112, in normalize
self.normalize_run(logger=self.logger)
File "/usr/local/lib/python3.9/site-packages/nomad/normalizing/results.py", line 205, in normalize_run
properties, conv_atoms, wyckoff_sets, spg_number = self.properties(repr_system, repr_symmetry)
File "/usr/local/lib/python3.9/site-packages/nomad/normalizing/results.py", line 804, in properties
dos_electronic = self.resolve_dos(['run', 'calculation', 'dos_electronic'])
File "/usr/local/lib/python3.9/site-packages/nomad/normalizing/results.py", line 320, in resolve_dos
if valid_array(energies) and valid_array(values):
File "/usr/local/lib/python3.9/site-packages/nomad/normalizing/results.py", line 84, in valid_array
return array is not None and len(array) > 0
File "/usr/local/lib/python3.9/site-packages/pint/quantity.py", line 1816, in __len__
return len(self._magnitude)
TypeError: object of type 'numpy.float64' has no len()
```
- [x] **20** entries in 2 uploads: *Unexpected error during normalizing*,\
entry_id: `PEFlVu7-W1-yrdRx5rGlUADNHyTU`, upload_id: `fTR0JNA9T4aMfpopTSKHhQ`,\
parser: `parsers/ams`, normalizer: `PorosityNormalizer`\
[process installation](https://nomad-lab.eu/prod/v1/process/gui/entry/id/PEFlVu7-W1-yrdRx5rGlUADNHyTU),
[production installation](https://nomad-lab.eu/prod/v1/gui/entry/id/PEFlVu7-W1-yrdRx5rGlUADNHyTU)
```
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/nomad/normalizing/normalizer.py", line 153, in __normalize_system
return self._normalize_system(system, representative)
File "/usr/local/lib/python3.9/site-packages/nomad/normalizing/normalizer.py", line 80, in _normalize_system
return self.normalize_system(system, is_representative)
File "/usr/local/lib/python3.9/site-packages/nomad/normalizing/porosity.py", line 101, in normalize_system
if len(indices_in_parent) > 0:
TypeError: object of type 'numpy.int64' has no len()
```
- [x] **20** entries in 2 uploads: *process failed*,\
entry_id: `PEFlVu7-W1-yrdRx5rGlUADNHyTU`, upload_id: `fTR0JNA9T4aMfpopTSKHhQ`,\
parser: `parsers/ams`, normalizer: `PorosityNormalizer`\
[process installation](https://nomad-lab.eu/prod/v1/process/gui/entry/id/PEFlVu7-W1-yrdRx5rGlUADNHyTU),
[production installation](https://nomad-lab.eu/prod/v1/gui/entry/id/PEFlVu7-W1-yrdRx5rGlUADNHyTU)
```
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/nomad/processing/data.py", line 1228, in normalizing
normalizer(self._parser_results).normalize(logger=logger)
File "/usr/local/lib/python3.9/site-packages/nomad/normalizing/normalizer.py", line 179, in normalize
self.__normalize_system(repr_sys, True, logger)
File "/usr/local/lib/python3.9/site-packages/nomad/normalizing/normalizer.py", line 165, in __normalize_system
raise e
File "/usr/local/lib/python3.9/site-packages/nomad/normalizing/normalizer.py", line 153, in __normalize_system
return self._normalize_system(system, representative)
File "/usr/local/lib/python3.9/site-packages/nomad/normalizing/normalizer.py", line 80, in _normalize_system
return self.normalize_system(system, is_representative)
File "/usr/local/lib/python3.9/site-packages/nomad/normalizing/porosity.py", line 101, in normalize_system
if len(indices_in_parent) > 0:
TypeError: object of type 'numpy.int64' has no len()
```
- [ ] **1** entry in 1 upload: *process failed*,\
entry_id: `GWiO4zAGtvYGorzNJW9XdpPTArpE`, upload_id: `fTR0JNA9T4aMfpopTSKHhQ`,\
parser: `parsers/ams`, normalizer: ``\
[process installation](https://nomad-lab.eu/prod/v1/process/gui/entry/id/GWiO4zAGtvYGorzNJW9XdpPTArpE),
[production installation](https://nomad-lab.eu/prod/v1/gui/entry/id/GWiO4zAGtvYGorzNJW9XdpPTArpE)
```
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/nomad/processing/data.py", line 1203, in parsing
parser.parse(self.mainfile_file.os_path, self._parser_results, logger=logger, **kwargs)
File "/usr/local/lib/python3.9/site-packages/nomad/parsing/parser.py", line 398, in parse
self.mainfile_parser.parse(mainfile, archive, logger)
File "/usr/local/lib/python3.9/site-packages/electronicparsers/ams/parser.py", line 1381, in parse
self.parse_configurations()
File "/usr/local/lib/python3.9/site-packages/electronicparsers/ams/parser.py", line 1337, in parse_configurations
sec_method = parse_method(geometry_opt)
File "/usr/local/lib/python3.9/site-packages/electronicparsers/ams/parser.py", line 1257, in parse_method
sec_atom_param.orbitals = [str(v) for v in val[0]]
TypeError: 'NoneType' object is not subscriptable
```
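Both `has no len()` failures above come from checks like `valid_array` receiving a bare numpy scalar instead of an array. A minimal sketch of a guard, not the actual fix in `results.py`:

```python
import numpy as np

def valid_array(array) -> bool:
    """True for non-empty array-likes; tolerates bare numpy scalars."""
    if array is None:
        return False
    # np.atleast_1d wraps scalars (e.g. np.float64) so len() is defined.
    return len(np.atleast_1d(array)) > 0
```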
---
**#1446 Reprocessing tasks** — Lauri Himanen (assignee: Markus Scheidgen), updated 2023-12-21
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/1446

We are performing a thorough reprocessing due to significant changes in the metadata and normalizers. Here is a breakdown of the different issues related to this:
- [x] Band gap information was moved to its own separate location that does not require a band structure to be present (#1182)
- [x] Check bandgap in GW workflow entries (#1459).
- [x] Chemical formulas were refactored: new formula type (`chemical_formula_iupac`) and fixes in the Hill formula and reduced formula creation (#1260)
- [x] The subsystem extraction with MatID has been improved. The performance was improved (#1177) and subsystem reporting is extended to cover also finite systems (#1427)
- [x] Fix inconsistencies between the metainfo names and what is shown by the GUI:
- [x] Jacob's ladder vs. `xc_functional_type` (#1461)
- [x] Dimensionality vs. `structural_type` (#1466)
- [x] Hybrids vs xc_functional_type=hybrid (#1461)
- [x] Jacob's ladder: what to store in DFT section and what to store in GW? (#1461)
- [ ] Add new composition information that contains `mass_fraction` and `atomic_fraction` (#1409)
- [ ] Polymorphism support (#1432)
- [x] Workflow2 -> Workflow (#1419)
- [x] Basis sets (plane wave, lapw) (#1449)
- [x] Refactor formula shown in the GUI (#1178).
- [x] Change `entry_type` and `entry_name` for simulations (#1178).
- [ ] Use the correct timestamp when creating entry hashes (RFC3161ng). Only create timestamp once when published, not after reprocessing?
- [ ] Unify reference types (@thchang could check this?)
- [x] Allow processing to read metadata from old archives when reprocessing for a new version (#1460).
# Trial uploads to reprocess
## Classification
- [xmzw_2XqTmiwpN97GkLYlg](https://nomad-lab.eu/prod/v1/gui/upload/id/xmzw_2XqTmiwpN97GkLYlg): Bimetallic surfaces
- [U_zHhUjsQA6B3PkqEIAlpA](https://nomad-lab.eu/prod/v1/gui/upload/id/U_zHhUjsQA6B3PkqEIAlpA): Nanoparticle surfaces
- [PGDcD6BcSdqBuYaa1Mq2Uw](https://nomad-lab.eu/prod/v1/gui/upload/id/PGDcD6BcSdqBuYaa1Mq2Uw): Bulk structures with grain boundaries
## BeyondDFT
- [WHmwXEBfRUOkfI2ZY2PeCA](https://nomad-lab.eu/prod/v1/staging/gui/dataset/id/BvJQeuDQTNy2MrDz9U9uaA): Core-level GW data for amorphous carbon (FHI-aims)
- [0ZAxF7XxSQm5T1fLonwbyg](https://nomad-lab.eu/prod/v1/staging/gui/dataset/id/hZXp3HvRSIG-B3Mln_6q9w): Hybrid functionals and one shot GW calculations of HaPs and PbI2 electronic structure (exciting)
- [aUKBqsEOTmq69kFe5R_nOg](https://nomad-lab.eu/prod/v1/staging/gui/dataset/id/NxeR8RXBSyGdxq-c6F7tDQ): OE62 dataset: results of G0W0@PBE0 (vacuum) calculations with def2-QZVP basis set (FHI-aims)
- [XtmKXPo1QvKjqUX3I-v_Sg](https://nomad-lab.eu/prod/v1/staging/gui/dataset/id/P9ODmmOMQJibJM4dyD8NIw): OE62 dataset: results of G0W0@PBE0 (vacuum) calculations with def2-TZVP basis set (FHI-aims)
- [1eMMOR07QTOUNDXy_7VlkQ](https://nomad-lab.eu/prod/v1/gui/dataset/id/uRQvzYD2Rqe4QvFBCPtZgw): HT & LT (BSE) (exciting)
- [mr5PRdbVQUm-d7awz3Q9Uw](https://nomad-lab.eu/prod/v1/gui/dataset/id/xHXf8ilBRk2q16iVaBMYJw): Ti-O-10 Simulated XANES
- [6cZnVKTLRIq27IATPtQNmQ](https://nomad-lab.eu/prod/v1/gui/dataset/id/nONsMLRzQMijt4Ipl4WDXw): GW100-final

---
**#821 File type in metainfo** — Markus Scheidgen, updated 2023-12-21
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/821

Add an actual file type to the metainfo. Processing should check if the file exists and produce warnings during processing.

---
**#1791 Problem in published Dataset related to tabulartree** — Amir Golparvar (assignee: Amir Golparvar), updated 2023-12-07
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/1791

1. The dataset `https://nomad-lab.eu/prod/v1/staging/gui/dataset/id/UfbrNqwOQrOPs3SURIIhqw` has some entries whose `data.archive.json` files contain illegal `NaN` values for some fields. This causes a `not a valid JSON` error in the frontend.
It is probably part of a bigger problem, as in Python, for example, one can export a JSON-formatted string with these illegal `NaN` values in this way:
``` python
import json

test = {"test": float('nan')}
json.dumps(test)  # returns '{"test": NaN}' -- not valid JSON
```
but this is not a valid format of a JSON string. All the entries in the upload `wZzlkn24S-a8O1s33owOOg` have this problem.
After replacing the NaN values in the entries with null, a reprocessing is needed.
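A minimal sketch of such a replacement (one way to do it; `allow_nan=False` is the stdlib switch that makes `json.dumps` reject these values in the first place):

``` python
import json
import math

def nan_to_null(obj):
    """Recursively replace NaN floats with None so the JSON is valid."""
    if isinstance(obj, float) and math.isnan(obj):
        return None
    if isinstance(obj, dict):
        return {k: nan_to_null(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [nan_to_null(v) for v in obj]
    return obj

data = {"test": float('nan')}
json.dumps(nan_to_null(data), allow_nan=False)  # '{"test": null}'
```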
2. The upload with upload_id `wZzlkn24S-a8O1s33owOOg` has a problematic schema in which there exist duplicate quantities under the same section (for example the `surface_content_of_O_(atom_%)_O` quantity).
3. There are 4 schemas in this dataset (XRF, XRD, TPRO and BET) but **none** of them populate `tabulartree`.
4. Some other entries (for example from upload_id `qegdLWpzQ7yAFmYikFNydg`) complain that the HistoryCard cannot be rendered, which is probably because the processed data is old and does not contain `section_defs` in its metainfo. Reprocessing the entries should fix this. But in general, it is better to correct line 73 of the HistoryCard from `!index.section_defs.some` to `!index.section_defs?.some`.

---
**#1396 Parser matching is too slow** — Markus Scheidgen (assignee: Theodore Chang), updated 2023-08-10
https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/1396

Recently we had several incidents with larger uploads that were hitting timeouts during parser matching. Most likely, opening the files and reading the initial few kB for magic and regular expression matching is too slow. We should investigate whether running the matching with asyncio on many files at the same time can solve the problem. The GPFS should be fast enough; we just need to parallelize to circumvent latency.
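A minimal sketch of the parallel head-read idea (blocking file I/O dispatched to threads via asyncio; `HEAD_SIZE` and the file list are illustrative):

```python
import asyncio

HEAD_SIZE = 8 * 1024  # read only the first few kB per file

def read_head(path: str) -> bytes:
    with open(path, 'rb') as f:
        return f.read(HEAD_SIZE)

async def read_heads(paths: list[str]) -> list[bytes]:
    # Issue many reads concurrently to hide per-file GPFS latency;
    # asyncio.to_thread runs each blocking read in the default executor.
    return await asyncio.gather(*(asyncio.to_thread(read_head, p) for p in paths))

# heads = asyncio.run(read_heads(all_upload_files))  # then regex-match each head
```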