This page lists the projects related to the Repository/Archive:
- Mirror for Repository in China
- Main goal: Create a mirror for the Repository in China
- Status:
- Synchronization of data:
- Synchronization of user database:
- People involved:
- Thomas Zastrow.
- Fawzi Mohamed.
- IT responsible in China (to be assigned)
- Xinguo Ren: contact person in China.
- Alfonso: contact person at FHI.
- Raphael Ritz: contact person at MPCDF.
- More info:
- Code from Thomas Zastrow: [code](https://gitlab.mpcdf.mpg.de/NoMaD/NomadRepositoryReplication)
- Mirror for Archive in China
- Main goal: Create a mirror for the Archive in China
- Status: Three steps have been defined:
- Send a summary with the technical specifications needed to recreate the data server (http://data.nomad-coe.eu) in China.
- Create a mirror between the server located in Europe (http://data.nomad-coe.eu) and the server in China (e.g. http://data.nomad-coe.cn).
- Install the NOMAD API in China, so users can explore the data in a friendlier environment.
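The data-synchronization step between the European server and the Chinese mirror could, for example, be driven by rsync. The sketch below only builds and runs the command line; the hostnames, paths, and flags are illustrative assumptions, not the actual server setup:

```python
import subprocess

def build_sync_command(source, dest, dry_run=False):
    """Build an rsync invocation that mirrors `source` into `dest`.

    -a preserves permissions and timestamps, -z compresses in transit,
    --delete drops files that disappeared on the source side.
    """
    cmd = ["rsync", "-az", "--delete"]
    if dry_run:
        cmd.append("--dry-run")
    cmd += [source, dest]
    return cmd

def sync(source, dest):
    # Run the mirror transfer; raises CalledProcessError on failure.
    subprocess.run(build_sync_command(source, dest), check=True)

# Placeholder endpoints -- the real hosts and paths are still to be decided.
cmd = build_sync_command("rsync://data.nomad-coe.eu/archive/",
                         "/nomad/mirror/archive/", dry_run=True)
```

Running the same command periodically (e.g. from cron) would keep the mirror converging toward the source.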
- People involved:
- Someone working in China (to be assigned)
- Someone working at FHI (to be assigned)
- Xinguo Ren: contact person in China.
- Alfonso: contact person at FHI.
- More info:
- Materialsproject.org (DONE)
- Main goal: To be defined
- Status: undefined
- People involved: Fawzi (?), Alfonso
- More info:
- Normalizers
- Main goal: Create normalizers to enrich the parsed data
- Status: Collecting information about people working on normalizers.
- People involved:
- Normalizer developers: Lauri Himanen, Daria Tomecka, Jungho, Danilo
- Normalizer runs: Alfonso
- To-Do:
- Define how to create normalizers.
- Define hierarchy for normalizers:
- Level 0: normalizers that depend only on meta-data produced by parsers.
- Level 1: normalizers that depend on meta-data produced by other normalizers.
- More info:
- nomad-lab/nomad-lab-base/normalize (to be documented)
- nomad-lab/nomad-lab-base/core/src/main/scala/eu/nomad_lab/normalize (to be documented)
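The two-level hierarchy above can be sketched as a runner that executes Level 0 normalizers (which only need parser meta-data) before Level 1 normalizers (which need other normalizers' output). All names and the toy normalizers below are illustrative, not the actual NOMAD interfaces:

```python
def run_normalizers(normalizers, parsed):
    """Run normalizers level by level, each adding to a shared result dict."""
    results = dict(parsed)  # start from the parser meta-data
    for level in (0, 1):    # Level 0 first, then Level 1
        for name, norm_level, func in normalizers:
            if norm_level == level:
                results.update(func(results))
    return results

# Two toy normalizers: a Level 0 one derived only from parsed meta-data,
# and a Level 1 one derived from the Level 0 normalizer's output.
normalizers = [
    ("cell_volume", 0, lambda d: {"cell_volume": d["a"] * d["b"] * d["c"]}),
    ("density", 1, lambda d: {"density": d["mass"] / d["cell_volume"]}),
]
out = run_normalizers(normalizers, {"a": 2.0, "b": 2.0, "c": 2.0, "mass": 16.0})
```

With more levels this generalizes to a topological order over the normalizers' dependencies.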
- Normalizer: DOS
- Main goal: Define the DOS normalizer
- Status: in progress
- To-Do:
- Formal definition of the problem.
- Input: meta-data and other values.
- Calculation: explicit definition of the calculation.
- Output: define output.
- People involved: Alfonso, ...
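A common way to compute a DOS from parsed eigenvalues is Gaussian broadening. The sketch below uses that textbook formula as a stand-in while the formal input/calculation/output definition is still open; the broadening width `sigma` is an illustrative parameter:

```python
import math

def dos(energies, eigenvalues, sigma=0.1):
    """Gaussian-broadened density of states.

    Input: sampling energies and the eigenvalues extracted by a parser.
    Output: the DOS value at each sampling energy.
    """
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return [
        sum(norm * math.exp(-((e - ev) ** 2) / (2.0 * sigma ** 2))
            for ev in eigenvalues)
        for e in energies
    ]

# A single eigenvalue at 0.0 yields the peak of one Gaussian at e = 0.0
# and essentially zero far away from it.
values = dos([0.0, 1.0], [0.0], sigma=0.1)
```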
- Parsers
- Main goal: Create parsers for each code to extract the basic meta-data
- Status: There are parsers for 32 different codes (check the status of each)
- People involved:
- Parser developers: list of people to be compiled
- Parser runs: Danilo (50 %), Alfonso
- More info:
- https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-lab-base/wikis/Parsing
- https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-lab-base/wikis/how-to-write-a-parser
- https://gitlab.mpcdf.mpg.de/nomad-lab/public-wiki/wikis/ParserAssignment
- nomad-lab-base/core/src/main/scala/eu/nomad_lab/parsers (to be documented)
- nomad-lab-base/core/src/main/scala/eu/nomad_lab/parsing_queue (to be documented)
- nomad-lab-base/parsers (to be documented)
- Raw-data generation:
- Main goal: Preprocess user data so it can be processed by the parsers
- Status:
- Some of the steps still have to be automated (e.g. splitting big files).
- The interaction with the repository database will be replaced by Elasticsearch.
- Check the classification of Open Access / Non Open Access.
- People Involved: Jungho, Alfonso
- More Info:
- https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-lab-base/wikis/raw-data-description
- https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-lab-base/wikis/raw-data-generation
- nomad-lab-base/repo (to be documented)
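The file-splitting step that still has to be automated could look like the sketch below; the chunk size and the `.partNNN` naming convention are assumptions, not the project's actual scheme:

```python
import os
import tempfile

def split_file(path, chunk_size=512 * 1024 * 1024):
    """Split a big raw-data file into fixed-size chunks (path.part000, ...)."""
    parts = []
    with open(path, "rb") as src:
        index = 0
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            part = "%s.part%03d" % (path, index)
            with open(part, "wb") as dst:
                dst.write(chunk)
            parts.append(part)
            index += 1
    return parts

# Demo on a tiny file: 5 bytes split into 2-byte chunks gives 3 parts.
tmp = os.path.join(tempfile.mkdtemp(), "raw.dat")
with open(tmp, "wb") as f:
    f.write(b"abcde")
parts = split_file(tmp, chunk_size=2)
```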
- Repository/Archive minimal workflow
- Main goal: Create a continuous workflow for Upload -> RawData -> Parsed -> Normalizer.
- Status:
- The bottlenecks have been identified.
- Automation of the whole flow is in progress.
- People involved: Jungho, Danilo (50 %), Alfonso
- More info:
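The Upload -> RawData -> Parsed -> Normalizer chain can be sketched as functions composed into one pipeline. The stage bodies below are placeholders; only the flow of state from one stage to the next is the point:

```python
# Each stage takes the accumulated state dict and returns it enriched.
def upload(files):    return {"raw": files}
def raw_data(state):  state["raw"] = [f for f in state["raw"] if f]; return state
def parse(state):     state["parsed"] = [{"file": f} for f in state["raw"]]; return state
def normalize(state): state["normalized"] = [dict(p, ok=True) for p in state["parsed"]]; return state

def run_pipeline(files):
    """Chain the stages; each one consumes the previous stage's output."""
    state = upload(files)
    for stage in (raw_data, parse, normalize):
        state = stage(state)
    return state

result = run_pipeline(["calc1.out", "calc2.out"])
```

Making this continuous would mean triggering `run_pipeline` automatically on every new upload instead of running the stages by hand.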
- Extend the minimal workflow
- Main goal: Extend the workflow so that Parquet files, JSON files, and Elasticsearch entries (query and stats) are generated after normalization.
- Status: In progress
- People involved: Alfonso
- More info:
- Query Project:
- Main goal: Create a query/api to extract information from archive files.
- Status: The API/query is in testing mode. It should be deployed before July 13th.
- To-do:
- Fix failing tests.
- Add more quantities with values (define the most important ones).
- Add a link "Help us to improve the archive" to detect badly formatted data.
- Allow bulk queries.
- People involved: Benjamin, Alfonso
- More info:
- Add link to the Query notebook and documentation from Benny.
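For the "allow bulk queries" item, one simple approach is to batch many single queries into one request payload, so the API is hit once per batch instead of once per query. The payload shape and the quantity name below are assumptions, not the deployed API:

```python
def batch(items, size):
    """Yield successive groups of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# 25 hypothetical single queries grouped into payloads of 10.
queries = [{"quantity": "band_gap", "calc_id": i} for i in range(25)]
payloads = [{"queries": group} for group in batch(queries, 10)]  # 3 requests
```

Each payload would then be sent as one request, with the server answering all queries in the group together.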
- Data analysis of normalized data using emma:
- Main goal: Create a DSL to analyze Archive data using the emma library.
- Status: under development
- People involved: Felix (TU), Alfonso, more people
- More info:
- nomad-lab-base/parquet (to be documented)
- [emma example](https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-emma-usecase)
- Create a Hadoop cluster to migrate all the data:
- Main goal: Move all our data to a Hadoop cluster to avoid GPFS limitations.
- Status: A Hadoop cluster has been installed on nomad-team using Apache Ambari (to be tuned)
- People involved: Alfonso
- More info:
- Statistics:
- Main goal: Define metrics and collect data to generate statistical information about NOMAD Archive
- Status: In progress
- Metrics definitions verified to start the ingestion (by Luca, Angelo, Fawzi, Alfonso)
- To-Do:
- Generate updated statistics and update the plots and tables on the Archive web page.
- Add a step in the workflow to generate statistics after normalization.
- People involved: Alfonso
- More info:
- Create a new Stats webpage using Elasticsearch + Kibana:
- Main goal: Create an Elasticsearch index containing the statistical information and display the results using Kibana.
- Status: Some of the data are already created for the Query project; the data collected by the stats.py scripts used in the old version still have to be integrated. A dashboard with the desired quantities has to be designed.
- People involved: Alfonso
- More info:
- [Kibana](https://www.elastic.co/products/kibana)
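The statistics index could hold one document per code with its calculation count, serialized as a bulk-style newline-delimited payload for ingestion. The index name, entry format, and field names below are assumptions, not the deployed schema:

```python
import json
from collections import Counter

# Toy entries standing in for the per-calculation records of the Archive.
entries = [
    {"code": "FHI-aims"},
    {"code": "VASP"},
    {"code": "FHI-aims"},
]

# Aggregate: one statistics document per code.
counts = Counter(e["code"] for e in entries)
docs = [{"_index": "nomad-stats", "code": code, "n_calcs": n}
        for code, n in sorted(counts.items())]

# Newline-delimited JSON, as the Elasticsearch bulk API expects for its payload lines.
lines = "\n".join(json.dumps(d) for d in docs)
```

Kibana would then visualize these documents directly from the index, replacing the hand-generated plots.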
- Repository: Database part
- Status: unknown (to be clarified)
- People involved:
- Thomas
- Raphael
- Jungho
- Fawzi
- Alfonso