This page lists the projects related to the Repository/Archive:
- Mirror for Repository in China
- Main goal: Create a mirror for the Repository in China
- Status:
- Synchronization of data:
- Synchronization of user database:
- People involved:
- Thomas Zastrow.
- Fawzi Mohamed.
- IT responsible in China (to be assigned)
- Xinguo Ren: contact person in China.
- Alfonso: contact person at FHI.
- Raphael Ritz: contact person at MPCDF.
- More info:
- Code from Thomas Zastrow: [code](https://gitlab.mpcdf.mpg.de/NoMaD/NomadRepositoryReplication)
- Mirror for Archive in China
- Main goal: Create a mirror for the Archive in China
- Status: Three steps have been defined:
- Send a summary with the technical specifications needed to recreate the data server (http://data.nomad-coe.eu) in China.
- Create a mirror between the server located in Europe (http://data.nomad-coe.eu) and the server in China (e.g. http://data.nomad-coe.cn).
- Install the NOMAD API in China, so users can explore the data in a friendlier environment.
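The data-synchronization step between the European server and the Chinese mirror could, for example, be driven by rsync. The sketch below only builds and runs the command line; the hostnames, paths, and flags are illustrative assumptions, not the actual server setup:

```python
import subprocess

def build_sync_command(source, dest, dry_run=False):
    """Build an rsync invocation that mirrors `source` into `dest`.

    -a preserves permissions and timestamps, -z compresses in transit,
    --delete drops files that disappeared on the source side.
    """
    cmd = ["rsync", "-az", "--delete"]
    if dry_run:
        cmd.append("--dry-run")
    cmd += [source, dest]
    return cmd

def sync(source, dest):
    # Run the mirror transfer; raises CalledProcessError on failure.
    subprocess.run(build_sync_command(source, dest), check=True)

# Placeholder endpoints -- the real hosts and paths are still to be decided.
cmd = build_sync_command("rsync://data.nomad-coe.eu/archive/",
                         "/nomad/mirror/archive/", dry_run=True)
```

Running the same command periodically (e.g. from cron) would keep the mirror converging toward the source.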
- People involved:
- Someone working in China (to be assigned)
- Someone working at FHI (to be assigned)
- Xinguo Ren: contact person in China.
- Alfonso: contact person at FHI.
- More info:
- Materialsproject.org (DONE)
- Main goal: To be defined
- Status: undefined
- People involved: Fawzi (?), Alfonso
- More info:
- Normalizers
- Main goal: Create normalizers to enrich the parsed data
- Status: Collecting information about people working on normalizers.
- People involved:
- Normalizer developers: Lauri Himanen, Daria Tomecka, Jungho, Danilo
- Normalizer runs: Alfonso
- To-Do:
- Define how to create normalizers.
- Define hierarchy for normalizers:
- Level 0: normalizers that depend only on meta-data produced by parsers.
- Level 1: normalizers that depend on meta-data produced by other normalizers.
- More info:
- nomad-lab/nomad-lab-base/normalize (to be documented)
- nomad-lab/nomad-lab-base/core/src/main/scala/eu/nomad_lab/normalize (to be documented)
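The two-level hierarchy above can be sketched as a runner that executes Level 0 normalizers (which only need parser meta-data) before Level 1 normalizers (which need other normalizers' output). All names and the toy normalizers below are illustrative, not the actual NOMAD interfaces:

```python
def run_normalizers(normalizers, parsed):
    """Run normalizers level by level, each adding to a shared result dict."""
    results = dict(parsed)  # start from the parser meta-data
    for level in (0, 1):    # Level 0 first, then Level 1
        for name, norm_level, func in normalizers:
            if norm_level == level:
                results.update(func(results))
    return results

# Two toy normalizers: a Level 0 one derived only from parsed meta-data,
# and a Level 1 one derived from the Level 0 normalizer's output.
normalizers = [
    ("cell_volume", 0, lambda d: {"cell_volume": d["a"] * d["b"] * d["c"]}),
    ("density", 1, lambda d: {"density": d["mass"] / d["cell_volume"]}),
]
out = run_normalizers(normalizers, {"a": 2.0, "b": 2.0, "c": 2.0, "mass": 16.0})
```

With more levels this generalizes to a topological order over the normalizers' dependencies.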
- Normalizer: DOS
- Main goal: Define the DOS normalizer
- Status: in progress
- To-Do:
- Formal definition of the problem.
- Input: meta-data and other values.
- Calculation: explicit definition of the calculation.
- Output: define output.
- People involved: Alfonso, ...
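A common way to compute a DOS from parsed eigenvalues is Gaussian broadening. The sketch below uses that textbook formula as a stand-in while the formal input/calculation/output definition is still open; the broadening width `sigma` is an illustrative parameter:

```python
import math

def dos(energies, eigenvalues, sigma=0.1):
    """Gaussian-broadened density of states.

    Input: sampling energies and the eigenvalues extracted by a parser.
    Output: the DOS value at each sampling energy.
    """
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return [
        sum(norm * math.exp(-((e - ev) ** 2) / (2.0 * sigma ** 2))
            for ev in eigenvalues)
        for e in energies
    ]

# A single eigenvalue at 0.0 yields the peak of one Gaussian at e = 0.0
# and essentially zero far away from it.
values = dos([0.0, 1.0], [0.0], sigma=0.1)
```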
- Parsers
- Main goal: Create parsers for each code to extract the basic meta-data
- Status: There are parsers for 32 different codes (check the status of each)
- People involved:
- Parser developers: list of people to be compiled
- Parser runs: Danilo (50 %), Alfonso
- More info:
- https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-lab-base/wikis/Parsing
- https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-lab-base/wikis/how-to-write-a-parser
- https://gitlab.mpcdf.mpg.de/nomad-lab/public-wiki/wikis/ParserAssignment
- nomad-lab-base/core/src/main/scala/eu/nomad_lab/parsers (to be documented)
- nomad-lab-base/core/src/main/scala/eu/nomad_lab/parsing_queue (to be documented)
- nomad-lab-base/parsers (to be documented)
- Raw-data generation:
- Main goal: Preprocess user data so it can be processed by the parsers
- Status:
- Some of the steps still have to be automated (e.g. splitting big files).
- The interaction with the repository database will be replaced by Elasticsearch.
- Check the classification of Open Access / Non Open Access.
- People Involved: Jungho, Alfonso
- More Info:
- https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-lab-base/wikis/raw-data-description
- https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-lab-base/wikis/raw-data-generation
- nomad-lab-base/repo (to be documented)
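The file-splitting step that still has to be automated could look like the sketch below; the chunk size and the `.partNNN` naming convention are assumptions, not the project's actual scheme:

```python
import os
import tempfile

def split_file(path, chunk_size=512 * 1024 * 1024):
    """Split a big raw-data file into fixed-size chunks (path.part000, ...)."""
    parts = []
    with open(path, "rb") as src:
        index = 0
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            part = "%s.part%03d" % (path, index)
            with open(part, "wb") as dst:
                dst.write(chunk)
            parts.append(part)
            index += 1
    return parts

# Demo on a tiny file: 5 bytes split into 2-byte chunks gives 3 parts.
tmp = os.path.join(tempfile.mkdtemp(), "raw.dat")
with open(tmp, "wb") as f:
    f.write(b"abcde")
parts = split_file(tmp, chunk_size=2)
```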
- Repository/Archive minimal workflow
- Main goal: Create a continuous workflow for Upload -> RawData -> Parsed -> Normalizer.
- Status:
- The bottlenecks have been identified.
- Automation of the whole flow is in progress.
- People involved: Jungho, Danilo (50 %), Alfonso
- More info:
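The Upload -> RawData -> Parsed -> Normalizer chain can be sketched as functions composed into one pipeline. The stage bodies below are placeholders; only the flow of state from one stage to the next is the point:

```python
# Each stage takes the accumulated state dict and returns it enriched.
def upload(files):    return {"raw": files}
def raw_data(state):  state["raw"] = [f for f in state["raw"] if f]; return state
def parse(state):     state["parsed"] = [{"file": f} for f in state["raw"]]; return state
def normalize(state): state["normalized"] = [dict(p, ok=True) for p in state["parsed"]]; return state

def run_pipeline(files):
    """Chain the stages; each one consumes the previous stage's output."""
    state = upload(files)
    for stage in (raw_data, parse, normalize):
        state = stage(state)
    return state

result = run_pipeline(["calc1.out", "calc2.out"])
```

Making this continuous would mean triggering `run_pipeline` automatically on every new upload instead of running the stages by hand.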
- Extend the minimal workflow
- Main goal: Extend the workflow so that Parquet files, JSON files, and Elasticsearch entries (query and stats) are generated after normalization.
- Status: In progress
- People involved: Alfonso
- More info:
- Query Project:
- Main goal: Create a query/api to extract information from archive files.
- Status: The API/query is in testing mode. It should be deployed before July 13th.
- To-do:
- Fix failing tests.
- Add more quantities with values (define the most important ones).
- Add a link "Help us to improve the archive" to detect badly formatted data.
- Allow bulk queries.
- People involved: Benjamin, Alfonso
- More info:
- Add link to the Query notebook and documentation from Benny.
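For the "allow bulk queries" item, one simple approach is to batch many single queries into one request payload, so the API is hit once per batch instead of once per query. The payload shape and the quantity name below are assumptions, not the deployed API:

```python
def batch(items, size):
    """Yield successive groups of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# 25 hypothetical single queries grouped into payloads of 10.
queries = [{"quantity": "band_gap", "calc_id": i} for i in range(25)]
payloads = [{"queries": group} for group in batch(queries, 10)]  # 3 requests
```

Each payload would then be sent as one request, with the server answering all queries in the group together.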
- Data analysis of normalized data using emma:
- Main goal: Create a DSL to analyze Archive data using the emma library.
- Status: under development
- People involved: Felix (TU), Alfonso, more people
- More info:
- nomad-lab-base/parquet (to be documented)
- [emma example](https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-emma-usecase)
- Create a Hadoop cluster to migrate all the data:
- Main goal: Move all our data to a Hadoop cluster to avoid GPFS limitations.
- Status: A Hadoop cluster has been installed on nomad-team using Apache Ambari (to be tuned)
- People involved: Alfonso
- More info:
- Statistics:
- Main goal: Define metrics and collect data to generate statistical information about NOMAD Archive
- Status: In progress
- Metrics definitions verified to start the ingestion (by Luca, Angelo, Fawzi, Alfonso)
- To-Do:
- Generate updated statistics and update the plots and tables on the Archive web page.
- Add a step in the workflow to generate statistics after normalization.
- People involved: Alfonso
- More info:
- Create a new Stats webpage using Elasticsearch + Kibana:
- Main goal: Create an Elasticsearch index containing the statistical information and display the results using Kibana.
- Status: Some of the data are already created for the Query project; the data collected by the stats.py scripts used in the old version still have to be integrated. A dashboard with the desired quantities has to be designed.
- People involved: Alfonso
- More info:
- [Kibana](https://www.elastic.co/products/kibana)
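The statistics index could hold one document per code with its calculation count, serialized as a bulk-style newline-delimited payload for ingestion. The index name, entry format, and field names below are assumptions, not the deployed schema:

```python
import json
from collections import Counter

# Toy entries standing in for the per-calculation records of the Archive.
entries = [
    {"code": "FHI-aims"},
    {"code": "VASP"},
    {"code": "FHI-aims"},
]

# Aggregate: one statistics document per code.
counts = Counter(e["code"] for e in entries)
docs = [{"_index": "nomad-stats", "code": code, "n_calcs": n}
        for code, n in sorted(counts.items())]

# Newline-delimited JSON, as the Elasticsearch bulk API expects for its payload lines.
lines = "\n".join(json.dumps(d) for d in docs)
```

Kibana would then visualize these documents directly from the index, replacing the hand-generated plots.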
- Repository: Database part
- Status: unknown (to be clarified)
- People involved:
- Thomas
- Raphael
- Jungho
- Fawzi
- Alfonso