Quality definitions for datasets
Datasets should carry one or more indicators of their quality, e.g. a star rating.
How does the quality indicator get assigned?
- Check: does the dataset satisfy all requirements?
- Compute: number of AI toolkit notebooks that use it
- Manual: stars assigned by users
- ...
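The three assignment methods above could feed into a single rating. A minimal sketch in Python, assuming hypothetical signal names and weights (nothing here is an existing NOMAD/FAIRmat API):

```python
# Hypothetical sketch: combine the check, computed, and manual signals
# listed above into one 0-5 star rating. Function name, inputs, and
# weights are assumptions for illustration only.

def star_rating(meets_requirements: bool,
                notebook_usage_count: int,
                user_stars: list[int]) -> int:
    """Combine check, computed, and manual signals into a 0-5 star rating."""
    score = 0.0
    # Check: satisfying all requirements contributes a fixed base score.
    if meets_requirements:
        score += 2.0
    # Compute: usage in AI toolkit notebooks, capped so heavy use
    # cannot dominate the rating.
    score += min(notebook_usage_count * 0.5, 1.5)
    # Manual: average of user-assigned stars (0-5), scaled to at most 1.5.
    if user_stars:
        score += (sum(user_stars) / len(user_stars)) / 5 * 1.5
    return round(min(score, 5.0))

# Example: requirements met, used in 2 notebooks, two user ratings of 4 and 5.
print(star_rating(True, 2, [4, 5]))
```

The weights are placeholders; the point is only that the three sources can be combined mechanically, which matters for the open question below of automatic assignment.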
Critical datasets
We need a definition of "critical dataset" so that these datasets can be secured and given long-term central backup. This relates to the TIER system, where "the most important" data is stored centrally. A "critical dataset" is not necessarily a "good dataset"; it is one that must be included in the central repository database (e.g. for FAIRmat-political reasons). This means we need (at least) two quality definitions: well managed/complete vs. critical for NOMAD/FAIRmat.
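The two definitions are orthogonal: a dataset can be well managed, critical, both, or neither. A minimal sketch, with hypothetical names that do not correspond to any existing schema:

```python
# Hypothetical sketch: the two quality definitions as independent flags,
# with backup placement driven only by criticality. All names are
# assumptions for illustration.

from dataclasses import dataclass


@dataclass
class DatasetQuality:
    well_managed: bool  # complete, satisfies all requirements
    critical: bool      # must be kept in the central repository (TIER system)


def backup_tier(q: DatasetQuality) -> str:
    """Critical datasets get central long-term backup regardless of how
    well managed they are; the rest stay in distributed storage."""
    return "central" if q.critical else "distributed"


# A poorly managed but critical dataset still goes to central backup.
print(backup_tier(DatasetQuality(well_managed=False, critical=True)))
```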
Todo
- Define quality indicator for "well managed dataset"
- Define quality indicator for "critical dataset"
- Automatic way to compute and assign quality indicator?