Its calculation is performed by the Nomad Lab base layer, possibly in multiple steps.

This allows each parser to be as small as possible (given that we want to parse "everything"), while still allowing code that works across parsers.

For example, the extraction of averages and statistics from an MD run, and possibly of a short trajectory sample (if that is something that will be stored in the repository), should not be redone in every parser, but performed just once on the normalized full data.

Changing what exactly is extracted can then be done in one place, and some things (like the BSSE correction) can only be done after multiple calculations have been parsed.
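As a sketch of this idea (the field name and data layout here are hypothetical, not the actual normalized schema), the shared statistics extraction could look like:

```python
import statistics

def md_statistics(frames):
    """Compute summary statistics once, on the normalized data, instead of
    re-implementing the extraction in every parser.

    `frames` is a hypothetical normalized layout: a list of dicts with a
    'potential_energy' entry per MD frame."""
    energies = [frame["potential_energy"] for frame in frames]
    return {
        "mean_potential_energy": statistics.mean(energies),
        "std_potential_energy": statistics.stdev(energies) if len(energies) > 1 else 0.0,
        "n_frames": len(energies),
    }
```

Because this runs on normalized data, it works unchanged regardless of which parser produced the frames.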
# Storage
What is on labenv-nomad (and labenv2-nomad):

## /raw_data -> /nomad/nomadlab/raw_data
Shared file system; it should have at least 10 TB for now (we are seeing excellent compression, down to 20% of the original size).

### /raw_data/data
It is populated with zip archives following the proposed BagIt standard:

https://tools.ietf.org/html/draft-kunze-bagit-13
The name of each stored archive is built as R + a checksum of the files, their modification dates, and, recursively, all contained directories, so it uniquely represents the data in the bag.

The archives are placed in a directory named after the first 3 letters of the archive name, to avoid having too many files in the same directory.
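A minimal sketch of such a naming scheme (the hash function and the exact fields folded into the checksum are assumptions; the real scheme may differ):

```python
import hashlib
from pathlib import Path

def bag_checksum(root: Path) -> str:
    # Fold relative path, modification date and content of every file,
    # recursively and in a deterministic (sorted) order, into one digest.
    # sha224 is an assumption; the actual hash used is not specified here.
    h = hashlib.sha224()
    for path in sorted(root.rglob("*")):
        h.update(str(path.relative_to(root)).encode())
        if path.is_file():
            h.update(str(int(path.stat().st_mtime)).encode())
            h.update(path.read_bytes())
    return h.hexdigest()

def archive_path(root: Path) -> str:
    # R<checksum> uniquely names the bag; the first 3 letters of that
    # name pick the shard directory.
    name = "R" + bag_checksum(root)
    return f"{name[:3]}/{name}.zip"
```

Because the name is a function of the contents, re-uploading identical data produces the same archive name.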
In /raw_data/data files are stable: the same name will always mean the same file.

BagIt bags can be verified, so corruption can be detected.
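For illustration, verification boils down to recomputing the payload checksums listed in the bag's manifest. This sketch works on an unpacked bag (the stored bags are zip archives) and assumes a sha256 manifest in the line format of the BagIt draft:

```python
import hashlib
from pathlib import Path

def verify_bag(bag_dir: Path, manifest: str = "manifest-sha256.txt"):
    """Return the payload paths whose checksum no longer matches the
    manifest; an empty list means the bag is intact."""
    corrupted = []
    for line in (bag_dir / manifest).read_text().splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag_dir / rel_path).read_bytes()).hexdigest()
        if actual != expected:
            corrupted.append(rel_path)
    return corrupted
```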
For more info see the [raw data description](raw-data-description).

### /raw_data/metadata
Will contain the information needed to link back to the repository, citations, ...

### Replication
To replicate the whole data set, only /raw_data/data and /raw_data/metadata need to be copied.

In particular, parsing needs only /raw_data/data, which can be replicated in any way (rsync, ...).

All other data can be regenerated.

## /parsed -> /nomad/nomadlab/parsed
Currently shared storage; it might be generated on demand in the future, or archived (many small files).

Contains normalized JSON files, one per calculation, organized by parser id.

The name of the normalized file is P(checksum of "nmd://archive/path/to/main/file").
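The naming can be sketched as follows (the concrete hash function is an assumption; the same scheme, with prefix C, is what later identifies calculations inside the normalized HDF5 files):

```python
import hashlib

def calc_id(prefix: str, main_file_uri: str) -> str:
    # P<checksum> names the per-calculation JSON file; C<checksum>
    # identifies a calculation inside a normalized HDF5 file.
    # sha224 is an assumption; the actual hash is not specified here.
    digest = hashlib.sha224(main_file_uri.encode("utf-8")).hexdigest()
    return prefix + digest

parsed_name = calc_id("P", "nmd://archive/path/to/main/file") + ".json"
```

Deriving the name from the nmd:// URI of the main file means the same calculation always maps to the same file, no matter when or where it is parsed.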
## /normalized -> /nomad/nomadlab/normalized
Shared storage; it should be fast, to enable quick analysis.

Will contain normalized data in HDF5 format, named N<name of original raw data>.

A single file might contain many calculations, which avoids having too many files.

Within a file, each calculation is identified by C(checksum of "nmd://archive/path/to/main/file").

## /scratch
Local storage (currently 2 TB per VM).

### /scratch/work-local/<UUID>
Local storage used by one of the single-calculation workers, where files are decompressed.