# Add Feature for Descriptive Array Quantity

After discussions with Areas A and B (https://github.com/FAIRmat-NFDI/AreaA-data_modeling_and_schemas/discussions/30 and https://github.com/FAIRmat-NFDI/AreaA-data_modeling_and_schemas/discussions/31), we realized the need for a descriptive array quantity within NOMAD.

Two separate quantities were specified. The first type (project name `NumericalArray`) would be a section/class/object containing an *n*-dimensional array with precomputed properties: mean, min, max, standard deviation, and shape (dimensions). This is analogous to a NumPy `ndarray`, where these values are available through the built-in methods `mean()`, `min()`, `max()`, and `std()` and the properties `shape` and `ndim`. Additionally, we discussed including quantiles (first ventile, first quartile, median, third quartile, 19th ventile), which can similarly be calculated (on the flattened array) with NumPy's `quantile(array, q=(0.05, 0.25, 0.5, 0.75, 0.95))`. An example would be:

```
import json

import numpy as np

mu, sigma = 1, 0.1  # mean and standard deviation
a = np.random.normal(mu, sigma, (20, 30, 10))
qs = (0.05, 0.25, 0.5, 0.75, 0.95)
quants = np.quantile(a, qs)
descriptors = {
    "dimensionality": a.ndim,
    "shape": a.shape,
    "mean": a.mean(),
    "min": a.min(),
    "max": a.max(),
    "standard_deviation": a.std(),
    "quantiles": {q: quant for q, quant in zip(qs, quants)},
}
print(json.dumps(descriptors, indent=2))
```

```
{
  "dimensionality": 3,
  "shape": [
    20,
    30,
    10
  ],
  "mean": 1.0002707071723846,
  "min": 0.6440913244294681,
  "max": 1.3667379386039438,
  "standard_deviation": 0.09958481036605629,
  "quantiles": {
    "0.05": 0.8351471083009838,
    "0.25": 0.931646179009207,
    "0.5": 0.9991245263293533,
    "0.75": 1.0693072252478522,
    "0.95": 1.1614793219827306
  }
}
```
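To illustrate the "section/class/object" idea, the descriptors could be computed once when the array is stored and kept next to the values. The following is only a minimal sketch: the class layout, the field names, and the `from_values` constructor are assumptions made for illustration and are not an existing NOMAD API.

```python
from dataclasses import dataclass

import numpy as np

# Quantile levels discussed above: ventiles, quartiles, and the median.
QUANTILE_LEVELS = (0.05, 0.25, 0.5, 0.75, 0.95)


@dataclass(frozen=True, eq=False)
class NumericalArray:
    """Hypothetical container: raw values plus precomputed descriptors."""

    values: np.ndarray
    dimensionality: int
    shape: tuple
    mean: float
    min: float
    max: float
    standard_deviation: float
    quantiles: dict

    @classmethod
    def from_values(cls, values):
        # Compute every descriptor once, at construction time.
        a = np.asarray(values, dtype=float)
        quants = np.quantile(a, QUANTILE_LEVELS)  # computed on the flattened array
        return cls(
            values=a,
            dimensionality=a.ndim,
            shape=a.shape,
            mean=float(a.mean()),
            min=float(a.min()),
            max=float(a.max()),
            standard_deviation=float(a.std()),
            quantiles={q: float(v) for q, v in zip(QUANTILE_LEVELS, quants)},
        )


arr = NumericalArray.from_values(np.arange(24).reshape(2, 3, 4))
print(arr.shape, arr.mean, arr.quantiles[0.5])
```

Storing the descriptors as plain Python floats keeps them easy to serialize and to index for search, independently of the array itself.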

In addition, we need a reduced preview of this array, which could be provided by the changes being made to the API. There could be three levels of access:

- Access the descriptors above
- Access a subset of the array
- Access the whole array
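The three access levels could look roughly like the sketch below. The `read` function, the level names, and the `index` parameter are hypothetical and only serve to make the distinction concrete; the actual API changes may expose this quite differently.

```python
import numpy as np


def describe(a):
    # Level 1: cheap summary, no array data is returned.
    return {
        "shape": a.shape,
        "mean": float(a.mean()),
        "min": float(a.min()),
        "max": float(a.max()),
    }


def read(a, level, index=None):
    """Return descriptors, a subset, or the whole array (hypothetical levels)."""
    if level == "descriptors":
        return describe(a)
    if level == "subset":
        if index is None:
            raise ValueError("a subset request needs an index")
        return a[index]  # Level 2: any NumPy index or slice tuple
    if level == "full":
        return a  # Level 3: the complete array
    raise ValueError(f"unknown access level: {level!r}")


data = np.arange(24.0).reshape(2, 3, 4)
summary = read(data, "descriptors")
preview = read(data, "subset", index=(0, slice(None), slice(0, 2)))
full = read(data, "full")
print(summary["shape"], preview.shape, full.shape)
```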

The second type of array (project name `ContextArray`) would be a `NumericalArray` within some context with axes, units, uncertainty, and quantization. This would be analogous to xarray's `DataArray`. Here the axes would themselves be `ContextArray`s with units etc. The numerical values would be stored in `NumericalArray`s with their descriptors. This would allow us to search for, for example, the 19th ventile (max without outliers) of the transmission for measurements where a certain wavelength, lambda, was measured (lambda > min, lambda < max).
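The search described above could be sketched as follows. Everything here is an assumption for illustration: the `ContextArray` layout, the helper names, and the sample data are made up, and the descriptors are recomputed on the fly, whereas a real implementation would presumably read them from the precomputed values.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass(eq=False)
class ContextArray:
    """Hypothetical array with context; axes are ContextArrays themselves."""

    name: str
    values: np.ndarray
    unit: str = ""
    axes: list = field(default_factory=list)


def measured_at(array, axis_name, value):
    # True if `value` lies strictly between the min and max of the named axis.
    for axis in array.axes:
        if axis.name == axis_name:
            return float(axis.values.min()) < value < float(axis.values.max())
    return False


def ventile_19(array):
    # 19th ventile (0.95 quantile) of the flattened values.
    return float(np.quantile(array.values, 0.95))


def make_measurement(wavelengths, transmission):
    axis = ContextArray("wavelength", np.asarray(wavelengths, dtype=float), unit="nm")
    return ContextArray("transmission", np.asarray(transmission, dtype=float), axes=[axis])


# Made-up sample data, not real measurements.
measurements = {
    "sample_1": make_measurement([400, 500, 600, 700], [0.1, 0.2, 0.3, 0.4]),
    "sample_2": make_measurement([800, 900, 1000], [0.5, 0.6, 0.7]),
}

# 19th ventile of the transmission for measurements that cover 550 nm.
hits = {
    name: ventile_19(m)
    for name, m in measurements.items()
    if measured_at(m, "wavelength", 550.0)
}
print(hits)
```

Only the axis descriptors (min and max) and one quantile of the values are needed to answer the query, so the full arrays would not have to be loaded.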

Additionally, we discussed that the values could be a reference to:

- Another data field
- An external file
- A virtual source (any combination or subset of the above)
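One possible way to picture these three kinds of reference is sketched below. The `FieldRef`, `FileRef`, and `VirtualSource` classes and the `resolve` function are hypothetical names, the "archive" is just a dictionary, and the external file is assumed to be a NumPy `.npy` file; none of this reflects an agreed design.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path

import numpy as np


@dataclass
class FieldRef:
    """Points at another data field in an archive (a plain dict here)."""
    archive: dict
    path: str


@dataclass
class FileRef:
    """Points at an external .npy file."""
    filename: str


@dataclass
class VirtualSource:
    """Concatenates subsets of other sources along the first axis."""
    parts: list  # list of (source, index) pairs


def resolve(source):
    # Turn any of the three reference kinds into an in-memory array.
    if isinstance(source, FieldRef):
        return np.asarray(source.archive[source.path])
    if isinstance(source, FileRef):
        return np.load(source.filename)
    if isinstance(source, VirtualSource):
        return np.concatenate([resolve(src)[idx] for src, idx in source.parts])
    raise TypeError(f"unsupported source: {type(source).__name__}")


archive = {"run/temperature": [1.0, 2.0, 3.0, 4.0]}  # made-up field
with tempfile.TemporaryDirectory() as tmp:
    filename = str(Path(tmp) / "external.npy")
    np.save(filename, np.array([10.0, 20.0, 30.0]))
    virtual = VirtualSource(parts=[
        (FieldRef(archive, "run/temperature"), slice(0, 2)),
        (FileRef(filename), slice(1, None)),
    ])
    combined = resolve(virtual)
print(combined)
```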