Tabular Parser: sub_sub_sections in row mode
As we use this tool more deeply, new feature requests keep coming up.
When using the row mode, some users have a bundle of columns that repeats along the row because they use a single repeated class multiple times in that row.
This is an example of a table we may face (note that process step, date, user, and location are not repeated in the row):
Process step | date | user | location | Material | Sputtering | Density | Voltage | Material | Sputtering | Density | Voltage |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 01-01-22 | john | Berlin | MgO | Y | 2.3 | 4 | TiO | Y | 4.5 | 45 |
2 | 02-01-22 | micha | Berlin | ZnO | Y | 2.3 | 4 | CS | Y | 4.5 | 45 |
This is how it should be parsed in the archive:
data:
  m_def: my_experiment
  process_steps:  # this is the repeated section parsed currently by our table parser
    - date: 01-01-22
      user: john
      location: Berlin
      sputtered_materials:
        - name: MgO
          sputtering: Y
          density: 2.3
          voltage: 4
        - name: TiO
          sputtering: Y
          density: 4.5
          voltage: 45
    - date: 02-01-22
      user: micha
      location: Berlin
      sputtered_materials:
        - name: ZnO
          sputtering: Y
          density: 2.3
          voltage: 4
        - name: CS
          sputtering: Y
          density: 4.5
          voltage: 45
So each row gives rise to an instance of a repeated subsection, and each bundle of columns in the same row gives rise to an instance of a repeated sub_sub_section.
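To make the requested behavior concrete, here is a minimal sketch in plain Python of the grouping I have in mind. It is not the tabular parser's code or API: the function name `group_repeated_columns` and the `subsection_name` parameter are hypothetical, the column headers are used as keys unchanged (so `Material` is not mapped to `name`), and values stay as strings. It assumes that any header appearing more than once in the header row belongs to the repeated bundle.

```python
from collections import Counter


def group_repeated_columns(headers, rows, subsection_name="sputtered_materials"):
    """Turn each row into one dict; repeated headers become a list of bundles.

    Hypothetical helper for illustration only, not part of the tabular parser.
    """
    header_counts = Counter(headers)
    steps = []
    for row in rows:
        step = {}
        bundles = []
        seen = Counter()  # how many times each repeated header was met in this row
        for header, value in zip(headers, row):
            if header_counts[header] > 1:
                index = seen[header]
                seen[header] += 1
                if index == len(bundles):
                    # first column of a new bundle: open a new sub_sub_section
                    bundles.append({})
                bundles[index][header] = value
            else:
                # non-repeated column: belongs to the row-level subsection
                step[header] = value
        step[subsection_name] = bundles
        steps.append(step)
    return steps


headers = ["Process step", "date", "user", "location",
           "Material", "Sputtering", "Density", "Voltage",
           "Material", "Sputtering", "Density", "Voltage"]
rows = [
    ["1", "01-01-22", "john", "Berlin", "MgO", "Y", "2.3", "4", "TiO", "Y", "4.5", "45"],
    ["2", "02-01-22", "micha", "Berlin", "ZnO", "Y", "2.3", "4", "CS", "Y", "4.5", "45"],
]
steps = group_repeated_columns(headers, rows)
```

With this approach the number of bundles is read from the header row itself, so it does not need to be declared in the schema and can differ from file to file.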
I can already implement a schema where I parse every column manually into a different instance, but this has a few drawbacks:
- I have to rename each column with a different name so I can use the tabular annotation as it is now
- I have to redundantly define a schema for each bundle that is essentially the same class
- I have to know a priori how many times that class is used in that row, but this is not always known and can even vary
I have already explained this issue to @amgo.