Multiple entries from one mainfile
Currently we maintain a strict 1:1 relationship between entries and mainfiles. Each entry has exactly one mainfile, and each entry is created from a mainfile that matches a parser.
To support Excel or other tabular data formats (#737 (closed)), we need to break this 1:1 relationship. One Excel file might contain the data of many entries (e.g. one entry per row).
Mainfiles are not database entities. They only exist as paths stored under the mainfile key. In this sense, there is no actual database relationship. In principle, many entries could show the same mainfile path. However, the mainfile path determines the entry_id: each entry_id is a hash of the upload_id and the mainfile. Furthermore, we use upload_id + mainfile to refer to entries in archive references. We also use the mainfile to refer to entries in nomad.yaml files. So there is some merit in maintaining a unique mainfile path for each entry.
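To illustrate why a shared mainfile path is a problem, here is a minimal sketch of an entry_id derived from upload_id and mainfile. The function name make_entry_id, the use of SHA-256, and the 28-character truncation are assumptions made for illustration only; the actual hash used in the code base may differ.

```python
import hashlib


def make_entry_id(upload_id: str, mainfile: str) -> str:
    """Hypothetical stand-in for the real entry_id hash (assumed: SHA-256)."""
    # Join the two parts with a NUL byte so ("ab", "c") and ("a", "bc") differ.
    payload = f"{upload_id}\0{mainfile}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:28]


# Two rows of the same table file share upload_id and mainfile path,
# so without further information they end up with the same entry_id.
first_row_id = make_entry_id("upload-1", "path/to/mainfile.xls")
second_row_id = make_entry_id("upload-1", "path/to/mainfile.xls")
ids_collide = first_row_id == second_row_id
```

Whatever the concrete hash function is, identical inputs give identical ids, so several entries per file need some additional distinguishing input.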
One solution could be to add a "kicker" or "tie-breaker" to the mainfile path. For example, the "mainfile" /path/to/mainfile.xls:23 could refer to a certain "part" of the actual /path/to/mainfile.xls file. The matching part of a parser could optionally return a list of these tie-breakers instead of a true/false value. In these cases, we can run the parser n times, once per tie-breaker, and create an entry for each with <path/to/file>:<tie-breaker> as its mainfile. This would only require extending the matching interface and the matching part of the processing (and the parse CLI). We might also need slight modifications to the GUI where we mark files as mainfiles.
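The proposed matching extension could look roughly like the sketch below. All names here (MatchResult, mainfile_keys, match_table, match_plain) are hypothetical and not part of the existing parser interface. The row tie-breakers are hard-coded so that the example does not need to read an actual file.

```python
from typing import Callable, List, Union

# Assumed shape of the extended result: the existing bool,
# or a list of tie-breakers (one per entry to create).
MatchResult = Union[bool, List[str]]


def mainfile_keys(path: str, match: Callable[[str], MatchResult]) -> List[str]:
    """Return the mainfile keys (one per entry) that a match produces."""
    result = match(path)
    if result is True:
        # Classic 1:1 case: the path itself is the mainfile.
        return [path]
    if not result:
        # No match (False) or an empty tie-breaker list: no entries.
        return []
    # One entry per tie-breaker, keyed as <path/to/file>:<tie-breaker>.
    return [f"{path}:{tie_breaker}" for tie_breaker in result]


def match_table(path: str) -> MatchResult:
    """Toy matcher: pretends every .xls file has three data rows."""
    return ["0", "1", "2"] if path.endswith(".xls") else False


def match_plain(path: str) -> MatchResult:
    """Toy matcher with the current true/false behaviour."""
    return path.endswith(".json")


table_keys = mainfile_keys("path/to/mainfile.xls", match_table)
plain_keys = mainfile_keys("path/to/entry.json", match_plain)
```

With this shape, parsers that keep returning true/false behave as before, and processing would run the parser once per returned key. One open point in this sketch is how to treat file paths that already contain a colon.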