Support parsers creating multiple entries from the same file
Closes #761
We make it possible for an entry (associated with some mainfile) to have child entries. This is done by introducing a new field on the entry level: mainfile_key. Main entries have mainfile_key == None; child entries have the same value for mainfile as the main/parent entry, plus some non-empty string as the value for mainfile_key. Note, however, that most parsers will only produce a main entry without any child entries.
Both the main and the child entries are full-fledged entries, i.e. they are distinct objects in MongoDB and Elasticsearch, they have their own archive files, their own metadata, and so on.
(upload_id, mainfile, mainfile_key) uniquely identifies any entry. For every child entry, a main entry must exist (i.e. an entry with the same mainfile, but with mainfile_key == None).
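To illustrate the identity rule and the "every child needs a main entry" invariant, here is a minimal sketch. The names EntryKey and find_orphans are hypothetical and exist only for this example; they are not classes or functions from the codebase.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class EntryKey:
    """Hypothetical identifier for an entry (illustration only)."""
    upload_id: str
    mainfile: str
    mainfile_key: Optional[str] = None  # None marks the main entry

    @property
    def is_child(self) -> bool:
        return self.mainfile_key is not None


def find_orphans(entries: Iterable[EntryKey]) -> List[EntryKey]:
    """Return child entries for which no main entry with the same mainfile exists."""
    entries = list(entries)
    # Main entries are identified by (upload_id, mainfile) with mainfile_key == None.
    mains = {(e.upload_id, e.mainfile) for e in entries if not e.is_child}
    return [
        e for e in entries
        if e.is_child and (e.upload_id, e.mainfile) not in mains
    ]


main = EntryKey("upload_1", "run/output.log")
child = EntryKey("upload_1", "run/output.log", "step_1")
stray = EntryKey("upload_1", "other/output.log", "step_1")
orphans = find_orphans([main, child, stray])
```

Because the dataclass is frozen, the three fields together act as a hashable key, so main and child entries of the same mainfile remain distinct objects.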
Parsers signal that they want to create child entries by returning a set of keys, one for each child, from the is_mainfile function (instead of a boolean, like we have done up until now). Parsers that don't want to create any child entries can just return True, as before.
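The two return conventions can be sketched as follows. This is an illustration under assumptions: the simplified is_mainfile signature, the ".multi" file extension, the "calc <name>" line format, and the entry_keys helper are all invented for the example and do not reflect the actual parser interface.

```python
from typing import List, Optional, Set, Union


def is_mainfile(filename: str, decoded_buffer: str) -> Union[bool, Set[str]]:
    """Hypothetical matcher: returns False, True, or a set of child keys."""
    if not filename.endswith(".multi"):  # assumed extension, for illustration
        return False
    keys: Set[str] = set()
    for line in decoded_buffer.splitlines():
        parts = line.split()
        # Assumed format: each "calc <name>" line announces one child entry.
        if len(parts) == 2 and parts[0] == "calc":
            keys.add(parts[1])
    # No children found: behave like a parser that only has a main entry.
    return keys if keys else True


def entry_keys(result: Union[bool, Set[str]]) -> List[Optional[str]]:
    """Hypothetical helper: mainfile_key values of the entries to create."""
    if isinstance(result, bool):
        # True means a single main entry; False means no match at all.
        return [None] if result else []
    # A set of keys yields the main entry (None) plus one child per key.
    return [None, *sorted(result)]
```

With this sketch, a file containing "calc one" and "calc two" would yield the mainfile_key values None, "one", and "two", while a plain True still yields only the main entry.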
The parse function is called only once, for the main entry, and we pass an additional argument to it: child_archives, a dictionary of the format