Image in RichTextEditQuantity causes ES error and processing fails
Hey nomad,
Processing a file that contains an image in a RichTextEditQuantity seems to cause an error:
ERROR nomad.processing 2023-11-21T09:52:04 could not index archive after processing failure
- exception: Traceback (most recent call last):
  File "/home/a2853/Documents/Projects/nomad/nomad-FAIR/nomad/processing/base.py", line 862, in proc_task
    rv = unwrapped_func(proc, *args, **kwargs)
  File "/home/a2853/Documents/Projects/nomad/nomad-FAIR/nomad/processing/data.py", line 1006, in process_entry
    self._process_entry_local()
  File "/home/a2853/Documents/Projects/nomad/nomad-FAIR/nomad/processing/data.py", line 1081, in _process_entry_local
    entry.archiving()
  File "/home/a2853/Documents/Projects/nomad/nomad-FAIR/nomad/processing/data.py", line 1262, in archiving
    raise RuntimeError('Failed to index in ES: ' + indexing_errors[self.entry_id])
RuntimeError: Failed to index in ES: {'type': 'illegal_argument_exception', 'reason': 'Document contains at least one immense term in field="search_quantities.str_value.keyword" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: \'[60, 117, 108, 62, 10, 60, 108, 105, 62, 10, 60, 112, 32, 115, 116, 121, 108, 101, 61, 34, 109, 97, 114, 103, 105, 110, 45, 98, 111, 116]...\', original message: bytes can be at most 32766 in length; got 69432', 'caused_by': {'type': 'max_bytes_length_exceeded_exception', 'reason': 'bytes can be at most 32766 in length; got 69432'}}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/a2853/Documents/Projects/nomad/nomad-FAIR/nomad/processing/data.py", line 1153, in _on_fail
    assert not indexing_errors
AssertionError
- exception_hash: McNl80zkh9-gfy4qmwAvH9zJhyWw
- nomad.commit:
- nomad.deployment: devel
- nomad.entry_id: kIJXGigMZVz1AGFZnrFRRBJqRchS
- nomad.mainfile: MAFA_and_FAPI_perovskite_slot_die_coating.archive.json
- nomad.processing.logger: nomad.processing
- nomad.processing.parser: parsers/archive
- nomad.processing.proc: Entry
- nomad.processing.process: process_entry
- nomad.processing.process_status: RUNNING
- nomad.processing.process_worker_id: XHAg_8RERYyAFHS9UCUNXA
- nomad.service: worker
- nomad.upload_id: j_7y5LCqRYS4pvS5_C2yPA
- nomad.version: 1.2.2.dev175+gc38e05f01.d20231114
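The list of numbers in the ES error is the start of the rejected term, given as UTF-8 byte values. Decoding it with a quick Python check (just a diagnostic, not part of NOMAD) shows that the term is the HTML of the rich-text value, i.e. the whole field content ends up as one keyword term in `search_quantities.str_value.keyword`:

```python
# Byte prefix of the "immense term", copied verbatim from the ES error above.
prefix = [60, 117, 108, 62, 10, 60, 108, 105, 62, 10, 60, 112, 32, 115, 116,
          121, 108, 101, 61, 34, 109, 97, 114, 103, 105, 110, 45, 98, 111, 116]

# ES reports the term as raw UTF-8 bytes; turn them back into text.
text = bytes(prefix).decode("utf-8")
print(repr(text))  # '<ul>\n<li>\n<p style="margin-bot'
```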
I talked with @himanel1. This seems to be new since custom data sections are indexed: the image is stored as a single string, which is too long for an ES keyword term (the limit is 32766 bytes; this term is 69432 bytes).
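A minimal sketch of one possible guard, assuming oversized strings can simply be left out of the search index (they would still be in the archive). The helper names `fits_keyword_limit` and `drop_oversized` are hypothetical and not NOMAD functions. The key point is that the 32766 limit applies to the UTF-8 byte length, not the character count:

```python
# Lucene's hard limit for a single indexed term, as quoted in the ES error.
MAX_TERM_BYTES = 32766


def fits_keyword_limit(value: str) -> bool:
    """Return True if `value` can be indexed as a single keyword term.

    The limit is on UTF-8 bytes, so multi-byte characters count more than once.
    """
    return len(value.encode("utf-8")) <= MAX_TERM_BYTES


def drop_oversized(values: list[str]) -> list[str]:
    """Hypothetical pre-indexing filter: keep only strings ES will accept."""
    return [v for v in values if fits_keyword_limit(v)]


if __name__ == "__main__":
    short = "slot-die coating"
    huge = "<p>" + "A" * 69_429  # 69432 bytes, same size as the rejected term
    print(fits_keyword_limit(short), fits_keyword_limit(huge))  # True False
    print(drop_oversized([short, huge]))  # ['slot-die coating']
```

An alternative on the ES side is the `ignore_above` mapping parameter on the keyword field. It counts characters rather than bytes, so a value of 8191 (32766 / 4) is the safe choice for arbitrary UTF-8 text.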
Best, Micha