Improved the suggestion feature (!444) · Merge requests · nomad-lab / nomad-FAIR

Lauri Himanen requested to merge suggestions into v1.0.0 Nov 01, 2021

In the GUI we offer suggestions for certain index fields. These are implemented with the Elasticsearch (ES) Completion suggester, a feature built specifically for this kind of task.

Previously, we created these suggestions by defining a new ES field of the completion type. This way the indexing of suggestion values comes pretty much for free and the index size is barely affected. However, it also means that only one suggestion value is supported per quantity, which makes it impossible to offer autocompletion suggestions when the user types a word from the middle of a keyword or string: for example, typing salt would not produce a suggestion for rock salt. Technically, this is because the Completion suggester is based on a special graph data structure that can only match suggestions starting from the beginning of a string.
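The prefix-only behaviour can be illustrated with a plain Python stand-in. This is only a sketch of the observable behaviour, not of how Elasticsearch implements the Completion suggester internally; the function name `prefix_suggest` and the example values are hypothetical.

```python
def prefix_suggest(prefix, inputs):
    """Return the inputs that start with the typed prefix (case-insensitive)."""
    needle = prefix.lower()
    return [value for value in inputs if value.lower().startswith(needle)]


# One suggestion input per quantity value, as in the old approach.
inputs = ["rock salt", "zinc blende"]

print(prefix_suggest("rock", inputs))  # ['rock salt']
print(prefix_suggest("salt", inputs))  # []
```

Because only the full string is registered as an input, a query that starts in the middle of the value finds nothing.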

To overcome this limitation, the standard approach is to store several suggestion values for each entry at index time (examples of this are discussed here, here and here). This merge request implements this mechanism by storing suggestions under <quantity name>__suggestion when the values need to be tokenized.
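A minimal sketch of index-time augmentation follows. It assumes that the extra inputs are the suffixes starting at each token boundary and that tokens are separated by whitespace or underscores; the merge request may generate its inputs differently. The helper names `expand_inputs` and `build_document` and the quantity name `structure_name` are hypothetical. The `{"input": [...]}` shape is the one ES accepts for a completion field with several inputs.

```python
import re


def expand_inputs(value):
    """Return the full value plus every suffix that starts at a token boundary."""
    # Assumption: tokens are separated by runs of whitespace or underscores.
    tokens = [t for t in re.split(r"[\s_]+", value) if t]
    inputs = [value]
    for i in range(1, len(tokens)):
        suffix = " ".join(tokens[i:])
        if suffix not in inputs:
            inputs.append(suffix)
    return inputs


def build_document(quantity, value):
    """Build a document that stores the expanded inputs next to the value."""
    return {
        quantity: value,
        f"{quantity}__suggestion": {"input": expand_inputs(value)},
    }


doc = build_document("structure_name", "rock salt")
print(doc["structure_name__suggestion"])  # {'input': ['rock salt', 'salt']}
```

With "salt" stored as an additional input, a prefix match on salt can now lead back to the rock salt entry.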

The suggestion mechanism is selected with the suggestion option, which accepts either a predefined string or a function: suggestion="simple" keeps the old behaviour of storing a single suggestion string under a field, suggestion="default" tokenizes on whitespace and underscores, suggestion="formula" tokenizes into formula fragments, and suggestion=<function> applies a custom tokenization function.
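The option handling could look roughly like the sketch below. All function names are hypothetical, and the formula tokenizer assumes that a "formula fragment" is an element symbol followed by an optional count (for example H2 and O in H2O); the actual fragments used in the merge request may differ.

```python
import re


def tokenize_default(value):
    """Split on whitespace and underscores, dropping empty tokens."""
    return [t for t in re.split(r"[\s_]+", value) if t]


def tokenize_formula(value):
    """Split a formula into element fragments such as 'H2' and 'O' (assumed format)."""
    return re.findall(r"[A-Z][a-z]?\d*", value)


def get_suggestion_inputs(value, suggestion="simple"):
    """Return the suggestion inputs for a value under the given option."""
    if callable(suggestion):
        # Custom tokenization function supplied by the caller.
        return list(suggestion(value))
    if suggestion == "simple":
        # Old behaviour: a single suggestion string.
        return [value]
    if suggestion == "default":
        tokenizer = tokenize_default
    elif suggestion == "formula":
        tokenizer = tokenize_formula
    else:
        raise ValueError(f"Unknown suggestion option: {suggestion!r}")
    # Keep the full value first, then add tokens that are not already present.
    inputs = [value]
    for token in tokenizer(value):
        if token not in inputs:
            inputs.append(token)
    return inputs


print(get_suggestion_inputs("rock salt", "default"))  # ['rock salt', 'rock', 'salt']
print(get_suggestion_inputs("H2O", "formula"))        # ['H2O', 'H2', 'O']
```

Accepting a callable alongside the string options keeps the common cases short while leaving room for quantity-specific tokenization.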

The downside of using a new index attribute instead of a field for each suggestion is that the index becomes larger and the source documents look quite busy with all the new suggestion data. By default, the suggestion values are completely excluded from the source documents in all metadata searches. Depending on our experience with the index size, we can consider limiting suggestions, using a separate index for suggestions (really nasty), or experimenting with other suggestion mechanisms (here is a really good article about all reasonable options).
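One way to keep suggestion data out of search responses is ES source filtering. The sketch below only builds a request body; it assumes that every suggestion attribute ends in `__suggestion` and that a wildcard pattern is enough to match them. The function name `build_search_body` is hypothetical, and the merge request may implement the exclusion differently.

```python
def build_search_body(query, suggestion_suffix="__suggestion"):
    """Build a search body that leaves suggestion fields out of the returned _source."""
    return {
        "query": query,
        # Source filtering: drop every field whose path ends with the suffix.
        "_source": {"excludes": [f"*{suggestion_suffix}"]},
    }


body = build_search_body({"match_all": {}})
print(body["_source"])  # {'excludes': ['*__suggestion']}
```

The suggestion values remain indexed and usable by the suggester; they are only omitted from the documents returned to the client.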

Edited Nov 03, 2021 by Lauri Himanen
