Search aggregation optimization
The new search GUI requires a slightly different set of aggregated data than the old GUI. The new search API endpoint (entries/query) is not optimal for some of the queries that are now required. Consider, for example, the following case:
Imagine a dropdown for structure_name. Without any filters applied, we can populate its options by running the search and the aggregations together in a single API call. The resulting query would look something like this:
{
  query: {},
  aggregations: {
    "structure_name": {terms: ...}
  }
}
Let's say the resulting aggregation data contains the entries ["diamond", "perovskite"].
Now let's apply a filter by selecting "diamond" from the dropdown. We can combine the aggregation and query in a single API call like this:
{
  query: {
    "structure_name": ["diamond"]
  },
  aggregations: {
    "structure_name": {terms: ...}
  }
}
When executing this query, the aggregation data will only contain ["diamond"], because the filters are applied before the aggregation is computed. If we always used a fixed set of search options to populate the GUI and did not allow OR queries, this would not be a problem. That is how the old GUI behaves: if you click e.g. system_type="bulk", all the other fixed options simply become unavailable. But if we want to update the available options in the dropdown based on the aggregation results (as the new GUI does for dropdowns, checkboxes, etc., so that "perovskite" is still shown when the other filters allow it), we have to run a separate aggregation query for each quantity with a modified list of filters: any filters targeting the aggregated quantity are removed, while filters targeting other quantities still affect the returned results.
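The per-quantity filter rule described above can be sketched in a few lines of Python. This is only an illustration: the helper name build_aggregation_requests is hypothetical, the request shape mirrors the simplified examples in this issue rather than the exact entries/query schema, and the terms options are left as an empty placeholder because they are elided in the examples.

```python
from typing import Any, Dict, List

# Simplified query shape used in this issue: quantity name -> selected values.
Query = Dict[str, List[Any]]


def build_aggregation_requests(
    query: Query, quantities: List[str]
) -> Dict[str, Dict[str, Any]]:
    """Build one request body per aggregated quantity (hypothetical helper).

    For each quantity, filters that target that same quantity are removed,
    while filters on all other quantities are kept.
    """
    requests: Dict[str, Dict[str, Any]] = {}
    for quantity in quantities:
        # Drop only the filters that target the aggregated quantity itself.
        reduced_query = {
            name: values for name, values in query.items() if name != quantity
        }
        requests[quantity] = {
            "query": reduced_query,
            # Placeholder: the actual terms options are not specified here.
            "aggregations": {quantity: {"terms": {}}},
        }
    return requests


if __name__ == "__main__":
    active_filters = {"structure_name": ["diamond"], "system_type": ["bulk"]}
    bodies = build_aggregation_requests(
        active_filters, ["structure_name", "system_type"]
    )
    for name, body in bodies.items():
        print(name, body)
```

With N aggregated quantities this produces N request bodies in addition to the main search, which is the cost that motivates this issue.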
In order to minimize the number of API calls and the load on Elasticsearch, we should think about combining these queries in the API endpoint or changing the GUI behaviour.
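One possible way to combine the queries on the API side is to build a single Elasticsearch request that uses a filter aggregation per quantity together with post_filter. The sketch below only constructs the request body as a Python dict. It assumes that the quantity names map directly onto Elasticsearch field names and that all filters are simple terms filters; the helper names are hypothetical and this is not the existing endpoint implementation.

```python
from typing import Any, Dict, List

# Simplified filter shape: field name -> selected values (assumed terms filters).
Filters = Dict[str, List[Any]]


def to_es_filter(filters: Filters) -> Dict[str, Any]:
    """Translate the simplified filters into an Elasticsearch query clause."""
    if not filters:
        return {"match_all": {}}
    clauses = [{"terms": {field: values}} for field, values in filters.items()]
    return {"bool": {"filter": clauses}}


def build_combined_body(filters: Filters, quantities: List[str]) -> Dict[str, Any]:
    """Build one request body that serves the hits and all aggregations."""
    aggs: Dict[str, Any] = {}
    for quantity in quantities:
        # Each aggregation sees every filter except the ones on its own field.
        others = {f: v for f, v in filters.items() if f != quantity}
        aggs[quantity] = {
            "filter": to_es_filter(others),
            "aggs": {"values": {"terms": {"field": quantity}}},
        }
    return {
        # post_filter narrows the returned hits after aggregations are computed,
        # so the full filter set does not restrict the aggregation buckets.
        "post_filter": to_es_filter(filters),
        "aggs": aggs,
    }


if __name__ == "__main__":
    body = build_combined_body(
        {"structure_name": ["diamond"], "system_type": ["bulk"]},
        ["structure_name", "system_type"],
    )
    print(body)
```

This keeps everything in one round trip to Elasticsearch, at the price of a larger request in which every aggregation carries its own copy of the remaining filters.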