Commit dae825b7 authored by Markus Scheidgen's avatar Markus Scheidgen

Handling of lost scroll_id.

parent 660c7405
Pipeline #47967 passed with stages in 28 minutes and 8 seconds
@@ -136,12 +136,14 @@ class RepoCalcsResource(Resource):
The pagination parameters allow you to determine which page to return via the
``page`` and ``per_page`` parameters. Pagination, however, is limited to the first
- 100k (depending on ES configuration) hits. An alternative to pagination is to use
- ``scroll`` and ``scroll_id``. With ``scroll`` you will get a ``scroll_id`` on
- the first request. Each call with ``scroll`` and the respective ``scroll_id`` will
- return the next ``per_page`` (here the default is 1000) results. Scroll however,
- ignores ordering and does not return aggregations. The scroll view used in the
- background will stay alive for 1 minute between requests.
+ 100k (depending on ES configuration) hits.
+
+ An alternative to pagination is to use ``scroll`` and ``scroll_id``. With ``scroll``
+ you will get a ``scroll_id`` on the first request. Each call with ``scroll`` and
+ the respective ``scroll_id`` will return the next ``per_page`` (here the default is 1000)
+ results. Scroll, however, ignores ordering and does not return aggregations.
+ The scroll view used in the background will stay alive for 1 minute between requests.
+ If the given ``scroll_id`` is not available anymore, an HTTP 400 is raised.

The search will return aggregations on a predefined set of quantities. Aggregations
will tell you what quantity values exist and how many entries match those values.
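The docstring above describes the client-side contract for scrolling: request with ``scroll``, pass the returned ``scroll_id`` back on each follow-up call, and expect an HTTP 400 once the scroll view (kept alive for about a minute) has been lost. A minimal client sketch of that flow, assuming a hypothetical base URL, a ``/repo/`` endpoint path, and ``results``/``scroll_id`` response fields (none of these names are taken from this diff):

# Client-side sketch of the scroll flow documented above. The base URL,
# endpoint path, and response field names are assumptions for illustration.
import requests

BASE_URL = 'https://nomad.example.org/api'  # hypothetical deployment


def scroll_all(per_page=1000):
    """Yield all repository entries by following scroll_id batches."""
    params = {'scroll': 'true', 'per_page': per_page}
    while True:
        resp = requests.get(f'{BASE_URL}/repo/', params=params)
        if resp.status_code == 400 and 'scroll_id' in params:
            # The scroll view expired on the server side (the HTTP 400 case
            # documented above); drop the stale id and start a fresh scroll.
            # Entries delivered before the restart may repeat.
            del params['scroll_id']
            continue
        resp.raise_for_status()
        body = resp.json()
        results = body.get('results', [])
        if not results:
            break
        yield from results
        params['scroll_id'] = body['scroll_id']  # assumed response field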
@@ -235,6 +237,8 @@ class RepoCalcsResource(Resource):
else:
scroll_id = None
total, results, aggregations, metrics = search.aggregate_search(q=q, **data)
+ except search.ScrollIdNotFound:
+     abort(400, 'The given scroll_id does not exist.')
except KeyError as e:
abort(400, str(e))
@@ -21,6 +21,7 @@ from elasticsearch_dsl import Document, InnerDoc, Keyword, Text, Date, \
Object, Boolean, Search, Q, A, analyzer, tokenizer
from elasticsearch_dsl.document import IndexMeta
import elasticsearch.helpers
+ from elasticsearch.exceptions import NotFoundError
from datetime import datetime
from nomad import config, datamodel, infrastructure, datamodel, coe_repo, utils
@@ -36,6 +37,9 @@ class AlreadyExists(Exception): pass
class ElasticSearchError(Exception): pass
+ class ScrollIdNotFound(Exception): pass
class User(InnerDoc):
@classmethod
@@ -266,7 +270,9 @@ def scroll_search(
and no aggregation information is given.
Scrolling is done by calling this function again and again with the same ``scroll_id``.
- Each time, this function will return the next batch of search results.
+ Each time, this function will return the next batch of search results. If the
+ ``scroll_id`` is not available anymore, a new ``scroll_id`` is assigned and scrolling
+ starts from the beginning again.
See :func:`aggregate_search` for additional ``kwargs``
@@ -276,6 +282,7 @@ def scroll_search(
size: The batch size in number of hits.
scroll: The time the scroll should be kept alive (i.e. the time between requests
to this method) in ES time units. Default is 5 minutes.
+ Returns: A tuple with ``scroll_id``, total amount of hits, and result list.
"""
es = infrastructure.elastic_client
@@ -289,7 +296,10 @@ def scroll_search(
# no results for search query
return None, 0, []
else:
- resp = es.scroll(scroll_id, scroll=scroll)  # pylint: disable=E1123
+ try:
+     resp = es.scroll(scroll_id, scroll=scroll)  # pylint: disable=E1123
+ except NotFoundError:
+     raise ScrollIdNotFound()
total = resp['hits']['total']
results = [hit['_source'] for hit in resp['hits']['hits']]
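With this change, an expired Elasticsearch scroll context surfaces as ``ScrollIdNotFound`` instead of an unhandled ``NotFoundError``, and the API layer above maps it to an HTTP 400. A sketch of how a caller of ``scroll_search`` might drain a scroll and react to the new exception; the loop itself and the forwarded keyword arguments are illustrative, not part of this commit:

# Illustrative consumer of scroll_search and the new ScrollIdNotFound
# exception; this helper is a sketch, not code from this commit.
from nomad import search


def iterate_hits(size=1000, **kwargs):
    """Yield every hit of a query by scrolling through the index."""
    scroll_id = None
    while True:
        try:
            scroll_id, _total, results = search.scroll_search(
                scroll_id=scroll_id, size=size, **kwargs)
        except search.ScrollIdNotFound:
            # The scroll view expired between calls (NotFoundError from
            # elasticsearch-py); give up here, the REST API returns HTTP 400.
            break
        if not results:
            break
        yield from results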