Version: 0.9.0

Paginate results

You can use the following methods to paginate search results in Lucenia:

The from and size parameters
The scroll search operation
The search_after parameter
Point in Time with search_after

The `from` and `size` parameters

The from and size parameters return results one page at a time.

The from parameter is the document number from which you want to start showing the results. The size parameter is the number of results that you want to show. Together, they let you return a subset of the search results.

For example, if the value of size is 10 and the value of from is 0, you see the first 10 results. If you change the value of from to 10, you see the next 10 results (because the results are zero-indexed). So if you want to see results starting from result 11, from must be 10.

GET shakespeare/_search
{
  "from": 0,
  "size": 10,
  "query": {
    "match": {
      "play_name": "Hamlet"
    }
  }
}

Use the following formula to calculate the from parameter relative to the page number:

from = size * (page_number - 1)

Each time the user chooses the next page of the results, your application needs to run the same search query with an incremented from value.

You can also specify the from and size parameters in the search URI:

GET shakespeare/_search?from=0&size=10

If you only specify the size parameter, the from parameter defaults to 0.

Querying for pages deep in your results can have a significant performance impact, so Lucenia limits this approach to 10,000 results.

The from and size parameters are stateless, so the results are based on the latest available data. This can cause inconsistent pagination. For example, assume a user stays on the first page of the results and then navigates to the second page. During that time, a new document relevant to the user's search is indexed and shows up on the first page. In this scenario, the last result on the first page is pushed to the second page, and the user sees duplicate results (that is, the first and second pages both display that last result).

Use the scroll operation for consistent pagination. The scroll operation keeps a search context open for a certain period of time. Any data changes do not affect the results during that time.

Scroll search

The from and size parameters allow you to paginate your search results but with a limit of 10,000 results at a time.

If you need to request volumes of data larger than 1 PB from, for example, a machine learning job, use the scroll operation instead. The scroll operation allows you to request an unlimited number of results.

To use the scroll operation, add a scroll parameter to the request header with a search context telling Lucenia for how long you need to keep scrolling. This search context needs to be long enough to process a single batch of results.

To set the number of results that you want returned for each batch, use the size parameter:

GET shakespeare/_search?scroll=10m
{
  "size": 10000
}

Lucenia caches the results and returns a scroll ID that you can use to access them in batches:

"_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAAUWdmpUZDhnRFBUcWFtV21nMmFwUGJEQQ=="

Pass this scroll ID to the scroll operation to obtain the next batch of results:

GET _search/scroll
{
  "scroll": "10m",
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAAUWdmpUZDhnRFBUcWFtV21nMmFwUGJEQQ=="
}

Using this scroll ID, you get results in batches of 10,000 as long as the search context is still open. Typically, the scroll ID does not change between requests, but it can change, so make sure to always use the latest scroll ID. If you don't send the next scroll request within the set search context, the scroll operation does not return any results.

If you expect billions of results, use a sliced scroll. Slicing allows you to perform multiple scroll operations for the same request but in parallel. Set the ID and the maximum number of slices for the scroll:

GET shakespeare/_search?scroll=10m
{
  "slice": {
    "id": 0,
    "max": 10
  },
  "query": {
    "match_all": {}
  }
}

With a single scroll ID, you receive 10 results. You can have up to 10 IDs. Perform the same command with the ID equal to 1:

GET shakespeare/_search?scroll=10m
{
  "slice": {
    "id": 1,
    "max": 10
  },
  "query": {
    "match_all": {}
  }
}

Close the search context when you’re done scrolling, because it continues to consume computing resources until the timeout:

DELETE _search/scroll/DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAAcWdmpUZDhnRFBUcWFtV21nMmFwUGJEQQ==

Sample Response

{
  "succeeded": true,
  "num_freed": 1
}

Use the following request to close all open scroll contexts:

DELETE _search/scroll/_all

The scroll operation corresponds to a specific timestamp. It doesn't consider documents added after that timestamp as potential results.

Because open search contexts consume a lot of memory, we suggest you don't use the scroll operation for frequent user queries that don't need the search context to be open. Instead, use the sort parameter with the search_after parameter to scroll responses for user queries.

The `search_after` parameter

The search_after parameter provides a live cursor that uses the previous page's results to obtain the next page's results. It is similar to the scroll operation in that it is meant to scroll many queries in parallel.

For example, the following query sorts all lines from the play "Hamlet" by the speech number and then the ID and retrieves the first three results:

GET shakespeare/_search
{
  "size": 3,
  "query": {
    "match": {
      "play_name": "Hamlet"
    }
  },
  "sort": [
    { "speech_number": "asc" },
    { "_id": "asc" } 
  ]
}

The response contains the sort array of values for each document:

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4244,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "shakespeare",
        "_id" : "32435",
        "_score" : null,
        "_source" : {
          "type" : "line",
          "line_id" : 32436,
          "play_name" : "Hamlet",
          "speech_number" : 1,
          "line_number" : "1.1.1",
          "speaker" : "BERNARDO",
          "text_entry" : "Whos there?"
        },
        "sort" : [
          1,
          "32435"
        ]
      },
      {
        "_index" : "shakespeare",
        "_id" : "32634",
        "_score" : null,
        "_source" : {
          "type" : "line",
          "line_id" : 32635,
          "play_name" : "Hamlet",
          "speech_number" : 1,
          "line_number" : "1.2.1",
          "speaker" : "KING CLAUDIUS",
          "text_entry" : "Though yet of Hamlet our dear brothers death"
        },
        "sort" : [
          1,
          "32634"
        ]
      },
      {
        "_index" : "shakespeare",
        "_id" : "32635",
        "_score" : null,
        "_source" : {
          "type" : "line",
          "line_id" : 32636,
          "play_name" : "Hamlet",
          "speech_number" : 1,
          "line_number" : "1.2.2",
          "speaker" : "KING CLAUDIUS",
          "text_entry" : "The memory be green, and that it us befitted"
        },
        "sort" : [
          1,
          "32635"
        ]
      }
    ]
  }
}

You can use the last result's sort values to retrieve the next result by using the search_after parameter:

GET shakespeare/_search
{
  "size": 10,
  "query": {
    "match": {
      "play_name": "Hamlet"
    }
  },
  "search_after": [ 1, "32635"],
  "sort": [
    { "speech_number": "asc" },
    { "_id": "asc" } 
  ]
}

Unlike the scroll operation, the search_after parameter is stateless, so the document order may change because of documents being indexed or deleted.

Point in Time with `search_after`

Point in Time (PIT) with search_after is the preferred pagination method in Lucenia, especially for deep pagination. It bypasses the limitations of all other methods because it operates on a dataset that is frozen in time, it is not bound to a query, and it supports consistent pagination going forward and backward. To learn more, see Point in Time.