Truncate hits processor
The truncate_hits response processor discards returned search hits after a given hit count is reached. The truncate_hits processor is designed to work with the oversample request processor but may be used on its own.
The target_size parameter (which specifies where to truncate) is optional. If it is not specified, then Lucenia uses the original_size variable set by the
oversample processor (if available).
The following is a common usage pattern:
- Add the oversampleprocessor to a request pipeline to fetch a larger set of results.
- In the response pipeline, apply a reranking processor (which may promote results from beyond the originally requested top N) or the collapseprocessor (which may discard results after deduplication).
- Apply the truncateprocessor to return (at most) the originally requested number of hits.
Request fields
The following table lists all request fields.
| Field | Data type | Description | 
|---|---|---|
| target_size | Integer | The maximum number of search hits to return (>=0). If not specified, the processor will try to read the original_sizevariable and will fail if it is not available. Optional. | 
| context_prefix | String | May be used to read the original_sizevariable from a specific scope in order to avoid collisions. Optional. | 
| tag | String | The processor's identifier. Optional. | 
| description | String | A description of the processor. Optional. | 
| ignore_failure | Boolean | If true, Lucenia ignores any failure of this processor and continues to run the remaining processors in the search pipeline. Optional. Default isfalse. | 
Example
The following example demonstrates using a search pipeline with a truncate processor.
Setup
Create an index named my_index containing many documents:
POST /_bulk
{ "create":{"_index":"my_index","_id":1}}
{ "doc": { "title" : "document 1" }}
{ "create":{"_index":"my_index","_id":2}}
{ "doc": { "title" : "document 2" }}
{ "create":{"_index":"my_index","_id":3}}
{ "doc": { "title" : "document 3" }}
{ "create":{"_index":"my_index","_id":4}}
{ "doc": { "title" : "document 4" }}
{ "create":{"_index":"my_index","_id":5}}
{ "doc": { "title" : "document 5" }}
{ "create":{"_index":"my_index","_id":6}}
{ "doc": { "title" : "document 6" }}
{ "create":{"_index":"my_index","_id":7}}
{ "doc": { "title" : "document 7" }}
{ "create":{"_index":"my_index","_id":8}}
{ "doc": { "title" : "document 8" }}
{ "create":{"_index":"my_index","_id":9}}
{ "doc": { "title" : "document 9" }}
{ "create":{"_index":"my_index","_id":10}}
{ "doc": { "title" : "document 10" }}
Creating a search pipeline
The following request creates a search pipeline named my_pipeline with a truncate_hits response processor that discards hits after the first five:
PUT /_search/pipeline/my_pipeline 
{
  "response_processors": [
    {
      "truncate_hits" : {
        "tag" : "truncate_1",
        "description" : "This processor will discard results after the first 5.",
        "target_size" : 5
      }
    }
  ]
}
Using a search pipeline
Search for documents in my_index without a search pipeline:
POST /my_index/_search
{
  "size": 8
}
The response contains eight hits:
Response
{
  "took" : 13,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 1"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 2"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 3"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 4"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "5",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 5"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "6",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 6"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "7",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 7"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "8",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 8"
          }
        }
      }
    ]
  }
}
To search with a pipeline, specify the pipeline name in the search_pipeline query parameter:
POST /my_index/_search?search_pipeline=my_pipeline
{
  "size": 8
}
The response contains only 5 hits, even though 8 were requested and 10 were available:
Response
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 1"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 2"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 3"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 4"
          }
        }
      },
      {
        "_index" : "my_index",
        "_id" : "5",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "title" : "document 5"
          }
        }
      }
    ]
  }
}
Oversample, collapse, and truncate hits
The following is a more realistic example in which you use oversample to request many candidate documents, use collapse to remove documents that duplicate a particular field (to get more diverse results), and then use truncate to return the originally requested document count (to avoid returning a large result payload from the cluster).
Setup
Create many documents containing a field that you'll use for collapsing:
POST /_bulk
{ "create":{"_index":"my_index","_id":1}}
{ "title" : "document 1", "color":"blue" }
{ "create":{"_index":"my_index","_id":2}}
{ "title" : "document 2", "color":"blue" }
{ "create":{"_index":"my_index","_id":3}}
{ "title" : "document 3", "color":"red" }
{ "create":{"_index":"my_index","_id":4}}
{ "title" : "document 4", "color":"red" }
{ "create":{"_index":"my_index","_id":5}}
{ "title" : "document 5", "color":"yellow" }
{ "create":{"_index":"my_index","_id":6}}
{ "title" : "document 6", "color":"yellow" }
{ "create":{"_index":"my_index","_id":7}}
{ "title" : "document 7", "color":"orange" }
{ "create":{"_index":"my_index","_id":8}}
{ "title" : "document 8", "color":"orange" }
{ "create":{"_index":"my_index","_id":9}}
{ "title" : "document 9", "color":"green" }
{ "create":{"_index":"my_index","_id":10}}
{ "title" : "document 10", "color":"green" }
Create a pipeline that collapses only on the color field:
PUT /_search/pipeline/collapse_pipeline
{
  "response_processors": [
    {
      "collapse" : {
        "field": "color"
      }
    }
  ]
}
Create another pipeline that oversamples, collapses, and then truncates results:
PUT /_search/pipeline/oversampling_collapse_pipeline
{
  "request_processors": [
    {
      "oversample": {
        "sample_factor": 3
      }
    }
  ],
  "response_processors": [
    {
      "collapse" : {
        "field": "color"
      }
    },
    {
      "truncate_hits": {
        "description": "Truncates back to the original size before oversample increased it."
      }
    }
  ]
}
Collapse without oversample
In this example, you request the top three documents before collapsing on the color field. Because the first two documents have the same color, the second one is discarded, and the request returns the first and third documents:
POST /my_index/_search?search_pipeline=collapse_pipeline
{
  "size": 3
}
Response
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "document 1",
          "color" : "blue"
        }
      },
      {
        "_index" : "my_index",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "title" : "document 3",
          "color" : "red"
        }
      }
    ]
  },
  "profile" : {
    "shards" : [ ]
  }
}
Oversample, collapse, and truncate
Now you will use the oversampling_collapse_pipeline, which requests the top 9 documents (multiplying the size by 3), deduplicates by color, and then returns the top 3 hits:
POST /my_index/_search?search_pipeline=oversampling_collapse_pipeline
{
  "size": 3
}
Response
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "document 1",
          "color" : "blue"
        }
      },
      {
        "_index" : "my_index",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "title" : "document 3",
          "color" : "red"
        }
      },
      {
        "_index" : "my_index",
        "_id" : "5",
        "_score" : 1.0,
        "_source" : {
          "title" : "document 5",
          "color" : "yellow"
        }
      }
    ]
  },
  "profile" : {
    "shards" : [ ]
  }
}