Rerank prepare processor
The rerank_prepare processor is an ingest-time processor that annotates chunks with document-level metadata and position scores. These annotations enable more effective search-time reranking by the multimodal_rerank response processor.
Syntax
{
"rerank_prepare": {
"field": "chunks",
"document_title_field": "title",
"document_url_field": "url",
"include_position_score": true
}
}
How it works
Before rerank_prepare:
┌────────────────────────────────┐
│ document: │
│ title: "Annual Report 2024" │
│ url: "s3://bucket/report.pdf"│
│ chunks: [ │
│ { text: "Executive...", │
│ chunk_index: 0 }, │
│ { text: "Revenue...", │
│ chunk_index: 1 }, │
│ { text: "Appendix...", │
│ chunk_index: 2 } │
│ ] │
└────────────────────────────────┘
│
rerank_prepare
│
▼
After rerank_prepare:
┌──────────────────────────────────────────┐
│ chunks: [ │
│ { text: "Executive...", │
│ chunk_index: 0, │
│ document_title: "Annual Report 2024",│
│ document_url: "s3://bucket/...", │
│ chunk_position: 0, │
│ total_chunks: 3, │
│ position_score: 1.0 }, ◄── highest (first chunk)
│ { text: "Revenue...", │
│ chunk_position: 1, │
│ position_score: 0.22 }, ◄── decays exponentially
│ { text: "Appendix...", │
│ chunk_position: 2, │
│ position_score: 0.05 } ◄── lowest (last chunk)
│ ] │
└──────────────────────────────────────────┘
Position score
The position score uses exponential decay to encode the intuition that earlier chunks in a document are often more relevant (executive summary, abstract, introduction):
position_score = exp(-3.0 × (position / (total_chunks - 1)))
Position 0 (first): 1.00
Position 1 of 10: 0.72
Position 5 of 10: 0.22
Position 9 of 10: 0.05
Single-chunk documents always get 1.0.
This score is a hint for the search-time reranker, not a hard filter. The reranker combines it with semantic relevance to produce the final ranking.
Configuration parameters
| Parameter | Data type | Required/Optional | Description |
|---|---|---|---|
field | String | Optional | Source field containing the chunk array. Default is chunks. |
document_title_field | String | Optional | Document field to read the title from. Default is title. |
document_url_field | String | Optional | Document field to read the URL from. Default is url. |
include_position_score | Boolean | Optional | Whether to compute exponential decay position scores. Default is true. |
description | String | Optional | A brief description of the processor. |
tag | String | Optional | An identifier tag for the processor. |
Output fields per chunk
| Field | Description |
|---|---|
document_title | Copied from the parent document's title field. |
document_url | Copied from the parent document's URL field. |
chunk_position | Zero-based position in the chunk list. |
total_chunks | Total number of chunks in the document. |
position_score | Exponential decay score [0.0, 1.0] (if include_position_score is true). |
Using the processor
Example: Full pipeline with rerank preparation
PUT _ingest/pipeline/doc-with-rerank
{
"processors": [
{
"content_extract": {
"input_mode": "reference",
"source_uri_field": "source_uri",
"preserve_image_data": true,
"reference_config": { "region": "us-east-2" }
}
},
{
"ocr": {
"model_id": "anthropic.claude-3-haiku-20240307-v1:0",
"provider": "bedrock",
"provider_config": { "region": "us-east-2" }
}
},
{
"chunk": {
"algorithm": "recursive",
"chunk_size": 2000,
"chunk_overlap": 200
}
},
{
"rerank_prepare": {
"field": "chunks",
"document_title_field": "title",
"document_url_field": "source_uri"
}
},
{
"embed": {
"field": "chunks",
"model_id": "amazon.titan-embed-text-v2:0",
"provider": "bedrock",
"dimensions": 1024,
"provider_config": { "region": "us-east-2" }
}
}
]
}