Version: 0.11.0

Rerank prepare processor

Introduced 0.11.0

The rerank_prepare processor is an ingest-time processor that annotates chunks with document-level metadata and position scores. These annotations enable more effective search-time reranking by the multimodal_rerank response processor.

Syntax

{
  "rerank_prepare": {
    "field": "chunks",
    "document_title_field": "title",
    "document_url_field": "url",
    "include_position_score": true
  }
}

How it works

Before rerank_prepare:
┌────────────────────────────────┐
│ document:                      │
│   title: "Annual Report 2024"  │
│   url: "s3://bucket/report.pdf"│
│   chunks: [                    │
│     { text: "Executive...",    │
│       chunk_index: 0 },        │
│     { text: "Revenue...",      │
│       chunk_index: 1 },        │
│     { text: "Appendix...",     │
│       chunk_index: 2 }         │
│   ]                            │
└────────────────────────────────┘

                │
                ▼  rerank_prepare

After rerank_prepare:
┌──────────────────────────────────────────┐
│ chunks: [                                │
│   { text: "Executive...",                │
│     chunk_index: 0,                      │
│     document_title: "Annual Report 2024",│
│     document_url: "s3://bucket/...",     │
│     chunk_position: 0,                   │
│     total_chunks: 3,                     │
│     position_score: 1.0 },               │ ◄── highest (first chunk)
│   { text: "Revenue...",                  │
│     chunk_position: 1,                   │
│     position_score: 0.22 },              │ ◄── decays exponentially
│   { text: "Appendix...",                 │
│     chunk_position: 2,                   │
│     position_score: 0.05 }               │ ◄── lowest (last chunk)
│ ]                                        │
└──────────────────────────────────────────┘

Position score

The position score uses exponential decay to encode the intuition that earlier chunks in a document (such as the executive summary, abstract, or introduction) are often more relevant than later ones:

position_score = exp(-3.0 × (position / (total_chunks - 1)))

For a document with 10 chunks (zero-based positions 0 through 9):

Position 0 (first): 1.00
Position 1 of 10: 0.72
Position 5 of 10: 0.19
Position 9 of 10 (last): 0.05

A single-chunk document always gets a position score of 1.0, because the formula would otherwise divide by zero when total_chunks is 1.
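
The formula can be checked with a few lines of Python. This is a minimal sketch of the arithmetic only, not the processor's actual implementation; the function name position_score and the constant name DECAY_RATE are chosen here for illustration, and the range check on the arguments is an assumption.

```python
import math

DECAY_RATE = 3.0  # decay constant from the documented formula


def position_score(position: int, total_chunks: int) -> float:
    """Exponential-decay score for a zero-based chunk position."""
    if total_chunks < 1 or not 0 <= position < total_chunks:
        # Assumption: out-of-range input is rejected in this sketch.
        raise ValueError("position must satisfy 0 <= position < total_chunks")
    if total_chunks == 1:
        # Single-chunk documents get 1.0, which also avoids dividing by zero.
        return 1.0
    return math.exp(-DECAY_RATE * position / (total_chunks - 1))


if __name__ == "__main__":
    # Print the scores for a few positions in a 10-chunk document.
    for pos in (0, 1, 5, 9):
        print(pos, round(position_score(pos, 10), 2))
```

Because the position is divided by total_chunks - 1, the first chunk always scores 1.0 and the last chunk always scores exp(-3.0), regardless of document length.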

This score is a hint for the search-time reranker, not a hard filter. The reranker combines it with semantic relevance to produce the final ranking.
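
To illustrate what "hint, not a hard filter" can mean in practice, the sketch below blends a semantic score with the position score using a weighted average. This is purely hypothetical: the page does not specify how multimodal_rerank combines the two signals, and the function blended_score, the semantic_score field, and the position_weight value of 0.1 are all assumptions made for the example.

```python
def blended_score(semantic_score: float, position_score: float,
                  position_weight: float = 0.1) -> float:
    """Weighted average of semantic relevance and the position hint.

    Hypothetical combination rule; the real reranker may differ.
    """
    if not 0.0 <= position_weight <= 1.0:
        raise ValueError("position_weight must be between 0.0 and 1.0")
    return ((1.0 - position_weight) * semantic_score
            + position_weight * position_score)


# Two candidate chunks: the first chunk has the higher position score,
# but the second chunk is the better semantic match.
candidates = [
    {"text": "Executive...", "semantic_score": 0.60, "position_score": 1.00},
    {"text": "Revenue...", "semantic_score": 0.90, "position_score": 0.22},
]

ranked = sorted(
    candidates,
    key=lambda c: blended_score(c["semantic_score"], c["position_score"]),
    reverse=True,
)
```

With a small weight, the chunk with the stronger semantic match still ranks first even though its position score is lower, so the position score only nudges the ordering.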

Configuration parameters

| Parameter | Data type | Required/Optional | Description |
| --- | --- | --- | --- |
| field | String | Optional | Source field containing the chunk array. Default is chunks. |
| document_title_field | String | Optional | Document field to read the title from. Default is title. |
| document_url_field | String | Optional | Document field to read the URL from. Default is url. |
| include_position_score | Boolean | Optional | Whether to compute exponential decay position scores. Default is true. |
| description | String | Optional | A brief description of the processor. |
| tag | String | Optional | An identifier tag for the processor. |
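
When pipeline definitions are generated from code, the defaults listed above can be captured in one place. The following is a minimal sketch under that assumption; the class RerankPrepareConfig and its to_processor method are hypothetical helpers for building the JSON body and are not part of any client library.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class RerankPrepareConfig:
    """Client-side mirror of the documented parameters and their defaults."""
    field: str = "chunks"
    document_title_field: str = "title"
    document_url_field: str = "url"
    include_position_score: bool = True
    description: Optional[str] = None
    tag: Optional[str] = None

    def to_processor(self) -> dict:
        """Return the processor entry, omitting unset optional strings."""
        body = {k: v for k, v in asdict(self).items() if v is not None}
        return {"rerank_prepare": body}


# Override only the URL field and keep the other defaults.
processor = RerankPrepareConfig(document_url_field="source_uri").to_processor()
```

The resulting dictionary can be serialized with json.dumps and placed in the processors array of a pipeline definition.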

Output fields per chunk

| Field | Description |
| --- | --- |
| document_title | Copied from the parent document's title field. |
| document_url | Copied from the parent document's URL field. |
| chunk_position | Zero-based position in the chunk list. |
| total_chunks | Total number of chunks in the document. |
| position_score | Exponential decay score [0.0, 1.0] (if include_position_score is true). |
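
The output fields above can be reproduced with a short Python sketch that annotates an in-memory document. It is an illustration of the documented behavior, not the processor's source code; the function name annotate_chunks is hypothetical, and copying None when the title or URL field is missing is an assumption, because the page does not describe that case.

```python
import math
from typing import Any


def annotate_chunks(
    document: dict[str, Any],
    field: str = "chunks",
    document_title_field: str = "title",
    document_url_field: str = "url",
    include_position_score: bool = True,
) -> dict[str, Any]:
    """Add the documented per-chunk output fields to each chunk, in place."""
    chunks = document.get(field, [])
    total = len(chunks)
    for position, chunk in enumerate(chunks):
        # Assumption: a missing title or URL is copied as None.
        chunk["document_title"] = document.get(document_title_field)
        chunk["document_url"] = document.get(document_url_field)
        chunk["chunk_position"] = position
        chunk["total_chunks"] = total
        if include_position_score:
            # Single-chunk documents get 1.0; otherwise exponential decay.
            fraction = position / (total - 1) if total > 1 else 0.0
            chunk["position_score"] = math.exp(-3.0 * fraction)
    return document


doc = {
    "title": "Annual Report 2024",
    "url": "s3://bucket/report.pdf",
    "chunks": [
        {"text": "Executive...", "chunk_index": 0},
        {"text": "Revenue...", "chunk_index": 1},
        {"text": "Appendix...", "chunk_index": 2},
    ],
}
annotated = annotate_chunks(doc)
```

Every chunk receives the same document_title, document_url, and total_chunks values; only chunk_position and position_score vary from chunk to chunk.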

Using the processor

Example: Full pipeline with rerank preparation

PUT _ingest/pipeline/doc-with-rerank
{
  "processors": [
    {
      "content_extract": {
        "input_mode": "reference",
        "source_uri_field": "source_uri",
        "preserve_image_data": true,
        "reference_config": { "region": "us-east-2" }
      }
    },
    {
      "ocr": {
        "model_id": "anthropic.claude-3-haiku-20240307-v1:0",
        "provider": "bedrock",
        "provider_config": { "region": "us-east-2" }
      }
    },
    {
      "chunk": {
        "algorithm": "recursive",
        "chunk_size": 2000,
        "chunk_overlap": 200
      }
    },
    {
      "rerank_prepare": {
        "field": "chunks",
        "document_title_field": "title",
        "document_url_field": "source_uri"
      }
    },
    {
      "embed": {
        "field": "chunks",
        "model_id": "amazon.titan-embed-text-v2:0",
        "provider": "bedrock",
        "dimensions": 1024,
        "provider_config": { "region": "us-east-2" }
      }
    }
  ]
}