Version: 0.11.0

Rerank prepare processor

Introduced 0.11.0

The rerank_prepare processor is an ingest-time processor that annotates chunks with document-level metadata and position scores. These annotations enable more effective search-time reranking by the multimodal_rerank response processor.

Syntax

{
  "rerank_prepare": {
    "field": "chunks",
    "document_title_field": "title",
    "document_url_field": "url",
    "include_position_score": true
  }
}

How it works

Before rerank_prepare:
┌────────────────────────────────┐
│ document:                      │
│   title: "Annual Report 2024"  │
│   url: "s3://bucket/report.pdf"│
│   chunks: [                    │
│     { text: "Executive...",    │
│       chunk_index: 0 },        │
│     { text: "Revenue...",      │
│       chunk_index: 1 },        │
│     { text: "Appendix...",     │
│       chunk_index: 2 }         │
│   ]                            │
└────────────────────────────────┘
                │
        rerank_prepare
                │
                ▼
After rerank_prepare:
┌──────────────────────────────────────────┐
│ chunks: [                                │
│   { text: "Executive...",                │
│     chunk_index: 0,                      │
│     document_title: "Annual Report 2024",│
│     document_url: "s3://bucket/...",     │
│     chunk_position: 0,                   │
│     total_chunks: 3,                     │
│     position_score: 1.0 },        ◄── highest (first chunk)
│   { text: "Revenue...",                  │
│     chunk_position: 1,                   │
│     position_score: 0.22 },       ◄── decays exponentially
│   { text: "Appendix...",                 │
│     chunk_position: 2,                   │
│     position_score: 0.05 }        ◄── lowest (last chunk)
│ ]                                        │
└──────────────────────────────────────────┘

Position score

The position score uses exponential decay to encode the intuition that earlier chunks in a document are often more relevant (executive summary, abstract, introduction):

position_score = exp(-3.0 × (position / (total_chunks - 1)))

Position 0 (first):  1.00
Position 1 of 10:    0.72
Position 5 of 10:    0.22
Position 9 of 10:    0.05

Single-chunk documents always get 1.0.

This score is a hint for the search-time reranker, not a hard filter. The reranker combines it with semantic relevance to produce the final ranking.

Configuration parameters

Parameter	Data type	Required/Optional	Description
`field`	String	Optional	Source field containing the chunk array. Default is `chunks`.
`document_title_field`	String	Optional	Document field to read the title from. Default is `title`.
`document_url_field`	String	Optional	Document field to read the URL from. Default is `url`.
`include_position_score`	Boolean	Optional	Whether to compute exponential decay position scores. Default is `true`.
`description`	String	Optional	A brief description of the processor.
`tag`	String	Optional	An identifier tag for the processor.

Output fields per chunk

Field	Description
`document_title`	Copied from the parent document's title field.
`document_url`	Copied from the parent document's URL field.
`chunk_position`	Zero-based position in the chunk list.
`total_chunks`	Total number of chunks in the document.
`position_score`	Exponential decay score `[0.0, 1.0]` (if `include_position_score` is `true`).

Using the processor

Example: Full pipeline with rerank preparation

PUT _ingest/pipeline/doc-with-rerank
{
  "processors": [
    {
      "content_extract": {
        "input_mode": "reference",
        "source_uri_field": "source_uri",
        "preserve_image_data": true,
        "reference_config": { "region": "us-east-2" }
      }
    },
    {
      "ocr": {
        "model_id": "anthropic.claude-3-haiku-20240307-v1:0",
        "provider": "bedrock",
        "provider_config": { "region": "us-east-2" }
      }
    },
    {
      "chunk": {
        "algorithm": "recursive",
        "chunk_size": 2000,
        "chunk_overlap": 200
      }
    },
    {
      "rerank_prepare": {
        "field": "chunks",
        "document_title_field": "title",
        "document_url_field": "source_uri"
      }
    },
    {
      "embed": {
        "field": "chunks",
        "model_id": "amazon.titan-embed-text-v2:0",
        "provider": "bedrock",
        "dimensions": 1024,
        "provider_config": { "region": "us-east-2" }
      }
    }
  ]
}

Syntax​

How it works​

Position score​

Configuration parameters​

Output fields per chunk​

Using the processor​

Example: Full pipeline with rerank preparation​