Multimodal rerank processor
The multimodal_rerank processor is a search response processor that reranks the top_n search results using a multimodal inference model (such as Claude 3 on Bedrock). It sends each query-passage pair to a vision-capable LLM, which scores the hit's relevance to the query, and then reorders the results by the returned relevance scores.
This is particularly useful for cross-modal search (text query against image tiles) and for improving precision on the first page of results.
Syntax
PUT /_search/pipeline/reranked-search
{
  "response_processors": [
    {
      "multimodal_rerank": {
        "model_id": "anthropic.claude-3-haiku-20240307-v1:0",
        "provider": "bedrock",
        "content_field": "text",
        "top_n": 10,
        "provider_config": {
          "region": "us-east-2"
        }
      }
    }
  ]
}
How it works
Initial search results (by vector similarity):
┌─────────────────────────────────────┐
│ Hit 1: score 0.92 "Budget report"   │ ← vector match but not relevant
│ Hit 2: score 0.89 "Revenue growth"  │ ← highly relevant
│ Hit 3: score 0.87 "Staff directory" │ ← not relevant
│ Hit 4: score 0.85 "Q4 performance"  │ ← highly relevant
│ Hit 5: score 0.83 "Office memo"     │ ← not relevant
│ ...top_n hits sent to reranker...   │
└─────────────────────┬───────────────┘
                      │
         multimodal_rerank processor
                      │
       For each hit, sends to Claude:
       "Query: quarterly financial results
        Passage: [hit text/image]
        Score relevance 0.0-1.0"
                      │
                      ▼
Reranked results:
┌─────────────────────────────────────┐
│ Hit 1: score 0.95 "Revenue growth"  │ ← promoted
│ Hit 2: score 0.91 "Q4 performance"  │ ← promoted
│ Hit 3: score 0.42 "Budget report"   │ ← demoted
│ Hit 4: score 0.15 "Staff directory" │ ← demoted
│ Hit 5: score 0.08 "Office memo"     │ ← demoted
└─────────────────────────────────────┘
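The per-hit exchange in the diagram can be sketched in Python as follows. This is an illustration only, not the processor's actual code: the function names build_prompt and parse_score are hypothetical, the prompt wording is copied from the diagram, and falling back to 0.0 when the reply contains no number is an assumption of this sketch.

```python
import re


def build_prompt(query: str, passage: str) -> str:
    """Format one query-hit pair using the wording shown in the diagram."""
    return (
        f"Query: {query}\n"
        f"Passage: {passage}\n"
        "Score relevance 0.0-1.0"
    )


def parse_score(reply: str) -> float:
    """Extract the first unsigned decimal number from the model reply.

    The value is clamped to [0.0, 1.0]. Returning 0.0 for a reply with
    no number is an assumption of this sketch, not documented behavior.
    """
    match = re.search(r"\d*\.\d+|\d+", reply)
    if match is None:
        return 0.0
    return min(1.0, max(0.0, float(match.group())))


# Hypothetical usage with the query from the diagram.
print(build_prompt("quarterly financial results", "Revenue growth"))
print(parse_score("0.95"))
```

Clamping keeps an out-of-range reply from distorting the final ordering, which matters because the reranked position depends only on the parsed score.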
Configuration parameters
| Parameter | Data type | Required/Optional | Description |
|---|---|---|---|
| model_id | String | Required | Inference model identifier (e.g., anthropic.claude-3-haiku-20240307-v1:0). |
| provider | String | Required | Inference provider: bedrock or http. |
| content_field | String | Optional | Source field in each hit containing passage text. Default is text. |
| image_field | String | Optional | Source field in each hit containing base64 image data. Enables cross-modal reranking. |
| top_n | Integer | Optional | Number of top results to rerank. Results beyond top_n keep their original order. Default is 10. |
| provider_config | Object | Optional | Provider-specific configuration (region, credentials). |
| tag | String | Optional | The processor's identifier. |
| description | String | Optional | A description of the processor. |
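To make the required fields and defaults in the table concrete, here is a minimal Python sketch that checks a processor configuration before it is sent to the cluster. The helper normalize_config is hypothetical and not part of any client library, and the rule that top_n must be a positive integer is an assumption; the table only states its type and default.

```python
from typing import Any

# Values taken from the parameter table above.
ALLOWED_PROVIDERS = {"bedrock", "http"}
DEFAULTS = {"content_field": "text", "top_n": 10}


def normalize_config(config: dict[str, Any]) -> dict[str, Any]:
    """Check required fields and fill in the documented defaults."""
    for key in ("model_id", "provider"):
        value = config.get(key)
        if not isinstance(value, str) or not value:
            raise ValueError(f"'{key}' is required and must be a non-empty string")
    if config["provider"] not in ALLOWED_PROVIDERS:
        raise ValueError("'provider' must be 'bedrock' or 'http'")

    merged = {**DEFAULTS, **config}

    # Assumption of this sketch: top_n must be a positive integer.
    top_n = merged["top_n"]
    if isinstance(top_n, bool) or not isinstance(top_n, int) or top_n < 1:
        raise ValueError("'top_n' must be a positive integer")
    return merged


config = normalize_config({
    "model_id": "anthropic.claude-3-haiku-20240307-v1:0",
    "provider": "bedrock",
})
print(config["content_field"], config["top_n"])  # text 10
```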
Query input
The processor reads query content from ext.query_embedding in the search request (the same extension used by the query_embedding request processor), so it has access to the original query text, image, or both.
Scoring
The processor sends each query-hit pair to the inference model with the rerank task type. The model returns a relevance score between 0.0 and 1.0:
- 1.0 = perfectly relevant
- 0.0 = completely irrelevant
Hits are stable-sorted by descending score, so hits with equal scores keep their original relative order. Hit scores are updated in place.
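The ordering rules described here (only the first top_n hits are rescored, ties keep their original order, and later hits are left alone) can be sketched in Python. This is a sketch of the described behavior, not the processor's source: the rerank function, the _score and title keys, and the lookup table standing in for the model are all illustrative assumptions.

```python
from typing import Callable


def rerank(
    hits: list[dict],
    score_fn: Callable[[dict], float],
    top_n: int = 10,
) -> list[dict]:
    """Rescore and reorder the first top_n hits; leave the rest untouched."""
    head, tail = hits[:top_n], hits[top_n:]
    for hit in head:
        hit["_score"] = score_fn(hit)  # score is updated in place
    # list.sort is stable, also with reverse=True, so equal scores
    # keep their original relative order.
    head.sort(key=lambda h: h["_score"], reverse=True)
    return head + tail


# Stand-in for the inference model: relevance scores from the diagram.
model_scores = {
    "Budget report": 0.42,
    "Revenue growth": 0.95,
    "Staff directory": 0.15,
    "Q4 performance": 0.91,
    "Office memo": 0.08,
}
hits = [
    {"title": "Budget report", "_score": 0.92},
    {"title": "Revenue growth", "_score": 0.89},
    {"title": "Staff directory", "_score": 0.87},
    {"title": "Q4 performance", "_score": 0.85},
    {"title": "Office memo", "_score": 0.83},
]
reranked = rerank(hits, lambda h: model_scores[h["title"]], top_n=5)
print([h["title"] for h in reranked])
# ['Revenue growth', 'Q4 performance', 'Budget report', 'Staff directory', 'Office memo']
```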
Using the processor
Example 1: Text reranking
PUT /_search/pipeline/reranked-semantic
{
  "request_processors": [
    {
      "query_embedding": {
        "model_id": "amazon.titan-embed-text-v2:0",
        "provider": "bedrock",
        "dimensions": 1024,
        "mode": "template",
        "query_template": "{\"query\":{\"nested\":{\"path\":\"chunks\",\"query\":{\"knn\":{\"chunks.embedding\":{\"vector\":${embedding},\"k\":20}}},\"inner_hits\":{\"_source\":[\"chunks.text\"],\"size\":3}}}}",
        "provider_config": { "region": "us-east-2" }
      }
    }
  ],
  "response_processors": [
    {
      "multimodal_rerank": {
        "model_id": "anthropic.claude-3-haiku-20240307-v1:0",
        "provider": "bedrock",
        "content_field": "text",
        "top_n": 10,
        "provider_config": { "region": "us-east-2" }
      }
    }
  ]
}
Example 2: Cross-modal reranking (text query vs image tiles)
PUT /_search/pipeline/reranked-geo
{
  "request_processors": [
    {
      "query_embedding": {
        "model_id": "amazon.titan-embed-image-v1",
        "provider": "bedrock",
        "dimensions": 1024,
        "mode": "template",
        "query_template": "{\"query\":{\"nested\":{\"path\":\"chunks\",\"query\":{\"knn\":{\"chunks.embedding\":{\"vector\":${embedding},\"k\":20}}}}}}",
        "provider_config": { "region": "us-east-1" }
      }
    }
  ],
  "response_processors": [
    {
      "multimodal_rerank": {
        "model_id": "anthropic.claude-3-haiku-20240307-v1:0",
        "provider": "bedrock",
        "image_field": "image_data",
        "top_n": 5,
        "provider_config": { "region": "us-east-2" }
      }
    }
  ]
}
Each reranked hit requires one inference call to the LLM, so set top_n to a small number (5-10) to balance reranking quality against latency.
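As a rough way to reason about this trade-off, the following Python sketch counts the inference calls for one search and estimates the added latency. It assumes the calls run one after another, which this page does not state, and the 400 ms per-call figure is a placeholder rather than a measurement. Both function names are hypothetical.

```python
def inference_calls(total_hits: int, top_n: int = 10) -> int:
    """One call per reranked hit; hits beyond top_n are not sent to the model."""
    return min(total_hits, top_n)


def sequential_latency_ms(total_hits: int, top_n: int, per_call_ms: float) -> float:
    """Added latency if calls run strictly one after another (an assumption)."""
    return inference_calls(total_hits, top_n) * per_call_ms


# Placeholder per-call latency of 400 ms, for illustration only.
for top_n in (5, 10, 20):
    print(top_n, sequential_latency_ms(total_hits=20, top_n=top_n, per_call_ms=400.0))
```

If the provider handles calls concurrently, the added latency would be lower than this estimate, but the number of calls, and therefore the inference cost, stays the same.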
Example 3: Full pipeline (embed + rerank + grounding)
PUT /_search/pipeline/full-search
{
  "request_processors": [
    {
      "query_embedding": {
        "model_id": "amazon.titan-embed-text-v2:0",
        "provider": "bedrock",
        "dimensions": 1024,
        "mode": "template",
        "query_template": "{\"query\":{\"nested\":{\"path\":\"chunks\",\"query\":{\"knn\":{\"chunks.embedding\":{\"vector\":${embedding},\"k\":20}}},\"inner_hits\":{\"_source\":[\"chunks.text\"],\"size\":3}}}}",
        "provider_config": { "region": "us-east-2" }
      }
    }
  ],
  "response_processors": [
    {
      "multimodal_rerank": {
        "model_id": "anthropic.claude-3-haiku-20240307-v1:0",
        "provider": "bedrock",
        "content_field": "text",
        "top_n": 10,
        "provider_config": { "region": "us-east-2" }
      }
    },
    {
      "retrieval_grounding": {
        "include_provenance": true,
        "include_chunk_context": true
      }
    }
  ]
}