Version: 0.11.0

Retrieval grounding processor

Introduced 0.11.0

The retrieval_grounding processor is a search response processor that enriches each search hit with provenance, spatial context, and chunk context metadata. The added metadata makes search results explainable and traceable to their source documents and locations, which is essential for retrieval-augmented generation (RAG) pipelines that must cite their sources.

Syntax

PUT /_search/pipeline/grounded-search
{
  "response_processors": [
    {
      "retrieval_grounding": {
        "target_field": "_grounding",
        "include_provenance": true,
        "include_spatial_context": true,
        "include_chunk_context": true,
        "context_snippet_chars": 200
      }
    }
  ]
}

How it works

Search hit (raw):
┌────────────────────────────────────────────┐
│ _source: {                                 │
│   title: "Annual Report",                  │
│   source_uri: "s3://bucket/report.pdf",    │
│   chunks: [{                               │
│     text: "Revenue grew 12%...",           │
│     chunk_index: 3,                        │
│     bbox: { west: -120, east: -119, ... }, │
│     pixel_window: { x: 512, y: 0, ... }    │
│   }]                                       │
│ }                                          │
└────────────────────────┬───────────────────┘
                         │
                retrieval_grounding
                         │
                         ▼
Search hit (enriched):
┌────────────────────────────────────────────┐
│ _source: {                                 │
│   ...,                                     │
│   _grounding: {                            │
│     source_provenance: {                   │
│       parent_uri: "s3://bucket/report.pdf",│
│       document_title: "Annual Report",     │
│       page_number: 4,                      │
│       ingest_timestamp: "2024-..."         │
│     },                                     │
│     spatial_context: {                     │
│       bbox: { west: -120, ... },           │
│       pixel_window: { x: 512, ... },       │
│       crs: "EPSG:4326"                     │
│     },                                     │
│     chunk_context: {                       │
│       chunk_position: 3,                   │
│       total_chunks: 12,                    │
│       preceding_text: "...prior text...",  │
│       following_text: "...next text..."    │
│     }                                      │
│   }                                        │
│ }                                          │
└────────────────────────────────────────────┘
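The transformation in the diagram can be approximated client-side with a short Python sketch. This is an illustration only, not the processor's implementation: the function name ground_hit is hypothetical, and the mapping of source_uri to parent_uri and of title to document_title is inferred from the diagram. Fields that the diagram does not derive from the raw hit (page_number, ingest_timestamp, crs, total_chunks, and the surrounding text) are left out.

```python
from typing import Any


def ground_hit(source: dict[str, Any], chunk_pos: int = 0,
               target_field: str = "_grounding") -> dict[str, Any]:
    """Return a copy of `source` with a grounding object attached (illustrative only)."""
    chunk = source["chunks"][chunk_pos]
    grounding: dict[str, Any] = {
        # Assumed mapping, inferred from the diagram.
        "source_provenance": {
            "parent_uri": source.get("source_uri"),
            "document_title": source.get("title"),
        },
        "chunk_context": {"chunk_position": chunk.get("chunk_index")},
    }
    # Spatial context is only attached when the chunk carries geometry.
    spatial = {k: chunk[k] for k in ("bbox", "pixel_window") if k in chunk}
    if spatial:
        grounding["spatial_context"] = spatial
    enriched = dict(source)  # shallow copy, the raw hit is not modified
    enriched[target_field] = grounding
    return enriched


raw = {
    "title": "Annual Report",
    "source_uri": "s3://bucket/report.pdf",
    "chunks": [{
        "text": "Revenue grew 12%...",
        "chunk_index": 3,
        "bbox": {"west": -120, "east": -119},
        "pixel_window": {"x": 512, "y": 0},
    }],
}
enriched = ground_hit(raw)
print(enriched["_grounding"]["source_provenance"]["parent_uri"])
```

The sketch writes the grounding object under the name given by target_field, mirroring the placement shown in the enriched hit above.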

Configuration parameters

| Parameter | Data type | Required/Optional | Description |
| --- | --- | --- | --- |
| target_field | String | Optional | Field name for grounding metadata in each hit. Default is _grounding. |
| include_provenance | Boolean | Optional | Include source_provenance object with parent URI, title, page number. Default is true. |
| include_spatial_context | Boolean | Optional | Include spatial_context with bounding box, pixel window, CRS. Default is true. |
| include_chunk_context | Boolean | Optional | Include chunk_context with position and surrounding text snippets. Default is true. |
| context_snippet_chars | Integer | Optional | Maximum characters for preceding/following text snippets. Default is 200. |
| generate_preview_urls | Boolean | Optional | Generate presigned S3 URLs for tile images. Requires S3 reference registry. Default is false. |
| preview_uri_field | String | Optional | Source field containing the tile URI for presigning. Default is tile_uri. |
| preview_expiry_seconds | Integer | Optional | Presigned URL expiry duration in seconds. Default is 3600 (1 hour). |
| reference_config | Object | Optional | S3/reference resolver configuration for presigning. |
| tag | String | Optional | The processor's identifier. |
| description | String | Optional | A description of the processor. |
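Because every parameter is optional, a pipeline definition only needs the settings that differ from the defaults. The following Python sketch builds such a request body; the names DEFAULTS and build_pipeline are hypothetical helpers, and the default values are copied from the parameter descriptions above rather than read from the processor itself.

```python
import json

# Defaults as listed in the parameter descriptions (assumed to be accurate).
DEFAULTS = {
    "target_field": "_grounding",
    "include_provenance": True,
    "include_spatial_context": True,
    "include_chunk_context": True,
    "context_snippet_chars": 200,
    "generate_preview_urls": False,
    "preview_uri_field": "tile_uri",
    "preview_expiry_seconds": 3600,
}
# Parameters without a documented default.
NO_DEFAULT = {"reference_config", "tag", "description"}


def build_pipeline(**overrides):
    """Build a search pipeline body that carries only non-default settings."""
    unknown = set(overrides) - set(DEFAULTS) - NO_DEFAULT
    if unknown:
        # Catch typos before the request is sent.
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    config = {k: v for k, v in overrides.items()
              if k in NO_DEFAULT or DEFAULTS[k] != v}
    return {"response_processors": [{"retrieval_grounding": config}]}


body = build_pipeline(context_snippet_chars=150, include_spatial_context=True)
print(json.dumps(body, indent=2))
```

The printed body can be used as the payload of a PUT /_search/pipeline/<name> request such as the one in the Syntax section.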

Output structure

Source provenance

{
  "source_provenance": {
    "parent_uri": "s3://my-bucket/reports/annual-report.pdf",
    "document_title": "2024 Annual Report",
    "page_number": 4,
    "ingest_timestamp": "2024-06-15T10:30:00Z"
  }
}
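In a RAG application, the source_provenance object is typically what gets turned into a citation. A minimal Python sketch follows; the function name format_citation and the citation layout are illustrative choices, not something the processor defines.

```python
def format_citation(provenance: dict) -> str:
    """Render a source_provenance object as a one-line citation."""
    parts = [provenance.get("document_title") or "Untitled"]
    if provenance.get("page_number") is not None:
        parts.append(f"p. {provenance['page_number']}")
    if provenance.get("parent_uri"):
        parts.append(provenance["parent_uri"])
    return ", ".join(parts)


provenance = {
    "parent_uri": "s3://my-bucket/reports/annual-report.pdf",
    "document_title": "2024 Annual Report",
    "page_number": 4,
    "ingest_timestamp": "2024-06-15T10:30:00Z",
}
citation = format_citation(provenance)
print(citation)
# 2024 Annual Report, p. 4, s3://my-bucket/reports/annual-report.pdf
```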

Spatial context

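Present for geospatial imagery tiles. The preview_url field is a presigned URL and is expected only when generate_preview_urls is set to true: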

{
  "spatial_context": {
    "bbox": {
      "west": -120.5,
      "east": -120.0,
      "south": 35.0,
      "north": 35.5
    },
    "pixel_window": {
      "x": 512,
      "y": 0,
      "width": 512,
      "height": 512
    },
    "crs": "EPSG:4326",
    "preview_url": "https://s3.amazonaws.com/bucket/tile.jpg?X-Amz-..."
  }
}
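A consumer can derive simple tile geometry from bbox and pixel_window. The Python sketch below assumes that bbox describes the extent of the pixel window, that the CRS is geographic (degrees, as with EPSG:4326), and that the tile does not cross the antimeridian; the function name tile_geometry is hypothetical.

```python
def tile_geometry(spatial: dict) -> dict:
    """Compute the tile center and the degrees covered by one pixel."""
    bbox, win = spatial["bbox"], spatial["pixel_window"]
    return {
        "center_lon": (bbox["west"] + bbox["east"]) / 2,
        "center_lat": (bbox["south"] + bbox["north"]) / 2,
        # Assumes bbox spans exactly win["width"] x win["height"] pixels.
        "deg_per_pixel_x": (bbox["east"] - bbox["west"]) / win["width"],
        "deg_per_pixel_y": (bbox["north"] - bbox["south"]) / win["height"],
    }


spatial = {
    "bbox": {"west": -120.5, "east": -120.0, "south": 35.0, "north": 35.5},
    "pixel_window": {"x": 512, "y": 0, "width": 512, "height": 512},
    "crs": "EPSG:4326",
}
geom = tile_geometry(spatial)
print(geom["center_lon"], geom["center_lat"])  # -120.25 35.25
```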

Chunk context

{
  "chunk_context": {
    "chunk_position": 3,
    "total_chunks": 12,
    "preceding_text": "...end of the previous chunk providing context...",
    "following_text": "...start of the next chunk for continuity..."
  }
}
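The relationship between context_snippet_chars and the two snippet fields can be sketched in Python as follows. This assumes that preceding_text is the tail of the previous chunk, that following_text is the head of the next chunk, and that chunk_position is a zero-based index; none of these details are confirmed by the processor, and the function name chunk_context is used here only for illustration.

```python
def chunk_context(chunks: list[str], position: int, snippet_chars: int = 200) -> dict:
    """Build a chunk_context-like object for the chunk at `position`."""
    prev_text = chunks[position - 1] if position > 0 else ""
    next_text = chunks[position + 1] if position + 1 < len(chunks) else ""
    return {
        "chunk_position": position,
        "total_chunks": len(chunks),
        # Guard against snippet_chars == 0, because text[-0:] is the whole string.
        "preceding_text": prev_text[-snippet_chars:] if snippet_chars > 0 else "",
        "following_text": next_text[:snippet_chars] if snippet_chars > 0 else "",
    }


ctx = chunk_context(["alpha beta", "gamma", "delta epsilon"], 1, snippet_chars=4)
print(ctx["preceding_text"], ctx["following_text"])  # beta delt
```

The first and last chunks have no neighbor on one side, so the corresponding snippet is empty in this sketch.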

Using the processor

Example 1: Semantic search with provenance and chunk context

PUT /_search/pipeline/grounded-semantic
{
  "request_processors": [
    {
      "query_embedding": {
        "model_id": "amazon.titan-embed-text-v2:0",
        "provider": "bedrock",
        "dimensions": 1024,
        "mode": "template",
        "query_template": "{\"query\":{\"nested\":{\"path\":\"chunks\",\"query\":{\"knn\":{\"chunks.embedding\":{\"vector\":${embedding},\"k\":10}}},\"inner_hits\":{\"_source\":[\"chunks.text\"],\"size\":3}}}}",
        "provider_config": { "region": "us-east-2" }
      }
    }
  ],
  "response_processors": [
    {
      "retrieval_grounding": {
        "include_provenance": true,
        "include_chunk_context": true,
        "context_snippet_chars": 150
      }
    }
  ]
}
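Once a search runs through this pipeline, an application can collect the grounding metadata from the response. The Python sketch below assumes a response shaped as hits.hits[]._source, with the grounding object stored inside _source under the target field as in the How it works diagram; the sample response is hand-written for illustration and the function name collect_sources is hypothetical.

```python
def collect_sources(response: dict, target_field: str = "_grounding") -> list[dict]:
    """Extract one source record per enriched hit."""
    sources = []
    for hit in response.get("hits", {}).get("hits", []):
        grounding = hit.get("_source", {}).get(target_field)
        if not grounding:
            continue  # hit was not enriched, skip it
        prov = grounding.get("source_provenance", {})
        ctx = grounding.get("chunk_context", {})
        sources.append({
            "uri": prov.get("parent_uri"),
            "title": prov.get("document_title"),
            "chunk": ctx.get("chunk_position"),
        })
    return sources


# Hand-written sample response (assumed shape, not real output).
response = {
    "hits": {
        "hits": [
            {"_source": {"_grounding": {
                "source_provenance": {
                    "parent_uri": "s3://bucket/report.pdf",
                    "document_title": "Annual Report",
                },
                "chunk_context": {"chunk_position": 3, "total_chunks": 12},
            }}},
            {"_source": {"title": "No grounding here"}},
        ]
    }
}
sources = collect_sources(response)
print(sources)
```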

Example 2: Geospatial search with tile previews

PUT /_search/pipeline/geo-grounded
{
  "request_processors": [
    {
      "query_embedding": {
        "model_id": "amazon.titan-embed-image-v1",
        "provider": "bedrock",
        "dimensions": 1024,
        "mode": "template",
        "query_template": "{\"query\":{\"nested\":{\"path\":\"chunks\",\"query\":{\"knn\":{\"chunks.embedding\":{\"vector\":${embedding},\"k\":10}}}}}}",
        "provider_config": { "region": "us-east-1" }
      }
    }
  ],
  "response_processors": [
    {
      "retrieval_grounding": {
        "include_provenance": true,
        "include_spatial_context": true,
        "generate_preview_urls": true,
        "preview_uri_field": "parent_uri",
        "preview_expiry_seconds": 3600,
        "reference_config": {
          "region": "us-west-2"
        }
      }
    }
  ]
}