Late Interaction Search
Late interaction search retrieves documents using multiple embeddings per document, typically produced by models such as ColBERT. Standard k-NN search ranks each document by a single vector; late interaction models instead score a document against a set of query vectors and a set of document vectors, which can improve semantic matching in many cases.
How Late Interaction Works
The late interaction model uses the MaxSim (Maximum Similarity) algorithm to calculate the relevance score of a document $D$ for a query $Q$:
$$\mathrm{score}(Q, D) = \sum_{q \in Q} \max_{d \in D} \mathrm{sim}(q, d)$$
Where:
- $Q$ is the set of embeddings for the query.
- $D$ is the set of embeddings for the document.
- $\mathrm{sim}$ is the similarity function (typically dot product or cosine similarity, although others are supported).
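As a minimal sketch of MaxSim as it is usually defined for ColBERT-style models: for each query embedding, take the highest similarity against any document embedding, then sum those maxima. The function names `dot` and `max_sim` are illustrative and not part of any Lucenia API, and dot product is assumed as the similarity function.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot-product similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def max_sim(
    query_vectors: Sequence[Vector],
    doc_vectors: Sequence[Vector],
    sim: Callable[[Vector, Vector], float] = dot,
) -> float:
    """Sum, over query vectors, of the best similarity to any document vector."""
    if not doc_vectors:
        raise ValueError("document must have at least one vector")
    return sum(max(sim(q, d) for d in doc_vectors) for q in query_vectors)


# Hypothetical 2-dimensional embeddings, chosen only for illustration.
query = [[1.0, 0.0], [0.0, 1.0]]
document = [[0.5, 0.5], [1.0, 0.0]]
score = max_sim(query, document)
```

Here the first query vector matches the second document vector best (similarity 1.0) and the second query vector matches the first document vector best (similarity 0.5), so `score` is 1.5.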
Getting Started with Late Interaction
To use late interaction search, follow these steps:
1. Enable Late Interaction factories
Late interaction search leverages system-generated search pipeline processors. To enable them, you must add their factories to your cluster settings:
PUT _cluster/settings
{
  "persistent": {
    "search.pipeline.system_generated.enabled_factories": [
      "late_interaction_oversample_factory",
      "late_interaction_rerank_factory"
    ]
  }
}
Once enabled, these processors will be automatically injected into search requests that contain the late_interaction
search extension.
2. Create an index with nested k-NN fields
In version 0.11.0, Lucenia supports late interaction by storing multiple vectors per document within a nested field. Each
nested document contains a single knn_vector:
PUT my-late-interaction-index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_nested_field": {
        "type": "nested",
        "properties": {
          "my_vector": {
            "type": "knn_vector",
            "dimension": 128,
            "method": {
              "name": "hnsw",
              "space_type": "l2",
              "engine": "lucene"
            }
          }
        }
      }
    }
  }
}
3. Index documents with multiple vectors
Index your documents by providing an array of objects in the nested field, where each object contains one embedding vector:
POST _bulk
{ "index": { "_index": "my-late-interaction-index", "_id": "1" } }
{ "my_nested_field": [ { "my_vector": [0.1, 0.2, ...] }, { "my_vector": [0.3, 0.4, ...] } ] }
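When documents come from an embedding model, the bulk body is usually generated rather than written by hand. The sketch below builds the newline-delimited body from a mapping of document IDs to their per-token vectors. The helper name `build_bulk_body` is hypothetical, the field names mirror the example mapping above, and the trailing newline is an assumption based on how newline-delimited bulk bodies are commonly required to end.

```python
import json
from typing import Dict, List


def build_bulk_body(
    index_name: str,
    docs: Dict[str, List[List[float]]],
    nested_field: str = "my_nested_field",
    vector_field: str = "my_vector",
) -> str:
    """Return an NDJSON bulk body with one nested object per embedding vector."""
    lines = []
    for doc_id, vectors in docs.items():
        # Action line: which index and ID the next source line belongs to.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: each embedding becomes its own nested object.
        lines.append(
            json.dumps({nested_field: [{vector_field: list(v)} for v in vectors]})
        )
    # Assumed requirement: the bulk body ends with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical 2-dimensional vectors; a real index would use the mapped dimension.
body = build_bulk_body(
    "my-late-interaction-index",
    {"1": [[0.1, 0.2], [0.3, 0.4]]},
)
```

The resulting string can be sent as the request body of `POST _bulk` with whichever HTTP client you already use.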
4. Perform a search
To perform a late interaction search, use a nested k-NN query to find initial candidates,
and include the late_interaction search extension in the ext section of your request. The extension provides the
full set of query vectors for reranking:
GET my-late-interaction-index/_search
{
  "query": {
    "nested": {
      "path": "my_nested_field",
      "query": {
        "knn": {
          "my_nested_field.my_vector": {
            "vector": [0.1, 0.2, ...],
            "k": 100
          }
        }
      }
    }
  },
  "ext": {
    "late_interaction": {
      "vector_field_path": "my_nested_field.my_vector",
      "vectors": [
        [0.1, 0.2, ...],
        [0.3, 0.4, ...]
      ],
      "candidates": 100
    }
  }
}
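The request above can also be assembled programmatically from a list of query embeddings. This is a sketch under two assumptions that mirror the example rather than any documented rule: the first query vector is used for the initial nested k-NN candidate search, and the nested path is everything before the last dot in `vector_field_path`. The helper name `build_late_interaction_search` is hypothetical.

```python
from typing import Any, Dict, List


def build_late_interaction_search(
    query_vectors: List[List[float]],
    vector_field_path: str = "my_nested_field.my_vector",
    candidates: int = 100,
) -> Dict[str, Any]:
    """Build a search body with a nested k-NN query and the late_interaction ext."""
    if not query_vectors:
        raise ValueError("at least one query vector is required")
    # Assumption: the nested path is the field path without its last segment.
    nested_path = vector_field_path.rsplit(".", 1)[0]
    return {
        "query": {
            "nested": {
                "path": nested_path,
                "query": {
                    "knn": {
                        vector_field_path: {
                            # Assumption: the first query vector finds candidates.
                            "vector": query_vectors[0],
                            "k": candidates,
                        }
                    }
                },
            }
        },
        "ext": {
            "late_interaction": {
                "vector_field_path": vector_field_path,
                # The full set of query vectors is passed for reranking.
                "vectors": query_vectors,
                "candidates": candidates,
            }
        },
    }


# Hypothetical 2-dimensional query embeddings.
search_body = build_late_interaction_search([[0.1, 0.2], [0.3, 0.4]])
```

Serializing `search_body` with `json.dumps` gives a body that can be sent to `GET my-late-interaction-index/_search`.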
Search Extension Parameters
The late_interaction search extension supports the following parameters:
- vector_field_path: (Required) The path to the k-NN vector field within the nested structure.
- vectors: (Required) An array of query vectors (the query embeddings).
- candidates: (Optional) The number of candidates to fetch and rerank. If provided, the initial k-NN query's k and the search size will be automatically updated to this value by the oversample processor.
- vector_field_space_type: (Optional) The space type to use for similarity calculation. If not provided, it is inferred from the field mapping.
- index_to_vector_field_path_map: (Optional) A map of index names to vector field paths, useful for searches across multiple indices with different structures.
Scoring
The late_interaction_rerank_processor recalculates the score for each document in the top candidates
using the MaxSim algorithm with the full set of query vectors provided in the ext section. The final search results
are then sorted by these new scores, and the result set is trimmed to the original requested size.
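The rerank step can be illustrated with a small client-side simulation. This is not the processor's implementation; it is a sketch of the behavior described above (rescore each candidate with MaxSim, sort by the new score, trim to the requested size), assuming dot-product similarity. The names `rerank` and `_max_sim` are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def _max_sim(query_vectors: List[Vector], doc_vectors: List[Vector]) -> float:
    """MaxSim with dot product: sum of per-query-vector best matches."""
    return sum(
        max(sum(x * y for x, y in zip(q, d)) for d in doc_vectors)
        for q in query_vectors
    )


def rerank(
    query_vectors: List[Vector],
    candidates: Dict[str, List[Vector]],
    size: int,
) -> List[Tuple[str, float]]:
    """Rescore candidates with MaxSim, sort descending, keep the top `size`."""
    scored = [
        (doc_id, _max_sim(query_vectors, vectors))
        for doc_id, vectors in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:size]


# Hypothetical candidates, as if returned by the initial k-NN query.
query = [[1.0, 0.0], [0.0, 1.0]]
candidates = {
    "a": [[1.0, 0.0]],
    "b": [[0.0, 1.0], [1.0, 0.0]],
    "c": [[0.5, 0.25]],
}
top = rerank(query, candidates, size=2)
```

Document `b` covers both query vectors and scores 2.0, `a` scores 1.0, and `c` scores 0.75, so with a requested size of 2 the result is `b` followed by `a`.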