Version: 0.11.0

Late Interaction Search

Late interaction search retrieves documents using multiple embeddings per document, typically produced by models such as ColBERT. Unlike standard k-NN search, which ranks documents by their similarity to a single query vector, late interaction models score each document against multiple query vectors, which can improve semantic matching in many cases.

How Late Interaction Works

The late interaction model uses the MaxSim (Maximum Similarity) algorithm to calculate the relevance score of a document $D$ for a query $Q$:

$$\text{MaxSim}(Q, D) = \sum_{q \in Q} \max_{d \in D} \text{sim}(q, d)$$

Where:

  • $Q$ is the set of embeddings for the query.
  • $D$ is the set of embeddings for the document.
  • $\text{sim}(q, d)$ is the similarity function (typically dot product or cosine similarity, although others are supported).
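
The formula can be sketched in a few lines of Python. This is an illustration only, not Lucenia's implementation: the names `dot` and `max_sim` are hypothetical, and dot product is assumed as the similarity function.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def dot(q: Vector, d: Vector) -> float:
    """Dot-product similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(q, d))


def max_sim(
    query: Sequence[Vector],
    document: Sequence[Vector],
    sim: Callable[[Vector, Vector], float] = dot,
) -> float:
    """Sum, over query vectors, of the best similarity to any document vector."""
    if not document:
        raise ValueError("document must contain at least one vector")
    return sum(max(sim(q, d) for d in document) for q in query)


# Two query vectors scored against a document with two vectors.
query = [[1.0, 0.0], [0.0, 1.0]]
document = [[0.5, 0.5], [1.0, 0.0]]
score = max_sim(query, document)
```

Here the first query vector matches best with `[1.0, 0.0]` (similarity 1.0) and the second with `[0.5, 0.5]` (similarity 0.5), so `score` is 1.5.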

Getting started with Late Interaction

To use late interaction search, follow these steps:

1. Enable Late Interaction factories

Late interaction search leverages system-generated search pipeline processors. To enable them, you must add their factories to your cluster settings:

PUT _cluster/settings
{
  "persistent": {
    "search.pipeline.system_generated.enabled_factories": [
      "late_interaction_oversample_factory",
      "late_interaction_rerank_factory"
    ]
  }
}

Once enabled, these processors will be automatically injected into search requests that contain the late_interaction search extension.

2. Create an index with nested k-NN fields

In version 0.11.0, Lucenia supports late interaction by storing multiple vectors per document within a nested field. Each nested document contains a single knn_vector:

PUT my-late-interaction-index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_nested_field": {
        "type": "nested",
        "properties": {
          "my_vector": {
            "type": "knn_vector",
            "dimension": 128,
            "method": {
              "name": "hnsw",
              "space_type": "l2",
              "engine": "lucene"
            }
          }
        }
      }
    }
  }
}

3. Index documents with multiple vectors

Index your documents by providing an array of objects in the nested field, where each object contains one embedding vector:

POST _bulk
{ "index": { "_index": "my-late-interaction-index", "_id": "1" } }
{ "my_nested_field": [ { "my_vector": [0.1, 0.2, ...] }, { "my_vector": [0.3, 0.4, ...] } ] }
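
When embeddings come from a model at indexing time, the bulk body is usually generated rather than written by hand. The following is a minimal sketch of that step. The helper `build_bulk_payload` is hypothetical, its defaults mirror the field names in the mapping above, and the trailing newline assumes the bulk endpoint follows the usual newline-delimited JSON convention. The vectors are two-dimensional for brevity; real vectors must match the mapped dimension (128 in the mapping above).

```python
import json
from typing import Dict, List, Sequence


def build_bulk_payload(
    index: str,
    docs: Dict[str, Sequence[Sequence[float]]],
    nested_field: str = "my_nested_field",
    vector_field: str = "my_vector",
) -> str:
    """Return a newline-delimited _bulk body with one nested object per embedding."""
    lines: List[str] = []
    for doc_id, vectors in docs.items():
        # Action line, followed by the document source line.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(
            json.dumps({nested_field: [{vector_field: list(v)} for v in vectors]})
        )
    # Assumption: the body must end with a newline, as NDJSON bulk APIs expect.
    return "\n".join(lines) + "\n"


payload = build_bulk_payload(
    "my-late-interaction-index",
    {"1": [[0.1, 0.2], [0.3, 0.4]]},
)
```

The resulting string can be sent as the body of `POST _bulk` with whatever HTTP client you already use.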

4. Run a late interaction search

To perform a late interaction search, use a nested k-NN query to find initial candidates, and include the late_interaction search extension in the ext section of your request. The extension provides the full set of query vectors for reranking:

GET my-late-interaction-index/_search
{
  "query": {
    "nested": {
      "path": "my_nested_field",
      "query": {
        "knn": {
          "my_nested_field.my_vector": {
            "vector": [0.1, 0.2, ...],
            "k": 100
          }
        }
      }
    }
  },
  "ext": {
    "late_interaction": {
      "vector_field_path": "my_nested_field.my_vector",
      "vectors": [
        [0.1, 0.2, ...],
        [0.3, 0.4, ...]
      ],
      "candidates": 100
    }
  }
}
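
A request of this shape can be assembled programmatically from the query embeddings. The sketch below is illustrative: `build_late_interaction_search` is a hypothetical helper, and using the first query vector for the initial k-NN query is an assumption that mirrors the example request, not a documented requirement.

```python
from typing import Any, Dict, Optional, Sequence


def build_late_interaction_search(
    query_vectors: Sequence[Sequence[float]],
    nested_path: str = "my_nested_field",
    vector_field: str = "my_vector",
    k: int = 100,
    candidates: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a search body with a nested k-NN query and the late_interaction extension."""
    if not query_vectors:
        raise ValueError("at least one query vector is required")
    field_path = f"{nested_path}.{vector_field}"
    vectors = [list(v) for v in query_vectors]
    late_interaction: Dict[str, Any] = {
        "vector_field_path": field_path,
        "vectors": vectors,
    }
    # "candidates" is optional, so it is only included when set.
    if candidates is not None:
        late_interaction["candidates"] = candidates
    return {
        "query": {
            "nested": {
                "path": nested_path,
                "query": {
                    # Assumption: the first query vector retrieves the candidates.
                    "knn": {field_path: {"vector": vectors[0], "k": k}}
                },
            }
        },
        "ext": {"late_interaction": late_interaction},
    }


body = build_late_interaction_search([[0.1, 0.2], [0.3, 0.4]], candidates=100)
```

Serializing `body` with `json.dumps` gives a request equivalent in structure to the one shown above, with two-dimensional vectors standing in for real embeddings.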

Search Extension Parameters

The late_interaction search extension supports the following parameters:

  • vector_field_path: (Required) The path to the k-NN vector field within the nested structure.
  • vectors: (Required) An array of query vectors (the query embeddings).
  • candidates: (Optional) The number of candidates to fetch and rerank. If provided, the initial k-NN query's k and the search size will be automatically updated to this value by the oversample processor.
  • vector_field_space_type: (Optional) The space type to use for similarity calculation. If not provided, it is inferred from the field mapping.
  • index_to_vector_field_path_map: (Optional) A map of index names to vector field paths, useful for searches across multiple indices with different structures.

Scoring

The late_interaction_rerank_processor recalculates the score for each document in the top candidates using the MaxSim algorithm with the full set of query vectors provided in the ext section. The final search results are then sorted by these new scores, and the result set is trimmed to the original requested size.
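
The rescoring, sorting, and trimming steps can be pictured with the following sketch. It is a conceptual model only and does not reflect the processor's actual code: `rerank` and `max_sim` are hypothetical names, dot product is assumed as the similarity function, and candidates are represented as a plain mapping from document ID to that document's vectors.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def max_sim(query: Sequence[Vector], document: Sequence[Vector]) -> float:
    """MaxSim with dot-product similarity (an assumption for this sketch)."""
    return sum(
        max(sum(a * b for a, b in zip(q, d)) for d in document) for q in query
    )


def rerank(
    query: Sequence[Vector],
    candidates: Dict[str, Sequence[Vector]],
    size: int,
) -> List[Tuple[str, float]]:
    """Rescore every candidate, sort by the new score, keep the top `size`."""
    scored = [(doc_id, max_sim(query, vecs)) for doc_id, vecs in candidates.items()]
    scored.sort(key=lambda hit: hit[1], reverse=True)
    return scored[:size]


query = [[1.0, 0.0], [0.0, 1.0]]
candidates = {
    "a": [[1.0, 0.0]],              # strong match for one query vector only
    "b": [[0.6, 0.0], [0.0, 0.6]],  # moderate match for both query vectors
    "c": [[0.1, 0.1]],
}
top = rerank(query, candidates, size=2)
```

In this example document "b" ranks above "a" because it matches both query vectors, even though "a" is the closer match to the first query vector alone. Document "c" is dropped when the list is trimmed to the requested size of 2.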