Query2doc: Improve Your RAG by Expanding Queries

Most query expansion methods either mine pseudo-relevance feedback from initial search results or rely on predefined thesauri. Query2doc skips both.

Instead, it uses LLMs to generate short, relevant pseudo-documents and appends them to your query — no retraining, no architecture changes.

How It Works

  1. Use few-shot prompting (4 examples) to generate a passage based on a query.
  2. Combine the original query and the LLM-generated text (a minimal sketch follows this list):
  • For BM25: repeat the query 5 times, then append the pseudo-doc.
  • For dense retrievers: concatenate [query] [SEP] [pseudo-doc].
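
Both recipes are plain string concatenation. Here is a minimal sketch in Python (the helper names are illustrative; the 5x repetition and the [SEP] separator follow the recipes above, and the exact separator depends on how your dense model was trained):

def expand_for_bm25(query: str, pseudo_doc: str, n_repeats: int = 5) -> str:
    # Repeating the query keeps its terms weighted heavily against the
    # much longer pseudo-document in term-frequency scoring.
    return " ".join([query] * n_repeats + [pseudo_doc])

def expand_for_dense(query: str, pseudo_doc: str) -> str:
    # Dense retrievers take the query and pseudo-document as a single string.
    return f"{query} [SEP] {pseudo_doc}"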

Why It Matters

  • +15% nDCG@10 boost for BM25 on TREC DL.
  • Also improves dense models like DPR, SimLM, and E5 — even without fine-tuning.
  • Works best with bigger models — GPT-4 outperforms smaller ones.
  • Crucially, the combo of original query + pseudo-doc works better than either alone.

Limitations

  • Latency: >2 seconds per query — too slow for real-time.
  • Cost: ~550k LLM calls = ~$5K.
  • LLMs can hallucinate, so production systems still need a validation layer.

Takeaway

Query2doc is dead simple but surprisingly effective. It’s a plug-and-play upgrade for search systems — ideal for boosting retrieval quality when training data is scarce.

Just don’t expect real-time speed or perfect factual accuracy.

Example

import os

import chromadb
from openai import OpenAI

# Create the OpenAI client (reads OPENAI_API_KEY from the environment)
openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Step 1: Few-shot prompt template
# (the paper uses 4 in-context examples; two are shown here to keep it short)
def generate_pseudo_document(query):
    prompt = f"""Write a passage that answers the given query.

Query: what state is zip code 85282
Passage: 85282 is a ZIP code located in Tempe, Arizona. It covers parts of the Phoenix metro area and is known for being home to Arizona State University.

Query: when was pokemon green released
Passage: Pokémon Green was released in Japan on February 27, 1996, alongside Pokémon Red. These games were the first in the Pokémon series.

Query: {query}
Passage:"""

    response = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=150,
        temperature=0.7,
    )

    return response.choices[0].message.content.strip()

# Step 2: Initialize Chroma and add documents
client = chromadb.Client()
collection = client.create_collection("my_docs")

docs = [
    "Pokémon Green was released in Japan in 1996.",
    "Tempe, Arizona has ZIP code 85282.",
    "Randy Newman sings the Monk theme song.",
    "HRA is employer-funded; HSA is individually owned and tax-free."
]

collection.add(
    documents=docs,
    ids=[f"doc_{i}" for i in range(len(docs))]
)

# Step 3: Expand the user query by appending the pseudo-document
# (for the dense recipe above, you could join with " [SEP] " instead of a space)
user_query = "when was pokemon green released"
pseudo_doc = generate_pseudo_document(user_query)
expanded_query = f"{user_query} {pseudo_doc}"

# Step 4: Run ChromaDB search
results = collection.query(
    query_texts=[expanded_query],
    n_results=2
)
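
To sanity-check the expansion, print what came back (ChromaDB returns parallel lists of documents and distances for each query text):

# Step 5: Inspect the retrieved documents and their distances
for doc, distance in zip(results["documents"][0], results["distances"][0]):
    print(f"{distance:.4f}  {doc}")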
