Data Retrievability
How your codebase makes data searchable, retrievable, and understandable to agents
Data Retrievability is how effectively your codebase enables agents to find, understand, and retrieve information. This includes vector embeddings (dense and multimodal), vector databases, hybrid search combining keyword and semantic matching, reranking for precision, intelligent chunking strategies, knowledge graphs for complex reasoning, agentic RAG patterns with query planning, and evaluation frameworks that measure retrieval quality.
Summary
In April 2026, the consensus is clear: naive dense-only vector search is outdated. Production systems use hybrid retrieval (BM25 sparse + dense vectors + reranking), agentic RAG with query planning beats single-stage pipelines, and Anthropic's Contextual Retrieval pattern (prepending summaries to chunks) reduces retrieval failures by 49–67%. pgvector + pgvectorscale now matches Pinecone's performance at 75% lower cost, and LightRAG achieves 6,000x token efficiency over traditional GraphRAG.
Key takeaways:
- Hybrid > Dense: Always combine keyword (BM25) + semantic (dense vectors) + reranking. Fusion (RRF) beats either alone.
- Contextual > Raw: Prepend chunk-specific summaries via Claude before embedding. Prompt caching keeps costs low.
- Agentic RAG: Query planning, multi-hop retrieval, and reflection loops outperform naive embed-retrieve-generate.
- LightRAG > GraphRAG: 6,000x cheaper, retrieves entities/relations directly, not community traversal.
- Evaluate obsessively: RAGAS metrics (faithfulness, context recall) + MTEB benchmarks + domain-specific metrics in CI/CD.
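The RRF fusion named in the first takeaway is small enough to show inline. A minimal sketch (the function name and toy doc IDs are illustrative, not from any SDK): each retriever contributes `1 / (k + rank)` per document, so documents that rank well in *both* the BM25 list and the dense list float to the top.

```typescript
// Reciprocal Rank Fusion: merge a BM25 ranking and a dense-vector ranking.
// Each input is a list of doc IDs in rank order; k=60 is the conventional
// smoothing constant from the original RRF paper.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, rank) => {
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([docId]) => docId);
}

// "doc2" appears near the top of both lists, so fusion ranks it first
// even though neither retriever alone put it at rank 1 in both.
const bm25 = ["doc1", "doc2", "doc3"];
const dense = ["doc2", "doc4", "doc1"];
console.log(rrfFuse([bm25, dense])); // → ["doc2", "doc1", "doc4", "doc3"]
```

Because RRF only needs rank positions, not scores, it fuses retrievers whose score scales are incomparable (BM25 vs cosine similarity) without any normalization step.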
Decision tree: When to use what
```mermaid
graph TD
    A[Need retrieval?] -->|No| B[Not applicable]
    A -->|Yes| C{Data complexity & scale?}
    C -->|Simple, under 100K docs| D{Hybrid or agentic?}
    C -->|Complex relationships| E{Cost-sensitive?}
    C -->|Large scale, over 10M docs| F[pgvector + pgvectorscale or Pinecone]
    D -->|Just embed & retrieve| G[Dense vectors + vector DB]
    D -->|Need keyword + semantic| H[Hybrid: BM25 + dense + RRF]
    D -->|Multi-hop reasoning| I[Agentic RAG with planning]
    E -->|Yes| J[LightRAG or vector-only]
    E -->|No| K[GraphRAG with Neo4j]
    G -->|Many images/PDFs| L[Multimodal embeddings]
    G -->|Text-only| M[text-embedding-3-large or Voyage]
    H -->|Use Weaviate| N[Weaviate hybrid native]
    H -->|DIY pipeline| O[Qdrant + OpenSearch + RRF]
    I -->|Use LangGraph| P[LangGraph + Claude agents]
    I -->|Use LlamaIndex| Q[LlamaIndex agentic]
    F -->|Managed| R[Pinecone serverless]
    F -->|Self-hosted| S[pgvector + pgvectorscale]
    L -->|All modalities| T[Gemini Embedding 2]
    L -->|Text + images| U[Voyage Multimodal 3.5 or Cohere v4]
```

The 2026 retrieval stack (recommended)
For most TypeScript/Node teams:
| Layer | Choice | Why |
|---|---|---|
| Embeddings | OpenAI text-embedding-3-large or Voyage 3 | Mature, integrated APIs; Voyage cheaper |
| Multimodal | Voyage Multimodal 3.5 or Gemini 2 | Handle PDFs, slides, images natively |
| Vector DB | pgvector + pgvectorscale (cost) OR Pinecone serverless (managed) | pgvector 75% cheaper; Pinecone hands-off |
| Hybrid layer | Weaviate (native) OR Qdrant + OpenSearch (flexible) | Weaviate has BM25 built-in; Qdrant more control |
| Reranking | Cohere Rerank 3.5 or Voyage Rerank 2.5 | Two-stage always beats one-stage |
| Chunking | Anthropic Contextual Retrieval | 49–67% failure reduction; prompt caching = low cost |
| Knowledge Graph | LightRAG (efficient) OR Neo4j (enterprise) | LightRAG 6,000x cheaper; Neo4j if schema-heavy |
| Agentic RAG | LangGraph (LangChain) or LlamaIndex agents | Reflection + dynamic tool selection |
| Evaluation | RAGAS + MTEB + domain metrics in CI/CD | Measure faithfulness, context recall, recall@k |
Dimensions & pages
Dense Embeddings
OpenAI, Voyage, Cohere, Google, and open-source models. Model selection, Matryoshka embeddings, cost vs. quality.
Multimodal Embeddings
Image, video, and audio embeddings. Voyage Multimodal 3.5, Gemini 2, CLIP, SigLIP.
Vector Databases
Pinecone, Weaviate, Qdrant, pgvector, LanceDB, Milvus. When to pick each, comparison matrix.
Hybrid Search
BM25 + dense + RRF fusion. Why hybrid beats dense-only. TypeScript examples with Qdrant + OpenSearch.
Reranking
Two-stage retrieval. Cohere Rerank 3.5/4.0, Voyage Rerank 2.5, BGE. Cost vs. quality trade-offs.
Chunking Strategies
Fixed, semantic, late, contextual (Anthropic). Overlap, anti-patterns, prompt caching.
Knowledge Graphs
Neo4j, KuzuDB, GraphRAG, LightRAG. When graph retrieval beats vectors. Hybrid graph+vector.
Agentic RAG
Query planning, multi-hop retrieval, self-RAG, reflection loops. LangGraph, LlamaIndex.
Evaluation Frameworks
RAGAS (faithfulness, context recall), MTEB, BEIR, recall@k, nDCG. TypeScript eval loops.
Anti-Patterns
Dense-only retrieval, no chunking, no reranking, stale embeddings, skipping evaluation.
Why this dimension matters for agents
AI agents cannot "just figure out" where information lives or how to retrieve it. They depend entirely on what the system exposes:
- Dense-only retrieval fails on rare terms. Agents asking about niche features need exact keyword matches + semantic understanding. Hybrid search provides both.
- Single-stage retrieval misses multi-hop reasoning. Complex questions (e.g., "which vendors supply my manufacturer, and who are their competitors?") need agentic query decomposition and iteration.
- Raw chunks without context lose meaning. A chunk saying "revenue grew 3%" is useless without company, time period, and baseline. Contextual Retrieval prepends this.
- Opaque retrieval quality kills reliability. Without RAGAS metrics and domain evals, agents hallucinate based on poor retrievals. Measure everything.
- Token efficiency at scale is non-negotiable. Agents iterating over 10+ retrievals need LightRAG ($0.15 per doc) not GraphRAG ($4–7 per doc).
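The "revenue grew 3%" problem above is what Contextual Retrieval solves: embed each chunk together with a short, document-aware context. A minimal offline sketch, where `buildContextPrompt` mirrors the shape of Anthropic's published prompt and `generateContext` is a stub standing in for a Claude call with prompt caching (both names are illustrative assumptions, not a real SDK):

```typescript
interface ContextualChunk {
  original: string;
  contextualized: string; // what actually gets embedded and BM25-indexed
}

// The full document forms an identical prompt prefix for every chunk, so it
// can be cached once; only the <chunk> portion varies per request.
function buildContextPrompt(fullDoc: string, chunk: string): string {
  return [
    `<document>${fullDoc}</document>`,
    `Here is the chunk we want to situate within the whole document:`,
    `<chunk>${chunk}</chunk>`,
    `Give a short succinct context to situate this chunk within the overall`,
    `document for the purposes of improving search retrieval of the chunk.`,
  ].join("\n");
}

// In production, `generateContext` calls Claude with prompt caching enabled
// and returns a ~50-100 token context; here it is injected so the shaping
// logic is testable without an API key.
function contextualize(
  fullDoc: string,
  chunk: string,
  generateContext: (prompt: string) => string,
): ContextualChunk {
  const context = generateContext(buildContextPrompt(fullDoc, chunk));
  return { original: chunk, contextualized: `${context}\n\n${chunk}` };
}
```

The key design point: the original chunk is preserved verbatim for generation, while the `contextualized` string is what goes into both the embedding index and the BM25 index.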
Implementation flow
- Choose embedding model → OpenAI, Voyage, or open-source based on cost/latency/modality
- Pick vector DB → Pinecone (managed), pgvector (cheap), Qdrant (flexible), or Weaviate (hybrid-native)
- Add hybrid layer → BM25 index (Elasticsearch, Typesense) + fusion (RRF) if not Weaviate
- Implement chunking → Fixed-size baseline; upgrade to Contextual Retrieval (Anthropic pattern)
- Add reranking → Cohere or Voyage two-stage retrieval (retrieve top-50, rerank to top-5)
- Build agentic RAG → Query planning + reflection loops (LangGraph or LlamaIndex)
- Evaluate constantly → RAGAS metrics in CI/CD, MTEB for embeddings, domain-specific tests
- Monitor drift → Re-embed samples monthly; track embedding similarity distribution
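Step 5 above (retrieve top-50, rerank to top-5) reduces to a small two-stage function. A sketch under stated assumptions: `rerank` stands in for a cross-encoder call (Cohere Rerank 3.5 or Voyage Rerank 2.5 in production) and the interface shape is illustrative, not a vendor SDK.

```typescript
interface ScoredDoc {
  id: string;
  text: string;
  denseScore: number; // similarity from the vector DB (stage 1)
}

// Stage 1 is recall-oriented (cheap bi-encoder, wide net); stage 2 is
// precision-oriented (expensive cross-encoder, narrow cut).
function twoStageRetrieve(
  candidates: ScoredDoc[],
  rerank: (docs: ScoredDoc[]) => ScoredDoc[], // returns docs best-first
  wide = 50,
  narrow = 5,
): ScoredDoc[] {
  const shortlist = [...candidates]
    .sort((a, b) => b.denseScore - a.denseScore)
    .slice(0, wide);
  return rerank(shortlist).slice(0, narrow);
}
```

Keeping `wide` and `narrow` as parameters makes the recall/cost trade-off tunable per query type: agents doing multi-hop research can widen stage 1, while latency-sensitive chat paths can tighten it.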
Common questions
Q: Do I need a vector database?
A: Yes, if you want agent-driven search. BM25 alone (keyword-only) fails on synonyms and paraphrasing. Vector DBs are table stakes.
Q: Dense or hybrid?
A: Hybrid always. Dense alone misses exact-match signals. Hybrid (BM25 + dense + RRF) beats either alone by 20–40% on MTEB-style benchmarks.
Q: How much does reranking cost?
A: Rerankers are priced per search rather than per token (Cohere Rerank runs on the order of $2 per 1,000 searches) — negligible next to generation. Retrieve broad (top-50 dense), rerank narrow (top-5). Filtering irrelevant docs before generation also cuts downstream token costs.
Q: Contextual Retrieval sounds expensive.
A: Not with prompt caching. Cache the full document once as a shared prompt prefix, then generate a short (~50–100 token) context per chunk against that cached prefix. Anthropic reports the one-time cost works out to roughly $1 per million document tokens.
Q: GraphRAG or LightRAG?
A: LightRAG for most workloads. It retrieves with roughly 6,000x fewer tokens per query, and indexing is far cheaper too ($0.15 vs $4–7 per document). Use GraphRAG only if you need explicit community-traversal reasoning and budget allows.
Q: When do I need agents in RAG?
A: Simple queries (single-stage) work fine. Multi-hop (e.g., find all vendors), schema-driven (SQL + vector), or reflection-needing (rewrite, refine) queries need agentic RAG.
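The plan-retrieve-reflect loop behind agentic RAG fits in a few lines of control flow. A sketch, with the caveat that `plan`, `retrieve`, and `isSufficient` are hypothetical callbacks standing in for LLM calls (e.g. LangGraph nodes calling Claude) and a hybrid-search backend:

```typescript
// Agentic RAG loop: decompose the question into sub-queries, retrieve for
// each, then reflect on whether the gathered context is enough; if not,
// plan again with what we have so far (multi-hop), up to maxHops.
function agenticRetrieve(
  question: string,
  plan: (q: string, gathered: string[]) => string[], // LLM query decomposition
  retrieve: (subQuery: string) => string[],          // hybrid search per sub-query
  isSufficient: (q: string, gathered: string[]) => boolean, // reflection step
  maxHops = 3,
): string[] {
  let gathered: string[] = [];
  for (let hop = 0; hop < maxHops; hop++) {
    for (const sub of plan(question, gathered)) {
      gathered = gathered.concat(retrieve(sub));
    }
    if (isSufficient(question, gathered)) break; // stop as soon as reflection says "answered"
  }
  return gathered;
}
```

For the vendor example from the question above, hop 1 would plan "find my manufacturer's vendors" and hop 2, seeing the vendor list in `gathered`, would plan "find competitors of vendor X" — exactly the decomposition a single-stage pipeline cannot do.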