6:["$","div",null,{"className":"min-h-screen","style":{"backgroundColor":"var(--color-bg-primary)"},"children":[["$","script",null,{"type":"application/ld+json","dangerouslySetInnerHTML":{"__html":"{\"@context\":\"https://schema.org\",\"@type\":\"LearningResource\",\"name\":\"Vector Databases\",\"description\":\"Match the workload to the right vector database\",\"url\":\"https://quest.srinivaskotha.uk/chapters/rag-pipeline/levels/4\",\"isPartOf\":{\"@type\":\"Course\",\"name\":\"Production RAG Pipeline\",\"url\":\"https://quest.srinivaskotha.uk/chapters/rag-pipeline\"},\"educationalLevel\":\"Advanced\",\"inLanguage\":\"en\",\"isAccessibleForFree\":true,\"provider\":{\"@type\":\"Person\",\"name\":\"Srinivas Kotha\",\"url\":\"https://srinivaskotha.uk\"}}"}}],["$","div",null,{"className":"max-w-7xl mx-auto px-4 sm:px-6 lg:px-8 py-8","children":[["$","nav",null,{"className":"flex items-center gap-1.5 text-xs mb-6","aria-label":"Breadcrumb","children":[["$","$L12",null,{"href":"/","className":"transition-colors","style":{"color":"var(--color-text-muted)"},"children":"Hub"}],["$","svg",null,{"className":"opacity-30","width":"16","height":"16","viewBox":"0 0 24 24","fill":"none","stroke":"currentColor","strokeWidth":"2","strokeLinecap":"round","strokeLinejoin":"round","aria-hidden":"true","children":["$","path",null,{"d":"M9 18l6-6-6-6"}]}],["$","$L12",null,{"href":"/chapters/rag-pipeline","className":"transition-colors","style":{"color":"var(--color-text-muted)"},"children":"Production RAG Pipeline"}],["$","svg",null,{"className":"opacity-30","width":"16","height":"16","viewBox":"0 0 24 24","fill":"none","stroke":"currentColor","strokeWidth":"2","strokeLinecap":"round","strokeLinejoin":"round","aria-hidden":"true","children":["$","path",null,{"d":"M9 18l6-6-6-6"}]}],["$","span",null,{"style":{"color":"var(--rag)"},"children":["Level ",4]}]]}],["$","$L13",null,{"learnSections":[{"id":1034,"levelId":4,"sortOrder":1,"sectionType":"diagram","title":"Vector Database 
Architecture","content":{"edges":[["ingest","index"],["index","store"],["store","query"],["query","retrieve"]],"nodes":[{"id":"ingest","icon":"upload","label":"Ingest Vectors","description":"INSERT INTO embeddings (content, vector) VALUES ($1, $2::vector). Batch inserts are 3-5x faster than single-row inserts. Build the HNSW index AFTER the bulk load, not during."},{"id":"index","icon":"network","label":"HNSW Index","description":"Hierarchical Navigable Small World graph. Builds an approximate neighborhood graph across all vectors. Parameters: m=16 (connections per node), ef_construction=64 (build quality). Higher values = slower build and more memory, but better recall."},{"id":"store","icon":"database","label":"Vector Store","description":"Vectors stored as PostgreSQL column type vector(512). Metadata (doc_id, section, date) stored in regular columns — queryable with standard SQL WHERE clauses alongside vector similarity."},{"id":"query","icon":"search","label":"ANN Query","description":"SELECT content, vector <=> $1 AS distance FROM embeddings ORDER BY distance LIMIT 10. The <=> (cosine distance) operator can use the HNSW index when the index was built with vector_cosine_ops. Adding WHERE clauses enables filtered search."},{"id":"retrieve","icon":"file-text","label":"Return Chunks","description":"Returns top-K chunks with their cosine distances and metadata. Distances range 0-2 for <=> (0 = identical direction, 2 = opposite); cosine similarity = 1 - distance. Production: similarity below 0.7 (distance above 0.3) often indicates no relevant match."}],"animate":true,"stepThrough":true},"createdAt":"$D2026-03-30T05:04:51.010Z"},{"id":1040,"levelId":4,"sortOrder":2,"sectionType":"prediction","title":"Predict: The Index Choice","content":{"reveal":"Switch to HNSW. IVF partitions vectors into Voronoi cells and scans the nearest cells at query time — more probes = better recall but higher latency. At 10M documents, IVF with reasonable probe counts (10-20) achieves ~200ms. HNSW builds a navigable graph and traverses it in O(log n) — typical latency is 5-20ms at 10M scale. 
The tradeoff: HNSW uses 2-3x more memory (graph edges) and has slower index build time. For read-heavy production workloads under a 50ms SLA, HNSW is the standard choice. pgvector supports both — change the index type, rebuild, done.","options":["Increase IVF probe count from 10 to 50 — scan more clusters for better accuracy","Switch from IVF to HNSW index — graph-based search is faster at this scale","Add more RAM — IVF is slow because vectors are spilling to disk","Reduce embedding dimensions from 1536 to 256 — smaller vectors are faster to compare"],"question":"You have 10M documents in pgvector using an IVF (Inverted File) index. Average query latency is 200ms. Your SLA requires <50ms. What should you do?"},"createdAt":"$D2026-03-30T05:04:51.010Z"},{"id":1033,"levelId":4,"sortOrder":3,"sectionType":"text","title":"Why Regular Databases Cannot Do Similarity Search","content":{"markdown":"A traditional SQL `WHERE` clause finds exact matches. But RAG needs **approximate nearest neighbor (ANN)** search — finding the K vectors closest to a query vector in high-dimensional space.\n\n**HNSW (Hierarchical Navigable Small World)** graphs solve this. The algorithm builds layers of connections between vectors, where higher layers connect distant neighbors and lower layers connect close neighbors. Query time is roughly O(log n), with recall typically 95-99% depending on index parameters.\n\nIf you already run PostgreSQL, **pgvector** is your default. With pgvectorscale, it achieves 471 QPS at 99% recall on 50M vectors, about 11x Qdrant's throughput on the same hardware. No extra infrastructure, no new ops burden, no data synchronization headaches.\n\n**Enterprise Skills Bridge:** HNSW is to vector search what a B-tree index is to SQL. Same principle: trade extra storage and write overhead for dramatically faster reads. 
You've built B-tree-backed search systems your entire career — pgvector is the same skill applied to 768-dimensional similarity."},"createdAt":"$D2026-03-30T05:04:51.010Z"},{"id":1039,"levelId":4,"sortOrder":4,"sectionType":"analogy","title":"Concepts You Already Know","content":{"analogies":[{"newIcon":"🕸️","background":"backend","breakPoint":"B-tree gives exact results — every matching row is returned. HNSW gives approximate results — it may miss the true nearest neighbor in exchange for 100x faster search. B-tree recall is always 100%; HNSW recall is typically 95-99% depending on index parameters (ef_construction, M). There is no 'exact mode' for HNSW at scale.","bridgeText":"A B-tree organizes data in a sorted tree for O(log n) exact lookups and range queries. HNSW (Hierarchical Navigable Small World) organizes vectors in a layered graph for O(log n) approximate nearest neighbor search. Both are index structures that trade write speed and storage for fast reads. B-tree for scalar values; HNSW for high-dimensional vectors.","newConcept":"HNSW Index","familiarIcon":"🌳","familiarConcept":"B-tree Index"},{"newIcon":"🧠","background":"backend","breakPoint":"Redis lookups are exact (hash table O(1)). Vector similarity search is approximate and involves scanning graph neighbors — even in-memory, a query over 10M vectors takes 5-50ms, not microseconds. Also, Redis data can be trivially sharded by key; vector indexes require careful partitioning to maintain search quality.","bridgeText":"Redis keeps data in RAM for microsecond lookups — fast but expensive at scale. In-memory vector indexes (HNSW in Qdrant, pgvector with shared_buffers) keep vectors in RAM for millisecond similarity search — also fast but expensive at scale. 
Both follow the same pattern: pay more for RAM to avoid disk latency.","newConcept":"In-Memory Vector Index","familiarIcon":"⚡","familiarConcept":"Redis Cache"},{"newIcon":"🗺️","background":"frontend","breakPoint":"CSS Grid is 2D with human-interpretable axes (x, y). Vector spaces have 1536 dimensions with no interpretable axes. You cannot visualize a vector space directly — dimensionality reduction (t-SNE, UMAP) projects to 2D for visualization but distorts distances. CSS Grid distances are exact; projected vector distances are approximate.","bridgeText":"CSS Grid positions elements in 2D space (row, column). Vector databases position documents in 1536D space (embedding dimensions). In both cases, proximity matters — elements near each other in the space are related. Grid uses pixels for distance; vector databases use cosine similarity or Euclidean distance.","newConcept":"Vector Space Search","familiarIcon":"📐","familiarConcept":"Flexbox/Grid (Multi-Dimensional Positioning)"},{"newIcon":"🎯","background":"frontend","breakPoint":"Fuzzy text matching has interpretable similarity (edit distance: 'cat' → 'bat' = 1 character). ANN similarity is a floating-point cosine score with no intuitive meaning — 0.87 vs 0.85 may or may not matter depending on the domain. Also, autocomplete operates on a vocabulary of thousands; ANN operates on millions of high-dimensional vectors.","bridgeText":"Search autocomplete shows results that fuzzy-match your typing — 'auhtentic' still shows 'authentication'. It sacrifices exactness for speed and usability. 
ANN search shows vectors that approximately match your query vector — it might miss the absolute closest vector but returns 95-99% of true neighbors in milliseconds instead of scanning every vector.","newConcept":"Approximate Nearest Neighbor (ANN)","familiarIcon":"🔤","familiarConcept":"Search Autocomplete (Fuzzy Matching)"},{"newIcon":"🗺️","background":"devops","breakPoint":"DNS traversal is deterministic — the same domain always resolves the same way. HNSW traversal is greedy and approximate: a different entry point or a differently built graph can lead to different results. DNS has guaranteed correctness; HNSW trades correctness for speed.","bridgeText":"DNS resolves a domain by traversing a hierarchy: root → .com → google.com → docs.google.com, narrowing the search at each level. HNSW searches by traversing graph layers: top layer (few nodes, long jumps) → middle layers → bottom layer (all nodes, short jumps), narrowing the search at each level. Both use hierarchical navigation to avoid exhaustive search.","newConcept":"HNSW Graph Navigation","familiarIcon":"🌐","familiarConcept":"DNS Resolution (Hierarchical Lookup)"},{"newIcon":"🔍","background":"devops","breakPoint":"Load balancers select ONE best server. Vector search returns top-K nearest neighbors (typically 5-20). Load balancer distance is network latency (milliseconds); vector distance is cosine distance (0 to 2, where 0 means identical direction). Also, load balancers consider server health/capacity; vector databases consider only mathematical distance.","bridgeText":"A load balancer routes requests to the nearest healthy server — it measures latency/distance and picks the closest one. A vector database routes queries to the nearest document vectors — it measures cosine distance and picks the closest ones. 
Both select the best match from a pool of candidates based on a distance metric.","newConcept":"Vector Similarity Search","familiarIcon":"⚖️","familiarConcept":"Load Balancer (Nearest Healthy Server)"}]},"createdAt":"$D2026-03-30T05:04:51.010Z"},{"id":1038,"levelId":4,"sortOrder":5,"sectionType":"exploration","title":"Vector Similarity Search — Step by Step","content":{"edges":[{"id":"e-qt-em","label":"tokenize + encode","source":"query-text","target":"embedding-model","animated":true},{"id":"e-em-qv","label":"1536-dim output","source":"embedding-model","target":"query-vector","animated":true},{"id":"e-qv-ann","label":"nearest neighbors","source":"query-vector","target":"ann-search","animated":true},{"id":"e-ann-topk","label":"ranked by cosine","source":"ann-search","target":"top-k-results","animated":true}],"nodes":[{"id":"query-text","data":{"icon":"🔍","label":"Query Text","active":true,"details":"Input: \"how to configure retry backoff\"\n\nThis natural language question needs to find relevant docs even if they use different words like \"exponential delay\", \"retry policy\", or \"backoff interval\".","accentColor":"#3b82f6"},"type":"concept","position":{"x":0,"y":120}},{"id":"embedding-model","data":{"icon":"🔄","label":"Embedding Model","details":"Model: text-embedding-3-small\n1536 dimensions\n~15ms latency\nCost: $0.02 per 1M tokens\n\nConverts meaning into a point in high-dimensional space. Similar meanings → nearby points. 
\"retry backoff\" and \"exponential delay\" map to vectors only 0.08 apart.","accentColor":"#3b82f6"},"type":"concept","position":{"x":280,"y":120}},{"id":"query-vector","data":{"icon":"📐","label":"Query Vector","details":"[0.023, -0.041, 0.089, ..., 0.012]\n\n1536 floating point numbers.\nSimilar meanings → nearby points.\n\"retry backoff\" is close to \"exponential retry\" but far from \"database schema\".\n\nThis vector is now the search key — instead of matching characters, we match position in meaning-space.","accentColor":"#3b82f6"},"type":"concept","position":{"x":560,"y":120}},{"id":"ann-search","data":{"icon":"🗄️","label":"ANN Search","details":"Algorithm: HNSW (Hierarchical Navigable Small World)\nIndex: pgvector with ivfflat or hnsw\n\nSearches ~100K vectors in ~8ms.\nDoesn't check every vector — navigates a graph of neighbors.\nRecall@10: 0.98 (misses 2% of true top-10).\n\nTrade-off: HNSW uses more memory but is faster. IVFFlat uses less memory but needs periodic retraining after bulk inserts.","accentColor":"#3b82f6"},"type":"concept","position":{"x":840,"y":120}},{"id":"top-k-results","data":{"icon":"🎯","label":"Top-K Results","active":true,"details":"K=5 results:\n1. retry_config.md — cosine: 0.94\n2. backoff_patterns.md — cosine: 0.91\n3. timeout_handling.md — cosine: 0.87\n4. error_recovery.md — cosine: 0.82\n5. circuit_breaker.md — cosine: 0.78\n\nThreshold: drop anything below 0.75.\nTotal pipeline: ~23ms (15ms embed + 8ms search).\nCost: ~$0.00002 per query.","accentColor":"#3b82f6"},"type":"concept","position":{"x":1120,"y":120}}],"title":"Vector Similarity Search — Step by Step","description":"Follow a single query through the entire vector search pipeline. 
Click each node to see exact data, latencies, and costs."},"createdAt":"$D2026-03-30T05:04:51.010Z"},{"id":1037,"levelId":4,"sortOrder":6,"sectionType":"d2_diagram","title":"Vector Search Visualization","content":{"altText":"Visualization of vector similarity search in 2D space: a query vector (arrow) points into a cloud of document embedding points. The nearest K points are highlighted, showing how cosine similarity finds semantically related documents. An HNSW graph overlay shows the hierarchical index structure with navigable small-world connections between nodes at multiple layers.","caption":"Vector search finds documents by meaning, not keywords. Each document chunk becomes a point in high-dimensional space. The query is also a point, and the database returns the K nearest neighbors. HNSW indexes make this O(log n) instead of comparing every vector.","svgPath":"/diagrams/vector-search-visualization.svg"},"createdAt":"$D2026-03-30T05:04:51.010Z"},{"id":1035,"levelId":4,"sortOrder":7,"sectionType":"comparison","title":"pgvector vs Pinecone vs Qdrant","content":{"after":{"label":"pgvector (Self-Managed)","content":"**Best for:** Existing Postgres users, < 100M vectors\n\n**Pricing:** $0 additional (you already pay for Postgres)\n**Scale:** 471 QPS at 50M vectors with pgvectorscale\n**Ops burden:** Same as your existing Postgres instance\n**Metadata filtering:** Native SQL WHERE clauses\n\n**When to use:**\n- You already run PostgreSQL\n- < 100M vectors\n- Strong SQL querying needs (complex metadata filters)\n\n**Performance:** With pgvectorscale, 11x faster than Qdrant at 50M vectors on the same hardware.\n\n**Enterprise Skills Bridge:** Your existing PostgreSQL knowledge, monitoring, backup, and replication strategies all transfer directly."},"before":{"label":"Pinecone (Managed)","content":"**Best for:** Billion-scale, zero ops, serverless pricing\n\n**Pricing:** ~$700/month for 10M vectors (s1 pod)\n**Scale:** Handles 10B+ vectors with serverless tier\n**Ops 
burden:** Zero — fully managed, automatic scaling\n**Metadata filtering:** Native filter API\n\n**When to use:**\n- No existing PostgreSQL infrastructure\n- Need managed scaling without DevOps resources\n- > 100M vectors\n\n**When NOT to use:**\n- You already run PostgreSQL (pgvector is free)\n- Strict data locality requirements\n- < 10M vectors (massive overkill)"}},"createdAt":"$D2026-03-30T05:04:51.010Z"},{"id":1036,"levelId":4,"sortOrder":8,"sectionType":"callout","title":"Enterprise Skills Bridge","content":{"title":"Vector Indexes = B-Trees for Meaning","content":"You've spent years tuning B-tree indexes on SQL Server and Oracle. HNSW is the same concept applied to high-dimensional space: trade storage and write overhead for fast reads. The tuning discipline transfers: monitor index build time, watch for index bloat on high-write workloads, and benchmark recall vs query latency trade-offs the same way you'd benchmark SQL index selectivity.","variant":"enterprise"},"createdAt":"$D2026-03-30T05:04:51.010Z"},{"id":1041,"levelId":4,"sortOrder":9,"sectionType":"prediction","title":"Predict: Cosine vs Dot Product","content":{"reveal":"The embeddings are unnormalized. Cosine similarity normalizes by dividing by vector magnitudes: cos(a,b) = (a·b) / (|a|·|b|). The dot product is just a·b — no normalization. If vectors have different magnitudes (common with some open-source models), dot product ranks longer vectors higher regardless of relevance. Fix: L2-normalize all vectors before inserting into pgvector. After normalization, dot product equals cosine similarity, and you get the same rankings. 
Alternatively, use pgvector's cosine distance operator <=> instead of <#>.","options":["pgvector has lower quality search algorithms than Qdrant","Dot product and cosine similarity always produce different rankings","The embeddings are unnormalized — dot product scores are biased by vector magnitude","The pgvector HNSW index needs higher ef_search parameter"],"question":"You migrate from Qdrant (cosine similarity) to pgvector (dot product via <#> operator). Recall drops from 0.92 to 0.71 even though the same embeddings and same queries are used. What is the most likely cause?"},"createdAt":"$D2026-03-30T05:04:51.010Z"}],"accentColor":"var(--rag)","gameType":"ConceptMatcher","gameConfig":{"pairs":[{"left":"Already running PostgreSQL, <100M vectors","right":"pgvector"},{"left":"Billion-scale, fully managed, zero ops","right":"Pinecone"},{"left":"Heavy metadata filtering, open-source","right":"Qdrant"},{"left":"Local dev / prototyping, embedded Python","right":"ChromaDB"},{"left":"Multi-tenant SaaS, enterprise compliance","right":"Weaviate"},{"left":"Ultra-high write throughput, Kubernetes-native","right":"Milvus"}]},"levelTitle":"Vector Databases","levelId":4,"levelNum":4,"chapterId":1,"chapterSlug":"rag-pipeline","chapterTitle":"Production RAG Pipeline","xpReward":100,"keyInsight":"pgvector outperforms dedicated vector DBs at <100M vectors — no extra infra if you're on Postgres. Pinecone for serverless scale. Qdrant for metadata-heavy filtering. ChromaDB only for local dev — it doesn't scale. Your EF Core + SQL Server experience transfers directly to pgvector.","nextLevelUrl":"/chapters/rag-pipeline/levels/5","backUrl":"/chapters/rag-pipeline","isAuthenticated":false,"levelSubtitle":"Match the workload to the right vector database","hookQuote":"A startup replaced their Pinecone plan ($700/month) with pgvector after discovering they only had 2M vectors — well within pgvector's sweet spot. 
With pgvectorscale, they got 471 QPS at 99% recall, 11x Qdrant on the same hardware, at $0 additional infrastructure cost since they already ran PostgreSQL.","totalLevels":10,"estimatedMinutes":5}],"$L14"]}]]}]