Engineering · 11 min read

IVF-OPQ Explained: Route, Rotate, Compress, Search

IVF-PQ is the workhorse of large-scale vector search. OPQ makes it dramatically better by learning a rotation that aligns your data with the quantizer. Here's how both work and why OPQ's trick is so clever.

Abstract diagram showing IVF Voronoi cells routing to PQ subspace codes, then OPQ rotation aligning data for better quantization
FIG. 03 · ENGINEERING — IVF-OPQ Explained

If you're building anything that searches over vectors at scale (semantic search, recommendation systems, RAG), you'll eventually meet IVF-PQ. It's the technique that makes it possible to search billions of vectors on hardware that can't hold them all in memory.

And if IVF-PQ is the workhorse, OPQ is the upgrade that makes it dramatically better. This post covers both: what they do, how they work, and why OPQ's core innovation, a learned rotation, is one of the cleverest ideas in vector search.

Reference: Ge et al., "Optimized Product Quantization" (CVPR 2013).

The scale problem

A single high-dimensional vector (say, a 1,536-dimensional embedding from a modern language model) takes 6 KB as raw 32-bit floats. One million vectors: 6 GB. One billion: 6 TB. You can't brute-force search that, and you can't hold it all in RAM.

You need two things: a way to avoid looking at every vector (coarse routing), and a way to make each vector smaller (compression). IVF-PQ gives you both.

IVF: route first, search second

The "inverted file" (IVF) part is conceptually simple. During indexing, you cluster all your vectors into groups using k-means. Each cluster has a centroid, a representative point. The clusters are your inverted lists.

At query time, instead of searching all billion vectors, you:

  • Compute the distance from your query to every centroid (fast, since there are only a few thousand centroids).
  • Pick the closest few centroids (this parameter is called n_probe).
  • Search only the vectors in those lists.

If you have 10,000 lists and probe 20, you've just eliminated 99.8% of the dataset. The tradeoff is recall: your true nearest neighbor might be in a list you didn't probe. More probes = higher recall, but more work.

IVF-PQ query pipeline: query routes to nearest centroids, scans compressed PQ codes, returns ranked top-k results
The IVF-PQ pipeline: route to nearest centroids, scan compressed codes, rank results.

PQ: compress each vector to a handful of bytes

The "product quantization" (PQ) part handles compression. The idea is to split each vector into subspaces (say, 96 slices of 16 dimensions each) and independently quantize each slice.

For each subspace, you run k-means to learn a codebook of 256 representative sub-vectors (called codewords). Then, for each vector, you replace each slice with the index of its nearest codeword. A 1,536-dimensional vector becomes 96 bytes: one byte per subspace, each byte indexing into a 256-entry codebook.

That's a 64× compression (from 6,144 bytes to 96 bytes). And the codebooks let you approximate distances without decompressing: you precompute a lookup table of distances from the query's sub-vectors to all codewords, then score each compressed vector with 96 table lookups and additions. This is called Asymmetric Distance Computation (ADC), asymmetric because the query stays in full precision while the database vectors are compressed.

The quality of this approximation depends on how well the codebooks capture the structure of your data. If each subspace has nice, clusterable structure, PQ works great. If the subspaces are messy, and the important variance in your data doesn't align with the subspace boundaries, PQ introduces more error and recall suffers.

This is exactly the problem OPQ solves.

OPQ: the rotation trick

OPQ stands for Optimized Product Quantization, and its core insight is beautifully simple:

Before quantizing, rotate the data so that its structure aligns with the subspace boundaries.

Before and after OPQ rotation: data variance crosses subspace boundaries in standard PQ (high error), but aligns with subspace boundaries after OPQ rotation (minimal error)
OPQ learns a rotation that aligns data variance with PQ subspace boundaries, dramatically reducing quantization error.

Think about it geometrically. Your high-dimensional data has structure: correlations between dimensions, directions of high variance. Standard PQ chops the dimensions into fixed slices and quantizes each one independently. If a direction of high variance happens to cross a subspace boundary, PQ can't capture it well in either subspace. The quantization error is high, and your distance approximations are noisy.

OPQ learns an orthogonal rotation matrix R that transforms the data so that its principal directions of variance align with the subspace boundaries. After rotation, each subspace captures a coherent "chunk" of the data's structure, and the codebooks can represent it much more accurately.

Why orthogonal? Because an orthogonal rotation preserves all distances and angles. The rotated vectors have the exact same nearest-neighbor structure as the originals, nothing is lost or distorted. You're just changing the coordinate system to one that plays nicely with PQ's subspace decomposition.

How OPQ learns the rotation

The training procedure alternates between two steps:

Step 1: Fix R, train PQ codebooks. Apply the current rotation to all training vectors, then run standard PQ training (k-means in each subspace) on the rotated data.

Step 2: Fix codebooks, update R. Given the current codebooks, find the rotation that minimizes total quantization error. This reduces to the orthogonal Procrustes problem, a classic linear algebra problem solvable in closed form via SVD (singular value decomposition).

Repeat for a few iterations (typically 5–10). Each iteration reduces the quantization error, and the rotation converges to a stable solution that jointly optimizes the coordinate system and the codebooks.

The cost is modest: a few matrix decompositions during training, and a single matrix multiplication (O(d²)) per query and per vector at encoding time. At d=1536, that's a fraction of a millisecond.

Putting it together: the IVF-OPQ pipeline

In a fully assembled IVF-OPQ index, the pieces fit together like this:

Indexing:

  • Learn the OPQ rotation matrix R from a training sample
  • Rotate all vectors: x' = Rx
  • Train IVF centroids on rotated vectors via k-means
  • Assign each rotated vector to its nearest centroid
  • Compute residuals (vector minus its centroid)
  • Train PQ codebooks on the residuals
  • Encode each residual as a PQ code (one byte per subspace)

Query:

  • Rotate the query: q' = Rq
  • Compute distances to all centroids, pick the nearest n_probe
  • For each probed list, build an ADC lookup table from the query's residual
  • Scan all PQ codes in each list, accumulating approximate distances via table lookups
  • Return the top-k lowest-distance results (optionally rerank with exact vectors)

The whole pipeline runs in milliseconds. A 1B-vector index at 96 bytes/vector fits in ~96 GB of storage, with only centroids and codebooks in RAM (~400 MB). The PQ codes can live on disk and be read via memory-mapped I/O.

What OPQ actually buys you

The improvement from PQ to OPQ can be striking. On standard benchmarks, OPQ typically reduces quantization error by 15–30%, which translates to a 5–15 percentage point improvement in recall@10 at the same compression ratio.

Why? Because real-world embeddings are far from axis-aligned. Modern language model embeddings have complex correlation structure; dimensions are entangled, principal components are diagonal to the coordinate axes. Standard PQ is forced to quantize this entangled structure with independent per-subspace codebooks, which is fundamentally limited. OPQ disentangles it first.

The rotation is especially valuable for residual coding. After subtracting the IVF centroid, the residual vectors have a different correlation structure than the raw vectors. They tend to be more isotropic but still have non-trivial cross-subspace correlations. OPQ adapts to this residual structure and squeezes out error that standard PQ cannot.

Practical tradeoffs

IVF-OPQ is not free. Here's what to consider:

  • Training cost. OPQ adds the rotation optimization loop to training. For large datasets, this can add minutes to hours depending on dimension and training set size. But training is a one-time cost, you don't pay it at query time beyond the rotation.
  • The m parameter matters. The number of subspaces (m) controls the compression ratio and quality. More subspaces = more bytes per vector = better recall but more storage and slower ADC. Typical choices: m=8 for aggressive compression, m=32–64 for balanced, m=96–128 for high-recall applications.
  • n_probe controls the speed/recall tradeoff. Probing more lists finds more true neighbors but costs more compute. Most production systems tune n_probe to hit a target recall at acceptable latency.
  • Reranking recovers recall. PQ distances are approximate. If you can afford to store raw vectors (on disk), reranking the top candidates with exact distances recovers most of the recall lost to quantization. SABLE does this automatically via memory-mapped raw vectors.
  • IVF-OPQ is a building block, not a complete system. For production use, you typically need durability (WAL), filtering (metadata indexes), and operational features (compaction, live updates) layered on top. This is exactly what SABLE provides.

Why we're excited about this

IVF-OPQ has been around since 2013, and it's still the foundation of the most scalable vector search systems in production. The reason is simple: it gives you O(1)-per-vector storage, O(n_probe)-bounded search cost, and a quality knob (the compression ratio) that you can tune independently of the data size.

OPQ's rotation trick is the kind of insight that seems obvious in hindsight but took real mathematical sophistication to formalize and optimize. It's a reminder that the right coordinate system can be more important than a more complex algorithm.

At VectorAmp, IVF-OPQ is the compression layer inside SABLE. We've extended it with per-list micro-graphs for graph-accelerated search, filter-native bitmap and sketch accelerators, and a selectivity-aware query planner, but the foundation is still the elegant decomposition of route → rotate → compress → search that Jégou, Ge, and their collaborators established over a decade ago.

IVF-PQOPQquantizationvector searchengineering