Research · 14 min read

Introducing SABLE: A Filter-Native Engine for Billion-Scale Vector Search

Most vector databases bolt filtering on after the fact. SABLE was designed from the ground up to make filtered search fast — with per-list micro-graphs, bitmap pre-gating, and a selectivity-aware query planner.

Abstract diagram showing the SABLE filter cascade pipeline — IVF lists, bitmap pre-gating, selectivity planner, micro-graph and linear scan paths merging to top-k results
FIG. 01 · RESEARCH — SABLE Filter-Native Engine

Every production vector search query has a WHERE clause.

"Find images similar to this one: in stock, priced under $50, and in the Electronics category." That is the real shape of search in e-commerce, RAG pipelines, recommendation systems, and document retrieval. Yet the dominant vector indexes (HNSW graphs, flat IVF-PQ, DiskANN) were designed for unfiltered nearest-neighbor search. Filtering is bolted on as an afterthought, and performance suffers for it.

SABLE is our answer. It stands for Selectivity-Aware Bi-Level Engine, and it was built from the ground up to make filtered approximate nearest-neighbor search a first-class operation at billion-vector scale.

This post walks through the architecture: what problems SABLE solves, how its four core innovations work together, and why the results matter for anyone running vector search in production.

The filtered-search problem

Consider a standard IVF-PQ index. At query time, you probe the nearest centroid lists and scan compressed PQ codes to estimate distances. Fast and well-understood.

Now add a filter: category = "Electronics" AND price < 50. You have two bad options.

Post-filtering scans all candidates first, then discards those that don't match the filter. If only 1% of your data matches, you've wasted 99% of your distance computations.

Pre-filtering builds the candidate set from the filter first, then searches only within it. But if the filtered set is small and scattered across many IVF lists, the index structure provides no speedup; you're essentially doing brute-force search on a random subset.

Both approaches degrade as filters get more selective. At 1% selectivity, HNSW graphs wander through non-matching nodes; IVF scans waste compute on vectors that will be thrown away; DiskANN burns random I/O loading pages that mostly contain irrelevant results.

SABLE takes a different approach: make the filter metadata part of the index itself, so the system can prune irrelevant work before computing a single distance.

Filter-native indexing

SABLE is built on an IVF-PQ foundation, specifically an OPQ-rotated variant that learns an orthogonal rotation matrix to reduce quantization error before encoding. But where a standard IVF-PQ index stores only compressed codes per list, SABLE enriches every inverted list with three additional structures.

Filter cascade pipeline: all probed lists pass through bitmap gate, then sketch gate, then selectivity-aware scan, progressively eliminating non-matching lists before any distance computation
The filter cascade: each stage eliminates lists before any distance computation happens.

Roaring bitmaps for categorical attributes. Each inverted list maintains a compact Roaring bitmap encoding which categorical attribute values are present in that list. When a query arrives with a categorical filter (category = "Electronics"), SABLE hashes the filter value and checks each list's bitmap in O(1) time. Lists that cannot possibly contain a match are eliminated immediately: no distance computation, no I/O, no wasted work.

At 1% selectivity, bitmap pre-gating alone eliminates 95–98% of candidate lists.

Quantile sketches for numeric ranges. For numeric filters (price < 50), each list stores a lightweight quantile sketch, either a TDigest or KLL structure, capturing the distribution of that attribute's values within the list. The sketch records exact min/max values and five quantile estimates (1st, 5th, 50th, 95th, 99th percentiles). If a list's max value is below the filter's lower bound (or its min is above the upper bound), the list is pruned instantly. For softer pruning, the sketch provides a fast selectivity estimate that feeds the query planner.

Together, they form a cascade. A filtered query flows through the cascade in stages: all probed lists → bitmap check → sketch check → distance computation. Each stage reduces the work for the next. By the time SABLE actually computes distances, it's operating on a dramatically smaller set of lists that are known to contain relevant results.

The selectivity-aware query planner

Here is SABLE's key architectural insight: no single search strategy is optimal across all selectivity regimes.

When a query has no filter (or a very broad one), most vectors in each list are candidates. A graph-based traversal, hopping from neighbor to neighbor, can find the top-k in far fewer distance computations than scanning every code. But when a query is highly selective, the graph is useless: the nearest neighbors in graph space probably don't satisfy the filter, and the search wastes its budget visiting non-matching nodes.

SABLE solves this with a per-list selectivity-aware planner. For each probed list, the planner estimates the fraction of vectors that will satisfy the filter (using the bitmap and sketch metadata) and routes the list to the appropriate search path:

  • High selectivity (few matches): Linear ADC scan. Walk through all codes in the list, check the filter, compute distances only for matching vectors. This is efficient because the cascade has already pruned non-matching lists. Within a surviving list, the scan is the fastest way to find the sparse matches.
  • Low selectivity / unfiltered: Micro-graph beam search. Use the per-list navigable graph to jump directly to the nearest codes, skipping the majority of the list entirely.

The planner uses an EWMA (exponentially weighted moving average) to calibrate its estimates over time. Observed selectivity from recent queries feeds back into future estimates, so the planner learns the real data distribution rather than relying on static assumptions.

Per-list micro-graphs

SABLE's micro-graphs are one of its most distinctive features. Within each inverted list, SABLE builds a compact navigable graph over the PQ-encoded vectors, similar in structure to an NSG (Navigating Spreading-out Graph) but operating entirely within a single IVF list and using PQ-approximate distances rather than exact vectors.

Why per-list instead of a single global graph? Three reasons:

  • Memory efficiency. A global HNSW graph over a billion vectors requires ~100 bytes of edge storage per vector, roughly 100 GB of RAM just for the graph structure. SABLE's micro-graphs live alongside the PQ codes they index, use degree-8 adjacency lists, and add minimal overhead. The entire SABLE index for a billion vectors fits in ~2 GB of RAM plus ~32 GB on disk.
  • Natural partitioning. IVF lists are already locality-clustered by the coarse quantizer. Vectors within a list are relatively close in embedding space, which makes them ideal candidates for graph-based navigation. The graph only needs to be navigable within its local neighborhood, not across the entire dataset.
  • Independent construction and rebuild. Each micro-graph can be built, rebuilt, or skipped independently. SABLE uses a shadow-buffer pattern: graph construction runs in a background thread pool, and new graphs are atomically swapped in when ready. During rebuilds, queries fall back to linear scan, so recall is preserved while throughput temporarily drops.

Graph construction uses α-pruning (with α ≈ 0.985): for each node, candidates are sorted by PQ distance, and a new edge is only added if it provides a direction not already covered by an existing neighbor. This produces a well-spread, navigable graph with low degree and short paths.

For unfiltered queries on large lists (≥ 4,096 vectors), beam search on the micro-graph visits roughly 80 nodes to find the top-10, compared to scanning all 4,096+ codes linearly, a ~50× reduction in distance computations.

Tiered L0/L1 durability

Production systems can't pause for index rebuilds. SABLE uses a two-tier architecture:

  • L0 (in-memory buffer): New vectors are appended to a fixed-size FIFO buffer backed by a write-ahead log (WAL). Inserts are O(1). At query time, L0 is brute-force scanned and its results are merged with L1's. This gives immediate query coverage for freshly ingested data.
  • L1 (disk-backed IVF-PQ): When L0 fills up, its contents are flushed to L1 in a batch. Vectors are assigned to IVF lists, PQ-encoded, and written to memory-mapped disk storage (DLST format). Only the affected lists are flagged for graph rebuild, and rebuild runs asynchronously in the background.

WAL durability means no data is lost on crash: recovery replays the log and restores in-flight vectors. Deletes use soft tombstones in L1, with periodic compaction when the tombstone ratio exceeds 10%.

The result is a system that can ingest continuously while serving queries, with a clear separation between the write path (fast, durable, buffered) and the read-optimized cold tier (compressed, disk-backed, graph-indexed).

Why this matters at scale

The gap between SABLE and prior art is most visible in the numbers.

On a 1M-vector benchmark (DBpedia-OpenAI, d=1536), SABLE delivers 0.994 recall@10 at 33 ms p95 latency for unfiltered queries, and 0.986 recall@10 at 20 ms p95 for filtered queries at 11% selectivity. For comparison, DiskANN achieves 0.897 recall at 175 ms p95 (unfiltered) and 0.941 at 133 ms p95 (filtered). Standard IVF-PQ tops out at 0.608 recall.

The filtered result is the headline: SABLE is 9× faster than DiskANN at comparable recall under filter, and 60% higher recall than IVF-PQ at any operating point. Tail latency variance is remarkably low: the gap between p50 and p99 is under 2 ms, because the cascade pre-gating pipeline eliminates work deterministically rather than relying on probabilistic early termination.

For billion-vector deployments, the memory profile is equally important. SABLE's disk-backed architecture requires ~2 GB of RAM (metadata, hot centroids, offset tables) plus ~32 GB of NVMe storage for codes and raw vectors, over 300× less RAM than an HNSW graph at comparable recall. A single commodity server can host a billion-vector SABLE index.

The bottom line

SABLE exists because filtered search is the default in production, and the industry's dominant indexes weren't designed for it. By integrating filter metadata directly into the index, including per-list bitmaps, quantile sketches, a calibrated planner, and navigable micro-graphs, SABLE eliminates the tradeoff between filter power and search performance.

If you're running vector search at scale and your queries have WHERE clauses, this is the engine we built for you.


SABLE is the core index engine powering VectorAmp Intelligence. To learn more or run benchmarks on your own data, book a demo or explore the SABLE documentation.

The SABLE architecture is protected by a provisional patent application filed with the USPTO on May 9, 2026 (Serial No. 64/061,420).
SABLEvector searchfiltered searchindexingresearcharchitecture