Every memory-first vector database is announcing, or about to announce, support for disk. Qdrant has signaled it. So have several others. The market reading is straightforward: costs are coming down, scale is going up, and the gap between vector databases and traditional storage tiers is closing. That reading isn't wrong. But it's incomplete.
The architectural reading is more interesting. Most of these announcements describe bolt-on tiering: disk support added to systems that were designed, indexed, and tuned for resident memory. Bolt-ons behave very differently from architectures designed for disk-resident filtered search from the first commit. They solve a cost problem. They do not, on their own, solve the latency, write-throughput, and low-selectivity filtered-query problems that disk-resident workloads actually surface in production.
What follows is an architectural primer for evaluating any vector database that recently added, or is about to add, disk support. The goal is not to argue that disk tiering is bad. It is to give buyers a framework for understanding what they're actually getting, and what they're not.
What "disk support" actually means in practice
When a memory-first system announces disk support, it almost always falls into one of three architectural patterns. Each has very different performance characteristics, and the marketing rarely distinguishes between them.
Memory-mapped HNSW. The most common retrofit. The HNSW graph lives on disk; the operating system pages segments into memory on demand. This works beautifully on hot data and degrades sharply on cold data. HNSW was designed assuming every neighbor is reachable in nanoseconds; under partial residency, each hop to a non-resident page becomes a page fault, and graph traversal turns into a series of synchronous disk reads. Writes suffer too: HNSW assumes cheap in-place mutation, so an insert that rewrites neighbor lists scattered across many pages becomes heavy write amplification on disk.
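To make the residency cliff concrete, here is a back-of-envelope model in Python. It assumes each node visit either hits RAM (~100 ns) or triggers one synchronous page fault served from NVMe (~100 µs); both latencies and the 500-visit query are illustrative assumptions, not measurements of any product.

```python
# Back-of-envelope model of search over a memory-mapped HNSW graph.
# Both latencies are assumed round numbers, not measurements.
MEM_ACCESS_S = 100e-9   # assumed cost of visiting a resident node (~100 ns)
DISK_READ_S = 100e-6    # assumed cost of one synchronous page fault (~100 us)


def expected_search_latency(node_visits: int, resident_fraction: float) -> float:
    """Expected seconds for one query that visits `node_visits` graph nodes.

    Each visit hits RAM with probability `resident_fraction`; otherwise it
    blocks on one synchronous disk read before traversal can continue.
    """
    if not 0.0 <= resident_fraction <= 1.0:
        raise ValueError("resident_fraction must be in [0, 1]")
    per_visit = (resident_fraction * MEM_ACCESS_S
                 + (1.0 - resident_fraction) * DISK_READ_S)
    return node_visits * per_visit


if __name__ == "__main__":
    visits = 500  # assumed visits per query, for illustration only
    for frac in (1.0, 0.99, 0.9, 0.5):
        ms = expected_search_latency(visits, frac) * 1e3
        print(f"resident={frac:.2f}  expected latency ~ {ms:.3f} ms")
```

Under these assumptions, just 1% non-resident visits makes the expected query about eleven times slower than the fully resident case, because one disk read costs as much as a thousand memory accesses.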
Hot/cold tiered partitions. Recently queried vectors stay in RAM; the rest go to disk. This solves the cost problem cleanly. It does not solve latency variance (cold partition queries can be ten to a hundred times slower than hot ones) and it creates a new operational surface area. Eviction policy, partition sizing, and tier-promotion logic all become parameters the customer has to reason about.
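A toy version of the tiering logic shows where those new parameters come from. The sketch below is a hypothetical `TieredStore`, not any vendor's implementation: it assumes an LRU hot tier with a fixed capacity and promote-on-read, and it counts hot versus cold hits.

```python
from collections import OrderedDict


class TieredStore:
    """Toy hot/cold store: a fixed-capacity LRU hot tier standing in for RAM
    and an unbounded cold tier standing in for disk. Hypothetical class for
    illustration; not any product's API."""

    def __init__(self, hot_capacity: int):
        if hot_capacity < 1:
            raise ValueError("hot_capacity must be >= 1")
        self.hot_capacity = hot_capacity   # partition-sizing knob
        self.hot = OrderedDict()           # key -> vector, oldest first
        self.cold = {}                     # key -> vector
        self.hot_hits = 0
        self.cold_hits = 0

    def put(self, key, vector):
        # New writes land in the cold tier; reads decide what gets promoted.
        self.hot.pop(key, None)
        self.cold[key] = vector

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)      # refresh LRU position
            self.hot_hits += 1
            return self.hot[key]
        vector = self.cold.pop(key)        # KeyError if the key is absent entirely
        self.cold_hits += 1                # the slow path a real disk tier pays
        self.hot[key] = vector             # tier-promotion policy: promote on read
        if len(self.hot) > self.hot_capacity:
            old_key, old_vec = self.hot.popitem(last=False)
            self.cold[old_key] = old_vec   # eviction policy: demote least recent
        return vector


if __name__ == "__main__":
    store = TieredStore(hot_capacity=2)
    for key in ("a", "b", "c"):
        store.put(key, [float(ord(key))])
    for key in ("a", "b", "a", "c"):
        store.get(key)
    print(f"hot hits={store.hot_hits}, cold hits={store.cold_hits}")
```

Hot capacity, eviction order, and whether reads promote are each a tuning decision, and every cold hit is the slow path behind the latency variance.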
Disk-native ANN. DiskANN and similar approaches were designed for disk-resident workloads from the start. They do not retrofit; they replace. Cold-start behavior is dramatically better than in the first two patterns. But the published implementations have well-documented weaknesses on filtered queries, particularly low-selectivity ones, because the graph structure is optimized for unfiltered nearest-neighbor traversal.
None of these are bad designs. They are different things, with different failure modes. "We now support disk" collapses three architectures into one marketing claim. The buyer's job is to know which one they're actually buying.
The filtered search problem nobody's solving
Here is the workload that breaks all three of the patterns above: filtered approximate nearest-neighbor search at low selectivity.
Filtered ANN means a vector query constrained by metadata predicates: tenant_id, jurisdiction, document type, date range, well number, case number. This is the enterprise workload. Nobody runs unfiltered semantic search in production. Every legal-tech RAG system, every compliance archive, every multi-tenant SaaS application filters before, or while, it does vector retrieval.
Selectivity matters. A high-selectivity filter (say, 1% of the corpus matches) is relatively easy: the system can resolve the filter to a small candidate set, score those vectors directly, and return results. Low-selectivity filters, where 30%, 50%, or 70% of the corpus matches, are where things break. The industry has quietly optimized for high-selectivity workloads because they're tractable. Most vector databases handle filtered search by one of two methods, each of which degrades at scale in its own part of the selectivity range:
- Post-filtering: pull k×N candidate vectors, apply the filter, and hope enough survive. This holds up when most of the corpus passes the filter; as the filter becomes more restrictive, fewer candidates survive, and on large corpora recall collapses unless N grows with it.
- Pre-filtering by brute force: scan the filtered subset linearly. Correct, but cost grows linearly with subset size, and at low selectivity that subset is a large fraction of the corpus.
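The two methods can be compared in a small Python experiment. It is a sketch under stated assumptions: exact k-nearest-neighbor ranking stands in for the ANN index, and the filter is assigned at random, independent of where vectors sit in the space. `post_filter` and `pre_filter` are illustrative names, not a library API.

```python
import heapq
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def post_filter(query, vectors, passes, k, oversample):
    """Take the k*oversample nearest candidates, then drop those failing the
    filter. Exact ranking stands in for the ANN index (an assumption), so any
    recall loss comes only from too few candidates surviving the filter."""
    candidates = heapq.nsmallest(k * oversample, range(len(vectors)),
                                 key=lambda i: sq_dist(query, vectors[i]))
    survivors = [i for i in candidates if passes[i]]
    return survivors[:k]


def pre_filter(query, vectors, passes, k):
    """Brute-force scan of the filtered subset. Always exact, but the number
    of distance computations equals the subset size."""
    subset = [i for i in range(len(vectors)) if passes[i]]
    ids = heapq.nsmallest(k, subset, key=lambda i: sq_dist(query, vectors[i]))
    return ids, len(subset)   # (result ids, distance computations performed)


if __name__ == "__main__":
    rng = random.Random(0)
    n, dim, k, oversample = 10_000, 16, 10, 10
    vectors = [[rng.random() for _ in range(dim)] for _ in range(n)]
    query = [rng.random() for _ in range(dim)]
    for selectivity in (0.01, 0.05, 0.3, 0.7):
        # Filter membership drawn independently of vector position (assumed).
        passes = [rng.random() < selectivity for _ in range(n)]
        exact, scanned = pre_filter(query, vectors, passes, k)
        approx = post_filter(query, vectors, passes, k, oversample)
        recall = len(set(approx) & set(exact)) / k
        print(f"match={selectivity:.0%}  post-filter recall={recall:.1f}  "
              f"pre-filter distance computations={scanned}")
```

With the filter independent of position, post-filtering keeps about k × oversample × selectivity candidates on average, so it needs an oversampling factor of at least 1/selectivity to fill k results. Pre-filtering always returns the exact answer, but its work equals the size of the matching subset.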
Disk tiering makes this worse, not better. When the filtered subset is scattered across cold partitions, mmap thrashes. Hot/cold tiering misses, because the filtered subset cuts orthogonally to recency. DiskANN's graph traversal pays a disk I/O on every hop, and filtered queries traverse far more of the graph than unfiltered ones.
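A rough way to see why filtered traversal multiplies disk I/O is to count expansions. The model below is a deliberate simplification, not DiskANN's actual search algorithm: it assumes every node expansion costs one random SSD read (~100 µs, an assumed figure) and that each expanded node passes the filter independently with probability equal to the selectivity, then counts reads until k matches are found.

```python
import random

DISK_READ_S = 100e-6  # assumed cost of one random SSD read (~100 us)


def reads_until_k_matches(k: int, selectivity: float, rng: random.Random) -> int:
    """Simulate one filtered traversal: expand nodes (one disk read each)
    until k of them pass the filter. Independence between the filter and the
    graph's neighborhoods is an assumption of this model."""
    if not 0.0 < selectivity <= 1.0:
        raise ValueError("selectivity must be in (0, 1]")
    reads = matches = 0
    while matches < k:
        reads += 1
        if rng.random() < selectivity:
            matches += 1
    return reads


def mean_io_seconds(k: int, selectivity: float, trials: int = 2_000,
                    seed: int = 0) -> float:
    """Average simulated disk time per query over `trials` traversals."""
    rng = random.Random(seed)
    total_reads = sum(reads_until_k_matches(k, selectivity, rng)
                      for _ in range(trials))
    return total_reads / trials * DISK_READ_S


if __name__ == "__main__":
    for s in (1.0, 0.5, 0.1, 0.01):
        ms = mean_io_seconds(10, s) * 1e3
        print(f"match={s:.0%}  mean disk time ~ {ms:.1f} ms")
```

In this model the read count follows a negative binomial distribution with mean k / selectivity, so a filter matching 10% of the corpus costs about ten times the reads of an unfiltered search for the same k. It ignores the baseline expansions an unfiltered search also pays, which only add to the total.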
This is the workload where VectorAmp's SABLE, our Selectivity-Aware Bi-Level Engine, separates from retrofitted approaches. The filter is a first-class citizen of the index structure, not a post-hoc constraint applied after the graph has chosen its neighbors. Internal benchmarks using DiskANN as a strong disk-native baseline show 9–10× lower p99 latency on filtered workloads, depending on selectivity, and, against in-memory HNSW at comparable recall, roughly 312× lower RAM consumption. Numbers will vary by workload; the point is architectural, not promotional.
Write latency: the cost nobody quotes
Disk announcements lead with cost-per-vector. They rarely lead with write latency. There is a reason.
Memory-first systems retrofitting disk tiers pay a real penalty on writes. Index updates that used to run at RAM speed now either incur disk persistence on the hot path or, more commonly, get batched. Batching means staleness: a vector inserted at t=0 may not be queryable until t=batch_interval. For static or slowly changing corpora this is fine. For continuous-ingestion workloads, it is not.
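The staleness window is easy to quantify. The sketch assumes, hypothetically, that flushes run at fixed multiples of `batch_interval` and that each flush makes every earlier insert visible; real systems may also flush on size thresholds.

```python
import math
import random


def visible_at(insert_time: float, batch_interval: float) -> float:
    """When a vector inserted at `insert_time` becomes queryable, assuming
    flushes at batch_interval, 2*batch_interval, ... that each publish
    everything inserted strictly before them."""
    if batch_interval <= 0:
        raise ValueError("batch_interval must be positive")
    return (math.floor(insert_time / batch_interval) + 1) * batch_interval


def staleness(insert_time: float, batch_interval: float) -> float:
    """Seconds a freshly inserted vector stays invisible to queries."""
    return visible_at(insert_time, batch_interval) - insert_time


if __name__ == "__main__":
    rng = random.Random(42)
    interval = 5.0  # hypothetical flush interval, in seconds
    arrivals = [rng.uniform(0, 3600) for _ in range(10_000)]
    lags = [staleness(t, interval) for t in arrivals]
    print(f"mean staleness ~ {sum(lags) / len(lags):.2f} s, "
          f"worst {max(lags):.2f} s")
```

With uniformly spread arrivals, mean staleness is about half the interval and the worst case is the full interval; for live corpora the worst case is the number that matters.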
Every one of our target verticals is a continuous-ingestion workload. Legal tech ingests filings as they're entered. Compliance archives ingest communications as they're sent. Energy and oil-and-gas operations ingest sensor and log data continuously. A vector database that is cheap at rest but slow to update is not viable for production retrieval-augmented generation over live document corpora.
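Whether a system keeps up is a falsifiable test: insert at your production ingestion rate and measure how long each new vector takes to become queryable. Run it.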
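One way to run it is a small harness that inserts a vector and polls until a query returns it. The `upsert`/`query` client interface below is hypothetical; map it onto your database's SDK. `FakeDelayedClient` exists only to exercise the harness and simulates a fixed visibility lag.

```python
import time
from typing import Optional, Protocol, Sequence


class VectorClient(Protocol):
    """Hypothetical minimal interface; adapt to your database's SDK."""

    def upsert(self, item_id: str, vector: Sequence[float]) -> None: ...

    def query(self, vector: Sequence[float], k: int) -> list: ...


def time_to_queryable(client: VectorClient, item_id: str,
                      vector: Sequence[float], timeout_s: float = 30.0,
                      poll_s: float = 0.01) -> Optional[float]:
    """Seconds from just before the insert until a query for the same vector
    returns `item_id` in its top 10, or None if that never happens within
    `timeout_s`."""
    start = time.monotonic()
    client.upsert(item_id, vector)
    while time.monotonic() - start < timeout_s:
        if item_id in client.query(vector, k=10):
            return time.monotonic() - start
        time.sleep(poll_s)
    return None


class FakeDelayedClient:
    """Test double: writes become visible only `lag_s` seconds after upsert,
    loosely mimicking a batched index. Not a model of any real product."""

    def __init__(self, lag_s: float):
        self.lag_s = lag_s
        self.pending = {}   # item_id -> (vector, write timestamp)
        self.visible = {}   # item_id -> vector

    def upsert(self, item_id, vector):
        self.pending[item_id] = (list(vector), time.monotonic())

    def query(self, vector, k):
        now = time.monotonic()
        for item_id, (vec, written) in list(self.pending.items()):
            if now - written >= self.lag_s:   # publish writes that have aged out
                self.visible[item_id] = vec
                del self.pending[item_id]

        def dist(i):
            # Squared L2 distance from the query to a visible vector.
            return sum((a - b) ** 2 for a, b in zip(self.visible[i], vector))

        return sorted(self.visible, key=dist)[:k]


if __name__ == "__main__":
    client = FakeDelayedClient(lag_s=0.05)
    for i in range(3):
        lag = time_to_queryable(client, f"doc-{i}", [float(i), 1.0])
        print(f"doc-{i}: queryable after {lag:.3f} s")
```

Run the harness while a separate writer sustains your production ingestion rate; a lag measured on an idle cluster will understate what batching does under load.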
A buyer's framework: five questions
When the next disk-support announcement lands (and there will be more), here are the five questions that separate workload fit from cost reduction. They map directly to the dimensions on which disk-native, selectivity-aware architectures differ from memory-first retrofits.
- Cold-start latency. What is p99 on a partition that hasn't been queried in 24 hours?
- Filtered query performance at low selectivity. How does p99 change as filter selectivity moves from 1% to 50%?
- Write latency under sustained ingestion. Does enabling the disk tier degrade write p99, and by how much?
- Pricing transparency. Is the disk tier separately metered? How does total cost of ownership compare to a disk-native system at the same workload?
- Architectural provenance. Was this designed for disk-resident filtered search, or added to a system originally built for resident memory?
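The second question, in particular, reduces to a small measurement loop. This sketch computes a nearest-rank p99 and sweeps selectivity; `run_query` is a hypothetical callable you supply that issues one filtered query matching roughly the given fraction of the corpus.

```python
import time


def p99(samples):
    """Nearest-rank 99th percentile: the smallest sample that at least 99% of
    samples do not exceed."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = -(-99 * len(ordered) // 100)   # ceil(0.99 * n) in integer math
    return ordered[rank - 1]


def selectivity_sweep(run_query, selectivities=(0.01, 0.05, 0.1, 0.3, 0.5),
                      queries_per_point=1_000):
    """Time `run_query(selectivity)` repeatedly at each selectivity and return
    {selectivity: p99 seconds}."""
    results = {}
    for s in selectivities:
        latencies = []
        for _ in range(queries_per_point):
            t0 = time.perf_counter()
            run_query(s)
            latencies.append(time.perf_counter() - t0)
        results[s] = p99(latencies)
    return results


def fake_query(selectivity):
    """Stand-in workload whose cost grows with the matching fraction;
    replace with a real filtered query against your database."""
    return sum(range(int(selectivity * 100_000)))


if __name__ == "__main__":
    for s, latency in selectivity_sweep(fake_query, queries_per_point=200).items():
        print(f"match={s:.0%}  p99 ~ {latency * 1e3:.3f} ms")
```

The same `p99` helper applies to the cold-start and write-latency questions: change what is being timed, not how it is summarized.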
None of these are gotcha questions. They are the questions any serious workload owner already asks internally. We're publishing them because the next round of disk-tier announcements will work hard to avoid them, and buyers benefit from having the framework before the marketing arrives.
Closing
Disk tiering is good for the industry. Cost reduction matters. Customers running large vector workloads have been waiting for the economics to improve, and they are improving.
But cost reduction is not the same as workload fit. A retrofitted disk tier on a memory-first system will deliver a cheaper bill at the bottom of the page. Whether it delivers the latency, the write throughput, and the filtered-query performance that production RAG requires is a separate question, and one the announcement is unlikely to answer.
If you're running filtered retrieval at scale, in legal tech, compliance, energy, or any vertical where metadata predicates dominate query patterns, we'd welcome a side-by-side benchmark. Bring your hardest workload. The architectural questions matter most when you can see them in your own data.
VectorAmp is a selectivity-aware vector database built on the SABLE architecture and designed from day one for disk-resident filtered ANN at production scale.