Research · 10 min read

Navigating Spreading-Out Graphs: An Intuitive Guide to NSG

Graph-based vector search sounds intimidating, but the core idea is elegant. NSG builds a sparse, well-spread graph that lets you navigate to any nearest neighbor in a handful of hops.

Abstract diagram showing a dense graph being pruned into a sparse, spreading-out navigable graph with a highlighted search path
FIG. 02 · RESEARCH — Navigating Spreading-Out Graphs

If you've spent time in the vector search world, you've probably heard of HNSW, the hierarchical navigable small-world graph that powers many popular vector databases. NSG, or Navigating Spreading-out Graph, is a related but distinct algorithm that takes a different approach to the same problem: how do you find nearest neighbors fast in high-dimensional space?

This post explains NSG from the ground up, for readers who don't want to parse the math-heavy original paper. We'll cover the intuition, the construction, and why "spreading out" is the key insight.

Reference: Fu et al., "Fast Approximate Nearest Neighbor Search With The Navigating Spreading-out Graph" (2019).

The basic idea: a graph of neighbors

Imagine you have a million vectors: embeddings of documents, images, products, whatever. You want to answer queries like "find the 10 vectors closest to this query vector." Brute force means comparing against all million vectors, which is too slow.

The graph approach: connect each vector to a small set of neighbors, then search by navigating the graph, starting at some entry point and greedily hopping to whichever neighbor is closest to the query. After a few hops, you're in the right neighborhood.

Greedy navigation through a spreading-out graph: starting at an entry node, each hop moves to the neighbor closest to the query, converging in 5 hops
Greedy search: at each hop, move to the neighbor closest to your query. A well-built graph converges fast.

This works remarkably well, if the graph is built right. The challenge is deciding which edges to include. Too many edges wastes memory and slows down search. Too few edges creates dead ends where greedy search gets stuck. The wrong edges (connecting only to very close neighbors) create local clusters with no paths between them.

What "spreading out" means

Here's NSG's core insight. Consider a node and its neighbors in a naive k-nearest-neighbor graph. All the neighbors tend to cluster in the same direction, because they're the closest points, so they're all roughly in the same part of the space.

This is a problem. If a query comes from a different direction, none of your neighbors help you make progress toward it. Greedy search stalls because every hop moves sideways instead of forward.

Spreading out means choosing neighbors that cover diverse directions from each node. Instead of connecting to your 8 closest neighbors (who might all be in the same cluster), you connect to neighbors that are well-distributed around you, so that no matter where a query comes from, at least one neighbor is closer to it than you are.

This is the navigability guarantee: from any node, you can always make progress toward any query point. No dead ends, no local minima.

How NSG builds the graph

NSG construction happens in three stages.

Stage 1: Start with a dense candidate graph. Build an approximate k-nearest-neighbor graph using a fast method (like NN-Descent or a random-projection forest). Each node gets connected to its ~60 closest neighbors. This graph has good local connectivity but way too many edges.

Stage 2: Prune with the MRNG rule. This is where spreading-out happens. For each node, NSG walks through its candidate neighbors in order of increasing distance and applies a pruning rule:

"Keep neighbor v only if no already-selected neighbor w is closer to v than you are to v."

In other words: if you've already selected a neighbor that "covers" the direction toward v, skip v. It's redundant. This is sometimes called α-pruning (with a parameter α controlling how aggressive the pruning is).

The result is a sparse graph where each node has just 4–8 neighbors, but those neighbors point in well-spread directions. Every edge earns its place by covering a direction that no other edge covers.

Stage 3: Select a navigating node. NSG picks a single global entry point, typically the vector closest to the centroid of the entire dataset. All searches start here. Because the graph is well-spread and navigable, even a single entry point is sufficient (unlike HNSW, which uses multiple layers to provide multiple entry points at different scales).

Greedy search on NSG

Searching an NSG graph is clean and simple:

  • Start at the entry node.
  • Look at all the entry node's neighbors. Compute the distance from each neighbor to the query.
  • Move to the neighbor that's closest to the query. Add it to a "candidates" set.
  • From the new node, look at its neighbors. If any are closer to the query than the best you've seen, add them to the candidates.
  • Repeat until no neighbor improves on the current best. Return the top-k from the candidates set.

In practice, this converges in 5–15 hops for million-scale datasets, with each hop computing distances to ~8 neighbors. That's roughly 40–120 distance computations to find the nearest neighbor out of a million vectors.

The beam width parameter (often called ef_search or search_L) controls quality: a wider beam keeps more candidates alive and explores more of the graph, trading speed for recall. At beam width 1 you get the fastest search with lowest recall; at beam width 100+ you get near-perfect recall at higher cost.

Why not just use more edges?

A natural question: why not skip the pruning and just give each node 32 or 64 neighbors? More edges means more options at each hop, which should help, right?

It does, up to a point. But there are diminishing returns, and the costs are real:

  • Memory. Each edge is a pointer (4–8 bytes). At 64 edges per node and a billion vectors, that's 256–512 GB just for the graph structure. NSG's 8-edge graph uses 32–64 GB, a 4–8× savings that determines whether the index fits in RAM.
  • Search speed. At each hop, you compute distances to all neighbors. With 64 neighbors, each hop is 8× more expensive than with 8 neighbors. The extra edges rarely provide enough recall improvement to justify the cost.
  • Diminishing navigability. Beyond a certain degree, additional edges tend to be redundant, pointing in directions already covered by existing edges. The spreading-out pruning rule is precisely the formalization of this intuition.

The sweet spot, empirically, is degree 8–32 depending on the dataset. NSG's pruning automatically finds the right subset.

NSG vs. HNSW

Both NSG and HNSW are navigable proximity graphs, but they differ in important ways.

HNSW builds a multi-layer hierarchy: the top layer is a sparse graph for coarse navigation, each lower layer adds more nodes and more edges for finer search. This multi-scale approach means search naturally zooms in, first finding the right neighborhood, then refining. The tradeoff is higher memory usage (multiple layers of edges) and more complex construction.

NSG uses a single flat graph with a single entry point. It achieves navigability through the spreading-out property alone, no hierarchy needed. This makes it simpler to build, more memory-efficient, and easier to reason about. The tradeoff is that a single entry point can be a bottleneck for very large datasets, and the construction algorithm is harder to parallelize.

In practice, both achieve competitive recall-vs-speed tradeoffs. NSG tends to have slightly lower memory overhead; HNSW tends to be easier to tune and more robust to adversarial query distributions.

How NSG relates to vector search at VectorAmp

NSG's pruning strategy is a building block in SABLE, VectorAmp's filter-native index engine. SABLE builds micro-graphs, small NSG-like graphs within each inverted list of an IVF-PQ index. These micro-graphs give SABLE the best of both worlds: the memory efficiency of IVF-PQ with the fast traversal of graph-based search, all within a filter-aware architecture.

The spreading-out property is especially valuable within an inverted list, where the vectors are already clustered by the coarse quantizer. Within that local cluster, a well-spread graph lets SABLE find the nearest codes in far fewer distance computations than a linear scan, while the IVF structure handles the global routing.

Key takeaways

  • Graph-based search works by connecting vectors to neighbors and navigating greedily toward the query.
  • Spreading out means choosing neighbors that cover diverse directions, so greedy search never gets stuck.
  • α-pruning is the mechanism: keep a neighbor only if it covers a direction not already handled by an existing edge.
  • The result is a sparse, navigable graph where search converges in a handful of hops, roughly 100× fewer distance computations than brute force.
  • NSG uses a single flat graph (unlike HNSW's hierarchy), trading some flexibility for simplicity and memory efficiency.

The original paper has the full formal treatment, including proofs of monotonic search path existence and complexity bounds. But the intuition is straightforward: build a graph where every node has well-spread neighbors, and greedy search just works.

NSGgraph searchalgorithmsvector searchresearch