Rebuilding an HNSW index is one of the most resource-intensive aspects of using HNSW in production workloads. Unlike traditional databases, where a deletion can be handled by simply removing a row from a table, using HNSW in a vector database often requires a complete rebuild to maintain optimal performance and accuracy.
Why is Rebuilding Needed?
Because of its layered graph structure, HNSW isn’t inherently designed for dynamic datasets that change frequently. Adding new data or deleting existing data is nevertheless essential for keeping data up to date, especially for use cases like RAG, which aims to improve search relevance.
Most databases distinguish between “hard” and “soft” deletes. Hard deletes permanently remove data, while soft deletes flag data as ‘to-be-deleted’ and remove it later. The problem with soft deletes is that the to-be-deleted data still uses significant memory until it is permanently removed. This is particularly problematic in vector databases that use HNSW, where memory consumption is already a significant concern.
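To make the memory cost concrete, here is a minimal sketch of a soft-delete store. The class `SoftDeleteStore` and its method names are hypothetical and written only for illustration; they are not the API of any particular vector database. The sketch shows that flagged vectors keep occupying storage until a separate compaction step removes them.

```python
class SoftDeleteStore:
    """Toy vector store (hypothetical): soft deletes only flag ids."""

    def __init__(self):
        self.vectors = {}        # id -> vector (list of floats)
        self.tombstones = set()  # ids flagged as to-be-deleted

    def add(self, vec_id, vector):
        self.vectors[vec_id] = vector

    def soft_delete(self, vec_id):
        # Only record the flag; the vector itself stays in memory.
        if vec_id in self.vectors:
            self.tombstones.add(vec_id)

    def live_count(self):
        return len(self.vectors) - len(self.tombstones)

    def stored_count(self):
        return len(self.vectors)

    def compact(self):
        # The later "hard" removal: this is when memory is actually released.
        for vec_id in self.tombstones:
            del self.vectors[vec_id]
        self.tombstones.clear()


store = SoftDeleteStore()
for i in range(1000):
    store.add(i, [float(i)] * 128)
for i in range(400):
    store.soft_delete(i)

# 600 vectors are live, but all 1000 are still held in memory.
print(store.live_count(), store.stored_count())
store.compact()
# After compaction, only the 600 live vectors remain stored.
print(store.live_count(), store.stored_count())
```

In this toy setup, 40% of the stored vectors are dead weight until `compact()` runs, which is the overhead the paragraph above describes.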
HNSW creates a graph where nodes (vectors) are connected based on their proximity in the vector space, and traversing an HNSW graph works much like traversing a skip list. To support that, the layers of the graph are designed so that some layers have very few nodes. When vectors are deleted, especially those on sparsely populated layers that serve as critical connectors in the graph, the whole HNSW structure can become fragmented. This fragmentation may leave nodes (or layers) disconnected from the main graph, which requires rebuilding the entire graph or, at the very least, degrades search efficiency.
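The fragmentation effect can be illustrated with a deliberately simplified, single-layer graph. This is a sketch under stated assumptions, not an HNSW implementation: the node names, the `hard_delete` helper, and the `reachable` helper are all hypothetical, and real libraries may reconnect neighbors on deletion. It only shows how removing one connector node can leave part of the graph unreachable from the entry point.

```python
from collections import deque


def reachable(graph, entry):
    """Breadth-first search: return every node reachable from `entry`."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def hard_delete(graph, node):
    """Remove `node` and its edges without repairing the neighborhood."""
    for neighbor in graph.pop(node):
        graph[neighbor].discard(node)


# Two clusters, {a, b, c} and {x, y}, joined only through "hub".
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "hub"},
    "hub": {"c", "x"},
    "x": {"hub", "y"},
    "y": {"x"},
}

before = reachable(graph, "a")
hard_delete(graph, "hub")
after = reachable(graph, "a")
unreachable = set(graph) - after

print(sorted(before))       # all six nodes
print(sorted(unreachable))  # nodes a search starting at "a" can no longer find
```

Vectors `x` and `y` still exist in the structure, but a search that starts at `a` can never return them, which is why recall drops until the graph is repaired or rebuilt.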
To limit this, HNSW implementations commonly use a soft-delete technique, which marks vectors for deletion but doesn’t immediately remove them. This approach avoids the expense of frequent full rebuilds, although periodic reconstruction is still needed to keep the graph in its optimal state.
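One way to picture the combination of soft deletes and periodic reconstruction is the sketch below. It is an assumption-laden illustration: `TombstonedIndex`, its `rebuild_threshold` parameter, and the 35% value are hypothetical, and a brute-force scan stands in for the actual graph search. Marked vectors are filtered out of results, and a rebuild is triggered only once the share of marked vectors crosses the threshold.

```python
import math


class TombstonedIndex:
    """Toy index (hypothetical): soft deletes plus threshold-based rebuild."""

    def __init__(self, rebuild_threshold=0.35):
        self.items = {}       # id -> vector
        self.deleted = set()  # ids marked for deletion
        self.rebuild_threshold = rebuild_threshold
        self.rebuilds = 0

    def add(self, vec_id, vector):
        self.items[vec_id] = vector

    def mark_deleted(self, vec_id):
        if vec_id in self.items:
            self.deleted.add(vec_id)
        # Rebuild only when tombstones exceed the configured fraction.
        if self.items and len(self.deleted) / len(self.items) > self.rebuild_threshold:
            self.rebuild()

    def rebuild(self):
        # Stand-in for the expensive step: re-create the index from live data.
        self.items = {i: v for i, v in self.items.items() if i not in self.deleted}
        self.deleted.clear()
        self.rebuilds += 1

    def search(self, query, k=1):
        # Brute-force stand-in for graph search; marked ids are skipped.
        live = [(math.dist(query, v), i)
                for i, v in self.items.items() if i not in self.deleted]
        return [i for _, i in sorted(live)[:k]]


index = TombstonedIndex(rebuild_threshold=0.35)
for i in range(10):
    index.add(i, [float(i), 0.0])

for i in range(3):
    index.mark_deleted(i)  # 30% marked: below the threshold, no rebuild yet

print(index.search([0.0, 0.0]), index.rebuilds, len(index.items))
index.mark_deleted(3)      # 40% marked: crosses the threshold, rebuild runs
print(index.search([0.0, 0.0]), index.rebuilds, len(index.items))
```

The threshold is the tuning knob here: a lower value reclaims memory sooner but pays the rebuild cost more often, while a higher value does the opposite.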