
RAG vector index

Hadron's vector index lets a client retrieve nodes by meaning rather than substring match. A memory owner opts a memory in; the platform asynchronously embeds each node's abstract (or content chunks, or both); a client calls h-find-nodes with mode: vector or mode: hybrid and receives nodes ranked by semantic similarity to a natural-language query.

This page is the technical reference for the server-side feature (spec 033). Portal controls for configuring the index land in a follow-up release — for now the surfaces below are the GraphQL updateMemory mutation, the MCP h-find-nodes tool, and the relevant Memory fields. See GraphQL API for the generated schema.

When to use it

Reach for the vector index when:

  • A client needs to find nodes by what they're about, not what words they contain — "how do I restore data after a crash" should hit a backups node even if that exact phrase never appears.
  • You're building a RAG consumer (an agent stuffing context into a prompt) and need the passages most semantically relevant to a query — with their span text, character offsets, and parent node URN ready for citation.
  • A memory holds a structured corpus (articles, docs, notes) and keyword-only h-find-nodes is missing relevant nodes because they paraphrase rather than quote.

It's not the right tool for exact-match lookup (use keyword mode for that — it's faster and deterministic), nor for retrieving across memories in one call (v1 scopes search to a single memory or the caller's accessible set; no cross-memory ranking).

Turning it on

Vector indexing is opt-in per memory, with a class-derived default:

| Memory class | Non-encrypted default | Encrypted default |
| --- | --- | --- |
| knowledge | enabled | disabled (requires disclosure ack) |
| system, app, group, personal, private | disabled | disabled (requires disclosure ack) |

A knowledge-class memory is opted in by default because it's the corpus class meant to be searched. Every other class, and every encrypted memory regardless of class, is opted out by default, so private content is never silently indexed.
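The class-derived default can be written as a one-line predicate. This is only a sketch of the table's rule: the function name and the plain-string class values are illustrative assumptions, not part of the platform API.

```python
def default_vector_index_enabled(memory_class: str, encrypted: bool) -> bool:
    """Return the default opt-in state for a new memory (hypothetical helper).

    Only a non-encrypted knowledge-class memory starts with the index enabled;
    every other combination starts disabled and needs an explicit opt-in.
    """
    return memory_class == "knowledge" and not encrypted
```

For example, `default_vector_index_enabled("personal", False)` is `False`, and so is any call with `encrypted=True`.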

To enable the index on a memory (GraphQL):

mutation EnableVectorIndex {
  updateMemory(
    id: "<memoryId>"
    vectorIndexEnabled: true
    embeddingSource: abstract            # or contentChunks, or both
  ) {
    id
    vectorIndexEnabled
    embeddingSource
  }
}

For an encrypted memory you must also pass acknowledgeVectorInversionRisk: true in the same mutation — see Encrypted memories below. Without the ack the mutation fails with ENCRYPTED_VECTOR_INDEX_NOT_ACKNOWLEDGED and no vectors are stored.

What gets embedded: the three sources

The embeddingSource memory field selects what the index is built from. It is a memory property, not a query parameter — a client searching the memory never picks the source; it asks "search by vector" and gets hits regardless of how the index was built.

| Source | Embeds | Use when |
| --- | --- | --- |
| abstract (default) | One vector per node, derived from Node.abstract | Clients author abstracts (or will). Cheapest path; abstracts already exist by spec 031. |
| contentChunks | One vector per content chunk (multiple per node) | Clients can't or won't author abstracts (research-article corpora, credit-constrained ingest). Chunks need no LLM in the loop. |
| both | Both abstract + chunk vectors | Both retrieval modes useful on the same memory; a single search ranks across both kinds of vector. |

Abstract-less memories on the default source retrieve nothing

The abstract is client-supplied. A memory on abstract whose nodes carry no abstracts has an empty index and mode: vector returns nothing — not a bug, but worth knowing before you turn it on.

If your corpus is large, abstract-less, or credit-constrained (research articles ingested at scale, code dumps, scraped pages), set embeddingSource: contentChunks instead. Chunking needs no LLM in the loop, so the only per-node cost is the embedding API call itself.

Headless abstract-generation that would populate empty abstracts is reserved for a future spec.

Chunking (when source includes chunks)

For contentChunks and both, the platform splits each node's content into chunks at write time and embeds each chunk. Strategy is selected per node:

  • Structure-aware by default — chunks are cut along markdown sections, paragraphs, or other natural breaks when the content has usable structure.
  • Fixed-size token window with overlap when the content has no usable structure (plain blobs, code without clear sections).

You can force the fixed-size strategy on the whole memory with forceFixedSize: true, and tune the dials:

| Field | Default | Range |
| --- | --- | --- |
| chunkTokens | 512 | 64 … 4096 |
| chunkOverlap | 64 | 0 … chunkTokens - 1 |
| forceFixedSize | false | boolean |

Changing any of these fields, or changing embeddingSource in a way that affects chunking, triggers a re-index of the memory. The previous chunk vectors are dropped and new ones are enqueued.

Each chunk carries enough locator metadata for chunk-level retrieval (see Passages):

  • text — the span text
  • charStart, charEnd — character offsets within the parent node's content (stable across re-chunking; not token offsets)
  • chunkIndex — position among the node's chunks
  • parentNodeId, parentNodeUrn — what the chunk belongs to
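As a rough illustration of the fixed-size strategy and the locator metadata listed above, here is a minimal sketch. It makes several assumptions: whitespace-separated words stand in for the platform tokenizer (which this page does not name), charEnd is treated as exclusive, and the function and class names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    chunkIndex: int
    charStart: int  # character offset into the parent node's content
    charEnd: int    # assumption: exclusive end offset
    text: str


def fixed_size_chunks(content: str, chunk_tokens: int = 512,
                      chunk_overlap: int = 64) -> List[Chunk]:
    """Split content into overlapping windows of chunk_tokens "tokens"."""
    if not 64 <= chunk_tokens <= 4096:
        raise ValueError("chunkTokens must be in 64..4096")
    if not 0 <= chunk_overlap < chunk_tokens:
        raise ValueError("chunkOverlap must be in 0..chunkTokens-1")

    # Assumption: each run of non-whitespace counts as one token.
    spans = [m.span() for m in re.finditer(r"\S+", content)]
    step = chunk_tokens - chunk_overlap
    chunks: List[Chunk] = []
    start = 0
    while start < len(spans):
        window = spans[start:start + chunk_tokens]
        char_start, char_end = window[0][0], window[-1][1]
        chunks.append(Chunk(len(chunks), char_start, char_end,
                            content[char_start:char_end]))
        if start + chunk_tokens >= len(spans):
            break  # this window already reached the end of the content
        start += step
    return chunks
```

Because each chunk records character offsets rather than token offsets, `content[c.charStart:c.charEnd] == c.text` holds for every chunk regardless of which tokenizer produced the windows.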

The async embedding pipeline

Embedding never blocks a write. The write path computes the abstract (or chunks) synchronously, sets a durable pending marker, and returns to the client. A background worker drains the marker and calls the embedding API.

write → set embeddingPendingAt → return to client
                ↓ (async)
          prompt drain trigger
        embed via EMBEDDING_API_URL
        store vector + clear marker

The durable marker is the source of truth: a worker restart, an embedding API outage, or a near-simultaneous re-write all converge through the marker rather than depending on in-memory state.

Interim transport — tracked tech-debt

The marker-driven queue rides Postgres in v1 because it's the minimal-but-correct shape (durable, restart-safe, no new infrastructure). It is explicit interim tech-debt that will migrate to the platform message bus when that lands — the embedding job is transport-agnostic by design so the migration doesn't reshape the pipeline. Tracked as FR-029 in spec 033 with a forward link to the message-bus work.

Markers on Node

| Field | Meaning |
| --- | --- |
| embeddingPendingAt | Set when the node needs (re-)embedding. Cleared on success or terminal failure. |
| embeddingFailedAt | Stamped on every failed attempt (transient or terminal). Stays set on terminal failure for diagnosability; surfaced by h-validate as [embed-failed]. |
| embeddingError | Last failure reason. Set alongside embeddingFailedAt. |
| embeddingAttempts | Per-job attempt counter. Caps transient retries before terminal. |
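The marker semantics in the table above can be sketched as a single drain step over an in-memory record. This is an illustration, not the worker's actual code: the retry cap of 3, the TerminalEmbedError type, and the choice to clear failure markers after a later success are all assumptions that this page does not confirm.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List, Optional

MAX_ATTEMPTS = 3  # assumption: the real retry cap is not stated on this page


class TerminalEmbedError(Exception):
    """Hypothetical error type for failures that retrying cannot fix."""


@dataclass
class NodeMarkers:
    embeddingPendingAt: Optional[datetime] = None
    embeddingFailedAt: Optional[datetime] = None
    embeddingError: Optional[str] = None
    embeddingAttempts: int = 0
    vector: Optional[List[float]] = None


def drain_once(node: NodeMarkers, text: str,
               embed: Callable[[str], List[float]]) -> None:
    """Process one pending marker; a no-op when nothing is pending."""
    if node.embeddingPendingAt is None:
        return
    node.embeddingAttempts += 1
    try:
        vector = embed(text)
    except Exception as exc:
        # Every failed attempt is stamped, transient or terminal.
        node.embeddingFailedAt = datetime.now(timezone.utc)
        node.embeddingError = str(exc)
        terminal = (isinstance(exc, TerminalEmbedError)
                    or node.embeddingAttempts >= MAX_ATTEMPTS)
        if terminal:
            # Stop retrying but keep the failure markers for h-validate.
            node.embeddingPendingAt = None
        return
    node.vector = vector
    node.embeddingPendingAt = None
    # Assumption: a success clears markers left by earlier transient failures.
    node.embeddingFailedAt = None
    node.embeddingError = None
```

A transient failure leaves embeddingPendingAt set so the next drain retries; a terminal one clears it while embeddingFailedAt and embeddingError remain for diagnosis.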

Eventual consistency

A just-written node is not retrievable until its embedding lands. The search does not block waiting; the node is simply absent from the result set until the worker drains.

Backfill

When a memory is opted into the index after nodes already exist, the opt-in enqueues those nodes for embedding. No write event is required — the periodic sweep picks them up.

The same backfill path runs when you change embeddingSource (the prior vectors are dropped) or change a chunking dial (chunks are re-derived).

Operator-visible failures

A node whose embedding terminally fails — typically because the memory is encrypted and the worker has no plaintext to embed (see #206 disclosure) — keeps its embeddingFailedAt marker. h-validate surfaces these as:

[embed-failed] <node-urn> — <reason>

so an operator can tell "the index is empty because nothing matched" from "the index is empty because every embed permanently failed". An empty embeddingError falls back to unknown.

Searching: h-find-nodes and nodeSearch

There is one search contract with three modes, exposed on two surfaces. Both the MCP h-find-nodes tool and the GraphQL Query.nodeSearch field accept the same mode, expand, and granularity arguments; only the default mode when mode is omitted differs between them (see the table below).

Modes

| Mode | Behavior |
| --- | --- |
| keyword | Whitespace-separated keywords; each must literally appear (case-insensitive) in name, loc, description, or tags. Default on MCP h-find-nodes (backward-compat with the pre-spec-033 tool; absence of mode is byte-identical to today's behavior). |
| vector | Natural-language query embedded with the platform model, ranked by cosine similarity against the memory's stored vectors. Default on GraphQL nodeSearch (new surface, designed as the vector-aware entrypoint, so callers opt in to keyword explicitly). |
| hybrid | Both keyword and vector run in parallel; results fused by reciprocal-rank fusion with k = 60 (a deterministic, reproducible fusion across calls). |
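To make the hybrid fusion concrete, here is a minimal sketch of reciprocal-rank fusion with k = 60. Each list contributes 1 / (k + rank) per node, with ranks starting at 1. The function name and the tie-break on node id are assumptions added so the example is deterministic; the page does not say how the platform breaks ties.

```python
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]],
                           k: int = 60) -> List[str]:
    """Fuse several ranked id lists into one list, best first."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, node_id in enumerate(ranking, start=1):
            scores[node_id] = scores.get(node_id, 0.0) + 1.0 / (k + rank)
    # Assumption: ties are broken by node id to keep the output reproducible.
    return sorted(scores, key=lambda node_id: (-scores[node_id], node_id))


keyword_hits = ["a", "b", "c"]
vector_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
# fused == ["b", "a", "d", "c"]
```

Node b wins because it ranks highly in both lists (1/62 + 1/61), narrowly ahead of a (1/61 + 1/63); nodes present in only one list trail behind.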

A query of "*" or "" on mode: keyword is the existing "match all" survey behavior. On mode: vector it's rejected — "*" is meaningless to an embedder.

Granularity

| granularity | Returns |
| --- | --- |
| node (default) | One entry per matching node (the nodes array). For chunk-bearing indexes, the node is ranked by its best-matching chunk. |
| chunk | One entry per matching passage (the passages array). Requires mode: vector; hybrid + chunk is out of v1 (RRF over passages is reserved). |
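The "ranked by its best-matching chunk" rule for node granularity amounts to a max-per-node reduction over chunk hits. A small sketch, assuming chunk hits arrive as (parent node id, score) pairs; the function name and the tie-break on node id are illustrative.

```python
from typing import Dict, List, Tuple


def rank_nodes_by_best_chunk(
        chunk_hits: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Collapse chunk-level hits to one entry per node, scored by its best chunk."""
    best: Dict[str, float] = {}
    for node_id, score in chunk_hits:
        if node_id not in best or score > best[node_id]:
            best[node_id] = score
    # Highest score first; ties broken by node id (an assumption).
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


hits = [("n1", 0.4), ("n2", 0.7), ("n1", 0.9), ("n3", 0.7)]
ranked = rank_nodes_by_best_chunk(hits)
# ranked == [("n1", 0.9), ("n2", 0.7), ("n3", 0.7)]
```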

expand

Optional graph-neighbor expansion. With expand: <n> (n in 0 … 3, default 0), the response appends undirected neighbors within depth n (all edge types, both directions) after the ranked direct hits, as unscored expansion context.

Expanded neighbors are never interleaved into the similarity ranking — the ranked list keeps its meaning ("best query matches"); expansion only adds context. Useful for a RAG consumer that wants the top hits plus their immediate graph neighborhood in one call.
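The expansion behavior can be pictured as a breadth-first walk over an undirected view of the graph, appended after the ranked hits. The sketch below assumes edges are given as (source, target) pairs and that neighbors are appended in breadth-first order; the actual ordering of expansion context is not specified on this page.

```python
from collections import deque
from typing import Dict, List, Sequence, Set, Tuple


def expand_neighbors(hits: Sequence[str],
                     edges: Sequence[Tuple[str, str]],
                     depth: int = 0) -> List[str]:
    """Return the ranked hits followed by unscored neighbors within depth."""
    if not 0 <= depth <= 3:
        raise ValueError("expand must be in 0..3")

    # Treat every edge as undirected, whatever its type or direction.
    adjacency: Dict[str, Set[str]] = {}
    for src, dst in edges:
        adjacency.setdefault(src, set()).add(dst)
        adjacency.setdefault(dst, set()).add(src)

    seen = set(hits)
    queue = deque((hit, 0) for hit in hits)
    expansion: List[str] = []
    while queue:
        node, dist = queue.popleft()
        if dist == depth:
            continue  # do not walk past the requested depth
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in seen:
                seen.add(neighbor)
                expansion.append(neighbor)
                queue.append((neighbor, dist + 1))
    # Ranked hits keep their order; expansion is only ever appended.
    return list(hits) + expansion
```

With `hits=["a"]` and edges a-b, c-b, c-d, a depth of 1 yields `["a", "b"]` and a depth of 3 yields `["a", "b", "c", "d"]`, even though the c-b edge points "into" b.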

Result envelope

type NodeSearchResult {
  nodes: [Node!]!
  passages: [Passage!]!     # populated when granularity:chunk
  reason: String            # set when the query couldn't run as requested
  degraded: String          # set when the query ran at reduced fidelity
}

reason and degraded are the two flags that surface non-fatal, machine-readable outcomes — never as exceptions, never silently:

| Flag value | When |
| --- | --- |
| reason: "no_vector_index" | mode: vector on a memory without vectorIndexEnabled. nodes is empty. |
| reason: "embedding_unavailable" | mode: vector or mode: hybrid when no embedding endpoint is configured (see Operator config). |
| degraded: "no_vector_index" | mode: hybrid on a memory without vectorIndexEnabled; the keyword half still runs and is returned, and the vector half is missing. The flag tells the caller the result is keyword-only. |
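A client can branch on the two flags before trusting the result list. The sketch below assumes the envelope has already been decoded into a plain dict with the field names from the type above; the function name and the returned labels are hypothetical.

```python
from typing import Any, Dict


def classify_result(result: Dict[str, Any]) -> str:
    """Summarize a NodeSearchResult-shaped dict as a short status label."""
    reason = result.get("reason")
    degraded = result.get("degraded")
    if reason is not None:
        # The query could not run as requested (for example, no index).
        return f"not-run:{reason}"
    if degraded is not None:
        # The query ran, but at reduced fidelity (for example, keyword-only).
        return f"degraded:{degraded}"
    return "ok"
```

A RAG consumer might treat `not-run:no_vector_index` as a prompt to fall back to keyword mode, and `degraded:no_vector_index` as usable results that should not be presented as semantic matches.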

Per-hit staleness is not carried on the envelope — it's a property of an individual node hit (see Staleness below for how each surface communicates it).

Passages (chunk-level retrieval)

type Passage {
  parentNodeId: ID!
  parentNodeUrn: String!
  chunkIndex: Int!
  charStart: Int!
  charEnd: Int!
  text: String!
  score: Float!
}

A passage is everything a RAG consumer needs to cite: the span text, where it lives inside the parent node (character offsets), which chunk it is, and which node owns it. Returned only when granularity: chunk on mode: vector.
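As an illustration of how a consumer might turn a passage into a citation, the sketch below checks the offsets against the parent content before quoting. It assumes charEnd is exclusive and that offsets count the same character units as Python string indices; neither is confirmed here. The citation format and the example URN are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    parentNodeId: str
    parentNodeUrn: str
    chunkIndex: int
    charStart: int
    charEnd: int
    text: str
    score: float


def cite(passage: Passage, parent_content: str) -> str:
    """Build a citation string, verifying the span still matches the content."""
    span = parent_content[passage.charStart:passage.charEnd]
    if span != passage.text:
        # The node was probably edited after this passage was retrieved.
        raise ValueError("passage offsets no longer match the parent content")
    return (f'"{passage.text}" '
            f"({passage.parentNodeUrn}, chars {passage.charStart}-{passage.charEnd})")


content = "Backups run nightly. Restore with the CLI."
quote = "Restore with the CLI."
start = content.index(quote)
passage = Passage("42", "urn:example:node:backups", 1,
                  start, start + len(quote), quote, 0.83)
citation = cite(passage, content)
```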

Staleness

When a vector hit's embedded abstract is stale relative to current content (spec 032's abstractOriginHash predicate disagrees with the current computeContentHash(content)), the response serves the hit and marks it stale — it does not skip or block it.

How the signal is surfaced depends on the surface:

  • MCP h-find-nodes — each stale hit gets a Source: abstract-stale text line beneath the hit's score line. The literal string is exactly Source: abstract-stale (no other variants in v1).
  • GraphQL nodeSearch — staleness is not currently surfaced on the result envelope (the degraded field is reserved for envelope-wide conditions like no_vector_index). A GraphQL caller wanting the per-hit signal computes it from the returned node's abstractOriginHash and the current content hash, or reads the node via h-read-node (which emits the same Source: abstract-stale line in its meta block).
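For a GraphQL caller computing the per-hit signal itself, the comparison looks roughly like this. The hash function is the main assumption: this page names computeContentHash but not its algorithm, so SHA-256 over UTF-8 bytes is used purely as a stand-in and must be replaced with whatever spec 032 defines.

```python
import hashlib


def compute_content_hash(content: str) -> str:
    """Stand-in for the platform's computeContentHash (algorithm assumed)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def is_abstract_stale(abstract_origin_hash: str, current_content: str) -> bool:
    """True when the abstract was derived from content that has since changed."""
    return abstract_origin_hash != compute_content_hash(current_content)


origin = compute_content_hash("Backups run nightly.")
fresh = is_abstract_stale(origin, "Backups run nightly.")
stale = is_abstract_stale(origin, "Backups run hourly.")
# fresh is False, stale is True
```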

Other invariants:

  • Abstract vectors carry the staleness signal; chunk vectors do not. Chunks are re-derived and re-enqueued on every content edit (FR-017), so a chunk that exists is by construction derived from current content (modulo the async window). There is no stored summary to drift, so no chunk-staleness predicate is defined in v1.
  • Encrypted memories still get the marker on the MCP surface, because the origin hash is stamped at write time on plaintext (the worker holds the plaintext at upsert). This is something spec 032's read-time gate couldn't provide.
  • Self-healing. Every content write re-queues the node's embedding, so the stale marker clears on the next search after the re-embed drains.

Encrypted memories

You can opt an encrypted memory into the vector index, but only after acknowledging an explicit disclosure. The disclosure is the same on every surface (GraphQL, MCP, and CLI) and covers four points verbatim:

(a) Stored vectors are plaintext — they are NOT encrypted at rest the way the content column is.

(b) Embedding inversion can partially reconstruct source text from a vector (partial, not full reconstruction).

(c) Anyone with database access is therefore a potential viewer of that partial content, even though the content column itself is encrypted.

(d) The opt-in is revocable; revoking stops further indexing AND removes the memory's already-stored vectors.

Without the ack, enabling vectorIndexEnabled on an encrypted memory fails with the typed error ENCRYPTED_VECTOR_INDEX_NOT_ACKNOWLEDGED and no vectors are stored. The ack timestamp is persisted on Memory.vectorIndexEncryptedAckAt.

To enable indexing on an encrypted memory:

mutation EnableEncryptedVectorIndex {
  updateMemory(
    id: "<memoryId>"
    vectorIndexEnabled: true
    embeddingSource: abstract
    acknowledgeVectorInversionRisk: true   # required for encrypted memories
  ) {
    id
    vectorIndexEnabled
    vectorIndexEncryptedAckAt
  }
}

Revoking the opt-in (vectorIndexEnabled: false) stops further indexing and removes the memory's stored vectors.

Why the disclosure exists

Embeddings are computed on plaintext at write time, just before the encryption envelope is applied. This sidesteps the read-path can't-decrypt problem entirely — but stored vectors are plaintext, and the embedding-inversion research line (vec2text) shows partial source reconstruction is feasible. Disclosing the tradeoff explicitly and making it the owner's choice is a deliberate trust posture: the platform doesn't silently exclude encrypted memories from indexing, and it doesn't silently index them either.

A known limit: encrypted writes via MCP

MCP currently bypasses encryption end-to-end (neither encrypts on write nor decrypts on read — tracked as hadron-server #206). On an encrypted memory whose nodes are written via MCP, the embedding worker has no plaintext available at upsert and the node terminally fails embedding with encrypted-no-plaintext (#206). The embeddingFailedAt marker stays set and h-validate surfaces the node. Until #206 lands, the practical recommendation is to enable vector indexing on encrypted memories only when the canonical write path goes through the GraphQL surface (which does encrypt).

Vector storage

  • Database: Postgres + pgvector, in the same database as everything else. One backup story, no new infrastructure.
  • Single platform-fixed model. Every vector in v1 is produced with the model configured via EMBEDDING_MODEL. All vectors share one dimension (EMBEDDING_DIM, default 768), so all vectors are mutually comparable.
  • model id stored per vector so a future platform-wide re-embed is deliberate and observable. v1 performs no model migration and serves no mixed-model reads.
  • HNSW index with vector_cosine_ops, parameters m = 16, ef_construction = 64. Cosine similarity at query time.
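For reference, cosine similarity between two vectors is their dot product divided by the product of their norms. The plain-Python sketch below shows the arithmetic only; in the database the comparison runs through the HNSW index, and pgvector's `<=>` operator returns cosine distance, which is 1 minus this value.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must share one dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Identical directions score 1.0 and orthogonal vectors score 0.0, which is why all vectors must share one model and one dimension to be comparable.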

Operator configuration

The embedding endpoint is self-hosted — no node plaintext leaves Hadron infrastructure (a privacy posture, FR-026 reinforcer). The worker calls the configured URL with the platform-supplied model id and expects either the OpenAI {data: [{embedding}]} shape or the Ollama batch {embeddings: number[][]} shape.

| Env var | Required when | Default | Notes |
| --- | --- | --- | --- |
| EMBEDDING_API_URL | Any memory has vectorIndexEnabled | none | Dev: Ollama (ollama pull nomic-embed-text, batch endpoint …/api/embed). Prod: HuggingFace TEI / vLLM. Register in Doppler. |
| EMBEDDING_MODEL | Always | nomic-embed-text | Sent verbatim to the endpoint; must match the served tag. The Ollama tag nomic-embed-text serves nomic-embed-text-v1.5. |
| EMBEDDING_DIM | Always | 768 | MUST equal the pgvector vector(N) column dimension or embeds are rejected. |
| EMBEDDING_API_KEY | If the endpoint requires auth | none | Optional. |

When EMBEDDING_API_URL is unset, every mode: vector and mode: hybrid query returns reason: "embedding_unavailable". On mode: hybrid the keyword half still runs and is returned; only the vector half is missing.
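The configuration and the two accepted response shapes can be sketched together. This is a hedged illustration of the documented contract, not the worker's code: the class and function names are hypothetical, and only the env var names, defaults, and the two payload shapes come from this page.

```python
import os
from dataclasses import dataclass
from typing import Any, Dict, List, Mapping, Optional


@dataclass(frozen=True)
class EmbeddingConfig:
    api_url: Optional[str]
    model: str
    dim: int
    api_key: Optional[str]


def load_embedding_config(env: Mapping[str, str] = os.environ) -> EmbeddingConfig:
    """Read the four env vars, applying the documented defaults."""
    return EmbeddingConfig(
        api_url=env.get("EMBEDDING_API_URL") or None,
        model=env.get("EMBEDDING_MODEL", "nomic-embed-text"),
        dim=int(env.get("EMBEDDING_DIM", "768")),
        api_key=env.get("EMBEDDING_API_KEY") or None,
    )


def extract_embeddings(payload: Dict[str, Any], dim: int) -> List[List[float]]:
    """Accept either response shape and reject vectors of the wrong dimension."""
    if "data" in payload:            # OpenAI shape: {data: [{embedding}]}
        vectors = [item["embedding"] for item in payload["data"]]
    elif "embeddings" in payload:    # Ollama batch shape: {embeddings: [[...]]}
        vectors = payload["embeddings"]
    else:
        raise ValueError("unrecognized embedding response shape")
    for vector in vectors:
        if len(vector) != dim:
            raise ValueError(f"expected dimension {dim}, got {len(vector)}")
    return vectors
```

With an empty environment, `load_embedding_config({})` yields `api_url=None`, which corresponds to the embedding_unavailable case described above.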

For a local dev endpoint, see Run local LLMs with llama.cpp (an offline nomic-embed-text server, no daemon) or the Ollama path above; for the managed production endpoint, see Configure AWS SageMaker for vector embeddings.

What's reserved for later

Not in v1; the storage and API shapes don't foreclose any of these:

  • Semantic chunking — embed sentences, cut at similarity drops.
  • Message-bus transport — replaces the interim Postgres-backed marker queue.
  • Headless abstract generation — would unblock the abstract source for corpora whose clients don't author abstracts.
  • Cross-memory / multi-memory ranked search — preserved as possible by the single-model choice, but the query surface currently scopes to one memory or the caller's accessible set.
  • Platform model migration — the deliberate whole-corpus re-embed when the platform model changes. The stored model id and the unique-model invariant exist so this migration can happen, but v1 doesn't perform one.
  • expand filtering — v1 is undirected over all edge types; per edge type and per direction are reserved.
  • Embedding cost controls — rate limits, per-memory caps, per-token quotas. Acknowledged risk because backfill, chunk re-derivation, and config-change re-index all multiplicatively expand embedding work.

See also

  • Node types: abstract, info, record, system, reference.
  • MCP tools — the full h-* surface, including h-find-nodes and h-validate.
  • GraphQL API — generated schema reference.
  • Data model — generated entity reference, including the Memory and Node fields named on this page.