RAG vector index¶

Hadron's vector index lets a client retrieve nodes by meaning rather than substring match. A memory owner opts a memory in; the platform asynchronously embeds each node's abstract (or content chunks, or both); a client calls h-find-nodes with mode: vector or mode: hybrid and receives nodes ranked by semantic similarity to a natural-language query.

This page is the technical reference for the server-side feature (spec 033). Portal controls for configuring the index land in a follow-up release — for now the surfaces below are the GraphQL updateMemory mutation, the MCP h-find-nodes tool, and the relevant Memory fields. See GraphQL API for the generated schema.

When to use it¶

Reach for the vector index when:

A client needs to find nodes by what they're about, not what words they contain — "how do I restore data after a crash" should hit a backups node even if that exact phrase never appears.
You're building a RAG consumer (an agent stuffing context into a prompt) and need the passages most semantically relevant to a query — with their span text, character offsets, and parent node URN ready for citation.
A memory holds a structured corpus (articles, docs, notes) and keyword-only h-find-nodes is missing relevant nodes because they paraphrase rather than quote.

It's not the right tool for exact-match lookup (use keyword mode for that; it's faster and deterministic), nor for ranked retrieval across memories in one call (v1 scopes search to a single memory or the caller's accessible set, with no cross-memory ranking).

Turning it on¶

Vector indexing is opt-in per memory, with a class-derived default:

Memory class	Non-encrypted default	Encrypted default
`knowledge`	enabled	disabled (requires disclosure ack)
`system`, `app`, `group`, `personal`, `private`	disabled	disabled (requires disclosure ack)

A knowledge-class memory is opted in by default because it's the corpus class meant to be searched. Every other class, and every encrypted memory regardless of class, is disabled by default and must be opted in explicitly, so private content is never silently indexed.
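The default table above can be restated as a small decision function. This is a minimal sketch, not the platform's implementation: the function name `default_vector_index_enabled` is hypothetical, and only the class names and defaults come from the table.

```python
def default_vector_index_enabled(memory_class: str, encrypted: bool) -> bool:
    """Return the class-derived default from the table above (illustrative only)."""
    known = {"knowledge", "system", "app", "group", "personal", "private"}
    if memory_class not in known:
        raise ValueError(f"unknown memory class: {memory_class!r}")
    # Encrypted memories always start disabled; enabling one needs the disclosure ack.
    if encrypted:
        return False
    # Only the non-encrypted knowledge class is enabled by default.
    return memory_class == "knowledge"
```

A default is only the starting value; the owner can still change `vectorIndexEnabled` through `updateMemory`.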

To enable the index on a memory (GraphQL):

mutation EnableVectorIndex {
  updateMemory(
    id: "<memoryId>"
    vectorIndexEnabled: true
    embeddingSource: abstract            # or contentChunks, or both
  ) {
    id
    vectorIndexEnabled
    embeddingSource
  }
}

For an encrypted memory you must also pass acknowledgeVectorInversionRisk: true in the same mutation — see Encrypted memories below. Without the ack the mutation fails with ENCRYPTED_VECTOR_INDEX_NOT_ACKNOWLEDGED and no vectors are stored.

What gets embedded: the three sources¶

The embeddingSource memory field selects what the index is built from. It is a memory property, not a query parameter — a client searching the memory never picks the source; it asks "search by vector" and gets hits regardless of how the index was built.

Source	Embeds	Use when
`abstract` (default)	One vector per node, derived from `Node.abstract`	Clients author abstracts (or will). Cheapest path; abstracts already exist by spec 031.
`contentChunks`	One vector per content chunk (multiple per node)	Clients can't or won't author abstracts (research-article corpora, credit-constrained ingest). Chunks need no LLM in the loop.
`both`	Both abstract + chunk vectors	Both retrieval modes useful on the same memory; a single search ranks across both kinds of vector.

Abstract-less memories on the default source retrieve nothing

The abstract is client-supplied. A memory on abstract whose nodes carry no abstracts has an empty index and mode: vector returns nothing — not a bug, but worth knowing before you turn it on.

If your corpus is large, abstract-less, or credit-constrained (research articles ingested at scale, code dumps, scraped pages), set embeddingSource: contentChunks instead. Chunking needs no LLM in the loop, so the only per-node cost is the embedding API call itself.

Headless abstract-generation that would populate empty abstracts is reserved for a future spec.

Chunking (when source includes chunks)¶

For contentChunks and both, the platform splits each node's content into chunks at write time and embeds each chunk. Strategy is selected per node:

Structure-aware by default — chunks are cut along markdown sections, paragraphs, or other natural breaks when the content has usable structure.
Fixed-size token window with overlap when the content has no usable structure (plain blobs, code without clear sections).

You can force the fixed-size strategy on the whole memory with forceFixedSize: true, and tune the dials:

Field	Default	Range
`chunkTokens`	`512`	`64 … 4096`
`chunkOverlap`	`64`	`0 … chunkTokens - 1`
`forceFixedSize`	`false`	boolean

Changing any of these fields, or changing embeddingSource in a way that affects chunking, triggers a re-index of the memory. The previous chunk vectors are dropped and new ones are enqueued.

Each chunk carries enough locator metadata for chunk-level retrieval (see Passages):

text — the span text
charStart, charEnd — character offsets within the parent node's content (stable across re-chunking; not token offsets)
chunkIndex — position among the node's chunks
parentNodeId, parentNodeUrn — what the chunk belongs to
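To make the fixed-size strategy and the locator metadata concrete, here is a rough sketch of a token window with overlap. It assumes tokens can be approximated by whitespace-delimited words (the platform's real tokenizer is not specified on this page) and that `charEnd` is exclusive. The names `Chunk` and `fixed_size_chunks` are hypothetical; the field names and dial ranges come from the text above.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    chunkIndex: int
    charStart: int
    charEnd: int  # assumed exclusive, so content[charStart:charEnd] == text
    text: str


def fixed_size_chunks(content: str, chunk_tokens: int = 512, chunk_overlap: int = 64) -> list[Chunk]:
    """Split content into overlapping windows of roughly chunk_tokens tokens."""
    if not 64 <= chunk_tokens <= 4096:
        raise ValueError("chunkTokens must be in 64..4096")
    if not 0 <= chunk_overlap <= chunk_tokens - 1:
        raise ValueError("chunkOverlap must be in 0..chunkTokens-1")
    # Assumption: one whitespace-delimited word stands in for one token.
    spans = [m.span() for m in re.finditer(r"\S+", content)]
    step = chunk_tokens - chunk_overlap  # always >= 1 given the range check
    chunks: list[Chunk] = []
    start = 0
    while start < len(spans):
        window = spans[start:start + chunk_tokens]
        char_start, char_end = window[0][0], window[-1][1]
        chunks.append(Chunk(len(chunks), char_start, char_end, content[char_start:char_end]))
        if start + chunk_tokens >= len(spans):
            break  # this window already reached the end of the content
        start += step
    return chunks
```

Because each chunk records character offsets rather than token offsets, a consumer can slice the parent node's content directly without knowing which tokenizer produced the chunk.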

The async embedding pipeline¶

Embedding never blocks a write. The write path records the client-supplied abstract (or derives the chunks) synchronously, sets a durable pending marker, and returns to the client. A background worker drains the marker and calls the embedding API.

write → set embeddingPendingAt → return to client
                ↓ (async)
          prompt drain trigger
                ↓
        embed via EMBEDDING_API_URL
                ↓
        store vector + clear marker

The durable marker is the source of truth: a worker restart, an embedding API outage, or a near-simultaneous re-write all converge through the marker rather than depending on in-memory state.

Interim transport — tracked tech-debt

The marker-driven queue rides Postgres in v1 because it's the minimal-but-correct shape (durable, restart-safe, no new infrastructure). It is explicit interim tech-debt that will migrate to the platform message bus when that lands — the embedding job is transport-agnostic by design so the migration doesn't reshape the pipeline. Tracked as FR-029 in spec 033 with a forward link to the message-bus work.

Markers on `Node`¶

Field	Meaning
`embeddingPendingAt`	Set when the node needs (re-)embedding. Cleared on success or terminal failure.
`embeddingFailedAt`	Stamped on every failed attempt (transient or terminal). Stays set on terminal failure for diagnosability; surfaced by `h-validate` as `[embed-failed]`.
`embeddingError`	Last failure reason. Set alongside `embeddingFailedAt`.
`embeddingAttempts`	Per-job attempt counter. Caps transient retries before terminal.
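The interplay of the four markers can be sketched as an in-memory simulation. This is an illustration under stated assumptions, not the worker's code: `MAX_ATTEMPTS`, `TerminalEmbedError`, `write`, and `drain` are hypothetical names, the retry cap value is invented, and clearing the failure fields on a later success is assumed rather than documented. Only the marker field names and their set/clear rules come from the table above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional

MAX_ATTEMPTS = 3  # hypothetical cap; the real value is not stated on this page


class TerminalEmbedError(Exception):
    """A failure that retrying cannot fix (for example, no plaintext to embed)."""


@dataclass
class Node:
    abstract: str = ""
    vector: Optional[list] = None
    embeddingPendingAt: Optional[datetime] = None
    embeddingFailedAt: Optional[datetime] = None
    embeddingError: Optional[str] = None
    embeddingAttempts: int = 0


def _now() -> datetime:
    return datetime.now(timezone.utc)


def write(node: Node, abstract: str) -> None:
    """Write path: store the text, set the durable marker, return immediately."""
    node.abstract = abstract
    node.embeddingPendingAt = _now()
    node.embeddingAttempts = 0  # assumption: each write starts a new job


def drain(node: Node, embed: Callable[[str], list]) -> None:
    """Worker: attempt one embed for a pending node and update the markers."""
    if node.embeddingPendingAt is None:
        return
    node.embeddingAttempts += 1
    try:
        vector = embed(node.abstract)
    except Exception as exc:
        # Stamped on every failed attempt, transient or terminal.
        node.embeddingFailedAt = _now()
        node.embeddingError = str(exc)
        terminal = isinstance(exc, TerminalEmbedError) or node.embeddingAttempts >= MAX_ATTEMPTS
        if terminal:
            node.embeddingPendingAt = None  # cleared on terminal failure
        return
    node.vector = vector
    node.embeddingPendingAt = None  # cleared on success
    node.embeddingFailedAt = None   # assumption: a later success clears the failure fields
    node.embeddingError = None
```

A transient failure leaves `embeddingPendingAt` set, so the next drain retries; a terminal failure clears it but keeps `embeddingFailedAt` for diagnosis.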

Eventual consistency¶

A just-written node is not retrievable until its embedding lands. The search does not block waiting; the node is simply absent from the result set until the worker drains.

Backfill¶

When a memory is opted into the index after nodes already exist, the opt-in enqueues those nodes for embedding. No write event is required — the periodic sweep picks them up.

The same backfill path runs when you change embeddingSource (the prior vectors are dropped) or change a chunking dial (chunks are re-derived).

Operator-visible failures¶

A node whose embedding terminally fails — typically because the memory is encrypted and the worker has no plaintext to embed (see #206 disclosure) — keeps its embeddingFailedAt marker. h-validate surfaces these as:

[embed-failed] <node-urn> — <reason>

so an operator can tell "the index is empty because nothing matched" from "the index is empty because every embed permanently failed". An empty embeddingError falls back to unknown.

Searching: `h-find-nodes` and `nodeSearch`¶

There is one search contract with three modes, exposed on two surfaces. The MCP h-find-nodes tool and the GraphQL Query.nodeSearch field accept the same mode, expand, and granularity arguments; only the default mode when mode is omitted differs between them (see the table below).

Modes¶

Mode	Behavior
`keyword`	Whitespace-separated keywords; each must literally appear (case-insensitive) in `name`, `loc`, `description`, or `tags`. Default on MCP `h-find-nodes` (backward-compat with the pre-spec-033 tool — absence of `mode` is byte-identical to today's behavior).
`vector`	Natural-language query embedded with the platform model, ranked by cosine similarity against the memory's stored vectors. Default on GraphQL `nodeSearch` (new surface — designed as the vector-aware entrypoint, so callers opt in to keyword explicitly).
`hybrid`	Both `keyword` and `vector` run in parallel; results fused by reciprocal-rank fusion with `k = 60` (a deterministic, reproducible fusion across calls).

A query of "*" or "" on mode: keyword is the existing "match all" survey behavior. On mode: vector it's rejected — "*" is meaningless to an embedder.
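The hybrid fusion step can be illustrated with a short reciprocal-rank fusion function. This is a sketch of standard RRF with `k = 60`, assuming each half returns an ordered list of node ids. The function name `rrf_fuse` and the tie-break on node id are assumptions; the page only states that fusion is deterministic.

```python
def rrf_fuse(keyword_ids: list[str], vector_ids: list[str], k: int = 60) -> list[tuple[str, float]]:
    """Fuse two ranked id lists: each list contributes 1 / (k + rank), rank starting at 1."""
    scores: dict[str, float] = {}
    for ranking in (keyword_ids, vector_ids):
        for rank, node_id in enumerate(ranking, start=1):
            scores[node_id] = scores.get(node_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so repeated calls agree (assumed tie-break).
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

For example, fusing keyword ranking `["a", "b", "c"]` with vector ranking `["b", "d", "a"]` puts `b` first, because it sits near the top of both lists, followed by `a`, then `d`, then `c`. RRF uses only ranks, so keyword and cosine scores never need to be on a common scale.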

Granularity¶

`granularity`	Returns
`node` (default)	One entry per matching node — the `nodes` array. For chunk-bearing indexes, the node is ranked by its best-matching chunk.
`chunk`	One entry per matching passage — the `passages` array. Requires `mode: vector` — hybrid + chunk is out of v1 (RRF over passages is reserved).
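For `granularity: node` on a chunk-bearing index, "ranked by its best-matching chunk" amounts to a max over each node's chunk scores. A minimal sketch follows, assuming chunk hits arrive as `(parentNodeId, score)` pairs with higher scores meaning closer matches; the function name and the tie-break on node id are hypothetical.

```python
def rank_nodes_by_best_chunk(chunk_hits: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Collapse chunk-level hits to one entry per node, scored by its best chunk."""
    best: dict[str, float] = {}
    for node_id, score in chunk_hits:
        if node_id not in best or score > best[node_id]:
            best[node_id] = score
    # Highest best-chunk score first; ties broken by node id (assumed).
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))
```

With this rule a long node is not penalized for containing many weakly related chunks: one strong passage is enough to rank it highly.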

`expand`¶

Optional graph-neighbor expansion. With expand: <n> (n in 0 … 3, default 0), the response appends undirected neighbors within depth n (all edge types, both directions) after the ranked direct hits, as unscored expansion context.

Expanded neighbors are never interleaved into the similarity ranking — the ranked list keeps its meaning ("best query matches"); expansion only adds context. Useful for a RAG consumer that wants the top hits plus their immediate graph neighborhood in one call.
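The expansion rule can be sketched as a breadth-first walk that treats every edge as undirected and appends newly found neighbors after the ranked hits. This is an illustration only: the function name `expand_neighbors`, the edge representation as `(from, to)` pairs, and the order of the appended neighbors are assumptions.

```python
from collections import deque


def expand_neighbors(hits: list[str], edges: list[tuple[str, str]], depth: int) -> list[str]:
    """Return the ranked hits followed by unscored neighbors within `depth` hops."""
    if not 0 <= depth <= 3:
        raise ValueError("expand must be in 0..3")
    # Build an undirected adjacency map: all edge types, both directions.
    adjacency: dict[str, set[str]] = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = set(hits)
    queue = deque((hit, 0) for hit in hits)
    expansion: list[str] = []
    while queue:
        node, dist = queue.popleft()
        if dist == depth:
            continue  # do not walk past the requested depth
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in seen:
                seen.add(neighbor)
                expansion.append(neighbor)
                queue.append((neighbor, dist + 1))
    # Ranked hits keep their order; expansion context is appended, never interleaved.
    return list(hits) + expansion
```

Keeping the two segments separate lets a consumer truncate the expansion first when its prompt budget is tight, without disturbing the ranked hits.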

Result envelope¶

type NodeSearchResult {
  nodes: [Node!]!
  passages: [Passage!]!     # populated when granularity:chunk
  reason: String            # set when the query couldn't run as requested
  degraded: String          # set when the query ran at reduced fidelity
}

reason and degraded are the two flags that surface non-fatal, machine-readable outcomes — never as exceptions, never silently:

Flag value	When
`reason: "no_vector_index"`	`mode: vector` on a memory without `vectorIndexEnabled`. `nodes` is empty.
`reason: "embedding_unavailable"`	`mode: vector` or `mode: hybrid` when no embedding endpoint is configured (see Operator config).
`degraded: "no_vector_index"`	`mode: hybrid` on a memory without `vectorIndexEnabled` — keyword half still runs and is returned; the vector half is missing. The flag tells the caller the result is keyword-only.

Per-hit staleness is not carried on the envelope — it's a property of an individual node hit (see Staleness below for how each surface communicates it).
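A client can branch on the two envelope flags before trusting the result. The helper below is a hypothetical sketch: the name `classify_result` and the status labels are invented for illustration, and the result is assumed to arrive as a plain dict with the field names from the envelope above.

```python
from typing import Optional


def classify_result(result: dict) -> tuple[str, Optional[str]]:
    """Map a NodeSearchResult-shaped dict to a coarse status plus the flag value."""
    reason = result.get("reason")
    degraded = result.get("degraded")
    if reason:
        # The query could not run as requested (e.g. "no_vector_index").
        return ("not_as_requested", reason)
    if degraded:
        # The query ran at reduced fidelity (e.g. hybrid returning keyword-only hits).
        return ("degraded", degraded)
    return ("ok", None)
```

Checking `reason` first means a caller never mistakes an empty `nodes` array for "nothing matched" when the real cause is a missing index or endpoint.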

Passages (chunk-level retrieval)¶

type Passage {
  parentNodeId: ID!
  parentNodeUrn: String!
  chunkIndex: Int!
  charStart: Int!
  charEnd: Int!
  text: String!
  score: Float!
}

A passage is everything a RAG consumer needs to cite: the span text, where it lives inside the parent node (character offsets), which chunk it is, and which node owns it. Returned only when granularity: chunk on mode: vector.
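As a rough illustration of how a RAG consumer might use these fields, the sketch below checks a passage against its parent content and builds a citation string. It assumes `charEnd` is exclusive (not stated on this page), and the citation format, the function name `cite`, and the example URN are all hypothetical.

```python
def cite(passage: dict, parent_content: str) -> str:
    """Verify the passage offsets against the parent content and return a citation."""
    start, end = passage["charStart"], passage["charEnd"]
    # Assumption: charEnd is exclusive, so a plain slice recovers the span text.
    if parent_content[start:end] != passage["text"]:
        raise ValueError("passage offsets do not match the parent content")
    return f'{passage["parentNodeUrn"]}#chars={start}-{end}'


content = "Restore from the nightly backup."
start = content.index("nightly")
passage = {
    "parentNodeId": "1",
    "parentNodeUrn": "urn:example:node:1",  # hypothetical URN
    "chunkIndex": 0,
    "charStart": start,
    "charEnd": start + len("nightly backup"),
    "text": "nightly backup",
    "score": 0.87,
}
citation = cite(passage, content)
```

The offset check is useful because search is eventually consistent: if the parent node was edited after the chunk was embedded, the mismatch signals that the citation should be refreshed.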

Staleness¶

When a vector hit's embedded abstract is stale relative to current content (spec 032's abstractOriginHash predicate disagrees with the current computeContentHash(content)), the response serves the hit and marks it stale — it does not skip or block it.

How the signal is surfaced depends on the surface:

MCP h-find-nodes — each stale hit gets a Source: abstract-stale text line beneath the hit's score line. The literal string is exactly Source: abstract-stale (no other variants in v1).
GraphQL nodeSearch — staleness is not currently surfaced on the result envelope (the degraded field is reserved for envelope-wide conditions like no_vector_index). A GraphQL caller wanting the per-hit signal computes it from the returned node's abstractOriginHash and the current content hash, or reads the node via h-read-node (which emits the same Source: abstract-stale line in its meta block).
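A GraphQL caller computing the per-hit signal itself could follow this shape. The sketch is hedged: `compute_content_hash` is a stand-in for the platform's `computeContentHash`, whose actual algorithm is not given on this page, so SHA-256 over UTF-8 bytes is purely an assumption. A real client must use the same hash the server stamps into `abstractOriginHash`.

```python
import hashlib


def compute_content_hash(content: str) -> str:
    # Stand-in for the platform's computeContentHash; SHA-256 is an assumption.
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def is_abstract_stale(abstract_origin_hash: str, content: str) -> bool:
    """True when the abstract was derived from content that has since changed."""
    return abstract_origin_hash != compute_content_hash(content)
```

A stale hit is still served, so the check is advisory: a consumer might down-weight the hit or re-read the node rather than discard it.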

Other invariants:

Abstract vectors carry the staleness signal; chunk vectors do not. Chunks are re-derived and re-enqueued on every content edit (FR-017), so a chunk that exists is by construction derived from current content (modulo the async window). There is no stored summary to drift, so no chunk-staleness predicate is defined in v1.
Encrypted memories still get the marker on the MCP surface, because the origin hash is stamped at write time on plaintext (the worker holds the plaintext at upsert). This is something spec 032's read-time gate couldn't provide.
Self-healing. Every content write re-queues the node's embedding, so the stale marker clears on the next search after the re-embed drains.

Encrypted memories¶

You can opt an encrypted memory into the vector index, but only after acknowledging an explicit disclosure. The disclosure is the same on every surface (GraphQL, MCP, and CLI) and covers four points verbatim:

(a) Stored vectors are plaintext — they are NOT encrypted at rest the way the content column is.

(b) Embedding inversion can partially reconstruct source text from a vector (partial, not full reconstruction).

(c) Anyone with database access is therefore a potential viewer of that partial content, even though the content column itself is encrypted.

(d) The opt-in is revocable; revoking stops further indexing AND removes the memory's already-stored vectors.

Without the ack, enabling vectorIndexEnabled on an encrypted memory fails with the typed error ENCRYPTED_VECTOR_INDEX_NOT_ACKNOWLEDGED and no vectors are stored. The ack timestamp is persisted on Memory.vectorIndexEncryptedAckAt.

To enable indexing on an encrypted memory:

mutation EnableEncryptedVectorIndex {
  updateMemory(
    id: "<memoryId>"
    vectorIndexEnabled: true
    embeddingSource: abstract
    acknowledgeVectorInversionRisk: true   # required for encrypted memories
  ) {
    id
    vectorIndexEnabled
    vectorIndexEncryptedAckAt
  }
}

Revoking the opt-in (vectorIndexEnabled: false) stops further indexing and removes the memory's stored vectors.
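The opt-in and revoke rules can be summarized as a small state transition. This is a sketch under assumptions, not the resolver: the function `apply_vector_index_update`, the dict-shaped memory, and its `vectors` key are hypothetical, and it assumes the ack must accompany every enabling call on an encrypted memory. The error code and the `vectorIndexEncryptedAckAt` field come from the text above.

```python
from datetime import datetime, timezone


class EncryptedVectorIndexNotAcknowledged(Exception):
    code = "ENCRYPTED_VECTOR_INDEX_NOT_ACKNOWLEDGED"


def apply_vector_index_update(memory: dict, enabled: bool, acknowledge: bool = False) -> dict:
    """Return an updated copy of the memory, or raise when the ack is missing."""
    if enabled and memory["encrypted"] and not acknowledge:
        # Fails before anything is stored.
        raise EncryptedVectorIndexNotAcknowledged(EncryptedVectorIndexNotAcknowledged.code)
    updated = dict(memory)
    updated["vectorIndexEnabled"] = enabled
    if enabled and memory["encrypted"]:
        updated["vectorIndexEncryptedAckAt"] = datetime.now(timezone.utc)
    if not enabled:
        # Revoking stops further indexing and removes already-stored vectors.
        updated["vectors"] = []
    return updated
```

Raising before any mutation keeps the failure path free of side effects, which matches the guarantee that no vectors are stored without the ack.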

Why the disclosure exists¶

Embeddings are computed on plaintext at write time, just before the encryption envelope is applied. This sidesteps the read-path can't-decrypt problem entirely — but stored vectors are plaintext, and the embedding-inversion research line (vec2text) shows partial source reconstruction is feasible. Disclosing the tradeoff explicitly and making it the owner's choice is a deliberate trust posture: the platform doesn't silently exclude encrypted memories from indexing, and it doesn't silently index them either.

A known limit: encrypted writes via MCP¶

MCP currently bypasses encryption end-to-end (neither encrypts on write nor decrypts on read — tracked as hadron-server #206). On an encrypted memory whose nodes are written via MCP, the embedding worker has no plaintext available at upsert and the node terminally fails embedding with encrypted-no-plaintext (#206). The embeddingFailedAt marker stays set and h-validate surfaces the node. Until #206 lands, the practical recommendation is to enable vector indexing on encrypted memories only when the canonical write path goes through the GraphQL surface (which does encrypt).

Vector storage¶

Database: Postgres + pgvector, in the same database as everything else. One backup story, no new infrastructure.
Single platform-fixed model. Every vector in v1 is produced with the model configured via EMBEDDING_MODEL. All vectors share one dimension (EMBEDDING_DIM, default 768), so all vectors are mutually comparable.
Model id stored per vector, so a future platform-wide re-embed is deliberate and observable. v1 performs no model migration and serves no mixed-model reads.
HNSW index with vector_cosine_ops, parameters m = 16, ef_construction = 64. Cosine similarity at query time.
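For readers who want the ranking metric spelled out, here is cosine similarity in plain Python. It is a reference sketch only; in the deployment described above the comparison happens inside pgvector, whose cosine operator reports cosine distance, which is one minus this value. The function name is illustrative.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of a and b divided by the product of their Euclidean norms."""
    if len(a) != len(b):
        raise ValueError("vectors must share one dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

The dimension check mirrors why a single platform model matters: vectors of different dimensions, or from different models, are not meaningfully comparable.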

Operator configuration¶

The embedding endpoint is self-hosted — no node plaintext leaves Hadron infrastructure (a privacy posture, FR-026 reinforcer). The worker calls the configured URL with the platform-supplied model id and expects either the OpenAI {data: [{embedding}]} shape or the Ollama batch {embeddings: number[][]} shape.

Env var	Required when	Default	Notes
`EMBEDDING_API_URL`	Any memory has `vectorIndexEnabled`	none	Dev: Ollama (`ollama pull nomic-embed-text`, batch endpoint `…/api/embed`). Prod: HuggingFace TEI / vLLM. Register in Doppler.
`EMBEDDING_MODEL`	Always	`nomic-embed-text`	Sent verbatim to the endpoint; must match the served tag. The Ollama tag `nomic-embed-text` serves nomic-embed-text-v1.5.
`EMBEDDING_DIM`	Always	`768`	MUST equal the pgvector `vector(N)` column dimension or embeds are rejected.
`EMBEDDING_API_KEY`	If the endpoint requires auth	none	Optional.
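The two accepted response shapes and the dimension rule can be handled by one small parser. This is a sketch, assuming the response body has already been decoded from JSON into a dict; the function name `extract_embeddings` is hypothetical, and the real worker's error handling is not documented here.

```python
def extract_embeddings(payload: dict, expected_dim: int = 768) -> list[list[float]]:
    """Accept the OpenAI or the Ollama batch shape and enforce the configured dimension."""
    if "data" in payload:
        # OpenAI shape: {"data": [{"embedding": [...]}, ...]}
        vectors = [item["embedding"] for item in payload["data"]]
    elif "embeddings" in payload:
        # Ollama batch shape: {"embeddings": [[...], ...]}
        vectors = payload["embeddings"]
    else:
        raise ValueError("unrecognized embedding response shape")
    for vector in vectors:
        if len(vector) != expected_dim:
            # Mirrors the EMBEDDING_DIM rule: a mismatched dimension is rejected.
            raise ValueError(f"expected dimension {expected_dim}, got {len(vector)}")
    return vectors
```

Rejecting a mismatched vector at parse time surfaces a misconfigured `EMBEDDING_MODEL` or `EMBEDDING_DIM` early, before a write to the `vector(N)` column fails.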

When EMBEDDING_API_URL is unset, every mode: vector and mode: hybrid query returns reason: "embedding_unavailable". The vector half cannot run; on hybrid, the keyword half still runs.

For a local dev endpoint, see Run local LLMs with llama.cpp (an offline nomic-embed-text server, no daemon) or the Ollama path above; for the managed production endpoint, see Configure AWS SageMaker for vector embeddings.

What's reserved for later¶

Not in v1; the storage and API shapes don't foreclose any of these:

Semantic chunking — embed sentences, cut at similarity drops.
Message-bus transport — replaces the interim Postgres-backed marker queue.
Headless abstract generation — would unblock the abstract source for corpora whose clients don't author abstracts.
Cross-memory / multi-memory ranked search — preserved as possible by the single-model choice, but the query surface currently scopes to one memory or the caller's accessible set.
Platform model migration — the deliberate whole-corpus re-embed when the platform model changes. The stored model id and the unique-model invariant exist so this migration can happen, but v1 doesn't perform one.
expand filtering — v1 is undirected over all edge types; per edge type and per direction are reserved.
Embedding cost controls — rate limits, per-memory caps, per-token quotas. Acknowledged risk because backfill, chunk re-derivation, and config-change re-index all multiplicatively expand embedding work.