RAG vector index¶
Hadron's vector index lets a client retrieve nodes by meaning rather than
substring match. A memory owner opts a memory in; the platform asynchronously
embeds each node's abstract (or content chunks, or both); a client calls
h-find-nodes with mode: vector or mode: hybrid and receives nodes
ranked by semantic similarity to a natural-language query.
This page is the technical reference for the server-side feature
(spec 033). Portal controls for configuring the index land in a follow-up
release — for now the surfaces below are the GraphQL updateMemory mutation,
the MCP h-find-nodes tool, and the relevant Memory fields. See
GraphQL API for the generated schema.
When to use it¶
Reach for the vector index when:
- A client needs to find nodes by what they're about, not what words they contain: "how do I restore data after a crash" should hit a backups node even if that exact phrase never appears.
- You're building a RAG consumer (an agent stuffing context into a prompt) and need the passages most semantically relevant to a query, with their span text, character offsets, and parent node URN ready for citation.
- A memory holds a structured corpus (articles, docs, notes) and keyword-only h-find-nodes is missing relevant nodes because they paraphrase rather than quote.
It's not the right tool for exact-match lookup (use keyword mode for that — it's faster and deterministic), nor for retrieving across memories in one call (v1 scopes search to a single memory or the caller's accessible set; no cross-memory ranking).
Turning it on¶
Vector indexing is opt-in per memory, with a class-derived default:
| Memory class | Non-encrypted default | Encrypted default |
|---|---|---|
| knowledge | enabled | disabled (requires disclosure ack) |
| system, app, group, personal, private | disabled | disabled (requires disclosure ack) |
A non-encrypted knowledge-class memory is opted in by default because it's
the corpus class meant to be searched. Every other class, and every
encrypted memory regardless of class, is opted out by default, so private
content is never silently indexed.
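The default rule in the table above fits in a few lines. This is a minimal Python sketch for illustration only; the function name is hypothetical and the platform's actual implementation is not shown on this page.

```python
KNOWN_CLASSES = {"knowledge", "system", "app", "group", "personal", "private"}


def default_vector_index_enabled(memory_class: str, encrypted: bool) -> bool:
    """Return the default opt-in state for a memory, following the table above."""
    if memory_class not in KNOWN_CLASSES:
        raise ValueError(f"unknown memory class: {memory_class!r}")
    # Encrypted memories are never indexed by default, whatever the class.
    if encrypted:
        return False
    # Among non-encrypted memories, only the knowledge class defaults to enabled.
    return memory_class == "knowledge"
```

An owner can still override the default in either direction with updateMemory.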
To enable the index on a memory (GraphQL):
mutation EnableVectorIndex {
  updateMemory(
    id: "<memoryId>"
    vectorIndexEnabled: true
    embeddingSource: abstract # or contentChunks, or both
  ) {
    id
    vectorIndexEnabled
    embeddingSource
  }
}
For an encrypted memory you must also pass
acknowledgeVectorInversionRisk: true in the same mutation — see
Encrypted memories below. Without the ack the
mutation fails with ENCRYPTED_VECTOR_INDEX_NOT_ACKNOWLEDGED and no
vectors are stored.
What gets embedded: the three sources¶
The embeddingSource memory field selects what the index is built from.
It is a memory property, not a query parameter — a client searching
the memory never picks the source; it asks "search by vector" and gets
hits regardless of how the index was built.
| Source | Embeds | Use when |
|---|---|---|
| abstract (default) | One vector per node, derived from Node.abstract | Clients author abstracts (or will). Cheapest path; abstracts already exist by spec 031. |
| contentChunks | One vector per content chunk (multiple per node) | Clients can't or won't author abstracts (research-article corpora, credit-constrained ingest). Chunks need no LLM in the loop. |
| both | Both abstract + chunk vectors | Both retrieval modes useful on the same memory; a single search ranks across both kinds of vector. |
Abstract-less memories on the default source retrieve nothing
The abstract is client-supplied. A memory on abstract whose
nodes carry no abstracts has an empty index and mode: vector
returns nothing — not a bug, but worth knowing before you turn it on.
If your corpus is large, abstract-less, or credit-constrained
(research articles ingested at scale, code dumps, scraped pages),
set embeddingSource: contentChunks instead. Chunking needs no LLM
in the loop, so the only per-node cost is the calls to the
embedding API itself.
Headless abstract-generation that would populate empty abstracts is reserved for a future spec.
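To make the empty-index case concrete, here is a small Python sketch of which texts each source would hand to the embedder for one node. The function name and argument shapes are assumptions made for illustration; only the three source values come from this page.

```python
from typing import Optional


def embedding_inputs(source: str, abstract: Optional[str], chunks: list[str]) -> list[str]:
    """List the texts that would be embedded for one node under a given source."""
    if source not in ("abstract", "contentChunks", "both"):
        raise ValueError(f"unknown embeddingSource: {source!r}")
    texts: list[str] = []
    # A missing or empty abstract contributes nothing to the index.
    if source in ("abstract", "both") and abstract:
        texts.append(abstract)
    if source in ("contentChunks", "both"):
        texts.extend(chunks)
    return texts
```

A node without an abstract yields an empty list on the abstract source, which is why mode: vector returns nothing for such a memory.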
Chunking (when source includes chunks)¶
For contentChunks and both, the platform splits each node's content
into chunks at write time and embeds each chunk. Strategy is selected
per node:
- Structure-aware by default — chunks are cut along markdown sections, paragraphs, or other natural breaks when the content has usable structure.
- Fixed-size token window with overlap when the content has no usable structure (plain blobs, code without clear sections).
You can force the fixed-size strategy on the whole memory with
forceFixedSize: true, and tune the dials:
| Field | Default | Range |
|---|---|---|
| chunkTokens | 512 | 64 … 4096 |
| chunkOverlap | 64 | 0 … chunkTokens - 1 |
| forceFixedSize | false | boolean |
Changing any of these fields, or changing embeddingSource in a way that
affects chunking, triggers a re-index of the memory. The previous
chunk vectors are dropped and new ones are enqueued.
Each chunk carries enough locator metadata for chunk-level retrieval (see Passages):
- text: the span text
- charStart, charEnd: character offsets within the parent node's content (stable across re-chunking; not token offsets)
- chunkIndex: position among the node's chunks
- parentNodeId, parentNodeUrn: what the chunk belongs to
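The fixed-size strategy and the locator fields can be sketched as follows. This is an illustrative Python sketch, not the platform's chunker: whitespace-delimited words stand in for the real tokenizer, charEnd is assumed to be exclusive, and all function names are hypothetical. The validator mirrors the ranges in the table above.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_index: int
    char_start: int  # offset into the parent node's content
    char_end: int    # exclusive end offset (assumption)
    text: str


def validate_chunk_config(chunk_tokens: int = 512, chunk_overlap: int = 64) -> None:
    """Enforce the documented ranges for chunkTokens and chunkOverlap."""
    if not 64 <= chunk_tokens <= 4096:
        raise ValueError("chunkTokens must be within 64..4096")
    if not 0 <= chunk_overlap <= chunk_tokens - 1:
        raise ValueError("chunkOverlap must be within 0..chunkTokens-1")


def fixed_size_chunks(content: str, window: int, overlap: int) -> list[Chunk]:
    """Cut content into windows of `window` tokens sharing `overlap` tokens."""
    step = window - overlap
    if window < 1 or overlap < 0 or step < 1:
        raise ValueError("need window >= 1 and 0 <= overlap < window")
    # Whitespace-delimited words stand in for real tokens (assumption).
    spans = [m.span() for m in re.finditer(r"\S+", content)]
    chunks: list[Chunk] = []
    i = 0
    while i < len(spans):
        group = spans[i:i + window]
        start, end = group[0][0], group[-1][1]
        chunks.append(Chunk(len(chunks), start, end, content[start:end]))
        if i + window >= len(spans):
            break  # the last window already reached the end of the content
        i += step
    return chunks
```

Because the offsets index into the parent content, content[char_start:char_end] always reproduces the chunk text, which is what makes a passage citable.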
The async embedding pipeline¶
Embedding never blocks a write. The write path computes the abstract (or chunks) synchronously, sets a durable pending marker, and returns to the client. A background worker drains the marker and calls the embedding API.
write → set embeddingPendingAt → return to client
              ↓ (async)
      prompt drain trigger
              ↓
      embed via EMBEDDING_API_URL
              ↓
      store vector + clear marker
The durable marker is the source of truth: a worker restart, an embedding API outage, or a near-simultaneous re-write all converge through the marker rather than depending on in-memory state.
Interim transport — tracked tech-debt
The marker-driven queue rides Postgres in v1 because it's the minimal-but-correct shape (durable, restart-safe, no new infrastructure). It is explicit interim tech-debt that will migrate to the platform message bus when that lands — the embedding job is transport-agnostic by design so the migration doesn't reshape the pipeline. Tracked as FR-029 in spec 033 with a forward link to the message-bus work.
Markers on Node¶
| Field | Meaning |
|---|---|
| embeddingPendingAt | Set when the node needs (re-)embedding. Cleared on success or terminal failure. |
| embeddingFailedAt | Stamped on every failed attempt (transient or terminal). Stays set on terminal failure for diagnosability; surfaced by h-validate as [embed-failed]. |
| embeddingError | Last failure reason. Set alongside embeddingFailedAt. |
| embeddingAttempts | Per-job attempt counter. Caps transient retries before terminal. |
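The marker transitions in the table can be read as a small state machine. The Python sketch below models them in memory under stated assumptions: the retry cap of 3, the TerminalEmbedError class, the function names, and the clearing of the failure fields on success are all hypothetical; the page fixes only the field meanings.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional

MAX_ATTEMPTS = 3  # hypothetical cap; the real value is not stated on this page


class TerminalEmbedError(Exception):
    """A failure that retrying cannot fix (hypothetical class)."""


@dataclass
class NodeMarkers:
    embedding_pending_at: Optional[datetime] = None
    embedding_failed_at: Optional[datetime] = None
    embedding_error: Optional[str] = None
    embedding_attempts: int = 0
    vector: Optional[list] = None


def mark_pending(node: NodeMarkers) -> None:
    """Write path: set the durable marker and return without embedding."""
    node.embedding_pending_at = datetime.now(timezone.utc)
    node.embedding_attempts = 0


def drain_once(node: NodeMarkers, embed: Callable[[], list]) -> None:
    """Worker path: one embedding attempt for a pending node."""
    if node.embedding_pending_at is None:
        return
    try:
        vector = embed()
    except Exception as exc:
        node.embedding_attempts += 1
        node.embedding_failed_at = datetime.now(timezone.utc)
        node.embedding_error = str(exc)
        terminal = isinstance(exc, TerminalEmbedError) or node.embedding_attempts >= MAX_ATTEMPTS
        if terminal:
            # Pending clears; embeddingFailedAt stays for diagnosability.
            node.embedding_pending_at = None
        return
    node.vector = vector
    node.embedding_pending_at = None
    # Assumption: a success also clears the previous failure stamp.
    node.embedding_failed_at = None
    node.embedding_error = None
```

A transient failure leaves the pending marker in place, so the next drain retries it; only a terminal failure or a success clears it.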
Eventual consistency¶
A just-written node is not retrievable until its embedding lands. The search does not block waiting; the node is simply absent from the result set until the worker drains.
Backfill¶
When a memory is opted into the index after nodes already exist, the opt-in enqueues those nodes for embedding. No write event is required — the periodic sweep picks them up.
The same backfill path runs when you change embeddingSource (the prior
vectors are dropped) or change a chunking dial (chunks are re-derived).
Operator-visible failures¶
A node whose embedding terminally fails, typically because the memory is
encrypted and the worker has no plaintext to embed (see #206
disclosure), keeps its embeddingFailedAt marker. h-validate surfaces
these nodes under the [embed-failed] tag along with the stored
embeddingError, so an operator can tell "the index is empty because
nothing matched" from "the index is empty because every embed permanently
failed". An empty embeddingError falls back to unknown.
Searching: h-find-nodes and nodeSearch¶
One search contract with three modes is exposed on two surfaces. Both the
MCP h-find-nodes tool and the GraphQL Query.nodeSearch field accept the
same mode, expand, and granularity arguments; the only difference is the
default mode when mode is omitted (see the table below).
Modes¶
| Mode | Behavior |
|---|---|
| keyword | Whitespace-separated keywords; each must literally appear (case-insensitive) in name, loc, description, or tags. Default on MCP h-find-nodes (backward-compat with the pre-spec-033 tool: absence of mode is byte-identical to today's behavior). |
| vector | Natural-language query embedded with the platform model, ranked by cosine similarity against the memory's stored vectors. Default on GraphQL nodeSearch (a new surface designed as the vector-aware entrypoint, so callers opt in to keyword explicitly). |
| hybrid | Both keyword and vector run in parallel; results fused by reciprocal-rank fusion with k = 60 (a deterministic, reproducible fusion across calls). |
A query of "*" or "" on mode: keyword is the existing "match all"
survey behavior. On mode: vector it's rejected — "*" is meaningless
to an embedder.
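Reciprocal-rank fusion, as used by hybrid mode, gives each node the sum of 1 / (k + rank) over the lists it appears in. The Python sketch below uses the standard formula with k = 60; the function name and the tie-break on node id are assumptions made here to keep the output deterministic, not a statement about the server's code.

```python
from collections import defaultdict

RRF_K = 60  # the k value stated for hybrid mode


def rrf_fuse(rankings: list[list[str]], k: int = RRF_K) -> list[str]:
    """Fuse ranked id lists: score(id) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, node_id in enumerate(ranking, start=1):
            scores[node_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id (an assumption).
    return sorted(scores, key=lambda node_id: (-scores[node_id], node_id))
```

For example, with keyword hits ["a", "b", "c"] and vector hits ["b", "d", "a"], node b wins because it sits near the top of both lists, and a follows because it appears in both but lower in one.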
Granularity¶
| granularity | Returns |
|---|---|
| node (default) | One entry per matching node (the nodes array). For chunk-bearing indexes, the node is ranked by its best-matching chunk. |
| chunk | One entry per matching passage (the passages array). Requires mode: vector; hybrid + chunk is out of v1 (RRF over passages is reserved). |
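Ranking a node by its best-matching chunk amounts to taking the maximum chunk score per parent node. A minimal Python sketch, assuming passages are dicts carrying the parentNodeId and score fields of the Passage type; the function name and the id tie-break are hypothetical.

```python
def rank_nodes_by_best_chunk(passages: list[dict]) -> list[tuple[str, float]]:
    """Collapse chunk hits to one entry per node, scored by its best chunk."""
    best: dict[str, float] = {}
    for passage in passages:
        node_id = passage["parentNodeId"]
        best[node_id] = max(best.get(node_id, float("-inf")), passage["score"])
    # Highest score first; ties broken by node id (an assumption).
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))
```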
expand¶
Optional graph-neighbor expansion. With expand: <n> (n in 0 … 3,
default 0), the response appends undirected neighbors within
depth n (all edge types, both directions) after the ranked direct
hits, as unscored expansion context.
Expanded neighbors are never interleaved into the similarity ranking — the ranked list keeps its meaning ("best query matches"); expansion only adds context. Useful for a RAG consumer that wants the top hits plus their immediate graph neighborhood in one call.
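The expansion described above is a bounded breadth-first walk over an undirected view of the graph. The Python sketch below is an illustration under assumptions: edges are plain (source, target) pairs, neighbors already present among the direct hits are not repeated, and the function name is hypothetical.

```python
from collections import deque


def expand_neighbors(ranked_ids: list[str], edges: list[tuple[str, str]], depth: int) -> list[str]:
    """Return ranked hits followed by unscored neighbors within `depth` hops."""
    if not 0 <= depth <= 3:
        raise ValueError("expand must be within 0..3")
    # Build an undirected adjacency map: every edge counts in both directions.
    adjacency: dict[str, set[str]] = {}
    for src, dst in edges:
        adjacency.setdefault(src, set()).add(dst)
        adjacency.setdefault(dst, set()).add(src)
    seen = set(ranked_ids)
    frontier = deque((node_id, 0) for node_id in ranked_ids)
    expansion: list[str] = []
    while frontier:
        node_id, dist = frontier.popleft()
        if dist == depth:
            continue  # do not walk past the requested depth
        for neighbor in sorted(adjacency.get(node_id, ())):
            if neighbor not in seen:
                seen.add(neighbor)
                expansion.append(neighbor)
                frontier.append((neighbor, dist + 1))
    # Ranked hits keep their order; expansion context is appended after them.
    return list(ranked_ids) + expansion
```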
Result envelope¶
type NodeSearchResult {
  nodes: [Node!]!
  passages: [Passage!]!   # populated when granularity:chunk
  reason: String          # set when the query couldn't run as requested
  degraded: String        # set when the query ran at reduced fidelity
}
reason and degraded are the two flags that surface non-fatal,
machine-readable outcomes — never as exceptions, never silently:
| Flag value | When |
|---|---|
| reason: "no_vector_index" | mode: vector on a memory without vectorIndexEnabled. nodes is empty. |
| reason: "embedding_unavailable" | mode: vector or mode: hybrid when no embedding endpoint is configured (see Operator config). |
| degraded: "no_vector_index" | mode: hybrid on a memory without vectorIndexEnabled. The keyword half still runs and is returned; the vector half is missing. The flag tells the caller the result is keyword-only. |
Per-hit staleness is not carried on the envelope — it's a property of an individual node hit (see Staleness below for how each surface communicates it).
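Because both flags arrive as data rather than exceptions, a client has to check them before trusting an empty or partial result. A minimal Python sketch of that check, assuming the envelope has been decoded into a dict; the function name and the returned labels are hypothetical.

```python
def classify_search_result(result: dict) -> str:
    """Summarize the envelope flags so callers don't mistake them for 'no matches'."""
    reason = result.get("reason")
    degraded = result.get("degraded")
    if reason:
        # The query could not run as requested, e.g. "no_vector_index"
        # or "embedding_unavailable".
        return f"not_as_requested:{reason}"
    if degraded:
        # The query ran at reduced fidelity, e.g. keyword-only hybrid.
        return f"degraded:{degraded}"
    return "ok"
```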
Passages (chunk-level retrieval)¶
type Passage {
  parentNodeId: ID!
  parentNodeUrn: String!
  chunkIndex: Int!
  charStart: Int!
  charEnd: Int!
  text: String!
  score: Float!
}
A passage is everything a RAG consumer needs to cite: the span text,
where it lives inside the parent node (character offsets), which chunk
it is, and which node owns it. Returned only when granularity: chunk
on mode: vector.
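As an illustration of how a RAG consumer might use these fields, the Python sketch below turns the top passages into numbered context lines with a locator. The citation format (URN plus a chars range) and the function name are hypothetical; only the Passage field names come from the type above.

```python
def build_context(passages: list[dict], limit: int = 3) -> str:
    """Format the highest-scoring passages as numbered, citable context lines."""
    top = sorted(passages, key=lambda p: -p["score"])[:limit]
    lines = []
    for n, p in enumerate(top, start=1):
        # Hypothetical locator format: parent URN plus character range.
        locator = f'{p["parentNodeUrn"]}#chars={p["charStart"]}-{p["charEnd"]}'
        lines.append(f'[{n}] {p["text"]} ({locator})')
    return "\n".join(lines)
```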
Staleness¶
When a vector hit's embedded abstract is stale relative to current
content (spec 032's abstractOriginHash predicate disagrees with the
current computeContentHash(content)), the response serves the hit
and marks it stale — it does not skip or block it.
How the signal is surfaced depends on the surface:
- MCP h-find-nodes: each stale hit gets a Source: abstract-stale text line beneath the hit's score line. The literal string is exactly Source: abstract-stale (no other variants in v1).
- GraphQL nodeSearch: staleness is not currently surfaced on the result envelope (the degraded field is reserved for envelope-wide conditions like no_vector_index). A GraphQL caller wanting the per-hit signal computes it from the returned node's abstractOriginHash and the current content hash, or reads the node via h-read-node (which emits the same Source: abstract-stale line in its meta block).
Other invariants:
- Abstract vectors carry the staleness signal; chunk vectors do not. Chunks are re-derived and re-enqueued on every content edit (FR-017), so a chunk that exists is by construction derived from current content (modulo the async window). There is no stored summary to drift, so no chunk-staleness predicate is defined in v1.
- Encrypted memories still get the marker on the MCP surface, because the origin hash is stamped at write time on plaintext (the worker holds the plaintext at upsert). This is something spec 032's read-time gate couldn't provide.
- Self-healing. Every content write re-queues the node's embedding, so the stale marker clears on the next search after the re-embed drains.
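Both ways of detecting staleness can be sketched in Python. Assumptions are significant here: the page does not say which hash computeContentHash uses, so SHA-256 below is only a stand-in that a real client would replace, and the function names are hypothetical. The literal marker string is the one documented above.

```python
import hashlib
from typing import Callable

STALE_LINE = "Source: abstract-stale"  # the literal marker documented above


def sha256_hex(content: str) -> str:
    """Stand-in for computeContentHash; the real algorithm is not stated here."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def is_abstract_stale(node: dict, compute_content_hash: Callable[[str], str] = sha256_hex) -> bool:
    """GraphQL path: compare the stored origin hash with the current content hash."""
    return node["abstractOriginHash"] != compute_content_hash(node["content"])


def mcp_hit_is_stale(hit_text: str) -> bool:
    """MCP path: look for the marker line in a hit's text block."""
    return any(line.strip() == STALE_LINE for line in hit_text.splitlines())
```

Passing the hash function as a parameter keeps the comparison logic usable once the actual hash is known.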
Encrypted memories¶
You can opt an encrypted memory into the vector index, but only after acknowledging an explicit disclosure. The disclosure is the same on every surface (GraphQL, MCP, and CLI) and covers four points verbatim:
(a) Stored vectors are plaintext — they are NOT encrypted at rest the way the content column is.
(b) Embedding inversion can partially reconstruct source text from a vector (partial, not full reconstruction).
(c) Anyone with database access is therefore a potential viewer of that partial content, even though the content column itself is encrypted.
(d) The opt-in is revocable; revoking stops further indexing AND removes the memory's already-stored vectors.
Without the ack, enabling vectorIndexEnabled on an encrypted memory
fails with the typed error ENCRYPTED_VECTOR_INDEX_NOT_ACKNOWLEDGED
and no vectors are stored. The ack timestamp is persisted on
Memory.vectorIndexEncryptedAckAt.
To enable indexing on an encrypted memory:
mutation EnableEncryptedVectorIndex {
  updateMemory(
    id: "<memoryId>"
    vectorIndexEnabled: true
    embeddingSource: abstract
    acknowledgeVectorInversionRisk: true # required for encrypted memories
  ) {
    id
    vectorIndexEnabled
    vectorIndexEncryptedAckAt
  }
}
Revoking the opt-in (vectorIndexEnabled: false) stops further
indexing and removes the memory's stored vectors.
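The acknowledgement gate and the revoke behavior can be modeled in a few lines. This Python sketch is an in-memory illustration only: the Memory dataclass, the exception class, and set_vector_index are hypothetical names; the error code, the ack timestamp field, and the removal of vectors on revoke follow the text above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


class EncryptedVectorIndexNotAcknowledged(Exception):
    code = "ENCRYPTED_VECTOR_INDEX_NOT_ACKNOWLEDGED"


@dataclass
class Memory:
    encrypted: bool
    vector_index_enabled: bool = False
    vector_index_encrypted_ack_at: Optional[datetime] = None
    vectors: list = field(default_factory=list)


def set_vector_index(memory: Memory, enabled: bool,
                     acknowledge_vector_inversion_risk: bool = False) -> None:
    """Enable or revoke the index, enforcing the ack on encrypted memories."""
    if not enabled:
        # Revoking stops further indexing and removes stored vectors.
        memory.vector_index_enabled = False
        memory.vectors.clear()
        return
    if memory.encrypted:
        if not acknowledge_vector_inversion_risk:
            # Fail before any state changes, so no vectors are stored.
            raise EncryptedVectorIndexNotAcknowledged(
                EncryptedVectorIndexNotAcknowledged.code)
        memory.vector_index_encrypted_ack_at = datetime.now(timezone.utc)
    memory.vector_index_enabled = True
```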
Why the disclosure exists¶
Embeddings are computed on plaintext at write time, just before the
encryption envelope is applied. This sidesteps the read-path
can't-decrypt problem entirely — but stored vectors are plaintext, and
the embedding-inversion research line (vec2text) shows partial source
reconstruction is feasible. Disclosing the tradeoff explicitly and
making it the owner's choice is a deliberate trust posture: the
platform doesn't silently exclude encrypted memories from indexing, and
it doesn't silently index them either.
A known limit: encrypted writes via MCP¶
MCP currently bypasses encryption end-to-end (neither encrypts on write
nor decrypts on read — tracked as
hadron-server #206).
On an encrypted memory whose nodes are written via MCP, the embedding
worker has no plaintext available at upsert and the node terminally
fails embedding with encrypted-no-plaintext (#206). The
embeddingFailedAt marker stays set and h-validate surfaces the
node. Until #206 lands, the practical recommendation is to enable
vector indexing on encrypted memories only when the canonical write
path goes through the GraphQL surface (which does encrypt).
Vector storage¶
- Database: Postgres + pgvector, in the same database as everything else. One backup story, no new infrastructure.
- Single platform-fixed model. Every vector in v1 is produced with the model configured via EMBEDDING_MODEL. All vectors share one dimension (EMBEDDING_DIM, default 768), so all vectors are mutually comparable.
- model id stored per vector so a future platform-wide re-embed is deliberate and observable. v1 performs no model migration and serves no mixed-model reads.
- HNSW index with vector_cosine_ops, parameters m = 16, ef_construction = 64. Cosine similarity at query time.
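For readers who want the ranking metric spelled out, the Python sketch below computes cosine similarity and an exact top-k over a small in-memory store. It is a brute-force illustration with hypothetical function names; the HNSW index described above answers the same question approximately and far faster at scale.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|); both vectors must share one dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def top_k(query: list[float], stored: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Exact nearest neighbors by cosine similarity, best match first."""
    scored = [(node_id, cosine_similarity(query, vec)) for node_id, vec in stored.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))[:k]
```

The shared dimension check is the reason a single platform-fixed model matters: vectors from different models would not be comparable.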
Operator configuration¶
The embedding endpoint is self-hosted — no node plaintext leaves
Hadron infrastructure (a privacy posture, FR-026 reinforcer). The
worker calls the configured URL with the platform-supplied model id and
expects either the OpenAI {data: [{embedding}]} shape or the Ollama
batch {embeddings: number[][]} shape.
| Env var | Required when | Default | Notes |
|---|---|---|---|
| EMBEDDING_API_URL | Any memory has vectorIndexEnabled | none | Dev: Ollama (ollama pull nomic-embed-text, batch endpoint …/api/embed). Prod: HuggingFace TEI / vLLM. Register in Doppler. |
| EMBEDDING_MODEL | Always | nomic-embed-text | Sent verbatim to the endpoint; must match the served tag. The Ollama tag nomic-embed-text serves nomic-embed-text-v1.5. |
| EMBEDDING_DIM | Always | 768 | MUST equal the pgvector vector(N) column dimension or embeds are rejected. |
| EMBEDDING_API_KEY | If the endpoint requires auth | none | Optional. |
When EMBEDDING_API_URL is unset, every mode: vector and
mode: hybrid query returns reason: "embedding_unavailable"
(vector half degraded; keyword half still runs on hybrid).
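The two accepted response shapes and the dimension check can be handled in one small function. A minimal Python sketch, assuming the response body has already been decoded from JSON into a dict; the function name is hypothetical and the worker's real parsing code is not shown on this page.

```python
def parse_embeddings(payload: dict, expected_dim: int = 768) -> list[list[float]]:
    """Extract vectors from either response shape and check their dimension."""
    if "data" in payload:
        # OpenAI-style shape: {"data": [{"embedding": [...]}, ...]}
        vectors = [item["embedding"] for item in payload["data"]]
    elif "embeddings" in payload:
        # Ollama batch shape: {"embeddings": [[...], ...]}
        vectors = payload["embeddings"]
    else:
        raise ValueError("unrecognized embedding response shape")
    for vector in vectors:
        if len(vector) != expected_dim:
            # Mirrors the rule that embeds must match EMBEDDING_DIM.
            raise ValueError(
                f"expected dimension {expected_dim}, got {len(vector)}")
    return vectors
```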
For a local dev endpoint, see
Run local LLMs with llama.cpp
(an offline nomic-embed-text server, no daemon) or the Ollama path
above; for the managed production endpoint, see
Configure AWS SageMaker for vector embeddings.
What's reserved for later¶
Not in v1; the storage and API shapes don't foreclose any of these:
- Semantic chunking: embed sentences, cut at similarity drops.
- Message-bus transport: replaces the interim Postgres-backed marker queue.
- Headless abstract generation: would unblock the abstract source for corpora whose clients don't author abstracts.
- Cross-memory / multi-memory ranked search: preserved as possible by the single-model choice, but the query surface currently scopes to one memory or the caller's accessible set.
- Platform model migration: the deliberate whole-corpus re-embed when the platform model changes. The stored model id and the unique-model invariant exist so this migration can happen, but v1 doesn't perform one.
- expand filtering: v1 is undirected over all edge types; per edge type and per direction are reserved.
- Embedding cost controls: rate limits, per-memory caps, per-token quotas. Acknowledged risk because backfill, chunk re-derivation, and config-change re-index all multiplicatively expand embedding work.
See also¶
- Node types: abstract, info, record, system, reference.
- MCP tools: the full h-* surface, including h-find-nodes and h-validate.
- GraphQL API: generated schema reference.
- Data model: generated entity reference, including the Memory and Node fields named on this page.