
Architecture

graph LR
subgraph "Go CLI (imprint binary)"
SETUP[setup]
STATUS[status]
ENABLE[enable / disable]
INGEST[ingest]
REFRESH[refresh]
RETAG[retag]
IURL[ingest-url]
SERVER[server start/stop]
SYNC[sync]
RELAY[relay]
UI[ui]
end
subgraph "Python (imprint/)"
MCP[FastMCP Server<br/><small>12 tools</small>]
VS[vectorstore.py<br/><small>Qdrant HTTP client</small>]
QR[qdrant_runner.py<br/><small>auto-spawn daemon</small>]
EMB[embeddings.py<br/><small>ONNX Runtime, GPU/CPU</small>]
KG[imprint_graph.py<br/><small>SQLite facts</small>]
CHK[chunker.py<br/><small>Chonkie hybrid</small>]
TAG[tagger.py<br/><small>4-source metadata</small>]
CLS[classifier.py<br/><small>type detection</small>]
PRJ[projects.py<br/><small>manifest detection</small>]
QUEUE[queue.py<br/><small>single-slot FIFO<br/>+ dispatcher + cancel</small>]
API[api.py<br/><small>FastAPI dashboard</small>]
end
subgraph "Daemon"
QSRV[(Qdrant server<br/>127.0.0.1:6333)]
end
subgraph "Storage (data/)"
QSTORE[qdrant_storage/<br/><small>vectors + payload</small>]
SQLITE[(SQLite<br/>facts per workspace)]
WAL[wal.jsonl<br/><small>WAL per workspace</small>]
WSCFG[workspace.json<br/><small>active + known</small>]
PROTO[label_prototypes.npy<br/><small>zero-shot cache</small>]
QBIN[qdrant-bin/<br/><small>downloaded binary</small>]
QDB[(queue.sqlite3<br/>+ queue.lock)]
end
MCP --> VS --> QSRV --> QSTORE
MCP --> KG --> SQLITE
VS --> QR
QR --> QBIN
QR --> QSRV
VS --> EMB
VS --> WAL
INGEST --> QUEUE
REFRESH --> QUEUE
RETAG --> QUEUE
IURL --> QUEUE
UI --> API --> QUEUE
QUEUE --> QDB
QUEUE --> CHK --> EMB
QUEUE --> TAG
TAG --zero-shot--> EMB
TAG --zero-shot--> PROTO
TAG -.LLM opt-in.-> LLM_API[LLM API<br/><small>anthropic/openai/<br/>ollama/vllm/gemini</small>]
INGEST --> PRJ
SERVER --> QR
style QSRV fill:#1a1a3a,stroke:#60a5fa,color:#fff
style QSTORE fill:#1a1a3a,stroke:#60a5fa,color:#fff
style SQLITE fill:#1a1a3a,stroke:#4ecdc4,color:#fff
style MCP fill:#0d1117,stroke:#a78bfa,color:#fff
style EMB fill:#0d1117,stroke:#fbbf24,color:#fff
style CHK fill:#0d1117,stroke:#f472b6,color:#fff
style TAG fill:#0d1117,stroke:#34d399,color:#fff
style QR fill:#0d1117,stroke:#ff6b6b,color:#fff
style LLM_API fill:#0d1117,stroke:#fbbf24,color:#fff,stroke-dasharray: 5 5

| Component | Technology | Purpose |
|---|---|---|
| Vector store | Qdrant server (auto-spawned daemon) | HNSW + int8 scalar quantization, payload-indexed filters, multi-client safe |
| Server runner | qdrant_runner.py | Downloads + spawns + supervises the local Qdrant daemon |
| Embeddings | EmbeddingGemma-300M via ONNX Runtime | 768-dim, 2048 ctx. Configurable model via imprint config |
| Chunking | Chonkie 1.6+ | CodeChunker (tree-sitter) + SemanticChunker (topic shifts) + sliding overlap |
| Metadata tagger | Python | Deterministic + keyword dict + zero-shot (default) + opt-in multi-provider LLM |
| Imprint graph | SQLite | Temporal facts with valid_from/ended |
| MCP server | FastMCP (Python) | 12 tools: search/neighbors/graph_scope/list_sources/file_summary/file_chunks for reads, store/delete/ingest_url for writes, kg_query/kg_edit for facts, status for stats. Connects to Qdrant via HTTP. |
| CLI | Go | setup, status, enable, disable, update, uninstall, ingest, learn, ingest-url, refresh, refresh-urls, retag, migrate, server, workspace, wipe, sync, relay, ui, config |
| Relay | Go (nhooyr/websocket) | Stateless WebSocket forwarder for P2P sync |
| Command queue | imprint/queue.py + imprint/queue_lock.py + internal/queuelock | Single-slot FIFO (SQLite at data/queue.sqlite3, advisory flock at data/queue.lock). Serializes heavy jobs across the Go CLI and the FastAPI dispatcher; cancel sends SIGTERM → SIGKILL (3s) to the subprocess process group. See queue.md. |

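The "temporal facts with valid_from/ended" row can be illustrated with a small SQLite sketch. The table layout, the column names other than valid_from and ended, and the helper functions below are all hypothetical; the real schema in imprint_graph.py may differ.

```python
import sqlite3


def open_facts(path=":memory:"):
    """Open (or create) a facts database. Schema is illustrative only."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS facts (
               subject    TEXT NOT NULL,
               predicate  TEXT NOT NULL,
               object     TEXT NOT NULL,
               valid_from TEXT NOT NULL,   -- ISO-8601 date, sorts lexicographically
               ended      TEXT             -- NULL while the fact is still current
           )"""
    )
    return db


def assert_fact(db, subject, predicate, obj, when):
    """Close the currently open fact for (subject, predicate), then add the new one."""
    db.execute(
        "UPDATE facts SET ended = ? "
        "WHERE subject = ? AND predicate = ? AND ended IS NULL",
        (when, subject, predicate),
    )
    db.execute(
        "INSERT INTO facts (subject, predicate, object, valid_from) VALUES (?, ?, ?, ?)",
        (subject, predicate, obj, when),
    )
    db.commit()


def facts_at(db, subject, when):
    """Return (predicate, object) pairs that were valid at the given date."""
    rows = db.execute(
        "SELECT predicate, object FROM facts "
        "WHERE subject = ? AND valid_from <= ? AND (ended IS NULL OR ended > ?) "
        "ORDER BY predicate",
        (subject, when, when),
    )
    return rows.fetchall()
```

Superseded facts are closed rather than deleted, so a query can ask what was believed at any earlier point in time.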
sequenceDiagram
participant U as You
participant C as Claude Code
participant K as Imprint MCP
participant E as Embedding ONNX
participant Q as Qdrant
Note over C,K: Session Start
C->>K: wake_up()
K->>Q: scroll recent points + KG facts
K-->>C: project list + essential context (~800 tokens)
Note over U,C: Working on a task
U->>C: "Why is CORS set to wildcard?"
C->>K: search("CORS wildcard", lang="python", domain="auth")
K->>E: embed query → dense vector
K->>Q: HNSW search + payload filter
Q-->>K: top-K matching chunks + structured tags
K-->>C: ranked results with metadata
C-->>U: answers from memory (no file reads needed)
Note over C,K: Claude learns something
C->>K: store("CORS wildcard is by design because...")
K->>E: embed content → vector
K->>Q: upsert point (vector + payload)
Note over C,K: Session End
C->>K: Stop hook fires (async)
K->>Q: auto-extract decisions, batch upsert
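
The search step in the diagram (embed the query, then run an HNSW search with a payload filter) maps naturally onto a Qdrant REST search request. The sketch below only builds the JSON body, in the shape accepted by Qdrant's POST /collections/{name}/points/search endpoint. It assumes that lang and domain are stored as payload keys of the same name, which this page does not confirm, and build_search_body is a hypothetical helper rather than Imprint's own code.

```python
def build_search_body(vector, limit=5, **tags):
    """Build a Qdrant search request body with optional exact-match payload filters.

    Each keyword argument (for example lang="python") becomes one `must`
    condition, so a point has to match every supplied tag.
    """
    must = [
        {"key": key, "match": {"value": value}}
        for key, value in sorted(tags.items())
        if value is not None
    ]
    body = {"vector": list(vector), "limit": limit, "with_payload": True}
    if must:
        body["filter"] = {"must": must}
    return body


# Usage: a query vector from the embedding model plus two tag filters.
body = build_search_body([0.12, -0.03, 0.88], limit=3, lang="python", domain="auth")
```

Because the filter is evaluated against indexed payload fields, the vector search is restricted to matching points instead of filtering the top-K afterwards.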

Embedded Qdrant (the path=... mode) is single-writer — only one process can hold the on-disk lock. That breaks the moment your MCP server, your hooks, and an imprint ingest all try to write at once. Imprint sidesteps the limitation by auto-spawning a local Qdrant server on 127.0.0.1:6333.

sequenceDiagram
participant MCP as MCP server (Claude Code)
participant H as Stop hook
participant CLI as imprint ingest
participant R as qdrant_runner
participant Q as qdrant daemon
MCP->>R: ensure_running()
alt server not reachable
R->>R: download binary (first time)<br/>spawn detached daemon
R->>Q: start
R->>Q: poll /readyz until 200
end
R-->>MCP: 127.0.0.1:6333
CLI->>R: ensure_running() (server already up)
R-->>CLI: 127.0.0.1:6333
H->>R: ensure_running()
R-->>H: 127.0.0.1:6333
par
MCP->>Q: search (HTTP)
and
CLI->>Q: upsert points (HTTP)
and
H->>Q: upsert decisions (HTTP)
end
Q-->>MCP: results
Q-->>CLI: ack
Q-->>H: ack

qdrant_runner.py handles the lifecycle:

  • First call: downloads the pinned Qdrant binary (~50 MB) from GitHub releases into data/qdrant-bin/, then spawns it with subprocess.Popen([...], start_new_session=True) so the daemon survives the parent process. Logs go to data/qdrant.log and the PID is written to data/qdrant.pid.
  • Subsequent calls: cheap HTTP probe to /readyz — returns immediately if alive.
  • Storage: data/qdrant_storage/ (collection data) + data/qdrant_snapshots/. Both gitignored.
  • Shutdown: imprint server stop (or imprint disable) sends SIGTERM via the PID file.
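
The lifecycle above can be sketched roughly as follows. This is a simplified illustration, not the actual qdrant_runner.py: the download step is omitted, the function signatures are assumptions, and the sketch assumes the binary listens on the probed port without extra arguments.

```python
import http.client
import subprocess
import time
import urllib.request


def is_ready(host="127.0.0.1", port=6333, timeout=0.5):
    """Cheap liveness probe: True if GET /readyz answers 200."""
    try:
        url = f"http://{host}:{port}/readyz"
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a non-2xx answer all mean "not ready".
        return False


def ensure_running(binary, host="127.0.0.1", port=6333,
                   log_path="qdrant.log", pid_path="qdrant.pid", wait_s=30.0):
    """Return "host:port", spawning a detached daemon first if nothing answers."""
    if is_ready(host, port):
        return f"{host}:{port}"          # already up: no spawn needed

    with open(log_path, "ab") as log:
        proc = subprocess.Popen(
            [binary],
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,      # detach so the daemon outlives this process
        )
    with open(pid_path, "w") as fh:
        fh.write(str(proc.pid))

    deadline = time.monotonic() + wait_s
    while time.monotonic() < deadline:   # poll /readyz until it returns 200
        if is_ready(host, port):
            return f"{host}:{port}"
        time.sleep(0.2)
    raise TimeoutError(f"Qdrant did not become ready within {wait_s}s")
```

Every caller (MCP server, hooks, CLI) can run the same function; only the first one pays the spawn cost, and the rest return after a single probe.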

| Env var | Default | Purpose |
|---|---|---|
| IMPRINT_QDRANT_HOST | 127.0.0.1 | Bind / connect host |
| IMPRINT_QDRANT_PORT | 6333 | HTTP port |
| IMPRINT_QDRANT_GRPC_PORT | 6334 | gRPC port |
| IMPRINT_QDRANT_VERSION | v1.17.1 | Pinned release tag |
| IMPRINT_QDRANT_BIN | (auto) | Override binary path (e.g. system-installed qdrant) |
| IMPRINT_QDRANT_NO_SPAWN | 0 | Set to 1 to disable auto-spawn and connect to your own managed server |

Why server mode and not embedded? Embedded mode takes a filesystem lock and rejects any second client. That conflicts with Claude Code (always-on MCP) running alongside imprint ingest, hooks writing decisions, and tools like imprint ui reading the collection. Server mode accepts many concurrent clients at the cost of a single ~50 MB binary in your data dir and a ~50 MB resident process. Worth it.

Bring your own server: set IMPRINT_QDRANT_NO_SPAWN=1 and point IMPRINT_QDRANT_HOST at a Docker container (docker run -p 6333:6333 qdrant/qdrant) or a remote Qdrant instance. With auto-spawn disabled, the runner connects directly to that server.
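
As an illustration of how these variables might be resolved, here is a minimal sketch. The variable names and defaults come from the table above; the QdrantSettings class and load_settings function are hypothetical and only show one plausible way to read them.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QdrantSettings:
    host: str
    port: int
    grpc_port: int
    version: str
    binary: Optional[str]   # None means "use the auto-downloaded binary"
    auto_spawn: bool


def load_settings(env=None):
    """Read the IMPRINT_QDRANT_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return QdrantSettings(
        host=env.get("IMPRINT_QDRANT_HOST", "127.0.0.1"),
        port=int(env.get("IMPRINT_QDRANT_PORT", "6333")),
        grpc_port=int(env.get("IMPRINT_QDRANT_GRPC_PORT", "6334")),
        version=env.get("IMPRINT_QDRANT_VERSION", "v1.17.1"),
        binary=env.get("IMPRINT_QDRANT_BIN") or None,
        # Auto-spawn stays on unless the variable is set to exactly "1".
        auto_spawn=env.get("IMPRINT_QDRANT_NO_SPAWN", "0") != "1",
    )


# Usage: a bring-your-own-server setup pointing at a remote host.
remote = load_settings({"IMPRINT_QDRANT_NO_SPAWN": "1",
                        "IMPRINT_QDRANT_HOST": "qdrant.internal"})
```

Passing the environment as a plain mapping keeps the function easy to exercise without touching the real process environment.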

Ingest / refresh / retag / ingest-url each load the embedding model, scan Qdrant, and — when LLM tagging is enabled — hold a per-batch HTTP connection to the tagger provider. Two of them running in parallel on the same box easily exhaust RAM or VRAM, so they are serialized by a shared advisory lock.

  • Lock file: data/queue.lock guarded by fcntl.flock(LOCK_EX|LOCK_NB). The Go CLI (internal/queuelock) and the Python dispatcher (imprint/queue_lock.py) agree on the path and the JSON body ({pid, job_id, command, started_at}), so whichever process acquires it first blocks the other.
  • CLI semantics: direct invocations of imprint ingest|refresh|retag|ingest-url|refresh-urls try to take the lock without blocking. If it is held, they exit nonzero and print the current holder’s PID, command, and start time; the user then cancels from the UI or kills the PID.
  • UI semantics: POST /api/commands/{cmd} enqueues into data/queue.sqlite3; the FastAPI startup task (queue.dispatcher_loop) pops one queued row at a time, waits blocking on the lock, then Popens the subprocess with start_new_session=True so the child owns its own process group.
  • Cancel: POST /api/jobs/{id}/cancel either marks a queued row cancelled (dispatcher skips it) or, for a running job, fires killpg(pgid, SIGTERM) followed 3 s later by SIGKILL if the group is still alive. Because the child runs in its own session, the escalation reaps the Python subprocess, its httpx worker threads (so the in-flight LLM tagger call drops), any llama-cpp inference thread, and any descendant git ls-files helpers — all together.
  • Restart recovery: queue.recover_on_startup() marks rows in status='running' whose PID is dead as failed (error='api_restart') and clears stale lock files, so the dispatcher resumes cleanly after an API crash.
  • Progress integration: imprint/progress.py keeps its single-slot ingest_progress.json; /api/queue joins it with the active DB row so the UI still sees phase/processed/total/ETA while gaining queue/position/history.
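
The lock-file protocol described above can be sketched as follows (Unix only, since it relies on fcntl). The JSON keys match the body listed above; the function names, the LockHeld exception, and the use of a Unix timestamp for started_at are assumptions made for illustration.

```python
import fcntl
import json
import os
import time


class LockHeld(Exception):
    """Raised when another process (or descriptor) already holds the lock."""

    def __init__(self, holder):
        super().__init__(f"lock held by {holder}")
        self.holder = holder


def try_acquire(path, job_id, command):
    """Take the advisory lock without blocking and record who holds it."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Someone else holds it: read their JSON body so we can report it.
        with os.fdopen(fd, "r") as fh:
            raw = fh.read()
        try:
            holder = json.loads(raw)
        except ValueError:
            holder = {}
        raise LockHeld(holder)
    body = {"pid": os.getpid(), "job_id": job_id,
            "command": command, "started_at": time.time()}
    os.ftruncate(fd, 0)
    os.write(fd, json.dumps(body).encode())
    return fd   # keep the descriptor open for as long as the job runs


def release(fd):
    """Clear the holder info, drop the lock, and close the descriptor."""
    os.ftruncate(fd, 0)
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

A blocking caller such as the dispatcher would use fcntl.LOCK_EX without LOCK_NB instead. The kernel drops the lock automatically if the holder dies, which is why a stale file body alone never blocks a new job.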

See queue.md for endpoint reference, SQLite schema, and verification steps.
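
The cancel escalation for a running job (SIGTERM to the process group, then SIGKILL if the group is still alive after the grace period) might look like the sketch below. It is Unix only, the helper names are hypothetical, and it assumes the child was started with start_new_session=True so that it leads its own process group.

```python
import os
import signal
import subprocess
import time


def group_alive(pgid):
    """True if at least one process in the group still exists."""
    try:
        os.killpg(pgid, 0)          # signal 0 only checks for existence
        return True
    except ProcessLookupError:
        return False
    except PermissionError:
        return True                 # exists, but owned by someone else


def cancel_group(proc, grace_s=3.0):
    """SIGTERM the child's process group, escalating to SIGKILL after grace_s."""
    pgid = os.getpgid(proc.pid)
    os.killpg(pgid, signal.SIGTERM)
    deadline = time.monotonic() + grace_s
    while time.monotonic() < deadline:
        proc.poll()                 # reap the leader so it does not linger as a zombie
        if not group_alive(pgid):
            return "terminated"
        time.sleep(0.05)
    try:
        os.killpg(pgid, signal.SIGKILL)
    except ProcessLookupError:
        pass                        # the group exited just before the escalation
    proc.wait()
    return "killed"


# Usage: start a job in its own session, then cancel it.
job = subprocess.Popen(["sleep", "30"], start_new_session=True)
outcome = cancel_group(job)
```

Signalling the group rather than the single PID is what lets one cancel reach helper processes the job has spawned.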

imprint status # is the system enabled? server pid? memory count?
imprint enable # idempotent re-wire of MCP + hooks + server
imprint disable # stops daemon, removes MCP registration, strips hooks
imprint server start # explicit server boot (auto on first MCP/CLI call)
imprint server stop # SIGTERM the daemon
imprint server status # JSON: pid, host, port, log path
imprint server log # path to qdrant.log for tailing

imprint disable is a kill switch, and imprint enable reverses it. Disable stops the Qdrant daemon, removes the MCP server registration from Claude Code, and strips the imprint hooks from ~/.claude/settings.json. Your venv and data directory are kept intact, so re-enabling is instant, with no re-ingest needed. imprint status shows the current state.

Example status output:

═══ Imprint Status ═══
[+] ENABLED
✓ MCP server registered (Claude Code)
✓ Hooks installed (5 entries)
✓ Qdrant server http://127.0.0.1:6333 (pid 19803)
✓ Python venv /home/you/code/imprint/.venv/bin/python
✓ Data dir /home/you/code/imprint/data
Memories: 14293 across 32 projects
my-web-app (1551)
backend-api (1053)
...