MCP memory layer · works with Claude Code, Cursor, Copilot, Codex, Cline

The context window is dying. Imprint replaces it.

Persistent, semantic memory for your AI coding tools. -70.4% tokens, -31.7% cost across 150 benchmark runs. 100% local by default.

curl -fsSL https://raw.githubusercontent.com/alexandruleca/imprint-memory-layer/main/install.sh | bash
Full benchmark suite · 150 runs · Claude Sonnet
Without Imprint: $2.84 · With Imprint: $1.94
Same prompts. Same model. Memory-served answers replace raw file reads.
Why Imprint

Your agent keeps re-reading the same files. Fix that.

Every session starts blank. Claude Code reads chunker.py for the tenth time. Imprint gives every MCP-capable agent a semantic, local, persistent brain — searched, not grepped.

Local-first

Your code never leaves the box

EmbeddingGemma runs on your CPU/GPU. Qdrant is a local daemon on 127.0.0.1. SQLite on disk. Nothing is sent to a paid API unless you explicitly opt in.

Cost

-31.7% cost across 150 runs

Debug queries alone cost 68.3% less, because known failure modes are served from memory instead of re-read from disk.

Tokens

-70.4% tokens end-to-end

Cross-project questions use 90.6% fewer tokens. Semantic search returns the 5 chunks that matter, not the 3 files that contain them.
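The "chunks, not files" idea can be sketched in a few lines: rank pre-embedded chunks by cosine similarity to the query and keep only the top k. This is a toy illustration only; the real pipeline embeds with EmbeddingGemma and searches Qdrant, and `top_k_chunks` and the vectors below are invented for the example.

```python
import math

def top_k_chunks(query_vec, chunks, k=5):
    """Return the k chunks whose vectors are most similar to the query."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)
    # Sort by similarity, best first, and keep only the chunks that matter.
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return ranked[:k]
```

The agent then receives those k chunks instead of whole files, which is where the token savings come from.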

Zero-config

One command wires every host

`imprint setup all` auto-detects Claude Code, Cursor, Codex CLI, Copilot, and Cline. No sudo. Self-managed .venv. Qdrant auto-spawned. Nothing leaks into your system install.

Same identity, any machine

Project detection that survives renames

Canonical project name from manifests (package.json, go.mod, pyproject.toml), not path. Sync memory between your laptop and a cloud VPS and everything lines up.
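A minimal sketch of manifest-based naming, assuming the obvious fields (the `name` in package.json, the module path in go.mod, the project `name` in pyproject.toml). `canonical_project_name` is a hypothetical helper written for this page, not Imprint's actual code:

```python
import json
import re
from pathlib import Path

def canonical_project_name(root: str):
    """Derive a stable project name from manifests, not from the path."""
    root = Path(root)
    pkg = root / "package.json"
    if pkg.exists():
        return json.loads(pkg.read_text()).get("name")
    gomod = root / "go.mod"
    if gomod.exists():
        m = re.search(r"^module\s+(\S+)", gomod.read_text(), re.M)
        if m:
            # Last segment of the module path, e.g. github.com/acme/imprint -> imprint
            return m.group(1).rsplit("/", 1)[-1]
    pyproject = root / "pyproject.toml"
    if pyproject.exists():
        m = re.search(r'^name\s*=\s*"([^"]+)"', pyproject.read_text(), re.M)
        if m:
            return m.group(1)
    return None  # caller falls back to the directory name
```

Because the name comes from the manifest, renaming or relocating the checkout on another machine resolves to the same memory.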

Temporal graph

Not just vectors — facts with a timeline

A SQLite knowledge graph tracks subject → predicate → object with validity windows. Decisions, bug fixes, and patterns are recalled with the context of when they were true.
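Validity windows can be sketched in plain SQLite: each fact carries a `valid_from`/`valid_to` range, and queries filter on a point in time. The `facts` table and `facts_at` helper below are illustrative, not Imprint's actual schema:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE facts (
        subject TEXT, predicate TEXT, object TEXT,
        valid_from TEXT, valid_to TEXT  -- NULL means "still true"
    )
""")
# Record a decision, then supersede it later.
db.execute("INSERT INTO facts VALUES ('api', 'uses_db', 'postgres', '2024-01-01', '2024-06-01')")
db.execute("INSERT INTO facts VALUES ('api', 'uses_db', 'sqlite',   '2024-06-01', NULL)")

def facts_at(subject, when):
    """Return the facts about `subject` that were valid at time `when`."""
    rows = db.execute(
        "SELECT predicate, object FROM facts "
        "WHERE subject = ? AND valid_from <= ? "
        "AND (valid_to IS NULL OR valid_to > ?)",
        (subject, when, when),
    )
    return dict(rows.fetchall())
```

Asking "what database did the API use in March?" returns the old answer, while today's query returns the current one, so recalled decisions carry the context of when they were true.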

How it works

Everything in the dashed box runs on your machine.

Your AI agent speaks MCP to a local Imprint server. Imprint talks to a Qdrant daemon on 127.0.0.1:6333, a SQLite fact graph, and an on-device ONNX embedding model. No API call leaves this box — unless you deliberately flip a switch.

[Architecture diagram]
Your AI coding agent (Claude Code · Cursor · Copilot · Codex · Cline) speaks MCP over stdio to the Imprint MCP server (search · store · kg_query, 12 tools, Python + FastMCP).
The server talks over HTTP to a local Qdrant daemon (127.0.0.1:6333, int8 quantized, on-disk payload indexes) and writes facts to the Imprint Graph (SQLite: subject → predicate → object, temporal facts, per-workspace WAL journal).
Embeddings come from on-device ONNX EmbeddingGemma (768-dim, GPU/CPU).
Hooks: Stop → auto-extract · PreCompact → save.
Ingestion flow: Chunker · Tagger (scan → chunk → embed → tag → upsert).
Opt-in only: optional cloud LLM tagging via Anthropic, OpenAI, or Gemini, off by default.
Other machine: an Imprint peer syncs via relay.
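The scan → chunk → embed → tag → upsert ingestion flow can be sketched as below. The fixed-size chunker, identifier-based tagger, and list-based store are simplified stand-ins invented for this example; the real pipeline embeds with ONNX EmbeddingGemma and upserts to Qdrant.

```python
from pathlib import Path

def chunk(text, max_lines=40):
    """Split a file into fixed-size line windows (a stand-in chunking rule)."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]

def ingest(root, embed, store):
    for path in Path(root).rglob("*.py"):                    # scan
        for i, piece in enumerate(chunk(path.read_text())):  # chunk
            store.append({                                    # upsert (append as stand-in)
                "id": f"{path}:{i}",
                "vector": embed(piece),                       # embed
                # tag: keep a few identifier-like tokens as crude topics
                "tags": sorted({w for w in piece.split() if w.isidentifier()})[:5],
            })
```

In the real system this runs behind the hooks above, so memory is refreshed without the agent re-reading files.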
Default posture

All storage, embeddings, and tagging stay on-device. Nothing crosses the dashed line.

Opt-in cloud LLM tagging

A single command, imprint config set tagger.llm true, enables LLM-quality topics. Off by default.

Multi-machine sync

WebSocket relay lets a laptop and a VPS share memory while keeping the data plane peer-to-peer.

Benchmarks

Real numbers, reproducible in one command.

Every prompt runs 5 times in each mode. Medians are reported. Raw JSON per run is checked in. See the full deep-dive or BENCHMARK.md on GitHub.

See every prompt →
Information: -87.2% tokens · -42.6% cost
Decision Recall: -78.8% tokens · -46.1% cost
Debugging: -94.2% tokens · -68.3% cost
Cross-Project: -90.6% tokens · -46.9% cost
Session Summary: +179.6% tokens · +204.1% cost
Creation: +9.9% tokens · +15.1% cost
All figures are ON vs OFF medians per category. Session Summary and Creation are honest losses: ON mode explored more context than it needed.
Any MCP agent

One memory. Every coding tool you use.

imprint setup all auto-wires every supported host that's installed. Missing tools are skipped with a warning, never a failure.

Claude Code · Config: ~/.claude/settings.json · Enforcement: Hard (PreToolUse)
Cursor · Config: ~/.cursor/mcp.json + rules/imprint.mdc · Enforcement: Rule-driven
GitHub Copilot · Config: <VSCode user>/mcp.json · Enforcement: Text-only
Codex CLI · Config: ~/.codex/config.toml · Enforcement: Text-only
Cline · Config: VSCode globalStorage / ~/.cline · Enforcement: Text-only
Install

30 seconds. No sudo. No system pollution.

Self-contained .venv, auto-downloaded Qdrant binary, and every host configured in one run. imprint disable is symmetric. Prefer a direct download? Grab the latest build from the download page.

curl -fsSL https://raw.githubusercontent.com/alexandruleca/imprint-memory-layer/main/install.sh | bash
Then: imprint setup all · imprint ingest .

Give your agent a memory worth remembering.

Apache 2.0. Self-updating. Uninstall with one command. Start on your laptop, sync to a VPS when you're ready.