MCP memory layer · works with Claude Code, Cursor, Copilot, Codex, Cline

The context window is dying. Imprint replaces it.

Persistent, semantic memory for your AI coding tools. -70.4% tokens, -31.7% cost across 150 benchmark runs. 100% local by default.

curl -fsSL https://raw.githubusercontent.com/alexandruleca/imprint-memory-layer/main/install.sh | bash
Full benchmark suite · 150 runs · Claude Sonnet
Without Imprint: $2.84 · With Imprint: $1.94
Same prompts. Same model. Memory-served answers replace raw file reads.
Why Imprint

Your agent keeps re-reading the same files. Fix that.

Every session starts blank. Claude Code reads chunker.py for the tenth time. Imprint gives every MCP-capable agent a semantic, local, persistent brain — searched, not grepped.

Local-first

Your code never leaves the box

EmbeddingGemma runs on your CPU/GPU. Qdrant is a local daemon on 127.0.0.1. SQLite on disk. Nothing is sent to a paid API unless you explicitly opt in.

Cost

-31.7% cost across 150 runs

Debug queries alone cost 68.3% less, because known failure modes are served from memory instead of re-read from disk.

Tokens

-70.4% tokens end-to-end

Cross-project questions use 90.6% fewer tokens. Semantic search returns the 5 chunks that matter, not the 3 files that contain them.
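The "chunks, not files" idea can be sketched in a few lines: rank pre-embedded chunks by cosine similarity to the query and keep only the top k. This is a toy illustration only; the real pipeline embeds with EmbeddingGemma and searches Qdrant, and `top_k_chunks` and the vectors below are invented for the example.

```python
import math

def top_k_chunks(query_vec, chunks, k=5):
    """Return the k chunks whose vectors are most similar to the query."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)
    # Sort by similarity, best first, and keep only the chunks that matter.
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return ranked[:k]
```

The agent then receives those k chunks instead of whole files, which is where the token savings come from.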

Zero-config

One command wires every host

`imprint setup all` auto-detects Claude Code, Cursor, Codex CLI, Copilot, and Cline. No sudo. Self-managed .venv. Qdrant auto-spawned. Nothing leaks into your system install.

Same identity, any machine

Project detection that survives renames

Canonical project name from manifests (package.json, go.mod, pyproject.toml), not path. Sync memory between your laptop and a cloud VPS and everything lines up.
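A minimal sketch of manifest-based naming, assuming the obvious fields (the `name` in package.json, the module path in go.mod, the project `name` in pyproject.toml). `canonical_project_name` is a hypothetical helper written for this page, not Imprint's actual code:

```python
import json
import re
from pathlib import Path

def canonical_project_name(root: str):
    """Derive a stable project name from manifests, not from the path."""
    root = Path(root)
    pkg = root / "package.json"
    if pkg.exists():
        return json.loads(pkg.read_text()).get("name")
    gomod = root / "go.mod"
    if gomod.exists():
        m = re.search(r"^module\s+(\S+)", gomod.read_text(), re.M)
        if m:
            # Last segment of the module path, e.g. github.com/acme/imprint -> imprint
            return m.group(1).rsplit("/", 1)[-1]
    pyproject = root / "pyproject.toml"
    if pyproject.exists():
        m = re.search(r'^name\s*=\s*"([^"]+)"', pyproject.read_text(), re.M)
        if m:
            return m.group(1)
    return None  # caller falls back to the directory name
```

Because the name comes from the manifest, renaming or relocating the checkout on another machine resolves to the same memory.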

Temporal graph

Not just vectors — facts with a timeline

A SQLite knowledge graph tracks subject → predicate → object with validity windows. Decisions, bug fixes, and patterns are recalled with the context of when they were true.
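Validity windows can be sketched in plain SQLite: each fact carries a `valid_from`/`valid_to` range, and queries filter on a point in time. The `facts` table and `facts_at` helper below are illustrative, not Imprint's actual schema:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE facts (
        subject TEXT, predicate TEXT, object TEXT,
        valid_from TEXT, valid_to TEXT  -- NULL means "still true"
    )
""")
# Record a decision, then supersede it later.
db.execute("INSERT INTO facts VALUES ('api', 'uses_db', 'postgres', '2024-01-01', '2024-06-01')")
db.execute("INSERT INTO facts VALUES ('api', 'uses_db', 'sqlite',   '2024-06-01', NULL)")

def facts_at(subject, when):
    """Return the facts about `subject` that were valid at time `when`."""
    rows = db.execute(
        "SELECT predicate, object FROM facts "
        "WHERE subject = ? AND valid_from <= ? "
        "AND (valid_to IS NULL OR valid_to > ?)",
        (subject, when, when),
    )
    return dict(rows.fetchall())
```

Asking "what database did the API use in March?" returns the old answer, while today's query returns the current one, so recalled decisions carry the context of when they were true.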

How it works

Everything in the dashed box runs on your machine.

Your AI agent speaks MCP to a local Imprint server. Imprint talks to a Qdrant daemon on 127.0.0.1:6333, a SQLite fact graph, and an on-device ONNX embedding model. No API call leaves this box — unless you deliberately flip a switch.

[Architecture diagram]
Your AI coding agent (Claude Code · Cursor · Copilot · Codex · Cline) speaks MCP over stdio to the Imprint MCP server (search · store · kg_query, 12 tools, Python + FastMCP).
The server talks over HTTP to a local Qdrant daemon (127.0.0.1:6333, int8 quantized, on-disk payload indexes) and writes facts to the Imprint Graph (SQLite: subject → predicate → object, temporal facts, per-workspace WAL journal).
Embeddings come from on-device ONNX EmbeddingGemma (768-dim, GPU/CPU).
Hooks: Stop → auto-extract · PreCompact → save.
Ingestion flow: Chunker · Tagger (scan → chunk → embed → tag → upsert).
Opt-in only: optional cloud LLM tagging via Anthropic, OpenAI, or Gemini, off by default.
Other machine: an Imprint peer syncs via relay.
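The scan → chunk → embed → tag → upsert ingestion flow can be sketched as below. The fixed-size chunker, identifier-based tagger, and list-based store are simplified stand-ins invented for this example; the real pipeline embeds with ONNX EmbeddingGemma and upserts to Qdrant.

```python
from pathlib import Path

def chunk(text, max_lines=40):
    """Split a file into fixed-size line windows (a stand-in chunking rule)."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]

def ingest(root, embed, store):
    for path in Path(root).rglob("*.py"):                    # scan
        for i, piece in enumerate(chunk(path.read_text())):  # chunk
            store.append({                                    # upsert (append as stand-in)
                "id": f"{path}:{i}",
                "vector": embed(piece),                       # embed
                # tag: keep a few identifier-like tokens as crude topics
                "tags": sorted({w for w in piece.split() if w.isidentifier()})[:5],
            })
```

In the real system this runs behind the hooks above, so memory is refreshed without the agent re-reading files.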
Default posture

All storage, embeddings, and tagging stay on-device. Nothing crosses the dashed line.

Opt-in cloud LLM tagging

A single command, imprint config set tagger.llm true, enables LLM-quality topics. Off by default.

Multi-machine sync

WebSocket relay lets a laptop and a VPS share memory while keeping the data plane peer-to-peer.

Benchmarks

Real numbers, reproducible in one command.

Every prompt runs 5 times in each mode. Medians are reported. Raw JSON per run is checked in. See the full deep-dive or BENCHMARK.md on GitHub.

See every prompt →
Information: -87.2% tokens · -42.6% cost
Decision Recall: -78.8% tokens · -46.1% cost
Debugging: -94.2% tokens · -68.3% cost
Cross-Project: -90.6% tokens · -46.9% cost
Session Summary: +179.6% tokens · +204.1% cost
Creation: +9.9% tokens · +15.1% cost
All figures are ON vs OFF medians per category. Session Summary and Creation are honest losses: ON mode explored more context than it needed.
Any MCP agent

One memory. Every coding tool you use.

imprint setup all auto-wires every supported host that's installed. Missing tools are skipped with a warning, never a failure.

Claude Code · Config: ~/.claude/settings.json · Enforcement: Hard (PreToolUse)
Cursor · Config: ~/.cursor/mcp.json + rules/imprint.mdc · Enforcement: Rule-driven
GitHub Copilot · Config: <VSCode user>/mcp.json · Enforcement: Text-only
Codex CLI · Config: ~/.codex/config.toml · Enforcement: Text-only
Cline · Config: VSCode globalStorage / ~/.cline · Enforcement: Text-only
Install

30 seconds. No sudo. No system pollution.

Self-contained .venv, auto-downloaded Qdrant binary, and every host configured in one run. imprint disable is symmetric. Prefer a direct download? Grab the latest build from the download page.

curl -fsSL https://raw.githubusercontent.com/alexandruleca/imprint-memory-layer/main/install.sh | bash
Then: imprint setup all · imprint ingest .

Give your agent a memory worth remembering.

Apache 2.0. Self-updating. Uninstall with one command. Start on your laptop, sync to a VPS when you're ready.