
Configuration

All settings can be persisted via `imprint config` instead of setting environment variables. Settings are stored in `data/config.json` (gitignored).

Precedence: env var > `config.json` > hardcoded default. Environment variables always win, so you can override `config.json` for one-off runs.
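That lookup order can be sketched in a few lines. This is illustrative only, not imprint's actual code; the `DEFAULTS` sample values and the key-to-env-var mapping (beyond `IMPRINT_DEVICE`, which appears in the examples below) are assumptions:

```python
import json
import os

DEFAULTS = {"model.device": "auto", "qdrant.port": 6333}  # sample hardcoded defaults
ENV_NAMES = {"model.device": "IMPRINT_DEVICE"}            # assumed key -> env-var mapping

def resolve(key, config_path="data/config.json"):
    """Resolve a setting: env var > config.json > hardcoded default."""
    env_name = ENV_NAMES.get(key)
    if env_name and env_name in os.environ:
        return os.environ[env_name]           # env always wins (one-off override)
    try:
        with open(config_path) as f:
            overrides = json.load(f)          # values persisted by `imprint config set`
    except FileNotFoundError:
        overrides = {}
    return overrides.get(key, DEFAULTS[key])  # fall back to the built-in default
```

So `IMPRINT_DEVICE=gpu` shadows whatever `config.json` says for `model.device` for that one process, without persisting anything.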

```sh
# Switch to a different embedding model
imprint config set model.name nomic-ai/nomic-embed-text-v2-moe
imprint config set model.dim 768
imprint config set model.seq_length 512

# Use local Ollama for LLM tagging
imprint config set tagger.llm true
imprint config set tagger.llm_provider ollama
imprint config set tagger.llm_model llama3.2

# Or fully in-process via llama-cpp (no server to run)
imprint config set tagger.llm true
imprint config set tagger.llm_provider local

# Custom Qdrant server
imprint config set qdrant.host 192.168.1.50
imprint config set qdrant.no_spawn true

# See what's changed
imprint config

# Show one setting + where the current value came from
imprint config get model.device

# One-off env override (doesn't persist)
IMPRINT_DEVICE=gpu imprint ingest ~/code

# Remove one override
imprint config reset model.device

# Wipe all overrides
imprint config reset --all
```

`imprint config` groups settings visually by prefix (`model.*`, `qdrant.*`, `chunker.*`, `tagger.*`, `tagger.local.*`, `ingest.*`, `chat.*`, `summarizer.*`). Non-default values are highlighted by source (`config.json` cyan, env yellow).

Embedding model

| Key | Default | Description |
| --- | --- | --- |
| `model.name` | `onnx-community/embeddinggemma-300m-ONNX` | HuggingFace embedding model repo |
| `model.file` | `auto` | ONNX model file (`auto` = pick by device) |
| `model.device` | `auto` | Compute device: `auto` / `cpu` / `gpu` |
| `model.dim` | `768` | Embedding vector dimension |
| `model.seq_length` | `2048` | Token cap per embed call |
| `model.threads` | `4` | CPU intra-op threads for ONNX |
| `model.gpu_mem_mb` | `2048` | VRAM cap for the ORT CUDA arena (MB) |
| `model.gpu_device` | `0` | CUDA device ID |
| `model.batch_size` | `0` | Embedding batch size (`0` = auto: 32 GPU, 16 CPU) |
| `model.pooling` | `auto` | Pooling strategy: `auto` / `cls` / `mean` / `last` |
Qdrant

| Key | Default | Description |
| --- | --- | --- |
| `qdrant.host` | `127.0.0.1` | Qdrant bind/connect host |
| `qdrant.port` | `6333` | Qdrant HTTP port |
| `qdrant.grpc_port` | `6334` | Qdrant gRPC port |
| `qdrant.version` | `v1.17.1` | Pinned Qdrant release for auto-download |
| `qdrant.no_spawn` | `false` | Skip auto-spawn (BYO server) |
Chunker

| Key | Default | Description |
| --- | --- | --- |
| `chunker.overlap` | `400` | Sliding overlap chars between chunks |
| `chunker.size_code` | `4000` | Soft target chunk size for code |
| `chunker.size_prose` | `6000` | Soft target chunk size for prose |
| `chunker.hard_max` | `8000` | Absolute max chunk size |
| `chunker.semantic_threshold` | `0.5` | Topic-shift threshold for SemanticChunker (lower = sharper splits) |
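The size and overlap knobs compose as a sliding window. A minimal sketch of that idea (not imprint's actual chunker, which also splits on semantic boundaries via `chunker.semantic_threshold`):

```python
def chunk(text, size=4000, overlap=400, hard_max=8000):
    """Sliding-window split: aim for ~`size` chars per chunk, repeat the
    last `overlap` chars at each boundary, never exceed `hard_max`."""
    if len(text) <= hard_max:
        return [text]
    chunks, start = [], 0
    while start < len(text):
        end = min(start + size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # next chunk re-reads the overlap region
    return chunks
```

The overlap means each boundary's context appears in two chunks, so a query matching text near a split still retrieves a chunk containing its surroundings.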
Tagger

| Key | Default | Description |
| --- | --- | --- |
| `tagger.zero_shot` | `true` | Enable zero-shot topic tagging |
| `tagger.llm` | `false` | Enable LLM topic tagging (replaces zero-shot during ingest/refresh) |
| `tagger.llm_provider` | `anthropic` | LLM provider: `anthropic` / `openai` / `ollama` / `vllm` / `gemini` / `local` |
| `tagger.llm_model` | `claude-haiku-4-5` | LLM tagger model name |
| `tagger.llm_base_url` | (unset) | LLM tagger API base URL override |
Tagger — local model (llama-cpp, provider=local)

| Key | Default | Description |
| --- | --- | --- |
| `tagger.local.model_repo` | `unsloth/Qwen3-1.7B-GGUF` | HF repo for GGUF auto-download |
| `tagger.local.model_file` | `Qwen3-1.7B-Q4_K_M.gguf` | GGUF filename within the repo |
| `tagger.local.model_path` | (unset) | Absolute path to a local GGUF (overrides repo/file) |
| `tagger.local.n_ctx` | `8192` | Tagger context window in tokens |
| `tagger.local.n_gpu_layers` | `-1` | GPU layers to offload (`-1` = all) |
Ingest (docs + URLs)

| Key | Default | Description |
| --- | --- | --- |
| `ingest.doc_formats` | `pdf,docx,pptx,xlsx,csv,epub,rtf,html,eml,json` | Comma-separated doc formats enabled for the file walker |
| `ingest.ocr_enabled` | `false` | OCR for scanned PDFs + images (requires tesseract) |
| `ingest.ocr_lang` | `eng` | Tesseract language codes (e.g. `eng+fra`) |
| `ingest.max_doc_size_mb` | `25` | Per-file byte cap for document extraction |
| `ingest.url_timeout_sec` | `30` | HTTP connect timeout for URL fetch |
| `ingest.url_read_timeout_sec` | `300` | Per-chunk read timeout for URL fetch (raise for very large files) |
| `ingest.url_user_agent` | `imprint/1.0` | HTTP User-Agent header |
| `ingest.url_respect_robots` | `true` | Check robots.txt before fetching |
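As an illustration of how `ingest.doc_formats` and `ingest.max_doc_size_mb` gate the file walker, a sketch under assumed semantics (not imprint's actual walker code):

```python
from pathlib import Path

# ingest.doc_formats default
DOC_FORMATS = "pdf,docx,pptx,xlsx,csv,epub,rtf,html,eml,json"

def walker_keeps(path, size_bytes, formats=DOC_FORMATS, max_mb=25):
    """Keep a file only if its extension is one of the enabled formats
    and it falls under the per-file cap (ingest.max_doc_size_mb)."""
    enabled = {"." + ext.strip() for ext in formats.split(",")}
    return Path(path).suffix.lower() in enabled and size_bytes <= max_mb * 1024 * 1024
```

Narrowing `ingest.doc_formats` to just the formats you need is the cheapest way to speed up large ingests, since skipped files never reach extraction.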
Chat (dashboard panel)

| Key | Default | Description |
| --- | --- | --- |
| `chat.enabled` | `true` | Enable the dashboard chat panel |
| `chat.provider` | `local` | Chat provider: `local` / `vllm` / `openai` / `ollama` / `gemini` / `anthropic` |
| `chat.model` | (unset) | Model name for remote providers (default per provider) |
| `chat.base_url` | (unset) | Base URL override for OpenAI-compat providers |
| `chat.model_repo` | `unsloth/gemma-4-E4B-it-GGUF` | HF repo for GGUF auto-download (local provider) |
| `chat.model_file` | `gemma-4-E4B-it-Q4_K_M.gguf` | GGUF filename within the repo |
| `chat.model_path` | (unset) | Absolute path to a local GGUF (overrides repo/file) |
| `chat.n_ctx` | `16384` | Chat context window tokens |
| `chat.n_gpu_layers` | `-1` | GPU layers to offload (`-1` = all) |
| `chat.max_tokens` | `1024` | Max tokens per chat response |
| `chat.temperature` | `0.3` | Chat sampling temperature |
| `chat.max_tool_iters` | `6` | Max tool-call iterations per chat turn |
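`chat.max_tool_iters` bounds how many tool calls the model can chain within one turn. A schematic of that loop, with hypothetical `llm` and `run_tool` callables (imprint's real loop differs; this only shows what the cap governs):

```python
def chat_turn(llm, run_tool, messages, max_tool_iters=6):
    """Let the model request tools, run each one, feed the result back,
    and stop once the per-turn tool budget is exhausted."""
    for _ in range(max_tool_iters):
        reply = llm(messages)
        if reply.get("tool") is None:   # model produced a final answer
            return reply["text"]
        messages = messages + [
            {"role": "tool", "content": run_tool(reply["tool"])}
        ]
    return "(stopped: tool budget exhausted)"
```

A low cap keeps a confused model from burning tokens on endless search loops; raise it if your questions genuinely need many retrieval rounds.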
Session summarizer (opt-in, Stop hook)

| Key | Default | Description |
| --- | --- | --- |
| `summarizer.enabled` | `false` | Enable the LLM-based session summarizer |
| `summarizer.provider` | `ollama` | Provider: `ollama` / `vllm` / `anthropic` / `openai` / `gemini` |
| `summarizer.model` | `qwen3:1.7b` | Model name for the chosen provider |
| `summarizer.base_url` | (unset) | API base URL override |
| `summarizer.min_messages` | `5` | Skip sessions with fewer messages |
| `summarizer.max_input_tokens` | `20000` | Truncate transcript before summarizing |
Other

| Key | Default | Description |
| --- | --- | --- |
| `collection` | `memories` | Default Qdrant collection name (workspace suffix is appended automatically) |

Each setting also has an `IMPRINT_*` env var (shown via `imprint config get <key>`). Env vars still work and take priority over `config.json`.

These don’t live in `config.json` because they’re path-shaped infra knobs the config system is not designed to manage. Set them at process start:

| Env var | Default | Purpose |
| --- | --- | --- |
| `IMPRINT_DATA_DIR` | `<repo>/data` or `~/.local/share/imprint/data` | Root for workspaces, Qdrant storage, SQLite graphs, `config.json`. Override for multi-install setups or when running out of a read-only tree. |
| `IMPRINT_QDRANT_BIN` | auto (downloaded into `data/qdrant-bin/`) | Path to a system-installed `qdrant` binary, bypassing the pinned auto-download. |
| `IMPRINT_MCP_IDLE_S` | `30` | Seconds of MCP inactivity after which the server releases the embedded Qdrant client lock so a separate `imprint ingest` can grab it. |
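The idle-release behavior behind `IMPRINT_MCP_IDLE_S` can be pictured as a lock guarded by a last-activity timestamp. This is a hypothetical sketch of the pattern, not imprint's implementation:

```python
import threading
import time

class IdleLock:
    """Hold a lock while active; a periodic watchdog call releases it
    after `idle_s` seconds with no activity, so another process can
    take over the shared resource."""
    def __init__(self, idle_s=30.0):
        self.idle_s = idle_s
        self._lock = threading.Lock()
        self._last = time.monotonic()
        self.held = False

    def acquire(self):
        self._lock.acquire()
        self.held = True
        self._last = time.monotonic()

    def touch(self):
        """Call on every request to reset the idle clock."""
        self._last = time.monotonic()

    def maybe_release(self):
        """Called periodically by a watchdog; releases after idle_s of quiet."""
        if self.held and time.monotonic() - self._last >= self.idle_s:
            self._lock.release()
            self.held = False
```

In imprint's case the shared resource is the embedded Qdrant client: once the MCP server has been quiet for `IMPRINT_MCP_IDLE_S` seconds, a concurrently launched `imprint ingest` can acquire it without killing the server.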