The Problem: AI Amnesia

Every time you start a new AI session — whether that's Claude Code, Open WebUI, or a custom agent — it starts from zero. It doesn't remember what you worked on yesterday, what conventions you prefer, or what patterns you've already established. For a homelab where I'm constantly iterating on the same infrastructure, this was frustrating.

The obvious first approach is markdown files — a CLAUDE.md, a project notes file, or a running log of decisions. That works fine when the knowledge base is small. But as it grows, you're loading hundreds or thousands of lines of context just to start a session, most of which isn't relevant to what you're about to work on. You burn through your context window before you've written a single line of code, and the AI has to wade through everything to find the parts that actually matter.

I wanted my AI tools to remember things between sessions — past decisions, code patterns, project context, and my preferences. So I built a persistent memory system using vector embeddings and semantic search.

The Architecture

The system has three core components running on my AI server (homelab-ai):

  • Memory Service (port 8200) — A custom FastAPI REST API that handles storing and querying memories
  • ChromaDB (port 8100) — A vector database that stores the actual embeddings
  • Ollama (port 11434) — Provides the embedding model (nomic-embed-text) that converts text into vectors

The flow is simple: when you store a memory, the service sends the text to Ollama to generate an embedding vector, then stores both the text and vector in ChromaDB. When you query, your question gets embedded the same way, and ChromaDB finds the most semantically similar stored memories.
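To make the mechanics concrete, here's a toy version of that round trip. This is not the actual service: `fake_embed` is a crude stand-in for the Ollama embedding call, and a plain list stands in for ChromaDB, but the store-then-search-by-distance flow is the same.

```python
import math

def fake_embed(text):
    """Stand-in for nomic-embed-text: a crude bag-of-letters vector.
    A real embedding model produces a dense vector that captures meaning,
    not just spelling."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def distance(a, b):
    """Cosine distance: 0 means identical direction, larger means less similar."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

memories = []  # ChromaDB stand-in: (vector, text) pairs

def store(text):
    memories.append((fake_embed(text), text))

def query(text, n_results=3):
    qvec = fake_embed(text)
    scored = sorted((distance(qvec, vec), doc) for vec, doc in memories)
    return scored[:n_results]

store("Traefik handles reverse proxying with Let's Encrypt SSL")
store("Backups run nightly to MinIO")
results = query("How does SSL work?")
```

Swap `fake_embed` for a real model and the list for a vector database, and that's essentially the whole system.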

The Memory API

The REST API is straightforward:

# Store a memory
POST /memory/store
{
  "content": "The homelab uses Traefik for reverse proxying with Let's Encrypt SSL",
  "collection": "project_context",
  "tags": ["infrastructure", "traefik"]
}

# Semantic search
POST /memory/query
{
  "query": "How does SSL work in my homelab?",
  "n_results": 5
}

# List recent memories
GET /memory/list/sessions?limit=10

# Health check
GET /health

Results come back with a distance score — lower means more relevant. A query about "SSL certificates" would match the Traefik memory even though the two share almost no exact words, because the embeddings capture the semantic relationship.
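A minimal Python client for these endpoints could look like the sketch below. The base URL and the response field names (`results`, `distance`, `content`) are my assumptions based on the descriptions above, not the service's actual schema.

```python
import json
import urllib.request

BASE_URL = "http://homelab-ai:8200"  # assumed host/port

def post(path, payload):
    """POST JSON to the memory API and return the decoded response."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def top_results(results, max_distance=0.5):
    """Drop weak matches and order the rest; lower distance = more relevant.
    The 0.5 threshold is a tunable guess."""
    return sorted(
        (r for r in results if r["distance"] <= max_distance),
        key=lambda r: r["distance"],
    )

# Usage (requires the service running):
#   hits = post("/memory/query", {"query": "How does SSL work?", "n_results": 5})
#   for hit in top_results(hits["results"]):
#       print(hit["distance"], hit["content"])
```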

Organizing Memories Into Collections

I use four collections to keep things organized:

  • sessions — AI session summaries and learnings. After a productive session, key takeaways get stored here regardless of which tool I was using.
  • code_patterns — Reusable patterns and solutions. When I figure out something tricky, it goes here so future sessions can find it.
  • project_context — Project-specific decisions and architecture notes. Why we chose Traefik over Nginx, how the network is laid out, etc.
  • user_preferences — My conventions and preferences. Coding style, preferred tools, naming patterns.

Memory Domains

Collections handle the type of memory, but domains handle who gets access to it. Each domain has its own set of ChromaDB collections and its own access permissions, so different AI tools and agents only see what they're supposed to.

I currently run three domains:

  • Homelab — Infrastructure knowledge, service configurations, troubleshooting history, network topology. This is what most of the article covers. Most AI tools on the network have read access here.
  • Personal — Travel memories, life events, personal notes. More restricted — only specific tools and agents can query this domain.
  • Financial — Financial details and records. Tightly locked down. A general-purpose chatbot in Open WebUI should never have access to this, not even through transparent RAG injection.

The domain boundary is enforced at the API level. Each request specifies a domain, and the service checks whether the caller's API key is authorized for that domain before touching ChromaDB. This means you can give an Ollama-backed chatbot broad access to homelab knowledge without worrying it'll accidentally surface financial records in a response.
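The check itself can be very simple. This sketch is illustrative only — the key names, domain names, and how the mapping is stored are all assumptions, not the service's real code:

```python
# Hypothetical API-key -> allowed-domains mapping. In a real deployment this
# would live in config or a secrets store, not in source.
API_KEY_DOMAINS = {
    "openwebui-chatbot": {"homelab"},
    "al1s-worker": {"homelab", "personal"},
    "finance-agent": {"financial"},
}

def authorize(api_key: str, domain: str) -> bool:
    """Return True only if this key may touch this domain's collections."""
    return domain in API_KEY_DOMAINS.get(api_key, set())

# In the request handler, this runs before any ChromaDB call; a False result
# becomes an HTTP 403 and the query never reaches the vector store.
```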

Who Uses the Memory?

This is a model-agnostic system — it doesn't care whether the consumer is a cloud model, a local Ollama model, or a custom Python agent. Anything that can make an HTTP request can use it.

AI Assistant Skill

A custom skill lets me type /memory save or /memory recall [topic] during any AI session. It shells out to curl against the REST API. This is how I persist learnings across sessions regardless of which tool I'm using.

Open WebUI (Transparent RAG)

This is the most interesting integration. I built a RAG filter that automatically intercepts every message before it reaches the LLM — whether that's llama3.2, mistral, gemma, or any other model loaded in Ollama. It queries the memory API with my message, filters by relevance, and prepends matching memories into the system prompt. The model gets relevant context without me having to ask for it, and it works identically across every model I switch between.
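Stripped down, the injection step looks something like this. The field names and distance threshold are my stand-ins; in the real filter, this logic would run inside Open WebUI's inlet hook, right after querying the memory API with the user's message.

```python
def inject_context(messages, memories, max_distance=0.6):
    """Prepend relevant memories to the chat as a system message.

    `messages` is OpenAI-style chat history ({role, content} dicts);
    `memories` are query results with `distance` and `content` fields
    (names assumed). Memories above the distance threshold are discarded.
    """
    relevant = [m["content"] for m in memories if m["distance"] <= max_distance]
    if not relevant:
        return messages  # nothing useful matched; pass the chat through untouched
    context = "Relevant memories from past sessions:\n" + "\n".join(
        f"- {c}" for c in relevant
    )
    return [{"role": "system", "content": context}] + list(messages)
```

Because the injection happens before the request reaches Ollama, it works the same no matter which model is loaded.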

AL-1S Worker Agents

My custom Python automation agents (the AL-1S workers) query and write to the memory service as part of their workflows. When an agent troubleshoots a service outage, it logs the resolution to memory so future agents — and future me — can find it. These agents talk directly to the API with their own domain-scoped keys.

Wiki Sync

An async Python service polls the memory collections every 60 seconds and auto-publishes to Wiki.js. It uses Ollama (llama3.2) to summarize each memory entry before publishing, and generates a "What's New" dashboard showing recent activity. This gives me a browsable knowledge base that stays in sync with what the AI knows.
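Stripped of the Wiki.js and Ollama specifics, the polling loop reduces to something like this — the hook signatures are assumptions, and the demo stubs below exist only to show the flow without any network calls:

```python
import asyncio

async def sync_once(fetch_memories, summarize, publish):
    """One polling pass: summarize each memory entry and publish it."""
    for memory in await fetch_memories():
        summary = await summarize(memory["content"])  # llama3.2 in the real service
        await publish(memory["id"], summary)

async def run_forever(fetch_memories, summarize, publish, interval=60):
    """Poll on a fixed interval, as the real service does every 60 seconds."""
    while True:
        await sync_once(fetch_memories, summarize, publish)
        await asyncio.sleep(interval)

# Demo with stub hooks: collect what would be published to Wiki.js.
published = []

async def _fetch():
    return [{"id": "m1", "content": "Traefik handles SSL"}]

async def _summarize(text):
    return text[:20]  # stand-in for the LLM summary

async def _publish(page_id, body):
    published.append((page_id, body))

asyncio.run(sync_once(_fetch, _summarize, _publish))
```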

The Embedding Model

I use Ollama's nomic-embed-text model for generating embeddings. It's lightweight enough to run alongside my LLM workloads and produces good quality embeddings for technical content. The same model is shared between the memory service and Open WebUI's built-in RAG features.
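Getting a vector out of Ollama is a single HTTP call to its embeddings endpoint. A minimal sketch, assuming Ollama is on its default port:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # port from the article

def build_embed_request(text, model="nomic-embed-text"):
    """Payload for Ollama's /api/embeddings endpoint."""
    return {"model": model, "prompt": text}

def embed(text, model="nomic-embed-text"):
    """Return the embedding vector for `text` (requires Ollama running)."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/embeddings",
        data=json.dumps(build_embed_request(text, model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]

# Usage (requires Ollama with the model pulled):
#   vec = embed("Traefik handles SSL termination")
```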

Deployment

Everything runs in Docker on homelab-ai, managed by Ansible:

# From ai/docker-compose.yml
claude-memory:
  image: claude-memory-service:latest
  ports:
    - "8200:8200"
  environment:
    - CHROMA_HOST=chromadb
    - CHROMA_PORT=8000  # container-internal port (host 8100 maps to it below)
    - OLLAMA_HOST=host.docker.internal
    - OLLAMA_PORT=11434

chromadb:
  image: chromadb/chroma:latest
  ports:
    - "8100:8000"
  volumes:
    - chromadb_data:/chroma/chroma

Traefik provides HTTPS at memory.local.brianrogers.dev (with a themed alias at databank.local.brianrogers.dev because everything in my homelab has a Star Wars name).

Backups

ChromaDB data gets backed up daily at 3 AM via a systemd timer. The backup script snapshots the data directory with rsync, compresses the copy into a tar.gz, keeps seven days of local copies, and pushes the archive to MinIO (my self-hosted S3-compatible storage). There's a restore script that handles stopping ChromaDB, swapping the data, and restarting — it even creates a pre-restore safety backup.

What Makes It Useful

The transparent RAG integration in Open WebUI is the killer feature. When I ask a question about my homelab, the AI already knows my setup because relevant memories are silently injected into the context. It's the difference between starting every conversation from scratch and having a knowledgeable assistant that actually remembers your infrastructure.

The wiki sync is a nice bonus — it turns the AI's accumulated knowledge into a searchable knowledge base that's useful even without an AI interface.

Getting Started

If you want to build something similar:

  1. Start with ChromaDB and Ollama — they're the foundation
  2. Build a simple REST API that wraps ChromaDB's client library
  3. Pick an embedding model (nomic-embed-text is a solid default)
  4. Create a few collections to organize your memories
  5. Build integrations one at a time — start with manual store/query, then add transparent RAG

The whole system is maybe 400 lines of Python for the memory service, plus another couple hundred for each integration. Nothing fancy, but it fundamentally changes how useful AI tools are when they can remember what you've told them.