Memory

Clawdia memory is plain Markdown in the agent workspace. The files are the source of truth; the model only “remembers” what gets written to disk. Memory search tools are provided by the active memory plugin (default: memory-core). Disable memory plugins with plugins.slots.memory = "none".
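For example, to turn the memory slot off entirely (a minimal sketch in the same config style as the examples below, assuming plugins sits at the top level of the config):
{
  plugins: {
    slots: {
      memory: "none"
    }
  }
}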

Memory files (Markdown)

The default workspace layout uses two memory layers:
  • memory/YYYY-MM-DD.md
    • Daily log (append-only).
    • Read today + yesterday at session start.
  • MEMORY.md (optional)
    • Curated long-term memory.
    • Only load in the main, private session (never in group contexts).
These files live under the workspace (agents.defaults.workspace, default ~/clawd). See Agent workspace for the full layout.
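For example, to set the workspace explicitly (shown here with the default path):
agents: {
  defaults: {
    workspace: "~/clawd"
  }
}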

When to write memory

  • Decisions, preferences, and durable facts go to MEMORY.md.
  • Day-to-day notes and running context go to memory/YYYY-MM-DD.md.
  • If someone says “remember this,” write it down (do not keep it in RAM).
  • This area is still evolving. It helps to remind the model to store memories; it will know what to do.
  • If you want something to stick, ask the bot to write it into memory.
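For instance, after a “remember this” request, an appended daily-log entry might look like this (contents purely illustrative):
# memory/2025-06-12.md
- Prefers short, bullet-style answers in group chats.
- Decision: the gateway stays on the Mac Studio.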

Automatic memory flush (pre-compaction ping)

When a session is close to auto-compaction, Clawdia triggers a silent agentic turn that reminds the model to write durable memory before the context is compacted. The default prompts explicitly allow the model to reply, but NO_REPLY is usually the correct response, so the user never sees this turn. This is controlled by agents.defaults.compaction.memoryFlush:
{
  agents: {
    defaults: {
      compaction: {
        reserveTokensFloor: 20000,
        memoryFlush: {
          enabled: true,
          softThresholdTokens: 4000,
          systemPrompt: "Session nearing compaction. Store durable memories now.",
          prompt: "Write any lasting notes to memory/YYYY-MM-DD.md; reply with NO_REPLY if nothing to store."
        }
      }
    }
  }
}
Details:
  • Soft threshold: flush triggers when the session token estimate crosses contextWindow - reserveTokensFloor - softThresholdTokens (worked example after this list).
  • Silent by default: prompts include NO_REPLY so nothing is delivered.
  • Two prompts: the reminder is delivered via both a user prompt and a system-prompt append.
  • One flush per compaction cycle (tracked in sessions.json).
  • Workspace must be writable: if the session runs sandboxed with workspaceAccess: "ro" or "none", the flush is skipped.
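For example, with a (hypothetical) 200,000-token context window and the defaults above, the flush fires once the session estimate crosses 200000 - 20000 - 4000 = 176,000 tokens.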
For the full compaction lifecycle, see Session management + compaction.

Memory search (vector index)

Clawdia can build a small vector index over MEMORY.md and memory/*.md so semantic queries can find related notes even when wording differs. Defaults:
  • Enabled by default.
  • Watches memory files for changes (debounced).
  • Uses remote embeddings by default. If memorySearch.provider is not set, Clawdia auto-selects:
    1. local if a memorySearch.local.modelPath is configured and the file exists.
    2. openai if an OpenAI key can be resolved.
    3. gemini if a Gemini key can be resolved.
    4. Otherwise memory search stays disabled until configured.
  • Local mode uses node-llama-cpp and may require pnpm approve-builds.
  • Uses sqlite-vec (when available) to accelerate vector search inside SQLite.
Remote embeddings require an API key for the embedding provider. Clawdia resolves keys from auth profiles, models.providers.*.apiKey, or environment variables. Codex OAuth only covers chat/completions and does not satisfy embeddings for memory search. For Gemini, use GEMINI_API_KEY or models.providers.google.apiKey. When using a custom OpenAI-compatible endpoint, set memorySearch.remote.apiKey (and optional memorySearch.remote.headers).

Gemini embeddings (native)

Set the provider to gemini to use the Gemini embeddings API directly:
agents: {
  defaults: {
    memorySearch: {
      provider: "gemini",
      model: "gemini-embedding-001",
      remote: {
        apiKey: "YOUR_GEMINI_API_KEY"
      }
    }
  }
}
Notes:
  • remote.baseUrl is optional (defaults to the Gemini API base URL).
  • remote.headers lets you add extra headers if needed.
  • Default model: gemini-embedding-001.

OpenAI-compatible endpoints

If you want to use a custom OpenAI-compatible endpoint (OpenRouter, vLLM, or a proxy), use the remote configuration with the openai provider:
agents: {
  defaults: {
    memorySearch: {
      provider: "openai",
      model: "text-embedding-3-small",
      remote: {
        baseUrl: "https://api.example.com/v1/",
        apiKey: "YOUR_OPENAI_COMPAT_API_KEY",
        headers: { "X-Custom-Header": "value" }
      }
    }
  }
}
If you don’t want to set an API key, use memorySearch.provider = "local" or set memorySearch.fallback = "none".
Fallbacks:
  • memorySearch.fallback can be openai, gemini, local, or none.
  • The fallback provider is only used when the primary embedding provider fails.
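For example, a minimal keyless setup that stays fully local (a sketch; per “Local embedding auto-download” below, the default local model is used when no modelPath is configured):
agents: {
  defaults: {
    memorySearch: {
      provider: "local",
      fallback: "none"
    }
  }
}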
Batch indexing (OpenAI + Gemini):
  • Enabled by default for OpenAI and Gemini embeddings. Set agents.defaults.memorySearch.remote.batch.enabled = false to disable.
  • Default behavior waits for batch completion; tune remote.batch.wait, remote.batch.pollIntervalMs, and remote.batch.timeoutMinutes if needed.
  • Set remote.batch.concurrency to control how many batch jobs we submit in parallel (default: 2).
  • Batch mode applies when memorySearch.provider = "openai" or "gemini" and uses the corresponding API key.
  • Gemini batch jobs use the async embeddings batch endpoint and require Gemini Batch API availability.
Why OpenAI batch is fast + cheap:
  • For large backfills, OpenAI is typically the fastest option we support because we can submit many embedding requests in a single batch job and let OpenAI process them asynchronously.
  • OpenAI offers discounted pricing for Batch API workloads, so large indexing runs are usually cheaper than sending the same requests synchronously.
  • See the OpenAI Batch API docs and pricing for details.
Config example:
agents: {
  defaults: {
    memorySearch: {
      provider: "openai",
      model: "text-embedding-3-small",
      fallback: "openai",
      remote: {
        batch: { enabled: true, concurrency: 2 }
      },
      sync: { watch: true }
    }
  }
}
Tools:
  • memory_search — returns snippets with file + line ranges.
  • memory_get — read memory file content by path.
Local mode:
  • Set agents.defaults.memorySearch.provider = "local".
  • Provide agents.defaults.memorySearch.local.modelPath (GGUF or hf: URI).
  • Optional: set agents.defaults.memorySearch.fallback = "none" to avoid remote fallback.

How the memory tools work

  • memory_search semantically searches Markdown chunks (~400-token target, 80-token overlap) from MEMORY.md + memory/**/*.md. It returns snippet text (capped at ~700 chars), file path, line range, score, provider/model, and whether we fell back from local → remote embeddings (illustrative result shape after this list). No full file payload is returned.
  • memory_get reads a specific memory Markdown file (workspace-relative), optionally from a starting line and for N lines. Paths outside MEMORY.md / memory/ are rejected.
  • Both tools are enabled only when memorySearch.enabled resolves true for the agent.
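For illustration, a single memory_search hit carries roughly the fields below; the field names here are illustrative, not a wire-format contract:
{
  snippet: "Decision: the gateway stays on the Mac Studio.",
  path: "memory/2025-06-12.md",
  lines: "3-3",
  score: 0.78,
  provider: "openai",
  model: "text-embedding-3-small",
  fellBackToRemote: false
}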

What gets indexed (and when)

  • File type: Markdown only (MEMORY.md, memory/**/*.md).
  • Index storage: per-agent SQLite at ~/.clawdia/memory/<agentId>.sqlite (configurable via agents.defaults.memorySearch.store.path, which supports an {agentId} token; example below).
  • Freshness: watcher on MEMORY.md + memory/ marks the index dirty (debounce 1.5s). Sync is scheduled on session start, on search, or on an interval and runs asynchronously. Session transcripts use delta thresholds to trigger background sync.
  • Reindex triggers: the index stores the embedding provider/model + endpoint fingerprint + chunking params. If any of those change, Clawdia automatically resets and reindexes the entire store.
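For example, to relocate the per-agent index (the path value is illustrative; {agentId} is expanded per agent):
agents: {
  defaults: {
    memorySearch: {
      store: {
        path: "~/indexes/memory-{agentId}.sqlite"
      }
    }
  }
}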

Hybrid search (BM25 + vector)

When enabled, Clawdia combines:
  • Vector similarity (semantic match, wording can differ)
  • BM25 keyword relevance (exact tokens like IDs, env vars, code symbols)
If full-text search is unavailable on your platform, Clawdia falls back to vector-only search.

Why hybrid?

Vector search is great at “this means the same thing”:
  • “Mac Studio gateway host” vs “the machine running the gateway”
  • “debounce file updates” vs “avoid indexing on every write”
But it can be weak at exact, high-signal tokens:
  • IDs (a828e60, b3b9895a…)
  • code symbols (memorySearch.query.hybrid)
  • error strings (“sqlite-vec unavailable”)
BM25 (full-text) is the opposite: strong at exact tokens, weaker at paraphrases. Hybrid search is the pragmatic middle ground: use both retrieval signals so you get good results for both “natural language” queries and “needle in a haystack” queries.

How we merge results (the current design)

Implementation sketch:
  1. Retrieve a candidate pool from both sides:
  • Vector: top maxResults * candidateMultiplier by cosine similarity.
  • BM25: top maxResults * candidateMultiplier by FTS5 BM25 rank (lower is better).
  2. Convert the BM25 rank into a 0..1-ish score:
  • textScore = 1 / (1 + max(0, bm25Rank))
  3. Union candidates by chunk id and compute a weighted score:
  • finalScore = vectorWeight * vectorScore + textWeight * textScore
Notes:
  • vectorWeight and textWeight are normalized to sum to 1.0 during config resolution, so the weights behave as percentages.
  • If embeddings are unavailable (or the provider returns a zero-vector), we still run BM25 and return keyword matches.
  • If FTS5 can’t be created, we keep vector-only search (no hard failure).
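A quick worked example using the weights from the config below (vectorWeight 0.7, textWeight 0.3), and assuming a candidate missing from one side contributes 0 for that component: a chunk with vectorScore 0.80 and no BM25 hit scores 0.7 * 0.80 = 0.56, while a chunk with vectorScore 0.50 and bm25Rank 0 (textScore = 1 / (1 + 0) = 1.0) scores 0.7 * 0.50 + 0.3 * 1.0 = 0.65, so an exact keyword hit can outrank a purely semantic match.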
This isn’t “IR-theory perfect”, but it’s simple, fast, and tends to improve recall/precision on real notes. If we want to get fancier later, common next steps are Reciprocal Rank Fusion (RRF) or score normalization (min/max or z-score) before mixing. Config:
agents: {
  defaults: {
    memorySearch: {
      query: {
        hybrid: {
          enabled: true,
          vectorWeight: 0.7,
          textWeight: 0.3,
          candidateMultiplier: 4
        }
      }
    }
  }
}

Embedding cache

Clawdia can cache chunk embeddings in SQLite so reindexing and frequent updates (especially session transcripts) don’t re-embed unchanged text. Config:
agents: {
  defaults: {
    memorySearch: {
      cache: {
        enabled: true,
        maxEntries: 50000
      }
    }
  }
}

Session memory search (experimental)

You can optionally index session transcripts and surface them via memory_search. This is gated behind an experimental flag.
agents: {
  defaults: {
    memorySearch: {
      experimental: { sessionMemory: true },
      sources: ["memory", "sessions"]
    }
  }
}
Notes:
  • Session indexing is opt-in (off by default).
  • Session updates are debounced and indexed asynchronously once they cross delta thresholds (best-effort).
  • memory_search never blocks on indexing; results can be slightly stale until background sync finishes.
  • Results still include snippets only; memory_get remains limited to memory files.
  • Session indexing is isolated per agent (only that agent’s session logs are indexed).
  • Session logs live on disk (~/.clawdia/agents/<agentId>/sessions/*.jsonl). Any process/user with filesystem access can read them, so treat disk access as the trust boundary. For stricter isolation, run agents under separate OS users or hosts.
Delta thresholds (defaults shown):
agents: {
  defaults: {
    memorySearch: {
      sync: {
        sessions: {
          deltaBytes: 100000,   // ~100 KB
          deltaMessages: 50     // JSONL lines
        }
      }
    }
  }
}

SQLite vector acceleration (sqlite-vec)

When the sqlite-vec extension is available, Clawdia stores embeddings in a SQLite virtual table (vec0) and performs vector distance queries in the database. This keeps search fast without loading every embedding into JS. Configuration (optional):
agents: {
  defaults: {
    memorySearch: {
      store: {
        vector: {
          enabled: true,
          extensionPath: "/path/to/sqlite-vec"
        }
      }
    }
  }
}
Notes:
  • enabled defaults to true; when disabled, search falls back to in-process cosine similarity over stored embeddings.
  • If the sqlite-vec extension is missing or fails to load, Clawdia logs the error and continues with the JS fallback (no vector table).
  • extensionPath overrides the bundled sqlite-vec path (useful for custom builds or non-standard install locations).

Local embedding auto-download

  • Default local embedding model: hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf (~0.6 GB).
  • When memorySearch.provider = "local", node-llama-cpp resolves modelPath; if the GGUF is missing it auto-downloads to the cache (or local.modelCacheDir if set), then loads it. Downloads resume on retry.
  • Native build requirement: run pnpm approve-builds, pick node-llama-cpp, then pnpm rebuild node-llama-cpp.
  • Fallback: if local setup fails and memorySearch.fallback = "openai", we automatically switch to remote embeddings (openai/text-embedding-3-small unless overridden) and record the reason.
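Putting the local-mode options together, a config might look like this (the cache directory is just an example path):
agents: {
  defaults: {
    memorySearch: {
      provider: "local",
      fallback: "none",
      local: {
        modelPath: "hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf",
        modelCacheDir: "~/.clawdia/models"
      }
    }
  }
}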

Custom OpenAI-compatible endpoint example

agents: {
  defaults: {
    memorySearch: {
      provider: "openai",
      model: "text-embedding-3-small",
      remote: {
        baseUrl: "https://api.example.com/v1/",
        apiKey: "YOUR_REMOTE_API_KEY",
        headers: {
          "X-Organization": "org-id",
          "X-Project": "project-id"
        }
      }
    }
  }
}
Notes:
  • remote.* takes precedence over models.providers.openai.*.
  • remote.headers merge with OpenAI headers; remote wins on key conflicts. Omit remote.headers to use the OpenAI defaults.