Apr 1, 2026 · 11 min read
When AI Agents Dream: Memory Consolidation for the Obsidian Vault
Teaching AI agents to review their own session transcripts, catch lost decisions, and consolidate memory overnight. Part 3 of the AI memory series.
TL;DR
/memory-sync --dream scans session transcripts for corrections, decisions, and preferences the agent heard but never wrote down. Plus an append-only decisions log and codebase archaeology on init.
Part 3 of the AI Memory series. Part 1: AI Memory Without the Overhead | Part 2: Making AI Memory Stick
In which the agent learns to review its own homework, and I still haven't done mine.
I ended Part 2 saying I was going to study. You know, the cert resit that's been haunting my daily note for weeks. The one I keep rolling forward because there's always something more interesting to build.
Well, let me tell you something: I didn't study. But somewhere between telling myself I'd start tomorrow and actually opening the practice exams, I found out that Anthropic has been building a /dream command into Claude Code. It's not released yet; someone reverse-engineered it from the binary and found it sitting behind a tengu_onyx_plover feature flag. Claude's Corner did a good writeup on it alongside a bunch of other persistence tooling that dropped the same week.
It's basically a background agent that consolidates your session memories while you're away. Reviews what happened, resolves contradictions, prunes stale entries, normalises dates. The community even built an open-source replica that works today without waiting for the official rollout.
Your AI agent doing what your brain does during sleep. Consolidating the day's noise into clean, structured memory. I liked that idea a lot.
And then my brain did the thing again. The "oh, that's interesting, but what if it also wrote to Obsidian" thing. The thing that keeps happening instead of the other stuff I sat down at my computer to do in the first place.
Yeah yeah, I know. I'll get to it.
Part 2 left me with a system that was more reliable at capturing memories. Hooks that fire every time, a staging bridge to Obsidian, three tiers of storage. But "reliably capturing" and "actually learning from" turned out to be different problems. The system was writing session notes dutifully. Nobody was reading them back and asking "hang on, didn't we decide the opposite of this three weeks ago?"
So that's what this post is about. Teaching the system to review its own work.
The Three Gaps
After running the Part 2 system across a handful of projects for a few weeks, three patterns kept showing up.
Decisions were scattered. I'd make an architectural decision mid-session - "use DynamoDB for state locking, not S3" - and the agent would acknowledge it, act on it, and then the decision would end up buried in paragraph three of a session note somewhere. Three weeks later, a fresh session on the same project would re-debate the same question. Because finding the decision meant searching across a dozen session notes and hoping the right keywords were in there.
New projects started blind. The /memory-init command detected tech stack and build commands, but it knew nothing about the codebase's history. Which files broke most often? What patterns were already established? What had been tried and binned? The agent was starting every brownfield project like it was greenfield. Technically correct about the stack, completely ignorant about the context.
Silent knowledge loss. This was the subtle one. You'd correct the agent mid-session - "actually no, always use pytest, not unittest for this project" - and it would adjust immediately. But that correction lived only in the conversation. When the session ended without a /memory-sync (because it didn't feel "significant enough"), the correction died with it. Next session, same mistake. Every time.
Decisions That Don't Get Lost
The fix for the first gap is almost embarrassingly simple. An append-only _decisions.md file per project.
Every project in 5 Agent Memory/sessions/by-project/<slug>/ now gets a single decisions file. Not one per session. One per project, ever-growing, chronological. When /memory-sync runs, it doesn't just write a session note anymore. It also appends each decision to _decisions.md with the date, context, rationale, and a backlink to the session it came from.
```markdown
### 2026-03-17 — Use hooks over CLAUDE.md-only enforcement
**Context:** CLAUDE.md instructions are advisory and ignored after compaction.
**Decision:** Three deterministic hooks enforce memory behaviour.
**Rationale:** Hooks fire every time. Belt and braces.
**Session:** [[2026-03-17-persistent-memory-architecture]]
```
If you've come across Architecture Decision Records before, this is the same idea but lighter. No template bureaucracy. Just the four fields that matter. When a fresh agent needs to know "why DynamoDB and not S3 for state locking?", it searches one file, gets the answer with rationale, and cracks on.
There's also a standalone /decision command for logging decisions outside of a full sync. Mid-session, you realise something important just got decided - type /decision and it appends to the log without the overhead of writing a whole session note. The decisions accumulate either way.
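The append itself is small enough to sketch. Here's my guess at what /decision boils down to, with a title-based dedup check so re-logging the same decision is a no-op. The `log_decision` helper and its arguments are my invention, not the actual implementation:

```shell
# Hypothetical sketch of the /decision append, not the real implementation.
# Skips the append if a decision with the same title is already in the log.
log_decision() {
  local log_file="$1" title="$2" context="$3" decision="$4" rationale="$5" session="$6"
  if grep -qF "— $title" "$log_file" 2>/dev/null; then
    echo "already logged: $title" >&2
    return 0
  fi
  cat >> "$log_file" <<EOF

### $(date +%F) — $title
**Context:** $context
**Decision:** $decision
**Rationale:** $rationale
**Session:** [[$session]]
EOF
}
```

Append-only means the dedup check is the only read the write path ever does; everything else is a blind `>>`.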
Projects That Know Their Own History
The second gap needed a different approach. Tech stack detection in /memory-init was already solid. What was missing was codebase archaeology.
When you run /memory-init on a brownfield project now, it dispatches three parallel subagents after the tech stack detection:
| Subagent | What It Does |
|---|---|
| Structure | Maps the directory layout, entry points, key modules, how things connect |
| Patterns | Identifies coding conventions, architecture patterns, naming standards already in use |
| History | Scans git log for recent activity, frequent change areas, infers past decisions from commit messages |
The output gets folded into the project CLAUDE.md under an Architecture section with three subsections. It's not perfect - git log archaeology can only infer so much. But it's a massive step up from nothing. The agent starts its first session knowing that src/discovery.py has been the most modified file, that the project uses the repository pattern, and that someone switched from CloudFormation to Terraform six months ago.
Is this as thorough as a dedicated tool like codeset.ai that does full AST parsing and caller graph extraction? No. But it runs at init time, costs nothing, and covers the 80% case. Good enough for free beats perfect for money.
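For a feel of what the History subagent can get from git alone, the frequent-change scan is roughly a one-liner. This is my sketch of the approach, not the actual subagent prompt; `hot_files` is a name I made up:

```shell
# Rough approximation of the History subagent's hot-file scan:
# count how often each file appears in recent commits.
hot_files() {
  git -C "${1:-.}" log --since="${2:-6 months ago}" --format= --name-only \
    | awk 'NF' \
    | sort | uniq -c | sort -rn | head -10
}
```

The most-modified files float to the top, which is usually a decent proxy for "where the pain is" in a brownfield codebase.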
The Dream
Right, the interesting bit.
Claude Code has an unreleased auto-dream feature baked into the binary. It's behind a feature flag (tengu_onyx_plover) and not in the official docs. The community found it, reverse-engineered the behaviour, and someone built a standalone skill that replicates the same four-phase consolidation without needing the feature flag. You can use it today.
What it does: periodically spawns a background agent that performs a consolidation pass over your auto-memory files. Like your brain during sleep. Reviewing the day, keeping important memories, binning noise, resolving contradictions. Four phases: orient (read current state), gather signal (scan recent transcripts), consolidate (merge findings), prune and index (keep things lean).
The problem - for me at least - is that both Anthropic's version and the community replica only operate on Tier 2. Claude Code's auto-memory at ~/.claude/projects/*/memory/. Fine if that's your only memory system. I have three tiers and an Obsidian vault. I needed to dream bigger.
So /memory-sync --dream does the same four phases but writes to Obsidian.
Phase 2 Is the One That Matters
Orient and prune are housekeeping. Consolidate is merging. Phase 2 is the one that fills the gap: gather signal.
Claude Code stores every session as JSONL files. Every message you sent, every response the agent gave - it's all sitting on disk. The dream command greps these transcripts for specific patterns:
| What It Looks For | Why |
|---|---|
| "actually", "no,", "wrong", "I meant" | Corrections the agent acknowledged but never wrote down |
| "I prefer", "always use", "from now on" | Preferences expressed casually mid-conversation |
| "let's go with", "we're using", "the plan is" | Decisions made but never logged |
| "every time", "keep forgetting", "as usual" | Recurring patterns worth codifying |
It doesn't read entire transcripts. That would eat tokens for breakfast. Targeted grep, then read only the surrounding context of each hit. The output is a list of findings with dates, confidence levels, and source sessions.
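Mechanically this is just grep over the transcript directory. A sketch of the gather-signal pass using the patterns from the table above; the function name and exact flags are mine, and the real skill presumably does something similar but smarter:

```shell
# Illustrative gather-signal pass: grep session transcripts (JSONL, one
# message per line) for correction/preference/decision phrases, keeping
# 2 lines of context (i.e. neighbouring messages) around each hit.
gather_signal() {
  grep -RniE -C 2 --include='*.jsonl' \
    -e 'actually|no,|wrong|I meant' \
    -e 'I prefer|always use|from now on' \
    -e "let's go with|we're using|the plan is" \
    -e 'every time|keep forgetting|as usual' \
    "$1"
}
```

Because each JSONL line is one message, `-C 2` hands the agent the immediate conversational context of a hit without it ever reading a full transcript.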
What Happens with the Findings
Each finding gets cross-referenced against what's already in the vault.
Decisions go to _decisions.md. Already logged? Skip. Contradicts an existing decision? Flag it for review. Don't auto-resolve.
Corrections and preferences get proposed as learnings. Still needs my approval. The system was designed with human curation at the top, and the dream doesn't change that.
Contradictions get presented explicitly:
```
⚠ CONTRADICTION
Existing: "Use pytest for all testing" (learnings, 2026-03-01)
New: "Use unittest, pytest is overkill" (session 2026-03-20)
Project-specific override or global preference change?
```
The agent asks. I decide. That's the whole point.
The 24-Hour Timer
The dream doesn't run every session. That would be wasteful. The Stop hook already tracks message counts and session duration. Now it also checks a per-project timestamp:
```bash
# Stop hook: flag a dream if this project hasn't had one in 24 hours
LAST_DREAM=$(cat "$PROJECT_DIR/.last-dream" 2>/dev/null || echo "0")
HOURS_SINCE=$(( ($(date +%s) - LAST_DREAM) / 3600 ))
if [[ "$HOURS_SINCE" -ge 24 ]]; then
  touch "$PROJECT_DIR/.dream-pending"
fi
```
Per-project, not global. That was a deliberate decision from the implementation session - a global flag means running a dream on project A would clear the nudge for project B. Each project has its own timer. Your main project dreams daily. That side project you touch once a month doesn't burn cycles.
When the SessionStart hook sees .dream-pending, it tells Claude to run the dream as a background subagent. You don't wait for it. You start working. The dream runs alongside you, and when it finishes, the next /memory-load picks up whatever it found.
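The SessionStart side is equally small. A hypothetical sketch, using the same flag-file names as the Stop hook; the function name and nudge text are my invention:

```shell
# Hypothetical SessionStart-side check: if a dream is pending, clear the
# flag, reset the 24h timer, and nudge Claude to dream in the background.
check_dream_pending() {
  local dir="$1"
  if [ -f "$dir/.dream-pending" ]; then
    rm -f "$dir/.dream-pending"
    date +%s > "$dir/.last-dream"
    echo "Run /memory-sync --dream as a background subagent."
  fi
}
```

Resetting `.last-dream` here rather than when the dream finishes keeps the two hooks decoupled: a crashed or interrupted dream just waits for the next 24-hour window instead of re-firing every session.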
What Changed vs Parts 1 and 2
| Part 1 | Part 2 | Part 3 |
|---|---|---|
| Write memory manually | Hooks enforce writing | Dream reviews what was written (and what wasn't) |
| Flat session notes | Project-scoped sessions | Append-only decisions log per project |
| No codebase awareness | Tech stack detection | Codebase archaeology via subagents |
| — | Check for prior context on start | Scan transcripts for lost knowledge |
| — | — | Contradiction detection across learnings |
| — | — | 24hr auto-trigger, background execution |
The Honest Assessment
Is it worth it? The decisions log alone justified the work. One file per project, searchable, chronological, append-only. That should have existed from Part 1. But we're all learning together, especially me.
The codebase analysis is useful but not transformative. It makes the first session on a brownfield project noticeably better. After that, the agent has session history and doesn't need the init context as much. Nice to have. Not essential.
The dream is the one I'm still making my mind up about. The transcript scanning catches things I missed - corrections and preferences that slipped through because I didn't run /memory-sync after a quick session. But it's also the most expensive operation (scanning JSONL files uses real tokens), and the approval flow for background runs isn't fully smooth yet. Queuing findings for the next interactive session works but feels like it needs tightening.
The token cost of scanning a week of transcripts is... noticeable. Not ruinous, but not free. Running it per-project every 24 hours feels about right. Running it globally across all projects would be taking the piss.
Maybe a /dream-big command in a future iteration: global scope, learning your coding style across every project and consolidating those decisions and preferences in one pass?
Is it perfect? Still no. Is it meaningfully better than Part 2? Yes. The system doesn't just record anymore. It reviews, consolidates, and catches the things I forgot to tell it to remember.
Yeah yeah, I know I'm supposed to be studying. I'll get to it.
Part 1 was the vault. Part 2 was the enforcement. Part 3 is the consolidation. Part 4... honestly, I still want to build the semantic search layer. SQLite FTS5 with sqlite-vec for hybrid keyword + vector search. But I said that at the end of Part 2 and then built something completely different instead. So who knows.
The system is open source at agent-memory-cc-v2. If you've been running your own agent memory setup, I'd be curious how you handle the consolidation problem. If you haven't started yet, Part 1 is still the right place to begin. Walk before you dream.
✦ Key Takeaways
- An append-only _decisions.md per project gives agents a single searchable file for past architectural decisions
- Codebase archaeology via parallel subagents gives brownfield projects meaningful context at init time
- Dream consolidation scans session transcripts for corrections and preferences that were never formally saved
- A 24-hour per-project timer prevents wasteful dream cycles while keeping active projects current
- Contradiction detection flags conflicting decisions for human review rather than auto-resolving