A few weeks ago, my OpenClaw stopped working. At first, I thought it was a model problem. It took me a while to figure out I was wrong.
Here’s what happened: my agent kept forgetting important context. Skills weren’t being triggered correctly. Cron jobs stopped running reliably. I kept getting context-related errors in Telegram. And the longer a session ran, the more tokens it seemed to burn.
This happened when I was running Kimi K2.5 as my main model. I switched over to Sonnet 4.6, and things got a little better, especially with triggering skills and calling tools. But the real problem wasn’t the model. It was how OpenClaw manages memory and context behind the scenes.
Right around the time I was struggling with this, I saw a post from a developer named Ramya, who documented spending a week debugging her agent’s memory. She was running into some of the exact same issues I was hitting. Her article helped me a ton and inspired me to completely rethink the way my OpenClaw runs. The fixes discussed here are based on what I changed in my own setup to make my OpenClaw agent, Jarvis, more stable and much cheaper to run.
If you’d rather watch how to do this than read about it, I have a full video walkthrough on YouTube, which you can access here: https://youtu.be/UTztjR4o7Y8
Before You Start
If you followed my last article on the 3 Essential Tools for OpenClaw, this setup is the same. On whatever machine OpenClaw is running, open a Terminal and run:
cd ~/.openclaw
This puts you in the OpenClaw workspace folder. Launch Claude Code, Codex, or your preferred coding agent from this directory. All of the prompts below assume the agent can see your OpenClaw workspace.
If you’re wondering why I use Claude Code instead of OpenClaw itself for this kind of work, here’s the short answer: I want OpenClaw executing systems, not burning tokens to build them. Claude Code handles the engineering. OpenClaw runs the result. I covered this in more detail in the last article, so I won’t repeat myself here.
Now, let’s move on to the fixes.
Fix 1: Stop Losing Important Context During Compaction
Here’s the first thing to understand. OpenClaw has a finite context window, just like any AI agent. This is basically the agent’s short-term memory. As your conversation gets longer, OpenClaw does something called compaction: it compresses older messages into a summary to make room for new ones.
Sounds reasonable, right? The problem is that compaction treats everything equally. That important instruction you gave 20 messages ago gets the same compression treatment as everything else, no matter how relevant. Names, numbers, exact decisions, and other important details can get lost as the compaction process distills everything into a generic summary.
The rule is simple: if it’s only in the context window, it’s temporary. If it’s on disk, it survives.
The first fix is memory flush. This gives the agent a chance to write important information out to disk before compaction runs.
Paste this prompt into Claude Code:
Read the OpenClaw memory docs here:
https://docs.openclaw.ai/concepts/memory
Then inspect my current openclaw.json and enable memory flush under compaction.
Requirements:
1. Turn memory flush on
2. Use a soft threshold amount that is recommended in the documentation (if none is recommended, use 4000 tokens)
3. If the compaction block does not exist yet, create it
4. Do not duplicate existing settings
5. After making the change, explain in plain English what will happen before compaction runs
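If you want to sanity-check the result, the compaction block in openclaw.json should end up looking roughly like this. The key names below are illustrative, not authoritative (I'm paraphrasing from memory of the docs), so trust whatever the documentation and Claude Code produce over this sketch:

```json
{
  "compaction": {
    "memoryFlush": {
      "enabled": true,
      "softThresholdTokens": 4000
    }
  }
}
```

The soft threshold is the headroom before the hard context limit at which the agent gets a chance to write durable notes to disk, so compaction never fires before the flush does.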
That handles the first failure mode. But there’s a second one: very long sessions.
Memory flush triggers once per compaction cycle. If you’re in a really long session, like a four-hour deep work session, compaction might run multiple times, and only the first one gets the flush treatment. So you also want session pruning to aggressively clean up old context:
Read the OpenClaw session pruning docs here:
https://docs.openclaw.ai/concepts/session-pruning
Then inspect my openclaw.json and configure session pruning (also known as context pruning).
Requirements:
1. Use cache-ttl mode
2. Set the TTL to whatever is recommended in the documentation (if nothing is recommended, use 4 hours)
3. Keep the last 3 assistant messages
4. If session/context pruning already exists, update it safely instead of creating duplicate blocks
5. After the change, summarize the final behavior in plain English
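As with the memory flush change, it helps to know roughly what you're looking for after Claude Code finishes. A pruning block along these lines is what I'd expect (again, key names are illustrative; the docs are the source of truth):

```json
{
  "session": {
    "pruning": {
      "mode": "cache-ttl",
      "ttl": "4h",
      "keepLastAssistantMessages": 3
    }
  }
}
```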
Fix 2: Make Retrieval Actually Work
Saving important information via memory flush is only half the job. Your agent also has to find this information again. And more importantly, it has to remember to look.
This is where a lot of setups quietly fail. The data exists somewhere on disk, but the agent never searches for it, or the default retrieval isn’t good enough to surface the right result.
If you followed my last article, you already know about QMD: the hybrid search engine by Tobi Lutke that combines keyword matching, vector semantic search, and an LLM re-ranker. If you haven’t set that up yet, go read that article first. It’s the single biggest upgrade you can make to OpenClaw’s memory.
But here’s the more subtle problem I didn’t cover last time: even with great search, your agent has to actually decide to search. And if the conversation doesn’t trigger the right cues, it just won’t look things up. The information exists. The agent doesn’t use it.
So you need to add explicit retrieval instructions to the top of your agent’s file (AGENTS.md). Think of this as a checklist: before doing anything, the agent searches for relevant context.
Paste this prompt into Claude Code:
Read the OpenClaw memory docs for QMD here:
https://docs.openclaw.ai/concepts/memory#qmd-backend-experimental
Then inspect my current OpenClaw setup and improve retrieval.
Requirements:
1. If I am not already using QMD as the memory backend, install and configure it
2. Verify the memory backend is working with a test query
3. Update the top of my AGENTS.md instructions so that before starting any task the agent:
- searches daily logs for relevant context
- checks a central learnings file (LEARNINGS.md) for rules related to the task
4. Keep the boot instructions concise and operational
5. After the changes, explain the new retrieval flow in plain English
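For reference, the boot block Claude Code adds to the top of AGENTS.md might read something like this. The wording is mine, not a required format:

```markdown
## Before starting any task

1. Search memory and the daily logs (via QMD) for context related to this request.
2. Read LEARNINGS.md and apply any rules relevant to the task.
3. Only then begin the task itself.
```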
This grounds your OpenClaw's actions in any relevant memories QMD can surface, plus a central list of rules that guard against repeated mistakes: a LEARNINGS.md file. That was another revelation from Ramya's article: every time the agent makes a mistake, have it write a one- or two-line rule in LEARNINGS.md so it never makes the same mistake again. Coupled with smarter search and retrieval of memories via QMD, this makes your agent noticeably smarter and more efficient.
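To make the LEARNINGS.md idea concrete, here is the shape of file I keep. The entries below are just examples of the kind of one- or two-line rules that accumulate over time:

```markdown
# LEARNINGS.md

- Confirm the timezone before scheduling any cron job; default to my local time, not UTC.
- Split Telegram replies longer than 4096 characters into multiple messages.
- If a skill fails twice in a row, stop retrying and report the error instead of looping.
```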
Fix 3: The Heartbeat Cost Trap
This was probably the most expensive mistake in my setup, and nobody really warns you about it.
If you have heartbeat enabled, which it is by default, your agent wakes up every 30 minutes to check on things. You arguably need this to make your OpenClaw a truly realtime agent. But you should also know that every single heartbeat is a full agent turn, not a lightweight ping: a full API call that carries the entire session context.
That means every 30 minutes, your agent re-sends your entire system prompt (AGENTS.md, SOUL.md, TOOLS.md, IDENTITY.md, HEARTBEAT.md, MEMORY.md), plus all your skill metadata, plus whatever conversation history is in the session. That's potentially 10,000 to 15,000 tokens per heartbeat, forty-eight times a day: roughly 480,000 to 720,000 tokens of pure overhead daily. Even on a cheap model, that adds up fast.
Here’s what the optimized heartbeat config looks like:
{
  "agents": {
    "defaults": {
      "heartbeat": {
        "every": "30m",
        "lightContext": true,
        "model": "google/gemini-3.1-flash-lite-preview",
        "activeHours": { "start": "08:00", "end": "23:00" }
      }
    }
  }
}
The key changes, and why each one matters:
- lightContext: true — This is a big one. By default, every heartbeat loads your entire system prompt. With light context enabled, the heartbeat only loads HEARTBEAT.md. That’s it. All other files get skipped. Instead of sending 10,000 to 15,000 tokens on every heartbeat, you’re sending maybe a few hundred tokens.
- Cheap model — There’s no reason to burn Opus or Sonnet tokens on a heartbeat. It’s just checking whether anything needs attention. Use the cheapest model that can read a checklist. I like Gemini 3.1 Flash-Lite, but you could even use a local model, such as Qwen 3.5:9B, potentially eliminating heartbeat costs entirely.
- Active hours — If your agent doesn't need to check in at 3 AM, don't let it. That cuts your heartbeat calls almost in half. And if you do need the heartbeat running around the clock, you can stretch the interval to every 1 or 2 hours instead of every 30 minutes.
And one more thing: keep HEARTBEAT.md tiny. If that file is bloated with instructions, you’re paying for all of it on every heartbeat run. Trim it to just the most essential checklist of things to check.
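As a reference point, here is roughly what a trimmed-down HEARTBEAT.md can look like. This is a sketch of my own checklist, not an official template:

```markdown
# HEARTBEAT.md

On each heartbeat:
1. Check for unread messages that need a reply.
2. Check scheduled tasks due in the next 30 minutes.
3. If nothing needs attention, end the turn without doing anything else.
```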
Paste this prompt into Claude Code:
Read the OpenClaw heartbeat docs here:
https://docs.openclaw.ai/gateway/heartbeat
Then inspect my current OpenClaw heartbeat configuration and optimize it for low token usage.
Requirements:
1. Enable lightContext for heartbeat
2. Set the heartbeat model to google/gemini-3.1-flash-lite-preview
3. Limit active hours to 08:00 through 23:00
4. Review HEARTBEAT.md and trim it so it only contains the minimum instructions needed for heartbeat runs
5. Do not remove any heartbeat behavior that is actually necessary
6. After the change, explain what context heartbeat will still load on each run
You don't need to enable every one of these heartbeat cost-saving measures, though. At a minimum, use a cheap model and keep HEARTBEAT.md tiny.
Fix 4: Do a Full System Prompt Audit
This was the lesson that tied everything together for me.
When I was running Kimi K2.5, I kept thinking it was the model’s fault. When I started getting endless “context full” error messages when I messaged Jarvis in Telegram, I switched to Sonnet 4.6. The increased context window helped a bit. But I was still burning more context than I should have.
That’s when I realized: I had no idea what was actually in my agent’s context. All these files — AGENTS.md, SOUL.md, TOOLS.md, IDENTITY.md, USER.md, HEARTBEAT.md, MEMORY.md — they all auto-load on every turn. And I’d been adding stuff to them without ever auditing how big they’d gotten or checking for redundancy.
You can inspect some of this inside OpenClaw with the command /context detail. But the best move is to let Claude Code audit the whole prompt surface for duplication and bloat. Claude Code is perfect for this because it can read all those files, understand the relationships between them, and give you a concrete list of what to trim.
Paste this prompt into Claude Code:
Read all of my OpenClaw system prompt files:
- AGENTS.md
- SOUL.md
- TOOLS.md
- IDENTITY.md
- USER.md
- HEARTBEAT.md
- MEMORY.md
Then do a full system prompt audit.
Requirements:
1. Identify anything redundant, duplicated across files, or unnecessarily long
2. Give me a concrete cut list with reasons
3. Apply the trims carefully without removing important behavior
4. Keep responsibilities clearly separated between files
5. After editing, summarize the biggest reductions and any risks or follow-up checks I should make
After I did this, my system prompt shrank significantly. And that’s really why Kimi K2.5 was choking: it has a smaller context window, and I was wasting a huge chunk of it on bloated system files. The real fix wasn’t switching models. It was trimming the unnecessary stuff.
This is why I now believe context engineering and memory management matter just as much as model choice when you're customizing a personal AI agent like OpenClaw.
Recap
Here’s the practical sequence I’d recommend:
- Memory flush — Write important context to disk before compaction wipes it.
- Session pruning — Stop long-running conversations from dragging dead context forever.
- Retrieval + boot instructions — Upgrade search with QMD, and make sure the agent is explicitly told to search for prior context before starting any task. Save important rules in a central .md file to prevent the agent from repeating mistakes.
- Heartbeat optimization — Enable light context, use a cheap model, limit active hours, and/or keep HEARTBEAT.md small.
- System prompt audit — Let Claude Code do a full audit on your system prompt to cut the bloat.
The main takeaway here is that if you’re building with OpenClaw, the models you pick to run your agent are only half the battle (or less). The real value is in making sure your agent has a good memory and context management system.
Resources
- Ramya’s (code_rams) article on agent memory debugging: https://x.com/i/article/2025615759771123712
- OpenClaw memory docs: https://docs.openclaw.ai/concepts/memory
- OpenClaw session pruning docs: https://docs.openclaw.ai/concepts/session-pruning
- OpenClaw heartbeat docs: https://docs.openclaw.ai/gateway/heartbeat
- QMD by Tobi Lutke: https://github.com/tobi/qmd
