Mounir RAJI

OpenClaw 3.13 → 3.28: The Migration Report

Three traps, a TTS crash loop, and why Docker saved me from an npm bug I never saw coming. Direct migration on a live self-hosted agent.


Personal AI Agent Series — Article 7 Neog × OpenClaw × Docker × Ubuntu VM


On March 29, I opened the OpenClaw Control UI to start the migration I had been putting off. I was on 3.13. My target was 3.24, which by then bundled the TTS breaking change, ClawHub, and native web search providers. I had read the changelogs. I had a plan.

The banner said: “Update available: v2026.3.28.”

Four releases in four days. I re-scoped the migration to target 3.28 before touching anything.

This isn’t a changelog summary. It’s what actually happened when I bridged 15 versions on a live agent running in Docker on an Ubuntu VM: the traps, the workarounds, the things openclaw doctor --fix was supposed to handle and didn’t, and the three changes I deliberately chose not to make.


The upgrade path: 3.13 → 3.28, no intermediate steps

The first good news: the jump is direct. No need to land on 3.22, then 3.24, then 3.28. One pull, one relaunch.

docker compose pull
docker compose up -d

That’s it in principle. In practice, those two lines hid two traps I walked straight into.


Trap #1: docker compose restart doesn’t do what you think

I pulled 3.28. Confirmed the image was there. Then I ran:

docker compose restart

The containers came back up. I ran a version check:

docker exec neog-gateway openclaw --version
# OpenClaw 2026.3.13

Still 3.13. I pulled again, thinking it hadn’t worked. Same result.

Classic Docker behavior that I knew intellectually but had never been bitten by: restart restarts the existing container as it is, still bound to the image it was created from. It doesn’t recreate the container from the new image. For that, you need up -d, which recreates any container whose image or configuration has changed.

docker compose up -d
docker exec neog-gateway openclaw --version
# OpenClaw 2026.3.28

Obvious in retrospect. The kind of thing that costs you 20 minutes when you’re in the middle of a migration and the gateway keeps returning the wrong version.

Lesson locked into restore-openclaw-config.py: always up -d, never restart when upgrading.
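A minimal sketch of how a "did the image actually change?" check could look in Python. It compares the image ID the running container was created from with the ID the local tag now resolves to. The helper names are hypothetical and this is not the actual content of restore-openclaw-config.py; the two docker inspect format strings are standard Docker CLI usage.

```python
import subprocess


def needs_recreate(container_image_id: str, pulled_image_id: str) -> bool:
    """True when the running container was created from a different image
    than the one the tag currently points to."""
    return container_image_id.strip() != pulled_image_id.strip()


def docker_output(*args: str) -> str:
    """Run a docker command and return its stripped stdout."""
    result = subprocess.run(
        ["docker", *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def container_is_stale(container: str, image: str) -> bool:
    # Image ID the existing container was created from.
    running = docker_output("inspect", "--format", "{{.Image}}", container)
    # Image ID the local tag resolves to after `docker compose pull`.
    pulled = docker_output("image", "inspect", "--format", "{{.Id}}", image)
    return needs_recreate(running, pulled)
```

Called as container_is_stale("neog-gateway", "<your image tag>"), a True result means restart would keep serving the old version and up -d is required. The image tag is left as a placeholder because the article does not name it.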


Trap #2: The TTS breaking change and why doctor --fix missed it

The 3.22 release dropped a breaking change in the TTS config:

tts.openai → tts.providers.openai

openclaw doctor --fix is supposed to handle this kind of migration automatically. It didn’t — and the reason is a timing issue that’s easy to miss.

I had run doctor --fix before pulling the new image, as a pre-flight check on my 3.13 setup. The tool ran against the 3.13 image, found nothing to migrate (because nothing was broken on 3.13), and exited clean.

Then I pulled 3.28, launched with up -d, and the gateway immediately went into a crash loop:

messages.tts: Unrecognized key: "openai"
messages.tts: Unrecognized key: "openai"
messages.tts: Unrecognized key: "openai"

The config that was perfectly valid on 3.13 was now invalid on 3.28. And doctor --fix on the new image couldn’t run because the gateway needed to be up first — and the gateway crashed because of the bad config. A catch-22.

The fix: stop all containers, edit the JSON directly on the VM filesystem while everything is down, then bring it back up.

docker compose down
# edit openclaw.json on the VM:
# "tts": { "openai": { ... } }
# becomes:
# "tts": { "providers": { "openai": { ... } } }
docker compose up -d

Gateway came up clean. Then doctor --fix ran fine — on a healthy instance, as it should.

The lesson for restore-openclaw-config.py: doctor --fix must run after the new image is up, not before. We updated the script to enforce this order.
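The manual edit above can also be expressed as a small, idempotent transformation. The sketch below operates on the TTS block only; where that block sits inside openclaw.json is not assumed here (the crash log suggests messages.tts), and the function name is hypothetical.

```python
import copy


def migrate_tts(tts: dict) -> dict:
    """Return a copy of the TTS block with a legacy top-level "openai" key
    moved under "providers". Already-migrated blocks come back unchanged."""
    migrated = copy.deepcopy(tts)
    if "openai" in migrated:
        legacy = migrated.pop("openai")
        providers = migrated.setdefault("providers", {})
        # Keep an existing providers.openai entry rather than overwriting it.
        providers.setdefault("openai", legacy)
    return migrated
```

Because the function is idempotent, running it on a config that was already migrated is harmless, which matters when the same script may be run both before and after an upgrade.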


Trap #3: The CLI profile is useless when the gateway is down

OpenClaw ships with an openclaw-cli profile in Docker Compose specifically for running CLI commands when the gateway has issues. Good idea in theory. The profile uses:

network_mode: "service:openclaw-gateway"

Which means the CLI container shares the gateway’s network namespace. If the gateway is down, the CLI container has no network. If the gateway is in a crash loop, you can’t attach to it.

In the TTS crash loop situation above, openclaw-cli was completely unavailable. The container would start and immediately lose connectivity.

Workaround: Python directly on the VM filesystem. Not elegant, but it works, and it doesn’t require any container to be healthy:

# On the VM, with all containers stopped
python3 restore-openclaw-config.py --dry-run
# inspect the output, then:
python3 restore-openclaw-config.py --apply

This is why restore-openclaw-config.py exists. It was already part of the stack for exactly this scenario. The migration validated that it works under real stress conditions.
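For readers who want a similar safety net, here is one way a dry-run/apply split could be sketched: render the proposed change as a diff, and only write when explicitly asked. The function names are hypothetical and this is an illustration of the pattern, not the script's real code.

```python
import difflib
import json


def preview_change(before: dict, after: dict) -> str:
    """Return a unified diff of two config dicts, rendered as sorted JSON."""
    old = json.dumps(before, indent=2, sort_keys=True).splitlines()
    new = json.dumps(after, indent=2, sort_keys=True).splitlines()
    diff = difflib.unified_diff(
        old, new, "openclaw.json", "openclaw.json (proposed)", lineterm=""
    )
    return "\n".join(diff)


def write_if_applied(path: str, after: dict, apply: bool) -> bool:
    """Write the config only when apply is True; report whether a write happened."""
    if not apply:
        return False
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(after, handle, indent=2)
        handle.write("\n")
    return True
```

Nothing here needs a running container: it reads and writes plain JSON on the VM filesystem, which is the property that made the workaround usable during the crash loop.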


The npm bug I didn’t experience (and why Docker is the reason)

OpenClaw 3.22 shipped with a packaging bug in the npm distribution that broke the Dashboard and the WhatsApp integration for users who had installed via npm install -g openclaw. The issue was patched in 3.22.1, but not before a wave of reports in the community.

I didn’t experience it. Not because I was careful, but because I’m not on npm. Neog runs entirely in Docker. My docker compose pull grabbed the corrected image, and there was nothing to debug.

This is the recurring argument for Docker over npm for anything that runs continuously: you’re insulated from host-level packaging issues, and you can roll back to a previous image in seconds if something is wrong. I knew this abstractly. The 3.22 situation made it concrete.


ClawHub: I looked, I didn’t touch

One of the headline features of recent OpenClaw releases is ClawHub — a centralized registry for skills and plugins, now the default over npm for plugin management. The openclaw doctor output flagged it immediately:

✓ 7 skills eligible for ClawHub sync
⚠ 45 skills with unresolved requirements (likely missing integrations)

I closed that section and moved on.

My skills are installed manually in ~/.openclaw/workspace/skills/. They work. The 45 with missing requirements are almost certainly community skills that expect integrations I haven’t configured — third-party calendars, CRMs, platforms I don’t use. They’re not broken, they’re just not relevant to my setup.

ClawHub is the next article. Not this one. Turning a migration session into a ClawHub evaluation session would have been a mistake: too many variables in motion at once. Neog is a production agent. I don’t introduce multiple changes simultaneously.


SearXNG vs. native search providers: I stayed put

OpenClaw 3.24+ ships with native integration for Exa, Tavily, Firecrawl, and xAI’s x_search. These are proper semantic search providers, natively wired into the OpenClaw tool layer. No bash workarounds required.

My current setup has tools.web.search.enabled = false in openclaw.json. The agent queries SearXNG through bash and curl; SearXNG runs as a Docker container on the internal neog-net network, so the agent’s search calls never leave that network. The setting survived the entire migration unchanged.

             SearXNG (current)   Exa                 Tavily
Cost         €0                  Paid (free tier)    Paid
API key      None                Required            Required
Privacy      100% local          Data leaves infra   Data leaves infra
Throughput   Unlimited           Rate-limited        Rate-limited

The only real argument for switching is result quality — Exa in particular does semantic search that SearXNG doesn’t. But that would mean paying for an API key, accepting rate limits, and routing my search queries through external servers.

I don’t have a concrete signal that SearXNG is failing me right now. Honest verdict: I looked at the native providers, understood what they offer, and made a deliberate choice to keep my setup. That’s different from ignoring the feature.
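For illustration, the kind of call the agent makes to SearXNG can be sketched in Python instead of bash and curl. This assumes the JSON output format is enabled in the SearXNG settings, and the base URL used in the usage note is a hypothetical internal hostname, not one taken from the Neog stack.

```python
import json
import urllib.parse
import urllib.request


def build_search_url(base_url: str, query: str) -> str:
    """Build a SearXNG search URL that asks for JSON output."""
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    return f"{base_url.rstrip('/')}/search?{params}"


def top_results(payload: dict, limit: int = 5) -> list:
    """Extract (title, url) pairs from a SearXNG JSON response."""
    results = payload.get("results", [])[:limit]
    return [(item.get("title", ""), item.get("url", "")) for item in results]


def search(base_url: str, query: str, limit: int = 5) -> list:
    """Query SearXNG over HTTP and return the top (title, url) pairs."""
    url = build_search_url(base_url, query)
    with urllib.request.urlopen(url, timeout=10) as response:
        return top_results(json.load(response), limit)
```

A call such as search("http://searxng:8080", "openclaw tts providers") needs no API key, which is the property the comparison table above turns on.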


What the migration hardened in restore-openclaw-config.py

Before this migration, the script handled the basics: restore openclaw.json from backup, restart the stack, verify the gateway came up. After this migration, it does more:

  • Enforces up -d over restart for any upgrade operation — checks whether the image tag changed and forces a full container recreation if it did.
  • Validates the TTS config structure before writing to disk — if the key structure doesn’t match the expected schema for the detected OpenClaw version, it flags the mismatch before the gateway starts.
  • Defers doctor --fix to post-launch — waits for a healthy gateway status before running the fix pass.

These aren’t theoretical improvements. They’re the direct result of three concrete failures during the 3.13 → 3.28 migration. The script now encodes those lessons.
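The version-aware TTS check in the second point could be sketched as follows. The cutoff tuple encodes this article's statement that the key moved in 3.22, the version string format is the one printed by openclaw --version above, and the function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumption: the tts.openai -> tts.providers.openai change landed in 3.22,
# as stated in this article.
TTS_PROVIDERS_SINCE = (2026, 3, 22)


def parse_version(output: str) -> Tuple[int, ...]:
    """Parse a string like 'OpenClaw 2026.3.13' into (2026, 3, 13)."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"no version found in {output!r}")
    return tuple(int(part) for part in match.groups())


def tts_mismatch(version: Tuple[int, ...], tts: dict) -> Optional[str]:
    """Return a message when the TTS block does not fit the version, else None."""
    if version >= TTS_PROVIDERS_SINCE and "openai" in tts:
        return 'legacy key "openai" must move under "providers"'
    return None
```

Comparing versions as integer tuples avoids the string-comparison trap where "3.9" would sort after "3.13".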


State of Neog on 3.28

The agent is running. The migration took one session, one crash loop, and three lessons that are now in the documentation and the restore script.

What I didn’t do: migrate to ClawHub, switch to native search providers, or touch anything that was working.

What I did do: upgrade cleanly from 3.13 to 3.28, harden the restore script, and document exactly what broke and why.

Next: ClawHub. That evaluation deserves its own session, without a migration running in parallel. 7 skills eligible, 45 with missing requirements — there’s a real question about whether the ClawHub workflow fits how Neog is built.


Update command, in the right order:

docker compose pull && docker compose up -d && \
docker exec neog-gateway openclaw doctor --fix

up -d first. doctor --fix after. Not the other way around.
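The one-liner runs doctor --fix as soon as up -d returns, without waiting for the gateway to settle. A scripted version could add that wait. The sketch below is one possible shape, with hypothetical function names; it treats a container state of "running" as good enough, which is a simplification.

```python
import subprocess
import time
from typing import Callable


def wait_until(probe: Callable[[], bool], timeout: float = 60.0,
               interval: float = 2.0) -> bool:
    """Poll probe() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def gateway_running(container: str = "neog-gateway") -> bool:
    """True when Docker reports the container state as 'running'."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Status}}", container],
        capture_output=True, text=True,
    )
    return result.returncode == 0 and result.stdout.strip() == "running"


def upgrade() -> None:
    """Pull, recreate, wait for the gateway, then run the fix pass."""
    subprocess.run(["docker", "compose", "pull"], check=True)
    subprocess.run(["docker", "compose", "up", "-d"], check=True)
    if not wait_until(gateway_running):
        raise RuntimeError("gateway is not running; skipping doctor --fix")
    subprocess.run(
        ["docker", "exec", "neog-gateway", "openclaw", "doctor", "--fix"],
        check=True,
    )
```

A container stuck in a crash loop can briefly report "running" between restarts, so a probe based on a container healthcheck, if the service defines one, would be stricter than the state check used here.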
