Automation
Why a 1500+ article migration needed a workflow, not just Puppeteer
Notes from turning a large WordPress migration into a resumable Node.js automation pipeline with logs, checkpoints, and browser boundaries.
Large content migrations are rarely technically glamorous, but they are good tests of engineering judgment. A manual migration can look faster at the start, then become slow, error-prone, and difficult to verify once the count grows.
One automation project involved migrating 1500+ WordPress articles. The useful decision was to stop treating it as repeated browser work and start treating it as a controlled workflow.
The bottleneck was not whether Puppeteer could click buttons. The bottleneck was whether the process could fail, resume, and explain itself without losing track of which of the 1500+ articles were done.
The shape of the automation
The script work centered on a few practical needs:
- reading source content consistently
- preserving titles, body structure, categories, and metadata where possible
- handling browser sessions with Puppeteer
- recording progress so failed runs could resume
- separating extraction, transformation, and publishing steps
The goal was not to write a clever script. The goal was to reduce manual error and make the migration observable.
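As a rough illustration of the separation described above, the three steps can be kept as separate functions that pass plain objects to each other. This is a minimal sketch, not the original script: the function names, the record shape, and the cleanup rules are all assumptions, and `publish` is synchronous here only to keep the example short (a real Puppeteer publisher would be async).

```javascript
// Hypothetical stage functions; names and record shapes are illustrative only.

// Extraction: turn a raw source record into a neutral intermediate shape.
function extract(raw) {
  return {
    sourceId: String(raw.id),
    title: raw.title ?? "",
    html: raw.content ?? "",
    categories: raw.categories ?? [],
  };
}

// Transformation: pure content cleanup, with no browser and no I/O.
function transform(article) {
  return {
    ...article,
    title: article.title.trim(),
    // Collapse runs of three or more newlines into a single blank line.
    html: article.html.replace(/\n{3,}/g, "\n\n").trim(),
    // Normalize and de-duplicate category names.
    categories: [...new Set(article.categories.map((c) => c.trim().toLowerCase()))],
  };
}

// Publishing: the only stage with side effects. The publisher is injected,
// so the loop can be exercised without a browser.
function runPipeline(rawItems, publish) {
  const results = [];
  for (const raw of rawItems) {
    const article = transform(extract(raw));
    try {
      publish(article);
      results.push({ sourceId: article.sourceId, status: "published", lastError: null });
    } catch (err) {
      // One failing article is recorded and the run continues.
      results.push({ sourceId: article.sourceId, status: "failed", lastError: err.message });
    }
  }
  return results;
}
```

Because only `publish` touches the outside world, the extraction and transformation steps can be checked against sample records long before a browser session is involved.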
The checkpoint became the contract
For a migration like this, a small progress file matters more than a polished abstraction:
{
  "sourceId": "post-1482",
  "title": "Example article title",
  "status": "published",
  "attempts": 2,
  "lastError": null
}
That kind of state lets the workflow answer practical questions: which articles are done, which ones failed, which ones need review, and where the next run should start.
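A progress file like this can be handled with a few small functions. The sketch below is one possible shape, assuming the records are stored in a single JSON file keyed by `sourceId`; the function names, the `pending` and `failed` statuses, and the `maxAttempts` limit are assumptions, not details from the original project.

```javascript
const fs = require("node:fs");

// Load the checkpoint map (sourceId -> record). A missing file means a fresh run.
function loadCheckpoints(file) {
  if (!fs.existsSync(file)) return {};
  return JSON.parse(fs.readFileSync(file, "utf8"));
}

// Write to a temporary file and rename it over the real one, so a crash
// mid-write is less likely to leave a half-written checkpoint file.
function saveCheckpoints(file, checkpoints) {
  const tmp = `${file}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(checkpoints, null, 2));
  fs.renameSync(tmp, file);
}

// Record the outcome of one attempt for one article.
function recordAttempt(checkpoints, sourceId, title, status, error = null) {
  const prev = checkpoints[sourceId] ?? { sourceId, attempts: 0 };
  checkpoints[sourceId] = {
    ...prev,
    title,
    status,
    attempts: prev.attempts + 1,
    lastError: error,
  };
  return checkpoints[sourceId];
}

// Answer the practical questions: how many articles are in each status,
// and which ones the next run should pick up.
function summarize(checkpoints, allSourceIds, maxAttempts = 3) {
  const counts = {};
  const todo = [];
  for (const id of allSourceIds) {
    const rec = checkpoints[id];
    const status = rec ? rec.status : "pending";
    counts[status] = (counts[status] ?? 0) + 1;
    const retryable = status === "failed" && rec.attempts < maxAttempts;
    if (status === "pending" || retryable) todo.push(id);
  }
  return { counts, todo };
}
```

Saving after every article rather than at the end of the run is what makes a crash cheap: the next run calls `summarize` and continues with whatever is in `todo`.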
What mattered most
Retries and logging mattered more than speed. A fast migration that fails silently is worse than a slower one that tells you exactly which article failed and why.
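One way to make failures loud is to wrap each article's work in a retry helper that logs every attempt and returns the final error instead of swallowing it. This is a sketch under assumptions: `withRetries`, its options, and the linear backoff are illustrative choices, not the original implementation.

```javascript
// Retry an async task, logging each attempt with enough context to review later.
// `log` is injected so entries can go to stdout, a file, or a test buffer.
async function withRetries(sourceId, task, options = {}) {
  const { maxAttempts = 3, delayMs = 1000, log = console.log } = options;
  let lastError = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const value = await task(attempt);
      log({ sourceId, attempt, outcome: "ok" });
      return { ok: true, attempts: attempt, value };
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
      log({ sourceId, attempt, outcome: "error", error: lastError });
      if (attempt < maxAttempts) {
        // Linear backoff: wait a little longer after each failure.
        await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
      }
    }
  }
  // Return the failure instead of hiding it, so the caller can record it.
  return { ok: false, attempts: maxAttempts, lastError };
}
```

The returned `attempts` and `lastError` values map directly onto the fields of a per-article progress record, so the caller can persist exactly which article failed and why.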
I also learned to keep transformation logic separate from browser automation. Browser automation is already fragile because it depends on page structure, timing, and logged-in state. Mixing it with content cleanup makes the whole script harder to reason about.
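A thin publishing adapter is one way to draw that boundary. In the sketch below, `publishArticle` is the only function that touches the browser, and it receives an already cleaned article. The `page` argument is expected to behave like a Puppeteer `Page` (only `goto`, `waitForSelector`, `type`, and `click` are used); the editor URL and the selectors are placeholders that would need to match the real target site.

```javascript
// Hypothetical selectors for the target editor form; adjust to the real page.
const SELECTORS = { title: "#title", body: "#content", publish: "#publish" };

// The only function that talks to the browser. It does no content cleanup:
// it receives an article that has already been transformed.
async function publishArticle(page, editorUrl, article) {
  await page.goto(editorUrl);
  // Wait for the form so typing does not race the page load.
  await page.waitForSelector(SELECTORS.title);
  await page.type(SELECTORS.title, article.title);
  await page.type(SELECTORS.body, article.html);
  await page.click(SELECTORS.publish);
}
```

Since the adapter depends only on four `page` methods, it can be exercised with a small fake object that records calls, and a change in page structure means editing `SELECTORS` rather than the cleanup code.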
The tradeoff
A workflow-first migration takes more setup. You have to define statuses, write logs, persist progress, and handle partial failure. If there are only ten articles, that is probably overkill.
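Defining statuses can be as small as a table of allowed transitions. The status names below are assumptions for illustration (the article itself only shows `published`), but the idea is that an article can never jump to a state the workflow does not expect.

```javascript
// Hypothetical status set and allowed transitions.
const TRANSITIONS = {
  pending: ["extracted", "failed"],
  extracted: ["transformed", "failed"],
  transformed: ["published", "failed", "needs_review"],
  failed: ["pending", "needs_review"],
  needs_review: ["pending"],
  published: [],
};

// True when moving from one status to another is allowed.
function canTransition(from, to) {
  return (TRANSITIONS[from] ?? []).includes(to);
}

// Return a copy of the record with the new status, or throw on an illegal move.
function transition(record, to) {
  if (!canTransition(record.status, to)) {
    throw new Error(`Illegal status change for ${record.sourceId}: ${record.status} -> ${to}`);
  }
  return { ...record, status: to };
}
```

Making `published` a terminal state is a deliberate choice in this sketch: a resumed run cannot publish the same article twice by accident.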
At 1500+ articles, it becomes the safer path. The script is allowed to be slower as long as it is observable, resumable, and reviewable.
The practical lesson
Automation becomes valuable when it turns repeated work into a reviewable process. For migration work, that means checkpoints, logs, resumability, and clear boundaries between steps.