In the old stories, a craftsman would labour at the forge alone — hammer ringing, anvil singing, every defect found by their own two eyes. The work was good. The work was thorough. But between each hammer-stroke lay the dull pause of inspection, the frustrating loop of making a thing, looking at it, finding the flaw, and starting again. My experience with AI coding agents was, for a long time, this same cycle — but louder.
I’d prompt. It would write. I’d find the bugs. I’d reprompt. It would write again. I’d find more bugs. Reprompt again. And on it went, a treadmill of frustration that left me wondering whether the AI was actually helping or just generating work for both of us.
Then I gave it eyes. And arms. And the whole dynamic shifted.
The Problem With Being the Only Tester
When you’re working with an AI coding agent — in my case, OpenCode paired with MiniMax — you become the bottleneck by default. The AI writes code fast. Properly fast. But what it won’t do, unprompted, is verify that code works before handing it to you. You run the tests. You check the output. You find the edge case it didn’t think about. Then you feed that back and it writes another batch of code, and you test again.
For simple tasks this is fine. For a project like Adventchore — a real-time game engine with dice rolls, character positions, NPC interactions, and WebSocket state — it becomes exhausting. Every time I thought I’d caught all the edge cases, another surfaced. I was spending more time testing than the AI was spending writing, which rather defeated the purpose of having an AI in the first place.
The Lightbulb: Let It Test Its Own Work
Playwright had been sitting in my toolkit for a while. I’d used it before on other projects, mostly for end-to-end browser testing — the kind of thing you set up once and then forget about. But the idea of giving an AI agent access to it changed how I thought about the workflow entirely.
If the AI could run its own browser tests, it could verify its output before it ever reached me. Not just syntax-checking or linting — actually loading the page, clicking the buttons, reading the results, and comparing them against what it expected. It could catch its own mistakes in the same way a human tester would: by interacting with the thing, not by reading the source code.
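To make that concrete, here is roughly the shape of a check the agent can run end to end. It's a sketch, not the project's actual suite; the dev-server URL, button label, and `.game-board` selector are placeholders rather than real Adventchore markup.

```ts
import { test, expect } from '@playwright/test';

// A minimal sketch of what "giving the agent eyes" means in practice:
// load the page, click a button, read the result, compare against expectation.
// URL and selectors below are placeholders, not the real project's.
test('new game renders a board', async ({ page }) => {
  await page.goto('http://localhost:5173');

  // Interact with the app the way a human tester would.
  await page.getByRole('button', { name: 'New Game' }).click();

  // Read the result and assert it matches what was expected.
  await expect(page.locator('.game-board')).toBeVisible();
});
```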
How It Worked in Practice
The change was immediate and significant. Previously, the loop looked like this:
- Prompt the AI to implement a feature
- AI writes code
- I manually test it
- Find a bug
- Reprompt the AI with the bug description
- AI writes new code
- Test again
- Another bug — repeat
With Playwright in the loop, it became this:
- Prompt the AI to implement a feature
- AI writes code and writes the Playwright tests for it
- AI runs the tests — catches its own bugs — fixes them autonomously
- AI comes to me only with what it genuinely cannot resolve
- I assess the hard cases, make the calls, move on
The difference in step 3 is the whole point. The AI stopped handing me half-finished work and started handing me something it had already stress-tested against its own criteria. Fewer bugs reached me. The ones that did were genuinely interesting problems — the kind that need human judgment, not just another code iteration.
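Wiring this up is mostly a matter of giving the agent one command it can run unattended. A minimal Playwright config along these lines (the port, command, and directory layout are assumptions about my setup, not a recipe) lets it boot the app and run the whole suite with `npx playwright test`:

```ts
import { defineConfig } from '@playwright/test';

// Assumed layout: e2e tests in tests/e2e, Vue dev server on port 5173,
// Go backend started separately. Adjust to your own project.
export default defineConfig({
  testDir: './tests/e2e',
  use: { baseURL: 'http://localhost:5173' },
  webServer: {
    command: 'npm run dev',          // boots the frontend before the tests run
    url: 'http://localhost:5173',
    reuseExistingServer: true,       // don't restart it on every pass
  },
});
```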
What the AI Learned to Catch
Once it had the test harness, I watched the AI start catching things I’d previously had to find manually:
- Null reference errors when an element wasn’t present on the page — it wrote a test that asserted the element existed before trying to interact with it
- State not updating after a dice roll — it wrote a test that read the dice value before and after the roll and asserted the change
- WebSocket message ordering issues — it wrote a test that fired a sequence of actions and verified the resulting board state
These weren’t edge cases I fed it. It identified them itself, wrote tests for them, fixed the underlying code, and then showed me the clean run. I went from being the primary tester to being the escalation path. That’s not a small shift.
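For a flavour of what those self-written tests looked like, here is a sketch of the dice-roll case from the list above. The `#dice-value` locator and the Roll button label are hypothetical stand-ins for the real ones.

```ts
import { test, expect } from '@playwright/test';

// Sketch of the "state not updating after a dice roll" check described above.
// Selectors are hypothetical; goto('/') relies on the baseURL from the config.
test('dice value changes after a roll', async ({ page }) => {
  await page.goto('/');

  const dice = page.locator('#dice-value');
  await expect(dice).toBeVisible();          // assert the element exists before touching it

  const before = await dice.textContent();
  await page.getByRole('button', { name: 'Roll' }).click();

  // Playwright retries this assertion, which also absorbs WebSocket latency.
  await expect(dice).not.toHaveText(before ?? '');
});
```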
The Result
The project moved faster. Not because the AI wrote more code — it was always fast at that. It moved faster because the feedback loop shortened. Bugs that would have taken me three or four reprompt cycles to resolve were being caught and fixed by the AI in a single pass. I was spending my time on the actual design decisions, the tricky logic, the things that genuinely need a human to reason about.
It also changed how I prompted. Once the AI knew it would be testing its own output, it started writing more defensively — adding null checks without being asked, handling edge cases upfront rather than waiting to be corrected. The tests became a kind of specification it held itself to.
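As an illustration (not the project's actual code), the shift was roughly from reaching straight for a value to guarding for its absence first:

```ts
// Illustrative only: the kind of guard the agent started adding unprompted
// once it knew a test would exercise the "element missing" path.
function readDiceValue(root: Document): number | null {
  const el = root.querySelector('#dice-value');   // hypothetical selector
  if (!el || el.textContent === null) {
    return null;                                  // a missing element is a state, not a crash
  }
  const value = Number.parseInt(el.textContent, 10);
  return Number.isNaN(value) ? null : value;
}
```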
What I’d Tell Myself at the Start
If you’re using an AI coding agent and finding the reprompt cycle frustrating, the answer isn’t a better model or a cleverer prompt. It’s adding a test harness that the AI can run itself. Give it eyes and arms. Let it look at its own work before showing it to you. The difference between an AI that generates code and one that solves problems is mostly whether it has to hand its work off to a human for verification — or whether it can verify itself.
The craftsperson at the forge is still there. But now they’ve got an apprentice who checks the blade before presenting it. And that makes all the difference.
Skills Forged
- OpenCode — AI coding agent with MiniMax as the model backend
- Playwright — Browser automation giving the AI a way to verify its own output
- Agentic Workflows — Designing loops where AI tests itself before escalating to human review
- Go — Game engine backend
- Vue — Frontend UI and game board rendering
- Real-Time Systems — WebSocket state management and turn-based game logic
Fare thee well, dear reader. Until the next dispatch from the frontier!

