In the old stories, a craftsman would labour at the forge alone — hammer ringing, anvil singing, every defect found by their own two eyes. The work was good. The work was thorough. But between each hammer-stroke lay the dull pause of inspection, the frustrating loop of making a thing, looking at it, finding the flaw, and starting again. My experience with AI coding agents was, for a long time, this same cycle — but louder.
I’d prompt. It would write. I’d find the bugs. I’d reprompt. It would write again. I’d find more bugs. Reprompt again. And on it went, a treadmill of frustration that left me wondering whether the AI was actually helping or just generating work for both of us.
Then I gave it eyes. And arms. And the whole dynamic shifted.
The Problem With Being the Only Tester
When you’re working with an AI coding agent — in my case, OpenCode paired with MiniMax — you become the bottleneck by default. The AI writes code fast. Properly fast. But what it won’t do, unprompted, is verify that code works before handing it to you. You run the tests. You check the output. You find the edge case it didn’t think about. Then you feed that back and it writes another batch of code, and you test again.
For simple tasks this is fine. For a project like Adventchore — a real-time game engine with dice rolls, character positions, NPC interactions, and WebSocket state — it becomes exhausting. Every time I thought I’d caught all the edge cases, another surfaced. I was spending more time testing than the AI was spending writing, which rather defeated the purpose of having an AI in the first place.
The Lightbulb: Let It Test Its Own Work
Playwright had been sitting in my toolkit for a while. I’d used it before on other projects, mostly for end-to-end browser testing — the kind of thing you set up once and then forget about. But the idea of giving an AI agent access to it changed how I thought about the workflow entirely.
If the AI could run its own browser tests, it could verify its output before it ever reached me. Not just syntax-checking or linting — actually loading the page, clicking the buttons, reading the results, and comparing them against what it expected. It could catch its own mistakes in the same way a human tester would: by interacting with the thing, not by reading the source code.
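To make that concrete, here is roughly the shape of a check the agent can run end to end. It's a sketch, not the project's actual suite; the dev-server URL, button label, and `.game-board` selector are placeholders rather than real Adventchore markup.

```ts
import { test, expect } from '@playwright/test';

// A minimal sketch of what "giving the agent eyes" means in practice:
// load the page, click a button, read the result, compare against expectation.
// URL and selectors below are placeholders, not the real project's.
test('new game renders a board', async ({ page }) => {
  await page.goto('http://localhost:5173');

  // Interact with the app the way a human tester would.
  await page.getByRole('button', { name: 'New Game' }).click();

  // Read the result and assert it matches what was expected.
  await expect(page.locator('.game-board')).toBeVisible();
});
```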
How It Worked in Practice
The change was immediate and significant. Previously, the loop looked like this:
- Prompt the AI to implement a feature
- AI writes code
- I manually test it
- Find a bug
- Reprompt the AI with the bug description
- AI writes new code
- Test again
- Another bug — repeat
With Playwright in the loop, it became this:
- Prompt the AI to implement a feature
- AI writes code and writes the Playwright tests for it
- AI runs the tests — catches its own bugs — fixes them autonomously
- AI comes to me only with what it genuinely cannot resolve
- I assess the hard cases, make the calls, move on
The difference in step 3 is the whole point. The AI stopped handing me half-finished work and started handing me something it had already stress-tested against its own criteria. Fewer bugs reached me. The ones that did were genuinely interesting problems — the kind that need human judgment, not just another code iteration.
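Wiring this up is mostly a matter of giving the agent one command it can run unattended. A minimal Playwright config along these lines (the port, command, and directory layout are assumptions about my setup, not a recipe) lets it boot the app and run the whole suite with `npx playwright test`:

```ts
import { defineConfig } from '@playwright/test';

// Assumed layout: e2e tests in tests/e2e, Vue dev server on port 5173,
// Go backend started separately. Adjust to your own project.
export default defineConfig({
  testDir: './tests/e2e',
  use: { baseURL: 'http://localhost:5173' },
  webServer: {
    command: 'npm run dev',          // boots the frontend before the tests run
    url: 'http://localhost:5173',
    reuseExistingServer: true,       // don't restart it on every pass
  },
});
```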
What the AI Learned to Catch
Once it had the test harness, I watched the AI start catching things I’d previously had to find manually:
- Null reference errors when an element wasn’t present on the page — it wrote a test that asserted the element existed before trying to interact with it
- State not updating after a dice roll — it wrote a test that read the dice value before and after the roll and asserted the change
- WebSocket message ordering issues — it wrote a test that fired a sequence of actions and verified the resulting board state
These weren’t edge cases I fed it. It identified them itself, wrote tests for them, fixed the underlying code, and then showed me the clean run. I went from being the primary tester to being the escalation path. That’s not a small shift.
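For a flavour of what those self-written tests looked like, here is a sketch of the dice-roll case from the list above. The `#dice-value` locator and the Roll button label are hypothetical stand-ins for the real ones.

```ts
import { test, expect } from '@playwright/test';

// Sketch of the "state not updating after a dice roll" check described above.
// Selectors are hypothetical; goto('/') relies on the baseURL from the config.
test('dice value changes after a roll', async ({ page }) => {
  await page.goto('/');

  const dice = page.locator('#dice-value');
  await expect(dice).toBeVisible();          // assert the element exists before touching it

  const before = await dice.textContent();
  await page.getByRole('button', { name: 'Roll' }).click();

  // Playwright retries this assertion, which also absorbs WebSocket latency.
  await expect(dice).not.toHaveText(before ?? '');
});
```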
The Result
The project moved faster. Not because the AI wrote more code — it was always fast at that. It moved faster because the feedback loop shortened. Bugs that would have taken me three or four reprompt cycles to resolve were being caught and fixed by the AI in a single pass. I was spending my time on the actual design decisions, the tricky logic, the things that genuinely need a human to reason about.
It also changed how I prompted. Once the AI knew it would be testing its own output, it started writing more defensively — adding null checks without being asked, handling edge cases upfront rather than waiting to be corrected. The tests became a kind of specification it held itself to.
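As an illustration (not the project's actual code), the shift was roughly from reaching straight for a value to guarding for its absence first:

```ts
// Illustrative only: the kind of guard the agent started adding unprompted
// once it knew a test would exercise the "element missing" path.
function readDiceValue(root: Document): number | null {
  const el = root.querySelector('#dice-value');   // hypothetical selector
  if (!el || el.textContent === null) {
    return null;                                  // a missing element is a state, not a crash
  }
  const value = Number.parseInt(el.textContent, 10);
  return Number.isNaN(value) ? null : value;
}
```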
What I’d Tell Myself at the Start
If you’re using an AI coding agent and finding the reprompt cycle frustrating, the answer isn’t a better model or a cleverer prompt. It’s adding a test harness that the AI can run itself. Give it eyes and arms. Let it look at its own work before showing it to you. The difference between an AI that generates code and one that solves problems is mostly whether it has to hand its work off to a human for verification — or whether it can verify itself.
The craftsperson at the forge is still there. But now they’ve got an apprentice who checks the blade before presenting it. And that makes all the difference.
Skills Forged
- OpenCode — AI coding agent with MiniMax as the model backend
- Playwright — Browser automation giving the AI a way to verify its own output
- Agentic Workflows — Designing loops where AI tests itself before escalating to human review
- Go — Game engine backend
- Vue — Frontend UI and game board rendering
- Real-Time Systems — WebSocket state management and turn-based game logic
Fare thee well, dear reader. Until the next dispatch from the frontier!

