I’m a geek. Always have been, always will be.
One of my geekier traits is my fascination with computer chess. It was an obsession that started when I was about 14 years old, running tournaments between ChessMaster 4000 personalities on a 386 16MHz PC that cost my parents way too much money. While the popular kids were doing whatever popular kids did (sports, I think) I was busy meticulously documenting my self-hosted computer chess tournament results.
So naturally, I’ve always wanted to write a chess engine of my own. I’d been thinking about that more lately. I had a grand vision. This theoretical chess engine of mine wouldn’t just be any chess engine, it would be THE chess engine. A world-class, Stockfish-crushing, CPU-frying monster that would make Magnus Carlsen question his life choices and force the CCRL leaderboard to create a new category just for me.
As it so happened, I found myself with some time on my hands. I had my laptop, WiFi, a rented RV parked at a family cabin in almost Canada (far northern Minnesota), and I decided this would be a good opportunity to finally go for it. There was one small flaw in the plan, though. I’m not exactly an expert in chess engine development, and I don’t know C++. I can toss around some Python and bash to get things done, but that’s about it.
Enter AI.
or is it AGI? ASI? LLM?
Whatever The Marketing Department has decided on. That thing.
AI promised a coding revolution. With tools like Cursor, Cline, and Claude CLI at my fingertips, surely building my dream chess engine would be just a few prompts away. The AI (we’re gonna go with AI) would handle all of the complex stuff. I’d just orchestrate. Direct the flow, if you will.
The plan was to vibe my way to computer chess fame and glory.
That was not quite what happened.
Part 1: When Your Chess Engine Plays Like a Drunk Toddler
Day 1: The Grand Delusion
I started with modest goals given the amazing state of AI today. We were just going to vibe out a quick 3500-Elo engine. Something that could compete with Stockfish 16 (~3600 Elo) and get my name up in lights on CCRL. You know, reasonable expectations for a Sunday afternoon.
Twenty minutes in and more C++ than I’d ever seen in my life had whizzed across my screen. The AI told me I needed alpha-beta pruning and transposition tables, maybe some Zobrist.
Sounds good to me.
You know what, let’s double it, I want it to be the best. Two Zobrist for me, please.
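For the record, one Zobrist turned out to be plenty. Zobrist hashing assigns a random 64-bit key to every (piece, square) combination and XORs together the keys for whatever is on the board; a transposition table uses the result to recognize positions it has already searched. A minimal sketch in Python (the piece and square numbering is my own assumption for illustration, not anything from our engine):

```python
import random

rng = random.Random(20240801)  # fixed seed: reproducible keys for this sketch

# One 64-bit key per (piece type, square): 12 piece types x 64 squares.
ZOBRIST = [[rng.getrandbits(64) for _ in range(64)] for _ in range(12)]

def hash_position(pieces):
    """pieces: dict of square (0-63) -> piece type (0-11). XOR of all keys."""
    h = 0
    for square, piece in pieces.items():
        h ^= ZOBRIST[piece][square]
    return h

def hash_after_move(h, piece, from_sq, to_sq):
    """Incremental update: XOR out the old square's key, XOR in the new."""
    return h ^ ZOBRIST[piece][from_sq] ^ ZOBRIST[piece][to_sq]
```

The beauty is the incremental update: making or unmaking a move is the same two XORs, so the hash never needs a full recompute.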
The first red flag that perhaps this was more complicated than I thought came when our new engine decided to invent its own chess variant. I had loaded up HIARCS Chess Explorer (yes, I use a Mac for chess development, judge me) and added my engine, excited to finally see what we’d built.
The excitement didn’t last long. Pieces weren’t just moving, they were performing what could only be described as teleportation. The rook on a1 would suddenly end up on h8 without explanation. Pawns moved backward, forward, sideways. The king decided castling meant “I can move wherever I want.”
Philosophically, it was hard to argue with that. It is a King, after all.
But the pièce de résistance was the opening strategy. My engine consistently opened with 1. Nh3. Not once, mind you, but every game. When playing Black: 1…Na6. The knights rushed the corners like they were trying to escape the board entirely. Either this was a revolutionary opening strategy that centuries of chess theory had somehow missed, or they were tipped off that I was running the show and they decided they just wanted out.
I asked Claude what it thought might be going on, and it suggested the lack of an opening book (pre-populated initial moves based on sound chess theory) was leaving the evaluation function to fend for itself in the opening position. Then it added this gem, simultaneously insulting both itself as the author and me as the architect.
“The evaluation function is so broken that it’s actually impressive the engine can play at all.”
I double-checked the Anthropic pricing page. Yep, $15 per million tokens. Thanks, Claude.
Day 2: Vibing Harder with Agents
What do you do when your chess engine development dreams start to come crashing down? We vibe harder, of course.
If one AI assistant couldn’t build a working chess engine, surely one AI assistant with seven agents could. (7+1) > 1. It’s basic math.
I read up on the Claude CLI “/agent” command, and I created the agent Dream Team:
- An architecture specialist
- A move generation expert
- A search algorithm guru
- An evaluation function wizard
- A debugging champion
- A C++ performance optimizer
- A UCI protocol master
I set up a Docker environment for Claude to operate in because hitting “Enter” on permissions prompts was beneath me now.
Within 30 minutes I had hit Claude’s usage limits.
Claude Pro. Then Max $100. Then Max $200. I was doubling down on my dreams; I couldn’t let little things like credit card statements get in the way.
It was worth it though. The agents began to collaborate beautifully. They generated thousands of lines of code. They implemented perft testing to make sure move generation was sound. They celebrated when we hit 90% perft(5) accuracy on the starting position.
Here’s the problem with 90%. It sounds impressive until you realize that 90% of chess is kind of like 90% of a bicycle. 90% of chess is not chess. It’s something else entirely, something that looks like chess after it was bitten by a rabid raccoon.
It meant that, among other things, my engine couldn’t detect check. “Check” in chess is like a big, flashing warning light saying “Hey you! Be very thoughtful about your choices right now!” The engine literally couldn’t comprehend the concept of the king being in danger.
Somehow this seemed metaphorically appropriate.
Dream Team FTW.
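For what it’s worth, the perft test the agents were celebrating is only a few lines of code: it counts every leaf of the legal-move tree to a fixed depth and compares the total against published values (perft(5) from the starting position should be exactly 4,865,609, not 90% of it). A sketch over a hypothetical python-chess-style board interface; the ToyBoard stand-in is mine, there only so the function can actually run without a real move generator:

```python
def perft(board, depth):
    """Count leaf nodes of the legal-move tree to the given depth."""
    if depth == 0:
        return 1
    nodes = 0
    for move in board.legal_moves():
        board.push(move)                  # make the move
        nodes += perft(board, depth - 1)  # count the subtree
        board.pop()                       # unmake it
    return nodes

class ToyBoard:
    """Stand-in with a push/pop interface: every position has exactly
    three legal 'moves', so perft(d) == 3**d."""
    def legal_moves(self):
        return range(3)
    def push(self, move):
        pass
    def pop(self):
        pass
```

The catch, as I learned, is that perft only proves anything when it matches exactly; any mismatch means the move generator is broken somewhere.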
Day 3: The Debugging Marathon
ASan (AddressSanitizer).
I really had no idea what I was doing anymore.
I was so far down into the dungeons of C++ that I couldn’t remember what Python even looked like. Memory corruption had turned my chess engine into a shape-shifter. A white pawn would transform into a black queen mid-game and bishops would change color based on their mood. The position after move 10 looked nothing like the position that should exist after those moves.
The Dream Team and I went on a side-quest, spending six hours converting between the UCI notation standard and internal move representation. For context, that’s longer than most human world championship games.
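The string side of that conversion is conceptually tiny, which made the six hours sting all the more. A sketch of UCI move parsing (the 0-63 square indexing here is an assumption for illustration; our internal representation was its own adventure):

```python
def parse_uci(move):
    """Parse a UCI move string like 'e2e4' or 'e7e8q' into
    (from_square, to_square, promotion), squares as 0-63 (a1=0 ... h8=63)."""
    def square(file_char, rank_char):
        return (int(rank_char) - 1) * 8 + (ord(file_char) - ord("a"))
    from_sq = square(move[0], move[1])
    to_sq = square(move[2], move[3])
    promotion = move[4] if len(move) > 4 else None  # 'q', 'r', 'b', or 'n'
    return from_sq, to_sq, promotion

def to_uci(from_sq, to_sq, promotion=None):
    """Inverse: squares back to the UCI string."""
    def name(sq):
        return "abcdefgh"[sq % 8] + str(sq // 8 + 1)
    return name(from_sq) + name(to_sq) + (promotion or "")
```

Round-tripping every generated move through both directions is a cheap sanity check on the notation layer.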
Then came the perspective bug. It just so happened that the engine was evaluating positions from the wrong side. It would make moves that helped its opponent win. “Here, let me put my queen where you can take it for free. You’re welcome!”
In the middle of this chaos, VS Code just quits. Boop. Gone. The tool I was using to fix my tools had abandoned me.
I laughed because the alternative was admitting that this wasn’t working, and admission meant stopping, and stopping meant accepting that maybe AI can’t actually do everything, and that’s a thought I wasn’t ready to entertain at my current Max subscription tier.
Day 4: Finally Using Git
Four days in, a thought occurred to me: “Maybe I should track these changes.”
Hey Siri, how do I initialize a git repo again?
Oh. Right. That makes sense.
Can you tell I’m not a classically-trained developer.. or really, any sort of developer?
About halfway into the day I’d implemented Static Exchange Evaluation, only to watch positions corrupt themselves during calculation. Pieces would evaluate captures, decide on the best sequence, then execute something entirely different. Kind of like planning to cook dinner and then accidentally building a spaceship. (Not a good spaceship, either.)
My initial plan, in all its humility, was to test my engine against HIARCS 14, a world-champion chess engine from 2012, before stepping up to the likes of Stockfish or Berserk. I figured that by Day 4, surely we’d moonwalk over the lifeless digital husk of a former world champion. After all, 2012 was the iPhone 5 era. World champion, sure, but come on, that engine was ancient.
One tiny hiccup. The engine and I were not putting the AI-vibe-chess smackdown on HIARCS 14. Certainly not beating it. Not even matching it. It was brutal. I started the day expecting flawless victory. I ended it hoping my engine would simply exist in the same universe as HIARCS 14.
Kimi K2, or When All Else Fails, Hire a Consultant
Desperate times called for desperate measures. I was taking a page out of my Corporate America Leadership Training Manual. When times get tough, pay someone $500 an hour to make it all go away.
I brought in Kimi K2, a top open-source model recently released by Moonshot AI, for a consultation. Now I had two full-fledged AIs.
2 > 1. Once again, the math checks out.
Together, we discovered a phantom move bug. The engine would announce f4xe5 (pawn takes pawn) but execute f4xf5 (pawn takes… nothing, because f5 was empty). The engine’s moves were ghosts, violating the foundations of the UCI protocol and haunting the board with their non-existence.
ASan came back again and finally revealed the truth: buffer overflows everywhere. Pointers pointing into the void. State corruption. Data structures eating themselves.
This AI-coded center could not hold.
Part 2: Starting Small: The Mate-in-2 Revelation
Marcel van Kervinck’s Wisdom
The late Marcel van Kervinck was a big contributor to numerous areas of technology, including computer chess. Before his passing in 2020, he had written and shared the source code for several chess engines, and he wrote his Master’s thesis on an engine he developed called Rookie.

I decided I should probably learn a bit more about what I was attempting to do (brilliant and timely insight, I know..), so I read his thesis on Rookie. I understood about half of it. From the half I did understand, something jumped out at me: his work had actually evolved from a mate-in-2 puzzle solver. This approach was different, and it made sense. Start small, keep it simple, and grow from there. That’s what I’d do! I’d start with a mate-in-2 solver. Get that working perfectly. Then expand.
This wasn’t giving up. This was growing up.
You don’t learn to run by entering a marathon. You don’t learn to swim by jumping off a cruise ship. And you don’t build a chess engine by trying to beat Stockfish (or even HIARCS 14) on your first go.
Python Implementation Success
I started over. A nice, shiny new git repo (yep, day 1 version control.. he’s learning..). Python. Clean architecture from the start. No agents. No Docker. Just me, my buddy Claude, the language I actually understood, and a clear goal: solve mate-in-2 puzzles.
The 0x88 board representation made sense once Claude and I stopped trying to be clever. Full rule implementation that actually worked. Wouldn’t you know it… Every piece moved legally. Pawns stayed pawns and the bishops remembered their color.
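For the curious, 0x88 puts the board in a 16-wide array, square = rank * 16 + file, and every off-board index shares a bit with the mask 0x88, so a single AND replaces four edge-of-board comparisons. A minimal sketch of the idea (knight moves on an empty board; this is illustrative, not our exact code):

```python
# 0x88 board: square = rank * 16 + file. A square is off-board exactly
# when it has the 0x80 bit (rank overflow) or 0x08 bit (file overflow) set.
def on_board(sq):
    return (sq & 0x88) == 0

def square_name(sq):
    return "abcdefgh"[sq & 7] + str((sq >> 4) + 1)

# Knight offsets in the 16-wide layout.
KNIGHT_DELTAS = (33, 31, 18, 14, -14, -18, -31, -33)

def knight_moves(sq):
    """Knight destinations from sq on an otherwise empty board."""
    return [sq + d for d in KNIGHT_DELTAS if on_board(sq + d)]
```

From e4 (square 52) a knight gets all eight destinations; from a1 (square 0), the mask test correctly leaves only b3 and c2.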
Testing with known positions was satisfying in a way the previous week hadn’t been. Position: Black to move, find mate in 2. The solver would think, search, and announce: “Bb5+ Nc4 Rd2#.” The vibe was now “progress”.
Part 3: When You (Kinda) Actually Know What You’re Doing
Serena MCP and Dev Containers
We can put a big green check-mark emoji down for Python.
It was time to put the big boy pants back on. C++ for performance, but with some safety nets. I needed something to help give Claude a bit more sense of organization, and to help us through those horrible context compactions. Serena MCP seemed like a fantastic option, and I’d wanted to get some MCP servers running anyway. I also realized rolling my own docker container for Claude to operate in was reinventing the wheel. VSCode has supported Dev containers forever, and as it so happens, Anthropic actually provides one for us for this exact purpose. Dev containers meant isolation. When AI writes system-level code, you want walls between it and your actual machine.
I also tempted the fates by enabling Claude Code’s “YOLO” mode. The name should have been a warning. But this time, I knew what I was doing. Or at least, I knew enough to know what I didn’t know. The known unknowns, if you will.
There were also unknown unknowns. The ARM64-versus-AMD64 container adventure on the Mac was its own special hell: it turns out clangd and Serena MCP don’t play nicely with ARM64 in certain configurations. Hours of debugging environment issues before writing a single line of chess code.
Critical Bugs That Matter
My C++ mate-in-2 solver was nowhere near as easy as the Python one had been. Kings would leap right onto squares controlled by pawns because the attack detection was checking the wrong direction. White pawns moving up the board control the squares diagonally ahead of them. We had it backwards, so our pawns were controlling the squares behind them, and other pieces happily wandered onto squares ready to be captured. The kings were suicidal.
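If the board is 0x88 (as in the Python version), the fix comes down to two pairs of constants: white pawns attack +15 and +17 (up the board), black pawns -15 and -17. A sketch of the corrected check, with the caveat that this is an illustration rather than our exact C++ code:

```python
def on_board(sq):
    return (sq & 0x88) == 0  # standard 0x88 off-board test

# White pawns attack diagonally toward higher ranks; black toward lower.
PAWN_ATTACK_DELTAS = {"white": (15, 17), "black": (-15, -17)}

def pawn_attacks(sq, color):
    """Squares attacked by a pawn of the given color standing on sq."""
    return [sq + d for d in PAWN_ATTACK_DELTAS[color] if on_board(sq + d)]

def attacked_by_pawn(sq, color, pawn_squares):
    """Is sq attacked by any pawn of `color` among pawn_squares?"""
    return any(sq in pawn_attacks(p, color) for p in pawn_squares)
```

Flip those signs and you get exactly our bug: pawns guarding the squares behind them while kings stroll into real danger.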
Then came the underpromotion bug. When a pawn promoted, it was anyone’s guess what would appear. Sometimes a queen. Sometimes a knight. Sometimes nothing. Sometimes two nothings.
Eventually, though, Claude and I coded our way through it. The C++ version hit 500k nodes per second. That still pales in comparison to any modern engine, but after the week of chaos, seeing those numbers felt like victory. Real positions evaluated. Real moves generated. Real chess played.
C++ doesn’t forgive mistakes. But that’s what Claude and I needed. Harsh reality instead of comfortable delusion. A reminder that sometimes the best teacher is the one that refuses to let you cut corners.
Part 4: The AI Coding Tool Evaluation
The Tools
Let us press pause on the chess engine development for just a moment and discuss tooling.
Claude CLI and VS Code eventually became my primary development weapons, but I did explore other avenues. Options are surprisingly plentiful in this space, and I experimented with many. I mentioned trying Kimi K2 via Moonshot API access, which brought fresh perspective, but different biases. Gemini CLI showed promise but lacked many of Claude CLI’s niceties. I also had no idea how I was using Gemini for free, or when it was going to shake me down for money.
I tried Crush with Qwen3 Coder via OpenRouter. It kept stopping mid-generation. “Here’s your function that—” and nothing. Like a comedian forgetting the punchline. Probably my fault and something I didn’t configure properly.
I tried Cursor and learned I hated writing code in a browser. I tried the Cursor IDE, which seemed.. like VS Code.
Speaking of VS Code, I had plenty of time between Claude crunching to check out themes. I’m now team Synthwave ‘84 for life.
Cline and Roo seemed really powerful, but I hadn’t quite run into a situation (at least for my needs) where they did something I needed that Claude CLI couldn’t.
The wonderful ccusage tool became my financial advisor. It surprised me to see that I was racking up thousands of dollars in what would’ve been API charges every few days. The $200 a month for Claude Super Duper Max Plus was actually a steal. (This is, of course, relative, when you consider that $200 can also buy things, tangible things, things that work.)
I had seen what was out there, and I returned to the comfort of my VS Code setup for one final round.
The Dream Team II. I was determined that I was going to get the “team of agents” approach to work.
Part 5: When Agents Go Rogue
The Agent Experiment
Armed with experience from Dream Team I, and with an update to Claude CLI that more directly supported sub-agents, I decided to try agentic development once more. This time was different, though: I would lean on the experts at VoltAgent, who had developed specialized sub-agents with expertly hand-crafted prompts for surgical expertise in their respective domains. I wasn’t messing around.
The goal was simple: rebuild the C++ mate-in-2 solver from scratch, YOLO style (with dev container safety), and use agents to make it perfect. It would be clean, focused, minimal.
Dream Team II was set loose. I sat back in my chair, excited to watch AI coding perfection. What I got instead was a committee-designed digital monster.
I kid you not, the agents built an interactive menu system. Nobody asked for that. They added keyboard input, like I was playing a 1990s BBS door game. “Press Enter to continue” appeared after every operation.
The worst part? Dream Team II decided to build an evaluation function. For a mate-in-2 solver. A brute-force mate finder doesn’t need to evaluate positions. It needs to search every position in the tree and find checkmate or not checkmate. It’s the chess version of hotdog / not hotdog. That’s it. But the agents added material counting, position scoring, piece-square tables. Features that actively prevented finding mates.
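For contrast, here is roughly all the logic a brute-force mate-in-2 solver needs, sketched over a hypothetical push/pop board interface; the TreeBoard stand-in is mine, there only so the sketch can run without a real move generator:

```python
def solve_mate_in_2(board):
    """Return a first move that forces checkmate within two of our moves:
    either the move is immediate mate, or every opponent reply allows an
    immediate mating move. Otherwise return None."""
    for move in board.legal_moves():
        board.push(move)
        if board.is_checkmate():
            board.pop()
            return move
        replies = list(board.legal_moves())
        if replies and all(_mates_after(board, r) for r in replies):
            board.pop()
            return move
        board.pop()
    return None

def _mates_after(board, reply):
    """After this opponent reply, do we have a mate in one?"""
    board.push(reply)
    found = _has_mating_move(board)
    board.pop()
    return found

def _has_mating_move(board):
    for move in board.legal_moves():
        board.push(move)
        mate = board.is_checkmate()
        board.pop()
        if mate:
            return True
    return False

class TreeBoard:
    """Stand-in board: the game is a nested dict of move -> subtree, and a
    leaf equal to '#' is checkmate. A real engine plugs an actual move
    generator into the same push/pop interface."""
    def __init__(self, tree):
        self._stack = [tree]
    def legal_moves(self):
        node = self._stack[-1]
        return [] if node == "#" else list(node)
    def push(self, move):
        self._stack.append(self._stack[-1][move])
    def pop(self):
        self._stack.pop()
    def is_checkmate(self):
        return self._stack[-1] == "#"
```

No material counting, no piece-square tables: a move either forces mate within two moves or it doesn’t.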
The agents also decided every search node needed visualization. ASCII art chess boards with fancy box-drawing characters.
╔═══╤═══╤═══╗ everywhere.
In tight loops. Millions of them.
12 million lines of output in total.
They caused the Claude CLI to crash, which I didn’t know was possible. That felt like an achievement, just… not the achievement I wanted.
When I finally stopped the process and examined the code, I found board visualization in the main loop. In the search function. In the move generator. They’d built a chess visualization program that happened to search for mates, instead of a mate solver that could display positions.
One of our test positions (FEN notation):
r1b2k1r/1p2bppp/p1B2q2/8/8/8/PPPQ1P1P/2KRR3 w - - 0 1
The solution requires sacrificing the queen: Qd8+ Bxd8 Re8#. A beautiful, forcing checkmate.
The agents’ delicately crafted and unnecessary evaluation function saw it differently.
Sacrificing a queen? That’s -9 points. Instead, it chose Rxe7. Capturing material. +5 points.
The opponent’s king escaped, but hey guys, we won a bishop!
The agents had tried to build a Ferrari when I asked for a bicycle. Impressive? Sure. Useful for going to the store? Not really. They’d added complexity that made the simple, well, impossible.
I felt like Christian Bale’s team ripping out the electronic test equipment that Ford added to his race car in that movie with Matt Damon. (“Let’s get awl o’ this junk aht!”). The menu system had to go. The visualization had to go. The evaluation function had to go. Someone get me some yarn and Scotch tape.
It took about 30 minutes to unwind Dream Team II’s vibe, but by the end of the day it finally worked. What remained was what I’d asked for originally. A mate solver. It found mates. Nothing more. Nothing less.
Part 6: What Vibe Coding Really Teaches Us
The Hard Truths
I learned a lot thanks to this deep dive into AI-assisted coding. The most important thing is that AI can’t replace domain knowledge. You need to understand what you’re building. Not the syntax, the AI handles that. But the architecture. The approach. The difference between a mate solver and a chess engine.
Traditional practices still matter. Version control isn’t optional because AI writes the code. Testing isn’t optional because AI seems confident. Architecture isn’t optional because AI can generate thousands of lines quickly.
The hubris trap is real. Thinking AI eliminates complexity is like thinking cars eliminate the need to know where you’re going. You’ll move fast, but it’ll probably be in the wrong direction.
Agents excel at complex orchestration. When you need multiple systems coordinating, they shine. But they also create complexity where none existed. My mate solver didn’t need a menu system. It didn’t need visualization. It didn’t need an evaluation function.
Despite basic math telling me otherwise, sometimes one focused developer beats seven agents. Sometimes simple is actually simple, and making it complex looks fancy, but it’s missing the point.
The Real Power of AI Coding
In my mind, the real power of AI-assisted coding is acceleration, not replacement. AI coding tools are force multipliers for developers who know what they’re doing. They’re force randomizers for those who don’t. (And they’re credit card bill multipliers for everyone).
In my opinion, AI development tools work best when paired with experienced developers who can guide and constrain. Who know when the AI is hallucinating, who can spot the difference between clever and stupid, and who understand that just because code compiles doesn’t mean it works.
All that being said, AI helped me rediscover some very old wisdom. The humble, incremental approach wins. Start small. Verify each step. Build on solid foundations.
The vibe-rollercoaster from hubris to humility was an experience I’m glad I went through. I like to think it was time well spent: the constant crashes were my curriculum, and the repeated failures and reworkings of code were an education. And as it turned out, that drunk toddler of a chess engine was one of my best teachers.
Epilogue
I’ve really enjoyed working in this new development paradigm. I always felt that I’d had a big technical gap when it came to proper development, and as a result I’d shied away from building solutions or starting certain projects. Now I feel much more confident in my ability to write clean, usable code and to tackle larger problems that previously I would’ve ignored. It’s been empowering, and I’m glad I invested the time to get closer to this space.
Do I consider myself a developer now? Most certainly not. I have the utmost respect for those who’ve battled in the development trenches, who know git inside-out and can say stuff like “well, idempotency is important, but it’s not a defining property of functional programming…”
I also have a new-found level of respect for anyone who has ever attempted to build a chess engine. The sheer level of debugging and testing needed is astounding. Those who’ve actually created a working engine (not to mention, a good one) have done something monumental.
I still hope to join that club one day. I’ve started writing a new chess engine, SeaJay, from scratch. This time, though, I’m doing it with much more humility and hopefully, a better understanding of just what I’m in for. AI will help, but I’ll handle the thinking.
Feel free to follow along at the GitHub repo, or with the (AI-generated) development diary.
If you’re interested in building a chess engine yourself, the larger chess community has assembled some amazing resources to get you started. See the section below.
Additional Reading
Marcel van Kervinck’s Master’s Thesis