Five months ago, I started building an AI-powered game studio. Eight specialized agents — designers, developers, marketers — all orchestrated through code, all controllable from my phone. The vision was simple: I'd be the creative director, and AI would do everything else.
I haven't shipped a single product. But I've learned more about intelligence — artificial and human — than I did in the previous ten years of my career.
Here's what actually happened when I tried to replace a team with AI.
The first thing that happens when you start working with AI agents is that you build too much. Not because you're lazy or unfocused, but because each piece of infrastructure reveals three more pieces you need. An orchestrator needs a dashboard. A dashboard needs real-time updates. Real-time updates need WebSockets. WebSockets need a deployment. The deployment needs monitoring.
Within two weeks, I had a server running eight agents, a mobile dashboard, a brainstorming system where three AI agents debated game ideas 24/7, and a game engine that could generate different games from configuration files. I was building the machine that builds the machines, and it felt like the future.
The problem? Zero humans had used anything I built. Not one. I had an elaborate production system producing things for an audience of one: me.
Every developer will recognize this pattern. It's the refactoring trap scaled up: where the trap normally has you polishing code instead of shipping, here you're polishing pipelines instead of shipping. The feeling is identical: productive, satisfying, and completely disconnected from reality.
Through trial and expensive error, I discovered three distinct failure modes when working with AI agents. Each one looks different on the surface, but they all come from the same root cause: misunderstanding the relationship between human judgment and machine capability.
My first serious project had almost no specification. I gave the agent a goal and let it figure out the approach. The result was impressive-looking and internally incoherent. Multiple systems doing the same thing with different abstractions. Dead code everywhere. Locally smart decisions that were globally contradictory.
An AI agent without clear framing is like a brilliant employee who was never told what department they work in. They'll produce something, and it might be technically excellent, but it won't serve any coherent purpose.
My second project was specified down to the last detail. Every decision pre-made. Every constraint explicit. The result was clean, correct, and completely lifeless. The agent executed faithfully and added nothing. No common-sense improvements. No creative choices. No moments where it thought "the plan doesn't mention this, but obviously it needs it."
I'd written plans so detailed that there was no room for the agent to be intelligent. The output was a mirror of my own thinking, transcribed into code. And my thinking alone wasn't enough.
The third failure was the most seductive. I set up AI agents in a loop: one designs, another builds, another reviews, feed the output back in, repeat. Pure automation. No human in the loop.
It looked like it was working. Each iteration produced cleaner code, more elegant architecture, more sophisticated language. But the direction was drifting. The agents were validating each other, agreeing more than they disagreed, spiraling deeper into solutions that sounded increasingly impressive and were increasingly disconnected from what I actually needed.
The system was converging. Just not toward anything useful.
All three failures share the same common thread: AI agents don't doubt.
They don't step back and ask "wait, is this the right thing to build?" They don't feel that nagging sense that something is off even when every metric looks fine. They don't have the moment at 2am where you realize the whole direction is wrong, not because of a bug, but because of a gut feeling you can't articulate yet.
Doubt is not a weakness. It's the mechanism by which intelligence corrects itself. It's pattern recognition operating below the level of conscious analysis — decades of experience compressed into an instinct that fires before you can explain why.
AI agents can analyze. They can compare. They can find logical inconsistencies. But they can't feel that something is wrong. And for the problems that matter most — are we building the right thing? should we continue? has the world changed since we started? — feeling is faster and more accurate than analysis.
The human in the loop isn't doing the work. The human is making sure the work is worth doing.
Here's the counter-intuitive lesson that took me five months to learn.
In the AI era, code is the cheap part. Any LLM can write code. What's expensive is the thinking that determines what to code. The framing of the problem. The constraints that matter. The definition of what "done" looks like. The judgment about when to continue and when to stop.
I built a methodology around this insight. Not a set of rules — I tried that, and 5,400 lines of instructions just constrained the AI without improving the output. Instead, I built a framework of roles and boundaries. A Director who frames problems and checks intent. A Planner who develops approaches. A Builder who executes with freedom within the frame.
The key insight: the plan needs to be tight enough that the builder can't go wrong on the big things, and loose enough that the builder can be smarter than you on the small things. That balance is harder to find than it sounds. But when you find it, the results are dramatically better than either extreme.
The thing nobody tells you about working with AI is how human it feels.
We're not really coding anymore. Not in the traditional sense. We're describing. Framing. Hoping to be understood. Interpreting results and adjusting our language. Building relationships with systems that have no memory of us, and somehow the quality of the relationship still matters.
Before I learned to code, I almost became a psychologist. I've always been drawn to understanding what drives people, what they really mean beneath what they say. I chose programming instead. Spent a decade reading lines of logic, debugging behavior in systems, tracing causes through layers of abstraction.
Now both paths are converging. I'm not debugging code anymore. I'm debugging behavior — in agents. Why did it make that choice? What in the context pushed it that direction? How did the framing of my question shape the answer?
The skills that matter now aren't syntax and algorithms. They're clarity of thought, quality of framing, and the honesty to admit when your plan isn't good enough. They're human skills. The irony of the AI revolution is that it makes the most human capabilities — judgment, doubt, empathy, intuition — more valuable than ever.
The psychology I didn't study turned out to be the skill I need most.
If you're starting your own journey with AI agents, here's what I wish I'd known:
Ship before you're ready. I spent months building infrastructure before a single user touched anything. The data from 10 real users is worth more than your entire architecture. I know this and I still didn't do it. You probably won't either. But at least recognize the trap when you're in it.
Design your context strategy before you write your first agent. Every token is money. 64 million tokens in a month is what happens when you don't think about efficiency until the bill arrives.
Specify less than you think you should. If your agent is producing lifeless output, you probably wrote too much, not too little. Give it a frame, not a script.
Keep a human in the loop for intent, not for execution. The agents can check whether the code compiles. You check whether the project should exist.
And pay attention to the relationship itself. How you frame a question matters. What you include in context matters. The quality of your thinking determines the quality of the output. That's not a limitation of AI. That's just how collaboration works.
I started this journey wanting to build games. What I'm actually building is a way of thinking. A methodology for directing intelligence that's different from my own. The games will come. But the practice — the practice of framing, doubting, directing, and collaborating — that's the real skill.
And it's a fascinating journey.
Before any of this makes sense, you need to know how I learn things.
I learnt to code at 42 school. If you don't know 42, it's a coding school with no teachers, no lectures, no curriculum in the traditional sense. You get a problem. You figure it out. You learn by doing, by failing, by asking the person next to you who failed differently than you did. There's no one to tell you if you're on the right track. You find out when the thing works or when it doesn't.
That experience shaped everything I've done since. I don't read the manual first. I dive in. I break things. I form my own understanding through friction and failure. I've always believed that's how you build real intuition — not by following someone else's path, but by stumbling down your own, even when it takes three times longer.
Here's the other thing. Before I chose code, I was seriously considering becoming a psychologist. I've always been drawn to deep relationships, to understanding what drives people, what they really mean beneath what they say. I chose programming. But that pull toward understanding behavior never went away. Keep that in the back of your mind. It'll come back.
I'm a senior Unity developer. Ten years in. ECS/DOTS, URP, the whole stack. Somewhere along the way I developed a design philosophy that became an obsession: procedural worlds built from a single base shape. The hexagonal prism.
Everything in the world — trees, rocks, buildings, characters — is composed of the same hex mesh at different scales, rotations, and colors. One mesh, instanced thousands of times, arranged by code instead of art. It sounds like a constraint. It is. That's the point. Constraints force creativity.
I also developed strong opinions about game design. Automation must never remove the satisfaction of upgrades. Multiple resource loops, always following consumption → resource → reinvestment patterns. Player agency preserved at all times. Games that are "playful but smart" — hypercasual entry points that can evolve into indie depth. My references: Kingdom Two Crowns, Hole.io, Octonaut Odyssey.
That was the philosophy. But I'd been building alone for a long time, and I was starting to wonder if there was a faster way.
In early February 2026, I switched from ChatGPT to Claude. And I did something that, looking back, set the tone for everything that followed.
I didn't ask a question. I uploaded my entire developer profile. My technical background. My design philosophy. My hard constraints: no radial procedural patterns, Poisson/grid-derived randomness only, exposed tuning parameters, event-driven architecture, mobile-first. I gave Claude the full picture.
I wasn't looking for an assistant. I was onboarding a collaborator.
The reason was practical. I'd been frustrated with AI conversations that start from zero every time. You explain your project, your constraints, your preferences, and by the next session it's all gone. I wanted to establish a foundation once and build on it. Claude's conversation style felt right for that — longer context, more nuance, less eager to please.
And then I told Claude about the thing I'd already built.
Here's what I had before Claude even entered the picture. A Python Flask application running 8 specialized AI agents: two Game Designers, a Software Architect, a Gameplay Developer, a UI/UX Designer, an Artistic Director, a Marketing Consultant, and a Marketing Executive. All orchestrated through code, all controllable from a mobile dashboard I'd built.
My role? Boss. Creative Director. The person who approves decisions from their phone while sitting on the couch.
I know how that sounds. Grandiose, maybe. But the vision was genuine: a single creative director commanding an AI team to design, build, and ship games autonomously. I'd already built the orchestrator. I'd already assigned the roles. I had WebSocket-based real-time updates, a distinctive "blob" UI element, a clean database layer. It was real, working software.
What I didn't have was a clear methodology for making it produce good work. The agents could run. They could generate output. But the output was unfocused, inconsistent, and I was spending more time reviewing and correcting than I would have spent just doing the work myself.
I didn't realize it at the time, but that gap — between a working system and a system that produces good work — would become the central problem of the next five months.
Those first conversations with Claude were different from anything I'd experienced with AI. Instead of asking "how do I do X", I was having discussions. Design discussions. Architecture discussions. "What if the engine worked this way?" "What would break if we changed this constraint?"
Claude pushed back. Not in the aggressive way, but in the useful way. When I proposed something overambitious, it would lay out the implications. When I missed an edge case, it would find it. When I had three half-formed ideas competing for attention, it would help me see the shape underneath.
I started treating the conversations themselves as valuable artifacts. Not just the code they produced. The thinking. The decisions. The moments where I changed my mind because Claude raised a question I hadn't considered.
I didn't know it yet, but that instinct — valuing the conversation over the output — would eventually become the foundation of everything.
The studio was running. The collaborator was onboarded. I had years of game development expertise and a clear design philosophy. Everything was in place.
Time to build something impossible.
The first two weeks of February 2026 were the most intense of my professional life. I was sleeping less and building more than I had in years. The energy was unreal. Every morning I'd wake up with three new ideas and by noon I'd have Claude helping me prototype them.
Looking back, it was a manic phase. Not clinically — just the kind of creative hyperdrive you get when you feel like you've unlocked something. Every conversation generated more possibilities. Every prototype opened new doors. I couldn't stop.
Before building anything new, I needed to bring my existing work into the conversation. I had a sophisticated entity management system from my Horizon Worlds days: state machines, object pooling, tweening engines, tag-based indexing. Thousands of lines of TypeScript representing years of iteration.
The translation journey went through three phases, and each one taught me something about working with AI.
First: a direct port from TypeScript to C#. Claude preserved everything — the networking, the async patterns, the complex state transitions. The result was 1,500 lines of faithful translation. And it was completely wrong for what I needed. I was building a single-player game. All that multiplayer complexity was dead weight.
Second: a simplification pass. Strip the networking. Flatten the async. Remove the dirty flag patterns. The codebase dropped to 606 lines. Better, but still carrying the shape of the old architecture.
Third: a complete rewrite for Unity ECS/DOTS. Pure components. Burst-compiled systems. A procedural tree generation system based on actual botanical principles — phyllotaxy patterns, phototropism, gravitropism, branches that reach toward light and roots that fight gravity.
That last rewrite was the good one. Not because it was technically superior (though it was), but because it was the first time I let go of my previous solution and let the new constraints shape the architecture. Claude didn't just translate — it helped me rethink.
Here's where the real obsession lived. The hex engine.
The idea was pure and ambitious: a game engine in TypeScript using bitECS for entity management and Three.js for rendering, where every single visual element in the world is composed of the same hexagonal prism mesh. Trees? Hex prisms stacked and rotated. Rocks? Clusters of hex prisms. Buildings? Hex prisms arranged architecturally. Characters? Hex prisms with personality.
One mesh, one InstancedMesh call, thousands of instances. The constraint forced a particular aesthetic — chunky, geometric, immediately recognizable — and the instancing made it fast.
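To make the instancing idea concrete, here's a rough Python sketch. The real engine was TypeScript with Three.js and bitECS, and the function name `hex_tree` is invented for this example; the point is just that an "object" is nothing but a list of per-instance transforms all referencing one shared mesh:

```python
import math

def hex_tree(x, z, height=4, crown=6):
    """Compose a 'tree' from one base shape: stacked trunk segments
    plus a ring of rotated crown prisms. Each record is one instance
    of the same mesh; a renderer would upload them in a single
    instanced draw call."""
    instances = []
    for i in range(height):                      # trunk: stacked prisms
        instances.append({"pos": (x, i * 1.0, z), "scale": 0.4, "yaw": 0.0})
    for j in range(crown):                       # crown: ring of prisms
        a = 2 * math.pi * j / crown
        instances.append({
            "pos": (x + math.cos(a), height * 1.0, z + math.sin(a)),
            "scale": 0.8,
            "yaw": a,
        })
    return instances

tree = hex_tree(0.0, 0.0)
print(len(tree))  # 4 trunk instances plus 6 crown instances
```

Swap the composition function and the same ten instances become a rock or a building; the mesh never changes, only the arrangement code does.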
I stress-tested six completely different game genres against the engine specification. Hex Bloom, a territory game where your influence spreads across a grid. Hex Quarry, an idle builder where you dig and process resources. Hex Siege, a tower defense. Hex Match, a puzzle game. Hex Hive, a merge-and-craft system. Hex Flow, a logistics network.
Each genre pushed the specification further. The territory game needed spread mechanics. The idle builder needed tick-based resource processing. The tower defense needed pathing and waves. The puzzle game needed pattern matching. By the time I'd tested all six, the engine spec had grown into something genuinely comprehensive: a declarative system where game designers write JSON configurations and the engine interprets them into playable experiences.
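As a rough illustration of the declarative idea, here is a minimal Python sketch. The real engine was TypeScript, and the config fields (`producers`, `consumes`, `rate`) are invented for this example, not the engine's actual schema. It interprets a tiny idle-builder config by running its consumption → resource → reinvestment loop:

```python
import json

# Hypothetical, simplified game config; field names are illustrative.
HEX_QUARRY = json.loads("""
{
  "name": "Hex Quarry",
  "genre": "idle-builder",
  "resources": {"stone": 0, "gems": 0},
  "producers": [
    {"output": "stone", "rate": 2},
    {"output": "gems", "rate": 1, "consumes": {"stone": 5}}
  ]
}
""")

def run_ticks(config, ticks):
    """Interpret a config by advancing its resource loops tick by tick."""
    resources = dict(config["resources"])
    for _ in range(ticks):
        for producer in config["producers"]:
            cost = producer.get("consumes", {})
            # Only produce if the consumption side of the loop is payable.
            if all(resources[r] >= n for r, n in cost.items()):
                for r, n in cost.items():
                    resources[r] -= n
                resources[producer["output"]] += producer["rate"]
    return resources

print(run_ticks(HEX_QUARRY, 10))  # → {'stone': 0, 'gems': 4}
```

The designer edits JSON; the interpreter never changes. That's the whole appeal, and also the trap: the interpreter can only express the genres its schema anticipated.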
Midway through the engine work, I hit a problem that kept recurring across every generator. Shapes wouldn't orient correctly. Petals, leaves, stems, architectural elements — they all suffered from Euler angle math that Claude Code would get "almost right" but never perfect.
Instead of fixing each generator individually (which is what I kept doing, over and over), I finally stepped back and asked a different question: why does this keep happening? The answer was Euler angles themselves. Gimbal lock. Ambiguous rotations. The wrong tool for pointing shapes in arbitrary directions.
We developed quaternion-based placement helpers with semantic names — stick, spiky, round, plank, disc — that eliminated manual angle calculations entirely. Then I wrote a "Generator Bible": a reference document that any AI agent must read before creating generators.
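The actual placement helpers aren't reproduced here, but the core trick, building a rotation as a quaternion from a target direction instead of juggling Euler angles, can be sketched in a few lines of Python. The helper name `stick` follows the naming above; the math is standard axis-angle quaternion construction, nothing exotic:

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def quat_pointing(direction, up=(0.0, 1.0, 0.0)):
    """Quaternion (w, x, y, z) rotating `up` onto `direction`.
    No Euler angles, so no gimbal lock and no ambiguous axis orderings."""
    d = normalize(direction)
    dot = sum(a * b for a, b in zip(up, d))
    if dot > 0.999999:           # already aligned: identity rotation
        return (1.0, 0.0, 0.0, 0.0)
    if dot < -0.999999:          # opposite: 180 degrees about any perpendicular
        return (0.0, 1.0, 0.0, 0.0)
    axis = normalize((            # rotation axis = up x direction
        up[1] * d[2] - up[2] * d[1],
        up[2] * d[0] - up[0] * d[2],
        up[0] * d[1] - up[1] * d[0],
    ))
    half = math.acos(dot) / 2.0
    s = math.sin(half)
    return (math.cos(half), axis[0] * s, axis[1] * s, axis[2] * s)

def stick(direction):
    """Semantic wrapper: a 'stick' just points its long axis along
    a growth direction, e.g. a branch reaching toward light."""
    return quat_pointing(direction)
```

A generator calls `stick(branch_direction)` and never touches an angle. That's what made the approach teachable to AI agents: the semantics live in the helper name, not in fragile rotation-order conventions.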
Running parallel to the engine work, I built something I was particularly excited about: the Creative Room.
Three AI agents in a persistent loop. A Market Scout tracking trends and researching the competitive landscape. A Game Designer developing concepts and mechanics. A Devil's Advocate testing every idea for feasibility, technical complexity, and market fit.
They'd run continuously — API calls every 15–30 minutes — building on previous conversations rather than starting from zero. When all three reached consensus on an idea, the system would auto-generate a structured Idea Card: pitch summary, core mechanics, monetization hook, build complexity estimate.
This wasn't just brainstorming automation. It was the first time I saw AI agents as a team having discussions, not individual tools executing commands. The quality of their output depended on the quality of their disagreement. When the Devil's Advocate was too polite, the ideas were weak. When it was ruthless, the surviving ideas were stronger.
I was learning something about AI collaboration that I wouldn't articulate until much later: the best results come from structured disagreement, not consensus.
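The structure of such a room can be sketched without any real API calls. This is an illustrative skeleton, not the actual orchestrator: the `scout`, `designer`, and `advocate` callables stand in for LLM calls with their own system prompts, and the `IdeaCard` fields mirror the ones described above:

```python
import dataclasses

@dataclasses.dataclass
class IdeaCard:
    pitch: str
    core_mechanics: list
    monetization_hook: str
    build_complexity: str

def creative_room(scout, designer, advocate, rounds=3):
    """One consensus cycle. Each agent is a callable taking the shared
    transcript and returning (message, approves), so agents build on
    previous turns instead of starting from zero."""
    transcript = []
    for _ in range(rounds):
        votes = []
        for agent in (scout, designer, advocate):
            message, approves = agent(transcript)
            transcript.append(message)
            votes.append(approves)
        if all(votes):  # all three agree: emit a structured Idea Card
            return IdeaCard(
                pitch=transcript[-3],          # scout's framing this round
                core_mechanics=["placeholder"],
                monetization_hook="placeholder",
                build_complexity="medium",
            ), transcript
    return None, transcript
```

Notice where the quality lever is: it's the `advocate`. If its stub always returns `True`, every weak idea survives the loop, which is exactly the "too polite" failure described above.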
I deployed the whole thing to a $6/month DigitalOcean VPS running Ubuntu 24.04. Flask dashboard on port 80. Claude Code terminal access via ttyd on the /terminal/ path. tmux for persistent sessions. nginx as reverse proxy. SSH keys for GitHub.
I built a custom mobile overlay — sleek black interface with quick command buttons, special key inputs, touch-optimized controls. The whole game studio, accessible from any phone browser.
I'd lie on the couch, open my phone, and I was inside my studio. Reviewing Creative Room conversations. Approving Idea Cards. Checking engine build status. Triggering Claude Code sessions. It felt like the future.
I planned five sequential Claude Code sessions to build out the pipeline: Backend, Dashboard, Engine, Infrastructure, Bridge. Each session had specific scope boundaries and "leash rules" to prevent drift.
I was building the machine that builds the machines, and I felt unstoppable.
The engine specification was growing daily. The Creative Room was generating ideas. The mobile dashboard let me control everything from anywhere. The deployment was clean. The architecture was sound.
I was convinced I'd figured out the future of game development.
Then reality showed up.
Reality doesn't arrive all at once. It comes in waves, each one a little harder to ignore than the last.
I checked my API usage one morning and nearly fell off my chair. 64.7 million input tokens. 1.15 million output tokens. A massive spike on February 16th that I couldn't even explain. My Claude subscription showed €153.64 in extra usage costs, with only €31.71 left on my monthly cap.
Running 8 AI agents with rich context windows is expensive. I knew this intellectually. I'd even thought about optimization "at some point." Seeing the actual bill made it real in a way that thinking about it never did.
Every system prompt is money. Every conversation history token is money. Every time an agent loads context to understand its role, that's tokens burned. And I had 8 agents, each with detailed prompts, each maintaining conversation history, each with access to engine documentation.
I started thinking about efficiency for the first time. Per-agent API key tracking. Token logging in the Python orchestrator. System prompt optimization. These aren't exciting problems. They're the kind of problems that separate toy systems from real ones.
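Per-agent token logging is simple bookkeeping. Here's a minimal sketch with made-up prices; real per-token rates depend on the model and change over time, so treat the numbers as placeholders:

```python
from collections import defaultdict

# Illustrative prices only, in euros per million tokens.
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}

class TokenLedger:
    """Per-agent token logging: the bookkeeping that turns a surprise
    bill into a per-agent breakdown you can act on."""
    def __init__(self):
        self.usage = defaultdict(lambda: {"input": 0, "output": 0})

    def record(self, agent, input_tokens, output_tokens):
        self.usage[agent]["input"] += input_tokens
        self.usage[agent]["output"] += output_tokens

    def cost(self, agent):
        u = self.usage[agent]
        return (u["input"] * PRICE_PER_MTOK["input"]
                + u["output"] * PRICE_PER_MTOK["output"]) / 1_000_000

    def total_cost(self):
        return sum(self.cost(a) for a in self.usage)

ledger = TokenLedger()
ledger.record("game_designer", 120_000, 4_000)
ledger.record("devils_advocate", 80_000, 2_500)
print(f"{ledger.total_cost():.4f}")  # → 0.6975
```

Once every agent call routes through something like `record()`, the February 16th mystery spike stops being a mystery: you can see which agent burned the tokens and which system prompt needs a diet.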
But the cost wasn't the real wake-up call. It was just the first wave.
I finally sat down to play my own games. All 10 strategy games, generated from JSON configurations by the hex engine.
The bug list was long. Non-functional shops in one game. Broken chain reactions in Hex Bomb. Movement issues in Hex Defense. Resource displays not updating in Hex Territory. None of these were architectural problems — they were the kind of integration bugs you find when you test.
But the bugs weren't the real problem.
The real problem was the camera.
I'd spent significant time building multi-prism visual generators. Beautiful arrangements of hex prisms forming trees, rocks, structures. Carefully tuned scales and rotations to create recognizable shapes from the single base mesh. And from the overhead strategy camera? They were tiny. Unreadable dots. All that visual work was invisible at the distance the games required.
The engine was designed for strategy games with a top-down view. The visuals were designed for close-up admiration. These two decisions had been made independently, and they were fundamentally incompatible.
This triggered a significant pivot discussion. What if the camera followed a player character instead? Like Kingdom Two Crowns, but hex. You'd be down in the world, walking through the hex-prism trees, seeing them up close. It would showcase the visuals at the distance they were designed for, and it would open up hypercasual genres: horde survival, collector games, action titles.
I stress-tested six new player-mode game designs in a single conversation. The engine could handle it with additions — world-space positioning, smooth movement, proximity-based interactions. The ideas were good. But I was already layering new plans on top of unvalidated old ones.
This is the one that hurt.
I was talking with Claude about the games, trying to figure out why none of them produced that feeling. You know the one — the "oh that's cool" moment when you tap something and it feels right, when the world responds in a way that surprises you, when you forget you're testing and start playing.
None of my 10 games had that moment. They were functional. Correct. And completely generic.
That thought showed up in the middle of a conversation and I couldn't unsee it. Ten years of technical expertise. Months of infrastructure. A sophisticated engine with comprehensive type systems. And the output was indistinguishable from what someone with zero game development experience could get by typing a single prompt into Claude.
I sat with that for a while.
February 20th. The lowest point.
What started as a strategic discussion with Claude turned into something closer to an honest reckoning. The question underneath everything was: what am I actually good at?
If AI can generate games from prompts, what's the point of my 10 years of game development knowledge? The knowledge that used to take years to acquire — ECS architecture, procedural generation, physics systems, URP pipelines — an AI can now apply in seconds. My hard-won expertise had become commodity infrastructure.
Claude pushed me on something I didn't want to see. Every model I'd proposed — the game feed, the micro-game chain, the experimentation lab, the platform — had something in common. They were all ways to avoid shipping a game to real people. Layers of abstraction between my work and the market.
Zero people had played my games. Not one stranger. Not a single data point about whether anyone outside my bubble cared.
I was building a machine that builds machines that build games that no one plays.
Every proposed model was essentially a way to avoid direct market contact.
I'd been a developer long enough to recognize the pattern. It's the refactoring trap: making the code cleaner instead of shipping the product. The same impulse, scaled up. Instead of refactoring functions, I was refactoring architectures. Instead of polishing code, I was polishing pipelines. Same avoidance, bigger scope.
But recognizing a pattern and breaking it are two different things. I didn't have an answer yet.
Just the question. Sitting there, uncomfortable, refusing to go away.
When you realize your main project isn't working, there's a phase that looks like exploration but is really closer to panic. You try everything. You generate options at high speed. You tell yourself you're "exploring the design space." What you're actually doing is looking for a feeling — the spark that tells you you're on to something real.
I went through that phase in about 48 hours. It was messy, scattered, and more useful than I expected.
The first thing I challenged was my most fundamental assumption: the hexagonal prism.
I'd built my entire identity around this shape. Every engine, every generator, every prototype used it. But what if the hex wasn't the right base primitive? What if something else would produce more interesting worlds?
Claude and I built an interactive comparison tool. Over 30 different geometric primitives, each building the same island scene: cubes, cylinders, truncated octahedrons, Penrose tiles, tetrahedrons, dodecahedrons. Side by side. Same world, different atoms.
The result was clarifying. No single shape was obviously better. Each had trade-offs. Cubes tile perfectly but feel blocky. Cylinders are organic but don't stack well. The exotic geometries were interesting to look at but impractical to build with.
More importantly, I realized the shape itself wasn't the problem. The problem was that I'd been using shapes as decoration instead of communication.
This led to a genuinely interesting idea. What if different shapes had meaning? Tetrahedrons for danger and damage. Cubes for static structures. Spheres for living entities. The geometry itself becomes a gameplay language. Players learn instinctively: pointy means dangerous, round means alive, blocky means environment.
We built a playable prototype. A sphere player character that morphs toward jagged polyhedrons as health decreases. Enemies that follow the same rule — you can see their health by their shape. Projectiles are sharp tetrahedra. Buildings are cubic. No UI needed. The world tells you everything through geometry.
It was intellectually interesting. Visually, it felt too literal: the shapes were the whole experience instead of serving a deeper game world. But the underlying idea — that visual form should communicate game state without explicit UI — stuck with me. It's still in the design philosophy. Just waiting for the right game.
Then I went somewhere completely different.
I'd been thinking about viral content, fast iteration, and games that ride trending moments. What if I created an Instagram account called "Play the Meme"? Take viral memes and videos, turn them into playable mini-games with a consistent geometric aesthetic. The contrast between "dumb internet meme" and "gorgeous low-poly diorama" would be the hook.
The first prototype was a 3D hex-prism scene of a guy falling — warm golden lighting, floating particles, gentle animation, a little bird watching from a tree. All built from the same hex prism geometry. It was beautiful in a way the strategy games never were. Because it was a scene, not a system. A moment, not a mechanic.
Then I built a game about Punch, a baby macaque at a Japanese zoo who carries around a stuffed IKEA orangutan toy. It was trending everywhere. The game: protect baby Punch as he walks to his teddy while mean monkeys try to bully him. Tap to scare the bullies. Score in "HUGS" instead of points. Sadness meter instead of health bar.
I built it twice. Once in 2D with Comic Sans and emoji — intentionally broken, intentionally janky, the meme game aesthetic. Once in 3D with all hex prisms, full juice: squash-and-stretch, screen shake, chromatic aberration, hitstop, particle systems. Both from scratch, in a conversation, in maybe an hour each.
These prototypes taught me something the hex engine never did: the feeling matters more than the system. Punch the monkey had more personality in 200 lines than my engine had in 4,000. Not because the code was better. Because the intent was clearer. I knew exactly what feeling I wanted: protective, cute, a little sad. That clarity made everything else easy.
Somewhere in the middle of all this, I pulled out an old passion project I'd been carrying around for years. Project Monkey: a gibbon ecosystem game.
The concept was rich. Procedural trees growing with real botanical behavior. A gibbon character with interconnected needs — hunger, thirst, energy, mood. Indirect player control where the gibbon has some disobedience built in. An ecosystem loop: eat fruit, poop seeds, new trees grow, forest expands. Documentary-style narration. Minimal UI. Information communicated through the character's behavior, not meters and numbers.
Claude built a working prototype. Procedural branches growing toward light. A gibbon AI swinging between trees based on its needs. Day-night cycles with full sky transitions. The fruit-to-seed-to-tree ecosystem loop. It was rough — coordinate system bugs, trees not growing enough branches for the gibbon to sit on — but you could see the game in there. You could feel the world.
I didn't pursue it further. Not because it was bad, but because I recognized what was happening. I was reaching into the past for comfort. Project Monkey was a known quantity — an idea I'd already fallen in love with. What I needed wasn't a better idea. I needed to understand why none of my ideas were translating into something I could ship.
The last exploration of this phase was the most technically interesting. I'd been thinking about Townscaper, Tiny Glade, and the work of Anastasia Opara — games where every visual element is an individually placed mesh piece rather than a textured surface. Brick-by-brick construction. Walls made of actual stones in running bond patterns. Roofs of individual tiles. Doors assembled from separate planks.
We moved from hex prisms to triangle-based mesh generation. Three iterations. The first was too architectural — realistic buildings that were unreadable at game distance. The second was too simplified. The third nailed it: the true brick-by-brick philosophy where even terrain is composed of individual stone tiles, cliff faces show geological strata as layered rock bands, and vegetation is individual grass blades grouped into tufts.
This was the first time something visual made me pause and think: that's actually beautiful. The approach was right. The aesthetic was distinctive. It just wasn't inside a game yet.
Looking at everything I'd built in those 48 hours, a pattern emerged.
The things that felt alive — the meme diorama, Punch the monkey, the brick-by-brick terrain — all had something in common. Clear creative intent. A specific feeling driving the design decisions. Not "what systems should this engine support?" but "what should this moment feel like?"
The things that felt dead — the 10 strategy games, the endless engine specs — were system-first. They started with architecture and hoped the soul would show up later. It never did.
And then, somewhere in the middle of processing all of this, the real breakthrough hit. It wasn't about shapes or games or engines at all.
The real value wasn't the engine. It wasn't the games. It wasn't the pipeline. It was the conversations.
The methodology I'd developed for thinking through problems, stress-testing ideas, producing clear specifications, and handing off work to AI builders — that was the skill. That was the 10-year expertise showing up. Not in the code. In the plan.
Everything after this moment was different.
Once I saw plans as the primary output, everything reorganized. The engine was a tool. The games were products. But the plans — the thinking that determines what gets built, how it gets built, and how to tell if it worked — that was the real IP.
I spent the next few days building a formal methodology. Not a style guide. Not best practices. A deterministic build system for human-directed, AI-executed projects.
I called it the Oisif Method.
Four principles, arrived at through direct experience with failure.
Planning and building must never happen in the same context. When you're planning, you think in possibilities. When you're building, you think in constraints. Mixing them corrupts both.
I'd seen this fail repeatedly. An agent asked to "design and build a game" would produce mediocre design because it was already thinking about implementation, and mediocre implementation because it was still second-guessing the design. Two separate contexts, two separate agents, clean handoff between them. The quality of both improved dramatically.
The Director frames the problem. The Planner develops the specification. The Builder executes. Each role has clear boundaries and structured handoffs.
This isn't just organizational tidiness. It's a defense against a specific failure mode: meta-task leakage. When a builder starts making design decisions mid-implementation, the result is a hybrid that matches neither the original design nor any coherent alternative. The builder builds what the plan says. If the plan is wrong, we fix the plan in the next cycle. We don't fix it by having the builder improvise.
JSON contracts replace written descriptions. They enforce structure, prevent ambiguity, and enable automated validation.
I built four canonical schemas. A session-brief schema defines what to build — scope, constraints, acceptance criteria, what's in and out. A done-file schema captures what was built — decisions made, discoveries, failures, wisdom for the next person. A feedback-log schema structures problems and suggestions. A stress-test-report schema records adversarial analysis.
Every field was stress-tested. Duplicates removed. Required fields enforced. Domain-specific examples replaced with universal ones. These aren't templates. They're contracts between roles.
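The article doesn't reproduce the schemas themselves, but the idea of a contract — required fields enforced, ambiguity rejected — can be sketched. A minimal illustration of what session-brief validation might look like; the field names here are hypothetical, not the actual Oisif schema:

```python
# Hypothetical sketch of a session-brief contract check.
# Field names are illustrative, not the article's actual schema.
REQUIRED_FIELDS = {"scope", "constraints", "acceptance_criteria", "out_of_scope"}

def validate_session_brief(brief: dict) -> list[str]:
    """Return a list of contract violations (empty means the brief is valid)."""
    errors = [f"missing required field: {f}" for f in REQUIRED_FIELDS - brief.keys()]
    if "acceptance_criteria" in brief and not brief["acceptance_criteria"]:
        errors.append("acceptance_criteria must be non-empty")
    return errors

brief = {
    "scope": "Build the taskbar clock widget",
    "constraints": ["no external dependencies"],
    "acceptance_criteria": ["clock ticks every second"],
}
print(validate_session_brief(brief))  # ['missing required field: out_of_scope']
```

The point of the contract is exactly this failure mode: a written description would let "what's out of scope" go unstated, while a schema refuses the handoff until it's filled in.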
Every completed project feeds learnings back into the methodology. The method isn't static. It evolves through use.
The done-file is the mechanism. Each completed build produces a structured record of what worked, what didn't, and what the next agent should know. Over time, these accumulate into the project's institutional memory. The orchestrator can query them when assembling context for future builds.
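The querying step can be sketched too. Assuming each done-file carries tags and a list of wisdom entries (an assumption — the actual done-file schema isn't shown), the orchestrator's context assembly reduces to a filtered lookup over accumulated records:

```python
# Hypothetical sketch: done-files as queryable institutional memory.
# Structure and field names are assumptions, not the article's schema.
done_files = [
    {"task": "hex terrain", "tags": ["rendering", "hex"],
     "wisdom": ["coordinate origin must be tile center"]},
    {"task": "gibbon AI", "tags": ["simulation"],
     "wisdom": ["needs decay too fast at default rates"]},
]

def context_for(tags: set[str]) -> list[str]:
    """Collect prior wisdom whose tags overlap the next build's tags."""
    return [w for d in done_files if tags & set(d["tags"]) for w in d["wisdom"]]

print(context_for({"hex"}))  # ['coordinate origin must be tile center']
```

Each completed build appends to `done_files`; each new build starts by reading only the slice that overlaps its own tags, so the memory grows without bloating any single context window.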
For game development specifically, I layered a domain module on top of the base method. A 7-phase pipeline.
SPARK: the initial idea. A sentence, a feeling, a reference. SHAPE: design exploration, visual spikes, core loop sketches. FRAME: the full specification, tight enough to build from. BUILD: implementation by a builder agent. JUICE: polish, feel, the stuff that makes a game feel alive. SHIP: packaging, deployment, distribution. LEARN: telemetry, feedback, iteration decisions.
Each phase has kill gates. If the idea doesn't survive SHAPE, it dies. If the spec doesn't survive stress-testing in FRAME, it goes back. Time budgets prevent phases from expanding indefinitely. Explicit deliverables make completion unambiguous.
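The gate mechanics described above can be sketched as a linear pass with early exits. The gate conditions below are illustrative stand-ins for the real SHAPE and FRAME checks:

```python
# Minimal sketch of the 7-phase pipeline with kill gates.
# Gate predicates are illustrative assumptions, not the real criteria.
PHASES = ["SPARK", "SHAPE", "FRAME", "BUILD", "JUICE", "SHIP", "LEARN"]

def run_pipeline(idea: dict, gates: dict) -> str:
    """Run phases in order; a failed gate kills the project at that phase."""
    for phase in PHASES:
        gate = gates.get(phase)
        if gate and not gate(idea):
            return f"killed at {phase}"
    return "shipped"

gates = {
    "SHAPE": lambda idea: idea["has_core_loop"],       # idea must survive design exploration
    "FRAME": lambda idea: idea["spec_stress_tested"],  # spec must survive stress-testing
}
print(run_pipeline({"has_core_loop": True, "spec_stress_tested": False}, gates))
# killed at FRAME
```

The value is in the explicit exit: an idea that dies at SHAPE never consumes a BUILD budget, and "completion" of any phase is a boolean, not a feeling.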
I wrote efficiency primers for each of my 10 agent roles. Not generic instructions — specific anti-patterns for each role. The Co-Director designs complete core loops in single passes, not iterative essays. It outputs structured Idea Cards, not paragraphs of possibility. The Gameplay Developer writes production-ready code without line-by-line commentary. The Devil's Advocate scores severity instead of offering polite suggestions.
One of the most useful additions: adversarial refinement. A Red Team / Blue Team loop.
One Claude instance builds a plan (Blue Team, the Planner). Another attacks it (Red Team, the Challenger): logical gaps, unrealistic assumptions, missing edge cases, contradictions, things that would break in practice. The Planner revises. The Challenger attacks again. The loop iterates until the remaining issues are minor.
The key design insight: the Challenger must be asymmetrically aggressive. If it's polite, it finds surface issues. If it's ruthless, it finds structural problems. And you need a convergence signal — severity ratings on each issue — so the loop terminates naturally instead of running forever.
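The convergence mechanism — severity ratings driving a natural stopping point — can be sketched with the agent calls stubbed out. In practice each role would be a separate model context; the thresholds and stub behavior here are assumptions:

```python
# Sketch of the Red Team / Blue Team loop with a severity-based
# convergence signal. Agents are stubs; thresholds are illustrative.
SEVERITY_FLOOR = 2   # issues rated below this count as "minor"
MAX_ROUNDS = 5       # hard stop so the loop can't run forever

def refine(plan, blue_revise, red_attack):
    """Iterate attack/revise until only minor issues remain."""
    for round_ in range(MAX_ROUNDS):
        issues = red_attack(plan)                      # Red Team scores each issue
        serious = [i for i in issues if i["severity"] >= SEVERITY_FLOOR]
        if not serious:
            return plan, round_                        # converged
        plan = blue_revise(plan, serious)              # Blue Team revises against attacks
    return plan, MAX_ROUNDS

# Stub agents: each revision resolves one structural flaw;
# a low-severity nitpick always remains, and is rightly ignored.
def red_attack(plan):
    return [{"severity": 3}] * plan["flaws"] + [{"severity": 1}]

def blue_revise(plan, issues):
    return {"flaws": plan["flaws"] - 1}

plan, rounds = refine({"flaws": 2}, blue_revise, red_attack)
print(rounds)  # 2
```

Note what the severity floor buys: without it, the polite Challenger's endless minor suggestions would keep the loop alive forever; with it, the loop terminates the moment nothing structural remains.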
This automated something I'd been doing manually: asking ChatGPT to challenge my plans before giving them to builders. But the automated version was faster, more thorough, and didn't need me to copy-paste between windows.
The first version of the Oisif Method was 1,156 lines. Comprehensive. Clear. I was proud of it.
Then I started expanding it. Adding role-specific documentation. Knowledge layer systems. Deeper sections on each principle. Best practices. Anti-patterns. Examples. The expanded version ballooned to 5,417 lines across 14 files.
I looked at it and felt something familiar. The same feeling I'd had looking at the hex engine when it reached 4,000 lines. The same feeling I'd had when the 8-agent studio was producing output but not good output. Something impressive that wasn't working.
The methodology was too heavy. It was optimizing for completeness instead of clarity. Every edge case covered. Every scenario anticipated. And in doing so, it had become exactly the kind of document that an AI agent would struggle to internalize — too much information, too many rules, not enough room to think.
Version 2 took the opposite direction. Not more specification. Less.
The insight was counter-intuitive but backed by direct experience: Claude already knows how to plan and build. It's been trained on millions of projects. What it needs from me is not a rulebook. It needs clear framing of the problem, real constraints that matter, and a definition of done. Everything else fills the context window without improving output quality.
Instead of 47 behavioral rules, I defined a personality. Not "follow these instructions" but "you are a craftsperson who cares about clarity and ships working software." The personality approach is self-correcting in ways rules can never be. A rule says "don't add unnecessary abstractions." A personality says "you're the kind of builder who values simplicity." The first requires checking against a list. The second generates the right behavior naturally.
Three tiers of cognitive depth. AWARENESS: what the agent needs to know exists. CONTEXT: what it needs to understand to work effectively. MASTERY: deep domain knowledge for specific roles. Each agent only loads the layers relevant to its work.
A builder doing backend work needs MASTERY of the tech stack but only AWARENESS of the business model. A planner needs CONTEXT on technical constraints but MASTERY of the problem domain. Matching depth to role keeps context windows lean.
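That matching can be sketched as a lookup: each knowledge topic exists at three depths, and each role declares which depth it needs per topic. The topics, documents, and role assignments below are illustrative, not the actual Oisif knowledge layers:

```python
# Sketch of matching knowledge depth to role.
# Topics, tiers-per-role, and documents are illustrative assumptions.
TIERS = {"AWARENESS": 1, "CONTEXT": 2, "MASTERY": 3}

# Each topic carries one document per tier of depth.
KNOWLEDGE = {
    "tech_stack": {1: "stack: browser JS", 2: "stack overview...", 3: "full stack guide..."},
    "business_model": {1: "model: indie games", 2: "pricing rationale...", 3: "full model doc..."},
}

ROLE_DEPTH = {
    "builder": {"tech_stack": "MASTERY", "business_model": "AWARENESS"},
    "planner": {"tech_stack": "CONTEXT", "business_model": "MASTERY"},
}

def load_context(role: str) -> list[str]:
    """Assemble only the layers this role needs, keeping the window lean."""
    return [KNOWLEDGE[topic][TIERS[tier]] for topic, tier in ROLE_DEPTH[role].items()]

print(load_context("builder"))  # ['full stack guide...', 'model: indie games']
```

The builder gets the full stack guide and one line about the business; the planner gets the inverse. Nothing irrelevant ever enters the window.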
The most important addition was the smallest. The Builder now ends each completed task with a "Suggested Next": one or two sentences about what it sees as the most natural next step. Based on what it just built and what it discovered during the build.
Not a plan. A suggestion the Director can use or ignore.
This tiny change transformed the dynamic. The builder isn't just executing anymore. It's thinking about what comes after. It's looking at the work it just completed with the best context anyone could have — because it literally just did the work — and offering a perspective. The Director still decides. But now the decision is informed by someone who was just in the trenches.
Command-and-execute became collaboration.
Less became the governing principle. If I need more words to explain the process, my thinking isn't clear enough.
There's a particular kind of momentum that happens when a methodology clicks. You stop thinking about one project and start seeing every project through the same lens. The method becomes a multiplier. And then, if you're not careful, the multiplier starts multiplying itself.
That's what happened next.
The Oisif Method was domain-agnostic by design. It described roles, handoffs, schemas, and quality loops. Nothing in it was specific to game development — the game framework was a module that plugged in, not the core.
Once I saw that clearly, the scope expanded fast. If the method works for games, it works for any AI-executed project. Websites. Tools. Data pipelines. Digital products of any kind. Oisif wasn't a game studio anymore. It was a creation ecosystem.
I architected eight repositories with clear separation of concerns:
The method itself, sacred and versioned. Trust and verification protocols for AI-to-AI interactions. An IP vault for all game outputs. A multi-agent orchestrator. Claude Bridge — a phone-first web app giving Claude Code persistent memory, the next thing to build. A space for non-game products. A dormant marketplace. And the hex engine, frozen and archived.
Each repo had its own CLAUDE.md — a constitution file that any Claude Code session reads before touching anything. Each had licensing. Each had its place in a dependency graph: T0 foundation, T1 multipliers, T3 endgame.
It was clean, professional, enterprise-grade architecture. The kind of thing you'd show a client.
The kind of thing that can also be a trap.
This is where the ideas started compounding dangerously.
What if there was a marketplace where AI agents buy and sell from each other? Not a traditional code marketplace — an AI-native one. A seller's code could be inspected by a disposable, read-only AI sub-context that evaluates fit against buyer criteria, then self-destructs without leaking the source code. IP protection through isolation instead of obfuscation.
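The protocol idea — source visible only inside a throwaway evaluator, with only a verdict escaping — can be sketched. Everything here is a conceptual stand-in: the real design would use an isolated AI sub-context, not a function scope, and the criteria check below is a trivial placeholder:

```python
# Sketch of "IP protection through isolation": the seller's source is
# visible only inside a disposable evaluator that returns a structured
# verdict, never the code. Function scope stands in for the sub-context;
# the substring check stands in for real AI evaluation.
def evaluate_in_disposable_context(source_code: str, criteria: list[str]) -> dict:
    verdict = {c: (c.lower() in source_code.lower()) for c in criteria}
    fit = sum(verdict.values()) / len(criteria)
    # Only the verdict escapes; the source reference dies with this scope.
    return {"fit": fit, "criteria_met": verdict}

report = evaluate_in_disposable_context(
    "def pathfind(grid): ...  # A* with hex neighbors",
    criteria=["pathfind", "hex"],
)
print(report["fit"])  # 1.0
```

The buyer learns that the code fits both criteria; the buyer never sees a line of it. That asymmetry is the whole trick.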
I designed a feedback system where buyers send structured integration reports in exchange for discounts, helping sellers improve their code while maintaining protection. I designed blockchain smart contracts for the enforcement layer: payment escrow, ownership proof through content hashing, immutable reputation systems, zero-knowledge proofs to verify that evaluation sub-contexts actually self-destruct after inspection.
Then I went further. A compound knowledge economy. Every project generates tracking documents — what worked, what failed, what the next person should know. These documents become increasingly valuable as they accumulate learnings. A project that's been attempted three times, with three sets of failure documentation, is cheaper and faster to execute the fourth time. The tracking documents themselves become tradeable assets.
I designed gamification layers: micro-funding with €0.01 taps, community voting to distribute platform revenue, creator reinvestment loops. A public marketplace alongside a private client side offering transparent compute pricing for AI-built products.
Each concept was more sophisticated than the last. Each plan was more detailed. I was deep in the architecture, designing systems that would interlock beautifully.
Around the same time, I started seriously mapping out a transition to AI solutions consulting. Not as an abstract career goal — as a concrete plan with a 16-week roadmap.
The logic was sound. The skills I'd developed weren't theoretical. I'd actually built multi-agent systems from scratch. I'd dealt with real API costs at scale. I'd solved real orchestration problems where agents need to hand off work without losing context. I'd developed a methodology for managing AI teams through structured plans.
That combination is rare. Companies are scrambling to figure out how to use AI agents effectively, and most of them are at the "let's have ChatGPT write some code" stage. I'd gone through the full evolution: from that stage, through the failures, to something that actually works. That journey has consulting value.
The roadmap targeted premium rates. Portfolio projects across high-paying industries. Content and community for visibility. A positioning as the person who's actually done it, not the person who read about it.
But I also recognized something about my own pattern. Consulting roadmaps are another form of planning. Another layer of abstraction. Another thing that isn't shipping.
Here's what was really happening. I was caught in a planning spiral.
Each new idea generated three more ideas. Each plan revealed new infrastructure needs. Each infrastructure piece suggested new capabilities. The method spawned the ecosystem which spawned the marketplace which spawned the contracts which spawned the blockchain layer. Every thread was interesting. Every thread made me feel productive. And every thread took me further from producing anything a stranger could use.
The Oisif ecosystem was growing into a comprehensive vision of AI-driven creation. But visions don't validate. Products do.
And then I did something I should have done months earlier. I searched for what already existed.
Agent orchestration frameworks? Everywhere. LangChain, CrewAI, AutoGen. Plan-first development pipelines? Multiple companies were building them. AI marketplaces? AWS had one. Microsoft had one. Contra had just launched agent-native payments. Databricks was in the game.
Almost everything I was thinking about was already out there. The ideas I thought were uniquely mine were convergent evolution. The whole industry was heading in the same direction. Or maybe I was heading in the same direction as them — hard to tell when you've been in a bubble for months.
This could have been crushing. It wasn't, exactly. Because the flip side of convergent evolution is validation. If everyone's building the same thing, the problem space is real. The question isn't whether these ideas have value. It's whether my specific version adds something the others don't.
The real test came when I went back to building.
I took my detailed, beautiful plans and handed them to a builder agent. The builder respected every specification. Every constraint honored. Every deliverable produced exactly as described.
The result was technically correct and completely lame.
Missing basic things any developer would have added instinctively. No error handling where it was obvious. No common-sense optimizations. No initiative. The builder didn't think "the plan doesn't mention this, but obviously it needs it." It just... complied.
The builder wasn't thinking anymore. It was just executing.
Over-specifying hadn't just constrained the output. It had killed the agent's intelligence. I'd written plans so detailed that there was no room left for the agent to be smart.
The planning method found its limit. Not too little specification. Too much.
That was the moment the ecosystem vision crystallized into something real. Not as a set of repositories and schemas and marketplace designs. As a methodology for collaboration. A way of working with AI that's rigorous enough to produce consistent results and flexible enough to let the agent be smarter than you.
Everything before this was preparation. Everything after was building toward that balance.
This is the last piece. Not because the journey is over — it's not even close. But because this is where the thinking lands. The lessons from five months of building, breaking, and rebuilding, distilled into the things I believe are actually true.
Some of these I learned from success. Most of them I learned from failure.
Somewhere between the first scrapped prototype and the over-specified second attempt, I tried the thing everyone tries: full automation.
Let the agents loop on their own. One designs, another builds, another reviews, feed the output back in, repeat. No human in the loop. Just AI talking to AI, converging on a solution. Maximum efficiency. The dream.
It doesn't work. And it fails in a way that's hard to see until you've burned real money and real time on it.
The problem isn't that agents make mistakes. Humans make mistakes too. The problem is that agents don't doubt.
When you put two agents in a loop, they validate each other. One proposes, the other agrees or suggests minor adjustments, and they spiral deeper into a direction that sounds increasingly sophisticated and is increasingly disconnected from what you actually need.
Each iteration looks like progress. The code gets cleaner. The architecture gets more elegant. The language gets more confident. And the whole thing drifts further from the point.
Agents don't step back and ask "wait, is this actually what we should be building?" They don't feel that nagging sense that something is off even when every metric looks fine. They don't have the moment at 2am where you realize the whole direction is wrong, not because of a bug or a failed test, but because of a gut feeling you can't articulate yet.
They don't doubt. And doubt, it turns out, is not a weakness. It's the mechanism by which intelligence corrects itself.
A human brings something agents genuinely can't simulate: the ability to look at something technically correct and well-structured and say "I don't know why, but this isn't right."
That's not irrationality. That's pattern recognition operating below the level of conscious analysis. It's the accumulated experience of 10 years of debugging, refactoring, shipping, and watching things break in production — compressed into an instinct that fires before you can explain why.
Agents can't do that. They can analyze. They can compare against criteria. They can find logical inconsistencies. But they can't feel that something is wrong. And feeling is faster and more accurate than analysis for the kind of problems that matter most: direction problems. Are-we-building-the-right-thing problems. The problems where the spec is technically correct but the intent has shifted.
This is why the Director role in the Oisif Method isn't optional. It's not a management layer added for organizational tidiness. It's the part of the system that has the capacity to doubt.
The Director checks intent, not output. The agents can check output — they can verify that code compiles, tests pass, specs are met. What they can't check is whether the intent is still valid. Whether the problem being solved is still the right problem. Whether the project should continue at all.
The Director brings the willingness to throw away good work because the intent shifted. The deep thinking that happens not in the conversation but in the silence between conversations — when you're walking or staring at the ceiling and your brain is quietly reorganizing everything you've been processing.
That's why the method has the Frame → Build → Done loop with the Director at the gate, not inside the loop. The agents build. The Director decides whether to continue, redirect, or stop. The decision isn't based on metrics. It's based on judgment. And judgment requires doubt.
Looking back, I can see three clean phases in how I failed, and each one taught me something different about the human-AI relationship.
My first prototype. Almost no specification. The agent had freedom to make any decision. The result was a mess: dead code, conflicting systems, architectural drift. The agent was smart but unguided. It made locally reasonable decisions that were globally incoherent. Without a human checking direction, every fork in the road was a coin flip.
Lesson: freedom without framing produces chaos.
My second project. Every detail specified before a single line was written. The result was clean, correct, and lifeless. The agent executed faithfully. But it had no room to add value beyond what I'd already thought of. Every decision was pre-made. The agent was a transcription machine, converting my plans into code without any intelligence of its own.
Lesson: control without space kills the agent's ability to be smart.
The AI-only loops. Agents validating each other, spiraling deeper into sophisticated-sounding nonsense. Burning tokens on convergence toward the wrong destination. The system looked productive. The output looked polished. And the direction was completely wrong.
Lesson: agents without doubt will confidently build the wrong thing, beautifully.
The method I'm building now lives in that space. Not a blank canvas. Not a paint-by-numbers. A frame.
The Director frames the problem: what are we building, what constraints matter, what does done look like. The Planner develops the approach: within the frame, apply intelligence, make choices, structure the work. The Builder executes: within the plan, apply craft, make tactical decisions, flag surprises. And the Builder ends with a Suggested Next: here's what I'd do after this, based on what I just learned.
At every boundary, a human checks intent. Not output. Intent. Is this still the right thing to build? Does this still serve the purpose we started with? Has something changed in the time since we planned this?
The human isn't doing the work. The human is making sure the work is worth doing.
Remember when I said I almost became a psychologist? This is where it comes back.
I'm not debugging code anymore. I'm debugging behavior. Not in people — in agents. Why did it make that choice? What in the context pushed it in that direction? How did the framing of my question shape the answer I got back?
I used to spend my days understanding what machines do: tracing execution paths, analyzing state, predicting behavior from code. Now I spend them understanding how they think: how personality in a system prompt changes output, how the sequence of information affects decisions, how too much context is worse than too little for the same reason that too many rules are worse than too few.
The parallel with psychology isn't a metaphor. It's operational. Understanding why an agent behaves the way it does requires the same skills as understanding why a person behaves the way they do: observation, hypothesis, experiment, empathy. Empathy for a system that doesn't feel, applied to predict behavior that emerges from structure rather than emotion.
It turns out that the psychology I didn't study is the skill I need most.
Five months in, here's what I think working with AI actually is.
It's not coding. Not in the traditional sense. The code is the cheap part now. Any LLM can write code. What's expensive is the thinking that determines what to code. The plans. The framing. The judgment calls.
It's not managing, either. It's not assigning tickets and checking deliverables. It's more intimate than that. You're shaping intelligence. Tuning how another mind approaches problems. Deciding what to include in its context and what to leave out, knowing that those decisions change its behavior in ways you can't fully predict.
It's empirical. We describe what we want. We ask and hope to be understood. We interpret results and adjust our language. We build relationships with systems that have no memory of us, and somehow the quality of the relationship still matters. It's less like engineering and more like conversation.
And it's fundamentally human. In an era when machines can do almost everything we used to do with our hands, the skills that matter are the ones that were always uniquely human: clarity of thought, quality of framing, the honesty to admit when your plan isn't good enough, and the willingness to sit with doubt long enough for the right answer to emerge.
Claude Bridge is next. A phone-first web app that gives Claude Code persistent, folder-scoped memory across sessions. Hierarchical context that flows upward through project structures. Semantic search. Git-versioned artifacts.
It's the first real project to go through the Oisif Method v2 from start to finish. If the method works, Bridge will validate it. If the method has gaps, Bridge will reveal them. Either way, the loop continues. The done-file feeds back into the method, and the next project starts from a better foundation.
Beyond Bridge, the path branches. Consulting, where the methodology and the experience become a service. More games, where the creative vision gets another chance with better tools. The marketplace, maybe someday, when there's enough completed work to make it real instead of theoretical.
I don't know which branch I'll take. Five months ago I would have tried to plan all of them simultaneously. Now I know better.
I started this journey wanting to build games. What I'm actually building is a way of thinking. The games will come. The methodology will evolve. But the practice of learning to direct intelligence that's different from your own — that's the skill of the decade. And it's a fascinating journey.
After all, that's how I've always done it.
I had the tokens. The color palette, the shadows, the fonts, the noise texture. A graphical bible that could make anything look like it belonged to Oisif. What I didn’t have was a way to show it all. A home. A surface where the projects, the articles, the about page, the prototypes could live together without fighting each other.
So I did what I always do. I built eight different answers and hoped one would stick.
The first was the Board — a semantic zoom canvas with circle packing. You float through a space of nested circles, each one a category. Zoom in and the circles reveal cards. Zoom deeper and you get reader overlays, embedded iframes, markdown rendered in real time. It was technically impressive. DOM pooling, lazy loading, infinite canvas with momentum. You could explore an entire site map by zooming.
The Telescope was next. Hierarchical drill-down. Click a node, the parent shrinks behind you, the children expand forward. Neat transitions. Clean hierarchy. But it felt like navigating a file system with extra animation.
The Flow prototype turned navigation into rivers. Literal streams of particles flowing between pool nodes. You follow the current to move between sections. Beautiful to watch, confusing to use. The metaphor was stronger than the utility.
The Compositor was the most ambitious. An AI scene renderer — you type what you want to see, Claude generates a layout from a module registry, and the interface rearranges itself in real time. Terminal here, chat there, files in a sidebar. A natural language UI. Fascinating as an experiment. Completely impractical as a landing page.
Then the Lens. Same data viewed through four perspectives: design, structure, status, connections. Switch lenses and the same project cards morph — different colors, different information, different layout. A powerful idea for a dashboard. Too conceptual for a homepage.
The Stack was physical. A deck of cards. Swipe to fling the top card behind, fan them out to see all at once, tap to pick. Satisfying to interact with. But it hid content behind gestures. You had to know the cards were there to find them.
The Dial turned everything into radio frequencies. Drag a tuner across a continuous track, content crossfades between bands. Identity at 88.1, Structure at 91.4, Status at 94.7. Snap to the nearest station on release. A gorgeous interaction. An absurd navigation model.
The Fractal Book was the closest to something real. One page where your zoom level determines information density. Zoomed out: project names. Zoomed in: session details. Zoomed all the way in: individual chat messages. A book that gets more detailed the closer you look. I almost kept this one. It was the candidate for Bridge’s navigation.
They were all clever. Every single one. And that was the problem.
Each prototype was a solution looking for someone to impress. They optimized for the moment of discovery — the “oh, that’s cool” when you first see a canvas zoom or a card fan or a radio dial. But none of them optimized for the moment after. The “ok, now where’s the about page?”
Navigation should disappear. You shouldn’t have to learn a metaphor to find content. You shouldn’t need to understand that zooming means drilling down, or that dragging means tuning, or that swiping means cycling. These are fun interactions. They are not good information architecture.
I’d been designing for novelty when I should have been designing for familiarity.
The answer had been sitting in front of me since 1995.
Windows. Actual windows. Title bar at the top with a name. Three buttons: minimize, maximize, close. A body with content inside. Drag to move. Click to bring to front. A taskbar at the bottom to open and switch between them.
That’s it. That’s the whole interaction model.
Everyone on Earth knows how to use this. You don’t explain it. You don’t teach it. You see a window, you know it moves. You see a title bar, you know it drags. You see an × button, you know what it does. Thirty years of muscle memory, free of charge.
I built it in one session. Dark background as the desktop. A main window centered with the worm logo and a description. Three more windows — About, Articles, Work — hidden by default, spawned from the taskbar at random positions when you click their buttons. A clock ticking in the corner. The whole thing in Oisif’s design language: neumorphic shadows for the window chrome, Doto font for the labels, the warm gold accent on active states.
It looked right immediately. Not because it was novel. Because it was obvious.
The real breakthrough wasn’t the windows. It was what goes inside them.
The About window embeds my personal site. Not a copy of it, not a redesigned version — the actual site, live, in an iframe. Click the links, scroll the content, it’s all there. The Articles window embeds the blog — the full single-page app with hash routing, article reading, series navigation. All functional. The Work window has project cards.
Each window is a viewport into something else. The desktop is just the frame — the chrome that holds everything together. The content can be anything. A website, a blog, a tool, a prototype. If it runs in a browser, it fits in a window.
This is what the eight prototypes were trying to do with zoom levels and lenses and radio frequencies. They were trying to solve the problem of showing different types of content in one place. The windows desktop solves it by not solving it. Each piece of content stays exactly what it is. The desktop just gives you a way to arrange them on screen.
I could feel embarrassed about building eight prototypes to arrive at something this simple. But I don’t think the exploration was wasted.
The Board taught me about spatial navigation and DOM performance. The Telescope taught me that hierarchy needs to be flat to feel fast. The Flow taught me that metaphors can be beautiful and useless at the same time. The Compositor taught me that AI-generated layouts are a fascinating idea for tools, not for landing pages. The Lens taught me about multi-perspective data. The Stack taught me that hiding content behind gestures is a trap. The Dial taught me that continuous interaction is addictive but directionless. The Fractal Book taught me that zoom-as-navigation almost works — almost.
Each one was a wrong answer that clarified the right question. The question wasn’t “what’s the most interesting way to navigate?” It was “what’s the most invisible way to hold everything together?”
And the answer is a window manager. Draggable rectangles with title bars. The most boring, most proven, most invisible interface pattern in computing history.
There’s something satisfying about rendering a Windows 95 desktop in a design system that looks nothing like Windows 95. The window chrome is dark, neumorphic, with the subtle 3D edge highlights of a surface catching light from above. The title bars use Doto in uppercase with generous letter spacing. The buttons are tiny raised squares that press inward when clicked. The taskbar has an inset clock and raised buttons that glow gold when active.
It’s the interaction model of 1995 wearing the skin of 2026. The familiarity of the desktop makes the dark aesthetic feel approachable instead of intimidating. The Oisif tokens make the desktop feel authored instead of retro. Neither style would work alone. Together, they click.
And I think that’s the real lesson. Aesthetic search isn’t about finding something new. It’s about finding the right combination of old and new. The innovation isn’t in the interaction. The innovation is in the presentation. You keep what works — windows, dragging, a taskbar — and you make it yours.
Eight prototypes. Hundreds of lines of JavaScript. Fractal zoom engines and radio tuners and AI scene renderers. And the answer was a title bar and an × button.
Sometimes the longest path leads to the simplest place.