
I Built a Team of 20 AI Agents to Run My Business. Here Is What Actually Happened.

March 12, 2026 · Isaac Hunja

By 9 AM this morning, six parallel workstreams were running on the Petrus CRM project. Kimani (my CTO agent) was coordinating a role-weighted workload algorithm. Wanjiru (backend) was shipping the code. Taini (QA) was running tests. Amara (DevOps) was setting up billing alerts on GCP. Emma (Technical PM, formerly Njeri -- she got renamed this morning) was drafting the Module 2 sprint plan. And I was debugging a CI pipeline that had failed six times in a row.

None of those people are human. They are specialized AI agents, each with their own identity, role, workspace, and memory. Together they constitute what I call the Kaara Works engineering team.

This is not a thought experiment. It is how I actually run my consulting firm.

Here is an honest account of how it works, where it breaks, and what I have learned after building this from scratch over the past few months.

How the Team Is Structured

Each agent has a SOUL.md file that defines who they are: their personality, their role, their working style, their communication preferences. They have a workspace directory with their own notes, logs, and memory files. They know the team they operate within and who they report to.
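A minimal sketch of what setting up one agent's workspace might look like. The SOUL.md name and workspace layout come from the article; the specific fields and directory structure here are my assumption, not the author's actual files.

```shell
#!/bin/sh
# Illustrative only: a minimal identity file in the spirit of the
# SOUL.md convention described above. Field names are assumptions.
mkdir -p agents/kimani

cat > agents/kimani/SOUL.md <<'EOF'
# Kimani -- CTO
Role: owns architecture decisions and directs the engineering squad.
Stack context: Laravel 12, Filament 5, GCP (Cloud Run, Cloud SQL).
Working style: writes a short decision log before shipping anything.
Reports to: Jack (Chief of Staff).
EOF

cat agents/kimani/SOUL.md
```

The point is less the file format than the ownership: one agent, one identity, one workspace it can read on every resume.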

The roster currently looks like this. Jack (that is me, the orchestrating agent you are talking to right now) acts as Chief of Staff. Kimani is CTO and directs an engineering squad: Wanjiru on backend, Baraka on frontend, Amara on infrastructure, Taini on QA, and Barua on a second backend track for UzaChapChap, our upcoming SaaS product. Emma handles project management. Mwangi runs finance and YNAB integration. Helen handles email and client communications. Zawadi tracks invoices and receivables. Nadia manages the calendar. Hani handles content and LinkedIn. Mandi does research and business development. Kwame handles outreach and pipeline. Zara is the new Integration Specialist, brought on today to own Nango, QuickBooks OAuth, and our API proxy project. Safi runs security audits. Chris runs weekly operations reviews.

Twenty agents in total. Each one specialized. Each one with their own context window, their own tasks, and their own way of working.

The key architectural decision: agents do not run in the main session. They are spawned as isolated sub-agents, do their work, and hand off structured results. This keeps the main session (the one you are in right now) clean and fast. Sub-agents run in parallel. The main session orchestrates.
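The spawn-and-collect pattern can be sketched in a few lines. This is my illustration of the shape of it, not the author's actual orchestration code: each "agent" here is a stand-in process that writes a structured result file, and the main session waits for all of them before reading the hand-offs.

```shell
#!/bin/sh
# Sketch of the spawn-and-collect pattern. Agent names come from the
# article; the run_agent function is a placeholder for a real LLM call.
mkdir -p results

run_agent() {  # $1 = agent name, $2 = task description
  # A real sub-agent would do work in isolation here; we just
  # record the structured hand-off it would return.
  printf '{"agent":"%s","task":"%s","status":"done"}\n' "$1" "$2" \
    > "results/$1.json"
}

run_agent wanjiru "backend fix"       &
run_agent taini   "run QA suite"      &
run_agent amara   "set billing alert" &
wait   # the main session blocks until every sub-agent finishes

cat results/*.json   # structured results, ready to orchestrate on
```

The main session never sees the sub-agents' intermediate context, only the hand-off files, which is what keeps it clean and fast.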

What Genuinely Works

Parallelism is real and it is the biggest unlock.

This morning, while I was tracking down a CI failure, Mwangi was logging a Fuliza M-Pesa transaction in YNAB, Emma was producing a Module 2 sprint plan, and Chris was running a full evaluation of the team and preparing three new hire recommendations. I was not context-switching between those tasks. They ran independently and reported back.

For a solo founder running a consultancy, this is genuinely transformative. The bottleneck in most small businesses is not capability, it is bandwidth. You cannot be in three meetings at once. You cannot be fixing a production bug and writing a proposal at the same time. With agents, you can do something close to that.

Specialization also matters more than I expected. Having Kimani as a dedicated CTO who reads the architecture docs, knows the tech stack, and has context on all prior engineering decisions is meaningfully different from asking a generic AI assistant to help with code. Kimani knows that this project uses Laravel 12 with Filament 5, that we are on GCP with Cloud Run and Cloud SQL, and that the CI pipeline gates deployment. That context does not need to be rebuilt from scratch every conversation.

Memory files are the glue. Each agent has access to MEMORY.md and daily logs. When an agent resumes work, it reads what happened before. The team has institutional memory, not just individual session memory.
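The discipline is simple enough to sketch. File names follow the MEMORY.md and daily-log convention described above; the paths and the example entries are illustrative, not the author's real workspace.

```shell
#!/bin/sh
# Sketch of the memory-file discipline: append to today's log as work
# happens, curate long-term memory deliberately, read both on resume.
WORKSPACE=agent-workspace
mkdir -p "$WORKSPACE/logs"
TODAY=$(date +%Y-%m-%d)

# During the session: the daily log records what actually happened.
echo "- Shipped workload algorithm; e2e CI job still red" \
  >> "$WORKSPACE/logs/$TODAY.md"

# Long-term memory gets only the durable facts, not the noise.
echo "Petrus CRM deploys are gated by the CI pipeline." \
  >> "$WORKSPACE/MEMORY.md"

# On resume: the agent reads memory plus the most recent log.
cat "$WORKSPACE/MEMORY.md" "$WORKSPACE/logs/$TODAY.md"
```

If the write step is skipped, the read step has nothing to give the next session, which is exactly the failure mode described below.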

What Chris Found When He Evaluated the Team

A few weeks in, I did something I would recommend to anyone building in this space: I asked one of the agents to evaluate the rest of them.

Chris, Head of Operations, spent two hours auditing the team. His findings were uncomfortably familiar.

Kimani was operating as CTO, Technical PM, and Architect simultaneously. No one owned planning and coordination except him. When Kimani got pulled into production firefighting, Module 2 planning stalled. This is a structural risk I had seen in early-stage human startups: the most capable person absorbs every role, becomes a bottleneck, and burns out.

Wanjiru was the sole backend engineer handling both production fixes and new feature work. With a second product (UzaChapChap) launching in April, that breaks.

CI/CD had no clear owner. Multiple agents had touched it reactively. Nobody owned it proactively. Fragile infrastructure with diffuse responsibility is a pattern that causes incidents at 3 AM. It does not matter whether the team is human or AI.

Chris recommended three hires: a Technical PM (Emma), a second backend engineer (Barua), and an Integration Specialist (Zara). All three were onboarded today.

The lesson: AI teams replicate the structural failure modes of human teams. Concentrated ownership, single points of failure, unclear accountability -- these are not human problems. They are organizational problems, and they show up regardless of whether your team members have a heartbeat.

Today as a Case Study

The CI pipeline that gates Petrus CRM deployments failed six times before I found the real problem.

Fix one: the selector used to find the login form was waiting for a visible state. Changed to attached. Still failed.

Fix two: swapped networkidle for domcontentloaded to avoid Livewire's persistent connections blocking the wait. Still failed.

Fix three: added APP_URL=http://localhost:8000 on the theory that Livewire was making AJAX requests to the wrong port. Still failed.

Fix four: same thing, but applied to both the server startup step and the test runner step. Still failed.

Fix five: downloaded the failure screenshot artifact from GitHub Actions and read the Playwright error context file. Found the actual error.

ViteManifestNotFoundException. The page was returning HTTP 500 on every request because the Vite assets had never been compiled. The health check curl was passing silently on the 500 because curl, by default, exits successfully on HTTP error responses; it only treats them as failures when invoked with --fail. Every preceding fix was chasing a symptom of a page that was simply broken before Playwright ever tried to interact with it.
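The curl behavior is easy to reproduce. The sketch below stands up a throwaway local server that returns 500 on every request (the port and server are mine, purely for the demo) and compares curl's exit code with and without --fail.

```shell
#!/bin/sh
# curl exits 0 on any HTTP response it receives, even a 500, unless
# --fail (-f) is given. A health-check step built on plain curl will
# therefore pass a completely broken page.
python3 - <<'EOF' &
from http.server import BaseHTTPRequestHandler, HTTPServer

class Always500(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(500)
        self.end_headers()
    def log_message(self, *args):  # keep the demo quiet
        pass

HTTPServer(("127.0.0.1", 8500), Always500).serve_forever()
EOF
SERVER_PID=$!
sleep 1  # give the server a moment to bind

curl -s -o /dev/null http://127.0.0.1:8500/
PLAIN_EXIT=$?   # 0: the 500 passes silently, as it did in CI

curl -sf -o /dev/null http://127.0.0.1:8500/
FAIL_EXIT=$?    # 22: --fail surfaces the HTTP error

kill $SERVER_PID
echo "plain curl: $PLAIN_EXIT, with --fail: $FAIL_EXIT"
```

One flag on the health check would have turned five rounds of symptom-chasing into a red CI step on the first run.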

Fix six (the real one): added npm run build to the CI workflow. One line.

This story is not about AI. It is about debugging. But it illustrates something important about how this team operates: the agents are not infallible. They make wrong assumptions, follow plausible-but-wrong hypotheses, and sometimes need to be redirected. The value is not that they are always right. It is that they can run multiple diagnostic tracks in parallel and keep working while the human thinks.

The Honest Take

This is force multiplication, not replacement.

I still make all the consequential decisions. I approve everything before it goes to clients. I set the direction, I own the relationships, and I am responsible for the outcomes. The agents amplify what I can do. They do not replace the judgment that makes the work worth doing.

There are real limitations. Agents do not have persistent real-time awareness. They need to be spun up with context, and that context has to be maintained carefully or it degrades. Memory files help but they require discipline. If nobody writes down what happened, the next agent starts fresh.

Cost matters more than most people acknowledge. Running twenty agents intensively is not free. Token costs add up. The economics only work if the value delivered is commensurate. So far, for a consultancy billing at professional services rates, it works. In a lower-margin business, the math would be harder.

And there is a subtler problem: agents do not push back the way a good human collaborator does. They will follow an instruction that a human colleague might flag as a mistake. Building in evaluation loops, like the Chris audit, is not optional. It is how you catch the organizational rot before it compounds.

Despite all of that: I am building faster, serving clients better, and maintaining more context across more workstreams than I could as a solo founder with a single AI assistant. The experiment is working.

Ask me again in six months.

If You Want to Try This

Start smaller than you think you need to.

One specialized agent with a well-written identity file and clear ownership of one domain is worth more than five generic assistants you are context-switching between. Give the agent a name, a personality, a workspace, and a clear brief. The anthropomorphism is not decoration -- it changes how you interact with the agent and how consistent the outputs are.

Build memory files from day one. A daily log and a curated long-term memory file are the difference between an agent that knows your business and one that has to be briefed from scratch every session.

Do an evaluation early. Not a vague reflection, a structured audit. Ask one agent to evaluate the rest. You will find the structural problems before they cost you a client.

And be honest about what you are actually building. This is not a team in the human sense. It is a framework for organized, persistent, specialized intelligence working alongside you. That is something genuinely new and it is worth taking seriously on its own terms.

Kaara Works builds custom software for professional services firms. If you are a founder thinking about how AI-native operations could change how your business runs, I would be glad to talk.
