
Why I built a boring AI company in the age of ChatGPT

Co-founder and CEO at Hashnode

TL;DR: Every founder I know is building AI agents and copilots. I built a QA automation company. While vibe coding and AI coding tools are accelerating development, nobody's asking who catches the bugs. That's the real opportunity.


The age of vibe coding

It's 2026. Cursor AI is everywhere. Claude Code has 183K monthly searches. GitHub Copilot is table stakes. Engineers are vibe coding: generating entire features from natural-language prompts in an afternoon. AI pair programming went from party trick to default workflow.

About 80% of YC's Winter 2025 batch was AI-focused. W26 doubled down, with roughly 60% AI companies across 196 startups, 14 of them crossing $1M ARR before Demo Day. AI agents that schedule your meetings. AI agents that write your emails. AI agents that talk to your database. The pitches blur together after a while.

And I'm over here building… a testing company. E2E testing and QA automation for other companies. On purpose. With a straight face. While everyone around me chases the next ChatGPT wrapper.

My co-founder Sandeep and I could have built anything. We've been running Hashnode for five years, a developer media platform with millions of users. We had the network, the credibility, the technical chops to build something flashy. Something that demos well at a conference. Something that makes investors lean forward.

We chose testing.


AI isn't replacing developers. It's replacing their caution.

The "AI replacing developers" conversation misses the point entirely. AI didn't replace developers. It replaced the slow, careful part of development. The part where you re-read your code. Test the edge case. Check the mobile view. Click through the flow one more time before merging.

That part is gone now.

GitHub's own research claims developers complete tasks 55% faster with Copilot. But an independent study by METR found experienced developers were actually 19% slower with AI tools on real codebases, despite believing they were 20% faster. Developers thought they were flying. The stopwatch said otherwise.

Meanwhile, GitClear's analysis of 211 million lines of code found code duplication increased 4x and refactoring dropped from 25% to under 10% of changed lines between 2021 and 2024. More code, lower quality. And Stack Overflow's 2025 Developer Survey found only 3% of developers highly trust AI-generated output, while 46% actively distrust it. The people writing the code don't trust the code.

Vibe coding is real. I use Cursor and Claude Code every day. They're great for velocity. But velocity without verification is just shipping bugs faster.

Every team using AI coding tools now has a QA gap they didn't have 18 months ago. More code, same (or fewer) people checking it. That gap is the entire business case for Bug0.


Why "boring" wins

[Screenshot: Bug0.com]

The best infrastructure companies solve problems nobody wants to think about.

Stripe: payments. Datadog: monitoring. Vercel: deploys. Cloudflare: security. Nobody tweets about their payment processor. But try ripping Stripe out of a production app. You can't. That's the power of boring. Once it works, nobody touches it.

QA automation is the same category. No engineer wakes up excited about regression testing. But every broken deploy, every customer-facing bug, every 2am Slack message that starts with "hey, is checkout broken?" traces back to missing test coverage.

Automated testing isn't exciting. It's load-bearing.

Here's what I've noticed about AI agent startups: most of them are competing with OpenAI, Google, and Anthropic. They're building thin wrappers around foundation models and hoping the model doesn't eat their lunch. Good luck with that.

QA automation has a different problem. The old guard, tools like LambdaTest and Testsigma, are script generators. They output Cypress, Selenium, or Playwright scripts. Sounds useful until you realize scripts are the reason QA has never seen true automation. Scripts break when the UI changes. Scripts need an engineer to maintain them. Scripts pile up into a maintenance backlog that nobody wants to own. Every team that's tried "automated testing" with script-based tools knows the pattern: you spend a month writing tests, two months maintaining them, and then you abandon the suite entirely.

Bug0 works differently. Our core engine is Passmark, which we open-sourced. Playwright under the hood, but tests are defined as natural language steps, not scripts. Instead of page.click('[data-testid="submit-btn"]'), you write { description: "Add to cart", waitUntil: "My Cart is visible" }. When a button moves or a form gets redesigned, the intent is preserved. The test auto-heals, adapting to UI changes without anyone touching it. Self-healing tests. That's why test suites built on Passmark don't get abandoned after six months like script-based ones do.
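To make the contrast concrete, here's a minimal sketch of the idea, not Passmark's actual API: the names Step and checkoutFlow are illustrative. The point is that each step records intent in plain language rather than a selector.

```typescript
// Illustrative shape of an intent-based test step (assumed, not Passmark's real types).
interface Step {
  description: string; // what the agent should do, in plain language
  waitUntil?: string;  // success condition, also in plain language
}

// A flow is just an ordered list of intents. No selectors to break
// when the UI is redesigned.
const checkoutFlow: Step[] = [
  { description: "Search for 'wireless mouse'", waitUntil: "Results are visible" },
  { description: "Add to cart", waitUntil: "My Cart is visible" },
  { description: "Proceed to checkout", waitUntil: "Payment form is visible" },
];

// A script-based test would hard-code the DOM instead:
//   await page.click('[data-testid="submit-btn"]');
// and break the moment that attribute changes.
```

The design choice is that the test suite stores what the user is trying to do, so the locator can be recomputed whenever the page drifts.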

[Screenshot: Passmark.dev]

Passmark solves the speed problem too. On first run, AI agents navigate your app and cache every action to Redis. On subsequent runs, those cached actions replay at native Playwright speed, milliseconds per step, with zero LLM calls. When UI changes break a cached step, AI re-engages for that step only and updates the cache. You get the flexibility of AI with the speed of hand-written Playwright.
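The cache-then-replay pattern can be sketched in a few lines. This is an assumed simplification, not Passmark's code: a Map stands in for Redis, and replay and resolveWithAI stand in for native Playwright replay and the LLM-driven agent.

```typescript
// Hypothetical sketch of cache-replay with AI fallback.
type Action = { selector: string; kind: "click" | "fill"; value?: string };

async function runStep(
  stepKey: string,
  cache: Map<string, Action>,                      // stands in for Redis
  replay: (a: Action) => Promise<void>,            // fast, zero LLM calls
  resolveWithAI: (key: string) => Promise<Action>, // slow, LLM-driven fallback
): Promise<void> {
  const cached = cache.get(stepKey);
  if (cached) {
    try {
      await replay(cached); // milliseconds per step on the happy path
      return;
    } catch {
      // UI changed under us: fall through to AI for this step only
    }
  }
  const fresh = await resolveWithAI(stepKey); // agent re-locates the element
  await replay(fresh);
  cache.set(stepKey, fresh); // heal the cache for the next run
}
```

Note that the expensive path only runs for the single step that broke; every other step still replays from cache.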

For assertions, Passmark runs a multi-model consensus engine. Claude and Gemini evaluate each assertion independently. When they disagree, a third model breaks the tie. Tests pass only on agreement. This is how you get deterministic results from non-deterministic models.
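The consensus logic reduces to a majority vote with a lazy tiebreaker. This is a hedged sketch of the idea, with the model calls abstracted as functions; the real engine's prompts and scoring are not shown here.

```typescript
// Assumed shape of a two-model-plus-tiebreaker consensus check.
type Verdict = "pass" | "fail";

async function consensus(
  modelA: () => Promise<Verdict>,     // e.g. Claude
  modelB: () => Promise<Verdict>,     // e.g. Gemini
  tiebreaker: () => Promise<Verdict>, // third model, only on disagreement
): Promise<Verdict> {
  const [a, b] = await Promise.all([modelA(), modelB()]);
  if (a === b) return a; // both agree: deterministic outcome, no extra call
  return tiebreaker();   // disagreement: a third opinion decides
}
```

Two independent judgments plus a tiebreaker is what turns individually non-deterministic models into a repeatable pass/fail signal.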

But the engine is only half the story. Testing needs to be deterministic yet fast, and it also needs human judgment at the end. AI can discover flows, write tests, maintain them, and run them at speed. What AI can't do is tell you whether a failing test is a real bug or a false alarm. That last-mile call is human. Every Bug0 customer gets a dedicated Forward-Deployed Engineer for human-in-the-loop verification. Your FDE reviews every run, files bugs with real repro steps, and brings peace of mind to every deploy. We're not selling a testing tool. We're selling the confidence that your product works.

Boring markets have real money. The software testing market is valued at $50.7B in 2025, projected to hit $107.2B by 2032 at 11.3% CAGR. The testing services segment alone is growing by $27.6B between 2025 and 2030. Real customers who pay every month because the problem never goes away. The kind of revenue that doesn't depend on hype cycles.

Y Combinator sees it too. Garry Tan put it plainly in YC's Spring 2025 Request for Startups: YC wants founders who treat AI agents not as features but as the core operating system of brand-new companies. Jared Friedman calls these "full-stack AI companies". Not startups that sell AI tools to existing businesses. Startups that become a more productive version of the existing business itself.

That's exactly what Bug0 is. We're not selling a testing tool to QA teams. We are the QA team. AI-native, forward-deployed, operating the entire testing function with a human who owns the outcome. The "boring" version of YC's thesis, applied to QA.


What boring looks like on a Tuesday

Let me tell you what I actually spend my time on. Because this is the part that never makes it into founder Twitter threads.

We wire up E2E testing in CI/CD pipelines. Regression testing on every pull request. When something breaks, we catch it before it hits production. I've never once tweeted about this. It's also the thing customers thank us for most.

Our Forward-Deployed Engineers sit in customer Slack channels. They attend standups. They file bugs with video and repro steps. They work in the customer's timezone, in their sprint. Last month, one of our FDEs caught a payment flow regression 4 hours before a customer's product launch. That's not a scalable story for a pitch deck. It's the reason that customer renewed.

We debug flaky test infrastructure on a Saturday because a customer has a Monday release. We argue about whether an assertion confidence score of 82 is high enough to pass. We read through Playwright changelogs so our customers never have to.

The agentic AI hype cycle rewards demos. This work rewards showing up.


The math nobody's doing

A fully-loaded QA engineer costs $130-150K/year. Base salary averages around $101K according to Glassdoor, and that's before the 30-40% overhead for benefits, taxes, and equipment. Add $5-15K for a test automation tool license. Another $3-10K for cloud test infrastructure. Then add the opportunity cost: developers spending 30-50% of their time on bug fixes and unplanned rework instead of building features.

Total: $150K+/year. And that's one engineer. Who needs to ramp up. Who might leave in 18 months.

Bug0 replaces that for $30K/year. One flat subscription covers the QA engineer, the automation platform, the AI, and the infrastructure. 100% of critical user flows covered in 1-2 weeks. Full application coverage in 4 weeks. Not months. Weeks.

An 80% cost reduction isn't a pitch deck number. It's what our customers actually experience. When your alternative is "$150K and 6 months to maybe get decent coverage" versus "$30K and results in week one," the decision makes itself.
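The back-of-envelope arithmetic above, collapsed to the low end of each range for a conservative estimate (all figures from this post):

```typescript
// Conservative in-house QA cost, per the figures quoted above.
const base = 101_000;            // average QA engineer base salary
const loaded = base * 1.3;       // +30% for benefits, taxes, equipment (low end)
const tooling = 5_000;           // test automation tool license (low end)
const infra = 3_000;             // cloud test infrastructure (low end)
const inHouse = loaded + tooling + infra; // ~$139K, before opportunity cost

const bug0 = 30_000;             // flat annual subscription
const savings = 1 - bug0 / inHouse; // roughly 0.78, i.e. ~80% cheaper
```

Even without counting developer time lost to rework, the low-end comparison lands near the 80% figure.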

I'll take real unit economics over a chatbot demo any day.


What I'd tell a founder choosing right now

Pick the problem with the biggest gap between "everyone has it" and "nobody wants to solve it."

AI agent startups are fighting over the same territory. Every week there's a new "AI agent for sales outreach" or "AI agent for customer support." They're all using the same models, the same APIs, the same architecture. And they're all one OpenAI product launch away from irrelevance.

Meanwhile, there are thousands of engineering teams shipping code without proper E2E testing. They know it's a problem. They've just accepted the risk because the alternatives (hire a QA team, learn Playwright, maintain a test suite) all sound like more work than the bugs themselves.

That's the gap. That's where you build.

If your product makes someone's Tuesday less stressful, you have a business. If it makes a good demo at a conference, you have a pitch deck. I've seen enough pitch decks die to know which one I'd rather have.

Testing is boring. Bug0 is boring. I'm fine with that.

I'd rather have 200 engineering teams relying on us every sprint than a million Twitter impressions. One of those pays rent.
