16 February 2026 · Essay · Lyubomir Bozhinov · 8 min read

The Appropriate Level of Complexity

Over-engineering is a well-known sin. Under-engineering is the same sin wearing a different hat. The goal was never simplicity — it was the right complexity, designed to evolve.

There’s a genre of LinkedIn post that goes something like this: “I spent three months building micro-services for a product with zero users. Start simple. Scale when it hurts.” It gets thousands of likes. It’s not wrong. But it’s not the whole story either, and the part it leaves out is where most of the real damage happens.

Over-engineering is a well-known sin. We all recognise the team that introduces Kafka because someone read about it on the Uber blog, or reaches for CQRS when a CRUD app would do. Resume-driven development — choosing a technology because it looks good on your CV rather than because the problem demands it — is a real phenomenon. It doesn’t need a study to prove it. It ships production systems.

But the advice that follows — “just use a monolith”, “start simple”, “scale when it hurts” — can be almost as problematic as the over-engineered garbage on the other end. Not because simplicity is wrong, but because simplicity without architecture is just naivety with good branding.

Simple is not simplistic

Dan North has the best framing I’ve found for this. He calls it the “Best Simple System for Now” — the simplest system that meets the needs of the product right now, written to an appropriate standard. Not the simplest thing you can hack together. Not the system that anticipates every future requirement. The system that fits now, built well enough that you can change it later without hating yourself.

Kent Beck’s four rules of simple design say the same thing from a different angle. In his original priority order: passes the tests, reveals intention, no duplication, fewest elements. Notice the ordering. “Fewest elements” is last. Developers obsess over minimising moving parts when they should be obsessing over clarity and correctness. A design with more small, well-named classes that reveal intent is simpler, in every way that matters, than one with fewer large classes that hide it.
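
To make the ordering concrete, here is a deliberately tiny sketch in Python. The names are hypothetical; the point is Beck’s, not mine: the second version has more elements, and it is still the simpler design.

```python
# Fewest elements: one dense expression, intent left for the reader to infer.
def process(orders):
    return [o for o in orders if o["status"] == "paid" and o["total"] > 0]

# More elements, each named for what it means. By Beck's ordering this
# version wins: it reveals intention, and "fewest elements" comes last.
def is_paid(order):
    return order["status"] == "paid"

def has_positive_total(order):
    return order["total"] > 0

def billable_orders(orders):
    return [o for o in orders if is_paid(o) and has_positive_total(o)]
```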

Kevlin Henney puts it sharply: “Simplicity before generality, use before reuse.” Too often, generalisation becomes a work item in itself — adding to complexity rather than reducing it, running up technical debt rather than repaying it. Real simplicity requires understanding what to throw away, not just what to add.

Survivorship bias dressed as advice

The Instagram example is everywhere. “They scaled to fourteen million users on a Django monolith!” True — and misleading. Instagram’s server is a monolith: several million lines of Python, a few thousand Django endpoints, hundreds of engineers shipping hundreds of commits a day. But it runs on Cinder, Meta’s custom CPython fork with a JIT compiler and significant performance optimisations. Video encoding, machine learning, and other compute-intensive workloads are offloaded to specialised C++ and Rust services. Meta shipped Threads on this same monolith in five months — not because monoliths are simple, but because this one has had over a decade of extraordinary engineering investment behind it. The simple surface hides enormous complexity underneath.

GitHub is the same story. Nearly two million lines of Rails, over a thousand engineers, a billion API calls a day — and they maintain their own fork of Ruby to make it all work, deploying up to twenty times daily. Stack Overflow ran its entire Q&A platform on nine on-premise servers handling six thousand requests per second — until 2025, when they migrated to Google Cloud and Kubernetes, not because the monolith failed but because the data centre contract ended. These aren’t examples of simplicity winning. They’re examples of extraordinary engineering effort applied to a monolithic architecture. Drawing the lesson “start with a monolith and you’ll be fine” from these cases is like watching a Formula 1 driver take a corner at 200 km/h and concluding that corners are easy.

The teams that built those systems didn’t stumble into success by keeping things simple. They made deliberate architectural decisions — modularity, clear boundaries, performance investment — that let them evolve without rewriting.

The cost of starting too simple

Airbnb started with a Rails monolith they called Monorail. Appropriate for the early stage — no argument there. But without modular design principles, the codebase became tightly coupled as the team grew from 200 engineers to over a thousand. Deploys escalated from minutes to an entire day. Engineers lost an average of fifteen hours a week to reverts and rollbacks. Jessica Tai’s QCon talks on the migration are worth watching — the pain was real, and the fix wasn’t cheap. The lack of initial architectural thinking meant that scaling required not evolution but a ground-up rewrite into service-oriented architecture, eventually landing on a hybrid of micro- and macro-services fronted by GraphQL.

The lesson isn’t that they should have started with micro-services. It’s that they should have started with a monolith that was designed to evolve. There’s a difference between a monolith with clear module boundaries and a monolith that’s a ball of mud with a single deploy target.
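
What that difference looks like in code is worth spelling out. A minimal sketch, assuming a Python monolith with hypothetical billing and reporting packages: each module exposes one narrow facade, and everything else stays private behind it.

```python
# Hypothetical layout for a monolith with deliberate module boundaries:
#
#   app/
#     billing/
#       __init__.py     # public facade: the only sanctioned entry point
#       _invoices.py    # internal: the underscore signals "private"
#     reporting/
#       __init__.py
#
# app/billing/__init__.py exposes a deliberately narrow API:

from dataclasses import dataclass

@dataclass(frozen=True)
class Invoice:
    customer_id: str
    amount_cents: int

def create_invoice(customer_id: str, amount_cents: int) -> Invoice:
    """The one entry point other modules may call.

    Everything else in the billing package stays behind this facade,
    so extracting billing into a service later means replacing one
    import, not untangling a ball of mud.
    """
    return Invoice(customer_id=customer_id, amount_cents=amount_cents)
```

The deploy target is still singular; the boundaries are what make the later extraction an evolution rather than a rewrite.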

Gregor Hohpe put it perfectly: “Excessive complexity is nature’s punishment for organisations that are unable to make decisions.” Perhaps a bit presumptuously, I’d add a corollary: excessive simplicity is nature’s punishment for organisations that refuse to make architectural decisions at all. Both are forms of avoidance.

Right-sizing in practice

I recently designed the data layer for a regulated reporting system. The requirements were real-time change data capture from operational databases, event streaming, and analytical queries across millions of trading records — with full auditability.

The over-engineered end of the spectrum (and, to be honest, my initial architectural design) was a data lakehouse: a streaming ingestion layer, Apache Iceberg for storage, Trino for federated queries, always-on Spark clusters for transformations, and a separate serving layer on top. Beautiful architecture diagram. Six-figure annual infrastructure cost. Months of implementation before the first query could run.

We built something different. CDC feeds into Kafka. Kafka feeds into a columnar OLAP database with two layers: raw events for auditability, materialised views for fast analytical queries. Every component scales horizontally. The infrastructure costs roughly one-hundredth of what the lakehouse proposal would have.
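
A minimal sketch of that pipeline’s shape, with heavy caveats: the essay doesn’t name the OLAP store, so ClickHouse stands in as an assumption, the kafka-python and clickhouse-driver clients are illustrative choices, and every host, topic, table, and field name is hypothetical.

```python
import json
from datetime import datetime

from kafka import KafkaConsumer          # pip install kafka-python
from clickhouse_driver import Client     # pip install clickhouse-driver

ch = Client(host="olap.internal")        # hypothetical host

# Raw layer: every CDC event, append-only, for auditability.
ch.execute("""
    CREATE TABLE IF NOT EXISTS trades_raw (
        event_time DateTime,
        trade_id   String,
        payload    String
    ) ENGINE = MergeTree ORDER BY (event_time, trade_id)
""")

# Serving layer: a materialised view that rolls up as rows arrive,
# so analytical queries never have to scan the raw events.
ch.execute("""
    CREATE MATERIALIZED VIEW IF NOT EXISTS trades_daily
    ENGINE = SummingMergeTree ORDER BY day AS
    SELECT toDate(event_time) AS day, count() AS trades
    FROM trades_raw GROUP BY day
""")

consumer = KafkaConsumer(
    "cdc.trades",                        # hypothetical CDC topic
    bootstrap_servers="kafka.internal:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

# One insert per message keeps the sketch readable; a real consumer
# would batch rows before writing.
for message in consumer:
    event = message.value
    ch.execute(
        "INSERT INTO trades_raw (event_time, trade_id, payload) VALUES",
        [(datetime.fromisoformat(event["ts"]),
          event["trade_id"],
          json.dumps(event))],
    )
```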

The critical decision wasn’t “keep it simple.” It was “design it to evolve.” The architecture handles current scale comfortably. If we ever need federated queries across heterogeneous sources, we can add Trino without rewriting. If we need time-travel queries on petabytes of historical data, Iceberg slots in without disrupting the existing pipeline. None of that is built today — because none of it is needed today. But the architecture doesn’t fight it, either. That’s the difference between simplicity and the right complexity.

Evolutionary architecture

Neal Ford, Rebecca Parsons, and Patrick Kua introduced the concept of evolutionary architecture — an architecture that supports guided, incremental change across multiple dimensions. The key word is “guided.” You’re not predicting the future. You’re building in the ability to respond to it.

Their mechanism is fitness functions: automated checks that protect what matters — latency, modularity, security, deployability — as the codebase changes. I’ve found these work best when they’re few and brutal. Three or four that actually fail builds, not twenty that everyone ignores. The moment a fitness function becomes advisory instead of mandatory, it stops being architecture and starts being a suggestion.
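
For a sense of what “few and brutal” can mean in practice, here is a sketch of one such fitness function in Python. The package names are hypothetical and the check covers only absolute from-imports, but the essential property is there: it fails the build instead of advising.

```python
# Fitness function: fail the build if any module outside billing
# imports billing internals. Run under pytest so CI goes red.
import ast
import pathlib

FORBIDDEN_PREFIX = "app.billing._"   # underscored modules are internal

def offending_imports(root: str = "app") -> list[str]:
    hits = []
    for path in pathlib.Path(root).rglob("*.py"):
        if "billing" in path.parts:
            continue                 # billing may use its own internals
        tree = ast.parse(path.read_text(), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.ImportFrom) and node.module:
                if node.module.startswith(FORBIDDEN_PREFIX):
                    hits.append(f"{path}: {node.module}")
    return hits

def test_billing_internals_stay_internal():
    assert offending_imports() == [], (
        "Modules reached past the billing facade; import app.billing instead."
    )
```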

Fowler’s YAGNI principle, “you aren’t gonna need it”, complements this: don’t build capabilities for presumptive features. But most people miss the caveat: YAGNI does not apply to effort that makes the software easier to modify. Refactoring, clean boundaries, self-testing code: these aren’t speculative features. They’re the scaffolding that makes evolution possible. YAGNI is not a justification for neglecting the health of your codebase.

Sam Newman and Simon Brown both advocate starting with a modular monolith; Newman’s Monolith to Microservices makes the case clearly. Not because micro-services are bad, but because premature decomposition is expensive, especially when you’re still discovering your domain. As Fowler observed, almost every successful micro-service story started with a monolith that got too big and was broken up, and almost every system built as micro-services from scratch ended up in serious trouble.

The appropriate level of complexity

The truth you see after years of designing systems is that it depends. I know — unsatisfying. But “it depends” isn’t a cop-out if you can articulate what it depends on.

Conway’s Law is the first dependency most teams ignore. Your architecture will mirror your communication structure whether you plan for it or not. A five-person team building micro-services will produce five tightly coupled services that are micro-services in name only. A fifty-person organisation with clear domain boundaries might genuinely benefit from service decomposition — because the org chart already reflects it. The architecture that works is the one your team can actually operate.

Some teams can handle a few coarse-grained services from day one — they have the operational maturity, the observability tooling, and the deployment pipeline to make it work. Some teams are better served by a modular monolith with clean boundaries and a single deploy target. A few — very few — can run a full micro-services architecture from the start, but only because they’ve built the platform capabilities to support it. Distributed systems are genuinely hard. The complexity isn’t in the code; it’s in the failure modes, the latency, the partial failures, the error recovery, the eventual consistency that Pat Helland warns is a term so vague it means something different to everyone who uses it.

And none of it works without observability. You can’t evolve what you can’t see. The team that ships a modular monolith with structured logging, distributed tracing, and real-time alerting will outperform the team that ships micro-services with console.log — every time, without exception.
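
To be concrete about that distinction, here is a minimal sketch of structured logging using only Python’s standard library; the field names are illustrative. The point is that every event becomes a machine-parseable record, which is what tracing and alerting pipelines actually consume.

```python
# Minimal structured logging: one JSON object per event, stdlib only.
import json
import logging
import time

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": time.time(),
            "level": record.levelname,
            "msg": record.getMessage(),
            "module": record.module,
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("order processed")   # emits one parseable JSON line
```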

Starting simple is better than trying to build at Netflix scale on day one. Nobody disputes that. But starting at the correct complexity — and creating an architecture that can evolve — is what the best teams actually try to do. The difference is the word “evolve.” A system that’s simple but rigid is no better than a system that’s complex but brittle. Both break when they interact with reality.

The goal was never simplicity. It was the right complexity, at the right time, with the ability to change when the time comes.