Distributed vs Single-Node Systems: The Real Costs of Microservices, Network Hops, and Compliance - AlexWebLab in Bangkok, Thailand now, before in Hong Kong 香港

Distributed systems are often presented as the natural next step once software gets serious.

That is half true.

Some workloads are inherently distributed. Some products need global reach, fault tolerance, elasticity, or scale beyond a single machine. But many teams also distribute systems earlier than necessary, and then spend years paying a complexity tax for distribution they barely needed.

This chapter makes an important point that is easy to lose in architecture discourse: distribution is not a badge of maturity. It is a trade-off.

A Single Machine Is Better Than Many Teams Admit

Before talking about why systems become distributed, it is worth stating the counterpoint clearly: a single machine can do a lot.

Modern CPUs, memory, disks, and single-node databases are more capable than a lot of architecture diagrams imply. Many workloads that people instinctively label "distributed" can run perfectly well on one machine for far longer than expected.

That matters because a single-node system buys you several forms of simplicity at once:

no network hops between internal components,
fewer failure modes,
easier debugging,
stronger local consistency,
lower coordination overhead,
simpler deployment and operations.

If one machine can do the job, that is not primitive. It is efficient.

Why Teams Distribute Systems Anyway

There are still many legitimate reasons to go distributed.

Examples:

multiple users interact from different devices and regions,
services need to call other services over a network,
one machine cannot handle the required scale,
redundancy is needed for availability,
workloads are elastic and capacity needs change quickly,
different parts of the system benefit from different hardware,
data residency or compliance rules require location-aware placement.

Those are real drivers. The mistake is not using distributed systems. The mistake is pretending those advantages arrive without new costs.

Every Network Hop Adds Ambiguity

Inside a single process, a function call either returns or throws, and the set of possible outcomes is narrow.

Across a network, many more things can happen:

the request never arrives,
the response never comes back,
the downstream service is overloaded,
the dependency succeeds after the caller times out,
the network is slow enough that the result becomes useless,
a retry risks repeating an action that already happened.

That ambiguity is one of the defining costs of distributed systems.

It is why idempotency matters, why timeouts matter, why partial failure becomes a first-class design problem, and why frontend teams end up dealing with confusing states like "did it actually save?"
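
One common way to make a retry safe is an idempotency key: the caller generates a key once and reuses it on every retry, and the server returns the stored result for a key it has already seen. The sketch below is a minimal in-memory illustration. `PaymentService`, `charge`, and the field names are hypothetical, and a real service would persist the keys durably rather than in a dictionary.

```python
import uuid


class PaymentService:
    """Hypothetical in-memory service; a real one would persist keys durably."""

    def __init__(self):
        self._results = {}      # idempotency key -> stored result
        self.charges_made = 0   # counts real side effects

    def charge(self, idempotency_key, amount_cents):
        # A repeated key returns the stored result instead of charging again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges_made += 1
        result = {"charge_id": str(uuid.uuid4()), "amount_cents": amount_cents}
        self._results[idempotency_key] = result
        return result


service = PaymentService()
key = str(uuid.uuid4())            # generated once by the caller, reused on retry
first = service.charge(key, 1999)
retry = service.charge(key, 1999)  # e.g. the first response was lost
```

The retry gets back the same charge, and only one charge is actually made. That turns "did it actually save?" into a question the caller can safely ask twice.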

Latency Is Not a Footnote. It Is a Design Force.

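A call over the network is far slower than work in the same process or on the same machine: an in-process function call is typically measured in nanoseconds, while a network round trip is typically measured in hundreds of microseconds to milliseconds, and more across regions. Chain several service hops together and latency compounds quickly.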

This is why a system that looks clean on an architecture diagram can feel sluggish in a browser.

Browser
  -> API gateway
  -> auth service
  -> profile service
  -> recommendation service
  -> database

Each hop might be justifiable. Together, they spend the user's patience budget.

That is also why not every decomposition is progress. If splitting one system into five creates more coordination, more retries, and more tail latency than it removes, the architecture may be getting more elegant on paper while becoming worse in use.
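
The compounding can be made concrete with some back-of-the-envelope arithmetic. The sketch below assumes the hops are called one after another and that slow responses on different hops are independent; the millisecond figures and the 1% slow rate are illustrative, not measurements.

```python
def sequential_latency_ms(hop_latencies_ms):
    """Hops called one after another add up."""
    return sum(hop_latencies_ms)


def chance_of_slow_request(p_slow_per_hop, hops):
    """Probability that at least one of `hops` independent hops is slow."""
    return 1 - (1 - p_slow_per_hop) ** hops


# Illustrative numbers only, not measurements.
hops_ms = [5, 10, 15, 40, 20]  # gateway, auth, profile, recommendations, database
total_ms = sequential_latency_ms(hops_ms)
p_slow = chance_of_slow_request(0.01, len(hops_ms))
```

With these numbers the request takes 90 ms end to end. If each hop is slow 1% of the time, roughly 4.9% of requests hit at least one slow hop, which is why tail latency gets worse as the path gets longer.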

Distributed Systems Are Hard to Observe

When a single-node system is slow, the investigation surface is relatively narrow.

When a distributed system is slow, the problem might live in:

one overloaded service,
one bad downstream dependency,
a queue backlog,
retry storms,
a network bottleneck,
a misconfigured timeout,
a slow database on one hop,
an API version mismatch between services.

This is why observability is not optional in distributed systems. Metrics alone are rarely enough. You also need logs, traces, request correlation, and enough instrumentation to understand how one user action traveled across the system.

Without that, "the system is slow" turns into a blame game instead of a diagnosis.
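
Request correlation can be sketched with the standard library alone: generate one ID per incoming request and stamp it on every log line that request produces. In the example below the service names `gateway` and `profile` are hypothetical, and both loggers live in one process for brevity. Across real services the ID would have to travel with the request, for example in a header.

```python
import contextvars
import io
import logging
import uuid

# Holds the ID of the request currently being handled.
request_id = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    def filter(self, record):
        record.request_id = request_id.get()  # attach the ID to every record
        return True


stream = io.StringIO()  # stands in for a real log sink
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(request_id)s %(name)s %(message)s"))
handler.addFilter(RequestIdFilter())

for name in ("gateway", "profile"):  # hypothetical service names
    log = logging.getLogger(name)
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    log.propagate = False


def handle_request():
    rid = uuid.uuid4().hex
    request_id.set(rid)
    logging.getLogger("gateway").info("received")
    logging.getLogger("profile").info("loaded profile")
    return rid


rid = handle_request()
lines = stream.getvalue().splitlines()
```

Every line in `lines` starts with the same ID, so one search reconstructs the path a single user action took.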

Consistency Gets Harder the Moment Data Lives in Many Places

A single database can still be hard, but data spread across services introduces a different class of problem.

Each service can maintain local correctness while the overall product still feels inconsistent.

Examples:

a payment succeeds but an email service fails,
a profile update hits one service before a search index catches up,
a recommendation model is trained on lagging data,
a retry duplicates work because the caller cannot tell what completed.

That is why a distributed system often needs explicit strategies for:

retries,
idempotency,
asynchronous workflows,
eventual consistency,
explicit reconciliation.

These are not incidental details. They are the price of decomposition.
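
Explicit reconciliation can be as plain as periodically comparing what two services each believe happened. The sketch below takes the "payment succeeds but email fails" case and finds paid orders with no receipt. The function name and order IDs are hypothetical, and a real job would read from each service's own store.

```python
def find_missing_receipts(paid_order_ids, emailed_order_ids):
    """Return paid orders with no matching receipt email, in sorted order."""
    return sorted(set(paid_order_ids) - set(emailed_order_ids))


# Hypothetical snapshots read from two services' own stores.
paid = ["o-1", "o-2", "o-3", "o-4"]
emailed = ["o-1", "o-3"]

to_resend = find_missing_receipts(paid, emailed)
```

Here `to_resend` contains `o-2` and `o-4`. Resending is only safe if the email step is itself idempotent, which is why these strategies tend to show up together.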

Microservices Help, but They Make Coordination Harder

Microservices are attractive for good reasons.

They can:

let teams deploy independently,
isolate domains,
allow different services to scale separately,
let each service choose technology that fits its job,
reduce the blast radius of some changes.

But they buy those advantages by moving coordination into APIs, networks, versioning, observability, deployment infrastructure, and inter-service data flows.

That is why microservices are not a free upgrade from a monolith. They are a trade where organizational flexibility is often purchased with technical complexity.

For small teams, this trade is frequently bad. For larger organizations with real domain boundaries and coordination pain, it can be worth it.

The right question is not "are microservices modern?" It is "what coordination problem are we solving that justifies this much distributed overhead?"

Serverless Changes the Shape of the Trade

Serverless infrastructure pushes the outsourcing of operations to a provider even further.

Instead of managing long-running services directly, you pay for execution on demand. That can be very attractive for bursty or intermittent workloads.

But the trade-offs are also familiar in a new form:

runtime limits,
cold starts,
constrained execution environments,
tighter provider coupling,
more hidden infrastructure behavior.

Serverless does not make distributed systems simple. It changes which parts of the system you control and which parts are delegated to the provider.

Compliance and Privacy Also Shape Architecture

One of the most valuable sections in this chapter is the reminder that architecture is not determined only by performance and scale.

Law, privacy, and social impact matter too.

Examples:

data residency requirements may force region-specific storage,
privacy laws affect retention and deletion rules,
immutable logs become complicated when users have deletion rights,
derived datasets create questions about how far deletion must propagate,
storing sensitive data at all may be the wrong choice if the risk is too high.

This is where architecture becomes more honest.

The question is not only "can we store this?" It is also "should we?" and "what obligations do we take on if we do?"

That is not a policy sidebar. It changes system design at the storage, pipeline, retention, and deletion levels.

Data Minimization Is an Architectural Principle

A lot of teams default to collecting and storing everything because storage looks cheap.

That is a narrow accounting model.

The real cost of retained data includes:

breach impact,
regulatory exposure,
deletion complexity,
compliance overhead,
user harm if sensitive data is abused or disclosed.

Sometimes the best architecture decision is not a more sophisticated storage pattern. It is deciding not to store the data in the first place.

That is a strong counterweight to the instinct that more data is always better.
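
In code, "not storing it" often looks like an allowlist applied before anything is written. The sketch below is one hedged way to do that. `ALLOWED_FIELDS`, `minimize`, and the event fields are hypothetical, and which fields belong on the list is a product and legal decision, not a technical one.

```python
ALLOWED_FIELDS = {"user_id", "event", "timestamp"}  # hypothetical allowlist


def minimize(event):
    """Keep only fields with a stated purpose; everything else is dropped."""
    return {k: v for k, v in event.items() if k in ALLOWED_FIELDS}


raw = {
    "user_id": "u-42",
    "event": "checkout",
    "timestamp": "2024-01-01T12:00:00Z",
    "ip_address": "203.0.113.7",
    "full_name": "Example Person",
}
stored = minimize(raw)
```

An allowlist fails safe: a new field added upstream is dropped by default until someone decides it is worth keeping.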

A Better Distribution Heuristic

Before distributing a system, ask:

Which concrete limit of a single machine are we hitting?
Which user-facing problem requires this extra complexity?
How will we debug failures across service boundaries?
What consistency behavior are we willing to tolerate?
What privacy, compliance, or residency constraints shape placement and retention?
Are we solving a technical bottleneck, or copying an architecture pattern by habit?

If those questions do not have crisp answers, the system probably is not ready to be more distributed yet.

Conclusion

Distributed systems can unlock scale, resilience, flexibility, and regional reach. They can also introduce retries, uncertainty, latency, observability pain, consistency challenges, and coordination overhead that a single-node design largely avoids.

Microservices and serverless are not inherently better. They are tools for specific pressures. Architecture is shaped not only by throughput and availability, but also by privacy law, user safety, and the consequences of storing data you may later wish you had never collected.

The right architecture is the one whose trade-offs you can actually defend.

To the user, the result is a page that sometimes loads quickly and sometimes feels broken.

That variability is often the shape of a distributed request path leaking into product behavior.

Observability Becomes a Requirement, Not a Luxury

Troubleshooting a distributed system is much harder because the truth is fragmented across components.

If the system is slow, you now have to ask:

which service introduced the delay?
which dependency retried?
which queue backed up?
which region was affected?
which client saw the failure first?

That is why tracing, metrics, structured logs, and correlation IDs matter so much in distributed environments. Without them, the system becomes a mystery made of guesses.

This is also why microservice enthusiasm without observability discipline is reckless. Decomposition multiplies the number of places where the truth can hide.
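
To answer "which service introduced the delay?", each hop needs its own timing. The sketch below is a toy tracer, not a real tracing library: `Tracer` and the hop names are hypothetical, and a scripted clock stands in for real time so the example is deterministic.

```python
import time
from contextlib import contextmanager


class Tracer:
    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.spans = []  # (name, duration) pairs; a real tracer would export these

    @contextmanager
    def span(self, name):
        start = self.clock()
        try:
            yield
        finally:
            self.spans.append((name, self.clock() - start))

    def slowest(self):
        return max(self.spans, key=lambda s: s[1])[0]


# A scripted clock keeps the example deterministic.
ticks = iter([0.0, 0.01, 0.01, 0.25, 0.25, 0.27])
tracer = Tracer(clock=lambda: next(ticks))

for hop in ("auth", "recommendations", "database"):  # hypothetical hops
    with tracer.span(hop):
        pass  # the real call would go here
```

With the scripted timings, `tracer.slowest()` points at `recommendations`. In production the same idea is usually delegated to an established tracing system rather than hand-rolled.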

Microservices Solve Some Problems by Creating Others

Microservices can be excellent when teams genuinely need independent deployability, isolated ownership, or specialized scaling characteristics.

But the cost side is real:

more network calls,
more APIs to version,
more infrastructure to operate,
more health checks and alerts,
more opportunities for partial failure,
more coordination when behavior spans services.

This does not mean microservices are bad. It means they are a technical response to organizational and scaling needs, not a universal sign of maturity.

If the team is small and the product boundaries are still fluid, a simpler architecture is often the better design.

Serverless Changes the Packaging, Not the Trade-off

Serverless can make distribution feel deceptively cheap because provisioning is abstracted away and billing follows execution.

That is useful, but it does not erase trade-offs. You still inherit:

network boundaries,
cold-start behavior,
service limits,
dependency sprawl,
debugging complexity,
contract management between components.

In many cases, serverless shifts the cost model more than it changes the underlying systems reality.

That is worth remembering before treating it as architecture magic.

Consistency Gets Harder the Moment State Spans Boundaries

Distributed systems are not only about communication; they are about state spread across components.

Once different services own different databases, consistency becomes an application concern rather than a local storage concern.

That is where teams start encountering questions like:

what is the source of truth?
when are retries safe?
what happens when one step succeeds and the next fails?
how stale can data on the read path be?
how do we reconcile diverging views of reality?

Those are not details. They are the architecture.
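
"When are retries safe?" has a short answer: when the operation is idempotent and the failure is transient. The sketch below retries only in that case, with exponential backoff. `TransientError`, `call_with_retries`, and `flaky_read` are hypothetical names, and the injectable `sleep` exists only so the example runs without waiting.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def call_with_retries(operation, attempts=3, base_delay_s=0.1, sleep=time.sleep):
    """Retry an idempotent operation with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure
            sleep(base_delay_s * 2 ** attempt)


calls = {"count": 0}


def flaky_read():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("timed out")
    return "profile-data"


delays = []
result = call_with_retries(flaky_read, sleep=delays.append)
```

The read succeeds on the third attempt after waiting 0.1 and then 0.2 seconds. Wrapping a non-idempotent write in the same helper would be a bug, not resilience.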

Distribution Is Also Shaped by Non-Technical Constraints

One of the strongest points in this chapter is that architecture is not governed only by performance and scalability.

Legal and social constraints matter too.

Examples:

data residency laws may require certain user data to stay in specific jurisdictions,
compliance obligations may restrict who can access or move data,
privacy expectations may make broad replication or retention inappropriate,
societal harm can follow from storing data whose risk exceeds its value.

These constraints can influence where systems run, how data is partitioned, what must be deletable, and which architectural shortcuts are unacceptable.

That is not peripheral policy work. It is part of system design.
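
A residency rule often ends up as a small, boring piece of routing code. The sketch below maps a user's jurisdiction to a storage region and refuses to guess when there is no mapping. The jurisdiction codes and region names are hypothetical, and the mapping itself would come from legal review, not from engineering.

```python
# Hypothetical mapping from user jurisdiction to storage region.
REGION_BY_JURISDICTION = {"EU": "eu-central", "US": "us-east", "SG": "ap-southeast"}


def storage_region(jurisdiction):
    """Fail closed: refuse to guess a region for an unmapped jurisdiction."""
    try:
        return REGION_BY_JURISDICTION[jurisdiction]
    except KeyError:
        raise ValueError(f"no approved storage region for {jurisdiction!r}") from None
```

Failing closed is the design choice that matters here: falling back to a default region would silently turn a missing mapping into a compliance problem.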

The Cheapest Data to Protect Is the Data You Never Stored

This is where the law-and-society section becomes especially important.

A lot of architecture decisions are justified by speculative future usefulness: keep more logs, store more events, retain more identifiers, copy data everywhere just in case it helps later.

But stored data has a carrying cost beyond storage bills.

It creates:

privacy risk,
breach impact,
deletion complexity,
regulatory burden,
reputational damage,
new failure modes for users, not just systems.

This is why data minimization is an engineering principle as much as a legal principle. The easiest compliance burden is often the one you designed away before it became a retention problem.
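
Designing a burden away can also mean giving data an expiry date from the start. The sketch below filters records against a retention window. The 30-day figure, `purge_expired`, and the record shape are hypothetical, and a real policy would be set by the applicable rules rather than by this constant.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # hypothetical policy, not legal guidance


def purge_expired(records, now):
    """Return only records still inside the retention window."""
    cutoff = now - RETENTION
    return [r for r in records if r["created_at"] >= cutoff]


now = datetime(2024, 3, 1, tzinfo=timezone.utc)
records = [
    {"id": 1, "created_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"id": 2, "created_at": datetime(2024, 2, 20, tzinfo=timezone.utc)},
]
kept = purge_expired(records, now)
```

Only record 2 survives. In a distributed system the same cutoff has to be applied to every copy, index, and derived dataset, which is where the real work is.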

A Better Default: Prove the Need for Distribution

The disciplined question is not, "how do we make this distributed?"

It is:

Which concrete limitation of a single-node design are we hitting?
Which benefit of distribution do we need right now?
What new failure modes are we willing to accept in return?
How will we observe, debug, and govern the resulting system?
Does the added architecture still respect user rights, deletion needs, and data-safety obligations?

If you cannot answer those questions clearly, the move to distribution is probably still too early or too fuzzy.

Conclusion

Distributed systems are valuable because they let us serve workloads that are larger, more geographically spread, and more specialized, and serve them with more resilience. They are dangerous because they transform clear local behavior into a system of partial failures, network ambiguity, consistency trade-offs, and harder debugging.

The right comparison is not "distributed is modern, single-node is naive." The right comparison is whether the benefits of distribution outweigh the very real costs for this workload, this team, and these users. And that answer has to include not just latency and scale, but also privacy, compliance, and the human consequences of the data you decide to move and keep.