Table of Contents
Open Table of Contents
Introduction
Imagine you have $100 in your bank account. One morning, you’re running late for work and stop at an ATM to withdraw $80. The machine dispenses the cash, you put it in your wallet, and continue with your day.
A few hours later, you check your banking app and notice something alarming:
balance: -$60
How is that possible? 😨
You only had $100 and withdrew $80. At first glance, this sounds like a bug. And it is. But more importantly, it reveals one of the most fundamental concepts in software engineering: invariants.
An invariant is a condition that must always remain true, regardless of how many operations are performed on the system.
In the case of a bank account, one obvious invariant is:
Account balance >= 0
A customer should never be able to withdraw more money than is available in their account.
Simple enough.
Now imagine two ATMs processing withdrawals at almost exactly the same time. Each ATM checks the balance and sees $100 available. Both approve a withdrawal of $80.
The result?
sequenceDiagram
participant ATM_A as ATM A
participant ATM_B as ATM B
participant Account
Note over Account: Balance = $100
par Concurrent Withdrawals
ATM_A->>Account: Read balance ($100)
ATM_A->>Account: Withdraw $80
and
ATM_B->>Account: Read balance ($100)
ATM_B->>Account: Withdraw $80
end
Note over Account: Final balance = -$60
The invariant has been violated. The interesting part is that neither ATM was individually wrong. Each one made a decision based on information that was correct at the time it was read.
This is where distributed systems become difficult. The challenge is not simply moving data between computers. The challenge is ensuring that important invariants continue to hold, even when multiple users, processes, servers, or data centers are operating concurrently.
A Simple Example in Go
Let’s model our bank account in Go. We’ll start with a simple rule:
type Account struct {
balance int
}
Remember the invariant we introduced earlier:
balance >= 0
In other words, an account should never have a negative balance. A straightforward implementation of a withdrawal operation might look like this:
func (a *Account) Withdraw(amount int) error {
if a.balance < amount {
return fmt.Errorf("insufficient funds")
}
a.balance -= amount
return nil
}
At first glance, this seems perfectly reasonable. Before subtracting money from the account, we verify that enough funds are available. If the balance is too low, we reject the withdrawal.
Let’s see it in action:
func main() {
account := Account{balance: 100}
err := account.Withdraw(80)
if err != nil {
panic(err)
}
fmt.Println(account.balance)
}
Output:
20
Everything looks correct. The invariant still holds:
20 >= 0
So where does the problem come from?
The answer is that this code assumes only one withdrawal is happening at a time.
In the real world, multiple requests can arrive simultaneously. Two ATMs, two mobile devices, or two backend servers may attempt to withdraw money from the same account at nearly the same moment.
And that’s when things start to get interesting.
When Two Withdrawals Happen at the Same Time
The implementation from the previous section appears to protect our invariant.
Before withdrawing money, we check that the account has sufficient funds available. However, there’s a subtle assumption hidden in the code:
Only one withdrawal can happen at a time.
In a real system, that assumption rarely holds.
Imagine two ATMs processing withdrawals for the same account at nearly the same moment. Both requests arrive when the account balance is still $100.
Let’s simulate that scenario:
package main
import (
"fmt"
"sync"
"time"
)
type Account struct {
balance int
}
func (a *Account) Withdraw(amount int) error {
if a.balance < amount {
return fmt.Errorf("insufficient funds")
}
// Simulate some processing delay
time.Sleep(100 * time.Millisecond)
a.balance -= amount
return nil
}
func main() {
account := Account{balance: 100}
var wg sync.WaitGroup
wg.Add(2)
go func() {
defer wg.Done()
account.Withdraw(80)
}()
go func() {
defer wg.Done()
account.Withdraw(80)
}()
wg.Wait()
fmt.Println("Final balance:", account.balance)
}
At first glance, it may seem impossible for this code to produce a negative balance.
After all, every withdrawal checks whether sufficient funds are available before proceeding.
The problem is that both goroutines can read the balance before either one updates it.
A possible execution might look like this:
sequenceDiagram
participant A as Goroutine A
participant B as Goroutine B
participant Account
Note over Account: Balance = 100
par Concurrent Read + Validation
A->>Account: Read balance
Account-->>A: 100
Note over A: Approve withdrawal
and
B->>Account: Read balance
Account-->>B: 100
Note over B: Approve withdrawal
end
A->>Account: balance = balance - 80
Note over Account: Balance = 20
B->>Account: balance = balance - 80
Note over Account: Balance = -60
Neither withdrawal was individually incorrect.
Each goroutine observed a balance of $100 and made a decision based on information that was valid at the time it was read.
The issue is that the validation step and the update step were not performed atomically. As a result, our invariant has been violated:
balance >= 0
It is no longer true.
This example illustrates an important idea:
The hardest part of building reliable systems is often not performing the operation itself, but ensuring that critical invariants continue to hold when multiple operations occur concurrently.
And if preserving an invariant becomes challenging with just two goroutines running in the same process, imagine how much harder it becomes when those operations are happening across multiple servers, regions, or data centers!
What Mechanisms Do Real Systems Use to Preserve invariants?
Preserving the Invariant
The problem in our previous example wasn’t the withdrawal itself.
The problem was that two goroutines could execute the validation and update steps concurrently.
To preserve the invariant, we need to ensure that only one goroutine can perform a withdrawal at a time.
One way to achieve this in Go is by using a mutex:
type Account struct {
balance int
mu sync.Mutex
}
func (a *Account) Withdraw(amount int) error {
a.mu.Lock()
defer a.mu.Unlock()
if a.balance < amount {
return fmt.Errorf("insufficient funds")
}
a.balance -= amount
return nil
}
Now the withdrawal operation becomes atomic from the perspective of other goroutines.
While one goroutine is executing the withdrawal, every other goroutine must wait for the lock to be released.
Let’s revisit our invariant:
balance >= 0
With the mutex in place, both withdrawals can no longer validate the balance simultaneously. A possible execution now looks like this:
sequenceDiagram
participant A as Goroutine A
participant B as Goroutine B
participant Account
Note over Account: Balance = 100
A->>Account: Lock()
B->>Account: Lock() (blocks)
A->>Account: Validate balance = 100
A->>Account: Withdraw 80
Note over Account: Balance = 20
A->>Account: Unlock()
B->>Account: Lock() succeeds
B->>Account: Validate balance = 20
Note over B: Return "insufficient funds"
B->>Account: Unlock()
The invariant is preserved.
This illustrates an important principle:
Correctness is often achieved by protecting the operations that enforce an invariant.
In our example, a mutex is enough because everything happens inside a single process. But what happens when the same account can be modified by multiple application servers?
A mutex cannot protect data that lives on another machine. This is where databases, transactions, distributed locks, and consensus algorithms enter the picture.
They are all different mechanisms designed to solve the same fundamental problem: preserving invariants in the presence of concurrency.
Conclusion
At first glance, an invariant may seem like nothing more than a business rule. In our example, the rule was simple:
balance >= 0
However, as soon as multiple operations can occur concurrently, preserving that rule becomes much more challenging than it appears.
Execute the complete example in this playground