# Atomic Transaction Posting in Go: Getting Balance Invariants Right Under Concurrency

Imagine an account with 100 taka in it, and two withdrawals of 100 taka arriving at the exact same instant. Both read the balance, both see 100, both say "yep, enough money," and both go through. The account is now at minus 100. Nobody wrote a bug, exactly. Each withdrawal was correct on its own. They were just allowed to happen at the same time, and the system let them both believe they were alone.

That is the whole problem of week 4. Last week I put the ledger on disk and proved a single transaction posts correctly. This week the question is harder: what happens when a hundred of them land on the same accounts at once, and how do I make absolutely sure the books still balance? This is post 4 of 12.

## The thing that must never happen

The ledger's one rule, from day one, is that every transaction's postings sum to zero. Money moves, it never appears or vanishes. Under concurrency the failure mode is subtle. It is not that one transaction is wrong. It is that two correct transactions, interleaved, produce a state neither of them intended.

Here is my dinner-debt example from earlier in the series, now with a twist. Say two different transactions both touch my wallet at the same moment: I pay back a friend 600 taka, and at the same time a refund of 200 lands.

```plaintext
   Transaction A: pay friend   Transaction B: refund arrives
   sohag:wallet  -600           sohag:wallet  +200
   friend:wallet +600           shop:wallet   -200

   Both read sohag:wallet's history at the same instant.
   Both append their postings based on what they saw.
   If the database lets them interleave wrong...
   ...the final balance can reflect one, but not the other.
```

Both transactions are individually balanced. The danger is in the interleaving. So the real work this week was not writing the posting logic. It was choosing how the database serializes conflicting transactions, and then proving it holds.

## Two ways to be safe

There are two classic answers, and they are opposites.

The **pessimistic** one: lock the rows. Before touching an account, grab an exclusive lock on it (`SELECT ... FOR UPDATE`), do the work, release. Anyone else who wants that account waits their turn. It works, but correctness now depends on me remembering to lock the right rows, in the right order, on every single code path, forever. The day someone adds a new way to write a posting and forgets the lock, the race is back, silently.

The **optimistic** one: don't lock, just let the database watch. Run every transaction at `SERIALIZABLE` isolation, the strictest level Postgres offers. Postgres tracks the reads and writes of concurrent transactions and, if committing them all would produce a result that no serial (one-at-a-time) ordering could, it aborts one of them with a specific error. Nothing blocks. Conflicts surface as a retry-this error.

I went optimistic. The reason is in the failure mode: with locking, a forgotten lock fails silently and corrupts money. With SERIALIZABLE, the database refuses to let an unsafe interleaving commit, no matter what the application code does or forgets. Correctness stops being something I have to remember and becomes something the database guarantees. I wrote this up in [ADR-004](https://github.com/sohag-pro/go-ledger/blob/main/docs/adr/004-concurrency-control.md).

The catch is that "abort with a retry-this error" means I have to actually retry.

## Retrying without making it worse

When Postgres aborts a transaction for a serialization conflict, it returns SQLSTATE 40001. The fix is to roll back and run the whole thing again, because the second time around the conflicting transaction has usually finished and yours can now commit cleanly.

One subtlety bit me immediately: the conflict often does not show up when you run your statements. It shows up at `COMMIT`. So the retry logic has to watch both the work and the commit:

```plaintext
  attempt:
    BEGIN SERIALIZABLE
    insert transaction + postings   --> maybe 40001 here
    COMMIT                          --> or maybe 40001 here
    if 40001 anywhere: roll back, wait a bit, try again
```

The "wait a bit" matters more than it looks. If a hundred transactions all conflict and all retry after exactly the same delay, they just collide again, in lockstep, forever. So the backoff is exponential with **full jitter**: each retry waits a random amount up to a growing cap. The randomness is the point. It scatters the retriers so they stop landing on top of each other. Bounded retries, and if they ever run out, the caller gets a typed "conflict, try again" error that the API will map to a 503, not a confusing 500.

## Belt and suspenders: enforce it in the database too

The domain already checks the sum-to-zero rule in Go before anything touches the database. But I wanted the database itself to refuse an unbalanced transaction, so that a bad migration, a direct `psql` session, or some future second service physically cannot write one.

The trouble is that the rule spans many rows. A normal `CHECK` constraint only sees one row at a time, and a single posting of plus 600 is not "wrong," it is half of a pair. The sum only makes sense once all the postings are in.

The answer is a **deferred constraint trigger**. It is queued up and fires once, at `COMMIT`, after every posting is written, and it checks that each transaction's postings sum to zero:

```plaintext
  BEGIN
    insert posting +600   (trigger queued, not checked yet)
    insert posting -600   (trigger queued, not checked yet)
  COMMIT
    --> now the trigger runs: SUM(postings) for this txn = 0 ? yes -> commit
                                                               no  -> abort
```

Now the invariant is guaranteed in two independent places: the Go domain catches it early with a clean error, and the database guarantees it absolutely. There is a test that bypasses all my Go code, inserts a single lopsided posting with raw SQL, and confirms the commit is rejected.

## The afternoon one in six payments failed

Then I wrote the stress test the week was really about: 100 goroutines, 10,000 balanced transactions, all hammering a shared pool of accounts. The bar was zero invariant violations.

The first run was ugly. Roughly **one in six posts failed**, all with the same serialization error, even after retries. Nearly 2,000 transactions gave up. My first instinct was that my retry logic was weak, so I improved the backoff. It got better, not fixed. Still over a hundred failures, and the run took a full minute.

The real cause was somewhere I never would have guessed: a missing index.

My balance trigger reads postings filtered by their transaction id. There was no index on that column, so Postgres did a sequential scan to find them. And here is the part that is easy to miss: under `SERIALIZABLE`, a scan does not just read rows, it takes a broad **predicate lock** over the range it scanned. Two transactions writing completely unrelated postings would both scan, both lock wide swaths of the table, and Postgres would see overlapping read/write footprints and abort one of them as a conflict. The transactions were not actually fighting over the same data. The unindexed read was inventing the conflict.

I added one index on `postings(transaction_id)`. The failures dropped to **zero**, and the run went from about 62 seconds to 5.

```plaintext
  before: trigger reads postings by transaction_id
          no index -> sequential scan -> wide predicate lock
          -> phantom serialization conflicts -> ~1 in 6 posts fail

  after:  same read, now an index lookup -> narrow lock
          -> real conflicts only -> 10,000 / 10,000 commit, zero failures
```

That is the lesson I will keep from this week. Under SERIALIZABLE, an unindexed read is not merely slow. It actively manufactures conflicts that have nothing to do with your actual data. Indexes are not a performance nicety here, they are load-bearing for correctness-under-load.

One honest footnote on speed: my local numbers were p50 around 35ms, p99 around 68ms, which is above the sub-10ms target I had in mind. The cause is that I run Postgres in a Linux VM on my Mac, and every commit waits for the disk to fsync the write-ahead log through that virtualization layer. I am reporting the real numbers rather than turning off durability to get a prettier graph. For a ledger, the durable write is the entire point.

## What a fresh pair of eyes caught

Before merging, I had my AI pair do a senior-level review of everything through week 4, specifically from a fintech angle. It caught a genuinely good one. Currency lived on accounts and on transactions, but nothing in the database tied them together, so a USD transaction could technically post into a EUR account. Invisible today because everything is USD, but exactly the kind of gap that becomes a real bug the day multi-currency arrives. That became a second small trigger and its own short decision record. The review also pushed the metrics endpoint off the public port (it leaks volumes and latencies) and turned a few raw database errors into typed ones the API can map cleanly.

That is the deal I keep describing in this series: the AI accelerates the work and pressure-tests my thinking, and I own every decision and can explain why each line exists. The currency catch is a good example. It suggested the gap, I decided how to close it and wrote down why.

## Where this leaves us

The ledger now posts transactions atomically, stays correct under heavy concurrency, enforces its core invariant in two independent layers, and has the metrics to watch contention. The stress test runs 10,000 concurrent posts and the books balance to the cent, every time.

What is still missing is a way to actually call any of this over the network. Everything so far is exercised by tests. Next week is the REST API: chi, real endpoints to create accounts and post transactions, proper validation, and RFC 7807 error responses, all sitting on top of the service this week built. That is when the ledger stops being a test suite and starts being something you can curl.

The repo is open source at [github.com/sohag-pro/go-ledger](https://github.com/sohag-pro/go-ledger). Follow along, or go read your own database's docs on predicate locks before they teach you the hard way.
