tarun-elango 26810e43d0 sd text
2026-04-26 13:27:19 -04:00

Financial Systems

Financial systems are where backend engineering stops being forgiving. A normal CRUD bug might show the wrong profile picture or duplicate a notification. A financial bug can double-charge a customer, underpay a merchant, misstate revenue, break compliance rules, or create an accounting discrepancy that surfaces weeks later.

That is why payment, billing, invoicing, and ledger design show up so often in system design interviews. These systems force you to think about correctness, retries, immutability, eventual consistency, external dependencies, and operational recovery all at once.

This guide is designed for two goals:

  1. Help you explain financial systems clearly in backend and system design interviews.
  2. Help you build a realistic mental model of how production financial systems are actually built.

Examples in this guide are generalized from public industry patterns used by systems like Stripe, PayPal, Amazon, Uber, Netflix, Shopify, GitHub, and typical SaaS subscription platforms.

1. Big Picture: Why Financial Systems Are Different

At a high level, a financial system answers a deceptively simple question:

"Who owes whom how much money, for what reason, in what currency, at what time, and how certain are we?"

That question is harder than it looks because the answer changes over time and is often influenced by external systems you do not control.

In a normal application, the request-response cycle often feels authoritative. In financial systems, the HTTP response is usually only the beginning of the story.

1.1 Why These Systems Are Hard

| Problem | Why it exists | What it forces you to design for |
| --- | --- | --- |
| Money movement has real consequences | A bug affects customer trust, legal exposure, and revenue | correctness, auditability, strong operational controls |
| External systems are involved | banks, card networks, processors, and payment providers can be delayed or inconsistent | async workflows, reconciliation, retries |
| Requests are retried | clients, load balancers, and workers will retry on timeouts | idempotency, deduplication, safe replays |
| Truth is distributed | your DB, the processor, and the bank may disagree temporarily | state machines, eventual consistency, operational review |
| History matters | you cannot casually mutate financial records without losing traceability | immutable records, append-only ledgers, reversals |
| Regulations and contracts matter | taxes, invoices, refunds, disputes, and payouts have legal rules | compliance-aware data design |
| Failure is partial | auth may succeed while your app times out, or webhook may arrive late | ambiguity handling, polling, reconciliation |

1.2 The Core Mental Model

A strong production design separates responsibilities instead of letting one table do everything.

  • Product systems decide what happened in the business domain.
  • Billing decides what should be charged.
  • Payments decide whether money collection succeeded.
  • Invoices define what the customer legally owes.
  • The ledger records financial truth as immutable movements.
  • Reconciliation confirms that internal truth matches external truth.
  • Audit logs explain who changed what and when.

If a company tries to collapse all of that into a single payments table with a status column, the system usually becomes fragile very quickly.

1.3 Non-Negotiable Design Rules

These are not optional nice-to-haves. They are survival rules.

  1. Represent money as integers in minor units, not floating point.
  2. Separate order state, payment state, invoice state, entitlement state, and ledger state.
  3. Assume every network call can be retried or duplicated.
  4. Assume every webhook can be delayed, reordered, or redelivered.
  5. Prefer append-only financial history over mutable rows.
  6. Build operational tools for refunds, disputes, and reconciliation from day one.
  7. Treat external provider reports as mandatory inputs, not optional debug data.
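
Rule 1 is easy to state and easy to get wrong, so here is a minimal sketch of it. The `MINOR_UNIT_EXPONENT` table and both helper names are assumptions made for illustration; a real system would load currency exponents from ISO 4217 data.

```python
from decimal import Decimal

# Assumption: a tiny hard-coded exponent table, for illustration only.
MINOR_UNIT_EXPONENT = {"USD": 2, "EUR": 2, "JPY": 0}


def to_minor_units(amount: str, currency: str) -> int:
    """Parse a decimal string such as "19.99" into integer minor units."""
    exponent = MINOR_UNIT_EXPONENT[currency]
    scaled = Decimal(amount).scaleb(exponent)
    # Refuse inputs with more precision than the currency supports.
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{amount} has too many decimal places for {currency}")
    return int(scaled)


def format_minor_units(minor: int, currency: str) -> str:
    """Render integer minor units back into a display string."""
    exponent = MINOR_UNIT_EXPONENT[currency]
    return f"{Decimal(minor).scaleb(-exponent):.{exponent}f} {currency}"


# Binary floating point cannot represent 0.1 or 0.2 exactly, so sums drift.
float_total = 0.1 + 0.2
# Integer minor units add exactly.
int_total = to_minor_units("0.10", "USD") + to_minor_units("0.20", "USD")
```

Here `float_total` is not equal to `0.3`, while `int_total` is exactly 30 cents. Parsing from strings rather than floats keeps the imprecision from entering the system in the first place.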

1.4 End-to-End Financial Architecture

flowchart LR
	PROD[Product Events\norders seats rides storage usage] --> BILL[Billing Engine]
	PLAN[Plan Catalog\npricing discounts tax rules] --> BILL
	BILL --> INV[Invoice Engine]
	INV --> PAY[Payment Orchestrator]
	PAY --> GW[Gateway / Processor / Provider]
	GW --> RAILS[Card Network / Bank Rails / Wallet Rails]
	GW --> WEBHOOKS[Webhook / Settlement Events]
	WEBHOOKS --> PAY
	PAY --> LEDGER[Ledger Posting Engine]
	INV --> LEDGER
	PAY --> ENT[Entitlement Service]
	BILL --> ENT
	LEDGER --> RECON[Reconciliation Engine]
	GW --> REPORTS[Provider Reports]
	RAILS --> BANK[Bank Statements / Settlement Files]
	REPORTS --> RECON
	BANK --> RECON
	PAY --> AUDIT[Audit Log]
	BILL --> AUDIT
	LEDGER --> ACCOUNTING[Accounting Export / Finance Warehouse]
	RECON --> OPS[Finance Ops / Support Review]

This diagram matters because it guards against the most common interview mistake: assuming that payment success automatically means every other financial system is done. In reality, payment collection, invoice generation, ledger posting, entitlements, and reconciliation are separate concerns.

2. Payment System

2.1 What a Payment System Is

A payment system is the software and infrastructure that lets a customer transfer money to a merchant or platform. In practice, it is less about "charging a card" and more about coordinating many moving parts safely.

A production payment system often needs to:

  • accept a payment request
  • authenticate or verify the payment method
  • call external payment providers
  • handle redirects or additional customer action such as 3DS
  • track payment state transitions over time
  • prevent duplicates
  • post financial entries
  • support refunds and disputes
  • reconcile internal state with provider and bank reports

2.2 Why Payment Systems Are Difficult

Payment systems are difficult because they combine all of the hardest parts of distributed systems:

  • external dependencies you do not control
  • user-facing latency requirements
  • irreversible or expensive side effects
  • asynchronous confirmations
  • legal and audit requirements
  • fraud and abuse pressure

If an API that sends emails times out, you can usually retry safely and move on. If a payment API times out, you often do not know whether money was already authorized, captured, declined, or is still in flight.

That ambiguity is one of the defining properties of payment engineering.

2.3 Trust, Correctness, and Money Movement Requirements

Financial systems are built around trust:

  • customers must trust they will not be double charged
  • merchants must trust they will get paid correctly
  • finance teams must trust balances and reports
  • regulators and auditors must trust the records

That leads to three core requirements:

| Requirement | What it means in practice |
| --- | --- |
| Correctness | no duplicate charges, balanced ledger entries, accurate invoice totals |
| Durability | once a financial event is committed, it should not disappear |
| Traceability | every state change should be explainable later |

2.4 High-Level Payment Actors

The exact terminology varies across providers, but the standard card ecosystem usually includes these actors.

| Actor | Role | Practical note |
| --- | --- | --- |
| Customer | the person or business paying | may abandon, retry, dispute, or fail authentication |
| Merchant | the business receiving payment | often your company or a merchant on your platform |
| Payment gateway | API layer that tokenizes or routes payment requests | often confused with the processor in interviews |
| Payment processor | coordinates transaction processing with acquiring banks and networks | Stripe, Adyen, Braintree, PayPal, etc. can abstract this |
| Acquiring bank | bank that processes payments for the merchant | receives funds on the merchant side |
| Issuing bank | bank that issued the customer's card | decides whether to approve, decline, or raise a fraud challenge |
| Card network | Visa, Mastercard, AmEx, RuPay, etc. | routes messages and defines settlement rules |

Important interview nuance: in modern systems, one provider may abstract multiple roles. Stripe or Adyen may expose a unified API, but the underlying ecosystem still includes processors, acquirers, networks, and issuers.

2.5 High-Level Payment Flow

sequenceDiagram
	participant C as Customer
	participant UI as Merchant App
	participant M as Merchant Backend
	participant P as Payment Service
	participant G as Gateway/Processor
	participant A as Acquirer
	participant N as Card Network
	participant I as Issuer
	participant W as Webhooks

	C->>UI: Checkout and submit payment method
	UI->>M: Create order
	M->>P: Create payment intent / payment session
	P->>G: Authorize payment
	G->>A: Forward request
	A->>N: Route auth
	N->>I: Approve or decline
	I-->>N: Decision
	N-->>A: Decision
	A-->>G: Decision
	G-->>P: Initial auth result
	P-->>M: Payment pending / authorized / requires action
	M-->>UI: Show status to customer
	G-->>W: Async settlement/refund/dispute events
	W-->>P: Webhook delivery
	P->>M: Update order / trigger fulfillment / update ledger

2.6 Payment Lifecycle Overview

A payment usually goes through several phases rather than a single binary success/failure outcome.

  1. Payment is created.
  2. Payment method is attached or selected.
  3. Additional customer action may be required.
  4. Authorization may be attempted.
  5. The merchant may capture immediately or later.
  6. Funds clear and settle later.
  7. Refunds or disputes may happen days or weeks later.

That is why a payment system is almost always modeled as a state machine rather than a single insert into a charges table.

2.7 Online vs Offline Payment Flows

Online payments

Online payments involve real-time network communication at transaction time.

Examples:

  • card-not-present ecommerce checkout
  • UPI or instant bank payment
  • wallet checkout using an external provider

Characteristics:

  • synchronous API call initiates the payment
  • an immediate or near-immediate response is returned
  • asynchronous confirmation may still be needed later

Offline or deferred flows

In offline or deferred flows, either the full network is not consulted in real time or the final financial confirmation arrives later.

Examples:

  • transit systems with offline authorization logic
  • POS terminals in poor connectivity environments
  • invoice payments by bank transfer
  • cash on delivery or manual collections

Characteristics:

  • customer action and financial confirmation are decoupled
  • risk is shifted to later operational validation
  • reconciliation becomes more important

2.8 Synchronous vs Asynchronous Confirmation

This distinction is one of the most important concepts to explain in an interview.

| Type | Meaning | Example |
| --- | --- | --- |
| Synchronous confirmation | request returns an immediate initial result | card authorization approved or declined |
| Asynchronous confirmation | final truth arrives later via webhook, polling, file, or settlement report | ACH success later, refund completed later, dispute opened later |

Real-world payment systems are often both. A card payment may synchronously return authorized, but the final fulfillment decision may still wait for provider webhooks, fraud checks, or successful capture.

2.9 Failure Handling and Retries

Payment failures are not all the same. Good systems classify them.

| Failure type | Example | Correct response |
| --- | --- | --- |
| User/business decline | insufficient funds, expired card, stolen card suspicion | do not blindly retry; ask for new method or user action |
| Technical transient error | provider timeout, connection reset, temporary 5xx | retry safely with idempotency |
| Ambiguous outcome | request timed out after provider received it | query status, use idempotency, reconcile later |
| Async failure | auth succeeded, capture failed later | state machine transition plus operational handling |

Best practices:

  • separate retriable and non-retriable failures
  • never retry without an idempotency strategy
  • preserve provider reference IDs for later lookup
  • expose pending states to the business system instead of guessing

2.10 How Payment Systems Differ from Normal CRUD Systems

| Normal CRUD system | Payment system |
| --- | --- |
| request-response often feels authoritative | response is often provisional |
| updates overwrite prior state | financial history is usually append-only or versioned |
| duplicate requests may be harmless | duplicate requests can create duplicate charges |
| a row can often be edited freely | financial records may require reversal instead of mutation |
| internal DB is primary source of truth | external providers and banks also matter |
| logs are mostly for debugging | logs and audit trails are operationally and legally important |

3. Payment Processing

Payment processing is the detailed mechanics of what happens after the customer initiates a payment.

3.1 Core Stages

| Stage | What it means | Why it exists |
| --- | --- | --- |
| Authorization | reserve funds or confirm the payment method can be charged | lets merchant validate ability to pay before finalizing |
| Capture | convert an authorization into an actual charge | useful when final amount or fulfillment timing is delayed |
| Clearing | exchange transaction details between parties | needed for network and bank processing |
| Settlement | actual money movement between institutions | this is when funds are truly transferred |
| Refund | return some or all funds after charge | needed for cancellations, service failures, policy decisions |
| Chargeback | issuer/cardholder dispute reverses or challenges the charge | consumer protection and fraud handling |

3.2 Authorization vs Capture

This is a classic interview comparison.

| Topic | Authorization | Capture |
| --- | --- | --- |
| Purpose | checks or reserves available funds | actually charges the customer |
| Timing | often happens at checkout | may happen immediately or later |
| Business use case | reserve payment before shipment or service completion | take money when fulfillment is confirmed |
| Example | Amazon authorizes before shipment | Amazon captures when items ship |
| Failure impact | order can remain unfulfilled | revenue collection fails after customer thought checkout succeeded |

Delayed capture is common in real systems:

  • Amazon captures when an item ships, not when the order is first placed.
  • Uber may pre-authorize an estimated ride amount and capture the final amount after trip completion.
  • Hotels often authorize a card at check-in and settle later.

3.3 Payment State Machines

A robust payment system uses explicit state transitions instead of ad hoc boolean flags.

stateDiagram-v2
	[*] --> RequiresPaymentMethod
	RequiresPaymentMethod --> RequiresConfirmation
	RequiresConfirmation --> RequiresAction
	RequiresConfirmation --> Processing
	RequiresAction --> Processing
	Processing --> Authorized
	Processing --> Failed
	Authorized --> Captured
	Authorized --> Expired
	Captured --> Settling
	Settling --> Succeeded
	Succeeded --> PartiallyRefunded
	PartiallyRefunded --> Refunded
	Succeeded --> Refunded
	Succeeded --> ChargebackOpen
	ChargebackOpen --> ChargebackWon
	ChargebackOpen --> ChargebackLost
	Failed --> [*]
	Refunded --> [*]
	ChargebackWon --> [*]
	ChargebackLost --> [*]

The specific states vary across providers, but the idea is consistent:

  • payments are long-lived workflows
  • transitions happen over time
  • asynchronous events may drive later transitions
  • invalid transitions should be rejected explicitly
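
As a rough illustration of rejecting invalid transitions explicitly, the diagram above can be encoded as a lookup table. The state names mirror the diagram; `ALLOWED_TRANSITIONS`, `InvalidTransition`, and `transition` are hypothetical names, and a real system would also persist each transition with its cause.

```python
# Transition table copied from the state diagram; terminal states map to an empty set.
ALLOWED_TRANSITIONS: dict[str, set[str]] = {
    "RequiresPaymentMethod": {"RequiresConfirmation"},
    "RequiresConfirmation": {"RequiresAction", "Processing"},
    "RequiresAction": {"Processing"},
    "Processing": {"Authorized", "Failed"},
    "Authorized": {"Captured", "Expired"},
    "Captured": {"Settling"},
    "Settling": {"Succeeded"},
    "Succeeded": {"PartiallyRefunded", "Refunded", "ChargebackOpen"},
    "PartiallyRefunded": {"Refunded"},
    "ChargebackOpen": {"ChargebackWon", "ChargebackLost"},
    "Failed": set(),
    "Expired": set(),
    "Refunded": set(),
    "ChargebackWon": set(),
    "ChargebackLost": set(),
}


class InvalidTransition(Exception):
    """Raised when an event tries to move a payment along an edge that does not exist."""


def transition(current: str, target: str) -> str:
    """Return the new state, or raise if the edge is not in the table."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise InvalidTransition(f"{current} -> {target} is not allowed")
    return target
```

A late or replayed webhook that tries to move a `Failed` payment to `Captured` then fails loudly instead of silently corrupting state.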

3.4 Stripe-Like Payment Intent Model

Stripe popularized the idea that a payment should be represented as a durable intent object rather than a one-shot charge request.

The idea is powerful because it matches reality.

A payment intent usually stores:

  • merchant or account context
  • customer context
  • amount and currency
  • payment method information or token reference
  • current state
  • whether capture is automatic or manual
  • metadata for order correlation
  • provider reference IDs

Why this model exists:

  • the same payment may require multiple attempts
  • customer authentication may interrupt the flow
  • the client, server, and provider all need a shared durable object to coordinate around
  • webhooks can safely attach later state changes to that object

In practice, a payment intent model reduces the temptation to create multiple charges for the same checkout retry.

3.5 Partial Capture

Partial capture means the merchant authorized one amount but captures only part of it.

Examples:

  • an order with multiple items ships in parts
  • a ride estimate was higher than the final fare
  • a restaurant pre-authorized a larger amount but settled the final bill

Design implications:

  • the payment object must track authorized amount, captured amount, and remaining capturable amount
  • ledger posting must reflect only captured funds, not full authorization
  • invoice and fulfillment systems must know whether a partial capture is expected behavior or a problem
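
The first design implication can be sketched as a small tracker. The `Authorization` class and its field names are illustrative assumptions, and whether multiple captures against one authorization are allowed at all depends on the provider.

```python
from dataclasses import dataclass


@dataclass
class Authorization:
    """Tracks how much of an authorized amount has been captured (minor units)."""

    authorized_amount: int
    captured_amount: int = 0

    @property
    def remaining_capturable(self) -> int:
        return self.authorized_amount - self.captured_amount

    def capture(self, amount: int) -> int:
        """Capture part of the authorization and return what is still capturable."""
        if amount <= 0:
            raise ValueError("capture amount must be positive")
        if amount > self.remaining_capturable:
            raise ValueError("capture exceeds remaining authorized amount")
        self.captured_amount += amount
        return self.remaining_capturable


# A $100.00 authorization where the first shipment is worth $60.00.
auth = Authorization(authorized_amount=10_000)
left_after_first_shipment = auth.capture(6_000)
```

Only `captured_amount` should drive ledger posting; the uncaptured remainder is a hold, not revenue.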

3.6 Partial Refunds

Partial refunds are extremely common.

Examples:

  • one item in a multi-item order is returned
  • a service credit covers part of the invoice
  • support grants a goodwill refund for part of the charge

Design implications:

  • never model refund as a boolean flag on payment
  • track cumulative refunded amount and remaining refundable amount
  • prevent total refunds from exceeding captured amount
  • ledger and accounting should reflect each refund as its own event
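
A minimal sketch of these implications, with refunds stored as their own immutable events rather than a flag. `Refund`, `CapturedPayment`, and the field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Refund:
    """One refund event; frozen so it cannot be edited after creation."""

    refund_id: str
    amount: int  # minor units
    reason: str


@dataclass
class CapturedPayment:
    captured_amount: int  # minor units
    refunds: list[Refund] = field(default_factory=list)

    @property
    def refunded_amount(self) -> int:
        # The cumulative total is derived from events, never stored as a mutable flag.
        return sum(r.amount for r in self.refunds)

    @property
    def remaining_refundable(self) -> int:
        return self.captured_amount - self.refunded_amount

    def refund(self, refund_id: str, amount: int, reason: str) -> Refund:
        if amount <= 0:
            raise ValueError("refund amount must be positive")
        if amount > self.remaining_refundable:
            raise ValueError("refund would exceed captured amount")
        event = Refund(refund_id, amount, reason)
        self.refunds.append(event)
        return event


payment = CapturedPayment(captured_amount=5_000)
payment.refund("re_1", 1_500, "returned item")
payment.refund("re_2", 500, "goodwill")
```

Each `Refund` can then be posted to the ledger and exported to accounting as a separate entry.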

3.7 Delayed Capture

Delayed capture is not an edge case. It is a standard production requirement.

Why it exists:

  • final amount may be unknown at checkout
  • merchant may want to confirm inventory or fraud review first
  • fulfillment may happen hours or days later

Tradeoffs:

  • improves business control
  • increases complexity in payment lifecycle tracking
  • introduces authorization expiry risk if capture is too late

3.8 Clearing and Settlement

Developers often stop thinking at authorization, but finance systems cannot.

Clearing

Clearing is the exchange of transaction details among payment institutions. It confirms what transaction should be processed and under what rules.

Settlement

Settlement is the actual movement of funds between institutions.

Why this matters:

  • a payment can be authorized but not yet settled
  • a successful API response does not mean cash is in the merchant bank account yet
  • marketplace payout logic often depends on settlement confidence, not just auth success

3.9 Refunds and Chargebacks

Refunds are merchant-initiated. Chargebacks are disputes that the cardholder usually raises through the issuing bank.

| Topic | Refund | Chargeback |
| --- | --- | --- |
| Who initiates | merchant or support workflow | cardholder through issuer |
| Typical reason | return, cancellation, service issue | fraud, dissatisfaction, unrecognized charge |
| Timing | usually after capture | often days or weeks later |
| Control | merchant usually controls initiation | external process, evidence-based |
| Operational impact | customer experience and accounting adjustment | financial loss risk, dispute operations, fees |

Chargebacks are a major reason payment systems need strong evidence storage, audit logs, and operational tooling.

3.10 Retry Behavior in Processing Systems

Good retry behavior requires classification.

Retry candidates:

  • connection timeout before receiving provider result
  • transient provider 5xx
  • webhook delivery failure
  • temporary downstream database outage

Do not blindly retry:

  • issuer declines
  • fraud rules blocked the payment
  • expired payment method
  • authorization window already expired

Practical pattern:

  1. attempt payment with provider idempotency key
  2. on timeout, mark local state as uncertain or processing
  3. query provider by external reference if possible
  4. wait for webhook or scheduled reconciliation if ambiguity remains
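
The four steps above can be sketched against an in-memory stand-in for a provider. `FakeProvider`, its `mode` values, and `attempt_payment` are all invented for illustration; real provider APIs differ in how they expose idempotency and status lookup.

```python
class ProviderTimeout(Exception):
    """No response arrived; the charge may or may not exist on the provider side."""


class FakeProvider:
    """Hypothetical provider that can lose a response before or after charging."""

    def __init__(self, mode: str = "ok"):
        self.mode = mode  # "ok", "timeout_before_charge", or "timeout_after_charge"
        self.charges: dict[str, dict] = {}

    def charge(self, idempotency_key: str, amount_minor: int) -> dict:
        if self.mode == "timeout_before_charge":
            raise ProviderTimeout()
        # The same key never creates a second charge.
        result = self.charges.setdefault(
            idempotency_key, {"status": "succeeded", "amount_minor": amount_minor}
        )
        if self.mode == "timeout_after_charge":
            raise ProviderTimeout()
        return result

    def lookup(self, idempotency_key: str):
        return self.charges.get(idempotency_key)


def attempt_payment(provider: FakeProvider, payment: dict) -> dict:
    key = payment["idempotency_key"]
    try:
        result = provider.charge(key, payment["amount_minor"])  # step 1
    except ProviderTimeout:
        payment["status"] = "processing"  # step 2: uncertain, not failed
        result = provider.lookup(key)  # step 3: ask the provider what happened
        if result is None:
            return payment  # step 4: leave it for a webhook or reconciliation
    payment["status"] = result["status"]
    return payment
```

The important property is that a timeout never maps directly to `failed`: the local record stays in an explicit uncertain state until an authoritative answer arrives.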

3.11 Webhooks and Event-Driven Payment Updates

Real payment platforms are event-driven because external truth arrives asynchronously.

Typical events:

  • payment succeeded
  • payment failed
  • payment requires action
  • charge captured
  • refund succeeded
  • dispute opened
  • payout paid

Best practices:

  • verify webhook signatures
  • store raw webhook payloads for replay and audit
  • deduplicate by provider event ID and business object ID
  • process events via an internal queue rather than only inline in the HTTP handler
  • make handlers idempotent and order-aware

Common mistake: assuming events arrive exactly once and in order. Providers often deliver at least once and may redeliver older events.
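
A sketch of the first few practices, using a generic HMAC-SHA256 signature over the raw body. This signing scheme is an assumption for illustration: each provider documents its own scheme (often including a timestamp), and `WebhookReceiver` with its in-memory set and list stands in for a durable table and a real queue.

```python
import hashlib
import hmac
import json


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body (assumed scheme, not a provider's)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, signature: str) -> bool:
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(sign(secret, body), signature)


class WebhookReceiver:
    def __init__(self, secret: bytes):
        self.secret = secret
        self.seen_event_ids: set[str] = set()  # production: unique constraint in a DB
        self.raw_payloads: list[bytes] = []  # kept for replay and audit
        self.queue: list[dict] = []  # stand-in for an internal queue

    def receive(self, body: bytes, signature: str) -> str:
        if not verify_signature(self.secret, body, signature):
            return "rejected"
        self.raw_payloads.append(body)
        event = json.loads(body)
        if event["id"] in self.seen_event_ids:
            return "duplicate"  # redelivery: acknowledge, but do not enqueue again
        self.seen_event_ids.add(event["id"])
        self.queue.append(event)
        return "accepted"
```

Deduplication here only stops the same event from being enqueued twice; ordering still has to be handled by the consumer, for example by checking the payment's current state before applying a transition.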

3.12 Duplicate Charge Prevention

Duplicate charge prevention is not one feature. It is multiple layers.

| Layer | Example |
| --- | --- |
| API idempotency | same checkout request with same key returns same result |
| Business key uniqueness | one successful payment per order or invoice |
| Payment intent reuse | retries reuse the same payment object |
| Provider-side idempotency | send a provider idempotency key or merchant reference |
| Operational controls | support tools show prior attempts before allowing manual re-charge |
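
The business key uniqueness layer is often enforced in the database itself. Below is a minimal sketch using SQLite's partial unique index; the table, index, and function names are illustrative assumptions, and other databases offer equivalent constructs with different syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE payments (
           id INTEGER PRIMARY KEY,
           order_id TEXT NOT NULL,
           status TEXT NOT NULL,
           amount_minor INTEGER NOT NULL
       )"""
)
# Many failed attempts per order are fine; at most one may be 'succeeded'.
conn.execute(
    """CREATE UNIQUE INDEX one_success_per_order
       ON payments(order_id) WHERE status = 'succeeded'"""
)


def record_payment(order_id: str, status: str, amount_minor: int) -> bool:
    """Insert a payment row; return False if it would be a second success."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO payments (order_id, status, amount_minor) VALUES (?, ?, ?)",
                (order_id, status, amount_minor),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

The constraint acts as a last line of defense: even if API idempotency and intent reuse both fail, the second successful charge for the same order cannot be recorded silently.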

3.13 Exactly-Once vs At-Least-Once Reality

Exactly-once is mostly a marketing phrase in distributed money systems.

What you can realistically achieve is:

  • at-least-once delivery of events
  • idempotent processing of duplicates
  • unique business constraints that prevent duplicate final effects
  • reconciliation that catches anything missed

A strong interview answer explicitly says this instead of claiming "I will ensure exactly-once delivery".

4. Subscriptions

Subscriptions turn one-time payment processing into a long-lived revenue engine.

4.1 What Recurring Billing Really Means

A subscription is a contract-like object that says:

  • who is being billed
  • for what plan or service
  • on what schedule
  • under what pricing rules
  • with what payment collection behavior

Recurring billing is not just a cron job that charges a card every month. It is a coordinated system involving:

  • plan catalog and pricing rules
  • entitlement logic
  • invoice generation
  • taxes and discounts
  • payment collection
  • retries and dunning
  • cancellation and plan change rules

4.2 High-Level Subscription Architecture

flowchart LR
	CAT[Plan Catalog\nprices features coupons] --> SUB[Subscription Service]
	CUST[Customer / Account] --> SUB
	SUB --> SCHED[Renewal Scheduler]
	SCHED --> BILL[Billing Engine]
	USAGE[Usage Metering] --> BILL
	BILL --> INV[Invoice Engine]
	INV --> PAY[Payment Service]
	PAY --> PSP[Payment Provider]
	PAY --> LEDGER[Ledger]
	INV --> LEDGER
	SUB --> ENT[Entitlement Service]
	PAY --> ENT
	SUB --> NOTIFY[Email / In-App Notifications]
	PAY --> NOTIFY

4.3 Monthly vs Annual Subscriptions

| Model | Business benefit | Technical impact |
| --- | --- | --- |
| Monthly | lower entry barrier, more flexible | more frequent renewals, more retry churn |
| Annual | better cash flow, lower churn, simpler gross retention story | larger invoice amounts, proration complexity, annual tax handling |

A subscription system must treat cadence as a first-class configuration, not a hard-coded monthly assumption.

4.4 Trial Periods

Trials exist because product growth often needs a usage-limited or time-limited evaluation phase.

Design questions:

  • does the trial require a payment method up front?
  • what happens when the trial ends without a payment method?
  • when do entitlements activate and deactivate?
  • can the same customer create repeated free trials?

Real-world examples:

  • Netflix historically used trials as a conversion funnel in some markets
  • SaaS tools often require a card to reduce abuse and improve conversion quality

4.5 Upgrades, Downgrades, and Proration

Proration exists because customers change plans mid-cycle.

Example:

  • plan A costs $100 per month
  • plan B costs $200 per month
  • customer upgrades halfway through the billing cycle
  • the incremental charge is usually about $50 before tax and discounts

Why proration is hard:

  • seat counts may change multiple times in the same cycle
  • taxes may vary by jurisdiction
  • credits may need to be carried to the next invoice instead of immediately refunded
  • invoice presentation must stay understandable for the customer

Best practice: store billable events and proration line items explicitly rather than trying to recompute historical changes from current plan state.
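
The $100 to $200 upgrade example works out as follows when done in integer minor units. This is a sketch under simple assumptions: day-based proration, round-half-up, and a hypothetical function name. Real billing systems differ on granularity (days versus seconds) and rounding policy.

```python
def prorated_upgrade_charge(
    old_price_minor: int, new_price_minor: int, remaining_days: int, total_days: int
) -> int:
    """Charge for the price difference over the unused part of the cycle.

    Uses integer arithmetic only and rounds half up, so no floats are involved.
    """
    if total_days <= 0 or not 0 <= remaining_days <= total_days:
        raise ValueError("invalid billing period")
    delta = new_price_minor - old_price_minor
    numerator = delta * remaining_days
    # floor(numerator / total_days + 1/2), computed without leaving integers
    return (2 * numerator + total_days) // (2 * total_days)


# Plan A is $100.00, plan B is $200.00, upgrade with 15 of 30 days remaining.
halfway_charge = prorated_upgrade_charge(10_000, 20_000, 15, 30)
```

`halfway_charge` is 5000, the $50 from the example. In line with the best practice above, the result would be stored as an explicit proration line item rather than recomputed later from the current plan.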

4.6 Grace Periods

A grace period allows temporary continued access after a failed renewal or overdue invoice.

Why it exists:

  • transient card failures are common
  • immediate shutdown creates bad customer experience
  • enterprise contracts may allow time for manual payment

But grace periods need clear rules:

  • what features remain enabled?
  • how long does grace last?
  • which customer segments qualify?
  • when do collections escalate?

4.7 Failed Renewal Handling, Retries, and Dunning

Dunning is the process for recovering failed recurring payments.

Typical strategy:

  1. attempt renewal
  2. if failed for retriable reason, retry on scheduled intervals
  3. notify the customer by email or in-product banners
  4. optionally use backup payment methods
  5. apply grace period
  6. cancel or suspend if payment is not recovered

Important nuance: retry behavior should depend on failure reason. Retrying an insufficient-funds decline after the customer's payday might work. Retrying a stolen-card block probably will not.
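
One way to make retry behavior depend on the failure reason is a small schedule table. The reason codes and day offsets below are invented for illustration; real decline codes and effective retry intervals are provider-specific and usually tuned from data.

```python
from datetime import date, timedelta

# Hypothetical reason codes mapped to retry offsets in days after the failure.
RETRY_OFFSETS_BY_REASON = {
    "insufficient_funds": [3, 7, 14],  # give the balance time to recover
    "provider_timeout": [1, 2, 4],  # transient, so retry sooner
}
# Reasons where retrying the same payment method is pointless.
NON_RETRIABLE_REASONS = {"stolen_card", "expired_card"}
DEFAULT_OFFSETS = [3, 7]


def plan_dunning(failed_on: date, reason: str) -> list[date]:
    """Return the dates on which the renewal should be retried."""
    if reason in NON_RETRIABLE_REASONS:
        return []  # go straight to notifying the customer for a new method
    offsets = RETRY_OFFSETS_BY_REASON.get(reason, DEFAULT_OFFSETS)
    return [failed_on + timedelta(days=offset) for offset in offsets]
```

An empty schedule does not end dunning; it means the remaining steps (customer notification, backup method, grace period) carry the recovery instead of automated retries.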

4.8 Cancellations

Cancellation behavior should be explicit.

Common models:

  • immediate cancellation with immediate entitlement removal
  • cancel at period end
  • cancel and refund unused period under specific policy
  • pause rather than cancel

Good systems store both:

  • operational subscription status
  • effective cancellation date

Those are not always the same thing.

4.9 Scheduled Plan Changes

Many SaaS systems allow a downgrade or upgrade to take effect later.

Examples:

  • upgrade immediately, because customer wants more features now
  • downgrade at next renewal, because customer already paid for current cycle
  • price increase takes effect at next term for existing customers

That means you often need both:

  • current plan version
  • pending future plan version

4.10 Subscription Lifecycle Design

stateDiagram-v2
	[*] --> Trialing
	Trialing --> Active
	Trialing --> Incomplete
	Incomplete --> Active
	Active --> PastDue
	PastDue --> GracePeriod
	GracePeriod --> Active
	GracePeriod --> Paused
	GracePeriod --> Canceled
	Active --> PendingCancellation
	PendingCancellation --> Canceled
	Active --> Paused
	Paused --> Active
	Canceled --> [*]

Do not collapse all of these into one status without clear semantics. In real systems, invoice state, payment state, and entitlement state may temporarily differ.

Example:

  • invoice is unpaid
  • payment is retrying
  • subscription is still active because the customer is inside grace period

4.11 SaaS Subscription Architecture Patterns

Strong SaaS systems usually separate:

  • subscription contract management
  • pricing and catalog management
  • invoice generation
  • payment collection
  • entitlement enforcement

That separation matters because GitHub-style seat billing, Shopify merchant billing, and enterprise SaaS contracts all need different rules, but they still plug into the same general architecture.

4.12 Common Failures and Scaling Considerations

What breaks at scale:

  • renewing millions of subscriptions at exactly midnight UTC
  • large proration computations for accounts with many seat changes
  • delayed usage events missing the current invoice cut-off
  • broken entitlements because payment and subscription state were coupled too tightly

Best practices:

  • spread renewal schedules across time instead of one giant batch
  • snapshot pricing terms at subscription creation time
  • version plans instead of mutating them in place
  • derive entitlements separately, from subscription policy plus payment policy
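
For the first best practice, one simple option is a deterministic per-subscription offset inside the renewal day. This is only a sketch: `renewal_offset_seconds` is a hypothetical helper, and many systems get a natural spread for free by anchoring renewals to the original signup timestamp.

```python
import hashlib


def renewal_offset_seconds(subscription_id: str, window_seconds: int = 86_400) -> int:
    """Map a subscription ID to a stable offset within the renewal window.

    Hashing keeps the offset deterministic across runs and processes,
    unlike random jitter, so a subscription always renews at the same time of day.
    """
    digest = hashlib.sha256(subscription_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % window_seconds


# Spread 1,000 hypothetical subscriptions across a 24-hour window.
offsets = [renewal_offset_seconds(f"sub_{i}") for i in range(1_000)]
```

The scheduler then enqueues each renewal at the start of the billing day plus its offset, which turns one midnight spike into a roughly even load.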

5. Invoices

5.1 What an Invoice Is

An invoice is a formal statement that documents what the customer owes for specific goods or services.

In many businesses, the invoice is not just an internal billing artifact. It is a customer-facing financial document with tax, legal, and accounting implications.

5.2 Why Invoices Exist

Invoices exist because businesses need a structured record of:

  • what was sold
  • when it was sold
  • who owes the money
  • what taxes applied
  • when payment is due

For subscription businesses, invoices often bridge billing, finance, and customer support. For enterprise sales, they may be the main financial document the customer actually processes.

5.3 Invoice Generation

Invoice generation typically pulls from multiple sources:

  • subscription charges
  • usage aggregates
  • discounts and credits
  • taxes
  • currency settings
  • customer billing profile

Typical stages:

  1. collect billable line items
  2. compute rating and discounts
  3. compute taxes
  4. create draft invoice
  5. review or finalize
  6. collect payment or send for manual payment

5.4 Invoice Numbering

Invoice numbering sounds trivial, but it is often compliance-sensitive.

Requirements may include:

  • uniqueness
  • no accidental reuse
  • understandable sequence for finance operations
  • jurisdiction-specific numbering expectations

Practical approaches:

  • globally unique invoice numbers
  • per legal entity sequences
  • per country or tax registration sequences

Common mistake: using a random UUID as the only customer-visible invoice identifier. UUIDs are fine as internal IDs, but finance and customers often still need a human-usable invoice number.
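
A per-legal-entity sequence can be sketched with a counter row that is incremented inside a transaction. The table name, the `ACME-US` prefix, and the six-digit format are illustrative assumptions; actual numbering rules depend on the jurisdiction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE invoice_sequences (
           legal_entity TEXT PRIMARY KEY,
           last_value INTEGER NOT NULL
       )"""
)


def next_invoice_number(legal_entity: str) -> str:
    """Allocate the next human-readable invoice number for one legal entity."""
    with conn:  # increment and read happen in a single transaction
        conn.execute(
            "INSERT OR IGNORE INTO invoice_sequences (legal_entity, last_value) VALUES (?, 0)",
            (legal_entity,),
        )
        conn.execute(
            "UPDATE invoice_sequences SET last_value = last_value + 1 WHERE legal_entity = ?",
            (legal_entity,),
        )
        (value,) = conn.execute(
            "SELECT last_value FROM invoice_sequences WHERE legal_entity = ?",
            (legal_entity,),
        ).fetchone()
    return f"{legal_entity}-{value:06d}"
```

The invoice row would still carry an internal UUID as its primary key; the sequence only produces the customer-visible number, typically assigned at finalization so that discarded drafts do not consume numbers.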

5.5 Tax Basics

A production billing system usually needs at least a working tax model.

Common concerns:

  • sales tax vs VAT vs GST
  • business location and customer location
  • tax-exempt customers
  • net price vs gross price presentation
  • tax per line item vs tax on total

Even if a company uses Stripe Tax, Avalara, or another external tax provider, your system still needs to store inputs, outputs, versioned tax decisions, and invoice presentation.
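
The "tax per line item vs tax on total" concern is easiest to see with numbers. The sketch below uses an illustrative 7.25% rate expressed in basis points and round-half-up; both are assumptions, and the correct method is dictated by the jurisdiction or the tax provider.

```python
def round_half_up_div(numerator: int, denominator: int) -> int:
    """Integer division that rounds half up (for non-negative inputs)."""
    return (2 * numerator + denominator) // (2 * denominator)


def tax_minor(amount_minor: int, rate_bps: int) -> int:
    """Tax on an amount in minor units, with the rate given in basis points."""
    return round_half_up_div(amount_minor * rate_bps, 10_000)


line_items = [99, 99, 99]  # three items at $0.99
RATE_BPS = 725  # 7.25%, illustrative only

# Method 1: round tax on each line, then sum.
tax_per_line = sum(tax_minor(amount, RATE_BPS) for amount in line_items)
# Method 2: sum the lines, then round tax once on the total.
tax_on_total = tax_minor(sum(line_items), RATE_BPS)
```

Per-line rounding gives 21 cents while rounding on the total gives 22. A one-cent difference like this is why the chosen method and its inputs need to be stored with the invoice rather than recomputed later.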

5.6 Invoice Finalization

Finalization is the point at which a draft invoice becomes an authoritative billing document.

Why it matters:

  • line items usually become locked
  • taxes and totals become authoritative
  • payment collection or customer delivery may begin
  • downstream accounting and reporting can depend on it

Many systems allow drafts to be edited but treat finalized invoices as immutable.

5.7 Why Invoices Often Cannot Be Edited Like Normal Records

This is a very common interview discussion point.

Once an invoice has been issued, multiple downstream systems may already rely on it:

  • customer procurement or AP systems
  • tax reporting
  • finance close processes
  • revenue reporting
  • external accounting exports

If you simply edit the original invoice row, you can destroy auditability and legal traceability.

Typical production approaches instead are:

  • void the invoice if rules allow
  • issue a credit note
  • create a replacement invoice
  • maintain versioned presentation while preserving immutable original financial facts

5.8 Due Dates and Payment Status Tracking

Common invoice statuses:

  • draft
  • finalized or open
  • paid
  • partially paid
  • overdue
  • uncollectible
  • void

These statuses are not just cosmetic. They drive collections, revenue operations, support workflows, and customer entitlements in some businesses.

5.9 Invoice vs Receipt

| Topic | Invoice | Receipt |
| --- | --- | --- |
| Purpose | request or document amount owed | proof that payment was received |
| Timing | typically before or at billing time | after payment succeeds |
| Business meaning | customer owes or was billed | customer already paid |
| Common use | subscription billing, enterprise accounts receivable | ecommerce confirmation, payment confirmation |

This distinction matters because many junior designs treat them as interchangeable.

5.10 Invoice Lifecycle

flowchart LR
	D[Draft Invoice] --> F[Finalized Invoice]
	F --> O[Open / Awaiting Payment]
	O --> P[Paid]
	O --> OD[Overdue]
	OD --> U[Uncollectible]
	F --> V[Void]
	P --> R[Credit Note / Refund Adjustment]

5.11 Invoice Versioning Considerations

Versioning is often needed for presentation changes without rewriting financial history.

Examples:

  • corrected customer address
  • updated PDF template
  • additional display metadata for support

Good design distinguishes between:

  • immutable financial facts
  • mutable presentation metadata

Legal and compliance rules for invoices vary heavily by country and business model, but common themes include:

  • invoice numbering discipline
  • retention periods
  • tax fields and calculations
  • non-editability of issued documents
  • legal entity identifiers

The engineering lesson is simple: invoice design is not just a UI export problem.

6. Idempotency

Idempotency is one of the most important topics in financial backend design.

6.1 Why Idempotency Is Critical

Retries are unavoidable.

They happen because:

  • the client timed out
  • the mobile network dropped
  • a proxy retried the request
  • a worker retried the job
  • a human clicked the checkout button twice

Without idempotency, retries turn into duplicate financial side effects.

6.2 Duplicate Payment Risks

Duplicate payment failures are expensive because they create:

  • customer trust loss
  • support costs
  • refund workload
  • dispute risk
  • accounting noise

That is why idempotency is a first-class design concern, not a nice API feature.

6.3 Idempotency Keys

An idempotency key is a client- or server-generated key representing one logical operation.

Example use cases:

  • create payment for order 123
  • refund payment 456 for $20
  • create invoice for subscription cycle 2026-04

Good idempotency key design usually includes scope:

  • merchant or account ID
  • operation type
  • business object reference
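
One way to build a scoped key is to hash the scope parts together. This is a sketch under assumptions: the function name, the field order, and the choice of SHA-256 are illustrative, not a standard.

```python
import hashlib


def scoped_idempotency_key(account_id: str, operation: str, object_ref: str) -> str:
    """Derive a deterministic key from account, operation type, and object.

    The ASCII unit separator keeps ("a", "b:c") distinct from ("a:b", "c"),
    assuming the inputs themselves never contain that character.
    """
    raw = "\x1f".join([account_id, operation, object_ref])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

The same logical operation always maps to the same key, while a refund and a payment for the same order map to different keys.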

6.4 API Idempotency Design

Typical pattern:

  1. client sends request with idempotency key
  2. server checks durable idempotency store
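  3. if key exists with a completed identical request, return the stored result; if that request is still in progress, tell the client to retry later rather than executing it again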
  4. if key exists with conflicting payload, reject
  5. if key is new, reserve it and process the request
  6. store final response or final effect reference

Important nuance: storing only "request seen" is often insufficient. You usually also need the final outcome or object ID.
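
The steps above can be sketched with a relational table and a unique key. This is a minimal illustration using an in-memory SQLite database; the table name, column names, and the `handle` function are assumptions, and a production version would also need a recovery path for keys left in progress after a crash.

```python
import hashlib
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE idempotency_keys (
           idem_key TEXT PRIMARY KEY,
           request_hash TEXT NOT NULL,
           status TEXT NOT NULL,
           response TEXT
       )"""
)


class IdempotencyConflict(Exception):
    """Key reused with a different payload, or original still running."""


def payload_hash(payload: dict) -> str:
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


def handle(idem_key: str, payload: dict, operation) -> dict:
    h = payload_hash(payload)
    try:
        # Reserve the key; the primary key makes the reservation atomic.
        with conn:
            conn.execute(
                "INSERT INTO idempotency_keys (idem_key, request_hash, status) "
                "VALUES (?, ?, 'in_progress')",
                (idem_key, h),
            )
    except sqlite3.IntegrityError:
        stored_hash, status, response = conn.execute(
            "SELECT request_hash, status, response FROM idempotency_keys "
            "WHERE idem_key = ?",
            (idem_key,),
        ).fetchone()
        if stored_hash != h:
            raise IdempotencyConflict("same key, different payload")
        if status != "completed":
            raise IdempotencyConflict("original request still in progress")
        return json.loads(response)  # replay the stored final result
    result = operation(payload)
    with conn:
        conn.execute(
            "UPDATE idempotency_keys SET status = 'completed', response = ? "
            "WHERE idem_key = ?",
            (json.dumps(result), idem_key),
        )
    return result
```

Storing the response alongside the key is what lets a retry receive the original outcome instead of a bare "already seen".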

6.5 Idempotency Processing Flow

flowchart TD
	REQ[Incoming Request with Idempotency Key] --> LOOKUP{Key Exists?}
	LOOKUP -->|No| RESERVE[Reserve Key as In-Progress]
	RESERVE --> EXEC[Execute Business Operation]
	EXEC --> STORE[Store Final Result / Object Reference]
	STORE --> RESP[Return Response]
	LOOKUP -->|Yes same payload| RETURN[Return Existing Result]
	LOOKUP -->|Yes different payload| CONFLICT[Reject as Idempotency Conflict]

6.6 Storage Strategies

| Strategy | Pros | Cons |
| --- | --- | --- |
| Relational table with unique constraint | durable, transactional, simple to reason about | can become hot under heavy traffic |
| Redis plus durable backing store | low latency | needs careful durability and failover semantics |
| Business-object uniqueness only | simpler for narrow cases | not enough for generic API retries |
| Workflow engine state store | good for long-running operations | heavier architecture |

In financial systems, a relational durable store is often the safest default for write paths that truly move money.

6.7 Expiration Strategies

Idempotency keys should not necessarily live forever, but expiring them too quickly is dangerous.

Considerations:

  • retry window of clients and background jobs
  • provider response delays
  • support workflows that may replay operations
  • fraud or abuse risks from unbounded storage

A practical approach is:

  • keep keys for a bounded period such as 24 hours or several days for payment creation APIs
  • rely on business-level uniqueness constraints for longer-term protection

6.8 Webhook Idempotency

Webhook handlers also need idempotency.

Use multiple layers:

  • deduplicate provider event IDs
  • also make object-state transitions idempotent
  • reject invalid backward transitions unless explicitly allowed

Why both matter:

  • the same event may be delivered multiple times
  • different events may refer to the same payment object
  • events may arrive out of order
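
Both layers can be sketched together. In this illustration the in-memory set and dict stand in for durable tables, and the linear `STATE_RANK` ordering is a simplifying assumption; real payment state machines have branches such as failed or disputed.

```python
# Hypothetical payment states, ranked so later states have higher numbers.
STATE_RANK = {"created": 0, "authorized": 1, "captured": 2, "refunded": 3}

seen_event_ids = set()   # stands in for a durable table with a unique key
payment_state = {}       # payment_id -> current state


def handle_event(event_id: str, payment_id: str, new_state: str) -> str:
    # Layer 1: the same provider event delivered more than once.
    if event_id in seen_event_ids:
        return "duplicate_event"
    seen_event_ids.add(event_id)

    # Layer 2: a different event that would repeat or move the state backward.
    current = payment_state.get(payment_id, "created")
    if STATE_RANK[new_state] <= STATE_RANK[current]:
        return "ignored_stale_or_repeat"

    payment_state[payment_id] = new_state
    return "applied"
```

The event ID check handles redelivery; the rank check handles out-of-order arrival and distinct events that describe the same transition.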

6.9 Exactly-Once Myths

Exactly-once delivery across clients, APIs, workers, providers, and webhooks is not a realistic assumption.

What you want instead:

  • idempotent APIs
  • idempotent event processing
  • durable business object references
  • reconciliation to catch misses

6.10 Designing Safe Retries

Safe retry design usually means:

  • deterministic operation identity
  • state-machine-aware transitions
  • provider-side idempotency support when available
  • ability to query by external reference after ambiguous failures
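
The last point is the one most often skipped, so here is a sketch of it. The `provider` object and its `create_charge` and `find_by_reference` methods are a hypothetical interface, not any real SDK; the idea is to look up the outcome by your own reference before sending the charge again.

```python
class AmbiguousFailure(Exception):
    """Raised when we cannot tell whether the provider processed the call."""


def charge_with_safe_retry(provider, reference: str, amount_minor: int,
                           max_attempts: int = 3):
    for _ in range(max_attempts):
        try:
            return provider.create_charge(reference, amount_minor)
        except AmbiguousFailure:
            # Do not blindly resend: first ask whether the charge exists.
            existing = provider.find_by_reference(reference)
            if existing is not None:
                return existing
    raise AmbiguousFailure(f"unresolved after {max_attempts} attempts: {reference}")
```

If the lookup finds the charge, the retry loop stops without creating a second one; if it finds nothing, retrying with the same reference is the safer next step.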

6.11 Real-World Example: Duplicate Checkout Prevention

Imagine a customer presses "Pay" twice because the page looked frozen.

A robust system uses:

  • one order ID
  • one payment intent for that order
  • one client-generated idempotency key per payment create or confirm action
  • a unique constraint that only one successful charge can attach to the order

That layered design is how systems like Stripe-integrated checkouts avoid accidental duplicate charges.
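
The last layer, the unique constraint, can be expressed as a partial unique index. This sketch uses SQLite, which supports partial indexes (as does PostgreSQL); the table and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE charges ("
    "id INTEGER PRIMARY KEY, order_id TEXT NOT NULL, status TEXT NOT NULL)"
)
# Partial unique index: many failed attempts are fine, but only one
# row per order may have status = 'succeeded'.
conn.execute(
    "CREATE UNIQUE INDEX one_success_per_order "
    "ON charges(order_id) WHERE status = 'succeeded'"
)


def record_charge(order_id: str, status: str) -> bool:
    """Insert a charge row; return False if the constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO charges (order_id, status) VALUES (?, ?)",
                (order_id, status),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Even if every other layer fails, the database refuses to record a second successful charge for the same order.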

7. Reconciliation

Reconciliation is the process of comparing internal financial records with external records to ensure they match.

7.1 What Reconciliation Is

Reconciliation answers questions like:

  • did every captured payment in our system appear in provider reports?
  • did every provider settlement correspond to a ledger entry?
  • did our bank receive the expected payout amount?
  • are refunds, fees, and chargebacks reflected correctly?

7.2 Why Reconciliation Exists

Reconciliation exists because your code is not the only actor in the system.

Even if your application logic is correct, discrepancies still happen because of:

  • delayed webhooks
  • provider bugs or temporary outages
  • manual operations
  • settlement timing differences
  • duplicate or missing files
  • currency conversion and fee differences

This is why reconciliation is mandatory even when your code is "correct".

7.3 Internal vs External Reconciliation

| Type | What it compares | Example |
| --- | --- | --- |
| Internal reconciliation | your own systems against each other | order amount matches invoice, payment, and ledger postings |
| External reconciliation | your systems against providers and banks | captured payments match processor report and bank payout |

Strong platforms do both.

7.4 Payment Provider Reconciliation

Provider reconciliation usually compares internal payment records against:

  • provider event feeds
  • settlement reports
  • fee reports
  • refund reports
  • dispute reports

Matching keys often include:

  • external payment ID
  • merchant reference or order ID
  • amount and currency
  • settlement date

7.5 Bank Reconciliation

Bank reconciliation compares expected cash movement against actual bank statements or payout reports.

This matters because provider-level success does not always mean the merchant bank account received the exact expected cash at the expected time.

7.6 Settlement Verification

Settlement verification answers:

  • was a captured payment settled?
  • was the correct fee deducted?
  • did the merchant or platform receive the expected net amount?
  • were refunds and chargebacks netted correctly?

Marketplace platforms care deeply about this because payouts to sellers depend on correct net settlement calculations.

7.7 Reconciliation Flow

flowchart LR
	INT[Internal Orders / Payments / Ledger] --> MATCH[Reconciliation Engine]
	PR[Processor Reports] --> MATCH
	BANK[Bank Statements / Payout Files] --> MATCH
	MATCH --> OK[Matched Records]
	MATCH --> EX[Exceptions Queue]
	EX --> OPS[Finance Ops Review]
	OPS --> FIX[Adjustments / Replays / Escalation]
	FIX --> LED[Ledger Adjustment or Operational Resolution]

7.8 Mismatch Detection

Common mismatch categories:

  • internal record exists but provider record missing
  • provider record exists but internal record missing
  • amount mismatch
  • currency mismatch
  • status mismatch
  • settlement date mismatch
  • duplicate external events
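
Several of these categories can be detected with a simple keyed comparison. This is a sketch that assumes both sides are already loaded into dicts keyed by external payment ID; the record fields and category names are illustrative.

```python
def reconcile(internal: dict, provider: dict) -> list:
    """Compare records keyed by external payment ID.

    Each record is a dict with 'amount_minor', 'currency', and 'status'.
    Returns a list of (payment_id, category) exceptions, sorted by ID.
    """
    exceptions = []
    for pid in sorted(internal.keys() | provider.keys()):
        ours, theirs = internal.get(pid), provider.get(pid)
        if theirs is None:
            exceptions.append((pid, "missing_at_provider"))
        elif ours is None:
            exceptions.append((pid, "missing_internally"))
        elif ours["currency"] != theirs["currency"]:
            exceptions.append((pid, "currency_mismatch"))
        elif ours["amount_minor"] != theirs["amount_minor"]:
            exceptions.append((pid, "amount_mismatch"))
        elif ours["status"] != theirs["status"]:
            exceptions.append((pid, "status_mismatch"))
    return exceptions
```

Each exception carries an explicit category, which is what makes the downstream ops queue explainable.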

7.9 Missing Transaction Handling

When records are missing, the system should not just log and forget.

Typical workflow:

  1. detect exception automatically
  2. classify likely cause
  3. attempt automatic replay or status refresh if safe
  4. escalate to finance or payment ops queue
  5. resolve via adjustment, support action, or provider escalation

7.10 Delayed Event Handling

Reconciliation systems need time windows and tolerance for delay.

Example:

  • a payment was captured on day 1
  • settlement report arrives on day 2
  • bank payout arrives on day 3

If your recon job assumes all systems should match instantly, it will generate noisy false positives.
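
A tolerance window can be as simple as a grace period before an unmatched record counts as a real exception. This is a sketch; the three-day default is an assumption chosen to match the example above, not a recommendation.

```python
from datetime import date, timedelta


def classify_unmatched(captured_on: date, today: date, grace_days: int = 3) -> str:
    """Label an unmatched payment as still pending or a confirmed mismatch."""
    if today - captured_on <= timedelta(days=grace_days):
        return "pending"
    return "confirmed_mismatch"
```

Only confirmed mismatches need to reach the exceptions queue; pending ones are rechecked on the next run.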

7.11 Operational Workflows and Manual Review

Real finance systems need exception management.

Manual review may be required for:

  • ambiguous provider outcomes
  • chargeback evidence preparation
  • missing bank settlement
  • suspicious refund patterns
  • mismatched currency conversions

This is a good interview point: strong financial system design includes admin tools and ops queues, not just APIs and tables.

7.12 Best Practices and Common Mistakes

Best practices:

  • store raw provider reports and bank files immutably
  • keep reconciliation jobs rerunnable
  • use deterministic matching rules and explainable exception categories
  • distinguish pending mismatch from confirmed mismatch

Common mistakes:

  • relying only on webhooks and skipping provider reports
  • silently auto-correcting discrepancies without traceability
  • failing to account for settlement delay windows
  • not exposing exception queues to operations teams

8. Ledger Systems

The ledger is the core financial memory of the platform.

8.1 What a Ledger System Is

A ledger system records money movement as financial entries between accounts.

It is the source of truth for balances, obligations, and historical financial events.

Examples:

  • wallet balances
  • merchant payable balances
  • platform fee revenue entries
  • stored credits and promotional balances
  • payout obligations

8.2 Why a Ledger Exists

Applications often start with a naive design like this:

  • user.balance = user.balance + amount

That works until you need to answer:

  • why is the balance this number?
  • which transaction changed it?
  • can we reverse one specific event?
  • can we reproduce yesterday's state?
  • can auditors follow the trail?

The ledger exists because balances should be derived from recorded entries, not treated as unexplained mutable facts.

8.3 Append-Only Design

Good ledgers are append-only.

That means:

  • new financial events create new entries
  • existing entries are not casually edited or deleted
  • corrections are represented as reversing or compensating entries

This is essential for auditability.

8.4 Immutable Financial Records

Immutability matters because historical financial truth should be reconstructable.

If a platform changes an old ledger row in place, it may make today's balance look correct while destroying the explanation of how that balance was reached.

8.5 Double-Entry Bookkeeping Basics

Double-entry bookkeeping means every financial event affects at least two accounts, and the posting stays balanced: total debits equal total credits.

The core idea is simple:

  • money cannot appear from nowhere
  • money cannot disappear without an offsetting explanation

In interviews, the exact accounting treatment is less important than the principle that every financial event should create balanced entries.

8.6 Debit and Credit Concepts

Developers often struggle here because debit and credit are not just synonyms for plus and minus.

They are directions whose effect depends on account type.

Useful interview-safe intuition:

  • assets and expenses typically increase with debits
  • liabilities, equity, and revenue typically increase with credits

You do not need to present a CPA-level lecture. You do need to show that a financial event should be represented as balanced movement across accounts, not one mutable balance update.

8.7 Example Ledger Posting

Suppose a platform charges a customer $100 and keeps a $3 fee while $97 is owed to the merchant.

One possible journal representation is:

| Account | Entry |
| --- | --- |
| Processor receivable | Debit $100 |
| Merchant payable | Credit $97 |
| Platform fee revenue | Credit $3 |

The exact chart of accounts differs by business, but the balanced-entry principle does not.
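
The same posting can be written as data with a balance check. This is a minimal sketch; the `Line` type and account names are illustrative, and amounts are integer cents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Line:
    account: str
    direction: str      # "debit" or "credit"
    amount_minor: int   # integer cents, never a float


def is_balanced(lines) -> bool:
    """A journal entry balances when total debits equal total credits."""
    debits = sum(l.amount_minor for l in lines if l.direction == "debit")
    credits = sum(l.amount_minor for l in lines if l.direction == "credit")
    return debits == credits and debits > 0


# The $100 charge from the example: $97 owed to the merchant, $3 fee.
charge_entry = [
    Line("processor_receivable", "debit", 10_000),
    Line("merchant_payable", "credit", 9_700),
    Line("platform_fee_revenue", "credit", 300),
]
```

A posting engine would typically refuse to write any entry for which `is_balanced` is false.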

8.8 Balances Derived from Entries

A balance table may still exist for performance, but it should usually be a materialized or derived view of ledger entries, not the only source of truth.

This is a fundamental distinction.

8.9 Ledger vs Simple Balance Table

| Ledger system | Simple balance table |
| --- | --- |
| append-only history | latest number only |
| supports reconstruction and audit | hard to explain changes |
| supports reversals and corrections | corrections overwrite history |
| good for compliance and reconciliation | fragile under concurrency and debugging |
| more complex to build | simpler initially |

8.10 Ledger Posting Flow

flowchart LR
	EV[Business Event\npayment capture refund payout] --> POST[Posting Engine]
	POST --> RULES[Posting Rules / Chart of Accounts]
	RULES --> JE[Journal Entry with Balanced Lines]
	JE --> LED[Append-Only Ledger]
	LED --> BAL[Materialized Balances]
	LED --> STMT[Statements / Reporting / Reconciliation]

8.11 Reversals Instead of Updates

If a refund happens, the ledger should usually record a new reversing or compensating event instead of editing the original capture entry.

Why this matters:

  • preserves history
  • makes reconciliation explainable
  • supports financial close and audit workflows
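
A reversal can be sketched as a new entry that negates the original lines and links back to it. This illustration uses a signed-amount convention (debits positive, credits negative) as a simplification, and it reverses the whole entry; whether a refund also reverses the platform fee is a business decision.

```python
ledger = []  # append-only rows: (entry_id, account, signed_amount_minor, reverses)


def post(entry_id, lines, reverses=None):
    """Append a balanced entry; lines are (account, signed_amount_minor)."""
    if sum(amount for _, amount in lines) != 0:
        raise ValueError("entry does not balance")
    for account, amount in lines:
        ledger.append((entry_id, account, amount, reverses))


def reverse(entry_id, new_entry_id):
    """Post a compensating entry instead of editing the original rows."""
    original = [(acct, amt) for eid, acct, amt, _ in ledger if eid == entry_id]
    post(new_entry_id, [(acct, -amt) for acct, amt in original], reverses=entry_id)


def balance(account):
    """Balances are derived by summing entries, not stored as mutable facts."""
    return sum(amt for _, acct, amt, _ in ledger if acct == account)
```

After a reversal the balances return to zero, but both the capture and the refund remain visible in history.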

8.12 Auditability and Correctness Guarantees

Good ledger systems enforce invariants such as:

  • journal entries must balance
  • account and currency must be explicit
  • financial periods may be locked after close
  • external references must be preserved

8.13 Wallet and Platform Examples

Examples where ledgers matter deeply:

  • Stripe Connect-like marketplace balances
  • PayPal wallet balances
  • Uber driver earnings and adjustments
  • Shopify merchant payouts and fee deductions
  • SaaS customer credit balances

8.14 Common Mistakes

Common mistakes include:

  • using floating point for money
  • mixing currencies in the same account without explicit conversion events
  • updating balances directly without append-only entries
  • letting business services write arbitrary ledger rows without a posting engine

8.15 Scaling Considerations

What changes at scale:

  • very hot accounts may need partitioning or sharding
  • balances may need snapshotting for fast reads
  • backfills need controlled replay semantics
  • period close and replay rules need strong governance

Best practice: centralize posting logic so product teams do not each invent their own accounting behavior.

9. Billing System

Billing is the system that decides what should be charged, when, and under what pricing rules.

9.1 Billing System Overview

Billing is not the same thing as payments.

| System | Core question |
| --- | --- |
| Billing | what should this customer owe? |
| Payments | did we successfully collect money? |
| Invoicing | what formal document shows the charge? |
| Ledger | what are the immutable financial movements? |

Why billing and payments are separate:

  • a customer can owe money before payment happens
  • payment can fail while the invoice remains valid
  • some businesses bill monthly but collect later by wire or ACH
  • some charges are usage-derived long after the product event happened

9.2 Billing Architecture

flowchart LR
	PE[Product Events\nAPI calls seats storage compute] --> MTR[Metering Ingestion]
	MTR --> AGG[Usage Aggregation]
	CAT[Plan Catalog\nprice books discounts] --> RATE[Rating Engine]
	AGG --> RATE
	SUB[Subscription Contracts] --> RATE
	RATE --> INV[Invoice Engine]
	INV --> PAY[Payment Collection]
	INV --> AR[Accounts Receivable]
	PAY --> LEDGER[Ledger]
	INV --> LEDGER
	LEDGER --> ACC[Accounting Export]
	PAY --> ENT[Entitlement / Service Control]
	INV --> NOTIFY[Invoice Email / Customer Portal]

9.3 Usage-Based vs Subscription Billing

| Model | Example | Technical implications |
| --- | --- | --- |
| Pure subscription | Netflix monthly plan | scheduled renewals, simpler predictable invoices |
| Usage-based | API calls, storage, compute hours | metering, aggregation, late events, rating complexity |
| Seat-based | GitHub or SaaS user seats | seat snapshots, proration, entitlement sync |
| Hybrid | Shopify plan plus transaction fees | combine recurring and usage or fee-derived line items |

9.4 Pricing Model Design

Pricing is partly a product decision and partly a systems design problem.

Technical questions pricing creates:

  • can pricing rules be versioned?
  • can invoices explain the charge simply?
  • can finance reconcile it?
  • can sales and support understand it?
  • can the system backfill or re-rate if needed?

Bad pricing models are usually hard not because the math is difficult, but because they create confusing operational behavior.

9.5 Usage Tracking

Usage tracking powers usage-based billing and parts of seat-based or entitlement billing.

9.5.1 Usage Metering Basics

Metering means recording billable events such as:

  • API requests
  • storage GB-months
  • compute minutes or instance-hours
  • messages sent
  • seats active during a billing window

9.5.2 Event Ingestion

Meter events usually arrive through:

  • synchronous product write path
  • async message streams
  • log pipelines
  • batch imports from service usage systems

Best practice: persist raw usage events before aggregation so billing can be recomputed if needed.

9.5.3 Usage Aggregation

Aggregation transforms raw events into billable quantities.

Examples:

  • total API calls by account per day
  • average daily active seats in cycle
  • total storage byte-hours converted to GB-month

Aggregation often needs a dedicated pipeline because raw event volume is too large for invoice-time computation.

9.5.4 Deduplication

Usage events are often duplicated by retries or replay.

Dedup strategies include:

  • event IDs unique per producer
  • idempotent upserts into raw event store
  • windowed dedup during aggregation
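
The first two strategies combine naturally: key the raw store by producer and event ID, and make the write a no-op when the key already exists. This is a sketch with a dict standing in for the raw event store; the field names are assumptions.

```python
def ingest(raw_store: dict, event: dict) -> bool:
    """Idempotent insert keyed by (producer, event_id).

    Returns True if the event was new, False if it was a duplicate.
    """
    key = (event["producer"], event["event_id"])
    if key in raw_store:
        return False
    raw_store[key] = event
    return True


def total_quantity(raw_store: dict, account: str) -> int:
    """Aggregate billable quantity for one account from deduplicated events."""
    return sum(e["quantity"] for e in raw_store.values() if e["account"] == account)
```

Because duplicates never enter the raw store, aggregation can be rerun without inflating usage.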

9.5.5 Delayed and Late-Arriving Usage

Late usage is normal.

Examples:

  • a region buffers logs and ships them later
  • mobile device usage syncs late
  • a downstream service republishes events after outage recovery

This forces you to choose a policy:

  • hold invoice finalization until watermark is reached
  • close invoice on time and roll late usage into next cycle
  • issue an adjustment invoice later

There is no universal answer. It depends on customer expectations and finance policy.

9.5.6 Billing Windows

Billing windows define which usage belongs to which cycle.

You need clear answers for:

  • timezone and cutoff rules
  • how retries across midnight are handled
  • how backdated corrections are posted

9.5.7 Prepaid vs Postpaid Usage

| Model | Meaning | Example |
| --- | --- | --- |
| Prepaid | customer buys credits or balance in advance | ad platforms, wallet systems, prepaid API credits |
| Postpaid | usage is measured first, billed later | cloud compute, SaaS overage billing |

Prepaid systems care more about balance control and real-time enforcement. Postpaid systems care more about accurate metering and invoice correctness.

9.5.8 Accuracy Guarantees

Real systems rarely guarantee perfect exactly-once event ingestion. Instead they aim for:

  • durable raw event retention
  • deterministic aggregation logic
  • replay capability
  • reconciliation between product metrics and billable usage

9.5.9 Fraud Prevention Basics

Usage billing can be abused.

Examples:

  • account takeover causing huge compute spend
  • self-generated fake usage for promotional credit abuse
  • duplicated or forged meter events

Basic controls:

  • signed or authenticated meter producers
  • anomaly detection on usage spikes
  • spend caps and alerts
  • quota-based temporary throttling

9.5.10 Real-World Examples

  • API platforms bill on requests or tokens consumed
  • cloud storage bills on byte-hours or GB-months
  • compute platforms bill on runtime duration or instance-hours
  • GitHub-like SaaS products may bill on active seats or premium features enabled

9.6 Subscription Plans

9.6.1 Pricing Models

| Model | Example | System implications |
| --- | --- | --- |
| Flat rate | one plan, one price | easiest billing, simplest invoices |
| Seat-based | per user or active seat | seat counting rules, proration, entitlement sync |
| Tiered pricing | first 100 units one rate, next 900 another | complex rating and customer explanation |
| Usage-based | pay per request, GB, minute | metering pipeline required |
| Hybrid | base subscription plus overages | multiple billing engines meet on one invoice |
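
Tiered pricing is where rating logic usually gets subtle, so a worked sketch helps. This assumes graduated tiers, where each unit is priced at the rate of the tier it falls into; the tier sizes match the example in the table, while the prices are made up.

```python
def rate_graduated(units: int, tiers) -> int:
    """Price units across graduated tiers, returning minor units.

    tiers is a list of (tier_size, unit_price_minor); a tier_size of
    None means the tier is unbounded.
    """
    total, remaining = 0, units
    for size, price in tiers:
        if remaining <= 0:
            break
        in_tier = remaining if size is None else min(remaining, size)
        total += in_tier * price
        remaining -= in_tier
    if remaining > 0:
        raise ValueError("usage exceeds the defined tiers")
    return total


# Hypothetical price book: first 100 units at 10, next 900 at 8, rest at 5.
TIERS = [(100, 10), (900, 8), (None, 5)]
```

For 250 units this charges 100 units at 10 and 150 units at 8, which is 2,200 minor units. Volume pricing, where the final tier's rate applies to all units, would be a different function.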

9.6.2 Feature Entitlements

Feature access should not be hard-coded to plan name strings.

Good design uses:

  • plan version
  • feature flags or entitlement policies
  • effective date ranges

Why this matters:

  • marketing renames plans
  • enterprise exceptions happen
  • grandfathered customers need older feature bundles

9.6.3 Enterprise Custom Plans

Enterprise contracts often include:

  • custom pricing
  • annual commitments
  • manual invoicing
  • negotiated payment terms like net 30 or net 60
  • true-up charges later

This is why billing platforms need enough flexibility to support both self-serve checkout and finance-managed accounts receivable flows.

9.6.4 Discounts, Coupons, and Promotions

Discount systems need careful modeling.

Questions to answer:

  • percent or flat discount?
  • one-time or recurring?
  • plan-limited or account-wide?
  • does it apply before or after tax?
  • does it affect revenue reporting?

9.6.5 Grandfathered Plans

When pricing changes, many existing customers keep old terms.

Engineering implication:

  • do not mutate the current plan price in place and assume history will still make sense
  • create new plan versions
  • store which version each subscription is attached to
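
In code, this usually means the subscription points at an immutable plan version rather than at a mutable plan row. A minimal sketch, with hypothetical plan names and prices:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanVersion:
    plan_id: str
    version: int
    price_minor: int
    currency: str


# A price change adds a new version; the old one is never edited.
CATALOG = {
    ("pro", 1): PlanVersion("pro", 1, 2_000, "USD"),
    ("pro", 2): PlanVersion("pro", 2, 2_500, "USD"),
}


@dataclass(frozen=True)
class Subscription:
    customer_id: str
    plan_id: str
    plan_version: int


def renewal_price(sub: Subscription) -> int:
    """Look up the price from the version the subscription is pinned to."""
    return CATALOG[(sub.plan_id, sub.plan_version)].price_minor
```

A grandfathered customer pinned to version 1 keeps renewing at the old price while new signups attach to version 2.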

9.6.6 Plan Versioning

Plan versioning is one of the most important billing design habits.

Without it, you cannot safely explain historical invoices after pricing or feature changes.

9.7 Refunds

Refunds connect customer support, payments, billing, ledger, and fraud controls.

9.7.1 Full and Partial Refunds

Refunds can be:

  • full refund of the whole captured amount
  • partial refund of specific amount or line items
  • service credit instead of cash refund
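
Partial refunds imply a basic invariant: the sum of refunds should not exceed the captured amount. A minimal sketch of that check, with hypothetical function names and amounts in minor units:

```python
def refundable_remaining(captured_minor: int, prior_refunds_minor: list) -> int:
    """Amount still refundable, in minor units."""
    return captured_minor - sum(prior_refunds_minor)


def can_refund(captured_minor: int, prior_refunds_minor: list,
               requested_minor: int) -> bool:
    """A refund must be positive and must not exceed what is left."""
    if requested_minor <= 0:
        return False
    return requested_minor <= refundable_remaining(captured_minor, prior_refunds_minor)
```

In a real system this check needs to run inside the same transaction that records the refund, otherwise two concurrent partial refunds can both pass.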

9.7.2 Refund Approval Flows

Not every refund should be a direct API call.

Common approval patterns:

  • automated refund within policy limits
  • support-initiated refund with permission checks
  • manager approval for large amounts
  • finance review for old or exceptional cases

9.7.3 Refund Timing Constraints

Refund behavior depends on payment stage.

  • if payment was only authorized, you may be able to void instead of refund
  • if captured but not fully settled, provider behavior may differ
  • bank transfer refunds may require separate payout instructions

9.7.4 Asynchronous Refund Completion

Refunds are often asynchronous.

That means the system should track states like:

  • refund requested
  • refund submitted to provider
  • refund pending
  • refund succeeded
  • refund failed

9.7.5 Ledger Reversal Handling

A refund should create compensating financial entries, not erase the original charge.

That usually means:

  • reduce receivable or cash position as appropriate
  • reduce merchant payable or reverse revenue where applicable
  • link refund entries to original payment event

9.7.6 Abuse Prevention and Fraud Considerations

Refund systems can be abused.

Examples:

  • compromised support account issuing fraudulent refunds
  • customer requesting repeated partial refunds across channels
  • refunding to a payment method not tied to the original transaction

Controls:

  • permissioned refund roles
  • approval thresholds
  • audit logs for every refund action
  • anomaly detection on refund velocity

9.7.7 Accounting Implications

Refunds affect more than payment state.

They can affect:

  • revenue reporting
  • tax adjustments
  • merchant payable balances
  • customer statements

9.7.8 Safe Operational Refund Design

flowchart LR
	SUP[Support / Customer Request] --> POLICY[Refund Policy Engine]
	POLICY --> APPROVE{Needs Approval?}
	APPROVE -->|Yes| MGR[Manager / Finance Approval]
	APPROVE -->|No| REQ[Create Refund Request]
	MGR --> REQ
	REQ --> PAY[Payment Provider Refund API]
	REQ --> LED[Ledger Reversal Pending]
	PAY --> WH[Refund Webhook / Status Update]
	WH --> LED2[Ledger Reversal Finalized]
	WH --> NOTIF[Customer Notification]
	WH --> AUD[Audit Log]

9.8 Audit Logs

Audit logs are not the same as application logs.

9.8.1 Why Audit Logs Matter

Audit logs answer questions like:

  • who issued this refund?
  • who changed billing settings?
  • when was a plan changed?
  • who updated payout bank details?
  • who overrode a failed payment and granted access?

9.8.2 Compliance and Operational Requirements

Financially relevant actions often require durable traceability for:

  • internal investigations
  • fraud reviews
  • regulatory or audit requests
  • customer disputes

9.8.3 What Audit Logs Typically Capture

  • actor identity
  • action type
  • target object
  • before and after state where allowed
  • timestamp
  • request ID or trace ID
  • approval context if applicable

9.8.4 Immutable Event History

Audit logs are usually append-only and write-restricted.

Why:

  • if admins can casually edit audit history, the log loses its value
  • investigations need evidence quality, not best-effort debugging

9.8.5 Admin Action Tracking

Particularly sensitive actions include:

  • refunds
  • manual captures
  • invoice voids
  • plan price changes
  • bank account or payout changes
  • permission grants for support and finance roles

9.8.6 Financial Investigation Workflows

When something goes wrong, investigators often need to correlate:

  • customer-facing event
  • internal admin actions
  • payment provider events
  • ledger postings
  • reconciliation exceptions

Strong systems make that traceability possible with consistent IDs.

9.8.7 Retention Strategy and Tamper Resistance

Common design choices:

  • long retention for financially relevant events
  • restricted delete permissions
  • append-only storage patterns
  • checksums or immutable storage layers for high-sensitivity contexts
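
One common way to implement the checksum idea is a hash chain, where each record includes the hash of the previous one. This is a sketch of the technique, not a complete tamper-proofing scheme; the record fields are assumptions, and a real deployment would also anchor the latest hash somewhere admins cannot rewrite.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder previous-hash for the first record


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_audit(log: list, actor: str, action: str, target: str,
                 timestamp: str) -> dict:
    """Append a record whose hash covers its content and its predecessor."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"actor": actor, "action": action, "target": target,
            "timestamp": timestamp, "prev_hash": prev_hash}
    record = {**body, "hash": _digest(body)}
    log.append(record)
    return record


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited or removed record breaks the chain."""
    prev_hash = GENESIS
    for record in log:
        body = {k: v for k, v in record.items() if k != "hash"}
        if record["prev_hash"] != prev_hash or record["hash"] != _digest(body):
            return False
        prev_hash = record["hash"]
    return True
```

Editing any historical record changes its hash, which no longer matches what the next record points to.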

9.8.8 Application Logs vs Audit Logs

| Topic | Application logs | Audit logs |
| --- | --- | --- |
| Purpose | debugging and observability | accountability and investigation |
| Retention | often shorter | often longer |
| Editability | may be reprocessed or rotated freely | should be tightly controlled |
| Structure | often operational and noisy | structured and action-focused |
| Example | HTTP 500 from refund API | admin user 42 approved refund of $500 |

Common mistake: assuming standard service logs are enough for auditability.

10. How These Systems Connect in Real Architecture

The best way to understand the whole stack is to follow a realistic end-to-end flow.

10.1 SaaS Subscription Example

Imagine a GitHub-like SaaS product with seat-based billing and annual plans.

  1. Customer selects a plan and seat count.
  2. Subscription service creates the contract and stores plan version.
  3. Invoice engine generates the first invoice, applying discount and tax.
  4. Payment service creates a payment intent and charges the default card.
  5. Provider returns initial result, then sends async confirmation webhook.
  6. Ledger posts the charge and resulting receivable or revenue flows.
  7. Entitlement service activates the plan after policy conditions are met.
  8. Renewal scheduler later creates the next billing cycle.
  9. Usage and seat changes may create proration line items.
  10. Reconciliation confirms settlement and fee correctness.

10.2 Marketplace Example

Imagine a Shopify-like or Uber-like platform.

  1. Customer pays the platform.
  2. Payment is authorized and later captured.
  3. Ledger records customer payment, platform fee, and merchant or driver payable.
  4. Refunds or chargebacks later create adjusting entries.
  5. Payout system sends net funds to merchant or driver.
  6. Reconciliation confirms provider settlement and bank payout.

This example shows why ledgers are central in platforms that hold and distribute money between parties.

10.3 Common Architectural Separation

Good financial architectures often split into services like:

  • checkout or payment orchestration
  • provider integration adapters
  • invoice engine
  • subscription service
  • metering and rating pipeline
  • ledger or accounting posting service
  • reconciliation service
  • audit and admin tooling

This separation exists because each domain has different correctness rules, scaling patterns, and operational teams.

11. Common Interview Discussions

When interviewers ask about financial systems, they usually care less about memorizing provider jargon and more about whether you understand failure and correctness.

11.1 Questions You Should Be Ready For

  • How do you prevent duplicate charges?
  • How do you model payment state transitions?
  • Why do you need a ledger instead of a balance column?
  • How do you handle provider webhooks arriving late or twice?
  • Why is reconciliation needed if your service is correct?
  • How would you support partial refunds and chargebacks?
  • How do you design subscription retries and dunning?
  • Why are billing and payments separate systems?

11.2 Strong Talking Points

Strong answers usually include:

  • idempotency at API and event-processing layers
  • explicit state machines for long-lived payment workflows
  • append-only ledger entries with reversals instead of mutation
  • asynchronous processing and webhook handling
  • reconciliation against provider and bank data
  • operational tooling for refunds, disputes, and manual review

11.3 Weak Talking Points

Weak answers often sound like:

  • "I would just store payment status in a table"
  • "I would make sure events are exactly once"
  • "If the request times out, I would retry"
  • "The invoice can just be edited if something changes"

Those answers ignore the real complexity of money systems.

12. Common Production Mistakes

These are the mistakes that repeatedly break real systems.

  1. Using floating point for money.
  2. Treating the synchronous payment response as final truth without waiting for async confirmation where needed.
  3. Not separating order, invoice, payment, and ledger state.
  4. Forgetting idempotency on write APIs and webhook consumers.
  5. Updating financial records in place instead of using reversals or versioned models.
  6. Skipping reconciliation because "our DB is correct".
  7. Designing no admin tooling for refunds, disputes, and exceptions.
  8. Hard-coding plan names and pricing instead of versioning them.
  9. Using the balance table as the only source of truth.
  10. Ignoring currency, tax, and settlement timing edge cases.

13. Practical Best Practices Checklist

If you want an interview answer that also sounds production-ready, these are the habits to emphasize.

  • represent money in minor units with explicit currency
  • use business IDs and provider IDs together
  • design state machines, not booleans
  • keep financial records append-only where possible
  • make retries safe with idempotency
  • verify and store raw webhook events
  • build reconciliation jobs from the start
  • maintain audit logs for sensitive operations
  • version pricing, plans, and invoice-affecting rules
  • give operations teams visibility and controls
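
The first habit on this list is easy to show in code. This is a minimal sketch of a money type in minor units with explicit currency; the `Money` class and `from_decimal_string` helper are illustrative, and the default exponent of 2 is an assumption that does not hold for every currency.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    amount_minor: int   # for example, cents for USD
    currency: str

    def __add__(self, other: "Money") -> "Money":
        if self.currency != other.currency:
            raise ValueError("cannot add different currencies without conversion")
        return Money(self.amount_minor + other.amount_minor, self.currency)


def from_decimal_string(text: str, currency: str, exponent: int = 2) -> Money:
    """Parse '19.99' into minor units without passing through a float."""
    scaled = Decimal(text) * (10 ** exponent)
    if scaled != scaled.to_integral_value():
        raise ValueError("more precision than the currency allows")
    return Money(int(scaled), currency)
```

Integer minor units avoid the classic float problem where `0.1 + 0.2` does not equal `0.3`, and the currency check makes accidental cross-currency sums fail loudly.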

14. Final Mental Model

If you remember only one thing, remember this:

Financial systems are not just about moving money. They are about maintaining trust under retries, ambiguity, delay, failure, and scrutiny.

A strong design usually separates:

  • business events
  • billable calculation
  • payment execution
  • legal documents
  • immutable financial recording
  • external verification
  • operational accountability

That is the mental model behind real payment platforms, subscription businesses, marketplaces, and SaaS billing systems.

And that is also the mental model interviewers are usually trying to detect.