Computer-Fundamentals/systems design/3.dataStorage.md

# Data Storage

Data storage is where system design stops being abstract and starts becoming expensive, slow, fragile, or correct. Almost every backend system eventually becomes a storage problem: where data lives, how fast it can be read, how safely it can be written, how it is recovered after failure, and how it evolves as the product and traffic grow.

In interviews, data storage questions are rarely about naming a database. They are about whether you understand:

- what kind of data model the system needs
- what read and write patterns dominate
- what correctness guarantees matter
- what breaks under scale or failure
- what tradeoffs you are making when you choose one storage model over another

In production, storage decisions affect latency, reliability, developer velocity, analytics capability, and incident severity. A weak storage design causes constant pain later: bad schemas, slow queries, difficult migrations, stale replicas, failed restores, or impossible cross-shard operations.

This guide is written to serve both interview preparation and real backend engineering. The goal is not to memorize definitions. The goal is to understand why these systems exist, how they work internally, and how strong engineers talk about them.

Examples in this guide are generalized from common industry patterns and public engineering discussions from companies such as Google, Netflix, Uber, Amazon, GitHub, Stripe, and large SaaS platforms.

## 1. Big Picture: How Storage Fits Into Real Systems

At a high level, most production systems use multiple storage systems at once.

One database is rarely enough because different workloads want different things:

- transactional correctness for orders, payments, and user accounts
- low-latency caching for hot reads
- search-oriented storage for text retrieval
- analytics storage for large scans and aggregations
- event logs for asynchronous processing
- archival storage for old or infrequently accessed data

### 1.1 Typical Production Architecture

```mermaid
flowchart LR
	C[Client / API Consumer] --> APP[Application Services]
	APP --> CACHE[(Cache / Key-Value Store)]
	APP --> SQLP[(Primary Relational DB)]
	SQLP --> SQLR[(Read Replicas)]
	APP --> DOC[(Document DB)]
	APP --> KV[(Distributed Key-Value)]
	APP --> STREAM[(Event Log / Queue)]
	STREAM --> OLAP[(Warehouse / Analytics Store)]
	SQLP --> BACKUP[(Backups + PITR Logs)]
```

The point is not to use many technologies for their own sake. The point is that storage is selected per workload.

### 1.2 The Four Questions Behind Every Storage Decision

| Question | Why it matters | Typical design consequence |
|---|---|---|
| What is the shape of the data? | Rows, documents, events, graphs, blobs, counters all behave differently | Drives relational vs document vs wide-column vs graph |
| How is the data accessed? | Point lookups, range scans, joins, aggregations, traversals, write bursts | Drives indexing, partitioning, caching, denormalization |
| What consistency is required? | Money movement and inventory need stronger guarantees than likes or view counts | Drives ACID, transactions, replica strategy, stale-read tolerance |
| How will it scale and fail? | Size, throughput, multi-region behavior, backups, migrations, hotspots | Drives sharding, replication, partitioning, archival, DR strategy |

### 1.3 Interview Framing

When an interviewer asks, "What database would you use?", a strong answer sounds like this:

"I would start from access patterns and correctness requirements. If the core entities are strongly related, require transactions, and need flexible querying, I would default to a relational database. If the workload is massive key-based access with simple lookups and looser consistency, a NoSQL store may be a better fit. I would then discuss indexes, replication, partitioning, and backup strategy because the database choice alone does not solve production scale."

That answer shows engineering judgment rather than tool memorization.

## 2. Relational Databases

Relational databases are the default storage system for a huge amount of backend software because most business systems revolve around structured entities and relationships: users, subscriptions, invoices, orders, teams, repositories, shipments, and permissions.

### 2.1 What Relational Databases Are

A relational database stores data in tables. Each table represents a type of entity. Rows represent individual records. Columns represent attributes of those records.

Example:

- `users(id, email, created_at)`
- `orders(id, user_id, total_amount, status)`
- `order_items(order_id, product_id, quantity, price)`

The key idea is not just "table-shaped data". The key idea is that relationships are first-class. One row can reference another through keys, and the database can enforce integrity rules around those relationships.

### 2.2 Why Relational Databases Exist

They exist because business data is usually not independent. It is interconnected.

Examples:

- an invoice belongs to an account
- an order belongs to a customer
- a repository belongs to an organization
- a payout belongs to a merchant and must match ledger entries

Without a relational model, every application would have to manually enforce the same guarantees:

- uniqueness
- referential integrity
- transactional correctness
- safe concurrent updates
- consistent query semantics

Relational databases centralize those guarantees and provide a query language, optimizer, transaction manager, and storage engine to manage them efficiently.

### 2.3 Core Building Blocks

| Concept | Meaning | Why it exists |
|---|---|---|
| Table | Collection of rows of the same entity type | Organizes records under a common schema |
| Row | One record | Represents one object or fact |
| Column | One attribute | Gives structure and type constraints |
| Primary key | Unique identifier for a row | Supports identity, lookup, and relationships |
| Foreign key | Reference to another table's primary key | Enforces valid relationships |
| Constraint | Rule the data must satisfy | Prevents invalid writes at the database layer |
| Schema | The formal structure of tables and fields | Makes data predictable and queryable |

### 2.4 Primary Keys

A primary key uniquely identifies a row.

Typical choices:

- auto-increment integer
- UUID
- ULID or time-sortable ID
- composite key in some domain-specific cases

Tradeoffs:

- auto-increment IDs are compact and index-friendly but can create write hotspots in distributed systems
- UUIDs support decentralized ID generation but are larger, less cache-friendly, and can fragment indexes depending on type
- time-ordered IDs improve write locality while keeping uniqueness across distributed nodes

Interview discussion:

If asked about ID choice, do not stop at uniqueness. Discuss index locality, distributed generation, predictability, and operational implications.

### 2.5 Foreign Keys and Constraints

Foreign keys ensure that a row points to something real.

Example:

- `orders.user_id` should reference `users.id`

Why this matters:

- prevents orphaned records
- protects against application bugs
- makes data trustworthy for downstream systems

Constraints commonly used in production:

- `PRIMARY KEY`
- `UNIQUE`
- `NOT NULL`
- `CHECK`
- `FOREIGN KEY`

Tradeoff:

Some high-scale systems relax or avoid certain constraints in hot paths because they want more write throughput or more control in the application layer. But removing database-level safety shifts correctness burden onto application code and background repair jobs.

### 2.6 Schema Design

Schema design is the process of deciding how entities are represented and related.

A good schema:

- reflects domain boundaries clearly
- supports the most important queries efficiently
- enforces correctness where possible
- avoids unnecessary duplication
- evolves safely over time

Bad schema design is one of the most expensive long-term mistakes in backend systems. It causes:

- hard-to-explain data models
- excessive joins
- duplicated fields that drift apart
- slow migrations
- poor index strategy
- downstream analytics confusion

Practical rule: schema design should start from access patterns and invariants, not from class diagrams alone.

### 2.7 How Relational Databases Work Internally

Most modern relational systems are not just "tables on disk". Internally, they usually include:

- a parser that understands SQL
- a planner/optimizer that chooses an execution strategy
- a buffer pool that caches pages in memory
- a storage engine that organizes rows and indexes into pages or files
- a write-ahead log (WAL) or redo log for durability and crash recovery
- a lock manager or MVCC mechanism for concurrency control

Conceptually, a write often looks like this:

1. Parse and validate the SQL.
2. Check permissions, constraints, and transaction state.
3. Update in-memory structures or page buffers.
4. Append the change to a durable log.
5. Eventually flush changed pages to disk.
6. On crash recovery, replay the log to restore a consistent state.

This log-first approach is why databases can be both durable and reasonably fast.

### 2.8 ACID Properties

ACID is one of the main reasons relational databases dominate critical business systems.

| Property | Meaning | Why it matters in production |
|---|---|---|
| Atomicity | A transaction happens entirely or not at all | Prevents partial writes such as charging a card but not recording the payment |
| Consistency | The database moves from one valid state to another valid state | Preserves constraints, invariants, and business rules |
| Isolation | Concurrent transactions behave as if they do not corrupt each other | Prevents race conditions and inconsistent reads |
| Durability | Committed data survives crashes | Critical for orders, money, and audit logs |

ACID does not mean infinite performance or global serializability everywhere. It means the database provides mechanisms to preserve correctness within its transaction model.

### 2.9 OLTP vs OLAP Basics

| Dimension | OLTP | OLAP |
|---|---|---|
| Goal | Fast transactional reads/writes | Large analytical scans and aggregations |
| Query shape | Point lookups, small updates, short transactions | Complex joins, scans, groupings across large datasets |
| Data freshness | Near real-time | Often batch or streaming ingestion |
| Schema style | Highly normalized or moderately normalized | Often denormalized star/snowflake models |
| Example | Checkout, login, billing, inventory | Revenue dashboards, cohort analysis, forecasting |

Common mistake: using the primary production database for heavy analytics. That often destroys latency for user-facing traffic.

Production pattern:

- OLTP DB for live product traffic
- CDC or event stream replicates changes into an OLAP system or warehouse

### 2.10 When Relational Databases Are the Right Choice

Choose a relational database when you need:

- strong transactional correctness
- relationships across entities
- flexible querying across structured data
- strict constraints and integrity checks
- mature tooling and predictable operational behavior

Typical examples:

- Stripe-like payments and ledger systems
- GitHub-like repository metadata, accounts, permissions, billing
- SaaS products with users, teams, subscriptions, invoices, audit records
- e-commerce orders, inventory reservations, and shipments

### 2.11 Common Production Problems

| Problem | Why it happens | Typical response |
|---|---|---|
| Slow queries | Missing indexes, bad query shape, poor cardinality estimates | Add or fix indexes, rewrite query, update stats |
| Lock contention | Hot rows, long transactions, frequent updates | Reduce transaction scope, reorder operations, use optimistic patterns |
| Replica lag | Heavy writes, slow replication, long-running queries on replicas | Limit lag-sensitive reads, tune replication, use read-after-write routing |
| Connection exhaustion | Too many app instances or poor pooling | Use connection pooling and backpressure |
| Schema migration risk | Tight coupling between app versions and schema | Use backward-compatible migrations |
| Storage bloat | Dead tuples, fragmentation, old data retention | Vacuum/compaction, archiving, partitioning |

## 3. SQL

SQL is the language used to describe what data you want and how you want to change it. The important part is that SQL is largely declarative: you say what result you want, and the database decides how to get it.

That separation is powerful because the optimizer can choose different execution strategies based on data size, indexes, and statistics.

### 3.1 Why SQL Exists

SQL exists because applications need a high-level way to:

- retrieve specific records
- filter and aggregate data
- join related tables
- modify data safely inside transactions
- define schemas, constraints, and access policies

Instead of hand-coding file access and pointer traversal, developers describe intent and let the database manage physical execution.

### 3.2 How Databases Think About SQL Internally

When you write a query, the database usually goes through stages like:

1. Parse the SQL into an internal tree.
2. Validate table names, column names, permissions, and types.
3. Rewrite the query if useful, such as simplifying expressions or flattening subqueries.
4. Estimate row counts using statistics.
5. Consider candidate plans.
6. Choose access methods like index scan, sequential scan, hash join, sort, or aggregation.
7. Execute the plan and stream results back.

### 3.3 Query Execution Flow

```mermaid
flowchart TD
	Q[SQL Query] --> P[Parser]
	P --> R[Rewriter]
	R --> O[Optimizer / Planner]
	O --> A{Choose Access Path}
	A --> S1[Sequential Scan]
	A --> S2[Index Scan]
	A --> J[Join Strategy]
	J --> J1[Nested Loop]
	J --> J2[Hash Join]
	J --> J3[Merge Join]
	S1 --> E[Execution Engine]
	S2 --> E
	J1 --> E
	J2 --> E
	J3 --> E
	E --> RES[Rows Returned]
```

The optimizer is the reason two queries with similar meaning can perform very differently.

### 3.4 Core SQL Operations

#### `SELECT`

Used to retrieve data. But the real question is not "how do I write SELECT". The real question is: what access path will the database choose to produce the rows?

If a query filters by an indexed, selective column, the planner may use an index scan. If the condition matches a large percentage of the table, a full scan may actually be cheaper.

#### `INSERT`

Adds new rows. Internally, inserts may:

- allocate a new row location
- update all relevant indexes
- validate constraints
- write to the transaction log

A write to one logical table can mean multiple physical writes due to indexes.

#### `UPDATE`

Modifies existing rows. Updates can be deceptively expensive because they may:

- find the existing row
- create a new row version under MVCC
- update each affected index
- generate dead tuples or fragmented storage

#### `DELETE`

Deletes rows. In many engines, deletes do not immediately remove bytes from disk. They mark rows as deleted, and cleanup happens later through vacuuming or compaction.

That is why mass deletes can be surprisingly expensive and can cause storage bloat.

### 3.5 `WHERE`, `GROUP BY`, `ORDER BY`, `HAVING`

| Clause | What it does logically | What often matters physically |
|---|---|---|
| `WHERE` | Filters rows before aggregation | Index usage, predicate selectivity |
| `GROUP BY` | Groups rows into buckets | Sorting or hashing cost |
| `ORDER BY` | Sorts result rows | Whether an index can satisfy ordering |
| `HAVING` | Filters after grouping | Size of grouped intermediate results |

Interview point:

`WHERE` and `HAVING` are not interchangeable. `WHERE` reduces rows earlier. `HAVING` filters after grouping. Earlier reduction is usually cheaper.

### 3.6 Aggregations

Aggregations such as `COUNT`, `SUM`, `AVG`, `MIN`, and `MAX` are straightforward conceptually but can be expensive at scale.

Production concerns:

- full-table aggregations are expensive on hot OLTP databases
- grouped aggregations can require large hash tables or sort phases
- repeated dashboard queries often need precomputation, caching, or OLAP offloading

Example:

A SaaS dashboard showing active users by day should not necessarily run a large `GROUP BY date(created_at)` directly against the primary transactional database every page load.

### 3.7 Subqueries, CTEs, Views, Stored Procedures

#### Subqueries

Useful for expressing dependent logic, but performance depends on whether the optimizer can rewrite or decorrelate them.

#### CTEs

Common Table Expressions improve readability and can structure complex logic into steps. In some databases they behave mostly as syntax sugar; in others they can affect optimization boundaries. Do not assume they are always free.

#### Views

Views encapsulate query logic behind a virtual table interface.

Good use cases:

- simplifying common query logic
- enforcing access patterns
- providing stable read models to consumers

Tradeoff:

Regular views do not store data. They execute underlying logic each time. Materialized views store results but must be refreshed.

#### Stored Procedures

Stored procedures put logic in the database.

Why teams use them:

- reduce application round trips
- centralize critical logic close to data
- improve consistency for some workflows

Why teams avoid overusing them:

- harder versioning and testing compared to application code
- business logic becomes harder to trace
- vendor lock-in increases

In interviews, a balanced answer is strongest: stored procedures are useful for certain data-heavy workflows, but many modern teams prefer to keep most domain logic in application services.

### 3.8 Transactions and SQL

SQL statements do not operate in a vacuum. They run inside transaction boundaries.

This matters because:

- multiple writes may need to succeed together
- reads may or may not see concurrent updates
- locks or row versions are affected by transaction scope

Example:

```sql
BEGIN;

UPDATE accounts
SET balance = balance - 100
WHERE id = 1;

UPDATE accounts
SET balance = balance + 100
WHERE id = 2;

COMMIT;
```

This is not just two updates. It is one correctness boundary.

### 3.9 Query Optimization Mindset

Strong engineers do not guess blindly. They ask:

- how many rows are being scanned?
- which predicates are selective?
- is an index usable?
- are we forcing a sort?
- are we joining large intermediate results?
- are we fetching columns not covered by the index?
- is this workload better served from a cache or OLAP store?

### 3.10 Reading Execution Plans

At a high level, execution plans tell you:

- scan type used
- join order and join method
- estimated versus actual row counts
- sort and aggregation steps
- cost estimates

What to look for first:

- unexpected sequential scans on large tables
- massive row count mismatches indicating stale statistics or bad estimates
- repeated nested loops over huge inputs
- sorts spilling or large hash structures

Interview point:

You do not need to memorize every plan node. You do need to show that you know how to reason from plan shape to performance bottleneck.

### 3.11 Common Interview SQL Discussions

- Explain why a query is slow.
- Design indexes for a query.
- Compare a subquery to a join.
- Explain `WHERE` versus `HAVING`.
- Explain why `SELECT *` is often a bad idea.
- Discuss pagination: offset pagination versus keyset pagination.
- Explain how transactions affect concurrent SQL statements.

## 4. Indexing

Indexes exist because scanning every row for every query does not scale.

If a table has 500 rows, a full scan is fine. If it has 500 million rows and the query wants one customer by email, a full scan is a disaster.

### 4.1 The Full Table Scan Problem

Without an index, the database may need to inspect every row to answer a filter such as:

```sql
SELECT * FROM users WHERE email = 'a@example.com';
```

That means CPU, disk I/O, buffer churn, and higher latency. Indexes trade extra storage and write cost for much faster reads.

### 4.2 What an Index Is

An index is an auxiliary data structure that makes certain lookups cheaper.

Think of it like a book index:

- without it, you scan the whole book
- with it, you jump near the right page immediately

### 4.3 B-Tree Indexes

B-Tree indexes are the default in many relational systems because they support:

- exact lookups
- range scans
- ordered traversal
- prefix matches on composite keys

Internally, B-Trees keep keys in sorted order across pages so the database can navigate down the tree quickly and then scan a relevant range.

Why they are popular:

- good general-purpose behavior
- efficient for equality and range queries
- can support `ORDER BY` if the ordering matches the index

### 4.4 Hash Indexes

Hash indexes map keys to buckets and are good for exact equality lookups.

Tradeoff:

- excellent for `=` lookups
- not useful for range scans like `BETWEEN`, `<`, `>`, or ordered traversal

In many production systems, B-Trees remain the default because they are more versatile.

### 4.5 Composite Indexes

A composite index includes multiple columns, such as:

```sql
CREATE INDEX idx_orders_user_status_created
ON orders(user_id, status, created_at);
```

Important concept: column order matters.

This index is most useful when queries filter or order by the leading prefix:

- `WHERE user_id = ?`
- `WHERE user_id = ? AND status = ?`
- `WHERE user_id = ? AND status = ? ORDER BY created_at`

It may not help much for a query filtering only by `status`.

### 4.6 Covering Indexes

A covering index contains enough information to answer a query without going back to the base table.

This reduces extra reads.

Example idea:

If a query needs only `id` and `status`, and those are already in the index, the engine may avoid fetching the full row.

Why it matters:

- lower I/O
- lower latency
- better cache efficiency

### 4.7 Clustered vs Non-Clustered Indexes

| Type | Meaning | Implication |
|---|---|---|
| Clustered index | Table data is physically ordered by the index key or closely tied to that key | Great for range scans on the clustering key, but only one primary physical order |
| Non-clustered index | Separate structure pointing to row locations | Flexible, but may require extra lookups to fetch row data |

Practical intuition:

You get one main physical organization of the table, but many secondary lookup structures.

### 4.8 Unique Indexes

Unique indexes enforce that indexed values do not repeat.

Examples:

- email should be unique per user account table
- external payment id should be unique to prevent duplicate processing

They improve correctness, not just performance.

### 4.9 Partial Indexes

A partial index only indexes rows matching a condition.

Example:

```sql
CREATE INDEX idx_active_subscriptions
ON subscriptions(user_id)
WHERE status = 'active';
```

Why it is useful:

- smaller index
- faster maintenance
- targeted at frequent queries

This is especially powerful when the application repeatedly queries a hot subset of data.

### 4.10 Index Selectivity

Selectivity means how well a predicate narrows the result set.

- high selectivity: returns very few rows, like email or order id
- low selectivity: returns many rows, like boolean `is_active = true` in a large table where most rows are active

Indexes are more useful on selective predicates. Indexing a low-selectivity column alone may not help much.

### 4.11 How Indexes Affect Query Planning

The optimizer decides whether an index helps based on estimated cost.

It may ignore an index when:

- the query needs most rows anyway
- the table is small enough that a scan is cheaper
- statistics suggest low selectivity
- using the index would cause too many random reads

This surprises many beginners who think, "I created an index, so it must be used."

It is not index existence that matters. It is cost.

### 4.12 Write Amplification Tradeoff

Every extra index speeds some reads but makes writes more expensive.

An insert or update may need to update:

- the base table storage
- the primary key index
- each secondary index

Costs of over-indexing:

- slower writes
- more storage
- more memory usage
- more vacuum/compaction overhead
- more operational complexity during schema changes

### 4.13 Good vs Bad Indexing Examples

| Pattern | Good or Bad | Why |
|---|---|---|
| Index on `users(email)` for login lookup | Good | High selectivity, common query path |
| Composite index on `(tenant_id, created_at)` for tenant timeline queries | Good | Matches filter and sort shape |
| Separate indexes on `tenant_id` and `created_at` when queries always use both together | Often weaker | Composite index may serve the access pattern better |
| Index every column "just in case" | Bad | High write cost, low usefulness |
| Index a boolean column by itself in a table where almost all values are the same | Often bad | Low selectivity |

### 4.14 Common Indexing Mistakes

- indexing without looking at real query patterns
- ignoring column order in composite indexes
- assuming indexes always help writes or deletes
- forgetting that indexes need maintenance and storage
- missing indexes on foreign keys used heavily in joins
- failing to reconsider indexes after query patterns change

## 5. Joins

Joins exist because relational data is split across tables to reduce redundancy and preserve structure, but applications still need combined answers.

Example:

- users in one table
- orders in another
- order items in a third

To answer "show me Alice's last 10 orders and item counts", the database must combine related records.

### 5.1 Why Joins Exist

Without joins, every application would either:

- duplicate related data everywhere, or
- perform multiple round trips and manually stitch results together

Joins let the database combine data close to storage, often more efficiently than the application layer could.

### 5.2 Join Types

| Join | Meaning | Typical use |
|---|---|---|
| `INNER JOIN` | Return only matching rows | Orders with valid customers |
| `LEFT JOIN` | Keep all rows from left table, even if right side missing | Users and optional profile settings |
| `RIGHT JOIN` | Keep all rows from right table | Less commonly used; often rewritten as left join |
| `FULL OUTER JOIN` | Keep all rows from both sides | Reconciliation or comparison workloads |
| `SELF JOIN` | Join a table to itself | Org hierarchy, manager relationships |
| `CROSS JOIN` | Cartesian product | Rare except for deliberate matrix generation |

### 5.3 How Databases Perform Joins Internally

The database does not "magically combine" tables. It chooses an algorithm.

#### Nested Loop Join

For each row from one input, look up matching rows in the other input.

Best when:

- one side is small
- the other side has a useful index

Danger:

- terrible if both sides are large and lookups are expensive

#### Hash Join

Build a hash table from one input, then probe it using the other input.

Best when:

- joining large unsorted datasets on equality

Tradeoff:

- needs memory to build the hash structure
- not useful for every join predicate shape

#### Merge Join

Walk two sorted inputs together.

Best when:

- both inputs are already sorted or indexed on join keys
- join is based on sortable keys

Tradeoff:

- sorting inputs can be expensive if not already ordered

### 5.4 Join Execution Example

```mermaid
flowchart LR
	A[Orders Scan or Index Scan] --> J{Join Strategy}
	B[Users Scan or Index Scan] --> J
	J --> NL[Nested Loop]
	J --> HJ[Hash Join]
	J --> MJ[Merge Join]
	NL --> R[Joined Result]
	HJ --> R
	MJ --> R
```

### 5.5 Join Performance Problems

Joins become expensive when:

- join keys are not indexed appropriately
- large intermediate results are created before filtering
- data is highly skewed
- many-to-many joins explode result size
- joins happen across shards or services instead of within one database node

### 5.6 Avoiding Expensive Joins at Scale

Common production strategies:

- add the right indexes on join keys
- filter early before joining large tables
- precompute summaries or read models
- denormalize hot read paths
- move heavy analytical joins to OLAP systems
- avoid cross-service synchronous joins in request paths

Real-world intuition:

A GitHub-like product can keep strongly normalized core data in MySQL or Postgres, but timeline feeds, counters, search views, and recommendation paths often use denormalized or precomputed models because repeated multi-table joins on hot endpoints become too costly.

## 6. Transactions

Transactions are the unit of correctness in storage systems.

They exist because many business operations are not a single write. They are a bundle of writes and reads that must behave as one logical step.

Examples:

- transferring money between two accounts
- reserving inventory and creating an order
- creating a subscription and initial invoice together
- writing a payment and corresponding ledger entry

### 6.1 What a Transaction Is

A transaction is a sequence of operations that the database treats as one atomic unit.

Either:

- all intended changes commit, or
- none of them do

### 6.2 Commit and Rollback

- `COMMIT` makes the transaction durable and visible according to isolation rules
- `ROLLBACK` discards its changes

This sounds simple, but under the hood the database must coordinate row versions, locks, logs, and crash recovery.

### 6.3 Transaction Flow

```mermaid
sequenceDiagram
	participant App
	participant DB
	participant Log as WAL / Redo Log

	App->>DB: BEGIN
	App->>DB: Read current state
	App->>DB: Write change A
	App->>DB: Write change B
	DB->>Log: Append durable log records
	App->>DB: COMMIT
	DB-->>App: Success
```

If a failure happens before commit completes, the system must recover to a consistent state using the log.

### 6.4 ACID In Depth

#### Atomicity

Prevents partial effects.

Without atomicity:

- balance deducted from account A
- crash occurs
- balance never credited to account B

That is unacceptable for financial or inventory systems.

#### Consistency

Consistency means the database enforces declared invariants and valid transitions.

Examples:

- foreign keys remain valid
- unique constraints are not violated
- balances cannot go below allowed thresholds if enforced properly

Important nuance: some business consistency rules are stronger than database constraints and must also be enforced in application logic.

#### Isolation

Isolation determines how concurrent transactions interact.

This is where many subtle bugs live.

#### Durability

Durability means that once commit succeeds, data survives process crash or machine crash, assuming the system's durability model is functioning correctly.

### 6.5 Isolation Levels

| Isolation level | Prevents | Still allows in many systems | Typical tradeoff |
|---|---|---|---|
| Read uncommitted | Very little | Dirty reads, non-repeatable reads, phantoms | Rarely used for correctness-sensitive work |
| Read committed | Dirty reads | Non-repeatable reads, phantoms | Common default in many databases |
| Repeatable read | Dirty reads, many non-repeatable reads | Phantom behavior depends on engine | Stronger consistency, more overhead |
| Serializable | Behaves like serial execution | None conceptually, but may abort conflicting transactions | Strongest correctness, lowest concurrency |

### 6.6 Read Anomalies

#### Dirty Read

Transaction A sees uncommitted data from transaction B. If B rolls back, A observed fiction.

#### Non-Repeatable Read

Transaction A reads a row twice and gets different values because another transaction committed a change between reads.

#### Phantom Read

Transaction A runs the same predicate twice and sees different sets of rows because another transaction inserted or deleted matching rows.

Interview point:

Do not memorize anomalies in isolation. Connect them to business impact, such as showing inconsistent account balance, inventory count, or list of eligible coupons.

### 6.7 Locking Basics

Databases often use locks or lock-like mechanisms to coordinate concurrent access.

Common ideas:

- shared locks for reading
- exclusive locks for writing
- row-level, page-level, or table-level scope depending on engine and operation

Stronger isolation often means more locking or more conflict detection.

### 6.8 Optimistic vs Pessimistic Locking

| Approach | Idea | Best when | Tradeoff |
|---|---|---|---|
| Optimistic locking | Assume conflicts are rare; detect them at commit/update time | High-read, lower-conflict systems | Retries needed on conflict |
| Pessimistic locking | Lock early to prevent conflicts | High-conflict or correctness-critical updates | Lower concurrency, risk of lock waits |

Example of optimistic locking:

- row has a `version`
- update succeeds only if `version` still matches expected value

This is common in SaaS applications where conflicts are possible but not constant.

### 6.9 Deadlocks

A deadlock happens when two transactions each wait for locks held by the other.

Example:

- transaction A locks row 1, then wants row 2
- transaction B locks row 2, then wants row 1

The database detects the cycle and aborts one transaction.

Best practices:

- access rows in consistent order
- keep transactions short
- avoid user interaction inside transactions
- retry safely on deadlock aborts

### 6.10 Distributed Transactions Basics

Transactions inside one database node are already complex. Across multiple services or databases, they are much harder.

Two-phase commit exists, but many large distributed systems avoid relying on it for core business flows because it adds latency, coordinator complexity, and difficult failure handling.

### 6.11 Saga Pattern Basics

The saga pattern breaks a distributed workflow into multiple local transactions plus compensating actions.

Example:

1. reserve inventory
2. create order
3. charge payment
4. if payment fails, release inventory and cancel order

This is not the same as a single ACID transaction. It is a business-level consistency approach.

Production tradeoff:

- better scalability and service autonomy
- more complex failure and compensation logic
- eventual consistency windows must be handled explicitly

### 6.12 Real-World Production Tradeoffs

Stripe-like money systems often keep the critical ledger in a strongly consistent relational model because correctness matters more than raw write scale. A social feed or notification counter may accept eventual consistency and use cheaper distributed storage patterns.

That contrast is the essence of storage design: correctness requirements decide architecture.

## 7. Normalization

Normalization exists to reduce redundancy and protect data quality.

It is often taught as academic normal forms, but in production it is a practical engineering tool for controlling duplication and update pain.

### 7.1 Why Normalization Exists

Imagine storing customer email in every order row. If the customer changes email, old and new values may diverge across orders. Now your data is inconsistent.

Normalization helps separate facts so each fact is stored in one canonical place when appropriate.

### 7.2 Redundancy Problems and Anomalies

| Problem | Meaning | Example |
|---|---|---|
| Update anomaly | Same fact must be updated in many places | Customer address copied into many rows |
| Delete anomaly | Deleting one record accidentally loses another fact | Deleting last order removes only stored customer metadata |
| Insert anomaly | Cannot insert one fact without another unrelated fact | Cannot create product record until there is an order |

### 7.3 1NF, 2NF, 3NF

#### First Normal Form (1NF)

Store atomic values, not repeating groups stuffed into one field.

Bad:

- `phone_numbers = '123,456,789'`

Better:

- separate rows or related table for phone numbers

#### Second Normal Form (2NF)

For tables with composite keys, non-key attributes should depend on the whole key, not just part of it.

#### Third Normal Form (3NF)

Non-key attributes should depend on the key, the whole key, and nothing but the key. In practice, this means avoid storing derived or indirectly dependent facts in the same table when they belong elsewhere.

### 7.4 When to Stop Normalizing

Normalize until:

- duplication is controlled
- invariants are clear
- common writes stay manageable
- queries are still practical

Do not normalize forever just to satisfy theory. If a read path becomes too join-heavy or too expensive, denormalization may be the right decision.

### 7.5 Denormalization Tradeoffs

Denormalization means intentionally duplicating data to improve read performance, reduce joins, or simplify query paths.

Examples:

- storing `customer_name` on an order snapshot
- storing aggregate counters on a parent row
- materializing a feed or timeline table for fast reads

Benefits:

- faster hot-path reads
- simpler queries
- lower join cost

Costs:

- duplicate data can drift
- writes become more complex
- backfills and migrations become harder
- multiple representations must stay consistent

### 7.6 Normalization vs Denormalization

| Dimension | Normalized design | Denormalized design |
|---|---|---|
| Write correctness | Easier to enforce | More application logic required |
| Read performance | Can require joins | Often faster for hot reads |
| Storage usage | Lower duplication | Higher duplication |
| Evolution | Cleaner canonical model | Backfills and consistency harder |
| Best for | Core transactional records | High-volume read models, feeds, dashboards |

Interview framing:

Strong answers usually say: start normalized for correctness, then denormalize specific read paths once measurement shows it is necessary.

## 8. NoSQL Databases

NoSQL does not mean "no structure" or "always better at scale". It broadly refers to storage systems that do not center the relational table model and SQL-style joins/transactions as the default abstraction.

### 8.1 Why NoSQL Exists

NoSQL systems grew because many modern workloads stressed relational databases in specific ways:

- enormous write throughput
- horizontal scaling across many nodes
- flexible or nested data models
- low-latency key-based access
- globally distributed workloads
- looser consistency acceptable for some use cases

Relational databases are powerful, but not every workload needs joins and multi-row ACID transactions. Some systems care more about partition tolerance, write availability, or schema flexibility.

### 8.2 CAP Theorem Basics

CAP is often explained badly.

In the presence of a network partition, a distributed system can prioritize:

- consistency: all nodes reflect the same latest view before responding
- availability: every request receives a response even if some responses are stale

Partition tolerance is not optional in distributed systems that span nodes and networks. The real question during partition is how the system behaves.

Practical takeaway:

- money transfer systems often prefer stronger consistency
- shopping carts, timelines, caches, and analytics may prefer availability or eventual consistency

### 8.3 Eventual Consistency Basics

Eventual consistency means replicas may temporarily disagree, but if no new writes happen, they converge eventually.

This is acceptable when:

- temporary staleness is tolerable
- conflict resolution is possible
- the business experience can absorb delayed convergence

Examples:

- social like counts
- follower counts
- recommendation data
- session caches in some architectures

### 8.4 Schema Flexibility

Some NoSQL systems let records vary more freely than rigid relational schemas.

This helps when:

- different entities have different optional fields
- product requirements evolve rapidly
- nested document structures fit the domain naturally

But schema flexibility is not free. Uncontrolled flexibility leads to inconsistent documents, messy queries, and difficult validation.

### 8.5 SQL vs NoSQL Comparison

| Dimension | Relational / SQL | NoSQL |
|---|---|---|
| Data model | Tables with relations | Key-value, document, wide-column, graph |
| Transactions | Usually strong and mature | Often limited, scoped, or workload-dependent |
| Schema | Strongly defined | Flexible or query-driven depending on system |
| Joins | Native and powerful | Often limited or avoided |
| Horizontal scale | Possible but operationally harder | Often a primary design goal |
| Best workloads | Transactional business systems | Large-scale distributed access patterns and specialized models |
| Risk | Scaling and write hotspots if misused | Weaker guarantees or harder query flexibility if misapplied |

### 8.6 When Not to Use NoSQL

Do not choose NoSQL just because it sounds scalable.

Avoid it when you need:

- rich multi-table queries
- strong relational integrity
- complex transactional workflows
- frequent ad hoc reporting over structured business data

A surprising number of early-stage products are best served by a relational database plus caching, not by premature polyglot persistence.

## 9. Key-Value Databases

Key-value databases store data as a mapping from key to value.

That sounds simple because it is. Their power comes from doing simple access patterns extremely well.

### 9.1 Data Model

- key: unique identifier
- value: blob, string, JSON, serialized object, counter, etc.

You usually ask for data by exact key. There is often little or no join logic.

### 9.2 Why They Exist

They are optimized for:

- extremely fast reads and writes
- simple lookup patterns
- horizontal distribution
- caching and ephemeral state

### 9.3 Common Use Cases

| Use case | Why key-value fits |
|---|---|
| Cache | Same key requested repeatedly, low-latency reads matter |
| Session storage | Retrieve session by session id |
| Rate limiting | Increment counters by user or API key |
| Feature flags | Lookup flag set by environment or tenant |
| Idempotency keys | Check if request key has been seen before |

### 9.4 Distributed Cache Basics

Systems like Redis-style stores are often used as distributed caches sitting in front of primary databases.

Why this matters:

- many reads are repetitive
- memory is much faster than disk-backed database access
- caching can protect the primary database during traffic spikes

Common patterns:

- cache-aside: app reads cache, falls back to DB, then populates cache
- write-through: writes update cache and backing store together
- write-behind: cache writes asynchronously to backing store

### 9.5 TTL Concepts

TTL means time to live. After the TTL expires, the item is evicted or treated as absent.

Useful for:

- sessions
- temporary tokens
- API response caching
- rate-limit windows

TTL prevents stale ephemeral data from living forever.

### 9.6 Eviction Strategies Basics

When cache memory is full, items must be evicted.

Common policies:

- LRU: least recently used
- LFU: least frequently used
- TTL-driven expiration
- random or approximate policies in some implementations

Tradeoff:

cache design is about picking what is okay to lose.

### 9.7 Internal Working Model

Many key-value systems rely on:

- in-memory hash tables for fast lookups
- append-only logs or snapshots for durability
- replication for availability
- partitioning across nodes for scale

If durability matters, do not assume an in-memory store is safe by default. Persistence configuration matters.

### 9.8 Failure Cases and Mistakes

- treating cache as the source of truth
- not handling cache stampedes when many requests miss simultaneously
- using unbounded key cardinality and blowing memory
- forgetting TTLs for ephemeral values
- assuming multi-key transactions behave like relational ACID systems

Production note:

Netflix-like and large SaaS architectures frequently use aggressive caching layers because keeping all read pressure on the relational core does not scale well enough.

## 10. Document Databases

Document databases store data as self-contained documents, often JSON-like.

They work well when the application naturally thinks in aggregates rather than heavily normalized relationships.

### 10.1 Data Model

Example document:

```json
{
  "orderId": "ord_123",
  "customerId": "cus_42",
  "shippingAddress": {
    "city": "Seattle",
    "country": "US"
  },
  "items": [
    { "sku": "A1", "qty": 2, "price": 20 },
    { "sku": "B9", "qty": 1, "price": 50 }
  ],
  "status": "paid"
}
```

This is attractive because the whole order can be loaded and stored together.

### 10.2 Why Document Databases Exist

They exist for workloads where:

- entities are naturally hierarchical or nested
- schema evolves frequently
- a whole aggregate is often read or written together
- joins are limited or can be avoided

### 10.3 Schema Flexibility

Schema flexibility is helpful during rapid product iteration, but mature teams still impose structure through validation, conventions, or schema tooling.

Otherwise, the same concept ends up with multiple shapes across documents, and queries become painful.

### 10.4 Indexing Considerations

Document databases still need indexes. Flexibility does not eliminate query planning.

Common indexed fields:

- tenant id
- status
- created_at
- nested fields used in filters

Mistake:

storing large nested documents but frequently querying on many scattered nested fields without planning index cost.

### 10.5 Querying Patterns

Document databases are strongest when queries align with document boundaries.

Good fit:

- fetch profile by id
- fetch order by order id
- fetch products in category with a few indexed filters

Weaker fit:

- many complex joins across many collections
- arbitrary relational analytics
- cross-document transactions on hot paths if the system handles them poorly

### 10.6 Denormalization Patterns

Document databases often encourage denormalized aggregates.

Example:

- store shipping address snapshot inside the order document
- embed small child objects that are usually fetched together

Good when:

- aggregate boundaries are clear
- updates are localized
- read patterns match whole-document access

Bad when:

- the embedded data changes independently at high frequency
- many documents must be updated for one logical change

### 10.7 When Document Databases Work Well

- content management systems
- product catalogs with variable attributes
- user profiles with optional nested settings
- event metadata or semi-structured application state

Public-pattern intuition:

Uber-style dynamic entities, marketplace metadata, or fast-evolving product features often push teams toward document-style models in some subsystems because strict relational modeling can become cumbersome for highly variable payloads.

## 11. Wide-Column Databases

Wide-column databases are built for high write throughput, large-scale distribution, and query models that are designed around known access patterns.

Examples include Cassandra-style systems.

### 11.1 Column Family Model

These systems are not "SQL tables with many columns". They typically organize data by partition key and clustering columns, storing rows together by partition.

The design philosophy is often: model your tables around how queries will be executed, not around a perfectly normalized domain model.

### 11.2 Partition Key and Clustering Key

| Concept | Role |
|---|---|
| Partition key | Decides which node stores the data |
| Clustering key | Decides ordering within a partition |

Example mental model:

- partition by `user_id`
- cluster by `event_time`

This makes "recent events for user X" very efficient.

### 11.3 Why They Exist

Wide-column systems are designed for:

- large write-heavy workloads
- globally distributed or multi-node availability
- time-series, logs, events, and message-like data
- predictable query-first schemas

### 11.4 Internal Working Model

Many wide-column systems use LSM-tree style storage:

- writes go first to a commit log for durability
- data is accumulated in memory structures such as memtables
- flushed to immutable disk files such as SSTables
- background compaction merges files over time

Why this helps:

- writes are fast and mostly sequential
- good for high-ingest workloads

Tradeoff:

- reads may touch multiple files until compaction catches up
- compaction can be operationally expensive

### 11.5 Consistency Tradeoffs

These systems often let you tune consistency per operation.

You may choose stronger or weaker consistency depending on how many replicas must acknowledge a read or write.

This is powerful, but it means the application team must actually understand the tradeoff.

### 11.6 Modeling Around Queries First

Wide-column systems punish generic modeling.

You usually design tables specifically for questions such as:

- get latest 100 events for device X
- get messages for conversation Y ordered by timestamp
- get metrics for service Z over last hour

If you need arbitrary secondary queries later, the design may not support them well.

### 11.7 Best Use Cases

- event streams and activity histories
- IoT and telemetry ingestion
- time-series style data
- write-heavy systems that must survive node failure

Public-pattern example:

Netflix-like metadata and high-availability distributed services often favor Cassandra-style patterns for workloads where availability and write scalability are more important than relational joins.

## 12. Graph Databases

Graph databases model data as nodes and edges.

They exist because some domains are fundamentally about relationships, not just records.

### 12.1 Why Graph Databases Exist

Relational databases can represent relationships, but some workloads involve repeated, deep traversal through interconnected entities.

Examples:

- social graph: who follows whom
- fraud graph: account, device, card, IP, merchant relationships
- recommendation graph: users, items, interactions, similarity edges
- network topology or dependency mapping

### 12.2 Nodes and Edges

- nodes represent entities
- edges represent relationships

Edges can also have properties such as weight, type, or timestamp.

### 12.3 Traversal Efficiency

Graph systems are optimized for traversals like:

- friends of friends
- accounts connected through shared devices
- shortest path
- subgraph pattern matching

Why they can outperform SQL here:

- repeated multi-hop joins in relational systems become cumbersome and expensive
- graph engines often store adjacency information more directly for traversal workloads

### 12.4 When Graph Databases Are Better Than SQL Joins

Graph databases shine when:

- the core questions are relationship-first
- traversal depth is variable or multi-hop
- graph algorithms are central to the product

They are not automatically better for simple CRUD over business entities. A graph database is not a replacement for every transactional workload.

### 12.5 Typical Use Cases

- fraud detection pipelines
- recommendation engines
- social features
- dependency impact analysis

Production reality:

Many companies do not use a graph database as the primary source of truth. They maintain transactional data in relational or document stores and project relationship-heavy subsets into graph-oriented systems.

## 13. Scaling Databases

As traffic or data volume grows, a single database instance eventually becomes a bottleneck.

Scaling databases is hard because unlike stateless application servers, databases hold durable state, enforce consistency, and often cannot be copied or split casually.

### 13.1 Replication

Replication means keeping multiple copies of data across nodes.

#### Why Replication Exists

- improve availability
- protect against node failure
- scale reads
- support backups or region-level redundancy

#### Primary-Replica Model

In a common setup:

- primary handles writes
- replicas receive data changes from primary
- replicas may serve reads

```mermaid
flowchart LR
	APP[Application] --> P[(Primary)]
	P --> R1[(Replica 1)]
	P --> R2[(Replica 2)]
	P --> R3[(Replica 3)]
	R1 --> READS1[Read Traffic]
	R2 --> READS2[Read Traffic]
```

#### Synchronous vs Asynchronous Replication

| Mode | Behavior | Benefit | Cost |
|---|---|---|---|
| Synchronous | Primary waits for replica ack before success | Stronger consistency | Higher write latency, lower availability if replicas slow |
| Asynchronous | Primary commits before replicas catch up | Lower latency, higher write availability | Replica lag and stale reads |

#### Replication Lag

Lag means replicas are behind the primary.

Consequences:

- stale reads
- broken read-after-write expectations
- confusing user experiences such as "I just updated my profile but do not see it"

#### Failure Handling and Leader Election Basics

If the primary fails, the system may promote a replica to become the new leader.

Leader election matters because multiple primaries accepting writes independently can cause divergence.

#### Split-Brain Basics

Split-brain happens when multiple nodes believe they are primary and accept conflicting writes.

This is one of the most dangerous failure modes in distributed storage.

Strong coordination, quorum logic, and careful failover processes exist largely to prevent this.

### 13.2 Read Replicas

Read replicas are a specific use of replication for read scaling.

#### Why They Exist

Many systems are read-heavy. Product pages, user profiles, repository metadata, timelines, dashboards, and catalog views are often read far more than written.

Read replicas let the system distribute read load away from the primary.

#### Routing Reads vs Writes

Common policy:

- writes always go to primary
- stale-tolerant reads can go to replicas
- consistency-sensitive reads stay on primary or sticky session path

#### Read-After-Write Consistency Issues

If a user writes to the primary and immediately reads from a lagging replica, they may not see their update.

Production strategies:

- route same-user reads to primary for a short window after write
- use session stickiness
- check replica lag and avoid stale replicas for sensitive endpoints
- use version or timestamp based consistency logic

GitHub-like or SaaS admin dashboards often need this because users expect immediate visibility after editing settings.

### 13.3 Sharding

Sharding means splitting data horizontally across multiple databases so one node does not hold or serve everything.

#### Why Sharding Exists

Replication helps availability and read scale, but it does not solve the core problem that one primary still handles all writes and all data volume.

Sharding is how systems scale beyond one machine's storage, CPU, memory, or write throughput limits.

#### Sharding Strategy Diagram

```mermaid
flowchart TD
	REQ[Application Request] --> ROUTER[Shard Router]
	ROUTER --> S1[(Shard 1: users 1-1M)]
	ROUTER --> S2[(Shard 2: users 1M-2M)]
	ROUTER --> S3[(Shard 3: users 2M-3M)]
	ROUTER --> S4[(Shard 4: users 3M+)]
```

#### Shard Key Selection

The shard key decides where data goes.

Good shard keys:

- distribute traffic evenly
- align with common access patterns
- minimize cross-shard queries

Bad shard keys:

- create hotspots
- force frequent cross-shard fanout
- are hard to rebalance later

Example:

user-based sharding often works well when most queries are scoped to one user or tenant.

#### Hotspot Problems

Even a seemingly good shard key can fail if traffic is skewed.

Examples:

- one celebrity user receives disproportionate traffic
- one tenant is much larger than all others
- sequential keys route many recent writes to the same shard

#### Rebalancing Challenges

As shards fill unevenly, data must be moved.

This is difficult because:

- data movement is expensive
- routing metadata changes must be coordinated
- in-flight traffic must still work during migration
- hot shards often need urgent relief without causing downtime

#### Consistent Hashing Basics

Consistent hashing reduces remapping when nodes are added or removed.

It is popular in distributed caches and some storage systems because it minimizes movement compared to naive modulo-based partitioning.

#### Cross-Shard Query Problems

Queries that need data from multiple shards become expensive because the application or middleware must:

- fan out requests
- merge results
- handle partial failures
- coordinate ordering and pagination

#### Cross-Shard Transactions

These are much harder than local transactions.

Many systems avoid them through:

- careful data ownership boundaries
- asynchronous workflows
- per-tenant or per-user consistency scopes

#### Operational Complexity

Sharding solves scale problems by introducing operational problems:

- routing layer complexity
- uneven load distribution
- resharding complexity
- harder debugging and analytics
- backup and restore per shard

Public-pattern intuition:

Uber-like marketplace platforms and GitHub-scale multi-tenant systems often end up with some form of sharding because a single relational instance eventually stops being enough for global growth.

### 13.4 Partitioning

Partitioning means splitting data into logical chunks. It can exist inside one database system or across multiple nodes.

#### Vertical Partitioning

Split by columns or functionality.

Example:

- user auth data in one database
- user analytics profile data in another

Useful when different parts of the entity have different access, sensitivity, or scaling profiles.

#### Horizontal Partitioning

Split by rows.

Example:

- users 1-1M in one partition
- users 1M-2M in another

This overlaps conceptually with sharding, but partitioning may happen within a single database product while sharding usually implies distribution across multiple database instances or clusters.

#### Range Partitioning

Partition rows by ranges, such as date ranges.

Great for:

- time-based data
- archival and pruning

Risk:

- recent range can become a hotspot

#### Hash Partitioning

Use a hash of the key to spread rows more evenly.

Great for:

- balancing load

Tradeoff:

- makes range queries less natural

#### List Partitioning

Partition by specific values such as region or tenant tier.

#### Time-Based Partitioning

Very common for logs, events, analytics, and audit tables.

Why it matters:

- easier retention policies
- easier archival
- partition pruning improves query efficiency

#### Partition Pruning

Partition pruning means the database skips partitions that cannot contain relevant data.

Example:

If a query asks for logs from last 7 days, the engine should avoid scanning last year's partitions.

#### Archival Strategies

Old partitions can be:

- moved to cheaper storage
- exported to object storage or warehouse
- dropped based on retention policy

### 13.5 Partitioning vs Sharding

| Dimension | Partitioning | Sharding |
|---|---|---|
| Scope | Usually logical split within a system or table family | Usually split across multiple independent database instances or clusters |
| Goal | Manageability, pruning, performance | Capacity scale beyond one node and distribution of traffic |
| Operational cost | Lower | Higher |
| Query complexity | Often handled by one database engine | Often needs routing and cross-shard logic |

### 13.6 Replication vs Sharding

| Dimension | Replication | Sharding |
|---|---|---|
| Primary purpose | Availability and read scaling | Write scaling and data volume scaling |
| Data on each node | Mostly same dataset copies | Different subsets of data |
| Main challenge | Staleness and failover | routing and cross-shard complexity |

## 14. Backups

Backups exist because high availability is not the same as recoverability.

Replication protects against node failure. It does not protect against every kind of bad write, accidental delete, corrupt deployment, or operator mistake. Replicas happily replicate mistakes too.

### 14.1 Why Backups Matter

You need backups for:

- accidental deletion
- data corruption
- ransomware or security incidents
- region loss
- bad migrations
- operator error

### 14.2 Types of Backups

| Type | Meaning | Best for |
|---|---|---|
| Snapshot | Point-in-time copy of storage volume or dataset | Fast restore of large datasets |
| Logical backup | Export of schemas and rows, often SQL dump style | Portability and selective restore |
| Physical backup | Raw database files or engine-level backup | Faster full restores, engine-specific |

### 14.3 Point-in-Time Recovery (PITR)

PITR combines a base backup with transaction logs so you can restore to a specific moment.

This is critical when the problem is not "disk died" but "bad deployment deleted good data at 11:07 AM".

### 14.4 Restore Testing

Backups that have never been restored are assumptions, not backups.

This is one of the most important real-world lessons in storage engineering.

Common failure modes:

- backup files exist but are corrupted
- restore time is far too slow for business requirements
- missing transaction logs make PITR impossible
- permissions or secrets needed for restore are unavailable
- application code no longer works with restored data layout

### 14.5 Backup Verification and Retention

Best practices:

- verify backups automatically
- test restore procedures regularly
- define retention policy by business and compliance needs
- keep copies in separate failure domains
- document RPO and RTO

Where:

- RPO: how much data loss is acceptable
- RTO: how long recovery may take

### 14.6 Disaster Recovery Basics

Disaster recovery is broader than backups. It includes:

- alternate region strategy
- restore runbooks
- DNS or traffic failover
- infrastructure recreation
- dependency recovery order

Interview point:

If you mention backups but not restore testing, you are leaving the operational story incomplete.

## 15. Migrations

Migrations are how storage evolves without breaking production.

This is where many outages come from because schema changes are easy in development and dangerous in live systems with rolling deployments and old code versions still running.

### 15.1 Schema Migrations

Schema migrations change database structure:

- add column
- remove column
- add table
- create index
- rename field
- change data type

### 15.2 Backward-Compatible Migrations

In production, application and database must remain compatible during rollout.

Safe default rule:

- first make schema changes that old and new code can both tolerate
- then deploy code that uses the new schema
- only later remove old structures

### 15.3 Expand-and-Contract Pattern

This is a standard zero-downtime migration pattern.

1. Expand: add new schema elements without removing old ones.
2. Dual-write or backfill data if needed.
3. Switch reads to the new structure.
4. Contract: remove old columns or tables only after full cutover.

### 15.4 Zero-Downtime Migration Example

Suppose you want to rename `full_name` to `display_name`.

Unsafe approach:

- drop `full_name`
- add `display_name`
- deploy code

Safe approach:

1. Add `display_name`.
2. Deploy code that writes both fields and can read either.
3. Backfill old rows.
4. Switch reads fully to `display_name`.
5. Remove `full_name` later.

### 15.5 Data Migrations

Data migrations move or transform actual data, not just schema.

Risks:

- long-running locks
- replication lag
- huge write amplification
- application reading partially migrated state

Best practice:

- run in batches
- make the process idempotent
- measure lag and throughput
- provide rollback or pause controls

### 15.6 Rolling Deployments Impact

During rolling deploys, old and new app versions may run simultaneously.

That means:

- database changes must tolerate mixed versions
- writes must remain compatible from both versions
- feature flags often help separate schema rollout from behavior rollout

### 15.7 Rollback Strategies

Rollback is easy only before destructive steps.

Strong teams design migrations so that:

- code can be rolled back without immediate schema breakage
- destructive changes are delayed until confidence is high
- backfills can resume safely

### 15.8 Migration Failure Scenarios

| Failure | Why it happens | Mitigation |
|---|---|---|
| Table lock outage | Large DDL on hot table | Use online schema change strategy or maintenance plan |
| Replica lag spike | Backfill or index build overwhelms replication | Throttle migration, observe lag |
| Mixed-version incompatibility | New code expects schema that old DB path lacks | Expand first, contract later |
| Partial backfill | Job crashed midway | Idempotent batch processing and progress tracking |
| Rollback impossible | Old columns already dropped | Delay destructive cleanup |

Production note:

At companies like GitHub, Stripe, Amazon, and large SaaS platforms, safe migration discipline is not optional. It is core operational hygiene.

## 16. How These Systems Connect in Real Architectures

The strongest backend designs do not ask, "SQL or NoSQL?" They ask, "Which storage model belongs to which workload?"

### 16.1 Typical SaaS Architecture

```mermaid
flowchart LR
	U[User Request] --> API[API Service]
	API --> AUTH[(Relational DB: Users / Billing / Permissions)]
	API --> CACHE[(Redis-style Cache)]
	API --> SEARCH[(Search / Indexing Store)]
	API --> EVENTS[(Event Stream)]
	EVENTS --> ANALYTICS[(Warehouse / OLAP)]
	EVENTS --> READMODEL[(Denormalized Read Models)]
```

What lives where:

- relational DB for core source-of-truth entities
- cache for hot reads and sessions
- event stream for async workflows and fanout
- warehouse for large analytical queries
- search index for text retrieval
- denormalized read models for dashboards or feeds

### 16.2 Example System Patterns

#### Google-style lesson

When you need globally distributed SQL with strong consistency, a system like Spanner demonstrates that it is possible, but expensive and operationally sophisticated. The lesson is not "always use globally consistent SQL". The lesson is that strong consistency at global scale requires serious infrastructure investment.

#### Netflix-style lesson

Large-scale streaming and metadata systems often rely heavily on caches and wide-column stores because read volume and availability requirements are enormous. The lesson is that relational correctness is not the only objective; availability and latency matter too.

#### Uber-style lesson

Marketplace systems deal with geospatial lookups, event streams, denormalized views, high write rates, and many asynchronous workflows. The lesson is that one storage engine rarely fits all workloads inside the same product.

#### Amazon-style lesson

The Dynamo lineage shows how high availability, partition tolerance, and simple key-based access can drive very different storage choices than a classic relational core. The lesson is to design around failure and scale from the start when the workload demands it.

#### GitHub-style lesson

Developer platforms still rely heavily on relational storage for core entities, permissions, and consistency-sensitive workflows, but they complement it with replicas, caches, partitioning, and background jobs. The lesson is that relational databases remain central even at significant scale.

#### Stripe-style lesson

Money movement systems keep the ledger and transactional truth in strongly consistent storage, then derive downstream analytics and reporting elsewhere. The lesson is that the source of truth must match the strictest correctness requirement, not the most convenient scale story.

## 17. Common Interview Discussion Themes

### 17.1 SQL vs NoSQL

Good answer structure:

1. Start with access patterns and correctness needs.
2. Explain why relational is usually default for business workflows.
3. Explain when NoSQL wins because of data shape or scale pattern.
4. Mention that many real systems use both.

### 17.2 Replication vs Sharding

Good answer:

- replication copies the same data for availability and read scaling
- sharding splits the data to scale writes and storage
- most large systems eventually use both

### 17.3 Normalization vs Denormalization

Good answer:

- normalize source-of-truth entities for correctness
- denormalize proven hot read paths for performance
- discuss consistency maintenance and backfill cost

### 17.4 Transactions in Microservices

Good answer:

- keep strong ACID boundaries local when possible
- avoid distributed transactions for every workflow
- use sagas or asynchronous orchestration where acceptable
- discuss idempotency and compensating actions

### 17.5 What Breaks at Scale

Interviewers often want to hear these failure modes:

- hot keys and hot partitions
- replica lag and stale reads
- slow queries due to missing indexes
- lock contention and deadlocks
- painful cross-shard operations
- unsafe migrations
- backups that have never been restored

If you can talk about those fluently, you sound much more like a production engineer.

## 18. Best Practices Checklist

- Start from access patterns, not technology hype.
- Use relational databases by default for transactional business systems unless the workload clearly argues otherwise.
- Design indexes from real queries, not guesswork.
- Keep transactions short and explicit.
- Normalize core data first; denormalize intentionally, not accidentally.
- Assume replica lag exists and design around it.
- Choose shard keys carefully because changing them later is painful.
- Separate OLTP from heavy analytics.
- Treat backup restore testing as a real engineering responsibility.
- Use backward-compatible, zero-downtime migration patterns.
- Measure real bottlenecks before introducing new storage technologies.

## 19. Common Mistakes

- choosing NoSQL too early for simple SaaS CRUD workloads
- using one database for transactions, analytics, search, caching, and feeds all at once
- adding indexes everywhere without understanding write cost
- building schemas around current ORM classes instead of durable domain boundaries
- ignoring replication lag in user experience design
- trying to solve bad query design with bigger hardware forever
- denormalizing without a plan for consistency repair
- sharding before exhausting simpler options
- believing replicas are backups
- running destructive migrations without compatibility strategy

## 20. Final Mental Model

Think about storage decisions in this order:

1. What are the core entities and invariants?
2. What are the hottest reads and writes?
3. What consistency guarantees are actually required?
4. What data model matches the workload best?
5. How will indexes, replication, caching, and partitioning support that workload?
6. What fails first as traffic grows?
7. How will you back up, restore, and migrate safely?

If you can answer those questions clearly, you can usually handle both interview discussions and real production design work.

The most important practical lesson is simple: data storage is not just about where bytes live. It is about preserving correctness while delivering latency, scale, and operability under failure.