tarun-elango 26810e43d0 sd text
2026-04-26 13:27:19 -04:00

Performance Layer

The performance layer is the part of system design that turns a system from merely correct into fast, cost-efficient, and resilient under real traffic. In interviews, candidates often describe databases, queues, and load balancers, but they stop short of explaining how the system stays responsive when one hot endpoint receives millions of requests, when one viral product page suddenly becomes the top key in the fleet, or when static assets must be served globally at low latency.

That gap is the performance layer.

At a practical level, the performance layer is made of techniques and systems that sit between raw demand and expensive work. It includes in-process caches, distributed caches like Redis or Memcached, CDNs, edge caches, TTL strategy, eviction policy, and invalidation mechanisms. These components exist because real backends cannot afford to recompute, reread, or retransmit everything from the source of truth on every request.

This guide is written for two goals at once:

  • interview preparation, where you need to explain tradeoffs clearly
  • real backend engineering, where you need to build systems that remain stable at scale

The focus is not on memorizing buzzwords. The focus is understanding why these systems exist, how they work internally, what breaks at scale, and how to discuss them like an engineer who has operated them.

Examples in this guide are generalized from common industry patterns and public engineering discussions from companies such as Google, Netflix, Uber, Amazon, GitHub, Stripe, and large SaaS platforms.

1. Big Picture: What the Performance Layer Actually Does

Every backend has some expensive path:

  • reading from a database
  • computing personalized recommendations
  • rendering a large page or API payload
  • resizing images
  • loading product catalog data
  • checking permissions repeatedly
  • serving static assets to users across continents

If every request goes all the way to the expensive source, the system eventually fails one of these goals:

  • latency becomes too high
  • database load becomes too high
  • compute cost becomes too high
  • throughput becomes too low
  • spikes become unmanageable

The performance layer exists to absorb repeated work.

1.1 Performance Layer in a Real Architecture

flowchart LR
	U[User / Browser] --> B[Browser Cache]
	B --> DNS[DNS / Global Routing]
	DNS --> CDN[CDN Edge]
	CDN --> LB[Load Balancer / API Gateway]
	LB --> APP[Application Service]
	APP --> LC[Local In-Process Cache]
	APP --> RC[(Redis / Memcached)]
	APP --> DB[(Primary Database)]
	DB --> REP[(Read Replicas / Search / Derived Stores)]
	DB --> BUS[Event Bus / Invalidation Stream]
	BUS --> RC
	BUS --> CDN

This diagram captures an important idea: performance is not a single cache. It is a stack.

  • the browser may cache assets or API responses
  • the CDN may answer without touching your origin
  • the application may use local in-memory caching for extremely hot keys
  • a distributed cache may prevent repeated database reads
  • invalidation events keep caches aligned with changing data

1.2 Why Interviewers Care About This Layer

Interviewers use performance-layer questions to test whether you understand the difference between a toy design and a production design.

A weak answer sounds like this:

"I would use a database and maybe Redis for caching."

A strong answer sounds like this:

"I expect read-heavy traffic with hot keys, so I would place a distributed cache in front of the database, possibly keep a small local cache inside each service instance for ultra-hot objects, use TTL plus event-driven invalidation to balance freshness and load, and put static assets behind a CDN to reduce origin traffic globally. Then I would discuss what happens during cache misses, stampedes, stale reads, and regional failover."

That answer shows system-level thinking.

1.3 Core Mental Model

The performance layer trades one form of complexity for another.

It improves:

  • latency
  • throughput
  • cost
  • resilience to spikes

But it introduces:

  • stale data risk
  • invalidation complexity
  • memory pressure
  • partial failure modes
  • operational tuning work

If you remember one sentence from this guide, remember this:

Performance systems are valuable because they turn expensive operations into cheap lookups, but they make correctness and freshness harder.

2. Caching Fundamentals

Caching is the core concept of the performance layer.

2.1 What Caching Is

A cache stores the result of an expensive operation so that future requests can avoid doing the expensive work again.

That expensive work might be:

  • a database query
  • an API response assembly step
  • a rendered HTML fragment
  • a permission lookup
  • a session lookup
  • a computed leaderboard
  • a static file fetch from origin

The basic idea is simple, but the system effect is huge. If a result is requested many times, caching changes the system from repeatedly paying the full cost to paying it once and reusing it.

2.2 Why Caching Exists

| Goal | What caching improves | Real production effect |
| --- | --- | --- |
| Reduce latency | Serves data from memory or from a nearby edge | Reads served in roughly a millisecond from an in-memory cache may take tens or hundreds of milliseconds from a database or remote service |
| Reduce database load | Prevents repeated reads for the same objects | Fewer DB connections, less lock contention, lower CPU, fewer read replicas |
| Reduce compute cost | Avoids re-running expensive business logic or rendering | Lower application CPU and lower cloud spend |
| Improve scalability | Lets the same backend support more traffic | A backend that saturates at 20k RPS can serve roughly 10x more traffic if about 90 percent of reads hit cache |
| Smooth spikes | Absorbs bursts on hot keys | Viral traffic becomes manageable instead of overwhelming the source of truth |

2.3 Hot Paths and Hot Data

Most traffic is not evenly distributed.

In real systems, a small fraction of endpoints, users, products, or assets often receives a large fraction of total traffic. This is why caching is so effective.

Examples:

  • an e-commerce homepage and a few trending product pages get disproportionate reads
  • a GitHub repository landing page gets far more traffic after a release or public announcement
  • a Stripe dashboard loads the same account metadata repeatedly during one user session
  • a ride-sharing system repeatedly reads nearby-driver state for a busy downtown area

This concentration of demand creates hot paths and hot data.

  • hot path: an endpoint or execution path hit extremely often
  • hot data: specific keys or objects requested repeatedly

Performance engineering often starts by identifying those hot spots and caching them before scaling the entire system blindly.

2.4 Cache Hit vs Cache Miss

| Term | Meaning | Why it matters |
| --- | --- | --- |
| Cache hit | The requested item is found in cache | Fast path, cheap path |
| Cache miss | The item is not in cache | Slow path, must fetch from source |
| Hit ratio | Fraction of requests served from cache | One of the main health metrics for a cache-backed system |
| Miss penalty | Extra latency and source load caused by a miss | Critical when the backend is expensive or fragile |

You should think of a cache miss as more than a slower request. A miss is also a source-of-truth request. At scale, misses multiply into pressure on databases, services, and storage layers.

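A 95 percent hit ratio means the source of truth sees only 5 percent of read traffic in steady state. If the cache is flushed or a deployment starts cold, that share can jump toward 100 percent, roughly a 20x increase in backend load, so the miss path determines whether the system stays alive.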
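The relationship between hit ratio, latency, and source load can be made concrete with a little arithmetic. The sketch below is a minimal model; the function names and the example numbers (1 ms hit, 50 ms miss, 100k RPS) are assumptions chosen for illustration, not measurements.

```python
def effective_latency_ms(hit_ratio: float, hit_ms: float, miss_ms: float) -> float:
    """Average latency when a fraction hit_ratio of requests is served from cache."""
    return hit_ratio * hit_ms + (1.0 - hit_ratio) * miss_ms


def origin_rps(total_rps: float, hit_ratio: float) -> float:
    """Requests per second that fall through to the source of truth."""
    return total_rps * (1.0 - hit_ratio)


if __name__ == "__main__":
    # Hypothetical numbers: 1 ms cache hit, 50 ms database read, 100k RPS total.
    for ratio in (0.99, 0.95, 0.0):
        print(ratio, effective_latency_ms(ratio, 1.0, 50.0), origin_rps(100_000, ratio))
```

Running the loop with your own numbers shows how quickly backend load grows as the hit ratio drops, which is the reason a cold cache is treated as an incident risk.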

2.5 Cache Lifecycle

flowchart LR
	REQ[Request Arrives] --> LOOKUP{Item In Cache?}
	LOOKUP -- Yes --> HIT[Return Cached Value]
	LOOKUP -- No --> LOAD[Load From Source]
	LOAD --> STORE[Store In Cache]
	STORE --> RESP[Return Response]
	HIT --> TTL[Wait Until TTL / Invalidation / Eviction]
	RESP --> TTL
	TTL --> EXPIRE[Item Expires or Is Removed]
	EXPIRE --> LOOKUP

This looks circular because it is. A cache is not a one-time optimization. It is a continuous lifecycle of population, use, staleness, and removal.

2.6 Cache Warming and Cold Starts

Cache warming means pre-populating a cache before traffic arrives or before a new deployment begins serving traffic.

Cold start means the cache is empty or mostly empty.

Why cold starts are dangerous:

  • traffic suddenly falls through to the database
  • p99 latency jumps sharply
  • a previously healthy backend gets overloaded
  • autoscaling can make it worse because new instances start with empty local caches

Typical warming strategies:

  • prefill the top N hot keys after deployment
  • replay recent hot key logs into the cache
  • keep a warm standby cache fleet during migration
  • gradually shift traffic to fresh instances

Production example:

An e-commerce system may warm the homepage, popular category pages, top products, and pricing metadata before turning on a new region. A SaaS dashboard may warm account summaries and permission maps for large enterprise tenants.
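The "prefill the top N hot keys" strategy can be sketched in a few lines. This is a minimal illustration that assumes the recent access log is available as a list of key strings and that the cache behaves like a dictionary; `top_hot_keys`, `warm_cache`, and `load_from_source` are hypothetical names.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List


def top_hot_keys(access_log: Iterable[str], n: int) -> List[str]:
    """Return the n most frequently requested keys from a recent access log."""
    return [key for key, _ in Counter(access_log).most_common(n)]


def warm_cache(cache: Dict[str, object], keys: Iterable[str],
               load_from_source: Callable[[str], object]) -> int:
    """Load each missing key from the source of truth; return how many were loaded."""
    loaded = 0
    for key in keys:
        if key not in cache:
            cache[key] = load_from_source(key)
            loaded += 1
    return loaded
```

In a real rollout the warming loop would usually be rate limited so that warming itself does not overload the database.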

2.7 Multi-Layer Caching

Most large systems use more than one cache layer.

| Layer | Where it lives | Strengths | Weaknesses |
| --- | --- | --- | --- |
| Browser cache | On the client | Zero origin cost, extremely low latency | Harder to invalidate precisely, less control |
| CDN cache | At edge POPs | Global latency reduction, origin offload | Works best for cacheable and moderately shared content |
| Local cache | Inside service process | Fastest server-side lookup, no network hop | Small, per-instance inconsistency, lost on restart |
| Distributed cache | Remote shared cache like Redis or Memcached | Shared across instances, larger capacity | Network hop, operational overhead |

Multi-layer caching exists because the cheapest cache is the one closest to the requester. The closer you can answer, the less network, compute, and backend work you do.

2.8 Local Cache vs Distributed Cache

| Dimension | Local Cache | Distributed Cache |
| --- | --- | --- |
| Latency | Extremely low | Low but includes network hop |
| Scope | Single process or instance | Shared across many instances |
| Consistency | Weak across instances | Better shared visibility |
| Capacity | Limited by process memory | Larger dedicated memory pool |
| Failure behavior | Lost on process restart | Survives app restarts but is its own dependency |
| Good for | Ultra-hot config, permission snapshots, small metadata | Shared sessions, popular entities, counters, rate limits |

Strong production designs often combine them. For example:

  • local cache for the 100 hottest objects per instance
  • Redis for shared hot data across the fleet
  • database for source of truth
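The three tiers above can be sketched as a single read path. This is a simplified model under stated assumptions: plain dictionaries stand in for the local cache and for Redis, there is no TTL or eviction, and `TieredReader` and `load_from_db` are hypothetical names.

```python
from typing import Callable, Dict, Optional


class TieredReader:
    """Look up a key in a local dict, then a shared cache, then the source of truth."""

    def __init__(self, shared_cache: Dict[str, str],
                 load_from_db: Callable[[str], Optional[str]]) -> None:
        self.local: Dict[str, str] = {}   # per-instance cache (stand-in for a small LRU)
        self.shared = shared_cache        # stand-in for Redis or Memcached
        self.load_from_db = load_from_db
        self.db_reads = 0

    def get(self, key: str) -> Optional[str]:
        if key in self.local:
            return self.local[key]
        value = self.shared.get(key)
        if value is None:
            self.db_reads += 1
            value = self.load_from_db(key)
            if value is None:
                return None
            self.shared[key] = value      # fill the shared tier for other instances
        self.local[key] = value           # promote into the local tier
        return value
```

Two `TieredReader` instances that share one `shared_cache` behave like two service instances: the first miss pays the database read, and the second instance finds the value in the shared tier.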

2.9 How Caching Changes Overall Architecture

Caching does not just make a request faster. It changes the control flow of the system.

Without caching:

  • every read goes to the source of truth
  • latency scales with database or service performance
  • spikes directly impact the backend

With caching:

  • reads split into hit path and miss path
  • miss handling becomes a core part of correctness
  • invalidation and freshness become first-class architecture concerns
  • partial outages can be hidden or amplified depending on cache behavior

That is why caching is never "just add Redis." It is an architectural decision that affects reads, writes, deployments, incident response, and observability.

2.10 Consistency Challenges and Stale Data Tradeoffs

Caches are usually not the source of truth. That means there is always a risk that the cache contains old data.

The key design question is not "Can stale data happen?" The answer is almost always yes.

The real questions are:

  • how stale can data be before users notice or correctness breaks
  • how quickly do updates need to propagate
  • what should happen if invalidation fails
  • which data can tolerate eventual consistency and which cannot

Examples:

  • stale profile photos are usually fine
  • stale inventory counts can cause overselling
  • stale permission or fraud state can become a security problem
  • stale pricing can create financial or legal issues

Strong engineers discuss stale data in business terms, not just technical terms.

2.11 Failure Patterns Every Interviewer Expects

| Failure pattern | What it means | What it looks like in production | Common mitigations |
| --- | --- | --- | --- |
| Cache stampede | Many requests miss the same key and all rebuild it | DB traffic spike after key expiry | Request coalescing, single-flight, locks, soft TTL, background refresh |
| Cache penetration | Requests repeatedly ask for keys that do not exist | Attack or bug causes endless DB misses | Negative caching, bloom filters, request validation |
| Cache avalanche | Many keys expire simultaneously | Sudden backend overload when a batch of TTLs ends together | TTL jitter, staggered expiration, warmup, traffic shaping |
| Hot key overload | One key becomes extremely popular | One Redis shard or service instance gets overloaded | Key replication, local cache, consistent hashing, hot key splitting |
| Stale data leak | Invalidation fails or is delayed | Users see old values after updates | Version checks, event-driven invalidation, bounded TTLs, read repair |

These are not interview-only concepts. They show up in real incidents.
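Of the mitigations listed for cache stampedes, single-flight (request coalescing) is the easiest to show in code. The sketch below is a minimal in-process version using threads; `SingleFlight` and `_Call` are hypothetical names, and a production version would also need timeouts and cross-instance coordination.

```python
import threading
from typing import Callable, Dict, Optional


class _Call:
    """State shared between the one caller that loads and the callers that wait."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: object = None
        self.error: Optional[BaseException] = None


class SingleFlight:
    """Ensure only one caller runs the loader for a key; others wait for its result."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._inflight: Dict[str, _Call] = {}

    def do(self, key: str, loader: Callable[[], object]) -> object:
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._inflight[key] = call
        if leader:
            try:
                call.result = loader()        # only the leader hits the source
            except BaseException as exc:
                call.error = exc
            finally:
                with self._lock:
                    del self._inflight[key]
                call.done.set()               # wake every waiting follower
        else:
            call.done.wait()
        if call.error is not None:
            raise call.error
        return call.result
```

With this in front of the miss path, a burst of concurrent misses on one key produces one source read instead of one per request.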

2.12 Best Practices and Common Mistakes

Best practices:

  • cache only where the access pattern justifies it
  • measure hit ratio, miss latency, key size, and source-of-truth fallback rate
  • define ownership of cache invalidation explicitly
  • use TTL jitter to avoid synchronized expiration storms
  • keep serialized values compact and stable
  • protect the miss path with rate limits, request coalescing, or backpressure
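TTL jitter is simple enough to show directly. This sketch assumes a uniform spread of plus or minus 10 percent around the base TTL; the function name and the default fraction are illustrative choices, not a standard.

```python
import random
from typing import Optional


def jittered_ttl(base_ttl_s: float, jitter_fraction: float = 0.1,
                 rng: Optional[random.Random] = None) -> float:
    """Return base_ttl_s shifted by a random amount within +/- jitter_fraction."""
    source = rng if rng is not None else random
    spread = base_ttl_s * jitter_fraction
    return base_ttl_s + source.uniform(-spread, spread)
```

Keys written in the same batch with `jittered_ttl(300)` expire somewhere between 270 and 330 seconds later instead of all at once, which spreads the rebuild load over a minute.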

Common mistakes:

  • caching everything without understanding access patterns
  • caching data that must be strongly consistent without a freshness plan
  • letting huge values or huge key cardinality blow up memory
  • using one global TTL for all data types
  • ignoring cache outage behavior
  • forgetting that a cache flush can look like a DDoS against your database

3. Redis

Redis is one of the most widely used performance-layer systems in backend engineering.

3.1 What Redis Is

Redis is an in-memory data structure store commonly used as a distributed cache, fast key-value store, rate limiter, session store, coordination primitive, leaderboard engine, and lightweight stream or queue component.

It became popular because it combines several valuable traits:

  • very low latency
  • simple operational model for many use cases
  • multiple built-in data structures
  • high developer productivity
  • strong ecosystem support

In practice, Redis often becomes the first non-database data system teams add when they need more performance.

3.2 Why Redis Is Widely Used

Redis is useful because it covers many common backend problems with one fast system.

Examples:

  • caching user profiles or product metadata
  • storing web sessions
  • implementing rate limiting per user, token, or IP
  • maintaining rolling counters and quotas
  • computing leaderboards with sorted sets
  • distributing lightweight invalidation or coordination events
  • holding ephemeral state for jobs, retries, and workflows

A big reason engineers like Redis is that it is often fast enough to move a system from struggling to healthy without a major architectural rewrite.

3.3 In-Memory Architecture

Redis keeps its primary working dataset in memory. That is the key reason it is fast.

Compared to disk-backed databases:

  • memory access is much faster than disk I/O
  • data structures can be updated quickly with minimal indirection
  • many operations are constant time or logarithmic time

But in-memory design also brings constraints:

  • memory is expensive compared to disk
  • total dataset size is limited by RAM
  • persistence must be carefully designed if data matters
  • large keys and fragmentation can become operational issues

Redis works best when the working set fits comfortably in memory and when the system is comfortable with Redis being a fast data layer rather than the ultimate durable source of truth.

3.4 Single-Threaded Event Loop Concept

Historically, Redis is famous for a mostly single-threaded command execution model.

That sounds like a weakness at first, but the intuition matters.

Why single-threaded command execution helped Redis:

  • avoids lock contention inside the core data path
  • simplifies internal state management
  • keeps operations predictable
  • reduces complexity in common use cases

The model is roughly:

  1. accept network requests
  2. parse commands
  3. execute commands against in-memory data structures
  4. write responses back to clients

Because the operations happen in memory and avoid heavy internal locking, the system can be extremely fast.

Important nuance for interviews:

Since Redis 6, optional I/O threads can handle network reads and writes, and background threads handle work such as lazy freeing and AOF fsync, but commands on a given instance are still executed by one main thread. The point is not the slogan "single-threaded" by itself. The point is why that model worked: low coordination overhead on an in-memory data path.

3.5 Persistence Basics

Redis is often used as a cache, but it also supports persistence.

The two core persistence ideas are:

| Mechanism | What it does | Strengths | Weaknesses |
| --- | --- | --- | --- |
| RDB snapshot | Periodically writes a point-in-time snapshot to disk | Compact, good for backups and restart speed | Can lose recent writes between snapshots |
| AOF append-only file | Logs write operations as they happen | Better durability, more recent recovery | Larger files, rewrite complexity, more I/O |

Many deployments combine both.

Interview framing:

If Redis is used only as a cache, persistence may be optional. If it stores critical ephemeral state like sessions, counters, or queues that you care about recovering, persistence and replication matter much more.

3.6 Redis Data Structures and Why They Matter

Redis is not just a string map. Its data structures are a large part of why it is useful.

| Data structure | Intuition | Common operations | Production use cases |
| --- | --- | --- | --- |
| Strings | Simplest key to value mapping | GET, SET, INCR | General caching, counters, feature flags, serialized objects |
| Hashes | Small field map under one key | HGET, HSET | User/session metadata, grouped object fields |
| Lists | Ordered sequence with push/pop | LPUSH, RPUSH, LPOP | Simple queues, activity buffers, recent events |
| Sets | Unordered unique members | SADD, SISMEMBER | Membership checks, tags, deduplication |
| Sorted sets | Unique members with score ordering | ZADD, ZRANGE | Leaderboards, ranking, delayed tasks, time windows |
| Bitmaps | Bit-level state compactly stored | SETBIT, BITCOUNT | Presence flags, lightweight analytics, feature rollout markers |
| Streams | Append-only log with consumer groups | XADD, XREADGROUP | Event pipelines, work distribution, ordered consumption |

Strings

Strings are the default choice for caching serialized values such as JSON or protobuf blobs.

Why teams use them:

  • easiest operationally
  • flexible schema at the application layer
  • supports TTL directly
  • works well for counters with atomic increment operations

Hashes

Hashes are useful when you want multiple fields grouped under one logical key. They can reduce duplication and sometimes improve ergonomics for partial field access.

Example:

  • session:123 with fields like user_id, expires_at, role

Lists

Lists are good for queue-like behavior, but teams should be careful not to overuse Redis lists as a full durable queue platform when stronger guarantees are required.

Sets

Sets give fast membership testing.

Examples:

  • which users have access to a beta feature
  • which object IDs were already processed

Sorted Sets

Sorted sets are one of Redis's most powerful primitives.

They maintain a set of members ordered by score. This makes them ideal for:

  • leaderboards
  • ranking systems
  • top-N queries
  • sliding windows for rate limiting
  • scheduling delayed tasks by timestamp

Bitmaps

Bitmaps are memory-efficient when you need large boolean state spaces.

Examples:

  • whether a user ID belongs to a cohort
  • whether an event occurred on a given day
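To see why bitmaps are compact, here is a simplified in-memory model of SETBIT, GETBIT, and BITCOUNT backed by a `bytearray`. The `Bitmap` class is a hypothetical stand-in for illustration, not a Redis client; it follows the Redis convention that bit 0 is the most significant bit of the first byte.

```python
class Bitmap:
    """Simplified model of SETBIT / GETBIT / BITCOUNT backed by a bytearray."""

    def __init__(self) -> None:
        self._bytes = bytearray()

    def setbit(self, offset: int, value: int) -> int:
        """Set the bit at offset to 0 or 1 and return the previous bit."""
        byte_index, bit_index = divmod(offset, 8)
        if byte_index >= len(self._bytes):
            # Grow with zero bytes so the offset is addressable.
            self._bytes.extend(b"\x00" * (byte_index + 1 - len(self._bytes)))
        mask = 0x80 >> bit_index              # most significant bit first
        previous = 1 if self._bytes[byte_index] & mask else 0
        if value:
            self._bytes[byte_index] |= mask
        else:
            self._bytes[byte_index] &= ~mask & 0xFF
        return previous

    def getbit(self, offset: int) -> int:
        byte_index, bit_index = divmod(offset, 8)
        if byte_index >= len(self._bytes):
            return 0
        return 1 if self._bytes[byte_index] & (0x80 >> bit_index) else 0

    def bitcount(self) -> int:
        """Number of bits set to 1."""
        return sum(bin(b).count("1") for b in self._bytes)

    def size_bytes(self) -> int:
        return len(self._bytes)
```

Using the user ID as the bit offset, one flag for each of a million users costs about 125 KB, which is the reason bitmaps suit cohort membership and daily-activity flags.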

Streams

Streams provide an append-only structure with consumer groups and replay semantics. They are useful when you want lightweight log-like behavior in Redis.

They are helpful, but you should not automatically replace Kafka or other durable log systems with Redis Streams for large-scale, long-retention event pipelines.

3.7 Pub/Sub Basics

Redis Pub/Sub allows publishers to send messages to channels and subscribers to receive them.

This is useful for lightweight fan-out such as:

  • cache invalidation notifications
  • internal live updates
  • ephemeral coordination signals

But Pub/Sub is not durable messaging. If a subscriber is down, messages can be missed. That makes it suitable for best-effort signaling, not for business-critical guaranteed delivery.

3.8 Distributed Locks Basics

Redis is often used for lightweight distributed locking.

Typical use case:

  • ensure only one worker rebuilds a hot cache key
  • avoid duplicate job execution
  • coordinate a small critical section across nodes

The basic pattern is setting a key with a TTL only if it does not already exist.

Important caution:

Distributed locking is easy to misuse. If the lock expires too early, if clients pause, or if ownership is not verified on release, correctness bugs appear. For critical correctness, database transactions or purpose-built coordination systems may be safer.
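The set-if-absent-with-TTL pattern and the ownership check on release can be modeled without a server. The class below is an in-memory stand-in with an injectable clock; `FakeLockStore` is a hypothetical name. With real Redis, acquisition corresponds to `SET key token NX PX ttl`, and the compare-then-delete on release is usually wrapped in a Lua script so it runs atomically.

```python
import uuid
from typing import Callable, Dict, Optional, Tuple


class FakeLockStore:
    """In-memory stand-in for a key-value store with set-if-absent and expiry."""

    def __init__(self, now: Callable[[], float]) -> None:
        self._now = now
        self._data: Dict[str, Tuple[str, float]] = {}   # key -> (token, expires_at)

    def acquire(self, key: str, ttl_s: float) -> Optional[str]:
        """Succeed only if the key is absent or expired; return an ownership token."""
        entry = self._data.get(key)
        if entry is not None and entry[1] > self._now():
            return None
        token = uuid.uuid4().hex
        self._data[key] = (token, self._now() + ttl_s)
        return token

    def release(self, key: str, token: str) -> bool:
        """Delete the key only if the caller still owns it."""
        entry = self._data.get(key)
        if entry is None or entry[1] <= self._now() or entry[0] != token:
            return False
        del self._data[key]
        return True
```

The token check matters: if worker A's lock expires and worker B acquires it, A's late release must not delete B's lock.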

3.9 Common Redis Use Cases

| Use case | Why Redis fits |
| --- | --- |
| Rate limiting | Atomic increments and expirations are simple and fast |
| Session storage | Low latency key lookup with TTL |
| Leaderboards | Sorted sets make ranking natural |
| Caching | Memory speed plus TTL support |
| Lightweight queues | Lists or streams for simple work pipelines |
| Token or OTP storage | Fast expiry-based ephemeral data |
| Idempotency keys | Short-lived state for duplicate request protection |

Rate Limiting

Redis is common for rate limiting because counters and expirations are easy to implement atomically.

Examples:

  • 100 requests per minute per API key
  • 5 login attempts per 10 minutes per account
  • per-IP abuse protection at API gateway or edge layer
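A limit such as "100 requests per minute per API key" is often built as a fixed-window counter. The sketch below models that idea in memory with an injectable clock; `FixedWindowLimiter` is a hypothetical name. With Redis, the dictionary update would correspond to an INCR on a per-window key that also carries an expiry; this sketch never prunes old windows.

```python
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each window of `window_s` seconds."""

    def __init__(self, limit: int, window_s: int, now: Callable[[], float]) -> None:
        self.limit = limit
        self.window_s = window_s
        self._now = now
        self._counters: Dict[Tuple[str, int], int] = {}

    def allow(self, client_id: str) -> bool:
        window = int(self._now() // self.window_s)
        key = (client_id, window)                 # models a key like rate:<client>:<window>
        count = self._counters.get(key, 0) + 1    # models an atomic increment
        self._counters[key] = count
        return count <= self.limit
```

Fixed windows are simple but allow bursts of up to twice the limit across a window boundary; sliding windows built on sorted sets trade more memory for smoother enforcement.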

Session Storage

Redis is widely used for session storage because sessions are read frequently, written occasionally, and usually have natural expiration.

Typical SaaS pattern:

  • app server reads session by token
  • Redis returns session data quickly
  • TTL naturally removes expired sessions

Leaderboards

Games, social apps, and competition systems often use sorted sets for leaderboards because rank queries and top-N retrieval are natural operations.

Queues Basics

Redis can be used for simple queueing, retries, and scheduled jobs.

It is often a good fit when:

  • throughput is moderate
  • retention is short
  • operational simplicity matters

It is a weaker fit when:

  • you need long retention
  • you need strong replay guarantees
  • event history matters deeply
  • consumer scaling and durability are primary concerns

3.10 Replication Basics

Redis commonly uses primary-replica replication.

Why replicas exist:

  • improve read scale
  • improve availability
  • reduce data loss risk during failure

Tradeoff:

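Replication is asynchronous by default, so replicas may lag behind the primary. That means stale reads from replicas are possible, and a write acknowledged by the primary can be lost if the primary fails before the write reaches a replica.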

For many cache use cases, that is acceptable. For correctness-sensitive use cases, that must be discussed explicitly.

3.11 Sentinel Basics

Redis Sentinel monitors Redis instances and helps automate failover for primary-replica setups.

What Sentinel does:

  • health checks
  • failure detection
  • leader election among Sentinel nodes
  • promoting a replica to primary
  • updating clients or discovery mechanisms

Sentinel matters when you want high availability without full Redis Cluster complexity, especially for simpler primary-replica deployments.

3.12 Redis Cluster Basics

Redis Cluster provides sharding across multiple nodes.

Why it exists:

  • a single Redis node has memory and throughput limits
  • large workloads need horizontal scale

Cluster maps every key to one of 16,384 hash slots (CRC16 of the key, modulo 16384) and assigns ranges of slots to nodes. That spreads memory and traffic across nodes.

Tradeoffs:

  • operations spanning multiple keys become more constrained
  • some application logic must be shard-aware
  • hot keys can still overload one shard
  • operational complexity increases

Cluster helps with capacity and throughput, but it does not magically eliminate key-distribution problems.
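The key-to-slot mapping is small enough to reproduce. The sketch below follows the documented Redis Cluster scheme of CRC16 (XMODEM variant) modulo 16384, including hash tags, where only the text between the first `{` and the next `}` is hashed. It leans on Python's `binascii.crc_hqx`, which uses the same CRC-CCITT polynomial; the function name `key_hash_slot` is my own.

```python
import binascii

HASH_SLOTS = 16384


def key_hash_slot(key: str) -> int:
    """Return the cluster hash slot for a key, honoring {hash tags}."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end != -1 and end != start + 1:    # only a non-empty tag counts
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % HASH_SLOTS
```

Keys such as `{user1000}.following` and `{user1000}.followers` land in the same slot, which is how applications keep related keys together for multi-key operations. It also shows why a single hot key cannot be spread by adding nodes: one key always maps to one slot.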

3.13 Memory Management Considerations

Redis performance problems often become memory problems.

Things engineers must watch:

  • maxmemory limits
  • eviction behavior under pressure
  • large keys or huge collections
  • fragmentation overhead
  • persistence overhead during fork or rewrite
  • replication buffers
  • serialization bloat

Bad Redis incidents often come from not respecting memory reality.

Examples:

  • storing enormous JSON blobs under one key
  • letting key cardinality grow without bounds
  • forgetting that snapshots or AOF rewrites need extra memory headroom
  • assuming TTL means memory disappears immediately and uniformly

3.14 When Redis Should Not Be Used

Do not use Redis when:

  • the dataset does not fit comfortably in memory
  • you need the primary source of truth for large durable data
  • you need complex relational queries or joins
  • you need very strong durability guarantees with minimal write loss tolerance
  • you need long-lived event storage and replay at log-system scale
  • you cannot tolerate losing the data, yet you plan to treat Redis as a cheap database

Redis is excellent, but many outages come from stretching it past its natural use case.

3.15 Real-World Patterns

Generalized production patterns you will see repeatedly:

  • Amazon-like e-commerce systems cache product and pricing metadata, but keep order and payment state in durable databases
  • GitHub-like systems use caching for repository page composition and rate limiting, but not as the source of truth for repository metadata
  • Stripe-like systems may use Redis for short-lived idempotency, fraud throttles, or session-like state, while preserving financial correctness in durable transactional stores
  • Uber-like systems use fast data systems for hot operational state and rate control, while durable systems preserve business records and historical data

4. Memcached

Memcached is another classic distributed caching system.

4.1 What Memcached Is

Memcached is a high-performance, memory-only, distributed cache built around a simpler model than Redis.

It is focused primarily on one job: caching values in memory and serving them fast.

That focus is why many companies historically used it heavily for large-scale read caching.

4.2 How Memcached Differs in Spirit from Redis

Redis evolved into a multi-purpose in-memory data system.

Memcached stayed closer to a simple cache appliance.

That means:

  • fewer built-in data structures
  • less feature breadth
  • simpler mental model
  • often lower overhead for straightforward cache workloads

4.3 Simple Distributed Caching Model

A classic Memcached deployment is made of many independent cache nodes. The client typically decides which node holds a given key using hashing.

This model is simple:

  • key arrives at the application
  • application hashes the key
  • application sends request to the selected Memcached node
  • node stores or returns the value

There is usually less server-side coordination than in more feature-rich clustered systems.
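Client-side node selection can be sketched as follows. The text above only says "hashing"; this example uses a consistent-hash ring with virtual nodes, which is one common choice and an assumption here, because plain modulo hashing remaps most keys whenever a node is added. `HashRing` and the node names are hypothetical.

```python
import bisect
import hashlib
from typing import List, Tuple


def _hash(value: str) -> int:
    """Stable 64-bit hash derived from SHA-256 (any stable hash would do)."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Map keys to cache nodes using a ring of virtual nodes."""

    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        self._ring: List[Tuple[int, str]] = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        """Return the node owning the first ring point after the key's hash."""
        index = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]
```

When a fourth node joins a three-node ring, only the keys that now belong to the new node move, so most of the cache stays warm.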

4.4 Memory-Only Behavior

Memcached is memory-only. It is not designed as a durable store.

This is important conceptually:

  • it is a pure performance layer
  • if it restarts, cached data is gone
  • that is acceptable because the source of truth should be elsewhere

This simplicity is powerful when your cache is truly disposable.

4.5 Slab Allocation Basics

One important internal concept in Memcached is slab allocation.

The cache groups memory into classes of fixed-size chunks so that similarly sized objects can be stored efficiently.

Why this exists:

  • general-purpose memory allocation can fragment under heavy cache churn
  • cache workloads often involve huge numbers of similarly sized objects
  • fixed allocation classes improve speed and predictability

Tradeoff:

  • if object sizes do not fit slab classes well, memory can be wasted through internal fragmentation
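The internal fragmentation tradeoff is easier to see with numbers. This sketch generates chunk sizes that grow by a fixed factor and finds the chunk an item would occupy. The minimum size of 96 bytes, the 1 MiB maximum, and the 1.25 growth factor are assumed values for illustration, and real Memcached also aligns chunk sizes, which this simplified model skips.

```python
from typing import List


def slab_chunk_sizes(min_chunk: int = 96, max_chunk: int = 1024 * 1024,
                     factor: float = 1.25) -> List[int]:
    """Chunk sizes for each slab class, growing geometrically up to max_chunk."""
    sizes = []
    size = min_chunk
    while size < max_chunk:
        sizes.append(size)
        size = max(size + 1, int(size * factor))
    sizes.append(max_chunk)
    return sizes


def chunk_for(item_size: int, sizes: List[int]) -> int:
    """Smallest chunk size that can hold the item."""
    for size in sizes:
        if item_size <= size:
            return size
    raise ValueError("item larger than the largest chunk")
```

Under these assumptions a 100-byte item occupies a 120-byte chunk, wasting 20 bytes. If most of your objects sit just above a class boundary, that waste repeats for every item.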

This is a good example of a design optimized specifically for caching rather than for general-purpose data structures.

4.6 Cache-Focused Design

Memcached is intentionally narrow.

Its strength is that it does not try to be a queue, stream platform, lock manager, or ranked index. It tries to be a very fast shared cache.

This makes it attractive when the problem really is just:

  • store hot objects in memory
  • retrieve them quickly
  • let the app refill them on misses

4.7 Common Production Use Cases

Memcached is commonly used for:

  • page fragment caching
  • session-like ephemeral web data
  • query result caching
  • product or profile object caching
  • large-scale read-heavy web workloads where durability is irrelevant

Historically, many large web companies used Memcached aggressively in front of databases for exactly this reason.

4.8 Scaling Characteristics

Memcached scales horizontally in a straightforward way because nodes are relatively independent.

Strengths:

  • easy to add more cache capacity
  • predictable use for simple key-value caching
  • low complexity for read-heavy workloads

Weaknesses:

  • fewer built-in coordination features
  • no rich server-side data structures
  • less helpful when the application wants more than plain caching

4.9 Limitations Compared to Redis

Compared to Redis, Memcached generally has:

  • less feature breadth
  • less support for rich data structures
  • no native persistence model for recovering data
  • fewer coordination-oriented use cases

But that narrower design can be a feature, not a bug, when simplicity is what you want.

4.10 Redis vs Memcached

| Dimension | Redis | Memcached |
| --- | --- | --- |
| Primary identity | General-purpose in-memory data store | Pure distributed cache |
| Data structures | Rich: strings, hashes, lists, sets, sorted sets, streams, more | Mostly simple key-value |
| Persistence | Optional RDB/AOF | Memory-only |
| Coordination features | Pub/Sub, scripts, counters, locks, streams | Minimal |
| Operational simplicity for pure cache | Good, but broader feature set | Often very simple |
| Memory efficiency for basic cache workloads | Good, workload dependent | Historically attractive for pure cache cases |
| Best fit | Cache plus broader backend primitives | Straight shared cache at scale |

4.11 When Companies Choose One Over the Other

Choose Redis when:

  • you want one fast system for caching plus rate limits, counters, sessions, or leaderboards
  • you need richer data types
  • you want optional persistence or replication features

Choose Memcached when:

  • the problem is pure disposable caching
  • the workload is straightforward key-value object caching
  • simplicity and cache-specific behavior matter more than feature breadth

In interviews, do not answer this as a popularity contest. Answer it as a workload decision.

5. Cache Access Patterns

The cache technology is only half the story. The access pattern determines behavior, consistency, and failure modes.

5.1 Cache-Aside Pattern

Cache-aside is the most common caching pattern in production systems.

The idea is simple:

  1. application reads from cache first
  2. if the key exists, return it
  3. if it does not exist, read from database or source of truth
  4. store the result in cache
  5. return it to the caller

This is also called lazy loading because the cache is filled on demand.
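The five steps above can be sketched as one small class. This is a minimal model, assuming a dictionary as the cache, a fixed TTL, and an injectable clock; `CacheAside` and `load_from_db` are hypothetical names.

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """Read-through-the-application caching: check cache, fall back to source, fill cache."""

    def __init__(self, load_from_db: Callable[[str], object], ttl_s: float,
                 now: Callable[[], float] = time.monotonic) -> None:
        self._load = load_from_db
        self._ttl_s = ttl_s
        self._now = now
        self._cache: Dict[str, Tuple[object, float]] = {}   # key -> (value, expires_at)
        self.misses = 0

    def get(self, key: str) -> object:
        entry = self._cache.get(key)                 # step 1: read from cache first
        if entry is not None and entry[1] > self._now():
            return entry[0]                          # step 2: hit, return it
        self.misses += 1
        value = self._load(key)                      # step 3: read the source of truth
        self._cache[key] = (value, self._now() + self._ttl_s)   # step 4: store with TTL
        return value                                 # step 5: return to the caller
```

The `misses` counter is there to make the behavior observable: repeated reads inside the TTL cost one source read, and the first read after expiry pays the miss again.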

Why Cache-Aside Exists

It is popular because it is simple, flexible, and keeps the source of truth unchanged. The application decides when and what to cache.

Cache-Aside Read Flow

sequenceDiagram
	participant C as Client
	participant A as Application
	participant Cache as Cache
	participant DB as Database

	C->>A: Read object
	A->>Cache: GET key
	alt Cache hit
		Cache-->>A: cached value
		A-->>C: response
	else Cache miss
		A->>DB: query object
		DB-->>A: row / record
		A->>Cache: SET key with TTL
		A-->>C: response
	end

Advantages

  • simple mental model
  • cache stores only demanded data
  • no cache write cost for cold data
  • application controls key format and TTL per object type

Disadvantages

  • first read after expiry is slow
  • cache misses can overload the database
  • stale data appears if invalidation is weak
  • multiple readers may rebuild the same key at once

Stale Data Risks

On writes, the source of truth changes first. If the cache is not invalidated immediately, future reads may still see the old cached value.

This is why cache-aside usually needs one of these:

  • delete cached key on write
  • update cached value on write
  • short TTL as a backstop
  • version checks in the application
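
As one illustration of the first option, a delete-on-write path might look like the sketch below. Plain dictionaries stand in for the database and the cache, and the function names and the `user:<id>` key format are assumptions for the example.

```python
from typing import Dict, Optional


def update_user(db: Dict[str, dict], cache: Dict[str, dict],
                user_id: str, profile: dict) -> None:
    db[user_id] = profile                # the source of truth changes first
    cache.pop(f"user:{user_id}", None)   # then the cached copy is dropped


def read_user(db: Dict[str, dict], cache: Dict[str, dict],
              user_id: str) -> Optional[dict]:
    key = f"user:{user_id}"
    if key in cache:
        return cache[key]
    profile = db.get(user_id)
    if profile is not None:
        cache[key] = profile             # the next read repopulates the cache
    return profile
```

With a real cache the delete can fail or race with a concurrent read that repopulates the old value, which is why a short TTL is usually kept as a backstop.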

Failure Cases

  • cache node unavailable: all reads fall through to DB
  • DB slow: miss path becomes dangerous
  • key expires under burst traffic: stampede
  • invalidation event lost: stale data survives until TTL

Common Production Usage

Cache-aside is common for:

  • product pages
  • user profiles
  • configuration data
  • permission maps that tolerate bounded staleness
  • API aggregation results

5.2 Write-Through

Write-through means writes go to the cache and to the backing store as part of the write path.

The intent is to keep cache and source of truth aligned immediately.

Write Path Flow

  1. client sends write
  2. application validates input
  3. application writes new value to database and cache in the same logical operation
  4. future reads hit a fresh cache entry

Why It Exists

Write-through exists because it gives better read-after-write behavior from the cache than purely lazy cache-aside. Immediately after a successful write, the cache already holds the fresh value, so the next read does not have to miss and reload.

Benefits

  • fresher cache after writes
  • simpler read path after updates
  • fewer stale reads right after mutation

Tradeoffs

  • every write pays cache cost even if the data is never read again
  • write latency increases because more systems are involved
  • failure handling becomes trickier if DB write succeeds but cache write fails, or vice versa

Failure Handling Questions

You must define:

  • which write is authoritative if one succeeds and one fails
  • whether the request should fail or retry
  • whether reconciliation jobs exist
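
One possible set of answers is sketched below, under the assumption that the database is authoritative and that a failed cache write should degrade to invalidation rather than fail the request. The function name and the `cache_set` / `cache_delete` callables are hypothetical; a real system might choose to fail or retry instead.

```python
from typing import Callable, Dict


def write_through(
    db: Dict[str, object],
    cache_set: Callable[[str, object], None],
    cache_delete: Callable[[str], None],
    key: str,
    value: object,
) -> bool:
    """Return True if the cache now holds the new value, False if the cache write failed."""
    db[key] = value  # authoritative write; if this raises, the cache is untouched
    try:
        cache_set(key, value)
        return True
    except Exception:
        # Cache write failed: try to invalidate so readers cannot keep
        # seeing the pre-write value from the cache.
        try:
            cache_delete(key)
        except Exception:
            pass  # a TTL or reconciliation job would have to clean this up
        return False
```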

Production Suitability

Write-through is useful when:

  • reads soon after writes are common
  • keeping cache hot is valuable
  • write volume is manageable

It is a weaker fit when:

  • write traffic is very high
  • many written objects are never read again
  • write latency is extremely sensitive

5.3 Write-Back / Write-Behind

Write-back means the application writes to the cache first and persists to the database asynchronously later.

This is the most aggressive performance-oriented pattern.

Why It Exists

It exists to absorb high write throughput and smooth backend load. The immediate write path becomes very fast because the durable store is no longer on the critical path.

Throughput Advantages

  • low-latency writes
  • batched or buffered persistence
  • can smooth write bursts before they hit the database

Durability Risks

This pattern is dangerous because data may exist only in the cache or buffer for some time.

If the cache crashes, if the async worker fails, or if the queue is lost, writes can disappear.

Data Loss Scenarios

  • cache node fails before flush
  • async worker backlog grows without bound
  • persistence queue is dropped during incident
  • ordering bugs cause older writes to overwrite newer writes

Queueing Considerations

Write-back systems are really queueing systems too. You need:

  • durable buffering strategy
  • retry behavior
  • ordering guarantees
  • backpressure when database falls behind
  • replay and reconciliation tools
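
A deliberately small sketch of the idea follows. It assumes a single-threaded caller and an in-memory pending map, which is exactly the non-durable situation the risks above describe. `WriteBackBuffer`, `max_pending`, and `persist_batch` are names invented for the example.

```python
from typing import Callable, Dict


class WriteBackBuffer:
    def __init__(self, max_pending: int) -> None:
        self.cache: Dict[str, object] = {}
        self._pending: Dict[str, object] = {}  # latest unpersisted value per key
        self._max_pending = max_pending

    def write(self, key: str, value: object) -> bool:
        if key not in self._pending and len(self._pending) >= self._max_pending:
            return False  # backpressure: refuse new keys while the backlog is full
        self.cache[key] = value    # acknowledged from the cache
        self._pending[key] = value  # repeated writes to one key coalesce
        return True

    def flush(self, persist_batch: Callable[[Dict[str, object]], None]) -> int:
        if not self._pending:
            return 0
        batch = dict(self._pending)
        persist_batch(batch)  # if this raises, pending writes are kept for retry
        self._pending.clear()
        return len(batch)

    def pending_count(self) -> int:
        return len(self._pending)
```

Anything still in `_pending` is lost if the process dies before `flush` succeeds, so a production design would back the pending set with a durable log or queue.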

Operational Complexity

Write-back is harder to operate because the write acknowledgement and true persistence are decoupled.

This can be acceptable for:

  • analytics counters
  • non-critical engagement metrics
  • temporary derived state

It is usually not acceptable for:

  • payments
  • orders
  • inventory reservation
  • anything audit-sensitive

5.4 Pattern Comparison

| Pattern | Read behavior | Write behavior | Main strength | Main risk |
| --- | --- | --- | --- | --- |
| Cache-aside | Reads cache first, loads on miss | Source updated separately, cache invalidated or refreshed | Simple and common | Stale reads and miss storms |
| Write-through | Reads often hit fresh cache | Write updates cache and DB together | Better freshness after writes | Higher write latency and dual-write complexity |
| Write-back | Reads hit hot cache | Write acknowledged before durable persistence completes | High write throughput | Data loss and operational complexity |

5.5 Write-Through vs Cache-Aside

| Question | Write-Through | Cache-Aside |
| --- | --- | --- |
| Is cache populated on write? | Yes | Usually no |
| First read after write | Often fast | May miss if cache was invalidated |
| Write cost | Higher | Lower |
| Common fit | Read-after-write sensitive data | General-purpose read-heavy systems |

5.6 Write-Back vs Write-Through

| Question | Write-Back | Write-Through |
| --- | --- | --- |
| Persistence timing | Asynchronous | Synchronous or near-synchronous |
| Durability | Weaker | Stronger |
| Throughput | Higher | Lower |
| Operational complexity | Higher | Lower |
| Safe for critical data | Rarely | More often |

6. TTL (Time To Live)

TTL is one of the most important cache controls.

6.1 Why TTL Exists

TTL gives cached data an expiration time.

It exists because:

  • cache entries should not live forever
  • data changes over time
  • invalidation is never perfect
  • memory must eventually be reclaimed

TTL is both a freshness policy and a safety valve.

6.2 Freshness vs Performance Tradeoff

Short TTL:

  • fresher data
  • more misses
  • more backend load

Long TTL:

  • better hit ratio
  • lower backend load
  • greater stale data risk

Choosing TTL is not a mathematical purity exercise. It is a business decision informed by traffic shape and correctness requirements.

6.3 Choosing TTL Values

| Data type | Typical TTL thinking | Why |
| --- | --- | --- |
| Static assets with versioned URLs | Very long, often effectively immutable | Content changes only when filename changes |
| Product catalog metadata | Minutes to hours, often event-invalidated too | Read heavy, moderate freshness needs |
| User profile display info | Minutes | Slight staleness often acceptable |
| Inventory or seat availability | Very short or event-driven | Stale data can cause user-visible errors |
| Auth or permission data | Short or version-checked | Security sensitivity |
| Rate limiting counters | Natural expiration aligned to window | TTL defines the policy itself |

6.4 Short TTL vs Long TTL

Short TTLs are attractive because they reduce staleness, but they often create hidden instability.

If a key is hit constantly and expires every few seconds, the system repeatedly repays the miss penalty. That can waste backend capacity.

Long TTLs improve performance, but only if you also have a reliable invalidation strategy or a clear tolerance for stale data.

6.5 Dynamic TTL Strategies

Good production systems often use different TTLs for different data classes.

Examples:

  • long TTL for immutable product images
  • medium TTL for product descriptions
  • short TTL for stock level or surge pricing data
  • longer TTL for cold data, shorter TTL for volatile entities

Some systems also vary TTL by popularity. Very hot keys may justify proactive refresh or longer cache retention because the savings are large.

6.6 Soft TTL vs Hard TTL

Hard TTL is the age after which the entry is considered expired: it must be reloaded from the source before it can be served again.

Soft TTL means the entry is considered old enough to refresh, but the system may still serve it briefly while a background refresh happens.

Soft TTL is a practical way to avoid user-facing latency spikes and stampedes. It supports patterns like stale-while-revalidate.
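
The distinction can be sketched as a three-way decision on entry age. The code below is an illustration under assumptions: the clock is passed in as `now`, `schedule_refresh` is a hypothetical hook for whatever background mechanism the system has, and the 60-second and 300-second thresholds are arbitrary.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Entry:
    value: object
    stored_at: float


def get_with_soft_ttl(
    cache: Dict[str, Entry],
    key: str,
    now: float,
    load: Callable[[str], object],
    schedule_refresh: Callable[[str], None],
    soft_ttl: float = 60.0,
    hard_ttl: float = 300.0,
) -> object:
    entry = cache.get(key)
    if entry is not None:
        age = now - entry.stored_at
        if age < soft_ttl:
            return entry.value        # fresh: serve directly
        if age < hard_ttl:
            schedule_refresh(key)     # stale but servable: refresh in background
            return entry.value
    value = load(key)                 # missing or past hard TTL: reload inline
    cache[key] = Entry(value, now)
    return value
```

A real implementation would also deduplicate refreshes so that a hot key past its soft TTL triggers one background reload rather than one per request.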

6.7 Expiration Storms and Jitter Strategies

If many keys are created at the same time with the same TTL, they may all expire together.

That causes an expiration storm or avalanche.

The standard mitigation is jitter: add randomness to expiration times.

Example:

  • instead of every key expiring at exactly 600 seconds, expire each key at 600 seconds plus or minus a bounded random offset, for example anywhere between 540 and 660 seconds

This spreads rebuild work over time.
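
A minimal sketch of jittered expiry, assuming a symmetric uniform offset expressed as a fraction of the base TTL. The function name and the 10 percent default are illustrative choices, not a standard.

```python
import random
from typing import Optional


def jittered_ttl(
    base_seconds: float,
    jitter_fraction: float = 0.1,
    rng: Optional[random.Random] = None,
) -> float:
    """Return base_seconds plus or minus up to jitter_fraction of it."""
    if rng is None:
        rng = random.Random()
    offset = base_seconds * jitter_fraction
    return base_seconds + rng.uniform(-offset, offset)


# With the defaults, a 600-second base TTL lands between 540 and 660 seconds.
ttl = jittered_ttl(600)
```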

6.8 Practical TTL Decisions in Production

Strong production TTL policy usually includes:

  • base TTL chosen per data class
  • event-driven invalidation for important writes
  • jitter to avoid synchronized expiry
  • soft TTL for hot or expensive-to-build keys
  • observability on miss storms and stale-read complaints

Practical rule:

If you cannot explain why a TTL is what it is, the TTL is probably wrong.

7. Eviction Policies

TTL decides when entries should expire logically. Eviction decides what happens when memory pressure forces the cache to throw something away.

7.1 Why Eviction Policies Matter

When memory fills up, the cache must choose which entries survive.

That choice directly affects hit ratio and therefore system performance.

The wrong eviction policy can destroy performance by retaining low-value data while evicting exactly the hot data that protects the backend.

7.2 Common Policies

| Policy | Intuition | Works well when | Fails when |
| --- | --- | --- | --- |
| LRU | Evict least recently used items | Recent access predicts future access | Workload has scanning patterns that pollute recency |
| LFU | Evict least frequently used items | Repeated popularity matters | Frequency history adapts too slowly to sudden changes if tuned poorly |
| FIFO | Evict oldest inserted items | Simplicity matters more than precision | Age is not a good signal of future value |
| Random | Evict arbitrary items | Cheap and simple, decent in some broad workloads | Can evict very hot keys unpredictably |
| TTL-based | Prefer items nearing expiration | Expiry is meaningful and freshness-driven | Hot but old keys may be evicted too early |
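
To make the LRU row concrete, here is a small exact-LRU sketch built on `collections.OrderedDict`. It is a teaching example only; production caches such as Redis use approximations of LRU rather than a structure like this, and the class name is an assumption.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items: "OrderedDict[str, object]" = OrderedDict()

    def get(self, key: str, default: object = None) -> object:
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: object) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently used

    def __contains__(self, key: str) -> bool:
        return key in self._items


cache = LRUCache(capacity=2)
cache.put("hot", 1)
cache.put("warm", 2)
cache.get("hot")            # "hot" is now the most recent entry
for i in range(2):          # a one-off scan touches two new keys...
    cache.put(f"scan:{i}", i)
# ...and pushes out "hot", illustrating the scan-pollution failure mode.
```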

7.3 Redis Eviction Modes

Redis exposes several eviction modes.

| Mode | Meaning |
| --- | --- |
| noeviction | Reject writes when memory limit is reached |
| allkeys-lru | Evict least recently used keys from all keys |
| volatile-lru | Evict least recently used keys only among keys with TTL |
| allkeys-lfu | Evict least frequently used keys from all keys |
| volatile-lfu | Evict least frequently used keys only among keys with TTL |
| allkeys-random | Evict random keys from all keys |
| volatile-random | Evict random keys among TTL keys |
| volatile-ttl | Evict keys with nearest expiration among TTL keys |

The right mode depends on workload and whether all keys are disposable.

7.4 Workload-Based Policy Selection

Use LRU when:

  • recent access strongly predicts future access
  • the working set shifts over time

Use LFU when:

  • long-term popularity matters
  • some keys remain hot over long periods

Use TTL-sensitive strategies when:

  • expiring data is naturally less valuable
  • freshness policy is integral to value

Avoid random or FIFO unless you have a reason. Simpler is not always safer.

7.5 How Wrong Eviction Destroys Performance

Example:

  • assume a SaaS dashboard has 5 percent extremely hot keys and 95 percent rarely used keys
  • if eviction repeatedly removes hot keys, hit ratio falls sharply
  • application traffic shifts back to the database
  • database CPU rises, tail latency rises, and autoscaling may not help because the problem is miss amplification

Engineers often blame the database first, but the real issue is sometimes that the cache is keeping the wrong objects.
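
The miss amplification can be put in numbers. The traffic figures below are hypothetical and chosen only to show the shape of the effect: backend load scales with the miss ratio, so a modest drop in hit ratio multiplies database traffic.

```python
def backend_qps(total_qps: float, hit_ratio: float) -> float:
    """Requests per second that fall through the cache to the backend."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return total_qps * (1.0 - hit_ratio)


healthy = backend_qps(10_000, 0.95)    # about 500 QPS reach the database
degraded = backend_qps(10_000, 0.80)   # about 2,000 QPS reach the database
amplification = degraded / healthy     # roughly 4x the database load
```

A 15-point drop in hit ratio looks small on a dashboard, yet in this example it roughly quadruples the load on the source of truth.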

7.6 Best Practices

  • size memory with headroom rather than relying on constant eviction
  • monitor eviction rate alongside hit ratio
  • identify hot keys and oversized keys
  • match policy to workload instead of using defaults blindly
  • test cache behavior during memory pressure, not just normal load

8. CDN (Content Delivery Network)

Caching is not only a backend service concern. At internet scale, the performance layer extends to the edge.

8.1 What a CDN Is

A CDN is a globally distributed network of edge servers that caches and delivers content closer to users.

Instead of every user request hitting your origin servers directly, a nearby edge location can serve cacheable content.

8.2 Why CDNs Exist

| Goal | CDN benefit | Real effect |
| --- | --- | --- |
| Reduce latency | Content served closer to user | Faster page loads and API edge responses |
| Reduce bandwidth from origin | Repeated asset delivery stays at edge | Lower origin network cost |
| Offload backend | Fewer requests reach origin | Origin survives higher traffic |
| Improve resilience | Edge absorbs surges and some attacks | Better stability during spikes |
| Provide global delivery | POPs around the world | Better user experience across regions |

8.3 CDN Architecture

flowchart LR
	U[User Browser] --> EDGE[Nearest CDN Edge POP]
	EDGE -->|Cache Hit| RESP[Response Returned]
	EDGE -->|Cache Miss| SHIELD[Origin Shield / Regional Cache]
	SHIELD --> ORIGIN[Origin App / Object Store]
	ORIGIN --> SHIELD
	SHIELD --> EDGE
	EDGE --> RESP

Important concepts:

  • edge server or POP: geographically distributed cache location
  • origin server: your source system where content is generated or stored
  • origin shield: an extra cache layer between edge POPs and origin to reduce duplicate origin fetches

CDN vs Reverse Proxy

These terms are related, but they are not the same thing.

| Dimension | CDN | Reverse Proxy |
| --- | --- | --- |
| Typical placement | Globally distributed edge network | Usually sits in front of origin inside one region or network boundary |
| Main goal | Global latency reduction and origin offload | Traffic routing, load balancing, TLS termination, caching, security controls |
| Geographic reach | Many POPs across the world | Usually one site or a few controlled deployment points |
| Best use case | Shared content close to users worldwide | Centralized front door for backend services |
| Examples in practice | CloudFront, Fastly, Cloudflare edge delivery | NGINX, Envoy, HAProxy at origin or regional edge |

In real systems, they often work together rather than compete. A CDN may sit in front of a reverse proxy, and the reverse proxy then routes to application services. The CDN handles global edge delivery and shared caching; the reverse proxy handles origin-side traffic management and policy enforcement.

DDoS Mitigation Basics

CDNs help with basic DDoS resilience because they distribute traffic across a large edge footprint, absorb repeated requests close to the network boundary, and keep a meaningful fraction of malicious or accidental traffic away from the origin. That does not eliminate the need for rate limiting, WAF rules, or origin protection, but it reduces how directly every spike hits your backend.

8.4 Edge Caching

Edge caching means storing content at CDN nodes so users can be served without going back to origin.

This is especially effective for:

  • static assets
  • images
  • videos
  • public API responses that can be cached safely
  • partially personalized pages with shared fragments

8.5 Browser Cache vs CDN Cache

| Dimension | Browser Cache | CDN Cache |
| --- | --- | --- |
| Location | End user device | Provider edge POP |
| Main benefit | No network round trip, or only a small revalidation request, on repeat visits by the same user | Shared origin offload across many users |
| Control | Limited by browser behavior and headers | Controlled via CDN policies and headers |
| Best for | User-specific repeat access to assets | Shared assets and shared responses |

8.6 Cache Headers Basics

HTTP caching works because servers tell intermediaries and browsers how to cache.

| Header / concept | What it does | Why it matters |
| --- | --- | --- |
| Cache-Control | Defines caching directives like max age or public/private | Primary cache behavior control |
| s-maxage | Shared-cache max age | Lets CDN cache differently from browser |
| ETag | Validator representing response version | Enables revalidation without full body transfer |
| Last-Modified | Timestamp validator | Simpler revalidation mechanism |
| stale-while-revalidate | Allows stale content briefly while refresh happens | Better user latency and fewer stalls |
| Vary | Signals which request headers affect cache key | Critical for safe caching of content variations |
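
The directives in the table combine into a single `Cache-Control` header value. The helper below is a hypothetical convenience function written for this guide, not part of any framework; the specific lifetimes are illustrative.

```python
from typing import Optional


def cache_control(
    *,
    public: bool,
    max_age: int,
    s_maxage: Optional[int] = None,
    stale_while_revalidate: Optional[int] = None,
    immutable: bool = False,
) -> str:
    """Build a Cache-Control header value from a few common directives."""
    if not public and s_maxage is not None:
        raise ValueError("s-maxage targets shared caches; use public=True")
    parts = ["public" if public else "private", f"max-age={max_age}"]
    if s_maxage is not None:
        parts.append(f"s-maxage={s_maxage}")
    if stale_while_revalidate is not None:
        parts.append(f"stale-while-revalidate={stale_while_revalidate}")
    if immutable:
        parts.append("immutable")
    return ", ".join(parts)


# Versioned static asset: cache for a year everywhere.
asset_header = cache_control(public=True, max_age=31_536_000, immutable=True)

# Public API response: browsers keep it 60 s, the CDN keeps it 10 minutes.
api_header = cache_control(
    public=True, max_age=60, s_maxage=600, stale_while_revalidate=30
)
```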

8.7 Revalidation Flow

sequenceDiagram
	participant U as User Browser
	participant E as CDN Edge
	participant O as Origin

	U->>E: GET /app.js with validator
	E->>O: Revalidate with ETag / If-None-Match
	alt Not changed
		O-->>E: 304 Not Modified
		E-->>U: Cached body reused
	else Changed
		O-->>E: 200 New content
		E-->>U: New content cached and returned
	end

Revalidation avoids retransmitting full content when the content has not changed.
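
On the origin side, the decision reduces to comparing validators. The sketch below makes simplifying assumptions: the ETag is a truncated SHA-256 of the body, and `If-None-Match` is treated as a single strong validator, whereas real requests may carry a list of values, `*`, or weak validators. The function names are invented for the example.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    # ETags are quoted strings; the hash truncation is an arbitrary choice here.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_response(
    body: bytes, if_none_match: Optional[str]
) -> Tuple[int, Dict[str, str], bytes]:
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, {"ETag": etag}, b""   # unchanged: no body is sent
    return 200, {"ETag": etag}, body      # changed or first fetch: full body
```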

8.8 Personalized Content Challenges

CDNs are easy for public static assets. They are harder for personalized content.

Problems:

  • one user's data must not leak to another user
  • too many personalization dimensions can destroy cacheability
  • authentication headers or cookies may fragment cache keys badly

Common strategies:

  • cache only the shared shell, fetch personalized data separately
  • use edge logic to vary on a small safe set of dimensions
  • cache by versioned fragments instead of full pages
  • mark highly personalized responses as private or uncacheable at the shared edge

8.9 Dynamic Content Edge Strategies

Modern systems do not limit CDNs to images and CSS.

They often use edge caching for:

  • public API responses
  • HTML shell plus client-side personalized fetches
  • signed asset access
  • bot-resistant and rate-limited request handling
  • geographically optimized routing to nearest healthy origin

Google-like and Amazon-like large systems rely heavily on globally distributed frontends or edge layers because global latency is a real product problem, not just a backend benchmark problem.

8.10 Static Asset Delivery

Static asset delivery is the most successful CDN use case.

JS, CSS, and Image Delivery

Typical frontend/backend production flow:

  1. frontend build produces versioned asset filenames
  2. assets are uploaded to object storage or origin bucket
  3. CDN caches those assets globally
  4. HTML references versioned URLs
  5. browser and CDN cache them aggressively because names change on deploy

Versioned Asset Strategy

Versioned or content-hashed filenames solve invalidation elegantly.

Example:

  • app.8f3d2.js instead of app.js

If content changes, the filename changes. That means old caches remain valid for old references, while new deployments use new URLs.

This is one of the cleanest examples of version-based invalidation in production.
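
Build tools normally generate these names automatically. As an illustration of the principle only, the sketch below derives the name from a SHA-256 of the file content; the function name and the 8-character digest length are assumptions.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_asset_name(filename: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a content hash before the file extension, e.g. app.<hash>.js."""
    path = PurePosixPath(filename)
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))
```

Identical content always maps to the same name and any change produces a new one, so the URL itself carries the version.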

Immutable Asset Caching

If assets are content-addressed or versioned, you can safely use very long cache lifetimes and immutable caching directives.

That gives extremely high cache hit rates with almost no freshness downside.

Cache Busting

Cache busting means changing the URL when content changes so caches naturally treat the asset as new.

Good cache busting is usually versioned naming, not manual emergency purges for every deploy.

Compression Basics

CDNs and origins commonly use compression:

  • Gzip: common general-purpose compression
  • Brotli: often better compression for web assets, especially text assets

Why it matters:

  • lower transfer size
  • faster page loads
  • reduced bandwidth cost

Image Optimization Basics

Images dominate page weight in many systems.

Common CDN/image strategies:

  • resize images per device size
  • use modern formats where possible
  • compress aggressively without harming visible quality
  • cache multiple transformed variants at edge

Signed URLs Basics

Signed URLs allow protected asset access through time-limited or permission-scoped links.

This is common for:

  • private downloads
  • customer-specific files
  • media assets behind authorization rules

The CDN can still help, but the cache key and security model must be designed carefully.
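
Each CDN defines its own signing scheme, so the following is a generic HMAC-based sketch rather than any provider's format. The `expires` and `sig` parameter names, the message layout, and both function names are assumptions made for the example.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlsplit


def sign_url(path: str, expires_at: int, secret: bytes) -> str:
    message = f"{path}?expires={expires_at}".encode()
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"{path}?{urlencode({'expires': expires_at, 'sig': signature})}"


def verify_url(url: str, now: int, secret: bytes) -> bool:
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    try:
        expires_at = int(query["expires"][0])
        provided = query["sig"][0]
    except (KeyError, ValueError):
        return False  # missing or malformed parameters
    message = f"{parts.path}?expires={expires_at}".encode()
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    # Constant-time comparison, then the expiry check.
    return hmac.compare_digest(provided.encode(), expected.encode()) and now < expires_at
```

Because the signature and expiry appear in the query string, a cache that keys on the full URL stores one copy per signed link; many setups therefore exclude those parameters from the cache key after verifying them at the edge.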

8.11 Global Distribution

Global delivery changes architecture decisions.

Geo Routing

Geo routing directs users toward nearby or appropriate regions.

Why it matters:

  • shorter network round trips
  • better perceived performance
  • better regional failover options

Anycast Basics

Anycast is a routing technique where multiple edge locations advertise the same IP, and network routing sends the user to a nearby or efficient destination.

This matters because users do not manually choose an edge. Network routing steers them.

Regional Latency Reduction

If your origin is only in one region, every distant user pays long-haul network latency. CDNs reduce that for cacheable content, but truly dynamic uncached requests still pay the full round trip to the origin.

This is why global systems often pair CDNs with multi-region origins.

Multi-Region Architecture Impact

Once you have multiple origins or regions, the performance layer must interact with:

  • traffic steering
  • state locality
  • replication lag
  • failover policies
  • regional cache consistency

Failover Benefits

A good CDN and global routing layer can keep a regional origin issue from becoming a full global outage. Edge caches may continue serving stale or previously cached content while origins recover.

Origin Shielding Basics

Origin shielding adds an intermediate cache layer so many edge POPs do not all miss directly to origin. This is useful during viral events or large cache turnovers.

8.12 Real-World Examples

  • Netflix is the classic example of edge-heavy delivery for video content; the lesson is that moving content close to users dramatically changes scalability economics
  • Amazon-like e-commerce systems use CDNs for asset delivery, image optimization, and global storefront performance
  • GitHub-like systems use edge delivery for assets, release downloads, and parts of public web traffic
  • Stripe-like documentation, dashboards, and static resources benefit from aggressive CDN caching even when core payment flows remain origin-controlled
  • typical SaaS systems often keep app shells and static assets heavily cached while user-specific API calls remain dynamic

9. Cache Invalidation

Cache invalidation is the hardest part of the performance layer because it is where performance and correctness collide.

9.1 Why Cache Invalidation Is Hard

The famous joke says there are only two hard things in computer science: cache invalidation and naming things.

The practical meaning is this:

Once you copy data away from the source of truth, you have created multiple versions of reality. Now you must decide when old copies stop being acceptable.

That is hard because:

  • one source update may affect many cached views
  • invalidation can race with reads and writes
  • events can be delayed or lost
  • caches may exist at many layers: browser, CDN, local process, Redis
  • some views are aggregates, not direct copies of one row

9.2 Delete vs Update Strategies

There are two classic invalidation approaches after a write.

| Strategy | How it works | Strengths | Weaknesses |
| --- | --- | --- | --- |
| Delete on change | Remove cache entry after source update | Simple, avoids writing wrong value into cache | Next read is a miss, can trigger stampede |
| Update on change | Write new value into cache immediately | Better freshness, avoids immediate miss | Risk of dual-write inconsistency and more write overhead |

Delete is often simpler and safer. Update can be faster for read-after-write workloads.

9.3 Event-Driven Invalidation

In event-driven invalidation, the source-of-truth write publishes an event that tells caches or downstream services what changed.

Example flow:

  1. product price changes in database
  2. product service emits product.updated
  3. consumers remove or refresh relevant cache keys
  4. next read sees new data or repopulates with new value

This is powerful because it decouples writers from all readers and cached views. It is also operationally harder because events must be reliable enough.

9.4 Pub/Sub Invalidation

Lightweight invalidation often uses Pub/Sub.

This works when:

  • missed events are acceptable because TTL is a fallback
  • low latency matters
  • invalidation is best effort rather than strictly durable

It is weaker when you need guaranteed processing and replay.

9.5 Version-Based Invalidation

Version-based invalidation means the cache key or validator includes a version.

Examples:

  • user:123:v17
  • asset filename hash
  • ETag generated from content version

This is powerful because old cached entries naturally become irrelevant when the version changes.
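
A small sketch of versioned keys follows. It assumes the current version of each entity is cheap to look up (here, an in-memory map that starts at version 1); the class and method names are hypothetical.

```python
from typing import Dict, Optional


class VersionedCache:
    def __init__(self) -> None:
        self._versions: Dict[str, int] = {}    # current version per user
        self._entries: Dict[str, object] = {}  # cached values by versioned key

    def key_for(self, user_id: str) -> str:
        return f"user:{user_id}:v{self._versions.get(user_id, 1)}"

    def get(self, user_id: str) -> Optional[object]:
        return self._entries.get(self.key_for(user_id))

    def put(self, user_id: str, value: object) -> None:
        self._entries[self.key_for(user_id)] = value

    def bump_version(self, user_id: str) -> None:
        # Called on write: readers switch to a new key, the old entry is
        # never read again and is left to TTL or eviction.
        self._versions[user_id] = self._versions.get(user_id, 1) + 1
```

Nothing is deleted on a write. The cost is that superseded entries occupy memory until they expire or are evicted.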

It is extremely common in:

  • static assets
  • schema-aware API responses
  • derived views where version numbers are easy to compute

9.6 Tag-Based Invalidation

Tag-based invalidation groups related cache entries under logical tags.

Example:

  • product detail page, search results, category page, and recommendation widget all share the tag product:123

When the product changes, all content attached to that tag can be invalidated.

This is useful when one underlying object fans out into many cached representations.
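
One way to picture the mechanism is a reverse index from tag to cache keys. The in-memory sketch below is illustrative; `TaggedCache` and its methods are invented names, and CDNs or cache libraries that offer tag purging implement it in their own ways.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set


class TaggedCache:
    def __init__(self) -> None:
        self._values: Dict[str, object] = {}
        self._keys_by_tag: Dict[str, Set[str]] = defaultdict(set)

    def set(self, key: str, value: object, tags: Iterable[str] = ()) -> None:
        self._values[key] = value
        for tag in tags:
            self._keys_by_tag[tag].add(key)

    def get(self, key: str) -> Optional[object]:
        return self._values.get(key)

    def invalidate_tag(self, tag: str) -> int:
        """Remove every entry attached to the tag; return how many were removed."""
        removed = 0
        for key in self._keys_by_tag.pop(tag, set()):
            if key in self._values:
                del self._values[key]
                removed += 1
        return removed
```

For example, a product page and a search result can both carry the tag `product:123`, and a single `invalidate_tag("product:123")` call clears both.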

9.7 Dependency Invalidation

Many caches hold derived data, not raw rows.

Example:

  • homepage recommendations depend on user preferences, inventory, and pricing

Now invalidation is harder because one source change may invalidate multiple aggregates.

This is where dependency graphs, tags, or event fan-out matter.

9.8 Eventual Consistency Tradeoffs

In practice, many invalidation systems are eventually consistent.

That means for some brief period:

  • source of truth is updated
  • one cache is updated
  • another cache still has old data

Your job is to make that window safe.

Techniques include:

  • bounded TTL
  • read version checks
  • idempotent invalidation events
  • periodic repair or refresh jobs
  • treating stale responses as acceptable only for certain data types

9.9 Stale Read Mitigation

Good production systems do not assume invalidation is perfect. They layer safeguards.

Common mitigations:

  • short TTL for sensitive data
  • longer TTL plus event invalidation for less sensitive data
  • version numbers in payloads
  • client-side revalidation for edge content
  • cache bypass on critical user actions
  • observability for stale-read incidents

9.10 Cache Invalidation Flow

flowchart TD
	W[Write Request] --> DB[(Database Update)]
	DB --> EVT[Change Event]
	EVT --> INV[Invalidation Service]
	INV --> REDIS[Redis / App Cache Delete or Refresh]
	INV --> CDN[CDN Purge or Tag Invalidate]
	INV --> LOCAL[Local Service Cache Bust]
	REDIS --> NEXT[Next Read Rebuilds or Uses Fresh Value]
	CDN --> NEXT
	LOCAL --> NEXT

9.11 Real-World Invalidation Patterns

Typical production patterns:

  • product catalog systems invalidate product detail keys and search-result fragments when price or stock changes
  • GitHub-like public pages often rely on versioned assets and shorter-lived HTML or fragment caching
  • Stripe-like systems may avoid aggressive caching on the most correctness-sensitive payment paths but still use invalidation for dashboard and metadata views
  • typical SaaS apps invalidate tenant configuration, permissions, and dashboard aggregates via events plus TTL backstops

9.12 Best Practices and Common Mistakes

Best practices:

  • define source of truth clearly
  • design cache keys systematically
  • make invalidation events idempotent
  • use TTL as backup, not as the only correctness mechanism for important data
  • model dependencies explicitly for derived views

Common mistakes:

  • forgetting that one row update affects multiple cached views
  • using overly broad purges that destroy hit ratio
  • trusting best-effort invalidation for correctness-sensitive data
  • not planning what happens if the invalidation bus is down

10. How These Pieces Connect in Actual Architecture

The performance layer matters most when you can explain how the pieces work together, not just individually.

10.1 Typical SaaS Request Flow

sequenceDiagram
	participant Browser as Browser
	participant CDN as CDN Edge
	participant API as API Service
	participant Local as Local Cache
	participant Redis as Redis
	participant DB as Database
	participant Bus as Event Bus

	Browser->>CDN: Request app shell / assets
	alt Edge hit
		CDN-->>Browser: Cached asset or cached response
	else Edge miss
		CDN->>API: Forward request
		API->>Local: Lookup hot local entry
		alt Local hit
			Local-->>API: Value
		else Local miss
			API->>Redis: Lookup shared cache
			alt Redis hit
				Redis-->>API: Value
				API->>Local: Populate local cache
			else Redis miss
				API->>DB: Query source of truth
				DB-->>API: Fresh data
				API->>Redis: Store with TTL
				API->>Local: Populate
			end
		end
		API-->>CDN: Response
		CDN-->>Browser: Response
	end

	DB-->>Bus: Change event on writes
	Bus-->>Redis: Invalidate or refresh
	Bus-->>CDN: Purge / tag invalidation

This is what a realistic design conversation sounds like. The system is not "database plus Redis." It is a layered request path with distinct hit and miss behaviors.
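
The application-side part of that path (local cache, then shared cache, then database) can be sketched as below. Dictionaries stand in for both cache tiers, TTLs are omitted for brevity, and the returned tier label exists only to make the hit path visible; all names are assumptions.

```python
from typing import Callable, Dict, Tuple


def layered_get(
    key: str,
    local: Dict[str, object],
    shared: Dict[str, object],
    load_from_db: Callable[[str], object],
) -> Tuple[object, str]:
    """Return (value, tier) where tier is 'local', 'shared', or 'db'."""
    if key in local:
        return local[key], "local"
    if key in shared:
        local[key] = shared[key]      # promote into the local tier
        return shared[key], "shared"
    value = load_from_db(key)         # full miss: query the source of truth
    shared[key] = value               # populate the shared cache
    local[key] = value                # and the local cache
    return value, "db"
```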

10.2 What Breaks at Scale

As scale grows, the performance layer encounters these problems first:

  • one hot key overloads a single shard
  • local caches diverge across many instances
  • cache rebuilds overwhelm the database after deployments
  • CDN cache keys explode because of too many vary dimensions
  • invalidation lags behind writes during incident conditions
  • eviction removes exactly the hottest working set
  • global traffic shifts expose region-specific cold caches

A strong answer in interviews is to identify not only the optimization but also the failure mode created by that optimization.

10.3 Performance Layer Design Heuristics

Use a CDN when:

  • content is static or moderately cacheable
  • users are globally distributed
  • origin offload matters

Use local cache when:

  • objects are extremely hot
  • a network hop is still too expensive
  • slight inconsistency is acceptable

Use distributed cache when:

  • many app instances need shared fast access
  • backend misses are expensive
  • TTL and invalidation can be managed

Use careful invalidation instead of blind long TTLs when:

  • data changes matter to users or correctness

Do not add caching yet when:

  • traffic is low
  • the real bottleneck is poor query design or bad indexes
  • correctness cost exceeds performance benefit

11. Interview Discussion Guide

11.1 Common Interview Questions and Strong Answer Angles

| Question | What a strong answer should include |
| --- | --- |
| Why add a cache? | Latency reduction, DB offload, compute savings, scalability, hot-key behavior |
| Redis or Memcached? | Workload fit, feature needs, durability expectations, simplicity tradeoff |
| Cache-aside or write-through? | Read/write mix, freshness needs, miss penalty, write latency impact |
| What happens on cache failure? | Fallback behavior, database protection, rate limits, degraded mode |
| Why is invalidation hard? | Multiple copies of data, derived views, event loss, race conditions, multi-layer caches |
| How do you choose TTL? | Volatility, business tolerance for staleness, hit ratio, backend cost, jitter |
| What breaks at scale? | Stampedes, hot keys, eviction under pressure, stale reads, regional cold starts |
| Why use a CDN? | Edge latency reduction, origin offload, static asset delivery, global availability |

11.2 A Strong Interview Structure

When asked about any performance-layer component, answer in this order:

  1. what problem it solves
  2. where it sits in the architecture
  3. how it works on the request path
  4. tradeoffs and failure modes
  5. what you would monitor in production

That structure works for caching, Redis, Memcached, TTL, eviction, CDN, and invalidation.

11.3 Metrics You Should Mention

For caches:

  • hit ratio
  • miss latency
  • eviction rate
  • memory usage
  • hot key distribution
  • rebuild rate after expiry
  • stale read complaints or mismatches

For CDNs:

  • edge hit ratio
  • origin fetch rate
  • regional latency
  • revalidation rate
  • cache-key cardinality
  • purge success and propagation time

11.4 Final Mental Checklist

Ask yourself these questions in every design:

  • what data is read repeatedly
  • which reads can tolerate staleness
  • what happens on cache miss
  • what happens when the cache is down
  • how are writes reflected in cached views
  • how are hot keys handled
  • how do global users get low latency
  • how do you keep the source of truth safe even if the performance layer fails

12. Final Takeaways

The performance layer is about moving repeated work away from expensive systems and closer to the user.

Caching reduces latency, protects databases, and lowers cost. Redis gives a rich and fast in-memory platform for many backend patterns. Memcached remains a strong option for pure distributed caching. TTL and eviction policy determine whether the cache behaves like an asset or a liability. CDNs extend caching to the edge and fundamentally change global performance. Invalidation is the price you pay for speed, and it must be treated as a core design problem rather than an afterthought.

In interviews, the goal is not just to say "use cache." The goal is to explain:

  • why the cache exists
  • where it sits in the request path
  • how reads and writes behave
  • what can go stale
  • what breaks at scale
  • how production systems stay safe when the performance layer fails

That is what separates glossary knowledge from engineering understanding.