tarun-elango 26810e43d0 sd text

2026-04-26 13:27:19 -04:00

Performance Layer

The performance layer is the part of system design that turns a system from merely correct into fast, cost-efficient, and resilient under real traffic. In interviews, candidates often describe databases, queues, and load balancers, but they stop short of explaining how the system stays responsive when one hot endpoint receives millions of requests, when one viral product page suddenly becomes the top key in the fleet, or when static assets must be served globally at low latency.

That gap is the performance layer.

At a practical level, the performance layer is made of techniques and systems that sit between raw demand and expensive work. It includes in-process caches, distributed caches like Redis or Memcached, CDNs, edge caches, TTL strategy, eviction policy, and invalidation mechanisms. These components exist because real backends cannot afford to recompute, reread, or retransmit everything from the source of truth on every request.

This guide is written for two goals at once:

interview preparation, where you need to explain tradeoffs clearly
real backend engineering, where you need to build systems that remain stable at scale

The focus is not on memorizing buzzwords. The focus is understanding why these systems exist, how they work internally, what breaks at scale, and how to discuss them like an engineer who has operated them.

Examples in this guide are generalized from common industry patterns and public engineering discussions from companies such as Google, Netflix, Uber, Amazon, GitHub, Stripe, and large SaaS platforms.

1. Big Picture: What the Performance Layer Actually Does

Every backend has some expensive path:

reading from a database
computing personalized recommendations
rendering a large page or API payload
resizing images
loading product catalog data
checking permissions repeatedly
serving static assets to users across continents

If every request goes all the way to the expensive source, the system eventually fails one of these goals:

latency becomes too high
database load becomes too high
compute cost becomes too high
throughput becomes too low
spikes become unmanageable

The performance layer exists to absorb repeated work.

1.1 Performance Layer in a Real Architecture

flowchart LR
	U[User / Browser] --> B[Browser Cache]
	B --> DNS[DNS / Global Routing]
	DNS --> CDN[CDN Edge]
	CDN --> LB[Load Balancer / API Gateway]
	LB --> APP[Application Service]
	APP --> LC[Local In-Process Cache]
	APP --> RC[(Redis / Memcached)]
	APP --> DB[(Primary Database)]
	DB --> REP[(Read Replicas / Search / Derived Stores)]
	DB --> BUS[Event Bus / Invalidation Stream]
	BUS --> RC
	BUS --> CDN

This diagram captures an important idea: performance is not a single cache. It is a stack.

the browser may cache assets or API responses
the CDN may answer without touching your origin
the application may use local in-memory caching for extremely hot keys
a distributed cache may prevent repeated database reads
invalidation events keep caches aligned with changing data

1.2 Why Interviewers Care About This Layer

Interviewers use performance-layer questions to test whether you understand the difference between a toy design and a production design.

A weak answer sounds like this:

"I would use a database and maybe Redis for caching."

A strong answer sounds like this:

"I expect read-heavy traffic with hot keys, so I would place a distributed cache in front of the database, possibly keep a small local cache inside each service instance for ultra-hot objects, use TTL plus event-driven invalidation to balance freshness and load, and put static assets behind a CDN to reduce origin traffic globally. Then I would discuss what happens during cache misses, stampedes, stale reads, and regional failover."

That answer shows system-level thinking.

1.3 Core Mental Model

The performance layer trades one form of complexity for another.

It improves:

latency
throughput
cost
resilience to spikes

But it introduces:

stale data risk
invalidation complexity
memory pressure
partial failure modes
operational tuning work

If you remember one sentence from this guide, remember this:

Performance systems are valuable because they turn expensive operations into cheap lookups, but they make correctness and freshness harder.

2. Caching Fundamentals

Caching is the core concept of the performance layer.

2.1 What Caching Is

A cache stores the result of an expensive operation so that future requests can avoid doing the expensive work again.

That expensive work might be:

a database query
an API response assembly step
a rendered HTML fragment
a permission lookup
a session lookup
a computed leaderboard
a static file fetch from origin

The basic idea is simple, but the system effect is huge. If a result is requested many times, caching changes the system from repeatedly paying the full cost to paying it once and reusing it.

2.2 Why Caching Exists

Goal	What caching improves	Real production effect
Reduce latency	Serves data from memory or from a nearby edge	Reads that often take a millisecond or less from Redis may take tens to hundreds of milliseconds from a database or remote service
Reduce database load	Prevents repeated reads for the same objects	Fewer DB connections, less lock contention, lower CPU, fewer read replicas
Reduce compute cost	Avoids re-running expensive business logic or rendering	Lower application CPU and lower cloud spend
Improve scalability	Lets the same backend support more traffic	A backend that saturates at 20k RPS of source reads can serve roughly 10x that traffic if about 90 percent of reads hit cache
Smooth spikes	Absorbs bursts on hot keys	Viral traffic becomes manageable instead of overwhelming the source of truth

2.3 Hot Paths and Hot Data

Most traffic is not evenly distributed.

In real systems, a small fraction of endpoints, users, products, or assets often receives a large fraction of total traffic. This is why caching is so effective.

Examples:

an e-commerce homepage and a few trending product pages get disproportionate reads
a GitHub repository landing page gets far more traffic after a release or public announcement
a Stripe dashboard loads the same account metadata repeatedly during one user session
a ride-sharing system repeatedly reads nearby-driver state for a busy downtown area

This concentration of demand creates hot paths and hot data.

hot path: an endpoint or execution path hit extremely often
hot data: specific keys or objects requested repeatedly

Performance engineering often starts by identifying those hot spots and caching them before scaling the entire system blindly.

2.4 Cache Hit vs Cache Miss

Term	Meaning	Why it matters
Cache hit	The requested item is found in cache	Fast path, cheap path
Cache miss	The item is not in cache	Slow path, must fetch from source
Hit ratio	Fraction of requests served from cache	One of the main health metrics for a cache-backed system
Miss penalty	Extra latency and source load caused by a miss	Critical when the backend is expensive or fragile

You should think of a cache miss as more than a slower request. A miss is also a source-of-truth request. At scale, misses multiply into pressure on databases, services, and storage layers.

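A 95 percent hit ratio still sends 5 percent of traffic to the source, and that 5 percent is the load the source is sized for. During a cache flush or a deployment that empties the cache, the miss rate can jump toward 100 percent, roughly 20 times the normal source load, and that moment decides whether the system stays alive.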
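The arithmetic behind hit ratio and miss penalty can be sketched in a few lines. The function names and the traffic and latency figures below are illustrative assumptions, not measurements from any real system.

```python
def source_rps(total_rps: float, hit_ratio: float) -> float:
    """Requests per second that fall through to the source of truth."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return total_rps * (1.0 - hit_ratio)


def expected_latency_ms(hit_ratio: float, hit_ms: float, miss_ms: float) -> float:
    """Average latency when hits and misses have different costs."""
    return hit_ratio * hit_ms + (1.0 - hit_ratio) * miss_ms


# Hypothetical workload: 20k RPS, 1 ms cache hit, 50 ms database read.
steady = source_rps(20_000, 0.95)    # about 1,000 RPS reach the database
flushed = source_rps(20_000, 0.0)    # all 20,000 RPS reach the database
average = expected_latency_ms(0.95, hit_ms=1.0, miss_ms=50.0)
```

Under these assumed numbers, an empty cache multiplies database traffic by about 20, which is why the miss path needs its own capacity plan.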

2.5 Cache Lifecycle

flowchart LR
	REQ[Request Arrives] --> LOOKUP{Item In Cache?}
	LOOKUP -- Yes --> HIT[Return Cached Value]
	LOOKUP -- No --> LOAD[Load From Source]
	LOAD --> STORE[Store In Cache]
	STORE --> RESP[Return Response]
	HIT --> TTL[Wait Until TTL / Invalidation / Eviction]
	RESP --> TTL
	TTL --> EXPIRE[Item Expires or Is Removed]
	EXPIRE --> LOOKUP

This looks circular because it is. A cache is not a one-time optimization. It is a continuous lifecycle of population, use, staleness, and removal.
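The lifecycle can be sketched as a tiny in-process cache. This is a minimal illustration, assuming lazy expiry on read and no size limit; `TTLCache` and its injectable `clock` parameter are hypothetical names, not a library API.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-process cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (expires_at, value)
        self._items: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None                  # miss: never stored or already removed
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]         # expired: removed lazily on read
            return None
        return value                     # hit

    def set(self, key: str, value: Any) -> None:
        self._items[key] = (self._clock() + self._ttl, value)

    def invalidate(self, key: str) -> None:
        self._items.pop(key, None)       # explicit removal, for example after a write
```

Because `get` returns `None` on a miss, this sketch cannot tell a cached `None` apart from a missing key. A real cache would also need a size bound and an eviction policy.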

2.6 Cache Warming and Cold Starts

Cache warming means pre-populating a cache before traffic arrives or before a new deployment begins serving traffic.

Cold start means the cache is empty or mostly empty.

Why cold starts are dangerous:

traffic suddenly falls through to the database
p99 latency jumps sharply
a previously healthy backend gets overloaded
autoscaling can make it worse because new instances start with empty local caches

Typical warming strategies:

prefill the top N hot keys after deployment
replay recent hot key logs into the cache
keep a warm standby cache fleet during migration
gradually shift traffic to fresh instances

Production example:

An e-commerce system may warm the homepage, popular category pages, top products, and pricing metadata before turning on a new region. A SaaS dashboard may warm account summaries and permission maps for large enterprise tenants.

2.7 Multi-Layer Caching

Most large systems use more than one cache layer.

Layer	Where it lives	Strengths	Weaknesses
Browser cache	On the client	Zero origin cost, extremely low latency	Harder to invalidate precisely, less control
CDN cache	At edge POPs	Global latency reduction, origin offload	Works best for cacheable and moderately shared content
Local cache	Inside service process	Fastest server-side lookup, no network hop	Small, per-instance inconsistency, lost on restart
Distributed cache	Remote shared cache like Redis or Memcached	Shared across instances, larger capacity	Network hop, operational overhead

Multi-layer caching exists because the cheapest cache is the one closest to the requester. The closer you can answer, the less network, compute, and backend work you do.

2.8 Local Cache vs Distributed Cache

Dimension	Local Cache	Distributed Cache
Latency	Extremely low	Low but includes network hop
Scope	Single process or instance	Shared across many instances
Consistency	Weak across instances	Better shared visibility
Capacity	Limited by process memory	Larger dedicated memory pool
Failure behavior	Lost on process restart	Survives app restarts but is its own dependency
Good for	Ultra-hot config, permission snapshots, small metadata	Shared sessions, popular entities, counters, rate limits

Strong production designs often combine them. For example:

local cache for 100 hottest objects per instance
Redis for shared hot data across the fleet
database for source of truth
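A minimal sketch of that layered read, using plain dictionaries as stand-ins for the local cache and the shared cache. The name `tiered_get` and the `load_from_db` callback are assumptions made for illustration; a real version would bound the local cache and apply TTLs at both layers.

```python
from typing import Any, Callable, MutableMapping


def tiered_get(key: str,
               local: MutableMapping[str, Any],
               shared: MutableMapping[str, Any],
               load_from_db: Callable[[str], Any]) -> Any:
    if key in local:
        return local[key]              # fastest path: no network hop
    value = shared.get(key)
    if value is None:
        value = load_from_db(key)      # slowest path: source of truth
        shared[key] = value            # populate the shared layer for the fleet
    local[key] = value                 # promote so the next read here is local
    return value
```

Two service instances with separate `local` maps and one `shared` map would trigger a single database read for the same key, which is the effect the layering is meant to achieve.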

2.9 How Caching Changes Overall Architecture

Caching does not just make a request faster. It changes the control flow of the system.

Without caching:

every read goes to the source of truth
latency scales with database or service performance
spikes directly impact the backend

With caching:

reads split into hit path and miss path
miss handling becomes a core part of correctness
invalidation and freshness become first-class architecture concerns
partial outages can be hidden or amplified depending on cache behavior

That is why caching is never "just add Redis." It is an architectural decision that affects reads, writes, deployments, incident response, and observability.

2.10 Consistency Challenges and Stale Data Tradeoffs

Caches are usually not the source of truth. That means there is always a risk that the cache contains old data.

The key design question is not "Can stale data happen?" The answer is almost always yes.

The real questions are:

how stale can data be before users notice or correctness breaks
how quickly do updates need to propagate
what should happen if invalidation fails
which data can tolerate eventual consistency and which cannot

Examples:

stale profile photos are usually fine
stale inventory counts can cause overselling
stale permission or fraud state can become a security problem
stale pricing can create financial or legal issues

Strong engineers discuss stale data in business terms, not just technical terms.

2.11 Failure Patterns Every Interviewer Expects

Failure pattern	What it means	What it looks like in production	Common mitigations
Cache stampede	Many requests miss the same key and all rebuild it	DB traffic spike after key expiry	Request coalescing, single-flight, locks, soft TTL, background refresh
Cache penetration	Requests repeatedly ask for keys that do not exist	Attack or bug causes endless DB misses	Negative caching, bloom filters, request validation
Cache avalanche	Many keys expire simultaneously	Sudden backend overload when a batch of TTLs ends together	TTL jitter, staggered expiration, warmup, traffic shaping
Hot key overload	One key becomes extremely popular	One Redis shard or service instance gets overloaded	Key replication, local cache, hot key splitting (consistent hashing alone does not help, because a single key still maps to one node)
Stale data leak	Invalidation fails or is delayed	Users see old values after updates	Version checks, event-driven invalidation, bounded TTLs, read repair

These are not interview-only concepts. They show up in real incidents.
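Request coalescing, the first stampede mitigation in the table, can be sketched within one process as follows. This is a simplified illustration with hypothetical class names; it coalesces callers inside a single process only and does not coordinate across service instances.

```python
import threading
from typing import Any, Callable, Dict, Optional


class _Call:
    """State shared between the leader and the callers waiting on it."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: Any = None
        self.error: Optional[Exception] = None


class SingleFlight:
    """Coalesce concurrent loads of the same key into one call."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._inflight: Dict[str, _Call] = {}

    def do(self, key: str, fn: Callable[[], Any]) -> Any:
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._inflight[key] = call
        if leader:
            try:
                call.result = fn()           # only the leader does the expensive work
            except Exception as exc:
                call.error = exc
            finally:
                with self._lock:
                    del self._inflight[key]
                call.done.set()              # wake every waiting caller
        else:
            call.done.wait()                 # followers reuse the leader's outcome
        if call.error is not None:
            raise call.error
        return call.result
```

A caller would wrap the miss path, for example `flight.do(key, lambda: load_from_db(key))`, so that a burst of misses on one key produces one source read per process instead of one per request.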

2.12 Best Practices and Common Mistakes

Best practices:

cache only where the access pattern justifies it
measure hit ratio, miss latency, key size, and source-of-truth fallback rate
define ownership of cache invalidation explicitly
use TTL jitter to avoid synchronized expiration storms
keep serialized values compact and stable
protect the miss path with rate limits, request coalescing, or backpressure
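TTL jitter is simple to apply at the point where a key is written. A minimal sketch, assuming a symmetric random spread around the base TTL; `jittered_ttl` and the 10 percent default are illustrative choices, not a standard.

```python
import random
from typing import Optional


def jittered_ttl(base_seconds: float,
                 jitter_fraction: float = 0.1,
                 rng: Optional[random.Random] = None) -> float:
    """Return base_seconds shifted by up to +/- jitter_fraction of itself."""
    source = rng if rng is not None else random
    spread = base_seconds * jitter_fraction
    return base_seconds + source.uniform(-spread, spread)


# Keys written in the same batch now expire across a 270-330 second range
# instead of all at exactly 300 seconds.
ttl = jittered_ttl(300)
```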

Common mistakes:

caching everything without understanding access patterns
caching data that must be strongly consistent without a freshness plan
letting huge values or huge key cardinality blow up memory
using one global TTL for all data types
ignoring cache outage behavior
forgetting that a cache flush can look like a DDoS against your database

3. Redis

Redis is one of the most widely used performance-layer systems in backend engineering.

3.1 What Redis Is

Redis is an in-memory data structure store commonly used as a distributed cache, fast key-value store, rate limiter, session store, coordination primitive, leaderboard engine, and lightweight stream or queue component.

It became popular because it combines several valuable traits:

very low latency
simple operational model for many use cases
multiple built-in data structures
high developer productivity
strong ecosystem support

In practice, Redis often becomes the first non-database data system teams add when they need more performance.

3.2 Why Redis Is Widely Used

Redis is useful because it covers many common backend problems with one fast system.

Examples:

caching user profiles or product metadata
storing web sessions
implementing rate limiting per user, token, or IP
maintaining rolling counters and quotas
computing leaderboards with sorted sets
distributing lightweight invalidation or coordination events
holding ephemeral state for jobs, retries, and workflows

A big reason engineers like Redis is that it is often fast enough to move a system from struggling to healthy without a major architectural rewrite.

3.3 In-Memory Architecture

Redis keeps its primary working dataset in memory. That is the key reason it is fast.

Compared to disk-backed databases:

memory access is much faster than disk I/O
data structures can be updated quickly with minimal indirection
many operations are constant time or logarithmic time

But in-memory design also brings constraints:

memory is expensive compared to disk
total dataset size is limited by RAM
persistence must be carefully designed if data matters
large keys and fragmentation can become operational issues

Redis works best when the working set fits comfortably in memory and when the system is comfortable with Redis being a fast data layer rather than the ultimate durable source of truth.

3.4 Single-Threaded Event Loop Concept

Historically, Redis is famous for a mostly single-threaded command execution model.

That sounds like a weakness at first, but the intuition matters.

Why single-threaded command execution helped Redis:

avoids lock contention inside the core data path
simplifies internal state management
keeps operations predictable
reduces complexity in common use cases

The model is roughly:

accept network requests
parse commands
execute commands against in-memory data structures
write responses back to clients

Because the operations happen in memory and avoid heavy internal locking, the system can be extremely fast.

Important nuance for interviews:

Redis 6 and later can use additional threads for network I/O, and background threads handle tasks such as lazy freeing, but commands on a given instance are still executed by a single main thread. The point is not the slogan "single-threaded" by itself. The point is why that model worked: low coordination overhead on an in-memory data path.

3.5 Persistence Basics

Redis is often used as a cache, but it also supports persistence.

The two core persistence ideas are:

Mechanism	What it does	Strengths	Weaknesses
RDB snapshot	Periodically writes a point-in-time snapshot to disk	Compact, good for backups and restart speed	Can lose recent writes between snapshots
AOF append-only file	Logs write operations as they happen	Better durability, more recent recovery	Larger files, rewrite complexity, more I/O

Many deployments combine both.

Interview framing:

If Redis is used only as a cache, persistence may be optional. If it stores critical ephemeral state like sessions, counters, or queues that you care about recovering, persistence and replication matter much more.

3.6 Redis Data Structures and Why They Matter

Redis is not just a string map. Its data structures are a large part of why it is useful.

Data structure	Intuition	Common operations	Production use cases
Strings	Simplest key to value mapping	GET, SET, INCR	General caching, counters, feature flags, serialized objects
Hashes	Small field map under one key	HGET, HSET	User/session metadata, grouped object fields
Lists	Ordered sequence with push/pop	LPUSH, RPUSH, LPOP	Simple queues, activity buffers, recent events
Sets	Unordered unique members	SADD, SISMEMBER	Membership checks, tags, deduplication
Sorted sets	Unique members with score ordering	ZADD, ZRANGE	Leaderboards, ranking, delayed tasks, time windows
Bitmaps	Bit-level state compactly stored	SETBIT, BITCOUNT	Presence flags, lightweight analytics, feature rollout markers
Streams	Append-only log with consumer groups	XADD, XREADGROUP	Event pipelines, work distribution, ordered consumption

Strings

Strings are the default choice for caching serialized values such as JSON or protobuf blobs.

Why teams use them:

easiest operationally
flexible schema at the application layer
supports TTL directly
works well for counters with atomic increment operations

Hashes

Hashes are useful when you want multiple fields grouped under one logical key. They can reduce duplication and sometimes improve ergonomics for partial field access.

Example:

session:123 with fields like user_id, expires_at, role

Lists

Lists are good for queue-like behavior, but teams should be careful not to overuse Redis lists as a full durable queue platform when stronger guarantees are required.

Sets

Sets give fast membership testing.

Example:

which users have access to a beta feature
which object IDs were already processed

Sorted Sets

Sorted sets are one of Redis's most powerful primitives.

They maintain a set of members ordered by score. This makes them ideal for:

leaderboards
ranking systems
top-N queries
sliding windows for rate limiting
scheduling delayed tasks by timestamp

Bitmaps

Bitmaps are memory-efficient when you need large boolean state spaces.

Example:

whether a user ID belongs to a cohort
whether an event occurred on a given day

Streams

Streams provide an append-only structure with consumer groups and replay semantics. They are useful when you want lightweight log-like behavior in Redis.

They are helpful, but you should not automatically replace Kafka or other durable log systems with Redis Streams for large-scale, long-retention event pipelines.

3.7 Pub/Sub Basics

Redis Pub/Sub allows publishers to send messages to channels and subscribers to receive them.

This is useful for lightweight fan-out such as:

cache invalidation notifications
internal live updates
ephemeral coordination signals

But Pub/Sub is not durable messaging. Delivery is at-most-once: a message published while a subscriber is disconnected is never delivered to that subscriber later. That makes it suitable for best-effort signaling, not for business-critical guaranteed delivery.

3.8 Distributed Locks Basics

Redis is often used for lightweight distributed locking.

Typical use case:

ensure only one worker rebuilds a hot cache key
avoid duplicate job execution
coordinate a small critical section across nodes

The basic pattern is setting a key with a TTL only if it does not already exist.

Important caution:

Distributed locking is easy to misuse. If the lock expires too early, if clients pause, or if ownership is not verified on release, correctness bugs appear. For critical correctness, database transactions or purpose-built coordination systems may be safer.
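The ownership problem is easier to see in code. The sketch below models the lock semantics with an in-memory stand-in rather than a real Redis client; `InMemoryLockStore`, `acquire`, and `release` are hypothetical names. On Redis, acquiring typically maps to `SET key token NX PX <milliseconds>`, and releasing is usually done with a small script that deletes the key only if it still holds the caller's token.

```python
import secrets
import time
from typing import Callable, Dict, Optional, Tuple


class InMemoryLockStore:
    """Stand-in for a store with set-if-absent plus expiry (not thread-safe)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}   # key -> (token, expires_at)

    def set_if_absent(self, key: str, token: str, ttl_seconds: float) -> bool:
        now = self._clock()
        current = self._data.get(key)
        if current is not None and current[1] > now:
            return False                     # an unexpired lock is already held
        self._data[key] = (token, now + ttl_seconds)
        return True

    def delete_if_equal(self, key: str, token: str) -> bool:
        current = self._data.get(key)
        if current is None or current[1] <= self._clock():
            return False                     # nothing held, or it already expired
        if current[0] != token:
            return False                     # held by a different owner
        del self._data[key]
        return True


def acquire(store: InMemoryLockStore, key: str, ttl_seconds: float) -> Optional[str]:
    token = secrets.token_hex(16)            # unique per acquisition
    return token if store.set_if_absent(key, token, ttl_seconds) else None


def release(store: InMemoryLockStore, key: str, token: str) -> bool:
    return store.delete_if_equal(key, token)
```

If a worker's lock expires while it is paused and another worker acquires it, the first worker's `release` returns `False` instead of deleting the second worker's lock. Without the token check, the late release would silently unlock someone else's critical section.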

3.9 Common Redis Use Cases

Use case	Why Redis fits
Rate limiting	Atomic increments and expirations are simple and fast
Session storage	Low latency key lookup with TTL
Leaderboards	Sorted sets make ranking natural
Caching	Memory speed plus TTL support
Lightweight queues	Lists or streams for simple work pipelines
Token or OTP storage	Fast expiry-based ephemeral data
Idempotency keys	Short-lived state for duplicate request protection

Rate Limiting

Redis is common for rate limiting because counters and expirations are easy to implement atomically.

Examples:

100 requests per minute per API key
5 login attempts per 10 minutes per account
per-IP abuse protection at API gateway or edge layer
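A fixed-window counter is the simplest form of this idea. The sketch below models it in memory so it runs without a server; `FixedWindowLimiter` is a hypothetical class. On Redis the same logic is usually an `INCR` on a per-window key followed by an `EXPIRE`, so old windows disappear on their own.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow at most `limit` requests per subject in each fixed time window."""

    def __init__(self, limit: int, window_seconds: int,
                 clock: Callable[[], float] = time.time) -> None:
        self._limit = limit
        self._window = window_seconds
        self._clock = clock
        self._counts: Dict[Tuple[str, int], int] = {}

    def allow(self, subject: str) -> bool:
        window_id = int(self._clock() // self._window)
        # Drop counters from earlier windows; a Redis key would simply expire.
        self._counts = {k: v for k, v in self._counts.items() if k[1] == window_id}
        key = (subject, window_id)
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key] <= self._limit


limiter = FixedWindowLimiter(limit=100, window_seconds=60)
allowed = limiter.allow("api-key-123")   # True until the 101st call in a window
```

Fixed windows are easy to reason about but allow a burst of up to twice the limit around a window boundary. Sliding-window variants trade extra state for a smoother limit.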

Session Storage

Redis is widely used for session storage because sessions are read frequently, written occasionally, and usually have natural expiration.

Typical SaaS pattern:

app server reads session by token
Redis returns session data quickly
TTL naturally removes expired sessions

Leaderboards

Games, social apps, and competition systems often use sorted sets for leaderboards because rank queries and top-N retrieval are natural operations.

Queues Basics

Redis can be used for simple queueing, retries, and scheduled jobs.

It is often a good fit when:

throughput is moderate
retention is short
operational simplicity matters

It is a weaker fit when:

you need long retention
you need strong replay guarantees
event history matters deeply
consumer scaling and durability are primary concerns

3.10 Replication Basics

Redis commonly uses primary-replica replication.

Why replicas exist:

improve read scale
improve availability
reduce data loss risk during failure

Tradeoff:

replication is typically asynchronous, so replicas may lag. That means stale reads are possible.

For many cache use cases, that is acceptable. For correctness-sensitive use cases, that must be discussed explicitly.

3.11 Sentinel Basics

Redis Sentinel monitors Redis instances and helps automate failover for primary-replica setups.

What Sentinel does:

health checks
failure detection
leader election among Sentinel nodes
promoting a replica to primary
updating clients or discovery mechanisms

Sentinel matters when you want high availability without full Redis Cluster complexity, especially for simpler primary-replica deployments.

3.12 Redis Cluster Basics

Redis Cluster provides sharding across multiple nodes.

Why it exists:

a single Redis node has memory and throughput limits
large workloads need horizontal scale

Cluster maps every key to one of 16,384 hash slots and assigns ranges of slots to nodes. That spreads memory and traffic across nodes.

Tradeoffs:

operations spanning multiple keys become more constrained
some application logic must be shard-aware
hot keys can still overload one shard
operational complexity increases

Cluster helps with capacity and throughput, but it does not magically eliminate key-distribution problems.
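The key-to-slot mapping helps explain both the multi-key constraint and the hot key problem. The sketch below follows the rule described in the Redis Cluster specification: CRC16 of the key modulo 16384, hashing only the text inside the first `{...}` when a non-empty hash tag is present. It assumes Python's `binascii.crc_hqx` with an initial value of 0 matches the CRC16 variant in the specification, whose documented check value for "123456789" is 0x31C3. The function name `key_slot` is a hypothetical helper.

```python
import binascii

HASH_SLOTS = 16384


def key_slot(key: str) -> int:
    """Compute the cluster hash slot for a key, honoring hash tags."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end != -1 and end > start + 1:
            data = data[start + 1:end]      # hash only the non-empty tag content
    return binascii.crc_hqx(data, 0) % HASH_SLOTS


# Keys sharing a hash tag land in the same slot, so multi-key commands work.
same_slot = key_slot("{user1000}.following") == key_slot("{user1000}.followers")
```

A single key always maps to exactly one slot, which is why adding nodes spreads the keyspace but does nothing for one extremely hot key.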

3.13 Memory Management Considerations

Redis performance problems often become memory problems.

Things engineers must watch:

maxmemory limits
eviction behavior under pressure
large keys or huge collections
fragmentation overhead
persistence overhead during fork or rewrite
replication buffers
serialization bloat

Bad Redis incidents often come from not respecting memory reality.

Examples:

storing enormous JSON blobs under one key
letting key cardinality grow without bounds
forgetting that snapshots or AOF rewrites need extra memory headroom
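assuming TTL means memory disappears immediately and uniformly (Redis removes expired keys lazily on access and through periodic sampling, so expired data can linger)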

3.14 When Redis Should Not Be Used

Do not use Redis when:

the dataset does not fit comfortably in memory
you need the primary source of truth for large durable data
you need complex relational queries or joins
you need very strong durability guarantees with minimal write loss tolerance
you need long-lived event storage and replay at log-system scale
you cannot tolerate cache/data loss but are treating Redis like a cheap database

Redis is excellent, but many outages come from stretching it past its natural use case.

3.15 Real-World Patterns

Generalized production patterns you will see repeatedly:

Amazon-like e-commerce systems cache product and pricing metadata, but keep order and payment state in durable databases
GitHub-like systems use caching for repository page composition and rate limiting, but not as the source of truth for repository metadata
Stripe-like systems may use Redis for short-lived idempotency, fraud throttles, or session-like state, while preserving financial correctness in durable transactional stores
Uber-like systems use fast data systems for hot operational state and rate control, while durable systems preserve business records and historical data

4. Memcached

Memcached is another classic distributed caching system.

4.1 What Memcached Is

Memcached is a high-performance, memory-only, distributed cache built around a simpler model than Redis.

It is focused primarily on one job: caching values in memory and serving them fast.

That focus is why many companies historically used it heavily for large-scale read caching.

4.2 How Memcached Differs in Spirit from Redis

Redis evolved into a multi-purpose in-memory data system.

Memcached stayed closer to a simple cache appliance.

That means:

fewer built-in data structures
less feature breadth
simpler mental model
often lower overhead for straightforward cache workloads

4.3 Simple Distributed Caching Model

A classic Memcached deployment is made of many independent cache nodes. The client typically decides which node holds a given key using hashing.

This model is simple:

key arrives at the application
application hashes the key
application sends request to the selected Memcached node
node stores or returns the value

There is usually less server-side coordination than in more feature-rich clustered systems.
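The client-side routing steps can be sketched as follows. This is a simplified illustration using modulo hashing; `pick_node` and the node addresses are hypothetical, and production clients commonly use consistent hashing instead so that adding or removing a node remaps only a fraction of keys.

```python
import hashlib
from typing import List


def pick_node(key: str, nodes: List[str]) -> str:
    """Choose a cache node for a key with a stable hash and modulo."""
    if not nodes:
        raise ValueError("at least one node is required")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


# Hypothetical node list; every client must use the same list and order.
nodes = ["cache-1:11211", "cache-2:11211", "cache-3:11211"]
target = pick_node("product:42", nodes)
```

The servers never talk to each other in this model. Correct routing depends entirely on every client agreeing on the node list and the hash function.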

4.4 Memory-Only Behavior

Memcached is memory-only. It is not designed as a durable store.

This is important conceptually:

it is a pure performance layer
if it restarts, cached data is gone
that is acceptable because the source of truth should be elsewhere

This simplicity is powerful when your cache is truly disposable.

4.5 Slab Allocation Basics

One important internal concept in Memcached is slab allocation.

The cache groups memory into classes of fixed-size chunks so that similarly sized objects can be stored efficiently.

Why this exists:

general-purpose memory allocation can fragment under heavy cache churn
cache workloads often involve huge numbers of similarly sized objects
fixed allocation classes improve speed and predictability

Tradeoff:

if object sizes do not fit slab classes well, memory can be wasted through internal fragmentation

This is a good example of a design optimized specifically for caching rather than for general-purpose data structures.
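The internal fragmentation tradeoff can be shown with a small model. This sketch only illustrates the idea of size classes; the chunk sizes and the growth factor of 2.0 are chosen to keep the numbers readable and are not Memcached's actual values, and both function names are hypothetical.

```python
from bisect import bisect_left
from typing import List


def slab_classes(min_chunk: int, factor: float, max_chunk: int) -> List[int]:
    """Build increasing chunk sizes, each about `factor` times the previous."""
    if factor <= 1.0:
        raise ValueError("factor must be greater than 1")
    sizes: List[int] = []
    size = min_chunk
    while size < max_chunk:
        sizes.append(size)
        size = max(size + 1, int(size * factor))
    sizes.append(max_chunk)
    return sizes


def chunk_for(item_size: int, classes: List[int]) -> int:
    """Return the smallest chunk size that can hold the item."""
    index = bisect_left(classes, item_size)
    if index == len(classes):
        raise ValueError("item larger than the largest chunk")
    return classes[index]


classes = slab_classes(min_chunk=64, factor=2.0, max_chunk=1024)
chunk = chunk_for(100, classes)     # a 100-byte item occupies a 128-byte chunk
wasted = chunk - 100                # 28 bytes lost to internal fragmentation
```

A smaller growth factor produces more classes and less waste per item, at the cost of more classes to manage.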

4.6 Cache-Focused Design

Memcached is intentionally narrow.

Its strength is that it does not try to be a queue, stream platform, lock manager, or ranked index. It tries to be a very fast shared cache.

This makes it attractive when the problem really is just:

store hot objects in memory
retrieve them quickly
let the app refill them on misses

4.7 Common Production Use Cases

Memcached is commonly used for:

page fragment caching
session-like ephemeral web data
query result caching
product or profile object caching
large-scale read-heavy web workloads where durability is irrelevant

Historically, many large web companies used Memcached aggressively in front of databases for exactly this reason.

4.8 Scaling Characteristics

Memcached scales horizontally in a straightforward way because nodes are relatively independent.

Strengths:

easy to add more cache capacity
predictable use for simple key-value caching
low complexity for read-heavy workloads

Weaknesses:

fewer built-in coordination features
no rich server-side data structures
less helpful when the application wants more than plain caching

4.9 Limitations Compared to Redis

Compared to Redis, Memcached generally has:

less feature breadth
less support for rich data structures
no native persistence model for recovering data
fewer coordination-oriented use cases

But that narrower design can be a feature, not a bug, when simplicity is what you want.

4.10 Redis vs Memcached

Dimension	Redis	Memcached
Primary identity	General-purpose in-memory data store	Pure distributed cache
Data structures	Rich: strings, hashes, lists, sets, sorted sets, streams, more	Mostly simple key-value
Persistence	Optional RDB/AOF	Memory-only
Coordination features	Pub/Sub, scripts, counters, locks, streams	Minimal
Operational simplicity for pure cache	Good, but broader feature set	Often very simple
Memory efficiency for basic cache workloads	Good, workload dependent	Historically attractive for pure cache cases
Best fit	Cache plus broader backend primitives	Straight shared cache at scale

4.11 When Companies Choose One Over the Other

Choose Redis when:

you want one fast system for caching plus rate limits, counters, sessions, or leaderboards
you need richer data types
you want optional persistence or replication features

Choose Memcached when:

the problem is pure disposable caching
the workload is straightforward key-value object caching
simplicity and cache-specific behavior matter more than feature breadth

In interviews, do not answer this as a popularity contest. Answer it as a workload decision.

5. Cache Access Patterns

The cache technology is only half the story. The access pattern determines behavior, consistency, and failure modes.

5.1 Cache-Aside Pattern

Cache-aside is the most common caching pattern in production systems.

The idea is simple:

application reads from cache first
if the key exists, return it
if it does not exist, read from database or source of truth
store the result in cache
return it to the caller

This is also called lazy loading because the cache is filled on demand.

Why Cache-Aside Exists

It is popular because it is simple, flexible, and keeps the source of truth unchanged. The application decides when and what to cache.

Cache-Aside Read Flow

sequenceDiagram
	participant C as Client
	participant A as Application
	participant Cache as Cache
	participant DB as Database

	C->>A: Read object
	A->>Cache: GET key
	alt Cache hit
		Cache-->>A: cached value
		A-->>C: response
	else Cache miss
		A->>DB: query object
		DB-->>A: row / record
		A->>Cache: SET key with TTL
		A-->>C: response
	end
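The same read flow as code, in a minimal sketch. `DictCache` is a hypothetical stand-in for a remote cache that ignores TTL for brevity, and `get_product` and `query_db` are illustrative names. The sketch also treats a cache outage as a miss, so reads fall through to the database instead of failing.

```python
from typing import Any, Callable, Dict, Optional


class DictCache:
    """Stand-in for a remote cache; TTL is accepted but not enforced here."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}
        self.available = True

    def get(self, key: str) -> Optional[Any]:
        if not self.available:
            raise ConnectionError("cache unavailable")
        return self.data.get(key)

    def set(self, key: str, value: Any, ttl_seconds: int) -> None:
        if not self.available:
            raise ConnectionError("cache unavailable")
        self.data[key] = value


def get_product(product_id: int, cache: DictCache,
                query_db: Callable[[int], Optional[dict]],
                ttl_seconds: int = 300) -> Optional[dict]:
    key = f"product:{product_id}"
    try:
        cached = cache.get(key)
    except ConnectionError:
        cached = None                       # treat a cache outage as a miss
    if cached is not None:
        return cached                       # hit path
    row = query_db(product_id)              # miss path: go to the source of truth
    if row is not None:
        try:
            cache.set(key, row, ttl_seconds)
        except ConnectionError:
            pass                            # serving the read matters more than caching it
    return row
```

Missing rows are not cached in this sketch, so repeated lookups for a nonexistent ID always reach the database. Negative caching would close that gap.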

Advantages

simple mental model
cache stores only demanded data
no cache write cost for cold data
application controls key format and TTL per object type

Disadvantages

first read after expiry is slow
cache misses can overload the database
stale data appears if invalidation is weak
multiple readers may rebuild the same key at once

Stale Data Risks

On writes, the source of truth changes first. If the cache is not invalidated immediately, future reads may still see the old cached value.

This is why cache-aside usually needs one of these:

delete cached key on write
update cached value on write
short TTL as a backstop
version checks in the application
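The first option, deleting the cached key on write, can be sketched with dictionaries standing in for the database and the cache. `update_product` and the `product:<id>` key format are assumptions made for illustration.

```python
from typing import Any, Dict


def update_product(product_id: int, fields: Dict[str, Any],
                   db: Dict[int, Dict[str, Any]],
                   cache: Dict[str, Any]) -> None:
    # 1. Change the source of truth first.
    db[product_id] = {**db.get(product_id, {}), **fields}
    # 2. Then remove the cached copy so the next read reloads fresh data.
    cache.pop(f"product:{product_id}", None)
```

Deleting rather than updating keeps the write path simple, but a reader that loaded the old row just before the write can still store it back after the delete. That race is one reason a bounded TTL is kept as a backstop.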

Failure Cases

cache node unavailable: all reads fall through to DB
DB slow: miss path becomes dangerous
key expires under burst traffic: stampede
invalidation event lost: stale data survives until TTL

Common Production Usage

Cache-aside is common for:

product pages
user profiles
configuration data
permission maps that tolerate bounded staleness
API aggregation results

5.2 Write-Through

Write-through means each write goes to both the cache and the backing store synchronously on the write path, before the write is acknowledged.

The intent is to keep cache and source of truth aligned immediately.

Write Path Flow

client sends write
application validates input
application writes new value to database and cache in the same logical operation
future reads hit a fresh cache entry
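A minimal sketch of that write path follows, with dictionaries standing in for the database and cache. `WriteThroughStore` is a hypothetical name, and writing the database before the cache is one possible ordering chosen for the example, not the only valid one.

```python
class WriteThroughStore:
    """Writes go to the source of truth and the cache before returning."""

    def __init__(self, db, cache):
        self._db = db
        self._cache = cache

    def write(self, key, value):
        # Assumption: the database is authoritative, so it is written first.
        # If this raises, the cache is left untouched.
        self._db[key] = value
        # The cache is updated before the caller is acknowledged.
        self._cache[key] = value

    def read(self, key):
        if key in self._cache:
            return self._cache[key]
        value = self._db.get(key)
        if value is not None:
            self._cache[key] = value
        return value
```

The sketch deliberately leaves out what happens when the second write fails after the first succeeded. That policy has to be chosen explicitly in a real system.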

Why It Exists

Write-through exists because it gives better read-after-write behavior from the cache than purely lazy cache-aside. Immediately after a successful write, the cache already holds the fresh value, so the next read neither misses nor sees the old one.

Benefits

fresher cache after writes
simpler read path after updates
fewer stale reads right after mutation

Tradeoffs

every write pays cache cost even if the data is never read again
write latency increases because more systems are involved
failure handling becomes trickier if DB write succeeds but cache write fails, or vice versa

Failure Handling Questions

You must define:

which write is authoritative if one succeeds and one fails
whether the request should fail or retry
whether reconciliation jobs exist

Production Suitability

Write-through is useful when:

reads soon after writes are common
keeping cache hot is valuable
write volume is manageable

It is a weaker fit when:

write traffic is very high
many written objects are never read again
write latency is extremely sensitive

5.3 Write-Back / Write-Behind

Write-back means the application writes to the cache first and persists to the database asynchronously later.

This is the most aggressive performance-oriented pattern.

Why It Exists

It exists to absorb high write throughput and smooth backend load. The immediate write path becomes very fast because the durable store is no longer on the critical path.

Throughput Advantages

low-latency writes
batched or buffered persistence
can smooth write bursts before they hit the database

Durability Risks

This pattern is dangerous because data may exist only in the cache or buffer for some time.

If the cache crashes, if the async worker fails, or if the queue is lost, writes can disappear.

Data Loss Scenarios

cache node fails before flush
async worker backlog grows without bound
persistence queue is dropped during incident
ordering bugs cause older writes to overwrite newer writes

Queueing Considerations

Write-back systems are really queueing systems too. You need:

durable buffering strategy
retry behavior
ordering guarantees
backpressure when database falls behind
replay and reconciliation tools
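To make the queueing aspect concrete, here is a minimal in-process sketch. `WriteBackBuffer`, `max_pending`, and `flush` are hypothetical names; the pending buffer lives only in memory, so this version would lose unflushed writes on a crash, which is exactly the durability risk described above.

```python
from collections import OrderedDict


class WriteBackBuffer:
    """Acknowledges writes from memory and persists them later in batches."""

    def __init__(self, db, max_pending=1000):
        self._db = db
        self._cache = {}
        self._pending = OrderedDict()  # key -> latest unflushed value
        self._max_pending = max_pending

    def write(self, key, value):
        # Backpressure: refuse new keys when the flusher has fallen behind.
        if key not in self._pending and len(self._pending) >= self._max_pending:
            raise RuntimeError("write-back buffer full; apply backpressure")
        self._cache[key] = value
        self._pending[key] = value  # repeated writes to a key coalesce
        self._pending.move_to_end(key)

    def read(self, key):
        if key in self._cache:
            return self._cache[key]
        return self._db.get(key)

    def flush(self, batch_size=100):
        """Persist up to batch_size pending writes; returns how many were written."""
        flushed = 0
        while self._pending and flushed < batch_size:
            key = next(iter(self._pending))
            self._db[key] = self._pending[key]
            del self._pending[key]  # removed only after the DB write succeeds
            flushed += 1
        return flushed
```

Coalescing repeated writes to the same key is where much of the throughput gain comes from: a counter incremented many times between flushes costs one database write.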

Operational Complexity

Write-back is harder to operate because the write acknowledgement and true persistence are decoupled.

This can be acceptable for:

analytics counters
non-critical engagement metrics
temporary derived state

It is usually not acceptable for:

payments
orders
inventory reservation
anything audit-sensitive

5.4 Pattern Comparison

Pattern	Read behavior	Write behavior	Main strength	Main risk
Cache-aside	Reads cache first, loads on miss	Source updated separately, cache invalidated or refreshed	Simple and common	Stale reads and miss storms
Write-through	Reads often hit fresh cache	Write updates cache and DB together	Better freshness after writes	Higher write latency and dual-write complexity
Write-back	Reads hit hot cache	Write acknowledged before durable persistence completes	High write throughput	Data loss and operational complexity

5.5 Write-Through vs Cache-Aside

Question	Write-Through	Cache-Aside
Is cache populated on write?	Yes	Usually no
First read after write	Often fast	May miss if cache was invalidated
Write cost	Higher	Lower
Common fit	Read-after-write sensitive data	General-purpose read-heavy systems

5.6 Write-Back vs Write-Through

Question	Write-Back	Write-Through
Persistence timing	Asynchronous	Synchronous or near-synchronous
Durability	Weaker	Stronger
Throughput	Higher	Lower
Operational complexity	Higher	Lower
Safe for critical data	Rarely	More often

6. TTL (Time To Live)

TTL is one of the most important cache controls.

6.1 Why TTL Exists

TTL gives cached data an expiration time.

It exists because:

cache entries should not live forever
data changes over time
invalidation is never perfect
memory must eventually be reclaimed

TTL is both a freshness policy and a safety valve.

6.2 Freshness vs Performance Tradeoff

Short TTL:

fresher data
more misses
more backend load

Long TTL:

better hit ratio
lower backend load
greater stale data risk

Choosing TTL is not a mathematical purity exercise. It is a business decision informed by traffic shape and correctness requirements.

6.3 Choosing TTL Values

Data type	Typical TTL thinking	Why
Static assets with versioned URLs	Very long, often effectively immutable	New content always ships under a new filename, so a given URL never changes
Product catalog metadata	Minutes to hours, often event-invalidated too	Read heavy, moderate freshness needs
User profile display info	Minutes	Slight staleness often acceptable
Inventory or seat availability	Very short or event-driven	Stale data can cause user-visible errors
Auth or permission data	Short or version-checked	Security sensitivity
Rate limiting counters	Natural expiration aligned to window	TTL defines the policy itself

6.4 Short TTL vs Long TTL

Short TTLs are attractive because they reduce staleness, but they often create hidden instability.

If a key is hit constantly and expires every few seconds, the system pays the miss penalty over and over. That wastes backend capacity on rebuilding a value that rarely changed.

Long TTLs improve performance, but only if you also have a reliable invalidation strategy or a clear tolerance for stale data.

6.5 Dynamic TTL Strategies

Good production systems often use different TTLs for different data classes.

Examples:

long TTL for immutable product images
medium TTL for product descriptions
short TTL for stock level or surge pricing data
longer TTL for stable entities, shorter TTL for volatile entities

Some systems also vary TTL by popularity. Very hot keys may justify proactive refresh or longer cache retention because the savings are large.

6.6 Soft TTL vs Hard TTL

Hard TTL means the entry is considered expired and must be reloaded before serving.

Soft TTL means the entry is considered old enough to refresh, but the system may still serve it briefly while a background refresh happens.

Soft TTL is a practical way to avoid user-facing latency spikes and stampedes. It supports patterns like stale-while-revalidate.
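The distinction can be sketched as a cache that tracks the age of each entry against two thresholds. `SoftTTLCache` and its `refresh_queue` are hypothetical; the background refresh is modelled as an explicit method call rather than a real worker so the behavior is easy to follow.

```python
import time


class SoftTTLCache:
    """Serves stale entries between soft and hard TTL while queuing a refresh."""

    def __init__(self, loader, soft_ttl, hard_ttl, clock=time.monotonic):
        self._loader = loader
        self._soft_ttl = soft_ttl
        self._hard_ttl = hard_ttl
        self._clock = clock
        self._entries = {}       # key -> (value, stored_at)
        self.refresh_queue = []  # keys waiting for a background refresh

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None:
            value, stored_at = entry
            age = self._clock() - stored_at
            if age < self._soft_ttl:
                return value  # fresh
            if age < self._hard_ttl:
                # Stale but usable: serve it and schedule one refresh.
                if key not in self.refresh_queue:
                    self.refresh_queue.append(key)
                return value
        return self._reload(key)  # missing or past hard TTL: block and reload

    def _reload(self, key):
        value = self._loader(key)
        self._entries[key] = (value, self._clock())
        return value

    def run_pending_refreshes(self):
        """Stand-in for a background worker draining the refresh queue."""
        while self.refresh_queue:
            self._reload(self.refresh_queue.pop(0))
```

Only requests that arrive after the hard TTL pay the reload latency; everything between the two thresholds is served immediately from the old value.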

6.7 Expiration Storms and Jitter Strategies

If many keys are created at the same time with the same TTL, they may all expire together.

That causes an expiration storm or avalanche.

The standard mitigation is jitter: add randomness to expiration times.

Example:

instead of every key expiring at exactly 600 seconds
expire keys at 600 seconds plus or minus a bounded random offset

This spreads rebuild work over time.
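A jittered TTL is a one-line calculation. The sketch below assumes a symmetric offset expressed as a fraction of the base TTL; `jittered_ttl` and `jitter_fraction` are names chosen for the example.

```python
import random


def jittered_ttl(base_seconds, jitter_fraction=0.1, rng=random):
    """Return base_seconds plus or minus a bounded random offset."""
    spread = base_seconds * jitter_fraction
    return base_seconds + rng.uniform(-spread, spread)
```

With a base of 600 seconds and a fraction of 0.1, keys written together expire anywhere between 540 and 660 seconds later, so their rebuilds are spread across a two-minute window instead of landing in the same instant.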

6.8 Practical TTL Decisions in Production

Strong production TTL policy usually includes:

base TTL chosen per data class
event-driven invalidation for important writes
jitter to avoid synchronized expiry
soft TTL for hot or expensive-to-build keys
observability on miss storms and stale-read complaints

Practical rule:

If you cannot explain why a TTL is what it is, the TTL is probably wrong.

7. Eviction Policies

TTL decides when entries should expire logically. Eviction decides what happens when memory pressure forces the cache to throw something away.

7.1 Why Eviction Policies Matter

When memory fills up, the cache must choose which entries survive.

That choice directly affects hit ratio and therefore system performance.

The wrong eviction policy can destroy performance by retaining low-value data and evicting exactly the hot data that saves the backend.

7.2 Common Policies

Policy	Intuition	Works well when	Fails when
LRU	Evict least recently used items	Recent access predicts future access	Workload has scanning patterns that pollute recency
LFU	Evict least frequently used items	Repeated popularity matters	Frequency history adapts too slowly to sudden changes if tuned poorly
FIFO	Evict oldest inserted items	Simplicity matters more than precision	Age is not a good signal of future value
Random	Evict arbitrary items	Cheap and simple, decent in some broad workloads	Can evict very hot keys unpredictably
TTL-based	Evict items nearest to expiration first	Expiry is meaningful and freshness-driven	Hot but old keys may be evicted too early

7.3 Redis Eviction Modes

Redis exposes several eviction modes.

Mode	Meaning
`noeviction`	Evict nothing; reject writes that would exceed the memory limit
`allkeys-lru`	Evict least recently used keys from all keys
`volatile-lru`	Evict least recently used keys only among keys with TTL
`allkeys-lfu`	Evict least frequently used keys from all keys
`volatile-lfu`	Evict least frequently used keys only among keys with TTL
`allkeys-random`	Evict random keys from all keys
`volatile-random`	Evict random keys among TTL keys
`volatile-ttl`	Evict keys with nearest expiration among TTL keys

The right mode depends on workload and whether all keys are disposable.

7.4 Workload-Based Policy Selection

Use LRU when:

recent access strongly predicts future access
the working set shifts over time

Use LFU when:

long-term popularity matters
some keys remain hot over long periods

Use TTL-sensitive strategies when:

expiring data is naturally less valuable
freshness policy is integral to value

Avoid random or FIFO unless you have a reason. Simpler is not always safer.
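As a concrete picture of the LRU policy discussed above, here is a minimal exact-LRU sketch built on `OrderedDict`. It is illustrative only: `LRUCache` is a hypothetical class, and real caches typically approximate LRU rather than tracking exact order.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._data = OrderedDict()  # oldest access first, newest last

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # a read counts as a use
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self._capacity:
            self._data.popitem(last=False)  # drop the least recently used key
```

The scan weakness is visible in this model: a one-off pass over many cold keys calls `put` for each of them and pushes every hot key out, even though the cold keys will never be read again.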

7.5 How Wrong Eviction Destroys Performance

Example:

assume a SaaS dashboard has 5 percent extremely hot keys and 95 percent rarely used keys
if eviction repeatedly removes hot keys, hit ratio falls sharply
application traffic shifts back to the database
database CPU rises, tail latency rises, and autoscaling may not help because the problem is miss amplification

Engineers often blame the database first, but the real issue is sometimes that the cache is keeping the wrong objects.
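The arithmetic behind miss amplification is worth seeing once. The numbers below are hypothetical, chosen only to illustrate the relationship; the function names are invented for the example.

```python
def backend_qps(total_qps, hit_ratio):
    """Requests per second that fall through the cache to the backend."""
    return total_qps * (1.0 - hit_ratio)


def miss_amplification(old_hit_ratio, new_hit_ratio):
    """Factor by which backend load grows when the hit ratio drops."""
    return (1.0 - new_hit_ratio) / (1.0 - old_hit_ratio)


# Hypothetical workload: 10,000 requests per second at the application tier.
before = backend_qps(10_000, 0.95)      # about 500 QPS reach the database
after = backend_qps(10_000, 0.80)       # about 2,000 QPS reach the database
factor = miss_amplification(0.95, 0.80)  # roughly 4x more database load
```

A 15-point drop in hit ratio looks modest on a dashboard, but in this example it quadruples database traffic while application traffic stays flat, which is why adding application instances does not help.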

7.6 Best Practices

size memory with headroom rather than relying on constant eviction
monitor eviction rate alongside hit ratio
identify hot keys and oversized keys
match policy to workload instead of using defaults blindly
test cache behavior during memory pressure, not just normal load

8. CDN (Content Delivery Network)

Caching is not only a backend service concern. At internet scale, the performance layer extends to the edge.

8.1 What a CDN Is

A CDN is a globally distributed network of edge servers that caches and delivers content closer to users.

Instead of every user request hitting your origin servers directly, a nearby edge location can serve cacheable content.

8.2 Why CDNs Exist

Goal	CDN benefit	Real effect
Reduce latency	Content served closer to user	Faster page loads and API edge responses
Reduce bandwidth from origin	Repeated asset delivery stays at edge	Lower origin network cost
Offload backend	Fewer requests reach origin	Origin survives higher traffic
Improve resilience	Edge absorbs surges and some attacks	Better stability during spikes
Provide global delivery	POPs around the world	Better user experience across regions

8.3 CDN Architecture

flowchart LR
	U[User Browser] --> EDGE[Nearest CDN Edge POP]
	EDGE -->|Cache Hit| RESP[Response Returned]
	EDGE -->|Cache Miss| SHIELD[Origin Shield / Regional Cache]
	SHIELD --> ORIGIN[Origin App / Object Store]
	ORIGIN --> SHIELD
	SHIELD --> EDGE
	EDGE --> RESP

Important concepts:

edge server or POP: geographically distributed cache location
origin server: your source system where content is generated or stored
origin shield: an extra cache layer between edge POPs and origin to reduce duplicate origin fetches

CDN vs Reverse Proxy

These terms are related, but they are not the same thing.

Dimension	CDN	Reverse Proxy
Typical placement	Globally distributed edge network	Usually sits in front of origin inside one region or network boundary
Main goal	Global latency reduction and origin offload	Traffic routing, load balancing, TLS termination, caching, security controls
Geographic reach	Many POPs across the world	Usually one site or a few controlled deployment points
Best use case	Shared content close to users worldwide	Centralized front door for backend services
Examples in practice	CloudFront, Fastly, Cloudflare edge delivery	NGINX, Envoy, HAProxy at origin or regional edge

In real systems, they often work together rather than compete. A CDN may sit in front of a reverse proxy, and the reverse proxy then routes to application services. The CDN handles global edge delivery and shared caching; the reverse proxy handles origin-side traffic management and policy enforcement.

DDoS Mitigation Basics

CDNs help with basic DDoS resilience because they distribute traffic across a large edge footprint, absorb repeated requests close to the network boundary, and keep a meaningful fraction of malicious or accidental traffic away from the origin. That does not eliminate the need for rate limiting, WAF rules, or origin protection, but it reduces how directly every spike hits your backend.

8.4 Edge Caching

Edge caching means storing content at CDN nodes so users can be served without going back to origin.

This is especially effective for:

static assets
images
videos
public API responses that can be cached safely
partially personalized pages with shared fragments

8.5 Browser Cache vs CDN Cache

Dimension	Browser Cache	CDN Cache
Location	End user device	Provider edge POP
Main benefit	No network round trip, or only a cheap revalidation, on repeat visits	Shared origin offload across many users
Control	Limited by browser behavior and headers	Controlled via CDN policies and headers
Best for	User-specific repeat access to assets	Shared assets and shared responses

8.6 Cache Headers Basics

HTTP caching works because servers tell intermediaries and browsers how to cache.

Header / concept	What it does	Why it matters
`Cache-Control`	Defines caching directives like max age or public/private	Primary cache behavior control
`s-maxage`	`Cache-Control` directive: max age for shared caches only, overriding `max-age` there	Lets CDN cache differently from browser
`ETag`	Validator representing response version	Enables revalidation without full body transfer
`Last-Modified`	Timestamp validator	Simpler revalidation mechanism
`stale-while-revalidate`	`Cache-Control` extension: allows stale content briefly while refresh happens	Better user latency and fewer stalls
`Vary`	Signals which request headers affect cache key	Critical for safe caching of content variations
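To show how these directives combine into one header value, here is a small builder. The `cache_control` helper and its parameters are hypothetical; only the directive names themselves (`public`, `private`, `max-age`, `s-maxage`, `stale-while-revalidate`, `immutable`) come from HTTP caching.

```python
def cache_control(*, public, max_age, s_maxage=None,
                  stale_while_revalidate=None, immutable=False):
    """Build a Cache-Control header value from a few common directives."""
    if not public and s_maxage is not None:
        # s-maxage only applies to shared caches, which must not store
        # private responses, so the combination is rejected here.
        raise ValueError("s-maxage has no effect on private responses")
    parts = ["public" if public else "private", f"max-age={max_age}"]
    if s_maxage is not None:
        parts.append(f"s-maxage={s_maxage}")
    if stale_while_revalidate is not None:
        parts.append(f"stale-while-revalidate={stale_while_revalidate}")
    if immutable:
        parts.append("immutable")
    return ", ".join(parts)
```

For example, `cache_control(public=True, max_age=60, s_maxage=600, stale_while_revalidate=30)` lets browsers keep a response for one minute while the CDN keeps it for ten, and `cache_control(public=False, max_age=0)` keeps a personalized response out of shared caches.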

8.7 Revalidation Flow

sequenceDiagram
	participant U as User Browser
	participant E as CDN Edge
	participant O as Origin

	U->>E: GET /app.js with validator
	E->>O: Revalidate with ETag / If-None-Match
	alt Not changed
		O-->>E: 304 Not Modified
		E-->>U: Cached body reused
	else Changed
		O-->>E: 200 New content
		E-->>U: New content cached and returned
	end

Revalidation avoids retransmitting full content when the content has not changed.
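On the origin side, the decision in the diagram reduces to comparing the validator sent by the client with the validator of the current content. The sketch below is simplified: `make_etag` and `respond` are hypothetical names, the ETag is derived from a content hash as one possible scheme, and it ignores weak validators and multi-value `If-None-Match` lists.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Derive a strong validator from the response body (one possible scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str]):
    """Return (status, headers, payload) for a conditional GET."""
    etag = make_etag(body)
    if if_none_match == etag:
        # Content unchanged: send headers only, the cache reuses its copy.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body
```

The saving is in the payload: a 304 carries no body, so a large asset that has not changed costs one small round trip instead of a full transfer.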

8.8 Personalized Content Challenges

CDNs are easy for public static assets. They are harder for personalized content.

Problems:

one user's data must not leak to another user
too many personalization dimensions can destroy cacheability
authentication headers or cookies may fragment cache keys badly

Common strategies:

cache only the shared shell, fetch personalized data separately
use edge logic to vary on a small safe set of dimensions
cache by versioned fragments instead of full pages
mark highly personalized responses as private or uncacheable at the shared edge

8.9 Dynamic Content Edge Strategies

Modern systems do not limit CDNs to images and CSS.

They often use edge caching for:

public API responses
HTML shell plus client-side personalized fetches
signed asset access
bot-resistant and rate-limited request handling
geographically optimized routing to nearest healthy origin

Google-like and Amazon-like large systems rely heavily on globally distributed frontends or edge layers because global latency is a real product problem, not just a backend benchmark problem.

8.10 Static Asset Delivery

Static asset delivery is the most successful CDN use case.

JS, CSS, and Image Delivery

Typical frontend/backend production flow:

frontend build produces versioned asset filenames
assets are uploaded to object storage or origin bucket
CDN caches those assets globally
HTML references versioned URLs
browser and CDN cache them aggressively because names change on deploy

Versioned Asset Strategy

Versioned or content-hashed filenames solve invalidation elegantly.

Example:

app.8f3d2.js instead of app.js

If content changes, the filename changes. That means old caches remain valid for old references, while new deployments use new URLs.

This is one of the cleanest examples of version-based invalidation in production.
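A build step that produces such names can be sketched as follows. This is an illustration rather than any particular bundler's behavior: `hashed_name` is a hypothetical helper, and the 8-character SHA-256 prefix is an arbitrary choice.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(filename: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a content hash before the file extension, e.g. app.<hash>.js."""
    path = PurePosixPath(filename)
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))
```

Because the name is a function of the bytes, an unchanged file keeps its URL across deploys and stays cached, while any edit produces a new URL that no cache has seen before.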

Immutable Asset Caching

If assets are content-addressed or versioned, you can safely use very long cache lifetimes and immutable caching directives.

That gives extremely high cache hit rates with almost no freshness downside.

Cache Busting

Cache busting means changing the URL when content changes so caches naturally treat the asset as new.

Good cache busting is usually versioned naming, not manual emergency purges for every deploy.

Compression Basics

CDNs and origins commonly use compression:

Gzip: common general-purpose compression
Brotli: often better compression for web assets, especially text assets

Why it matters:

lower transfer size
faster page loads
reduced bandwidth cost

Image Optimization Basics

Images dominate page weight in many systems.

Common CDN/image strategies:

resize images per device size
use modern formats where possible
compress aggressively without harming visible quality
cache multiple transformed variants at edge

Signed URLs Basics

Signed URLs allow protected asset access through time-limited or permission-scoped links.

This is common for:

private downloads
customer-specific files
media assets behind authorization rules

The CDN can still help, but the cache key and security model must be designed carefully.
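The general idea behind a signed URL is an expiry time plus a keyed signature over the path and expiry. The sketch below uses HMAC-SHA256 and is not the scheme of any specific CDN: the `expires` and `sig` parameter names and both function names are assumptions, and paths needing URL escaping are not handled.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse


def sign_url(path: str, secret: bytes, expires_at: int) -> str:
    """Append an expiry and an HMAC signature to a path."""
    message = f"{path}?expires={expires_at}".encode()
    sig = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"{path}?{urlencode({'expires': expires_at, 'sig': sig})}"


def verify_url(url: str, secret: bytes, now=None) -> bool:
    """Check that the link has not expired and the signature matches."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, IndexError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires_at:
        return False
    message = f"{parsed.path}?expires={expires_at}".encode()
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), sig.encode())
```

One cache-key consequence follows directly: if the signature is part of the cache key, every user's link is a separate cache entry, so edge caches are often configured to validate the signature but key on the path alone.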

8.11 Global Distribution

Global delivery changes architecture decisions.

Geo Routing

Geo routing directs users toward nearby or appropriate regions.

Why it matters:

shorter network round trips
better perceived performance
better regional failover options

Anycast Basics

Anycast is a routing technique where multiple edge locations advertise the same IP, and network routing sends the user to a nearby or efficient destination.

This matters because users do not manually choose an edge. Network routing steers them.

Regional Latency Reduction

If your origin is only in one region, every distant user pays transcontinental latency. CDNs remove that cost for cacheable content, but truly dynamic, uncached requests still pay the full round trip to the origin.

This is why global systems often pair CDNs with multi-region origins.

Multi-Region Architecture Impact

Once you have multiple origins or regions, the performance layer must interact with:

traffic steering
state locality
replication lag
failover policies
regional cache consistency

Failover Benefits

A good CDN and global routing layer can keep a regional origin issue from becoming a full global outage. Edge caches may continue serving stale or previously cached content while origins recover.

Origin Shielding Basics

Origin shielding adds an intermediate cache layer so many edge POPs do not all miss directly to origin. This is useful during viral events or large cache turnovers.

8.12 Real-World Examples

Netflix is the classic example of edge-heavy delivery for video content; the lesson is that moving content close to users dramatically changes scalability economics
Amazon-like e-commerce systems use CDNs for asset delivery, image optimization, and global storefront performance
GitHub-like systems use edge delivery for assets, release downloads, and parts of public web traffic
Stripe-like documentation, dashboards, and static resources benefit from aggressive CDN caching even when core payment flows remain origin-controlled
typical SaaS systems often keep app shells and static assets heavily cached while user-specific API calls remain dynamic

9. Cache Invalidation

Cache invalidation is the hardest part of the performance layer because it is where performance and correctness collide.

9.1 Why Cache Invalidation Is Hard

The famous joke says there are only two hard things in computer science: cache invalidation and naming things.

The practical meaning is this:

Once you copy data away from the source of truth, you have created multiple versions of reality. Now you must decide when old copies stop being acceptable.

That is hard because:

one source update may affect many cached views
invalidation can race with reads and writes
events can be delayed or lost
caches may exist at many layers: browser, CDN, local process, Redis
some views are aggregates, not direct copies of one row

9.2 Delete vs Update Strategies

There are two classic invalidation approaches after a write.

Strategy	How it works	Strengths	Weaknesses
Delete on change	Remove cache entry after source update	Simple, avoids writing wrong value into cache	Next read is a miss, can trigger stampede
Update on change	Write new value into cache immediately	Better freshness, avoids immediate miss	Risk of dual-write inconsistency and more write overhead

Delete is often simpler and safer. Update can be faster for read-after-write workloads.

9.3 Event-Driven Invalidation

In event-driven invalidation, the source-of-truth write publishes an event that tells caches or downstream services what changed.

Example flow:

product price changes in database
product service emits product.updated
consumers remove or refresh relevant cache keys
next read sees new data or repopulates with new value

This is powerful because it decouples writers from all readers and cached views. It is also operationally harder because events must be reliable enough.
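A consumer for the example flow might look like the sketch below. The event shape, the key names, and both function names are hypothetical; a dictionary stands in for the cache.

```python
def keys_for_product(product_id):
    """Cache keys derived from one product (hypothetical naming scheme)."""
    return [
        f"product:{product_id}:detail",
        f"product:{product_id}:price",
    ]


def handle_product_updated(cache, event):
    """Consumer for a product.updated event: drop every derived key."""
    removed = 0
    for key in keys_for_product(event["product_id"]):
        if cache.pop(key, None) is not None:
            removed += 1
    return removed
```

Deleting keys is naturally idempotent, so a redelivered event is harmless. That property matters because most event systems deliver at least once rather than exactly once.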

9.4 Pub/Sub Invalidation

Lightweight invalidation often uses Pub/Sub.

This works when:

missed events are acceptable because TTL is a fallback
low latency matters
invalidation is best effort rather than strictly durable

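It is a weaker fit when you need guaranteed processing and replay, because fire-and-forget Pub/Sub (Redis Pub/Sub, for example) does not retain messages for subscribers that were disconnected when the event was published.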

9.5 Version-Based Invalidation

Version-based invalidation means the cache key or validator includes a version.

Examples:

user:123:v17
asset filename hash
ETag generated from content version

This is powerful because old cached entries naturally become irrelevant when the version changes.

It is extremely common in:

static assets
schema-aware API responses
derived views where version numbers are easy to compute
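The `user:123:v17` style of key can be sketched as follows. `VersionedCache` is a hypothetical class, and keeping the version counter in memory is an assumption made for brevity; in practice the version usually comes from the source of truth or a small shared counter.

```python
class VersionedCache:
    """Cache whose keys embed a per-entity version number."""

    def __init__(self):
        self._versions = {}  # (entity, entity_id) -> current version
        self._data = {}

    def key_for(self, entity, entity_id):
        version = self._versions.get((entity, entity_id), 1)
        return f"{entity}:{entity_id}:v{version}"

    def bump(self, entity, entity_id):
        """Call on write: readers start using a new key immediately."""
        current = self._versions.get((entity, entity_id), 1)
        self._versions[(entity, entity_id)] = current + 1

    def get(self, entity, entity_id):
        return self._data.get(self.key_for(entity, entity_id))

    def set(self, entity, entity_id, value):
        self._data[self.key_for(entity, entity_id)] = value
```

Nothing is deleted on a write. The old entry simply stops being addressed and is left for TTL or eviction to reclaim, which avoids delete races at the cost of some wasted memory.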

9.6 Tag-Based Invalidation

Tag-based invalidation groups related cache entries under logical tags.

Example:

product detail page, search results, category page, and recommendation widget all share the tag product:123

When the product changes, all content attached to that tag can be invalidated.

This is useful when one underlying object fans out into many cached representations.
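A tag index is essentially a reverse map from tag to cache keys. The sketch below assumes a hypothetical `TaggedCache` held in one process; CDNs and cache libraries that offer tag purging each expose their own API for the same idea.

```python
from collections import defaultdict


class TaggedCache:
    """Cache that can invalidate every entry attached to a tag."""

    def __init__(self):
        self._data = {}
        self._keys_by_tag = defaultdict(set)  # tag -> keys carrying it

    def set(self, key, value, tags=()):
        self._data[key] = value
        for tag in tags:
            self._keys_by_tag[tag].add(key)

    def get(self, key):
        return self._data.get(key)

    def invalidate_tag(self, tag):
        """Remove all entries tagged with `tag`; returns how many keys it covered."""
        keys = self._keys_by_tag.pop(tag, set())
        for key in keys:
            self._data.pop(key, None)
        return len(keys)
```

The writer only needs to know the tag `product:123`, not the full list of pages and widgets that happen to embed that product.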

9.7 Dependency Invalidation

Many caches hold derived data, not raw rows.

Example:

homepage recommendations depend on user preferences, inventory, and pricing

Now invalidation is harder because one source change may invalidate multiple aggregates.

This is where dependency graphs, tags, or event fan-out matter.

9.8 Eventual Consistency Tradeoffs

In practice, many invalidation systems are eventually consistent.

That means for some brief period:

source of truth is updated
one cache is updated
another cache still has old data

Your job is to make that window safe.

Techniques include:

bounded TTL
read version checks
idempotent invalidation events
periodic repair or refresh jobs
treating stale responses as acceptable only for certain data types

9.9 Stale Read Mitigation

Good production systems do not assume invalidation is perfect. They layer safeguards.

Common mitigations:

short TTL for sensitive data
longer TTL plus event invalidation for less sensitive data
version numbers in payloads
client-side revalidation for edge content
cache bypass on critical user actions
observability for stale-read incidents

9.10 Cache Invalidation Flow

flowchart TD
	W[Write Request] --> DB[(Database Update)]
	DB --> EVT[Change Event]
	EVT --> INV[Invalidation Service]
	INV --> REDIS[Redis / App Cache Delete or Refresh]
	INV --> CDN[CDN Purge or Tag Invalidate]
	INV --> LOCAL[Local Service Cache Bust]
	REDIS --> NEXT[Next Read Rebuilds or Uses Fresh Value]
	CDN --> NEXT
	LOCAL --> NEXT

9.11 Real-World Invalidation Patterns

Typical production patterns:

product catalog systems invalidate product detail keys and search-result fragments when price or stock changes
GitHub-like public pages often rely on versioned assets and shorter-lived HTML or fragment caching
Stripe-like systems may avoid aggressive caching on the most correctness-sensitive payment paths but still use invalidation for dashboard and metadata views
typical SaaS apps invalidate tenant configuration, permissions, and dashboard aggregates via events plus TTL backstops

9.12 Best Practices and Common Mistakes

Best practices:

define source of truth clearly
design cache keys systematically
make invalidation events idempotent
use TTL as backup, not as the only correctness mechanism for important data
model dependencies explicitly for derived views

Common mistakes:

forgetting that one row update affects multiple cached views
using overly broad purges that destroy hit ratio
trusting best-effort invalidation for correctness-sensitive data
not planning what happens if the invalidation bus is down

10. How These Pieces Connect in Actual Architecture

The performance layer matters most when you can explain how the pieces work together, not just individually.

10.1 Typical SaaS Request Flow

sequenceDiagram
	participant Browser as Browser
	participant CDN as CDN Edge
	participant API as API Service
	participant Local as Local Cache
	participant Redis as Redis
	participant DB as Database
	participant Bus as Event Bus

	Browser->>CDN: Request app shell / assets
	alt Edge hit
		CDN-->>Browser: Cached asset or cached response
	else Edge miss
		CDN->>API: Forward request
		API->>Local: Lookup hot local entry
		alt Local hit
			Local-->>API: Value
		else Local miss
			API->>Redis: Lookup shared cache
			alt Redis hit
				Redis-->>API: Value
				API->>Local: Populate local cache
			else Redis miss
				API->>DB: Query source of truth
				DB-->>API: Fresh data
				API->>Redis: Store with TTL
				API->>Local: Populate
			end
		end
		API-->>CDN: Response
		CDN-->>Browser: Response
	end

	DB-->>Bus: Change event on writes
	Bus-->>Redis: Invalidate or refresh
	Bus-->>CDN: Purge / tag invalidation

This is what a realistic design conversation sounds like. The system is not "database plus Redis." It is a layered request path with distinct hit and miss behaviors.
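The API-side portion of that diagram, local cache, then shared cache, then database, can be sketched as one lookup function. Dictionaries stand in for the local cache and Redis, TTLs are omitted, and `layered_get` is a hypothetical name.

```python
def layered_get(key, local, shared, load_from_db):
    """Look up a key through local cache, shared cache, then the database.

    Returns (value, layer) so the caller can see which layer answered.
    """
    if key in local:
        return local[key], "local"
    if key in shared:
        local[key] = shared[key]  # populate the faster layer on the way back
        return local[key], "shared"
    value = load_from_db(key)
    shared[key] = value           # a real system would store this with a TTL
    local[key] = value
    return value, "db"
```

Returning the answering layer is useful in practice because per-layer hit ratios are the first thing to check when latency or database load changes.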

10.2 What Breaks at Scale

As scale grows, the performance layer encounters these problems first:

one hot key overloads a single shard
local caches diverge across many instances
cache rebuilds overwhelm the database after deployments
CDN cache keys explode because of too many vary dimensions
invalidation lags behind writes during incident conditions
eviction removes exactly the hottest working set
global traffic shifts expose region-specific cold caches

A strong answer in interviews is to identify not only the optimization but also the failure mode created by that optimization.

10.3 Performance Layer Design Heuristics

Use a CDN when:

content is static or moderately cacheable
users are globally distributed
origin offload matters

Use local cache when:

objects are extremely hot
a network hop is still too expensive
slight inconsistency is acceptable

Use distributed cache when:

many app instances need shared fast access
backend misses are expensive
TTL and invalidation can be managed

Use careful invalidation instead of blind long TTLs when:

data changes matter to users or correctness

Do not add caching yet when:

traffic is low
the real bottleneck is poor query design or bad indexes
correctness cost exceeds performance benefit

11. Interview Discussion Guide

11.1 Common Interview Questions and Strong Answer Angles

Question	What a strong answer should include
Why add a cache?	Latency reduction, DB offload, compute savings, scalability, hot-key behavior
Redis or Memcached?	Workload fit, feature needs, durability expectations, simplicity tradeoff
Cache-aside or write-through?	Read/write mix, freshness needs, miss penalty, write latency impact
What happens on cache failure?	Fallback behavior, database protection, rate limits, degraded mode
Why is invalidation hard?	Multiple copies of data, derived views, event loss, race conditions, multi-layer caches
How do you choose TTL?	Volatility, business tolerance for staleness, hit ratio, backend cost, jitter
What breaks at scale?	Stampedes, hot keys, eviction under pressure, stale reads, regional cold starts
Why use a CDN?	Edge latency reduction, origin offload, static asset delivery, global availability

11.2 A Strong Interview Structure

When asked about any performance-layer component, answer in this order:

what problem it solves
where it sits in the architecture
how it works on the request path
tradeoffs and failure modes
what you would monitor in production

That structure works for caching, Redis, Memcached, TTL, eviction, CDN, and invalidation.

11.3 Metrics You Should Mention

For caches:

hit ratio
miss latency
eviction rate
memory usage
hot key distribution
rebuild rate after expiry
stale read complaints or mismatches

For CDNs:

edge hit ratio
origin fetch rate
regional latency
revalidation rate
cache-key cardinality
purge success and propagation time

11.4 Final Mental Checklist

Ask yourself these questions in every design:

what data is read repeatedly
which reads can tolerate staleness
what happens on cache miss
what happens when the cache is down
how are writes reflected in cached views
how are hot keys handled
how do global users get low latency
how do you keep the source of truth safe even if the performance layer fails

12. Final Takeaways

The performance layer is about moving repeated work away from expensive systems and closer to the user.

Caching reduces latency, protects databases, and lowers cost. Redis gives a rich and fast in-memory platform for many backend patterns. Memcached remains a strong option for pure distributed caching. TTL and eviction policy determine whether the cache behaves like an asset or a liability. CDNs extend caching to the edge and fundamentally change global performance. Invalidation is the price you pay for speed, and it must be treated as a core design problem rather than an afterthought.

In interviews, the goal is not just to say "use cache." The goal is to explain:

why the cache exists
where it sits in the request path
how reads and writes behave
what can go stale
what breaks at scale
how production systems stay safe when the performance layer fails

That is what separates glossary knowledge from engineering understanding.
