tarun-elango 26810e43d0 sd text

2026-04-26 13:27:19 -04:00

Request Handling

Request handling is the full journey of a request from the moment a client sends it to the moment the system returns a response. In interviews, this topic sits at the boundary between API design, distributed systems, reliability engineering, and production operations. In real systems, request handling is where latency, availability, security, and cost are decided.

Most weak system design answers treat a request as if it magically reaches the correct service and that service magically succeeds. Real systems do not work that way. Before business logic runs, a request usually passes through multiple control points:

edge protection
routing
load balancing
authentication and authorization checks
rate limiting or throttling
validation
version negotiation
retries and failover logic
observability hooks

If you understand request handling well, you can explain not only how a request succeeds, but also how the system behaves when traffic spikes, instances fail, regions go down, clients retry aggressively, or malformed input hits your API.

This guide is written with two goals:

Help you answer backend and system design interview questions with depth and structure.
Help you understand how production systems at companies like Google, Netflix, Uber, Amazon, GitHub, Stripe, and typical SaaS platforms are actually built.

Examples in this guide are intentionally generalized from widely used industry patterns and public engineering discussions rather than private internal implementation details.

1. Big Picture: What Request Handling Really Means

At a high level, request handling exists because distributed systems are hostile environments:

networks are slow and unreliable
clients are untrusted
services scale up and down dynamically
requests are not evenly distributed
failures are partial, not binary
deployments happen continuously
one bad downstream dependency can cascade into an outage

The job of request handling is to answer a series of questions quickly and safely:

Should this request be allowed into the system?
Is the client authenticated and allowed to do this?
Is the request well-formed and safe?
Which region, cluster, service, and instance should receive it?
Can the system handle the load right now?
If something is failing, should we retry, reroute, degrade, or reject?
How do we observe what happened later?

1.1 End-to-End Request Lifecycle

flowchart LR
	C[Client: Browser / Mobile / API Consumer] --> CDN[CDN / Edge Cache]
	CDN --> WAF[WAF / Reverse Proxy / TLS Termination]
	WAF --> GW[API Gateway]
	GW --> POL[Auth / Validation / Rate Limit]
	POL --> ROUTE[Routing + Load Balancing]
	ROUTE --> S1[Service A]
	ROUTE --> S2[Service B]
	S1 --> CACHE[(Cache)]
	S1 --> DB[(Database)]
	S2 --> MQ[(Queue / Stream)]
	GW -. logs / metrics / traces .-> OBS[Observability Stack]
	S1 -. logs / metrics / traces .-> OBS
	S2 -. logs / metrics / traces .-> OBS

1.2 Core Goals of Request Handling

Goal	Why it matters	Typical mechanisms
Correctness	Wrong requests can corrupt data or create security holes	validation, auth, idempotency, request signing
Availability	Users should still be served when instances or regions fail	load balancing, health checks, failover, retries, circuit breaking
Performance	Users care about latency more than architecture diagrams	caching, compression, routing, efficient LB strategy
Isolation	One tenant or one endpoint should not take down the whole system	rate limiting, throttling, priority queues, load shedding
Observability	If you cannot see failures, you cannot fix them	centralized logs, metrics, traces, correlation IDs
Evolvability	APIs and deployments must change without breaking clients	versioning, traffic splitting, canary release, blue-green rollout

1.3 Interview Framing

In an interview, saying "I will add a load balancer" is not enough. A stronger answer sounds more like this:

"Requests first hit a reverse proxy or gateway where we terminate TLS, authenticate the caller, apply rate limiting, and route traffic. From there, an L7 load balancer sends traffic to healthy service instances discovered dynamically. We keep requests idempotent where retries are possible, use readiness checks so bad instances do not receive traffic, and add observability at the edge and service layers so we can debug tail latency and failure patterns."

That answer shows system thinking rather than component name-dropping.

2. API Gateway

2.1 What It Is

An API gateway is the entry point for requests into a backend system. It sits between clients and backend services and applies common policies before requests reach business logic.

Think of it as a programmable front door for your platform.

Common technologies:

Envoy
NGINX
Kong
HAProxy
AWS API Gateway
Spring Cloud Gateway
Netflix Zuul (historically)

2.2 Why It Exists

Without a gateway, each service often ends up re-implementing the same cross-cutting concerns:

token validation
request logging
rate limiting
route matching
error normalization
response compression
API version handling

That leads to duplicated logic, inconsistent behavior, and harder operations.

The gateway centralizes edge concerns so backend services can focus more on business rules.

2.3 Main Responsibilities

Responsibility	What it does	Why it belongs at the gateway
Authentication	Verifies tokens, API keys, and signed requests	cheap early rejection before backend work
Coarse-grained authorization	Rejects callers that cannot access an API family	reduces unnecessary downstream traffic
Request aggregation	Combines data from multiple services into one response	reduces client chattiness, especially on mobile
Centralized logging	Captures request metadata once	consistent audit and debugging
Observability	Emits metrics, traces, and request IDs	easier latency and error tracking
Service discovery integration	Resolves service names to live instances	works with autoscaling and dynamic infrastructure
Retries and timeouts	Handles transient failures	reduces client-visible errors when used carefully
Circuit breaking	Stops sending traffic to failing backends	prevents cascading failure
Response transformation	Maps internal responses to the public API shape	decouples the client contract from internal service contracts
Caching	Serves repeated read traffic cheaply	lowers latency and backend load

2.4 How It Works Internally

A gateway typically processes a request through a pipeline:

Accept TCP/TLS connection.
Terminate TLS or pass it through depending on setup.
Parse HTTP request metadata.
Match route rules using host, path, method, headers, or query parameters.
Apply middleware or policies such as auth, rate limiting, schema checks, and logging.
Resolve the upstream service using static config or service discovery.
Select a healthy backend instance using a load-balancing policy.
Forward the request.
Apply retries, timeouts, or circuit-breaker policy if needed.
Transform, compress, cache, or redact the response.
Emit logs, metrics, and trace spans.
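The policy stages of that pipeline can be sketched as an ordered chain of filters. Each filter either short-circuits with a response or lets the request continue. This is a minimal sketch under stated assumptions: the `Request`/`Response` types, the filter names, the hard-coded API key set, and the in-process `forward` function are all hypothetical stand-ins for real auth, routing, and proxying.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Request:
    method: str
    path: str
    headers: dict = field(default_factory=dict)

@dataclass
class Response:
    status: int
    body: str

# A filter returns a Response to short-circuit the pipeline, or None to continue.
Filter = Callable[[Request], Optional[Response]]

VALID_API_KEYS = {"key-123"}  # hypothetical; a real gateway would consult a key store or JWKS

def authenticate(req: Request) -> Optional[Response]:
    if req.headers.get("X-API-Key") not in VALID_API_KEYS:
        return Response(401, "unauthenticated")
    return None

def limit_body_size(req: Request) -> Optional[Response]:
    # Hypothetical 1 MB cap: a cheap check that runs before any backend work.
    if int(req.headers.get("Content-Length", "0")) > 1_000_000:
        return Response(413, "payload too large")
    return None

def forward(req: Request) -> Response:
    # Stand-in for route matching, instance selection, and proxying upstream.
    return Response(200, f"handled {req.method} {req.path}")

def handle(req: Request, filters: list[Filter]) -> Response:
    for f in filters:
        rejected = f(req)
        if rejected is not None:
            return rejected  # fail fast: later filters and the backend never run
    return forward(req)

PIPELINE = [authenticate, limit_body_size]
```

Putting the cheapest checks first means rejected traffic never consumes upstream capacity.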

2.5 Request Lifecycle Through a Gateway

sequenceDiagram
	participant Client
	participant Gateway
	participant Auth as Auth/JWKS
	participant Registry as Service Discovery
	participant Service as Backend Service
	participant Obs as Observability

	Client->>Gateway: HTTPS request
	Gateway->>Auth: Validate token or signature
	Auth-->>Gateway: Auth result
	Gateway->>Registry: Resolve upstream instances
	Registry-->>Gateway: Healthy endpoints
	Gateway->>Service: Forward request
	Service-->>Gateway: Response
	Gateway->>Obs: Logs / metrics / traces
	Gateway-->>Client: Final response

2.6 Request Aggregation

Request aggregation is when the gateway calls multiple backend services and combines their results into a single response.

Example: a mobile home screen might need:

profile service
recommendation service
notification service
recent activity service

Without aggregation, the mobile app might issue 4 to 8 network calls. With a gateway or backend-for-frontend layer, the client makes one call and receives one composite response.

Why it exists:

mobile networks are high latency
clients should not need to know internal service topology
it reduces repetitive orchestration logic across clients

Tradeoffs:

the gateway becomes more complex
partial failures are harder to represent
tail latency can worsen because one slow dependency slows the whole aggregate response
aggregation logic can become accidental business logic

Best practice: keep aggregation focused on shaping data for clients, not on implementing domain rules that belong in services.
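A composite endpoint can fan out to its dependencies concurrently and bound each call with its own timeout. It can then report partial failures explicitly instead of failing the whole response. In this sketch, the three `fetch_*` coroutines are hypothetical stand-ins for backend calls, and the 0.2-second timeout is illustrative only.

```python
import asyncio

async def fetch_profile(user_id: str) -> dict:
    await asyncio.sleep(0.01)
    return {"name": "Ada"}

async def fetch_recommendations(user_id: str) -> list:
    await asyncio.sleep(0.01)
    return ["item-1", "item-2"]

async def fetch_notifications(user_id: str) -> list:
    await asyncio.sleep(5)  # simulate a slow dependency
    return []

async def call_with_timeout(coro, timeout: float):
    """Return (value, None) on success or (None, error_name) on failure."""
    try:
        return await asyncio.wait_for(coro, timeout), None
    except asyncio.TimeoutError:
        return None, "timeout"
    except Exception as exc:  # any other dependency failure
        return None, type(exc).__name__

async def home_screen(user_id: str, timeout: float = 0.2) -> dict:
    sections = {
        "profile": fetch_profile(user_id),
        "recommendations": fetch_recommendations(user_id),
        "notifications": fetch_notifications(user_id),
    }
    # Fan out concurrently: total latency is bounded by the per-call timeout,
    # not by the slowest dependency.
    results = await asyncio.gather(
        *(call_with_timeout(c, timeout) for c in sections.values())
    )
    response = {"data": {}, "errors": {}}
    for name, (value, error) in zip(sections, results):
        if error is None:
            response["data"][name] = value
        else:
            response["errors"][name] = error
    return response
```

Calling `asyncio.run(home_screen("u1"))` returns the profile and recommendations under `data` and reports `notifications` as a `timeout` under `errors`. It does so after roughly 0.2 seconds instead of 5. The explicit `errors` map is one way to represent partial failure; the client then decides how to render the missing section.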

2.7 Authentication at the Gateway

Authenticating at the gateway is common because unauthenticated requests are cheap to reject early, while repeating the same checks in every service is expensive.

Typical patterns:

JWT validation at the gateway using cached public keys from a JWKS endpoint
API key lookup for machine clients
OAuth token introspection for opaque tokens
mTLS for service-to-service requests in internal systems

Important nuance: gateway authentication does not eliminate service-level authorization.

The gateway can answer, "Is this caller known and allowed to hit this API family?"

The service still often needs to answer, "Can this user access this specific invoice, order, or repository?"

Common mistake: pushing all authorization into the gateway. Fine-grained authorization usually belongs closer to the business object.

2.8 Centralized Logging and Observability

The gateway is the best place to generate or propagate correlation identifiers such as:

request ID
trace ID
span ID
tenant ID
client application ID
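A gateway usually reuses an incoming correlation ID when one is present and well-formed. Otherwise it mints a new one and forwards it on every upstream call. The `X-Request-ID` header name is a common convention rather than a standard, and the accepted-character pattern is an illustrative assumption.

```python
import re
import uuid

REQUEST_ID_HEADER = "X-Request-ID"  # common convention; not mandated by any standard
# Accept only short, plain IDs so clients cannot inject huge or malicious values into logs.
_VALID_ID = re.compile(r"[A-Za-z0-9-]{8,64}")

def ensure_request_id(headers: dict) -> dict:
    """Return a copy of headers that is guaranteed to carry a usable request ID.

    Real HTTP header names are case-insensitive; a plain dict is used here for brevity.
    """
    out = dict(headers)
    incoming = out.get(REQUEST_ID_HEADER, "")
    if not _VALID_ID.fullmatch(incoming):
        out[REQUEST_ID_HEADER] = str(uuid.uuid4())
    return out
```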

Useful gateway metrics:

requests per second by route
latency percentiles, especially p95 and p99
error rates by status code family
upstream retry counts
rate-limited requests
auth failures
cache hit ratio

Observability matters because many request-handling bugs look similar from the outside. A user just sees a timeout. Internally, that timeout might have been caused by:

route misconfiguration
unhealthy instances still receiving traffic
retry storms
TLS handshake issues
bad DNS resolution
a dependency that is slow but not fully down

2.9 Service Discovery Integration

In dynamic environments, backend instances change constantly because of autoscaling, deployments, and failures. Hardcoding backend IPs is not realistic.

The gateway therefore needs service discovery.

Common models:

client-side discovery: the caller resolves service instances and chooses one
server-side discovery: the gateway or load balancer resolves instances and forwards traffic

In Kubernetes, a gateway often routes to a Service object, and kube-proxy or the data plane routes to live pod endpoints. In service-mesh-heavy environments, Envoy sidecars may receive endpoint updates via xDS-style control-plane APIs.

Failure case: stale discovery data can send traffic to dead instances.

Best practices:

respect health information, not just presence in registry
support quick config propagation
use connection draining when removing instances
avoid very aggressive caching of endpoint lists
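Those practices can be combined into one filter over registry data. The filter requires a recent heartbeat and a passing readiness check, and it skips instances that are draining. The `Endpoint` fields and the 10-second heartbeat TTL are hypothetical. Real registries, such as Kubernetes endpoints or Consul, expose this information in their own formats.

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class Endpoint:
    address: str
    ready: bool            # readiness check result reported to the registry
    draining: bool         # being removed: finish in-flight work, accept no new work
    last_heartbeat: float  # monotonic timestamp of the last heartbeat

HEARTBEAT_TTL_SECONDS = 10.0  # illustrative; should exceed the heartbeat interval

def routable(endpoints: list[Endpoint], now: Optional[float] = None) -> list[str]:
    """Return addresses that are fresh, ready, and not draining."""
    now = time.monotonic() if now is None else now
    return [
        e.address
        for e in endpoints
        if e.ready and not e.draining and now - e.last_heartbeat <= HEARTBEAT_TTL_SECONDS
    ]
```

Presence in the registry alone is not enough: an instance that still heartbeats but fails readiness is excluded.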

2.10 Retries

Retries are deceptively dangerous.

Why they exist:

networks fail transiently
connections reset occasionally
an instance may fail while others are healthy

Why they are risky:

retries multiply traffic load during incidents
non-idempotent operations may execute twice
stacked retries at multiple layers create retry storms

Best practices:

retry only idempotent or safely repeatable operations
bound the number of attempts, usually to a very small count
add exponential backoff and jitter
couple retries with timeouts and circuit breakers
never let every layer retry blindly
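Several of these rules fit in a short sketch: retry only idempotent methods, cap the attempts, and sleep for a randomized ("full jitter") exponential delay between attempts. The `TransientError` type, the attempt cap, and the delay constants are illustrative assumptions, not recommended values.

```python
import random
import time

IDEMPOTENT_METHODS = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS"}

class TransientError(Exception):
    """Hypothetical error type for failures worth retrying (connection resets, 503s, ...)."""

def call_with_retries(method, send, max_attempts=3, base_delay=0.1, max_delay=2.0, sleep=time.sleep):
    # Non-idempotent calls get exactly one attempt unless protected by an idempotency key.
    attempts = max_attempts if method.upper() in IDEMPOTENT_METHODS else 1
    for attempt in range(attempts):
        try:
            return send()
        except TransientError:
            if attempt == attempts - 1:
                raise  # budget exhausted: surface the failure instead of retrying forever
            # Full jitter: a random delay in [0, min(max_delay, base_delay * 2^attempt)]
            # spreads retries out so clients do not retry in lockstep.
            backoff = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, backoff))
```

Injecting `sleep` keeps the function testable. The important design choice is that the retry budget shrinks to a single attempt for non-idempotent methods such as POST.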

Interview point: if a gateway retries a POST payment request, you must discuss idempotency keys.
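Idempotency keys make a retried POST safe. The server records the outcome under a client-supplied key and replays that outcome for duplicates. This in-memory sketch is hypothetical. A production version would use a shared store with a TTL, and it would also handle two concurrent requests carrying the same key, for example with a lock or an insert-if-absent write.

```python
import hashlib
import json

class IdempotencyStore:
    """In-memory stand-in for a shared store keyed by idempotency key."""

    def __init__(self):
        self._results = {}  # key -> (request fingerprint, stored response)

    def execute(self, key: str, payload: dict, operation):
        fingerprint = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
        if key in self._results:
            stored_fp, response = self._results[key]
            if stored_fp != fingerprint:
                # Same key reused for a different request: refuse rather than guess.
                raise ValueError("idempotency key reused with a different payload")
            return response  # replay: the side effect does not run again
        response = operation(payload)
        self._results[key] = (fingerprint, response)
        return response
```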

2.11 Circuit Breaking

Circuit breaking protects the rest of the system from a dependency that is failing or timing out.

Typical states:

closed: traffic flows normally
open: requests fail fast instead of calling a bad dependency
half-open: limited test traffic checks whether recovery has happened

Why it matters:

If a database or dependency is timing out, continuing to send full traffic often just consumes worker threads, saturates queues, and increases latency everywhere.

Circuit breaking buys time and preserves system health.
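The three states can be sketched as a small state machine. The consecutive-failure threshold, the cool-down period, and the injectable clock are illustrative choices. Production libraries typically use failure-rate windows and more configurable half-open behavior. This single-threaded sketch does not cap how many trial calls run at once in half-open.

```python
import time

class CircuitOpenError(Exception):
    pass

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "closed"
        self.failures = 0  # consecutive failures; reset on any success
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("failing fast")  # do not touch the dependency
            self.state = "half-open"  # cool-down elapsed: allow a trial request
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        # A failed trial in half-open reopens immediately; in closed, wait for the threshold.
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()

    def _on_success(self):
        self.state = "closed"
        self.failures = 0
```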

2.12 Response Transformation

Gateways often transform responses by:

removing internal fields
renaming fields for public API consistency
changing status code mappings
combining multiple backend responses into one DTO
translating protocols such as gRPC to JSON/HTTP for clients

Useful when:

internal services evolve independently
multiple clients need different shapes
you want to hide internal topology

Danger: too much transformation turns the gateway into a fragile orchestration layer.
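The first two transformations, removing internal fields and renaming, can be expressed as an explicit allow-list mapping. Any field that is not listed is dropped, so internal fields added later never leak by accident. The internal field names here are hypothetical.

```python
# Public name <- internal name. Anything not listed is dropped from the public response.
PUBLIC_ORDER_FIELDS = {
    "id": "order_id",
    "status": "state",
    "total": "total_amount",
}

def to_public_order(internal: dict) -> dict:
    return {
        public: internal[name]
        for public, name in PUBLIC_ORDER_FIELDS.items()
        if name in internal
    }
```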

2.13 Request and Response Caching

Caching at the gateway is powerful for read-heavy APIs.

Good candidates:

public or semi-public GET responses
configuration or feature metadata
rarely changing reference data

Hard parts:

cache invalidation
per-user or per-tenant cache keys
auth-sensitive content
stale responses after writes

Best practices:

cache only clearly safe responses
include authorization context in the cache key if needed
use TTLs conservatively
prefer cache headers and explicit policies over guesswork
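One way to include authorization context in the cache key is to fold the caller's tenant and scopes into the key. Two callers who may see different data then never share an entry. The `tenant_id` and `scopes` inputs and the SHA-256 digest over a canonical string are assumptions for illustration.

```python
import hashlib
from typing import Optional

def cache_key(method: str, path: str, query: dict, tenant_id: Optional[str], scopes: frozenset) -> Optional[str]:
    """Build a gateway cache key, or return None if the request must not be cached."""
    if method not in ("GET", "HEAD"):
        return None  # only cache safe methods
    # Sort query params so ?a=1&b=2 and ?b=2&a=1 share one entry.
    canonical_query = "&".join(f"{k}={query[k]}" for k in sorted(query))
    # Include who is asking: different tenants or scopes may see different data.
    auth_context = f"{tenant_id or 'anonymous'}|{','.join(sorted(scopes))}"
    raw = f"{method} {path}?{canonical_query}#{auth_context}"
    return hashlib.sha256(raw.encode()).hexdigest()
```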

2.14 Gateway vs Service Mesh

Dimension	API Gateway	Service Mesh
Main traffic direction	north-south, from clients into platform	east-west, service-to-service
Main role	edge policy and API entry	internal traffic management
Common features	auth, rate limiting, versioning, aggregation, public API concerns	mTLS, retries, traffic shaping, service identity, observability
Consumer	external clients or apps	internal services
Operational risk	can become a choke point	can add significant platform complexity
Typical tools	API Gateway products, NGINX, Kong, Envoy	Istio, Linkerd, Consul Connect, Envoy-based meshes

They are not mutually exclusive. Many real systems use both.

2.15 Production Patterns

Netflix popularized a gateway-style edge layer to handle cross-cutting concerns before requests hit microservices.
Stripe-like public APIs often emphasize idempotency, auth, versioning, and request logging at the edge because correctness matters more than raw throughput alone.
Large SaaS platforms often use gateways to enforce tenant-aware limits and route requests to the correct service family.

2.16 Common Mistakes

turning the gateway into a monolith of business logic
retrying non-idempotent requests
centralizing coarse and fine-grained authorization in the same place
logging secrets or PII in raw form
caching personalized responses incorrectly
making the gateway a single point of failure without horizontal scaling

3. Request Routing

Routing decides where a request goes after it enters the system.

This sounds simple, but routing is one of the most important control points in production because it determines:

which service handles the request
which version handles it
which region handles it
which tenant or experimental path it follows

3.1 Routing Types

Routing style	How it works	Common use cases	Risks
Path-based routing	route by URL path such as `/payments` or `/users`	REST APIs, monolith decomposition, ingress rules	overlapping route patterns, regex complexity
Host-based routing	route by hostname such as `api.example.com` or `admin.example.com`	multiple products or domains behind one edge	DNS and certificate management complexity
Header-based routing	route using headers such as version, tenant, device type, or experiment ID	canaries, A/B tests, tenant isolation	header spoofing, harder debugging
Geo-based routing	route by location, region, or country	latency reduction, data residency, regulatory compliance	incorrect geo inference, data locality problems
Canary routing	send a small portion of traffic to a new version	safe rollout	canary users may not represent real load
Blue-green routing	switch traffic between old and new environments	low-risk deployments with quick rollback	expensive duplication, data migration risk
Weighted traffic splitting	send 90 percent to old version and 10 percent to new version, then ramp	gradual deployment, model rollout	sticky results, measurement bias

3.2 How Routing Usually Works Internally

The router generally evaluates rules in order:

Match host.
Match method and path.
Evaluate higher-priority header or cookie rules.
Apply traffic-splitting policy if multiple upstreams are eligible.
Resolve the target service using discovery.
Pick a healthy instance.

Rule order matters. A subtle configuration bug can shadow a more specific route with a broader one.
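The evaluation order above can be sketched as an ordered rule list in which the first match wins, so more specific rules must come before broader ones. The `Rule` shape, service names, and headers are hypothetical. Traffic splitting and instance selection are omitted.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Rule:
    host: str
    path_prefix: str
    service: str
    methods: frozenset = frozenset()             # empty means any method
    headers: dict = field(default_factory=dict)  # all listed headers must match exactly

    def matches(self, host: str, method: str, path: str, headers: dict) -> bool:
        # Segment-aware prefix: /v1/payments matches /v1/payments/42 but not /v1/paymentsX.
        prefix = self.path_prefix.rstrip("/") + "/"
        return (
            host == self.host
            and (not self.methods or method in self.methods)
            and (path == self.path_prefix or path.startswith(prefix))
            and all(headers.get(k) == v for k, v in self.headers.items())
        )

# Order matters: specific rules first, catch-alls last.
RULES = [
    Rule("api.example.com", "/v1/payments/admin", "payments-admin"),
    Rule("api.example.com", "/v1/payments", "payments-vnext", headers={"X-Canary": "true"}),
    Rule("api.example.com", "/v1/payments", "payments-vcurrent"),
    Rule("api.example.com", "/v1/users", "user-service"),
    Rule("admin.example.com", "/", "admin-service"),
]

def route(host: str, method: str, path: str, headers: dict) -> Optional[str]:
    for rule in RULES:
        if rule.matches(host, method, path, headers):
            return rule.service
    return None  # no route: the gateway would answer 404
```

If the generic `/v1/payments` rule were moved above the admin rule, `/v1/payments/admin/...` would silently route to the payments service. This is exactly the shadowing bug described above.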

3.3 Routing Decision Flow

flowchart TD
	A[Incoming Request] --> B{Host Match?}
	B -->|api.example.com| C{Path Match?}
	B -->|admin.example.com| D[Admin Service]
	C -->|/v1/payments| E{Header or Canary Rule?}
	C -->|/v1/users| F[User Service]
	E -->|Canary| G[Payments vNext]
	E -->|Default| H[Payments vCurrent]

3.4 Path-Based Routing

This is the most common form of routing for HTTP APIs.

Examples:

/orders/* to order service
/payments/* to payment service
/search/* to search service

Why it exists:

intuitive for REST-style APIs
easy to reason about operationally
fits ingress and gateway tools well

Failure case: route collisions. For example, a generic /payments/* rule may accidentally catch /payments/admin/* if route precedence is wrong.

3.5 Host-Based Routing

Host-based routing routes by domain name.

Examples:

api.company.com
admin.company.com
uploads.company.com
hooks.company.com

This is useful when products or workloads differ enough that they deserve different operational policies.

For example, a webhook ingestion domain may need different timeout, retry, and rate-limit rules than a user-facing API domain.

3.6 Header-Based Routing

Header-based routing is common for:

API version rollout
internal testing
tenant routing
language or device-specific responses
canary routing with explicit opt-in

Example headers:

X-API-Version
X-Tenant-ID
X-Experiment
X-Canary

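Be careful: headers are convenient for trusted internal callers, but on public APIs any client can set them. Routing on a header such as X-Canary or X-Tenant-ID is only safe when the gateway strips the header or verifies it against the caller's authenticated identity.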

3.7 Geo-Based Routing

Geo routing tries to send users to the best region based on one or more goals:

lower latency
data residency compliance
regulatory boundaries
disaster isolation
capacity balancing

Examples:

EU users sent to EU region for GDPR-sensitive workloads
ride-sharing or mapping traffic sent to region nearest user demand
global SaaS tenants pinned to a home region

Tradeoffs:

nearest region is not always best if the user’s data lives elsewhere
geo-IP is imperfect
cross-region writes can be very expensive or consistency-sensitive

Interview point: geo routing and data placement must be discussed together.

3.8 Canary Routing

Canary routing sends a small portion of traffic to a new version first.

Typical rollout:

1 percent traffic
5 percent traffic
10 percent traffic
25 percent traffic
50 percent traffic
100 percent traffic

What you watch:

error rate
latency regression
resource utilization
business metrics such as checkout success or sign-in success

Why companies use it:

safer than instant full rollout
catches dependency or schema issues early
provides rollback window

Real-world intuition: Netflix-style continuous deployment is only practical because traffic shaping and observability let teams expose changes gradually.

3.9 Blue-Green Deployments

Blue-green means you maintain two environments:

blue: current production
green: new version

Then you shift traffic from one to the other.

Why it exists:

rollback is simple in theory because the old environment still exists
deployment risk is separated from code build risk

Where it gets hard:

databases do not switch as cleanly as stateless services
dual environments cost more
background jobs and asynchronous consumers may still affect shared data

3.10 Traffic Splitting

Traffic splitting is a more general concept than canary.

You can split traffic by:

percentage
user cohort
tenant tier
geography
request attributes
session stickiness

This is useful for:

A/B experiments
canaries
ML model rollout
progressive feature migration
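Percentage-based canaries and experiments are often implemented by hashing a stable identifier into a bucket. The same user then keeps seeing the same version while the percentage ramps up. This sketch assumes a user ID is available on each request. SHA-256 bucketing into 10,000 slots and the per-rollout salt are illustrative choices.

```python
import hashlib

BUCKETS = 10_000  # 0.01 percent granularity

def bucket(salt: str, user_id: str) -> int:
    # Salt per rollout so different experiments do not reuse the same cohorts.
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS

def choose_version(user_id: str, canary_percent: float, salt: str = "payments-v2") -> str:
    """Return 'canary' for roughly canary_percent of users, 'stable' otherwise."""
    return "canary" if bucket(salt, user_id) < canary_percent * BUCKETS / 100 else "stable"
```

Because the threshold only grows as the rollout ramps from 1 to 5 to 10 percent, every user already in the canary stays in it. This stickiness also keeps experiment measurements consistent.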

3.11 Service Discovery Impact

Routing depends on service discovery more than most beginners realize.

A route usually points to a logical service name, not a fixed machine. The system must map that name to currently healthy instances.

If service discovery is stale or slow:

requests go to dead instances
traffic may concentrate on a few nodes
rollout changes may not propagate consistently

Best practices:

keep route definitions separate from ephemeral instance identity
use health-aware endpoint selection
support connection draining during deployment
prefer automation over manual endpoint lists

3.12 Common Interview Discussions

How do you safely route traffic to a new version?
How do you guarantee tenant isolation during routing?
What happens if a route config is wrong globally?
How do you roll back a bad canary quickly?

4. Load Balancing

4.1 Why Load Balancing Exists

If requests were sent to a single server, that server would become a bottleneck and a single point of failure.

Load balancing exists to:

distribute traffic across multiple instances
improve availability
enable horizontal scaling
reduce overload on any single machine
route away from unhealthy instances

Horizontal scaling is the key idea. Instead of buying one huge machine forever, you run multiple smaller instances and distribute work.

4.2 Active-Active vs Active-Passive

Mode	Meaning	Benefits	Drawbacks	Common use
Active-active	multiple nodes or regions serve traffic at the same time	high availability, better capacity utilization, low failover time	more complex consistency and routing	web and API frontends, global services
Active-passive	one node or region is primary, another waits as standby	simpler operational model, easier correctness story	slower failover, unused capacity	legacy systems, cost-sensitive setups, some databases

Interview rule of thumb: active-active improves availability and latency, but only if the data layer and operational discipline are strong enough to support it.

4.3 Common Load-Balancing Algorithms

Algorithm	How it works	Best for	Strengths	Weaknesses
Round robin	rotate evenly through instances	similar stateless servers	simple and cheap	ignores load differences
Weighted round robin	same as round robin, but some instances get more traffic	mixed-capacity fleets	easy to express capacity skew	weights can become stale
Least connections	send to instance with fewest active connections	long-lived connections, uneven request duration	better than round robin for sticky or long sessions	connection count may not reflect real CPU or memory load
Least response time	prefer instances with lower observed latency	latency-sensitive APIs	reacts to slow instances	can create feedback loops or oscillations
Consistent hashing	map request key to instance ring	caches, sharded workloads, affinity	minimal reshuffling when instances change	hotspot risk if keys are skewed
IP hash	hash client IP to pick instance	simple affinity	easy stickiness	NAT and shared IPs skew traffic badly

One strong interview addition: consistent hashing is not a general-purpose default. It is especially useful when request locality matters, such as cache ownership or shard routing.

4.4 Round Robin

Round robin is the simplest algorithm: instance A, then B, then C, then back to A.

Why it exists:

low overhead
easy to reason about
works fine when instances are homogeneous and requests are similar

Why it fails:

requests may vary wildly in cost
one node may be slow but still receive equal traffic
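Round robin needs only a rotating position. This tiny sketch uses placeholder instance names. It also makes the weakness visible: the selection never looks at how busy an instance is.

```python
import itertools
import threading

class RoundRobin:
    def __init__(self, instances):
        self._cycle = itertools.cycle(list(instances))
        self._lock = threading.Lock()  # serialize access to the shared iterator

    def pick(self) -> str:
        with self._lock:
            return next(self._cycle)
```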

4.5 Weighted Round Robin

Weighted round robin lets you give bigger instances more traffic.

Example:

instance A weight 4
instance B weight 2
instance C weight 1

This is useful during mixed instance migrations or when canary nodes should receive only a small share.
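A naive weighted round robin would send A four requests in a row before touching B. The "smooth" variant sketched below, of the kind popularized by NGINX for weighted upstreams, interleaves picks while still honoring the weights over each cycle. The instance names and weights mirror the example above.

```python
class SmoothWeightedRoundRobin:
    def __init__(self, weights: dict[str, int]):
        self.weights = dict(weights)
        self.current = {name: 0 for name in weights}
        self.total = sum(weights.values())

    def pick(self) -> str:
        # Every instance earns its weight; the highest scorer is chosen and pays back the total.
        for name, weight in self.weights.items():
            self.current[name] += weight
        chosen = max(self.current, key=self.current.get)
        self.current[chosen] -= self.total
        return chosen

wrr = SmoothWeightedRoundRobin({"A": 4, "B": 2, "C": 1})
print([wrr.pick() for _ in range(7)])  # ['A', 'B', 'A', 'C', 'A', 'B', 'A']
```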

4.6 Least Connections

This is common when connection duration matters, such as proxies handling long-lived connections or WebSocket-heavy workloads.

Why it helps:

a server already holding many active sessions gets less new work

Limitations:

10 idle connections are not equivalent to 10 expensive requests
CPU-heavy but short-lived requests may still be imbalanced
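A least-connections picker only needs an in-flight counter per instance. The counter is incremented on selection and decremented when the request or connection ends. In this sketch, a context manager guarantees the decrement even if the request fails. The instance names are placeholders. As the limitations above note, the counter says nothing about how expensive each connection is.

```python
import threading
from contextlib import contextmanager

class LeastConnections:
    def __init__(self, instances):
        self.active = {name: 0 for name in instances}
        self._lock = threading.Lock()

    @contextmanager
    def connection(self):
        """Pick the instance with the fewest in-flight connections and track it until done."""
        with self._lock:
            chosen = min(self.active, key=self.active.get)
            self.active[chosen] += 1
        try:
            yield chosen
        finally:
            with self._lock:
                self.active[chosen] -= 1
```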

4.7 Least Response Time

This tries to avoid slow nodes by routing to faster ones.

Good intuition: if one instance’s latency is rising, it may be overloaded or degraded.

Risk: feedback loops.

If the algorithm overreacts, traffic can bounce around and create instability. Slow-start, damping, and outlier detection often help.
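One common damping technique is to track an exponentially weighted moving average (EWMA) of latency per instance instead of reacting to the last sample. This sketch assumes latencies are reported back after each call. The smoothing factor `alpha` and the zero score for instances without samples are illustrative assumptions.

```python
class EwmaLatencyBalancer:
    def __init__(self, instances, alpha: float = 0.2):
        # Lower alpha means more damping: one slow sample moves the average only a little.
        self.alpha = alpha
        self.ewma = {name: None for name in instances}

    def record(self, instance: str, latency_ms: float) -> None:
        previous = self.ewma[instance]
        self.ewma[instance] = latency_ms if previous is None else (
            self.alpha * latency_ms + (1 - self.alpha) * previous
        )

    def pick(self) -> str:
        # Instances without samples score 0 so they get probed; pair this with
        # slow start in practice so a brand-new instance is not flooded.
        return min(self.ewma, key=lambda name: self.ewma[name] or 0.0)
```

Because every caller picks the same current minimum, this can still herd traffic onto one instance. A common further refinement is to pick the better of two randomly chosen instances.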

4.8 Consistent Hashing

Consistent hashing is important in system design interviews.

Instead of randomly balancing every request, you map a key such as:

user ID
session ID
cache key
shard key

to a position on a hash ring. Each instance owns a portion of the ring.

Why it exists:

when instances join or leave, only a subset of keys remap
this preserves cache locality and reduces churn

Common use cases:

distributed caches
sharded databases
sticky-ish routing without a central session store

Failure case: skewed keys can create hotspots. Virtual nodes and better key design are common mitigations.
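A hash ring with virtual nodes can be sketched with a sorted list and binary search. The node names and the choice of 100 virtual nodes per instance are illustrative. SHA-256 is used only for its even distribution, not for security.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._ring: list[tuple[int, str]] = []  # sorted (position, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each node appears at many virtual positions to even out its share of the ring.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def get(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        # Walk clockwise to the first virtual node at or after the key's position,
        # wrapping around to the start of the ring if needed.
        index = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[index % len(self._ring)][1]
```

When a node is removed, only the keys it owned move. All other keys keep their previous owner, which is what preserves cache locality.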

4.9 IP Hash

IP hash is a crude form of affinity.

It is easy to set up, but it performs poorly when many clients appear behind a single NAT or corporate proxy. One large office can accidentally behave like one giant user from the load balancer’s perspective.

4.10 Distributed Load Balancing

At scale, the load balancer itself must also scale.

That means:

multiple LB nodes, not one appliance
shared or replicated configuration
health state propagation
often anycast or DNS in front of LB fleets

If the load balancer layer is not distributed, you have just moved the bottleneck.

4.11 Global Load Balancing

Global load balancing chooses a region before a local load balancer chooses an instance.

Goals:

reduce latency
avoid unhealthy regions
keep traffic near user or data
manage regional capacity

flowchart TD
	U[Global Users] --> GTM[Global Traffic Manager]
	GTM --> US[US Region]
	GTM --> EU[EU Region]
	GTM --> AP[APAC Region]
	US --> USLB[Regional Load Balancer]
	EU --> EULB[Regional Load Balancer]
	AP --> APLB[Regional Load Balancer]
	USLB --> US1[Service Instances]
	EULB --> EU1[Service Instances]
	APLB --> AP1[Service Instances]

Google-scale and Amazon-scale systems rely heavily on global traffic management concepts because the first routing decision is often regional, not per-instance.

4.12 DNS Load Balancing

DNS can return different IPs for the same hostname.

Why it is attractive:

simple
globally available
often the first layer of balancing

Limitations:

DNS caching means failover is not immediate
clients do not always honor TTL precisely
DNS cannot see application-level health very well by itself

Interview point: DNS is useful, but it is usually too coarse to be the only failover mechanism.

4.13 Best Practices

remove unhealthy instances quickly
use connection draining before terminating instances
prefer zone-aware balancing if cross-zone traffic is expensive
track p95 and p99, not just average latency
use slow start for newly added instances so they do not get overwhelmed instantly
treat retries as part of traffic load, not separate from it

4.14 Common Failure Cases

unhealthy instances still receive traffic because readiness is wrong
least-response-time routing amplifies instability
one AZ gets overloaded because balancing is not zone-aware
sticky affinity causes hotspots
the load balancer layer itself is not redundant

5. Rate Limiting

5.1 Why It Exists

Rate limiting controls how much traffic a client, tenant, API key, IP, or endpoint can send over time.

It exists to enforce:

abuse prevention
fairness
cost control
protection of downstream systems
multi-tenant isolation

Without rate limits, one abusive or buggy client can monopolize resources.

5.2 Where It Is Applied

Rate limits can exist at multiple layers:

CDN or edge
API gateway
service layer
database or queue concurrency controls

Different layers often enforce different types of limits.

Examples:

per-IP login limit at the edge
per-API-key limit at the gateway
per-tenant expensive-operation limit in the service

5.3 Common Algorithms

Algorithm	Idea	Strengths	Weaknesses	Good fit
Fixed window	count requests in each fixed period	very simple	bursty at window boundaries	low-complexity systems
Sliding window	approximate rolling window using adjacent buckets	smoother than fixed window	more logic and state	typical APIs
Sliding log	keep timestamps of requests	precise	expensive in memory and compute	strict low-volume policies
Token bucket	tokens refill at a constant rate, requests spend tokens	allows controlled bursts	stateful logic	public APIs with burst tolerance
Leaky bucket	requests enter a bucket and drain at fixed rate	smooths outgoing rate	may delay or drop burst traffic	traffic shaping and smoothing

5.4 Fixed Window

Example rule: 100 requests per minute.

Implementation is often as simple as:

increment a counter for current window
reject if counter exceeds limit
expire counter at end of window

Problem: boundary burst.

A client can send 100 requests at the end of one minute and 100 more at the start of the next minute, effectively sending 200 requests in a short interval.
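The steps above and the boundary burst can be reproduced in a few lines. This is a single-process, in-memory sketch with an injectable clock. A distributed version would keep the counters in a shared store.

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    def __init__(self, limit: int, window_seconds: int, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.counts = defaultdict(int)  # (client, window index) -> count

    def allow(self, client: str) -> bool:
        window_index = int(self.clock() // self.window)
        key = (client, window_index)
        self.counts[key] += 1
        # Expiry is omitted here; a shared store would set a TTL of one window on each key.
        return self.counts[key] <= self.limit

# Boundary burst: 100 requests at t=59s and 100 more at t=60.5s are all allowed.
now = [59.0]
limiter = FixedWindowLimiter(limit=100, window_seconds=60, clock=lambda: now[0])
late = sum(limiter.allow("client-1") for _ in range(100))
now[0] = 60.5
early = sum(limiter.allow("client-1") for _ in range(100))
print(late + early)  # 200
```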

5.5 Sliding Window

Sliding window reduces the boundary-burst problem.

Instead of treating time as disconnected minute buckets, it approximates usage across a rolling interval. This is fairer and smoother for APIs.
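One common approximation, often called a sliding window counter, keeps only the current and previous fixed-window counts. It weights the previous count by how much of that window still overlaps the rolling interval. The in-memory state and injectable clock are assumptions for the sketch.

```python
import time

class SlidingWindowCounter:
    def __init__(self, limit: int, window_seconds: float, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.state = {}  # client -> (window index, current count, previous count)

    def allow(self, client: str) -> bool:
        now = self.clock()
        index = int(now // self.window)
        stored_index, current, previous = self.state.get(client, (index, 0, 0))
        if index == stored_index + 1:
            previous, current = current, 0  # rolled into the next window
        elif index > stored_index + 1:
            previous, current = 0, 0        # idle for more than a full window
        elapsed_fraction = (now % self.window) / self.window
        # Weight the previous window by the part that still overlaps [now - window, now].
        estimated = previous * (1 - elapsed_fraction) + current
        if estimated >= self.limit:
            self.state[client] = (index, current, previous)
            return False
        self.state[client] = (index, current + 1, previous)
        return True
```

Replaying the fixed-window boundary burst (100 requests at t=59s, then 100 at t=60.5s, limit 100 per minute) admits only one request in the second burst. The previous window still counts for about 99 of them.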

5.6 Sliding Log

Sliding log stores individual request timestamps and removes old ones.

It is the most exact of the common approaches, but also the most expensive, because it stores one entry per accepted request. It is rarely the default choice for high-cardinality, high-throughput public traffic.
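A sliding log can be sketched with one deque of timestamps per client. Memory grows with the limit, which is the cost mentioned above. The in-memory storage and injectable clock are assumptions for illustration.

```python
import time
from collections import defaultdict, deque

class SlidingLogLimiter:
    def __init__(self, limit: int, window_seconds: float, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.logs = defaultdict(deque)  # client -> timestamps of accepted requests

    def allow(self, client: str) -> bool:
        now = self.clock()
        log = self.logs[client]
        # Drop timestamps that have fallen out of the rolling window.
        while log and log[0] <= now - self.window:
            log.popleft()
        if len(log) >= self.limit:
            return False
        log.append(now)
        return True
```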

5.7 Token Bucket

This is one of the most useful algorithms to understand.

Mental model:

tokens drip into a bucket at a steady rate
each request consumes one or more tokens
if the bucket is empty, reject or delay

Why it is popular:

supports bursts up to bucket capacity
preserves average rate over time
easy to reason about for product limits

Example:

refill 10 tokens per second
bucket size 50

The client can burst 50 requests instantly, but over time they only sustain about 10 per second.
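The token bucket from that example can be sketched with lazy refill: tokens are recomputed from elapsed time on each call instead of by a background timer. The injectable clock and the `retry_after` helper, which could feed a Retry-After header, are illustrative additions.

```python
import time

class TokenBucket:
    def __init__(self, rate_per_second: float, capacity: float, clock=time.monotonic):
        self.rate = rate_per_second
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full so an idle client can burst
        self.last_refill = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill from elapsed time, never beyond the bucket's capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

    def retry_after(self, cost: float = 1.0) -> float:
        """Seconds until `cost` tokens are available; meant to be called right after a rejection."""
        return max(0.0, (cost - self.tokens) / self.rate)
```

With `TokenBucket(10, 50)`, an initial burst of 60 requests admits 50. One second later, another burst admits 10, matching the "burst 50, sustain about 10 per second" behavior described above.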

5.8 Leaky Bucket

Leaky bucket emphasizes output smoothing more than burst allowance.

Requests may queue and drain at a steady rate. This is useful when downstream systems need smooth, predictable load rather than spikes.
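The queueing form of the leaky bucket can be sketched as a bounded queue that releases requests no faster than a fixed rate. Overflow is rejected. The class name, drain rate, and capacity are illustrative. Real shapers usually drain on a timer rather than on explicit `drain()` calls.

```python
import time
from collections import deque

class LeakyBucketQueue:
    def __init__(self, drain_per_second: float, capacity: int, clock=time.monotonic):
        self.interval = 1.0 / drain_per_second
        self.capacity = capacity
        self.clock = clock
        self.queue = deque()
        self.next_release = clock()

    def offer(self, request) -> bool:
        """Enqueue a request, or reject it if the bucket is already full."""
        if len(self.queue) >= self.capacity:
            return False  # overflow: the burst is dropped rather than passed downstream
        if not self.queue:
            # Bucket was empty: do not bank release credit from the idle period.
            self.next_release = max(self.next_release, self.clock())
        self.queue.append(request)
        return True

    def drain(self) -> list:
        """Release queued requests at no more than the fixed drain rate."""
        now = self.clock()
        released = []
        while self.queue and self.next_release <= now:
            released.append(self.queue.popleft())
            self.next_release += self.interval
        return released
```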

5.9 Redis Implementation Patterns

Redis is common for distributed rate limiting because it is fast and supports atomic operations.

Typical patterns:

fixed window: INCR plus EXPIRE
sliding log: sorted set of timestamps with ZADD, ZREMRANGEBYSCORE, and ZCARD
token bucket: store token count and last refill timestamp, update atomically with Lua

Why Lua scripts matter:

distributed rate limiting requires atomic read-modify-write behavior
without atomicity, concurrent requests can exceed the intended limit

Key design examples:

ratelimit:user:123:/payments
ratelimit:tenant:acme:minute
ratelimit:ip:203.0.113.10:login

5.10 Distributed Rate Limiting Challenges

This is where interview answers often become shallow.

Real problems include:

hot keys for large tenants or popular routes
clock skew between nodes
cross-region consistency
Redis outages
fail-open versus fail-closed policy decisions
cardinality explosion if keys are too granular

Fail-open means the limiter allows traffic if the limiter store is down.

Fail-closed means it rejects traffic if the limiter store is down.

Which is right depends on the endpoint:

login or anti-abuse endpoint may prefer fail-closed
general read endpoint may prefer fail-open to preserve availability
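The fail-open versus fail-closed choice can be made an explicit per-route setting, so the behavior during a limiter outage is decided in advance rather than by accident. The `LimiterUnavailable` error and the route table are hypothetical.

```python
class LimiterUnavailable(Exception):
    """Hypothetical error raised when the shared limiter store cannot be reached."""

# Illustrative per-route policy: True = fail open, False = fail closed.
FAIL_OPEN_BY_ROUTE = {
    "/login": False,      # anti-abuse: reject when attempts cannot be counted
    "/v1/catalog": True,  # plain reads: prefer serving traffic
}

def check_rate_limit(limiter_check, client: str, fail_open: bool) -> bool:
    """Return True if the request may proceed."""
    try:
        return limiter_check(client)
    except LimiterUnavailable:
        # A real system should also emit a metric or alert here: failing open silently
        # removes protection, and failing closed silently causes an outage.
        return fail_open
```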

5.11 Best Practices

limit by the right identity: IP is often not enough
use different limits for read, write, and expensive endpoints
return 429 Too Many Requests with Retry-After when possible
monitor near-limit behavior, not just hard rejections
consider shadow mode before enforcing a new limit in production
keep rate limiting close to the edge for cheap rejection

5.12 Real-World Intuition

GitHub-style public APIs need clear client-visible limits to keep the platform fair.
Stripe-like payment APIs need rate limiting to protect correctness-sensitive backends from abuse or accidental retry loops.
Multi-tenant SaaS platforms often combine per-user, per-tenant, and per-endpoint limits.

6. Request Validation

Validation is the discipline of refusing bad requests before they do damage.

This includes much more than checking whether a JSON field exists.

6.1 Why Validation Exists

Requests are dangerous because they may be:

malformed
malicious
duplicated
replayed
semantically invalid
inconsistent with business rules

Validation protects:

correctness
security
downstream capacity
developer sanity

6.2 Validation Layers

Layer	Typical checks	Why here
Edge or gateway	body size, basic schema, auth format, signature presence, rate limit	cheap early rejection
API layer	required fields, type checks, enum checks, version compatibility	contract correctness
Domain layer	business rules and state-dependent validation	real correctness
Database layer	unique constraints, foreign keys, transactional guarantees	final integrity guardrail

Important interview point: validation should be layered. Do not rely on only one layer.

6.3 Schema Validation

Schema validation checks the request shape.

Examples:

JSON schema or OpenAPI validation for REST
protobuf validation for gRPC
GraphQL schema and resolver validation

Why it exists:

catches bad input early
makes API behavior predictable
prevents weird null or type bugs from leaking deep into business logic

What it does not do:

prove business correctness

Example:

schema can prove amount exists and is numeric
it cannot prove that a user is allowed to charge that amount
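Using the `amount` example, here is a stdlib-only sketch that separates the two layers. In a real service the first function would usually come from a JSON Schema or OpenAPI definition rather than be written by hand. `parse_charge`, `authorize_charge`, and the per-account `limits` rule are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import Dict


class ValidationError(ValueError):
    pass


@dataclass(frozen=True)
class ChargeRequest:
    account_id: str
    amount: Decimal
    currency: str


def parse_charge(body: dict) -> ChargeRequest:
    """Schema layer: required fields, types, and static constraints only."""
    for name in ("account_id", "amount", "currency"):
        if name not in body:
            raise ValidationError(f"missing field: {name}")
    try:
        amount = Decimal(str(body["amount"]))
    except InvalidOperation:
        raise ValidationError("amount must be numeric") from None
    if not amount.is_finite() or amount <= 0:
        raise ValidationError("amount must be a positive number")
    currency = body["currency"]
    if not isinstance(currency, str) or len(currency) != 3:
        raise ValidationError("currency must be a 3-letter code")
    return ChargeRequest(str(body["account_id"]), amount, currency.upper())


def authorize_charge(req: ChargeRequest, limits: Dict[str, Decimal]) -> None:
    """Domain layer: rules that depend on state the schema cannot see."""
    limit = limits.get(req.account_id)
    if limit is None or req.amount > limit:
        raise ValidationError("account is not allowed to charge this amount")


req = parse_charge({"account_id": "acct_1", "amount": "100.00", "currency": "usd"})
try:
    authorize_charge(req, limits={"acct_1": Decimal("50")})
except ValidationError as err:
    print("rejected:", err)   # schema-valid, but fails the business rule
```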

6.4 Input Sanitization

Sanitization is about ensuring input cannot be used to exploit downstream systems.

Common concerns:

SQL injection
command injection
path traversal
log injection
XSS if data will later be rendered in browsers

The important mindset is not "strip all special characters". That often breaks legitimate input.

Better practice:

use parameterized queries
encode output for the correct context
validate formats where needed
avoid blindly interpolating request data into logs or shell commands
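The parameterized-query point is easy to demonstrate with Python's built-in `sqlite3` module. The table and payload are made up, and `find_user_unsafe` is intentionally vulnerable to show what goes wrong. Other drivers use different placeholder styles (many PostgreSQL drivers use `%s`), but the principle is the same: the value travels separately from the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [("a@example.com",), ("b@example.com",)])


def find_user_unsafe(email: str):
    # DON'T: request data becomes part of the SQL text.
    return conn.execute(f"SELECT id FROM users WHERE email = '{email}'").fetchall()


def find_user_safe(email: str):
    # DO: the driver binds the value separately from the SQL text.
    return conn.execute("SELECT id FROM users WHERE email = ?", (email,)).fetchall()


payload = "x' OR '1'='1"
unsafe_rows = find_user_unsafe(payload)   # matches every row
safe_rows = find_user_safe(payload)       # matches nothing
```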

6.5 Idempotency

Idempotency is one of the most important backend interview concepts.

A request is idempotent if sending it multiple times has the same effect as sending it once.

Why it matters:

clients retry when timeouts happen
networks fail after the server may already have processed the request
gateways or proxies may retry transient failures

Typical example: payments.

If a client sends "charge $100" twice because the first response was lost, you do not want to charge the card twice.

A common solution is an idempotency key.

sequenceDiagram
	participant Client
	participant API
	participant Store as Idempotency Store
	participant Payment as Payment Service

	Client->>API: POST /charges + Idempotency-Key: abc123
	API->>Store: Lookup key abc123
	alt Key not found
		Store-->>API: miss
		API->>Payment: Execute charge
		Payment-->>API: success
		API->>Store: Save key + normalized request hash + response
		API-->>Client: 200 OK with result
	else Key found
		Store-->>API: previous response
		API-->>Client: return same stored result
	end

Best practices for idempotency:

store both the key and enough request fingerprinting to detect misuse
scope keys appropriately, often per client or per endpoint family
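keep key retention long enough to cover realistic retry windows
reserve the key atomically before running the operation, so two concurrent retries with the same key cannot both execute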
use for non-idempotent operations such as payment creation or order submission
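Here is a single-process sketch of the flow in the diagram. It adds a fingerprint check to detect reuse of a key with a different body, and a reservation step so concurrent duplicates cannot both run. `IdempotencyStore`, `fingerprint`, and the exception names are hypothetical. A production store would be shared, for example a database table with a unique constraint on the scoped key, and would expire entries after the retention window. The sketch also releases the key when the handler raises. That is only safe if a failed handler left no side effects.

```python
import hashlib
import json
import threading
from typing import Callable, Dict, Optional, Tuple


class IdempotencyConflict(Exception):
    """The key was reused with a different request body."""


class RequestInProgress(Exception):
    """Another request with the same key is still executing."""


def fingerprint(method: str, path: str, body: dict) -> str:
    # Normalize JSON (sorted keys, compact separators) so equivalent bodies hash equally.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{method} {path} {canonical}".encode()).hexdigest()


class IdempotencyStore:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        # (client_id, key) -> (fingerprint, stored response or None while running)
        self._entries: Dict[Tuple[str, str], Tuple[str, Optional[dict]]] = {}

    def execute(self, client_id: str, key: str, fp: str,
                handler: Callable[[], dict]) -> dict:
        scoped = (client_id, key)                  # keys are scoped per client
        with self._lock:
            entry = self._entries.get(scoped)
            if entry is not None:
                stored_fp, response = entry
                if stored_fp != fp:
                    raise IdempotencyConflict(key)
                if response is None:
                    raise RequestInProgress(key)   # e.g. respond 409 and let the client retry
                return response                    # replay the original result
            self._entries[scoped] = (fp, None)     # reserve the key before doing the work
        try:
            response = handler()
        except Exception:
            with self._lock:
                del self._entries[scoped]          # assumes the failed handler had no side effects
            raise
        with self._lock:
            self._entries[scoped] = (fp, response)
        return response


store = IdempotencyStore()
charges = []


def charge() -> dict:
    charges.append(100)
    return {"status": "succeeded", "charge_count": len(charges)}


fp = fingerprint("POST", "/charges", {"amount": 100, "currency": "usd"})
first = store.execute("client_1", "abc123", fp, charge)
retry = store.execute("client_1", "abc123", fp, charge)   # replayed: card charged once
```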

6.6 Replay Protection

Replay protection prevents an attacker or buggy intermediary from resending a valid request later.

Common techniques:

timestamps with expiration windows
nonces stored briefly to prevent reuse
signed requests that include method, path, body hash, and timestamp

This is especially common in webhook verification and partner API integrations.
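Here is a sketch of the timestamp-plus-nonce technique. It assumes both values are covered by the request signature; otherwise an attacker could simply change them. A nonce only has to be remembered while its timestamp is still inside the acceptance window, because older replays are rejected by the timestamp check anyway. `ReplayGuard` is hypothetical and keeps nonces in memory. Across multiple instances you would use a shared store, for example Redis `SET` with the `NX` and `EX` options.

```python
import time
from typing import Dict, Optional


class ReplayGuard:
    """Rejects requests with a stale timestamp or an already-used nonce."""

    def __init__(self, max_skew_s: float = 300.0) -> None:
        self.max_skew_s = max_skew_s
        self._seen: Dict[str, float] = {}   # nonce -> last moment its timestamp is still valid

    def check(self, timestamp: float, nonce: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        if abs(now - timestamp) > self.max_skew_s:
            return False                    # outside the acceptance window
        # Forget expired nonces: the timestamp check already rejects their replays.
        self._seen = {n: exp for n, exp in self._seen.items() if exp >= now}
        if nonce in self._seen:
            return False                    # replay inside the window
        self._seen[nonce] = timestamp + self.max_skew_s
        return True


guard = ReplayGuard(max_skew_s=300)
first = guard.check(timestamp=1000.0, nonce="n-1", now=1010.0)    # True: fresh
replay = guard.check(timestamp=1000.0, nonce="n-1", now=1020.0)   # False: nonce reused
too_old = guard.check(timestamp=1000.0, nonce="n-2", now=1400.0)  # False: outside window
```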

6.7 Request Signing Basics

Request signing often works like this:

client builds a canonical string from method, path, timestamp, and body hash
client signs it with an HMAC secret or private key
server recomputes expected signature
server rejects if signature differs or timestamp is too old

Why it exists:

verifies authenticity
detects tampering
supports replay protection when timestamp and nonce are included

GitHub-style or Stripe-style webhooks commonly use a variant of this pattern so receivers can verify that an event really came from the platform.
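Here is a sketch of the four signing steps using HMAC-SHA256 from Python's standard library. The canonical string layout (newline-separated method, path, timestamp, and SHA-256 body hash) and the 300-second tolerance are assumptions for illustration. Real providers such as GitHub and Stripe each document their own exact format, and a receiver has to follow that format precisely.

```python
import hashlib
import hmac
import time
from typing import Optional


def canonical_string(method: str, path: str, timestamp: int, body: bytes) -> bytes:
    body_hash = hashlib.sha256(body).hexdigest()
    return f"{method.upper()}\n{path}\n{timestamp}\n{body_hash}".encode()


def sign(secret: bytes, method: str, path: str, timestamp: int, body: bytes) -> str:
    message = canonical_string(method, path, timestamp, body)
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, method: str, path: str, timestamp: int, body: bytes,
           signature: str, tolerance_s: int = 300, now: Optional[float] = None) -> bool:
    now = time.time() if now is None else now
    if abs(now - timestamp) > tolerance_s:
        return False                                  # too old, or too far in the future
    expected = sign(secret, method, path, timestamp, body)
    return hmac.compare_digest(expected, signature)   # constant-time comparison


secret = b"shared-webhook-secret"
body = b'{"event":"charge.succeeded"}'
ts = 1_700_000_000
sig = sign(secret, "POST", "/webhooks", ts, body)
ok = verify(secret, "POST", "/webhooks", ts, body, sig, now=ts + 10)               # True
tampered = verify(secret, "POST", "/webhooks", ts, body + b" ", sig, now=ts + 10)  # False
stale = verify(secret, "POST", "/webhooks", ts, body, sig, now=ts + 3600)          # False
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not leak how many leading characters matched.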

6.8 Common Validation Mistakes

trusting frontend validation
validating schema but not business semantics
implementing idempotency without scoping or request hashing
rejecting too late after expensive downstream work already happened
logging raw secrets, tokens, or signed payloads
treating all retries as duplicates without considering request identity

7. API Versioning

Versioning exists because APIs change, but clients do not upgrade instantly.

7.1 Why Versioning Matters

Without a versioning strategy:

client upgrades become risky
breaking changes become outages
multiple mobile app versions become painful to support
integration partners lose trust

Strong backend teams design for API evolution, not just initial launch.

7.2 Common Versioning Strategies

Strategy	Example	Benefits	Drawbacks	When useful
URI versioning	`/v1/orders`	explicit and easy to see	path clutter, can encourage large version forks	public REST APIs
Header versioning	`X-API-Version: 2025-10-01`	cleaner URLs, flexible rollout	less visible, harder to debug manually	mature APIs, platform clients
Media type versioning	`Accept: application/vnd.company.v2+json`	precise content negotiation	operationally less friendly, not beginner-friendly	specialized APIs
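Here is a sketch of how a gateway might detect which of the three strategies a request uses. The `X-API-Version` header and the `application/vnd.company.v2+json` media type are the table's examples; treat them as placeholders. `extract_version` is hypothetical. Most APIs commit to one scheme. What happens when no version is given (a default, or a version pinned per account) is a design choice worth stating explicitly.

```python
import re
from typing import Mapping, Optional, Tuple

_URI = re.compile(r"^/v(\d+)(?:/|$)")
_MEDIA = re.compile(r"application/vnd\.company\.v(\d+)\+json")


def extract_version(path: str, headers: Mapping[str, str]) -> Optional[Tuple[str, str]]:
    """Return (strategy, version) for whichever scheme the request uses, else None."""
    h = {k.lower(): v for k, v in headers.items()}   # HTTP header names are case-insensitive
    m = _URI.match(path)
    if m:
        return ("uri", m.group(1))
    if "x-api-version" in h:
        return ("header", h["x-api-version"].strip())
    m = _MEDIA.search(h.get("accept", ""))
    if m:
        return ("media-type", m.group(1))
    return None   # caller falls back to a default or account-pinned version


by_uri = extract_version("/v1/orders", {})
by_header = extract_version("/orders", {"X-API-Version": "2025-10-01"})
by_media = extract_version("/orders", {"Accept": "application/vnd.company.v2+json"})
```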

7.3 Backward Compatibility

This is often more important than the version number itself.

Safer changes:

adding optional fields
adding new endpoints
adding new enum values only if clients are tolerant

Risky changes:

removing fields
renaming fields
changing meaning or units of existing fields
turning nullable fields into required ones

Production rule: additive evolution is easier than breaking evolution.
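On the client side, being tolerant usually means following the tolerant-reader style. The client ignores fields it does not know and maps unknown enum values to a fallback instead of crashing. Here is a sketch with hypothetical order fields.

```python
from enum import Enum
from typing import Any, Dict


class OrderStatus(Enum):
    PENDING = "pending"
    PAID = "paid"
    SHIPPED = "shipped"
    UNKNOWN = "unknown"   # fallback for values added by newer server versions


def parse_status(raw: str) -> OrderStatus:
    try:
        return OrderStatus(raw)
    except ValueError:
        return OrderStatus.UNKNOWN


def parse_order(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Read only known fields; ignore extras instead of failing."""
    return {
        "id": payload["id"],
        "status": parse_status(payload.get("status", "unknown")),
        "note": payload.get("note"),   # optional field, absent from older responses
    }


order = parse_order({"id": "o1", "status": "refunded", "gift_wrap": True})
# order["status"] is OrderStatus.UNKNOWN; "gift_wrap" is ignored; order["note"] is None
```

With clients written this way, a server can add a `refunded` status or a `gift_wrap` field without breaking them. That property is what makes additive evolution cheap.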

7.4 Deprecation Strategy

Good deprecation is operational, not just documented.

A practical strategy:

announce deprecation clearly
measure usage of the old version
provide migration docs and examples
support both versions during a migration window
alert high-usage customers directly if possible
set and communicate a sunset date

7.5 Migration Strategy

Strong teams avoid big-bang migrations.

Typical approach:

dual-read or dual-write only when necessary and carefully controlled
gateway routes old and new versions separately
monitor client adoption
migrate major SDKs first
cut off the oldest, least-safe versions gradually

7.6 Real-World Examples

Stripe is well known for careful API versioning because payment integrations cannot break casually.
GitHub exposes explicit API versioning so clients know which contract they are using.
Internal microservices often use protobuf or schema evolution rules instead of public URI versioning.

7.7 Best Practices

version only when needed; do not fork casually
prefer backward-compatible changes when possible
monitor version usage by client and tenant
keep error formats consistent across versions
document behavioral differences, not just field differences

8. Throttling

Rate limiting and throttling are related but not identical.

8.1 Throttling vs Rate Limiting

Concept	Main goal	Typical action	Example
Rate limiting	enforce quota or fairness	reject when request budget is exceeded	100 requests per minute per API key
Throttling	protect system under stress or shape traffic	slow down, queue, degrade, or reject	reduce expensive search traffic during overload

Rate limiting is often policy-driven.

Throttling is often system-health-driven.

8.2 Why Throttling Exists

Even legitimate traffic can overwhelm a system.

Throttling helps you:

degrade gracefully rather than crash
preserve critical endpoints over less important ones
absorb bursts temporarily
smooth traffic into fragile downstream systems

8.3 Graceful Degradation

Graceful degradation means not every feature must remain equally available during stress.

Examples:

checkout stays available, recommendations are temporarily disabled
write-heavy analytics ingestion is delayed, user sign-in remains online
expensive search filters are limited, basic search still works

This is how mature systems preserve business-critical value during incidents.

8.4 Queueing

Queueing is useful when the work does not need to complete synchronously.

Examples:

email sending
thumbnail generation
event enrichment
some analytics processing

Why it helps:

decouples request acceptance from background work
smooths spikes
improves perceived responsiveness if the request can return early

Danger: unbounded queues are just hidden outages. If the queue grows forever, latency becomes effectively infinite.
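Here is a sketch of a bounded queue built on Python's standard `queue` module. When the backlog is full, the request is refused immediately instead of waiting behind an ever-growing line. `BoundedWorkQueue` is hypothetical. The 202/503 mapping in the comments is one reasonable convention, not a requirement.

```python
import queue
import threading
from typing import Callable, List


class BoundedWorkQueue:
    """Accepts background jobs only while the backlog is below a fixed limit."""

    def __init__(self, max_backlog: int) -> None:
        self._q: "queue.Queue[Callable[[], None]]" = queue.Queue(maxsize=max_backlog)

    def submit(self, job: Callable[[], None]) -> bool:
        try:
            self._q.put_nowait(job)   # never block the request path
            return True               # e.g. respond 202 Accepted
        except queue.Full:
            return False              # e.g. respond 503 with Retry-After

    def run_worker(self, stop: threading.Event) -> None:
        while not stop.is_set():
            try:
                job = self._q.get(timeout=0.1)
            except queue.Empty:
                continue
            try:
                job()
            except Exception as exc:  # keep the worker alive; a real system would log or dead-letter
                print(f"job failed: {exc!r}")
            finally:
                self._q.task_done()

    def join(self) -> None:
        self._q.join()                # wait until every accepted job has finished


wq = BoundedWorkQueue(max_backlog=2)
sent: List[str] = []
accepted = [wq.submit(lambda i=i: sent.append(f"email-{i}")) for i in range(3)]
stop = threading.Event()
worker = threading.Thread(target=wq.run_worker, args=(stop,))
worker.start()
wq.join()
stop.set()
worker.join()
# accepted == [True, True, False]: the third job is refused instead of growing the backlog
```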

8.5 Shedding Load

Load shedding means rejecting traffic on purpose so the rest of the system survives.

This can feel wrong to beginners, but it is often the correct decision.

Serving 70 percent of traffic quickly is better than timing out 100 percent of traffic after exhausting all workers.

Common strategies:

reject low-priority requests first
enforce concurrency caps on expensive endpoints
return stale cached data for noncritical reads
cut off optional features during incidents
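Here is a sketch of one way to combine priority classes with a concurrency cap. Each class may use only a fraction of the in-flight capacity, so optional traffic is rejected well before critical traffic. `PriorityShedder` and the fractions are hypothetical. Real systems often shed based on queueing latency or CPU as well as concurrency.

```python
import threading
from contextlib import ExitStack, contextmanager
from typing import Dict, Iterator


class Shed(Exception):
    """Request rejected on purpose; map to 503 (or 429) with a Retry-After hint."""


class PriorityShedder:
    """Concurrency-based load shedding that rejects low-priority work first."""

    def __init__(self, max_in_flight: int, admit_fraction: Dict[str, float]) -> None:
        self.max_in_flight = max_in_flight
        self.admit_fraction = admit_fraction   # share of capacity each class may use
        self.in_flight = 0
        self._lock = threading.Lock()

    @contextmanager
    def admit(self, priority: str) -> Iterator[None]:
        limit = int(self.max_in_flight * self.admit_fraction[priority])
        with self._lock:
            if self.in_flight >= limit:
                raise Shed(priority)
            self.in_flight += 1
        try:
            yield
        finally:
            with self._lock:
                self.in_flight -= 1


shedder = PriorityShedder(max_in_flight=10, admit_fraction={"critical": 1.0, "optional": 0.5})
with ExitStack() as stack:
    for _ in range(5):
        stack.enter_context(shedder.admit("critical"))   # 5 requests already in flight
    try:
        with shedder.admit("optional"):
            optional_ok = True
    except Shed:
        optional_ok = False   # optional work is shed once the system is half full
    with shedder.admit("critical"):
        critical_ok = True    # critical work still gets in
```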

8.6 Best Practices

define traffic priority classes
keep queues bounded
expose clear client signals such as 429 or 503
combine throttling with backoff guidance for clients
ensure degraded mode is tested before an incident

8.7 Common Mistakes

using queueing for work that users expect immediately
letting retries refill the queue faster than it drains
not distinguishing critical and noncritical traffic
treating throttling only as an edge concern when downstream systems are the real bottleneck

9. Reverse Proxy and Load Balancer

These terms overlap in practice, but they are not identical.

9.1 Reverse Proxy Role

A reverse proxy sits in front of backend servers and receives requests on their behalf.

Clients think they are talking to one endpoint. The proxy decides how to forward requests internally.

Why it exists:

hide internal topology
centralize TLS handling
compress and cache content
enforce some security rules
simplify operational control

Common tools:

NGINX
Envoy
HAProxy
cloud-managed L7 load balancers

9.2 SSL/TLS Termination

TLS termination means the proxy or load balancer decrypts incoming HTTPS traffic.

Benefits:

central certificate management
offload crypto work from backend services
enables L7 inspection for routing and policy

Tradeoff:

internal traffic must still be protected appropriately
if you terminate at the edge and send plaintext internally, the trust boundary moves inward

Many production systems re-encrypt internally or use mTLS on internal hops.

9.3 Caching

Reverse proxies can cache static assets and some API responses.

Why it matters:

lower origin load
lower latency
better resilience during backend spikes

But be careful with:

personalized responses
auth-dependent content
stale data after updates

9.4 Compression

Compression reduces payload size for responses such as JSON, HTML, CSS, and JS.

Benefits:

lower bandwidth
faster transfers for text-heavy payloads

Tradeoff:

CPU overhead
not useful for already compressed formats such as many images or zipped binaries

9.5 WAF Basics

A Web Application Firewall applies security rules to incoming requests.

It commonly helps with:

blocking obviously malicious payloads
filtering known exploit patterns
enforcing IP reputation rules
reducing bot and abuse traffic

Important nuance: a WAF is helpful, but it is not a substitute for secure application code and proper validation.

9.6 CDN Relationship

A CDN is often the outermost layer, especially for global systems.

Typical order:

CDN
WAF or reverse proxy
API gateway
internal load balancing and service routing

CDNs are best for:

static assets
edge caching
some globally cacheable API responses
DDoS absorption and edge presence

9.7 Reverse Proxy vs API Gateway

A reverse proxy is often lower-level and more generic.

An API gateway usually adds richer API-specific behavior such as:

auth policies
API keys
per-route rate limits
response transformation
version-aware routing

In practice, the same product may serve both roles.

10. L4 vs L7

This comparison appears constantly in interviews.

10.1 The Basic Difference

L4 operates at the transport layer, mainly IP, TCP, and UDP information.
L7 operates at the application layer, understanding HTTP methods, paths, headers, cookies, and sometimes message semantics.

10.2 Comparison Table

Dimension	L4	L7
Visibility	IP, port, protocol, connection metadata	URL path, headers, host, cookies, method, status
Per-request overhead	generally lower	generally higher due to parsing and richer policy
Routing options	by IP and port	by path, host, header, content-type, user, version
TLS handling	can pass through TLS	often terminates TLS to inspect HTTP
Use cases	very high throughput transport balancing, TCP services, simple load distribution	APIs, canary routing, auth, rate limits, response transforms
Observability	coarse	rich request-aware observability
Examples	AWS NLB, IPVS-style balancing, transport proxies	AWS ALB, Envoy, NGINX, Kong

10.3 Performance Tradeoffs

Why choose L4:

lower per-request overhead
works for non-HTTP protocols
simpler fast-path routing

Why choose L7:

can make smarter decisions
can enforce API policy
can support sophisticated routing and deployment patterns

Interview answer pattern:

"I would use L7 where I need HTTP-aware routing, auth, canarying, or response handling. I would prefer L4 for simpler, high-throughput transport balancing or protocols where application parsing is unnecessary."

11. Health Checks

Health checks determine whether an instance should receive traffic or be restarted.

11.1 Liveness

Liveness asks: is the process alive at all?

If liveness fails, the platform may restart the container or instance.

Best practice: keep liveness simple. It should detect deadlock or fatal stuck states, not depend on every external dependency.

11.2 Readiness

Readiness asks: is this instance ready to serve traffic right now?

If readiness fails, the instance should stop receiving traffic, but it does not necessarily need a restart.

Examples of not-ready:

startup still in progress
critical dependency unavailable
cache warmup incomplete if that makes service unusable
instance is draining for deployment

11.3 Startup Probes

Startup probes exist because some applications take time to initialize.

Without startup-aware logic, a slow boot may be misclassified as a dead process and restarted repeatedly.

11.4 Dependency Health

This is subtle.

Should readiness depend on downstream dependencies?

Answer: only on truly critical dependencies.

If a noncritical dependency fails and the service can degrade gracefully, readiness should often stay healthy. Otherwise you risk removing all instances from service for a partial dependency issue.

11.5 Kubernetes Relevance

In Kubernetes:

liveness probe failure can restart a pod
readiness probe failure removes the pod from service endpoints
startup probe delays liveness/readiness enforcement until boot completes

This is why bad probes can cause cascading production pain.

11.6 Probe Comparison

Probe	Purpose	Good use	Common mistake
Liveness	detect dead or stuck process	deadlock or irrecoverable internal failure	checking database and causing restart storms
Readiness	decide whether to receive traffic	dependency-aware traffic gating	marking ready too early
Startup	allow slow initialization	JVM warmup, cache preload, large bootstraps	omitting it for slow-start services

11.7 Best Practices

keep liveness shallow
make readiness meaningful
distinguish recoverable dependency issues from fatal ones
support graceful shutdown by failing readiness first, then draining connections
test probe behavior during deployments and partial outages
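Here is a framework-free sketch of the three probes. It assumes each handler's `(status, message)` tuple is served from its own HTTP endpoint, and that those endpoints are wired to the Kubernetes `livenessProbe`, `readinessProbe`, and `startupProbe` settings. `HealthState`, the dependency checks, and the heartbeat flag are hypothetical. Liveness never calls dependencies, and draining fails readiness before the process stops.

```python
import threading
from typing import Callable, Dict, Tuple

Check = Callable[[], bool]


class HealthState:
    def __init__(self, critical: Dict[str, Check], optional: Dict[str, Check]) -> None:
        self.critical = critical
        self.optional = optional
        self.started = threading.Event()    # set once initialization completes
        self.draining = threading.Event()   # set at the start of graceful shutdown
        self.heartbeat_ok = True            # hypothetical: updated by a watchdog on the main loop

    def liveness(self) -> Tuple[int, str]:
        # Shallow on purpose: a database outage must not cause restart storms.
        return (200, "alive") if self.heartbeat_ok else (500, "stuck")

    def startup(self) -> Tuple[int, str]:
        return (200, "started") if self.started.is_set() else (503, "starting")

    def readiness(self) -> Tuple[int, str]:
        if not self.started.is_set():
            return 503, "starting"
        if self.draining.is_set():
            return 503, "draining"          # stop new traffic before connections close
        down = [name for name, check in self.critical.items() if not check()]
        if down:
            return 503, "critical dependency down: " + ",".join(down)
        degraded = [name for name, check in self.optional.items() if not check()]
        status = "degraded: " + ",".join(degraded) if degraded else "ready"
        return 200, status                  # noncritical failures keep the instance in service


db_up = {"ok": True}
health = HealthState(critical={"db": lambda: db_up["ok"]},
                     optional={"recommendations": lambda: False})
health.started.set()
before = health.readiness()   # (200, "degraded: recommendations")
db_up["ok"] = False
during = health.readiness()   # (503, "critical dependency down: db")
alive = health.liveness()     # (200, "alive"): no restart for a dependency outage
```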

12. Failover

Failover is the process of moving traffic or responsibility from a failing component to a healthy one.

12.1 Why It Matters

Failure is not exceptional in distributed systems. Machines fail, networks partition, regions degrade, and deployments go wrong.

Failover is how the system continues serving despite that reality.

12.2 Active-Passive Failover

One environment serves traffic. Another waits in standby.

Variants:

cold standby: mostly offline until needed
warm standby: partially provisioned
hot standby: ready to serve immediately

Benefits:

easier reasoning about writes
lower coordination complexity

Costs:

potentially slower failover
wasted standby capacity

12.3 Active-Active Failover

Multiple regions or clusters actively serve traffic.

Benefits:

lower latency for global users
fast failover because traffic is already live elsewhere
better capacity utilization

Challenges:

data consistency
write coordination
duplicate processing risk
routing users to the correct regional data

12.4 Regional Failover

Regional failover usually means a global traffic manager stops sending requests to a bad region and shifts them elsewhere.

flowchart LR
	User[Client Traffic] --> GTM[Global Traffic Manager]
	GTM -->|Primary healthy| R1[Primary Region]
	GTM -->|Standby or overflow| R2[Secondary Region]
	R1 -. region unhealthy .-> GTM
	GTM -->|Failover| R2

Hard parts:

DNS caches may slow failover
stateful sessions may not exist in the secondary region
databases may lag if replication is asynchronous
legal or data residency rules may limit where data can fail over

12.5 Database Failover Basics

This is where request-handling discussions must connect to the data layer.

Stateless compute failover is relatively straightforward.

Database failover is much harder because of:

replication lag
split-brain risk
leader election complexity
transaction durability guarantees
write fencing and stale primary protection

Common patterns:

primary-replica with leader promotion
managed database multi-AZ failover
read replicas for scale, primary for writes
careful multi-region replication for stricter availability needs
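Write fencing is easiest to see with fencing tokens. Every promotion gets a strictly larger number, and the storage layer refuses writes that carry a number older than the newest it has seen. This is a sketch of that technique, not of any particular database's failover. `Election` stands in for whatever consensus or lock service issues the tokens, and both classes are hypothetical.

```python
import threading


class StaleLeaderError(Exception):
    """A write arrived from a primary that has since been replaced."""


class Election:
    """Hypothetical stand-in for a consensus or lock service issuing fencing tokens."""

    def __init__(self) -> None:
        self.epoch = 0

    def promote(self) -> int:
        self.epoch += 1                 # strictly increasing on every promotion
        return self.epoch


class FencedStore:
    """Storage that refuses writes carrying a token older than the newest seen."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.highest_token = 0
        self.data: dict = {}

    def write(self, token: int, key: str, value: str) -> None:
        with self._lock:
            if token < self.highest_token:
                raise StaleLeaderError(f"token {token} is older than {self.highest_token}")
            self.highest_token = token
            self.data[key] = value


store, election = FencedStore(), Election()
old_primary = election.promote()                    # token 1
store.write(old_primary, "balance:42", "100")
new_primary = election.promote()                    # token 2 after failover
store.write(new_primary, "balance:42", "90")
try:
    store.write(old_primary, "balance:42", "100")   # old primary resumes after a long pause
    rejected = False
except StaleLeaderError:
    rejected = True                                 # the stale write is fenced off
```

The check has to live in the component that actually persists the data. A primary checking its own leadership before writing cannot protect against a pause that happens between the check and the write.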

12.6 RTO and RPO

These two terms matter in interviews.

RTO: Recovery Time Objective. How quickly must the system recover?
RPO: Recovery Point Objective. How much data loss is acceptable?

Examples:

a chat notification system may tolerate some message delay and nonzero RPO
a payment ledger system usually needs very low RPO and strict correctness

Lower RTO and lower RPO generally increase cost and complexity.

12.7 Best Practices

design failover with data consistency in mind, not just traffic movement
test failover regularly, not just in slide decks
make failover automation observable and reversible
drain traffic from degraded zones before full failure when possible
separate control-plane failure from data-plane failure in your reasoning

12.8 Common Mistakes

assuming DNS failover is instant
forgetting session or cache locality during regional failover
promoting replicas without considering lag or split-brain protection
calling a system active-active when only the stateless tier is active-active

13. Sticky Sessions

Sticky sessions mean requests from the same client are repeatedly routed to the same backend instance.

13.1 Why They Exist

They are often used when session state is stored in process memory on the server.

Examples:

legacy web sessions
websocket affinity
in-memory shopping cart state in older systems

13.2 When They Help

short-term compatibility during migration away from stateful servers
workloads where per-connection state is expensive to rebuild
some real-time protocols that prefer affinity

13.3 Why They Are Often Avoided

Modern backend design prefers stateless services because stateless services:

scale horizontally more easily
recover from failure more cleanly
tolerate rebalancing better
simplify failover and deployment

Sticky sessions hurt these properties.

Problems they create:

uneven load distribution
poor failover if the chosen instance dies
harder autoscaling
harder cross-region portability

13.4 Better Alternatives

external session store such as Redis
signed or encrypted stateless tokens where appropriate
shared caches for session metadata
move connection state into durable or replicated infrastructure when necessary

13.5 Common Interview Answer

If asked about sticky sessions, a strong answer is:

"I would avoid them unless the workload truly needs affinity. In general I prefer stateless services and store session state externally so the load balancer can route to any healthy instance."

14. How These Pieces Fit Together in Real Architecture

This is the part many interview answers miss. These systems are not isolated topics. They work together as one request-handling pipeline.

14.1 Typical SaaS API Architecture

flowchart TD
	Client[Client / Partner API Consumer] --> CDN[CDN]
	CDN --> Edge[WAF / Reverse Proxy]
	Edge --> Gateway[API Gateway]
	Gateway --> Limits[Auth / Versioning / Rate Limit / Validation]
	Limits --> LB[L7 Routing + Load Balancing]
	LB --> App1[Service Instance Group A]
	LB --> App2[Service Instance Group B]
	App1 --> Cache[(Redis / Cache)]
	App1 --> DB[(Primary DB + Replicas)]
	App2 --> Queue[(Async Queue)]
	Gateway -. traces .-> Obs[Logs / Metrics / Tracing]
	App1 -. traces .-> Obs
	App2 -. traces .-> Obs
	Gateway -. service discovery .-> Registry[Service Registry / Orchestrator]

14.2 End-to-End Example: Payment API

Consider POST /payments in a Stripe-like or SaaS billing system.

Client sends HTTPS request with auth token and idempotency key.
CDN or edge forwards dynamic request to reverse proxy.
Gateway terminates TLS, checks auth, attaches trace IDs, and applies per-client rate limit.
Gateway validates request shape and routes to payment service.
Load balancer picks a healthy instance.
Payment service checks business rules and idempotency store.
Service writes to database transactionally and calls external payment processor if needed.
Response returns through gateway, which logs metadata and surfaces normalized errors.

This one request involves:

authentication
rate limiting
validation
routing
load balancing
idempotency
observability
external failure handling

14.3 End-to-End Example: Global Consumer App

Consider a Netflix-like or Uber-like global product.

Global traffic manager chooses a region.
Edge or gateway enforces auth and traffic policy.
Router sends request to correct service version, maybe with canary rules.
Service mesh or internal load balancing handles service-to-service calls.
Read requests may hit regional caches first.
If one dependency is degraded, the system may throttle optional features and preserve the core experience.

14.4 Layer-to-Concern Mapping

Layer	Main concerns
CDN and edge	caching, DDoS absorption, global reach
Reverse proxy or WAF	TLS termination, request filtering, compression
API gateway	auth, rate limiting, versioning, route policy, observability
L7 routing and load balancing	path and header-aware routing, canary, health-aware distribution
Service layer	business validation, idempotency, authorization on domain objects
Data layer	constraints, consistency, replication, failover
Observability platform	logs, metrics, traces, alerting

15. Real-World Discussion Patterns

15.1 Google-Style or Amazon-Scale Thinking

At very large scale, request handling is strongly influenced by geography and fleet management.

The important patterns are:

global traffic steering
regional isolation
heavy automation around health and rollout
multi-layer balancing rather than one magical balancer

15.2 Netflix-Style Thinking

A company deploying frequently cares deeply about:

canary releases
traffic shaping
resilience under partial failure
observability and fast rollback

15.3 Uber-Style Thinking

Systems tied to geography or real-time state often care about:

geo-aware routing
regional capacity balancing
latency sensitivity
selective degradation during spikes

15.4 GitHub-Style and Stripe-Style Thinking

Public API platforms care deeply about:

stable API contracts
client-visible rate limiting
request signing and webhook verification
versioning discipline
auditability and correctness

15.5 Typical SaaS Thinking

SaaS platforms often need a combination of:

tenant-aware routing
per-tenant quotas and rate limits
centralized auth and observability
low operational complexity relative to global hyperscale systems

16. Common Interview Questions and Strong Angles

16.1 "Where would you put rate limiting?"

Strong answer:

"Mostly at the gateway or edge for cheap rejection, but I may also add service-level limits for expensive operations or tenant-specific protections."

16.2 "How do you avoid duplicate writes when retries happen?"

Strong answer:

"Use idempotency keys or natural idempotency where possible, persist enough request identity to detect duplicates, and only retry safely repeatable operations."

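Here is "only retry safely repeatable operations" expressed in code: bounded attempts, exponential backoff with full jitter, and no automatic retries for calls not marked idempotent. `retry` and `RetryableError` are hypothetical names. Which errors count as transient is an assumption you would tune per dependency. A non-idempotent write can become safe to retry by sending the same idempotency key with every attempt.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Transient failure such as a timeout or 503 (classification is an assumption)."""


def retry(call: Callable[[], T], *, idempotent: bool, max_attempts: int = 4,
          base_delay_s: float = 0.1, max_delay_s: float = 2.0,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Bounded retries with exponential backoff and full jitter."""
    attempts = max(1, max_attempts) if idempotent else 1   # never blindly repeat unsafe calls
    for attempt in range(attempts):
        try:
            return call()
        except RetryableError:
            if attempt == attempts - 1:
                raise                                       # budget exhausted: surface the error
            cap = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(random.uniform(0, cap))                   # jitter de-synchronizes clients
    raise AssertionError("unreachable")


calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RetryableError("timeout")
    return "ok"


delays: List[float] = []
result = retry(flaky, idempotent=True, sleep=delays.append)   # succeeds on the 3rd attempt
```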
16.3 "When would you choose L4 vs L7?"

Strong answer:

"L7 when I need HTTP-aware routing and policies like auth, canary, or versioning. L4 when I need simpler, high-throughput transport balancing or I do not need application-layer inspection."

16.4 "How would you do a safe deployment?"

Strong answer:

"Use health checks plus canary or blue-green routing, monitor business and technical metrics, and ensure rollback is fast."

16.5 "What happens if the rate limiter or discovery service goes down?"

Strong answer:

"I need a failure policy. For some endpoints I fail open to preserve availability; for abuse-sensitive endpoints I may fail closed. For discovery, I keep short-lived cached endpoint data and remove unhealthy instances quickly, but I do not rely on stale data too long."

16.6 "Why avoid sticky sessions?"

Strong answer:

"Because they make scaling, failover, and even load distribution harder. Stateless services are easier to operate and recover."

17. Common Mistakes Across the Whole Topic

describing only the happy path and ignoring failure behavior
saying "use a load balancer" without specifying which type or why
retrying everything blindly
forgetting idempotency on write APIs
conflating authentication with authorization
overusing the gateway as a business-logic layer
assuming health checks are trivial
assuming DNS-based failover is immediate and sufficient
using sticky sessions to avoid fixing state management
forgetting observability at the edge and routing layers

18. Practical Best Practices Checklist

terminate or manage TLS deliberately; do not let trust boundaries be accidental
reject bad or abusive traffic as early as possible
keep services stateless when you can
make retries explicit, bounded, and idempotency-aware
use readiness checks to gate traffic and liveness checks to recover dead processes
monitor p95 and p99 latency, not just averages
make rollout and failover mechanisms observable
keep routing policy simple enough to debug under pressure
treat API versioning as a product and operational discipline, not just a URL pattern
test degraded modes, not just normal operation
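Here is a small worked example of why averages mislead. It uses the nearest-rank percentile definition, which is one of several common definitions; monitoring systems often estimate percentiles from histograms instead. The latency numbers are made up: 98 fast requests and 2 very slow ones.

```python
from typing import List


def percentile(samples: List[float], p: int) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, -(-p * len(ordered) // 100))   # ceil(p * n / 100) using integer math
    return ordered[rank - 1]


latencies_ms = [10.0] * 98 + [2000.0] * 2
mean_ms = sum(latencies_ms) / len(latencies_ms)   # 49.8: looks healthy
p95_ms = percentile(latencies_ms, 95)             # 10.0
p99_ms = percentile(latencies_ms, 99)             # 2000.0: the tail users actually feel
```

Two slow requests out of a hundred barely move the mean and do not show up at p95, but at p99 they dominate the result. At high traffic, 1 percent of requests is still a lot of users.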

19. Final Mental Model

The cleanest way to think about request handling is this:

Request handling is the system that protects scarce resources while getting the right request to the right code, at the right time, under the right policy, even when parts of the system are failing.

If you can explain request handling from that perspective, you will do well in interviews and you will design more production-ready backend systems.

