tarun-elango 26810e43d0 sd text
2026-04-26 13:27:19 -04:00

Search & Discovery + Media & File Systems

This chapter covers two families of systems that appear in almost every large product:

  • systems that decide what users can find
  • systems that store, transform, and deliver large binary data such as images, videos, documents, logs, and backups

In interviews, these topics are often split into separate questions: "Design search for an e-commerce app", "Design Instagram feed", "Design file upload for a SaaS product", or "Design YouTube video processing".

In production, they are deeply connected.

An e-commerce product page may depend on:

  • a transactional database for product metadata
  • a search index for keyword retrieval
  • a ranking system for relevance and business rules
  • a recommendation system for discovery
  • an object store for images and video
  • a CDN for fast delivery
  • an async media pipeline for thumbnails and optimization

So the real engineering question is not, "What is search?" or "What is S3?". The real question is:

How do these systems work together under scale, latency pressure, stale data, changing ranking logic, unreliable networks, expensive media processing, and strict security requirements?

This guide is written for both interview preparation and real backend engineering. The goal is not to memorize terms. The goal is to understand:

  • why each system exists
  • what problem it solves that simpler systems do not
  • how it works internally
  • what fails at scale
  • what tradeoffs strong engineers discuss in interviews
  • how companies actually combine these components in production

Examples in this guide are generalized from common patterns publicly discussed by companies such as Google, Amazon, Netflix, Uber, YouTube, Instagram, TikTok, GitHub, Stripe, and large SaaS platforms.

1. Big Picture: Why These Topics Belong Together

Search, discovery, and media systems all sit on the user-facing edge of backend engineering.

They are the systems users feel immediately:

  • search that returns the wrong result feels broken
  • autocomplete that lags feels cheap
  • feeds that repeat stale content feel low quality
  • uploads that fail at 95 percent feel unreliable
  • videos that buffer or thumbnails that look wrong feel unfinished

These systems also force tradeoffs faster than many internal systems:

  • relevance vs latency
  • freshness vs throughput
  • quality vs compute cost
  • personalization vs privacy
  • durability vs storage cost
  • precomputation vs flexibility

1.1 One Product, Many Subsystems

flowchart LR
	U[User] --> APP[API / App Backend]
	APP --> DB[(Primary DB)]
	DB --> CDC[CDC / Change Events]
	CDC --> SEARCH[Search Index]
	CDC --> FEAT[Feature / Event Pipeline]
	APP --> REC[Recommendation / Feed Service]
	FEAT --> REC
	REC --> CACHE[(Feed Cache)]
	APP --> OBJ[(Object Storage)]
	OBJ --> MEDIA[Media Processing Pipeline]
	MEDIA --> CDN[CDN]
	APP --> CDN
	SEARCH --> APP
	CACHE --> APP

This architecture is common across many products:

  • Amazon-like commerce: product DB, search index, ranking, recommendations, images in object storage
  • GitHub-like SaaS: repo and issue metadata in DB, code or issue search index, attachments in object storage, permissions filtering everywhere
  • YouTube or TikTok: metadata DB, feed ranking system, object storage, transcoding pipeline, CDN delivery
  • Stripe-like internal document systems: metadata DB, audit logs, secure file storage, signed downloads, retention policies

1.2 Search vs Discovery

These ideas are related but not identical.

| Dimension | Search | Discovery / Recommendation |
| --- | --- | --- |
| User intent | Explicit | Often implicit |
| Input | Query, filters, sort | User profile, session, context, behavior |
| Goal | Find what user asked for | Show what user is likely to want |
| Retrieval basis | Query-document match | Candidate generation from many signals |
| Example | "wireless headphones" | "Products you may like" |

Search is usually intent retrieval. Recommendation is usually intent inference.

The best products use both.

1.3 Latency Expectations

Users tolerate different delays for different surfaces.

| Surface | Typical expectation | Why it matters |
| --- | --- | --- |
| Autocomplete | Often less than 50 ms server-side, low hundreds of ms end-to-end | Typing feels broken if suggestions lag |
| Search results | Often around 100-300 ms for the first result page | Search is interactive and abandonment is high |
| Home feed | Often low hundreds of ms for first page, with prefetching and caching | Users expect quick app open |
| File upload initiation | Usually should start immediately | Perceived responsiveness matters |
| Image delivery | Often tens of ms from edge | Visual surfaces must render fast |
| Video playback startup | Usually a few hundred ms to a few seconds depending on network and buffer policy | Startup delay strongly affects engagement |

The exact target depends on product and network conditions, but the theme is consistent: these are latency-sensitive systems.

1.4 Interview Framing

When interviewers ask about search, feeds, or media, they are usually evaluating whether you can reason about:

  • data flow from source of truth to user-facing surface
  • specialized indexes or storage layouts
  • online vs offline computation
  • scale bottlenecks and hotspots
  • correctness and security boundaries
  • degradation behavior when one subsystem is stale or partially unavailable

Strong answers start from user behavior and workload shape, not from product names.

2. Search System

2.1 What a Search System Is

A search system is a retrieval system that helps users find relevant documents, records, products, posts, issues, places, or media based on a query.

The important word is relevant.

Databases already know how to retrieve data, so why do search systems exist?

Because most user-facing search is not exact lookup. It is approximate, text-heavy, fuzzy, relevance-ordered retrieval.

Examples:

  • a user types "noise cancel headphone" and expects products matching "noise-canceling headphones"
  • a developer searches GitHub issues for a phrase and expects typo tolerance, ranking, and permission-safe results
  • a job seeker searches roles by title, seniority, remote status, and location
  • a rider searches for a place in Uber and expects prefix matching, geospatial awareness, and ranking by context

2.2 Why Databases Alone Are Often Not Enough

Relational databases and standard secondary indexes are optimized for exact lookups, range scans, joins, and transactional workloads. They are not primarily optimized for large-scale full-text ranking.

If you try to implement serious search with only a transactional database, you quickly run into problems:

  • LIKE '%term%' queries do not scale well for large text corpora
  • phrase search and ranking are limited or expensive
  • stemming, synonyms, language analyzers, and typo tolerance are not first-class features in many OLTP systems
  • scoring millions of candidate documents with relevance functions is not what OLTP engines are optimized for
  • query patterns are highly varied and difficult to serve with normal B-tree indexes alone

2.3 Full-Text Search vs Exact Lookup

| Dimension | Exact DB Lookup | Full-Text Search |
| --- | --- | --- |
| Match type | Exact key, range, prefix in some cases | Token-based, phrase-based, fuzzy, semantic, ranked |
| Result ordering | Usually explicit sort order | Usually relevance first |
| Storage layout | Row-oriented or index-oriented for structured fields | Inverted indexes, postings, specialized scoring metadata |
| Common use | User by ID, order by timestamp | Search products, documents, issues, articles |
| Optimization target | Transactional correctness and predictable queries | Fast retrieval and ranking over large corpora |

Interview shortcut: databases answer "Which rows satisfy these predicates?" Search systems answer "Which documents are most relevant to what the user probably meant?"

2.4 Architecture of a Search System

At a high level, production search systems split into two pipelines:

  • indexing pipeline: turns source data into searchable indexes
  • query pipeline: turns a user query into ranked results

flowchart LR
	SRC[Source of Truth<br/>DB / CMS / Event Log] --> ING[Ingestion]
	ING --> IDX[Index Build / Update]
	IDX --> SHARDS[Search Shards + Replicas]
	Q[User Query] --> API[Search API]
	API --> PARSE[Query Parsing / Rewrite]
	PARSE --> COORD[Coordinator]
	COORD --> SHARDS
	SHARDS --> RANK[Merge + Rank]
	RANK --> RES[Results]

The source of truth is usually not the search index itself. It is usually:

  • a relational DB
  • a document database
  • a content management system
  • a stream of events
  • a crawler pipeline in web search

The search index is a serving structure optimized for retrieval, not necessarily for canonical storage.

2.5 How Production Search Differs from Normal Database Queries

Production search systems usually have to solve problems that ordinary OLTP queries do not:

  • tokenization and normalization
  • ranking by multiple signals
  • approximate matching
  • synonym expansion
  • language-aware analysis
  • ACL or permission filtering
  • faceting and filtering at scale
  • scatter-gather across many shards
  • near-real-time indexing with eventual consistency
  • degraded behavior under partial shard failures

That is why many production systems use specialized engines such as Elasticsearch, OpenSearch, Solr, Vespa, Lucene-based services, or internal retrieval systems.

2.6 Query Pipeline

A realistic query pipeline often includes more than keyword matching.

flowchart LR
	Q[Raw Query] --> CLEAN[Normalize / Spell / Parse]
	CLEAN --> REWRITE[Synonyms / Query Rewrite / Intent Detection]
	REWRITE --> RET[Retrieval]
	RET --> FILT[Apply Filters]
	FILT --> LR[Lightweight Ranking]
	LR --> HR[Heavy Ranking / Business Rules]
	HR --> ACL[Permission Check / Result Shaping]
	ACL --> OUT[Final Results]

Important steps:

  • query normalization: lowercasing, punctuation handling, Unicode normalization
  • parser: interpret phrases, field-specific search, boolean operators, quoted strings
  • query rewrite: expand synonyms, fix spelling, map common variants
  • retrieval: find candidate documents quickly from the index
  • filtering: apply structured constraints such as category, location, price, permissions
  • ranking: score the remaining candidates using lexical, behavioral, freshness, popularity, and quality signals

2.7 Distributed Search Basics

Large search systems cannot keep all searchable data on one machine. So the index is partitioned into shards.

Common pattern:

  1. documents are assigned to shards
  2. each shard stores a local index
  3. replicas provide availability and read capacity
  4. a query coordinator fans the request out to relevant shards
  5. each shard returns its top-k candidates
  6. the coordinator merges them into the global top-k

This is called scatter-gather.
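
The steps above can be sketched in a few lines. This is a minimal in-process illustration, not how any particular engine implements it: the shards are plain dictionaries, the score is a toy term-frequency count, and the names `shard_top_k` and `scatter_gather` are made up for this example.

```python
import heapq

def shard_top_k(shard_docs, query_terms, k):
    """Score one shard's documents locally and return its top-k (score, doc_id) pairs."""
    scored = []
    for doc_id, text in shard_docs.items():
        tokens = text.lower().split()
        # Toy relevance score (assumption): total occurrences of the query terms.
        score = sum(tokens.count(term) for term in query_terms)
        if score > 0:
            scored.append((score, doc_id))
    return heapq.nlargest(k, scored)

def scatter_gather(shards, query, k):
    """Fan the query out to every shard, then merge local top-k lists into a global top-k."""
    query_terms = query.lower().split()
    partials = [shard_top_k(shard, query_terms, k) for shard in shards]   # scatter
    merged = heapq.nlargest(k, (hit for part in partials for hit in part))  # gather
    return [doc_id for _, doc_id in merged]

shards = [
    {"d1": "wireless noise cancelling headphones", "d2": "wired gaming headset"},
    {"d3": "wireless earbuds with case", "d4": "wireless headphones wireless charging"},
]
top = scatter_gather(shards, "wireless headphones", k=2)
```

Each shard returns at most k hits, so the coordinator merges at most k times the number of shards, regardless of corpus size. The merge is only meaningful because every shard uses the same scoring function, which is the score comparability concern listed below.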

Challenges:

  • tail latency: the whole query waits for slow shards unless timeouts or degraded modes exist
  • score comparability: local shard scores may need consistent scoring logic so global merge is meaningful
  • hotspot shards: skewed data or popular terms can overload specific shards
  • rebalancing: adding capacity requires moving large index segments

2.8 Freshness vs Performance

Search freshness means how quickly updates in the source system appear in search results.

Users expect different freshness depending on domain:

  • social posts or breaking news: often seconds or near-real-time
  • product inventory and pricing: usually needs to be very fresh because stale search hurts conversion
  • document search in SaaS: usually seconds to minutes is acceptable depending on UX promises
  • code search on a giant corpus: a modest indexing delay is sometimes acceptable if retrieval is fast and reliable

The tradeoff is that very fresh indexing increases write pressure and may reduce query efficiency.

Common tension:

  • frequent small segment updates improve freshness
  • large optimized segments improve query performance and compression

Many systems compromise with near-real-time indexing: documents become searchable quickly, while expensive segment merges happen asynchronously.

2.9 Consistency Challenges

Search is usually eventually consistent with the source of truth.

Typical failure cases:

  • DB write succeeded but indexing event was delayed
  • product deleted in DB still appears in search for a short period
  • permission change not reflected immediately, risking data leakage if ACL filtering is wrong
  • inventory count in search is stale while checkout uses the DB

Best practice:

  • treat the DB or source system as the correctness authority
  • do not let search be the final source for money, inventory reservation, or permissions
  • apply final correctness checks in downstream business logic when it matters

GitHub-like systems care deeply about permission-safe search. Returning a private issue, repo, or file in search is worse than returning no result.

2.10 Search Latency and Fault Tolerance

Search systems are interactive systems, so they need graceful degradation.

Common patterns:

  • shard replicas for availability
  • coordinator timeouts to avoid waiting forever on a straggler shard
  • partial (degraded) results if every replica of one shard is temporarily unavailable
  • hot query caching for common requests
  • precomputed filter bitsets or caches for expensive constraints
  • monitoring p50, p95, p99 separately because tail latency matters more than averages
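
Coordinator timeouts and partial results can be illustrated with a small `asyncio` sketch. Everything here is hypothetical: `query_shard` simulates a shard call with a sleep, and the shard names and delays are invented to show one straggler being dropped.

```python
import asyncio

async def query_shard(delay_s, hits):
    """Stand-in for a network call to one shard (simulated with a sleep)."""
    await asyncio.sleep(delay_s)
    return hits

async def search_with_deadline(shard_calls, timeout_s):
    """Wait for shards up to a deadline, then return whatever arrived in time."""
    tasks = {asyncio.create_task(coro): name for name, coro in shard_calls.items()}
    done, pending = await asyncio.wait(list(tasks), timeout=timeout_s)
    for task in pending:
        task.cancel()  # stop waiting on stragglers
    hits = sorted((hit for task in done for hit in task.result()), reverse=True)
    missing = sorted(tasks[task] for task in pending)
    return hits, missing

async def main():
    calls = {
        "shard-a": query_shard(0.01, [(0.9, "d1")]),
        "shard-b": query_shard(0.01, [(0.7, "d7")]),
        "shard-c": query_shard(1.0, [(0.95, "d9")]),  # straggler
    }
    return await search_with_deadline(calls, timeout_s=0.2)

hits, missing = asyncio.run(main())
```

Returning `missing` alongside the hits lets the API layer decide whether to flag the response as partial, retry against another replica, or serve it as is.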

Interview note: if an interviewer asks about fault tolerance, discuss replicas, partial results, timeouts, retry behavior, and stale indexes. "We have backups" is not the answer for serving systems.

2.11 Common Mistakes

  • treating search as just another SQL query layer
  • making the search index the source of truth for critical writes
  • ignoring permission filtering until late in the design
  • underestimating reindexing cost after analyzer changes
  • focusing only on retrieval and forgetting ranking quality

3. Indexing

3.1 Why Search Indexing Exists

Search indexing exists because scanning every document for every query is too slow.

The core idea is preprocessing.

Instead of asking, "Which documents contain this term?" by reading the whole corpus repeatedly, the system builds a data structure ahead of time that answers that question quickly.

That preprocessing step is indexing.

3.2 Document Ingestion Pipeline

Indexing is usually a pipeline, not a single write.

flowchart LR
	DB[DB / Source Records] --> EVT[CDC / Event Stream / Batch Export]
	EVT --> EXTRACT[Extract Fields]
	EXTRACT --> ANALYZE[Tokenize / Normalize / Language Analysis]
	ANALYZE --> ENRICH[Synonyms / ACLs / Metadata / Quality Signals]
	ENRICH --> BUILD[Build or Update Index Segments]
	BUILD --> REPL[Replicate / Refresh Search Nodes]
	REPL --> SERVE[Search Serving]

Documents often need field-specific handling:

  • title may get higher weight
  • tags may be exact or lightly analyzed
  • description may use full stemming and stop-word removal
  • permissions and category fields may be stored for filtering
  • freshness timestamps may be stored for ranking

3.3 Tokenization

Tokenization breaks text into searchable units called tokens.

Examples:

  • "wireless headphones" -> wireless, headphones
  • "foo-bar" may become foo, bar, or foo-bar depending on analyzer design
  • East Asian languages may require dictionary-based or statistical segmentation rather than whitespace splitting

Why this matters:

What counts as a token determines what can be retrieved.

Poor tokenization causes obvious product bugs:

  • searching "e-mail" fails to match "email"
  • searching code symbols breaks because punctuation handling is wrong
  • searching C++ or C# fails because analyzers stripped important characters

Production systems often use different analyzers for different fields and languages.
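
A small sketch shows how much the analyzer choice matters. The two functions below are illustrative only (the names and regular expressions are assumptions, not any engine's built-in analyzers): one splits on every non-alphanumeric character, the other keeps `+` and `#` so programming-language names survive.

```python
import re

def standard_tokens(text):
    """Prose-style analyzer: split on anything that is not a letter or digit."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]

def code_tokens(text):
    """Code-friendly analyzer: keep '+', '#', and '_' inside tokens."""
    return re.findall(r"[a-z0-9_+#]+", text.lower())

prose = standard_tokens("C++ and C# e-mail")
code = code_tokens("C++ and C# e-mail")
```

With the first analyzer, `C++` and `C#` both collapse to the token `c`, which is exactly the bug described above. Neither analyzer makes `e-mail` match `email`; that usually takes a synonym rule or an extra token filter.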

3.4 Normalization

Normalization makes equivalent text forms consistent.

Common steps:

  • lowercase conversion
  • Unicode normalization
  • accent folding in some products
  • punctuation normalization
  • whitespace collapsing

Without normalization, simple variants become separate search terms and retrieval quality drops.
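
As a rough sketch, most of these steps fit in one function built on Python's standard `unicodedata` module. The function name and the `fold_accents` flag are made up for this example, and punctuation normalization is left out to keep it short.

```python
import unicodedata

def normalize(text, fold_accents=True):
    """Apply Unicode normalization, case folding, optional accent folding, whitespace collapsing."""
    # NFKC maps compatibility forms (for example fullwidth letters) to canonical ones.
    text = unicodedata.normalize("NFKC", text).casefold()
    if fold_accents:
        # Decompose accented letters, then drop the combining marks.
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(text.split())

folded = normalize("  Caf\u00e9   CR\u00c8ME ")
kept = normalize("Caf\u00e9", fold_accents=False)
```

The same function must run at index time and at query time. If the two sides normalize differently, documents become unreachable even though they are in the index.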

3.5 Stemming and Lemmatization

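Stemming reduces words to a base form, usually by stripping suffixes, so related terms match. Lemmatization reaches a similar result through vocabulary and grammar rules, which costs more but handles irregular forms.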

Examples:

  • running and runs usually reduce to run; an irregular form such as ran generally needs lemmatization rather than suffix stripping
  • connect, connected, connection may become more retrievable together

Why it exists:

Users usually care about concept matching, not exact inflected forms.

Tradeoff:

  • aggressive stemming increases recall
  • too much stemming can hurt precision by conflating different meanings

Not every domain wants stemming. Code search, SKU search, names, and legal text often need more exact handling.

3.6 Stop Words

Stop words are very common words such as "the", "a", or "of" that may add little value to retrieval.

Why they are sometimes removed:

  • they occur in many documents
  • they increase index size
  • they often do not help ranking

Why they are sometimes kept:

  • phrase search needs them
  • some queries depend on them
  • domain-specific language may make them important

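Example: the query "to be or not to be" consists almost entirely of stop words, so removing them leaves almost nothing to match. Song and film titles often have the same problem.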

3.7 Synonyms

Synonyms allow related terms to match the same concept.

Examples:

  • tv and television
  • hoodie and sweatshirt
  • software engineer and developer in some job systems
  • nyc and new york city

Synonyms are powerful and dangerous.

They improve recall, but bad synonym rules can create surprising results. Naively expanding apple to both the fruit and the company is an obvious relevance bug.

Production systems usually treat synonyms as curated domain knowledge, not as a casual text feature.
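
One simple way to apply a curated synonym list is query-time expansion: each query token becomes a set of acceptable alternatives. The dictionary contents and the `expand_query` helper below are illustrative assumptions; real systems also choose between index-time and query-time expansion, which this sketch does not cover.

```python
# Curated, explicitly directional synonym rules (example data).
SYNONYMS = {
    "tv": {"television"},
    "television": {"tv"},
    "hoodie": {"sweatshirt"},
    "nyc": {"new york city"},
}

def expand_query(tokens, synonyms=SYNONYMS):
    """Return one set of alternatives per token: OR within a set, AND across sets."""
    return [{token} | synonyms.get(token, set()) for token in tokens]

expanded = expand_query(["cheap", "tv"])
```

Note that `hoodie` expands to `sweatshirt` but not the other way around. Keeping rules directional is one way to stop a broad term from pulling in every narrower one.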

3.8 Language Handling Basics

Multi-language search is not just translation.

Language handling may include:

  • language detection
  • per-language analyzers
  • script normalization
  • stemming rules per language
  • tokenization strategies for languages without whitespace delimiters
  • query rewriting or synonym dictionaries per locale

Global products such as Google, Amazon, YouTube, and large SaaS tools need language-aware indexing because naive English-centric analysis fails internationally.

3.9 Incremental Indexing

Rebuilding the entire index on every document change is impractical at scale. So production systems do incremental indexing.

Typical process:

  1. source record changes
  2. an event or CDC record is emitted
  3. indexer fetches or receives the updated document
  4. changed fields are reanalyzed
  5. the document is added, updated, or tombstoned in the index

Common challenges:

  • event duplication
  • out-of-order updates
  • delete propagation
  • retries causing duplicate work
  • partial failure between DB write and index update

Idempotent indexing pipelines matter a lot.
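
A common way to get idempotency is to attach a monotonically increasing version to every change event and have the indexer ignore anything that is not newer than what it already holds. The sketch below assumes such a version exists (for example a row version or log sequence number from the source system); the class and event shape are hypothetical.

```python
class VersionedIndex:
    """In-memory stand-in for an index that applies events idempotently."""

    def __init__(self):
        self.entries = {}  # doc_id -> (version, document, or None for a tombstone)

    def apply(self, event):
        """Apply an upsert or delete; return False for duplicate or out-of-order events."""
        doc_id, version = event["id"], event["version"]
        current = self.entries.get(doc_id)
        if current is not None and current[0] >= version:
            return False  # already saw this version or a newer one
        body = None if event["op"] == "delete" else event["doc"]
        self.entries[doc_id] = (version, body)  # deletes leave a versioned tombstone
        return True

    def get(self, doc_id):
        entry = self.entries.get(doc_id)
        return entry[1] if entry else None

index = VersionedIndex()
events = [
    {"id": "p1", "version": 1, "op": "upsert", "doc": {"title": "wired headset"}},
    {"id": "p1", "version": 2, "op": "upsert", "doc": {"title": "wireless headset"}},
    {"id": "p1", "version": 2, "op": "upsert", "doc": {"title": "wireless headset"}},  # duplicate
    {"id": "p1", "version": 1, "op": "upsert", "doc": {"title": "wired headset"}},     # late arrival
    {"id": "p1", "version": 3, "op": "delete"},
    {"id": "p1", "version": 2, "op": "upsert", "doc": {"title": "wireless headset"}},  # replayed after delete
]
applied = [index.apply(event) for event in events]
```

Keeping the tombstone's version is what stops a replayed older upsert from resurrecting a deleted document.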

3.10 Near Real-Time Indexing

Many search engines are near real-time rather than strictly real-time.

That means:

  • writes become searchable after a short delay
  • index refresh is decoupled from durable storage operations
  • background merges optimize segments later

This design keeps query latency reasonable while preserving acceptable freshness.

For products like issue search, product catalogs, and SaaS document search, near-real-time indexing is often the right tradeoff.

3.11 Reindexing Challenges

Reindexing is one of the biggest operational realities in search.

You often need a full reindex when:

  • analyzer rules change
  • synonym logic changes significantly
  • field weights or schema design changes
  • permissions model changes
  • you move to a new index version

Why it is hard:

  • large corpora take time to rebuild
  • dual-running old and new indexes increases cost
  • cutover must avoid downtime and bad ranking regressions
  • stale or missing events during rebuild can leave the new index out of date or inconsistent

Common strategies:

  • build a new index version in parallel
  • backfill from source of truth
  • replay recent events after the backfill window
  • run shadow reads or compare sample queries
  • switch traffic gradually

This is similar to blue-green deployment for search data.

3.12 Database Indexes vs Search Indexes

| Dimension | Database Index | Search Index |
| --- | --- | --- |
| Primary goal | Speed up structured lookups and range queries | Speed up text retrieval and relevance-ranked retrieval |
| Typical structure | B-tree, hash, LSM-related structures | Inverted index, postings, term dictionaries, doc values |
| Query style | Predicates on fields | Query terms, phrases, fuzzy matching, ranking |
| Source of truth role | Often part of the canonical DB | Usually derived from another source of truth |
| Update pattern | Tight coupling with DB writes | Often async or near-real-time |
| Ranking support | Limited compared with search engines | Central purpose of the system |

3.13 Best Practices

  • keep indexing idempotent and replay-safe
  • separate source of truth from serving index
  • version analyzers and schemas explicitly
  • measure freshness lag, not just query latency
  • treat reindexing as a normal operational workflow, not an emergency-only task

4. Inverted Index

4.1 What an Inverted Index Is

An inverted index maps each term to the documents that contain it.

Instead of storing documents and asking, "Which terms are inside this document?" the system stores terms and asks, "Which documents contain this term?"

That inversion is what makes large-scale text retrieval efficient.

4.2 Why It Powers Most Search Systems

Most keyword search systems need to answer queries like:

  • which documents contain wireless
  • which documents contain both wireless and headphones
  • which documents contain the exact phrase noise cancelling

If you have a term-to-document mapping, you can answer these queries much faster than scanning all documents.

4.3 Term -> Document Mapping

A simple example:

Documents:

  • D1: "wireless noise cancelling headphones"
  • D2: "wired gaming headset"
  • D3: "wireless earbuds with case"

Inverted view:

  • wireless -> D1, D3
  • noise -> D1
  • cancelling -> D1
  • headphones -> D1
  • wired -> D2
  • gaming -> D2
  • headset -> D2
  • earbuds -> D3
  • case -> D3

The list of document IDs for a term is called a postings list.
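
The mapping above can be reproduced with a few lines of Python. This is a minimal sketch: tokenization is a plain whitespace split, and `with` is treated as a stop word (an assumption made so the output matches the listing above).

```python
from collections import defaultdict

def build_inverted_index(docs, stop_words=frozenset({"with"})):
    """Map each term to a sorted postings list of the document IDs that contain it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):  # visiting IDs in order keeps postings sorted
        for term in set(docs[doc_id].lower().split()):
            if term not in stop_words:
                index[term].append(doc_id)
    return dict(index)

docs = {
    "D1": "wireless noise cancelling headphones",
    "D2": "wired gaming headset",
    "D3": "wireless earbuds with case",
}
index = build_inverted_index(docs)
```

Looking up `index["wireless"]` is a single dictionary access, regardless of how many documents exist. That is the whole point of paying the indexing cost up front.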

4.4 Postings Lists

A postings list usually stores more than just document IDs.

It may include:

  • document ID
  • term frequency in the document
  • positions of the term inside the document
  • field information such as title vs body
  • payloads or extra per-hit metadata in some engines

Why extra metadata matters:

  • term frequency helps ranking
  • positions enable phrase and proximity search
  • field data enables field weighting

4.5 Phrase Queries

If the system stores term positions, it can support phrase queries.

Example:

  • D1: "machine learning systems"
  • D2: "systems for machine translation learning"

Searching for the phrase "machine learning" should strongly prefer D1.

Without positions, the engine only knows both terms exist. With positions, it knows whether they appear adjacent and in the correct order.
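
A positional index makes this concrete. The sketch below stores, for each term and document, the list of token positions, then checks adjacency. The function names are made up for this example, and real engines use far more compact encodings.

```python
from collections import defaultdict

def build_positional_index(docs):
    """term -> doc_id -> list of token positions."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, term in enumerate(text.lower().split()):
            index[term][doc_id].append(position)
    return index

def phrase_match(index, phrase):
    """Return documents where the phrase terms appear adjacent and in order."""
    terms = phrase.lower().split()
    if not terms or any(term not in index for term in terms):
        return []
    # Documents must contain every term before positions are worth checking.
    candidates = set(index[terms[0]])
    for term in terms[1:]:
        candidates &= set(index[term])
    matches = []
    for doc_id in sorted(candidates):
        starts = index[terms[0]][doc_id]
        if any(all(start + offset in index[term][doc_id]
                   for offset, term in enumerate(terms))
               for start in starts):
            matches.append(doc_id)
    return matches

docs = {
    "D1": "machine learning systems",
    "D2": "systems for machine translation learning",
}
positional = build_positional_index(docs)
result = phrase_match(positional, "machine learning")
```

Both documents contain both terms, so a position-free index cannot tell them apart. Only D1 has `learning` at the position right after `machine`.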

4.6 Boolean Queries

Boolean search combines postings lists.

Examples:

  • A AND B: intersect postings lists
  • A OR B: union postings lists
  • A NOT B: remove B's postings from A's postings

This is one reason inverted indexes are fast: set operations on sorted document ID lists are efficient.
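
Because postings lists are sorted, each operation is a single linear pass with two pointers. A minimal sketch, using plain Python lists of integer document IDs:

```python
def intersect(a, b):
    """A AND B: IDs present in both sorted lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def union(a, b):
    """A OR B: IDs present in either sorted list, without duplicates."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out + a[i:] + b[j:]

def difference(a, b):
    """A NOT B: IDs in a that are absent from b."""
    out, j = [], 0
    for doc_id in a:
        while j < len(b) and b[j] < doc_id:
            j += 1
        if j == len(b) or b[j] != doc_id:
            out.append(doc_id)
    return out

wireless = [1, 3, 5, 8, 13]
headphones = [2, 3, 8, 9]
```

Each function touches every element at most once, so the cost is proportional to the combined length of the two lists rather than to the size of the corpus.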

4.7 Compression Basics

Postings lists can be huge, so compression matters.

Common ideas:

  • store sorted document IDs and compress gaps between them instead of raw IDs
  • use variable-length integer encoding
  • group postings into blocks
  • add skip pointers or skip blocks so the engine can jump ahead during intersections

Compression improves memory and disk efficiency, and often query speed too because less data must be read.
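
The first two ideas combine naturally: store the gap between consecutive IDs, then encode each gap with a variable-length integer so small gaps take one byte. The sketch below uses a simple 7-bits-per-byte scheme with a continuation bit; it illustrates the idea and is not the exact format of any specific engine.

```python
def encode_postings(doc_ids):
    """Delta-encode sorted, non-negative doc IDs, then varint-encode each gap."""
    out = bytearray()
    previous = 0
    for doc_id in doc_ids:
        gap = doc_id - previous
        previous = doc_id
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)  # low 7 bits, continuation bit set
            gap >>= 7
        out.append(gap)                      # final byte, continuation bit clear
    return bytes(out)

def decode_postings(data):
    """Reverse of encode_postings: rebuild absolute doc IDs from varint gaps."""
    doc_ids, current, gap, shift = [], 0, 0, 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            current += gap
            doc_ids.append(current)
            gap, shift = 0, 0
    return doc_ids

postings = [1000, 1003, 1010, 1500, 100000]
encoded = encode_postings(postings)
```

Here five document IDs fit in 9 bytes, compared with 20 bytes as fixed 32-bit integers. Dense postings lists benefit most because their gaps are small.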

4.8 Distributed Inverted Indexes

At large scale, the index is partitioned across many nodes.

Partitioning approaches:

  • document partitioning: each shard stores all terms for a subset of documents
  • term partitioning: each shard stores the complete postings lists for a subset of terms; less common in general-purpose serving systems, but used for some specialized workloads

Document partitioning is common because it simplifies writes and local scoring.

The coordinator sends the query to all relevant shards, and each shard computes local top results using its local inverted index.

4.9 How Retrieval Is Fast in Practice

Suppose the query is wireless headphones.

The search engine typically:

  1. normalizes the query
  2. finds postings for wireless
  3. finds postings for headphones
  4. intersects or otherwise combines candidate sets
  5. uses frequency, field boosts, positions, and ranking signals to score candidates
  6. returns only the top few results

The system does not score the whole corpus. It narrows aggressively using the inverted index first.

That is why retrieval and ranking are separated.

4.10 Common Failure Cases

  • very common terms create long postings lists and high query cost
  • badly chosen analyzers create index bloat
  • large positional indexes improve quality but increase storage
  • hotspot terms create shard imbalance
  • deletes and updates create segment fragmentation until merges clean things up

4.11 Interview Angle

If asked to explain an inverted index, keep it simple:

"It is a term-to-document lookup structure. Instead of scanning every document on each query, the engine jumps directly from the query terms to candidate documents through postings lists. Positional metadata enables phrase search, and compressed postings plus shard-level retrieval keep it fast at scale."

5. Autocomplete

5.1 Why Autocomplete Exists

Autocomplete reduces typing effort, helps users express intent, and increases query success.

It is one of the highest-leverage search UX features because it helps before the actual search even runs.

Good autocomplete does several things:

  • speeds up input
  • corrects or guides query formulation
  • exposes popular intents
  • reduces zero-result searches
  • nudges users toward query structures the backend handles well

Amazon-style search boxes, Google suggestions, GitHub issue filters, and SaaS global search bars all rely on autocomplete.

5.2 Prefix Matching

The simplest autocomplete form is prefix matching.

If the user types wire, the system returns suggestions starting with that prefix:

  • wireless headphones
  • wireless mouse
  • wired headset

Prefix matching is attractive because it is conceptually simple and fast.

5.3 Trie Basics

A trie is a tree where each edge represents a character (or token), so each node corresponds to a prefix.

Why tries are useful:

  • prefixes share storage
  • prefix lookups are fast
  • top suggestions can be stored or aggregated at intermediate nodes

flowchart TD
	ROOT[Root] --> W[w]
	W --> WI[wi]
	WI --> WIR[wir]
	WIR --> WIRE[wire]
	WIRE --> WIREL[wirel]
	WIREL --> WIRELE[wirele]
	WIRELE --> WIRELES[wireles]
	WIRELES --> WIRELESS[wireless]
	WIRELESS --> H[wireless headphones]
	WIRELESS --> M[wireless mouse]

At scale, production systems usually store compacted tries or other optimized prefix structures rather than naive character-by-character trees.
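
For intuition, here is a naive character-level trie that keeps the top-k suggestions at every node, so a lookup is just a walk down the prefix. The class names, the popularity counts, and the assumption that each phrase is inserted once are all illustrative.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.top = []  # (count, phrase) pairs, most popular first, capped at k

class SuggestionTrie:
    def __init__(self, k=3):
        self.root = TrieNode()
        self.k = k

    def insert(self, phrase, count):
        """Add a phrase once, updating the cached top-k on every prefix node."""
        node = self.root
        for ch in phrase:
            node = node.children.setdefault(ch, TrieNode())
            node.top.append((count, phrase))
            node.top.sort(key=lambda entry: (-entry[0], entry[1]))
            del node.top[self.k:]

    def suggest(self, prefix):
        """Walk the prefix and return the precomputed suggestions at that node."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return [phrase for _, phrase in node.top]

trie = SuggestionTrie(k=2)
for phrase, count in [
    ("wireless headphones", 900),
    ("wireless mouse", 400),
    ("wired headset", 250),
    ("wire cutter", 50),
]:
    trie.insert(phrase, count)
```

Storing top-k lists at each node trades memory and write cost for lookup time that depends only on the prefix length, which suits a read-heavy, latency-sensitive surface.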

5.4 N-gram Approaches

Autocomplete is not always solved with a trie.

N-gram indexing can help with:

  • substring matching
  • typo tolerance
  • matching mid-word fragments
  • languages or domains where token boundaries are tricky

Tradeoff:

  • n-grams improve recall and flexibility
  • they increase index size significantly
  • they may add noise if scoring is weak
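
A character trigram index is enough to show both the benefit and the cost. In this sketch (helper names are made up) every suggestion is indexed under all of its 3-character substrings, and candidates are verified with a real substring check because shared n-grams alone can produce false positives.

```python
from collections import defaultdict

def ngrams(text, n=3):
    """All character n-grams of the lowercased text."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def build_ngram_index(phrases, n=3):
    """n-gram -> set of phrases containing it."""
    index = defaultdict(set)
    for phrase in phrases:
        for gram in ngrams(phrase, n):
            index[gram].add(phrase)
    return index

def substring_candidates(index, fragment, n=3):
    """Phrases containing the fragment anywhere, found via n-gram intersection."""
    grams = ngrams(fragment, n)
    if not grams:
        return set()  # fragment shorter than n: this sketch gives up
    candidates = set.intersection(*(index.get(gram, set()) for gram in grams))
    # Verify, since sharing every n-gram does not guarantee a contiguous match.
    return {p for p in candidates if fragment.lower() in p.lower()}

phrases = ["wireless headphones", "bluetooth headset", "phone case"]
ngram_index = build_ngram_index(phrases)
```

A 19-character phrase such as "wireless headphones" already produces 17 index entries, which is where the index-size cost comes from.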

5.5 Popularity-Based Suggestions

Not every valid prefix completion should be shown.

Usually suggestions are ranked by popularity and usefulness.

Signals may include:

  • historical query frequency
  • click-through rate after suggestion selection
  • conversion rate in commerce systems
  • recency or trending status
  • user-specific history

Example:

If millions of users search for iphone charger, that suggestion should likely outrank a rare but lexically valid completion.

5.6 Recent Searches and Personalization

Autocomplete often mixes multiple sources:

  • global popular suggestions
  • user's own recent searches
  • session context
  • personalized entities such as repos, docs, contacts, or previous products viewed

GitHub-like enterprise search or SaaS admin dashboards often use personalization heavily because each user's accessible universe is different.

5.7 Typo Tolerance Basics

Users make mistakes while typing. Production autocomplete systems usually include typo handling such as:

  • edit-distance based correction
  • keyboard-neighbor heuristics
  • common misspelling dictionaries
  • phonetic or transliteration support in some markets

The challenge is latency.

Autocomplete does not have much time budget, so typo tolerance must be efficient and bounded.
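
One way to keep typo checks bounded is to compute edit distance with a hard cap and stop as soon as the cap cannot be met. The function below is a plain Levenshtein sketch with early exit; the name and the choice of cap are assumptions for illustration.

```python
def within_edit_distance(a, b, max_edits):
    """True if a can be turned into b with at most max_edits insert/delete/substitute steps."""
    if abs(len(a) - len(b)) > max_edits:
        return False  # length gap alone already exceeds the budget
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                      # delete from a
                current[j - 1] + 1,                   # insert into a
                previous[j - 1] + (char_a != char_b)  # substitute or match
            ))
        if min(current) > max_edits:
            return False  # every alignment is already over budget
        previous = current
    return previous[-1] <= max_edits

close = within_edit_distance("headphnes", "headphones", 1)
far = within_edit_distance("hedphnes", "headphones", 1)
```

In practice this check runs only against a small candidate set (for example suggestions sharing the first few characters), not against the whole suggestion dictionary.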

5.8 Caching Strategies

Autocomplete traffic is extremely cache-friendly because prefixes repeat heavily.

Common patterns:

  • cache hot prefixes in memory
  • use CDN or edge caching for anonymous popular suggestions where acceptable
  • keep top-k suggestions per prefix precomputed
  • debounce client requests to avoid one request per keystroke

5.9 Large-Scale Production Considerations

Large autocomplete systems often need to handle:

  • huge prefix skew on popular queries
  • language and locale differences
  • abuse or bot traffic
  • personalized suggestions that reduce cacheability
  • freshness for trending terms
  • safe filtering of prohibited or low-quality suggestions

A common design is hybrid:

  • static or precomputed prefix data for speed
  • online popularity updates for freshness
  • user history overlay for personalization

5.10 Common Mistakes

  • sending requests on every keystroke without debounce
  • returning lexically valid suggestions with poor user value
  • ignoring abuse and suggestion poisoning
  • making autocomplete depend on expensive full ranking pipelines

6. Filtering

6.1 What Filtering Is

Filtering narrows results using structured constraints.

Examples:

  • price between 50 and 100
  • remote jobs only
  • category = laptops
  • flights with one stop or fewer
  • issues labeled bug
  • only repositories the user can access

Filtering is not a side feature. In many business systems it is central.

For some search experiences, the user query is weak and filters carry most of the actual intent.

6.2 Structured Filters

Structured filters work over fields whose semantics are known.

Examples:

  • numeric ranges
  • enums or categories
  • dates
  • geo constraints
  • booleans such as in-stock only

These differ from text ranking because they are usually precise constraints rather than fuzzy matches.

6.3 Facets

Facets show result breakdowns by filter values.

Example in e-commerce:

  • brand counts
  • color counts
  • price buckets
  • availability counts

Why facets matter:

  • they help users refine large result sets
  • they reveal the shape of the catalog
  • they guide discovery without requiring new queries

The challenge is performance. Facet counts can be expensive, especially when the base query is broad and filters are combined interactively.
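
Conceptually, facet counting is a group-by over the matching documents. The naive sketch below (function name and sample data are made up) shows why broad queries are expensive: every hit is visited once per facet field.

```python
from collections import Counter

def facet_counts(hits, fields):
    """Count how many matching documents carry each value of each facet field."""
    counts = {field: Counter() for field in fields}
    for doc in hits:
        for field in fields:
            if field in doc:
                counts[field][doc[field]] += 1
    return counts

hits = [
    {"id": 1, "brand": "acme", "color": "black"},
    {"id": 2, "brand": "acme", "color": "white"},
    {"id": 3, "brand": "zenith", "color": "black"},
]
facets = facet_counts(hits, ["brand", "color"])
```

The cost grows with the number of hits times the number of facet fields, which is why engines lean on columnar field storage, caching, or approximate counts instead of a loop like this.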

6.4 Range Filters

Range filters are common for:

  • price
  • salary
  • rating
  • departure time
  • file size
  • timestamps

They often need specialized data structures or optimized field storage because arbitrary numeric range scans over huge result sets can be expensive.
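
A sorted array with binary search is the simplest version of such a structure. This sketch assumes one numeric value per document and an inclusive range; the class name is hypothetical, and real engines typically use tree-based or block-based numeric indexes.

```python
import bisect

class SortedNumericField:
    """Values sorted once at build time so range lookups are two binary searches."""

    def __init__(self, values_by_doc):
        pairs = sorted((value, doc_id) for doc_id, value in values_by_doc.items())
        self.values = [value for value, _ in pairs]
        self.doc_ids = [doc_id for _, doc_id in pairs]

    def range(self, low, high):
        """Doc IDs whose value lies in [low, high], ordered by value."""
        start = bisect.bisect_left(self.values, low)
        end = bisect.bisect_right(self.values, high)
        return self.doc_ids[start:end]

price = SortedNumericField({"p1": 45, "p2": 50, "p3": 79.99, "p4": 100, "p5": 120})
in_range = price.range(50, 100)
```

Finding the boundaries takes logarithmic time; the remaining cost is proportional to the number of matching documents rather than the size of the catalog.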

6.5 Filtering and Ranking Interaction

Filtering and ranking interact more than beginners expect.

If you filter too early, you may remove items that could have been relevant under softer criteria.

If you filter too late, you may waste ranking work on documents that cannot be shown.

| Strategy | Good for | Risk |
| --- | --- | --- |
| Pre-filter before ranking | Hard constraints such as ACLs, category limits, geography, inventory | Can shrink candidate set too aggressively if constraints are loose or noisy |
| Post-filter after retrieval or early ranking | Soft presentation rules, some UI-level shaping | Wastes work and may leave too few valid final results |

Common practice:

  • apply hard constraints early
  • apply softer business shaping later

6.6 Filter Performance Optimization

Common techniques:

  • store filterable fields in efficient columnar or doc-values structures
  • build bitmap or bitset representations for high-volume facets
  • cache frequent filter combinations
  • precompute common facet counts where practical
  • use approximate counts if exactness is not required for UX
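
The bitset technique in the list above can be sketched as follows. This is an illustration only, assuming a tiny in-memory catalog and using Python integers as bitsets; `build_bitsets` and `matching_ids` are hypothetical helpers, and real engines typically use compressed bitmap formats.

```python
from collections import defaultdict


def build_bitsets(docs):
    """Map each (field, value) pair to an int whose bit i is set when doc i matches."""
    bitsets = defaultdict(int)
    for doc_id, doc in enumerate(docs):
        for field, value in doc.items():
            bitsets[(field, value)] |= 1 << doc_id
    return bitsets


def matching_ids(bits):
    """Expand a bitset back into a sorted list of document ids."""
    ids = []
    doc_id = 0
    while bits:
        if bits & 1:
            ids.append(doc_id)
        bits >>= 1
        doc_id += 1
    return ids


docs = [
    {"brand": "acme", "in_stock": True},
    {"brand": "acme", "in_stock": False},
    {"brand": "globex", "in_stock": True},
]
bitsets = build_bitsets(docs)

# Combining two filters is a single bitwise AND.
both = bitsets[("brand", "acme")] & bitsets[("in_stock", True)]
print(matching_ids(both))

# A facet count is the number of set bits after intersecting with the base filter.
globex_in_stock = bin(bitsets[("brand", "globex")] & bitsets[("in_stock", True)]).count("1")
print(globex_in_stock)
```

The same intersection step works for a permission bitset, which is one way to keep ACL filtering both early and cheap.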

ACL filtering is especially important. If the user should not see a document, that filter should behave as a hard constraint and should be efficient.

6.7 Real-World Examples

E-commerce:

  • category, brand, price, availability, shipping speed, seller, ratings

Job boards:

  • location, remote, salary range, experience level, company size, visa support

Travel search:

  • stops, departure window, airline, baggage, refund policy, hotel rating, neighborhood

These products often spend as much engineering effort on filter performance and facet correctness as on keyword matching.

6.8 Common Failure Cases

  • facet counts computed on stale or mismatched indexes
  • filters applied after ranking causing irrelevant or empty pages
  • high-cardinality filters destroying cache hit rate
  • permission filtering bolted on late and leaking data

7. Ranking

7.1 Why Ranking Matters

Retrieval answers "what could match". Ranking answers "what should be shown first".

Ranking matters because users rarely inspect many results.

If the best result is not in the first few positions, the system feels wrong even if it technically retrieved the right document somewhere deeper in the list.

7.2 Why Ranking Is Usually Multi-Stage

Ranking is usually multi-stage because expensive models and business logic cannot run on the whole corpus.

Typical shape:

  1. retrieve a broad candidate set cheaply
  2. apply lightweight ranking to reduce candidates
  3. apply heavier ranking on a smaller set
  4. apply final business rules, diversity, sponsorship, and presentation logic

flowchart LR
	Q[Query / Context] --> RET[Retrieval: thousands]
	RET --> L1[Stage 1 Ranker: hundreds]
	L1 --> L2[Stage 2 Ranker: tens]
	L2 --> BR[Business Rules / Diversity / Ads]
	BR --> UI[Final Ordered Results]

7.3 Retrieval vs Ranking

This separation is critical.

Retrieval is optimized for recall and speed.

Ranking is optimized for precision and utility.

If retrieval misses a relevant item entirely, ranking cannot recover it.

If retrieval returns too many weak candidates, ranking becomes expensive and noisy.
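
A minimal sketch of the funnel, assuming each candidate already carries a cheap `lexical` score and a more expensive `quality` signal (both hypothetical field names). The function names and cut-off sizes are illustrative, not a reference implementation.

```python
import heapq


def multi_stage_rank(candidates, cheap_score, heavy_score, keep_stage1=100, keep_final=10):
    """Run a cheap scorer over everything, then a heavy scorer over the survivors."""
    stage1 = heapq.nlargest(keep_stage1, candidates, key=cheap_score)
    return heapq.nlargest(keep_final, stage1, key=heavy_score)


items = [
    {"id": "a", "lexical": 0.9, "quality": 0.2},
    {"id": "b", "lexical": 0.8, "quality": 0.9},
    {"id": "c", "lexical": 0.1, "quality": 1.0},
]

top = multi_stage_rank(
    items,
    cheap_score=lambda d: d["lexical"],
    heavy_score=lambda d: 0.5 * d["lexical"] + 0.5 * d["quality"],
    keep_stage1=2,
    keep_final=1,
)
print([d["id"] for d in top])
```

Item `c` has the best quality signal but never reaches the heavy scorer, which is the point made above: a later stage cannot recover what an earlier stage dropped.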

7.4 Relevance Ranking vs Business Ranking

Production ranking is rarely pure relevance.

In e-commerce, a result order may consider:

  • text relevance
  • inventory availability
  • margin or business priority
  • fulfillment speed
  • review quality
  • return rate
  • seller trust
  • sponsored placements

In job search:

  • lexical match
  • application likelihood
  • compensation quality
  • recency
  • employer quality
  • geographic fit

In a SaaS global search:

  • text relevance
  • recency of the document
  • document type priority
  • ownership or collaboration strength

7.5 Common Ranking Signals

Signals often include:

  • lexical relevance: term match, phrase match, field boosts
  • popularity: clicks, purchases, views, stars, installs
  • freshness: newer items may deserve higher weight in some surfaces
  • engagement: dwell time, completion, watch time, saves, shares
  • quality: seller quality, document quality, content safety
  • trust: verified sources, low spam risk, low abuse signals
  • context: location, device, language, current session intent

7.6 Diversity Constraints

Blindly ranking by one score can produce repetitive or unhealthy outputs.

Examples:

  • ten nearly identical products from one seller
  • a feed dominated by one creator
  • a job result page filled with duplicates from the same company

Diversity rules improve the experience by spreading exposure across:

  • categories
  • sellers
  • creators
  • content types
  • freshness buckets

This is especially important for feeds and discovery systems.
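
One simple way to express such a rule is a per-group cap applied after scoring. The sketch below assumes the input is already ordered by score; `diversify` and `max_per_group` are hypothetical names, and real systems often use softer penalties instead of a hard cap.

```python
from collections import Counter


def diversify(ranked_items, group_key, max_per_group=2):
    """Keep score order, but push items beyond the per-group cap to the end."""
    seen = Counter()
    kept, deferred = [], []
    for item in ranked_items:
        group = group_key(item)
        if seen[group] < max_per_group:
            seen[group] += 1
            kept.append(item)
        else:
            deferred.append(item)
    return kept + deferred


ranked = [("p1", "seller-1"), ("p2", "seller-1"), ("p3", "seller-1"), ("p4", "seller-2")]
reordered = diversify(ranked, group_key=lambda item: item[1])
print([product for product, _ in reordered])
```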

7.7 Sponsored Content Considerations

Sponsored results complicate ranking because monetization and relevance must coexist.

Strong systems separate:

  • auction or eligibility logic
  • relevance constraints
  • sponsored placement policies
  • disclosure and compliance requirements

Bad design either destroys relevance or leaves too much money on the table.

7.8 Failure Cases

  • optimizing only CTR and creating clickbait
  • overusing popularity so incumbents dominate forever
  • boosting freshness too much and burying authoritative content
  • letting one business rule overwhelm all relevance signals
  • failing to monitor ranking regressions after model changes

7.9 Best Practices

  • make ranking multi-stage
  • separate hard eligibility from soft scoring
  • log ranking features and decisions for debugging
  • evaluate quality with offline metrics and online experiments
  • protect the system from feedback loops that only reward already-popular items

8. Relevance Scoring

8.1 What Relevance Scoring Means

Relevance scoring is how the system estimates how well a result matches the user's need.

There is no single universal score. Real systems combine multiple signals.

8.2 TF-IDF Basics

TF-IDF is one of the classic ideas in lexical search.

Intuition:

  • terms that appear often in a document may be important to that document
  • terms that appear in many documents are less discriminative across the corpus

One simple form is:


$$\text{TF-IDF}(t,d) = \text{TF}(t,d) \cdot \log\left(\frac{N}{\text{DF}(t)}\right)$$

Where:

  • $\text{TF}(t,d)$ is the term frequency of term $t$ in document $d$
  • $\text{DF}(t)$ is the number of documents containing $t$
  • $N$ is the total number of documents

Why it matters:

Rare but present terms are usually more informative than extremely common terms.
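
To make the formula concrete, here is a small sketch that applies it to a toy corpus of pre-tokenized documents. The `tf_idf` helper and the corpus are made up for illustration, and raw counts are used for TF (other variants exist).

```python
import math
from collections import Counter


def tf_idf(term, doc_tokens, corpus):
    """TF(t, d) * log(N / DF(t)), with raw counts for TF."""
    tf = Counter(doc_tokens)[term]
    df = sum(1 for tokens in corpus if term in tokens)
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)


corpus = [
    ["red", "running", "shoes"],
    ["red", "dress"],
    ["red", "red", "wine"],
]

# "red" appears in every document, so log(3 / 3) = 0 regardless of its count.
print(tf_idf("red", corpus[2], corpus))
# "shoes" appears in one document of three, so it scores log(3).
print(tf_idf("shoes", corpus[0], corpus))
```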

8.3 BM25 Basics

BM25 is a practical ranking function widely used in lexical search systems.

It improves on simpler TF-IDF variants by handling:

  • term frequency saturation
  • document length normalization

A common form is:


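$$\text{BM25}(q,d)=\sum_{t \in q} \text{IDF}(t) \cdot \frac{f(t,d)(k_1+1)}{f(t,d)+k_1\left(1-b+b\cdot \frac{|d|}{\text{avgdl}}\right)}$$

Where:

  • $f(t,d)$ is the frequency of term $t$ in document $d$
  • $\text{IDF}(t)$ is an inverse document frequency weight for $t$
  • $|d|$ is the length of document $d$ and $\text{avgdl}$ is the average document length in the corpus
  • $k_1$ controls how quickly term frequency saturates
  • $b$ controls how strongly document length is normalized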

You do not need to memorize the formula in interviews, but you should know the intuition:

  • more occurrences help, but not linearly forever
  • long documents should not win just because they contain more words
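
The two intuitions can be checked with a short sketch of the formula. The formula above leaves IDF unspecified, so this example assumes one widely used smoothed form, `log(1 + (N - df + 0.5) / (df + 0.5))`, and the commonly quoted defaults `k1=1.2` and `b=0.75`. Both are assumptions for illustration, not values taken from this guide.

```python
import math
from collections import Counter


def bm25(query_terms, doc_tokens, corpus, k1=1.2, b=0.75):
    """Score one tokenized document against a query over a tokenized corpus."""
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs
    freqs = Counter(doc_tokens)
    # Length normalization: documents longer than average get a larger denominator.
    norm = 1 - b + b * len(doc_tokens) / avgdl
    score = 0.0
    for term in query_terms:
        f = freqs[term]
        if f == 0:
            continue
        df = sum(1 for d in corpus if term in d)
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))  # assumed IDF variant
        score += idf * f * (k1 + 1) / (f + k1 * norm)
    return score


d1 = ["redis", "a", "b", "c"]
d2 = ["redis", "redis", "a", "b"]
d3 = ["x", "y", "z", "w"]
corpus = [d1, d2, d3]

once = bm25(["redis"], d1, corpus)
twice = bm25(["redis"], d2, corpus)
print(once, twice)
```

With equal document lengths, the second occurrence of the term raises the score, but by less than the first did, which is the saturation behavior described above.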

8.4 Semantic Relevance Basics

Lexical matching is powerful but limited.

Semantic relevance tries to capture meaning, not just exact token overlap.

Examples:

  • sofa matching couch
  • software engineer matching backend developer
  • a support query matching a knowledge base article with similar meaning but different wording

A common production pattern today is hybrid retrieval:

  • lexical retrieval for precision and exact matching
  • semantic retrieval or re-ranking for meaning-based matches

Why hybrid is common:

  • lexical search handles exact identifiers, codes, names, and rare terms well
  • semantic methods handle paraphrases and intent better
  • using both reduces the weaknesses of either alone
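
This guide does not prescribe how the two result lists are merged. One commonly used option is reciprocal rank fusion, sketched below under that assumption; the document ids are made up and `k=60` is a conventional constant rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked id lists; each list contributes 1 / (k + rank) per id."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, breaking ties by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


lexical = ["sku-123", "couch-a", "sofa-b"]        # exact-match oriented retrieval
semantic = ["sofa-b", "couch-a", "loveseat-c"]    # meaning oriented retrieval

fused = reciprocal_rank_fusion([lexical, semantic])
print(fused)
```

Items found by both retrievers rise to the top, while an item found by only one of them still stays in the candidate list.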

8.5 Behavioral Signals

Relevance is not only about text.

Behavioral signals often matter:

  • clicks
  • dwell time
  • add-to-cart rate
  • purchase rate
  • save rate
  • watch completion
  • query reformulations after a click

These signals help the system learn which results users actually found useful.

But they are noisy. Clicked does not always mean satisfied.

8.6 Quality, Trust, and Spam Prevention

High relevance is not enough if the content is low quality or abusive.

Production systems often include additional scoring dimensions:

  • content quality scores
  • seller or source trust
  • spam or fraud risk
  • policy safety scores
  • freshness or staleness penalties

Examples:

  • Amazon-like marketplaces must suppress spammy or low-trust listings
  • GitHub-like search may downrank spam repositories or abusive content
  • YouTube-like platforms need safety and trust constraints around recommendations

8.7 Balancing Relevance vs Business Goals

"Best result" in production usually means the best result under multiple objectives.

These can include:

  • lexical match quality
  • user satisfaction
  • engagement
  • monetization
  • content safety
  • fairness or exposure goals
  • freshness

This is why ranking discussions often become multi-objective optimization discussions.

8.8 Interview Framing

If asked how a search engine decides the best result, a strong answer is:

"It usually starts with lexical or hybrid retrieval to get candidates, then uses a relevance score combining term signals such as BM25, field boosts, freshness, popularity, behavioral feedback, quality signals, and business constraints. The final order is almost never based on one score alone."

9. Recommendation System Overview

9.1 What Recommendation Systems Are

Recommendation systems choose what to show users when the user has not issued an explicit query.

They power:

  • Netflix homepages
  • YouTube next videos
  • TikTok For You feeds
  • Amazon "Customers also bought"
  • Instagram and X home feeds
  • SaaS dashboards showing suggested docs, tasks, or entities

9.2 Why They Exist

The internet has too much content. Most users will not search for everything they could care about.

Recommendation systems help with discovery by predicting relevance from behavior, similarity, context, and popularity.

Search solves explicit intent. Recommendation solves hidden intent.

9.3 Online vs Offline Recommendation

| Dimension | Offline | Online |
|---|---|---|
| When computed | Batch or scheduled jobs | At request time or near request time |
| Good for | Heavy model training, embeddings, broad candidate pools | Fresh context, session adaptation, final ranking |
| Tradeoff | Efficient at scale but stale | Fresh but latency-sensitive |

Most production systems use both.

Example:

  • offline jobs compute user embeddings, item embeddings, similarity graphs, creator clusters, trending statistics
  • online systems use current session behavior, freshness, and context to rank a page right now

9.4 Cold Start Problem

Cold start means the system has too little data.

Two forms:

  • new user: little or no history
  • new item or creator: little or no interaction data

Common mitigations:

  • use popularity and trending signals
  • use content-based features
  • use onboarding preferences
  • use location, language, device, and session context
  • provide exploration slots so new items can earn data

TikTok-like and YouTube-like systems care deeply about new content discovery. If only established content wins, the ecosystem becomes stale.

9.5 Feedback Loops

Recommendation systems influence behavior, so they create feedback loops.

If the system shows content, it gets more interaction data on that content, which can cause it to rank even higher.

This can be useful, but it can also create runaway popularity bias.

Examples of risks:

  • rich-get-richer exposure
  • narrow content bubbles
  • overfitting to clickbait
  • suppressing new creators

9.6 Exploration vs Exploitation

Recommendation systems constantly balance:

  • exploitation: show what is most likely to perform well now
  • exploration: show some uncertain or new content to learn more and avoid stagnation

If you only exploit, the system becomes conservative and may miss better items.

If you explore too much, user experience degrades.
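
A minimal sketch of this balance is an epsilon-greedy slot: most of the time the slot shows the top-ranked item, and with a small probability it shows something from an exploration pool. `pick_slot`, `epsilon`, and the pool contents are hypothetical; production systems often use more sophisticated bandit methods.

```python
import random


def pick_slot(ranked, exploration_pool, epsilon=0.1, rng=random):
    """Return the top-ranked item, or an exploration item with probability epsilon."""
    if exploration_pool and rng.random() < epsilon:
        return rng.choice(exploration_pool)
    return ranked[0]


ranked = ["proven-hit", "solid-item"]
new_items = ["new-creator-video"]

# epsilon=0 never explores; epsilon=1 always explores when the pool is non-empty.
print(pick_slot(ranked, new_items, epsilon=0.0))
print(pick_slot(ranked, new_items, epsilon=1.0))
```

Tuning `epsilon` is the practical form of the tradeoff: higher values gather data on new items faster at the cost of showing more uncertain content.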

9.7 Engagement vs Quality Tradeoffs

Not every engagement signal maps to long-term product quality.

High click-through or short-term watch time may conflict with:

  • satisfaction
  • trust
  • creator ecosystem health
  • safety
  • retention quality

This is a central real-world discussion in recommendation systems.

9.8 Example Patterns

Netflix:

  • heavy personalization by row and by title ranking
  • strong use of offline signals plus contextual ranking

YouTube:

  • massive candidate generation followed by multi-stage ranking
  • strong importance of watch time, satisfaction, freshness, safety

TikTok:

  • short-term session signals matter heavily
  • content and user embeddings are critical for rapid personalization

Amazon:

  • recommendations mix collaborative signals, co-purchase graphs, browse history, price sensitivity, and business objectives

10. Ranking in Recommendation Systems

10.1 Candidate Ranking

Recommendation ranking usually happens after candidate generation. The candidate pool may already be hundreds or thousands of items rather than millions.

The ranker then decides ordering using user, item, and context features.

10.2 Multi-Stage Ranking

The same multi-stage logic from search applies, but recommendation models are often more feature-heavy.

Typical pipeline:

  1. generate candidates from many sources
  2. apply a lightweight ranker to remove obvious weak candidates
  3. apply a heavier model to a smaller candidate set
  4. apply diversity, policy, and business constraints

flowchart LR
	CTX[User + Session + Context] --> CG[Candidate Sources]
	CG --> CANDS[Candidate Pool]
	CANDS --> FAST[Fast Ranker]
	FAST --> HEAVY[Heavy Ranker]
	HEAVY --> MIX[Diversity / Fairness / Ads / Policy]
	MIX --> FEED[Final Feed]

10.3 Lightweight vs Heavy Rankers

Fast rankers may use:

  • simple feature transforms
  • linear models
  • shallow trees
  • small neural models

Heavy rankers may use:

  • deeper neural models
  • sequence models over session history
  • expensive cross-feature interactions
  • richer content understanding features

The reason for multiple rankers is simple: the most accurate model is often too expensive to run on too many candidates.

10.4 ML Ranking Basics

ML rankers usually learn from historical interactions.

Common labels or targets:

  • click
  • long dwell time
  • like or save
  • watch completion
  • add to cart
  • purchase
  • hide or negative feedback

The hardest part is not fitting a model. It is defining the right objective and handling bias in logged data.

10.5 Online Ranking Decisions

At request time, the system may incorporate:

  • current session signals
  • latest follows or interactions
  • freshness windows
  • user device and network conditions
  • time of day or location
  • safety or rate-limit decisions

This is why recommendation systems are rarely fully precomputed.

10.6 Delayed Feedback Challenges

Some outcomes arrive late.

Examples:

  • purchases happen long after an impression
  • subscription retention takes days or weeks
  • satisfaction surveys are sparse

This creates training and evaluation problems because immediate clicks are easy to measure, but long-term satisfaction is harder.

10.7 Fairness and Creator Fairness

Real platforms often need fairness constraints such as:

  • not letting one creator dominate all slots
  • giving new creators a chance to gather signal
  • balancing exposure across categories or sellers
  • avoiding discrimination in jobs, housing, lending, or other regulated domains

These are not only ethics topics. They are product-health topics.

10.8 Freshness vs Relevance

Older content may have stronger engagement history. Newer content may be more timely.

Different products choose differently:

  • news and social updates care heavily about freshness
  • evergreen education or documentation may prioritize authority over recency
  • commerce often needs a mix of demand history and current availability

11. Fanout

11.1 What Fanout Means

Fanout is the process of distributing content references to the users who may see them.

This is most commonly discussed for social feeds.

If a user posts something, how do followers get it into their home timelines?

11.2 Fanout-on-Write

In fanout-on-write, when a user creates a post, the system pushes that post reference into follower timelines immediately or soon after write time.

Why it exists:

  • fast feed reads for ordinary users
  • precomputed per-user timelines

Tradeoff:

  • very expensive for users with huge follower counts
  • write amplification can be enormous

11.3 Fanout-on-Read

In fanout-on-read, when a user opens the app, the system fetches posts from followed accounts and constructs the feed at read time.

Why it exists:

  • avoids massive write amplification
  • handles high-fanout authors better

Tradeoff:

  • more expensive reads
  • more complex low-latency ranking at request time

11.4 Hybrid Models

Real systems often use hybrid approaches.

Common pattern:

  • ordinary users: fanout-on-write
  • celebrity or mega-scale accounts: fanout-on-read or special handling

This is the classic celebrity problem.
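
The hybrid pattern can be sketched with an in-memory store. Everything here is an illustrative assumption: the `FeedStore` class, the follower threshold of 3 (tiny so the example is easy to follow), and the use of Python dictionaries in place of real timeline storage.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 3  # hypothetical cut-off; real values would be far larger


class FeedStore:
    def __init__(self):
        self.followers = defaultdict(set)         # author -> users following them
        self.following = defaultdict(set)         # user -> authors they follow
        self.timelines = defaultdict(list)        # user -> pushed (ts, post_id) entries
        self.posts_by_author = defaultdict(list)  # author -> (ts, post_id) entries

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= CELEBRITY_THRESHOLD

    def publish(self, author, post_id, ts):
        self.posts_by_author[author].append((ts, post_id))
        if not self.is_celebrity(author):
            # Fanout-on-write: push the reference into every follower timeline.
            for user in self.followers[author]:
                self.timelines[user].append((ts, post_id))

    def read_feed(self, user, limit=10):
        # A set removes duplicates if an author crossed the threshold over time.
        entries = set(self.timelines[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                # Fanout-on-read: pull celebrity posts at request time.
                entries.update(self.posts_by_author[author])
        newest_first = sorted(entries, reverse=True)
        return [post_id for _, post_id in newest_first[:limit]]


store = FeedStore()
store.follow("u1", "friend")
for fan in ("u1", "u2", "u3"):
    store.follow(fan, "star")

store.publish("friend", "p1", ts=1)  # pushed to u1's timeline
store.publish("star", "p2", ts=2)    # not pushed; merged at read time
print(store.read_feed("u1"))
```

The celebrity post costs one write instead of one write per follower, and the price is an extra merge step on every read by that author's followers.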

11.5 Fanout-on-Write vs Fanout-on-Read

| Dimension | Fanout-on-Write | Fanout-on-Read |
|---|---|---|
| Write cost | High | Lower |
| Read cost | Lower | Higher |
| Good for | Many readers with modest graph sizes | Large fanout creators and flexible ranking |
| Freshness control | Good if timelines update quickly | Good if reads fetch current data |
| Complexity | Simpler reads, harder writes | Harder reads, simpler writes |

11.6 Social Feed Architecture Example

flowchart LR
	P[Post Created] --> DIST[Distribution Service]
	DIST --> FW[Write to Follower Timelines]
	DIST --> HOT[Mark Celebrity Posts for Read-Time Fetch]
	U[User Opens Feed] --> FEED[Feed Service]
	FEED --> TL[(Timeline Cache)]
	FEED --> HOT
	TL --> RANK[Rank + Merge]
	HOT --> RANK
	RANK --> OUT[Feed Page]

11.7 Cache Invalidation Challenges

Feed systems must handle:

  • deleted posts
  • blocked users
  • privacy changes
  • edited content
  • ranking model changes

If timelines are heavily cached or precomputed, invalidation becomes hard.

11.8 Common Failure Cases

  • pushing celebrity posts to millions of timelines and overwhelming storage or queues
  • expensive read-time joins across too many sources
  • duplicated items because of retries or hybrid merge bugs
  • stale deleted content because cache invalidation lagged

11.9 Interview Angle

If asked to design a social feed, always discuss fanout strategy. That is one of the main architectural decisions.

12. Candidate Generation

12.1 Why Candidate Generation Exists

Recommendation ranking does not start from the whole corpus. It starts from a narrowed candidate pool.

Why?

Because scoring millions or billions of items per request with a full ranking model does not fit in an interactive latency or cost budget.

Candidate generation reduces the problem from "everything" to "a few hundred or thousand promising items".

12.2 Common Candidate Sources

Production systems often combine many candidate sources:

  • collaborative filtering
  • content-based similarity
  • social graph neighbors
  • trending or popular items
  • creator-follow graph
  • embedding nearest neighbors
  • recently interacted entities
  • business-curated pools

The union of these sources becomes the candidate pool.

12.3 Collaborative Filtering Basics

Collaborative filtering uses behavioral similarity.

Core intuition:

  • users who behaved similarly in the past may like similar things in the future
  • items that co-occur in behavior may be related

Examples:

  • users who bought product A often buy product B
  • users who watched show X often watch show Y

Amazon-style "customers also bought" is the classic mental model.
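
A minimal item-to-item sketch of this idea counts how often two items appear in the same basket. The baskets and helper names are made up for illustration, and raw co-occurrence counts are only the simplest form of collaborative filtering.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(baskets):
    """Count, for every ordered item pair, how many baskets contain both items."""
    counts = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def also_bought(item, counts, top_n=3):
    """Items most often bought together with `item`, strongest first."""
    related = [(n, other) for (first, other), n in counts.items() if first == item]
    related.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in related[:top_n]]


baskets = [
    ["laptop", "mouse"],
    ["laptop", "mouse", "bag"],
    ["mouse", "pad"],
]
counts = co_purchase_counts(baskets)
print(also_bought("laptop", counts))
```

Raw counts favor globally popular items, so practical systems usually normalize them in some way before using them as candidates.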

12.4 Content-Based Filtering Basics

Content-based filtering uses item attributes.

Examples:

  • recommend jobs similar to jobs a user previously clicked
  • recommend articles with similar topics or embeddings
  • recommend videos with related audio, captions, or visual features

This is useful for cold start because it does not depend entirely on historical interaction volume.

12.5 Graph-Based Candidates

Graph-based candidates come from relationships:

  • who the user follows
  • who similar users follow
  • authors frequently co-engaged by the same audience
  • co-starred repos or linked documents

Graph candidates are common in social apps, GitHub-like collaboration systems, and commerce systems with co-view or co-purchase graphs.

12.6 Trending Pools

Trending pools capture global or local momentum.

Why they matter:

  • they solve some cold start problems
  • they inject freshness
  • they expose popular content without deep personalization

But trending alone is not personalization.

12.7 Embedding Retrieval and ANN Basics

Embeddings map users and items into vector spaces where similar concepts are close together.

This enables nearest-neighbor retrieval:

  • find items near the user's embedding
  • find items near the current session embedding
  • find items similar to the current content being viewed

Exact nearest-neighbor search can be expensive at scale, so many systems use approximate nearest neighbor, or ANN, techniques.

High-level idea:

  • use structures that avoid comparing against every vector
  • trade a little exactness for large speed gains

This is often good enough for candidate generation because ranking happens later.
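
One family of ANN structures is locality-sensitive hashing. The sketch below assumes random-hyperplane hashing with a single hash table, which is only one of several approaches (graph-based and quantization-based indexes are also common). `LshIndex` and its parameters are hypothetical names for illustration.

```python
import math
import random


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(sum(v * h for v, h in zip(vector, plane)) >= 0 for plane in hyperplanes)


class LshIndex:
    def __init__(self, dim, num_planes=8, seed=0):
        rng = random.Random(seed)
        self.hyperplanes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_planes)]
        self.buckets = {}

    def add(self, item_id, vector):
        key = signature(vector, self.hyperplanes)
        self.buckets.setdefault(key, []).append((item_id, vector))

    def query(self, vector, top_k=5):
        # Only vectors in the same bucket are compared, not the whole corpus.
        candidates = self.buckets.get(signature(vector, self.hyperplanes), [])
        ordered = sorted(candidates, key=lambda c: -cosine(vector, c[1]))
        return [item_id for item_id, _ in ordered[:top_k]]


index = LshIndex(dim=3)
v = [1.0, 0.5, -0.2]
index.add("a", v)
index.add("b", [-x for x in v])          # points in the opposite direction
print(index.query([2 * x for x in v]))  # same direction as "a"
```

Vectors pointing in similar directions tend to share a bucket, but a true neighbor can land in a different bucket and be missed. That loss of exactness is the price of the speedup, and real systems reduce it with multiple tables or probing.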

12.8 Why Ranking Starts After Candidate Generation

Candidate generation is about recall and breadth.

Ranking is about precision and ordering.

If you skip candidate generation, ranking is too expensive.

If candidate generation is poor, ranking quality is capped.

This boundary is one of the most important concepts in modern recommendation systems.

13. Personalization

13.1 What Personalization Means

Personalization means the same corpus produces different results for different users.

Examples:

  • two users searching the same marketplace query may see different ranking orders
  • two Netflix users get different homepages
  • two SaaS users searching the same global search term see different accessible docs and different likely hits

13.2 User Profiles

A personalization profile may include:

  • long-term interests
  • recent interactions
  • follows, subscriptions, or teams
  • geographic preferences
  • device/network patterns
  • explicit preferences
  • negative feedback and muted topics

Profiles are often built from both online and offline signals.

13.3 Implicit vs Explicit Feedback

Explicit feedback:

  • likes, ratings, follows, saves, thumbs up, manual preferences

Implicit feedback:

  • clicks, dwell time, purchases, watch completion, skips, repeats, hides

Implicit data is abundant but noisy. Explicit data is sparse but clearer.

13.4 Long-Term vs Short-Term Interests

Long-term interests represent stable tastes.

Short-term interests capture immediate intent.

Example:

  • a user generally likes backend engineering content
  • this week the user is specifically searching for Redis and search systems

TikTok-like feeds often weight short-term session intent heavily. Netflix-like systems also care about context, but long-term taste matters more for broad discovery.

13.5 Contextual Ranking

Ranking may depend on context such as:

  • time of day
  • current page or query
  • device type
  • network speed
  • location
  • current session sequence

Examples:

  • low-bandwidth users may get lower-bitrate or different video choices
  • local services like Uber care strongly about geography and current location

13.6 Privacy Considerations

Personalization raises privacy questions:

  • how much behavioral data is stored
  • how long it is retained
  • whether sensitive attributes are inferred
  • whether users can opt out or reset personalization
  • whether data is used across products or only within one surface

Production systems need data minimization, retention controls, access controls, and often region-specific compliance behavior.

13.7 Explainability Basics

Explainability means giving a human-understandable reason for some results.

Examples:

  • "Because you watched..."
  • "Suggested because you follow..."
  • "Related to your recent searches"

Explainability is useful for trust, debugging, and product feedback even if the underlying model is more complex than the explanation suggests.

13.8 Common Mistakes

  • overfitting to recent clicks and making feeds unstable
  • ignoring negative feedback
  • overpersonalizing so much that exploration disappears
  • using sensitive data carelessly

14. Feed Generation

14.1 What Feed Generation Is

Feed generation is the process of deciding what appears in a user's home timeline or discovery surface.

This is one of the hardest system design topics because it combines:

  • graph data
  • recommendation ranking
  • caching
  • fanout strategy
  • freshness
  • pagination
  • abuse controls
  • content safety

14.2 Home Feed Architecture

flowchart LR
	U[User Opens App] --> FEED[Feed Service]
	FEED --> PROF[Profile / Session Service]
	FEED --> SOURCES[Candidate Sources]
	SOURCES --> FOLLOW[Following Graph]
	SOURCES --> TREND[Trending]
	SOURCES --> EMB[Embedding Retrieval]
	SOURCES --> CACHE[(Timeline / Candidate Cache)]
	SOURCES --> RANK[Ranking Service]
	RANK --> HYD[Hydration / Metadata Fetch]
	HYD --> PAGE[Paginated Response]

14.3 Cache Layers

Feed systems may cache:

  • precomputed timelines
  • candidate pools
  • ranking features
  • hydrated entity metadata
  • first page responses for very hot users or anonymous feeds

Caching helps, but cache invalidation is hard because feed contents change frequently and are personalized.

14.4 Pagination Challenges

Pagination in feeds is not as simple as offset and limit.

Problems with offset-based pagination:

  • feed contents change between requests
  • inserts at the top shift offsets
  • duplicates or gaps appear

Cursor-based pagination is usually better.

But even cursor pagination is tricky if the ranking model is highly dynamic.
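
The difference can be shown with a small sketch, assuming a chronological feed of `(timestamp, post_id)` entries sorted newest first. The cursor encoding and the `page` helper are illustrative choices, not a standard format.

```python
import base64
import json


def encode_cursor(ts, post_id):
    raw = json.dumps([ts, post_id]).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor):
    ts, post_id = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return ts, post_id


def page(feed, limit, cursor=None):
    """Return up to `limit` entries strictly older than the cursor position."""
    if cursor is not None:
        boundary = decode_cursor(cursor)
        feed = [entry for entry in feed if entry < boundary]
    items = feed[:limit]
    next_cursor = encode_cursor(*items[-1]) if len(feed) > limit else None
    return items, next_cursor


feed = [(5, "e"), (4, "d"), (3, "c"), (2, "b"), (1, "a")]
first_page, cursor = page(feed, limit=2)

# A new post arrives at the top before the client asks for page two.
feed_after_insert = [(6, "f")] + feed

offset_page = feed_after_insert[2:4]                      # offset=2, limit=2
cursor_page, _ = page(feed_after_insert, limit=2, cursor=cursor)
print(offset_page)   # repeats (4, "d") from the first page
print(cursor_page)   # continues after the last item the client saw
```

The cursor anchors on the last item seen rather than on a position, so inserts at the top do not shift the window. It does not help when the ordering itself changes between requests, which is the dynamic-ranking case mentioned above.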

14.5 Consistency vs Freshness

Users want fresh content, but highly dynamic feed generation can lead to inconsistent paging and repeated items.

Common compromise:

  • freeze a short-lived ranked window for a session or cursor
  • refresh when the user pulls to refresh or a new session begins

14.6 Backfill Strategies

Backfill is the content a feed falls back to when the natural candidate pool is sparse.

Examples:

  • new user with few follows
  • quiet time period with not enough new content
  • strict filters remove many candidates

Backfill sources may include:

  • trending content
  • suggested accounts or topics
  • evergreen content
  • sponsored content under policy rules

14.7 Ranking at Read Time vs Write Time

Write-time ranking:

  • rank or partially prepare content as it is distributed
  • fast reads
  • less flexible when user state changes

Read-time ranking:

  • more personalized and fresh
  • more expensive and latency-sensitive

Many systems use hybrid approaches: precompute easy parts, rank final candidates at read time.

14.8 Product Styles

X or Twitter-like following feed:

  • strong graph component
  • hybrid fanout patterns
  • freshness matters heavily

Instagram-like home feed:

  • mix of follow graph, engagement prediction, and recommendations
  • strong importance of re-ranking and diversity

TikTok-like For You feed:

  • candidate generation from broad corpus, not only follows
  • strong session-based ranking and content understanding

14.9 Common Failure Cases

  • stale caches showing deleted or blocked content
  • duplicated items across pages
  • expensive read-time ranking melting the service during traffic spikes
  • feedback loops making the feed monotonous
  • ranking bugs that over-prioritize one creator or content type

15. Object Storage

15.1 What Object Storage Is

Object storage stores data as objects, each usually accessed by a key within a bucket or namespace.

An object typically contains:

  • the binary content
  • metadata
  • a key or name

Object storage is the default storage layer for large unstructured blobs such as:

  • images
  • videos
  • PDFs
  • archives
  • backups
  • logs
  • user uploads

15.2 Why It Exists

Databases are good for structured records. Local disks are tied to one machine. Traditional file systems provide hierarchical paths and POSIX-like semantics.

Object storage exists because internet-scale systems need:

  • massive scale
  • high durability
  • relatively simple access patterns
  • low operational burden per file
  • cost-effective storage of large blobs

15.3 Object Storage vs Block Storage vs File Systems

| Dimension | Object Storage | Block Storage | File System |
|---|---|---|---|
| Interface | Key/object API | Raw blocks attached to machines | Files and directories |
| Typical use | Media, backups, logs, attachments | Databases, VM disks, low-level persistent volumes | Shared files, app files, local hierarchical access |
| Scaling model | Very large namespaces, distributed service | Usually attached volumes per instance or host | Depends on file system implementation |
| Mutation model | Often write whole objects or multipart operations | Fine-grained block updates | File operations with richer semantics |
| Strength | Durability and scale for blobs | Low-level performance control | Familiar file semantics |

15.4 Object Storage vs Database vs Local Disk

| Dimension | Object Storage | Database | Local Disk |
|---|---|---|---|
| Best for | Large blobs | Structured records and queries | Fast machine-local access |
| Query support | Minimal metadata lookup | Rich queries and indexes | Minimal unless app-managed |
| Durability model | Service-level replication or erasure coding | Depends on DB replication and backup | Depends on host and disk setup |
| Sharing | Easy across services | Structured access only | Tied to machine unless networked |
| Cost profile | Usually cheap per GB for large blobs | Higher for blob-heavy usage | Cheap locally but operationally limited |

15.5 Durability Concepts

Object stores are designed for very high durability.

Internally, that typically means some combination of:

  • replication across devices or zones
  • erasure coding
  • background integrity checks
  • repair workflows for lost fragments

Durability is different from availability.

An object can be highly durable but temporarily unavailable due to network or control-plane issues.

15.6 Scalability Characteristics

Object stores are designed for:

  • huge object counts
  • independent object retrieval
  • simple write and read APIs
  • high parallelism

They are not designed for transactional joins or row-level relational queries.

15.7 Object Immutability Concepts

Many object storage workflows treat objects as immutable.

Instead of modifying a large object in place, systems often:

  • upload a new version
  • update metadata pointers
  • rely on versioning for history

This simplifies distributed durability and caching.

15.8 Metadata and CDN Relationship

Object stores are often paired with:

  • a database for metadata and permissions
  • a CDN for global low-latency delivery

Why a CDN matters:

  • it caches content near users
  • reduces origin load
  • improves image and video latency dramatically

15.9 Cost Considerations

Cost is not just storage per GB.

Real costs include:

  • request volume
  • data transfer out
  • replication region choices
  • lifecycle tiering
  • media derivative explosion such as multiple image sizes and video renditions

Systems that look cheap at rest can become expensive in egress and processing.

15.10 Lifecycle Policies

Object stores often support lifecycle policies such as:

  • transition old objects to cheaper storage tiers
  • expire temporary uploads
  • clean abandoned multipart uploads
  • retain or lock data for compliance windows

This matters for backups, logs, and SaaS attachment retention.

16. S3-Style Storage

16.1 Buckets, Objects, and Keys

S3-style systems usually organize data as:

  • bucket: top-level namespace or container
  • object: stored binary plus metadata
  • key: object identifier within the bucket

Despite folder-like UIs, the key space is conceptually flat. Paths are usually naming conventions encoded into the key.
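
The sketch below imitates how a folder view can be derived from a flat key list using only a prefix and a delimiter. It is a simplified in-memory imitation of prefix listing, not a client for any real storage API; `list_prefix` and the sample keys are hypothetical.

```python
def list_prefix(keys, prefix="", delimiter="/"):
    """Split keys under `prefix` into direct objects and one-level 'folder' prefixes."""
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter behaves like a sub-folder name.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


keys = [
    "users/42/avatar.png",
    "users/42/docs/cv.pdf",
    "users/7/avatar.png",
    "readme.txt",
]
print(list_prefix(keys))
print(list_prefix(keys, prefix="users/42/"))
```

No directory objects exist anywhere in this model. The "folders" are computed from key names at listing time, which is why key naming conventions matter for listing and lifecycle rules.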

16.2 Versioning

Versioning keeps previous object versions when objects are overwritten or deleted.

Why it matters:

  • accidental delete recovery
  • auditability
  • rollback
  • safer overwrite semantics

16.3 Pre-Signed URLs

Pre-signed URLs allow temporary access to upload or download an object without proxying the bytes through the application server.

Why they are useful:

  • reduce backend bandwidth load
  • keep object store credentials hidden from clients
  • enforce short-lived, scoped access

This is one of the most common production upload and download patterns.
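
The idea behind a pre-signed URL can be sketched with a plain HMAC: the backend signs the method, key, and expiry with a secret the client never sees, and the storage edge recomputes the signature before serving. This is a simplified illustration, not the signing scheme of any real object store; `SECRET`, `sign_url`, `verify_url`, and the `objects.example.com` host are all assumptions.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret-key"  # hypothetical; held by the backend only


def _signature(method, key, expires):
    message = f"{method}\n{key}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_url(method, key, ttl_seconds, now=None):
    """Issue a URL valid for one method, one key, and a short time window."""
    now = int(time.time()) if now is None else now
    expires = now + ttl_seconds
    query = urlencode({"expires": expires, "sig": _signature(method, key, expires)})
    return f"https://objects.example.com/{key}?{query}"


def verify_url(method, url, now=None):
    """Recompute the signature and reject expired or tampered URLs."""
    now = int(time.time()) if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if now > expires:
        return False
    key = parsed.path.lstrip("/")
    return hmac.compare_digest(sig, _signature(method, key, expires))


url = sign_url("GET", "tenants/42/report.pdf", ttl_seconds=300, now=1000)
```

Because the method and key are part of the signed message, a download URL cannot be reused for an upload or pointed at a different object.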

16.4 Access Control Basics

Access control is usually layered:

  • bucket-level policies
  • object-level permissions in some systems
  • application-level authorization before issuing signed URLs
  • CDN or origin access policies

Best practice: do not make private content directly world-readable and hope the frontend hides the URL.

16.5 Multipart Uploads

Multipart uploads break a large object into parts.

Benefits:

  • retry only failed parts
  • upload parts in parallel
  • resume large uploads more efficiently

This is essential for video uploads, cloud drive systems, and large backups.

16.6 Consistency Basics

Modern object stores often provide strong consistency for many common operations within a region, but engineers should still think carefully about distributed workflows.

Why?

  • event notifications may arrive asynchronously
  • cross-region replication may lag
  • caches and CDNs may serve stale content
  • metadata DB updates may race with object lifecycle events

So even if storage reads are strongly consistent, the surrounding workflow may still be eventually consistent.

16.7 Event-Driven Workflows

Object creation often triggers downstream work:

  • antivirus scanning
  • image resizing
  • video transcoding
  • metadata extraction
  • OCR or transcription
  • search indexing

This is why object storage frequently sits at the center of async pipelines.

16.8 Real-World Examples

  • user uploads for profile photos or attachments
  • backup archives
  • application logs shipped to long-term storage
  • static site assets
  • media hosting for images and videos

17. Uploads and Downloads

17.1 Direct Upload vs Backend Proxy Upload

There are two common upload styles.

| Strategy | How it works | Good for | Main tradeoff |
| --- | --- | --- | --- |
| Direct upload | Client gets signed URL and uploads to object storage directly | Large media, high scale, low backend bandwidth | More client complexity and async workflow coordination |
| Backend proxy upload | Client uploads bytes to app backend, backend forwards or stores | Simpler auth/control, small files, strict validation workflows | Backend becomes bandwidth bottleneck |

For large-scale media systems, direct upload is usually preferred.

17.2 Browser Upload Flow

flowchart LR
	C[Client] --> API[Backend API]
	API --> AUTH[Authorize Upload]
	AUTH --> SIGN[Generate Signed Upload URL]
	SIGN --> C
	C --> OBJ[(Object Storage)]
	OBJ --> EVT[Object Created Event]
	EVT --> PROC[Scan / Extract / Transform]
	PROC --> DB[(Metadata DB)]
	DB --> READY[Asset Ready]

Typical steps:

  1. client asks backend to start upload
  2. backend authorizes user and creates upload record
  3. backend returns signed URL or multipart session data
  4. client uploads directly to object storage
  5. object store emits event
  6. async processors validate and enrich the asset
  7. metadata DB marks asset ready for use

17.3 Mobile Upload Considerations

Mobile uploads are harder because of:

  • flaky networks
  • app backgrounding
  • battery constraints
  • limited memory
  • varying file sizes and camera formats

Best practices:

  • resumable multipart uploads
  • persistent local upload state
  • idempotent retry tokens
  • chunk sizes tuned for mobile conditions

17.4 Secure Download Patterns

Secure download is often implemented with:

  • backend authorization check
  • short-lived signed URL or signed cookie
  • private object origin behind CDN
  • optional watermarking or audit logging for sensitive downloads

Typical SaaS pattern:

  • metadata and permissions live in the app DB
  • app checks access
  • app issues a short-lived signed URL for the specific file or CDN path

flowchart LR
	REQ[Client Requests File] --> API[Backend API]
	API --> AUTH[Authorize User / Tenant / ACL]
	AUTH --> SIGN[Issue Short-Lived Signed URL or Cookie]
	SIGN --> CDN[CDN / Private Origin]
	CDN --> OBJ[(Object Storage)]
	OBJ --> CDN
	CDN --> RESP[Client Downloads File]

17.5 Resumable Uploads

Resumable upload means the client can continue after failure without restarting from zero.

This matters for:

  • large videos
  • weak mobile connectivity
  • long uploads in browser tabs
  • enterprise file transfer systems

17.6 Retry Strategies

Retries must be designed carefully, because naive retries duplicate work and amplify load.

Good practice:

  • retry failed chunks, not the whole upload
  • use exponential backoff with limits
  • make upload initiation idempotent
  • track completed parts so retries do not duplicate work
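
The practices above can be sketched as a per-part retry loop with capped exponential backoff and jitter. `upload_with_retry`, `flaky_send`, and the default delays are hypothetical; the injectable `sleep` parameter exists only so the example runs without waiting.

```python
import random
import time


def upload_with_retry(send_part, part_number, data, max_attempts=5,
                      base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Retry a single part; the rest of the upload is untouched."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send_part(part_number, data)
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up after the configured limit
            # Exponential backoff capped at max_delay, with full jitter.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))


attempts = []
delays = []


def flaky_send(part_number, data):
    """Hypothetical transport that drops the first two attempts."""
    attempts.append(part_number)
    if len(attempts) < 3:
        raise ConnectionError("simulated network drop")
    return f"etag-{part_number}"


etag = upload_with_retry(flaky_send, 7, b"chunk-bytes", sleep=delays.append)
```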

17.7 Integrity Verification

File integrity matters because uploads can be corrupted or truncated.

Checks may include:

  • size verification
  • checksum per part and final checksum
  • content-type validation
  • file signature or magic-byte inspection

Do not trust only the filename extension.
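
A minimal sketch of these checks, assuming the client declares a size, SHA-256 digest, and content type when it starts the upload. The `validate_upload` helper and the small signature table are illustrative; a production system would cover many more formats.

```python
import hashlib

# Leading bytes ("magic bytes") for a few common formats.
MAGIC = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"%PDF-": "application/pdf",
}


def sniff_type(data):
    """Guess the content type from the file signature, not the filename."""
    for signature, mime in MAGIC.items():
        if data.startswith(signature):
            return mime
    return None


def validate_upload(data, declared_type, expected_size, expected_sha256):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if len(data) != expected_size:
        problems.append("size mismatch")
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        problems.append("checksum mismatch")
    if sniff_type(data) != declared_type:
        problems.append("content type does not match file signature")
    return problems


payload = b"%PDF-1.7 example body"
digest = hashlib.sha256(payload).hexdigest()
honest = validate_upload(payload, "application/pdf", len(payload), digest)
spoofed = validate_upload(payload, "image/png", len(payload), digest)
truncated = validate_upload(payload[:-3], "application/pdf", len(payload), digest)
```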

17.8 Antivirus Scanning Basics

Many production systems scan user uploads before making them broadly available.

Common flow:

  • upload lands in quarantine or pending state
  • scan job checks the file
  • file is promoted to usable state only after passing validation

This is common in SaaS attachment systems and customer-facing file upload platforms.
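
One way to enforce the quarantine flow is an explicit state machine on the metadata record, so that an asset cannot be marked ready without passing through a scan. The state names and the `Asset` class below are hypothetical.

```python
# Allowed transitions for a hypothetical asset record.
ALLOWED = {
    "pending_upload": {"uploaded", "failed"},
    "uploaded": {"scanning"},
    "scanning": {"ready", "quarantined", "failed"},
    "ready": set(),
    "quarantined": set(),
    "failed": set(),
}


class Asset:
    def __init__(self, asset_id):
        self.asset_id = asset_id
        self.state = "pending_upload"

    def transition(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state

    @property
    def downloadable(self):
        # Only scanned assets may be handed a signed download URL.
        return self.state == "ready"


asset = Asset("a1")
asset.transition("uploaded")
try:
    asset.transition("ready")  # skipping the scan is rejected
    skipped_scan = True
except ValueError:
    skipped_scan = False
asset.transition("scanning")
asset.transition("ready")
```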

17.9 Failure Cases

  • signed URL expired during large upload
  • backend marked metadata row ready before scan or transform completed
  • clients retried whole uploads and multiplied storage cost
  • insecure direct object exposure leaked private files

18. Chunked Uploads

18.1 Why Chunked Uploads Exist

Chunked or multipart uploads exist because large files are fragile to send as one giant request.

Problems with single-shot uploads:

  • one failure restarts the whole transfer
  • memory usage may be large
  • progress tracking is coarse
  • network interruptions waste more work

18.2 Multipart Upload Flow

flowchart LR
	INIT[Initiate Multipart Upload] --> PARTS[Upload Parts in Parallel]
	PARTS --> TRACK[Track Completed Part IDs + Checksums]
	TRACK --> COMPLETE[Complete Upload]
	COMPLETE --> ASSEMBLE[Store Final Object]

Typical process:

  1. client requests multipart upload session
  2. backend or object store returns upload ID and per-part instructions
  3. client uploads parts independently
  4. client records which parts succeeded
  5. final completion call assembles the object
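
The steps above can be sketched with an in-memory stand-in for the object store. `InMemoryMultipartStore` is hypothetical and uses a SHA-256 digest as the part identifier; real stores define their own ETag format.

```python
import hashlib
import uuid


class InMemoryMultipartStore:
    """Toy multipart store: initiate, upload parts, complete."""

    def __init__(self):
        self.sessions = {}
        self.objects = {}

    def initiate(self, key):
        upload_id = uuid.uuid4().hex
        self.sessions[upload_id] = {"key": key, "parts": {}}
        return upload_id

    def upload_part(self, upload_id, part_number, data):
        # Re-uploading the same part number overwrites it, so retries are safe.
        etag = hashlib.sha256(data).hexdigest()
        self.sessions[upload_id]["parts"][part_number] = (etag, data)
        return etag

    def complete(self, upload_id, manifest):
        """Assemble parts in part-number order after validating the manifest."""
        session = self.sessions[upload_id]
        chunks = []
        for part_number, etag in sorted(manifest):
            stored_etag, data = session["parts"][part_number]
            if stored_etag != etag:
                raise ValueError(f"part {part_number} does not match manifest")
            chunks.append(data)
        del self.sessions[upload_id]
        self.objects[session["key"]] = b"".join(chunks)
        return session["key"]


store = InMemoryMultipartStore()
upload_id = store.initiate("videos/demo.mp4")
etag2 = store.upload_part(upload_id, 2, b"world")   # parts may arrive out of order
etag1 = store.upload_part(upload_id, 1, b"hello ")
key = store.complete(upload_id, [(2, etag2), (1, etag1)])
```

Note that completion validates the client's manifest against what the store actually received, rather than trusting the client's claim that every part succeeded.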

18.3 Resume After Failure

To resume correctly, the client or backend needs state such as:

  • upload ID
  • part numbers
  • completed part identifiers or ETags
  • expected total file size
  • checksum state if used

This state may live in:

  • client local storage
  • backend DB
  • Redis for short-lived sessions
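
Given that state, resuming is mostly bookkeeping: work out which parts are still missing and which byte ranges they cover. The `plan_resume` helper below is a hypothetical sketch that assumes fixed-size parts numbered from 1.

```python
import math


def plan_resume(total_size, part_size, completed_parts):
    """Return (part_number, start, end) for every part not yet uploaded.

    `completed_parts` maps part number -> ETag recorded from earlier attempts.
    Byte ranges are half-open: [start, end).
    """
    total_parts = math.ceil(total_size / part_size)
    missing = []
    for number in range(1, total_parts + 1):
        if number in completed_parts:
            continue
        start = (number - 1) * part_size
        end = min(start + part_size, total_size)
        missing.append((number, start, end))
    return missing


remaining = plan_resume(total_size=25, part_size=10, completed_parts={1: "etag-1"})
```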

18.4 Parallel Uploads

Parallel part uploads improve throughput, especially for large files.

Tradeoffs:

  • more concurrency can increase speed
  • too much concurrency can overwhelm mobile devices, browsers, or rate limits

Chunk size and concurrency are tuning knobs.

18.5 Ordering and Assembly

Even when parts upload in parallel, the final object needs deterministic ordering.

The system uses part numbers or ordered manifests to assemble the correct final file.

18.6 Checksum Verification

Checksums can be used at:

  • per-part level
  • final object level

This helps detect corruption during transfer or assembly.

18.7 Large File Handling in Production

For very large uploads, systems often add:

  • rate limiting by user or tenant
  • upload quotas
  • expiration of abandoned multipart sessions
  • lifecycle cleanup of orphaned parts

Cloud drives and video platforms rely heavily on these controls.

18.8 Common Mistakes

  • not cleaning abandoned multipart parts
  • not persisting upload state for resume
  • choosing chunk sizes without considering mobile networks
  • trusting client-provided completion status without validation

19. Metadata

19.1 Why Metadata Exists Separately

Object storage is good at storing blobs, but most applications need richer metadata and business rules around each file.

Examples of metadata:

  • owner user or tenant
  • permission model
  • file name and original content type
  • processing status
  • timestamps
  • retention or legal hold flags
  • checksum and size
  • references to derived assets such as thumbnails or transcoded variants

This metadata is usually stored in a database, not only inside the object store.

19.2 Database + Object Storage Relationship

Common pattern:

  • object store holds the bytes
  • DB holds metadata, ownership, permissions, and workflow state

Why this split is useful:

  • queries are easier
  • business transactions stay in the DB
  • permissions are easier to manage
  • processing state is easier to update

Stripe-like or GitHub-like systems use this model for user files, exports, logs, and compliance artifacts.

19.3 Ownership and Permissions

Metadata tables often model:

  • user ownership
  • organization or tenant ownership
  • access scopes
  • sharing links
  • expiration rules

The object key alone should not be the permission system.

19.4 Indexing Metadata

Applications often need search over metadata, not the object bytes themselves.

Examples:

  • search files by filename, tag, owner, created date
  • search PDFs by OCR text and metadata
  • search media library by dimensions, duration, language, transcript terms

This is where search and file systems meet.

19.5 Audit Logging

Sensitive file systems often log:

  • who uploaded a file
  • who downloaded it
  • who changed permissions
  • who deleted or restored it

Audit trails are critical in enterprise SaaS, finance, healthcare, and security tools.

19.6 Soft Delete Patterns

Soft delete means a record is marked deleted in metadata before hard physical removal.

Why it helps:

  • recovery from mistakes
  • retention window enforcement
  • asynchronous cleanup jobs

The object may remain in storage until retention rules allow permanent deletion.
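
A minimal sketch of the pattern, assuming a `deleted_at` timestamp on the metadata row and a 30-day retention window. The `FileRecord` type, the helpers, and the window length are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION = timedelta(days=30)  # hypothetical recovery window


@dataclass
class FileRecord:
    file_id: str
    object_key: str
    deleted_at: Optional[datetime] = None


def soft_delete(record, now):
    """Mark the row deleted; the object bytes are left in storage."""
    if record.deleted_at is None:
        record.deleted_at = now


def restore(record):
    record.deleted_at = None


def purgeable(records, now, retention=RETENTION):
    """Rows whose retention window has passed; safe for the cleanup job."""
    return [r for r in records
            if r.deleted_at is not None and now - r.deleted_at >= retention]


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
old = FileRecord("f1", "tenants/42/a.pdf")
recent = FileRecord("f2", "tenants/42/b.pdf")
soft_delete(old, now - timedelta(days=31))
soft_delete(recent, now - timedelta(days=2))
to_purge = purgeable([old, recent], now)
```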

19.7 Retention Policies

Retention policies control how long data stays available or must be preserved.

This matters for:

  • compliance
  • customer contracts
  • internal audits
  • backup windows

19.8 Why Metadata Usually Does Not Live Only in the Object Store

Because object stores are not designed to answer business questions efficiently.

Questions like these belong in a metadata DB or search index:

  • show all files owned by tenant X uploaded in the last 7 days
  • list all pending virus scans
  • find all files shared externally
  • restore the previous version of this document

20. Versioning

20.1 Object Versioning

Versioning means keeping multiple historical states of an object rather than replacing the old state permanently.

This supports:

  • rollback after mistakes
  • accidental deletion protection
  • auditability
  • legal retention

20.2 File History

In user-facing products such as Google Drive or Dropbox, file history is a product feature.

Under the hood, this may mean:

  • multiple object versions in storage
  • metadata rows pointing to the active version
  • retention rules controlling how long old versions are kept

20.3 Rollback Capability

Rollback often means changing metadata to point to an older version rather than mutating the object in place.

This is simpler and safer in distributed systems.

20.4 Accidental Deletion Protection

Versioning helps because delete can mean:

  • create a delete marker
  • hide current version
  • retain earlier versions for recovery

This protects users from mistakes and protects operators from incidents.
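
Delete markers and pointer-based rollback can be sketched with an append-only version list. `VersionedObject` is a hypothetical in-memory model; real stores use opaque version IDs rather than list indexes.

```python
class VersionedObject:
    """Append-only version history with delete markers."""

    DELETE_MARKER = object()

    def __init__(self):
        self.versions = []  # oldest first; nothing is ever mutated in place

    def put(self, data):
        self.versions.append(data)
        return len(self.versions) - 1  # version id

    def delete(self):
        # Deleting only hides the current version behind a marker.
        self.versions.append(self.DELETE_MARKER)

    def get(self, version_id=None):
        if not self.versions:
            raise KeyError("no such object")
        item = self.versions[-1] if version_id is None else self.versions[version_id]
        if item is self.DELETE_MARKER:
            raise KeyError("object is deleted")
        return item

    def restore(self, version_id):
        # Rollback makes old content current again as a new version.
        return self.put(self.get(version_id))


doc = VersionedObject()
v0 = doc.put(b"draft")
v1 = doc.put(b"final")
doc.delete()
try:
    doc.get()
    visible_after_delete = True
except KeyError:
    visible_after_delete = False
doc.restore(v1)
```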

20.5 Legal Holds and Compliance

Some systems need to prevent deletion for compliance or litigation.

That means versioning and retention logic must support:

  • immutable retention windows
  • hold flags
  • auditability around deletion attempts

20.6 Overwrite Semantics

Without versioning, overwrite means old content is lost.

With versioning, overwrite usually means:

  • new object version becomes current
  • old version remains restorable until policy cleanup

20.7 Tradeoffs

  • better safety and recovery
  • higher storage cost
  • more complex metadata and lifecycle management

Production systems usually accept that tradeoff for important user data.

21. Image Optimization

21.1 Why Image Optimization Exists

Raw uploaded images are often too large, too slow, or in the wrong format for user-facing delivery.

Image optimization exists to improve:

  • page load time
  • bandwidth cost
  • visual quality per byte
  • rendering on different device sizes

This matters for profile pictures, e-commerce catalogs, social posts, dashboards, and documentation platforms.

21.2 Resizing and Thumbnails

Common derivatives:

  • thumbnail
  • small card image
  • medium detail view
  • large zoomable image

Why precompute sizes:

  • repeated on-the-fly resizing is expensive
  • predictable variants simplify caching
  • UI surfaces often reuse the same size classes
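
A small sketch of planning precomputed variants, assuming fixed size classes keyed by target width. The class names and widths are hypothetical; the point is preserving aspect ratio and never upscaling a small original.

```python
# Hypothetical size classes: variant name -> target width in pixels.
VARIANT_WIDTHS = {"thumbnail": 160, "card": 400, "detail": 1024, "zoom": 2048}


def plan_variants(width, height):
    """Return {variant: (width, height)} for an original of the given size."""
    plan = {}
    for name, target_width in VARIANT_WIDTHS.items():
        w = min(target_width, width)          # never upscale
        h = max(1, round(height * w / width))  # keep the aspect ratio
        plan[name] = (w, h)
    return plan


large = plan_variants(4000, 3000)
small = plan_variants(300, 200)
```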

21.3 Responsive Image Delivery

Different devices need different image sizes.

Serving a giant desktop asset to a mobile client wastes bandwidth.

Responsive delivery uses:

  • multiple size variants
  • CDN selection or URL conventions
  • client hints or frontend image markup strategies

21.4 Format Conversion Basics

Common formats:

  • JPEG: widely compatible, good for photos
  • PNG: lossless and good for sharp graphics or transparency-heavy content
  • WebP: often smaller than JPEG/PNG for many web cases
  • AVIF: often strong compression efficiency, but with ecosystem and encoding tradeoffs

The right format depends on content type, compatibility, and CPU cost.
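
Compatibility is often handled by negotiating on the request's `Accept` header. The sketch below is deliberately simplified: the `pick_format` helper is hypothetical, it ignores q-values, and the preference order (AVIF, then WebP, then a universally supported fallback) is an assumption rather than a rule.

```python
def pick_format(accept_header, has_alpha=False):
    """Choose an output format the client says it can decode."""
    accepted = {part.split(";")[0].strip().lower()
                for part in accept_header.split(",")}
    for mime in ("image/avif", "image/webp"):
        if mime in accepted:
            return mime
    # Fallbacks that effectively every client supports.
    return "image/png" if has_alpha else "image/jpeg"


modern = pick_format("image/avif,image/webp,image/*;q=0.8")
webp_only = pick_format("image/webp,*/*")
legacy_photo = pick_format("*/*")
legacy_logo = pick_format("*/*", has_alpha=True)
```

When the response varies by `Accept`, the CDN cache key has to include the chosen format, otherwise one client's variant may be served to another.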

21.5 Lazy Loading Relevance

Lazy loading reduces unnecessary downloads for off-screen images.

This is mostly a frontend delivery concern, but backend and CDN design still matter because smaller, optimized derivatives make lazy loading much more effective.

21.6 CDN Image Optimization

Some systems optimize images on request at the edge or CDN layer.

Benefits:

  • fewer precomputed variants needed
  • flexible resizing
  • device-aware format negotiation

Tradeoffs:

  • added compute cost at the edge or image service
  • cache fragmentation if variant space is uncontrolled
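
Cache fragmentation is usually controlled by snapping arbitrary requests onto a small set of allowed variants before building the cache key. The width buckets and helper names below are hypothetical.

```python
import bisect

ALLOWED_WIDTHS = [160, 320, 640, 1024, 2048]  # hypothetical buckets, sorted


def normalize_width(requested):
    """Round up to the nearest allowed width, capped at the largest bucket."""
    index = bisect.bisect_left(ALLOWED_WIDTHS, requested)
    return ALLOWED_WIDTHS[min(index, len(ALLOWED_WIDTHS) - 1)]


def cache_key(asset_id, requested_width, fmt):
    return f"{asset_id}/w{normalize_width(requested_width)}.{fmt}"


# Requests for 301, 310, and 320 pixels all share one cached variant.
shared = {cache_key("img-1", w, "webp") for w in (301, 310, 320)}
```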

21.7 Quality vs Size Tradeoffs

Image optimization is always a tradeoff.

Too much compression causes artifacts. Too little wastes bandwidth and storage.

E-commerce sites care a lot here because product clarity affects conversion, but slow pages also hurt conversion.

21.8 Async Processing Pipelines

Common image flow:

  1. upload original asset
  2. validate and scan
  3. generate variants
  4. store derivatives in object storage
  5. publish metadata and CDN paths

Profile pictures, avatar systems, and commerce catalogs commonly use this asynchronous derivative pipeline.

22. Video Transcoding

22.1 Why Transcoding Exists

A raw uploaded video is rarely suitable for direct delivery to every device and network condition.

Transcoding converts the input into delivery-friendly formats and variants.

Why it is needed:

  • devices support different codecs and containers
  • users have different bandwidth conditions
  • multiple resolutions are needed
  • streaming systems need chunked delivery formats

22.2 Codec Basics

A codec defines how video and audio are compressed and decoded.

You do not need deep media math in most interviews, but you should know:

  • codecs affect compatibility, compression efficiency, and CPU cost
  • better compression often costs more encode time
  • playback support across devices matters as much as compression ratio

22.3 Bitrate Adaptation and Resolution Variants

Adaptive streaming works by producing multiple renditions such as:

  • 240p low bitrate
  • 480p medium bitrate
  • 720p or 1080p higher bitrate

The player switches between them based on network and device conditions.

This reduces buffering and improves startup reliability.
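
A very rough sketch of the player-side decision, assuming a fixed bitrate ladder and a safety margin on measured throughput. The ladder values and the `choose_rendition` helper are illustrative; real players also consider buffer level, screen size, and recent switching history.

```python
# Hypothetical bitrate ladder: (rendition, required kilobits per second).
RENDITIONS = [("240p", 400), ("480p", 1200), ("720p", 2800), ("1080p", 5000)]


def choose_rendition(measured_kbps, safety=0.8):
    """Pick the highest rendition that fits within a fraction of throughput."""
    budget = measured_kbps * safety
    best = RENDITIONS[0][0]  # always fall back to the lowest rendition
    for name, required_kbps in RENDITIONS:
        if required_kbps <= budget:
            best = name
    return best


weak_mobile = choose_rendition(300)
average = choose_rendition(2000)
good_wifi = choose_rendition(4000)
fiber = choose_rendition(20000)
```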

22.4 HLS and DASH Basics

HLS and DASH are common adaptive streaming approaches.

High-level idea:

  • split video into chunks or segments
  • generate manifests listing available renditions and segments
  • player fetches segments dynamically based on bandwidth and playback logic
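
For HLS, the top-level manifest is a plain-text master playlist that lists each rendition with its bandwidth and resolution. The sketch below builds one; the rendition values and URIs are hypothetical, and only the basic `#EXTM3U` and `#EXT-X-STREAM-INF` tags are shown.

```python
def master_playlist(renditions):
    """Build a minimal HLS master playlist as text."""
    lines = ["#EXTM3U"]
    for r in renditions:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={r['bandwidth']},"
            f"RESOLUTION={r['width']}x{r['height']}"
        )
        lines.append(r["uri"])  # per-rendition media playlist with segments
    return "\n".join(lines) + "\n"


playlist = master_playlist([
    {"bandwidth": 400000, "width": 426, "height": 240, "uri": "240p/index.m3u8"},
    {"bandwidth": 2800000, "width": 1280, "height": 720, "uri": "720p/index.m3u8"},
])
```

Each referenced media playlist would in turn list the segment files for that rendition.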

22.5 Async Job Processing

Video transcoding is expensive, so it is almost always asynchronous.

Common architecture:

flowchart LR
	UP[Uploaded Video] --> Q[Queue / Job Orchestrator]
	Q --> TR[Transcoding Workers]
	TR --> PACK[Package HLS / DASH Variants]
	TR --> THUMB[Thumbnail Extraction]
	PACK --> OBJ[(Object Storage)]
	THUMB --> OBJ
	OBJ --> CDN[CDN]

22.6 Queue-Based Transcoding Systems

Why queues are used:

  • uploads arrive in bursts
  • transcoding jobs vary wildly by duration and cost
  • retries and worker scaling need decoupling
  • backpressure needs to be explicit

Workers may be specialized by codec, region, or job size.
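
Explicit backpressure can be sketched with a bounded in-process queue: when the queue is full, submission fails fast instead of piling up work. A production system would use a durable queue or job orchestrator; the capacity, job names, and helpers here are hypothetical.

```python
import queue
import threading

jobs = queue.Queue(maxsize=2)  # small bound so backpressure is visible
results = []


def submit(job):
    """Return False when the queue is full so callers can defer or shed load."""
    try:
        jobs.put_nowait(job)
        return True
    except queue.Full:
        return False


def worker():
    while True:
        job = jobs.get()
        if job is None:  # sentinel: shut down
            jobs.task_done()
            break
        results.append(f"transcoded:{job}")  # stand-in for real transcoding
        jobs.task_done()


# A burst of three uploads arrives before any worker is running.
accepted = [submit(name) for name in ["a.mp4", "b.mp4", "c.mp4"]]

thread = threading.Thread(target=worker)
thread.start()
jobs.join()      # wait for the accepted jobs to finish
jobs.put(None)
thread.join()
```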

22.7 Storage Explosion Challenges

Video systems multiply data quickly.

One uploaded asset may create:

  • multiple renditions
  • audio tracks
  • captions or subtitles
  • thumbnails
  • preview clips
  • manifests

This is why video platforms think carefully about retention, renditions, archival tiers, and whether every input truly needs every derivative.
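
A back-of-the-envelope estimate makes the multiplication concrete. The bitrate ladder below is an assumption chosen for illustration, and the calculation ignores audio, captions, thumbnails, and container overhead.

```python
# Hypothetical ladder: rendition -> video bitrate in kilobits per second.
LADDER_KBPS = {"240p": 400, "480p": 1200, "720p": 2800, "1080p": 5000}


def rendition_bytes(kbps, seconds):
    """Approximate size of one rendition: bitrate x duration, in bytes."""
    return kbps * 1000 * seconds // 8


duration_seconds = 10 * 60  # one 10-minute upload
sizes = {name: rendition_bytes(kbps, duration_seconds)
         for name, kbps in LADDER_KBPS.items()}
total_bytes = sum(sizes.values())
total_megabytes = total_bytes / 1_000_000
```

Under these assumptions a single 10-minute upload produces roughly 705 MB of renditions, before the original file and any replication are counted.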

22.8 Playback Optimization

Playback quality depends on more than transcoding.

It also depends on:

  • segment sizing
  • CDN placement
  • startup buffer strategy
  • manifest design
  • thumbnail or preview availability
  • device compatibility testing

YouTube- and Netflix-style systems invest heavily in startup latency and rebuffer reduction because user drop-off is highly sensitive to playback quality.

22.9 Common Failure Cases

  • queue backlogs causing long time-to-playable
  • corrupted input files causing worker crashes
  • incompatible renditions for some clients
  • expensive reprocessing after pipeline changes
  • storage growth from keeping every derivative forever

23. Thumbnails and Previews

23.1 Why Thumbnails Exist

Thumbnails help users decide what to open before downloading or playing the full asset.

They matter for:

  • video browsing
  • PDF and document explorers
  • image galleries
  • cloud drive UIs
  • e-commerce product grids

23.2 Video Thumbnail Generation

Video thumbnails may be generated by:

  • picking fixed offsets
  • picking keyframes
  • choosing frames based on saliency or quality heuristics

Why keyframe selection matters:

  • a black frame or transition frame makes the content look broken
  • better thumbnails improve click-through rate (CTR) and perceived quality

23.3 Document Preview Generation

Documents such as PDFs or presentations often need previews.

Typical flow:

  • extract first page image or several page previews
  • store derived images in object storage
  • cache them behind a CDN

23.4 Async Preview Generation

Preview generation is usually asynchronous because:

  • file types vary
  • extraction can be CPU-heavy
  • malformed files must be isolated from the main request path

23.5 Caching Strategies

Thumbnail and preview requests are highly cacheable.

Common patterns:

  • store deterministic derivative paths
  • serve from CDN
  • cache variants aggressively as immutable when the version is encoded in the URL
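
One way to get deterministic, versioned derivative paths is to embed a content hash of the source asset in the URL, so a replaced file naturally gets a new path. The path layout and `derivative_path` helper are hypothetical; the `Cache-Control` value uses the standard `max-age` and `immutable` directives.

```python
import hashlib


def derivative_path(asset_id, content, variant, ext):
    """Same content always maps to the same path; new content gets a new one."""
    version = hashlib.sha256(content).hexdigest()[:12]
    return f"/media/{asset_id}/{version}/{variant}.{ext}"


# Safe only because the path changes whenever the content changes.
IMMUTABLE_HEADERS = {"Cache-Control": "public, max-age=31536000, immutable"}

first = derivative_path("doc-9", b"original bytes", "preview-page-1", "png")
again = derivative_path("doc-9", b"original bytes", "preview-page-1", "png")
replaced = derivative_path("doc-9", b"replacement bytes", "preview-page-1", "png")
```

With this layout, uploading a new version does not require purging the old preview from the CDN, because clients are simply given the new path.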

23.6 Failure Cases

  • preview service timing out on large or malformed documents
  • thumbnail chosen from poor video frame
  • preview cache not invalidated after replacement or new version upload

24. Compression

24.1 Why Compression Exists

Compression reduces storage and network transfer size.

It matters because:

  • bandwidth is expensive
  • users have limited network quality
  • storage multiplies at scale
  • smaller payloads improve latency when CPU cost is acceptable

24.2 Lossless vs Lossy

| Type | Meaning | Common use |
| --- | --- | --- |
| Lossless | Original data can be reconstructed exactly | Text, archives, logs, some images, many document workflows |
| Lossy | Some information is discarded for higher compression | Photos, audio, video |

24.3 CPU Tradeoffs

Compression is not free.

Stronger compression can save bandwidth or storage but cost more CPU and latency.

This is why production systems choose compression differently for:

  • hot online serving
  • asynchronous background processing
  • archival storage
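
The tradeoff can be explored with the standard `zlib` module, which exposes compression levels from 1 (fastest) to 9 (most thorough). The log-like sample data is made up, and actual sizes and timings depend heavily on the input, so the sketch reports them rather than assuming a result.

```python
import time
import zlib

# Hypothetical repetitive, log-like payload.
data = b"2024-01-01T00:00:00Z INFO request served path=/api/items status=200\n" * 2000


def measure(level):
    """Return (compressed_size, elapsed_seconds) for one compression level."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # Lossless: the original bytes must come back exactly.
    assert zlib.decompress(compressed) == data
    return len(compressed), elapsed


fast_size, fast_time = measure(1)   # candidate for hot online serving
small_size, small_time = measure(9)  # candidate for archival storage
print(f"level 1: {fast_size} bytes in {fast_time:.4f}s")
print(f"level 9: {small_size} bytes in {small_time:.4f}s")
```

Higher levels typically trade more CPU time for smaller output, which is why archival jobs can afford them and latency-sensitive paths often cannot.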

24.4 Compression in Uploads and Downloads

Compression may happen at multiple places:

  • client-side before upload in some media apps
  • server-side during processing
  • CDN or transport layer for text-based responses
  • archival pipeline for logs and backups

Do not blindly recompress already compressed media formats. It may waste CPU and reduce quality.

24.5 Media-Specific Considerations

Images and videos already use domain-specific codecs. General-purpose compression on top often gives limited benefit.

Instead, media optimization usually means:

  • choosing the right codec or format
  • choosing quality settings carefully
  • generating the right resolution variants

24.6 Archive Workflows

Archive workflows such as backups and log retention often use strong compression because:

  • data is cold
  • latency matters less
  • storage savings compound heavily over time

24.7 Where Compression Should Happen

Good rule of thumb:

  • compress once in a controlled part of the pipeline
  • avoid repeated transcoding or recompression unless there is a clear reason
  • separate serving optimization from archival optimization

24.8 Common Mistakes

  • recompressing lossy media too many times
  • optimizing for smallest size and harming user experience
  • using CPU-heavy compression in hot request paths without need

25. How These Systems Connect in Real Architectures

The most useful mental model is that search, recommendation, and media systems are not isolated services. They are derived-serving systems around a source-of-truth core.

25.1 Example: E-Commerce Architecture

Product system:

  • DB stores product, inventory, seller, and pricing records
  • search index stores analyzed text plus filterable fields
  • ranking combines lexical relevance, popularity, margin, seller trust, and availability
  • recommendation system generates related items and home feed modules
  • object storage stores product images and videos
  • image pipeline generates thumbnails and responsive variants
  • CDN serves optimized media globally

Failure discussion:

  • stale index may show out-of-stock products
  • stale media caches may show old images after updates
  • poor ranking can bury relevant products even if retrieval worked

25.2 Example: SaaS Document Platform

Document system:

  • metadata DB stores ownership, permissions, and workflow state
  • object storage stores uploaded files and derived previews
  • search index stores filename, OCR text, comments, tags, and ACL-aware retrieval fields
  • recommendation or discovery surface suggests recent or relevant docs
  • signed URLs protect downloads
  • antivirus and preview generation run asynchronously

Failure discussion:

  • ACL lag can leak confidential docs if search filtering is wrong
  • metadata/object mismatch can show broken files
  • preview lag makes the product feel stale even when upload technically succeeded

25.3 Example: Short-Form Video Platform

flowchart LR
	CREATOR[Creator Upload] --> OBJ[(Object Storage)]
	OBJ --> MEDIA[Transcode + Thumbnail + Moderation]
	MEDIA --> CDN[CDN]
	MEDIA --> META[(Metadata DB)]
	META --> INDEX[Search / Hashtag Index]
	META --> REC[Candidate Generation + Ranking]
	REC --> FEED[Home Feed Service]
	FEED --> USER[Viewer]
	CDN --> USER

System properties:

  • upload pipeline must be reliable and resumable
  • media pipeline must scale with bursty creator traffic
  • recommendation system must generate candidates and rank them quickly
  • search index may support creators, hashtags, captions, or sounds
  • CDN must absorb global playback traffic

25.4 Common Cross-System Failure Modes

  • source-of-truth DB updated but search index stale
  • object uploaded but metadata row missing
  • metadata row exists but object processing failed
  • recommendation service uses stale features or bad model rollout
  • CDN serves old media after overwrite because cache keying is wrong
  • ACL changes propagate inconsistently across DB, search, cache, and signed download logic

25.5 Strong Engineering Principles Across All These Systems

  • keep a clear source of truth
  • treat indexes, feeds, and media derivatives as derived serving layers
  • design async pipelines to be idempotent and replay-safe
  • measure freshness, not just latency
  • plan for backfills and reprocessing from day one
  • separate hard constraints from soft ranking
  • build for partial degradation, not only full success
  • treat permissions as first-class, especially in search and file access
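
The idempotency principle can be sketched as an event handler that records which event IDs it has already applied, so duplicate deliveries and replays become no-ops. The event shape, the in-memory `processed` set, and the derivative naming are hypothetical; a real system would persist this state alongside the metadata.

```python
processed = set()   # event ids already applied
derivatives = {}    # object key -> derived thumbnail key


def handle_object_created(event):
    """Apply an event at most once; return True only if work was done."""
    if event["event_id"] in processed:
        return False  # duplicate delivery or replay
    derivatives[event["key"]] = f"thumbs/{event['key']}"
    processed.add(event["event_id"])
    return True


event = {"event_id": "evt-1", "key": "uploads/cat.jpg"}
first_delivery = handle_object_created(event)
duplicate_delivery = handle_object_created(event)
```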

26. Interview Playbook

If you are asked to design one of these systems, structure your answer around these questions:

  1. What is the user-facing behavior and latency expectation?
  2. What is the source of truth?
  3. What derived indexes or serving structures are needed?
  4. What is precomputed vs done online?
  5. How does ranking or retrieval work?
  6. What are the main failure cases and stale-data risks?
  7. How do permissions, abuse, and compliance affect the design?
  8. What changes at 10x scale?

26.1 High-Value Tradeoffs to Discuss

  • database query vs specialized search index
  • lexical retrieval vs semantic retrieval vs hybrid
  • freshness vs query performance
  • fanout-on-write vs fanout-on-read
  • precompute vs read-time ranking
  • direct upload vs backend proxy upload
  • object storage vs block storage vs file systems
  • exact counts vs approximate facets
  • aggressive personalization vs fairness and exploration

26.2 What Breaks at Scale

  • long-tail shard latency dominates search response time
  • popular prefixes overload autocomplete caches
  • ranking models become too expensive for online serving
  • celebrity fanout explodes write amplification
  • object store costs spike from derivative explosion and egress
  • background media pipelines backlog during traffic bursts
  • stale caches and delayed async processing create user-visible inconsistency

26.3 Final Mental Model

Search systems answer explicit intent.

Recommendation systems infer likely intent.

Object storage and media pipelines make large assets durable and deliverable.

The engineering challenge is rarely the isolated component. It is the interaction between source data, derived indexes, ranking, caching, asynchronous processing, permissions, and scale.

That is the level interviews usually want, and it is also the level production systems demand.