TarunElango/Computer-Fundamentals

tarun-elango 26810e43d0 sd text

2026-04-26 13:27:19 -04:00

Search & Discovery + Media & File Systems

This chapter covers two families of systems that appear in almost every large product:

systems that decide what users can find
systems that store, transform, and deliver large binary data such as images, videos, documents, logs, and backups

In interviews, these topics are often split into separate questions: "Design search for an e-commerce app", "Design Instagram feed", "Design file upload for a SaaS product", or "Design YouTube video processing".

In production, they are deeply connected.

An e-commerce product page may depend on:

a transactional database for product metadata
a search index for keyword retrieval
a ranking system for relevance and business rules
a recommendation system for discovery
an object store for images and video
a CDN for fast delivery
an async media pipeline for thumbnails and optimization

So the real engineering question is not, "What is search?" or "What is S3?". The real question is:

How do these systems work together under scale, latency pressure, stale data, changing ranking logic, unreliable networks, expensive media processing, and strict security requirements?

This guide is written for both interview preparation and real backend engineering. The goal is not to memorize terms. The goal is to understand:

why each system exists
what problem it solves that simpler systems do not
how it works internally
what fails at scale
what tradeoffs strong engineers discuss in interviews
how companies actually combine these components in production

Examples in this guide are generalized from common patterns publicly discussed by companies such as Google, Amazon, Netflix, Uber, YouTube, Instagram, TikTok, GitHub, Stripe, and large SaaS platforms.

1. Big Picture: Why These Topics Belong Together

Search, discovery, and media systems all sit on the user-facing edge of backend engineering.

They are the systems users feel immediately:

search that returns the wrong result feels broken
autocomplete that lags feels cheap
feeds that repeat stale content feel low quality
uploads that fail at 95 percent feel unreliable
videos that buffer or thumbnails that look wrong feel unfinished

These systems also force tradeoffs faster than many internal systems:

relevance vs latency
freshness vs throughput
quality vs compute cost
personalization vs privacy
durability vs storage cost
precomputation vs flexibility

1.1 One Product, Many Subsystems

flowchart LR
	U[User] --> APP[API / App Backend]
	APP --> DB[(Primary DB)]
	DB --> CDC[CDC / Change Events]
	CDC --> SEARCH[Search Index]
	CDC --> FEAT[Feature / Event Pipeline]
	APP --> REC[Recommendation / Feed Service]
	FEAT --> REC
	REC --> CACHE[(Feed Cache)]
	APP --> OBJ[(Object Storage)]
	OBJ --> MEDIA[Media Processing Pipeline]
	MEDIA --> CDN[CDN]
	APP --> CDN
	SEARCH --> APP
	CACHE --> APP

This architecture is common across many products:

Amazon-like commerce: product DB, search index, ranking, recommendations, images in object storage
GitHub-like SaaS: repo and issue metadata in DB, code or issue search index, attachments in object storage, permissions filtering everywhere
YouTube or TikTok: metadata DB, feed ranking system, object storage, transcoding pipeline, CDN delivery
Stripe-like internal document systems: metadata DB, audit logs, secure file storage, signed downloads, retention policies

1.2 Search vs Discovery

These ideas are related but not identical.

Dimension	Search	Discovery / Recommendation
User intent	Explicit	Often implicit
Input	Query, filters, sort	User profile, session, context, behavior
Goal	Find what user asked for	Show what user is likely to want
Retrieval basis	Query-document match	Candidate generation from many signals
Example	"wireless headphones"	"Products you may like"

Search is usually intent retrieval. Recommendation is usually intent inference.

The best products use both.

1.3 Latency Expectations

Users tolerate different delays for different surfaces.

Surface	Typical expectation	Why it matters
Autocomplete	Often under 50 ms server-side and in the low hundreds of ms end-to-end	Typing feels broken if suggestions lag
Search results	Often around 100-300 ms for the first result page	Search is interactive and abandonment is high
Home feed	Often low hundreds of ms for first page, with prefetching and caching	Users expect quick app open
File upload initiation	Usually should start immediately	Perceived responsiveness matters
Image delivery	Often tens of ms from edge	Visual surfaces must render fast
Video playback startup	Usually a few hundred ms to a few seconds depending on network and buffer policy	Startup delay strongly affects engagement

The exact target depends on product and network conditions, but the theme is consistent: these are latency-sensitive systems.

1.4 Interview Framing

When interviewers ask about search, feeds, or media, they are usually evaluating whether you can reason about:

data flow from source of truth to user-facing surface
specialized indexes or storage layouts
online vs offline computation
scale bottlenecks and hotspots
correctness and security boundaries
degradation behavior when one subsystem is stale or partially unavailable

Strong answers start from user behavior and workload shape, not from product names.

2. Search System

2.1 What a Search System Is

A search system is a retrieval system that helps users find relevant documents, records, products, posts, issues, places, or media based on a query.

The important word is relevant.

Databases already know how to retrieve data, so why do search systems exist?

Because most user-facing search is not exact lookup. It is approximate, text-heavy, fuzzy, relevance-ordered retrieval.

Examples:

a user types "noise cancel headphone" and expects products matching "noise-canceling headphones"
a developer searches GitHub issues for a phrase and expects typo tolerance, ranking, and permission-safe results
a job seeker searches roles by title, seniority, remote status, and location
a rider searches for a place in Uber and expects prefix matching, geospatial awareness, and ranking by context

2.2 Why Databases Alone Are Often Not Enough

Relational databases and standard secondary indexes are optimized for exact lookups, range scans, joins, and transactional workloads. They are not primarily optimized for large-scale full-text ranking.

If you try to implement serious search with only a transactional database, you quickly run into problems:

LIKE '%term%' queries do not scale well for large text corpora
phrase search and ranking are limited or expensive
stemming, synonyms, language analyzers, and typo tolerance are not first-class features in many OLTP systems
scoring millions of candidate documents with relevance functions is not what OLTP engines are optimized for
query patterns are highly varied and difficult to serve with normal B-tree indexes alone

2.3 Full-Text Search vs Exact Lookup

Dimension	Exact DB Lookup	Full-Text Search
Match type	Exact key, range, prefix in some cases	Token-based, phrase-based, fuzzy, semantic, ranked
Result ordering	Usually explicit sort order	Usually relevance first
Storage layout	Row-oriented or index-oriented for structured fields	Inverted indexes, postings, specialized scoring metadata
Common use	User by ID, order by timestamp	Search products, documents, issues, articles
Optimization target	Transactional correctness and predictable queries	Fast retrieval and ranking over large corpora

Interview shortcut: databases answer "Which rows satisfy these predicates?" Search systems answer "Which documents are most relevant to what the user probably meant?"

2.4 Architecture of a Search System

At a high level, production search systems split into two pipelines:

indexing pipeline: turns source data into searchable indexes
query pipeline: turns a user query into ranked results

flowchart LR
	SRC[Source of Truth<br/>DB / CMS / Event Log] --> ING[Ingestion]
	ING --> IDX[Index Build / Update]
	IDX --> SHARDS[Search Shards + Replicas]
	Q[User Query] --> API[Search API]
	API --> PARSE[Query Parsing / Rewrite]
	PARSE --> COORD[Coordinator]
	COORD --> SHARDS
	SHARDS --> RANK[Merge + Rank]
	RANK --> RES[Results]

The source of truth is usually not the search index itself. It is usually:

a relational DB
a document database
a content management system
a stream of events
a crawler pipeline in web search

The search index is a serving structure optimized for retrieval, not necessarily for canonical storage.

2.5 How Production Search Differs from Normal Database Queries

Production search systems usually have to solve problems that ordinary OLTP queries do not:

tokenization and normalization
ranking by multiple signals
approximate matching
synonym expansion
language-aware analysis
ACL or permission filtering
faceting and filtering at scale
scatter-gather across many shards
near-real-time indexing with eventual consistency
degraded behavior under partial shard failures

That is why many production systems use specialized engines such as Elasticsearch, OpenSearch, Solr, Vespa, Lucene-based services, or internal retrieval systems.

2.6 Query Pipeline

A realistic query pipeline often includes more than keyword matching.

flowchart LR
	Q[Raw Query] --> CLEAN[Normalize / Spell / Parse]
	CLEAN --> REWRITE[Synonyms / Query Rewrite / Intent Detection]
	REWRITE --> RET[Retrieval]
	RET --> FILT[Apply Filters]
	FILT --> LR[Lightweight Ranking]
	LR --> HR[Heavy Ranking / Business Rules]
	HR --> ACL[Permission Check / Result Shaping]
	ACL --> OUT[Final Results]

Important steps:

query normalization: lowercasing, punctuation handling, Unicode normalization
parser: interpret phrases, field-specific search, boolean operators, quoted strings
query rewrite: expand synonyms, fix spelling, map common variants
retrieval: find candidate documents quickly from the index
ranking: score candidates using lexical, behavioral, freshness, popularity, and quality signals
filtering: apply structured constraints such as category, location, price, permissions

2.7 Distributed Search Basics

Large search systems cannot keep all searchable data on one machine. So the index is partitioned into shards.

Common pattern:

documents are assigned to shards
each shard stores a local index
replicas provide availability and read capacity
a query coordinator fans the request out to relevant shards
each shard returns its top-k candidates
the coordinator merges them into the global top-k

This is called scatter-gather.
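
The merge step is small enough to sketch. This is a minimal single-process illustration, assuming each shard is just a dict of document ID to local score. The names `shard_top_k` and `scatter_gather` are hypothetical, and a real coordinator would call shards in parallel over the network with timeouts.

```python
import heapq
import itertools
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (score, doc_id)


def shard_top_k(local_scores: Dict[str, float], k: int) -> List[Hit]:
    """Return one shard's k best hits, sorted by descending score."""
    hits = ((score, doc_id) for doc_id, score in local_scores.items())
    return heapq.nlargest(k, hits)


def scatter_gather(shards: List[Dict[str, float]], k: int) -> List[Hit]:
    """Fan out to every shard, then merge the partial lists into a global top-k."""
    # Scatter: each shard only ships its own top-k, never its full result set.
    partials = [shard_top_k(shard, k) for shard in shards]
    # Gather: the partial lists are already sorted, so a k-way merge is enough.
    merged = heapq.merge(*partials, key=lambda hit: hit[0], reverse=True)
    return list(itertools.islice(merged, k))


shards = [{"d1": 0.9, "d2": 0.4}, {"d3": 0.7, "d4": 0.1}, {"d5": 0.8}]
print(scatter_gather(shards, 3))  # [(0.9, 'd1'), (0.8, 'd5'), (0.7, 'd3')]
```

Asking every shard for k hits is what makes the merge correct: the global top-k can never contain more than k documents from a single shard.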

Challenges:

tail latency: the whole query waits for slow shards unless timeouts or degraded modes exist
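score comparability: each shard scores with its own local term statistics (for example, document frequencies), so scoring logic and statistics must be consistent enough across shards for the global merge to be meaningful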
hotspot shards: skewed data or popular terms can overload specific shards
rebalancing: adding capacity requires moving large index segments

2.8 Freshness vs Performance

Search freshness means how quickly updates in the source system appear in search results.

Users expect different freshness depending on domain:

social posts or breaking news: often seconds or near-real-time
product inventory and pricing: usually very fresh because stale search hurts conversion
document search in SaaS: usually seconds to minutes is acceptable depending on UX promises
code search on a giant corpus: sometimes modest indexing delay is acceptable if retrieval is fast and reliable

The tradeoff is that very fresh indexing increases write pressure and may reduce query efficiency.

Common tension:

frequent small segment updates improve freshness
large optimized segments improve query performance and compression

Many systems compromise with near-real-time indexing: documents become searchable quickly, while expensive segment merges happen asynchronously.

2.9 Consistency Challenges

Search is usually eventually consistent with the source of truth.

Typical failure cases:

DB write succeeded but indexing event was delayed
product deleted in DB still appears in search for a short period
permission change not reflected immediately, risking data leakage if ACL filtering is wrong
inventory count in search is stale while checkout uses the DB

Best practice:

treat the DB or source system as the correctness authority
do not let search be the final source for money, inventory reservation, or permissions
apply final correctness checks in downstream business logic when it matters

GitHub-like systems care deeply about permission-safe search. Returning a private issue, repo, or file in search is worse than returning no result.

2.10 Search Latency and Fault Tolerance

Search systems are interactive systems, so they need graceful degradation.

Common patterns:

shard replicas for availability
coordinator timeouts to avoid waiting forever on a straggler shard
degraded results if one replica set is temporarily unavailable
hot query caching for common requests
precomputed filter bitsets or caches for expensive constraints
monitoring p50, p95, p99 separately because tail latency matters more than averages

Interview note: if an interviewer asks about fault tolerance, discuss replicas, partial results, timeouts, retry behavior, and stale indexes. "We have backups" is not the answer for serving systems.

2.11 Common Mistakes

treating search as just another SQL query layer
making the search index the source of truth for critical writes
ignoring permission filtering until late in the design
underestimating reindexing cost after analyzer changes
focusing only on retrieval and forgetting ranking quality

3. Indexing

3.1 Why Search Indexing Exists

Search indexing exists because scanning every document for every query is too slow.

The core idea is preprocessing.

Instead of asking, "Which documents contain this term?" by reading the whole corpus repeatedly, the system builds a data structure ahead of time that answers that question quickly.

That preprocessing step is indexing.

3.2 Document Ingestion Pipeline

Indexing is usually a pipeline, not a single write.

flowchart LR
	DB[DB / Source Records] --> EVT[CDC / Event Stream / Batch Export]
	EVT --> EXTRACT[Extract Fields]
	EXTRACT --> ANALYZE[Tokenize / Normalize / Language Analysis]
	ANALYZE --> ENRICH[Synonyms / ACLs / Metadata / Quality Signals]
	ENRICH --> BUILD[Build or Update Index Segments]
	BUILD --> REPL[Replicate / Refresh Search Nodes]
	REPL --> SERVE[Search Serving]

Documents often need field-specific handling:

title may get higher weight
tags may be exact or lightly analyzed
description may use full stemming and stop-word removal
permissions and category fields may be stored for filtering
freshness timestamps may be stored for ranking

3.3 Tokenization

Tokenization breaks text into searchable units called tokens.

Examples:

"wireless headphones" -> wireless, headphones
"foo-bar" may become foo, bar, or foo-bar depending on analyzer design
East Asian languages may require dictionary-based or statistical segmentation rather than whitespace splitting

Why this matters:

What counts as a token determines what can be retrieved.

Poor tokenization causes obvious product bugs:

searching "e-mail" fails to match "email"
searching code symbols breaks because punctuation handling is wrong
searching C++ or C# fails because analyzers stripped important characters

Production systems often use different analyzers for different fields and languages.

3.4 Normalization

Normalization makes equivalent text forms consistent.

Common steps:

lowercase conversion
Unicode normalization
accent folding in some products
punctuation normalization
whitespace collapsing

Without normalization, simple variants become separate search terms and retrieval quality drops.
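
A toy analyzer makes the tokenization and normalization steps concrete. This is a sketch under stated assumptions, not how any particular engine does it: the function names and the token regex are hypothetical, accent folding is always on, and `+` and `#` are kept only as trailing characters so that terms like C++ and C# survive.

```python
import re
import unicodedata
from typing import List

# Assumed token rule: a run of letters/digits, optionally followed by + or #.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:[+#]+)?")


def normalize(text: str) -> str:
    """Unicode-normalize, fold accents, lowercase, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"\s+", " ", folded.lower()).strip()


def analyze(text: str) -> List[str]:
    """Turn raw text into the tokens that would be indexed or searched."""
    return TOKEN_RE.findall(normalize(text))


print(analyze("Café  Wireless-Headphones"))  # ['cafe', 'wireless', 'headphones']
print(analyze("C++ and C#"))                 # ['c++', 'and', 'c#']
```

The same analyzer has to run at index time and at query time. If the two sides disagree, documents get indexed under tokens that no query will ever produce.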

3.5 Stemming and Lemmatization

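Stemming reduces words to a base form by stripping suffixes heuristically, so related terms match. Lemmatization pursues the same goal using vocabulary and morphological analysis, mapping each word to its dictionary form.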

Examples:

running and runs reduce to run under most stemmers; the irregular form ran usually needs lemmatization to reach the same root
connect, connected, connection may become more retrievable together

Why it exists:

Users usually care about concept matching, not exact inflected forms.

Tradeoff:

aggressive stemming increases recall
too much stemming can hurt precision by conflating different meanings

Not every domain wants stemming. Code search, SKU search, names, and legal text often need more exact handling.

3.6 Stop Words

Stop words are very common words such as "the", "a", or "of" that may add little value to retrieval.

Why they are sometimes removed:

they occur in many documents
they increase index size
they often do not help ranking

Why they are sometimes kept:

phrase search needs them
some queries depend on them
domain-specific language may make them important

Example: "to be or not to be" or song titles need precise handling.

3.7 Synonyms

Synonyms allow related terms to match the same concept.

Examples:

tv and television
hoodie and sweatshirt
software engineer and developer in some job systems
nyc and new york city

Synonyms are powerful and dangerous.

They improve recall, but bad synonym rules can create surprising results. Naively expanding "apple" to both the fruit and the company is an obvious relevance bug.

Production systems usually treat synonyms as curated domain knowledge, not as a casual text feature.

3.8 Language Handling Basics

Multi-language search is not just translation.

Language handling may include:

language detection
per-language analyzers
script normalization
stemming rules per language
tokenization strategies for languages without whitespace delimiters
query rewriting or synonym dictionaries per locale

Global products such as Google, Amazon, YouTube, and large SaaS tools need language-aware indexing because naive English-centric analysis fails internationally.

3.9 Incremental Indexing

Rebuilding the entire index on every document change is impossible at scale. So production systems do incremental indexing.

Typical process:

source record changes
an event or CDC record is emitted
indexer fetches or receives the updated document
changed fields are reanalyzed
the document is added, updated, or tombstoned in the index

Common challenges:

event duplication
out-of-order updates
delete propagation
retries causing duplicate work
partial failure between DB write and index update

Idempotent indexing pipelines matter a lot.
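
One common way to get idempotency is to version every change and ignore anything that is not newer than what was already applied. The sketch below assumes the source system attaches a per-document, monotonically increasing `version` to each event; `ChangeEvent` and `IdempotentIndexer` are hypothetical names, and the "index" is just an in-memory dict.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ChangeEvent:
    doc_id: str
    version: int         # assumed: increases with every change to this document
    body: Optional[str]  # None means the document was deleted


class IdempotentIndexer:
    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}
        # Last applied version per document. Kept for deleted documents too,
        # so a late-arriving older event cannot resurrect them.
        self.versions: Dict[str, int] = {}

    def apply(self, event: ChangeEvent) -> bool:
        """Apply an event. Returns False for duplicates and stale events."""
        if event.version <= self.versions.get(event.doc_id, -1):
            return False
        self.versions[event.doc_id] = event.version
        if event.body is None:
            self.docs.pop(event.doc_id, None)  # tombstone
        else:
            self.docs[event.doc_id] = event.body
        return True


indexer = IdempotentIndexer()
indexer.apply(ChangeEvent("p1", 2, "blue shirt"))
indexer.apply(ChangeEvent("p1", 1, "old shirt"))   # out of order: ignored
indexer.apply(ChangeEvent("p1", 2, "blue shirt"))  # duplicate: ignored
print(indexer.docs)  # {'p1': 'blue shirt'}
```

With this rule, replaying the same event stream any number of times, in any order, converges to the same index state.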

3.10 Near Real-Time Indexing

Many search engines are near real-time rather than strictly real-time.

That means:

writes become searchable after a short delay
index refresh is decoupled from durable storage operations
background merges optimize segments later

This design keeps query latency reasonable while preserving acceptable freshness.

For products like issue search, product catalogs, and SaaS document search, near-real-time indexing is often the right tradeoff.

3.11 Reindexing Challenges

Reindexing is one of the biggest operational realities in search.

You need full reindexing when:

analyzer rules change
synonym logic changes significantly
field weights or schema design changes
permissions model changes
you move to a new index version

Why it is hard:

large corpora take time to rebuild
dual-running old and new indexes increases cost
cutover must avoid downtime and bad ranking regressions
stale or missing events during rebuild can corrupt freshness

Common strategies:

build a new index version in parallel
backfill from source of truth
replay recent events after the backfill window
run shadow reads or compare sample queries
switch traffic gradually

This is similar to blue-green deployment for search data.

3.12 Database Indexes vs Search Indexes

Dimension	Database Index	Search Index
Primary goal	Speed up structured lookups and range queries	Speed up text retrieval and relevance-ranked retrieval
Typical structure	B-tree, hash, LSM-related structures	Inverted index, postings, term dictionaries, doc values
Query style	Predicates on fields	Query terms, phrases, fuzzy matching, ranking
Source of truth role	Often part of the canonical DB	Usually derived from another source of truth
Update pattern	Tight coupling with DB writes	Often async or near-real-time
Ranking support	Limited compared with search engines	Central purpose of the system

3.13 Best Practices

keep indexing idempotent and replay-safe
separate source of truth from serving index
version analyzers and schemas explicitly
measure freshness lag, not just query latency
treat reindexing as a normal operational workflow, not an emergency-only task

4. Inverted Index

4.1 What an Inverted Index Is

An inverted index maps each term to the documents that contain it.

Instead of storing documents and asking, "Which terms are inside this document?" the system stores terms and asks, "Which documents contain this term?"

That inversion is what makes large-scale text retrieval efficient.

4.2 Why It Powers Most Search Systems

Most keyword search systems need to answer queries like:

which documents contain wireless
which documents contain both wireless and headphones
which documents contain the exact phrase noise cancelling

If you have a term-to-document mapping, you can answer these queries much faster than scanning all documents.

4.3 Term -> Document Mapping

A simple example:

Documents:

D1: "wireless noise cancelling headphones"
D2: "wired gaming headset"
D3: "wireless earbuds with case"

Inverted view:

wireless -> D1, D3
noise -> D1
cancelling -> D1
headphones -> D1
wired -> D2
gaming -> D2
headset -> D2
earbuds -> D3
case -> D3

The list of document IDs for a term is called a postings list.
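
The example above can be reproduced in a few lines. This is a minimal sketch that assumes whitespace tokenization and lowercase normalization only; `build_inverted_index` is a hypothetical helper, not an engine API.

```python
from collections import defaultdict
from typing import Dict, List


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each term to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    # Sorted postings lists make later intersections and compression cheap.
    return {term: sorted(doc_ids) for term, doc_ids in postings.items()}


docs = {
    "D1": "wireless noise cancelling headphones",
    "D2": "wired gaming headset",
    "D3": "wireless earbuds with case",
}
index = build_inverted_index(docs)
print(index["wireless"])  # ['D1', 'D3']
print(index["headset"])   # ['D2']
```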

4.4 Postings Lists

A postings list usually stores more than just document IDs.

It may include:

document ID
term frequency in the document
positions of the term inside the document
field information such as title vs body
payloads or extra per-hit metadata in some engines

Why extra metadata matters:

term frequency helps ranking
positions enable phrase and proximity search
field data enables field weighting

4.5 Positional Indexes and Phrase Search

If the system stores term positions, it can support phrase queries.

Example:

D1: "machine learning systems"
D2: "systems for machine translation learning"

Searching for the phrase "machine learning" should strongly prefer D1.

Without positions, the engine only knows both terms exist. With positions, it knows whether they appear adjacent and in the correct order.
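
A small positional index shows the difference. The sketch assumes whitespace tokenization and an exact, in-order phrase match; `build_positional_index` and `phrase_search` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List

# term -> doc_id -> positions of the term inside that document
PositionalIndex = Dict[str, Dict[str, List[int]]]


def build_positional_index(docs: Dict[str, str]) -> PositionalIndex:
    index: PositionalIndex = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, term in enumerate(text.lower().split()):
            index[term][doc_id].append(position)
    return index


def phrase_search(index: PositionalIndex, phrase: str) -> List[str]:
    terms = phrase.lower().split()
    if not terms:
        return []
    # Step 1: documents must contain every term (plain boolean AND).
    candidates = set(index.get(terms[0], {}))
    for term in terms[1:]:
        candidates &= set(index.get(term, {}))
    # Step 2: the terms must also sit at consecutive positions.
    matches = []
    for doc_id in sorted(candidates):
        for start in index[terms[0]][doc_id]:
            if all(start + offset in index[term][doc_id]
                   for offset, term in enumerate(terms)):
                matches.append(doc_id)
                break
    return matches


docs = {
    "D1": "machine learning systems",
    "D2": "systems for machine translation learning",
}
index = build_positional_index(docs)
print(phrase_search(index, "machine learning"))  # ['D1']
```

Both documents pass step 1. Only D1 passes step 2, because in D2 the word "translation" sits between the two query terms.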

4.6 Boolean Search

Boolean search combines postings lists.

Examples:

A AND B: intersect postings lists
A OR B: union postings lists
A NOT B: subtract B's postings list from A's

This is one reason inverted indexes are fast: set operations on sorted document ID lists are efficient.
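
Because postings lists are sorted, each operation is a single linear pass with two cursors. The sketch below assumes document IDs are integers in strictly increasing order; the function names are illustrative.

```python
from typing import List


def intersect(a: List[int], b: List[int]) -> List[int]:
    """A AND B: advance whichever cursor points at the smaller ID."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def union(a: List[int], b: List[int]) -> List[int]:
    """A OR B: merge both lists, emitting shared IDs once."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def difference(a: List[int], b: List[int]) -> List[int]:
    """A NOT B: keep IDs from a that do not appear in b."""
    j = 0
    out = []
    for doc_id in a:
        while j < len(b) and b[j] < doc_id:
            j += 1
        if j == len(b) or b[j] != doc_id:
            out.append(doc_id)
    return out


a, b = [1, 3, 5, 8], [3, 4, 8, 9]
print(intersect(a, b))   # [3, 8]
print(union(a, b))       # [1, 3, 4, 5, 8, 9]
print(difference(a, b))  # [1, 5]
```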

4.7 Compression Basics

Postings lists can be huge, so compression matters.

Common ideas:

store sorted document IDs and compress gaps between them instead of raw IDs
use variable-length integer encoding
group postings into blocks
add skip pointers or skip blocks so the engine can jump ahead during intersections

Compression improves memory and disk efficiency, and often query speed too because less data must be read.
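
Gap encoding plus variable-length integers can be sketched as follows. This assumes strictly increasing, non-negative integer document IDs and uses a simple 7-bits-per-byte varint; real engines use block-oriented formats that are more elaborate than this.

```python
from typing import List


def encode_postings(doc_ids: List[int]) -> bytes:
    """Store gaps between sorted IDs, each gap as a variable-length integer."""
    out = bytearray()
    previous = 0
    for doc_id in doc_ids:
        gap = doc_id - previous
        previous = doc_id
        # Low 7 bits per byte; the high bit says "more bytes follow".
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)


def decode_postings(data: bytes) -> List[int]:
    doc_ids = []
    current = gap = shift = 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            current += gap  # undo the gap encoding
            doc_ids.append(current)
            gap = shift = 0
    return doc_ids


ids = [1000, 1003, 1010, 5000]
encoded = encode_postings(ids)
print(len(encoded))              # 6
print(decode_postings(encoded))  # [1000, 1003, 1010, 5000]
```

Four IDs fit in 6 bytes here, compared with 16 bytes as fixed 32-bit integers. The gain grows for dense postings lists, where most gaps fit in a single byte.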

4.8 Distributed Inverted Indexes

At large scale, the index is partitioned across many nodes.

Partitioning approaches:

document partitioning: each shard stores all terms for a subset of documents
term partitioning: each shard stores the full postings lists for a subset of terms; less common in many general-purpose serving systems, but conceptually possible for some specialized workloads

Document partitioning is common because it simplifies writes and local scoring.

The coordinator sends the query to all relevant shards, and each shard computes local top results using its local inverted index.

4.9 How Retrieval Is Fast in Practice

Suppose the query is wireless headphones.

The search engine typically:

normalizes the query
finds postings for wireless
finds postings for headphones
intersects or otherwise combines candidate sets
uses frequency, field boosts, positions, and ranking signals to score candidates
returns only the top few results

The system does not score the whole corpus. It narrows aggressively using the inverted index first.

That is why retrieval and ranking are separated.

4.10 Common Failure Cases

very common terms create long postings lists and high query cost
badly chosen analyzers create index bloat
large positional indexes improve quality but increase storage
hotspot terms create shard imbalance
deletes and updates create segment fragmentation until merges clean things up

4.11 Interview Angle

If asked to explain an inverted index, keep it simple:

"It is a term-to-document lookup structure. Instead of scanning every document on each query, the engine jumps directly from the query terms to candidate documents through postings lists. Positional metadata enables phrase search, and compressed postings plus shard-level retrieval keep it fast at scale."

5. Autocomplete

5.1 Why Autocomplete Exists

Autocomplete reduces typing effort, helps users express intent, and increases query success.

It is one of the highest-leverage search UX features because it helps before the actual search even runs.

Good autocomplete does several things:

speeds up input
corrects or guides query formulation
exposes popular intents
reduces zero-result searches
nudges users toward query structures the backend handles well

Amazon-style search boxes, Google suggestions, GitHub issue filters, and SaaS global search bars all rely on autocomplete.

5.2 Prefix Matching

The simplest autocomplete form is prefix matching.

If the user types wire, the system returns suggestions starting with that prefix:

wireless headphones
wireless mouse
wired headset

Prefix matching is attractive because it is conceptually simple and fast.

5.3 Trie Basics

A trie is a tree where each edge represents a character (or token) and each node represents the prefix spelled out by the path from the root.

Why tries are useful:

prefixes share storage
prefix lookups are fast
top suggestions can be stored or aggregated at intermediate nodes

flowchart TD
	ROOT[Root] --> W[w]
	W --> WI[wi]
	WI --> WIR[wir]
	WIR --> WIRE[wire]
	WIRE --> WIREL[wirel]
	WIREL --> WIRELE[wirele]
	WIRELE --> WIRELES[wireles]
	WIRELES --> WIRELESS[wireless]
	WIRELESS --> H[wireless headphones]
	WIRELESS --> M[wireless mouse]

At scale, production systems usually store compacted tries or other optimized prefix structures rather than naive character-by-character trees.
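
For intuition, here is the naive character-by-character version with top suggestions stored at every node. It is a sketch only: `SuggestionTrie` is a hypothetical class, popularity is a plain integer count, and each suggestion is assumed to be inserted once.

```python
from typing import Dict, List, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        # Best completions under this prefix as (count, suggestion), best first.
        self.top: List[Tuple[int, str]] = []


class SuggestionTrie:
    def __init__(self, k: int = 3) -> None:
        self.root = TrieNode()
        self.k = k

    def insert(self, suggestion: str, count: int) -> None:
        node = self.root
        for ch in suggestion:
            node = node.children.setdefault(ch, TrieNode())
            # Keep only the k most popular completions at every prefix node.
            node.top.append((count, suggestion))
            node.top.sort(key=lambda item: (-item[0], item[1]))
            del node.top[self.k:]

    def suggest(self, prefix: str) -> List[str]:
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return [suggestion for _, suggestion in node.top]


trie = SuggestionTrie(k=2)
trie.insert("wireless headphones", 900)
trie.insert("wireless mouse", 500)
trie.insert("wired headset", 300)
print(trie.suggest("wire"))   # ['wireless headphones', 'wireless mouse']
print(trie.suggest("wired"))  # ['wired headset']
```

Storing the top-k at each node trades memory and write cost for lookups that only walk the prefix, with no subtree traversal at query time.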

5.4 N-gram Approaches

Autocomplete is not always solved with a trie.

N-gram indexing can help with:

substring matching
typo tolerance
matching mid-word fragments
languages or domains where token boundaries are tricky

Tradeoff:

n-grams improve recall and flexibility
they increase index size significantly
they may add noise if scoring is weak
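
A character trigram index illustrates both the flexibility and the cost. The sketch assumes n = 3 and lowercase matching; the helper names are hypothetical, and fragments shorter than n simply return nothing here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def ngrams(text: str, n: int = 3) -> Set[str]:
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_ngram_index(suggestions: Iterable[str], n: int = 3) -> Dict[str, Set[str]]:
    index: Dict[str, Set[str]] = defaultdict(set)
    for suggestion in suggestions:
        for gram in ngrams(suggestion, n):
            index[gram].add(suggestion)
    return index


def substring_search(index: Dict[str, Set[str]], fragment: str, n: int = 3) -> List[str]:
    grams = ngrams(fragment, n)
    if not grams:
        return []  # fragment shorter than n: not handled in this sketch
    # Candidates must contain every n-gram of the fragment.
    candidates = set.intersection(*(index.get(gram, set()) for gram in grams))
    # Sharing all n-grams is necessary but not sufficient, so verify.
    return sorted(s for s in candidates if fragment.lower() in s.lower())


index = build_ngram_index(["wireless headphones", "wired headset", "phone case"])
print(substring_search(index, "phone"))  # ['phone case', 'wireless headphones']
print(len(index))                        # number of distinct trigrams stored
```

Three short suggestions already produce dozens of index entries, which is the index-size tradeoff listed above.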

5.5 Popularity-Based Suggestions

Not every valid prefix completion should be shown.

Usually suggestions are ranked by popularity and usefulness.

Signals may include:

historical query frequency
click-through rate after suggestion selection
conversion rate in commerce systems
recency or trending status
user-specific history

Example:

If millions of users search for iphone charger, that suggestion should likely outrank a rare but lexically valid completion.

5.6 Recent Searches and Personalization

Autocomplete often mixes multiple sources:

global popular suggestions
user's own recent searches
session context
personalized entities such as repos, docs, contacts, or previous products viewed

GitHub-like enterprise search or SaaS admin dashboards often use personalization heavily because each user's accessible universe is different.

5.7 Typo Tolerance Basics

Users make mistakes while typing. Production autocomplete systems usually include typo handling such as:

edit-distance based correction
keyboard-neighbor heuristics
common misspelling dictionaries
phonetic or transliteration support in some markets

The challenge is latency.

Autocomplete does not have much time budget, so typo tolerance must be efficient and bounded.
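
One way to keep edit-distance checks bounded is to stop as soon as the edit budget is exceeded. The sketch below is standard Levenshtein dynamic programming with an early exit; `within_edit_distance` is a hypothetical helper, and production systems often use more specialized structures than a per-candidate check.

```python
def within_edit_distance(a: str, b: str, max_edits: int) -> bool:
    """True if a can be turned into b with at most max_edits single-character edits."""
    # Cheap rejection: the length difference alone is a lower bound.
    if abs(len(a) - len(b)) > max_edits:
        return False
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute or match
        # Row minimums never decrease, so the budget is already blown.
        if min(current) > max_edits:
            return False
        previous = current
    return previous[-1] <= max_edits


print(within_edit_distance("headphnes", "headphones", 1))  # True
print(within_edit_distance("mouse", "headset", 2))         # False
```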

5.8 Caching Strategies

Autocomplete traffic is extremely cache-friendly because prefixes repeat heavily.

Common patterns:

cache hot prefixes in memory
use CDN or edge caching for anonymous popular suggestions where acceptable
keep top-k suggestions per prefix precomputed
debounce client requests to avoid one request per keystroke

5.9 Large-Scale Production Considerations

Large autocomplete systems often need to handle:

huge prefix skew on popular queries
language and locale differences
abuse or bot traffic
personalized suggestions that reduce cacheability
freshness for trending terms
safe filtering of prohibited or low-quality suggestions

A common design is hybrid:

static or precomputed prefix data for speed
online popularity updates for freshness
user history overlay for personalization

5.10 Common Mistakes

sending requests on every keystroke without debounce
returning lexically valid suggestions with poor user value
ignoring abuse and suggestion poisoning
making autocomplete depend on expensive full ranking pipelines

6. Filtering

6.1 What Filtering Is

Filtering narrows results using structured constraints.

Examples:

price between 50 and 100
remote jobs only
category = laptops
flights with one stop or fewer
issues labeled bug
only repositories the user can access

Filtering is not a side feature. In many business systems it is central.

For some search experiences, the user query is weak and filters carry most of the actual intent.

6.2 Structured Filters

Structured filters work over fields whose semantics are known.

Examples:

numeric ranges
enums or categories
dates
geo constraints
booleans such as in-stock only

These differ from text ranking because they are usually precise constraints rather than fuzzy matches.

6.3 Faceted Search

Facets show result breakdowns by filter values.

Example in e-commerce:

brand counts
color counts
price buckets
availability counts

Why facets matter:

they help users refine large result sets
they reveal the shape of the catalog
they guide discovery without requiring new queries

The challenge is performance. Facet counts can be expensive, especially when the base query is broad and filters are combined interactively.
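
The straightforward way to compute facets is one pass over the matching documents, which is exactly why broad queries are costly. A minimal sketch, assuming results are plain dicts and `facet_counts` is a hypothetical helper:

```python
from collections import Counter
from typing import Dict, Iterable, List


def facet_counts(results: Iterable[dict], fields: List[str]) -> Dict[str, Counter]:
    """Count how many matching documents carry each value of each facet field."""
    counts: Dict[str, Counter] = {field: Counter() for field in fields}
    for doc in results:  # cost grows with the size of the result set
        for field in fields:
            value = doc.get(field)
            if value is not None:
                counts[field][value] += 1
    return counts


results = [
    {"brand": "Sony", "color": "black"},
    {"brand": "Bose", "color": "black"},
    {"brand": "Sony", "color": "white"},
]
counts = facet_counts(results, ["brand", "color"])
print(counts["brand"]["Sony"])   # 2
print(counts["color"]["black"])  # 2
```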

6.4 Range Filters

Range filters are common for:

price
salary
rating
departure time
file size
timestamps

They often need specialized data structures or optimized field storage because arbitrary numeric range scans over huge result sets can be expensive.
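
The simplest such structure is a sorted array searched with binary search. This sketch assumes one numeric value per document and an inclusive range; `NumericRangeIndex` is a hypothetical class, and real engines use more sophisticated numeric index structures.

```python
import bisect
from typing import Dict, List


class NumericRangeIndex:
    """Sorted (value, doc_id) pairs so a range becomes two binary searches."""

    def __init__(self, values: Dict[str, float]) -> None:
        pairs = sorted((value, doc_id) for doc_id, value in values.items())
        self._values = [value for value, _ in pairs]
        self._doc_ids = [doc_id for _, doc_id in pairs]

    def between(self, low: float, high: float) -> List[str]:
        """Return doc IDs whose value lies in [low, high], in value order."""
        start = bisect.bisect_left(self._values, low)
        end = bisect.bisect_right(self._values, high)
        return self._doc_ids[start:end]


prices = NumericRangeIndex({"a": 30, "b": 50, "c": 75, "d": 100, "e": 120})
print(prices.between(50, 100))  # ['b', 'c', 'd']
```

Finding the boundaries costs O(log n) regardless of corpus size. The remaining cost is proportional to the number of matches, which is why very wide ranges are still expensive.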

6.5 Filtering and Ranking Interaction

Filtering and ranking interact more than beginners expect.

If you filter too early, you may remove items that could have been relevant under softer criteria.

If you filter too late, you may waste ranking work on documents that cannot be shown.

Strategy	Good for	Risk
Pre-filter before ranking	Hard constraints such as ACLs, category limits, geography, inventory	Can shrink the candidate set too aggressively if the filter data is noisy or the constraint is stricter than the user intended
Post-filter after retrieval or early ranking	Soft presentation rules, some UI-level shaping	Wastes work and may leave too few valid final results

Common practice:

apply hard constraints early
apply softer business shaping later

6.6 Filter Performance Optimization

Common techniques:

store filterable fields in efficient columnar or doc-values structures
build bitmap or bitset representations for high-volume facets
cache frequent filter combinations
precompute common facet counts where practical
use approximate counts if exactness is not required for UX

ACL filtering is especially important. If the user should not see a document, that filter should behave as a hard constraint and should be efficient.
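
Bitset filters can be illustrated with Python integers, which behave as arbitrary-length bit arrays. The sketch assumes every document has a small integer ordinal within its shard; the helper names are hypothetical, and real engines use more compact, specialized bitmap structures.

```python
from typing import Iterable, List


def make_bitset(doc_ordinals: Iterable[int]) -> int:
    """Set bit i for every document ordinal i that passes the filter."""
    bits = 0
    for ordinal in doc_ordinals:
        bits |= 1 << ordinal
    return bits


def bitset_to_list(bits: int) -> List[int]:
    ordinals = []
    ordinal = 0
    while bits:
        if bits & 1:
            ordinals.append(ordinal)
        bits >>= 1
        ordinal += 1
    return ordinals


in_stock = make_bitset([0, 2, 3, 5])
laptops = make_bitset([2, 3, 4])
user_can_see = make_bitset([0, 1, 2, 5])  # ACL bitset for the current user

# Hard constraints combine with one bitwise AND each, before any ranking work.
allowed = in_stock & laptops & user_can_see
print(bitset_to_list(allowed))  # [2]
```

Because the ACL bitset is ANDed in with the other hard constraints, a document the user cannot see never reaches the ranking stages at all.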

6.7 Real-World Examples

E-commerce:

category, brand, price, availability, shipping speed, seller, ratings

Job boards:

location, remote, salary range, experience level, company size, visa support

Travel search:

stops, departure window, airline, baggage, refund policy, hotel rating, neighborhood

These products often spend as much engineering effort on filter performance and facet correctness as on keyword matching.

6.8 Common Failure Cases

facet counts computed on stale or mismatched indexes
filters applied after ranking causing irrelevant or empty pages
high-cardinality filters destroying cache hit rate
permission filtering bolted on late and leaking data

7. Ranking

7.1 Why Ranking Matters

Retrieval answers "what could match". Ranking answers "what should be shown first".

Ranking matters because users rarely inspect many results.

If the best result is not in the first few positions, the system feels wrong even if it technically retrieved the right document somewhere deeper in the list.

7.2 Why Ranking Is Usually Multi-Stage

Ranking is usually multi-stage because expensive models and business logic cannot run on the whole corpus.

Typical shape:

retrieve a broad candidate set cheaply
apply lightweight ranking to reduce candidates
apply heavier ranking on a smaller set
apply final business rules, diversity, sponsorship, and presentation logic

flowchart LR
	Q[Query / Context] --> RET[Retrieval: thousands]
	RET --> L1[Stage 1 Ranker: hundreds]
	L1 --> L2[Stage 2 Ranker: tens]
	L2 --> BR[Business Rules / Diversity / Ads]
	BR --> UI[Final Ordered Results]
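The funnel above can be sketched in a few lines. This is a toy version under stated assumptions: the two scoring functions are hypothetical stand-ins for a cheap lexical score and a heavier model, and the stage sizes are tiny so the behavior is easy to follow.

```python
import heapq

def multi_stage_rank(candidates, cheap_score, expensive_score, keep_stage1, keep_stage2):
    """Funnel candidates through a cheap scorer, then an expensive scorer."""
    stage1 = heapq.nlargest(keep_stage1, candidates, key=cheap_score)
    # The expensive scorer only ever sees the survivors of stage 1.
    return heapq.nlargest(keep_stage2, stage1, key=expensive_score)

# Hypothetical items: (id, lexical_score, predicted_engagement)
items = [("a", 0.9, 0.2), ("b", 0.8, 0.9), ("c", 0.7, 0.5), ("d", 0.1, 0.99)]
ranked = multi_stage_rank(
    items,
    cheap_score=lambda item: item[1],      # stage 1: cheap lexical score
    expensive_score=lambda item: item[2],  # stage 2: stand-in for a heavier model
    keep_stage1=3,
    keep_stage2=2,
)
```

Item `d` has the highest stage 2 score but never reaches stage 2, because stage 1 dropped it. Early stages cap the quality of everything downstream.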

7.3 Retrieval vs Ranking

This separation is critical.

Retrieval is optimized for recall and speed.

Ranking is optimized for precision and utility.

If retrieval misses a relevant item entirely, ranking cannot recover it.

If retrieval returns too many weak candidates, ranking becomes expensive and noisy.

7.4 Relevance Ranking vs Business Ranking

Production ranking is rarely pure relevance.

In e-commerce, a result order may consider:

text relevance
inventory availability
margin or business priority
fulfillment speed
review quality
return rate
seller trust
sponsored placements

In job search:

lexical match
application likelihood
compensation quality
recency
employer quality
geographic fit

In a SaaS global search:

text relevance
recency of the document
document type priority
ownership or collaboration strength

7.5 Common Ranking Signals

Signals often include:

lexical relevance: term match, phrase match, field boosts
popularity: clicks, purchases, views, stars, installs
freshness: newer items may deserve higher weight in some surfaces
engagement: dwell time, completion, watch time, saves, shares
quality: seller quality, document quality, content safety
trust: verified sources, low spam risk, low abuse signals
context: location, device, language, current session intent

7.6 Diversity Constraints

Blindly ranking by one score can produce repetitive or unhealthy outputs.

Examples:

ten nearly identical products from one seller
a feed dominated by one creator
a job result page filled with duplicates from the same company

Diversity rules improve the experience by spreading exposure across:

categories
sellers
creators
content types
freshness buckets

This is especially important for feeds and discovery systems.
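One simple way to enforce such a rule is a greedy per-group cap applied after scoring. The sketch below assumes the input list is already sorted by score; `diversify` and its parameters are hypothetical names, and real systems often use softer penalties instead of a hard cap.

```python
from collections import Counter

def diversify(ranked_items, group_of, max_per_group, page_size):
    """Walk the scored list in order, skipping items whose group already filled its quota."""
    seen = Counter()
    page = []
    for item in ranked_items:
        group = group_of(item)
        if seen[group] >= max_per_group:
            continue
        seen[group] += 1
        page.append(item)
        if len(page) == page_size:
            break
    return page

ranked = [
    ("p1", "sellerA"), ("p2", "sellerA"), ("p3", "sellerA"),
    ("p4", "sellerB"), ("p5", "sellerC"),
]
page = diversify(ranked, group_of=lambda item: item[1], max_per_group=2, page_size=4)
```

The third item from `sellerA` is skipped even though it outscored the items that replaced it. That is the explicit trade of a little relevance for variety.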

7.7 Sponsored Content Considerations

Sponsored results complicate ranking because monetization and relevance must coexist.

Strong systems separate:

auction or eligibility logic
relevance constraints
sponsored placement policies
disclosure and compliance requirements

A bad design either lets paid placements override relevance or restricts them so tightly that it leaves too much money on the table.

7.8 Failure Cases

optimizing only CTR and creating clickbait
overusing popularity so incumbents dominate forever
boosting freshness too much and burying authoritative content
letting one business rule overwhelm all relevance signals
failing to monitor ranking regressions after model changes

7.9 Best Practices

make ranking multi-stage
separate hard eligibility from soft scoring
log ranking features and decisions for debugging
evaluate quality with offline metrics and online experiments
protect the system from feedback loops that only reward already-popular items

8. Relevance Scoring

8.1 What Relevance Scoring Means

Relevance scoring is how the system estimates how well a result matches the user's need.

There is no single universal score. Real systems combine multiple signals.

8.2 TF-IDF Basics

TF-IDF is one of the classic ideas in lexical search.

Intuition:

terms that appear often in a document may be important to that document
terms that appear in many documents are less discriminative across the corpus

One simple form is:


$$
\text{TF-IDF}(t,d) = \text{TF}(t,d) \cdot \log\left(\frac{N}{\text{DF}(t)}\right)
$$

Where:

$\text{TF}(t,d)$ is the term frequency of term $t$ in document $d$
$\text{DF}(t)$ is the number of documents containing $t$
$N$ is the total number of documents

Why it matters:

Rare but present terms are usually more informative than extremely common terms.
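A small worked example makes the intuition concrete. This sketch assumes raw counts for TF, the natural logarithm, and whitespace tokenization. The toy corpus is invented for illustration.

```python
import math
from collections import Counter

def tf_idf(term, doc_tokens, corpus):
    """TF-IDF(t, d) = TF(t, d) * log(N / DF(t)), using raw counts for TF."""
    tf = Counter(doc_tokens)[term]
    df = sum(1 for tokens in corpus if term in tokens)
    if df == 0:
        return 0.0  # term never appears in the corpus
    return tf * math.log(len(corpus) / df)

corpus = [
    "the red sofa".split(),
    "the blue sofa".split(),
    "the red velvet armchair".split(),
    "the oak table".split(),
]

score_the = tf_idf("the", corpus[0], corpus)        # appears in every document
score_sofa = tf_idf("sofa", corpus[0], corpus)      # appears in half the documents
score_velvet = tf_idf("velvet", corpus[2], corpus)  # appears in one document
```

Here `the` scores zero because it appears in all four documents, while `velvet` scores highest because it appears in only one.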

8.3 BM25 Basics

BM25 is a practical ranking function widely used in lexical search systems.

It improves on simpler TF-IDF variants by handling:

term frequency saturation
document length normalization

A common form is:


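$$
\text{BM25}(q,d)=\sum_{t \in q} \text{IDF}(t) \cdot \frac{f(t,d)(k_1+1)}{f(t,d)+k_1\left(1-b+b\cdot \frac{|d|}{\text{avgdl}}\right)}
$$

Where:

$f(t,d)$ is the frequency of term $t$ in document $d$
$|d|$ is the length of document $d$ in tokens
$\text{avgdl}$ is the average document length in the corpus
$k_1$ controls how quickly repeated occurrences saturate
$b$ controls how strongly document length is normalized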

You do not need to memorize the formula in interviews, but you should know the intuition:

more occurrences help, but not linearly forever
long documents should not win just because they contain more words
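Both intuitions can be checked with a small implementation. This is a sketch, not a reference implementation: the IDF form used here, log(1 + (N - DF + 0.5) / (DF + 0.5)), is one common smoothed variant and is an assumption, since the formula above leaves IDF unspecified. The defaults `k1=1.2` and `b=0.75` are commonly cited starting values.

```python
import math
from collections import Counter

def bm25(query_tokens, doc_tokens, corpus, k1=1.2, b=0.75):
    """Score one document against a query using the BM25 form shown above."""
    n_docs = len(corpus)
    avgdl = sum(len(tokens) for tokens in corpus) / n_docs
    counts = Counter(doc_tokens)
    score = 0.0
    for term in query_tokens:
        f = counts[term]
        if f == 0:
            continue
        df = sum(1 for tokens in corpus if term in tokens)
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))  # assumed IDF variant
        length_norm = 1 - b + b * len(doc_tokens) / avgdl
        score += idf * f * (k1 + 1) / (f + k1 * length_norm)
    return score

once = "redis cache guide intro".split()
twice = "redis redis cache guide".split()
four = "redis redis redis redis".split()
other = "search index ranking basics".split()
corpus = [once, twice, four, other]

s1, s2, s4 = (bm25(["redis"], doc, corpus) for doc in (once, twice, four))
# Same single occurrence of "redis", but buried in a much longer document.
long_doc = once + ["filler"] * 20
s_long = bm25(["redis"], long_doc, corpus)
```

Going from one occurrence to two adds more score than going from two to four, which is saturation. The padded document scores lower than `once` despite the identical match, which is length normalization.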

8.4 Semantic Relevance Basics

Lexical matching is powerful but limited.

Semantic relevance tries to capture meaning, not just exact token overlap.

Examples:

sofa matching couch
software engineer matching backend developer
a support query matching a knowledge base article with similar meaning but different wording

A common production pattern today is hybrid retrieval:

lexical retrieval for precision and exact matching
semantic retrieval or re-ranking for meaning-based matches

Why hybrid is common:

lexical search handles exact identifiers, codes, names, and rare terms well
semantic methods handle paraphrases and intent better
using both reduces the weaknesses of either alone
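One way to merge the two result lists is reciprocal rank fusion, which combines rankings without needing their raw scores to be comparable. This is a swapped-in illustration, not something the patterns above prescribe. The constant `k=60` is a conventional default, and the document IDs are invented.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

lexical = ["sku-123", "sofa-a", "sofa-b"]    # exact identifier match ranks first
semantic = ["couch-x", "sofa-a", "sku-123"]  # paraphrase match ranks first
fused = reciprocal_rank_fusion([lexical, semantic])
```

Documents that both retrievers agree on rise to the top, while items found by only one retriever still make the list.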

8.5 Behavioral Signals

Relevance is not only about text.

Behavioral signals often matter:

clicks
dwell time
add-to-cart rate
purchase rate
save rate
watch completion
query reformulations after a click

These signals help the system learn which results users actually found useful.

But they are noisy. Clicked does not always mean satisfied.

8.6 Quality, Trust, and Spam Prevention

High relevance is not enough if the content is low quality or abusive.

Production systems often include additional scoring dimensions:

content quality scores
seller or source trust
spam or fraud risk
policy safety scores
freshness or staleness penalties

Examples:

Amazon-like marketplaces must suppress spammy or low-trust listings
GitHub-like search may downrank spam repositories or abusive content
YouTube-like platforms need safety and trust constraints around recommendations

8.7 Balancing Relevance vs Business Goals

"Best result" in production usually means the best result under multiple objectives.

These can include:

lexical match quality
user satisfaction
engagement
monetization
content safety
fairness or exposure goals
freshness

This is why ranking discussions often become multi-objective optimization discussions.

8.8 Interview Framing

If asked how a search engine decides the best result, a strong answer is:

"It usually starts with lexical or hybrid retrieval to get candidates, then uses a relevance score combining term signals such as BM25, field boosts, freshness, popularity, behavioral feedback, quality signals, and business constraints. The final order is almost never based on one score alone."

9. Recommendation System Overview

9.1 What Recommendation Systems Are

Recommendation systems choose what to show users when the user has not issued an explicit query.

They power:

Netflix homepages
YouTube next videos
TikTok For You feeds
Amazon "Customers also bought"
Instagram and X home feeds
SaaS dashboards showing suggested docs, tasks, or entities

9.2 Why They Exist

The internet has too much content. Most users will not search for everything they could care about.

Recommendation systems help with discovery by predicting relevance from behavior, similarity, context, and popularity.

Search solves explicit intent. Recommendation solves hidden intent.

9.3 Online vs Offline Recommendation

Dimension	Offline	Online
When computed	Batch or scheduled jobs	At request time or near request time
Good for	Heavy model training, embeddings, broad candidate pools	Fresh context, session adaptation, final ranking
Tradeoff	Efficient at scale but stale	Fresh but latency-sensitive

Most production systems use both.

Example:

offline jobs compute user embeddings, item embeddings, similarity graphs, creator clusters, trending statistics
online systems use current session behavior, freshness, and context to rank a page right now

9.4 Cold Start Problem

Cold start means the system has too little data.

Two forms:

new user: little or no history
new item or creator: little or no interaction data

Common mitigations:

use popularity and trending signals
use content-based features
use onboarding preferences
use location, language, device, and session context
provide exploration slots so new items can earn data

TikTok-like and YouTube-like systems care deeply about new content discovery. If only established content wins, the ecosystem becomes stale.

9.5 Feedback Loops

Recommendation systems influence behavior, so they create feedback loops.

If the system shows content, it gets more interaction data on that content, which can cause it to rank even higher.

This can be useful, but it can also create runaway popularity bias.

Examples of risks:

rich-get-richer exposure
narrow content bubbles
overfitting to clickbait
suppressing new creators

9.6 Exploration vs Exploitation

Recommendation systems constantly balance:

exploitation: show what is most likely to perform well now
exploration: show some uncertain or new content to learn more and avoid stagnation

If you only exploit, the system becomes conservative and may miss better items.

If you explore too much, user experience degrades.
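A minimal sketch of this balance is an epsilon-greedy slot filler: each slot goes to an exploration item with probability `epsilon`, otherwise to the next best known item. The function name and parameters are hypothetical, and production systems usually rely on more sophisticated bandit or uncertainty-based methods.

```python
import random

def fill_slots(exploit_items, explore_items, slots, epsilon, rng):
    """Fill each slot with an exploration item with probability epsilon, else the next best item."""
    exploit = iter(exploit_items)  # assumed to be sorted best-first
    explore = list(explore_items)
    page = []
    for _ in range(slots):
        if explore and rng.random() < epsilon:
            page.append(explore.pop(rng.randrange(len(explore))))
        else:
            item = next(exploit, None)
            if item is None:
                break
            page.append(item)
    return page

rng = random.Random(42)  # seeded so the example is repeatable
page = fill_slots(["top1", "top2", "top3", "top4"], ["new1", "new2"], slots=4, epsilon=0.1, rng=rng)
```

With `epsilon=0` the page is pure exploitation, and with `epsilon=1` exploration items take every slot until they run out. Real systems tune the rate per surface.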

9.7 Engagement vs Quality Tradeoffs

Not every engagement signal maps to long-term product quality.

High click-through or short-term watch time may conflict with:

satisfaction
trust
creator ecosystem health
safety
retention quality

This is a central real-world discussion in recommendation systems.

9.8 Example Patterns

Netflix:

heavy personalization by row and by title ranking
strong use of offline signals plus contextual ranking

YouTube:

massive candidate generation followed by multi-stage ranking
strong importance of watch time, satisfaction, freshness, safety

TikTok:

short-term session signals matter heavily
content and user embeddings are critical for rapid personalization

Amazon:

recommendations mix collaborative signals, co-purchase graphs, browse history, price sensitivity, and business objectives

10. Ranking in Recommendation Systems

10.1 Candidate Ranking

Recommendation ranking usually happens after candidate generation. The candidate pool may already be hundreds or thousands of items rather than millions.

The ranker then decides ordering using user, item, and context features.

10.2 Multi-Stage Ranking

The same multi-stage logic from search applies, but recommendation models are often more feature-heavy.

Typical pipeline:

generate candidates from many sources
apply a lightweight ranker to remove obvious weak candidates
apply a heavier model to a smaller candidate set
apply diversity, policy, and business constraints

flowchart LR
	CTX[User + Session + Context] --> CG[Candidate Sources]
	CG --> CANDS[Candidate Pool]
	CANDS --> FAST[Fast Ranker]
	FAST --> HEAVY[Heavy Ranker]
	HEAVY --> MIX[Diversity / Fairness / Ads / Policy]
	MIX --> FEED[Final Feed]

10.3 Lightweight vs Heavy Rankers

Fast rankers may use:

simple feature transforms
linear models
shallow trees
small neural models

Heavy rankers may use:

deeper neural models
sequence models over session history
expensive cross-feature interactions
richer content understanding features

The reason for multiple rankers is simple: the most accurate model is often too expensive to run on too many candidates.

10.4 ML Ranking Basics

ML rankers usually learn from historical interactions.

Common labels or targets:

click
long dwell time
like or save
watch completion
add to cart
purchase
hide or negative feedback

The hardest part is not fitting a model. It is defining the right objective and handling bias in logged data.

10.5 Online Ranking Decisions

At request time, the system may incorporate:

current session signals
latest follows or interactions
freshness windows
user device and network conditions
time of day or location
safety or rate-limit decisions

This is why recommendation systems are rarely fully precomputed.

10.6 Delayed Feedback Challenges

Some outcomes arrive late.

Examples:

purchases happen long after an impression
subscription retention takes days or weeks
satisfaction surveys are sparse

This creates training and evaluation problems because immediate clicks are easy to measure, but long-term satisfaction is harder.

10.7 Fairness and Creator Fairness

Real platforms often need fairness constraints such as:

not letting one creator dominate all slots
giving new creators a chance to gather signal
balancing exposure across categories or sellers
avoiding discrimination in jobs, housing, lending, or other regulated domains

These are not only ethics topics. They are product-health topics.

10.8 Freshness vs Relevance

Older content may have stronger engagement history. Newer content may be more timely.

Different products choose differently:

news and social updates care heavily about freshness
evergreen education or documentation may prioritize authority over recency
commerce often needs a mix of demand history and current availability

11. Fanout

11.1 What Fanout Means

Fanout is the process of distributing content references to the users who may see them.

This is most commonly discussed for social feeds.

If a user posts something, how do followers get it into their home timelines?

11.2 Fanout-on-Write

In fanout-on-write, when a user creates a post, the system pushes that post reference into follower timelines immediately or soon after write time.

Why it exists:

fast feed reads for ordinary users
precomputed per-user timelines

Tradeoff:

very expensive for users with huge follower counts
write amplification can be enormous

11.3 Fanout-on-Read

In fanout-on-read, when a user opens the app, the system fetches posts from followed accounts and constructs the feed at read time.

Why it exists:

avoids massive write amplification
handles high-fanout authors better

Tradeoff:

more expensive reads
more complex low-latency ranking at request time

11.4 Hybrid Models

Real systems often use hybrid approaches.

Common pattern:

ordinary users: fanout-on-write
celebrity or mega-scale accounts: fanout-on-read or special handling

This is the classic celebrity problem.

11.5 Fanout-on-Write vs Fanout-on-Read

Dimension	Fanout-on-Write	Fanout-on-Read
Write cost	High	Lower
Read cost	Lower	Higher
Good for	Many readers with modest graph sizes	Large fanout creators and flexible ranking
Freshness control	Good if timelines update quickly	Good if reads fetch current data
Complexity	Simpler reads, harder writes	Harder reads, simpler writes

flowchart LR
	P[Post Created] --> DIST[Distribution Service]
	DIST --> FW[Write to Follower Timelines]
	DIST --> HOT[Mark Celebrity Posts for Read-Time Fetch]
	U[User Opens Feed] --> FEED[Feed Service]
	FEED --> TL[(Timeline Cache)]
	FEED --> HOT
	TL --> RANK[Rank + Merge]
	HOT --> RANK
	RANK --> OUT[Feed Page]
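The hybrid flow above can be sketched with in-memory dictionaries. Everything here is illustrative: the threshold value, the store names, and the function names are hypothetical, and a real system would use durable queues and timeline stores.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 3  # hypothetical cutoff; real systems tune this

followers = defaultdict(set)         # author -> follower IDs
timelines = defaultdict(list)        # user -> pushed (timestamp, post_id) entries
celebrity_posts = defaultdict(list)  # author -> posts fetched at read time

def publish(author, post_id, timestamp):
    if len(followers[author]) >= CELEBRITY_THRESHOLD:
        # Fanout-on-read: store once, merge when followers open their feed.
        celebrity_posts[author].append((timestamp, post_id))
    else:
        # Fanout-on-write: push a reference into every follower timeline.
        for follower in followers[author]:
            timelines[follower].append((timestamp, post_id))

def read_feed(user, following, limit=10):
    entries = list(timelines[user])
    for author in following:
        entries.extend(celebrity_posts.get(author, []))
    # Deduplicate and return newest first.
    return [post_id for _, post_id in sorted(set(entries), reverse=True)[:limit]]

followers["friend"] = {"u1"}
followers["star"] = {"u1", "u2", "u3"}
publish("friend", "p1", timestamp=1)
publish("star", "p2", timestamp=2)
feed = read_feed("u1", following=["friend", "star"])
```

The celebrity post costs one write no matter how many followers exist, and the merge cost moves to each read instead.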

11.7 Cache Invalidation Challenges

Feed systems must handle:

deleted posts
blocked users
privacy changes
edited content
ranking model changes

If timelines are heavily cached or precomputed, invalidation becomes hard.

11.8 Common Failure Cases

pushing celebrity posts to millions of timelines and overwhelming storage or queues
expensive read-time joins across too many sources
duplicated items because of retries or hybrid merge bugs
stale deleted content because cache invalidation lagged

11.9 Interview Angle

If asked to design a social feed, always discuss fanout strategy. That is one of the main architectural decisions.

12. Candidate Generation

12.1 Why Candidate Generation Exists

Recommendation ranking does not start from the whole corpus. It starts from a narrowed candidate pool.

Why?

Because scoring millions or billions of items with a full ranking model on every request does not fit in an interactive latency or compute budget.

Candidate generation reduces the problem from "everything" to "a few hundred or thousand promising items".

12.2 Common Candidate Sources

Production systems often combine many candidate sources:

collaborative filtering
content-based similarity
social graph neighbors
trending or popular items
creator-follow graph
embedding nearest neighbors
recently interacted entities
business-curated pools

The union of these sources becomes the candidate pool.

12.3 Collaborative Filtering Basics

Collaborative filtering uses behavioral similarity.

Core intuition:

users who behaved similarly in the past may like similar things in the future
items that co-occur in behavior may be related

Examples:

users who bought product A often buy product B
users who watched show X often watch show Y

Amazon-style "customers also bought" is the classic mental model.
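The item co-occurrence idea can be sketched with simple pair counts over purchase baskets. This is a toy version under stated assumptions: the baskets are invented, the helper names are hypothetical, and real systems normalize for item popularity instead of using raw counts.

```python
from collections import Counter, defaultdict
from itertools import combinations

def build_co_counts(baskets):
    """Count how often each pair of items appears in the same basket."""
    co_counts = defaultdict(Counter)
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return co_counts

def also_bought(co_counts, item, top_n=2):
    """Return the items most often bought together with `item`."""
    return [other for other, _ in co_counts[item].most_common(top_n)]

baskets = [
    ["tent", "sleeping bag", "stove"],
    ["tent", "sleeping bag"],
    ["tent", "lantern"],
    ["stove", "fuel"],
]
co_counts = build_co_counts(baskets)
suggestion = also_bought(co_counts, "tent", top_n=1)
```

No item attributes are used at all. The signal comes entirely from behavior, which is why this approach struggles with new items that have no baskets yet.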

12.4 Content-Based Filtering Basics

Content-based filtering uses item attributes.

Examples:

recommend jobs similar to jobs a user previously clicked
recommend articles with similar topics or embeddings
recommend videos with related audio, captions, or visual features

This is useful for cold start because it does not depend entirely on historical interaction volume.

12.5 Graph-Based Candidates

Graph-based candidates come from relationships:

who the user follows
who similar users follow
authors frequently co-engaged by the same audience
co-starred repos or linked documents

Graph candidates are common in social apps, GitHub-like collaboration systems, and commerce systems with co-view or co-purchase graphs.

12.6 Trending Pools

Trending pools capture global or local momentum.

Why they matter:

they solve some cold start problems
they inject freshness
they expose popular content without deep personalization

But trending alone is not personalization.

12.7 Embedding Retrieval and ANN Basics

Embeddings map users and items into vector spaces where similar concepts are close together.

This enables nearest-neighbor retrieval:

find items near the user's embedding
find items near the current session embedding
find items similar to the current content being viewed

Exact nearest-neighbor search can be expensive at scale, so many systems use approximate nearest neighbor, or ANN, techniques.

High-level idea:

use structures that avoid comparing against every vector
trade a little exactness for large speed gains

This is often good enough for candidate generation because ranking happens later.
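One family of ANN techniques is locality-sensitive hashing with random hyperplanes. The sketch below is a deliberately small illustration of that family, not how any particular product does it. The class name, parameters, and vectors are hypothetical, and production systems typically use tuned libraries with graph-based or quantization-based indexes.

```python
import math
import random
from collections import defaultdict

def signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(sum(v * h for v, h in zip(vector, plane)) >= 0 for plane in hyperplanes)

def cosine(a, b):
    # Assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class LshIndex:
    def __init__(self, dim, num_planes, seed=0):
        rng = random.Random(seed)
        self.hyperplanes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_planes)]
        self.buckets = defaultdict(list)

    def add(self, item_id, vector):
        self.buckets[signature(vector, self.hyperplanes)].append((item_id, vector))

    def query(self, vector, top_k=3):
        # Only vectors in the same bucket are compared; that is the approximation.
        candidates = self.buckets.get(signature(vector, self.hyperplanes), [])
        ranked = sorted(candidates, key=lambda pair: cosine(vector, pair[1]), reverse=True)
        return [item_id for item_id, _ in ranked[:top_k]]

index = LshIndex(dim=3, num_planes=4)
index.add("a", [1.0, 0.0, 0.0])
index.add("b", [0.0, 1.0, 0.0])
index.add("c", [-1.0, 0.0, 0.0])
neighbors = index.query([2.0, 0.0, 0.0])
```

The query skips most of the corpus and can miss a true neighbor that hashed into a different bucket. That loss is acceptable when a later ranking stage reorders the candidates anyway.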

12.8 Why Ranking Starts After Candidate Generation

Candidate generation is about recall and breadth.

Ranking is about precision and ordering.

If you skip candidate generation, ranking is too expensive.

If candidate generation is poor, ranking quality is capped.

This boundary is one of the most important concepts in modern recommendation systems.

13. Personalization

13.1 What Personalization Means

Personalization means the same corpus produces different results for different users.

Examples:

two users searching the same marketplace query may see different ranking orders
two Netflix users get different homepages
two SaaS users searching the same global search term see different accessible docs and different likely hits

13.2 User Profiles

A personalization profile may include:

long-term interests
recent interactions
follows, subscriptions, or teams
geographic preferences
device/network patterns
explicit preferences
negative feedback and muted topics

Profiles are often built from both online and offline signals.

13.3 Implicit vs Explicit Feedback

Explicit feedback:

likes, ratings, follows, saves, thumbs up, manual preferences

Implicit feedback:

clicks, dwell time, purchases, watch completion, skips, repeats, hides

Implicit data is abundant but noisy. Explicit data is sparse but clearer.

13.4 Long-Term vs Short-Term Interests

Long-term interests represent stable tastes.

Short-term interests capture immediate intent.

Example:

a user generally likes backend engineering content
this week the user is specifically searching for Redis and search systems

TikTok-like feeds often weight short-term session intent heavily. Netflix-like systems also care about context, but long-term taste matters more for broad discovery.

13.5 Contextual Ranking

Ranking may depend on context such as:

time of day
current page or query
device type
network speed
location
current session sequence

Example:

low-bandwidth users may get lower-bitrate or different video choices
local services like Uber care strongly about geography and current location

13.6 Privacy Considerations

Personalization raises privacy questions:

how much behavioral data is stored
how long it is retained
whether sensitive attributes are inferred
whether users can opt out or reset personalization
whether data is used across products or only within one surface

Production systems need data minimization, retention controls, access controls, and often region-specific compliance behavior.

13.7 Explainability Basics

Explainability means giving a human-understandable reason for some results.

Examples:

"Because you watched..."
"Suggested because you follow..."
"Related to your recent searches"

Explainability is useful for trust, debugging, and product feedback even if the underlying model is more complex than the explanation suggests.

13.8 Common Mistakes

overfitting to recent clicks and making feeds unstable
ignoring negative feedback
overpersonalizing so much that exploration disappears
using sensitive data carelessly

14. Feed Generation

14.1 What Feed Generation Is

Feed generation is the process of deciding what appears in a user's home timeline or discovery surface.

This is one of the hardest system design topics because it combines:

graph data
recommendation ranking
caching
fanout strategy
freshness
pagination
abuse controls
content safety

14.2 Home Feed Architecture

flowchart LR
	U[User Opens App] --> FEED[Feed Service]
	FEED --> PROF[Profile / Session Service]
	FEED --> SOURCES[Candidate Sources]
	SOURCES --> FOLLOW[Following Graph]
	SOURCES --> TREND[Trending]
	SOURCES --> EMB[Embedding Retrieval]
	SOURCES --> CACHE[(Timeline / Candidate Cache)]
	SOURCES --> RANK[Ranking Service]
	RANK --> HYD[Hydration / Metadata Fetch]
	HYD --> PAGE[Paginated Response]

14.3 Cache Layers

Feed systems may cache:

precomputed timelines
candidate pools
ranking features
hydrated entity metadata
first page responses for very hot users or anonymous feeds

Caching helps, but cache invalidation is hard because feed contents change frequently and are personalized.

14.4 Pagination Challenges

Pagination in feeds is not as simple as offset and limit.

Problems with offset-based pagination:

feed contents change between requests
inserts at the top shift offsets
duplicates or gaps appear

Cursor-based pagination is usually better.

But even cursor pagination is tricky if the ranking model is highly dynamic.
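A minimal sketch of cursor pagination, assuming items are ordered by `(score, item_id)` descending and the cursor encodes the last item the client saw. The function names are hypothetical, and the in-memory filter stands in for an indexed "less than cursor" query in a real store.

```python
import base64
import json

def encode_cursor(score, item_id):
    return base64.urlsafe_b64encode(json.dumps([score, item_id]).encode()).decode()

def decode_cursor(cursor):
    score, item_id = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return score, item_id

def fetch_page(items, limit, cursor=None):
    """items: list of (score, item_id). Pages are ordered by score, then item_id, descending."""
    ordered = sorted(items, reverse=True)
    if cursor is not None:
        last = decode_cursor(cursor)
        # Resume strictly after the last item seen, regardless of what was inserted above it.
        ordered = [entry for entry in ordered if entry < last]
    page = ordered[:limit]
    next_cursor = encode_cursor(*page[-1]) if len(ordered) > limit else None
    return page, next_cursor

feed = [(5, "e"), (4, "d"), (3, "c"), (2, "b")]
page1, cursor = fetch_page(feed, limit=2)
feed.append((9, "new"))  # a new item lands at the top between requests
page2, end_cursor = fetch_page(feed, limit=2, cursor=cursor)
```

With offset pagination, the second request would have returned `d` again because the insert shifted every position by one. The cursor avoids that, but only while the sort key of each item stays stable between requests.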

14.5 Consistency vs Freshness

Users want fresh content, but highly dynamic feed generation can lead to inconsistent paging and repeated items.

Common compromise:

freeze a short-lived ranked window for a session or cursor
refresh when the user pulls to refresh or a new session begins

14.6 Backfill Strategies

Backfill is the content shown when the natural candidate pool is too sparse to fill a page.

Examples:

new user with few follows
quiet time period with not enough new content
strict filters remove many candidates

Backfill sources may include:

trending content
suggested accounts or topics
evergreen content
sponsored content under policy rules

14.7 Ranking at Read Time vs Write Time

Write-time ranking:

rank or partially prepare content as it is distributed
fast reads
less flexible when user state changes

Read-time ranking:

more personalized and fresh
more expensive and latency-sensitive

Many systems use hybrid approaches: precompute easy parts, rank final candidates at read time.

14.8 Product Styles

X or Twitter-like following feed:

strong graph component
hybrid fanout patterns
freshness matters heavily

Instagram-like home feed:

mix of follow graph, engagement prediction, and recommendations
strong importance of re-ranking and diversity

TikTok-like For You feed:

candidate generation from broad corpus, not only follows
strong session-based ranking and content understanding

14.9 Common Failure Cases

stale caches showing deleted or blocked content
duplicated items across pages
expensive read-time ranking melting the service during traffic spikes
feedback loops making the feed monotonous
ranking bugs that over-prioritize one creator or content type

15. Object Storage

15.1 What Object Storage Is

Object storage stores data as objects, each usually accessed by a key within a bucket or namespace.

An object typically contains:

the binary content
metadata
a key or name

Object storage is the default storage layer for large unstructured blobs such as:

images
videos
PDFs
archives
backups
logs
user uploads

15.2 Why It Exists

Databases are good for structured records. Local disks are tied to one machine. Traditional file systems provide hierarchical paths and POSIX-like semantics.

Object storage exists because internet-scale systems need:

massive scale
high durability
relatively simple access patterns
low operational burden per file
cost-effective storage of large blobs

15.3 Object Storage vs Block Storage vs File Systems

Dimension	Object Storage	Block Storage	File System
Interface	Key/object API	Raw blocks attached to machines	Files and directories
Typical use	Media, backups, logs, attachments	Databases, VM disks, low-level persistent volumes	Shared files, app files, local hierarchical access
Scaling model	Very large namespaces, distributed service	Usually attached volumes per instance or host	Depends on file system implementation
Mutation model	Often write whole objects or multipart operations	Fine-grained block updates	File operations with richer semantics
Strength	Durability and scale for blobs	Low-level performance control	Familiar file semantics

15.4 Object Storage vs Database vs Local Disk

Dimension	Object Storage	Database	Local Disk
Best for	Large blobs	Structured records and queries	Fast machine-local access
Query support	Minimal metadata lookup	Rich queries and indexes	Minimal unless app-managed
Durability model	Service-level replication or erasure coding	Depends on DB replication and backup	Depends on host and disk setup
Sharing	Easy across services	Structured access only	Tied to machine unless networked
Cost profile	Usually cheap per GB for large blobs	Higher for blob-heavy usage	Cheap locally but operationally limited

15.5 Durability Concepts

Object stores are designed for very high durability.

Internally, that typically means some combination of:

replication across devices or zones
erasure coding
background integrity checks
repair workflows for lost fragments
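The simplest possible erasure code is a single XOR parity fragment, which can rebuild any one lost fragment. The sketch below uses it purely to illustrate the repair idea. Real object stores use stronger codes that survive multiple simultaneous losses, and the function names here are hypothetical.

```python
from functools import reduce

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode_with_parity(fragments):
    """Return the data fragments plus one XOR parity fragment (all equal length)."""
    return list(fragments) + [reduce(xor_bytes, fragments)]

def recover(stored, lost_index):
    """Rebuild one lost fragment by XOR-ing all surviving fragments."""
    survivors = [frag for i, frag in enumerate(stored) if i != lost_index]
    return reduce(xor_bytes, survivors)

stored = encode_with_parity([b"obje", b"ct-d", b"ata!"])
rebuilt = recover(stored, lost_index=1)  # pretend the second fragment's disk failed
```

Three data fragments plus one parity fragment cost about 1.33x the raw size, compared with 3x for triple replication. That storage saving is why erasure coding is attractive for large blobs.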

Durability is different from availability.

An object can be highly durable but temporarily unavailable due to network or control-plane issues.

15.6 Scalability Characteristics

Object stores are designed for:

huge object counts
independent object retrieval
simple write and read APIs
high parallelism

They are not designed for transactional joins or row-level relational queries.

15.7 Object Immutability Concepts

Many object storage workflows treat objects as immutable.

Instead of modifying a large object in place, systems often:

upload a new version
update metadata pointers
rely on versioning for history

This simplifies distributed durability and caching.

15.8 Metadata and CDN Relationship

Object stores are often paired with:

a database for metadata and permissions
a CDN for global low-latency delivery

Why a CDN matters:

it caches content near users
reduces origin load
improves image and video latency dramatically

15.9 Cost Considerations

Cost is not just storage per GB.

Real costs include:

request volume
data transfer out
replication region choices
lifecycle tiering
media derivative explosion such as multiple image sizes and video renditions

Systems that look cheap at rest can become expensive in egress and processing.

15.10 Lifecycle Policies

Object stores often support lifecycle policies such as:

transition old objects to cheaper storage tiers
expire temporary uploads
clean abandoned multipart uploads
retain or lock data for compliance windows

This matters for backups, logs, and SaaS attachment retention.

16. S3-Style Storage

16.1 Buckets, Objects, and Keys

S3-style systems usually organize data as:

bucket: top-level namespace or container
object: stored binary plus metadata
key: object identifier within the bucket

Despite folder-like UIs, the key space is conceptually flat. Paths are usually naming conventions encoded into the key.
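A short sketch shows how a folder view can be emulated over a flat key space by grouping keys on a delimiter. The `list_keys` helper is hypothetical and only mimics the general shape of prefix-and-delimiter listing, not any specific provider's API.

```python
def list_keys(keys, prefix="", delimiter="/"):
    """Emulate folder listing over flat keys by grouping on the next delimiter."""
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Anything deeper is summarized as a "subfolder" prefix.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)

keys = [
    "photos/2024/a.jpg",
    "photos/2024/b.jpg",
    "photos/readme.txt",
    "logs/app.log",
]
objects, folders = list_keys(keys, prefix="photos/")
```

There is no directory object to rename or move. "Moving a folder" means rewriting every key that shares the prefix.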

16.2 Versioning

Versioning keeps previous object versions when objects are overwritten or deleted.

Why it matters:

accidental delete recovery
auditability
rollback
safer overwrite semantics

16.3 Pre-Signed URLs

Pre-signed URLs allow temporary access to upload or download an object without proxying the bytes through the application server.

Why they are useful:

reduce backend bandwidth load
keep object store credentials hidden from clients
enforce short-lived, scoped access

This is one of the most common production upload and download patterns.
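The mechanism can be sketched with an HMAC over the method, key, and expiry time. This is a simplified illustration of the idea, not the signing scheme of any real object store. The secret, the `storage.example.com` host, and the function names are all hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"server-side-secret"  # hypothetical; never shipped to clients

def _sign(method, key, expires):
    message = f"{method}\n{key}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()

def presign(method, key, ttl_seconds, now=None):
    """Return a URL that grants `method` on `key` until the expiry time."""
    expires = int(now if now is not None else time.time()) + ttl_seconds
    query = urlencode({"expires": expires, "signature": _sign(method, key, expires)})
    return f"https://storage.example.com/{key}?{query}"

def verify(method, url, now=None):
    """Storage-side check: reject expired links and any tampered method, key, or expiry."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    expires = int(params["expires"][0])
    if (now if now is not None else time.time()) > expires:
        return False
    expected = _sign(method, parsed.path.lstrip("/"), expires)
    return hmac.compare_digest(expected, params["signature"][0])

url = presign("GET", "tenant-1/report.pdf", ttl_seconds=300, now=1000)
```

The signature binds the method, the exact key, and the expiry together, so a download link cannot be reused for an upload or for a different object.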

16.4 Access Control Basics

Access control is usually layered:

bucket-level policies
object-level permissions in some systems
application-level authorization before issuing signed URLs
CDN or origin access policies

Best practice: do not make private content directly world-readable and hope the frontend hides the URL.

16.5 Multipart Uploads

Multipart uploads break a large object into parts.

Benefits:

retry only failed parts
upload parts in parallel
resume large uploads more efficiently

This is essential for video uploads, cloud drive systems, and large backups.
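The client side of this pattern can be sketched as a part planner plus a retry loop that only resends failed parts. The `upload_part` callback is a hypothetical stand-in for the real network call, and the sizes are tiny for readability.

```python
def plan_parts(total_size, part_size):
    """Split [0, total_size) into (part_number, start, end) byte ranges."""
    return [
        (number, start, min(start + part_size, total_size))
        for number, start in enumerate(range(0, total_size, part_size), start=1)
    ]

def upload_with_retries(parts, upload_part, max_attempts=3):
    """Upload every part, retrying only the parts that failed."""
    pending = list(parts)
    completed = {}
    for _ in range(max_attempts):
        failed = []
        for part in pending:
            try:
                completed[part[0]] = upload_part(part)  # e.g. a returned checksum or tag
            except IOError:
                failed.append(part)
        pending = failed
        if not pending:
            break
    return completed, pending  # pending is non-empty if some parts never succeeded

parts = plan_parts(total_size=25, part_size=10)
```

Persisting `completed` between app launches is what turns this loop into a resumable upload.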

16.6 Consistency Basics

Modern object stores often provide strong consistency for many common operations within a region, but engineers should still think carefully about distributed workflows.

Why?

event notifications may arrive asynchronously
cross-region replication may lag
caches and CDNs may serve stale content
metadata DB updates may race with object lifecycle events

So even if storage reads are strongly consistent, the surrounding workflow may still be eventually consistent.

16.7 Event-Driven Workflows

Object creation often triggers downstream work:

antivirus scanning
image resizing
video transcoding
metadata extraction
OCR or transcription
search indexing

This is why object storage frequently sits at the center of async pipelines.

16.8 Real-World Examples

user uploads for profile photos or attachments
backup archives
application logs shipped to long-term storage
static site assets
media hosting for images and videos

17. Uploads and Downloads

17.1 Direct Upload vs Backend Proxy Upload

There are two common upload styles.

Strategy	How it works	Good for	Main tradeoff
Direct upload	Client gets signed URL and uploads to object storage directly	Large media, high scale, low backend bandwidth	More client complexity and async workflow coordination
Backend proxy upload	Client uploads bytes to app backend, backend forwards or stores	Simpler auth/control, small files, strict validation workflows	Backend becomes bandwidth bottleneck

For large-scale media systems, direct upload is usually preferred.

17.2 Browser Upload Flow

flowchart LR
	C[Client] --> API[Backend API]
	API --> AUTH[Authorize Upload]
	AUTH --> SIGN[Generate Signed Upload URL]
	SIGN --> C
	C --> OBJ[(Object Storage)]
	OBJ --> EVT[Object Created Event]
	EVT --> PROC[Scan / Extract / Transform]
	PROC --> DB[(Metadata DB)]
	DB --> READY[Asset Ready]

Typical steps:

client asks backend to start upload
backend authorizes user and creates upload record
backend returns signed URL or multipart session data
client uploads directly to object storage
object store emits event
async processors validate and enrich the asset
metadata DB marks asset ready for use

17.3 Mobile Upload Considerations

Mobile uploads are harder because of:

flaky networks
app backgrounding
battery constraints
limited memory
varying file sizes and camera formats

Best practices:

resumable multipart uploads
persistent local upload state
idempotent retry tokens
chunk sizes tuned for mobile conditions

17.4 Secure Download Patterns

Secure download is often implemented with:

backend authorization check
short-lived signed URL or signed cookie
private object origin behind CDN
optional watermarking or audit logging for sensitive downloads

Typical SaaS pattern:

metadata and permissions live in the app DB
app checks access
app issues a short-lived signed URL for the specific file or CDN path

flowchart LR
	REQ[Client Requests File] --> API[Backend API]
	API --> AUTH[Authorize User / Tenant / ACL]
	AUTH --> SIGN[Issue Short-Lived Signed URL or Cookie]
	SIGN --> CDN[CDN / Private Origin]
	CDN --> OBJ[(Object Storage)]
	OBJ --> CDN
	CDN --> RESP[Client Downloads File]
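
The signing and verification steps in this flow can be sketched with a generic HMAC scheme. This is a minimal illustration, not the URL format of any particular object store or CDN; `SECRET`, `sign_url`, `verify_url`, and the `expires`/`sig` query parameter names are all hypothetical.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical key; a real system would load it from a secret manager


def sign_url(path: str, expires_at: int, secret: bytes = SECRET) -> str:
    """Append an expiry timestamp and an HMAC-SHA256 signature to the path."""
    message = f"{path}\n{expires_at}".encode()
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"{path}?{urlencode({'expires': expires_at, 'sig': signature})}"


def verify_url(url: str, now: int, secret: bytes = SECRET) -> bool:
    """Accept the URL only if it is unexpired and the signature matches the path."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    try:
        expires_at = int(query["expires"][0])
        signature = query["sig"][0]
    except (KeyError, ValueError):
        return False
    if now > expires_at:
        return False
    message = f"{parsed.path}\n{expires_at}".encode()
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking signature bytes through timing.
    return hmac.compare_digest(signature, expected)
```

Because the path is part of the signed message, a URL issued for one file cannot be reused for another, and the expiry bounds how long a leaked link stays useful.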

17.5 Resumable Uploads

Resumable upload means the client can continue after failure without restarting from zero.

This matters for:

large videos
weak mobile connectivity
long uploads in browser tabs
enterprise file transfer systems

17.6 Retry Strategies

Retries must be designed carefully, because naive retries duplicate work and amplify load on an already struggling system.

Good practice:

retry failed chunks, not the whole upload
use exponential backoff with limits
make upload initiation idempotent
track completed parts so retries do not duplicate work
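
A minimal sketch of per-part retry with capped exponential backoff and jitter follows. The `upload_part` callable, the choice of `ConnectionError` as the retryable error, and the default delays are assumptions for illustration.

```python
import random
import time
from typing import Callable


def upload_with_retry(
    upload_part: Callable[[int], None],
    part_number: int,
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Retry a single part; return the number of attempts that were needed."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            upload_part(part_number)
            return attempt
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller
            # Delay cap doubles each attempt up to max_delay; jitter spreads out retries.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

Only the failed part is retried, and the injected `sleep` parameter makes the backoff easy to test without waiting.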

17.7 Integrity Verification

File integrity matters because uploads can be corrupted or truncated.

Checks may include:

size verification
checksum per part and final checksum
content-type validation
file signature or magic-byte inspection

Do not trust only the filename extension.
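
As a rough sketch, size and magic-byte checks might look like the following. The signature table covers only a few well-known formats, and `sniff_content_type` and `validate_upload` are hypothetical helper names.

```python
from typing import List, Optional

# Leading bytes ("magic bytes") of a few common formats.
MAGIC_PREFIXES = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
    b"%PDF-": "application/pdf",
}


def sniff_content_type(head: bytes) -> Optional[str]:
    """Guess the content type from the first bytes of the file."""
    for prefix, content_type in MAGIC_PREFIXES.items():
        if head.startswith(prefix):
            return content_type
    return None


def validate_upload(declared_type: str, head: bytes,
                    expected_size: int, actual_size: int) -> List[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    if actual_size != expected_size:
        problems.append(f"size mismatch: expected {expected_size} bytes, got {actual_size}")
    sniffed = sniff_content_type(head)
    if sniffed is None:
        problems.append("unrecognized file signature")
    elif sniffed != declared_type:
        problems.append(f"declared {declared_type} but content looks like {sniffed}")
    return problems
```

Signature sniffing only catches mislabeled files. It does not replace checksums or malware scanning.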

17.8 Antivirus Scanning Basics

Many production systems scan user uploads before making them broadly available.

Common flow:

upload lands in quarantine or pending state
scan job checks the file
file is promoted to usable state only after passing validation

This is common in SaaS attachment systems and customer-facing file upload platforms.
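
One way to make the quarantine rule hard to bypass is an explicit state machine in the metadata layer. The state names and helper functions below are illustrative assumptions, not a standard.

```python
from enum import Enum


class AssetState(Enum):
    PENDING = "pending"      # uploaded, not yet scanned
    SCANNING = "scanning"
    READY = "ready"          # passed validation, may be served
    REJECTED = "rejected"    # failed the scan, never served


ALLOWED_TRANSITIONS = {
    AssetState.PENDING: {AssetState.SCANNING},
    AssetState.SCANNING: {AssetState.READY, AssetState.REJECTED},
    AssetState.READY: set(),
    AssetState.REJECTED: set(),
}


def transition(current: AssetState, target: AssetState) -> AssetState:
    """Move to the target state, refusing any shortcut around the scan."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


def finish_scan(current: AssetState, is_clean: bool) -> AssetState:
    return transition(current, AssetState.READY if is_clean else AssetState.REJECTED)


def is_downloadable(state: AssetState) -> bool:
    return state is AssetState.READY
```

With this shape, a file cannot go from pending straight to ready, which is the bug described by "marked ready before scan completed".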

17.9 Failure Cases

signed URL expires partway through a large upload
backend marks the metadata row ready before the scan or transform completes
clients retry whole uploads instead of failed parts, multiplying storage and bandwidth cost
insecure direct object exposure leaks private files

18. Chunked Uploads

18.1 Why Chunked Uploads Exist

Chunked or multipart uploads exist because large files are fragile to send as one giant request.

Problems with single-shot uploads:

one failure restarts the whole transfer
memory usage may be large
progress tracking is coarse
network interruptions waste more work

18.2 Multipart Upload Flow

flowchart LR
	INIT[Initiate Multipart Upload] --> PARTS[Upload Parts in Parallel]
	PARTS --> TRACK[Track Completed Part IDs + Checksums]
	TRACK --> COMPLETE[Complete Upload]
	COMPLETE --> ASSEMBLE[Store Final Object]

Typical process:

client requests multipart upload session
backend or object store returns upload ID and per-part instructions
client uploads parts independently
client records which parts succeeded
final completion call assembles the object

18.3 Resume After Failure

To resume correctly, the client or backend needs state such as:

upload ID
part numbers
completed part identifiers or ETags
expected total file size
checksum state if used

This state may live in:

client local storage
backend DB
Redis for short-lived sessions
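
A small sketch of the resume state and how a client could derive the remaining work from it. `UploadSession` and its fields are hypothetical names, and part numbers start at 1 here by assumption.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class UploadSession:
    upload_id: str
    total_size: int
    part_size: int
    completed: Dict[int, str] = field(default_factory=dict)  # part number -> ETag

    def part_count(self) -> int:
        return math.ceil(self.total_size / self.part_size)

    def missing_parts(self) -> List[int]:
        """Part numbers that still need to be uploaded."""
        return [n for n in range(1, self.part_count() + 1) if n not in self.completed]

    def byte_range(self, part_number: int) -> Tuple[int, int]:
        """Half-open [start, end) byte range of the source file for a part."""
        start = (part_number - 1) * self.part_size
        end = min(start + self.part_size, self.total_size)
        return start, end
```

After a crash, the client reloads the session, asks for `missing_parts()`, and reads only those byte ranges from the file.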

18.4 Parallel Uploads

Parallel part uploads improve throughput, especially for large files.

Tradeoffs:

more concurrency can increase speed
too much concurrency can overwhelm mobile devices, browsers, or rate limits

Chunk size and concurrency are tuning knobs.
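
A minimal sketch of bounded parallelism using a thread pool. The `upload_part` callable stands in for whatever client call actually sends a part, and the default concurrency of 4 is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def upload_parts_in_parallel(
    parts: Dict[int, bytes],
    upload_part: Callable[[int, bytes], str],
    max_concurrency: int = 4,
) -> Dict[int, str]:
    """Upload parts with at most max_concurrency in flight; return part number -> ETag."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        futures = {
            number: pool.submit(upload_part, number, data)
            for number, data in parts.items()
        }
        # result() re-raises any exception from the worker thread.
        return {number: future.result() for number, future in futures.items()}
```

The pool size is the concurrency knob. A mobile client would likely choose a smaller value than a server-side transfer tool.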

18.5 Ordering and Assembly

Even when parts upload in parallel, the final object needs deterministic ordering.

The system uses part numbers or ordered manifests to assemble the correct final file.

18.6 Checksum Verification

Checksums can be used at:

per-part level
final object level

This helps detect corruption during transfer or assembly.
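
The sketch below combines ordered assembly with checksums at both levels. It uses SHA-256 for illustration; real object stores define their own checksum and ETag rules, so the function names and the choice of hash are assumptions.

```python
import hashlib
from typing import Dict


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def split_into_parts(data: bytes, part_size: int) -> Dict[int, bytes]:
    """Split data into numbered parts, starting at part number 1."""
    return {
        offset // part_size + 1: data[offset:offset + part_size]
        for offset in range(0, len(data), part_size)
    }


def assemble(parts: Dict[int, bytes], part_checksums: Dict[int, str],
             final_checksum: str) -> bytes:
    """Verify each part, join in part-number order, then verify the whole object."""
    for number in sorted(part_checksums):
        if number not in parts:
            raise ValueError(f"part {number} is missing")
        if sha256_hex(parts[number]) != part_checksums[number]:
            raise ValueError(f"part {number} failed checksum")
    assembled = b"".join(parts[number] for number in sorted(part_checksums))
    if sha256_hex(assembled) != final_checksum:
        raise ValueError("final object failed checksum")
    return assembled
```

Sorting by part number makes the result independent of the order in which parallel uploads happened to finish.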

18.7 Large File Handling in Production

For very large uploads, systems often add:

rate limiting by user or tenant
upload quotas
expiration of abandoned multipart sessions
lifecycle cleanup of orphaned parts

Cloud drives and video platforms rely heavily on these controls.

18.8 Common Mistakes

not cleaning abandoned multipart parts
not persisting upload state for resume
choosing chunk sizes without considering mobile networks
trusting client-provided completion status without validation

19. Metadata

19.1 Why Metadata Exists Separately

Object storage is good at storing blobs, but most applications need richer metadata and business rules around each file.

Examples of metadata:

owner user or tenant
permission model
file name and original content type
processing status
timestamps
retention or legal hold flags
checksum and size
references to derived assets such as thumbnails or transcoded variants

This metadata is usually stored in a database, not only inside the object store.

19.2 Database + Object Storage Relationship

Common pattern:

object store holds the bytes
DB holds metadata, ownership, permissions, and workflow state

Why this split is useful:

queries are easier
business transactions stay in the DB
permissions are easier to manage
processing state is easier to update

Stripe-like or GitHub-like systems commonly use this model for user files, exports, logs, and compliance artifacts.

19.3 Ownership and Permissions

Metadata tables often model:

user ownership
organization or tenant ownership
access scopes
sharing links
expiration rules

The object key alone should not be the permission system.

19.4 Indexing Metadata

Applications often need search over metadata, not the object bytes themselves.

Examples:

search files by filename, tag, owner, created date
search PDFs by OCR text and metadata
search media library by dimensions, duration, language, transcript terms

This is where search and file systems meet.

19.5 Audit Logging

Sensitive file systems often log:

who uploaded a file
who downloaded it
who changed permissions
who deleted or restored it

Audit trails are critical in enterprise SaaS, finance, healthcare, and security tools.

19.6 Soft Delete Patterns

Soft delete means a record is marked deleted in metadata before hard physical removal.

Why it helps:

recovery from mistakes
retention window enforcement
asynchronous cleanup jobs

The object may remain in storage until retention rules allow permanent deletion.

19.7 Retention Policies

Retention policies control how long data stays available or must be preserved.

This matters for:

compliance
customer contracts
internal audits
backup windows

19.8 Why Metadata Usually Does Not Live Only in the Object Store

Object stores are not designed to answer business questions efficiently. They are optimized for reading and writing objects by key, not for ad hoc queries across many objects.

Questions like these belong in a metadata DB or search index:

show all files owned by tenant X uploaded in the last 7 days
list all pending virus scans
find all files shared externally
restore the previous version of this document
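
To make the first two questions concrete, here is a sketch using SQLite as a stand-in for the metadata DB. The `files` table, its columns, and the helper functions are hypothetical.

```python
import sqlite3
from typing import List

SCHEMA = """
CREATE TABLE files (
    id INTEGER PRIMARY KEY,
    tenant_id TEXT NOT NULL,
    object_key TEXT NOT NULL UNIQUE,   -- where the bytes live in object storage
    filename TEXT NOT NULL,
    size_bytes INTEGER NOT NULL,
    scan_status TEXT NOT NULL DEFAULT 'pending',
    created_at INTEGER NOT NULL        -- Unix seconds
);
CREATE INDEX idx_files_tenant_created ON files (tenant_id, created_at);
"""


def recent_files_for_tenant(conn: sqlite3.Connection, tenant_id: str,
                            now: int, days: int = 7) -> List[str]:
    """Filenames uploaded by a tenant within the last `days` days, newest first."""
    cutoff = now - days * 86400
    rows = conn.execute(
        "SELECT filename FROM files WHERE tenant_id = ? AND created_at >= ? "
        "ORDER BY created_at DESC",
        (tenant_id, cutoff),
    )
    return [row[0] for row in rows]


def pending_scans(conn: sqlite3.Connection) -> List[str]:
    """Object keys that still await a virus scan."""
    rows = conn.execute(
        "SELECT object_key FROM files WHERE scan_status = 'pending' ORDER BY id"
    )
    return [row[0] for row in rows]
```

Both queries are a single indexed lookup in a database. Answering them from object storage alone would require listing and inspecting many objects.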

20. Versioning

20.1 Object Versioning

Versioning means keeping multiple historical states of an object rather than replacing the old state permanently.

This supports:

rollback after mistakes
accidental deletion protection
auditability
legal retention

20.2 File History

In user-facing products such as Google Drive or Dropbox, file history is a product feature.

Under the hood, this may mean:

multiple object versions in storage
metadata rows pointing to the active version
retention rules controlling how long old versions are kept

20.3 Rollback Capability

Rollback often means changing metadata to point to an older version rather than mutating the object in place.

This is simpler and safer in distributed systems.
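
A minimal sketch of pointer-based versioning, assuming each version is an immutable object key and the metadata row only tracks which one is current. `FileRecord` and its methods are illustrative names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FileRecord:
    """Metadata row: immutable version keys plus a pointer to the current one."""
    version_keys: List[str] = field(default_factory=list)
    current: Optional[int] = None  # index into version_keys; None means deleted

    def put(self, object_key: str) -> int:
        """Record a new version and make it current; older versions are kept."""
        self.version_keys.append(object_key)
        self.current = len(self.version_keys) - 1
        return self.current

    def rollback(self, version: int) -> None:
        """Point at an older version without touching any stored bytes."""
        if not 0 <= version < len(self.version_keys):
            raise IndexError("unknown version")
        self.current = version

    def delete(self) -> None:
        """Hide the file, similar to a delete marker; versions stay restorable."""
        self.current = None

    def current_key(self) -> Optional[str]:
        return None if self.current is None else self.version_keys[self.current]
```

Rollback and undelete are both a single metadata update, which is why they are cheap and safe compared with rewriting objects.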

20.4 Accidental Deletion Protection

Versioning helps because delete can mean:

create a delete marker
hide current version
retain earlier versions for recovery

This protects users from mistakes and protects operators from incidents.

20.5 Legal Retention and Holds

Some systems need to prevent deletion for compliance or litigation.

That means versioning and retention logic must support:

immutable retention windows
hold flags
auditability around deletion attempts

20.6 Overwrite Semantics

Without versioning, overwrite means old content is lost.

With versioning, overwrite usually means:

new object version becomes current
old version remains restorable until policy cleanup

20.7 Tradeoffs

better safety and recovery
higher storage cost
more complex metadata and lifecycle management

Production systems usually accept that tradeoff for important user data.

21. Image Optimization

21.1 Why Image Optimization Exists

Raw uploaded images are often too large, too slow, or in the wrong format for user-facing delivery.

Image optimization exists to improve:

page load time
bandwidth cost
visual quality per byte
rendering on different device sizes

This matters for profile pictures, e-commerce catalogs, social posts, dashboards, and documentation platforms.

21.2 Resizing and Thumbnails

Common derivatives:

thumbnail
small card image
medium detail view
large zoomable image

Why precompute sizes:

repeated on-the-fly resizing is expensive
predictable variants simplify caching
UI surfaces often reuse the same size classes
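
The arithmetic behind precomputed size classes is small enough to sketch. The `VARIANTS` bounding boxes are made-up examples, and the functions only compute target dimensions; the actual pixel resampling would be done by an image library.

```python
from typing import Dict, Tuple

# Hypothetical size classes: name -> (max width, max height) in pixels.
VARIANTS = {"thumbnail": (150, 150), "card": (400, 400), "detail": (1200, 1200)}


def fit_within(width: int, height: int, max_width: int, max_height: int) -> Tuple[int, int]:
    """Scale down to fit the box while keeping aspect ratio; never upscale."""
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def variant_dimensions(width: int, height: int) -> Dict[str, Tuple[int, int]]:
    """Target dimensions of every size class for one original image."""
    return {name: fit_within(width, height, *box) for name, box in VARIANTS.items()}
```

For a 4000x2000 original, `variant_dimensions` yields 150x75, 400x200, and 1200x600, so every surface gets a predictable, cacheable derivative.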

21.3 Responsive Image Delivery

Different devices need different image sizes.

Serving a giant desktop asset to a mobile client wastes bandwidth.

Responsive delivery uses:

multiple size variants
CDN selection or URL conventions
client hints or frontend image markup strategies
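
One simple selection rule is to serve the smallest precomputed variant that still covers the rendered width times the device pixel ratio. The width ladder below is an assumed example.

```python
import bisect

VARIANT_WIDTHS = [320, 640, 960, 1280, 1920]  # hypothetical precomputed widths, sorted


def pick_variant_width(css_width: int, device_pixel_ratio: float = 1.0) -> int:
    """Smallest variant at least as wide as the physical pixels needed, else the largest."""
    needed = css_width * device_pixel_ratio
    index = bisect.bisect_left(VARIANT_WIDTHS, needed)
    return VARIANT_WIDTHS[min(index, len(VARIANT_WIDTHS) - 1)]
```

Snapping requests to a fixed ladder also keeps the number of distinct cached URLs small.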

21.4 Format Conversion Basics

Common formats:

JPEG: widely compatible, good for photos
PNG: lossless and good for sharp graphics or transparency-heavy content
WebP: usually smaller than JPEG or PNG at comparable quality, with broad browser support
AVIF: often compresses better than WebP or JPEG, but encoding is slower and tooling and client support are less uniform

The right format depends on content type, compatibility, and CPU cost.

21.5 Lazy Loading Relevance

Lazy loading reduces unnecessary downloads for off-screen images.

This is mostly a frontend delivery concern, but backend and CDN design still matter because smaller, optimized derivatives make lazy loading much more effective.

21.6 CDN Image Optimization

Some systems optimize images on request at the edge or CDN layer.

Benefits:

fewer precomputed variants needed
flexible resizing
device-aware format negotiation

Tradeoffs:

added compute cost at the edge or image service
cache fragmentation if variant space is uncontrolled

21.7 Quality vs Size Tradeoffs

Image optimization is always a tradeoff.

Too much compression causes artifacts. Too little wastes bandwidth and storage.

E-commerce sites care a lot here because product clarity affects conversion, but slow pages also hurt conversion.

21.8 Async Processing Pipelines

Common image flow:

upload original asset
validate and scan
generate variants
store derivatives in object storage
publish metadata and CDN paths

Profile pictures, avatar systems, and commerce catalogs commonly use this asynchronous derivative pipeline.

22. Video Transcoding

22.1 Why Transcoding Exists

A raw uploaded video is rarely suitable for direct delivery to every device and network condition.

Transcoding converts the input into delivery-friendly formats and variants.

Why it is needed:

devices support different codecs and containers
users have different bandwidth conditions
multiple resolutions are needed
streaming systems need chunked delivery formats

22.2 Codec Basics

A codec defines how video and audio are compressed and decoded.

You do not need deep media math in most interviews, but you should know:

codecs affect compatibility, compression efficiency, and CPU cost
better compression often costs more encode time
playback support across devices matters as much as compression ratio

22.3 Bitrate Adaptation and Resolution Variants

Adaptive streaming works by producing multiple renditions such as:

240p low bitrate
480p medium bitrate
720p or 1080p higher bitrate

The player switches between them based on network and device conditions.

This reduces buffering and improves startup reliability.
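
A simplified sketch of the selection step: pick the highest rendition whose bitrate fits within a safety fraction of measured throughput. The bitrate ladder and the 0.8 safety factor are illustrative assumptions. Real players also consider buffer level and screen size.

```python
from typing import List, NamedTuple


class Rendition(NamedTuple):
    name: str
    bitrate_kbps: int


# Hypothetical ladder; real ladders are tuned per content type and codec.
LADDER = [
    Rendition("240p", 400),
    Rendition("480p", 1200),
    Rendition("720p", 2800),
    Rendition("1080p", 5000),
]


def choose_rendition(throughput_kbps: float, ladder: List[Rendition] = LADDER,
                     safety: float = 0.8) -> Rendition:
    """Highest rendition within the bandwidth budget, else the lowest one."""
    budget = throughput_kbps * safety
    ordered = sorted(ladder, key=lambda r: r.bitrate_kbps)
    affordable = [r for r in ordered if r.bitrate_kbps <= budget]
    return affordable[-1] if affordable else ordered[0]
```

Leaving headroom below the measured throughput is what reduces rebuffering when bandwidth dips briefly.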

22.4 HLS and DASH Basics

HLS and DASH are common adaptive streaming approaches.

High-level idea:

split video into chunks or segments
generate manifests listing available renditions and segments
player fetches segments dynamically based on bandwidth and playback logic
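
For HLS, the manifest that lists renditions is a plain-text playlist. The sketch below builds a bare-bones multivariant playlist using the `#EXTM3U` and `#EXT-X-STREAM-INF` tags; the `Variant` type and the URIs are hypothetical, and production playlists usually carry more attributes such as codecs.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    bandwidth_bps: int   # peak bits per second for the rendition
    width: int
    height: int
    uri: str             # per-rendition media playlist


def master_playlist(variants: List[Variant]) -> str:
    """Render a minimal HLS multivariant playlist, lowest bandwidth first."""
    lines = ["#EXTM3U"]
    for v in sorted(variants, key=lambda v: v.bandwidth_bps):
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={v.bandwidth_bps},RESOLUTION={v.width}x{v.height}"
        )
        lines.append(v.uri)
    return "\n".join(lines) + "\n"
```

The player downloads this file first, then fetches the media playlist and segments of whichever rendition it selects.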

22.5 Async Job Processing

Video transcoding is expensive, so it is almost always asynchronous.

Common architecture:

flowchart LR
	UP[Uploaded Video] --> Q[Queue / Job Orchestrator]
	Q --> TR[Transcoding Workers]
	TR --> PACK[Package HLS / DASH Variants]
	TR --> THUMB[Thumbnail Extraction]
	PACK --> OBJ[(Object Storage)]
	THUMB --> OBJ
	OBJ --> CDN[CDN]

22.6 Queue-Based Transcoding Systems

Why queues are used:

uploads arrive in bursts
transcoding jobs vary wildly by duration and cost
retries and worker scaling need decoupling
backpressure needs to be explicit

Workers may be specialized by codec, region, or job size.
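
The decoupling, retry, and backpressure ideas can be sketched with an in-process bounded queue. A production system would use a durable queue service instead; `try_enqueue`, `run_worker`, and the job dictionary shape are assumptions.

```python
import queue
from typing import Callable, Dict, List


def try_enqueue(jobs: "queue.Queue[dict]", video_id: str) -> bool:
    """Explicit backpressure: refuse new work when the bounded queue is full."""
    try:
        jobs.put_nowait({"video_id": video_id, "attempts": 0})
        return True
    except queue.Full:
        return False


def run_worker(jobs: "queue.Queue[dict]", transcode: Callable[[str], None],
               max_attempts: int = 3) -> Dict[str, List[str]]:
    """Drain the queue; retry failed jobs, then dead-letter them after max_attempts."""
    outcome: Dict[str, List[str]] = {"done": [], "dead_letter": []}
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return outcome
        try:
            transcode(job["video_id"])
            outcome["done"].append(job["video_id"])
        except Exception:
            attempts = job["attempts"] + 1
            if attempts >= max_attempts:
                outcome["dead_letter"].append(job["video_id"])
            else:
                jobs.put({"video_id": job["video_id"], "attempts": attempts})
```

The dead-letter list keeps a corrupted input from being retried forever and blocking healthy jobs behind it.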

22.7 Storage Explosion Challenges

Video systems multiply data quickly.

One uploaded asset may create:

multiple renditions
audio tracks
captions or subtitles
thumbnails
preview clips
manifests

This is why video platforms think carefully about retention, renditions, archival tiers, and whether every input truly needs every derivative.
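
A back-of-the-envelope estimate shows how quickly derivatives add up. The sketch assumes constant bitrates and ignores audio, captions, and container overhead; the ladder values in the usage note are illustrative.

```python
from typing import Iterable


def rendition_bytes(bitrate_kbps: int, duration_seconds: int) -> int:
    """Approximate size of one rendition: kilobits per second -> bytes."""
    return bitrate_kbps * 1000 * duration_seconds // 8


def total_derivative_bytes(bitrates_kbps: Iterable[int], duration_seconds: int) -> int:
    """Combined size of all renditions of one video."""
    return sum(rendition_bytes(b, duration_seconds) for b in bitrates_kbps)
```

For a 10-minute video with renditions at 400, 1200, 2800, and 5000 kbps, `total_derivative_bytes([400, 1200, 2800, 5000], 600)` is 705,000,000 bytes, roughly 705 MB of derivatives before counting the original upload.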

22.8 Playback Optimization

Playback quality depends on more than transcoding.

It also depends on:

segment sizing
CDN placement
startup buffer strategy
manifest design
thumbnail or preview availability
device compatibility testing

YouTube- and Netflix-style systems invest heavily in startup latency and rebuffer reduction because user drop-off is highly sensitive to playback quality.

22.9 Common Failure Cases

queue backlogs causing long time-to-playable
corrupted input files causing worker crashes
incompatible renditions for some clients
expensive reprocessing after pipeline changes
storage growth from keeping every derivative forever

23. Thumbnails and Previews

23.1 Why Thumbnails Exist

Thumbnails help users decide what to open before downloading or playing the full asset.

They matter for:

video browsing
PDF and document explorers
image galleries
cloud drive UIs
e-commerce product grids

23.2 Video Thumbnail Generation

Video thumbnails may be generated by:

picking fixed offsets
picking keyframes
choosing frames based on saliency or quality heuristics

Why frame selection matters:

a black frame or transition frame makes the content look broken
better thumbnails improve CTR and perceived quality

23.3 Document Preview Generation

Documents such as PDFs or presentations often need previews.

Typical flow:

extract first page image or several page previews
store derived images in object storage
cache them behind a CDN

23.4 Async Preview Generation

Preview generation is usually asynchronous because:

file types vary
extraction can be CPU-heavy
malformed files must be isolated from the main request path

23.5 Caching Strategies

Thumbnail and preview requests are highly cacheable.

Common patterns:

store deterministic derivative paths
serve from CDN
cache immutable variants aggressively when the version is part of the URL
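
A sketch of deterministic, versioned derivative paths and the cache headers that go with them. The key layout and TTL values are assumptions, and `public` is only appropriate for assets that are not access-controlled.

```python
from typing import Dict


def derivative_key(asset_id: str, version: int, variant: str, extension: str) -> str:
    """Same inputs always give the same key; a new version gives a new URL."""
    return f"derivatives/{asset_id}/v{version}/{variant}.{extension}"


def cache_headers(versioned: bool) -> Dict[str, str]:
    """Long-lived immutable caching is safe only when the URL changes with the content."""
    if versioned:
        return {"Cache-Control": "public, max-age=31536000, immutable"}
    return {"Cache-Control": "public, max-age=60"}
```

Because a replaced file gets a new version segment in its path, stale previews are avoided without explicit CDN invalidation.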

23.6 Failure Cases

preview service timing out on large or malformed documents
thumbnail chosen from poor video frame
preview cache not invalidated after replacement or new version upload

24. Compression

24.1 Why Compression Exists

Compression reduces storage and network transfer size.

It matters because:

bandwidth is expensive
users have limited network quality
storage multiplies at scale
smaller payloads improve latency when CPU cost is acceptable

24.2 Lossless vs Lossy

Type	Meaning	Common use
Lossless	Original data can be reconstructed exactly	Text, archives, logs, some images, many document workflows
Lossy	Some information is discarded for higher compression	Photos, audio, video

24.3 CPU Tradeoffs

Compression is not free.

Stronger compression can save bandwidth or storage but cost more CPU and latency.

This is why production systems choose compression differently for:

hot online serving
asynchronous background processing
archival storage

24.4 Compression in Uploads and Downloads

Compression may happen at multiple places:

client-side before upload in some media apps
server-side during processing
CDN or transport layer for text-based responses
archival pipeline for logs and backups

Do not blindly recompress already compressed media formats. It may waste CPU and reduce quality.

24.5 Media-Specific Considerations

Images and videos already use domain-specific codecs. General-purpose compression on top often gives limited benefit.

Instead, media optimization usually means:

choosing the right codec or format
choosing quality settings carefully
generating the right resolution variants

24.6 Archive Workflows

Archive workflows such as backups and log retention often use strong compression because:

data is cold
latency matters less
storage savings compound heavily over time

24.7 Where Compression Should Happen

Good rule of thumb:

compress once in a controlled part of the pipeline
avoid repeated transcoding or recompression unless there is a clear reason
separate serving optimization from archival optimization

24.8 Common Mistakes

recompressing lossy media too many times
optimizing for smallest size and harming user experience
using CPU-heavy compression in hot request paths without need

25. How These Systems Connect in Real Architectures

The most useful mental model is that search, recommendation, and media systems are not isolated services. They are derived-serving systems around a source-of-truth core.

25.1 Example: E-Commerce Architecture

Product system:

DB stores product, inventory, seller, and pricing records
search index stores analyzed text plus filterable fields
ranking combines lexical relevance, popularity, margin, seller trust, and availability
recommendation system generates related items and home feed modules
object storage stores product images and videos
image pipeline generates thumbnails and responsive variants
CDN serves optimized media globally

Failure discussion:

stale index may show out-of-stock products
stale media caches may show old images after updates
poor ranking can bury relevant products even if retrieval worked

25.2 Example: SaaS Document Platform

Document system:

metadata DB stores ownership, permissions, and workflow state
object storage stores uploaded files and derived previews
search index stores filename, OCR text, comments, tags, and ACL-aware retrieval fields
recommendation or discovery surface suggests recent or relevant docs
signed URLs protect downloads
antivirus and preview generation run asynchronously

Failure discussion:

ACL lag can leak confidential docs if search filtering is wrong
metadata/object mismatch can show broken files
preview lag makes the product feel stale even when upload technically succeeded

25.3 Example: Short-Form Video Platform

flowchart LR
	CREATOR[Creator Upload] --> OBJ[(Object Storage)]
	OBJ --> MEDIA[Transcode + Thumbnail + Moderation]
	MEDIA --> CDN[CDN]
	MEDIA --> META[(Metadata DB)]
	META --> INDEX[Search / Hashtag Index]
	META --> REC[Candidate Generation + Ranking]
	REC --> FEED[Home Feed Service]
	FEED --> USER[Viewer]
	CDN --> USER

System properties:

upload pipeline must be reliable and resumable
media pipeline must scale with bursty creator traffic
recommendation system must generate candidates and rank them quickly
search index may support creators, hashtags, captions, or sounds
CDN must absorb global playback traffic

25.4 Common Cross-System Failure Modes

source-of-truth DB updated but search index stale
object uploaded but metadata row missing
metadata row exists but object processing failed
recommendation service uses stale features or bad model rollout
CDN serves old media after overwrite because cache keying is wrong
ACL changes propagate inconsistently across DB, search, cache, and signed download logic

25.5 Strong Engineering Principles Across All These Systems

keep a clear source of truth
treat indexes, feeds, and media derivatives as derived serving layers
design async pipelines to be idempotent and replay-safe
measure freshness, not just latency
plan for backfills and reprocessing from day one
separate hard constraints from soft ranking
build for partial degradation, not only full success
treat permissions as first-class, especially in search and file access
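
As one illustration of the idempotent and replay-safe principle, an event handler can deduplicate on a stable key. The event shape and class name are hypothetical, and a real system would keep the seen-set in durable storage rather than in memory.

```python
from typing import Callable, Set


class IdempotentProcessor:
    """Run a handler at most once per (object key, version), even if events repeat."""

    def __init__(self, handler: Callable[[dict], None]) -> None:
        self._handler = handler
        self._seen: Set[str] = set()  # in-memory only for this sketch

    def handle(self, event: dict) -> bool:
        """Return True if the event was processed, False if it was a duplicate."""
        key = f"{event['object_key']}@{event['version']}"
        if key in self._seen:
            return False
        self._handler(event)
        # Record the key only after success so a failed attempt can be retried.
        self._seen.add(key)
        return True
```

With this guard, replaying an event stream during a backfill repeats no completed work, while events that previously failed are still picked up.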

26. Interview Playbook

If you are asked to design one of these systems, structure your answer around these questions:

What is the user-facing behavior and latency expectation?
What is the source of truth?
What derived indexes or serving structures are needed?
What is precomputed vs done online?
How does ranking or retrieval work?
What are the main failure cases and stale-data risks?
How do permissions, abuse, and compliance affect the design?
What changes at 10x scale?

26.1 High-Value Tradeoffs to Discuss

database query vs specialized search index
lexical retrieval vs semantic retrieval vs hybrid
freshness vs query performance
fanout-on-write vs fanout-on-read
precompute vs read-time ranking
direct upload vs backend proxy upload
object storage vs block storage vs file systems
exact counts vs approximate facets
aggressive personalization vs fairness and exploration

26.2 What Breaks at Scale

long-tail shard latency dominates search response time
popular prefixes overload autocomplete caches
ranking models become too expensive for online serving
celebrity fanout explodes write amplification
object store costs spike from derivative explosion and egress
background media pipelines backlog during traffic bursts
stale caches and delayed async processing create user-visible inconsistency

26.3 Final Mental Model

Search systems answer explicit intent.

Recommendation systems infer likely intent.

Object storage and media pipelines make large assets durable and deliverable.

The engineering challenge is rarely the isolated component. It is the interaction between source data, derived indexes, ranking, caching, asynchronous processing, permissions, and scale.

That is the level interviews usually want, and it is also the level production systems demand.
