12 KiB
File 4: STL Deep Dive
Learning Goals
By the end of this file, you should be able to:
- choose standard containers based on access patterns, not habit
- explain how core STL containers work internally
- understand iterator categories and invalidation rules well enough to avoid subtle bugs
- use algorithms library functions as first-class tools rather than optional extras
- discuss STL complexity tradeoffs in interviews and system design conversations
This file assumes you already understand object lifetime, move semantics, and ownership. The STL is not “just a library.” It is a design philosophy built around generic programming, iterator-based abstraction, and predictable complexity.
What the STL Is Trying to Solve
Intuition
Most programs need the same families of operations:
- store collections of data
- traverse them efficiently
- search, sort, transform, filter, and aggregate
The STL gives reusable building blocks for those tasks while preserving performance transparency.
Its core ideas are:
- containers own and organize data
- iterators provide a common traversal interface
- algorithms operate over iterator ranges instead of hardcoding container types
That separation is one of the most important patterns in C++.
std::vector
Intuition
std::vector is the default dynamic array in C++. It stores elements contiguously and grows as needed.
If you do not have a strong reason to pick something else, vector is often the correct first choice.
Internal Mechanics
A vector typically stores:
- a pointer to a contiguous heap buffer
- its current size
- its current capacity
flowchart LR
A[vector object] --> B[data pointer]
A --> C[size]
A --> D[capacity]
B --> E[element 0]
B --> F[element 1]
B --> G[element 2]
When capacity is exceeded, vector allocates a larger buffer, moves or copies elements into it, then frees the old buffer.
Why It Exists
Contiguous storage gives major benefits:
- O(1) random access
- strong cache locality
- easy interop with C APIs and low-level buffers
- efficient iteration and algorithm use
Practical Usage
- numeric data
- event buffers
- parsed records
- task queues with append-heavy patterns
Pitfalls
- reallocation invalidates pointers, references, and iterators to elements
- frequent small growth can cause repeated reallocations if capacity is not reserved
- insertion in the middle is expensive because elements after the insertion point must shift
Real Advice
If you know approximate size up front, call reserve(). That is one of the highest-value micro-optimizations in ordinary C++ code.
std::deque
Intuition
deque is a double-ended queue optimized for efficient insertion and removal at both ends while still supporting indexed access.
Internal Mechanics
Unlike vector, deque is not typically one contiguous buffer. It is often implemented as a segmented structure: a map of fixed-size blocks.
This avoids whole-buffer reallocation for growth at the front or back.
Practical Usage
- queue-like workloads needing both front and back operations
- sliding window logic
- schedulers and work-stealing structures in some implementations
Pitfalls
- weaker cache locality than vector
- assumptions about contiguity are invalid
- iterators can be invalidated in ways different from vector
std::list
Intuition
list is a doubly linked list. It exists because some workloads benefit from stable iterators and cheap insertion or removal at known positions.
Internal Mechanics
Each node usually stores:
- the element value
- pointer to previous node
- pointer to next node
flowchart LR
A[Node] --> B[prev]
A --> C[value]
A --> D[next]
Practical Usage
In practice, far fewer workloads need list than many engineers assume. It can be useful when:
- you already hold iterators to splice locations
- stable node addresses matter
- frequent insertion and erasure in the middle dominate performance and traversal locality matters less
Common Misconception
“List is better for lots of inserts and deletes.”
Only sometimes. Pointer chasing hurts cache locality badly. In many real workloads, vector still wins despite O(n) insertion because contiguous memory is so CPU-friendly.
Associative Containers: map and set
Intuition
Ordered associative containers maintain elements in sorted order and support logarithmic lookup, insertion, and removal.
Internal Mechanics
std::map and std::set are typically implemented as balanced binary search trees, commonly red-black trees.
flowchart TB
A[8] --> B[4]
A --> C[12]
B --> D[2]
B --> E[6]
C --> F[10]
C --> G[14]
Why this matters:
- elements are kept ordered
- lookup is O(log n)
- iterating produces sorted order
- node-based storage means references and iterators are often more stable than in vector
Practical Usage
- ordered dictionaries
- interval or range logic using
lower_boundandupper_bound - workloads where sorted traversal is part of the contract
Pitfalls
- higher per-element overhead than vector-based approaches
- poorer cache locality because nodes are separately allocated
- using
mapby default when ordered traversal is not needed
Hash-Based Containers: unordered_map and unordered_set
Intuition
Hash-based containers optimize for average-case constant-time lookup rather than ordering.
Internal Mechanics
An unordered_map typically uses:
- a bucket array
- a hash function to choose a bucket
- collision handling, often with chains or equivalent node structures
flowchart LR
A[key hash] --> B[bucket array]
B --> C[bucket 0]
B --> D[bucket 1]
B --> E[bucket 2]
D --> F[node -> node]
Practical Usage
- caches
- symbol tables
- frequency counting
- routing tables or registries when order does not matter
Tradeoffs
- average O(1) lookup, but worst-case O(n)
- memory overhead from buckets and nodes
- iteration order is not stable or meaningful
Pitfalls
- bad custom hash functions hurt performance
- rehashing invalidates iterators in many cases
- using
unordered_mapwhen deterministic iteration order is important
Iterators
Intuition
Iterators generalize traversal so algorithms can work across many containers.
Instead of writing one sorting routine for vectors and another for arrays, algorithms operate on iterator ranges.
Categories Matter
Different iterators support different capabilities:
- input iterator: read sequentially
- forward iterator: one-way multi-pass traversal
- bidirectional iterator: move both forward and backward
- random-access iterator: jump in constant time
This is why std::sort works with vector iterators but not list iterators. Sorting efficiently requires random access.
Mental Model
Think of an iterator as a generalized cursor with container-specific guarantees.
Practical Usage
- generic algorithms over different container types
- decoupling traversal from storage details
- writing reusable library code
Pitfalls
- invalidating iterators after insertions or erasures
- dereferencing
end() - assuming all iterators support the same operations
Iterator Invalidation
Intuition
This is one of the most frequent real-world STL bug sources. The container changes, but code keeps using old iterators, references, or pointers.
Practical Rules of Thumb
- vector reallocation invalidates all iterators, pointers, and references to elements
- list node insertions usually preserve iterators to other nodes
- unordered containers may invalidate iterators when rehashing occurs
Do not rely on vague memory here. For critical code, check the container's exact guarantees.
Algorithms Library
Intuition
The algorithms library exists so you can express intent at a higher level than manual loops while still staying efficient.
Common examples include:
std::sortstd::find_ifstd::transformstd::accumulatestd::lower_boundstd::remove_if
Why It Matters
Algorithms make code:
- more declarative
- easier to review
- easier for the compiler to optimize consistently
- less error-prone than handwritten index manipulation
Example
std::vector<int> values = {5, 1, 4, 2, 3};
std::sort(values.begin(), values.end());
You do not need to reimplement quicksort or mergesort in production code unless the problem specifically requires it.
The Erase-Remove Idiom
This is a classic STL pattern:
values.erase(
std::remove_if(values.begin(), values.end(), [](int v) { return v % 2 == 0; }),
values.end());
Why it exists:
remove_ifreorders the range so kept elements move to the front- it returns the new logical end
eraseactually shrinks the container
Understanding this pattern signals real STL fluency.
Complexity Cheat Sheet
Sequence Containers
| Container | Random Access | Push Back | Push Front | Insert Middle | Iterator Stability |
|---|---|---|---|---|---|
vector |
O(1) | amortized O(1) | O(n) | O(n) | weak under reallocation |
deque |
O(1) | O(1) | O(1) | O(n) | moderate, container-specific |
list |
O(n) | O(1) | O(1) | O(1) with iterator | strong for other nodes |
Associative Containers
| Container | Lookup | Insert | Order | Typical Internal Structure |
|---|---|---|---|---|
map |
O(log n) | O(log n) | sorted | balanced tree |
set |
O(log n) | O(log n) | sorted | balanced tree |
unordered_map |
average O(1) | average O(1) | none | hash table |
unordered_set |
average O(1) | average O(1) | none | hash table |
Why Interviews Ask About This
Interviewers are usually not checking if you memorized tables. They want to know whether you can choose the right structure for a workload.
Examples:
- frequent append plus indexed reads: likely
vector - ordered lookup with range queries: likely
map - key lookup without ordering: likely
unordered_map - middle splicing with stable iterators: maybe
list, but verify locality costs first
Container Selection in Real Systems
Prefer vector More Often Than You Think
Because of contiguity, vector is often the fastest general-purpose container even when its theoretical complexity looks worse than a node-based alternative.
Reach for Ordered Containers When Order Matters as Part of the Contract
If you need sorted traversal, nearest-key queries, or stable ordering semantics, map earns its cost.
Use Hash Containers When Key Lookup Dominates and Order Does Not Matter
This is common in compilers, interpreters, caches, and service registries.
Avoid Cargo-Culting list
Many engineers learn linked lists academically and then overestimate their usefulness in high-performance software.
Common STL Pitfalls
- forgetting iterator invalidation rules
- using
operator[]onmaporunordered_mapwhen accidental insertion is undesirable - choosing containers by asymptotic complexity alone and ignoring memory locality
- copying large containers accidentally when references or moves were intended
- assuming all algorithms work with all iterator categories
Interview Checkpoints
You should be able to explain:
- why
vectoris often the default container - how vector reallocation works and why
reserve()helps - the internal difference between
mapandunordered_map - what iterator categories mean in practice
- why
std::sortrequires random-access iterators - how the erase-remove idiom works
- why cache locality can beat seemingly better asymptotic complexity
What Comes Next
The final file moves from language and library mechanics into systems-level C++: threads, locks, atomics, performance work, and the patterns that show up in production engines, compilers, and low-latency systems.