Files
Computer-Fundamentals/c++/04_stl_deep_dive.md
T
tarun-elango be31df2d44 more text
2026-04-26 14:09:04 -04:00

12 KiB

File 4: STL Deep Dive

Learning Goals

By the end of this file, you should be able to:

  • choose standard containers based on access patterns, not habit
  • explain how core STL containers work internally
  • understand iterator categories and invalidation rules well enough to avoid subtle bugs
  • use algorithms library functions as first-class tools rather than optional extras
  • discuss STL complexity tradeoffs in interviews and system design conversations

This file assumes you already understand object lifetime, move semantics, and ownership. The STL is not “just a library.” It is a design philosophy built around generic programming, iterator-based abstraction, and predictable complexity.

What the STL Is Trying to Solve

Intuition

Most programs need the same families of operations:

  • store collections of data
  • traverse them efficiently
  • search, sort, transform, filter, and aggregate

The STL gives reusable building blocks for those tasks while preserving performance transparency.

Its core ideas are:

  • containers own and organize data
  • iterators provide a common traversal interface
  • algorithms operate over iterator ranges instead of hardcoding container types

That separation is one of the most important patterns in C++.

std::vector

Intuition

std::vector is the default dynamic array in C++. It stores elements contiguously and grows as needed.

If you do not have a strong reason to pick something else, vector is often the correct first choice.

Internal Mechanics

A vector typically stores:

  • a pointer to a contiguous heap buffer
  • its current size
  • its current capacity
flowchart LR
    A[vector object] --> B[data pointer]
    A --> C[size]
    A --> D[capacity]
    B --> E[element 0]
    B --> F[element 1]
    B --> G[element 2]

When capacity is exceeded, vector allocates a larger buffer, moves or copies elements into it, then frees the old buffer.

Why It Exists

Contiguous storage gives major benefits:

  • O(1) random access
  • strong cache locality
  • easy interop with C APIs and low-level buffers
  • efficient iteration and algorithm use

Practical Usage

  • numeric data
  • event buffers
  • parsed records
  • task queues with append-heavy patterns

Pitfalls

  • reallocation invalidates pointers, references, and iterators to elements
  • frequent small growth can cause repeated reallocations if capacity is not reserved
  • insertion in the middle is expensive because elements after the insertion point must shift

Real Advice

If you know approximate size up front, call reserve(). That is one of the highest-value micro-optimizations in ordinary C++ code.

std::deque

Intuition

deque is a double-ended queue optimized for efficient insertion and removal at both ends while still supporting indexed access.

Internal Mechanics

Unlike vector, deque is not typically one contiguous buffer. It is often implemented as a segmented structure: a map of fixed-size blocks.

This avoids whole-buffer reallocation for growth at the front or back.

Practical Usage

  • queue-like workloads needing both front and back operations
  • sliding window logic
  • schedulers and work-stealing structures in some implementations

Pitfalls

  • weaker cache locality than vector
  • assumptions about contiguity are invalid
  • iterators can be invalidated in ways different from vector

std::list

Intuition

list is a doubly linked list. It exists because some workloads benefit from stable iterators and cheap insertion or removal at known positions.

Internal Mechanics

Each node usually stores:

  • the element value
  • pointer to previous node
  • pointer to next node
flowchart LR
    A[Node] --> B[prev]
    A --> C[value]
    A --> D[next]

Practical Usage

In practice, far fewer workloads need list than many engineers assume. It can be useful when:

  • you already hold iterators to splice locations
  • stable node addresses matter
  • frequent insertion and erasure in the middle dominate performance and traversal locality matters less

Common Misconception

“List is better for lots of inserts and deletes.”

Only sometimes. Pointer chasing hurts cache locality badly. In many real workloads, vector still wins despite O(n) insertion because contiguous memory is so CPU-friendly.

Associative Containers: map and set

Intuition

Ordered associative containers maintain elements in sorted order and support logarithmic lookup, insertion, and removal.

Internal Mechanics

std::map and std::set are typically implemented as balanced binary search trees, commonly red-black trees.

flowchart TB
    A[8] --> B[4]
    A --> C[12]
    B --> D[2]
    B --> E[6]
    C --> F[10]
    C --> G[14]

Why this matters:

  • elements are kept ordered
  • lookup is O(log n)
  • iterating produces sorted order
  • node-based storage means references and iterators are often more stable than in vector

Practical Usage

  • ordered dictionaries
  • interval or range logic using lower_bound and upper_bound
  • workloads where sorted traversal is part of the contract

Pitfalls

  • higher per-element overhead than vector-based approaches
  • poorer cache locality because nodes are separately allocated
  • using map by default when ordered traversal is not needed

Hash-Based Containers: unordered_map and unordered_set

Intuition

Hash-based containers optimize for average-case constant-time lookup rather than ordering.

Internal Mechanics

An unordered_map typically uses:

  • a bucket array
  • a hash function to choose a bucket
  • collision handling, often with chains or equivalent node structures
flowchart LR
    A[key hash] --> B[bucket array]
    B --> C[bucket 0]
    B --> D[bucket 1]
    B --> E[bucket 2]
    D --> F[node -> node]

Practical Usage

  • caches
  • symbol tables
  • frequency counting
  • routing tables or registries when order does not matter

Tradeoffs

  • average O(1) lookup, but worst-case O(n)
  • memory overhead from buckets and nodes
  • iteration order is not stable or meaningful

Pitfalls

  • bad custom hash functions hurt performance
  • rehashing invalidates iterators in many cases
  • using unordered_map when deterministic iteration order is important

Iterators

Intuition

Iterators generalize traversal so algorithms can work across many containers.

Instead of writing one sorting routine for vectors and another for arrays, algorithms operate on iterator ranges.

Categories Matter

Different iterators support different capabilities:

  • input iterator: read sequentially
  • forward iterator: one-way multi-pass traversal
  • bidirectional iterator: move both forward and backward
  • random-access iterator: jump in constant time

This is why std::sort works with vector iterators but not list iterators. Sorting efficiently requires random access.

Mental Model

Think of an iterator as a generalized cursor with container-specific guarantees.

Practical Usage

  • generic algorithms over different container types
  • decoupling traversal from storage details
  • writing reusable library code

Pitfalls

  • invalidating iterators after insertions or erasures
  • dereferencing end()
  • assuming all iterators support the same operations

Iterator Invalidation

Intuition

This is one of the most frequent real-world STL bug sources. The container changes, but code keeps using old iterators, references, or pointers.

Practical Rules of Thumb

  • vector reallocation invalidates all iterators, pointers, and references to elements
  • list node insertions usually preserve iterators to other nodes
  • unordered containers may invalidate iterators when rehashing occurs

Do not rely on vague memory here. For critical code, check the container's exact guarantees.

Algorithms Library

Intuition

The algorithms library exists so you can express intent at a higher level than manual loops while still staying efficient.

Common examples include:

  • std::sort
  • std::find_if
  • std::transform
  • std::accumulate
  • std::lower_bound
  • std::remove_if

Why It Matters

Algorithms make code:

  • more declarative
  • easier to review
  • easier for the compiler to optimize consistently
  • less error-prone than handwritten index manipulation

Example

std::vector<int> values = {5, 1, 4, 2, 3};
std::sort(values.begin(), values.end());

You do not need to reimplement quicksort or mergesort in production code unless the problem specifically requires it.

The Erase-Remove Idiom

This is a classic STL pattern:

values.erase(
    std::remove_if(values.begin(), values.end(), [](int v) { return v % 2 == 0; }),
    values.end());

Why it exists:

  • remove_if reorders the range so kept elements move to the front
  • it returns the new logical end
  • erase actually shrinks the container

Understanding this pattern signals real STL fluency.

Complexity Cheat Sheet

Sequence Containers

Container Random Access Push Back Push Front Insert Middle Iterator Stability
vector O(1) amortized O(1) O(n) O(n) weak under reallocation
deque O(1) O(1) O(1) O(n) moderate, container-specific
list O(n) O(1) O(1) O(1) with iterator strong for other nodes

Associative Containers

Container Lookup Insert Order Typical Internal Structure
map O(log n) O(log n) sorted balanced tree
set O(log n) O(log n) sorted balanced tree
unordered_map average O(1) average O(1) none hash table
unordered_set average O(1) average O(1) none hash table

Why Interviews Ask About This

Interviewers are usually not checking if you memorized tables. They want to know whether you can choose the right structure for a workload.

Examples:

  • frequent append plus indexed reads: likely vector
  • ordered lookup with range queries: likely map
  • key lookup without ordering: likely unordered_map
  • middle splicing with stable iterators: maybe list, but verify locality costs first

Container Selection in Real Systems

Prefer vector More Often Than You Think

Because of contiguity, vector is often the fastest general-purpose container even when its theoretical complexity looks worse than a node-based alternative.

Reach for Ordered Containers When Order Matters as Part of the Contract

If you need sorted traversal, nearest-key queries, or stable ordering semantics, map earns its cost.

Use Hash Containers When Key Lookup Dominates and Order Does Not Matter

This is common in compilers, interpreters, caches, and service registries.

Avoid Cargo-Culting list

Many engineers learn linked lists academically and then overestimate their usefulness in high-performance software.

Common STL Pitfalls

  • forgetting iterator invalidation rules
  • using operator[] on map or unordered_map when accidental insertion is undesirable
  • choosing containers by asymptotic complexity alone and ignoring memory locality
  • copying large containers accidentally when references or moves were intended
  • assuming all algorithms work with all iterator categories

Interview Checkpoints

You should be able to explain:

  • why vector is often the default container
  • how vector reallocation works and why reserve() helps
  • the internal difference between map and unordered_map
  • what iterator categories mean in practice
  • why std::sort requires random-access iterators
  • how the erase-remove idiom works
  • why cache locality can beat seemingly better asymptotic complexity

What Comes Next

The final file moves from language and library mechanics into systems-level C++: threads, locks, atomics, performance work, and the patterns that show up in production engines, compilers, and low-latency systems.