tarun-elango be31df2d44 more text
2026-04-26 14:09:04 -04:00

File 1: Foundations of C++

Learning Goals

By the end of this file, you should be able to:

  • explain how C++ source code becomes a running executable
  • reason about basic types, object storage, and memory layout
  • distinguish stack allocation from heap allocation in practical terms
  • use pointers and references without treating them as magic syntax
  • debug common low-level failures with a structured mental model

This file is the foundation for the rest of the guide. If later topics like RAII, smart pointers, iterators, or multithreading feel abstract, come back here first. C++ becomes much easier once you can picture what the compiler produces and what memory actually looks like at runtime.

Why C++ Exists

C++ sits in an unusual position among mainstream languages. It gives you high-level abstractions such as classes, templates, exceptions, and a rich standard library, but it still lets you work close to the machine.

That combination is why C++ shows up in places where both abstraction and control matter:

  • game engines that need tight performance and custom memory behavior
  • trading systems that care about latency and predictable execution
  • databases, compilers, browsers, and storage engines that manipulate large amounts of structured data
  • embedded and systems code where resource use must be explicit

The core idea is not just “fast language.” Many languages are fast in some contexts. C++ is valuable because it lets you choose where to pay for abstraction and where to avoid it.

The Compilation Model

Intuition

In Python or JavaScript, you can often treat “running the code” as a direct action. In C++, there is a build pipeline between the source you write and the machine code the CPU executes. Understanding that pipeline helps explain many common C++ issues:

  • why header files exist
  • why template code often lives in headers
  • why link errors happen even when code compiles
  • why build systems matter so much in large codebases

The Big Picture

flowchart LR
    A[Source files .cpp] --> B[Preprocessor]
    H[Header files .h .hpp] --> B
    B --> C[Compiler]
    C --> D[Object files .o]
    D --> E[Linker]
    L[Libraries] --> E
    E --> F[Executable or shared library]

Preprocessing

Before the compiler sees your program, the preprocessor handles directives such as #include, #define, and #if. Include guards are not a separate directive; they are a pattern built from #ifndef, #define, and #endif.

What this means internally:

  • #include is essentially textual inclusion
  • macros are expanded before real compilation begins
  • conditional compilation can remove or include chunks of code based on flags

That is why headers can feel deceptively simple. A header is not linked in as a separate unit. Its contents are copied into each translation unit that includes it.

Example:

// math_utils.h
int add(int a, int b);

// main.cpp
#include "math_utils.h"

The compiler effectively sees the declaration from the header pasted into main.cpp before actual parsing.
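
Because inclusion is textual, a header pulled in twice would repeat its contents inside one translation unit. The sketch below mimics that by pasting a guarded header twice into a single file. The guard name MATH_UTILS_H, the Point struct, and the body of add are assumptions added for illustration; only the add declaration comes from the example above.

```cpp
// First textual copy of a hypothetical math_utils.h.
#ifndef MATH_UTILS_H
#define MATH_UTILS_H
struct Point {          // hypothetical type; defining it twice would be an error
    int x;
    int y;
};
int add(int a, int b);  // declaration only
#endif

// Second copy, as if another header had included math_utils.h again.
// MATH_UTILS_H is already defined, so the preprocessor discards this block.
#ifndef MATH_UTILS_H
#define MATH_UTILS_H
struct Point {
    int x;
    int y;
};
int add(int a, int b);
#endif

// The definition would normally live in a source file such as math_utils.cpp
// (assumed file name).
int add(int a, int b) {
    return a + b;
}
```

Without the guard, the second copy of Point would be rejected as a redefinition.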

Compilation

The compiler parses the preprocessed source, checks types, builds intermediate representations, optimizes code, and emits object files.

A .cpp file plus all text included into it after preprocessing becomes a translation unit.

Practical consequence:

  • syntax errors, type errors, and many template errors are compilation-time issues
  • each translation unit is compiled independently
  • the compiler only knows what declarations are visible in that translation unit

Linking

The linker resolves symbol references across object files and libraries.

If you declare a function in a header but forget to provide the definition in a compiled source file, compilation may succeed while linking fails.

Example:

// declared
int compute();

// used
int main() {
    return compute();
}

If no compiled object file contains a matching definition of compute, the linker reports an unresolved symbol.
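
A minimal sketch of the declaration/definition split, kept in one file for brevity. The caller run and the return value 41 are hypothetical; in a real project the definition of compute would usually sit in a different source file.

```cpp
// Declaration: enough for the compiler to type-check calls to compute().
int compute();

// run() is a hypothetical caller. It compiles with only the declaration visible.
int run() {
    return compute() + 1;
}

// Definition: this is what the linker must find in some object file.
// If it is removed, this file still compiles, but linking a program that
// calls run() fails with an unresolved (undefined) symbol error.
int compute() {
    return 41;
}
```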

Practical Usage

This model matters constantly in real systems:

  • large codebases use headers to expose interfaces and source files to hide implementation
  • build time can explode if headers pull in too much code
  • libraries are distributed as headers plus compiled binaries or as header-only template libraries
  • ABI and symbol compatibility matter when separate teams ship shared libraries

Common Pitfalls

  • confusing compile errors with link errors
  • putting non-inline function definitions in headers and causing multiple definition errors
  • overusing macros when constants, constexpr, or templates would be safer
  • including large dependency trees in headers, which slows builds and increases coupling

Variables, Types, and Object Storage

Intuition

A variable in C++ is not “just a name.” It is usually a named object with a type, storage duration, alignment requirements, and a region of memory associated with it.

The type system tells both the compiler and the reader what operations are legal and how many bytes an object likely occupies.

What a Type Really Means

A C++ type typically determines:

  • size, though this can vary by platform
  • alignment requirements
  • how the value is interpreted in memory
  • what operations are available
  • construction and destruction behavior for user-defined types

Consider:

int count = 42;
double ratio = 0.5;
char flag = 'Y';

These values are all just bits in memory, but the type tells the compiler how to read and manipulate those bits.
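
Size and alignment can be queried directly with sizeof and alignof. The sketch below uses a hypothetical Sample struct to show that the compiler may insert padding; the exact numbers depend on the platform, so only portable facts are asserted.

```cpp
#include <cstddef>

// Hypothetical record type used only to show padding.
struct Sample {
    char flag;     // 1 byte
    double ratio;  // commonly 8 bytes with 8-byte alignment
};

// These hold on every conforming implementation.
static_assert(sizeof(char) == 1, "char is one byte by definition");
static_assert(sizeof(int) >= sizeof(char), "int is at least as wide as char");

// Padding inserted by the compiler: struct size minus the sum of member sizes.
constexpr std::size_t sample_padding() {
    return sizeof(Sample) - (sizeof(char) + sizeof(double));
}
```

On a typical 64-bit desktop platform sample_padding() is 7, but that value is an observation about common ABIs, not a language guarantee.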

Value vs Representation

One useful systems-level habit is to separate a value from its representation.

For example, an int stores a signed integer value, but underneath it is represented in binary with a platform-defined size, usually 32 bits on modern desktop/server platforms. A pointer stores an address value, but underneath it is also just bits.

This distinction matters when you debug memory corruption. The CPU does not know “this is a tree node” in some abstract sense. It only sees instructions and bytes. The meaning comes from your program's types and the compiler's generated code.
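
One way to look at a representation without undefined behavior is to copy the bytes out with std::memcpy. This is a sketch assuming a platform that provides std::uint32_t; the helper names bytes_of and is_little_endian are hypothetical.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Copy the object representation of a 32-bit value into a byte array.
std::array<unsigned char, sizeof(std::uint32_t)> bytes_of(std::uint32_t value) {
    std::array<unsigned char, sizeof(std::uint32_t)> bytes{};
    std::memcpy(bytes.data(), &value, sizeof(value));
    return bytes;
}

// True on little-endian machines, where the least significant byte comes first.
bool is_little_endian() {
    return bytes_of(0x01020304u)[0] == 0x04;
}
```

The value 0x01020304 is the same on every machine; which byte sits at the lowest address is a property of the representation.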

Storage Duration

Every object in C++ has a storage duration. At a practical level, that answers: when does this object come into existence, and when does its storage stop being valid?

The main categories are:

  • automatic storage duration: usually local variables created when a scope is entered
  • static storage duration: global variables and static locals that live for the life of the program
  • dynamic storage duration: objects created explicitly on the heap, typically with new or via allocators

Later, RAII and smart pointers will build directly on this idea.
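
The three categories can be sketched in a few lines. The function names are hypothetical, and std::make_unique assumes C++14 or later.

```cpp
#include <memory>

// Static storage duration: counter is initialized once and survives between calls.
int next_id() {
    static int counter = 0;
    return ++counter;
}

// Automatic storage duration: local is created on each call and gone on return.
int doubled(int input) {
    int local = input * 2;
    return local;  // returned by value, so the caller gets its own copy
}

// Dynamic storage duration: the int lives until its owning unique_ptr is destroyed.
std::unique_ptr<int> make_counter(int start) {
    return std::make_unique<int>(start);
}
```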

Stack vs Heap

Intuition

Beginners often memorize “stack is fast, heap is slow.” That is too shallow and often misleading.

The real difference is about lifetime management and allocation strategy.

  • stack allocation is usually automatic and scoped
  • heap allocation is explicit or indirect and more flexible

Mental Model

flowchart TB
    A[Program starts] --> B[Call main]
    B --> C[Create stack frame for main]
    C --> D[Call function]
    D --> E[Create another stack frame]
    E --> F[Return from function]
    F --> G[Frame removed automatically]
    C --> H[Heap objects may outlive function scope]

Stack Allocation

Local variables inside a function usually live on the stack, though the exact implementation is up to the compiler and optimizer.

Example:

void process() {
    int retries = 3;
    double threshold = 0.75;
}

Why it exists:

  • function-local state is extremely common
  • scoped lifetimes are easy to manage automatically
  • creation and cleanup can often be handled without a general-purpose allocator

Internally, each function call usually gets a stack frame holding return information, saved registers, and local storage. When the function returns, that frame is popped.

Practical usage:

  • temporary computation state
  • small fixed-size objects
  • ownership that should never outlive the current scope

Pitfalls:

  • returning pointers or references to local variables
  • allocating very large arrays on the stack and causing stack overflow
  • assuming stack layout is fixed across compilers or optimization levels
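
The first pitfall is worth seeing in code. In this sketch the broken version is left commented out, and make_label is a hypothetical safe alternative that returns by value.

```cpp
#include <string>

// BROKEN (left commented out): returns the address of a local that is
// destroyed when the function returns, so the caller gets a dangling pointer.
// const std::string* broken_label() {
//     std::string label = "retry";
//     return &label;
// }

// Safe: return by value. The caller receives its own object.
std::string make_label(int retries) {
    std::string label = "retries=" + std::to_string(retries);
    return label;
}
```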

Heap Allocation

Heap allocation is used when an object's lifetime must outlive a scope, when size is only known at runtime, or when ownership must be transferred across components.

Example:

int* value = new int(42);
delete value;

Internally, new usually asks an allocator for a chunk of dynamic memory, then constructs the object in that memory. delete destroys the object and releases the storage.

Practical usage:

  • dynamic data structures such as graphs or trees
  • objects shared across subsystems
  • buffers sized from runtime input

Pitfalls:

  • memory leaks from forgetting delete
  • double delete from freeing the same pointer twice
  • dangling pointers after deletion
  • heap fragmentation and allocator overhead in performance-sensitive systems

Important note: in modern C++, direct new and delete should be rare in application code. Prefer containers and smart pointers. You still need to understand heap behavior because the abstractions are built on top of it.
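
A minimal sketch of the preferred style, assuming C++14 or later for std::make_unique. Node, make_node, and make_buffer are hypothetical names.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical type standing in for any heap-allocated object.
struct Node {
    int value;
};

// Ownership is transferred to the caller; no explicit delete is needed.
std::unique_ptr<Node> make_node(int value) {
    auto node = std::make_unique<Node>();
    node->value = value;
    return node;
}

// A buffer whose size is only known at runtime. std::vector owns the heap
// storage and releases it in its destructor.
std::vector<int> make_buffer(std::size_t count) {
    return std::vector<int>(count, 0);
}
```

Both functions still allocate on the heap; the difference is that cleanup is tied to the lifetime of the returned object instead of a manual delete.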

Pointers

Intuition

A pointer is a value whose job is to hold the address of another object. That is all. It is powerful because it lets you refer to memory indirectly.

Pointers exist because systems software constantly needs indirect access:

  • linked data structures
  • optional access to objects
  • efficient parameter passing without copying large objects
  • polymorphic behavior through base-class pointers
  • interaction with operating systems, hardware, and C APIs

Basic Form

int score = 99;
int* ptr = &score;

Here:

  • score is an int
  • &score means “address of score”
  • ptr stores that address
  • *ptr means “the int stored at that address”

Pointer Relationship Diagram

flowchart LR
    P[ptr] -->|stores address| S[score in memory]
    S --> V[99]

How It Works Internally

On a 64-bit system, a pointer is commonly 8 bytes. The compiler tracks the pointed-to type because pointer arithmetic and dereferencing depend on that type.

For example, incrementing an int* advances by sizeof(int) bytes, not by 1 byte.

int values[3] = {10, 20, 30};
int* p = values;
++p; // now points to values[1]

The compiler scales the increment according to the pointed-to type.
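
The scaling can be observed directly. In this sketch, sum_range and stride_in_bytes are hypothetical helpers; the second one measures how far apart two adjacent int elements are in bytes.

```cpp
#include <cstddef>

// Sum a contiguous range by walking a pointer from first to one past the last.
int sum_range(const int* first, const int* last) {
    int total = 0;
    for (const int* p = first; p != last; ++p) {
        total += *p;
    }
    return total;
}

// Distance in bytes between an element and its successor.
std::size_t stride_in_bytes(const int* p) {
    const char* here = reinterpret_cast<const char*>(p);
    const char* next = reinterpret_cast<const char*>(p + 1);
    return static_cast<std::size_t>(next - here);
}
```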

Practical Usage

  • traversal in low-level data structures
  • API boundaries that may accept nullable inputs
  • efficient manipulation of contiguous buffers
  • ownership and lifetime control in specialized libraries or allocators

Common Pitfalls

  • dereferencing nullptr
  • dereferencing uninitialized pointers
  • using a pointer after the object it points to has been destroyed
  • confusing ownership with access: a pointer can point to something without owning it

That last point is critical. A raw pointer does not tell you who is responsible for deleting the object.
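
A sketch of a pointer used purely for access, where nullptr means "nothing found". The function name find_first_negative is hypothetical.

```cpp
#include <vector>

// Returns a non-owning pointer into the vector, or nullptr if nothing matches.
// The caller must not delete it and must not use it after the vector is
// modified or destroyed.
const int* find_first_negative(const std::vector<int>& values) {
    for (const int& v : values) {
        if (v < 0) {
            return &v;
        }
    }
    return nullptr;
}
```

The vector owns the elements; the returned pointer only observes one of them. Nothing in the type says so, which is why the comment has to.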

References

Intuition

A reference is an alias to an existing object. It exists to make code safer and clearer than pointer-heavy interfaces when nullability and reseating are not needed.

Example:

void increment(int& value) {
    ++value;
}

Why References Exist

Without references, you would often pass pointers just to avoid copying objects. But pointers imply optionality and manual dereferencing.

References express a stronger contract:

  • this function expects a valid object
  • there is no need for null checks as part of normal usage
  • the alias should behave like the original object

Internal View

At the machine level, a reference is often implemented similarly to a pointer, but the language treats it differently.

Key properties:

  • must be initialized when created
  • cannot be reseated to refer to another object
  • cannot be null in a program with well-defined behavior; creating one requires dereferencing an invalid pointer first
  • use normal object syntax instead of pointer syntax

flowchart LR
    R[ref] -->|alias of| X[x]
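
The alias behavior can be checked in a few lines. reference_alias_demo is a hypothetical helper that returns true when every property it tests holds.

```cpp
// Returns true if every alias property checked below holds.
bool reference_alias_demo() {
    int x = 1;
    int y = 2;
    int& ref = x;  // must be bound at creation

    bool same_address = (&ref == &x);  // the alias has no separate identity

    ref = y;  // NOT a reseat: copies the value of y into x
    bool still_bound_to_x = (&ref == &x) && (x == 2);

    ref = 10;  // writes through to x; y is untouched
    return same_address && still_bound_to_x && x == 10 && y == 2;
}
```

The line ref = y is the one that most often surprises people coming from pointer-based code.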

Practical Usage

  • passing large objects efficiently without copying
  • operator overloading and fluent APIs
  • returning aliases to subobjects when lifetime is guaranteed

Pitfalls and Misconceptions

  • a reference is not an independent object; it does not own its target or manage the target's lifetime
  • returning a reference to a local variable is still invalid
  • “references are always safer than pointers” is too simplistic; pointers are the right tool when optionality, reseating, or explicit low-level behavior is required

Const Correctness

Intuition

const is one of the cheapest ways to make C++ code easier to reason about. It restricts mutation and therefore reduces the number of possible program states.

Practical Examples

void print(const std::string& name);

const int limit = 100;

Why it matters in real systems:

  • APIs become clearer about who is allowed to modify data
  • the compiler can catch accidental writes
  • reviewers can reason more quickly about ownership and side effects

Common Pitfalls

  • confusing const int* p with int* const p
  • using const inconsistently across interfaces
  • assuming const automatically implies thread safety or deep immutability
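
The first pitfall comes down to which side of the * the const sits on. In this sketch the lines that would not compile are left commented out; const_pointer_demo is a hypothetical name.

```cpp
// Exercises both const placements and returns a + b after the writes.
int const_pointer_demo() {
    int a = 1;
    int b = 2;

    const int* p = &a;  // pointer to const int: pointee is read-only via p
    // *p = 5;          // error: cannot write through p
    p = &b;             // ok: p itself can point elsewhere

    int* const q = &a;  // const pointer to int: q always points at a
    *q = 5;             // ok: pointee is writable
    // q = &b;          // error: q cannot be changed

    return a + *p;  // a is now 5, *p reads b
}
```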

Arrays, Decay, and Basic Memory Layout

Intuition

C++ inherits much of C's memory model. Arrays are contiguous blocks of elements, which is why they are fast for indexed access and cache-friendly iteration.

int values[4] = {1, 2, 3, 4};

The elements are stored adjacent in memory. That contiguity is why pointer arithmetic and array indexing are closely related.

Under the Hood

values[i] is conceptually equivalent to *(values + i).

This is powerful, but it is also why out-of-bounds access is dangerous. C++ does not automatically check bounds for raw arrays.
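
Since the language does no checking for raw arrays, any bounds check has to be written by hand. A minimal sketch, with read_at as a hypothetical helper that takes the length explicitly:

```cpp
#include <cstddef>

// Checked access: the raw array performs no bounds check, so the
// caller-supplied length is the only protection.
bool read_at(const int* data, std::size_t length, std::size_t index, int& out) {
    if (index >= length) {
        return false;  // would have been an out-of-bounds read
    }
    out = *(data + index);  // same element as data[index]
    return true;
}
```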

Practical Usage

  • numerical buffers
  • serialization code
  • high-performance loops
  • interop with C libraries

Pitfalls

  • array-to-pointer decay in function parameters
  • buffer overflows
  • assuming stack arrays automatically know their size when passed to a function

In most application code, prefer std::array for fixed-size arrays and std::vector for dynamic arrays. You will still see raw arrays in systems code, embedded code, and performance-critical paths.
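
A sketch of three ways to keep the length available once an array crosses a function boundary. All three function names are hypothetical.

```cpp
#include <array>
#include <cstddef>

// After decay only a pointer remains, so the length must travel separately.
int sum_raw(const int* values, std::size_t count) {
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}

// A reference-to-array parameter prevents decay and keeps N in the type.
template <std::size_t N>
std::size_t length_of(const int (&)[N]) {
    return N;
}

// std::array carries its size with it.
template <std::size_t N>
int sum_array(const std::array<int, N>& values) {
    int total = 0;
    for (int v : values) {
        total += v;
    }
    return total;
}
```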

A Debugging Mental Model

Intuition

Low-level bugs in C++ often feel mysterious only when you lack a runtime model. Most of the time, they reduce to one of a few categories:

  • invalid lifetime
  • invalid memory access
  • wrong ownership
  • incorrect assumptions about object state
  • data races in concurrent code

A Useful Diagnostic Loop

When debugging a crash or corruption issue, ask these questions in order:

  1. What object was accessed?
  2. Was it initialized?
  3. Is its lifetime still valid?
  4. Who owns it?
  5. Could memory nearby have been overwritten?
  6. Is the failure deterministic or timing-dependent?

That checklist is more valuable than memorizing debugger buttons.

Common Failure Modes

Segmentation Faults

Usually caused by dereferencing an invalid address such as:

  • nullptr
  • a dangling pointer
  • a wild pointer from uninitialized memory

Use-After-Free

You delete an object, but some pointer or reference still points to the old address. The address may still look valid for a while, which makes this class of bug subtle.
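
One hedged way to make "is this object still alive?" checkable is to observe it through std::weak_ptr instead of a raw pointer. This is a different technique from raw new and delete, shown only as a sketch; read_or is a hypothetical helper.

```cpp
#include <memory>

// Returns the observed value, or fallback if the object is already gone.
int read_or(const std::weak_ptr<int>& observer, int fallback) {
    if (std::shared_ptr<int> alive = observer.lock()) {
        return *alive;  // safe: alive keeps the object valid for this scope
    }
    return fallback;  // the owner released the object earlier
}
```

A raw pointer in the same position would still hold the old address after the owner released the object, and nothing about it would reveal that.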

Stack Corruption

Often caused by out-of-bounds writes into local arrays or incorrect pointer arithmetic.

Memory Leaks

The program keeps allocating memory without freeing it. In long-running services, that becomes a production issue rather than just a test annoyance.

Practical Tools

Real C++ debugging is easier when you use tooling, not just intuition:

  • compiler warnings: start with strict warnings enabled
  • AddressSanitizer: catches use-after-free, buffer overflows, and more
  • UndefinedBehaviorSanitizer: catches many invalid language-level operations
  • Valgrind on supported platforms: useful for leaks and invalid accesses
  • debugger: inspect stack frames, variables, and memory addresses

Example build flags on Clang or GCC for local debugging:

-Wall -Wextra -Wpedantic -fsanitize=address,undefined -g

Misconception to Avoid

“If it only crashes sometimes, the code is almost correct.”

In C++, nondeterministic behavior is often a sign of undefined behavior, not a minor bug. Once you have UB, the optimizer and runtime can produce very different outcomes from one build or machine to another.

Foundation Patterns That Matter Later

Several later C++ ideas are really lifetime-management patterns built on the concepts above:

  • constructors and destructors manage object setup and cleanup
  • RAII ties resource lifetime to scope lifetime
  • smart pointers model ownership on top of heap allocation
  • containers hide raw memory management while preserving performance properties
  • concurrency primitives rely on precise reasoning about storage and object lifetime

If you can already picture stack frames, heap allocation, pointer indirection, and the compile-link pipeline, you are ready for object-oriented and modern C++ design.

Interview Checkpoints

You should be able to explain these clearly in an interview without hiding behind buzzwords:

  • the difference between compilation and linking
  • why headers can increase build time and coupling
  • what stack and heap allocation really mean in terms of lifetime
  • the difference between a pointer and a reference
  • what causes dangling pointers and use-after-free bugs
  • why const improves API design and reasoning

What Comes Next

The next file builds on these memory and lifetime foundations to explain classes, constructors, destructors, inheritance, and polymorphism. The key shift is this: C++ object-oriented features are not separate from the memory model. They are layered on top of it.