2026-04-26 14:09:04 -04:00
parent 26810e43d0
commit be31df2d44
22 changed files with 10664 additions and 0 deletions
@@ -0,0 +1,528 @@
+# File 1: Foundations of C++
+
+## Learning Goals
+
+By the end of this file, you should be able to:
+
+- explain how C++ source code becomes a running executable
+- reason about basic types, object storage, and memory layout
+- distinguish stack allocation from heap allocation in practical terms
+- use pointers and references without treating them as magic syntax
+- debug common low-level failures with a structured mental model
+
+This file is the foundation for the rest of the guide. If later topics like RAII, smart pointers, iterators, or multithreading feel abstract, come back here first. C++ becomes much easier once you can picture what the compiler produces and what memory actually looks like at runtime.
+
+## Why C++ Exists
+
+C++ sits in an unusual position among mainstream languages. It gives you high-level abstractions such as classes, templates, exceptions, and a rich standard library, but it still lets you work close to the machine.
+
+That combination is why C++ shows up in places where both abstraction and control matter:
+
+- game engines that need tight performance and custom memory behavior
+- trading systems that care about latency and predictable execution
+- databases, compilers, browsers, and storage engines that manipulate large amounts of structured data
+- embedded and systems code where resource use must be explicit
+
+The core idea is not just “fast language.” Many languages are fast in some contexts. C++ is valuable because it lets you choose where to pay for abstraction and where to avoid it.
+
+## The Compilation Model
+
+### Intuition
+
+In Python or JavaScript, you can often treat “running the code” as a direct action. In C++, there is a build pipeline between the source you write and the machine code the CPU executes. Understanding that pipeline helps explain many common C++ issues:
+
+- why header files exist
+- why template code often lives in headers
+- why link errors happen even when code compiles
+- why build systems matter so much in large codebases
+
+### The Big Picture
+
+```mermaid
+flowchart LR
+    A[Source files .cpp] --> B[Preprocessor]
+    H[Header files .h .hpp] --> B
+    B --> C[Compiler]
+    C --> D[Object files .o]
+    D --> E[Linker]
+    L[Libraries] --> E
+    E --> F[Executable or shared library]
+```
+
+### Preprocessing
+
+Before the compiler sees your program, the preprocessor handles directives such as `#include`, `#define`, and `#if`. Include guards are not a separate directive; they are an idiom built from `#ifndef`, `#define`, and `#endif`.
+
+What this means internally:
+
+- `#include` is essentially textual inclusion
+- macros are expanded before real compilation begins
+- conditional compilation can remove or include chunks of code based on flags
+
+That is why headers can feel deceptively simple. A header is not linked in as a separate unit. Its contents are copied into each translation unit that includes it.
+
+Example:
+
+```cpp
+// math_utils.h
+int add(int a, int b);
+
+// main.cpp
+#include "math_utils.h"
+```
+
+The compiler effectively sees the declaration from the header pasted into `main.cpp` before actual parsing.
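
The guard idiom and conditional compilation can be sketched in a single file. This is a minimal illustration, not the guide's own header: the `inline` definition of `add`, the `MATH_UTILS_H` guard name, and the `ENABLE_TRACE` flag are assumptions made up for the example.

```cpp
#include <cassert>

// --- text as it might appear in math_utils.h (hypothetical contents) ---
#ifndef MATH_UTILS_H   // include guard: true only the first time this text is seen
#define MATH_UTILS_H
inline int add(int a, int b) { return a + b; }
#endif  // MATH_UTILS_H

// --- the same header text pasted a second time, as a repeated #include would do ---
#ifndef MATH_UTILS_H   // now false, so the body is skipped and add() is not redefined
#define MATH_UTILS_H
inline int add(int a, int b) { return a + b; }
#endif  // MATH_UTILS_H

// Conditional compilation: ENABLE_TRACE is a hypothetical flag, e.g. passed as -DENABLE_TRACE.
#ifdef ENABLE_TRACE
inline bool trace_enabled() { return true; }
#else
inline bool trace_enabled() { return false; }
#endif

// Small self-check showing that exactly one definition of add() survived preprocessing.
inline void check_preprocessing() {
    assert(add(2, 3) == 5);
}
```

Without the guard, the second copy of `add` would be a redefinition error in the same translation unit, which is the situation a repeated `#include` creates.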
+
+### Compilation
+
+The compiler parses the preprocessed source, checks types, builds intermediate representations, optimizes code, and emits object files.
+
+A `.cpp` file plus all text included into it after preprocessing becomes a translation unit.
+
+Practical consequence:
+
+- syntax errors, type errors, and many template errors are compile-time issues
+- each translation unit is compiled independently
+- the compiler only knows what declarations are visible in that translation unit
+
+### Linking
+
+The linker resolves symbol references across object files and libraries.
+
+If you declare a function in a header but forget to provide the definition in a compiled source file, compilation may succeed while linking fails.
+
+Example:
+
+```cpp
+// declared
+int compute();
+
+// used
+int main() {
+    return compute();
+}
+```
+
+If no compiled object file contains a matching definition of `compute`, the linker reports an unresolved symbol.
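
For contrast, here is a sketch of the same program with the definition present. It is squeezed into one file for brevity; in a real project the definition would typically sit in a separate `.cpp` file. The helper `call_compute` and the return value `42` are illustrative assumptions.

```cpp
#include <cassert>

int compute();  // declaration: all the compiler needs to type-check a call

// Compiles using only the declaration above; the call is left as a symbol reference.
int call_compute() {
    return compute();
}

// Definition: this is what the linker looks for when resolving the reference.
// If no object file or library provided it, compilation would still succeed
// and the link step would report an unresolved symbol.
int compute() {
    return 42;
}

// Small self-check that the reference resolved to the definition above.
void check_linking() {
    assert(call_compute() == 42);
}
```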
+
+### Practical Usage
+
+This model matters constantly in real systems:
+
+- large codebases use headers to expose interfaces and source files to hide implementation
+- build time can explode if headers pull in too much code
+- libraries are distributed as headers plus compiled binaries or as header-only template libraries
+- ABI and symbol compatibility matter when separate teams ship shared libraries
+
+### Common Pitfalls
+
+- confusing compile errors with link errors
+- putting non-inline function definitions in headers and causing multiple definition errors
+- overusing macros when constants, `constexpr`, or templates would be safer
+- including large dependency trees in headers, which slows builds and increases coupling
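
Two of these pitfalls are easy to show in code. The sketch below uses hypothetical names (`DOUBLE_MACRO`, `double_of`) to contrast a text-substitution macro with a `constexpr` function, and marks header-style definitions `inline` so they could appear in several translation units without a multiple definition error.

```cpp
#include <cassert>

// Macro: pure text substitution, so operator precedence applies to the expanded text.
#define DOUBLE_MACRO(x) x + x

// constexpr function: type-checked, and its argument is evaluated as one expression.
constexpr int double_of(int x) { return x + x; }

// inline allows the same definition to appear in every translation unit
// that includes the header, without violating the one-definition rule.
inline int macro_result()     { return DOUBLE_MACRO(3) * 2; }  // expands to 3 + 3 * 2
inline int constexpr_result() { return double_of(3) * 2; }     // (3 + 3) * 2

inline void check_macro_pitfall() {
    assert(macro_result() == 9);       // probably not what the author intended
    assert(constexpr_result() == 12);
}
```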
+
+## Variables, Types, and Object Storage
+
+### Intuition
+
+A variable in C++ is not “just a name.” It is usually a named object with a type, storage duration, alignment requirements, and a region of memory associated with it.
+
+The type system tells both the compiler and the reader what operations are legal and how many bytes an object likely occupies.
+
+### What a Type Really Means
+
+A C++ type typically determines:
+
+- size, though this can vary by platform
+- alignment requirements
+- how the value is interpreted in memory
+- what operations are available
+- construction and destruction behavior for user-defined types
+
+Consider:
+
+```cpp
+int count = 42;
+double ratio = 0.5;
+char flag = 'Y';
+```
+
+These values are all just bits in memory, but the type tells the compiler how to read and manipulate those bits.
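
Size and alignment can be queried directly with `sizeof` and `alignof`. The sketch below uses a hypothetical `Sample` struct; the exact numbers vary by platform, so it only checks relationships that hold everywhere.

```cpp
#include <cassert>
#include <cstddef>

struct Sample {    // hypothetical struct for illustration
    char flag;
    int count;
    double ratio;
};

static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
static_assert(sizeof(Sample) >= sizeof(char) + sizeof(int) + sizeof(double),
              "a struct holds its members plus any padding");
static_assert(sizeof(Sample) % alignof(Sample) == 0,
              "size is always a multiple of alignment");

// Bytes the compiler inserted between or after members to satisfy alignment.
inline std::size_t padding_bytes() {
    return sizeof(Sample) - (sizeof(char) + sizeof(int) + sizeof(double));
}

inline void check_layout() {
    assert(padding_bytes() < sizeof(Sample));
}
```

On many 64-bit platforms `padding_bytes()` is 3, but that figure is an observation about common ABIs, not a guarantee.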
+
+### Value vs Representation
+
+One useful systems-level habit is to separate a value from its representation.
+
+For example, an `int` stores a signed integer value, but underneath it is represented in binary with an implementation-defined size, usually 32 bits on modern desktop/server platforms. A pointer stores an address value, but underneath it is also just bits.
+
+This distinction matters when you debug memory corruption. The CPU does not know “this is a tree node” in some abstract sense. It only sees instructions and bytes. The meaning comes from your program's types and the compiler's generated code.
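
One way to see the representation behind a value is to copy an object's bytes into an integer of the same size. The helper name `bits_of` is hypothetical, and the specific bit pattern checked below assumes IEEE 754 floats, which the code tests for before asserting.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes a 32-bit float");

// Copy the object representation of a float into an integer of the same size.
// std::memcpy is a well-defined way to inspect bytes, unlike casting the pointer.
inline std::uint32_t bits_of(float value) {
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof(bits));
    return bits;
}

inline void check_representation() {
    if (std::numeric_limits<float>::is_iec559) {  // true on IEEE 754 platforms
        assert(bits_of(1.0f) == 0x3F800000u);     // sign 0, exponent 127, mantissa 0
    }
}
```

The same 32 bits read as an integer give 1065353216, which is the sense in which the type, not the memory, carries the meaning.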
+
+### Storage Duration
+
+Every object in C++ has a storage duration. At a practical level, that answers: when does this object come into existence, and when does its storage stop being valid?
+
+The main categories are:
+
+- automatic storage duration: usually local variables created when a scope is entered
+- static storage duration: global variables and `static` locals that live for the life of the program
+- dynamic storage duration: objects created explicitly on the heap, typically with `new` or via allocators
+- thread storage duration: `thread_local` objects that live as long as the thread that owns them
+
+Later, RAII and smart pointers will build directly on this idea.
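
The automatic, static, and dynamic cases can be sketched in a few lines. All function names here are hypothetical and exist only to show when each object's storage begins and ends.

```cpp
#include <cassert>

int next_ticket() {
    static int last = 0;       // static storage: initialized once, persists across calls
    return ++last;
}

int scoped_sum() {
    int a = 2;                 // automatic storage: created on entry to the scope
    int b = 3;
    return a + b;              // a and b stop existing when the function returns
}

int heap_roundtrip() {
    int* slot = new int(7);    // dynamic storage: lives until someone calls delete
    int copy = *slot;
    delete slot;               // storage released here, not at scope exit
    return copy;
}

void check_storage_durations() {
    int first = next_ticket();
    assert(next_ticket() == first + 1);  // the static local remembered its value
}
```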
+
+## Stack vs Heap
+
+### Intuition
+
+Beginners often memorize “stack is fast, heap is slow.” That is too shallow and often misleading.
+
+The real difference is about lifetime management and allocation strategy.
+
+- stack allocation is usually automatic and scoped
+- heap allocation is explicit or indirect and more flexible
+
+### Mental Model
+
+```mermaid
+flowchart TB
+    A[Program starts] --> B[Call main]
+    B --> C[Create stack frame for main]
+    C --> D[Call function]
+    D --> E[Create another stack frame]
+    E --> F[Return from function]
+    F --> G[Frame removed automatically]
+    C --> H[Heap objects may outlive function scope]
+```
+
+### Stack Allocation
+
+Local variables inside a function usually live on the stack, though the exact implementation is up to the compiler and optimizer.
+
+Example:
+
+```cpp
+void process() {
+    int retries = 3;
+    double threshold = 0.75;
+}
+```
+
+Why it exists:
+
+- function-local state is extremely common
+- scoped lifetimes are easy to manage automatically
+- creation and cleanup can often be handled without a general-purpose allocator
+
+Internally, each function call usually gets a stack frame holding return information, saved registers, and local storage. When the function returns, that frame is popped.
+
+Practical usage:
+
+- temporary computation state
+- small fixed-size objects
+- ownership that should never outlive the current scope
+
+Pitfalls:
+
+- returning pointers or references to local variables
+- allocating very large arrays on the stack and causing stack overflow
+- assuming stack layout is fixed across compilers or optimization levels
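
The first pitfall is the most common one in practice. In the sketch below the broken version is kept as a comment, since actually running it would be undefined behavior; `make_greeting` is a hypothetical function showing the usual fix of returning by value.

```cpp
#include <cassert>
#include <string>

// BROKEN, shown only as a comment: greeting is destroyed when the function
// returns, so the caller would receive a dangling reference.
//
//   const std::string& make_greeting_bad(const std::string& name) {
//       std::string greeting = "hello, " + name;
//       return greeting;
//   }

// Returning by value gives the caller its own object with its own lifetime.
std::string make_greeting(const std::string& name) {
    std::string greeting = "hello, " + name;
    return greeting;
}

void check_greeting() {
    assert(make_greeting("ada") == "hello, ada");
}
```

Most compilers warn about the broken form when warnings are enabled, which is one reason to keep them on.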
+
+### Heap Allocation
+
+Heap allocation is used when an object must outlive the scope that created it, when its size is only known at runtime, or when ownership must be transferred across components.
+
+Example:
+
+```cpp
+int* value = new int(42);
+delete value;
+```
+
+Internally, `new` usually asks an allocator for a chunk of dynamic memory, then constructs the object in that memory. `delete` destroys the object and releases the storage.
+
+Practical usage:
+
+- dynamic data structures such as graphs or trees
+- objects shared across subsystems
+- buffers sized from runtime input
+
+Pitfalls:
+
+- memory leaks from forgetting `delete`
+- double delete from freeing the same pointer twice
+- dangling pointers after deletion
+- heap fragmentation and allocator overhead in performance-sensitive systems
+
+Important note: in modern C++, direct `new` and `delete` should be rare in application code. Prefer containers and smart pointers. You still need to understand heap behavior because the abstractions are built on top of it.
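
As a rough sketch of what that preference looks like, the functions below (hypothetical names) get heap storage through `std::vector` and `std::unique_ptr` without writing `new` or `delete`. `std::make_unique` requires C++14 or later.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Runtime-sized buffer: the vector owns the heap storage and frees it automatically.
std::vector<int> make_buffer(std::size_t count) {
    return std::vector<int>(count, 0);
}

// Single heap object: unique_ptr deletes it when the pointer goes out of scope.
int read_owned_value() {
    std::unique_ptr<int> value = std::make_unique<int>(42);
    return *value;
}   // no delete needed, and no leak even if an exception is thrown earlier

void check_owned_heap_use() {
    assert(make_buffer(8).size() == 8);
    assert(read_owned_value() == 42);
}
```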
+
+## Pointers
+
+### Intuition
+
+A pointer is a value whose job is to hold the address of another object. That is all. It is powerful because it lets you refer to memory indirectly.
+
+Pointers exist because systems software constantly needs indirect access:
+
+- linked data structures
+- optional access to objects
+- efficient parameter passing without copying large objects
+- polymorphic behavior through base-class pointers
+- interaction with operating systems, hardware, and C APIs
+
+### Basic Form
+
+```cpp
+int score = 99;
+int* ptr = &score;
+```
+
+Here:
+
+- `score` is an `int`
+- `&score` means “address of score”
+- `ptr` stores that address
+- `*ptr` means “the int stored at that address”
+
+### Pointer Relationship Diagram
+
+```mermaid
+flowchart LR
+    P[ptr] -->|stores address| S[score in memory]
+    S --> V[99]
+```
+
+### How It Works Internally
+
+On a 64-bit system, a pointer is commonly 8 bytes. The compiler tracks the pointed-to type because pointer arithmetic and dereferencing depend on that type.
+
+For example, incrementing an `int*` advances by `sizeof(int)` bytes, not by 1 byte.
+
+```cpp
+int values[3] = {10, 20, 30};
+int* p = values;
+++p; // now points to values[1]
+```
+
+The compiler scales the increment according to the pointed-to type.
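
The scaling can be observed by measuring the distance between two adjacent element pointers in bytes. The function names below are hypothetical; the casts to `char*` are there only to count bytes.

```cpp
#include <cassert>
#include <cstddef>

// Distance in bytes between an int* and the same pointer advanced by one element.
inline std::ptrdiff_t byte_step() {
    int values[3] = {10, 20, 30};
    int* p = values;
    int* next = p + 1;
    return reinterpret_cast<char*>(next) - reinterpret_cast<char*>(p);
}

// Incrementing moves to the next element, not the next byte.
inline int second_element() {
    int values[3] = {10, 20, 30};
    int* p = values;
    ++p;
    return *p;
}

inline void check_pointer_scaling() {
    assert(byte_step() == static_cast<std::ptrdiff_t>(sizeof(int)));
    assert(second_element() == 20);
}
```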
+
+### Practical Usage
+
+- traversal in low-level data structures
+- API boundaries that may accept nullable inputs
+- efficient manipulation of contiguous buffers
+- ownership and lifetime control in specialized libraries or allocators
+
+### Common Pitfalls
+
+- dereferencing `nullptr`
+- dereferencing uninitialized pointers
+- using a pointer after the object it points to has been destroyed
+- confusing ownership with access: a pointer can point to something without owning it
+
+That last point is critical. A raw pointer does not tell you who is responsible for deleting the object.
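
A small sketch of a non-owning, nullable pointer parameter follows. The function `value_or` is hypothetical; the point is that it checks for `nullptr` before dereferencing and never deletes what it is given.

```cpp
#include <cassert>

// Non-owning, nullable input: read through the pointer if present, never delete it.
int value_or(const int* maybe_value, int fallback) {
    if (maybe_value == nullptr) {
        return fallback;          // guard against dereferencing nullptr
    }
    return *maybe_value;
}

void check_value_or() {
    int score = 99;               // owned by this scope, not by value_or
    assert(value_or(&score, 0) == 99);
    assert(value_or(nullptr, 7) == 7);
}
```

Because the caller keeps ownership of `score`, nothing in `value_or` needs to know how the object was allocated.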
+
+## References
+
+### Intuition
+
+A reference is an alias to an existing object. It exists to make code safer and clearer than pointer-heavy interfaces when nullability and reseating are not needed.
+
+Example:
+
+```cpp
+void increment(int& value) {
+    ++value;
+}
+```
+
+### Why References Exist
+
+Without references, you would often pass pointers just to avoid copying objects. But pointers imply optionality and manual dereferencing.
+
+References express a stronger contract:
+
+- this function expects a valid object
+- there is no need for null checks as part of normal usage
+- the alias should behave like the original object
+
+### Internal View
+
+At the machine level, a reference is often implemented similarly to a pointer, but the language treats it differently.
+
+Key properties:
+
+- must be initialized when created
+- cannot be reseated to refer to another object
+- cannot be null in a program with defined behavior; creating a null reference requires undefined behavior such as dereferencing a null pointer
+- use normal object syntax instead of pointer syntax
+
+```mermaid
+flowchart LR
+    R[ref] -->|alias of| X[x]
+```
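
The "cannot be reseated" property is the one that most often surprises people, so here is a minimal sketch. The function name is hypothetical; the variables `x`, `y`, and `ref` mirror the diagram above.

```cpp
#include <cassert>

int assign_through_reference() {
    int x = 1;
    int y = 2;
    int& ref = x;          // must be initialized; ref is now another name for x
    ref = y;               // copies y's value into x; it does NOT rebind ref to y
    y = 5;                 // changing y afterwards leaves x untouched
    assert(&ref == &x);    // taking the address of ref yields the address of x
    return x;
}
```

After the call, the returned value is `2`: the assignment went through the alias into `x`, and `ref` never referred to `y`.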
+
+### Practical Usage
+
+- passing large objects efficiently without copying
+- operator overloading and fluent APIs
+- returning aliases to subobjects when lifetime is guaranteed
+
+### Pitfalls and Misconceptions
+
+- a reference is not an independent object and does not manage the lifetime of its target; if the target is destroyed first, the reference dangles
+- returning a reference to a local variable is still invalid
+- “references are always safer than pointers” is too simplistic; pointers are the right tool when optionality, reseating, or explicit low-level behavior is required
+
+## Const Correctness
+
+### Intuition
+
+`const` is one of the cheapest ways to make C++ code easier to reason about. It restricts mutation and therefore reduces the number of possible program states.
+
+### Practical Examples
+
+```cpp
+#include <string>
+
+void print(const std::string& name);
+
+const int limit = 100;
+```
+
+Why it matters in real systems:
+
+- APIs become clearer about who is allowed to modify data
+- the compiler can catch accidental writes
+- reviewers can reason more quickly about ownership and side effects
+
+### Common Pitfalls
+
+- confusing `const int* p` with `int* const p`
+- using `const` inconsistently across interfaces
+- assuming `const` automatically implies thread safety or deep immutability
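
The first pitfall comes down to which side of the `*` the `const` sits on. In this sketch, with hypothetical variable names, the lines that would not compile are left as comments.

```cpp
#include <cassert>

int const_pointer_demo() {
    int a = 1;
    int b = 2;

    const int* view = &a;    // pointer to const int: the pointee is read-only through view
    view = &b;               // OK: the pointer itself can be reseated
    // *view = 9;            // error: cannot write through a pointer to const

    int* const fixed = &a;   // const pointer to int: the pointer itself is fixed
    *fixed = 10;             // OK: the pointee can be modified
    // fixed = &b;           // error: cannot reseat a const pointer

    assert(view != fixed);   // view points at b, fixed points at a
    return *view + *fixed;   // 2 + 10
}
```

Note that `a` itself is not `const` here, so writing to it through `fixed` is fine; `const int*` restricts only that access path, which is also why `const` does not imply deep immutability.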
+
+## Arrays, Decay, and Basic Memory Layout
+
+### Intuition
+
+C++ inherits much of C's memory model. Arrays are contiguous blocks of elements, which is why they are fast for indexed access and cache-friendly iteration.
+
+```cpp
+int values[4] = {1, 2, 3, 4};
+```
+
+The elements are stored adjacent in memory. That contiguity is why pointer arithmetic and array indexing are closely related.
+
+### Under the Hood
+
+`values[i]` is conceptually equivalent to `*(values + i)`.
+
+This is powerful, but it is also why out-of-bounds access is dangerous. C++ does not automatically check bounds for raw arrays.
+
+### Practical Usage
+
+- numerical buffers
+- serialization code
+- high-performance loops
+- interop with C libraries
+
+### Pitfalls
+
+- array-to-pointer decay in function parameters
+- buffer overflows
+- assuming an array still carries its size after being passed to a function
+
+In most application code, prefer `std::array` for fixed-size arrays and `std::vector` for dynamic arrays. You will still see raw arrays in systems code, embedded code, and performance-critical paths.
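
A short sketch of the difference: a raw array decays to a pointer at the call site, so the length has to travel as a separate argument, while `std::array` keeps its size in the type. The function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// The array argument decays to a pointer, so the caller must pass the count.
int sum_raw(const int* data, std::size_t count) {
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += *(data + i);   // same as data[i]
    }
    return total;
}

// std::array carries its length in the type, so no separate count is needed.
template <std::size_t N>
int sum_array(const std::array<int, N>& values) {
    int total = 0;
    for (int v : values) {
        total += v;
    }
    return total;
}

int demo_sums() {
    int raw[4] = {1, 2, 3, 4};
    std::array<int, 4> fixed = {1, 2, 3, 4};
    assert(raw[2] == *(raw + 2));               // indexing is pointer arithmetic
    return sum_raw(raw, 4) + sum_array(fixed);  // 10 + 10
}
```

Passing a wrong count to `sum_raw` compiles without complaint, which is how many buffer overflows start.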
+
+## A Debugging Mental Model
+
+### Intuition
+
+Low-level bugs in C++ often feel mysterious only when you lack a runtime model. Most of the time, they reduce to one of a few categories:
+
+- invalid lifetime
+- invalid memory access
+- wrong ownership
+- incorrect assumptions about object state
+- data races in concurrent code
+
+### A Useful Diagnostic Loop
+
+When debugging a crash or corruption issue, ask these questions in order:
+
+1. What object was accessed?
+2. Was it initialized?
+3. Is its lifetime still valid?
+4. Who owns it?
+5. Could memory nearby have been overwritten?
+6. Is the failure deterministic or timing-dependent?
+
+That checklist is more valuable than memorizing debugger buttons.
+
+### Common Failure Modes
+
+#### Segmentation Faults
+
+Usually caused by dereferencing an invalid address such as:
+
+- `nullptr`
+- a dangling pointer
+- a wild pointer from uninitialized memory
+
+#### Use-After-Free
+
+You delete an object, but some pointer or reference still points to the old address. The address may still look valid for a while, which makes this class of bug subtle.
+
+#### Stack Corruption
+
+Often caused by out-of-bounds writes into local arrays or incorrect pointer arithmetic.
+
+#### Memory Leaks
+
+The program keeps allocating memory without freeing it. In long-running services, that becomes a production issue rather than just a test annoyance.
+
+### Practical Tools
+
+Real C++ debugging is easier when you use tooling, not just intuition:
+
+- compiler warnings: start with strict warnings enabled
+- AddressSanitizer: catches use-after-free, buffer overflows, and more
+- UndefinedBehaviorSanitizer: catches many invalid language-level operations
+- Valgrind on supported platforms: useful for leaks and invalid accesses
+- debugger: inspect stack frames, variables, and memory addresses
+
+Example build flags on Clang or GCC for local debugging:
+
+```bash
+-Wall -Wextra -Wpedantic -fsanitize=address,undefined -g
+```
+
+### Misconception to Avoid
+
+“If it only crashes sometimes, the code is almost correct.”
+
+In C++, nondeterministic behavior is often a sign of undefined behavior, not a minor bug. Once you have UB, the optimizer and runtime can produce very different outcomes from one build or machine to another.
+
+## Foundation Patterns That Matter Later
+
+Several later C++ ideas are really lifetime-management patterns built on the concepts above:
+
+- constructors and destructors manage object setup and cleanup
+- RAII ties resource lifetime to scope lifetime
+- smart pointers model ownership on top of heap allocation
+- containers hide raw memory management while preserving performance properties
+- concurrency primitives rely on precise reasoning about storage and object lifetime
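
As a preview of the first two items, this sketch ties a stand-in "resource" to scope lifetime. The `Guard` type and the `live_guards` counter are hypothetical; a real resource would be a file handle, lock, or allocation.

```cpp
#include <cassert>

int live_guards = 0;   // hypothetical counter standing in for a real resource

struct Guard {
    Guard()  { ++live_guards; }   // acquire in the constructor
    ~Guard() { --live_guards; }   // release in the destructor
    Guard(const Guard&) = delete;             // copying would double-release
    Guard& operator=(const Guard&) = delete;
};

int guards_after_scope() {
    {
        Guard guard;                   // automatic storage: lives until the closing brace
        assert(live_guards == 1);
    }                                  // destructor runs here, with no explicit cleanup call
    return live_guards;
}
```

The cleanup happens at the closing brace because `guard` has automatic storage duration, which is the stack behavior described earlier in this file.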
+
+If you can already picture stack frames, heap allocation, pointer indirection, and the compile-link pipeline, you are ready for object-oriented and modern C++ design.
+
+## Interview Checkpoints
+
+You should be able to explain these clearly in an interview without hiding behind buzzwords:
+
+- the difference between compilation and linking
+- why headers can increase build time and coupling
+- what stack and heap allocation really mean in terms of lifetime
+- the difference between a pointer and a reference
+- what causes dangling pointers and use-after-free bugs
+- why `const` improves API design and reasoning
+
+## What Comes Next
+
+The next file builds on these memory and lifetime foundations to explain classes, constructors, destructors, inheritance, and polymorphism. The key shift is this: C++ object-oriented features are not separate from the memory model. They are layered on top of it.