# File 1: Foundations of C++

## Learning Goals

By the end of this file, you should be able to:

- explain how C++ source code becomes a running executable
- reason about basic types, object storage, and memory layout
- distinguish stack allocation from heap allocation in practical terms
- use pointers and references without treating them as magic syntax
- debug common low-level failures with a structured mental model

This file is the foundation for the rest of the guide. If later topics like RAII, smart pointers, iterators, or multithreading feel abstract, come back here first. C++ becomes much easier once you can picture what the compiler produces and what memory actually looks like at runtime.

## Why C++ Exists

C++ sits in an unusual position among mainstream languages. It gives you high-level abstractions such as classes, templates, exceptions, and a rich standard library, but it still lets you work close to the machine.

That combination is why C++ shows up in places where both abstraction and control matter:

- game engines that need tight performance and custom memory behavior
- trading systems that care about latency and predictable execution
- databases, compilers, browsers, and storage engines that manipulate large amounts of structured data
- embedded and systems code where resource use must be explicit

The core idea is not just “fast language.” Many languages are fast in some contexts. C++ is valuable because it lets you choose where to pay for abstraction and where to avoid it.

## The Compilation Model

### Intuition

In Python or JavaScript, you can often treat “running the code” as a direct action. In C++, there is a build pipeline between the source you write and the machine code the CPU executes.
Understanding that pipeline helps explain many common C++ issues:

- why header files exist
- why template code often lives in headers
- why link errors happen even when code compiles
- why build systems matter so much in large codebases

### The Big Picture

```mermaid
flowchart LR
    A[Source files .cpp] --> B[Preprocessor]
    H[Header files .h .hpp] --> B
    B --> C[Compiler]
    C --> D[Object files .o]
    D --> E[Linker]
    L[Libraries] --> E
    E --> F[Executable or shared library]
```

### Preprocessing

Before the compiler sees your program, the preprocessor handles directives such as `#include`, `#define`, and `#if`, including the `#ifndef`/`#define` pairs used as include guards.

What this means internally:

- `#include` is essentially textual inclusion
- macros are expanded before real compilation begins
- conditional compilation can remove or include chunks of code based on flags

That is why headers can feel deceptively simple. A header is not linked in as a separate unit. Its contents are copied into each translation unit that includes it.

Example:

```cpp
// math_utils.h
int add(int a, int b);

// main.cpp
#include "math_utils.h"
```

The compiler effectively sees the declaration from the header pasted into `main.cpp` before actual parsing.

### Compilation

The compiler parses the preprocessed source, checks types, builds intermediate representations, optimizes code, and emits object files. A `.cpp` file, together with all the text pulled into it by preprocessing, forms a translation unit.

Practical consequences:

- syntax errors, type errors, and many template errors are compile-time issues
- each translation unit is compiled independently
- the compiler only knows about the declarations visible in that translation unit

### Linking

The linker resolves symbol references across object files and libraries. If you declare a function in a header but forget to provide the definition in a compiled source file, compilation may succeed while linking fails.
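Continuing the `math_utils.h` example, here is a minimal sketch of the declaration/definition split that the linker depends on. For illustration the three files are compressed into one, so no cross-file linking actually happens here; in a real project each commented region would be a separate file, and the linker would connect the call compiled from `main.cpp` to the definition compiled from `math_utils.cpp`. The file name `math_utils.cpp` and the helper `sum_three` are hypothetical.

```cpp
// ---- math_utils.h (shown above): declaration only ----
// Every translation unit that includes the header learns the signature
// of add, but not its body.
int add(int a, int b);

// ---- math_utils.cpp (hypothetical): the single definition ----
// Compiled once into its own object file; the linker resolves calls to it.
int add(int a, int b) {
    return a + b;
}

// ---- main.cpp: only the declaration has to be visible here ----
// sum_three is a hypothetical caller used for illustration.
int sum_three(int x, int y, int z) {
    return add(add(x, y), z);
}
```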
Example:

```cpp
// declared
int compute();

// used
int main() {
    return compute();
}
```

If no compiled object file contains a matching definition of `compute`, the linker reports an unresolved symbol.

### Practical Usage

This model matters constantly in real systems:

- large codebases use headers to expose interfaces and source files to hide implementation
- build time can explode if headers pull in too much code
- libraries are distributed as headers plus compiled binaries or as header-only template libraries
- ABI and symbol compatibility matter when separate teams ship shared libraries

### Common Pitfalls

- confusing compile errors with link errors
- putting non-inline function definitions in headers and causing multiple-definition link errors
- overusing macros when constants, `constexpr`, or templates would be safer
- including large dependency trees in headers, which slows builds and increases coupling

## Variables, Types, and Object Storage

### Intuition

A variable in C++ is not “just a name.” It is usually a named object with a type, storage duration, alignment requirements, and a region of memory associated with it.

The type system tells both the compiler and the reader which operations are legal and how many bytes an object occupies on the target platform.

### What a Type Really Means

A C++ type typically determines:

- size, though this can vary by platform
- alignment requirements
- how the value is interpreted in memory
- what operations are available
- construction and destruction behavior for user-defined types

Consider:

```cpp
int count = 42;
double ratio = 0.5;
char flag = 'Y';
```

These values are all just bits in memory, but the type tells the compiler how to read and manipulate those bits.

### Value vs Representation

One useful systems-level habit is to separate a value from its representation.
For example, an `int` stores a signed integer value, but underneath it is represented in binary with a platform-defined size, usually 32 bits on modern desktop/server platforms. A pointer stores an address value, but underneath it is also just bits.

This distinction matters when you debug memory corruption. The CPU does not know “this is a tree node” in some abstract sense. It only sees instructions and bytes. The meaning comes from your program's types and the compiler's generated code.

### Storage Duration

Every object in C++ has a storage duration. At a practical level, that answers: when does this object come into existence, and when does its storage stop being valid?

The main categories are:

- automatic storage duration: usually local variables created when a scope is entered
- static storage duration: global variables and `static` locals that live for the life of the program
- dynamic storage duration: objects created explicitly on the heap, typically with `new` or via allocators

Later, RAII and smart pointers will build directly on this idea.

## Stack vs Heap

### Intuition

Beginners often memorize “stack is fast, heap is slow.” That is too shallow and often misleading. The real difference is about lifetime management and allocation strategy.

- stack allocation is usually automatic and scoped
- heap allocation is explicit or indirect and more flexible

### Mental Model

```mermaid
flowchart TB
    A[Program starts] --> B[Call main]
    B --> C[Create stack frame for main]
    C --> D[Call function]
    D --> E[Create another stack frame]
    E --> F[Return from function]
    F --> G[Frame removed automatically]
    C --> H[Heap objects may outlive function scope]
```

### Stack Allocation

Local variables inside a function usually live on the stack, though the exact implementation is up to the compiler and optimizer.
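Whatever the physical placement, what the language guarantees is the scoped lifetime. A minimal sketch makes that visible, assuming a hypothetical `ScopeTracer` type whose constructor and destructor append entries to a log: automatic objects are destroyed when their enclosing scope ends, in reverse order of construction, with no cleanup code written by hand.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: appends a log entry when it is constructed and
// another when it is destroyed, so scoped lifetimes become visible.
struct ScopeTracer {
    std::vector<std::string>& log;
    std::string name;

    ScopeTracer(std::vector<std::string>& sink, std::string label)
        : log(sink), name(std::move(label)) {
        log.push_back("enter " + name);
    }
    ~ScopeTracer() { log.push_back("leave " + name); }
};

// Both tracers have automatic storage duration: no explicit cleanup
// code is written, yet each destructor runs when its scope ends.
void run_scoped(std::vector<std::string>& log) {
    ScopeTracer outer(log, "outer");
    {
        ScopeTracer inner(log, "inner");
    }  // inner's destructor runs here
    log.push_back("after inner block");
}  // outer's destructor runs here
```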
Example:

```cpp
void process() {
    int retries = 3;          // automatic storage: typically in process()'s stack frame
    double threshold = 0.75;  // both locals go away when process() returns
}
```

Why it exists:

- function-local state is extremely common
- scoped lifetimes are easy to manage automatically
- creation and cleanup can often be handled without a general-purpose allocator

Internally, each function call usually gets a stack frame holding return information, saved registers, and local storage. When the function returns, that frame is popped.

Practical usage:

- temporary computation state
- small fixed-size objects
- ownership that should never outlive the current scope

Pitfalls:

- returning pointers or references to local variables
- allocating very large arrays on the stack and causing stack overflow
- assuming stack layout is fixed across compilers or optimization levels

### Heap Allocation

Heap allocation is used when an object's lifetime must outlive a scope, when size is only known at runtime, or when ownership must be transferred across components.

Example:

```cpp
void heap_example() {
    int* value = new int(42);  // obtain dynamic storage, then construct an int in it
    delete value;              // destroy the int, then release the storage
}
```

Internally, `new` usually asks an allocator for a chunk of dynamic memory, then constructs the object in that memory. `delete` destroys the object and releases the storage.

Practical usage:

- dynamic data structures such as graphs or trees
- objects shared across subsystems
- buffers sized from runtime input

Pitfalls:

- memory leaks from forgetting `delete`
- double delete from freeing the same pointer twice
- dangling pointers after deletion
- heap fragmentation and allocator overhead in performance-sensitive systems

Important note: in modern C++, direct `new` and `delete` should be rare in application code. Prefer containers and smart pointers. You still need to understand heap behavior because the abstractions are built on top of it.

## Pointers

### Intuition

A pointer is a value whose job is to hold the address of another object. That is all. It is powerful because it lets you refer to memory indirectly.
Pointers exist because systems software constantly needs indirect access:

- linked data structures
- optional access to objects
- efficient parameter passing without copying large objects
- polymorphic behavior through base-class pointers
- interaction with operating systems, hardware, and C APIs

### Basic Form

```cpp
int score = 99;
int* ptr = &score;
```

Here:

- `score` is an `int`
- `&score` means “address of score”
- `ptr` stores that address
- `*ptr` means “the int stored at that address”

### Pointer Relationship Diagram

```mermaid
flowchart LR
    P[ptr] -->|stores address| S[score in memory]
    S --> V[99]
```

### How It Works Internally

On a 64-bit system, a pointer is commonly 8 bytes. The compiler tracks the pointed-to type because pointer arithmetic and dereferencing depend on that type.

For example, incrementing an `int*` advances by `sizeof(int)` bytes, not by 1 byte.

```cpp
int values[3] = {10, 20, 30};
int* p = values;  // the array decays to a pointer to values[0]
++p;              // now points to values[1]
```

The compiler scales the increment according to the pointed-to type.

### Practical Usage

- traversal in low-level data structures
- API boundaries that may accept nullable inputs
- efficient manipulation of contiguous buffers
- ownership and lifetime control in specialized libraries or allocators

### Common Pitfalls

- dereferencing `nullptr`
- dereferencing uninitialized pointers
- using a pointer after the object it points to has been destroyed
- confusing ownership with access: a pointer can point to something without owning it

That last point is critical. A raw pointer does not tell you who is responsible for deleting the object.

## References

### Intuition

A reference is an alias to an existing object. It exists to make code safer and clearer than pointer-heavy interfaces when nullability and reseating are not needed.

Example:

```cpp
void increment(int& value) {
    ++value;
}
```

### Why References Exist

Without references, you would often pass pointers just to avoid copying objects.
But pointers imply optionality and manual dereferencing. References express a stronger contract:

- this function expects a valid object
- there is no need for null checks as part of normal usage
- the alias should behave like the original object

### Internal View

At the machine level, a reference is often implemented similarly to a pointer, but the language treats it differently.

Key properties:

- must be initialized when created
- cannot be reseated to refer to another object
- cannot be null in a well-defined program, since forming one requires undefined behavior such as dereferencing a null pointer
- use normal object syntax instead of pointer syntax

```mermaid
flowchart LR
    R[ref] -->|alias of| X[x]
```

### Practical Usage

- passing large objects efficiently without copying
- operator overloading and fluent APIs
- returning aliases to subobjects when lifetime is guaranteed

### Pitfalls and Misconceptions

- a reference does not own or manage the lifetime of the object it refers to; if that object is destroyed, the reference dangles
- returning a reference to a local variable is still invalid
- “references are always safer than pointers” is too simplistic; pointers are the right tool when optionality, reseating, or explicit low-level behavior is required

## Const Correctness

### Intuition

`const` is one of the cheapest ways to make C++ code easier to reason about. It restricts mutation and therefore reduces the number of possible program states.

### Practical Examples

```cpp
#include <string>

void print(const std::string& name);  // promises not to modify the caller's string
const int limit = 100;                // cannot be reassigned after initialization
```

Why it matters in real systems:

- APIs become clearer about who is allowed to modify data
- the compiler can catch accidental writes
- reviewers can reason more quickly about ownership and side effects

### Common Pitfalls

- confusing `const int* p` (pointer to a const `int`; the pointer itself can change) with `int* const p` (const pointer; the pointed-to `int` can change)
- using `const` inconsistently across interfaces
- assuming `const` automatically implies thread safety or deep immutability

## Arrays, Decay, and Basic Memory Layout

### Intuition

C++ inherits much of C's memory model.
Arrays are contiguous blocks of elements, which is why they are fast for indexed access and cache-friendly iteration.

```cpp
int values[4] = {1, 2, 3, 4};
```

The elements are stored adjacent in memory. That contiguity is why pointer arithmetic and array indexing are closely related.

### Under the Hood

`values[i]` is conceptually equivalent to `*(values + i)`.

This is powerful, but it is also why out-of-bounds access is dangerous. C++ does not automatically check bounds for raw arrays.

### Practical Usage

- numerical buffers
- serialization code
- high-performance loops
- interop with C libraries

### Pitfalls

- array-to-pointer decay in function parameters
- buffer overflows
- assuming stack arrays automatically know their size when passed to a function

In most application code, prefer `std::array` for fixed-size arrays and `std::vector` for dynamic arrays. You will still see raw arrays in systems code, embedded code, and performance-critical paths.

## A Debugging Mental Model

### Intuition

Low-level bugs in C++ often feel mysterious only when you lack a runtime model. Most of the time, they reduce to one of a few categories:

- invalid lifetime
- invalid memory access
- wrong ownership
- incorrect assumptions about object state
- data races in concurrent code

### A Useful Diagnostic Loop

When debugging a crash or corruption issue, ask these questions in order:

1. What object was accessed?
2. Was it initialized?
3. Is its lifetime still valid?
4. Who owns it?
5. Could memory nearby have been overwritten?
6. Is the failure deterministic or timing-dependent?

That checklist is more valuable than memorizing debugger buttons.

### Common Failure Modes

#### Segmentation Faults

Usually caused by dereferencing an invalid address such as:

- `nullptr`
- a dangling pointer
- a wild pointer from uninitialized memory

#### Use-After-Free

You delete an object, but some pointer or reference still points to the old address.
The address may still look valid for a while, which makes this class of bug subtle.

#### Stack Corruption

Often caused by out-of-bounds writes into local arrays or incorrect pointer arithmetic.

#### Memory Leaks

The program keeps allocating memory without freeing it. In long-running services, that becomes a production issue rather than just a test annoyance.

### Practical Tools

Real C++ debugging is easier when you use tooling, not just intuition:

- compiler warnings: start with strict warnings enabled
- AddressSanitizer: catches use-after-free, buffer overflows, and more
- UndefinedBehaviorSanitizer: catches many invalid language-level operations
- Valgrind on supported platforms: useful for leaks and invalid accesses
- debugger: inspect stack frames, variables, and memory addresses

Example build flags on Clang or GCC for local debugging:

```bash
-Wall -Wextra -Wpedantic -fsanitize=address,undefined -g
```

### Misconception to Avoid

“If it only crashes sometimes, the code is almost correct.”

In C++, nondeterministic behavior is often a sign of undefined behavior, not a minor bug. Once you have UB, the optimizer and runtime can produce very different outcomes from one build or machine to another.

## Foundation Patterns That Matter Later

Several later C++ ideas are really lifetime-management patterns built on the concepts above:

- constructors and destructors manage object setup and cleanup
- RAII ties resource lifetime to scope lifetime
- smart pointers model ownership on top of heap allocation
- containers hide raw memory management while preserving performance properties
- concurrency primitives rely on precise reasoning about storage and object lifetime

If you can already picture stack frames, heap allocation, pointer indirection, and the compile-link pipeline, you are ready for object-oriented and modern C++ design.
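As a preview of how these patterns replace manual `new` and `delete`, here is a minimal sketch assuming C++14 or later (for `std::make_unique`). A `std::unique_ptr` owns heap-allocated list nodes and a `std::vector` owns a runtime-sized buffer; in both cases the owner's destructor releases the memory when it goes out of scope. `Node`, `build_list`, `list_length`, and `sum_buffer` are hypothetical names chosen for illustration, and the list is a teaching sketch rather than a recommended container.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical node type. The unique_ptr member owns the next node, so
// destroying a node also destroys every node after it.
struct Node {
    int value;
    std::unique_ptr<Node> next;
};

// Builds the list 1 -> 2 -> ... -> count on the heap with no explicit delete.
// Destruction recurses along the chain, so this simple design suits short lists.
std::unique_ptr<Node> build_list(int count) {
    std::unique_ptr<Node> head;
    for (int i = count; i > 0; --i) {
        // The new node takes ownership of the current head.
        head = std::make_unique<Node>(Node{i, std::move(head)});
    }
    return head;
}

// Non-owning traversal: a raw pointer is used only for access.
int list_length(const Node* node) {
    int length = 0;
    for (; node != nullptr; node = node->next.get()) {
        ++length;
    }
    return length;
}

// A runtime-sized buffer: std::vector owns its heap storage and frees it
// in its destructor when `buffer` goes out of scope.
long long sum_buffer(std::size_t n) {
    std::vector<int> buffer(n, 1);  // n ints, each equal to 1
    long long total = 0;
    for (int v : buffer) {
        total += v;
    }
    return total;
}
```

Note that `list_length` takes a plain `const Node*`: the `unique_ptr` expresses ownership, while the raw pointer is used only for access, which is the ownership-versus-access distinction from the pointer pitfalls.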
## Interview Checkpoints

You should be able to explain these clearly in an interview without hiding behind buzzwords:

- the difference between compilation and linking
- why headers can increase build time and coupling
- what stack and heap allocation really mean in terms of lifetime
- the difference between a pointer and a reference
- what causes dangling pointers and use-after-free bugs
- why `const` improves API design and reasoning

## What Comes Next

The next file builds on these memory and lifetime foundations to explain classes, constructors, destructors, inheritance, and polymorphism.

The key shift is this: C++ object-oriented features are not separate from the memory model. They are layered on top of it.