# File 1: Foundations of C++

## Learning Goals

By the end of this file, you should be able to:

- explain how C++ source code becomes a running executable
- reason about basic types, object storage, and memory layout
- distinguish stack allocation from heap allocation in practical terms
- use pointers and references without treating them as magic syntax
- debug common low-level failures with a structured mental model

This file is the foundation for the rest of the guide. If later topics like RAII, smart pointers, iterators, or multithreading feel abstract, come back here first. C++ becomes much easier once you can picture what the compiler produces and what memory actually looks like at runtime.

## Why C++ Exists

C++ sits in an unusual position among mainstream languages. It gives you high-level abstractions such as classes, templates, exceptions, and a rich standard library, but it still lets you work close to the machine.

That combination is why C++ shows up in places where both abstraction and control matter:

- game engines that need tight performance and custom memory behavior
- trading systems that care about latency and predictable execution
- databases, compilers, browsers, and storage engines that manipulate large amounts of structured data
- embedded and systems code where resource use must be explicit

The core idea is not just “fast language.” Many languages are fast in some contexts. C++ is valuable because it lets you choose where to pay for abstraction and where to avoid it.

## The Compilation Model

### Intuition

In Python or JavaScript, you can often treat “running the code” as a direct action. In C++, there is a build pipeline between the source you write and the machine code the CPU executes. Understanding that pipeline helps explain many common C++ issues:

- why header files exist
- why template code often lives in headers
- why link errors happen even when code compiles
- why build systems matter so much in large codebases

### The Big Picture

```mermaid
flowchart LR
    A[Source files .cpp] --> B[Preprocessor]
    H[Header files .h .hpp] --> B
    B --> C[Compiler]
    C --> D[Object files .o]
    D --> E[Linker]
    L[Libraries] --> E
    E --> F[Executable or shared library]
```

### Preprocessing

Before the compiler sees your program, the preprocessor handles directives such as `#include`, `#define`, and `#if`, including the include guards built from them.

What this means internally:

- `#include` is essentially textual inclusion
- macros are expanded before real compilation begins
- conditional compilation can remove or include chunks of code based on flags

That is why headers can feel deceptively simple. A header is not linked in as a separate unit. Its contents are copied into each translation unit that includes it.

Example:

```cpp
// math_utils.h
int add(int a, int b);

// main.cpp
#include "math_utils.h"
```

The compiler effectively sees the declaration from the header pasted into `main.cpp` before actual parsing.

### Compilation

The compiler parses the preprocessed source, checks types, builds intermediate representations, optimizes code, and emits object files.

A `.cpp` file plus all text included into it after preprocessing becomes a translation unit.

Practical consequence:

- syntax errors, type errors, and many template errors are compilation-time issues
- each translation unit is compiled independently
- the compiler only knows what declarations are visible in that translation unit

### Linking

The linker resolves symbol references across object files and libraries.

If you declare a function in a header but forget to provide the definition in a compiled source file, compilation may succeed while linking fails.

Example:

```cpp
// declared
int compute();

// used
int main() {
    return compute();
}
```

If no compiled object file contains a matching definition of `compute`, the linker reports an unresolved symbol.
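
To make the fix concrete, here is a minimal sketch of the missing piece. The file names `compute.h` and `compute.cpp` and the return value are assumptions chosen for illustration; the point is only that some object file handed to the linker must contain the definition.

```cpp
// compute.h (hypothetical file name): the declaration other files include.
int compute();

// compute.cpp (hypothetical file name): the definition the linker looks for.
int compute() {
    return 0;  // placeholder result, chosen only for this sketch
}

// A typical separate-compilation sequence with GCC, shown as comments:
//   g++ -c main.cpp             produces main.o, which only references compute
//   g++ -c compute.cpp          produces compute.o, which defines compute
//   g++ main.o compute.o -o app links both, so the reference is resolved
```

Leaving `compute.o` out of the final link step would likely reproduce the unresolved symbol error, even though both files compile cleanly on their own.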

### Practical Usage

This model matters constantly in real systems:

- large codebases use headers to expose interfaces and source files to hide implementation
- build time can explode if headers pull in too much code
- libraries are distributed as headers plus compiled binaries or as header-only template libraries
- ABI and symbol compatibility matter when separate teams ship shared libraries

### Common Pitfalls

- confusing compile errors with link errors
- putting non-inline function definitions in headers and causing multiple definition errors
- overusing macros when constants, `constexpr`, or templates would be safer
- including large dependency trees in headers, which slows builds and increases coupling
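
Two of the pitfalls above can be sketched in one small header. The guard name and all identifiers here (`GEOMETRY_UTILS_H`, `SQUARE_MACRO`, `square`, `clamp_to_limit`) are hypothetical and exist only for illustration.

```cpp
// geometry_utils.h (hypothetical header)
#ifndef GEOMETRY_UTILS_H
#define GEOMETRY_UTILS_H

// Macro: pure text substitution. SQUARE_MACRO(1 + 2) expands to 1 + 2 * 1 + 2.
#define SQUARE_MACRO(x) x * x

// constexpr function: the argument is evaluated once, with normal type rules.
constexpr int square(int x) {
    return x * x;
}

// `inline` allows this definition to appear in every translation unit that
// includes the header without causing multiple definition errors at link time.
inline int clamp_to_limit(int value, int limit) {
    return value > limit ? limit : value;
}

#endif  // GEOMETRY_UTILS_H
```

With these definitions, `SQUARE_MACRO(1 + 2)` evaluates to 5 because of operator precedence, while `square(1 + 2)` evaluates to 9. That kind of surprise is why a `constexpr` function is usually the safer choice.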

## Variables, Types, and Object Storage

### Intuition

A variable in C++ is not “just a name.” It is usually a named object with a type, storage duration, alignment requirements, and a region of memory associated with it.

The type system tells both the compiler and the reader what operations are legal and how many bytes an object likely occupies.

### What a Type Really Means

A C++ type typically determines:

- size, though this can vary by platform
- alignment requirements
- how the value is interpreted in memory
- what operations are available
- construction and destruction behavior for user-defined types

Consider:

```cpp
int count = 42;
double ratio = 0.5;
char flag = 'Y';
```

These values are all just bits in memory, but the type tells the compiler how to read and manipulate those bits.

### Value vs Representation

One useful systems-level habit is to separate a value from its representation.

For example, an `int` stores a signed integer value, but underneath it is represented in binary with an implementation-defined size, usually 32 bits on modern desktop and server platforms. A pointer stores an address value, but underneath it is also just bits.

This distinction matters when you debug memory corruption. The CPU does not know “this is a tree node” in some abstract sense. It only sees instructions and bytes. The meaning comes from your program's types and the compiler's generated code.
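
One way to see the split between value and representation is to copy an object's bytes into a different type of the same size. This is a minimal sketch that assumes IEEE-754 `float` and a 32-bit `float`, both checked by the `static_assert`s. The helper name `bits_of` is made up for illustration.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559,
              "this sketch assumes IEEE-754 floating point");
static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Copies the object representation of a float into an integer of equal size.
// std::memcpy is used because it is a well-defined way to reinterpret bytes.
std::uint32_t bits_of(float value) {
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

Under those assumptions, `bits_of(1.0f)` yields `0x3F800000`: the same 32 bits mean "one" as a `float` and a large number as an unsigned integer.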

### Storage Duration

Every object in C++ has a storage duration. At a practical level, that answers: when does this object come into existence, and when does its storage stop being valid?

The main categories are:

- automatic storage duration: usually local variables created when a scope is entered
- static storage duration: global variables and `static` locals that live for the life of the program
- dynamic storage duration: objects created explicitly on the heap, typically with `new` or via allocators

Later, RAII and smart pointers will build directly on this idea.
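
The three categories can be put side by side in a few lines. This is a minimal sketch with hypothetical names (`next_ticket`, `make_counter`). It uses `std::unique_ptr` for the dynamic case only to keep the example leak-free.

```cpp
#include <memory>

int next_ticket() {
    static int last_ticket = 0;  // static: initialized once, persists across calls
    int step = 1;                // automatic: created per call, gone at return
    last_ticket += step;
    return last_ticket;
}

std::unique_ptr<int> make_counter(int start) {
    // dynamic: the int lives until the returned owner is destroyed or reset,
    // which can be long after this function has returned
    return std::make_unique<int>(start);
}
```

Calling `next_ticket()` twice returns 1 and then 2, because `last_ticket` outlives each call while `step` is recreated every time.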

## Stack vs Heap

### Intuition

Beginners often memorize “stack is fast, heap is slow.” That is too shallow and often misleading.

The real difference is about lifetime management and allocation strategy.

- stack allocation is usually automatic and scoped
- heap allocation is explicit or indirect and more flexible

### Mental Model

```mermaid
flowchart TB
    A[Program starts] --> B[Call main]
    B --> C[Create stack frame for main]
    C --> D[Call function]
    D --> E[Create another stack frame]
    E --> F[Return from function]
    F --> G[Frame removed automatically]
    C --> H[Heap objects may outlive function scope]
```

### Stack Allocation

Local variables inside a function usually live on the stack, though the exact implementation is up to the compiler and optimizer.

Example:

```cpp
void process() {
    int retries = 3;
    double threshold = 0.75;
}
```

Why it exists:

- function-local state is extremely common
- scoped lifetimes are easy to manage automatically
- creation and cleanup can often be handled without a general-purpose allocator

Internally, each function call usually gets a stack frame holding return information, saved registers, and local storage. When the function returns, that frame is popped.

Practical usage:

- temporary computation state
- small fixed-size objects
- ownership that should never outlive the current scope

Pitfalls:

- returning pointers or references to local variables
- allocating very large arrays on the stack and causing stack overflow
- assuming stack layout is fixed across compilers or optimization levels
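
The first pitfall is easiest to see next to its fix. In this sketch the function names are hypothetical, and the broken version is left in a comment so the block still compiles cleanly.

```cpp
#include <string>

// BROKEN (kept as a comment on purpose): the local `label` is destroyed when
// the function returns, so the caller would receive a dangling reference.
//
//   const std::string& make_label_dangling(int id) {
//       std::string label = "item-" + std::to_string(id);
//       return label;
//   }

// Safer: return by value. The result is moved or constructed directly in the
// caller's storage, so nothing refers to a dead stack frame.
std::string make_label(int id) {
    return "item-" + std::to_string(id);
}
```

Returning by value is usually cheap here because the compiler can elide the copy or fall back to a move.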

### Heap Allocation

Heap allocation is used when an object's lifetime must outlive a scope, when size is only known at runtime, or when ownership must be transferred across components.

Example:

```cpp
int* value = new int(42);
delete value;
```

Internally, `new` usually asks an allocator for a chunk of dynamic memory, then constructs the object in that memory. `delete` destroys the object and releases the storage.

Practical usage:

- dynamic data structures such as graphs or trees
- objects shared across subsystems
- buffers sized from runtime input

Pitfalls:

- memory leaks from forgetting `delete`
- double delete from freeing the same pointer twice
- dangling pointers after deletion
- heap fragmentation and allocator overhead in performance-sensitive systems

Important note: in modern C++, direct `new` and `delete` should be rare in application code. Prefer containers and smart pointers. You still need to understand heap behavior because the abstractions are built on top of it.
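
As a rough illustration of that advice, the sketch below gets heap storage without writing `new` or `delete` directly. `Node`, `make_buffer`, and `make_node` are hypothetical names used only for this example.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// A singly linked node whose `next` pointer owns the following node.
struct Node {
    int value = 0;
    std::unique_ptr<Node> next;
};

// Buffer sized from runtime input: the vector owns and frees its heap storage.
std::vector<int> make_buffer(std::size_t count) {
    return std::vector<int>(count, 0);
}

// Heap-allocated node: released automatically when its owner goes away.
std::unique_ptr<Node> make_node(int value) {
    auto node = std::make_unique<Node>();
    node->value = value;
    return node;
}
```

Both functions still allocate on the heap. The difference is that the cleanup is tied to an owning object instead of a manual `delete`.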

## Pointers

### Intuition

A pointer is a value whose job is to hold the address of another object. That is all. It is powerful because it lets you refer to memory indirectly.

Pointers exist because systems software constantly needs indirect access:

- linked data structures
- optional access to objects
- efficient parameter passing without copying large objects
- polymorphic behavior through base-class pointers
- interaction with operating systems, hardware, and C APIs

### Basic Form

```cpp
int score = 99;
int* ptr = &score;
```

Here:

- `score` is an `int`
- `&score` means “address of score”
- `ptr` stores that address
- `*ptr` means “the int stored at that address”

### Pointer Relationship Diagram

```mermaid
flowchart LR
    P[ptr] -->|stores address| S[score in memory]
    S --> V[99]
```

### How It Works Internally

On a 64-bit system, a pointer is commonly 8 bytes. The compiler tracks the pointed-to type because pointer arithmetic and dereferencing depend on that type.

For example, incrementing an `int*` advances by `sizeof(int)` bytes, not by 1 byte.

```cpp
int values[3] = {10, 20, 30};
int* p = values;
++p; // now points to values[1]
```

The compiler scales the increment according to the pointed-to type.
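
The scaling can be observed by measuring the byte distance between two neighboring elements. This is a minimal sketch with hypothetical function names. Viewing the addresses as `const char*` is a well-defined way to count bytes within one array.

```cpp
#include <cstddef>

// Returns how many bytes `p + 1` is away from `p` for an int pointer.
std::ptrdiff_t bytes_between_neighbors() {
    int values[3] = {10, 20, 30};
    int* p = values;
    int* next = p + 1;  // one element further, not one byte further
    return reinterpret_cast<const char*>(next) -
           reinterpret_cast<const char*>(p);
}

// Returns the element reached after a single increment.
int second_element() {
    int values[3] = {10, 20, 30};
    int* p = values;
    ++p;  // now points to values[1]
    return *p;
}
```

On any platform, `bytes_between_neighbors()` equals `sizeof(int)`, whatever that size happens to be.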

### Practical Usage

- traversal in low-level data structures
- API boundaries that may accept nullable inputs
- efficient manipulation of contiguous buffers
- ownership and lifetime control in specialized libraries or allocators

### Common Pitfalls

- dereferencing `nullptr`
- dereferencing uninitialized pointers
- using a pointer after the object it points to has been destroyed
- confusing ownership with access: a pointer can point to something without owning it

That last point is critical. A raw pointer does not tell you who is responsible for deleting the object.
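
A small sketch of a pointer used purely for access, with no ownership implied. The function name `find_first_negative` is hypothetical. It accepts a nullable input and returns either a pointer into the caller's buffer or `nullptr`.

```cpp
#include <cstddef>

// Returns a non-owning pointer to the first negative element, or nullptr if
// there is none. The caller keeps ownership of the buffer and must not delete
// the returned pointer.
const int* find_first_negative(const int* data, std::size_t count) {
    if (data == nullptr) {
        return nullptr;  // nullable input: check before dereferencing
    }
    for (std::size_t i = 0; i < count; ++i) {
        if (data[i] < 0) {
            return data + i;
        }
    }
    return nullptr;
}
```

The returned pointer is only valid while the caller's buffer is alive, which is exactly the access-versus-ownership distinction described above.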

## References

### Intuition

A reference is an alias to an existing object. It exists to make code safer and clearer than pointer-heavy interfaces when nullability and reseating are not needed.

Example:

```cpp
void increment(int& value) {
    ++value;
}
```

### Why References Exist

Without references, you would often pass pointers just to avoid copying objects. But pointers imply optionality and manual dereferencing.

References express a stronger contract:

- this function expects a valid object
- there is no need for null checks as part of normal usage
- the alias should behave like the original object

### Internal View

At the machine level, a reference is often implemented similarly to a pointer, but the language treats it differently.

Key properties:

- must be initialized when created
- cannot be reseated to refer to another object
- cannot be null in a well-defined program; producing a "null reference" requires undefined behavior
- use normal object syntax instead of pointer syntax

```mermaid
flowchart LR
    R[ref] -->|alias of| X[x]
```

### Practical Usage

- passing large objects efficiently without copying
- operator overloading and fluent APIs
- returning aliases to subobjects when lifetime is guaranteed

### Pitfalls and Misconceptions

- a reference is not an independent object, and it does not manage the lifetime of the object it refers to
- returning a reference to a local variable is still invalid
- “references are always safer than pointers” is too simplistic; pointers are the right tool when optionality, reseating, or explicit low-level behavior is required
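
The "cannot be reseated" property is easy to misread, so here is a minimal sketch. The function names are hypothetical. Assigning through a reference writes to the original object and never rebinds the alias.

```cpp
// Modifies the caller's object through an alias.
void increment(int& value) {
    ++value;
}

// Shows that `ref = y` copies y's value into x and does not rebind ref.
int reseat_attempt() {
    int x = 1;
    int y = 2;
    int& ref = x;  // ref is now another name for x
    ref = y;       // x becomes 2; ref still refers to x
    y = 50;        // changing y afterwards has no effect on x
    return x;
}

// Confirms that the alias and the original share one address.
bool alias_shares_address() {
    int x = 0;
    int& ref = x;
    return &ref == &x;
}
```

Here `reseat_attempt()` returns 2, not 50, because `ref` never stopped referring to `x`.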

## Const Correctness

### Intuition

`const` is one of the cheapest ways to make C++ code easier to reason about. It restricts mutation and therefore reduces the number of possible program states.

### Practical Examples

```cpp
#include <string>

// Read-only access to the caller's string, with no copy.
void print(const std::string& name);

// A value that cannot be modified after initialization.
const int limit = 100;
```

Why it matters in real systems:

- APIs become clearer about who is allowed to modify data
- the compiler can catch accidental writes
- reviewers can reason more quickly about ownership and side effects

### Common Pitfalls

- confusing `const int* p` with `int* const p`
- using `const` inconsistently across interfaces
- assuming `const` automatically implies thread safety or deep immutability
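
The first pitfall comes down to which thing is `const`: the pointed-to value or the pointer itself. This is a minimal sketch with a hypothetical function name. The lines that would fail to compile are left as comments.

```cpp
int const_pointer_demo() {
    int a = 1;
    int b = 2;

    const int* p = &a;  // pointer to const int: cannot write through p
    p = &b;             // but p itself can be reseated
    // *p = 5;          // error: assignment of read-only location

    int* const q = &a;  // const pointer to int: q cannot be reseated
    *q = 10;            // but writing through q is allowed
    // q = &b;          // error: assignment of read-only variable

    return *p + *q;     // b + a after the write: 2 + 10
}
```

Reading the declaration right to left helps: `const int* p` is "pointer to const int", while `int* const q` is "const pointer to int".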

## Arrays, Decay, and Basic Memory Layout

### Intuition

C++ inherits much of C's memory model. Arrays are contiguous blocks of elements, which is why they are fast for indexed access and cache-friendly iteration.

```cpp
int values[4] = {1, 2, 3, 4};
```

The elements are stored adjacent in memory. That contiguity is why pointer arithmetic and array indexing are closely related.

### Under the Hood

`values[i]` is conceptually equivalent to `*(values + i)`.

This is powerful, but it is also why out-of-bounds access is dangerous. C++ does not automatically check bounds for raw arrays.

### Practical Usage

- numerical buffers
- serialization code
- high-performance loops
- interop with C libraries

### Pitfalls

- array-to-pointer decay in function parameters
- buffer overflows
- assuming stack arrays automatically know their size when passed to a function

In most application code, prefer `std::array` for fixed-size arrays and `std::vector` for dynamic arrays. You will still see raw arrays in systems code, embedded code, and performance-critical paths.
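
The decay pitfall and two common ways around it can be sketched as follows. All function names are hypothetical. The first function shows why a decayed pointer needs an explicit count, and the other two keep the size in the type.

```cpp
#include <array>
#include <cstddef>

// After decay the function only receives a pointer, so the element count has
// to be passed separately and trusted.
int sum_decayed(const int* values, std::size_t count) {
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}

// Taking the array by reference prevents decay: N is deduced from the type.
template <std::size_t N>
std::size_t element_count(const int (&)[N]) {
    return N;
}

// std::array carries its size in its type and supports range-based for.
int sum_array(const std::array<int, 4>& values) {
    int total = 0;
    for (int v : values) {
        total += v;
    }
    return total;
}
```

Passing a wrong `count` to `sum_decayed` compiles without complaint, which is the kind of mistake the other two forms rule out by construction.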

## A Debugging Mental Model

### Intuition

Low-level bugs in C++ often feel mysterious only when you lack a runtime model. Most of the time, they reduce to one of a few categories:

- invalid lifetime
- invalid memory access
- wrong ownership
- incorrect assumptions about object state
- data races in concurrent code

### A Useful Diagnostic Loop

When debugging a crash or corruption issue, ask these questions in order:

1. What object was accessed?
2. Was it initialized?
3. Is its lifetime still valid?
4. Who owns it?
5. Could memory nearby have been overwritten?
6. Is the failure deterministic or timing-dependent?

That checklist is more valuable than memorizing debugger buttons.

### Common Failure Modes

#### Segmentation Faults

Usually caused by dereferencing an invalid address such as:

- `nullptr`
- a dangling pointer
- a wild pointer from uninitialized memory

#### Use-After-Free

You delete an object, but some pointer or reference still points to the old address. The address may still look valid for a while, which makes this class of bug subtle.
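
A raw pointer cannot tell you that its target is gone, which is what makes use-after-free hard to spot. As a contrast, this sketch uses `std::shared_ptr` and `std::weak_ptr`, covered later in the guide, to show an observer that can detect the end of the lifetime. The function name is hypothetical, and this is one possible approach rather than a general fix.

```cpp
#include <memory>

// Returns true if the observer correctly reports that the object is gone
// after its only owner has been destroyed.
bool observer_sees_expiry() {
    std::weak_ptr<int> observer;
    {
        auto owner = std::make_shared<int>(42);
        observer = owner;  // observes without extending the lifetime
        if (observer.expired()) {
            return false;  // unexpected: the owner is still alive here
        }
    }  // owner destroyed here, so the int is released
    return observer.expired();
}
```

With a raw `int*` in place of `observer`, the same code would compile, and dereferencing the pointer after the inner scope would be undefined behavior.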

#### Stack Corruption

Often caused by out-of-bounds writes into local arrays or incorrect pointer arithmetic.

#### Memory Leaks

The program keeps allocating memory without freeing it. In long-running services, that becomes a production issue rather than just a test annoyance.

### Practical Tools

Real C++ debugging is easier when you use tooling, not just intuition:

- compiler warnings: start with strict warnings enabled
- AddressSanitizer: catches use-after-free, buffer overflows, and more
- UndefinedBehaviorSanitizer: catches many invalid language-level operations
- Valgrind on supported platforms: useful for leaks and invalid accesses
- debugger: inspect stack frames, variables, and memory addresses

Example build flags on Clang or GCC for local debugging:

```bash
-Wall -Wextra -Wpedantic -fsanitize=address,undefined -g
```

### Misconception to Avoid

“If it only crashes sometimes, the code is almost correct.”

In C++, nondeterministic behavior is often a sign of undefined behavior, not a minor bug. Once you have UB, the optimizer and runtime can produce very different outcomes from one build or machine to another.

## Foundation Patterns That Matter Later

Several later C++ ideas are really lifetime-management patterns built on the concepts above:

- constructors and destructors manage object setup and cleanup
- RAII ties resource lifetime to scope lifetime
- smart pointers model ownership on top of heap allocation
- containers hide raw memory management while preserving performance properties
- concurrency primitives rely on precise reasoning about storage and object lifetime

If you can already picture stack frames, heap allocation, pointer indirection, and the compile-link pipeline, you are ready for object-oriented and modern C++ design.

## Interview Checkpoints

You should be able to explain these clearly in an interview without hiding behind buzzwords:

- the difference between compilation and linking
- why headers can increase build time and coupling
- what stack and heap allocation really mean in terms of lifetime
- the difference between a pointer and a reference
- what causes dangling pointers and use-after-free bugs
- why `const` improves API design and reasoning

## What Comes Next

The next file builds on these memory and lifetime foundations to explain classes, constructors, destructors, inheritance, and polymorphism. The key shift is this: C++ object-oriented features are not separate from the memory model. They are layered on top of it.