# Concurrency in Operating Systems
## How To Use This Guide
Concurrency becomes much easier once you stop treating it as a vocabulary list and start treating it as one core operating-system problem:
How does a machine make progress on many things at once when CPUs, memory bandwidth, and I/O devices are all limited?
This guide is written for deep understanding. The goal is not to memorize terms like thread, mutex, semaphore, or deadlock in isolation. The goal is to understand why those ideas exist, what problem each one solves, and what is actually happening inside the machine when concurrent software runs.
Keep this mental model in mind throughout:
- The CPU is a worker that can execute one instruction stream per core at a time.
- The operating system is the traffic controller deciding which work gets CPU time next.
- A process is a protected workspace.
- A thread is an execution path inside that workspace.
- Synchronization is the set of rules that prevents concurrent workers from corrupting shared state.
If you understand those five statements, the rest of concurrency starts fitting together.
---
## 1. Introduction To Concurrency
### What Problem Does Concurrency Solve?
Imagine a CPU as a very fast worker standing in front of a giant pile of tasks.
Some tasks are computation-heavy, like compressing a file or rendering a video frame. Some tasks mostly wait, like reading from disk, waiting for a packet from the network, or pausing until a user clicks a button.
If the system handled one task from start to finish before touching anything else, the machine would waste huge amounts of time. A web server would sit idle while one request waited on the database. A browser would freeze while a tab waited for the network. A database would stall everyone behind one slow disk read.
Concurrency exists because real systems do not face one long uninterrupted stream of work. They face many independent activities, each with bursts of CPU work separated by waiting.
The operating system uses concurrency to make progress on multiple activities during the same time period, even if only one of them is running on a given core at a given instant.
That solves several practical problems at once:
- It keeps the CPU busy when one task blocks on I/O.
- It improves responsiveness so interactive work does not wait behind long background work.
- It lets the system multiplex limited hardware across many users, applications, and services.
- It gives programmers a model for structuring systems that naturally have multiple ongoing activities.
### Concurrency Vs Parallelism
These words are related, but they are not the same.
Concurrency is about dealing with many things at once. Parallelism is about literally doing many things at the same instant.
The easiest intuition is this:
- Concurrency is one chef managing several dishes by switching attention between them.
- Parallelism is several chefs cooking different dishes at the same time.
On a single CPU core without hardware multithreading, only one instruction stream runs at a time, so two tasks can never execute at the same instant. Concurrency is still possible because the operating system can rapidly switch between tasks, giving the illusion that many things are advancing together.
On a multi-core CPU, the system can have both:
- concurrency as the high-level structure of many tasks in progress
- parallelism as the physical reality that multiple cores are executing different tasks simultaneously
This distinction matters in design interviews and in real systems.
A chat server with ten thousand connections is a concurrency problem even if most connections are idle. The server must keep track of many conversations and react when any of them becomes ready.
An image-processing pipeline that splits a large array across eight cores is a parallelism problem because the aim is to finish one computation faster by using multiple cores at once.
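As a minimal sketch of the parallelism case, the following splits an array sum across POSIX threads. `parallel_sum` and `struct slice` are illustrative names, not from any library. Each thread writes only its own slice record, so no locking is needed, and `pthread_join` makes each partial result visible to the caller.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 64

/* Illustrative per-thread work description: a half-open range [begin, end). */
struct slice {
    const long *data;
    size_t begin, end;
    long partial;   /* written only by the thread that owns this slice */
};

static void *sum_slice(void *arg) {
    struct slice *s = arg;
    long acc = 0;
    for (size_t i = s->begin; i < s->end; i++)
        acc += s->data[i];
    s->partial = acc;
    return NULL;
}

/* Sum data[0..n) by giving each of up to MAX_WORKERS threads one contiguous slice. */
long parallel_sum(const long *data, size_t n, size_t nthreads) {
    pthread_t tids[MAX_WORKERS];
    struct slice slices[MAX_WORKERS];
    int started[MAX_WORKERS];
    if (nthreads == 0) nthreads = 1;
    if (nthreads > MAX_WORKERS) nthreads = MAX_WORKERS;
    size_t chunk = (n + nthreads - 1) / nthreads;

    for (size_t t = 0; t < nthreads; t++) {
        size_t b = t * chunk < n ? t * chunk : n;
        size_t e = b + chunk < n ? b + chunk : n;
        slices[t] = (struct slice){ data, b, e, 0 };
        started[t] = pthread_create(&tids[t], NULL, sum_slice, &slices[t]) == 0;
        if (!started[t])
            sum_slice(&slices[t]);   /* fall back to running this slice inline */
    }

    long total = 0;
    for (size_t t = 0; t < nthreads; t++) {
        if (started[t])
            pthread_join(tids[t], NULL);   /* wait, and make partial visible */
        total += slices[t].partial;
    }
    return total;
}
```

On a multi-core machine the slices can run simultaneously; on one core the same program is still correct, just not faster.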
Many real systems contain both patterns.
### Why Operating Systems Need Concurrency
Operating systems need concurrency because the machine itself is concurrent.
Even on a laptop, many things are happening during the same time window:
- one process is playing audio
- another is rendering the browser UI
- the disk controller is completing I/O
- the network card is receiving packets
- timers are expiring
- the user is moving the mouse
- background services are waking up to do maintenance work
If the OS did not have a concurrency model, it would not know how to coordinate these activities safely or efficiently.
From first principles, the OS needs concurrency for four major reasons.
#### 1. Resource multiplexing
There are more runnable activities than CPUs. The OS needs a way to share the processor across them.
#### 2. Waiting without wasting
I/O is slow compared with CPU speeds. Concurrency lets one task wait while another uses the core.
#### 3. Responsiveness
Humans notice delay quickly. Interactive tasks must stay responsive even when background tasks exist.
#### 4. Structure and isolation
Different activities should often be separated so one bug or one long-running operation does not freeze the entire system.
That is why concurrency is not an optional feature layered on top of operating systems. It is part of their job description.
---
## 2. Processes Vs Threads
### The Process Model
A process is the operating system's unit of protection and resource ownership.
When you launch a program, the OS does not simply say, "start running these instructions." It creates a process with its own execution context and resource boundaries. That process usually includes:
- a private virtual address space
- open files and sockets
- credentials and permissions
- accounting information
- one or more threads of execution
The key idea is isolation.
If process A crashes, process B should usually survive. If process A writes to an address, it should not corrupt process B's memory. If process A opens a file or holds a credential, the OS can track ownership precisely.
That is why browsers, databases, shells, and service managers all care about processes. The process model gives the OS a safe container around running code.
### The Thread Model
A thread is a schedulable execution path inside a process.
If the process is the workspace, the thread is the worker moving through instructions. Multiple threads in the same process share the process's address space and most of its resources, but each thread has its own:
- program counter
- CPU register state
- stack
- scheduling state
This sharing is what makes threads both powerful and dangerous.
They are powerful because communication is cheap. One thread can update an in-memory queue and another thread can read it directly without copying data through the kernel.
They are dangerous because the shared memory is exactly where race conditions appear. Threads are easy to create compared with processes, but they remove the natural safety barrier that process isolation provides.
### Process Vs Thread Intuition
Imagine an office building.
- A process is a company office suite with its own walls, keys, and filing cabinets.
- A thread is an employee working inside that office.
Employees in the same office can easily share documents because they are in the same room. Employees in different offices are better isolated, but sharing now requires a deliberate mechanism.
That is the core tradeoff.
```mermaid
flowchart TB
subgraph P1["Process A"]
direction TB
R1["Process resources
PID, open files, sockets,
permissions"]
M1["Shared address space
code, heap, globals"]
T1["Thread 1
PC, registers, stack"]
T2["Thread 2
PC, registers, stack"]
T3["Thread 3
PC, registers, stack"]
R1 --> M1
M1 --> T1
M1 --> T2
M1 --> T3
end
subgraph P2["Process B"]
direction TB
R2["Separate process resources
different PID and files"]
M2["Different address space"]
T4["Thread 1
PC, registers, stack"]
R2 --> M2 --> T4
end
```
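The diagram's central claim, that threads share an address space while processes do not, can be checked directly. In this sketch (a POSIX system such as Linux is assumed; the function names are illustrative), a thread's write to a global is visible to the main thread, while a child created with `fork` writes only to its own copy of memory.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared_value = 0;   /* a global: lives in the process address space */

static void *thread_body(void *arg) {
    (void)arg;
    shared_value = 42;         /* same address space as the main thread */
    return NULL;
}

/* What the caller sees after another *thread* writes the global. */
int value_after_thread(void) {
    shared_value = 0;
    pthread_t t;
    if (pthread_create(&t, NULL, thread_body, NULL) != 0) return -1;
    pthread_join(t, NULL);
    return shared_value;       /* the thread's write is visible: 42 */
}

/* What the parent sees after a *child process* writes the global. */
int value_after_fork(void) {
    shared_value = 0;
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {            /* child: modifies its own copy of memory */
        shared_value = 42;
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return shared_value;       /* parent's copy is untouched: still 0 */
}
```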
### Context Switching: What Actually Happens?
Concurrency is not magic. It is implemented through context switching.
A context switch happens when the CPU stops running one thread and starts running another. That switch can happen because:
- a timer interrupt fired and the current time slice expired
- the thread blocked on I/O or a lock
- the thread made a system call that caused it to sleep
- a higher-priority thread became runnable
What actually happens is more concrete than the word "switch" suggests.
#### CPU state must be saved
The currently running thread has live machine state: general-purpose registers, instruction pointer, stack pointer, flags, and often SIMD or floating-point state. The kernel must save enough of that state so the thread can later resume as if nothing happened.
#### The kernel takes control
The CPU enters kernel mode through an interrupt, exception, or system call boundary. The OS now runs scheduler logic.
#### The scheduler chooses another runnable thread
The scheduler consults its data structures, such as ready queues or more advanced run structures, and picks the next thread to run.
#### Memory mapping may change
If the next thread belongs to a different process, the CPU must switch to that process's page tables. On x86, for example, the kernel loads a new page-table base into the CR3 register. Unless the hardware can tag TLB entries by address space (x86 PCIDs allow this), the switch flushes cached translations, and either way the incoming process starts with colder caches.
If the next thread belongs to the same process, the address space may stay the same. That is one reason thread switches are often cheaper than full process switches.
#### The new thread's state is restored
The kernel loads the saved registers for the chosen thread, restores its stack pointer and instruction pointer, updates accounting information, and returns to user mode.
At that point the CPU continues execution from the new thread's perspective, as if it had simply resumed after a pause.
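A kernel context switch cannot be triggered from ordinary user code, but the POSIX `ucontext` functions (marked obsolescent in POSIX.1-2008, still provided by glibc on Linux, which is assumed here) perform a user-space analogue: `swapcontext` saves the current registers, instruction pointer, and stack pointer into one structure and loads another. This sketch has no kernel entry, scheduler, or page-table change; it only shows the save-and-restore step. The `trace` array and `run_switch_demo` are illustrative names that record the order in which code runs.

```c
#include <ucontext.h>

/* Two saved execution contexts, each holding register state and a stack pointer. */
static ucontext_t main_ctx, task_ctx;
static char task_stack[64 * 1024];   /* the task's private stack */
static int trace[8];
static int trace_len = 0;

static void task(void) {
    trace[trace_len++] = 2;
    swapcontext(&task_ctx, &main_ctx);   /* save task state, resume main */
    trace[trace_len++] = 4;
    /* returning here follows uc_link back to main_ctx */
}

/* Alternates between main and task; returns the number of recorded steps. */
int run_switch_demo(void) {
    trace_len = 0;
    getcontext(&task_ctx);
    task_ctx.uc_stack.ss_sp = task_stack;
    task_ctx.uc_stack.ss_size = sizeof task_stack;
    task_ctx.uc_link = &main_ctx;        /* where to continue when task() returns */
    makecontext(&task_ctx, task, 0);

    trace[trace_len++] = 1;
    swapcontext(&main_ctx, &task_ctx);   /* save main state, start task */
    trace[trace_len++] = 3;
    swapcontext(&main_ctx, &task_ctx);   /* resume task exactly where it paused */
    trace[trace_len++] = 5;
    return trace_len;
}
```

After the call, `trace` holds 1, 2, 3, 4, 5: each switch resumes the other context where it left off, which is the "as if it had simply resumed after a pause" property described above.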
### What Makes Context Switching Expensive?
The switch itself is not just a few register copies. The hidden cost often comes from lost locality.
- CPU caches may now contain data for the old thread, not the new one.
- Branch predictors may be less useful.
- The TLB may need new address translations.
- The kernel spends real work on bookkeeping, queue management, and accounting.
So concurrency improves utilization and responsiveness, but excessive switching can reduce throughput.
### Tradeoffs Between Processes And Threads
| Question | Processes | Threads |
| --- | --- | --- |
| Isolation | Strong | Weak inside a process |
| Communication | More expensive | Cheap through shared memory |
| Creation and switching cost | Higher | Lower |
| Fault containment | Better | Worse |
| Risk of races | Lower across process boundary | High when sharing data |
| Typical use | Security boundaries, separate services | In-process parallel work, request handling |
The real design question is not which one is universally better. It is which failure mode and performance profile you want.
---
## 3. CPU Scheduling And Concurrency
### Why Scheduling Exists
Scheduling exists because runnable work usually exceeds immediate CPU capacity.
Even on an 8-core machine, it is normal to have hundreds or thousands of threads in the system. Most of them are sleeping, but some are ready. The OS needs a policy for deciding who runs now, who waits, and how long each runnable task keeps the CPU.
If the operating system did not schedule carefully, several bad things would happen:
- interactive tasks would freeze behind long computations
- short jobs could wait far too long
- low-priority work might interfere with urgent work
- CPUs could sit idle while runnable tasks exist elsewhere
Scheduling is where concurrency becomes visible to the user. When an app feels responsive, that is partly a scheduling success. When the machine feels sluggish under load, scheduling is often part of the story.
### Preemptive Vs Non-Preemptive Scheduling
#### Non-preemptive
In non-preemptive scheduling, a running task keeps the CPU until it finishes, blocks, or voluntarily yields.
This is conceptually simple, but dangerous for general-purpose systems. One CPU-bound task can monopolize the processor and make everything else wait.
Older cooperative systems and some embedded runtimes use this model because it is simpler and more predictable when tasks are trusted.
#### Preemptive
In preemptive scheduling, the operating system can interrupt a running task and give the CPU to someone else.
This is the normal model for modern operating systems. Timer interrupts create scheduling points so no single task can dominate forever.
Preemption is why your music can keep playing while a background compile runs, and why the mouse pointer still moves when a program is busy.
The cost is that programmers can no longer assume their code runs to completion once started. A thread can be paused almost anywhere, which is why synchronization exists.
### Common Scheduling Algorithms
Real kernels use more sophisticated hybrids than textbook policies, but the classic algorithms are still the right foundation.
#### FCFS: First-Come, First-Served
FCFS runs tasks in arrival order.
This sounds fair, but it can be terrible for responsiveness. If one long CPU-bound job arrives first, all shorter jobs wait behind it. This is called the convoy effect.
FCFS is easy to reason about, but it is poorly suited to interactive systems.
#### Round Robin
Round Robin gives each runnable task a time quantum. When the quantum expires, the task is preempted and moved to the back of the ready queue.
This improves fairness and responsiveness because no task waits indefinitely while another uses the CPU forever.
The quantum size matters:
- too large and Round Robin behaves more like FCFS
- too small and the system wastes time on context-switch overhead
Round Robin is a good mental model for time-sharing systems, terminals, and basic fairness.
```mermaid
flowchart LR
Q["Ready queue
T1 -> T2 -> T3 -> T4"] --> C["CPU runs T1
for one quantum"]
C -->|Quantum expires| Q
C -->|Blocks for I/O| W["Wait queue / device"]
W -->|I/O completes| Q
```
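The effect of the quantum can be seen in a small simulation. It is a simplified model under stated assumptions: every job arrives at time 0, never blocks for I/O, and context switches cost nothing. `round_robin` is a hypothetical helper. Under those assumptions, a preempted job rejoining the back of the queue preserves the cyclic order, so one pass over the jobs in index order equals one trip around the ready queue.

```c
#include <stddef.h>

#define MAX_JOBS 64

/* burst[i] is job i's CPU demand; finish[i] receives its completion time. */
void round_robin(const int *burst, int *finish, size_t n, int quantum) {
    int remaining[MAX_JOBS];
    size_t done = 0;
    int clock = 0;
    if (n > MAX_JOBS) n = MAX_JOBS;   /* fixed-size sketch */
    if (quantum < 1) quantum = 1;
    for (size_t i = 0; i < n; i++) {
        remaining[i] = burst[i] > 0 ? burst[i] : 0;
        if (remaining[i] == 0) { finish[i] = 0; done++; }
    }
    while (done < n) {
        for (size_t i = 0; i < n; i++) {   /* one trip around the ready queue */
            if (remaining[i] == 0) continue;
            int slice = remaining[i] < quantum ? remaining[i] : quantum;
            clock += slice;                 /* job i runs for one quantum or less */
            remaining[i] -= slice;
            if (remaining[i] == 0) { finish[i] = clock; done++; }
        }
    }
}
```

With bursts {10, 1, 1} and a quantum of 100, the model behaves like FCFS: the short jobs finish at 11 and 12, stuck behind the long one (the convoy effect). With a quantum of 1 they finish at 2 and 3, and the long job finishes at 12.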
#### Priority Scheduling
Priority scheduling lets more important work run before less important work.
That can be essential. A real-time audio thread should often outrank a background indexing thread.
But priorities introduce their own problems.
- Low-priority tasks can starve.
- Priority inversion can appear when a high-priority task waits on a lock held by a low-priority task.
- Programmers may assign priorities too aggressively and destabilize the system.
Modern kernels often combine priorities with fairness mechanisms and dynamic adjustments rather than using fixed static priorities alone.
### Real-World Implications: Responsiveness, Throughput, Fairness
Scheduling always balances competing goals.
#### Responsiveness
How quickly does the system react to input or wake an interactive task?
Good responsiveness matters for UI threads, terminal sessions, and latency-sensitive services.
#### Throughput
How much total work gets completed over time?
Batch systems often care more about throughput than immediate response time.
#### Fairness
Do tasks get a reasonable share of CPU time, or does one class of work dominate others?
There is no single best answer for every workload. That is why production schedulers are policy engines, not one-line formulas.
### Real-System Mapping
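Linux does not literally run FCFS or classroom Round Robin for normal tasks. For many years it used the Completely Fair Scheduler (CFS), which approximates an ideal world in which each runnable task gets a fair share of the CPU over time. CFS tracks how much virtual runtime each task has accumulated, weighted by priority, and tends to run the task that has received the least. Since kernel 6.6 the default scheduler for normal tasks is EEVDF, which keeps this virtual-runtime accounting and adds virtual deadlines to better serve latency-sensitive tasks.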
That sounds abstract, but the intuition is simple: the scheduler is trying to prevent one runnable task from quietly consuming more than its fair share.
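The virtual-runtime idea can be shown with a toy model. This is not kernel code: the one-tick slices, the linear scan, and the reference weight of 1024 (the weight CFS assigns to a nice-0 task) are simplifying assumptions, and `vruntime_schedule` is a hypothetical name. Each tick goes to the task with the smallest virtual runtime, and a task's virtual runtime grows more slowly the larger its weight, so heavier tasks receive proportionally more CPU.

```c
#include <stddef.h>

#define REFERENCE_WEIGHT 1024UL   /* assumed weight of a default-priority task */

struct vtask {
    unsigned long weight;    /* must be nonzero; larger means a larger CPU share */
    unsigned long vruntime;  /* weighted service received so far */
    unsigned long ran;       /* real ticks received, for inspection */
};

/* Hand out `ticks` one-tick slices, each to the task with the least vruntime. */
void vruntime_schedule(struct vtask *t, size_t n, unsigned long ticks) {
    if (n == 0) return;
    for (unsigned long k = 0; k < ticks; k++) {
        size_t pick = 0;
        for (size_t i = 1; i < n; i++)
            if (t[i].vruntime < t[pick].vruntime)
                pick = i;
        t[pick].ran += 1;
        /* One tick (1000 units) of real time, scaled by REFERENCE_WEIGHT / weight. */
        t[pick].vruntime += REFERENCE_WEIGHT * 1000UL / t[pick].weight;
    }
}
```

With weights 1024 and 2048 over 300 ticks, the first task runs 100 ticks and the second 200, a 2:1 split matching the weights. Equal weights alternate evenly.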
---
## 4. Shared Memory And Race Conditions
### What Shared Memory Means In Operating-System Context
Shared memory means two or more execution contexts can access the same memory location.
The most common case is threads inside one process. Because threads share the process address space, they can all read and write the same globals, heap objects, and memory-mapped regions.
Processes can also share memory deliberately using facilities like shared memory segments or `mmap`-based shared mappings.
Why would anyone want this?
Because shared memory is fast. Instead of copying data through message buffers, two workers can look at the same bytes. That is powerful for performance, but it creates a coordination problem.
If multiple threads can touch the same data, who is allowed to update it, and when?
### Race Conditions: The Core Idea
A race condition happens when the correctness of a program depends on the relative timing of concurrent operations.
In other words, the final result depends on who got there first.
Imagine two warehouse workers updating the same whiteboard that says how many boxes remain.
The whiteboard currently says `10`.
Worker A reads `10` and plans to erase it and write `9`.
Worker B reads `10` and also plans to erase it and write `9`.
Both workers did real work, but the board ends up showing `9`, not `8`.
That is a race condition. The shared state was updated without coordination.
### Why `counter++` Is Not One Step
Programmers often write something that looks atomic but is not.
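```c
int counter = 0;   /* shared by every thread that calls worker() */

void worker(void) {
    for (int i = 0; i < 100000; i++) {
        counter++;   /* unsynchronized read-modify-write: a data race */
    }
}
```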
The dangerous part is that `counter++` is conceptually three operations:
1. read the current value from memory
2. add one in a register
3. write the new value back to memory
If two threads interleave those steps, one increment can be lost.
```mermaid
sequenceDiagram
participant T1 as Thread 1
participant M as Shared Counter In Memory
participant T2 as Thread 2
T1->>M: Read counter = 0
T2->>M: Read counter = 0
T1->>T1: Add 1 locally
T2->>T2: Add 1 locally
T1->>M: Write 1
T2->>M: Write 1
Note over M: Final value is 1, not 2
```
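Real interleavings are nondeterministic, so a deterministic simulation makes the lost update easy to reproduce. In this illustrative model (all names are hypothetical), each simulated thread performs `counter++` as separate load, add, and store steps, and a schedule string chooses which thread takes its next step.

```c
#include <stddef.h>

/* The three steps of counter++, plus a finished state. */
enum step { LOAD, ADD, STORE, DONE };

struct sim_thread {
    enum step next;  /* which step this thread performs next */
    int reg;         /* the thread's private register copy */
};

/* Each character of `schedule` ('1' or '2') runs that thread's next step.
 * Returns the final value of the shared counter, starting from 0. */
int simulate_increments(const char *schedule) {
    int counter = 0;                                  /* the shared memory location */
    struct sim_thread th[2] = { { LOAD, 0 }, { LOAD, 0 } };
    for (const char *p = schedule; *p; p++) {
        struct sim_thread *t = &th[*p == '2'];        /* '2' -> thread 2, else thread 1 */
        switch (t->next) {
        case LOAD:  t->reg = counter; t->next = ADD;   break;  /* read from memory */
        case ADD:   t->reg += 1;      t->next = STORE; break;  /* add in register */
        case STORE: counter = t->reg; t->next = DONE;  break;  /* write back */
        case DONE:  break;
        }
    }
    return counter;
}
```

The serial schedule `"111222"` yields 2, while `"121212"` reproduces the diagram exactly and yields 1.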
### Critical Sections
A critical section is the part of a program that accesses shared mutable state and therefore must not be executed by multiple threads at the same time.
This is the heart of synchronization.
The rule is not "make everything single-threaded." The rule is "identify the small regions where concurrent access would violate correctness, and protect exactly those regions."
Too little protection gives races. Too much protection gives unnecessary contention and poor scalability.
### A Safe Version
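```c
#include <pthread.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
int counter = 0;   /* protected by lock */

void worker(void) {
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);     /* enter the critical section */
        counter++;
        pthread_mutex_unlock(&lock);   /* leave it so another thread can enter */
    }
}
```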
The mutex ensures only one thread at a time enters the critical section that updates `counter`.
The CPU has not suddenly understood your intention. What changes is that the threads are now forced to coordinate access through a synchronization primitive built on atomic instructions and, when threads contend, kernel support. Locking and unlocking also order memory, so the next thread to acquire the mutex sees the value the previous holder wrote.
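For a single integer counter, a mutex is not the only option. As an alternative technique (a sketch, assuming a C11 compiler with `<stdatomic.h>`; `run_atomic_workers` is a hypothetical helper), `atomic_fetch_add` performs the whole read-modify-write as one indivisible operation, so no increment can be lost. This works for one variable; invariants that span several fields still need a lock.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int atomic_counter = 0;

static void *atomic_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++)
        atomic_fetch_add(&atomic_counter, 1);   /* one indivisible read-modify-write */
    return NULL;
}

/* Runs up to 16 workers and returns the final count. */
int run_atomic_workers(int n) {
    pthread_t t[16];
    int started = 0;
    if (n > 16) n = 16;
    atomic_store(&atomic_counter, 0);
    for (int i = 0; i < n; i++)
        if (pthread_create(&t[started], NULL, atomic_worker, NULL) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);
    return atomic_load(&atomic_counter);
}
```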
---
## 5. Synchronization Mechanisms
Synchronization mechanisms exist because concurrency without coordination is chaos.
The goal is always the same: preserve correctness while still allowing useful overlap.
Different mechanisms solve slightly different coordination problems.
### Locks And Mutexes
#### What Problem A Mutex Solves
A mutex solves mutual exclusion: at most one thread may execute a critical section at a time.
Use a mutex when shared state must remain internally consistent across a sequence of steps.
For example, updating a hash table is rarely just one machine instruction. A thread might need to search a bucket, allocate a node, relink pointers, and update a count. If another thread sees the structure halfway through, the data can become corrupted or observed in an invalid state.
#### How A Mutex Works Internally
At a high level, a mutex has two paths:
- a fast path when the lock is free
- a slow path when there is contention
On the fast path, the thread uses an atomic operation to change the lock state from unlocked to locked. If that succeeds, it enters the critical section.
On the slow path, another thread already holds the lock, so the waiting thread cannot proceed. On Linux, user-space threading libraries such as glibc's pthreads build this path on futexes: depending on the mutex type and implementation, the waiting thread may spin briefly, then asks the kernel to put it to sleep on a wait queue keyed by the lock's address. When the holder unlocks and sees that waiters may exist, it asks the kernel to wake one of them.
That means a lock is not just a variable. It is a protocol involving the CPU, memory ordering rules, and often the scheduler.
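The fast-path/slow-path split can be sketched with a common three-state futex lock design. This is a Linux-only teaching sketch, not glibc's implementation: it invokes the raw `futex` system call through `syscall`, and `toy_mutex`, `toy_lock`, `toy_unlock`, and `run_toy_mutex_demo` are hypothetical names.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Lock word: 0 = unlocked, 1 = locked, 2 = locked and waiters may be sleeping. */
typedef struct { atomic_int state; } toy_mutex;

static void futex_wait(atomic_int *addr, int expected) {
    /* The kernel sleeps only if *addr still equals expected; otherwise returns at once. */
    syscall(SYS_futex, addr, FUTEX_WAIT_PRIVATE, expected, NULL, NULL, 0);
}

static void futex_wake_one(atomic_int *addr) {
    syscall(SYS_futex, addr, FUTEX_WAKE_PRIVATE, 1, NULL, NULL, 0);
}

void toy_lock(toy_mutex *m) {
    int c = 0;
    /* Fast path: a single CAS from 0 to 1, no system call. */
    if (atomic_compare_exchange_strong(&m->state, &c, 1))
        return;
    /* Slow path: mark the lock contended, then sleep until we take it. */
    if (c != 2)
        c = atomic_exchange(&m->state, 2);
    while (c != 0) {
        futex_wait(&m->state, 2);
        c = atomic_exchange(&m->state, 2);
    }
}

void toy_unlock(toy_mutex *m) {
    /* If the old state says waiters may exist, wake one of them. */
    if (atomic_exchange(&m->state, 0) == 2)
        futex_wake_one(&m->state);
}

/* Demo: several threads increment a plain counter under toy_mutex. */
static toy_mutex demo_lock;   /* zero-initialized: unlocked */
static long demo_counter;

static void *demo_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        toy_lock(&demo_lock);
        demo_counter++;           /* critical section */
        toy_unlock(&demo_lock);
    }
    return NULL;
}

long run_toy_mutex_demo(int nthreads) {
    pthread_t t[16];
    int started = 0;
    if (nthreads > 16) nthreads = 16;
    demo_counter = 0;
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&t[started], NULL, demo_worker, NULL) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);
    return demo_counter;
}
```

An uncontended lock/unlock pair never enters the kernel. Only when a thread finds the lock held does it set state 2 and sleep, and only then does unlock pay for a wake-up system call.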
#### When To Use A Mutex
Use a mutex when:
- the critical section is short to moderate in length
- a single owner should protect a data structure
- sleeping while waiting is acceptable
Avoid one giant mutex around an entire subsystem if the workload has high contention. That keeps correctness but destroys parallelism.
```mermaid
flowchart TD
A["Thread reaches critical section"] --> B{"Mutex free?"}
B -->|Yes| C["Atomic acquire"]
C --> D["Run critical section"]
D --> E["Unlock mutex"]
E --> F["Wake one waiting thread
if any"]
B -->|No| G["Sleep or wait in mutex queue"]
F -.->|wakes| G
G -->|woken, retry| B
```
### Semaphores
#### What Problem A Semaphore Solves
A semaphore controls access to a limited number of identical resources.
Where a mutex is about exclusive ownership, a semaphore is about permits.
If a database connection pool has 20 connections, a counting semaphore with value 20 is a natural fit. Each task acquires one permit before using a connection and releases it afterward.
#### How A Semaphore Works Internally
A counting semaphore stores an integer count and a queue of waiters.
- `wait` or `P` tries to decrement the count
- if the count is positive, the thread proceeds
- if the count is zero, the thread blocks
- `signal` or `V` increments the count and may wake a waiter
The internal updates must still be atomic, because multiple threads may change the semaphore concurrently.
#### When To Use A Semaphore
Use semaphores when:
- you want to limit concurrency rather than enforce one-at-a-time access
- there are N interchangeable resources
- you are modeling producer-consumer capacity or admission control
Binary semaphores can resemble mutexes, but conceptually they are not the same. Mutexes emphasize ownership. Semaphores emphasize availability of permits.
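A sketch of the connection-pool case, assuming POSIX unnamed semaphores (`sem_init`, `sem_wait`, `sem_post`) on Linux. No real connections are opened: each hypothetical client just holds a permit briefly, and the illustrative `run_pool_demo` reports the highest number of clients it observed holding permits at once.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <time.h>

static sem_t pool_permits;             /* counts free "connections" */
static atomic_int in_use, peak_in_use;

static void *use_connection(void *arg) {
    (void)arg;
    sem_wait(&pool_permits);           /* P: take a permit, or block until one is free */
    int now = atomic_fetch_add(&in_use, 1) + 1;
    int seen = atomic_load(&peak_in_use);
    while (now > seen && !atomic_compare_exchange_weak(&peak_in_use, &seen, now))
        ;                              /* record the highest concurrency observed */
    nanosleep(&(struct timespec){ .tv_sec = 0, .tv_nsec = 1000000 }, NULL);  /* "use" it */
    atomic_fetch_sub(&in_use, 1);
    sem_post(&pool_permits);           /* V: return the permit */
    return NULL;
}

/* Runs up to 32 clients against `permits` slots; returns peak concurrent holders. */
int run_pool_demo(int permits, int clients) {
    pthread_t t[32];
    int started = 0;
    if (clients > 32) clients = 32;
    sem_init(&pool_permits, 0, (unsigned)permits);
    atomic_store(&in_use, 0);
    atomic_store(&peak_in_use, 0);
    for (int i = 0; i < clients; i++)
        if (pthread_create(&t[started], NULL, use_connection, NULL) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);
    sem_destroy(&pool_permits);
    return atomic_load(&peak_in_use);
}
```

With 3 permits and 8 clients, the peak never exceeds 3 no matter how the threads are scheduled.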
### Spinlocks
#### What Problem A Spinlock Solves
A spinlock solves the same basic exclusivity problem as a mutex, but with a different waiting strategy.
Instead of sleeping when the lock is unavailable, a thread repeatedly checks the lock in a tight loop, waiting for it to become free.
#### How A Spinlock Works Internally
The lock state is guarded by an atomic instruction, often based on test-and-set or compare-and-swap. If acquisition fails, the thread keeps spinning.
This sounds wasteful, and often it is. But it can still be the right choice when the expected wait is extremely short and sleeping would cost more than spinning.
#### When To Use A Spinlock
Spinlocks make sense when:
- the critical section is extremely short
- the thread cannot sleep safely, as in certain kernel contexts
- the lock is expected to be released very quickly
They are a poor fit for long waits, especially on a single core or under oversubscription. If the lock holder is descheduled while others spin, the system burns CPU doing no useful work.
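A minimal test-and-set spinlock, assuming C11 `<stdatomic.h>`. `atomic_flag` is the one atomic type C11 guarantees to be lock-free, which makes it a natural lock word. Acquire ordering on lock and release ordering on unlock keep the critical section's memory accesses inside the lock. `spinlock`, `spin_lock`, `spin_trylock`, and `spin_unlock` are illustrative names.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_flag held; } spinlock;
#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

void spin_lock(spinlock *l) {
    /* test-and-set returns the previous value: true means someone else holds it */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;   /* busy-wait: burn CPU until the holder clears the flag */
}

/* Single attempt: returns true if the lock was acquired. */
bool spin_trylock(spinlock *l) {
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

void spin_unlock(spinlock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Production kernel spinlocks add refinements this sketch omits, such as CPU pause hints, backoff, fair queueing of waiters, and disabling preemption while the lock is held.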
### Monitors
#### What Problem A Monitor Solves
A monitor is a higher-level structured approach to synchronization. It combines:
- shared state
- mutual exclusion
- condition-based waiting
The idea is that data and the synchronization rules that protect that data should live together.
Java's `synchronized` methods and blocks, together with `wait` and `notify`, are the classic example: every Java object can act as a monitor. Only one thread may execute inside the monitor at a time, and threads can wait for conditions to become true.
#### How A Monitor Works Internally
Internally, a monitor usually relies on a lock plus one or more condition queues. Entering the monitor means acquiring the lock. Waiting means releasing the lock atomically and sleeping until another thread signals that relevant state has changed.
#### When To Use A Monitor
Use monitors when:
- you want a structured object-oriented way to protect shared state
- the data and synchronization policy naturally belong together
- condition-based coordination is part of the object's behavior
Monitors are less a low-level primitive and more a disciplined design pattern supported by languages or runtimes.
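C has no built-in monitors, but the pattern can be approximated by bundling state with its mutex and condition variable and allowing access only through functions that lock. The `struct account` below is a hypothetical example. `pthread_cond_broadcast` is used because waiters may need different amounts, so each must re-check its own condition.

```c
#include <pthread.h>

/* Monitor-style object: the data and the lock/condition guarding it live together. */
struct account {
    pthread_mutex_t lock;
    pthread_cond_t funds_added;
    long balance;                  /* only touched while holding lock */
};

void account_init(struct account *a, long initial) {
    pthread_mutex_init(&a->lock, NULL);
    pthread_cond_init(&a->funds_added, NULL);
    a->balance = initial;
}

void account_deposit(struct account *a, long amount) {
    pthread_mutex_lock(&a->lock);             /* enter the monitor */
    a->balance += amount;
    pthread_cond_broadcast(&a->funds_added);  /* every waiter re-checks its amount */
    pthread_mutex_unlock(&a->lock);           /* leave the monitor */
}

/* Blocks until the balance covers `amount`, then withdraws it. */
void account_withdraw(struct account *a, long amount) {
    pthread_mutex_lock(&a->lock);
    while (a->balance < amount)
        pthread_cond_wait(&a->funds_added, &a->lock);
    a->balance -= amount;
    pthread_mutex_unlock(&a->lock);
}

long account_balance(struct account *a) {
    pthread_mutex_lock(&a->lock);
    long b = a->balance;
    pthread_mutex_unlock(&a->lock);
    return b;
}
```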
### Condition Variables
#### What Problem A Condition Variable Solves
A condition variable solves a different problem from a lock.
A lock answers: who may enter the critical section?
A condition variable answers: when should a waiting thread proceed?
Imagine a bounded queue. A consumer may hold the mutex and inspect the queue, but if the queue is empty, the problem is not ownership. The problem is that the required condition is false.
#### How A Condition Variable Works Internally
Condition variables are used with a mutex.
The critical operation is `wait`, which does two things atomically:
1. releases the mutex
2. puts the thread to sleep on the condition queue
That atomicity matters. Without it, a signal could occur in the tiny gap between unlocking and going to sleep, causing a missed wakeup.
When the waiting thread wakes, it re-acquires the mutex before returning.
That is also why condition waits are written in a loop:
```c
pthread_mutex_lock(&lock);
while (queue_is_empty()) {
    pthread_cond_wait(&not_empty, &lock);
}
item = pop_queue();
pthread_mutex_unlock(&lock);
```
The loop is needed because wakeups can be spurious and because another thread may consume the resource before the woken thread gets the mutex back.
#### When To Use A Condition Variable
Use condition variables when:
- threads must wait for a state transition
- sleeping is preferable to busy waiting
- shared state has predicates like "queue not empty" or "buffer has space"
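Expanding the fragment above into a complete bounded queue, as a sketch: `struct bqueue` and its functions are hypothetical names, and the capacity of 4 is deliberately small so the producer regularly blocks on `not_full`. Each predicate from the list gets its own condition variable, and every wait sits inside a `while` loop.

```c
#include <pthread.h>

#define QUEUE_CAP 4

/* Bounded queue: one mutex, plus one condition variable per predicate. */
struct bqueue {
    int items[QUEUE_CAP];
    int head, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
};

void bq_init(struct bqueue *q) {
    q->head = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

void bq_put(struct bqueue *q, int v) {
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)                  /* wait for "buffer has space" */
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[(q->head + q->count) % QUEUE_CAP] = v;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

int bq_get(struct bqueue *q) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)                          /* wait for "queue not empty" */
        pthread_cond_wait(&q->not_empty, &q->lock);
    int v = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return v;
}

/* Demo: a producer thread sends 1..n through the queue; the caller consumes. */
static struct bqueue demo_q;
static int demo_n;

static void *producer(void *arg) {
    (void)arg;
    for (int i = 1; i <= demo_n; i++)
        bq_put(&demo_q, i);
    return NULL;
}

long run_queue_demo(int n) {
    pthread_t t;
    long sum = 0;
    bq_init(&demo_q);
    demo_n = n;
    if (pthread_create(&t, NULL, producer, NULL) != 0)
        return -1;
    for (int i = 0; i < n; i++)
        sum += bq_get(&demo_q);
    pthread_join(t, NULL);
    pthread_cond_destroy(&demo_q.not_full);
    pthread_cond_destroy(&demo_q.not_empty);
    pthread_mutex_destroy(&demo_q.lock);
    return sum;
}
```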
### The Big Picture
All synchronization primitives are really answers to the same question:
How do we make sure multiple threads observe and modify shared state in an order that preserves invariants?
The primitive you choose depends on whether you need exclusive access, limited permits, busy waiting, structured monitor-style coordination, or state-based waiting.
---
## 6. Deadlocks
### What Deadlock Is
A deadlock is a state in which a set of threads or processes waits forever because each one is waiting for something held by another member of the set.
Imagine two people in a hallway.
Person A will not move until Person B steps aside.
Person B will not move until Person A steps aside.
Nothing is wrong with either person individually. The system is stuck because the dependency pattern has no way forward.
In software, the classic case is:
- Thread 1 holds lock A and waits for lock B
- Thread 2 holds lock B and waits for lock A
```mermaid
graph LR
T1["Thread 1
holds Lock A"] -->|waits for| LB["Lock B"]
LB -->|held by| T2["Thread 2
holds Lock B"]
T2 -->|waits for| LA["Lock A"]
LA -->|held by| T1
```
### The Coffman Conditions
Four conditions are traditionally required for deadlock to be possible.
#### 1. Mutual exclusion
At least one resource must be non-shareable.
#### 2. Hold and wait
A thread holds one resource while waiting for another.
#### 3. No preemption
Resources cannot simply be taken away safely.
#### 4. Circular wait
There is a cycle of dependencies.
If you break any one of these conditions, true deadlock cannot occur.
That is not just theory. Many practical strategies are really ways of deliberately breaking one Coffman condition.
### Detection Vs Prevention Vs Avoidance
#### Detection
Detection means you allow the system to enter dangerous states, but you monitor for cycles or timeout patterns and recover afterward.
Databases often do this. A lock manager builds or approximates a wait-for graph. If it finds a cycle, it aborts one transaction so the others can continue.
Detection works well when recovery is acceptable.
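A lock manager's check can be sketched as cycle detection on a wait-for graph, stored here as an adjacency matrix (an illustrative representation; `has_deadlock` is a hypothetical name). A depth-first search that reaches a node already on its current path has found a cycle. When every resource has a single instance, a cycle means deadlock; choosing and aborting a victim is left out.

```c
#include <stdbool.h>

#define MAX_NODES 32

/* waits_for[i][j] is true when thread i waits for a resource held by thread j. */
enum { WHITE, GRAY, BLACK };   /* unvisited, on the current DFS path, finished */

static bool dfs(bool waits_for[][MAX_NODES], int n, int u, int *color) {
    color[u] = GRAY;
    for (int v = 0; v < n; v++) {
        if (!waits_for[u][v]) continue;
        if (color[v] == GRAY) return true;   /* back edge: a cycle of waiters */
        if (color[v] == WHITE && dfs(waits_for, n, v, color)) return true;
    }
    color[u] = BLACK;
    return false;
}

/* Returns true if the wait-for graph over threads 0..n-1 contains a cycle. */
bool has_deadlock(bool waits_for[][MAX_NODES], int n) {
    int color[MAX_NODES] = {0};
    if (n > MAX_NODES) n = MAX_NODES;
    for (int u = 0; u < n; u++)
        if (color[u] == WHITE && dfs(waits_for, n, u, color))
            return true;
    return false;
}
```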
#### Prevention
Prevention means design the system so deadlock cannot happen in the first place.
Common prevention techniques include:
- global lock ordering
- requesting all needed resources up front
- releasing held resources before requesting new ones
- allowing preemption in controlled cases
This is often the most practical strategy in application code. A simple lock hierarchy prevents many production deadlocks.
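A sketch of global lock ordering for the two-lock case: every path that needs both locks acquires them in one agreed order, here by comparing addresses. That is an assumption that works when locks have no natural ranking; a fixed hierarchy or an account ID serves equally well. Because no thread ever holds the second lock in the order while waiting for the first, the circular-wait condition cannot form. The `acct` type and `transfer` function are illustrative.

```c
#include <pthread.h>
#include <stdint.h>

struct acct {
    pthread_mutex_t lock;
    long balance;
};

/* Global order: always lock the account at the lower address first. */
static void lock_pair(struct acct *a, struct acct *b) {
    if ((uintptr_t)a < (uintptr_t)b) {
        pthread_mutex_lock(&a->lock);
        pthread_mutex_lock(&b->lock);
    } else {
        pthread_mutex_lock(&b->lock);
        pthread_mutex_lock(&a->lock);
    }
}

void transfer(struct acct *from, struct acct *to, long amount) {
    if (from == to) return;   /* locking the same mutex twice would self-deadlock */
    lock_pair(from, to);
    from->balance -= amount;
    to->balance += amount;
    pthread_mutex_unlock(&from->lock);
    pthread_mutex_unlock(&to->lock);
}

/* Demo: two threads transfer in opposite directions, the classic deadlock setup. */
static struct acct acct_x = { PTHREAD_MUTEX_INITIALIZER, 1000 };
static struct acct acct_y = { PTHREAD_MUTEX_INITIALIZER, 1000 };

static void *x_to_y(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) transfer(&acct_x, &acct_y, 1);
    return NULL;
}

static void *y_to_x(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) transfer(&acct_y, &acct_x, 1);
    return NULL;
}

long run_transfer_demo(void) {
    pthread_t a, b;
    if (pthread_create(&a, NULL, x_to_y, NULL) != 0) return -1;
    if (pthread_create(&b, NULL, y_to_x, NULL) != 0) { pthread_join(a, NULL); return -1; }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return acct_x.balance + acct_y.balance;   /* money is conserved */
}
```

Without the ordering in `lock_pair`, one thread could hold `acct_x` while waiting for `acct_y` as the other holds `acct_y` while waiting for `acct_x`, which is exactly the cycle in the earlier diagram.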
#### Avoidance
Avoidance means the system examines requests and grants them only if the resulting state remains safe.
This is more dynamic than prevention. The system is not banning a pattern outright; it is checking whether a request would push the system into a state from which deadlock could become inevitable.
### Banker's Algorithm: Conceptual Explanation
Banker's algorithm is the classic deadlock-avoidance idea.
The intuition is a bank lending money conservatively.
The bank does not care only about the current request. It asks a deeper question:
If I grant this request now, is there still some order in which every customer could finish and repay what they owe?
If yes, the state is considered safe. If not, the bank delays the request.
In operating-system teaching, this is useful because it shows that deadlock avoidance is about reasoning over future possibilities, not just present availability.
In real general-purpose operating systems, Banker's algorithm is rarely used directly because workloads are too dynamic and exact maximum future claims are usually unknown. But the idea remains important.
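The bank's question can be sketched as the textbook safety check. The sizes (5 processes, 3 resource types) and the `is_safe` name are illustrative assumptions. `max[p]` is process p's declared maximum claim, `alloc[p]` is what it currently holds, and their difference is what it might still request. A full Banker's implementation would tentatively grant a request, run this check, and roll back if the resulting state is unsafe.

```c
#include <stdbool.h>

#define NPROC 5
#define NRES 3

/* Is there some order in which every process can obtain its maximum and finish? */
bool is_safe(const int available[NRES], int max[NPROC][NRES], int alloc[NPROC][NRES]) {
    int work[NRES];
    bool finished[NPROC] = {false};
    for (int r = 0; r < NRES; r++) work[r] = available[r];

    for (int done = 0; done < NPROC; ) {
        bool progressed = false;
        for (int p = 0; p < NPROC; p++) {
            if (finished[p]) continue;
            bool can_finish = true;
            for (int r = 0; r < NRES; r++)
                if (max[p][r] - alloc[p][r] > work[r]) { can_finish = false; break; }
            if (can_finish) {
                /* p could run to completion and return everything it holds */
                for (int r = 0; r < NRES; r++) work[r] += alloc[p][r];
                finished[p] = true;
                done++;
                progressed = true;
            }
        }
        if (!progressed) return false;   /* nobody can finish: unsafe */
    }
    return true;
}
```

For example, with available = (3, 3, 2), allocations {0,1,0}, {2,0,0}, {3,0,2}, {2,1,1}, {0,0,2} and maximum claims {7,5,3}, {3,2,2}, {9,0,2}, {2,2,2}, {4,3,3}, the check finds the order P1, P3, P4, P0, P2 and reports the state safe. With nothing available, no process can finish and the state is unsafe.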
---
## 7. Advanced Concurrency Concepts
### Thread Pools
Creating a new thread for every small task is expensive and unstable at scale. Thread creation has cost, stacks consume memory, and too many runnable threads cause scheduling overhead and cache churn.
Thread pools exist to control that.
A thread pool keeps a fixed or bounded set of worker threads alive. Incoming tasks go into a queue. Workers pull tasks from the queue and execute them.
Why this helps:
- thread creation cost is amortized
- concurrency is capped so the system is not overwhelmed
- the queue provides backpressure when demand spikes
This is why application servers, database engines, and job systems heavily use thread pools.
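A minimal fixed-size pool, as a sketch: the worker count, the queue size, and every name here (`struct pool`, `pool_submit`, `run_pool_sum`) are illustrative assumptions. Workers sleep on a condition variable until the bounded queue has work, run each task outside the lock, and exit once shutdown is requested and the queue is drained. `pool_submit` blocks when the queue is full, which is the backpressure described above.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define POOL_THREADS 4
#define QUEUE_SLOTS 64

typedef void (*task_fn)(void *arg);
struct task { task_fn fn; void *arg; };

struct pool {
    pthread_t workers[POOL_THREADS];
    int nworkers;
    struct task queue[QUEUE_SLOTS];   /* circular buffer of pending tasks */
    int head, count;
    bool shutting_down;
    pthread_mutex_t lock;
    pthread_cond_t has_work, has_space;
};

static void *worker_loop(void *arg) {
    struct pool *p = arg;
    for (;;) {
        pthread_mutex_lock(&p->lock);
        while (p->count == 0 && !p->shutting_down)
            pthread_cond_wait(&p->has_work, &p->lock);
        if (p->count == 0) {              /* shutting down and queue drained */
            pthread_mutex_unlock(&p->lock);
            return NULL;
        }
        struct task t = p->queue[p->head];
        p->head = (p->head + 1) % QUEUE_SLOTS;
        p->count--;
        pthread_cond_signal(&p->has_space);
        pthread_mutex_unlock(&p->lock);
        t.fn(t.arg);                      /* run the task outside the lock */
    }
}

void pool_start(struct pool *p) {
    p->head = p->count = p->nworkers = 0;
    p->shutting_down = false;
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->has_work, NULL);
    pthread_cond_init(&p->has_space, NULL);
    for (int i = 0; i < POOL_THREADS; i++)
        if (pthread_create(&p->workers[p->nworkers], NULL, worker_loop, p) == 0)
            p->nworkers++;
}

void pool_submit(struct pool *p, task_fn fn, void *arg) {
    pthread_mutex_lock(&p->lock);
    while (p->count == QUEUE_SLOTS)       /* queue full: the caller waits */
        pthread_cond_wait(&p->has_space, &p->lock);
    p->queue[(p->head + p->count) % QUEUE_SLOTS] = (struct task){ fn, arg };
    p->count++;
    pthread_cond_signal(&p->has_work);
    pthread_mutex_unlock(&p->lock);
}

void pool_shutdown(struct pool *p) {
    pthread_mutex_lock(&p->lock);
    p->shutting_down = true;
    pthread_cond_broadcast(&p->has_work);
    pthread_mutex_unlock(&p->lock);
    for (int i = 0; i < p->nworkers; i++)
        pthread_join(p->workers[i], NULL);
    pthread_cond_destroy(&p->has_space);
    pthread_cond_destroy(&p->has_work);
    pthread_mutex_destroy(&p->lock);
}

/* Demo: submit tasks that add 1..ntasks into a shared atomic total. */
static atomic_long demo_total;

static void add_value(void *arg) {
    atomic_fetch_add(&demo_total, (long)(intptr_t)arg);
}

long run_pool_sum(int ntasks) {
    struct pool p;
    atomic_store(&demo_total, 0);
    pool_start(&p);
    for (int i = 1; i <= ntasks; i++)
        pool_submit(&p, add_value, (void *)(intptr_t)i);
    pool_shutdown(&p);
    return atomic_load(&demo_total);
}
```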
### Futures And Promises
Futures and promises separate the start of an operation from the retrieval of its result.
Instead of blocking immediately, a caller receives a placeholder for work that will finish later.
This is useful because it lets programs express dependency without forcing immediate waiting.
For example, a service might start three remote requests in parallel and then wait only when it actually needs the combined results.
Under the hood, a future is usually just state plus synchronization:
- not completed yet
- completed successfully with a value
- completed with an error
Waiters either block, register callbacks, or resume later depending on the programming model.
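A one-shot future along exactly those lines, as a sketch: a state field plus a mutex and condition variable. The blocking-wait model is assumed (no callbacks), error codes are assumed to be nonzero, and all names are hypothetical. The "promise" side completes the future once, and later completions are ignored.

```c
#include <pthread.h>

enum future_state { PENDING, READY, FAILED };

struct future {
    pthread_mutex_t lock;
    pthread_cond_t done;
    enum future_state state;
    int value;   /* valid when READY */
    int error;   /* valid when FAILED; assumed nonzero */
};

void future_init(struct future *f) {
    pthread_mutex_init(&f->lock, NULL);
    pthread_cond_init(&f->done, NULL);
    f->state = PENDING;
}

/* Promise side: the first completion wins; later ones are ignored. */
static void future_complete(struct future *f, enum future_state s, int v) {
    pthread_mutex_lock(&f->lock);
    if (f->state == PENDING) {
        f->state = s;
        if (s == READY) f->value = v; else f->error = v;
        pthread_cond_broadcast(&f->done);   /* wake every waiter */
    }
    pthread_mutex_unlock(&f->lock);
}

void future_set_value(struct future *f, int v) { future_complete(f, READY, v); }
void future_set_error(struct future *f, int err) { future_complete(f, FAILED, err); }

/* Consumer side: blocks until completed; returns 0 and stores the value, or the error. */
int future_get(struct future *f, int *out) {
    int rc;
    pthread_mutex_lock(&f->lock);
    while (f->state == PENDING)
        pthread_cond_wait(&f->done, &f->lock);
    if (f->state == READY) { *out = f->value; rc = 0; }
    else rc = f->error;
    pthread_mutex_unlock(&f->lock);
    return rc;
}

/* Demo: start work on another thread and collect the result later. */
static void *compute_answer(void *arg) {
    future_set_value(arg, 6 * 7);   /* stands in for slow work */
    return NULL;
}

int demo_future(int *out) {
    struct future f;
    pthread_t t;
    future_init(&f);
    if (pthread_create(&t, NULL, compute_answer, &f) != 0) return -1;
    /* the caller is free to do other work here before it needs the result */
    int rc = future_get(&f, out);
    pthread_join(t, NULL);
    return rc;
}
```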
### Message Passing Vs Shared Memory
These are two very different ways to structure concurrency.
#### Shared memory
Workers communicate by reading and writing the same memory.
Advantages:
- low-latency communication
- efficient for fine-grained data sharing
- natural for in-process data structures
Costs:
- races are easy to create
- reasoning about ownership becomes hard
- memory visibility and locking bugs appear
#### Message passing
Workers communicate by sending messages, often through queues, channels, sockets, or mailboxes.
Advantages:
- ownership boundaries are clearer
- less accidental sharing
- often easier to scale across processes or machines
Costs:
- copying and serialization may be required
- latency can be higher
- designing message protocols adds complexity
Operating systems use both. Threads inside a process may use shared memory, while services talk over sockets. Databases may use shared memory internally but message passing between client and server.
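As a sketch of message passing across a process boundary (POSIX `pipe` and `fork` assumed; `sum_in_child` is a hypothetical name), the child sends its result as bytes through a pipe instead of writing to shared memory. The parent receives a copy, and neither process touches the other's address space.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child computes 1 + 2 + ... + n and sends the result as a message. */
long sum_in_child(long n) {
    int fds[2];
    if (pipe(fds) != 0) return -1;
    pid_t pid = fork();
    if (pid < 0) { close(fds[0]); close(fds[1]); return -1; }
    if (pid == 0) {                                   /* child: the sender */
        close(fds[0]);
        long sum = 0;
        for (long i = 1; i <= n; i++) sum += i;
        ssize_t w = write(fds[1], &sum, sizeof sum);  /* the message is a copy */
        _exit(w == (ssize_t)sizeof sum ? 0 : 1);
    }
    close(fds[1]);                                    /* parent: the receiver */
    long result = -1;
    if (read(fds[0], &result, sizeof result) != (ssize_t)sizeof result)
        result = -1;
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return result;
}
```

The cost side of the tradeoff is visible even here: the result is copied through the kernel, where threads sharing memory could have read it in place.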
### Lock-Free And Wait-Free Programming
These terms describe progress guarantees.
#### Lock-free
Lock-free means the system as a whole keeps making progress. Even if some thread stalls, at least one thread can still complete its operation.
#### Wait-free
Wait-free is stronger. Every thread is guaranteed to complete its operation in a bounded number of steps.
These approaches avoid classic lock problems like deadlock and some forms of priority inversion, but they are hard to design correctly. They also introduce different hazards such as the ABA problem, memory-reclamation complexity, and subtle memory-ordering bugs.
That is why lock-free algorithms are usually reserved for high-value paths such as concurrent queues, memory allocators, kernel structures, and low-latency runtimes.
### Atomic Operations And CAS
Atomic operations are the hardware building blocks underneath most concurrency mechanisms.
An atomic operation appears indivisible to other cores. No other observer sees it halfway complete.
The most famous example is compare-and-swap, usually called CAS.
Conceptually, CAS does the following as one indivisible step:
1. read the current value at an address
2. compare it with an expected old value
3. if they match, write a new value
4. report whether the swap succeeded
That single primitive can be used to build locks, reference counters, concurrent stacks, and many other structures.
Why CAS matters on real CPUs:
- multiple cores may race to update the same cache line
- the hardware cache-coherence protocol ensures one core wins the atomic update
- losing cores observe failure and retry or take another path
Atomic instructions also interact with memory ordering. Correct concurrent programs often need not just atomicity, but rules about when writes become visible to other cores.
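The CAS contract and the retry loop built on it can be sketched as follows. One loud assumption: Python exposes no user-level CAS instruction, so the hypothetical `AtomicInt` class emulates the indivisibility with a private lock. That means this sketch is not actually lock-free; it only shows the shape of the read, compute, try-to-publish, retry pattern that real code writes against hardware CAS (for example C11 `atomic_compare_exchange_strong` or Java's `AtomicInteger.compareAndSet`).

```python
import threading

class AtomicInt:
    """Hypothetical emulated atomic integer. A private lock stands in for the
    hardware guarantee that compare_and_swap happens as one indivisible step."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        # Read, compare, conditionally write, and report success, all at once.
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

def increment(counter):
    """CAS retry loop: read the old value, compute the new one, try to publish."""
    while True:
        old = counter.load()
        if counter.compare_and_swap(old, old + 1):
            return
        # Another thread changed the value first; reread and try again.

counter = AtomicInt()

def worker():
    for _ in range(10_000):
        increment(counter)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.load())  # 40000: no increment is lost, because a failed CAS retries
```

A losing thread never overwrites a value it did not observe; it simply fails and retries, which is exactly the "losing cores observe failure and retry" behavior described above.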
---
## 8. Real-World Systems Perspective
Concurrency becomes much clearer when you stop imagining toy threads incrementing counters and look at real systems.
### Web Servers: Handling Multiple Requests
A web server is a concurrency machine.
Thousands of clients may be connected at once, but most of them are not continuously using CPU. They are waiting on network I/O, TLS handshakes, backend responses, or client-side pacing.
There are several common server models.
#### Thread-per-request
Each request gets its own worker thread.
This is simple to understand because each request looks like a straightforward sequential program. But at large scale it becomes expensive. Thousands of threads mean per-thread stack memory, more context switching and scheduler work, and more contention on shared locks.
#### Event-driven with workers
Systems like Nginx rely heavily on event loops plus worker processes or threads. The kernel notifies the server when sockets are readable or writable. Workers do CPU work only when the request is ready to make progress.
What is actually happening under the hood:
- the NIC receives packets and raises an interrupt (or is polled) to notify the kernel
- the kernel places data in socket buffers
- readiness events are recorded
- a sleeping worker wakes via mechanisms such as `epoll`
- the worker reads, parses, routes, maybe calls a backend, and sends a response
Concurrency here is mostly about managing many mostly-waiting activities efficiently.
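The readiness loop can be sketched with Python's `selectors` module, whose `DefaultSelector` picks the most efficient mechanism the platform offers (typically `epoll` on Linux). This is a minimal single-threaded sketch, not how Nginx is implemented: `socket.socketpair()` stands in for accepted client connections, and the hypothetical `on_readable` callback collapses "read, parse, route, respond" into one echo step.

```python
import selectors
import socket

sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on BSD/macOS

def on_readable(conn):
    data = conn.recv(1024)             # the kernel said it's readable, so this won't block
    if data:
        conn.sendall(b"echo:" + data)  # stand-in for parse, route, respond
    else:
        sel.unregister(conn)           # peer closed the connection
        conn.close()

# Each socketpair() simulates one connected client.
servers, clients = [], []
for _ in range(3):
    server_side, client_side = socket.socketpair()
    server_side.setblocking(False)
    sel.register(server_side, selectors.EVENT_READ, data=on_readable)
    servers.append(server_side)
    clients.append(client_side)

for i, c in enumerate(clients):
    c.sendall(f"req{i}".encode())

def run_once(timeout=1.0):
    # One loop iteration: sleep until the kernel reports readiness, then dispatch.
    for key, _mask in sel.select(timeout):
        callback = key.data
        callback(key.fileobj)

run_once()
replies = [c.recv(1024) for c in clients]
print(replies)

sel.close()
for s in servers + clients:
    s.close()
```

One thread serves all three connections because it only runs code for sockets that are ready; a real server would call `run_once` in an endless loop and typically run one such loop per worker.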
### Databases: Transactions And Isolation
Databases are concurrency-control systems as much as they are storage systems.
Many clients want to read and modify the same logical data at the same time. The database must allow useful parallel work while making the result look correct.
If two transactions update the same row concurrently, the database cannot just let them freely overwrite each other. It needs a concurrency-control strategy.
Two major families are common:
- lock-based concurrency control
- multi-version concurrency control, or MVCC
With lock-based control, the database uses shared and exclusive locks so readers and writers coordinate.
With MVCC, readers often see a snapshot while writers create newer versions. That reduces read-write blocking, but the engine must track visibility rules carefully.
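The snapshot-visibility rule can be sketched with a toy version store. Everything here is hypothetical and much simpler than any real engine: it is single-threaded, buffers writes until commit, and has no write-write conflict detection or cleanup of old versions. It shows only the core idea that a reader keeps seeing the newest version committed before its snapshot, while a writer adds a newer version instead of overwriting.

```python
import itertools

class MVCCStore:
    """Toy multi-version store: each key maps to [(commit_ts, value), ...], oldest first."""

    def __init__(self):
        self._versions = {}
        self._clock = itertools.count(1)
        self._last_commit = 0

    def begin(self):
        # A transaction's snapshot is "everything committed so far".
        return Transaction(self, snapshot_ts=self._last_commit)

    def _read(self, key, snapshot_ts):
        # Newest version visible to the snapshot; later versions are invisible.
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None

    def _commit(self, writes):
        ts = next(self._clock)
        for key, value in writes.items():
            self._versions.setdefault(key, []).append((ts, value))
        self._last_commit = ts
        return ts

class Transaction:
    def __init__(self, store, snapshot_ts):
        self.store = store
        self.snapshot_ts = snapshot_ts
        self.writes = {}

    def read(self, key):
        if key in self.writes:       # a transaction sees its own pending writes
            return self.writes[key]
        return self.store._read(key, self.snapshot_ts)

    def write(self, key, value):
        self.writes[key] = value     # buffered until commit

    def commit(self):
        return self.store._commit(self.writes)

store = MVCCStore()
setup = store.begin()
setup.write("balance", 100)
setup.commit()

reader = store.begin()               # snapshot taken here
writer = store.begin()
writer.write("balance", 70)
writer.commit()                      # adds a newer version; the reader is not blocked

reader_view = reader.read("balance")
fresh_view = store.begin().read("balance")
print(reader_view, fresh_view)       # 100 70
```

The reader and writer never wait for each other. The cost moves into bookkeeping: the engine must keep old versions alive while any snapshot can still see them.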
Isolation levels like Read Committed, Repeatable Read, and Serializable are really tradeoffs about how much concurrency the database permits versus how strong the illusion of sequential execution should be.
Deadlocks are common enough in databases that detection and recovery, usually by aborting one transaction as the victim so the others can proceed, are standard features, not edge cases.
### Operating Systems: Linux Scheduling Basics
Inside Linux, the schedulable unit is the task (`task_struct`), which represents both what user space calls a thread and what it calls a process. The scheduler tracks runnable tasks, sleeping tasks, priorities or scheduling classes, and CPU affinity.
What actually happens at a high level:
- each CPU has its own run queue of runnable tasks
- timer interrupts and wakeups create scheduling points
- blocked tasks leave the run queue and wait on an event
- I/O completion or a wakeup puts them back on a run queue
- the scheduler chooses the next task based on policy
- load balancing may move work across cores
The important intuition is that Linux is constantly converting external events into runnable work. Disk completion, network arrival, timer expiry, and lock release all eventually become "this task may run again now."
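The cycle of run, block, wake, and run again can be sketched as a toy simulation. This hypothetical `ToyScheduler` is a single-CPU round-robin model with one run queue and wait queues keyed by event name; it is not Linux's actual scheduler, which uses per-CPU run queues, a fair-scheduling policy, and load balancing. Each task is a list of actions, where `("wait", event)` means "block until that event fires".

```python
from collections import deque

class ToyScheduler:
    """Hypothetical single-CPU model: a run queue plus wait queues keyed by event."""

    def __init__(self):
        self.run_queue = deque()   # runnable tasks, served round-robin
        self.waiting = {}          # event name -> deque of blocked tasks
        self.trace = []

    def spawn(self, name, actions):
        self.run_queue.append((name, iter(actions)))

    def wake(self, event):
        # Like an I/O completion: every waiter on the event becomes runnable again.
        self.run_queue.extend(self.waiting.pop(event, deque()))

    def run(self, external_events, max_ticks=100):
        tick = 0
        while (self.run_queue or self.waiting) and tick < max_ticks:
            if tick in external_events:               # interrupt-like event arrives
                self.wake(external_events[tick])
            if self.run_queue:                        # otherwise the CPU idles this tick
                name, actions = self.run_queue.popleft()
                action = next(actions, None)
                if action is None:
                    self.trace.append(f"{tick}:{name}:exit")
                elif isinstance(action, tuple) and action[0] == "wait":
                    # Blocked: leave the run queue and park on the event's wait queue.
                    self.waiting.setdefault(action[1], deque()).append((name, actions))
                    self.trace.append(f"{tick}:{name}:block")
                else:
                    self.trace.append(f"{tick}:{name}:{action}")
                    self.run_queue.append((name, actions))  # timeslice over: back of queue
            tick += 1
        return self.trace

sched = ToyScheduler()
sched.spawn("A", ["cpu", ("wait", "disk"), "cpu"])
sched.spawn("B", ["cpu", "cpu"])
trace = sched.run(external_events={3: "disk"})
print(trace)
# ['0:A:cpu', '1:B:cpu', '2:A:block', '3:B:cpu', '4:A:cpu', '5:B:exit', '6:A:exit']
```

Task A disappears from the run queue at tick 2 and only becomes eligible again when the simulated disk completion arrives at tick 3, which is the "external event becomes runnable work" conversion in miniature.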
### CPU-Level Parallelism: Multi-Core Execution
At the hardware level, concurrency stops being a pure illusion.
On a multi-core machine, two threads can truly execute at the same time. But that does not mean they see memory instantly and in a perfectly simple order.
Each core has private caches. When two cores touch the same data, the hardware cache-coherence protocol must keep their views consistent enough to preserve the platform's memory model.
That has several consequences.
#### Shared data can become a cache-coherence hotspot
If multiple cores repeatedly write the same memory location, the cache line bounces between cores. Performance can collapse even if the algorithm is logically correct.
#### Memory ordering matters
One core's writes may not become visible to another core in the naive order a beginner expects unless the program uses proper synchronization.
#### More cores do not automatically mean faster code
If a workload spends most of its time waiting on one lock, then adding more cores just creates more threads waiting on the same bottleneck.
That is why high-performance concurrent programming cares about data partitioning, locality, reducing contention, and minimizing shared mutable state.
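The partitioning idea can be sketched by summing the same data two ways: taking a shared lock for every element versus letting each worker accumulate privately and touch shared state once. One caveat: in standard CPython builds the GIL prevents Python bytecode from running in parallel, so this sketch shows the structure of the technique, not the cache-level speedup it delivers in languages like C, C++, Rust, or Go. The function names are hypothetical.

```python
import threading

def _run_workers(work, items, n_threads):
    # Split items into n_threads interleaved chunks and run work(chunk) on each.
    chunks = [items[i::n_threads] for i in range(n_threads)]
    threads = [threading.Thread(target=work, args=(chunk,)) for chunk in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

def sum_contended(items, n_threads=4):
    """Every element takes the shared lock: one hot location for all workers."""
    total = 0
    lock = threading.Lock()

    def work(chunk):
        nonlocal total
        for x in chunk:
            with lock:
                total += x

    _run_workers(work, items, n_threads)
    return total

def sum_partitioned(items, n_threads=4):
    """Each worker sums privately and touches shared state once at the end."""
    total = 0
    lock = threading.Lock()

    def work(chunk):
        nonlocal total
        local = sum(chunk)   # private, uncontended accumulation
        with lock:           # one lock acquisition per worker, not per element
            total += local

    _run_workers(work, items, n_threads)
    return total

items = list(range(1, 100_001))
contended = sum_contended(items)
partitioned = sum_partitioned(items)
print(contended, partitioned)  # 5000050000 5000050000
```

Both are correct; the partitioned version reduces shared-state traffic from one hundred thousand lock handoffs to four, which is the property that keeps a cache line from bouncing between cores.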
---
## 9. Common Bugs And Pitfalls
### Deadlocks In Production Systems
Deadlocks in real systems often arise from small inconsistencies, not grand design mistakes.
One code path acquires `user_lock` then `cache_lock`. Another path acquires `cache_lock` then `user_lock`. Under light load, both seem fine. Under heavy load, one unlucky interleaving leaves each thread holding the lock the other needs, and both freeze.
This is why production teams establish lock-ordering rules and document them explicitly.
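One way to make a lock-ordering rule enforceable rather than merely documented is to attach a rank to each lock and always acquire in rank order. This is a minimal sketch with a hypothetical `RankedLock` wrapper and `acquire_all` helper, reusing the `user_lock`/`cache_lock` scenario above. Because the helper sorts by rank, a call site that lists the locks "backwards" still acquires them in the global order, so the circular wait cannot form.

```python
import threading
from contextlib import contextmanager

class RankedLock:
    """Hypothetical wrapper: a mutex tagged with its position in the global lock order."""

    def __init__(self, name, rank):
        self.name = name
        self.rank = rank
        self.lock = threading.Lock()

@contextmanager
def acquire_all(*locks):
    # Sort by rank so every caller takes the locks in the same global order.
    ordered = sorted(locks, key=lambda l: l.rank)
    acquired = []
    try:
        for l in ordered:
            l.lock.acquire()
            acquired.append(l)
        yield
    finally:
        for l in reversed(acquired):   # release in reverse acquisition order
            l.lock.release()

user_lock = RankedLock("user_lock", rank=1)
cache_lock = RankedLock("cache_lock", rank=2)
updates = []

def path_a():
    for _ in range(1000):
        with acquire_all(user_lock, cache_lock):
            updates.append("a")

def path_b():
    for _ in range(1000):
        with acquire_all(cache_lock, user_lock):  # listed "backwards", still safe
            updates.append("b")

threads = [threading.Thread(target=path_a), threading.Thread(target=path_b)]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)

print(all(not t.is_alive() for t in threads), len(updates))  # True 2000
```

If each path instead acquired its locks in the order written, the same program could hang whenever the two threads interleaved badly.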
### Race Conditions In Distributed Systems
Distributed race conditions are even trickier because the shared state is not just memory. It is spread across machines, networks, replicas, queues, and clocks.
Examples:
- two services process the same event twice
- messages arrive out of order
- one node acts on stale data from another
- a timeout triggers a retry even though the first operation actually succeeded
Local mutexes cannot solve these problems because the race is no longer between threads in one address space. Now the system needs idempotency, versioning, transactions, leases, consensus, or other distributed coordination techniques.
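Idempotency, the first technique in that list, can be sketched as a consumer that remembers which event IDs it has already applied. This is a hypothetical, in-memory illustration: in a real system the set of processed IDs would typically live in durable storage and be updated in the same transaction as the side effect, otherwise a crash between the two steps reopens the race.

```python
import threading

class IdempotentProcessor:
    """Hypothetical consumer that applies each event at most once, keyed by event ID."""

    def __init__(self):
        self.seen = set()
        self.balance = 0
        self._lock = threading.Lock()

    def handle(self, event_id, amount):
        with self._lock:
            if event_id in self.seen:
                return False          # duplicate delivery or retry: ignore it
            self.seen.add(event_id)
            self.balance += amount
            return True

p = IdempotentProcessor()
# evt-1 is delivered twice, e.g. because a timeout triggered a retry
# even though the first delivery actually succeeded.
deliveries = [("evt-1", 50), ("evt-2", 20), ("evt-1", 50)]
results = [p.handle(eid, amt) for eid, amt in deliveries]
print(results, p.balance)  # [True, True, False] 70
```

Retries become harmless: the sender may deliver as many times as it likes, and the effect is applied once.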
### Starvation
Starvation means a thread or task is not deadlocked, but it still makes no useful progress because others keep getting serviced first.
This can happen when:
- a scheduler keeps favoring higher-priority work
- a lock is unfair and one waiter repeatedly loses
- a thread pool is saturated with long-running jobs and short jobs never get a turn
The system is active, but some participant is effectively excluded.
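The unfair-lock case can be addressed with a FIFO lock. This sketch shows a ticket lock: each waiter takes a numbered ticket and the lock serves tickets strictly in order, so no waiter can keep losing to newcomers. It is built on `threading.Condition` purely for illustration; production ticket locks usually spin on atomic counters instead. The demo peeks at the private ticket counter only to start the waiters in a known order.

```python
import threading
import time

class TicketLock:
    """FIFO mutex: waiters are served in the order they took tickets."""

    def __init__(self):
        self._cond = threading.Condition()
        self._next_ticket = 0
        self._now_serving = 0

    def acquire(self):
        with self._cond:
            my_ticket = self._next_ticket
            self._next_ticket += 1
            # Sleep until it is this ticket's turn, however many others arrive later.
            self._cond.wait_for(lambda: self._now_serving == my_ticket)

    def release(self):
        with self._cond:
            self._now_serving += 1
            self._cond.notify_all()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, *exc):
        self.release()
        return False

lock = TicketLock()
order = []

def waiter(i):
    with lock:
        order.append(i)

lock.acquire()                          # main thread holds ticket 0
threads = []
for i in range(4):
    t = threading.Thread(target=waiter, args=(i,))
    t.start()
    while lock._next_ticket < i + 2:    # demo only: wait until thread i holds its ticket
        time.sleep(0.001)
    threads.append(t)
lock.release()
for t in threads:
    t.join()

print(order)  # [0, 1, 2, 3]: served strictly in arrival order
```

The tradeoff is throughput: strict FIFO handoff prevents the lock from going to whichever thread happens to be running, which is often faster but is exactly what lets one waiter starve.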
### Livelock
Livelock is different from deadlock.
In deadlock, nothing moves.
In livelock, everything moves, but no one makes progress.
Imagine two polite people in a hallway who both keep stepping aside in the same direction over and over. They are active, but still blocked.
In software, aggressive retry loops, repeated conflict detection, or backoff schemes with bad coordination can create livelock.
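A common remedy for retry-driven livelock is randomized exponential backoff: each failed attempt raises the upper bound on the wait, and the actual wait is random, so competing retriers fall out of lockstep. This is a minimal sketch; `ConflictError`, `flaky_update`, and the parameter defaults are hypothetical, and the `sleep` parameter is injectable so the demo can record delays instead of actually sleeping.

```python
import random
import time

class ConflictError(Exception):
    """Hypothetical error meaning 'someone else won; try again'."""

def retry_with_backoff(op, max_attempts=6, base=0.01, cap=0.5, rng=None, sleep=time.sleep):
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return op()
        except ConflictError:
            if attempt == max_attempts - 1:
                raise
            # The bound doubles per attempt (capped); the delay is uniform in [0, bound].
            # The randomness is what breaks the "both step aside together" pattern.
            bound = min(cap, base * (2 ** attempt))
            sleep(rng.uniform(0, bound))

attempts = {"n": 0}

def flaky_update():
    attempts["n"] += 1
    if attempts["n"] < 4:
        raise ConflictError("write conflict")
    return "committed"

delays = []
result = retry_with_backoff(flaky_update, rng=random.Random(42), sleep=delays.append)
print(result, attempts["n"], len(delays))  # committed 4 3
```

Without the random component, two contenders that collide once would compute identical delays and collide again on every retry, which is the hallway scenario in code form.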
### A Practical Warning
The most dangerous concurrency bugs are often:
- rare
- timing-dependent
- load-dependent
- difficult to reproduce in development
That is why teams use code review, lock-order rules, timeouts, stress tests, tracing, and metrics to catch them before users do.
---
## 10. Summary Mental Model
The simplest useful mental model of concurrency in operating systems is this:
1. The system has more ongoing work than it can execute all at once.
2. The scheduler decides which runnable thread gets CPU time next.
3. Threads inside a process share memory, so they can cooperate cheaply but also interfere with each other.
4. Critical sections are the places where shared state can be corrupted.
5. Synchronization primitives enforce rules about who may proceed and when.
6. The hardware and kernel together make those rules real through atomic instructions, wait queues, wakeups, and context switches.
If you want one sentence that ties the whole topic together, use this:
Concurrency is the art of making many activities make progress on limited hardware without losing correctness.
### How To Think About Threads, Locks, And Scheduling Together
When you analyze a concurrent system, ask these questions in order.
#### What are the units of execution?
Processes, threads, tasks, event-loop callbacks, transactions, or requests?
#### What state is shared?
Memory, files, rows, queues, caches, or sockets?
#### Who decides when work runs?
The OS scheduler, a thread pool, an event loop, or a database lock manager?
#### What invariants must remain true?
For example: the queue size never goes negative, no account-balance update is lost, only one writer runs at a time, and transaction isolation is preserved.
#### What blocks progress?
I/O waits, lock contention, full queues, CPU saturation, dependency cycles.
#### What is the failure mode?
Race, deadlock, starvation, livelock, or throughput collapse.
Once you start asking those questions, concurrency stops looking like a bag of unrelated mechanisms. It becomes a coherent systems story:
- scheduling decides when a thread runs
- synchronization decides what it may safely touch
- memory rules decide what other cores can observe
- design choices decide whether the system scales cleanly or collapses under contention
That is the level of understanding that helps in interviews, in systems design, and in real production debugging.