# System Operations and OS Internals for Interviews

This guide is written for software engineers who already build and debug real systems but want a stronger operating-systems mental model for interviews. The focus is not only on definitions, but on what actually happens when software crosses the boundary into the operating system, how hardware and the kernel cooperate, and how these ideas show up in Linux backend systems.

---

## 1. Why This Topic Matters

Most application code runs in a protected, abstracted environment. You write to a socket, read a file, allocate memory, create a thread, or wait on a timer, and it feels like a normal function call. Underneath that API, the operating system is enforcing protection, multiplexing hardware, handling interrupts, programming devices, managing memory, and deciding which thread gets CPU time.

Interviewers ask about these topics because they reveal whether you understand:

- where the application boundary ends and the OS boundary begins,
- why some operations are cheap and others are expensive,
- how blocking, I/O, and scheduling interact,
- how Linux servers actually spend their time,
- and how the kernel preserves isolation and security.

If you understand the flow from user code to hardware and back, a lot of unrelated-looking interview questions become much easier.

---

## 2. Big Picture: What the OS Actually Does

An operating system is the privileged software layer that sits between applications and hardware. It provides a controlled way to use CPUs, memory, storage, devices, and networking.

At a high level, the OS is responsible for:

- Process and thread management
- Memory management
- I/O and device management
- File systems
- Scheduling
- Protection and isolation
- Interrupt handling
- Resource accounting and policy decisions

An application generally cannot touch hardware directly. Instead, it asks the OS to perform privileged work on its behalf.

---

## 3. User Mode vs Kernel Mode

One of the most important OS concepts is that the CPU runs code in different privilege levels.

### User Mode

Most application code runs in user mode.

In user mode:

- Code cannot execute privileged instructions.
- Code cannot directly access arbitrary physical memory.
- Code cannot directly reprogram devices or interrupt tables.
- Code must request OS services through controlled entry points.

This protects the system from buggy or malicious applications. If any process could directly write page tables, reconfigure the disk controller, or disable interrupts, the entire machine would be unstable and insecure.

### Kernel Mode

The kernel runs in a more privileged CPU mode.

In kernel mode:

- The kernel can execute privileged instructions.
- The kernel can manage page tables and MMU state.
- The kernel can program devices and install interrupt handlers.
- The kernel can inspect and manipulate process state.

Kernel mode is powerful, but dangerous. A kernel bug is much more serious than a user-space bug because it can crash the system or violate isolation.

### Why the Separation Exists

The OS relies on hardware support to enforce this boundary. The CPU, MMU, and page tables together make sure a user process cannot simply decide to access kernel memory or execute privileged instructions.

This boundary is the foundation of protection.

```mermaid
flowchart TD
A[User Process in User Mode] -->|system call or fault| B[Controlled CPU transition]
B --> C[Kernel Mode]
C --> D[Kernel validates request]
D --> E[Kernel performs privileged work]
E --> F[Return to user mode]
F --> A
A -. cannot directly .-> G[Device registers]
A -. cannot directly .-> H[Page tables]
A -. cannot directly .-> I[Interrupt controller]
```

### Interview framing

A strong concise answer is:

> User mode is the restricted execution mode for applications. Kernel mode is the privileged mode where the OS can manage hardware and system-wide resources. The boundary exists so the machine can enforce isolation, safety, and access control.
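
The boundary is visible from user space in CPU-time accounting: the kernel charges time spent running your code in user mode separately from time it spends in kernel mode on your behalf. The sketch below is an illustration, assuming Linux and Python's standard `os.times()`; `split_cpu_time`, `pure_user_work`, and `syscall_work` are hypothetical helper names. The syscall loop typically shows a much larger system-time share than the arithmetic loop, but exact numbers vary with the machine, kernel version, and security mitigations.

```python
import os


def split_cpu_time(work, repeat=200_000):
    """Run `work` repeatedly; report CPU seconds charged to user vs kernel mode."""
    before = os.times()
    for _ in range(repeat):
        work()
    after = os.times()
    return {
        "user_s": after.user - before.user,        # executing our own code
        "system_s": after.system - before.system,  # kernel work done on our behalf
    }


def pure_user_work():
    # Plain arithmetic: never needs to enter the kernel.
    return sum(range(50))


def syscall_work():
    # getppid() is a real system call on every invocation (no vDSO shortcut).
    return os.getppid()


if __name__ == "__main__":
    print("pure user work:", split_cpu_time(pure_user_work))
    print("syscall work:  ", split_cpu_time(syscall_work))
```

This is the same split the shell's `time` command reports as `user` and `sys`.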

---

## 4. Privileged Instructions

Privileged instructions are CPU instructions that can only be executed in kernel mode or another sufficiently privileged mode.

Examples include instructions that:

- modify page tables or MMU configuration (for example, loading CR3 on x86 to switch the active page tables),
- disable or enable interrupts,
- access I/O ports directly (on x86, `in` and `out` fault in user mode unless the kernel explicitly grants port access),
- load the interrupt descriptor table,
- write certain processor control registers,
- halt or reboot the machine.

If user code tries to execute one of these instructions, the CPU raises an exception instead of executing it, and the kernel typically responds by sending the process a signal such as `SIGSEGV` or `SIGILL`.

### Why this matters

Without hardware-enforced privilege checks, any user process could:

- bypass memory isolation,
- intercept device traffic,
- block interrupts and freeze progress,
- or read or modify another process's memory.

So protection does not depend on applications choosing to behave well. The hardware itself enforces the privilege checks.

---

## 5. Protection Context and Security Boundaries

When interviewers ask about protection, they are usually probing whether you understand what exactly is being isolated and how.

### Main protection boundaries

#### 1. User space vs kernel space

This is the main privilege boundary. User code cannot directly perform privileged operations; it must go through the kernel.

#### 2. Process vs process

Each process typically has its own virtual address space. Process A cannot directly read or write process B's memory unless the OS explicitly allows sharing.

#### 3. File and device permissions

The kernel enforces ownership, permissions, capabilities, ACLs, and namespace boundaries.

#### 4. Execution identity

Every request arrives with a protection context such as:

- user ID and group IDs,
- capabilities,
- current namespace and cgroup context,
- open file descriptors,
- current memory map,
- current working directory and root context.

The kernel uses this context when deciding whether an operation is allowed.

### Example

Suppose a backend service calls `open("/etc/shadow", O_RDONLY)`.

The kernel does not ask whether the function call exists. It asks whether the current process identity and security context are allowed to perform that operation on that inode. The check is enforced by the kernel, not by the application.
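
Part of this protection context can be inspected from user space, and the kernel's verdict on a request comes back as an error code. A minimal sketch, assuming Linux (it reads `/proc/self/fd`); `protection_context` and `try_open` are hypothetical helpers. The same `open` of `/etc/shadow` typically fails with `EACCES` for an unprivileged user and succeeds for root, because the kernel evaluates the identical call against a different identity.

```python
import errno
import os


def protection_context():
    """Snapshot part of the identity the kernel checks on each request (Linux)."""
    return {
        "uid": os.getuid(),
        "euid": os.geteuid(),   # effective UID: what permission checks normally use
        "gid": os.getgid(),
        "groups": os.getgroups(),
        "cwd": os.getcwd(),
        # Each entry in /proc/self/fd names one open file descriptor.
        "open_fds": sorted(int(fd) for fd in os.listdir("/proc/self/fd")),
    }


def try_open(path):
    """Return "ok", or the errno name the kernel reported for open(path, O_RDONLY)."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return "ok"


if __name__ == "__main__":
    print(protection_context())
    # Typically EACCES for an unprivileged user and "ok" for root:
    # same call, different identity, different verdict.
    print(try_open("/etc/shadow"))
```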

### The role of the MMU

Memory protection is heavily supported by hardware:

- Each process gets virtual memory mappings.
- Page tables mark pages as readable, writable, executable, user-accessible, or kernel-only.
- The MMU translates virtual addresses to physical addresses and enforces access rules.

So process isolation is not just a software convention. It is a hardware-backed protection boundary.

---

## 6. System Calls

System calls are the controlled interface through which user-space programs request kernel services.

Typical examples:

- `read`, `write`, `open`, `close`
- `fork`, `execve`, `wait`
- `mmap`, `brk`
- `socket`, `bind`, `listen`, `accept`, `connect`
- `epoll_wait`
- `ioctl`

### System call vs normal function call

A normal function call stays within the process and the same privilege level.

A system call crosses into the kernel and usually involves:

- a privilege transition,
- a register convention for the syscall number and arguments,
- CPU state save/restore,
- kernel validation and dispatch,
- possible blocking or scheduling,
- and a return path back to user mode.

This is why system calls are much more expensive than pure user-space function calls.

### Why libc wrappers exist

In Linux, user programs usually call libc functions such as `read()` or `open()`. Those are wrappers: they place the arguments where the kernel expects them, issue the actual system call instruction to enter the kernel, and translate the result into the C calling convention.

Historically, 32-bit x86 Linux used `int 0x80`. Modern x86-64 Linux uses the `syscall` instruction, which is faster and designed for this purpose.
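
To see that the libc function is only a wrapper, the same kernel service can be requested through glibc's generic `syscall(2)` function, which just loads a raw syscall number and arguments into registers and executes the syscall instruction. A sketch, assuming Linux on x86-64 or arm64 with a libc that exports `syscall` (glibc does); the numbers below come from the per-architecture syscall tables, and `raw_getpid` is a hypothetical helper.

```python
import ctypes
import os
import platform

# getpid's syscall number differs per architecture (x86-64 table vs the
# generic table used by arm64), which is exactly what libc wrappers hide.
SYS_GETPID = {"x86_64": 39, "aarch64": 172}

libc = ctypes.CDLL(None, use_errno=True)   # symbols already loaded into this process
libc.syscall.argtypes = [ctypes.c_long]    # syscall(long number, ...)
libc.syscall.restype = ctypes.c_long


def raw_getpid():
    """Request getpid via syscall(2) with a raw number instead of the getpid() wrapper."""
    return libc.syscall(SYS_GETPID[platform.machine()])


if __name__ == "__main__":
    print(os.getpid(), raw_getpid())   # same PID from both paths
```

Both paths reach the same kernel handler; only the user-space route into the kernel differs.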

---

## 7. What Happens When a Program Requests OS Services

This is one of the most important end-to-end interview flows to understand.

Suppose a program calls `read(fd, buf, 4096)`.

### Step-by-step view

1. User code prepares arguments.
   The file descriptor, buffer pointer, and length are placed where the syscall ABI requires. On x86-64 Linux, the syscall number goes in `rax` and the first three arguments in `rdi`, `rsi`, and `rdx`.

2. A syscall instruction is executed.
   The CPU performs a controlled transition from user mode to kernel mode.

3. CPU switches to kernel execution context.
   On x86-64, the `syscall` instruction saves the user return address and flags in registers and jumps to the kernel entry point configured at boot. The kernel entry code then switches to a kernel stack and saves the remaining user registers so the process can be resumed later.

4. Kernel identifies the syscall.
   The syscall number selects the correct kernel handler from the syscall table.

5. Kernel validates the request.
   It checks that the file descriptor is valid and open for reading (permission to read the file was checked at `open` time), that the buffer range lies in user space, and that the arguments are well-formed.

6. Kernel performs the operation.
   It may satisfy the read from the page cache or a socket buffer, or it may need to involve a device driver and block the thread until data is available.

7. Kernel prepares the return value.
   The number of bytes read, or a negative error code, is placed in the return register.

8. CPU returns to user mode.
   User code resumes right after the syscall instruction.

9. The libc wrapper converts a negative kernel return value into a `-1` result with `errno` set.
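
Steps 7 to 9 can be observed by calling the C `read()` wrapper directly. A sketch, assuming Linux and Python `ctypes`; `libc_read` is a hypothetical helper. With `use_errno=True`, `ctypes` captures the `errno` value the wrapper set, so the `-1` plus `errno` convention from step 9 becomes visible.

```python
import ctypes
import errno
import os

libc = ctypes.CDLL(None, use_errno=True)   # use_errno: let ctypes capture errno
libc.read.argtypes = [ctypes.c_int, ctypes.c_void_p, ctypes.c_size_t]
libc.read.restype = ctypes.c_ssize_t


def libc_read(fd, n):
    """Call the C read() wrapper; return (return value, errno, bytes read)."""
    buf = ctypes.create_string_buffer(n)
    ctypes.set_errno(0)
    got = libc.read(fd, buf, n)
    err = ctypes.get_errno()
    return got, err, buf.raw[:max(got, 0)]


if __name__ == "__main__":
    r, w = os.pipe()
    os.write(w, b"hello")
    print(libc_read(r, 4096))          # data already in the pipe buffer: returns at once
    os.close(r)
    os.close(w)
    got, err, _ = libc_read(r, 16)     # descriptor is no longer valid
    print(got, errno.errorcode[err])   # -1 EBADF
```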

### Important interview point

The application does not jump into arbitrary kernel code. The transition happens only through hardware-controlled entry paths using designated instructions and entry tables.

```mermaid
sequenceDiagram
participant U as User Code
participant L as libc Wrapper
participant C as CPU
participant K as Kernel
participant D as Driver or Device

U->>L: call read(fd, buf, n)
L->>C: execute syscall instruction
C->>K: switch to kernel mode and enter syscall handler
K->>K: validate fd, buffer, permissions
alt data already available
K->>K: copy data to user buffer
else need device or network progress
K->>D: request I/O or wait for completion
D-->>K: completion event or data ready
K->>K: copy result and set return value
end
K-->>C: return-from-syscall
C-->>L: resume user mode
L-->>U: bytes read or -1 with errno
```

---

## 8. System Call Flow: User Space to Kernel Space

It helps to remember system call flow in three layers.

### Layer 1: API layer

User code calls a familiar interface like `open`, `send`, or `fork`.

### Layer 2: ABI and CPU transition

Arguments are placed where the kernel expects them. A special instruction triggers the transition.

### Layer 3: Kernel service path

The kernel dispatches to the correct subsystem:

- VFS for files,
- scheduler for process and thread changes,
- network stack for sockets,
- memory manager for `mmap`,
- block layer for storage I/O,
- device drivers for hardware-specific work.

### Important kernel checks

The kernel generally must:

- check the process identity and permissions,
- copy or validate user pointers,
- enforce resource limits,
- preserve isolation,
- possibly sleep the thread if the operation cannot complete immediately.

### Why copying matters

Kernel code cannot blindly trust a user pointer. That pointer belongs to user space. The kernel has to validate access and usually copy data using controlled helper routines. Otherwise, a process could trick the kernel into reading or writing invalid memory.
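
A small experiment shows the kernel validating a user pointer instead of trusting it. A sketch, assuming Linux and Python `ctypes`; `write_from_address` is a hypothetical helper, and address `1` is simply an address in the user range that is not mapped in a normal process. The kernel's checked copy fails, so on Linux `write()` returns `-1` with `EFAULT` rather than the kernel crashing or the process receiving `SIGSEGV`.

```python
import ctypes
import errno
import os

libc = ctypes.CDLL(None, use_errno=True)
libc.write.argtypes = [ctypes.c_int, ctypes.c_void_p, ctypes.c_size_t]
libc.write.restype = ctypes.c_ssize_t


def write_from_address(fd, address, n):
    """Ask the kernel to copy n bytes starting at a raw user address into fd."""
    ctypes.set_errno(0)
    result = libc.write(fd, ctypes.c_void_p(address), n)
    return result, ctypes.get_errno()


if __name__ == "__main__":
    r, w = os.pipe()
    # Address 1 is in the user part of the address space but not mapped,
    # so the kernel's checked copy from user memory fails.
    result, err = write_from_address(w, 1, 16)
    print(result, errno.errorcode.get(err))
    os.close(r)
    os.close(w)
```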

---

## 9. Interrupts

An interrupt is a signal that causes the CPU to stop its current flow of execution and run a handler for an event.

Interrupts are a core reason the OS can respond to external events without constantly busy-waiting.

### What interrupts are for

Common reasons for interrupts:

- a network card received a packet,
- a disk completed an I/O request,
- a timer fired,
- a keyboard event occurred,
- an inter-processor signal was sent,
- or software intentionally triggered a protected control transfer.

### Key idea

Interrupts let hardware and low-level software notify the CPU that attention is needed.

---

## 10. Hardware Interrupts vs Software Interrupts

Interview discussions often mix these terms loosely, so it helps to be precise.

### Hardware Interrupts

These originate from hardware devices or controllers.

Examples:

- NIC signals packet arrival
- disk controller signals I/O completion
- timer hardware fires periodically or at a programmed deadline, letting the scheduler check whether the running task's time slice has expired

Properties:

- generally asynchronous relative to the currently running instruction stream,
- arrive from outside the current program,
- handled by kernel interrupt handlers.

### Software Interrupts

This term is used in two related ways.

#### Historical meaning

An instruction such as `int` on x86 deliberately causes a controlled transfer to a privileged handler.

#### Broader interview meaning

People sometimes use it loosely to refer to synchronous control transfers caused by software, including system calls, traps, and exceptions.

### Safer wording in interviews

It is often better to say:

- hardware interrupts are asynchronous events from devices,
- traps and exceptions are synchronous events caused by the current instruction stream,
- and system calls are controlled synchronous entries into the kernel.

That phrasing is more precise and avoids architecture-specific confusion.

---

## 11. Traps and Exceptions

Traps and exceptions are synchronous events related to the current instruction being executed.

### Exception

An exception occurs when the CPU detects a condition while executing an instruction.

Examples:

- divide by zero,
- invalid opcode,
- page fault,
- general protection fault.

### Trap

In interview usage, a trap is often described as a deliberate, synchronous transfer to the kernel, such as a debugger breakpoint or a syscall-style software-triggered entry.

### Useful refinement

In lower-level architecture discussions, exceptions are often subdivided into:

- faults: reported before the faulting instruction completes, so it can be restarted once the cause is fixed (page faults are the classic example),
- traps: reported after the instruction, often used for breakpoints or intentional transitions,
- aborts: serious failures that are not meaningfully restartable.

You do not always need that level of detail, but it helps if the interviewer is very systems-oriented.

### Example: page fault

A page fault is not inherently a crash.

When a process accesses a valid virtual page that is not currently backed by physical memory, the CPU raises a page fault. The kernel allocates or loads the page, updates the page tables, and then re-executes the faulting instruction, which now succeeds.

If the access is invalid, for example an unmapped address or a write to memory mapped read-only, the kernel typically sends the process `SIGSEGV`.

This is a good example of how an exception can be part of normal control flow.
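
Both outcomes of a page fault can be observed from a script. A sketch, assuming Linux and the Python standard library; the helper names are hypothetical. The first function counts minor faults (`ru_minflt`) caused by touching freshly mapped anonymous memory, which the kernel services transparently; the second runs a child process that reads address 0, which the kernel answers with `SIGSEGV`. Sandboxed or non-standard kernels may report fault counts differently.

```python
import mmap
import resource
import signal
import subprocess
import sys


def minor_faults_from_touching(pages=256):
    """Touch fresh anonymous pages and count the minor page faults this caused."""
    size = pages * mmap.PAGESIZE
    region = mmap.mmap(-1, size)   # anonymous mapping: valid, but no RAM assigned yet
    before = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    for offset in range(0, size, mmap.PAGESIZE):
        region[offset] = 1         # first write to each page triggers a page fault
    after = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    region.close()
    return after - before


def invalid_access_signal():
    """Run a child that reads address 0; return the signal number that killed it."""
    child = subprocess.run(
        [sys.executable, "-c", "import ctypes; ctypes.string_at(0)"],
        capture_output=True,
    )
    # subprocess reports "killed by signal N" as a return code of -N.
    return -child.returncode


if __name__ == "__main__":
    print("minor faults:", minor_faults_from_touching())
    print("child killed by:", signal.Signals(invalid_access_signal()).name)
```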

---

## 12. Interrupt Handling Flow

You should understand the general shape, even if you do not memorize architecture-specific registers.

### Typical flow

1. An interrupt or exception occurs.
2. The CPU saves enough of the current execution state to resume later.
3. The CPU switches to a privileged handler path.
4. The kernel identifies the interrupt or exception vector.
5. A low-level handler runs.
6. The handler may acknowledge the device, record state, and schedule deferred work.
7. If necessary, the scheduler may run another thread before returning.
8. Eventually execution returns to some user or kernel context.

### Why deferred work exists

Interrupt handlers usually need to be fast. They often do the minimum urgent work and defer heavier processing to a later stage such as a softirq, tasklet, workqueue, kernel thread, or bottom-half style mechanism.

That keeps interrupt latency low.

```mermaid
flowchart TD
A[Device event or CPU exception] --> B[CPU saves current state]
B --> C[CPU enters privileged handler]
C --> D[Kernel identifies vector]
D --> E[Top-half or immediate handler]
E --> F[Acknowledge source and capture minimal state]
F --> G{More work needed?}
G -->|Yes| H[Schedule deferred processing]
G -->|No| I[Prepare return]
H --> I
I --> J{Need reschedule?}
J -->|Yes| K[Scheduler picks next runnable task]
J -->|No| L[Return to interrupted context]
K --> L
```
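
The top-half / bottom-half split can be modeled in a few lines. This is a toy simulation: `SplitInterruptHandler`, `top_half`, and `bottom_half` are hypothetical names, and real Linux code uses softirqs, tasklets, threaded IRQs, or workqueues rather than a Python queue. It only shows the shape: the interrupt-time path records minimal state and queues work, and the expensive processing happens later.

```python
from collections import deque


class SplitInterruptHandler:
    """Toy model of top-half / bottom-half interrupt handling (not a kernel API)."""

    def __init__(self):
        self.deferred = deque()   # work queued by the top half
        self.completed = []       # results of the slow, deferred processing

    def top_half(self, vector, status):
        # Must be short: capture minimal state, acknowledge, queue the rest.
        self.deferred.append((vector, status))
        return "acked"

    def bottom_half(self):
        # Runs later with interrupts enabled; free to do slower work.
        handled = 0
        while self.deferred:
            vector, status = self.deferred.popleft()
            self.completed.append(f"vector {vector}: processed {status}")
            handled += 1
        return handled


if __name__ == "__main__":
    h = SplitInterruptHandler()
    h.top_half(33, "rx-ready")
    h.top_half(33, "rx-ready")   # a second interrupt before deferred work ran
    print(h.bottom_half())       # 2
    print(h.completed)
```

Note that two interrupts were absorbed by one deferred pass, which is part of why batching the heavy work is cheaper than doing it inside each interrupt.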

### Real Linux example

For network receive:

- NIC raises an interrupt,
- kernel handler acknowledges it,
- packet processing may be deferred using NAPI-style polling,
- packet eventually reaches the socket receive queue,
- a blocked process may be woken up.

This is much more realistic than imagining that the application talks to the NIC directly.

---

## 13. I/O Management

I/O is where the OS earns its keep. CPUs are fast, but devices are comparatively slow and unpredictable. The OS exists partly to hide those differences while keeping the system efficient.

### What the kernel does for I/O

The kernel provides:

- abstract interfaces such as files and sockets,
- buffering and caching,
- scheduling and queuing,
- synchronization and wake-up mechanisms,
- driver interaction,
- permission checks,
- and completion notification.

### Main I/O path idea

An application usually works with abstractions like:

- file descriptor,
- pathname,
- socket,
- pipe,
- terminal,
- block device.

The kernel translates those abstractions into device-specific work.

---

## 14. Blocking vs Non-Blocking I/O

These terms describe what the calling thread experiences.

### Blocking I/O

In blocking I/O, the call does not return until it can make meaningful progress or complete.

Examples:

- `read()` on a socket with no available data blocks until data arrives,
- `accept()` blocks until a connection is ready,
- `waitpid()` blocks until child state changes.

When a thread blocks, the scheduler usually marks it non-runnable and runs something else.

### Non-Blocking I/O

In non-blocking I/O, the call returns immediately if it cannot proceed right now.

For example, `read()` on a non-blocking socket may return `-1` with `EAGAIN` or `EWOULDBLOCK`.

The application then decides whether to:

- retry later,
- use `select`, `poll`, `epoll`, or `kqueue` (on BSD and macOS),
- hand the work to an event loop,
- or queue it in some application scheduler.
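
The difference is easy to observe on a pipe. A minimal sketch, assuming a Unix-like OS and Python 3.5+ (for `os.set_blocking`); `nonblocking_read` is a hypothetical helper. Python surfaces `EAGAIN`/`EWOULDBLOCK` as `BlockingIOError`.

```python
import os


def nonblocking_read(fd, n=4096):
    """Return bytes if available, or None when the read would block."""
    try:
        return os.read(fd, n)
    except BlockingIOError:       # the kernel returned EAGAIN / EWOULDBLOCK
        return None


if __name__ == "__main__":
    r, w = os.pipe()
    os.set_blocking(r, False)     # set O_NONBLOCK on the read end
    print(nonblocking_read(r))    # None: no data yet, and the call did not wait
    os.write(w, b"ping")
    print(nonblocking_read(r))    # b'ping'
    os.close(r)
    os.close(w)
```

Without `os.set_blocking(r, False)`, the first read would put the thread to sleep until someone wrote to the pipe.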

### Real backend example

A high-concurrency web server usually cannot afford one OS thread per slow client connection. Instead, it uses non-blocking sockets plus a readiness notification API such as `epoll`.

That lets one thread manage many connections efficiently.

---

## 15. Synchronous vs Asynchronous I/O

These terms are about completion semantics, not just whether the thread blocks.

### Synchronous I/O

In synchronous I/O, the operation is conceptually tied to the calling thread. Completion is generally observed by waiting in that call path.

Typical examples:

- blocking `read()` and `write()`,
- `fsync()`,
- many simple file operations.

### Asynchronous I/O

In asynchronous I/O, the request is submitted and completion is delivered later through a separate notification path.

Examples:

- POSIX AIO with signal-based completion notification,
- completion queues,
- `io_uring` completion entries,
- overlapped I/O on Windows.

### Important distinction

Blocking vs non-blocking asks: does the thread wait right now?

Synchronous vs asynchronous asks: how is completion reported and who owns the completion path?

These are different axes.

### Common interview trap

People often say non-blocking I/O is the same as asynchronous I/O. It is not.

You can have:

- non-blocking synchronous-style APIs where you keep retrying or wait for readiness,
- asynchronous APIs that still require careful completion handling,
- and blocking APIs that are entirely synchronous.

```mermaid
flowchart TD
A[I/O request] --> B{Does caller wait now?}
B -->|Yes| C[Blocking]
B -->|No| D[Non-blocking]
A --> E{How is completion observed?}
E -->|Same call path| F[Synchronous]
E -->|Later notification or CQ| G[Asynchronous]
```
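
Python's standard library has no `io_uring` binding, so this sketch emulates asynchronous completion with a worker thread and a completion callback, roughly the way glibc implements POSIX AIO with helper threads in user space. It illustrates only the completion-reporting axis; `sync_read` and `async_read` are hypothetical helpers.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def sync_read(path):
    # Synchronous: the data comes back on the same call path that asked for it.
    with open(path, "rb") as f:
        return f.read()


def async_read(pool, path, on_complete):
    # Asynchronous style: submit now, get told about completion later.
    future = pool.submit(sync_read, path)
    future.add_done_callback(lambda fut: on_complete(fut.result()))
    return future  # the caller can keep working instead of waiting


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"payload")
    with ThreadPoolExecutor(max_workers=1) as pool:
        async_read(pool, tmp.name, lambda data: print("completed:", data))
        print("submitted; doing other work")
    os.remove(tmp.name)
```

The two printed lines may appear in either order, which is the point: completion arrives on its own path, not at the call site.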

---

## 16. Buffered vs Unbuffered I/O

These terms describe whether data passes through kernel- or library-managed buffers on its way to or from the device.

### Buffered I/O

Buffered I/O uses intermediate storage to smooth differences in producer and consumer speed.

Examples:

- stdio buffering in user space,
- kernel page cache for files,
- socket receive and send buffers,
- disk write buffering.

Benefits:

- fewer device accesses,
- better batching,
- better throughput,
- smoother interaction with slower devices.

Costs:

- extra copies,
- more memory usage,
- less immediate visibility of writes unless explicitly flushed.
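
User-space buffering becomes visible if you ask the kernel how large a file is before and after the library flushes. A sketch, assuming Python's default buffered binary files (whose buffer is larger than 100 bytes); the helper name is hypothetical. Before `flush()`, the bytes exist only in the process's memory, so the kernel reports size 0; after `flush()` they are in the kernel's page cache, and only `fsync()` asks for them to reach storage.

```python
import os
import tempfile


def visible_size_before_and_after_flush(data=b"x" * 100):
    """Return the file size the kernel reports before and after flush()."""
    with tempfile.TemporaryFile() as f:        # buffered binary file object
        f.write(data)                          # copied into Python's buffer only
        before = os.fstat(f.fileno()).st_size  # the kernel has not seen the data
        f.flush()                              # now issues the write() system call
        after = os.fstat(f.fileno()).st_size   # data is in the kernel page cache
        os.fsync(f.fileno())                   # ask the kernel to push it to storage
    return before, after


if __name__ == "__main__":
    print(visible_size_before_and_after_flush())  # (0, 100)
```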

### Unbuffered or direct-style I/O

This usually means minimizing intermediate buffering, often for control or performance reasons.

In Linux, opening a file with `O_DIRECT` asks the kernel to bypass the page cache and move data directly between the user buffer and the storage device. It does not mean literally zero buffering everywhere (the device may still have its own cache), and it usually requires buffers, file offsets, and transfer sizes to be suitably aligned.

### Interview angle

If asked why databases sometimes use direct I/O, a good answer is:

> Databases often want explicit control over caching and flushing. Using the kernel page cache on top of the database's own cache can create double buffering and reduce predictability.

---

## 17. Polling vs Interrupt-Driven I/O

These are two ways of discovering whether a device or resource needs attention.

### Polling

With polling, software repeatedly checks device or resource state.

Advantages:

- simple control flow,
- can be efficient at very high event rates,
- avoids interrupt overhead in some cases.

Costs:

- wastes CPU if nothing is happening,
- may add latency depending on poll frequency.

### Interrupt-driven I/O

With interrupt-driven I/O, the device notifies the CPU when it needs attention.

Advantages:

- avoids constant busy checking,
- good for sporadic events,
- allows the CPU to do other work.

Costs:

- interrupt handling overhead,
- can become expensive under extremely high rates.

### Real Linux nuance

Modern networking often blends both. A NIC may raise an interrupt to indicate work, and then the kernel may switch into a polling mode such as NAPI to drain many packets efficiently.

That hybrid approach reduces interrupt storms under load.
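
The interrupt-then-poll pattern can be sketched as a small simulation. This is a toy model, not kernel code: `napi_style_receive` and its `budget` parameter are illustrative names, and the default of 64 mirrors Linux's common NAPI weight. One interrupt starts a polling phase that drains packets in budget-sized rounds, and interrupts are re-enabled only once a round finishes without exhausting its budget.

```python
from collections import deque


def napi_style_receive(ring, budget=64):
    """Toy interrupt-then-poll receive loop (NAPI-like); not kernel code.

    `ring` is a deque of pending packets. Returns (packets handled per poll
    round, number of interrupts taken).
    """
    interrupts = 0
    rounds = []
    irq_enabled = True
    while ring:
        if irq_enabled:
            interrupts += 1        # device signals "packets are waiting"
            irq_enabled = False    # mask its interrupt and switch to polling
        done = 0
        while ring and done < budget:
            ring.popleft()         # stand-in for processing one packet
            done += 1
        rounds.append(done)
        if done < budget:          # drained before exhausting the budget
            irq_enabled = True     # re-enable interrupts, leave polling mode
    return rounds, interrupts


if __name__ == "__main__":
    rounds, interrupts = napi_style_receive(deque(range(150)))
    print(rounds, interrupts)      # [64, 64, 22] 1
```

One interrupt covers 150 packets here, where a purely interrupt-per-packet model would take 150.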

---

## 18. Device Drivers

A device driver is the kernel component that knows how to operate a particular hardware device or family of devices.

Applications do not usually talk to hardware registers directly. They interact with kernel abstractions, and the driver handles the device-specific details.

### What drivers do

- initialize devices,
- configure DMA,
- submit commands,
- handle interrupts,
- expose interfaces to other kernel subsystems,
- and report errors or state.

### Examples

- NVMe driver for SSDs
- network driver for a NIC
- USB controller driver
- GPU driver

### Why drivers belong in the kernel path

Drivers often need privileged access to:

- device MMIO regions,
- interrupt registration,
- DMA mappings,
- power management hooks,
- and kernel memory.

That is why driver bugs can be serious.

---

## 19. DMA Basics

DMA stands for Direct Memory Access.

Without DMA, the CPU would need to move every byte between a device and memory itself. That would be inefficient.

With DMA:

- the kernel and driver program the device,
- the device transfers data directly to or from main memory,
- the CPU is interrupted or otherwise notified on completion.

### Why DMA matters

DMA reduces CPU overhead and increases throughput, especially for networking and storage.

### Real example: NIC receive path

1. Driver sets up receive buffers in RAM.
2. NIC DMA engine writes packet data into those buffers.
3. NIC signals completion.
4. Kernel processes the packet and eventually wakes a waiting socket reader.

### Important nuance

DMA is called direct, but it still requires OS coordination: the driver must tell the device which memory it may use. On systems with an IOMMU enabled, the device does not get unrestricted access to all memory; the IOMMU translates and checks device addresses so the device can reach only the ranges the kernel has mapped for it.

---

## 20. Boot Process Overview

The boot process is the sequence that turns a powered-off machine into a running OS with user processes.

At a high level:

1. Firmware starts after power-on.
2. Firmware initializes enough hardware to load boot code.
3. A bootloader loads the kernel.
4. The kernel initializes core subsystems.
5. The kernel starts the first user-space process.
6. That process starts services and the rest of the system.

This is worth knowing because it connects hardware, firmware, kernel, and user space into one story.

---

## 21. BIOS vs UEFI

These are firmware environments that start before the OS.

### BIOS

BIOS is the older traditional firmware model.

Characteristics:

- older boot mechanism,
- limited early environment,
- legacy partitioning and boot conventions built around the MBR boot sector,
- common in older systems.

### UEFI

UEFI is the newer firmware standard.

Characteristics:

- richer pre-boot environment,
- support for EFI system partitions,
- boot entries managed in firmware,
- better support for modern disks and boot flows, including GPT partitioning,
- support for Secure Boot.

### Practical interview answer

BIOS and UEFI both initialize the system and hand off to boot code, but UEFI is the modern, more flexible firmware architecture and is what you see on most current machines.

---

## 22. Bootloader

The bootloader is the program that loads the OS kernel into memory and transfers control to it.

Examples in Linux environments:

- GRUB
- systemd-boot
- U-Boot in embedded systems

### What the bootloader typically does

- locates the kernel image,
- loads the kernel into memory,
- often loads an initramfs or initrd,
- passes boot parameters,
- and transfers control to the kernel entry point.

### Why initramfs matters

The initial RAM filesystem contains early user-space tools and drivers needed before the real root filesystem is mounted.

That is useful when the real root depends on drivers, RAID, LVM, encryption, or network setup.

---

## 23. Kernel Initialization

Once the bootloader hands control to the kernel, the kernel starts bringing up the system.

### Major initialization tasks

- set up CPU mode and early memory structures,
- initialize page tables and memory management,
- establish interrupt and exception handling,
- initialize scheduler structures,
- initialize timers,
- discover hardware and initialize drivers,
- mount or prepare the root filesystem,
- create the first kernel and user-space execution contexts.

### Key mental model

During kernel initialization, the machine moves from a barely initialized hardware environment to a full operating-system environment with memory management, interrupt handling, device access, and process support.

---

## 24. Init and systemd Basics

After the kernel is ready to start user space, it launches the first user-space process.

On Linux, that process is traditionally called `init`, and on most modern distributions it is `systemd` as PID 1.

### Why PID 1 matters

PID 1 is special because it:

- is the ancestor of the rest of user space,
- starts system services,
- manages service dependencies,
- adopts orphaned processes and reaps them when they exit, so they do not linger as zombies,
- and helps define system startup state.
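
The reaping duty is simple at its core. A minimal sketch of the loop an init process runs (typically when `SIGCHLD` arrives), assuming a Unix-like OS and Python 3.9+ for `os.waitstatus_to_exitcode`; `reap_exited_children` is a hypothetical helper, and a real init also supervises services, handles signals, and much more.

```python
import os


def reap_exited_children():
    """Collect exit codes of terminated children without blocking.

    Returns {pid: exit_code}; a negative exit code means "killed by signal".
    """
    reaped = {}
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:    # ECHILD: no children at all
            break
        if pid == 0:                 # children exist, but none has exited yet
            break
        reaped[pid] = os.waitstatus_to_exitcode(status)
    return reaped
```

Until `waitpid` collects a child's exit status, the kernel keeps that child's process-table entry around as a zombie; calling this loop on every `SIGCHLD` is what keeps such entries from accumulating.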
|
|
|
|
### What systemd adds
|
|
|
|
`systemd` is more than an init replacement. It provides:
|
|
|
|
- service management,
|
|
- dependency ordering,
|
|
- logging integration,
|
|
- socket activation,
|
|
- timer units,
|
|
- cgroup-based supervision.
|
|
|
|
### Interview note
|
|
|
|
You do not need to love `systemd`, but you should understand that after kernel initialization, user-space service orchestration begins with PID 1.
|
|
|
|
---
|
|
|
|
## 25. How Linux Boots: Power-On to Running Processes
|
|
|
|
This is the most useful Linux boot narrative to remember.
|
|
|
|
```mermaid
|
|
flowchart TD
|
|
A[Power on] --> B[Firmware runs POST and early hardware init]
|
|
B --> C[BIOS or UEFI selects boot target]
|
|
C --> D[Bootloader loads kernel and initramfs]
|
|
D --> E[Kernel decompresses and enters start_kernel]
|
|
E --> F[Kernel initializes memory, scheduler, interrupts, drivers]
|
|
F --> G[Kernel mounts initramfs and finds real root filesystem]
|
|
G --> H[Kernel starts PID 1]
|
|
H --> I[systemd or init starts services and targets]
|
|
I --> J[Login shell, sshd, daemons, containers, apps]
|
|
```
|
|
|
|
### Narrative version
|
|
|
|
1. Power-on starts firmware.
|
|
2. Firmware performs POST and basic hardware initialization.
|
|
3. Firmware selects a boot target and runs the bootloader.
|
|
4. The bootloader loads the Linux kernel and often an initramfs.
|
|
5. The kernel initializes core subsystems.
|
|
6. The kernel sets up enough drivers and storage support to reach the root filesystem.
|
|
7. The kernel launches PID 1.
|
|
8. PID 1 starts the rest of user space.
|
|
9. Services such as networking, logging, SSH, container runtimes, and application daemons come up.
|
|
|
|
That is the end-to-end answer most interviewers want.

---

## 26. Real-World Linux and Backend Examples

The theory becomes much easier if you connect it to software you already know.

### Example 1: A web server reading from a socket

1. Client sends a packet.
2. NIC receives it and DMA-writes packet data into memory.
3. NIC raises an interrupt (under heavy load, Linux NAPI switches to polling the NIC instead of taking one interrupt per packet).
4. The kernel networking stack processes the packet and appends its payload to the socket's receive queue.
5. The socket becomes readable.
6. If a thread is blocked in `epoll_wait` on that socket, the kernel wakes it and reports the socket as ready.
7. The server calls `read` or `recv`.
8. The kernel copies the data from its socket buffers into the buffer the server passed in (zero-copy receive paths exist but are specialized).

This ties together DMA, interrupts, drivers, kernel queues, readiness notification, and system calls.
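
Steps 5 to 8 can be observed from user space. This Python sketch is Linux-only because it uses `select.epoll`; a local `socket.socketpair()` stands in for the client connection, so no NIC or hardware interrupt is involved. One thread sleeps in `epoll_wait` (via `epoll.poll`), a second thread writes after a short delay, the kernel marks the receiving socket readable and wakes the waiter, and `recv` copies the bytes out. The function name and the 0.1-second delay are illustrative assumptions.

```python
import select
import socket
import threading
import time


def wait_then_recv(delay: float = 0.1) -> bytes:
    """Block in epoll_wait until a 'client' writes, then recv the bytes."""
    server_side, client_side = socket.socketpair()
    ep = select.epoll()
    try:
        # Ask the kernel to report when the receive queue has data (EPOLLIN).
        ep.register(server_side.fileno(), select.EPOLLIN)

        def client() -> None:
            time.sleep(delay)  # stand-in for network latency
            client_side.sendall(b"GET / HTTP/1.1\r\n\r\n")

        sender = threading.Thread(target=client)
        sender.start()
        # Sleeps inside epoll_wait(2) until the socket becomes readable.
        events = ep.poll(timeout=5)
        sender.join()
        if not events:
            raise TimeoutError("socket never became readable")
        # recv(2) copies bytes from the kernel socket buffer into user space.
        return server_side.recv(4096)
    finally:
        ep.close()
        server_side.close()
        client_side.close()
```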

### Example 2: Reading a file

If file data is already in the page cache, a `read()` may complete without touching disk hardware at all.

If not:

1. The kernel uses the open file descriptor to reach the file's inode (path lookup already happened at `open()`).
2. VFS and filesystem code map the requested file offset to on-disk blocks.
3. The block layer submits storage I/O, and the calling thread sleeps.
4. The driver and device cooperate to fetch the data, typically via DMA into page-cache pages.
5. A completion interrupt lets the kernel mark those pages up to date and wake the blocked thread.
6. Data is copied from the page cache back to user space.

This is why page cache behavior matters so much for performance.
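
The page cache is reachable from user space through `posix_fadvise`. This Python sketch (Linux, Python 3.3+) writes a file, flushes it with `fsync`, asks the kernel to evict the file's cached pages with `POSIX_FADV_DONTNEED`, and then times a cold read against a warm re-read. The helper name and the 8 MiB default are illustrative assumptions; `DONTNEED` is only advice, a tmpfs-backed `/tmp` has no device behind its cache, and timings vary widely between machines.

```python
import os
import tempfile
import time
from typing import Tuple


def cold_vs_warm_read(size: int = 8 * 1024 * 1024) -> Tuple[float, float]:
    """Return (cold_seconds, warm_seconds) for two full reads of a fresh file."""
    fd, path = tempfile.mkstemp()
    try:
        data = memoryview(os.urandom(size))
        while data:  # write(2) may accept fewer bytes than requested
            data = data[os.write(fd, data):]
        os.fsync(fd)  # dirty pages must reach storage before they can be dropped
        # Advisory hint: evict this file's pages from the page cache.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)

        def read_all() -> float:
            start = time.perf_counter()
            offset = 0
            while offset < size:
                chunk = os.pread(fd, 1 << 20, offset)
                if not chunk:
                    break
                offset += len(chunk)
            return time.perf_counter() - start

        cold = read_all()  # cache misses go through the block layer to the device
        warm = read_all()  # normally served straight from the page cache
        return cold, warm
    finally:
        os.close(fd)
        os.unlink(path)
```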

### Example 3: Non-blocking event loop on Linux

A server sets sockets to non-blocking mode and registers them with `epoll`.

Instead of blocking on each `read`, it blocks in one place, `epoll_wait`, until one or more sockets become ready. That is how a single thread can manage many mostly-idle connections.
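
A minimal sketch of one iteration of such a loop in Python, assuming Linux (`select.epoll`). Several `socketpair()` ends stand in for client connections, every server-side socket is non-blocking, and a single thread waits in `epoll.poll` and reads only the sockets that are ready. The function name and fake clients are illustrative; a real server would also register a listening socket, loop forever, handle partial writes, and watch for `EPOLLHUP`.

```python
import select
import socket
from typing import List, Sequence


def poll_once(num_conns: int = 5, active: Sequence[int] = (1, 3)) -> List[bytes]:
    """One iteration of a single-threaded epoll loop over num_conns connections."""
    pairs = [socket.socketpair() for _ in range(num_conns)]
    ep = select.epoll()
    by_fd = {}
    try:
        for server_side, _client in pairs:
            server_side.setblocking(False)  # recv raises instead of sleeping
            ep.register(server_side.fileno(), select.EPOLLIN)
            by_fd[server_side.fileno()] = server_side

        for i in active:  # only some of the "clients" send anything
            pairs[i][1].sendall(f"hello from {i}".encode())

        received = []
        # The only place this thread blocks: waiting for any registered socket.
        for fd, mask in ep.poll(timeout=1):
            if mask & select.EPOLLIN:
                try:
                    received.append(by_fd[fd].recv(4096))
                except BlockingIOError:
                    pass  # readiness was stale; wait for the next notification
        return sorted(received)
    finally:
        ep.close()
        for server_side, client_side in pairs:
            server_side.close()
            client_side.close()
```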

### Example 4: `sendfile` and fewer copies

Linux can move file data to a socket more efficiently with `sendfile`: the kernel feeds the socket from the page cache, so the data never passes through a user-space buffer, and the usual `read`/`write` loop collapses into fewer system calls and user-kernel transitions. This is a good example of why understanding the kernel path helps explain performance features.
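
A sketch using Python's `os.sendfile` wrapper, assuming Linux, where since kernel 2.6.33 the destination may be any file descriptor, including a socket. A temporary file and a local `socketpair()` stand in for a static asset and a client connection, and the helper name is hypothetical. The payload is kept small so it fits in the socket buffer and the sender does not block before the reader drains it.

```python
import os
import socket
import tempfile


def send_file_over_socket(payload: bytes) -> bytes:
    """Push a file's contents into a socket with sendfile(2) and read them back."""
    sender, receiver = socket.socketpair()
    try:
        with tempfile.TemporaryFile() as f:
            f.write(payload)
            f.flush()  # make sure the bytes are in the kernel, not Python's buffer
            offset = 0
            while offset < len(payload):
                # The kernel moves data from the file to the socket; nothing is
                # copied into a user-space buffer in this process.
                sent = os.sendfile(sender.fileno(), f.fileno(), offset, len(payload) - offset)
                if sent == 0:
                    break
                offset += sent
        sender.shutdown(socket.SHUT_WR)  # signal EOF to the reader
        chunks = []
        while chunk := receiver.recv(4096):
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        sender.close()
        receiver.close()
```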

---

## 27. Common Interview Questions and How to Think About Them

### What is a system call?

Best answer:

> A system call is the controlled interface through which user-space code requests privileged services from the kernel, such as file I/O, process creation, memory mapping, or networking.

### What happens during a system call?

Mention:

- arguments prepared in user space,
- a special CPU instruction (`syscall` on x86-64, `svc` on arm64),
- switch to kernel mode,
- kernel dispatch and validation,
- possible blocking or device interaction,
- return value back to user space.
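
The last steps, validation and the return value, are visible from Python, whose `os.write` and `os.read` are thin wrappers around `write(2)` and `read(2)`. This sketch assumes a Unix-like OS and uses a pipe only as a convenient kernel object to write into; the function name is illustrative. On Linux, a failing syscall returns a negative error code, libc turns that into `-1` plus `errno`, and Python raises `OSError` carrying that `errno`.

```python
import errno
import os
from typing import Optional, Tuple


def demo_write_syscall() -> Tuple[int, bytes, Optional[str]]:
    r, w = os.pipe()
    try:
        # write(2): the kernel validates the fd and buffer, copies the bytes into
        # the pipe's kernel buffer, and returns how many bytes it accepted.
        written = os.write(w, b"hello")
        data = os.read(r, 5)  # read(2) copies them back out to user space

        # An invalid descriptor fails the kernel's validation step.
        error = None
        try:
            os.write(-1, b"x")
        except OSError as exc:
            error = errno.errorcode[exc.errno]
        return written, data, error
    finally:
        os.close(r)
        os.close(w)
```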

### What is the difference between user mode and kernel mode?

Mention:

- privilege level,
- ability to execute privileged instructions,
- direct access to hardware and kernel memory,
- isolation and safety.
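
The kernel accounts CPU time separately for the two modes, which is where the `user` and `sys` figures reported by `time` and `top` come from. This Python sketch uses `resource.getrusage` (Unix) to compare a pure-computation loop, which runs almost entirely in user mode, with a loop of one-byte `os.write` calls to `/dev/null`, which spends much of its time in the kernel. The function names and workload sizes are arbitrary assumptions, and the exact split depends on the machine.

```python
import os
import resource
from typing import Callable, Tuple


def cpu_split(fn: Callable[[], None]) -> Tuple[float, float]:
    """Return (user_seconds, system_seconds) this process used while fn ran."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    fn()
    after = resource.getrusage(resource.RUSAGE_SELF)
    return after.ru_utime - before.ru_utime, after.ru_stime - before.ru_stime


def compute_only() -> None:
    total = 0
    for i in range(2_000_000):  # user-mode work: no syscalls inside the loop
        total += i * i


def syscall_heavy() -> None:
    fd = os.open(os.devnull, os.O_WRONLY)
    try:
        for _ in range(200_000):  # every iteration enters the kernel via write(2)
            os.write(fd, b"x")
    finally:
        os.close(fd)


if __name__ == "__main__":
    print("compute-only  user/sys:", cpu_split(compute_only))
    print("syscall-heavy user/sys:", cpu_split(syscall_heavy))
```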

### Are interrupts and system calls the same thing?

Best answer:

> No. Hardware interrupts are typically asynchronous events from devices. System calls are controlled synchronous entries into the kernel initiated by the running program. Both cause privileged control transfers, but they originate differently.

### What is the difference between a trap, an exception, and an interrupt?

Good interview answer:

> Interrupts are typically asynchronous external events. Exceptions are synchronous events caused by the current instruction, such as divide-by-zero or page faults. Traps are a synchronous control-transfer category often used for deliberate software-triggered entries such as debugging breakpoints or syscall-style entry points.
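
A synchronous exception is easy to trigger on purpose. This sketch runs a child Python process that reads from address 0 through `ctypes.string_at`: the faulting instruction raises a memory-access exception (a page fault on x86-64), the kernel finds no valid mapping, and it delivers `SIGSEGV` to the process. It assumes Linux or another Unix where page 0 is unmapped; the child process keeps the crash away from the caller, and the function name is illustrative.

```python
import signal
import subprocess
import sys


def crash_child() -> int:
    """Run a child that touches unmapped address 0 and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", "import ctypes; ctypes.string_at(0)"],
        capture_output=True,
    )
    # subprocess reports death-by-signal as a negative signal number.
    return result.returncode


if __name__ == "__main__":
    code = crash_child()
    print(code, code == -signal.SIGSEGV)
```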

### Blocking vs non-blocking I/O?

Good answer:

> Blocking and non-blocking describe whether the calling thread can be put to sleep inside the call. In blocking I/O the call may sleep until progress is possible. In non-blocking I/O the call returns immediately, typically failing with `EAGAIN` or `EWOULDBLOCK`, if it cannot proceed.
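
A minimal Python sketch of both behaviors on an empty pipe, assuming a Unix-like OS and Python 3.5+ for `os.set_blocking`. In non-blocking mode `os.read` fails at once with `EAGAIN`, which Python raises as `BlockingIOError`; in blocking mode the same call sleeps in the kernel until another thread writes. The function names and the 0.1-second delay are illustrative.

```python
import errno
import os
import threading
import time


def nonblocking_read_empty_pipe() -> bool:
    """True if a non-blocking read on an empty pipe fails fast with EAGAIN."""
    r, w = os.pipe()
    try:
        os.set_blocking(r, False)  # sets O_NONBLOCK on the read end
        try:
            os.read(r, 100)  # nothing buffered, so no progress is possible
        except BlockingIOError as exc:
            return exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK)
        return False
    finally:
        os.close(r)
        os.close(w)


def blocking_read_wait(delay: float = 0.1) -> float:
    """Seconds a blocking read sleeps until another thread writes to the pipe."""
    r, w = os.pipe()
    writer = threading.Timer(delay, os.write, args=(w, b"x"))
    try:
        start = time.monotonic()
        writer.start()
        os.read(r, 1)  # sleeps in the kernel until data arrives
        return time.monotonic() - start
    finally:
        writer.join()
        os.close(r)
        os.close(w)
```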

### Synchronous vs asynchronous I/O?

Good answer:

> Synchronous and asynchronous describe how completion is observed. In synchronous I/O completion is tied to the calling path. In asynchronous I/O the request is submitted now and completion is delivered later via a separate notification mechanism.
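
Linux's kernel-level completion interfaces are `io_uring` and the older Linux AIO, and Python's standard library exposes neither. As a stand-in, this sketch emulates the completion model with a helper thread, which is also how glibc implements POSIX AIO: the caller submits a read, gets a `Future` back immediately, and learns about completion through a callback. The pool and function names are hypothetical, and the blocking `os.pread` still happens, just on another thread.

```python
import os
import tempfile
import threading
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical helper pool playing the role of an async I/O engine.
_io_pool = ThreadPoolExecutor(max_workers=4)


def submit_read(fd: int, size: int, offset: int) -> "Future[bytes]":
    """Submit a read and return immediately; completion is reported later."""
    return _io_pool.submit(os.pread, fd, size, offset)


def async_read_demo() -> bytes:
    done = threading.Event()
    results = []

    def on_complete(future: Future) -> None:  # the completion notification
        results.append(future.result())
        done.set()

    with tempfile.TemporaryFile() as f:
        f.write(b"completion-based I/O")
        f.flush()
        future = submit_read(f.fileno(), 10, 0)  # returns without waiting
        future.add_done_callback(on_complete)
        # ... the caller could do unrelated work here ...
        if not done.wait(timeout=5):
            raise TimeoutError("read did not complete")
    return results[0]
```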

### What is DMA and why is it useful?

Good answer:

> DMA lets devices transfer data directly to or from RAM without forcing the CPU to copy every byte itself. That reduces CPU overhead and improves throughput for storage and networking.

### How does Linux boot?

Mention:

- firmware,
- bootloader,
- kernel image and initramfs,
- kernel initialization,
- PID 1,
- service startup.

---

## 28. Practical Scenarios Interviewers Like

### Scenario 1: Why is a service thread blocked?

Possible explanations:

- waiting in a blocking syscall such as `read`, `accept`, `futex`, or `epoll_wait`,
- blocked on disk I/O,
- sleeping on a lock or condition variable,
- waiting for network data,
- or not blocked at all: runnable but waiting in the scheduler's run queue for a CPU.

Good follow-up thinking:

- Is it CPU-bound or I/O-bound?
- Is it blocked in user space or kernel space?
- Is the problem contention, latency, or starvation?
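
On Linux, each thread's scheduler state is visible in `/proc`, which is a quick way to tell blocked from runnable. This sketch starts a thread that blocks in `read(2)` on an empty pipe and reads its state letter from `/proc/self/task/<tid>/stat`: `S` is interruptible sleep (waiting on a pipe, socket, or lock), `D` is uninterruptible sleep (often disk I/O), and `R` is running or runnable. It assumes Linux and Python 3.8+ for `threading.get_native_id`; the function names are illustrative, and tools such as `ps -L` and `top -H` read the same data.

```python
import os
import threading
import time


def thread_state(tid: int) -> str:
    """Return the one-letter scheduler state of a thread in this process."""
    with open(f"/proc/self/task/{tid}/stat") as f:
        stat = f.read()
    # Format: "tid (comm) STATE ..."; comm may contain spaces, so split after ')'.
    return stat.rpartition(")")[2].split()[0]


def blocked_reader_state() -> str:
    r, w = os.pipe()
    tid_box = []

    def reader() -> None:
        tid_box.append(threading.get_native_id())
        os.read(r, 1)  # blocks: the pipe is empty

    t = threading.Thread(target=reader)
    t.start()
    while not tid_box:
        time.sleep(0.01)
    time.sleep(0.1)  # give the thread time to enter read(2)
    state = thread_state(tid_box[0])
    os.write(w, b"x")  # unblock the reader so it can exit
    t.join()
    os.close(r)
    os.close(w)
    return state
```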

### Scenario 2: Why does one slow disk hurt request latency?

Because synchronous blocking I/O can put threads to sleep while the storage path completes. If the application architecture has too little concurrency or poor queueing, tail latency grows quickly.
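
A toy model makes the queueing effect concrete. This deterministic Python sketch assumes all requests arrive at time 0, are served first-in first-out by a fixed pool of worker threads, and keep their worker blocked for the full service time. With one worker, a single 100 ms disk read delays every request behind it; a second worker lets the fast requests route around it. The function name is illustrative, and real systems add arrival patterns, timeouts, and device-level parallelism.

```python
import heapq
from typing import List


def fifo_completion_times(service_ms: List[float], workers: int) -> List[float]:
    """Completion time of each request when `workers` threads serve a FIFO queue."""
    free_at = [0.0] * workers  # min-heap: when each worker next becomes free
    heapq.heapify(free_at)
    done = []
    for cost in service_ms:
        start = heapq.heappop(free_at)  # earliest free worker takes the next request
        finish = start + cost
        done.append(finish)
        heapq.heappush(free_at, finish)
    return done


# One slow disk read (100 ms) at the head of the queue, then three 1 ms requests.
requests = [100.0, 1.0, 1.0, 1.0]
print(fifo_completion_times(requests, workers=1))  # [100.0, 101.0, 102.0, 103.0]
print(fifo_completion_times(requests, workers=2))  # [100.0, 1.0, 2.0, 3.0]
```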

### Scenario 3: Why do event loops scale better than thread-per-connection for many idle sockets?

Because most connections are idle most of the time. Non-blocking sockets plus readiness notification let one thread wait efficiently for many connections instead of dedicating a blocked thread, with its own stack and scheduler bookkeeping, to each one.

### Scenario 4: Why does a page fault not always mean a crash?

Because many page faults are recoverable and part of normal virtual-memory behavior, such as demand paging, lazy allocation, or copy-on-write after `fork`. Only a fault the kernel cannot resolve, such as an access to an unmapped address or a write to a read-only mapping, is turned into a `SIGSEGV`.
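
Recoverable faults are easy to count. This Python sketch (Unix, `mmap` and `resource` modules) maps anonymous memory, which the kernel backs lazily, then writes one byte per page; the rise in the process's minor-fault counter (`ru_minflt`) reflects the kernel resolving each first touch by allocating a page, with no crash and no disk I/O. The function name and 4 MiB size are arbitrary assumptions; transparent huge pages can make the count lower than one fault per page, and some sandboxes do not report the counter at all.

```python
import mmap
import resource


def minor_faults_for_touch(size: int = 4 * 1024 * 1024) -> int:
    """Count minor page faults caused by first-touching `size` bytes of fresh memory."""
    page = mmap.PAGESIZE
    buf = mmap.mmap(-1, size)  # anonymous mapping: address range reserved, no pages yet
    try:
        before = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
        for offset in range(0, size, page):
            buf[offset] = 1  # first write to each page triggers a recoverable fault
        after = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
        return after - before
    finally:
        buf.close()


if __name__ == "__main__":
    print("minor faults:", minor_faults_for_touch())
```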

### Scenario 5: Why are syscalls more expensive than normal function calls?

Because they cross the protection boundary, switch privilege levels, involve kernel dispatch and validation, and may trigger scheduler interaction or device work.
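
A rough way to feel the difference from Python is to time a trivial Python function against `os.getppid()`, which issues a real `getppid(2)` system call on each invocation. Interpreter overhead is included in both numbers, and results depend heavily on the CPU, kernel version, and speculative-execution mitigations, so treat the ratio as indicative only; a C benchmark or `perf` gives cleaner numbers. The function names are illustrative.

```python
import os
import timeit
from typing import Callable


def plain_function() -> int:
    return 42


def per_call_ns(stmt: Callable[[], object], number: int = 100_000) -> float:
    """Average nanoseconds per call, best of 5 runs to reduce noise."""
    best = min(timeit.repeat(stmt, number=number, repeat=5))
    return best / number * 1e9


if __name__ == "__main__":
    print(f"Python function call: {per_call_ns(plain_function):.0f} ns")
    print(f"os.getppid() syscall: {per_call_ns(os.getppid):.0f} ns")
```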

---

## 29. Common Mistakes to Avoid in Interviews

- Saying non-blocking I/O and asynchronous I/O are the same thing.
- Saying a page fault always means a segmentation fault.
- Saying user space directly talks to hardware in normal application code.
- Ignoring protection checks during the syscall flow.
- Forgetting that the kernel may block the thread and schedule something else.
- Treating all interrupts, traps, and exceptions as identical.
- Describing BIOS, bootloader, kernel, and init as one undifferentiated startup blob.

---

## 30. A Compact Mental Model to Remember

If you need one interview-ready model, remember this:

1. Applications run in user mode with restricted privileges.
2. They ask the kernel for services through system calls.
3. The CPU and hardware enforce the user-kernel protection boundary.
4. Devices signal readiness and completion through interrupts and move bulk data with DMA.
5. The kernel manages scheduling, memory, device drivers, and protection.
6. Linux boot moves from firmware to bootloader to kernel to PID 1 to the rest of user space.

If you can explain those six points cleanly with one or two real Linux examples, you are already at a strong interview level.

---

## 31. Quick Revision Checklist

Before an interview, make sure you can explain each of these without hand-waving:

- Why user mode and kernel mode exist
- What a privileged instruction is
- What happens during a system call
- The difference between hardware interrupts and synchronous exceptions
- The basic interrupt-handling path
- Blocking vs non-blocking I/O
- Synchronous vs asynchronous I/O
- Buffered vs direct-style I/O
- What drivers do
- What DMA is for
- Polling vs interrupt-driven I/O
- BIOS vs UEFI
- What a bootloader does
- What the kernel initializes before user space starts
- Why PID 1 matters

If you can connect each one to a Linux server example, your understanding is in good shape.