# File 4: Advanced Java Concepts
This file covers the topics that make Java powerful in long-running, high-throughput systems: concurrency, the memory model, garbage collection, streams, and performance thinking. These are the areas where beginner knowledge often stops and engineering maturity starts.

You do not need to become a JVM internals expert on day one, but you do need a solid mental model. Otherwise, concurrency bugs, latency spikes, and memory issues will feel mysterious. The goal here is to remove that mystery.
## Multithreading and Concurrency

### Intuition

A thread is an independent path of execution inside a process. Java supports multiple threads so a program can do more than one thing at the same time or keep making progress while other work waits on I/O.

In a backend service, threads may handle:

- incoming HTTP requests
- scheduled jobs
- asynchronous message processing
- background cleanup work
- database or network calls coordinated by worker pools
### Why Concurrency Exists

Without concurrency, a service that waits on slow operations would waste time doing nothing. With concurrency, other work can proceed while one task waits.
### Basic Example

```java
// The lambda is the work; the printed name is the thread that actually runs it.
Runnable task = () -> System.out.println("Processing in " + Thread.currentThread().getName());
Thread worker = new Thread(task);
worker.start(); // start() runs the task on the new thread; calling run() directly would not
```

This creates and starts a new thread, but in production systems you usually prefer executors and thread pools over manually creating raw threads.
### Thread Pool Mental Model

Instead of constantly creating and destroying threads, a thread pool reuses a fixed or managed set of worker threads.

That reduces thread-creation overhead and puts an upper bound on how much work runs at once.

```java
// Four reusable worker threads; extra tasks wait in the pool's queue.
ExecutorService executor = Executors.newFixedThreadPool(4);
executor.submit(() -> processOrder(orderId));
// Call executor.shutdown() when the application stops, or the worker threads keep the JVM alive.
```
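A slightly fuller sketch of the pool lifecycle may help: submit tasks, collect their results through `Future`, and shut the pool down when finished. The class `OrderPoolDemo` and the body of `processOrder` are hypothetical stand-ins, not part of any real order system.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class OrderPoolDemo {

    // Hypothetical stand-in for real order processing.
    static String processOrder(int orderId) {
        return "order-" + orderId + " done";
    }

    static List<String> processAll(List<Integer> orderIds) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            // Submit everything first so the tasks can run concurrently.
            List<Future<String>> futures = new ArrayList<>();
            for (int id : orderIds) {
                futures.add(executor.submit(() -> processOrder(id)));
            }
            // Then collect results in submission order.
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // blocks until that task has finished
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            executor.shutdown(); // stop accepting work so the worker threads can exit
        }
    }

    public static void main(String[] args) {
        System.out.println(processAll(List.of(1, 2, 3)));
    }
}
```

In a real service the pool would normally be created once and shared, not created per call as in this sketch.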
### Real-World Use Cases

- processing multiple independent tasks in parallel
- handling high request volume in servers
- offloading email or report generation from the request thread
- consuming events from a queue with worker threads

### Common Pitfall

More threads do not automatically mean better performance. Too many threads can increase context switching, memory pressure, lock contention, and tail latency.
## Thread Lifecycle

```mermaid
stateDiagram-v2
    [*] --> New
    New --> Runnable: start()
    Runnable --> Running: scheduled by CPU
    Running --> Blocked: waiting for lock or I/O
    Blocked --> Runnable: lock or I/O available
    Running --> Waiting: wait/sleep/join
    Waiting --> Runnable: notified or timeout
    Running --> Terminated: run completes
```

### Why the Lifecycle Matters

If you debug a production incident involving stuck requests or slow jobs, you need to know whether threads are runnable, blocked, waiting, or deadlocked. Tools like thread dumps only make sense if you understand these states.

The diagram is conceptual. The JVM's `Thread.State` enum has no separate running state (a thread on the CPU is reported as `RUNNABLE`), reports `sleep` and timed waits as `TIMED_WAITING`, and reserves `BLOCKED` for threads waiting to acquire a monitor lock. A thread blocked in I/O therefore usually appears as `RUNNABLE` in a thread dump.
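The states a thread dump reports can also be observed directly with `Thread.getState()`. The following is a minimal sketch; the class name `ThreadStateDemo` and the use of a `CountDownLatch` to hold the worker in place are choices made for this illustration.

```java
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;

final class ThreadStateDemo {

    static Thread.State[] observeStates() {
        CountDownLatch release = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                release.await(); // parks the worker until countDown() is called
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "demo-worker");

        Thread.State beforeStart = worker.getState(); // not started yet
        worker.start();

        // Poll until the worker has actually parked inside await().
        Thread.State whileParked;
        do {
            Thread.yield();
            whileParked = worker.getState();
        } while (whileParked != Thread.State.WAITING);

        release.countDown(); // let the worker finish
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new Thread.State[] {beforeStart, whileParked, worker.getState()};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(observeStates())); // [NEW, WAITING, TERMINATED]
    }
}
```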
## Synchronization

Concurrency becomes difficult when multiple threads access shared mutable state.

### The Core Problem

Imagine two threads incrementing the same counter.

```java
counter++;
```

This looks atomic, but it is actually multiple steps:

1. read current value
2. add one
3. write new value back

If two threads interleave these steps, updates can be lost.
### `synchronized`

Java's `synchronized` keyword provides mutual exclusion and visibility guarantees.

```java
public class Counter {
    private int value;

    public synchronized void increment() {
        value++;
    }

    public synchronized int getValue() {
        return value;
    }
}
```

Only one thread at a time can hold a given monitor, so two threads cannot execute `increment()` or `getValue()` on the same `Counter` instance simultaneously.
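To see the guarantee in action, the sketch below runs two threads against one synchronized counter and checks that no increment is lost. `SynchronizedCounterDemo` and `runTwoThreads` are names invented for this example; the nested `Counter` mirrors the class above.

```java
final class SynchronizedCounterDemo {

    // Same shape as the Counter above, nested here to keep the example self-contained.
    static final class Counter {
        private int value;

        synchronized void increment() {
            value++;
        }

        synchronized int getValue() {
            return value;
        }
    }

    static int runTwoThreads(int incrementsPerThread) {
        Counter counter = new Counter();
        Runnable work = () -> {
            for (int i = 0; i < incrementsPerThread; i++) {
                counter.increment();
            }
        };
        Thread first = new Thread(work);
        Thread second = new Thread(work);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.getValue();
    }

    public static void main(String[] args) {
        System.out.println(runTwoThreads(100_000)); // 200000
    }
}
```

If `synchronized` is removed from `increment()`, the result is usually lower than 200000 and varies from run to run, which is the lost-update problem described earlier.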
### Why It Works

Synchronization does two important things:

- prevents unsafe concurrent access to the protected region
- ensures memory visibility: writes made before a thread releases a monitor are visible to the next thread that acquires the same monitor

### Real-World Use Cases

- protecting shared in-memory caches or mutable counters
- coordinating state transitions in schedulers and worker systems
- ensuring thread-safe updates in domain objects used by multiple threads

### Common Pitfall

Synchronizing too broadly can kill throughput. Synchronizing too narrowly can fail to protect the actual shared invariant.
## Locks and Higher-Level Concurrency Utilities

Beyond `synchronized`, Java provides explicit lock types and concurrency utilities in `java.util.concurrent`.

### `ReentrantLock`

This can be useful when you need features beyond simple intrinsic locking, such as timed or interruptible lock acquisition, optional fairness, or multiple condition variables.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

Lock lock = new ReentrantLock();
lock.lock();
try {
    updateSharedState();
} finally {
    lock.unlock(); // always release in finally, even if the update throws
}
```
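Timed acquisition is the feature most often cited as a reason to reach for `ReentrantLock`. Here is a minimal sketch using `tryLock(long, TimeUnit)`; the `TimedLockDemo` class, its `balance` field, and the timeout values are illustrative assumptions.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

final class TimedLockDemo {

    private final ReentrantLock lock = new ReentrantLock();
    private int balance;

    /** Returns true if the deposit was applied, false if the lock was not acquired in time. */
    boolean tryDeposit(int amount, long timeoutMillis) {
        boolean acquired = false;
        try {
            acquired = lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
            if (!acquired) {
                return false; // give up instead of blocking indefinitely
            }
            balance += amount;
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            if (acquired) {
                lock.unlock(); // only unlock what we actually hold
            }
        }
    }

    /** Tries a deposit while another thread holds the lock; expected to time out. */
    static boolean contendedAttempt() {
        TimedLockDemo account = new TimedLockDemo();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            account.lock.lock();
            try {
                held.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                account.lock.unlock();
            }
        });
        holder.start();
        try {
            held.await();                      // wait until the holder owns the lock
            return account.tryDeposit(10, 50); // times out after about 50 ms
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown();               // let the holder thread finish
        }
    }
}
```

The caller decides what a failed acquisition means: retry, fall back, or report the resource as busy. With `synchronized` the only option is to wait.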
### Other Important Utilities

- `AtomicInteger`, `AtomicLong`: lock-free atomic updates for simple counters and state
- `ConcurrentHashMap`: concurrent map implementation for many common shared lookup patterns
- `CountDownLatch`: wait until several tasks complete
- `Semaphore`: limit concurrent access to a resource
- `CompletableFuture`: compose asynchronous operations more cleanly
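Several of these utilities are often combined. As a rough sketch, the example below tallies events across worker threads using `ConcurrentHashMap.merge` for per-key atomic updates, an `AtomicInteger` as a shared counter, and a `CountDownLatch` to wait for completion. The class name `EventTallyDemo` and the event names are made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

final class EventTallyDemo {

    static Map<String, Integer> tally(List<String> events, AtomicInteger processed) {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        CountDownLatch done = new CountDownLatch(events.size());
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            for (String event : events) {
                pool.execute(() -> {
                    try {
                        counts.merge(event, 1, Integer::sum); // atomic read-modify-write per key
                        processed.incrementAndGet();          // lock-free shared counter
                    } finally {
                        done.countDown();                     // signal completion even on failure
                    }
                });
            }
            done.await(); // returns once every task has counted down
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return counts;
    }

    public static void main(String[] args) {
        AtomicInteger processed = new AtomicInteger();
        Map<String, Integer> counts = tally(List.of("login", "click", "login"), processed);
        System.out.println(counts + " processed=" + processed.get());
    }
}
```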
### Production Relevance

These utilities are common in rate limiting, request fan-out, concurrent caches, asynchronous orchestration, and worker coordination.
## `volatile`

`volatile` is often misunderstood.

### What It Guarantees

A volatile field provides visibility and ordering. Once one thread writes a new value, any thread that subsequently reads the field sees that value, along with every write the first thread made before it.

```java
private volatile boolean running = true;
```

This is useful for simple state flags.

### What It Does Not Guarantee

`volatile` does not make compound operations atomic.

This is still unsafe:

```java
volatile int count;

count++; // still three steps: read, add one, write back
```

The increment is still a read-modify-write sequence.
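One common fix for a shared counter is `AtomicInteger`, whose `incrementAndGet()` performs the read-modify-write as a single atomic operation. A minimal sketch follows; `AtomicCounterDemo` and `countWithThreads` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class AtomicCounterDemo {

    static int countWithThreads(int threads, int incrementsPerThread) {
        AtomicInteger count = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    count.incrementAndGet(); // atomic, unlike count++ on a volatile int
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return count.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 50_000)); // 200000
    }
}
```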
### Good Use Case

Stopping a background loop cleanly:

```java
while (running) {
    pollQueue();
}
```
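Put together with the flag declaration shown earlier, the loop might live in a worker like the following sketch. `PollingWorker`, `stop()`, and the empty `pollQueue()` body are placeholders; a real worker would block on or poll an actual queue.

```java
final class PollingWorker implements Runnable {

    private volatile boolean running = true;

    @Override
    public void run() {
        while (running) { // volatile read on every iteration
            pollQueue();
        }
    }

    /** Called from another thread; the worker sees the change on its next loop check. */
    void stop() {
        running = false;
    }

    // Hypothetical placeholder for real queue polling.
    private void pollQueue() {
        Thread.onSpinWait();
    }

    static boolean startThenStop() {
        PollingWorker worker = new PollingWorker();
        Thread thread = new Thread(worker, "poller");
        thread.start();
        worker.stop();
        try {
            thread.join(5_000); // generous upper bound; the loop should exit almost immediately
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !thread.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("stopped cleanly: " + startThenStop());
    }
}
```

Without `volatile`, the JIT compiler is allowed to hoist the read of `running` out of the loop, and the worker might never observe the update.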
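### Bad Use Case

Using `volatile` as a replacement for proper locking around shared mutable objects. A `volatile` reference to a `HashMap`, for example, makes the reference itself visible but does nothing to make the map safe for concurrent modification.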
## JVM Memory Model

The Java Memory Model explains how threads interact through memory and what guarantees exist around reads, writes, ordering, and visibility.

### Intuition

In a multithreaded system, one thread can update a value, but another thread may not immediately observe that update unless the program uses the right synchronization rules.

This is because CPUs, caches, compilers, and runtimes all optimize memory access.

### Diagram

```mermaid
flowchart LR
    A[Main Memory] --> B[Thread 1 Working Memory]
    A --> C[Thread 2 Working Memory]
    B --> D[Local reads and writes]
    C --> E[Local reads and writes]
    F[synchronized or volatile] --> A
```

The picture is a simplification. The specification defines visibility through happens-before relationships rather than literal per-thread copies of memory, but the diagram is a useful reminder that unsynchronized writes may not be seen by other threads.

### Why Engineers Care

Code that ignores the memory model can appear to "work on my machine" and then fail under load or on different hardware.

The important practical lesson is not to memorize the whole specification. The important lesson is:

- do not share mutable state casually between threads
- use proper synchronization tools when you do share it
- visibility and atomicity are different concerns
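As one concrete instance of these rules, a volatile write can safely publish data written just before it. The sketch below assumes a hypothetical `PublicationDemo` class with a plain `payload` field guarded by a volatile `ready` flag.

```java
final class PublicationDemo {

    private int payload;            // plain field, not volatile
    private volatile boolean ready; // volatile flag that guards the payload

    void publish(int value) {
        payload = value; // 1. ordinary write
        ready = true;    // 2. volatile write: publishes everything written before it
    }

    int awaitPayload() {
        while (!ready) { // 3. volatile read: spins until the writer's flag is visible
            Thread.onSpinWait();
        }
        return payload;  // 4. guaranteed to observe the write from step 1
    }

    static int demo() {
        PublicationDemo box = new PublicationDemo();
        Thread writer = new Thread(() -> box.publish(42));
        writer.start();
        return box.awaitPayload();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 42
    }
}
```

If `ready` were a plain `boolean`, the reader could spin forever or, in principle, see `ready == true` with a stale `payload`. The volatile write and read create the happens-before edge that rules both out.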
## Garbage Collection

Java manages memory automatically through garbage collection, but automatic does not mean irrelevant.

### Intuition

Instead of manually freeing objects, Java tracks which objects are still reachable. Unreachable objects become candidates for collection.

### Simplified Heap View

```mermaid
flowchart TD
    A[Application creates objects] --> B[Objects live on heap]
    B --> C[Reachable from roots]
    B --> D[Unreachable objects]
    D --> E[Garbage collector reclaims memory]
```

### What Are Roots

Objects are considered reachable if they can be reached from GC roots such as:

- active thread stacks
- static references
- JNI references and runtime internals

### Why This Matters in Real Systems

GC behavior affects:

- latency spikes
- memory footprint
- throughput
- container sizing decisions

### Common Memory Problems in Java

- retaining objects in caches without eviction
- storing large collections in long-lived singletons
- leaking listeners or thread-local data
- creating too many short-lived temporary objects in hot code paths
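The first item in the list, a cache without eviction, has a well-known small-scale remedy: bound the cache. The sketch below uses `LinkedHashMap` in access order with `removeEldestEntry` to get LRU eviction. `BoundedCache` and `CacheDemo` are illustrative names, and the class is not thread-safe as written.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A small LRU cache sketch. Not thread-safe: wrap or synchronize it for concurrent use.
final class BoundedCache<K, V> extends LinkedHashMap<K, V> {

    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: least recently used entry is the eldest
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict once the bound is exceeded
    }
}

final class CacheDemo {

    static BoundedCache<Integer, String> fill(int maxEntries, int inserts) {
        BoundedCache<Integer, String> cache = new BoundedCache<>(maxEntries);
        for (int i = 0; i < inserts; i++) {
            cache.put(i, "value-" + i);
        }
        return cache;
    }

    public static void main(String[] args) {
        BoundedCache<Integer, String> cache = fill(100, 1_000);
        // Only the 100 most recently inserted keys (900..999) remain reachable.
        System.out.println(cache.size() + " entries, oldest key kept: "
                + cache.keySet().iterator().next());
    }
}
```

An unbounded `HashMap` in the same position would keep all 1,000 values reachable forever, which is exactly the kind of retention the garbage collector cannot fix.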
### Misconception

"Java cannot have memory leaks because it has garbage collection" is false. Java absolutely can have memory leaks if your code keeps references to objects that are no longer useful.
## Streams API and Functional Programming Style

Java's Streams API provides a declarative way to process collections and sequences of data.

### Intuition

Instead of describing every loop step manually, you describe a pipeline of transformations.

```java
List<String> emails = users.stream()
        .filter(User::isActive)
        .map(User::getEmail)
        .sorted()
        .toList(); // Stream.toList() needs Java 16+; use collect(Collectors.toList()) on older versions
```

### What Happens Conceptually

The stream pipeline is lazy until a terminal operation runs. Intermediate operations like `filter`, `map`, and `sorted` only describe work. A terminal operation such as `toList()`, `count()`, or `forEach()` triggers execution.
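Laziness is easy to verify by counting how often a `filter` predicate runs before and after the terminal operation. In this sketch the counter inside the lambda is a deliberate side effect used only for demonstration; the class name and sample data are made up.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class LazyStreamDemo {

    /** Returns {filter calls before terminal op, filter calls after, result size}. */
    static int[] filterCallsBeforeAndAfter() {
        AtomicInteger filterCalls = new AtomicInteger();

        // Building the pipeline runs nothing yet.
        Stream<String> pipeline = Stream.of("ann@example.com", "", "bob@example.com")
                .filter(s -> {
                    filterCalls.incrementAndGet(); // demo-only side effect
                    return !s.isEmpty();
                })
                .map(String::toUpperCase);

        int before = filterCalls.get();

        // The terminal operation pulls every element through the pipeline.
        List<String> result = pipeline.collect(Collectors.toList());

        int after = filterCalls.get();
        return new int[] {before, after, result.size()};
    }

    public static void main(String[] args) {
        int[] counts = filterCallsBeforeAndAfter();
        System.out.println("before=" + counts[0] + " after=" + counts[1]
                + " kept=" + counts[2]); // before=0 after=3 kept=2
    }
}
```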
### Why This Is Useful

Streams can make transformation pipelines clearer when the logic is naturally data-oriented.

Real-world examples:

- filtering valid requests
- mapping entities to DTOs
- aggregating metrics
- grouping records for reporting

### When Streams Help

- straightforward transformations and aggregations
- collection processing where each step is conceptually distinct
- code that benefits from a pipeline style

### When Plain Loops Are Better

- highly stateful logic
- error handling that becomes awkward in a stream chain
- performance-sensitive paths where allocation must be controlled carefully

### Common Pitfalls

- overusing streams for complex business workflows that become unreadable
- assuming streams are always faster than loops
- using shared mutable state inside stream operations
- misunderstanding laziness and side effects
## Performance Considerations

Performance in Java is not just about writing "fast code." It is about understanding tradeoffs across CPU, memory, I/O, latency, and concurrency.

### Key Areas to Watch

- object allocation rate
- unnecessary boxing and unboxing
- poor data structure choice
- lock contention
- blocking I/O on critical threads
- large heap pressure and GC pauses

### Example: Data Structure Choice Matters

If you repeatedly test membership in a collection of one million IDs, a `HashSet` is usually a much better fit than a `List`: `HashSet.contains` runs in expected constant time, while `List.contains` scans the elements one by one.
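A rough way to feel the difference is to run the same membership checks against both collections. This is a sketch, not a rigorous benchmark (no warm-up, single run), and `MembershipDemo`, the collection size, and the query count are arbitrary choices.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class MembershipDemo {

    /** Counts how many queries are present in the given collection. */
    static long countKnown(Collection<Integer> knownIds, int[] queries) {
        long hits = 0;
        for (int query : queries) {
            if (knownIds.contains(query)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int size = 1_000_000;
        List<Integer> idList = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            idList.add(i);
        }
        Set<Integer> idSet = new HashSet<>(idList);

        // Query IDs near the end of the list, the worst case for a linear scan.
        int[] queries = new int[200];
        for (int i = 0; i < queries.length; i++) {
            queries[i] = size - 1 - i;
        }

        long t0 = System.nanoTime();
        long listHits = countKnown(idList, queries);
        long t1 = System.nanoTime();
        long setHits = countKnown(idSet, queries);
        long t2 = System.nanoTime();

        System.out.printf("list: %d hits in %d ms%n", listHits, (t1 - t0) / 1_000_000);
        System.out.printf("set:  %d hits in %d ms%n", setHits, (t2 - t1) / 1_000_000);
    }
}
```

For trustworthy numbers, a harness such as JMH is the usual choice; the point here is only the order-of-magnitude gap.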
### Example: Avoiding Work on Hot Paths

If a request path is called tens of thousands of times per second, avoid unnecessary logging, object churn, repeated parsing, or expensive string formatting.
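One way to avoid paying for log formatting that will be thrown away is to pass a `Supplier` to the logger, so the message is built only when the level is enabled. The sketch below uses `java.util.logging`; the logger name, `expensiveDump`, and the counter are assumptions added to make the effect measurable.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

final class HotPathLoggingDemo {

    private static final Logger LOG = Logger.getLogger("hotpath.demo");

    // Hypothetical stand-in for costly formatting or serialization.
    static String expensiveDump(String payload) {
        return payload.toUpperCase();
    }

    static int handle(String payload, AtomicInteger formatCalls) {
        // The supplier runs only if FINE is enabled for this logger.
        LOG.fine(() -> {
            formatCalls.incrementAndGet();
            return "payload=" + expensiveDump(payload);
        });
        return payload.length();
    }

    /** Runs the hot path and reports how many times the log message was actually built. */
    static int formatCallsAt(Level level, int requests) {
        LOG.setLevel(level);
        AtomicInteger formatCalls = new AtomicInteger();
        for (int i = 0; i < requests; i++) {
            handle("req-" + i, formatCalls);
        }
        return formatCalls.get();
    }

    public static void main(String[] args) {
        System.out.println("INFO level: " + formatCallsAt(Level.INFO, 1_000) + " messages built");
        System.out.println("FINE level: " + formatCallsAt(Level.FINE, 1_000) + " messages built");
    }
}
```

Most logging frameworks offer an equivalent, either supplier-based overloads or parameterized messages, so the same idea applies outside `java.util.logging`.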
### Production Perspective

Java performance work should be evidence-driven. Good engineers measure before changing code. They use:

- metrics and tracing
- profiling tools
- thread dumps
- heap dumps
- realistic load tests

Premature optimization is a problem, but ignoring obvious bottlenecks is also a problem. Performance work is about judgment, not superstition.
## How Advanced Java Shows Up in Real Systems

Consider a service that consumes messages from a queue:

1. a thread pool pulls messages concurrently
2. shared state such as rate limits or caches needs safe access
3. visibility and ordering matter between worker threads
4. processed payloads create object allocations on the heap
5. stream pipelines may transform batches for enrichment or filtering
6. GC and lock contention affect latency under load

This is why advanced Java matters. These are not academic concerns. They directly influence system correctness and performance.
## Key Takeaways

- Concurrency is about safely making progress with multiple threads, not just spawning more work in parallel.
- `synchronized`, locks, atomics, and concurrent collections exist because shared mutable state is hard to manage correctly.
- `volatile` provides visibility, not full atomicity.
- The Java Memory Model explains why synchronization rules matter for correctness across threads.
- Garbage collection removes manual memory management, but memory leaks and GC-related latency issues still exist.
- Streams are powerful for data transformation pipelines, but they are not automatically clearer or faster in every situation.
- Strong Java engineers combine runtime understanding with measurement rather than guessing about performance.