more text

This commit is contained in:
tarun-elango
2026-04-26 14:09:04 -04:00
parent 26810e43d0
commit be31df2d44
22 changed files with 10664 additions and 0 deletions
# Go: Concurrency and Goroutines
## Learning Objectives
- Understand the difference between concurrency and parallelism.
- Learn how goroutines work and why they are cheaper than OS threads.
- Use channels, `select`, mutexes, and synchronization primitives appropriately.
- Understand the `context` package as the control plane for cancellation and deadlines.
- Build a light but correct mental model of Go's memory model and data races.
- Recognize common production concurrency patterns and the bugs that come with them.
## Why Concurrency Matters in Go
Go became popular partly because it made concurrent programming feel accessible.
Backend and systems software naturally deals with many things at once:
- handling multiple HTTP requests
- waiting on databases and other services
- processing jobs from queues
- streaming data through pipelines
- watching timers, sockets, and shutdown signals
If you handle all of that in a single linear flow, the program spends a lot of time idle. Concurrency lets you structure work so independent tasks can make progress without blocking each other unnecessarily.
### Concurrency vs Parallelism
- concurrency is about structuring many tasks in progress
- parallelism is about tasks literally running at the same time on multiple CPU cores
Go helps with both, but it starts with concurrency as a programming model.
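As a small illustration of the distinction, the runtime can report how much parallelism is actually available to a program. This is a minimal sketch; `parallelismLimit` is a hypothetical helper name, not part of any library.

```go
package main

import (
	"fmt"
	"runtime"
)

// parallelismLimit reports how many OS threads may execute Go code at the
// same time. Passing 0 to GOMAXPROCS queries the setting without changing it.
func parallelismLimit() int {
	return runtime.GOMAXPROCS(0)
}

func main() {
	fmt.Println("logical CPUs:", runtime.NumCPU())
	fmt.Println("parallelism limit:", parallelismLimit())
	fmt.Println("goroutines right now:", runtime.NumGoroutine())
}
```

A program can have thousands of goroutines in progress (concurrency) while only `parallelismLimit()` of them execute at any instant (parallelism).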
## Goroutines: Lightweight Concurrent Execution
### What They Are
A goroutine is a function call that executes concurrently with other goroutines in the same address space.
```go
go sendEmail(userID)
```
That single keyword starts a concurrent unit of execution.
### Why Goroutines Exist
OS threads are powerful, but they are relatively expensive to create and manage directly. Go wanted a lighter abstraction so programs could comfortably run thousands or even millions of concurrent tasks, as long as the workload and memory usage made that reasonable.
### How Goroutines Work Internally
Go uses an M:N scheduler. In simplified terms:
- many goroutines are multiplexed onto fewer OS threads
- the runtime scheduler decides which goroutine runs where
- when a goroutine blocks on a channel or in a system call, the runtime parks it or hands its `P` to another thread so other goroutines keep running
The common mental model is `G`, `M`, and `P`:
- `G` is a goroutine
- `M` is an OS thread, called a machine in runtime terminology
- `P` is a processor token that lets Go code execute and carries scheduler state
```mermaid
flowchart LR
G1[Goroutine] --> P1[Processor P]
G2[Goroutine] --> P1
G3[Goroutine] --> P2[Processor P]
P1 --> M1[OS Thread M]
P2 --> M2[OS Thread M]
```
This model lets Go keep concurrency cheap while still using real CPU parallelism when available.
### Why Goroutines Feel Cheap
They start with small stacks (a few kilobytes) that the runtime grows as needed. That is very different from traditional thread models, where each thread may reserve a much larger stack up front.
Still, "cheap" does not mean "free."
Each goroutine has:
- scheduler overhead
- stack memory
- potential references keeping heap data alive
Launching goroutines without bounds in a busy server can still create memory pressure and operational problems.
## Waiting for Goroutines to Finish
When goroutines need coordination, a common tool is `sync.WaitGroup`.
```go
var wg sync.WaitGroup
for _, id := range ids {
	wg.Add(1)
	go func(userID int64) {
		defer wg.Done()
		processUser(userID)
	}(id)
}
wg.Wait()
```
Why it exists:
- lets one part of the program wait for a known set of concurrent tasks
- keeps coordination explicit without using channels for every case
In production code, `WaitGroup` is often simpler than a custom done channel when you only need task completion, not data transfer.
## Channels: Communication and Coordination
### What a Channel Is
A channel is a typed conduit used to send values between goroutines.
```go
jobs := make(chan int)
results := make(chan string)
```
### Why Channels Exist
Go popularized the proverb "Do not communicate by sharing memory; instead, share memory by communicating." The point is not that shared memory is forbidden. The point is that ownership transfer through communication is often easier to reason about than unrestricted shared mutation.
Channels are useful for:
- handing work to workers
- propagating results
- signaling completion
- coordinating pipelines
### Unbuffered Channels
An unbuffered channel requires sender and receiver to synchronize.
```go
done := make(chan struct{})
go func() {
	fmt.Println("work complete")
	done <- struct{}{}
}()
<-done
```
Why this matters:
- send and receive form a handoff point
- it is both data transfer and synchronization
### Buffered Channels
A buffered channel can hold a fixed number of values without an immediate receiver.
```go
queue := make(chan string, 100)
queue <- "task-1"
```
Why buffered channels exist:
- smooth over short bursts
- decouple producer and consumer timing somewhat
- model bounded queues naturally
Do not treat buffering as magic. If the producer outpaces the consumer for long enough, the buffer fills and the next send blocks.
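One way to see the bound in action is a non-blocking send that reports whether the value fit. This is a minimal sketch; `trySend` is a hypothetical helper introduced for illustration.

```go
package main

import "fmt"

// trySend attempts a non-blocking send and reports whether the task was queued.
func trySend(queue chan string, task string) bool {
	select {
	case queue <- task:
		return true
	default:
		// The buffer is full and no receiver is waiting, so do not block.
		return false
	}
}

func main() {
	queue := make(chan string, 2)
	fmt.Println(trySend(queue, "task-1")) // true
	fmt.Println(trySend(queue, "task-2")) // true
	fmt.Println(trySend(queue, "task-3")) // false: buffer is full
	fmt.Println(len(queue), cap(queue))   // 2 2
}
```

Whether to drop, retry, or block when the queue is full is a design decision; the `default` branch simply makes the choice explicit.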
### Closing Channels
Closing a channel means no more values will be sent.
```go
close(queue)
```
Rules that matter:
- only close from the sending side when it owns completion
- do not close a channel just because you are done receiving from it
- sending on a closed channel panics
Receivers can use the two-result form:
```go
value, ok := <-queue
```
When `ok` is false, the channel is closed and drained.
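The closing rules can be sketched end to end as follows. The names `produce` and `sum` are illustrative assumptions; the point is that the sending goroutine closes the channel and the receiver only observes closure.

```go
package main

import "fmt"

// produce sends 1..n and then closes the channel. Only the sender closes,
// because only the sender knows when production is complete.
func produce(n int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for i := 1; i <= n; i++ {
			out <- i
		}
	}()
	return out
}

// sum drains the channel. The range loop ends once the channel is closed
// and every sent value has been received.
func sum(in <-chan int) int {
	total := 0
	for v := range in {
		total += v
	}
	return total
}

func main() {
	ch := produce(3)
	fmt.Println(sum(ch)) // 6

	// Receiving from a closed, drained channel yields the zero value and ok == false.
	v, ok := <-ch
	fmt.Println(v, ok) // 0 false
}
```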
### When Not to Use Channels
Channels are excellent, but not universal. If you just need to protect a shared map or counter, a mutex may be simpler. Overusing channels can make code look concurrent while actually becoming harder to understand.
## `select`: Wait on Multiple Communication Paths
`select` lets a goroutine wait on multiple channel operations.
```go
select {
case result := <-results:
	fmt.Println("got result", result)
case <-time.After(200 * time.Millisecond):
	fmt.Println("timed out")
}
```
Why it exists:
- real systems often wait on multiple events
- timeouts and cancellation are first-class concerns
- many concurrent flows need to react to whichever signal arrives first
### Real-World Use: Timeout and Cancellation
```go
select {
case msg := <-incoming:
	handle(msg)
case <-ctx.Done():
	return ctx.Err()
}
```
This is the backbone of responsive concurrent systems in Go: do work if possible, but remain interruptible.
## The `context` Package: Cancellation, Deadlines, and Scope
### What It Is
`context.Context` carries request-scoped cancellation, deadlines, and small pieces of request metadata across API boundaries.
```go
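func FetchUser(ctx context.Context, id string) (User, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "https://example.internal/users/"+id, nil)
	if err != nil {
		return User{}, err
	}
	// The request carries ctx, so a cancellation or deadline aborts the call.
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return User{}, err
	}
	defer resp.Body.Close()
	// Decoding resp.Body into a User is omitted here.
	return User{}, nil
}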
```
### Why It Exists
In distributed systems, work rarely lives in a single function. An HTTP request may trigger:
- JSON parsing
- database queries
- downstream HTTP calls
- cache lookups
- logging and tracing
If the client disconnects or a deadline expires, you want the whole chain to stop promptly. Context gives the program a standard way to express that control signal.
### How It Works Internally
Contexts form a tree.
- a parent context can be derived into child contexts
- canceling the parent cancels all children
- deadlines propagate downward: a child can shorten the deadline it inherits but cannot extend it
```mermaid
graph TD
A[Background Context] --> B[HTTP Request Context]
B --> C[DB Query Context]
B --> D[Downstream API Context]
B --> E[Worker Task Context]
```
### Rules for Using Context Correctly
- pass it as the first parameter by convention
- do not store it inside structs for long-lived use
- do not use it as a bag of optional business parameters
- respect cancellation by checking `ctx.Done()` or using context-aware APIs
### Common Misuse
Putting every random value into context makes code opaque. Use context values only for request-scoped metadata that crosses process boundaries or middleware layers, such as trace IDs or auth claims when your framework expects it.
## Mutexes, RWMutexes, and Atomics
### Why These Exist Alongside Channels
The slogan "share memory by communicating" is helpful, but it is not a religion. Some problems are fundamentally shared-state problems.
Example:
- protecting a cache map
- incrementing metrics counters
- updating a shared in-memory registry
For these, a mutex is often clearer than designing a special manager goroutine and channel protocol.
### `sync.Mutex`
```go
type Counter struct {
	mu    sync.Mutex
	value int64
}

func (c *Counter) Inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.value++
}
```
Why it works:
- only one goroutine can hold the lock at a time
- the critical section becomes explicit
### `sync.RWMutex`
Useful when reads are much more frequent than writes, but do not assume it is always faster. Its benefits depend on workload and contention patterns.
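A read-mostly cache is the typical shape. The following is a minimal sketch with a hypothetical `Cache` type: readers take the shared lock, writers take the exclusive one.

```go
package main

import (
	"fmt"
	"sync"
)

// Cache is a string map guarded by a read/write lock.
type Cache struct {
	mu   sync.RWMutex
	data map[string]string
}

func NewCache() *Cache {
	return &Cache{data: make(map[string]string)}
}

// Get takes the read lock, so many readers can proceed at once.
func (c *Cache) Get(key string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.data[key]
	return v, ok
}

// Set takes the write lock, which excludes readers and other writers.
func (c *Cache) Set(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[key] = value
}

func main() {
	c := NewCache()
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			c.Set(fmt.Sprintf("k%d", n), "v")
			c.Get("k0")
		}(i)
	}
	wg.Wait()

	_, ok := c.Get("k9")
	fmt.Println(ok) // true
}
```

If a benchmark shows no gain over `sync.Mutex`, the plain mutex is the simpler choice.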
### `sync/atomic`
Atomic operations are useful for low-level counters, flags, and lock-free coordination where the semantics are simple and precise.
Use atomics carefully. They are powerful but easy to misuse if you do not understand memory ordering and invariants.
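For the simple-counter case, a sketch using the typed atomics from `sync/atomic` (available since Go 1.19) could look like this. `countConcurrently` is a hypothetical name used only for the demonstration.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countConcurrently increments one shared counter from n goroutines and
// returns the final value.
func countConcurrently(n int) int64 {
	var hits atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			hits.Add(1) // atomic read-modify-write, no lock needed
		}()
	}
	wg.Wait()
	return hits.Load()
}

func main() {
	fmt.Println(countConcurrently(1000)) // 1000
}
```

This works because the invariant involves a single value. Once two fields must change together, a mutex is usually the safer tool.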
## The Go Memory Model, Lightly Explained
The memory model answers a critical question: when one goroutine writes data, when is another goroutine guaranteed to see it?
If two goroutines access the same variable concurrently without synchronization and at least one access is a write, you have a data race.
This is not a style issue. It is a correctness bug.
### Synchronization Creates Visibility Guarantees
Common happens-before edges include:
- sending on a channel before the corresponding receive completes
- unlocking a mutex before a later lock on that mutex
- closing a channel before receives observe closure
- a `WaitGroup` `Done` call before the return of the `Wait` call it unblocks
If you rely on plain timing, such as "the other goroutine will probably run first," you do not have a guarantee.
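To make one of those edges concrete, the sketch below publishes a plain variable by closing a channel. `loadThenRead` is a hypothetical function; the close and the receive that observes it are what make the read safe.

```go
package main

import "fmt"

// loadThenRead writes a plain variable in one goroutine and reads it in
// another, using a channel close as the synchronization point.
func loadThenRead() string {
	var config string
	ready := make(chan struct{})

	go func() {
		config = "loaded" // plain write
		close(ready)      // publishes the write to anyone who observes the close
	}()

	<-ready       // returns only after the close, so the write is visible
	return config // guaranteed to be "loaded"
}

func main() {
	fmt.Println(loadThenRead()) // loaded
}
```

Removing the `<-ready` line would turn this into a data race, and the race detector would flag it.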
### Why This Matters in Production
Data races can pass tests and still fail under load, on different CPUs, or only once every few days. That is why race bugs are among the most frustrating backend failures.
Use the race detector early:
```bash
go test -race ./...
```
## Concurrency Patterns You Will Actually Use
### Worker Pool
Useful when you have many jobs but want bounded concurrency.
```go
func worker(id int, jobs <-chan int, results chan<- int) {
	for job := range jobs {
		results <- job * 2
	}
}
```
```mermaid
flowchart LR
A[Job Producer] --> B[Buffered Jobs Channel]
B --> C[Worker 1]
B --> D[Worker 2]
B --> E[Worker 3]
C --> F[Results Channel]
D --> F
E --> F
```
Why it exists:
- prevents unbounded goroutine creation
- smooths throughput
- matches CPU or downstream capacity constraints
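A complete pool around that kind of worker might be wired up as in this sketch. `poolWorker` and `runPool` are hypothetical names, and both channels are sized to the input length purely to keep the example short.

```go
package main

import (
	"fmt"
	"sync"
)

// poolWorker doubles each job until the jobs channel is closed.
func poolWorker(jobs <-chan int, results chan<- int, wg *sync.WaitGroup) {
	defer wg.Done()
	for job := range jobs {
		results <- job * 2
	}
}

// runPool processes inputs with a fixed number of workers and returns the
// sum of the results. Completion order is not deterministic, so the sketch
// aggregates instead of relying on order.
func runPool(inputs []int, workers int) int {
	jobs := make(chan int, len(inputs))
	results := make(chan int, len(inputs))

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go poolWorker(jobs, results, &wg)
	}

	for _, in := range inputs {
		jobs <- in
	}
	close(jobs) // the producer owns this channel and signals completion

	wg.Wait()
	close(results) // safe: every sender has returned

	total := 0
	for r := range results {
		total += r
	}
	return total
}

func main() {
	fmt.Println(runPool([]int{1, 2, 3, 4, 5}, 3)) // 30
}
```

The number of workers, not the number of jobs, bounds how many goroutines exist at once.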
### Fan-Out and Fan-In
This pattern sends work to multiple goroutines and merges results back together. It is common in API aggregation, search, and parallel I/O.
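A minimal sketch of the pattern, assuming a caller-supplied `fetch` function in place of real service calls (`fetchAll` is a hypothetical name):

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// fetchAll calls fetch for every source concurrently (fan-out) and gathers
// the results through a single channel (fan-in).
func fetchAll(sources []string, fetch func(string) string) []string {
	results := make(chan string)
	var wg sync.WaitGroup

	for _, src := range sources {
		wg.Add(1)
		go func(s string) {
			defer wg.Done()
			results <- fetch(s)
		}(src)
	}

	// Close the merged channel once every sender has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	var out []string
	for r := range results {
		out = append(out, r)
	}
	sort.Strings(out) // arrival order is nondeterministic, so sort for stable output
	return out
}

func main() {
	got := fetchAll([]string{"profile", "inventory", "pricing"}, func(s string) string {
		return s + ":ok"
	})
	fmt.Println(got) // [inventory:ok pricing:ok profile:ok]
}
```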
### Pipelines
Each stage reads from an input channel, transforms data, and sends to an output channel. This is useful for streaming transformations, though you must design cancellation carefully or you can leak goroutines when downstream stops consuming.
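The leak risk is easiest to see with a producer that never ends on its own. In this sketch (all names hypothetical), every send is paired with a `ctx.Done()` case so both stages exit once the consumer stops reading.

```go
package main

import (
	"context"
	"fmt"
)

// generate emits 1, 2, 3, ... until the context is canceled.
func generate(ctx context.Context) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for i := 1; ; i++ {
			select {
			case out <- i:
			case <-ctx.Done():
				return
			}
		}
	}()
	return out
}

// square forwards the square of each input and stops on cancellation.
func square(ctx context.Context, in <-chan int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for v := range in {
			select {
			case out <- v * v:
			case <-ctx.Done():
				return
			}
		}
	}()
	return out
}

// firstSquares consumes n values and then cancels the pipeline.
func firstSquares(n int) []int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel() // lets both stages return instead of blocking on a send

	squares := square(ctx, generate(ctx))
	var got []int
	for len(got) < n {
		got = append(got, <-squares)
	}
	return got
}

func main() {
	fmt.Println(firstSquares(3)) // [1 4 9]
}
```

Without the `ctx.Done()` cases, both stage goroutines would block forever on their next send after `firstSquares` returns.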
### Bounded Semaphores with Channels
A buffered channel can act as a semaphore controlling how many operations run at once. This is handy for limiting downstream API calls or database work.
## Real-World Usage Patterns
### HTTP Request Fan-Out
An API gateway might receive one request, then concurrently ask a profile service, inventory service, and pricing service for data. Context cancellation ensures that if the client goes away or a deadline expires, those downstream calls stop too.
### Background Job Processing
A worker service reading from a queue often uses:
- one intake goroutine
- a bounded worker pool
- retry logic
- context cancellation for shutdown
- metrics on success, failure, and latency
### Streaming and Event Processing
Go is good at managing concurrent streams from sockets, brokers, or internal pipelines because goroutines map well to independent flows of work.
## Common Mistakes and Misconceptions
### Mistake: Spawning Unbounded Goroutines
If every request starts many goroutines without a limit, memory and scheduler pressure can explode under load.
### Mistake: Forgetting Cancellation
Goroutines that wait forever on channels, I/O, or timers become leaks. In servers, leaked goroutines are a real operational bug.
### Mistake: Closing Channels from the Wrong Side
Channel closure should usually be owned by the sender that knows when production is complete.
### Mistake: Using Channels for Everything
Sometimes a mutex is the simplest and most correct tool.
### Mistake: Assuming Concurrent Means Safe
Starting work in multiple goroutines does not automatically make the code synchronized. Shared state still needs a correctness story.
### Mistake: Ignoring the Race Detector
If you write concurrent Go and do not run `go test -race`, you are skipping one of the most useful safety tools in the ecosystem.
### Mistake: Misusing Context Values
Context is for cancellation, deadlines, and narrow request-scoped metadata. It is not general dependency injection.
## Summary
Go concurrency is powerful because it combines a simple source-level model with strong runtime support.
- goroutines make concurrent work cheap to express
- channels coordinate ownership transfer and signaling
- `select` handles multiple events, timeouts, and cancellation
- mutexes and atomics remain essential for shared-state problems
- `context` is the control plane for request-scoped work
- the memory model and race detector protect correctness when multiple goroutines interact
The next step is learning how to organize real Go codebases: packages, modules, tests, benchmarks, and the toolchain that keeps production Go code clean and maintainable.