tarun-elango be31df2d44 more text
2026-04-26 14:09:04 -04:00

Go: Real-World System Design in Go

Learning Objectives

  • Understand how Go is used to build production HTTP services and distributed systems.
  • Learn the request lifecycle in net/http and how handlers interact with context.
  • Design services with clear boundaries, sane package structure, and operational safety.
  • Apply timeouts, cancellation, connection reuse, and graceful shutdown correctly.
  • Recognize common backend patterns in Go for workers, queues, caches, and external service calls.
  • Reason about real-world tradeoffs rather than just writing syntax-correct code.

Why Go Works Well for Production Systems

By the time you reach system design, you should stop thinking of Go as just a language and start thinking of it as an operating model.

Go is attractive in production because it combines:

  • native binaries that are easy to ship
  • a concurrency model that fits network services well
  • a standard library strong enough to build real servers
  • explicit errors and visible control flow
  • tooling that supports fast feedback and straightforward CI

That combination makes Go common in:

  • REST and JSON APIs
  • RPC services
  • control-plane components
  • stream and queue consumers
  • gateways and reverse proxies
  • schedulers and automation tooling

The HTTP Request Lifecycle in Go

What net/http Gives You

Go's standard library includes both an HTTP server and HTTP client. You do not need a framework to build a real API.

At a high level:

  1. the server listens on a socket
  2. connections are accepted
  3. requests are parsed
  4. a handler is invoked
  5. the handler writes a response

Conceptually, the flow looks like this:

graph TD
    A[Client] --> B[Load Balancer]
    B --> C[Go HTTP Server]
    C --> D[Middleware]
    D --> E[Handler]
    E --> F[Service Layer]
    F --> G[Database]
    F --> H[Cache]
    F --> I[Downstream API]

Why This Model Is Powerful

The standard library model is deliberately small:

  • http.Handler is just an interface
  • middleware is ordinary function composition
  • routing can be simple or sophisticated depending on need

This keeps the underlying mechanics easy to understand. Even if you later use a router or framework, it usually plugs into the same http.Handler shape.
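To make "middleware is ordinary function composition" concrete, here is a minimal sketch. The names Middleware, withHeader, chain, and demoChain are illustrative and not part of net/http; only http.Handler, http.HandlerFunc, and the httptest helpers come from the standard library.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Middleware is an illustrative name for a function that wraps a handler.
type Middleware func(http.Handler) http.Handler

// withHeader returns middleware that sets one response header, then calls next.
func withHeader(key, value string) Middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			w.Header().Set(key, value)
			next.ServeHTTP(w, r)
		})
	}
}

// chain wraps h so that the first middleware listed runs outermost.
func chain(h http.Handler, mws ...Middleware) http.Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// demoChain serves one in-memory request and reports what a client would see.
func demoChain() (status int, header string, body string) {
	hello := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "hello")
	})
	h := chain(hello, withHeader("X-Service", "demo"))

	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Code, rec.Header().Get("X-Service"), rec.Body.String()
}

func main() {
	fmt.Println(demoChain())
}
```

Because every layer has the same http.Handler shape, logging, auth, or recovery middleware can be added to the chain without touching the handler itself.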

Internal Behavior That Matters

You do not need to memorize the internals, but you should know the operational consequences:

  • the server can handle many requests concurrently
  • handler code must therefore be safe under concurrency
  • request bodies and response writers are tied to request lifetime
  • request contexts are canceled when the client's connection closes, when the request is canceled (with HTTP/2), or when the handler returns

That last point is critical. Context is not decoration. It is how Go propagates lifecycle control through the call stack.
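A small sketch of what honoring the request context looks like inside a handler. The handler name slowHandler and the helper statusFor are hypothetical; the deadline is attached by hand here so the example can run in memory without a real client.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// slowHandler simulates work lasting `work`, but stops early if the request
// context is canceled.
func slowHandler(work time.Duration) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-time.After(work):
			fmt.Fprint(w, "done")
		case <-r.Context().Done():
			// A disconnected client never sees this response; the status is
			// written only so the in-memory demo can observe the outcome.
			http.Error(w, "request canceled", http.StatusServiceUnavailable)
		}
	}
}

// statusFor runs slowHandler with a request deadline and returns the recorded
// status code. Both durations are in milliseconds.
func statusFor(workMs, deadlineMs int) int {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(deadlineMs)*time.Millisecond)
	defer cancel()

	req := httptest.NewRequest(http.MethodGet, "/", nil).WithContext(ctx)
	rec := httptest.NewRecorder()
	slowHandler(time.Duration(workMs)*time.Millisecond).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(1, 1000))  // work finishes before the deadline
	fmt.Println(statusFor(1000, 10)) // deadline fires first
}
```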

Building an Idiomatic HTTP Service

Here is a small but production-minded shape for a service:

type UserStore interface {
    Create(ctx context.Context, user User) error
}

type App struct {
    logger *slog.Logger
    store  UserStore
}

func NewApp(logger *slog.Logger, store UserStore) *App {
    return &App{logger: logger, store: store}
}

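func (a *App) routes() http.Handler {
    mux := http.NewServeMux()
    mux.HandleFunc("/health", a.handleHealth)
    mux.HandleFunc("/users", a.handleCreateUser)
    // logging and recover are middleware methods on App (not shown here);
    // logging runs outermost, so it also observes recovered panics.
    return a.logging(a.recover(mux))
}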

func (a *App) handleHealth(w http.ResponseWriter, r *http.Request) {
    writeJSON(w, http.StatusOK, map[string]string{"status": "ok"})
}

func (a *App) handleCreateUser(w http.ResponseWriter, r *http.Request) {
    if r.Method != http.MethodPost {
        http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
        return
    }

    var req struct {
        Name  string `json:"name"`
        Email string `json:"email"`
    }

    if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
        http.Error(w, "invalid json", http.StatusBadRequest)
        return
    }

    user := User{Name: req.Name, Email: req.Email}

    if err := a.store.Create(r.Context(), user); err != nil {
        a.logger.Error("create user", "err", err)
        http.Error(w, "internal server error", http.StatusInternalServerError)
        return
    }

    writeJSON(w, http.StatusCreated, user)
}

What This Example Demonstrates

  • constructor-based dependency injection
  • interfaces at the boundary where behavior matters
  • handlers that stay thin and pass context downward
  • standard library routing and JSON handling
  • explicit error handling rather than hidden control flow

This shape scales well. You can add middleware, tracing, validation, auth, metrics, and graceful shutdown without replacing the whole architecture.
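The handlers above call writeJSON, which the example does not define. One plausible shape is sketched below; the body is an assumption, not the original author's helper, and render exists only to exercise it in memory.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// writeJSON encodes v and writes it with the given status code.
// Encoding happens first so a failure can still be reported as a 500.
func writeJSON(w http.ResponseWriter, status int, v any) {
	body, err := json.Marshal(v)
	if err != nil {
		http.Error(w, "internal server error", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_, _ = w.Write(append(body, '\n'))
}

// render calls writeJSON against an in-memory recorder and returns the
// status code, Content-Type header, and body that were written.
func render(status int, v any) (int, string, string) {
	rec := httptest.NewRecorder()
	writeJSON(rec, status, v)
	return rec.Code, rec.Header().Get("Content-Type"), rec.Body.String()
}

func main() {
	fmt.Println(render(http.StatusOK, map[string]string{"status": "ok"}))
}
```

Marshaling before writing the header costs one buffer, but it avoids sending a 200 status followed by a truncated body when encoding fails.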

Context in Production Request Paths

Why Context Is Central

When an HTTP request comes in, the context attached to it should usually flow through all downstream operations.

Example chain:

  • HTTP handler receives request
  • service layer validates business logic
  • repository executes SQL query
  • service makes a downstream HTTP call
  • background operation respects cancellation if appropriate

If the client disconnects or a request-scoped deadline expires, the resulting context cancellation should stop the rest of the work.

Practical Rules

  • pass ctx explicitly as the first parameter
  • use NewRequestWithContext for outbound HTTP
  • use database APIs that accept context
  • never replace request context with context.Background() in the middle of request processing unless you are intentionally detaching work
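The rules above can be sketched for an outbound call. The function names fetch and demoFetch, the /fast and /slow paths, and the 300 ms deadline are all assumptions made for illustration; the throwaway server stands in for a real downstream dependency.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetch performs a GET bound to ctx, so canceling ctx aborts the call.
func fetch(ctx context.Context, client *http.Client, url string) (string, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return "", err
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

// demoFetch shares one request-scoped deadline across a fast and a slow
// downstream call and reports what happened to each.
func demoFetch() (body string, slowTimedOut bool) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/slow" {
			select {
			case <-time.After(2 * time.Second):
			case <-r.Context().Done():
				return
			}
		}
		fmt.Fprint(w, "pong")
	}))
	defer srv.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 300*time.Millisecond)
	defer cancel()

	body, _ = fetch(ctx, srv.Client(), srv.URL+"/fast")
	_, err := fetch(ctx, srv.Client(), srv.URL+"/slow")
	return body, errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println(demoFetch())
}
```

In a real handler, ctx would be r.Context() (optionally narrowed with context.WithTimeout) rather than one built from context.Background().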

Timeouts and Deadlines

Timeouts are not just protection against slowness. They are protection against resource exhaustion.

Without timeouts:

  • goroutines can pile up waiting on I/O
  • file descriptors remain occupied
  • request latency can become unbounded
  • downstream incidents can cascade back into your service

Good Go services apply timeouts at multiple layers:

  • incoming server read and header timeouts
  • request-scoped deadlines via context
  • outbound client timeouts
  • database query timeouts
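As a rough sketch of the incoming-side layers, the server below sets connection-level timeouts and wraps its handler in http.TimeoutHandler for a per-request deadline. The function names and every duration are placeholders to tune per service, not recommendations.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// newServer builds a server with timeouts at several layers.
func newServer(addr string, h http.Handler, perRequest time.Duration) *http.Server {
	return &http.Server{
		Addr: addr,
		// TimeoutHandler gives each request a context deadline and replies
		// with 503 if the wrapped handler overruns it.
		Handler:           http.TimeoutHandler(h, perRequest, "request timed out"),
		ReadHeaderTimeout: 2 * time.Second,  // slow or stalled request headers
		ReadTimeout:       5 * time.Second,  // whole request, including body
		WriteTimeout:      10 * time.Second, // writing the response
		IdleTimeout:       60 * time.Second, // idle keep-alive connections
	}
}

// demoTimeout runs a handler that needs workMs against a per-request limit of
// limitMs and returns the status code a client would receive.
func demoTimeout(workMs, limitMs int) int {
	work := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-time.After(time.Duration(workMs) * time.Millisecond):
			fmt.Fprint(w, "ok")
		case <-r.Context().Done():
			// Deadline reached: stop working and let TimeoutHandler respond.
		}
	})
	srv := newServer(":8080", work, time.Duration(limitMs)*time.Millisecond)

	rec := httptest.NewRecorder()
	srv.Handler.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Code
}

func main() {
	fmt.Println(demoTimeout(1, 500))
	fmt.Println(demoTimeout(1000, 20))
}
```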

Graceful Shutdown

What It Means

Graceful shutdown means the process stops accepting new work, gives in-flight work a chance to finish within a bounded time, and then exits cleanly.

This matters for:

  • rolling deployments
  • autoscaling events
  • node drains in orchestration platforms
  • operator-triggered restarts

Example

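func run() error {
    logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))
    app := NewApp(logger, newInMemoryStore())

    srv := &http.Server{
        Addr:              ":8080",
        Handler:           app.routes(),
        ReadHeaderTimeout: 2 * time.Second,
    }

    ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
    defer stop()

    // Serve in a goroutine so run can wait for either a serve failure or a signal.
    serveErr := make(chan error, 1)
    go func() {
        serveErr <- srv.ListenAndServe()
    }()

    select {
    case err := <-serveErr:
        // The server stopped without a signal, for example because the port is in use.
        return err
    case <-ctx.Done():
    }

    shutdownCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()

    // Shutdown blocks until in-flight requests finish or the timeout expires.
    // ListenAndServe returns as soon as Shutdown is called, so run must not
    // return before Shutdown does, or main could exit mid-request.
    if err := srv.Shutdown(shutdownCtx); err != nil {
        return err
    }

    if err := <-serveErr; err != nil && !errors.Is(err, http.ErrServerClosed) {
        return err
    }

    return nil
}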

Why It Matters Internally

If you just kill the process abruptly:

  • in-flight requests get dropped
  • partial writes may occur
  • background jobs may stop mid-operation
  • queue acknowledgments may be inconsistent

Graceful shutdown is an operational correctness feature, not just polish.

Outbound Clients, Connection Pools, and Resource Reuse

HTTP Clients

Creating a new http.Client with its own http.Transport per request is usually a mistake. The transport owns the pool of idle connections, so it has to be long-lived and shared for connections to be reused.

Why reuse matters:

  • avoids needless TCP and TLS setup costs
  • improves latency
  • reduces load on downstream services

At the same time, http.DefaultClient and a zero-value http.Client set no overall request timeout, so the defaults are not enough for every production case. You usually want explicit timeouts and transport tuning.
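A minimal sketch of a shared client with explicit limits follows. The constructor name newHTTPClient and all of the numbers are assumptions to adjust per dependency; the local test server only exists to show the overall timeout firing.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"net/url"
	"time"
)

// newHTTPClient builds one client meant to be created once and shared.
func newHTTPClient(overall time.Duration) *http.Client {
	transport := &http.Transport{
		Proxy: http.ProxyFromEnvironment,
		DialContext: (&net.Dialer{
			Timeout:   5 * time.Second,
			KeepAlive: 30 * time.Second,
		}).DialContext,
		MaxIdleConns:          100,
		MaxIdleConnsPerHost:   10,
		IdleConnTimeout:       90 * time.Second,
		TLSHandshakeTimeout:   5 * time.Second,
		ResponseHeaderTimeout: 5 * time.Second,
	}
	// Timeout bounds the whole exchange, including reading the body.
	return &http.Client{Transport: transport, Timeout: overall}
}

// get reads the full body and closes it so the connection can be reused.
func get(client *http.Client, rawURL string) (string, error) {
	resp, err := client.Get(rawURL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

// demoClient calls a fast and a slow endpoint through the same client.
func demoClient() (body string, timedOut bool) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/slow" {
			select {
			case <-time.After(2 * time.Second):
			case <-r.Context().Done():
				return
			}
		}
		fmt.Fprint(w, "pong")
	}))
	defer srv.Close()

	client := newHTTPClient(200 * time.Millisecond)
	defer client.CloseIdleConnections()

	body, _ = get(client, srv.URL+"/fast")
	_, err := get(client, srv.URL+"/slow")

	var urlErr *url.Error
	timedOut = errors.As(err, &urlErr) && urlErr.Timeout()
	return body, timedOut
}

func main() {
	fmt.Println(demoClient())
}
```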

Database Pools

Packages like database/sql manage connection pools. Your job is to configure them sanely and use context-aware operations.

Important operational knobs include:

  • max open connections
  • max idle connections
  • connection lifetime

These are part of system design, not just code details. Wrong pool settings can overload databases or starve your service.
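The knobs above map onto database/sql methods as sketched below. To keep the example runnable without a real driver, it uses a stub connector that always fails to connect; stubConnector, stubDriver, configurePool, and the specific limits are all hypothetical.

```go
package main

import (
	"context"
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
	"time"
)

var errNoDatabase = errors.New("stub connector: no real database")

// stubConnector stands in for a real driver so the example has no dependencies.
type stubConnector struct{}

func (stubConnector) Connect(context.Context) (driver.Conn, error) { return nil, errNoDatabase }
func (stubConnector) Driver() driver.Driver                        { return stubDriver{} }

type stubDriver struct{}

func (stubDriver) Open(string) (driver.Conn, error) { return nil, errNoDatabase }

// configurePool applies pool limits. The values are placeholders that should be
// sized against the database's own connection limit and the number of replicas.
func configurePool(db *sql.DB) {
	db.SetMaxOpenConns(25)                  // cap on total connections
	db.SetMaxIdleConns(10)                  // idle connections kept for reuse
	db.SetConnMaxLifetime(30 * time.Minute) // recycle long-lived connections
	db.SetConnMaxIdleTime(5 * time.Minute)  // drop connections idle too long
}

// demoPool configures a pool and issues a context-bound ping.
func demoPool() (maxOpen int, pingFailed bool) {
	db := sql.OpenDB(stubConnector{})
	defer db.Close()
	configurePool(db)

	// Context-aware calls let a deadline bound the wait for a connection too.
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	err := db.PingContext(ctx)

	return db.Stats().MaxOpenConnections, err != nil
}

func main() {
	fmt.Println(demoPool())
}
```

With a real driver, sql.Open or the driver's own connector replaces the stub, and db.Stats() becomes a useful source for pool saturation metrics.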

Service Architecture Patterns in Go

Composition Root in main

In Go, main often acts as the composition root where you:

  • load config
  • initialize logging
  • create clients and stores
  • wire dependencies together
  • start the server or worker

This keeps wiring visible and avoids magic containers.

Thin Handlers, Clear Services

A healthy pattern is:

  • handlers translate transport concerns
  • services handle business logic
  • repositories or clients handle I/O boundaries

Do not turn this into rigid architecture theater. The point is clarity, not layers for their own sake.

Interfaces at Edges

Use interfaces where they help isolate external systems or enable tests. Do not create an interface for every struct just because a pattern from another language told you to.

Background Workers and Queues

Go is also strong for worker processes.

Common worker responsibilities:

  • poll or receive jobs
  • decode payloads
  • apply business logic
  • talk to storage or downstream services
  • retry or dead-letter on failure

A production worker often combines:

  • bounded concurrency
  • context-driven shutdown
  • idempotent processing
  • metrics and tracing
  • retry with backoff
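Bounded concurrency and context-driven shutdown can be sketched with a fixed set of goroutines reading from one channel. The names runWorkers and demoWorkers are illustrative, and integer jobs stand in for real payloads.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// runWorkers starts n workers and blocks until the jobs channel is closed or
// ctx is canceled. It returns how many jobs were handled.
func runWorkers(ctx context.Context, n int, jobs <-chan int, handle func(context.Context, int)) int {
	var (
		wg        sync.WaitGroup
		mu        sync.Mutex
		processed int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				select {
				case <-ctx.Done():
					return // shutdown requested
				case job, ok := <-jobs:
					if !ok {
						return // no more work
					}
					handle(ctx, job)
					mu.Lock()
					processed++
					mu.Unlock()
				}
			}
		}()
	}
	wg.Wait()
	return processed
}

// demoWorkers feeds jobCount jobs to a pool and records the highest number of
// jobs that were in flight at the same time.
func demoWorkers(workers, jobCount int) (processed, peak int) {
	jobs := make(chan int, jobCount)
	for i := 0; i < jobCount; i++ {
		jobs <- i
	}
	close(jobs)

	var mu sync.Mutex
	inFlight := 0
	handle := func(ctx context.Context, job int) {
		mu.Lock()
		inFlight++
		if inFlight > peak {
			peak = inFlight
		}
		mu.Unlock()

		time.Sleep(time.Millisecond) // simulated work

		mu.Lock()
		inFlight--
		mu.Unlock()
	}

	processed = runWorkers(context.Background(), workers, jobs, handle)
	return processed, peak
}

func main() {
	fmt.Println(demoWorkers(3, 10))
}
```

The worker count is the concurrency bound: no matter how many jobs are queued, at most n run at once, which protects downstream pools and quotas.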

Why Idempotency Matters

Distributed systems retry. That means your worker or API should behave safely when the same logical operation arrives more than once.

Examples:

  • charging an order only once
  • ignoring duplicate event delivery with a deduplication key
  • using upserts or unique constraints to protect state transitions
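The deduplication-key idea can be shown with an in-memory stand-in. Ledger and its methods are hypothetical; a production system would usually enforce the same rule with a unique constraint or upsert in durable storage rather than a map.

```go
package main

import (
	"fmt"
	"sync"
)

// Ledger applies each charge at most once per idempotency key.
type Ledger struct {
	mu    sync.Mutex
	seen  map[string]bool
	total int
}

func NewLedger() *Ledger {
	return &Ledger{seen: make(map[string]bool)}
}

// Charge records amountCents under key. It returns false, and changes nothing,
// if the key has already been processed.
func (l *Ledger) Charge(key string, amountCents int) bool {
	l.mu.Lock()
	defer l.mu.Unlock()

	if l.seen[key] {
		return false // duplicate delivery or client retry
	}
	l.seen[key] = true
	l.total += amountCents
	return true
}

// Total returns the sum of all applied charges.
func (l *Ledger) Total() int {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.total
}

// demoLedger delivers the same logical charge twice.
func demoLedger() (first, second bool, total int) {
	ledger := NewLedger()
	first = ledger.Charge("order-123", 500)
	second = ledger.Charge("order-123", 500) // retried delivery
	return first, second, ledger.Total()
}

func main() {
	fmt.Println(demoLedger())
}
```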

Resilience Patterns in Go Services

Retries

Retries can improve reliability, but they are dangerous when used carelessly.

Use retries when:

  • the error is transient
  • the operation is safe to retry
  • you apply limits and backoff

Do not blindly retry every failure. That can turn a partial outage into a full overload event.
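A minimal sketch of a bounded retry with exponential backoff, under the assumption that transient failures are marked with a sentinel error. errTransient, retry, and demoRetry are hypothetical names, and jitter is left out to keep the example short even though real systems usually add it.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errTransient marks failures that are worth retrying. Real code would
// classify errors from its own dependencies instead.
var errTransient = errors.New("transient failure")

// retry calls op up to attempts times, doubling the delay after each transient
// failure. It stops early on success, on a non-transient error, or when ctx ends.
func retry(ctx context.Context, attempts int, base time.Duration, op func() error) error {
	var err error
	delay := base
	for i := 0; i < attempts; i++ {
		err = op()
		if err == nil || !errors.Is(err, errTransient) {
			return err
		}
		if i == attempts-1 {
			break // out of attempts; do not sleep again
		}
		select {
		case <-time.After(delay):
			delay *= 2
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return err
}

// demoRetry runs an operation that fails transiently `failures` times before
// succeeding, and reports how many calls were made and whether it succeeded.
func demoRetry(failures, attempts int) (calls int, ok bool) {
	op := func() error {
		calls++
		if calls <= failures {
			return fmt.Errorf("call %d: %w", calls, errTransient)
		}
		return nil
	}
	err := retry(context.Background(), attempts, time.Millisecond, op)
	return calls, err == nil
}

func main() {
	fmt.Println(demoRetry(2, 5))
	fmt.Println(demoRetry(10, 3))
}
```

The attempt limit and the context check are what keep retries from amplifying an outage: the caller gives up after a bounded amount of work or as soon as the request is no longer wanted.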

Backpressure and Bounded Concurrency

Every service has finite CPU, memory, DB connections, and downstream quota. Good Go systems acknowledge this with:

  • worker pool limits
  • channel buffer sizing based on real capacity, not guesswork
  • request timeouts
  • queue sizing and shedding strategies
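One way to express a shedding strategy is a middleware that caps in-flight requests with a buffered channel used as a semaphore. limitInFlight and demoLimit are illustrative names, and replying 503 immediately is one policy among several (queuing with a deadline is another).

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// limitInFlight allows at most max concurrent requests into next and sheds
// the rest with 503 instead of letting them queue without bound.
func limitInFlight(max int, next http.Handler) http.Handler {
	sem := make(chan struct{}, max)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case sem <- struct{}{}:
			defer func() { <-sem }()
			next.ServeHTTP(w, r)
		default:
			http.Error(w, "server busy", http.StatusServiceUnavailable)
		}
	})
}

// demoLimit holds one request inside the handler, sends a second one while the
// limit of 1 is reached, and returns both status codes.
func demoLimit() (first, second int) {
	entered := make(chan struct{})
	release := make(chan struct{})
	slow := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		close(entered) // only the admitted request gets here
		<-release
		fmt.Fprint(w, "ok")
	})
	h := limitInFlight(1, slow)

	firstRec := httptest.NewRecorder()
	done := make(chan struct{})
	go func() {
		defer close(done)
		h.ServeHTTP(firstRec, httptest.NewRequest(http.MethodGet, "/", nil))
	}()
	<-entered // the first request now occupies the only slot

	secondRec := httptest.NewRecorder()
	h.ServeHTTP(secondRec, httptest.NewRequest(http.MethodGet, "/", nil))

	close(release)
	<-done
	return firstRec.Code, secondRec.Code
}

func main() {
	fmt.Println(demoLimit())
}
```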

Caching

Caches reduce latency and downstream load, but they introduce staleness, invalidation complexity, and memory pressure.

In Go services, a cache may be:

  • in-memory with mutex protection
  • external like Redis
  • layered with local plus remote caching

Choose based on consistency needs and failure modes, not just speed.
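For the in-memory option, a sketch of a mutex-protected cache with a time-to-live is shown below. Cache, NewCache, and the injected clock are assumptions made for the example; expired entries are only hidden here, not evicted, so a real cache would also need a size bound or cleanup to control memory pressure.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type entry struct {
	value   string
	expires time.Time
}

// Cache is a small TTL cache that is safe for concurrent use.
type Cache struct {
	mu    sync.RWMutex
	ttl   time.Duration
	now   func() time.Time // injected so tests can control time
	items map[string]entry
}

func NewCache(ttl time.Duration, now func() time.Time) *Cache {
	return &Cache{ttl: ttl, now: now, items: make(map[string]entry)}
}

// Set stores value under key until the TTL elapses.
func (c *Cache) Set(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = entry{value: value, expires: c.now().Add(c.ttl)}
}

// Get returns the value and true on a fresh hit, or false if the key is
// missing or its entry has expired.
func (c *Cache) Get(key string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	e, ok := c.items[key]
	if !ok || !c.now().Before(e.expires) {
		return "", false
	}
	return e.value, true
}

// demoCache reads a key before and after its TTL using a fake clock.
func demoCache() (hitWhileFresh, hitAfterExpiry bool) {
	current := time.Unix(0, 0)
	cache := NewCache(time.Minute, func() time.Time { return current })

	cache.Set("user:1", "Ada")
	_, hitWhileFresh = cache.Get("user:1")

	current = current.Add(2 * time.Minute) // advance past the TTL
	_, hitAfterExpiry = cache.Get("user:1")
	return hitWhileFresh, hitAfterExpiry
}

func main() {
	fmt.Println(demoCache())
}
```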

Observability: Systems Need to Explain Themselves

Logging

Structured logs are easier to query and correlate than ad hoc strings. Go's log/slog, part of the standard library since Go 1.21, is a good default in modern code.
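A short sketch of what structured output buys you: each log line is a JSON object whose fields can be parsed and queried. The field names service and user_id are made up for the example, and the log is written to a buffer only so it can be read back.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"log/slog"
)

// demoLog writes one structured error record and parses it back the way a log
// pipeline would, returning a few of its fields.
func demoLog() (level, msg, errText string) {
	var buf bytes.Buffer

	// With attaches fields that appear on every record from this logger.
	logger := slog.New(slog.NewJSONHandler(&buf, nil)).With("service", "users")
	logger.Error("create user", "err", errors.New("db unavailable"), "user_id", 42)

	var fields map[string]any
	if err := json.Unmarshal(buf.Bytes(), &fields); err != nil {
		return "", "", err.Error()
	}
	level, _ = fields["level"].(string)
	msg, _ = fields["msg"].(string)
	errText, _ = fields["err"].(string)
	return level, msg, errText
}

func main() {
	fmt.Println(demoLog())
}
```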

Metrics

Metrics help answer:

  • how many requests or jobs are happening
  • how often errors occur
  • how long operations take
  • whether queues, pools, or workers are saturating

Tracing

Tracing becomes valuable once a request crosses multiple services. Go's context propagation model fits tracing naturally because trace metadata can move alongside request lifecycle.

Profiling

When a service is slow or memory-hungry, use profiling rather than guesswork. Go's pprof ecosystem is one of the language's strongest practical advantages.

A Realistic Service Architecture Example

graph TD
    A[Client] --> B[API Gateway]
    B --> C[Go API Service]
    C --> D[Auth Middleware]
    D --> E[Business Service]
    E --> F[(Postgres)]
    E --> G[(Redis Cache)]
    E --> H[Message Broker]
    H --> I[Go Worker Service]
    I --> F
    C --> J[Observability Stack]
    I --> J

Why Go fits this architecture well:

  • API and worker components can share libraries and tooling
  • binaries are easy to containerize
  • concurrency model fits request handling and job processing
  • context propagation helps with cancellation and tracing

Real-World Usage Patterns

JSON API Service

Go is widely used for services that accept JSON, validate input, call storage or other APIs, and return typed responses.

Internal Platform Components

Controllers, schedulers, reconcilers, and long-running agents are natural Go workloads because they need networking, concurrency, and operational predictability.

Data and Event Processing

Consumers and workers benefit from Go's lightweight concurrency and straightforward deployment model.

Common Mistakes and Misconceptions

Mistake: Starting with a Framework Instead of Understanding net/http

Frameworks can help, but you should first understand the handler model underneath them.

Mistake: Ignoring Timeouts

Untimed network calls are operational liabilities.

Mistake: Creating New Clients Per Request

That defeats connection reuse and often harms performance badly.

Mistake: Letting Handlers Contain All Business Logic

This makes testing harder and transport concerns bleed into domain behavior.

Mistake: Launching Background Goroutines Without Shutdown Strategy

Every long-lived goroutine in a service should have a lifecycle story.

Mistake: Overabstracting Everything into Interfaces

Use interfaces deliberately at boundaries, not as decoration.

Mistake: Forgetting That Handlers Run Concurrently

Shared state in a server must be synchronized properly.

Summary

Production Go system design is about making the whole service lifecycle explicit:

  • request entry through net/http
  • context propagation through each downstream operation
  • bounded concurrency and sensible resource reuse
  • graceful shutdown during deploys and failures
  • clear package and dependency boundaries
  • observability and performance measurement built into the operational model

The most important mindset shift is this: idiomatic Go systems are not built around hidden magic. They are built around visible control flow, explicit dependencies, clear boundaries, and operationally honest concurrency.

That is exactly why Go remains such a strong language for backend and distributed systems engineering.