# 2. Identity & Access

Identity and access is the control plane for nearly every backend system. It answers four questions for every request:

1. Who is calling?
2. How do we know they are really that caller?
3. What are they allowed to do right now?
4. How do we prove later that the decision was correct?

If you understand identity and access well, you can reason about login systems, sessions, JWTs, OAuth integrations, enterprise SSO, authorization policies, service-to-service security, and zero-trust architecture as one connected system rather than as isolated buzzwords.

This guide is written for two goals at the same time:

- interview preparation
- real-world backend and system design understanding

The emphasis is practical. The goal is not to memorize definitions, but to understand why these systems exist, how they fail, and how production systems are actually built.

---

## Table of Contents

1. Why Identity & Access Exists
2. Core Concepts and Mental Model
3. Authentication Fundamentals
4. Login and Signup
5. Sessions
6. JWT and Token-Based Authentication
7. OAuth
8. SSO: SAML and OIDC
9. Password Reset
10. Authorization Fundamentals
11. RBAC
12. ABAC
13. Permissions and Access Control
14. Service-to-Service Authentication
15. How These Systems Fit Together
16. Real-World Patterns and Company Examples
17. Interview Discussion Guide
18. Common Mistakes and Best Practices

---

## 1. Why Identity & Access Exists

Most systems are multi-user, multi-device, multi-service, and increasingly multi-tenant. Without identity and access controls, the backend has no safe way to distinguish:

- one user from another
- a user from an attacker
- an employee from a customer
- a production service from a compromised internal service
- a legitimate action from a replayed or forged request

At small scale, identity and access looks like a login form plus a password check. At production scale, it becomes much bigger:

- account creation and identity proofing
- credential storage and recovery
- MFA and risk detection
- sessions and token lifecycle management
- delegated access via OAuth
- enterprise federation via SSO
- role and policy evaluation
- service identity inside microservices
- auditing, revocation, key rotation, and incident response

The reason interviews ask about identity and access so often is simple: it touches security, data modeling, distributed systems, product tradeoffs, and failure handling all at once.

### The Core Tension

Identity systems always balance three goals:

| Goal | What it means | Why it is hard |
| --- | --- | --- |
| Security | Prevent impersonation and unauthorized access | Stronger security usually adds friction |
| Usability | Let real users sign in quickly and recover safely | Easier flows are often easier to abuse |
| Scalability | Support huge traffic, many services, and many tenants | Distributed state and revocation become harder |

An excellent backend engineer treats identity not as a feature checkbox, but as a reliability and security subsystem.

---

## 2. Core Concepts and Mental Model

Before discussing flows, build the right mental model.

### Important Terms

| Term | Meaning | Practical intuition |
| --- | --- | --- |
| Identity | The subject being represented | A user, admin, device, service, or organization |
| Authentication (AuthN) | Verifying who the subject is | "Prove you are Alice" |
| Authorization (AuthZ) | Deciding what that subject may do | "Can Alice read invoice 123?" |
| Session | Server-recognized authenticated continuity over time | "This browser remains logged in" |
| Access token | Credential presented to APIs | Often short-lived |
| Refresh token | Credential used to obtain new access tokens | More sensitive than access tokens |
| Identity Provider (IdP) | System that authenticates identities | Google, Okta, Microsoft Entra ID (formerly Azure AD) |
| Service Provider / Relying Party | App that trusts the IdP | Your SaaS product |
| Policy engine | Evaluates access rules | RBAC, ABAC, ReBAC, custom rules |
| Audit log | Immutable trail of security-relevant events | Needed for forensics and compliance |

### One Request Through the System

```mermaid
sequenceDiagram
	actor User
	participant Client
	participant Edge as API Gateway / Edge
	participant Auth as Auth Service
	participant Policy as Policy Engine
	participant App as Business Service
	participant Data as Data Store

	User->>Client: Click "View invoice"
	Client->>Edge: GET /invoices/123 + cookie/token
	Edge->>Auth: Validate session/token
	Auth-->>Edge: subject, tenant, auth strength, claims
	Edge->>Policy: Can subject read invoice 123?
	Policy-->>Edge: allow/deny + reason
	Edge->>App: Forward authenticated request
	App->>Data: Load resource
	Data-->>App: Resource data
	App-->>Client: 200 OK or 403 Forbidden
```

This is the simplest correct mental model:

- authentication establishes identity
- authorization evaluates permissions for the requested action
- business logic executes only after those checks
- the decision should be observable and auditable
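The ordering above can be sketched as a small request handler. This is a minimal illustration, not a production design: the in-memory `SESSIONS`, `INVOICES`, and `AUDIT_LOG` structures and the tenant-match policy are all hypothetical stand-ins for the auth service, data store, audit pipeline, and policy engine.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Subject:
    user_id: str
    tenant_id: str


# Hypothetical in-memory stand-ins for the session store, data store, and audit pipeline.
SESSIONS = {"sess-abc": Subject(user_id="u1", tenant_id="t1")}
INVOICES = {"123": {"tenant_id": "t1", "total": 4200}}
AUDIT_LOG = []


def authenticate(credential: str) -> Optional[Subject]:
    """AuthN: map a presented credential to a subject, or None if unknown."""
    return SESSIONS.get(credential)


def authorize(subject: Subject, action: str, invoice_id: str) -> bool:
    """AuthZ: toy policy, a subject may read invoices that belong to its tenant."""
    invoice = INVOICES.get(invoice_id)
    return (
        action == "read"
        and invoice is not None
        and invoice["tenant_id"] == subject.tenant_id
    )


def handle_get_invoice(credential: str, invoice_id: str):
    """Run authentication, then authorization, then business logic, auditing each decision."""
    subject = authenticate(credential)
    if subject is None:
        AUDIT_LOG.append({"event": "authn_failed", "invoice": invoice_id})
        return 401, None

    allowed = authorize(subject, "read", invoice_id)
    AUDIT_LOG.append(
        {"event": "authz_decision", "user": subject.user_id,
         "invoice": invoice_id, "allowed": allowed}
    )
    if not allowed:
        return 403, None

    # Business logic runs only after both checks pass.
    return 200, INVOICES[invoice_id]
```

Note that a failed authentication maps to `401` and a failed authorization to `403`; keeping the two outcomes distinct in code mirrors the separation in the mental model.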

### A Production Identity Stack

In a real system, identity and access usually spans these components:

| Component | Typical responsibility |
| --- | --- |
| Auth service | Login, signup, password verification, MFA, token issuance |
| User directory | Users, credentials metadata, verification state, tenant membership |
| Session store | Server-side sessions and revocation state |
| Token service | Access token and refresh token lifecycle |
| Policy engine | Role/attribute-based access decisions |
| Key management | Signing keys, encryption keys, secret rotation |
| Audit pipeline | Security events, admin actions, login failures, policy decisions |
| Risk engine | Rate limits, device reputation, fraud checks, anomaly detection |

Interview shortcut: if you can clearly separate authentication, session/token management, and authorization, you already sound more senior than candidates who collapse them into one vague "auth layer".

---

## 3. Authentication Fundamentals

Authentication is the process of verifying identity claims. The claim is usually, "I am user X" or "I am service Y".

### 3.1 Identity Verification Basics

Authentication depends on evidence. The most common categories are:

| Factor | Example | Strengths | Weaknesses |
| --- | --- | --- | --- |
| Something you know | Password, PIN | Familiar, cheap | Can be guessed, phished, reused |
| Something you have | Phone, authenticator app, hardware key | Stronger than passwords alone | Device loss, recovery complexity |
| Something you are | Fingerprint, Face ID | Convenient on-device UX | Biometric recovery and privacy concerns |

Important nuance: many systems do not verify a human's real-world identity. They verify control over a credential. For example:

- password login verifies knowledge of a password
- email verification verifies access to an inbox
- TOTP verifies possession of a seed-bound authenticator
- passkeys verify possession of a private key and user presence

That is why identity systems often talk about assurance levels rather than absolute truth.

### 3.2 Identifiers vs Authenticators

Two concepts are often mixed up:

- an identifier tells the system which subject is being referenced
- an authenticator proves control over that identity

Examples:

- `alice@example.com` is an identifier
- the password, passkey, or OAuth login is the authenticator

Production systems often support multiple identifiers for the same user:

- email
- username
- phone number
- enterprise SSO subject ID
- internal immutable user ID

Best practice: use a stable internal user ID as the true primary key, even if the login identifier changes.

### 3.3 Credential Storage

This is one of the most common interview topics because it separates surface-level knowledge from real engineering understanding.

#### Never store plaintext passwords

If a database leak reveals plaintext passwords, the incident is catastrophic. Attackers will also try the same passwords on other services because users reuse credentials.

#### Store password hashes, not passwords

The flow is:

1. User submits password.
2. Server generates a per-user salt.
3. Server applies a slow password hashing algorithm.
4. Server stores the resulting hash and metadata.
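5. On login, the server recomputes the hash from the submitted password, the stored salt, and the stored parameters, then compares it with the stored hash.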

Good password hashing algorithms are intentionally expensive. That is the point. They make offline brute force attacks slower.

| Algorithm | Typical status | Why it matters |
| --- | --- | --- |
| Argon2id | Best modern default | Memory-hard and resistant to GPU attacks |
| bcrypt | Still common and acceptable | Widely supported, battle-tested |
| PBKDF2 | Common in legacy and regulated systems | Safer than fast hashes, but less ideal than Argon2id |
| SHA-256 / MD5 alone | Unsafe for password storage | Too fast, easy to brute force |

#### Salt and Pepper

| Mechanism | Purpose |
| --- | --- |
| Salt | Unique random value per password; prevents rainbow-table reuse |
| Pepper | Extra secret held outside the user table, often in KMS/HSM; raises attack cost after DB leaks |

#### Practical Storage Pattern

- store algorithm name and parameters with the hash
- use constant-time comparison to reduce timing leakage
- rehash on login when old parameters are outdated
- keep password policy reasonable; heavy composition rules often push users toward predictable patterns, so favor minimum length and breached-password checks instead
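The storage pattern above can be sketched with the Python standard library. One substitution to note: this sketch uses `hashlib.scrypt` (a memory-hard function available in the standard library when Python is built against OpenSSL 1.1+) as a stand-in for Argon2id or bcrypt, which need third-party packages. The cost parameters and the `$`-separated storage format are assumptions for illustration.

```python
import hashlib
import hmac
import secrets

# Assumed cost parameters for this sketch; tune them for your own hardware.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1}


def hash_password(password: str, params: dict = SCRYPT_PARAMS) -> str:
    """Hash with a fresh per-password salt and store algorithm + parameters alongside."""
    salt = secrets.token_bytes(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, dklen=32, **params)
    fields = ["scrypt", params["n"], params["r"], params["p"], salt.hex(), digest.hex()]
    return "$".join(str(f) for f in fields)


def verify_password(password: str, stored: str):
    """Return (is_valid, needs_rehash) for a submitted password."""
    algo, n, r, p, salt_hex, digest_hex = stored.split("$")
    if algo != "scrypt":
        return False, False
    params = {"n": int(n), "r": int(r), "p": int(p)}
    candidate = hashlib.scrypt(
        password.encode(), salt=bytes.fromhex(salt_hex), dklen=32, **params
    )
    # Constant-time comparison to reduce timing leakage.
    valid = hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
    # Flag hashes made with outdated parameters so the caller can rehash on login.
    return valid, valid and params != SCRYPT_PARAMS
```

Because the parameters travel with the hash, raising the cost later only requires changing `SCRYPT_PARAMS`; existing users are upgraded the next time they log in successfully.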

#### Interview depth point

If an interviewer asks, "Why use bcrypt or Argon2 instead of SHA-256?", the real answer is not just "because it is more secure". The real answer is:

- password databases are often attacked offline after leaks
- attackers can compute billions of SHA-256 hashes per second on commodity GPUs
- slow, memory-hard algorithms make each guess expensive
- cost parameters can be tuned as hardware improves

### 3.4 MFA Basics

Multi-factor authentication exists because passwords are a weak single point of failure.

Common MFA methods:

| Method | Security level | Practical notes |
| --- | --- | --- |
| SMS OTP | Low to medium | Vulnerable to SIM swap and phishing |
| Email OTP | Low | Better than nothing, but email is often the same recovery channel |
| TOTP app | Medium | Common and cheap; still phishable |
| Push approval | Medium | Good UX, but push fatigue attacks exist |
| WebAuthn / passkeys / hardware keys | High | Strong phishing resistance |
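To make the TOTP row concrete, here is a minimal sketch of the RFC 6238 algorithm (HMAC-SHA1 over a time-step counter, then dynamic truncation). The function names and the one-step drift window are assumptions for illustration; a real deployment would also rate-limit attempts and reject reuse of an already accepted code.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, at_time: float, step: int = 30, digits: int = 6) -> str:
    """Compute a time-based one-time password for the given Unix time."""
    counter = int(at_time) // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low nibble of the last byte picks a 4-byte window.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)


def verify_totp(secret: bytes, submitted: str, at_time: float,
                step: int = 30, window: int = 1) -> bool:
    """Accept codes from the current step and `window` steps either side (clock drift)."""
    for drift in range(-window, window + 1):
        expected = totp(secret, at_time + drift * step, step=step)
        if hmac.compare_digest(expected.encode(), submitted.encode()):
            return True
    return False
```

The shared secret is the "seed" the authenticator app holds. Since the code is just a function of the seed and the clock, anyone who phishes a fresh code can replay it within the window, which is why the table marks TOTP as still phishable.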

Production systems often use risk-based MFA rather than always prompting:

- new device
- new geography
- impossible travel
- admin action
- payout or billing change
- password reset or recovery event

This is called step-up authentication.

#### Recovery Matters

Many teams design MFA setup but forget MFA recovery. Good systems provide:

- recovery codes
- alternate authenticators
- carefully controlled support workflows

The recovery flow is often more attackable than the MFA flow itself.

### 3.5 Email Verification

Email verification usually proves inbox control, not human identity. It exists to:

- reduce fake or mistyped accounts
- ensure password reset reachability
- protect downstream systems from garbage identities
- support trust in notifications, billing, and invites

Good implementation details:

- generate a random, single-use token
- store only a hash of the token server-side if possible
- apply a short TTL
- invalidate older outstanding verification tokens after a new one is issued
- avoid leaking whether the account exists during resend flows
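A minimal sketch of the token handling described above, assuming the verification link carries both a user ID and the token. The `_pending` dictionary is a hypothetical stand-in for a database table, and the 15-minute TTL is an arbitrary example value.

```python
import hashlib
import hmac
import secrets

TOKEN_TTL_SECONDS = 15 * 60  # assumed TTL for this sketch

# Hypothetical stand-in for a DB table: user_id -> (token_hash, expires_at).
_pending = {}


def issue_verification_token(user_id: str, now: float) -> str:
    """Create a random token, keep only its hash, and replace any older token."""
    token = secrets.token_urlsafe(32)
    token_hash = hashlib.sha256(token.encode()).hexdigest()
    # Overwriting the entry invalidates any previously issued token for this user.
    _pending[user_id] = (token_hash, now + TOKEN_TTL_SECONDS)
    return token  # goes into the emailed link, never into the database


def redeem_verification_token(user_id: str, token: str, now: float) -> bool:
    """Return True once for a valid, unexpired token; False otherwise."""
    record = _pending.get(user_id)
    if record is None:
        return False
    token_hash, expires_at = record
    presented = hashlib.sha256(token.encode()).hexdigest()
    if now > expires_at or not hmac.compare_digest(presented, token_hash):
        return False
    del _pending[user_id]  # single use
    return True
```

Storing only the hash means a leaked table does not hand an attacker working verification links.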

### 3.6 Device Trust

Device trust tries to answer, "Is this a previously seen, low-risk device?"

Typical signals:

- long-lived device cookie
- browser fingerprinting or device metadata
- last successful MFA on that device
- IP reputation and ASN patterns
- OS or app attestation on mobile

Device trust is useful, but dangerous if over-trusted. Devices can be compromised. Cookies can be stolen. Browsers change. Treat device trust as a risk signal, not a source of truth.

### Authentication Failure Cases

- weak password hashing leads to offline cracking after DB leaks
- email verification links are reusable or never expire
- MFA recovery bypasses stronger checks
- account enumeration leaks whether an email exists
- social login accounts are linked incorrectly to existing local accounts
- device trust becomes an authorization shortcut instead of a risk signal

### Authentication Best Practices

- prefer Argon2id or bcrypt for passwords
- rate-limit login, signup, reset, and verification endpoints
- use MFA for privileged users and step-up auth for sensitive actions
- log auth events with context, but never log secrets or raw passwords
- design credential rotation and recovery before launch, not after an incident

---

## 4. Login and Signup

Signup and login flows are the public entry points to your system. They are also some of the most attacked endpoints you will ever run.

### 4.1 Signup Flow

```mermaid
sequenceDiagram
	actor User
	participant Browser
	participant Auth as Auth API
	participant Risk as Risk / Abuse Service
	participant Users as User DB
	participant Mail as Email Service
	participant Session as Session Store

	User->>Browser: Submit email + password
	Browser->>Auth: POST /signup
	Auth->>Risk: Check IP, velocity, disposable email, device
	Risk-->>Auth: risk score / allow / challenge
	Auth->>Users: Create pending account + password hash
	Auth->>Mail: Send verification link
	Mail-->>User: Verification email
	User->>Browser: Click link
	Browser->>Auth: GET /verify?token=...
	Auth->>Users: Mark email verified
	Auth->>Session: Create session
	Auth-->>Browser: Set secure auth cookie
```

#### What actually happens in production

A robust signup flow usually includes:

1. Input normalization
   Normalize email casing rules carefully, trim whitespace, reject obvious malformed values.
2. Abuse screening
   IP reputation, rate limits, disposable email detection, CAPTCHA when needed, device velocity, and signup bursts by network.
3. Account creation state
   Many systems create users in a `pending_verification` state first.
4. Email verification
   The account may exist but have limited capabilities until verified.
5. Bootstrap domain objects
   For SaaS, create workspace, tenant, default role, billing state, and onboarding tasks.
6. Initial session issuance
   Some systems log the user in immediately after verification. Others require explicit login.

#### Why pending state matters

If you create fully active accounts before verification, you may end up with:

- abandoned fake tenants
- spammed invites or API abuse
- polluted analytics and billing pipelines

### 4.2 Login Flow

The login flow is simpler than signup conceptually, but much more operationally sensitive.

Common steps:

1. Identify account by email/username/federated ID.
2. Fetch credential metadata and account status.
3. Verify password or federated assertion.
4. Evaluate account risk and MFA policy.
5. Create session or issue tokens.
6. Log success or failure for audit and anomaly detection.

A production login decision often depends on more than a password:

- account locked or disabled?
- tenant suspended?
- email verified?
- MFA enrolled?
- device known?
- unusual geography?
- refresh token family compromised?

### 4.3 Signup Verification and Fraud Prevention Basics

Fraud prevention is not just a payments problem. Identity systems are abused for:

- spam account creation
- credential stuffing
- promo abuse
- referral fraud
- fake trial creation
- scraping and automated signups

Basic but effective controls:

| Control | What it helps with |
| --- | --- |
| Rate limiting by IP and identifier | Brute force and signup bursts |
| Device and IP reputation | Known bad networks and bots |
| CAPTCHA or challenge step-up | Automated abuse at suspicious thresholds |
| Email domain heuristics | Disposable inboxes, typo domains |
| Phone verification for high-risk cases | Raises attacker cost |
| Idempotency keys on signup APIs | Retry safety without duplicate accounts |

Interview point: fraud controls are part of auth architecture because attackers do not politely separate "security" from "growth" endpoints.
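As one example of the controls in the table, rate limiting by IP and identifier can be sketched as a sliding-window counter. This is an in-memory illustration under assumed names (`SlidingWindowLimiter`, the key format); a real fleet would keep the counters in a shared store so every app instance sees the same state.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` events per key within any `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of recent allowed events

    def allow(self, key: str, now: float) -> bool:
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


# Usage: limit by both IP and account identifier so neither dimension can be abused alone.
login_limiter = SlidingWindowLimiter(limit=5, window_seconds=60)


def login_attempt_allowed(ip: str, email: str, now: float) -> bool:
    ip_ok = login_limiter.allow("ip:" + ip, now)
    id_ok = login_limiter.allow("id:" + email.lower(), now)
    return ip_ok and id_ok
```

Keying on the identifier as well as the IP matters for credential stuffing, where one account may be targeted from many addresses.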

### 4.4 Social Login Considerations

"Login with Google" or "Login with GitHub" improves user experience, but introduces federation complexity.

Benefits:

- no local password to manage
- faster onboarding
- higher conversion for some user segments

Risks and edge cases:

- provider outage affects sign-in
- incorrect account linking can cause account takeover
- email from provider may be unverified or not globally unique in the way you assume
- enterprise customers may not want personal social identities linked to business workspaces

Best practice for account linking:

- if a social identity is new, do not blindly attach it to a local account just because the email matches
- require proof of control or signed-in confirmation before linking to an existing account

### 4.5 Onboarding Architecture

Signup is not just about auth. It often triggers business setup:

- create personal or team workspace
- assign owner role
- seed settings and notification preferences
- create billing customer object
- publish analytics and onboarding events

This makes signup a distributed workflow. Real systems often handle it with:

- synchronous creation for the minimum needed to log in
- async events for non-critical setup
- idempotent consumers to avoid duplicate workspaces or billing objects

### Login and Signup Failure Cases

- verification emails delayed or blocked, leaving users in limbo
- duplicate accounts created because signup is not idempotent
- support team manually verifies accounts in insecure ways
- social and password accounts merge incorrectly
- signup path leaks which emails already exist

### Login and Signup Best Practices

- keep the critical path small and reliable
- separate abuse checks from core credential logic, but make them part of the final decision
- use generic error messages externally and detailed audit logs internally
- make signup and login events observable with metrics and tracing

---

## 5. Sessions

Sessions are the classic way to keep users logged in across multiple HTTP requests.

### 5.1 What a Session Really Is

A session means the server has already authenticated the user and stores an authenticated state keyed by a session identifier.

Typical flow:

1. User logs in successfully.
2. Server creates a session record.
3. Server sends the client a session ID in a cookie.
4. Client sends the cookie on future requests.
5. Server looks up session state and reconstructs identity.

### 5.2 Server-Side Sessions

In server-side session architecture, the browser usually only stores an opaque identifier.

Example session data:

- user ID
- tenant ID
- auth strength or MFA state
- issued time and last activity time
- device metadata
- CSRF-related state

Advantages:

- easy revocation
- easy logout across devices
- server fully controls state
- easy to add security flags or session versioning

Disadvantages:

- needs a session store lookup
- requires shared state across app instances
- harder to scale if poorly designed
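A minimal sketch of a server-side session store follows. The in-memory dictionary stands in for a shared store such as Redis, and the timeout values and method names are assumptions for illustration.

```python
import secrets

IDLE_TIMEOUT = 30 * 60           # assumed sliding (idle) timeout, in seconds
ABSOLUTE_TIMEOUT = 12 * 60 * 60  # assumed hard cap on session lifetime, in seconds


class SessionStore:
    """In-memory stand-in for a shared session store."""

    def __init__(self):
        self._sessions = {}

    def create(self, user_id: str, tenant_id: str, now: float) -> str:
        # The browser only ever sees this opaque, unguessable identifier.
        session_id = secrets.token_urlsafe(32)
        self._sessions[session_id] = {
            "user_id": user_id,
            "tenant_id": tenant_id,
            "issued_at": now,
            "last_seen": now,
        }
        return session_id

    def lookup(self, session_id: str, now: float):
        """Return session data, or None if unknown, idle too long, or past its cap."""
        session = self._sessions.get(session_id)
        if session is None:
            return None
        idle = now - session["last_seen"]
        age = now - session["issued_at"]
        if idle > IDLE_TIMEOUT or age > ABSOLUTE_TIMEOUT:
            del self._sessions[session_id]
            return None
        session["last_seen"] = now  # sliding expiration
        return session

    def revoke(self, session_id: str) -> None:
        """Single-device logout."""
        self._sessions.pop(session_id, None)

    def revoke_all_for_user(self, user_id: str) -> None:
        """Global logout, for example after a password change."""
        for sid, session in list(self._sessions.items()):
            if session["user_id"] == user_id:
                del self._sessions[sid]
```

Because all state lives on the server, revocation is just a delete, which is the main advantage listed above. To defend against session fixation, call `create` after a successful login rather than reusing any pre-login identifier.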

### 5.3 Redis-Backed Sessions

Redis is a very common session backend because it is fast, supports TTL, and works well as shared ephemeral state.

```mermaid
flowchart LR
	Browser[Browser with secure cookie] --> LB[Load Balancer]
	LB --> App1[App Instance A]
	LB --> App2[App Instance B]
	App1 --> Redis[(Redis Session Store)]
	App2 --> Redis
	Redis --> Audit[Audit / Security Events]
```

Why Redis is popular for sessions:

- low-latency reads and writes
- TTL expiration built in
- simple key-value model
- easy fit for horizontally scaled app fleets

Scaling considerations:

- shard or cluster if session volume is high
- replicate carefully; understand failover and session loss behavior
- monitor hot keys and uneven access patterns
- decide whether to refresh the TTL on every request (sliding expiration), enforce an absolute lifetime, or combine both

### 5.4 Cookie Security

Session security depends heavily on cookie configuration.

| Cookie attribute | Why it matters |
| --- | --- |
| `HttpOnly` | Prevents JavaScript from reading the cookie, reducing XSS impact |
| `Secure` | Sends cookie only over HTTPS |
| `SameSite=Lax/Strict` | Reduces CSRF risk from cross-site requests |
| Domain scoping | Prevents unintended subdomain sharing |
| Path scoping | Limits where the cookie is sent |
| Expiry / Max-Age | Controls session persistence |

Important nuance:

- `HttpOnly` helps against token theft by frontend JavaScript
- `SameSite` helps against CSRF
- neither one fixes everything if the app has deeper logic flaws
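The attributes in the table can be set with the standard library's `http.cookies` module. This is a small sketch; the cookie name `sid` and the one-hour lifetime are assumptions.

```python
from http.cookies import SimpleCookie


def build_session_cookie(session_id: str, max_age_seconds: int = 3600) -> str:
    """Return the value for a Set-Cookie header carrying an opaque session ID."""
    cookie = SimpleCookie()
    cookie["sid"] = session_id
    morsel = cookie["sid"]
    morsel["httponly"] = True   # not readable from JavaScript
    morsel["secure"] = True     # only sent over HTTPS
    morsel["samesite"] = "Lax"  # withheld on most cross-site subrequests
    morsel["path"] = "/"
    morsel["max-age"] = max_age_seconds
    # No Domain attribute: the cookie stays host-only and is not shared with subdomains.
    return morsel.OutputString()
```

Leaving `Domain` unset is the narrowest scoping option; set it only when subdomains genuinely need to share the session.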

### 5.5 Session Invalidation

Session invalidation is one reason server-side sessions remain attractive.

You can revoke sessions when:

- user logs out
- password changes
- MFA is reset
- admin disables the account
- suspicious activity is detected

Common implementation patterns:

- delete the session record outright
- mark session version or user auth version and reject old versions
- keep a device/session list per user for device management UI

### 5.6 Logout Challenges

Logout sounds trivial, but it is easy to implement incompletely.

Problems include:

- logout only clears client cookie but leaves server session valid
- user has multiple active devices and expects global logout
- session persists in mobile apps with long polling or background refresh
- cached pages or in-flight requests still complete after logout

Good logout design answers:

- single device logout or all devices?
- immediate revocation or eventual consistency?
- what about concurrent refresh operations?

### 5.7 Session Security Issues

| Problem | Meaning | Mitigation |
| --- | --- | --- |
| Session fixation | Attacker forces victim to use known session ID | Regenerate session ID after login |
| CSRF | Browser auto-sends cookies on forged cross-site requests | `SameSite`, CSRF tokens, origin checks |
| Session hijacking | Session token is stolen | HTTPS, `HttpOnly`, device/risk checks, short idle timeouts |
| Store outage | Session backend unavailable | Fallback behavior, multi-AZ design, graceful degradation |

### Sessions in Interviews

A good interview answer on sessions usually includes:

- opaque session ID in secure cookie
- shared store like Redis
- session regeneration after login
- revocation and logout semantics
- CSRF protections
- sliding vs absolute expiration tradeoff

---

## 6. JWT and Token-Based Authentication

JWTs are one of the most discussed and most misunderstood identity topics.

### 6.1 What a JWT Is

JWT stands for JSON Web Token. It is a compact, self-contained token format commonly used to carry claims.

A JWT typically has three parts:

`header.payload.signature`

- header: algorithm and metadata
- payload: claims such as subject, issuer, audience, expiry
- signature: proves integrity if signed correctly

Important practical truth: signed JWTs are not secret by default. The header and payload are base64url-encoded, not encrypted, so anyone holding the token can decode and read the claims.
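The sketch below builds an HS256-signed token with only the standard library to show both halves of that point: the claims are readable without the key, while the signature still detects tampering. The helper names are illustrative, and production code would normally use a maintained JWT library rather than hand-rolled encoding.

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(claims: dict, key: bytes) -> str:
    """Build header.payload.signature using HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    head = b64url_encode(json.dumps(header, separators=(",", ":")).encode())
    body = b64url_encode(json.dumps(claims, separators=(",", ":")).encode())
    signature = hmac.new(key, (head + "." + body).encode(), hashlib.sha256).digest()
    return head + "." + body + "." + b64url_encode(signature)


def read_claims_without_key(token: str) -> dict:
    """Anyone can do this: decoding the payload requires no secret."""
    return json.loads(b64url_decode(token.split(".")[1]))


def verify_hs256(token: str, key: bytes) -> bool:
    """Check the algorithm and signature; claim checks (exp, aud, ...) are separate."""
    parts = token.split(".")
    if len(parts) != 3:
        return False
    header = json.loads(b64url_decode(parts[0]))
    if header.get("alg") != "HS256":  # never let the token choose the algorithm
        return False
    expected = hmac.new(key, (parts[0] + "." + parts[1]).encode(), hashlib.sha256).digest()
    return hmac.compare_digest(b64url_encode(expected).encode(), parts[2].encode())
```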

### 6.2 Signing vs Encryption

| Mechanism | What it guarantees | Practical meaning |
| --- | --- | --- |
| Signing (JWS) | Integrity and authenticity | Token was issued by trusted signer and not modified |
| Encryption (JWE) | Confidentiality | Token contents are hidden from intermediaries/clients |

Most production JWT usage is signed, not encrypted.

That means:

- do not put secrets in JWT payloads
- do not put more PII than necessary
- use claims for identity and authorization hints, not as a dumping ground

### 6.3 Access Tokens vs Refresh Tokens

| Token type | Lifetime | Used by | Main purpose |
| --- | --- | --- | --- |
| Access token | Short-lived | APIs | Authorize a request |
| Refresh token | Longer-lived | Auth client / backend | Obtain new access tokens |

Best practice:

- keep access tokens short-lived
- treat refresh tokens as highly sensitive credentials
- store refresh tokens more carefully than access tokens

### 6.4 Why Teams Use JWTs

Benefits:

- easy for distributed services to verify locally
- no session store lookup on every request if verification is local
- good fit for API ecosystems and delegated access
- works well across domains and service boundaries

Costs:

- revocation is harder
- permissions embedded in tokens can become stale
- key rotation and issuer validation must be done correctly
- token size can grow dangerously if you stuff too many claims inside

### 6.5 Token Rotation

Refresh token rotation is a major real-world security mechanism.

Idea:

- every refresh use invalidates the previous refresh token
- the auth server issues a new refresh token and new access token
- if an old refresh token is reused, the server assumes theft and can revoke the token family

```mermaid
sequenceDiagram
	actor User
	participant Client
	participant Auth as Auth Server
	participant Store as Token Store

	User->>Client: Continue using app
	Client->>Auth: POST /token/refresh with refresh token
	Auth->>Store: Validate token family and prior use
	Store-->>Auth: valid / reused / revoked
	Auth->>Store: Mark old token used, persist new token state
	Auth-->>Client: New access token + new refresh token
```
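Rotation with reuse detection can be sketched as follows. The in-memory store and the class name are hypothetical; a real implementation would persist hashed tokens with expiry times and perform the used-check and update atomically, so that two concurrent refreshes cannot both succeed.

```python
import secrets
from typing import Optional


class RefreshTokenStore:
    """Tracks refresh tokens by family so that reuse of an old token revokes the family."""

    def __init__(self):
        self._tokens = {}               # token -> {"family": str, "used": bool}
        self._revoked_families = set()

    def issue(self, family_id: Optional[str] = None) -> str:
        """Issue a token; a new login starts a new family."""
        family = family_id or secrets.token_hex(8)
        token = secrets.token_urlsafe(32)
        self._tokens[token] = {"family": family, "used": False}
        return token

    def rotate(self, presented: str) -> Optional[str]:
        """Exchange a refresh token for a new one, or return None if it is rejected."""
        record = self._tokens.get(presented)
        if record is None or record["family"] in self._revoked_families:
            return None
        if record["used"]:
            # An already-rotated token came back: assume theft, revoke the whole family.
            self._revoked_families.add(record["family"])
            return None
        record["used"] = True
        return self.issue(record["family"])
```

After reuse is detected, even the newest token in the family stops working, which forces both the attacker and the legitimate user back through login.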

### 6.6 Revocation Challenges

Revocation is the biggest practical downside of stateless tokens.

If an access token is self-contained and valid until `exp`, then after it is issued:

- the user may be disabled
- permissions may change
- a tenant may be suspended
- the token may be stolen

But the token may still verify cryptographically.

Mitigations:

- short access token TTLs
- refresh token rotation
- revocation list or denylist for critical cases
- user/session version claim checked against server state
- opaque tokens with introspection for high-control environments

### 6.7 Stateless Auth Tradeoffs

This is a favorite interview question: "Should I use JWT or sessions?"

The mature answer is not dogmatic. It depends.

| Topic | Server-side sessions | JWT |
| --- | --- | --- |
| Request-time state lookup | Usually yes | Not always |
| Easy revocation | Yes | Harder |
| Cross-service portability | Moderate | Strong |
| Simplicity for web apps | Often simpler | Often overused |
| Risk of stale claims | Lower | Higher |
| CSRF concern if cookie-based | Yes | Yes if stored in cookies |
| XSS risk if JS-accessible storage | Lower with `HttpOnly` cookies | Higher if stored in localStorage |

A practical rule:

- for traditional web apps, server-side sessions are often simpler and safer
- for API ecosystems, third-party integrations, and distributed service verification, tokens are often the better fit

### 6.8 Common JWT Mistakes

- storing JWTs in `localStorage` without carefully thinking through XSS risk
- placing roles and permissions in long-lived tokens and forgetting they go stale
- not checking `iss`, `aud`, `exp`, `nbf`, and key identifiers properly
- using symmetric signing keys everywhere and spreading them across many services
- putting secrets or excessive PII in token payloads

### JWT Best Practices

- prefer asymmetric signing for shared verification environments
- expose public keys via a JWKS endpoint if multiple verifiers exist
- keep access tokens short-lived
- rotate signing keys safely and support key overlap during rotation
- use opaque tokens or introspection if real-time revocation is a hard requirement

---

## 7. OAuth

OAuth solves delegated authorization. It lets one application access another application's resources on behalf of a user without receiving the user's password.

### 7.1 The Problem OAuth Solves

Without OAuth, a user might hand App A their password for App B. That is unacceptable because:

- App A can now do anything the user can do
- App B cannot scope access cleanly
- the user cannot revoke just that delegated access safely

OAuth introduces a safer model:

- user authenticates with the authorization server / IdP
- user consents to limited scopes
- client receives tokens with bounded permissions

### 7.2 Authorization Code Flow with PKCE

This is the modern default flow, especially for public clients such as browser-based and mobile apps.

```mermaid
sequenceDiagram
	actor User
	participant Client as SaaS App
	participant Browser
	participant AS as Authorization Server / IdP
	participant API as Third-Party API

	User->>Client: Click "Connect Google Drive"
	Client->>Browser: Redirect to /authorize + scope + code_challenge
	Browser->>AS: Login and grant consent
	AS-->>Browser: Redirect back with authorization code
	Browser->>Client: Deliver authorization code
	Client->>AS: Exchange code + code_verifier
	AS-->>Client: Access token (+ refresh token)
	Client->>API: Call API with access token
	API-->>Client: Protected resource data
```

#### Why PKCE exists

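PKCE protects the code exchange step: the client sends a hash (`code_challenge`) of a random secret when the flow starts and must present the secret itself (`code_verifier`) when redeeming the code, so a stolen authorization code is useless without the verifier. It is critical for public clients such as SPAs and mobile apps, which cannot keep a client secret.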
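The verifier and challenge computation for the `S256` method defined in RFC 7636 is short enough to sketch directly. The function names are illustrative; the server-side check assumes the authorization server stored the challenge alongside the issued code.

```python
import base64
import hashlib
import hmac
import secrets


def make_code_verifier() -> str:
    """Client side: a high-entropy random string (43 URL-safe characters here)."""
    return secrets.token_urlsafe(32)


def code_challenge_s256(code_verifier: str) -> str:
    """Client side: BASE64URL(SHA256(verifier)) without padding, sent to /authorize."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode()


def server_accepts_verifier(code_verifier: str, stored_challenge: str) -> bool:
    """Server side: recompute the challenge at the token endpoint and compare."""
    recomputed = code_challenge_s256(code_verifier)
    return hmac.compare_digest(recomputed.encode(), stored_challenge.encode())
```

An attacker who intercepts the redirect sees only the code and, at most, the challenge; recovering the verifier from the challenge would require inverting SHA-256.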

### 7.3 Scopes

Scopes define the breadth of access. Examples:

- `read:user`
- `repo:write`
- `payments:refunds`
- `calendar.readonly`

Good scope design is product design plus security design.

If scopes are too broad:

- users lose trust
- integrations become over-privileged
- incident blast radius increases

If scopes are too granular:

- consent screens become confusing
- implementation complexity rises
- developers ask for full access anyway

### 7.4 Consent Screens

Consent is the user-visible manifestation of delegated access.

Good consent screens answer:

- who is requesting access?
- to which data or actions?
- for how long?
- can the user revoke later?

This matters a lot in SaaS ecosystems like Google Workspace, GitHub Apps, or Slack apps.

### 7.5 Refresh Tokens in OAuth

Long-running integrations often need refresh tokens so they can keep calling APIs without asking the user to re-consent constantly.

Refresh token concerns:

- high-value credential theft risk
- need for rotation and revocation
- tenant admins may want centralized revocation controls

### 7.6 Third-Party Integrations

In real SaaS systems, OAuth is often used for:

- connecting Google Drive, GitHub, Slack, Salesforce, Stripe, or Dropbox
- importing or exporting data
- posting to external systems on behalf of the user or workspace

Architectural consequences:

- store provider account linkage metadata
- encrypt or otherwise protect provider refresh tokens
- model scopes per installation or workspace
- surface admin controls for revocation and reauthorization

### 7.7 OAuth vs Authentication

OAuth is about authorization. Authentication is not the original purpose of OAuth.

However, many products use OAuth plus an identity layer such as OpenID Connect to support "Sign in with Google".

Interview nuance: saying "OAuth is login" is incomplete. Better answer:

- OAuth is delegated authorization
- OIDC adds identity information for authentication use cases

### OAuth Failure Cases

- client stores provider tokens insecurely
- redirect URI validation is weak
- state parameter not used correctly, enabling CSRF-like attacks in auth flows
- scopes are excessively broad
- tenants cannot audit or revoke third-party access easily
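Two of these failure cases, weak redirect URI validation and a missing `state` check, have simple mitigations that can be sketched as below. The registered URI, the session dictionary, and the function names are hypothetical.

```python
import hmac
import secrets

# Hypothetical registration data for one OAuth client.
REGISTERED_REDIRECT_URIS = {"https://app.example.com/oauth/callback"}


def is_allowed_redirect(redirect_uri: str) -> bool:
    """Authorization server side: exact string match, no prefix or wildcard matching."""
    return redirect_uri in REGISTERED_REDIRECT_URIS


def new_state(session: dict) -> str:
    """Client side: bind an unguessable value to the user's session before redirecting."""
    state = secrets.token_urlsafe(16)
    session["oauth_state"] = state
    return state


def check_state(session: dict, returned_state: str) -> bool:
    """Client side: on callback, the returned state must match the stored value exactly once."""
    expected = session.pop("oauth_state", None)  # single use
    if expected is None:
        return False
    return hmac.compare_digest(expected.encode(), returned_state.encode())
```

Prefix matching on redirect URIs is a common source of bugs, because `https://app.example.com.evil.test/` passes a naive `startswith` check against `https://app.example.com`.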

---

## 8. SSO: SAML and OIDC

Enterprise customers often do not want each SaaS app to manage a separate corporate password. They want central identity, central policy, and controlled employee access. That is where SSO comes in.

### 8.1 Identity Provider vs Service Provider

| Role | Meaning |
| --- | --- |
| Identity Provider (IdP) | System that authenticates the employee, such as Okta, Microsoft Entra ID (formerly Azure AD), Google Workspace |
| Service Provider (SP) / Relying Party (RP) | The SaaS application that trusts the IdP |

### 8.2 SAML Basics

SAML is older, XML-based, and still heavily used in enterprise environments.

Mental model:

- user tries to access the SaaS app
- SaaS redirects user to corporate IdP
- IdP authenticates user
- IdP sends signed assertion back to SaaS
- SaaS creates a local session

Strengths:

- entrenched in enterprise IT
- widely supported by corporate identity systems

Costs:

- XML complexity
- harder developer ergonomics
- trickier debugging and implementation compared with OIDC

### 8.3 OIDC Basics

OpenID Connect is an identity layer on top of OAuth 2.0.

It provides:

- ID tokens with identity claims
- standardized login flows
- better fit for modern web and mobile apps

OIDC is usually easier to work with than SAML for modern applications.
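
As a rough sketch of what "ID tokens with identity claims" means for the relying party, the function below checks the core claims of an already decoded ID token. It assumes the token signature was verified earlier with a proper JWT library against the IdP's published keys; that step is deliberately not shown. `validate_id_token_claims` and its parameters are hypothetical names.

```python
import time
from typing import Optional


def validate_id_token_claims(claims: dict, *, issuer: str, client_id: str,
                             expected_nonce: str,
                             now: Optional[float] = None) -> str:
    """Return the subject (`sub`) if the basic ID token claims check out.

    Assumes signature verification has already happened elsewhere.
    """
    now = time.time() if now is None else now
    if claims.get("iss") != issuer:
        raise ValueError("unexpected issuer")
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]  # aud may be a string or a list
    if client_id not in audiences:
        raise ValueError("token was not issued for this client")
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        raise ValueError("token expired")
    if claims.get("nonce") != expected_nonce:
        raise ValueError("nonce mismatch")
    if not claims.get("sub"):
        raise ValueError("missing subject")
    return claims["sub"]
```

The returned `sub` is the stable identifier to key the local account on, scoped to the issuer.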

### 8.4 SAML vs OIDC

| Topic | SAML | OIDC |
| --- | --- | --- |
| Typical format | XML assertions | JSON tokens |
| Common use case | Enterprise browser SSO | Modern app login and API ecosystems |
| Developer ergonomics | Heavier | Easier |
| Mobile/API friendliness | Weaker | Stronger |

### 8.5 Enterprise Architecture

```mermaid
flowchart LR
	Employee[Employee] --> SaaS[Your SaaS App]
	SaaS --> IdP[Enterprise IdP]
	IdP --> SaaS
	IdP --> Directory[Corporate Directory]
	IdP --> SCIM[Provisioning / SCIM]
	SCIM --> SaaS
	SaaS --> Policy[Workspace Roles and Policies]
```

In production, enterprise identity usually includes two separate concerns:

- authentication and SSO
- lifecycle management and provisioning

Provisioning is often handled with SCIM or similar directory sync mechanisms so the SaaS app knows:

- who exists
- which groups they belong to
- who has been deprovisioned

### 8.6 Common Enterprise Requirements

- just-in-time user creation on first login
- domain verification to prove company ownership
- group-to-role mapping
- forced MFA at IdP level
- admin-controlled session duration
- audit logs for all SSO events

### 8.7 Failure Cases

- bad mapping from IdP groups to app roles causes privilege escalation
- employee is disabled in IdP but app keeps old sessions alive too long
- email is used as unique identity key and later changes
- multiple IdPs or merged companies create ambiguous identity mapping

### SSO Best Practices

- use stable external subject identifiers, not just email
- model tenant-specific SSO config cleanly
- separate authentication trust from authorization mapping inside the app
- deprovision aggressively and revoke old sessions when identity status changes
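
A minimal in-memory sketch of these practices, assuming one directory object per tenant. `TenantDirectory`, its fields, and the group names are hypothetical; a real system would back this with a database and a real session store.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    user_id: int
    email: str
    roles: set = field(default_factory=set)
    active: bool = True


class TenantDirectory:
    """Per-tenant users keyed by (issuer, subject), never by email."""

    def __init__(self, group_to_role: dict):
        self.group_to_role = group_to_role  # authorization mapping owned by the app
        self.users = {}                     # (issuer, subject) -> User
        self.sessions = {}                  # session_id -> user_id

    def login(self, issuer: str, subject: str, email: str, groups: list) -> User:
        key = (issuer, subject)
        user = self.users.get(key)
        if user is None:                    # just-in-time creation on first login
            user = User(user_id=len(self.users) + 1, email=email)
            self.users[key] = user
        if not user.active:
            raise PermissionError("user is deprovisioned")
        user.email = email                  # email may change; the key does not
        # Groups without a mapping grant nothing (deny by default).
        user.roles = {self.group_to_role[g] for g in groups if g in self.group_to_role}
        return user

    def deprovision(self, issuer: str, subject: str) -> None:
        user = self.users.get((issuer, subject))
        if user is None:
            return
        user.active = False
        # Revoke every session that belongs to the deprovisioned user.
        self.sessions = {sid: uid for sid, uid in self.sessions.items()
                         if uid != user.user_id}
```

Keeping `group_to_role` inside the app is what separates "the IdP says who this is" from "the app decides what they may do".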

---

## 9. Password Reset

Password reset is a high-risk recovery flow. Attackers love it because it often bypasses normal login defenses.

### 9.1 Secure Token Flow

```mermaid
sequenceDiagram
	actor User
	participant App
	participant Auth as Auth Service
	participant Users as User DB
	participant Reset as Reset Token Store
	participant Mail as Email Service

	User->>App: Click "Forgot password"
	App->>Auth: POST /password-reset
	Auth->>Users: Lookup account
	Auth->>Reset: Store hashed single-use token + expiry
	Auth->>Mail: Send password reset link
	Auth-->>User: Generic success response
	User->>App: Open reset link
	App->>Auth: POST /password-reset/confirm token + new password
	Auth->>Reset: Validate token unused and unexpired
	Auth->>Users: Update password hash
	Auth->>Reset: Mark token used
	Auth-->>App: Success + revoke other sessions
```

### 9.2 Why This Design Exists

Password reset has to be secure even if the attacker knows the user's email address. Therefore the reset token must be:

- hard to guess
- short-lived
- single-use
- revocable

Good systems also revoke active sessions or require re-authentication after password reset.

### 9.3 Attack Prevention

| Threat | Mitigation |
| --- | --- |
| Account enumeration | Return generic responses like "If an account exists, email sent" |
| Token guessing | Long random tokens, rate limits |
| Token replay | Single-use storage and invalidation |
| Email inbox compromise | Step-up verification for high-value actions after reset |
| Old session persistence | Revoke sessions after reset |

### 9.4 Practical Advice

- prefer opaque reset tokens over stuffing reset state into a long-lived JWT
- hash reset tokens at rest if you store them server-side
- keep TTL short, often 15 to 60 minutes depending on product sensitivity
- notify users when a reset is requested and completed
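
The advice above can be sketched with an opaque token that is hashed at rest, expires, and can be redeemed once. This is an in-memory illustration; `ResetTokenStore` and the 30-minute TTL are assumptions. A plain SHA-256 digest is used here because the token is long and random, which is a different situation from hashing passwords.

```python
import hashlib
import secrets
import time
from typing import Optional

RESET_TTL_SECONDS = 30 * 60  # assumption: somewhere in the 15 to 60 minute range


class ResetTokenStore:
    def __init__(self):
        self._rows = {}  # sha256(token) -> {"user_id", "expires_at", "used"}

    @staticmethod
    def _digest(token: str) -> str:
        return hashlib.sha256(token.encode()).hexdigest()

    def issue(self, user_id: str, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe encoded
        self._rows[self._digest(token)] = {
            "user_id": user_id,
            "expires_at": now + RESET_TTL_SECONDS,
            "used": False,
        }
        return token  # goes into the email link; only the digest is stored

    def redeem(self, token: str, now: Optional[float] = None) -> Optional[str]:
        """Return the user_id once for a valid token, otherwise None."""
        now = time.time() if now is None else now
        row = self._rows.get(self._digest(token))
        if row is None or row["used"] or row["expires_at"] <= now:
            return None
        row["used"] = True  # single use
        return row["user_id"]
```

After a successful `redeem`, the caller would update the password hash and revoke the user's other sessions.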

### Password Reset Failure Cases

- reset token is reusable
- old sessions remain active after password change
- reset endpoint leaks whether account exists
- support team bypasses the secure flow with weak manual procedures

---

## 10. Authorization Fundamentals

Authentication answers who the subject is. Authorization answers what that subject may do.

### 10.1 AuthN vs AuthZ

This distinction matters a lot.

| Question | Category |
| --- | --- |
| "Who are you?" | Authentication |
| "Are you allowed to do this?" | Authorization |
| "How sure are we?" | Authentication strength / assurance |
| "Why was access denied?" | Authorization decision and audit |

A user can be perfectly authenticated and still not be authorized.

### 10.2 Authorization Decision Shape

Every authZ decision is some variation of:

`Can subject S perform action A on resource R under context C?`

Where context may include:

- tenant
- time of day
- network zone
- device trust level
- MFA level
- resource ownership
- subscription plan
- legal region or data residency constraints
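
One way to make this shape concrete is to pass it around as a single value. The sketch below is illustrative only: `AuthzRequest`, the context keys, and the tuple-based grant set are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AuthzRequest:
    subject: str
    action: str
    resource: str
    context: dict = field(default_factory=dict)


def is_allowed(req: AuthzRequest, grants: set) -> bool:
    """Deny by default; allow only same-tenant requests with an explicit grant."""
    tenant = req.context.get("subject_tenant")
    if tenant is None or tenant != req.context.get("resource_tenant"):
        return False
    return (req.subject, req.action, req.resource) in grants
```

Usage: `is_allowed(AuthzRequest("alice", "invoice.read", "inv-1", {"subject_tenant": "t1", "resource_tenant": "t1"}), {("alice", "invoice.read", "inv-1")})` evaluates to `True`, while the same request with a different `resource_tenant` is denied.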

### 10.3 Enforcement Layers

Authorization can happen at multiple layers:

| Layer | Good for | Risk if overused |
| --- | --- | --- |
| API gateway | Coarse access checks, authentication, token validation | Too coarse for resource-specific rules |
| Service layer | Business-specific rules | Easy to duplicate logic across services |
| Data access layer | Row/tenant isolation, final enforcement | Hard to express all product rules here |
| Database native policies | Strong last line of defense in some systems | App logic can still drift if not modeled carefully |

A common mistake is enforcing authorization only at the edge. Edge checks are useful, but most real product rules depend on resource-specific business logic deeper inside the system.

### 10.4 Policy Design

Good policy design balances three things:

- expressiveness
- debuggability
- operational simplicity

Ask these questions:

- who is the subject?
- what resource is being accessed?
- what action is requested?
- what context matters?
- who can change the policy?
- how do we explain and audit the decision?

### 10.5 Auditing and Explainability

Authorization is not just about allow or deny. In production you often need:

- reason codes
- which policy matched
- who granted the permission
- when the permission changed
- evidence for support, compliance, and incident response

This is why mature systems treat authorization as both a runtime path and a data model.
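
A small sketch of a decision that carries its own explanation, assuming a toy policy list. `Decision`, `POLICIES`, `decide`, and `audit_line` are hypothetical names, and the reason codes are made up for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str                      # machine-readable reason code
    policy_id: Optional[str] = None  # which policy matched, if any


# Hypothetical policy data: a role grants an action.
POLICIES = [
    {"id": "p-1", "role": "workspace_admin", "action": "invoice.refund"},
    {"id": "p-2", "role": "support_agent", "action": "invoice.read"},
]


def decide(roles: set, action: str) -> Decision:
    for policy in POLICIES:
        if policy["action"] == action and policy["role"] in roles:
            return Decision(True, "role_grants_action", policy["id"])
    return Decision(False, "no_matching_policy")


def audit_line(subject: str, action: str, resource: str,
               decision: Decision, timestamp: float) -> str:
    """Serialize one decision as a JSON line for an audit pipeline."""
    return json.dumps({
        "ts": timestamp, "subject": subject, "action": action,
        "resource": resource, "allowed": decision.allowed,
        "reason": decision.reason, "policy_id": decision.policy_id,
    }, sort_keys=True)
```

Returning a `Decision` instead of a bare boolean is what lets support and incident response answer "why was this denied?" later.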

---

## 11. RBAC

RBAC stands for role-based access control. Permissions are grouped into roles, and subjects are assigned roles.

### 11.1 Why RBAC Exists

Without roles, you would assign individual permissions to every user. That becomes unmanageable quickly.

RBAC simplifies administration:

- `viewer`
- `editor`
- `admin`
- `billing_admin`

Instead of attaching dozens of permissions directly to users, you attach permissions to roles and roles to users.

### 11.2 Basic Model

| Entity | Example |
| --- | --- |
| Permission | `invoice.read`, `invoice.refund`, `workspace.invite` |
| Role | `support_agent`, `workspace_admin` |
| Assignment | User U has role R in tenant T |

Tenant scoping is critical. In multi-tenant SaaS, a user is rarely just "an admin" globally. They are usually an admin in a specific workspace or organization.
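
The three entities in the table can be sketched with plain dictionaries. The permission and role names come from the table; the user and tenant IDs, and the in-memory storage, are assumptions for illustration.

```python
# Role -> permissions
ROLE_PERMISSIONS = {
    "support_agent": {"invoice.read"},
    "workspace_admin": {"invoice.read", "invoice.refund", "workspace.invite"},
}

# Assignment: (user_id, tenant_id) -> roles held in that tenant
ASSIGNMENTS = {
    ("u1", "t1"): {"workspace_admin"},
    ("u1", "t2"): {"support_agent"},
}


def has_permission(user_id: str, tenant_id: str, permission: str) -> bool:
    """A user holds a permission only through a role assigned in that tenant."""
    roles = ASSIGNMENTS.get((user_id, tenant_id), set())
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)
```

Here `u1` can refund invoices in `t1` but not in `t2`, because the assignment key includes the tenant.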

### 11.3 Enterprise Patterns

Common enterprise RBAC patterns include:

- global roles for platform staff
- tenant-scoped roles for customers
- custom roles for larger organizations
- group-to-role mapping from SSO IdP groups

### 11.4 The Role Explosion Problem

RBAC starts simple but can degrade into dozens or hundreds of roles:

- `viewer`
- `viewer_plus_export`
- `viewer_plus_export_plus_billing`
- `regional_admin_eu`
- `regional_admin_us`

This is role explosion.

It happens when RBAC is forced to encode too many contextual conditions that really belong in attributes or policies.

### 11.5 RBAC Tradeoffs

| Strength | Weakness |
| --- | --- |
| Easy to explain to users and admins | Coarse-grained for complex cases |
| Efficient at runtime | Can explode in number of roles |
| Works well for common SaaS admin patterns | Poor fit for dynamic context-heavy rules |

### RBAC Best Practices

- keep the base role set small
- scope roles by tenant, project, or resource container
- separate platform/internal staff roles from customer roles
- use RBAC for broad permissions and combine with finer policies when needed

---

## 12. ABAC

ABAC stands for attribute-based access control. Instead of only asking "What role does this user have?", ABAC asks about attributes of the subject, resource, and environment.

### 12.1 Why ABAC Exists

RBAC is often too static for real-world decisions like:

- support agent can view tickets only in their assigned region
- manager can approve expenses under a threshold for their own department
- user can access data only from a compliant device in an approved country
- payout release requires recent MFA and a risk score below a set threshold

These rules depend on context, not just role labels.

### 12.2 Dynamic Policy Evaluation

ABAC decisions may use attributes such as:

- subject department
- resource owner
- tenant subscription tier
- request IP or network zone
- device trust score
- current time or shift window
- MFA strength

Example policy idea:

"Allow refund approval if the subject role is finance_manager, the order belongs to the same merchant account, the refund amount is below the subject limit, and MFA was performed in the last 10 minutes."

### 12.3 Policy Engines

ABAC often benefits from a dedicated policy engine because hardcoding many dynamic rules directly into services becomes brittle.

Common approaches:

- custom rules in application code
- centralized policy engine such as OPA/Rego
- dedicated policy languages such as Cedar
- relationship and graph-based systems for object access patterns

### 12.4 ABAC Tradeoffs

| Strength | Weakness |
| --- | --- |
| Expressive and context-aware | Harder to explain and debug |
| Reduces role explosion | Requires clean attribute sources |
| Good for fine-grained enterprise control | Runtime evaluation can be more expensive |

### 12.5 Practical Use

Many production systems do not choose "RBAC or ABAC". They combine them:

- RBAC gives the broad lane
- ABAC applies contextual restrictions inside that lane

Example:

- role says user may edit invoices
- ABAC rule says only for their tenant, below approval threshold, and only after MFA for high-value invoices
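
The invoice example can be sketched as one function: an RBAC check for the broad lane, then attribute checks inside it. The thresholds, role names, and field names below are hypothetical; amounts are in cents.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical limits, in cents.
APPROVAL_THRESHOLD = 500_000
HIGH_VALUE = 100_000
MFA_MAX_AGE_SECONDS = 600

ROLE_PERMISSIONS = {"editor": {"invoice.edit"}, "viewer": set()}


@dataclass
class Subject:
    role: str
    tenant_id: str
    last_mfa_at: float  # unix timestamp; 0 if MFA never happened


@dataclass
class Invoice:
    tenant_id: str
    amount: int


def may_edit_invoice(subject: Subject, invoice: Invoice, now: float) -> Tuple[bool, str]:
    # RBAC: does the role grant the action at all?
    if "invoice.edit" not in ROLE_PERMISSIONS.get(subject.role, set()):
        return False, "role_lacks_permission"
    # ABAC: contextual restrictions inside that lane.
    if subject.tenant_id != invoice.tenant_id:
        return False, "cross_tenant"
    if invoice.amount >= APPROVAL_THRESHOLD:
        return False, "above_approval_threshold"
    mfa_is_stale = now - subject.last_mfa_at > MFA_MAX_AGE_SECONDS
    if invoice.amount >= HIGH_VALUE and mfa_is_stale:
        return False, "mfa_required"
    return True, "ok"
```

Note that adding a new contextual rule here does not require a new role, which is how the combination avoids role explosion.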

### ABAC Failure Cases

- attributes are stale or inconsistently sourced across services
- policies become unreadable and impossible to reason about
- caching hides recent attribute changes like department moves or suspensions

---

## 13. Permissions and Access Control

Permissions are the actual capabilities a subject has. Access control is the mechanism that enforces those permissions correctly and consistently.

### 13.1 Permission Models

Common permission models include:

| Model | Mental model | Example |
| --- | --- | --- |
| RBAC | Roles map to permissions | Workspace admin |
| ACL | Resource has a list of allowed subjects | Shared document editable by Alice and Bob |
| ABAC | Decision based on attributes | Region and MFA aware access |
| ReBAC | Decision based on relationships | User is member of team that owns repo |
| Capability/token-based | Possession of unforgeable capability grants access | Signed download URL |

In modern systems, multiple models often coexist.

GitHub is a good mental example:

- org and team membership look like RBAC/ReBAC
- repo-specific collaborator lists look like ACLs
- fine-grained product actions are individual permissions

### 13.2 Inheritance

Permissions often inherit down a hierarchy:

- org -> workspace -> project -> resource
- folder -> document
- account -> sub-account

Inheritance is useful, but easy to get wrong.

Questions to design explicitly:

- do child resources inherit all parent permissions?
- can child permissions override parent permissions?
- are denies supported, and if so do they take precedence?
- how do you compute effective permissions efficiently?
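
One possible set of answers to these questions, sketched in code: children inherit every parent grant, and a deny at any level wins over an allow at any level. This is only one design among several; the hierarchy, the grant format, and the function names are hypothetical.

```python
# Child -> parent, following the org -> workspace -> project -> resource chain.
PARENT = {
    "doc-1": "project-a",
    "project-a": "workspace-1",
    "workspace-1": "org-1",
    "org-1": None,
}


def ancestors(node: str):
    """Yield the node itself, then each parent up to the root."""
    while node is not None:
        yield node
        node = PARENT.get(node)


def effective_permissions(subject: str, resource: str, grants: dict) -> set:
    """grants maps (subject, node) -> {"allow": set, "deny": set}."""
    allowed, denied = set(), set()
    for node in ancestors(resource):
        entry = grants.get((subject, node), {})
        allowed |= entry.get("allow", set())
        denied |= entry.get("deny", set())
    return allowed - denied  # assumption: deny always takes precedence
```

Walking the chain on every request is simple but costs one lookup per level; systems with deep hierarchies often precompute or cache effective permissions instead.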

### 13.3 Auditing

You need to know:

- who granted access
- when access changed
- who accessed a resource
- why a decision was allowed or denied

Auditing matters for:

- customer support
- security investigations
- compliance
- admin trust

### 13.4 Enforcement Patterns

There are two common implementation patterns.

#### Pattern A: Embedded authorization in each service

Pros:

- low latency
- business context close to the resource

Cons:

- duplicated rules across services
- inconsistent decisions and audit semantics

#### Pattern B: Centralized authorization service or policy engine

Pros:

- consistency
- shared policy language
- central auditing and explainability

Cons:

- added network hop
- dependency on a central service
- need good caching and fallback behavior

### 13.5 Centralized Auth Service and Policy Caching

```mermaid
flowchart LR
	Request[Authenticated Request] --> Service[Business Service]
	Service --> Cache[Policy Cache]
	Cache -->|cache miss| Authz[Central Authorization Service]
	Authz --> PDP[Policy Engine]
	Authz --> Attrs[Attribute / Relationship Data]
	PDP --> Decision[Allow / Deny + Reason]
	Decision --> Audit[Audit Log]
	Decision --> Service
```

Caching is often necessary, but introduces staleness risks. Common techniques:

- short TTL caches for policy decisions
- versioned policy snapshots
- event-driven invalidation on membership or role changes
- cache only stable intermediate data, not final decisions, in high-risk systems
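
The first three techniques can be combined in a small cache, sketched below under simplifying assumptions: a single process, an in-memory dict, and a global version counter that a role or membership change event would bump. `DecisionCache` is a hypothetical name.

```python
import time
from typing import Optional


class DecisionCache:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.policy_version = 0
        self._entries = {}  # key -> (decision, expires_at, version)

    def put(self, key, decision: bool, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._entries[key] = (decision, now + self.ttl, self.policy_version)

    def get(self, key, now: Optional[float] = None) -> Optional[bool]:
        """Return the cached decision, or None on a miss, expiry, or version change."""
        now = time.time() if now is None else now
        entry = self._entries.get(key)
        if entry is None:
            return None
        decision, expires_at, version = entry
        if now >= expires_at or version != self.policy_version:
            del self._entries[key]
            return None
        return decision

    def invalidate_all(self) -> None:
        """Call when roles, memberships, or policies change."""
        self.policy_version += 1
```

A `None` result means "ask the authorization service"; a cached `False` is still a valid hit, so callers should compare against `None` rather than test truthiness.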

### 13.6 Policy Caching Tradeoffs

| Benefit | Cost |
| --- | --- |
| Lower latency | Stale authorization decisions |
| Lower policy service load | Harder revocation semantics |
| Better resilience during partial outage | Risk of fail-open or fail-stale behavior |

Design question: when the policy service is down, do you fail closed or fail open?

- fail closed is safer but can hurt availability
- fail open preserves availability but may violate security

For high-risk actions, fail closed is usually the right answer.
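
A sketch of making that choice per action rather than globally. The set of high-risk actions, the exception type, and the `last_known` fallback (a previously cached decision, if any) are assumptions for illustration.

```python
from typing import Callable, Optional

# Hypothetical classification of actions by risk.
HIGH_RISK_ACTIONS = {"invoice.refund", "payout.release"}


class PolicyServiceUnavailable(Exception):
    pass


def check_access(action: str,
                 call_policy_service: Callable[[str], bool],
                 last_known: Optional[bool]) -> bool:
    try:
        return call_policy_service(action)
    except PolicyServiceUnavailable:
        if action in HIGH_RISK_ACTIONS:
            return False          # fail closed: deny when we cannot verify
        # Low-risk: fall back to the last known decision (fail stale).
        # With nothing cached, this still denies.
        return bool(last_known)
```

Even the low-risk branch never grants access that was not granted before the outage, which keeps the fallback from turning into a true fail-open.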

### Access Control Best Practices

- enforce tenant isolation early and repeatedly
- keep policy decisions explainable
- separate authentication claims from live authorization state when permissions change frequently
- audit all admin and permission-management actions
- do not trust internal network location as a permission model

---

## 14. Service-to-Service Authentication

User authentication is only half of production security. Modern backends also need to authenticate services to each other.

### 14.1 Why Internal Service Authentication Exists

In microservice systems, one request may pass through many services:

- edge/API gateway
- auth service
- order service
- payment service
- notification service

If internal calls are trusted just because they are "inside the VPC", a compromised service can impersonate others too easily.

This is why zero-trust principles matter internally too.

### 14.2 Service Identity

A service needs its own identity, just like a user does.

Examples:

- `payments-service.prod`
- `orders-service.eu-west-1`
- workload identity bound to a Kubernetes service account

A strong service identity system lets the platform answer:

- which service is calling?
- is it the real deployed workload?
- is it allowed to call this destination?

### 14.3 mTLS Basics

Mutual TLS means both sides authenticate each other during the TLS handshake.

Benefits:

- encryption in transit
- client and server authentication
- strong cryptographic service identity

Typical pattern:

- internal CA issues short-lived certificates to workloads
- service presents client cert on outbound call
- destination validates issuer and identity

```mermaid
sequenceDiagram
	participant A as Service A
	participant CA as Internal CA / Identity System
	participant B as Service B

	A->>CA: Request workload certificate
	CA-->>A: Short-lived cert
	A->>B: TLS handshake + client cert
	B->>B: Verify cert, issuer, SAN, expiry
	B-->>A: Authenticated secure channel
	A->>B: Application request
	B-->>A: Response
```
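
On the receiving side, the flow above can be sketched with Python's standard `ssl` module. `build_mtls_server_context` requires a client certificate signed by the internal CA; `caller_is_allowed` then checks the URI SAN from the dict that `SSLSocket.getpeercert()` returns. The file paths, the SPIFFE-style identity string, and the allow-list are hypothetical.

```python
import ssl

# Hypothetical allow-list of caller identities for this service.
ALLOWED_CALLERS = {"spiffe://example.internal/orders-service"}


def build_mtls_server_context(cert_file: str, key_file: str, ca_file: str) -> ssl.SSLContext:
    """Server-side TLS context that rejects clients without a valid certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.verify_mode = ssl.CERT_REQUIRED           # client must present a cert
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    ctx.load_verify_locations(cafile=ca_file)     # trust only the internal CA
    return ctx


def peer_uri_identities(peer_cert: dict) -> set:
    """Collect URI subject alternative names from a getpeercert() dict."""
    sans = peer_cert.get("subjectAltName", ())
    return {value for kind, value in sans if kind == "URI"}


def caller_is_allowed(peer_cert: dict) -> bool:
    """Authentication proved who is calling; this decides whether they may call."""
    return bool(peer_uri_identities(peer_cert) & ALLOWED_CALLERS)
```

The TLS handshake handles issuer and expiry validation; the SAN check is the application-level step that maps the certificate to a service identity.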

### 14.4 Short-Lived Credentials

Short-lived credentials are a major production best practice.

Why?

- if stolen, they expire quickly
- less need for manual secret rotation
- better fit with workload identity and automation

This pattern shows up in:

- cloud IAM temporary credentials
- Kubernetes workload identity
- service mesh certificates
- internal token minting systems

### 14.5 Zero Trust Basics

Zero trust does not mean trusting nothing at all. It means:

- do not grant access solely based on network location
- verify identity continuously and explicitly
- enforce least privilege
- assume compromise is possible and reduce blast radius

Google's public BeyondCorp ideas are the canonical mental model here: access should depend on identity, device state, and policy, not on whether traffic comes from "inside the office network".

### 14.6 Service-to-Service Authorization

Authentication tells you that a caller is `payments-service`. Authorization must still decide whether `payments-service` may:

- read card metadata
- call refund APIs
- publish to a payout topic
- access a particular database table

This is often implemented with:

- service identity plus policy
- SPIFFE-like identity patterns
- service mesh policy
- signed internal tokens with audience restrictions
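
The last item, audience-restricted internal tokens, can be sketched with the standard library. To stay self-contained this uses a shared HMAC key, which is a simplification: production systems more often use asymmetric signatures so that verifiers cannot mint tokens. The token format and function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def mint_token(key: bytes, issuer: str, audience: str,
               ttl_seconds: int, now: float) -> str:
    """Create a short-lived token that is only valid at one destination."""
    payload = json.dumps({"iss": issuer, "aud": audience, "exp": now + ttl_seconds})
    body = base64.urlsafe_b64encode(payload.encode()).decode()
    sig = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"


def verify_token(key: bytes, token: str, expected_audience: str,
                 now: float) -> Optional[str]:
    """Return the calling service name, or None if the token is not acceptable."""
    body, _, sig = token.partition(".")
    expected = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode(), sig.encode()):
        return None                               # forged or tampered
    claims = json.loads(base64.urlsafe_b64decode(body))
    if claims["aud"] != expected_audience:
        return None                               # minted for another service
    if claims["exp"] <= now:
        return None                               # expired
    return claims["iss"]
```

The audience check is what stops a token that `orders-service` obtained for `payments-service` from being replayed against a different internal service.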

### Service Auth Failure Cases

- long-lived shared secrets copied across many services
- no certificate rotation automation
- any internal service can call any other service
- internal service trusts caller-provided headers like `X-User-Id` without verification
- service identity is authenticated but not authorized

### Service Auth Best Practices

- use workload or service identity, not shared static secrets where possible
- prefer short-lived credentials and automatic rotation
- when end-user identity must be propagated, carry it in a verifiable form rather than in unverified headers
- separate service identity from end-user identity in request context

---

## 15. How These Systems Fit Together

A strong interview answer connects all the pieces into one architecture.

### 15.1 Typical SaaS Architecture

```mermaid
flowchart TD
	User[Browser / Mobile App] --> Edge[Edge / API Gateway]
	Edge --> Auth[Auth Service]
	Auth --> UserDB[(User Directory)]
	Auth --> Session[(Session Store / Token Store)]
	Auth --> Keys[KMS / Signing Keys]
	Edge --> App[Business Services]
	App --> Authz[Authorization Service / Policy Engine]
	Authz --> Perms[(Roles, Relationships, Attributes)]
	App --> Data[(Application Data)]
	Auth --> Audit[Security Audit Log]
	Authz --> Audit
	App --> Audit
```

The key idea is separation of concerns:

- auth service proves identity and issues continuity artifacts
- session/token store manages continuity and revocation
- policy engine decides access
- application services enforce business operations
- audit pipeline records security-relevant facts

### 15.2 Consumer SaaS vs Enterprise SaaS vs Internal Platform

| Environment | Identity priorities |
| --- | --- |
| Consumer SaaS | Signup conversion, password recovery, abuse prevention, social login |
| Enterprise SaaS | SSO, provisioning, group mapping, auditability, tenant admin controls |
| Internal platform | Service identity, zero trust, least privilege, strong device posture |

### 15.3 Data Freshness vs Statelessness

One of the deepest identity design tradeoffs is this:

- stateless verification is fast and scalable
- fresh authorization state often requires looking up server-side data

That is why many mature architectures mix the two:

- token or session for authentication continuity
- live policy check for sensitive authorization

### 15.4 Tenant Isolation

For SaaS systems, tenant isolation must be explicit in both authentication and authorization.

Common patterns:

- include tenant membership in auth/session context
- scope roles by tenant
- enforce tenant filters in service and data layers
- audit cross-tenant admin actions aggressively
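
Enforcing tenant filters in the data layer can be sketched as a repository that is constructed with a tenant and cannot be queried without one. `TenantScopedInvoices` and the row format are hypothetical; a real system would push the filter into the query itself.

```python
from typing import Optional


class TenantScopedInvoices:
    """Data access object bound to one tenant for its whole lifetime."""

    def __init__(self, rows: list, tenant_id: str):
        if not tenant_id:
            raise ValueError("tenant_id is required")
        self._rows = rows
        self._tenant_id = tenant_id

    def list(self) -> list:
        # Every read path goes through this filter.
        return [r for r in self._rows if r["tenant_id"] == self._tenant_id]

    def get(self, invoice_id: str) -> Optional[dict]:
        for row in self.list():
            if row["id"] == invoice_id:
                return row
        return None  # other tenants' rows look like "not found"
```

Because callers never pass the tenant per query, a forgotten filter in business logic cannot leak another tenant's rows.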

This is especially important in systems like GitHub organizations, Stripe connected accounts, or enterprise SaaS workspaces.

---

## 16. Real-World Patterns and Company Examples

These examples are useful as mental anchors, not as exact internal blueprints.

### Google

- Public Google identity and OIDC flows are a classic example of large-scale federated identity.
- Google's public BeyondCorp ideas are foundational for zero-trust access.
- Zanzibar is the famous reference point for large-scale, relationship-aware authorization.

Interview lesson: centralized authorization models can work at huge scale if the data model, caching, and consistency story are designed carefully.

### Netflix

- Netflix-style service-rich environments highlight the need for service identity, short-lived credentials, and resilient internal auth patterns.
- Streaming and control plane workloads also show why identity systems must stay available under very high traffic.

Interview lesson: internal service auth is not optional in large microservice systems.

### Uber

- Ride-sharing and marketplace architectures depend on strict service-to-service permissions, real-time risk checks, and strong tenant/user context propagation.
- Payment, dispatch, driver, and rider services cannot safely trust each other based only on network placement.

Interview lesson: identity context often flows through many services and must remain verifiable.

### Amazon

- AWS IAM is the public archetype for policy-heavy authorization with users, roles, resource policies, temporary credentials, and least privilege.

Interview lesson: enterprise-grade authorization is really a policy and identity modeling problem, not just a list of roles.

### GitHub

- GitHub demonstrates a mix of organization membership, teams, repository roles, OAuth apps, GitHub Apps, personal access tokens, and enterprise SSO.

Interview lesson: one product often needs several identity and authorization models at the same time.

### Stripe

- Stripe is a useful example for strong dashboard authentication, MFA, API keys, restricted keys, OAuth for Connect-style platforms, and careful access around money movement.

Interview lesson: high-risk actions need stronger auth, auditability, and granular permissions than low-risk read-only actions.

### Typical SaaS Systems

Most B2B SaaS products end up combining:

- email/password and social login for self-serve customers
- SSO for enterprise customers
- RBAC for admin/editor/viewer patterns
- ABAC or policy rules for sensitive workflows
- API tokens or OAuth for integrations
- service identity for microservices

---

## 17. Interview Discussion Guide

If asked to design identity and access for a backend system, structure your answer progressively.

### 17.1 Clarifying Questions

Ask:

- who are the subjects: end users, admins, services, partners?
- is this consumer, enterprise, or internal platform?
- are we designing login, third-party integration, or internal access control?
- what is the risk level: social app, fintech, healthcare, developer platform?
- do we need SSO, API access, or both?
- how fresh must revocation and permission changes be?

### 17.2 Good Interview Structure

1. Define identities and trust boundaries.
2. Choose authentication mechanism.
3. Choose continuity mechanism: session or token.
4. Design authorization model.
5. Address recovery, revocation, audit, and failure cases.
6. Address scale, caching, key rotation, and multi-region concerns.

### 17.3 Common Interview Comparisons

#### Sessions vs JWT

Use when the interviewer asks about stateful vs stateless auth.

| Question | Sessions answer | JWT answer |
| --- | --- | --- |
| Need instant logout? | Strong | Harder |
| Need local verification across services? | Weaker | Strong |
| Web app simplicity? | Often simpler | Often overcomplicated |
| Third-party API ecosystem? | Less natural | Better fit |

#### RBAC vs ABAC

| Question | RBAC | ABAC |
| --- | --- | --- |
| Easy admin mental model | Strong | Weaker |
| Fine-grained contextual rules | Weak | Strong |
| Risk of role explosion | High | Lower |
| Ease of debugging | Stronger | Harder |

#### SAML vs OIDC

| Question | SAML | OIDC |
| --- | --- | --- |
| Enterprise legacy support | Strong | Strong but varies |
| Modern web/mobile friendliness | Weaker | Strong |
| Developer ergonomics | Heavier | Better |

### 17.4 Scaling Considerations to Mention

- Redis or equivalent for shared sessions
- multi-region token verification and key distribution
- policy caching with invalidation strategy
- short-lived credentials for services
- audit event pipelines decoupled from critical-path latency
- abuse protection on login and signup

### 17.5 Failure Cases Worth Calling Out

- auth service outage blocks all logins
- Redis session store failure logs users out or prevents validation
- stale permissions cached after role removal
- signing key rotation breaks old verifiers
- refresh token theft leads to silent session hijack

Interview tip: explicitly talking about revocation, rotation, and failure handling is often what moves an answer from junior to strong mid-level or senior.

---

## 18. Common Mistakes and Best Practices

### Common Mistakes

- treating authentication and authorization as the same problem
- storing passwords with fast hashes
- putting too much trust in long-lived JWTs
- assuming "internal network" means trusted caller
- forgetting logout, revocation, and recovery flows
- doing authorization only at the gateway
- using email as the only durable identity key in enterprise federation
- failing to audit permission changes and admin actions
- building role systems that cannot express tenant or resource scope

### Best Practices

- separate identity proof, session/token continuity, and authorization policy clearly
- use slow password hashing and protect high-value secrets with KMS/HSM support
- prefer MFA and step-up authentication for sensitive actions
- keep access tokens short-lived and refresh tokens protected and rotated
- model tenant-aware roles and permissions explicitly
- centralize policy where consistency matters, but understand cache staleness tradeoffs
- use service identity and short-lived credentials internally
- build auditability and explainability into the system from the beginning

### Final Mental Model

If you remember one thing, remember this:

Identity and access is not one feature. It is a chain of connected systems:

- identity proof
- credential management
- session or token continuity
- authorization policy
- revocation and recovery
- service identity
- auditing and operations

Real systems succeed when all of these parts are designed together.

If one weak link exists, attackers and outages will find it.

---

## Quick Review Checklist

Use this when revising for interviews.

- Can I clearly explain AuthN vs AuthZ?
- Do I know when to use sessions vs JWTs?
- Can I explain password hashing, salts, peppers, and MFA tradeoffs?
- Can I walk through OAuth authorization code flow with PKCE?
- Can I explain SAML vs OIDC and IdP vs SP?
- Can I compare RBAC and ABAC with examples?
- Can I describe revocation, logout, token rotation, and password reset securely?
- Can I explain service-to-service auth, mTLS, and zero trust?
- Can I describe where policy enforcement should happen in a real system?

If the answer is yes to those questions, your identity and access fundamentals are strong enough for most software engineering interview discussions and practical backend design conversations.