# 2. Identity & Access

Identity and access is the control plane for nearly every backend system. It answers four questions for every request:

1. Who is calling?
2. How do we know they are really that caller?
3. What are they allowed to do right now?
4. How do we prove later that the decision was correct?

If you understand identity and access well, you can reason about login systems, sessions, JWTs, OAuth integrations, enterprise SSO, authorization policies, service-to-service security, and zero-trust architecture as one connected system rather than as isolated buzzwords.

This guide is written for two goals at the same time:

- interview preparation
- real-world backend and system design understanding

The emphasis is practical. The goal is not to memorize definitions, but to understand why these systems exist, how they fail, and how production systems are actually built.

---

## Table of Contents

1. Why Identity & Access Exists
2. Core Concepts and Mental Model
3. Authentication Fundamentals
4. Login and Signup
5. Sessions
6. JWT and Token-Based Authentication
7. OAuth
8. SSO: SAML and OIDC
9. Password Reset
10. Authorization Fundamentals
11. RBAC
12. ABAC
13. Permissions and Access Control
14. Service-to-Service Authentication
15. How These Systems Fit Together
16. Real-World Patterns and Company Examples
17. Interview Discussion Guide
18. Common Mistakes and Best Practices

---

## 1. Why Identity & Access Exists

Most systems are multi-user, multi-device, multi-service, and increasingly multi-tenant. Without identity and access controls, the backend has no safe way to distinguish:

- one user from another
- a user from an attacker
- an employee from a customer
- a production service from a compromised internal service
- a legitimate action from a replayed or forged request

At small scale, identity and access looks like a login form plus a password check. At production scale, it becomes much bigger:

- account creation and identity proofing
- credential storage and recovery
- MFA and risk detection
- sessions and token lifecycle management
- delegated access via OAuth
- enterprise federation via SSO
- role and policy evaluation
- service identity inside microservices
- auditing, revocation, key rotation, and incident response

The reason interviews ask about identity and access so often is simple: it touches security, data modeling, distributed systems, product tradeoffs, and failure handling all at once.

### The Core Tension

Identity systems always balance three goals:

| Goal | What it means | Why it is hard |
| --- | --- | --- |
| Security | Prevent impersonation and unauthorized access | Stronger security usually adds friction |
| Usability | Let real users sign in quickly and recover safely | Easier flows are often easier to abuse |
| Scalability | Support huge traffic, many services, and many tenants | Distributed state and revocation become harder |

An excellent backend engineer treats identity not as a feature checkbox, but as a reliability and security subsystem.

---

## 2. Core Concepts and Mental Model

Before discussing flows, build the right mental model.

### Important Terms

| Term | Meaning | Practical intuition |
| --- | --- | --- |
| Identity | The subject being represented | A user, admin, device, service, or organization |
| Authentication (AuthN) | Verifying who the subject is | "Prove you are Alice" |
| Authorization (AuthZ) | Deciding what that subject may do | "Can Alice read invoice 123?" |
| Session | Server-recognized authenticated continuity over time | "This browser remains logged in" |
| Access token | Credential presented to APIs | Often short-lived |
| Refresh token | Credential used to obtain new access tokens | More sensitive than access tokens |
| Identity Provider (IdP) | System that authenticates identities | Google, Okta, Azure AD |
| Service Provider / Relying Party | App that trusts the IdP | Your SaaS product |
| Policy engine | Evaluates access rules | RBAC, ABAC, ReBAC, custom rules |
| Audit log | Immutable trail of security-relevant events | Needed for forensics and compliance |

### One Request Through the System

```mermaid
sequenceDiagram
    actor User
    participant Client
    participant Edge as API Gateway / Edge
    participant Auth as Auth Service
    participant Policy as Policy Engine
    participant App as Business Service
    participant Data as Data Store

    User->>Client: Click "View invoice"
    Client->>Edge: GET /invoices/123 + cookie/token
    Edge->>Auth: Validate session/token
    Auth-->>Edge: subject, tenant, auth strength, claims
    Edge->>Policy: Can subject read invoice 123?
    Policy-->>Edge: allow/deny + reason
    Edge->>App: Forward authenticated request
    App->>Data: Load resource
    Data-->>App: Resource data
    App-->>Client: 200 OK or 403 Forbidden
```

This is the simplest correct mental model:

- authentication establishes identity
- authorization evaluates permissions for the requested action
- business logic executes only after those checks
- the decision should be observable and auditable

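The four points above can be sketched as a single request handler. This is a minimal illustration, not a production design: `SESSIONS`, `INVOICE_TENANT`, `AUDIT_LOG`, and all function names are hypothetical in-memory stand-ins for the auth service, policy engine, and audit pipeline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Subject:
    user_id: str
    tenant_id: str


# Hypothetical in-memory stand-ins for the auth service, policy data, and audit pipeline.
SESSIONS = {"sess-abc": Subject(user_id="u1", tenant_id="t1")}
INVOICE_TENANT = {"123": "t1", "456": "t2"}
AUDIT_LOG = []


def authenticate(token: str) -> Optional[Subject]:
    """AuthN: map a presented credential to a subject, or None if unknown."""
    return SESSIONS.get(token)


def authorize(subject: Subject, action: str, invoice_id: str) -> bool:
    """AuthZ: allow reads only for invoices that belong to the caller's tenant."""
    return action == "read" and INVOICE_TENANT.get(invoice_id) == subject.tenant_id


def handle_get_invoice(token: str, invoice_id: str) -> int:
    subject = authenticate(token)
    if subject is None:
        status = 401  # identity could not be established
    elif not authorize(subject, "read", invoice_id):
        status = 403  # identity is known, but the action is not permitted
    else:
        status = 200  # business logic would load and return the invoice here
    # Every decision is recorded, including denials.
    AUDIT_LOG.append({
        "subject": subject.user_id if subject else None,
        "action": "read",
        "resource": f"invoice/{invoice_id}",
        "status": status,
    })
    return status
```

Keeping `authenticate` and `authorize` as separate functions mirrors the AuthN/AuthZ split in the terms table: a 401 means "we do not know who you are", a 403 means "we know, and the answer is no".
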
### A Production Identity Stack

In a real system, identity and access usually spans these components:

| Component | Typical responsibility |
| --- | --- |
| Auth service | Login, signup, password verification, MFA, token issuance |
| User directory | Users, credentials metadata, verification state, tenant membership |
| Session store | Server-side sessions and revocation state |
| Token service | Access token and refresh token lifecycle |
| Policy engine | Role/attribute-based access decisions |
| Key management | Signing keys, encryption keys, secret rotation |
| Audit pipeline | Security events, admin actions, login failures, policy decisions |
| Risk engine | Rate limits, device reputation, fraud checks, anomaly detection |

Interview shortcut: if you can clearly separate authentication, session/token management, and authorization, you already sound more senior than candidates who collapse them into one vague "auth layer".

---

## 3. Authentication Fundamentals

Authentication is the process of verifying identity claims. The claim is usually, "I am user X" or "I am service Y".

### 3.1 Identity Verification Basics

Authentication depends on evidence. The most common categories are:

| Factor | Example | Strengths | Weaknesses |
| --- | --- | --- | --- |
| Something you know | Password, PIN | Familiar, cheap | Can be guessed, phished, reused |
| Something you have | Phone, authenticator app, hardware key | Stronger than passwords alone | Device loss, recovery complexity |
| Something you are | Fingerprint, Face ID | Convenient on-device UX | Biometric recovery and privacy concerns |

Important nuance: many systems do not verify a human's real-world identity. They verify control over a credential. For example:

- password login verifies knowledge of a password
- email verification verifies access to an inbox
- TOTP verifies possession of a seed-bound authenticator
- passkeys verify possession of a private key and user presence

That is why identity systems often talk about assurance levels rather than absolute truth.

### 3.2 Identifiers vs Authenticators

Two concepts are often mixed up:

- an identifier tells the system which subject is being referenced
- an authenticator proves control over that identity

Examples:

- `alice@example.com` is an identifier
- the password, passkey, or OAuth login is the authenticator

Production systems often support multiple identifiers for the same user:

- email
- username
- phone number
- enterprise SSO subject ID
- internal immutable user ID

Best practice: use a stable internal user ID as the true primary key, even if the login identifier changes.

### 3.3 Credential Storage

This is one of the most common interview topics because it separates surface-level knowledge from real engineering understanding.

#### Never store plaintext passwords

If a database leak reveals plaintext passwords, the incident is catastrophic. Attackers will also try the same passwords on other services because users reuse credentials.

#### Store password hashes, not passwords

The flow is:

1. User submits password.
2. Server generates a per-user salt.
3. Server applies a slow password hashing algorithm.
4. Server stores the resulting hash and metadata.
5. On login, the server recomputes the hash with the stored salt and parameters and compares it to the stored hash.

Good password hashing algorithms are intentionally expensive. That is the point. They make offline brute force attacks slower.

| Algorithm | Typical status | Why it matters |
| --- | --- | --- |
| Argon2id | Best modern default | Memory-hard and resistant to GPU attacks |
| bcrypt | Still common and acceptable | Widely supported, battle-tested |
| PBKDF2 | Common in legacy and regulated systems | Safer than fast hashes, but less ideal than Argon2id |
| SHA-256 / MD5 alone | Unsafe for password storage | Too fast, easy to brute force |

#### Salt and Pepper

| Mechanism | Purpose |
| --- | --- |
| Salt | Unique random value per password; prevents rainbow-table reuse |
| Pepper | Extra secret held outside the user table, often in KMS/HSM; raises attack cost after DB leaks |

#### Practical Storage Pattern

- store algorithm name and parameters with the hash
- use constant-time comparison to reduce timing leakage
- rehash on login when old parameters are outdated
- keep password policy reasonable; massive composition rules often lead to weaker behavior

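The storage pattern above can be sketched with the Python standard library. Argon2id and bcrypt need third-party packages, so this sketch uses PBKDF2-HMAC-SHA256 via `hashlib.pbkdf2_hmac` purely as a stand-in. The record layout, the function names, and the `CURRENT_ITERATIONS` value are illustrative assumptions, not a recommendation for any particular system.

```python
import hashlib
import hmac
import os

ALGORITHM = "pbkdf2_sha256"
CURRENT_ITERATIONS = 600_000  # assumed work factor; tune for your hardware


def hash_password(password: str, iterations: int = CURRENT_ITERATIONS) -> str:
    salt = os.urandom(16)  # unique random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # Algorithm name and parameters are stored alongside the hash.
    return f"{ALGORITHM}${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    algorithm, iterations, salt_hex, digest_hex = stored.split("$")
    if algorithm != ALGORITHM:
        return False
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    # Constant-time comparison reduces timing leakage.
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))


def needs_rehash(stored: str) -> bool:
    """True when the record was created with outdated parameters."""
    algorithm, iterations, _salt, _digest = stored.split("$")
    return algorithm != ALGORITHM or int(iterations) < CURRENT_ITERATIONS
```

On a successful login, a caller would check `needs_rehash(record)` and, if it returns true, store a fresh `hash_password(password)` while the plaintext is still in memory.
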
#### Interview depth point

If an interviewer asks, "Why use bcrypt or Argon2 instead of SHA-256?", the real answer is not just "because it is more secure". The real answer is:

- password databases are often attacked offline after leaks
- attackers can run billions of SHA-256 hashes quickly
- slow, memory-hard algorithms make each guess expensive
- cost parameters can be tuned as hardware improves

### 3.4 MFA Basics

Multi-factor authentication exists because passwords are a weak single point of failure.

Common MFA methods:

| Method | Security level | Practical notes |
| --- | --- | --- |
| SMS OTP | Low to medium | Vulnerable to SIM swap and phishing |
| Email OTP | Low | Better than nothing, but email is often the same recovery channel |
| TOTP app | Medium | Common and cheap; still phishable |
| Push approval | Medium | Good UX, but push fatigue attacks exist |
| WebAuthn / passkeys / hardware keys | High | Strong phishing resistance |

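To make the "TOTP app" row concrete, here is a minimal sketch of time-based one-time passwords as specified in RFC 6238 (HMAC-SHA1, 30-second step). The function names and the one-step drift window are assumptions for illustration; a real deployment would also rate-limit attempts and reject reuse of an already accepted code.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, at_time: float, step: int = 30, digits: int = 6) -> str:
    counter = int(at_time // step)  # number of time steps since the Unix epoch
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation, as in HOTP (RFC 4226)
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_totp(secret: bytes, submitted: str, at_time: float,
                step: int = 30, window: int = 1) -> bool:
    """Accept the current code and codes up to `window` steps either side (clock drift)."""
    for drift in range(-window, window + 1):
        candidate_time = at_time + drift * step
        if candidate_time < 0:
            continue
        if hmac.compare_digest(totp(secret, candidate_time, step=step), submitted):
            return True
    return False
```

The shared `secret` is the "seed" the authenticator app stores at enrollment. Because the code is only a function of the seed and the clock, a phishing page that relays it quickly can still use it, which is why the table marks TOTP as phishable.
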
Production systems often use risk-based MFA rather than always prompting. Typical triggers:

- new device
- new geography
- impossible travel
- admin action
- payout or billing change
- password reset or recovery event

Prompting for a stronger factor only when one of these signals fires is called step-up authentication.

#### Recovery Matters

Many teams design MFA setup but forget MFA recovery. Good systems provide:

- recovery codes
- alternate authenticators
- carefully controlled support workflows

The recovery flow is often more attackable than the MFA flow itself.

### 3.5 Email Verification

Email verification usually proves inbox control, not human identity. It exists to:

- reduce fake or mistyped accounts
- ensure password reset reachability
- protect downstream systems from garbage identities
- support trust in notifications, billing, and invites

Good implementation details:

- generate a random, single-use token
- store only a hash of the token server-side if possible
- apply a short TTL
- invalidate older outstanding verification tokens after a new one is issued
- avoid leaking whether the account exists during resend flows

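The token-handling details above can be sketched as follows. The in-memory `_pending` dictionary, the function names, and the 15-minute TTL are hypothetical choices for illustration; a real system would persist this state and pass `now` from a trusted clock.

```python
import hashlib
import hmac
import secrets

TOKEN_TTL_SECONDS = 15 * 60  # assumed TTL; tune per product

# Hypothetical store: email -> (sha256 hex digest of token, expiry timestamp).
_pending = {}


def _digest(token: str) -> str:
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def issue_verification_token(email: str, now: float) -> str:
    token = secrets.token_urlsafe(32)
    # Overwriting the entry invalidates any older outstanding token.
    _pending[email] = (_digest(token), now + TOKEN_TTL_SECONDS)
    return token  # goes into the emailed link; only the hash is stored


def redeem_verification_token(email: str, token: str, now: float) -> bool:
    entry = _pending.get(email)
    if entry is None:
        return False
    stored_digest, expires_at = entry
    if now > expires_at:
        del _pending[email]
        return False
    if not hmac.compare_digest(stored_digest, _digest(token)):
        return False
    del _pending[email]  # single use: a second redemption fails
    return True
```

Storing only the digest means a read-only leak of this table does not hand an attacker usable verification links.
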
### 3.6 Device Trust

Device trust tries to answer, "Is this a previously seen, low-risk device?"

Typical signals:

- long-lived device cookie
- browser fingerprinting or device metadata
- last successful MFA on that device
- IP reputation and ASN patterns
- OS or app attestation on mobile

Device trust is useful, but dangerous if over-trusted. Devices can be compromised. Cookies can be stolen. Browsers change. Treat device trust as a risk signal, not a source of truth.

### Authentication Failure Cases

- weak password hashing leads to offline cracking after DB leaks
- email verification links are reusable or never expire
- MFA recovery bypasses stronger checks
- account enumeration leaks whether an email exists
- social login accounts are linked incorrectly to existing local accounts
- device trust becomes an authorization shortcut instead of a risk signal

### Authentication Best Practices

- prefer Argon2id or bcrypt for passwords
- rate-limit login, signup, reset, and verification endpoints
- use MFA for privileged users and step-up auth for sensitive actions
- log auth events with context, but never log secrets or raw passwords
- design credential rotation and recovery before launch, not after an incident

---

## 4. Login and Signup

Signup and login flows are the public entry points to your system. They are also some of the most attacked endpoints you will ever run.

### 4.1 Signup Flow

```mermaid
sequenceDiagram
    actor User
    participant Browser
    participant Auth as Auth API
    participant Risk as Risk / Abuse Service
    participant Users as User DB
    participant Mail as Email Service
    participant Session as Session Store

    User->>Browser: Submit email + password
    Browser->>Auth: POST /signup
    Auth->>Risk: Check IP, velocity, disposable email, device
    Risk-->>Auth: risk score / allow / challenge
    Auth->>Users: Create pending account + password hash
    Auth->>Mail: Send verification link
    Mail-->>User: Verification email
    User->>Browser: Click link
    Browser->>Auth: GET /verify?token=...
    Auth->>Users: Mark email verified
    Auth->>Session: Create session
    Auth-->>Browser: Set secure auth cookie
```

#### What actually happens in production

A robust signup flow usually includes:

1. Input normalization
   Normalize email casing carefully, trim whitespace, and reject obviously malformed values.
2. Abuse screening
   IP reputation, rate limits, disposable email detection, CAPTCHA when needed, device velocity, and signup bursts by network.
3. Account creation state
   Many systems create users in a `pending_verification` state first.
4. Email verification
   The account may exist but have limited capabilities until verified.
5. Bootstrap domain objects
   For SaaS, create workspace, tenant, default role, billing state, and onboarding tasks.
6. Initial session issuance
   Some systems log the user in immediately after verification. Others require explicit login.

#### Why pending state matters

If you create fully active accounts before verification, you may end up with:

- abandoned fake tenants
- spammed invites or API abuse
- polluted analytics and billing pipelines

### 4.2 Login Flow

The login flow is simpler than signup conceptually, but much more operationally sensitive.

Common steps:

1. Identify account by email/username/federated ID.
2. Fetch credential metadata and account status.
3. Verify password or federated assertion.
4. Evaluate account risk and MFA policy.
5. Create session or issue tokens.
6. Log success or failure for audit and anomaly detection.

A production login decision often depends on more than a password:

- account locked or disabled?
- tenant suspended?
- email verified?
- MFA enrolled?
- device known?
- unusual geography?
- refresh token family compromised?

### 4.3 Signup Verification and Fraud Prevention Basics

Fraud prevention is not just a payments problem. Identity systems are abused for:

- spam account creation
- credential stuffing
- promo abuse
- referral fraud
- fake trial creation
- scraping and automated signups

Basic but effective controls:

| Control | What it helps with |
| --- | --- |
| Rate limiting by IP and identifier | Brute force and signup bursts |
| Device and IP reputation | Known bad networks and bots |
| CAPTCHA or challenge step-up | Automated abuse at suspicious thresholds |
| Email domain heuristics | Disposable inboxes, typo domains |
| Phone verification for high-risk cases | Raises attacker cost |
| Idempotency keys on signup APIs | Retry safety without duplicate accounts |

Interview point: fraud controls are part of auth architecture because attackers do not politely separate "security" from "growth" endpoints.

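As an illustration of the first control in the table, rate limiting by IP and by identifier, here is a minimal sliding-window limiter. The class name, the limits, and the in-process dictionaries are assumptions for the sketch; a fleet of app servers would need shared state (for example in Redis) for the counts to be meaningful.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` events per key within the last `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of recent allowed events

    def allow(self, key: str, now: float) -> bool:
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and hits[0] <= now - self.window_seconds:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


# Hypothetical limits: generous per IP (shared NATs), strict per target account.
by_ip = SlidingWindowLimiter(limit=20, window_seconds=60)
by_identifier = SlidingWindowLimiter(limit=5, window_seconds=300)


def allow_login_attempt(ip: str, email: str, now: float) -> bool:
    # Both limits must pass; the identifier is normalized so casing cannot bypass it.
    return by_ip.allow(ip, now) and by_identifier.allow(email.strip().lower(), now)
```

Limiting on both keys matters because credential stuffing spreads attempts across many IPs against one account, while signup bursts often come from one network against many identifiers.
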
### 4.4 Social Login Considerations

"Login with Google" or "Login with GitHub" improves user experience, but introduces federation complexity.

Benefits:

- no local password to manage
- faster onboarding
- higher conversion for some user segments

Risks and edge cases:

- provider outage affects sign-in
- incorrect account linking can cause account takeover
- email from provider may be unverified or not globally unique in the way you assume
- enterprise customers may not want personal social identities linked to business workspaces

Best practice for account linking:

- if a social identity is new, do not blindly attach it to a local account just because the email matches
- require proof of control or signed-in confirmation before linking to an existing account

### 4.5 Onboarding Architecture

Signup is not just about auth. It often triggers business setup:

- create personal or team workspace
- assign owner role
- seed settings and notification preferences
- create billing customer object
- publish analytics and onboarding events

This makes signup a distributed workflow. Real systems often handle it with:

- synchronous creation for the minimum needed to log in
- async events for non-critical setup
- idempotent consumers to avoid duplicate workspaces or billing objects

### Login and Signup Failure Cases

- verification emails delayed or blocked, leaving users in limbo
- duplicate accounts created because signup is not idempotent
- support team manually verifies accounts in insecure ways
- social and password accounts merge incorrectly
- signup path leaks which emails already exist

### Login and Signup Best Practices

- keep the critical path small and reliable
- separate abuse checks from core credential logic, but make them part of the final decision
- use generic error messages externally and detailed audit logs internally
- make signup and login events observable with metrics and tracing

---

## 5. Sessions

Sessions are the classic way to keep users logged in across multiple HTTP requests.

### 5.1 What a Session Really Is

A session means the server has already authenticated the user and stores an authenticated state keyed by a session identifier.

Typical flow:

1. User logs in successfully.
2. Server creates a session record.
3. Server sends the client a session ID in a cookie.
4. Client sends the cookie on future requests.
5. Server looks up session state and reconstructs identity.

### 5.2 Server-Side Sessions

In server-side session architecture, the browser usually only stores an opaque identifier.

Example session data:

- user ID
- tenant ID
- auth strength or MFA state
- issued time and last activity time
- device metadata
- CSRF-related state

Advantages:

- easy revocation
- easy logout across devices
- server fully controls state
- easy to add security flags or session versioning

Disadvantages:

- needs a session store lookup
- requires shared state across app instances
- harder to scale if poorly designed

### 5.3 Redis-Backed Sessions

Redis is a very common session backend because it is fast, supports TTL, and works well as shared ephemeral state.

```mermaid
flowchart LR
    Browser[Browser with secure cookie] --> LB[Load Balancer]
    LB --> App1[App Instance A]
    LB --> App2[App Instance B]
    App1 --> Redis[(Redis Session Store)]
    App2 --> Redis
    Redis --> Audit[Audit / Security Events]
```

Why Redis is popular for sessions:

- low-latency reads and writes
- TTL expiration built in
- simple key-value model
- easy fit for horizontally scaled app fleets

Scaling considerations:

- shard or cluster if session volume is high
- replicate carefully; understand failover and session loss behavior
- monitor hot keys and uneven access patterns
- decide whether to refresh the TTL on every request (sliding expiration), only at intervals, or not at all (absolute expiration)

### 5.4 Cookie Security

Session security depends heavily on cookie configuration.

| Cookie attribute | Why it matters |
| --- | --- |
| `HttpOnly` | Prevents JavaScript from reading the cookie, reducing XSS impact |
| `Secure` | Sends cookie only over HTTPS |
| `SameSite=Lax/Strict` | Reduces CSRF risk from cross-site requests |
| Domain scoping | Prevents unintended subdomain sharing |
| Path scoping | Limits where the cookie is sent |
| Expiry / Max-Age | Controls session persistence |

Important nuance:

- `HttpOnly` helps against token theft by frontend JavaScript
- `SameSite` helps against CSRF
- neither one fixes everything if the app has deeper logic flaws

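The attributes in the table can be set with the standard library's `http.cookies` module. This is a small sketch of building a `Set-Cookie` header value; the cookie name `session_id`, the helper name, and the one-hour lifetime are illustrative assumptions (the `samesite` attribute requires Python 3.8 or newer).

```python
from http.cookies import SimpleCookie


def build_session_cookie(session_id: str, max_age_seconds: int = 3600) -> str:
    """Return the value for a Set-Cookie header carrying an opaque session ID."""
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    morsel = cookie["session_id"]
    morsel["httponly"] = True   # not readable from JavaScript
    morsel["secure"] = True     # only sent over HTTPS
    morsel["samesite"] = "Lax"  # withheld on most cross-site subrequests
    morsel["path"] = "/"
    morsel["max-age"] = max_age_seconds
    # No Domain attribute: the cookie stays host-only and is not shared with subdomains.
    return morsel.OutputString()
```

Web frameworks usually expose the same attributes through their own response API, so in practice this is configuration rather than hand-built headers.
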
### 5.5 Session Invalidation

Session invalidation is one reason server-side sessions remain attractive.

You can revoke sessions when:

- user logs out
- password changes
- MFA is reset
- admin disables the account
- suspicious activity is detected

Common implementation patterns:

- delete the session record outright
- mark session version or user auth version and reject old versions
- keep a device/session list per user for device management UI

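Sections 5.2 through 5.5 can be tied together in a small sketch of a server-side session store with idle and absolute timeouts plus revocation. A plain dictionary stands in for Redis here, and the class, method names, and timeout values are assumptions for illustration only.

```python
import secrets
from typing import Optional

IDLE_TIMEOUT_SECONDS = 30 * 60           # assumed sliding (idle) timeout
ABSOLUTE_TIMEOUT_SECONDS = 12 * 60 * 60  # assumed hard cap on session age


class SessionStore:
    """In-memory stand-in for a shared session backend such as Redis."""

    def __init__(self):
        self._sessions = {}  # opaque session ID -> session record

    def create(self, user_id: str, now: float) -> str:
        session_id = secrets.token_urlsafe(32)  # opaque, unguessable identifier
        self._sessions[session_id] = {
            "user_id": user_id,
            "created_at": now,
            "last_seen": now,
        }
        return session_id

    def get(self, session_id: str, now: float) -> Optional[str]:
        record = self._sessions.get(session_id)
        if record is None:
            return None
        idle = now - record["last_seen"]
        age = now - record["created_at"]
        if idle > IDLE_TIMEOUT_SECONDS or age > ABSOLUTE_TIMEOUT_SECONDS:
            del self._sessions[session_id]
            return None
        record["last_seen"] = now  # sliding expiration: activity extends the session
        return record["user_id"]

    def revoke(self, session_id: str) -> None:
        """Single-device logout: delete the record outright."""
        self._sessions.pop(session_id, None)

    def revoke_all_for_user(self, user_id: str) -> None:
        """Global logout, e.g. after a password change or account disable."""
        doomed = [sid for sid, rec in self._sessions.items() if rec["user_id"] == user_id]
        for sid in doomed:
            del self._sessions[sid]
```

Scanning every session in `revoke_all_for_user` is fine for a sketch; the per-user session list mentioned above exists precisely so that a real store can find a user's sessions without a full scan.
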
### 5.6 Logout Challenges

Logout sounds trivial, but it is easy to implement incompletely.

Problems include:

- logout only clears client cookie but leaves server session valid
- user has multiple active devices and expects global logout
- session persists in mobile apps with long polling or background refresh
- cached pages or in-flight requests still complete after logout

Good logout design answers:

- single device logout or all devices?
- immediate revocation or eventual consistency?
- what about concurrent refresh operations?

### 5.7 Session Security Issues

| Problem | Meaning | Mitigation |
| --- | --- | --- |
| Session fixation | Attacker forces victim to use known session ID | Regenerate session ID after login |
| CSRF | Browser auto-sends cookies on forged cross-site requests | `SameSite`, CSRF tokens, origin checks |
| Session hijacking | Session token is stolen | HTTPS, `HttpOnly`, device/risk checks, short idle timeouts |
| Store outage | Session backend unavailable | Fallback behavior, multi-AZ design, graceful degradation |

### Sessions in Interviews

A good interview answer on sessions usually includes:

- opaque session ID in secure cookie
- shared store like Redis
- session regeneration after login
- revocation and logout semantics
- CSRF protections
- sliding vs absolute expiration tradeoff

---

## 6. JWT and Token-Based Authentication

JWTs are one of the most discussed and most misunderstood identity topics.

### 6.1 What a JWT Is

JWT stands for JSON Web Token. It is a compact, self-contained token format commonly used to carry claims.

A JWT typically has three parts:

`header.payload.signature`

- header: algorithm and metadata
- payload: claims such as subject, issuer, audience, expiry
- signature: proves integrity if signed correctly

Important practical truth: signed JWTs are not secret by default. They are encoded, not hidden. Anyone holding the token can base64url-decode the payload and read the claims.

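To make the three-part structure concrete, here is a hand-rolled HS256 example using only the standard library. It is meant to show that the payload is readable without the key while the signature still detects tampering; the helper names are hypothetical, and production code would normally rely on a maintained JWT library rather than this sketch.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(claims: dict, key: bytes) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}, separators=(",", ":")).encode())
    payload = b64url(json.dumps(claims, separators=(",", ":")).encode())
    signing_input = f"{header}.{payload}".encode("ascii")
    signature = b64url(hmac.new(key, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{signature}"


def read_claims_without_key(token: str) -> dict:
    """Anyone holding the token can do this: no key is involved."""
    return json.loads(b64url_decode(token.split(".")[1]))


def verify_hs256(token: str, key: bytes) -> bool:
    """Integrity check only; claim validation (exp, iss, aud) is a separate step."""
    header, payload, signature = token.split(".")
    expected = b64url(hmac.new(key, f"{header}.{payload}".encode("ascii"), hashlib.sha256).digest())
    return hmac.compare_digest(expected, signature)
```

Note that `verify_hs256` fixes the algorithm in code instead of trusting the `alg` value in the token header, since the header is attacker-controlled input until the signature has been checked.
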
### 6.2 Signing vs Encryption

| Mechanism | What it guarantees | Practical meaning |
| --- | --- | --- |
| Signing (JWS) | Integrity and authenticity | Token was issued by trusted signer and not modified |
| Encryption (JWE) | Confidentiality | Token contents are hidden from intermediaries/clients |

Most production JWT usage is signed, not encrypted.

That means:

- do not put secrets in JWT payloads
- do not put more PII than necessary
- use claims for identity and authorization hints, not as a dumping ground

### 6.3 Access Tokens vs Refresh Tokens

| Token type | Lifetime | Used by | Main purpose |
| --- | --- | --- | --- |
| Access token | Short-lived | APIs | Authorize a request |
| Refresh token | Longer-lived | Auth client / backend | Obtain new access tokens |

Best practice:

- keep access tokens short-lived
- treat refresh tokens as highly sensitive credentials
- store refresh tokens more carefully than access tokens

### 6.4 Why Teams Use JWTs

Benefits:

- easy for distributed services to verify locally
- no session store lookup on every request if verification is local
- good fit for API ecosystems and delegated access
- works well across domains and service boundaries

Costs:

- revocation is harder
- permissions embedded in tokens can become stale
- key rotation and issuer validation must be done correctly
- token size can grow dangerously if you stuff too many claims inside

### 6.5 Token Rotation

Refresh token rotation is a major real-world security mechanism.

Idea:

- every refresh use invalidates the previous refresh token
- the auth server issues a new refresh token and new access token
- if an old refresh token is reused, the server assumes theft and can revoke the token family

```mermaid
sequenceDiagram
    actor User
    participant Client
    participant Auth as Auth Server
    participant Store as Token Store

    User->>Client: Continue using app
    Client->>Auth: POST /token/refresh with refresh token
    Auth->>Store: Validate token family and prior use
    Store-->>Auth: valid / reused / revoked
    Auth-->>Client: New access token + new refresh token
    Auth->>Store: Mark old token used, persist new token state
```

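The rotation idea and the diagram above can be sketched as a small token store with reuse detection. The class and method names are hypothetical, and the store keeps raw tokens in memory only to stay short; a real implementation would persist hashes of the tokens and issue an access token alongside each new refresh token.

```python
import secrets
from typing import Optional


class RefreshTokenStore:
    """Tracks refresh tokens by family so reuse of an old token revokes the family."""

    def __init__(self):
        self._tokens = {}               # token -> {"family": str, "used": bool}
        self._revoked_families = set()

    def issue(self, family_id: Optional[str] = None) -> str:
        # A new login starts a new family; a rotation continues an existing one.
        family = family_id or secrets.token_hex(8)
        token = secrets.token_urlsafe(32)
        self._tokens[token] = {"family": family, "used": False}
        return token

    def rotate(self, token: str) -> Optional[str]:
        """Exchange a refresh token for a new one, or return None if it is rejected."""
        record = self._tokens.get(token)
        if record is None or record["family"] in self._revoked_families:
            return None
        if record["used"]:
            # An already rotated token came back: assume theft, revoke the family.
            self._revoked_families.add(record["family"])
            return None
        record["used"] = True
        return self.issue(record["family"])
```

After a reuse is detected, both the attacker and the legitimate client lose their tokens and the user must sign in again. That is the intended tradeoff: the server cannot tell which party held the stolen copy.
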
### 6.6 Revocation Challenges

Revocation is the biggest practical downside of stateless tokens.

If an access token is self-contained and valid until `exp`, then after it is issued:

- the user may be disabled
- permissions may change
- a tenant may be suspended
- the token may be stolen

But the token may still verify cryptographically.

Mitigations:

- short access token TTLs
- refresh token rotation
- revocation list or denylist for critical cases
- user/session version claim checked against server state
- opaque tokens with introspection for high-control environments

### 6.7 Stateless Auth Tradeoffs

This is a favorite interview question: "Should I use JWT or sessions?"

The mature answer is not dogmatic. It depends.

| Topic | Server-side sessions | JWT |
| --- | --- | --- |
| Request-time state lookup | Usually yes | Not always |
| Easy revocation | Yes | Harder |
| Cross-service portability | Moderate | Strong |
| Simplicity for web apps | Often simpler | Often overused |
| Risk of stale claims | Lower | Higher |
| CSRF concern if cookie-based | Yes | Yes if stored in cookies |
| XSS risk if JS-accessible storage | Lower with `HttpOnly` cookies | Higher if stored in localStorage |

A practical rule:

- for traditional web apps, server-side sessions are often simpler and safer
- for API ecosystems, third-party integrations, and distributed service verification, tokens are often the better fit

### 6.8 Common JWT Mistakes

- storing JWTs in `localStorage` without carefully thinking through XSS risk
- placing roles and permissions in long-lived tokens and forgetting they go stale
- not checking `iss`, `aud`, `exp`, `nbf`, and key identifiers properly
- using symmetric signing keys everywhere and spreading them across many services
- putting secrets or excessive PII in token payloads

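The claim checks named in the list above (`iss`, `aud`, `exp`, `nbf`) can be sketched as a single function that runs after the signature has already been verified. The function name, the returned error strings, and the 30-second clock-skew leeway are assumptions for illustration.

```python
from typing import Optional


def validate_claims(claims: dict, expected_issuer: str, expected_audience: str,
                    now: float, leeway: float = 30) -> Optional[str]:
    """Return None if the claims are acceptable, else a short reason string."""
    if claims.get("iss") != expected_issuer:
        return "unexpected issuer"

    # `aud` may be a single string or a list of strings.
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if expected_audience not in audiences:
        return "token not intended for this audience"

    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or now >= exp + leeway:
        return "token expired"  # a missing exp is treated as expired

    nbf = claims.get("nbf")
    if nbf is not None and now < nbf - leeway:
        return "token not yet valid"

    return None
```

The audience check is the one most often skipped: without it, a token minted for one internal API is accepted by every other service that trusts the same issuer.
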
### JWT Best Practices

- prefer asymmetric signing for shared verification environments
- expose public keys via a JWKS endpoint if multiple verifiers exist
- keep access tokens short-lived
- rotate signing keys safely and support key overlap during rotation
- use opaque tokens or introspection if real-time revocation is a hard requirement

---

## 7. OAuth

OAuth solves delegated authorization. It lets one application access another application's resources on behalf of a user without receiving the user's password.

### 7.1 The Problem OAuth Solves

Without OAuth, a user might give App A their password for App B. That is unacceptable because:

- App A can now do anything the user can do
- App B cannot scope access cleanly
- the user cannot revoke just that delegated access safely

OAuth introduces a safer model:

- user authenticates with the authorization server / IdP
- user consents to limited scopes
- client receives tokens with bounded permissions

### 7.2 Authorization Code Flow with PKCE

This is the modern default for browser and mobile-friendly public clients.

```mermaid
sequenceDiagram
    actor User
    participant Client as SaaS App
    participant Browser
    participant AS as Authorization Server / IdP
    participant API as Third-Party API

    User->>Client: Click "Connect Google Drive"
    Client->>Browser: Redirect to /authorize + scope + code_challenge
    Browser->>AS: Login and grant consent
    AS-->>Browser: Redirect back with authorization code
    Browser->>Client: Deliver authorization code
    Client->>AS: Exchange code + code_verifier
    AS-->>Client: Access token (+ refresh token)
    Client->>API: Call API with access token
    API-->>Client: Protected resource data
```

#### Why PKCE exists

PKCE protects the code exchange step so a stolen authorization code is less useful. It is critical for public clients such as SPAs and mobile apps.

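The `code_challenge` and `code_verifier` values in the diagram are related by a SHA-256 hash, as defined by the `S256` method in RFC 7636. The sketch below shows both sides of that check; the function names are hypothetical, and the rest of the flow (redirects, state parameter, token endpoint) is omitted.

```python
import base64
import hashlib
import hmac
import secrets


def make_code_verifier() -> str:
    """Client side: a fresh high-entropy secret per authorization request."""
    return secrets.token_urlsafe(32)  # 43 URL-safe characters


def code_challenge_s256(code_verifier: str) -> str:
    """Client side: the value sent with the /authorize redirect."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def server_accepts_verifier(stored_challenge: str, presented_verifier: str) -> bool:
    """Authorization server side: run at the token endpoint during code exchange."""
    return hmac.compare_digest(stored_challenge, code_challenge_s256(presented_verifier))
```

An attacker who intercepts the authorization code on the redirect sees only the challenge, not the verifier, so the code alone cannot be exchanged for tokens.
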
### 7.3 Scopes

Scopes define the breadth of access. Examples:

- `read:user`
- `repo:write`
- `payments:refunds`
- `calendar.readonly`

Good scope design is product design plus security design.

If scopes are too broad:

- users lose trust
- integrations become over-privileged
- incident blast radius increases

If scopes are too granular:

- consent screens become confusing
- implementation complexity rises
- developers ask for full access anyway

### 7.4 Consent Screens
|
|
|
|
Consent is the user-visible manifestation of delegated access.
|
|
|
|
Good consent screens answer:
|
|
|
|
- who is requesting access?
|
|
- to which data or actions?
|
|
- for how long?
|
|
- can the user revoke later?
|
|
|
|
This matters a lot in SaaS ecosystems like Google Workspace, GitHub Apps, or Slack apps.
|
|
|
|
### 7.5 Refresh Tokens in OAuth
|
|
|
|
Long-running integrations often need refresh tokens so they can keep calling APIs without asking the user to re-consent constantly.
|
|
|
|
Refresh token concerns:
|
|
|
|
- high-value credential theft risk
|
|
- need for rotation and revocation
|
|
- tenant admins may want centralized revocation controls
|
|
|
|
### 7.6 Third-Party Integrations
|
|
|
|
In real SaaS systems, OAuth is often used for:
|
|
|
|
- connecting Google Drive, GitHub, Slack, Salesforce, Stripe, or Dropbox
|
|
- importing or exporting data
|
|
- posting to external systems on behalf of the user or workspace
|
|
|
|
Architectural consequences:
|
|
|
|
- store provider account linkage metadata
|
|
- encrypt or otherwise protect provider refresh tokens
|
|
- model scopes per installation or workspace
|
|
- surface admin controls for revocation and reauthorization
|
|
|
|
### 7.7 OAuth vs Authentication
|
|
|
|
OAuth is about authorization. Authentication is not the original purpose of OAuth.
|
|
|
|
However, many products use OAuth plus an identity layer such as OpenID Connect to support "Sign in with Google".
|
|
|
|
Interview nuance: saying "OAuth is login" is incomplete. Better answer:
|
|
|
|
- OAuth is delegated authorization
|
|
- OIDC adds identity information for authentication use cases
|
|
|
|
### OAuth Failure Cases
|
|
|
|
- client stores provider tokens insecurely
|
|
- redirect URI validation is weak
|
|
- state parameter not used correctly, enabling CSRF-like attacks in auth flows
|
|
- scopes are excessively broad
|
|
- tenants cannot audit or revoke third-party access easily
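
Two of the failure cases above, weak redirect URI validation and a missing `state` check, can be illustrated in a few lines. This is a sketch under stated assumptions: the registry `REGISTERED_REDIRECT_URIS`, the client ID, and the URLs are hypothetical, and a real system would load registrations from its client database.

```python
import secrets
from typing import Optional

# Hypothetical registry: client_id -> redirect URIs registered in advance.
REGISTERED_REDIRECT_URIS = {
    "saas-app": {"https://app.example.com/oauth/callback"},
}


def redirect_uri_allowed(client_id: str, redirect_uri: str) -> bool:
    """Exact string match only: no prefix, wildcard, or substring matching."""
    return redirect_uri in REGISTERED_REDIRECT_URIS.get(client_id, set())


def new_state() -> str:
    """Unguessable value the client stores in the user's session before redirecting."""
    return secrets.token_urlsafe(32)


def state_matches(session_state: Optional[str], returned_state: Optional[str]) -> bool:
    """Reject the callback unless the returned state equals the stored one."""
    if not session_state or not returned_state:
        return False
    # Compare as bytes so attacker-supplied non-ASCII input cannot raise.
    return secrets.compare_digest(session_state.encode(), returned_state.encode())
```

Exact matching is deliberately strict: prefix or wildcard matching is a common way authorization codes end up delivered to attacker-controlled URLs.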

---

## 8. SSO: SAML and OIDC

Enterprise customers often do not want each SaaS app to manage a separate corporate password. They want central identity, central policy, and controlled employee access. That is where SSO comes in.

### 8.1 Identity Provider vs Service Provider

| Role | Meaning |
| --- | --- |
| Identity Provider (IdP) | System that authenticates the employee, such as Okta, Microsoft Entra ID (formerly Azure AD), Google Workspace |
| Service Provider (SP) / Relying Party (RP) | The SaaS application that trusts the IdP |

### 8.2 SAML Basics

SAML is older, XML-based, and still heavily used in enterprise environments.

Mental model:

- user tries to access the SaaS app
- SaaS redirects user to corporate IdP
- IdP authenticates user
- IdP sends signed assertion back to SaaS
- SaaS creates a local session

Strengths:

- entrenched in enterprise IT
- widely supported by corporate identity systems

Costs:

- XML complexity
- harder developer ergonomics
- trickier debugging and implementation compared with OIDC

### 8.3 OIDC Basics

OpenID Connect is an identity layer on top of OAuth 2.0.

It provides:

- ID tokens with identity claims
- standardized login flows
- better fit for modern web and mobile apps

OIDC is usually easier to work with than SAML for modern applications.

### 8.4 SAML vs OIDC

| Topic | SAML | OIDC |
| --- | --- | --- |
| Typical format | XML assertions | JSON tokens |
| Common use case | Enterprise browser SSO | Modern app login and API ecosystems |
| Developer ergonomics | Heavier | Easier |
| Mobile/API friendliness | Weaker | Stronger |

### 8.5 Enterprise Architecture

```mermaid
flowchart LR
    Employee[Employee] --> SaaS[Your SaaS App]
    SaaS --> IdP[Enterprise IdP]
    IdP --> SaaS
    IdP --> Directory[Corporate Directory]
    IdP --> SCIM[Provisioning / SCIM]
    SCIM --> SaaS
    SaaS --> Policy[Workspace Roles and Policies]
```

In production, enterprise identity usually includes two separate concerns:

- authentication and SSO
- lifecycle management and provisioning

Provisioning is often handled with SCIM or similar directory sync mechanisms so the SaaS app knows:

- who exists
- which groups they belong to
- who has been deprovisioned

### 8.6 Common Enterprise Requirements

- just-in-time user creation on first login
- domain verification to prove company ownership
- group-to-role mapping
- forced MFA at IdP level
- admin-controlled session duration
- audit logs for all SSO events

### 8.7 Failure Cases

- bad mapping from IdP groups to app roles causes privilege escalation
- employee is disabled in IdP but app keeps old sessions alive too long
- email is used as unique identity key and later changes
- multiple IdPs or merged companies create ambiguous identity mapping

### SSO Best Practices

- use stable external subject identifiers, not just email
- model tenant-specific SSO config cleanly
- separate authentication trust from authorization mapping inside the app
- deprovision aggressively and revoke old sessions when identity status changes
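
The first and third practices can be sketched together: key the local account on a stable `(tenant, issuer, subject)` triple and keep group-to-role mapping as separate, tenant-specific data. Everything named here (`FederatedIdentity`, `UserDirectory`, `GROUP_ROLE_MAP`, the tenant and group names) is a hypothetical illustration, not a real library API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class FederatedIdentity:
    tenant_id: str
    issuer: str   # IdP entity ID or OIDC issuer URL
    subject: str  # stable subject identifier asserted by the IdP


class UserDirectory:
    """In-memory stand-in for the user table."""

    def __init__(self) -> None:
        self._by_identity: Dict[FederatedIdentity, str] = {}
        self.emails: Dict[str, str] = {}

    def resolve(self, identity: FederatedIdentity, email: str) -> str:
        """Find or just-in-time create the local user for this identity."""
        user_id = self._by_identity.get(identity)
        if user_id is None:
            user_id = f"user-{len(self._by_identity) + 1}"
            self._by_identity[identity] = user_id
        self.emails[user_id] = email  # email is a mutable attribute, not the key
        return user_id


# Hypothetical per-tenant mapping from IdP groups to application roles.
GROUP_ROLE_MAP: Dict[str, Dict[str, str]] = {
    "tenant-a": {"eng-admins": "workspace_admin", "eng": "editor"},
}


def roles_for_groups(tenant_id: str, groups: Iterable[str]) -> Set[str]:
    """Unknown groups grant nothing; mapping never crosses tenants."""
    mapping = GROUP_ROLE_MAP.get(tenant_id, {})
    return {mapping[g] for g in groups if g in mapping}
```

With this shape, an employee whose email changes after a rename still resolves to the same local user, and a group name that happens to exist in another tenant's IdP grants no role.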

---

## 9. Password Reset

Password reset is a high-risk recovery flow. Attackers love it because it often bypasses normal login defenses.

### 9.1 Secure Token Flow

```mermaid
sequenceDiagram
    actor User
    participant App
    participant Auth as Auth Service
    participant Users as User DB
    participant Reset as Reset Token Store
    participant Mail as Email Service

    User->>App: Click "Forgot password"
    App->>Auth: POST /password-reset
    Auth->>Users: Lookup account
    Auth->>Reset: Store hashed single-use token + expiry
    Auth->>Mail: Send password reset link
    Auth-->>User: Generic success response
    User->>App: Open reset link
    App->>Auth: POST /password-reset/confirm token + new password
    Auth->>Reset: Validate token unused and unexpired
    Auth->>Users: Update password hash
    Auth->>Reset: Mark token used
    Auth-->>App: Success + revoke other sessions
```

### 9.2 Why This Design Exists

Password reset has to be secure even if the attacker knows the user's email address. Therefore the reset token must be:

- hard to guess
- short-lived
- single-use
- revocable

Good systems also revoke active sessions or require re-authentication after password reset.

### 9.3 Attack Prevention

| Threat | Mitigation |
| --- | --- |
| Account enumeration | Return generic responses like "If an account exists, email sent" |
| Token guessing | Long random tokens, rate limits |
| Token replay | Single-use storage and invalidation |
| Email inbox compromise | Step-up verification for high-value actions after reset |
| Old session persistence | Revoke sessions after reset |

### 9.4 Practical Advice

- prefer opaque reset tokens over stuffing reset state into a long-lived JWT
- hash reset tokens at rest if you store them server-side
- keep TTL short, often 15 to 60 minutes depending on product sensitivity
- notify users when a reset is requested and completed

### Password Reset Failure Cases

- reset token is reusable
- old sessions remain active after password change
- reset endpoint leaks whether account exists
- support team bypasses the secure flow with weak manual procedures

---

## 10. Authorization Fundamentals

Authentication answers who the subject is. Authorization answers what that subject may do.

### 10.1 AuthN vs AuthZ

This distinction matters a lot.

| Question | Category |
| --- | --- |
| "Who are you?" | Authentication |
| "Are you allowed to do this?" | Authorization |
| "How sure are we?" | Authentication strength / assurance |
| "Why was access denied?" | Authorization decision and audit |

A user can be perfectly authenticated and still not be authorized.

### 10.2 Authorization Decision Shape

Every authZ decision is some variation of:

`Can subject S perform action A on resource R under context C?`

Where context may include:

- tenant
- time of day
- network zone
- device trust level
- MFA level
- resource ownership
- subscription plan
- legal region or data residency constraints
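
The decision shape above maps naturally onto a request object and a decision that carries a reason. A minimal sketch; the context keys (`subject_tenant`, `resource_tenant`, `mfa_recent`) and the two rules are assumptions chosen for illustration, not a complete policy.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class AuthzRequest:
    subject: str
    action: str
    resource: str
    context: Dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str  # machine-readable reason code for audit and support


def authorize(req: AuthzRequest) -> Decision:
    """Evaluate S, A, R, C and always explain the outcome."""
    ctx = req.context
    subject_tenant = ctx.get("subject_tenant")
    if subject_tenant is None or subject_tenant != ctx.get("resource_tenant"):
        return Decision(False, "tenant_mismatch")
    if req.action == "invoice.refund" and not ctx.get("mfa_recent", False):
        return Decision(False, "mfa_required")
    return Decision(True, "policy_allow")
```

Returning a reason code rather than a bare boolean is what later makes "why was access denied?" answerable.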

### 10.3 Enforcement Layers

Authorization can happen at multiple layers:

| Layer | Good for | Risk if overused |
| --- | --- | --- |
| API gateway | Coarse access checks, authentication, token validation | Too coarse for resource-specific rules |
| Service layer | Business-specific rules | Easy to duplicate logic across services |
| Data access layer | Row/tenant isolation, final enforcement | Hard to express all product rules here |
| Database native policies | Strong last line of defense in some systems | App logic can still drift if not modeled carefully |

A common mistake is doing all authorization only at the edge. Edge checks are useful, but most real product rules depend on resource-specific business logic deeper inside the system.

### 10.4 Policy Design

Good policy design balances three things:

- expressiveness
- debuggability
- operational simplicity

Ask these questions:

- who is the subject?
- what resource is being accessed?
- what action is requested?
- what context matters?
- who can change the policy?
- how do we explain and audit the decision?

### 10.5 Auditing and Explainability

Authorization is not just about allow or deny. In production you often need:

- reason codes
- which policy matched
- who granted the permission
- when the permission changed
- evidence for support, compliance, and incident response

This is why mature systems treat authorization as both a runtime path and a data model.

---

## 11. RBAC

RBAC stands for role-based access control. Permissions are grouped into roles, and subjects are assigned roles.

### 11.1 Why RBAC Exists

Without roles, you would assign individual permissions to every user. That becomes unmanageable quickly.

RBAC simplifies administration:

- `viewer`
- `editor`
- `admin`
- `billing_admin`

Instead of attaching dozens of permissions directly to users, you attach permissions to roles and roles to users.

### 11.2 Basic Model

| Entity | Example |
| --- | --- |
| Permission | `invoice.read`, `invoice.refund`, `workspace.invite` |
| Role | `support_agent`, `workspace_admin` |
| Assignment | User U has role R in tenant T |

Tenant scoping is critical. In multi-tenant SaaS, a user is rarely just "an admin" globally. They are usually an admin in a specific workspace or organization.
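
The three entities in the table translate directly into two lookups. A sketch with in-memory dicts standing in for database tables; the role contents and the users `alice` and `bob` are illustrative assumptions.

```python
from typing import Dict, Set, Tuple

# Role -> permissions (permission names taken from the table above).
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "viewer": {"invoice.read"},
    "support_agent": {"invoice.read", "invoice.refund"},
    "workspace_admin": {"invoice.read", "invoice.refund", "workspace.invite"},
}

# Assignment: (user, tenant) -> roles. The tenant is part of the key,
# so a role held in one tenant says nothing about any other tenant.
ASSIGNMENTS: Dict[Tuple[str, str], Set[str]] = {
    ("alice", "tenant-a"): {"workspace_admin"},
    ("alice", "tenant-b"): {"viewer"},
    ("bob", "tenant-a"): {"support_agent"},
}


def has_permission(user_id: str, tenant_id: str, permission: str) -> bool:
    """True if any role the user holds in this tenant includes the permission."""
    roles = ASSIGNMENTS.get((user_id, tenant_id), set())
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)
```

Here `alice` can invite users in `tenant-a` but only read invoices in `tenant-b`, which is the "admin in a specific workspace" behavior described above.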

### 11.3 Enterprise Patterns

Common enterprise RBAC patterns include:

- global roles for platform staff
- tenant-scoped roles for customers
- custom roles for larger organizations
- group-to-role mapping from SSO IdP groups

### 11.4 The Role Explosion Problem

RBAC starts simple but can degrade into dozens or hundreds of roles:

- `viewer`
- `viewer_plus_export`
- `viewer_plus_export_plus_billing`
- `regional_admin_eu`
- `regional_admin_us`

This is role explosion.

It happens when RBAC is forced to encode too many contextual conditions that really belong in attributes or policies.

### 11.5 RBAC Tradeoffs

| Strength | Weakness |
| --- | --- |
| Easy to explain to users and admins | Coarse-grained for complex cases |
| Efficient at runtime | Can explode in number of roles |
| Works well for common SaaS admin patterns | Poor fit for dynamic context-heavy rules |

### RBAC Best Practices

- keep the base role set small
- scope roles by tenant, project, or resource container
- separate platform/internal staff roles from customer roles
- use RBAC for broad permissions and combine with finer policies when needed

---

## 12. ABAC

ABAC stands for attribute-based access control. Instead of only asking "What role does this user have?", ABAC asks about attributes of the subject, resource, and environment.

### 12.1 Why ABAC Exists

RBAC is often too static for real-world decisions like:

- support agent can view tickets only in their assigned region
- manager can approve expenses under a threshold for their own department
- user can access data only from a compliant device in an approved country
- payout release requires recent MFA and a risk score below a threshold

These rules depend on context, not just role labels.

### 12.2 Dynamic Policy Evaluation

ABAC decisions may use attributes such as:

- subject department
- resource owner
- tenant subscription tier
- request IP or network zone
- device trust score
- current time or shift window
- MFA strength

Example policy idea:

"Allow refund approval if the subject role is finance_manager, the order belongs to the same merchant account, the refund amount is below the subject limit, and MFA was performed in the last 10 minutes."
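
The example policy above can be written as a single predicate over subject, resource, and environment attributes. A sketch only: the attribute names (`refund_limit_cents`, `mfa_at`) and the use of integer cents are assumptions made for illustration.

```python
from dataclasses import dataclass

MFA_MAX_AGE_SECONDS = 10 * 60  # "MFA was performed in the last 10 minutes"


@dataclass(frozen=True)
class Subject:
    role: str
    merchant_account: str
    refund_limit_cents: int
    mfa_at: float  # epoch seconds of the most recent MFA


@dataclass(frozen=True)
class Refund:
    merchant_account: str
    amount_cents: int


def can_approve_refund(subject: Subject, refund: Refund, now: float) -> bool:
    """All four conditions from the policy sentence must hold."""
    return (
        subject.role == "finance_manager"
        and subject.merchant_account == refund.merchant_account
        and refund.amount_cents < subject.refund_limit_cents
        and 0 <= now - subject.mfa_at <= MFA_MAX_AGE_SECONDS
    )
```

Note that only the first condition is a role check. The other three depend on data that changes per request, which is why encoding them as roles would lead to role explosion.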

### 12.3 Policy Engines

ABAC often benefits from a dedicated policy engine because hardcoding many dynamic rules directly into services becomes brittle.

Common approaches:

- custom rules in application code
- centralized policy engine such as OPA/Rego
- cloud-style policy languages such as Cedar
- relationship and graph-based systems for object access patterns

### 12.4 ABAC Tradeoffs

| Strength | Weakness |
| --- | --- |
| Expressive and context-aware | Harder to explain and debug |
| Reduces role explosion | Requires clean attribute sources |
| Good for fine-grained enterprise control | Runtime evaluation can be more expensive |

### 12.5 Practical Use

Many production systems do not choose "RBAC or ABAC". They combine them:

- RBAC gives the broad lane
- ABAC applies contextual restrictions inside that lane

Example:

- role says user may edit invoices
- ABAC rule says only for their tenant, below approval threshold, and only after MFA for high-value invoices

### ABAC Failure Cases

- attributes are stale or inconsistently sourced across services
- policies become unreadable and impossible to reason about
- caching hides recent attribute changes like department moves or suspensions

---

## 13. Permissions and Access Control

Permissions are the actual capabilities a subject has. Access control is the mechanism that enforces those permissions correctly and consistently.

### 13.1 Permission Models

Common permission models include:

| Model | Mental model | Example |
| --- | --- | --- |
| RBAC | Roles map to permissions | Workspace admin |
| ACL | Resource has a list of allowed subjects | Shared document editable by Alice and Bob |
| ABAC | Decision based on attributes | Region and MFA aware access |
| ReBAC | Decision based on relationships | User is member of team that owns repo |
| Capability/token-based | Possession of unforgeable capability grants access | Signed download URL |

In modern systems, multiple models often coexist.

GitHub is a good mental example:

- org and team membership look like RBAC/ReBAC
- repo-specific collaborator lists look like ACLs
- fine product actions are individual permissions

### 13.2 Inheritance

Permissions often inherit down a hierarchy:

- org -> workspace -> project -> resource
- folder -> document
- account -> sub-account

Inheritance is useful, but easy to get wrong.

Questions to design explicitly:

- do child resources inherit all parent permissions?
- can child permissions override parent permissions?
- are denies supported, and if so do they take precedence?
- how do you compute effective permissions efficiently?
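
One possible set of answers to these questions, shown as a sketch: children inherit every parent grant, explicit denies are supported, and a deny anywhere on the path wins. These are assumptions picked for illustration, not the only valid design, and the `PARENT` and `GRANTS` tables are hypothetical. The walk also assumes the hierarchy is acyclic.

```python
from typing import Dict, Optional, Set, Tuple

# Resource -> parent resource (None at the root).
PARENT: Dict[str, Optional[str]] = {
    "doc-1": "folder-1",
    "folder-1": "org-1",
    "org-1": None,
}

# (subject, resource) -> explicit allows and denies set at that level.
GRANTS: Dict[Tuple[str, str], Dict[str, Set[str]]] = {
    ("alice", "org-1"): {"allow": {"read", "write"}, "deny": set()},
    ("alice", "doc-1"): {"allow": set(), "deny": {"write"}},
}


def effective_permissions(subject: str, resource: str) -> Set[str]:
    """Walk from the resource up to the root, then apply deny-wins."""
    allowed: Set[str] = set()
    denied: Set[str] = set()
    node: Optional[str] = resource
    while node is not None:
        grant = GRANTS.get((subject, node), {})
        allowed |= grant.get("allow", set())
        denied |= grant.get("deny", set())
        node = PARENT.get(node)
    return allowed - denied
```

The walk costs one lookup per hierarchy level on every check. Deep hierarchies usually push systems toward precomputed or cached effective permissions, which brings back the staleness questions discussed under policy caching.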

### 13.3 Auditing

You need to know:

- who granted access
- when access changed
- who accessed a resource
- why a decision was allowed or denied

Auditing matters for:

- customer support
- security investigations
- compliance
- admin trust

### 13.4 Enforcement Patterns

There are two common implementation patterns.

#### Pattern A: Embedded authorization in each service

Pros:

- low latency
- business context close to the resource

Cons:

- duplicated rules across services
- inconsistent decisions and audit semantics

#### Pattern B: Centralized authorization service or policy engine

Pros:

- consistency
- shared policy language
- central auditing and explainability

Cons:

- added network hop
- dependency on a central service
- need good caching and fallback behavior

### 13.5 Centralized Auth Service and Policy Caching

```mermaid
flowchart LR
    Request[Authenticated Request] --> Service[Business Service]
    Service --> Cache[Policy Cache]
    Cache -->|cache miss| Authz[Central Authorization Service]
    Authz --> PDP[Policy Engine]
    Authz --> Attrs[Attribute / Relationship Data]
    PDP --> Decision[Allow / Deny + Reason]
    Decision --> Audit[Audit Log]
    Decision --> Service
```

Caching is often necessary, but introduces staleness risks. Common techniques:

- short TTL caches for policy decisions
- versioned policy snapshots
- event-driven invalidation on membership or role changes
- cache only stable intermediate data, not final decisions, in high-risk systems

### 13.6 Policy Caching Tradeoffs

| Benefit | Cost |
| --- | --- |
| Lower latency | Stale authorization decisions |
| Lower policy service load | Harder revocation semantics |
| Better resilience during partial outage | Risk of fail-open or fail-stale behavior |

Design question: when policy service is down, do you fail closed or fail open?

- fail closed is safer but can hurt availability
- fail open preserves availability but may violate security

For high-risk actions, fail closed is usually the right answer.
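
A short-TTL decision cache with fail-closed behavior and event-driven invalidation can be sketched as follows. The class name `DecisionCache`, the `fetch` callable standing in for a call to the central authorization service, and the 30-second default TTL are all assumptions for illustration.

```python
import time
from typing import Callable, Dict, Tuple

Key = Tuple[str, str, str]  # (subject, action, resource)


class DecisionCache:
    def __init__(
        self,
        fetch: Callable[[str, str, str], bool],
        ttl_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # hypothetical call to the policy service
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Key, Tuple[bool, float]] = {}

    def is_allowed(self, subject: str, action: str, resource: str) -> bool:
        key = (subject, action, resource)
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now < cached[1]:
            return cached[0]  # fresh cached decision
        try:
            allowed = self._fetch(subject, action, resource)
        except Exception:
            # Fail closed: an expired entry is never served as a fallback.
            return False
        self._entries[key] = (allowed, now + self._ttl)
        return allowed

    def invalidate_subject(self, subject: str) -> None:
        """Call from a membership or role change event handler."""
        for key in [k for k in self._entries if k[0] == subject]:
            del self._entries[key]
```

A fail-stale variant would return the expired entry inside the `except` branch instead. That trades security for availability and is usually only defensible for low-risk reads.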

### Access Control Best Practices

- enforce tenant isolation early and repeatedly
- keep policy decisions explainable
- separate authentication claims from live authorization state when permissions change frequently
- audit all admin and permission-management actions
- do not trust internal network location as a permission model

---

## 14. Service-to-Service Authentication

User authentication is only half of production security. Modern backends also need to authenticate services to each other.

### 14.1 Why Internal Service Authentication Exists

In microservice systems, one request may pass through many services:

- edge/API gateway
- auth service
- order service
- payment service
- notification service

If internal calls are trusted just because they are "inside the VPC", a compromised service can impersonate others too easily.

This is why zero-trust principles matter internally too.

### 14.2 Service Identity

A service needs its own identity, just like a user does.

Examples:

- `payments-service.prod`
- `orders-service.eu-west-1`
- workload identity bound to a Kubernetes service account

A strong service identity system lets the platform answer:

- which service is calling?
- is it the real deployed workload?
- is it allowed to call this destination?

### 14.3 mTLS Basics

Mutual TLS means both sides authenticate each other during the TLS handshake.

Benefits:

- encryption in transit
- client and server authentication
- strong cryptographic service identity

Typical pattern:

- internal CA issues short-lived certificates to workloads
- service presents client cert on outbound call
- destination validates issuer and identity

```mermaid
sequenceDiagram
    participant A as Service A
    participant CA as Internal CA / Identity System
    participant B as Service B

    A->>CA: Request workload certificate
    CA-->>A: Short-lived cert
    A->>B: TLS handshake + client cert
    B->>B: Verify cert, issuer, SAN, expiry
    B-->>A: Authenticated secure channel
    A->>B: Application request
    B-->>A: Response
```
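
On the receiving side (Service B in the diagram), the "verify cert, issuer, SAN, expiry" step splits into two parts: the TLS library checks the chain and validity period during the handshake, and the application checks that the presented identity is one it expects. A sketch using Python's standard `ssl` module; the file paths are left as parameters and the SPIFFE-style URI in the usage note is a hypothetical identity format.

```python
import ssl
from typing import Any, Dict, Iterable, Optional


def make_server_context(
    ca_file: Optional[str] = None,
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Server-side context that refuses clients without a valid certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.verify_mode = ssl.CERT_REQUIRED  # this is what makes the TLS "mutual"
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # trust only the internal CA
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def peer_identity_allowed(peer_cert: Dict[str, Any], allowed_uris: Iterable[str]) -> bool:
    """Check URI SANs from `SSLSocket.getpeercert()` against an allowlist."""
    sans = peer_cert.get("subjectAltName", ())
    uris = {value for kind, value in sans if kind == "URI"}
    return bool(uris & set(allowed_uris))
```

After the handshake, a handler might call `peer_identity_allowed(conn.getpeercert(), {"spiffe://example.org/ns/prod/sa/orders"})` before processing the request. In many deployments a service mesh sidecar performs both steps on the application's behalf.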

### 14.4 Short-Lived Credentials

Short-lived credentials are a major production best practice.

Why?

- if stolen, they expire quickly
- less need for manual secret rotation
- better fit with workload identity and automation

This pattern shows up in:

- cloud IAM temporary credentials
- Kubernetes workload identity
- service mesh certificates
- internal token minting systems

### 14.5 Zero Trust Basics

Zero trust does not mean distrusting everything forever. It means:

- do not grant access solely based on network location
- verify identity continuously and explicitly
- enforce least privilege
- assume compromise is possible and reduce blast radius

Google's public BeyondCorp ideas are the canonical mental model here: access should depend on identity, device state, and policy, not on whether traffic comes from "inside the office network".

### 14.6 Service-to-Service Authorization

Authentication tells you that a caller is `payments-service`. Authorization must still decide whether `payments-service` may:

- read card metadata
- call refund APIs
- publish to a payout topic
- access a particular database table

This is often implemented with:

- service identity plus policy
- SPIFFE-like identity patterns
- service mesh policy
- signed internal tokens with audience restrictions
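
A sketch of "service identity plus policy" combined with an audience restriction. It assumes the token signature or mTLS identity has already been verified upstream, so `claims` is trusted input; the claim names follow common JWT conventions (`sub`, `aud`, `exp`), and the `SERVICE_POLICY` table and action names are hypothetical.

```python
import time
from typing import Any, Dict, Optional, Set

# Hypothetical allowlist: calling service -> actions it may perform here.
SERVICE_POLICY: Dict[str, Set[str]] = {
    "payments-service": {"refunds.create", "card_metadata.read"},
    "notification-service": {"email.send"},
}


def authorize_service_call(
    claims: Dict[str, Any],
    expected_audience: str,
    action: str,
    now: Optional[float] = None,
) -> bool:
    """Decide whether an already-authenticated service may perform an action."""
    now = time.time() if now is None else now
    if claims.get("aud") != expected_audience:
        return False  # token was minted for a different destination
    if claims.get("exp", 0) <= now:
        return False  # short-lived credential has expired
    return action in SERVICE_POLICY.get(claims.get("sub", ""), set())
```

The audience check is what stops a token issued for one destination from being replayed against another, and the default-deny lookup means a newly deployed service can call nothing until policy is written for it.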

### Service Auth Failure Cases

- long-lived shared secrets copied across many services
- no certificate rotation automation
- any internal service can call any other service
- internal service trusts caller-provided headers like `X-User-Id` without verification
- service identity is authenticated but not authorized

### Service Auth Best Practices

- use workload or service identity, not shared static secrets where possible
- prefer short-lived credentials and automatic rotation
- bind end-user identity propagation carefully when needed
- separate service identity from end-user identity in request context

---

## 15. How These Systems Fit Together

A strong interview answer connects all the pieces into one architecture.

### 15.1 Typical SaaS Architecture

```mermaid
flowchart TD
    User[Browser / Mobile App] --> Edge[Edge / API Gateway]
    Edge --> Auth[Auth Service]
    Auth --> UserDB[(User Directory)]
    Auth --> Session[(Session Store / Token Store)]
    Auth --> Keys[KMS / Signing Keys]
    Edge --> App[Business Services]
    App --> Authz[Authorization Service / Policy Engine]
    Authz --> Perms[(Roles, Relationships, Attributes)]
    App --> Data[(Application Data)]
    Auth --> Audit[Security Audit Log]
    Authz --> Audit
    App --> Audit
```

The key idea is separation of concerns:

- auth service proves identity and issues continuity artifacts
- session/token store manages continuity and revocation
- policy engine decides access
- application services enforce business operations
- audit pipeline records security-relevant facts

### 15.2 Consumer SaaS vs Enterprise SaaS vs Internal Platform

| Environment | Identity priorities |
| --- | --- |
| Consumer SaaS | Signup conversion, password recovery, abuse prevention, social login |
| Enterprise SaaS | SSO, provisioning, group mapping, auditability, tenant admin controls |
| Internal platform | Service identity, zero trust, least privilege, strong device posture |

### 15.3 Data Freshness vs Statelessness

One of the deepest identity design tradeoffs is this:

- stateless verification is fast and scalable
- fresh authorization state often requires looking up server-side data

That is why many mature architectures mix the two:

- token or session for authentication continuity
- live policy check for sensitive authorization

### 15.4 Tenant Isolation

For SaaS systems, tenant isolation must be explicit in both authentication and authorization.

Common patterns:

- include tenant membership in auth/session context
- scope roles by tenant
- enforce tenant filters in service and data layers
- audit cross-tenant admin actions aggressively

This is especially important in systems like GitHub organizations, Stripe connected accounts, or enterprise SaaS workspaces.

---

## 16. Real-World Patterns and Company Examples

These examples are useful as mental anchors, not as exact internal blueprints.

### Google

- Public Google identity and OIDC flows are a classic example of large-scale federated identity.
- Google's public BeyondCorp ideas are foundational for zero-trust access.
- Zanzibar is the famous reference point for large-scale, relationship-aware authorization.

Interview lesson: centralized authorization models can work at huge scale if the data model, caching, and consistency story are designed carefully.

### Netflix

- Netflix-style service-rich environments highlight the need for service identity, short-lived credentials, and resilient internal auth patterns.
- Streaming and control plane workloads also show why identity systems must stay available under very high traffic.

Interview lesson: internal service auth is not optional in large microservice systems.

### Uber

- Ride-sharing and marketplace architectures depend on strict service-to-service permissions, real-time risk checks, and strong tenant/user context propagation.
- Payment, dispatch, driver, and rider services cannot safely trust each other based only on network placement.

Interview lesson: identity context often flows through many services and must remain verifiable.

### Amazon

- AWS IAM is the public archetype for policy-heavy authorization with users, roles, resource policies, temporary credentials, and least privilege.

Interview lesson: enterprise-grade authorization is really a policy and identity modeling problem, not just a list of roles.

### GitHub

- GitHub demonstrates a mix of organization membership, teams, repository roles, OAuth apps, GitHub Apps, personal access tokens, and enterprise SSO.

Interview lesson: one product often needs several identity and authorization models at the same time.

### Stripe

- Stripe is a useful example for strong dashboard authentication, MFA, API keys, restricted keys, OAuth for Connect-style platforms, and careful access around money movement.

Interview lesson: high-risk actions need stronger auth, auditability, and granular permissions than low-risk read-only actions.

### Typical SaaS Systems

Most B2B SaaS products end up combining:

- email/password and social login for self-serve customers
- SSO for enterprise customers
- RBAC for admin/editor/viewer patterns
- ABAC or policy rules for sensitive workflows
- API tokens or OAuth for integrations
- service identity for microservices

---

## 17. Interview Discussion Guide

If asked to design identity and access for a backend system, structure your answer progressively.

### 17.1 Clarifying Questions

Ask:

- who are the subjects: end users, admins, services, partners?
- is this consumer, enterprise, or internal platform?
- are we designing login, third-party integration, or internal access control?
- what is the risk level: social app, fintech, healthcare, developer platform?
- do we need SSO, API access, or both?
- how fresh must revocation and permission changes be?

### 17.2 Good Interview Structure

1. Define identities and trust boundaries.
2. Choose authentication mechanism.
3. Choose continuity mechanism: session or token.
4. Design authorization model.
5. Address recovery, revocation, audit, and failure cases.
6. Address scale, caching, key rotation, and multi-region concerns.

### 17.3 Common Interview Comparisons

#### Sessions vs JWT

Use when the interviewer asks about stateful vs stateless auth.

| Question | Sessions answer | JWT answer |
| --- | --- | --- |
| Need instant logout? | Strong | Harder |
| Need local verification across services? | Weaker | Strong |
| Web app simplicity? | Often simpler | Often overcomplicated |
| Third-party API ecosystem? | Less natural | Better fit |

#### RBAC vs ABAC

| Question | RBAC | ABAC |
| --- | --- | --- |
| Easy admin mental model | Strong | Weaker |
| Fine-grained contextual rules | Weak | Strong |
| Risk of role explosion | High | Lower |
| Ease of debugging | Stronger | Harder |

#### SAML vs OIDC

| Question | SAML | OIDC |
| --- | --- | --- |
| Enterprise legacy support | Strong | Strong but varies |
| Modern web/mobile friendliness | Weaker | Strong |
| Developer ergonomics | Heavier | Better |

### 17.4 Scaling Considerations to Mention

- Redis or equivalent for shared sessions
- multi-region token verification and key distribution
- policy caching with invalidation strategy
- short-lived credentials for services
- audit event pipelines decoupled from critical-path latency
- abuse protection on login and signup

### 17.5 Failure Cases Worth Calling Out

- auth service outage blocks all logins
- Redis session store failure logs users out or prevents validation
- stale permissions cached after role removal
- signing key rotation breaks old verifiers
- refresh token theft leads to silent session hijack

Interview tip: explicitly talking about revocation, rotation, and failure handling is often what moves an answer from junior to strong mid-level or senior.

---

## 18. Common Mistakes and Best Practices

### Common Mistakes

- treating authentication and authorization as the same problem
- storing passwords with fast hashes
- putting too much trust in long-lived JWTs
- assuming "internal network" means trusted caller
- forgetting logout, revocation, and recovery flows
- doing authorization only at the gateway
- using email as the only durable identity key in enterprise federation
- failing to audit permission changes and admin actions
- building role systems that cannot express tenant or resource scope

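The last mistake in the list is easy to show in code. Below is a minimal sketch of a role assignment that carries tenant and optional resource scope. The `RoleAssignment` type and `has_role` helper are hypothetical names for illustration, not a reference design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RoleAssignment:
    user_id: str
    role: str
    tenant_id: str
    resource_id: Optional[str] = None  # None means the role applies tenant-wide


def has_role(assignments, user_id, role, tenant_id, resource_id=None):
    """True if the user holds the role in this tenant, tenant-wide or on this resource."""
    for a in assignments:
        if a.user_id != user_id or a.role != role or a.tenant_id != tenant_id:
            continue
        if a.resource_id is None or a.resource_id == resource_id:
            return True
    return False


assignments = [
    RoleAssignment("alice", "admin", "t1"),                     # admin across tenant t1
    RoleAssignment("bob", "editor", "t1", resource_id="p1"),    # editor on project p1 only
]
```

With a flat `user -> roles` table, "alice is admin" would silently apply to every tenant. Making the scope part of the assignment forces each check to say which tenant and resource it is about.
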
### Best Practices

- separate identity proof, session/token continuity, and authorization policy clearly
- use slow password hashing and protect high-value secrets with KMS/HSM support
- prefer MFA and step-up authentication for sensitive actions
- keep access tokens short-lived and refresh tokens protected and rotated
- model tenant-aware roles and permissions explicitly
- centralize policy where consistency matters, but understand cache staleness tradeoffs
- use service identity and short-lived credentials internally
- build auditability and explainability into the system from the beginning

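To ground the "slow password hashing" item, here is a minimal sketch using only the Python standard library. It uses PBKDF2-HMAC-SHA256 because it ships with `hashlib`. Many production systems prefer Argon2id or bcrypt through third-party libraries. The iteration count and the stored string format are assumptions for this example and should be tuned and versioned for your own environment.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; tune so one hash takes a noticeable fraction of a second


def hash_password(password):
    """Return a self-describing record: scheme$iterations$salt$digest."""
    salt = os.urandom(16)  # unique random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return f"pbkdf2_sha256${ITERATIONS}${salt.hex()}${digest.hex()}"


def verify_password(password, stored):
    """Recompute the hash with the stored parameters and compare in constant time."""
    try:
        scheme, iterations, salt_hex, digest_hex = stored.split("$")
        if scheme != "pbkdf2_sha256":
            return False
        salt = bytes.fromhex(salt_hex)
        expected = bytes.fromhex(digest_hex)
        candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, int(iterations))
    except (ValueError, OverflowError):
        return False  # malformed record
    return hmac.compare_digest(candidate, expected)
```

Storing the scheme and iteration count next to the digest lets you raise the work factor later and rehash each password on the user's next successful login.
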
### Final Mental Model

If you remember one thing, remember this:

Identity and access is not one feature. It is a chain of connected systems:

- identity proof
- credential management
- session or token continuity
- authorization policy
- revocation and recovery
- service identity
- auditing and operations

Real systems succeed when all of these parts are designed together.

If one weak link exists, attackers and outages will find it.

---

## Quick Review Checklist

Use this when revising for interviews.

- Can I clearly explain AuthN vs AuthZ?
- Do I know when to use sessions vs JWTs?
- Can I explain password hashing, salts, peppers, and MFA tradeoffs?
- Can I walk through OAuth authorization code flow with PKCE?
- Can I explain SAML vs OIDC and IdP vs SP?
- Can I compare RBAC and ABAC with examples?
- Can I describe revocation, logout, token rotation, and password reset securely?
- Can I explain service-to-service auth, mTLS, and zero trust?
- Can I describe where policy enforcement should happen in a real system?

If you can answer yes to all of these questions, your identity and access fundamentals are strong enough for most software engineering interview discussions and practical backend design conversations.