tarun-elango 26810e43d0 sd text
2026-04-26 13:27:19 -04:00

2. Identity & Access

Identity and access is the control plane for nearly every backend system. It answers four questions for every request:

  1. Who is calling?
  2. How do we know they are really that caller?
  3. What are they allowed to do right now?
  4. How do we prove later that the decision was correct?

If you understand identity and access well, you can reason about login systems, sessions, JWTs, OAuth integrations, enterprise SSO, authorization policies, service-to-service security, and zero-trust architecture as one connected system rather than as isolated buzzwords.

This guide is written for two goals at the same time:

  • interview preparation
  • real-world backend and system design understanding

The emphasis is practical. The goal is not to memorize definitions, but to understand why these systems exist, how they fail, and how production systems are actually built.


Table of Contents

  1. Why Identity & Access Exists
  2. Core Concepts and Mental Model
  3. Authentication Fundamentals
  4. Login and Signup
  5. Sessions
  6. JWT and Token-Based Authentication
  7. OAuth
  8. SSO: SAML and OIDC
  9. Password Reset
  10. Authorization Fundamentals
  11. RBAC
  12. ABAC
  13. Permissions and Access Control
  14. Service-to-Service Authentication
  15. How These Systems Fit Together
  16. Real-World Patterns and Company Examples
  17. Interview Discussion Guide
  18. Common Mistakes and Best Practices

1. Why Identity & Access Exists

Most systems are multi-user, multi-device, multi-service, and increasingly multi-tenant. Without identity and access controls, the backend has no safe way to distinguish:

  • one user from another
  • a user from an attacker
  • an employee from a customer
  • a production service from a compromised internal service
  • a legitimate action from a replayed or forged request

At small scale, identity and access looks like a login form plus a password check. At production scale, it becomes much bigger:

  • account creation and identity proofing
  • credential storage and recovery
  • MFA and risk detection
  • sessions and token lifecycle management
  • delegated access via OAuth
  • enterprise federation via SSO
  • role and policy evaluation
  • service identity inside microservices
  • auditing, revocation, key rotation, and incident response

The reason interviews ask about identity and access so often is simple: it touches security, data modeling, distributed systems, product tradeoffs, and failure handling all at once.

The Core Tension

Identity systems always balance three goals:

Goal | What it means | Why it is hard
Security | Prevent impersonation and unauthorized access | Stronger security usually adds friction
Usability | Let real users sign in quickly and recover safely | Easier flows are often easier to abuse
Scalability | Support huge traffic, many services, and many tenants | Distributed state and revocation become harder

An excellent backend engineer treats identity not as a feature checkbox, but as a reliability and security subsystem.


2. Core Concepts and Mental Model

Before discussing flows, build the right mental model.

Important Terms

Term | Meaning | Practical intuition
Identity | The subject being represented | A user, admin, device, service, or organization
Authentication (AuthN) | Verifying who the subject is | "Prove you are Alice"
Authorization (AuthZ) | Deciding what that subject may do | "Can Alice read invoice 123?"
Session | Server-recognized authenticated continuity over time | "This browser remains logged in"
Access token | Credential presented to APIs | Often short-lived
Refresh token | Credential used to obtain new access tokens | More sensitive than access tokens
Identity Provider (IdP) | System that authenticates identities | Google, Okta, Azure AD
Service Provider / Relying Party | App that trusts the IdP | Your SaaS product
Policy engine | Evaluates access rules | RBAC, ABAC, ReBAC, custom rules
Audit log | Immutable trail of security-relevant events | Needed for forensics and compliance

One Request Through the System

sequenceDiagram
	actor User
	participant Client
	participant Edge as API Gateway / Edge
	participant Auth as Auth Service
	participant Policy as Policy Engine
	participant App as Business Service
	participant Data as Data Store

	User->>Client: Click "View invoice"
	Client->>Edge: GET /invoices/123 + cookie/token
	Edge->>Auth: Validate session/token
	Auth-->>Edge: subject, tenant, auth strength, claims
	Edge->>Policy: Can subject read invoice 123?
	Policy-->>Edge: allow/deny + reason
	Edge->>App: Forward authenticated request
	App->>Data: Load resource
	Data-->>App: Resource data
	App-->>Client: 200 OK or 403 Forbidden

This is the simplest correct mental model:

  • authentication establishes identity
  • authorization evaluates permissions for the requested action
  • business logic executes only after those checks
  • the decision should be observable and auditable
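
The ordering above can be sketched in a few lines. This is a minimal illustration, assuming in-memory stand-ins for the auth service, policy engine, and data store; every name here (SESSIONS, INVOICES, handle_get_invoice) is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Subject:
    user_id: str
    tenant_id: str


# Hypothetical in-memory stand-ins for the auth service and data store.
SESSIONS = {"sess-alice": Subject("u1", "t1"), "sess-bob": Subject("u2", "t2")}
INVOICES = {"123": {"tenant_id": "t1", "amount": 4200}}
AUDIT_LOG = []


def authenticate(credential: str) -> Optional[Subject]:
    """AuthN: map a presented credential to a subject, or None."""
    return SESSIONS.get(credential)


def authorize(subject: Subject, action: str, invoice_id: str) -> bool:
    """AuthZ: a toy tenant-isolation rule standing in for a policy engine."""
    invoice = INVOICES.get(invoice_id)
    return (
        invoice is not None
        and action == "read"
        and invoice["tenant_id"] == subject.tenant_id
    )


def handle_get_invoice(credential: str, invoice_id: str):
    subject = authenticate(credential)
    if subject is None:
        AUDIT_LOG.append(("anonymous", "read", invoice_id, "unauthenticated"))
        return 401, None
    allowed = authorize(subject, "read", invoice_id)
    # Record the decision whether it is an allow or a deny.
    AUDIT_LOG.append((subject.user_id, "read", invoice_id, "allow" if allowed else "deny"))
    if not allowed:
        return 403, None
    # Business logic runs only after both checks have passed.
    return 200, INVOICES[invoice_id]
```

Note that a missing invoice also yields 403 in this sketch, so the response does not reveal which invoice IDs exist. That is one possible choice, not the only reasonable one.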

A Production Identity Stack

In a real system, identity and access usually spans these components:

Component | Typical responsibility
Auth service | Login, signup, password verification, MFA, token issuance
User directory | Users, credentials metadata, verification state, tenant membership
Session store | Server-side sessions and revocation state
Token service | Access token and refresh token lifecycle
Policy engine | Role/attribute-based access decisions
Key management | Signing keys, encryption keys, secret rotation
Audit pipeline | Security events, admin actions, login failures, policy decisions
Risk engine | Rate limits, device reputation, fraud checks, anomaly detection

Interview shortcut: if you can clearly separate authentication, session/token management, and authorization, you already sound more senior than candidates who collapse them into one vague "auth layer".


3. Authentication Fundamentals

Authentication is the process of verifying identity claims. The claim is usually, "I am user X" or "I am service Y".

3.1 Identity Verification Basics

Authentication depends on evidence. The most common categories are:

Factor | Example | Strengths | Weaknesses
Something you know | Password, PIN | Familiar, cheap | Can be guessed, phished, reused
Something you have | Phone, authenticator app, hardware key | Stronger than passwords alone | Device loss, recovery complexity
Something you are | Fingerprint, Face ID | Convenient on-device UX | Biometric recovery and privacy concerns

Important nuance: many systems do not verify a human's real-world identity. They verify control over a credential. For example:

  • password login verifies knowledge of a password
  • email verification verifies access to an inbox
  • TOTP verifies possession of a seed-bound authenticator
  • passkeys verify possession of a private key and user presence

That is why identity systems often talk about assurance levels rather than absolute truth.

3.2 Identifiers vs Authenticators

Two concepts are often mixed up:

  • an identifier tells the system which subject is being referenced
  • an authenticator proves control over that identity

Examples:

  • alice@example.com is an identifier
  • the password, passkey, or OAuth login is the authenticator

Production systems often support multiple identifiers for the same user:

  • email
  • username
  • phone number
  • enterprise SSO subject ID
  • internal immutable user ID

Best practice: use a stable internal user ID as the true primary key, even if the login identifier changes.

3.3 Credential Storage

This is one of the most common interview topics because it separates surface-level knowledge from real engineering understanding.

Never store plaintext passwords

If a database leak reveals plaintext passwords, the incident is catastrophic. Attackers will also try the same passwords on other services because users reuse credentials.

Store password hashes, not passwords

The flow is:

  1. User submits password.
  2. Server generates a per-user salt.
  3. Server applies a slow password hashing algorithm.
  4. Server stores the resulting hash and metadata.
  5. On login, the server recomputes and compares.

Good password hashing algorithms are intentionally expensive. That is the point. They make offline brute force attacks slower.

Algorithm | Typical status | Why it matters
Argon2id | Best modern default | Memory-hard and resistant to GPU attacks
bcrypt | Still common and acceptable | Widely supported, battle-tested
PBKDF2 | Common in legacy and regulated systems | Safer than fast hashes, but less ideal than Argon2id
SHA-256 / MD5 alone | Unsafe for password storage | Too fast, easy to brute force

Salt and Pepper

Mechanism | Purpose
Salt | Unique random value per password; prevents rainbow-table reuse
Pepper | Extra secret held outside the user table, often in KMS/HSM; raises attack cost after DB leaks

Practical Storage Pattern

  • store algorithm name and parameters with the hash
  • use constant-time comparison to reduce timing leakage
  • rehash on login when old parameters are outdated
  • keep password policy reasonable; heavy composition rules often push users toward predictable patterns and password reuse
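
A minimal sketch of this storage pattern follows. It uses PBKDF2 from the Python standard library purely so the example is self-contained; Argon2id or bcrypt would normally come from a third-party library. The record format, the function names, and the 600,000 iteration count are assumptions for illustration.

```python
import hashlib
import hmac
import secrets

# Assumed work factor for illustration; tune it for your own hardware.
CURRENT_ITERATIONS = 600_000


def hash_password(password: str, iterations: int = CURRENT_ITERATIONS) -> str:
    """Return a self-describing record: algorithm$iterations$salt$digest."""
    salt = secrets.token_bytes(16)  # unique random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> tuple[bool, bool]:
    """Return (password_ok, needs_rehash)."""
    algorithm, iterations_text, salt_hex, digest_hex = stored.split("$")
    if algorithm != "pbkdf2_sha256":
        return False, False
    iterations = int(iterations_text)
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), iterations
    )
    # Constant-time comparison reduces timing leakage.
    ok = hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
    # Flag records whose parameters are weaker than the current policy.
    needs_rehash = ok and iterations < CURRENT_ITERATIONS
    return ok, needs_rehash
```

When needs_rehash is true, the login handler can call hash_password again with the plaintext it already holds and overwrite the stored record. Because the parameters travel with the hash, old and new records can coexist during the migration.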

Interview depth point

If an interviewer asks, "Why use bcrypt or Argon2 instead of SHA-256?", the real answer is not just "because it is more secure". The real answer is:

  • password databases are often attacked offline after leaks
  • attackers can run billions of SHA-256 hashes quickly
  • slow, memory-hard algorithms make each guess expensive
  • cost parameters can be tuned as hardware improves

3.4 MFA Basics

Multi-factor authentication exists because passwords are a weak single point of failure.

Common MFA methods:

Method | Security level | Practical notes
SMS OTP | Low to medium | Vulnerable to SIM swap and phishing
Email OTP | Low | Better than nothing, but email is often the same recovery channel
TOTP app | Medium | Common and cheap; still phishable
Push approval | Medium | Good UX, but push fatigue attacks exist
WebAuthn / passkeys / hardware keys | High | Strong phishing resistance
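
To make the TOTP row concrete, here is a minimal sketch of code generation and verification in the style of RFC 4226 (HOTP) and RFC 6238 (TOTP), using only the standard library. The function names and the one-step drift window are assumptions, not a prescribed implementation.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password with dynamic truncation."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based variant: the counter is the number of elapsed time steps."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


def verify_totp(secret: bytes, submitted: str, at: Optional[float] = None,
                step: int = 30, window: int = 1) -> bool:
    """Accept codes from the current step and `window` steps either side."""
    now = time.time() if at is None else at
    counter = int(now // step)
    for drift in range(-window, window + 1):
        if counter + drift < 0:
            continue
        expected = hotp(secret, counter + drift)
        if hmac.compare_digest(expected.encode(), submitted.encode()):
            return True
    return False
```

The shared secret is the reason TOTP remains phishable: a user can be tricked into typing a valid code into a fake site, which then replays it within the window. A production verifier would typically also reject a code that has already been accepted and rate-limit attempts; both are omitted here.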

Production systems often use risk-based MFA rather than always prompting:

  • new device
  • new geography
  • impossible travel
  • admin action
  • payout or billing change
  • password reset or recovery event

This is called step-up authentication.

Recovery Matters

Many teams design MFA setup but forget MFA recovery. Good systems provide:

  • recovery codes
  • alternate authenticators
  • carefully controlled support workflows

The recovery flow is often more attackable than the MFA flow itself.

3.5 Email Verification

Email verification usually proves inbox control, not human identity. It exists to:

  • reduce fake or mistyped accounts
  • ensure password reset reachability
  • protect downstream systems from garbage identities
  • support trust in notifications, billing, and invites

Good implementation details:

  • generate a random, single-use token
  • store only a hash of the token server-side if possible
  • apply a short TTL
  • invalidate older outstanding verification tokens after a new one is issued
  • avoid leaking whether the account exists during resend flows

3.6 Device Trust

Device trust tries to answer, "Is this a previously seen, low-risk device?"

Typical signals:

  • long-lived device cookie
  • browser fingerprinting or device metadata
  • last successful MFA on that device
  • IP reputation and ASN patterns
  • OS or app attestation on mobile

Device trust is useful, but dangerous if over-trusted. Devices can be compromised. Cookies can be stolen. Browsers change. Treat device trust as a risk signal, not a source of truth.

Authentication Failure Cases

  • weak password hashing leads to offline cracking after DB leaks
  • email verification links are reusable or never expire
  • MFA recovery bypasses stronger checks
  • account enumeration leaks whether an email exists
  • social login accounts are linked incorrectly to existing local accounts
  • device trust becomes an authorization shortcut instead of a risk signal

Authentication Best Practices

  • prefer Argon2id or bcrypt for passwords
  • rate-limit login, signup, reset, and verification endpoints
  • use MFA for privileged users and step-up auth for sensitive actions
  • log auth events with context, but never log secrets or raw passwords
  • design credential rotation and recovery before launch, not after an incident

4. Login and Signup

Signup and login flows are the public entry points to your system. They are also some of the most attacked endpoints you will ever run.

4.1 Signup Flow

sequenceDiagram
	actor User
	participant Browser
	participant Auth as Auth API
	participant Risk as Risk / Abuse Service
	participant Users as User DB
	participant Mail as Email Service
	participant Session as Session Store

	User->>Browser: Submit email + password
	Browser->>Auth: POST /signup
	Auth->>Risk: Check IP, velocity, disposable email, device
	Risk-->>Auth: risk score / allow / challenge
	Auth->>Users: Create pending account + password hash
	Auth->>Mail: Send verification link
	Mail-->>User: Verification email
	User->>Browser: Click link
	Browser->>Auth: GET /verify?token=...
	Auth->>Users: Mark email verified
	Auth->>Session: Create session
	Auth-->>Browser: Set secure auth cookie

What actually happens in production

A robust signup flow usually includes:

  1. Input normalization: Normalize email casing rules carefully, trim whitespace, reject obvious malformed values.
  2. Abuse screening: IP reputation, rate limits, disposable email detection, CAPTCHA when needed, device velocity, and signup bursts by network.
  3. Account creation state: Many systems create users in a pending_verification state first.
  4. Email verification: The account may exist but have limited capabilities until verified.
  5. Bootstrap domain objects: For SaaS, create workspace, tenant, default role, billing state, and onboarding tasks.
  6. Initial session issuance: Some systems log the user in immediately after verification. Others require explicit login.

Why pending state matters

If you create fully active accounts before verification, you may end up with:

  • abandoned fake tenants
  • spammed invites or API abuse
  • polluted analytics and billing pipelines

4.2 Login Flow

The login flow is simpler than signup conceptually, but much more operationally sensitive.

Common steps:

  1. Identify account by email/username/federated ID.
  2. Fetch credential metadata and account status.
  3. Verify password or federated assertion.
  4. Evaluate account risk and MFA policy.
  5. Create session or issue tokens.
  6. Log success or failure for audit and anomaly detection.

A production login decision often depends on more than a password:

  • account locked or disabled?
  • tenant suspended?
  • email verified?
  • MFA enrolled?
  • device known?
  • unusual geography?
  • refresh token family compromised?

4.3 Signup Verification and Fraud Prevention Basics

Fraud prevention is not just a payments problem. Identity systems are abused for:

  • spam account creation
  • credential stuffing
  • promo abuse
  • referral fraud
  • fake trial creation
  • scraping and automated signups

Basic but effective controls:

Control | What it helps with
Rate limiting by IP and identifier | Brute force and signup bursts
Device and IP reputation | Known bad networks and bots
CAPTCHA or challenge step-up | Automated abuse at suspicious thresholds
Email domain heuristics | Disposable inboxes, typo domains
Phone verification for high-risk cases | Raises attacker cost
Idempotency keys on signup APIs | Retry safety without duplicate accounts
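
As an illustration of the first control, here is a minimal sliding-window rate limiter keyed by an arbitrary string such as an IP address or a login identifier. It is an in-process sketch; a real deployment would more likely keep the counters in a shared store such as Redis. The class and key names are assumptions.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `limit` events per key within the last `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of allowed events

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

A login endpoint would typically consult two limiters, one keyed by IP and one keyed by the target identifier, so that neither a single noisy network nor a distributed attack on one account passes unchecked.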

Interview point: fraud controls are part of auth architecture because attackers do not politely separate "security" from "growth" endpoints.

4.4 Social Login Considerations

"Login with Google" or "Login with GitHub" improves user experience, but introduces federation complexity.

Benefits:

  • no local password to manage
  • faster onboarding
  • higher conversion for some user segments

Risks and edge cases:

  • provider outage affects sign-in
  • incorrect account linking can cause account takeover
  • email from provider may be unverified or not globally unique in the way you assume
  • enterprise customers may not want personal social identities linked to business workspaces

Best practice for account linking:

  • if a social identity is new, do not blindly attach it to a local account just because the email matches
  • require proof of control or signed-in confirmation before linking to an existing account

4.5 Onboarding Architecture

Signup is not just about auth. It often triggers business setup:

  • create personal or team workspace
  • assign owner role
  • seed settings and notification preferences
  • create billing customer object
  • publish analytics and onboarding events

This makes signup a distributed workflow. Real systems often handle it with:

  • synchronous creation for the minimum needed to log in
  • async events for non-critical setup
  • idempotent consumers to avoid duplicate workspaces or billing objects

Login and Signup Failure Cases

  • verification emails delayed or blocked, leaving users in limbo
  • duplicate accounts created because signup is not idempotent
  • support team manually verifies accounts in insecure ways
  • social and password accounts merge incorrectly
  • signup path leaks which emails already exist

Login and Signup Best Practices

  • keep the critical path small and reliable
  • separate abuse checks from core credential logic, but make them part of the final decision
  • use generic error messages externally and detailed audit logs internally
  • make signup and login events observable with metrics and tracing

5. Sessions

Sessions are the classic way to keep users logged in across multiple HTTP requests.

5.1 What a Session Really Is

A session means the server has already authenticated the user and stores an authenticated state keyed by a session identifier.

Typical flow:

  1. User logs in successfully.
  2. Server creates a session record.
  3. Server sends the client a session ID in a cookie.
  4. Client sends the cookie on future requests.
  5. Server looks up session state and reconstructs identity.

5.2 Server-Side Sessions

In server-side session architecture, the browser usually only stores an opaque identifier.

Example session data:

  • user ID
  • tenant ID
  • auth strength or MFA state
  • issued time and last activity time
  • device metadata
  • CSRF-related state

Advantages:

  • easy revocation
  • easy logout across devices
  • server fully controls state
  • easy to add security flags or session versioning

Disadvantages:

  • needs a session store lookup
  • requires shared state across app instances
  • harder to scale if poorly designed

5.3 Redis-Backed Sessions

Redis is a very common session backend because it is fast, supports TTL, and works well as shared ephemeral state.

flowchart LR
	Browser[Browser with secure cookie] --> LB[Load Balancer]
	LB --> App1[App Instance A]
	LB --> App2[App Instance B]
	App1 --> Redis[(Redis Session Store)]
	App2 --> Redis
	Redis --> Audit[Audit / Security Events]

Why Redis is popular for sessions:

  • low-latency reads and writes
  • TTL expiration built in
  • simple key-value model
  • easy fit for horizontally scaled app fleets

Scaling considerations:

  • shard or cluster if session volume is high
  • replicate carefully; understand failover and session loss behavior
  • monitor hot keys and uneven access patterns
  • decide whether to refresh the TTL on every request (sliding expiration), only periodically, or not at all (absolute expiration)

5.4 Cookie Security

Session security depends heavily on cookie configuration.

Cookie attribute | Why it matters
HttpOnly | Prevents JavaScript from reading the cookie, reducing XSS impact
Secure | Sends cookie only over HTTPS
SameSite=Lax/Strict | Reduces CSRF risk from cross-site requests
Domain scoping | Prevents unintended subdomain sharing
Path scoping | Limits where the cookie is sent
Expiry / Max-Age | Controls session persistence

Important nuance:

  • HttpOnly helps against token theft by frontend JavaScript
  • SameSite helps against CSRF
  • neither one fixes everything if the app has deeper logic flaws
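
For reference, this is roughly what a hardened session cookie looks like when built with Python's standard http.cookies module. The cookie name sid and the one-hour lifetime are assumptions for illustration.

```python
from http.cookies import SimpleCookie


def build_session_cookie(session_id: str, max_age_seconds: int = 3600) -> str:
    """Return the value for a Set-Cookie response header."""
    cookie = SimpleCookie()
    cookie["sid"] = session_id
    morsel = cookie["sid"]
    morsel["httponly"] = True      # not readable from JavaScript
    morsel["secure"] = True        # sent over HTTPS only
    morsel["samesite"] = "Lax"     # withheld on most cross-site requests
    morsel["path"] = "/"
    morsel["max-age"] = max_age_seconds
    # No Domain attribute: the cookie stays scoped to the issuing host.
    return morsel.OutputString()
```

Leaving Domain unset is deliberate in this sketch: a host-only cookie is not shared with sibling subdomains.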

5.5 Session Invalidation

Session invalidation is one reason server-side sessions remain attractive.

You can revoke sessions when:

  • user logs out
  • password changes
  • MFA is reset
  • admin disables the account
  • suspicious activity is detected

Common implementation patterns:

  • delete the session record outright
  • mark session version or user auth version and reject old versions
  • keep a device/session list per user for device management UI
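
The sketch below combines the session lifecycle with the first revocation pattern: an opaque ID, a server-side record, idle and absolute timeouts, and per-session or per-user revocation. It assumes an in-memory dictionary where production systems would use a shared store; the class name and timeout values are illustrative.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional

IDLE_TIMEOUT = 30 * 60            # assumed sliding (idle) limit
ABSOLUTE_TIMEOUT = 12 * 60 * 60   # assumed hard cap on session age


@dataclass
class Session:
    user_id: str
    created_at: float
    last_seen: float


class SessionStore:
    def __init__(self):
        self._sessions = {}  # opaque session ID -> Session

    def create(self, user_id: str, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        session_id = secrets.token_urlsafe(32)  # unguessable, carries no meaning
        self._sessions[session_id] = Session(user_id, now, now)
        return session_id

    def lookup(self, session_id: str, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        session = self._sessions.get(session_id)
        if session is None:
            return None
        idle_expired = now - session.last_seen > IDLE_TIMEOUT
        too_old = now - session.created_at > ABSOLUTE_TIMEOUT
        if idle_expired or too_old:
            del self._sessions[session_id]
            return None
        session.last_seen = now  # sliding refresh on activity
        return session.user_id

    def revoke(self, session_id: str) -> None:
        """Single-device logout."""
        self._sessions.pop(session_id, None)

    def revoke_all_for_user(self, user_id: str) -> int:
        """Global logout, e.g. after a password change."""
        doomed = [sid for sid, s in self._sessions.items() if s.user_id == user_id]
        for sid in doomed:
            del self._sessions[sid]
        return len(doomed)
```

Calling create again after a successful login, and discarding the pre-login ID, is also how session regeneration against fixation is usually done.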

5.6 Logout Challenges

Logout sounds trivial, but it is easy to implement incompletely.

Problems include:

  • logout only clears client cookie but leaves server session valid
  • user has multiple active devices and expects global logout
  • session persists in mobile apps with long polling or background refresh
  • cached pages or in-flight requests still complete after logout

Good logout design answers:

  • single device logout or all devices?
  • immediate revocation or eventual consistency?
  • what about concurrent refresh operations?

5.7 Session Security Issues

Problem | Meaning | Mitigation
Session fixation | Attacker forces victim to use known session ID | Regenerate session ID after login
CSRF | Browser auto-sends cookies on forged cross-site requests | SameSite, CSRF tokens, origin checks
Session hijacking | Session token is stolen | HTTPS, HttpOnly, device/risk checks, short idle timeouts
Store outage | Session backend unavailable | Fallback behavior, multi-AZ design, graceful degradation

Sessions in Interviews

A good interview answer on sessions usually includes:

  • opaque session ID in secure cookie
  • shared store like Redis
  • session regeneration after login
  • revocation and logout semantics
  • CSRF protections
  • sliding vs absolute expiration tradeoff

6. JWT and Token-Based Authentication

JWTs are one of the most discussed and most misunderstood identity topics.

6.1 What a JWT Is

JWT stands for JSON Web Token. It is a compact, self-contained token format commonly used to carry claims.

A JWT typically has three parts:

header.payload.signature

  • header: algorithm and metadata
  • payload: claims such as subject, issuer, audience, expiry
  • signature: proves integrity if signed correctly

Important practical truth: signed JWTs are not secret by default. The header and payload are base64url-encoded, not encrypted, so anyone holding the token can decode and read the claims.
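
A minimal sketch of both points, built from the standard library rather than a JWT library so the mechanics are visible. It signs with HS256, shows that the payload can be read without any key, and shows that tampering is caught on verification. The helper names are assumptions; real services should use a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_jwt(claims: dict, key: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url_encode(signature)}"


def read_claims_unverified(token: str) -> dict:
    """No key needed: the payload is encoded, not hidden."""
    return json.loads(b64url_decode(token.split(".")[1]))


def verify_jwt(token: str, key: bytes) -> Optional[dict]:
    """Return the claims if the HS256 signature checks out, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(b64url_decode(header_b64))
        signature = b64url_decode(signature_b64)
        signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    except ValueError:
        return None  # wrong segment count, bad base64, or bad JSON
    # Pin the algorithm; never let the token choose how it is verified.
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        return None
    expected = hmac.new(key, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        return None
    return json.loads(b64url_decode(payload_b64))
```

This checks integrity only. Expiry, issuer, and audience still have to be validated separately after the signature passes.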

6.2 Signing vs Encryption

Mechanism | What it guarantees | Practical meaning
Signing (JWS) | Integrity and authenticity | Token was issued by trusted signer and not modified
Encryption (JWE) | Confidentiality | Token contents are hidden from intermediaries/clients

Most production JWT usage is signed, not encrypted.

That means:

  • do not put secrets in JWT payloads
  • do not put more PII than necessary
  • use claims for identity and authorization hints, not as a dumping ground

6.3 Access Tokens vs Refresh Tokens

Token type | Lifetime | Used by | Main purpose
Access token | Short-lived | APIs | Authorize a request
Refresh token | Longer-lived | Auth client / backend | Obtain new access tokens

Best practice:

  • keep access tokens short-lived
  • treat refresh tokens as highly sensitive credentials
  • store refresh tokens more carefully than access tokens

6.4 Why Teams Use JWTs

Benefits:

  • easy for distributed services to verify locally
  • no session store lookup on every request if verification is local
  • good fit for API ecosystems and delegated access
  • works well across domains and service boundaries

Costs:

  • revocation is harder
  • permissions embedded in tokens can become stale
  • key rotation and issuer validation must be done correctly
  • token size can grow dangerously if you stuff too many claims inside

6.5 Token Rotation

Refresh token rotation is a major real-world security mechanism.

Idea:

  • every refresh use invalidates the previous refresh token
  • the auth server issues a new refresh token and new access token
  • if an old refresh token is reused, the server assumes theft and can revoke the token family

sequenceDiagram
	actor User
	participant Client
	participant Auth as Auth Server
	participant Store as Token Store

	User->>Client: Continue using app
	Client->>Auth: POST /token/refresh with refresh token
	Auth->>Store: Validate token family and prior use
	Store-->>Auth: valid / reused / revoked
	Auth-->>Client: New access token + new refresh token
	Auth->>Store: Mark old token used, persist new token state
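
A minimal sketch of rotation with reuse detection, assuming an in-memory store. The class and field names are hypothetical, and a real store would persist token hashes rather than raw tokens.

```python
import secrets
from typing import Optional


class RefreshTokenStore:
    def __init__(self):
        # token -> {"family": str, "user_id": str, "used": bool}
        self._tokens = {}
        self._revoked_families = set()

    def issue(self, user_id: str, family: Optional[str] = None) -> str:
        """Issue a token; a new login starts a new family."""
        token = secrets.token_urlsafe(32)
        self._tokens[token] = {
            "family": family or secrets.token_hex(8),
            "user_id": user_id,
            "used": False,
        }
        return token

    def rotate(self, presented: str) -> Optional[str]:
        """Exchange a refresh token for its successor, or return None."""
        record = self._tokens.get(presented)
        if record is None or record["family"] in self._revoked_families:
            return None
        if record["used"]:
            # An already-rotated token came back: assume theft and
            # revoke every token descended from the same login.
            self._revoked_families.add(record["family"])
            return None
        record["used"] = True
        return self.issue(record["user_id"], record["family"])
```

The family revocation is what makes reuse detection useful: the server cannot tell whether the attacker or the legitimate client replayed the old token, so both are forced back through login.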

6.6 Revocation Challenges

Revocation is the biggest practical downside of stateless tokens.

If an access token is self-contained and valid until exp, then after it is issued:

  • the user may be disabled
  • permissions may change
  • a tenant may be suspended
  • the token may be stolen

But the token may still verify cryptographically.

Mitigations:

  • short access token TTLs
  • refresh token rotation
  • revocation list or denylist for critical cases
  • user/session version claim checked against server state
  • opaque tokens with introspection for high-control environments

6.7 Stateless Auth Tradeoffs

This is a favorite interview question: "Should I use JWT or sessions?"

The mature answer is not dogmatic. It depends.

Topic | Server-side sessions | JWT
Request-time state lookup | Usually yes | Not always
Easy revocation | Yes | Harder
Cross-service portability | Moderate | Strong
Simplicity for web apps | Often simpler | Often overused
Risk of stale claims | Lower | Higher
CSRF concern if cookie-based | Yes | Yes if stored in cookies
XSS risk if JS-accessible storage | Lower with HttpOnly cookies | Higher if stored in localStorage

A practical rule:

  • for traditional web apps, server-side sessions are often simpler and safer
  • for API ecosystems, third-party integrations, and distributed service verification, tokens are often the better fit

6.8 Common JWT Mistakes

  • storing JWTs in localStorage without carefully thinking through XSS risk
  • placing roles and permissions in long-lived tokens and forgetting they go stale
  • not checking iss, aud, exp, nbf, and key identifiers properly
  • using symmetric signing keys everywhere and spreading them across many services
  • putting secrets or excessive PII in token payloads
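
The claim checks from the third bullet can be sketched as a small function that runs after the signature has already been verified. The function name, the returned reason strings, and the 30-second clock-skew leeway are assumptions.

```python
import time
from typing import Optional


def validate_claims(claims: dict, expected_issuer: str, expected_audience: str,
                    now: Optional[float] = None, leeway: float = 30) -> Optional[str]:
    """Return None if the claims are acceptable, else a short reason."""
    now = time.time() if now is None else now
    if claims.get("iss") != expected_issuer:
        return "bad issuer"
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]  # aud may be a string or a list
    if expected_audience not in audiences:
        return "bad audience"
    exp = claims.get("exp")
    # Treat a missing or malformed exp as a failure, not as "never expires".
    if not isinstance(exp, (int, float)) or now >= exp + leeway:
        return "expired"
    nbf = claims.get("nbf")
    if nbf is not None and (not isinstance(nbf, (int, float)) or now + leeway < nbf):
        return "not yet valid"
    return None
```

The audience check is the one most often skipped: without it, a token minted for one service can be replayed against another that trusts the same issuer.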

JWT Best Practices

  • prefer asymmetric signing for shared verification environments
  • expose public keys via a JWKS endpoint if multiple verifiers exist
  • keep access tokens short-lived
  • rotate signing keys safely and support key overlap during rotation
  • use opaque tokens or introspection if real-time revocation is a hard requirement

7. OAuth

OAuth solves delegated authorization. It lets one application access another application's resources on behalf of a user without receiving the user's password.

7.1 The Problem OAuth Solves

Without OAuth, a user might have to give App A their password for App B. That is unacceptable because:

  • App A can now do anything the user can do
  • App B cannot scope access cleanly
  • the user cannot revoke just that delegated access safely

OAuth introduces a safer model:

  • user authenticates with the authorization server / IdP
  • user consents to limited scopes
  • client receives tokens with bounded permissions

7.2 Authorization Code Flow with PKCE

This is the modern default for public clients such as browser-based and mobile apps.

sequenceDiagram
	actor User
	participant Client as SaaS App
	participant Browser
	participant AS as Authorization Server / IdP
	participant API as Third-Party API

	User->>Client: Click "Connect Google Drive"
	Client->>Browser: Redirect to /authorize + scope + code_challenge
	Browser->>AS: Login and grant consent
	AS-->>Browser: Redirect back with authorization code
	Browser->>Client: Deliver authorization code
	Client->>AS: Exchange code + code_verifier
	AS-->>Client: Access token (+ refresh token)
	Client->>API: Call API with access token
	API-->>Client: Protected resource data

Why PKCE exists

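PKCE binds the authorization request to the token exchange. The client sends a hash of a random secret (the code_challenge) up front and must present the secret itself (the code_verifier) when redeeming the code, so a stolen authorization code alone cannot be exchanged for tokens. It is critical for public clients such as SPAs and mobile apps, which cannot keep a client secret.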
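
A minimal sketch of the S256 challenge method from RFC 7636, using only the standard library. The function names are assumptions; the check in server_accepts mirrors what an authorization server does at the token endpoint.

```python
import base64
import hashlib
import hmac
import secrets


def make_code_verifier() -> str:
    """Client side: a fresh random secret per authorization request."""
    return secrets.token_urlsafe(32)  # 43 URL-safe characters


def code_challenge_s256(verifier: str) -> str:
    """Client side: the value sent on /authorize as code_challenge."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def server_accepts(verifier: str, stored_challenge: str) -> bool:
    """Server side: recompute the challenge from the presented verifier."""
    recomputed = code_challenge_s256(verifier)
    return hmac.compare_digest(recomputed.encode(), stored_challenge.encode())
```

An attacker who intercepts the redirect sees the authorization code and, at most, the challenge. Neither reveals the verifier, so the token request fails.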

7.3 Scopes

Scopes define the breadth of access. Examples:

  • read:user
  • repo:write
  • payments:refunds
  • calendar.readonly

Good scope design is product design plus security design.

If scopes are too broad:

  • users lose trust
  • integrations become over-privileged
  • incident blast radius increases

If scopes are too granular:

  • consent screens become confusing
  • implementation complexity rises
  • developers ask for full access anyway

7.4 Consent

Consent is the user-visible manifestation of delegated access.

Good consent screens answer:

  • who is requesting access?
  • to which data or actions?
  • for how long?
  • can the user revoke later?

This matters a lot in SaaS ecosystems like Google Workspace, GitHub Apps, or Slack apps.

7.5 Refresh Tokens in OAuth

Long-running integrations often need refresh tokens so they can keep calling APIs without asking the user to re-consent constantly.

Refresh token concerns:

  • high-value credential theft risk
  • need for rotation and revocation
  • tenant admins may want centralized revocation controls

7.6 Third-Party Integrations

In real SaaS systems, OAuth is often used for:

  • connecting Google Drive, GitHub, Slack, Salesforce, Stripe, or Dropbox
  • importing or exporting data
  • posting to external systems on behalf of the user or workspace

Architectural consequences:

  • store provider account linkage metadata
  • encrypt or otherwise protect provider refresh tokens
  • model scopes per installation or workspace
  • surface admin controls for revocation and reauthorization

7.7 OAuth vs Authentication

OAuth is about authorization. Authentication is not the original purpose of OAuth.

However, many products use OAuth plus an identity layer such as OpenID Connect to support "Sign in with Google".

Interview nuance: saying "OAuth is login" is incomplete. Better answer:

  • OAuth is delegated authorization
  • OIDC adds identity information for authentication use cases

OAuth Failure Cases

  • client stores provider tokens insecurely
  • redirect URI validation is weak
  • state parameter not used correctly, enabling CSRF-like attacks in auth flows
  • scopes are excessively broad
  • tenants cannot audit or revoke third-party access easily
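
Two of these failures, weak redirect URI validation and a missing state check, have small and well-understood fixes. The sketch below assumes a dict-like server-side session and a hypothetical registered callback URL.

```python
import hmac
import secrets

# Hypothetical registration data for one OAuth client.
REGISTERED_REDIRECT_URIS = {"https://app.example.com/oauth/callback"}


def is_allowed_redirect(uri: str) -> bool:
    """Exact string match only; prefix or substring checks are bypassable."""
    return uri in REGISTERED_REDIRECT_URIS


def start_authorization(session: dict) -> str:
    """Generate a state value and bind it to the user's session."""
    state = secrets.token_urlsafe(16)
    session["oauth_state"] = state
    return state  # appended to the /authorize redirect


def finish_authorization(session: dict, returned_state: str) -> bool:
    """Accept the callback only if it carries the state we issued."""
    expected = session.pop("oauth_state", None)  # pop makes it single-use
    if expected is None:
        return False
    return hmac.compare_digest(expected.encode(), returned_state.encode())
```

Without the state check, an attacker can feed a victim's browser a callback containing the attacker's own authorization code, silently linking the victim's account to the attacker's third-party account.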

8. SSO: SAML and OIDC

Enterprise customers often do not want each SaaS app to manage a separate corporate password. They want central identity, central policy, and controlled employee access. That is where SSO comes in.

8.1 Identity Provider vs Service Provider

Role | Meaning
Identity Provider (IdP) | System that authenticates the employee, such as Okta, Azure AD, Google Workspace
Service Provider (SP) / Relying Party (RP) | The SaaS application that trusts the IdP

8.2 SAML Basics

SAML is older, XML-based, and still heavily used in enterprise environments.

Mental model:

  • user tries to access the SaaS app
  • SaaS redirects user to corporate IdP
  • IdP authenticates user
  • IdP sends signed assertion back to SaaS
  • SaaS creates a local session

Strengths:

  • entrenched in enterprise IT
  • widely supported by corporate identity systems

Costs:

  • XML complexity
  • harder developer ergonomics
  • trickier debugging and implementation compared with OIDC

8.3 OIDC Basics

OpenID Connect is an identity layer on top of OAuth 2.0.

It provides:

  • ID tokens with identity claims
  • standardized login flows
  • better fit for modern web and mobile apps

OIDC is usually easier to work with than SAML for modern applications.

8.4 SAML vs OIDC

Topic | SAML | OIDC
Typical format | XML assertions | JSON tokens
Common use case | Enterprise browser SSO | Modern app login and API ecosystems
Developer ergonomics | Heavier | Easier
Mobile/API friendliness | Weaker | Stronger

8.5 Enterprise Architecture

flowchart LR
	Employee[Employee] --> SaaS[Your SaaS App]
	SaaS --> IdP[Enterprise IdP]
	IdP --> SaaS
	IdP --> Directory[Corporate Directory]
	IdP --> SCIM[Provisioning / SCIM]
	SCIM --> SaaS
	SaaS --> Policy[Workspace Roles and Policies]

In production, enterprise identity usually includes two separate concerns:

  • authentication and SSO
  • lifecycle management and provisioning

Provisioning is often handled with SCIM or similar directory sync mechanisms so the SaaS app knows:

  • who exists
  • which groups they belong to
  • who has been deprovisioned

8.6 Common Enterprise Requirements

  • just-in-time user creation on first login
  • domain verification to prove company ownership
  • group-to-role mapping
  • forced MFA at IdP level
  • admin-controlled session duration
  • audit logs for all SSO events

8.7 Failure Cases

  • bad mapping from IdP groups to app roles causes privilege escalation
  • employee is disabled in IdP but app keeps old sessions alive too long
  • email is used as unique identity key and later changes
  • multiple IdPs or merged companies create ambiguous identity mapping

SSO Best Practices

  • use stable external subject identifiers, not just email
  • model tenant-specific SSO config cleanly
  • separate authentication trust from authorization mapping inside the app
  • deprovision aggressively and revoke old sessions when identity status changes
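
The first three practices can be illustrated with a small data model. This is a sketch under assumptions: `TenantSsoConfig`, `external_identity_key`, and `roles_for_groups` are hypothetical names, and the group and role values are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple


@dataclass
class TenantSsoConfig:
    """Hypothetical per-tenant SSO settings, kept separate for each customer."""
    tenant_id: str
    issuer: str
    group_to_role: Dict[str, str] = field(default_factory=dict)


def external_identity_key(config: TenantSsoConfig, subject: str) -> Tuple[str, str, str]:
    """Key the account on (tenant, issuer, subject), not on email, which can change."""
    return (config.tenant_id, config.issuer, subject)


def roles_for_groups(config: TenantSsoConfig, groups: Iterable[str]) -> Set[str]:
    """Map IdP groups to app roles; unmapped groups grant nothing (deny by default)."""
    return {config.group_to_role[g] for g in groups if g in config.group_to_role}
```

Keeping the mapping inside the app, rather than trusting role names sent by the IdP, is one way to separate authentication trust from authorization.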

9. Password Reset

Password reset is a high-risk recovery flow. Attackers love it because it often bypasses normal login defenses.

9.1 Secure Token Flow

sequenceDiagram
	actor User
	participant App
	participant Auth as Auth Service
	participant Users as User DB
	participant Reset as Reset Token Store
	participant Mail as Email Service

	User->>App: Click "Forgot password"
	App->>Auth: POST /password-reset
	Auth->>Users: Lookup account
	Auth->>Reset: Store hashed single-use token + expiry
	Auth->>Mail: Send password reset link
	Auth-->>User: Generic success response
	User->>App: Open reset link
	App->>Auth: POST /password-reset/confirm token + new password
	Auth->>Reset: Validate token unused and unexpired
	Auth->>Users: Update password hash
	Auth->>Reset: Mark token used
	Auth-->>App: Success + revoke other sessions

9.2 Why This Design Exists

Password reset has to be secure even if the attacker knows the user's email address. Therefore the reset token must be:

  • hard to guess
  • short-lived
  • single-use
  • revocable

Good systems also revoke active sessions or require re-authentication after password reset.

9.3 Attack Prevention

| Threat | Mitigation |
| --- | --- |
| Account enumeration | Return generic responses like "If an account exists, email sent" |
| Token guessing | Long random tokens, rate limits |
| Token replay | Single-use storage and invalidation |
| Email inbox compromise | Step-up verification for high-value actions after reset |
| Old session persistence | Revoke sessions after reset |

9.4 Practical Advice

  • prefer opaque reset tokens over stuffing reset state into a long-lived JWT
  • hash reset tokens at rest if you store them server-side
  • keep TTL short, often 15 to 60 minutes depending on product sensitivity
  • notify users when a reset is requested and completed

Password Reset Failure Cases

  • reset token is reusable
  • old sessions remain active after password change
  • reset endpoint leaks whether account exists
  • support team bypasses the secure flow with weak manual procedures

10. Authorization Fundamentals

Authentication answers who the subject is. Authorization answers what that subject may do.

10.1 AuthN vs AuthZ

This distinction matters a lot.

| Question | Category |
| --- | --- |
| "Who are you?" | Authentication |
| "Are you allowed to do this?" | Authorization |
| "How sure are we?" | Authentication strength / assurance |
| "Why was access denied?" | Authorization decision and audit |

A user can be perfectly authenticated and still not be authorized.

10.2 Authorization Decision Shape

Every authZ decision is some variation of:

Can subject S perform action A on resource R under context C?

Where context may include:

  • tenant
  • time of day
  • network zone
  • device trust level
  • MFA level
  • resource ownership
  • subscription plan
  • legal region or data residency constraints
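
The decision shape can be written down as a small data structure. The sketch below is illustrative only: `AuthzRequest`, `Decision`, and the two toy rules in `decide` are assumptions, not rules taken from this guide.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class AuthzRequest:
    subject: str
    action: str
    resource: str
    context: Dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str  # reason code, useful for audit and support


def decide(request: AuthzRequest, resource_tenant: str) -> Decision:
    """Toy policy: the subject must act inside the resource's tenant,
    and deletes additionally require MFA in the request context."""
    if request.context.get("tenant") != resource_tenant:
        return Decision(False, "tenant_mismatch")
    if request.action == "delete" and not request.context.get("mfa"):
        return Decision(False, "mfa_required")
    return Decision(True, "ok")
```

Every model later in this guide (RBAC, ABAC, ACLs) is a different way of filling in the body of a function like `decide`.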

10.3 Enforcement Layers

Authorization can happen at multiple layers:

| Layer | Good for | Risk if overused |
| --- | --- | --- |
| API gateway | Coarse access checks, authentication, token validation | Too coarse for resource-specific rules |
| Service layer | Business-specific rules | Easy to duplicate logic across services |
| Data access layer | Row/tenant isolation, final enforcement | Hard to express all product rules here |
| Database native policies | Strong last line of defense in some systems | App logic can still drift if not modeled carefully |

A common mistake is doing all authorization only at the edge. Edge checks are useful, but most real product rules depend on resource-specific business logic deeper inside the system.

10.4 Policy Design

Good policy design balances three things:

  • expressiveness
  • debuggability
  • operational simplicity

Ask these questions:

  • who is the subject?
  • what resource is being accessed?
  • what action is requested?
  • what context matters?
  • who can change the policy?
  • how do we explain and audit the decision?

10.5 Auditing and Explainability

Authorization is not just about allow or deny. In production you often need:

  • reason codes
  • which policy matched
  • who granted the permission
  • when the permission changed
  • evidence for support, compliance, and incident response

This is why mature systems treat authorization as both a runtime path and a data model.


11. RBAC

RBAC stands for role-based access control. Permissions are grouped into roles, and subjects are assigned roles.

11.1 Why RBAC Exists

Without roles, you would assign individual permissions to every user. That becomes unmanageable quickly.

RBAC simplifies administration:

  • viewer
  • editor
  • admin
  • billing_admin

Instead of attaching dozens of permissions directly to users, you attach permissions to roles and roles to users.

11.2 Basic Model

| Entity | Example |
| --- | --- |
| Permission | invoice.read, invoice.refund, workspace.invite |
| Role | support_agent, workspace_admin |
| Assignment | User U has role R in tenant T |

Tenant scoping is critical. In multi-tenant SaaS, a user is rarely just "an admin" globally. They are usually an admin in a specific workspace or organization.
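
A minimal sketch of the basic model with tenant scoping. The permission and role names come from the table above, but which permissions each role holds, and the sample assignments, are assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

# Assumed role -> permission mapping (illustrative only).
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "support_agent": {"invoice.read"},
    "workspace_admin": {"invoice.read", "invoice.refund", "workspace.invite"},
}

# Assignment: (user, tenant) -> roles. The same user can differ per tenant.
ASSIGNMENTS: Dict[Tuple[str, str], Set[str]] = {
    ("u1", "tenant-a"): {"workspace_admin"},
    ("u1", "tenant-b"): {"support_agent"},
}


def has_permission(user_id: str, tenant_id: str, permission: str) -> bool:
    """True if any role the user holds in this tenant includes the permission."""
    roles = ASSIGNMENTS.get((user_id, tenant_id), set())
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)
```

Because the tenant is part of the lookup key, being an admin in `tenant-a` says nothing about `tenant-b`.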

11.3 Enterprise Patterns

Common enterprise RBAC patterns include:

  • global roles for platform staff
  • tenant-scoped roles for customers
  • custom roles for larger organizations
  • group-to-role mapping from SSO IdP groups

11.4 The Role Explosion Problem

RBAC starts simple but can degrade into dozens or hundreds of roles:

  • viewer
  • viewer_plus_export
  • viewer_plus_export_plus_billing
  • regional_admin_eu
  • regional_admin_us

This is role explosion.

It happens when RBAC is forced to encode too many contextual conditions that really belong in attributes or policies.

11.5 RBAC Tradeoffs

| Strength | Weakness |
| --- | --- |
| Easy to explain to users and admins | Coarse-grained for complex cases |
| Efficient at runtime | Can explode in number of roles |
| Works well for common SaaS admin patterns | Poor fit for dynamic context-heavy rules |

RBAC Best Practices

  • keep the base role set small
  • scope roles by tenant, project, or resource container
  • separate platform/internal staff roles from customer roles
  • use RBAC for broad permissions and combine with finer policies when needed

12. ABAC

ABAC stands for attribute-based access control. Instead of only asking "What role does this user have?", ABAC asks about attributes of the subject, resource, and environment.

12.1 Why ABAC Exists

RBAC is often too static for real-world decisions like:

  • support agent can view tickets only in their assigned region
  • manager can approve expenses under a threshold for their own department
  • user can access data only from a compliant device in an approved country
  • payout release requires recent MFA and a risk score below a threshold

These rules depend on context, not just role labels.

12.2 Dynamic Policy Evaluation

ABAC decisions may use attributes such as:

  • subject department
  • resource owner
  • tenant subscription tier
  • request IP or network zone
  • device trust score
  • current time or shift window
  • MFA strength

Example policy idea:

"Allow refund approval if the subject role is finance_manager, the order belongs to the same merchant account, the refund amount is below the subject limit, and MFA was performed in the last 10 minutes."

12.3 Policy Engines

ABAC often benefits from a dedicated policy engine because hardcoding many dynamic rules directly into services becomes brittle.

Common approaches:

  • custom rules in application code
  • centralized policy engine such as OPA/Rego
  • cloud-style policy languages such as Cedar
  • relationship and graph-based systems for object access patterns

12.4 ABAC Tradeoffs

| Strength | Weakness |
| --- | --- |
| Expressive and context-aware | Harder to explain and debug |
| Reduces role explosion | Requires clean attribute sources |
| Good for fine-grained enterprise control | Runtime evaluation can be more expensive |

12.5 Practical Use

Many production systems do not choose "RBAC or ABAC". They combine them:

  • RBAC gives the broad lane
  • ABAC applies contextual restrictions inside that lane

Example:

  • role says user may edit invoices
  • ABAC rule says only for their tenant, below approval threshold, and only after MFA for high-value invoices
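
The invoice example can be sketched as a two-stage check: an RBAC lookup first, then attribute conditions. The role table, the `HIGH_VALUE_CENTS` cutoff, and the field names are hypothetical; the guide does not specify concrete thresholds.

```python
from dataclasses import dataclass

ROLE_PERMISSIONS = {"editor": {"invoice.edit"}, "viewer": set()}  # assumed roles
HIGH_VALUE_CENTS = 100_000  # assumed cutoff for "high-value" invoices


@dataclass(frozen=True)
class Subject:
    role: str
    tenant_id: str
    approval_threshold_cents: int
    mfa_verified: bool


@dataclass(frozen=True)
class Invoice:
    tenant_id: str
    amount_cents: int


def may_edit_invoice(subject: Subject, invoice: Invoice) -> bool:
    # RBAC: the broad lane. Does the role include the permission at all?
    if "invoice.edit" not in ROLE_PERMISSIONS.get(subject.role, set()):
        return False
    # ABAC: contextual restrictions inside that lane.
    if invoice.tenant_id != subject.tenant_id:
        return False
    if invoice.amount_cents >= subject.approval_threshold_cents:
        return False
    if invoice.amount_cents >= HIGH_VALUE_CENTS and not subject.mfa_verified:
        return False
    return True
```

Checking the role first keeps the common denial path cheap and leaves attribute lookups for requests that are already in the right lane.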

ABAC Failure Cases

  • attributes are stale or inconsistently sourced across services
  • policies become unreadable and impossible to reason about
  • caching hides recent attribute changes like department moves or suspensions

13. Permissions and Access Control

Permissions are the actual capabilities a subject has. Access control is the mechanism that enforces those permissions correctly and consistently.

13.1 Permission Models

Common permission models include:

| Model | Mental model | Example |
| --- | --- | --- |
| RBAC | Roles map to permissions | Workspace admin |
| ACL | Resource has a list of allowed subjects | Shared document editable by Alice and Bob |
| ABAC | Decision based on attributes | Region and MFA aware access |
| ReBAC | Decision based on relationships | User is member of team that owns repo |
| Capability/token-based | Possession of unforgeable capability grants access | Signed download URL |

In modern systems, multiple models often coexist.

GitHub is a good mental example:

  • org and team membership look like RBAC/ReBAC
  • repo-specific collaborator lists look like ACLs
  • fine-grained product actions are individual permissions

13.2 Inheritance

Permissions often inherit down a hierarchy:

  • org -> workspace -> project -> resource
  • folder -> document
  • account -> sub-account

Inheritance is useful, but easy to get wrong.

Questions to design explicitly:

  • do child resources inherit all parent permissions?
  • can child permissions override parent permissions?
  • are denies supported, and if so do they take precedence?
  • how do you compute effective permissions efficiently?
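
One possible set of answers to these questions, sketched in code: children inherit every parent grant, and an explicit deny on the resource or any ancestor wins over any grant. These are assumptions chosen for illustration, and the hierarchy and sample data are hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical hierarchy: child -> parent.
PARENT: Dict[str, Optional[str]] = {
    "org": None,
    "workspace": "org",
    "project": "workspace",
    "doc": "project",
}

# resource -> user -> permissions
GRANTS: Dict[str, Dict[str, Set[str]]] = {
    "org": {"alice": {"read"}},
    "project": {"alice": {"write"}},
}
DENIES: Dict[str, Dict[str, Set[str]]] = {
    "doc": {"alice": {"write"}},
}


def effective_permissions(user: str, resource: str) -> Set[str]:
    """Walk from the resource up to the root, collecting grants and denies.

    Assumed semantics: grants are inherited downward; denies take precedence.
    """
    granted: Set[str] = set()
    denied: Set[str] = set()
    node: Optional[str] = resource
    while node is not None:
        granted |= GRANTS.get(node, {}).get(user, set())
        denied |= DENIES.get(node, {}).get(user, set())
        node = PARENT.get(node)
    return granted - denied
```

Walking the chain on every request costs one lookup per level; deep hierarchies usually push systems toward precomputed or cached effective permissions.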

13.3 Auditing

You need to know:

  • who granted access
  • when access changed
  • who accessed a resource
  • why a decision was allowed or denied

Auditing matters for:

  • customer support
  • security investigations
  • compliance
  • admin trust

13.4 Enforcement Patterns

There are two common implementation patterns.

Pattern A: Embedded authorization in each service

Pros:

  • low latency
  • business context close to the resource

Cons:

  • duplicated rules across services
  • inconsistent decisions and audit semantics

Pattern B: Centralized authorization service or policy engine

Pros:

  • consistency
  • shared policy language
  • central auditing and explainability

Cons:

  • added network hop
  • dependency on a central service
  • need good caching and fallback behavior

13.5 Centralized Auth Service and Policy Caching

flowchart LR
	Request[Authenticated Request] --> Service[Business Service]
	Service --> Cache[Policy Cache]
	Cache -->|cache miss| Authz[Central Authorization Service]
	Authz --> PDP[Policy Engine]
	Authz --> Attrs[Attribute / Relationship Data]
	PDP --> Decision[Allow / Deny + Reason]
	Decision --> Audit[Audit Log]
	Decision --> Service

Caching is often necessary, but introduces staleness risks. Common techniques:

  • short TTL caches for policy decisions
  • versioned policy snapshots
  • event-driven invalidation on membership or role changes
  • cache only stable intermediate data, not final decisions, in high-risk systems
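
The first three techniques can be combined in a small cache: entries expire after a short TTL, and a version counter lets a membership or role change event invalidate everything at once. This is a single-process sketch; `DecisionCache` and its methods are hypothetical names.

```python
import time
from typing import Callable, Dict, Hashable, Optional, Tuple


class DecisionCache:
    """Short-TTL cache of allow/deny decisions with version-based invalidation."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._version = 0
        # key -> (allowed, stored_at, policy_version)
        self._entries: Dict[Hashable, Tuple[bool, float, int]] = {}

    def invalidate_all(self) -> None:
        """Call when a role or membership change event arrives."""
        self._version += 1

    def get(self, key: Hashable) -> Optional[bool]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        allowed, stored_at, version = entry
        if version != self._version or self._clock() - stored_at > self._ttl:
            del self._entries[key]  # stale: force a fresh policy check
            return None
        return allowed

    def put(self, key: Hashable, allowed: bool) -> None:
        self._entries[key] = (allowed, self._clock(), self._version)
```

A `None` result means "ask the authorization service"; the TTL bounds how long a revoked permission can survive if an invalidation event is lost.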

13.6 Policy Caching Tradeoffs

| Benefit | Cost |
| --- | --- |
| Lower latency | Stale authorization decisions |
| Lower policy service load | Harder revocation semantics |
| Better resilience during partial outage | Risk of fail-open or fail-stale behavior |

Design question: when the policy service is down, do you fail closed or fail open?

  • fail closed is safer but can hurt availability
  • fail open preserves availability but may violate security

For high-risk actions, fail closed is usually the right answer.
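
One way to express this choice in code is to decide the failure mode per action. In the sketch below, high-risk actions fail closed, while low-risk actions fall back to the last known decision (fail stale) and deny if none exists. The action names and the `check_access` signature are hypothetical.

```python
from typing import Callable, Optional

HIGH_RISK_ACTIONS = {"payout.release", "invoice.refund"}  # assumed classification


def check_access(
    action: str,
    ask_policy_service: Callable[[], bool],
    last_known: Optional[bool] = None,
) -> bool:
    """Return the live decision, or apply a per-action failure mode on outage."""
    try:
        return ask_policy_service()
    except (ConnectionError, TimeoutError):
        if action in HIGH_RISK_ACTIONS:
            return False  # fail closed: no live decision, no access
        # Fail stale for low-risk actions; with nothing cached, still deny.
        return bool(last_known)
```

Note that even the low-risk branch never fails fully open: it only reuses a decision that was allowed at some earlier point.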

Access Control Best Practices

  • enforce tenant isolation early and repeatedly
  • keep policy decisions explainable
  • separate authentication claims from live authorization state when permissions change frequently
  • audit all admin and permission-management actions
  • do not trust internal network location as a permission model

14. Service-to-Service Authentication

User authentication is only half of production security. Modern backends also need to authenticate services to each other.

14.1 Why Internal Service Authentication Exists

In microservice systems, one request may pass through many services:

  • edge/API gateway
  • auth service
  • order service
  • payment service
  • notification service

If internal calls are trusted just because they are "inside the VPC", a compromised service can impersonate others too easily.

This is why zero-trust principles matter internally too.

14.2 Service Identity

A service needs its own identity, just like a user does.

Examples:

  • payments-service.prod
  • orders-service.eu-west-1
  • workload identity bound to a Kubernetes service account

A strong service identity system lets the platform answer:

  • which service is calling?
  • is it the real deployed workload?
  • is it allowed to call this destination?

14.3 mTLS Basics

Mutual TLS means both sides authenticate each other during the TLS handshake.

Benefits:

  • encryption in transit
  • client and server authentication
  • strong cryptographic service identity

Typical pattern:

  • internal CA issues short-lived certificates to workloads
  • service presents client cert on outbound call
  • destination validates issuer and identity

sequenceDiagram
	participant A as Service A
	participant CA as Internal CA / Identity System
	participant B as Service B

	A->>CA: Request workload certificate
	CA-->>A: Short-lived cert
	A->>B: TLS handshake + client cert
	B->>B: Verify cert, issuer, SAN, expiry
	B-->>A: Authenticated secure channel
	A->>B: Application request
	B-->>A: Response
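
On the receiving side, this can be sketched with Python's standard `ssl` module. `build_server_context` configures a server that requires a client certificate signed by the internal CA; the TLS library then checks the chain and expiry during the handshake. `caller_is_allowed` is a hypothetical helper that checks the URI SAN from `SSLSocket.getpeercert()` against an allowlist. The file paths and identity strings are placeholders.

```python
import ssl
from typing import Any, Mapping, Set


def build_server_context(ca_file: str, cert_file: str, key_file: str) -> ssl.SSLContext:
    """Server-side TLS context that requires and verifies a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.verify_mode = ssl.CERT_REQUIRED        # reject clients without a valid cert
    ctx.load_verify_locations(cafile=ca_file)  # trust only the internal CA
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def peer_uri_sans(peer_cert: Mapping[str, Any]) -> Set[str]:
    """Extract URI-type SAN entries from the dict returned by getpeercert()."""
    return {value for kind, value in peer_cert.get("subjectAltName", ()) if kind == "URI"}


def caller_is_allowed(peer_cert: Mapping[str, Any], allowed_identities: Set[str]) -> bool:
    """Authorization step: is the authenticated caller on this service's allowlist?"""
    return bool(peer_uri_sans(peer_cert) & allowed_identities)
```

The handshake proves who the caller is; the allowlist check is a separate decision about whether that caller may talk to this service.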

14.4 Short-Lived Credentials

Short-lived credentials are a major production best practice.

Why?

  • if stolen, they expire quickly
  • less need for manual secret rotation
  • better fit with workload identity and automation

This pattern shows up in:

  • cloud IAM temporary credentials
  • Kubernetes workload identity
  • service mesh certificates
  • internal token minting systems

14.5 Zero Trust Basics

Zero trust does not mean trusting nothing. It means:

  • do not grant access solely based on network location
  • verify identity continuously and explicitly
  • enforce least privilege
  • assume compromise is possible and reduce blast radius

Google's public BeyondCorp ideas are the canonical mental model here: access should depend on identity, device state, and policy, not on whether traffic comes from "inside the office network".

14.6 Service-to-Service Authorization

Authentication tells you that a caller is payments-service. Authorization must still decide whether payments-service may:

  • read card metadata
  • call refund APIs
  • publish to a payout topic
  • access a particular database table

This is often implemented with:

  • service identity plus policy
  • SPIFFE-like identity patterns
  • service mesh policy
  • signed internal tokens with audience restrictions
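
The last option, a signed internal token with an audience restriction, can be sketched with the standard library. To stay self-contained this uses an HMAC over a JSON payload with a shared key, which is a simplification: production systems more often use asymmetric signatures and an established token format. The token layout and the names `mint_token` and `verify_token` are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode("ascii")


def mint_token(key: bytes, issuer: str, audience: str, ttl_seconds: int,
               now: Optional[float] = None) -> str:
    """Create '<payload>.<signature>' naming the caller and the intended receiver."""
    now = time.time() if now is None else now
    payload = json.dumps({"iss": issuer, "aud": audience, "exp": now + ttl_seconds}).encode()
    signature = hmac.new(key, payload, hashlib.sha256).digest()
    return _b64(payload) + "." + _b64(signature)


def verify_token(key: bytes, token: str, expected_audience: str,
                 now: Optional[float] = None) -> Optional[str]:
    """Return the issuer if the token is authentic, unexpired, and meant for us."""
    now = time.time() if now is None else now
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:
        return None  # malformed token
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None
    claims = json.loads(payload)
    if claims.get("aud") != expected_audience or claims.get("exp", 0) <= now:
        return None
    return claims.get("iss")
```

The audience check is what stops a token minted for one service from being replayed against another.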

Service Auth Failure Cases

  • long-lived shared secrets copied across many services
  • no certificate rotation automation
  • any internal service can call any other service
  • internal service trusts caller-provided headers like X-User-Id without verification
  • service identity is authenticated but not authorized

Service Auth Best Practices

  • use workload or service identity, not shared static secrets where possible
  • prefer short-lived credentials and automatic rotation
  • when end-user identity must be propagated, carry it in a verifiable form such as a signed token, not a bare header
  • separate service identity from end-user identity in request context

15. How These Systems Fit Together

A strong interview answer connects all the pieces into one architecture.

15.1 Typical SaaS Architecture

flowchart TD
	User[Browser / Mobile App] --> Edge[Edge / API Gateway]
	Edge --> Auth[Auth Service]
	Auth --> UserDB[(User Directory)]
	Auth --> Session[(Session Store / Token Store)]
	Auth --> Keys[KMS / Signing Keys]
	Edge --> App[Business Services]
	App --> Authz[Authorization Service / Policy Engine]
	Authz --> Perms[(Roles, Relationships, Attributes)]
	App --> Data[(Application Data)]
	Auth --> Audit[Security Audit Log]
	Authz --> Audit
	App --> Audit

The key idea is separation of concerns:

  • auth service proves identity and issues continuity artifacts
  • session/token store manages continuity and revocation
  • policy engine decides access
  • application services enforce business operations
  • audit pipeline records security-relevant facts

15.2 Consumer SaaS vs Enterprise SaaS vs Internal Platform

| Environment | Identity priorities |
| --- | --- |
| Consumer SaaS | Signup conversion, password recovery, abuse prevention, social login |
| Enterprise SaaS | SSO, provisioning, group mapping, auditability, tenant admin controls |
| Internal platform | Service identity, zero trust, least privilege, strong device posture |

15.3 Data Freshness vs Statelessness

One of the deepest identity design tradeoffs is this:

  • stateless verification is fast and scalable
  • fresh authorization state often requires looking up server-side data

That is why many mature architectures mix the two:

  • token or session for authentication continuity
  • live policy check for sensitive authorization

15.4 Tenant Isolation

For SaaS systems, tenant isolation must be explicit in both authentication and authorization.

Common patterns:

  • include tenant membership in auth/session context
  • scope roles by tenant
  • enforce tenant filters in service and data layers
  • audit cross-tenant admin actions aggressively

This is especially important in systems like GitHub organizations, Stripe connected accounts, or enterprise SaaS workspaces.
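
Enforcing tenant filters in the data layer can be sketched as a repository that cannot be constructed without a tenant and never returns rows from another one. The class name, row shape, and sample data are hypothetical, and a list stands in for the database.

```python
from typing import Dict, List, Optional

# Hypothetical rows standing in for an invoices table.
INVOICES: List[Dict[str, object]] = [
    {"id": "inv-1", "tenant_id": "tenant-a", "amount_cents": 1200},
    {"id": "inv-2", "tenant_id": "tenant-b", "amount_cents": 9900},
]


class TenantScopedInvoices:
    """Data access object bound to one tenant for its whole lifetime."""

    def __init__(self, tenant_id: str, rows: List[Dict[str, object]]):
        if not tenant_id:
            raise ValueError("tenant_id is required")
        self._tenant_id = tenant_id
        self._rows = rows

    def list_invoices(self) -> List[Dict[str, object]]:
        # Every read passes through the tenant filter.
        return [row for row in self._rows if row["tenant_id"] == self._tenant_id]

    def get(self, invoice_id: str) -> Optional[Dict[str, object]]:
        # Looking up another tenant's id behaves like "not found".
        for row in self.list_invoices():
            if row["id"] == invoice_id:
                return row
        return None
```

Binding the tenant at construction time means individual queries cannot forget the filter, which is the usual way cross-tenant leaks happen.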


16. Real-World Patterns and Company Examples

These examples are useful as mental anchors, not as exact internal blueprints.

Google

  • Public Google identity and OIDC flows are a classic example of large-scale federated identity.
  • Google's public BeyondCorp ideas are foundational for zero-trust access.
  • Zanzibar is the famous reference point for large-scale, relationship-aware authorization.

Interview lesson: centralized authorization models can work at huge scale if the data model, caching, and consistency story are designed carefully.

Netflix

  • Netflix-style service-rich environments highlight the need for service identity, short-lived credentials, and resilient internal auth patterns.
  • Streaming and control plane workloads also show why identity systems must stay available under very high traffic.

Interview lesson: internal service auth is not optional in large microservice systems.

Uber

  • Ride-sharing and marketplace architectures depend on strict service-to-service permissions, real-time risk checks, and strong tenant/user context propagation.
  • Payment, dispatch, driver, and rider services cannot safely trust each other based only on network placement.

Interview lesson: identity context often flows through many services and must remain verifiable.

Amazon

  • AWS IAM is the public archetype for policy-heavy authorization with users, roles, resource policies, temporary credentials, and least privilege.

Interview lesson: enterprise-grade authorization is really a policy and identity modeling problem, not just a list of roles.

GitHub

  • GitHub demonstrates a mix of organization membership, teams, repository roles, OAuth apps, GitHub Apps, personal access tokens, and enterprise SSO.

Interview lesson: one product often needs several identity and authorization models at the same time.

Stripe

  • Stripe is a useful example for strong dashboard authentication, MFA, API keys, restricted keys, OAuth for Connect-style platforms, and careful access around money movement.

Interview lesson: high-risk actions need stronger auth, auditability, and granular permissions than low-risk read-only actions.

Typical SaaS Systems

Most B2B SaaS products end up combining:

  • email/password and social login for self-serve customers
  • SSO for enterprise customers
  • RBAC for admin/editor/viewer patterns
  • ABAC or policy rules for sensitive workflows
  • API tokens or OAuth for integrations
  • service identity for microservices

17. Interview Discussion Guide

If asked to design identity and access for a backend system, structure your answer progressively.

17.1 Clarifying Questions

Ask:

  • who are the subjects: end users, admins, services, partners?
  • is this consumer, enterprise, or internal platform?
  • are we designing login, third-party integration, or internal access control?
  • what is the risk level: social app, fintech, healthcare, developer platform?
  • do we need SSO, API access, or both?
  • how fresh must revocation and permission changes be?

17.2 Good Interview Structure

  1. Define identities and trust boundaries.
  2. Choose authentication mechanism.
  3. Choose continuity mechanism: session or token.
  4. Design authorization model.
  5. Address recovery, revocation, audit, and failure cases.
  6. Address scale, caching, key rotation, and multi-region concerns.

17.3 Common Interview Comparisons

Sessions vs JWT

Use when the interviewer asks about stateful vs stateless auth.

| Question | Sessions answer | JWT answer |
| --- | --- | --- |
| Need instant logout? | Strong | Harder |
| Need local verification across services? | Weaker | Strong |
| Web app simplicity? | Often simpler | Often overcomplicated |
| Third-party API ecosystem? | Less natural | Better fit |

RBAC vs ABAC

| Question | RBAC | ABAC |
| --- | --- | --- |
| Easy admin mental model | Strong | Weaker |
| Fine-grained contextual rules | Weak | Strong |
| Risk of role explosion | High | Lower |
| Ease of debugging | Stronger | Harder |

SAML vs OIDC

| Question | SAML | OIDC |
| --- | --- | --- |
| Enterprise legacy support | Strong | Strong but varies |
| Modern web/mobile friendliness | Weaker | Strong |
| Developer ergonomics | Heavier | Better |

17.4 Scaling Considerations to Mention

  • Redis or equivalent for shared sessions
  • multi-region token verification and key distribution
  • policy caching with invalidation strategy
  • short-lived credentials for services
  • audit event pipelines decoupled from critical-path latency
  • abuse protection on login and signup

17.5 Failure Cases Worth Calling Out

  • auth service outage blocks all logins
  • Redis session store failure logs users out or prevents validation
  • stale permissions cached after role removal
  • signing key rotation breaks old verifiers
  • refresh token theft leads to silent session hijack

Interview tip: explicitly talking about revocation, rotation, and failure handling is often what moves an answer from junior to strong mid-level or senior.


18. Common Mistakes and Best Practices

Common Mistakes

  • treating authentication and authorization as the same problem
  • storing passwords with fast hashes
  • putting too much trust in long-lived JWTs
  • assuming "internal network" means trusted caller
  • forgetting logout, revocation, and recovery flows
  • doing authorization only at the gateway
  • using email as the only durable identity key in enterprise federation
  • failing to audit permission changes and admin actions
  • building role systems that cannot express tenant or resource scope

Best Practices

  • separate identity proof, session/token continuity, and authorization policy clearly
  • use slow password hashing and protect high-value secrets with KMS/HSM support
  • prefer MFA and step-up authentication for sensitive actions
  • keep access tokens short-lived and refresh tokens protected and rotated
  • model tenant-aware roles and permissions explicitly
  • centralize policy where consistency matters, but understand cache staleness tradeoffs
  • use service identity and short-lived credentials internally
  • build auditability and explainability into the system from the beginning

Final Mental Model

If you remember one thing, remember this:

Identity and access is not one feature. It is a chain of connected systems:

  • identity proof
  • credential management
  • session or token continuity
  • authorization policy
  • revocation and recovery
  • service identity
  • auditing and operations

Real systems succeed when all of these parts are designed together.

If one weak link exists, attackers and outages will find it.


Quick Review Checklist

Use this when revising for interviews.

  • Can I clearly explain AuthN vs AuthZ?
  • Do I know when to use sessions vs JWTs?
  • Can I explain password hashing, salts, peppers, and MFA tradeoffs?
  • Can I walk through OAuth authorization code flow with PKCE?
  • Can I explain SAML vs OIDC and IdP vs SP?
  • Can I compare RBAC and ABAC with examples?
  • Can I describe revocation, logout, token rotation, and password reset securely?
  • Can I explain service-to-service auth, mTLS, and zero trust?
  • Can I describe where policy enforcement should happen in a real system?

If the answer is yes to those questions, your identity and access fundamentals are strong enough for most software engineering interview discussions and practical backend design conversations.