TarunElango/Computer-Fundamentals

tarun-elango 26810e43d0 sd text

2026-04-26 13:27:19 -04:00

2. Identity & Access

Identity and access is the control plane for nearly every backend system. It answers four questions for every request:

Who is calling?
How do we know they are really that caller?
What are they allowed to do right now?
How do we prove later that the decision was correct?

If you understand identity and access well, you can reason about login systems, sessions, JWTs, OAuth integrations, enterprise SSO, authorization policies, service-to-service security, and zero-trust architecture as one connected system rather than as isolated buzzwords.

This guide is written for two goals at the same time:

interview preparation
real-world backend and system design understanding

The emphasis is practical. The goal is not to memorize definitions, but to understand why these systems exist, how they fail, and how production systems are actually built.

Why Identity & Access Exists
Core Concepts and Mental Model
Authentication Fundamentals
Login and Signup
Sessions
JWT and Token-Based Authentication
OAuth
SSO: SAML and OIDC
Password Reset
Authorization Fundamentals
RBAC
ABAC
Permissions and Access Control
Service-to-Service Authentication
How These Systems Fit Together
Real-World Patterns and Company Examples
Interview Discussion Guide
Common Mistakes and Best Practices

1. Why Identity & Access Exists

Most systems are multi-user, multi-device, multi-service, and increasingly multi-tenant. Without identity and access controls, the backend has no safe way to distinguish:

one user from another
a user from an attacker
an employee from a customer
a production service from a compromised internal service
a legitimate action from a replayed or forged request

At small scale, identity and access looks like a login form plus a password check. At production scale, it becomes much bigger:

account creation and identity proofing
credential storage and recovery
MFA and risk detection
sessions and token lifecycle management
delegated access via OAuth
enterprise federation via SSO
role and policy evaluation
service identity inside microservices
auditing, revocation, key rotation, and incident response

The reason interviews ask about identity and access so often is simple: it touches security, data modeling, distributed systems, product tradeoffs, and failure handling all at once.

The Core Tension

Identity systems always balance three goals:

Goal	What it means	Why it is hard
Security	Prevent impersonation and unauthorized access	Stronger security usually adds friction
Usability	Let real users sign in quickly and recover safely	Easier flows are often easier to abuse
Scalability	Support huge traffic, many services, and many tenants	Distributed state and revocation become harder

An excellent backend engineer treats identity not as a feature checkbox, but as a reliability and security subsystem.

2. Core Concepts and Mental Model

Before discussing flows, build the right mental model.

Important Terms

Term	Meaning	Practical intuition
Identity	The subject being represented	A user, admin, device, service, or organization
Authentication (AuthN)	Verifying who the subject is	"Prove you are Alice"
Authorization (AuthZ)	Deciding what that subject may do	"Can Alice read invoice 123?"
Session	Server-recognized authenticated continuity over time	"This browser remains logged in"
Access token	Credential presented to APIs	Often short-lived
Refresh token	Credential used to obtain new access tokens	More sensitive than access tokens
Identity Provider (IdP)	System that authenticates identities	Google, Okta, Microsoft Entra ID (formerly Azure AD)
Service Provider / Relying Party	App that trusts the IdP	Your SaaS product
Policy engine	Evaluates access rules	RBAC, ABAC, ReBAC, custom rules
Audit log	Immutable trail of security-relevant events	Needed for forensics and compliance

One Request Through the System

sequenceDiagram
	actor User
	participant Client
	participant Edge as API Gateway / Edge
	participant Auth as Auth Service
	participant Policy as Policy Engine
	participant App as Business Service
	participant Data as Data Store

	User->>Client: Click "View invoice"
	Client->>Edge: GET /invoices/123 + cookie/token
	Edge->>Auth: Validate session/token
	Auth-->>Edge: subject, tenant, auth strength, claims
	Edge->>Policy: Can subject read invoice 123?
	Policy-->>Edge: allow/deny + reason
	Edge->>App: Forward authenticated request
	App->>Data: Load resource
	Data-->>App: Resource data
	App-->>Client: 200 OK or 403 Forbidden

This is the simplest correct mental model:

authentication establishes identity
authorization evaluates permissions for the requested action
business logic executes only after those checks
the decision should be observable and auditable

A Production Identity Stack

In a real system, identity and access usually spans these components:

Component	Typical responsibility
Auth service	Login, signup, password verification, MFA, token issuance
User directory	Users, credentials metadata, verification state, tenant membership
Session store	Server-side sessions and revocation state
Token service	Access token and refresh token lifecycle
Policy engine	Role/attribute-based access decisions
Key management	Signing keys, encryption keys, secret rotation
Audit pipeline	Security events, admin actions, login failures, policy decisions
Risk engine	Rate limits, device reputation, fraud checks, anomaly detection

Interview shortcut: if you can clearly separate authentication, session/token management, and authorization, you already sound more senior than candidates who collapse them into one vague "auth layer".

3. Authentication Fundamentals

Authentication is the process of verifying identity claims. The claim is usually, "I am user X" or "I am service Y".

3.1 Identity Verification Basics

Authentication depends on evidence. The most common categories are:

Factor	Example	Strengths	Weaknesses
Something you know	Password, PIN	Familiar, cheap	Can be guessed, phished, reused
Something you have	Phone, authenticator app, hardware key	Stronger than passwords alone	Device loss, recovery complexity
Something you are	Fingerprint, Face ID	Convenient on-device UX	Biometric recovery and privacy concerns

Important nuance: many systems do not verify a human's real-world identity. They verify control over a credential. For example:

password login verifies knowledge of a password
email verification verifies access to an inbox
TOTP verifies possession of a device that holds the shared secret seed
passkeys verify possession of a private key and user presence

That is why identity systems often talk about assurance levels rather than absolute truth.

3.2 Identifiers vs Authenticators

Two concepts are often mixed up:

an identifier tells the system which subject is being referenced
an authenticator proves control over that identity

Examples:

alice@example.com is an identifier
the password, passkey, or OAuth login is the authenticator

Production systems often support multiple identifiers for the same user:

email
username
phone number
enterprise SSO subject ID
internal immutable user ID

Best practice: use a stable internal user ID as the true primary key, even if the login identifier changes.

3.3 Credential Storage

This is one of the most common interview topics because it separates surface-level knowledge from real engineering understanding.

Never store plaintext passwords

If a database leak reveals plaintext passwords, the incident is catastrophic. Attackers will also try the same passwords on other services because users reuse credentials.

Store password hashes, not passwords

The flow is:

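User submits a password at signup or password change.
Server generates a unique random salt for that password.
Server applies a slow password hashing algorithm to the password and salt.
Server stores the resulting hash together with the salt and algorithm metadata.
On login, the server recomputes the hash from the submitted password using the stored salt and parameters, then compares.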

Good password hashing algorithms are intentionally expensive. That is the point. They make offline brute force attacks slower.

Algorithm	Typical status	Why it matters
Argon2id	Best modern default	Memory-hard and resistant to GPU attacks
bcrypt	Still common and acceptable	Widely supported, battle-tested
PBKDF2	Common in legacy and regulated systems	Safer than fast hashes, but less ideal than Argon2id
SHA-256 / MD5 alone	Unsafe for password storage	Too fast, easy to brute force

Salt and Pepper

Mechanism	Purpose
Salt	Unique random value per password; prevents rainbow-table reuse
Pepper	Extra secret held outside the user table, often in KMS/HSM; raises attack cost after DB leaks

Practical Storage Pattern

store algorithm name and parameters with the hash
use constant-time comparison to reduce timing leakage
rehash on login when old parameters are outdated
keep password policy reasonable; heavy composition rules often push users toward weaker, more predictable passwords
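
The storage pattern above can be sketched with the Python standard library alone. PBKDF2-HMAC-SHA256 stands in for Argon2id here only because it ships with Python; the iteration count, the `$`-separated record format, and the function names are illustrative assumptions rather than a standard.

```python
import hashlib
import hmac
import secrets

ALGORITHM = "pbkdf2_sha256"
CURRENT_ITERATIONS = 600_000  # assumption: tune to your hardware and latency budget


def hash_password(password: str, iterations: int = CURRENT_ITERATIONS) -> str:
    """Return a self-describing record: algorithm$iterations$salt$hash."""
    salt = secrets.token_bytes(16)  # unique random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"{ALGORITHM}${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> tuple[bool, bool]:
    """Return (is_valid, needs_rehash) for a login attempt."""
    algorithm, iterations_text, salt_hex, digest_hex = stored.split("$")
    if algorithm != ALGORITHM:
        return False, False
    iterations = int(iterations_text)
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), iterations
    )
    # Constant-time comparison reduces timing leakage.
    is_valid = hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
    # Only a successful login can trigger a rehash, because only then
    # does the server hold the plaintext password.
    needs_rehash = is_valid and iterations < CURRENT_ITERATIONS
    return is_valid, needs_rehash
```

When `needs_rehash` is true, the caller would store `hash_password(password)` in place of the old record. A production system would more likely call a maintained Argon2id or bcrypt library and keep the same record-with-parameters idea.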

Interview depth point

If an interviewer asks, "Why use bcrypt or Argon2 instead of SHA-256?", the real answer is not just "because it is more secure". The real answer is:

password databases are often attacked offline after leaks
attackers can compute billions of SHA-256 hashes per second on commodity GPUs
deliberately slow algorithms (memory-hard in the case of Argon2id) make each guess expensive
cost parameters can be tuned as hardware improves

3.4 MFA Basics

Multi-factor authentication exists because passwords are a weak single point of failure.

Common MFA methods:

Method	Security level	Practical notes
SMS OTP	Low to medium	Vulnerable to SIM swap and phishing
Email OTP	Low	Better than nothing, but email is often the same recovery channel
TOTP app	Medium	Common and cheap; still phishable
Push approval	Medium	Good UX, but push fatigue attacks exist
WebAuthn / passkeys / hardware keys	High	Strong phishing resistance

Production systems often use risk-based MFA rather than prompting on every login. Common triggers:

new device
new geography
impossible travel
admin action
payout or billing change
password reset or recovery event

Requiring a stronger factor only when such a trigger fires is called step-up authentication.
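
A step-up decision can be sketched as a small policy function. The field names, the action list, and the 15-minute freshness window below are hypothetical; real risk engines usually combine many more signals into a score instead of using hard rules.

```python
from dataclasses import dataclass

# Hypothetical policy values chosen for illustration only.
SENSITIVE_ACTIONS = frozenset({"change_payout", "reset_mfa", "admin_action"})
MFA_FRESHNESS_SECONDS = 15 * 60


@dataclass
class RequestContext:
    known_device: bool
    country: str
    usual_countries: frozenset
    action: str
    seconds_since_last_mfa: float


def requires_step_up(ctx: RequestContext) -> bool:
    """Decide whether to ask for a stronger factor before continuing."""
    if not ctx.known_device:
        return True  # new device
    if ctx.country not in ctx.usual_countries:
        return True  # new geography
    if ctx.action in SENSITIVE_ACTIONS:
        # Sensitive actions need a recent MFA, not just a logged-in session.
        return ctx.seconds_since_last_mfa > MFA_FRESHNESS_SECONDS
    return False
```

Usage: the login or API layer builds a `RequestContext` per request and redirects to an MFA challenge when the function returns `True`.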

Recovery Matters

Many teams design MFA setup but forget MFA recovery. Good systems provide:

recovery codes
alternate authenticators
carefully controlled support workflows

The recovery flow is often more attackable than the MFA flow itself.

3.5 Email Verification

Email verification usually proves inbox control, not human identity. It exists to:

reduce fake or mistyped accounts
ensure password reset reachability
protect downstream systems from garbage identities
support trust in notifications, billing, and invites

Good implementation details:

generate a random, single-use token
store only a hash of the token server-side if possible
apply a short TTL
invalidate older outstanding verification tokens after a new one is issued
avoid leaking whether the account exists during resend flows
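
The token handling described above can be sketched as follows. The in-memory dictionaries stand in for a database table, and the 30-minute TTL and function names are assumptions made for the example.

```python
import hashlib
import secrets
from typing import Optional

TOKEN_TTL_SECONDS = 30 * 60  # assumption: pick a TTL that suits the product

_tokens = {}  # sha256(token) -> (user_id, expires_at); stand-in for a DB table
_latest = {}  # user_id -> digest of that user's current outstanding token


def _digest(token: str) -> str:
    return hashlib.sha256(token.encode("ascii")).hexdigest()


def issue_verification_token(user_id: str, now: float) -> str:
    """Create a new token and invalidate any older outstanding one."""
    old_digest = _latest.pop(user_id, None)
    if old_digest is not None:
        _tokens.pop(old_digest, None)
    token = secrets.token_urlsafe(32)  # random and unguessable
    digest = _digest(token)
    _tokens[digest] = (user_id, now + TOKEN_TTL_SECONDS)  # only the hash is stored
    _latest[user_id] = digest
    return token  # goes into the emailed link, never into storage


def redeem_verification_token(token: str, now: float) -> Optional[str]:
    """Return the verified user_id, or None if the token is unusable."""
    digest = _digest(token)
    record = _tokens.pop(digest, None)  # pop makes the token single-use
    if record is None:
        return None
    user_id, expires_at = record
    if _latest.get(user_id) == digest:
        del _latest[user_id]
    if now >= expires_at:
        return None
    return user_id
```

A plain SHA-256 hash is acceptable here, unlike for passwords, because the token is long and random and so cannot be brute-forced offline in practice.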

3.6 Device Trust

Device trust tries to answer, "Is this a previously seen, low-risk device?"

Typical signals:

long-lived device cookie
browser fingerprinting or device metadata
last successful MFA on that device
IP reputation and ASN patterns
OS or app attestation on mobile

Device trust is useful, but dangerous if over-trusted. Devices can be compromised. Cookies can be stolen. Browsers change. Treat device trust as a risk signal, not a source of truth.

Authentication Failure Cases

weak password hashing leads to offline cracking after DB leaks
email verification links are reusable or never expire
MFA recovery bypasses stronger checks
account enumeration leaks whether an email exists
social login accounts are linked incorrectly to existing local accounts
device trust becomes an authorization shortcut instead of a risk signal

Authentication Best Practices

prefer Argon2id or bcrypt for passwords
rate-limit login, signup, reset, and verification endpoints
use MFA for privileged users and step-up auth for sensitive actions
log auth events with context, but never log secrets or raw passwords
design credential rotation and recovery before launch, not after an incident

4. Login and Signup

Signup and login flows are the public entry points to your system. They are also some of the most attacked endpoints you will ever run.

4.1 Signup Flow

sequenceDiagram
	actor User
	participant Browser
	participant Auth as Auth API
	participant Risk as Risk / Abuse Service
	participant Users as User DB
	participant Mail as Email Service
	participant Session as Session Store

	User->>Browser: Submit email + password
	Browser->>Auth: POST /signup
	Auth->>Risk: Check IP, velocity, disposable email, device
	Risk-->>Auth: risk score / allow / challenge
	Auth->>Users: Create pending account + password hash
	Auth->>Mail: Send verification link
	Mail-->>User: Verification email
	User->>Browser: Click link
	Browser->>Auth: GET /verify?token=...
	Auth->>Users: Mark email verified
	Auth->>Session: Create session
	Auth-->>Browser: Set secure auth cookie

What actually happens in production

A robust signup flow usually includes:

Input normalization: normalize email casing carefully, trim whitespace, and reject obviously malformed values.
Abuse screening: IP reputation, rate limits, disposable email detection, CAPTCHA when needed, device velocity, and signup bursts by network.
Account creation state: many systems create users in a pending_verification state first.
Email verification: the account may exist but have limited capabilities until verified.
Bootstrap domain objects: for SaaS, create the workspace, tenant, default role, billing state, and onboarding tasks.
Initial session issuance: some systems log the user in immediately after verification. Others require explicit login.

Why pending state matters

If you create fully active accounts before verification, you may end up with:

abandoned fake tenants
spammed invites or API abuse
polluted analytics and billing pipelines

4.2 Login Flow

The login flow is conceptually simpler than signup, but much more operationally sensitive.

Common steps:

Identify account by email/username/federated ID.
Fetch credential metadata and account status.
Verify password or federated assertion.
Evaluate account risk and MFA policy.
Create session or issue tokens.
Log success or failure for audit and anomaly detection.

A production login decision often depends on more than a password:

account locked or disabled?
tenant suspended?
email verified?
MFA enrolled?
device known?
unusual geography?
refresh token family compromised?

4.3 Fraud Prevention

Fraud prevention is not just a payments problem. Identity systems are abused for:

spam account creation
credential stuffing
promo abuse
referral fraud
fake trial creation
scraping and automated signups

Basic but effective controls:

Control	What it helps with
Rate limiting by IP and identifier	Brute force and signup bursts
Device and IP reputation	Known bad networks and bots
CAPTCHA or challenge step-up	Automated abuse at suspicious thresholds
Email domain heuristics	Disposable inboxes, typo domains
Phone verification for high-risk cases	Raises attacker cost
Idempotency keys on signup APIs	Retry safety without duplicate accounts
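
Rate limiting by IP and by identifier can be sketched with a sliding-window counter. This is an in-process illustration with made-up limits; a production deployment would normally keep the counters in a shared store such as Redis so that all app instances see the same state.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `limit` events per key within the last `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of accepted events

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        while hits and hits[0] <= now - self.window:
            hits.popleft()  # drop events that fell out of the window
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


# Hypothetical limits: generous per IP (shared NATs), strict per account.
per_ip = SlidingWindowLimiter(limit=20, window_seconds=60)
per_account = SlidingWindowLimiter(limit=5, window_seconds=60)


def allow_login_attempt(ip: str, identifier: str, now: Optional[float] = None) -> bool:
    ip_ok = per_ip.allow(f"ip:{ip}", now)
    account_ok = per_account.allow(f"id:{identifier.strip().lower()}", now)
    return ip_ok and account_ok
```

Keying on both dimensions matters: a per-IP limit alone misses credential stuffing spread across many addresses, and a per-account limit alone misses one address spraying many accounts.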

Interview point: fraud controls are part of auth architecture because attackers do not politely separate "security" from "growth" endpoints.

4.4 Social Login

"Login with Google" or "Login with GitHub" improves user experience, but introduces federation complexity.

Benefits:

no local password to manage
faster onboarding
higher conversion for some user segments

Risks and edge cases:

provider outage affects sign-in
incorrect account linking can cause account takeover
email from provider may be unverified or not globally unique in the way you assume
enterprise customers may not want personal social identities linked to business workspaces

Best practice for account linking:

if a social identity is new, do not blindly attach it to a local account just because the email matches
require proof of control or signed-in confirmation before linking to an existing account

4.5 Onboarding Architecture

Signup is not just about auth. It often triggers business setup:

create personal or team workspace
assign owner role
seed settings and notification preferences
create billing customer object
publish analytics and onboarding events

This makes signup a distributed workflow. Real systems often handle it with:

synchronous creation for the minimum needed to log in
async events for non-critical setup
idempotent consumers to avoid duplicate workspaces or billing objects

Login and Signup Failure Cases

verification emails delayed or blocked, leaving users in limbo
duplicate accounts created because signup is not idempotent
support team manually verifies accounts in insecure ways
social and password accounts merge incorrectly
signup path leaks which emails already exist

Login and Signup Best Practices

keep the critical path small and reliable
separate abuse checks from core credential logic, but make them part of the final decision
use generic error messages externally and detailed audit logs internally
make signup and login events observable with metrics and tracing

5. Sessions

Sessions are the classic way to keep users logged in across multiple HTTP requests.

5.1 What a Session Really Is

A session means the server has already authenticated the user and keeps that authenticated state on its side, keyed by a session identifier.

Typical flow:

User logs in successfully.
Server creates a session record.
Server sends the client a session ID in a cookie.
Client sends the cookie on future requests.
Server looks up session state and reconstructs identity.

5.2 Server-Side Sessions

In server-side session architecture, the browser usually only stores an opaque identifier.

Example session data:

user ID
tenant ID
auth strength or MFA state
issued time and last activity time
device metadata
CSRF-related state

Advantages:

easy revocation
easy logout across devices
server fully controls state
easy to add security flags or session versioning

Disadvantages:

needs a session store lookup
requires shared state across app instances
harder to scale if poorly designed

5.3 Redis-Backed Sessions

Redis is a very common session backend because it is fast, supports TTL, and works well as shared ephemeral state.

flowchart LR
	Browser[Browser with secure cookie] --> LB[Load Balancer]
	LB --> App1[App Instance A]
	LB --> App2[App Instance B]
	App1 --> Redis[(Redis Session Store)]
	App2 --> Redis
	Redis --> Audit[Audit / Security Events]

Why Redis is popular for sessions:

low-latency reads and writes
TTL expiration built in
simple key-value model
easy fit for horizontally scaled app fleets

Scaling considerations:

shard or cluster if session volume is high
replicate carefully; understand failover and session loss behavior
monitor hot keys and uneven access patterns
decide whether to refresh the TTL on every request (sliding expiration), cap sessions at a fixed lifetime (absolute expiration), or combine both
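
The expiration choice can be made concrete with a small store that enforces both an idle timeout and an absolute lifetime. A dictionary stands in for Redis here, and the timeout values and class names are assumptions for the sketch.

```python
import secrets
from dataclasses import dataclass
from typing import Optional

IDLE_TIMEOUT = 30 * 60            # assumption: expire after 30 idle minutes
ABSOLUTE_LIFETIME = 12 * 60 * 60  # assumption: force re-login after 12 hours


@dataclass
class Session:
    user_id: str
    created_at: float
    last_seen_at: float


class SessionStore:
    def __init__(self):
        self._sessions = {}  # opaque session ID -> Session

    def create(self, user_id: str, now: float) -> str:
        session_id = secrets.token_urlsafe(32)  # opaque and unguessable
        self._sessions[session_id] = Session(user_id, now, now)
        return session_id

    def load(self, session_id: str, now: float) -> Optional[Session]:
        session = self._sessions.get(session_id)
        if session is None:
            return None
        idle_expired = now - session.last_seen_at > IDLE_TIMEOUT
        too_old = now - session.created_at > ABSOLUTE_LIFETIME
        if idle_expired or too_old:
            del self._sessions[session_id]
            return None
        session.last_seen_at = now  # sliding expiration: activity extends the session
        return session

    def delete(self, session_id: str) -> None:
        self._sessions.pop(session_id, None)  # logout or revocation
```

With Redis, the idle timeout usually maps to the key TTL that is refreshed on access, while the absolute lifetime is checked against a stored creation time.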

5.4 Cookie Security

Session security depends heavily on cookie configuration.

Cookie attribute	Why it matters
`HttpOnly`	Prevents JavaScript from reading the cookie, reducing XSS impact
`Secure`	Sends cookie only over HTTPS
`SameSite=Lax/Strict`	Reduces CSRF risk from cross-site requests
Domain scoping	Prevents unintended subdomain sharing
Path scoping	Limits where the cookie is sent
Expiry / Max-Age	Controls session persistence

Important nuance:

HttpOnly helps against token theft by frontend JavaScript
SameSite helps against CSRF
neither one fixes everything if the app has deeper logic flaws
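
As an illustration, the attributes from the table can be set with Python's standard `http.cookies` module. The cookie name `session_id` and the 30-minute lifetime are assumptions; web frameworks expose the same attributes through their own response APIs.

```python
from http.cookies import SimpleCookie


def build_session_cookie(session_id: str, max_age_seconds: int = 1800) -> str:
    """Return the value for a Set-Cookie response header."""
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    morsel = cookie["session_id"]
    morsel["httponly"] = True   # not readable from JavaScript
    morsel["secure"] = True     # only sent over HTTPS
    morsel["samesite"] = "Lax"  # withheld on most cross-site requests
    morsel["path"] = "/"
    morsel["max-age"] = max_age_seconds
    # No Domain attribute: the cookie stays host-only and is not
    # shared with sibling subdomains.
    return morsel.OutputString()
```

Leaving `Domain` unset is a deliberate choice in this sketch, since setting it widens the cookie's reach to subdomains.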

5.5 Session Invalidation

Session invalidation is one reason server-side sessions remain attractive.

You can revoke sessions when:

user logs out
password changes
MFA is reset
admin disables the account
suspicious activity is detected

Common implementation patterns:

delete the session record outright
mark session version or user auth version and reject old versions
keep a device/session list per user for device management UI
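
The version-based pattern can be sketched in a few lines. The two dictionaries are hypothetical stand-ins for a user table column and a session store.

```python
# Stand-ins for persistent storage; names are hypothetical.
auth_version = {}  # user_id -> current auth version
sessions = {}      # session_id -> (user_id, auth version at issuance)


def create_session(session_id: str, user_id: str) -> None:
    """Record which auth version the session was issued under."""
    sessions[session_id] = (user_id, auth_version.setdefault(user_id, 1))


def revoke_all_sessions(user_id: str) -> None:
    """Call on password change, MFA reset, or account disablement."""
    auth_version[user_id] = auth_version.get(user_id, 1) + 1


def is_session_valid(session_id: str) -> bool:
    record = sessions.get(session_id)
    if record is None:
        return False
    user_id, issued_version = record
    # Sessions issued under an older version are rejected without
    # having to find and delete each one.
    return issued_version == auth_version.get(user_id)
```

Bumping one counter invalidates every session for that user at once, which is why this pattern pairs well with a "log out everywhere" feature.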

5.6 Logout Challenges

Logout sounds trivial, but it is easy to implement incompletely.

Problems include:

logout only clears client cookie but leaves server session valid
user has multiple active devices and expects global logout
session persists in mobile apps with long polling or background refresh
cached pages or in-flight requests still complete after logout

Good logout design answers:

single device logout or all devices?
immediate revocation or eventual consistency?
what about concurrent refresh operations?

5.7 Session Security Issues

Problem	Meaning	Mitigation
Session fixation	Attacker forces victim to use known session ID	Regenerate session ID after login
CSRF	Browser auto-sends cookies on forged cross-site requests	`SameSite`, CSRF tokens, origin checks
Session hijacking	Session token is stolen	HTTPS, `HttpOnly`, device/risk checks, short idle timeouts
Store outage	Session backend unavailable	Fallback behavior, multi-AZ design, graceful degradation

Sessions in Interviews

A good interview answer on sessions usually includes:

opaque session ID in secure cookie
shared store like Redis
session regeneration after login
revocation and logout semantics
CSRF protections
sliding vs absolute expiration tradeoff

6. JWT and Token-Based Authentication

JWTs are one of the most discussed and most misunderstood identity topics.

6.1 What a JWT Is

JWT stands for JSON Web Token. It is a compact, self-contained token format commonly used to carry claims.

A JWT typically has three parts:

header.payload.signature

header: algorithm and metadata
payload: claims such as subject, issuer, audience, expiry
signature: proves integrity if signed correctly

Important practical truth: signed JWTs are not secret by default. The header and payload are only base64url-encoded, not encrypted, so anyone holding the token can read the claims.
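
To make this concrete, the sketch below builds an HS256-signed token by hand and then reads its claims back without the key. It uses only the standard library for illustration; the claim values and the key are made up, and real services would normally use a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    # Restore the padding that the JWT encoding strips.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(claims: dict, key: bytes) -> str:
    """Produce header.payload.signature using HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [
        b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    ]
    signing_input = ".".join(parts)
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url_encode(signature)


def read_claims_without_key(token: str) -> dict:
    """Decode the payload. No key is needed, so nothing here is secret."""
    _header, payload, _signature = token.split(".")
    return json.loads(b64url_decode(payload))
```

Reading claims this way proves nothing about who issued the token; it only shows that the payload is visible to every party that handles it.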

6.2 Signing vs Encryption

Mechanism	What it guarantees	Practical meaning
Signing (JWS)	Integrity and authenticity	Token was issued by trusted signer and not modified
Encryption (JWE)	Confidentiality	Token contents are hidden from intermediaries/clients

Most production JWT usage is signed, not encrypted.

That means:

do not put secrets in JWT payloads
do not put more PII than necessary
use claims for identity and authorization hints, not as a dumping ground

6.3 Access Tokens vs Refresh Tokens

Token type	Lifetime	Used by	Main purpose
Access token	Short-lived	APIs	Authorize a request
Refresh token	Longer-lived	Auth client / backend	Obtain new access tokens

Best practice:

keep access tokens short-lived
treat refresh tokens as highly sensitive credentials
store refresh tokens more carefully than access tokens

6.4 Why Teams Use JWTs

Benefits:

easy for distributed services to verify locally
no session store lookup on every request if verification is local
good fit for API ecosystems and delegated access
works well across domains and service boundaries

Costs:

revocation is harder
permissions embedded in tokens can become stale
key rotation and issuer validation must be done correctly
token size can grow dangerously if you stuff too many claims inside

6.5 Token Rotation

Refresh token rotation is a major real-world security mechanism.

Idea:

every refresh use invalidates the previous refresh token
the auth server issues a new refresh token and new access token
if an old refresh token is reused, the server assumes theft and can revoke the token family

sequenceDiagram
	actor User
	participant Client
	participant Auth as Auth Server
	participant Store as Token Store

	User->>Client: Continue using app
	Client->>Auth: POST /token/refresh with refresh token
	Auth->>Store: Validate token family and prior use
	Store-->>Auth: valid / reused / revoked
	Auth->>Store: Mark old token used, persist new token state
	Auth-->>Client: New access token + new refresh token
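
Rotation with reuse detection can be sketched like this. The in-memory store, the family ID format, and the method names are assumptions; access token issuance is left out to keep the focus on the refresh token family.

```python
import hashlib
import secrets
from dataclasses import dataclass
from typing import Optional


def _digest(token: str) -> str:
    return hashlib.sha256(token.encode("ascii")).hexdigest()


@dataclass
class RefreshRecord:
    family_id: str
    user_id: str
    used: bool = False


class RefreshTokenStore:
    def __init__(self):
        self._records = {}              # sha256(token) -> RefreshRecord
        self._revoked_families = set()

    def _issue(self, family_id: str, user_id: str) -> str:
        token = secrets.token_urlsafe(32)
        self._records[_digest(token)] = RefreshRecord(family_id, user_id)
        return token

    def start_family(self, user_id: str) -> str:
        """Called at login: the first refresh token of a new family."""
        return self._issue(secrets.token_hex(8), user_id)

    def rotate(self, presented: str) -> Optional[str]:
        """Exchange a refresh token for a new one, or return None."""
        record = self._records.get(_digest(presented))
        if record is None or record.family_id in self._revoked_families:
            return None
        if record.used:
            # A used token came back: assume theft and kill the whole family.
            self._revoked_families.add(record.family_id)
            return None
        record.used = True
        return self._issue(record.family_id, record.user_id)
```

Revoking the whole family forces both the attacker and the legitimate client back to a full login, which is the intended outcome when the server cannot tell them apart.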

6.6 Revocation Challenges

Revocation is the biggest practical downside of stateless tokens.

If an access token is self-contained and valid until exp, then after it is issued:

the user may be disabled
permissions may change
a tenant may be suspended
the token may be stolen

But the token may still verify cryptographically.

Mitigations:

short access token TTLs
refresh token rotation
revocation list or denylist for critical cases
user/session version claim checked against server state
opaque tokens with introspection for high-control environments

6.7 Stateless Auth Tradeoffs

This is a favorite interview question: "Should I use JWT or sessions?"

The mature answer is not dogmatic. It depends.

Topic	Server-side sessions	JWT
Request-time state lookup	Usually yes	Not always
Easy revocation	Yes	Harder
Cross-service portability	Moderate	Strong
Simplicity for web apps	Often simpler	Often overused
Risk of stale claims	Lower	Higher
CSRF concern if cookie-based	Yes	Yes if stored in cookies
XSS risk if JS-accessible storage	Lower with `HttpOnly` cookies	Higher if stored in localStorage

A practical rule:

for traditional web apps, server-side sessions are often simpler and safer
for API ecosystems, third-party integrations, and distributed service verification, tokens are often the better fit

6.8 Common JWT Mistakes

storing JWTs in localStorage without carefully thinking through XSS risk
placing roles and permissions in long-lived tokens and forgetting they go stale
not validating iss, aud, exp, nbf, the signing algorithm (alg), and the key identifier (kid) properly
using symmetric signing keys everywhere and spreading them across many services
putting secrets or excessive PII in token payloads

JWT Best Practices

prefer asymmetric signing for shared verification environments
expose public keys via a JWKS endpoint if multiple verifiers exist
keep access tokens short-lived
rotate signing keys safely and support key overlap during rotation
use opaque tokens or introspection if real-time revocation is a hard requirement
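
The validation steps behind several of these points can be sketched as below. HS256 is used only because it needs nothing beyond the standard library; the guide's advice to prefer asymmetric signing still applies. The function names, the 30-second leeway, and the example issuer are assumptions.

```python
import base64
import hashlib
import hmac
import json
import time


class InvalidToken(Exception):
    """Raised when a token fails any validation step."""


def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(claims: dict, key: bytes) -> str:
    """Helper for the example: issue an HS256 token."""
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [b64url_encode(json.dumps(p, separators=(",", ":")).encode("utf-8"))
             for p in (header, claims)]
    signing_input = ".".join(parts)
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url_encode(signature)


def verify_hs256(token, key, issuer, audience, now=None, leeway=30.0) -> dict:
    """Return the claims if every check passes, else raise InvalidToken."""
    now = time.time() if now is None else now
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(b64url_decode(header_b64))
        claims = json.loads(b64url_decode(payload_b64))
        signature = b64url_decode(signature_b64)
        signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    except ValueError as exc:
        raise InvalidToken("malformed token") from exc
    # Pin the algorithm instead of trusting whatever the header claims.
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise InvalidToken("unexpected algorithm")
    expected = hmac.new(key, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        raise InvalidToken("bad signature")
    if not isinstance(claims, dict) or claims.get("iss") != issuer:
        raise InvalidToken("wrong issuer")
    aud = claims.get("aud")
    if audience not in (aud if isinstance(aud, list) else [aud]):
        raise InvalidToken("wrong audience")
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or now >= exp + leeway:
        raise InvalidToken("expired or missing exp")
    nbf = claims.get("nbf")
    if isinstance(nbf, (int, float)) and now + leeway < nbf:
        raise InvalidToken("not yet valid")
    return claims
```

Claims are only returned after the signature and every claim check succeed, so callers never act on unverified data.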

7. OAuth

OAuth solves delegated authorization. It lets one application access another application's resources on behalf of a user without receiving the user's password.

7.1 The Problem OAuth Solves

Without OAuth, a user might give App A their password for App B. That is unacceptable because:

App A can now do anything the user can do
App B cannot scope access cleanly
the user cannot revoke just that delegated access safely

OAuth introduces a safer model:

user authenticates with the authorization server / IdP
user consents to limited scopes
client receives tokens with bounded permissions

7.2 Authorization Code Flow with PKCE

This is the modern default for public clients such as browser-based and mobile apps.

sequenceDiagram
	actor User
	participant Client as SaaS App
	participant Browser
	participant AS as Authorization Server / IdP
	participant API as Third-Party API

	User->>Client: Click "Connect Google Drive"
	Client->>Browser: Redirect to /authorize + scope + code_challenge
	Browser->>AS: Login and grant consent
	AS-->>Browser: Redirect back with authorization code
	Browser->>Client: Deliver authorization code
	Client->>AS: Exchange code + code_verifier
	AS-->>Client: Access token (+ refresh token)
	Client->>API: Call API with access token
	API-->>Client: Protected resource data

Why PKCE exists

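PKCE binds the authorization code to the client instance that started the flow. The client sends a code_challenge with the authorization request, and the token endpoint only redeems the code if the caller presents the matching code_verifier, so a stolen authorization code is useless on its own. It is critical for public clients such as SPAs and mobile apps, which cannot keep a client secret.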

7.3 Scopes

Scopes define the breadth of access. Examples:

read:user
repo:write
payments:refunds
calendar.readonly

Good scope design is product design plus security design.

If scopes are too broad:

users lose trust
integrations become over-privileged
incident blast radius increases

If scopes are too granular:

consent screens become confusing
implementation complexity rises
developers ask for full access anyway
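
On the resource server, scope enforcement often reduces to a set comparison. The route table below is hypothetical and reuses two of the example scopes; OAuth 2.0 transmits scopes as a space-delimited string.

```python
# Hypothetical mapping from API operations to the scopes they require.
REQUIRED_SCOPES = {
    ("GET", "/user"): {"read:user"},
    ("POST", "/refunds"): {"payments:refunds"},
}


def parse_scopes(scope_string: str) -> frozenset:
    """Split the space-delimited scope string carried with the token."""
    return frozenset(scope_string.split())


def is_request_allowed(method: str, path: str, granted_scope_string: str) -> bool:
    required = REQUIRED_SCOPES.get((method, path))
    if required is None:
        return False  # deny by default when no requirement is declared
    return required <= parse_scopes(granted_scope_string)
```

Denying undeclared routes by default keeps a newly added endpoint from being silently reachable with any token.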

7.4 Consent

Consent is the user-visible manifestation of delegated access.

Good consent screens answer:

who is requesting access?
to which data or actions?
for how long?
can the user revoke later?

This matters a lot in SaaS ecosystems like Google Workspace, GitHub Apps, or Slack apps.

7.5 Refresh Tokens in OAuth

Long-running integrations often need refresh tokens so they can keep calling APIs without asking the user to re-consent constantly.

Refresh token concerns:

high-value credential theft risk
need for rotation and revocation
tenant admins may want centralized revocation controls

7.6 Third-Party Integrations

In real SaaS systems, OAuth is often used for:

connecting Google Drive, GitHub, Slack, Salesforce, Stripe, or Dropbox
importing or exporting data
posting to external systems on behalf of the user or workspace

Architectural consequences:

store provider account linkage metadata
encrypt or otherwise protect provider refresh tokens
model scopes per installation or workspace
surface admin controls for revocation and reauthorization

7.7 OAuth vs Authentication

OAuth is about authorization. Authentication is not the original purpose of OAuth.

However, many products use OAuth plus an identity layer such as OpenID Connect to support "Sign in with Google".

Interview nuance: saying "OAuth is login" is incomplete. Better answer:

OAuth is delegated authorization
OIDC adds identity information for authentication use cases

OAuth Failure Cases

client stores provider tokens insecurely
redirect URI validation is weak
state parameter not used correctly, enabling CSRF-like attacks in auth flows
scopes are excessively broad
tenants cannot audit or revoke third-party access easily

8. SSO: SAML and OIDC

Enterprise customers often do not want each SaaS app to manage a separate corporate password. They want central identity, central policy, and controlled employee access. That is where SSO comes in.

8.1 Identity Provider vs Service Provider

Role	Meaning
Identity Provider (IdP)	System that authenticates the employee, such as Okta, Microsoft Entra ID (formerly Azure AD), Google Workspace
Service Provider (SP) / Relying Party (RP)	The SaaS application that trusts the IdP

8.2 SAML Basics

SAML is older, XML-based, and still heavily used in enterprise environments.

Mental model:

user tries to access the SaaS app
SaaS redirects user to corporate IdP
IdP authenticates user
IdP sends signed assertion back to SaaS
SaaS creates a local session

Strengths:

entrenched in enterprise IT
widely supported by corporate identity systems

Costs:

XML complexity
harder developer ergonomics
trickier debugging and implementation compared with OIDC

8.3 OIDC Basics

OpenID Connect is an identity layer on top of OAuth 2.0.

It provides:

ID tokens with identity claims
standardized login flows
better fit for modern web and mobile apps

OIDC is usually easier to work with than SAML for modern applications.

8.4 SAML vs OIDC

Topic	SAML	OIDC
Typical format	XML assertions	JSON tokens
Common use case	Enterprise browser SSO	Modern app login and API ecosystems
Developer ergonomics	Heavier	Easier
Mobile/API friendliness	Weaker	Stronger

8.5 Enterprise Architecture

flowchart LR
	Employee[Employee] --> SaaS[Your SaaS App]
	SaaS --> IdP[Enterprise IdP]
	IdP --> SaaS
	IdP --> Directory[Corporate Directory]
	IdP --> SCIM[Provisioning / SCIM]
	SCIM --> SaaS
	SaaS --> Policy[Workspace Roles and Policies]

In production, enterprise identity usually includes two separate concerns:

authentication and SSO
lifecycle management and provisioning

Provisioning is often handled with SCIM or similar directory sync mechanisms so the SaaS app knows:

who exists
which groups they belong to
who has been deprovisioned

8.6 Common Enterprise Requirements

just-in-time user creation on first login
domain verification to prove company ownership
group-to-role mapping
forced MFA at IdP level
admin-controlled session duration
audit logs for all SSO events

8.7 Failure Cases

bad mapping from IdP groups to app roles causes privilege escalation
employee is disabled in IdP but app keeps old sessions alive too long
email is used as unique identity key and later changes
multiple IdPs or merged companies create ambiguous identity mapping

SSO Best Practices

use stable external subject identifiers, not just email
model tenant-specific SSO config cleanly
separate authentication trust from authorization mapping inside the app
deprovision aggressively and revoke old sessions when identity status changes
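
A minimal sketch of just-in-time provisioning that follows the first and third practices: users are keyed on the tenant plus the IdP's stable subject identifier, and roles are derived inside the app from IdP groups. The group names, role names, and classes are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical per-tenant configuration mapping IdP groups to app roles.
GROUP_TO_ROLE = {"eng-admins": "admin", "engineering": "member"}
DEFAULT_ROLE = "viewer"


@dataclass
class LocalUser:
    user_id: int
    email: str
    roles: set


class Directory:
    def __init__(self):
        self._by_subject = {}  # (tenant_id, idp_subject) -> LocalUser
        self._next_id = 1

    def login_via_sso(self, tenant_id: str, idp_subject: str, email: str, groups: list) -> LocalUser:
        """Create the user on first login, otherwise refresh mutable attributes."""
        roles = {GROUP_TO_ROLE[g] for g in groups if g in GROUP_TO_ROLE} or {DEFAULT_ROLE}
        key = (tenant_id, idp_subject)  # stable even if the email changes
        user = self._by_subject.get(key)
        if user is None:
            user = LocalUser(self._next_id, email, roles)
            self._next_id += 1
            self._by_subject[key] = user
        else:
            user.email = email   # email is an attribute, not the identity key
            user.roles = roles   # re-derive roles on every login
        return user
```

Re-deriving roles on each login means a user removed from an IdP group loses the matching role the next time they sign in, although existing sessions still need separate revocation.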

9. Password Reset

Password reset is a high-risk recovery flow. Attackers love it because it often bypasses normal login defenses.

9.1 Secure Token Flow

sequenceDiagram
	actor User
	participant App
	participant Auth as Auth Service
	participant Users as User DB
	participant Reset as Reset Token Store
	participant Mail as Email Service

	User->>App: Click "Forgot password"
	App->>Auth: POST /password-reset
	Auth->>Users: Lookup account
	Auth->>Reset: Store hashed single-use token + expiry
	Auth->>Mail: Send password reset link
	Auth-->>User: Generic success response
	User->>App: Open reset link
	App->>Auth: POST /password-reset/confirm token + new password
	Auth->>Reset: Validate token unused and unexpired
	Auth->>Users: Update password hash
	Auth->>Reset: Mark token used
	Auth-->>App: Success + revoke other sessions

9.2 Why This Design Exists

Password reset has to be secure even if the attacker knows the user's email address. Therefore the reset token must be:

hard to guess
short-lived
single-use
revocable

Good systems also revoke active sessions or require re-authentication after password reset.
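
The four token properties can be sketched with an in-memory store. This is an illustration under assumptions, not a production design: `ResetTokenStore`, its method names, and the 30-minute TTL are hypothetical, and a real store would live in a database with the same semantics.

```python
import hashlib
import secrets
import time
from typing import Dict, Optional

RESET_TTL_SECONDS = 30 * 60  # assumed TTL; tune to product sensitivity


class ResetTokenStore:
    """In-memory stand-in for a reset token table."""

    def __init__(self) -> None:
        # token hash -> {"user_id", "expires_at", "used"}
        self._rows: Dict[str, dict] = {}

    @staticmethod
    def _hash(token: str) -> str:
        # Only the hash is stored, so a leaked table cannot be replayed.
        return hashlib.sha256(token.encode("utf-8")).hexdigest()

    def issue(self, user_id: str, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        # Revocable: a new request invalidates older outstanding tokens.
        for row in self._rows.values():
            if row["user_id"] == user_id:
                row["used"] = True
        token = secrets.token_urlsafe(32)  # hard to guess: 32 random bytes
        self._rows[self._hash(token)] = {
            "user_id": user_id,
            "expires_at": now + RESET_TTL_SECONDS,  # short-lived
            "used": False,
        }
        return token  # sent to the user by email, never stored raw

    def redeem(self, token: str, now: Optional[float] = None) -> Optional[str]:
        """Return the user id if the token is valid, else None."""
        now = time.time() if now is None else now
        row = self._rows.get(self._hash(token))
        if row is None or row["used"] or now > row["expires_at"]:
            return None
        row["used"] = True  # single-use
        return row["user_id"]
```

After a successful `redeem`, the caller would update the password hash and revoke the user's other sessions.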

9.3 Attack Prevention

Threat	Mitigation
Account enumeration	Return generic responses like "If an account exists, email sent"
Token guessing	Long random tokens, rate limits
Token replay	Single-use storage and invalidation
Email inbox compromise	Step-up verification for high-value actions after reset
Old session persistence	Revoke sessions after reset

9.4 Practical Advice

prefer opaque reset tokens over stuffing reset state into a long-lived JWT
hash reset tokens at rest if you store them server-side
keep TTL short, often 15 to 60 minutes depending on product sensitivity
notify users when a reset is requested and completed

Password Reset Failure Cases

reset token is reusable
old sessions remain active after password change
reset endpoint leaks whether account exists
support team bypasses the secure flow with weak manual procedures

10. Authorization Fundamentals

Authentication answers who the subject is. Authorization answers what that subject may do.

10.1 AuthN vs AuthZ

This distinction matters a lot.

Question	Category
"Who are you?"	Authentication
"Are you allowed to do this?"	Authorization
"How sure are we?"	Authentication strength / assurance
"Why was access denied?"	Authorization decision and audit

A user can be perfectly authenticated and still not be authorized.

10.2 Authorization Decision Shape

Every authZ decision is some variation of:

Can subject S perform action A on resource R under context C?

Where context may include:

tenant
time of day
network zone
device trust level
MFA level
resource ownership
subscription plan
legal region or data residency constraints
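
The decision shape maps directly onto a small data structure. The sketch below is illustrative only: `AuthzRequest`, `is_allowed`, the action names, and the MFA level threshold are assumptions chosen to show how subject, action, resource, and context each feed the decision.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class AuthzRequest:
    subject: str                      # S: who is asking
    action: str                       # A: what they want to do
    resource: Dict[str, Any]          # R: attributes of the target
    context: Dict[str, Any] = field(default_factory=dict)  # C


def is_allowed(req: AuthzRequest) -> bool:
    # Tenant isolation comes first: no tenant, or a mismatch, is a deny.
    tenant = req.context.get("tenant")
    if tenant is None or tenant != req.resource.get("tenant"):
        return False

    if req.action == "document.read":
        return True

    if req.action == "document.delete":
        # Hypothetical rule: only the owner, and only with strong MFA.
        is_owner = req.resource.get("owner") == req.subject
        strong_mfa = req.context.get("mfa_level", 0) >= 2
        return is_owner and strong_mfa

    return False  # deny by default for unknown actions
```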

10.3 Enforcement Layers

Authorization can happen at multiple layers:

Layer	Good for	Risk if overused
API gateway	Coarse access checks, authentication, token validation	Too coarse for resource-specific rules
Service layer	Business-specific rules	Easy to duplicate logic across services
Data access layer	Row/tenant isolation, final enforcement	Hard to express all product rules here
Database native policies	Strong last line of defense in some systems	App logic can still drift if not modeled carefully

A common mistake is enforcing authorization only at the edge. Edge checks are useful, but most real product rules depend on resource-specific business logic deeper inside the system.

10.4 Policy Design

Good policy design balances three things:

expressiveness
debuggability
operational simplicity

Ask these questions:

who is the subject?
what resource is being accessed?
what action is requested?
what context matters?
who can change the policy?
how do we explain and audit the decision?

10.5 Auditing and Explainability

Authorization is not just about allow or deny. In production you often need:

reason codes
which policy matched
who granted the permission
when the permission changed
evidence for support, compliance, and incident response

This is why mature systems treat authorization as both a runtime path and a data model.
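
One way to make decisions explainable is to return a structured result instead of a bare boolean. In this sketch the `Decision` fields, the reason codes, the policy id `refund-limit-v1`, the refund limit, and the in-memory `AUDIT_LOG` are all hypothetical stand-ins for a real policy and audit pipeline.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason_code: str           # machine-readable explanation
    policy_id: Optional[str]   # which policy matched, if any


AUDIT_LOG: List[str] = []  # stand-in for an audit event pipeline


def decide_invoice_refund(subject_roles: Set[str], amount_cents: int) -> Decision:
    if "billing_admin" not in subject_roles:
        return Decision(False, "MISSING_ROLE", None)
    if amount_cents > 50_000:  # hypothetical limit
        return Decision(False, "AMOUNT_OVER_LIMIT", "refund-limit-v1")
    return Decision(True, "ROLE_AND_LIMIT_OK", "refund-limit-v1")


def record_decision(
    subject: str,
    action: str,
    resource_id: str,
    decision: Decision,
    now: Optional[float] = None,
) -> dict:
    """Append one audit event per decision, allowed or denied."""
    event = {
        "ts": time.time() if now is None else now,
        "subject": subject,
        "action": action,
        "resource_id": resource_id,
        **asdict(decision),
    }
    AUDIT_LOG.append(json.dumps(event, sort_keys=True))
    return event
```

Recording denies as well as allows is what lets support answer "why was access denied?" without reproducing the request.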

11. RBAC

RBAC stands for role-based access control. Permissions are grouped into roles, and subjects are assigned roles.

11.1 Why RBAC Exists

Without roles, you would assign individual permissions to every user. That becomes unmanageable quickly.

RBAC simplifies administration:

viewer
editor
admin
billing_admin

Instead of attaching dozens of permissions directly to users, you attach permissions to roles and roles to users.

11.2 Basic Model

Entity	Example
Permission	`invoice.read`, `invoice.refund`, `workspace.invite`
Role	`support_agent`, `workspace_admin`
Assignment	User U has role R in tenant T

Tenant scoping is critical. In multi-tenant SaaS, a user is rarely just "an admin" globally. They are usually an admin in a specific workspace or organization.
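
A minimal sketch of this model, using the permission and role names from the table. The in-memory dictionaries and the `has_permission` helper are assumptions for illustration; in practice these would be database tables.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Role -> permissions
ROLE_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "support_agent": frozenset({"invoice.read"}),
    "workspace_admin": frozenset(
        {"invoice.read", "invoice.refund", "workspace.invite"}
    ),
}

# Assignment: (user, tenant) -> roles. The tenant is part of the key,
# so a role in one tenant grants nothing in another.
ASSIGNMENTS: Dict[Tuple[str, str], Set[str]] = {
    ("alice", "tenant-a"): {"workspace_admin"},
    ("alice", "tenant-b"): {"support_agent"},
}


def has_permission(user_id: str, tenant_id: str, permission: str) -> bool:
    roles = ASSIGNMENTS.get((user_id, tenant_id), set())
    return any(
        permission in ROLE_PERMISSIONS.get(role, frozenset())
        for role in roles
    )
```

Here `alice` can refund invoices in `tenant-a` but only read them in `tenant-b`, which is the behavior a global "admin" flag cannot express.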

11.3 Enterprise Patterns

Common enterprise RBAC patterns include:

global roles for platform staff
tenant-scoped roles for customers
custom roles for larger organizations
group-to-role mapping from SSO IdP groups

11.4 The Role Explosion Problem

RBAC starts simple but can degrade into dozens or hundreds of roles:

viewer
viewer_plus_export
viewer_plus_export_plus_billing
regional_admin_eu
regional_admin_us

This is role explosion.

It happens when RBAC is forced to encode too many contextual conditions that really belong in attributes or policies.

11.5 RBAC Tradeoffs

Strength	Weakness
Easy to explain to users and admins	Coarse-grained for complex cases
Efficient at runtime	Can explode in number of roles
Works well for common SaaS admin patterns	Poor fit for dynamic context-heavy rules

RBAC Best Practices

keep the base role set small
scope roles by tenant, project, or resource container
separate platform/internal staff roles from customer roles
use RBAC for broad permissions and combine with finer policies when needed

12. ABAC

ABAC stands for attribute-based access control. Instead of only asking "What role does this user have?", ABAC asks about attributes of the subject, resource, and environment.

12.1 Why ABAC Exists

RBAC is often too static for real-world decisions like:

support agent can view tickets only in their assigned region
manager can approve expenses under a threshold for their own department
user can access data only from a compliant device in an approved country
payout release requires recent MFA and a risk score below a threshold

These rules depend on context, not just role labels.

12.2 Dynamic Policy Evaluation

ABAC decisions may use attributes such as:

subject department
resource owner
tenant subscription tier
request IP or network zone
device trust score
current time or shift window
MFA strength

Example policy idea:

"Allow refund approval if the subject role is finance_manager, the order belongs to the same merchant account, the refund amount is below the subject limit, and MFA was performed in the last 10 minutes."
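
The example policy translates almost word for word into code. The `Subject` and `Refund` types and their field names are assumptions made for this sketch; the four conditions come straight from the policy sentence.

```python
from dataclasses import dataclass

MFA_MAX_AGE_SECONDS = 10 * 60  # "MFA was performed in the last 10 minutes"


@dataclass(frozen=True)
class Subject:
    role: str
    merchant_account: str
    refund_limit_cents: int
    last_mfa_at: float  # Unix timestamp of the last MFA challenge


@dataclass(frozen=True)
class Refund:
    merchant_account: str
    amount_cents: int


def may_approve_refund(subject: Subject, refund: Refund, now: float) -> bool:
    return (
        subject.role == "finance_manager"
        and refund.merchant_account == subject.merchant_account
        and refund.amount_cents < subject.refund_limit_cents
        # MFA must be recent and not dated in the future.
        and 0 <= now - subject.last_mfa_at <= MFA_MAX_AGE_SECONDS
    )
```

Every input is an attribute that some system has to supply accurately, which is why stale or inconsistent attribute sources are the main operational risk of ABAC.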

12.3 Policy Engines

ABAC often benefits from a dedicated policy engine because hardcoding many dynamic rules directly into services becomes brittle.

Common approaches:

custom rules in application code
centralized policy engine such as OPA/Rego
cloud-style policy languages such as Cedar
relationship and graph-based systems for object access patterns

12.4 ABAC Tradeoffs

Strength	Weakness
Expressive and context-aware	Harder to explain and debug
Reduces role explosion	Requires clean attribute sources
Good for fine-grained enterprise control	Runtime evaluation can be more expensive

12.5 Practical Use

Many production systems do not choose "RBAC or ABAC". They combine them:

RBAC gives the broad lane
ABAC applies contextual restrictions inside that lane

Example:

role says user may edit invoices
ABAC rule says only for their tenant, below approval threshold, and only after MFA for high-value invoices

ABAC Failure Cases

attributes are stale or inconsistently sourced across services
policies become unreadable and impossible to reason about
caching hides recent attribute changes like department moves or suspensions

13. Permissions and Access Control

Permissions are the actual capabilities a subject has. Access control is the mechanism that enforces those permissions correctly and consistently.

13.1 Permission Models

Common permission models include:

Model	Mental model	Example
RBAC	Roles map to permissions	Workspace admin
ACL	Resource has a list of allowed subjects	Shared document editable by Alice and Bob
ABAC	Decision based on attributes	Region and MFA aware access
ReBAC	Decision based on relationships	User is member of team that owns repo
Capability/token-based	Possession of unforgeable capability grants access	Signed download URL

In modern systems, multiple models often coexist.
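
The capability row is the least familiar model, so here is a sketch of a signed download URL using an HMAC. The query parameter names, the `sign_url` and `verify_url` helpers, and the hard-coded demo key are assumptions; a real key would come from a secrets manager or KMS.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET_KEY = b"demo-only-secret"  # assumption: never hard-code real keys


def _signature(path: str, expires: int) -> str:
    message = f"{path}\n{expires}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def sign_url(path: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    now = time.time() if now is None else now
    expires = int(now) + ttl_seconds
    query = urlencode({"expires": expires, "sig": _signature(path, expires)})
    return f"{path}?{query}"


def verify_url(url: str, now: Optional[float] = None) -> bool:
    """Whoever holds a valid, unexpired URL is granted access."""
    now = time.time() if now is None else now
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, IndexError, ValueError):
        return False
    if now > expires:
        return False
    expected = _signature(parts.path, expires)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8"))
```

Note that no user lookup happens in `verify_url`: possession of the URL is the permission, so the expiry should be short and the URL treated as a secret.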

GitHub is a good mental example:

org and team membership look like RBAC/ReBAC
repo-specific collaborator lists look like ACLs
fine-grained product actions are individual permissions

13.2 Inheritance

Permissions often inherit down a hierarchy:

org -> workspace -> project -> resource
folder -> document
account -> sub-account

Inheritance is useful, but easy to get wrong.

Questions to design explicitly:

do child resources inherit all parent permissions?
can child permissions override parent permissions?
are denies supported, and if so do they take precedence?
how do you compute effective permissions efficiently?
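
One possible set of answers to these questions is: children inherit from every ancestor, and an explicit deny anywhere in the chain wins. The sketch below implements exactly that choice and nothing more. The resource names, the `PARENT` and `GRANTS` tables, and the deny-wins rule are assumptions for illustration, and other products choose differently.

```python
from typing import Dict, List, Optional, Tuple

# child -> parent (None marks the root). Assumed to contain no cycles.
PARENT: Dict[str, Optional[str]] = {
    "org:acme": None,
    "project:web": "org:acme",
    "doc:roadmap": "project:web",
}

# (subject, resource, permission) -> "allow" or "deny"
GRANTS: Dict[Tuple[str, str, str], str] = {
    ("alice", "org:acme", "read"): "allow",
    ("alice", "doc:roadmap", "read"): "deny",
    ("bob", "project:web", "read"): "allow",
}


def ancestors(resource: str) -> List[str]:
    """Return the resource followed by its ancestors up to the root."""
    chain: List[str] = []
    current: Optional[str] = resource
    while current is not None:
        chain.append(current)
        current = PARENT.get(current)
    return chain


def effective(subject: str, resource: str, permission: str) -> bool:
    effects = {
        GRANTS.get((subject, node, permission))
        for node in ancestors(resource)
    }
    if "deny" in effects:
        return False  # an explicit deny takes precedence over any allow
    return "allow" in effects  # no grant at all means no access
```

Walking the chain on every check is simple but costs one lookup per level; systems with deep hierarchies often precompute or cache effective permissions instead.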

13.3 Auditing

You need to know:

who granted access
when access changed
who accessed a resource
why a decision was allowed or denied

Auditing matters for:

customer support
security investigations
compliance
admin trust

13.4 Enforcement Patterns

There are two common implementation patterns.

Pattern A: Embedded authorization in each service

Pros:

low latency
business context close to the resource

Cons:

duplicated rules across services
inconsistent decisions and audit semantics

Pattern B: Centralized authorization service or policy engine

Pros:

consistency
shared policy language
central auditing and explainability

Cons:

added network hop
dependency on a central service
need good caching and fallback behavior

13.5 Centralized Auth Service and Policy Caching

flowchart LR
	Request[Authenticated Request] --> Service[Business Service]
	Service --> Cache[Policy Cache]
	Cache -->|cache miss| Authz[Central Authorization Service]
	Authz --> PDP[Policy Engine]
	Authz --> Attrs[Attribute / Relationship Data]
	PDP --> Decision[Allow / Deny + Reason]
	Decision --> Audit[Audit Log]
	Decision --> Service

Caching is often necessary, but introduces staleness risks. Common techniques:

short TTL caches for policy decisions
versioned policy snapshots
event-driven invalidation on membership or role changes
cache only stable intermediate data, not final decisions, in high-risk systems
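
The first and third techniques, a short TTL plus event-driven invalidation, can be combined in one small cache. `DecisionCache` and its method names are hypothetical, and this in-process sketch ignores eviction limits and thread safety.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[str, str, str]  # (subject, action, resource)


class DecisionCache:
    def __init__(
        self,
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Key, Tuple[bool, float]] = {}

    def get(self, key: Key) -> Optional[bool]:
        """Return the cached decision, or None on a miss or expiry."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        allowed, stored_at = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[key]
            return None
        return allowed

    def put(self, key: Key, allowed: bool) -> None:
        self._entries[key] = (allowed, self._clock())

    def invalidate_subject(self, subject: str) -> int:
        """Drop every decision for a subject, e.g. on a role change event."""
        stale = [key for key in self._entries if key[0] == subject]
        for key in stale:
            del self._entries[key]
        return len(stale)
```

The TTL bounds how long a missed invalidation event can keep a stale decision alive, so the two mechanisms cover each other's failure modes.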

13.6 Policy Caching Tradeoffs

Benefit	Cost
Lower latency	Stale authorization decisions
Lower policy service load	Harder revocation semantics
Better resilience during partial outage	Risk of fail-open or fail-stale behavior

Design question: when the policy service is down, do you fail closed or fail open?

fail closed is safer but can hurt availability
fail open preserves availability but may violate security

For high-risk actions, fail closed is usually the right answer.
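
A sketch of making that choice per action rather than globally. The action names in `HIGH_RISK_ACTIONS`, the `PolicyServiceUnavailable` exception, and the idea of passing in a last known decision are assumptions for illustration.

```python
from typing import Callable, Optional

HIGH_RISK_ACTIONS = {"payout.release", "role.grant"}  # hypothetical list


class PolicyServiceUnavailable(Exception):
    """Raised by the policy client when the service cannot be reached."""


def authorize(
    action: str,
    check: Callable[[], bool],
    last_known: Optional[bool],
) -> bool:
    """Ask the policy service; on outage, degrade according to risk."""
    try:
        return check()
    except PolicyServiceUnavailable:
        if action in HIGH_RISK_ACTIONS:
            return False  # fail closed: availability loses to safety
        # Low-risk: reuse the last known decision, deny if there is none.
        return bool(last_known)
```

Even the low-risk branch here never invents an allow: it only extends a decision the policy service made earlier, which is fail-stale rather than fail-open.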

Access Control Best Practices

enforce tenant isolation early and repeatedly
keep policy decisions explainable
separate authentication claims from live authorization state when permissions change frequently
audit all admin and permission-management actions
do not trust internal network location as a permission model

14. Service-to-Service Authentication

User authentication is only half of production security. Modern backends also need to authenticate services to each other.

14.1 Why Internal Service Authentication Exists

In microservice systems, one request may pass through many services:

edge/API gateway
auth service
order service
payment service
notification service

If internal calls are trusted just because they are "inside the VPC", a compromised service can impersonate others too easily.

This is why zero-trust principles matter internally too.

14.2 Service Identity

A service needs its own identity, just like a user does.

Examples:

payments-service.prod
orders-service.eu-west-1
workload identity bound to a Kubernetes service account

A strong service identity system lets the platform answer:

which service is calling?
is it the real deployed workload?
is it allowed to call this destination?

14.3 mTLS Basics

Mutual TLS means both sides authenticate each other during the TLS handshake.

Benefits:

encryption in transit
client and server authentication
strong cryptographic service identity

Typical pattern:

internal CA issues short-lived certificates to workloads
service presents client cert on outbound call
destination validates issuer and identity

sequenceDiagram
	participant A as Service A
	participant CA as Internal CA / Identity System
	participant B as Service B

	A->>CA: Request workload certificate
	CA-->>A: Short-lived cert
	A->>B: TLS handshake + client cert
	B->>B: Verify cert, issuer, SAN, expiry
	B-->>A: Authenticated secure channel
	A->>B: Application request
	B-->>A: Response
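
On the receiving side, this flow can be sketched with Python's standard `ssl` module. The file paths passed to `make_mtls_server_context`, the SPIFFE-style URI, and the `ALLOWED_CALLERS` set are assumptions; the certificate dictionary shape is the one returned by `SSLSocket.getpeercert()`.

```python
import ssl
from typing import Any, Mapping, Optional, Set

# Hypothetical allowlist of caller identities for this service.
ALLOWED_CALLERS: Set[str] = {"spiffe://example.internal/orders-service"}


def make_mtls_server_context(
    ca_file: str, cert_file: str, key_file: str
) -> ssl.SSLContext:
    """Build a server context that rejects clients without a valid cert."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED        # client cert is mandatory
    ctx.load_verify_locations(cafile=ca_file)  # trust only the internal CA
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def caller_identity(peer_cert: Mapping[str, Any]) -> Optional[str]:
    """Extract the first URI SAN from a getpeercert()-style dict."""
    for kind, value in peer_cert.get("subjectAltName", ()):
        if kind == "URI":
            return value
    return None


def is_allowed_caller(peer_cert: Mapping[str, Any]) -> bool:
    return caller_identity(peer_cert) in ALLOWED_CALLERS
```

The TLS layer handles issuer and expiry checks during the handshake; the SAN check afterwards is what turns "some workload with a valid cert" into "this specific service".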

14.4 Short-Lived Credentials

Short-lived credentials are a major production best practice.

Why?

if stolen, they expire quickly
less need for manual secret rotation
better fit with workload identity and automation

This pattern shows up in:

cloud IAM temporary credentials
Kubernetes workload identity
service mesh certificates
internal token minting systems

14.5 Zero Trust Basics

Zero trust does not mean trusting nothing at all. It means:

do not grant access solely based on network location
verify identity continuously and explicitly
enforce least privilege
assume compromise is possible and reduce blast radius

Google's public BeyondCorp ideas are the canonical mental model here: access should depend on identity, device state, and policy, not on whether traffic comes from "inside the office network".

14.6 Service-to-Service Authorization

Authentication tells you that a caller is payments-service. Authorization must still decide whether payments-service may:

read card metadata
call refund APIs
publish to a payout topic
access a particular database table

This is often implemented with:

service identity plus policy
SPIFFE-like identity patterns
service mesh policy
signed internal tokens with audience restrictions
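
The simplest form of "service identity plus policy" is an explicit allowlist keyed by caller and destination. The service names, action names, and `service_may` helper below are hypothetical; the caller identity is assumed to come from an already authenticated channel such as mTLS, never from a request header.

```python
from typing import Dict, FrozenSet, Tuple

# (authenticated caller, destination service) -> allowed actions
SERVICE_POLICY: Dict[Tuple[str, str], FrozenSet[str]] = {
    ("payments-service", "card-vault"): frozenset({"cards.read_metadata"}),
    ("payments-service", "refund-api"): frozenset({"refunds.create"}),
    ("orders-service", "notification-service"): frozenset({"email.send"}),
}


def service_may(caller: str, destination: str, action: str) -> bool:
    """Deny by default: unknown pairs and unlisted actions are rejected."""
    allowed = SERVICE_POLICY.get((caller, destination), frozenset())
    return action in allowed
```

A table like this makes "any internal service can call any other service" impossible by construction, and it doubles as documentation of the real call graph.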

Service Auth Failure Cases

long-lived shared secrets copied across many services
no certificate rotation automation
any internal service can call any other service
internal service trusts caller-provided headers like X-User-Id without verification
service identity is authenticated but not authorized

Service Auth Best Practices

use workload or service identity, not shared static secrets where possible
prefer short-lived credentials and automatic rotation
when end-user identity must be propagated between services, carry it in a verifiable form rather than in an unauthenticated header
separate service identity from end-user identity in request context

15. How These Systems Fit Together

A strong interview answer connects all the pieces into one architecture.

15.1 Typical SaaS Architecture

flowchart TD
	User[Browser / Mobile App] --> Edge[Edge / API Gateway]
	Edge --> Auth[Auth Service]
	Auth --> UserDB[(User Directory)]
	Auth --> Session[(Session Store / Token Store)]
	Auth --> Keys[KMS / Signing Keys]
	Edge --> App[Business Services]
	App --> Authz[Authorization Service / Policy Engine]
	Authz --> Perms[(Roles, Relationships, Attributes)]
	App --> Data[(Application Data)]
	Auth --> Audit[Security Audit Log]
	Authz --> Audit
	App --> Audit

The key idea is separation of concerns:

auth service proves identity and issues continuity artifacts
session/token store manages continuity and revocation
policy engine decides access
application services enforce business operations
audit pipeline records security-relevant facts

15.2 Consumer SaaS vs Enterprise SaaS vs Internal Platform

Environment	Identity priorities
Consumer SaaS	Signup conversion, password recovery, abuse prevention, social login
Enterprise SaaS	SSO, provisioning, group mapping, auditability, tenant admin controls
Internal platform	Service identity, zero trust, least privilege, strong device posture

15.3 Data Freshness vs Statelessness

One of the deepest identity design tradeoffs is this:

stateless verification is fast and scalable
fresh authorization state often requires looking up server-side data

That is why many mature architectures mix the two:

token or session for authentication continuity
live policy check for sensitive authorization

15.4 Tenant Isolation

For SaaS systems, tenant isolation must be explicit in both authentication and authorization.

Common patterns:

include tenant membership in auth/session context
scope roles by tenant
enforce tenant filters in service and data layers
audit cross-tenant admin actions aggressively

This is especially important in systems like GitHub organizations, Stripe connected accounts, or enterprise SaaS workspaces.
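
Enforcing the tenant filter in the data layer can be sketched with a repository that is bound to a tenant at construction time, so no query method can forget the filter. The `invoices` table, the `InvoiceRepository` class, and the in-memory SQLite database are assumptions for illustration.

```python
import sqlite3
from typing import List, Optional, Tuple


class InvoiceRepository:
    """Data access object that is always scoped to exactly one tenant."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: str) -> None:
        if not tenant_id:
            raise ValueError("tenant_id is required")
        self._conn = conn
        self._tenant_id = tenant_id

    def list_invoices(self) -> List[Tuple[int, int]]:
        cursor = self._conn.execute(
            "SELECT id, amount_cents FROM invoices "
            "WHERE tenant_id = ? ORDER BY id",
            (self._tenant_id,),
        )
        return cursor.fetchall()

    def get_invoice(self, invoice_id: int) -> Optional[Tuple[int, int]]:
        # The tenant filter applies even to lookups by primary key.
        cursor = self._conn.execute(
            "SELECT id, amount_cents FROM invoices "
            "WHERE tenant_id = ? AND id = ?",
            (self._tenant_id, invoice_id),
        )
        return cursor.fetchone()


def demo_connection() -> sqlite3.Connection:
    """Create an in-memory database with rows for two tenants."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoices ("
        "id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, "
        "amount_cents INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO invoices (id, tenant_id, amount_cents) VALUES (?, ?, ?)",
        [(1, "tenant-a", 1000), (2, "tenant-b", 2500), (3, "tenant-a", 400)],
    )
    return conn
```

With this shape, guessing another tenant's invoice id returns nothing instead of leaking data, even if a service-layer check was skipped.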

16. Real-World Patterns and Company Examples

These examples are useful as mental anchors, not as exact internal blueprints.

Google

Public Google identity and OIDC flows are a classic example of large-scale federated identity.
Google's public BeyondCorp ideas are foundational for zero-trust access.
Zanzibar is the famous reference point for large-scale, relationship-aware authorization.

Interview lesson: centralized authorization models can work at huge scale if the data model, caching, and consistency story are designed carefully.

Netflix

Netflix-style service-rich environments highlight the need for service identity, short-lived credentials, and resilient internal auth patterns.
Streaming and control plane workloads also show why identity systems must stay available under very high traffic.

Interview lesson: internal service auth is not optional in large microservice systems.

Uber

Ride-sharing and marketplace architectures depend on strict service-to-service permissions, real-time risk checks, and strong tenant/user context propagation.
Payment, dispatch, driver, and rider services cannot safely trust each other based only on network placement.

Interview lesson: identity context often flows through many services and must remain verifiable.

Amazon

AWS IAM is the public archetype for policy-heavy authorization with users, roles, resource policies, temporary credentials, and least privilege.

Interview lesson: enterprise-grade authorization is really a policy and identity modeling problem, not just a list of roles.

GitHub

GitHub demonstrates a mix of organization membership, teams, repository roles, OAuth apps, GitHub Apps, personal access tokens, and enterprise SSO.

Interview lesson: one product often needs several identity and authorization models at the same time.

Stripe

Stripe is a useful example for strong dashboard authentication, MFA, API keys, restricted keys, OAuth for Connect-style platforms, and careful access around money movement.

Interview lesson: high-risk actions need stronger auth, auditability, and granular permissions than low-risk read-only actions.

Typical SaaS Systems

Most B2B SaaS products end up combining:

email/password and social login for self-serve customers
SSO for enterprise customers
RBAC for admin/editor/viewer patterns
ABAC or policy rules for sensitive workflows
API tokens or OAuth for integrations
service identity for microservices

17. Interview Discussion Guide

If asked to design identity and access for a backend system, structure your answer progressively.

17.1 Clarifying Questions

Ask:

who are the subjects: end users, admins, services, partners?
is this consumer, enterprise, or internal platform?
are we designing login, third-party integration, or internal access control?
what is the risk level: social app, fintech, healthcare, developer platform?
do we need SSO, API access, or both?
how fresh must revocation and permission changes be?

17.2 Good Interview Structure

Define identities and trust boundaries.
Choose authentication mechanism.
Choose continuity mechanism: session or token.
Design authorization model.
Address recovery, revocation, audit, and failure cases.
Address scale, caching, key rotation, and multi-region concerns.

17.3 Common Interview Comparisons

Sessions vs JWT

Use when the interviewer asks about stateful vs stateless auth.

Question	Sessions answer	JWT answer
Need instant logout?	Strong	Harder
Need local verification across services?	Weaker	Strong
Web app simplicity?	Often simpler	Often overcomplicated
Third-party API ecosystem?	Less natural	Better fit

RBAC vs ABAC

Question	RBAC	ABAC
Easy admin mental model	Strong	Weaker
Fine-grained contextual rules	Weak	Strong
Risk of role explosion	High	Lower
Ease of debugging	Stronger	Harder

SAML vs OIDC

Question	SAML	OIDC
Enterprise legacy support	Strong	Strong but varies
Modern web/mobile friendliness	Weaker	Strong
Developer ergonomics	Heavier	Better

17.4 Scaling Considerations to Mention

Redis or equivalent for shared sessions
multi-region token verification and key distribution
policy caching with invalidation strategy
short-lived credentials for services
audit event pipelines decoupled from critical-path latency
abuse protection on login and signup

17.5 Failure Cases Worth Calling Out

auth service outage blocks all logins
Redis session store failure logs users out or prevents validation
stale permissions cached after role removal
signing key rotation breaks old verifiers
refresh token theft leads to silent session hijack

Interview tip: explicitly talking about revocation, rotation, and failure handling is often what moves an answer from junior to strong mid-level or senior.

18. Common Mistakes and Best Practices

Common Mistakes

treating authentication and authorization as the same problem
storing passwords with fast hashes
putting too much trust in long-lived JWTs
assuming "internal network" means trusted caller
forgetting logout, revocation, and recovery flows
doing authorization only at the gateway
using email as the only durable identity key in enterprise federation
failing to audit permission changes and admin actions
building role systems that cannot express tenant or resource scope

Best Practices

separate identity proof, session/token continuity, and authorization policy clearly
use slow password hashing and protect high-value secrets with KMS/HSM support
prefer MFA and step-up authentication for sensitive actions
keep access tokens short-lived and refresh tokens protected and rotated
model tenant-aware roles and permissions explicitly
centralize policy where consistency matters, but understand cache staleness tradeoffs
use service identity and short-lived credentials internally
build auditability and explainability into the system from the beginning

Final Mental Model

If you remember one thing, remember this:

Identity and access is not one feature. It is a chain of connected systems:

identity proof
credential management
session or token continuity
authorization policy
revocation and recovery
service identity
auditing and operations

Real systems succeed when all of these parts are designed together.

If one weak link exists, attackers and outages will find it.

Quick Review Checklist

Use this when revising for interviews.

Can I clearly explain AuthN vs AuthZ?
Do I know when to use sessions vs JWTs?
Can I explain password hashing, salts, peppers, and MFA tradeoffs?
Can I walk through OAuth authorization code flow with PKCE?
Can I explain SAML vs OIDC and IdP vs SP?
Can I compare RBAC and ABAC with examples?
Can I describe revocation, logout, token rotation, and password reset securely?
Can I explain service-to-service auth, mTLS, and zero trust?
Can I describe where policy enforcement should happen in a real system?

If you can answer yes to all of these questions, your identity and access fundamentals are strong enough for most software engineering interview discussions and practical backend design conversations.
