33. Service Principal Design and Authentication Modes
Date: 2025-01-28
Status: Accepted
Category: Authentication & Authorization
Context
Machine-to-machine authentication, batch jobs, streaming tasks, and background processes require persistent identity and long-lived credentials that differ from human user patterns. Challenges include:
- Long-Running Tasks: OAuth2 JWTs have short lifespans (15 min), unsuitable for batch jobs running for hours or days
- Streaming Sessions: WebSocket connections need credentials persisting throughout session
- Permission Attribution: Automated processes need clear ownership and auditable identity
- Permission Inheritance: Background jobs often need to act on behalf of specific users
- Credential Management: Services need programmatic authentication without human interaction
Decision
We will implement Service Principals as first-class identities with two authentication modes and optional user association for permission inheritance.
Core Principles
- First-Class Identity: Service principals distinct from users in authorization model
- Dual Authentication: Support OAuth2 client credentials AND service account users
- Optional User Association: Service principals can act on behalf of specific users
- Permission Inheritance: Can inherit permissions from associated user principals
- Long-Lived Credentials: Support 30-day refresh tokens for persistent tasks
- Keycloak-Issued: All service principal tokens issued by Keycloak (JWT standard)
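As a rough illustration of the client credentials mode named in the principles above, the sketch below builds (but does not send) a token request using only the Python standard library. The host, realm name, and secret are placeholders, and the helper `build_token_request` is hypothetical. The endpoint path follows Keycloak's usual `/realms/<realm>/protocol/openid-connect/token` layout; older deployments may serve it under an `/auth` prefix.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(base_url: str, realm: str, client_id: str, client_secret: str) -> Request:
    """Build (without sending) an OAuth2 client credentials token request."""
    # Realm-scoped OpenID Connect token endpoint (path layout is an assumption
    # about the deployment; adjust if Keycloak is served under a prefix).
    url = f"{base_url.rstrip('/')}/realms/{realm}/protocol/openid-connect/token"
    # The client credentials grant sends the client's own id and secret as a form body.
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
        }
    ).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Placeholder values for illustration only.
req = build_token_request("https://keycloak.example.com", "example-realm", "batch-etl-job", "s3cret")
```

Passing `req` to `urllib.request.urlopen` would perform the exchange. In practice the secret would come from a secret store rather than a literal.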
Architecture
Service Principal Identity Format:
- Subject (sub) claim: service:<client-id> (e.g., service:batch-etl-job)
- OpenFGA Object: service_principal:batch-etl-job
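The mapping between the two identifiers can be sketched as a small helper. The function name `sub_to_openfga_object` and the validation rules are assumptions for illustration; only the `service:` prefix and the `service_principal:` object type come from the format above.

```python
SERVICE_SUB_PREFIX = "service:"
OPENFGA_TYPE = "service_principal"


def sub_to_openfga_object(sub: str) -> str:
    """Translate a JWT `sub` claim such as 'service:<client-id>' into an OpenFGA object id."""
    if not sub.startswith(SERVICE_SUB_PREFIX):
        raise ValueError(f"not a service principal subject: {sub!r}")
    client_id = sub[len(SERVICE_SUB_PREFIX):]
    if not client_id:
        raise ValueError("service principal subject has an empty client id")
    return f"{OPENFGA_TYPE}:{client_id}"


print(sub_to_openfga_object("service:batch-etl-job"))  # service_principal:batch-etl-job
```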
Authentication Modes
Mode 1: Client Credentials Flow (Preferred)
Use Case: Dedicated services, microservices, batch jobs
Keycloak Configuration:
Mode 2: Service Account User (Alternative)
Use Case: Legacy systems migration, mixed authentication needs
Keycloak Configuration:
User Association and Permission Inheritance
OpenFGA Tuples: acts_as relationship.
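To make the acts_as relationship concrete, here is a minimal in-memory sketch of how a permission check might fall back to the associated user. It is a stand-in, not the OpenFGA SDK or the project's authorization model. The tuple direction (service principal as `user`, associated user as `object`), the one-hop limit, and the names `RelationTuple` and `check` are all assumptions.

```python
from typing import NamedTuple, Set


class RelationTuple(NamedTuple):
    user: str
    relation: str
    object: str


def check(tuples: Set[RelationTuple], principal: str, relation: str, obj: str) -> bool:
    """Return True if `principal` holds `relation` on `obj`, directly or via acts_as."""
    # 1. Direct grant to the service principal itself.
    if RelationTuple(principal, relation, obj) in tuples:
        return True
    # 2. Inherited grant: follow a single acts_as hop to the associated user.
    for t in tuples:
        if t.user == principal and t.relation == "acts_as":
            if RelationTuple(t.object, relation, obj) in tuples:
                return True
    return False


# Hypothetical data: the batch job acts as alice, who can view one report.
tuples = {
    RelationTuple("service_principal:batch-etl-job", "acts_as", "user:alice"),
    RelationTuple("user:alice", "viewer", "document:report"),
}
print(check(tuples, "service_principal:batch-etl-job", "viewer", "document:report"))  # True
print(check(tuples, "service_principal:batch-etl-job", "editor", "document:report"))  # False
```

Limiting inheritance to one hop keeps the effective permission set easy to reason about, which matters given that inherited permissions may be non-obvious.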
Configuration
Consequences
Positive Consequences
- Long-running support (30-day tokens), clear attribution
- Permission delegation without password sharing
- Audit trail, flexible authentication (two modes)
- Security isolation, ownership tracking
Negative Consequences
- Implementation complexity (two modes)
- Keycloak configuration expertise required
- OpenFGA model changes, additional secrets to manage
- Permission inheritance may be non-obvious
Mitigation Strategies
- High-level SDK abstracting modes, clear documentation
- Automated secret rotation (90 days), IP whitelisting
- Audit reports showing effective permissions
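The 90-day rotation policy above reduces to a simple age check. The sketch below assumes each secret carries a creation timestamp; the name `rotation_due` and the choice to treat exactly 90 days as due are illustrative, not taken from the implementation.

```python
from datetime import datetime, timedelta, timezone

ROTATION_INTERVAL = timedelta(days=90)


def rotation_due(secret_created_at: datetime, now: datetime) -> bool:
    """Return True once a secret is at least ROTATION_INTERVAL old."""
    return now - secret_created_at >= ROTATION_INTERVAL


created = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(rotation_due(created, datetime(2025, 3, 31, tzinfo=timezone.utc)))  # False (89 days)
print(rotation_due(created, datetime(2025, 4, 1, tzinfo=timezone.utc)))   # True (90 days)
```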
Alternatives Considered
- User Accounts Only: Rejected - conflates humans with machines, poor audit trail
- API Keys Only: Rejected - violates JWT standardization
- Client Credentials Only: Rejected - lacks flexibility for legacy systems
- Impersonation Flow: Rejected - security risk of admin credentials in services
Implementation
ServicePrincipalManager (src/mcp_server_langgraph/auth/service_principal.py):
API (src/mcp_server_langgraph/api/service_principals.py):
- POST /api/v1/service-principals - Create
- GET /api/v1/service-principals - List owned
- POST /api/v1/service-principals/{id}/rotate-secret - Rotate
- DELETE /api/v1/service-principals/{id} - Delete
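The four endpoints can be summarized as a small route table, which a client or SDK wrapper might use to resolve an action into an HTTP method and path. The action names, the `ROUTES` mapping, and the `resolve` helper are hypothetical; only the methods and paths come from the list above.

```python
from typing import Optional, Tuple
from urllib.parse import quote

API_PREFIX = "/api/v1/service-principals"

# action -> (HTTP method, path template)
ROUTES = {
    "create": ("POST", API_PREFIX),
    "list": ("GET", API_PREFIX),
    "rotate_secret": ("POST", API_PREFIX + "/{id}/rotate-secret"),
    "delete": ("DELETE", API_PREFIX + "/{id}"),
}


def resolve(action: str, sp_id: Optional[str] = None) -> Tuple[str, str]:
    """Return (method, path) for an action, filling in {id} where the route needs it."""
    method, template = ROUTES[action]
    if "{id}" in template:
        if not sp_id:
            raise ValueError(f"action {action!r} requires a service principal id")
        # Percent-encode the id so it cannot introduce extra path segments.
        return method, template.replace("{id}", quote(sp_id, safe=""))
    return method, template


print(resolve("rotate_secret", "batch-etl-job"))
# ('POST', '/api/v1/service-principals/batch-etl-job/rotate-secret')
```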
References
- Implementation: src/mcp_server_langgraph/auth/service_principal.py (to be created)
- API: src/mcp_server_langgraph/api/service_principals.py (to be created)
- Related ADRs: ADR-0031, ADR-0032, ADR-0036, ADR-0039
- External: OAuth 2.0 Client Credentials, Keycloak Service Accounts