OAuth State Migration and Security Hardening (SOK-76, SOK-77)
Summary
The Grimoire (Wise Delivery) OAuth state management was migrated from volatile in-memory storage to a persistent Redis backend, and the authorization flow received significant security hardening. The work addressed critical vulnerabilities in token verification, implemented Proof Key for Code Exchange (PKCE), and tightened Dynamic Client Registration (DCR) validation.
Details
This development phase focused on transitioning the Grimoire system’s OAuth implementation from a development-centric in-memory model to a production-ready infrastructure. The work was tracked under three primary Linear tickets: SOK-76 (Redis state), SOK-77 (None-safety and DCR), and SOK-81 (Test suite expansion).
Redis State Backend (SOK-76)
To support horizontal scaling and persistence across container restarts in Azure Container Apps, the system’s three primary OAuth state dictionaries—registered_clients, auth_codes, and pending_authorizations—were replaced with a Redis-backed store.
- Infrastructure: Introduced redis[hiredis]>=5.0.0 as a dependency.
- Implementation: Created infrastructure/redis/client.py for async Redis management and infrastructure/redis/oauth_store.py for the OAuthStateStore.
- Key Patterns: Data is partitioned using specific key prefixes:
  - grimoire:oauth:pending:{auth_id} (10-minute TTL)
  - grimoire:oauth:auth_code:{code} (10-minute TTL)
  - grimoire:oauth:client:{client_id} (Persistent)
- Configuration: The system uses a REDIS_URL environment variable. For local development, this can be left empty (triggering a warning), but for production on Azure, it requires a rediss:// URL to enable TLS for Azure Cache for Redis.
- Resilience: The /health endpoint was updated to monitor Redis connectivity. If Redis is unavailable, the system reports a degraded status, and OAuth attempts return a 503 Service Unavailable via the _require_redis() guard.
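The key layout and TTLs above can be illustrated with a minimal sketch. It is not the actual contents of infrastructure/redis/oauth_store.py: the method names, the JSON serialization, the AsyncKV protocol, the FakeKV in-memory stand-in, and the ServiceUnavailable exception are all assumptions made for illustration. Only the key prefixes, the 10-minute TTLs, and the 503 behaviour come from the description above. The set(name, value, ex=...) call shape matches the redis-py asyncio client, so a real client could likely be passed in place of FakeKV.

```python
import asyncio
import json
from typing import Optional, Protocol

PENDING_TTL_SECONDS = 600    # 10-minute TTL for pending authorizations
AUTH_CODE_TTL_SECONDS = 600  # 10-minute TTL for authorization codes


class AsyncKV(Protocol):
    """Hypothetical subset of an async Redis client used by this sketch."""
    async def set(self, name: str, value: str, ex: Optional[int] = None) -> object: ...
    async def get(self, name: str) -> Optional[str]: ...
    async def delete(self, *names: str) -> int: ...


def pending_key(auth_id: str) -> str:
    return f"grimoire:oauth:pending:{auth_id}"


def auth_code_key(code: str) -> str:
    return f"grimoire:oauth:auth_code:{code}"


def client_key(client_id: str) -> str:
    return f"grimoire:oauth:client:{client_id}"


class ServiceUnavailable(Exception):
    """Hypothetical error that an HTTP layer would map to a 503 response."""
    status_code = 503


class OAuthStateStore:
    """Illustrative store; method names are assumptions, not the real API."""

    def __init__(self, kv: AsyncKV) -> None:
        self._kv = kv

    async def save_pending(self, auth_id: str, data: dict) -> None:
        await self._kv.set(pending_key(auth_id), json.dumps(data), ex=PENDING_TTL_SECONDS)

    async def save_auth_code(self, code: str, data: dict) -> None:
        await self._kv.set(auth_code_key(code), json.dumps(data), ex=AUTH_CODE_TTL_SECONDS)

    async def consume_auth_code(self, code: str) -> Optional[dict]:
        # Read, then delete, so a code cannot be replayed. This two-step
        # version is not atomic; it is kept simple for the sketch.
        key = auth_code_key(code)
        raw = await self._kv.get(key)
        if raw is None:
            return None
        await self._kv.delete(key)
        return json.loads(raw)

    async def save_client(self, client_id: str, data: dict) -> None:
        # No expiry argument: registered clients are persistent.
        await self._kv.set(client_key(client_id), json.dumps(data))


def require_redis(store: Optional[OAuthStateStore]) -> OAuthStateStore:
    """Guard in the spirit of _require_redis(): fail fast when no store exists."""
    if store is None:
        raise ServiceUnavailable("OAuth state store is not available")
    return store


class FakeKV:
    """In-memory stand-in so the sketch runs without a Redis server."""

    def __init__(self) -> None:
        self.data: dict = {}
        self.ttls: dict = {}

    async def set(self, name, value, ex=None):
        self.data[name] = value
        self.ttls[name] = ex  # recorded only, never enforced
        return True

    async def get(self, name):
        return self.data.get(name)

    async def delete(self, *names):
        return sum(1 for n in names if self.data.pop(n, None) is not None)
```

Keeping the key builders as small pure functions makes the prefixes easy to unit-test, and leaving out the expiry for client records is what makes them persistent in this sketch.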
Security Hardening and OAuth Refactoring (SOK-77)
The OAuth flow in oauth.py underwent a full rewrite to address safety and security gaps:
- None-Safety: Fixed issues in verify_token() where payload.get("sub") and email fields could return None, causing downstream crashes. Explicit isinstance checks were added for sub, email, and scopes.
- PKCE Implementation: Added support for PKCE S256 verification. The code_challenge is stored during authorization and verified against the code_verifier during the token exchange.
- DCR Validation: Tightened Dynamic Client Registration. The system no longer allows auto-registration of clients based on ID prefixes (e.g., client_). Clients must now be explicitly registered via the /oauth/register endpoint.
- Validation Logic: Added redirect_uri validation against registered URIs and implemented secrets.compare_digest for secure client_secret validation.
- Logging: Replaced standard f-string logging with structured kwargs to improve observability in production logs.
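The checks listed above can be sketched as a few small helpers. This is an illustration under stated assumptions, not the code in oauth.py: the function names, the choice to return None for malformed claims, the assumption that scopes is a list of strings, and the exact-match rule for redirect_uri are all hypothetical. The S256 transform itself follows RFC 7636 (base64url of the SHA-256 digest of the verifier, without padding).

```python
import base64
import hashlib
import secrets
from typing import Any, Mapping, Optional, Sequence


def verify_pkce_s256(code_verifier: str, code_challenge: str) -> bool:
    """Return True if BASE64URL(SHA256(code_verifier)) equals the stored challenge."""
    try:
        verifier_bytes = code_verifier.encode("ascii")
    except UnicodeEncodeError:
        return False  # RFC 7636 verifiers are ASCII-only
    digest = hashlib.sha256(verifier_bytes).digest()
    computed = base64.urlsafe_b64encode(digest).rstrip(b"=")
    # Compare as bytes in constant time to avoid leaking match position.
    return secrets.compare_digest(computed, code_challenge.encode("utf-8"))


def client_secret_matches(presented: Optional[str], stored: str) -> bool:
    """Constant-time client_secret comparison that tolerates a missing secret."""
    if not isinstance(presented, str):
        return False
    return secrets.compare_digest(presented.encode("utf-8"), stored.encode("utf-8"))


def redirect_uri_allowed(redirect_uri: Any, registered_uris: Sequence[str]) -> bool:
    """Exact-match check against the URIs stored at registration (an assumption)."""
    return isinstance(redirect_uri, str) and redirect_uri in registered_uris


def extract_claims(payload: Mapping[str, Any]) -> Optional[dict]:
    """Return validated claims, or None if sub, email, or scopes is malformed."""
    sub = payload.get("sub")
    email = payload.get("email")
    scopes = payload.get("scopes")
    if not isinstance(sub, str) or not sub:
        return None
    if not isinstance(email, str) or not email:
        return None
    # Assumes scopes is carried as a list of strings in the token payload.
    if not isinstance(scopes, list) or not all(isinstance(s, str) for s in scopes):
        return None
    return {"sub": sub, "email": email, "scopes": scopes}
```

The verifier and challenge pair published in RFC 7636 Appendix B (dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk and E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM) is a convenient fixture for checking an S256 implementation such as verify_pkce_s256.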
Verification and Testing (SOK-81)
A comprehensive test suite was built to verify these changes, achieving 60% overall code coverage and 96% coverage for OAuth-specific logic. The suite includes 227 passing tests covering:
- OAuth E2E: Well-known discovery, DCR registration, PKCE success/failure, and refresh token flows.
- Domain Models: Validation of MemoryType enums, Neo4j label conversions, and discriminated union dispatch.
- Infrastructure: CRUD operations for the Redis store and Cypher query generation for the Neo4j backend.