Resilience: Circuit Breaker Pattern for External Services
Overview
The platform integrates with three external services at runtime:
- Twilio — outbound SMS alerts for sanctions matches and compliance notifications
- Stripe — subscription and billing management
- OFSI endpoint — nightly sync of the UK consolidated sanctions list
This page describes how the platform handles sustained outages for each of these services.
Background: ERR-12 Audit Finding
An internal resilience audit (control ERR-12) identified that no circuit breaker pattern was in place for any external service. Without a circuit breaker, every request keeps calling a failing service, with no threshold at which calls stop. This can cause:
- Alert spam — continuous failed SMS attempts flooding Twilio's dead endpoint
- Increased latency — every request waits for a timeout before failing
- Cascading load — a degraded downstream amplifies load on the platform itself
The isStripeConfigured() check was explicitly reviewed during this audit. It is a static configuration guard (it returns 503 when Stripe credentials are not configured), not a runtime circuit breaker, and does not protect against a Stripe outage that occurs after startup.
Remediation Strategy
A full circuit breaker library was evaluated and deemed over-engineering for the current scale of the platform. A targeted, lightweight approach was adopted per service.
Twilio SMS
A failure counter is used to trip the SMS circuit after sustained failures:
- Trip threshold: 5 consecutive failures
- Window: 5 minutes
- Backend: in-memory (single instance) or Redis-backed (multi-instance / production)
- Behaviour when tripped: outbound SMS is suppressed; a service-degraded warning is surfaced in the monitoring dashboard
- Reset: automatic reset after the window elapses with no further failures
This stops repeated send attempts against an unavailable Twilio service. The dashboard warning also tells compliance teams that SMS alerts are not being delivered, so suppressed alerts are not dropped silently.
Failure 1 → log warning
Failure 2 → log warning
Failure 3 → log warning
Failure 4 → log warning
Failure 5 → TRIP: suppress SMS, raise dashboard warning
...
[5 min window resets] → RESET: resume SMS delivery
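The trip-and-reset flow above can be sketched as a small in-memory counter. This is a minimal illustration under stated assumptions, not the platform's actual implementation: `SmsCircuit`, `sendWithCircuit`, and the injectable clock are hypothetical names, and a multi-instance deployment would keep the counter in Redis rather than in process memory.

```typescript
// Illustrative sketch only: class, method, and function names are hypothetical.
class SmsCircuit {
  private consecutiveFailures = 0;
  private lastFailureAt = 0;
  private readonly threshold: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(
    threshold = 5,
    windowMs = 5 * 60 * 1000,
    now: () => number = () => Date.now(), // injectable clock, useful for testing
  ) {
    this.threshold = threshold;
    this.windowMs = windowMs;
    this.now = now;
  }

  // True while outbound SMS should be suppressed.
  isTripped(): boolean {
    this.resetIfWindowElapsed();
    return this.consecutiveFailures >= this.threshold;
  }

  recordFailure(): void {
    this.resetIfWindowElapsed();
    this.consecutiveFailures += 1;
    this.lastFailureAt = this.now();
  }

  // A successful send breaks the run of consecutive failures.
  recordSuccess(): void {
    this.consecutiveFailures = 0;
  }

  // Clears the counter once a full window has passed since the last failure.
  private resetIfWindowElapsed(): void {
    const quietFor = this.now() - this.lastFailureAt;
    if (this.consecutiveFailures > 0 && quietFor >= this.windowMs) {
      this.consecutiveFailures = 0;
    }
  }
}

// Wraps a send function so that calls are skipped while the circuit is tripped.
function sendWithCircuit(
  circuit: SmsCircuit,
  send: () => void,
): "sent" | "failed" | "suppressed" {
  if (circuit.isTripped()) {
    return "suppressed";
  }
  try {
    send();
    circuit.recordSuccess();
    return "sent";
  } catch {
    circuit.recordFailure();
    return "failed";
  }
}
```

While the circuit is tripped no sends are attempted, so no new failures are recorded and the counter clears one window after the last failure. Raising the dashboard warning is left out of the sketch; it would hang off the transition into the tripped state.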
Stripe
No runtime circuit breaker is implemented. The existing isStripeConfigured() guard returns HTTP 503 when Stripe credentials are absent, which is sufficient for the billing use-case at this scale. Individual Stripe errors are handled gracefully at the call site.
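To make the distinction concrete, a static configuration guard of this kind might look like the sketch below. The real `isStripeConfigured()` signature is not shown on this page, so the `env` parameter, the `STRIPE_SECRET_KEY` variable name, the `billingGuard` helper, and the response body are all assumptions for illustration.

```typescript
// Hypothetical sketch: the real isStripeConfigured() may read configuration differently.
type Env = Record<string, string | undefined>;

function isStripeConfigured(env: Env): boolean {
  // STRIPE_SECRET_KEY is an assumed variable name, used only for illustration.
  const key = env["STRIPE_SECRET_KEY"];
  return typeof key === "string" && key.trim().length > 0;
}

interface GuardResult {
  status: number;
  body: string;
}

// Returns a 503 response when billing cannot be served, or null to let the request proceed.
function billingGuard(env: Env): GuardResult | null {
  if (!isStripeConfigured(env)) {
    return { status: 503, body: "Billing is not configured" };
  }
  return null;
}
```

The guard inspects configuration only. Its result does not change if Stripe becomes unreachable after startup, which is why it does not count as a runtime circuit breaker.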
OFSI Endpoint
The OFSI sanctions list sync runs as a nightly background job, fully isolated from the request path. Failures in the sync do not impact real-time screening. Sync failures are logged and surfaced in the monitoring dashboard. No per-request circuit breaker is required.
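One way to picture this isolation is sketched below, assuming screening reads from a locally held copy of the list that only a successful sync replaces. `SyncState`, `runSync`, and `isSanctioned` are hypothetical names, and the fetch is modelled as a synchronous callback to keep the sketch short; a real job would be asynchronous.

```typescript
// Illustrative sketch; all names here are hypothetical.
interface SyncState {
  entries: string[];            // last successfully synced list, read by screening
  lastSuccessAt: number | null; // timestamp of the last successful sync
  lastError: string | null;     // surfaced to monitoring when non-null
}

// Runs one sync attempt. A failure is recorded but never thrown, and the
// previously synced entries stay in place for screening.
function runSync(state: SyncState, fetchList: () => string[], now: number): void {
  try {
    const fresh = fetchList();
    state.entries = fresh;
    state.lastSuccessAt = now;
    state.lastError = null;
  } catch (err) {
    state.lastError = err instanceof Error ? err.message : String(err);
  }
}

// Screening reads only local state, so it does not depend on the OFSI endpoint being up.
function isSanctioned(state: SyncState, name: string): boolean {
  return state.entries.some((entry) => entry === name);
}
```

Because a failed sync leaves `entries` untouched, screening keeps working against the last good list while `lastError` gives the dashboard something to display.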
Monitoring
When the Twilio circuit is tripped, the compliance monitoring dashboard will display a service-degraded warning. Operators should:
- Check Twilio service status at https://status.twilio.com
- Review recent SMS delivery logs for the specific error
- Take no further action unless the outage persists; the circuit resets automatically
Related Controls
| Control | Description |
|---|---|
| ERR-12 | Circuit breaker pattern for external services |