Hardening Twilio SMS: Retry Logic & Circuit Breaking in v1.0.92
Version: 1.0.92
Control: SCR-04
Area: API Connection · SMS Notifications
Background
NurtureHub uses Twilio to deliver SMS hot lead alerts and notification messages to property agents. These alerts are time-sensitive — a missed SMS at the moment a prospect clicks through can mean a missed viewing or lost instruction.
As part of our ongoing Supply Chain Resilience (SCR) programme, every external API integration is reviewed for error handling robustness. This release addresses a gap identified in SCR-04: the Twilio SMS sender had no retry logic and silently discarded transient failures.
The Problem
The sendSms() function in src/lib/notifications/sms.ts made a single fetch() call to the Twilio REST API and returned null on any non-success response. This created three compounding failure modes:
1. Silent Failures on Transient Errors
Twilio, like any external API, occasionally returns transient errors:
| Status Code | Meaning |
|---|---|
| 429 | Rate limited: too many requests |
| 500 | Twilio internal server error |
| 503 | Twilio service temporarily unavailable |
With no retry logic in place, any of these responses would cause sendSms() to return null. The calling Inngest step would see this as a failed send, increment retryCount, and move on — without actually retrying the Twilio call.
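A stripped-down sketch of this failure mode, with the Twilio call stubbed out behind an injectable function so only the control flow is shown (the names here are illustrative, not the actual source):

```typescript
// Hypothetical simplification of the pre-1.0.92 behaviour: any non-2xx
// response, including a transient 429/500/503, collapses to null.
type FetchLike = (url: string) => Promise<{ ok: boolean; status: number }>;

async function sendSmsOld(doFetch: FetchLike): Promise<string | null> {
  const res = await doFetch("https://api.twilio.com/.../Messages.json");
  // A transient 429, 500, or 503 takes this branch and is silently swallowed;
  // the caller cannot distinguish "rate limited, retry me" from a hard failure.
  if (!res.ok) return null;
  return "SM_fake_sid"; // placeholder for the message SID from the response body
}
```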
2. Inngest Retries Were Never Triggered
Inngest has a powerful built-in retry mechanism: if a step throws an error, Inngest will automatically retry it with backoff. However, because sendSms() returned null instead of throwing, Inngest had no signal that anything had gone wrong. The retry mechanism was effectively bypassed for all transient Twilio failures.
3. No Request Timeout
The fetch() call had no timeout configured. A slow or hanging Twilio response could cause the Inngest step to stall indefinitely, blocking the job queue.
The Fix
Three changes were made to src/lib/notifications/sms.ts:
Retry Logic with Exponential Backoff
sendSms() now retries automatically on retryable status codes (429, 500, 503) using exponential backoff before propagating the failure. This handles the majority of transient Twilio blips transparently, without any Inngest step retry being consumed.
Throw on Failure
When all retries are exhausted (or a non-retryable error is received), sendSms() now throws an error rather than returning null. This correctly surfaces the failure to Inngest, which will then apply its own step-level retry policy — ensuring the message eventually delivers even under sustained Twilio degradation.
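The two changes above can be sketched together. This is a minimal illustration of the pattern, not the shipped implementation: the request is injected so the retry loop can be exercised without Twilio, and the attempt count, base delay, and error message are all hypothetical.

```typescript
type SmsResponse = { ok: boolean; status: number };

// Status codes worth retrying internally before involving Inngest.
const RETRYABLE = new Set([429, 500, 503]);

async function sendWithRetry(
  doRequest: () => Promise<SmsResponse>,
  maxAttempts = 3,
  baseDelayMs = 250,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<SmsResponse> {
  let lastStatus = 0;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await doRequest();
    if (res.ok) return res;
    lastStatus = res.status;
    // Non-retryable errors (e.g. a 400 bad request) fail fast.
    if (!RETRYABLE.has(res.status)) break;
    // Exponential backoff before the next attempt: 250ms, 500ms, 1000ms, ...
    if (attempt < maxAttempts - 1) await sleep(baseDelayMs * 2 ** attempt);
  }
  // Throwing (rather than returning null) surfaces the failure to Inngest,
  // whose step-level retry policy takes over once internal retries are spent.
  throw new Error(`Twilio SMS failed with status ${lastStatus}`);
}
```

The key design point is the split in responsibility: short, fast internal retries absorb momentary blips, while the thrown error hands sustained degradation to Inngest's slower, durable step retries.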
Request Timeout via AbortSignal.timeout
A timeout has been added to the fetch() call using the standard AbortSignal.timeout API. This ensures that a slow Twilio response does not block the Inngest worker indefinitely.
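A minimal illustration of the API (the 50ms deadline here is for demonstration only; the real timeout value is whatever the sender configures). AbortSignal.timeout is available in Node 17.3+ and modern browsers:

```typescript
// The signal flips to aborted once the deadline passes. In the real sender
// it is passed to fetch(), e.g.:
//   await fetch(url, { method: "POST", body, signal: AbortSignal.timeout(ms) });
// which rejects the fetch with a TimeoutError instead of hanging forever.
const signal = AbortSignal.timeout(50);
await new Promise((resolve) => setTimeout(resolve, 100));
// signal.aborted is now true: the 50ms deadline elapsed during the wait.
```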
Behaviour Summary
| Scenario | Before v1.0.92 | After v1.0.92 |
|---|---|---|
| Twilio returns 429 | Returns null, message silently lost | Retries with backoff, then throws for Inngest retry |
| Twilio returns 503 | Returns null, message silently lost | Retries with backoff, then throws for Inngest retry |
| Twilio hangs | Fetch stalls indefinitely | Aborted after timeout |
| Inngest retry triggered | Never (no throw) | Yes, on all unrecovered failures |
| Transient blip recovery | No | Yes, via internal backoff retries |
What Agents Will Notice
For the vast majority of agents, nothing changes — SMS hot lead alerts continue to arrive as expected. The improvement is in the platform's resilience:
- Fewer missed SMS alerts during Twilio service blips or rate limit windows.
- No silent failures — every undelivered SMS is now a visible, retried event in the job queue.
- Faster recovery from transient Twilio errors without manual intervention.
Files Changed
src/lib/notifications/sms.ts
Related
- Supply Chain Resilience Control: SCR-04
- Changelog — v1.0.92