Hardening Twilio SMS: Retry Logic & Circuit Breaking in v1.0.92
Version: 1.0.92
Control: SCR-04
Area: API Connection · SMS Notifications
Background
NurtureHub uses Twilio to deliver SMS hot lead alerts and notification messages to property agents. These alerts are time-sensitive — a missed SMS at the moment a prospect clicks through can mean a missed viewing or lost instruction.
As part of our ongoing Supply Chain Resilience (SCR) programme, every external API integration is reviewed for error handling robustness. This release addresses a gap identified in SCR-04: the Twilio SMS sender had no retry logic and silently discarded transient failures.
The Problem
The sendSms() function in src/lib/notifications/sms.ts made a single fetch() call to the Twilio REST API and returned null on any non-success response. This created three compounding failure modes:
1. Silent Failures on Transient Errors
Twilio, like any external API, occasionally returns transient errors:
| Status Code | Meaning |
|---|---|
| 429 | Rate limited: too many requests |
| 500 | Twilio internal server error |
| 503 | Twilio service temporarily unavailable |
With no retry logic in place, any of these responses would cause sendSms() to return null. The calling Inngest step would see this as a failed send, increment retryCount, and move on — without actually retrying the Twilio call.
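A stripped-down sketch of this failure mode, with the Twilio call stubbed out behind an injectable function so only the control flow is shown (the names here are illustrative, not the actual source):

```typescript
// Hypothetical simplification of the pre-1.0.92 behaviour: any non-2xx
// response, including a transient 429/500/503, collapses to null.
type FetchLike = (url: string) => Promise<{ ok: boolean; status: number }>;

async function sendSmsOld(doFetch: FetchLike): Promise<string | null> {
  const res = await doFetch("https://api.twilio.com/.../Messages.json");
  // A transient 429, 500, or 503 takes this branch and is silently swallowed;
  // the caller cannot distinguish "rate limited, retry me" from a hard failure.
  if (!res.ok) return null;
  return "SM_fake_sid"; // placeholder for the message SID from the response body
}
```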
2. Inngest Retries Were Never Triggered
Inngest has a powerful built-in retry mechanism: if a step throws an error, Inngest will automatically retry it with backoff. However, because sendSms() returned null instead of throwing, Inngest had no signal that anything had gone wrong. The retry mechanism was effectively bypassed for all transient Twilio failures.
3. No Request Timeout
The fetch() call had no timeout configured. A slow or hanging Twilio response could cause the Inngest step to stall indefinitely, blocking the job queue.
The Fix
Three changes were made to src/lib/notifications/sms.ts:
Retry Logic with Exponential Backoff
sendSms() now retries automatically on retryable status codes (429, 500, 503) using exponential backoff before propagating the failure. This handles the majority of transient Twilio blips transparently, without any Inngest step retry being consumed.
Throw on Failure
When all retries are exhausted (or a non-retryable error is received), sendSms() now throws an error rather than returning null. This correctly surfaces the failure to Inngest, which will then apply its own step-level retry policy — ensuring the message eventually delivers even under sustained Twilio degradation.
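The two changes above can be sketched together. This is a minimal illustration of the pattern, not the shipped implementation: the request is injected so the retry loop can be exercised without Twilio, and the attempt count, base delay, and error message are all hypothetical.

```typescript
type SmsResponse = { ok: boolean; status: number };

// Status codes worth retrying internally before involving Inngest.
const RETRYABLE = new Set([429, 500, 503]);

async function sendWithRetry(
  doRequest: () => Promise<SmsResponse>,
  maxAttempts = 3,
  baseDelayMs = 250,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<SmsResponse> {
  let lastStatus = 0;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await doRequest();
    if (res.ok) return res;
    lastStatus = res.status;
    // Non-retryable errors (e.g. a 400 bad request) fail fast.
    if (!RETRYABLE.has(res.status)) break;
    // Exponential backoff before the next attempt: 250ms, 500ms, 1000ms, ...
    if (attempt < maxAttempts - 1) await sleep(baseDelayMs * 2 ** attempt);
  }
  // Throwing (rather than returning null) surfaces the failure to Inngest,
  // whose step-level retry policy takes over once internal retries are spent.
  throw new Error(`Twilio SMS failed with status ${lastStatus}`);
}
```

The key design point is the split in responsibility: short, fast internal retries absorb momentary blips, while the thrown error hands sustained degradation to Inngest's slower, durable step retries.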
Request Timeout via AbortSignal.timeout
A timeout has been added to the fetch() call using the standard AbortSignal.timeout API. This ensures that a slow Twilio response does not block the Inngest worker indefinitely.
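A minimal illustration of the API (the 50ms deadline here is for demonstration only; the real timeout value is whatever the sender configures). AbortSignal.timeout is available in Node 17.3+ and modern browsers:

```typescript
// The signal flips to aborted once the deadline passes. In the real sender
// it is passed to fetch(), e.g.:
//   await fetch(url, { method: "POST", body, signal: AbortSignal.timeout(ms) });
// which rejects the fetch with a TimeoutError instead of hanging forever.
const signal = AbortSignal.timeout(50);
await new Promise((resolve) => setTimeout(resolve, 100));
// signal.aborted is now true: the 50ms deadline elapsed during the wait.
```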
Behaviour Summary
| Scenario | Before v1.0.92 | After v1.0.92 |
|---|---|---|
| Twilio returns 429 | Returns null, message silently lost | Retries with backoff, then throws for Inngest retry |
| Twilio returns 503 | Returns null, message silently lost | Retries with backoff, then throws for Inngest retry |
| Twilio hangs | Fetch stalls indefinitely | Aborted after timeout |
| Inngest retry triggered | Never (no throw) | Yes, on all unrecovered failures |
| Transient blip recovery | No | Yes, via internal backoff retries |
What Agents Will Notice
For the vast majority of agents, nothing changes — SMS hot lead alerts continue to arrive as expected. The improvement is in the platform's resilience:
- Fewer missed SMS alerts during Twilio service blips or rate limit windows.
- No silent failures — every undelivered SMS is now a visible, retried event in the job queue.
- Faster recovery from transient Twilio errors without manual intervention.
Files Changed
src/lib/notifications/sms.ts
Related
- Supply Chain Resilience Control: SCR-04
- Changelog — v1.0.92