Improving Resilience: Retry Logic for External API Calls

Updated March 11, 2026

Release: v1.0.401 | Control: ERR-11 | Category: Resilience

The Problem

When users trigger actions that reach out to HMRC, AgentOS, or bank connection APIs — such as refreshing their registered businesses or listing properties — those calls previously had no protection against transient failures.

A single dropped connection or a momentary HMRC 503 response would immediately return an error to the user, even though retrying a fraction of a second later would often have succeeded.

This was a gap between two parts of the platform:

  • Inngest background functions had robust retry logic configured (retries: 3, with NonRetriableError used to skip retries on permanent failures).
  • tRPC handlers making direct external API calls had no equivalent — they failed immediately on the first error.

The Fix

A retry utility has been added to the tRPC query and mutation handlers that call external APIs directly. It implements exponential backoff, doubling the wait after each failure, with up to three retries (four attempts in total):

Attempt 1  →  fails  →  wait 200ms
Attempt 2  →  fails  →  wait 400ms
Attempt 3  →  fails  →  wait 800ms
Attempt 4  →  fails  →  surface error to user
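The schedule above can be sketched as a small generic helper. This is a minimal sketch, not the platform's actual utility: the names withRetry, backoffDelay and RetryOptions are hypothetical, and the wait before retry n (counting from 0) is assumed to be baseDelayMs * 2^n, which reproduces the 200 / 400 / 800 ms sequence when baseDelayMs is 200.

```typescript
// Hypothetical options for a generic retry helper; names are illustrative.
interface RetryOptions {
  retries: number; // retries after the first attempt (3 means up to 4 attempts)
  baseDelayMs: number; // wait before the first retry; doubles after each failure
  shouldRetry: (error: unknown) => boolean; // true only for transient failures
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

const realSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Wait before retry number `retryIndex` (0-based): 200, 400, 800 for a 200 ms base.
function backoffDelay(retryIndex: number, baseDelayMs: number): number {
  return baseDelayMs * 2 ** retryIndex;
}

// Runs `fn`, retrying transient failures with exponential backoff.
async function withRetry<T>(
  fn: () => Promise<T>,
  { retries, baseDelayMs, shouldRetry, sleep = realSleep }: RetryOptions,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Surface the error if it is permanent or every retry has been used.
      if (attempt >= retries || !shouldRetry(error)) {
        throw error;
      }
      await sleep(backoffDelay(attempt, baseDelayMs));
    }
  }
}
```

A caller would pass retries: 3 and baseDelayMs: 200 to match the schedule above, plus a shouldRetry classifier that accepts only transient errors. Making sleep injectable keeps unit tests fast without changing the production behaviour.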

What Gets Retried

Only transient failures are retried:

  • 5xx server errors (e.g. HMRC 503 Service Unavailable, 500 Internal Server Error)
  • Network-level errors (e.g. connection timeouts, DNS failures, socket resets)

4xx client errors are not retried. These indicate a permanent problem with the request itself — an invalid parameter, an expired token, or a resource that does not exist. Retrying them would not help and would only add unnecessary latency.
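One way to express these rules is a small classifier that a retry helper consults before each retry. This is a sketch under stated assumptions: the HttpError class and its status field are hypothetical stand-ins for however the API clients report non-2xx responses, and the network error codes are common Node.js socket and DNS codes chosen to match the examples above. The exact codes a timeout produces depend on the HTTP client in use, so the list is illustrative rather than taken from the platform.

```typescript
// Hypothetical error thrown by an API client for a non-2xx HTTP response.
class HttpError extends Error {
  readonly status: number;
  constructor(status: number, message?: string) {
    super(message ?? `HTTP ${status}`);
    this.name = "HttpError";
    this.status = status;
  }
}

// Node.js error codes for timeouts, DNS failures and socket resets (illustrative).
const TRANSIENT_NETWORK_CODES = new Set(["ETIMEDOUT", "ENOTFOUND", "EAI_AGAIN", "ECONNRESET"]);

// Looks for a string `code` on the error or along its `cause` chain
// (some clients, including Node's built-in fetch, wrap socket errors this way).
function networkErrorCode(error: unknown): string | undefined {
  if (typeof error !== "object" || error === null) return undefined;
  const { code, cause } = error as { code?: unknown; cause?: unknown };
  if (typeof code === "string") return code;
  return cause === undefined ? undefined : networkErrorCode(cause);
}

function isRetryable(error: unknown): boolean {
  // 5xx: the service failed to handle the request; a later attempt may work.
  // 4xx: the request itself is wrong (bad parameter, expired token, missing resource).
  if (error instanceof HttpError) {
    return error.status >= 500 && error.status <= 599;
  }
  // Anything else is retried only if it carries a known transient network code.
  const code = networkErrorCode(error);
  return code !== undefined && TRANSIENT_NETWORK_CODES.has(code);
}
```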

Affected Endpoints

tRPC Handler              External Service      File
hmrc.refreshBusinesses    HMRC API              src/lib/routers/hmrc.ts
agentos.listProperties    AgentOS               src/lib/routers/hmrc.ts
bank.getConnection        Bank feed provider    src/lib/routers/hmrc.ts

What This Means for Users

Most transient failures — the kind caused by brief network instability or a momentary API hiccup — will now be resolved automatically and transparently. Users will see a successful response rather than an error, without needing to manually refresh or retry the action themselves.

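Only genuine, persistent failures will surface as errors: for example, HMRC returning a 4xx because of an authentication issue (which is never retried), or an outage that persists through all four attempts, roughly 1.4 seconds of backoff plus the time taken by each request.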
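The length of the retry window follows from the backoff schedule. With a base delay b = 200 ms that doubles before each of n = 3 retries, the total time spent waiting between attempts is a geometric series (assuming no random jitter is added, which the schedule does not mention):

```latex
\sum_{k=0}^{n-1} b \cdot 2^{k} \;=\; b\,(2^{n} - 1) \;=\; 200\,\text{ms} \times (2^{3} - 1) \;=\; 1400\,\text{ms}
```

The worst-case delay before a user sees an error is therefore this 1.4 seconds plus the duration of the four request attempts themselves.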

Relationship to Inngest Retry Logic

This retry utility is entirely separate from Inngest's built-in retry mechanism. Inngest handles background jobs (asynchronous processing, webhook ingestion, scheduled tasks). The new retry utility handles synchronous tRPC paths: the real-time calls made directly in response to a user action in the UI. Both layers now follow the same strategy: retry transient failures up to three times and fail fast on permanent ones.