Features · Making Tax Digital · Updated March 11, 2026

How We Protect Your HMRC Submissions When External APIs Go Down


Released in v1.0.399 · Error Resilience · ERR-12


The problem we solved

Our platform talks to three external services on your behalf:

Service      Purpose
HMRC API     Quarterly submissions, annual summaries, final declarations
TrueLayer    Bank feed transaction imports
AgentOS      Property management data ingestion

Any of these services can experience temporary outages or degraded performance. Before this release, the platform had no mechanism to detect or respond to that degradation at a structural level. Individual errors were caught and classified — for example, AgentosApiClientError would correctly identify a failed AgentOS call — but there was nothing to prevent the platform from continuing to fire request after request into a service that was already down.

The result: a degraded upstream service would receive a flood of repeated failed calls, error logs would fill rapidly, and users would experience sustained failures rather than a clean, time-bounded outage.


The solution: circuit breaker pattern

A circuit breaker sits in front of each external service call. It monitors consecutive failures and, once a threshold is reached, "trips" — temporarily stopping calls to the affected service and returning a safe fallback response instead. After a short recovery window, it allows a single probe through to check whether the service has recovered.

State machine

             ┌─────────────────────────────────────┐
             │                                     │
    ┌────────▼────────┐   5 failures / 60s   ┌─────┴──────┐
    │     CLOSED      │─────────────────────►│    OPEN    │
    │  (normal flow)  │                      │ (blocked)  │
    └─────────────────┘                      └─────┬──────┘
             ▲                                     │
             │         30 seconds pass             │
             │                               ┌─────▼──────┐
             │      probe succeeds           │ HALF-OPEN  │
             └───────────────────────────────│  (1 probe) │
                                             └────────────┘
                                              │
                                              │ probe fails
                                              ▼
                                           OPEN (reset timer)

Thresholds at a glance

Parameter                      Value
Failure window                 60 seconds
Trip threshold                 5 consecutive failures
Open (blocked) duration        30 seconds
Probe attempts in half-open    1

Behaviour during an outage

Closed circuit (normal)

All requests pass through to the upstream service as usual. The failure counter increments on each error and resets on each success.

Open circuit (tripped)

Once 5 consecutive failures occur within 60 seconds, the circuit opens. For the next 30 seconds:

  • Calls to the affected service are not attempted
  • A cached or degraded response is returned immediately
  • No further pressure is placed on the struggling service
  • Upstream submission queues are preserved — nothing is lost
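A degraded response in the open state can be as simple as serving the most recent successful result. A hypothetical sketch of that fallback shape (the cache layout and names are assumptions for illustration, not the platform's actual code):

```typescript
// Illustrative last-known-good cache; names and shapes are assumptions.
interface CachedResult<T> {
  data: T;
  fetchedAt: number; // epoch ms, so callers can show an "as of" timestamp
}

const lastGood = new Map<string, CachedResult<unknown>>();

// Returns live data when the call succeeds, otherwise the most recent
// successful result flagged as degraded. Throws only when there is no
// cached copy to fall back on.
async function withLastGoodFallback<T>(
  key: string,
  fetcher: () => Promise<T>,
): Promise<{ data: T; degraded: boolean }> {
  try {
    const data = await fetcher();
    lastGood.set(key, { data, fetchedAt: Date.now() });
    return { data, degraded: false };
  } catch (err) {
    const cached = lastGood.get(key) as CachedResult<T> | undefined;
    if (cached) return { data: cached.data, degraded: true };
    throw err;
  }
}
```

The `degraded` flag lets the UI label stale data explicitly rather than presenting it as live.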

Half-open circuit (probing)

After 30 seconds, one request is allowed through as a probe.

  • If it succeeds: the circuit closes and normal operation resumes
  • If it fails: the circuit re-opens and the 30-second timer resets

Async paths: Inngest backoff

For background jobs — including scheduled quarterly submission workflows and historical transaction imports — Inngest's built-in exponential backoff serves as a complementary soft circuit breaker. Failed async steps are retried with increasing delays rather than immediately re-attempted, preventing burst pressure on recovering services.
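Inngest manages its retry schedule internally, so application code rarely computes delays itself. As an illustration of the underlying idea only, a generic exponential backoff helper might look like this (the delays, cap, and function names are examples, not Inngest's actual schedule):

```typescript
// Illustrative exponential backoff; values are examples, not Inngest's.
function backoffDelayMs(attempt: number, baseMs = 1_000, capMs = 5 * 60_000): number {
  // attempt 0 -> 1s, 1 -> 2s, 2 -> 4s, ... capped at 5 minutes
  return Math.min(capMs, baseMs * 2 ** attempt);
}

async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 5,
  // Injectable sleep so tests can record delays instead of waiting.
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

The doubling delay is what prevents burst pressure: each retry wave arrives at half the frequency of the last, giving the recovering service progressively more breathing room.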


What this means for you

  • Submissions are not lost. When a circuit trips, queued submissions wait safely and are retried once the circuit recovers.
  • Faster failure feedback. Instead of waiting for multiple network timeouts, you receive an immediate degraded-mode response while the outage resolves.
  • Reduced blast radius. A TrueLayer bank feed outage, for example, will no longer affect HMRC submission paths — each service has its own independent circuit.

Current limitations

  • Circuit breaker state is in-memory and per-instance. In a horizontally scaled deployment, each instance maintains its own counters independently. A shared state store (e.g. Redis) is on the roadmap for a future release.
  • Cached/degraded responses during the open state are informational — the platform will not submit stale data to HMRC on your behalf.
