Updated March 7, 2026

Incident Response — ISO-05

Status: Not yet deployed. The code and runbook described on this page were developed in PR #342 but could not be merged — the source branch was deleted before the merge completed. This page describes the intended behaviour once the branch is restored and merged.

This page documents the ISO/IEC 27001:2022 Control ISO-05 implementation: automated Slack on-call alerting and the formal incident response runbook.


Severity Tiers

| Severity | Label | First Response | Resolution Target | Example |
|----------|-------|----------------|-------------------|---------|
| P0 | Critical | 15 min | 2 hr | Encryption key failure; RLS bypass; submission engine down at a quarterly deadline |
| P1 | High | 1 hr | 4 hr | HMRC OAuth broken; submission failures >10%; mass bank token expiry |
| P2 | Medium | 4 hr | 24 hr | AgentOS sync failing for specific accounts; single-user submission error |
| P3 | Low | 24 hr | Next sprint | Notification email delayed; non-blocking UI glitch |
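The tiers above can be expressed as a typed lookup so that response deadlines are computed rather than remembered. This is a minimal sketch: the names `SLA`, `SlaTarget`, and `firstResponseDueAt` are hypothetical and are not taken from PR #342, and "Next sprint" is modelled as `null` because it has no fixed duration.

```typescript
type Severity = "P0" | "P1" | "P2" | "P3";

interface SlaTarget {
  label: string;
  firstResponseMinutes: number;
  // null means "next sprint": no fixed resolution window
  resolutionMinutes: number | null;
}

// Values copied from the severity table; the structure itself is an assumption.
const SLA: Record<Severity, SlaTarget> = {
  P0: { label: "Critical", firstResponseMinutes: 15, resolutionMinutes: 120 },
  P1: { label: "High", firstResponseMinutes: 60, resolutionMinutes: 240 },
  P2: { label: "Medium", firstResponseMinutes: 240, resolutionMinutes: 1440 },
  P3: { label: "Low", firstResponseMinutes: 1440, resolutionMinutes: null },
};

// Returns the time by which a first response is due for the given severity.
function firstResponseDueAt(severity: Severity, detectedAt: Date): Date {
  return new Date(detectedAt.getTime() + SLA[severity].firstResponseMinutes * 60_000);
}
```

For example, a P0 detected at 12:00 UTC would have a first response due at 12:15 UTC.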

Severity is derived automatically from the ErrorDomain tag attached to every captureError() call. Domains submission, hmrc, auth, bank, agentos, and inngest are classified as P0/P1 and trigger immediate alerting.
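As an illustration of the domain check, the sketch below tests whether a domain tag belongs to the alerting set. The six domain names come from this page; the constant and function names are hypothetical, and the split between P0 and P1 within these domains is not specified here, so the sketch only answers "does this domain alert?".

```typescript
// Domains this page lists as P0/P1. The names below are illustrative, not the real source.
const ALERTING_DOMAINS = ["submission", "hmrc", "auth", "bank", "agentos", "inngest"] as const;

type AlertingDomain = (typeof ALERTING_DOMAINS)[number];

// True when an error tagged with this domain should trigger immediate alerting.
function triggersImmediateAlert(domain: string): domain is AlertingDomain {
  return (ALERTING_DOMAINS as readonly string[]).indexOf(domain) !== -1;
}
```

A call such as `triggersImmediateAlert("hmrc")` returns `true`, while a domain outside the list (for instance a hypothetical `"ui"` domain) returns `false`.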


Automated Slack Alerting

When a P0/P1 error is captured, src/lib/incident-alert.ts fires a Slack Block Kit message to the configured #alerts-critical channel alongside the existing Sentry event.

What the alert contains

  • Severity badge (P0 🔴 / P1 🟠)
  • Error domain and operation name
  • Organisation ID (orgId)
  • Sanitised error summary (NINOs, bearer tokens, and long hex strings are stripped)
  • Response SLA reminder
  • Direct link to the incident response runbook
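A rough sketch of how such a message could be assembled is shown below. It is not the code from src/lib/incident-alert.ts: the `IncidentAlert` shape, the function names, the redaction placeholders, and the 32-character threshold for "long" hex strings are all assumptions made for illustration. The block types (`header`, `section`, `context`) are standard Slack Block Kit.

```typescript
// Hypothetical input shape; the real module may differ.
interface IncidentAlert {
  severity: "P0" | "P1";
  domain: string;
  operation: string;
  orgId: string;
  message: string;
  runbookUrl: string;
}

// Strips bearer tokens, NINOs, and long hex strings from free text.
// The patterns are deliberately broad, so some over-redaction is possible.
function sanitise(text: string): string {
  return text
    .replace(/Bearer\s+[A-Za-z0-9\-._~+\/]+=*/gi, "Bearer [REDACTED]")
    .replace(/\b[A-Z]{2}\s?\d{2}\s?\d{2}\s?\d{2}\s?[A-D]\b/gi, "[NINO]")
    .replace(/\b[0-9a-f]{32,}\b/gi, "[HEX]"); // 32+ chars is an assumed threshold
}

// Builds a Block Kit payload carrying the fields listed above.
function buildSlackPayload(alert: IncidentAlert) {
  const badge = alert.severity === "P0" ? "🔴 P0" : "🟠 P1";
  const sla = alert.severity === "P0" ? "15 min" : "1 hr";
  return {
    blocks: [
      { type: "header", text: { type: "plain_text", text: `${badge} incident in ${alert.domain}` } },
      {
        type: "section",
        fields: [
          { type: "mrkdwn", text: `*Operation:*\n${alert.operation}` },
          { type: "mrkdwn", text: `*Organisation:*\n${alert.orgId}` },
        ],
      },
      { type: "section", text: { type: "mrkdwn", text: sanitise(alert.message) } },
      {
        type: "context",
        elements: [
          { type: "mrkdwn", text: `First response due within ${sla} · <${alert.runbookUrl}|Incident response runbook>` },
        ],
      },
    ],
  };
}
```

Sanitising before the payload is built means raw identifiers never reach the Slack request body, even if the message layout changes later.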

Behaviour guarantees

  • Fire-and-forget — alerting runs in parallel with Sentry and never blocks the error-handling path
  • 5-second timeout — if the Slack webhook is unreachable, the call is aborted and the error is swallowed silently
  • Graceful no-op — if SLACK_INCIDENT_WEBHOOK_URL is not set, no alert is sent and no error is thrown
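The three guarantees above could be implemented along the following lines. This is a sketch under stated assumptions, not the merged code: `sendIncidentAlert` and `FetchLike` are hypothetical names, the fetch function is injected so the behaviour can be exercised without a network, and the boolean return value is added only to make the outcome observable.

```typescript
// Minimal fetch signature needed here; the global fetch satisfies it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string; signal: AbortSignal }
) => Promise<unknown>;

// Posts the payload to the webhook. Resolves to true if the request completed,
// false if it was skipped, timed out, or failed. Never rejects.
async function sendIncidentAlert(
  payload: unknown,
  webhookUrl: string | undefined,
  fetchImpl: FetchLike,
  timeoutMs: number = 5000
): Promise<boolean> {
  // Graceful no-op: nothing is sent when the webhook URL is not configured.
  if (!webhookUrl) return false;

  // Timeout: abort the request if Slack does not respond in time.
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    await fetchImpl(webhookUrl, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
      signal: controller.signal,
    });
    return true;
  } catch {
    // Swallowed silently so alerting can never break error handling.
    return false;
  } finally {
    clearTimeout(timer);
  }
}
```

A caller would invoke it without awaiting, for example `void sendIncidentAlert(payload, process.env.SLACK_INCIDENT_WEBHOOK_URL, fetch)`, which keeps the alert off the error-handling path.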

Configuration

Set the following environment variable in Vercel (and locally in .env.local):

# .env.example
SLACK_INCIDENT_WEBHOOK_URL=  # Slack Incoming Webhook URL for P0/P1 on-call alerts

To create the webhook:

  1. Go to https://api.slack.com/apps → select or create your app
  2. Navigate to Incoming Webhooks → Add New Webhook to Workspace
  3. Point it at your #alerts-critical channel
  4. Copy the webhook URL into the environment variable

On-Call Escalation Path

Automated alert (Sentry / Slack webhook)
  → Primary on-call (15 min SLA)
    → Secondary on-call if unacknowledged (30 min)
      → Engineering Lead page (45 min)
        → CEO notification for P0 with data breach potential (60 min)
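
One way to read the path above is as a list of time thresholds. The sketch below encodes that reading, and it rests on several assumptions: the times are minutes since the alert fired, the secondary and Engineering Lead steps apply only while the alert is unacknowledged, and the CEO step applies to any P0 with data breach potential whether or not it has been acknowledged. All names are hypothetical.

```typescript
interface Incident {
  severity: "P0" | "P1";
  dataBreachPotential: boolean;
  acknowledged: boolean;
}

interface EscalationStep {
  role: string;
  afterMinutes: number;
  applies: (incident: Incident) => boolean;
}

// Thresholds taken from the escalation path; the conditions are assumptions.
const ESCALATION_PATH: EscalationStep[] = [
  { role: "Primary on-call", afterMinutes: 0, applies: () => true },
  { role: "Secondary on-call", afterMinutes: 30, applies: (i) => !i.acknowledged },
  { role: "Engineering Lead", afterMinutes: 45, applies: (i) => !i.acknowledged },
  { role: "CEO", afterMinutes: 60, applies: (i) => i.severity === "P0" && i.dataBreachPotential },
];

// Lists every role that should have been contacted by the given elapsed time.
function rolesToNotify(incident: Incident, minutesSinceAlert: number): string[] {
  return ESCALATION_PATH
    .filter((step) => minutesSinceAlert >= step.afterMinutes && step.applies(incident))
    .map((step) => step.role);
}
```

Under these assumptions, an unacknowledged P1 at 50 minutes has reached the primary, the secondary, and the Engineering Lead, but not the CEO.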

The escalation contact table (names, Slack handles, phone numbers) must be filled in docs/INCIDENT-RESPONSE.md before go-live.


Domain-Specific Runbooks

The full runbook in docs/INCIDENT-RESPONSE.md covers the following critical paths:

| Domain | Example incident | Containment action |
|--------|------------------|--------------------|
| submission / hmrc | HMRC submission engine failure | Check HMRC API status; pause Inngest crons; check for rows stuck in submitting status |
| auth | Encryption or key management failure | Rotate HMRC_TOKEN_ENCRYPTION_KEY / NINO_ENCRYPTION_KEY; restart Vercel deployment |
| auth | RLS / data isolation bypass | Disable affected org immediately; notify DPO; treat as P0 data breach |
| bank | TrueLayer / bank feed outage | Trigger bulk re-consent email; disable sync crons |
| agentos | AgentOS sync failure | Check internal ops channel; verify AgentOS status |
| inngest | Background job failure after all retries | Check Inngest dashboard; review function failure logs |

Post-Incident Review

Post-incident reviews are mandatory for P0 and P1 and must be completed within 5 business days of resolution.

The review covers:

  1. Timeline reconstruction
  2. Contributing factors (code, process, tooling)
  3. GitHub Issues created for each remediation action
  4. Runbook updates if procedures need changing
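
To make the 5-business-day window concrete, the helper below computes a review due date from a resolution date. It is a simplified sketch: `addBusinessDays` is a hypothetical name, weekends are skipped, UK bank holidays are not, and dates are handled in UTC.

```typescript
// Returns the date that falls `days` business days after `start`.
// Saturdays and Sundays are skipped; bank holidays are NOT handled (assumption).
function addBusinessDays(start: Date, days: number): Date {
  const result = new Date(start.getTime());
  let remaining = days;
  while (remaining > 0) {
    result.setUTCDate(result.getUTCDate() + 1);
    const dayOfWeek = result.getUTCDay(); // 0 = Sunday, 6 = Saturday
    if (dayOfWeek !== 0 && dayOfWeek !== 6) remaining--;
  }
  return result;
}

// Review due date for an incident resolved at the given time.
function reviewDueBy(resolvedAt: Date): Date {
  return addBusinessDays(resolvedAt, 5);
}
```

An incident resolved on Friday 6 March 2026 would have its review due by Friday 13 March 2026.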

Regulatory Deadlines

Certain dates elevate any P1 incident to P0 urgency:

  • 31 January — Self-assessment crystallisation deadline
  • Quarterly MTD filing dates (5 Aug, 5 Nov, 5 Feb, 5 May)
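
The elevation rule could be sketched as a date check applied after severity is derived. The dates below are the ones listed above; everything else is an assumption, including the function names, the choice to elevate only on the calendar date itself, and the use of UTC rather than UK local time.

```typescript
type Severity = "P0" | "P1" | "P2" | "P3";

// Month is 1-based. Dates copied from the list above.
const REGULATORY_DEADLINES: ReadonlyArray<{ month: number; day: number }> = [
  { month: 1, day: 31 }, // Self-assessment deadline
  { month: 8, day: 5 },
  { month: 11, day: 5 },
  { month: 2, day: 5 },
  { month: 5, day: 5 },
];

// True when the given date (in UTC) is one of the listed deadline dates.
function isRegulatoryDeadline(date: Date): boolean {
  const month = date.getUTCMonth() + 1;
  const day = date.getUTCDate();
  return REGULATORY_DEADLINES.some((d) => d.month === month && d.day === day);
}

// Elevates P1 to P0 on a deadline date; all other severities are unchanged.
function effectiveSeverity(severity: Severity, date: Date): Severity {
  return severity === "P1" && isRegulatoryDeadline(date) ? "P0" : severity;
}
```

A real implementation might widen the window to cover the days leading up to each deadline; this sketch does not.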

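UK GDPR Article 33: a personal data breach must be reported to the ICO without undue delay and, where feasible, within 72 hours of becoming aware of it. For incidents involving actual or potential data breaches, the DPO must therefore be notified inside that 72-hour window. The on-call escalation path to the CEO covers this internal notification requirement.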


External Status Pages

| Dependency | Status URL |
|------------|------------|
| HMRC API | https://api.service.hmrc.gov.uk/public/status |
| TrueLayer | https://status.truelayer.com |
| Vercel | https://www.vercel-status.com |
| Neon | https://neonstatus.com |
| Sentry | https://status.sentry.io |
| Inngest | https://status.inngest.com |