Incident Response — ISO-05
Status: Not yet deployed. The code and runbook described on this page were developed in PR #342 but could not be merged — the source branch was deleted before the merge completed. This page describes the intended behaviour once the branch is restored and merged.
This page documents the implementation of control ISO-05 (ISO/IEC 27001:2022): automated Slack on-call alerting and the formal incident response runbook.
Severity Tiers
| Severity | Label | First Response | Resolution Target | Example |
|---|---|---|---|---|
| P0 | Critical | 15 min | 2 hr | Encryption key failure; RLS bypass; submission engine down at a quarterly deadline |
| P1 | High | 1 hr | 4 hr | HMRC OAuth broken; submission failures >10%; mass bank token expiry |
| P2 | Medium | 4 hr | 24 hr | AgentOS sync failing for specific accounts; single-user submission error |
| P3 | Low | 24 hr | Next sprint | Notification email delayed; non-blocking UI glitch |
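The targets in the table translate directly into clock deadlines. The following is a minimal sketch that assumes both clocks start when the incident is detected. SLA and slaDeadlines are hypothetical names, not taken from PR #342. Because P3's "next sprint" target has no fixed clock, the sketch models it as null.

```typescript
// Hypothetical helper, not part of PR #342: turns the severity table into clock deadlines.
export type Severity = "P0" | "P1" | "P2" | "P3";

interface SlaPolicy {
  firstResponseMinutes: number;
  resolutionMinutes: number | null; // null = "next sprint", no fixed clock target
}

// Values copied from the severity table, converted to minutes.
const SLA: Record<Severity, SlaPolicy> = {
  P0: { firstResponseMinutes: 15, resolutionMinutes: 2 * 60 },
  P1: { firstResponseMinutes: 60, resolutionMinutes: 4 * 60 },
  P2: { firstResponseMinutes: 4 * 60, resolutionMinutes: 24 * 60 },
  P3: { firstResponseMinutes: 24 * 60, resolutionMinutes: null },
};

const MINUTE_MS = 60_000;

// Deadlines measured from the moment the incident is detected (an assumption).
export function slaDeadlines(severity: Severity, detectedAt: Date) {
  const { firstResponseMinutes, resolutionMinutes } = SLA[severity];
  const start = detectedAt.getTime();
  return {
    firstResponseBy: new Date(start + firstResponseMinutes * MINUTE_MS),
    resolveBy: resolutionMinutes === null ? null : new Date(start + resolutionMinutes * MINUTE_MS),
  };
}
```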
Severity is derived automatically from the ErrorDomain tag attached to every captureError() call. Domains submission, hmrc, auth, bank, agentos, and inngest are classified as P0/P1 and trigger immediate alerting.
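The classification rule amounts to a membership check on the domain tag. This sketch only separates alerting (P0/P1) domains from all others, because this page does not say which domains map to P0 and which map to P1. ALERTING_DOMAINS and isAlertingDomain are illustrative names, and the real ErrorDomain type may include values not listed here.

```typescript
// Domains this page classifies as P0/P1. The real ErrorDomain type may include
// further values; anything not listed here is treated as non-alerting.
const ALERTING_DOMAINS = ["submission", "hmrc", "auth", "bank", "agentos", "inngest"] as const;

export type AlertingDomain = (typeof ALERTING_DOMAINS)[number];

const alertingSet: ReadonlySet<string> = new Set<string>(ALERTING_DOMAINS);

// Type guard: true when a domain passed to captureError() should page on-call.
export function isAlertingDomain(domain: string): domain is AlertingDomain {
  return alertingSet.has(domain);
}
```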
Automated Slack Alerting
When a P0/P1 error is captured, src/lib/incident-alert.ts fires a Slack Block Kit message to the configured #alerts-critical channel alongside the existing Sentry event.
What the alert contains
- Severity badge (P0 🔴 / P1 🟠)
- Error domain and operation name
- Organisation ID (orgId)
- Sanitised error summary (NINOs, bearer tokens, and long hex strings are stripped)
- Response SLA reminder
- Direct link to the incident response runbook
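The list above can be sketched as a sanitiser plus a Block Kit payload builder. This is not the code in src/lib/incident-alert.ts. The redaction regexes are approximations, and so are the 32-character hex threshold and the 500-character truncation. The names sanitise, buildIncidentPayload and the runbookUrl parameter are hypothetical. The header, section, fields and mrkdwn structure follows Slack's Block Kit format. Slack requires &, < and > to be escaped in mrkdwn text.

```typescript
export type AlertSeverity = "P0" | "P1";

export interface IncidentAlertInput {
  severity: AlertSeverity;
  domain: string;
  operation: string;
  orgId: string;
  message: string;
}

// Approximate redaction patterns (assumptions); tune against real error text.
const REDACTIONS: ReadonlyArray<[RegExp, string]> = [
  // UK National Insurance number: two letters, six digits, suffix A-D, optional spaces.
  [/\b[A-Z]{2}\s?\d{2}\s?\d{2}\s?\d{2}\s?[A-D]\b/gi, "[NINO]"],
  // Bearer tokens, e.g. copied from an Authorization header.
  [/Bearer\s+[A-Za-z0-9\-._~+\/]+=*/gi, "Bearer [TOKEN]"],
  // Long hex strings such as keys or hashes; 32 characters is an arbitrary threshold.
  [/\b[0-9a-f]{32,}\b/gi, "[HEX]"],
];

// Slack mrkdwn requires &, < and > to be escaped in user-supplied text.
function escapeMrkdwn(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Strips sensitive substrings, then truncates to keep the Slack message short.
export function sanitise(text: string, maxLength = 500): string {
  let out = text;
  for (const [pattern, replacement] of REDACTIONS) out = out.replace(pattern, replacement);
  return out.length > maxLength ? `${out.slice(0, maxLength)}...` : out;
}

const BADGE: Record<AlertSeverity, string> = { P0: "P0 🔴", P1: "P1 🟠" };
const RESPONSE_SLA: Record<AlertSeverity, string> = { P0: "15 min", P1: "1 hr" };

export function buildIncidentPayload(input: IncidentAlertInput, runbookUrl: string) {
  const badge = BADGE[input.severity];
  return {
    // Fallback text for notifications and clients that cannot render blocks.
    text: `${badge} ${input.domain}: ${input.operation}`,
    blocks: [
      { type: "header", text: { type: "plain_text", text: `${badge} incident in ${input.domain}` } },
      {
        type: "section",
        fields: [
          { type: "mrkdwn", text: `*Domain:*\n${input.domain}` },
          { type: "mrkdwn", text: `*Operation:*\n${escapeMrkdwn(input.operation)}` },
          { type: "mrkdwn", text: `*Org ID:*\n${input.orgId}` },
          { type: "mrkdwn", text: `*Respond within:*\n${RESPONSE_SLA[input.severity]}` },
        ],
      },
      { type: "section", text: { type: "mrkdwn", text: `*Summary:* ${escapeMrkdwn(sanitise(input.message))}` } },
      { type: "section", text: { type: "mrkdwn", text: `<${runbookUrl}|Incident response runbook>` } },
    ],
  };
}
```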
Behaviour guarantees
- Fire-and-forget: alerting runs in parallel with Sentry and never blocks the error-handling path
- 5-second timeout: if the Slack webhook is unreachable, the call is aborted and the error is swallowed silently
- Graceful no-op: if SLACK_INCIDENT_WEBHOOK_URL is not set, no alert is sent and no error is thrown
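The three guarantees can be combined in one small function. This is a sketch that assumes Node 18+ (global fetch and AbortSignal.timeout). The name sendIncidentAlert and the injectable fetchImpl parameter are hypothetical choices made for testability. On Vercel, work left running after a serverless function returns may be cut short. This page does not say whether the real module uses something like waitUntil from @vercel/functions to handle this.

```typescript
// Minimal fetch signature so a stub can be injected in tests (a sketch choice).
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string; signal: AbortSignal },
) => Promise<unknown>;

const SLACK_TIMEOUT_MS = 5_000;

// Fire-and-forget: returns immediately and never throws or leaves a rejected promise
// unhandled. Assumes Node 18+ for global fetch and AbortSignal.timeout.
export function sendIncidentAlert(payload: object, fetchImpl: FetchLike = fetch): void {
  const url = process.env.SLACK_INCIDENT_WEBHOOK_URL;
  if (!url) return; // graceful no-op when the webhook is not configured

  let request: Promise<unknown>;
  try {
    request = fetchImpl(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
      signal: AbortSignal.timeout(SLACK_TIMEOUT_MS), // aborts the request after 5 seconds
    });
  } catch {
    return; // a synchronous throw is swallowed as well
  }
  // Network failures and timeouts reject and are swallowed here;
  // non-2xx responses resolve and are simply ignored.
  request.catch(() => {});
}
```

The caller's error handler would typically call this next to the Sentry capture, without awaiting it.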
Configuration
Set the following environment variable in Vercel (and locally in .env.local):
# .env.example
SLACK_INCIDENT_WEBHOOK_URL= # Slack Incoming Webhook URL for P0/P1 on-call alerts
To create the webhook:
- Go to https://api.slack.com/apps → select or create your app
- Navigate to Incoming Webhooks → Add New Webhook to Workspace
- Point it at your #alerts-critical channel
- Copy the webhook URL into the environment variable
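Before go-live, it is worth confirming that the copied URL actually posts to the channel. The helpers below are a hypothetical sketch and assume Node 18+. Slack Incoming Webhook URLs begin with https://hooks.slack.com/services/, and a successful post returns HTTP 200 with the body "ok". The helper names and the test message text are illustrative.

```typescript
// Hypothetical helpers for checking a freshly created webhook; assumes Node 18+.
const SLACK_WEBHOOK_PREFIX = "https://hooks.slack.com/services/";

// Incoming Webhook URLs issued by Slack start with this prefix.
export function isSlackWebhookUrl(value: string | undefined): value is string {
  return typeof value === "string" && value.startsWith(SLACK_WEBHOOK_PREFIX);
}

// Posts a plain-text test message. Slack answers 200 with body "ok" on success.
export async function postTestMessage(webhookUrl: string): Promise<void> {
  const res = await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text: "ISO-05 webhook test, please ignore" }),
    signal: AbortSignal.timeout(5_000),
  });
  if (!res.ok) {
    // Error responses carry a short reason string, e.g. "invalid_payload".
    throw new Error(`Slack webhook returned ${res.status}: ${await res.text()}`);
  }
}
```

To run the check from a one-off script, read SLACK_INCIDENT_WEBHOOK_URL, validate it with isSlackWebhookUrl, then await postTestMessage(url) and confirm that the message appears in #alerts-critical.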
On-Call Escalation Path
Automated alert (Sentry / Slack webhook)
→ Primary on-call (15 min SLA)
→ Secondary on-call if unacknowledged (30 min)
→ Engineering Lead page (45 min)
→ CEO notification for P0 with data breach potential (60 min)
The escalation contact table (names, Slack handles, phone numbers) in docs/INCIDENT-RESPONSE.md must be filled in before go-live.
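The escalation path can be expressed as data plus a single filter. This sketch rests on two assumptions that this page does not state: the clock starts when the automated alert fires, and each step is triggered only by elapsed time without acknowledgement. tiersToNotify and the tier names are hypothetical. The actual contacts for each tier would come from the table in docs/INCIDENT-RESPONSE.md.

```typescript
export type Severity = "P0" | "P1" | "P2" | "P3";
export type Tier = "primary" | "secondary" | "engineering-lead" | "ceo";

interface EscalationStep {
  tier: Tier;
  afterMinutes: number; // minutes since the alert with no acknowledgement
  p0BreachOnly?: boolean; // CEO step applies only to P0 with data breach potential
}

// Thresholds taken from the escalation path above.
const ESCALATION: readonly EscalationStep[] = [
  { tier: "primary", afterMinutes: 0 },
  { tier: "secondary", afterMinutes: 30 },
  { tier: "engineering-lead", afterMinutes: 45 },
  { tier: "ceo", afterMinutes: 60, p0BreachOnly: true },
];

// Everyone who should have been notified after the given unacknowledged time.
export function tiersToNotify(minutesUnacknowledged: number, severity: Severity, breachPotential: boolean): Tier[] {
  return ESCALATION.filter(
    (step) =>
      minutesUnacknowledged >= step.afterMinutes &&
      (!step.p0BreachOnly || (severity === "P0" && breachPotential)),
  ).map((step) => step.tier);
}
```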
Domain-Specific Runbooks
The full runbook in docs/INCIDENT-RESPONSE.md covers the following critical paths:
| Domain | Example incident | Containment action |
|---|---|---|
| submission / hmrc | HMRC submission engine failure | Check HMRC API status; pause Inngest crons; check for rows stuck in submitting status |
| auth | Encryption or key management failure | Rotate HMRC_TOKEN_ENCRYPTION_KEY / NINO_ENCRYPTION_KEY; restart Vercel deployment |
| auth | RLS / data isolation bypass | Disable affected org immediately; notify DPO; treat as P0 data breach |
| bank | TrueLayer / bank feed outage | Trigger bulk re-consent email; disable sync crons |
| agentos | AgentOS sync failure | Check internal ops channel; verify AgentOS status |
| inngest | Background job failure after all retries | Check Inngest dashboard; review function failure logs |
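The "rows stuck in submitting status" check reduces to filtering by status and age. The following sketch works over in-memory rows. In practice this check would probably be a database query. SubmissionRow, its updatedAt field and the 15-minute threshold are all hypothetical, since the real schema is not documented here.

```typescript
// Hypothetical row shape: the real table, column names and status values beyond
// "submitting" are not documented on this page.
export interface SubmissionRow {
  id: string;
  status: string;
  updatedAt: Date;
}

// Returns rows still marked "submitting" after thresholdMinutes without an update.
// The 15-minute default is an arbitrary assumption, not a documented value.
export function findStuckSubmissions(
  rows: readonly SubmissionRow[],
  now: Date,
  thresholdMinutes = 15,
): SubmissionRow[] {
  const cutoff = now.getTime() - thresholdMinutes * 60_000;
  return rows.filter((row) => row.status === "submitting" && row.updatedAt.getTime() <= cutoff);
}
```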
Post-Incident Review
Post-incident reviews are mandatory for P0 and P1 and must be completed within 5 business days of resolution.
The review covers:
- Timeline reconstruction
- Contributing factors (code, process, tooling)
- GitHub Issues created for each remediation action
- Runbook updates if procedures need changing
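The five-business-day deadline is easy to miscount around weekends. This minimal sketch counts Monday to Friday in UTC. It does not skip UK bank holidays, which a real implementation would need to handle, for example by using a maintained holiday list. addBusinessDays is a hypothetical name.

```typescript
// Adds business days (Monday to Friday) in UTC. UK bank holidays are NOT skipped.
export function addBusinessDays(start: Date, days: number): Date {
  const result = new Date(start.getTime());
  let added = 0;
  while (added < days) {
    result.setUTCDate(result.getUTCDate() + 1);
    const weekday = result.getUTCDay(); // 0 = Sunday, 6 = Saturday
    if (weekday !== 0 && weekday !== 6) added++;
  }
  return result;
}
```

For example, an incident resolved on Friday 3 January 2025 has its review due by Friday 10 January 2025.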
Regulatory Deadlines
Certain dates elevate any P1 incident to P0 urgency:
- 31 January — Self-assessment crystallisation deadline
- Quarterly MTD filing dates (5 Aug, 5 Nov, 5 Feb, 5 May)
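The elevation rule can be applied mechanically when an alert is raised. This sketch assumes that elevation applies only on the deadline day itself, judged in UK local time. The page does not say whether the run-up to a deadline also counts. effectiveSeverity and isRegulatoryDeadline are hypothetical names.

```typescript
export type Severity = "P0" | "P1" | "P2" | "P3";

// [month, day] pairs from the list above (month is 1-based).
const REGULATORY_DEADLINES: ReadonlyArray<readonly [number, number]> = [
  [1, 31], // self-assessment deadline
  [8, 5],
  [11, 5],
  [2, 5],
  [5, 5], // quarterly MTD filing dates
];

// Reads the calendar date in UK local time, so during British Summer Time
// 23:30 UTC on 4 August counts as 5 August.
const ukDate = new Intl.DateTimeFormat("en-GB", { timeZone: "Europe/London", month: "numeric", day: "numeric" });

export function isRegulatoryDeadline(at: Date): boolean {
  const parts = ukDate.formatToParts(at);
  const month = Number(parts.find((p) => p.type === "month")?.value);
  const day = Number(parts.find((p) => p.type === "day")?.value);
  return REGULATORY_DEADLINES.some(([m, d]) => m === month && d === day);
}

// Only P1 is elevated, as stated on this page; other severities pass through unchanged.
export function effectiveSeverity(severity: Severity, at: Date): Severity {
  return severity === "P1" && isRegulatoryDeadline(at) ? "P0" : severity;
}
```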
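UK GDPR Article 33: where an incident involves a personal data breach, the controller must notify the ICO without undue delay and, where feasible, within 72 hours of becoming aware of it, unless the breach is unlikely to result in a risk to individuals' rights and freedoms. The on-call escalation path to the CEO (and the DPO notification in the RLS bypass runbook) starts this process internally; the notification to the ICO is a separate step.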
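The 72-hour window starts when the organisation becomes aware of the breach, not when the incident is resolved. For that reason, it helps to compute the deadline explicitly at the moment of awareness. This minimal sketch uses hypothetical function names and counts elapsed clock hours, evenings and weekends included.

```typescript
const NOTIFICATION_WINDOW_HOURS = 72;
const HOUR_MS = 3_600_000;

// Latest time to notify: awareness time plus 72 elapsed clock hours.
export function breachNotificationDeadline(awareAt: Date): Date {
  return new Date(awareAt.getTime() + NOTIFICATION_WINDOW_HOURS * HOUR_MS);
}

// Hours left before the deadline; negative once it has passed.
export function hoursRemaining(awareAt: Date, now: Date): number {
  return (breachNotificationDeadline(awareAt).getTime() - now.getTime()) / HOUR_MS;
}
```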
External Status Pages
| Dependency | Status URL |
|---|---|
| HMRC API | https://api.service.hmrc.gov.uk/public/status |
| TrueLayer | https://status.truelayer.com |
| Vercel | https://www.vercel-status.com |
| Neon | https://neonstatus.com |
| Sentry | https://status.sentry.io |
| Inngest | https://status.inngest.com |