# SOC2-09: Configuring Critical Failure Alerting
This page documents the SOC 2 control gap identified in v0.1.146 and provides a step-by-step guide for remediating missing critical failure alerting across the platform.
## Background
SOC 2 control SOC2-09 requires that the organisation has monitoring and alerting mechanisms in place to notify responsible personnel of critical system failures in a timely manner. An audit identified that the following failure scenarios were not generating any alerts:
| Failure Scenario | Where It Occurs | Risk |
|---|---|---|
| Nightly OFSI sanctions sync fails | nightly-sync.yml workflow | Stale sanctions data served to compliance users |
| Monthly billing job errors | monthly-billing.yml workflow | Revenue loss, customer impact |
| `/api/health` returns 503 | Production runtime | Application unavailability undetected |
No Slack, PagerDuty, email, or webhook destinations were configured for any of these scenarios.
## Remediation Steps
### 1. GitHub Actions — Slack Webhook Notifications
Add a failure notification step to each critical workflow. The step runs only when a preceding step has failed, via the `if: failure()` condition.
Prerequisites:
- Create a Slack incoming webhook URL at api.slack.com/messaging/webhooks.
- Store the URL as a GitHub Actions secret named `SLACK_WEBHOOK_URL`.
Add the following to `nightly-sync.yml` and `monthly-billing.yml`:
```yaml
jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      # ... existing steps ...
      - name: Notify Slack on failure
        if: failure()
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {
              "text": "🚨 *${{ github.workflow }}* failed.\nBranch: `${{ github.ref }}`\nRun: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
            }
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
```
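Before relying on the workflow step, it can be worth confirming locally that the webhook accepts the payload shape. A minimal sketch using only the Python standard library — the helper names and the example run URL are illustrative, and `SLACK_WEBHOOK_URL` is assumed to be exported in your shell:

```python
import json
import os
import urllib.request

def build_payload(workflow: str, ref: str, run_url: str) -> str:
    """Return the JSON body for a Slack incoming-webhook failure alert,
    mirroring the ${{ github.* }} expressions used in the workflow step."""
    return json.dumps({
        "text": f"🚨 *{workflow}* failed.\nBranch: `{ref}`\nRun: {run_url}"
    })

def send_test_alert() -> None:
    """POST a sample alert to the webhook stored in SLACK_WEBHOOK_URL."""
    body = build_payload("nightly-sync", "refs/heads/main",
                         "https://github.com/org/repo/actions/runs/1")
    req = urllib.request.Request(
        os.environ["SLACK_WEBHOOK_URL"],
        data=body.encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)  # Slack replies with plain "ok" on success
```

Calling `send_test_alert()` with the secret exported should post a test message to the alerting channel.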
Tip: The same pattern can be used with PagerDuty or any webhook-based incident management tool by swapping the `uses` action and payload format.
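For PagerDuty specifically, the equivalent would target the Events API v2. A sketch of the trigger payload — the routing-key placeholder, summary text, and helper name are illustrative:

```python
import json

# Events API v2 endpoint for triggering incidents
PAGERDUTY_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"

def build_pagerduty_event(routing_key: str, workflow: str, run_url: str) -> str:
    """Return an Events API v2 'trigger' payload for a failed workflow."""
    return json.dumps({
        "routing_key": routing_key,   # integration key of the PagerDuty service
        "event_action": "trigger",
        "payload": {
            "summary": f"{workflow} failed",
            "source": "github-actions",
            "severity": "critical",
            "custom_details": {"run_url": run_url},
        },
    })
```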
### 2. Runtime Alerting via Sentry
If Sentry is already integrated for error tracking, configure alert rules to notify on-call engineers:
- Navigate to Sentry → Alerts → Create Alert Rule.
- Set the condition to trigger on: "Number of events is greater than 0 in 1 minute" for issues with level `fatal` or `error`.
- Add a notification action targeting the appropriate Slack channel or PagerDuty service.
- Scope the rule to the `production` environment.
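The rule above amounts to: alert when at least one `error`- or `fatal`-level event arrives from the `production` environment within the one-minute window. Sketched as plain Python — the event shape is a stand-in for illustration, not the Sentry SDK's:

```python
from dataclasses import dataclass

@dataclass
class Event:
    level: str        # e.g. "error", "fatal", "warning"
    environment: str  # e.g. "production", "staging"

def rule_fires(events_in_window: list[Event]) -> bool:
    """True if any error/fatal production event occurred in the window."""
    return any(
        e.level in ("error", "fatal") and e.environment == "production"
        for e in events_in_window
    )
```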
### 3. Uptime Monitoring for `/api/health`
Configure an external uptime monitor to poll the health endpoint and alert when the application is degraded or unavailable.
Recommended tools: Better Uptime, Checkly, Pingdom
Configuration:
| Setting | Value |
|---|---|
| URL | https://<your-domain>/api/health |
| Method | GET |
| Check interval | Every 1 minute |
| Alert condition | HTTP status != 200 (or >= 500) |
| Alert channel | Slack / PagerDuty / email |
| Confirmation period | 2 consecutive failures before alerting |
A healthy response from /api/health should return HTTP 200. A 503 response indicates the application or a critical dependency (e.g. database, OFSI sync) is unhealthy.
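The confirmation-period setting exists to avoid paging on a single transient blip. The consecutive-failure logic a monitor applies can be sketched as follows — the threshold default matches the table, and the function name is an assumption:

```python
def should_alert(statuses: list[int], threshold: int = 2) -> bool:
    """Alert only after `threshold` consecutive non-200 health responses."""
    consecutive = 0
    for status in statuses:
        if status != 200:
            consecutive += 1
            if consecutive >= threshold:
                return True
        else:
            consecutive = 0  # a healthy response resets the counter
    return False
```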
## Required Secrets
The following secrets must be added to your GitHub repository and/or CI environment:
| Secret | Description |
|---|---|
| `SLACK_WEBHOOK_URL` | Incoming webhook URL for your Slack alerting channel |
## Compliance Status
| Control | Status | Resolved In |
|---|---|---|
| SOC2-09 — Critical Failure Alerting | ⚠️ Open | Pending — see remediation above |
Once the steps above are implemented and verified, this control can be marked as remediated in the next SOC 2 audit cycle.