# SOC2-09: Configuring Critical Failure Alerting
This page documents the SOC 2 control gap identified in v0.1.146 and provides a step-by-step guide for remediating missing critical failure alerting across the platform.
## Background
SOC 2 control SOC2-09 requires that the organisation has monitoring and alerting mechanisms in place to notify responsible personnel of critical system failures in a timely manner. An audit identified that the following failure scenarios were not generating any alerts:
| Failure Scenario | Where It Occurs | Risk |
|---|---|---|
| Nightly OFSI sanctions sync fails | nightly-sync.yml workflow | Stale sanctions data served to compliance users |
| Monthly billing job errors | monthly-billing.yml workflow | Revenue loss, customer impact |
| `/api/health` returns 503 | Production runtime | Application unavailability undetected |
No Slack, PagerDuty, email, or webhook destinations were configured for any of these scenarios.
## Remediation Steps
### 1. GitHub Actions — Slack Webhook Notifications
Add a failure notification step to each critical workflow. The step runs only when a preceding step has failed, via the `if: failure()` condition.
Prerequisites:
- Create a Slack incoming webhook URL at api.slack.com/messaging/webhooks.
- Store the URL as a GitHub Actions secret named `SLACK_WEBHOOK_URL`.
Add the following to `nightly-sync.yml` and `monthly-billing.yml`:
```yaml
jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      # ... existing steps ...
      - name: Notify Slack on failure
        if: failure()
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {
              "text": "🚨 *${{ github.workflow }}* failed.\nBranch: `${{ github.ref }}`\nRun: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
            }
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
```
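Before relying on the workflow step, it can be worth confirming locally that the webhook accepts the payload shape. A minimal sketch using only the Python standard library — the helper names and the example run URL are illustrative, and `SLACK_WEBHOOK_URL` is assumed to be exported in your shell:

```python
import json
import os
import urllib.request

def build_payload(workflow: str, ref: str, run_url: str) -> str:
    """Return the JSON body for a Slack incoming-webhook failure alert,
    mirroring the ${{ github.* }} expressions used in the workflow step."""
    return json.dumps({
        "text": f"🚨 *{workflow}* failed.\nBranch: `{ref}`\nRun: {run_url}"
    })

def send_test_alert() -> None:
    """POST a sample alert to the webhook stored in SLACK_WEBHOOK_URL."""
    body = build_payload("nightly-sync", "refs/heads/main",
                         "https://github.com/org/repo/actions/runs/1")
    req = urllib.request.Request(
        os.environ["SLACK_WEBHOOK_URL"],
        data=body.encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)  # Slack replies with plain "ok" on success
```

Calling `send_test_alert()` with the secret exported should post a test message to the alerting channel.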
Tip: The same pattern can be used with PagerDuty or any webhook-based incident management tool by swapping the `uses` action and payload format.
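For PagerDuty specifically, the equivalent would target the Events API v2. A sketch of the trigger payload — the routing-key placeholder, summary text, and helper name are illustrative:

```python
import json

# Events API v2 endpoint for triggering incidents
PAGERDUTY_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"

def build_pagerduty_event(routing_key: str, workflow: str, run_url: str) -> str:
    """Return an Events API v2 'trigger' payload for a failed workflow."""
    return json.dumps({
        "routing_key": routing_key,   # integration key of the PagerDuty service
        "event_action": "trigger",
        "payload": {
            "summary": f"{workflow} failed",
            "source": "github-actions",
            "severity": "critical",
            "custom_details": {"run_url": run_url},
        },
    })
```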
### 2. Runtime Alerting via Sentry
If Sentry is already integrated for error tracking, configure alert rules to notify on-call engineers:
- Navigate to Sentry → Alerts → Create Alert Rule.
- Set the condition to trigger on: "Number of events is greater than 0 in 1 minute" for issues with level `fatal` or `error`.
- Add a notification action targeting the appropriate Slack channel or PagerDuty service.
- Scope the rule to the `production` environment.
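The rule above amounts to: alert when at least one `error`- or `fatal`-level event arrives from the `production` environment within the one-minute window. Sketched as plain Python — the event shape is a stand-in for illustration, not the Sentry SDK's:

```python
from dataclasses import dataclass

@dataclass
class Event:
    level: str        # e.g. "error", "fatal", "warning"
    environment: str  # e.g. "production", "staging"

def rule_fires(events_in_window: list[Event]) -> bool:
    """True if any error/fatal production event occurred in the window."""
    return any(
        e.level in ("error", "fatal") and e.environment == "production"
        for e in events_in_window
    )
```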
### 3. Uptime Monitoring for `/api/health`
Configure an external uptime monitor to poll the health endpoint and alert when the application is degraded or unavailable.
Recommended tools: Better Uptime, Checkly, Pingdom
Configuration:
| Setting | Value |
|---|---|
| URL | https://<your-domain>/api/health |
| Method | GET |
| Check interval | Every 1 minute |
| Alert condition | HTTP status != 200 (or >= 500) |
| Alert channel | Slack / PagerDuty / email |
| Confirmation period | 2 consecutive failures before alerting |
A healthy response from /api/health should return HTTP 200. A 503 response indicates the application or a critical dependency (e.g. database, OFSI sync) is unhealthy.
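The confirmation-period setting exists to avoid paging on a single transient blip. The consecutive-failure logic a monitor applies can be sketched as follows — the threshold default matches the table, and the function name is an assumption:

```python
def should_alert(statuses: list[int], threshold: int = 2) -> bool:
    """Alert only after `threshold` consecutive non-200 health responses."""
    consecutive = 0
    for status in statuses:
        if status != 200:
            consecutive += 1
            if consecutive >= threshold:
                return True
        else:
            consecutive = 0  # a healthy response resets the counter
    return False
```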
## Required Secrets
The following secrets must be added to your GitHub repository and/or CI environment:
| Secret | Description |
|---|---|
| `SLACK_WEBHOOK_URL` | Incoming webhook URL for your Slack alerting channel |
## Compliance Status
| Control | Status | Resolved In |
|---|---|---|
| SOC2-09 — Critical Failure Alerting | ⚠️ Open | Pending — see remediation above |
Once the steps above are implemented and verified, this control can be marked as remediated in the next SOC 2 audit cycle.