Calmony Sanctions Monitor | Features | Updated March 12, 2026

ERR-24: Critical Error Alerting — Closing the P1 Observability Gap

Version: v0.1.132
Category: Error Monitoring
Severity: P1 — No alerting infrastructure exists


Overview

An audit of the sanctions screening platform's observability stack (control ERR-24) found that no alerting infrastructure is in place for P1-level errors. This means that critical failures — including nightly OFSI sync failures, database outages, and Stripe webhook processing errors — currently produce no notifications to on-call engineers or operations teams.

This post documents the identified gaps and the recommended steps to close them.


Identified Gaps

1. Nightly Sanctions Sync — No Failure Notifications

The nightly sync GitHub Actions workflow (nightly-sync.yml) downloads and processes the OFSI consolidated sanctions list. If this workflow fails, no notification is sent. Compliance teams and engineers have no automated signal that the sanctions data may be stale.

2. Database Health Check — No Uptime Alerting

The /api/health endpoint returns an HTTP 503 when the database is unavailable. However, no uptime monitor is pointed at this endpoint. A prolonged DB outage would go undetected unless observed directly.
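For reference, the endpoint's contract can be sketched as below. The actual handler is not shown in this post, so the names here are hypothetical; only the status-code behavior (200 when the database responds, 503 when it does not) comes from the description above.

```typescript
// Hypothetical sketch of the /api/health contract; the real handler
// is not shown in this post. A successful DB ping maps to HTTP 200,
// an unreachable database to HTTP 503.
export interface HealthPayload { ok: boolean; db: 'up' | 'down' }

export function healthResponse(dbReachable: boolean): { status: number; body: HealthPayload } {
  return dbReachable
    ? { status: 200, body: { ok: true, db: 'up' } }
    : { status: 503, body: { ok: false, db: 'down' } };
}
```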

3. Stripe Webhook Failures — Console Logging Only

Errors during Stripe webhook processing are written to the console log only. There is no exception capture and no alert triggered for payment-critical failures, making silent failures a real risk in production.

4. No Alerting Integrations Configured

None of the following integrations are currently configured:

  • Slack webhook
  • Email alerts
  • PagerDuty
  • Sentry alert rules

Recommended Remediation Steps

Step 1 — Configure Sentry Alert Rules

Add alert rules in your Sentry project for error-rate spikes. Completing this step also unblocks the related control ERR-22.

  • Navigate to Sentry → Alerts → Create Alert Rule
  • Set a threshold on error rate (e.g. >5 errors/min on any critical transaction)
  • Route alerts to the appropriate Slack channel or PagerDuty service

Step 2 — Add Slack Webhook Notification to Nightly Sync

Update nightly-sync.yml to include a failure notification step:

- name: Notify Slack on failure
  if: failure()
  uses: slackapi/slack-github-action@v1
  with:
    payload: |
      {
        "text": ":rotating_light: Nightly OFSI sanctions sync failed. Please investigate immediately."
      }
  env:
    SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
    SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK  # tells the action to treat the URL as an incoming webhook

Ensure SLACK_WEBHOOK_URL is configured as a GitHub Actions secret.
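One way to set the secret (assuming the GitHub CLI is installed and authenticated against the repository) is:

```
# Store the Slack webhook URL as a repository secret. The CLI prompts
# for the value, keeping it out of shell history.
gh secret set SLACK_WEBHOOK_URL
```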

Step 3 — Configure an Uptime Monitor on /api/health

Point an uptime monitoring service (e.g. Better Uptime, UptimeRobot) at your /api/health endpoint:

  • Monitor type: HTTP(S)
  • URL: https://<your-domain>/api/health
  • Alert condition: any non-200 response (the endpoint returns 503 when the database is unavailable)
  • Escalation: Page on-call via PagerDuty or send a Slack notification

This ensures that database availability issues surface immediately rather than being discovered by end users.
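If a hosted monitor cannot be provisioned immediately, a minimal self-hosted check can be sketched along the following lines. This is an interim stopgap under stated assumptions (Node 18+ for the global fetch API); the Slack/PagerDuty escalation wiring is intentionally left out.

```typescript
// Minimal health-poll sketch: anything other than HTTP 200 — or any
// network error or timeout — is treated as "down". Escalation is omitted.
export async function checkHealth(url: string): Promise<boolean> {
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(5_000) });
    return res.status === 200;
  } catch {
    return false; // connection refused, DNS failure, or timeout
  }
}
```

A cron job invoking this every minute and posting failures to the same Slack webhook used in Step 2 would narrow the gap until a dedicated monitor is in place.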

Step 4 — Instrument Stripe Webhook Failures with captureException

In the Stripe webhook handler, replace or supplement console logging with Sentry exception capture, tagged at high severity:

import * as Sentry from '@sentry/nextjs';

try {
  // ... Stripe webhook processing
} catch (error) {
  Sentry.withScope((scope) => {
    scope.setTag('severity', 'high');
    scope.setTag('subsystem', 'stripe-webhook');
    Sentry.captureException(error);
  });
  console.error('Stripe webhook processing failed:', error);
  return res.status(500).json({ error: 'Webhook processing failed' });
}

This ensures payment-critical failures are captured in Sentry and can trigger alert rules configured in Step 1.


Interim Guidance

Until all remediation steps are applied, operations teams should:

  • Manually verify the nightly sync GitHub Actions workflow each morning.
  • Periodically check /api/health during business hours.
  • Review application logs for Stripe webhook errors after any payment activity.
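To make the morning workflow check less error-prone, the latest run's conclusion can be read from the GitHub Actions REST API (GET /repos/{owner}/{repo}/actions/workflows/nightly-sync.yml/runs). The helper below is a sketch that only parses the documented response shape; the fetch call and authentication are left to the caller.

```typescript
// Parses the workflow-runs response from the GitHub Actions REST API.
// The API returns runs newest-first; a missing conclusion or any value
// other than 'success' (e.g. 'failure', 'cancelled', or null while the
// run is still in progress) counts as not OK.
export interface WorkflowRun { conclusion: string | null }
export interface RunsResponse { workflow_runs: WorkflowRun[] }

export function latestRunSucceeded(runs: RunsResponse): boolean {
  return runs.workflow_runs[0]?.conclusion === 'success';
}
```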

Related Controls

Control | Description                              | Status
ERR-24  | P1 error alerting infrastructure         | ⚠️ Open — No alerting configured
ERR-22  | Sentry alert rules for error rate spikes | ⛔ Blocked by ERR-24

Note: This is a P1 issue. Until alerting is in place, silent failures across sanctions data freshness, database availability, and payment processing represent operational and compliance risk.