Features · AgentOS Work · Updated March 11, 2026

Bug Fix Deep-Dive: SLA Monitor False Escalations (v1.0.15)

Overview

Version 1.0.15 ships a targeted fix for a bug in the Workflow SLA Monitor that caused escalation events to fire for every active workflow step, every hour — even when no deadlines had been missed. This post explains what went wrong, what the impact was, and how the fix works.


What Is the SLA Monitor?

The Workflow SLA Monitor is an Inngest scheduled function (workflowSlaMonitor) that runs on an hourly cadence. Its job is to:

  1. Query the database for workflow steps that are overdue.
  2. Emit escalation events so that the appropriate parties are notified.

A step is considered overdue when its dueDate is in the past and the step is still active or in progress.
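
The overdue rule can be written as a small predicate. This is a minimal sketch: the WorkflowStep shape and the 'completed' status are assumptions made for illustration, not the project's actual schema.

```typescript
// Hypothetical shape: field names follow this post, everything else is assumed.
type StepStatus = 'active' | 'in-progress' | 'completed';

interface WorkflowStep {
  id: string;
  status: StepStatus;
  dueDate: Date | null; // null means no deadline was set
}

// A step is overdue when it is still open and its deadline has passed.
function isOverdue(step: WorkflowStep, now: Date): boolean {
  const isOpen = step.status === 'active' || step.status === 'in-progress';
  return (
    isOpen &&
    step.dueDate !== null &&
    step.dueDate.getTime() < now.getTime()
  );
}
```

A step with no dueDate is never overdue under this definition, regardless of its status.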


The Bug

File: src/inngest/functions/workflow-engine.ts
Function: workflowSlaMonitor
Lines: 196–220

The query fetched all steps with a status of active or in-progress — but included no filter on dueDate. A code comment acknowledged this:

// Filter in JS for overdue (avoids complex Drizzle lt with nullable)

However, the promised JavaScript filter was never written. The result was that overdueSteps contained every active step in the system, and the function dutifully emitted an escalation event for each one.
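
For illustration only, the filter that the comment promised might have looked like the sketch below. The Step type and the function names are hypothetical. The explicit null check matters: at runtime JavaScript coerces null to 0 in a < comparison, so a bare dueDate < now would have treated steps without a deadline as overdue.

```typescript
// Hypothetical minimal shape for an already-fetched active step.
interface Step {
  id: string;
  dueDate: Date | null;
}

// The JS-side filter that was never written: keep only steps whose
// deadline exists and lies strictly before `now`.
function filterOverdue(activeSteps: Step[], now: Date): Step[] {
  return activeSteps.filter(
    (s) => s.dueDate !== null && s.dueDate.getTime() < now.getTime(),
  );
}

// Pitfall: with type checks bypassed, a bare `<` coerces null to 0,
// so a step without a deadline compares as "before now".
function naiveIsPast(dueDate: Date | null, now: Date): boolean {
  return (dueDate as any) < (now as any);
}
```

Even with such a filter in place, every active step would still be loaded on each run, which is part of why the fix filters in the query instead.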

Why This Is Harmful

  • Spurious notifications: Assignees and managers received escalation alerts for tasks that were on time or had no deadline set at all.
  • Noise drowns signal: Legitimate overdue alerts became indistinguishable from false ones, reducing trust in the notification system.
  • Unnecessary load: Every hourly run processed and emitted events proportional to the total number of active steps — not just the overdue subset.

The Fix

The fix moves the overdue predicate into the database query itself using Drizzle ORM conditions:

// Operators used below; db and workflowSteps come from the project's own setup.
import { and, inArray, isNotNull, lt } from 'drizzle-orm';

// Before (broken: no dueDate filter, so every active step is returned)
const overdueSteps = await db
  .select()
  .from(workflowSteps)
  .where(
    inArray(workflowSteps.status, ['active', 'in-progress'])
  );

// After (fixed: only steps with a past, non-null dueDate)
const now = new Date();
const overdueSteps = await db
  .select()
  .from(workflowSteps)
  .where(
    and(
      inArray(workflowSteps.status, ['active', 'in-progress']),
      isNotNull(workflowSteps.dueDate),
      lt(workflowSteps.dueDate, now)
    )
  );

Why at the Query Level?

Filtering in the database rather than in JavaScript avoids two problems:

  1. N+1 queries: if the initial select did not already include dueDate, a JS-side loop would have to load each step's dueDate individually. A bare .select() returns every column, so this risk arises only if the selected columns are later narrowed.
  2. Memory overhead: fetching thousands of active steps only to discard most of them wastes memory and database I/O on every hourly tick.

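Steps that have no dueDate set are excluded from escalation entirely, which is the intended behaviour. In SQL a comparison against NULL never evaluates to true, so lt alone would already skip those rows; adding isNotNull states the intent explicitly in the query.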
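
The NULL handling can be modelled in plain TypeScript. This is a sketch of standard SQL three-valued logic, not of Drizzle's internals: the helper names (sqlLt, sqlAnd, whereKeeps) are hypothetical, and null stands in for SQL's UNKNOWN.

```typescript
// null models SQL's UNKNOWN truth value.
type SqlBool = boolean | null;

// SQL `a < b`: UNKNOWN if either side is NULL.
function sqlLt(a: Date | null, b: Date | null): SqlBool {
  if (a === null || b === null) return null;
  return a.getTime() < b.getTime();
}

// SQL `a IS NOT NULL`: always a definite true or false.
function sqlIsNotNull(a: unknown): boolean {
  return a !== null;
}

// SQL AND: false dominates, then UNKNOWN, otherwise true.
function sqlAnd(...terms: SqlBool[]): SqlBool {
  if (terms.some((t) => t === false)) return false;
  if (terms.some((t) => t === null)) return null;
  return true;
}

// WHERE keeps a row only when the predicate is exactly true.
function whereKeeps(predicate: SqlBool): boolean {
  return predicate === true;
}
```

Under this model a row with a NULL dueDate is dropped whether or not the IS NOT NULL term is present; the extra term changes the predicate from UNKNOWN to false without changing which rows are returned.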


Upgrade Notes

Version     Behaviour
≤ 1.0.14    All active/in-progress steps escalated every hour
1.0.15      Only steps with dueDate set and in the past are escalated

No database migrations are required for this fix. After deploying v1.0.15, the hourly SLA monitor run will immediately begin producing accurate results.

If you have downstream automations or integrations that consume escalation events, be aware that event volume will drop significantly following the upgrade. This is expected and correct behaviour — the previous high volume was erroneous.


Summary

  • Bug: Missing dueDate < now filter caused all active workflow steps to be flagged as overdue.
  • Fix: Added isNotNull(workflowSteps.dueDate) and lt(workflowSteps.dueDate, now) directly in the Drizzle query.
  • Result: Escalation events are now only emitted for steps that are genuinely past their deadline.