Bug Fix Deep-Dive: SLA Monitor False Escalations (v1.0.15)
Overview
Version 1.0.15 ships a targeted fix for a bug in the Workflow SLA Monitor that caused escalation events to fire for every active workflow step, every hour — even when no deadlines had been missed. This post explains what went wrong, what the impact was, and how the fix works.
What Is the SLA Monitor?
The Workflow SLA Monitor is an Inngest scheduled function (workflowSlaMonitor) that runs on an hourly cadence. Its job is to:
- Query the database for workflow steps that are overdue.
- Emit escalation events so that the appropriate parties are notified.
A step is considered overdue when its dueDate is in the past and the step is still active or in progress.
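The overdue rule can be written as a small TypeScript predicate. This is a minimal sketch for illustration only. The WorkflowStep shape, the 'completed' status value, and the isOverdue and OPEN_STATUSES names are assumptions, not the project's actual types or helpers. The comparison is strict, so a deadline equal to the current time does not yet count as "in the past".

```typescript
// Hypothetical step shape for illustration; the real schema lives in the project's Drizzle tables.
type StepStatus = 'active' | 'in-progress' | 'completed';

interface WorkflowStep {
  id: string;
  status: StepStatus;
  dueDate: Date | null; // null means no deadline was set
}

// Statuses the SLA monitor treats as still open (per the rule above).
const OPEN_STATUSES: ReadonlySet<StepStatus> = new Set<StepStatus>(['active', 'in-progress']);

// Overdue = still open, has a deadline, and that deadline is strictly in the past.
function isOverdue(step: WorkflowStep, now: Date): boolean {
  if (!OPEN_STATUSES.has(step.status)) return false;
  if (step.dueDate === null) return false; // no deadline: never escalate
  return step.dueDate.getTime() < now.getTime();
}

// Usage: of these four steps, only "a" is open with a past deadline.
const now = new Date('2024-01-01T12:00:00Z');
const steps: WorkflowStep[] = [
  { id: 'a', status: 'active', dueDate: new Date('2024-01-01T09:00:00Z') },
  { id: 'b', status: 'active', dueDate: new Date('2024-01-02T09:00:00Z') },
  { id: 'c', status: 'in-progress', dueDate: null },
  { id: 'd', status: 'completed', dueDate: new Date('2023-12-31T09:00:00Z') },
];
const overdueIds = steps.filter((s) => isOverdue(s, now)).map((s) => s.id); // ['a']
```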
The Bug
File: src/inngest/functions/workflow-engine.ts
Function: workflowSlaMonitor
Lines: 196–220
The query fetched all steps with a status of active or in-progress — but included no filter on dueDate. A code comment acknowledged this:
```typescript
// Filter in JS for overdue (avoids complex Drizzle lt with nullable)
```
However, the promised JavaScript filter was never written. The result was that overdueSteps contained every active step in the system, and the function dutifully emitted an escalation event for each one.
Why This Is Harmful
- Spurious notifications: Assignees and managers received escalation alerts for tasks that were on time or had no deadline set at all.
- Noise drowns signal: Legitimate overdue alerts became indistinguishable from false ones, reducing trust in the notification system.
- Unnecessary load: Every hourly run processed and emitted events proportional to the total number of active steps — not just the overdue subset.
The Fix
The fix moves the overdue predicate into the database query itself using Drizzle ORM conditions:
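```typescript
// Before (broken: no dueDate filter)
import { inArray } from 'drizzle-orm';
// db and workflowSteps come from the project's own modules (import paths not shown here).

const overdueSteps = await db
  .select()
  .from(workflowSteps)
  .where(inArray(workflowSteps.status, ['active', 'in-progress']));
```

```typescript
// After (fixed: only steps with a past, non-null dueDate)
import { and, inArray, isNotNull, lt } from 'drizzle-orm';
// db and workflowSteps come from the project's own modules (import paths not shown here).

const now = new Date();
const overdueSteps = await db
  .select()
  .from(workflowSteps)
  .where(
    and(
      inArray(workflowSteps.status, ['active', 'in-progress']),
      isNotNull(workflowSteps.dueDate), // steps without a deadline are never escalated
      lt(workflowSteps.dueDate, now), // deadline is strictly in the past
    ),
  );
```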
Why at the Query Level?
Filtering in the database rather than in JavaScript avoids two problems:
- N+1 queries: a JS-side loop would require loading each step's dueDate individually if it wasn't already included in the initial select.
- Memory overhead: fetching thousands of active steps only to discard most of them wastes both memory and database I/O on every hourly tick.
Using isNotNull alongside lt also correctly handles steps that have no dueDate set — they are excluded from escalation entirely, which is the intended behaviour.
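The explicit null check matters most if this filtering were ever done in application code, as the original comment proposed. JavaScript's relational operators convert null to 0, so null < new Date() evaluates to true, and an unguarded comparison would flag every step without a deadline as overdue. SQL behaves differently: comparing NULL with anything yields UNKNOWN, and a WHERE clause keeps only rows whose condition is TRUE, so lt on its own would already skip NULL deadlines while isNotNull states that intent explicitly. The sketch below models both behaviours; jsLessThan, sqlLessThan, sqlIsNotNull, sqlAnd and whereKeeps are hypothetical helpers, not Drizzle APIs.

```typescript
// Hypothetical helpers that model comparison semantics; none of these are Drizzle APIs.

// SQL's three-valued logic: TRUE, FALSE, or UNKNOWN (modelled here as null).
type SqlBool = boolean | null;

// JavaScript: same result as an unguarded `dueDate < now`, which converts null to 0 first.
function jsLessThan(dueDate: Date | null, now: Date): boolean {
  return Number(dueDate) < now.getTime();
}

// SQL `<` on a nullable timestamp column: any comparison with NULL is UNKNOWN.
function sqlLessThan(dueDate: Date | null, now: Date): SqlBool {
  if (dueDate === null) return null;
  return dueDate.getTime() < now.getTime();
}

// SQL `IS NOT NULL`: always TRUE or FALSE, never UNKNOWN.
function sqlIsNotNull(value: Date | null): SqlBool {
  return value !== null;
}

// SQL AND: any FALSE wins; otherwise any UNKNOWN makes the result UNKNOWN.
function sqlAnd(...conds: SqlBool[]): SqlBool {
  if (conds.some((c) => c === false)) return false;
  if (conds.some((c) => c === null)) return null;
  return true;
}

// A WHERE clause keeps a row only when its condition is exactly TRUE.
function whereKeeps(cond: SqlBool): boolean {
  return cond === true;
}

const now = new Date('2024-01-01T12:00:00Z');
const missing: Date | null = null;

const jsFlagsMissing = jsLessThan(missing, now); // true: a step with no deadline looks overdue
const ltOnlyKeeps = whereKeeps(sqlLessThan(missing, now)); // false: UNKNOWN is dropped
const guardedKeeps = whereKeeps(sqlAnd(sqlIsNotNull(missing), sqlLessThan(missing, now))); // false
```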
Upgrade Notes
| Version | Behaviour |
|---|---|
| ≤ 1.0.14 | All active/in-progress steps escalated every hour |
| 1.0.15 | Only steps with dueDate set and in the past are escalated |
No database migrations are required for this fix. After deploying v1.0.15, the next hourly SLA monitor run will escalate only steps that are genuinely overdue.
If you have downstream automations or integrations that consume escalation events, be aware that event volume will drop significantly following the upgrade. This is expected and correct behaviour — the previous high volume was erroneous.
Summary
- Bug: Missing dueDate < now filter caused all active workflow steps to be flagged as overdue.
- Fix: Added isNotNull(workflowSteps.dueDate) and lt(workflowSteps.dueDate, now) directly in the Drizzle query.
- Result: Escalation events are now only emitted for steps that are genuinely past their deadline.