AI Assessment Stuck-Running Watchdog
Available from: v0.1.228
Overview
The AI Assessment Watchdog is a nightly automated job that detects and recovers AI deduction assessments that have become stuck in a running state. It ensures agents are never left waiting indefinitely on an assessment that has silently failed.
How it works
Schedule
The watchdog runs on a cron schedule every night at 03:00 UTC (0 3 * * *).
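Because the sweep runs once per day, an assessment that gets stuck shortly after 03:00 UTC can wait close to 24 hours before it is recovered. As a rough illustration, the next sweep time for any given moment can be computed as below. The function name `nextSweepAfter` is hypothetical and is not part of the product.

```typescript
// Sketch: compute the next 03:00 UTC sweep strictly after a given moment.
// `nextSweepAfter` is an illustrative name, not an actual product function.
function nextSweepAfter(t: Date): Date {
  // Start with 03:00 UTC on the same UTC calendar day as `t`.
  const next = new Date(
    Date.UTC(t.getUTCFullYear(), t.getUTCMonth(), t.getUTCDate(), 3, 0, 0, 0)
  );
  // If that moment is not in the future, roll over to the following day.
  if (next.getTime() <= t.getTime()) {
    next.setUTCDate(next.getUTCDate() + 1);
  }
  return next;
}
```

For example, an assessment that gets stuck at 02:00 UTC is picked up an hour later, while one that gets stuck at 10:00 UTC waits until 03:00 UTC the following day.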
Detection logic
On each run, the job queries ai_deduction_assessments for records matching both of the following conditions:
| Field | Condition |
|---|---|
| status | 'running' |
| startedAt | more than 10 minutes before the sweep time |
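The two conditions above can be sketched as an in-memory filter. This is a minimal illustration, not the actual query: the `Assessment` interface, the `id` field, and the function names are assumptions, and only `status` and `startedAt` come from the table above.

```typescript
// Hypothetical record shape; only `status` and `startedAt` are documented fields.
interface Assessment {
  id: string; // assumed identifier field
  status: string;
  startedAt: Date;
}

// 10-minute threshold from the detection rules, in milliseconds.
const STUCK_THRESHOLD_MS = 10 * 60 * 1000;

// True when the record is still running and started more than
// 10 minutes before the sweep time.
function isStuck(a: Assessment, sweepTime: Date): boolean {
  const ageMs = sweepTime.getTime() - a.startedAt.getTime();
  return a.status === "running" && ageMs > STUCK_THRESHOLD_MS;
}

// Select every stuck assessment from a list of records.
function findStuck(all: Assessment[], sweepTime: Date): Assessment[] {
  return all.filter((a) => isStuck(a, sweepTime));
}
```

In this sketch the comparison is strict, so a record that started exactly 10 minutes before the sweep is not treated as stuck. Whether the real job treats the boundary the same way is not stated in this document.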
Recovery actions
For each stuck assessment found, the job performs two operations atomically:
1. Marks the assessment as failed
| Field | Value set |
|---|---|
| status | 'failed' |
| errorMessage | 'Assessment timed out' |
| failedAt | timestamp of the watchdog sweep |
2. Creates an in-app notification
- Recipient: the agent identified by requestedById on the assessment record
- Severity: warning
- Content: includes a direct link to retry the timed-out assessment
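The two recovery operations can be sketched as a single pure function that returns both the updated assessment and the notification, so a caller could persist them together in one transaction. This is an illustration under stated assumptions: the interfaces, the `buildRecovery` name, the notification field names, the message wording, and the retry link format are all hypothetical. Only the values written to `status`, `errorMessage`, and `failedAt`, the `requestedById` recipient, and the warning severity come from the description above.

```typescript
// Hypothetical shapes for illustration only.
interface StuckAssessment {
  id: string; // assumed identifier field
  requestedById: string;
  status: string;
  errorMessage?: string;
  failedAt?: Date;
}

interface WatchdogNotification {
  recipientId: string; // assumed field name
  severity: "warning";
  message: string; // assumed field name
  link: string; // assumed field name
}

// Build both outputs of the recovery step without mutating the input.
function buildRecovery(
  a: StuckAssessment,
  sweepTime: Date
): { assessment: StuckAssessment; notification: WatchdogNotification } {
  const assessment: StuckAssessment = {
    ...a,
    status: "failed",
    errorMessage: "Assessment timed out",
    failedAt: sweepTime,
  };
  const notification: WatchdogNotification = {
    recipientId: a.requestedById,
    severity: "warning",
    message: `Assessment ${a.id} timed out. Use the link to retry.`,
    link: `/assessments/${a.id}/retry`, // hypothetical URL format
  };
  return { assessment, notification };
}
```

Returning both records from one function keeps the pairing explicit: every assessment marked failed has exactly one matching notification, and the caller decides how to write them atomically.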
Agent experience
If one of your AI deduction assessments is caught by the watchdog, you will receive an in-app warning notification after the next nightly sweep at 03:00 UTC. The notification contains a retry link so you can re-trigger the assessment immediately without navigating away.
Assessments marked failed by the watchdog behave identically to any other failed assessment — they can be retried, and their status history is preserved for audit purposes.
Affected entities
- aiDeductionAssessments: status, errorMessage, and failedAt fields updated
- notifications: new warning notification record inserted per affected assessment
Frequently asked questions
Q: Why would an assessment get stuck in running?
AI assessments rely on external model inference. Upstream timeouts, infrastructure interruptions, or dropped connections can leave an assessment mid-run with no way to self-resolve. The watchdog provides a safety net for these edge cases.
Q: What is the 10-minute threshold based on?
Under normal conditions, AI deduction assessments complete well within 10 minutes. A running record older than 10 minutes is a reliable signal that the process did not complete successfully.
Q: Will I be notified for every stuck assessment?
Yes — one in-app notification is created per affected assessment, linked to the specific assessment so you can act on each one individually.
Q: Is there any data loss when an assessment is timed out?
No. The assessment record is preserved with its full history. Only the status fields are updated to reflect the failure. You can retry the assessment to generate a new result.