Compliance Change Detection & Diffing Engine — How HMOwatch Detects Ever-Changing Regulation
Released in v1.0.3 · Category: Core Functionality · Severity: Critical
The Problem This Solves
HMOwatch monitors 400+ UK local authority licensing sources. But scraping content is only half the job — the platform's mission is to detect changes to that content and surface them as actionable alerts before landlords and letting agents are caught out.
Before v1.0.3, scraped content was stored but never compared against previous versions. A local authority could change its selective licensing conditions, introduce a new fee band, or remove an exemption category — and HMOwatch would have no way to know. That gap is now closed.
How It Works
The Compliance Change Detection & Diffing Engine operates as a pipeline stage that runs immediately after each scraping pass. It consists of four components:
1. Snapshot Storage
Every scraping run produces a snapshot — a point-in-time record of the regulatory content for a given local authority source. Snapshots are persisted to the snapshots table and include:
- A content hash (used for fast change detection)
- The raw scraped text
- Structured metadata extracted from the page
- A UTC timestamp
Snapshots are never overwritten; they accumulate to form a historical record of each source's regulatory state over time.
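The snapshot step can be sketched as follows. This is a minimal illustration, not HMOwatch's actual code: the `Snapshot` dataclass and `make_snapshot` helper are assumed names, and SHA-256 is assumed as the hashing algorithm (the document only says "a content hash").

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Snapshot:
    # Mirrors the columns of the snapshots table described in this section.
    source_id: str
    content_hash: str
    raw_text: str
    metadata: dict
    captured_at: datetime

def make_snapshot(source_id: str, raw_text: str, metadata: dict) -> Snapshot:
    # A hash of the raw text gives a cheap equality check between runs,
    # so identical content can be skipped without a full text comparison.
    content_hash = hashlib.sha256(raw_text.encode("utf-8")).hexdigest()
    return Snapshot(source_id, content_hash, raw_text, metadata,
                    datetime.now(timezone.utc))
```

Because snapshots are append-only, two runs that scrape identical content produce two rows with the same `content_hash`, which is what the diff stage later uses to short-circuit.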
2. Diff Computation
When a new snapshot is created, the engine retrieves the most recent prior snapshot for the same source and computes a structured diff. Diffing operates at the section level — the engine identifies changes within discrete content areas such as:
- Licensing conditions and requirements
- Application fees and payment terms
- Licence duration and renewal terms
- Exemption categories
- Application process and contact details
Section-level diffing produces human-readable change summaries that are meaningful to compliance professionals, not just raw text deltas. Results are stored in the diffs table, linked to both the previous and new snapshot.
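The section-level comparison described above can be sketched like this, assuming sections have already been extracted upstream into name → text mappings. The function name and the `added`/`removed`/`modified` labels are illustrative, not the engine's real vocabulary.

```python
import difflib

def diff_sections(before: dict, after: dict) -> list:
    """Compare two snapshots' sections; return one entry per changed section."""
    changes = []
    for name in sorted(set(before) | set(after)):
        old, new = before.get(name), after.get(name)
        if old == new:
            continue  # unchanged section: nothing to report
        if old is None:
            changes.append({"section": name, "change": "added", "detail": new})
        elif new is None:
            changes.append({"section": name, "change": "removed", "detail": old})
        else:
            # A line-level delta within the section, kept for the summary.
            delta = "\n".join(difflib.unified_diff(
                old.splitlines(), new.splitlines(), lineterm=""))
            changes.append({"section": name, "change": "modified", "detail": delta})
    return changes
```

Reporting per section rather than per line is what makes the output readable to a compliance professional: "Application fees changed" is actionable, a raw text delta is not.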
3. Severity Classification
Not all changes carry equal compliance risk. Every diff is automatically classified by severity:
| Severity | Examples |
|---|---|
| Critical | New licence type introduced, existing licensing requirement changed, exemption removed |
| High | Fee change, deadline change, licence condition amended |
| Medium | Procedural update, contact details changed, new guidance added |
| Low | Formatting change, minor wording adjustment, navigation update |
Severity drives how urgently a change event is surfaced to users — critical changes trigger immediate alerts, while low-severity changes are batched into digest notifications.
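A toy classifier mirroring the table above might look like this. The keyword rules here are an assumption for illustration only; the real engine's classification logic is not described beyond the table.

```python
# Ordered from most to least severe so the strongest matching rule wins.
SEVERITY_RULES = [
    ("critical", ("new licence type", "requirement changed", "exemption removed")),
    ("high",     ("fee", "deadline", "condition amended")),
    ("medium",   ("procedure", "contact", "guidance")),
]

def classify(summary: str) -> str:
    """Map a diff summary to a severity; anything unmatched is 'low'."""
    text = summary.lower()
    for severity, keywords in SEVERITY_RULES:
        if any(k in text for k in keywords):
            return severity
    return "low"
```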
4. Change Events
For any diff classified as medium severity or above, the engine writes a change event to the change_events table. A change event is the platform's authoritative record that something of regulatory significance has happened.
Each change event contains:
- The affected local authority and source URL
- The regulation section(s) where the change occurred
- Severity classification
- A human-readable diff summary
- References to the before and after snapshots
- Detection timestamp
Change events are the trigger point for all downstream workflows: user alert emails, in-app notifications, webhook dispatches to integrated property management systems, and dashboard flagging.
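The medium-or-above gate can be sketched as a single guard. Field names follow the change_events schema in the next section; the helper name is illustrative.

```python
from datetime import datetime, timezone

# Only these severities produce a change event; low goes to the digest batch.
ALERTABLE = {"medium", "high", "critical"}

def maybe_change_event(source_id: str, diff_id: str,
                       severity: str, summary: str):
    if severity not in ALERTABLE:
        return None  # low-severity diff: no event, batched into the digest
    return {
        "source_id": source_id,
        "diff_id": diff_id,
        "severity": severity,
        "summary": summary,
        "triggered_at": datetime.now(timezone.utc),
        "processed": False,  # downstream workflows flip this once handled
    }
```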
Database Schema
Three new tables underpin this feature:
```sql
-- Point-in-time captures of scraped regulatory content
CREATE TABLE snapshots (
    id UUID PRIMARY KEY,
    source_id UUID NOT NULL REFERENCES sources(id),
    content_hash TEXT NOT NULL,
    raw_text TEXT NOT NULL,
    metadata JSONB,
    captured_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Computed differences between consecutive snapshots
CREATE TABLE diffs (
    id UUID PRIMARY KEY,
    source_id UUID NOT NULL REFERENCES sources(id),
    snapshot_before UUID NOT NULL REFERENCES snapshots(id),
    snapshot_after UUID NOT NULL REFERENCES snapshots(id),
    sections JSONB NOT NULL, -- array of section-level diffs
    computed_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Classified change records that trigger downstream workflows
CREATE TABLE change_events (
    id UUID PRIMARY KEY,
    source_id UUID NOT NULL REFERENCES sources(id),
    diff_id UUID NOT NULL REFERENCES diffs(id),
    severity TEXT NOT NULL CHECK (severity IN ('critical','high','medium','low')),
    summary TEXT NOT NULL,
    triggered_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    processed BOOLEAN NOT NULL DEFAULT FALSE
);
```
Alert Pipeline Integration
Change events flow into the alert pipeline as follows:
```
Scraping Run
     │
     ▼
Snapshot Created
     │
     ▼
Diff Computed (vs. previous snapshot)
     │
     ▼
Severity Classified
     │
     ├─ Low ──────────────────────────► Digest batch (no immediate alert)
     │
     ├─ Medium ───────────────────────► Change event → in-app notification
     │
     ├─ High ─────────────────────────► Change event → email alert + in-app
     │
     └─ Critical ─────────────────────► Change event → immediate email + webhook
```
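The fan-out above amounts to a routing table from severity to delivery channels. The channel names below are illustrative shorthand for the workflows described in this document, not real HMOwatch identifiers.

```python
# Severity → delivery channels, matching the pipeline diagram.
ROUTES = {
    "low":      ["digest"],             # batched, no immediate alert
    "medium":   ["in_app"],             # in-app notification
    "high":     ["email", "in_app"],    # email alert + in-app
    "critical": ["email", "webhook"],   # immediate email + webhook dispatch
}

def route(severity: str) -> list:
    """Return the delivery channels for a classified change."""
    return ROUTES[severity]
```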
Frequently Asked Questions
How often are snapshots compared?
A diff is computed on every scraping run that produces a new content hash. If the content hash matches the previous snapshot, no diff is computed and no change event is raised — keeping the system efficient at scale.
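The hash short-circuit works out to a single comparison, sketched here under the same SHA-256 assumption as earlier: hash the new content, and only proceed to diffing when it differs from the previous snapshot's stored hash.

```python
import hashlib

def needs_diff(prev_hash: str, new_text: str) -> bool:
    """True only when the newly scraped text differs from the last snapshot."""
    new_hash = hashlib.sha256(new_text.encode("utf-8")).hexdigest()
    return new_hash != prev_hash
```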
How far back does the snapshot history go?
Snapshots are retained indefinitely. The diff engine always compares against the most recent prior snapshot, but historical snapshots can be used for retrospective analysis or audit purposes.
What happens if a source returns an error during scraping?
If a scraping run fails to return valid content, no snapshot is created for that run. The previous snapshot remains the reference point for the next successful scrape. Consecutive scraping failures are tracked separately and surface as monitoring alerts, not compliance change events.
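The failure handling described above can be sketched as a small state machine: failed runs create no snapshot, and only a run of consecutive failures escalates to monitoring. The threshold of 3 and the return labels are assumptions for illustration.

```python
def record_run(state: dict, ok: bool, failure_threshold: int = 3) -> str:
    """Track scrape outcomes; failures feed monitoring, never the diff engine."""
    if ok:
        state["consecutive_failures"] = 0
        return "snapshot_created"
    state["consecutive_failures"] = state.get("consecutive_failures", 0) + 1
    if state["consecutive_failures"] >= failure_threshold:
        return "monitoring_alert"   # surfaced as an ops alert, not a change event
    return "skipped"                # previous snapshot stays the reference point
```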
Can I query change events via the API?
Yes — change events are exposed via the REST API. See the API Reference for endpoint documentation.