Compliance Change Detection & Diffing Engine — How HMOwatch Detects Ever-Changing Regulation
Released in v1.0.3 · Category: Core Functionality · Severity: Critical
The Problem This Solves
HMOwatch monitors 400+ UK local authority licensing sources. But scraping content is only half the job — the platform's mission is to detect changes to that content and surface them as actionable alerts before landlords and letting agents are caught out.
Before v1.0.3, scraped content was stored but never compared against previous versions. A local authority could change its selective licensing conditions, introduce a new fee band, or remove an exemption category — and HMOwatch would have no way to know. That gap is now closed.
How It Works
The Compliance Change Detection & Diffing Engine operates as a pipeline stage that runs immediately after each scraping pass. It consists of four components:
1. Snapshot Storage
Every scraping run produces a snapshot — a point-in-time record of the regulatory content for a given local authority source. Snapshots are persisted to the snapshots table and include:
- A content hash (used for fast change detection)
- The raw scraped text
- Structured metadata extracted from the page
- A UTC timestamp
Snapshots are never overwritten; they accumulate to form a historical record of each source's regulatory state over time.
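The snapshot step can be sketched as follows. This is a minimal illustration, not HMOwatch's actual code: the `Snapshot` dataclass and `make_snapshot` helper are assumed names, and SHA-256 is assumed as the hashing algorithm (the document only says "a content hash").

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Snapshot:
    # Mirrors the columns of the snapshots table described in this section.
    source_id: str
    content_hash: str
    raw_text: str
    metadata: dict
    captured_at: datetime

def make_snapshot(source_id: str, raw_text: str, metadata: dict) -> Snapshot:
    # A hash of the raw text gives a cheap equality check between runs,
    # so identical content can be skipped without a full text comparison.
    content_hash = hashlib.sha256(raw_text.encode("utf-8")).hexdigest()
    return Snapshot(source_id, content_hash, raw_text, metadata,
                    datetime.now(timezone.utc))
```

Because snapshots are append-only, two runs that scrape identical content produce two rows with the same `content_hash`, which is what the diff stage later uses to short-circuit.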
2. Diff Computation
When a new snapshot is created, the engine retrieves the most recent prior snapshot for the same source and computes a structured diff. Diffing operates at the section level — the engine identifies changes within discrete content areas such as:
- Licensing conditions and requirements
- Application fees and payment terms
- Licence duration and renewal terms
- Exemption categories
- Application process and contact details
Section-level diffing produces human-readable change summaries that are meaningful to compliance professionals, not just raw text deltas. Results are stored in the diffs table, linked to both the previous and new snapshot.
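The section-level comparison described above can be sketched like this, assuming sections have already been extracted upstream into name → text mappings. The function name and the `added`/`removed`/`modified` labels are illustrative, not the engine's real vocabulary.

```python
import difflib

def diff_sections(before: dict, after: dict) -> list:
    """Compare two snapshots' sections; return one entry per changed section."""
    changes = []
    for name in sorted(set(before) | set(after)):
        old, new = before.get(name), after.get(name)
        if old == new:
            continue  # unchanged section: nothing to report
        if old is None:
            changes.append({"section": name, "change": "added", "detail": new})
        elif new is None:
            changes.append({"section": name, "change": "removed", "detail": old})
        else:
            # A line-level delta within the section, kept for the summary.
            delta = "\n".join(difflib.unified_diff(
                old.splitlines(), new.splitlines(), lineterm=""))
            changes.append({"section": name, "change": "modified", "detail": delta})
    return changes
```

Reporting per section rather than per line is what makes the output readable to a compliance professional: "Application fees changed" is actionable, a raw text delta is not.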
3. Severity Classification
Not all changes carry equal compliance risk. Every diff is automatically classified by severity:
| Severity | Examples |
|---|---|
| Critical | New licence type introduced, existing licensing requirement changed, exemption removed |
| High | Fee change, deadline change, licence condition amended |
| Medium | Procedural update, contact details changed, new guidance added |
| Low | Formatting change, minor wording adjustment, navigation update |
Severity drives how urgently a change event is surfaced to users — critical changes trigger immediate alerts, while low-severity changes are batched into digest notifications.
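A toy classifier mirroring the table above might look like this. The keyword rules here are an assumption for illustration only; the real engine's classification logic is not described beyond the table.

```python
# Ordered from most to least severe so the strongest matching rule wins.
SEVERITY_RULES = [
    ("critical", ("new licence type", "requirement changed", "exemption removed")),
    ("high",     ("fee", "deadline", "condition amended")),
    ("medium",   ("procedure", "contact", "guidance")),
]

def classify(summary: str) -> str:
    """Map a diff summary to a severity; anything unmatched is 'low'."""
    text = summary.lower()
    for severity, keywords in SEVERITY_RULES:
        if any(k in text for k in keywords):
            return severity
    return "low"
```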
4. Change Events
For any diff classified as medium severity or above, the engine writes a change event to the change_events table. A change event is the platform's authoritative record that something of regulatory significance has happened.
Each change event contains:
- The affected local authority and source URL
- The regulation section(s) where the change occurred
- Severity classification
- A human-readable diff summary
- References to the before and after snapshots
- Detection timestamp
Change events are the trigger point for all downstream workflows: user alert emails, in-app notifications, webhook dispatches to integrated property management systems, and dashboard flagging.
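The medium-or-above gate can be sketched as a single guard. Field names follow the change_events schema in the next section; the helper name is illustrative.

```python
from datetime import datetime, timezone

# Only these severities produce a change event; low goes to the digest batch.
ALERTABLE = {"medium", "high", "critical"}

def maybe_change_event(source_id: str, diff_id: str,
                       severity: str, summary: str):
    if severity not in ALERTABLE:
        return None  # low-severity diff: no event, batched into the digest
    return {
        "source_id": source_id,
        "diff_id": diff_id,
        "severity": severity,
        "summary": summary,
        "triggered_at": datetime.now(timezone.utc),
        "processed": False,  # downstream workflows flip this once handled
    }
```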
Database Schema
Three new tables underpin this feature:
```sql
-- Point-in-time captures of scraped regulatory content
CREATE TABLE snapshots (
    id UUID PRIMARY KEY,
    source_id UUID NOT NULL REFERENCES sources(id),
    content_hash TEXT NOT NULL,
    raw_text TEXT NOT NULL,
    metadata JSONB,
    captured_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Computed differences between consecutive snapshots
CREATE TABLE diffs (
    id UUID PRIMARY KEY,
    source_id UUID NOT NULL REFERENCES sources(id),
    snapshot_before UUID NOT NULL REFERENCES snapshots(id),
    snapshot_after UUID NOT NULL REFERENCES snapshots(id),
    sections JSONB NOT NULL, -- array of section-level diffs
    computed_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Classified change records that trigger downstream workflows
CREATE TABLE change_events (
    id UUID PRIMARY KEY,
    source_id UUID NOT NULL REFERENCES sources(id),
    diff_id UUID NOT NULL REFERENCES diffs(id),
    severity TEXT NOT NULL CHECK (severity IN ('critical','high','medium','low')),
    summary TEXT NOT NULL,
    triggered_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    processed BOOLEAN NOT NULL DEFAULT FALSE
);
```
Alert Pipeline Integration
Change events flow into the alert pipeline as follows:
```
Scraping Run
     │
     ▼
Snapshot Created
     │
     ▼
Diff Computed (vs. previous snapshot)
     │
     ▼
Severity Classified
     │
     ├─ Low ──────────────────────────► Digest batch (no immediate alert)
     │
     ├─ Medium ───────────────────────► Change event → in-app notification
     │
     ├─ High ─────────────────────────► Change event → email alert + in-app
     │
     └─ Critical ─────────────────────► Change event → immediate email + webhook
```
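The fan-out above amounts to a routing table from severity to delivery channels. The channel names below are illustrative shorthand for the workflows described in this document, not real HMOwatch identifiers.

```python
# Severity → delivery channels, matching the pipeline diagram.
ROUTES = {
    "low":      ["digest"],             # batched, no immediate alert
    "medium":   ["in_app"],             # in-app notification
    "high":     ["email", "in_app"],    # email alert + in-app
    "critical": ["email", "webhook"],   # immediate email + webhook dispatch
}

def route(severity: str) -> list:
    """Return the delivery channels for a classified change."""
    return ROUTES[severity]
```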
Frequently Asked Questions
How often are snapshots compared?
A diff is computed on every scraping run that produces a new content hash. If the content hash matches the previous snapshot, no diff is computed and no change event is raised — keeping the system efficient at scale.
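The hash short-circuit works out to a single comparison, sketched here under the same SHA-256 assumption as earlier: hash the new content, and only proceed to diffing when it differs from the previous snapshot's stored hash.

```python
import hashlib

def needs_diff(prev_hash: str, new_text: str) -> bool:
    """True only when the newly scraped text differs from the last snapshot."""
    new_hash = hashlib.sha256(new_text.encode("utf-8")).hexdigest()
    return new_hash != prev_hash
```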
How far back does the snapshot history go?
Snapshots are retained indefinitely. The diff engine always compares against the most recent prior snapshot, but historical snapshots can be used for retrospective analysis or audit purposes.
What happens if a source returns an error during scraping?
If a scraping run fails to return valid content, no snapshot is created for that run. The previous snapshot remains the reference point for the next successful scrape. Consecutive scraping failures are tracked separately and surface as monitoring alerts, not compliance change events.
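The failure handling described above can be sketched as a small state machine: failed runs create no snapshot, and only a run of consecutive failures escalates to monitoring. The threshold of 3 and the return labels are assumptions for illustration.

```python
def record_run(state: dict, ok: bool, failure_threshold: int = 3) -> str:
    """Track scrape outcomes; failures feed monitoring, never the diff engine."""
    if ok:
        state["consecutive_failures"] = 0
        return "snapshot_created"
    state["consecutive_failures"] = state.get("consecutive_failures", 0) + 1
    if state["consecutive_failures"] >= failure_threshold:
        return "monitoring_alert"   # surfaced as an ops alert, not a change event
    return "skipped"                # previous snapshot stays the reference point
```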
Can I query change events via the API?
Yes — change events are exposed via the REST API. See the API Reference for endpoint documentation.