Features · HMOwatch · Updated February 21, 2026

Introducing the Local Authority Data Source Registry


Version: 1.0.1
Severity: Critical — foundational infrastructure


Overview

The Local Authority Data Source Registry is the core data layer that enables HMOwatch to monitor UK council licensing information at scale. Every automated scraping bot, compliance alert, and regulatory change notification in the platform traces back to a registered source in this registry.

Before this release, HMOwatch had no structured way to record which councils exist, where their licensing data lives, how often to check it, or what format to expect. This release closes that gap.


Data Model

local_authorities Table

Represents each of the 400+ UK local authority councils.

| Column | Type | Description |
| --- | --- | --- |
| id | UUID / Integer | Primary key |
| name | String | Full council name (e.g. Leeds City Council) |
| region | String | Geographic region (e.g. Yorkshire and the Humber) |
| slug | String | URL-safe identifier |
| active | Boolean | Whether this authority is currently being monitored |
| created_at | Timestamp | Record creation time |
| updated_at | Timestamp | Last modification time |
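
The slug column is described only as a URL-safe identifier, so the exact derivation rule is not specified here. As a minimal sketch, assuming slugs are derived from the council name by lowercasing and hyphenating it, a hypothetical helper (slugify is not a name from the HMOwatch codebase) might look like this in Python:

```python
import re
import unicodedata


def slugify(name: str) -> str:
    """Turn a council name into a lowercase, hyphen-separated, URL-safe slug.

    Hypothetical helper: the real registry may derive slugs differently.
    """
    # Strip accents so that names such as "Ynys Môn" stay ASCII-only.
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


print(slugify("Leeds City Council"))  # leeds-city-council
```

Deriving the slug deterministically from the name keeps URLs stable across re-seeds, although two councils with identical names would then need manual disambiguation.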

sources Table

Represents individual scrape targets associated with a local authority.

| Column | Type | Description |
| --- | --- | --- |
| id | UUID / Integer | Primary key |
| local_authority_id | Foreign Key | Reference to the parent council |
| url | String | The target URL to scrape |
| data_schema | JSON / Text | Expected structure of the licensing data at this URL |
| scrape_frequency | String / Enum | How often this source should be checked (e.g. daily, weekly) |
| last_scraped_at | Timestamp | When this source was last successfully scraped |
| next_scheduled_at | Timestamp | When this source is next due to be scraped |
| active | Boolean | Whether scraping is enabled for this source |
| created_at | Timestamp | Record creation time |
| updated_at | Timestamp | Last modification time |
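
To make the relationship between the two tables concrete, the following sketch builds them in an in-memory SQLite database using Python's standard library. It is an illustration only: the real migrations are generated by Prisma or Rails, and the integer keys, text timestamps, NOT NULL constraints, and unique slug are assumptions rather than details taken from the actual schema. The example URL is a placeholder.

```python
import sqlite3

# Assumed DDL: column names follow the tables above, constraints are guesses.
SCHEMA = """
CREATE TABLE local_authorities (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    region     TEXT NOT NULL,
    slug       TEXT NOT NULL UNIQUE,
    active     INTEGER NOT NULL DEFAULT 1,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE sources (
    id                 INTEGER PRIMARY KEY,
    local_authority_id INTEGER NOT NULL REFERENCES local_authorities(id),
    url                TEXT NOT NULL,
    data_schema        TEXT,
    scrape_frequency   TEXT NOT NULL,
    last_scraped_at    TEXT,
    next_scheduled_at  TEXT,
    active             INTEGER NOT NULL DEFAULT 1,
    created_at         TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at         TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def create_registry(conn: sqlite3.Connection) -> None:
    """Create both registry tables with foreign-key enforcement enabled."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_registry(conn)

# One council with one scrape target (placeholder URL).
conn.execute(
    "INSERT INTO local_authorities (name, region, slug) VALUES (?, ?, ?)",
    ("Leeds City Council", "Yorkshire and the Humber", "leeds-city-council"),
)
conn.execute(
    "INSERT INTO sources (local_authority_id, url, scrape_frequency) "
    "VALUES (?, ?, ?)",
    (1, "https://example.org/hmo-register", "daily"),
)
conn.commit()

# Join each source back to its parent council.
for row in conn.execute(
    "SELECT la.name, s.url, s.scrape_frequency FROM sources s "
    "JOIN local_authorities la ON la.id = s.local_authority_id"
):
    print(row)
```

With foreign keys enforced, a source cannot point at a council that does not exist, which is the one-to-many relationship the local_authority_id column describes.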

Seed Data

An initial seed file is provided to bootstrap the registry with known UK local authority records. This allows the scraping pipeline to begin operating immediately after migration without manual data entry.

To apply the seed:

# Run migrations first (use the command that matches your stack)
npx prisma migrate deploy   # Prisma
# or
rails db:migrate            # Rails

# Then seed the registry
npx prisma db seed          # Prisma
# or
rails db:seed               # Rails

Note: The seed file covers the initial set of known sources. Additional councils and sources can be added via the Admin UI or directly via the API.
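
Because seeds are often re-run, it helps if seeding is idempotent. The sketch below shows one way to achieve that, assuming the slug column is unique and using SQLite through Python's standard library in place of the real Prisma or Rails seed file. The two rows are illustrative and are not a copy of the shipped seed data; seed_authorities is a hypothetical name.

```python
import sqlite3

# Illustrative rows only: (name, region, slug).
SEED_AUTHORITIES = [
    ("Leeds City Council", "Yorkshire and the Humber", "leeds-city-council"),
    ("Manchester City Council", "North West", "manchester-city-council"),
]


def count_authorities(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM local_authorities").fetchone()[0]


def seed_authorities(conn: sqlite3.Connection, rows) -> int:
    """Insert rows whose slug is not yet present; return how many were added."""
    before = count_authorities(conn)
    # INSERT OR IGNORE skips any row that would violate the UNIQUE slug.
    conn.executemany(
        "INSERT OR IGNORE INTO local_authorities (name, region, slug) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return count_authorities(conn) - before


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE local_authorities ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
    "region TEXT NOT NULL, slug TEXT NOT NULL UNIQUE)"
)

first_run = seed_authorities(conn, SEED_AUTHORITIES)
second_run = seed_authorities(conn, SEED_AUTHORITIES)
print(first_run, second_run)  # 2 0
```

Running the seed twice adds nothing the second time, so councils added later through the Admin UI or API are not duplicated by a re-seed.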


Admin UI

A built-in admin interface is available to manage the registry without requiring direct database access.

Capabilities

  • View all registered local authorities and their associated sources
  • Add new councils and scrape targets
  • Edit source URLs, data schemas, and scrape schedules
  • Enable or disable individual sources without deleting them
  • Inspect last scraped and next scheduled timestamps

Access

The admin UI is accessible at /admin/sources (requires admin role).


Relationship to the Scraping Pipeline

The registry is consumed by the scraping bots at runtime:

  1. The scheduler queries the sources table for all active records whose next_scheduled_at is at or before the current time.
  2. Each bot receives a source record containing the url and data_schema it needs.
  3. After a successful scrape, the bot updates last_scraped_at and sets a new next_scheduled_at based on the source's scrape_frequency.
  4. If the scraped data shows a change against the stored data_schema, the alerting system is triggered.

Without a populated registry, no scraping occurs and no alerts are generated.
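
The scheduling part of this flow (steps 1 and 3) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the platform's scheduler: the Source class, the due_sources and mark_scraped functions, and the mapping from scrape_frequency values to intervals are all hypothetical, and a source that has never been scheduled is assumed to be due immediately.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Assumed mapping; the docs only name "daily" and "weekly" as examples.
FREQUENCY_INTERVALS = {
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


@dataclass
class Source:
    id: int
    url: str
    scrape_frequency: str
    active: bool = True
    last_scraped_at: Optional[datetime] = None
    next_scheduled_at: Optional[datetime] = None


def due_sources(sources: List[Source], now: datetime) -> List[Source]:
    """Step 1: active sources whose next_scheduled_at is at or before now."""
    return [
        s
        for s in sources
        if s.active
        and (s.next_scheduled_at is None or s.next_scheduled_at <= now)
    ]


def mark_scraped(source: Source, now: datetime) -> None:
    """Step 3: record the successful scrape and schedule the next one."""
    source.last_scraped_at = now
    source.next_scheduled_at = now + FREQUENCY_INTERVALS[source.scrape_frequency]


now = datetime(2026, 2, 21, 9, 0, tzinfo=timezone.utc)
sources = [
    Source(1, "https://example.org/a", "daily",
           next_scheduled_at=now - timedelta(hours=1)),   # overdue
    Source(2, "https://example.org/b", "weekly",
           next_scheduled_at=now + timedelta(days=3)),    # not yet due
    Source(3, "https://example.org/c", "daily", active=False,
           next_scheduled_at=now - timedelta(days=2)),    # disabled
]

for source in due_sources(sources, now):
    # A real bot would fetch source.url here (step 2) before updating.
    mark_scraped(source, now)
    print(source.id, source.next_scheduled_at.isoformat())
```

Only source 1 is picked up: source 2 is not yet due, and source 3 is skipped because it is disabled, which mirrors how the active flag lets a source be paused without being deleted.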


Setup Checklist

  • Run database migrations to create the local_authorities and sources tables
  • Run the seed file to populate initial UK council records
  • Verify records appear in the Admin UI at /admin/sources
  • Confirm the scrape scheduler is reading from the sources table
  • Add or update any sources not covered by the seed data