Introducing the Local Authority Data Source Registry

Version: 1.0.1
Severity: Critical — foundational infrastructure

Overview

The Local Authority Data Source Registry is the core data layer that enables HMOwatch to monitor UK council licensing information at scale. Every automated scraping bot, compliance alert, and regulatory change notification in the platform traces back to a registered source in this registry.

Before this release, HMOwatch had no structured way to record which councils exist, where their licensing data lives, how often to check it, or what format to expect. This release closes that gap.

Data Model

`local_authorities` Table

Each row represents one of the 400+ UK local authorities (councils).

| Column | Type | Description |
| --- | --- | --- |
| `id` | UUID / Integer | Primary key |
| `name` | String | Full council name (e.g. Leeds City Council) |
| `region` | String | Geographic region (e.g. Yorkshire and the Humber) |
| `slug` | String | URL-safe identifier |
| `active` | Boolean | Whether this authority is currently being monitored |
| `created_at` | Timestamp | Record creation time |
| `updated_at` | Timestamp | Last modification time |

`sources` Table

Represents individual scrape targets associated with a local authority.

| Column | Type | Description |
| --- | --- | --- |
| `id` | UUID / Integer | Primary key |
| `local_authority_id` | Foreign Key | Reference to the parent council |
| `url` | String | The target URL to scrape |
| `data_schema` | JSON / Text | Expected structure of the licensing data at this URL |
| `scrape_frequency` | String / Enum | How often this source should be checked (e.g. `daily`, `weekly`) |
| `last_scraped_at` | Timestamp | When this source was last successfully scraped |
| `next_scheduled_at` | Timestamp | When this source is next due to be scraped |
| `active` | Boolean | Whether scraping is enabled for this source |
| `created_at` | Timestamp | Record creation time |
| `updated_at` | Timestamp | Last modification time |
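
As a rough illustration of how the two tables relate, the columns above could be mirrored by the following TypeScript types. This is a sketch under assumptions, not the project's actual schema: string IDs, the two-value `ScrapeFrequency` union, the nullable timestamps, the `fields` shape inside `data_schema`, and the example URL are all hypothetical choices made for illustration.

```typescript
// Illustrative types only. Assumptions: IDs are UUID strings (the table also
// allows integers), scrape_frequency is limited to the two example values,
// and the scrape timestamps are null until a source has run at least once.
type ScrapeFrequency = "daily" | "weekly";

interface LocalAuthority {
  id: string;
  name: string;
  region: string;
  slug: string;
  active: boolean;
  created_at: Date;
  updated_at: Date;
}

interface Source {
  id: string;
  local_authority_id: string; // references LocalAuthority.id
  url: string;
  data_schema: unknown; // JSON or text; its structure is not specified here
  scrape_frequency: ScrapeFrequency;
  last_scraped_at: Date | null;
  next_scheduled_at: Date | null;
  active: boolean;
  created_at: Date;
  updated_at: Date;
}

const createdAt = new Date("2024-01-01T00:00:00Z");

// Example parent record, using the council named in the table above.
const leeds: LocalAuthority = {
  id: "la-0001",
  name: "Leeds City Council",
  region: "Yorkshire and the Humber",
  slug: "leeds-city-council",
  active: true,
  created_at: createdAt,
  updated_at: createdAt,
};

// Example child record; the URL and data_schema contents are placeholders.
const leedsRegister: Source = {
  id: "src-0001",
  local_authority_id: leeds.id,
  url: "https://example.org/hmo-register",
  data_schema: { fields: ["licence_number", "address", "expiry_date"] },
  scrape_frequency: "weekly",
  last_scraped_at: null,
  next_scheduled_at: null,
  active: true,
  created_at: createdAt,
  updated_at: createdAt,
};

console.log(`${leeds.name} -> ${leedsRegister.url}`);
```

Keeping `active` on both records means a single source can be paused without affecting the rest of the council's sources, and a whole council can be paused without editing each source.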

Seed Data

An initial seed file is provided to bootstrap the registry with known UK local authority records. This allows the scraping pipeline to begin operating immediately after migration without manual data entry.

To apply the seed:

# Run migrations first (Prisma)
npx prisma migrate deploy
# or, on Rails
rails db:migrate

# Then seed the registry (Prisma)
npx prisma db seed
# or, on Rails
rails db:seed

Note: The seed file covers the initial set of known sources. Additional councils and sources can be added via the Admin UI or directly via the API.

Admin UI

A built-in admin interface is available to manage the registry without requiring direct database access.

Capabilities

- View all registered local authorities and their associated sources
- Add new councils and scrape targets
- Edit source URLs, data schemas, and scrape schedules
- Enable or disable individual sources without deleting them
- Inspect last scraped and next scheduled timestamps

Access

The admin UI is available at `/admin/sources` and requires the admin role.

Relationship to the Scraping Pipeline

The registry is consumed by the scraping bots at runtime:

1. The scheduler queries `sources` for all active records whose `next_scheduled_at` is due.
2. Each bot receives a source record containing the `url` and `data_schema` it needs.
3. After a successful scrape, the bot updates `last_scraped_at` and sets a new `next_scheduled_at`.
4. If a change is detected against the stored `data_schema`, the alerting system is triggered.
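
The selection and rescheduling steps above can be sketched in memory as follows. This is a minimal illustration, not the scheduler's actual code: `SourceRow`, `dueSources`, `markScraped`, and the interval mapping are hypothetical names, and two behaviours are assumptions rather than documented facts, namely that a source with no `next_scheduled_at` counts as due, and that the next run is scheduled one interval after the successful scrape.

```typescript
// In-memory sketch; a real scheduler would query the `sources` table instead.
interface SourceRow {
  id: string;
  url: string;
  data_schema: unknown;
  scrape_frequency: "daily" | "weekly";
  last_scraped_at: Date | null;
  next_scheduled_at: Date | null;
  active: boolean;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Assumed mapping from frequency label to interval length.
const INTERVAL_MS: Record<SourceRow["scrape_frequency"], number> = {
  daily: DAY_MS,
  weekly: 7 * DAY_MS,
};

// Step 1: active sources whose next_scheduled_at is at or before `now`.
// Assumption: a source that has never been scheduled is treated as due.
function dueSources(rows: SourceRow[], now: Date): SourceRow[] {
  return rows.filter(
    (s) =>
      s.active &&
      (s.next_scheduled_at === null ||
        s.next_scheduled_at.getTime() <= now.getTime())
  );
}

// Step 3: record the successful scrape and schedule the next one.
function markScraped(source: SourceRow, scrapedAt: Date): SourceRow {
  const interval = INTERVAL_MS[source.scrape_frequency];
  return {
    ...source,
    last_scraped_at: scrapedAt,
    next_scheduled_at: new Date(scrapedAt.getTime() + interval),
  };
}

const runAt = new Date("2024-01-08T09:00:00Z");

function row(
  id: string,
  frequency: "daily" | "weekly",
  next: string | null,
  active: boolean
): SourceRow {
  return {
    id,
    url: `https://example.org/${id}`, // placeholder URL
    data_schema: null,
    scrape_frequency: frequency,
    last_scraped_at: null,
    next_scheduled_at: next === null ? null : new Date(next),
    active,
  };
}

const rows: SourceRow[] = [
  row("a", "weekly", "2024-01-08T00:00:00Z", true), // due
  row("b", "daily", "2024-01-09T00:00:00Z", true), // not yet due
  row("c", "daily", "2024-01-01T00:00:00Z", false), // disabled, so skipped
  row("d", "daily", null, true), // never scheduled, treated as due
];

const due = dueSources(rows, runAt);
const updated = markScraped(due[0], runAt);

console.log(due.map((s) => s.id)); // [ 'a', 'd' ]
console.log(updated.next_scheduled_at?.toISOString()); // 2024-01-15T09:00:00.000Z
```

Because `markScraped` is only called after a successful scrape, a failed source keeps its old `next_scheduled_at` and is picked up again on the next scheduler pass.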

Without a populated registry, no scraping occurs and no alerts are generated.
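
The change detection mentioned in the pipeline steps is not specified in detail. One plausible reading is a structural comparison between the fields a source is expected to expose and the fields a scrape actually returned. The sketch below assumes, purely for illustration, that `data_schema` boils down to a list of expected field names; `diffAgainstSchema`, `hasChanged`, and the field names are hypothetical.

```typescript
interface SchemaDiff {
  missing: string[]; // expected fields absent from the scraped record
  unexpected: string[]; // scraped fields the schema does not list
}

// Compare the expected field names with the keys of one scraped record.
function diffAgainstSchema(
  expectedFields: string[],
  record: { [key: string]: unknown }
): SchemaDiff {
  const actual = Object.keys(record);
  return {
    missing: expectedFields.filter((f) => actual.indexOf(f) === -1),
    unexpected: actual.filter((f) => expectedFields.indexOf(f) === -1),
  };
}

// Any difference in either direction counts as a change worth alerting on.
function hasChanged(diff: SchemaDiff): boolean {
  return diff.missing.length > 0 || diff.unexpected.length > 0;
}

const expectedFields = ["licence_number", "address", "expiry_date"];

// Placeholder record standing in for one row of scraped licensing data.
const scraped = {
  licence_number: "HMO-0001",
  address: "1 Example Street",
  licence_holder: "A. N. Other",
};

const diff = diffAgainstSchema(expectedFields, scraped);

console.log(diff.missing); // [ 'expiry_date' ]
console.log(diff.unexpected); // [ 'licence_holder' ]
console.log(hasChanged(diff)); // true
```

Reporting missing and unexpected fields separately lets an alert say what changed on the council's page, rather than only that something did.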

Setup Checklist

1. Run database migrations to create the `local_authorities` and `sources` tables
2. Run the seed file to populate initial UK council records
3. Verify records appear in the Admin UI at `/admin/sources`
4. Confirm the scrape scheduler is reading from the `sources` table
5. Add or update any sources not covered by the seed data
