HMO Data Scraper
Released in v1.0.16
Overview
The HMO Data Scraper is HMOwatch's core data-collection engine. It automatically discovers, scrapes, and continuously updates HMO (House in Multiple Occupation) registers published by local authorities across the UK, turning hundreds of fragmented council data sources into a single, up-to-date reference dataset.
What It Does
1. Local Authority Register Discovery
The scraper identifies every local authority in England, Scotland, Wales, and Northern Ireland that publishes an HMO register — covering both landlord registers and property registers. With 400+ councils each maintaining their own licensing records in different formats, discovery is a critical first step before any data can be collected.
2. Data Scraping
Once a register is identified, automated bots extract the relevant records. Registers may be published as:
- Searchable web pages
- Downloadable files (CSV, Excel, PDF)
- Open data APIs
The scraper handles each format and normalises the output into a consistent schema.
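As an illustration of the normalisation step, here is a minimal sketch for the CSV case only. The `HmoRecord` fields, the `COLUMN_MAPS` table, the council name "Exampleshire", and the function names are all assumptions made for this example; they are not HMOwatch's actual schema or code.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class HmoRecord:
    """Hypothetical normalised record; the real schema is not documented here."""
    council: str
    address: str
    postcode: str
    licence_holder: str


# Assumed per-council mapping: source column header -> schema field.
COLUMN_MAPS = {
    "Exampleshire": {
        "Property Address": "address",
        "Post Code": "postcode",
        "Licence Holder": "licence_holder",
    },
}


def normalise_postcode(raw: str) -> str:
    """Upper-case a UK postcode and put one space before the 3-character inward code."""
    compact = "".join(raw.split()).upper()
    return f"{compact[:-3]} {compact[-3:]}" if len(compact) > 3 else compact


def parse_csv_register(council: str, text: str) -> list[HmoRecord]:
    """Map one council's CSV export onto the shared record shape."""
    mapping = COLUMN_MAPS[council]
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        # Missing columns become empty strings rather than raising.
        fields = {
            target: (row.get(source) or "").strip()
            for source, target in mapping.items()
        }
        fields["postcode"] = normalise_postcode(fields["postcode"])
        records.append(HmoRecord(council=council, **fields))
    return records
```

Web pages, Excel, PDF, and API sources would each need their own parser, but under this design they would all return the same `HmoRecord` shape.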
3. Continuous Updates
Registers are not static — councils add new licences, revoke existing ones, and update property details on an ongoing basis. The scraper re-polls all sources on a rolling schedule so the data held in HMOwatch stays current without any manual intervention.
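One way to picture a rolling schedule is a priority queue ordered by each source's next due time. The sketch below is an assumption about how such a schedule could work, not a description of the actual scheduler; `PollSchedule`, its methods, and the source identifiers are hypothetical.

```python
import heapq
from datetime import datetime, timedelta


class PollSchedule:
    """Hypothetical rolling schedule: a min-heap of (next_due, source_id)."""

    def __init__(self):
        self._heap = []
        self._intervals = {}

    def add(self, source_id: str, interval: timedelta, first_due: datetime) -> None:
        if interval <= timedelta(0):
            raise ValueError("interval must be positive")
        self._intervals[source_id] = interval
        heapq.heappush(self._heap, (first_due, source_id))

    def pop_due(self, now: datetime) -> list[str]:
        """Return every source due at `now` and reschedule each one interval ahead."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            _, source_id = heapq.heappop(self._heap)
            due.append(source_id)
            # Rescheduled entries land strictly after `now`, so the loop terminates.
            heapq.heappush(self._heap, (now + self._intervals[source_id], source_id))
        return due
```

A worker loop would call `pop_due(datetime.now())` periodically and scrape whatever comes back, so sources with short intervals are simply returned more often.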
4. Aggregated Reference Layer
All scraped data is aggregated into a structured dataset that the rest of the application references. This powers:
- Property compliance lookups — check whether a specific address holds a valid HMO licence
- Landlord verification — confirm whether a landlord appears on a local authority register
- Change detection — identify when a property's licensing status changes between scrape cycles
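The first two lookups can be sketched as an in-memory index over the aggregated records. This is illustrative only: the `Licence` fields (including `expires`), the `LicenceIndex` class, and the matching rule are assumptions, and a production lookup would likely need fuzzier address matching.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Licence:
    """Hypothetical licence record; real registers vary in what they publish."""
    address: str
    postcode: str
    holder: str
    expires: date


def _norm(text: str) -> str:
    """Drop case, spacing, and punctuation so near-identical strings compare equal."""
    return "".join(ch for ch in text.upper() if ch.isalnum())


class LicenceIndex:
    def __init__(self, licences):
        self._by_address = {}
        self._holders = set()
        for lic in licences:
            key = (_norm(lic.address), _norm(lic.postcode))
            self._by_address.setdefault(key, []).append(lic)
            self._holders.add(_norm(lic.holder))

    def has_valid_licence(self, address: str, postcode: str, on: date) -> bool:
        """True if any licence at this address has not expired by the date `on`."""
        key = (_norm(address), _norm(postcode))
        return any(lic.expires >= on for lic in self._by_address.get(key, []))

    def is_registered_landlord(self, name: str) -> bool:
        return _norm(name) in self._holders
```

Exact matching on normalised strings will miss variants such as "St" versus "Street", which is one reason this should be read as a sketch rather than a working matcher.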
Data Coverage
| Source Type | Description |
|---|---|
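| Mandatory HMO registers | Large HMOs subject to mandatory licensing under national legislation (in England, 5+ occupants forming 2+ households, regardless of storeys; Wales retains the 3+ storey test) |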
| Additional licensing registers | Council-specific schemes covering smaller HMOs |
| Selective licensing registers | Broader private rented sector schemes where applicable |
How Data Is Kept Fresh
The scraper runs continuously in the background:
- Each local authority source is assigned a polling interval based on how frequently that council updates its register.
- On each poll, the scraper compares new data against the previously stored snapshot.
- Any additions, removals, or changes are written to the database and can trigger downstream alerts.
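The comparison step above can be sketched as a set difference over record keys. The sketch assumes each snapshot is a dict keyed by a stable licence reference; that key, the record fields, and the `diff_snapshots` name are hypothetical.

```python
def diff_snapshots(previous: dict[str, dict], current: dict[str, dict]) -> dict[str, list[str]]:
    """Compare two snapshots keyed by an assumed stable licence reference."""
    prev_keys, cur_keys = previous.keys(), current.keys()
    return {
        # Present now, absent before: newly issued or newly published licences.
        "added": sorted(cur_keys - prev_keys),
        # Present before, absent now: revoked, expired, or withdrawn entries.
        "removed": sorted(prev_keys - cur_keys),
        # Present in both, but at least one field differs.
        "changed": sorted(k for k in cur_keys & prev_keys if current[k] != previous[k]),
    }
```

For example, with `previous = {"L1": {...}, "L2": {"status": "active"}}` and `current = {"L2": {"status": "revoked"}, "L3": {...}}`, the result lists `L3` as added, `L1` as removed, and `L2` as changed. Each non-empty list is what would be written to the database and passed to downstream alerts.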
Limitations
- Some councils do not publish machine-readable registers; these sources are flagged for manual review.
- Data accuracy depends on councils keeping their own registers up to date.
- A newly published local authority scheme may not appear until the next scrape cycle.
Related Features
- Compliance Alerts: get notified when a property's licence status changes
- Property Lookup: search the aggregated dataset by address or postcode
- Landlord Register Search: query landlord-level licensing data across all councils