Teachable Scraper — Media Asset Migration
Overview
When importing a course from Teachable, the platform now downloads every media asset referenced in the course content and re-uploads it to the platform's own storage (Vercel Blob). All asset URLs in the imported content are rewritten to point to the new platform-hosted locations.
After a completed import, migrated content no longer depends on Teachable's CDN. The only exceptions are assets whose download failed, which keep their original URL (see Behaviour During Import).
What Gets Migrated
The scraper identifies and migrates the following asset types during an import:
| Asset Type | Examples |
|---|---|
| Images | Inline images embedded in lesson pages, section headers, descriptions |
| Document attachments | PDFs, slide decks, and other downloadable files attached to lessons |
Video links (e.g. Wistia, Vimeo, YouTube embeds) are handled separately and are not downloaded — only their embed references are preserved.
How It Works
1. Asset Extraction
During the scrape phase, the import engine traverses the full course structure — lectures, lesson pages, and attachment lists — and collects every media URL originating from Teachable's CDN.
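As an illustration of the extraction step, the sketch below collects candidate asset URLs from a single HTML fragment. It is not the import engine's actual code: the function name `extractCdnUrls` and the host `cdn.teachable.example` are assumptions made for this example, since the real CDN host names are not listed here. A regular expression is used for brevity; a production scraper would more likely walk a parsed DOM.

```typescript
// Hypothetical host name. The real Teachable CDN host(s) are not given in this document.
const ASSUMED_CDN_HOSTS = ["cdn.teachable.example"];

// Collect the unique src/href URLs in an HTML fragment whose host is in cdnHosts.
// URLs are returned exactly as written so they can later be matched verbatim.
function extractCdnUrls(html: string, cdnHosts: string[] = ASSUMED_CDN_HOSTS): string[] {
  const found = new Set<string>();
  // Matches src="..." or href="..." with single or double quotes.
  const attrPattern = /\b(?:src|href)\s*=\s*(["'])(.*?)\1/gi;
  let match: RegExpExecArray | null;
  while ((match = attrPattern.exec(html)) !== null) {
    const raw = match[2];
    let hostname: string;
    try {
      hostname = new URL(raw).hostname;
    } catch {
      continue; // relative or malformed URL, not a CDN asset
    }
    if (cdnHosts.includes(hostname)) {
      found.add(raw);
    }
  }
  return Array.from(found);
}
```

Video embeds hosted elsewhere (for example a Vimeo player `iframe`) fall through the host check, which matches the rule that only their embed references are preserved.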
2. Download from Teachable CDN
Each discovered asset is downloaded directly from the Teachable CDN at import time. This happens as part of the import job and requires no manual action.
3. Re-upload to Vercel Blob
Downloaded assets are uploaded to Vercel Blob under a path prefix scoped to the importing organization. The path structure isolates each tenant's assets:
/<org-id>/courses/<course-id>/assets/<filename>
This multi-tenant isolation ensures that assets belonging to one organization are never accessible under another organization's path.
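A minimal sketch of how such a path could be assembled and checked is shown below. The helper names `buildAssetPath` and `belongsToOrg` are hypothetical, and the validation rules are an assumption about what a safe implementation would reject, not a description of the platform's code.

```typescript
// Build a pathname of the form /<org-id>/courses/<course-id>/assets/<filename>.
function buildAssetPath(orgId: string, courseId: string, filename: string): string {
  for (const segment of [orgId, courseId, filename]) {
    // Reject empty segments, dot segments, and separators that could escape the tenant prefix.
    if (segment === "" || segment === "." || segment === ".." || /[\/\\]/.test(segment)) {
      throw new Error(`Invalid path segment: ${JSON.stringify(segment)}`);
    }
  }
  return `/${orgId}/courses/${courseId}/assets/${filename}`;
}

// True when the pathname sits under the given organization's prefix.
// The trailing slash prevents "org_1" from matching "org_10".
function belongsToOrg(pathname: string, orgId: string): boolean {
  return pathname.startsWith(`/${orgId}/`);
}
```

Validating each segment before concatenation matters because a file name such as `../x.pdf` taken straight from a source URL could otherwise resolve outside the organization's prefix.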
4. URL Rewriting
Once an asset is successfully uploaded, every occurrence of the original Teachable CDN URL in the imported content is replaced with the new Vercel Blob URL. This rewrite is applied across:
- Lesson body content (HTML/rich text)
- Section and lecture descriptions
- Attachment metadata
For every asset that migrated successfully, the imported content stored in the platform no longer references Teachable's CDN.
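The rewrite step described above can be sketched as a plain string substitution driven by a map from source URL to new URL. The function name `rewriteAssetUrls` and the example hosts used with it are assumptions for illustration only.

```typescript
// Replace every occurrence of each migrated source URL with its new URL.
// Only URLs present in the map are touched; anything else is left as-is.
function rewriteAssetUrls(content: string, migrated: Map<string, string>): string {
  // Process longer URLs first so a URL that is a prefix of another cannot clobber it.
  const sources = Array.from(migrated.keys()).sort((a, b) => b.length - a.length);
  let result = content;
  for (const source of sources) {
    const target = migrated.get(source);
    if (target === undefined) continue;
    // split/join replaces all occurrences without regex escaping concerns.
    result = result.split(source).join(target);
  }
  return result;
}
```

The same function could be applied to lesson bodies, descriptions, and serialized attachment metadata, since each of these is ultimately a string containing the original URL.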
Storage & Tenancy
- Assets are stored in Vercel Blob and served via the platform's own CDN-backed URLs.
- Each organization's assets are namespaced under their unique organization ID, enforcing tenant isolation at the storage layer.
- There is no shared storage between organizations — an asset uploaded during one organization's import is not accessible from another organization's namespace.
Behaviour During Import
- Asset download and re-upload occur automatically as part of the Teachable import job. No additional configuration is required.
- If an individual asset download fails (e.g. the source URL is no longer accessible on Teachable's CDN), the import job records the failure for that asset and continues processing remaining content. The original URL is retained for any asset that could not be migrated.
- The import is considered complete only after all reachable assets have been processed and URLs rewritten.
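The failure behaviour above can be sketched as a loop that records a failed asset and moves on. This is an illustration under stated assumptions, not the import job's real code: `migrateAssets`, `Download`, `Upload`, and `MigrationResult` are hypothetical names, and the download and upload steps are injected as functions so the sketch stays independent of any particular HTTP client or storage SDK.

```typescript
type Download = (url: string) => Promise<Uint8Array>;
// Resolves to the new platform-hosted URL for the uploaded asset.
type Upload = (pathname: string, body: Uint8Array) => Promise<string>;

interface MigrationResult {
  migrated: Map<string, string>;               // source URL -> new URL
  failures: { url: string; reason: string }[]; // assets that keep their original URL
}

async function migrateAssets(
  urls: string[],
  toPathname: (url: string) => string,
  download: Download,
  upload: Upload,
): Promise<MigrationResult> {
  const result: MigrationResult = { migrated: new Map(), failures: [] };
  for (const url of urls) {
    try {
      const body = await download(url);
      result.migrated.set(url, await upload(toPathname(url), body));
    } catch (err) {
      // Record the failure and continue; this URL is never added to the rewrite map.
      result.failures.push({ url, reason: err instanceof Error ? err.message : String(err) });
    }
  }
  return result;
}
```

Because only successful uploads enter `migrated`, a later rewrite pass leaves failed assets on their original URL. In a real deployment the `upload` argument might be a thin wrapper around the `put` function from `@vercel/blob`, though that wiring is an assumption and is not shown here.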
Before This Release
Prior to v1.0.18, the Teachable scraper imported course structure, lesson text, and embedded media references, but left all image and document URLs pointing at Teachable's CDN. This meant:
- Imported content remained dependent on the source Teachable school staying active.
- Assets could become inaccessible if the Teachable school was unpublished, paused, or deleted.
- Organizations did not own or control the assets embedded in their imported courses.
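As of v1.0.18, images and document attachments are migrated to platform-hosted storage at import time and stored under the importing organization's namespace, so imported courses no longer depend on the source Teachable school for those assets.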