HTML-to-TipTap JSON Content Transformer
Version 1.0.27 introduces an automated transformer that converts raw HTML scraped from Teachable lesson pages into the structured TipTap / ProseMirror JSON document format used by the platform's lesson content schema.
Why This Exists
When the Teachable import engine scrapes a lesson page, the lesson body is returned as raw HTML. The platform's lesson editor and renderer do not operate on raw HTML; they use the TipTap document model, which is built on ProseMirror's JSON document format. The transformer bridges this gap so that imported content is usable immediately, without manual editing or reformatting.
How It Works
The transformer receives an HTML string (the scraped lesson body) and outputs a ProseMirror-compatible JSON document object. The process is:
- Parse: The raw HTML string is parsed into a DOM tree.
- Traverse: Each HTML node is visited recursively.
- Map: Recognised elements and inline marks are converted to their TipTap equivalents.
- Output: A valid TipTap `doc` JSON object is returned and stored in the lesson content schema.
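
The steps above can be sketched in a few dozen lines. This is a minimal illustration, not the platform's implementation: the names `TipTapSketch` and `html_to_tiptap` are hypothetical, and instead of building a DOM tree and traversing it, the sketch uses Python's event-based `html.parser` and builds the output while parsing. It assumes that loose text inside a list item is wrapped in a paragraph (as in the example output in this document) and that unrecognised tags are dropped while their text is kept.

```python
import json
from html.parser import HTMLParser

# Mapping tables copied from the "Supported Elements" section of this document.
BLOCKS = {"p": "paragraph", "ul": "bulletList", "ol": "orderedList", "li": "listItem"}
BLOCKS.update({f"h{level}": "heading" for level in range(1, 7)})
MARKS = {"strong": "bold", "b": "bold", "em": "italic", "i": "italic", "code": "code"}
TEXTBLOCKS = {"paragraph", "heading"}  # node types that hold text directly


class TipTapSketch(HTMLParser):
    """Hypothetical transformer: builds a TipTap doc while the HTML is parsed."""

    def __init__(self):
        super().__init__()
        self.doc = {"type": "doc", "content": []}
        self.stack = [self.doc]  # open block nodes, innermost last
        self.marks = []          # active mark names, outermost first

    def handle_starttag(self, tag, attrs):
        if tag in BLOCKS:
            # A new block closes any paragraph or heading that is still open.
            if self.stack[-1]["type"] in TEXTBLOCKS:
                self.stack.pop()
            node = {"type": BLOCKS[tag], "content": []}
            if node["type"] == "heading":
                node["attrs"] = {"level": int(tag[1])}
            self.stack[-1]["content"].append(node)
            self.stack.append(node)
        elif tag in MARKS:
            self.marks.append(MARKS[tag])

    def handle_endtag(self, tag):
        if tag in BLOCKS:
            # Close the nearest open node of this type, plus anything nested in it.
            for i in range(len(self.stack) - 1, 0, -1):
                if self.stack[i]["type"] == BLOCKS[tag]:
                    del self.stack[i:]
                    break
        elif tag in MARKS and MARKS[tag] in self.marks:
            self.marks.remove(MARKS[tag])

    def handle_data(self, data):
        parent = self.stack[-1]
        if parent["type"] not in TEXTBLOCKS:
            # Skip whitespace between blocks and stray text directly inside lists.
            if parent["type"] in ("bulletList", "orderedList") or not data.strip():
                return
            # Assumption: loose text in doc or listItem gets an implicit paragraph.
            paragraph = {"type": "paragraph", "content": []}
            parent["content"].append(paragraph)
            self.stack.append(paragraph)
            parent = paragraph
        node = {"type": "text", "text": data}
        if self.marks:
            # dict.fromkeys de-duplicates while keeping outermost-first order.
            node["marks"] = [{"type": name} for name in dict.fromkeys(self.marks)]
        parent["content"].append(node)


def html_to_tiptap(html: str) -> dict:
    parser = TipTapSketch()
    parser.feed(html)
    parser.close()  # flush any trailing text
    return parser.doc


print(json.dumps(html_to_tiptap("<h2>Intro</h2><ul><li>First point</li></ul>"), indent=2))
```

Keeping a stack of open block nodes means closing an `<li>` also closes the implicit paragraph created for its text, so the nesting matches the `listItem` > `paragraph` > `text` shape shown in the output example.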
Supported Elements
Block Nodes
| HTML Element | TipTap Node Type | Notes |
|---|---|---|
| `<h1>` – `<h6>` | `heading` | Heading level (1–6) is preserved in the `level` attribute |
| `<p>` | `paragraph` | Standard paragraph node |
| `<ul>` | `bulletList` | Contains `listItem` nodes |
| `<ol>` | `orderedList` | Contains `listItem` nodes |
| `<li>` | `listItem` | Child of `bulletList` or `orderedList` |
Inline Marks
| HTML Element | TipTap Mark | Notes |
|---|---|---|
| `<strong>`, `<b>` | `bold` | Applied as an inline mark on text |
| `<em>`, `<i>` | `italic` | Applied as an inline mark on text |
| `<code>` | `code` | Inline code only (not fenced code blocks) |
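
Marks are attached to text nodes rather than wrapping them, so nested inline tags become several entries in one `marks` array. The sketch below illustrates that idea; `marks_for` and `text_node` are hypothetical helpers, and both the outermost-first ordering and the de-duplication of equivalent tags (for example `<b>` inside `<strong>`) are assumptions, not documented behaviour of the transformer.

```python
# Tag-to-mark table taken from the "Inline Marks" table in this document.
MARK_FOR_TAG = {"strong": "bold", "b": "bold", "em": "italic", "i": "italic", "code": "code"}


def marks_for(ancestor_tags):
    """Return TipTap marks for a text node, given its enclosing tags (outermost first)."""
    marks = []
    for tag in ancestor_tags:
        mark = MARK_FOR_TAG.get(tag)  # non-mark tags such as "p" are skipped
        if mark is not None and {"type": mark} not in marks:
            marks.append({"type": mark})
    return marks


def text_node(text, ancestor_tags):
    """Build a text node; the "marks" key is omitted when no mark applies."""
    node = {"type": "text", "text": text}
    marks = marks_for(ancestor_tags)
    if marks:
        node["marks"] = marks
    return node


# Text from <p><strong><em>key point</em></strong></p>
print(text_node("key point", ["p", "strong", "em"]))
```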
Output Format
The transformer produces a standard TipTap/ProseMirror top-level document node. Example:
Input HTML:
```html
<h2>Welcome to the course</h2>
<p>In this lesson you will learn <strong>core concepts</strong> and <em>best practices</em>.</p>
<ul>
  <li>Concept one</li>
  <li>Concept two</li>
</ul>
```
Output TipTap JSON:
```json
{
  "type": "doc",
  "content": [
    {
      "type": "heading",
      "attrs": { "level": 2 },
      "content": [
        { "type": "text", "text": "Welcome to the course" }
      ]
    },
    {
      "type": "paragraph",
      "content": [
        { "type": "text", "text": "In this lesson you will learn " },
        { "type": "text", "text": "core concepts", "marks": [{ "type": "bold" }] },
        { "type": "text", "text": " and " },
        { "type": "text", "text": "best practices", "marks": [{ "type": "italic" }] },
        { "type": "text", "text": "." }
      ]
    },
    {
      "type": "bulletList",
      "content": [
        {
          "type": "listItem",
          "content": [
            { "type": "paragraph", "content": [{ "type": "text", "text": "Concept one" }] }
          ]
        },
        {
          "type": "listItem",
          "content": [
            { "type": "paragraph", "content": [{ "type": "text", "text": "Concept two" }] }
          ]
        }
      ]
    }
  ]
}
```
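
A document in this shape is easy to sanity-check by walking it recursively. The sketch below is one possible check, not part of the transformer: `find_problems` is a hypothetical helper, and the allowed node and mark names are simply those listed under Supported Elements.

```python
# Node and mark names taken from the "Supported Elements" tables, plus "doc" and "text".
NODE_TYPES = {"doc", "heading", "paragraph", "bulletList", "orderedList", "listItem", "text"}
MARK_TYPES = {"bold", "italic", "code"}


def find_problems(node, path="doc"):
    """Return a list of human-readable problems found in a TipTap JSON node."""
    problems = []
    kind = node.get("type")
    if kind not in NODE_TYPES:
        problems.append(f"{path}: unexpected node type {kind!r}")
    if kind == "heading" and node.get("attrs", {}).get("level") not in range(1, 7):
        problems.append(f"{path}: heading level must be between 1 and 6")
    if kind == "text":
        if not node.get("text"):
            problems.append(f"{path}: empty text node")
        for mark in node.get("marks", []):
            if mark.get("type") not in MARK_TYPES:
                problems.append(f"{path}: unexpected mark {mark.get('type')!r}")
    # Recurse into children, extending the path so problems are easy to locate.
    for index, child in enumerate(node.get("content", [])):
        problems.extend(find_problems(child, f"{path}.content[{index}]"))
    return problems


sample = {
    "type": "doc",
    "content": [
        {"type": "heading", "attrs": {"level": 2},
         "content": [{"type": "text", "text": "Welcome to the course"}]},
        {"type": "codeBlock", "content": [{"type": "text", "text": "x = 1"}]},
    ],
}
for problem in find_problems(sample):
    print(problem)
```

An empty result means every node and mark in the document belongs to the supported set; the `codeBlock` node in the sample is reported because this release does not produce that node type.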
Integration with the Teachable Import Engine
The transformer runs automatically as part of the Teachable scrape-and-import pipeline:
- The scraper extracts the lesson page HTML from Teachable.
- The HTML is passed through the transformer.
- The resulting TipTap JSON document is saved to the lesson's `content` field in the platform database.
- The lesson is immediately renderable and editable in the platform's lesson editor, with no further migration steps required.
Limitations
- Fenced / block-level code: Only inline `<code>` elements are supported in this release. Block-level `<pre><code>` constructs are not yet mapped to a `codeBlock` node.
- Tables: `<table>` elements are not transformed in this release.
- Embedded media: `<img>`, `<iframe>`, and video embeds are handled by separate pipeline stages and are not in scope for this transformer.
- Custom Teachable widgets: Proprietary Teachable UI components that render as non-standard HTML may not be preserved.
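
Because of these limitations, it can be useful to flag lessons whose HTML contains elements this transformer does not handle, so they can be reviewed after import. The following is a rough sketch of such a pre-check using Python's standard `html.parser`; `unsupported_tags` and the `OUT_OF_SCOPE` set are hypothetical, and treating video embeds as `<video>` tags is an assumption. Custom Teachable widgets are not detected, since their markup is not specified here.

```python
from collections import Counter
from html.parser import HTMLParser

# Tags named in the limitations above; "video" is an assumed stand-in for video embeds.
OUT_OF_SCOPE = {"pre", "table", "img", "iframe", "video"}


class OutOfScopeCounter(HTMLParser):
    """Counts start tags that this transformer release does not convert."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag names; self-closing tags also land here.
        if tag in OUT_OF_SCOPE:
            self.counts[tag] += 1


def unsupported_tags(html: str) -> dict:
    """Return a mapping of out-of-scope tag name to number of occurrences."""
    parser = OutOfScopeCounter()
    parser.feed(html)
    parser.close()
    return dict(parser.counts)


lesson_html = "<p>Setup</p><pre><code>pip install x</code></pre><img src='diagram.png'>"
print(unsupported_tags(lesson_html))
```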