Part 3 — Parsing WordPress HTML and Enriching Embeds
WordPress posts arrive as raw HTML strings. A single post might contain paragraphs, headings, YouTube iframes, Spotify embeds, Instagram blockquotes, image galleries, internal post references, and ordered lists — all mixed together, none of it consistent.
You can't just dump that into a WebView and call it a day. Well, you could. But the result looks like a website inside an app, it doesn't respect your theme, dark mode breaks, fonts are wrong, and interactions feel foreign. For a magazine app where the reading experience is the whole point, that's not acceptable.
The approach: a custom HTML parser that converts raw WordPress HTML into a typed List<ContentPart> that Compose can render natively.
ContentPart — The Typed Model
The first thing to build is the sealed interface that represents every possible block of content:
sealed interface ContentPart {
    data class Text(val content: AnnotatedString) : ContentPart
    data class Heading(val content: AnnotatedString, val level: Int) : ContentPart
    data class Image(val url: String, val alt: String?) : ContentPart
    data class Gallery(val imageUrls: List<String>) : ContentPart
    data class Video(val thumbnailUrl: String, val videoUrl: String) : ContentPart
    data class Spotify(
        val embedUrl: String,
        val title: String? = null,
        val thumbnailUrl: String? = null
    ) : ContentPart
    data class InstagramPost(val url: String, val username: String?) : ContentPart
    data class InternalPost(val slug: String, val title: String?) : ContentPart
    data class UnorderedList(val items: List<AnnotatedString>) : ContentPart
    data class OrderedList(val items: List<AnnotatedString>) : ContentPart
    data class CallToAction(val text: String, val url: String) : ContentPart
    // A data class needs at least one constructor parameter, so the
    // parameterless divider is a data object instead.
    data object Divider : ContentPart
    data class Html(val content: String) : ContentPart
}
Html is the escape hatch — anything the parser doesn't recognize yet goes there. It renders as-is with a fallback composable. Over time, as new block types show up in the content, they get their own ContentPart subtype and a proper UI.
The Parser — Strategy Pattern
The parser uses a BlockHandler interface:
interface BlockHandler {
    fun canHandle(tagName: String): Boolean
    fun handle(tagName: String, content: String): List<ContentPart>
}
Each handler knows how to process one type of block. The main HtmlParser finds block-level tags with a regex, extracts the full block content (with a depth counter for nested tags), and routes to the right handler:
val blockRegex = Regex(
    "<(p|img|iframe|figure|hr|ul|ol|h[1-6])(?:\\s[^>]*)?>",
    RegexOption.IGNORE_CASE
)
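The depth counter mentioned above can be sketched roughly as follows. This is a minimal illustration, not the parser's actual code: `findBlockEnd` and `voidTags` are hypothetical names, and the sketch assumes well-formed closing tags and treats only `img` and `hr` as void elements.

```kotlin
// Hypothetical: tags from the block regex that never have a closing tag.
val voidTags = setOf("img", "hr")

// Returns the index just past the closing tag that matches the opening tag
// starting at `openStart`, or -1 if the markup is unbalanced.
fun findBlockEnd(html: String, tagName: String, openStart: Int): Int {
    val openEnd = html.indexOf('>', openStart)
    if (openEnd == -1) return -1
    // Void elements end with the opening tag itself.
    if (tagName.lowercase() in voidTags) return openEnd + 1

    // Matches either another opening <tag ...> or a closing </tag>.
    val tagRegex = Regex("<(/?)$tagName(?:\\s[^>]*)?>", RegexOption.IGNORE_CASE)
    var depth = 1
    var cursor = openEnd + 1
    while (depth > 0) {
        val match = tagRegex.find(html, cursor) ?: return -1
        // A nested opening tag goes one level deeper, a closing tag comes back up.
        depth += if (match.groupValues[1] == "/") -1 else 1
        cursor = match.range.last + 1
    }
    return cursor
}
```

Without the counter, a nested list such as `<ul><li><ul>...</ul></li></ul>` would be cut off at the first `</ul>`.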
The handlers are registered as a list and checked in order:
fun defaultHandlers(): List<BlockHandler> = listOf(
    DividerBlockHandler,
    ImageBlockHandler,
    TextBlockHandler,
    EmbedBlockHandler,
    CallToActionBlockHandler,
    ListBlockHandler
)
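One way the "checked in order" routing could look is sketched below. The types here (`Part`, `Handler`, `DividerHandler`, `ParagraphHandler`, `route`) are simplified stand-ins invented for this example so it compiles without Compose; the first-match rule and the `Html` fallback for unknown tags are assumptions based on the description above.

```kotlin
// Simplified stand-ins for ContentPart and BlockHandler (plain strings
// instead of AnnotatedString) so the sketch is self-contained.
sealed interface Part {
    data object Divider : Part
    data class Text(val content: String) : Part
    data class Html(val content: String) : Part
}

interface Handler {
    fun canHandle(tagName: String): Boolean
    fun handle(tagName: String, content: String): List<Part>
}

object DividerHandler : Handler {
    override fun canHandle(tagName: String) = tagName == "hr"
    override fun handle(tagName: String, content: String) = listOf<Part>(Part.Divider)
}

object ParagraphHandler : Handler {
    override fun canHandle(tagName: String) = tagName == "p"
    // Strips inline tags; the real handler would build an AnnotatedString.
    override fun handle(tagName: String, content: String): List<Part> =
        listOf(Part.Text(content.replace(Regex("<[^>]+>"), "").trim()))
}

// The first handler that accepts the tag wins; anything unclaimed falls
// through to the Html escape hatch.
fun route(handlers: List<Handler>, tagName: String, content: String): List<Part> {
    val tag = tagName.lowercase()
    val handler = handlers.firstOrNull { it.canHandle(tag) }
    return handler?.handle(tag, content) ?: listOf(Part.Html(content))
}
```

Because the first match wins, list order matters whenever two handlers could claim the same tag.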
A KMP-specific gotcha
In common code, Kotlin's RegexOption does not include DOT_MATCHES_ALL; that option is JVM-only. On JVM you can use RegexOption.DOT_MATCHES_ALL or the inline (?s) flag, which corresponds to Pattern.DOTALL. In KMP the portable fix is replacing . with [\s\S] in any regex that needs to match across newlines. Worth knowing before you spend an hour wondering why your regex works in unit tests on JVM but silently fails on iOS.
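A small sketch of the difference, using only options available in common code. The sample HTML and the helper name `paragraphBody` are made up for illustration.

```kotlin
val html = "<p>first line\nsecond line</p>"

// `.` does not match line breaks by default, so this cannot span the newline.
val dotPattern = Regex("<p>(.*?)</p>")

// `[\s\S]` means "whitespace or non-whitespace", i.e. any character at all,
// and needs no platform-specific option.
val portablePattern = Regex("<p>([\\s\\S]*?)</p>")

// Returns the captured paragraph body, or null when the pattern does not match.
fun paragraphBody(pattern: Regex, input: String): String? =
    pattern.find(input)?.groupValues?.get(1)
```

Here `paragraphBody(dotPattern, html)` finds nothing, while `paragraphBody(portablePattern, html)` captures both lines.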
The Tricky Cases
Most block types are straightforward — a <p> becomes ContentPart.Text, an <img> becomes ContentPart.Image, an <hr> becomes ContentPart.Divider. The interesting ones are the embeds.
YouTube — lazy loading
WordPress with the custom theme uses lazy loading on iframes. The actual URL isn't in src — it's in data-lazy-src:
<iframe src="about:blank"
data-lazy-src="https://www.youtube.com/embed/{videoId}">
Any image validation logic needs to filter out about:blank URLs or you'll end up with broken thumbnails. The EmbedBlockHandler checks data-lazy-src first, then falls back to src.
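The lookup order described above might be sketched like this. `attributeValue` and `resolveIframeUrl` are hypothetical helpers, not the names used in EmbedBlockHandler, and the sketch assumes attribute values are always quoted.

```kotlin
// Hypothetical helper: reads one quoted attribute value from a tag's markup.
// The leading (?:^|\s) keeps "src" from matching inside "data-lazy-src".
fun attributeValue(tag: String, name: String): String? =
    Regex(
        "(?:^|\\s)" + Regex.escape(name) + "\\s*=\\s*[\"']([^\"']*)[\"']",
        RegexOption.IGNORE_CASE
    ).find(tag)?.groupValues?.get(1)

// Prefers data-lazy-src, falls back to src, and ignores about:blank placeholders.
fun resolveIframeUrl(iframeTag: String): String? =
    listOf("data-lazy-src", "src")
        .mapNotNull { attributeValue(iframeTag, it) }
        .firstOrNull { it.isNotBlank() && !it.startsWith("about:blank") }
```

Returning null when only the placeholder is present lets the caller skip the block instead of rendering a broken thumbnail.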
For YouTube we don't embed the player — we show a thumbnail with a play button that opens the YouTube app via a confirmation dialog. Clean, fast, no WebView required.
Spotify — oEmbed enrichment
Spotify embeds arrive as an iframe with just the embed URL. Not very exciting to look at as a card. But Spotify has a public oEmbed endpoint that returns the track/album/playlist title and thumbnail — no authentication required:
GET https://open.spotify.com/oembed?url={spotifyUrl}
The SpotifyContentPart composable fetches this data with a LaunchedEffect and shows a skeleton while it loads. The result is a card with the actual album art and track title — something worth tapping.
The decision of where to make this request was interesting. It could live in the ViewModel, pre-fetched for all Spotify parts before rendering. But that means waiting for N requests before showing anything. Instead, each SpotifyContentPart fetches its own data independently using koinInject<SpotifyService>() directly in the composable. The cards load progressively as you scroll.
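For reference, building the oEmbed request URL could look like the sketch below. The helper names are hypothetical, and the hand-rolled percent-encoding is only there to keep the example free of dependencies; an HTTP client's query-parameter builder would normally handle the encoding.

```kotlin
// Percent-encodes everything except RFC 3986 unreserved characters.
fun percentEncode(value: String): String {
    val unreserved =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
    val hex = "0123456789ABCDEF"
    val out = StringBuilder()
    for (byte in value.encodeToByteArray()) {
        val code = byte.toInt() and 0xFF
        val ch = Char(code)
        if (ch in unreserved) {
            out.append(ch)
        } else {
            // Each remaining UTF-8 byte becomes %XX.
            out.append('%').append(hex[code shr 4]).append(hex[code and 0x0F])
        }
    }
    return out.toString()
}

// Hypothetical helper: the Spotify URL goes into the `url` query parameter.
fun spotifyOEmbedUrl(spotifyUrl: String): String =
    "https://open.spotify.com/oembed?url=" + percentEncode(spotifyUrl)
```

The Spotify URL has to be encoded because it contains `:` and `/`, which would otherwise be ambiguous inside a query string.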
Instagram — blockquote detection
Instagram blocks thumbnail URLs from third-party apps — or so I thought.
Instagram actually has a semi-public thumbnail URL pattern:
https://www.instagram.com/p/{shortcode}/media/?size=l
We extract the shortcode from the embed URL and attempt to load it with SubcomposeAsyncImage. If Instagram blocks the request (which it does inconsistently), the error slot shows a fallback placeholder with the Instagram gradient branding. If it loads — and it does load sometimes — you get the actual post image with the aspect ratio adjusted dynamically using onSuccess to read the painter's intrinsicSize.
onSuccess = { state ->
    val width = state.painter.intrinsicSize.width
    val height = state.painter.intrinsicSize.height
    if (width > 0 && height > 0) {
        imageAspectRatio = width / height
    }
}
The container height starts at 220.dp as a fallback and adjusts to the real aspect ratio once the image loads. Not perfect, but better than a static placeholder.
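The shortcode extraction and thumbnail URL described above can be sketched as follows. The function names are hypothetical, and the regex assumes URLs of the /p/{shortcode} form shown earlier.

```kotlin
// Pulls the shortcode out of a post or embed URL such as
// https://www.instagram.com/p/{shortcode}/
fun instagramShortcode(url: String): String? =
    Regex("instagram\\.com/p/([A-Za-z0-9_-]+)").find(url)?.groupValues?.get(1)

// Builds the semi-public thumbnail URL, or null when no shortcode is found.
fun instagramThumbnailUrl(url: String): String? =
    instagramShortcode(url)?.let { "https://www.instagram.com/p/$it/media/?size=l" }
```

A null result can go straight to the same fallback placeholder used when Instagram blocks the request.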
Internal post references
The magazine embeds links to related articles within post content using a custom WordPress block class: wp-block-embed-my-wp-site. The EmbedBlockHandler detects this and extracts the slug from the href:
if (content.contains("wp-block-embed-my-wp-site")) {
    val slug = extractSiteSlug(content) ?: return emptyList()
    val title = extractEmbedTitle(content)
    return listOf(ContentPart.InternalPost(slug = slug, title = title))
}
At render time, InternalPostContentPart resolves the slug to a post ID via the repository — first checking the local DB, then the API if not cached — and navigates to that post on tap using the AppNavigator singleton from Part 2.
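The article does not show how extractSiteSlug works, so the following is only a guess at one possible approach: take the first href in the block and use its last path segment. `slugFromHref` is a hypothetical name, and the sketch assumes permalinks end in the slug and that slugs never contain a dot.

```kotlin
// Hypothetical slug extraction: last non-empty path segment of the first href.
fun slugFromHref(content: String): String? {
    val href = Regex("href\\s*=\\s*[\"']([^\"']+)[\"']", RegexOption.IGNORE_CASE)
        .find(content)?.groupValues?.get(1) ?: return null
    return href
        .substringBefore('#')      // drop any fragment
        .substringBefore('?')      // drop any query string
        .trimEnd('/')              // permalinks usually end with a slash
        .substringAfterLast('/')
        // A bare domain (no path) leaves a host name here, so reject dots.
        .takeIf { it.isNotBlank() && !it.contains('.') }
}
```

Returning null for links without a usable slug matches the early `return emptyList()` in the handler above.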
Image Grouping
WordPress content frequently includes multiple consecutive images — a photo series from a concert, an event gallery. Rendering them one-by-one as full-width images takes forever to scroll through.
After parsing, the ViewModel runs a post-processing step that groups three or more consecutive ContentPart.Image items into a ContentPart.Gallery:
private fun groupConsecutiveImages(
    parts: List<ContentPart>,
    minCount: Int = 3
): List<ContentPart> {
    val result = mutableListOf<ContentPart>()
    var i = 0
    while (i < parts.size) {
        val part = parts[i]
        if (part is ContentPart.Image) {
            // Find the end (exclusive) of the current run of images.
            var j = i
            while (j < parts.size && parts[j] is ContentPart.Image) j++
            val count = j - i
            if (count >= minCount) {
                // Long enough run: collapse it into a single gallery.
                val images = parts.subList(i, j)
                    .filterIsInstance<ContentPart.Image>()
                    .map { it.url } // url is non-null, so map is enough
                result.add(ContentPart.Gallery(imageUrls = images))
                i = j
            } else {
                // Short run: keep the images as individual parts.
                repeat(count) { result.add(parts[i + it]) }
                i += count
            }
        } else {
            result.add(part)
            i++
        }
    }
    return result
}
The GalleryContentPart composable renders these with different layouts depending on count — 3 images get a "hero + two below" layout, 4 get a "large left + three stacked right" layout, and 5+ get a mosaic with a +N counter on the last tile. Tapping any image or the counter opens a full-screen gallery viewer with pinch-to-zoom and swipe navigation.
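The count-based layout choice could be modelled along these lines. Everything here is an assumption for illustration: the names `GalleryLayout`, `GalleryPlan` and `planGallery`, and especially the `maxMosaicTiles` default of 5, which the article does not specify.

```kotlin
enum class GalleryLayout { HeroPlusTwo, LargeLeftPlusThree, Mosaic }

// `overflow` is the N shown in the "+N" counter on the last visible tile.
data class GalleryPlan(val layout: GalleryLayout, val visibleTiles: Int, val overflow: Int)

fun planGallery(imageCount: Int, maxMosaicTiles: Int = 5): GalleryPlan {
    require(imageCount >= 3) { "Galleries are only built from 3+ images" }
    return when (imageCount) {
        3 -> GalleryPlan(GalleryLayout.HeroPlusTwo, visibleTiles = 3, overflow = 0)
        4 -> GalleryPlan(GalleryLayout.LargeLeftPlusThree, visibleTiles = 4, overflow = 0)
        else -> {
            // Assumed cap on visible tiles; the rest is counted as overflow.
            val visible = minOf(imageCount, maxMosaicTiles)
            GalleryPlan(GalleryLayout.Mosaic, visible, overflow = imageCount - visible)
        }
    }
}
```

Keeping this decision in a plain function makes it testable without rendering any UI.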
The Result
After parsing, a typical post produces something like:
ContentPart.Text("La cantante belga Angèle ha regresado...")
ContentPart.InstagramPost(url = "https://www.instagram.com/p/DVTyrbVCPXe/")
ContentPart.Text("El video musical, creado con (LA)HORDE...")
ContentPart.Video(thumbnailUrl = "...", videoUrl = "...")
Each type gets its own composable renderer. The LazyColumn in PostDetailContent iterates over the list and dispatches to the right component. No WebViews, no HTML rendering engines, no platform-specific workarounds — just Compose all the way down.
What's Next
In Part 4 we cover the UI layer — the PostHero component with the collapsible top bar, custom theming with Anton and Space Grotesk fonts, platform-specific transitions, and the Material3 shapes gotcha that will catch you off guard if you expect the theme to propagate automatically.
Stack: Kotlin 2.3.10 · Compose Multiplatform 1.10.2 · Ktor 3.4.1 · SQLDelight 2.3.1 · Koin 4.1.1








