Raul Arroyo

Coming Back to Kotlin: Building a Real App with KMP Part 3 — From Raw HTML to Native Content

Part 3 — Parsing WordPress HTML and Enriching Embeds


WordPress posts arrive as raw HTML strings. A single post might contain paragraphs, headings, YouTube iframes, Spotify embeds, Instagram blockquotes, image galleries, internal post references, and ordered lists — all mixed together, none of it consistent.

You can't just dump that into a WebView and call it a day. Well, you could. But the result looks like a website inside an app, it doesn't respect your theme, dark mode breaks, fonts are wrong, and interactions feel foreign. For a magazine app where the reading experience is the whole point, that's not acceptable.

The approach: a custom HTML parser that converts raw WordPress HTML into a typed List<ContentPart> that Compose can render natively.


ContentPart — The Typed Model

The first thing to build is the sealed interface that represents every possible block of content:

sealed interface ContentPart {
    data class Text(val content: AnnotatedString) : ContentPart
    data class Heading(val content: AnnotatedString, val level: Int) : ContentPart
    data class Image(val url: String, val alt: String?) : ContentPart
    data class Gallery(val imageUrls: List<String>) : ContentPart
    data class Video(val thumbnailUrl: String, val videoUrl: String) : ContentPart
    data class Spotify(
        val embedUrl: String,
        val title: String? = null,
        val thumbnailUrl: String? = null
    ) : ContentPart
    data class InstagramPost(val url: String, val username: String?) : ContentPart
    data class InternalPost(val slug: String, val title: String?) : ContentPart
    data class UnorderedList(val items: List<AnnotatedString>) : ContentPart
    data class OrderedList(val items: List<AnnotatedString>) : ContentPart
    data class CallToAction(val text: String, val url: String) : ContentPart
    data object Divider : ContentPart // no payload, so an object rather than a data class
    data class Html(val content: String) : ContentPart
}

Html is the escape hatch — anything the parser doesn't recognize yet goes there. It renders as-is with a fallback composable. Over time, as new block types show up in the content, they get their own ContentPart subtype and a proper UI.


The Parser — Strategy Pattern

The parser uses a BlockHandler interface:

interface BlockHandler {
    fun canHandle(tagName: String): Boolean
    fun handle(tagName: String, content: String): List<ContentPart>
}

Each handler knows how to process one type of block. The main HtmlParser finds block-level tags with a regex, extracts the full block content (with a depth counter for nested tags), and routes to the right handler:

val blockRegex = Regex(
    "<(p|img|iframe|figure|hr|ul|ol|h[1-6])(?:\\s[^>]*)?>",
    RegexOption.IGNORE_CASE
)
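
The depth counter mentioned above isn't shown in this post, so here is a minimal sketch of how it could work for paired tags. The function name findBlockEnd and its signature are my own for illustration, and it assumes tagName comes straight from the block regex (plain letters and digits). Void tags like img and hr have no closing tag and would need separate handling.

```kotlin
// Hypothetical helper, not the app's actual code.
// Returns the index just past the closing tag that balances the opening tag
// found at or after `openStart`, or -1 if the block is never closed.
fun findBlockEnd(html: String, tagName: String, openStart: Int): Int {
    // Matches <tag ...> and </tag>; group 1 is "/" for a closing tag.
    val tagRegex = Regex("<(/?)${tagName}(?:\\s[^>]*)?>", RegexOption.IGNORE_CASE)
    var depth = 0
    for (match in tagRegex.findAll(html, openStart)) {
        // Opening tags go one level deeper, closing tags come back up.
        if (match.groupValues[1].isEmpty()) depth++ else depth--
        if (depth == 0) return match.range.last + 1
    }
    return -1
}
```

Counting depth is what keeps a nested list from being cut off at the first closing tag of the inner list.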

The handlers are registered as a list and checked in order:

fun defaultHandlers(): List<BlockHandler> = listOf(
    DividerBlockHandler,
    ImageBlockHandler,
    TextBlockHandler,
    EmbedBlockHandler,
    CallToActionBlockHandler,
    ListBlockHandler
)
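
To make "checked in order" concrete, here is a rough sketch of the routing step. Part and Handler are simplified stand-ins for ContentPart and BlockHandler (plain strings instead of AnnotatedString) so the snippet compiles without Compose. The two handlers and the dispatch function are assumptions for illustration, not the app's real implementations.

```kotlin
// Simplified stand-ins for the article's types.
sealed interface Part {
    data object Divider : Part
    data class Text(val content: String) : Part
    data class Html(val content: String) : Part
}

interface Handler {
    fun canHandle(tagName: String): Boolean
    fun handle(tagName: String, content: String): List<Part>
}

object DividerHandler : Handler {
    override fun canHandle(tagName: String) = tagName.equals("hr", ignoreCase = true)
    override fun handle(tagName: String, content: String): List<Part> = listOf(Part.Divider)
}

object ParagraphHandler : Handler {
    override fun canHandle(tagName: String) = tagName.equals("p", ignoreCase = true)

    // Strips inline tags; the real handler would build styled text instead.
    override fun handle(tagName: String, content: String): List<Part> =
        listOf(Part.Text(content.replace(Regex("<[^>]+>"), "").trim()))
}

// The first handler that claims the tag wins; unclaimed blocks fall through to Html.
fun dispatch(handlers: List<Handler>, tagName: String, content: String): List<Part> =
    handlers.firstOrNull { it.canHandle(tagName) }
        ?.handle(tagName, content)
        ?: listOf(Part.Html(content))
```

Because the first match wins, list order matters whenever two handlers could claim the same tag.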

A KMP-specific gotcha

Kotlin's common Regex API doesn't offer DOT_MATCHES_ALL as a RegexOption. That option only exists on the JVM, where you can also use (?s) in the pattern or Pattern.DOTALL. In KMP the portable fix is replacing . with [\s\S] in any regex that needs to match across newlines. Worth knowing before you spend an hour wondering why your regex works in unit tests on JVM but silently fails on iOS.
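
A small sketch of the difference, using a paragraph pattern I made up for the example (the app's real patterns aren't shown here):

```kotlin
// `.` stops at line breaks by default, so this pattern misses a multi-line block.
val dotPattern = Regex("<p>(.*?)</p>")

// `[\s\S]` matches any character, newlines included, with no RegexOption needed.
val portablePattern = Regex("<p>([\\s\\S]*?)</p>")

// Returns the inner text of the first paragraph the pattern finds, or null.
fun firstParagraph(html: String, pattern: Regex): String? =
    pattern.find(html)?.groupValues?.get(1)
```

With a paragraph that spans two lines, dotPattern finds nothing while portablePattern returns both lines.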


The Tricky Cases

Most block types are straightforward — a <p> becomes ContentPart.Text, an <img> becomes ContentPart.Image, an <hr> becomes ContentPart.Divider. The interesting ones are the embeds.

YouTube — lazy loading

The site's custom WordPress theme lazy-loads iframes, so the actual URL isn't in src. It's in data-lazy-src:

<iframe src="about:blank" 
        data-lazy-src="https://www.youtube.com/embed/{videoId}">

Any image validation logic needs to filter out about:blank URLs or you'll end up with broken thumbnails. The EmbedBlockHandler checks data-lazy-src first, then falls back to src.
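
The fallback order can be sketched roughly like this. Both attr and resolveIframeUrl are hypothetical helper names, and the attribute regex is a simplification that only handles quoted values:

```kotlin
// Hypothetical helper: reads one quoted attribute value from a single tag's markup.
// The leading \s keeps "src" from matching inside "data-lazy-src".
fun attr(tag: String, name: String): String? =
    Regex("\\s" + Regex.escape(name) + "\\s*=\\s*[\"']([^\"']*)[\"']", RegexOption.IGNORE_CASE)
        .find(tag)?.groupValues?.get(1)

// Prefers data-lazy-src, falls back to src, and rejects empty or about:blank values.
fun resolveIframeUrl(tag: String): String? =
    sequenceOf(attr(tag, "data-lazy-src"), attr(tag, "src"))
        .filterNotNull()
        .map { it.trim() }
        .firstOrNull { it.isNotEmpty() && !it.equals("about:blank", ignoreCase = true) }
```

Returning null for a tag that only carries about:blank lets the caller skip the block instead of rendering a broken card.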

For YouTube we don't embed the player — we show a thumbnail with a play button that opens the YouTube app via a confirmation dialog. Clean, fast, no WebView required.
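
This post doesn't show where the thumbnail comes from, so treat the following as one possible approach rather than the app's code: pull the video ID out of the embed URL and build a still-image URL from YouTube's widely used img.youtube.com pattern. Both function names are made up for the sketch.

```kotlin
// Extracts the video ID from an embed URL like https://www.youtube.com/embed/<id>?rel=0.
fun youTubeVideoId(embedUrl: String): String? =
    Regex("youtube(?:-nocookie)?\\.com/embed/([A-Za-z0-9_-]+)")
        .find(embedUrl)?.groupValues?.get(1)

// Assumption: the thumbnail is served from the common img.youtube.com URL pattern.
fun youTubeThumbnailUrl(videoId: String): String =
    "https://img.youtube.com/vi/$videoId/hqdefault.jpg"
```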

Spotify — oEmbed enrichment

Spotify embeds arrive as an iframe with just the embed URL. Not very exciting to look at as a card. But Spotify has a public oEmbed endpoint that returns the track/album/playlist title and thumbnail — no authentication required:

GET https://open.spotify.com/oembed?url={spotifyUrl}

The SpotifyContentPart composable fetches this data with a LaunchedEffect and shows a skeleton while it loads. The result is a card with the actual album art and track title — something worth tapping.

The decision of where to make this request was interesting. It could live in the ViewModel, pre-fetched for all Spotify parts before rendering. But that means waiting for N requests before showing anything. Instead, each SpotifyContentPart fetches its own data independently using koinInject<SpotifyService>() directly in the composable. The cards load progressively as you scroll.
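
One detail that's easy to miss: the Spotify URL goes inside a query parameter, so it has to be percent-encoded. Here is a dependency-free sketch of building the request URL. The names percentEncode and spotifyOEmbedUrl are mine, and in the app an HTTP client such as Ktor would normally handle the encoding when the URL is passed as a query parameter.

```kotlin
// Percent-encodes everything except RFC 3986 unreserved characters.
fun percentEncode(value: String): String = buildString {
    for (byte in value.encodeToByteArray()) {
        val code = byte.toInt() and 0xFF
        val ch = code.toChar()
        if (ch in 'A'..'Z' || ch in 'a'..'z' || ch in '0'..'9' || ch in "-._~") {
            append(ch)
        } else {
            append('%')
            append(code.toString(16).uppercase().padStart(2, '0'))
        }
    }
}

// Builds the oEmbed request URL for a Spotify track, album, or playlist link.
fun spotifyOEmbedUrl(spotifyUrl: String): String =
    "https://open.spotify.com/oembed?url=${percentEncode(spotifyUrl)}"
```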

Instagram — blockquote detection

Instagram blocks thumbnail URLs from third-party apps, or so I thought. It actually has a semi-public thumbnail URL pattern:

https://www.instagram.com/p/{shortcode}/media/?size=l
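
Filling in that pattern takes a shortcode from the post URL. A minimal sketch, with function names that are my own and a regex that only covers /p/ links like the one above:

```kotlin
// Pulls the shortcode out of a post URL such as https://www.instagram.com/p/<shortcode>/.
fun instagramShortcode(url: String): String? =
    Regex("instagram\\.com/p/([A-Za-z0-9_-]+)")
        .find(url)?.groupValues?.get(1)

// Builds the semi-public thumbnail URL; Instagram may still refuse to serve it.
fun instagramThumbnailUrl(shortcode: String): String =
    "https://www.instagram.com/p/$shortcode/media/?size=l"
```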

We extract the shortcode from the embed URL and attempt to load it with SubcomposeAsyncImage. If Instagram blocks the request (which it does inconsistently), the error slot shows a fallback placeholder with the Instagram gradient branding. If it loads — and it does load sometimes — you get the actual post image with the aspect ratio adjusted dynamically using onSuccess to read the painter's intrinsicSize.

onSuccess = { state ->
    val width = state.painter.intrinsicSize.width
    val height = state.painter.intrinsicSize.height
    if (width > 0 && height > 0) {
        imageAspectRatio = width / height
    }
}

The container height starts at 220.dp as a fallback and adjusts to the real aspect ratio once the image loads. Not perfect, but better than a static placeholder.
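
The height logic boils down to a few lines. This is a plain-Kotlin sketch with made-up names, using floats in place of Dp so it runs without Compose:

```kotlin
// Fallback height (in dp) used until the real aspect ratio is known.
val fallbackHeightDp = 220f

// aspectRatio is width / height, as computed in onSuccess; null means "not loaded yet".
fun containerHeightDp(containerWidthDp: Float, aspectRatio: Float?): Float =
    if (aspectRatio != null && aspectRatio > 0f) containerWidthDp / aspectRatio
    else fallbackHeightDp
```

A 300dp-wide container showing a 3:2 image ends up 200dp tall, and a failed load stays at the 220dp fallback.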

Internal post references

The magazine embeds links to related articles within post content using a custom WordPress block class: wp-block-embed-my-wp-site. The EmbedBlockHandler detects this and extracts the slug from the href:

if (content.contains("wp-block-embed-my-wp-site")) {
    val slug = extractSiteSlug(content) ?: return emptyList()
    val title = extractEmbedTitle(content)
    return listOf(ContentPart.InternalPost(slug = slug, title = title))
}
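
The body of extractSiteSlug isn't shown here. One way it could look, assuming the slug is the last path segment of the first href in the block (that assumption and the example URL in the usage note are mine):

```kotlin
// Hypothetical implementation; the real one may differ.
// Takes the first href in the block and returns its last non-empty path segment.
fun extractSiteSlug(content: String): String? {
    val href = Regex("href\\s*=\\s*[\"']([^\"']+)[\"']", RegexOption.IGNORE_CASE)
        .find(content)?.groupValues?.get(1) ?: return null
    return href.substringBefore('?').substringBefore('#')
        .trimEnd('/')
        .substringAfterLast('/')
        // Crude guard: a segment with a dot is a host name or a file, not a slug.
        .takeIf { it.isNotBlank() && !it.contains('.') }
}
```

For a link to https://example.com/2024/05/angele-new-single/ this returns angele-new-single, and a block without a usable link yields null so the handler can bail out.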

At render time, InternalPostContentPart resolves the slug to a post ID via the repository — first checking the local DB, then the API if not cached — and navigates to that post on tap using the AppNavigator singleton from Part 2.
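
The local-first lookup can be sketched as below. PostIdResolver and its three function parameters are hypothetical, and the lookups are plain functions here, whereas in the app they would most likely be suspend calls on the repository. Writing the fetched ID back to the local store is also my assumption.

```kotlin
// Hypothetical resolver; lookups are passed in so the sketch has no framework dependencies.
class PostIdResolver(
    private val findLocal: (slug: String) -> Long?,
    private val fetchRemote: (slug: String) -> Long?,
    private val saveLocal: (slug: String, id: Long) -> Unit
) {
    // Local DB first; only hit the API on a miss, then cache what came back.
    fun resolve(slug: String): Long? =
        findLocal(slug) ?: fetchRemote(slug)?.also { saveLocal(slug, it) }
}
```

A null result means the slug is unknown both locally and remotely, and the card can simply stay non-tappable.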


Image Grouping

WordPress content frequently includes multiple consecutive images — a photo series from a concert, an event gallery. Rendering them one-by-one as full-width images takes forever to scroll through.

After parsing, the ViewModel runs a post-processing step that groups three or more consecutive ContentPart.Image items into a ContentPart.Gallery:

private fun groupConsecutiveImages(
    parts: List<ContentPart>,
    minCount: Int = 3
): List<ContentPart> {
    val result = mutableListOf<ContentPart>()
    var i = 0
    while (i < parts.size) {
        val part = parts[i]
        if (part is ContentPart.Image) {
            var j = i
            while (j < parts.size && parts[j] is ContentPart.Image) j++
            val count = j - i
            if (count >= minCount) {
                val images = parts.subList(i, j)
                    .filterIsInstance<ContentPart.Image>()
                    .map { it.url } // url is non-null, so map is enough
                result.add(ContentPart.Gallery(imageUrls = images))
                i = j
            } else {
                result.addAll(parts.subList(i, j)) // too few to group, keep them as-is
                i = j
            }
        } else {
            result.add(part)
            i++
        }
    }
    return result
}

The GalleryContentPart composable renders these with different layouts depending on count — 3 images get a "hero + two below" layout, 4 get a "large left + three stacked right" layout, and 5+ get a mosaic with a +N counter on the last tile. Tapping any image or the counter opens a full-screen gallery viewer with pinch-to-zoom and swipe navigation.
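
The layout choice can be expressed as a small pure function. GalleryLayout, GalleryPlan, and planGallery are names invented for this sketch, and the cap of five visible mosaic tiles is an assumption, since the exact number isn't stated above.

```kotlin
enum class GalleryLayout { HERO_PLUS_TWO, LARGE_LEFT_THREE_RIGHT, MOSAIC }

// overflow is the N shown as "+N" on the last tile (0 means no counter).
data class GalleryPlan(val layout: GalleryLayout, val visibleTiles: Int, val overflow: Int)

// maxMosaicTiles is an assumed cap on how many tiles the mosaic draws.
fun planGallery(imageCount: Int, maxMosaicTiles: Int = 5): GalleryPlan {
    require(imageCount >= 3) { "Galleries are only built from three or more images" }
    return when (imageCount) {
        3 -> GalleryPlan(GalleryLayout.HERO_PLUS_TWO, 3, 0)
        4 -> GalleryPlan(GalleryLayout.LARGE_LEFT_THREE_RIGHT, 4, 0)
        else -> {
            val visible = minOf(imageCount, maxMosaicTiles)
            GalleryPlan(GalleryLayout.MOSAIC, visible, imageCount - visible)
        }
    }
}
```

Keeping the decision out of the composable makes it easy to unit test without any UI.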


The Result

After parsing, a typical post produces something like:

ContentPart.Text("La cantante belga Angèle ha regresado...")
ContentPart.InstagramPost(url = "https://www.instagram.com/p/DVTyrbVCPXe/")
ContentPart.Text("El video musical, creado con (LA)HORDE...")
ContentPart.Video(thumbnailUrl = "...", videoUrl = "...")

Each type gets its own composable renderer. The LazyColumn in PostDetailContent iterates over the list and dispatches to the right component. No WebViews, no HTML rendering engines, no platform-specific workarounds — just Compose all the way down.
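
The dispatch itself is just an exhaustive when over the sealed type. This sketch uses a trimmed-down Part model and returns renderer names as strings instead of calling composables, so it runs anywhere. The renderer names are placeholders, not necessarily the app's real function names.

```kotlin
// Trimmed-down stand-in for ContentPart.
sealed interface Part {
    data class Text(val content: String) : Part
    data class InstagramPost(val url: String) : Part
    data class Video(val thumbnailUrl: String, val videoUrl: String) : Part
    data class Html(val content: String) : Part
}

// Stand-in for the composable dispatch: returns a renderer name instead of emitting UI.
// No else branch, so adding a new Part subtype is a compile error until it is handled.
fun rendererFor(part: Part): String = when (part) {
    is Part.Text -> "TextContentPart"
    is Part.InstagramPost -> "InstagramContentPart"
    is Part.Video -> "VideoContentPart"
    is Part.Html -> "HtmlFallbackContentPart"
}
```

The compiler check is the payoff of the sealed interface: a new block type can't silently go unrendered.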


What's Next

In Part 4 we cover the UI layer — the PostHero component with the collapsible top bar, custom theming with Anton and Space Grotesk fonts, platform-specific transitions, and the Material3 shapes gotcha that will catch you off guard if you expect the theme to propagate automatically.


Stack: Kotlin 2.3.10 · Compose Multiplatform 1.10.2 · Ktor 3.4.1 · SQLDelight 2.3.1 · Koin 4.1.1
