Taylor Hunt

The weirdly obscure art of Streamed HTML

My goal from last time: reuse our existing APIs in a demo of the fastest possible version of our ecommerce website… and keep it under 20 kilobytes.

I decided this called for an MPA. (aka a traditional web app. Site. Thang. Not-SPA. Whatever.)

And with that decision, I doomed the site to feel slow and clunky

In theory, there’s no reason MPA interactions need be as slow as they commonly are. But in practice, there are many reasons.

Here’s an example. At Kroger, product searches take two steps:

  1. Send user’s query to an API to get matching product codes
  2. Send those product codes to an API to get product names, prices, etc.

Using those APIs to generate a search results page would look something like this:

const resultsData = await fetch(`/api/search?${new URLSearchParams({
    query: usersSearchString
  })}`)
  .then(r => r.json())
  // the second request can only start after the first one finishes
  .then(({ upcs }) =>
    fetch(`/api/products/details?${new URLSearchParams({ upcs })}`)
  )
  .then(r => r.json())

res.writeHead(resultsData.success ? 200 : 500, {
  'content-type': 'text/html;charset=utf-8'
})

const htmlResponse = searchPageTemplate.render({
    searchedQuery: usersSearchString,
    results: resultsData
  })

res.write(htmlResponse)
res.end()

Each fetch takes time, and /api/products/details can only start after /api/search finishes. Worse, those requests traveled from my computer to the datacenter and back; a real server making those calls would sit right next to the API machines, for much faster round trips.

But on my demo machine, the calls usually took ~200 milliseconds, sometimes spiking as high as 800ms. Combined with the target 3G network, server processing time, and other inconvenient realities, I was frequently flouting the 100–1000ms limit for “natural and continuous progression of tasks”.

So the problem is high Time to First Byte, huh?

No worries! High Time to First Byte (TTFB) is a known performance culprit. Browser devtools, Lighthouse, and other speed utensils all warn about it, so there’s lots of advice for fixing it!

Except, none of the easily-found advice for improving TTFB helps:

Optimize the server application to prepare pages faster

Node.js spent 30ms or less handling the request and sending HTML. Very little to be gained there.

Optimize database queries or migrate to faster database systems

I was not allowed to touch our databases or API servers.

Upgrade server hardware to have more memory or CPU

This ran on a MacBook Pro with plenty of unused RAM and CPU.

Cache the expensive lookups

Caching can’t help the first requests, unless I precache all known endpoints or something. Even then, that wouldn’t work for search: users can and will search for strings never seen before.

The problem: web performance is other people

If only two API calls were a struggle, I was in for ruination. Here are some data sources our homepage uses:

  • Authentication and user info
  • Items in shopping cart
  • Selected store, pickup vs. delivery, etc.
  • Recommended products
  • Products on sale
  • Previous purchases
  • Sponsored products
  • Location-specific promotions
  • Recommended coupons
  • A/B tests
  • Subscription to Kroger Boost
  • …and so on. You get it, there’s a lot — and that’s only the stuff you can see.

As at many large companies, each data source may be owned by a different team, with its own schedules, SLAs, and bugs.

An unamused man in front of a whiteboard, upon which are scads of shapes with silly labels like “Magic Baby” and “Hell Proxy”, festooned with arrows pointing every which way.

After you see real API charts, Krazam’s satirical microservices diagram gets either more or less funny. Still figuring out which.

Let’s say the 10 data sources I listed are each one API call. What are the odds my server can respond quickly enough?

Let’s say 1 client request creates 10 downstream requests to a subsystem affected by longtail latency, and assume each request has a 1% probability of a slow response. The probability that at least 1 of the 10 downstream requests hits the longtail is the complement of all 10 responding fast (each with a 99% probability):

1 − (0.99)^10 ≈ 0.095

That’s 9.5 percent! This means the 1 client request has an almost 10 percent chance of being affected by a slow response, equivalent to expecting 100,000 affected requests out of 1 million. That’s a lot of members!

Who moved my 99th percentile latency?

And since users visit multiple pages in MPAs, the chances of suffering a high TTFB approaches “guaranteed”:

Gil walks through a simple, hypothetical example: a typical user session involves five page loads, averaging 40 resources per page. How many users will not experience something worse than the 95th percentile? 0.003%.

Everything You Know About Latency Is Wrong
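(Checking that arithmetic: 5 pages × 40 resources = 200 requests, and 0.95^200 ≈ 0.000035, so only about 0.003% of sessions dodge the tail entirely.)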

I believe this is why Kroger.com used a SPA in the first place — if disparate teams’ APIs can’t be trusted, at least they won’t affect other teams’ code. (Similar insulation from other teams’ components is probably one reason for React’s industry adoption.)

The solution: streamed HTML

It’s easier to show than to explain:

[Video: the same search page loading buffered vs. streamed, side by side]

Both pages show search results in 2.5 seconds. But they sure don’t feel the same.

Not all sites have my API bottlenecking issue, but many have its cousins: database queries and reading files. Showing pieces of a page as data sources finish is useful for almost any dynamic site. For example…

  • Showing the header before potentially-slow main content
  • Showing main content before sidebars, related posts, comments, and other non-critical information
  • Streaming paginated or batched queries as they progress instead of big expensive database queries
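
Here’s a minimal sketch of that pattern with Node’s built-in http module (the slow data source is a made-up stand-in, not one of our real APIs):

import { createServer } from 'node:http'

// Stand-in for a slow API call or database query (hypothetical)
const fetchMainContent = () =>
  new Promise(resolve =>
    setTimeout(() => resolve('<article>the slow stuff</article>'), 500)
  )

createServer(async (req, res) => {
  res.writeHead(200, { 'content-type': 'text/html;charset=utf-8' })

  // Flush the <head> and site header immediately, so the browser can
  // download styles and paint something while the data dawdles
  res.write(`<!doctype html>
<link rel="stylesheet" href="/styles.css">
<body><header>Site header</header><main>`)

  // Only now do the slow work; the user already has a page on screen
  res.write(await fetchMainContent())

  res.end('</main></body>')
}).listen(8080)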

Beyond the obvious visual speedup, streamed HTML has other benefits:

Interactive ASAP

If a user visits the homepage and immediately tries searching, they don’t have to wait for anything but the header to submit their query.

Optimized asset delivery

Even with no <body> to show, you can stream the <head>. That lets browsers download and parse styles, scripts, and other assets while waiting for the rest of the HTML.

Less server effort

Streamed HTML uses less memory. Instead of building the full response in RAM, it sends generated bytes immediately.
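
In server code, that’s roughly the difference between rendering to a string and rendering into the response stream. A sketch (the exact method names vary by template engine; Marko’s render(data, stream) form is the streaming one):

// Buffered: the entire page sits in RAM before the first byte leaves
res.end(template.renderToString(data))

// Streamed: bytes leave the process as the template produces them
template.render(data, res)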

More robust and faster than incremental updates via JavaScript

Fewer roundtrips; happens before/while JS boots; immune to JS errors and the other reasons 1% of visits have broken JavaScript.

And because it’s more efficient, that leaves more CPU and RAM for the JavaScript we do run, not to mention painting, layout, and user interactions.

But don’t take my word for it:

Hopefully, you see why I considered HTML streaming a must.

And that’s why not Svelte

Previously…

Maybe if I sprinkled the HTML with just enough CSS to look good… and if I had any room left, some laser-focused JavaScript for the pieces that benefit most from complex interactivity.

That’s exactly what Svelte excels at. So why didn’t I use it?

Because Svelte does not stream HTML. (I hope it does someday.)

If not Svelte, then what?

I found only 2 things on NPM that could stream HTML:

  1. Dust, a template language that seems to have died twice.
  2. Marko, some library with an ungoogleable name and a rainbow logo… oh, and JSX-like syntax? And a client-side virtual DOM that fit in my budget? And eBay has battle-tested it for its ecommerce websites? And it only uses client-side JS for stateful components? You don’t say.

It’s nice when a decision makes itself.

And thus, Marko.

Marko’s <await> made streaming easy

Marko streams HTML with its <await> tag. I was pleasantly surprised at how easily it could optimize browser rendering, with all the control I wanted over HTTP, HTML, and JavaScript.

Disclaimer
I now work for eBay, but I didn’t yet when I wrote this post.

Buffered pages don’t show content as it loads, but Marko’s streaming pages show content incrementally.

Source: markojs.com/#streaming

As seen in Skeleton screens, but fast:

<SiteHead />

<h1>Search for “${searchQuery}”</h1>

<div.SearchSkeletons>
  <await(searchResultsFetch)> <!-- stalls the HTML stream until the API returns search results -->
    <@then|result|>
      <for|product| of=result.products>
        <ProductCard product=product />
      </for>
    </@then>
  </await>
</div>

<await> for nice-to-haves

Imagine a component that displays recommended products. Fetching the recommendations is usually fast, but every once in a while, the API hiccups. <await>’s got your back:

<await(productRecommendations)
    timeout=50> <!-- wait up to 50ms -->
  <@then|recs|>
    <RecommendedProductList of=recs />
  </@then>

  <@catch>
    <!-- don’t render anything; no big deal if this fails -->
  </@catch>
</await>

If you know how much money product recommendations make, you can fine-tune the timeout so the cost of the performance hit never exceeds that revenue.

And that’s not all!

<await(productRecommendations) client-reorder>
  <@placeholder>
    <!-- immediately render placeholder to prevent content jumping around -->
    <RecommendedProductsPlaceholder /> 
  </@placeholder>

  <@then|recs|>
    <RecommendedProductList of=recs />
  </@then>
</await>

The client-reorder attribute turns the <await> into an HTML fragment that doesn’t delay the rest of the page behind it, but asynchronously renders when ready. client-reorder requires JavaScript, so you can weigh the tradeoffs of using it vs. a timeout with no fallback. (I think you can even combine them.)
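
Under the hood, out-of-order streaming boils down to something like this (a hand-rolled sketch of the general technique, not Marko’s actual output):

<div id="recs-slot">loading…</div>
<!-- …the rest of the page streams normally… -->

<!-- when the data is finally ready, the server appends: -->
<template id="recs-content"><ul>…recommendations…</ul></template>
<script>
  document.getElementById('recs-slot')
    .replaceChildren(document.getElementById('recs-content').content)
</script>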

That’s how Facebook’s BigPipe renderer worked, which once lived on the same page as React. Wouldn’t it be nice to have the best of both?

Let me tell you: it is nice.

Marko’s <await> is awesome

Best of all, these <await> techniques are Marko’s golden path — heck, its very reason for being. Marko has stream control no other renderer makes easy, a way to automatically upgrade streamed HTML with JavaScript, and 8+ years of experience with the inevitable bugs and edge cases.

(Yes, I was quite taken with Marko. Let me have my fun.)

However, the fact that Marko was apparently my one option does raise a certain question…

Why is HTML streaming not common?

Or in the words of another developer after my demo: “if Chunked Transfer-Encoding is so useful, how come I’ve never heard of it?”

That is a very fair question. It’s not because it’s poorly-supported — HTML rendered progressively in Netscape 1.0. Beta Netscape 1.0. And it’s not because the technique is barely-used — Google search results stream after the top navbar, for instance.
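
(If you’ve never seen it on the wire: with Transfer-Encoding: chunked, each flush becomes a hex byte count followed by the bytes themselves. An illustrative trace, not captured from any real server:)

HTTP/1.1 200 OK
content-type: text/html;charset=utf-8
transfer-encoding: chunked

2a
<!doctype html><title>Search</title><body>
10
<h1>Results</h1>
0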

I think one reason is the inconsistent name

  • Steve Souders called it “early flushing”, which is not… the best name.
  • “Chunked transfer-encoding” is the most unique, but it’s only in HTTP/1.1. HTTP/2, HTTP/3, and even HTTP/0.9 stream differently.
  • It was known as “HTTP streaming” before HLS, DASH, and other forms of video-over-HTTP took up that mindspace.
  • The catch-all term is “progressive rendering”, but that applies to many other things: interlaced images, visualizing large datasets, video game engine optimizations, etc.

Many languages/frameworks don’t care for streaming

Older languages/frameworks have long been able to stream HTML, but were never really good at it. Some examples:

PHP 🐘

Requires calling inscrutable output-buffering functions (ob_end_flush(), flush(), and friends) in a finicky order.

Ruby on Rails 🛤

ActionController::Streaming has a lot of caveats. In particular:

This approach was introduced in Rails 3.1 and is still improving. Several Rack middlewares may not work and you need to be careful when streaming. Those points are going to be addressed soon.

Rails hit 3.1 in 2011. There was clearly not much demand to address those points.

(Rails’ modern way is Turbo Streams, but those need JS to render, so not the same thing.)

Django 🐍

Django really doesn’t like streaming at all:

StreamingHttpResponse should only be used in situations where it is absolutely required that the whole content isn’t iterated before transferring the data to the client.

Perl 🐪

Perl’s autoflush behavior is controlled by the $| variable (yes, that’s a pipe), but that sort of nonsense is normal for it. God I love Perl.

Because streaming was never their default happy path, languages/frameworks considered it a last resort where you gained performance at the expense of the “real” render features. Here’s a telling quote:

You can still write ASP.NET pages that properly stream data to the browser using Response.Write and Response.Flush. But you can’t do it within the normal ASP.NET page lifecycle. Maybe this is a natural consequence of the ASP.NET abstraction layer.

Regardless, it still sucks for users.

The Lost Art of Progressive HTML Rendering

Node.js is a happy exception. As proudly described on Node’s About page:

HTTP is a first-class citizen in Node.js, designed with streaming and low latency in mind.
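
For instance, compression composes with streaming in a couple of lines. A sketch using Node’s built-in zlib, where Z_SYNC_FLUSH makes gzip emit bytes on every write instead of buffering for a better ratio:

import { createServer } from 'node:http'
import { createGzip, constants } from 'node:zlib'

createServer((req, res) => {
  // No content-length header, so Node falls back to
  // Transfer-Encoding: chunked on HTTP/1.1 automatically
  res.writeHead(200, {
    'content-type': 'text/html;charset=utf-8',
    'content-encoding': 'gzip'
  })

  const gzip = createGzip({ flush: constants.Z_SYNC_FLUSH })
  gzip.pipe(res)

  gzip.write('<!doctype html><body><p>Shown right away</p>')
  setTimeout(() => gzip.end('<p>Shown a second later</p></body>'), 1000)
}).listen(8080)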

Despite that, the “new” hot JavaScript frameworks have been struggling to stream for a while.

These frameworks have the staff, funding, and incentives to make streaming work, so the holdup must be something else. Maybe it’s hard to retrofit streaming onto their abstractions, especially without ruining established third-party integrations.

Streaming rarely mentioned as a TTFB fix

As mentioned near the beginning, when high TTFB is detected, streaming is almost never suggested as a fix.

I think that’s the biggest problem. A Web API with a bad name can become popular if it’s mentioned enough.

Personally, I’ve only seen streaming HTML recommended for TTFB once, and it’s in chapter 10 of High-Performance Browser Networking. In an aside. At the bottom.

(Inside a <details> labeled “Beware of The Leopard”.)

So that’s one silver bullet down

I had streaming HTML, but that was no substitute for the other 999 lead bullets to back it up. Now I had to… make the website.

You know, write the components, style the design, build the features. How hard could that be? (Hint: people are paid to do those things.)

Top comments (51)

Jon Randy 🎖️

Years ago, I remember using a similar technique to stream JS that would update a progress bar for a long running server process. The PHP process would keep sending JS (comment lines I believe) to keep the connection open, and would occasionally drop in an updateProgress() call. The browser was happy to run the JS as it came in. I was amazed it worked, but work it did... Streaming JS!

Taylor Hunt

Do you remember if it was called COMET, or related to that?

Jon Randy 🎖️

I wrote it all myself. Didn't name it

Alex Lohr

One of the maintainers of Marko has since written his own reactive framework, Solid.js, which has also adopted streaming capabilities.

It is slightly more react-like than Marko, so your other developers might feel more at home with it. Maybe you want to check it out.

Taylor Hunt

Yeah, Ryan and I chat frequently in the Marko Discord. If Solid had adopted those streaming capabilities back when I embarked on this demo, it probably would have been a compelling option.

yw662

I just noticed today that safari is not working very well with streamed HTML. It waits for the whole document to download before it wants to render anything.

Maybe that is because I have a streaming custom element in the page but that doesn't make sense at all if Chrome is happy with what I do.

Taylor Hunt

I suspect that’s because Safari doesn’t support Declarative Shadow DOM yet — does the page stream fine without the custom element?

yw662

I tried adjusting things and found that the document itself is supported very well.

The real issue is, on Safari, if I modify the style or class of an element while it is still streaming, the change won’t apply until the element is complete. I’m not sure if it’s only style and class, or if I can’t set any attributes at all.

Taylor Hunt

Yeah, it’s kind of a long story. The gist of it though is I suspect you’re right, the streaming custom element sounds like the culprit due to historical plumbing issues. Not sure how you can work around it until Safari updates their support.

yw662

I do that to avoid a flash of unregistered custom elements (the custom-element version of FOUC). And my workaround is: allow that flash in Safari.

yw662

The interesting part is that I cannot even do :not(:defined) { visibility: hidden } or :not(:defined) { display: none }. What I can do, with Safari, is:

 :not(:defined) { color: white }
Taylor Hunt

Do opacity or filter: opacity(…) work?

yw662

Yes, that works. But not much difference, I guess.

yw662

But it is not DSD; it is a normal custom element driven by JavaScript.

peerreynders

I believe this is why Kroger.com used a SPA in the first place — if disparate teams’ APIs can’t be trusted, at least they won’t affect other teams’ code.

I think this is an aspect that assisted SPA adoption in general. Given that at the time MPAs were relatively slow to respond (server frameworks probably played a part as well), a ready-to-go (smallish) package of JS could be shipped to the client and start doing some useful work.

Now when discussing web performance, SPA proponents often counter that tuning FCP and TTI only affects the initial page load — missing the point that if page loads are universally fast, you may not need that SPA, unless you’re going offline-first.

Back in 2013 Ilya Grigorik talked about "Breaking the 1000ms Time to Glass Mobile Barrier" and here we are 9 years later where multi-second load times on mobile are not unusual.

While now dated (2013, during 4G adoption), he describes the complexity of sending a packet over the cellular network, which increases latency on mobile compared to wired networks (sometimes I’m surprised anything works).

Picard facepalm - all that for a single TCP packet ...

He also points out that when it comes to web page load times reducing latency matters more than increasing bandwidth (More bandwidth doesn't matter (much)).

Dust, a template language that seems to have died twice.

Patrick Steele-Idem (2014): Marko versus Dust

"Marko was developed to address some of the limitations of Dust as well as add a way to associate client-side code with specific templates to rein in jQuery, provide some much-needed structure, and provide reusable widgets. Version 1.0 was released in May 2014." from eBay's UI Framework Marko Adds Optimized Reactivity Model

I think one reason is the inconsistent name

Basically if you didn't geek out over HTTP servers and the HTTP spec you probably didn't know or think about "chunked transfer-encoding" (in connection with HTML responses). And since about 2016 online searches would funnel you to the Streams API.

Taylor Hunt

missing the point that if page loads are universally fast you may not need that SPA - unless you're going offline-first.

The rest of this series will essentially be illustrating “are you sure you need that SPA?”

Offline-first is usually the purview of SPAs, or at best an app-shell model bolted onto a content-heavy website. However, I was able to do some mad science with Marko… its compiled-for-server representation only has to write to an outgoing stream. You probably see where this is going. (More on that later.)


Ilya’s work definitely inspired me. Subsequent MPA interactions luckily don’t have to do the full mobile connection TCP warmup if you use Connection: keep-alive, and if you have analytics pinging your origin every 30 seconds anyway, that socket rarely gets torn down. We’ll see some of my measurements around that later.


Great point about the Streams API. Maybe we need an entirely new term altogether.

li.li

A network proxy could likely break streaming: if the proxy buffers the response, it won’t send any chunks to the client until it has received all of them.

Lack of CDN support is another issue, I guess. Did you ever run into this kind of issue? If you did, how did you fix it?

Taylor Hunt

I have yet to find a CDN that doesn’t work with streaming, but apparently AWS Lambda doesn’t. (I don’t use AWS for other reasons, so this has never been relevant for me, but it may be for others.)

Even if a misbehaving middlebox buffers the response, at least it’s no worse than doing the SSR all at once like before.

li.li

What about compression? Does it work with streaming?

Taylor Hunt

Yep! Each chunk of the stream can be compressed — Node.js in particular was designed to excel at combining gzip with Transfer-Encoding: chunked.

Viorel Mocanu

Oh, I can't wait for the next posts in this saga! :) I've criticized SPAs for their lack of performance, stability, SEO, accessibility etc for a long time now, and I can't wait for an actual story or someone finding an alternative to API-dependent (or "non-static server depending") rendering...

Taylor Hunt

I think you’ll like the (currently) fourth post, then

Jon Nyman

I'm glad Kroger is working on their performance issues. I remember going there and seeing a sign saying that if I signed up for an e-coupon I could get a deep discount on an item, and it took 10 minutes just to log in on my phone. It was an extremely frustrating experience. But since I had more shopping to do it wasn't a huge deal, but still ridiculous.

I always curse devs that don't care about performance. Especially when it is known that the customer will be on a cellular connection. They make the experience horrible for people on slow cellular connections.

And for making websites overly complex. Keeping it simple with straight-up HTML/CSS works most of the time for most sites.

ImTheDeveloper

This is methodical. I love it.

Magne

I couldn’t find the last part of the story:

  • Burning out trying to do the right thing
  • And by the end, some code I dare you to try.
Taylor Hunt

I haven’t written it yet — sharp eye!

Magne

oh, that's great. I was a bit thrown off by the "(2 Part Series)" at the beginning and end. A "To be continued..." at the end, or an "(X Part Series)" would have helped.

Taylor Hunt

Yeah, the series UI on Forem was intentionally designed to avoid showing TBA posts. I’ll edit the last paragraph

Kernel Soe

Not sure how fast it is compared to your case, but there’s a streaming React framework 👉 ultrajs.dev

Taylor Hunt

Oh neat, this one is new to me. Do they have an example app somewhere I can point WebPageTest at?

Taylor Hunt

One where it’s streaming from an API, specifically. The stuff on their /examples page seems to be all static so far.

NFSF

Love the series so far. Some more data points related to streaming:

Instagram wrote about streaming HTML in their engineering blog: instagram-engineering.com/making-i...

LinkedIn does something similar, but with API data responses; data is flushed and streamed into the DOM inside script tags. This helps to parallelize the typical SPA lifecycle of 1) load the JS 2) make API calls 3) render views once the data is returned.

Phil Wolstenholme

Love these posts! Looking forward to the rest :)

Magne

«
It’s easier to show than to explain:
** content (image/gif?) missing **
»

Taylor Hunt

dev.to sometimes has that happen to <video> elements, not sure why. A refresh usually fixes it for me.

Magne

ah thanks, that worked. iOS Safari here. Dev.to is not always the best with html tags in posts, I’ve noticed.

Taylor Hunt

Good to know it’s a cross-browser bug, at least; I notice it on desktop Firefox

jcbbb

Logged in just to like this :)

Jon Nyman

This is my answer to streaming HTML. github.com/jon49/html-template-tag...

I've played with it in a service worker for offline use.