<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: HTTP Archive</title>
    <description>The latest articles on DEV Community by HTTP Archive (@httparchive).</description>
    <link>https://dev.to/httparchive</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F1881%2F2d7eb2b5-31d8-4263-a339-7c3229e24edb.jpg</url>
      <title>DEV Community: HTTP Archive</title>
      <link>https://dev.to/httparchive</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/httparchive"/>
    <language>en</language>
    <item>
      <title>Querying parsed HTML in BigQuery</title>
      <dc:creator>Rick Viscomi</dc:creator>
      <pubDate>Fri, 26 May 2023 16:05:19 +0000</pubDate>
      <link>https://dev.to/httparchive/querying-parsed-html-in-bigquery-4ia2</link>
      <guid>https://dev.to/httparchive/querying-parsed-html-in-bigquery-4ia2</guid>
      <description>&lt;p&gt;A longstanding problem in the &lt;a href="https://httparchive.org/"&gt;HTTP Archive&lt;/a&gt; dataset has been extracting insights from blobs of HTML in BigQuery. For example, take the source code of example.com:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;!doctype html&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;html&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;head&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;title&amp;gt;&lt;/span&gt;Example Domain&lt;span class="nt"&gt;&amp;lt;/title&amp;gt;&lt;/span&gt;

    &lt;span class="nt"&gt;&amp;lt;meta&lt;/span&gt; &lt;span class="na"&gt;charset=&lt;/span&gt;&lt;span class="s"&gt;"utf-8"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;meta&lt;/span&gt; &lt;span class="na"&gt;http-equiv=&lt;/span&gt;&lt;span class="s"&gt;"Content-type"&lt;/span&gt; &lt;span class="na"&gt;content=&lt;/span&gt;&lt;span class="s"&gt;"text/html; charset=utf-8"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;meta&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"viewport"&lt;/span&gt; &lt;span class="na"&gt;content=&lt;/span&gt;&lt;span class="s"&gt;"width=device-width, initial-scale=1"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;style &lt;/span&gt;&lt;span class="na"&gt;type=&lt;/span&gt;&lt;span class="s"&gt;"text/css"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/style&amp;gt;&lt;/span&gt;    
&lt;span class="nt"&gt;&amp;lt;/head&amp;gt;&lt;/span&gt;

&lt;span class="nt"&gt;&amp;lt;body&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;div&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;h1&amp;gt;&lt;/span&gt;Example Domain&lt;span class="nt"&gt;&amp;lt;/h1&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;p&amp;gt;&lt;/span&gt;This domain is for use in illustrative examples in documents. You may use this
    domain in literature without prior coordination or asking for permission.&lt;span class="nt"&gt;&amp;lt;/p&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;p&amp;gt;&amp;lt;a&lt;/span&gt; &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"https://www.iana.org/domains/example"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;More information...&lt;span class="nt"&gt;&amp;lt;/a&amp;gt;&amp;lt;/p&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/body&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/html&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you wanted to extract the link text in the last paragraph, you could do something relatively straightforward like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// 'More information...'&lt;/span&gt;
&lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;querySelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;p:last-child a&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;textContent&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But in BigQuery, we don't have the luxury of the &lt;code&gt;document&lt;/code&gt; object, &lt;code&gt;querySelector&lt;/code&gt;, or &lt;code&gt;textContent&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Instead, we've had to resort to unwieldy regular expressions like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="s1"&gt;'More information...'&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;REGEXP_EXTRACT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="s1"&gt;'&amp;lt;p&amp;gt;&amp;lt;a[^&amp;gt;]*&amp;gt;([^&amp;lt;]*)&amp;lt;/a&amp;gt;&amp;lt;/p&amp;gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;link_text&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
  &lt;span class="n"&gt;body&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It looks like it works, but it's brittle.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What if there's text or whitespace between the elements?&lt;/li&gt;
&lt;li&gt;What if there are attributes on the paragraph?&lt;/li&gt;
&lt;li&gt;What if there's another p&amp;gt;a element pair earlier in the page?&lt;/li&gt;
&lt;li&gt;What if the page uses uppercase tag names?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It goes on and on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Using regular expressions to parse HTML seems like a good idea at first, but it quickly becomes a nightmare as the inputs grow increasingly unpredictable.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To avoid this headache in HTTP Archive analyses, we've resorted to &lt;a href="https://github.com/HTTPArchive/custom-metrics"&gt;custom metrics&lt;/a&gt;: JavaScript snippets executed on each page at runtime. This approach has been really effective, enabling us to analyze both the fully rendered page and the static HTML. But custom metrics have one big limitation: &lt;em&gt;they only work at runtime&lt;/em&gt;. So if we want to change the code or analyze an older dataset, we're out of luck.&lt;/p&gt;
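The shape of such a custom metric can be sketched as follows. This is illustrative only, not HTTP Archive's actual code: `linkTextMetric`, its `doc` parameter, and the stub document are all invented for the example.

```javascript
// Illustrative sketch of a custom metric, not HTTP Archive's actual code.
// In the real crawl this logic runs against the page's global `document`;
// here `doc` is a parameter so the sketch is self-contained.
function linkTextMetric(doc) {
  var link = doc.querySelector('p:last-child a');
  return link ? link.textContent : null;
}

// Minimal stand-in for a parsed document, for demonstration only.
var fakeDoc = {
  querySelector: function (selector) {
    if (selector === 'p:last-child a') {
      return { textContent: 'More information...' };
    }
    return null;
  }
};

console.log(linkTextMetric(fakeDoc)); // 'More information...'
```

In the crawl, the value a custom metric returns is stored alongside the page's results, which is what makes it queryable later.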

&lt;h2&gt;Cheerio&lt;/h2&gt;

&lt;p&gt;While looking for a way to implement &lt;a href="https://github.com/rviscomi/capo.js"&gt;capo.js&lt;/a&gt; in BigQuery to understand how pages in HTTP Archive are ordered, I came across the &lt;a href="https://cheerio.js.org/"&gt;Cheerio&lt;/a&gt; library, which is a jQuery-like interface over an HTML parser.&lt;/p&gt;

&lt;p&gt;It works &lt;em&gt;beautifully&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Kk7SRScE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qhdpvytcc4jzlx3kt16d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Kk7SRScE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qhdpvytcc4jzlx3kt16d.png" alt="Screenshot of a BigQuery query and result showing example.com being analyzed with the CAPO custom function." width="800" height="705"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To be able to use Cheerio in BigQuery, I first needed to build a JavaScript binary that I could load into a UDF. The post &lt;a href="https://asyncq.com/using-npm-library-in-google-bigquery-udf"&gt;How To Use NPM Library in Google BigQuery UDF&lt;/a&gt; was a big help. I installed the Cheerio library locally and built it into a script with an exposed &lt;code&gt;cheerio&lt;/code&gt; global variable using Webpack.&lt;/p&gt;
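For anyone reproducing this step, a minimal sketch of what such a Webpack configuration might look like; the entry file, filename, and option values are assumptions, not the exact configuration used here.

```javascript
// Hypothetical webpack.config.js sketch for bundling Cheerio into a single
// script that exposes a global `cheerio` variable. Paths and options are
// assumptions; adapt them to your local install.
module.exports = {
  mode: 'production',
  entry: './index.js', // e.g. a file containing: module.exports = require('cheerio');
  output: {
    filename: 'cheerio.js',
    library: 'cheerio',     // name of the exposed global
    libraryTarget: 'var',   // emit `var cheerio = ...` at the top of the bundle
  },
};
```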

&lt;p&gt;I uploaded the script to HTTP Archive's Google Cloud Storage bucket. Then in BigQuery, I was able to side-load the script into the UDF with &lt;a href="https://cloud.google.com/bigquery/docs/user-defined-functions#including-javascript-libraries"&gt;OPTIONS&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;OPTIONS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;library&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'gs://httparchive/lib/cheerio.js'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From there, the UDF was able to reference the cheerio object to parse the HTML input and generate the results. You can see it in action at &lt;a href="https://github.com/rviscomi/capo.js/tree/main/bigquery"&gt;capo.sql&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;Querying HTML in BigQuery&lt;/h3&gt;

&lt;p&gt;Here's a full demo of the example.com link text solution in action:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DECLARE&lt;/span&gt; &lt;span class="n"&gt;example_html&lt;/span&gt; &lt;span class="n"&gt;STRING&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;example_html&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;
&amp;lt;!doctype html&amp;gt;
&amp;lt;html&amp;gt;
&amp;lt;head&amp;gt;
    &amp;lt;title&amp;gt;Example Domain&amp;lt;/title&amp;gt;

    &amp;lt;meta charset="utf-8" /&amp;gt;
    &amp;lt;meta http-equiv="Content-type" content="text/html; charset=utf-8" /&amp;gt;
    &amp;lt;meta name="viewport" content="width=device-width, initial-scale=1" /&amp;gt;
    &amp;lt;style type="text/css"&amp;gt;...&amp;lt;/style&amp;gt;    
&amp;lt;/head&amp;gt;

&amp;lt;body&amp;gt;
&amp;lt;div&amp;gt;
    &amp;lt;h1&amp;gt;Example Domain&amp;lt;/h1&amp;gt;
    &amp;lt;p&amp;gt;This domain is for use in illustrative examples in documents. You may use this
    domain in literature without prior coordination or asking for permission.&amp;lt;/p&amp;gt;
    &amp;lt;p&amp;gt;&amp;lt;a href="https://www.iana.org/domains/example"&amp;gt;More information...&amp;lt;/a&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;/div&amp;gt;
&amp;lt;/body&amp;gt;
&amp;lt;/html&amp;gt;
&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TEMP&lt;/span&gt; &lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="n"&gt;getLinkText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;html&lt;/span&gt; &lt;span class="n"&gt;STRING&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;RETURNS&lt;/span&gt; &lt;span class="n"&gt;STRING&lt;/span&gt; &lt;span class="k"&gt;LANGUAGE&lt;/span&gt; &lt;span class="n"&gt;js&lt;/span&gt;
&lt;span class="k"&gt;OPTIONS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;library&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'gs://httparchive/lib/cheerio.js'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;
try {
  const $ = cheerio.load(html);
  return $('&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="k"&gt;last&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;child&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="s1"&gt;').text();
} catch (e) {
  return null;
}
&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;getLinkText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;example_html&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;link_text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🔗 &lt;a href="https://console.cloud.google.com/bigquery?sq=226352634162:d0993fbf625d4fe986284e437d123c9a"&gt;Try it on BigQuery&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The results show it working as expected:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--MMke1p6m--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bq7atbe47wui69js4km5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--MMke1p6m--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bq7atbe47wui69js4km5.png" alt="Query results" width="800" height="536"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Limitations&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JuQgfYPE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rtxgedubjtupsktnaxge.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JuQgfYPE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rtxgedubjtupsktnaxge.png" alt="Cheerio screenshot as blazingly fast and incredibly efficient" width="658" height="776"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Cheerio is marketed as fast and efficient, but if you try to parse every HTML response body in HTTP Archive, the query will still fail.&lt;/p&gt;

&lt;p&gt;Fully built, the library is 331 KB. And because the entire HTML document must be held in memory to be parsed, it consumes a lot of memory for large blobs.&lt;/p&gt;

&lt;p&gt;To minimize the chances of OOM errors and speed up the query, one thing you can do is pare down the HTML to the area of interest using only the most basic regular expressions. Since the capo script is only concerned with the &lt;code&gt;&amp;lt;head&amp;gt;&lt;/code&gt; element, I grabbed everything up to the closing &lt;code&gt;&amp;lt;/head&amp;gt;&lt;/code&gt; tag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;httparchive&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CAPO&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;REGEXP_EXTRACT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;response_body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="s1"&gt;'(?i)(.*&amp;lt;/head&amp;gt;)'&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If there are no natural "breakpoints" in the document for your use case, you could also consider restricting the input to a certain character length like &lt;code&gt;WHERE LENGTH(response_body) &amp;lt; 1000&lt;/code&gt;. The query will work and it'll run more quickly, but the results will be biased towards smaller pages.&lt;/p&gt;

&lt;p&gt;Also, some documents may fail to parse at all, throwing exceptions. I added &lt;code&gt;try&lt;/code&gt;/&lt;code&gt;catch&lt;/code&gt; blocks to the UDF to intercept any exceptions and return &lt;code&gt;null&lt;/code&gt; instead.&lt;/p&gt;

&lt;p&gt;That also means your query needs to handle &lt;code&gt;null&lt;/code&gt; values. For example, to get the first &lt;code&gt;&amp;lt;head&amp;gt;&lt;/code&gt; element from the results, I needed to use &lt;a href="https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#array_subscript_operator"&gt;&lt;code&gt;SAFE_OFFSET&lt;/code&gt;&lt;/a&gt; instead of plain old &lt;code&gt;OFFSET&lt;/code&gt; to avoid breaking the query: &lt;code&gt;elements[SAFE_OFFSET(0)]&lt;/code&gt;.&lt;/p&gt;
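To illustrate the difference with a sketch (the `results` table and `elements` array column here are hypothetical):

```sql
-- Sketch with a hypothetical `results` table and `elements` ARRAY column.
-- elements[OFFSET(0)] raises an error when the index is out of range
-- (e.g. an empty array); SAFE_OFFSET returns NULL instead, so rows where
-- the UDF produced nothing don't abort the whole query.
SELECT
  elements[SAFE_OFFSET(0)] AS first_head_element
FROM
  results
```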

&lt;h2&gt;Wrapping up&lt;/h2&gt;

&lt;p&gt;Cheerio is a really powerful new tool in the HTTP Archive toolbox. It unlocks new types of analysis that used to be prohibitively complex. In the capo.sql use case, I was able to extract insights about pages' &lt;code&gt;&amp;lt;head&amp;gt;&lt;/code&gt; elements that would otherwise have been possible only with custom metrics on future datasets.&lt;/p&gt;

&lt;p&gt;I'm really interested to see what new insights are possible with this approach. Let me know your thoughts in the comments and how you plan to use it.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>webtransparency</category>
      <category>httparchive</category>
    </item>
    <item>
      <title>Introducing the Core Web Vitals Technology Report</title>
      <dc:creator>Rick Viscomi</dc:creator>
      <pubDate>Thu, 24 Jun 2021 16:02:56 +0000</pubDate>
      <link>https://dev.to/httparchive/introducing-the-core-web-vitals-technology-report-4pep</link>
      <guid>https://dev.to/httparchive/introducing-the-core-web-vitals-technology-report-4pep</guid>
      <description>&lt;p&gt;The technologies you use to build your website can have an effect on your ability to deliver good user experiences. Good UX is key to performing well with &lt;a href="https://web.dev/vitals/" rel="noopener noreferrer"&gt;Core Web Vitals&lt;/a&gt; (CWV), a topic which is probably top of mind for you, as it is for many other web developers now that these metrics play a role in &lt;a href="https://developers.google.com/search/blog/2021/04/more-details-page-experience" rel="noopener noreferrer"&gt;Google Search&lt;/a&gt; ranking. While web developers have had tools like &lt;a href="https://support.google.com/webmasters/answer/9205520?hl=en" rel="noopener noreferrer"&gt;Search Console&lt;/a&gt; and &lt;a href="https://developers.google.com/speed/pagespeed/insights/" rel="noopener noreferrer"&gt;PageSpeed Insights&lt;/a&gt; to get data on how their sites are performing, the web community has been lacking a tool that has operated at the macro level, giving us something more like &lt;em&gt;WebSpeed Insights&lt;/em&gt;. By combining the powers of real-user experiences in the &lt;a href="https://developers.google.com/web/tools/chrome-user-experience-report/" rel="noopener noreferrer"&gt;Chrome UX Report&lt;/a&gt; (CrUX) dataset with web technology detections in &lt;a href="https://httparchive.org/" rel="noopener noreferrer"&gt;HTTP Archive&lt;/a&gt;, we can get a glimpse into how architectural decisions like choices of CMS platform or JavaScript framework play a role in sites' CWV performance. The merger of these datasets is a dashboard called the &lt;strong&gt;&lt;a href="https://datastudio.google.com/reporting/55bc8fad-44c2-4280-aa0b-5f3f0cd3d2be/page/M6ZPC" rel="noopener noreferrer"&gt;Core Web Vitals Technology Report&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frlyxhpwt6e9cmahcbmdw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frlyxhpwt6e9cmahcbmdw.png" alt="Chart comparing three CMSs' Core Web Vitals performance over time"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This dashboard was developed for the web community to have a shared source of truth for the way websites are both built and experienced. For example, the CWV Technology Report can tell you what percentage of websites built with WordPress pass the CWV assessment. While a number like this on its own is interesting, what's more useful is the ability to track this over time and compare it to other CMSs. And that's exactly what the dashboard offers; it's an interactive way to view how websites perform, broken down by nearly 2,000 technologies.&lt;/p&gt;

&lt;p&gt;This post is a show-and-tell. First I'd like to walk you through the dashboard and show you how to use it, then I'll tell you more about the data methodology behind it.&lt;/p&gt;

&lt;h2&gt;Using the dashboard&lt;/h2&gt;

&lt;p&gt;There are three pages in the dashboard:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Technology drilldown&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Technology comparison&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Settings&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The &lt;a href="https://datastudio.google.com/s/n22l8A5YSJQ" rel="noopener noreferrer"&gt;drilldown&lt;/a&gt; page lets you see how desktop and mobile experiences change over time for a single technology. The default metric is the percent of origins having good CWV, and it also supports individual CWV metrics (see the "Optional metrics" section below).&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://datastudio.google.com/s/pz3Fr3C-Hqw" rel="noopener noreferrer"&gt;comparison&lt;/a&gt; page lets you compare desktop OR mobile experiences for any number of technologies over time. Similar to the drilldown page, you can select overall CWV compliance or individual CWV metrics. Additionally, this page supports visualizing the number of origins per technology.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://datastudio.google.com/s/mLfzLSvQza0" rel="noopener noreferrer"&gt;settings&lt;/a&gt; page is where you can configure report-level preferences. There are currently two settings: categories and number of origins. Refer to &lt;a href="https://www.wappalyzer.com/technologies/" rel="noopener noreferrer"&gt;Wappalyzer&lt;/a&gt; for the list of possible categories. Use this setting to limit the related technologies in the dropdown list. You can also restrict the technologies to those with a minimum level of adoption, for example those used by at least 100 websites. This can be helpful to reduce noisiness.&lt;/p&gt;

&lt;p&gt;By default, the CWV Technology Report is configured to drill down into WordPress performance and compare WordPress, Wix, and Squarespace. This is to demonstrate the kinds of insights that are possible out-of-the-box without having to know how to configure the dashboard yourself. The full URL for the vanilla version of the dashboard is &lt;a href="https://datastudio.google.com/reporting/55bc8fad-44c2-4280-aa0b-5f3f0cd3d2be/page/M6ZPC" rel="noopener noreferrer"&gt;https://datastudio.google.com/reporting/55bc8fad-44c2-4280-aa0b-5f3f0cd3d2be/page/M6ZPC&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;Optional metrics&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe6zqr3wqghl5t4cfefkb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe6zqr3wqghl5t4cfefkb.png" alt="Screenshot of the dashboard showing where to find the optional metrics button"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fptao8ay2hs60r3f9l1j9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fptao8ay2hs60r3f9l1j9.png" alt="Screenshot of the options in the optional metrics menu"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can also use the "optional metrics" feature of Data Studio to customize the dashboard and select specific CWV stats in the charts/tables as needed. The icon that looks like a chart with a gear icon is the button to select optional metrics. In the timeseries chart, you can toggle between the percent of origins having good CWV overall or specifically those with good LCP, FID, or CLS. On the table views, you can use this feature to add or remove columns, for example to see all CWV metrics separately or to focus on just one.&lt;/p&gt;

&lt;p&gt;Data Studio also enables you to share deep links into the dashboard for specific configurations. For example, here's a &lt;a href="https://datastudio.google.com/s/mmMyzuJS4hw" rel="noopener noreferrer"&gt;leaderboard of the top 10 most popular CMSs&lt;/a&gt; ordered by CWV performance as of May 2021:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Date&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;th&gt;Origins&lt;/th&gt;
&lt;th&gt;Percent good CWV&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;May 2021&lt;/td&gt;
&lt;td&gt;1C-Bitrix&lt;/td&gt;
&lt;td&gt;35,385&lt;/td&gt;
&lt;td&gt;56.30%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;May 2021&lt;/td&gt;
&lt;td&gt;TYPO3 CMS&lt;/td&gt;
&lt;td&gt;24,060&lt;/td&gt;
&lt;td&gt;54.39%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;May 2021&lt;/td&gt;
&lt;td&gt;Drupal&lt;/td&gt;
&lt;td&gt;115,280&lt;/td&gt;
&lt;td&gt;45.11%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;May 2021&lt;/td&gt;
&lt;td&gt;Zendesk&lt;/td&gt;
&lt;td&gt;34,713&lt;/td&gt;
&lt;td&gt;43.26%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;May 2021&lt;/td&gt;
&lt;td&gt;Weebly&lt;/td&gt;
&lt;td&gt;15,920&lt;/td&gt;
&lt;td&gt;33.35%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;May 2021&lt;/td&gt;
&lt;td&gt;Squarespace&lt;/td&gt;
&lt;td&gt;60,316&lt;/td&gt;
&lt;td&gt;33.32%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;May 2021&lt;/td&gt;
&lt;td&gt;Joomla&lt;/td&gt;
&lt;td&gt;44,459&lt;/td&gt;
&lt;td&gt;32.19%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;May 2021&lt;/td&gt;
&lt;td&gt;Wix&lt;/td&gt;
&lt;td&gt;54,604&lt;/td&gt;
&lt;td&gt;31.52%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;May 2021&lt;/td&gt;
&lt;td&gt;Adobe Experience Manager&lt;/td&gt;
&lt;td&gt;15,276&lt;/td&gt;
&lt;td&gt;27.65%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;May 2021&lt;/td&gt;
&lt;td&gt;WordPress&lt;/td&gt;
&lt;td&gt;1,731,010&lt;/td&gt;
&lt;td&gt;24.53%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Here are some other configurations to help you explore the data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://datastudio.google.com/s/rBB9VQCCZGI" rel="noopener noreferrer"&gt;Leaderboard of all 14 technologies that are in the CMS or Blogs categories having more than 10,000 origins&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="[https://datastudio.google.com/s/otsNSd-4WyA](https://datastudio.google.com/s/otsNSd-4WyA)"&gt;Comparison of all JavaScript frameworks and libraries with "jQuery" in their name&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="[https://datastudio.google.com/s/r4hBpzxX1uk](https://datastudio.google.com/s/r4hBpzxX1uk)"&gt;A year-to-date drilldown into React CWV performance&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Feature roadmap&lt;/h3&gt;

&lt;p&gt;There are two features missing from the dashboard that I would love to add in the near future: segmenting by &lt;a href="https://developers.google.com/web/updates/2021/03/crux-rank-magnitude" rel="noopener noreferrer"&gt;CrUX rank magnitude&lt;/a&gt; and comparing Lighthouse audit compliance. Origin popularity would be a really interesting way to slice the data and the rank magnitude dimension would enable us to see how technology adoption and CWV performance change at the head, torso, and tail of the web. Adding data from Lighthouse would enable us to get some clues into &lt;em&gt;why&lt;/em&gt; a particular technology may be better or worse with CWV. For example, if a group of websites tend to have poor LCP performance, it'd be interesting to see what loading performance audits they also tend to fail. Of course there are so many variables at play and we can't determine cause and effect, but these results could give us something to think about for further exploration.&lt;/p&gt;
&lt;h2&gt;Methodology&lt;/h2&gt;

&lt;p&gt;The CWV Technology Report is a combination of two data sources: CrUX and HTTP Archive. They are similar datasets in that they measure millions of websites, but they have their own strengths and weaknesses worth exploring.&lt;/p&gt;

&lt;p&gt;CrUX is a &lt;em&gt;field tool&lt;/em&gt;, meaning that it measures real-user experiences. It's also a public dataset, so you can see how users experience any one of over 8 million websites. Loosely put, this is really cool because we (as a community) have visibility into how the web as a whole is being experienced.&lt;/p&gt;

&lt;p&gt;The CrUX dataset is powered by Chrome users who enable &lt;a href="https://www.google.com/chrome/browser/privacy/whitepaper.html#usagestats" rel="noopener noreferrer"&gt;usage statistics reporting&lt;/a&gt;. Their experiences on publicly discoverable websites are aggregated together over 28-day windows, and the results are published in queryable monthly data dumps on BigQuery and via the CrUX API, updated daily. CrUX measures users' experiences for each of the CWV metrics: &lt;a href="https://web.dev/lcp/" rel="noopener noreferrer"&gt;LCP&lt;/a&gt;, &lt;a href="https://web.dev/fid/" rel="noopener noreferrer"&gt;FID&lt;/a&gt;, and &lt;a href="https://web.dev/cls/" rel="noopener noreferrer"&gt;CLS&lt;/a&gt;. Using this data, we can evaluate whether a website passes the CWV assessment: it passes if at least 75 percent of experiences for each metric are as good as or better than the thresholds set by the Web Vitals program.&lt;/p&gt;
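As a hedged sketch of what that check looks like in SQL: the table and column names below follow the public CrUX summary tables on BigQuery but should be treated as assumptions and verified against the current schema. The thresholds (at most 2,500 ms for LCP, 100 ms for FID, and 0.1 for CLS) come from the Web Vitals program.

```sql
-- Hedged sketch, not a production query: the table and column names are
-- assumptions based on the public CrUX materialized summary tables.
SELECT
  origin,
  p75_lcp BETWEEN 0 AND 2500                             -- LCP: at most 2.5 s
    AND p75_fid BETWEEN 0 AND 100                        -- FID: at most 100 ms
    AND SAFE_CAST(p75_cls AS FLOAT64) BETWEEN 0 AND 0.1  -- CLS: at most 0.1 (may be stored as a string)
    AS passes_cwv
FROM
  `chrome-ux-report.materialized.metrics_summary`
WHERE
  origin = 'https://httparchive.org'
  AND date = '2021-05-01'
```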

&lt;p&gt;HTTP Archive is a &lt;em&gt;lab tool&lt;/em&gt;, meaning that it measures how individual web pages are built. Like CrUX, it's a public dataset, and it's actually based on the same websites in the CrUX corpus, so we have perfect parity when combining the two sources together. HTTP Archive is powered by WebPageTest, which integrates with other lab tools like &lt;a href="https://developers.google.com/web/tools/lighthouse" rel="noopener noreferrer"&gt;Lighthouse&lt;/a&gt; and &lt;a href="https://www.wappalyzer.com/" rel="noopener noreferrer"&gt;Wappalyzer&lt;/a&gt; to extract fine-grained data about the page. Lighthouse runs audits against the page to determine how well-optimized it is, for example if it takes advantage of web performance best practices. Wappalyzer is an open-source tool that detects the use of technologies like an entire CMS, a specific JavaScript library, and even what programming languages are probably used on the backend. These detections are what we use in the CWV Technology Report to segment the real-user experience data from CrUX.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8edmxd2fjew7f69j8xlp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8edmxd2fjew7f69j8xlp.png" alt="Screenshot of various Ecommerce technologies' CWV performance"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Confession time! This isn't the first tool to look at CrUX data through the lens of how websites are built. &lt;a href="https://perf-track.web.app/" rel="noopener noreferrer"&gt;Perf Track&lt;/a&gt; is a report built by &lt;a href="https://twitter.com/hdjirdeh" rel="noopener noreferrer"&gt;Houssein Djirdeh&lt;/a&gt; that slices CrUX data by JavaScript frameworks. The annual &lt;a href="https://almanac.httparchive.org/en/2020/cms" rel="noopener noreferrer"&gt;CMS chapter&lt;/a&gt; of the Web Almanac slices CrUX data by (you guessed it) CMSs. What makes the CWV Technology Dashboard different is that it facilitates exploration of the data by making &lt;em&gt;all&lt;/em&gt; &lt;a href="https://www.wappalyzer.com/technologies/" rel="noopener noreferrer"&gt;1,950 technologies&lt;/a&gt; across 71 categories discoverable in a single, browseable UI. You can choose your own adventure by filtering technologies to a single category, like &lt;a href="https://datastudio.google.com/s/ibJhd2NEEkM" rel="noopener noreferrer"&gt;Ecommerce&lt;/a&gt;, and comparing platforms head-to-head to see which has more websites passing the CWV assessment.&lt;/p&gt;

&lt;p&gt;The CrUX dataset on BigQuery is aggregated at the origin level. An origin is a way to identify an entire website. For example, &lt;a href="https://httparchive.org" rel="noopener noreferrer"&gt;https://httparchive.org&lt;/a&gt; is the origin for the HTTP Archive website and it's different from &lt;a href="https://almanac.httparchive.org" rel="noopener noreferrer"&gt;https://almanac.httparchive.org&lt;/a&gt;, which is a separate origin for the Web Almanac website.&lt;/p&gt;

&lt;p&gt;HTTP Archive measures individual web pages, not entire websites. And due to capacity limitations, HTTP Archive is limited to testing one page per website. The most natural page to test for a given website is its home page, or the root page of the origin. For example, the home/root page of the HTTP Archive website is &lt;a href="https://httparchive.org/" rel="noopener noreferrer"&gt;https://httparchive.org/&lt;/a&gt; (note the trailing slash). This introduces an important assumption that we make in the CWV Technology Dashboard: an entire website's real-user experiences are attributed to the technologies detected only on its home page. It's entirely possible that many websites we test use different technologies on their interior pages, and some technologies may even be more or less likely to be used on home pages. These biases are worth acknowledging in the methodology for full transparency, but to be honest there's not a lot we at HTTP Archive can do to mitigate them without becoming a full-blown web crawler!&lt;/p&gt;
&lt;h3&gt;
  Core Web Vitals
&lt;/h3&gt;

&lt;p&gt;There are different approaches to measuring how well a website or group of websites performs with CWV. The approach used by this dashboard is designed to most closely match the &lt;a href="https://developers.google.com/speed/docs/insights/v5/about#categories" rel="noopener noreferrer"&gt;CWV assessment in PageSpeed Insights&lt;/a&gt;. CWV metrics and thresholds may change annually, but we'll do our best to keep the dashboard in sync with the state of the art.&lt;/p&gt;

&lt;p&gt;Each individual CWV metric has a threshold below which user experiences are considered "good". For example, LCP experiences under 2.5 seconds are good. A website must have at least 75% of its LCP experiences in the "good" category to be considered as having good LCP overall. If all of the CWV metrics are good, the website is said to pass the CWV assessment. Refer to the &lt;a href="https://web.dev/vitals/#core-web-vitals" rel="noopener noreferrer"&gt;official CWV documentation&lt;/a&gt; for the latest guidance on the set of metrics and thresholds.&lt;/p&gt;

&lt;p&gt;FID is an exception worth mentioning. Because it relies on user input to be measured, it doesn't occur on as many page loads as metrics like LCP and CLS. That makes it less likely to have sufficient data for pages that may not have many interactive UI elements or websites with low popularity. So the CWV Technology Dashboard replicates the behavior in PageSpeed Insights and assesses a website's CWV, even in the absence of FID data. In that case, if LCP and CLS are good, the website passes, otherwise it doesn't. In the rare case that a website is missing LCP or CLS data, it's not eligible to be assessed at all.&lt;/p&gt;
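&lt;p&gt;The assessment logic above can be condensed into a few lines. This is only an illustrative Python sketch, not the dashboard's actual implementation (the real logic is in the SQL query shared later in this post):&lt;/p&gt;

```python
def is_good(good, needs_improvement, poor):
    """At least 75% of experiences for a metric must be 'good'."""
    total = good + needs_improvement + poor
    return total > 0 and good / total >= 0.75

def passes_cwv(lcp, fid, cls):
    """Each argument is a (good, needs_improvement, poor) tuple of
    experience counts, or None when CrUX has no data for that metric."""
    # Without LCP or CLS data, the origin is not eligible for assessment.
    if lcp is None or cls is None:
        return None
    # FID is the exception: missing FID data does not fail the assessment.
    fid_ok = fid is None or is_good(*fid)
    return fid_ok and is_good(*lcp) and is_good(*cls)

# An origin with good LCP and CLS but no FID data still passes:
print(passes_cwv(lcp=(80, 15, 5), fid=None, cls=(90, 5, 5)))  # True
```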

&lt;p&gt;When evaluating a group of origins, like those in the dashboard that all use the same technology, we quantify them in terms of the percentage of origins that pass the CWV assessment. This is not to be confused with the percentage of users or the percentage of experiences. Origins are aggregated in CrUX in a way that doesn't make it meaningful to combine their distributions together. So instead, we count origins as a unit: those that use jQuery, pass the CWV assessment, have sufficient FID data, have good LCP, etc.&lt;/p&gt;

&lt;p&gt;The CrUX dataset includes a &lt;code&gt;form_factor&lt;/code&gt; dimension representing the type of device the user was on. We segment all of the data in the dashboard by this dimension and call it the "Client", with values of either desktop or mobile.&lt;/p&gt;
&lt;h3&gt;
  Querying the raw data
&lt;/h3&gt;

&lt;p&gt;The dashboard is implemented in Data Studio with a BigQuery connector to power all of the technology and CWV insights. The underlying table on BigQuery is made publicly available at &lt;a href="https://console.cloud.google.com/bigquery?p=httparchive&amp;amp;d=core_web_vitals&amp;amp;t=technologies&amp;amp;page=table"&gt;&lt;code&gt;httparchive.core_web_vitals.technologies&lt;/code&gt;&lt;/a&gt;. Feel free to query this table directly to extract information about specific technology trends, or even to build your own custom dashboards or visualizations.&lt;/p&gt;

&lt;p&gt;For reference, this is the query that generated the &lt;code&gt;core_web_vitals.technologies&lt;/code&gt; table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TEMP&lt;/span&gt; &lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="n"&gt;IS_GOOD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;good&lt;/span&gt; &lt;span class="n"&gt;FLOAT64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;needs_improvement&lt;/span&gt; &lt;span class="n"&gt;FLOAT64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;poor&lt;/span&gt; &lt;span class="n"&gt;FLOAT64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;RETURNS&lt;/span&gt; &lt;span class="nb"&gt;BOOL&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;good&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;good&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;needs_improvement&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;poor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;75&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TEMP&lt;/span&gt; &lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="n"&gt;IS_NON_ZERO&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;good&lt;/span&gt; &lt;span class="n"&gt;FLOAT64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;needs_improvement&lt;/span&gt; &lt;span class="n"&gt;FLOAT64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;poor&lt;/span&gt; &lt;span class="n"&gt;FLOAT64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;RETURNS&lt;/span&gt; &lt;span class="nb"&gt;BOOL&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;good&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;needs_improvement&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;poor&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;unique_categories&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;ARRAY_AGG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;categories&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
  &lt;span class="nv"&gt;`httparchive.technologies.2021_05_01_mobile`&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;ARRAY_TO_STRING&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ARRAY_AGG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt; &lt;span class="k"&gt;IGNORE&lt;/span&gt; &lt;span class="n"&gt;NULLS&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;', '&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;categories&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;origins&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;good_fid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;origins_with_good_fid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;good_cls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;origins_with_good_cls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;good_lcp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;origins_with_good_lcp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;any_fid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;origins_with_any_fid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;any_cls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;origins_with_any_cls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;any_lcp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;origins_with_any_lcp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;good_cwv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;origins_with_good_cwv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;any_lcp&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;any_cls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;origins_eligible_for_cwv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;SAFE_DIVIDE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;COUNTIF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;good_cwv&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;COUNTIF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;any_lcp&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;any_cls&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;pct_eligible_origins_with_good_cwv&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;CONCAT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;origin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'/'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;IF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'desktop'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'desktop'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'mobile'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;IS_NON_ZERO&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fast_fid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;avg_fid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;slow_fid&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;any_fid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;IS_GOOD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fast_fid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;avg_fid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;slow_fid&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;good_fid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;IS_NON_ZERO&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;small_cls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;medium_cls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;large_cls&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;any_cls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;IS_GOOD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;small_cls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;medium_cls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;large_cls&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;good_cls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;IS_NON_ZERO&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fast_lcp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;avg_lcp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;slow_lcp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;any_lcp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;IS_GOOD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fast_lcp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;avg_lcp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;slow_lcp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;good_lcp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;IS_GOOD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fast_fid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;avg_fid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;slow_fid&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;fast_fid&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt;
    &lt;span class="n"&gt;IS_GOOD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;small_cls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;medium_cls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;large_cls&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt;
    &lt;span class="n"&gt;IS_GOOD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fast_lcp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;avg_lcp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;slow_lcp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;good_cwv&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt;
    &lt;span class="nv"&gt;`chrome-ux-report.materialized.device_summary`&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt;
    &lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2020-01-01'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt;
    &lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;REGEXP_REPLACE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_TABLE_SUFFIX&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="s1"&gt;'(&lt;/span&gt;&lt;span class="se"&gt;\d&lt;/span&gt;&lt;span class="s1"&gt;)_(&lt;/span&gt;&lt;span class="se"&gt;\d&lt;/span&gt;&lt;span class="s1"&gt;{2})_(&lt;/span&gt;&lt;span class="se"&gt;\d&lt;/span&gt;&lt;span class="s1"&gt;{2}).*'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="s1"&gt;'202&lt;/span&gt;&lt;span class="se"&gt;\1&lt;/span&gt;&lt;span class="s1"&gt;-&lt;/span&gt;&lt;span class="se"&gt;\2&lt;/span&gt;&lt;span class="s1"&gt;-&lt;/span&gt;&lt;span class="se"&gt;\3&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;IF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;category&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="k"&gt;UNNEST&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;categories&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;unique_categories&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;IF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ENDS_WITH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_TABLE_SUFFIX&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'desktop'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;'desktop'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'mobile'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;url&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt;
    &lt;span class="nv"&gt;`httparchive.technologies.202*`&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt;
    &lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt;
    &lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt;
  &lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;client&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
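&lt;p&gt;As a quick example of querying the table directly, here's a minimal query (a sketch using the column names produced by the query above) that tracks a single technology's trend over time:&lt;/p&gt;

```sql
-- Monthly percent of jQuery origins passing the CWV assessment,
-- split by desktop/mobile client.
SELECT
  date,
  client,
  origins,
  pct_eligible_origins_with_good_cwv
FROM
  `httparchive.core_web_vitals.technologies`
WHERE
  app = 'jQuery'
ORDER BY
  date, client
```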

&lt;p&gt;The most idealistic goal for this dashboard is to empower influencers in the web community to make improvements to swaths of websites at scale. Web transparency projects like this one are meant to inform and inspire, whether that's instilling a sense of competitiveness with related technologies to climb the leaderboard or giving maintainers actionable data to make meaningful improvements to the technologies under their control. Please leave a comment if you have any suggestions to help make the CWV Technology Report better!&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>corewebvitals</category>
      <category>httparchive</category>
      <category>webtransparency</category>
    </item>
    <item>
      <title>What can the HTTP Archive tell us about Largest Contentful Paint?</title>
      <dc:creator>Paul Calvano</dc:creator>
      <pubDate>Tue, 08 Jun 2021 03:59:31 +0000</pubDate>
      <link>https://dev.to/httparchive/what-can-the-http-archive-tell-us-about-largest-contentful-paint-f0p</link>
      <guid>https://dev.to/httparchive/what-can-the-http-archive-tell-us-about-largest-contentful-paint-f0p</guid>
      <description>&lt;p&gt;&lt;a href="https://web.dev/lcp/" rel="noopener noreferrer"&gt;Largest Contentful Paint (LCP)&lt;/a&gt; is an important metric that measures when the largest element in the browser’s viewport becomes visible. This could be an image, a background image, a poster image for a video, or even a block of text. The metric is measured with the &lt;a href="https://wicg.github.io/largest-contentful-paint/" rel="noopener noreferrer"&gt;Largest Contentful Paint API&lt;/a&gt;, which is &lt;a href="https://caniuse.com/?search=largestcontentfulpaint" rel="noopener noreferrer"&gt;supported&lt;/a&gt; in Chromium browsers. Optimizing for this metric is critical to end user experience, since it affects their ability to visualize your content.&lt;/p&gt;

&lt;p&gt;Google has promoted this metric as one of the three &lt;a href="https://web.dev/vitals/" rel="noopener noreferrer"&gt;"Core Web Vitals"&lt;/a&gt; that affect user experience on the web. It is also slated to become a &lt;a href="https://developers.google.com/search/blog/2021/04/more-details-page-experience" rel="noopener noreferrer"&gt;search ranking signal over the next few weeks&lt;/a&gt;, which has created a lot of awareness about it. The suggested target for a good Largest Contentful Paint is less than 2.5 seconds for at least 75% of page loads.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Flcp-httparchive%2Fimage9.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Flcp-httparchive%2Fimage9.jpg" alt="Largest Contentful Paint Overview"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;Source: &lt;a href="https://web.dev/lcp/" rel="noopener noreferrer"&gt;https://web.dev/lcp/&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;Some of the recent posts on &lt;a href="https://wpostats.com/tags/core%20web%20vitals/" rel="noopener noreferrer"&gt;WPOStats&lt;/a&gt; feature interesting case studies about this metric. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Google's &lt;a href="https://blog.chromium.org/2020/05/the-science-behind-web-vitals.html" rel="noopener noreferrer"&gt;research&lt;/a&gt; found that when Core Web Vitals are met, users are 24% less likely to abandon a page before it finishes loading.&lt;/li&gt;
&lt;li&gt;  Vodafone improved LCP by 31% and saw an 8% increase in sales.&lt;/li&gt;
&lt;li&gt;  NDTV improved their LCP by 55% and saw a 50% reduction in bounce rate.&lt;/li&gt;
&lt;li&gt;  Tokopedia improved their LCP by 55% and saw a 23% increase in session duration.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Identifying the Largest Contentful Paint Element&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The name of this metric implies that size is used as a proxy for importance. Because of this, you may be wondering specifically which image or text triggered it as well as the percentage of the viewport it consumed. There are a few ways to examine this:&lt;/p&gt;

&lt;p&gt;One way to visualize the Largest Contentful Paint is to look at a &lt;a href="https://webpagetest.org/" rel="noopener noreferrer"&gt;WebPageTest&lt;/a&gt; filmstrip. You’ll be able to see when visual changes occurred (yellow outline) as well as when the Largest Contentful Paint event occurred (red outline).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Flcp-httparchive%2Fimage7.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Flcp-httparchive%2Fimage7.jpg" alt="WebPageTest Filmstrip showing LCP Element"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In Chrome DevTools, you can also click on the LCP indicator in the “Performance” tab to examine the Largest Contentful Paint element in your browser. Using this method you can see and inspect the exact element (image, text, etc) that triggered it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Flcp-httparchive%2Fimage11.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Flcp-httparchive%2Fimage11.gif" alt="Chrome DevTools Performance Tab"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Lighthouse also has an audit that identifies the Largest Contentful Paint element. If you examine the screenshot below you’ll notice that there is a yellow box around the largest element, as well as an HTML snippet.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Flcp-httparchive%2Fimage5.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Flcp-httparchive%2Fimage5.jpg" alt="Lighthouse LCP Element"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How Large is the Largest Contentful Paint?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://httparchive.org/" rel="noopener noreferrer"&gt;HTTP Archive&lt;/a&gt; runs Lighthouse audits for approximately 7.2 million websites every month. In the May 2021 dataset, Lighthouse was able to identify an LCP element in 97.35% of the tests. Since we have the ability to query all of these Lighthouse test results, we can analyze the result of the LCP audits and get more insight into what drives this metric across the web. &lt;/p&gt;

&lt;p&gt;Using the same boundaries that Lighthouse uses to draw the rectangle around the LCP element, it’s possible to calculate its area. In the above example, the product of the LCP image’s height (191) and width (340) was 64,940 pixels. Since the Lighthouse test was run with an emulated &lt;a href="https://almanac.httparchive.org/en/2020/methodology#webpagetest" rel="noopener noreferrer"&gt;Moto G4 user agent&lt;/a&gt; with a screen size of 640x360, we can also calculate that this particular LCP image took up 28% of the viewport.&lt;/p&gt;
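&lt;p&gt;As a quick sanity check, the viewport-share arithmetic above can be sketched in a few lines of Python (the element and viewport dimensions are the ones from this example):&lt;/p&gt;

```python
# Sketch of the viewport-share calculation: the LCP element's
# bounding-box area divided by the emulated Moto G4 viewport area.

def lcp_viewport_share(width, height, viewport_w=360, viewport_h=640):
    """Return the LCP element's area as a fraction of the viewport area."""
    return (width * height) / (viewport_w * viewport_h)

area = 340 * 191  # 64,940 pixels, as in the example above
share = lcp_viewport_share(340, 191)
print(f"{area} px is {share:.0%} of the 360x640 viewport")  # 64940 px is 28% of the 360x640 viewport
```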

&lt;p&gt;The graph below shows the cumulative distribution of the LCP element as a percentage of screen size. The median LCP element takes up 31% of the screen size! At the 75th percentile the LCP element is nearly twice as large, taking up 59% of the screen size. Additionally, 10.6% of sites had an LCP element that exceeded the viewport (which is why the y-axis doesn’t reach 100%).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Flcp-httparchive%2Fimage10.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Flcp-httparchive%2Fimage10.jpg" alt="Distribution of LCP Element Size as a Percent of Screen Size"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The graph below illustrates the same data as a histogram. From this we can see that 4.03% of sites (285,751) had an LCP element that took up 0 pixels. Upon further inspection, the 0-pixel elements appear to have been used in carousels, so by the time the audit completed, the LCP element had slid out of the viewport.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Flcp-httparchive%2Fimage3.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Flcp-httparchive%2Fimage3.jpg" alt="Histogram of LCP Element Size as a Percent of Screen Size"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Node Paths of LCP Elements&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Another interesting aspect of the Largest Contentful Paint audit is the nodePath of the element, which shows you where in the DOM this element was. In the example we looked at earlier, the nodePath was:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1,HTML,1,BODY,8,DIV,2,SECTION,1,DIV,0,DIV,0,DIV,0,UL,0,LI,0,ARTICLE,1,DIV,0,DIV,0,A,0,IMG
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we look at the last element in the node path, we can get some insight into the type of element that triggered the Largest Contentful Paint. The most common node that triggered the Largest Contentful Paint was &amp;lt;IMG&amp;gt;, which accounted for 42% of all sites. Next was &amp;lt;DIV&amp;gt; at 27% (which could include text or images). The &amp;lt;H1&amp;gt; through &amp;lt;H5&amp;gt; heading elements accounted for 7.18% of all Largest Contentful Paints.&lt;/p&gt;
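&lt;p&gt;A minimal sketch of how that last element can be pulled out of a nodePath string (the path is the example from above):&lt;/p&gt;

```python
# The nodePath alternates child indices and tag names, separated by commas,
# so the final comma-separated token is the LCP element's tag name.

def lcp_tag(node_path):
    """Return the tag of the last element in a Lighthouse nodePath."""
    return node_path.split(",")[-1]

path = ("1,HTML,1,BODY,8,DIV,2,SECTION,1,DIV,0,DIV,0,DIV,"
        "0,UL,0,LI,0,ARTICLE,1,DIV,0,DIV,0,A,0,IMG")
print(lcp_tag(path))  # IMG
```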

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;tr&gt;
   &lt;td&gt;LCP Node (last element in path)
   &lt;/td&gt;
   &lt;td&gt;Number of Sites
   &lt;/td&gt;
   &lt;td&gt;Percent of Sites
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;IMG
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
3067354&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
42.12%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;DIV
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
1981416&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
27.21%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;P
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
766977&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
10.53%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;H1
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
291091&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
4.00%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
192498&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
2.64%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;SECTION
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
182267&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
2.50%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;H2
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
144534&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
1.98%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;A
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
107501&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
1.48%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;SPAN
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
85245&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
1.17%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;HEADER
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
67762&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
0.93%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;LI
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
64212&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
0.88%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;H3
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
60679&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
0.83%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;RS-SBG
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
51623&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
0.71%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;TD
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
48470&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
0.67%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;H4
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
19039&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
0.26%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;VIDEO
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
15649&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
0.21%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;ARTICLE
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
12860&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
0.18%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;FIGURE
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
9208&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
0.13%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;BODY
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
8859&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
0.12%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;image
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
8077&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
0.11%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;CENTER
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
7960&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
0.11%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The &amp;lt;VIDEO&amp;gt; element only accounted for 0.21% of sites. According to the Web Almanac, &lt;a href="https://almanac.httparchive.org/en/2020/media#videos" rel="noopener noreferrer"&gt;the &amp;lt;video&amp;gt; element was used on 0.49% of mobile websites&lt;/a&gt; - so from this we can estimate that roughly half of the sites loading videos are triggering LCP with video poster images.&lt;/p&gt;
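&lt;p&gt;That estimate is a back-of-envelope ratio of the two percentages; a quick sketch:&lt;/p&gt;

```python
# Back-of-envelope estimate: the share of video-using sites whose LCP is
# the video element is (sites with a VIDEO LCP node) / (sites using video).

video_lcp_pct = 0.21  # % of all sites with a VIDEO LCP node (this dataset)
video_use_pct = 0.49  # % of mobile sites using the video element (Web Almanac 2020)
print(f"{video_lcp_pct / video_use_pct:.0%}")  # 43%
```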

&lt;p&gt;&lt;strong&gt;Image Weight for the LCP&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One of the Lighthouse audits looks for opportunities to preload the Largest Contentful Paint element and estimates the potential performance savings. This audit also identifies the URL of the LCP element, which gives us some insight into what types of images are being loaded as LCP elements. In the HTTP Archive data, only 67% of the Lighthouse tests were able to identify a URL for an LCP element. Based on this, we can infer that text nodes are used for the LCP on approximately 33% of sites.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Flcp-httparchive%2Fimage8.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Flcp-httparchive%2Fimage8.jpg" alt="Lighthouse Preload LCP Element Recommendation"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The graph below shows the distribution of sizes for the image element associated with the Largest Contentful Paint. The median LCP element size was 80KB. At the 90th percentile, the LCP element size was 512KB. If you have a large LCP image, you should consider optimizing it before attempting to follow the Lighthouse preload recommendation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Flcp-httparchive%2Fimage12.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Flcp-httparchive%2Fimage12.jpg" alt="Distribution of LCP Element Size"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Additionally, 70% of the LCP element images were JPEG and 25% were PNG. Only 3% of sites served a WebP image as their LCP element.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;tr&gt;
   &lt;td&gt;format
   &lt;/td&gt;
   &lt;td&gt;sites
   &lt;/td&gt;
   &lt;td&gt;% of Sites
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;jpg
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
3161991&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
69.37%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;png
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
1122585&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
24.63%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;webp
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
141441&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
3.10%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;gif
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
84829&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
1.86%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;svg
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
34123&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
0.75%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Other
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
13272&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
0.29%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;When we look at the LCP element as a percentage of page weight, we can see that the median LCP element is 4.17% of the total page weight. At the higher percentiles, the LCP elements are both larger in absolute terms and a larger percentage of page weight.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Flcp-httparchive%2Fimage1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Flcp-httparchive%2Fimage1.jpg" alt="LCP Element as a Percent of Page Weight"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;tr&gt;
   &lt;td&gt;percentile
   &lt;/td&gt;
   &lt;td&gt;ImageRequests
   &lt;/td&gt;
   &lt;td&gt;ImageKB
   &lt;/td&gt;
   &lt;td&gt;TotalKB
   &lt;/td&gt;
   &lt;td&gt;LCP as a % of Page Weight
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;p25
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
15&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
422&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
1,138&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
3.01%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;p50
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
26&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
1,142&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
2,185&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
4.17%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;p75
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
45&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
2,692&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
4,108&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
5.58%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;p95
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
103&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
8,008&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
10,036&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
8.42%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Since images account for 52% of the median page weight (for the sites that have an LCP image element), we can infer that, at the median, roughly 8% of a page’s image weight is used to render content covering 31% of the screen.&lt;/p&gt;
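&lt;p&gt;That inference follows directly from the median row of the table above; a quick sketch of the arithmetic:&lt;/p&gt;

```python
# Reproducing the median-row arithmetic: what share of page weight is
# images, and what share of those image bytes the LCP image accounts for.

image_kb, total_kb = 1142, 2185   # p50 Image KB and Total KB from the table
lcp_pct_of_page = 4.17            # p50 "LCP as a % of Page Weight"

image_share = image_kb / total_kb                        # ~52% of page weight
lcp_share_of_images = lcp_pct_of_page / (100 * image_share)
print(f"{image_share:.0%} of page weight is images; "
      f"the LCP image is {lcp_share_of_images:.0%} of image bytes")
```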

&lt;p&gt;&lt;strong&gt;How does this change based on Site Popularity?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The HTTP Archive now contains rank groupings, obtained from the Chrome User Experience Report. This enables us to segment the analysis based on the popularity of sites. The rank grouping indicator buckets sites into the top 1K, 10K, 100K, 1 million, and 10 million.&lt;/p&gt;

&lt;p&gt;When we look at the Largest Contentful Paint image size based on popularity, it’s interesting to note that the most popular sites tend to be serving smaller images for the LCP element. While there may be numerous reasons for this, I suspect that the more popular sites are investing in image optimization solutions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Flcp-httparchive%2Fimage2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Flcp-httparchive%2Fimage2.jpg" alt="LCP Image Size by Site Popularity"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Page weight follows the same pattern, with the least popular websites having some of the largest page weights. If we look at the LCP element as a percentage of page weight, we can see that within the top 100K sites the ratios are very close. In the less popular sites, the LCP element tends to be a much greater percentage of page weight.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;tr&gt;
   &lt;td colspan="5"&gt;
&lt;strong&gt;Largest Contentful Paint Image as a Percentage of Page Weight&lt;/strong&gt;
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;
&lt;strong&gt;rank&lt;/strong&gt;
   &lt;/td&gt;
   &lt;td&gt;
&lt;strong&gt;p25&lt;/strong&gt;
   &lt;/td&gt;
   &lt;td&gt;
&lt;strong&gt;p50&lt;/strong&gt;
   &lt;/td&gt;
   &lt;td&gt;
&lt;strong&gt;p75&lt;/strong&gt;
   &lt;/td&gt;
   &lt;td&gt;
&lt;strong&gt;p95&lt;/strong&gt;
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Top 1k
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
1.61%&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
2.12%&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
2.85%&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
5.67%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Top 10k
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
1.76%&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
2.27%&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
3.00%&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
4.96%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Top 100k
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
2.07%&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
2.87%&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
3.77%&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
5.78%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Top 1 million
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
2.53%&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
3.49%&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
4.60%&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
6.95%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Top 10 million
   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
3.11%&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
4.30%&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
5.75%&lt;/p&gt;

   &lt;/td&gt;
   &lt;td&gt;
&lt;p&gt;
8.65%&lt;/p&gt;

   &lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We can also make some interesting observations about how popular sites are optimizing their LCP assets. Looking at the various image formats, JPG images are the most common LCP element. Some other formats such as PNG, WebP, GIF and SVG are used more frequently in the more popular sites. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Flcp-httparchive%2Fimage6.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Flcp-httparchive%2Fimage6.jpg" alt="Largest Contentful Paint Element Format by Rank"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Largest Contentful Paint is an important metric that helps illustrate when a page’s most significant content is rendered to the screen. In reviewing the HTTP Archive data, we can see that this area represents between 30% and 60% of a mobile viewport for a majority of sites.  &lt;/p&gt;

&lt;p&gt;A shocking number of sites have an LCP element that consumes a large percentage of the viewport and is delivered as a large, unoptimized image. Site owners should evaluate both what is triggering the Largest Contentful Paint and how it is loaded. Optimizing the Largest Contentful Paint ensures that the browser has the opportunity to load and render this content as quickly as possible.&lt;/p&gt;

&lt;p&gt;If you are interested in seeing some of the SQL queries and raw data used in this analysis, I’ve created a post with all the details in the &lt;a href="https://discuss.httparchive.org/t/analyzing-largest-contentful-paint-stats-via-lighthouse-audits/2166" rel="noopener noreferrer"&gt;HTTP Archive discussion forums&lt;/a&gt;. You can also see all the data used for these graphs in this &lt;a href="https://docs.google.com/spreadsheets/d/1fI_16nby3Yn1LHxWVd4QRyOuPqqMLmBvBU31l5kGF-8/edit?usp=sharing" rel="noopener noreferrer"&gt;Google Sheet&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Originally posted at &lt;a href="https://paulcalvano.com/2021-06-07-lcp-httparchive/" rel="noopener noreferrer"&gt;https://paulcalvano.com/2021-06-07-lcp-httparchive/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webperf</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Correlation between Core Web Vitals and web characteristics</title>
      <dc:creator>Sixing Chen</dc:creator>
      <pubDate>Tue, 12 Jan 2021 22:46:17 +0000</pubDate>
      <link>https://dev.to/httparchive/correlation-between-core-web-vitals-and-web-characteristics-219f</link>
      <guid>https://dev.to/httparchive/correlation-between-core-web-vitals-and-web-characteristics-219f</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://web.dev/vitals/" rel="noopener noreferrer"&gt;Core Web Vitals&lt;/a&gt; (CWV) are the metrics that Google considers to be the most important indicators of the quality of experience on the web. The process to identify and optimize CWV issues has typically been a reactive one. The decisions site owners make about which technologies to use or which metrics to look at are usually decided by trial and error, rather than empirical research. A site may be built or rebuilt using a new technology, only to discover that it creates UX issues in production.&lt;/p&gt;

&lt;p&gt;In this analysis, we analyze the correlation between CWV and many different types of web characteristics simultaneously, rather than a single type of characteristic in isolation, since web development choices are not made in a vacuum but in the context of the many parts of a website. We hope that these results will provide additional reference points to teams as they assess various web development choices, and we invite the community to help further the understanding of the interplay between CWV and web characteristics.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Notable negative associations with largest contentful paint:

&lt;ul&gt;
&lt;li&gt;TTFB, bytes of JavaScript, CSS, and images&lt;/li&gt;
&lt;li&gt;JavaScript frameworks - AngularJS, GSAP, MooTools, and RequireJS&lt;/li&gt;
&lt;li&gt;JavaScript libraries - Hammerjs, Lodash, momentjs, YUI, Zepto, jQueryUI, and prettyPhoto&lt;/li&gt;
&lt;li&gt;CMS - Joomla and Squarespace&lt;/li&gt;
&lt;li&gt;UI frameworks - animatecss&lt;/li&gt;
&lt;li&gt;Web frameworks - MicrosoftASPNet&lt;/li&gt;
&lt;li&gt;Widgets - FlexSlider and OWLCarousel&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Notable negative associations with cumulative layout shift:

&lt;ul&gt;
&lt;li&gt;Bytes of images&lt;/li&gt;
&lt;li&gt;JavaScript frameworks - AngularJS, Handlebars, and Vuejs&lt;/li&gt;
&lt;li&gt;JavaScript libraries - FancyBox, Hammerjs, Modernizr, and Slick&lt;/li&gt;
&lt;li&gt;Widgets - Flexslider and OWLCarousel&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h1&gt;
  
  
  Methodology
&lt;/h1&gt;

&lt;h3&gt;
  
  
  Data source
&lt;/h3&gt;

&lt;p&gt;This analysis is based on data from &lt;a href="https://httparchive.org/" rel="noopener noreferrer"&gt;HTTP Archive&lt;/a&gt;. The HTTP Archive dataset is generated in a &lt;a href="https://developers.google.com/web/fundamentals/performance/speed-tools#lab_data" rel="noopener noreferrer"&gt;lab environment&lt;/a&gt; and contains detailed information on many characteristics of a website as well as performance data. Because it is lab-generated, HTTP Archive data will not be completely reflective of real usage, and it only allows us to analyze LCP (largest contentful paint) and CLS (cumulative layout shift), as we do not have any user input for FID (first input delay). However, an advantage of being lab-generated is that all data is gathered on a single set of hardware with no bias in the types of websites that are loaded, which shields us from confounding due to user/device characteristics that we do not measure. Although we are not shielded from all confounding between website characteristics and web performance, this choice leaves us with far less confounding than a user-generated dataset, where we often have no information on the user and only limited device information.&lt;/p&gt;

&lt;h3&gt;
  
  
  Web characteristics
&lt;/h3&gt;

&lt;p&gt;We conferred with domain experts and established a list of web characteristics of interest:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;TTFB, font requests, and bytes of content of various types&lt;/li&gt;
&lt;li&gt;Counts of various types of third party requests&lt;/li&gt;
&lt;li&gt;Web technologies (coded as binary to represent whether technology is used)

&lt;ul&gt;
&lt;li&gt;JavaScript frameworks&lt;/li&gt;
&lt;li&gt;JavaScript libraries&lt;/li&gt;
&lt;li&gt;CMS&lt;/li&gt;
&lt;li&gt;UI frameworks&lt;/li&gt;
&lt;li&gt;Web frameworks&lt;/li&gt;
&lt;li&gt;Widgets&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;These characteristics represent the ways that web pages are built and experienced. Pages may be built using various technologies like content management systems (CMSs), JavaScript libraries and frameworks, etc. According to the Web Almanac, &lt;a href="https://almanac.httparchive.org/en/2019/cms#fig-2" rel="noopener noreferrer"&gt;40%&lt;/a&gt; of websites are built with a CMS. That makes it a useful category to inspect qualitatively to see if there are meaningful correlations between CMS and CWV. On the other hand, we use quantitative metrics that represent how users experience the page, including performance and page weight data.&lt;/p&gt;

&lt;p&gt;The list of technologies we include in the analysis is only a subset of all technologies employed by sites in HTTP Archive. We have restricted the analysis to websites that employ only technologies used by at least 50,000 websites (this threshold amounts to about 1% of sites in HTTP Archive). This removes underused technologies for which we may not have sufficient data. The presence of certain technologies also overlaps highly, with the CMS Wix and the JavaScript library Zepto overlapping almost completely. Such high overlap creates modeling issues, so we have chosen to remove Wix from this analysis.&lt;/p&gt;
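&lt;p&gt;A hypothetical sketch of this restriction step (the technology names, counts, and sites are illustrative, not the real HTTP Archive tallies): first find the technologies that clear the usage threshold, then keep only the sites that use nothing outside that set.&lt;/p&gt;

```python
# Illustrative pruning step: technologies below `min_sites` usage are
# dropped, along with any site that depends on a dropped technology.

def allowed_tech(usage_counts, min_sites=50_000):
    """Return the set of technologies used by at least min_sites sites."""
    return {t for t, n in usage_counts.items() if n >= min_sites}

def keep_sites(sites, allowed):
    """sites maps site name to the set of technologies it uses."""
    return [s for s, techs in sites.items() if techs.issubset(allowed)]

counts = {"jQuery": 6_200_000, "AngularJS": 320_000, "SomeRareLib": 12_000}
sites = {"a.com": {"jQuery"}, "b.com": {"jQuery", "SomeRareLib"}}
print(keep_sites(sites, allowed_tech(counts)))  # ['a.com']
```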

&lt;h1&gt;
  
  
  Analysis
&lt;/h1&gt;

&lt;p&gt;With LCP and CLS as the outcomes and the web characteristic as the predictors, we attempt to model the relationship between the outcomes and the predictors through random forest. Random forest is a learning algorithm for both regression and classification based on a set of decision trees trained from a randomly chosen set of predictors and a bootstrap sample of the dataset.&lt;/p&gt;

&lt;p&gt;To assess the correlation between the outcome and each predictor as well as their individual effects on the outcome, we derived a measure of correlation (% of higher &amp;gt;= split mean, %HSM) and a measure of effect size (mean split difference, MSD). Both measures are based on the types of splits the trained decision trees make based on the predictors. See appendix for more details.&lt;/p&gt;

&lt;p&gt;%HSM is bounded between 0 and 1, with values close to 0 indicating negative correlation, values close to 1 indicating positive correlation, and values close to 0.5 indicating little correlation. MSD’s magnitude is not bounded, and a large positive value indicates that the predictor appears to contribute positively to the mean of the outcome. Note that positive here does not necessarily mean good; it is positive merely in the numerical sense.&lt;/p&gt;
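&lt;p&gt;As a rough illustration (not the authors’ exact derivation, which is in the appendix), both measures can be computed from per-split summaries, where each decision-tree split on a predictor records the mean outcome at the node and in its “low” and “high” children:&lt;/p&gt;

```python
# Toy sketch, assuming each split on a predictor is summarized as
# (node_mean, low_child_mean, high_child_mean). %HSM is the fraction of
# splits where the "higher" side's mean is at least the split's mean;
# MSD is the average difference between the high and low sides.

def hsm_and_msd(splits):
    hsm = sum(hi >= node for node, _, hi in splits) / len(splits)
    msd = sum(hi - lo for _, lo, hi in splits) / len(splits)
    return hsm, msd

# A predictor whose higher values consistently raise the outcome:
splits = [(5.0, 4.0, 6.0), (5.5, 5.0, 6.5), (4.8, 4.6, 5.6)]
hsm, msd = hsm_and_msd(splits)
print(hsm, round(msd, 2))  # 1.0 1.5
```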

&lt;h1&gt;
  
  
  Results
&lt;/h1&gt;

&lt;p&gt;Here, we present results on association and make note of specific characteristics that appear especially impactful on CWV.&lt;/p&gt;

&lt;p&gt;When interpreting these results on association, an important thing to note is that the positive or negative impact of a particular web characteristic should only be interpreted relative to that of other web characteristics, and in the context of websites that employ an array of web technologies, various types of content, and different third-party requests. For instance, if a given web technology shows a strong positive impact, it should be interpreted as this technology appearing to be good for performance relative to other technologies, not as a claim that adding this technology to a website will improve its performance.&lt;/p&gt;

&lt;h3&gt;
  
  
  LCP
&lt;/h3&gt;

&lt;p&gt;LCP is modeled as the log of its numerical value, so higher values are worse.&lt;/p&gt;

&lt;p&gt;A %HSM value close to 1 means that higher values of a numerical/count characteristic, or the presence of a technology, are strongly associated with higher values of LCP, and vice versa for predictors with %HSM close to 0 (high %HSM is worse).&lt;/p&gt;

&lt;p&gt;Likewise, a relatively large and positive MSD means that higher values of a numerical/count characteristic, or the presence of a technology, show a strong negative impact on LCP, and vice versa for predictors with a relatively large and negative MSD (a large positive MSD is worse).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F6vjbbgujk3165nz4uk7z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F6vjbbgujk3165nz4uk7z.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
Higher values of TTFB, bytes of JavaScript, CSS and images show the strongest positive correlation with LCP and most negative impact, though TTFB is not always actionable.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F0220suv2c7f6sul2pihk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F0220suv2c7f6sul2pihk.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
In general, third-party requests do not show strong correlation with or impact on LCP in the context of the other predictors we consider. This result could be due to most websites in HTTP Archive having a fair number of third-party requests, so their effect could not be well ascertained.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fewt5mcmbf5q1abtv394m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fewt5mcmbf5q1abtv394m.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
The presence of most JavaScript frameworks shows a strong positive correlation with LCP and a negative impact, with AMP as the exception. AngularJS, GSAP, MooTools, and RequireJS stand out the most.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F9wmk46jkl5vkas63y6u8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F9wmk46jkl5vkas63y6u8.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
As with JavaScript frameworks, the presence of most JavaScript libraries also shows a strong positive correlation with LCP and a negative impact. Hammerjs, Lodash, momentjs, YUI, and Zepto stand out in terms of both correlation and effect size, while jQueryUI and prettyPhoto stand out in terms of correlation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fuq4dtfi7hdm56e03k7yd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fuq4dtfi7hdm56e03k7yd.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
Among CMSs, Joomla and Squarespace show a strong positive correlation with LCP and a negative impact. WordPress, on the other hand, shows low correlation and impact.&lt;/p&gt;

&lt;p&gt;Animatecss stands out among UI frameworks, and MicrosoftASPNet stands out among web frameworks.&lt;/p&gt;

&lt;p&gt;Among widgets, FlexSlider and OWLCarousel both show a strong positive correlation with LCP, and FlexSlider also shows a strong negative effect size.&lt;/p&gt;

&lt;h3&gt;
  
  
  CLS
&lt;/h3&gt;

&lt;p&gt;CLS is modeled as a binary indicator of whether a given threshold is met: 1 indicates a website has CLS &amp;lt; 0.1, and 0 otherwise, so 1s are better than 0s.&lt;/p&gt;
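&lt;p&gt;As a concrete illustration (a minimal sketch, not the actual modeling code), the binarization described above amounts to:&lt;/p&gt;

```python
# Hypothetical sketch of the CLS binarization described above:
# 1 if the site meets the "good" CLS threshold (< 0.1), 0 otherwise.
def cls_meets_threshold(cls_value, threshold=0.1):
    return 1 if cls_value < threshold else 0

print(cls_meets_threshold(0.05))  # 1: threshold met, which is better
print(cls_meets_threshold(0.25))  # 0: threshold not met
```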

&lt;p&gt;A %HSM value close to 1 means that higher values of a numerical/count characteristic, or the presence of a technology, are strongly associated with meeting the CLS threshold; the reverse holds for a %HSM close to 0 (a low %HSM is worse).&lt;/p&gt;

&lt;p&gt;Likewise, a relatively large and positive MSD means that higher values of a numerical/count characteristic, or the presence of a technology, have a strong positive impact on meeting the CLS threshold; the reverse holds for a relatively large and negative MSD (a large negative MSD is worse).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fe7o2o85n3uzkpfxldcv8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fe7o2o85n3uzkpfxldcv8.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
Most of these characteristics show only weak correlation with CLS compliance and low impact, except bytes of images, which shows a negative correlation with CLS compliance and a negative impact.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fb6iwbako258wqkw7og2b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fb6iwbako258wqkw7og2b.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
As with LCP, third-party requests seem to have low correlation with and low impact on CLS compliance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F1a2l3qn8qzt679ds9jti.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F1a2l3qn8qzt679ds9jti.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
The presence of several JavaScript frameworks shows a strong negative correlation with CLS compliance and a negative impact, while AMP, GSAP, and React show low correlation and impact. AngularJS, Handlebars, and Vuejs appear to have the most negative impact.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F6kpohh8mjr2vd9pktuz2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F6kpohh8mjr2vd9pktuz2.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
JavaScript libraries appear less harmful to CLS compliance than frameworks, though most still show a negative impact. FancyBox, Hammerjs, Modernizr, and Slick are the most notable.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fdb7onj4944i5pjnieti5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fdb7onj4944i5pjnieti5.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
None of the CMSs have a notable negative impact, with WordPress showing a fairly positive correlation.&lt;/p&gt;

&lt;p&gt;UI frameworks all show low impact. Among web frameworks, RubyonRails shows a fairly positive correlation with CLS compliance.&lt;/p&gt;

&lt;p&gt;Among widgets, FlexSlider and OWLCarousel both show a fairly negative impact on CLS compliance.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;This analysis is a first step toward a more comprehensive understanding of the impact of web characteristics on CWV. While the results point out strongly associated characteristics, the web community would benefit from delving further into the associations identified to ascertain which are truly causal and which are merely correlational, so as to better inform web developers. In the meantime, the web characteristics with strong negative correlations or effects should be seen as a signal of things that require more attention and/or planning. Finally, it would be of interest to refresh these analyses in the future to see whether the associations identified here still hold.&lt;/p&gt;

&lt;h1&gt;
  
  
  Appendix
&lt;/h1&gt;

&lt;p&gt;Random forests train decision trees by making binary splits of the data. Each split is based on a particular predictor X and takes the form X &amp;lt;= c vs. X &amp;gt; c, with the cutoff c chosen according to some purity criterion. All data points with X &amp;lt;= c go into one branch, and all data points with X &amp;gt; c into the other. The data points in each branch can then be split further on other predictors in the same way. The measures of correlation and effect size we use exploit these splits.&lt;/p&gt;

&lt;p&gt;Specifically, for a given predictor, we look at all splits based on that predictor. For each such split, we compute the mean outcome of the data points in the &amp;lt;= branch and in the &amp;gt; branch. %HSM (% of higher split mean) is the proportion of splits in which the outcome mean of the &amp;gt; branch is higher than that of the &amp;lt;= branch; it measures how frequently larger outcome means are associated with higher predictor values. MSD (mean split difference) is the outcome mean of the &amp;lt;= branch subtracted from that of the &amp;gt; branch, averaged across all relevant splits of the predictor; it measures the difference in mean outcome between data points with higher values of the predictor and those with lower values.&lt;/p&gt;
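&lt;p&gt;The two measures can be sketched in a few lines of Python (a hypothetical illustration, not the actual analysis code; each split is summarized here simply as a pair of branch outcome means):&lt;/p&gt;

```python
# Each split of a given predictor X at cutoff c yields two branches; we
# summarize a split as (mean outcome in the <= branch, mean outcome in the > branch).

def pct_higher_split_mean(splits):
    """%HSM: fraction of splits where the > branch outcome mean exceeds the <= branch's."""
    return sum(1 for le_mean, gt_mean in splits if gt_mean > le_mean) / len(splits)

def mean_split_difference(splits):
    """MSD: (> branch mean) minus (<= branch mean), averaged over all splits."""
    return sum(gt_mean - le_mean for le_mean, gt_mean in splits) / len(splits)

# Toy splits for one predictor (values chosen to be exact binary fractions).
splits = [(0.25, 0.75), (0.5, 0.625), (0.75, 0.5), (0.25, 0.75)]

print(pct_higher_split_mean(splits))  # 0.75: higher X usually goes with a higher outcome mean
print(mean_split_difference(splits))  # 0.21875: average outcome uplift in the > branch
```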

</description>
      <category>performance</category>
      <category>webperf</category>
      <category>webdev</category>
      <category>httparchive</category>
    </item>
    <item>
      <title>Introducing the second annual Web Almanac!</title>
      <dc:creator>Rick Viscomi</dc:creator>
      <pubDate>Tue, 22 Dec 2020 15:23:49 +0000</pubDate>
      <link>https://dev.to/httparchive/introducing-the-second-annual-web-almanac-3ilh</link>
      <guid>https://dev.to/httparchive/introducing-the-second-annual-web-almanac-3ilh</guid>
      <description>&lt;p&gt;The &lt;a href="https://almanac.httparchive.org/en/2020/"&gt;2020 Web Almanac&lt;/a&gt; is a free, open-source, community-made ebook whose mission is to annually track the state of the web. This report is for everyone who's ever wondered how big a typical web page is nowadays, or what the most common CSS breakpoints are, or the most popular CMS. Those questions and &lt;em&gt;many&lt;/em&gt; more are answered in this comprehensive, data-driven research project sourced from over 7 million websites.&lt;/p&gt;

&lt;p&gt;Experts in over 20 web disciplines researched and wrote chapters covering the components web pages are made of, how users experience them, how developers publish them, and how they're delivered. &lt;a href="https://almanac.httparchive.org/en/2020/"&gt;The 2020 report&lt;/a&gt; is out and I'm excited to share it with you all! &lt;em&gt;(Read the &lt;a href="https://dev.to/httparchive/the-web-almanac-2019-is-live-4f98"&gt;2019 announcement on dev.to&lt;/a&gt;.)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This year &lt;strong&gt;&lt;a href="https://almanac.httparchive.org/en/2020/contributors"&gt;over 100 contributors&lt;/a&gt;&lt;/strong&gt; have collaborated on the report: analyzing data, authoring chapters, peer-reviewing, editing, translating, and maintaining the website. This project was only made possible through the hard work by all of the contributors collectively.  The result is a &lt;strong&gt;free&lt;/strong&gt; 500+ page ebook that you can read on our &lt;a href="https://almanac.httparchive.org/en/2020/table-of-contents"&gt;website&lt;/a&gt;, as a &lt;a href="https://almanac.httparchive.org/static/pdfs/web_almanac_2020_en.pdf"&gt;PDF (22 MB)&lt;/a&gt;, or on &lt;a href="https://play.google.com/store/books/details?id=wqcPEAAAQBAJ"&gt;Google Play&lt;/a&gt; or &lt;a href="https://www.google.com/books/edition/The_2020_Web_Almanac/wqcPEAAAQBAJ"&gt;Google Books&lt;/a&gt;. Translations are in progress in &lt;a href="https://github.com/HTTPArchive/almanac.httparchive.org/issues?q=is%3Aopen+is%3Aissue+label%3Atranslation"&gt;nine languages&lt;/a&gt;, and the 2019 edition is completely translated into Japanese with its own &lt;a href="https://www.google.com/books/edition/The_2019_Web_Almanac/tPACEAAAQBAJ"&gt;ebook&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;This is a community effort and we're a friendly, inclusive group, so if you'd like to be a part of the 2019-2020 &lt;a href="https://github.com/HTTPArchive/almanac.httparchive.org/issues/923"&gt;translation effort&lt;/a&gt; or help write the 2021 edition, we'd love for you to join us! Please fill out this &lt;a href="https://forms.gle/VRBFegGAP7d99Bhp7"&gt;interest form&lt;/a&gt; to let us know how you'd like to contribute, and we'll reach out to you when it's time to start planning in mid-2021.&lt;/p&gt;

&lt;p&gt;Give it a read and let us know what you find most interesting or surprising! Here's what's inside:&lt;/p&gt;

&lt;h3&gt;
  
  
  Part I. Page Content
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2020/css"&gt;Chapter 1: CSS&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2020/javascript"&gt;Chapter 2: JavaScript&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2020/markup"&gt;Chapter 3: Markup&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2020/fonts"&gt;Chapter 4: Fonts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2020/third-parties"&gt;Chapter 6: Third Parties&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Part II. User Experience
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2020/seo"&gt;Chapter 7: SEO&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2020/accessibility"&gt;Chapter 8: Accessibility&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2020/performance"&gt;Chapter 9: Performance&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2020/privacy"&gt;Chapter 10: Privacy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2020/security"&gt;Chapter 11: Security&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2020/mobile-web"&gt;Chapter 12: Mobile Web&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2020/capabilities"&gt;Chapter 13: Capabilities&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2020/pwa"&gt;Chapter 14: PWA&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Part III. Content Publishing
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2020/cms"&gt;Chapter 15: CMS&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2020/jamstack"&gt;Chapter 17: Jamstack&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Part IV. Content Distribution
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2020/page-weight"&gt;Chapter 18: Page Weight&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2020/compression"&gt;Chapter 19: Compression&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2020/caching"&gt;Chapter 20: Caching&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2020/resource-hints"&gt;Chapter 21: Resource Hints&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2020/http2"&gt;Chapter 22: HTTP/2&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Appendices
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2020/methodology"&gt;Methodology&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2020/contributors"&gt;Contributors&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;(Chapters 5 and 16, Media and Ecommerce, are coming soon!)&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webalmanac</category>
      <category>webdev</category>
      <category>community</category>
    </item>
    <item>
      <title>Growth of the Web in 2020</title>
      <dc:creator>Paul Calvano</dc:creator>
      <pubDate>Tue, 29 Sep 2020 19:28:48 +0000</pubDate>
      <link>https://dev.to/httparchive/growth-of-the-web-in-2020-20di</link>
      <guid>https://dev.to/httparchive/growth-of-the-web-in-2020-20di</guid>
      <description>&lt;p&gt;For the past 10 years, the &lt;a href="https://httparchive.org/" rel="noopener noreferrer"&gt;HTTP Archive&lt;/a&gt; has tracked the evolution of the web by archiving the technical details of desktop and mobile homepages. During its early years, the &lt;a href="https://www.alexa.com/topsites" rel="noopener noreferrer"&gt;Alexa top million&lt;/a&gt; dataset (which was publicly available until 2017) was used to source the list of URLs included in the archive and the number of sites tracked increased from 16K to almost 500K as testing capacity increased. To keep the archive current and include new sites, towards the end of 2018 we started using the &lt;a href="https://developers.google.com/web/tools/chrome-user-experience-report" rel="noopener noreferrer"&gt;Chrome User Experience Report&lt;/a&gt; as a source of the URLs to track. &lt;/p&gt;

&lt;p&gt;Throughout 2019 the size of the HTTP Archive dataset was mostly constant. However, the sample size has grown quite a bit in 2020 as you can see in the graph below! Additionally, if we combine both desktop and mobile URLs, there was a recent peak of 7.5 million sites!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Fgrowth-of-the-web-in-2020%2Fimage1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Fgrowth-of-the-web-in-2020%2Fimage1.jpg" alt="HTTP Archive Sample Size Trend"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How Are Sites Included in the Chrome User Experience Report?&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;The Chrome User Experience Report (CrUX) is sourced from performance data collected from real Chrome users who have opted into syncing their browser history and sharing anonymized usage statistics. It’s essentially real user measurement (RUM) data for Chrome users.&lt;/p&gt;

&lt;p&gt;You can read more about CrUX on &lt;a href="https://developers.google.com/web/tools/chrome-user-experience-report" rel="noopener noreferrer"&gt;Google’s Developer website&lt;/a&gt;, as well as this informative &lt;a href="https://web.dev/chrome-ux-report/" rel="noopener noreferrer"&gt;blog post from Rick Viscomi&lt;/a&gt;. I’ve also written about it previously &lt;a href="https://paulcalvano.com/2018-04-26-using-googles-crux-to-compare-your-sites-rum-data-w-competitors/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;While Google doesn’t publish a definitive list of what it takes to be included in the Chrome User Experience Report dataset, they have &lt;a href="https://groups.google.com/a/chromium.org/g/chrome-ux-report/c/kVKP6LZpk7U/m/rYbcEXiCCQAJ" rel="noopener noreferrer"&gt;indicated&lt;/a&gt; that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Origins are automatically curated based on real-user Chrome usage&lt;/li&gt;
&lt;li&gt;Websites must meet a traffic threshold to be included&lt;/li&gt;
&lt;li&gt;Websites must be publicly accessible&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Essentially, a website’s inclusion in the Chrome User Experience Report indicates that it has reached a certain threshold of activity. According to the CrUX &lt;a href="https://developers.google.com/web/tools/chrome-user-experience-report/bigquery/changelog" rel="noopener noreferrer"&gt;changelog&lt;/a&gt;, there have been no changes in the methodology, so it can be inferred that analyzing the number of websites included in this dataset should provide some interesting insights into the month-on-month growth in the number of websites being visited by real users.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: The Chrome User Experience Report does not contain traffic details, and as such this analysis should not be interpreted as growth of traffic on the internet.  This analysis is specifically about the growth in the number of websites that people are visiting.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Global Growth of Origins Accessed in 2020&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The graph below illustrates the total number of websites included in the Chrome User Experience Report across all form factors during the previous 12 months. There are a few interesting observations we can make:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In both 2019 and 2020 there were increases in the number of websites at the start of the year.&lt;/li&gt;
&lt;li&gt;There was a linear increase in the number of websites through the first half of 2020. &lt;/li&gt;
&lt;li&gt;The drop in March and April 2020 is interesting, since that coincides with the start of the global COVID-19 pandemic.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Fgrowth-of-the-web-in-2020%2Fimage2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Fgrowth-of-the-web-in-2020%2Fimage2.jpg" alt="CrUX Origins Per Month"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There is a similar pattern with the total number of registered domains. This indicates that much of the growth comes from new domains and not merely subdomains of existing domains. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Fgrowth-of-the-web-in-2020%2Fimage3.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Fgrowth-of-the-web-in-2020%2Fimage3.jpg" alt="CrUX Registered Domains Per Month"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When we look at the month-over-month rate of change, we can see that the maximum change in 2019 was +/- 5%. The number of sites tends to fluctuate month to month. For example, between August and December 2019 there was an 8.6% decrease in sites, yet at the start of 2020 there was a 7.5% increase. &lt;/p&gt;

&lt;p&gt;Comparing the number of origins between December 2019 and August 2020, the total increased by 28.9% this year alone! That’s huge!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Fgrowth-of-the-web-in-2020%2Fimage4.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Fgrowth-of-the-web-in-2020%2Fimage4.jpg" alt="CrUX Month/Month Change in Origins and Registered Domains"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mobile vs Desktop Growth&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Looking at this by device type, we can see that there are consistently more mobile websites compared to desktop websites.  And over the past year the fluctuations between them have been fairly consistent.  The one exception is between May 2020 and June 2020, where desktop increased by 0.7% and mobile increased by 6.2%.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Fgrowth-of-the-web-in-2020%2Fimage5.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Fgrowth-of-the-web-in-2020%2Fimage5.jpg" alt="CrUX Origins Per Month by Form Factor"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Overall, there are 22.9% more mobile websites in the CrUX dataset compared to desktop.  We know from sources like &lt;a href="https://gs.statcounter.com/platform-market-share/desktop-mobile-tablet/worldwide/#monthly-201908-202008" rel="noopener noreferrer"&gt;statcounter&lt;/a&gt; that mobile usage has grown significantly over the years, and consistently surpasses desktop.  But why are mobile users navigating to so many more websites compared to desktop users? &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Fgrowth-of-the-web-in-2020%2Fimage6.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Fgrowth-of-the-web-in-2020%2Fimage6.jpg" alt="Desktop vs Mobile vs Tablet Market Share"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Is there something about the mobile experience (such as social media links, email marketing, etc.) that increases the chance a user will navigate to an unfamiliar website?&lt;/p&gt;

&lt;p&gt;Or could it be growth in regions where mobile is more dominant? &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How Has this Varied by Region?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At the start of 2020, most regions of the world saw an increase in the number of sites. The exception was Western Asia. The regions with the most substantial increase at the start of the year were Northern Europe, North America, and South America.&lt;/p&gt;

&lt;p&gt;Between May and June there was another large uptick in the number of sites, driven mostly by South-East Asian and Western European countries.&lt;/p&gt;

&lt;p&gt;The tables below detail the number of sites included in the CrUX dataset during December 2019 as well as January, May, June, and August 2020. This first table contains the top 10 regions, most of which saw an increase of 15% to 25% during the previous six months!&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th colspan="5"&gt;Number of Sites Included in CrUX dataset&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Sub Region&lt;/th&gt;
&lt;th&gt;Dec 2019&lt;/th&gt;
&lt;th&gt;Jan 2020&lt;/th&gt;
&lt;th&gt;May 2020&lt;/th&gt;
&lt;th&gt;June 2020&lt;/th&gt;
&lt;th&gt;August 2020&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
&lt;td&gt;Northern America&lt;/td&gt;
&lt;td&gt;1,257,159&lt;/td&gt;
&lt;td&gt;1,406,284&lt;/td&gt;
&lt;td&gt;1,676,120&lt;/td&gt;
&lt;td&gt;1,681,454&lt;/td&gt;
&lt;td&gt;1,730,260&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;Western Europe&lt;/td&gt;
&lt;td&gt;713,202&lt;/td&gt;
&lt;td&gt;768,644&lt;/td&gt;
&lt;td&gt;874,164&lt;/td&gt;
&lt;td&gt;908,560&lt;/td&gt;
&lt;td&gt;943,891&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;Eastern Europe&lt;/td&gt;
&lt;td&gt;640,145&lt;/td&gt;
&lt;td&gt;694,632&lt;/td&gt;
&lt;td&gt;821,024&lt;/td&gt;
&lt;td&gt;913,037&lt;/td&gt;
&lt;td&gt;926,202&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;Eastern Asia&lt;/td&gt;
&lt;td&gt;720,926&lt;/td&gt;
&lt;td&gt;740,767&lt;/td&gt;
&lt;td&gt;871,854&lt;/td&gt;
&lt;td&gt;882,322&lt;/td&gt;
&lt;td&gt;901,008&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;South America&lt;/td&gt;
&lt;td&gt;486,894&lt;/td&gt;
&lt;td&gt;540,685&lt;/td&gt;
&lt;td&gt;668,604&lt;/td&gt;
&lt;td&gt;726,410&lt;/td&gt;
&lt;td&gt;784,854&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;Southern Europe&lt;/td&gt;
&lt;td&gt;506,054&lt;/td&gt;
&lt;td&gt;541,526&lt;/td&gt;
&lt;td&gt;661,416&lt;/td&gt;
&lt;td&gt;710,543&lt;/td&gt;
&lt;td&gt;724,323&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;Northern Europe&lt;/td&gt;
&lt;td&gt;453,591&lt;/td&gt;
&lt;td&gt;516,527&lt;/td&gt;
&lt;td&gt;601,459&lt;/td&gt;
&lt;td&gt;638,744&lt;/td&gt;
&lt;td&gt;661,790&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;South-Eastern Asia&lt;/td&gt;
&lt;td&gt;473,962&lt;/td&gt;
&lt;td&gt;485,143&lt;/td&gt;
&lt;td&gt;524,249&lt;/td&gt;
&lt;td&gt;584,214&lt;/td&gt;
&lt;td&gt;629,815&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;Southern Asia&lt;/td&gt;
&lt;td&gt;403,325&lt;/td&gt;
&lt;td&gt;419,118&lt;/td&gt;
&lt;td&gt;441,328&lt;/td&gt;
&lt;td&gt;462,282&lt;/td&gt;
&lt;td&gt;500,600&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;Western Asia&lt;/td&gt;
&lt;td&gt;274,339&lt;/td&gt;
&lt;td&gt;273,610&lt;/td&gt;
&lt;td&gt;327,425&lt;/td&gt;
&lt;td&gt;351,186&lt;/td&gt;
&lt;td&gt;362,340&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th colspan="5"&gt;Percent Change of Sites Included in CrUX dataset&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Sub Region&lt;/th&gt;
&lt;th&gt;Dec 2019 - Jan 2020&lt;/th&gt;
&lt;th&gt;May - Jun 2020&lt;/th&gt;
&lt;th&gt;Jun - Aug 2020&lt;/th&gt;
&lt;th&gt;Dec - Aug 2020&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
 &lt;tr&gt;
&lt;td&gt;Northern America&lt;/td&gt;
&lt;td&gt;10.60%&lt;/td&gt;
&lt;td&gt;0.32%&lt;/td&gt;
&lt;td&gt;2.82%&lt;/td&gt;
&lt;td&gt;27.34%&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;Western Europe&lt;/td&gt;
&lt;td&gt;7.21%&lt;/td&gt;
&lt;td&gt;3.79%&lt;/td&gt;
&lt;td&gt;3.74%&lt;/td&gt;
&lt;td&gt;24.44%&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;Eastern Europe&lt;/td&gt;
&lt;td&gt;7.84%&lt;/td&gt;
&lt;td&gt;10.08%&lt;/td&gt;
&lt;td&gt;1.42%&lt;/td&gt;
&lt;td&gt;30.88%&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;Eastern Asia&lt;/td&gt;
&lt;td&gt;2.68%&lt;/td&gt;
&lt;td&gt;1.19%&lt;/td&gt;
&lt;td&gt;2.07%&lt;/td&gt;
&lt;td&gt;19.99%&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;South America&lt;/td&gt;
&lt;td&gt;9.95%&lt;/td&gt;
&lt;td&gt;7.96%&lt;/td&gt;
&lt;td&gt;7.45%&lt;/td&gt;
&lt;td&gt;37.96%&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;Southern Europe&lt;/td&gt;
&lt;td&gt;6.55%&lt;/td&gt;
&lt;td&gt;6.91%&lt;/td&gt;
&lt;td&gt;1.90%&lt;/td&gt;
&lt;td&gt;30.13%&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;Northern Europe&lt;/td&gt;
&lt;td&gt;12.18%&lt;/td&gt;
&lt;td&gt;5.84%&lt;/td&gt;
&lt;td&gt;3.48%&lt;/td&gt;
&lt;td&gt;31.46%&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;South-Eastern Asia&lt;/td&gt;
&lt;td&gt;2.30%&lt;/td&gt;
&lt;td&gt;10.26%&lt;/td&gt;
&lt;td&gt;7.24%&lt;/td&gt;
&lt;td&gt;24.75%&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;Southern Asia&lt;/td&gt;
&lt;td&gt;3.77%&lt;/td&gt;
&lt;td&gt;4.53%&lt;/td&gt;
&lt;td&gt;7.65%&lt;/td&gt;
&lt;td&gt;19.43%&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;Western Asia&lt;/td&gt;
&lt;td&gt;-0.27%&lt;/td&gt;
&lt;td&gt;6.77%&lt;/td&gt;
&lt;td&gt;3.08%&lt;/td&gt;
&lt;td&gt;24.29%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
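&lt;p&gt;A note on reading the percent-change columns above: the figures appear to use the later month as the denominator, i.e. (new - old) / new, rather than the more common (new - old) / old. A small sketch (reusing the Northern America counts from the first table) reproduces them under that assumption:&lt;/p&gt;

```python
def pct_change(old, new):
    """Percent change with the later month as the denominator,
    which is the convention the tables appear to follow."""
    return round(100 * (new - old) / new, 2)

# Northern America origin counts from the first regional table.
dec_2019, jan_2020, aug_2020 = 1_257_159, 1_406_284, 1_730_260

print(pct_change(dec_2019, jan_2020))  # 10.6  -> "Dec 2019 - Jan 2020" column
print(pct_change(dec_2019, aug_2020))  # 27.34 -> "Dec - Aug 2020" column
```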

&lt;p&gt;Looking at the next 10 regions in the list, we can see significant growth in Central America and Australia, as well as Western and Southern Africa. Overall, the regions with the most growth during the seven-month period were Australia and New Zealand, South America, and Central America.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th colspan="5"&gt;Number of Sites Included in CrUX dataset&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Sub Region&lt;/th&gt;
&lt;th&gt;Dec 2019&lt;/th&gt;
&lt;th&gt;Jan 2020&lt;/th&gt;
&lt;th&gt;May 2020&lt;/th&gt;
&lt;th&gt;June 2020&lt;/th&gt;
&lt;th&gt;August 2020&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
&lt;td&gt;Central America&lt;/td&gt;
&lt;td&gt;155,057&lt;/td&gt;
&lt;td&gt;179,295&lt;/td&gt;
&lt;td&gt;236,255&lt;/td&gt;
&lt;td&gt;242,132&lt;/td&gt;
&lt;td&gt;257,043&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;Australia and New Zealand&lt;/td&gt;
&lt;td&gt;124,763&lt;/td&gt;
&lt;td&gt;141,523&lt;/td&gt;
&lt;td&gt;194,212&lt;/td&gt;
&lt;td&gt;196,841&lt;/td&gt;
&lt;td&gt;214,757&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;Northern Africa&lt;/td&gt;
&lt;td&gt;68,754&lt;/td&gt;
&lt;td&gt;69,497&lt;/td&gt;
&lt;td&gt;83,312&lt;/td&gt;
&lt;td&gt;88,672&lt;/td&gt;
&lt;td&gt;88,606&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;Southern Africa&lt;/td&gt;
&lt;td&gt;50,618&lt;/td&gt;
&lt;td&gt;59,139&lt;/td&gt;
&lt;td&gt;64,978&lt;/td&gt;
&lt;td&gt;66,392&lt;/td&gt;
&lt;td&gt;70,218&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;Central Asia&lt;/td&gt;
&lt;td&gt;45,932&lt;/td&gt;
&lt;td&gt;49,192&lt;/td&gt;
&lt;td&gt;57,098&lt;/td&gt;
&lt;td&gt;57,508&lt;/td&gt;
&lt;td&gt;62,112&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;Western Africa&lt;/td&gt;
&lt;td&gt;44,692&lt;/td&gt;
&lt;td&gt;49,868&lt;/td&gt;
&lt;td&gt;47,834&lt;/td&gt;
&lt;td&gt;51,257&lt;/td&gt;
&lt;td&gt;50,853&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;Caribbean&lt;/td&gt;
&lt;td&gt;33,840&lt;/td&gt;
&lt;td&gt;37,445&lt;/td&gt;
&lt;td&gt;44,090&lt;/td&gt;
&lt;td&gt;45,910&lt;/td&gt;
&lt;td&gt;45,395&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;Eastern Africa&lt;/td&gt;
&lt;td&gt;31,010&lt;/td&gt;
&lt;td&gt;34,822&lt;/td&gt;
&lt;td&gt;36,073&lt;/td&gt;
&lt;td&gt;37,388&lt;/td&gt;
&lt;td&gt;38,609&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;Middle Africa&lt;/td&gt;
&lt;td&gt;8,873&lt;/td&gt;
&lt;td&gt;9,149&lt;/td&gt;
&lt;td&gt;9,121&lt;/td&gt;
&lt;td&gt;10,057&lt;/td&gt;
&lt;td&gt;10,032&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;Melanesia&lt;/td&gt;
&lt;td&gt;2,733&lt;/td&gt;
&lt;td&gt;2,991&lt;/td&gt;
&lt;td&gt;2,580&lt;/td&gt;
&lt;td&gt;2,779&lt;/td&gt;
&lt;td&gt;2,818&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th colspan="5"&gt;Percent Change of Sites Included in CrUX dataset&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;thead&gt;&lt;tr&gt;
&lt;th&gt;Sub Region&lt;/th&gt;
&lt;th&gt;Dec 2019 - Jan 2020&lt;/th&gt;
&lt;th&gt;May - Jun 2020&lt;/th&gt;
&lt;th&gt;Jun - Aug 2020&lt;/th&gt;
&lt;th&gt;Dec - Aug 2020&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
 &lt;tr&gt;
&lt;td&gt;Central America&lt;/td&gt;
&lt;td&gt;13.52%&lt;/td&gt;
&lt;td&gt;2.43%&lt;/td&gt;
&lt;td&gt;5.80%&lt;/td&gt;
&lt;td&gt;39.68%&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;Australia and New Zealand&lt;/td&gt;
&lt;td&gt;11.84%&lt;/td&gt;
&lt;td&gt;1.34%&lt;/td&gt;
&lt;td&gt;8.34%&lt;/td&gt;
&lt;td&gt;41.91%&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;Northern Africa&lt;/td&gt;
&lt;td&gt;1.07%&lt;/td&gt;
&lt;td&gt;6.04%&lt;/td&gt;
&lt;td&gt;-0.07%&lt;/td&gt;
&lt;td&gt;22.40%&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;Southern Africa&lt;/td&gt;
&lt;td&gt;14.41%&lt;/td&gt;
&lt;td&gt;2.13%&lt;/td&gt;
&lt;td&gt;5.45%&lt;/td&gt;
&lt;td&gt;27.91%&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;Central Asia&lt;/td&gt;
&lt;td&gt;6.63%&lt;/td&gt;
&lt;td&gt;0.71%&lt;/td&gt;
&lt;td&gt;7.41%&lt;/td&gt;
&lt;td&gt;26.05%&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;Western Africa&lt;/td&gt;
&lt;td&gt;10.38%&lt;/td&gt;
&lt;td&gt;6.68%&lt;/td&gt;
&lt;td&gt;-0.79%&lt;/td&gt;
&lt;td&gt;12.12%&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;Caribbean&lt;/td&gt;
&lt;td&gt;9.63%&lt;/td&gt;
&lt;td&gt;3.96%&lt;/td&gt;
&lt;td&gt;-1.13%&lt;/td&gt;
&lt;td&gt;25.45%&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;Eastern Africa&lt;/td&gt;
&lt;td&gt;10.95%&lt;/td&gt;
&lt;td&gt;3.52%&lt;/td&gt;
&lt;td&gt;3.16%&lt;/td&gt;
&lt;td&gt;19.68%&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;Middle Africa&lt;/td&gt;
&lt;td&gt;3.02%&lt;/td&gt;
&lt;td&gt;9.31%&lt;/td&gt;
&lt;td&gt;-0.25%&lt;/td&gt;
&lt;td&gt;11.55%&lt;/td&gt;
&lt;/tr&gt;
 &lt;tr&gt;
&lt;td&gt;Melanesia&lt;/td&gt;
&lt;td&gt;8.63%&lt;/td&gt;
&lt;td&gt;7.16%&lt;/td&gt;
&lt;td&gt;1.38%&lt;/td&gt;
&lt;td&gt;3.02%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Many of the regions that had an increase in sites visited (based on CrUX data) also have a high percentage of mobile visitors relative to the global population (based on StatCounter). So while it’s difficult to say for certain, it’s entirely possible that location is a large factor in the gap between desktop and mobile.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Analyzing by Top Level Domain&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The .com top level domain accounts for 43.7% of all websites tracked in the Chrome User Experience Report. The next largest top level domain is .org, which accounts for 3.7% of all sites. Overall there were 4,111 TLDs in the dataset, and the top 20 of them represented 75% of all websites.&lt;/p&gt;
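&lt;p&gt;&lt;em&gt;As a rough sketch of this kind of grouping (not the actual BigQuery SQL used for the analysis), the TLD share of a hypothetical list of origins can be computed in Python. Note this naive version counts only the last hostname label; a real analysis would consult the Public Suffix List so that multi-label TLDs such as .co.uk are handled correctly:&lt;/em&gt;&lt;/p&gt;

```python
from collections import Counter
from urllib.parse import urlparse

def tld_share(origins):
    """Return each TLD's percentage share of the given origins.

    Naive: uses only the last hostname label; a real analysis would use
    the Public Suffix List to handle multi-label TLDs like .co.uk.
    """
    tlds = Counter(urlparse(o).hostname.rsplit(".", 1)[-1] for o in origins)
    total = sum(tlds.values())
    return {tld: round(100.0 * n / total, 1) for tld, n in tlds.most_common()}

# Hypothetical origins, for illustration only
print(tld_share([
    "https://example.com", "https://shop.example.com",
    "https://example.org", "https://example.net",
]))  # {'com': 50.0, 'org': 25.0, 'net': 25.0}
```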

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Fgrowth-of-the-web-in-2020%2Fimage7.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Fgrowth-of-the-web-in-2020%2Fimage7.jpg" alt="Distribution of Websites by TLD - Chrome User Experience Report"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Most of these top level domains experienced &amp;gt;20% growth in active websites since December 2019, with the exception of .info and .net. The domains with the largest percentage growth were .co.uk, .com.au and .de.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Fgrowth-of-the-web-in-2020%2Fimage8.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Fgrowth-of-the-web-in-2020%2Fimage8.jpg" alt="% Growth in Websites by TLD - December 2019 - August 2020"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If we look at the month to month growth trends for these TLDs, we can make a few interesting observations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;There was a significant drop across all TLDs in March 2020.
&lt;/li&gt;
&lt;li&gt;The largest percentage drop was for .it domains in March 2020, although that rebounded with increases in April, May and June.&lt;/li&gt;
&lt;li&gt;In February 2020, there was a 23.9% increase in .edu domains receiving traffic. &lt;/li&gt;
&lt;li&gt;In May 2020, more than a dozen popular TLDs saw a double-digit increase in the number of sites. &lt;/li&gt;
&lt;li&gt;In August 2020 there was a 10.4% increase in .edu domains.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Fgrowth-of-the-web-in-2020%2Fimage9.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpaulcalvano.com%2Fassets%2Fimg%2Fblog%2Fgrowth-of-the-web-in-2020%2Fimage9.jpg" alt="Month/Month % Growth of Websites in CrUX"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The web is constantly growing and evolving, and clearly its rate of growth can vary quite a bit. During this analysis we explored a public dataset that Google provides to show how the web has grown during 2020, and which regions are growing the most. While this doesn’t speak to the traffic levels experienced in these locations, the number of websites can be used as a proxy for understanding usage of the web. As this analysis shows, 2020 has been a year of substantial global growth for the web.&lt;/p&gt;

&lt;p&gt;If you are interested in seeing some of the SQL queries and raw data used in this analysis, I’ve created a post with all the details in the &lt;a href="https://discuss.httparchive.org/t/growth-of-the-web-in-2020/2029" rel="noopener noreferrer"&gt;HTTP Archive discussion forums&lt;/a&gt;. You can also see all the data used for these graphs in this &lt;a href="https://docs.google.com/spreadsheets/d/1eGfT0dBslpSl8Xl6ey7a_fNT0Cj5tTfJzX6gpwpXDBE/edit?usp=sharing" rel="noopener noreferrer"&gt;Google Sheet&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally posted at &lt;a href="https://paulcalvano.com/2020-09-29-growth-of-the-web-in-2020/" rel="noopener noreferrer"&gt;paulcalvano.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webperf</category>
      <category>web</category>
      <category>httparchive</category>
    </item>
    <item>
      <title>An Analysis of Cookie Sizes on the Web</title>
      <dc:creator>Paul Calvano</dc:creator>
      <pubDate>Mon, 13 Jul 2020 14:28:47 +0000</pubDate>
      <link>https://dev.to/httparchive/an-analysis-of-cookies-sizes-on-the-web-1mea</link>
      <guid>https://dev.to/httparchive/an-analysis-of-cookies-sizes-on-the-web-1mea</guid>
      <description>&lt;p&gt;&lt;a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Cookies"&gt;Cookies&lt;/a&gt; are used on a lot of websites - 83.9% of the 5.7 million home pages tracked in the&lt;a href="https://httparchive.org/"&gt; HTTP Archive&lt;/a&gt; to be specific. They are essentially a name/value pair set by a server and stored in a client’s browser. Sites can store these cookies by using the &lt;code&gt;Set-Cookie&lt;/code&gt; HTTP response header, or via JavaScript (&lt;code&gt;document.cookie&lt;/code&gt;). On subsequent requests, these cookies are sent to the server in a &lt;code&gt;Cookie&lt;/code&gt; HTTP request header. &lt;/p&gt;

&lt;p&gt;In this article we’ll be looking at the size of cookies across the web, and discuss some of the web performance implications of them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First vs Third Party cookies?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When a cookie is set by the domain you see in your address bar, it is considered a first party cookie. These can be used for session management, authentication, personalization, etc. When a cookie is set by a different domain, then it’s considered a third party cookie.&lt;/p&gt;

&lt;p&gt;Based on an analysis of over 109 million cookies, third parties account for 79% of all cookies.&lt;/p&gt;
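&lt;p&gt;&lt;em&gt;A minimal sketch of that first vs third party distinction, assuming a simplified registrable-domain check (the last two hostname labels; a real classifier would use the Public Suffix List):&lt;/em&gt;&lt;/p&gt;

```python
def is_third_party(cookie_domain, page_host):
    """Classify a cookie as third party when its registrable domain differs
    from the page's. Simplified: compares the last two hostname labels."""
    def site(host):
        return ".".join(host.lstrip(".").split(".")[-2:])
    return site(cookie_domain) != site(page_host)

# A cookie scoped to .example.com on www.example.com is first party...
print(is_third_party(".example.com", "www.example.com"))      # False
# ...while a cookie set by an analytics domain is third party.
print(is_third_party("metrics.tracker.net", "www.example.com"))  # True
```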

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bFiy0fAJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/635h6v978qjf1a3egup0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bFiy0fAJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/635h6v978qjf1a3egup0.png" alt="First vs Third Party Cookies"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: I wrote another blog post exploring the use of the SameSite attribute in cookie files, and how third party cookies are affected. You can read it&lt;a href="https://dev.to/httparchive/samesite-cookies-are-you-ready-5abd"&gt; here&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cookie Sizes&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;An example of a cookie set by a server would be:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Set-Cookie: loggedIn=true; Domain=.example.com; Path=/; Max-Age=14400; HttpOnly&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Directives such as &lt;code&gt;Domain&lt;/code&gt;, &lt;code&gt;Path&lt;/code&gt;, &lt;code&gt;Max-Age&lt;/code&gt;, and &lt;code&gt;HttpOnly&lt;/code&gt; affect how the cookie is stored and which hostnames a browser should share it with. In this example, &lt;code&gt;loggedIn=true&lt;/code&gt; is the name/value portion of the cookie, and that is what we’ll be exploring in this post.&lt;/p&gt;
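&lt;p&gt;&lt;em&gt;For instance, Python’s standard library can split that header into its name/value portion and its directives; the sizes discussed below count only the name/value portion:&lt;/em&gt;&lt;/p&gt;

```python
from http.cookies import SimpleCookie

raw = "loggedIn=true; Domain=.example.com; Path=/; Max-Age=14400; HttpOnly"
cookie = SimpleCookie()
cookie.load(raw)

morsel = cookie["loggedIn"]
print(morsel.value)          # the value portion: "true"
print(morsel["domain"])      # ".example.com"
print(morsel["max-age"])     # "14400"

# The size analyzed in this post is just the name=value portion:
print(len(f"{morsel.key}={morsel.value}"))  # 13 bytes
```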

&lt;p&gt;The median length of all cookies in the HTTP Archive is 36 bytes as of June 2020. That statistic is consistent across both first and third party cookies. The minimum is just a single byte, usually set by empty Set-Cookie headers (which is likely an error).&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;tr&gt;
   &lt;td&gt;Cookie Size&lt;/td&gt;
   &lt;td&gt;First Party&lt;/td&gt;
   &lt;td&gt;Third Party&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Min &lt;/td&gt;
   &lt;td&gt;1&lt;/td&gt;
   &lt;td&gt;1&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Median&lt;/td&gt;
   &lt;td&gt;36&lt;/td&gt;
   &lt;td&gt;37&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;95th Percentile&lt;/td&gt;
   &lt;td&gt;181&lt;/td&gt;
   &lt;td&gt;135&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;99th Percentile&lt;/td&gt;
   &lt;td&gt;287&lt;/td&gt;
   &lt;td&gt;248&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Max&lt;/td&gt;
   &lt;td&gt;29,735&lt;/td&gt;
   &lt;td&gt;8,500&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The largest cookie size was 29,735 bytes, which is quite large! This is so large in fact that it is rejected by all modern browsers. I was curious to see what the limits are, and decided to dig into the source. Both&lt;a href="https://source.chromium.org/chromium/chromium/src/+/master:net/cookies/parsed_cookie.h;l=24"&gt; Chrome&lt;/a&gt; and&lt;a href="https://dxr.mozilla.org/mozilla-central/source/netwerk/cookie/CookieCommons.h#57"&gt; Firefox&lt;/a&gt; will reject cookies greater than 4KB. This is likely due to the &lt;a href="https://tools.ietf.org/html/rfc6265#section-6.1"&gt;implementation limits defined in RFC 6265&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;So who is setting these large cookies?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  The largest first party cookie was set by&lt;a href="https://www.ridewill.it/"&gt; https://www.ridewill.it/&lt;/a&gt;, and it is named &lt;code&gt;menu&lt;/code&gt;. Its value was a long URL-encoded string that contained multiple &lt;code&gt;&amp;lt;div&amp;gt;&lt;/code&gt; layers and links. All in all, there were 240 links in a single cookie!&lt;/li&gt;
&lt;li&gt;  Many of the other large first party cookies were oversized session cookies.&lt;/li&gt;
&lt;li&gt;  The largest third party cookie was set by web.taggbox.com, and consisted of a large JSON array named “liveWall”. &lt;/li&gt;
&lt;li&gt;  Most of the largest third party cookies were set from web.taggbox.com as well as a small number of advertising third parties.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If we look at the entire distribution of cookie sizes, it gets even more interesting. 88% of the cookies being set are less than 100 bytes. The 99th percentile is 372 bytes. So really large individual cookies are not common.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9X9iKffY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/1nyyf4rkx48ubjbefzsq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9X9iKffY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/1nyyf4rkx48ubjbefzsq.png" alt="Distribution of Set-Cookie sizes"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cookies Sent to First Party Domain&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;What about cookies that clients send back to servers? A client can send multiple cookies in a single &lt;code&gt;Cookie&lt;/code&gt; request header. Since the HTTP Archive only collects information on homepages, there is a limit to the insight we can collect here. If we look at just the request headers for favicon.ico, we can get an idea of how large the Cookie request header might be for a subsequent request. However, this does not include any additional cookies set later in a session (e.g., after logging in).&lt;/p&gt;

&lt;p&gt;The median size of cookies sent on the favicon request was 161 bytes and the 95th percentile was 681. The largest was 7,795 bytes, and you can see the distribution below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1LFUcyE6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/zqx086gzefbflj091p9b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1LFUcyE6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/zqx086gzefbflj091p9b.png" alt="Distribution of Request Cookie sizes"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It’s important to note that the cookies set by the time the favicon was requested may underrepresent the size of the cookies users would send later in a browsing session. For example, when logging into an application, a few additional cookies might be set. Some third parties that use a first party subdomain (e.g., if&lt;a href="http://www.example.com"&gt; www.example.com&lt;/a&gt; loaded a resource from metrics.example.com) also set a first party cookie.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Performance Implications of Large Cookies&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When a browser sends an HTTP request, the HTTP request headers are usually 400-500 bytes. In the example below the request headers total 407 bytes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET / HTTP/1.1
Host: www.example.com
Connection: keep-alive
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Adding a cookie to that will increase the size of the request headers. If we add more than 1KB of cookies to that request, then we exceed 1,500 bytes, which is the standard Ethernet &lt;a href="https://en.wikipedia.org/wiki/Maximum_transmission_unit"&gt;maximum transmission unit (MTU)&lt;/a&gt;. This means that the HTTP request would span multiple TCP packets, which may result in additional round trips and increases the risk of retransmission. This can potentially increase the TTFB of the response, since it takes longer to deliver the request.&lt;/p&gt;
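&lt;p&gt;&lt;em&gt;A back-of-the-envelope check of that packet math, using the 407-byte example above and an assumed 40 bytes of IPv4+TCP header overhead:&lt;/em&gt;&lt;/p&gt;

```python
BASE_HEADERS = 407    # size of the example request headers above
MTU = 1500            # standard Ethernet MTU
IP_TCP_OVERHEAD = 40  # assumed typical IPv4 + TCP header bytes

def spans_multiple_packets(cookie_header_bytes):
    """True if the request no longer fits in a single MTU-sized packet."""
    total = BASE_HEADERS + cookie_header_bytes + IP_TCP_OVERHEAD
    return total > MTU

print(spans_multiple_packets(500))   # False: 947 bytes fits in one packet
print(spans_multiple_packets(1100))  # True: 1,547 bytes spills into a second packet
```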

&lt;p&gt;With HTTP/2, we have&lt;a href="https://tools.ietf.org/html/rfc7541"&gt; HPACK compression&lt;/a&gt;, which was designed to help reduce the size of HTTP request headers by utilizing a dynamic index table. Receiving endpoints &lt;a href="https://tools.ietf.org/html/rfc7540#section-6.5.2"&gt;advertise the maximum size of this table&lt;/a&gt; in bytes (the default is 4,096); the sender can insert headers up to this limit in one request or response and subsequently reference them in another. In theory HPACK seems like it could help reduce the overhead of these large cookies. However, it’s not as easy as it sounds. Consider the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; If a request is made on a new HTTP/2 connection (ie, for a return visitor, or a navigation that happened after the previous connection times out), then the entire cookie string would be sent. This would impact TTFB for that first request (usually the HTML).&lt;/li&gt;
&lt;li&gt; Some servers intentionally exclude cookies from HPACK compression, since cookies are a very valuable target for an attacker. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because of the practical constraints on compressing cookies, and the impact of cookie size on the first flight of requests and responses, it is beneficial to use smaller cookies. Based on the observations here, 900 bytes seems like a good budget for a total cookie size, which leaves room for other headers such as user-agent (which can benefit from HPACK compression).&lt;/p&gt;
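&lt;p&gt;&lt;em&gt;A simple way to audit that 900-byte budget is to measure the Cookie header a browser would assemble; the cookie names below are hypothetical:&lt;/em&gt;&lt;/p&gt;

```python
COOKIE_BUDGET = 900  # suggested total cookie budget from this post, in bytes

def audit_cookie_budget(cookies):
    """cookies: dict of name to value. Returns (total_bytes, over_budget)."""
    # "name=value" pairs joined by "; " is what the Cookie request header carries
    header = "; ".join(f"{k}={v}" for k, v in cookies.items())
    total = len(header.encode("utf-8"))
    return total, total > COOKIE_BUDGET

total, over = audit_cookie_budget({"session": "abc123", "prefs": "dark"})
print(total, over)  # 26 False
```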

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cookies are everywhere, and they are set by both first and third party requests. Most browsers will limit the size of cookies stored to 4KB - which is still quite large.  &lt;/p&gt;

&lt;p&gt;At a minimum, setting a cookie larger than 4KB will simply not work. But even below that limit, large cookies can negatively impact your TTFB by increasing the number of round trips needed to deliver the HTTP request.&lt;/p&gt;

&lt;p&gt;It’s also important to keep track of how many cookies are being sent, as bloated cookies increase the time it takes to make an HTTP request, which in turn impacts your TTFB.&lt;/p&gt;

&lt;p&gt;If you are interested in seeing some of the SQL queries and raw data used in this analysis, I’ve created a post with all the details in the &lt;a href="https://discuss.httparchive.org/t/analysis-of-cookie-size/1991"&gt;HTTP Archive discussion forums&lt;/a&gt;. You can also see all the data used for these graphs in &lt;a href="https://docs.google.com/spreadsheets/d/1pO3lUWc41jw43rtk8JCxZi3tGFBxEV__Q-K4DZ0lc_s/edit?usp=sharing"&gt;this Google Sheet&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;&lt;em&gt;Many thanks to &lt;a href="https://twitter.com/SimmerVigor"&gt;Lucas Pardue&lt;/a&gt; and &lt;a href="https://twitter.com/ringel"&gt;Matt Ringel&lt;/a&gt; for reviewing this.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>webperf</category>
      <category>cookies</category>
      <category>httparchive</category>
    </item>
    <item>
      <title>SameSite Cookies - Are you Ready?</title>
      <dc:creator>Paul Calvano</dc:creator>
      <pubDate>Tue, 07 Jul 2020 20:43:39 +0000</pubDate>
      <link>https://dev.to/httparchive/samesite-cookies-are-you-ready-5abd</link>
      <guid>https://dev.to/httparchive/samesite-cookies-are-you-ready-5abd</guid>
      <description>&lt;p&gt;Last year Google&lt;a href="https://blog.chromium.org/2019/05/improving-privacy-and-security-on-web.html"&gt; announced&lt;/a&gt; updates to Chrome that provide a way for developers to control how cross site cookies should work on their sites. This is a good change - as it ultimately improves end user security and privacy by limiting which third parties can read cookies that were set while visiting a different site. It also defeats &lt;a href="https://www.owasp.org/index.php/Cross-Site_Request_Forgery_(CSRF)"&gt;cross site request forgery attacks&lt;/a&gt;. The implementation is fairly simple, and only requires developers to add the SameSite attribute to their cookies. &lt;/p&gt;

&lt;p&gt;The SameSite attribute is &lt;a href="https://caniuse.com/#feat=same-site-cookie-attribute"&gt;supported by all modern browsers&lt;/a&gt;, and most have historically defaulted to a permissive use of cookies if the attribute isn’t present. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5_-GJ0ak--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/gflnfrrax9xc44z8rxgb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5_-GJ0ak--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/gflnfrrax9xc44z8rxgb.png" alt="SameSite Cookie Browser Support"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Google changed the default behavior of the SameSite attribute to secure cookies by default when Chrome 80 was released in February 2020. However, it was&lt;a href="https://blog.chromium.org/2020/05/resuming-samesite-cookie-changes-in-july.html"&gt; rolled back in April 2020&lt;/a&gt; to ensure stability during the initial stage of the COVID-19 response. Now they are planning to&lt;a href="https://blog.chromium.org/2020/05/resuming-samesite-cookie-changes-in-july.html"&gt; resume SameSite cookie enforcement&lt;/a&gt; with Chrome 84, which will be released on July 14th. &lt;/p&gt;

&lt;p&gt;Despite almost a year of notice and warnings in the browser console, this seemed to catch many by surprise in February.  How ready are third parties for this now?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is a Cross Site Third Party Cookie?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If a request on &lt;a href="http://www.example.com"&gt;www.example.com&lt;/a&gt; sets the following cookie from its domain, then the browser will store the cookie and send it back on subsequent requests to the same domain. This is an example of a first party cookie: a cookie whose domain matches the domain that appears in the address bar.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Set-Cookie: session=abc; path=/; Secure; HttpOnly;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Now let’s assume that &lt;a href="http://www.example.com"&gt;www.example.com&lt;/a&gt; includes a third party analytics provider, metrics.analyticsexample.com. When the third party request is made, that third party can also set a cookie in the end users browser. And that third party will be able to read the cookie. This is an example of a third party cookie.&lt;/p&gt;

&lt;p&gt;If that same user then navigated to &lt;a href="http://www.example2.com"&gt;www.example2.com&lt;/a&gt;, which uses the same third party analytics provider, then their third party cookies would be readable by them across both sites. The third party is then able to track the user across multiple websites.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SameSite Cookies&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The SameSite cookie attribute was introduced in a &lt;a href="https://tools.ietf.org/html/draft-west-first-party-cookies-06"&gt;2016 IETF draft&lt;/a&gt;, but was not widely adopted initially. The attribute gives developers the ability to control when a browser will send a cookie to a third party. Using it is simply a matter of adding the SameSite attribute to a cookie declaration with one of the three supported values: “None”, “Lax”, and “Strict”.&lt;/p&gt;

&lt;p&gt;This provides the following controls:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  SameSite=None

&lt;ul&gt;
&lt;li&gt;  The browser will send cookies with both cross-site requests and same-site requests.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;  SameSite=Lax

&lt;ul&gt;
&lt;li&gt;  Same-site cookies are withheld on cross-site sub-requests, such as calls to load images or frames, but will be sent when a user navigates to the URL from an external site; for example, by following a link.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;  SameSite=Strict

&lt;ul&gt;
&lt;li&gt;  The browser will only send cookies for same-site requests (requests originating from the site that set the cookie). If the request originated from a different URL than the URL of the current location, none of the cookies tagged with the Strict attribute will be included.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An example of how this is configured is:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Set-Cookie: key=value; SameSite=Strict&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;If the SameSite attribute is not included, then most browsers have historically defaulted to the most permissive behavior: SameSite=None.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Google Chrome’s Update&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Google has been planning to update the behavior of SameSite within the Chrome browser to default to the more secure SameSite=Lax. Additionally, if a SameSite=None attribute is present, then they would require that the cookie have the “Secure” attribute. There was some concern that this change would cause breakage for some third parties, so a warning message was included in Chrome since version 77 (September 2019).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A cookie associated with a cross-site resource at  was set without the &lt;code&gt;SameSite&lt;/code&gt; attribute. A future release of Chrome will only deliver cookies with cross-site requests if they are set with &lt;code&gt;SameSite=None&lt;/code&gt; and &lt;code&gt;Secure&lt;/code&gt;. You can review cookies in developer tools under Application&amp;gt;Storage&amp;gt;Cookies and see more details at &lt;a href="https://www.chromestatus.com/feature/5088147346030592"&gt;https://www.chromestatus.com/feature/5088147346030592&lt;/a&gt; and&lt;a href="https://www.chromestatus.com/feature/5633521622188032"&gt; https://www.chromestatus.com/feature/5633521622188032&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
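&lt;p&gt;&lt;em&gt;The enforcement rules described above (a missing attribute defaults to Lax; None requires the Secure flag) can be sketched as:&lt;/em&gt;&lt;/p&gt;

```python
def effective_samesite(samesite, secure):
    """Sketch of Chrome's announced enforcement: a missing attribute
    defaults to Lax, and None is only honored when Secure is also set."""
    if samesite is None:
        return "Lax"
    value = samesite.capitalize()
    if value == "None" and not secure:
        return "Lax"  # downgraded: SameSite=None requires the Secure flag
    return value

print(effective_samesite(None, False))      # "Lax"
print(effective_samesite("none", False))    # "Lax" (downgraded)
print(effective_samesite("none", True))     # "None"
print(effective_samesite("strict", True))   # "Strict"
```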

&lt;p&gt;&lt;strong&gt;SameSite Usage Across the Web&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://httparchive.org/"&gt;HTTP Archive&lt;/a&gt; stores a tremendous amount of detail for every HTTP request and response for approximately 5.8 million homepages. In the June 2020 data, there were approximately 108 million third party cookies set across 3.79 million homepages. Of these cookies, 35,721,768 (32.9%) included the SameSite attribute. Comparatively, in August 2019, 21.4% of cookies had the SameSite attribute.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yK40uYjg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/a1lrpyd2wp7irwpfhbjs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yK40uYjg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/a1lrpyd2wp7irwpfhbjs.png" alt="SameSite usage for third party cookies"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: Due to a collection issue described&lt;a href="https://discuss.httparchive.org/t/does-bigquery-contain-har-archive-or-cookies-of-crawled-webpages/1968/8"&gt; here&lt;/a&gt;, ~18.6% of third party cookies were unreadable in the June 2020 HTTP Archive data. The remainder of this analysis is on the cookies we could read.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The “Secure” flag of a cookie ensures that the browser only sends the cookie over HTTPS. Chrome made this a requirement to use SameSite=None. Out of the 35 million cookies, nearly 75% of them use the Secure flag. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--iLttFAI0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/00imlj5llluyriyrzz2g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--iLttFAI0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/00imlj5llluyriyrzz2g.png" alt="Table Breakdown of Secure vs Non-Secure Cookies and SameSite attribuets"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When we look at this graphically, there are a few interesting observations we can make:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  SameSite=Lax, which will be the new default, is in use by only 10.82% of secure cookies, but 97% of insecure cookies.&lt;/li&gt;
&lt;li&gt;  SameSite=None is present on 89.10% of Secure cookies.&lt;/li&gt;
&lt;li&gt;  2.65% (238,810) of insecure cookies are set with SameSite=None but without the Secure flag. These will default to SameSite=Lax.&lt;/li&gt;
&lt;li&gt;  Only 0.06% (16,000) of secure cookies are using SameSite=Strict!&lt;/li&gt;
&lt;li&gt;  There are fewer than 1,000 SameSite attributes set to an erroneous value (i.e., not Lax, Strict or None).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Scwm9sMs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/gvta0gbt9uemri927dff.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Scwm9sMs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/gvta0gbt9uemri927dff.png" alt="SameSite Usage for Third Party Cookies, Secure and non-Secure"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Who is using SameSite=None incorrectly?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There were 238,810 third party cookies set with SameSite=None but missing the Secure flag. These will default to SameSite=Lax in Chrome 84 unless the Secure flag is added. Overall, there were 1,749 third party domains that made this error. The top 5 account for 48% of the erroneous SameSite cookies: Spotxchange, ETargetNet, SmartAdServer, BazaarVoice and EntityTag. &lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--wq0dJn2B--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/3ehsetnuxb2l828noj1a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--wq0dJn2B--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/3ehsetnuxb2l828noj1a.png" alt="Cookies with SameSite=None, but not secure"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Even more concerning than the number of cookies is the number of websites that are affected. For example, spotxchange.com is setting SameSite=None with insecure cookies on 26,174 websites. EntityTag.co.uk is doing the same for 14,358 websites.&lt;/p&gt;
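&lt;p&gt;&lt;em&gt;Flagging this particular misconfiguration in a Set-Cookie header only takes a few lines:&lt;/em&gt;&lt;/p&gt;

```python
def misconfigured_samesite(set_cookie_header):
    """Flag the error analyzed above: SameSite=None without the Secure flag."""
    attrs = [part.strip().lower() for part in set_cookie_header.split(";")]
    return "samesite=none" in attrs and "secure" not in attrs

print(misconfigured_samesite("id=1; SameSite=None"))          # True
print(misconfigured_samesite("id=1; SameSite=None; Secure"))  # False
```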

&lt;p&gt;&lt;strong&gt;Who is using SameSite Strict?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I thought it was odd that SameSite=Strict was used so infrequently. The table below shows some of the third parties that are using it. The top 10 account for 67% of all SameSite=Strict usage.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yL3aoV6m--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/p96dsxgsqdo99n11ylwq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yL3aoV6m--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/p96dsxgsqdo99n11ylwq.png" alt="Cookies with SameSite=Strict"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Erroneous SameSite Attribute Values are in Use?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I was surprised to see that there was such a low percentage of erroneous usage of the SameSite attribute. The top 10 errors listed below account for 80% of the erroneous uses.  Most were SameSite=Secure, SameSite with no value, and SameSite: Lax. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--USlHWOs7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/6i8n6hty9254t2hf8lh7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--USlHWOs7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/6i8n6hty9254t2hf8lh7.png" alt="Erroneous SameSite Usage"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Many of these errors were from a small number of third parties. For example:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  SameSite=secure was set by wildbeardbg.com:

&lt;ul&gt;
&lt;li&gt;  dg_perm_sessid=95951591788161; secure; SameSite=Secure&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;  SameSite: Lax was set by nytimes.com:

&lt;ul&gt;
&lt;li&gt;  nyt-purr=cfshcfhssc; Expires=Fri, 18 Jun 2021 01:02:51 GMT; Path=/; Domain=.nytimes.com;SameSite: Lax;Secure&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;  SameSite=; was set by cloudengage.com:

&lt;ul&gt;
&lt;li&gt;  CEUID=E80y%2BgcVMl0o6Skdol5X9FRCrdrbqBNssQeOYrNUnQxC42U4gpGNV6VX; expires=Wed, 01-Jul-2020 16:16:13 GMT; Max-Age=2592000; path=/; samesite=; domain=.cloudengage.com; secure; HttpOnly
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;/ul&gt;
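&lt;p&gt;To make these malformed values concrete, here is a minimal Python sketch that flags the kinds of SameSite errors listed above. (The actual analysis was done in SQL against BigQuery; this standalone function is purely illustrative.) The only valid values are Strict, Lax, and None:&lt;/p&gt;

```python
# Illustrative sketch: flag erroneous SameSite attributes in a
# Set-Cookie header. Valid values are Strict, Lax, and None.
VALID_SAMESITE = {"strict", "lax", "none"}

def samesite_errors(set_cookie_header):
    """Return a list of problems with the SameSite attribute, if any."""
    errors = []
    for part in set_cookie_header.split(";"):
        attr = part.strip()
        name, sep, value = attr.partition("=")
        if name.strip().lower() == "samesite":
            value = value.strip()
            if not sep or not value:
                errors.append("SameSite with no value")
            elif value.lower() not in VALID_SAMESITE:
                errors.append("invalid value: " + value)
        elif attr.lower().startswith("samesite:"):
            # e.g. "SameSite: Lax" uses a colon instead of "="
            errors.append("colon used instead of '='")
    return errors
```

&lt;p&gt;For example, the nytimes.com cookie above yields "colon used instead of '='", and the cloudengage.com cookie yields "SameSite with no value".&lt;/p&gt;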

&lt;p&gt;&lt;strong&gt;SameSite Usage Across Popular Third Parties&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So far we’ve looked at how SameSite is being used across third parties. But which third parties are not setting SameSite attributes at all? Without the attribute, their cookies will default to SameSite=Lax. Whether this is intentional, and whether it will cause breakage, depends on how each third party uses its cookies. But the absence of an explicit SameSite attribute could be an indication of whether a third party has prepared for this change.&lt;/p&gt;
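&lt;p&gt;The new behavior can be modeled roughly like this (a simplified sketch of Chrome 80's announced defaults, not Chrome's actual implementation): absent or invalid SameSite values are treated as Lax, and SameSite=None only works when the cookie is also Secure.&lt;/p&gt;

```python
# Simplified model of Chrome 80's announced cookie defaults (a sketch,
# not Chrome's actual implementation).
def effective_samesite(attributes):
    """attributes: cookie attribute names (lowercase) mapped to values."""
    value = attributes.get("samesite", "").strip().lower()
    if value == "none":
        if "secure" in attributes:
            return "none"      # explicitly opted in to cross-site use
        return "rejected"      # SameSite=None without Secure is dropped
    if value in ("strict", "lax"):
        return value
    return "lax"               # missing or invalid values default to Lax
```

&lt;p&gt;Under these rules, a third party that sets no SameSite attribute today will find its cookies largely confined to same-site requests (Lax) once the default flips.&lt;/p&gt;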

&lt;p&gt;The graph below shows SameSite usage across some of the most popular third parties. Very few third parties are setting SameSite=Lax, which is about to become the new default. Some of these third parties are used by hundreds of thousands of sites.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--YJR1vfLW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/zofi46l9kj16ojfc8g3n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--YJR1vfLW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/zofi46l9kj16ojfc8g3n.png" alt="SameSite Usage Across Popular Third Parties"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;SameSite cookies are a huge win for privacy and security, but there is a risk that Chrome’s new default settings will cause problems. In many cases, the change will likely render some cross-site tracking techniques ineffective with little impact on the end user experience. However, with any change like this there is a risk of breakage and serious site issues. With SameSite set on less than 33% of third-party cookies, it’s fair to ask how prepared third parties are for this change.&lt;/p&gt;

&lt;p&gt;It’s important to note that the absence of a SameSite attribute does not necessarily mean that there will be breakage. However, depending on how a cookie is used, it has the potential to become problematic. It may be worth checking the JS console in Chrome DevTools to see whether the SameSite warning appears for any of your third parties. You can also set a flag in Chrome to test how this will affect your site ahead of the Chrome 84 release. The Chrome team has published a &lt;a href="https://www.chromium.org/updates/same-site/test-debug"&gt;useful guide for debugging SameSite cookies&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you are interested in seeing some of the SQL queries and raw data used in this analysis, I’ve created a post with all the details in the &lt;a href="https://discuss.httparchive.org/t/samesite-cookies-analysis/1988"&gt;HTTP Archive discussion forums&lt;/a&gt;. You can also see all the data used for these graphs in &lt;a href="https://docs.google.com/spreadsheets/d/1-zx1AmcvDDSKjOLsM3-AbV9qVXV-Of6dmps8LG-oMSk/edit?usp=sharing"&gt;this Google Sheet&lt;/a&gt;. &lt;/p&gt;

</description>
      <category>webdev</category>
      <category>httparchive</category>
      <category>cookies</category>
      <category>webtransparency</category>
    </item>
    <item>
      <title>The 2019 Web Almanac is now available as a free ebook!</title>
      <dc:creator>Rick Viscomi</dc:creator>
      <pubDate>Sat, 23 May 2020 22:02:41 +0000</pubDate>
      <link>https://dev.to/httparchive/the-2019-web-almanac-is-now-available-as-a-free-ebook-1b67</link>
      <guid>https://dev.to/httparchive/the-2019-web-almanac-is-now-available-as-a-free-ebook-1b67</guid>
      <description>&lt;p&gt;Last year &lt;a href="https://almanac.httparchive.org/en/2019/contributors?teams=authors"&gt;29&lt;/a&gt; subject matter experts from the web community came together in a massive effort to document the state of the web, called the &lt;a href="https://almanac.httparchive.org"&gt;Web Almanac&lt;/a&gt;. They wrote about 20 topics in the areas of page content, user experience, content publishing, and distribution. Chapters include JavaScript, CSS, Performance, SEO, Ecommerce, CMS, CDN, HTTP/2, and many more.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--_hfTyjkE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/rufd2e2iimo2505ltk0v.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_hfTyjkE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/rufd2e2iimo2505ltk0v.jpg" alt="Screenshot of the Caching chapter in the 2019 Web Almanac ebook"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now you can &lt;a href="https://almanac.httparchive.org/static/pdfs/web_almanac_2019_en.pdf"&gt;download an ebook&lt;/a&gt; of the entire 2019 edition (for free). It's 421 pages and 18 MB of solid research and analysis from trusted web experts packaged up for ergonomic e-reading. We've also translated the entire contents into &lt;a href="https://almanac.httparchive.org/static/pdfs/web_almanac_2019_ja.pdf"&gt;Japanese&lt;/a&gt;. Or if you'd prefer to browse the content on the web, you can always visit &lt;a href="https://almanac.httparchive.org/en/2019/"&gt;almanac.httparchive.org&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The mission of the Web Almanac project is to document and raise awareness of the state of the web using the huge amounts of data from millions of websites in the &lt;a href="https://httparchive.org"&gt;HTTP Archive&lt;/a&gt;. This project is invaluable to understanding how the web is trending and the areas in which we need to be doing better. I hope you check it out!&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>webalmanac</category>
      <category>free</category>
    </item>
    <item>
      <title>Certificate Validity Dates</title>
      <dc:creator>Paul Calvano</dc:creator>
      <pubDate>Thu, 20 Feb 2020 20:55:52 +0000</pubDate>
      <link>https://dev.to/httparchive/certificate-validity-dates-1d50</link>
      <guid>https://dev.to/httparchive/certificate-validity-dates-1d50</guid>
      <description>&lt;p&gt;Back in 2017 the maximum validity lifetime for an HTTPS certificate was &lt;a href="https://cabforum.org/2017/03/17/ballot-193-825-day-certificate-lifetimes/"&gt;set to 825 days&lt;/a&gt;, a decision that was widely supported by both browsers and certificate authorities. However, since then there have been multiple unsuccessful attempts at reducing the maximum lifetime to one year. &lt;a href="https://twitter.com/Scott_Helme"&gt;Scott Helme&lt;/a&gt; has &lt;a href="https://scotthelme.co.uk/ballot-sc22-reduce-certificate-lifetimes/"&gt;written about this previously&lt;/a&gt;, and his blog post noted that browser vendors unanimously supported this while some certificate authorities objected to it.&lt;/p&gt;

&lt;p&gt;Information is still trickling in, but it seems that Safari is planning to enforce a max validity lifetime of 398 days effective September 1st, 2020.&lt;/p&gt;


&lt;blockquote class="ltag__twitter-tweet"&gt;

  &lt;div class="ltag__twitter-tweet__main"&gt;
    &lt;div class="ltag__twitter-tweet__header"&gt;
      &lt;img class="ltag__twitter-tweet__profile-image" src="https://res.cloudinary.com/practicaldev/image/fetch/s--HNThWLvv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://pbs.twimg.com/profile_images/1102690863233359875/dF-wxPzS_normal.png" alt="Dean Coclin profile image"&gt;
      &lt;div class="ltag__twitter-tweet__full-name"&gt;
        Dean Coclin
      &lt;/div&gt;
      &lt;div class="ltag__twitter-tweet__username"&gt;
        @chosensecurity
      &lt;/div&gt;
      &lt;div class="ltag__twitter-tweet__twitter-logo"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--P4t6ys1m--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://practicaldev-herokuapp-com.freetls.fastly.net/assets/twitter-f95605061196010f91e64806688390eb1a4dbc9e913682e043eb8b1e06ca484f.svg" alt="twitter logo"&gt;
      &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="ltag__twitter-tweet__body"&gt;
      Today's big news: One year max public TLS certs are coming, starting 1 Sept 2020, if you want to be trusted in Safari.
    &lt;/div&gt;
    &lt;div class="ltag__twitter-tweet__date"&gt;
      22:10 - 19 Feb 2020
    &lt;/div&gt;


    &lt;div class="ltag__twitter-tweet__actions"&gt;
      &lt;a href="https://twitter.com/intent/tweet?in_reply_to=1230253348236013570" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://practicaldev-herokuapp-com.freetls.fastly.net/assets/twitter-reply-action.svg" alt="Twitter reply action"&gt;
      &lt;/a&gt;
      &lt;a href="https://twitter.com/intent/retweet?tweet_id=1230253348236013570" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://practicaldev-herokuapp-com.freetls.fastly.net/assets/twitter-retweet-action.svg" alt="Twitter retweet action"&gt;
      &lt;/a&gt;
      103
      &lt;a href="https://twitter.com/intent/like?tweet_id=1230253348236013570" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://practicaldev-herokuapp-com.freetls.fastly.net/assets/twitter-like-action.svg" alt="Twitter like action"&gt;
      &lt;/a&gt;
      149
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/blockquote&gt;


&lt;p&gt;Let’s take a look at the state of certificate validity ranges today, so we can track how this evolves over the next few months. The data for this comes from the &lt;a href="https://httparchive.org"&gt;HTTP Archive&lt;/a&gt;, which is an open source project that tracks how the web is built. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Certificate Validity Dates in the Wild&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The HTTP Archive requests table contains certificate details for every HTTPS request that was served to the 5.3 million websites tracked. The details are in the &lt;code&gt;$._securityDetails&lt;/code&gt; payload, and the data contains information on 4,397,690 unique hosts. Out of all of these, 136,007 hostnames served a certificate with a validity period greater than 825 days. That’s 3.09% of all hosts!&lt;/p&gt;

&lt;p&gt;Certificate validity periods are widely distributed, but a few ranges stand out. The most common validity period is 90 days, likely due to the popularity of LetsEncrypt. Overall, 55% of certificates have a validity period of less than 364 days, which I’ve highlighted in green below. An additional 20% of certificates have a validity between 365 and 398 days, which will meet Safari’s requirements. The remaining 25% have a validity period of more than 398 days.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4Qx8Dpj9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh6.googleusercontent.com/AOBjFo3j8o70P2yGzR0HGLyJ28q1MxwwsLCW3HkDU9_5lCdxPLyRjioJfnKCHGr4sgLQNTS9MDJwQxKb9kcQtDBztwXX1CpIjdvrP_K3fHsKr-3KOeukfoMaWSbmKWll1fkBMM2E" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4Qx8Dpj9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh6.googleusercontent.com/AOBjFo3j8o70P2yGzR0HGLyJ28q1MxwwsLCW3HkDU9_5lCdxPLyRjioJfnKCHGr4sgLQNTS9MDJwQxKb9kcQtDBztwXX1CpIjdvrP_K3fHsKr-3KOeukfoMaWSbmKWll1fkBMM2E" alt="|624x181"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Looking at this by certificate authority is also quite interesting. The graph below shows the top 15 certificate authorities. LetsEncrypt alone accounts for 38.4% of all certificates.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5ICX-g5c--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh6.googleusercontent.com/tmVhJjeDtjJatp4JQ2G030Xqy4zar5WP3oOq-oXmSNau8yg1uUJLXMpNwTntspj_9myJGGYZ0pwe96YYS47aZBP99TQl7Y00m1WC-YHlOnGIw0BvbMEPjCpnEkFCMBtq5nRQ3xiF" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5ICX-g5c--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh6.googleusercontent.com/tmVhJjeDtjJatp4JQ2G030Xqy4zar5WP3oOq-oXmSNau8yg1uUJLXMpNwTntspj_9myJGGYZ0pwe96YYS47aZBP99TQl7Y00m1WC-YHlOnGIw0BvbMEPjCpnEkFCMBtq5nRQ3xiF" alt="|624x227"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When we look at certificate validity periods for these certificate authorities, we can see that there’s a mix. Certificates issued by LetsEncrypt, Cloudflare, cPanel and Amazon already meet the 398-day requirement. However, Sectigo, GoDaddy, DigiCert, Comodo and RapidSSL have a very large percentage of certificates that exceed 398 days.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--6QK_y5AH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh6.googleusercontent.com/tNsRLp4kNsUJ55xIHvynO_7-pxNKooBth07BFajaVIODH1vZvlvswd_gA72N3rAvj8Ke4Z_DxW6ogi7Sworl4V8SRN0e4MePBE5xsV_nvC8fho0dyKM96GBgh_9R7puv5KE_j0up" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6QK_y5AH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh6.googleusercontent.com/tNsRLp4kNsUJ55xIHvynO_7-pxNKooBth07BFajaVIODH1vZvlvswd_gA72N3rAvj8Ke4Z_DxW6ogi7Sworl4V8SRN0e4MePBE5xsV_nvC8fho0dyKM96GBgh_9R7puv5KE_j0up" alt="|624x284"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.digicert.com/position-on-1-year-certificates/"&gt;DigiCert released a public statement&lt;/a&gt; yesterday, which confirms that existing certificates with a validity range &amp;gt;398 days will continue to be trusted by Safari, but that certificates issued after August 30th won’t be able to exceed 398 days.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For your website to be trusted by Safari, you will no longer be able to issue publicly trusted TLS certificates with validities longer than 398 days after Aug. 30, 2020. Any certificates issued before Sept. 1, 2020 will still be valid, regardless of the validity period (up to 825 days). Certificates that are not publicly trusted can still be recognized, up to a maximum validity of 825 days. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I’m interested in seeing how this will evolve in the coming months, especially once there are more formal announcements about this. If you would like to see the queries used for this analysis, I've detailed them in this &lt;a href="https://discuss.httparchive.org/t/certificate-validity-dates/1874"&gt;HTTP Archive discussion forum post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Many thanks to Scott Helme for reviewing this.&lt;/p&gt;

</description>
      <category>httparchive</category>
      <category>security</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I created the Web Almanac. Ask me anything about the state of the web!</title>
      <dc:creator>Rick Viscomi</dc:creator>
      <pubDate>Sun, 24 Nov 2019 02:17:24 +0000</pubDate>
      <link>https://dev.to/httparchive/i-created-the-web-almanac-ask-me-anything-about-the-state-of-the-web-3i3c</link>
      <guid>https://dev.to/httparchive/i-created-the-web-almanac-ask-me-anything-about-the-state-of-the-web-3i3c</guid>
      <description>&lt;p&gt;Hi everyone! I'm doing my first AMA to get people thinking about the state of the web.&lt;/p&gt;

&lt;p&gt;My day job for the past ~2 years has been web developer relations at Google. Prior to that I worked on web performance for ~4 years at YouTube. In my current role I'm a steward of web transparency datasets like the &lt;a href="https://httparchive.org/"&gt;HTTP Archive&lt;/a&gt; and &lt;a href="https://web.dev/chrome-ux-report/"&gt;Chrome UX Report&lt;/a&gt; projects. Web transparency is all about cultivating a public body of knowledge about how the web is built and experienced. I also host a video series called the &lt;a href="https://www.youtube.com/playlist?list=PLNYkxOF6rcIBGvYSYO-VxOsaYQDw5rifJ"&gt;State of the Web&lt;/a&gt; where I interview members of the community about web trends and technologies. My job takes me all around the world to meet with developers at conferences, share transparency data, and hear their stories about building on the web.&lt;/p&gt;

&lt;p&gt;The big project I've been working on this year is the &lt;a href="https://almanac.httparchive.org/en/2019/"&gt;Web Almanac&lt;/a&gt;, the first annual edition of HTTP Archive's report on the state of the web, which launched at Chrome Dev Summit last week. I led the project and coordinated with 80+ community contributors to build everything from scratch (planning/writing content, researching stats, developing the website, etc). The end result is a massive resource that sheds a light on how the web is doing at the scale of millions of websites.&lt;/p&gt;

&lt;p&gt;Here are some of the interesting insights from each chapter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2019/javascript#open-source-libraries-and-frameworks"&gt;jQuery is found on 85% of web pages&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2019/css#popular-z-index-values"&gt;the largest known z-index is 780 digits (!important)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2019/markup#perspective-on-value-and-usage"&gt;there are only 11 types of HTML elements that are found on 90+% of web pages&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2019/media"&gt;the median web page is two-thirds images&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2019/third-parties#data"&gt;94% of web pages contain at least one third party&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2019/fonts#where-did-you-get-those-web-fonts"&gt;Google Fonts makes up 75% of all pages' web fonts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2019/performance#first-contentful-paint"&gt;13% of websites deliver consistently fast performance&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2019/security#content-security-policy"&gt;Content Security Policy is used on 5% of web pages&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2019/accessibility#color-contrast"&gt;78% of mobile pages have color contrast issues&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2019/seo#content"&gt;the median desktop web page contains 346 words&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2019/pwa#service-workers"&gt;0.4% of pages register a service worker&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2019/mobile-web#zooming-and-scaling"&gt;one third of mobile web pages disable zooming&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2019/ecommerce"&gt;10% of pages use an ecommerce platform&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2019/cms#cms-adoption"&gt;40% of pages use a CMS&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2019/compression#what-types-of-content-are-we-compressing"&gt;56% of HTML resources are uncompressed&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2019/caching#overview-of-http-caching"&gt;72% of HTTP responses include a cache control header&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2019/cdn#cdn-adoption-and-usage"&gt;20% of web pages use a CDN for their HTML&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2019/page-weight#page-weight"&gt;the median desktop page weighs 1,934 KB&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2019/resource-hints#resource-hints"&gt;29% of web pages are using &lt;code&gt;dns-prefetch&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://almanac.httparchive.org/en/2019/http2#adoption-of-http2"&gt;54% of HTTP responses are served over HTTP/2&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you find any of this interesting, I'd love to hear your questions about web transparency, the state of the web, the Almanac project, or anything. AMA!&lt;/p&gt;

</description>
      <category>ama</category>
      <category>discuss</category>
      <category>webdev</category>
    </item>
    <item>
      <title>The Web Almanac 2019 is live!</title>
      <dc:creator>Rick Viscomi</dc:creator>
      <pubDate>Mon, 11 Nov 2019 05:41:29 +0000</pubDate>
      <link>https://dev.to/httparchive/the-web-almanac-2019-is-live-4f98</link>
      <guid>https://dev.to/httparchive/the-web-almanac-2019-is-live-4f98</guid>
      <description>&lt;p&gt;I finally get to share with everyone what I've been working on for so long! It's called the &lt;a href="https://almanac.httparchive.org/en/2019/"&gt;Web Almanac&lt;/a&gt; and it's a free, open source, community-made "state of the web" report.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's the Web Almanac?
&lt;/h2&gt;

&lt;p&gt;Here is an excerpt of the &lt;a href="https://almanac.httparchive.org/en/2019/table-of-contents"&gt;foreword&lt;/a&gt; I wrote:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The open web is an amazingly complex, evolving network of technologies. Entire industries and careers are built on the web and depend on its vibrant ecosystem to succeed. As critical as the web is, understanding how it's doing has been surprisingly elusive. Since 2010, the mission of the HTTP Archive project has been to track how the web is built, and it's been doing an amazing job of it. However, there has been one gap that has been especially challenging to close: bringing meaning to the data that the HTTP Archive project has been collecting and enabling the community to easily understand how the web is performing. That's where the Web Almanac comes in.&lt;/p&gt;

&lt;p&gt;The mission of the Web Almanac is to take the treasure trove of insights that would otherwise be accessible only to intrepid data miners, and package it up in a way that's easy to understand. This is made possible with the help of industry experts who can make sense of the data and tell us what it means. Each of the 20 chapters in the Web Almanac focuses on a specific aspect of the web, and each one has been authored and peer reviewed by experts in their field. The strength of the Web Almanac flows directly from the expertise of the people who write it.&lt;/p&gt;

&lt;p&gt;Many of the findings in the Web Almanac are worthy of celebration, but it's also an important reminder of the work still required to deliver high-quality user experiences. The data-driven analyses in each chapter are a form of accountability we all share for developing a better web. It's not about shaming those that are getting it wrong, but about shining a guiding light on the path of best practices so there is a clear, right way to do things. With the continued help of the web community, we hope to make this an annual tradition, so each year we can track our progress and make course corrections as needed.&lt;/p&gt;

&lt;p&gt;There is so much to learn in this report, so start exploring and share your takeaways with the community so we can collectively advance our understanding of the state of the web.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What's inside
&lt;/h2&gt;

&lt;p&gt;There are 20 chapters organized into four main parts exploring different aspects of the web.&lt;/p&gt;

&lt;h3&gt;
  
  
  Part I. Page Content
&lt;/h3&gt;

&lt;p&gt;Chapter 1. &lt;a href="https://almanac.httparchive.org/en/2019/javascript"&gt;JavaScript&lt;/a&gt;&lt;br&gt;
Chapter 2. &lt;a href="https://almanac.httparchive.org/en/2019/css"&gt;CSS&lt;/a&gt;&lt;br&gt;
Chapter 3. &lt;a href="https://almanac.httparchive.org/en/2019/markup"&gt;Markup&lt;/a&gt;&lt;br&gt;
Chapter 4. &lt;a href="https://almanac.httparchive.org/en/2019/media"&gt;Media&lt;/a&gt;&lt;br&gt;
Chapter 5. &lt;a href="https://almanac.httparchive.org/en/2019/third-parties"&gt;Third Parties&lt;/a&gt;&lt;br&gt;
Chapter 6. &lt;a href="https://almanac.httparchive.org/en/2019/fonts"&gt;Fonts&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Part II. User Experience
&lt;/h3&gt;

&lt;p&gt;Chapter 7. &lt;a href="https://almanac.httparchive.org/en/2019/performance"&gt;Performance&lt;/a&gt;&lt;br&gt;
Chapter 8. &lt;a href="https://almanac.httparchive.org/en/2019/security"&gt;Security&lt;/a&gt;&lt;br&gt;
Chapter 9. &lt;a href="https://almanac.httparchive.org/en/2019/accessibility"&gt;Accessibility&lt;/a&gt;&lt;br&gt;
Chapter 10. &lt;a href="https://almanac.httparchive.org/en/2019/seo"&gt;SEO&lt;/a&gt;&lt;br&gt;
Chapter 11. &lt;a href="https://almanac.httparchive.org/en/2019/pwa"&gt;PWA&lt;/a&gt;&lt;br&gt;
Chapter 12. &lt;a href="https://almanac.httparchive.org/en/2019/mobile-web"&gt;Mobile Web&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Part III. Content Publishing
&lt;/h3&gt;

&lt;p&gt;Chapter 13. &lt;a href="https://almanac.httparchive.org/en/2019/ecommerce"&gt;Ecommerce&lt;/a&gt;&lt;br&gt;
Chapter 14. &lt;a href="https://almanac.httparchive.org/en/2019/cms"&gt;CMS&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Part IV. Content Distribution
&lt;/h3&gt;

&lt;p&gt;Chapter 15. &lt;a href="https://almanac.httparchive.org/en/2019/compression"&gt;Compression&lt;/a&gt;&lt;br&gt;
Chapter 16. &lt;a href="https://almanac.httparchive.org/en/2019/caching"&gt;Caching&lt;/a&gt;&lt;br&gt;
Chapter 17. &lt;a href="https://almanac.httparchive.org/en/2019/cdn"&gt;CDN&lt;/a&gt;&lt;br&gt;
Chapter 18. &lt;a href="https://almanac.httparchive.org/en/2019/page-weight"&gt;Page Weight&lt;/a&gt;&lt;br&gt;
Chapter 19. &lt;a href="https://almanac.httparchive.org/en/2019/resource-hints"&gt;Resource Hints&lt;/a&gt;&lt;br&gt;
Chapter 20. &lt;a href="https://almanac.httparchive.org/en/2019/http2"&gt;HTTP/2&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Who made it
&lt;/h2&gt;

&lt;p&gt;The Web Almanac is a community effort. &lt;a href="https://almanac.httparchive.org/en/2019/contributors"&gt;85 people&lt;/a&gt; have contributed to the project to write, peer review, edit, and translate content, analyze and visualize the data, and build and design the website. This was an enormous effort that couldn't be done without everyone's help. You can check out each and every one of them on the &lt;a href="https://almanac.httparchive.org/en/2019/contributors"&gt;Contributors&lt;/a&gt; page to see where they helped.&lt;/p&gt;

&lt;p&gt;I assumed the role of ringleader to help guide everyone to today's launch :) (And I must say I think it went really well, considering the number of people to herd!)&lt;/p&gt;

&lt;h2&gt;
  
  
  How was it made
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://almanac.httparchive.org/en/2019/methodology"&gt;Methodology&lt;/a&gt; page goes into all the detail of how the process worked and where the results come from. There are so many amazing pieces of technology that went into this report, spanning data from over 5 million websites and consuming terabytes of queryable storage.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;I don't want us to stop here. The web is constantly changing with new technologies and evolving adoption. I would love to see this renewed each year with fresh perspectives from different members of the community offering their take on the state of the web.&lt;/p&gt;

&lt;p&gt;If you're interested in joining us, please fill out &lt;a href="https://forms.gle/Qyf3q5pKgdH1cBhq5"&gt;this form&lt;/a&gt; and subscribe to our &lt;a href="https://github.com/HTTPArchive/almanac.httparchive.org"&gt;GitHub repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In order to make the web a better place, we need to observe, quantify, and analyze it over time to make sure we're heading in the right direction. Join us for next year's Web Almanac 2020 edition!&lt;/p&gt;

</description>
      <category>webalmanac</category>
      <category>webdev</category>
      <category>community</category>
    </item>
  </channel>
</rss>
