Hi everyone! I'm doing my first AMA to get people thinking about the state of the web.
My day job for the past ~2 years has been web developer relations at Google. Prior to that I worked on web performance for ~4 years at YouTube. In my current role I'm a steward of web transparency datasets like the HTTP Archive and Chrome UX Report projects. Web transparency is all about cultivating a public body of knowledge about how the web is built and experienced. I also host a video series called the State of the Web where I interview members of the community about web trends and technologies. My job takes me all around the world to meet with developers at conferences, share transparency data, and hear their stories about building on the web.
The big project I've been working on this year is the Web Almanac, the first annual edition of HTTP Archive's report on the state of the web, which launched at Chrome Dev Summit last week. I led the project and coordinated with 80+ community contributors to build everything from scratch (planning/writing content, researching stats, developing the website, etc.). The end result is a massive resource that sheds light on how the web is doing at the scale of millions of websites.
Here are some of the interesting insights from each chapter:
- jQuery is found on 85% of web pages
- the largest known z-index is 780 digits (!important)
- there are only 11 types of HTML elements that are found on 90+% of web pages
- the median web page is two-thirds images
- 94% of web pages contain at least one third party
- Google Fonts makes up 75% of all pages' web fonts
- 13% of websites deliver consistently fast performance
- Content Security Policy is used on 5% of web pages
- 78% of mobile pages have color contrast issues
- the median desktop web page contains 346 words
- 0.4% of pages register a service worker
- one third of mobile web pages disable zooming
- 10% of pages use an ecommerce platform
- 40% of pages use a CMS
- 56% of HTML resources are uncompressed
- 72% of HTTP responses include a Cache-Control header
- 20% of web pages use a CDN for their HTML
- the median desktop page weighs 1,934 KB
- 29% of web pages use dns-prefetch (see the sketch after this list)
- 54% of HTTP responses are served over HTTP/2
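For anyone who hasn't used a couple of the features above, here's a rough sketch (not from the Almanac itself) of what a dns-prefetch hint and a service worker registration look like in a page's own script. The prefetched origin and the /sw.js path are placeholders of my choosing:

```ts
// Ask the browser to resolve a third-party origin's DNS ahead of time.
// (The same hint is usually written directly in HTML as <link rel="dns-prefetch">.)
const hint = document.createElement('link');
hint.rel = 'dns-prefetch';
hint.href = 'https://fonts.gstatic.com'; // placeholder origin, not from the Almanac data
document.head.appendChild(hint);

// Register a service worker (the feature behind the "0.4% of pages" stat above).
if ('serviceWorker' in navigator) {
  navigator.serviceWorker
    .register('/sw.js') // placeholder script path
    .then((reg) => console.log('service worker registered with scope:', reg.scope))
    .catch((err) => console.error('service worker registration failed:', err));
}
```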
If you find any of this interesting, I'd love to hear your questions about web transparency, the state of the web, the Almanac project, or anything. AMA!
Top comments
Are there any plans to add a cookie section? This would be interesting in light of the upcoming IETF SameSite proposal; the section could also cover the (nonsensical) cookie banners, or stats on how many (and what types of) cookies the average page uses. It would be fun to know how many sites require cookie consent but set cookies even before that consent is granted :)
Also, are there plans to add more instances, especially geographically distributed ones, for the performance sections?
Out of interest: how many parallel instances do you run now, and how long does the complete scan take to finish for all pages in the index?
Great questions!
We're still very early in our 2020 planning but a Cookies chapter would make a lot of sense. You're right that there's a lot of interesting data to explore in that space. Detecting cookie banners is an interesting challenge as that depends on an understanding of what the page is trying to convey visually, as opposed to measuring quantitative things.
If you're referring to geographic distribution in the Performance chapter, that data is sourced entirely from the Chrome UX Report, which is collected from real-world user experiences. Otherwise, instance location won't have much of an effect on the results. It is true that some websites redirect to localized versions depending on IP geolocation, but that's a tradeoff we accept.
We're running on the order of hundreds of WebPageTest instances and scale that up depending on the number of input websites, so that each crawl can complete in under 30 days, as the tests run monthly.
Thanks for your questions!
Thanks for your answers, Rick, and yes, I was referring to geographically distributed instances of the scan processes, not only because of language/country-specific redirects, but mostly for measuring load performance.
For more context, the HTTP Archive dataset is useful for understanding how websites are built. In that regard, where the client machine is located won't have much of an effect, notwithstanding the redirect exception we discussed. The Chrome UX Report dataset is useful for understanding how websites are experienced. So for experiential metrics like loading performance, the geographic location of the HTTP Archive's test agents isn't relevant, because those metrics come from the other dataset.
Hmmm, not sure I fully understand. Afaik you also use instances running on Google Cloud; requests from these will surely see a much faster response for pages (and/or resources/assets) also hosted on the same Google network/cloud than for sites running on completely different networks/continents? Surely this would only amount to milliseconds in the TTFB stats, but taking external/CDN requests and other assets into account, these differences might be relevant (even though that's certainly not the main subject/purpose of the Almanac).
It's the difference between lab and field data. All loading performance metrics (TTFB, FCP, etc.) discussed in the Almanac come from the Chrome UX Report, i.e. from the field. So the physical location of the HTTP Archive lab instances is irrelevant.
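To make the lab/field distinction a bit more concrete: field metrics are observed during real users' visits. Here's a minimal sketch of how a page could observe its own First Contentful Paint with the standard Performance APIs; the /analytics reporting endpoint is hypothetical, and CrUX itself gathers this data through Chrome rather than through page code like this:

```ts
// Observe First Contentful Paint as experienced by a real user (field data),
// as opposed to a synthetic measurement from a lab test machine.
const po = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    if (entry.name === 'first-contentful-paint') {
      // Report the metric; '/analytics' is a hypothetical endpoint.
      navigator.sendBeacon(
        '/analytics',
        JSON.stringify({ metric: 'FCP', value: entry.startTime })
      );
    }
  }
});
// buffered: true also delivers paint entries that occurred before observation started.
po.observe({ type: 'paint', buffered: true });
```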
Ah, gotcha! Thanks for the clarification
I am voraciously consuming anything and everything re third-party and first-party cookies, so please, anything you can write would, I believe, be of interest to many, many, many (did I say many?) website developers/owners/marketers. Thanks. Let them know of your articles and yes, they will come.
I was told that jQuery stat a few days ago but they didn't have a reference to back it up! I thought it was a total garbage number! Now it makes so much sense... It makes me ponder web "death" and how much I appreciate volatile memory... Thanks for the fun trivia on the state of the web! 🤗
This is just an amazing piece of information. Thank you for your work on this!!
Those are some incredibly interesting facts; for instance, good ol' jQuery still being used on a whopping 85% of web pages ...
Do you have any information about how many users / visitors have javascript disabled?
Just curious about it 😅
We don't. The Chrome UX Report dataset is currently limited to performance metrics like paint/load/interactivity times.
Spotted a few issues affecting the data in the Fonts section, collected them here: discuss.httparchive.org/t/chapter-...