DEV Community

Rick Viscomi for HTTP Archive


I created the Web Almanac. Ask me anything about the state of the web!

Hi everyone! I'm doing my first AMA to get people thinking about the state of the web.

My day job for the past ~2 years has been web developer relations at Google. Prior to that I worked on web performance for ~4 years at YouTube. In my current role I'm a steward of web transparency datasets like the HTTP Archive and Chrome UX Report projects. Web transparency is all about cultivating a public body of knowledge about how the web is built and experienced. I also host a video series called the State of the Web where I interview members of the community about web trends and technologies. My job takes me all around the world to meet with developers at conferences, share transparency data, and hear their stories about building on the web.

The big project I've been working on this year is the Web Almanac, the first annual edition of HTTP Archive's report on the state of the web, which launched at Chrome Dev Summit last week. I led the project and coordinated with 80+ community contributors to build everything from scratch (planning/writing content, researching stats, developing the website, etc.). The end result is a massive resource that sheds light on how the web is doing at the scale of millions of websites.

Here are some of the interesting insights from each chapter:

If you find any of this interesting, I'd love to hear your questions about web transparency, the state of the web, the Almanac project, or anything. AMA!

Top comments (14)

Lars Schwarz

Are there any plans to add a cookie section? This would be interesting in light of the upcoming IETF SameSite proposal. The section could also cover the (nonsensical) cookie banners, or stats on how many (and what type of) cookies the average page uses. It would be fun to know how many sites require cookie consent but set cookies even before that consent is granted :)

Also, are there plans to add more instances, especially geographically distributed ones, for all the performance sections?

Out of interest: how many parallel instances do you run now, and how long does the complete scan process take to finish for all pages in the index?

Rick Viscomi

Great questions!

We're still very early in our 2020 planning but a Cookies chapter would make a lot of sense. You're right that there's a lot of interesting data to explore in that space. Detecting cookie banners is an interesting challenge as that depends on an understanding of what the page is trying to convey visually, as opposed to measuring quantitative things.

If you're referring to geographic distribution in the Performance chapter, that data is sourced entirely from the Chrome UX Report, which is collected from real-world user experiences. Otherwise, instance location won't have much of an effect on the results. It is true that some websites redirect to localized versions depending on IP geolocation, but it's a tradeoff.

We're running on the order of hundreds of WebPageTest instances, and we scale that number up depending on the number of input websites so that each crawl completes in under 30 days, as the tests are monthly.

Thanks for your questions!

Lars Schwarz

Thanks for your answers Rick and yes, I was referring to geographically distributed instances of the scan processes, not only because of different language/country specific redirects, but mostly due to measuring load performance.

Rick Viscomi

For more context, the HTTP Archive dataset is useful for understanding how websites are built. In that regard, where the client machine is located won't have much of an effect, notwithstanding the redirect exception we discussed. The Chrome UX Report dataset is useful for understanding how websites are experienced. So for experiential metrics like loading performance, the geographic location of the HTTP Archive's test agents won't be relevant because they're accounted for in the other dataset.

Lars Schwarz

Hmmm, not sure I fully understand. Afaik you also use instances running on Google Cloud. Requests from these will surely see a much speedier response for pages (and/or resources/assets) that are also hosted on the same Google network/cloud, compared to sites running on completely different networks/continents. Surely this would only be milliseconds for the TTFB stats, but taking external/CDN requests and other assets into account, these differences might be relevant (even though this is certainly not the main subject/purpose of the Almanac).

Rick Viscomi

It's the difference between lab and field data. All loading performance metrics (TTFB, FCP, etc.) discussed in the Almanac come from field data in the Chrome UX Report. So the physical location of the HTTP Archive lab instances is irrelevant.

Lars Schwarz

Ah, gotcha! Thanks for the clarification

easternwawoman

I am voraciously consuming anything and everything about third-party and first-party cookies, so please, anything that you can write would, I believe, be of interest to many, many, many (did I say many?) website developers/owners/marketers. Thanks. Let them know of your articles and yes, they will come.

AmberJ

I was told that jQuery stat a few days ago, but they didn't have a reference to back it up! I thought it was a total garbage number! Now it makes so much sense. It makes me ponder web "death" and how much I appreciate volatile memory. Thanks for the fun trivia on the state of the web! 🤗

Priyanka Kore

This is just an amazing piece of information. Thank you for your work on this!!

leob

Those are some incredibly interesting facts, for instance good ol' jQuery still being used on a whopping 85% of web pages ...

Schnubb

Do you have any information about how many users/visitors have JavaScript disabled?
Just curious about it 😅

Rick Viscomi

We don't. The Chrome UX Report dataset is currently limited to performance metrics like paint/load/interactivity times.

Lars Schwarz

Spotted a few issues affecting the data in the Fonts section, collected them here: discuss.httparchive.org/t/chapter-...