Tom Anderson

Google Search for Datasets

🔧 There's a tool for everything nowadays

A colleague showed me something today that I had never come across before, and I was impressed 👍:

Google Dataset Search 🔍

A search engine (powered by Google, who aren't too bad at that search thing) that returns a semi-curated list of datasets 📚 available on the web, regardless of where they are hosted!

(It's been around for quite a while now too!)

🤷‍♂️ So what?

One of the biggest problems when learning topics like machine learning and big data analytics is getting access to large datasets.

Lots of sites (such as Kaggle) have made awesome inroads into making datasets more accessible, but they can't possibly host everything.

And that's where proper indexing and search can help.

Google has a good track record of building popular search engines. But it's the approach behind dataset search that I'm more interested in:

Standardisation 📋 - it's up to dataset owners to describe their datasets with structured metadata in a specific, standard format (such as schema.org's Dataset markup), so that they can be indexed and found more easily and more precisely.
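To make that a bit more concrete, here's a rough sketch of what "indexable in a specific format" could look like from the dataset owner's side. It assumes the page describes the dataset with schema.org `Dataset` markup embedded as JSON-LD, which is one of the formats Google documents for Dataset Search. The `dataset_jsonld` helper, the dataset itself and every value in it are made up for illustration; check Google's structured data docs for the properties they actually recommend.

```python
import json


def dataset_jsonld(name, description, url, license_url, keywords):
    """Build schema.org Dataset metadata and return it as a JSON-LD string."""
    metadata = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
        "keywords": list(keywords),
    }
    return json.dumps(metadata, indent=2)


# Hypothetical dataset: every value below is a placeholder.
markup = dataset_jsonld(
    name="Example programming language survey",
    description="Placeholder description of a survey dataset.",
    url="https://example.com/datasets/programming-survey",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    keywords=["programming", "survey"],
)

# The JSON-LD is normally embedded in the HTML of the dataset's landing page.
print(f'<script type="application/ld+json">\n{markup}\n</script>')
```

The point is that the owner publishes the description alongside the data, and the search engine only has to crawl and read it.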

Give me an example! 🧐

Okay, try searching for "programming":

[Screenshot: Dataset Search results for "programming"]
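If you'd rather jump straight to a results page from a script or a notebook, a tiny helper like this one builds the link for you. It's only a sketch: the base URL and the `query` parameter name are assumptions based on how the site's address bar looks when I search, not a documented API, and `dataset_search_url` is a name I made up.

```python
from urllib.parse import urlencode

# Assumed base URL of the Dataset Search results page (not an official API).
BASE_URL = "https://datasetsearch.research.google.com/search"


def dataset_search_url(query: str) -> str:
    """Return a Dataset Search results URL for the given search terms."""
    # urlencode escapes spaces and special characters in the query.
    return f"{BASE_URL}?{urlencode({'query': query})}"


print(dataset_search_url("programming"))
```

Open the printed link in a browser and you should land on the same kind of results list.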

What do we see?

  • 🗂️ Three different datasets.
  • 💡 Three potential project ideas.
  • 🌍 Three different data sources.

It's that last one that works for me - I don't need to go through curated lists of dataset sources or vet the safety of a dataset found in a Reddit post. I can just search.

Let me know what you think!

  • Have you used dataset search?
  • Where do you get your datasets?
  • What do you use public datasets for?


Top comments (1)

Jorge Guevara

Great resource bro. Thanks