Tom Anderson

Google Search for Datasets

🔧 There's a tool for everything nowadays

A colleague showed me something today that I had never come across before, and I was impressed 👍:

Google Dataset Search 🔍

A search engine (powered by Google, who aren't too bad at that search thing) that returns a semi-curated list of datasets 📚 available on the web, regardless of where they are hosted!

(It's been around for quite a while now too!)

πŸ€·β€β™‚οΈ So what?

One of the biggest problems with both learning and understanding topics like machine learning and big data analytics is getting access to large datasets.

Lots of sites (such as Kaggle) have made awesome inroads into making datasets more accessible but they can't possibly host everything.

And that's where proper indexing and search can help.

Google has a good track record of building popular search engines, but it's the approach behind Dataset Search that interests me more:

Standardisation 📋: it's up to dataset owners to describe their datasets in a standard format (schema.org Dataset markup on the dataset's web page) so that they can be found more easily and more precisely.
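As a rough illustration of what that standard description can look like, here is a minimal sketch that builds a schema.org Dataset entry as JSON-LD and wraps it in the script tag a landing page would embed. It assumes only a handful of common schema.org properties (name, description, url, license, keywords); the dataset details and the helper name `dataset_jsonld` are hypothetical, and real markup would usually include more fields.

```python
import json

def dataset_jsonld(name, description, url, license_url, keywords):
    """Build a minimal schema.org Dataset description as a JSON-LD string (hypothetical helper)."""
    data = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
        "keywords": keywords,
    }
    return json.dumps(data, indent=2)

# Hypothetical dataset details, for illustration only.
markup = dataset_jsonld(
    name="Example Programming Survey",
    description="A hypothetical survey of programming language usage.",
    url="https://example.com/datasets/programming-survey",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    keywords=["programming", "survey"],
)

# The JSON-LD goes inside a script tag on the dataset's landing page,
# where crawlers can read it alongside the human-readable content.
print(f'<script type="application/ld+json">\n{markup}\n</script>')
```

Because the description lives in the dataset owner's own page, the search engine only has to crawl and index it; the data itself can stay hosted anywhere.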

Give me an example! 🧐

Okay, try searching for "programming":

https://datasetsearch.research.google.com/search?query=Programming
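If you want to generate links like this one for other topics, a small sketch using only the standard library could look like the following. It assumes the `query` parameter seen in the URL above is all that's needed; it only builds the link and does not fetch or parse any results. The function name is hypothetical.

```python
from urllib.parse import urlencode

# Base URL taken from the example link above.
BASE_URL = "https://datasetsearch.research.google.com/search"

def dataset_search_url(query):
    """Return a Google Dataset Search URL for a query (hypothetical helper)."""
    # urlencode escapes spaces and special characters in the query string.
    return f"{BASE_URL}?{urlencode({'query': query})}"

print(dataset_search_url("Programming"))
print(dataset_search_url("machine learning"))
```

`urlencode` takes care of escaping, so multi-word queries like "machine learning" become `machine+learning` rather than breaking the link.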


What do we see?

  • πŸ—‚οΈ Three different datasets.
  • πŸ’‘ Three potential project ideas.
  • 🌍 Three different data sources.

It's that last one that works for me: I don't need to go through curated lists of dataset sources or check the security of a dataset found in a Reddit post. I can just search.

Let me know what you think!

  • Have you used dataset search?
  • Where do you get your datasets?
  • What do you use public datasets for?

🧡 Tom Anderson
www.thomas-anderson.net
Liked something I did and want to help me out?
Buy me a coffee
