DEV Community

Muhammed-Sodeeq Alawiye

Creative ways for you to source data without Surveys or Interviews while learning data science

As a newbie learning data science, one of your biggest challenges is finding data to practice with. You are passionate about building a sample model and optimistic that it will solve a real problem. You know that once this prediction model is done it is going to be a banger and a great boost to your portfolio, but you get stuck because there is nowhere to get your data from. As a fresh head, you have tried sending survey links to your friends and posting them on social media, but the responses never came. Maybe nobody is even opening your link, since everyone claims to be busy these days. Still, the idea refuses to leave your head and your hands are itching to impress: you have to get that data! This is where you need to consider other ways of sourcing it. There are several, and in this article we will show you some easy alternatives.

Information from Datasets

If you would like to start on an easy note, using datasets already prepared by companies, institutions, or researchers is one way to go. Some of these organizations are large enough to collect data from many sources, including sensors and IoT devices, and to process it into ready-to-use datasets. Here are some of them and the datasets they offer:

  • Google Dataset Search - Google's search engine for datasets is free to use, although some of the datasets it surfaces are fee-based. It does not host the data itself, but because it indexes datasets published across the web, you rarely come back empty-handed when you search it for dataset ideas.
  • Kaggle - Like Google, Kaggle offers datasets on a wide range of topics, and downloading them is free (it only requires registration). Launched in 2010, it has since evolved into a reliable open data platform. It also provides educational material for learning artificial intelligence and supports collaboration among data scientists.
  • Data.gov - Data.gov offers free secondary data published by the government of the United States, with a catalog estimated at about 200,000 datasets covering almost everything you can imagine, from weather forecasts to crime. It is user friendly: you can filter by geographical area and organization type, and search results can be narrowed down to state, county, and city.
  • UCI Machine Learning Repository - Maintained by the University of California, Irvine, the UCI repository provides machine learning datasets that are free to use, such as one on the behavior of urban traffic in Sao Paulo. It is highly regarded by students, educators, and researchers as a primary resource for machine learning data. Its datasets are organized by task (such as classification, regression, or clustering), attribute type (categorical or numerical), data type, and subject area, which makes it easy to find suitable data for any machine learning project you are working on.
  • NASA Earthdata - Compiled by NASA, Earthdata provides Earth science data, for example on environmental conditions during the fall moose hunting season in Alaska. Since 1994 it has provided everything from weather and climate data to atmospheric observations, sea temperatures, and vegetation mapping. If you are interested in a space project, NASA's Planetary Data System could also be useful.
  • Global Health Observatory Data Repository - The World Health Organization (WHO), a United Nations agency, provides free datasets on health patterns from around the world, such as polio immunization coverage across countries. If your idea is based on health systems, this is one of the best places to look, and the portal even lets you preview data tables before you download them.

Beyond the sources listed above, other institutions provide free datasets, such as the CERN Open Data Portal, Datahub.io, BFI Film Industry Statistics, NYC Taxi Trip Data, and the FBI Crime Data Explorer.
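Once you have downloaded a dataset, a sensible first step is to check what you actually got: how many rows, which columns, and how many values are missing. Many datasets on Kaggle, the UCI repository, and Data.gov are offered as CSV files, so the sketch below profiles a CSV using only Python's standard library. The function name `profile_csv` and the set of tokens treated as "missing" are assumptions for illustration; adjust them to the dataset you pick.

```python
import csv
import io
from typing import Dict, Iterable, TextIO

# Assumed "missing value" markers; some UCI datasets use "?", others leave cells blank.
MISSING_TOKENS = {"", "NA", "N/A", "?"}


def profile_csv(handle: TextIO, missing_tokens: Iterable[str] = MISSING_TOKENS) -> Dict[str, object]:
    """Return the row count, column names and per-column missing-value counts of a CSV."""
    missing = set(missing_tokens)
    reader = csv.DictReader(handle)
    columns = list(reader.fieldnames or [])
    missing_counts = {name: 0 for name in columns}
    rows = 0
    for row in reader:
        rows += 1
        for name in columns:
            value = row.get(name)
            # DictReader fills cells missing from short rows with None
            if value is None or value.strip() in missing:
                missing_counts[name] += 1
    return {"rows": rows, "columns": columns, "missing": missing_counts}


if __name__ == "__main__":
    # A tiny made-up CSV held in memory, standing in for a downloaded file
    sample = io.StringIO("a,b\n1,2\n3,?\n")
    print(profile_csv(sample))
```

To profile a real download, open it with `newline=""` as the `csv` module documentation recommends, e.g. `with open("your_dataset.csv", newline="") as f: print(profile_csv(f))`, where the file name is a placeholder.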

Making use of data from existing platforms


Another easy option is collecting data from existing platforms. Sending out surveys or conducting interviews can be discouraging and tedious; many of your friends or college mates won't have the patience to click your Google Form link and fill in the questionnaire you sent, which is a bit of a headache! But have you ever thought of using data that people already share online? On social media, for example, people post results such as their most-played artistes or songs on Apple Music or Spotify. You can leverage such platforms: sharing a link that lets people view their most-listened-to songs or artistes sounds fun and easy compared to ticking boxes on a Google Form. For instance, you could capitalize on something like the last.fm Top 10 songs link that went viral on Twitter. If you tweet the link along with a screenshot of your own list, people will hop on your tweet with a multitude of replies and quote tweets showing their own lists. You are already building data that way, and you can then compile it to build a predictive model that determines which genre, artiste, country, record label, or other class listeners are most interested in.
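Screenshots in replies are not a dataset yet; you will most likely have to transcribe each reply into a structured record first. As a minimal sketch, assuming you have typed each reply up as a list of artiste names keyed by the respondent's handle, the hypothetical function `tally_artistes` below counts how many respondents mention each artiste. Counting like this is only a first exploratory step before any predictive modelling.

```python
from collections import Counter
from typing import Dict, List, Tuple


def tally_artistes(replies: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Count how many respondents list each artiste, most mentioned first.

    `replies` maps a respondent's handle to the artistes in their list,
    transcribed by hand from their screenshot. An artiste who appears
    several times in one person's list is counted once for that person.
    """
    counts: Counter = Counter()
    for artistes in replies.values():
        # Normalise case and whitespace so "Artiste A" and "artiste a " match,
        # and use a set so each respondent counts an artiste at most once
        counts.update({name.strip().lower() for name in artistes})
    return counts.most_common()


if __name__ == "__main__":
    # Placeholder handles and names for illustration only, not real replies
    replies = {
        "@user1": ["Artiste A", "Artiste B", "Artiste A"],
        "@user2": ["artiste a", "Artiste C"],
        "@user3": ["Artiste B"],
    }
    print(tally_artistes(replies))
```

Counting respondents rather than raw mentions keeps one enthusiastic fan from skewing the totals; if you care about how heavily each person listens, count raw mentions instead by dropping the set.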

Businesses owned by friends and family

Another source most new data scientists overlook is the small amount of data held by little businesses owned by family members or friends. You might have a sibling or friend who bakes, or who makes women's outfits. That's an edge for you. For the baker, you can list the types of cakes or pastries they make and then create a Google Form, or simply ask them, to find out which products sell more, which flavors sell the most, how many units they sell to each gender, or any other question that pops into your head. For your friend who makes women's dresses, you could ask how many native dresses they sell, which style trends are popular, whether clients give the designer free rein to be creative or describe exactly how they want their dress to look, what demands clients make based on fashion trends, and how many clients prefer one option over another. You can build a dataset from their answers, and it is far cheaper and gives you room to be more flexible and productive.
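Whatever your friend or family member tells you, it helps to record it in a consistent structure from the start. As a minimal sketch, assuming each sale is written down as a product, a flavor, and a number of units (these field names and the sample records are hypothetical), the code below totals units per flavor and picks the best seller, which answers the "most sold flavors" question from the bakery example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Sale:
    """One hand-recorded sale; the fields are illustrative, not a fixed schema."""
    product: str
    flavor: str
    units: int


def units_by_flavor(sales: List[Sale]) -> Dict[str, int]:
    """Sum the units sold for each flavor."""
    totals: Dict[str, int] = defaultdict(int)
    for sale in sales:
        totals[sale.flavor] += sale.units
    return dict(totals)


def best_seller(sales: List[Sale]) -> Optional[Tuple[str, int]]:
    """Return (flavor, units) for the top-selling flavor, or None if there are no sales."""
    totals = units_by_flavor(sales)
    if not totals:
        return None
    flavor = max(totals, key=totals.get)
    return flavor, totals[flavor]


if __name__ == "__main__":
    # Made-up records for illustration only
    sales = [
        Sale("cake", "vanilla", 3),
        Sale("cake", "chocolate", 5),
        Sale("cupcake", "vanilla", 4),
    ]
    print(units_by_flavor(sales))
    print(best_seller(sales))  # prints ('vanilla', 7)
```

The same pattern extends to the dressmaker example: swap the fields for style, fabric, or whether the client gave the designer free rein, and aggregate on whichever field you want to study.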

In conclusion, as you venture into the world of data science, finding quality data to fuel your learning journey can be a significant hurdle. While traditional methods like surveys and interviews might prove challenging, there are practical alternatives at your disposal. Leveraging pre-existing datasets from established sources like Google Dataset Search, Kaggle, Data.gov, the UCI Machine Learning Repository, NASA Earthdata, and the Global Health Observatory Data Repository opens up a treasure trove of information. Tapping into data people already share on existing platforms, particularly music listening trends on social media, can also provide valuable insights. And don't overlook small businesses owned by friends and family as data sources. By adapting your approach creatively, you can accumulate the data you need to hone your skills and build impactful predictive models.
