DEV Community


Creating a dataset for your ML project.

Raksha Kannusami (rakshakannu) ・2 min read

The problem.

I set out to work on a machine learning project yesterday. I grabbed my cup of coffee and was really motivated to start.


The first thing I searched for was a good dataset to train my model on. I was particular about it because I knew exactly which variables the dataset should contain for this project. I searched every platform where you can find free datasets, but couldn't find exactly the one I wanted!

Then I thought, the unavailability of a dataset shouldn't stop me from making this project.

The solution.

So I decided to generate my own dataset, which took me barely a few minutes.

I created a spreadsheet table with all the necessary variables as column headings, then used the RANDBETWEEN function under each variable to generate a random value.

Step 1.

Create a simple table with the necessary variables.

Step 2.

Use the RANDBETWEEN function to generate a random integer in any range, for example =RANDBETWEEN(1, 100) for a value from 1 to 100.

Step 3.

Drag the cell down across as many rows as you need to generate the data.


Step 4.

Export the file as a CSV file.
This is a simple trick for creating your own dataset for any kind of ML project you want to build.
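
The same four steps can also be scripted instead of done by hand in a spreadsheet. The following is only a minimal sketch using Python's standard library: the column names and the dataset.csv file name are hypothetical placeholders, and random.randint is used as a stand-in for RANDBETWEEN because both return an integer with the lower and upper bounds included.

```python
import csv
import random

# Hypothetical column names (Step 1); replace with the variables your project needs.
COLUMNS = ["age", "height_cm", "weight_kg", "score"]


def generate_rows(n_rows, low=1, high=100, seed=None):
    """Return n_rows rows with one random integer in [low, high] per column (Steps 2 and 3)."""
    rng = random.Random(seed)  # a fixed seed makes the dataset reproducible
    return [[rng.randint(low, high) for _ in COLUMNS] for _ in range(n_rows)]


def write_csv(path, rows):
    """Write the header row followed by the generated rows to a CSV file (Step 4)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        writer.writerows(rows)


rows = generate_rows(n_rows=500, seed=42)
write_csv("dataset.csv", rows)  # hypothetical output file name
```

Passing a seed is a design choice, not a requirement: it makes the file identical on every run, which helps when debugging the rest of the pipeline.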

Now nothing can stop you from making that ML project! 🎉
Keep learning! Keep coding! 💖

Let's be friends on Twitter, LinkedIn or GitHub! 😊

Discussion

Tony Hung

If your features are random, I don’t think the model will be able to learn anything. I totally agree this is a great way to get started, but if/when you do get a realistic dataset, you will basically be starting from scratch.

Raksha Kannusami Author

You are right, Tony! My main aim was to build an end-to-end ML project. My goal was not accuracy or prediction quality, but to create a working model that takes those variables as inputs and produces an output. If I wanted to focus on prediction accuracy, that couldn't be done without a real dataset.

Tony Hung

Sounds good!

Mert Cobanov

from sklearn.datasets import make_regression
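
For anyone who has not used it, here is a rough sketch of how that import might be applied. It assumes scikit-learn is installed. The sample count, feature count, noise level and random_state below are arbitrary illustrative values, not anything taken from the post. Unlike independent random columns, make_regression builds the target from the features plus noise, so a model has a relationship to learn.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Illustrative settings only: 200 samples, 3 features, some Gaussian noise.
X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)

# X holds the feature columns; y is a target built from a linear
# combination of those features plus noise.
print(X.shape, y.shape)

# Fit a simple model to confirm the pipeline runs end to end.
model = LinearRegression().fit(X, y)
print(model.score(X, y))  # R^2 on the training data
```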