DEV Community

myxzlpltk

A Synthetic Dataset for Predicting Whether a Senior Student Will Go to College

I'm back, devs. Today I want to share a synthetic dataset that I created a few days ago. I have already uploaded it to Kaggle, where you can access it: https://www.kaggle.com/datasets/saddamazyazy/go-to-college-dataset

The data was created using make_classification from the sklearn package, with a little touch of clustering added to produce the categorical features. The result is a dataset with 1,000 rows, 11 columns, and 2 label classes. Here is the code!

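from sklearn.datasets import make_classification

# 1,000 samples, 10 features (8 of them informative), fixed seed for reproducibility
X, y = make_classification(
    n_samples=1000,
    n_features=10,
    n_informative=8,
    random_state=42,
)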

After that, I look at the correlation matrix to see how every variable correlates with the others.
Figure: Correlation matrix

Some variables have a positive or negative correlation, while others have a value close to zero, meaning no correlation. With 10 variables, I have to assign each one to a real-world feature whose relationships match what research papers report, so I need to see what correlates and what does not.
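As a minimal sketch of how this correlation check could be done, the snippet below regenerates the data and computes the Pearson correlation matrix with pandas. The column names `feature_0` to `feature_9` are placeholders I made up for illustration; the actual names in the dataset are different.

```python
import pandas as pd
from sklearn.datasets import make_classification

# Same generator settings as in the post
X, y = make_classification(
    n_samples=1000,
    n_features=10,
    n_informative=8,
    random_state=42,
)

# Placeholder column names (assumption): feature_0 ... feature_9
columns = [f"feature_{i}" for i in range(X.shape[1])]
df = pd.DataFrame(X, columns=columns)
df["label"] = y

# Pearson correlation between every pair of columns, label included
corr = df.corr()

# Correlation of each feature with the label, sorted from most negative to most positive
print(corr["label"].drop("label").sort_values())
```

Values close to zero in the printed column suggest features that carry little linear relationship with the label, while larger absolute values mark candidates for features that should track the outcome.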

Based on those correlations, I can cluster a feature together with the label. The clustering is usually done in 2D. Because the clustering underfits, some clusters will not line up with their true label. This is what gives the data some variation.

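from sklearn.cluster import KMeans
# Cluster the (feature, label) pairs into 2 groups, then name the groups 'B' and 'A'
df['school_accreditation'] = KMeans(n_clusters=2, random_state=42).fit_predict(df[['school_accreditation', 'label']])
df['school_accreditation'] = df['school_accreditation'].replace({0: 'B', 1: 'A'})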

I personally use K-Means to cluster these numeric values into categories.
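To make the idea concrete, here is a self-contained sketch of the clustering step under a few assumptions. The post does not say which generated column became `school_accreditation`, so using the first column is a hypothetical choice, and `n_init=10` is set explicitly only to keep the behavior the same across scikit-learn versions.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification

X, y = make_classification(
    n_samples=1000,
    n_features=10,
    n_informative=8,
    random_state=42,
)

# Assumption: the first generated column stands in for the raw
# school_accreditation score; the real source column is not stated.
df = pd.DataFrame({"school_accreditation": X[:, 0], "label": y})

# Cluster the 2D points (feature, label) into two groups
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
clusters = kmeans.fit_predict(df[["school_accreditation", "label"]])

# Cluster ids are arbitrary, so which id is called 'A' is only a convention
df["school_accreditation"] = pd.Series(clusters, index=df.index).map({0: "B", 1: "A"})

# Cross-tabulate the new category against the label to see where they disagree
agreement = pd.crosstab(df["school_accreditation"], df["label"])
print(agreement)
```

The off-diagonal counts in the cross-tabulation are the rows where the category does not follow the label, which is the variation described above.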
