DEV Community

es404020
Data splitting using sklearn

Data splitting is the technique of dividing your data into a training set and a test set. The training set is used to teach (train) the model, while the test set, as the name implies, is used to check how accurate the model is on data it has not seen before. This technique is mostly used in supervised learning, where the data has some kind of label attached to it.

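```python
import numpy as np
from sklearn.model_selection import train_test_split

# Sample data: the integers 1 to 99
a = np.arange(1, 100)

# With the default settings, 25% of the data is held out for testing
a_train, a_test = train_test_split(a)
```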
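Since supervised learning data comes with labels, the features and the labels usually need to be split together. Here is a minimal sketch of that, assuming a made-up toy dataset (the names `X` and `y` and their values are only for illustration and are not from the example above):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples with 2 features each, and one label per sample.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Passing both arrays in one call splits them with the same indices,
# so every row of X stays paired with its label in y.
X_train, X_test, y_train, y_test = train_test_split(X, y)

print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)
```

Splitting both arrays in a single call matters: calling `train_test_split` separately on `X` and `y` without a fixed `random_state` would shuffle them differently and break the pairing between samples and labels.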

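Some other options include:

  • test_size: As a float it must be between 0 and 1, and it gives the fraction of the data reserved for testing (for example, 0.2 means 20%). It can also be an integer, which is then the exact number of test samples. If neither test_size nor train_size is set, it defaults to 0.25.

  • shuffle: By default it is True, so the data is shuffled before it is split. Set it to False to split the data in its original order.

  • random_state: It fixes the seed used for shuffling, so you get the same split no matter how many times you split the data.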
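The options above can be tried out with a short sketch like this one. It assumes an array of 100 values (instead of the 99 used earlier) so the percentages come out even, and the seed value 42 is an arbitrary choice:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Assumed sample data: the integers 1 to 100
a = np.arange(1, 101)

# test_size=0.2 holds out 20% of the samples for testing.
a_train, a_test = train_test_split(a, test_size=0.2, random_state=42)
print(len(a_train), len(a_test))

# Using the same random_state again reproduces the same split.
b_train, b_test = train_test_split(a, test_size=0.2, random_state=42)
print(np.array_equal(a_test, b_test))

# shuffle=False keeps the original order, so the test set
# is simply the last 20% of the data.
c_train, c_test = train_test_split(a, test_size=0.2, shuffle=False)
print(c_test[0], c_test[-1])
```

Keeping shuffle turned off is mostly useful when the order of the data matters, for example with time-ordered records.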
