Data splitting using sklearn

#machinelearning #python #aws #programming

Data splitting is the technique of dividing your data into a training set and a test set. The training data is used to teach, or train, the model, while the test data, as the name implies, is used to measure how accurate the trained model is. Splitting is mostly used in supervised learning, where each sample has a label attached to it.

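```python
import numpy as np
from sklearn.model_selection import train_test_split

# Integers from 1 to 99
a = np.arange(1, 100)

# With the default settings, 25% of the data is held out for testing
a_train, a_test = train_test_split(a)
```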

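Some other options of `train_test_split` include:

test_size: As a float, it must be between 0 and 1 and gives the fraction of the data reserved for testing (for example, 0.2 means 20%). It can also be an integer, in which case it is the absolute number of test samples.
shuffle: It is True by default, so the data is shuffled before it is split. Set it to False to keep the data in its original order.
random_state: A seed for the shuffling. With a fixed value, the split stays the same no matter how many times you split the data.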
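
Here is a small sketch of these options in action. The toy arrays `X` and `y` are made up for illustration (10 samples with 2 features each, and one label per sample), and the seed value 42 is an arbitrary choice.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples, 2 features each, one label per sample
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# test_size=0.2 keeps 20% of the samples for testing;
# random_state fixes the shuffle so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)

# Splitting again with the same seed gives the same test set
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(np.array_equal(X_test, X_test2))  # True

# shuffle=False keeps the original order, so the last 20% becomes the test set
y_train_ord, y_test_ord = train_test_split(y, test_size=0.2, shuffle=False)
print(y_test_ord)  # [8 9]
```

Passing `X` and `y` together keeps each sample lined up with its label in both the training and the test set, which is what you usually want in supervised learning.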