Decision tree analysis was performed to evaluate the importance of a series of explanatory variables in predicting a binary categorical response variable. The response is whether a passenger survived the Titanic wreck, and the explanatory variables included as possible contributors are passenger class (Pclass), Sex, Age, and Fare.
Python Code
This is my copy of the Colab notebook, so everyone can see the output along with the code.
import pandas as pd

df = pd.read_csv('titanic.csv')  # load the Titanic passenger data
df.head(5)

   PassengerId  Survived  Pclass                                               Name     Sex   Age  SibSp  Parch            Ticket     Fare Cabin Embarked
0            1         0       3                            Braund, Mr. Owen Harris    male  22.0      1      0         A/5 21171   7.2500   NaN        S
1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1      0          PC 17599  71.2833   C85        C
2            3         1       3                             Heikkinen, Miss. Laina  female  26.0      0      0  STON/O2. 3101282   7.9250   NaN        S
3            4         1       1       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1      0            113803  53.1000  C123        S
4            5         0       3                           Allen, Mr. William Henry    male  35.0      0      0            373450   8.0500   NaN        S
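Before selecting features, it is worth checking which columns have missing values, since that motivates the dropna() call later. A quick check (a minimal sketch, not in the original notebook):

df[['Pclass','Sex','Age','Fare','Survived']].isnull().sum()

In the standard 891-row Kaggle training file, Age is the only one of these five columns with missing entries.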
X = df[['Pclass','Sex','Age','Fare','Survived']].copy()  # keep only the columns of interest
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X['Sex'] = le.fit_transform(X['Sex'])  # encode 'female'/'male' as 0/1
X
Pclass Sex Age Fare Survived
0 3 1 22.0 7.2500 0
1 1 0 38.0 71.2833 1
2 3 0 26.0 7.9250 1
3 1 0 35.0 53.1000 1
4 3 1 35.0 8.0500 0
... ... ... ... ... ...
886 2 1 27.0 13.0000 0
887 1 0 19.0 30.0000 1
888 3 0 NaN 23.4500 0
889 1 1 26.0 30.0000 1
890 3 1 32.0 7.7500 0
891 rows × 5 columns
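For reference, LabelEncoder assigns integer codes in sorted label order, so 'female' becomes 0 and 'male' becomes 1 (consistent with the table above). This can be confirmed with a quick check (not in the original notebook):

le.classes_  # expected: array(['female', 'male'], dtype=object)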
X = X.dropna()  # drop rows with missing Age
Y = X['Survived']  # target: survival (0/1)
X = X.drop('Survived', axis='columns')  # features: Pclass, Sex, Age, Fare
Y
0      0
1      1
2      1
3      1
4      0
      ..
885    0
886    0
887    1
889    1
890    0
Name: Survived, Length: 714, dtype: int64
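Since dropna() removes every row with a missing Age, a sanity check on the remaining sizes may be useful (a hypothetical check, assuming the standard Kaggle training file):

X.shape, Y.shape  # expected: ((714, 4), (714,))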
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)  # hold out 20% for testing
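Note that train_test_split shuffles the rows randomly, so the exact split, and the accuracy reported below, will vary from run to run. Passing a fixed seed makes the run reproducible; for example (random_state=42 is an arbitrary choice):

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)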
from sklearn import tree
# from sklearn.linear_model import LogisticRegression
model = tree.DecisionTreeClassifier()
# model = LogisticRegression()
model.fit(X_train, Y_train)
DecisionTreeClassifier()
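To visualize what the fitted tree actually learned, scikit-learn's tree.plot_tree can draw it (an optional sketch, not in the original notebook; max_depth=2 just keeps the plot readable):

import matplotlib.pyplot as plt

plt.figure(figsize=(16, 8))
tree.plot_tree(model, feature_names=['Pclass', 'Sex', 'Age', 'Fare'],
               class_names=['Died', 'Survived'], filled=True, max_depth=2)
plt.show()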
predictions = model.predict(X_test)
predictions
array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1,
1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1,
1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0,
1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0,
0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1,
0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1,
1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1])
Output & Accuracy
model.score(X_test, Y_test)
0.7482517482517482
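Since the stated goal is to evaluate the importance of the explanatory variables, the fitted tree's impurity-based (Gini) importances are worth inspecting too. A minimal sketch (not in the original notebook; exact values depend on the random split):

pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)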
model.predict([[1, 0, 39, 71.2833]])  # query: 1st class, Sex=0, age 39, fare 71.2833
[1]
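Recall that Sex was label-encoded, so the 0 in the query row means 'female': this asks the model about a first-class woman, age 39, paying a fare of 71.2833, and the tree predicts she survives. The encoding can be decoded back if needed:

le.inverse_transform([0])  # array(['female'], dtype=object)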
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(_Y_test,predictions)
import seaborn as sn
sn.heatmap(cm,annot=True)
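The raw heatmap has unlabeled axes; in scikit-learn's confusion matrix, rows are the true labels and columns the predicted ones. Adding matplotlib labels makes the plot easier to read (a small optional addition):

import matplotlib.pyplot as plt

sn.heatmap(cm, annot=True, fmt='d')  # fmt='d' shows whole counts
plt.xlabel('Predicted')
plt.ylabel('Truth')
plt.show()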
As shown above, the decision tree classifier achieves an overall accuracy of about 75% on the held-out test set (107 of the 143 test passengers classified correctly).
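A single 80/20 split gives a somewhat noisy estimate; for a more stable accuracy figure, k-fold cross-validation is one option (a hedged sketch, not part of the original analysis):

from sklearn.model_selection import cross_val_score

scores = cross_val_score(tree.DecisionTreeClassifier(), X, Y, cv=5)
print(scores.mean(), scores.std())  # mean and spread of accuracy across 5 folds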