Importing and parsing
import pandas as pd
%matplotlib inline
df = pd.read_csv('file_name.csv')
Creating variables to represent the groups into which you divide your data. For example, to split the data into two groups around a numeric threshold (number is a placeholder for your own value):
df_group1 = df[df['variable_used_to_divide_data'] > number]
df_group2 = df[df['variable_used_to_divide_data'] < number]
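As a minimal sketch of the split, here is the same pattern on a tiny made-up DataFrame. The column names (age, plan) and the threshold of 30 are assumptions for illustration only, not part of the original data.

```python
import pandas as pd

# Hypothetical data: 'age' stands in for variable_used_to_divide_data.
df = pd.DataFrame({
    "age": [22, 35, 41, 19, 58, 30],
    "plan": ["basic", "pro", "pro", "basic", "pro", "basic"],
})

threshold = 30  # hypothetical cutoff

# Each comparison produces a boolean mask; indexing with it keeps the matching rows.
df_group1 = df[df["age"] > threshold]
df_group2 = df[df["age"] < threshold]

print(len(df_group1), len(df_group2))
```

Note that with strict > and < comparisons, rows exactly equal to the threshold land in neither group. Use >= or <= on one side if you want them included.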
Creating charts (in this example, bar charts) to compare the values of the same variable across the groups
Note: We reuse the index of the first group's value counts for both charts so that the bars appear in the same order, which makes the visual comparison easier
ind = df_group1['comparison_variable'].value_counts().index
df_group1['comparison_variable'].value_counts()[ind].plot(kind='bar');
df_group2['comparison_variable'].value_counts()[ind].plot(kind='bar');
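One thing to watch: indexing with [ind] can fail if the second group is missing a category that the first group has. A possible variation, sketched here on made-up data, uses reindex with fill_value=0 and puts both groups on one chart. The column name plan and the sample values are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import pandas as pd

# Hypothetical groups; 'plan' stands in for comparison_variable.
df_group1 = pd.DataFrame({"plan": ["pro", "pro", "basic", "team"]})
df_group2 = pd.DataFrame({"plan": ["basic", "basic", "pro"]})

# Category order taken from group 1, as in the original approach.
ind = df_group1["plan"].value_counts().index

# reindex keeps that order and fills categories missing from a group with 0.
counts = pd.DataFrame({
    "group1": df_group1["plan"].value_counts().reindex(ind, fill_value=0),
    "group2": df_group2["plan"].value_counts().reindex(ind, fill_value=0),
})

ax = counts.plot(kind="bar")  # one pair of bars per category
ax.set_ylabel("count")
```

Plotting both columns from one DataFrame gives side-by-side bars per category, so you do not have to compare two separate figures by eye.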
Creating pie charts to see which values dominate each group:
ind = df_group1['another_comparison_variable'].value_counts().index
df_group1['another_comparison_variable'].value_counts()[ind].plot(kind='pie', figsize=(8, 8));
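A pie chart shows each value's share of the total. If you want the numbers behind the slices, value_counts accepts normalize=True. A small sketch, assuming a made-up region column:

```python
import pandas as pd

# Hypothetical group; 'region' stands in for another_comparison_variable.
df_group1 = pd.DataFrame({"region": ["north", "north", "north", "south"]})

# normalize=True returns fractions of the total instead of raw counts.
shares = df_group1["region"].value_counts(normalize=True)
print(shares)
```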
Creating histograms to plot distributions of each group:
df_group1['third_comparison_variable'].hist();
df_group2['third_comparison_variable'].hist();
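By default each call to hist picks its own bin edges, which can make two distributions hard to compare. One option, sketched below on made-up data, is to compute shared bin edges and draw both groups on the same axes. The score column, the sample values, and the choice of 10 bins are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import numpy as np
import pandas as pd

# Hypothetical groups; 'score' stands in for third_comparison_variable.
df_group1 = pd.DataFrame({"score": [1.0, 2.0, 2.5, 3.0, 4.0]})
df_group2 = pd.DataFrame({"score": [3.0, 4.5, 5.0, 6.0]})

# Shared bin edges spanning both groups: 11 edges give 10 equal-width bins.
combined = pd.concat([df_group1["score"], df_group2["score"]])
bins = np.linspace(combined.min(), combined.max(), 11)

# Draw both histograms on the same axes, semi-transparent so overlaps stay visible.
ax = df_group1["score"].hist(bins=bins, alpha=0.5, label="group1")
df_group2["score"].hist(bins=bins, alpha=0.5, label="group2", ax=ax)
ax.legend()
```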
Then, viewing summary statistics for each group:
df_group1['third_comparison_variable'].describe()
df_group2['third_comparison_variable'].describe()
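If you would rather read the two summaries side by side, one way is to concatenate the describe() results into a single table. A minimal sketch, again with a made-up score column:

```python
import pandas as pd

# Hypothetical groups; 'score' stands in for third_comparison_variable.
df_group1 = pd.DataFrame({"score": [1.0, 2.0, 3.0]})
df_group2 = pd.DataFrame({"score": [4.0, 6.0]})

# The dict keys become column names; axis=1 places the summaries next to each other.
summary = pd.concat(
    {
        "group1": df_group1["score"].describe(),
        "group2": df_group2["score"].describe(),
    },
    axis=1,
)
print(summary)
```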