MdMusfikurRahmanSifar

# Seaborn: Python Visualization

Seaborn is a Python visualization library built on top of matplotlib. It's a great tool for statistical visualization. Let's dive in.

First we need to run pip install seaborn, then import the libraries we will use:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

For visualization we need data. We can use any data source, such as stock data or a CSV file, or even enter values manually. More data makes the plots easier to interpret. Here we will use the example datasets that seaborn provides. Run sns.get_dataset_names() to list the available dataset names.

To load one of them as a table (a pandas DataFrame), we use sns.load_dataset().

In seaborn there are mainly three types of plots:
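As a minimal sketch of the "enter values manually" option, the table below is typed in by hand. The rows are made-up illustrative values, and the column names are only assumed to mimic seaborn's "tips" example dataset.

```python
import pandas as pd

# Hypothetical hand-typed rows; the columns mimic seaborn's "tips" dataset
tips = pd.DataFrame({
    "total_bill": [16.5, 10.2, 21.0, 23.7, 24.6],
    "tip": [1.0, 1.7, 3.5, 3.3, 3.6],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "day": ["Thur", "Fri", "Sat", "Sun", "Sun"],
})

# With internet access, the full example table could be loaded instead:
# tips = sns.load_dataset("tips")

print(tips.head())
```

Any DataFrame with one row per observation should work with the plotting functions discussed in this post.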

1. Distribution Plots
2. Categorical Plots
3. Matrix Plots

To be really informal:

• Distribution plot: data plotted against a single parameter, showing how its values are spread.
• Categorical plot: a parameter has several categories (sub-parameters), and there is data for each one.
• Matrix plot: data plotted against two parameters.

We need to keep this idea in mind while choosing a plot for our data.
Secondly, in a Jupyter notebook we can press Shift+Tab inside a function call to read its docstring and see the available options.
Now let's start.

# Distribution Plot:

Let's discuss some plots that allow us to visualize the distribution of a data set. These plots are:

• distplot
• jointplot
• pairplot
• rugplot
• kdeplot

## Distplot:

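distplot shows the distribution of a single (univariate) variable. There are lots of options to customize it; check the docstring with Shift+Tab and play with it. Note that distplot has been deprecated since seaborn 0.11 in favor of histplot and displot.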

## Jointplot:

jointplot combines two distribution plots for bivariate data, so we need to specify the x and y variables. We can also select the kind of plot.

The kind options include scatter, reg, resid, kde, and hex.

## Pairplot:

Pairplot will plot pairwise relationships across an entire dataframe (for the numerical columns) and supports a color hue argument (for categorical columns).

Again, check the docstring. Useful arguments include hue (color separation according to a categorical column) and palette (a color scheme).

## Rugplot:

Rugplots are actually a very simple concept: they just draw a dash mark for every point of a univariate distribution. They are the building block of a KDE plot.

## Kdeplot:

kdeplots are Kernel Density Estimation plots. A KDE plot replaces every single observation with a Gaussian (Normal) distribution centered on that value, then sums these curves and normalizes the result to get a smooth estimate of the density.

# Categorical Plots:

Now let's discuss using seaborn to plot categorical data! There are a few main plot types for this:

• factorplot
• boxplot
• barplot
• countplot
• violinplot

## Factorplot:

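Factorplot is the most general form of a categorical plot. Since seaborn 0.9 this function is named catplot, and the old factorplot name is deprecated. It takes a kind parameter to select the plot type, for example bar, box, violin, or count.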
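A short sketch of the general categorical plot, written with catplot (the current name of factorplot, assuming seaborn 0.9 or newer). The data is synthetic.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up stand-in data: bill amounts spread over four days
rng = np.random.default_rng(0)
n = 200
tips = pd.DataFrame({
    "total_bill": rng.gamma(4.0, 5.0, size=n),
    "day": rng.choice(["Thur", "Fri", "Sat", "Sun"], size=n),
})

# kind selects the underlying plot, e.g. "bar", "box", "violin"
g = sns.catplot(data=tips, x="day", y="total_bill", kind="bar")
```

Changing only the kind argument redraws the same data as a different categorical plot, which makes catplot handy for quick comparisons.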

## Barplot and Countplot:

These two very similar plots show aggregate data for a categorical feature. barplot is a general plot that aggregates the data in each category with some function, by default the mean.

Countplot is essentially the same as barplot, except that the estimator explicitly counts the number of occurrences, which is why we only pass the x value.
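The following sketch draws the default mean next to a custom estimator (the standard deviation via np.std). The data is randomly generated, and the two-panel layout is just one way to show both versions side by side.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up stand-in data
rng = np.random.default_rng(0)
n = 200
tips = pd.DataFrame({
    "total_bill": rng.gamma(4.0, 5.0, size=n),
    "sex": rng.choice(["Male", "Female"], size=n),
})

fig, axes = plt.subplots(1, 2, figsize=(8, 3))

# Left: default estimator, the mean of total_bill per category
sns.barplot(data=tips, x="sex", y="total_bill", ax=axes[0])

# Right: the standard deviation per category instead of the mean
sns.barplot(data=tips, x="sex", y="total_bill", estimator=np.std, ax=axes[1])
```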
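A minimal countplot sketch on synthetic data. Only x is passed, and the bar heights are the number of rows in each category.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up stand-in data: one categorical column is enough
rng = np.random.default_rng(0)
tips = pd.DataFrame({"sex": rng.choice(["Male", "Female"], size=200)})

# Bar height = number of rows in each category
ax = sns.countplot(data=tips, x="sex")
```

The bar heights should add up to the number of rows in the table, the same totals that tips["sex"].value_counts() reports.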

## Boxplot and violinplot:

### Boxplot:

Boxplots and violinplots are used to show the distribution of data across categories. A box plot (or box-and-whisker plot) shows the distribution of quantitative data in a way that facilitates comparisons between variables or across levels of a categorical variable. The box shows the quartiles of the dataset while the whiskers extend to show the rest of the distribution, except for points that are determined to be “outliers” using a method that is a function of the inter-quartile range.
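A short boxplot sketch with one box per category, using randomly generated data in place of a real dataset:

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up stand-in data
rng = np.random.default_rng(0)
n = 200
tips = pd.DataFrame({
    "total_bill": rng.gamma(4.0, 5.0, size=n),
    "day": rng.choice(["Thur", "Fri", "Sat", "Sun"], size=n),
})

# One box per day; points beyond the whiskers are drawn as outliers
ax = sns.boxplot(data=tips, x="day", y="total_bill")
```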

### Violin plot:

A violin plot plays a similar role as a box and whisker plot. It shows the distribution of quantitative data across several levels of one (or more) categorical variables such that those distributions can be compared. Unlike a box plot, in which all of the plot components correspond to actual datapoints, the violin plot features a kernel density estimation of the underlying distribution.
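The same kind of synthetic table can be drawn as a violin plot. This is only a sketch with made-up values:

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up stand-in data
rng = np.random.default_rng(0)
n = 200
tips = pd.DataFrame({
    "total_bill": rng.gamma(4.0, 5.0, size=n),
    "day": rng.choice(["Thur", "Fri", "Sat", "Sun"], size=n),
})

# Each violin is a mirrored KDE of total_bill for that day
ax = sns.violinplot(data=tips, x="day", y="total_bill")
```

Compared with the box plot, the width of each violin shows where values are concentrated, at the cost of depending on the KDE smoothing.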

# Matrix Plots:

Matrix plots allow you to plot data as color-encoded matrices.
Let's begin by exploring seaborn's heatmap.
In order for a heatmap to work properly, your data should already be in matrix form; the sns.heatmap function basically just colors it in for you. To get a matrix representation we can use .pivot_table(). To get a good plot, let's use the 'flights' dataset.
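The pivot step can be sketched as follows. The table here is a synthetic stand-in with the same three columns as the 'flights' dataset (year, month, passengers); the passenger counts are random numbers, not real data.

```python
import numpy as np
import pandas as pd
import seaborn as sns

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
years = range(1949, 1953)

# Made-up long-form table: one row per (year, month) pair
rng = np.random.default_rng(0)
flights = pd.DataFrame(
    [(year, month, int(rng.integers(100, 300)))
     for year in years for month in months],
    columns=["year", "month", "passengers"],
)

# Keep calendar order instead of alphabetical order on the month axis
flights["month"] = pd.Categorical(flights["month"], categories=months, ordered=True)

# Long form -> matrix form: months as rows, years as columns
matrix = flights.pivot_table(index="month", columns="year",
                             values="passengers", observed=False)

# heatmap colors each cell of the matrix by its value
ax = sns.heatmap(matrix, cmap="magma", linecolor="white", linewidths=1)
```

The cmap, linecolor, and linewidths arguments are optional styling; sns.heatmap(matrix) alone is enough to get a plot.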

There are more customizing options and plotting dimensions. But this was the basic idea.

## Notes:

• Know what data goes with your plot and select that data for the desired plot
• Know the types of plots
• Syntax is the main thing
• Shift+Tab is your best friend. Get to know it and play with it.