Vlyth_r

Creating Statistical Graphics with Python - A Beginner-Friendly Guide

Welcome to a beginner-friendly tutorial on creating statistical graphics for your research using Python, Excel, Seaborn, and Pandas. Whether you're a scientist, researcher, or student, visualizing data is a crucial step in understanding and communicating your findings. In this tutorial, we'll walk you through the process in an easy and straightforward manner, even if you have no prior experience in programming.

We'll leverage the power of Python, a versatile programming language, along with the user-friendly features of Excel to manipulate and organize our data. The data manipulation will be done using Pandas, a powerful data analysis library, and Seaborn will help us visualize the data with beautiful and insightful plots.

Note:
If you are an absolute beginner, you might be wondering what "plots" are. In the context of this tutorial, a plot is a visual representation of data. We'll be creating simple and informative charts, like scatter plots, which help us see patterns and relationships in our data. Don't worry if these terms are new; we'll guide you through each step.

By the end of this tutorial, you should be able to create impactful statistical graphics. So here's one way you can do it:

Getting Started

Step 1: Set Up Your Environment

Before diving into creating statistical graphics, let's make sure you have the necessary tools set up.

1. Code Editor:
A code editor is where you'll write and run your Python code. If you don't have one installed, you can choose from various options such as Visual Studio Code, PyCharm, or Jupyter Notebooks.

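• Install Visual Studio Code:
Visit the Visual Studio Code website (code.visualstudio.com) and follow the installation instructions for your operating system.

• Install PyCharm:
Download PyCharm from the JetBrains website and run the installer for your operating system.

• Install Jupyter Notebooks:
If you prefer a notebook-based environment, install Jupyter Notebooks from your command prompt or terminal (this needs Python, covered in the next item):

``````
pip install notebook
``````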

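2. Python:
Python is the programming language we'll use. If you haven't installed Python yet, download the installer for your operating system from python.org and run it. On Windows, enable the installer option that adds Python to your PATH. To confirm the installation, open a command prompt or terminal and run `python --version`.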

3. Seaborn and Pandas:
Seaborn and Pandas are Python libraries that will help us with data manipulation and visualization.

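• Install Seaborn and Pandas:
Open your command prompt or terminal and run the following commands. The last one installs openpyxl, which Pandas needs in order to read ".xlsx" files:

``````
pip install seaborn
pip install pandas
pip install openpyxl
``````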
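Before moving on, it can help to confirm that the libraries are importable from the Python interpreter you plan to use. The following is a minimal sketch, not an official check: `installed_versions` is a hypothetical helper name introduced here, and `openpyxl` is included on the assumption that you will read ".xlsx" files with Pandas.

```python
import importlib

# Packages this tutorial relies on; openpyxl is an assumption
# (Pandas uses it behind the scenes for .xlsx files)
PACKAGES = ("pandas", "seaborn", "matplotlib", "openpyxl")


def installed_versions(names=PACKAGES):
    """Return {package name: version string, or None if the import fails}."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


versions = installed_versions()
for name, version in versions.items():
    status = version if version else f"NOT INSTALLED (try: pip install {name})"
    print(f"{name}: {status}")
```

If any line reports a missing package, rerun the matching `pip install` command in the same environment your code editor uses.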

Now that you've set up your environment, let's move on to working with actual data!

Step 2: Creating and Importing Sample Data

Now that your environment is set up, let's create a sample spreadsheet that we can later import using Pandas. We'll make a simple Excel file with four columns (A to D) and five rows: one header row followed by four rows of data. For simplicity, we'll label the columns "Column A," "Column B," and so on.

• In row 1, label columns A to D as "Column A," "Column B," "Column C," and "Column D."
• In rows 2 to 5, enter your numbers (the value 1234 below is just a placeholder). Excel numbers the rows for you, so there is nothing to type for the row labels.

``````
       A          B          C          D
1  Column A   Column B   Column C   Column D
2    1234       1234       1234       1234
3    1234       1234       1234       1234
4    1234       1234       1234       1234
5    1234       1234       1234       1234
``````
• Save this Excel file in a location where you can easily access it. You might want to create a new folder for your project, and within it, save the file as "sample_data.xlsx."
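If you would rather generate the file from Python than type it into Excel, the layout above could be produced along these lines. This is only a sketch: the numbers are placeholders invented for illustration, and writing ".xlsx" files assumes the openpyxl package is installed.

```python
import pandas as pd

# Placeholder values invented for illustration; replace them with your own measurements
sample = pd.DataFrame({
    "Column A": [1, 2, 3, 4],
    "Column B": [10, 20, 30, 40],
    "Column C": [5, 6, 7, 8],
    "Column D": [50, 60, 70, 80],
})

try:
    # index=False keeps Pandas' row index out of the file, so row 1 holds the headers
    sample.to_excel("sample_data.xlsx", index=False)
    print("Wrote sample_data.xlsx to the current folder")
except ImportError:
    # Pandas needs an Excel writer package such as openpyxl for .xlsx output
    print("openpyxl is not installed; run: pip install openpyxl")
except OSError as err:
    print(f"Could not write the file: {err}")

print(sample)
```

Run the script from your project folder so that the spreadsheet ends up next to your other Python files.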

Note for Visual Studio Code Users:

• Save the Excel file in the same directory as your Python script. If you've created a new folder for your project, save it there.
1. Code to Import and Display Data: Now, let's write the Python code to import this data using Pandas and display all the data points.
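``````
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Import the spreadsheet (reading .xlsx files requires the openpyxl package)
df = pd.read_excel('sample_data.xlsx')

# Display all data points
print("Sample Data:")
print(df)

# Use Seaborn's fmri example dataset for a first plot
# (load_dataset downloads the data, so it needs an internet connection)
fmri = sns.load_dataset('fmri')
sns.lineplot(data=fmri, x='timepoint', y='signal')
plt.title('Seaborn fmri Dataset Example')
plt.xlabel('Timepoint')
plt.ylabel('Signal')
plt.show()
``````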

Ensure that you have the "sample_data.xlsx" file in the correct location, and you can run this code to import and display the data. In the next step, we'll delve into customizing and visualizing this data in more detail.
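A common stumbling block at this point is running the script from a different folder than the one holding the spreadsheet. One way to get a clearer error message is to check for the file before reading it. The sketch below assumes the file name used in this tutorial; `load_spreadsheet` is a hypothetical helper name, not part of Pandas.

```python
from pathlib import Path

import pandas as pd


def load_spreadsheet(path="sample_data.xlsx"):
    """Read an Excel file into a DataFrame, with a readable error if it is missing."""
    file_path = Path(path)
    if not file_path.exists():
        raise FileNotFoundError(
            f"Could not find {file_path.name} in {Path.cwd()}. "
            "Move the file there or pass its full path."
        )
    # read_excel relies on the openpyxl package for .xlsx files
    return pd.read_excel(file_path)


# Typical use, once the spreadsheet exists:
# df = load_spreadsheet()
# print(df.head())
```

The error message prints the folder Python is actually running in, which is usually enough to spot a misplaced file.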

Step 3: Customizing Your Data Visualization

Now that you've successfully imported data, let's explore how to customize your spreadsheet and the corresponding Python code for a more tailored data visualization.

• Replace Sample Data:
Replace the numeric data in your "sample_data.xlsx" file with your own dataset. Simply overwrite the numbers while keeping the same structure.

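• Add Categories for Hues and Styles:
Pandas reads the values in your cells, not their formatting, so coloring cells in Excel has no effect on your plots. To give Seaborn something to map to hues and styles, store each category as a value in its own column. In the example below, Column A holds a category label for each row: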

``````
       A          B          C          D
1  Column A   Column B   Column C   Column D
2     Red       1234       1234       1234
3    Blue       1234       1234       1234
4   Green       1234       1234       1234
5  Yellow       1234       1234       1234
``````
• Save the Modified Spreadsheet: Save your modified Excel file, ensuring it's still named "sample_data.xlsx."
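Once the spreadsheet mixes labels and numbers, it is worth checking which columns Pandas treats as numeric and which as text, since that decides which ones suit the x and y axes and which suit grouping. The sketch below builds the table in memory with made-up numbers rather than reading the file, so it runs on its own.

```python
import pandas as pd

# Made-up values mirroring the layout above: one label column, three numeric columns
df = pd.DataFrame({
    "Column A": ["Red", "Blue", "Green", "Yellow"],
    "Column B": [12, 34, 56, 78],
    "Column C": [1.5, 2.5, 3.5, 4.5],
    "Column D": [100, 200, 300, 400],
})

# select_dtypes filters columns by the type Pandas inferred for them
numeric_cols = df.select_dtypes(include="number").columns.tolist()
label_cols = df.select_dtypes(exclude="number").columns.tolist()

print("Numeric columns (candidates for x and y):", numeric_cols)
print("Label columns (candidates for hue and style):", label_cols)
```

If a column you expected to be numeric shows up among the labels, look for stray text or units typed into its cells.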

2. Code Modifications:

Update the Python code to reflect the changes you made in the spreadsheet. Below is an example of how you can modify the code:

``````
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Import the modified spreadsheet
df = pd.read_excel('sample_data.xlsx')

# Display all data points
print("Modified Data:")
print(df)

# Use Seaborn's fmri dataset for visualization
sns.lineplot(data=sns.load_dataset('fmri'), x='timepoint', y='signal', hue='region', style='event', markers=True, palette='Set1')
plt.title('Customized Seaborn fmri Dataset')
plt.xlabel('Timepoint')
plt.ylabel('Signal')
plt.legend(title='Legend')
plt.show()
``````
• Explanation:
• `hue`: Use this parameter to distinguish data points based on a category (e.g., color by 'region').
• `style`: Style data points based on another category (e.g., differentiate by 'event').
• `markers`: Display a marker at each data point; with `markers=True`, each `style` category gets its own marker shape.
• `palette`: Choose a color palette (e.g., 'Set1').

Customize these parameters based on your spreadsheet structure and preferences. This flexibility allows you to visualize your data in a way that best communicates your findings. In the next section, we'll delve into more advanced customization options.
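To see how these parameters might map onto your own spreadsheet rather than the fmri dataset, here is a small sketch. The table is built in memory with made-up numbers, and the choice of Column A as the category column is an assumption based on the example layout above.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; remove this line to open a window with plt.show()
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up values: Column A is the category, Columns B and C are measurements
df = pd.DataFrame({
    "Column A": ["Red", "Blue", "Green", "Yellow"],
    "Column B": [1, 2, 3, 4],
    "Column C": [2.0, 3.5, 3.0, 5.0],
})

# hue colors the points by category; style gives each category its own marker shape
ax = sns.scatterplot(data=df, x="Column B", y="Column C",
                     hue="Column A", style="Column A", s=80)
ax.set_title("Column C vs. Column B, grouped by Column A")

# Write the figure to the current folder so there is something to look at
ax.figure.savefig("hue_style_example.png")
```

Using the same column for `hue` and `style` is a deliberate choice here: it keeps the groups distinguishable even when the figure is printed in grayscale.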

Step 4: Exploring Seaborn's Graphic Types and Advanced Possibilities

Now that you have a foundational understanding, let's explore how to switch between different graphic types in Seaborn and delve into more advanced customization options.

1. Switching Between Graphic Types:

Seaborn offers various plot types to suit different data visualization needs. Let's modify our code to switch between a few common plot types: scatter plot, bar plot, and box plot.

``````
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Import the spreadsheet
df = pd.read_excel('sample_data.xlsx')

# Display all data points
print("Modified Data:")
print(df)

# Scatter Plot
sns.scatterplot(data=df, x='Column A', y='Column B', hue='Column C', style='Column D', markers=True, palette='viridis')
plt.title('Scatter Plot')
plt.xlabel('Column A')
plt.ylabel('Column B')
plt.legend(title='Column C')
plt.show()

# Bar Plot
sns.barplot(data=df, x='Column A', y='Column B', hue='Column C', palette='muted')
plt.title('Bar Plot')
plt.xlabel('Column A')
plt.ylabel('Column B')
plt.legend(title='Column C')
plt.show()

# Box Plot
sns.boxplot(data=df, x='Column A', y='Column B', hue='Column C', palette='pastel')
plt.title('Box Plot')
plt.xlabel('Column A')
plt.ylabel('Column B')
plt.legend(title='Column C')
plt.show()
``````
• Explanation:
• Replace 'Column A', 'Column B', 'Column C', and 'Column D' with your actual column names.
• Experiment with different parameters to customize each plot.
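Because these three functions share the same basic arguments (`data`, `x`, `y`), switching between them can be reduced to picking a function by name. The sketch below illustrates the idea; `PLOTTERS` and `draw` are hypothetical names introduced here, and the data is made up so that each group has several measurements for the bar and box plots to summarize.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; remove this line to open windows with plt.show()
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical lookup table: a short name for each Seaborn function used above
PLOTTERS = {"scatter": sns.scatterplot, "bar": sns.barplot, "box": sns.boxplot}


def draw(kind, data, x, y):
    """Draw one plot of the requested kind on a fresh figure and return its Axes."""
    fig, ax = plt.subplots()
    PLOTTERS[kind](data=data, x=x, y=y, ax=ax)
    ax.set_title(f"{kind.capitalize()} Plot")
    return ax


# Made-up data: two groups with three measurements each
df = pd.DataFrame({
    "Column A": ["Red", "Red", "Red", "Blue", "Blue", "Blue"],
    "Column B": [3.1, 2.8, 3.6, 4.9, 5.4, 5.0],
})

axes = {kind: draw(kind, df, x="Column A", y="Column B") for kind in PLOTTERS}
for kind, ax in axes.items():
    ax.figure.savefig(f"{kind}_plot_example.png")
```

Comparing the three saved images side by side shows what each type emphasizes: individual points, group averages, or the spread within each group.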

• FacetGrid:
You can use `FacetGrid` to create a grid of subplots based on the values of one or more variables. This is especially useful when you have additional categorical variables to explore.

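``````
# Assumes df has already been loaded with pd.read_excel('sample_data.xlsx')
g = sns.FacetGrid(df, col='Column C', hue='Column D', palette='Set1', height=4)
g.map(sns.scatterplot, 'Column A', 'Column B')
g.add_legend()  # FacetGrid does not draw a legend for hue unless asked
g.figure.suptitle('FacetGrid Scatter Plot', y=1.02)  # place the title above the panels
plt.show()
``````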
• Pair Plot:
Visualize pairwise relationships between numerical variables in your dataset.

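``````
# Assumes df has already been loaded with pd.read_excel('sample_data.xlsx')
# A markers list must contain exactly one marker per category in the hue column,
# so it is left out here; add markers=[...] once you know how many categories you have
g = sns.pairplot(df, hue='Column C', palette='Set2')
g.figure.suptitle('Pair Plot', y=1.02)  # place the title above the grid
plt.show()
``````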

These advanced possibilities provide more insights into your data by visualizing relationships and distributions in different ways. Experiment with these examples and adapt them to your specific dataset and research questions. In the final section, we'll discuss how to save your visualizations for presentations or publications.

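Step 5: Saving Your Visualizations

Now that you've created compelling visualizations, it's time to save them for presentations or publications. Seaborn draws its plots with Matplotlib, so Matplotlib's export function works on any Seaborn figure and supports a range of formats.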

1. Save Plots in Seaborn:

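After creating a plot, you can use `plt.savefig()` to save it in different formats, such as PNG, PDF, or SVG; the format is inferred from the file extension. Call it before `plt.show()`, because the figure may already be cleared once the plot window is closed.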

``````
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Import the spreadsheet
df = pd.read_excel('sample_data.xlsx')

# Display all data points
print("Modified Data:")
print(df)

# Example: Scatter Plot
sns.scatterplot(data=df, x='Column A', y='Column B', hue='Column C', style='Column D', markers=True, palette='viridis')
plt.title('Scatter Plot')
plt.xlabel('Column A')
plt.ylabel('Column B')
plt.legend(title='Column C')

# Save the plot (before plt.show(), so the figure is still available)
plt.savefig('scatter_plot.png')
plt.show()
``````
• Replace 'Column A', 'Column B', 'Column C', and 'Column D' with your actual column names.
• Adjust the file name and format in `plt.savefig('scatter_plot.png')` as needed.
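Journals and slide decks often ask for different formats, so it can be convenient to export the same figure several times in one go. The sketch below is one way to do that; the `figures` folder name, the data, and the `dpi=300` setting are illustrative assumptions rather than requirements.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # draw off-screen; remove this line to also use plt.show()
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up values for illustration
df = pd.DataFrame({"Column A": [1, 2, 3, 4], "Column B": [2.0, 4.1, 5.9, 8.2]})

fig, ax = plt.subplots()
sns.scatterplot(data=df, x="Column A", y="Column B", ax=ax)
ax.set_title("Scatter Plot")

out_dir = Path("figures")  # hypothetical output folder inside the project
out_dir.mkdir(exist_ok=True)

saved = []
for ext in ("png", "pdf", "svg"):
    target = out_dir / f"scatter_plot.{ext}"
    # dpi sets the resolution of raster formats such as PNG;
    # bbox_inches="tight" trims empty margins around the plot
    fig.savefig(target, dpi=300, bbox_inches="tight")
    saved.append(target)
    print("Saved", target)
```

PNG suits slides and web pages, while PDF and SVG are vector formats that stay sharp at any zoom level, which many publishers prefer.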

2. Full Code Block and Data Spreadsheet:

Here's the complete code block for the sample, assuming you have the 'sample_data.xlsx' spreadsheet with your modified data:

``````
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Import the spreadsheet
df = pd.read_excel('sample_data.xlsx')

# Display all data points
print("Modified Data:")
print(df)

# Scatter Plot
sns.scatterplot(data=df, x='Column A', y='Column B', hue='Column C', style='Column D', markers=True, palette='viridis')
plt.title('Scatter Plot')
plt.xlabel('Column A')
plt.ylabel('Column B')
plt.legend(title='Column C')

# Save the plot (before plt.show(), so the figure is still available)
plt.savefig('scatter_plot.png')
plt.show()
``````
• Ensure 'sample_data.xlsx' is in the same directory as your Python script.

That's it! By following these steps, you can create, customize, and save your visualizations with ease. Feel free to experiment with different Seaborn functions and parameters to discover new ways to showcase your data.