DEV Community

Csv to chart with Pandas

petercour

Computers work with data all the time. Sometimes it lives in databases, sometimes on the web, sometimes it comes from sensor input, and sometimes it is office data such as Excel or csv files.

You probably know that you can easily parse a csv file with Pandas. But did you know you can just as easily create plots directly from the csv data?

Data set

A csv data set is simply data stored as plain text. It could come from an office suite like GSheets or Open Office, where you can save a file as csv (comma-separated values). As the name says, the values in each row are separated by commas.
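To make the format concrete, here is a minimal sketch that parses a few lines of csv text with pandas. The rows are made up for illustration and are not taken from the movie data set used later.

```python
import io

import pandas as pd

# Made-up sample rows for illustration only; not part of the article's data set.
csv_text = """Title,Year,Rating
Example Movie A,2014,7.5
Example Movie B,2016,6.0
"""

# read_csv accepts a file path or any file-like object, such as StringIO
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)          # (number of rows, number of columns)
print(list(sample.columns))  # header names come from the first line
```

The first line becomes the column names, and every following line becomes one row.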

Any data set will work, but the example below uses a csv file named IMDB-Movie-Data.csv.

This data set is about movies.
For every movie it stores these values:

  • Rank
  • Title
  • Genre
  • Description
  • Director
  • Actors
  • Year
  • Runtime (Minutes)
  • Rating
  • Votes
  • Revenue (Millions)
  • Metascore

That's a lot of information per movie, but with 1000 records it is still a small data set.
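Before plotting, it can help to check which columns are numeric and whether any values are missing. The sketch below uses a hypothetical helper, `describe_columns`, and a tiny made-up frame that borrows three of the column names listed above.

```python
import pandas as pd


def describe_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: its dtype and its number of missing values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
    })


# Tiny made-up frame for illustration; the values are not from the real file.
demo = pd.DataFrame({
    "Title": ["Example A", "Example B"],
    "Rating": [7.5, 6.0],
    "Metascore": [70.0, None],
})

print(describe_columns(demo))
```

With the real file, the same helper could be called as `describe_columns(pd.read_csv("IMDB-Movie-Data.csv"))`.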

Pandas

We first load the pandas module, matplotlib for plotting and numpy for number crunching. Then we load the csv data into a DataFrame, create the figure and use matplotlib to plot a histogram of the ratings.

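#!/usr/bin/python3
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Load the csv file into a DataFrame
movie = pd.read_csv("IMDB-Movie-Data.csv")

# Print the average rating (a bare expression only displays in a notebook)
print(movie["Rating"].mean())

# Quick histogram straight from pandas; this opens its own figure
movie["Rating"].plot(kind="hist", figsize=(20, 8))

# The same histogram built with matplotlib directly, using 20 bins
plt.figure(figsize=(20, 8), dpi=80)
plt.hist(movie["Rating"], 20)
# 21 evenly spaced ticks mark the edges of the 20 bins
plt.xticks(np.linspace(movie["Rating"].min(), movie["Rating"].max(), 21))
plt.grid(linestyle="--", alpha=0.5)
plt.show()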

That shows you the distribution of the movie ratings. Mind you, there are a lot of records in the csv file and pandas handles them almost instantly.
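The same approach works for other columns. As a sketch, the snippet below groups ratings by year and draws a bar chart. It uses made-up rows so it runs without the csv file, and it saves the chart to a hypothetical file name, `rating_by_year.png`, instead of opening a window.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render to a file; drop this line to use plt.show() instead
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows for illustration; swap in pd.read_csv("IMDB-Movie-Data.csv")
# to run this on the real data.
csv_text = """Title,Year,Rating
Example A,2014,7.0
Example B,2014,8.0
Example C,2016,6.0
"""
movie = pd.read_csv(io.StringIO(csv_text))

# Average rating per year, as a Series indexed by Year
rating_by_year = movie.groupby("Year")["Rating"].mean()

# One bar per year
ax = rating_by_year.plot(kind="bar", figsize=(8, 4))
ax.set_xlabel("Year")
ax.set_ylabel("Average rating")
plt.tight_layout()
plt.savefig("rating_by_year.png")
```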
