DEV Community

Aman Gupta


Pandas and Creating Dataframe

Today we are talking about pandas: what DataFrames are and how to create them. First, let's look at pandas itself.


Pandas is an open-source Python library that provides high-performance data manipulation and analysis tools built on its powerful data structures. The name pandas is derived from the term "panel data", an econometrics term for data sets that include observations over multiple time periods for the same individuals.

Pandas has so many uses that it might be easier to list the things it can't do than the things it can.

This tool is essentially your data’s home. Through pandas, you get acquainted with your data by cleaning, transforming, and analyzing it.

By convention, pandas is imported as follows:

>>> import pandas as pd

Pandas has traditionally offered three main data structures:
1. Series: A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating-point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic way to create a Series is to call:

>>> s = pd.Series(data, index=index)

Here, data can be many different things:

  • a Python dict
  • an ndarray
  • a scalar value (like 5)
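To make the three kinds of input concrete, here is a small sketch; the labels and values are made up for illustration:

```python
import numpy as np
import pandas as pd

# From a dict: the keys become the index labels
s_dict = pd.Series({'a': 0.0, 'b': 1.0, 'c': 2.0})

# From an ndarray: the index, if given, must have the same length as the data
s_arr = pd.Series(np.array([10, 20, 30]), index=['x', 'y', 'z'])

# From a scalar: the value is repeated to match the length of the index
s_scalar = pd.Series(5, index=['a', 'b', 'c'])

print(s_dict)
print(s_arr)
print(s_scalar)
```

If you pass an ndarray without an index, pandas falls back to a default integer index (0, 1, 2, ...).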

2. DataFrame: A DataFrame is a two-dimensional labeled data structure with columns of potentially different types. You can think of it as a spreadsheet, an SQL table, or a dict of Series objects. It is generally the most commonly used pandas object. Like Series, DataFrame accepts many different kinds of input:

  • Dict of 1D ndarrays, lists, dicts, or Series
  • 2-D numpy.ndarray
  • Structured or record ndarray
  • A Series
  • Another DataFrame
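
For example, here is a DataFrame built from a dict of two Series with different indexes. The resulting index is the union of the two, so the missing value for label 'd' in column 'one' becomes NaN:

>>> d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
...      'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}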
>>> df = pd.DataFrame(d)
>>> df
   one  two
a  1.0  1.0
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0

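The other inputs in the list work in a similar way. Here is a minimal sketch, with made-up values, of building a DataFrame from a 2-D ndarray, a structured ndarray, and a single Series:

```python
import numpy as np
import pandas as pd

# From a 2-D ndarray: rows and columns map directly; labels are optional
arr = np.array([[1, 2], [3, 4], [5, 6]])
df_arr = pd.DataFrame(arr, columns=['x', 'y'], index=['r1', 'r2', 'r3'])

# From a structured ndarray: each field name becomes a column name
rec = np.array([(1, 2.0, 'Hello'), (2, 3.0, 'World')],
               dtype=[('A', 'i4'), ('B', 'f4'), ('C', 'U10')])
df_rec = pd.DataFrame(rec)

# From a single Series: it becomes one column, named after the Series
s = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'], name='one')
df_series = pd.DataFrame(s)

print(df_arr)
print(df_rec)
print(df_series)
```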

3. Panel: Panel was a less-used container for 3-dimensional data. It was deprecated in pandas 0.20 and removed in pandas 0.25, so it is not available in current versions; for 3-D data the pandas documentation recommends a DataFrame with a MultiIndex, or the xarray package, instead. The term panel data is derived from econometrics and is partially responsible for the name pandas: pan(el)-da(ta)-s. The names for the three axes were intended to give some semantic meaning to operations involving panel data and, in particular, econometric analysis of panel data. However, for the strict purposes of slicing and dicing a collection of DataFrame objects, you may find the axis names slightly arbitrary:

  • items: axis 0, each item corresponds to a DataFrame contained inside
  • major_axis: axis 1, it is the index (rows) of each of the DataFrames
  • minor_axis: axis 2, it is the columns of each of the DataFrames
>>> wp = pd.Panel(data)  # works only in pandas < 0.25; Panel has since been removed
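Because pd.Panel is not available in current pandas versions (it was removed in pandas 0.25), one way to hold a collection of DataFrames is a single DataFrame with a MultiIndex. The sketch below uses pd.concat() with keys; the table names and values are hypothetical:

```python
import pandas as pd

# Two hypothetical 2-D tables that a Panel would once have held as "items"
df1 = pd.DataFrame({'x': [1, 2], 'y': [3, 4]}, index=['r1', 'r2'])
df2 = pd.DataFrame({'x': [5, 6], 'y': [7, 8]}, index=['r1', 'r2'])

# concat with keys stacks the tables and adds an outer index level:
# level 0 plays the role of Panel's items, level 1 of major_axis,
# and the columns play the role of minor_axis
stacked = pd.concat([df1, df2], keys=['item1', 'item2'])

print(stacked)
print(stacked.loc['item2'])  # selects the rows of the second table
```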

The most commonly used data structure in pandas is the DataFrame. Now let's look at different ways to create a DataFrame with pandas.

The first way is to create a DataFrame from a list of lists:


import pandas as pd
# each inner list becomes one row of the DataFrame
data = [['Ram', 10], ['Aman', 15], ['Rishi', 14]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
print(df)
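The constructor also takes an index parameter if you want row labels other than the default 0, 1, 2. A small sketch reusing the same rows, with hypothetical labels s1 to s3:

```python
import pandas as pd

data = [['Ram', 10], ['Aman', 15], ['Rishi', 14]]

# `index` sets custom row labels (made up here for illustration)
df = pd.DataFrame(data, columns=['Name', 'Age'], index=['s1', 's2', 's3'])

print(df)
print(df.loc['s2', 'Name'])  # look up a single value by row label and column name
```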


The next method is to create a DataFrame from a Python dict of lists or ndarrays. Each key becomes a column name, and all the values must have the same length:


import pandas as pd  
data = {'Name':['Ram', 'jhon', 'krish', 'jack'], 
        'Age':[20, 21, 19, 18]}  
df = pd.DataFrame(data) 

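Since the dict values can also be NumPy arrays, here is a sketch of the same table built from ndarrays, along with what happens when the columns have different lengths:

```python
import numpy as np
import pandas as pd

# Dict of ndarrays: each key becomes a column name
data = {'Name': np.array(['Ram', 'jhon', 'krish', 'jack']),
        'Age': np.array([20, 21, 19, 18])}
df = pd.DataFrame(data)
print(df)

# All arrays must have the same length; otherwise pandas raises ValueError
try:
    pd.DataFrame({'a': np.array([1, 2, 3]), 'b': np.array([1, 2])})
except ValueError as err:
    print('ValueError:', err)
```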

Next, we can create a DataFrame by importing data from a CSV file with the pd.read_csv() function:

import pandas as pd
df = pd.read_csv('data.csv')  
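If you don't have a data.csv file at hand, you can try pd.read_csv() on an in-memory string instead, because it accepts any file-like object as well as a path. The CSV contents below are made up for illustration:

```python
import io
import pandas as pd

# Stand-in for the contents of a data.csv file (made-up sample data)
csv_text = """Name,Age
Ram,20
jhon,21
krish,19
"""

# By default, the first line is used as the column header
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```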

Another way is to create a DataFrame from a database. The example below connects to an SQLite database and creates a DataFrame from the result of a query.

For this, first create a connection object, and then pass your SQL query and the connection to pd.read_sql_query():

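import pandas as pd
import sqlite3

conn = sqlite3.connect("database.db")  # put the name of your database file here
query = "SELECT * FROM my_table"  # hypothetical table name; replace with your own query
df = pd.read_sql_query(query, conn)
conn.close()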
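To try pd.read_sql_query() without an existing database file, you can use SQLite's in-memory database. A minimal sketch; the students table and its rows are hypothetical:

```python
import sqlite3
import pandas as pd

# ":memory:" creates a temporary database that exists only for this connection
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, created just for this example
conn.execute("CREATE TABLE students (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO students VALUES (?, ?)",
                 [("Ram", 20), ("Aman", 15), ("Rishi", 14)])
conn.commit()

# Each column of the query result becomes a DataFrame column
df = pd.read_sql_query(
    "SELECT name, age FROM students WHERE age > ? ORDER BY age DESC",
    conn,
    params=(14,),
)
print(df)

conn.close()
```

The params argument supplies values for the ? placeholders, which is safer than formatting values into the SQL string yourself.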

These are some of the ways to create a DataFrame in pandas, but there are several more. The pandas IO tools support many file formats for reading and writing data, such as CSV, JSON, HTML, SAS, and many more. To read more about the pandas IO tools, go here or open this link:

Thanks for reading
