Your entry into the world of Data Science often begins with Pandas. It is an open source tool for data analysis: a fast, flexible, and powerful library for the Python language. Pandas allows you to clean, manipulate, and explore the data in a dataset.
The first thing to do before working with Pandas is to install Anaconda on your system. After installing it, launch Jupyter Notebook. Now you are ready to put Pandas into action.
A working knowledge of Python is a prerequisite.
So, let us start unfolding the concepts behind Pandas. To do so, first create a new Jupyter Notebook file. Name it "PandasBasics.ipynb".
Load Pandas
Importing the package is the first step in working with it. The import statement loads Pandas into the current session, and "pd" is the conventional alias for the library.
Type the following in a cell, usually the top one:
import pandas as pd
When you run the cell, it produces no output, which means the library has loaded successfully. If something goes wrong, for example because Pandas is not installed, Jupyter Notebook displays the error below the cell.
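If you would like a visible confirmation that the import worked, one simple option is to print the installed version. This is only a quick check, and the version string you see depends on your installation.

```python
import pandas as pd

# A successful import exposes the library under the alias "pd".
# The version string differs from one installation to another.
print(pd.__version__)
```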
Data Structures in Pandas
A data structure is an arrangement of data that the Pandas library can handle. There are two main data structures: Series and DataFrame. The former represents one-dimensional data, and the latter represents two-dimensional data.
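The difference in dimensions can be seen with a minimal sketch. The variable names and sample values here are made up for illustration.

```python
import pandas as pd

# One-dimensional: a Series holds a single labelled sequence of values.
temperatures = pd.Series([21.5, 23.0, 19.8])

# Two-dimensional: a DataFrame holds rows and columns.
readings = pd.DataFrame(
    {
        "city": ["Oslo", "Lima", "Pune"],
        "temp_c": [21.5, 23.0, 19.8],
    }
)

print(temperatures.ndim)  # 1
print(readings.ndim)      # 2
print(readings.shape)     # (3, 2)
```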
Series
A Series is a one-dimensional, labelled data structure. A single column of a table is a typical example of a Series.
Its constructor takes the following parameters:
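To see the "column" idea in practice, the sketch below selects one column from a small DataFrame and checks what comes back. The table contents are invented for the example.

```python
import pandas as pd

# A small, made-up table with two columns.
scores = pd.DataFrame(
    {
        "student": ["Ana", "Ben", "Caro"],
        "marks": [82, 75, 91],
    }
)

# Selecting a single column returns a Series.
marks = scores["marks"]

print(isinstance(marks, pd.Series))  # True
print(marks.name)                    # marks
print(marks.tolist())                # [82, 75, 91]
```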
Series(data, index, dtype, name, copy)
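data: The values to store, such as a list, an array, a dict, or a single scalar value.
index: The labels for the data. They must be hashable and of the same length as the data, but they do not have to be unique. It takes the default values (0, 1, 2, 3, ...) if not included. If the data is a dict and index is "None", the dict keys become the index.
dtype: The data type of the Series output. If it is not given, Pandas infers it from the data.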
name: The name of the Series.
copy: Whether to copy the input data. It is "False" by default.
Every parameter is optional. Written out with its default values, the signature is Series(data=None, index=None, dtype=None, name=None, copy=None).
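The parameters described above can be tried out in a short sketch. The labels, values, and the name "price_usd" are illustrative only.

```python
import pandas as pd

# Pass data, index, dtype, and name explicitly.
prices = pd.Series(
    data=[1.5, 2.25, 0.75],
    index=["apple", "mango", "lime"],
    dtype="float64",
    name="price_usd",
)
print(prices["mango"])  # 2.25
print(prices.dtype)     # float64
print(prices.name)      # price_usd

# With no index given, the labels default to 0, 1, 2, ...
plain = pd.Series([10, 20, 30])
print(list(plain.index))  # [0, 1, 2]

# With dict data and no index, the dict keys become the index.
stock = pd.Series({"apple": 10, "mango": 4})
print(list(stock.index))  # ['apple', 'mango']
```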
Creating a Series
A Series is built by calling pandas.Series(), which becomes pd.Series() when the library is imported under the "pd" alias.
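As a rough illustration, here are a few common kinds of input the constructor accepts. NumPy is imported here only to build a sample array, and the values are arbitrary.

```python
import numpy as np
import pandas as pd

# An empty Series; giving a dtype keeps the result explicit.
empty = pd.Series(dtype="float64")

# From a Python list.
from_list = pd.Series([3, 1, 4])

# From a NumPy array.
from_array = pd.Series(np.array([0.1, 0.2, 0.3]))

# From a scalar: the value is repeated once for every index label.
from_scalar = pd.Series(7, index=["a", "b", "c"])

print(len(empty))            # 0
print(from_list.tolist())    # [3, 1, 4]
print(len(from_array))       # 3
print(from_scalar.tolist())  # [7, 7, 7]
```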