Table of contents:
- Learning python
- Pandas datatypes
- Importing data
- Exporting data
- Describing data
- Viewing and selecting data
- Boolean operators
- Final thoughts
Learning Python is very easy, and if you have any experience with another programming language you will certainly pick it up quickly. I won't cover it myself, simply because you can find all you need here. Or, if you want to go overboard, Learn Python the Hard Way.
What we will use here are mostly lists and matrices, and nothing too difficult. This is just an overview of the various ways to display data in pandas, so don't be afraid: we are all newbies here.
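There are two main data types in pandas. The first is a Series, the pandas name for a one-dimensional list of values. The second is a DataFrame, a two-dimensional table whose columns are Series.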
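To make the two types concrete, here is a minimal sketch. The column names and values are made-up sample data, not the data set used in this post.

```python
import pandas as pd

# A Series is a one-dimensional, labeled list of values.
colours = pd.Series(["red", "green", "blue"])

# A DataFrame is a two-dimensional table; each column is a Series.
# The column names and values here are invented for illustration.
cars = pd.DataFrame({
    "brand": ["Fiat", "Ford", "Audi"],
    "colour": colours,
})

print(colours)
print(cars)
```

Pulling a single column back out of the DataFrame, for example `cars["colour"]`, gives a Series again.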
But typing data in by hand is tedious and inefficient. We'll usually already have all the data, sample or not, and all we have to do is import it.
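A minimal sketch of an import, assuming the data is in CSV format. To keep the example runnable on its own, it reads from an in-memory string instead of a file; the file name `cars.csv` in the comment and the sample values are hypothetical.

```python
import io

import pandas as pd

# Stand-in for a real file on disk. With an actual file you would write
# something like pd.read_csv("cars.csv") (hypothetical file name).
csv_text = (
    "brand,colour,price\n"
    "Fiat,red,9000\n"
    "Ford,green,12000\n"
    "Audi,blue,30000\n"
)

# read_csv turns the first line into column names and the rest into rows.
cars = pd.read_csv(io.StringIO(csv_text))
print(cars)
```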
Once we have worked with our data we may want to export it, and doing so is very simple.
But there is a problem: the exported file has an extra column that holds the row index, as if it were one of the DataFrame's columns.
To remedy this we can modify the export call by adding the parameter
index = False
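Here is a small sketch of the difference, assuming a CSV export with `to_csv`. The sample data is made up, and the export goes to a string rather than a file so the result can be printed directly.

```python
import io

import pandas as pd

cars = pd.DataFrame({"brand": ["Fiat", "Ford"], "price": [9000, 12000]})

# Default export: the row index is written as an unnamed first column.
# Passing a path instead, e.g. cars.to_csv("cars_export.csv"), writes a file
# (the file name is hypothetical).
with_index = cars.to_csv()

# index=False leaves the row index out of the output.
without_index = cars.to_csv(index=False)

print(with_index)
print(without_index)

# Reading the default export back shows the extra column.
reloaded_columns = pd.read_csv(io.StringIO(with_index)).columns.tolist()
print(reloaded_columns)
```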
Before describing data we need to know one small detail: the difference between a function and an attribute.
A function is a piece of code that may or may not take parameters and that can change the data; we call it with () at the end.
An attribute looks similar but is only used to view information and has no brackets, even if the underlying operations can be the same as in a normal function.
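As a quick illustration of the distinction, here is a sketch on made-up sample data. `shape` and `dtypes` are attributes, `head` is a function.

```python
import pandas as pd

cars = pd.DataFrame({"brand": ["Fiat", "Ford"], "price": [9000, 12000]})

# Attributes: no parentheses, they just report something about the object.
print(cars.shape)   # number of rows and columns
print(cars.dtypes)  # data type of each column

# Functions: called with parentheses, optionally with parameters.
print(cars.head(1))
```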
Using this attribute we can notice two things:
First, there is an error in the sample: the names of the columns are wrapped in quotation marks.
Second, now we know the types of data we are using.
Note: I fixed the quotation marks by hand, which was only practical because this data set has just 10 rows. In a dataset with thousands of rows, this kind of error can be critical.
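For a larger dataset, one possible way to avoid the manual fix is to clean the column names in code. This is only a sketch on made-up data that reproduces the same kind of mistake, and it handles the column names only, not the values.

```python
import pandas as pd

# Sample data with quotation marks stuck inside the column names
# (names and values are invented for illustration).
cars = pd.DataFrame({'"brand"': ["Fiat", "Ford"], '"price"': [9000, 12000]})
print(cars.dtypes)

# Strip the stray quotation marks from every column name at once.
cars.columns = cars.columns.str.strip('"')
print(cars.columns.tolist())
```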
This function shows us a summary of the DataFrame, and it can show more or less information depending on the options you pass; for the full list of options, see the docs.
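The original snippet isn't shown here, so as an assumption this sketch uses `describe()` and `info()`, two common pandas summary functions, on made-up sample data.

```python
import pandas as pd

cars = pd.DataFrame({
    "brand": ["Fiat", "Ford", "Audi"],
    "price": [9000, 12000, 30000],
})

# describe() summarises the numeric columns (count, mean, std, min, ...).
summary = cars.describe()
print(summary)

# info() prints the column names, non-null counts and dtypes.
cars.info()
```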
Pandas offers a lot of useful functions to display and select data; the most useful are head and tail.
Calling the head function on our DataFrame shows us the first 5 rows. It also accepts a number, so that we can view the first n rows of what we are working on.
This is handy for a quick look at a big DataFrame with thousands of rows: just viewing the first 3, 5 or 7 gives us an idea of what we are going to work on.
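A small sketch of both functions on a made-up DataFrame of 20 rows:

```python
import pandas as pd

numbers = pd.DataFrame({"n": range(1, 21)})

# head() with no argument returns the first 5 rows.
first_five = numbers.head()

# Passing a number returns that many rows instead.
first_three = numbers.head(3)

# tail() works the same way, but from the end.
last_five = numbers.tail()

print(first_five)
print(first_three)
print(last_five)
```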
To select rows we can use loc, which selects by label, and iloc, which selects by integer position. Both support slicing, similar to slicing a Python string with square brackets: a slice accepts a maximum of three parameters, [start:stop:step].
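The slicing behaviour can be sketched like this, on a made-up DataFrame with the default integer index. One difference worth noticing: iloc excludes the stop position, like plain Python slicing, while loc includes the stop label.

```python
import pandas as pd

letters = pd.DataFrame({"letter": list("abcdefgh")})

# Plain Python slicing for comparison: start 0, stop 6, step 2.
python_slice = "abcdefgh"[0:6:2]
print(python_slice)

# iloc slices by position; the stop position is excluded.
by_position = letters.iloc[0:6:2]
print(by_position)

# loc slices by label; the stop label is included.
by_label = letters.loc[0:6:2]
print(by_label)
```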
To see specific columns we can type two commands:
the brackets notation
or the dot notation
Both behave the same way, so it's just a matter of preference, but they are important because we can combine them with boolean operators to display only certain rows.
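The two notations side by side, on made-up sample data:

```python
import pandas as pd

cars = pd.DataFrame({
    "brand": ["Fiat", "Ford", "Audi"],
    "price": [9000, 12000, 30000],
})

# Brackets notation: the column name goes in as a string.
with_brackets = cars["price"]

# Dot notation: the column name is used like an attribute.
with_dot = cars.price

print(with_brackets)
print(with_brackets.equals(with_dot))
```

One caveat: the dot notation only works when the column name is a valid Python name (no spaces, for instance) and doesn't clash with an existing DataFrame function or attribute. The brackets notation always works.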
This will work with any boolean operator and will let us search for a row, or a group of rows, with a specific feature.
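A minimal sketch of this kind of row selection, again on made-up sample data. The comparison produces a Series of True/False values, and passing that to the DataFrame keeps only the rows marked True.

```python
import pandas as pd

cars = pd.DataFrame({
    "brand": ["Fiat", "Ford", "Audi"],
    "price": [9000, 12000, 30000],
})

# The comparison returns one True/False value per row.
mask = cars["price"] > 10000
print(mask)

# Using the mask inside the brackets keeps only the matching rows.
expensive = cars[mask]
print(expensive)

# The same idea with dot notation and an equality check.
fords = cars[cars.brand == "Ford"]
print(fords)

# Conditions can be combined with & (and) or | (or); each one needs
# its own parentheses.
mid_range = cars[(cars["price"] > 10000) & (cars["price"] < 20000)]
print(mid_range)
```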
Next week I'll write the second part on Python and pandas, and after that we'll start looking at NumPy.
See you next time!