DEV Community

Neha Gupta


Day 3 of Machine Learning

Hey reader 👋 Hope you are doing well!
In the last post we learned about the types of variables. In this post we will learn more about our dataset and look at some basic Python libraries.
So let's get started🔥.

What is a DataFrame?

A DataFrame is a data structure organized into rows and columns, similar to a database table or an Excel spreadsheet.
Generally we store our data in a DataFrame and run our analysis on it.

How is a DataFrame created?

In the real world, a Pandas DataFrame is usually created by loading a dataset from existing storage, such as a SQL database, a CSV file, or an Excel file.

Pandas Library

Pandas is a Python library used for working with data sets.
It has functions for analyzing, cleaning, exploring, and manipulating data.

import pandas as pd
This statement imports the pandas library under the alias pd.

df=pd.read_csv('/kaggle/input/customer-purchases-behaviour-dataset/customer_data.csv')
In this line of code we load our CSV file with pd.read_csv() and store the result in a DataFrame named df.
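The Kaggle path above only exists inside that Kaggle notebook, so here is a small sketch you can run anywhere. It assumes some made-up CSV text (the column names and values are hypothetical, not taken from customer_data.csv) and reads it through io.StringIO, since pd.read_csv() accepts a file-like object as well as a path.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk
csv_text = """age,income,purchased
25,40000,0
32,52000,1
47,81000,1
"""

# read_csv accepts a file path or a file-like object such as StringIO
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first rows of the DataFrame
print(df.shape)    # (number of rows, number of columns)
print(df.dtypes)   # data type that pandas inferred for each column
```

Calling df.head() and df.shape right after loading is a quick way to check that the file was read the way you expected.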

We can also create our own DataFrame from a dictionary of lists or NumPy arrays.
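As a minimal sketch of this, the snippet below builds DataFrames from a dictionary of lists and from a dictionary of NumPy arrays, and (as an extra, not mentioned above) directly from a 2-D array. All names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# From a dictionary of lists: each key becomes a column name
df_from_lists = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen"],
    "age": [25, 32, 47],
})

# From a dictionary of NumPy arrays: works the same way
df_from_arrays = pd.DataFrame({
    "x": np.array([1.0, 2.0, 3.0]),
    "y": np.array([10.0, 20.0, 30.0]),
})

# From a 2-D NumPy array: column names are passed separately
df_from_2d = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["a", "b"])

print(df_from_lists)
print(df_from_arrays)
print(df_from_2d)
```

In the dictionary forms, every list or array must have the same length, because each one fills a whole column.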

Now we know what a DataFrame is. Let's look at some of the basic libraries that we are going to use.

NumPy

NumPy is a powerful library for numerical computing in Python, offering efficient operations on arrays and matrices, a vast collection of mathematical functions, and capabilities for advanced array manipulation. It forms the backbone of many other scientific and data analysis libraries in the Python ecosystem.
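To give a small taste of these array operations, here is a sketch with two tiny 2x2 arrays (the values are arbitrary).

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise_sum = a + b        # adds matching entries
matrix_product = a @ b         # matrix multiplication
column_means = a.mean(axis=0)  # mean of each column

print(elementwise_sum)
print(matrix_product)
print(column_means)
```

Notice that no loops are needed: NumPy applies the operation to the whole array at once.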

Matplotlib

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. It is widely used in data science, machine learning, and scientific research to produce high-quality graphs and plots. Matplotlib provides a flexible and extensible framework for generating a wide range of plots and charts, from simple line plots to complex multidimensional visualizations.
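A simple line plot might look like the sketch below. The data and the output file name line_plot.png are made up for illustration; in a notebook you would typically just let the figure display instead of saving it.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [value ** 2 for value in x]

# Create a figure with a single set of axes and draw the line on it
fig, ax = plt.subplots()
ax.plot(x, y, marker="o")
ax.set_xlabel("x")
ax.set_ylabel("x squared")
ax.set_title("A simple line plot")

# Save the figure to an image file (hypothetical file name)
fig.savefig("line_plot.png")
```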

Seaborn

Seaborn is a Python data visualization library built on top of Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. Seaborn simplifies the process of creating complex visualizations and makes it easy to produce aesthetically pleasing charts with minimal code. It is particularly well-suited for working with data frames and arrays, which makes it a popular choice for data analysis and exploration.
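Because Seaborn works directly with DataFrames, a plot can be described by column names. The sketch below assumes a small made-up DataFrame (the columns age, income and purchased are hypothetical) and a hypothetical output file name.

```python
import pandas as pd
import seaborn as sns

# Hypothetical data, only for illustration
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38],
    "income": [40000, 52000, 81000, 90000, 61000],
    "purchased": [0, 1, 1, 1, 0],
})

# Columns are referenced by name; hue colors the points by category
ax = sns.scatterplot(data=df, x="age", y="income", hue="purchased")
ax.set_title("Income vs. age")

# scatterplot returns a Matplotlib Axes, so the figure can be saved as usual
ax.figure.savefig("scatter_plot.png")
```

Since Seaborn is built on top of Matplotlib, the returned object is a regular Matplotlib Axes and can be customized with the same methods.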

We will see these libraries in use in the upcoming posts.
I hope this post was clear. If you have any questions, please leave a comment and I'll do my best to answer them.
Please leave a reaction and don't forget to follow me. 💙
