Shubham Tiwari

Posted on Dec 27, 2021 • Edited on Jan 7, 2022

Python Data science Libraries for beginners

#python #productivity #programming #datascience

Hello guys, today I am going to show you some libraries used for data science in Python. I am going to discuss only 5 of them, which are commonly used at the beginner level.

Let's get started...

Introduction
Python has rapidly become the go-to language in the data science space, and it is among the first things recruiters look for in a data scientist's skill set. It has consistently ranked at the top of global data science surveys, and its popularity keeps increasing!

I am categorising these based on the work they do:

MATHS -

1. Numpy -

NumPy is one of the most essential Python libraries for scientific computing, and it is used heavily in Machine Learning and Deep Learning applications.
NumPy stands for NUMerical PYthon.
NumPy provides support for large multidimensional array objects and various tools to work with them.
NumPy contains a large number of mathematical operations: standard trigonometric functions, functions for arithmetic operations, functions for handling complex numbers, etc.
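To make these points a bit more concrete, here is a small sketch of a 2-dimensional array, element-wise arithmetic and a complex-number helper. The array values are just arbitrary examples I picked.

```python
import numpy as np

# A 2-dimensional array with 2 rows and 3 columns
m = np.array([[1, 2, 3],
              [4, 5, 6]])
print(m.shape)          # (2, 3)

# Arithmetic is applied to every element, no loop needed
print(m * 2)            # every element doubled
print(m.sum(axis=0))    # column sums: 5, 7 and 9

# Complex numbers: np.abs() returns the magnitude of each value
z = np.array([3 + 4j, 1j])
print(np.abs(z))        # magnitudes 5.0 and 1.0
```

The same operations work unchanged on much larger arrays, which is where NumPy's speed really shows.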

Installation -

pip install numpy

Example -

import numpy as np
a = np.array([0,30,45,60,90])

# note: np.sin() treats these values as angles in radians, not degrees
sin = np.sin(a)
print("Numpy Array values are: ",a)
print("Calculating the sin values using np.sin() function :",sin)

OUTPUT -

Numpy Array values are:  [ 0 30 45 60 90]
Calculating the sin values using np.sin() function : [ 0.  -0.98803162  0.85090352 -0.30481062  0.89399666]
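If the values in the array are meant to be angles in degrees (0°, 30°, 45°, ...), they need to be converted to radians first. A minimal sketch using np.deg2rad(), with the variable names chosen just for this example:

```python
import numpy as np

degrees = np.array([0, 30, 45, 60, 90])

# Convert degrees to radians before calling np.sin()
radians = np.deg2rad(degrees)
sin_values = np.sin(radians)

# Round to hide tiny floating-point errors; roughly 0, 0.5, 0.7071, 0.866 and 1.0
print(np.round(sin_values, 4))
```

NumPy also provides np.rad2deg() for the opposite conversion.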

Documentation - https://numpy.org/

2. Scipy -

SciPy (Scientific Python) is the go-to library for scientific computing and is used heavily in mathematics, science, and engineering. It covers much of the same ground as MATLAB, which is a paid tool.
As the documentation says, SciPy “provides many user-friendly and efficient numerical routines such as routines for numerical integration and optimization.”
It is built upon the NumPy library.
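Here is a small sketch of the two routines the documentation mentions, numerical integration and optimization. The functions being integrated and minimised are simple examples chosen so the exact answers are known.

```python
import numpy as np
from scipy import integrate, optimize

# Numerical integration: area under sin(x) from 0 to pi (exact answer is 2)
area, error_estimate = integrate.quad(np.sin, 0, np.pi)
print(area)

# Optimization: find the x that minimises (x - 2)^2 (exact answer is x = 2)
result = optimize.minimize_scalar(lambda x: (x - 2) ** 2)
print(result.x)
```

Both results come back as floating-point approximations, so expect values extremely close to 2 rather than exactly 2.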

Installation -

pip install scipy

Example -

from scipy import constants

#print the value of pi
print(constants.pi)

#kibi and mebi are binary prefixes: the number of bytes in
#1 kibibyte (2**10) and 1 mebibyte (2**20)
print(constants.kibi)
print(2 * constants.kibi) #number of bytes in 2 kibibytes
print(constants.mebi)

#prints the number of seconds in 1 minute
print(constants.minute)      #60.0

OUTPUT -

3.141592653589793
1024
2048
1048576
60.0
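The binary prefixes (kibi, mebi) are easy to confuse with the decimal SI prefixes (kilo, mega). A small sketch comparing the two, plus a couple of other time constants from the same module:

```python
from scipy import constants

# Decimal (SI) prefix vs binary prefix
print(constants.kilo)   # 1000.0
print(constants.kibi)   # 1024

# More time units, all expressed in seconds
print(constants.hour)   # 3600.0
print(constants.day)    # 86400.0
```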

Documentation - https://scipy.github.io/devdocs/getting_started.html

For Beginners - https://www.w3schools.com/python/scipy/index.php

Data Exploration and Visualization

3. Pandas -

From data exploration to visualization to analysis, Pandas is the almighty library you must master!
Pandas is an open-source package that helps you perform data analysis and data manipulation in Python. It also provides fast and flexible data structures that make it easy to work with relational and structured data.

Installation -

pip install pandas

Example -

In this example we will create a DataFrame.
A DataFrame is a 2-dimensional data structure that can store data of different types (including characters, integers, floating point values, categorical data and more) in columns. It is similar to a spreadsheet or a SQL table.

import pandas as pd

df = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)

print(df)

OUTPUT -

                       Name  Age     Sex
0   Braund, Mr. Owen Harris   22    male
1  Allen, Mr. William Henry   35    male
2  Bonnell, Miss. Elizabeth   58  female
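Once the DataFrame exists, exploring it takes only a line or two. A minimal sketch, rebuilding the same DataFrame so it runs on its own, that checks the column types, computes an average, filters rows and groups by a column:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)

# Each column has its own data type
print(df.dtypes)

# Select one column and compute a summary statistic (average age)
print(df["Age"].mean())

# Keep only the rows that match a condition
older = df[df["Age"] > 30]
print(older)

# Group rows by a column and aggregate each group
print(df.groupby("Sex")["Age"].mean())
```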

Documentation - https://pandas.pydata.org/docs/getting_started/install.html

4. Matplotlib -

Matplotlib is the most popular library for data exploration and visualization in the Python ecosystem. Many other plotting tools, such as seaborn and the plotting functions in pandas, are built on top of it.
Matplotlib offers a huge range of charts, from histograms to scatter plots, along with colors, themes, palettes, and other options to customize and personalize our plots.
Whether you're performing data exploration for a machine learning project or building a report for stakeholders, Matplotlib is one of the handiest libraries around!
The best part is that you can save the charts as images in many different formats, such as .png, .jpg, etc.
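Here is a small sketch that draws a histogram and a scatter plot side by side and saves them to an image file. The random data and the file name (charts.png in the system temp folder) are just assumptions for the example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: draw to files, no window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=500)

# Two charts side by side: a histogram and a scatter plot
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(data, bins=20, color="tab:blue")
ax1.set_title("Histogram")
ax2.scatter(data[:-1], data[1:], s=5, color="tab:orange")
ax2.set_title("Scatter plot")

# The image format is picked from the file extension (.png, .jpg, .pdf, ...)
out_path = os.path.join(tempfile.gettempdir(), "charts.png")
fig.savefig(out_path)
plt.close(fig)
print("Saved to", out_path)
```

The Agg backend is only there so the script works without a display; in a notebook you can drop that line and call plt.show() as well.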

Installation -

pip install matplotlib

Example 1 -

import matplotlib.pyplot as plt
import numpy as np

xpoints = np.array([1, 2, 6, 8])
ypoints = np.array([3, 8, 1, 10])

plt.plot(xpoints, ypoints)
plt.show()

OUTPUT - (image: a line chart connecting the points (1, 3), (2, 8), (6, 1) and (8, 10))

Example 2 -

import matplotlib.pyplot as plt
import numpy as np

y = np.array([35, 25, 25, 15])

plt.pie(y)
plt.show()

OUTPUT - (image: a pie chart with four wedges sized 35%, 25%, 25% and 15%)

Documentation - https://matplotlib.org/

Machine Learning -

5. Scikit-learn -

Scikit-learn is an open source machine learning library that supports supervised and unsupervised learning. It also provides various tools for model fitting, data preprocessing, model selection, model evaluation, and many other utilities.

Installation -

pip install scikit-learn

Example -

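import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data that follows y = 1*x0 + 2*x1 + 3 exactly
X = np.array([[1,1],[1,2],[2,2],[2,3]])
y = np.dot(X, np.array([1,2])) + 3

# Fit a linear regression model
# (the old normalize=True option no longer exists in recent scikit-learn versions)
regr = LinearRegression(
    fit_intercept = True, copy_X = True, n_jobs = 2
).fit(X,y)

print(regr.predict(np.array([[3,5]])))  # prediction for a new point, about 16
print(regr.score(X,y))                  # R^2 score on the training data, about 1.0
print(regr.coef_)                       # learned weights, about [1. 2.]
print(regr.intercept_)                  # learned intercept, about 3.0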
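The normalize=True option from the original example was deprecated in scikit-learn 1.0 and removed in 1.2; the replacement suggested in its deprecation notice is to scale the features inside a pipeline. Here is a minimal sketch that also touches the preprocessing, model selection and evaluation tools mentioned above. The synthetic data, noise level and 25% test split are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: y = 1*x0 + 2*x1 + 3 plus a little noise
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 2))
y = X @ np.array([1.0, 2.0]) + 3 + rng.normal(scale=0.1, size=100)

# Model selection: hold out 25% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Preprocessing + model in one object; the scaler is fitted on training data only
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)

# Model evaluation on unseen rows (R^2 close to 1.0 means a good fit)
score = r2_score(y_test, model.predict(X_test))
print("Test R^2:", score)
```

Keeping the scaler inside the pipeline means it only ever learns from the training rows, so the test score is not influenced by the test data.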