DEV Community

Mysterio

Python Data Science Libraries for Beginners

Hello guys, today I am going to show you some libraries used for data science in Python. I am going to discuss only 5 of them, which are commonly used at the beginner level.

Let's get started...

Introduction
Python has rapidly become the go-to language in the data science space and is among the first things recruiters look for in a data scientist's skill set. It has consistently ranked at the top of global data science surveys, and its popularity keeps growing.

I am categorising these based on the work they do.

MATHS -

1. NumPy -

  • NumPy is one of the most essential Python libraries for scientific computing, and it is used heavily in machine learning and deep learning applications.
  • NumPy stands for NUMerical PYthon.
  • NumPy provides support for large multidimensional array objects and various tools to work with them.
  • NumPy contains a large number of mathematical operations: standard trigonometric functions, arithmetic functions, complex-number handling, and more.
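
To make the points above a bit more concrete, here is a minimal sketch that builds a two-dimensional array and applies a few of the operations mentioned. The array values and variable names are made up for this illustration.

```python
import numpy as np

# A 2 x 3 array (2 rows, 3 columns); the values are arbitrary
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

print(grid.shape)        # (2, 3)
print(grid.ndim)         # 2

# Arithmetic is applied element by element
doubled = grid * 2
print(doubled)

# Aggregations can run over the whole array or along one axis
print(grid.sum())        # 21
column_sums = grid.sum(axis=0)
print(column_sums)       # [5 7 9]

# Complex numbers are supported as well
z = np.array([3 + 4j])
print(np.abs(z))         # [5.]
```

Working on whole arrays like this, instead of looping over elements, is usually the more idiomatic way to use NumPy.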

Installation -

pip install numpy

Example -

import numpy as np

# Angles in degrees
a = np.array([0, 30, 45, 60, 90])

# np.sin() expects radians, so convert the angles first
sin = np.sin(np.deg2rad(a))
print("Numpy Array values are: ", a)
print("Calculating the sin values using np.sin() function :", sin)

OUTPUT -

Numpy Array values are:  [ 0 30 45 60 90]
Calculating the sin values using np.sin() function : [0.         0.5        0.70710678 0.8660254  1.        ]

Documentation - https://numpy.org/

2. SciPy -

  • SciPy (Scientific Python) is the go-to library for scientific computing and is used heavily in mathematics, science, and engineering. It is a free alternative to much of what Matlab, a paid tool, offers.

  • As the documentation says, SciPy "provides many user-friendly and efficient numerical routines such as routines for numerical integration and optimization."

  • It is built upon the NumPy library.
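
The numerical integration and optimization routines mentioned above can be sketched roughly as follows. The integrand and the `parabola` function are made-up examples chosen because their exact answers are known.

```python
import numpy as np
from scipy import integrate, optimize

# Numerical integration: area under sin(x) between 0 and pi (exact answer is 2)
area, abs_error = integrate.quad(np.sin, 0, np.pi)
print(area)


# Optimization: find the x that minimizes (x - 3)^2 + 1 (exact answer is x = 3)
def parabola(x):
    return (x - 3) ** 2 + 1


result = optimize.minimize_scalar(parabola)
print(result.x, result.fun)
```

Note that `np.sin` is passed straight into `integrate.quad`, which shows how SciPy works together with NumPy.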

Installation -

pip install scipy

Example -

from scipy import constants

# Print the value of pi
print(constants.pi)

# Binary prefixes: kibi is 2**10 and mebi is 2**20, i.e. the number of
# bytes in 1 kibibyte and in 1 mebibyte
print(constants.kibi)
print(2 * constants.kibi)  # number of bytes in 2 kibibytes
print(constants.mebi)

# Print the number of seconds in 1 minute
print(constants.minute)      # 60.0

OUTPUT -

3.141592653589793
1024
2048
1048576
60.0

Documentation - https://scipy.github.io/devdocs/getting_started.html

For Beginners - https://www.w3schools.com/python/scipy/index.php

Data Exploration and Visualization -

3. Pandas -

  • From data exploration to visualization to analysis, Pandas is the almighty library you must master!

  • Pandas is an open-source package that helps you perform data analysis and data manipulation in Python. It also provides fast and flexible data structures that make it easy to work with relational and structured data.

Installation -

pip install pandas

Example -

  • In this example we will create a DataFrame.
  • A DataFrame is a 2-dimensional data structure that can store data of different types (including characters, integers, floating point values, categorical data and more) in columns. It is similar to a spreadsheet or a SQL table.

import pandas as pd

df = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)

print(df)

OUTPUT -

                       Name  Age     Sex
0   Braund, Mr. Owen Harris   22    male
1  Allen, Mr. William Henry   35    male
2  Bonnell, Miss. Elizabeth   58  female
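
Once a DataFrame exists, exploring and manipulating it usually takes only a line or two. The sketch below rebuilds the same DataFrame as the example above and shows a few common operations. The variable names (`ages`, `over_30`, `mean_age_by_sex`) are just illustrative.

```python
import pandas as pd

# Same data as in the DataFrame example above
df = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)

# Select a single column (this returns a Series)
ages = df["Age"]
print(ages.max())            # 58

# Keep only the rows that match a condition
over_30 = df[df["Age"] > 30]
print(over_30)

# Summary statistics for the numeric columns
print(df.describe())

# Average age per group
mean_age_by_sex = df.groupby("Sex")["Age"].mean()
print(mean_age_by_sex)
```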

Documentation - https://pandas.pydata.org/docs/getting_started/install.html

4. Matplotlib -

  • Matplotlib is the most popular library for exploration and data visualization in the Python ecosystem. Several other plotting libraries, such as seaborn, are built on top of it.

  • Matplotlib offers a wide range of charts and customizations. From histograms to scatterplots, it provides an array of colors, themes, palettes, and other options to customize and personalize our plots.

  • Matplotlib is useful whether you're performing data exploration for a machine learning project or building a report for stakeholders. It is surely the handiest library!

  • The best part is that you can save the charts as images in many different formats such as PNG and JPG.
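
As a rough sketch of the customization and saving features described above, the snippet below styles a simple line plot and writes it to disk. The data, the labels, and the file names `squares.png` and `squares.pdf` are made up for this illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Customize the color, line style and marker of the line
ax.plot([1, 2, 3, 4], [1, 4, 9, 16], color="green", linestyle="--", marker="o")
ax.set_title("Squares")
ax.set_xlabel("x")
ax.set_ylabel("x squared")

# The file extension decides the output format
fig.savefig("squares.png")
fig.savefig("squares.pdf")
plt.close(fig)
```

If you want the plot to appear in a window instead, drop the `matplotlib.use("Agg")` line and call `plt.show()` as in the examples below.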

Installation -

pip install matplotlib

Example 1 -

import matplotlib.pyplot as plt
import numpy as np

xpoints = np.array([1, 2, 6, 8])
ypoints = np.array([3, 8, 1, 10])

plt.plot(xpoints, ypoints)
plt.show()

OUTPUT -

(Image: a line plot connecting the points (1, 3), (2, 8), (6, 1) and (8, 10))

Example 2 -

import matplotlib.pyplot as plt
import numpy as np

y = np.array([35, 25, 25, 15])

plt.pie(y)
plt.show() 

OUTPUT -

(Image: a pie chart with four wedges sized 35%, 25%, 25% and 15%)

Documentation - https://matplotlib.org/

Machine Learning -

5. Scikit-learn -

Scikit-learn is an open source machine learning library that supports supervised and unsupervised learning. It also provides various tools for model fitting, data preprocessing, model selection, model evaluation, and many other utilities.
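
The pieces listed above (preprocessing, model fitting, model selection and model evaluation) can be combined roughly like this. This is only a sketch: the choice of the built-in iris dataset, the logistic regression model, and the 25% test split are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small built-in dataset: 150 flowers, 4 features, 3 classes
X, y = load_iris(return_X_y=True)

# Model selection utility: hold out 25% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Preprocessing and model fitting chained in one pipeline
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
model.fit(X_train, y_train)

# Model evaluation on the held-out samples
accuracy = accuracy_score(y_test, model.predict(X_test))
print(accuracy)
```

Evaluating on held-out samples gives a fairer picture than scoring the model on the same data it was trained on.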

Installation -

pip install scikit-learn

Example -

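import numpy as np
from sklearn.linear_model import LinearRegression

# Training data generated from y = 1*x1 + 2*x2 + 3
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3

# The `normalize` argument was removed in scikit-learn 1.2, so it is omitted here
regr = LinearRegression(fit_intercept=True, copy_X=True, n_jobs=2).fit(X, y)

print(regr.predict(np.array([[3, 5]])))  # prediction for a new sample
print(regr.score(X, y))                  # R^2 score on the training data
print(regr.coef_)                        # fitted coefficients
print(regr.intercept_)                   # fitted intercept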

We have performed linear regression with scikit-learn.

Documentation - https://scikit-learn.org/stable/user_guide.html

* That's it, these are 5 commonly used data science libraries at the beginner and intermediate levels.

You can support me with a donation at the link below. Thank you!

--> https://www.buymeacoffee.com/waaduheck <--

THANK YOU FOR READING THIS POST. IF YOU FIND ANY MISTAKE OR WANT TO GIVE ANY SUGGESTION, PLEASE MENTION IT IN THE COMMENT SECTION.

Also check out these posts:

  1. https://dev.to/shubhamtiwari909/animation-with-react-spring-3k22

  2. https://dev.to/shubhamtiwari909/text-to-speech-in-reactjs-52ml

  3. https://dev.to/shubhamtiwari909/best-vs-code-extensions-for-web-development-2lk3
