The objective of this article is to give a brief introduction to NumPy.
What is NumPy?
- NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
- NumPy is the fundamental package for scientific computing in Python. NumPy stands for Numerical Python.
What is the purpose of NumPy?
NumPy arrays facilitate advanced mathematical and other types of operations on large amounts of data. Typically, such operations are executed more efficiently and with less code than is possible using Python's built-in sequences.
The core of NumPy is written mostly in C, which is a large part of why it is fast.
Installing NumPy
- You can install NumPy using pip, and if you are using Anaconda then you can also install it using conda.
pip install numpy
conda install numpy
- After setting up NumPy, let's start exploring it.
- I am using Jupyter Notebook for NumPy and Python programming, but if you do not have Anaconda then you can use Visual Studio Code as well.
Import NumPy
import numpy as np
Array() Function
- This function converts a Python list (or another sequence, such as a tuple) into a NumPy array.
- It can also convert a nested Python list into a multi-dimensional NumPy array.
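As a minimal sketch of the conversion described above (the variable names are only illustrative), a flat list becomes a one-dimensional array and a nested list becomes a two-dimensional one:

```python
import numpy as np

# Convert a plain Python list into a one-dimensional NumPy array.
py_list = [1, 2, 3, 4]
arr_1d = np.array(py_list)

# Convert a nested list (a list of lists) into a two-dimensional array.
py_nested = [[1, 2, 3], [4, 5, 6]]
arr_2d = np.array(py_nested)

print(arr_1d.shape)  # (4,)
print(arr_2d.shape)  # (2, 3)
print(arr_2d.ndim)   # 2
```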
Arange Function
- When working with data, you will often come across use cases where you need to generate data.
- NumPy has an “arange()” function with which you can generate a range of values between two numbers. It takes the start, the end (which is not included in the result), and an optional step parameter.
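A short sketch of generating ranges this way; the specific start, end, and step values are arbitrary choices for illustration:

```python
import numpy as np

# Start at 0, stop before 10, step by 2.
evens = np.arange(0, 10, 2)
print(evens.tolist())  # [0, 2, 4, 6, 8]

# With a single argument, the range starts at 0 and uses a step of 1.
first_five = np.arange(5)
print(first_five.tolist())  # [0, 1, 2, 3, 4]
```

Note that the end value (10 in the first call) never appears in the result.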
Zeros and Ones Function
- You can also generate an array or matrix of zeroes or ones using NumPy.
- Both of these functions support n-dimensional arrays as well. You pass the shape as a tuple, for example (rows, columns) for a matrix.
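For illustration, the sketch below builds a one-dimensional array of zeros and a 2x3 matrix of ones; the shapes are arbitrary:

```python
import numpy as np

# A one-dimensional array of three zeros (floats by default).
zeros_1d = np.zeros(3)
print(zeros_1d.tolist())  # [0.0, 0.0, 0.0]

# A 2x3 matrix of ones; the shape is given as a (rows, columns) tuple.
ones_2d = np.ones((2, 3))
print(ones_2d.shape)  # (2, 3)
```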
Identity Matrix
- You can also generate an identity matrix using a built-in NumPy function called “eye”.
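A minimal example, assuming a 3x3 identity matrix is wanted:

```python
import numpy as np

# np.eye(n) returns an n x n matrix with ones on the diagonal and zeros elsewhere.
identity = np.eye(3)
print(identity.tolist())
# [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```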
Linspace Function
- NumPy has a linspace function that generates evenly spaced points between two numbers. It takes three arguments: the starting point, the ending point (included by default), and the number of points to generate.
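The sketch below asks for five evenly spaced points between 0 and 1; the numbers are chosen only to keep the output easy to read:

```python
import numpy as np

# Five evenly spaced points from 0 to 1, with both endpoints included.
points = np.linspace(0, 1, 5)
print(points.tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

The third argument is a count of points, so the spacing between them is worked out by NumPy rather than passed in directly.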
Random Number Generation
- When you are working on machine learning problems, you will often need to generate random numbers. NumPy has built-in functions for that as well.
- First, let's look at two major types of distributions.
Normal Distribution
- The normal distribution is a very important concept in statistics since it is seen in many natural phenomena. It is also called a “bell curve”.
Uniform Distribution
- If every value in the distribution has the same, constant probability, it is called a uniform distribution.
For example, a coin toss has a uniform distribution since the probability of getting either heads or tails in a coin toss is the same.
- To generate random numbers in a uniform distribution, use the rand() function from np.random.
- To generate random numbers in a normal distribution, use the randn() function from np.random.
- To generate random integers between a low and high value, use the randint() function from np.random.
- A seed value is used if you want your random numbers to be the same during each computation.
- Whenever you set the same seed number, you will always get the same array generated, without any change.
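The functions above can be sketched together as follows. The seed value 42 and the array sizes are arbitrary assumptions for illustration; the generated numbers themselves are not shown because they depend on the seed and the NumPy version.

```python
import numpy as np

# Fix the seed so that the numbers below are the same on every run.
np.random.seed(42)

# Three samples from a uniform distribution over [0, 1).
uniform = np.random.rand(3)

# A 2x3 array of samples from the standard normal distribution
# (mean 0, standard deviation 1).
normal = np.random.randn(2, 3)

# Five random integers from 1 (inclusive) up to 10 (exclusive).
integers = np.random.randint(1, 10, size=5)

# Resetting the same seed reproduces the same sequence of numbers.
np.random.seed(42)
uniform_again = np.random.rand(3)
print(np.array_equal(uniform, uniform_again))  # True
```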
Reshaping Arrays
- As a data scientist, you will often reshape data sets for different types of computations.
- To get the shape of an array, use the shape property.
- To reshape an array, use the reshape() function.
- Reshape only works if the total number of elements stays the same. You cannot reshape a 2x2 array (4 elements) into a 3x1 array (3 elements).
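The points above can be sketched as follows, using a small 2x2 array as in the text; the variable names are illustrative:

```python
import numpy as np

# Build a 2x2 array from the values 0..3.
square = np.arange(4).reshape(2, 2)
print(square.shape)  # (2, 2)

# Reshaping to 4x1 works because both shapes hold 4 elements.
column = square.reshape(4, 1)
print(column.shape)  # (4, 1)

# Reshaping to 3x1 fails because 4 elements cannot fill 3 slots.
try:
    square.reshape(3, 1)
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)  # True
```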
Slicing Data
- Let's look at fetching data from NumPy arrays. NumPy arrays work similarly to Python lists during fetch operations.
When you assign a slice of an array such as “myarr” to a variable such as “sliced”, changing the values in “sliced” also changes the original array. This is because the slice is a view that points to the original array's data, not a separate copy.
To make an independent section of an array, use the copy() function.
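A minimal sketch of this behaviour, reusing the names “myarr” and “sliced” from the text (the values are arbitrary, and “independent” is a name introduced here for illustration):

```python
import numpy as np

myarr = np.arange(10)

# A slice is a view: it shares memory with the original array.
sliced = myarr[0:5]
sliced[:] = 99
print(myarr.tolist())  # [99, 99, 99, 99, 99, 5, 6, 7, 8, 9]

# copy() creates an independent array, so the original is left untouched.
independent = myarr[5:8].copy()
independent[:] = 0
print(myarr.tolist())  # [99, 99, 99, 99, 99, 5, 6, 7, 8, 9]
```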
- Slicing multi-dimensional arrays works similarly to slicing one-dimensional arrays, with one index or slice per dimension.
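For example, with a small 3x3 array (the name “grid” and its values are made up for illustration), rows and columns can be selected by separating the indices with a comma:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

first_row = grid[0]        # the whole first row
second_col = grid[:, 1]    # every row, second column
top_right = grid[:2, 1:]   # first two rows, last two columns
element = grid[1, 2]       # row 1, column 2

print(first_row.tolist())   # [1, 2, 3]
print(second_col.tolist())  # [2, 5, 8]
print(top_right.tolist())   # [[2, 3], [5, 6]]
print(element)              # 6
```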
Array Computations
- NumPy is known for its speed when performing complex computations on large multi-dimensional arrays.
When dividing by zero, NumPy does not raise an exception. It issues a warning and stores NaN (not a number) for 0/0, or infinity for a non-zero value divided by zero.
There are also a few built-in computation functions available in NumPy to calculate values like mean, standard deviation, variance, and others.
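The sketch below shows element-wise arithmetic and what division by zero produces for floating-point arrays. The arrays are arbitrary, and np.errstate is used here only to silence the warning that NumPy would otherwise print:

```python
import numpy as np

a = np.array([1.0, 0.0, -1.0])
b = np.array([0.0, 0.0, 0.0])

# Arithmetic operators work element by element, with no explicit loop.
doubled = a * 2
print(doubled.tolist())  # [2.0, 0.0, -2.0]

# Suppress the RuntimeWarning for this block only.
with np.errstate(divide="ignore", invalid="ignore"):
    result = a / b

print(np.isinf(result[0]))  # True  (1.0 / 0.0 is inf)
print(np.isnan(result[1]))  # True  (0.0 / 0.0 is nan)
print(np.isinf(result[2]))  # True  (-1.0 / 0.0 is -inf)
```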
Sum — np.sum()
Square Root — np.sqrt()
Mean — np.mean()
Variance — np.var()
Standard Deviation — np.std()
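As a quick illustration of the functions listed above, applied to a small made-up array:

```python
import numpy as np

values = np.array([1.0, 4.0, 9.0, 16.0])

total = np.sum(values)      # 30.0
roots = np.sqrt(values)     # element-wise square root
average = np.mean(values)   # 7.5
variance = np.var(values)   # 32.25 (population variance by default)
std_dev = np.std(values)    # square root of the variance

print(total)            # 30.0
print(roots.tolist())   # [1.0, 2.0, 3.0, 4.0]
print(average)          # 7.5
print(variance)         # 32.25
```

By default np.var() and np.std() divide by the number of elements; passing ddof=1 gives the sample versions instead.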
While working with 2D arrays, you will often need to calculate a row-wise or column-wise sum, mean, variance, and so on.
You can use the optional axis parameter to choose the direction: axis=0 computes one result per column, and axis=1 computes one result per row.
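A small sketch of the axis parameter on a 2x3 array (the name “matrix” and its values are illustrative):

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

# axis=0 collapses the rows, giving one value per column.
col_sums = matrix.sum(axis=0)
print(col_sums.tolist())  # [5, 7, 9]

# axis=1 collapses the columns, giving one value per row.
row_sums = matrix.sum(axis=1)
print(row_sums.tolist())  # [6, 15]

row_means = matrix.mean(axis=1)
print(row_means.tolist())  # [2.0, 5.0]
```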
Conditional Operations
- You can also do conditional filtering with NumPy using the square bracket notation.
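A minimal sketch of conditional filtering, assuming a small integer array; a comparison produces a boolean mask, and indexing with that mask keeps only the matching elements:

```python
import numpy as np

data = np.array([3, 8, 1, 10, 5])

# The comparison is applied element by element and yields booleans.
mask = data > 4
print(mask.tolist())  # [False, True, False, True, True]

# Indexing with the mask keeps the elements where the mask is True.
filtered = data[mask]
print(filtered.tolist())  # [8, 10, 5]

# Combine conditions with & (and) or | (or), wrapping each in parentheses.
in_range = data[(data > 2) & (data < 9)]
print(in_range.tolist())  # [3, 8, 5]
```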
Conclusion
The NumPy package has a lot more capabilities, especially when working with higher-dimensional data. NumPy is a fast, easy-to-use, and robust library for working with and manipulating data.
You can find the complete Jupyter notebook with the contents of this post.
There is a lot more functionality that I have not covered; the features above are the ones used most often. If you are interested in learning more about NumPy, I would recommend the official NumPy documentation.
Thank you very much for reading this post.