DEV Community

Mani Sai Prasad

Data in NumPy

Python is excellent, but sometimes it can be slow. Happily, it lets us use libraries that run faster code written in languages like C. NumPy is one such library: it provides fast alternatives to math operations in Python and is designed to work efficiently with groups of numbers, like tensors.

Importing NumPy

When importing the NumPy library, the most common convention is to alias it as np:

import numpy as np

Data Types and Shapes

The most common way to work with numbers in NumPy is through ndarray objects. They are similar to Python lists, but can have any number of dimensions. ndarray also supports fast math operations, which is what we want.
Since an ndarray can have any number of dimensions, we can use it to represent any of these kinds of data:

  • scalars
  • vectors
  • matrices
  • or tensors

Scalars

A scalar is a quantity described by a single element.

NumPy lets us specify signed and unsigned types, as well as different sizes. So instead of Python's single int type, we have types like uint8, int8, uint16, int16, and so on.

Every object we make (vectors, matrices, tensors) eventually stores scalars. When we create a NumPy array, we can specify the type, but every item in the array must have the same type.
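
As a small sketch of choosing a type (the variable names small and mixed are only for illustration), the dtype argument of np.array sets the element type, and mixed input is converted to one common type:

```python
import numpy as np

# Ask for 8-bit unsigned integers explicitly.
small = np.array([1, 2, 3], dtype=np.uint8)
print(small.dtype)     # uint8
print(small.itemsize)  # 1 (byte per element)

# Mixed ints and floats are stored as a single common type.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)     # float64
```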

NumPy arrays are more like C arrays than Python lists.
We can create a NumPy array that holds a scalar by passing the value to NumPy's array function

s = np.array(5)
s.shape

out: ()

() means it has zero dimensions.

x = s + 3
x

out: 8

Vectors

A vector is an array with one dimension, consisting of one or more scalars.

We can create a vector by passing a Python list to the array function:

v = np.array([1,2,3])
v.shape

out: (3,)

(3,) means it has one dimension of length 3. The trailing comma marks a one-element tuple.

x = v[1]
x

out: 2

We get a value by indexing. Indices start at 0, so v[1] is the second element.
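
Indexing a vector also works with negative indices and slices, much like a Python list. A minimal sketch (last and tail are illustrative names):

```python
import numpy as np

v = np.array([1, 2, 3])

last = v[-1]   # negative indices count from the end
tail = v[1:]   # slice from index 1 onward

print(last)    # 3
print(tail)    # [2 3]
```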

Matrices

We create matrices using NumPy's array function, just as we did for vectors. However, instead of passing a single list, we supply a list of lists, where each inner list represents a row. So, to create a 3x3 matrix containing the numbers one through nine:

m = np.array([[1,2,3], [4,5,6], [7,8,9]])
m

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

m.shape

which returns (3, 3): two dimensions, each of length 3.
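
To read a single element of a matrix, we give a row index and a column index. A short sketch using the same 3x3 matrix:

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(m[1, 2])   # row 1, column 2 -> 6
print(m[1][2])   # same element, selecting the row first -> 6
print(m.ndim)    # 2
```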

Tensors

Tensors are just like vectors and matrices, but they can have any number of dimensions.
For example, to create a 3x3x2x1 tensor you could do the following:

t = np.array([[[[1],[2]],[[3],[4]],[[5],[6]]],[[[7],[8]],\
    [[9],[10]],[[11],[12]]],[[[13],[14]],[[15],[16]],[[17],[17]]]])
t

which displays as:

array([[[[ 1],
         [ 2]],

        [[ 3],
         [ 4]],

        [[ 5],
         [ 6]]],


       [[[ 7],
         [ 8]],

        [[ 9],
         [10]],

        [[11],
         [12]]],


       [[[13],
         [14]],

        [[15],
         [16]],

        [[17],
         [17]]]])

with a shape of (3, 3, 2, 1)
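
Indexing a tensor takes one index per dimension. A small sketch with the same values as the tensor above, laid out one outer block per line:

```python
import numpy as np

t = np.array([
    [[[1], [2]], [[3], [4]], [[5], [6]]],
    [[[7], [8]], [[9], [10]], [[11], [12]]],
    [[[13], [14]], [[15], [16]], [[17], [17]]],
])

print(t.shape)        # (3, 3, 2, 1)
print(t.ndim)         # 4
print(t[2, 1, 1, 0])  # 16
```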

Changing Shapes

Sometimes we need to change the shape of the data without actually changing its contents. For example, we may have a vector, which is one-dimensional, but need a matrix, which is two-dimensional. In that case we can reshape it:

v = np.array([1,2,3,4])
v.shape

(4,)

This is a vector with four elements. We can reshape it into a matrix with one row and four columns:

x = v.reshape(1,4)
x.shape

(1, 4)

This is the matrix we wanted: one row and four columns.

If you see code from experienced NumPy users, you will often see them use a special slicing syntax instead of calling reshape. Here None inserts a new axis of length 1 at that position. Using this syntax, the previous example would look like this:

x = v[None, :]
x

array([[1, 2, 3, 4]])
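
The same idea gives a column instead of a row. A minimal sketch (col_a, col_b and col_c are illustrative names) showing three equivalent ways to get a 4x1 matrix:

```python
import numpy as np

v = np.array([1, 2, 3, 4])

col_a = v.reshape(4, 1)   # explicit shape
col_b = v.reshape(-1, 1)  # -1 lets NumPy infer that dimension
col_c = v[:, None]        # slicing syntax, new axis at the end

print(col_a.shape, col_b.shape, col_c.shape)  # (4, 1) (4, 1) (4, 1)
```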

Element-wise operations

Let's first see how we would do this in plain Python.

Suppose we had a list of numbers and wanted to add 5 to every item in the list. Without NumPy, we might do something like this:

values = [1,2,3,4,5]
for i in range(len(values)):
    values[i] += 5
values

[6, 7, 8, 9, 10]

This works, but it is verbose, and an explicit Python loop can be slow for large lists.

Let's see the NumPy way:

values = [1,2,3,4,5]
values = np.array(values) + 5
values

array([ 6, 7, 8, 9, 10])

Creating that array may seem odd, but normally you'll be storing your data in ndarrays anyway. So if you already had an ndarray named values, you could have just done:

values += 5
values

array([11, 12, 13, 14, 15])

NumPy actually has functions for things like adding, multiplying, etc. But it also supports using the standard math operators. So the following two lines are equivalent:

x = np.multiply(values, 5)
x = values * 5
x

array([55, 60, 65, 70, 75])
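
The same operators also work between two arrays of the same shape, pairing up elements by position. A short sketch (a and b are illustrative arrays, not from the examples above):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

print(a + b)         # [11 22 33]
print(np.add(a, b))  # [11 22 33]
print(a * b)         # [10 40 90]
```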

Matrix Multiplication

NumPy supports several types of matrix multiplication.

Element-wise Multiplication

m = np.array([[1,2,3],[4,5,6]])
n = m * 0.25
n
array([[0.25, 0.5 , 0.75],
       [1.  , 1.25, 1.5 ]])

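We can also multiply two matrices of the same shape element by element, using either m * n or the multiply function: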

np.multiply(m, n)
array([[0.25, 1.  , 2.25],
       [4.  , 6.25, 9.  ]])

Matrix Product

  • The number of columns in the left matrix must equal the number of rows in the right matrix.
  • The answer matrix always has the same number of rows as the left matrix and the same number of columns as the right matrix.
  • Order matters. Multiplying A•B is not the same as multiplying B•A.
  • Data in the left matrix should be arranged as rows, while data in the right matrix should be arranged as columns.

To find the matrix product, we use NumPy's matmul function.

If you have compatible shapes, then it's as simple as this:

a = np.array([[1,2,3,4],[5,6,7,8]])
a.shape

(2, 4)

b = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
b.shape

(4, 3)

c = np.matmul(a, b)
c

out:

array([[ 70,  80,  90],
       [158, 184, 210]])
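
The "order matters" rule from the list above can be checked on two small square matrices. As a minimal sketch (A and B are illustrative matrices, not from the original examples), swapping the operands of matmul gives a different result:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])

AB = np.matmul(A, B)
BA = np.matmul(B, A)

print(AB)                      # [[2 1]
                               #  [4 3]]
print(BA)                      # [[3 4]
                               #  [1 2]]
print(np.array_equal(AB, BA))  # False
```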

If your matrices have incompatible shapes, you'll get an error, like the following:

np.matmul(b, a)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-67-af3b88aa2232> in <module>
----> 1 np.matmul(b, a)

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 2 is different from 3)
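
One way to avoid this error is to compare the shapes before multiplying. The sketch below uses a hypothetical helper, can_matmul, which is not part of NumPy and only handles two-dimensional arrays:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])                     # shape (2, 4)
b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])  # shape (4, 3)

def can_matmul(left, right):
    """Hypothetical helper: True if two 2-D arrays can be multiplied."""
    # Columns of the left matrix must equal rows of the right matrix.
    return left.shape[1] == right.shape[0]

print(can_matmul(a, b))  # True:  4 columns match 4 rows
print(can_matmul(b, a))  # False: 3 columns vs 2 rows
```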

NumPy's dot function

You may sometimes see NumPy's dot function in places where you would expect a matmul. It turns out that the results of dot and matmul are the same if the matrices are two dimensional.
So these two results are equivalent:

a = np.array([[1,2],[3,4]])
np.dot(a,a)
array([[ 7, 10],
       [15, 22]])
np.matmul(a,a)
array([[ 7, 10],
       [15, 22]])

While these functions return the same results for two-dimensional data, you should be careful about which you choose when working with other data shapes.
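
To see the difference, compare the result shapes for three-dimensional inputs. In this sketch (x and y are illustrative arrays), matmul treats the first axis as a batch of matrices, while dot combines every matrix in x with every matrix in y:

```python
import numpy as np

x = np.ones((2, 3, 4))
y = np.ones((2, 4, 5))

print(np.matmul(x, y).shape)  # (2, 3, 5)
print(np.dot(x, y).shape)     # (2, 3, 2, 5)
```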

Play here

Link to an interactive colab notebook

Thank you :)
