DEV Community

Thomas Nielson


Understanding Data Analytics: Scalars, Vectors, Matrices, Tensors using Pandas, NumPy, and TensorFlow

Welcome back! In our previous blog post, we talked about the differences between data analysis and data analytics. We used weather data as examples and introduced essential vocabulary terms. In this continuation, we'll look into Pandas, NumPy, and TensorFlow and how they can benefit your data analysis workflow.

To keep this blog post short and avoid reinventing the wheel, here are great blogs explaining Pandas, NumPy, and TensorFlow.

Scalars / shape:(1 x 1) 📊

Scalars represent single numerical values without any specific direction. They play a fundamental role in data analytics, allowing us to represent quantities such as temperature, time, or humidity. Let's see an example using NumPy:

import numpy as np

temperature = np.array(25.0)  # a single value, stored as a 0-dimensional array

Here, the NumPy array temperature holds a single temperature value, with no rows or columns to index into. Scalars provide the starting point for our data analysis adventures.
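To see how NumPy describes a single value, here is a minimal sketch. The name temperature_now and the Celsius-to-Fahrenheit conversion are assumptions made for illustration; the original example does not state a unit.

```python
import numpy as np

# A scalar: exactly one value and no axes.
temperature_now = np.array(25.0)

print(temperature_now.ndim)    # number of axes of the array
print(temperature_now.shape)   # an empty tuple for a 0-dimensional array
print(float(temperature_now))  # convert back to a plain Python float

# Arithmetic on a scalar yields another scalar.
# Assumption: the value is in degrees Celsius.
temperature_f = temperature_now * 9 / 5 + 32
print(float(temperature_f))
```

Note that NumPy reports the shape of a scalar as an empty tuple rather than (1, 1), because a scalar has no axes at all.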

Vectors / shape:(m x 1) or (1 x m) 📈

Vectors come into play when we have a collection of related data points. They possess both magnitude and direction, making them essential for representing multivariate data. A vector can be viewed as an ordered collection of scalars.

import numpy as np

humidity = np.array([0.6, 0.7, 0.5])

In this example, the NumPy array humidity captures humidity values at three different points in time. Placing several such vectors side by side lets us explore the relationships between multiple variables.
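The (m x 1) and (1 x m) shapes from the heading can be sketched as follows. This is only an illustration: the names column, row, and magnitude are not from the original post, and np.linalg.norm is used here as one common way to compute a vector's magnitude.

```python
import numpy as np

humidity = np.array([0.6, 0.7, 0.5])

# A plain 1-D NumPy array has a single axis of length m.
print(humidity.shape)

# Explicit column (m x 1) and row (1 x m) versions of the same data.
# -1 asks NumPy to infer that dimension from the number of elements.
column = humidity.reshape(-1, 1)
row = humidity.reshape(1, -1)
print(column.shape, row.shape)

# The magnitude (Euclidean length) of the vector.
magnitude = np.linalg.norm(humidity)
print(magnitude)
```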

Matrices / shape:(m x n) 📊

Matrices provide a row-and-column format for organizing and manipulating datasets. They are essential for data preprocessing and various analytical tasks. A matrix can be viewed as a collection of vectors.

import numpy as np

data = np.array([
  [90, 85, 92],
  [78, 80, 88],
  [95, 92, 98]
])

Here, the NumPy array data represents student scores in different subjects. Matrices allow us to examine relationships and patterns within our data.
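One simple way to look for patterns in such a matrix is to average along each axis. The sketch below assumes that each row is one student and each column is one subject; the original does not say which way round the data is laid out, so treat that as an assumption.

```python
import numpy as np

data = np.array([
    [90, 85, 92],
    [78, 80, 88],
    [95, 92, 98],
])

# Assumption: rows are students, columns are subjects.
per_student_mean = data.mean(axis=1)  # average across columns: one value per row
per_subject_mean = data.mean(axis=0)  # average down rows: one value per column

# Row index of the student with the highest average score.
best_student = int(per_student_mean.argmax())

print(per_student_mean)
print(per_subject_mean)
print(best_student)
```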

To check the shape of our data, we can use the .shape attribute. Let's see how it works:

print(data.shape)

The output will be (3, 3), indicating that our matrix has 3 rows (m) and 3 columns (n). Understanding the shape of our data is crucial for performing various operations and analyses.
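The same matrix can also be wrapped in a Pandas DataFrame, which attaches labels to the rows and columns. In this sketch the student and subject labels are hypothetical, chosen only to make the example readable.

```python
import numpy as np
import pandas as pd

data = np.array([
    [90, 85, 92],
    [78, 80, 88],
    [95, 92, 98],
])

# Hypothetical labels: the original post does not name students or subjects.
scores = pd.DataFrame(
    data,
    index=["student_1", "student_2", "student_3"],
    columns=["math", "science", "english"],
)

print(scores.shape)  # same (rows, columns) tuple as the NumPy array

# Look up a single value by its labels instead of by position.
one_score = scores.loc["student_2", "science"]
print(one_score)

# Average score per subject (one value per column).
subject_means = scores.mean(axis=0)
print(subject_means)
```

Labels make the intent of each lookup explicit, which helps once a dataset has more than a handful of columns.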

Tensors / shape:(k x m x n) 🌌

Tensors generalize scalars, vectors, and matrices to higher dimensions, enabling us to handle complex data structures. TensorFlow offers incredible capabilities for working with tensors. Let's explore an example:

import tensorflow as tf

image = tf.constant([
  [[255, 0, 0], [0, 255, 0]],
  [[0, 0, 255], [255, 255, 0]]
])

print(image.shape)  # Output: (2, 2, 3), i.e. 2 matrices, each of shape 2 x 3

In this TensorFlow example, we have a 3D tensor representing a small color image: 2 x 2 pixels, each with three color values. Tensors are powerful tools for working with diverse data types like images, videos, and time-series data.
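The progression from scalar to tensor can be sketched by counting axes. This sketch uses NumPy rather than TensorFlow to keep its dependencies light, and it assumes the last axis of the image holds red, green, and blue values, which the original implies but does not state.

```python
import numpy as np

scalar = np.array(255)                          # 0 axes
vector = np.array([255, 0, 0])                  # 1 axis
matrix = np.array([[255, 0, 0], [0, 255, 0]])   # 2 axes
image = np.array([                              # 3 axes
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 0]],
])

# Each step up adds one axis.
for name, arr in [("scalar", scalar), ("vector", vector),
                  ("matrix", matrix), ("image", image)]:
    print(name, arr.ndim, arr.shape)

# Assumption: the last axis is (red, green, blue).
top_left_pixel = image[0, 0]   # the three color values of one pixel
red_channel = image[:, :, 0]   # a 2 x 2 matrix of first-channel values
print(top_left_pixel)
print(red_channel)
```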

Basic Operations: Adding and Subtracting Data Elements 📉

In data analytics, we often need to perform basic arithmetic operations on our data. Let's put our math skills to work and perform addition and subtraction on vectors and matrices using NumPy:

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

addition_result = a + b
subtraction_result = a - b

In this example, we add and subtract corresponding elements of vectors a and b. Similarly, we can perform addition and subtraction on matrices by aligning corresponding elements. These operations allow us to combine or compare data elements, providing valuable insights.
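The matrix case works the same way, as long as the shapes are compatible. The sketch below uses made-up score matrices for two terms (the names and values are hypothetical) and also shows what happens when two arrays cannot be aligned.

```python
import numpy as np

# Hypothetical data: scores for the same students and subjects in two terms.
scores_term1 = np.array([[90, 85], [78, 80]])
scores_term2 = np.array([[92, 88], [75, 84]])

total = scores_term1 + scores_term2   # element-wise sum
change = scores_term2 - scores_term1  # element-wise difference
print(total)
print(change)

# Arrays whose shapes cannot be aligned raise a ValueError.
shape_error = None
try:
    np.array([1, 2, 3]) + np.array([1, 2])
except ValueError as err:
    shape_error = str(err)
    print("Cannot add:", err)
```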

Reshaping Data: Adapting to Analytical Needs 🔄

Sometimes, we need to reshape our data to align with our analytical requirements. NumPy arrays provide the .reshape() method for this purpose. Let's see an example:

import numpy as np

original_data = np.array([1, 2, 3, 4, 5, 6])
reshaped_data = original_data.reshape((2, 3))

In this case, we start with a 1D array original_data and reshape it into a 2D array with 2 rows and 3 columns using .reshape((2, 3)). Reshaping data allows us to manipulate and analyze it more effectively.
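A reshape only succeeds when the new shape holds exactly the same number of elements. As a small sketch (the names three_by_two and flat_again are illustrative), here is how to let NumPy infer one dimension and what happens when the element counts do not match:

```python
import numpy as np

original_data = np.array([1, 2, 3, 4, 5, 6])

# -1 tells NumPy to infer that dimension: 6 elements / 3 rows = 2 columns.
three_by_two = original_data.reshape(3, -1)
print(three_by_two.shape)

# Reshaping to a single inferred dimension flattens the array again.
flat_again = three_by_two.reshape(-1)
print(flat_again)

# 6 elements cannot fill a 4 x 2 array, so NumPy raises a ValueError.
try:
    original_data.reshape((4, 2))
except ValueError as err:
    print("Invalid reshape:", err)
```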

Conclusion: Mastering Data Analytics with Powerful Tools! 💡🔍📊

Now you have a better understanding of scalars, vectors, matrices, tensors, and basic operations in the context of data analytics, with the aid of Pandas, NumPy, and TensorFlow!

Thanks for taking the time to read! If you found this blog post on data analytics helpful, feel free to share it with others who might find it interesting.
