Lohith


Understanding NumPy: Datatypes, Memory Storage, and Structured Arrays.

Datatypes and Memory Storage in NumPy Arrays

Every NumPy array has a dtype attribute, an instance of the numpy.dtype class, that describes the data type of its elements. The dtype's itemsize attribute gives the size of one element in bytes, which is useful for understanding memory usage and data representation within NumPy arrays.

import numpy as np

# Create an array 'b' with float data type
b = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
dtype_info_b = b.dtype  # Get the data type information

# Print the data type information and the size of one element
print("Data type of the array:", dtype_info_b)
print("Size of one element in bytes:", dtype_info_b.itemsize)

# Calculate the total memory size consumed by the array 'b'
total_memory_size_b = len(b) * dtype_info_b.itemsize
print("Total memory size consumed by array 'b':", total_memory_size_b, "bytes")

# Alternatively, use the nbytes attribute to directly get the total memory size
total_memory_size_b_alt = b.nbytes
print("Total memory size consumed by array 'b' (alternative method):", total_memory_size_b_alt, "bytes")

Output:

Data type of the array: float64
Size of one element in bytes: 8
Total memory size consumed by array 'b': 40 bytes
Total memory size consumed by array 'b' (alternative method): 40 bytes

In this example, we create an array b containing floating-point numbers. We then retrieve the data type information using b.dtype and store it in dtype_info_b. We print both the data type and the size of one element in bytes using the itemsize attribute of dtype_info_b.

After that, we calculate the total memory size consumed by the array b by multiplying the number of elements (len(b)) by the size of one element (dtype_info_b.itemsize). This works because b is one-dimensional; len() counts only the first axis, so for multidimensional arrays use b.size instead. Finally, we use the nbytes attribute of the array b to obtain the total memory size directly.
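For arrays with more than one dimension, the element count comes from the size attribute rather than len(). The following is a minimal sketch of that calculation; the 2-D array m is a hypothetical example, not part of the code above.

```python
import numpy as np

# Hypothetical 2-D example: 3 rows x 4 columns of 64-bit floats
m = np.zeros((3, 4), dtype=np.float64)

first_axis_length = len(m)   # length of the first axis only
element_count = m.size       # total number of elements

# Total bytes = number of elements * bytes per element
manual_bytes = element_count * m.dtype.itemsize
print("len(m):", first_axis_length)
print("m.size:", element_count)
print("size * itemsize:", manual_bytes)
print("nbytes:", m.nbytes)
```

Here size * itemsize agrees with nbytes, whereas len(m) * itemsize would count only one element per row.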

Creating a NumPy array with a defined datatype

import numpy as np

# Example 1: Creating an array with the default integer type (platform-dependent; 32-bit in the output shown here)
arr1 = np.array([1, 2, 3, 4, 5], dtype=int)
print("Example #1:")
print("Array:", arr1)
print("Data type:", arr1.dtype)
print()

# Example 2: Creating an array with defined storage size for integers (16-bit)
arr2 = np.array([1, 2, 3, 4, 5], dtype=np.int16)
print("Example #2:")
print("Array:", arr2)
print("Data type:", arr2.dtype)
print()

# Example 3: Using character code with storage value to specify data type (16-bit integer)
arr3 = np.array([1, 2, 3, 4, 5], dtype='i2')
print("Example #3:")
print("Array:", arr3)
print("Data type:", arr3.dtype)

Output:

Example #1:
Array: [1 2 3 4 5]
Data type: int32

Example #2:
Array: [1 2 3 4 5]
Data type: int16

Example #3:
Array: [1 2 3 4 5]
Data type: int16

In these examples:

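Example 1 illustrates creating an array with the default integer type. The default is platform-dependent: the output above shows int32, while on 64-bit Linux and macOS, and on 64-bit Windows since NumPy 2.0, the default is int64.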
Example 2 demonstrates creating an array with a defined storage size for integers, specifying np.int16, which results in a 16-bit integer array.
Example 3 shows another way to achieve the same result as Example 2 by using a character code ('i2') with the desired storage value (16-bit) to specify the data type.
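To see how the chosen integer type affects storage, the sketch below compares itemsize and nbytes for a few explicitly sized dtypes. The names values and same_dtype are illustrative and do not come from the examples above.

```python
import numpy as np

values = [1, 2, 3, 4, 5]

# Store the same values with three explicit integer widths
for dtype in (np.int16, np.int32, np.int64):
    arr = np.array(values, dtype=dtype)
    print(arr.dtype, "itemsize:", arr.itemsize, "nbytes:", arr.nbytes)

# The character code 'i2' and np.int16 describe the same dtype
same_dtype = np.dtype('i2') == np.dtype(np.int16)
print("'i2' is int16:", same_dtype)
```

Choosing an explicit width such as np.int16 also makes the result independent of the platform's default integer type.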

You can also specify the boolean data type directly in the array creation function, as this example shows:

import numpy as np

bool_arr = np.array([True, False, True, True], dtype=bool)
print("Array:", bool_arr)
print("Data type:", bool_arr.dtype)

Output:

Array: [ True False  True  True]
Data type: bool

Here are examples of creating arrays with byte-string ('S') and Unicode-string ('U') types by specifying the dtype inside the array function:

import numpy as np

# Example_1: Creating an array with byte-string type
str_arr = np.array(['hello', 'world', 'numpy'], dtype='S')
print("Array:", str_arr)
print("Data type:", str_arr.dtype)

# Example_2: Creating an array with Unicode string type
str_arr = np.array(['hello', 'world', 'numpy'], dtype='U')
print("Array:", str_arr)
print("Data type:", str_arr.dtype)

Output:

Example_1:
Array: [b'hello' b'world' b'numpy']
Data type: |S5

Example_2:
Array: ['hello' 'world' 'numpy']
Data type: <U5

When creating a NumPy array with dtype 'S' (byte string) or 'U' (Unicode string) and no explicit length, NumPy sets the string length from the longest element in the array. If you give an explicit length (for example 'S3'), or later assign a longer string to an element, the value is silently truncated to fit.
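The sizing and truncation behaviour can be checked with a short sketch. The names words, auto_arr and short_arr are illustrative only.

```python
import numpy as np

words = ['hello', 'world', 'numpy']

# No length given: NumPy sizes the dtype to the longest string (5 characters)
auto_arr = np.array(words, dtype='U')
print(auto_arr.dtype)

# Explicit length of 3: longer strings are truncated on creation
short_arr = np.array(words, dtype='U3')
print(short_arr)   # ['hel' 'wor' 'num']

# Assigning a longer string to an existing element is truncated too
auto_arr[0] = 'structured'
print(auto_arr[0])   # struc
```

Because the truncation raises no error, it is worth choosing a length that covers the longest value you expect to store.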

Structured DataType or record type

A structured data type (also called a record type) allows fields with different data types within the same element, unlike a regular NumPy array, in which every element has the same type. To create one, use numpy.dtype(). One approach is to pass a list of tuples containing (field_name, data_type) pairs.

import numpy as np

# Define a new structured data type
student_dtype = np.dtype([
    ('student_id', np.int32),
    ('course', 'S20'),  # Byte string of up to 20 bytes
    ('grade', np.float64)  # Floating-point grade
])

# Create an array with student data
student_array = np.array([
    (101, 'Math', 85.5),
    (102, 'History', 78.2),
    (103, 'Physics', 92.0)
], dtype=student_dtype)

# Print the array
print(student_array)

# Print the dtype
print(student_array.dtype)

Output:

[(101, b'Math', 85.5) (102, b'History', 78.2) (103, b'Physics', 92. )]
[('student_id', '<i4'), ('course', 'S20'), ('grade', '<f8')]
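Once a structured array exists, individual fields can be read by name. The sketch below repeats the student_dtype definition so that it runs on its own; the variables grades and first_course are illustrative additions.

```python
import numpy as np

# Same layout as student_dtype above, repeated so the snippet is self-contained
student_dtype = np.dtype([
    ('student_id', np.int32),
    ('course', 'S20'),
    ('grade', np.float64)
])
student_array = np.array([
    (101, 'Math', 85.5),
    (102, 'History', 78.2),
    (103, 'Physics', 92.0)
], dtype=student_dtype)

# Indexing by field name returns a regular array of that field's values
grades = student_array['grade']
print(grades)
print("Mean grade:", grades.mean())

# Each record occupies 4 + 20 + 8 = 32 bytes
print("Bytes per record:", student_array.dtype.itemsize)

# Byte strings can be decoded to str when needed
first_course = student_array['course'][0].decode('ascii')
print(first_course)
```

The itemsize of the structured dtype is the sum of its field sizes here because NumPy packs fields without padding by default.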

A field can also hold a multidimensional subarray. To do this, add a third element to the field's tuple, (field_name, data_type, shape), which gives the shape of that field. This allows you to define fields with different dimensions within the same structured array.
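A minimal sketch of a field with a shape follows. The record_dtype layout and the scores field are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical dtype: each record holds an id and a 2x2 block of scores
record_dtype = np.dtype([
    ('student_id', np.int32),
    ('scores', np.float64, (2, 2))  # third element is the field's shape
])

# Create three zero-filled records, then fill in some values
records = np.zeros(3, dtype=record_dtype)
records['student_id'] = [101, 102, 103]
records['scores'][0] = [[85.5, 90.0], [78.2, 88.0]]

# The field's shape is appended to the array's shape
print(records['scores'].shape)   # (3, 2, 2)
print(records[0]['scores'])
```

Accessing the scores field on the whole array yields a three-dimensional view, one 2x2 block per record.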
