
Silvia-nyawira


Python 101: Introduction to Python as a Data Analytics Tool

1) What is garbage collection in the context of Python, and why is it important? Can you explain how memory management is handled in Python?

In Python, garbage collection is the automatic process of reclaiming memory that the program no longer uses. It helps prevent memory leaks, which occur when memory that is no longer needed is never released; leaks can degrade performance and eventually cause crashes.

How Memory Management is Handled in Python:
1. Reference Counting:
CPython, the standard Python implementation, relies primarily on reference counting to manage memory. Each object has a reference count that tracks how many names or other objects refer to it. When that count drops to zero, meaning nothing refers to the object any more, the interpreter deallocates its memory immediately.
Example:



a = [1, 2, 3]  # Reference count of list object increases
b = a          # Reference count increases further (now two references)
del a          # Reference count decreases (one reference remains)
del b          # Reference count becomes zero, memory is deallocated
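
The counts described above can be observed with sys.getrefcount. This is a minimal sketch that assumes CPython; the absolute numbers are an implementation detail (the call itself can add a temporary reference), so only the differences between them are meaningful. The variable names are chosen for illustration.

```python
import sys

data = [1, 2, 3]
base = sys.getrefcount(data)         # baseline count for the list object

alias = data                         # a second name bound to the same list
after_alias = sys.getrefcount(data)  # one more reference than the baseline

del alias                            # removing the name drops the count again
after_del = sys.getrefcount(data)

print(after_alias - base)  # 1
print(after_del - base)    # 0
```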



2. Garbage Collection for Cycles:
Reference counting alone cannot handle cyclic references—cases where two or more objects reference each other, creating a cycle. These objects might have non-zero reference counts but are not reachable from any part of the program. To address this, Python uses a cyclic garbage collector, which detects and collects objects involved in reference cycles.



class A:
    def __init__(self):
        self.other = None
obj1 = A()
obj2 = A()
obj1.other = obj2
obj2.other = obj1
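
To see the cyclic collector at work, the same two-object cycle can be watched through a weak reference. This is a sketch that assumes CPython; automatic collection is switched off briefly so the result does not depend on when the collector happens to run.

```python
import gc
import weakref

class A:
    def __init__(self):
        self.other = None

gc.disable()                  # keep the collector from running on its own

obj1 = A()
obj2 = A()
obj1.other = obj2
obj2.other = obj1

probe = weakref.ref(obj1)     # a weak reference does not keep obj1 alive

del obj1, obj2                # the names are gone, but the cycle remains
alive_before = probe() is not None

gc.collect()                  # the cyclic collector finds the unreachable pair
alive_after = probe() is not None

gc.enable()

print(alive_before, alive_after)  # True False
```

Reference counting alone leaves both objects alive after the del statement; they disappear only once the cyclic collector runs.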



3. Generational Garbage Collection:
Python's garbage collector is split into generations. Objects that survive garbage collection in one generation move to an older generation. The rationale is that most objects are short-lived, so younger generations are checked for garbage more often than older ones. This generational approach improves efficiency by not checking long-lived objects as frequently.
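
The generations can be inspected through the gc module. The sketch below assumes CPython; the actual threshold and counter values are implementation details and vary between versions, so none are shown.

```python
import gc

# Collection thresholds reported by the interpreter, one per generation
thresholds = gc.get_threshold()

# Current counters that are compared against those thresholds
counts = gc.get_count()

print("thresholds:", thresholds)
print("counts:", counts)

# Collect only the youngest generation; the return value is the
# number of unreachable objects that were found
found = gc.collect(0)
print("unreachable objects found:", found)
```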
4. Manual Memory Management:
In addition to automatic garbage collection, Python provides ways for developers to manage memory manually when needed. For instance, the gc module can be used to interact with the garbage collector, including forcing a collection or disabling automatic collection if desired.



import gc
gc.collect()  # Manually trigger garbage collection
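
Disabling the collector looks like the following. This is only a sketch: the list of squares stands in for some hypothetical allocation-heavy work that creates no reference cycles, and the try/finally block restores the previous state even if that work fails.

```python
import gc

was_enabled = gc.isenabled()   # remember the current state

gc.disable()                   # pause automatic cyclic collection
try:
    # Hypothetical workload, used here only for illustration
    squares = [n * n for n in range(100_000)]
finally:
    if was_enabled:
        gc.enable()            # restore the previous state

print(gc.isenabled() == was_enabled)  # True
```

Reference counting keeps working while the cyclic collector is disabled, so only objects caught in cycles are affected.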



Garbage collection in Python promotes efficient memory usage, simplicity, and safety.

2) What are the key differences between NumPy arrays and Python lists, and can you explain the advantages of using NumPy arrays in numerical computations?
The key differences between NumPy arrays and Python lists relate primarily to the following:
1. Data Type Homogeneity:

  • Python Lists:
    A Python list can hold elements of different data types (e.g., integers, floats, strings).

  • NumPy Arrays:
    A NumPy array is homogeneous, meaning that all elements must be of the same data type (e.g., all integers or all floats). This homogeneity allows NumPy arrays to be more memory-efficient and faster.
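
A short illustration of the difference, assuming NumPy is installed and imported as np. The variable names are made up for the example.

```python
import numpy as np

# A list keeps each element's own type
mixed_list = [1, 2.5, "three"]
print([type(item).__name__ for item in mixed_list])  # ['int', 'float', 'str']

# An array has a single dtype shared by every element
ints = np.array([1, 2, 3])
print(ints.dtype)    # an integer dtype; the exact width depends on the platform

# Mixing ints and floats upcasts everything to one common type
upcast = np.array([1, 2.5, 3])
print(upcast.dtype)  # float64
```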
2. Performance (Speed):

  • Python Lists:
    Since Python lists are dynamically-typed and heterogeneous, operations involving them tend to be slower. Each element requires type checking during operations.

  • NumPy Arrays:
    NumPy arrays are optimized for performance. Operations on arrays are executed at a much lower level (often in C), which makes them significantly faster than operations on Python lists, especially for large datasets.
3. Memory Efficiency:

  • Python Lists:
    A list stores references to its elements, and each element is a full Python object that carries additional metadata (such as type information and a reference count), which makes lists more memory-intensive.

  • NumPy Arrays:
    Arrays store elements as fixed-type data (e.g., int32, float64), eliminating extra memory overhead. This reduces memory consumption, especially when working with large datasets.
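
The gap can be estimated roughly with sys.getsizeof and the array's nbytes attribute. This is a sketch, assuming 64-bit CPython and NumPy; the list figure is an approximation because small integers are shared between objects, and the array figure covers only the data buffer, not the small array header.

```python
import sys
import numpy as np

n = 1_000
numbers_list = list(range(n))
numbers_array = np.arange(n, dtype=np.int64)

# The list object holds n references; each int is a separate Python object
list_container = sys.getsizeof(numbers_list)
list_elements = sum(sys.getsizeof(value) for value in numbers_list)

# The array stores raw 8-byte integers in one contiguous buffer
array_data = numbers_array.nbytes

print("list (approx. bytes):", list_container + list_elements)
print("array data (bytes):", array_data)  # 8000
```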
4. Vectorized Operations:

  • Python Lists:
    Operations on lists usually require explicit loops (i.e., iterating over each element) to apply a function, which makes the code slower and more verbose.

  • NumPy Arrays:
    NumPy supports vectorization, meaning operations are applied element-wise automatically across the entire array without the need for loops. This makes code cleaner and faster.
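
Here is the same calculation written both ways, assuming NumPy is imported as np. The prices list is invented for the example.

```python
import numpy as np

prices = [10.0, 20.0, 30.0]

# List: a loop (here a comprehension) visits each element
doubled_list = [p * 2 for p in prices]

# NumPy: one expression is applied element-wise
doubled_array = np.array(prices) * 2

print(doubled_list)            # [20.0, 40.0, 60.0]
print(doubled_array.tolist())  # [20.0, 40.0, 60.0]

# Note: multiplying a plain list repeats it instead of doubling the values
repeated = prices * 2
print(repeated)                # [10.0, 20.0, 30.0, 10.0, 20.0, 30.0]
```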
Advantages of Using NumPy Arrays for Numerical Computations:

  • NumPy is designed for high-performance numerical computations. By using optimized C libraries underneath, it speeds up operations that would be much slower with Python lists.

  • NumPy arrays are more memory-efficient than Python lists, especially for large datasets, due to their compact, fixed-size data type storage.

  • The ability to apply operations across entire arrays without the need for explicit loops increases both performance and readability of code.

  • NumPy offers a comprehensive suite of mathematical functions, making it ideal for tasks such as matrix algebra, statistical analysis, and numerical simulations.

  • NumPy allows efficient handling and computation of multi-dimensional arrays (e.g., matrices and tensors), making it essential for machine learning, data science, and scientific computing.

  • NumPy integrates well with other scientific computing libraries such as Pandas, SciPy, and Matplotlib, enabling seamless workflows in data analysis, machine learning, and scientific computing.
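
A few of these capabilities in one small sketch, assuming NumPy is imported as np. The 2x2 matrix is an arbitrary example.

```python
import numpy as np

matrix = np.array([[1, 2], [3, 4]])

product = matrix @ matrix          # matrix multiplication
column_sums = matrix.sum(axis=0)   # sum down each column
average = matrix.mean()            # mean of all elements

print(product.tolist())      # [[7, 10], [15, 22]]
print(column_sums.tolist())  # [4, 6]
print(average)               # 2.5
```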

3) How does list comprehension work in Python, and can you provide an example of using it to generate a list of squared values or filter a list based on a condition?
In Python, a list comprehension is a concise way to create a list by applying an expression to each item in an iterable (such as a list, range, or string), optionally filtering items with a condition. It offers a shorter and more readable syntax than an explicit loop for creating or transforming lists.



[expression for item in iterable if condition]


  • expression: The operation or transformation applied to each item in the iterable.
  • item: The variable representing the element in the iterable.
  • iterable: The source collection (like a list, range, or another iterable).
  • if condition (optional): A filter that only includes items that meet the specified condition.

Generating a list of squared values:


squared_values = [x**2 for x in range(10)]
print(squared_values)  # Output: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
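
Filtering a list based on a condition uses the optional if clause. The sketch below keeps only the even numbers, then combines the filter with the squaring expression; the equivalent loop is included for comparison.

```python
numbers = range(10)

# Keep only the items that satisfy the condition
even_numbers = [x for x in numbers if x % 2 == 0]
print(even_numbers)  # [0, 2, 4, 6, 8]

# Filter and transform in one step
even_squares = [x**2 for x in numbers if x % 2 == 0]
print(even_squares)  # [0, 4, 16, 36, 64]

# The same result written as an explicit loop
loop_result = []
for x in numbers:
    if x % 2 == 0:
        loop_result.append(x**2)
print(loop_result == even_squares)  # True
```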



4) Can you explain the concepts of shallow and deep copying in Python, including when each is appropriate, and how deep copying is implemented?
Shallow Copy:
A shallow copy creates a new object but does not create copies of the objects that the original object references. Instead, it copies the references to these objects. This means that changes made to nested objects (mutable objects like lists or dictionaries inside the original object) will be reflected in both the original and the shallow copy, because they share references to the same inner objects.

  • Shallow copying is appropriate when you want a new top-level object, but you are okay with changes to nested objects reflecting across both the original and the copy. For instance, if the structure of the object is what you're interested in duplicating, but the inner data can remain shared, a shallow copy is sufficient.


import copy
original_list = [[1, 2, 3], [4, 5, 6]]
shallow_copy = copy.copy(original_list)
# Modifying the nested list in shallow_copy
shallow_copy[0][0] = 99
print(original_list)  # Output: [[99, 2, 3], [4, 5, 6]]
print(shallow_copy)   # Output: [[99, 2, 3], [4, 5, 6]]
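
The sharing can be checked directly with the is operator. This sketch also shows that top-level changes, such as appending to the copy, do not affect the original.

```python
import copy

original = [[1, 2, 3], [4, 5, 6]]
shallow = copy.copy(original)

print(shallow is original)          # False: the outer list is a new object
print(shallow[0] is original[0])    # True: the inner lists are shared

shallow.append([7, 8, 9])           # changes only the copy's top level
print(len(original), len(shallow))  # 2 3
```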



Deep Copy:
A deep copy creates a new object and recursively copies all objects that the original object references, including nested or contained objects. As a result, the deep copy is completely independent of the original object, and changes made to the nested objects in the deep copy do not affect the original.
Deep copying is appropriate when you need a complete duplication of the object, including all nested objects, and want to ensure that changes made to the copy do not affect the original. This is especially useful when dealing with complex objects or data structures that contain references to mutable objects (like lists or dictionaries).

import copy
original_list = [[1, 2, 3], [4, 5, 6]]
deep_copy = copy.deepcopy(original_list)
# Modifying the nested list in deep_copy
deep_copy[0][0] = 99
print(original_list)  # Output: [[1, 2, 3], [4, 5, 6]]
print(deep_copy)      # Output: [[99, 2, 3], [4, 5, 6]]
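
As for how deep copying is implemented: copy.deepcopy walks the object recursively and records every object it has already copied in a memo dictionary, so shared or self-referencing objects are copied once and recursion terminates. A class can customize the process by defining a __deepcopy__(self, memo) method. The sketch below illustrates both points; the Config class is hypothetical and exists only for this example.

```python
import copy

# A list that contains itself
nested = [1, 2]
nested.append(nested)

clone = copy.deepcopy(nested)   # the memo prevents infinite recursion
print(clone is nested)          # False
print(clone[2] is clone)        # True: the cycle is reproduced inside the copy

class Config:
    """Hypothetical class used only for illustration."""

    def __init__(self, settings):
        self.settings = settings

    def __deepcopy__(self, memo):
        duplicate = Config.__new__(Config)
        memo[id(self)] = duplicate   # register the copy before recursing
        duplicate.settings = copy.deepcopy(self.settings, memo)
        return duplicate

config = Config({"paths": ["a", "b"]})
config_copy = copy.deepcopy(config)
print(config_copy.settings == config.settings)                    # True
print(config_copy.settings["paths"] is config.settings["paths"])  # False
```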



5) Explain, with examples, the difference between lists and tuples.

  • Lists are mutable, meaning you can modify them after creation, while tuples are immutable, meaning that once they are created you cannot modify, add, or remove items.
    Example:


# Lists can be modified
my_list = [1, 2, 3]
my_list[0] = 10       # Changing an element
my_list.append(4)     # Adding an element
print(my_list)        # Output: [10, 2, 3, 4]




# Tuples cannot be modified
my_tuple = (1, 2, 3)
# my_tuple[0] = 10  # This will raise an error: TypeError: 'tuple' object does not support item assignment
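
The error can be observed safely by catching it. The sketch also shows the usual workaround, which is to build a new tuple rather than change the existing one; the variable names are illustrative.

```python
point = (1, 2, 3)

try:
    point[0] = 10
except TypeError as error:
    message = str(error)
    print(message)  # 'tuple' object does not support item assignment

# "Changing" a tuple means building a new one from its parts
updated = (10,) + point[1:]
print(updated)  # (10, 2, 3)
print(point)    # (1, 2, 3), the original is untouched
```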


  • Lists are created using square brackets [], while tuples are created using parentheses ().
    Example:


my_list = [1, 2, 3]
print(type(my_list))  # Output: <class 'list'>




my_tuple = (1, 2, 3)
print(type(my_tuple))  # Output: <class 'tuple'>
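
One detail worth knowing about tuple syntax: it is the comma, not the parentheses, that creates a tuple, so a single-element tuple needs a trailing comma. A short sketch:

```python
not_a_tuple = (5)     # just the integer 5 in parentheses
single = (5,)         # the trailing comma makes it a tuple
also_tuple = 1, 2, 3  # parentheses are optional here
empty = ()            # the empty tuple is the one case with no comma

print(type(not_a_tuple).__name__)  # int
print(type(single).__name__)       # tuple
print(type(also_tuple).__name__)   # tuple
print(len(empty))                  # 0
```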



Lists have many more built-in methods for modifying them, such as append(), remove(), and sort(), while tuples have only two built-in methods, count() and index(), since they cannot be modified.
Example (list):



my_list = [1, 2, 3]
my_list.append(4)  # Adding an element
my_list.remove(2)  # Removing an element
print(my_list)     # Output: [1, 3, 4]



Example (tuple):



my_tuple = (1, 2, 3, 2)
print(my_tuple.count(2))  # Output: 2 (counts occurrences of 2)
print(my_tuple.index(3))  # Output: 2 (finds the index of element 3)



Lists are more suitable for collections of items where the contents will change over time. Examples include lists of users, items in a shopping cart, or dynamic data collections.



shopping_list = ['apples', 'bananas', 'oranges']
shopping_list.append('milk')  # Modify the list
print(shopping_list)          # Output: ['apples', 'bananas', 'oranges', 'milk']



Tuples, by contrast, are used when you want to store a fixed set of items or data that should not change. Examples include coordinates (x, y), RGB color values, or records that shouldn't be altered.



coordinates = (10.0, 20.0)  # Coordinates should remain constant
print(coordinates)           # Output: (10.0, 20.0)


