Ashutosh Sarangi

Python Performance Optimization: Detailed Guide

1. Overview: Optimize What Needs Optimizing

Why This Matters

Premature optimization is the root of all evil. Optimizing the wrong parts of your code wastes time and can make code harder to maintain.

The Right Approach

# Step 1: Get it right first
def calculate_total(items):
    return sum(item['price'] * item['quantity'] for item in items)

# Step 2: Test it's right
def test_calculate_total():
    items = [{'price': 10, 'quantity': 2}, {'price': 5, 'quantity': 3}]
    assert calculate_total(items) == 35

# Step 3: Profile if slow
import cProfile
cProfile.run('calculate_total(large_item_list)')

# Step 4: Optimize based on profiling results
# Step 5: Repeat testing after optimization

Key Point: Always profile first. What you think is slow might not be the bottleneck!


2. Sorting Optimization

Avoid: Using Comparison Functions

# BAD - Comparison function called O(n log n) times
def compare_by_age(person1, person2):
    if person1['age'] < person2['age']:
        return -1
    elif person1['age'] > person2['age']:
        return 1
    return 0

people = [{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': 25}]
# Python 2 style - SLOW
# people.sort(cmp=compare_by_age)

Why This is Slow:

  • Comparison function is a Python function call (expensive!)
  • Called O(n log n) times during sorting
  • For 10,000 items, ~130,000 function calls
  • Each call has Python interpreter overhead
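
If you have a legacy comparison function that you cannot rewrite right away, functools.cmp_to_key can wrap it into a key function; a minimal sketch reusing the comparator from above:

from functools import cmp_to_key

def compare_by_age(person1, person2):
    # Same legacy comparator as above
    if person1['age'] < person2['age']:
        return -1
    elif person1['age'] > person2['age']:
        return 1
    return 0

people = [{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': 25}]
people.sort(key=cmp_to_key(compare_by_age))  # Bob (25) now sorts first

This keeps old code working, but a plain key function (shown next) is still the faster option.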

Use: key Parameter with operator.itemgetter

from operator import itemgetter

# GOOD - key function called only O(n) times
people = [
    {'name': 'Alice', 'age': 30, 'salary': 100000},
    {'name': 'Bob', 'age': 25, 'salary': 80000},
    {'name': 'Charlie', 'age': 35, 'salary': 120000}
]

# Sort by single field
people.sort(key=itemgetter('age'))
# [{'name': 'Bob', 'age': 25, ...}, {'name': 'Alice', 'age': 30, ...}, ...]

# Sort by multiple fields (age, then salary)
people.sort(key=itemgetter('age', 'salary'))

# For tuples/lists, use index
data = [(1, 'apple', 5), (2, 'banana', 3), (3, 'cherry', 8)]
data.sort(key=itemgetter(2))  # Sort by third element
# [(2, 'banana', 3), (1, 'apple', 5), (3, 'cherry', 8)]

Why This is Fast:

  • itemgetter is implemented in C
  • Key function called only once per item (O(n) calls)
  • Native comparisons on extracted keys (fast C code)
  • For 10,000 items: 10,000 key calls vs 130,000 comparison calls

Use: sorted() for Non-Destructive Sorting

# ❌ Bad - modifies original list
original = [3, 1, 4, 1, 5]
original.sort()
print(original)  # [1, 1, 3, 4, 5] - original destroyed!

# ✅ Good - preserves original
original = [3, 1, 4, 1, 5]
sorted_copy = sorted(original)
print(original)      # [3, 1, 4, 1, 5] - unchanged
print(sorted_copy)   # [1, 1, 3, 4, 5]

# Works with any iterable
sorted_set = sorted({3, 1, 4, 1, 5})  # [1, 3, 4, 5]
sorted_dict_keys = sorted({'z': 1, 'a': 2, 'm': 3})  # ['a', 'm', 'z']

Advanced Sorting Techniques

from operator import itemgetter, attrgetter, methodcaller

# 1. Sort objects by attribute
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
    def __repr__(self):
        return f"Person({self.name}, {self.age})"

people = [Person('Alice', 30), Person('Bob', 25), Person('Charlie', 35)]

# Sort by attribute
people.sort(key=attrgetter('age'))
# [Person(Bob, 25), Person(Alice, 30), Person(Charlie, 35)]

# Sort by multiple attributes
people.sort(key=attrgetter('age', 'name'))

# 2. Sort by method result
words = ['Python', 'java', 'C++', 'javascript']
words.sort(key=methodcaller('lower'))  # Case-insensitive sort
# ['C++', 'java', 'javascript', 'Python']

# 3. Reverse sorting (descending)
numbers = [3, 1, 4, 1, 5, 9]
numbers.sort(reverse=True)  # [9, 5, 4, 3, 1, 1]

# Or with key
people.sort(key=attrgetter('age'), reverse=True)  # Oldest first

# 4. Complex sorting with lambda (when operator functions won't work)
points = [(1, 5), (3, 2), (1, 3), (2, 4)]
# Sort by distance from origin
points.sort(key=lambda p: p[0]**2 + p[1]**2)

# But prefer operator when possible (faster):
points.sort(key=itemgetter(1))  # Sort by y-coordinate

Decorate-Sort-Undecorate (DSU) Pattern (Legacy)

# Old Python 2 pattern - now obsolete with key parameter
# But understanding it helps explain how key works internally

# ❌ Manual DSU (old way)
def sortby_manual(somelist, n):
    # Decorate
    decorated = [(x[n], x) for x in somelist]
    # Sort
    decorated.sort()
    # Undecorate
    return [x for (key, x) in decorated]

# ✅ Modern way (Python 2.4+)
def sortby_modern(somelist, n):
    return sorted(somelist, key=itemgetter(n))

Sorting Stability (Important!)

# Python's sort is STABLE (since 2.3)
# Equal elements maintain their relative order

students = [
    {'name': 'Alice', 'grade': 'A', 'age': 20},
    {'name': 'Bob', 'grade': 'B', 'age': 19},
    {'name': 'Charlie', 'grade': 'A', 'age': 21},
    {'name': 'David', 'grade': 'B', 'age': 20}
]

# Sort by grade (stable)
students.sort(key=itemgetter('grade'))
# Within same grade, original order preserved

# Multi-level sorting using stability
# Sort by secondary key first, then primary key
students.sort(key=itemgetter('age'))      # Sort by age first
students.sort(key=itemgetter('grade'))    # Then by grade (stable!)
# Result: Sorted by grade, and within same grade, sorted by age

Performance Comparison

import timeit
from operator import itemgetter

data = [{'id': i, 'value': i % 100} for i in range(10000)]

# Using lambda
time_lambda = timeit.timeit(
    lambda: sorted(data, key=lambda x: x['value']),
    number=1000
)

# Using itemgetter
time_itemgetter = timeit.timeit(
    lambda: sorted(data, key=itemgetter('value')),
    number=1000
)

print(f"Lambda: {time_lambda:.4f}s")
# Output: ~2.5s

print(f"itemgetter: {time_itemgetter:.4f}s")
# Output: ~1.8s (30-40% faster!)

When to Use What

# Use .sort() when:
# - You want to modify the list in place
# - You don't need the original order
# - Slightly more memory efficient
my_list.sort(key=itemgetter('field'))

# Use sorted() when:
# - You need to keep the original
# - Sorting any iterable (not just lists)
# - More functional programming style
new_list = sorted(my_list, key=itemgetter('field'))

# Use itemgetter when:
# - Sorting by dictionary keys or tuple/list indices
# - Need maximum performance
from operator import itemgetter
data.sort(key=itemgetter('age', 'name'))

# Use attrgetter when:
# - Sorting objects by attributes
from operator import attrgetter
objects.sort(key=attrgetter('attribute'))

# Use lambda when:
# - Complex transformation needed
# - itemgetter/attrgetter won't work
data.sort(key=lambda x: (x['category'], -x['priority']))

3. String Concatenation

Avoid: Using += for String Building

# BAD - Creates a new string object on every iteration
def build_html_bad(items):
    html = ""
    for item in items:
        html += "<li>" + item + "</li>"  # Creates new string each time
    return html

# Why it's slow:
# Iteration 1: "" -> "<li>apple</li>" (new string created)
# Iteration 2: "<li>apple</li>" -> "<li>apple</li><li>banana</li>" (another new string)
# Each concatenation copies ALL previous characters again!

Why This is Slow:

  • Strings are immutable in Python
  • Each += creates a completely new string object
  • For n items, this copies characters O(n²) times
  • With 10,000 items, you might copy millions of characters

Use: join() Method

# GOOD - Builds list first, then joins once
def build_html_good(items):
    parts = []
    for item in items:
        parts.append(f"<li>{item}</li>")
    return "".join(parts)

# Even better - list comprehension
def build_html_best(items):
    return "".join(f"<li>{item}</li>" for item in items)

Why This is Fast:

  • List operations are cheap
  • join() calculates total size once and allocates memory once
  • Only one string copy operation at the end
  • O(n) time complexity instead of O(n²)

Performance Comparison

import time

items = ['item' + str(i) for i in range(10000)]

# Bad approach
start = time.time()
result = ""
for item in items:
    result += item
print(f"Concatenation: {time.time() - start:.4f}s")
# Output: ~2-5 seconds

# Good approach
start = time.time()
result = "".join(items)
print(f"Join: {time.time() - start:.4f}s")
# Output: ~0.001 seconds (1000x faster!)

String Formatting

# ❌ Avoid concatenation
output = "<html>" + head + prologue + query + tail + "</html>"

# ✅ Use formatting (better)
output = "<html>%s%s%s%s</html>" % (head, prologue, query, tail)

# ✅ Use f-strings (Python 3.6+, best)
output = f"<html>{head}{prologue}{query}{tail}</html>"


4. Loops and Iteration

Avoid: Manual Loop with Append

# BAD - Slow due to repeated attribute lookups and Python loop overhead
def process_words_bad(words):
    result = []
    for word in words:
        result.append(word.upper())
    return result

Why This is Slow:

  • Python interpreter overhead for each iteration
  • Repeated method lookups (.append, .upper)
  • Function call overhead for each operation

Use: List Comprehensions

# GOOD - Optimized by interpreter
def process_words_good(words):
    return [word.upper() for word in words]

Why This is Fast:

  • List comprehensions are optimized at the bytecode level
  • Reduces interpreter overhead
  • More concise and readable

Use: map() for Simple Operations

# ALSO GOOD - Pushes loop into C code
def process_words_map(words):
    return list(map(str.upper, words))

Why This is Fast:

  • map() is implemented in C
  • No Python interpreter overhead per iteration
  • Very efficient for simple transformations

Map in detail

The map() function in Python is used to apply a function to every item in an iterable (like a list or tuple) and return a map object (which can be converted to a list, set, etc.).


🧠 Syntax of map()

map(function, iterable)
  • function: A function that will be applied to each item in the iterable.
  • iterable: A sequence (like a list, tuple, etc.) whose items will be processed by the function.

Example 1: Using Built-in Function str.upper

words = ["hello", "world"]
result = map(str.upper, words)
print(list(result))  # Output: ['HELLO', 'WORLD']

Here, str.upper is applied to each word in the list.


Example 2: Using a Lambda Function

numbers = [1, 2, 3, 4]
squared = map(lambda x: x**2, numbers)
print(list(squared))  # Output: [1, 4, 9, 16]

The lambda function lambda x: x**2 squares each number.


Example 3: Using a Custom Function

def add_prefix(word):
    return "pre_" + word

words = ["fix", "view", "dict"]
result = map(add_prefix, words)
print(list(result))  # Output: ['pre_fix', 'pre_view', 'pre_dict']

Generator Expressions (Memory Efficient)

# When you don't need the full list at once
def process_large_file(filename):
    with open(filename) as f:
        # ❌ Bad - loads entire file into memory
        lines = [line.upper() for line in f]

        # ✅ Good - processes one line at a time
        lines = (line.upper() for line in f)

        for line in lines:
            process(line)  # Only one line in memory at a time

Generator Expression in Detail

The difference between yield and generators is a common point of confusion for Python learners, so let's clarify what each one is and how they relate to each other.


🧠 What Is a Generator?

A generator is a special type of iterator in Python that produces values one at a time, only when requested. It’s useful for working with large datasets or streams of data because it doesn’t store everything in memory.

There are two ways to create a generator:

1. Using a Generator Function with yield

def my_generator():
    yield 1
    yield 2
    yield 3
  • When you call my_generator(), it returns a generator object.
  • Each time you iterate over it (e.g., with a for loop), it runs until it hits a yield, returns that value, and pauses.
  • When you ask for the next value, it resumes from where it left off.

2. Using a Generator Expression

gen = (x * x for x in range(3))
  • This is similar to a list comprehension, but with () instead of [].
  • It also returns a generator object and evaluates lazily (one item at a time).

🔄 What Is yield?

  • yield is a keyword used inside a function to turn it into a generator function.
  • It’s like return, but instead of ending the function, it pauses and allows the function to continue later.

Example: Using yield

def count_up_to(n):
    count = 1
    while count <= n:
        yield count
        count += 1

gen = count_up_to(3)
for num in gen:
    print(num)

Output:


1
2
3

  • Each call to next(gen) gives the next number.
  • The function remembers its state between calls.

🔁 What a Generator Does

When you write:
lines = (line.upper() for line in f)

This creates a generator object. It doesn’t actually read or process any lines yet. It just sets up the logic for how each line will be processed when requested.


🧠 Why the for Loop Is Needed

for line in lines:
    process(line)

The for loop triggers the generator to start reading the file line by line, converting each line to uppercase, and passing it to process().

Without the loop, the generator just sits there — it doesn’t do anything.
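
You can see this laziness directly with next(); a small self-contained sketch:

gen = (x * x for x in range(3))
print(next(gen))  # 0 - computed only when requested
print(next(gen))  # 1
print(next(gen))  # 4
# One more next(gen) would raise StopIteration - the generator is exhausted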


5. Avoiding Dots (Attribute Lookups)

Avoid: Repeated Attribute Lookups

# BAD - Looks up .append and .upper on every iteration
def process_bad(words):
    result = []
    for word in words:
        result.append(word.upper())  # Two lookups per iteration
    return result

Why This is Slow:

  • Python does attribute lookup at runtime
  • Each dot (.) triggers a dictionary lookup
  • For 1 million items, that's 2 million dictionary lookups!

Use: Cache Attribute Lookups

# GOOD - Lookup once, use many times
def process_good(words):
    result = []
    append = result.append  # Cache the method
    upper = str.upper       # Cache the function

    for word in words:
        append(upper(word))  # Direct reference, no lookup

    return result

Why This is Fast:

  • Attribute lookup happens only once
  • Direct variable access is much faster
  • Reduces bytecode instructions per iteration

Real-World Example

# ❌ Bad - repeated lookups
def parse_data_bad(data):
    results = []
    for item in data:
        if item.value > 0:          # Lookup 'value'
            results.append({        # Lookup 'append'
                'id': item.id,      # Lookup 'id'
                'name': item.name   # Lookup 'name'
            })
    return results

# ✅ Good - cache lookups
def parse_data_good(data):
    results = []
    append = results.append

    for item in data:
        value = item.value
        if value > 0:
            append({
                'id': item.id,
                'name': item.name
            })
    return results

Caution: Only use this technique in performance-critical loops. It reduces readability, so use it judiciously.


6. Local vs Global Variables

Avoid: Global Variables in Loops

# BAD - Accessing globals is slow
counter = 0

def process_global():
    global counter
    for i in range(1000000):
        counter += 1  # Global lookup on every iteration

Why This is Slow:

  • Global variables are stored in a dictionary (globals())
  • Each access requires a dictionary lookup
  • Much slower than local variable access

Use: Local Variables

# GOOD - Local variables use optimized storage
def process_local():
    counter = 0  # Local variable
    for i in range(1000000):
        counter += 1  # Fast local access
    return counter

Why This is Fast:

  • Local variables are stored in an array-like structure
  • Access is by index, not dictionary lookup
  • Much faster at the C level

Best Practice

# ❌ Avoid
import math

def calculate_distances(points):
    distances = []
    for p1, p2 in points:
        # math.sqrt is a global lookup each time
        dist = math.sqrt((p2[0]-p1[0])**2 + (p2[1]-p1[1])**2)
        distances.append(dist)
    return distances

# ✅ Use
import math

def calculate_distances_fast(points):
    distances = []
    append = distances.append
    sqrt = math.sqrt  # Make it local!

    for p1, p2 in points:
        dist = sqrt((p2[0]-p1[0])**2 + (p2[1]-p1[1])**2)
        append(dist)

    return distances


7. Dictionary Initialization

Avoid: if-else for Dictionary Keys

# BAD - Dictionary lookup happens twice on every iteration
def count_words_bad(words):
    word_count = {}
    for word in words:
        if word not in word_count:  # First lookup
            word_count[word] = 0
        word_count[word] += 1       # Second lookup
    return word_count

Why This is Slow:

  • Double dictionary lookup for existing keys
  • if statement evaluated every single time
  • After first occurrence, the if always fails but still gets checked

Use: try-except (EAFP)

# GOOD - Only one lookup for existing keys
def count_words_try(words):
    word_count = {}
    for word in words:
        try:
            word_count[word] += 1  # Try to increment
        except KeyError:
            word_count[word] = 1   # Only runs once per unique word
    return word_count

Why This is Fast:

  • Python's EAFP (Easier to Ask for Forgiveness than Permission) philosophy
  • Exceptions are cheap when not raised
  • Only one dictionary lookup for existing keys
  • Exception only raised once per unique word

Use: dict.get() with Default

# ALSO GOOD - Clear and concise
def count_words_get(words):
    word_count = {}
    for word in words:
        word_count[word] = word_count.get(word, 0) + 1
    return word_count

Use: defaultdict (Best for Most Cases)

# BEST - Most Pythonic and readable
from collections import defaultdict

def count_words_defaultdict(words):
    word_count = defaultdict(int)  # int() returns 0
    for word in words:
        word_count[word] += 1      # No checking needed!
    return word_count

Why This is Best:

  • No explicit initialization needed
  • Clear intent
  • Very efficient
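
For plain counting, the standard library also provides collections.Counter, which wraps this exact pattern; a minimal sketch:

from collections import Counter

def count_words_counter(words):
    return Counter(words)  # A dict subclass mapping item -> count

counts = count_words_counter(['a', 'b', 'a', 'c', 'a'])
print(counts['a'])            # 3
print(counts.most_common(1))  # [('a', 3)]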

8. Import Statement Overhead

Avoid: Imports Inside Tight Loops

# BAD - Imports module on every function call
def process_data_bad():
    for i in range(100000):
        import string  # Module lookup happens 100,000 times!
        result = string.ascii_lowercase

Why This is Slow:

  • Python checks sys.modules on every import
  • Even though module isn't reloaded, the lookup is expensive
  • Adds unnecessary overhead to every iteration

Use: Import at Module Level

# GOOD - Import once
import string

def process_data_good():
    for i in range(100000):
        result = string.ascii_lowercase  # Direct access

Why This is Fast:

  • Module imported only once when file is loaded
  • No import overhead in the function
  • Variable lookup is much faster

Use: Import Specific Names

# EVEN BETTER - No attribute lookup
from string import ascii_lowercase

def process_data_better():
    for i in range(100000):
        result = ascii_lowercase  # No dot lookup!

Lazy Imports (When Needed)

# When module might not be needed
class EmailProcessor:
    def __init__(self):
        self._email_module = None

    def parse_email(self, email_string):
        # ✅ Import only when first needed
        if self._email_module is None:
            import email
            self._email_module = email

        return self._email_module.message_from_string(email_string)

This is useful when:

1. Import is expensive (large module)

2. Module might not be used in this execution

3. You want faster startup time


9. Data Aggregation

Avoid: Processing Items One at a Time

# BAD - Function call overhead for each item
def process_items_bad(items):
    total = 0
    for item in items:
        total = add_to_total(total, item)  # Function call per item
    return total

def add_to_total(total, item):
    return total + item

Why This is Slow:

  • Python function calls are expensive
  • Stack frame creation/destruction for each call
  • Parameter passing overhead
  • For 1 million items, 1 million function calls!

Use: Process Data in Batches

# GOOD - Process entire collection at once
def process_items_good(items):
    return sum(items)  # Built-in, implemented in C

# Or if you need custom processing
def process_items_batch(items):
    total = 0
    for item in items:  # Loop inside the function
        total += item
    return total

Why This is Fast:

  • Single function call regardless of data size
  • Loop overhead amortized over all items
  • Built-in functions use optimized C code

Real-World Example

# ❌ Bad - one API call per item
def save_users_bad(users):
    for user in users:
        database.save(user)  # Network round-trip each time

# ✅ Good - batch API call
def save_users_good(users):
    database.bulk_save(users)  # Single network round-trip

# ❌ Bad - one validation per call
def validate_emails_bad(emails):
    valid = []
    for email in emails:
        if is_valid_email(email):  # Function call per email
            valid.append(email)
    return valid

# ✅ Good - validate in batch
def validate_emails_good(emails):
    return [email for email in emails
            if '@' in email and '.' in email]  # Inline check


10. Advanced Built-in Functions and Techniques

all() and any() - Short-Circuit Evaluation

# ❌ Bad - checks every element even after finding result
def has_negative_manual(numbers):
    found = False
    for num in numbers:
        if num < 0:
            found = True
            break  # Manual short-circuit
    return found

# ✅ Good - built-in with automatic short-circuit
def has_negative_builtin(numbers):
    return any(num < 0 for num in numbers)

# Why it's better:
numbers = list(range(-1, 1000000))
# any() stops at -1 (first element)
# Manual loop in Python is slower even with break

Key Point: any() and all() are implemented in C and stop as soon as the result is determined.

# Real-world examples
def validate_data(records):
    # Check if all records are valid (stops at first invalid)
    return all(record.get('id') and record.get('name') for record in records)

def has_error(responses):
    # Check if any response has error (stops at first error)
    return any(resp.status_code >= 400 for resp in responses)

enumerate() - Better Than range(len())

# ❌ Bad - manual indexing
items = ['apple', 'banana', 'cherry']
for i in range(len(items)):
    print(f"{i}: {items[i]}")  # Extra lookup

# ✅ Good - enumerate provides index and value
for i, item in enumerate(items):
    print(f"{i}: {item}")  # No lookup needed

# Start from different index
for i, item in enumerate(items, start=1):
    print(f"{i}: {item}")  # 1: apple, 2: banana, 3: cherry

zip() - Parallel Iteration

# ❌ Bad - manual indexing for parallel lists
names = ['Alice', 'Bob', 'Charlie']
ages = [30, 25, 35]
cities = ['NYC', 'LA', 'Chicago']

for i in range(len(names)):
    print(f"{names[i]}, {ages[i]}, {cities[i]}")

# ✅ Good - zip combines iterables
for name, age, city in zip(names, ages, cities):
    print(f"{name}, {age}, {city}")

# Create dictionary from two lists
keys = ['a', 'b', 'c']
values = [1, 2, 3]
dict_from_zip = dict(zip(keys, values))  # {'a': 1, 'b': 2, 'c': 3}

# Unzip (transpose)
pairs = [(1, 'a'), (2, 'b'), (3, 'c')]
numbers, letters = zip(*pairs)
# numbers = (1, 2, 3), letters = ('a', 'b', 'c')

itertools - The Power Tools

from itertools import chain, islice, groupby, accumulate, product, combinations

# 1. chain - flatten multiple iterables (no intermediate list!)
# ❌ Bad
list1 = [1, 2, 3]
list2 = [4, 5, 6]
combined = list1 + list2  # Creates new list

# ✅ Good
combined = chain(list1, list2)  # Lazy iterator
for item in combined:
    print(item)  # No intermediate list created

# 2. islice - slicing iterators without loading everything
# ❌ Bad - loads entire file
with open('huge_file.txt') as f:
    first_10 = list(f)[:10]  # Loads entire file!

# ✅ Good - only reads what's needed
from itertools import islice
with open('huge_file.txt') as f:
    first_10 = list(islice(f, 10))  # Reads only 10 lines

# 3. groupby - group consecutive items
from operator import itemgetter
data = [
    {'category': 'A', 'value': 1},
    {'category': 'A', 'value': 2},
    {'category': 'B', 'value': 3},
    {'category': 'B', 'value': 4}
]
data.sort(key=itemgetter('category'))  # MUST be sorted first!

for category, items in groupby(data, key=itemgetter('category')):
    print(f"{category}: {list(items)}")

# 4. accumulate - running totals
from itertools import accumulate
numbers = [1, 2, 3, 4, 5]
running_sum = list(accumulate(numbers))  # [1, 3, 6, 10, 15]

# Custom operation
running_product = list(accumulate(numbers, lambda x, y: x * y))
# [1, 2, 6, 24, 120]

# 5. product - cartesian product (nested loops)
# ❌ Bad - manual nested loops
result = []
for color in ['red', 'blue']:
    for size in ['S', 'M', 'L']:
        result.append((color, size))

# ✅ Good - itertools.product
from itertools import product
result = list(product(['red', 'blue'], ['S', 'M', 'L']))
# [('red', 'S'), ('red', 'M'), ('red', 'L'), ('blue', 'S'), ...]

# 6. combinations and permutations
from itertools import combinations, permutations
items = ['A', 'B', 'C']

# All 2-item combinations (order doesn't matter)
list(combinations(items, 2))
# [('A', 'B'), ('A', 'C'), ('B', 'C')]

# All 2-item permutations (order matters)
list(permutations(items, 2))
# [('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]

set Operations - O(1) Membership Testing

# ❌ Bad - O(n) lookup in list
allowed_users = ['alice', 'bob', 'charlie']  # List
for user in all_users:
    if user in allowed_users:  # O(n) check!
        process(user)

# ✅ Good - O(1) lookup in set
allowed_users = {'alice', 'bob', 'charlie'}  # Set
for user in all_users:
    if user in allowed_users:  # O(1) check!
        process(user)

# Set operations
set1 = {1, 2, 3, 4, 5}
set2 = {4, 5, 6, 7, 8}

# Intersection (common elements)
common = set1 & set2  # {4, 5}

# Union (all elements)
all_elements = set1 | set2  # {1, 2, 3, 4, 5, 6, 7, 8}

# Difference (in set1 but not set2)
only_in_set1 = set1 - set2  # {1, 2, 3}

# Symmetric difference (in either but not both)
symmetric = set1 ^ set2  # {1, 2, 3, 6, 7, 8}

# Remove duplicates while preserving order (Python 3.7+)
items = [1, 2, 2, 3, 1, 4, 3, 5]
unique = list(dict.fromkeys(items))  # [1, 2, 3, 4, 5]

functools - Function Tools

from functools import lru_cache, partial, reduce

# 1. lru_cache - Memoization (caching results)
# ❌ Bad - recalculates same values
def fibonacci_slow(n):
    if n < 2:
        return n
    return fibonacci_slow(n-1) + fibonacci_slow(n-2)

# fibonacci_slow(35) takes ~5 seconds

# ✅ Good - caches results
@lru_cache(maxsize=128)
def fibonacci_fast(n):
    if n < 2:
        return n
    return fibonacci_fast(n-1) + fibonacci_fast(n-2)

# fibonacci_fast(35) takes ~0.0001 seconds!

# Real-world example: expensive API call
@lru_cache(maxsize=100)
def get_user_data(user_id):
    # Expensive database query or API call
    return database.query(user_id)

# 2. partial - Pre-fill function arguments
def power(base, exponent):
    return base ** exponent

square = partial(power, exponent=2)
cube = partial(power, exponent=3)

print(square(5))  # 25
print(cube(5))    # 125

# Useful with map/filter
from functools import partial
from operator import mul

double = partial(mul, 2)
numbers = [1, 2, 3, 4, 5]
doubled = list(map(double, numbers))  # [2, 4, 6, 8, 10]

# 3. reduce - Cumulative operations
from functools import reduce
from operator import mul

numbers = [1, 2, 3, 4, 5]
product = reduce(mul, numbers)  # 1*2*3*4*5 = 120

# More readable alternatives exist for common cases:
# sum() instead of reduce(add, numbers)
# math.prod() (Python 3.8+) instead of reduce(mul, numbers)

collections Module Power Tools

from collections import deque, namedtuple, ChainMap

# 1. deque - Fast appends/pops from both ends
# ❌ Bad - list is O(n) for left operations
my_list = [1, 2, 3]
my_list.insert(0, 0)  # O(n) - shifts all elements

# ✅ Good - deque is O(1) for both ends
from collections import deque
my_deque = deque([1, 2, 3])
my_deque.appendleft(0)  # O(1)
my_deque.append(4)      # O(1)
my_deque.popleft()      # O(1)
my_deque.pop()          # O(1)

# Ring buffer / sliding window
recent_items = deque(maxlen=5)  # Only keeps last 5 items
for item in range(10):
    recent_items.append(item)
print(recent_items)  # deque([5, 6, 7, 8, 9])

# 2. namedtuple - Lightweight object
# ❌ Bad - dictionary overhead
user = {'name': 'Alice', 'age': 30, 'email': 'alice@example.com'}
print(user['name'])  # Dictionary lookup

# ✅ Good - namedtuple is faster and cleaner
from collections import namedtuple
User = namedtuple('User', ['name', 'age', 'email'])
user = User('Alice', 30, 'alice@example.com')
print(user.name)  # Attribute access (faster)
print(user[0])    # Also supports indexing

# Immutable and memory efficient
# Uses tuple's memory layout but with named access

# 3. ChainMap - Combine multiple dictionaries
# ❌ Bad - creates new dictionary
config = {**defaults, **user_config, **override}  # Memory overhead

# ✅ Good - ChainMap provides view without copying
from collections import ChainMap
combined = ChainMap(override, user_config, defaults)
# Lookups check each dict in order, no copying

Context Managers - Resource Management

# ❌ Bad - manual resource management
file = open('data.txt', 'r')
try:
    data = file.read()
    process(data)
finally:
    file.close()  # Easy to forget!

# ✅ Good - automatic cleanup
with open('data.txt', 'r') as file:
    data = file.read()
    process(data)
# File automatically closed even if exception occurs

# Multiple context managers
with open('input.txt') as infile, open('output.txt', 'w') as outfile:
    for line in infile:
        outfile.write(process(line))

# Custom context manager with contextlib
from contextlib import contextmanager
import time

@contextmanager
def timer(name):
    start = time.time()
    yield  # Control returns to caller
    end = time.time()
    print(f"{name} took {end-start:.4f}s")

# Usage
with timer("Data processing"):
    process_large_dataset()

Assignment Expressions (Walrus Operator) - Python 3.8+

# ❌ Bad - compute twice or extra variable
data = get_data()
if len(data) > 0:
    process(data)

# Or
if len(get_data()) > 0:  # Calls get_data() twice!
    process(get_data())

# ✅ Good - compute once, use in condition
if (n := len(data)) > 0:
    print(f"Processing {n} items")
    process(data)

# Useful in list comprehensions
# ❌ Bad - expensive_function called twice per item
results = [expensive_function(x) for x in items
           if expensive_function(x) > threshold]

# ✅ Good - called once per item
results = [y for x in items
           if (y := expensive_function(x)) > threshold]

# While loop pattern
# ❌ Bad
line = file.readline()
while line:
    process(line)
    line = file.readline()

# ✅ Good
while (line := file.readline()):
    process(line)

Let's break down this walrus-operator line in detail:


🔍 Code in Question

if (n := len(data)) > 0:
    print(f"Processing {n} items")
    process(data)


What’s Happening Here?

This line uses the walrus operator :=, which was introduced in Python 3.8. It allows you to assign a value to a variable as part of an expression — especially useful in conditions.


🧠 Step-by-Step Explanation

  1. len(data):

    • Calculates the number of items in the data list (or any iterable).
  2. n := len(data):

    • Assigns the result of len(data) to the variable n.
    • This is done inside the if condition, so you don't need a separate assignment (n = len(data)) followed by a separate check (if n > 0:).
  3. if (n := len(data)) > 0:

    • Checks if the length of data is greater than 0.
    • If true, it enters the block and uses n (already computed) without recalculating len(data).
  4. print(f"Processing {n} items"):

    • Prints how many items are being processed.
  5. process(data):

    • Calls a function named process() and passes the data to it.

📌 Why Is This Good Practice?

  • Efficiency: You compute len(data) once, but use it twice — in the condition and in the print statement.
  • Cleaner Code: Avoids repetition and keeps logic compact.
  • Readability: Once you're familiar with the walrus operator, it makes code more expressive.

🧪 Example Without Walrus Operator

n = len(data)
if n > 0:
    print(f"Processing {n} items")
    process(data)

This is perfectly fine too — but the walrus operator lets you do it in one line.


11. Profiling: Finding the Real Bottlenecks

Why Profile?

# You might think THIS is slow:
def process_data(data):
    # Complex calculation
    results = [x**2 + x**3 + x**4 for x in data]  # Suspicious!

    # Simple operation
    return '\n'.join(map(str, results))  # Looks innocent

# But profiling reveals the truth:
# 95% of time is spent in join()!
# Only 5% in the calculation

Basic Profiling with cProfile

import cProfile
import pstats

def my_function():
    # Your code here
    data = list(range(10000))
    result = [x**2 for x in data]
    return sum(result)

# Profile the function
cProfile.run('my_function()', 'output.prof')

# Analyze results
stats = pstats.Stats('output.prof')
stats.sort_stats('cumulative')
stats.print_stats(10)  # Top 10 slowest

Line-by-Line Profiling

# Install: pip install line_profiler
# Add @profile decorator

@profile
def slow_function():
    result = []
    for i in range(10000):
        result.append(i ** 2)  # How slow is this?

    total = sum(result)  # How about this?

    return total

# Run: kernprof -l -v script.py
# Shows time spent on EACH LINE

Memory Profiling

# Install: pip install memory_profiler
from memory_profiler import profile

@profile
def memory_hog():
    # ❌ Bad - creates huge list
    big_list = [i for i in range(10000000)]

    # ✅ Good - uses generator
    big_gen = (i for i in range(10000000))

    return sum(big_gen)

# Run: python -m memory_profiler script.py

Quick Timing with timeit

import timeit

# Compare different approaches
setup = "data = list(range(1000))"

# Approach 1
time1 = timeit.timeit(
    "result = [x**2 for x in data]",
    setup=setup,
    number=10000
)

# Approach 2
time2 = timeit.timeit(
    "result = list(map(lambda x: x**2, data))",
    setup=setup,
    number=10000
)

print(f"List comp: {time1:.4f}s")
print(f"Map: {time2:.4f}s")
print(f"Winner: {'List comp' if time1 < time2 else 'Map'}")


12. Memory Optimization Techniques

Generators vs Lists

# ❌ Bad - loads everything into memory
def read_large_file_bad(filename):
    with open(filename) as f:
        return [line.strip() for line in f]  # All lines in memory!

# For 1GB file, needs 1GB+ RAM
lines = read_large_file_bad('huge.txt')
for line in lines:
    process(line)

# ✅ Good - processes one line at a time
def read_large_file_good(filename):
    with open(filename) as f:
        for line in f:  # Generator, not list!
            yield line.strip()

# Only one line in memory at a time
for line in read_large_file_good('huge.txt'):
    process(line)

# Generator expression vs list comprehension
# ❌ List comprehension - immediate memory allocation
squares_list = [x**2 for x in range(1000000)]  # ~8MB memory

# ✅ Generator expression - lazy evaluation
squares_gen = (x**2 for x in range(1000000))  # ~128 bytes
# Values computed on-demand

When to Use Generators:

  • Processing large datasets
  • Pipeline operations (one result feeds into another)
  • You don't need the entire result set at once
  • Memory is a concern

When to Use Lists:

  • Need multiple passes over data
  • Need random access (indexing)
  • Need to know the length
  • Dataset is small
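
For the pipeline case, generators can be chained so that no stage ever builds a full list; a sketch assuming a hypothetical numbers.txt with one integer per line:

def read_lines(filename):
    with open(filename) as f:
        for line in f:
            yield line.strip()

def non_empty(lines):
    return (line for line in lines if line)

def as_ints(lines):
    return (int(line) for line in lines)

# Each stage pulls one item at a time from the previous stage
total = sum(as_ints(non_empty(read_lines('numbers.txt'))))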

__slots__ - Reduce Object Memory

# ❌ Bad - each instance has __dict__ (overhead)
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

# Each instance: ~280 bytes (with __dict__)
p = Point(1, 2)
print(p.__dict__)  # {'x': 1, 'y': 2}

# ✅ Good - use __slots__ for memory efficiency
class PointSlots:
    __slots__ = ['x', 'y']  # Fixed attributes

    def __init__(self, x, y):
        self.x = x
        self.y = y

# Each instance: ~48 bytes (no __dict__)
# ~80% memory reduction!

# Real impact with many objects
# 1 million Points: ~280MB vs ~48MB
points = [PointSlots(i, i) for i in range(1000000)]

When to Use __slots__:

  • Creating millions of instances
  • Objects have fixed set of attributes
  • Memory is critical (e.g., data science, games)

Tradeoffs:

  • Can't add attributes dynamically (see the sketch below)
  • Slightly less flexible
  • No __dict__ attribute
  • Can't use with multiple inheritance (complex rules)
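
A quick sketch of that first tradeoff, reusing the Point and PointSlots classes from the block above:

p = Point(1, 2)
p.z = 3          # Works: regular instances carry a per-instance __dict__

ps = PointSlots(1, 2)
try:
    ps.z = 3     # Fails: 'z' is not declared in __slots__
except AttributeError as err:
    print(err)   # 'PointSlots' object has no attribute 'z'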

Array Module - Compact Numeric Arrays

# ❌ Bad - list of numbers (lots of overhead)
numbers_list = [1, 2, 3, 4, 5] * 100000
# Each integer object: ~28 bytes
# 500,000 integers: ~14MB

# ✅ Good - array module (C-style array)
from array import array
numbers_array = array('i', [1, 2, 3, 4, 5] * 100000)
# Each integer: 4 bytes
# 500,000 integers: ~2MB (7x less memory!)

# Type codes:
# 'b': signed char (1 byte)
# 'i': signed int (4 bytes)
# 'f': float (4 bytes)
# 'd': double (8 bytes)

# For numerical computing, use numpy (even better)
import numpy as np
numbers_numpy = np.array([1, 2, 3, 4, 5] * 100000, dtype=np.int32)
# Optimized for mathematical operations

String Interning

# Python automatically interns some strings
a = "hello"
b = "hello"
print(a is b)  # True - same object in memory

# Force interning for frequently used strings
from sys import intern

# ❌ Bad - many identical strings
tags = ["python", "python", "python"] * 10000
# Each "python" might be a separate object

# ✅ Good - intern frequently used strings
tags = [intern("python")] * 30000
# All reference same object

# Useful for:
# - Large datasets with repeated string values
# - Dictionary keys that repeat
# - Tag systems, category labels

Weak References - Avoid Circular References

import weakref

# ❌ Bad - circular references delay garbage collection
class Node:
    def __init__(self, value):
        self.value = value
        self.parent = None
        self.children = []

    def add_child(self, child):
        child.parent = self  # Strong reference cycle!
        self.children.append(child)

# Cycles like this are only reclaimed by the cyclic garbage collector,
# not by plain reference counting

# ✅ Good - use weak references
class NodeWeak:
    def __init__(self, value):
        self.value = value
        self._parent = None
        self.children = []

    @property
    def parent(self):
        return self._parent() if self._parent else None

    @parent.setter
    def parent(self, node):
        self._parent = weakref.ref(node) if node else None

    def add_child(self, child):
        child.parent = self  # Weak reference, can be freed
        self.children.append(child)


13. Algorithm Complexity Matters

Choose the Right Data Structure

# Searching: O(n) vs O(1)
# ❌ Bad - O(n) lookup in list
valid_ids = [1, 5, 10, 15, 20, 25, 30]  # 7 items
if user_id in valid_ids:  # Checks each item: O(n)
    grant_access()

# ✅ Good - O(1) lookup in set
valid_ids = {1, 5, 10, 15, 20, 25, 30}
if user_id in valid_ids:  # Hash lookup: O(1)
    grant_access()

# For 1 million IDs, 1 million lookups:
# List: ~a trillion operations
# Set: ~1 million operations (about a million times fewer!)

# Removing duplicates
# ❌ Bad - O(n²)
def remove_duplicates_bad(items):
    result = []
    for item in items:
        if item not in result:  # O(n) check
            result.append(item)
    return result
# 10,000 items: ~100 million operations

# ✅ Good - O(n)
def remove_duplicates_good(items):
    return list(dict.fromkeys(items))  # Preserves order (3.7+)
# 10,000 items: ~10,000 operations

# Or if order doesn't matter:
def remove_duplicates_set(items):
    return list(set(items))

Avoid Nested Loops When Possible

# ❌ Bad - O(n²) complexity
def find_common_bad(list1, list2):
    common = []
    for item1 in list1:      # O(n)
        for item2 in list2:  # O(m)
            if item1 == item2:
                common.append(item1)
    return common
# 1000 items each: 1 million comparisons

# ✅ Good - O(n + m) complexity
def find_common_good(list1, list2):
    set2 = set(list2)  # O(m)
    return [item for item in list1 if item in set2]  # O(n)
# 1000 items each: ~2000 operations (500x faster!)

# Even better - use set intersection
def find_common_best(list1, list2):
    return list(set(list1) & set(list2))  # O(min(n, m))

# Practical example: finding duplicate emails
# ❌ Bad - O(n²)
def find_duplicate_emails_bad(users):
    duplicates = []
    for i, user1 in enumerate(users):
        for user2 in users[i+1:]:
            if user1['email'] == user2['email']:
                duplicates.append(user1['email'])
    return duplicates

# ✅ Good - O(n)
def find_duplicate_emails_good(users):
    seen = set()
    duplicates = set()
    for user in users:
        email = user['email']
        if email in seen:
            duplicates.add(email)
        else:
            seen.add(email)
    return list(duplicates)

Use bisect for Sorted Data

import bisect

# ❌ Bad - linear search in sorted list O(n)
sorted_numbers = list(range(0, 1000000, 2))  # Even numbers

def find_position_bad(numbers, value):
    for i, num in enumerate(numbers):
        if num >= value:
            return i
    return len(numbers)

# ✅ Good - binary search O(log n)
def find_position_good(numbers, value):
    return bisect.bisect_left(numbers, value)

# Insert while maintaining sort order
# ❌ Bad - O(n log n) due to sort
def insert_sorted_bad(numbers, value):
    numbers.append(value)
    numbers.sort()

# ✅ Good - O(n) just for insertion
def insert_sorted_good(numbers, value):
    bisect.insort(numbers, value)

# Real-world example: maintaining sorted timestamps
from bisect import insort
timestamps = []

def add_event(event_time):
    insort(timestamps, event_time)  # Keeps sorted

def get_events_in_range(start, end):
    # O(log n) to find positions
    left = bisect.bisect_left(timestamps, start)
    right = bisect.bisect_right(timestamps, end)
    return timestamps[left:right]


14. String Operations Optimization

String Methods vs Regex

import re
import time

text = "Hello, World! This is a test." * 1000

# ❌ Slower - regex for simple operations
start = time.time()
for _ in range(10000):
    result = re.sub(r'test', 'exam', text)
print(f"Regex: {time.time() - start:.4f}s")
# Output: ~0.8s

# ✅ Faster - string method
start = time.time()
for _ in range(10000):
    result = text.replace('test', 'exam')
print(f"String method: {time.time() - start:.4f}s")
# Output: ~0.1s (8x faster!)

# Use regex only when you need pattern matching
# ❌ Overkill
if re.match(r'hello', text.lower()):
    pass

# ✅ Better
if text.lower().startswith('hello'):
    pass

# When regex is appropriate:
# - Pattern matching: r'\d{3}-\d{3}-\d{4}' for phone numbers
# - Complex replacements: r'(\w+)\s+(\w+)' -> r'\2 \1'
# - Multiple patterns: r'(cat|dog|bird)'

String Building with f-strings (Python 3.6+)

# Performance comparison
name = "Alice"
age = 30
city = "NYC"

# ❌ Slowest - concatenation
result = "Name: " + name + ", Age: " + str(age) + ", City: " + city

# ✅ Fast - % formatting
result = "Name: %s, Age: %d, City: %s" % (name, age, city)

# ✅ Fast - .format()
result = "Name: {}, Age: {}, City: {}".format(name, age, city)

# ✅ Fastest - f-strings (and most readable!)
result = f"Name: {name}, Age: {age}, City: {city}"

# f-strings can also include expressions
prices = [10.50, 23.75, 5.25]
result = f"Total: ${sum(prices):.2f}"  # Total: $39.50

# Multi-line f-strings
message = (
    f"User: {name}\n"
    f"Age: {age}\n"
    f"City: {city}"
)

str.translate() - Fast Character Replacement

# ❌ Bad - multiple replace calls
def remove_punctuation_bad(text):
    for char in '.,;:!?':
        text = text.replace(char, '')
    return text

# ✅ Good - single translate call
def remove_punctuation_good(text):
    translator = str.maketrans('', '', '.,;:!?')
    return text.translate(translator)

# Even more complex translations
def leetspeak(text):
    translation_table = str.maketrans({
        'a': '4',
        'e': '3',
        'i': '1',
        'o': '0',
        's': '5',
        't': '7'
    })
    return text.lower().translate(translation_table)

print(leetspeak("Hello World"))  # h3ll0 w0rld

# Performance comparison
import time
text = "Hello, World! How are you today?" * 10000

start = time.time()
for _ in range(1000):
    result = remove_punctuation_bad(text)
print(f"Multiple replace: {time.time() - start:.4f}s")
# Output: ~1.2s

start = time.time()
for _ in range(1000):
    result = remove_punctuation_good(text)
print(f"Translate: {time.time() - start:.4f}s")
# Output: ~0.15s (8x faster!)


15. Comprehensions vs map/filter

When to Use Each

numbers = range(1000)

# List comprehension - clear and Pythonic
squares = [x**2 for x in numbers]

# map with lambda - slower due to lambda
squares = list(map(lambda x: x**2, numbers))

# map with built-in function - FASTEST
from operator import mul
from functools import partial
double = partial(mul, 2)
doubled = list(map(double, numbers))

# ✅ Use list comprehension when:
# - Transformation is complex
# - Need filtering with transformation
# - Readability is priority
result = [x**2 for x in numbers if x % 2 == 0]

# ✅ Use map when:
# - Applying built-in function or C-function
# - Don't need intermediate list (return iterator)
result = map(str.upper, words)  # Iterator, no list created

# ✅ Use filter when:
# - Simple predicate function
# - Don't need transformation
result = filter(str.isdigit, characters)

Nested Comprehensions

# ❌ Bad - hard to read
matrix = []
for i in range(3):
    row = []
    for j in range(3):
        row.append(i * j)
    matrix.append(row)

# ✅ Good - nested comprehension
matrix = [[i * j for j in range(3)] for i in range(3)]
# [[0, 0, 0], [0, 1, 2], [0, 2, 4]]

# Flatten nested list
# ❌ Bad - manual loops
nested = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = []
for sublist in nested:
    for item in sublist:
        flat.append(item)

# ✅ Good - list comprehension
flat = [item for sublist in nested for item in sublist]
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

# ✅ Best for deeply nested - use itertools
from itertools import chain
flat = list(chain.from_iterable(nested))

# Dictionary comprehension
# ❌ Bad
word_lengths = {}
for word in words:
    word_lengths[word] = len(word)

# ✅ Good
word_lengths = {word: len(word) for word in words}

# Set comprehension
# ❌ Bad
unique_lengths = set()
for word in words:
    unique_lengths.add(len(word))

# ✅ Good
unique_lengths = {len(word) for word in words}


16. Number Operations Optimization

Integer Operations

# In Python, some operations are faster than others

# ❌ Slower
x = 47
result = x * 2  # Multiplication

# ✅ Faster (though less readable)
result = x << 1  # Bit shift (left shift by 1 = multiply by 2)

# BUT: In practice, the difference is tiny
# Only optimize this in extremely tight loops
# Readability usually trumps micro-optimization

# Division: // (floor division) vs / (true division)
import time

# True division (returns float)
start = time.time()
for i in range(10000000):
    result = i / 2
print(f"True division: {time.time() - start:.4f}s")
# Output: ~0.5s

# Floor division (returns int)
start = time.time()
for i in range(10000000):
    result = i // 2
print(f"Floor division: {time.time() - start:.4f}s")
# Output: ~0.4s (20% faster if you need integer result)

# Modulo operations
# Use & for power-of-2 modulo (much faster)
# ❌ Slower
if x % 2 == 0:  # Check if even
    pass

# ✅ Faster (but less clear)
if x & 1 == 0:  # Check if even using bitwise AND
    pass

Float Operations

# Avoid float when integers work
# ❌ Slower - unnecessary float operations
total = 0.0
for i in range(1000000):
    total += float(i)

# ✅ Faster - integer operations
total = 0
for i in range(1000000):
    total += i

# Use math module for complex operations
import math

# ❌ Slower - repeated exponentiation
result = x ** 0.5

# ✅ Faster - dedicated function
result = math.sqrt(x)

# ❌ Slower
result = x ** 2

# ✅ Faster (and clearer)
result = x * x

Decimal for Financial Calculations

# ❌ Bad - float precision issues
price = 0.1
quantity = 3
total = price * quantity
print(total)  # 0.30000000000000004 (WRONG for money!)

# ✅ Good - Decimal for exact precision
from decimal import Decimal

price = Decimal('0.1')
quantity = 3
total = price * quantity
print(total)  # 0.3 (CORRECT)

# Real-world example
from decimal import Decimal, ROUND_HALF_UP

def calculate_total(prices):
    total = sum(Decimal(str(price)) for price in prices)
    # Round to 2 decimal places for currency
    return total.quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)

prices = [10.10, 20.20, 30.33]
print(calculate_total(prices))  # 60.63


17. Exception Handling Performance

EAFP vs LBYL

# LBYL (Look Before You Leap)
# ❌ Slower when condition is usually true
if key in dictionary:
    value = dictionary[key]
else:
    value = default

# EAFP (Easier to Ask for Forgiveness than Permission)
# ✅ Faster when exception is rare
try:
    value = dictionary[key]
except KeyError:
    value = default

# Why EAFP is faster:
# - No extra dictionary lookup when key exists
# - Exceptions are cheap when not raised
# - Only pays penalty on actual error

# Performance comparison
import time

data = {i: i*2 for i in range(10000)}

# LBYL approach
start = time.time()
for i in range(10000):
    if i in data:        # Extra lookup
        value = data[i]  # Second lookup
print(f"LBYL: {time.time() - start:.4f}s")
# Output: ~0.003s

# EAFP approach
start = time.time()
for i in range(10000):
    try:
        value = data[i]  # Single lookup
    except KeyError:
        pass
print(f"EAFP: {time.time() - start:.4f}s")
# Output: ~0.002s (30% faster when key exists)

Avoid Exceptions in Hot Paths

# When exceptions are common, they're expensive
# ❌ Bad - exception raised 50% of the time
def process_items_bad(items):
    results = []
    for item in items:
        try:
            results.append(expensive_operation(item))
        except ValueError:  # Raised for 50% of items!
            results.append(None)
    return results

# ✅ Good - check first if exceptions are common
def process_items_good(items):
    results = []
    for item in items:
        if is_valid(item):  # Cheap check
            results.append(expensive_operation(item))
        else:
            results.append(None)
    return results


18. Parallel Processing

Threading vs Multiprocessing vs AsyncIO

import time
import threading
import multiprocessing
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def cpu_bound_task(n):
    """Simulates CPU-intensive work"""
    return sum(i*i for i in range(n))

def io_bound_task(n):
    """Simulates I/O wait (network, disk)"""
    time.sleep(0.1)  # Simulate I/O wait
    return n * 2

# ❌ Bad - sequential for I/O-bound tasks
def process_io_sequential(tasks):
    results = []
    for task in tasks:
        results.append(io_bound_task(task))
    return results
# Time: 10 tasks * 0.1s = 1 second

# ✅ Good - threading for I/O-bound
def process_io_threaded(tasks):
    with ThreadPoolExecutor(max_workers=10) as executor:
        results = list(executor.map(io_bound_task, tasks))
    return results
# Time: ~0.1 seconds (10x faster!)

# ❌ Bad - threading for CPU-bound (GIL limitation)
def process_cpu_threaded(tasks):
    with ThreadPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(cpu_bound_task, tasks))
    return results
# No speedup due to Global Interpreter Lock (GIL)

# ✅ Good - multiprocessing for CPU-bound
def process_cpu_multiprocessing(tasks):
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(cpu_bound_task, tasks))
    return results
# Time: ~1/4 on quad-core processor

# Use cases:
# Threading: I/O-bound (network requests, file I/O, database queries)
# Multiprocessing: CPU-bound (calculations, data processing, image processing)
# AsyncIO: Many concurrent I/O operations (thousands of connections)

AsyncIO for High Concurrency

import asyncio
import aiohttp  # pip install aiohttp

# ❌ Bad - sequential HTTP requests
import requests

def fetch_urls_sequential(urls):
    results = []
    for url in urls:
        response = requests.get(url)
        results.append(response.text)
    return results
# 10 URLs, 0.5s each = 5 seconds

# ✅ Good - async HTTP requests
async def fetch_url(session, url):
    async with session.get(url) as response:
        return await response.text()

async def fetch_urls_async(urls):
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_url(session, url) for url in urls]
        results = await asyncio.gather(*tasks)
        return results

# Run async function
results = asyncio.run(fetch_urls_async(urls))
# 10 URLs concurrently = ~0.5 seconds (10x faster!)

# When to use async:
# - Many concurrent I/O operations
# - Web scraping (many URLs)
# - API calls to multiple services
# - Database queries (with async driver)
# - WebSocket connections


19. Pro Tips and Best Practices

Use Virtual Environments

# Always use virtual environments
python -m venv venv
source venv/bin/activate   # Linux/Mac
venv\Scripts\activate      # Windows

# Why:
# - Isolates dependencies
# - Prevents version conflicts
# - Reproducible environments

Utilize Python's Standard Library

# Standard library is optimized and battle-tested
# Don't reinvent the wheel!

# ❌ Bad - manual date parsing
def parse_date_bad(date_string):
    parts = date_string.split('-')
    year = int(parts[0])
    month = int(parts[1])
    day = int(parts[2])
    # ... validation logic ...
    return (year, month, day)

# ✅ Good - use datetime
from datetime import datetime

def parse_date_good(date_string):
    return datetime.strptime(date_string, '%Y-%m-%d')

# Other powerful standard library modules:
# - collections: defaultdict, Counter, deque, namedtuple
# - itertools: chain, groupby, combinations, product
# - functools: lru_cache, partial, reduce
# - operator: itemgetter, attrgetter, methodcaller
# - pathlib: modern path handling
# - dataclasses: reduce boilerplate (Python 3.7+)

Type Hints and Static Analysis

# Use type hints for better tooling and catching errors
from typing import List, Dict, Optional, Union

def process_users(users: List[Dict[str, Union[str, int]]]) -> Optional[Dict[str, int]]:
    """Process users and return statistics.

    Args:
        users: List of user dictionaries

    Returns:
        Dictionary of statistics or None if empty
    """
    if not users:
        return None

    return {
        'total': len(users),
        'average_age': sum(u['age'] for u in users) // len(users)
    }

# Benefits:
# - IDEs provide better autocomplete
# - Catch type errors before runtime
# - Self-documenting code
# - Use mypy for static type checking: mypy script.py

Dataclasses (Python 3.7+)

# ❌ Bad - lots of boilerplate
class PointOld:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Point(x={self.x}, y={self.y})"

    def __eq__(self, other):
        return self.x == other.x and self.y == other.y

# ✅ Good - dataclass handles boilerplate
from dataclasses import dataclass, field
from typing import List

@dataclass
class Point:
    x: float
    y: float

    def distance(self) -> float:
        return (self.x**2 + self.y**2) ** 0.5

# Auto-generates: __init__, __repr__, __eq__, and more!
@dataclass
class User:
    name: str
    age: int
    emails: List[str] = field(default_factory=list)  # Mutable default
    active: bool = True  # Simple default

# Usage
user = User("Alice", 30)
print(user)  # User(name='Alice', age=30, emails=[], active=True)

Use Logging, Not Print

# ❌ Bad - debugging with print
def process_data(data):
    print(f"Processing {len(data)} items")  # Can't disable!
    result = expensive_operation(data)
    print(f"Result: {result}")  # Clutters output
    return result

# ✅ Good - use logging
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def process_data(data):
    logger.debug(f"Processing {len(data)} items")
    result = expensive_operation(data)
    logger.info("Processing complete")
    return result

# Benefits:
# - Can adjust verbosity (DEBUG, INFO, WARNING, ERROR)
# - Can log to files, not just console
# - Can disable in production
# - Includes timestamps and context


Summary: The Golden Rules of Python Performance

1. Measure First, Optimize Later

# Always profile before optimizing
import cProfile
cProfile.run('your_function()')

2. Use the Right Data Structure

  • List: Ordered, allows duplicates, O(n) membership
  • Tuple: Immutable list, slightly faster, can be dict key
  • Set: Unordered, no duplicates, O(1) membership
  • Dict: Key-value pairs, O(1) lookup
  • deque: Fast appends/pops from both ends
  • defaultdict: Dict with default values
  • Counter: Count hashable objects

3. Leverage Built-ins

# Built-ins are optimized C code
sum(numbers)      # vs manual loop
any(conditions)   # vs manual check
all(conditions)   # vs manual check
max(numbers)      # vs manual comparison
sorted(items)     # vs manual sort

4. Avoid Premature Optimization

# ❌ Don't optimize this (runs once)
config = eval(open('config.txt').read())  # Simple is fine

# ✅ Do optimize this (runs millions of times)
def hot_path(data):
    # This needs optimization
    pass

5. Write Pythonic Code

# Pythonic code is often faster AND more readable
# Use comprehensions, iterators, context managers, etc.

6. When All Else Fails

  • Use Cython to compile Python to C
  • Use NumPy for numerical operations
  • Use PyPy JIT compiler
  • Rewrite bottlenecks in C/Rust
  • Use numba JIT compilation for numerical code
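
As a sketch of the numba option: a single decorator JIT-compiles a numeric hot loop (this assumes numba is installed via pip install numba):

from numba import njit

@njit  # Compiled to machine code on first call
def sum_of_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

print(sum_of_squares(10_000_000))  # Subsequent calls run at near-C speed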

Final Wisdom

"Premature optimization is the root of all evil." - Donald Knuth

The best code is:

  1. Correct first
  2. Clear and maintainable
  3. Fast enough for your needs

Only optimize what profiling shows is actually slow!

Summary: The Golden Rules

  1. Profile First: Don't optimize without measuring
  2. Use Built-ins: They're implemented in C and heavily optimized
  3. Avoid String Concatenation: Use join() instead
  4. Cache Lookups: Store frequently accessed attributes in local variables
  5. Use Local Variables: Much faster than globals
  6. List Comprehensions: Usually faster and more readable than loops
  7. EAFP over LBYL: try-except is often faster than if checks
  8. Import Wisely: Keep imports out of loops
  9. Batch Operations: Process data in aggregate, not item-by-item
  10. Read the Docs: Standard library has optimized solutions for common problems

When to Optimize

# ❌ Don't optimize this (runs once)
def load_config():
    config = {}
    for line in open('config.txt'):
        # Not worth optimizing
        key, value = line.split('=')
        config[key] = value
    return config

# ✅ DO optimize this (runs millions of times)
def process_requests(requests):
    results = []
    append = results.append  # Worth optimizing
    for request in requests:
        append(expensive_operation(request))
    return results

Remember: Readable code that runs fast enough is better than optimized code that's hard to maintain!
