DEV Community

Muhammad Atif Iqbal

Explanation of the syntax `df['column'] = expression` in pandas

The syntax df['column'] = expression in pandas creates a new column in a DataFrame (df) or overwrites an existing one with the result of the expression. Let’s break it down step by step, from basic to advanced levels.


Basic Level

1. Creating a New Column

  • When a column does not exist in the DataFrame, assigning to df['column'] creates it. A scalar value is broadcast to every row.
  • Example:

     import pandas as pd
     df = pd.DataFrame({'A': [1, 2, 3]})
     print(df)
     # Output:
     #    A
     # 0  1
     # 1  2
     # 2  3
    
     # Create a new column 'B' with all values set to 0
     df['B'] = 0
     print(df)
     # Output:
     #    A  B
     # 0  1  0
     # 1  2  0
     # 2  3  0
    

2. Modifying an Existing Column

  • If the column already exists, the assignment replaces its contents. A list must supply exactly one value per row.
  • Example:

     df['B'] = [4, 5, 6]  # Replace values in column 'B'
     print(df)
     # Output:
     #    A  B
     # 0  1  4
     # 1  2  5
     # 2  3  6
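
One detail worth knowing at this point: a plain list is assigned by position and must match the DataFrame's length, while a pandas Series is aligned on its index labels. The following is a minimal sketch of both behaviours on a separate, hypothetical DataFrame named demo (not the df used in the article):

```python
import pandas as pd

# Hypothetical frame used only for this illustration
demo = pd.DataFrame({'A': [1, 2, 3]})

# A list is assigned by position and must have exactly len(demo) items
try:
    demo['B'] = [4, 5]
    length_error = None
except ValueError as exc:
    length_error = exc
print(type(length_error).__name__)

# A Series is aligned on index labels, not position;
# rows whose label is missing from the Series become NaN
demo['B'] = pd.Series([10, 20], index=[2, 0])
print(demo)
```

Here row 0 receives 20, row 2 receives 10, and row 1 is left as NaN, so the column becomes a float column.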
    

Intermediate Level

3. Assigning Values Based on an Expression

  • You can assign values to a column based on a computation or transformation.
  • Example:

     df['C'] = df['A'] + df['B']  # Create column 'C' as the sum of 'A' and 'B'
     print(df)
     # Output:
     #    A  B   C
     # 0  1  4   5
     # 1  2  5   7
     # 2  3  6   9
    

4. Assigning Values Using a Condition

  • You can derive a column's values from a condition. The example below uses Series.apply to run a lambda on each element of column 'A'.
  • Example:

     df['D'] = df['A'].apply(lambda x: 'Even' if x % 2 == 0 else 'Odd')
     print(df)
     # Output:
     #    A  B   C     D
     # 0  1  4   5   Odd
     # 1  2  5   7  Even
     # 2  3  6   9   Odd
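
The same labels can also be produced with a boolean mask instead of a lambda. This is a minimal sketch on a hypothetical demo DataFrame with a hypothetical 'parity' column, assuming a default label is acceptable for the rows that do not match:

```python
import pandas as pd

# Hypothetical frame used only for this illustration
demo = pd.DataFrame({'A': [1, 2, 3]})

# Start with a default label for every row
demo['parity'] = 'Odd'

# Boolean Series: True where 'A' is even
is_even = demo['A'] % 2 == 0

# Overwrite only the rows selected by the mask
demo.loc[is_even, 'parity'] = 'Even'
print(demo)
```

The mask is computed for the whole column at once, so no Python function is called per element.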
    

5. Using Multiple Columns in Expressions

  • You can use multiple columns in a single expression for more complex computations.
  • Example:

     df['E'] = (df['A'] + df['B']) * df['C']
     print(df)
     # Output:
     #    A  B   C     D    E
     # 0  1  4   5   Odd   25
     # 1  2  5   7  Even   49
     # 2  3  6   9   Odd   81
    

Advanced Level

6. Vectorized Operations

  • Arithmetic on whole columns is vectorized: pandas evaluates it over the underlying arrays instead of looping in Python, which is much faster on large DataFrames.
  • Example:

     df['F'] = df['A'] ** 2 + df['B'] ** 2  # Fast vectorized calculation
     print(df)
     # Output:
     #    A  B   C     D    E   F
     # 0  1  4   5   Odd   25  17
     # 1  2  5   7  Even   49  29
     # 2  3  6   9   Odd   81  45
    

7. Assigning Values with Conditional Logic Using np.where

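  • np.where(condition, x, y) from NumPy returns x where the condition is True and y elsewhere, which makes it a vectorized way to assign one of two values.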
  • Example:

     import numpy as np
     df['G'] = np.where(df['A'] > 2, 'High', 'Low')
     print(df)
     # Output:
     #    A  B   C     D    E   F     G
     # 0  1  4   5   Odd   25  17   Low
     # 1  2  5   7  Even   49  29   Low
     # 2  3  6   9   Odd   81  45  High
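
np.where covers two outcomes. When there are more, one option is np.select, which takes a list of conditions and a matching list of choices. The sketch below uses a hypothetical demo DataFrame and hypothetical thresholds chosen only for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical frame used only for this illustration
demo = pd.DataFrame({'A': [1, 2, 3]})

# Conditions are checked in order; the first one that is True wins
conditions = [demo['A'] < 2, demo['A'] == 2]
choices = ['Low', 'Medium']

# Rows matching no condition receive the default
demo['level'] = np.select(conditions, choices, default='High')
print(demo)
```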
    

8. Assigning Values Using External Functions

  • Assign column values from a custom function. With df.apply(..., axis=1), the function is called once per row and receives that row as a Series.
  • Example:

     def custom_function(row):
         return row['A'] * row['B']
    
     df['H'] = df.apply(custom_function, axis=1)
     print(df)
     # Output:
     #    A  B   C     D    E   F     G   H
     # 0  1  4   5   Odd   25  17   Low   4
     # 1  2  5   7  Even   49  29   Low  10
     # 2  3  6   9   Odd   81  45  High  18
    

9. Chaining Operations

  • You can chain Series methods such as add() and mul() to express several operations in one concise statement.
  • Example:

     df['I'] = df['A'].add(df['B']).mul(df['C'])
     print(df)
     # Output:
     #    A  B   C     D    E   F     G   H    I
     # 0  1  4   5   Odd   25  17   Low   4   25
     # 1  2  5   7  Even   49  29   Low  10   49
     # 2  3  6   9   Odd   81  45  High  18   81
    

10. Assigning Multiple Columns at Once

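  • Use assign() to create or modify multiple columns in a single call. It returns a new DataFrame, and a lambda can refer to columns created earlier in the same call (here K uses J).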
  • Example:

     df = df.assign(
         J=df['A'] + df['B'],
         K=lambda x: x['J'] * 2
     )
     print(df)
     # Output:
     #    A  B   C     D    E   F     G   H    I   J   K
     # 0  1  4   5   Odd   25  17   Low   4   25   5  10
     # 1  2  5   7  Even   49  29   Low  10   49   7  14
     # 2  3  6   9   Odd   81  45  High  18   81   9  18
    

Expert Level

11. Dynamic Column Assignment

  • Because a column name is an ordinary string, it can come from a variable, so columns can be created in a loop or from external input.
  • Example:

     columns_to_add = ['L', 'M']
     for col in columns_to_add:
         df[col] = df['A'] + df['B']
     print(df)
    

12. Using External Data for Assignment

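  • Assign values to a column by looking them up in external data, such as a dictionary or another DataFrame. With Series.map and a dictionary, values that have no matching key become NaN.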
  • Example:

     mapping = {1: 'Low', 2: 'Medium', 3: 'High'}
     df['N'] = df['A'].map(mapping)
     print(df)
     # Output (includes columns 'L' and 'M' added in the previous step):
     #    A  B   C     D    E   F     G   H    I   J   K   L   M       N
     # 0  1  4   5   Odd   25  17   Low   4   25   5  10   5   5     Low
     # 1  2  5   7  Even   49  29   Low  10   49   7  14   7   7  Medium
     # 2  3  6   9   Odd   81  45  High  18   81   9  18   9   9    High
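
When the external data lives in another DataFrame rather than a dictionary, one possible approach is to turn it into a Series indexed by the lookup key and pass that to map. The lookup table, its column names ('key', 'label'), and the demo frame below are hypothetical:

```python
import pandas as pd

# Hypothetical frame used only for this illustration
demo = pd.DataFrame({'A': [1, 2, 3]})

# Hypothetical lookup table held in a second DataFrame
lookup = pd.DataFrame({'key': [1, 2, 3],
                       'label': ['Low', 'Medium', 'High']})

# Index the labels by key, then map each value of 'A' through them
labels_by_key = lookup.set_index('key')['label']
demo['N'] = demo['A'].map(labels_by_key)
print(demo)
```

For lookups on several key columns, a merge is usually the more natural tool; the sketch above only covers the single-key case.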
    

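13. Performance Optimization

  • Prefer vectorized column operations and pandas' built-in methods over Python loops when assigning values. Note that apply, especially with axis=1, still calls a Python function once per element or row, so it is usually much slower than an equivalent vectorized expression and is best kept for logic that cannot be vectorized.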
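
As a rough illustration of how row-wise apply compares with a vectorized expression, the sketch below times both on a hypothetical 20,000-row DataFrame of random integers. The frame, its size, and the timing method are assumptions made for this example, and the absolute numbers will vary by machine:

```python
import time

import numpy as np
import pandas as pd

# Hypothetical larger frame so the difference is measurable
rng = np.random.default_rng(0)
big = pd.DataFrame({'A': rng.integers(0, 100, size=20_000),
                    'B': rng.integers(0, 100, size=20_000)})

# Row-wise: the lambda is called once per row
start = time.perf_counter()
row_wise = big.apply(lambda row: row['A'] * row['B'], axis=1)
apply_seconds = time.perf_counter() - start

# Vectorized: one multiplication over whole columns
start = time.perf_counter()
vectorized = big['A'] * big['B']
vectorized_seconds = time.perf_counter() - start

# Both approaches give the same values; only the cost differs
assert (row_wise == vectorized).all()
print(f"apply(axis=1): {apply_seconds:.4f}s, vectorized: {vectorized_seconds:.4f}s")
```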

Takeaway

The syntax df['column'] = expression is a core feature of pandas and is highly versatile. It allows you to:

  • Add, modify, and manipulate columns in a DataFrame.
  • Perform complex computations, including condition-based logic and multi-column transformations.
  • Chain operations and dynamically generate new columns.

This makes pandas a powerful library for data manipulation and analysis.
