DEV Community

Muhammad Atif Iqbal

Explanation of the syntax `df['column'] = expression` in pandas

The syntax df['column'] = expression in pandas creates a new column in a DataFrame (df) or overwrites an existing one. Let’s break it down step by step, from basic to advanced usage.


Basic Level

1. Creating a New Column

  • When a column does not exist in the DataFrame, assigning values to df['column'] creates a new column.
  • Example:

     import pandas as pd
     df = pd.DataFrame({'A': [1, 2, 3]})
     print(df)
     # Output:
     #    A
     # 0  1
     # 1  2
     # 2  3
    
     # Create a new column 'B' with all values set to 0
     df['B'] = 0
     print(df)
     # Output:
     #    A  B
     # 0  1  0
     # 1  2  0
     # 2  3  0
    

2. Modifying an Existing Column

  • If the column already exists, assigning values replaces its contents.
  • Example:

     df['B'] = [4, 5, 6]  # Replace values in column 'B'
     print(df)
     # Output:
     #    A  B
     # 0  1  4
     # 1  2  5
     # 2  3  6
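
One detail worth knowing when replacing a column: a list is assigned by position and must match the number of rows, while a pandas Series is aligned on its index first. The following is a minimal sketch of both cases; the frame `demo` and its values are made up for illustration.

```python
import pandas as pd

demo = pd.DataFrame({'A': [1, 2, 3]})

# A list is assigned by position, so its length must match the number of rows.
try:
    demo['B'] = [4, 5]  # only two values for three rows
    length_error = None
except ValueError as exc:
    length_error = exc
print(length_error)

# A Series is aligned on index labels, not on position.
# Row labels missing from the Series (here, label 1) become NaN.
demo['B'] = pd.Series([40, 50], index=[2, 0])
print(demo)
```

Because of this alignment, assigning a Series whose index does not match the DataFrame can silently produce NaN values, so it is worth checking the index when results look unexpectedly empty.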
    

Intermediate Level

3. Assigning Values Based on an Expression

  • You can assign values to a column based on a computation or transformation.
  • Example:

     df['C'] = df['A'] + df['B']  # Create column 'C' as the sum of 'A' and 'B'
     print(df)
     # Output:
     #    A  B   C
     # 0  1  4   5
     # 1  2  5   7
     # 2  3  6   9
    

4. Assigning Values Using a Condition

  • You can assign values to a column conditionally by applying a function with a conditional expression to each element of another column.
  • Example:

     df['D'] = df['A'].apply(lambda x: 'Even' if x % 2 == 0 else 'Odd')
     print(df)
     # Output:
     #    A  B   C     D
     # 0  1  4   5   Odd
     # 1  2  5   7  Even
     # 2  3  6   9   Odd
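
The same result can also be reached with a boolean mask and .loc, which avoids calling a Python function per element. This is a minimal sketch on a separate frame; the names `demo`, `parity`, and `is_even` are chosen here for illustration.

```python
import pandas as pd

demo = pd.DataFrame({'A': [1, 2, 3]})

# Start with a default label for every row.
demo['parity'] = 'Odd'

# Boolean mask: True where 'A' is even.
is_even = demo['A'] % 2 == 0

# Overwrite only the rows selected by the mask.
demo.loc[is_even, 'parity'] = 'Even'
print(demo)
```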
    

5. Using Multiple Columns in Expressions

  • You can use multiple columns in a single expression for more complex computations.
  • Example:

     df['E'] = (df['A'] + df['B']) * df['C']
     print(df)
     # Output:
     #    A  B   C     D    E
     # 0  1  4   5   Odd   25
     # 1  2  5   7  Even   49
     # 2  3  6   9   Odd   81
    

Advanced Level

6. Vectorized Operations

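  • The right-hand side can be a vectorized expression, which operates on whole columns at once instead of looping over elements in Python. This is usually the fastest way to compute a column.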
  • Example:

     df['F'] = df['A'] ** 2 + df['B'] ** 2  # Fast vectorized calculation
     print(df)
     # Output:
     #    A  B   C     D    E   F
     # 0  1  4   5   Odd   25  17
     # 1  2  5   7  Even   49  29
     # 2  3  6   9   Odd   81  45
    

7. Assigning Values with Conditional Logic Using np.where

  • You can use numpy for conditional assignment.
  • Example:

     import numpy as np
     df['G'] = np.where(df['A'] > 2, 'High', 'Low')
     print(df)
     # Output:
     #    A  B   C     D    E   F     G
     # 0  1  4   5   Odd   25  17   Low
     # 1  2  5   7  Even   49  29   Low
     # 2  3  6   9   Odd   81  45  High
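
np.where handles a two-way choice. For more than two outcomes, one option is np.select, which takes a list of conditions and a matching list of values. The sketch below assumes three made-up labels and a column named `level`; the first condition that is True for a row wins.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({'A': [1, 2, 3]})

# Conditions are checked in order; the first match decides the label.
conditions = [demo['A'] >= 3, demo['A'] == 2]
labels = ['High', 'Medium']

# Rows that match no condition get the default label.
demo['level'] = np.select(conditions, labels, default='Low')
print(demo)
```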
    

8. Assigning Values Using External Functions

  • Assign column values based on a custom function applied to rows or columns. With axis=1, df.apply passes each row to the function as a Series.
  • Example:

     def custom_function(row):
         return row['A'] * row['B']
    
     df['H'] = df.apply(custom_function, axis=1)
     print(df)
     # Output:
     #    A  B   C     D    E   F     G   H
     # 0  1  4   5   Odd   25  17   Low   4
     # 1  2  5   7  Even   49  29   Low  10
     # 2  3  6   9   Odd   81  45  High  18
    

9. Chaining Operations

  • You can chain multiple operations to keep your code concise.
  • Example:

     df['I'] = df['A'].add(df['B']).mul(df['C'])
     print(df)
     # Output:
     #    A  B   C     D    E   F     G   H    I
     # 0  1  4   5   Odd   25  17   Low   4   25
     # 1  2  5   7  Even   49  29   Low  10   49
     # 2  3  6   9   Odd   81  45  High  18   81
    

10. Assigning Multiple Columns at Once

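  • Use assign() to create or modify multiple columns in a single call. It returns a new DataFrame rather than changing df in place, and a lambda argument can refer to columns created earlier in the same call (here, K uses J).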
  • Example:

     df = df.assign(
         J=df['A'] + df['B'],
         K=lambda x: x['J'] * 2
     )
     print(df)
     # Output:
     #    A  B   C     D    E   F     G   H    I   J   K
     # 0  1  4   5   Odd   25  17   Low   4   25   5  10
     # 1  2  5   7  Even   49  29   Low  10   49   7  14
     # 2  3  6   9   Odd   81  45  High  18   81   9  18
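
To make the "returns a new DataFrame" behaviour concrete, here is a small sketch on a fresh frame. The names `original` and `extended` are illustrative only.

```python
import pandas as pd

original = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# assign() returns a new DataFrame; 'original' is left untouched.
extended = original.assign(
    J=original['A'] + original['B'],
    K=lambda x: x['J'] * 2,  # the lambda receives the frame with 'J' already added
)

print(list(original.columns))
print(list(extended.columns))
```

This is why the example above writes df = df.assign(...): without the reassignment, the new columns would be discarded.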
    

Expert Level

11. Dynamic Column Assignment

  • Dynamically create column names based on external inputs.
  • Example:

     columns_to_add = ['L', 'M']
     for col in columns_to_add:
         df[col] = df['A'] + df['B']
     print(df)
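
Column names can also be built from the input itself, for example with an f-string. The sketch below assumes a hypothetical list of exponents, `powers`, as the external input and derives one column per entry.

```python
import pandas as pd

demo = pd.DataFrame({'A': [1, 2, 3]})

# Hypothetical external input: which powers of 'A' to compute.
powers = [2, 3]

for p in powers:
    # The column name is derived from the loop variable.
    demo[f'A_pow_{p}'] = demo['A'] ** p

print(demo)
```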
    

12. Using External Data for Assignment

  • Assign values to a column based on an external DataFrame or a dictionary.
  • Example:

     mapping = {1: 'Low', 2: 'Medium', 3: 'High'}
     df['N'] = df['A'].map(mapping)
     print(df)
     # Output:
     #    A  B   C     D    E   F     G   H    I   J   K   L   M       N
     # 0  1  4   5   Odd   25  17   Low   4   25   5  10   5   5     Low
     # 1  2  5   7  Even   49  29   Low  10   49   7  14   7   7  Medium
     # 2  3  6   9   Odd   81  45  High  18   81   9  18   9   9    High
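
The example above uses a dictionary. When the lookup values live in another DataFrame, one approach is to pass one of its columns (a Series) to map, which looks each value up in that Series' index. The tables `orders` and `products` below are hypothetical and exist only to illustrate the idea.

```python
import pandas as pd

orders = pd.DataFrame({'product_id': [1, 2, 3, 2]})

# Hypothetical lookup table, indexed by the key we want to match on.
products = pd.DataFrame(
    {'name': ['pen', 'ink', 'pad']},
    index=pd.Index([1, 2, 3], name='product_id'),
)

# map() accepts a Series and looks each value up in that Series' index.
orders['name'] = orders['product_id'].map(products['name'])
print(orders)
```

Values with no match in the lookup index come back as NaN, the same as with a dictionary.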
    

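13. Performance Optimization

  • Prefer vectorized operations (column arithmetic, np.where, map) over Python loops when assigning values. Note that apply with axis=1 calls a Python function once per row, so it is usually much slower than an equivalent vectorized expression; reserve it for logic that cannot be vectorized.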
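
A rough way to see the difference is to time a row-wise apply against the equivalent vectorized expression. This is only a sketch: the frame size is arbitrary, and the absolute timings depend on the machine and the pandas version.

```python
import time

import numpy as np
import pandas as pd

n = 20_000  # arbitrary size, large enough to make the gap visible
demo = pd.DataFrame({
    'A': np.arange(n, dtype='int64'),
    'B': np.arange(n, dtype='int64'),
})

# Row-wise apply: the lambda is called once per row.
start = time.perf_counter()
via_apply = demo.apply(lambda row: row['A'] * row['B'], axis=1)
apply_seconds = time.perf_counter() - start

# Vectorized: a single operation over whole columns.
start = time.perf_counter()
vectorized = demo['A'] * demo['B']
vectorized_seconds = time.perf_counter() - start

# Both approaches compute the same values; only the running time differs.
print((via_apply == vectorized).all())
print(f"apply: {apply_seconds:.4f}s, vectorized: {vectorized_seconds:.4f}s")
```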

Takeaway

The syntax df['column'] = expression is a core feature of pandas and is highly versatile. It allows you to:

  • Add, modify, and manipulate columns in a DataFrame.
  • Perform complex computations, including condition-based logic and multi-column transformations.
  • Chain operations and dynamically generate new columns.

This makes pandas a powerful library for data manipulation and analysis.
