DEV Community

Vidyasagar SC Machupalli

Originally published at Medium

Pandas: Conversion using loc and iloc

Pandas is a powerful Python library used for data manipulation and analysis.

Created by Wes McKinney in 2008, it provides data structures and functions for working with structured data efficiently. Pandas allows users to analyze big data, clean messy datasets, and derive meaningful insights.


Photo by Kevin Canlas on Unsplash

Key Features of Pandas

  1. Data Structures: Pandas introduces two primary data structures:
     • Series: A one-dimensional labeled array
     • DataFrame: A two-dimensional labeled data structure with columns of potentially different types

  2. Data Manipulation: Pandas offers functions for analyzing, cleaning, exploring, and manipulating data.

  3. Data Analysis: It enables users to perform complex operations like correlation analysis, grouping, and statistical calculations.

  4. Data Visualization: Pandas integrates well with other libraries to create insightful visualizations.

Practical Examples

Indexing with loc

The loc indexer enables label-based indexing in DataFrames, allowing precise selection of rows and columns by their labels:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}, index=['x', 'y', 'z'])

# Select rows with label 'y' and 'z', and columns 'A' and 'C'
result = df.loc[['y', 'z'], ['A', 'C']]
print(result)

The iloc indexer provides integer position-based indexing for DataFrame selection:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Select rows 0 and 2, and columns 1 and 2
result = df.iloc[[0, 2], [1, 2]]
print(result)
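One difference between the two indexers that is easy to miss is how slices behave. The sketch below reuses the sample DataFrame from above (the variable names by_label, by_position, and filtered are only illustrative) to show that a loc slice includes its end label, an iloc slice excludes its end position, and loc also accepts a boolean mask:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}, index=['x', 'y', 'z'])

# Label slice: both endpoints are included, so this returns rows 'x' and 'y'
by_label = df.loc['x':'y', 'A':'B']

# Position slice: the end position is excluded, so 0:2 covers positions 0 and 1
by_position = df.iloc[0:2, 0:2]

# loc also accepts a boolean mask built from the data itself
filtered = df.loc[df['A'] > 1, ['A', 'C']]

print(by_label)
print(by_position)
print(filtered)
```

Here by_label and by_position select the same four cells, even though the slices look different.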

Date Conversion with to_datetime

The to_datetime function transforms various date formats into standardized datetime objects:

import pandas as pd

# Convert string to datetime
date_string = "2023-09-17 14:30:00"
dt_object = pd.to_datetime(date_string)
print(dt_object)

# Convert multiple date strings
date_series = pd.Series(['20200101', '20200201', '20200301'])
dt_series = pd.to_datetime(date_series, format='%Y%m%d')
print(dt_series)

Output:

2023-09-17 14:30:00
0   2020-01-01
1   2020-02-01
2   2020-03-01
dtype: datetime64[ns]
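Real datasets often contain values that do not match the expected format. As a minimal sketch (the sample strings are made up for illustration), the errors='coerce' option of to_datetime turns unparseable entries into NaT instead of raising an exception:

```python
import pandas as pd

# The second entry is deliberately not a valid date
raw = pd.Series(['2023-01-15', 'not a date', '2023-03-01'])

# errors='coerce' replaces values that cannot be parsed with NaT
parsed = pd.to_datetime(raw, format='%Y-%m-%d', errors='coerce')

print(parsed)
print(parsed.isna().sum())  # number of entries that failed to parse
```

Counting the NaT values afterwards is a quick way to check how many rows were affected.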

Pandas simplifies data manipulation tasks, making it an essential tool for data scientists and analysts. Tools like the loc and iloc indexers and the to_datetime function provide powerful ways to select and transform data, enabling efficient data processing and analysis in Python.

Something to consider while using loc or iloc

Let's convert the object column date to datetime using loc:

import pandas as pd

df = pd.DataFrame({'date': ['2023-01-01', '2023-02-15', '2023-03-31']})

df.loc[:, 'date'] = pd.to_datetime(df.loc[:, 'date'])

print(df)
print(df.dtypes)

Output: 

                  date
0  2023-01-01 00:00:00
1  2023-02-15 00:00:00
2  2023-03-31 00:00:00
date    object
dtype: object

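Notice that the dtype is still object, not datetime64[ns], so the conversion did not take effect. As a result, the .dt accessor is not available, and trying to extract the date with df['date'].dt.date raises an AttributeError.

Note that the traceback below was captured from df.dt.date, which calls .dt on the DataFrame itself rather than on the column. That call fails regardless of dtype. Calling df['date'].dt.date on the object column also fails with an AttributeError, though with a different message.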

Traceback (most recent call last):
  File "/HelloWorld.py", line 11, in <module>
    print(df.dt.date)
          ^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pandas/core/generic.py", line 6299, in __getattr__
    return object.__getattribute__(self, name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'DataFrame' object has no attribute 'dt'. Did you mean: 'at'?

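The reason lies in a behavior change introduced in version 2.0.0 of Pandas: loc and iloc now try to write the new values into the existing column before creating a new one. An object column can hold Timestamp values as they are, so the write succeeds in place and the dtype never changes.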

From What's new in 2.0.0 (April 3, 2023):

Changed behavior in setting values with df.loc[:, foo] = bar or df.iloc[:, foo] = bar, these now always attempt to set values inplace before falling back to casting (GH 45333)
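To see what "set values inplace" means here, the following sketch inspects the column after the loc assignment. It assumes pandas 2.x, and the names column_dtype and first_value_type are only illustrative. The individual values are converted, but they are stored inside the existing object column instead of a new datetime64 column:

```python
import pandas as pd

# dtype=object is set explicitly so the starting point is the same across versions
df = pd.DataFrame({'date': pd.Series(['2023-01-01', '2023-02-15'], dtype=object)})

df.loc[:, 'date'] = pd.to_datetime(df['date'])

column_dtype = df['date'].dtype
first_value_type = type(df.loc[0, 'date']).__name__

# On pandas 2.x the column dtype is expected to remain object,
# while each element is already a Timestamp
print(pd.__version__, column_dtype, first_value_type)
```

On older releases (before 2.0) the same assignment replaced the column, so the printed dtype may differ depending on the installed version.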

How to overcome:

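The simplest way to address this issue is to assign the whole column with df['date'] = ... instead of going through loc or iloc. This calls DataFrame.__setitem__(), which replaces the column and its dtype, as suggested in the Pandas documentation.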

import pandas as pd

df = pd.DataFrame({'date': ['2023-01-01', '2023-02-15', '2023-03-31']})

df['date'] = pd.to_datetime(df.loc[:, 'date'])

print(df)
print(df.dtypes)

print(df['date'].dt.date)
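When the column is only known by position, as with iloc, a possible alternative is DataFrame.isetitem, which replaces the column at a given position instead of writing into it. This is a sketch that assumes pandas 1.5 or later, where isetitem was added. The is_datetime variable is only there to check the result:

```python
import pandas as pd

df = pd.DataFrame({'date': ['2023-01-01', '2023-02-15', '2023-03-31']})

# Replace the column at position 0 with the converted values
df.isetitem(0, pd.to_datetime(df.iloc[:, 0]))

is_datetime = pd.api.types.is_datetime64_any_dtype(df['date'])

print(df.dtypes)
print(is_datetime)
print(df['date'].dt.date)
```

Because the column is replaced and not updated in place, the .dt accessor works afterwards.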

Additional read:

https://stackoverflow.com/questions/76766136/pandas-pd-to-datetime-assigns-object-dtype-instead-of-datetime64ns

