
Vidyasagar SC Machupalli

Posted on • Originally published at Medium on


Pandas: Conversion using loc and iloc

Pandas is a powerful Python library used for data manipulation and analysis.

Created by Wes McKinney in 2008, it provides data structures and functions for working with structured data efficiently. Pandas allows users to analyze big data, clean messy datasets, and derive meaningful insights.


Photo by Kevin Canlas on Unsplash

Key Features of Pandas

  1. Data Structures: Pandas introduces two primary data structures:
  • Series: A one-dimensional labeled array
  • DataFrame: A two-dimensional labeled data structure with columns of potentially different types

  2. Data Manipulation: Pandas offers functions for analyzing, cleaning, exploring, and manipulating data.

  3. Data Analysis: It enables users to perform complex operations like correlation analysis, grouping, and statistical calculations.

  4. Data Visualization: Pandas integrates well with other libraries to create insightful visualizations.
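As a minimal sketch of the two data structures listed above (the names `prices` and `inventory` and their values are made up for illustration):

```python
import pandas as pd

# A Series: one-dimensional, with a label attached to every value
prices = pd.Series([1.5, 2.0, 0.75], index=["apple", "banana", "cherry"])

# A DataFrame: two-dimensional, and each column may hold a different type
inventory = pd.DataFrame(
    {
        "item": ["apple", "banana", "cherry"],
        "count": [10, 0, 25],
        "price": [1.5, 2.0, 0.75],
    }
)

print(prices["banana"])          # look up a value by its label
print(inventory.shape)           # (rows, columns)
print(inventory["count"].sum())  # a simple aggregate over one column
```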

Practical Examples

Indexing with loc

The loc indexer enables label-based indexing in DataFrames, so rows and columns are selected by their names:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}, index=['x', 'y', 'z'])

# Select rows with label 'y' and 'z', and columns 'A' and 'C'
result = df.loc[['y', 'z'], ['A', 'C']]
print(result)

Indexing with iloc

The iloc indexer provides integer-position-based indexing, so rows and columns are selected by where they sit rather than by their labels:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Select rows 0 and 2, and columns 1 and 2
result = df.iloc[[0, 2], [1, 2]]
print(result)
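One difference between the two indexers that is easy to miss is how they treat slices. The sketch below reuses the sample DataFrame from the loc example; the variable names `by_label` and `by_position` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}, index=['x', 'y', 'z'])

# loc slices by label and includes the end label
by_label = df.loc['x':'y', 'A':'B']

# iloc slices by position and excludes the end position, like a Python list
by_position = df.iloc[0:1, 0:1]

print(by_label.shape)     # 2 rows, 2 columns
print(by_position.shape)  # 1 row, 1 column
```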

Date Conversion with to_datetime

The to_datetime function converts strings in various date formats into pandas datetime objects:

import pandas as pd

# Convert string to datetime
date_string = "2023-09-17 14:30:00"
dt_object = pd.to_datetime(date_string)
print(dt_object)

# Convert multiple date strings
date_series = pd.Series(['20200101', '20200201', '20200301'])
dt_series = pd.to_datetime(date_series, format='%Y%m%d')
print(dt_series)

Output:

2023-09-17 14:30:00
0 2020-01-01
1 2020-02-01
2 2020-03-01
dtype: datetime64[ns]
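Real data often contains values that do not match the expected format. As a small sketch (the sample values are invented), passing errors='coerce' makes to_datetime return NaT for those entries instead of raising an exception:

```python
import pandas as pd

raw = pd.Series(['20200101', 'not a date', '20200301'])

# errors='coerce' turns values that do not match the format into NaT
parsed = pd.to_datetime(raw, format='%Y%m%d', errors='coerce')

print(parsed)
print(parsed.isna().sum())  # number of values that could not be parsed
```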

Pandas simplifies data manipulation tasks, making it an essential tool for data scientists and analysts. Tools like loc, iloc, and to_datetime provide powerful ways to select and transform data, enabling efficient data processing and analysis in Python.

Something to consider while using loc or iloc

Let’s convert the object-dtype column date to datetime by assigning through loc:

import pandas as pd

df = pd.DataFrame({'date': ['2023-01-01', '2023-02-15', '2023-03-31']})

df.loc[:, 'date'] = pd.to_datetime(df.loc[:, 'date'])

print(df)
print(df.dtypes)

Output: 

   date
0 2023-01-01 00:00:00
1 2023-02-15 00:00:00
2 2023-03-31 00:00:00
date object
dtype: object

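Notice that the dtype is still object, not datetime64[ns]. If you then try to extract the date with df['date'].dt.date, you will get an error, because the column was never converted to a datetime dtype. (The traceback below was captured from calling .dt on the DataFrame itself; calling it on the object-dtype column raises an AttributeError as well.)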

Traceback (most recent call last):
  File "/HelloWorld.py", line 11, in <module>
    print(df.dt.date)
          ^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pandas/core/generic.py", line 6299, in __getattr__
    return object.__getattribute__(self, name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'DataFrame' object has no attribute 'dt'. Did you mean: 'at'?

The reason lies in a behavior change introduced in pandas 2.0.

From What’s new in 2.0.0 (April 3, 2023):

Changed behavior in setting values with df.loc[:, foo] = bar or df.iloc[:, foo] = bar, these now always attempt to set values inplace before falling back to casting (GH 45333)
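To see what "set values inplace" means in practice, the sketch below runs the same conversion two ways and prints the resulting dtypes. It assumes pandas 2.x; the names `via_loc` and `via_setitem` are illustrative, and the column is created with an explicit object dtype so the starting point does not depend on the installed version.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

dates = ['2023-01-01', '2023-02-15', '2023-03-31']

# dtype=object is spelled out so both frames start from the same column type
via_loc = pd.DataFrame({'date': pd.Series(dates, dtype=object)})
via_setitem = via_loc.copy()

# Writes the new values into the existing column; on pandas 2.x the
# object dtype is kept because an object column can hold Timestamps
via_loc.loc[:, 'date'] = pd.to_datetime(via_loc['date'])

# Replaces the column, so the datetime dtype comes with the new values
via_setitem['date'] = pd.to_datetime(via_setitem['date'])

print(pd.__version__)
print(via_loc['date'].dtype)
print(via_setitem['date'].dtype)
print(is_datetime64_any_dtype(via_setitem['date']))
```

On older pandas versions the first dtype may differ, which is why the version is printed alongside it.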

How to overcome:

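The most direct fix is to avoid loc and iloc when replacing a whole column and, as the pandas documentation suggests, use DataFrame.__setitem__() instead, that is, plain df['date'] = ... assignment:

import pandas as pd

df = pd.DataFrame({'date': ['2023-01-01', '2023-02-15', '2023-03-31']})

# Plain column assignment replaces the column, so the datetime dtype is kept
df['date'] = pd.to_datetime(df['date'])

print(df)
print(df.dtypes)

# The .dt accessor now works
print(df['date'].dt.date)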
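Another option that also replaces the column rather than writing into it is DataFrame.assign. This is an alternative sketch, not something the pandas release note prescribes; the names `converted` and `months` are illustrative. It also checks the dtype before touching the .dt accessor, which guards against the silent failure described above.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

df = pd.DataFrame({'date': ['2023-01-01', '2023-02-15', '2023-03-31']})

# assign returns a new DataFrame whose 'date' column is the converted Series
converted = df.assign(date=pd.to_datetime(df['date']))

# Check the dtype before relying on the .dt accessor
if is_datetime64_any_dtype(converted['date']):
    months = converted['date'].dt.month.tolist()
    print(months)
```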

Additional read:

https://stackoverflow.com/questions/76766136/pandas-pd-to-datetime-assigns-object-dtype-instead-of-datetime64ns

