Hey folks!
It's been a while since my last post here, but I'm back!
Today I'm sharing a one-stop Pandas cheat sheet with explanations that I've been using in my own projects. Whether you're importing, cleaning, analyzing, or exporting data, this guide will save you time and clicks.
1. Data Import: Get data into Pandas
pd.read_csv('file.csv') # Load CSV
pd.read_excel('file.xlsx', sheet_name='Sheet1') # Load Excel
pd.read_sql(query, connection) # Run SQL query
pd.read_json('file.json') # Load JSON
pd.read_parquet('file.parquet') # Load Parquet
Why: Bring data from almost anywhere into your DataFrame.
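As a quick sanity check, here is a minimal runnable sketch of the CSV loader. The file content is made up for illustration; `io.StringIO` stands in for a real `'file.csv'` so the snippet needs no files on disk:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for 'file.csv'
csv_text = "name,score\nAda,90\nGrace,85\n"

# read_csv accepts any file-like object, so io.StringIO lets you
# try out parsing options without touching the filesystem
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)
```

The same file-like trick works for `read_json` as well, which makes it handy for unit-testing your import logic.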
2. Data Selection: Access what you need
df['column'] # Single column
df.loc['row', 'col'] # By label
df.iloc[0:5, 0:2] # By position
df.query('col > 5') # SQL-like filter
df[df['col'].isin(['A','B'])] # Multiple matches
Why: Quickly filter and grab specific rows or columns.
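The selection styles above differ in subtle ways (label-based vs. position-based). A small sketch on a toy frame (all names invented for illustration) makes the contrast concrete:

```python
import pandas as pd

# Toy frame, made up for illustration
df = pd.DataFrame(
    {"col": [3, 7, 9], "grp": ["A", "B", "C"]},
    index=["r1", "r2", "r3"],
)

single = df["col"]                  # one column as a Series
by_label = df.loc["r2", "col"]      # label-based: row 'r2', column 'col'
by_pos = df.iloc[0:2, 0:1]          # position-based: first 2 rows, 1st column
filtered = df.query("col > 5")      # SQL-like filter on column values
matches = df[df["grp"].isin(["A", "B"])]  # boolean mask for multiple values
```

Note that `.iloc` slices are end-exclusive (like Python lists), while `.loc` label slices are end-inclusive.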
3. Data Manipulation: Shape your data
df.groupby('col').agg({'col2': ['mean','sum']}) # Group stats
df.merge(df2, on='key', how='left') # Join data
df.pivot_table(values='val', index='idx') # Pivot table
df.sort_values(['col1','col2'], ascending=[True, False]) # Sort
df.melt(id_vars=['id'], value_vars=['A','B']) # Unpivot
df.apply(lambda x: x**2) # Apply func
Why: Transform your dataset to match your analysis goals.
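To see `groupby` and `merge` in action together, here is a short sketch on hypothetical sales-style data (the frames and the `key` column are invented for illustration):

```python
import pandas as pd

# Hypothetical data, made up for illustration
df = pd.DataFrame({"col": ["x", "x", "y"], "col2": [10, 20, 30]})

# Group stats: the result columns become a MultiIndex like ('col2', 'mean')
stats = df.groupby("col").agg({"col2": ["mean", "sum"]})

# Left join against a small lookup table
df2 = pd.DataFrame({"key": ["x", "y"], "label": ["ex", "why"]})
merged = df.rename(columns={"col": "key"}).merge(df2, on="key", how="left")
```

A `how='left'` join keeps every row of the left frame even when the key has no match, which is usually what you want when enriching data with a lookup table.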
4. Statistics: Quick insights
df.describe() # Summary stats
df['col'].agg(['mean','median','std']) # Key metrics
df['col'].value_counts(normalize=True) # Value %s
df.corr(method='pearson', numeric_only=True) # Correlation
df.cov(numeric_only=True) # Covariance
df.quantile([0.25, 0.5, 0.75]) # Quartiles
Why: Get instant understanding of your data distribution.
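A runnable sketch of the summary calls on a tiny made-up column shows what each one returns:

```python
import pandas as pd

df = pd.DataFrame({"col": [1, 2, 2, 3]})  # toy numbers for illustration

summary = df.describe()                          # count, mean, std, quartiles
metrics = df["col"].agg(["mean", "median", "std"])
shares = df["col"].value_counts(normalize=True)  # fraction of each value
quartiles = df.quantile([0.25, 0.5, 0.75])       # interpolated quartiles
```

`value_counts(normalize=True)` is a quick way to read off class balance before modeling.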
5. Data Cleaning: Make it usable
df.dropna(subset=['col'], how='any') # Remove NaNs
df.ffill() # Fill forward (fillna(method='ffill') is deprecated)
df.drop_duplicates(subset=['col']) # Deduplicate
df['col'].replace({'old': 'new'}) # Replace values
df['col'].astype('category') # Convert type
df.interpolate(method='linear') # Fill by trend
Why: Reliable data means reliable analysis.
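The difference between forward-fill and interpolation is easiest to see side by side. A minimal sketch on made-up data with one missing value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col": [1.0, np.nan, 3.0, 3.0]})  # toy data with a gap

filled = df.ffill()                       # carries 1.0 forward into the gap
interp = df.interpolate(method="linear")  # fills the gap with the trend: 2.0
deduped = df.drop_duplicates(subset=["col"])  # drops the repeated 3.0
```

Forward-fill repeats the last known value (good for step-like signals such as prices), while linear interpolation assumes the series moves smoothly between observations.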
6. Time Series: Work with dates
df.resample('ME').mean() # Monthly avg ('M' before pandas 2.2)
df.rolling(window=7).mean() # Rolling avg
df.shift(periods=1) # Shift data
pd.date_range('2024', periods=12, freq='ME') # Month-end date range
df.asfreq('D', method='ffill') # Daily freq
df['date1'].dt.strftime('%Y-%m-%d') # Format date
Why: Time-based data often needs resampling or shifting.
7. String Operations: Text wrangling
df['col'].str.contains('pattern') # Match
df['col'].str.extract(r'(\d+)') # Extract nums
df['col'].str.split('_', n=1).str[0] # Split text
df['col'].str.lower() # Lowercase
df['col'].str.strip() # Trim spaces
df['col'].str.replace(r'\s+', ' ', regex=True) # Normalize spaces
Why: Clean messy text fields directly in Pandas.
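Chaining these `.str` methods is where they shine. A short sketch on two made-up messy strings:

```python
import pandas as pd

s = pd.Series([" Item_42 ", "item_7  x"])  # messy made-up text

mask = s.str.contains("Item")       # case-sensitive substring match
nums = s.str.extract(r"(\d+)")      # first run of digits per row
head = s.str.strip().str.split("_", n=1).str[0].str.lower()
clean = s.str.replace(r"\s+", " ", regex=True).str.strip()
```

Because every `.str` method returns a new Series, you can chain strip/split/lower in one readable pipeline instead of looping over rows.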
8. Advanced Features: Power moves
df.pipe(func) # Chain funcs
pd.eval('df1 + df2') # Fast eval
df.memory_usage(deep=True) # Memory check
df.select_dtypes(include=['number']) # Filter types
df.nlargest(5, 'col') # Top values
df.explode('col') # Expand lists
Why: Write cleaner, faster, and more efficient code.
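`explode` and `select_dtypes` are the less obvious ones here, so a small sketch on an invented frame with list-valued cells helps:

```python
import pandas as pd

df = pd.DataFrame({
    "col": [[1, 2], [3]],   # list-valued cells (e.g. tags per record)
    "n": [10.5, 4.0],
    "tag": ["a", "b"],
})

numeric = df.select_dtypes(include=["number"])  # keeps only 'n'
top = df.nlargest(1, "n")                       # row with the largest 'n'
exploded = df.explode("col")                    # one row per list element
mem = df.memory_usage(deep=True)                # actual bytes per column
```

`memory_usage(deep=True)` matters for object columns: without `deep=True`, pandas only counts pointers, not the strings they reference.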
9. Data Export: Save your work
df.to_csv('output.csv', index=False) # CSV
df.to_excel('output.xlsx', sheet_name='Sheet1') # Excel
df.to_parquet('output.parquet') # Parquet
df.to_json('output.json', orient='records') # JSON
Why: Share results in the right format instantly.
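A handy detail: `to_csv` returns the CSV as a string when you omit the path, which makes round-trip checks easy. A minimal sketch on a toy frame:

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})  # toy frame

# With no path argument, to_csv returns the text instead of writing a file
csv_text = df.to_csv(index=False)

# Round-trip: the exported text parses back into an equal frame
back = pd.read_csv(io.StringIO(csv_text))
```

`index=False` drops the row index on export; without it, a stray unnamed column appears when someone reloads your CSV.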
Pro Tips
- Use .copy() to avoid accidental changes.
- Chain methods for cleaner code.
- Use dtype='category' to save memory.
- Avoid inplace=True; reassign instead.
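The `.copy()` tip above is worth a concrete sketch: an explicit copy guarantees later edits never leak back into the original, whereas a plain slice may be a view in some cases:

```python
import pandas as pd

df = pd.DataFrame({"col": [1, 2, 3]})

# Explicit copy: edits to 'subset' can never touch df
subset = df[df["col"] > 1].copy()
subset.loc[:, "col"] = 99

print(df["col"].tolist())  # df is unchanged
```

Without `.copy()`, the same assignment can trigger a `SettingWithCopyWarning` and, depending on the pandas version, silently fail to modify anything.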
Question for you: Which Pandas trick here do you use the most? Or do you have a favorite that's not listed? Let's share and make this an even better cheat sheet.