Hey folks!
It's been a while since my last post here, but I'm back!
Today I'm sharing a one-stop Pandas cheat sheet with explanations that I've been using in my own projects. Whether you're importing, cleaning, analyzing, or exporting data, this guide will save you time and clicks.
1. Data Import: Get data into Pandas
pd.read_csv('file.csv') # Load CSV
pd.read_excel('file.xlsx', sheet_name='Sheet1') # Load Excel
pd.read_sql(query, connection) # Run SQL query
pd.read_json('file.json') # Load JSON
pd.read_parquet('file.parquet') # Load Parquet
Why: Bring data from almost anywhere into your DataFrame.
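As a quick sanity check, here is a minimal runnable sketch of the CSV loader. The file content is made up for illustration; `io.StringIO` stands in for a real `'file.csv'` so the snippet needs no files on disk:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for 'file.csv'
csv_text = "name,score\nAda,90\nGrace,85\n"

# read_csv accepts any file-like object, so io.StringIO lets you
# try out parsing options without touching the filesystem
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)
```

The same file-like trick works for `read_json` as well, which makes it handy for unit-testing your import logic.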
2. Data Selection: Access what you need
df['column'] # Single column
df.loc['row', 'col'] # By label
df.iloc[0:5, 0:2] # By position
df.query('col > 5') # SQL-like filter
df[df['col'].isin(['A','B'])] # Multiple matches
Why: Quickly filter and grab specific rows or columns.
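The selection styles above differ in subtle ways (label-based vs. position-based). A small sketch on a toy frame (all names invented for illustration) makes the contrast concrete:

```python
import pandas as pd

# Toy frame, made up for illustration
df = pd.DataFrame(
    {"col": [3, 7, 9], "grp": ["A", "B", "C"]},
    index=["r1", "r2", "r3"],
)

single = df["col"]                  # one column as a Series
by_label = df.loc["r2", "col"]      # label-based: row 'r2', column 'col'
by_pos = df.iloc[0:2, 0:1]          # position-based: first 2 rows, 1st column
filtered = df.query("col > 5")      # SQL-like filter on column values
matches = df[df["grp"].isin(["A", "B"])]  # boolean mask for multiple values
```

Note that `.iloc` slices are end-exclusive (like Python lists), while `.loc` label slices are end-inclusive.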
3. Data Manipulation: Shape your data
df.groupby('col').agg({'col2': ['mean','sum']}) # Group stats
df.merge(df2, on='key', how='left') # Join data
df.pivot_table(values='val', index='idx') # Pivot table
df.sort_values(['col1','col2'], ascending=[True, False]) # Sort
df.melt(id_vars=['id'], value_vars=['A','B']) # Unpivot
df.apply(lambda x: x**2) # Apply func
Why: Transform your dataset to match your analysis goals.
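To see `groupby` and `merge` in action together, here is a short sketch on hypothetical sales-style data (the frames and the `key` column are invented for illustration):

```python
import pandas as pd

# Hypothetical data, made up for illustration
df = pd.DataFrame({"col": ["x", "x", "y"], "col2": [10, 20, 30]})

# Group stats: the result columns become a MultiIndex like ('col2', 'mean')
stats = df.groupby("col").agg({"col2": ["mean", "sum"]})

# Left join against a small lookup table
df2 = pd.DataFrame({"key": ["x", "y"], "label": ["ex", "why"]})
merged = df.rename(columns={"col": "key"}).merge(df2, on="key", how="left")
```

A `how='left'` join keeps every row of the left frame even when the key has no match, which is usually what you want when enriching data with a lookup table.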
4. Statistics: Quick insights
df.describe() # Summary stats
df['col'].agg(['mean','median','std']) # Key metrics
df['col'].value_counts(normalize=True) # Value %s
df.corr(method='pearson', numeric_only=True) # Correlation
df.cov(numeric_only=True) # Covariance
df.quantile([0.25, 0.5, 0.75]) # Quartiles
Why: Get instant understanding of your data distribution.
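A runnable sketch of the summary calls on a tiny made-up column shows what each one returns:

```python
import pandas as pd

df = pd.DataFrame({"col": [1, 2, 2, 3]})  # toy numbers for illustration

summary = df.describe()                          # count, mean, std, quartiles
metrics = df["col"].agg(["mean", "median", "std"])
shares = df["col"].value_counts(normalize=True)  # fraction of each value
quartiles = df.quantile([0.25, 0.5, 0.75])       # interpolated quartiles
```

`value_counts(normalize=True)` is a quick way to read off class balance before modeling.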
5. Data Cleaning: Make it usable
df.dropna(subset=['col'], how='any') # Remove NaNs
df.ffill() # Fill forward (fillna(method='ffill') is deprecated)
df.drop_duplicates(subset=['col']) # Deduplicate
df['col'].replace({'old': 'new'}) # Replace values
df['col'].astype('category') # Convert type
df.interpolate(method='linear') # Fill by trend
Why: Reliable data means reliable analysis.
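The difference between forward-fill and interpolation is easiest to see side by side. A minimal sketch on made-up data with one missing value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col": [1.0, np.nan, 3.0, 3.0]})  # toy data with a gap

filled = df.ffill()                       # carries 1.0 forward into the gap
interp = df.interpolate(method="linear")  # fills the gap with the trend: 2.0
deduped = df.drop_duplicates(subset=["col"])  # drops the repeated 3.0
```

Forward-fill repeats the last known value (good for step-like signals such as prices), while linear interpolation assumes the series moves smoothly between observations.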
6. Time Series: Work with dates
df.resample('ME').mean() # Monthly avg ('M' before pandas 2.2)
df.rolling(window=7).mean() # Rolling avg
df.shift(periods=1) # Shift data
pd.date_range('2024', periods=12, freq='ME') # Month-end date range
df.asfreq('D', method='ffill') # Daily freq
df['date1'].dt.strftime('%Y-%m-%d') # Format date
Why: Time-based data often needs resampling or shifting.
7. String Operations: Text wrangling
df['col'].str.contains('pattern') # Match
df['col'].str.extract(r'(\d+)') # Extract nums
df['col'].str.split('_', n=1).str[0] # Split text
df['col'].str.lower() # Lowercase
df['col'].str.strip() # Trim spaces
df['col'].str.replace(r'\s+', ' ', regex=True) # Normalize spaces
Why: Clean messy text fields directly in Pandas.
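Chaining these `.str` methods is where they shine. A short sketch on two made-up messy strings:

```python
import pandas as pd

s = pd.Series([" Item_42 ", "item_7  x"])  # messy made-up text

mask = s.str.contains("Item")       # case-sensitive substring match
nums = s.str.extract(r"(\d+)")      # first run of digits per row
head = s.str.strip().str.split("_", n=1).str[0].str.lower()
clean = s.str.replace(r"\s+", " ", regex=True).str.strip()
```

Because every `.str` method returns a new Series, you can chain strip/split/lower in one readable pipeline instead of looping over rows.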
8. Advanced Features: Power moves
df.pipe(func) # Chain funcs
pd.eval('df1 + df2') # Fast eval
df.memory_usage(deep=True) # Memory check
df.select_dtypes(include=['number']) # Filter types
df.nlargest(5, 'col') # Top values
df.explode('col') # Expand lists
Why: Write cleaner, faster, and more efficient code.
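`explode` and `select_dtypes` are the less obvious ones here, so a small sketch on an invented frame with list-valued cells helps:

```python
import pandas as pd

df = pd.DataFrame({
    "col": [[1, 2], [3]],   # list-valued cells (e.g. tags per record)
    "n": [10.5, 4.0],
    "tag": ["a", "b"],
})

numeric = df.select_dtypes(include=["number"])  # keeps only 'n'
top = df.nlargest(1, "n")                       # row with the largest 'n'
exploded = df.explode("col")                    # one row per list element
mem = df.memory_usage(deep=True)                # actual bytes per column
```

`memory_usage(deep=True)` matters for object columns: without `deep=True`, pandas only counts pointers, not the strings they reference.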
9. Data Export: Save your work
df.to_csv('output.csv', index=False) # CSV
df.to_excel('output.xlsx', sheet_name='Sheet1') # Excel
df.to_parquet('output.parquet') # Parquet
df.to_json('output.json', orient='records') # JSON
Why: Share results in the right format instantly.
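A handy detail: `to_csv` returns the CSV as a string when you omit the path, which makes round-trip checks easy. A minimal sketch on a toy frame:

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})  # toy frame

# With no path argument, to_csv returns the text instead of writing a file
csv_text = df.to_csv(index=False)

# Round-trip: the exported text parses back into an equal frame
back = pd.read_csv(io.StringIO(csv_text))
```

`index=False` drops the row index on export; without it, a stray unnamed column appears when someone reloads your CSV.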
Pro Tips
- Use .copy() to avoid accidental changes.
- Chain methods for cleaner code.
- Use dtype='category' to save memory.
- Avoid inplace=True; reassign instead.
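The `.copy()` tip above is worth a concrete sketch: an explicit copy guarantees later edits never leak back into the original, whereas a plain slice may be a view in some cases:

```python
import pandas as pd

df = pd.DataFrame({"col": [1, 2, 3]})

# Explicit copy: edits to 'subset' can never touch df
subset = df[df["col"] > 1].copy()
subset.loc[:, "col"] = 99

print(df["col"].tolist())  # df is unchanged
```

Without `.copy()`, the same assignment can trigger a `SettingWithCopyWarning` and, depending on the pandas version, silently fail to modify anything.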
Question for you: Which Pandas trick here do you use the most? Or do you have a favorite that's not listed? Let's share and make this an even better cheat sheet.