DEV Community: Ashwin Kumar

I Tested the Best JS Frameworks in 2026. Here's Why I Chose Astro for My Tool (And My Lighthouse Score Proves It)

Ashwin Kumar — Fri, 19 Jun 2026 14:08:16 +0000

Every six months, the JavaScript ecosystem releases something that's supposedly going to change everything. In 2026, the noise is louder than ever — AI-generated boilerplate, edge runtimes, React Server Components, Signals in every framework. Before I started building my latest side project, I sat down and genuinely stress-tested the major JS frameworks against the one metric most devs conveniently ignore until launch day: real-world performance.

This is what I found — and why Astro was the only honest answer for what I was building.

The Framework Landscape in 2026 Is Overwhelming (On Purpose)

Let's be direct. Next.js, Remix, SvelteKit, Nuxt, and SolidStart are all excellent frameworks. Each serves a real use case. If you're building a deeply interactive SaaS dashboard, authenticated real-time product, or social platform — React-based full-stack frameworks earn their complexity.

But here's the question most tutorials skip: what are you actually shipping?

When I started building InvoiceAnt — a free, 100% client-side invoice generator — I laid out my requirements:

A fast, SEO-friendly marketing and landing page
A client-side tool that processes zero server data (all local, private by design)
A blog for content marketing
Google PageSpeed Insights scores that don't embarrass me

The moment I wrote those requirements down, most of the "best" frameworks eliminated themselves.

The Hidden Tax Every Framework Charges You

Here's the problem with reaching for Next.js (or any SPA-centric framework) for a content-heavy site: you pay a JavaScript runtime tax on every page load, even pages with no interactivity.

A blog post with a single newsletter form doesn't need 300 KB of React runtime shipped to every visitor. A landing page that's 95% static HTML shouldn't be hydrating a full component tree in the browser. But that's exactly what happens when you default to these frameworks without questioning the tradeoff.

In real-world benchmarks, Next.js static exports score between 80–90 on Google PageSpeed Insights. Astro sites consistently hit 98–100.

That gap isn't a minor cosmetic difference. It's the difference between passing and failing Google's Core Web Vitals assessment — which directly affects your organic search ranking.

What Makes Astro Architecturally Different

Astro starts from a completely different premise than every other framework in this list. While Next.js and SvelteKit assume you're building an application, Astro assumes you're building a document.

Its default output is plain HTML with zero JavaScript. Not minimal JavaScript. Zero.

Islands Architecture — The Technical Core

The genius of Astro is what it calls Component Islands. Instead of hydrating an entire page's component tree (the SPA approach), Astro treats interactive components as isolated, self-contained islands floating in a sea of static HTML.

---
// Your page shell — pure static HTML, zero JS cost
import HeroSection from '../components/HeroSection.astro';
import InvoiceGenerator from '../components/InvoiceGenerator.jsx';
---

<HeroSection />

<!-- This React component is an island — only it ships JS -->
<InvoiceGenerator client:load />

The client: directive is where Astro gets precise. You control when each island hydrates:

Directive	Behaviour	Use case
`client:load`	Hydrates immediately on page load	Critical interactive UI (the invoice tool itself)
`client:idle`	Hydrates when the browser is idle	Non-urgent widgets, secondary forms
`client:visible`	Hydrates when the component enters viewport	Below-the-fold content, lazy features
`client:only`	Skips SSR entirely, renders only in browser	Components relying on `window` or `localStorage`

For InvoiceAnt, this maps perfectly: the marketing sections are static Astro components. The actual invoice generator runs as a client:load React island. The blog is pure static HTML. Each part of the site pays only the JavaScript cost it genuinely needs.

Framework Agnostic — Not a Gimmick

Astro supports React, Vue, Svelte, Solid, and Preact as island renderers — meaning you can mix and match UI frameworks on the same page. This sounds like a party trick but it's genuinely powerful: you can pull in a React component from your existing design system without converting your entire site to React. Astro wraps each in its own isolated runtime.

My Google PageSpeed Insights Score

After building InvoiceAnt with Astro, I ran Google PageSpeed Insights on the homepage — the most honest performance test you can run because it uses real-world Chrome field data and lab metrics together.

Here are the actual numbers, not cherry-picked:

Category	Mobile	Desktop
Performance	98	99
Accessibility	96	96
Best Practices	92	92
SEO	100	100

(Both screenshots were taken on June 19, 2026.)

98 on mobile and 99 on desktop performance. For context, mobile is always the harder score to hit: network constraints, CPU throttling, and rendering costs all compound. These are lab scores, measured under Lighthouse's simulated slow network and throttled CPU, so getting 98 there is a strong sign the site will also be fast for real users on real devices.

This wasn't the result of aggressive optimisation, custom caching headers, or pre-loading tricks. It came standard — because Astro ships no unnecessary JavaScript by default, its built-in <Image /> component handles lazy loading, format conversion, and width/height attributes automatically, and its HTML-first output gives crawlers exactly what they need without waiting for client-side rendering.

For context: over 50% of Astro sites pass Google's Core Web Vitals assessment — the only major framework above 50%. Next.js sits at around 25%. That's a 2x difference in sites meeting Google's performance standard, which translates directly into better organic rankings.

When You Should Not Use Astro

Intellectual honesty matters. Astro is the wrong choice for:

Heavily stateful single-page apps — if your entire product lives behind auth and is one long interactive session (think Figma, Notion, or a real-time dashboard), a React or SvelteKit SPA is more appropriate.
Complex server-side data requirements — if every page render needs dynamic, personalised data from a database, Next.js App Router with server components is a more natural fit.
Teams already deep in a single framework — the DX and team velocity cost of switching isn't always worth it.

But if your requirement is a fast, SEO-friendly, content-driven site — a landing page, blog, documentation, portfolio, or a tool that runs client-side — Astro is not just a good choice. It has become the standard-bearer for the modern, high-performance content web in 2026.

Getting Started

The Astro docs are genuinely excellent — one of the best onboarding experiences in the JS ecosystem.

👉 docs.astro.build

Bootstrap a new project in one command:

npm create astro@latest

The interactive CLI will walk you through templates (blog, docs, minimal) and ask if you want TypeScript, Tailwind, and whether to initialise a git repo. You'll have a running project in under three minutes.

What I Shipped

InvoiceAnt — invoiceant.com — is a free, no-watermark, no-subscription, 100% client-side invoice generator built entirely with Astro. The marketing site, blog, and changelog are pure static HTML. The invoice generator itself is a React island. The entire site scores 98–99 on Google PageSpeed Insights Performance across mobile and desktop, loads in under a second on 4G, and processes every invoice locally on your device — nothing ever touches a server.

It's the cleanest demonstration I've found of what Astro's Islands Architecture enables in a real production tool: the right amount of JavaScript, exactly where it's needed, and nowhere else.

The Takeaway

In 2026, the best JS framework isn't the one with the most features. It's the one that matches your problem. If you're building anything content-driven, performance-sensitive, or SEO-dependent — something where Google PageSpeed scores and Core Web Vitals have real business consequences — stop defaulting to Next.js out of habit.

Give Astro a weekend. Your users, your rankings, and your bundle size will thank you.

Built something with Astro? Drop your Google PageSpeed score in the comments — let's see what you've got.

AI Can Build Your SaaS But It Can’t Take Responsibility for Security

Ashwin Kumar — Sun, 01 Feb 2026 07:46:07 +0000

We're living in an incredible era. Non-coders are shipping products that would've taken months of learning just a few years ago. Tools like Cursor, GitHub Copilot, v0, Replit, and Claude are turning ideas into MVPs overnight. Solo devs are building SaaS products, making real revenue, and living the dream.

But here's the reality check we all need to hear: 45% of AI-generated code introduces OWASP Top 10 security vulnerabilities.

The Numbers Don't Lie—And They're Alarming
Veracode's 2025 GenAI Code Security Report tested over 100 large language models across Java, Python, C#, and JavaScript. The findings should terrify anyone shipping AI-generated code without proper security audits:

Java code: 72% security failure rate

Python: 38% vulnerable

JavaScript: 43% vulnerable

C#: 45% vulnerable

Even more concerning? Cross-site scripting (XSS) defenses failed 86% of the time, and log injection vulnerabilities appeared in 88% of cases. These aren't edge cases—these are OWASP Top 10 vulnerabilities that attackers exploit daily.

Stanford and NYU research found that 40% of GitHub Copilot-generated programs contained bugs or design flaws that could be exploited by attackers. And here's the kicker: AI coding assistants suggest vulnerable code patterns 40% more often than secure alternatives, simply because insecure code appears more frequently in their training data.

Big Tech Is Raising Red Flags
Microsoft's CEO Satya Nadella revealed that AI now writes 30% of Microsoft's code—and they're accelerating toward 80%. But with that speed comes risk. In one documented case study, AI tools suggested non-existent package dependencies over 400,000 times, creating massive supply chain attack vectors.

GitHub Copilot itself isn't immune. In June 2025, security researchers discovered CamoLeak—a critical vulnerability (CVSS 9.6) that allowed silent exfiltration of secrets and private source code from developers' repositories. The attack exploited GitHub's own infrastructure to steal AWS keys, API tokens, and proprietary code.

Critical vulnerabilities were also discovered throughout 2025 in AI coding tools from Cursor, Google's Gemini, and Amazon's Q. The Amazon Q breach demonstration showed how easily prompt injection attacks could compromise these tools.

The Data Exposure Crisis
Since Q2 2023, there's been a 3x increase in repositories containing Personally Identifiable Information (PII) and payment details due to AI-generated code. Research shows that repositories using Copilot exhibit 6.4% secret leakage rates—40% higher than traditional development.

Even worse? There's been a 10x surge in APIs missing basic security fundamentals like authorization and input validation. Sensitive API endpoints have nearly doubled as AI generates code faster than security teams can review it.

The False Sense of Security
"But I'm using Claude/ChatGPT—it's from big tech, so it must be secure, right?"

Wrong.

Here's what trained developers know that non-technical builders don't: AI models don't improve at security as they get smarter. Veracode's research revealed that despite advances in LLMs' ability to generate syntactically correct code, security performance has remained flat over time. Newer, larger models aren't writing more secure code—they're just writing vulnerable code faster.

As John Cranney, VP of Engineering at Secure Code Warrior, warns: "No model provider has yet solved the problem of prompt injection, which means every new input adds a new potential injection vector".

What You Can Do Right Now
If you're building with AI and aren't a security expert, here are immediate actions backed by industry recommendations:

Use this security validation prompt before deploying:

"Analyze this code for security vulnerabilities including SQL injection, XSS attacks, CSRF, authentication flaws, insecure deserialization, hardcoded secrets, weak cryptography, and insufficient input validation. Provide specific fixes with secure code examples for each issue found."

Integrate automated security scanning:

OWASP ZAP for web vulnerability scanning

Snyk or GitGuardian for secrets detection and dependency vulnerabilities

npm audit / pip-audit for package security

SonarQube for static code analysis

Treat AI code as untrusted external contributions:

Microsoft, GitHub, and security experts all agree: AI-generated code requires the same security review as third-party libraries. Never deploy it without scanning and human review.

Hire a security expert for a pre-launch audit:

Even a 2-hour consultation can identify critical vulnerabilities that could result in data breaches, regulatory fines (GDPR, CCPA), and reputational damage.
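
To make "human review" concrete, here is one example of what such a review looks for. It is a minimal sketch using Python's built-in sqlite3 module; the table, the function names, and the payload are hypothetical and chosen only to illustrate SQL injection, one of the vulnerability classes named in the validation prompt above.

```python
import sqlite3


def find_user_unsafe(conn, email):
    # Vulnerable: user input is concatenated straight into the SQL string,
    # so quotes in the input can change the meaning of the query.
    query = "SELECT id, email FROM users WHERE email = '" + email + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, email):
    # Safer: the driver binds the value as data, so it is never parsed as SQL.
    return conn.execute(
        "SELECT id, email FROM users WHERE email = ?", (email,)
    ).fetchall()


# Hypothetical in-memory database with two users
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",)],
)

# A classic injection payload: the trailing OR clause is always true
payload = "nobody@example.com' OR '1'='1"

print(len(find_user_unsafe(conn, payload)))  # every row in the table comes back
print(len(find_user_safe(conn, payload)))    # no row matches the literal string
```

Both functions look almost identical at a glance, which is why this pattern slips through when generated code is accepted without review.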

The Bottom Line
GitHub Copilot helped developers ship code with a 70% surge in pull requests. That's incredible productivity. But speed without security is a risk you can't afford.

Your users trust you with their emails, payment details, phone numbers, and personal data. 29.1% of AI-generated Python code contains SQL injection, authentication bypass, and XSS vulnerabilities. One breach could destroy everything you've built.

AI is a phenomenal co-pilot. But it's not a security expert. And when a breach happens—when customer data leaks, when your database gets wiped, when regulatory fines arrive—AI won't be there to face angry users, legal teams, or your destroyed reputation.

You will.

Build fast. Ship confidently. But never, ever skip security.

Want to learn more? Check out Veracode's 2025 GenAI Code Security Report and OWASP's guidelines for securing AI-generated code.

Thanks 🙏

The Story of XGBoost: A Machine Learning Revolution

Ashwin Kumar — Sat, 23 Nov 2024 03:42:48 +0000

Did you know XGBoost is not actually an algorithm?

It's a library created by Tianqi Chen that has become one of the most popular tools in machine learning. Today, we’ll explore how Tianqi developed XGBoost. But before diving into its specifics, let’s first understand the foundational algorithm behind it: Gradient Boosting.

What is the Gradient Boosting Algorithm?

Gradient Boosting is a sophisticated and widely used machine learning method that builds a predictive model by combining multiple simpler models—usually decision trees—in a sequential manner. Developed by Jerome H. Friedman, it was introduced in his seminal paper titled "Greedy Function Approximation: A Gradient Boosting Machine."

Key Objectives of Gradient Boosting:

Iteratively correct the errors of earlier models.
Improve prediction accuracy using gradient descent optimization.

Core Idea:

The central concept is to focus on areas where the model struggles most:

Initial Predictions: Start with simple predictions and calculate errors (residuals).
Error Targeting: Construct additional models to minimize those errors.
Incremental Improvement: Combine these models to improve overall performance, ensuring predictions get progressively better.

This systematic focus on mistakes differentiates Gradient Boosting from other ensemble methods like bagging.
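
The three steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: squared-error loss (where the negative gradient is simply the residual), one-dimensional inputs with at least two distinct values, and depth-one trees ("stumps"). The function names and the toy data are made up for the example, and this is not how XGBoost or any production library implements boosting.

```python
def fit_stump(xs, residuals):
    """Fit a depth-one tree: pick the threshold that best splits the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # the largest x would leave the right side empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    base = sum(ys) / len(ys)  # initial prediction: the mean of the targets
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # Errors of the current model (the negative gradient for squared error)
        residuals = [y - p for y, p in zip(ys, preds)]
        # The next model is trained to predict those errors
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Add a scaled-down correction to the running predictions
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Toy data: a noisy, roughly linear relationship
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.2, 1.9, 3.1, 3.9, 5.2, 6.1, 6.8, 8.1]

model = gradient_boost(xs, ys)
baseline_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mse(ys, [predict(model, x) for x in xs])
print(f"baseline MSE: {baseline_mse:.3f}, boosted MSE: {boosted_mse:.3f}")
```

Each round only nudges the predictions by `learning_rate` times the stump's output, which is why many weak models are needed and why the training error shrinks gradually rather than in one step.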

What is XGBoost?

XGBoost stands for Extreme Gradient Boosting. It’s a powerful library designed to make machine learning tasks faster and more efficient. It’s widely used for solving regression, classification, and ranking problems.

Official Definition:

"XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable. It implements machine learning algorithms under the Gradient Boosting framework."

The Story Behind XGBoost: How Tianqi Chen Revolutionized Machine Learning

A Passion for Machine Learning:

In the early 2010s, Tianqi Chen, a Ph.D. student at the University of Washington, saw the potential to improve existing tools. While Gradient Boosting Machines (GBMs) were powerful, they were:

Computationally expensive.
Inefficient on large datasets.

Tianqi’s vision? Create a more efficient, scalable, and robust version of gradient boosting.

The Birth of XGBoost:

Driven by personal frustrations, Tianqi began developing XGBoost as a side project. His innovations included:

Parallelization:

Boosting itself stays sequential, because each tree is fit to the errors of the trees before it. What Tianqi parallelized was the work inside each tree: candidate splits are evaluated across features at the same time, which drastically reduces training time.

Regularization:

Unlike traditional GBMs, XGBoost included regularization to prevent overfitting by penalizing model complexity, making it more robust.

Sparsity-Aware Optimization:

Tianqi designed XGBoost to handle missing or sparse data efficiently, adapting the optimization process to treat missing values as a special case.

Hardware Optimization:

XGBoost was engineered to use hardware efficiently, with cache-aware data access and multi-core CPU training, and it later gained GPU support, ensuring scalability from small academic projects to massive datasets.
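
For readers who want to see what "penalizing model complexity" means, the regularized objective from the XGBoost paper can be written as below. Here l is the loss on each prediction, f_k is the k-th tree, T is the number of leaves in a tree, w is the vector of its leaf weights, and γ and λ are the regularization hyperparameters.

```latex
\mathrm{Obj} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^{2}
```

A tree with more leaves or larger leaf weights pays a higher penalty, so a split is only worth making if it reduces the loss by more than it adds in complexity.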

Gaining Popularity: The Rise of XGBoost

Released as an open-source project in 2014, XGBoost initially went unnoticed. But soon, its superior performance and scalability caught the attention of the machine learning community. Data scientists, particularly on platforms like Kaggle, began adopting it for:

Faster training times.
Improved predictive accuracy.
Handling large datasets with ease.

Its flexibility and features like early stopping and model evaluation further cemented its reputation.

Why XGBoost Changed the Game

Key Strengths:

Practical Optimization: Tianqi addressed computational inefficiencies, making XGBoost both fast and scalable.
Real-World Applicability: From business to healthcare, XGBoost powers critical applications.
Open-Source Impact: Its open-source nature fostered widespread adoption and innovation.

Tianqi Chen: A Legacy in Machine Learning

Today, Tianqi Chen is celebrated as one of the most influential figures in machine learning. His work has:

Empowered data scientists worldwide.
Inspired innovations in optimization and large-scale machine learning.

As of 2024, XGBoost:

Boasts over 26k stars on GitHub.
Dominates 30% of Kaggle competition winning solutions.
Remains a go-to tool across industries like finance, healthcare, e-commerce, and marketing.

Share Your Thoughts!

If you found the story of XGBoost's creation inspiring, share your thoughts in the comments below! Don’t forget to share this article with fellow machine learning enthusiasts.

Happy Coding ❤️ and don’t forget to Like!

5 Best Artificial Intelligence Documentaries Everyone Should Watch

Ashwin Kumar — Sat, 09 Nov 2024 04:44:20 +0000

Hey everyone! Hope you're having an awesome weekend. 😊 If you’ve got some free time and want to watch something that’s both useful and entertaining (and you’re into tech or curious about new technologies), I’ve got some cool documentaries for you on YouTube about AI and technology. Perfect for brushing up on some knowledge that’ll make you sound super smart come Monday. 😎

Here’s the list:

Bonus (and my personal favorite, but it’s on Netflix): Coded Bias. Definitely worth checking out!

From Beginner to Pro: Important Python Learning Topics You Can't Miss!

Ashwin Kumar — Wed, 23 Oct 2024 03:05:57 +0000

Hey guys! If you’re starting to learn Python, great choice! I found some cool stats about it, and while looking for a good syllabus, I noticed some topics come up a lot. So, I made a beginner-friendly Python syllabus that covers all the key concepts. I hope you like it!

1. Introduction to Python

What is Python?
Installing Python
Running Python scripts
Python IDEs (Integrated Development Environments)
Basic Syntax: Comments, Indentation, and Variables
Python Data Types: Strings, Integers, Floats, Booleans
Basic Input and Output
Python's Interactive Mode and REPL
Using Jupyter Notebooks
Understanding the Python Shell
Basic Troubleshooting: Common Errors and Fixes

2. Control Flow

Conditional Statements: if, else, elif
Comparison and Logical Operators
Loops:
- for loops
- while loops
- Loop control statements: break, continue, pass
List and Dictionary Comprehensions
Nested Loops
Using enumerate() with Loops
The zip() Function for Iteration
Error Handling in Loops

3. Functions

Defining Functions with def
Parameters and Arguments
Return Values
Variable Scope: Local vs Global
Lambda Functions
Recursion
Default and Keyword Arguments
Variable-length Arguments (*args and **kwargs)
Higher-order Functions
Decorators (basic introduction)

4. Data Structures

Lists:
- Indexing, Slicing, and Methods (append, insert, remove, etc.)
Tuples:
- Immutability and Use Cases
Dictionaries:
- Key-Value Pairs, Methods (get, keys, values, etc.)
Sets:
- Set Operations (union, intersection, difference)
Nested Data Structures
List vs. Tuple vs. Set vs. Dictionary
Understanding collections module: Counter, defaultdict, OrderedDict
Data Structure Performance Considerations

5. Object-Oriented Programming (OOP)

Classes and Objects
Attributes and Methods
The self Keyword
Constructors (__init__)
Inheritance
- Single and Multiple Inheritance
Polymorphism
Encapsulation and Abstraction
Special Methods: __str__, __repr__, __len__, etc.
Class vs. Instance Variables
Class Methods and Static Methods
Composition vs. Inheritance
Abstract Base Classes (ABCs)

6. Error Handling

Types of Errors: Syntax, Logic, Runtime
try, except, finally blocks
Raising Exceptions with raise
Custom Exception Classes
Using assert for Debugging
Logging Errors with the logging Module
Creating Context Managers for Error Handling
Best Practices in Error Handling

7. File Handling

Opening Files: open(), read(), write()
Reading and Writing to Files
File Modes (r, w, a, b)
Working with File Paths
Using with to Automatically Close Files
Reading and Writing CSV Files
Working with JSON Files
File Iterators
Handling Large Files with Buffered Reading/Writing

8. Modules and Packages

Importing Modules: import, from ... import
Python Standard Library (e.g., math, random, datetime)
Creating and Using Custom Modules
Using Third-Party Packages with pip
Virtual Environments
Understanding the __init__.py file
Building Your Own Package
Using requirements.txt for Dependency Management
Exploring the sys and os Modules

9. Working with Libraries

NumPy (for array manipulation)
Pandas (for data analysis and manipulation)
Matplotlib and Seaborn (for data visualization)
Requests (for handling HTTP requests)
JSON Handling
Using SciPy for Scientific Computing
Working with SQLAlchemy for Database Interaction
Web Scraping with Beautiful Soup and Scrapy
Introduction to TensorFlow and Keras for Machine Learning

10. Advanced Topics

List and Dictionary Comprehensions (advanced usage)
Generators and yield keyword
Decorators and @decorator_name
Context Managers
Regular Expressions (Regex)
Unit Testing with unittest
Metaclasses and their Use Cases
Asynchronous Programming (async/await)
Threading and Multiprocessing
Python’s functools module (e.g., lru_cache, partial)
Descriptors and Property Decorators
Type Hinting and Annotations
Advanced Error Handling and Custom Exceptions

11. Working with APIs

What are APIs?
Consuming APIs with Python
Authentication (Basic, OAuth)
Parsing JSON from APIs
Using the requests Library for API Calls
Working with REST vs. SOAP APIs
Handling API Rate Limiting
Creating Your Own API with Flask or FastAPI

12. Introduction to Data Science

Basics of Data Manipulation with Pandas
Data Visualization with Matplotlib/Seaborn
Basic Statistics in Python
Introduction to Machine Learning with Scikit-learn (optional)
Exploratory Data Analysis (EDA)
Feature Engineering and Selection
Data Cleaning Techniques
Understanding Overfitting and Underfitting

13. Final Project

Develop a Python project that integrates different concepts:
- Data Analysis, Web Scraping, or a Simple Game
Project Planning and Documentation
Version Control with Git
Deployment Options (e.g., Heroku, GitHub Pages)
Presenting Your Project: Best Practices

Resources to Learn Python:

If you have any suggestions or if I missed something, just drop a comment! Happy coding!

How to Handle Outliers in Machine Learning

Ashwin Kumar — Sun, 13 Oct 2024 17:19:40 +0000

Outliers are unusual data points that stand out from the rest of your data because they are either much higher or much lower than the rest. Imagine a classroom where most students score between 50 and 80 marks on a test, but one student scores 5, and another scores 100. These extremely different scores are examples of outliers.

In real-world data, outliers are common, and how you handle them can significantly impact your results. So, let’s break down some simple techniques to deal with outliers, using simple examples and coding demos to help you get started.

What is an Outlier?

Before we jump into the techniques, let’s define what an outlier is. In simple terms, an outlier is a value in a dataset that’s far away from the average or the majority of the other values. For example, in a class of students where most are 18-22 years old, if someone is 50 years old, they would be considered an outlier.

Why Deal with Outliers?

Outliers can distort your results, make your analysis less accurate, and lead to wrong conclusions. For instance, imagine you're trying to find the average income of a neighborhood, but a billionaire lives there. Their income would skew the average, giving you a false impression of the neighborhood’s wealth.
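
The billionaire effect is easy to check with a few lines of Python. This is a small sketch using only the standard library; the income figures are invented for illustration.

```python
from statistics import mean, median

# Hypothetical annual incomes for a small neighborhood
incomes = [42_000, 48_000, 51_000, 55_000, 60_000, 63_000, 70_000]

# The same neighborhood after a billionaire moves in
with_billionaire = incomes + [1_000_000_000]

print("Without outlier:", round(mean(incomes)), median(incomes))
print("With outlier:   ", round(mean(with_billionaire)), median(with_billionaire))
```

The mean jumps by several orders of magnitude while the median barely moves, which is why median-based measures show up again in the more robust techniques later in this post.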

Common Techniques to Deal with Outliers

Let’s explore a few simple and effective techniques to deal with outliers. We'll also include a coding demo to show how to use each technique.

1. Z-Score Method

What it does: The Z-score method tells you how far a value is from the mean (average) of your data, measured in standard deviations. A common rule of thumb treats any value more than 3 standard deviations from the mean as an outlier. A Z-score table is useful for checking how rare a given Z-score is under a normal distribution.

When to use: When your data is normally distributed (bell-shaped curve).

Example:

Imagine you have the heights of 100 people, most of them are between 150 cm and 180 cm, but one person is 250 cm tall. This is an outlier.

Coding Demo:

import pandas as pd
import numpy as np

# Sample data: heights of people (in cm)
data = pd.DataFrame({'Height': np.random.normal(170, 10, 100)})

# Adding an outlier
data.loc[0, 'Height'] = 250  # This is the outlier

# Calculate the Z-scores
data['Z_score'] = (data['Height'] - data['Height'].mean()) / data['Height'].std()

# Identifying outliers (Z-score > 3 or Z-score < -3)
outliers = data[np.abs(data['Z_score']) > 3]

print("Outliers:")
print(outliers)

2. IQR Method (Interquartile Range)

What it does: The IQR method calculates the range within which the middle 50% of your data lies. It helps identify outliers by finding values that fall significantly outside this range.

How it works:

Calculate the first quartile (Q1): The 25th percentile of the data.
Calculate the third quartile (Q3): The 75th percentile of the data.
Find the IQR: Subtract Q1 from Q3.

IQR = Q3 - Q1

Determine the outlier boundaries:
- Lower Bound: Q1 - 1.5 × IQR
- Upper Bound: Q3 + 1.5 × IQR
Identify outliers: Any data point below the lower bound or above the upper bound is an outlier.

Example: In a survey of people’s monthly expenses, if most spend between $500 and $1500 but a few spend over $4000, those high expenses are outliers.

Coding Demo:

import pandas as pd
import numpy as np

# Sample data for monthly expenses: most values fall between 500 and 1500,
# with two unusually high values
data = {
    'Monthly Expenses': [500, 600, 700, 800, 900, 1000, 1200, 1500, 4000, 4500]
}

# Create DataFrame
df = pd.DataFrame(data)

# Calculate Q1 and Q3
Q1 = df['Monthly Expenses'].quantile(0.25)
Q3 = df['Monthly Expenses'].quantile(0.75)
IQR = Q3 - Q1

# Calculate bounds for outliers
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

# Identify outliers
outliers = df[(df['Monthly Expenses'] < lower_bound) | (df['Monthly Expenses'] > upper_bound)]

print("Identified Outliers using IQR:")
print(outliers)

3. Modified Z-Score

What it does: The modified Z-score is similar to the Z-score but is more robust against outliers. It uses the median and the median absolute deviation (MAD) to calculate how far a data point is from the median.

How it works:

Calculate the median of the dataset.
Compute the absolute deviation from the median for each data point.
Calculate the median of those absolute deviations (MAD).
Calculate the modified Z-score for each data point:

Modified Z-score = 0.6745 × (X - Median) / MAD

X: This represents the specific data point you are evaluating. It could be any individual observation in your dataset.

Median: This is the middle value of your dataset when it is sorted.

MAD (Median Absolute Deviation): This is a measure of variability that quantifies how much the values in a dataset deviate from the median.

0.6745: This constant is approximately the 75th percentile of the standard normal distribution. Dividing the MAD by it gives an estimate of the standard deviation for normally distributed data, which makes the modified Z-score comparable to an ordinary Z-score.

Identify outliers: Any data point whose modified Z-score has an absolute value above a chosen threshold is flagged as an outlier. A threshold of 3.5 is commonly recommended; the coding demo in this section uses 3.

Example: In a group of people's daily steps, if most walk between 2000 and 10000 steps but a few walk 30000 steps, those high step counts could be outliers.

Coding Demo:

import numpy as np
import pandas as pd

# Sample data for daily steps
steps_data = {
    'Daily Steps': [2000, 3000, 5000, 7000, 9000, 10000, 15000, 30000]
}

# Create DataFrame
df_steps = pd.DataFrame(steps_data)

# Calculate median and MAD
median = df_steps['Daily Steps'].median()
mad = np.median(np.abs(df_steps['Daily Steps'] - median))

# Calculate modified Z-scores
df_steps['Modified Z'] = 0.6745 * (df_steps['Daily Steps'] - median) / mad

# Identify outliers
outliers_modified = df_steps[np.abs(df_steps['Modified Z']) > 3]

print("\nIdentified Outliers using Modified Z-Score:")
print(outliers_modified)

4. Box Plot Visualization

What it does: A box plot visually displays the distribution of your data, making it easy to spot outliers. The box represents the interquartile range (IQR), and any points outside the “whiskers” (lower and upper bounds) are considered outliers.

Example: In analyzing the heights of basketball players, you might find that most players fall between 180 cm and 210 cm, but a few exceed 230 cm, clearly visible in a box plot.

Coding Demo:

import matplotlib.pyplot as plt

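# Sample data for heights (cm): most players are between 180 and 210,
# and two are tall enough to fall beyond the upper whisker
heights = [180, 185, 188, 190, 192, 195, 198, 200, 205, 210, 235, 250]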

# Create box plot
plt.boxplot(heights)
plt.title('Box Plot of Heights')
plt.ylabel('Height (cm)')
plt.show()

5. Winsorization

What it does: Winsorization involves capping the outlier values to reduce their influence without completely removing them. For example, you might replace extreme high values with the next highest non-outlier value.

Example: In a dataset of home prices, if one home is listed at $10 million while most are under $1 million, you might replace $10 million with the highest non-outlier price to maintain a realistic range.

Coding Demo:

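import numpy as np
import pandas as pd

# Winsorization example
data_prices = {
    'Home Prices': [150000, 200000, 250000, 300000, 10000000]  # One extreme outlier
}

# Create DataFrame
df_prices = pd.DataFrame(data_prices)

# Winsorization: cap values at the 95th percentile.
# interpolation='lower' picks an actual observed price; with only five rows,
# the default linear interpolation would land close to the outlier itself.
cap = df_prices['Home Prices'].quantile(0.95, interpolation='lower')
df_prices['Capped Prices'] = np.where(df_prices['Home Prices'] > cap, cap, df_prices['Home Prices'])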

print("\nData After Winsorization:")
print(df_prices)

6. Log Transformation

What it does: Log transformation reduces the effect of extreme values by applying a logarithmic scale to the data. This is particularly useful for positively skewed data.

Example: In analyzing incomes, where most values are clustered around a certain range, log transformation can help normalize the data and make it easier to analyze.

Coding Demo:

# Sample income data
income_data = {
    'Annual Income': [20000, 30000, 50000, 80000, 200000, 1000000]  # Includes a large outlier
}

# Create DataFrame
df_income = pd.DataFrame(income_data)

# Apply log transformation
df_income['Log Income'] = np.log(df_income['Annual Income'])

print("\nData After Log Transformation:")
print(df_income)

Conclusion

Outliers are a natural part of data, but how you handle them can make a big difference in your analysis. By using techniques like Z-score, IQR, modified Z-score, box plots, winsorization, and log transformation, you can effectively manage outliers and improve the accuracy of your insights. Remember, the choice of technique depends on your data's characteristics and the specific context of your analysis.

Tips

Always visualize your data before and after handling outliers to understand their impact.
Consider the context of your data: sometimes, outliers are valid observations that should be kept for analysis.

Happy Coding ❤️

CountVectorizer vs TfidfVectorizer

Ashwin Kumar — Tue, 08 Oct 2024 18:29:00 +0000

Imagine you're having a conversation with a friend about your favorite book. You discuss the storyline, memorable quotes, and what made it special. Now, if a machine had to understand this conversation, how would it process your words? Machines can’t comprehend text the way we do. They need text data to be converted into numerical form to perform any kind of analysis or prediction. This process of converting text into numbers is called text vectorization, and it’s where tools like CountVectorizer and TfidfVectorizer come into play.

But what are they, and how do they work? Let's break it down in the simplest way possible.

What is CountVectorizer?

CountVectorizer is like creating a word count table. It takes a collection of text data and converts it into a matrix of token counts. Each row represents a document, and each column represents a unique word (or token). The values in the matrix indicate how many times each word appears in each document.

Real Life Example

Suppose you have three sentences:

"I love coding."
"Coding is fun."
"I love learning new things."

Using CountVectorizer, the result might look something like this:

	coding	fun	i	is	learning	love	new	things
Doc 1	1	0	1	0	0	1	0	0
Doc 2	1	1	0	1	0	0	0	0
Doc 3	0	0	1	0	1	1	1	1

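Each value is the number of times the word appears in that document, so here 1 means the word appears once and 0 means it is absent. This matrix is what CountVectorizer generates. Note that scikit-learn's default tokenizer lowercases the text and drops single-character tokens such as "i", so the real output has no "i" column.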

What is TfidfVectorizer?

TfidfVectorizer (Term Frequency-Inverse Document Frequency) is an extension of CountVectorizer. While CountVectorizer just counts the words, TfidfVectorizer goes a step further and also considers how common each word is across all documents. It assigns more weight to words that appear frequently in a single document but are rare in the others, which pushes down filler words like “the” and lets the meaningful terms stand out.

Using the same sentences as above, the matrix generated by TfidfVectorizer will contain decimal values instead of just counts, representing the importance of each word in a given document.

Why Do We Need Vectorization?

Vectorization is needed because machine learning models work with numbers, not text. To analyze, classify, or make predictions based on text data, the text must first be transformed into a numerical form that these models can process. This transformation enables models to find patterns, similarities, and even meaning in the text.

How to Use `CountVectorizer` and `TfidfVectorizer`?

Using these tools in Python is straightforward, especially with the scikit-learn library. Here’s a quick example:

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Sample documents
documents = [
    "I love coding.",
    "Coding is fun.",
    "I love learning new things."
]

# Using CountVectorizer
count_vectorizer = CountVectorizer()
count_matrix = count_vectorizer.fit_transform(documents)
print("Count Vectorizer Result:\n", count_matrix.toarray())

# Using TfidfVectorizer
tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform(documents)
print("TF IDF Vectorizer Result:\n", tfidf_matrix.toarray())


Result:

Count Vectorizer Result:
 [[1 0 0 0 1 0 0]
 [1 1 1 0 0 0 0]
 [0 0 0 1 1 1 1]]

TF IDF Vectorizer Result:
 [[0.70710678 0. 0. 0. 0.70710678 0. 0.]
 [0.4736296  0.62276601 0.62276601 0. 0. 0. 0.]
 [0. 0. 0. 0.52863461 0.40204024 0.52863461 0.52863461]]
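
Where do those decimals come from? The sketch below recomputes them by hand, assuming TfidfVectorizer's default settings: a smoothed IDF of ln((1 + n) / (1 + df)) + 1 followed by L2 normalization of each row. The token lists are typed in manually to mirror what the default tokenizer keeps (lowercased, no single-character "i").

```python
import math

# Tokens per document, as the default tokenizer would keep them
docs = [
    ["love", "coding"],
    ["coding", "is", "fun"],
    ["love", "learning", "new", "things"],
]

n_docs = len(docs)
vocab = sorted({token for doc in docs for token in doc})

# Document frequency: in how many documents each term appears
df = {t: sum(t in doc for doc in docs) for t in vocab}

# Smoothed inverse document frequency
idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}

def tfidf_row(doc):
    # Raw weight = term count * idf, then scale the row to unit length
    raw = [doc.count(t) * idf[t] for t in vocab]
    norm = math.sqrt(sum(w * w for w in raw))
    return [w / norm for w in raw]

rows = [tfidf_row(doc) for doc in docs]

print(vocab)
for row in rows:
    print([round(w, 8) for w in row])
```

In the second document, "coding" ends up with a lower weight than "fun" or "is" because it also appears in the first document, which is exactly the "rare words count more" idea.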

Which Vectorizer is Better?

It depends on the task at hand. Here’s a comparison to make it clearer:

Feature	`CountVectorizer`	`TfidfVectorizer`
Output	Count matrix	Weighted matrix (importance of terms)
Suitability	Good for simple word count	Better for distinguishing between terms
Impact of Frequent Words	Overly influenced by common words like "the", "is"	Reduces the weight of frequent words
Use Case	When word frequency matters (e.g., spam detection)	When meaning and relevance matter more

Drawbacks of `CountVectorizer` and `TfidfVectorizer`

CountVectorizer:
- Ignores word order and context.
- High dimensional output with sparse data for large vocabularies.
TfidfVectorizer:
- Loses some contextual information.
- Not ideal when the order of words is critical (e.g., for certain NLP tasks like sentiment analysis).

What Are max_features in CountVectorizer?

The number of features (columns) in CountVectorizer corresponds to the number of unique tokens (words) in the corpus. This can be limited using the max_features parameter. For example, setting max_features=100 will keep only the 100 most frequent words.

Using and Reversing the Vectorization Process

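To convert text into vectors, use fit_transform() as shown in the example above. The inverse_transform() method goes the other way, but only partially: for each document it returns the vocabulary terms that have nonzero entries, so word order, repeated words, punctuation, and any tokens the vectorizer dropped are not recovered: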

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Sample text data
corpus = [
    "The cat sat on the mat.",
    "The dog is in the house."
]

# Initialize both vectorizers
count_vectorizer = CountVectorizer()
tfidf_vectorizer = TfidfVectorizer()

# Fit and transform the data
count_matrix = count_vectorizer.fit_transform(corpus)
tfidf_matrix = tfidf_vectorizer.fit_transform(corpus)

# Display the vectorized representation
print("CountVectorizer Matrix:\n", count_matrix.toarray())
print("TfidfVectorizer Matrix:\n", tfidf_matrix.toarray())

# Recover the terms present in each document (word order and repeats are lost)
count_reversed = count_vectorizer.inverse_transform(count_matrix)
tfidf_reversed = tfidf_vectorizer.inverse_transform(tfidf_matrix)

# Display the reversed text
print("\nReversed Text from CountVectorizer:")
for doc in count_reversed:
    print(" ".join(doc))

print("\nReversed Text from TfidfVectorizer:")
for doc in tfidf_reversed:
    print(" ".join(doc))

Additional Tools and Techniques

Apart from these vectorizers, there are other methods like HashingVectorizer or using pre-trained embeddings like Word2Vec, GloVe, and BERT that can be considered for more advanced use cases.

Final Thoughts

Choosing between CountVectorizer and TfidfVectorizer depends on the nature of the problem and the text data at hand. For beginners, starting with these simple vectorizers is a great way to understand how text data can be transformed into numbers and used in machine learning models. To learn more, see the scikit-learn documentation.

Hey! I hope this helps you understand the concept better. It's completely normal to feel demotivated when you don't grasp something right away. Remember, studying in this field takes time and practice, so try not to lose your motivation. You’ve got this! If you found this helpful, please give it a like. It would really encourage me to create more content like this!

Happy Coding ❤️

Understanding the Curse of Dimensionality

Ashwin Kumar — Mon, 07 Oct 2024 17:18:11 +0000

The "curse of dimensionality" is a term used in data science and statistics to describe various phenomena that arise when analyzing and organizing data in high dimensional spaces. This concept is crucial for understanding the challenges faced in machine learning, data analysis, and related fields. Let’s break it down in simple terms.

What Is the Curse of Dimensionality?

At its core, the curse of dimensionality refers to the problems that occur when we work with data that has many features or dimensions. Imagine you’re trying to find your way in a very large room filled with furniture. The more furniture (dimensions) there is, the harder it is to navigate without bumping into something. Similarly, in data analysis, as the number of dimensions increases, our ability to find patterns and make predictions can diminish.

Why Do We Even Bother?

We bother about dimensionality because many real world problems involve high dimensional data. For instance, when we analyze images, each pixel in the image can be considered a dimension. A simple 100x100 pixel image has 10,000 dimensions! Similarly, in genetics, each gene can represent a dimension, leading to a vast number of features when studying traits or diseases.

Understanding the curse of dimensionality helps data scientists develop better algorithms and improve the accuracy of their predictions.

What Is High Dimension?

High dimensionality refers to data that has many features or variables.

In the context of data analysis:

Low dimensional data could be something like a simple dataset with only 2 or 3 features (like height and weight).
High dimensional data could have hundreds or thousands of features (like an image's pixel values or customer preferences across hundreds of products).

In general, anything with more than three dimensions can be considered "high dimensional," and data can easily reach dozens or hundreds of dimensions.

What Happens When We Have High Dimensions?

When we deal with high dimensional data, several issues arise:

Distance Becomes Less Meaningful: In low dimensions, it's easier to understand how close two points are. In high dimensions, points tend to be equidistant (equally far from two or more places) from each other, making it difficult to find nearby neighbors. For example, if you're looking for friends at a party, it's easier to spot them in a small room than in a huge hall.
Sparsity of Data: As dimensions increase, the volume of the space grows exponentially. For example, if you split each axis into 10 intervals, one dimension gives 10 cells, two dimensions give 100, and ten dimensions give 10^10 (ten billion). A fixed number of data points therefore becomes sparse and less clustered, making it harder to find patterns or group similar items.
Overfitting: With many dimensions, models can become overly complex, fitting the noise in the data rather than the underlying trend. This can lead to poor predictions on new, unseen data.
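
To put numbers on the sparsity point, here is a tiny sketch. It assumes each axis is cut into 10 equal bins and that we have 1,000 data points. Both figures are made up for illustration.

```python
bins_per_axis = 10   # hypothetical grid resolution
n_points = 1000      # hypothetical dataset size

cells_by_dim = {}
for d in (1, 2, 3, 10):
    cells = bins_per_axis ** d
    cells_by_dim[d] = cells
    # Even if every point landed in its own cell, this is the most we could fill
    max_filled = min(n_points, cells) / cells
    print(f"{d:>2} dims: {cells:>14,} cells, at most {max_filled:.6%} occupied")
```

In one, two, or three dimensions the points can cover every cell, but in ten dimensions the same 1,000 points can touch only a tiny fraction of the grid.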

How Do We Know This Is the Curse of Dimensionality?

We can identify the curse of dimensionality through various observations:

Experiments with Distance: Studies show that as dimensions increase, the distance between points becomes less variable. This means that nearest neighbors are not significantly closer than farthest neighbors, which contradicts our intuitive understanding of proximity.
Performance of Algorithms: Many machine learning algorithms, like k-nearest neighbors or clustering methods, perform well in low dimensions but struggle in high dimensions. This drop in performance is a clear indicator of the curse.
Visualizations: While we cannot visualize more than three dimensions directly, we can use techniques like Principal Component Analysis to reduce dimensions and visualize how data behaves in lower dimensional space.
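
You can run the distance experiment yourself. The sketch below assumes uniformly random points in a unit cube and measures the "relative contrast" (farthest minus nearest, divided by nearest) from one random query point. The function name and sample sizes are my own choices.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

def relative_contrast(n_dims, n_points=200):
    """(farthest - nearest) / nearest distance from a random query point."""
    query = [random.random() for _ in range(n_dims)]
    dists = [
        math.dist(query, [random.random() for _ in range(n_dims)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)

contrast = {d: relative_contrast(d) for d in (2, 10, 100, 1000)}

for d, c in contrast.items():
    print(f"{d:>5} dims: relative contrast = {c:.2f}")
```

In two dimensions the farthest point is many times farther away than the nearest one. By 1,000 dimensions the ratio shrinks to a small fraction, which is why "nearest neighbor" starts to lose its meaning.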

Is There Any Way to Mitigate the Curse of Dimensionality?

Fortunately, there are several strategies to address the curse of dimensionality:

Dimensionality Reduction: Techniques like PCA, t-SNE, and UMAP can help reduce the number of features while preserving essential information. This simplification allows algorithms to perform better.
Feature Selection: Identifying and retaining only the most relevant features can reduce dimensionality. This involves analyzing the data to find which features contribute most to the desired outcome.
Using Appropriate Algorithms: Some algorithms are more robust to high dimensions. For instance, tree based methods like random forests or gradient boosting often cope with many features better than distance based methods such as k-nearest neighbors, because each split looks at one feature at a time.
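
As a rough illustration of dimensionality reduction, here is PCA done by hand with NumPy's SVD. The data is synthetic and built on purpose so that 50 features are really driven by 3 hidden factors plus a little noise. Real data will rarely be this clean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 samples, 50 features driven by 3 hidden factors
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 50))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 50))

# PCA: center the data, then take the singular value decomposition
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Share of total variance captured by each principal component
explained = S**2 / np.sum(S**2)

# Project onto the top k components
k = 3
X_reduced = X_centered @ Vt[:k].T

print("Original shape:", X.shape)
print("Reduced shape:", X_reduced.shape)
print(f"Variance kept by {k} components: {explained[:k].sum():.3f}")
```

Here three components keep almost all of the variance, so the other 47 dimensions add very little. In practice you would look at the explained variance to decide how many components to keep.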

Conclusion

The curse of dimensionality presents significant challenges in data analysis and machine learning, especially when working with high dimensional data. By understanding what it is and how it impacts our ability to find meaningful patterns, we can take steps to mitigate its effects. Whether through dimensionality reduction, feature selection, or choosing appropriate algorithms, there are ways to make sense of complex data without getting lost in the high dimensional maze.

If you think this could help someone you know, please share it with your friends!

Happy Coding ❤️

Understand Normal Distribution

Ashwin Kumar — Mon, 07 Oct 2024 16:54:35 +0000

Best Free Resources to Sharpen Your Math Skills for Machine Learning!

Ashwin Kumar — Fri, 04 Oct 2024 17:24:21 +0000

Hey Guys👋

I’ve compiled a list of free, high quality resources to help you sharpen your math skills and gain confidence tackling ML algorithms. Check them out:

YouTube Courses 🎥

Professor Leonard – Clear and detailed explanations of Algebra, Calculus, and Statistics. Perfect for mastering the basics. 📚
3Blue1Brown – Beautiful visual animations simplify even the most complex math concepts. 🎨
Mathematics for Machine Learning (3 Courses in 1) – A comprehensive deep dive into linear algebra, calculus, and probability. 🔢
College Algebra with Python Code – Learn college algebra concepts with real Python coding examples. 📈
Mathematics of Neural Networks – Understand the mathematical core of neural networks, including matrix multiplication and optimization. 🧠
Calculus 1 – Full College Course – Ideal for mastering calculus, essential for gradient descent and other optimization techniques. 🔄
Statistics - A Full University Course on Data Science Basics – A detailed university-level course covering statistics for data science. 📊
Statistics and Probability Full Course – Comprehensive guide to statistics and probability for data science enthusiasts. 🎲
Statistics Full Course for Beginners – A beginner-friendly course perfect for those starting out in data science. 👨‍🏫

Free Books 📚

Mathematics for Machine Learning (Free PDF) – A fantastic, in-depth resource for anyone serious about learning the math behind machine learning. This book covers linear algebra, calculus, and more!
Think Stats (Free Download) – An introduction to probability and statistics for data scientists.
Linear Algebra Done Right (Free) – A clear and approachable book for learning the essentials of linear algebra.

Blogs 📝

Introduction to Statistics for Data Science – A fantastic primer for understanding how statistics fits into the data science field.
The Mathematics of Machine Learning – A step-by-step guide that breaks down essential math concepts needed for ML.
Top 10 Math Skills for Machine Learning – This article explains the key math skills every data scientist needs.

Free Courses 💻

Harvard Free Mathematics Courses – A collection of free math courses from Harvard, covering a wide range of topics from algebra to advanced calculus.
MIT Mathematics Courses – Dive into free courses from MIT’s OpenCourseWare on subjects like linear algebra, differential equations, and more.
Alison Free Online Mathematics Courses – A variety of free courses covering different areas of mathematics, including statistics and calculus.

Feel free to share any other great resources you’ve come across in the comments. Happy learning! 🎉

You Can Learn 🐍 Python Effectively !

Ashwin Kumar — Fri, 04 Oct 2024 05:10:20 +0000

So, you’ve decided to dive into Python programming. Great choice! Python is not only one of the most popular programming languages today, but it’s also known for its simplicity and readability. However, many beginners often find themselves stuck when they start learning. Where should you begin? What should you focus on first? Don’t worry! This guide will break it down for you step-by-step in an easy to understand way.

Why Python?

Python is widely used in web development, data science, machine learning, automation, and even game development. Its clear syntax makes it an ideal choice for beginners. But even though it’s easy to get started, the sea of tutorials, topics, and resources can feel overwhelming.

Step 1: Master the Basics

Before diving into advanced topics or complicated projects, it’s crucial to build a solid foundation. Here’s how you can get started:

1. Understand Python’s Basic Syntax and Data Types

Every programming language has its building blocks. In Python, those are:
Variables: Think of them as containers that hold data. For example:

  name = "John"   # This is a string variable
  age = 25        # This is an integer variable

Data Types: Python supports several built-in data types like strings, integers, floats, lists, and dictionaries. Here’s an example:

  fruits = ["Apple", "Banana", "Cherry"]  # This is a list

Getting comfortable with these basics will help you handle more complex topics in the future.

2. Control Flow: Decision Making with Conditions and Loops

Control flow statements guide the program’s execution. Start with if, elif, and else statements to make decisions. For example:

temperature = 30
if temperature > 35:
    print("It's hot outside!")
elif temperature > 20:
    print("It's a pleasant day.")
else:
    print("It's quite cold today.")

Similarly, loops help you automate repetitive tasks. Use for and while loops to iterate over items:

for fruit in fruits:
    print(fruit)  # This will print each fruit in the list.

These statements allow you to control the flow of your program and are fundamental to problem solving in programming.
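
The example above uses a for loop. For completeness, here is a minimal while loop sketch, which keeps running as long as its condition is true (the countdown itself is just an illustration):

```python
count = 3
while count > 0:
    print(count)   # Prints 3, then 2, then 1
    count -= 1     # Without this line the loop would never end

print("Liftoff!")
```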

Step 2: Practice with Functions

Functions help break down your code into smaller, reusable blocks. Creating functions not only makes your code more readable but also allows you to perform the same action multiple times without rewriting it.

Example:

def greet(name):
    print(f"Hello, {name}!")

greet("Alice")  # This will print: Hello, Alice!

Start by defining simple functions, then gradually add parameters and return values.
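
As a next step, here is a small sketch of a function that takes two parameters (one with a default value) and returns a result instead of printing it. The function name is made up for this example.

```python
def rectangle_area(width, height=1):
    # Return the result so the caller can store or reuse it
    return width * height

area = rectangle_area(4, 5)   # Both arguments given
strip = rectangle_area(4)     # height falls back to its default of 1

print(area)   # 20
print(strip)  # 4
```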

Step 3: Explore Modules and Libraries

Python comes with a wide range of built in modules, and the community has created countless libraries that you can use for free. A module is a file that contains Python code you can reuse.

For instance, the math module provides mathematical functions:

import math
print(math.sqrt(16))  # This will print 4.0

Explore libraries like NumPy for numerical computations and Pandas for data analysis when you’re ready.

Step 4: Get Comfortable with Object Oriented Programming (OOP)

OOP helps you model real world scenarios using classes and objects. It may sound intimidating, but here’s a simple analogy: Think of a class as a blueprint and an object as the actual thing built using that blueprint.

Example:

class Dog:
    def __init__(self, name, breed):
        self.name = name
        self.breed = breed

    def bark(self):
        print(f"{self.name} is barking!")

my_dog = Dog("Buddy", "Golden Retriever")
my_dog.bark()  # Output: Buddy is barking!

Understanding classes, objects, and methods will unlock more advanced Python concepts for you.

Step 5: Error Handling

Everyone makes mistakes, and so does your code. Learning how to handle errors gracefully is crucial. Use try and except blocks to manage exceptions and prevent your program from crashing unexpectedly.

Example:

try:
    result = 10 / 0
except ZeroDivisionError:
    print("Oops! You can't divide by zero.")

Handling errors will make your programs more robust and user-friendly.
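
Once try and except feel comfortable, you can add else (runs only when no error occurred) and finally (always runs). A minimal sketch, using a hypothetical helper called parse_age:

```python
def parse_age(text):
    try:
        age = int(text)
    except ValueError:
        # Runs only if int() could not convert the text
        print(f"'{text}' is not a whole number.")
        return None
    else:
        # Runs only if no exception was raised
        return age
    finally:
        # Runs in both cases, before the function returns
        print("Finished checking input.")

print(parse_age("25"))
print(parse_age("abc"))
```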

Step 6: Build Projects and Solve Problems

After you’ve mastered the basics, the best way to learn is by doing. Start with small projects, like a calculator or a to do list application. As you gain confidence, tackle more complex projects, such as a web scraper or a data visualization tool.

Working on projects will:

Reinforce your learning.
Expose you to real world scenarios.
Teach you how to solve problems independently.

Where to Find Resources?

There are tons of free resources available:

Microsoft Course On Python
Complete Python Programming By ML+
Python For Data Science By IBM
Python documentation is a treasure trove of information.

Google it! 😎

Final Words of Advice

Learning Python, or any programming language, is like climbing a mountain. The beginning is often the hardest part, where everything feels overwhelming. But keep practicing, break down problems into smaller pieces, and don’t hesitate to seek help from the community. With consistent effort, you’ll soon find yourself creating your own programs, solving real world problems, and having fun along the way!

So save this, go ahead and take that first step, start small, and happy coding! 🌟

How Python Dictionaries Keep Your Code Clean and DRY

Ashwin Kumar — Wed, 02 Oct 2024 05:10:21 +0000

Python Dictionary and the DRY Principle: A Quick Guide for Beginners

Hey there! 👋 If you’re diving into Python programming, you’ve probably stumbled upon dictionaries and maybe wondered, “What exactly is a dictionary in Python, and how can it help me code smarter?” No worries, let’s break it down in a super simple way.

What’s a Dictionary in Python?

Imagine you have a list of items, and each item has a unique label attached to it, like “name: John” or “age: 25”. A dictionary in Python works exactly like that! It’s a collection of key value pairs, where each key is unique and points to a specific value. Think of it as a mini database for storing information in a neat and organized way.

It’s like a real dictionary where you look up a word (the key) and get its meaning (the value). Cool, right? 😎

How to Make a Dictionary in Python?

Creating a dictionary is as easy as pie. You just use curly braces {} and separate each key value pair with a colon :.

Here’s how you can make a simple dictionary:

# Creating a dictionary to store student information
student_info = {
    'name': 'John Doe',
    'age': 21,
    'major': 'Computer Science'
}

# Printing out the dictionary
print(student_info)

This dictionary stores a student’s name, age, and major. Notice how the keys like 'name' and 'age' are in quotes? That’s because these keys are strings. Keys can also be numbers or even tuples (any hashable value). The values can be anything: strings, lists, other dictionaries, you name it.
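
Here is a quick sketch of non-string keys. The grid and inventory dictionaries are made-up examples.

```python
# Tuples as keys: handy for coordinates
grid = {(0, 0): 'start', (2, 3): 'treasure'}

# Numbers as keys
inventory = {1: 'sword', 2: 'shield'}

print(grid[(2, 3)])   # treasure
print(inventory[1])   # sword

# A list cannot be a key because lists are mutable (unhashable)
try:
    bad = {[0, 0]: 'start'}
except TypeError as error:
    print("Lists cannot be keys:", error)
```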

How Dictionaries Help Us to Avoid Repetition (DRY Principle)

Now, here’s where it gets interesting. You may have heard of the DRY principle, which stands for Don’t Repeat Yourself. It’s a rule that encourages you to avoid redundancy in your code. How can dictionaries help with that? Let’s take a look.

Before Using a Dictionary (Repeating Code)

Imagine you want to store information about students in separate variables. It might look something like this:

student1_name = 'Alice'
student1_age = 20
student1_major = 'Mathematics'

student2_name = 'Bob'
student2_age = 22
student2_major = 'Physics'

Not only do we have repetitive variable names, but if we want to print or update these, we have to repeat ourselves again and again. This is where dictionaries can save the day! 🦸

Example 1: After Using a Dictionary (DRY Version)

With dictionaries, we can store all this information in a cleaner way:

# Using dictionaries to store student data
students = {
    'student1': {'name': 'Alice', 'age': 20, 'major': 'Mathematics'},
    'student2': {'name': 'Bob', 'age': 22, 'major': 'Physics'}
}

print(students['student1']['name'])  # Output: Alice
print(students['student2']['age'])   # Output: 22

Now, you don’t have to create separate variables for each student’s name, age, and major. You can access or update the information in a much simpler way. Plus, it makes your code cleaner and easier to manage.
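
To show what "access or update" looks like in practice, here is a short sketch that changes one value, adds a third (made-up) student, and prints everyone with a single loop:

```python
students = {
    'student1': {'name': 'Alice', 'age': 20, 'major': 'Mathematics'},
    'student2': {'name': 'Bob', 'age': 22, 'major': 'Physics'}
}

# Update one field
students['student1']['age'] = 21

# Add a new student (hypothetical data for illustration)
students['student3'] = {'name': 'Cara', 'age': 19, 'major': 'Biology'}

# One loop handles every student, no matter how many there are
for student_id, info in students.items():
    print(f"{student_id}: {info['name']}, {info['age']}, {info['major']}")
```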

Example 2: Avoiding Repetition with Dictionaries

Let’s say you want to create a simple grading system based on student scores. Without dictionaries, you might end up writing the following:

# Without dictionary (repeating code)
alice_score = 90
bob_score = 75
charlie_score = 85

if alice_score >= 85:
    print("Alice gets an A")
if bob_score >= 85:
    print("Bob gets an A")
if charlie_score >= 85:
    print("Charlie gets an A")

Here, we’re repeating the if statements and hardcoding each student’s name and score, which violates the DRY principle.

Instead, with a dictionary, you can avoid repetition like this:

# Using a dictionary (DRY principle)
student_scores = {'Alice': 90, 'Bob': 75, 'Charlie': 85}

for student, score in student_scores.items():
    if score >= 85:
        print(f"{student} gets an A")

Now you have cleaner, shorter, and more maintainable code! You only write the if statement once, and it works for all students in your dictionary. 🎉

Useful Dictionary Methods

Dictionaries come with a bunch of built-in methods that make working with them a breeze. Let’s check out a few of them:

.get(): Helps you avoid errors if the key doesn’t exist.

   print(student_info.get('address', 'Address not available'))  
   # Output: Address not available

.keys() and .values(): Get all keys or values in the dictionary.

   print(student_info.keys())  # Output: dict_keys(['name', 'age', 'major'])
   print(student_info.values())  # Output: dict_values(['John Doe', 21, 'Computer Science'])

.items(): Get both keys and values as pairs.

   for key, value in student_info.items():
       print(f'{key}: {value}')
   # Output: 
   # name: John Doe
   # age: 21
   # major: Computer Science

.update(): Update a dictionary with another dictionary or key-value pairs.

   student_info.update({'grade': 'A'})
   print(student_info)  
   # Output: {'name': 'John Doe', 'age': 21, 'major': 'Computer Science', 'grade': 'A'}

.setdefault(): Adds a key with a default value if the key doesn’t exist.

   student_info.setdefault('graduation_year', 2024)
   print(student_info)  
   # Output: {'name': 'John Doe', 'age': 21, 'major': 'Computer Science', 'grade': 'A', 'graduation_year': 2024}

Wrapping Up

Dictionaries are super powerful and can really help you follow the DRY principle in your code. By using dictionaries, you avoid repeating yourself, keep your code organized, and make it easier to read and maintain.

So, the next time you find yourself creating a bunch of similar variables, consider using a dictionary instead. It’ll save you a ton of time and effort, and your future self will thank you! 🙌

Happy coding! 💻
