<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ashwin Kumar</title>
    <description>The latest articles on DEV Community by Ashwin Kumar (@aashwinkumar).</description>
    <link>https://dev.to/aashwinkumar</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F821015%2F9c57a68a-8db6-4ba3-96bd-46ded9f4dcf6.png</url>
      <title>DEV Community: Ashwin Kumar</title>
      <link>https://dev.to/aashwinkumar</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/aashwinkumar"/>
    <language>en</language>
    <item>
      <title>AI Can Build Your SaaS But It Can’t Take Responsibility for Security</title>
      <dc:creator>Ashwin Kumar</dc:creator>
      <pubDate>Sun, 01 Feb 2026 07:46:07 +0000</pubDate>
      <link>https://dev.to/aashwinkumar/ai-can-build-your-saas-but-it-cant-take-responsibility-for-security-55gi</link>
      <guid>https://dev.to/aashwinkumar/ai-can-build-your-saas-but-it-cant-take-responsibility-for-security-55gi</guid>
      <description>&lt;p&gt;AI Can Build Your SaaS But It Can't Take Responsibility for Security&lt;/p&gt;

&lt;p&gt;We're living in an incredible era. Non-coders are shipping products that would've taken months of learning just a few years ago. Tools like Cursor, GitHub Copilot, v0, Replit, and Claude are turning ideas into MVPs overnight. Solo devs are building SaaS products, making real revenue, and living the dream.&lt;/p&gt;

&lt;p&gt;But here's the reality check we all need to hear: 45% of AI-generated code introduces OWASP Top 10 security vulnerabilities.&lt;/p&gt;

&lt;p&gt;The Numbers Don't Lie—And They're Alarming&lt;br&gt;
Veracode's 2025 GenAI Code Security Report tested over 100 large language models across Java, Python, C#, and JavaScript. The findings should terrify anyone shipping AI-generated code without proper security audits:&lt;/p&gt;

&lt;p&gt;Java code: 72% security failure rate&lt;/p&gt;

&lt;p&gt;Python: 38% vulnerable&lt;/p&gt;

&lt;p&gt;JavaScript: 43% vulnerable&lt;/p&gt;

&lt;p&gt;C#: 45% vulnerable&lt;/p&gt;

&lt;p&gt;Even more concerning? Cross-site scripting (XSS) defenses failed 86% of the time, and log injection vulnerabilities appeared in 88% of cases. These aren't edge cases; they are OWASP Top 10 vulnerabilities that attackers exploit daily.&lt;/p&gt;

&lt;p&gt;Stanford and NYU research found that 40% of GitHub Copilot-generated programs contained bugs or design flaws that could be exploited by attackers. And here's the kicker: AI coding assistants suggest vulnerable code patterns 40% more often than secure alternatives, simply because insecure code appears more frequently in their training data.&lt;/p&gt;

&lt;p&gt;Big Tech Is Raising Red Flags&lt;br&gt;
Microsoft's CEO Satya Nadella revealed that AI now writes 30% of Microsoft's code, and the company is accelerating toward 80%. But with that speed comes risk. In one documented case study, AI tools suggested non-existent package dependencies over 400,000 times, creating massive supply chain attack vectors.&lt;/p&gt;

&lt;p&gt;GitHub Copilot itself isn't immune. In June 2025, security researchers discovered CamoLeak, a critical vulnerability (CVSS 9.6) that allowed silent exfiltration of secrets and private source code from developers' repositories. The attack exploited GitHub's own infrastructure to steal AWS keys, API tokens, and proprietary code.&lt;/p&gt;

&lt;p&gt;Critical vulnerabilities were also discovered throughout 2025 in AI coding tools from Cursor, Google's Gemini, and Amazon's Q. The Amazon Q breach demonstration showed how easily prompt injection attacks could compromise these tools.&lt;/p&gt;

&lt;p&gt;The Data Exposure Crisis&lt;br&gt;
Since Q2 2023, there's been a 3x increase in repositories containing Personally Identifiable Information (PII) and payment details due to AI-generated code. Research shows that repositories using Copilot exhibit 6.4% secret leakage rates—40% higher than traditional development.&lt;/p&gt;

&lt;p&gt;Even worse? There's been a 10x surge in APIs missing basic security fundamentals like authorization and input validation. Sensitive API endpoints have nearly doubled as AI generates code faster than security teams can review it.&lt;/p&gt;

&lt;p&gt;The False Sense of Security&lt;br&gt;
"But I'm using Claude/ChatGPT—it's from big tech, so it must be secure, right?"&lt;/p&gt;

&lt;p&gt;Wrong.&lt;/p&gt;

&lt;p&gt;Here's what trained developers know that non-technical builders don't: AI models don't improve at security as they get smarter. Veracode's research revealed that despite advances in LLMs' ability to generate syntactically correct code, security performance has remained flat over time. Newer, larger models aren't writing more secure code—they're just writing vulnerable code faster.&lt;/p&gt;

&lt;p&gt;As John Cranney, VP of Engineering at Secure Code Warrior, warns: "No model provider has yet solved the problem of prompt injection, which means every new input adds a new potential injection vector."&lt;/p&gt;

&lt;p&gt;What You Can Do Right Now&lt;br&gt;
If you're building with AI and aren't a security expert, here are immediate actions backed by industry recommendations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Use this security validation prompt before deploying:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;"Analyze this code for security vulnerabilities including SQL injection, XSS attacks, CSRF, authentication flaws, insecure deserialization, hardcoded secrets, weak cryptography, and insufficient input validation. Provide specific fixes with secure code examples for each issue found."&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Integrate automated security scanning:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;OWASP ZAP for web vulnerability scanning&lt;/p&gt;

&lt;p&gt;Snyk or GitGuardian for secrets detection and dependency vulnerabilities&lt;/p&gt;

&lt;p&gt;npm audit / pip-audit for package security&lt;/p&gt;

&lt;p&gt;SonarQube for static code analysis&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Treat AI code as untrusted external contributions:&lt;br&gt;
Microsoft, GitHub, and security experts all agree: AI-generated code requires the same security review as third-party libraries. Never deploy it without scanning and human review.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Hire a security expert for pre-launch audit:&lt;br&gt;
Even a 2-hour consultation can identify critical vulnerabilities that could result in data breaches, regulatory fines (GDPR, CCPA), and reputational damage.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
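
&lt;p&gt;To make the checklist above more concrete, here is a minimal sketch of one issue the validation prompt asks about: SQL injection. It uses Python's built-in sqlite3 module. The table, the sample rows, and the two function names are made up for illustration and are not taken from any real project.&lt;/p&gt;

```python
import sqlite3

# In-memory database with a hypothetical users table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, is_admin INTEGER)")
conn.executemany(
    "INSERT INTO users (email, is_admin) VALUES (?, ?)",
    [("alice@example.com", 0), ("root@example.com", 1)],
)


def find_user_unsafe(email):
    # Vulnerable: user input is pasted straight into the SQL text.
    query = f"SELECT email FROM users WHERE email = '{email}'"
    return conn.execute(query).fetchall()


def find_user_safe(email):
    # Safer: the ? placeholder passes the value separately from the SQL text.
    return conn.execute("SELECT email FROM users WHERE email = ?", (email,)).fetchall()


malicious = "nobody' OR '1'='1"
unsafe_rows = find_user_unsafe(malicious)
safe_rows = find_user_safe(malicious)

print(len(unsafe_rows))  # 2: the injected OR clause matches every row
print(len(safe_rows))    # 0: the input is treated as a plain string
```

&lt;p&gt;The pattern to look for in review is any query built with string formatting or concatenation. Scanners can flag many of these, but a human reviewer should still confirm that every query uses placeholders.&lt;/p&gt;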

&lt;p&gt;The Bottom Line&lt;br&gt;
GitHub Copilot helped developers ship code with a 70% surge in pull requests. That's incredible productivity. But speed without security is a risk you can't afford.&lt;/p&gt;

&lt;p&gt;Your users trust you with their emails, payment details, phone numbers, and personal data. 29.1% of AI-generated Python code contains SQL injection, authentication bypass, and XSS vulnerabilities. One breach could destroy everything you've built.&lt;/p&gt;

&lt;p&gt;AI is a phenomenal co-pilot. But it's not a security expert. And when a breach happens—when customer data leaks, when your database gets wiped, when regulatory fines arrive—AI won't be there to face angry users, legal teams, or your destroyed reputation.&lt;/p&gt;

&lt;p&gt;You will.&lt;/p&gt;

&lt;p&gt;Build fast. Ship confidently. But never, ever skip security.&lt;/p&gt;

&lt;p&gt;Want to learn more? Check out Veracode's 2025 GenAI Code Security Report and OWASP's guidelines for securing AI-generated code.&lt;/p&gt;

&lt;p&gt;Thanks 🙏&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>ai</category>
      <category>saas</category>
    </item>
    <item>
      <title>The Story of XGBoost: A Machine Learning Revolution</title>
      <dc:creator>Ashwin Kumar</dc:creator>
      <pubDate>Sat, 23 Nov 2024 03:42:48 +0000</pubDate>
      <link>https://dev.to/aashwinkumar/the-story-of-xgboost-a-machine-learning-revolution-bib</link>
      <guid>https://dev.to/aashwinkumar/the-story-of-xgboost-a-machine-learning-revolution-bib</guid>
      <description>&lt;h3&gt;
  
  
  Did you know XGBoost is not actually an algorithm?
&lt;/h3&gt;

&lt;p&gt;It's a library created by &lt;strong&gt;Tianqi Chen&lt;/strong&gt; that has become one of the most popular tools in machine learning. Today, we’ll explore how Tianqi developed XGBoost. But before diving into its specifics, let’s first understand the foundational algorithm behind it: &lt;strong&gt;Gradient Boosting&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is the Gradient Boosting Algorithm?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Gradient Boosting&lt;/strong&gt; is a sophisticated and widely used machine learning method that builds a predictive model by combining multiple simpler models—usually decision trees—in a sequential manner. Developed by &lt;strong&gt;Jerome H. Friedman&lt;/strong&gt;, it was introduced in his seminal paper titled &lt;em&gt;"&lt;a href="https://www.researchgate.net/publication/2424824_Greedy_Function_Approximation_A_Gradient_Boosting_Machine" rel="noopener noreferrer"&gt;Greedy Function Approximation: A Gradient Boosting Machine.&lt;/a&gt;"&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Objectives of Gradient Boosting:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Iteratively correct the errors of earlier models.&lt;/li&gt;
&lt;li&gt;Improve prediction accuracy using &lt;strong&gt;gradient descent&lt;/strong&gt; optimization.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Core Idea:
&lt;/h3&gt;

&lt;p&gt;The central concept is to focus on areas where the model struggles most:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Initial Predictions:&lt;/strong&gt; Start with simple predictions and calculate errors (residuals).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error Targeting:&lt;/strong&gt; Construct additional models to minimize those errors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Incremental Improvement:&lt;/strong&gt; Combine these models to improve overall performance, ensuring predictions get progressively better.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This systematic focus on mistakes differentiates Gradient Boosting from other ensemble methods like bagging.&lt;/p&gt;
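
&lt;p&gt;The three steps above can be sketched in a few lines of plain Python. This is a simplified illustration for squared-error regression on a single feature, using one-split "stumps" as the simple models. The function names, the toy data, and the settings (50 rounds, learning rate 0.1) are assumptions chosen for the example, not part of any library.&lt;/p&gt;

```python
# Minimal gradient boosting sketch (squared error, one feature, stumps).
# Pure standard library; every name here is illustrative.

def fit_stump(xs, residuals):
    """Pick the threshold that best splits the residuals into two means."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if threshold >= x]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or best[0] > error:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: right_mean if x > threshold else left_mean


def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    base = sum(ys) / len(ys)  # step 1: start from a simple prediction
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)  # step 2: model the current errors
        stumps.append(stump)
        # step 3: add a damped correction to the running prediction
        predictions = [p + learning_rate * stump(x) for x, p in zip(xs, predictions)]

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict


def mse(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1, 6.8, 8.2]

baseline_error = mse(lambda x: sum(ys) / len(ys), xs, ys)
model = fit_gradient_boosting(xs, ys)
boosted_error = mse(model, xs, ys)
print(f"baseline MSE: {baseline_error:.3f}, boosted MSE: {boosted_error:.3f}")
```

&lt;p&gt;Each round only nudges the prediction by a fraction of the stump's output. That small learning rate is what makes the improvement incremental rather than a single large jump.&lt;/p&gt;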

&lt;h2&gt;
  
  
  What is XGBoost?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://xgboost.ai/" rel="noopener noreferrer"&gt;XGBoost&lt;/a&gt;&lt;/strong&gt; stands for &lt;strong&gt;Extreme Gradient Boosting&lt;/strong&gt;. It’s a powerful library designed to make machine learning tasks faster and more efficient. It’s widely used for solving regression, classification, and ranking problems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Official Definition:
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable. It implements machine learning algorithms under the Gradient Boosting framework."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Story Behind XGBoost: How Tianqi Chen Revolutionized Machine Learning
&lt;/h2&gt;

&lt;h3&gt;
  
  
  A Passion for Machine Learning:
&lt;/h3&gt;

&lt;p&gt;In the early 2010s, &lt;strong&gt;&lt;a href="https://tqchen.com/" rel="noopener noreferrer"&gt;Tianqi Chen&lt;/a&gt;&lt;/strong&gt;, a Ph.D. student at the University of Washington, saw the potential to improve existing tools. While &lt;strong&gt;Gradient Boosting Machines (GBMs)&lt;/strong&gt; were powerful, they were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://ar5iv.labs.arxiv.org/html/1809.04559" rel="noopener noreferrer"&gt;Computationally expensive.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Inefficient on large datasets.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tianqi’s vision? Create a more efficient, scalable, and robust version of gradient boosting.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuze8mpmv89ru5s0dd4so.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuze8mpmv89ru5s0dd4so.jpg" alt="tianqi chen photo" width="800" height="896"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Birth of XGBoost:
&lt;/h3&gt;

&lt;p&gt;Driven by personal frustrations, Tianqi began developing XGBoost as a side project. His innovations included:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Parallelization:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Boosting itself stays sequential, because each tree corrects the errors of the trees before it. What Tianqi parallelized is the work inside each tree: candidate splits across features are evaluated on multiple cores at once, drastically reducing training time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Regularization:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Unlike traditional GBMs, XGBoost included regularization to prevent overfitting by penalizing model complexity, making it more robust.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sparsity-Aware Optimization:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Tianqi designed XGBoost to handle missing or sparse data efficiently, adapting the optimization process to treat missing values as a special case.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hardware Optimization:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
XGBoost was built to leverage both &lt;strong&gt;CPU&lt;/strong&gt; and &lt;strong&gt;GPU&lt;/strong&gt; architectures, ensuring scalability from small academic projects to massive datasets.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
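
&lt;p&gt;For readers who want the math behind the regularization point, the objective given in the XGBoost paper (Chen and Guestrin, 2016) can be written as follows. Here l is the training loss, f_k is the k-th tree, T is the number of leaves in a tree, w is its vector of leaf weights, and gamma and lambda are the penalty strengths.&lt;/p&gt;

```latex
\mathcal{L} = \sum_{i} l(\hat{y}_i, y_i) + \sum_{k} \Omega(f_k),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2} \lambda \lVert w \rVert^2
```

&lt;p&gt;The first sum rewards fitting the data; the second penalizes trees with many leaves or large leaf weights, which is how model complexity is kept in check.&lt;/p&gt;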

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/-9axaxsrexM"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Gaining Popularity: The Rise of XGBoost
&lt;/h2&gt;

&lt;p&gt;Released as an open-source project in 2014, XGBoost initially went unnoticed. But soon, its superior performance and scalability caught the attention of the machine learning community. Data scientists, particularly on platforms like &lt;strong&gt;Kaggle&lt;/strong&gt;, began adopting it for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Faster training times.&lt;/li&gt;
&lt;li&gt;Improved predictive accuracy.&lt;/li&gt;
&lt;li&gt;Handling large datasets with ease.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Its flexibility and features like &lt;strong&gt;early stopping&lt;/strong&gt; and &lt;strong&gt;model evaluation&lt;/strong&gt; further cemented its reputation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why XGBoost Changed the Game
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Key Strengths:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Practical Optimization:&lt;/strong&gt; Tianqi addressed computational inefficiencies, making XGBoost both &lt;strong&gt;fast&lt;/strong&gt; and &lt;strong&gt;scalable&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-World Applicability:&lt;/strong&gt; From business to healthcare, XGBoost powers critical applications.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open-Source Impact:&lt;/strong&gt; Its open-source nature fostered widespread adoption and innovation.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Tianqi Chen: A Legacy in Machine Learning
&lt;/h2&gt;

&lt;p&gt;Today, &lt;strong&gt;Tianqi Chen&lt;/strong&gt; is celebrated as one of the most influential figures in machine learning. His work has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Empowered data scientists worldwide.&lt;/li&gt;
&lt;li&gt;Inspired innovations in optimization and large-scale machine learning.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As of 2024, XGBoost:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Boasts over &lt;strong&gt;26k stars on GitHub&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Features in &lt;strong&gt;30% of Kaggle competition-winning solutions&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Remains a go-to tool across industries like finance, healthcare, e-commerce, and marketing.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Share Your Thoughts!
&lt;/h2&gt;

&lt;p&gt;If you found the story of XGBoost's creation inspiring, share your thoughts in the comments below! Don’t forget to share this article with fellow machine learning enthusiasts.&lt;/p&gt;

&lt;p&gt;Happy Coding ❤️ and don’t forget to Like! &lt;/p&gt;

</description>
    </item>
    <item>
      <title>5 Best Artificial Intelligence Documentaries Everyone Should Watch</title>
      <dc:creator>Ashwin Kumar</dc:creator>
      <pubDate>Sat, 09 Nov 2024 04:44:20 +0000</pubDate>
      <link>https://dev.to/aashwinkumar/weekend-watchlist-must-watch-documentaries-3n62</link>
      <guid>https://dev.to/aashwinkumar/weekend-watchlist-must-watch-documentaries-3n62</guid>
      <description>&lt;p&gt;Hey everyone! Hope you're having an awesome weekend. 😊 If you’ve got some free time and want to watch something that’s both useful and entertaining (and you’re into tech or curious about new technologies), I’ve got some cool documentaries for you on YouTube about AI and technology. Perfect for brushing up on some knowledge that’ll make you sound super smart come Monday. 😎&lt;/p&gt;

&lt;p&gt;Here’s the list:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://youtu.be/WXuK6gekU1Y?si=Ktw4adlSnh5Hmy61" rel="noopener noreferrer"&gt;AlphaGo - The Movie | Full award-winning documentary&lt;/a&gt;  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://youtu.be/Le122vas9aM?si=VeppNwu4SROG3ML1" rel="noopener noreferrer"&gt;AI Supremacy: The artificial intelligence battle between China, USA, and Europe | DW Documentary&lt;/a&gt;  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://youtu.be/-sB12gk9ESA?si=CQbAeAWVY_GFRpIM" rel="noopener noreferrer"&gt;A.I. Revolution | Full Documentary | NOVA | PBS&lt;/a&gt;  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://youtu.be/2kSl0xkq2lM?si=yuKfHWuxpfKDbXaH" rel="noopener noreferrer"&gt;The Turing Lectures: The future of generative AI (ok, technically a lecture but still cool)&lt;/a&gt;  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://youtu.be/R3YFxF0n8n8?si=91C8t4FlutTKyoVz" rel="noopener noreferrer"&gt;The History of Artificial Intelligence [Documentary]&lt;/a&gt;  &lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Bonus (and my personal favorite, but it’s on Netflix): Coded Bias. Definitely worth checking out!&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>ai</category>
      <category>machinelearning</category>
      <category>top7</category>
    </item>
    <item>
      <title>From Beginner to Pro: Important Python Learning Topics You Can't Miss!</title>
      <dc:creator>Ashwin Kumar</dc:creator>
      <pubDate>Wed, 23 Oct 2024 03:05:57 +0000</pubDate>
      <link>https://dev.to/aashwinkumar/from-beginner-to-pro-python-learning-syllabus-you-cant-miss-44m9</link>
      <guid>https://dev.to/aashwinkumar/from-beginner-to-pro-python-learning-syllabus-you-cant-miss-44m9</guid>
      <description>&lt;p&gt;Hey guys! If you’re starting to learn Python, great choice! I found some cool stats about it, and while looking for a good syllabus, I noticed some topics come up a lot. So, I made a beginner-friendly Python syllabus that covers all the key concepts. I hope you like it!&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Introduction to Python&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;What is Python?&lt;/li&gt;
&lt;li&gt;Installing Python&lt;/li&gt;
&lt;li&gt;Running Python scripts&lt;/li&gt;
&lt;li&gt;Python IDEs (Integrated Development Environments)&lt;/li&gt;
&lt;li&gt;Basic Syntax: Comments, Indentation, and Variables&lt;/li&gt;
&lt;li&gt;Python Data Types: Strings, Integers, Floats, Booleans&lt;/li&gt;
&lt;li&gt;Basic Input and Output&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Python's Interactive Mode and REPL&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Using Jupyter Notebooks&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Understanding the Python Shell&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Basic Troubleshooting: Common Errors and Fixes&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. Control Flow&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Conditional Statements: &lt;code&gt;if&lt;/code&gt;, &lt;code&gt;else&lt;/code&gt;, &lt;code&gt;elif&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Comparison and Logical Operators&lt;/li&gt;
&lt;li&gt;Loops:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;for&lt;/code&gt; loops&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;while&lt;/code&gt; loops&lt;/li&gt;
&lt;li&gt;Loop control statements: &lt;code&gt;break&lt;/code&gt;, &lt;code&gt;continue&lt;/code&gt;, &lt;code&gt;pass&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;List and Dictionary Comprehensions&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;Nested Loops&lt;/strong&gt;&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;Using &lt;code&gt;enumerate()&lt;/code&gt; with Loops&lt;/strong&gt;&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;The &lt;code&gt;zip()&lt;/code&gt; Function for Iteration&lt;/strong&gt;&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;Error Handling in Loops&lt;/strong&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. Functions&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Defining Functions with &lt;code&gt;def&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Parameters and Arguments&lt;/li&gt;
&lt;li&gt;Return Values&lt;/li&gt;
&lt;li&gt;Variable Scope: Local vs Global&lt;/li&gt;
&lt;li&gt;Lambda Functions&lt;/li&gt;
&lt;li&gt;Recursion&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Default and Keyword Arguments&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Variable-length Arguments (&lt;code&gt;*args&lt;/code&gt; and &lt;code&gt;**kwargs&lt;/code&gt;)&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Higher-order Functions&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Decorators (basic introduction)&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;4. Data Structures&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Lists:

&lt;ul&gt;
&lt;li&gt;Indexing, Slicing, and Methods (append, insert, remove, etc.)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Tuples:

&lt;ul&gt;
&lt;li&gt;Immutability and Use Cases&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Dictionaries:

&lt;ul&gt;
&lt;li&gt;Key-Value Pairs, Methods (get, keys, values, etc.)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Sets:

&lt;ul&gt;
&lt;li&gt;Set Operations (union, intersection, difference)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;Nested Data Structures&lt;/strong&gt;&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;List vs. Tuple vs. Set vs. Dictionary&lt;/strong&gt;&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;Understanding &lt;code&gt;collections&lt;/code&gt; module: Counter, defaultdict, OrderedDict&lt;/strong&gt;&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;Data Structure Performance Considerations&lt;/strong&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;5. Object-Oriented Programming (OOP)&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Classes and Objects&lt;/li&gt;
&lt;li&gt;Attributes and Methods&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;self&lt;/code&gt; Keyword&lt;/li&gt;
&lt;li&gt;Constructors (&lt;code&gt;__init__&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Inheritance

&lt;ul&gt;
&lt;li&gt;Single and Multiple Inheritance&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Polymorphism&lt;/li&gt;

&lt;li&gt;Encapsulation and Abstraction&lt;/li&gt;

&lt;li&gt;Special Methods: &lt;code&gt;__str__&lt;/code&gt;, &lt;code&gt;__repr__&lt;/code&gt;, &lt;code&gt;__len__&lt;/code&gt;, etc.&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;Class vs. Instance Variables&lt;/strong&gt;&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;Class Methods and Static Methods&lt;/strong&gt;&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;Composition vs. Inheritance&lt;/strong&gt;&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;Abstract Base Classes (ABCs)&lt;/strong&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;6. Error Handling&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Types of Errors: Syntax, Logic, Runtime&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;try&lt;/code&gt;, &lt;code&gt;except&lt;/code&gt;, &lt;code&gt;finally&lt;/code&gt; blocks&lt;/li&gt;
&lt;li&gt;Raising Exceptions with &lt;code&gt;raise&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Custom Exception Classes&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Using &lt;code&gt;assert&lt;/code&gt; for Debugging&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Logging Errors with the &lt;code&gt;logging&lt;/code&gt; Module&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Creating Context Managers for Error Handling&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Best Practices in Error Handling&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;7. File Handling&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Opening Files: &lt;code&gt;open()&lt;/code&gt;, &lt;code&gt;read()&lt;/code&gt;, &lt;code&gt;write()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Reading and Writing to Files&lt;/li&gt;
&lt;li&gt;File Modes (&lt;code&gt;r&lt;/code&gt;, &lt;code&gt;w&lt;/code&gt;, &lt;code&gt;a&lt;/code&gt;, &lt;code&gt;b&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Working with File Paths&lt;/li&gt;
&lt;li&gt;Using &lt;code&gt;with&lt;/code&gt; to Automatically Close Files&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reading and Writing CSV Files&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Working with JSON Files&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;File Iterators&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Handling Large Files with Buffered Reading/Writing&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;8. Modules and Packages&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Importing Modules: &lt;code&gt;import&lt;/code&gt;, &lt;code&gt;from ... import&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Python Standard Library (e.g., &lt;code&gt;math&lt;/code&gt;, &lt;code&gt;random&lt;/code&gt;, &lt;code&gt;datetime&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Creating and Using Custom Modules&lt;/li&gt;
&lt;li&gt;Using Third-Party Packages with &lt;code&gt;pip&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Virtual Environments&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Understanding the &lt;code&gt;__init__.py&lt;/code&gt; file&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Building Your Own Package&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Using &lt;code&gt;requirements.txt&lt;/code&gt; for Dependency Management&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Exploring the &lt;code&gt;sys&lt;/code&gt; and &lt;code&gt;os&lt;/code&gt; Modules&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;9. Working with Libraries&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;NumPy (for array manipulation)&lt;/li&gt;
&lt;li&gt;Pandas (for data analysis and manipulation)&lt;/li&gt;
&lt;li&gt;Matplotlib and Seaborn (for data visualization)&lt;/li&gt;
&lt;li&gt;Requests (for handling HTTP requests)&lt;/li&gt;
&lt;li&gt;JSON Handling&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Using SciPy for Scientific Computing&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Working with SQLAlchemy for Database Interaction&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Web Scraping with Beautiful Soup and Scrapy&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Introduction to TensorFlow and Keras for Machine Learning&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;10. Advanced Topics&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;List and Dictionary Comprehensions (advanced usage)&lt;/li&gt;
&lt;li&gt;Generators and &lt;code&gt;yield&lt;/code&gt; keyword&lt;/li&gt;
&lt;li&gt;Decorators and &lt;code&gt;@decorator_name&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Context Managers&lt;/li&gt;
&lt;li&gt;Regular Expressions (Regex)&lt;/li&gt;
&lt;li&gt;Unit Testing with &lt;code&gt;unittest&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Metaclasses and their Use Cases&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Asynchronous Programming (async/await)&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Threading and Multiprocessing&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Python’s &lt;code&gt;functools&lt;/code&gt; module (e.g., &lt;code&gt;lru_cache&lt;/code&gt;, &lt;code&gt;partial&lt;/code&gt;)&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Descriptors and Property Decorators&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Type Hinting and Annotations&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Advanced Error Handling and Custom Exceptions&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;11. Working with APIs&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;What are APIs?&lt;/li&gt;
&lt;li&gt;Consuming APIs with Python&lt;/li&gt;
&lt;li&gt;Authentication (Basic, OAuth)&lt;/li&gt;
&lt;li&gt;Parsing JSON from APIs&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Using the &lt;code&gt;requests&lt;/code&gt; Library for API Calls&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Working with REST vs. SOAP APIs&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Handling API Rate Limiting&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Creating Your Own API with Flask or FastAPI&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;12. Introduction to Data Science&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Basics of Data Manipulation with Pandas&lt;/li&gt;
&lt;li&gt;Data Visualization with Matplotlib/Seaborn&lt;/li&gt;
&lt;li&gt;Basic Statistics in Python&lt;/li&gt;
&lt;li&gt;Introduction to Machine Learning with Scikit-learn (optional)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Exploratory Data Analysis (EDA)&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Feature Engineering and Selection&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data Cleaning Techniques&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Understanding Overfitting and Underfitting&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;13. Final Project&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Develop a Python project that integrates different concepts:

&lt;ul&gt;
&lt;li&gt;Data Analysis, Web Scraping, or a Simple Game&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;Project Planning and Documentation&lt;/strong&gt;&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;Version Control with Git&lt;/strong&gt;&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;Deployment Options (e.g., Heroku, GitHub Pages)&lt;/strong&gt;&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;Presenting Your Project: Best Practices&lt;/strong&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Resources to Learn Python:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://www.learnpython.org/#google_vignette" rel="noopener noreferrer"&gt;Learn Python Free
&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/learn/python" rel="noopener noreferrer"&gt;Kaggel Course on Python&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.codecademy.com/learn/learn-advanced-python" rel="noopener noreferrer"&gt;CodeAcacdmy Adv Python Course&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.python.org/3/tutorial/index.html" rel="noopener noreferrer"&gt;Official Python DOC&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you have any suggestions or if I missed something, just drop a comment! Happy coding!&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>programming</category>
      <category>python</category>
      <category>career</category>
    </item>
    <item>
      <title>how to handle outliers in machine learning</title>
      <dc:creator>Ashwin Kumar</dc:creator>
      <pubDate>Sun, 13 Oct 2024 17:19:40 +0000</pubDate>
      <link>https://dev.to/aashwinkumar/how-to-handle-outliers-in-machine-learning-3eo2</link>
      <guid>https://dev.to/aashwinkumar/how-to-handle-outliers-in-machine-learning-3eo2</guid>
      <description>&lt;p&gt;Outliers are unusual data points that stand out from the rest of your data because they are either much higher or much lower than the rest. Imagine a classroom where most students score between 50 and 80 marks on a test, but one student scores 5, and another scores 100. These extremely different scores are examples of &lt;strong&gt;outliers&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In real-world data, outliers are common, and how you handle them can significantly impact your results. Let’s break down some techniques for dealing with outliers, with simple examples and coding demos to help you get started.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is an Outlier?
&lt;/h3&gt;

&lt;p&gt;Before we jump into the techniques, let’s define what an outlier is. In simple terms, an outlier is a value in a dataset that’s far away from the average or the majority of the other values. For example, in a class of students where most are 18-22 years old, if someone is 50 years old, they would be considered an outlier.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Deal with Outliers?
&lt;/h3&gt;

&lt;p&gt;Outliers can distort your results, make your analysis less accurate, and lead to wrong conclusions. For instance, imagine you're trying to find the average income of a neighborhood, but a billionaire lives there. Their income would skew the average, giving you a false impression of the neighborhood’s wealth. &lt;/p&gt;
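&lt;p&gt;To make the neighborhood example concrete, here is a minimal sketch using only Python's standard library. The income figures are made up for illustration. It compares the mean and the median with and without one extreme value.&lt;/p&gt;

```python
from statistics import mean, median

# Hypothetical annual incomes for a small neighborhood (made-up numbers).
incomes = [42_000, 48_000, 51_000, 55_000, 60_000]

# The same neighborhood after one billionaire moves in.
with_billionaire = incomes + [1_000_000_000]

print("Mean without outlier:  ", mean(incomes))
print("Mean with outlier:     ", round(mean(with_billionaire)))
print("Median without outlier:", median(incomes))
print("Median with outlier:   ", median(with_billionaire))
```

&lt;p&gt;The mean jumps by several orders of magnitude, while the median barely moves. That is why a single outlier can give a false impression when you rely on averages.&lt;/p&gt;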

&lt;h3&gt;
  
  
  Common Techniques to Deal with Outliers
&lt;/h3&gt;

&lt;p&gt;Let’s explore a few simple and effective techniques to deal with outliers. We'll also include a coding demo to show how to use each technique.&lt;/p&gt;




&lt;h3&gt;
  
  
  1. &lt;strong&gt;Z-Score Method&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it does&lt;/strong&gt;: The Z-score method tells you how far a value is from the mean (average) of your data, measured in standard deviations. A value more than 3 standard deviations away from the mean is commonly treated as an outlier. A &lt;a href="https://z-table.com/" rel="noopener noreferrer"&gt;Z-score table&lt;/a&gt; is a useful reference for seeing how rare a given Z-score is.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use&lt;/strong&gt;: When your data is normally distributed (bell-shaped curve).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fziy3y8fvtx7zx7vo3g5p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fziy3y8fvtx7zx7vo3g5p.png" alt="Z-Score Method" width="640" height="261"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Example:
&lt;/h4&gt;

&lt;p&gt;Imagine you have the heights of 100 people. Most of them are between 150 cm and 180 cm, but one person is 250 cm tall. That person is an outlier.&lt;/p&gt;

&lt;h4&gt;
  
  
  Coding Demo:
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="c1"&gt;# Sample data: heights of people (in cm)
&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Height&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;normal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;170&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)})&lt;/span&gt;

&lt;span class="c1"&gt;# Adding an outlier
&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Height&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;250&lt;/span&gt;  &lt;span class="c1"&gt;# This is the outlier
&lt;/span&gt;
&lt;span class="c1"&gt;# Calculate the Z-scores
&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Z_score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Height&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Height&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Height&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;std&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Identifying outliers (Z-score &amp;gt; 3 or Z-score &amp;lt; -3)
&lt;/span&gt;&lt;span class="n"&gt;outliers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Z_score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Outliers:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outliers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
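&lt;p&gt;Because the demo above draws random heights, its exact numbers change on every run. As a rough, deterministic sketch of the same idea, the snippet below uses a fixed, made-up list of heights and only the standard library. The list and the variable names are assumptions for illustration.&lt;/p&gt;

```python
from statistics import mean, stdev

# Hypothetical heights in cm; the last value is the planted outlier.
heights = [150, 152, 155, 158, 160, 162, 164, 165, 167, 168,
           170, 171, 172, 174, 175, 176, 178, 179, 180, 250]

mu = mean(heights)
sigma = stdev(heights)  # sample standard deviation, like the pandas default .std()

# Keep every value whose absolute Z-score exceeds 3.
flagged = [h for h in heights if abs(h - mu) / sigma > 3]

print("Flagged as outliers:", flagged)
```

&lt;p&gt;Note that the outlier itself inflates both the mean and the standard deviation, so in very small samples a Z-score cutoff of 3 may flag nothing at all.&lt;/p&gt;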



&lt;h3&gt;
  
  
  2. &lt;strong&gt;IQR Method (Interquartile Range)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it does&lt;/strong&gt;: The IQR method calculates the range within which the middle 50% of your data lies. It helps identify outliers by finding values that fall significantly outside this range.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Calculate the first quartile (Q1)&lt;/strong&gt;: The 25th percentile of the data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Calculate the third quartile (Q3)&lt;/strong&gt;: The 75th percentile of the data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Find the IQR&lt;/strong&gt;: Subtract Q1 from Q3. &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;IQR = Q3 - Q1&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzftmfuwi1wyri2x2ztm3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzftmfuwi1wyri2x2ztm3.png" alt="Find the IQR" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="4"&gt;
&lt;li&gt;
&lt;strong&gt;Determine the outlier boundaries&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;Lower Bound: Q1 - 1.5 × IQR&lt;/li&gt;
&lt;li&gt;Upper Bound: Q3 + 1.5 × IQR&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Identify outliers&lt;/strong&gt;: Any data point below the lower bound or above the upper bound is an outlier.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;: In a survey of people’s monthly expenses, if most spend between $500 and $1500 but a few spend over $4000, those high expenses are outliers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coding Demo&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="c1"&gt;# Sample data for monthly expenses
&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Monthly Expenses&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;700&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;800&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Create DataFrame
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Calculate Q1 and Q3
&lt;/span&gt;&lt;span class="n"&gt;Q1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Monthly Expenses&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;quantile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;Q3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Monthly Expenses&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;quantile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.75&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;IQR&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Q3&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;Q1&lt;/span&gt;

&lt;span class="c1"&gt;# Calculate bounds for outliers
&lt;/span&gt;&lt;span class="n"&gt;lower_bound&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Q1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mf"&gt;1.5&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;IQR&lt;/span&gt;
&lt;span class="n"&gt;upper_bound&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Q3&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;1.5&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;IQR&lt;/span&gt;

&lt;span class="c1"&gt;# Identify outliers
&lt;/span&gt;&lt;span class="n"&gt;outliers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Monthly Expenses&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;lower_bound&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Monthly Expenses&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;upper_bound&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Identified Outliers using IQR:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outliers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
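&lt;p&gt;Once the bounds are known, one way to handle the flagged values is simply to drop them. The sketch below does that in plain Python. The helper name &lt;code&gt;iqr_fences&lt;/code&gt; and the expense figures are hypothetical, and dropping is only one option, not necessarily the right one for your data.&lt;/p&gt;

```python
from statistics import quantiles

def iqr_fences(values, k=1.5):
    """Return (lower, upper) bounds; k=1.5 is the conventional multiplier."""
    # method="inclusive" interpolates linearly between data points,
    # which matches the default behaviour of the pandas quantile() method.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical monthly expenses (made-up numbers).
expenses = [520, 610, 700, 830, 950, 1100, 1250, 1480, 4200, 4800]
lower, upper = iqr_fences(expenses)

# Keep values inside the bounds; everything else is treated as an outlier.
kept = [x for x in expenses if upper >= x >= lower]
dropped = [x for x in expenses if x not in kept]

print("Bounds: ", lower, upper)
print("Kept:   ", kept)
print("Dropped:", dropped)
```

&lt;p&gt;Keeping the dropped values in a separate list makes it easy to review them before deciding whether they are errors or valid observations.&lt;/p&gt;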



&lt;h3&gt;
  
  
  3. &lt;strong&gt;Modified Z-Score&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it does&lt;/strong&gt;: The modified Z-score is similar to the Z-score but is more robust against outliers. It uses the median and the median absolute deviation (MAD) to calculate how far a data point is from the median.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Calculate the median of the dataset.&lt;/li&gt;
&lt;li&gt;Compute the absolute deviation from the median for each data point.&lt;/li&gt;
&lt;li&gt;Calculate the median of those absolute deviations (MAD).&lt;/li&gt;
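&lt;li&gt;Calculate the modified Z-score for each data point using the formula below.&lt;/li&gt;
&lt;li&gt;Identify outliers: any data point whose absolute modified Z-score exceeds a chosen threshold is an outlier. The coding demo in this section uses 3; 3.5 is another commonly used cutoff.&lt;/li&gt;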
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyd6wi2amkw9oum6ojuk1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyd6wi2amkw9oum6ojuk1.jpg" alt="modified Z-score" width="506" height="103"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;X&lt;/strong&gt;: This represents the specific data point you are evaluating. It could be any individual observation in your dataset.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Median&lt;/strong&gt;: This is the middle value of your dataset when it is sorted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MAD (Median Absolute Deviation)&lt;/strong&gt;: This is a measure of variability that quantifies how much the values in a dataset deviate from the median.&lt;/p&gt;

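&lt;p&gt;&lt;strong&gt;0.6745&lt;/strong&gt;: This constant is approximately the 75th percentile of the standard normal distribution. Multiplying by it rescales the MAD so that, for normally distributed data, the modified Z-score is on roughly the same scale as the ordinary Z-score.&lt;/p&gt;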

&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;: In a dataset of people's daily step counts, if most walk between 2000 and 10000 steps but a few walk 30000 steps, those high step counts could be outliers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coding Demo&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Sample data for daily steps
&lt;/span&gt;&lt;span class="n"&gt;steps_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Daily Steps&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;7000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;9000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;15000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30000&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Create DataFrame
&lt;/span&gt;&lt;span class="n"&gt;df_steps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;steps_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Calculate median and MAD
&lt;/span&gt;&lt;span class="n"&gt;median&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_steps&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Daily Steps&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;median&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;mad&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;median&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_steps&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Daily Steps&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;median&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# Calculate modified Z-scores
&lt;/span&gt;&lt;span class="n"&gt;df_steps&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Modified Z&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.6745&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_steps&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Daily Steps&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;median&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;mad&lt;/span&gt;

&lt;span class="c1"&gt;# Identify outliers
&lt;/span&gt;&lt;span class="n"&gt;outliers_modified&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_steps&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_steps&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Modified Z&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Identified Outliers using Modified Z-Score:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outliers_modified&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
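&lt;p&gt;One edge case worth guarding against: if more than half of the values are identical, the MAD is 0 and the division in the formula is undefined. The sketch below wraps the calculation in a hypothetical helper, &lt;code&gt;modified_z_scores&lt;/code&gt;, written in plain Python. Raising an error is just one possible way to handle that case.&lt;/p&gt;

```python
from statistics import median

def modified_z_scores(values):
    """Return the modified Z-score of each value, or raise if MAD is zero."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is 0; modified Z-scores are undefined for this data")
    return [0.6745 * (v - med) / mad for v in values]

# Daily step counts with one very high value.
steps = [2000, 3000, 5000, 7000, 9000, 10000, 15000, 30000]
scores = modified_z_scores(steps)
flagged = [s for s, z in zip(steps, scores) if abs(z) > 3]
print("Flagged:", flagged)

# More than half the values are identical here, so MAD is 0.
try:
    modified_z_scores([5, 5, 5, 5, 100])
except ValueError as err:
    print("Skipped:", err)
```

&lt;p&gt;When the MAD is 0, falling back to another method such as the IQR rule is a reasonable alternative.&lt;/p&gt;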



&lt;h3&gt;
  
  
  4. &lt;strong&gt;Box Plot Visualization&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it does&lt;/strong&gt;: A box plot visually displays the distribution of your data, making it easy to spot outliers. The box represents the interquartile range (IQR), and any points outside the “whiskers” (lower and upper bounds) are considered outliers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;: In analyzing the heights of basketball players, you might find that most players fall between 180 cm and 210 cm, but a few exceed 230 cm, clearly visible in a box plot.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fscd82joc108u0su7rap0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fscd82joc108u0su7rap0.png" alt="Box Plot Visualization" width="350" height="367"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coding Demo&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;

&lt;span class="c1"&gt;# Sample data for heights
&lt;/span&gt;&lt;span class="n"&gt;heights&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;180&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;185&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;190&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;195&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;210&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;220&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;230&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;250&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Create box plot
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;boxplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;heights&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Box Plot of Heights&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Height (cm)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5. &lt;strong&gt;Winsorization&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it does&lt;/strong&gt;: Winsorization caps outlier values to reduce their influence without removing them. Values beyond a chosen cutoff (for example, the 95th percentile) are replaced with the cutoff value, or with the nearest non-outlier value.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;: In a dataset of home prices, if one home is listed at $10 million while most are under $1 million, you might replace $10 million with the highest non-outlier price to maintain a realistic range.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coding Demo&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Winsorization example
&lt;/span&gt;&lt;span class="n"&gt;data_prices&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Home Prices&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;150000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;250000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;300000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10000000&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# One extreme outlier
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Create DataFrame
&lt;/span&gt;&lt;span class="n"&gt;df_prices&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_prices&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Winsorization: cap outliers at the 95th percentile
&lt;/span&gt;&lt;span class="n"&gt;cap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_prices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Home Prices&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;quantile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.95&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df_prices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Capped Prices&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_prices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Home Prices&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;cap&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cap&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;df_prices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Home Prices&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Data After Winsorization:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_prices&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
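&lt;p&gt;The other variant described in this section, replacing an extreme value with the highest non-outlier value, can be sketched as follows. This version assumes the IQR rule decides what counts as an outlier and only treats the upper tail. Both choices are assumptions for illustration, not the only way to winsorize.&lt;/p&gt;

```python
from statistics import quantiles

# Hypothetical home prices with one extreme listing.
prices = [150000, 200000, 250000, 300000, 10000000]

# Upper bound from the IQR rule (Q3 + 1.5 * IQR).
q1, _, q3 = quantiles(prices, n=4, method="inclusive")
upper_fence = q3 + 1.5 * (q3 - q1)

# Highest price that the IQR rule does not flag as an outlier.
cap = max(p for p in prices if upper_fence >= p)

# Replace anything above the cap with the cap itself.
capped = [min(p, cap) for p in prices]

print("Cap:   ", cap)
print("Capped:", capped)
```

&lt;p&gt;The row count stays the same, which is the main difference from dropping outliers: the extreme listing is still in the data, just with a less extreme value.&lt;/p&gt;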



&lt;h3&gt;
  
  
  6. &lt;strong&gt;Log Transformation&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it does&lt;/strong&gt;: Log transformation reduces the effect of extreme values by applying a logarithmic scale to the data. This is particularly useful for positively skewed data, and it only works for strictly positive values.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;: In analyzing incomes, where most values cluster in a narrow range and a few are far larger, a log transformation compresses the large values, making the distribution more symmetric and easier to analyze.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj61e8syztmel0i9o3lwy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj61e8syztmel0i9o3lwy.png" alt="log transformation " width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coding Demo&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Sample income data
&lt;/span&gt;&lt;span class="n"&gt;income_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Annual Income&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;20000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;80000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000000&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Includes a large outlier
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Create DataFrame
&lt;/span&gt;&lt;span class="n"&gt;df_income&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;income_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Apply log transformation
&lt;/span&gt;&lt;span class="n"&gt;df_income&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Log Income&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_income&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Annual Income&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Data After Log Transformation:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_income&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Outliers are a natural part of data, but how you handle them can make a big difference in your analysis. By using techniques like Z-score, IQR, modified Z-score, box plots, winsorization, and log transformation, you can effectively manage outliers and improve the accuracy of your insights. Remember, the choice of technique depends on your data's characteristics and the specific context of your analysis.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tips
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Always visualize your data before and after handling outliers to understand their impact.&lt;/li&gt;
&lt;li&gt;Consider the context of your data: sometimes, outliers are valid observations that should be kept for analysis.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Happy Coding ❤️ &lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>beginners</category>
      <category>datascience</category>
      <category>ai</category>
    </item>
    <item>
      <title>CountVectorizer vs TfidfVectorizer</title>
      <dc:creator>Ashwin Kumar</dc:creator>
      <pubDate>Tue, 08 Oct 2024 18:29:00 +0000</pubDate>
      <link>https://dev.to/aashwinkumar/countvectorizer-vs-tfidfvectorizer-1kck</link>
      <guid>https://dev.to/aashwinkumar/countvectorizer-vs-tfidfvectorizer-1kck</guid>
      <description>&lt;p&gt;Imagine you're having a conversation with a friend about your favorite book. You discuss the storyline, memorable quotes, and what made it special. Now, if a machine had to understand this conversation, how would it process your words? Machines can’t comprehend text the way we do. They need text data to be converted into numerical form to perform any kind of analysis or prediction. This process of converting text into numbers is called &lt;strong&gt;&lt;a href="https://www.deepset.ai/blog/what-is-text-vectorization-in-nlp" rel="noopener noreferrer"&gt;text vectorization&lt;/a&gt;&lt;/strong&gt;, and it’s where tools like &lt;code&gt;CountVectorizer&lt;/code&gt; and &lt;code&gt;TfidfVectorizer&lt;/code&gt; come into play.&lt;/p&gt;

&lt;p&gt;But what are they, and how do they work? Let's break it down in the simplest way possible.&lt;/p&gt;




&lt;h3&gt;
  
  
  What is CountVectorizer?
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;CountVectorizer&lt;/code&gt; is like creating a word count table. It takes a collection of text data and converts it into a matrix of token counts. Each row represents a document, and each column represents a unique word (or token). The values in the matrix indicate how many times each word appears in each document.&lt;/p&gt;

&lt;h4&gt;
  
  
  Real Life Example
&lt;/h4&gt;

&lt;p&gt;Suppose you have three sentences:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;"I love coding."&lt;/li&gt;
&lt;li&gt;"Coding is fun."&lt;/li&gt;
&lt;li&gt;"I love learning new things."&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Using &lt;code&gt;CountVectorizer&lt;/code&gt;, the result might look something like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;coding&lt;/th&gt;
&lt;th&gt;fun&lt;/th&gt;
&lt;th&gt;i&lt;/th&gt;
&lt;th&gt;is&lt;/th&gt;
&lt;th&gt;learning&lt;/th&gt;
&lt;th&gt;love&lt;/th&gt;
&lt;th&gt;new&lt;/th&gt;
&lt;th&gt;things&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Doc 1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Doc 2&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Doc 3&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

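&lt;p&gt;Each value is the number of times the word appears in that document. Here every word occurs at most once per sentence, so the table contains only &lt;code&gt;1&lt;/code&gt; (appears once) and &lt;code&gt;0&lt;/code&gt; (absent). This matrix is what &lt;code&gt;CountVectorizer&lt;/code&gt; generates. Note that scikit-learn's default tokenizer drops single-character tokens such as "i", so with default settings the &lt;code&gt;i&lt;/code&gt; column is omitted and the matrix has seven columns.&lt;/p&gt;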
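
&lt;p&gt;As a rough sketch of what happens under the hood, the table can be rebuilt with the standard library alone. The regular expression below is assumed to mirror scikit-learn's documented default &lt;code&gt;token_pattern&lt;/code&gt;, which keeps only tokens of two or more characters, so "i" does not get a column here. The &lt;code&gt;tokenize&lt;/code&gt; helper is hypothetical, and this is not scikit-learn's actual implementation.&lt;/p&gt;

```python
import re
from collections import Counter

# Assumption: this pattern mirrors scikit-learn's documented default
# token_pattern, which only keeps tokens of two or more word characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

documents = [
    "I love coding.",
    "Coding is fun.",
    "I love learning new things.",
]


def tokenize(text):
    """Lowercase the text and extract tokens of 2+ word characters."""
    return TOKEN_PATTERN.findall(text.lower())


# One Counter per document: token -> number of occurrences
counts = [Counter(tokenize(doc)) for doc in documents]

# Vocabulary = alphabetically sorted set of all tokens in the corpus
vocabulary = sorted(set().union(*counts))

# Each row is a document, each column a vocabulary term
count_matrix = [[c[term] for term in vocabulary] for c in counts]

print(vocabulary)
for row in count_matrix:
    print(row)
```

&lt;p&gt;To keep single-character tokens such as "i" in scikit-learn itself, a pattern like &lt;code&gt;token_pattern=r"(?u)\b\w+\b"&lt;/code&gt; can be passed to &lt;code&gt;CountVectorizer&lt;/code&gt;.&lt;/p&gt;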

&lt;h3&gt;
  
  
  What is TfidfVectorizer?
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;TfidfVectorizer&lt;/code&gt; (Term Frequency-Inverse Document Frequency) builds on &lt;code&gt;CountVectorizer&lt;/code&gt;. While &lt;code&gt;CountVectorizer&lt;/code&gt; only counts words, &lt;code&gt;TfidfVectorizer&lt;/code&gt; also weighs how informative each word is across all documents. It assigns more weight to words that appear frequently in one document but rarely in the others, so common words like “the” are down-weighted relative to meaningful terms.&lt;/p&gt;

&lt;p&gt;Using the same sentences as above, the matrix generated by &lt;code&gt;TfidfVectorizer&lt;/code&gt; will contain &lt;strong&gt;decimal values&lt;/strong&gt; instead of just counts, representing the importance of each word in a given document.&lt;/p&gt;
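
&lt;p&gt;To make those decimal values less mysterious, here is a minimal sketch that computes them by hand. It assumes the smoothed formula that scikit-learn documents for its defaults, &lt;code&gt;idf(t) = ln((1 + n) / (1 + df(t))) + 1&lt;/code&gt;, followed by L2 normalization of each row. The pre-tokenized lists and the &lt;code&gt;tfidf_row&lt;/code&gt; helper are illustrative assumptions, not library code.&lt;/p&gt;

```python
import math
from collections import Counter

# Tokenized versions of the three example sentences. The single-letter
# token "i" is left out, as scikit-learn's default tokenizer would drop it.
docs = [
    ["love", "coding"],
    ["coding", "is", "fun"],
    ["love", "learning", "new", "things"],
]

n_docs = len(docs)
vocabulary = sorted({term for doc in docs for term in doc})

# Document frequency: in how many documents does each term occur?
df = Counter(term for doc in docs for term in set(doc))

# Smoothed inverse document frequency (assumed formula, see the text above)
idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocabulary}


def tfidf_row(doc):
    """Return the L2-normalized TF-IDF weights for one document."""
    tf = Counter(doc)
    raw = [tf[t] * idf[t] for t in vocabulary]
    norm = math.sqrt(sum(w * w for w in raw))
    return [w / norm for w in raw]


tfidf_matrix = [tfidf_row(doc) for doc in docs]

print(vocabulary)
for row in tfidf_matrix:
    print([round(w, 4) for w in row])
```

&lt;p&gt;In the second sentence, "coding" gets a lower weight than "fun" or "is" because it also appears in the first sentence, which is exactly the down-weighting described above.&lt;/p&gt;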




&lt;h3&gt;
  
  
  Why Do We Need Vectorization?
&lt;/h3&gt;

&lt;p&gt;Vectorization is needed because machine learning models work with numbers, not text. To analyze, classify, or make predictions based on text data, the text must first be transformed into a numerical form that these models can process. This transformation enables models to find patterns, similarities, and even meaning in the text.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to Use &lt;code&gt;CountVectorizer&lt;/code&gt; and &lt;code&gt;TfidfVectorizer&lt;/code&gt;?
&lt;/h3&gt;

&lt;p&gt;Using these tools in Python is straightforward with the &lt;a href="https://scikit-learn.org/1.5/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html" rel="noopener noreferrer"&gt;scikit-learn&lt;/a&gt; library. Here’s a quick example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.feature_extraction.text&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CountVectorizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TfidfVectorizer&lt;/span&gt;

&lt;span class="c1"&gt;# Sample documents
&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I love coding.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Coding is fun.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I love learning new things.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Using CountVectorizer
&lt;/span&gt;&lt;span class="n"&gt;count_vectorizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CountVectorizer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;count_matrix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;count_vectorizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Count Vectorizer Result:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;count_matrix&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toarray&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="c1"&gt;# Using TfidfVectorizer
&lt;/span&gt;&lt;span class="n"&gt;tfidf_vectorizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TfidfVectorizer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;tfidf_matrix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tfidf_vectorizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TF IDF Vectorizer Result:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tfidf_matrix&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toarray&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;


&lt;span class="c1"&gt;# Output:&lt;/span&gt;
&lt;span class="c1"&gt;#&lt;/span&gt;
&lt;span class="c1"&gt;# Count Vectorizer Result:&lt;/span&gt;
&lt;span class="c1"&gt;#  [[1 0 0 0 1 0 0]&lt;/span&gt;
&lt;span class="c1"&gt;#  [1 1 1 0 0 0 0]&lt;/span&gt;
&lt;span class="c1"&gt;#  [0 0 0 1 1 1 1]]&lt;/span&gt;
&lt;span class="c1"&gt;#&lt;/span&gt;
&lt;span class="c1"&gt;# TF IDF Vectorizer Result:&lt;/span&gt;
&lt;span class="c1"&gt;#  [[0.70710678 0. 0. 0. 0.70710678 0. 0.]&lt;/span&gt;
&lt;span class="c1"&gt;#  [0.4736296  0.62276601 0.62276601 0. 0. 0. 0.]&lt;/span&gt;
&lt;span class="c1"&gt;#  [0. 0. 0. 0.52863461 0.40204024 0.52863461 0.52863461]]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Which Vectorizer is Better?
&lt;/h3&gt;

&lt;p&gt;It depends on the task at hand. Here’s a comparison to make it clearer:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;&lt;code&gt;CountVectorizer&lt;/code&gt;&lt;/th&gt;
&lt;th&gt;&lt;code&gt;TfidfVectorizer&lt;/code&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Output&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Count matrix&lt;/td&gt;
&lt;td&gt;Weighted matrix (importance of terms)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Suitability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Good for simple word count&lt;/td&gt;
&lt;td&gt;Better for distinguishing between terms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Impact of Frequent Words&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Overly influenced by common words like "the", "is"&lt;/td&gt;
&lt;td&gt;Reduces the weight of frequent words&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Use Case&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;When word frequency matters (e.g., spam detection)&lt;/td&gt;
&lt;td&gt;When meaning and relevance matter more&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Drawbacks of &lt;code&gt;CountVectorizer&lt;/code&gt; and &lt;code&gt;TfidfVectorizer&lt;/code&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;CountVectorizer&lt;/code&gt;&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ignores word order and context.&lt;/li&gt;
&lt;li&gt;Produces high-dimensional, sparse output for large vocabularies.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;TfidfVectorizer&lt;/code&gt;&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Loses some contextual information.&lt;/li&gt;
&lt;li&gt;Not ideal when the order of words is critical (e.g., for certain NLP tasks like sentiment analysis).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
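
&lt;p&gt;The loss of word order is easy to demonstrate. In this small sketch, the hypothetical helper &lt;code&gt;bag_of_words&lt;/code&gt; and the two sentences are made up for illustration. The sentences mean opposite things, yet their word counts are identical, so any count-based or TF-IDF representation built from them would be identical too.&lt;/p&gt;

```python
from collections import Counter


def bag_of_words(text):
    """Count lowercase whitespace-separated tokens, ignoring their order."""
    return Counter(text.lower().split())


a = bag_of_words("dog bites man")
b = bag_of_words("man bites dog")

# Same words, same counts: the two representations are indistinguishable
print(a == b)  # True
```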

&lt;h3&gt;
  
  
  What Is &lt;code&gt;max_features&lt;/code&gt; in &lt;code&gt;CountVectorizer&lt;/code&gt;?
&lt;/h3&gt;

&lt;p&gt;The number of features (columns) in &lt;code&gt;CountVectorizer&lt;/code&gt; corresponds to the number of unique tokens (words) in the corpus. This can be limited using the &lt;code&gt;max_features&lt;/code&gt; parameter. For example, setting &lt;code&gt;max_features=100&lt;/code&gt; will keep only the 100 most frequent words.&lt;/p&gt;
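
&lt;p&gt;The idea behind &lt;code&gt;max_features&lt;/code&gt; can be sketched in a few lines of plain Python: count every term across the whole corpus and keep only the most frequent ones. The &lt;code&gt;top_terms&lt;/code&gt; helper, the whitespace tokenization, and the toy corpus are assumptions for illustration, not scikit-learn's implementation. The corpus is chosen so that there are no ties at the cutoff.&lt;/p&gt;

```python
from collections import Counter

corpus = [
    "the cat sat on the mat",
    "the cat chased the dog",
    "the cat sat",
]


def top_terms(texts, max_features):
    """Return the max_features most frequent terms, sorted alphabetically."""
    totals = Counter(token for text in texts for token in text.split())
    most_common = [term for term, _ in totals.most_common(max_features)]
    return sorted(most_common)


# "the" occurs 5 times, "cat" 3 times, "sat" twice, everything else once
vocabulary = top_terms(corpus, max_features=3)
print(vocabulary)
```

&lt;p&gt;Every term outside this reduced vocabulary is simply ignored when the matrix is built, which keeps the number of columns fixed at &lt;code&gt;max_features&lt;/code&gt;.&lt;/p&gt;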

&lt;h3&gt;
  
  
  Using and Reversing the Vectorization Process
&lt;/h3&gt;

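&lt;p&gt;To convert text into vectors, use &lt;code&gt;fit_transform()&lt;/code&gt; as shown in the example above. To go the other way, use the &lt;code&gt;inverse_transform()&lt;/code&gt; method. It returns, for each document, the terms that have non-zero entries in the matrix. Word order, repetition, punctuation, and casing are not recovered, so the result is a collection of terms rather than the original sentence:&lt;br&gt;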
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.feature_extraction.text&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CountVectorizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TfidfVectorizer&lt;/span&gt;

&lt;span class="c1"&gt;# Sample text data
&lt;/span&gt;&lt;span class="n"&gt;corpus&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The cat sat on the mat.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The dog is in the house.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize both vectorizers
&lt;/span&gt;&lt;span class="n"&gt;count_vectorizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CountVectorizer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;tfidf_vectorizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TfidfVectorizer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Fit and transform the data
&lt;/span&gt;&lt;span class="n"&gt;count_matrix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;count_vectorizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tfidf_matrix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tfidf_vectorizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Display the vectorized representation
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CountVectorizer Matrix:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;count_matrix&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toarray&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TfidfVectorizer Matrix:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tfidf_matrix&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toarray&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="c1"&gt;# Inverse transformation: recover the terms present in each document (not the original text)
&lt;/span&gt;&lt;span class="n"&gt;count_reversed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;count_vectorizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;inverse_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;count_matrix&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tfidf_reversed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tfidf_vectorizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;inverse_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tfidf_matrix&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Display the reversed text
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Reversed Text from CountVectorizer:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;count_reversed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Reversed Text from TfidfVectorizer:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tfidf_reversed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Additional Tools and Techniques
&lt;/h3&gt;

&lt;p&gt;Apart from these vectorizers, there are other methods, such as &lt;code&gt;HashingVectorizer&lt;/code&gt; or pre-trained embeddings like Word2Vec, GloVe, and BERT, that are worth considering for more advanced use cases.&lt;/p&gt;

&lt;h3&gt;
  
  
  Final Thoughts
&lt;/h3&gt;

&lt;p&gt;Choosing between &lt;code&gt;CountVectorizer&lt;/code&gt; and &lt;code&gt;TfidfVectorizer&lt;/code&gt; depends on the nature of the problem and the text data at hand. For beginners, starting with these simple vectorizers is a great way to understand how text data can be transformed into numbers and used in machine learning models. To learn more, see the &lt;a href="https://scikit-learn.org/1.5/api/sklearn.feature_extraction.html" rel="noopener noreferrer"&gt;scikit-learn documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Hey! I hope this helps you understand the concept better. It's completely normal to feel demotivated when you don't grasp something right away. Remember, studying in this field takes time and practice, so try not to lose your motivation. You’ve got this! If you found this helpful, please give it a like. It would really encourage me to create more content like this!&lt;/p&gt;

&lt;p&gt;Happy Coding ❤️&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzjln1bfsfmg23gi8k8as.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzjln1bfsfmg23gi8k8as.jpg" alt="You just got vectored!" width="680" height="680"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>data</category>
      <category>datascience</category>
      <category>beginners</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Understanding the Curse of Dimensionality</title>
      <dc:creator>Ashwin Kumar</dc:creator>
      <pubDate>Mon, 07 Oct 2024 17:18:11 +0000</pubDate>
      <link>https://dev.to/aashwinkumar/understanding-the-curse-of-dimensionality-56po</link>
      <guid>https://dev.to/aashwinkumar/understanding-the-curse-of-dimensionality-56po</guid>
      <description>&lt;p&gt;The "curse of dimensionality" is a term used in data science and statistics to describe various phenomena that arise when analyzing and organizing data in high dimensional spaces. This concept is crucial for understanding the challenges faced in machine learning, data analysis, and related fields. Let’s break it down in simple terms.&lt;/p&gt;




&lt;h3&gt;
  
  
  What Is the Curse of Dimensionality?
&lt;/h3&gt;

&lt;p&gt;At its core, the curse of dimensionality refers to the problems that occur when we work with data that has many features or dimensions. &lt;em&gt;Imagine you’re trying to find your way in a very large room filled with furniture. The more furniture (dimensions) there is, the harder it is to navigate without bumping into something.&lt;/em&gt; Similarly, in data analysis, as the number of dimensions increases, our ability to find patterns and make predictions can diminish.&lt;/p&gt;




&lt;h3&gt;
  
  
  Why Do We Even Bother?
&lt;/h3&gt;

&lt;p&gt;We care about dimensionality because many real-world problems involve high-dimensional data. For instance, when we analyze images, each pixel can be considered a dimension. A simple 100x100 pixel grayscale image has 10,000 dimensions! Similarly, in genetics, each gene can represent a dimension, leading to a vast number of features when studying traits or diseases.&lt;/p&gt;




&lt;p&gt;Understanding the curse of dimensionality helps data scientists develop better algorithms and improve the accuracy of their predictions.&lt;/p&gt;




&lt;h3&gt;
  
  
  What Is High Dimension?
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://deepai.org/machine-learning-glossary-and-terms/high-dimensional-data" rel="noopener noreferrer"&gt;High dimensionality&lt;/a&gt; refers to data that has many features or variables.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In the context of data analysis:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Low dimensional data&lt;/strong&gt; could be something like a simple dataset with only 2 or 3 features (like height and weight).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;High dimensional data&lt;/strong&gt; could have hundreds or thousands of features (like an image's pixel values or customer preferences across hundreds of products).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;In general, anything with more than three dimensions can be considered "high dimensional," and data can easily reach dozens or hundreds of dimensions.&lt;/em&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  What Happens When We Have High Dimensions?
&lt;/h3&gt;

&lt;p&gt;When we deal with high dimensional data, several issues arise:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Distance Becomes Less Meaningful&lt;/strong&gt;: In low dimensions, it's easy to tell how close two points are. In high dimensions, points tend to become almost equally far from one another, so the nearest neighbor is barely closer than the farthest one and "nearby" loses its meaning. For example, if you're looking for friends at a party, it's easier to spot them in a small room than in a huge hall.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sparsity of Data&lt;/strong&gt;: As dimensions increase, the volume of the space grows exponentially. For example, if each dimension is divided into 10 intervals, one dimension has 10 cells, but 10 dimensions have 10&lt;sup&gt;10&lt;/sup&gt; (ten billion) cells. With a fixed number of data points, the data becomes sparse and less clustered, making it harder to find patterns or group similar items.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Overfitting&lt;/strong&gt;: With many dimensions, models can become overly complex, fitting the noise in the data rather than the underlying trend. This can lead to poor predictions on new, unseen data.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
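
&lt;p&gt;To see how quickly the space outgrows the data, here is a small back-of-the-envelope sketch. It splits every axis into 10 intervals and counts the resulting cells. The helper &lt;code&gt;cells_in_grid&lt;/code&gt; and the dataset size of one million points are hypothetical, chosen only for illustration.&lt;/p&gt;

```python
def cells_in_grid(bins_per_axis, n_dims):
    """Number of equal-sized cells when every axis is split into bins."""
    return bins_per_axis ** n_dims


n_points = 1_000_000  # hypothetical dataset size

for n_dims in (1, 2, 3, 10):
    cells = cells_in_grid(10, n_dims)
    # Average number of points per cell if the data were spread evenly
    density = n_points / cells
    print(f"{n_dims:2d} dims: {cells:,} cells, {density:g} points per cell")
```

&lt;p&gt;Even a million points leave almost every cell empty once there are 10 dimensions, which is what "sparse" means here.&lt;/p&gt;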




&lt;h3&gt;
  
  
  How Do We Know This Is the Curse of Dimensionality?
&lt;/h3&gt;

&lt;p&gt;We can identify the curse of dimensionality through various observations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Experiments with Distance&lt;/strong&gt;: Studies show that as dimensions increase, the distance between points becomes less variable. This means that nearest neighbors are not significantly closer than farthest neighbors, which contradicts our intuitive understanding of proximity.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Performance of Algorithms&lt;/strong&gt;: Many machine learning algorithms, like k-nearest neighbors or clustering methods, perform well in low dimensions but struggle in high dimensions. This drop in performance is a clear indicator of the curse.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Visualizations&lt;/strong&gt;: While we cannot visualize more than three dimensions directly, we can use techniques like &lt;a href="https://en.wikipedia.org/wiki/Principal_component_analysis" rel="noopener noreferrer"&gt;Principal Component Analysis&lt;/a&gt; to reduce dimensions and visualize how data behaves in lower dimensional space.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
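
&lt;p&gt;The distance experiment is simple enough to try at home. The sketch below draws random points uniformly from a unit hypercube and measures how much farther the farthest point is than the nearest one, relative to the nearest. The &lt;code&gt;relative_contrast&lt;/code&gt; helper, the sample size, and the uniform distribution are assumptions made for this illustration. Real datasets can behave differently.&lt;/p&gt;

```python
import math
import random


def relative_contrast(n_dims, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from one random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        distances.append(math.dist(query, point))
    nearest, farthest = min(distances), max(distances)
    return (farthest - nearest) / nearest


# The contrast shrinks as the number of dimensions grows
for n_dims in (2, 10, 100, 1000):
    print(n_dims, round(relative_contrast(n_dims), 3))
```

&lt;p&gt;A large value means the nearest neighbor is much closer than the farthest one. A value near zero means all points are roughly the same distance away, which is the situation that hurts distance-based methods.&lt;/p&gt;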




&lt;h3&gt;
  
  
  Is There Any Way to Mitigate the Curse of Dimensionality?
&lt;/h3&gt;

&lt;p&gt;Fortunately, there are several strategies to address the curse of dimensionality:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Dimensionality Reduction&lt;/strong&gt;: Techniques like PCA, t-SNE, and UMAP can help reduce the number of features while preserving essential information. This simplification allows algorithms to perform better.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Feature Selection&lt;/strong&gt;: Identifying and retaining only the most relevant features can reduce dimensionality. This involves analyzing the data to find which features contribute most to the desired outcome.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Using Appropriate Algorithms&lt;/strong&gt;: Some algorithms are more robust to high dimensions. For instance, tree-based methods like random forests or gradient boosting often handle high-dimensional data better than distance-based methods like k-nearest neighbors, because each split looks at one feature at a time instead of relying on distances across all features.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
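
&lt;p&gt;As one possible illustration of feature selection, the sketch below ranks features by the absolute Pearson correlation with the target and keeps the top &lt;code&gt;k&lt;/code&gt;. The helpers &lt;code&gt;pearson&lt;/code&gt; and &lt;code&gt;select_top_k&lt;/code&gt; and the toy data are made up for this example. Correlation only captures linear relationships, so treat this as a starting point rather than a complete method.&lt;/p&gt;

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def select_top_k(features, target, k):
    """Return the names of the k features most correlated with the target."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data (made up for illustration): "noise" is unrelated to the target
features = {
    "size": [50, 60, 80, 100, 120],
    "rooms": [1, 2, 2, 3, 4],
    "noise": [7, 1, 9, 3, 5],
}
target = [150, 180, 240, 310, 360]  # e.g. a price

print(select_top_k(features, target, k=2))
```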




&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;The curse of dimensionality presents significant challenges in data analysis and machine learning, especially when working with high dimensional data. By understanding what it is and how it impacts our ability to find meaningful patterns, we can take steps to mitigate its effects. Whether through dimensionality reduction, feature selection, or choosing appropriate algorithms, there are ways to make sense of complex data without getting lost in the high dimensional maze.&lt;/p&gt;

&lt;p&gt;If you think this could help someone you know, please share it with your friends!&lt;/p&gt;

&lt;p&gt;Happy Coding ❤️&lt;/p&gt;




</description>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>deeplearning</category>
      <category>analytics</category>
    </item>
    <item>
      <title>Understand Normal Distribution</title>
      <dc:creator>Ashwin Kumar</dc:creator>
      <pubDate>Mon, 07 Oct 2024 16:54:35 +0000</pubDate>
      <link>https://dev.to/aashwinkumar/unpublished-video-a67-4f87</link>
      <guid>https://dev.to/aashwinkumar/unpublished-video-a67-4f87</guid>
      <description></description>
    </item>
    <item>
      <title>Best Free Resources to Sharpen Your Math Skills for Machine Learning!</title>
      <dc:creator>Ashwin Kumar</dc:creator>
      <pubDate>Fri, 04 Oct 2024 17:24:21 +0000</pubDate>
      <link>https://dev.to/aashwinkumar/best-free-resources-to-sharpen-your-math-skills-for-machine-learning-1nkk</link>
      <guid>https://dev.to/aashwinkumar/best-free-resources-to-sharpen-your-math-skills-for-machine-learning-1nkk</guid>
      <description>&lt;p&gt;Hey Guys👋&lt;/p&gt;

&lt;p&gt;I’ve compiled a list of free, high quality resources to help you sharpen your math skills and gain confidence tackling ML algorithms. Check them out:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;YouTube Courses&lt;/strong&gt; 🎥
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.youtube.com/@ProfessorLeonard" rel="noopener noreferrer"&gt;Professor Leonard&lt;/a&gt;&lt;/strong&gt; – Clear and detailed explanations of Algebra, Calculus, and Statistics. Perfect for mastering the basics. 📚
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.youtube.com/@3blue1brown" rel="noopener noreferrer"&gt;3Blue1Brown&lt;/a&gt;&lt;/strong&gt; – Beautiful visual animations simplify even the most complex math concepts. 🎨
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.youtube.com/watch?v=0z6AhrOSrRs&amp;amp;t=5555s" rel="noopener noreferrer"&gt;Mathematics for Machine Learning (3 Courses in 1)&lt;/a&gt;&lt;/strong&gt; – A comprehensive deep dive into linear algebra, calculus, and probability. 🔢
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.youtube.com/watch?v=i7vOAcUo5iA&amp;amp;list=PLWKjhJtqVAbl5SlE6aBHzUVZ1e6q1Wz0v" rel="noopener noreferrer"&gt;College Algebra with Python Code&lt;/a&gt;&lt;/strong&gt; – Learn college algebra concepts with real Python coding examples. 📈
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.youtube.com/watch?v=b7NnMZPNIXA" rel="noopener noreferrer"&gt;Mathematics of Neural Networks&lt;/a&gt;&lt;/strong&gt; – Understand the mathematical core of neural networks, including matrix multiplication and optimization. 🧠
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://youtu.be/HfACrKJ_Y2w?si=ggcV49c1Yoczpeg9" rel="noopener noreferrer"&gt;Calculus 1 – Full College Course&lt;/a&gt;&lt;/strong&gt; – Ideal for mastering calculus, essential for gradient descent and other optimization techniques. 🔄
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://youtu.be/xxpc-HPKN28?si=t22i49Fmih-Y4TGo" rel="noopener noreferrer"&gt;Statistics - A Full University Course on Data Science Basics&lt;/a&gt;&lt;/strong&gt; – A detailed university-level course covering statistics for data science. 📊
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://youtu.be/sbbYntt5CJk?si=tA9_wGCMjBjTRCpD" rel="noopener noreferrer"&gt;Statistics and Probability Full Course&lt;/a&gt;&lt;/strong&gt; – Comprehensive guide to statistics and probability for data science enthusiasts. 🎲
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://youtu.be/74oUwKezFho?si=9ijYeJMbGjqHHJ7X" rel="noopener noreferrer"&gt;Statistics Full Course for Beginners&lt;/a&gt;&lt;/strong&gt; – A beginner-friendly course perfect for those starting out in data science. 👨‍🏫&lt;/li&gt;
&lt;/ol&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Free Books&lt;/strong&gt; 📚
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://mml-book.github.io/" rel="noopener noreferrer"&gt;Mathematics for Machine Learning (Free PDF)&lt;/a&gt;&lt;/strong&gt; – A fantastic, in-depth resource for anyone serious about learning the math behind machine learning. This book covers linear algebra, calculus, and more!&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://greenteapress.com/wp/think-stats/" rel="noopener noreferrer"&gt;Think Stats (Free Download)&lt;/a&gt;&lt;/strong&gt; – An introduction to probability and statistics for data scientists.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://linear.axler.net/LADR4e.pdf" rel="noopener noreferrer"&gt;Linear Algebra Done Right (Free)&lt;/a&gt;&lt;/strong&gt; – A clear and approachable book for learning the essentials of linear algebra.&lt;/li&gt;
&lt;/ol&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Blogs&lt;/strong&gt; 📝
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.kdnuggets.com/2018/12/introduction-statistics-data-science.html" rel="noopener noreferrer"&gt;Introduction to Statistics for Data Science&lt;/a&gt;&lt;/strong&gt; – A fantastic primer for understanding how statistics fits into the data science field.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://machinelearningmastery.com/mathematics-for-machine-learning/" rel="noopener noreferrer"&gt;The Mathematics of Machine Learning&lt;/a&gt;&lt;/strong&gt; – A step-by-step guide that breaks down essential math concepts needed for ML.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://towardsdatascience.com/top-10-math-skills-for-machine-learning-3d3c727dd5f4" rel="noopener noreferrer"&gt;Top 10 Math Skills for Machine Learning&lt;/a&gt;&lt;/strong&gt; – This article explains the key math skills every data scientist needs.&lt;/li&gt;
&lt;/ol&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Free Courses&lt;/strong&gt; 💻
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://pll.harvard.edu/subject/mathematics/free" rel="noopener noreferrer"&gt;Harvard Free Mathematics Courses&lt;/a&gt;&lt;/strong&gt; – A collection of free math courses from Harvard, covering a wide range of topics from algebra to advanced calculus.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ocw.mit.edu/courses/mathematics/" rel="noopener noreferrer"&gt;MIT Mathematics Courses&lt;/a&gt;&lt;/strong&gt; – Dive into free courses from MIT’s OpenCourseWare on subjects like linear algebra, differential equations, and more.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://alison.com/tag/mathematics" rel="noopener noreferrer"&gt;Alison Free Online Mathematics Courses&lt;/a&gt;&lt;/strong&gt; – A variety of free courses covering different areas of mathematics, including statistics and calculus.&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;Feel free to share any other great resources you’ve come across in the comments. Happy learning! 🎉&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>ai</category>
      <category>deeplearning</category>
    </item>
    <item>
      <title>You Can Learn 🐍 Python Effectively !</title>
      <dc:creator>Ashwin Kumar</dc:creator>
      <pubDate>Fri, 04 Oct 2024 05:10:20 +0000</pubDate>
      <link>https://dev.to/aashwinkumar/you-can-learn-python-effectively--3ma2</link>
      <guid>https://dev.to/aashwinkumar/you-can-learn-python-effectively--3ma2</guid>
      <description>&lt;p&gt;So, you’ve decided to dive into Python programming. Great choice! Python is not only one of the most popular programming languages today, but it’s also known for its simplicity and readability. However, many beginners find themselves stuck when they start learning. Where should you begin? What should you focus on first? Don’t worry! This guide will break it down for you step by step in an easy-to-understand way.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Python?
&lt;/h3&gt;

&lt;p&gt;Python is widely used in web development, data science, machine learning, automation, and even game development. Its clear syntax makes it an ideal choice for beginners. But even though it’s easy to get started, the sea of tutorials, topics, and resources can feel overwhelming.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 1: Master the Basics&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Before diving into advanced topics or complicated projects, it’s crucial to build a solid foundation. Here’s how you can get started:&lt;/p&gt;

&lt;h4&gt;
  
  
  1. &lt;strong&gt;Understand Python’s Basic Syntax and Data Types&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Every programming language has its building blocks. In Python, those are:&lt;br&gt;
&lt;strong&gt;Variables&lt;/strong&gt;: Think of them as containers that hold data. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;  &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;John&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;   &lt;span class="c1"&gt;# This is a string variable
&lt;/span&gt;  &lt;span class="n"&gt;age&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;25&lt;/span&gt;        &lt;span class="c1"&gt;# This is an integer variable
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Data Types&lt;/strong&gt;: Python supports several built-in data types like strings, integers, floats, lists, and dictionaries. Here’s an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;  &lt;span class="n"&gt;fruits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Apple&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Banana&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cherry&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# This is a list
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
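&lt;p&gt;The list above is only one of the built-in types. As a small sketch, the snippet below creates a float, a boolean, and a dictionary, then asks Python which type each value has. The variable names (&lt;code&gt;price&lt;/code&gt;, &lt;code&gt;in_stock&lt;/code&gt;, &lt;code&gt;person&lt;/code&gt;) are made up for illustration.&lt;/p&gt;

```python
price = 4.99                            # float: a number with a decimal point
in_stock = True                         # bool: either True or False
person = {"name": "John", "age": 25}    # dict: key-value pairs

# type() reports which built-in type a value has.
values = ["John", 25, price, in_stock, ["Apple", "Banana"], person]
type_names = [type(value).__name__ for value in values]
print(type_names)  # prints: ['str', 'int', 'float', 'bool', 'list', 'dict']
```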



&lt;p&gt;Getting comfortable with these basics will help you handle more complex topics in the future.&lt;/p&gt;

&lt;h4&gt;
  
  
  2. &lt;strong&gt;Control Flow: Decision Making with Conditions and Loops&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Control flow statements guide the program’s execution. Start with &lt;code&gt;if&lt;/code&gt;, &lt;code&gt;elif&lt;/code&gt;, and &lt;code&gt;else&lt;/code&gt; statements to make decisions. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;temperature&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;35&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;It&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s hot outside!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;It&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s a pleasant day.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;It&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s quite cold today.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



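&lt;p&gt;Similarly, loops help you automate repetitive tasks. Use a &lt;code&gt;for&lt;/code&gt; loop to iterate over the items in a collection, and a &lt;code&gt;while&lt;/code&gt; loop to repeat a block for as long as a condition holds. Here is a &lt;code&gt;for&lt;/code&gt; loop over the list from earlier:&lt;br&gt;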
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;fruit&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fruits&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fruit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# This will print each fruit in the list.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
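&lt;p&gt;A &lt;code&gt;while&lt;/code&gt; loop works a little differently: it keeps running as long as its condition is true. Here is a minimal sketch, assuming a made-up list of tasks that we work through until none are left.&lt;/p&gt;

```python
# Hypothetical to-do items, used only to illustrate a while loop.
tasks = ["write code", "run tests", "ship it"]
completed = []

# An empty list counts as False, so the loop stops once every task is removed.
while tasks:
    current = tasks.pop(0)      # take the first remaining task
    completed.append(current)
    print(f"Done: {current}")
```

&lt;p&gt;Reach for &lt;code&gt;for&lt;/code&gt; when you already know what you are looping over, and for &lt;code&gt;while&lt;/code&gt; when you only know the condition that should end the loop.&lt;/p&gt;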



&lt;p&gt;These statements allow you to control the flow of your program and are fundamental to problem-solving in programming.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 2: Practice with Functions&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Functions help break down your code into smaller, reusable blocks. Creating functions not only makes your code more readable but also allows you to perform the same action multiple times without rewriting it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;greet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello, &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;greet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Alice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# This will print: Hello, Alice!
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start by defining simple functions, then gradually add parameters and return values.&lt;/p&gt;
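&lt;p&gt;As a next step, here is a sketch of a function that takes two parameters, gives one of them a default value, and returns a result instead of printing it. The name &lt;code&gt;rectangle_area&lt;/code&gt; is just an illustrative choice.&lt;/p&gt;

```python
def rectangle_area(width, height=1):
    """Return the area of a rectangle; height defaults to 1 when omitted."""
    return width * height

area = rectangle_area(4, 5)   # both arguments supplied
strip = rectangle_area(7)     # height falls back to its default of 1
print(area, strip)            # prints: 20 7
```

&lt;p&gt;Returning a value lets the caller decide what to do with it, such as storing it, printing it, or passing it to another function.&lt;/p&gt;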

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 3: Explore Modules and Libraries&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Python comes with a wide range of built-in modules, and the community has created countless libraries that you can use for free. A module is a file that contains Python code you can reuse.&lt;/p&gt;

&lt;p&gt;For instance, the &lt;code&gt;math&lt;/code&gt; module provides mathematical functions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# This will print 4.0
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Explore libraries like &lt;strong&gt;NumPy&lt;/strong&gt; for numerical computations and &lt;strong&gt;Pandas&lt;/strong&gt; for data analysis when you’re ready.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 4: Get Comfortable with Object-Oriented Programming (OOP)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;OOP helps you model real-world scenarios using classes and objects. It may sound intimidating, but here’s a simple analogy: think of a class as a blueprint and an object as the actual thing built using that blueprint.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Dog&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;breed&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;breed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;breed&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;bark&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; is barking!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;my_dog&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Dog&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Buddy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Golden Retriever&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;my_dog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bark&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Output: Buddy is barking!
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Understanding classes, objects, and methods will unlock more advanced Python concepts for you.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 5: Error Handling&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Everyone makes mistakes, and so does your code. Learning how to handle errors gracefully is crucial. Use &lt;code&gt;try&lt;/code&gt; and &lt;code&gt;except&lt;/code&gt; blocks to manage exceptions and prevent your program from crashing unexpectedly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;ZeroDivisionError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Oops! You can&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t divide by zero.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Handling errors will make your programs more robust and user-friendly.&lt;/p&gt;
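&lt;p&gt;The same pattern works for user input. The sketch below uses a hypothetical helper, &lt;code&gt;parse_age&lt;/code&gt;, to show the optional &lt;code&gt;else&lt;/code&gt; and &lt;code&gt;finally&lt;/code&gt; clauses that can follow &lt;code&gt;try&lt;/code&gt; and &lt;code&gt;except&lt;/code&gt;.&lt;/p&gt;

```python
def parse_age(text):
    """Convert text to an int, returning None when it is not a whole number."""
    try:
        age = int(text)
    except ValueError:
        # int() raises ValueError for input such as "forty-two".
        print(f"{text!r} is not a valid age.")
        return None
    else:
        # Runs only when the try block raised no exception.
        return age
    finally:
        # Runs in every case, which makes it a good place for cleanup.
        print("Finished checking input.")

valid = parse_age("42")
invalid = parse_age("forty-two")
```

&lt;p&gt;Catching a specific exception such as &lt;code&gt;ValueError&lt;/code&gt; is usually safer than a bare &lt;code&gt;except&lt;/code&gt;, because unrelated bugs still surface instead of being silently hidden.&lt;/p&gt;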

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 6: Build Projects and Solve Problems&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;After you’ve mastered the basics, the best way to learn is by doing. Start with small projects, like a calculator or a to-do list application. As you gain confidence, tackle more complex projects, such as a web scraper or a data visualization tool.&lt;/p&gt;

&lt;p&gt;Working on projects will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reinforce your learning.&lt;/li&gt;
&lt;li&gt;Expose you to real-world scenarios.&lt;/li&gt;
&lt;li&gt;Teach you how to solve problems independently.&lt;/li&gt;
&lt;/ul&gt;
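&lt;p&gt;To give a feel for how small a first project can be, here is one possible sketch of the calculator idea. It combines functions, a dictionary, and error handling from the earlier steps. The names &lt;code&gt;OPERATIONS&lt;/code&gt; and &lt;code&gt;calculate&lt;/code&gt; are illustrative, and this is only one of many ways to structure it.&lt;/p&gt;

```python
import operator

# Map each symbol to the standard-library function that performs it.
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

def calculate(left, symbol, right):
    """Apply the operation named by symbol to two numbers."""
    if symbol not in OPERATIONS:
        raise ValueError(f"Unsupported operation: {symbol}")
    return OPERATIONS[symbol](left, right)

print(calculate(6, "*", 7))   # prints: 42
print(calculate(9, "/", 2))   # prints: 4.5
```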

&lt;h3&gt;
  
  
  &lt;strong&gt;Where to Find Resources?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;There are tons of free resources available:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://learn.microsoft.com/en-us/training/browse/?terms=python" rel="noopener noreferrer"&gt;&lt;strong&gt;Microsoft Course On Python&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://edu.machinelearningplus.com/courses/Python-Programming-628f8abb0cf22aee960e8234?redirectToMicroFE=false" rel="noopener noreferrer"&gt;&lt;strong&gt;Complete Python Programming By ML+&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://cognitiveclass.ai/courses/python-for-data-science" rel="noopener noreferrer"&gt;&lt;strong&gt;Python For Data Science By IBM&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://docs.python.org/3/" rel="noopener noreferrer"&gt;&lt;strong&gt;Python documentation&lt;/strong&gt;&lt;/a&gt; is a treasure trove of information.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff5q8gpgfamteqx1hhtbm.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff5q8gpgfamteqx1hhtbm.gif" alt="Google It"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Google it! 😎&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Final Words of Advice&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Learning Python, or any programming language, is like climbing a mountain. The beginning is often the hardest part, where everything feels overwhelming. But keep practicing, break down problems into smaller pieces, and don’t hesitate to seek help from the community. With consistent effort, you’ll soon find yourself creating your own programs, solving real-world problems, and having fun along the way!&lt;/p&gt;

&lt;p&gt;So save this post, go ahead and take that first step, start small, and happy coding! 🌟&lt;/p&gt;

</description>
      <category>python</category>
      <category>beginners</category>
      <category>datascience</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>How Python Dictionaries Keep Your Code Clean and DRY</title>
      <dc:creator>Ashwin Kumar</dc:creator>
      <pubDate>Wed, 02 Oct 2024 05:10:21 +0000</pubDate>
      <link>https://dev.to/aashwinkumar/how-python-dictionaries-keep-your-code-clean-and-dry-3067</link>
      <guid>https://dev.to/aashwinkumar/how-python-dictionaries-keep-your-code-clean-and-dry-3067</guid>
      <description>&lt;h3&gt;
  
  
  Python Dictionary and the DRY Principle: A Quick Guide for Beginners
&lt;/h3&gt;

&lt;p&gt;Hey there! 👋 If you’re diving into Python programming, you’ve probably stumbled upon dictionaries and maybe wondered, “What exactly is a dictionary in Python, and how can it help me code smarter?” No worries, let’s break it down in a super simple way.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;What’s a Dictionary in Python?&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Imagine you have a list of items, and each item has a unique label attached to it, like “name: John” or “age: 25”. A &lt;a href="https://docs.python.org/3/tutorial/datastructures.html#dictionaries" rel="noopener noreferrer"&gt;dictionary in Python&lt;/a&gt; works exactly like that! It’s a collection of key-value pairs, where each key is unique and points to a specific value. Think of it as a mini database for storing information in a neat and organized way.&lt;/p&gt;

&lt;p&gt;It’s like a real dictionary where you look up a word (the key) and get its meaning (the value). Cool, right? 😎&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;How to Make a Dictionary in Python?&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Creating a dictionary is as easy as pie. You wrap everything in curly braces &lt;code&gt;{}&lt;/code&gt;, put a colon &lt;code&gt;:&lt;/code&gt; between each key and its value, and separate the key-value pairs with commas.&lt;/p&gt;

&lt;p&gt;Here’s how you can make a simple dictionary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Creating a dictionary to store student information
&lt;/span&gt;&lt;span class="n"&gt;student_info&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;John Doe&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;age&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;21&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;major&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Computer Science&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Printing out the dictionary
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;student_info&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This dictionary stores a student’s name, age, and major. The keys &lt;code&gt;'name'&lt;/code&gt;, &lt;code&gt;'age'&lt;/code&gt;, and &lt;code&gt;'major'&lt;/code&gt; are in quotes because they are strings, but keys can also be numbers or even tuples, as long as they are immutable (hashable). The values can be anything: strings, lists, other dictionaries, you name it.&lt;/p&gt;
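&lt;p&gt;To make the point about key and value types concrete, here is a small sketch. The dictionary &lt;code&gt;examples&lt;/code&gt; and everything in it are made-up illustrative data.&lt;/p&gt;

```python
# Keys can be strings, numbers, or tuples.
examples = {
    "apples": 12,                 # string key
    404: "Not Found",             # integer key
    (40.7, -74.0): "New York",    # tuple key (illustrative coordinates)
}

# Values can be any object, including lists and other dictionaries.
examples["suppliers"] = ["FreshCo", "GreenFarm"]
examples["prices"] = {"apples": 0.5}

print(examples[(40.7, -74.0)])        # prints: New York
print(examples["prices"]["apples"])   # prints: 0.5

# A list is mutable, so Python refuses to use it as a key.
try:
    examples[["a", "list"]] = "nope"
except TypeError:
    print("Lists cannot be dictionary keys.")
```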

&lt;h4&gt;
  
  
  &lt;strong&gt;How Dictionaries Help Us to Avoid Repetition (DRY Principle)&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Now, here’s where it gets interesting. You may have heard of the &lt;strong&gt;&lt;a href="https://en.wikipedia.org/wiki/Don%27t_repeat_yourself" rel="noopener noreferrer"&gt;DRY principle&lt;/a&gt;&lt;/strong&gt;, which stands for &lt;em&gt;Don’t Repeat Yourself&lt;/em&gt;. It’s a rule that encourages you to avoid redundancy in your code. How can dictionaries help with that? Let’s take a look.&lt;/p&gt;

&lt;h5&gt;
  
  
  &lt;strong&gt;Before Using a Dictionary (Repeating Code)&lt;/strong&gt;
&lt;/h5&gt;

&lt;p&gt;Imagine you want to store information about students in separate variables. It might look something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;student1_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Alice&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;student1_age&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;
&lt;span class="n"&gt;student1_major&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Mathematics&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="n"&gt;student2_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Bob&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;student2_age&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;22&lt;/span&gt;
&lt;span class="n"&gt;student2_major&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Physics&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Not only do we have repetitive variable names, but if we want to print or update these, we have to repeat ourselves again and again. This is where dictionaries can save the day! 🦸&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Example 1: After Using a Dictionary (DRY Version)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;With dictionaries, we can store all this information in a cleaner way:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Using dictionaries to store student data
&lt;/span&gt;&lt;span class="n"&gt;students&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;student1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Alice&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;age&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;major&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Mathematics&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;student2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Bob&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;age&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;22&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;major&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Physics&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;students&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;student1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;  &lt;span class="c1"&gt;# Output: Alice
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;students&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;student2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;age&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;   &lt;span class="c1"&gt;# Output: 22
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, you don’t have to create separate variables for each student’s name, age, and major. You can access or update the information in a much simpler way. Plus, it makes your code cleaner and easier to manage.&lt;/p&gt;
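&lt;p&gt;Here is a short sketch of what updating and reading that data can look like. It repeats the &lt;code&gt;students&lt;/code&gt; dictionary so it runs on its own, and the third student is a hypothetical addition for illustration.&lt;/p&gt;

```python
students = {
    'student1': {'name': 'Alice', 'age': 20, 'major': 'Mathematics'},
    'student2': {'name': 'Bob', 'age': 22, 'major': 'Physics'},
}

# Update one field and add a new student without creating new variables.
students['student1']['age'] = 21
students['student3'] = {'name': 'Charlie', 'age': 19, 'major': 'Chemistry'}

# A single loop handles every student, however many there are.
summaries = []
for info in students.values():
    summaries.append(f"{info['name']} ({info['age']}) studies {info['major']}")

print("\n".join(summaries))
```

&lt;p&gt;Adding a fourth or fifth student means adding one more entry to the dictionary. The loop itself does not change.&lt;/p&gt;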

&lt;h3&gt;
  
  
  &lt;strong&gt;Example 2: Avoiding Repetition with Dictionaries&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Let’s say you want to create a simple grading system based on student scores. Without dictionaries, you might end up writing the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Without dictionary (repeating code)
&lt;/span&gt;&lt;span class="n"&gt;alice_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt;
&lt;span class="n"&gt;bob_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;75&lt;/span&gt;
&lt;span class="n"&gt;charlie_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;85&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;alice_score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;85&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Alice gets an A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;bob_score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;85&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bob gets an A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;charlie_score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;85&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Charlie gets an A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, we’re repeating the &lt;code&gt;if&lt;/code&gt; statements and hardcoding each student’s name and score, which violates the DRY principle.&lt;/p&gt;

&lt;p&gt;Instead, with a dictionary, you can avoid repetition like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Using a dictionary (DRY principle)
&lt;/span&gt;&lt;span class="n"&gt;student_scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Alice&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Bob&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;75&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Charlie&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;85&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;student&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;student_scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;85&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;student&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; gets an A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you have cleaner, shorter, and more maintainable code! You only write the &lt;code&gt;if&lt;/code&gt; statement once, and it works for every student in your dictionary. 🎉&lt;/p&gt;
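
&lt;p&gt;The same idea can be pushed one step further by keeping the grading rules in data as well. The sketch below is only one possible approach: the cutoffs for B and C and the &lt;code&gt;letter_grade&lt;/code&gt; helper are hypothetical, and only the 85 cutoff for an A comes from the example above.&lt;/p&gt;

```python
from bisect import bisect_right

# Hypothetical cutoffs: only the 85 cutoff for an A comes from the article.
CUTOFFS = [60, 70, 85]
LETTERS = ['F', 'C', 'B', 'A']

def letter_grade(score):
    # bisect_right counts how many cutoffs the score has reached,
    # which is exactly the index of the matching letter.
    return LETTERS[bisect_right(CUTOFFS, score)]

student_scores = {'Alice': 90, 'Bob': 75, 'Charlie': 85}

# Build a new dictionary that maps each student to a letter grade.
grades = {student: letter_grade(score) for student, score in student_scores.items()}
print(grades)  # {'Alice': 'A', 'Bob': 'B', 'Charlie': 'A'}
```

&lt;p&gt;Changing the grading scale then means editing two lists, not rewriting a chain of &lt;code&gt;if&lt;/code&gt; statements.&lt;/p&gt;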

&lt;h4&gt;
  
  
  &lt;strong&gt;Useful Dictionary Methods&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Dictionaries come with a bunch of built-in methods that make working with them a breeze. Let’s check out a few of them:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
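&lt;strong&gt;&lt;code&gt;.get()&lt;/code&gt;&lt;/strong&gt;: Returns the value for a key, or a default (&lt;code&gt;None&lt;/code&gt; unless you supply one) instead of raising a &lt;code&gt;KeyError&lt;/code&gt; when the key doesn’t exist.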
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;   &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;student_info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;address&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Address not available&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  
   &lt;span class="c1"&gt;# Output: Address not available
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;.keys()&lt;/code&gt; and &lt;code&gt;.values()&lt;/code&gt;&lt;/strong&gt;: Get all keys or values in the dictionary.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;   &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;student_info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;  &lt;span class="c1"&gt;# Output: dict_keys(['name', 'age', 'major'])
&lt;/span&gt;   &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;student_info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;  &lt;span class="c1"&gt;# Output: dict_values(['John Doe', 21, 'Computer Science'])
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;.items()&lt;/code&gt;&lt;/strong&gt;: Get both keys and values as pairs.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;   &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;student_info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
       &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="c1"&gt;# Output: 
&lt;/span&gt;   &lt;span class="c1"&gt;# name: John Doe
&lt;/span&gt;   &lt;span class="c1"&gt;# age: 21
&lt;/span&gt;   &lt;span class="c1"&gt;# major: Computer Science
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="4"&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;.update()&lt;/code&gt;&lt;/strong&gt;: Merges another dictionary or key-value pairs into the dictionary, overwriting the values of keys that already exist.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;   &lt;span class="n"&gt;student_info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;grade&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;A&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
   &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;student_info&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  
   &lt;span class="c1"&gt;# Output: {'name': 'John Doe', 'age': 21, 'major': 'Computer Science', 'grade': 'A'}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="5"&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;.setdefault()&lt;/code&gt;&lt;/strong&gt;: Adds a key with a default value if the key doesn’t exist, and returns the value stored under that key either way.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;   &lt;span class="n"&gt;student_info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setdefault&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;graduation_year&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2024&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;student_info&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  
   &lt;span class="c1"&gt;# Output: {'name': 'John Doe', 'age': 21, 'major': 'Computer Science', 'grade': 'A', 'graduation_year': 2024}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
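
&lt;p&gt;One detail worth seeing in action: &lt;code&gt;.setdefault()&lt;/code&gt; never overwrites a key that is already there, which makes it handy for grouping. The snippet below is a small sketch; the &lt;code&gt;student_info&lt;/code&gt; values follow the outputs shown above, and the names and majors in the grouping loop are made up for illustration.&lt;/p&gt;

```python
student_info = {'name': 'John Doe', 'age': 21, 'major': 'Computer Science'}

# The key is missing, so it is added and the default is returned.
year = student_info.setdefault('graduation_year', 2024)
print(year)  # 2024

# The key now exists, so the stored value is kept and the new default is ignored.
year = student_info.setdefault('graduation_year', 2030)
print(year)  # 2024

# Common use: group items without checking for the key first.
# The names and majors below are hypothetical sample data.
majors = {}
for name, major in [('Alice', 'Math'), ('Bob', 'Physics'), ('Cara', 'Math')]:
    majors.setdefault(major, []).append(name)
print(majors)  # {'Math': ['Alice', 'Cara'], 'Physics': ['Bob']}
```

&lt;p&gt;Without &lt;code&gt;.setdefault()&lt;/code&gt;, the grouping loop would need an explicit check for each new major before appending.&lt;/p&gt;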



&lt;h4&gt;
  
  
  &lt;strong&gt;Wrapping Up&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Dictionaries are super powerful and can really help you follow the DRY principle in your code. By using dictionaries, you avoid repeating yourself, keep your code organized, and make it easier to read and maintain.&lt;/p&gt;

&lt;p&gt;So, the next time you find yourself creating a bunch of similar variables, consider using a dictionary instead. It’ll save you a ton of time and effort, and your future self will thank you! 🙌&lt;/p&gt;

&lt;p&gt;Happy coding! 💻&lt;/p&gt;

</description>
      <category>python</category>
      <category>programming</category>
      <category>coding</category>
      <category>newbie</category>
    </item>
    <item>
      <title>How User Manipulation Can Affect Google’s Advertising Model</title>
      <dc:creator>Ashwin Kumar</dc:creator>
      <pubDate>Mon, 30 Sep 2024 16:59:42 +0000</pubDate>
      <link>https://dev.to/aashwinkumar/how-user-manipulation-can-affect-googles-advertising-model-jji</link>
      <guid>https://dev.to/aashwinkumar/how-user-manipulation-can-affect-googles-advertising-model-jji</guid>
      <description>&lt;h2&gt;
  
  
  What Is the Key to Google’s Ad Model Success?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Relevance&lt;/strong&gt; is the cornerstone of Google’s success and the foundation of its entire advertising model. Businesses pay Google to show their ads to a highly targeted and relevant audience, ensuring that their marketing dollars are being spent efficiently. But what happens if a portion of Google’s users unintentionally disrupt this relevance? &lt;/p&gt;

&lt;h3&gt;
  
  
  Unintentional Manipulation of Ad Relevance
&lt;/h3&gt;

&lt;p&gt;Recently, I came across a YouTuber suggesting ways to block YouTube ads on both mobile and desktop devices. The strategy involved clicking on the "i" button next to an ad and reporting it as "not relevant." While this may seem like a harmless way for users to avoid seeing ads, it has broader implications.&lt;/p&gt;

&lt;p&gt;By marking an ad as irrelevant, even when it is actually targeted correctly, users are unintentionally giving Google manipulated feedback. As a result, Google’s machine learning algorithms could misinterpret this data, leading to skewed targeting in future ad placements. But what does this mean for advertisers?&lt;/p&gt;

&lt;h3&gt;
  
  
  How Does This Impact Advertisers?
&lt;/h3&gt;

&lt;p&gt;For advertisers, &lt;a href="https://www.relevance.com/guide-to-relevance-marketing/#:~:text=Relevance%20is%20a%20marketing%20differentiator&amp;amp;text=By%20incorporating%20relevance%20marketing%20into,their%20changing%20wants%20and%20needs." rel="noopener noreferrer"&gt;relevance is everything&lt;/a&gt;. They rely on Google to show their ads to the right audience, increasing the chances of engagement, conversions, and ultimately, sales. When users manipulate the feedback loop by marking ads as irrelevant, even if they are perfectly targeted, it can cause:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reduced Ad Performance&lt;/strong&gt;: If Google’s algorithms begin to misinterpret which audiences find ads relevant, this can lead to ads being shown to less relevant users, reducing overall ad performance and ROI.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Higher Ad Costs&lt;/strong&gt;: When relevance scores drop, advertisers might have to pay higher costs to maintain their ad positions, as Google’s system perceives them as less valuable to users.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Wasted Budget&lt;/strong&gt;: Ads being shown to less interested users mean more wasted impressions and clicks, ultimately leading to a higher CPA and lower campaign effectiveness.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
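
&lt;p&gt;The effect on CPA (cost per acquisition, i.e. spend divided by conversions) can be illustrated with a little arithmetic. All numbers below are hypothetical and chosen only to show the direction of the effect, not to describe any real campaign.&lt;/p&gt;

```python
def cost_per_acquisition(spend, conversions):
    """CPA is total ad spend divided by the number of conversions."""
    return spend / conversions

# Hypothetical campaign: same budget, fewer conversions once targeting degrades.
spend = 1000.0
baseline_cpa = cost_per_acquisition(spend, 50)  # well-targeted audience
degraded_cpa = cost_per_acquisition(spend, 40)  # less relevant audience

print(baseline_cpa)  # 20.0
print(degraded_cpa)  # 25.0
```

&lt;p&gt;In this made-up scenario, losing a fifth of the conversions at the same spend raises the CPA by a quarter.&lt;/p&gt;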

&lt;h3&gt;
  
  
  A Data Science Perspective: Impact on Machine Learning Models
&lt;/h3&gt;

&lt;p&gt;From a data science perspective, even a small amount of manipulated data can have a ripple effect on Google’s machine learning models, which rely on clean and unbiased data for training. Here’s how:&lt;/p&gt;

&lt;h4&gt;
  
  
  Impact on Model Performance
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Bias in Predictions&lt;/strong&gt;: Even a small percentage of manipulated feedback can lead to biased predictions. If the altered data skews the training set, the model may learn to favor certain outcomes that do not reflect genuine user behavior, resulting in unfair or inaccurate ad targeting.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Poor Generalization&lt;/strong&gt;: Models trained on data that includes manipulated feedback may perform well in controlled environments but fail to generalize effectively in real-world applications. This could lead to ineffective ad placements and decreased overall campaign performance.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Overfitting&lt;/strong&gt;: If the manipulated data introduces noise, the model might overfit to these specific patterns rather than learning meaningful signals from the broader dataset. This results in poor performance when encountering new, unseen data.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
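
&lt;p&gt;As a rough illustration of how manipulated feedback biases a signal, consider a toy model in which a fixed share of users always report "not relevant" regardless of the ad. This is a simplified sketch under that assumption, with made-up numbers and a hypothetical helper function. It is not a description of how Google’s systems actually work.&lt;/p&gt;

```python
def observed_relevance_rate(total_reports, truly_relevant, manipulated_percent):
    """Share of reports marked relevant when some users always say 'not relevant'.

    Toy model: honest users report relevance at the true rate, while
    manipulating users never report an ad as relevant.
    """
    honest_reports = total_reports * (100 - manipulated_percent) // 100
    relevant_reports = honest_reports * truly_relevant // total_reports
    return relevant_reports / total_reports

# Hypothetical ad that is genuinely relevant to 800 out of 1000 viewers.
print(observed_relevance_rate(1000, 800, 0))   # 0.8
print(observed_relevance_rate(1000, 800, 10))  # 0.72
```

&lt;p&gt;In this toy setup, a model trained on the observed feedback would see the ad as relevant 72% of the time when the true figure is 80%, and the gap widens as more users manipulate their reports.&lt;/p&gt;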

&lt;h4&gt;
  
  
  Model Poisoning
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://www.sciencedirect.com/science/article/pii/S187705092101869X" rel="noopener noreferrer"&gt;Model poisoning&lt;/a&gt; is another critical concern in the context of manipulated data. If a significant number of users begin to report ads as irrelevant, this can compromise the integrity of the training data. The model might be exposed to deliberately misleading feedback, leading it to make poor decisions based on incorrect assumptions about user preferences. This can cause a cycle where the model continues to reinforce bad predictions, further straying from accurate user targeting.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why It Matters: Long-Term Consequences for the Ad Ecosystem
&lt;/h3&gt;

&lt;p&gt;While the manipulation of relevance data might not have an immediate and significant impact on Google’s revenue or ad relevance scores, it poses a long-term risk. Google’s machine learning models are continuously learning and adapting based on the data they receive. &lt;a href="https://shardsecure.com/blog/data-manipulation-ml" rel="noopener noreferrer"&gt;Manipulated data can cause the models to become less effective over time&lt;/a&gt;, affecting both advertisers and end users.&lt;/p&gt;

&lt;p&gt;For advertisers, it means less accurate targeting, wasted budget, and higher costs. For end users, it means a less personalized browsing experience, which could lead to frustration and a decline in overall user satisfaction.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion: Maintaining the Integrity of Ad Relevance
&lt;/h3&gt;

&lt;p&gt;The relevance of ads is vital for both advertisers and end users, and any unintentional manipulation of this relevance can have far-reaching consequences. While blocking ads or marking them as irrelevant may seem like a harmless action, it’s important to understand the broader impact it can have on the ad ecosystem.&lt;/p&gt;

&lt;p&gt;For Google, it’s crucial to continually refine its machine learning models and ensure they are resilient against such manipulative behavior. For advertisers, awareness and understanding of how these systems work can help them better strategize and optimize their campaigns in a rapidly evolving digital landscape.&lt;/p&gt;

&lt;p&gt;If you have any concerns or feedback about my article, please feel free to leave a comment so I can make the necessary corrections. Thank you for your time!&lt;/p&gt;

&lt;p&gt;Happy Coding!&lt;/p&gt;

</description>
      <category>algorithms</category>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>google</category>
    </item>
  </channel>
</rss>
