<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: KARTHIK G T</title>
    <description>The latest articles on DEV Community by KARTHIK G T (@karthik_gt_b2d9a6b518984).</description>
    <link>https://dev.to/karthik_gt_b2d9a6b518984</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3600681%2Ffe4ee3a1-06e5-4883-8e1b-1de5848c83f1.png</url>
      <title>DEV Community: KARTHIK G T</title>
      <link>https://dev.to/karthik_gt_b2d9a6b518984</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/karthik_gt_b2d9a6b518984"/>
    <language>en</language>
    <item>
      <title>🧩 Data Cleaning Challenge with Pandas (Google Colab)</title>
      <dc:creator>KARTHIK G T</dc:creator>
      <pubDate>Fri, 07 Nov 2025 09:36:18 +0000</pubDate>
      <link>https://dev.to/karthik_gt_b2d9a6b518984/data-cleaning-challenge-with-pandas-google-colab-4iom</link>
      <guid>https://dev.to/karthik_gt_b2d9a6b518984/data-cleaning-challenge-with-pandas-google-colab-4iom</guid>
      <description>&lt;p&gt;🧠 Introduction&lt;/p&gt;

&lt;p&gt;For this task, I worked on cleaning and preprocessing a real-world dataset using Python’s Pandas library in Google Colab.&lt;br&gt;
I selected the E-commerce Sales Dataset from Kaggle, which originally contained 112,000 rows and 18 columns.&lt;br&gt;
The dataset included transactional information such as order IDs, product categories, prices, quantities, sales amounts, and customer regions.&lt;br&gt;
The main goal of this activity was to identify and correct data quality issues—such as missing values, duplicates, inconsistent formatting, and incorrect data types—so that the dataset could be ready for analysis and visualization.&lt;/p&gt;

&lt;p&gt;This activity helped me understand how data cleaning is a critical step in any data pipeline and how Pandas provides powerful tools to efficiently manage and preprocess large datasets.&lt;/p&gt;

&lt;p&gt;📊 Dataset Overview&lt;/p&gt;

&lt;p&gt;After importing the dataset using pd.read_csv() and checking the structure with df.info() and df.head(), I observed that:&lt;/p&gt;

&lt;p&gt;Several columns contained missing values, particularly in discount, profit, and ship_date.&lt;/p&gt;

&lt;p&gt;Some records were duplicated.&lt;/p&gt;

&lt;p&gt;The order_date and ship_date columns were stored as plain strings instead of proper datetime objects.&lt;/p&gt;

&lt;p&gt;Columns like Product Name and Category had inconsistent capitalization and extra spaces.&lt;/p&gt;

&lt;p&gt;Numerical columns like Sales and Profit sometimes contained text symbols such as “$” or “N/A”.&lt;/p&gt;

&lt;p&gt;These issues could cause errors or inaccuracies during analysis, so systematic cleaning steps were needed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnlx6evluuynfeihlztky.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnlx6evluuynfeihlztky.png" alt="Importing dataset" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🛠️ Cleaning &amp;amp; Preprocessing Steps&lt;/p&gt;

&lt;p&gt;Loading and Initial Inspection&lt;br&gt;
The dataset was loaded into a Pandas DataFrame using:&lt;/p&gt;

&lt;p&gt;df = pd.read_csv('/content/ecommerce_data.csv')&lt;br&gt;
df.info()&lt;br&gt;
df.head()&lt;/p&gt;

&lt;p&gt;This provided an overview of the data types and revealed missing and inconsistent entries.&lt;/p&gt;

&lt;p&gt;Handling Missing Values&lt;br&gt;
I counted missing values using df.isnull().sum().&lt;br&gt;
For numeric columns, I filled missing values with their mean using:&lt;/p&gt;

&lt;p&gt;df['profit'] = df['profit'].fillna(df['profit'].mean())&lt;/p&gt;

&lt;p&gt;For categorical columns, I replaced nulls with their mode or “Unknown”.&lt;br&gt;
Some rows with excessive missing values were removed using df.dropna().&lt;/p&gt;
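
&lt;p&gt;A minimal sketch of those two steps (the thresh value below is an assumption about what counts as “excessive”; adjust it to the dataset):&lt;/p&gt;

&lt;p&gt;# fill categorical nulls with the most frequent value, or a placeholder&lt;br&gt;
df['region'] = df['region'].fillna(df['region'].mode()[0])&lt;br&gt;
df['category'] = df['category'].fillna('Unknown')&lt;br&gt;
# keep only rows with at least 15 of the 18 columns populated&lt;br&gt;
df = df.dropna(thresh=15)&lt;/p&gt;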

&lt;p&gt;Removing Duplicates&lt;br&gt;
Duplicate records were identified with df.duplicated().sum() and removed using:&lt;/p&gt;

&lt;p&gt;df.drop_duplicates(inplace=True)&lt;/p&gt;

&lt;p&gt;Fixing Inconsistent Formats&lt;/p&gt;

&lt;p&gt;Date Columns: Converted using pd.to_datetime(df['order_date']) and pd.to_datetime(df['ship_date']).&lt;/p&gt;
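
&lt;p&gt;Written out as assignments (errors='coerce' is an added safeguard so unparseable entries become NaT rather than raising an error):&lt;/p&gt;

&lt;p&gt;df['order_date'] = pd.to_datetime(df['order_date'], errors='coerce')&lt;br&gt;
df['ship_date'] = pd.to_datetime(df['ship_date'], errors='coerce')&lt;/p&gt;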

&lt;p&gt;Text Columns: Cleaned using string functions:&lt;/p&gt;

&lt;p&gt;df['category'] = df['category'].str.strip().str.title()&lt;/p&gt;

&lt;p&gt;Numeric Columns: Removed symbols and converted to numeric:&lt;/p&gt;

&lt;p&gt;df['sales'] = df['sales'].replace(r'[\$,]', '', regex=True)&lt;br&gt;
df['sales'] = pd.to_numeric(df['sales'], errors='coerce')  # turns leftover text such as "N/A" into NaN&lt;/p&gt;

&lt;p&gt;Renaming Columns&lt;br&gt;
To make column names consistent and easier to reference, I used:&lt;/p&gt;

&lt;p&gt;df.rename(columns={'Order ID': 'order_id', 'Product Name': 'product_name'}, inplace=True)&lt;/p&gt;

&lt;p&gt;This followed the snake_case naming convention.&lt;/p&gt;
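
&lt;p&gt;For the remaining columns, the same convention can be applied in one pass instead of listing every mapping (a sketch, assuming the original names only differ by spaces and capitalization):&lt;/p&gt;

&lt;p&gt;df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_')&lt;/p&gt;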

&lt;p&gt;Filtering and Subsetting&lt;br&gt;
To focus on high-value transactions, I created a filtered dataset of all sales greater than 1000:&lt;/p&gt;

&lt;p&gt;high_sales = df[df['sales'] &amp;gt; 1000]&lt;/p&gt;

&lt;p&gt;Grouping and Aggregating&lt;br&gt;
I calculated total and average sales by region:&lt;/p&gt;

&lt;p&gt;region_sales = df.groupby('region')['sales'].sum().reset_index()&lt;br&gt;
avg_sales = df.groupby('region')['sales'].mean().reset_index()&lt;/p&gt;

&lt;p&gt;Converting Data Types&lt;br&gt;
Columns such as region and category were converted to categorical types to optimize memory:&lt;/p&gt;

&lt;p&gt;df['region'] = df['region'].astype('category')&lt;/p&gt;
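
&lt;p&gt;The saving can be verified by comparing the column’s memory footprint before and after conversion:&lt;/p&gt;

&lt;p&gt;print(df['region'].astype('object').memory_usage(deep=True))  # as plain strings&lt;br&gt;
print(df['region'].memory_usage(deep=True))  # as category: far smaller&lt;/p&gt;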

&lt;p&gt;📈 Before vs After Summary&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;&lt;th&gt;Metric&lt;/th&gt;&lt;th&gt;Before Cleaning&lt;/th&gt;&lt;th&gt;After Cleaning&lt;/th&gt;&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;&lt;td&gt;Total Rows&lt;/td&gt;&lt;td&gt;112,000&lt;/td&gt;&lt;td&gt;109,800&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Missing Values&lt;/td&gt;&lt;td&gt;10,245&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Duplicate Records&lt;/td&gt;&lt;td&gt;2,000&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Columns with Incorrect Data Types&lt;/td&gt;&lt;td&gt;6&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Inconsistent Text Entries&lt;/td&gt;&lt;td&gt;4 columns&lt;/td&gt;&lt;td&gt;Fixed&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Columns Renamed&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;12&lt;/td&gt;&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
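
&lt;p&gt;These figures can be checked directly on the cleaned DataFrame:&lt;/p&gt;

&lt;p&gt;print(len(df))  # total rows&lt;br&gt;
print(df.isnull().sum().sum())  # remaining missing values&lt;br&gt;
print(df.duplicated().sum())  # remaining duplicates&lt;/p&gt;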

&lt;p&gt;After cleaning, the dataset became more reliable, consistent, and ready for visualization or machine learning use.&lt;/p&gt;

&lt;p&gt;📉 Visualization and Export&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxbuevtl81hnzl3n36som.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxbuevtl81hnzl3n36som.png" alt="Result" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To verify improvements, I created visualizations using Matplotlib:&lt;/p&gt;

&lt;p&gt;import matplotlib.pyplot as plt&lt;br&gt;
&lt;br&gt;
region_sales.plot(kind='bar', x='region', y='sales', figsize=(8,4), title='Total Sales by Region')&lt;br&gt;
plt.show()&lt;/p&gt;

&lt;p&gt;Another chart displayed the distribution of sales values before and after cleaning, showing that outliers and missing data had been corrected.&lt;/p&gt;
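
&lt;p&gt;One way to draw that comparison, assuming a numeric copy of the raw sales column was saved as raw_sales before imputation (raw_sales is illustrative; any snapshot taken before cleaning works):&lt;/p&gt;

&lt;p&gt;fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)&lt;br&gt;
raw_sales.plot(kind='hist', bins=50, ax=axes[0], title='Sales (before cleaning)')&lt;br&gt;
df['sales'].plot(kind='hist', bins=50, ax=axes[1], title='Sales (after cleaning)')&lt;br&gt;
plt.show()&lt;/p&gt;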

&lt;p&gt;Finally, the cleaned dataset was exported using:&lt;/p&gt;

&lt;p&gt;df.to_csv('/content/Cleaned_Dataset.csv', index=False)&lt;/p&gt;

&lt;p&gt;This exported file can now be reused for dashboards or analysis.&lt;/p&gt;

&lt;p&gt;🎓 Conclusion&lt;/p&gt;

&lt;p&gt;Through this project, I learned that data cleaning is one of the most crucial and time-consuming stages in any data analysis process.&lt;br&gt;
Using Pandas, I was able to efficiently detect and correct missing, duplicated, and inconsistent data.&lt;br&gt;
The ability to transform raw data into a structured and reliable format is what makes accurate data-driven insights possible.&lt;/p&gt;

&lt;p&gt;For Data Engineers and Data Analysts, mastering data preprocessing using Pandas is essential.&lt;br&gt;
This task not only strengthened my technical skills in handling large datasets but also gave me a deeper understanding of the importance of clean, well-structured data for analytics and decision-making.&lt;/p&gt;

</description>
      <category>challenge</category>
      <category>datascience</category>
      <category>python</category>
    </item>
    <item>
      <title>My MongoDB Hands-On: NoSQL Data Analysis Using Yelp Dataset</title>
      <dc:creator>KARTHIK G T</dc:creator>
      <pubDate>Fri, 07 Nov 2025 08:44:13 +0000</pubDate>
      <link>https://dev.to/karthik_gt_b2d9a6b518984/my-mongodb-hands-on-nosql-data-analysis-using-yelp-dataset-1j7i</link>
      <guid>https://dev.to/karthik_gt_b2d9a6b518984/my-mongodb-hands-on-nosql-data-analysis-using-yelp-dataset-1j7i</guid>
      <description>&lt;p&gt;👋 Introduction&lt;/p&gt;

&lt;p&gt;MongoDB is a popular NoSQL database designed to store and manage data in a flexible, document-based format using JSON-like documents.&lt;br&gt;
Unlike traditional relational databases, MongoDB allows developers to work with unstructured or semi-structured data easily, making it perfect for modern applications in Data Engineering and Data Analysis.&lt;/p&gt;

&lt;p&gt;Through this hands-on task, I learned how to:&lt;/p&gt;

&lt;p&gt;Install and set up MongoDB (locally or via cloud using MongoDB Atlas)&lt;/p&gt;

&lt;p&gt;Import datasets and store records&lt;/p&gt;

&lt;p&gt;Perform CRUD operations (Create, Read, Update, Delete)&lt;/p&gt;

&lt;p&gt;Export query results for further analysis&lt;/p&gt;

&lt;p&gt;This activity helped me understand how real-world data can be stored, queried, and analyzed efficiently using a NoSQL database.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnicfs2m2dvlh3x1zewd3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnicfs2m2dvlh3x1zewd3.png" alt="MongoDB installation" width="800" height="419"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;⚙️ Installation Steps&lt;/p&gt;

&lt;p&gt;I installed MongoDB locally and used MongoDB Compass, which provides a user-friendly graphical interface for database operations.&lt;br&gt;
After connecting to the local MongoDB server (mongodb://localhost:27017), I created a new database named reviewsDB and a collection called businesses to store my data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fik2ku6eqxdpjfl3m18g3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fik2ku6eqxdpjfl3m18g3.png" alt="MongoDB Compass connected to local database" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;
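
&lt;p&gt;The same connection can also be made from Python with PyMongo, which is useful when query results need to flow into scripts later (a minimal sketch; it assumes the driver is installed via pip install pymongo):&lt;/p&gt;

&lt;p&gt;from pymongo import MongoClient&lt;br&gt;
&lt;br&gt;
client = MongoClient('mongodb://localhost:27017')&lt;br&gt;
db = client['reviewsDB']&lt;br&gt;
businesses = db['businesses']  # the collection created in Compass&lt;br&gt;
print(businesses.estimated_document_count())&lt;/p&gt;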

&lt;p&gt;📂 Importing Dataset&lt;/p&gt;

&lt;p&gt;Next, I imported my dataset — the Yelp Reviews Dataset downloaded from Kaggle.&lt;br&gt;
Using MongoDB Compass, I selected “Add Data → Import File”, chose the JSON dataset, and imported it into my businesses collection.&lt;br&gt;
All records were successfully displayed in the Compass data viewer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmkyg01vhdvyt1kyh9rxs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmkyg01vhdvyt1kyh9rxs.png" alt="Imported dataset shown in MongoDB Compass" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;
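
&lt;p&gt;When Compass isn’t available, the same import can be scripted, reusing the businesses handle from the PyMongo sketch above (the filename is illustrative, and the file is assumed to be newline-delimited JSON, one document per line):&lt;/p&gt;

&lt;p&gt;import json&lt;br&gt;
&lt;br&gt;
with open('yelp_business.json') as f:&lt;br&gt;
    docs = [json.loads(line) for line in f]&lt;br&gt;
businesses.insert_many(docs)&lt;/p&gt;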

&lt;p&gt;💻 Performing Queries&lt;br&gt;
1️⃣ Insert at least 10 records manually&lt;br&gt;
db.businesses.insertMany([&lt;br&gt;
  {business_id: 1, name: "Cafe Aroma", rating: 4.5, review: "Good coffee and great service!"},&lt;br&gt;
  {business_id: 2, name: "Pizza Hub", rating: 4.2, review: "Tasty pizza and good ambience."},&lt;br&gt;
  {business_id: 3, name: "Burger Bite", rating: 3.9, review: "Decent burgers, good value."},&lt;br&gt;
  {business_id: 4, name: "Sweet Treats", rating: 4.8, review: "Excellent desserts!"},&lt;br&gt;
  {business_id: 5, name: "Veggie Delight", rating: 4.0, review: "Healthy food and good taste."},&lt;br&gt;
  {business_id: 6, name: "Spice Route", rating: 4.6, review: "Authentic and spicy dishes."},&lt;br&gt;
  {business_id: 7, name: "Urban Eatery", rating: 4.3, review: "Modern setup and good food."},&lt;br&gt;
  {business_id: 8, name: "Taco Time", rating: 3.8, review: "Average tacos but good service."},&lt;br&gt;
  {business_id: 9, name: "Pasta Palace", rating: 4.7, review: "Great pasta and ambience."},&lt;br&gt;
  {business_id: 10, name: "Choco Heaven", rating: 4.9, review: "Best chocolates and cakes ever!"}&lt;br&gt;
])&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fltelifnswj6u4twtqeh4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fltelifnswj6u4twtqeh4.png" alt="Insert operation result" width="800" height="419"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2️⃣ Top 5 businesses with highest average rating&lt;br&gt;
db.businesses.find().sort({rating: -1}).limit(5)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqj3gfkawam2cldoron6m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqj3gfkawam2cldoron6m.png" alt="Top 5 businesses output" width="800" height="419"&gt;&lt;/a&gt;&lt;/p&gt;
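
&lt;p&gt;Because each document here stores a single rating, sorting on rating is equivalent to ranking by average. If a business had many review documents, the average would need an aggregation pipeline instead; a PyMongo sketch:&lt;/p&gt;

&lt;p&gt;pipeline = [&lt;br&gt;
    {'$group': {'_id': '$name', 'avg_rating': {'$avg': '$rating'}}},&lt;br&gt;
    {'$sort': {'avg_rating': -1}},&lt;br&gt;
    {'$limit': 5},&lt;br&gt;
]&lt;br&gt;
for doc in businesses.aggregate(pipeline):&lt;br&gt;
    print(doc['_id'], doc['avg_rating'])&lt;/p&gt;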

&lt;p&gt;3️⃣ Count reviews containing the word “good”&lt;br&gt;
db.businesses.countDocuments({review: /good/i})&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8vdiyq6l8mbi7tdp1emb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8vdiyq6l8mbi7tdp1emb.png" alt="Count result showing reviews containing “good”" width="800" height="419"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;4️⃣ Get all reviews for a specific business ID&lt;br&gt;
db.businesses.find({business_id: 2})&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg2ki7gbmbttba2h6it37.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg2ki7gbmbttba2h6it37.png" alt="Reviews for selected business" width="800" height="419"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;5️⃣ Update a review&lt;br&gt;
db.businesses.updateOne(&lt;br&gt;
  {business_id: 3},&lt;br&gt;
  {$set: {review: "Delicious burgers with excellent value!"}}&lt;br&gt;
)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9jqdfe6qn3ap4zpuzxru.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9jqdfe6qn3ap4zpuzxru.png" alt="Update query result" width="800" height="419"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;6️⃣ Delete a record&lt;br&gt;
db.businesses.deleteOne({business_id: 8})&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvwklw96g0jseik2m948w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvwklw96g0jseik2m948w.png" alt="Record deletion confirmation" width="800" height="419"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;📤 Exporting Results&lt;/p&gt;

&lt;p&gt;After running each query, I used the Export Results feature in MongoDB Compass to save my outputs.&lt;br&gt;
The data was exported in both JSON and CSV formats for further analysis or submission.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmmzhp0d3j0dkqdsk5nvj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmmzhp0d3j0dkqdsk5nvj.png" alt="Export dialog and saved results file" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;
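
&lt;p&gt;The export can also be scripted, which makes it easy to hand query results straight to Pandas (a sketch; the output filename is illustrative):&lt;/p&gt;

&lt;p&gt;import pandas as pd&lt;br&gt;
&lt;br&gt;
results = list(businesses.find({}, {'_id': 0}))  # drop the ObjectId for CSV&lt;br&gt;
pd.DataFrame(results).to_csv('businesses_export.csv', index=False)&lt;/p&gt;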

&lt;p&gt;🎓 Conclusion&lt;/p&gt;

&lt;p&gt;This MongoDB hands-on activity gave me practical experience in managing NoSQL data.&lt;br&gt;
I learned how to:&lt;/p&gt;

&lt;p&gt;Create and manage collections&lt;/p&gt;

&lt;p&gt;Perform CRUD operations&lt;/p&gt;

&lt;p&gt;Query and filter data using expressions and patterns&lt;/p&gt;

&lt;p&gt;Export and visualize real-world datasets&lt;/p&gt;

&lt;p&gt;MongoDB’s flexible schema and JSON-based structure make it an essential tool for Data Engineers and Analysts, especially when handling large, diverse datasets where relational models are less effective.&lt;/p&gt;

&lt;p&gt;It’s a valuable skill for anyone interested in data pipelines, analytics, or backend engineering.&lt;/p&gt;

</description>
      <category>analytics</category>
      <category>mongodb</category>
      <category>tutorial</category>
      <category>database</category>
    </item>
  </channel>
</rss>
