Ramya .C
πŸ—“οΈ Day 50 of My Data Analytics Journey !

Today marks a milestone: Day 50 of my learning journey in Data Analytics! πŸŽ‰

Here's what I learned today in Pandas and Spark:


🧩 1. Pivot Table in Pandas

A pivot table summarizes data by grouping it along rows and columns and applying an aggregation function (mean, sum, etc.).

import pandas as pd

data = {
    'Product': ['A', 'A', 'B', 'B', 'C'],
    'Category': ['X', 'Y', 'X', 'Y', 'X'],
    'Quantity': [10, 15, 20, 25, 30]
}

df = pd.DataFrame(data)

pivot = df.pivot_table(values='Quantity', index='Product', columns='Category', aggfunc='mean')
print(pivot)
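In the pivot above, Product C has no Category Y rows, so that cell comes out as NaN. A `fill_value` makes the table easier to read — a small sketch extending the same data:

```python
import pandas as pd

data = {
    'Product': ['A', 'A', 'B', 'B', 'C'],
    'Category': ['X', 'Y', 'X', 'Y', 'X'],
    'Quantity': [10, 15, 20, 25, 30]
}
df = pd.DataFrame(data)

# fill_value replaces missing combinations (e.g. C/Y) with 0 instead of NaN
pivot = df.pivot_table(values='Quantity', index='Product',
                       columns='Category', aggfunc='mean', fill_value=0)
print(pivot)
```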

πŸ“ˆ 2. Pandas Series

A Series is a one-dimensional labeled array; the labels form its index.

import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
print(s)
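A Series can also be built from a dict, in which case the keys become the index labels — a quick sketch:

```python
import pandas as pd

# Dict keys become the index labels
marks = pd.Series({'math': 90, 'science': 85, 'english': 78})
print(marks['science'])  # 85
```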

πŸ”’ 3. Using range()

numbers = list(range(1, 6))
print(numbers)
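range() also accepts a step argument, and pairs naturally with pandas — a minimal sketch:

```python
import pandas as pd

evens = list(range(0, 10, 2))   # start, stop (exclusive), step
print(evens)                    # [0, 2, 4, 6, 8]

# range works directly as Series data
s = pd.Series(range(1, 6))
print(s.sum())                  # 15
```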

🎯 4. Accessing Series Values

print(s['b'])     # Access by label
print(s.iloc[2])  # Access by position (plain s[2] is deprecated on a labeled Series)
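Because plain integer lookup on a labeled Series is ambiguous, pandas provides explicit accessors: `.loc` for labels and `.iloc` for integer positions — a sketch with the same Series:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

print(s.loc['b'])   # 20 — lookup by label
print(s.iloc[2])    # 30 — lookup by integer position
```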

⚑ 5. Spark

Apache Spark is a distributed engine for large-scale data processing; PySpark is its Python API.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("DataExample").getOrCreate()
df = spark.read.csv("data.csv", header=True, inferSchema=True)
df.show()

βž• 6. Arithmetic Operations

s1 = pd.Series([1, 2, 3])
s2 = pd.Series([4, 5, 6])
print(s1 + s2)   # Addition
print(s1 * s2)   # Multiplication
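Series arithmetic aligns on index labels, not position; a label present on only one side produces NaN. A small sketch of this gotcha and the `fill_value` workaround:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

total = s1 + s2                     # 'a' and 'd' have no match -> NaN
filled = s1.add(s2, fill_value=0)   # treat missing labels as 0

print(total)
print(filled)
```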

🚫 7. Skip Rows While Reading a File

df = pd.read_csv("data.csv", skiprows=2)
print(df.head())
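skiprows is handy when a file starts with junk lines before the real header. A self-contained sketch that first writes a small demo file to a temporary directory (the file contents here are made up for illustration):

```python
import pandas as pd
import tempfile, os

# Build a small CSV with a junk line above the real header
path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(path, "w") as f:
    f.write("junk header line\nname,score\nAda,90\nBob,80\n")

# skiprows=1 skips the junk line so the real header is read
df = pd.read_csv(path, skiprows=1)
print(df)
```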

βœ‚οΈ 8. Slicing

print(s[1:3])  # Positional slice: elements at positions 1 and 2
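Label slices behave differently from positional slices: a label slice includes its end point, while a positional slice excludes it — a quick sketch:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

label_slice = s.loc['b':'d']   # label slice: 'd' IS included -> 20, 30, 40
pos_slice = s.iloc[1:3]        # positional slice: stop IS excluded -> 20, 30

print(label_slice)
print(pos_slice)
```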

πŸ“‚ 9. Read & Write Files

CSV

df = pd.read_csv("data.csv")
df.to_csv("output.csv", index=False)

Excel

df = pd.read_excel("data.xlsx")
df.to_excel("output.xlsx", index=False)

JSON

df = pd.read_json("data.json")
df.to_json("output.json", orient='records')

XML

df = pd.read_xml("data.xml")
df.to_xml("output.xml")

Text

df = pd.read_csv("data.txt", sep='\t')
df.to_csv("output.txt", sep='\t', index=False)

πŸ—œοΈ 10. Compression While Saving

df.to_csv("compressed_data.csv.gz", compression='gzip')
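When reading back, pandas infers the compression from the file extension by default, so a gzip round trip needs no extra flags — a sketch using a temporary directory:

```python
import pandas as pd
import tempfile, os

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})

# Write gzip-compressed CSV
path = os.path.join(tempfile.mkdtemp(), "data.csv.gz")
df.to_csv(path, index=False, compression='gzip')

# read_csv defaults to compression='infer', detecting .gz from the extension
back = pd.read_csv(path)
print(back.equals(df))  # True
```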

πŸ’‘ Summary

Today's learning covered:

  • Pivot Table creation
  • Pandas Series and range
  • Accessing values and performing arithmetic
  • Working with Spark
  • Reading & Writing data in multiple formats
  • File compression techniques

✨ Every line of code brings me closer to mastering Data Analytics!

#100DaysOfCode #DataAnalytics #Pandas #Python #LearningJourney
