DEV Community: Rajasekaran Palraj

Data Analysis with Excel - Notes

Rajasekaran Palraj — Sat, 29 Nov 2025 05:05:54 +0000

Array Functions:

Modern Dynamic Array vs Classic Array

unique
Sort(R2#)

Lookup

VLookup
Hlookup
XLookup

Unique:
=UNIQUE(A:A,FALSE,FALSE) - to get unique values

=UNIQUE(A:A,FALSE,TRUE) - to get values that occur exactly once

Duplicate
=UNIQUE(FILTER(A2:A100, COUNTIF(A2:A100, A2:A100)>1))
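For readers who think in Python, the logic of the three formulas above can be mirrored with a small standard-library sketch. This only illustrates the idea and is not how Excel implements it; the helper names and the sample list are made up for this example.

```python
from collections import Counter


def unique_values(values):
    """Distinct values in first-seen order, similar to =UNIQUE(range,FALSE,FALSE)."""
    return list(dict.fromkeys(values))


def values_occurring_once(values):
    """Values that appear exactly once, similar to =UNIQUE(range,FALSE,TRUE)."""
    counts = Counter(values)
    return [v for v in dict.fromkeys(values) if counts[v] == 1]


def duplicated_values(values):
    """Distinct values that appear more than once, similar to the FILTER/COUNTIF formula."""
    counts = Counter(values)
    return [v for v in dict.fromkeys(values) if counts[v] > 1]


# Hypothetical sample column
sample = ["Analyst", "Engineer", "Analyst", "Scientist", "Engineer", "Manager"]

print(unique_values(sample))
print(values_occurring_once(sample))
print(duplicated_values(sample))
```

One difference to keep in mind: Python compares strings case-sensitively, so "analyst" and "Analyst" count as different values in this sketch.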

Quick Analysis

Quick Analysis tool makes it possible to analyze your data quickly and easily using different Excel tools.

You can use Quick Analysis with a range or a table of data. To access the Quick Analysis tool, select the cells that contain the data you want to analyze. The Quick Analysis button appears at the bottom right of the selected data. It offers the following options:

Formatting
Charts
Total
Tables
Sparklines

Lookups

Find values in a range of data - VLOOKUP and HLOOKUP
Obtain a value or the reference to a value from within a table or range - INDEX
Obtain the relative position of a specified item in a range of cells - MATCH

Using VLOOKUP Function
The syntax of the VLOOKUP function is

VLOOKUP (lookup_value, table_array, col_index_num, [range_lookup])

Using VLOOKUP Function with range_lookup FALSE
Use range_lookup FALSE when the data is not sorted and you need an exact match.
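The difference between the two range_lookup modes can be sketched in Python. This is a rough model of the behavior, not Excel's implementation: the `vlookup` function, the "#N/A" return string, and the sample tables are assumptions for illustration.

```python
from bisect import bisect_right


def vlookup(lookup_value, table, col_index_num, range_lookup=True):
    """Rough model of VLOOKUP over a list of rows (col_index_num is 1-based)."""
    if range_lookup:
        # Approximate match: assumes the first column is sorted ascending and
        # picks the last row whose key is <= lookup_value.
        first_column = [row[0] for row in table]
        position = bisect_right(first_column, lookup_value) - 1
        if position < 0:
            return "#N/A"
        return table[position][col_index_num - 1]
    # Exact match: scan top to bottom, so row order does not matter.
    for row in table:
        if row[0] == lookup_value:
            return row[col_index_num - 1]
    return "#N/A"


grades = [(0, "F"), (50, "C"), (70, "B"), (90, "A")]             # sorted by score
staff = [("Mia", "Sales"), ("Ravi", "Finance"), ("Chen", "IT")]  # unsorted

print(vlookup(75, grades, 2, range_lookup=True))      # approximate match on a sorted table
print(vlookup("Ravi", staff, 2, range_lookup=False))  # exact match on an unsorted table
print(vlookup("Zoe", staff, 2, range_lookup=False))   # no match
```

The approximate mode only gives sensible answers when the first column is sorted, which is why FALSE is the safer choice for unsorted data.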

Using HLOOKUP Function
You can use HLOOKUP function if the data is in rows rather than columns.

Use range_lookup FALSE when the data is not sorted and you need an exact match.

Using INDEX Function
When you have an array of data, you can retrieve a value in the array by specifying the row number and column number of that value in the array.
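INDEX is often paired with MATCH: MATCH finds the position and INDEX retrieves the value at that position. A minimal Python sketch of that pairing follows; the function names `excel_index` and `excel_match` and the sample rows are hypothetical, and only the exact-match form of MATCH is modeled.

```python
def excel_index(array, row_num, column_num):
    """Like INDEX: return the value at a 1-based row and column of a 2D list."""
    return array[row_num - 1][column_num - 1]


def excel_match(lookup_value, lookup_array):
    """Like MATCH with match_type 0: 1-based position of the first exact match."""
    return lookup_array.index(lookup_value) + 1


# Hypothetical data: job title, job count, median salary
rows = [
    ["Data Analyst", 120, 90000],
    ["Data Engineer", 80, 125000],
    ["Data Scientist", 95, 130000],
]

titles = [row[0] for row in rows]
position = excel_match("Data Engineer", titles)  # row position of the title
salary = excel_index(rows, position, 3)          # third column holds the salary

print(position, salary)
```

In this sketch a missing value raises ValueError from `list.index`, whereas Excel would show #N/A.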

Pivot Tables:

Data analysis on a large set of data is quite often necessary and important. It involves summarizing the data, obtaining the needed values and presenting the results.

Excel provides PivotTables to let you summarize thousands of data values quickly and easily and obtain the required results.

Creating PivotTable
To create PivotTables, ensure the first row has headers.

Click the table.
Click the INSERT tab on the Ribbon.
Click PivotTable in the Tables group. The PivotTable dialog box appears.

In the Table / Range Box, type the table name.
Click New Worksheet to tell Excel where to keep the PivotTable.
Click OK.

Charts:

Excel charts transform raw data into clear, visual representations, enabling us to analyze trends, compare datasets and communicate insights effectively. Charts visually represent data, making complex information easier to interpret. Different chart types include:

Column Chart: Compares values across categories.
Bar Chart: Similar to column charts but with horizontal bars.
Line Chart: Displays trends over time.
Pie Chart: Shows proportions of a whole.
Combo Chart: Combines multiple chart types (e.g., column and line).

Creating a Chart in Excel
Follow the steps below to create a chart in Excel:

Step 1: Select the data for which we want to create the chart.
Step 2: Go to Insert tab and Select type of Chart

Changing the Chart Type
Follow the steps below to change the chart type:

Click on Chart Design
Click on Change Chart Type
Select the type of chart we want

Customizing the Chart Type
3.1 Switch Row/Column
To switch the rows and columns, follow the steps below:

Go to the Design tab in the Ribbon.
Click on Switch Row/Column

3.2 Legend Position
Follow the steps below to place the legend:

Go to the Chart Design tab
Click on Add Chart Element
Navigate to Legend
Select where we want to place the legend.

Use Recommended Charts
Chart Design
- Chart Element to help show visual indication
- Quick Layout to try different themes

Line Chart:

Job Title
Include/Remove Axes From Chart Elements
Add Trend Line
we can use Chart Design to change Quick Layout and Chart Elements

Pie chart

Job no degree mention
Chart Design -> Quick Chart
Change Colors from Quick Chart
Change Font from Format
Customize percentage - Label/Number Decimal places
Change Chart Title to a valid name

Bar-Column Chart:

Names are long in column chart
Not organized
Sort by Job Count
Create Bar chart
Change sort order and create bar chart again
select chart and right click select data
Try Change Chart Type
Add Data Labels to show count
Change number Format by double clicking count

Scatter Plots

Salary year avg, salary hour avg
Select only yearly Median avg and Hourly Median avg
Click axis and change Axis Option Bounds min/max
Change Number format
Enable Axis Titles
Select Axis Titles, type = and then select the cell that holds the title name
Enable Data Labels
Add the Value From Cells option from Format
select Range for job names
Add Trend Lines

Add values in bar chart

Map Charts

Job country
get Job count and Median Salary
Create Map chart by select job title and Median Salary
Change Title
Apply Filter

Histogram:

Right-skewed distribution
Filter --> Number filter --> less than 300000
Double Click Bar - change bin width
Change Number Format
Change Title, Enable Axes Titles

Box & Whiskers Charts

To analyze different job titles
Select salary and title
Use Recommended Charts
Change Y axis maximum values
Change Number format
change Color
Change Title

Spark Lines

They are mini charts that fit inside a cell
select data / Insert / sparklines Column
Add High Point/ Low Point
Change color for High Point

Dashboard

Power Query

It is used to bring data from multiple sources into Excel or Power BI (ETL)

Combine Queries (Merge and Append)

M Language

Power Pivot

DAX

https://github.com/lukebarousse/Excel_Data_Analytics_Course/tree/main

Data Analysis - Matplot Lib - Notes

Rajasekaran Palraj — Wed, 19 Nov 2025 02:34:27 +0000

Matplotlib is a powerful and versatile open-source plotting library for Python, designed to help users visualize data in a variety of formats. Developed by John D. Hunter in 2003, it enables users to graphically represent data, facilitating easier analysis and understanding. If you want to convert your boring data into interactive plots and graphs, Matplotlib is the tool for you.

Components or Parts of Matplotlib Figure
Anatomy of a Matplotlib Plot: This section dives into the key components of a Matplotlib plot, including figures, axes, titles, and legends, essential for effective data visualization.

The parts of a Matplotlib figure include:

Figure: The overarching container that holds all plot elements, acting as the canvas for visualizations.
Axes: The areas within the figure where data is plotted; each figure can contain multiple axes.
Axis: Represents the x-axis and y-axis, defining limits, tick locations, and labels for data interpretation.
Lines and Markers: Lines connect data points to show trends, while markers denote individual data points in plots like scatter plots.
Title and Labels: The title provides context for the plot, while axis labels describe what data is being represented on each axis.
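The parts listed above map onto objects that can be inspected directly. The following is a minimal sketch, assuming Matplotlib is installed; the Agg backend is chosen here only so the snippet runs without a display, and the data values and title are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()  # one Figure containing one Axes
(line,) = ax.plot([0, 1, 2], [0, 1, 4], marker="o", label="Data Points")
ax.set_title("Anatomy demo")
ax.set_xlabel("X-Axis")
ax.set_ylabel("Y-Axis")

print(type(fig).__name__)                # the Figure container
print(len(fig.axes))                     # number of Axes inside the Figure
print(type(ax.xaxis).__name__)           # the x Axis object owned by the Axes
print(ax.lines[0] is line)               # the plotted line lives on the Axes
print(line.get_marker())                 # marker used for the data points
print(ax.get_title())                    # title text
print(ax.xaxis.get_label().get_text())   # x-axis label text
```

Each print corresponds to one entry in the list: Figure, Axes, Axis, lines and markers, then title and labels.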

Matplotlib Pyplot
Pyplot is a module within Matplotlib that provides a MATLAB-like interface for making plots. It simplifies the process of adding plot elements such as lines, images, and text to the axes of the current figure.

Steps to Use Pyplot:

Import Matplotlib: Start by importing matplotlib.pyplot as plt.
Create Data: Prepare your data in the form of lists or arrays.
Plot Data: Use plt.plot() to create the plot.
Customize Plot: Add titles, labels, and other elements using methods like plt.title(), plt.xlabel(), and plt.ylabel().
Display Plot: Use plt.show() to display the plot.
Let's visualize a basic plot, and understand basic components of matplotlib figure:

import matplotlib.pyplot as plt

x = [0, 2, 4, 6, 8]
y = [0, 4, 16, 36, 64]

fig, ax = plt.subplots()

ax.plot(x, y, marker='o', label="Data Points")
ax.legend()  # display the "Data Points" label defined above

ax.set_title("Basic Components of Matplotlib Figure")
ax.set_xlabel("X-Axis")
ax.set_ylabel("Y-Axis")

plt.show()
Different Types of Plots in Matplotlib
Matplotlib offers a wide range of plot types to suit various data visualization needs. Here are some of the most commonly used types of plots in Matplotlib:

Line Graph
Bar Chart
Histogram
Scatter Plot
Pie Chart
3D Plot and many more..

For learning about the different types of plots in Matplotlib, please read Types of Plots in Matplotlib.

Key Features of Matplotlib
Versatile Plotting: Create a wide variety of visualizations, including line plots, scatter plots, bar charts, and histograms.
Extensive Customization: Control every aspect of your plots, from colors and markers to labels and annotations.
Seamless Integration with NumPy: Effortlessly plot data arrays directly, enhancing data manipulation capabilities.
High-Quality Graphics: Generate publication-ready plots with precise control over aesthetics.
Cross-Platform Compatibility: Use Matplotlib on Windows, macOS, and Linux without issues.
Interactive Visualizations: Engage with your data dynamically through interactive plotting features.

Matplotlib Simple Line Plot

import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3, 4])  # X-axis 
y = x*2  # Y-axis

plt.plot(x, y)  
plt.show()

import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3, 4])
y = x*2

plt.plot(x, y)
plt.xlabel("X-axis")        # Label for the X-axis
plt.ylabel("Y-axis")        # Label for the Y-axis
plt.title("Any suitable title")  # Chart title
plt.show()

Line Chart with Annotations

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

plt.figure(figsize=(8, 6))
plt.plot(x, y, marker='o', linestyle='-')

# Add annotations
for xi, yi in zip(x, y):
    plt.annotate(f'({xi}, {yi})', (xi, yi), textcoords="offset points", xytext=(0, 10), ha='center')

plt.title('Line Chart with Annotations')
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')

plt.grid(True)
plt.show()

Multiple Line Charts Using Matplotlib

import matplotlib.pyplot as plt
import numpy as np


x = np.array([1, 2, 3, 4])
y = x*2

plt.plot(x, y)
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Any suitable title")
plt.show()  # show first chart

plt.figure()
x1 = [2, 4, 6, 8]
y1 = [3, 5, 7, 9]
plt.plot(x1, y1, '-.')
plt.show()

Multiple Plots on the Same Axis

import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3, 4])
y = x*2

# first plot with X and Y data
plt.plot(x, y)

x1 = [2, 4, 6, 8]
y1 = [3, 5, 7, 9]

# second plot with x1 and y1 data
plt.plot(x1, y1, '-.')

plt.xlabel("X-axis data")
plt.ylabel("Y-axis data")
plt.title('multiple plots')
plt.show()

Fill the Area Between Two Lines

import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3, 4])
y = x*2

plt.plot(x, y)

# second line, plotted against the same x values so the area can be filled
y1 = [3, 5, 7, 9]

plt.plot(x, y1, '-.')
plt.xlabel("X-axis data")
plt.ylabel("Y-axis data")
plt.title('Area between two lines')

plt.fill_between(x, y, y1, color='green', alpha=0.5)
plt.show()

Bar Plot

import matplotlib.pyplot as plt
import numpy as np

fruits = ['Apples', 'Bananas', 'Cherries', 'Dates']
sales = [400, 350, 300, 450]

plt.bar(fruits, sales)
plt.title('Fruit Sales')
plt.xlabel('Fruits')
plt.ylabel('Sales')
plt.show()

Customizing Bar Colors

import matplotlib.pyplot as plt
import numpy as np

fruits = ['Apples', 'Bananas', 'Cherries', 'Dates']
sales = [400, 350, 300, 450]

plt.bar(fruits, sales, color='violet')
plt.title('Fruit Sales')
plt.xlabel('Fruits')
plt.ylabel('Sales')
plt.show()

Creating Horizontal Bar Plots

import matplotlib.pyplot as plt
import numpy as np

fruits = ['Apples', 'Bananas', 'Cherries', 'Dates']
sales = [400, 350, 300, 450]

plt.barh(fruits, sales)

plt.title('Fruit Sales')
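plt.xlabel('Sales')   # in a horizontal bar plot the values run along the x-axis
plt.ylabel('Fruits')  # and the categories run along the y-axis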
plt.show()

Adjusting Bar width

import matplotlib.pyplot as plt
import numpy as np

fruits = ['Apples', 'Bananas', 'Cherries', 'Dates']
sales = [400, 350, 300, 450]

plt.bar(fruits, sales, width=0.3)
plt.title('Fruit Sales')
plt.xlabel('Fruits')
plt.ylabel('Sales')
plt.show()

Multiple bar plots

import numpy as np 
import matplotlib.pyplot as plt 

barWidth = 0.25
fig, ax = plt.subplots(figsize=(12, 8))  # subplots returns a (figure, axes) pair

IT = [12, 30, 1, 8, 22] 
ECE = [28, 6, 16, 5, 10] 
CSE = [29, 3, 24, 25, 17] 

br1 = np.arange(len(IT)) 
br2 = [x + barWidth for x in br1] 
br3 = [x + barWidth for x in br2] 

plt.bar(br1, IT, color ='r', width = barWidth, 
        edgecolor ='grey', label ='IT') 
plt.bar(br2, ECE, color ='g', width = barWidth, 
        edgecolor ='grey', label ='ECE') 
plt.bar(br3, CSE, color ='b', width = barWidth, 
        edgecolor ='grey', label ='CSE') 

plt.xlabel('Year', fontweight ='bold', fontsize = 15)  # ticks are years; branches are in the legend
plt.ylabel('Students passed', fontweight ='bold', fontsize = 15) 
plt.xticks([r + barWidth for r in range(len(IT))], 
        ['2015', '2016', '2017', '2018', '2019'])

plt.legend()
plt.show()

Stacked bar chart

import numpy as np
import matplotlib.pyplot as plt

N = 5

boys = (20, 35, 30, 35, 27)
girls = (25, 32, 34, 20, 25)
boyStd = (2, 3, 4, 1, 2)
girlStd = (3, 5, 2, 3, 3)
ind = np.arange(N)   
width = 0.35  

fig, ax = plt.subplots(figsize=(10, 7))  # subplots returns a (figure, axes) pair
p1 = plt.bar(ind, boys, width, yerr = boyStd)
p2 = plt.bar(ind, girls, width,
             bottom = boys, yerr = girlStd)

plt.ylabel('Contribution')
plt.title('Contribution by the teams')
plt.xticks(ind, ('T1', 'T2', 'T3', 'T4', 'T5'))
plt.yticks(np.arange(0, 81, 10))
plt.legend((p1[0], p2[0]), ('boys', 'girls'))

plt.show()

Matplotlib Scatter

import matplotlib.pyplot as plt
import numpy as np

x = np.array([12, 45, 7, 32, 89, 54, 23, 67, 14, 91])
y = np.array([99, 31, 72, 56, 19, 88, 43, 61, 35, 77])

plt.scatter(x, y)
plt.title("Basic Scatter Plot")
plt.xlabel("X Values")
plt.ylabel("Y Values")
plt.show()

Example 1: In this example, we compare the height and weight of two different groups using different colors for each group.

x1 = np.array([160, 165, 170, 175, 180, 185, 190, 195, 200, 205])
y1 = np.array([55, 58, 60, 62, 64, 66, 68, 70, 72, 74])

x2 = np.array([150, 155, 160, 165, 170, 175, 180, 195, 200, 205])
y2 = np.array([50, 52, 54, 56, 58, 64, 66, 68, 70, 72])

plt.scatter(x1, y1, color='blue', label='Group 1')
plt.scatter(x2, y2, color='red', label='Group 2')

plt.xlabel('Height (cm)')
plt.ylabel('Weight (kg)')
plt.title('Comparison of Height vs Weight between two groups')
plt.legend()
plt.show()

Example 2: This example demonstrates how to customize a scatter plot using different marker sizes and colors for each point. Transparency and edge colors are also adjusted.

x = np.array([3, 12, 9, 20, 5, 18, 22, 11, 27, 16])
y = np.array([95, 55, 63, 77, 89, 50, 41, 70, 58, 83])

a = [20, 50, 100, 200, 500, 1000, 60, 90, 150, 300] # size
b = ['red', 'green', 'blue', 'purple', 'orange', 'black', 'pink', 'brown', 'yellow', 'cyan'] # color

plt.scatter(x, y, s=a, c=b, alpha=0.6, edgecolors='w', linewidths=1)
plt.title("Scatter Plot with Varying Colors and Sizes")
plt.show()

Example 3: This example shows how to create a bubble plot where the size of each point (bubble) represents a variable's magnitude. Edge color and alpha transparency are also used.

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
sizes = [30, 80, 150, 200, 300]  # Bubble sizes

plt.scatter(x, y, s=sizes, alpha=0.5, edgecolors='blue', linewidths=2)
plt.title("Bubble Plot Example")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()

Example 4: In this example, we map data values to colors using a colormap and add a colorbar. This helps in visualizing a third variable via color intensity.

x = np.random.randint(50, 150, 100)
y = np.random.randint(50, 150, 100)
colors = np.random.rand(100)  # Random float values for color mapping
sizes = 20 * np.random.randint(10, 100, 100)

plt.scatter(x, y, c=colors, s=sizes, cmap='viridis', alpha=0.7)
plt.colorbar(label='Color scale')
plt.title("Scatter Plot with Colormap and Colorbar")
plt.show()

Example 5: This final example illustrates how to change the marker style using the marker parameter. Here, triangle markers are used with magenta color.

plt.scatter(x, y, marker='^', color='magenta', s=100, alpha=0.7)
plt.title("Scatter Plot with Triangle Markers")
plt.show()

Plotting Histogram

Different methods of plotting histogram

Basic Histogram
Customized Histogram with Density Plot
Customized Histogram with Watermark
Multiple Histograms with Subplots
Stacked Histogram
2D Histogram (Hexbin Plot)

Basic Histogram

import matplotlib.pyplot as plt
import numpy as np

# Generate random data for the histogram
data = np.random.randn(1000)

# Plotting a basic histogram
plt.hist(data, bins=30, color='skyblue', edgecolor='black')

# Adding labels and title
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.title('Basic Histogram')

# Display the plot
plt.show()

Customized Histogram with Density Plot

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Generate random data for the histogram
data = np.random.randn(1000)

# Creating a customized histogram with a density plot
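# stat='density' normalizes the bars so the y-axis matches the 'Density' label below
sns.histplot(data, bins=30, kde=True, stat='density', color='lightgreen', edgecolor='red')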

# Adding labels and title
plt.xlabel('Values')
plt.ylabel('Density')
plt.title('Customized Histogram with Density Plot')

# Display the plot
plt.show()

Customized Histogram with Watermark

import matplotlib.pyplot as plt
import numpy as np
from matplotlib import colors
from matplotlib.ticker import PercentFormatter

# Creating dataset
np.random.seed(23685752)
N_points = 10000
n_bins = 20

# Creating distribution
x = np.random.randn(N_points)
y = 0.8 ** x + np.random.randn(N_points) + 25
legend = ['distribution']

# Creating figure and axes
fig, axs = plt.subplots(1, 1, figsize=(10, 7), tight_layout=True)

# Remove axes spines
for s in ['top', 'bottom', 'left', 'right']:
    axs.spines[s].set_visible(False)

# Remove x, y ticks
axs.xaxis.set_ticks_position('none')
axs.yaxis.set_ticks_position('none')

# Add padding between axes and labels
axs.xaxis.set_tick_params(pad=5)
axs.yaxis.set_tick_params(pad=10)

# Add x, y gridlines (updated syntax)
axs.grid(visible=True, color='grey', linestyle='-.', linewidth=0.5, alpha=0.6)

# Add text watermark
fig.text(0.9, 0.15, 'Jeeteshgavande30',
         fontsize=12,
         color='red',
         ha='right',
         va='bottom',
         alpha=0.7)

# Creating histogram
N, bins, patches = axs.hist(x, bins=n_bins)

# Setting color gradient
fracs = ((N ** (1 / 5)) / N.max())
norm = colors.Normalize(fracs.min(), fracs.max())

for thisfrac, thispatch in zip(fracs, patches):
    color = plt.cm.viridis(norm(thisfrac))
    thispatch.set_facecolor(color)

# Adding extra features
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend(legend)
plt.title('Customized Histogram with Watermark')

# Show plot
plt.show()

Multiple Histograms with Subplots

import matplotlib.pyplot as plt
import numpy as np

# Generate random data for multiple histograms
data1 = np.random.randn(1000)
data2 = np.random.normal(loc=3, scale=1, size=1000)

# Creating subplots with multiple histograms
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(12, 4))

axes[0].hist(data1, bins=30, color='Yellow', edgecolor='black')
axes[0].set_title('Histogram 1')

axes[1].hist(data2, bins=30, color='Pink', edgecolor='black')
axes[1].set_title('Histogram 2')

# Adding labels and title
for ax in axes:
    ax.set_xlabel('Values')
    ax.set_ylabel('Frequency')

# Adjusting layout for better spacing
plt.tight_layout()

# Display the figure
plt.show()

Stacked Histogram

import matplotlib.pyplot as plt
import numpy as np

# Generate random data for stacked histograms
data1 = np.random.randn(1000)
data2 = np.random.normal(loc=3, scale=1, size=1000)

# Creating a stacked histogram
plt.hist([data1, data2], bins=30, stacked=True, color=['cyan', 'Purple'], edgecolor='black')

# Adding labels and title
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.title('Stacked Histogram')

# Adding legend
plt.legend(['Dataset 1', 'Dataset 2'])

# Display the plot
plt.show()

2D Histogram (Hexbin Plot)

import matplotlib.pyplot as plt
import numpy as np

# Generate random 2D data for hexbin plot
x = np.random.randn(1000)
y = 2 * x + np.random.normal(size=1000)

# Creating a 2D histogram (hexbin plot)
plt.hexbin(x, y, gridsize=30, cmap='Blues')

# Adding labels and title
plt.xlabel('X values')
plt.ylabel('Y values')
plt.title('2D Histogram (Hexbin Plot)')

# Adding colorbar
plt.colorbar(label='Counts')

plt.show()

Pie Chart

# Import libraries
from matplotlib import pyplot as plt
import numpy as np


# Creating dataset
cars = ['AUDI', 'BMW', 'FORD',
        'TESLA', 'JAGUAR', 'MERCEDES']

data = [23, 17, 35, 29, 12, 41]

# Creating plot
fig = plt.figure(figsize=(10, 7))
plt.pie(data, labels=cars)

# show plot
plt.show()

Customizing Pie Charts

startangle: This attribute rotates the start of the pie chart counterclockwise from the x-axis by the specified number of degrees. By adjusting this angle, you can change the starting position of the first wedge, which can improve the overall presentation of the chart.
shadow: This boolean attribute adds a shadow effect below the rim of the pie. Setting this to True can make your chart stand out and give it a more three-dimensional appearance, enhancing the overall look of your pie chart in Matplotlib.
wedgeprops: This parameter accepts a Python dictionary to customize the properties of each wedge in the pie chart. You can specify various attributes such as linewidth, edgecolor, and facecolor. This level of customization allows you to enhance the visual distinction between wedges, making your matplotlib pie chart more informative.
frame: When set to True, this attribute draws the axes frame around the pie chart. This can help emphasize the chart's boundaries and make it clearer when presenting data.
autopct: This attribute controls how the percentages are displayed on the wedges. You can pass a format string or a function to define the appearance of the percentage labels on each slice.

In the example below, the explode parameter offsets selected wedges from the center, and colors sets each wedge's color. The autopct function customizes the text display, and the legend and title improve the chart's readability.

# Import libraries
import numpy as np
import matplotlib.pyplot as plt


# Creating dataset
cars = ['AUDI', 'BMW', 'FORD',
        'TESLA', 'JAGUAR', 'MERCEDES']

data = [23, 17, 35, 29, 12, 41]


# Creating explode data
explode = (0.1, 0.0, 0.2, 0.3, 0.0, 0.0)

# Creating color parameters
colors = ("orange", "cyan", "brown",
          "grey", "indigo", "beige")

# Wedge properties
wp = {'linewidth': 1, 'edgecolor': "green"}

# Creating autopct arguments


def func(pct, allvalues):
    absolute = int(pct / 100.*np.sum(allvalues))
    return "{:.1f}%\n({:d} g)".format(pct, absolute)


# Creating plot
fig, ax = plt.subplots(figsize=(10, 7))
wedges, texts, autotexts = ax.pie(data,
                                  autopct=lambda pct: func(pct, data),
                                  explode=explode,
                                  labels=cars,
                                  shadow=True,
                                  colors=colors,
                                  startangle=90,
                                  wedgeprops=wp,
                                  textprops=dict(color="magenta"))

# Adding legend
ax.legend(wedges, cars,
          title="Cars",
          loc="center left",
          bbox_to_anchor=(1, 0, 0.5, 1))

plt.setp(autotexts, size=8, weight="bold")
ax.set_title("Customizing pie chart")

# show plot
plt.show()

Nested Pie Chart:

# Import libraries
from matplotlib import pyplot as plt
import numpy as np


# Creating dataset
size = 6
cars = ['AUDI', 'BMW', 'FORD',
        'TESLA', 'JAGUAR', 'MERCEDES']

data = np.array([[23, 16], [17, 23],
                 [35, 11], [29, 33],
                 [12, 27], [41, 42]])

# normalizing data to 2 pi
norm = data / np.sum(data)*2 * np.pi

# obtaining ordinates of bar edges
left = np.cumsum(np.append(0,
                           norm.flatten()[:-1])).reshape(data.shape)

# Creating color scale
cmap = plt.get_cmap("tab20c")
outer_colors = cmap(np.arange(6)*4)
inner_colors = cmap(np.array([1, 2, 5, 6, 9,
                              10, 12, 13, 15,
                              17, 18, 20]))

# Creating plot
fig, ax = plt.subplots(figsize=(10, 7),
                       subplot_kw=dict(polar=True))

ax.bar(x=left[:, 0],
       width=norm.sum(axis=1),
       bottom=1-size,
       height=size,
       color=outer_colors,
       edgecolor='w',
       linewidth=1,
       align="edge")

ax.bar(x=left.flatten(),
       width=norm.flatten(),
       bottom=1-2 * size,
       height=size,
       color=inner_colors,
       edgecolor='w',
       linewidth=1,
       align="edge")

ax.set(title="Nested pie chart")
ax.set_axis_off()

# show plot
plt.show()

Three-dimensional Plotting(3D Plot)

import matplotlib.pyplot as plt

fig = plt.figure()
ax = plt.axes(projection='3d')
plt.show()

3d Line plot

from mpl_toolkits import mplot3d
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()
ax = plt.axes(projection='3d')

z = np.linspace(0, 1, 100)
x = z * np.sin(25 * z)
y = z * np.cos(25 * z)

ax.plot3D(x, y, z, 'green')
ax.set_title('3D Line Plot')
plt.show()

3D Scatter plot

fig = plt.figure()
ax = plt.axes(projection='3d')

z = np.linspace(0, 1, 100)
x = z * np.sin(25 * z)
y = z * np.cos(25 * z)
c = x + y  # Color array based on x and y

ax.scatter(x, y, z, c=c)
ax.set_title('3D Scatter Plot')
plt.show()

Surface Plot

x = np.outer(np.linspace(-2, 2, 10), np.ones(10))
y = x.copy().T
z = np.cos(x**2 + y**3)

fig = plt.figure()
ax = plt.axes(projection='3d')
ax.plot_surface(x, y, z, cmap='viridis', edgecolor='green')
ax.set_title('Surface Plot')
plt.show()

Wireframe Plot

def f(x, y):
    return np.sin(np.sqrt(x**2 + y**2))

x = np.linspace(-1, 5, 10)
y = np.linspace(-1, 5, 10)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)

fig = plt.figure()
ax = plt.axes(projection='3d')
ax.plot_wireframe(X, Y, Z, color='green')
ax.set_title('Wireframe Plot')
plt.show()

Contour plot in 3D
This plot combines a 3D surface with contour lines to highlight elevation or depth. It helps visualize the function's shape and gradient changes more clearly in 3D space.

def fun(x, y):
    return np.sin(np.sqrt(x**2 + y**2))

x = np.linspace(-10, 10, 40)
y = np.linspace(-10, 10, 40)
X, Y = np.meshgrid(x, y)
Z = fun(X, Y)

fig = plt.figure(figsize=(10, 8))
ax = plt.axes(projection='3d')
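ax.plot_surface(X, Y, Z, cmap='cool', alpha=0.8)
# draw contour lines at their z levels on top of the surface
ax.contour(X, Y, Z, levels=10, colors='black', linewidths=0.5)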

ax.set_title('3D Contour Plot')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')

plt.show()

Surface Triangulation plot
This plot uses triangular meshes to build a 3D surface from scattered or grid data. It's ideal when the surface is irregular or when using non-rectangular grids.

from matplotlib.tri import Triangulation

def f(x, y):
    return np.sin(np.sqrt(x**2 + y**2))

x = np.linspace(-6, 6, 30)
y = np.linspace(-6, 6, 30)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)

tri = Triangulation(X.ravel(), Y.ravel())

fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')
ax.plot_trisurf(tri, Z.ravel(), cmap='cool', edgecolor='none', alpha=0.8)

ax.set_title('Surface Triangulation Plot')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')

plt.show()

Möbius Strip Plot

A Möbius strip is a one-sided surface with a twist—a famous concept in topology. This plot visualizes its 3D geometry, showing how math and art can blend beautifully.

R = 2
u = np.linspace(0, 2*np.pi, 100)
v = np.linspace(-1, 1, 100)
u, v = np.meshgrid(u, v)

x = (R + v * np.cos(u / 2)) * np.cos(u)
y = (R + v * np.cos(u / 2)) * np.sin(u)
z = v * np.sin(u / 2)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

ax.plot_surface(x, y, z, alpha=0.5)

ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
ax.set_title('Möbius Strip')

ax.set_xlim([-3, 3])
ax.set_ylim([-3, 3])
ax.set_zlim([-3, 3])

plt.show()

Data Analysis - Python OOPS

Rajasekaran Palraj — Wed, 19 Nov 2025 01:55:46 +0000

Class
A class is a blueprint for creating objects. It defines a set of attributes and methods that the created objects (instances) can have.

Some points on Python class:

Classes are created by keyword class.
Attributes are the variables that belong to a class.
Attributes are always public and can be accessed using the dot (.) operator. Example: Myclass.Myattribute
Creating a Class
Here, class keyword indicates that we are creating a class followed by name of the class (Dog in this case).

class Dog:
    species = "Canine"  # Class attribute

    def __init__(self, name, age):
        self.name = name  # Instance attribute
        self.age = age  # Instance attribute

Explanation:

class Dog: Defines a class named Dog.
species: A class attribute shared by all instances of the class.
__init__ method: Initializes the name and age attributes when a new object is created.

Objects
An object is an instance of a class. It represents a specific implementation of the class and holds its own data.

An object consists of:

State: It is represented by the attributes and reflects the properties of an object.
Behavior: It is represented by the methods of an object and reflects the response of an object to other objects.
Identity: It gives a unique name to an object and enables one object to interact with other objects.
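The three aspects above can be seen in a short sketch. The `describe` method is a hypothetical addition for illustration and is not part of the Dog class used elsewhere in these notes.

```python
class Dog:
    def __init__(self, name, age):
        self.name = name  # state
        self.age = age    # state

    def describe(self):   # behavior (hypothetical method for this sketch)
        return f"{self.name} is {self.age} years old"


dog1 = Dog("Buddy", 3)
dog2 = Dog("Buddy", 3)

print(dog1.describe())       # behavior acting on state
print(dog1 is dog2)          # identity: same state, but two distinct objects
print(id(dog1) == id(dog2))  # id() exposes the identity of each object
```

Two objects can hold identical state and still be different objects, which is what identity captures.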

Creating Object
Creating an object in Python involves instantiating a class to create a new instance of that class. This process is also referred to as object instantiation.

class Dog:
    species = "Canine"  # Class attribute

    def __init__(self, name, age):
        self.name = name  # Instance attribute
        self.age = age  # Instance attribute

# Creating an object of the Dog class
dog1 = Dog("Buddy", 3)

print(dog1.name) 
print(dog1.species)

Output
Buddy
Canine
Explanation:

dog1 = Dog("Buddy", 3): Creates an object of the Dog class with name as "Buddy" and age as 3.
dog1.name: Accesses the instance attribute name of the dog1 object.
dog1.species: Accesses the class attribute species of the dog1 object.
Self Parameter
Self parameter is a reference to the current instance of the class. It allows us to access the attributes and methods of the object.

Example: In this example, we create a Dog class with both class and instance attributes, then demonstrate how to access them using the self parameter.

class Dog:
    species = "Canine"  # Class attribute

    def __init__(self, name, age):
        self.name = name  # Instance attribute
        self.age = age  # Instance attribute

dog1 = Dog("Buddy", 3)  # Create an instance of Dog
dog2 = Dog("Charlie", 5)  # Create another instance of Dog

print(dog1.name, dog1.age, dog1.species)  # Access instance and class attributes
print(dog2.name, dog2.age, dog2.species)  # Access instance and class attributes
print(Dog.species)  # Access class attribute directly

Output
Buddy 3 Canine
Charlie 5 Canine
Canine
Explanation:

self.name: Inside __init__, refers to the name attribute of the object being created (dog1 or dog2).
Dog.species: Accesses the class attribute directly through the class, without an instance.
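To see self at work in an ordinary method, here is a minimal sketch that adds a `bark` method to the class. The method body and its return text are assumptions made for illustration.

```python
class Dog:
    species = "Canine"  # Class attribute

    def __init__(self, name, age):
        self.name = name  # Instance attribute
        self.age = age    # Instance attribute

    def bark(self):
        # self is whichever object the method was called on
        return f"{self.name} says woof"


dog1 = Dog("Buddy", 3)
dog2 = Dog("Charlie", 5)

print(dog1.bark())     # self is dog1
print(dog2.bark())     # self is dog2
print(Dog.bark(dog1))  # the same call written with self passed explicitly
```

Calling `dog1.bark()` is shorthand for `Dog.bark(dog1)`: Python passes the instance as the first argument automatically.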
__init__ Method
The __init__ method is the constructor in Python, automatically called when a new object is created. It initializes the attributes of the object.

Example: In this example, we create a Dog class and use the __init__ method to set the name and age of each dog when creating an object.

class Dog:
    def __init__(self, name, age):
        self.name = name
        self.age = age

dog1 = Dog("Buddy", 3)
print(dog1.name)

Output
Buddy
Explanation:

__init__: Special method used for initialization.
self.name and self.age: Instance attributes initialized in the constructor.
Class and Instance Variables
In Python, variables defined in a class can be either class variables or instance variables, and understanding the distinction between them is crucial for object-oriented programming.

Class Variables

Class variables are shared across all instances of a class. They are defined at the class level, outside any methods. All objects of the class share the same value for a class variable unless it is explicitly overridden on an object.

Instance Variables

Variables that are unique to each instance (object) of a class. These are defined within the init method or other instance methods. Each object maintains its own copy of instance variables, independent of other objects.

Example: In this example, we create a Dog class to show difference between class variables and instance variables. We also demonstrate how modifying them affects objects differently.

class Dog:
    # Class variable
    species = "Canine"

    def __init__(self, name, age):
        # Instance variables
        self.name = name
        self.age = age

# Create objects
dog1 = Dog("Buddy", 3)
dog2 = Dog("Charlie", 5)

# Access class and instance variables
print(dog1.species)  # (Class variable)
print(dog1.name)     # (Instance variable)
print(dog2.name)     # (Instance variable)

# Modify instance variables
dog1.name = "Max"
print(dog1.name)     # (Updated instance variable)

# Modify class variable
Dog.species = "Feline"
print(dog1.species)  # (Updated class variable)
print(dog2.species)

Output
Canine
Buddy
Charlie
Max
Feline
Feline
Explanation:

Class Variable (species): Shared by all instances of the class. Changing Dog.species affects all objects, as it's a property of the class itself.
Instance Variables (name, age): Defined in the __init__ method. Unique to each instance (e.g., dog1.name and dog2.name are different).
Accessing Variables: Class variables can be accessed via the class name (Dog.species) or an object (dog1.species). Instance variables are accessed via the object (dog1.name).
Updating Variables: Changing Dog.species affects all instances. Changing dog1.name only affects dog1 and does not impact dog2.
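
The phrase "unless explicitly overridden in an object" is worth a short illustration. The sketch below uses a trimmed-down Dog class and shows that assigning through an instance does not change the class variable. It creates a separate instance attribute that shadows it.

```python
class Dog:
    species = "Canine"  # Class variable

    def __init__(self, name):
        self.name = name  # Instance variable

dog1 = Dog("Buddy")
dog2 = Dog("Charlie")

# Assigning through an instance creates a new instance attribute
# that shadows the class variable for that object only.
dog1.species = "Wolf"

print(dog1.species)  # Instance attribute
print(dog2.species)  # Still the class variable
print(Dog.species)   # Class variable is unchanged

# Removing the instance attribute exposes the class variable again.
del dog1.species
print(dog1.species)
```

To change the value for every object, assign on the class (Dog.species = ...) rather than on an instance.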

Inheritance
Inheritance allows a class (child class) to acquire properties and methods of another class (parent class). It supports hierarchical classification and promotes code reuse.

Types of Inheritance:
Single Inheritance: A child class inherits from a single parent class.
Multiple Inheritance: A child class inherits from more than one parent class.
Multilevel Inheritance: A child class inherits from a parent class, which in turn inherits from another class.
Hierarchical Inheritance: Multiple child classes inherit from a single parent class.
Hybrid Inheritance: A combination of two or more types of inheritance.
Example: In this example, we create a Dog class and demonstrate single, multilevel and multiple inheritance. We show how child classes can use or extend parent class methods.

# Single Inheritance
class Dog:
    def __init__(self, name):
        self.name = name

    def display_name(self):
        print(f"Dog's Name: {self.name}")

class Labrador(Dog):  # Single Inheritance
    def sound(self):
        print("Labrador woofs")

# Multilevel Inheritance
class GuideDog(Labrador):  # Multilevel Inheritance
    def guide(self):
        print(f"{self.name} guides the way!")

# Multiple Inheritance
class Friendly:
    def greet(self):
        print("Friendly!")

class GoldenRetriever(Dog, Friendly):  # Multiple Inheritance
    def sound(self):
        print("Golden Retriever Barks")

# Example Usage
lab = Labrador("Buddy")
lab.display_name()
lab.sound()

guide_dog = GuideDog("Max")
guide_dog.display_name()
guide_dog.guide()

retriever = GoldenRetriever("Charlie")
retriever.display_name()
retriever.greet()
retriever.sound()

Explanation:

Single Inheritance: Labrador inherits Dog's attributes and methods.
Multilevel Inheritance: GuideDog extends Labrador, inheriting both Dog and Labrador functionalities.
Multiple Inheritance: GoldenRetriever inherits from both Dog and Friendly.
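
Child classes can also extend a parent method instead of replacing it. The following is a minimal sketch using super(). The describe method and the color attribute are hypothetical names introduced only for illustration.

```python
class Dog:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"{self.name} is a dog"

class Labrador(Dog):
    def __init__(self, name, color):
        super().__init__(name)  # Reuse the parent initializer
        self.color = color

    def describe(self):
        # Extend the parent result instead of replacing it
        return f"{super().describe()} with a {self.color} coat"

class Friendly:
    def greet(self):
        return "Friendly!"

class GoldenRetriever(Dog, Friendly):
    pass

lab = Labrador("Buddy", "black")
print(lab.describe())

# Method resolution order decides which parent is searched first
mro_names = [cls.__name__ for cls in GoldenRetriever.__mro__]
print(mro_names)
```

With multiple inheritance, the method resolution order lists Dog before Friendly because Dog appears first in the class definition.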
4. Polymorphism
Polymorphism in Python means "same operation, different behavior." It allows functions or methods with the same name to work differently depending on the type of object they are acting upon.

Types of Polymorphism

Compile-Time Polymorphism:

This type of polymorphism is determined during the compilation of the program. It allows methods or operators with the same name to behave differently based on their input parameters or usage. In languages like Java or C++, compile-time polymorphism is achieved through method overloading but it's not directly supported in Python.

In Python:

True compile-time polymorphism is not supported.
Instead, Python mimics it using default arguments or *args/**kwargs.
Operator overloading can also be seen as part of polymorphism, though it is implemented at runtime in Python.
Example:

class Calculator:
    def add(self, *args):
        return sum(args)

calc = Calculator()
print(calc.add(5, 10))  # Two arguments
print(calc.add(5, 10, 15))  # Three arguments
print(calc.add(1, 2, 3, 4))  # Any number of arguments

Output
15
30
10
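
The other way to mimic overloading mentioned above is default arguments. Here is a minimal sketch, assuming a Calculator whose add method accepts one to three numbers.

```python
class Calculator:
    # One method covers the one-, two- and three-argument "overloads"
    def add(self, a, b=0, c=0):
        return a + b + c

calc = Calculator()
print(calc.add(5))
print(calc.add(5, 10))
print(calc.add(5, 10, 15))
```

Defining two methods with the same name in one class would not overload anything, because the later definition simply replaces the earlier one.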

Run-Time Polymorphism

Run-Time Polymorphism is determined during the execution of the program. It covers multiple forms in Python:

Method Overriding: A subclass redefines a method from its parent class.
Duck Typing: If an object implements the required method, it works regardless of its type.
Operator Overloading: Special methods (__add__, __sub__, etc.) redefine how operators behave for user-defined objects.
Example: The following examples show run-time polymorphism through method overriding, duck typing and operator overloading.

1 Method Overriding

We start with a base class and then a subclass that "overrides" the speak method.

class Animal:
    def speak(self):
        return "I am an animal."

class Dog(Animal):
    def speak(self):
        return "Woof!"

print(Dog().speak())

2 Duck Typing

class Cat:
    def speak(self):
        return "Meow!"

def make_animal_speak(animal):
    # This function works for both Dog and Cat because they both have a 'speak' method.
    return animal.speak()

print(make_animal_speak(Cat()))
print(make_animal_speak(Dog()))

3 Operator Overloading

We create a simple class that customizes the '+' operator.

class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # This special method defines the behavior of the '+' operator.
        return Vector(self.x + other.x, self.y + other.y)

    def __repr__(self):
        return f"Vector({self.x}, {self.y})"

v1 = Vector(2, 3)
v2 = Vector(4, 5)
v3 = v1 + v2

print(v3)
Explanation:

Method Overriding: Dog class overrides the speak method from the parent Animal class. This allows the program to call the speak method on a Dog object and get a specialized "Woof!" response, which is determined at runtime.
Duck Typing: "make_animal_speak" function can accept both a Dog and a Cat object. It doesn't care about their class hierarchy; it only checks if they have a speak method, showcasing Python's flexible typing.
Operator Overloading: The __add__ method within the Vector class is a special method that defines how the + operator behaves for Vector objects. This allows the user to add two vectors using the familiar + symbol, which is a form of syntactic sugar.

Encapsulation
Encapsulation is the bundling of data (attributes) and methods (functions) within a class, restricting access to some components to control interactions.

A class is itself an example of encapsulation, since it bundles its data (variables) together with the member functions that operate on them.

Types of Encapsulation:
Public Members: Accessible from anywhere.
Protected Members: Accessible within the class and its subclasses.
Private Members: Accessible only within the class.
Example: In this example, we create a Dog class with public, protected and private attributes. We also show how to access and modify private members using getter and setter methods.

class Dog:
    def __init__(self, name, breed, age):
        self.name = name  # Public attribute
        self._breed = breed  # Protected attribute
        self.__age = age  # Private attribute

    # Public method
    def get_info(self):
        return f"Name: {self.name}, Breed: {self._breed}, Age: {self.__age}"

    # Getter and Setter for private attribute
    def get_age(self):
        return self.__age

    def set_age(self, age):
        if age > 0:
            self.__age = age
        else:
            print("Invalid age!")

# Example Usage
dog = Dog("Buddy", "Labrador", 3)

# Accessing public member
print(dog.name)  # Accessible

# Accessing protected member
print(dog._breed)  # Accessible but discouraged outside the class

# Accessing private member using getter
print(dog.get_age())

# Modifying private member using setter
dog.set_age(5)
print(dog.get_info())

Explanation:

Public Members: Easily accessible, such as name.
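Protected Members: Prefixed with a single underscore, such as _breed. This is a naming convention only: Python does not block access, but use outside the class and its subclasses is discouraged.
Private Members: Prefixed with a double underscore, such as __age. Python mangles the name (to _Dog__age), so it is not reachable as dog.__age from outside the class. Use getter and setter methods instead.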
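
Private members in Python rely on name mangling rather than strict access control. A minimal sketch with a reduced Dog class shows what happens when the attribute is read from outside the class.

```python
class Dog:
    def __init__(self, name, age):
        self.name = name
        self.__age = age  # Stored on the object as _Dog__age

    def get_age(self):
        return self.__age

dog = Dog("Buddy", 3)

try:
    dog.__age  # Fails: no attribute with this exact name exists
    direct_access_failed = False
except AttributeError:
    direct_access_failed = True

print(direct_access_failed)
print(dog._Dog__age)  # Mangled name still works, but is discouraged
print(dog.get_age())  # Preferred: go through the getter
```

The mangled name remains reachable, so the double underscore guards against accidental access and name clashes in subclasses, not against deliberate access.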

Data Abstraction
Abstraction hides the internal implementation details while exposing only the necessary functionality. It helps focus on "what to do" rather than "how to do it."

Types of Abstraction:
Partial Abstraction: Abstract class contains both abstract and concrete methods.
Full Abstraction: Abstract class contains only abstract methods (like interfaces).
Example: In this example, we create an abstract Dog class with an abstract method (sound) and a concrete method. Subclasses implement the abstract method while inheriting the concrete method.

from abc import ABC, abstractmethod

class Dog(ABC):  # Abstract Class
    def __init__(self, name):
        self.name = name

    @abstractmethod
    def sound(self):  # Abstract Method
        pass

    def display_name(self):  # Concrete Method
        print(f"Dog's Name: {self.name}")

class Labrador(Dog):  # Partial Abstraction
    def sound(self):
        print("Labrador Woof!")

class Beagle(Dog):  # Partial Abstraction
    def sound(self):
        print("Beagle Bark!")

# Example Usage
dogs = [Labrador("Buddy"), Beagle("Charlie")]
for dog in dogs:
    dog.display_name()  # Calls concrete method
    dog.sound()  # Calls implemented abstract method

Explanation:

Partial Abstraction: The Dog class has both abstract (sound) and concrete (display_name) methods.
Why Use It: Abstraction ensures consistency in derived classes by enforcing the implementation of abstract methods.
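
The enforcement mentioned above can be observed directly. In this minimal sketch, Puppy is a hypothetical subclass that forgets to implement sound, and can_instantiate is a helper written only for this illustration.

```python
from abc import ABC, abstractmethod

class Dog(ABC):
    @abstractmethod
    def sound(self):
        pass

class Labrador(Dog):
    def sound(self):
        return "Labrador Woof!"

class Puppy(Dog):  # Hypothetical subclass that does not implement sound
    pass

def can_instantiate(cls):
    # Abstract classes raise TypeError when instantiated
    try:
        cls()
        return True
    except TypeError:
        return False

print(can_instantiate(Dog))       # Abstract base class
print(can_instantiate(Puppy))     # Missing the abstract method
print(can_instantiate(Labrador))  # Implements every abstract method
```

Only a subclass that implements every abstract method can be instantiated, which is how the abc module enforces a consistent interface.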

Data Analysis Week 5 Notes

Rajasekaran Palraj — Wed, 19 Nov 2025 01:55:29 +0000

Business Intelligence Overview
Business intelligence refers to a collection of mathematical models and analysis methods that utilize data to produce valuable information and insight for making important decisions.

Main Components of Business Intelligence System:

Data Source
Data Mart / Data Warehouse
Data Exploration
Data Mining
Optimization
Decisions

Data Source
The first step is gathering and consolidating data from an array of primary and secondary sources.

Data Mart / Data Warehouse
Through the utilization of extraction and transformation tools, also known as extract, transform, load (ETL), data is acquired from various sources and saved in databases designed specifically for business intelligence analysis. These databases, commonly known as data warehouses and data marts, serve as a centralized location for the gathered data.

Data Exploration
The third level of the pyramid offers essential resources for conducting a passive analysis in business intelligence.

Data Mining
The fourth level, known as active business intelligence methodologies, focuses on extracting valuable information and knowledge from data. We will delve into various techniques such as mathematical models, pattern recognition, machine learning, and data mining.

Optimization
As you ascend the pyramid, you'll encounter optimization models that empower you to choose the most optimal course of action among various alternatives, which can often be quite extensive or even endless. These models have also been effectively incorporated in marketing and logistics.

Types of Users of Business Intelligence
Analyst (Data Analyst or Business Analyst): The company's statisticians, who use BI on historical data already stored in the system.
Head or Manager of the Company: Uses Business Intelligence to increase the company's profitability by making more efficient decisions based on the knowledge discovered.
Small Business Owners: BI is affordable enough to be used by small businesses as well.
Government Officials: Use BI to support government decision-making.

Types of Decisions Supported by Business Intelligence
Strategic Level: The strategic level is the level where the Heads of the company decide the strategies of any business.
Tactical Level: Once the strategy is set, the tactical level handles the details, bringing the relevant technologies and methodologies together under one umbrella. This level is also responsible for continuously updating the data.
Operational Level: Operational decisions are made at this level. These decisions help in operating the system.

Power BI

Power BI is a Microsoft business intelligence tool that helps transform raw data into interactive dashboards and actionable insights. It allows users to connect to various data sources, clean and shape data, and visualize it using charts, graphs and reports, all with minimal coding.

What is Power BI Used For?
Power BI is a tool that helps you understand your data better. You can:

Bring in data from different places like Excel files, SQL databases, CSVs, JSON files and even websites.
Clean and fix your data easily without writing code.
Create visuals like bar charts, line graphs, pie charts and dashboards to help you see patterns and trends.
Analyze your data using filters and slicers, so you can focus on specific details like sales by region or product performance over time.
Share your dashboards with your team or clients so everyone stays informed and can explore the data on their own.
Set up automatic updates so your reports always show the latest information without you having to do anything.

Key Components of Power BI

Power BI Desktop: This is the free version you can install on your computer. It lets you connect to data, clean it and create reports.
Power BI Service: This is the online version. You can upload your reports here, share them with others and view them from anywhere using the internet.
Power BI Mobile: There are mobile apps for iOS and Android that you can use to see and interact with your reports on your phone or tablet.
Power BI Gateway: This helps you connect data from your own company’s servers to Power BI so you can use that data in your reports and dashboards.
Power BI Report Server: This lets companies keep and share Power BI reports on their own private server instead of using the cloud.
Power BI Embedded: It allows businesses to add Power BI reports to their own apps or websites. People can view the reports without opening Power BI separately.

Tips for Using Power BI Effectively
Learn DAX for Advanced Calculations: DAX is a formula language that can help you perform advanced calculations. If you want to create more complex analytics learning DAX is a good idea.
Master Power Query: Power Query helps you to prepare your data. The more you learn about it the more efficient your workflow will be.
Stay Organized: As you create more reports, keep things organized by giving fields and calculations clear, descriptive names.
Explore Templates: Power BI has many templates for different industries. Use them to get started quickly and spark new ideas for your own reports.

7 Benefits of Microsoft Power BI:

Easy to Connect Data
Provides Data Governance and Security
Artificial Intelligence Integration
Allows Users to Build Personalized Dashboards and Interactive Reports Easily
Offers Familiar Excel Integration
Affordable and Cost-Effective
Provides Monthly Updates

Power Query Editor in Power BI

Power Query Editor is a tool in Power BI Desktop used for data transformation and preparation. It lets users connect to multiple data sources and shape and transform the data according to their needs. After the desired changes are made, the transformed data is loaded into Power BI Desktop to build the final outputs and reports. It is available in two modes: desktop and online.

Data Analysis SQL Advanced

Rajasekaran Palraj — Wed, 19 Nov 2025 01:55:02 +0000

Stored Procedures

A SQL Stored Procedure is a collection of SQL statements bundled together to perform a specific task. These procedures are stored in the database and can be called upon by users, applications, or other procedures. Stored procedures are essential for automating database tasks, improving efficiency, and reducing redundancy. By encapsulating logic within stored procedures, developers can streamline their workflow and enforce consistent business rules across multiple applications and systems.

CREATE PROCEDURE procedure_name
(parameter1 data_type, parameter2 data_type, ...)
AS
BEGIN
-- SQL statements to be executed
END

Types of SQL Stored Procedures

System Stored Procedures (sp_help, sp_rename)
User-Defined Stored Procedures (UDPs)
Extended Stored Procedures
CLR Stored Procedures

SQL Triggers

A trigger is a special stored procedure in a database that automatically executes when specific events (like INSERT, UPDATE, or DELETE) occur on a table. Triggers help automate tasks, maintain data consistency, and record database activities. Each trigger is tied to a particular table and runs without manual execution.

Uses of SQL Triggers

Automation: Handles repetitive tasks.
Data Integrity: Ensures clean and accurate data.
Business Rules: Enforces database logic.
Audit Trails: Tracks and records changes.
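
The audit-trail use can be tried without a database server. The sketch below uses Python's built-in sqlite3 module, so the trigger syntax is SQLite's and differs slightly from the server databases these notes use. The table and trigger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE employee_log (
    emp_id INTEGER,
    action TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Fires once for every inserted row
CREATE TRIGGER after_employee_insert
AFTER INSERT ON employees
FOR EACH ROW
BEGIN
    INSERT INTO employee_log (emp_id, action) VALUES (NEW.emp_id, 'Inserted');
END;
""")

# The insert below never mentions employee_log; the trigger writes to it
conn.execute("INSERT INTO employees (emp_id, first_name) VALUES (112, 'Ashok')")
log_rows = conn.execute("SELECT emp_id, action FROM employee_log").fetchall()
print(log_rows)
conn.close()
```

The application only inserts into employees. The log row appears because the database runs the trigger automatically.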

Types of SQL Triggers

1. DDL Triggers
CREATE TRIGGER prevent_table_creation
ON DATABASE
FOR CREATE_TABLE, ALTER_TABLE, DROP_TABLE
AS
BEGIN
    PRINT 'You cannot create, drop or alter tables in this database';
    ROLLBACK;
END;

2. DML Triggers
CREATE TRIGGER prevent_update
ON students
FOR UPDATE
AS
BEGIN
    PRINT 'You cannot update this table';
    ROLLBACK;
END;

3. Logon Triggers

CREATE TRIGGER track_logon
ON ALL SERVER
FOR LOGON
AS
BEGIN
    PRINT 'A new user has logged in.';
END;

GRANT and REVOKE

CREATE USER 'report_user'@'localhost' IDENTIFIED BY 'StrongPass123';

GRANT ALL PRIVILEGES ON companydb.* TO 'report_user'@'localhost';

GRANT SELECT ON companydb.Employees TO 'report_user'@'localhost';

GRANT INSERT, UPDATE ON companydb.Employees TO 'report_user'@'localhost';

REVOKE INSERT ON companydb.Employees FROM 'report_user'@'localhost';

REVOKE ALL PRIVILEGES, GRANT OPTION FROM 'report_user'@'localhost';

FLUSH PRIVILEGES;

SHOW GRANTS FOR 'report_user'@'localhost';


select * from Employees e 


-- Create a stored procedure named "GetEmployeesBySalaryNew"
DELIMITER //
CREATE PROCEDURE GetEmployeesBySalaryNew(IN p_salary INT)
BEGIN
    SELECT emp_id, first_name, last_name
    FROM Employees
    WHERE salary = p_salary;
END //
DELIMITER ;

-- Execute the stored procedure with a salary of 60000
CALL GetEmployeesBySalaryNew(60000);

DELIMITER //
CREATE FUNCTION calculate_bonus(salary INT)
RETURNS DECIMAL(10, 2)
DETERMINISTIC
BEGIN
    RETURN salary * 0.5;
END //
DELIMITER ;

SELECT emp_id, first_name, salary, calculate_bonus(salary) FROM Employees e;


DROP PROCEDURE IF EXISTS GetEmployeesBySalary;



DELIMITER $$

CREATE TRIGGER after_employee_insert
AFTER INSERT ON Employees
FOR EACH ROW
BEGIN
    INSERT INTO employee_log (emp_id, action, created_at)
    VALUES (NEW.emp_id, 'Inserted', NOW());
END

$$

DELIMITER ;

-- The log table must exist before an insert fires the trigger
CREATE TABLE employee_log (
    emp_id INT,
    action VARCHAR(50),
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);


INSERT INTO employees.Employees
(emp_id, first_name, last_name, dept_id, salary, hire_date, manager_id)
VALUES(112, 'Ashok', 'Rohit', 4, 90000.00, '2021-09-12', 103);

select * from employee_log el 




select * from Employees e 

Explain select * from Employees e where emp_id=101;

Explain select * from Employees e where first_name='Raj';


SET @running_total := 0;
select cumulative_sum from (
SELECT
    salary,
    (@running_total := @running_total + salary) AS cumulative_sum
FROM
    Employees e 
ORDER BY
    salary) t order by cumulative_sum desc limit 1 ;
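
The running-total query keeps a cumulative sum in a user variable and then picks the largest value, which is the grand total. The same logic can be checked in plain Python. This is a sketch with hypothetical salary values. On MySQL 8 and later, a window function such as SUM(salary) OVER (ORDER BY salary) is the usual alternative to user variables.

```python
from itertools import accumulate

salaries = [48000, 60000, 55000, 90000]  # Hypothetical sample values

# Running total in ascending salary order, like ORDER BY salary
running_totals = list(accumulate(sorted(salaries)))
print(running_totals)

# The outer query keeps only the largest cumulative value: the grand total
grand_total = max(running_totals)
print(grand_total)
```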

Data Analysis - Power BI Notes

Rajasekaran Palraj — Wed, 19 Nov 2025 01:52:21 +0000

Power BI DAX (Data Analysis Expressions) functions are grouped into categories. These notes list the main categories and the functions in each.

What is DAX Used for?

Calculated Columns
Create new columns on a table
Method for connecting disparate data sources with multiple key columns

Calculated Measures
Create Dynamic Calculations for Reporting
Ratios/Percentages
Time Intelligence Calculations
Complex Relationships

Calculated Tables
Create a new table derived from another table
Can be used to create a date table when one doesn’t exist already

Navigation Functions:

RELATEDTABLE

🔹 Categories of DAX Functions

Aggregate Functions

SUM, AVERAGE, MIN, MAX, COUNT, DISTINCTCOUNT

Logical Functions

IF, AND, OR, SWITCH

Text Functions

CONCATENATE, LEFT, RIGHT, LEN, SEARCH, REPLACE

Date & Time Functions

TODAY, NOW, YEAR, MONTH, DAY, EOMONTH, DATEDIFF

Filter Functions

CALCULATE, FILTER, ALL, ALLEXCEPT, RELATEDTABLE

Time Intelligence Functions

TOTALYTD, SAMEPERIODLASTYEAR, DATESYTD, PREVIOUSMONTH, PARALLELPERIOD

Mathematical & Trigonometric Functions

ROUND, ROUNDUP, ROUNDDOWN, ABS, DIVIDE, POWER

Information Functions

ISBLANK, ISNUMBER, ISTEXT, ISFILTERED

Relationship Functions

RELATED, RELATEDTABLE, USERELATIONSHIP

Iterator (X) Functions

SUMX, AVERAGEX, MAXX, MINX, COUNTX

Stacked Bar Chart:

select Y axis and X axis and legend
Rename
Small multiples in Power BI is a visualization feature that breaks down a single chart into a grid of smaller, individual charts, each displaying data for a specific category of a chosen dimension

Types Of Bar Charts
There are two main variations of bar charts. Which one to use depends on the situation.

Clustered Bar Charts: The default bar chart, where all categories are displayed against the same value axis.
Stacked Bar Charts: Basic charts that allow one category to be compared with another within the same bar.

Main Parts of Stacked Bar Charts
Title: It denotes the information about the chart
X-axis: It is the individual entry for the category to be presented
Legend: It is the different category that will contribute to the charts
Y-axis: It is for the value against each type of category
Bars: The height of each bar represents the total value of all legend categories.

The format pane in Power BI custom visuals

https://www.microsoft.com/en-us/security/business/microsoft-intune

Types of graphs:
Bar chart(Stacked, Clustered, 100% stacked)
Column chart(Stacked, Clustered, 100% stacked)

visual calculations

A visual calculation is a DAX calculation defined and executed directly on a visual. Visual calculations make it easier to create calculations that were previously hard to create, leading to simpler DAX, easier maintenance, and better performance.

Semantic Model:

Paginated Reports:

1: Dimension Modeling

Multiple Fact Table
Role Playing Table
Data Analyst of the future
Data Modeling

Dimension Model - organize data
fact table - fact is an event that may or may not include measures
Dimension table - category of information or noun , descriptive
Attribute - (Column in dimension table ) descriptor of the object

Fact tables

Conceptual Model
Logical Model

surrogate key

CALCULATE

USERELATIONSHIP

Data Analysis Pandas Notes

Rajasekaran Palraj — Sat, 11 Oct 2025 04:49:44 +0000

Pandas (stands for Python Data Analysis) is an open-source software library designed for data manipulation and analysis.

Revolves around two primary Data structures: Series (1D) and DataFrame (2D)
Built on top of NumPy, efficiently manages large datasets, offering tools for data cleaning, transformation, and analysis.
Tools for working with time series data, including date range generation and frequency conversion. For example, we can convert date or time columns into pandas’ datetime type using pd.to_datetime(), or specify parse_dates=True during CSV loading.
Seamlessly integrates with other Python libraries like NumPy, Matplotlib, and scikit-learn.
Provides methods like .dropna() and .fillna() to handle missing values seamlessly
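
A minimal sketch of the calls named above: pd.to_datetime(), .fillna() and .dropna(). The column names and values are hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample data with one missing score
df = pd.DataFrame({
    "date": ["2025-01-01", "2025-01-02", "2025-01-03"],
    "score": [90.0, None, 80.0],
})

df["date"] = pd.to_datetime(df["date"])  # Strings become datetime values
filled = df.fillna({"score": 0})         # Replace missing scores with 0
dropped = df.dropna()                    # Or drop rows that have missing values

print(df.dtypes)
print(filled)
print(dropped)
```

Both fillna() and dropna() return new DataFrames here, so the original df keeps its missing value.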

Here are various tasks that we can do using Pandas:

Data Cleaning, Merging and Joining: Clean and combine data from multiple sources, handling inconsistencies and duplicates.
Handling Missing Data: Manage missing values (NaN) in both floating and non-floating point data.
Column Insertion and Deletion: Easily add, remove or modify columns in a DataFrame.
Group By Operations: Use "split-apply-combine" to group and analyze data.
Data Visualization: Create visualizations with Matplotlib and Seaborn, integrated with Pandas.
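
The "split-apply-combine" idea can be sketched in a few lines. The degree and score columns are sample data chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "degree": ["MBA", "BCA", "M.Tech", "MBA"],
    "score": [90, 40, 80, 98],
})

# Split rows by degree, apply mean to each group, combine into one Series
mean_scores = df.groupby("degree")["score"].mean()
print(mean_scores)
```

The result is indexed by the group label, so mean_scores["MBA"] returns the average of the two MBA rows.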

Pandas Dataframe:
A Pandas DataFrame is a two-dimensional table-like structure in Python where data is arranged in rows and columns. It’s one of the most commonly used tools for handling data and makes it easy to organize, analyze and manipulate data. It can store different types of data such as numbers, text and dates across its columns. The main parts of a DataFrame are:

Data: Actual values in the table.
Rows: Labels that identify each row.
Columns: Labels that define each data category.

Creating Empty DataFrame:

import pandas as pd

df = pd.DataFrame()

print(df)

Creating a DataFrame from a List
A simple way to create a DataFrame is by using a single list. Pandas automatically assigns index values to the rows when you pass a list.

Each item in the list becomes a row.
The DataFrame consists of a single unnamed column.

import pandas as pd

lst = ['Geeks', 'For', 'Geeks', 'is', 
            'portal', 'for', 'Geeks']

df = pd.DataFrame(lst)
print(df)

Creating a DataFrame from a Dictionary of Lists

Each key becomes a column name and each list supplies that column's values. The related row-oriented form, a list of dictionaries in which each dictionary corresponds to a row, is common when handling structured data from APIs or JSON files, since JSON responses often contain lists of dictionaries.

import pandas as pd

# Named data rather than dict to avoid shadowing the built-in dict type
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}

df = pd.DataFrame(data)

print(df)

Add index Explicitly:
df = pd.DataFrame(data, index=['Rollno1', 'Rollno2', 'Rollno3', 'Rollno4'])

Method #2: Using from_dict() function

df = pd.DataFrame.from_dict(data)
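
For comparison, here is a minimal sketch of the row-oriented form: a list of dictionaries where each dictionary is one row, the shape JSON APIs usually return. The records are hypothetical.

```python
import pandas as pd

# Each dictionary is one row; keys become column names
records = [
    {"name": "aparna", "degree": "MBA", "score": 90},
    {"name": "pankaj", "degree": "BCA", "score": 40},
    {"name": "sudhir", "degree": "M.Tech"},  # Missing score becomes NaN
]

df = pd.DataFrame(records)
print(df)
```

Keys missing from a record are filled with NaN, which is convenient when API responses omit optional fields.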

Create dataframe by passing list variables to a dictionary

import pandas as pd

# dictionary of lists
name = ['aparna', 'pankaj', 'sudhir', 'Geeku']
degree = ['MBA', 'BCA', 'M.Tech', 'MBA']
score = [90, 40, 80, 98]
data = {'name': name,
        'degree': degree,
        'score': score}

df = pd.DataFrame(data, index=['Rollno1', 'Rollno2', 'Rollno3', 'Rollno4'])

Pandas Dataframe Index:

import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
        'Age': [25, 30, 22, 35, 28],
        'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
        'Salary': [50000, 55000, 40000, 70000, 48000]}

df = pd.DataFrame(data)
print(df.index)  # Accessing the index

Custom Index:

# Set 'Name' column as the index
df_with_index = df.set_index('Name')

Resetting the Index

# Reset the index back to the default integer index
df_reset = df_with_index.reset_index()
print(df_reset)

Indexing with loc

row = df_with_index.loc['Alice']  # Label lookup needs 'Name' as the index
print(row)

Changing the Index

# Set 'Age' as the new index
df_with_new_index = df.set_index('Age')
print(df_with_new_index)

Accessing Columns From DataFrame
Columns in a DataFrame can be accessed individually using bracket notation. Accessing a column retrieves that column as a Series, which can then be further manipulated.

# Access the 'Age' column
age_column = df['Age']
print(age_column)

Accessing Rows by Index

To access specific rows in a DataFrame, you can use iloc (for positional indexing) or loc (for label-based indexing). These methods allow you to retrieve rows based on their index positions or labels.

# Access the row at index 1 (second row)
second_row = df.iloc[1]
print(second_row)

Accessing Multiple Rows or Columns
You can access multiple rows or columns at once by passing a list of column names or index positions. This is useful when you need to select several columns or rows for further analysis.

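# Access the first three rows and the 'Name' and 'Age' columns
# (loc slices include the end label, so 0:2 covers labels 0, 1 and 2)
subset = df.loc[0:2, ['Name', 'Age']]
print(subset)

# Access a single value: 'Gender' of the row labeled 2
print(df.loc[2, 'Gender'])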

Accessing Rows Based on Conditions
Pandas allows you to filter rows based on conditions, which can be very powerful for exploring subsets of data that meet specific criteria.

# Access rows where 'Age' is greater than 25

filtered_data = df[df['Age'] > 25]
print(filtered_data)
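
Conditions can also be combined. The sketch below reuses the sample columns from this section and shows & with parenthesized conditions, plus isin() for matching a list of values.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Alice", "Bob", "Eve", "Charlie"],
    "Age": [25, 30, 22, 35, 28],
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
})

# Wrap each condition in parentheses; use & for AND, | for OR
older_men = df[(df["Age"] > 25) & (df["Gender"] == "Male")]
print(older_men)

# isin() matches against a list of allowed values
selected = df[df["Name"].isin(["Alice", "Bob"])]
print(selected)
```

Python's and/or keywords do not work on Series, which is why the element-wise operators & and | are used instead.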

Accessing Specific Cells with at and iat
If you need to access a specific cell, you can use the .at[] method for label-based indexing and the .iat[] method for integer position-based indexing. These are optimized for fast access to single values.

# Access the 'Salary' of the row with label 2

salary_at_index_2 = df.at[2, 'Salary']
print(salary_at_index_2)

# General form: df.iat[row_position, column_position]
output = df.iat[2, 3]  # Row position 2, column position 3 ('Salary')

Indexing and Selecting Data with Pandas

# Assumes data is a DataFrame, e.g. loaded with pd.read_csv("/content/nba.csv")
first = data["Age"]

first.head(5)

Selecting Multiple Columns

first = data[["Age", "College", "Salary"]]

Indexing with .loc[ ]
The .loc[] indexer is used for label-based indexing. It allows us to access rows and columns by their labels. Unlike the plain indexing operator [], it can select subsets of rows and columns simultaneously, which offers flexibility in data retrieval.

Selecting a Single Row by Label

import pandas as pd
data = pd.read_csv("/content/nba.csv", index_col="Name")

row = data.loc["Avery Bradley"]
print(row)

Concatenating DataFrame using .concat()

import pandas as pd

data1 = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
         'Age': [27, 24, 22, 32],
         'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'],
         'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}

data2 = {'Name': ['Abhi', 'Ayushi', 'Dhiraj', 'Hitesh'],
         'Age': [17, 14, 12, 52],
         'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'],
         'Qualification': ['Btech', 'B.A', 'Bcom', 'B.hons']}

df = pd.DataFrame(data1, index=[0, 1, 2, 3])

df1 = pd.DataFrame(data2, index=[4, 5, 6, 7])

print(df, "\n\n", df1)

# Stack the two DataFrames row-wise
res = pd.concat([df, df1])

res

Concatenating DataFrames by Setting Logic on Axes
We can modify the concatenation by setting logic on the axes. Specifically, we can choose whether to take the union (join='outer') or the intersection (join='inner') of the labels on the axis that is not being concatenated. With axis=1, as below, that means the row index.

import pandas as pd

data1 = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
         'Age': [27, 24, 22, 32],
         'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'],
         'Qualification': ['Msc', 'MA', 'MCA', 'Phd'],
         'Mobile No': [97, 91, 58, 76]}

data2 = {'Name': ['Gaurav', 'Anuj', 'Dhiraj', 'Hitesh'],
         'Age': [22, 32, 12, 52],
         'Address': ['Allahabad', 'Kannuaj', 'Allahabad', 'Kannuaj'],
         'Qualification': ['MCA', 'Phd', 'Bcom', 'B.hons'],
         'Salary': [1000, 2000, 3000, 4000]}

df = pd.DataFrame(data1, index=[0, 1, 2, 3])

df1 = pd.DataFrame(data2, index=[2, 3, 6, 7])

print(df, "\n\n", df1)


res2 = pd.concat([df, df1], axis=1, join='inner')

res2

Now we set join='outer' (the default) for the union, which keeps every row label from both DataFrames and fills the gaps with NaN.

res2 = pd.concat([df, df1], axis=1, join='outer', sort=False)

res2
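
The difference between the two join modes is easiest to see on the resulting row index. This is a minimal sketch with two small hypothetical frames whose indexes only partly overlap.

```python
import pandas as pd

left = pd.DataFrame({"Name": ["Jai", "Princi", "Gaurav", "Anuj"]}, index=[0, 1, 2, 3])
right = pd.DataFrame({"Salary": [1000, 2000, 3000, 4000]}, index=[2, 3, 6, 7])

inner = pd.concat([left, right], axis=1, join="inner")
outer = pd.concat([left, right], axis=1, join="outer")

print(inner)  # Only index labels present in both frames
print(outer)  # All index labels; gaps are filled with NaN
```

In both cases the columns of both frames are kept. Only the set of rows changes.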

Concatenating DataFrames by Ignoring Indexes

res = pd.concat([df, df1], ignore_index=True)

res

Concatenating DataFrame with group keys
If we want to retain information about the DataFrame from which each row came, we can use the keys argument. This assigns a label to each group of rows based on the source DataFrame.

frames = [df, df1 ]

res = pd.concat(frames, keys=['x', 'y'])
res

Concatenating Mixed DataFrames and Series
We can also concatenate a mix of Series and DataFrames. If we include a Series in the list, it is treated as a single-column DataFrame whose column name comes from the name of the Series.

import pandas as pd

data1 = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age':[27, 24, 22, 32],
        'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'],
        'Qualification':['Msc', 'MA', 'MCA', 'Phd']}

df = pd.DataFrame(data1,index=[0, 1, 2, 3])

s1 = pd.Series([1000, 2000, 3000, 4000], name='Salary')

print(df, "\n\n", s1)

res = pd.concat([df, s1], axis=1)

res

Merging DataFrame
Merging DataFrames in Pandas is similar to performing SQL joins. It is useful when we need to combine two DataFrames based on a common column or index. The merge() function provides flexibility for different types of joins.

Merging DataFrames Using One Key

import pandas as pd

data1 = {'key': ['K0', 'K1', 'K2', 'K3'],
         'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age':[27, 24, 22, 32],}

data2 = {'key': ['K0', 'K1', 'K2', 'K3'],
         'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'],
        'Qualification':['Btech', 'B.A', 'Bcom', 'B.hons']}

df = pd.DataFrame(data1)

df1 = pd.DataFrame(data2)


print(df, "\n\n", df1)

res = pd.merge(df, df1, on='key')

res

Merging DataFrames Using Multiple Keys We can also merge DataFrames based on more than one column by passing a list of column names to the on argument.

import pandas as pd

data1 = {'key': ['K0', 'K1', 'K2', 'K3'],
         'key1': ['K0', 'K1', 'K0', 'K1'],
         'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age':[27, 24, 22, 32],}

data2 = {'key': ['K0', 'K1', 'K2', 'K3'],
         'key1': ['K0', 'K0', 'K0', 'K0'],
         'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'],
        'Qualification':['Btech', 'B.A', 'Bcom', 'B.hons']}

df = pd.DataFrame(data1)

df1 = pd.DataFrame(data2)


print(df, "\n\n", df1)

res1 = pd.merge(df, df1, on=['key', 'key1'])

res1

Merging DataFrames Using the how Argument

res = pd.merge(df, df1, how='left', on=['key', 'key1'])

res

| Merge method | SQL join name | Description |
|---|---|---|
| left | LEFT OUTER JOIN | Use keys from left frame only |
| right | RIGHT OUTER JOIN | Use keys from right frame only |
| outer | FULL OUTER JOIN | Use union of keys from both frames |
| inner | INNER JOIN | Use intersection of keys from both frames |
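The four join types can be compared side by side with a short sketch. The frames left and right below are hypothetical and share only two of their keys. The indicator=True option adds a _merge column that records where each row came from.

```python
import pandas as pd

# Hypothetical frames: K1 and K2 appear in both, K0 only on the left, K3 only on the right
left = pd.DataFrame({"key": ["K0", "K1", "K2"],
                     "Name": ["Jai", "Princi", "Gaurav"]})
right = pd.DataFrame({"key": ["K1", "K2", "K3"],
                      "Address": ["Kanpur", "Allahabad", "Kannuaj"]})

# Number of rows produced by each join type
sizes = {how: len(pd.merge(left, right, on="key", how=how))
         for how in ["left", "right", "outer", "inner"]}
print(sizes)  # {'left': 3, 'right': 3, 'outer': 4, 'inner': 2}

# indicator=True labels each row as left_only, right_only or both
outer = pd.merge(left, right, on="key", how="outer", indicator=True)
print(outer[["key", "_merge"]])
```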

Joining DataFrame
The .join() method combines the columns of two DataFrames based on their row indexes rather than on specific columns. It is a simple way to merge two DataFrames when the relationship between them is carried by the index.

Joining DataFrames Using .join() If both DataFrames share the same index, we can use .join() to combine their columns. Columns that appear in both DataFrames need a suffix to keep their names distinct.

# df and df1 both contain 'key' and 'key1', so suffixes are required here
res = df.join(df1, lsuffix='_left', rsuffix='_right')

res
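The sketch below illustrates how .join() aligns rows when the indexes only partly match. The frames people and details are hypothetical. By default the join keeps every row of the calling DataFrame, and how='inner' restricts the result to labels found in both.

```python
import pandas as pd

# Hypothetical frames indexed by a shared key; 'K1' is missing from details
people = pd.DataFrame({"Name": ["Jai", "Princi", "Gaurav"]}, index=["K0", "K1", "K2"])
details = pd.DataFrame({"Address": ["Nagpur", "Allahabad"]}, index=["K0", "K2"])

# The default how='left' keeps every row of the caller and fills gaps with NaN
joined = people.join(details)
print(joined)

# how='inner' keeps only index labels found in both frames
inner = people.join(details, how="inner")
print(inner.index.tolist())  # ['K0', 'K2']
```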

Pivot Table

# importing pandas
import pandas as pd

# creating dataframe
df = pd.DataFrame({'Product': ['Carrots', 'Broccoli', 'Banana', 'Banana',
                               'Beans', 'Orange', 'Broccoli', 'Banana'],
                   'Category': ['Vegetable', 'Vegetable', 'Fruit', 'Fruit',
                                'Vegetable', 'Fruit', 'Vegetable', 'Fruit'],
                   'Quantity': [8, 5, 3, 4, 5, 9, 11, 8],
                   'Amount': [270, 239, 617, 384, 626, 610, 62, 90]})

pivot = df.pivot_table(index=['Product'],
                       values=['Amount'],
                       aggfunc='sum')
print(pivot)

pivot = df.pivot_table(index=['Category'],
                       values=['Amount'],
                       aggfunc='sum')
print(pivot)

pivot = df.pivot_table(index=['Product', 'Category'],
                       values=['Amount'], aggfunc='sum')
print(pivot)

# A list keeps the aggregate columns in a predictable order
pivot = df.pivot_table(index=['Category'], values=['Amount'],
                       aggfunc=['median', 'mean', 'min'])
print(pivot)

Pandas Series

import pandas as pd
import numpy as np

# creating simple array
data = np.array(['g', 'e', 'e', 'k', 's', 'f',
                 'o', 'r', 'g', 'e', 'e', 'k', 's'])
ser = pd.Series(data)
# retrieve the first element
print(ser[0])

Accessing First 5 Elements of Series

print(ser[:5])

Accessing Last 10 Elements of Series

print(ser[-10:])

Accessing First 10 Elements of Series Using head()

ser.head(10)

Accessing a Single Element Using Index Label

# Give the Series explicit integer labels 10..22 so that label 16 exists
data = np.array(['g', 'e', 'e', 'k', 's', 'f',
                 'o', 'r', 'g', 'e', 'e', 'k', 's'])
ser = pd.Series(data, index=[10, 11, 12, 13, 14,
                             15, 16, 17, 18, 19, 20, 21, 22])
print(ser[16])

Accessing Multiple Elements Using Index Labels

print(ser[[10, 11, 12, 13, 14]])

Access Multiple Elements by Providing Label of Index


ser = pd.Series(np.arange(3, 9), index=['a', 'b', 'c', 'd', 'e', 'f'])
print(ser[['a', 'd']])
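When a Series has integer labels, plain square brackets look values up by label, which is easy to confuse with position. A minimal sketch of the explicit accessors, using a short hypothetical Series: .loc always selects by label and .iloc always selects by position.

```python
import pandas as pd

# Hypothetical Series whose integer labels start at 10
ser = pd.Series(list("geeks"), index=[10, 11, 12, 13, 14])

by_label = ser.loc[12]    # the element labelled 12
first = ser.iloc[0]       # the first element by position
last = ser.iloc[-1]       # the last element by position

print(by_label, first, last)  # e g s
```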

Working with Missing Data in Pandas

Using isnull()

isnull() returns a DataFrame of Boolean values in which True marks missing data (NaN). This is useful when we want to find and then fill missing data in a dataset.

Example 1: Finding Missing Values in a DataFrame

import pandas as pd
import numpy as np

d = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(d)

mv = df.isnull()

print(mv)
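The Boolean mask is usually summarized rather than read cell by cell. The sketch below rebuilds the same score data under the assumed name scores, counts the missing values per column, and selects the rows that contain at least one gap.

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame({"First Score": [100, 90, np.nan, 95],
                       "Second Score": [30, 45, 56, np.nan],
                       "Third Score": [np.nan, 40, 80, 98]})

# True counts as 1, so summing the mask gives the number of missing values per column
missing_per_column = scores.isnull().sum()
print(missing_per_column)

# Rows that contain at least one missing value
rows_with_missing = scores[scores.isnull().any(axis=1)]
print(len(rows_with_missing))  # 3
```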

Example 2: Filtering Data Based on Missing Values

sampleFile

import pandas as pd
d = pd.read_csv("/content/employees.csv")

bool_series = pd.isnull(d["Gender"])
missing_gender_data = d[bool_series]
print(missing_gender_data)

Filling Missing Values in Pandas

The following functions allow us to replace missing values with a specified value or to estimate them by interpolation.

Using fillna()

d = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(d)

df.fillna(0)

Example 2: Fill with Previous Value (Forward Fill)

# fillna(method='pad') is deprecated in recent pandas versions; ffill() does the same
df.ffill()

Example 3: Fill with Next Value (Backward Fill)

# fillna(method='bfill') is deprecated in recent pandas versions; bfill() does the same
df.bfill()

Example 4: Fill NaN Values with 'No Gender'

# Assign the result back instead of using inplace=True on a selected column
d["Gender"] = d["Gender"].fillna('No Gender')
d[10:25]

Using replace()

d.replace(to_replace=np.nan, value=-99)

Using interpolate()

df.interpolate(method='linear', limit_direction='forward')
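To see what linear interpolation actually computes, here is a minimal sketch on a short hypothetical Series: the two missing values are spaced evenly between their known neighbours.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with two consecutive gaps between 10 and 40
s = pd.Series([10.0, np.nan, np.nan, 40.0])

# Linear interpolation spaces the missing values evenly between the known ones
filled = s.interpolate(method="linear")
print(filled.tolist())  # [10.0, 20.0, 30.0, 40.0]
```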

Dropping Rows with At Least One Null Value

import pandas as pd
import numpy as np

# Avoid naming the variable dict, which would shadow the built-in type
scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, np.nan, 45, 56],
          'Third Score': [52, 40, 80, 98],
          'Fourth Score': [np.nan, np.nan, np.nan, 65]}
df = pd.DataFrame(scores)

df.dropna()

Dropping Rows with All Null Values We can drop rows where all values are missing using dropna(how='all').

Example

df.dropna(how='all')

Dropping Columns with At Least One Null Value

df.dropna(axis=1)

Data Analysis Week 4 Notes

Rajasekaran Palraj — Sun, 24 Aug 2025 17:36:13 +0000

Database:

CREATE DATABASE new_db;
SHOW DATABASES;
USE new_db;
DROP DATABASE new_db;
CREATE DATABASE IF NOT EXISTS new_db;
DROP DATABASE IF EXISTS new_db;

Renaming a database

MySQL (there is no RENAME DATABASE statement, so create the new database and move each table into it)
RENAME TABLE old_database_name.table1 TO new_database_name.table1;
RENAME TABLE old_database_name.table2 TO new_database_name.table2; -- Repeat for all tables in the old database

PostgreSQL
ALTER DATABASE current_database_name RENAME TO new_database_name;

MSSQL
ALTER DATABASE Test MODIFY NAME = Example;

Troubleshooting Database Rename Issues
Database is in Use: If you encounter an error saying that the database is in use, you may need to disconnect active users or close any applications connected to the database before renaming it.
Insufficient Privileges: If you receive a permission error, ensure you have administrative privileges or sufficient rights to modify the database. You might need to check your user role and permissions.
Database Name Constraints: Some DBMSs have restrictions on certain characters or reserved words in database names. Ensure your new database name adheres to the naming conventions for the specific SQL system you are using.

CREATE TABLE Customer(
CustomerID INT PRIMARY KEY,
CustomerName VARCHAR(50),
LastName VARCHAR(50),
Country VARCHAR(50),
Age INT CHECK (Age >= 0 AND Age <= 99),
Phone VARCHAR(10) -- a 10-digit phone number can exceed the INT range
);

CREATE TABLE SubTable AS
SELECT CustomerID, CustomerName
FROM Customer;

CREATE TABLE IF NOT EXISTS Customer (
CustomerID INT PRIMARY KEY,
CustomerName VARCHAR(50)
);

DESC table_name;

DROP TABLE IF EXISTS categories;

SHOW TABLES; -- MySQL

SELECT * FROM INFORMATION_SCHEMA.TABLES; -- other databases such as SQL Server and PostgreSQL

ALTER TABLE Student RENAME COLUMN name TO FIRST_NAME;

ALTER TABLE Student RENAME TO Student_Details;

ALTER TABLE Student_Details ADD marks INT;

ALTER TABLE Student_Details
MODIFY COLUMN phone BIGINT;

ALTER TABLE Student_Details
DROP COLUMN marks;

ALTER TABLE Student_Details ALTER COLUMN age SET DEFAULT 18;

TRUNCATE TABLE EMPLOYEE;

SQL Cloning or Copying a Table

What is a Copying or Cloning Table in SQL
Cloning a table in SQL means making a duplicate copy of an existing table, much like photocopying a document. The copy can include the table's structure (column names, data types, constraints) and, optionally, its data. The clone is independent of the original, so it can be used for testing, backups, or analysis without affecting the original table, and it saves the time and effort of creating a new table and re-entering the same data. Cloning can be done with or without data:

With Data: The clone table includes the structure and rows of the original table.
Without Data: Only the structure of the original table is copied.

Methods for Cloning Tables in SQL
There are three different methods to create a clone table in SQL:

Simple Cloning
Deep Cloning
Shallow Cloning

Simple Cloning In this method, the clone table creates a copy of the original table’s structure and data, but constraints like primary keys, unique keys, and auto-increment properties are not preserved.

CREATE TABLE STUDENT_COPY AS SELECT * FROM STUDENT;

Shallow Cloning In shallow cloning, the clone table gets the same structure as the original table but does not copy its data. The result is an empty table that keeps the column definitions, indexes, and constraints such as primary key, unique key, and auto_increment. The CREATE TABLE ... LIKE syntax shown here is MySQL-specific.

CREATE TABLE STUDENT_SHALLOW_CLONE LIKE STUDENT;
SELECT * FROM STUDENT_SHALLOW_CLONE;

Deep Cloning This method is widely used because the clone inherits all the properties of the original table, including indexes such as primary key, unique, and auto_increment, as well as the existing data.

CREATE TABLE STUDENT_DEEP_CLONE LIKE STUDENT;
INSERT INTO STUDENT_DEEP_CLONE SELECT * FROM STUDENT;
SELECT * FROM STUDENT_DEEP_CLONE;
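The claim that simple cloning drops constraints can be checked with a small sketch. It uses Python's built-in sqlite3 module rather than MySQL, which is an assumption made only so the example is self-contained; SQLite has no CREATE TABLE ... LIKE, so only the CREATE TABLE ... AS SELECT form is shown. The table and helper names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Amit"), (2, "Gauri")])

# Simple cloning: copies the columns and rows, but not the PRIMARY KEY constraint
conn.execute("CREATE TABLE student_copy AS SELECT * FROM student")

def primary_key_columns(table):
    # PRAGMA table_info returns one row per column; field 5 is non-zero for key columns
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})") if row[5]]

original_keys = primary_key_columns("student")
copy_keys = primary_key_columns("student_copy")
copied_rows = conn.execute("SELECT COUNT(*) FROM student_copy").fetchone()[0]

print(original_keys)  # ['roll_no']
print(copy_keys)      # []
print(copied_rows)    # 2
```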

What is a Temporary Table in SQL?

A temporary table stores intermediate results during a session without affecting the underlying permanent tables, and it is dropped automatically when the session ends. In SQL Server, a name starting with # marks a table as temporary; MySQL uses CREATE TEMPORARY TABLE instead.

CREATE TABLE #EmpDetails (id INT, name VARCHAR(25));

SELECT * FROM Emp1 WHERE Age=24;

SELECT * FROM Customer;

SELECT CustomerName
FROM Customer
WHERE Age = 21;

SELECT Customer_id, COUNT(*) AS order_count
FROM Orders
GROUP BY Customer_id;

SELECT Department, sum(Salary) as Salary
FROM employee
GROUP BY department
HAVING SUM(Salary) >= 50000;

SELECT * FROM Customer ORDER BY Age DESC;

INSERT

INSERT INTO table_name
VALUES (value1, value2, value3);

INSERT INTO Student (ROLL_NO, NAME, AGE, ADDRESS, PHONE)
VALUES
(6, 'Amit Kumar', 15, 'Delhi', 'XXXXXXXXXX'),
(7, 'Gauri Rao', 18, 'Bangalore', 'XXXXXXXXXX'),
(8, 'Manav Bhatt', 17, 'New Delhi', 'XXXXXXXXXX'),
(9, 'Riya Kapoor', 10, 'Udaipur', 'XXXXXXXXXX');

INSERT INTO target_table
SELECT * FROM source_table;

Inserting specific columns

INSERT INTO Students (Name, Age)
SELECT Name, Age
FROM OldStudents;

Inserting Data Using Transactions
When inserting large amounts of data, you can use SQL transactions to ensure that all rows are inserted correctly. A transaction groups multiple SQL operations into a single unit, so if one operation fails, the entire transaction is rolled back.

Query:

BEGIN TRANSACTION; -- MySQL: START TRANSACTION;

INSERT INTO Customers (CustomerID, CustomerName, ContactName, Country)
VALUES (5, 'Sarah White', 'John White', 'Canada');

INSERT INTO Customers (CustomerID, CustomerName, ContactName, Country)
VALUES (6, 'Mohamed Ibrahim', 'Ahmed Ibrahim', 'UAE');

-- If an error occurs, issue ROLLBACK instead of COMMIT to undo both inserts

COMMIT;
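The all-or-nothing behaviour can be sketched from application code. The example below uses Python's built-in sqlite3 module instead of MySQL, and a simplified hypothetical customers table, so it is an illustration of the idea rather than the exact setup above. The second insert fails on purpose, and the first insert in the same transaction is undone with it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers VALUES (5, 'Sarah White')")
conn.commit()

rolled_back = False
try:
    # Used as a context manager, the connection commits on success
    # and rolls back if an exception is raised inside the block
    with conn:
        conn.execute("INSERT INTO customers VALUES (6, 'Mohamed Ibrahim')")
        conn.execute("INSERT INTO customers VALUES (5, 'Duplicate Key')")  # violates PRIMARY KEY
except sqlite3.IntegrityError as exc:
    rolled_back = True
    print("Rolled back:", exc)

row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(row_count)  # 1, because the insert of customer 6 was undone as well
```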

Update

UPDATE Customer
SET CustomerName = 'Nitin'
WHERE Age = 22;

UPDATE Customer
SET CustomerName = 'Satyam',
Country = 'USA'
WHERE CustomerID = 1;

If we accidentally omit the WHERE clause, all the rows in the table will be updated, which is a common mistake. Let’s update the CustomerName for every record in the table:

Query:

UPDATE Customer
SET CustomerName = 'Shubham';

Best Practices for Using SQL UPDATE Statement

Always use the WHERE clause:
The most important point when using the UPDATE statement is to always include a WHERE clause unless you genuinely intend to update all rows.
Check your data before updating:

Run a SELECT query to view the data you want to update before executing the UPDATE statement. This helps avoid accidental data modifications.

SELECT * FROM Customer WHERE Age = 22;

Use transactions for critical updates:

When performing updates in a production environment, consider using transactions. Transactions allow you to commit or roll back changes as needed.

BEGIN TRANSACTION;
UPDATE Customer SET CustomerName = 'John' WHERE CustomerID = 3;
COMMIT; -- Use ROLLBACK to undo if necessary

Test on a small dataset first: If you are uncertain about the impact of your UPDATE, test it on a small subset of data to ensure the changes are as expected.

Delete

DELETE FROM GFG_Employees
WHERE NAME = 'Rithvik';

Delete All Records from a Table

DELETE FROM GFG_Employees;

Note that DELETE * FROM GFG_Employees; is not valid in MySQL; omit the asterisk.

START TRANSACTION;
DELETE FROM GFG_Employees WHERE department = 'Development';
-- If needed, you can rollback the deletion
ROLLBACK;

What Are Duplicate Rows?
Duplicate rows are records in a database that have identical values in one or more columns. They often arise from repeated imports, user errors, or missing constraints such as primary keys or unique indexes. A SQL query to delete duplicate rows typically identifies the duplicates with functions like ROW_NUMBER() or COUNT() and keeps only one copy of each record. If not handled properly, duplicates can lead to:

Inaccurate Data Reporting: Reports may contain false information.
Storage Waste: Redundant records consume unnecessary space.
Decreased Query Performance: Queries on large tables with duplicates may perform poorly.

Why You Should Remove Duplicate Rows
Data Integrity: Duplicates can distort reports and analyses, leading to incorrect insights.
Optimal Performance: Redundant data can slow down queries, especially when dealing with large datasets.
Efficient Storage: Removing duplicates helps optimize storage usage, keeping your database lean.
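One way to delete duplicates with ROW_NUMBER() is sketched below. It runs through Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer), and the contacts table and its columns are hypothetical. Rows are numbered within each group of equal email values, and every row after the first is removed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO contacts (email) VALUES (?)",
                 [("a@example.com",), ("b@example.com",),
                  ("a@example.com",), ("a@example.com",)])

# Number the rows within each email group; every row with rn > 1 is a duplicate
conn.execute("""
    DELETE FROM contacts
    WHERE id IN (
        SELECT id FROM (
            SELECT id,
                   ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
            FROM contacts
        )
        WHERE rn > 1
    )
""")

remaining = conn.execute("SELECT id, email FROM contacts ORDER BY id").fetchall()
print(remaining)  # [(1, 'a@example.com'), (2, 'b@example.com')]
```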

Best Practices to Prevent Duplicates
While identifying and removing duplicates is essential, preventing them is even better. Here are some best practices to ensure that duplicates don’t enter your database in the first place:

Use Primary Keys or Unique Constraints: These ensure that each record is unique, preventing accidental duplication.
Data Validation: Implement validation rules in your application to prevent duplicate entries.
Indexing: Create unique indexes on columns that must remain unique, like contact numbers or email addresses.
Regular Data Cleaning: Periodically run data-cleaning queries to identify and remove any newly inserted duplicates.

SELECT DATE_FORMAT('2025-08-20 08:30:00', '%W, %M %D, %Y %r');
-- Output: Wednesday, August 20th, 2025 08:30:00 AM
Common format specifiers include:
%Y: Four-digit year
%y: Two-digit year
%M: Full month name (January-December)
%m: Numeric month (01-12)
%b: Abbreviated month name (Jan-Dec)
%D: Day of the month with English suffix (1st, 2nd, 3rd, etc.)
%d: Numeric day of the month (01-31)
%W: Full weekday name (Sunday-Saturday)
%w: Numeric weekday (0=Sunday, 6=Saturday)
%H: Hour (00-23)
%h: Hour (01-12)
%i: Minutes (00-59)
%s: Seconds (00-59)
%f: Microseconds (000000-999999)
%r: Time in 12-hour format with AM/PM (hh:mm:ss AM/PM)
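For comparison, the same timestamp can be formatted in Python. This is only a sketch of the equivalent idea: Python's strftime uses its own format codes, has no counterpart to MySQL's %D ordinal suffix, and the output shown assumes the default English locale.

```python
from datetime import datetime

moment = datetime(2025, 8, 20, 8, 30, 0)

# %A = weekday name, %B = month name, %d = day, %Y = year,
# %I:%M:%S %p = 12-hour time with AM/PM (similar to MySQL's %r)
formatted = moment.strftime("%A, %B %d, %Y %I:%M:%S %p")
print(formatted)  # Wednesday, August 20, 2025 08:30:00 AM
```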

SELECT CURTIME();

SELECT CURDATE();

Types of MySQL JOIN

INNER JOIN
LEFT JOIN
RIGHT JOIN
FULL JOIN (not supported directly in MySQL; emulate it with a UNION of a LEFT JOIN and a RIGHT JOIN)
CROSS JOIN
Self Join

Update Join

UPDATE students
JOIN courses ON students.course_id = courses.course_id
JOIN grades ON students.student_id = grades.student_id
SET grades.grade = grades.grade + 5
WHERE courses.course_name = 'Science';

Delete Join

DELETE emp FROM employees emp
JOIN salaries sal ON emp.employee_id = sal.employee_id
WHERE sal.salary < 50000;

Window Functions in SQL

SQL window functions perform calculations across a set of rows related to the current row. The calculation happens within a defined window of data, which makes these functions useful for aggregates, rankings, and cumulative totals without collapsing the rows of the result the way GROUP BY does.

Types of Window Functions in SQL

aggregate window functions
ranking window functions

Aggregate Window Function Aggregate window functions calculate aggregates over a window of rows while retaining individual rows. Common aggregate functions include:

SUM(): Sums values within a window.
AVG(): Calculates the average value within a window.
COUNT(): Counts the rows within a window.
MAX(): Returns the maximum value in the window.
MIN(): Returns the minimum value in the window.
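LAG()/LEAD(): Value (offset) functions rather than aggregates; they return a value from a preceding or following row in the window.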

Ranking Window Functions These functions provide rankings of rows within a partition based on specific criteria. Common ranking functions include:

RANK(): Assigns ranks to rows; tied rows share a rank and the following ranks are skipped.
DENSE_RANK(): Assigns ranks to rows without skipping rank numbers for duplicates.
ROW_NUMBER(): Assigns a unique number to each row in the result set.
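The difference between the ranking functions shows up as soon as two rows tie. The sketch below runs the query through Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer); the employee table and its values are hypothetical. SUM() OVER () is included to show an aggregate window function that keeps every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [("A", 300), ("B", 200), ("C", 200), ("D", 100)])

rows = conn.execute("""
    SELECT name,
           salary,
           RANK()       OVER (ORDER BY salary DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY salary DESC)       AS dense_rnk,
           ROW_NUMBER() OVER (ORDER BY salary DESC, name) AS row_num,
           SUM(salary)  OVER ()                           AS total_salary
    FROM employee
    ORDER BY salary DESC, name
""").fetchall()

for row in rows:
    print(row)
# ('A', 300, 1, 1, 1, 800)
# ('B', 200, 2, 2, 2, 800)
# ('C', 200, 2, 2, 3, 800)
# ('D', 100, 4, 3, 4, 800)
```

B and C tie on salary, so RANK() gives both 2 and then jumps to 4, DENSE_RANK() continues with 3, and ROW_NUMBER() still numbers every row uniquely.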

CASE Statements in SQL

CASE lets you add conditional logic inside queries.

✅ Example 1: Searched CASE in SELECT
SELECT emp_id,
first_name,
salary,
CASE
WHEN salary >= 100000 THEN 'High'
WHEN salary BETWEEN 60000 AND 99999 THEN 'Medium'
ELSE 'Low'
END AS salary_band
FROM Employees;

👉 Categorizes employees into salary bands.

✅ Example 2: CASE inside ORDER BY
SELECT emp_id, first_name, dept_id
FROM Employees
ORDER BY
CASE
WHEN dept_id = 1 THEN 1
WHEN dept_id = 2 THEN 2
ELSE 3
END;

👉 Custom sorting (Dept 1 first, then Dept 2, then others).

✅ Example 3: CASE inside WHERE
SELECT emp_id, first_name, salary, dept_id
FROM Employees
WHERE salary >
CASE
WHEN dept_id = 1 THEN 70000
WHEN dept_id = 2 THEN 50000
ELSE 40000
END;

Types of Indexes:

Clustered index
Non-clustered index

Clustered Index
A clustered index is the type of indexing that establishes a physical sorting order of rows. The data rows are stored directly in the order of the indexed column(s). Each table can have only one clustered index because it dictates the data’s physical storage. A clustered index is like a Dictionary in which the sorting order is alphabetical and there is no separate index page.

Non-Clustered Index
A non-clustered index is a structure stored separately from the table data. It holds the values of one or more selected columns in sorted order, together with pointers to the matching rows. Non-clustered indexes are created to improve the performance of frequently used queries not covered by the clustered index. It is like the index of a textbook, which is kept on separate pages and points to where each topic appears.
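The effect of a secondary index on a query can be observed with a small sketch. It uses Python's built-in sqlite3 module, whose EXPLAIN QUERY PLAN output differs from MySQL's EXPLAIN, so treat it as an illustration of the idea only. The employee table, the index name, and the query_plan helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany("INSERT INTO employee (name, dept) VALUES (?, ?)",
                 [(f"emp{i}", f"dept{i % 10}") for i in range(1000)])

def query_plan(sql):
    # The last field of each EXPLAIN QUERY PLAN row is a human-readable description
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM employee WHERE dept = 'dept3'"

plan_before = query_plan(query)   # without an index the whole table is scanned
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")
plan_after = query_plan(query)    # with the index the matching rows are looked up directly

print(plan_before)
print(plan_after)
```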

Other Index types:

Unique Index
Composite (or Compound) Index
Filtered Index (not supported in MySQL)
Columnstore Index
Hash Index

Data Normalization:

ACID Properties

Transactions are fundamental operations that allow us to modify and retrieve data. However, to ensure the integrity of a database, it is important that these transactions are executed in a way that maintains consistency, correctness, and reliability even in case of failures / errors. This is where the ACID properties come into play.

ACID stands for Atomicity, Consistency, Isolation, and Durability.

Data Analysis Week 3 Notes

Rajasekaran Palraj — Thu, 14 Aug 2025 17:24:07 +0000

Structured Query Language (SQL) is the standard language used to interact with relational databases.

Mainly used to manage data. Whether you want to create, delete, update or read data, SQL provides the structure and commands to perform these operations.
Widely supported across various database systems like MySQL, Oracle, PostgreSQL, SQL Server and many others.
Mainly works with Relational Databases (data is stored in the form of tables)

After your MySQL environment is set up, you can write your first SQL program. Below is an example that displays "Hello, World!" using SQL.

Create a database named test_db

CREATE DATABASE test_db;

Use the test_db database

USE test_db;

Create a table named greetings

CREATE TABLE greetings (
    id INT AUTO_INCREMENT PRIMARY KEY,
    message VARCHAR(255)
);

Insert the message 'Hello, World!' into the greetings table

INSERT INTO greetings (message)
VALUES ('Hello, World!');

Retrieve the message from the greetings table

SELECT message FROM greetings;

Why Learn SQL?
SQL's integration with various technologies makes it essential for managing and querying data in databases. Whether it is in traditional relational databases (RDBMS) or modern technologies such as machine learning, AI and blockchain, SQL plays a key role. It works effortlessly with DBMS to help users interact with data, whether stored in structured RDBMS or other types of databases.

Data Science & Analytics: Used for querying large datasets, data cleaning and analysis. Analysts use SQL to generate reports and insights that inform business decisions.
Machine Learning & AI: Helps in preparing and managing the data required for training machine learning models and AI algorithms. It is used for data cleaning, transformation, and extraction.
Web Development: Used to manage user data, e-commerce transactions, and content management in websites and applications built with frameworks like Django, Node.js, and Ruby on Rails.
Cloud and Big Data: SQL is integrated into cloud-based databases (e.g., Amazon RDS, Microsoft Azure SQL) and Big Data platforms (e.g., Apache Hive) to enable seamless data querying and management.
Blockchain and Decentralized Systems: In blockchain systems, SQL can be used to manage off-chain data, providing efficient data storage and retrieval alongside decentralized ledger technology

SQL

Allows users to store, retrieve, update, and manage data efficiently through simple commands.
Known for its user-friendly syntax and powerful capabilities, SQL is widely used across industries.

How Does SQL Work?
We interact with databases using SQL queries.
The SQL engine breaks down, optimizes and executes these queries efficiently.
DBMS tools like MySQL and SQL Server have their own SQL engine and an interface where users can write and execute SQL queries.

Key Components of a SQL System
Databases : A database is a structured collection of data. It organizes data into tables, which are like spreadsheets with rows (records) and columns (fields) .
Tables: Each table enforces rules and relationships among its columns for data integrity.
Indexes: Indexes speed up queries by allowing the database to quickly locate data without scanning the entire table.
Views: A view is a virtual table—basically a saved SELECT statement you can query like a table.
Stored Procedures: These are pre-written SQL scripts stored inside the database. They can receive inputs, run complex logic, and return results boosting performance, reusability, and security.
Transactions: A transaction groups multiple SQL operations into a single unit. It ensures all changes are applied successfully or none are, preserving data integrity (ACID properties)
Security & Permissions: SQL includes tools to restrict access, letting DBAs assign who can do what whether it's accessing tables, executing procedures, or changing structures.
Joins: Joins combine data from multiple tables based on relationships essential for querying across related datasets.

Rules for Writing SQL Queries
There are certain rules for SQL which would ensure consistency and functionality across databases. By following these rules, queries will be well formed and well executed in any database.

End with Semicolon (;): End each SQL statement with a semicolon. It is the standard statement terminator and is needed to separate statements when several are run together.
Case Insensitivity: SQL keywords (e.g., SELECT, INSERT) are not case-sensitive. However, table/column names may be case-sensitive depending on the DBMS.
Whitespace Allowed: Queries can span multiple lines, but use spaces between keywords and names.
Reserved Words: Avoid using SQL keywords as names. If needed, wrap them in quotes (" ") or backticks (`).

What are Different SQL Commands or Queries?
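Structured Query Language (SQL) commands are standardized instructions used by developers to interact with data stored in relational databases. These commands allow for the creation, manipulation, retrieval and control of data, as well as database structures. SQL commands are categorized based on their specific functionalities:

DDL (Data Definition Language): CREATE, ALTER, DROP, TRUNCATE
DML (Data Manipulation Language): INSERT, UPDATE, DELETE
DQL (Data Query Language): SELECT
DCL (Data Control Language): GRANT, REVOKE
TCL (Transaction Control Language): COMMIT, ROLLBACK, SAVEPOINT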

SQL Data Types

In SQL, each column must be assigned a data type that defines the kind of data it can store, such as integers, dates, text, or binary values. Choosing the correct data type is crucial for data integrity, query performance and efficient indexing.

Benefits of using the right data type:

Memory-efficient storage
Accurate operations (e.g., calculations, sorting)
Consistency in stored values
Validation of input data

SQL data types are broadly categorized into several groups:

1. Numeric Data Types
Numeric data types are fundamental to database design and are used to store numbers, whether they are integers, decimals or floating-point numbers. These data types allow for mathematical operations like addition, subtraction, multiplication and division, which makes them essential for managing financial, scientific and analytical data.

| Data Type | Description | Range (SQL Server) |
|---|---|---|
| BIGINT | Large integer numbers | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
| INT | Standard integer values | -2,147,483,648 to 2,147,483,647 |
| SMALLINT | Small integers | -32,768 to 32,767 |
| TINYINT | Very small integers | 0 to 255 |
| DECIMAL | Exact fixed-point numbers (e.g., for financial values) | -10^38 + 1 to 10^38 - 1 |
| NUMERIC | Similar to DECIMAL, used for precision data | -10^38 + 1 to 10^38 - 1 |
| MONEY | For storing monetary values | -922,337,203,685,477.5808 to 922,337,203,685,477.5807 |
| SMALLMONEY | Smaller monetary values | -214,748.3648 to 214,748.3647 |

Approximate Numeric Datatype
These types are used to store approximate values, such as scientific measurements or large ranges of data that don't need exact precision.

| Data Type | Description | Range |
|---|---|---|
| FLOAT | Approximate numeric values | -1.79E+308 to 1.79E+308 |
| REAL | Similar to FLOAT, but with less precision | -3.40E+38 to 3.40E+38 |

2. Character and String Data Types
Character data types are used to store text or character-based data. The choice between fixed-length and variable-length data types depends on the nature of your data.

| Data Type | Description |
|---|---|
| Char | Fixed-length non-Unicode characters. Maximum length of 8000 characters. |
| Varchar | Variable-length non-Unicode characters. Maximum length of 8000 characters. |
| Varchar(max) | Variable-length non-Unicode data. Maximum length of 2^31 - 1 characters (SQL Server 2005 and later). |
| Text | Variable-length non-Unicode data. Maximum length of 2,147,483,647 characters. |

Unicode Character String Data Types
Unicode data types are used to store characters from any language, supporting a wider variety of characters. These are given in the table below.

| Data Type | Description |
|---|---|
| Nchar | Fixed-length Unicode characters. Maximum length of 4000 characters. |
| Nvarchar | Variable-length Unicode characters. Maximum length of 4000 characters. |
| Nvarchar(max) | Variable-length Unicode data. Maximum length of 2^31 - 1 characters (SQL Server 2005 and later). |

3. Date and Time Data Type
SQL provides several data types for storing date and time information. They are essential for managing timestamps, events and time-based queries. These are given in the below table.

| Data Type | Description | Storage Size |
|---|---|---|
| DATE | Stores a date (year, month, day) | 3 bytes |
| TIME | Stores a time (hour, minute, second) | 3 bytes |
| DATETIME | Stores both date and time (year, month, day, hour, minute, second) | 8 bytes |

4. Binary Data Types in SQL
Binary data types are used to store binary data such as images, videos or other file types. These include:

| Data Type | Description | Max Length |
|---|---|---|
| Binary | Fixed-length binary data | 8000 bytes |
| VarBinary | Variable-length binary data | 8000 bytes |
| Image | Variable-length binary data, such as images | 2,147,483,647 bytes |

5. Boolean Data Type in SQL
The BOOLEAN data type stores logical values, typically TRUE or FALSE. It is commonly used for flag fields or binary conditions.

Special Data Types SQL also supports some specialized data types for advanced use cases:

XML Data Type: Used to store XML data and manipulate XML structures in the database
Spatial Data Type (Geometry): stores planar spatial data, such as points, lines, and polygons, in a database table.

SQL Operators

Types of SQL Operators
SQL operators can be categorized based on the type of operation they perform. Here are the primary types of SQL operators:

Arithmetic Operator
Comparison Operator
Logical Operator
Bitwise Operators
Compound Operators
Special Operators

SQL Arithmetic Operators
Arithmetic operators in SQL are used to perform mathematical operations on numeric data types in SQL queries. Common arithmetic operators are + (addition), - (subtraction), * (multiplication), / (division) and % (modulo).

Example: Arithmetic Operations
In this example, we calculate a 5% increment on employee salaries and return both the original and updated salary values.

Query:

SELECT emp_salary, emp_salary * 1.05 AS "Revised Salary" FROM employee;

SQL Comparison Operators
Comparison operators in SQL compare the value of one expression with another. SQL supports = (equal), <> or != (not equal), > (greater than), < (less than), >= (greater than or equal) and <= (less than or equal).

SELECT * FROM MATHS WHERE MARKS=50;

SQL Logical Operators
Logical operators in SQL combine or negate conditions so that queries can retrieve or modify data based on several criteria.

AND: Returns true when both Boolean expressions are true.
OR: Returns true when at least one of the Boolean expressions is true.
NOT: Takes a single Boolean expression and reverses its value, from false to true or from true to false.

SELECT * FROM employee WHERE emp_city = 'Allahabad' AND emp_country = 'India';

SQL Bitwise Operators
Bitwise operators in SQL are used to perform bitwise operations on binary values in SQL queries, manipulating individual bits to perform logical operations at the bit level. Some SQL Bitwise Operators are:

&: Bitwise AND operator
|: Bitwise OR operator
^: Bitwise XOR (exclusive OR) operator
~: Bitwise NOT (complement) operator
<<: Left shift operator
>>: Right shift operator

SQL Compound Operators
Compound operators combine an operation with assignment. These operators modify the value of a column and store the result in the same column in a single step. Some Compound operators are:

+=: Add and assign
-=: Subtract and assign
*=: Multiply and assign
/=: Divide and assign
%=: Modulo and assign
&=: Bitwise AND and assign
^=: Bitwise XOR and assign
|=: Bitwise OR and assign

SQL Special Operators
SQL also provides several special operators that serve specific functions such as filtering data based on a range, checking for existence, and comparing sets of values.

Operator Description
ALL
ALL compares a value to every value in the list of results from a subquery. It must be preceded by a comparison operator and evaluates to TRUE when the comparison holds for every value, which includes the case where the subquery returns no rows.

ANY
ANY compares a value to each value in the list of results from a subquery and evaluates to true if the comparison holds for at least one of them.

BETWEEN
The SQL BETWEEN operator tests an expression against a range. The range consists of a beginning expression, followed by the AND keyword and an end expression.

IN
The IN operator checks whether a value matches any value in a comma-separated set and retrieves the matching rows from the table.

EXISTS
EXISTS tests whether a subquery fetches at least one row. When the subquery returns no rows, the operator returns FALSE.

SOME
SOME evaluates the condition between the outer query and the subquery and evaluates to true if the condition holds for at least one row. If not, it evaluates to false.

UNIQUE
The UNIQUE operator searches every row of a specified table for uniqueness (no duplicates).
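A minimal sketch of BETWEEN, IN and EXISTS using Python's sqlite3 module. The employee rows and the `emp_name` column are hypothetical. ALL, ANY and SOME are left out because SQLite, which is used here, does not implement them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_name TEXT, emp_city TEXT, emp_salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("Ravi", "Allahabad", 30000),
        ("Meena", "Chennai", 45000),
        ("John", "London", 60000),
    ],
)

def names(where):
    """Return employee names matching the WHERE condition, sorted by name."""
    sql = f"SELECT emp_name FROM employee AS e WHERE {where} ORDER BY emp_name"
    return [r[0] for r in conn.execute(sql)]

# BETWEEN includes both ends of the range
in_range = names("e.emp_salary BETWEEN 30000 AND 45000")

# IN matches any value in the set
in_set = names("e.emp_city IN ('Chennai', 'London')")

# EXISTS: employees for whom some other employee earns more
has_higher_paid = names(
    "EXISTS (SELECT 1 FROM employee AS e2 WHERE e2.emp_salary > e.emp_salary)"
)

print(in_range, in_set, has_higher_paid)
```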

Data Analytics Week 2 Notes

Rajasekaran Palraj — Wed, 06 Aug 2025 16:56:47 +0000

Python

Programming Language Types

Low-level Language
- OS development
- Embedded system
- Device drivers
High-level Language

Ways of high-level Language Execution

Compiler
- C, C++, Java
Interpreter
- Executes code line by line: Python, Perl, MATLAB

Programming Language Components

Data types
Variables
Operators
Control Structures
Libraries

Python Features

Simple and Readable
Run seamlessly on all operating Systems
Data Science, Automation and AI
Library support
Used by big organizations like Google, Netflix and NASA

Data Types:

1. Numeric Data Types

Integers - This value is represented by int class. It contains positive or negative whole numbers (without fractions or decimals). In Python, there is no limit to how long an integer value can be.
Float - This value is represented by the float class. It is a real number with a floating-point representation. It is specified by a decimal point. Optionally, the character e or E followed by a positive or negative integer may be appended to specify scientific notation.
Complex Numbers - A complex number is represented by a complex class. It is specified as (real part) + (imaginary part)j . For example - 2+3j

2. Sequence Data Types

String - immutable sequence of Unicode characters
List - ordered collection of data with similar or different data types (mutable)
Tuple - ordered collection of data with similar or different data types (immutable)

3. Boolean Data Type
A data type that takes one of two built-in values:

True
False

4. Set Data Type
It is an unordered collection of data types that is iterable, mutable, and has no duplicate elements.

5. Dictionary Data Type
A dictionary in Python is a collection that stores data like a map. Unlike other Python data types, which hold only a single value per element, a dictionary holds key: value pairs. Looking up a value by its key is what makes a dictionary efficient.
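The data types listed above can be inspected with the built-in type() function. The sample values below are arbitrary and only meant as a quick sketch.

```python
# One sample value per data type described above
values = [42, 3.14, 2 + 3j, "hello", [1, 2], (1, 2), True, {1, 2}, {"a": 1}]
type_names = [type(v).__name__ for v in values]
print(type_names)

# Integers have no fixed size limit
big = 2 ** 100
print(big)

# A set drops duplicate elements
unique = {1, 2, 2, 3}
print(unique)

# A tuple is immutable: assigning to an item raises TypeError
t = (1, 2)
try:
    t[0] = 9
    tuple_is_immutable = False
except TypeError:
    tuple_is_immutable = True
print(tuple_is_immutable)
```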

Operators

Operators	Type
+, -, *, /, %	Arithmetic operator
<, <=, >, >=, ==, !=	Relational operator
and, or, not	Logical operator
&, |, <<, >>, ~, ^	Bitwise operator
=, +=, -=, *=, /=, %=	Assignment operator
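A quick sketch of each operator group from the table, using the arbitrary values 6 and 3:

```python
a, b = 6, 3  # 6 is 110 in binary, 3 is 011

# Arithmetic operators (note that / always returns a float)
arithmetic = (a + b, a - b, a * b, a / b, a % b)

# Relational operators return booleans
relational = (a < b, a >= b, a == b, a != b)

# Logical operators are lowercase keywords in Python
logical = (a > 5 and b > 5, a > 5 or b > 5, not a > 5)

# Bitwise operators work on the binary representation
bitwise = (a & b, a | b, a ^ b, ~a, a << 1, a >> 1)

# Assignment operator: a += b is shorthand for a = a + b
a += b

print(arithmetic, relational, logical, bitwise, a)
```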

Flow Control Statements:

Conditional Statements (if, elif, else, match-case)
Looping Statements (for, while)
Loop Control Statements (break, continue, pass)
Exception Handling

Conditional statements
IF
`age = 20

if age >= 18:
    print("Eligible to vote.")`

ELIF and Else
`age = 25

if age <= 12:
    print("Child.")
elif age <= 19:
    print("Teenager.")
elif age <= 35:
    print("Young adult.")
else:
    print("Adult.")`

Nested IF

`age = 70
is_member = True

if age >= 60:
    if is_member:
        print("30% senior discount!")
    else:
        print("20% senior discount.")
else:
    print("Not eligible for a senior discount.")`

Ternary Conditional Statement in Python
A ternary conditional statement is a compact way to write an if-else condition in a single line. It’s sometimes called a "conditional expression."

`# Assign a value based on a condition
age = 20
s = "Adult" if age >= 18 else "Minor"

print(s)`

Match-Case Statement in Python
The match-case statement (available from Python 3.10) is Python's version of the switch-case found in other languages. It allows us to match a variable's value against a set of patterns.

`number = 2

match number:
    case 1:
        print("One")
    case 2 | 3:
        print("Two or Three")
    case _:
        print("Other number")`

For Loop

The for loop is used to iterate over a sequence (e.g., a list, tuple, string, or range) and execute a block of code for each item in the sequence.

for i in range(5): print(i)

While Loop
The while loop is used to repeatedly execute a block of code as long as a specified condition is true.

count = 0
while count < 5:
    print(count)
    count += 1

Nested Loops
Loops can be nested within one another to perform more complex iterations. For example, a for loop can be nested inside another for loop to create a two-dimensional iteration.
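As a small sketch of two-dimensional iteration, the nested loops below build a 3 x 3 multiplication table. The table size is an arbitrary choice.

```python
table = []
for i in range(1, 4):          # outer loop: one pass per row
    row = []
    for j in range(1, 4):      # inner loop: runs fully for every i
        row.append(i * j)
    table.append(row)

for row in table:
    print(row)
```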

Python For Loop with String
This code uses a for loop to iterate over a string and print each character on a new line. The loop assigns each character to the variable i and continues until all characters in the string have been processed.

s = "Rajasekaran"
for i in s:
    print(i)

Using range() with For Loop
The range() function is commonly used with for loops to generate a sequence of numbers. It can take one, two, or three arguments:

range(stop): Generates numbers from 0 to stop-1.
range(start, stop): Generates numbers from start to stop-1.
range(start, stop, step): Generates numbers from start to stop-1, incrementing by step.

for i in range(0, 10, 2): print(i)

Output

0 2 4 6 8

List

A list is a built-in, dynamically sized array (it grows and shrinks automatically). We can store all types of items (including another list) in a list. A list may contain items of mixed types. This is possible because a list mainly stores references at contiguous locations, while the actual items may be stored at different locations.

Lists can contain duplicate items.
Lists in Python are mutable. Hence, we can modify, replace or delete the items.
Lists are ordered. They maintain the order of elements based on how they are added.
Items in a list can be accessed directly by their position (index), starting from 0.

`# Creating a Python list with different data types
a = [10, 20, "GfG", 40, True]
print(a)

# Accessing elements using indexing
print(a[0]) # 10
print(a[1]) # 20
print(a[2]) # "GfG"
print(a[3]) # 40
print(a[4]) # True

# Checking types of elements
print(type(a[2])) # str
print(type(a[4])) # bool`

Create List:
`# List of integers
a = [1, 2, 3, 4, 5]

# List of strings
b = ['apple', 'banana', 'cherry']

# Mixed data types
c = [1, 'hello', 3.14, True]

print(a)
print(b)
print(c)`

Adding Elements into List
We can add elements to a list using the following methods:

append(): Adds an element at the end of the list.
extend(): Adds multiple elements to the end of the list.
insert(): Adds an element at a specific position.

# Initialize an empty list
a = []

# Adding 10 to end of list
a.append(10)  
print("After append(10):", a)  

# Inserting 5 at index 0
a.insert(0, 5)
print("After insert(0, 5):", a) 

# Adding multiple elements  [15, 20, 25] at the end
a.extend([15, 20, 25])  
print("After extend([15, 20, 25]):", a)

Updating Elements into List
We can change the value of an element by accessing it using its index.

`a = [10, 20, 30, 40, 50]

# Change the second element
a[1] = 25
print(a)`

Removing Elements from List
We can remove elements from a list using:

remove(): Removes the first occurrence of an element.
pop(): Removes the element at a specific index or the last element if no index is specified.
del statement: Deletes an element at a specified index.

`a = [10, 20, 30, 40, 50]

# Removes the first occurrence of 30
a.remove(30)
print("After remove(30):", a)

# Removes the element at index 1 (20)
popped_val = a.pop(1)
print("Popped element:", popped_val)
print("After pop(1):", a)

# Deletes the first element (10)
del a[0]
print("After del a[0]:", a)`

Iterating Over Lists
We can iterate the Lists easily by using a for loop or other iteration methods. Iterating over lists is useful when we want to do some operation on each item or access specific items based on certain conditions. Let's take an example to iterate over the list using for loop.

`a = ['apple', 'banana', 'cherry']

# Iterating over the list
for item in a:
    print(item)`

Nested Lists in Python
A nested list is a list within another list, which is useful for representing matrices or tables. We can access nested elements by chaining indexes.

`matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]

# Access element at row 2, column 3
print(matrix[1][2])`

output

6

List Slicing
List slicing is a fundamental Python concept that lets us easily access a specific range of elements in a list.

`a = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Get elements from index 1 to 4 (excluded)
print(a[1:4])`

output

[2, 3, 4]

Python List Slicing Syntax
list_name[start : end : step]

Parameters:

start (optional): Index to begin the slice (inclusive). Defaults to 0 if omitted.
end (optional): Index to end the slice (exclusive). Defaults to the length of list if omitted.
step (optional): Step size, specifying the interval between elements. Defaults to 1 if omitted

`a = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Get all elements in the list
print(a[::])
print(a[:])`
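A few more slices of the same list, as a sketch of how the optional start, end and step parameters combine. Negative indexes count from the end of the list.

```python
a = [1, 2, 3, 4, 5, 6, 7, 8, 9]

first_three = a[:3]     # start omitted: begins at index 0
from_index_6 = a[6:]    # end omitted: runs to the end of the list
every_second = a[::2]   # step of 2
last_two = a[-2:]       # negative start counts from the end
reversed_copy = a[::-1] # negative step walks the list backwards

print(first_three, from_index_6, every_second, last_two, reversed_copy)
```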

Dictionaries

A Python dictionary is a data structure that stores values as key: value pairs. Values in a dictionary can be of any data type and can be duplicated, whereas keys can't be repeated and must be immutable.

d = {1: 'Raj', 2: 'Ram', 3: 'Rani'}

Adding and Updating Dictionary Items
We can add new key-value pairs or update existing keys by using assignment.

`d = {1: 'Raj', 2: 'Ram', 3: 'Rani'}

# Adding a new key-value pair
d["age"] = 32

# Updating an existing value
d[1] = "Sekar"

print(d)`

Output

{1: 'Sekar', 2: 'Ram', 3: 'Rani', 'age': 32}

Removing Dictionary Items
We can remove items from dictionary using the following methods:

del: Removes an item by key.
pop(): Removes an item by key and returns its value.
clear(): Empties the dictionary.
popitem(): Removes and returns the last key-value pair.

d = {1: 'Raj', 2: 'Ram', 3: 'Sekar', 'age':22}

# Using del to remove an item
del d["age"]
print(d)

# Using pop() to remove an item and return the value
val = d.pop(1)
print(val)

# Using popitem() to remove and return
# the last key-value pair.
key, val = d.popitem()
print(f"Key: {key}, Value: {val}")

# Clear all items from the dictionary
d.clear()
print(d)

Nested Dictionaries

Example of Nested Dictionary:

d = {1: 'Raj', 2: 'Sekaran',
     3: {'A': '1', 'B': 'To', 'C': '3'}}

print(d)
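Values inside a nested dictionary are reached by chaining keys, much like chaining indexes on a nested list. A short sketch using the dictionary above; the added key 'D' and the looked-up key 'Z' are made up for illustration.

```python
d = {1: 'Raj', 2: 'Sekaran',
     3: {'A': '1', 'B': 'To', 'C': '3'}}

# Chain the outer key and the inner key
inner_b = d[3]['B']

# Add a new pair to the inner dictionary
d[3]['D'] = '4'

# get() returns a default instead of raising KeyError for a missing key
missing = d[3].get('Z', 'not found')

inner_keys = list(d[3])
print(inner_b, missing, inner_keys)
```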

Data Analytics Week 1 Notes

Rajasekaran Palraj — Wed, 06 Aug 2025 14:40:28 +0000

Data Analytics

Data analytics is the process of examining, cleaning, transforming and interpreting data to discover useful information, draw conclusions and support decision-making. It helps businesses and organizations understand their data better, identify patterns, solve problems and improve overall performance.

**How data analytics works:**

Define Objectives
Data Collection
Data Cleaning & Data Processing
Analysis Techniques
Interpretation of Results
Visualization and Communication

Technical Skills for Data Analytics

SQL
Statistics
Excel
Data Visualization
Programming Languages
Mathematics

Data

**What is data?**

Data is distinct pieces of information, usually formatted in a special way.

**Data can be generated by:**
Humans
Machines
Human-machine combinations

Why is data so important?

Data helps in making better decisions.
Data helps in solving problems by finding the reason for underperformance.
Data helps one to evaluate performance.
Data helps one improve processes.
Data helps one understand consumers and the market.

Categories of Data
Data can be categorized into two main parts -

Structured data
Unstructured data

Excel

Ribbon: The toolbar at the top of Excel which is divided into tabs that group related tools and commands.
Formula Bar: Displays the contents or formulas of the active cell and allows editing.
Worksheet Area: The grid of rows and columns where we enter and manipulate our data.
Navigation Tools: Sheet tabs at the bottom, scroll bars and zoom controls for easy movement within our workbook.

All Tabs in Excel Ribbon
The Ribbon is divided into several tabs, each serving a different purpose:
Home Tab: The most commonly used tab for everyday tasks. It includes formatting options (font, alignment, colors), number formatting, table styles, cell editing and basic tools like AutoSum and Sort & Filter.
Insert Tab: Lets us add tables, charts, images, shapes, hyperlinks, text boxes, symbols and sparklines to make our worksheet more visual and interactive.
Page Layout Tab: Controls how our worksheet looks when printed. Adjust themes, margins, page orientation, size, print area and background here.
Formulas Tab: The central place to create and manage formulas. It includes the Function Library for logical, text, date, math and financial functions, plus named ranges and formula auditing tools.
Data Tab: Helps us import, organize and analyze data. Key features include Get & Transform (Power Query), Sort & Filter, Data Validation, Flash Fill, Text to Columns and Forecast tools.

Formulas:
=SUM(Arg1: Arg2): Finds the sum of all the numeric data in the given range.
=COUNT(Arg1: Arg2): Counts the cells in the given range that contain numbers (other cells are ignored).
=MAX(Arg1: Arg2): Finds the maximum number in the given range.
=MIN(Arg1: Arg2): Finds the minimum number in the given range.
=TODAY(): Returns today's date.
=SQRT(Arg1): Finds the square root of the value in the specified cell.

What is Workbook?
A workbook is a collection of worksheets (spreadsheets). Workbooks are your Excel files. You'll need to create a new workbook every time you start a new project in Excel. There are various ways to begin working with an Excel workbook: you can start from scratch, use a pre-designed template or open an existing workbook.

Characteristics of a good worksheet
An appealing worksheet should have specific titles that indicate what it is about as well as pictures or other clipart to draw some attention.
It is important that the paper and the writing are in good contrast so that eye strain is minimized.
Users should be able to do the worksheet independently by following the directions with examples.
Although the worksheet should illustrate a pattern in problem solving or usage, it shouldn't labor an idea to the point of tedium.
Once the worksheet has been completed, the user should be able to explain how it was formed or what it was meant to teach: the answer to this question should be linked to the worksheet's title.

Important functions

Create table (Home > Styles > Format as Table)
Count nulls (apply =COUNTBLANK() to the selected columns)
Sorting (select data > Data tab > Sort)
Filter
- Data between Filter
- Number greater than or equals filter
- Clear Filter
Total Row function
Conditional Formatting
- Select Range
- Choose Formatting
- Data Bars
- Color Scales
- Manage Rule
- Top/Bottom Rules
If Function
SumIf/SumIfs
- Range
- Criteria
- Sum Range
AverageIf/AverageIfs

Chart Types

Understanding Your Data and Visualization Goals
Before choosing a chart, it's essential to understand the nature of your data and the message you want to convey. To do so, consider the following questions:

What is the primary purpose of the visualization?
What type of data are you working with?
How many variables are you visualizing?
What is the volume of data points?

1. Comparison Charts

Bar Chart: Compares categories using horizontal bars (e.g., product sales).
Column Chart: Uses vertical bars, great for time-based or hierarchical data.
Radar Chart: Displays multiple variables in a circular layout, ideal for comparing performance across categories

2. Trend Charts

Line Chart: Shows trends over time with connected points (e.g., monthly sales).
Area Chart: Like a line chart but fills the space below, showing cumulative totals.
Waterfall Chart: Displays how increases and decreases affect a total (e.g., profit changes).

3. Relationship Charts

Scatter Chart: Plots two variables to show their relationship or trend.
Bubble Chart: Adds size and color to scatter plots for more variable comparison.
Heatmap: Uses color in grids to reveal patterns, density or relationships in data.

4. Distribution Charts

Histogram: Shows frequency of values within ranges.
Box Plot: Displays data spread, median and outliers.
KDE Plot: Smooth curve showing data distribution shape.
Violin Plot: Combines box plot and density curve to compare group distributions

5. Composition Charts

Pie Chart: Best for small datasets; shows each part's share of the total.
Stacked Bar Chart: Compares parts across categories within bars.
Treemap: Uses nested boxes sized by value to show part-to-whole relationships.

6. Geographical Charts

Choropleth Map: Uses color to show values (like population or income) across regions.
Cartogram: Resizes regions based on data values (e.g., population), distorting geography for emphasis.

Handle Null in Elasticsearch Aggregation

Rajasekaran Palraj — Tue, 05 Aug 2025 07:16:06 +0000

Sample Data

{"system": "aaa","ip":"0.0.0.1"}, {"system": "bbb","ip":"0.0.0.2"}, {"system": null,"ip":"0.0.0.3"}, {"ip":"0.0.0.4"}

Basic aggregation query

{ "aggs" : { "myAggrs" : { "terms" : { "field" : "system" } } } }

Output for above:

{ "key": "aaa", "doc_count": 1 }, { "key": "bbb", "doc_count": 1 }

The terms aggregation ignores documents in which the field is null or missing.

The expected result is

{ "key": "aaa", "doc_count": 1 }, { "key": "bbb", "doc_count": 1 }, { "key": null, "doc_count": 2 }

To solve this, we have to change the query like this,
{ "aggs" : { "myAggrs" : { "terms" : { "field" : "system", "missing" : "NULL" } } } }

We will get below
{ "key": "aaa", "doc_count": 1 }, { "key": "bbb", "doc_count": 1 }, { "key": "NULL", "doc_count": 2 }
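The effect of the missing parameter can be imitated in plain Python to make the bucketing rule explicit. This is only a sketch of the counting logic over the sample data above. It does not call Elasticsearch, and the function name `terms_counts` is hypothetical.

```python
from collections import Counter

docs = [
    {"system": "aaa", "ip": "0.0.0.1"},
    {"system": "bbb", "ip": "0.0.0.2"},
    {"system": None, "ip": "0.0.0.3"},  # field present but null
    {"ip": "0.0.0.4"},                  # field missing entirely
]

def terms_counts(docs, field, missing=None):
    """Count documents per field value, imitating a terms aggregation.

    Null and absent fields are skipped unless a `missing` bucket name
    is given, in which case both are counted under that name.
    """
    counts = Counter()
    for doc in docs:
        value = doc.get(field)  # None for both null and absent fields
        if value is None:
            if missing is None:
                continue
            value = missing
        counts[value] += 1
    return dict(counts)

without_missing = terms_counts(docs, "system")
with_missing = terms_counts(docs, "system", missing="NULL")
print(without_missing)
print(with_missing)
```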