NetBenefits Page Prioritization Engine
Feature Reduction + Scoring Framework Documentation
1. Project Context
This project is not a predictive ML model.
There is:
- No target variable (no Y such as conversion or success)
- No supervised training
Instead, this system is a multi-metric prioritization engine that ranks PAGE_GROUPs using:
- Behavioral volume
- Engagement
- Digital friction
- Operational cost
- Customer experience
Each metric is normalized and combined using configurable weights to generate a RawPriority score.
Because this is a scoring framework, not ML:
- Adding more features does NOT improve accuracy
- Correlated metrics cause double counting
- Explainability becomes worse
- Friction signals get overweighted
Therefore, feature reduction is mandatory.
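The double-counting problem can be illustrated with a minimal sketch. The numbers below are synthetic, not project data: two near-duplicate friction metrics each receive an equal nominal weight, so the friction dimension effectively carries two thirds of the score instead of half.

```python
import numpy as np

def minmax(x):
    """Scale an array to [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())

# Synthetic, illustrative numbers only (not project data)
calls = np.array([10.0, 50.0, 90.0])       # friction calls per page
call_rate = np.array([0.11, 0.52, 0.93])   # near-duplicate friction signal
cei = np.array([80.0, 60.0, 40.0])         # customer experience (Top2Box)

# Equal nominal weights, but friction appears twice, so the friction
# dimension effectively carries 2/3 of the total weight instead of 1/2.
weights = {"calls": 1 / 3, "call_rate": 1 / 3, "cei": 1 / 3}
score_double = (weights["calls"] * minmax(calls)
                + weights["call_rate"] * minmax(call_rate)
                + weights["cei"] * minmax(100 - cei))

# Dropping the redundant metric restores a balanced friction/CX split
score_single = 0.5 * minmax(calls) + 0.5 * minmax(100 - cei)
```

This is exactly the inflation the reduction step is designed to prevent.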
2. Feature Reduction Strategies Evaluated
Before finalizing the reduction method, multiple approaches were considered.
Option 1 — Principal Component Analysis (PCA)
PCA was evaluated as a dimensionality reduction technique.
PCA attempts to transform features into latent components that maximize variance.
However, PCA was rejected for this project for the following technical reasons:
- PCA optimizes variance, not decision relevance
- PCA produces abstract components that cannot be mapped back cleanly to business concepts
- PCA blends reach, friction, cost, and CX into mixed vectors
- This system requires feature-level explainability
- There is no predictive objective or reconstruction goal
- Stakeholders must understand why a page is prioritized
After PCA, features become:
PC1 = 0.41*Visitors + 0.33*Calls − 0.28*CEI + ...
This makes it impossible to explain rankings in terms of:
- friction
- customer experience
- operational cost
Therefore PCA was deemed unsuitable.
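As a minimal sketch of the interpretability problem (synthetic data, not project data), the loadings PCA produces mix every input metric into each component:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-ins for page-level metrics (illustrative only)
visitors = rng.uniform(1_000, 50_000, 200)
calls = 0.02 * visitors + rng.normal(0, 100, 200)  # friction tracks reach
cei = rng.uniform(40, 90, 200)                     # CX, independent here
X = StandardScaler().fit_transform(np.column_stack([visitors, calls, cei]))

pca = PCA(n_components=2).fit(X)
# Each component row mixes every input metric, so a ranking built on PC1
# cannot be explained purely as "reach", "friction", or "CX".
print("PC1 loadings [visitors, calls, cei]:", pca.components_[0].round(2))
```

The loadings are coefficients on standardized inputs, which is precisely the kind of blended vector stakeholders cannot reason about.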
Option 2 — Correlation-Based Deterministic Reduction (Selected)
The final approach:
- Groups metrics by business meaning
- Applies correlation only inside groups
- Removes derived and redundant features
- Preserves one representative signal per experience dimension
This maintains:
- interpretability
- transparency
- auditability
- business alignment
3. Original Feature Set (10 Metrics)
Initial dataset contained:
- PG Visitors
- PG Visits
- PG Visits per Visitor
- PG Friction – # Calls within 7 days
- Call Rate
- Desktop Switch Rate
- PG Friction – Switch to Desktop within 7 days
- Avg AHT per Call
- CEI – Top2Box
- Ease of Use – Top2Box
4. Step 1 — Conceptual Grouping (Before Statistics)
Features were grouped by customer-experience meaning.
Reach / Volume
- PG Visitors
- PG Visits
Engagement
- PG Visits per Visitor
Friction / Escalation
- Friction Calls (7 days)
- Call Rate
- Desktop Switch Rate
- Desktop Switch (7 days)
- Avg AHT per Call
Customer Experience
- CEI Top2Box
- Ease of Use Top2Box
Correlation across groups is expected.
Correlation inside groups indicates redundancy.
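The grouping above can be expressed directly in code so that correlation matrices are produced per group. Column names here are assumed to match the source workbook:

```python
import pandas as pd

# Conceptual groups; column names are assumed to match the source workbook
GROUPS = {
    "reach": ["PG Visitors", "PG Visits"],
    "engagement": ["PG Visits per Visitor"],
    "friction": [
        "PG Friction - # Calls within 7 days",
        "Call Rate",
        "Desktop Switch Rate",
        "PG Friction - Switch to Desktop within 7 days",
        "Avg. AHT per call",
    ],
    "cx": ["CEI - Top2Box", "Ease of Use - Top2Box"],
}

def within_group_corr(df: pd.DataFrame) -> dict:
    """Correlation matrices computed only inside each conceptual group."""
    return {name: df[cols].corr()
            for name, cols in GROUPS.items() if len(cols) > 1}
```

Single-metric groups (engagement) are skipped, since a 1x1 correlation matrix carries no information.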
5. Step 2 — Correlation Analysis (Evidence)
The full correlation matrix was computed across all ten metrics for reference, but redundancy decisions were applied only within the conceptual groups above.
Correlation Code
```python
import pandas as pd

df = pd.read_excel("data.xlsx")

corr_cols = [
    "PG Visitors",
    "PG Visits",
    "PG Visits per Visitor",
    "PG Friction - # Calls within 7 days",
    "Call Rate",
    "Desktop Switch Rate",
    "PG Friction - Switch to Desktop within 7 days",
    "Avg. AHT per call",
    "CEI - Top2Box",
    "Ease of Use - Top2Box",
]

corr_matrix = df[corr_cols].corr()
corr_matrix
```
Visualization — Correlation Heatmap
```python
import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 8))
sns.heatmap(corr_matrix, annot=True, cmap="coolwarm")
plt.title("Feature Correlation Heatmap")
plt.show()
```
Result Placeholders
Insert correlation matrix screenshot here.
Insert heatmap screenshot here.
6. Correlation Findings
Reach
PG Visitors and PG Visits show strong correlation.
Engagement
Visits per Visitor is mathematically derived:
Visits per Visitor = Visits / Visitors
Friction
Strong mutual correlation observed between:
- Friction Calls
- Call Rate
- Desktop Switch metrics
- Avg AHT
These all represent digital failure escalation.
Customer Experience
CEI Top2Box and Ease of Use Top2Box are strongly correlated.
7. Feature Decisions (KEEP / DROP)
Reach
| Feature | Decision | Reason |
|---|---|---|
| PG Visitors | KEEP | Audience size |
| PG Visits | DROP | Redundant |
| Visits per Visitor | KEEP | Stickiness |
Friction
| Feature | Decision | Reason |
|---|---|---|
| Friction Calls | KEEP | Failure signal |
| Avg AHT | KEEP | Cost |
| Call Rate | DROP | Redundant |
| Desktop Switch Rate | DROP | Same concept |
| Desktop Switch 7d | DROP | Same concept |
Customer Experience
| Feature | Decision | Reason |
|---|---|---|
| CEI Top2Box | KEEP | Primary CX |
| Ease of Use | DROP | Correlated |
8. Final Reduced Feature Set
- PG Visitors
- Visits per Visitor
- Friction Calls – 7 days
- Avg AHT per Call
- CEI Top2Box
Each feature represents a unique experience dimension.
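The KEEP decisions reduce to a simple column selection. Column names are assumed to match the source workbook, with `PAGE_GROUP` retained as the identifier:

```python
import pandas as pd

# Final reduced set: one representative per experience dimension
KEEP_COLS = [
    "PAGE_GROUP",
    "PG Visitors",                          # reach
    "PG Visits per Visitor",                # engagement
    "PG Friction - # Calls within 7 days",  # friction
    "Avg. AHT per call",                    # operational cost
    "CEI - Top2Box",                        # customer experience
]

def reduce_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return only the retained metrics, in a stable order."""
    return df[KEEP_COLS].copy()
```

Dropped columns (PG Visits, Call Rate, the Desktop Switch metrics, Ease of Use) simply never enter the scoring stage.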
9. Why ML Feature Importance Was Not Used
- No Y exists
- No model trained
- Feature importance requires supervised learning
Instead:
- Correlation removes redundancy
- Concept grouping preserves meaning
- Weight normalization controls influence
10. Scoring Methodology
Each feature:
- Min–Max normalized
- User weighted (1–5)
- Weights normalized
- Combined:
RawPriority = Σ (NormalizedFeature × NormalizedWeight)
Customer satisfaction is inverted before normalization:
normalize(100 - Top2Box)
so that lower CX increases priority.
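The normalization, weighting, and inversion steps above can be sketched as a small helper. The column name "CEI - Top2Box" and the example weights are assumptions for illustration:

```python
import pandas as pd

def minmax(s: pd.Series) -> pd.Series:
    """Min-max normalize to [0, 1]; constant columns map to 0."""
    span = s.max() - s.min()
    return (s - s.min()) / span if span else s * 0.0

def raw_priority(df: pd.DataFrame, weights: dict) -> pd.Series:
    """RawPriority = sum(normalized feature * normalized weight)."""
    w = pd.Series(weights, dtype=float)
    w = w / w.sum()                      # normalize the user weights (1-5)
    total = None
    for col, weight in w.items():
        values = df[col]
        if col == "CEI - Top2Box":       # invert CX: low CX raises priority
            values = 100 - values
        part = weight * minmax(values)
        total = part if total is None else total + part
    return total
```

A page with high reach and low CX therefore lands at the top of the ranking, which is the intended behavior.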
11. Visualization Layer
Bar Chart (Top Pages)
```python
top = df.sort_values("PG Friction - # Calls within 7 days", ascending=False).head(10)
ax = top.plot(kind="barh", x="PAGE_GROUP", y="PG Friction - # Calls within 7 days")
ax.invert_yaxis()  # show the highest-friction page at the top
plt.show()
```
Insert bar chart screenshot.
Scatter + R²
```python
from sklearn.linear_model import LinearRegression

X = df[["PG Visitors"]]
Y = df["PG Friction - # Calls within 7 days"]
model = LinearRegression().fit(X, Y)
r2 = model.score(X, Y)  # R^2 of friction calls vs. visitors

plt.scatter(X, Y)
plt.plot(X, model.predict(X), color="red")
plt.title(f"Visitors vs. Friction Calls (R² = {r2:.2f})")
plt.show()
```
Insert scatter screenshot.
12. Streamlit Interactive Engine
Provides:
- multiple profiles
- real-time weighting
- ranked outputs
- scatter exploration
- raw metric inspection
Insert Streamlit UI screenshot.
Insert ranked table screenshot.
Insert scatter UI screenshot.
13. Result Interpretation
Observed behavior:
- Pages with high friction + low CEI rank highest
- Engagement alone does not dominate
- AHT materially shifts rankings
- Removing redundant metrics stabilizes results
This validates reduction logic.
14. One-Line Summary
Since there’s no Y, this is a scoring model. Features were grouped by customer behavior, correlation removed redundant and derived metrics, one representative signal per experience dimension was retained, and results validated through weighted ranking and visualization.
15. Full Runnable Code
Paste entire Streamlit code here.
# FULL APP CODE
Run:

```shell
streamlit run app.py
```
16. Folder Structure
```
project/
├── data.xlsx
├── app.py
├── documentation.md
└── screenshots/
```
Final Statement
Feature reduction was performed using conceptual grouping followed by correlation analysis inside each group. PCA was evaluated but rejected due to loss of interpretability. Derived and redundant metrics were removed. One representative feature per experience dimension was retained. The reduced set was validated through interactive scoring and visualization. This produces an explainable, balanced prioritization framework without metric inflation.