DEV Community: Rajiv Sambasivan

From ML Tooling to Analytical Governance: Recent Updates to KMDS

Rajiv Sambasivan — Wed, 17 Jun 2026 04:32:36 +0000

Over the last few months I've been refining KMDS, a framework for building repeatable and auditable machine learning systems.

The original motivation behind KMDS was simple:

Many machine learning projects fail long before model selection becomes important.

Teams struggle with questions such as:

What entities are represented in the data?
What is the unit of analysis?
What temporal structure exists?
Which feature engineering strategies are appropriate?
Which modeling assumptions were made?
How are these decisions preserved over time?

Most organizations answer these questions at some point. The problem is that the answers often disappear into notebooks, documents, tickets, or the memories of individual contributors.

KMDS is an attempt to make these decisions explicit, structured, and reusable.

What Changed?

Recent updates have focused on moving beyond workflow automation and toward analytical governance.

1. Metadata-Driven Semantic Data Understanding

The workflow begins with semantic tagging and metadata generation.

Rather than immediately building features or training models, the system first attempts to understand:

attribute types
entities
temporal structure
data quality characteristics

The goal is to establish a semantic foundation before modeling begins.
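To make the idea concrete, here is a minimal sketch of what a semantic tagging pass could look like. It is not the KMDS implementation: the function names, the type labels, and the "half the rows" cutoff for categorical columns are all assumptions chosen for illustration, and it uses only the Python standard library.

```python
from datetime import datetime


def infer_attribute_type(values):
    """Guess a coarse semantic type for one column of raw string values."""
    present = [v for v in values if v not in ("", None)]
    if not present:
        return "empty"

    def all_parse(parser):
        # True only if every non-missing value parses without error.
        try:
            for v in present:
                parser(v)
            return True
        except (ValueError, TypeError):
            return False

    if all_parse(float):
        return "numeric"
    if all_parse(datetime.fromisoformat):
        return "temporal"
    # Assumed heuristic: few distinct values relative to rows means categorical.
    if len(set(present)) <= max(1, len(present) // 2):
        return "categorical"
    return "text"


def describe_column(name, values):
    """Build a small metadata record: type plus basic quality indicators."""
    missing = sum(1 for v in values if v in ("", None))
    return {
        "name": name,
        "type": infer_attribute_type(values),
        "missing_fraction": missing / len(values) if values else 0.0,
        "distinct": len({v for v in values if v not in ("", None)}),
    }


# Hypothetical sample data, one list of raw strings per column.
rows = {
    "order_id": ["1001", "1002", "1003", "1004"],
    "order_date": ["2026-01-05", "2026-01-06", "", "2026-01-08"],
    "region": ["east", "west", "east", "east"],
    "amount": ["19.99", "5.00", "42.50", "7.25"],
}
metadata = [describe_column(name, values) for name, values in rows.items()]
for record in metadata:
    print(record)
```

Note that the sketch tags order_id as numeric even though it is really an entity identifier. That gap is exactly why a semantic layer needs more than type sniffing.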

2. Feature Advisor

One of the new additions is a Feature Advisor service.

Given metadata and project context, the advisor recommends feature engineering strategies for non-numeric attributes.

Examples include:

hierarchical categorical encoding
target encoding strategies
TF-IDF pipelines
sentence embedding approaches
native model handling for modern gradient boosting systems

The objective is not automatic feature engineering.

The objective is to provide design guidance and rationale that helps practitioners make better decisions.
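A rough sketch of the kind of guidance such an advisor could return is shown below. It is a hand-written rule table, not the actual Feature Advisor service: the function name, the metadata keys, the thresholds (20 distinct values, 50 tokens), and the one-hot fallback are assumptions made for illustration.

```python
def recommend_strategy(attribute, model_family="linear"):
    """Return (strategy, rationale) for one non-numeric attribute.

    `attribute` is a hypothetical metadata record with a "type" key and
    optional "distinct", "hierarchy", and "avg_tokens" keys.
    """
    kind = attribute["type"]
    if kind == "text":
        # Assumed cutoff: longer free text is routed to embeddings.
        if attribute.get("avg_tokens", 0) > 50:
            return ("sentence embeddings",
                    "long free text where meaning matters more than exact terms")
        return ("TF-IDF", "short text where term weights are cheap and inspectable")
    if kind == "categorical":
        if model_family == "gradient_boosting":
            return ("native model handling",
                    "many gradient boosting libraries accept categorical columns directly")
        if attribute.get("hierarchy"):
            return ("hierarchical categorical encoding",
                    "levels can share information through the parent category")
        if attribute.get("distinct", 0) > 20:
            return ("target encoding",
                    "high cardinality makes one column per level impractical")
        return ("one-hot encoding", "low cardinality, so indicator columns stay manageable")
    return ("no recommendation", "type is not covered by this sketch")


# Hypothetical attribute metadata.
attributes = [
    {"name": "product_category", "type": "categorical", "distinct": 350,
     "hierarchy": ["department", "category"]},
    {"name": "zip_code", "type": "categorical", "distinct": 900},
    {"name": "review_text", "type": "text", "avg_tokens": 120},
]
recommendations = {a["name"]: recommend_strategy(a) for a in attributes}
for name, (strategy, rationale) in recommendations.items():
    print(f"{name}: {strategy} ({rationale})")
```

Returning the rationale alongside the strategy keeps the output advisory: the practitioner sees why a strategy was suggested and can overrule it.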

3. Design Governance

A second addition is a Design Governance framework.

Machine learning projects contain many decision points:

classification vs regression
handling class imbalance
interpretability vs predictive performance
validation strategy
calibration requirements
graph-based vs tabular approaches

The Design Governance layer acts as a design-time advisor that captures these considerations and generates implementation guidance.

The output is a structured design blueprint that can be reviewed by humans or supplied to AI coding assistants.
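One possible shape for such a blueprint is sketched below using standard-library dataclasses serialized to JSON. The class names, fields, and the sample project are hypothetical. The actual KMDS blueprint format may differ.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DesignDecision:
    topic: str       # the decision point, e.g. "validation strategy"
    choice: str      # what was decided
    rationale: str   # why, in the team's own words


@dataclass
class DesignBlueprint:
    project: str
    decisions: list = field(default_factory=list)

    def record(self, topic, choice, rationale):
        """Append one design decision to the blueprint."""
        self.decisions.append(DesignDecision(topic, choice, rationale))

    def to_json(self):
        """Serialize to JSON for human review or an AI coding assistant."""
        return json.dumps(asdict(self), indent=2)


# Hypothetical project and decisions.
blueprint = DesignBlueprint(project="customer-churn")
blueprint.record("task framing", "classification",
                 "the business acts on a churn / no-churn flag")
blueprint.record("class imbalance", "class weights",
                 "churners are a small minority of customers")
blueprint.record("validation strategy", "time-based split",
                 "avoids training on data from after the prediction date")

print(blueprint.to_json())
```

Because the result is plain JSON, the same file can be attached to a design review and pasted into an assistant prompt without conversion.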

4. Knowledge Preservation

Perhaps the most important change is an increased emphasis on preserving analytical knowledge.

The long-term goal is not simply to create models.

It is to create reusable analytical assets.

Using KMDS tooling, project artifacts can be transformed into a knowledge graph representing:

data understanding
feature engineering decisions
modeling assumptions
operational considerations
generated artifacts

This creates a queryable representation of the analytical lifecycle.
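As an illustration of what "queryable" can mean here, the sketch below stores project knowledge as subject-predicate-object triples in plain Python. It is not the KMDS tooling: the class, the predicate names, and the node identifiers are assumptions, and a real deployment would more likely use a graph database or an RDF store.

```python
class KnowledgeGraph:
    """A tiny in-memory triple store: (subject, predicate, object)."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def query(self, subject=None, predicate=None, obj=None):
        """Return matching triples, sorted; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


# Hypothetical project facts spanning data, features, and modeling.
kg = KnowledgeGraph()
kg.add("attribute:region", "has_type", "categorical")
kg.add("feature:region_encoded", "derived_from", "attribute:region")
kg.add("feature:region_encoded", "uses_strategy", "target encoding")
kg.add("model:churn_v1", "uses_feature", "feature:region_encoded")
kg.add("model:churn_v1", "assumes", "classes are imbalanced")

# What do we know about the model?
for triple in kg.query(subject="model:churn_v1"):
    print(triple)

# Which features trace back to the region attribute?
print(kg.query(predicate="derived_from", obj="attribute:region"))
```

The second query shows the payoff: a question such as "what depends on this attribute?" becomes a lookup instead of a search through notebooks and tickets.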

Why This Matters

Most organizations already have documentation.

What they often lack is accessible institutional knowledge.

Critical analytical decisions are frequently distributed across:

repositories
notebooks
presentations
tickets
email threads
individual contributors

When people leave, much of that context leaves with them.

My view is that the real asset is not the agent.

The real asset is the structured analytical knowledge that the agent can access.

If the knowledge is preserved independently of any specific model, tool, or LLM, organizations retain ownership of their analytical reasoning and can recreate capabilities as technology evolves.

Current Direction

The broader goal of KMDS is to make machine learning systems:

more transparent
more auditable
more reproducible
easier to transfer between teams

Recent work has focused on feature governance, design governance, metadata-driven workflows, and knowledge graph generation.

Future work will continue exploring how analytical context can be captured and preserved as a first-class artifact rather than an afterthought.

I would be interested in hearing how others are approaching analytical governance, reproducibility, and knowledge preservation in their own machine learning workflows.

New Features with TSEDA - get the most out of your time series data.

Rajiv Sambasivan — Thu, 07 May 2026 07:37:12 +0000

Update: Automating Time Series Exploration with tseda 📈

A while back, I shared tseda, a tool designed to help you make sense of high-frequency business metrics (like hourly conversion rates or service windows).
Since then, I’ve been working on making the transition from "collecting data" to "understanding data" even faster.

What’s New in tseda?

Automatic Window Management: You no longer have to guess your window sizes. The tool now handles automatic window size assignment and refinement, finding the "signal" in your data without the trial and error.
Notebook Parity: You can now move seamlessly between the tool and Jupyter notebooks. Keep your flow state intact while switching from visual exploration to deep-dive coding.

Why use it?

If you have data at an hourly or greater cadence, you’re likely looking for two things: Forecasting and Anomaly Detection. tseda is built to help you build better apps by actually understanding the underlying patterns of those metrics.

Get Started (or Catch Up):

New README & Docs: github.com/rajivsam/tseda
User Guide: Step-by-step instructions
Video Overview: AI-generated summary

I’m looking for feedback from anyone monitoring metrics at a high cadence. How are you currently handling window refinements? Let’s discuss in the comments!


KMDS with New Features

Rajiv Sambasivan — Mon, 27 Apr 2026 06:45:53 +0000

A little while ago I developed KMDS, a Python package that helps small data science teams communicate the rationale and motivation behind the decisions they make while developing and modeling data science projects. The package has now been upgraded. You can enter your observations in natural language, and the package tags them appropriately based on a data science project ontology. Natural language search is also available, so you can query the tool in natural language as well.
The updated repository, with examples, is available here:
https://github.com/rajivsam/kmds
Thank you
Rajiv

Explore Your Time Series Data

Rajiv Sambasivan — Mon, 27 Apr 2026 06:40:44 +0000

Do you collect business data at an hourly or greater cadence, for example site conversion rate per day or average service time for the 10 am to 11 am window? Do you want to understand this data well enough to build better apps on top of it, for example to forecast the metric you are monitoring for the next business period or to judge whether a particular value is anomalous?
If this is of interest to you, check out tseda, a tool to explore and understand your time series data:
https://github.com/rajivsam/tseda
There is a video (AI summary) here:
https://www.youtube.com/watch?v=baoJrIpSTE8
There is a user guide here:
https://github.com/rajivsam/tseda/blob/main/docs/user_guide.md
I am happy to answer questions and discuss how you can use this if you are collecting a metric at an hourly cadence or higher.
I am making a version of this for high-frequency sensor data.
The technical motivation is available here: https://rajivsam.github.io/r2ds-blog/posts/markov_analysis_coffee_prices/

Descriptive Analytics

Rajiv Sambasivan — Wed, 04 Feb 2026 03:12:11 +0000

Descriptive Analytics is a repository of recipes for descriptive analysis of enterprise data. It is a work in progress. Work is ongoing to integrate generative AI tools so that they can be combined with conventional ML analysis techniques.
For an overview, see https://www.youtube.com/watch?v=MwXKC_oloH8. The repository is at
https://github.com/rajivsam/descriptive_analytics

KMDS, a package for knowledge management in data science

Rajiv Sambasivan — Thu, 18 Apr 2024 08:17:31 +0000

KMDS is a tool that solves a problem that frustrates most people doing data analysis. You run into a design question and you know you have dealt with it before, but you cannot recreate the context, the question, the research, or the rationale for the solution you picked last time. Here is a new release of the tool with examples of how you use it for both machine learning and analytics workflows. See this document. Here is a short video description.