<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rajiv Sambasivan</title>
    <description>The latest articles on DEV Community by Rajiv Sambasivan (@rajivsam).</description>
    <link>https://dev.to/rajivsam</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1367373%2Ff1cbd62b-f45f-4bf3-bd4c-7e5b301ec32a.png</url>
      <title>DEV Community: Rajiv Sambasivan</title>
      <link>https://dev.to/rajivsam</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rajivsam"/>
    <language>en</language>
    <item>
      <title>From ML Tooling to Analytical Governance: Recent Updates to KMDS</title>
      <dc:creator>Rajiv Sambasivan</dc:creator>
      <pubDate>Wed, 17 Jun 2026 04:32:36 +0000</pubDate>
      <link>https://dev.to/rajivsam/-from-ml-tooling-to-analytical-governance-recent-updates-to-kmds-548n</link>
      <guid>https://dev.to/rajivsam/-from-ml-tooling-to-analytical-governance-recent-updates-to-kmds-548n</guid>
      <description>&lt;p&gt;Over the last few months I've been refining KMDS, a framework for building repeatable and auditable machine learning systems.&lt;/p&gt;

&lt;p&gt;The original motivation behind KMDS was simple:&lt;/p&gt;

&lt;p&gt;Many machine learning projects fail long before model selection becomes important.&lt;/p&gt;

&lt;p&gt;Teams struggle with questions such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What entities are represented in the data?&lt;/li&gt;
&lt;li&gt;What is the unit of analysis?&lt;/li&gt;
&lt;li&gt;What temporal structure exists?&lt;/li&gt;
&lt;li&gt;Which feature engineering strategies are appropriate?&lt;/li&gt;
&lt;li&gt;Which modeling assumptions were made?&lt;/li&gt;
&lt;li&gt;How are these decisions preserved over time?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most organizations answer these questions at some point. The problem is that the answers often disappear into notebooks, documents, tickets, or the memories of individual contributors.&lt;/p&gt;

&lt;p&gt;KMDS is an attempt to make these decisions explicit, structured, and reusable.&lt;/p&gt;

&lt;h2&gt;What Changed?&lt;/h2&gt;

&lt;p&gt;Recent updates have focused on moving beyond workflow automation and toward analytical governance.&lt;/p&gt;

&lt;h3&gt;1. Metadata-Driven Semantic Data Understanding&lt;/h3&gt;

&lt;p&gt;The workflow begins with semantic tagging and metadata generation.&lt;/p&gt;

&lt;p&gt;Rather than immediately building features or training models, the system first attempts to understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;attribute types&lt;/li&gt;
&lt;li&gt;entities&lt;/li&gt;
&lt;li&gt;temporal structure&lt;/li&gt;
&lt;li&gt;data quality characteristics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is to establish a semantic foundation before modeling begins.&lt;/p&gt;
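
&lt;p&gt;To make the idea concrete, here is a minimal sketch of profiling a single column before any modeling. It is not the KMDS implementation: the function names, the type labels, and the "half the values repeat" rule for categories are all assumptions made for illustration, and it expects raw string values.&lt;/p&gt;

```python
from datetime import datetime


def infer_attribute_type(values):
    """Guess a coarse semantic type for a column of raw string values."""
    present = [v for v in values if v not in ("", None)]
    if not present:
        return "empty"

    def all_parse(parser):
        # True when every non-missing value is accepted by the parser.
        try:
            for v in present:
                parser(v)
            return True
        except ValueError:
            return False

    if all_parse(float):
        return "numeric"
    if all_parse(datetime.fromisoformat):
        return "temporal"
    # Hypothetical rule: many repeated values suggests a category.
    if len(present) >= 2 * len(set(present)):
        return "categorical"
    return "text"


def profile_column(name, values):
    """Collect the inferred type plus two simple data quality counts."""
    present = [v for v in values if v not in ("", None)]
    return {
        "name": name,
        "type": infer_attribute_type(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }
```

&lt;p&gt;Calling profile_column on every column gives a small metadata record per attribute that later steps can read instead of re-inspecting the raw data.&lt;/p&gt;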

&lt;h3&gt;2. Feature Advisor&lt;/h3&gt;

&lt;p&gt;One of the new additions is a Feature Advisor service.&lt;/p&gt;

&lt;p&gt;Given metadata and project context, the advisor recommends feature engineering strategies for non-numeric attributes.&lt;/p&gt;

&lt;p&gt;Examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;hierarchical categorical encoding&lt;/li&gt;
&lt;li&gt;target encoding strategies&lt;/li&gt;
&lt;li&gt;TF-IDF pipelines&lt;/li&gt;
&lt;li&gt;sentence embedding approaches&lt;/li&gt;
&lt;li&gt;native model handling for modern gradient boosting systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The objective is not automatic feature engineering.&lt;/p&gt;

&lt;p&gt;The objective is to provide design guidance and rationale that helps practitioners make better decisions.&lt;/p&gt;
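
&lt;p&gt;As an illustration of one strategy from the list above, here is a minimal sketch of smoothed target encoding in plain Python. It is not the Feature Advisor's output or a KMDS API; the function name and the smoothing parameter are assumptions made for the example.&lt;/p&gt;

```python
from collections import defaultdict


def target_encode(categories, targets, smoothing=10.0):
    """Map each category to a smoothed mean of the target.

    Rare categories are pulled toward the global mean; `smoothing` acts
    like a count of pseudo-observations placed at that global mean.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    return {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }
```

&lt;p&gt;This is the kind of choice where the rationale matters: the mapping should be fitted on training data only, because it uses the target and would otherwise leak label information into validation rows.&lt;/p&gt;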

&lt;h3&gt;3. Design Governance&lt;/h3&gt;

&lt;p&gt;A second addition is a Design Governance framework.&lt;/p&gt;

&lt;p&gt;Machine learning projects contain many decision points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;classification vs regression&lt;/li&gt;
&lt;li&gt;handling class imbalance&lt;/li&gt;
&lt;li&gt;interpretability vs predictive performance&lt;/li&gt;
&lt;li&gt;validation strategy&lt;/li&gt;
&lt;li&gt;calibration requirements&lt;/li&gt;
&lt;li&gt;graph-based vs tabular approaches&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Design Governance layer acts as a design-time advisor that captures these considerations and generates implementation guidance.&lt;/p&gt;

&lt;p&gt;The output is a structured design blueprint that can be reviewed by humans or supplied to AI coding assistants.&lt;/p&gt;
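
&lt;p&gt;One way to picture such a blueprint is as a small structured record that serializes to JSON. The sketch below is hypothetical: the class names, the fields, and the sample decision are assumptions for illustration and do not reflect the actual KMDS schema.&lt;/p&gt;

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DesignDecision:
    topic: str       # the decision point, e.g. "validation strategy"
    choice: str      # the option that was selected
    rationale: str   # why it was selected
    alternatives: list = field(default_factory=list)


@dataclass
class DesignBlueprint:
    project: str
    decisions: list = field(default_factory=list)

    def to_json(self):
        # asdict recurses into nested dataclasses, so decisions serialize too.
        return json.dumps(asdict(self), indent=2)


# Illustrative values only; not taken from a real project.
blueprint = DesignBlueprint(project="churn-model")
blueprint.decisions.append(
    DesignDecision(
        topic="validation strategy",
        choice="time-based split",
        rationale="rows are ordered in time",
        alternatives=["k-fold"],
    )
)
```

&lt;p&gt;The JSON from blueprint.to_json() can be read by a reviewer or pasted into a prompt, which is what makes a plain structured format attractive here.&lt;/p&gt;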

&lt;h3&gt;4. Knowledge Preservation&lt;/h3&gt;

&lt;p&gt;Perhaps the most important change is an increased emphasis on preserving analytical knowledge.&lt;/p&gt;

&lt;p&gt;The long-term goal is not simply to create models.&lt;/p&gt;

&lt;p&gt;It is to create reusable analytical assets.&lt;/p&gt;

&lt;p&gt;Using KMDS tooling, project artifacts can be transformed into a knowledge graph representing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;data understanding&lt;/li&gt;
&lt;li&gt;feature engineering decisions&lt;/li&gt;
&lt;li&gt;modeling assumptions&lt;/li&gt;
&lt;li&gt;operational considerations&lt;/li&gt;
&lt;li&gt;generated artifacts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This creates a queryable representation of the analytical lifecycle.&lt;/p&gt;
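
&lt;p&gt;A minimal sketch of what "queryable" can mean, using plain subject-predicate-object triples. This does not use KMDS tooling; the query helper, the predicate names, and the sample facts are all hypothetical.&lt;/p&gt;

```python
def query(triples, subject=None, predicate=None, obj=None):
    """Return the triples that match every field that is not None."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


# Illustrative facts only; a real graph would be generated from artifacts.
triples = [
    ("customer_id", "is_a", "entity_key"),
    ("region", "encoded_with", "target_encoding"),
    ("target_encoding", "chosen_because", "high cardinality"),
    ("churn_model", "assumes", "rows are independent"),
]

# Which attributes have a recorded encoding decision?
encoded = query(triples, predicate="encoded_with")
```

&lt;p&gt;Following the result one more hop, with query(triples, subject="target_encoding"), recovers the recorded rationale for that decision.&lt;/p&gt;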

&lt;h2&gt;Why This Matters&lt;/h2&gt;

&lt;p&gt;Most organizations already have documentation.&lt;/p&gt;

&lt;p&gt;What they often lack is accessible institutional knowledge.&lt;/p&gt;

&lt;p&gt;Critical analytical decisions are frequently distributed across:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;repositories&lt;/li&gt;
&lt;li&gt;notebooks&lt;/li&gt;
&lt;li&gt;presentations&lt;/li&gt;
&lt;li&gt;tickets&lt;/li&gt;
&lt;li&gt;email threads&lt;/li&gt;
&lt;li&gt;individual contributors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When people leave, much of that context leaves with them.&lt;/p&gt;

&lt;p&gt;My view is that the real asset is not the agent.&lt;/p&gt;

&lt;p&gt;The real asset is the structured analytical knowledge that the agent can access.&lt;/p&gt;

&lt;p&gt;If the knowledge is preserved independently of any specific model, tool, or LLM, organizations retain ownership of their analytical reasoning and can recreate capabilities as technology evolves.&lt;/p&gt;

&lt;h2&gt;Current Direction&lt;/h2&gt;

&lt;p&gt;The broader goal of KMDS is to make machine learning systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;more transparent&lt;/li&gt;
&lt;li&gt;more auditable&lt;/li&gt;
&lt;li&gt;more reproducible&lt;/li&gt;
&lt;li&gt;easier to transfer between teams&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Recent work has focused on feature governance, design governance, metadata-driven workflows, and knowledge graph generation.&lt;/p&gt;

&lt;p&gt;Future work will continue exploring how analytical context can be captured and preserved as a first-class artifact rather than an afterthought.&lt;/p&gt;

&lt;p&gt;I would be interested in hearing how others are approaching analytical governance, reproducibility, and knowledge preservation in their own machine learning workflows.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>python</category>
      <category>mlops</category>
    </item>
    <item>
      <title>New Features with TSEDA - get the most out of your time series data.</title>
      <dc:creator>Rajiv Sambasivan</dc:creator>
      <pubDate>Thu, 07 May 2026 07:37:12 +0000</pubDate>
      <link>https://dev.to/rajivsam/new-features-with-tseda-get-the-most-out-of-your-time-series-data-405d</link>
      <guid>https://dev.to/rajivsam/new-features-with-tseda-get-the-most-out-of-your-time-series-data-405d</guid>
      <description>&lt;h2&gt;
  
  
  Update: Automating Time Series Exploration with tseda 📈
&lt;/h2&gt;

&lt;p&gt;A while back, I shared tseda, a tool designed to help you make sense of high-frequency business metrics (like hourly conversion rates or service windows).&lt;br&gt;
Since then, I’ve been working on making the transition from "collecting data" to "understanding data" even faster. &lt;/p&gt;

&lt;h2&gt;What’s New in tseda?&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Automatic Window Management: You no longer have to guess your window sizes. The tool now handles automatic window size assignment and refinement, finding the "signal" in your data without the trial and error.&lt;/li&gt;
&lt;li&gt;Notebook Parity: You can now move seamlessly between the tool and Jupyter notebooks. Keep your flow state intact while switching from visual exploration to deep-dive coding.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Why use it?&lt;/h2&gt;

&lt;p&gt;If you have data at an hourly or greater cadence, you’re likely looking for two things: forecasting and anomaly detection. tseda helps you build better apps by showing you the underlying patterns in those metrics.&lt;/p&gt;

&lt;h2&gt;Get Started (or Catch Up):&lt;/h2&gt;

&lt;ul&gt;
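&lt;li&gt;New README &amp;amp; Docs: &lt;a href="https://github.com/rajivsam/tseda" rel="noopener noreferrer"&gt;github.com/rajivsam/tseda&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;User Guide: &lt;a href="https://github.com/rajivsam/tseda/blob/main/docs/user_guide.md" rel="noopener noreferrer"&gt;Step-by-step instructions&lt;/a&gt;&lt;/li&gt;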
&lt;li&gt;Video Overview: &lt;a href="https://www.youtube.com/watch?v=baoJrIpSTE8" rel="noopener noreferrer"&gt;AI-generated summary&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’m looking for feedback from anyone monitoring metrics at a high cadence. How are you currently handling window refinements? Let’s discuss in the comments!&lt;/p&gt;


</description>
      <category>automation</category>
      <category>datascience</category>
      <category>showdev</category>
      <category>tooling</category>
    </item>
    <item>
      <title>KMDS with New Features</title>
      <dc:creator>Rajiv Sambasivan</dc:creator>
      <pubDate>Mon, 27 Apr 2026 06:45:53 +0000</pubDate>
      <link>https://dev.to/rajivsam/kmds-with-new-features-jnh</link>
      <guid>https://dev.to/rajivsam/kmds-with-new-features-jnh</guid>
      <description>&lt;p&gt;A little while ago I developed a python package meant for small data science teams to communicate the rationale and motivation for decisions in developing and modeling data science projects. The package was called KMDS. This package has an upgrade now. You can input your observations in natural language and the package will take care of tagging it appropriately based on a data science project ontology. Conversely, the natural language search is also available, you can query this tool in natural language.&lt;br&gt;
The updated repository with examples is available here:&lt;br&gt;
&lt;a href="https://github.com/rajivsam/kmds" rel="noopener noreferrer"&gt;https://github.com/rajivsam/kmds&lt;/a&gt;&lt;br&gt;
Thank you&lt;br&gt;
Rajiv&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Explore Your Time Series Data</title>
      <dc:creator>Rajiv Sambasivan</dc:creator>
      <pubDate>Mon, 27 Apr 2026 06:40:44 +0000</pubDate>
      <link>https://dev.to/rajivsam/explore-your-time-series-data-p4a</link>
      <guid>https://dev.to/rajivsam/explore-your-time-series-data-p4a</guid>
      <description>&lt;p&gt;Do you have business data that you collect at an hourly or greater cadence - for example, site conversion rate per day, average service time for the 10 am - 11 am window etc.? Do you want to understand this data so that you can build better apps based on your understanding - for example, forecast the metric you are monitoring for the next business period, understand if a particular value is anomalous. &lt;br&gt;
If this is of interest to you, check out tseda, a tool to explore and understand your time series data&lt;br&gt;
&lt;a href="https://github.com/rajivsam/tseda" rel="noopener noreferrer"&gt;https://github.com/rajivsam/tseda&lt;/a&gt;&lt;br&gt;
There is a video (AI summary) here:&lt;br&gt;
&lt;a href="https://www.youtube.com/watch?v=baoJrIpSTE8" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=baoJrIpSTE8&lt;/a&gt;&lt;br&gt;
There is a user guide here:&lt;br&gt;
&lt;a href="https://github.com/rajivsam/tseda/blob/main/docs/user_guide.md" rel="noopener noreferrer"&gt;https://github.com/rajivsam/tseda/blob/main/docs/user_guide.md&lt;/a&gt;&lt;br&gt;
I am happy to answer questions and discuss how you can use this if you are collecting a metric at an hourly cadence or higher.&lt;br&gt;
I am making a version of this for high-frequency sensor data.&lt;br&gt;
The technical motivation is available here: &lt;a href="https://rajivsam.github.io/r2ds-blog/posts/markov_analysis_coffee_prices/" rel="noopener noreferrer"&gt;https://rajivsam.github.io/r2ds-blog/posts/markov_analysis_coffee_prices/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>opensource</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Descriptive Analytics</title>
      <dc:creator>Rajiv Sambasivan</dc:creator>
      <pubDate>Wed, 04 Feb 2026 03:12:11 +0000</pubDate>
      <link>https://dev.to/rajivsam/descriptive-analytics-1230</link>
      <guid>https://dev.to/rajivsam/descriptive-analytics-1230</guid>
      <description>&lt;p&gt;Descriptive Analytics is a repository that is a collection of recipes for descriptive analysis of enterprise data. It is work in progress. Integration with generative AI tools to combine conventional ML analysis techniques with generative AI tools is ongoing.&lt;br&gt;
See &lt;a href="https://www.youtube.com/watch?v=MwXKC_oloH8" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=MwXKC_oloH8&lt;/a&gt; for an overview, see&lt;br&gt;
&lt;a href="https://github.com/rajivsam/descriptive_analytics" rel="noopener noreferrer"&gt;https://github.com/rajivsam/descriptive_analytics&lt;/a&gt; for the repository &lt;/p&gt;

</description>
    </item>
    <item>
      <title>KMDS, a package for knowledge management in data science</title>
      <dc:creator>Rajiv Sambasivan</dc:creator>
      <pubDate>Thu, 18 Apr 2024 08:17:31 +0000</pubDate>
      <link>https://dev.to/rajivsam/kmds-a-package-for-knowledge-managment-in-data-science-48fd</link>
      <guid>https://dev.to/rajivsam/kmds-a-package-for-knowledge-managment-in-data-science-48fd</guid>
      <description>&lt;p&gt;KMDS is a tool that solves a problem that most folks doing data analysis are frustrated with. You run into a design question, you know you've dealt with this in the past, you just can't recreate the context, question, research and the rationale for picking a solution when you run into the problem next. Here is a new release of the tool with examples of how you use it for both machine learning and analytics workflows. See &lt;a href="https://github.com/rajivsam/kmds_migration/blob/main/sba_migration/documents/sba_development_example_full_doc.md" rel="noopener noreferrer"&gt;this document&lt;/a&gt;. Here is a &lt;a href="https://www.youtube.com/watch?v=b_zmnyOveEI" rel="noopener noreferrer"&gt;short video description&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>analytics</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
