<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sangeeth Macherla</title>
    <description>The latest articles on DEV Community by Sangeeth Macherla (@sangeeth_macherla).</description>
    <link>https://dev.to/sangeeth_macherla</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3901872%2F2c972d06-0943-4a6a-9a7e-a7b3cc744b38.png</url>
      <title>DEV Community: Sangeeth Macherla</title>
      <link>https://dev.to/sangeeth_macherla</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sangeeth_macherla"/>
    <language>en</language>
    <item>
      <title>How Missing Data Analysis Lab uses Flask, Bayesian optimization, and MongoDB in one regression workflow</title>
      <dc:creator>Sangeeth Macherla</dc:creator>
      <pubDate>Tue, 28 Apr 2026 15:40:09 +0000</pubDate>
      <link>https://dev.to/sangeeth_macherla/how-missing-data-analysis-lab-uses-flask-bayesian-optimization-and-mongodb-in-one-regression-peg</link>
      <guid>https://dev.to/sangeeth_macherla/how-missing-data-analysis-lab-uses-flask-bayesian-optimization-and-mongodb-in-one-regression-peg</guid>
      <description>&lt;p&gt;Missing Data Analysis Lab is a Flask-served Python application for studying missing-value behavior, comparing imputation strategies, training regression models, and surfacing experiment results through a lightweight dashboard. This article walks through the project's architecture, endpoints, and machine learning pipeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Team Members&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This project was developed by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a class="mentioned-user" href="https://dev.to/sangeeth_macherla"&gt;@sangeeth_macherla&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="mentioned-user" href="https://dev.to/shashank_siddamshetty"&gt;@shashank_siddamshetty&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="mentioned-user" href="https://dev.to/dinesh_p_b272f90b058887f"&gt;@dinesh_p_b272f90b058887f&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="mentioned-user" href="https://dev.to/yogeshnaiduk"&gt;@yogeshnaiduk&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We would like to express our sincere gratitude to &lt;a class="mentioned-user" href="https://dev.to/chanda_rajkumar"&gt;@chanda_rajkumar&lt;/a&gt; for their valuable guidance and support throughout this project.&lt;/p&gt;

&lt;p&gt;Their insights into system design, architecture, and development played a key role in shaping Missing Data Analysis Lab.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this project combines analysis, modeling, and delivery in one app&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This project is not just a notebook or a model-training script. It is organized as a complete experiment workflow where dataset upload, missingness diagnostics, model benchmarking, optimization, result persistence, and dashboard delivery all belong to the same application boundary. That matters because model quality is only one part of the user experience. The project also needs reproducible datasets, clear visual outputs, explainable metrics, and an interface that keeps the analysis readable.&lt;/p&gt;

&lt;p&gt;That is why the application is centered around a Flask runtime that serves both the API layer and the frontend entry page. The browser loads a first-party dashboard from the frontend folder, while the backend coordinates experiment execution, saved artifacts, authentication flow, and optional MongoDB persistence. Instead of scattering the workflow across unrelated tools, the project keeps the analysis pipeline and the user-facing dashboard closely aligned.&lt;/p&gt;

&lt;p&gt;This design is especially useful for research-style ML work. Missing-data analysis usually generates many outputs: summaries, plots, tuned model metrics, prediction tables, and explanatory text. A single integrated app makes those outputs easier to reproduce, compare, and present. It also creates a cleaner path from raw CSV input to interpretable model evaluation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Our application architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The current stack is organized around a Flask backend in &lt;code&gt;api/flask_main.py&lt;/code&gt;, a reusable experiment pipeline in &lt;code&gt;src/pipeline.py&lt;/code&gt;, static dashboard files in &lt;code&gt;frontend/&lt;/code&gt;, generated artifacts in &lt;code&gt;results/&lt;/code&gt;, and optional MongoDB-backed storage for experiment and dataset history. The application can work with uploaded CSV files or synthesize benchmark datasets when no file is supplied. From there it analyzes missingness, applies imputation strategies, trains models, optionally runs Optuna-based Bayesian optimization, and returns structured results to the dashboard.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"runtime"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Flask-served Python web application"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ui_entry"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"frontend/index.html"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"frontend_assets"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"frontend/styles.css"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"frontend/app.js"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"frontend/auth.css"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"frontend/auth.js"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"core_modules"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"api/flask_main.py"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"api/auth_routes.py"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"src/pipeline.py"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"src/missing_analysis.py"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"src/imputation.py"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"src/models.py"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"src/optimization.py"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"src/mongo_store.py"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"supported_models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"linear_regression"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"ridge"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"lasso"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"random_forest"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"imputation_methods"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"drop_rows"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"mean"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"median"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"iterative"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"storage_mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"in-memory fallback with optional MongoDB persistence"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frypng3igk0eem0b6z8zw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frypng3igk0eem0b6z8zw.png" alt=" Image description " width="800" height="380"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pipeline workflow and ML methodology&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The heart of the project lives in the experiment pipeline. The workflow starts by either loading an uploaded CSV or generating a synthetic regression dataset with controlled missingness. The system then detects the target column, computes missing-value analysis, creates missingness plots, splits the dataset into train, validation, and test partitions, and benchmarks multiple model and imputation combinations.&lt;/p&gt;
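
&lt;p&gt;The first steps of that workflow can be sketched with the standard library alone. This is a minimal illustration, not the code in &lt;code&gt;src/pipeline.py&lt;/code&gt;: the function names, the two-feature linear target, the 15% missing rate, and the 60/20/20 split are all assumptions chosen for the example.&lt;/p&gt;

```python
import random


def make_synthetic_rows(n_rows, missing_rate, seed=0):
    """Build rows for target = 3*x1 - 2*x2 + noise, then blank out feature cells."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        x1, x2 = rng.uniform(0, 10), rng.uniform(0, 10)
        row = {"x1": x1, "x2": x2, "target": 3 * x1 - 2 * x2 + rng.gauss(0, 1)}
        for name in ("x1", "x2"):
            # Missing completely at random: each feature cell is dropped independently.
            if missing_rate > rng.random():
                row[name] = None
        rows.append(row)
    return rows


def missingness_summary(rows):
    """Return the fraction of missing cells per column."""
    columns = rows[0].keys()
    return {c: sum(r[c] is None for r in rows) / len(rows) for c in columns}


def split_rows(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle a copy of the rows and cut it into train, validation, and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


rows = make_synthetic_rows(200, missing_rate=0.15, seed=42)
summary = missingness_summary(rows)
train, val, test = split_rows(rows, seed=42)
print(summary)
print(len(train), len(val), len(test))
```

&lt;p&gt;Fixing the seed keeps the synthetic dataset and the split reproducible between runs, which matters when different imputation methods are compared on the same rows.&lt;/p&gt;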

&lt;p&gt;For each imputation method, the project trains baseline regressors and optionally performs Bayesian optimization through Optuna. Model selection is guided by validation metrics, so the best configuration is not picked on the strength of a chance result on the test set. Once the best run is identified, the application saves prediction outputs, comparison plots, residual diagnostics, and result summaries that the dashboard can render directly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"dataset_sources"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"uploaded CSV"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"synthetic regression dataset"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"analysis_steps"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"missingness summary"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"missingness plots"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"train/validation/test split"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"imputation comparison"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"baseline model training"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Optuna Bayesian optimization"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"diagnostic plot generation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"insight and ranking generation"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"selection_rule"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"best configuration is chosen using validation-first scoring"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"outputs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"results_table"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"best_run"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"predictions"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"feature_importance"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"imputation_ranking"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"artifact URLs"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
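
&lt;p&gt;The validation-first selection rule can be illustrated with a small, self-contained sketch. It is deliberately simplified and is not the project's implementation: it uses a single feature, a hand-written least-squares fit in place of the listed regressors, and only three of the imputation methods. The helper names &lt;code&gt;fit_line&lt;/code&gt;, &lt;code&gt;prepare&lt;/code&gt;, and &lt;code&gt;benchmark&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def rmse(ys, preds):
    return math.sqrt(statistics.fmean([(y - p) ** 2 for y, p in zip(ys, preds)]))


def prepare(pairs, method, fill_value):
    """Apply one imputation method to (x, y) pairs; fill_value comes from training data."""
    if method == "drop_rows":
        return [(x, y) for x, y in pairs if x is not None]
    return [(fill_value if x is None else x, y) for x, y in pairs]


def benchmark(train, val):
    """Score each imputation method on the validation split; return (best, scores)."""
    observed = [x for x, _ in train if x is not None]
    # Fill values are learned from the training split only, to avoid leakage.
    fills = {
        "drop_rows": None,
        "mean": statistics.fmean(observed),
        "median": statistics.median(observed),
    }
    scores = {}
    for method, fill in fills.items():
        tr = prepare(train, method, fill)
        va = prepare(val, method, fill)
        slope, intercept = fit_line([x for x, _ in tr], [y for _, y in tr])
        preds = [slope * x + intercept for x, _ in va]
        scores[method] = rmse([y for _, y in va], preds)
    best = min(scores, key=scores.get)
    return best, scores


rng = random.Random(7)


def make_pairs(n):
    """Synthetic y = 2x + 1 + noise, with roughly 20% of x values missing."""
    pairs = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        y = 2 * x + 1 + rng.gauss(0, 0.5)
        pairs.append((None if 0.2 > rng.random() else x, y))
    return pairs


train, val = make_pairs(120), make_pairs(40)
best_method, val_scores = benchmark(train, val)
print(best_method, val_scores)
```

&lt;p&gt;Two caveats apply to this sketch. Dropping rows shrinks the validation set, so its score is computed on fewer rows than the mean and median scores. And a held-out test split, omitted here, would be scored only once after the best method has been chosen.&lt;/p&gt;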



&lt;p&gt;&lt;strong&gt;API and route surface&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Flask layer exposes the analysis workflow as practical API endpoints rather than leaving it buried in offline scripts. The root route serves the dashboard, &lt;code&gt;/frontend/&amp;lt;filename&amp;gt;&lt;/code&gt; serves static frontend files, and &lt;code&gt;/artifacts/&amp;lt;filename&amp;gt;&lt;/code&gt; exposes generated plots and result files. Data-oriented routes support upload, summary generation, missingness analysis, training, optimization, leaderboard views, performance trend reporting, and dataset history inspection.&lt;/p&gt;

&lt;p&gt;The project also includes an authentication flow backed by MongoDB for signup, login, logout, and session-aware usage. That makes the app more than a throwaway experiment runner. It is structured like a shareable product with a user-facing control room and protected data actions.&lt;/p&gt;
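
&lt;p&gt;A route such as &lt;code&gt;/artifacts/&amp;lt;filename&amp;gt;&lt;/code&gt; takes a filename from the request, so it should only ever return files that live inside the artifact directory. The sketch below shows one way to express that check with &lt;code&gt;pathlib&lt;/code&gt;. The helper name &lt;code&gt;resolve_artifact&lt;/code&gt; is hypothetical and the article does not show how the project handles this; Flask's own &lt;code&gt;send_from_directory&lt;/code&gt; applies a comparable safeguard.&lt;/p&gt;

```python
from pathlib import Path

RESULTS_DIR = Path("results")  # assumed artifact directory, as named in the article


def resolve_artifact(filename, base_dir=RESULTS_DIR):
    """Return the absolute path for an artifact, or None if it would escape base_dir."""
    base = Path(base_dir).resolve()
    candidate = (base / filename).resolve()
    # Accept only paths that sit strictly below the artifact directory.
    # This rejects "../app.py", absolute paths, and the directory itself.
    if base not in candidate.parents:
        return None
    return candidate


print(resolve_artifact("qq_plot.png"))
print(resolve_artifact("../app.py"))
```

&lt;p&gt;A route handler would respond with "not found" whenever the helper returns &lt;code&gt;None&lt;/code&gt; or the resolved file does not exist.&lt;/p&gt;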

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fua59fa0fqmzk92j58u20.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fua59fa0fqmzk92j58u20.png" alt=" " width="800" height="544"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MongoDB and persistence strategy&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;MongoDB is used here as an application support layer rather than as the core modeling engine. The main experiment can still run without live database storage, but when MongoDB is available the system persists dataset metadata, experiment results, leaderboard rows, and time-series style performance history. That gives the project a bridge between local experimentation and longer-term result tracking.&lt;/p&gt;

&lt;p&gt;This storage model fits the project because experiment outputs are document-like. One run may contain summary metrics, artifact paths, prediction arrays, insight text, and metadata about the selected model and imputation method. Another record may only contain dataset-level statistics. MongoDB is a flexible choice for storing those evolving shapes without forcing the whole project into a rigid table design too early.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"dataset_cache"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"in-memory cache for active uploaded data"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"latest_result"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"in-memory fallback for current experiment session"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mongodb_usage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"dataset metadata"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"experiment results"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"leaderboard records"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"performance trend points"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"authentication users and sessions"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"fallback_behavior"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"saved results JSON and CSV can still populate the UI when live ML runtime is unavailable"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
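
&lt;p&gt;One way to picture the optional-persistence idea is a thin wrapper that always keeps an in-memory copy and treats the database as best effort. The classes below are a hypothetical sketch, not the contents of &lt;code&gt;src/mongo_store.py&lt;/code&gt;: the method names, the &lt;code&gt;val_rmse&lt;/code&gt; field, and the &lt;code&gt;BrokenBackend&lt;/code&gt; stand-in are assumptions, and no real MongoDB client is used.&lt;/p&gt;

```python
class InMemoryStore:
    """Fallback store: keeps experiment records in a list for the current process only."""

    def __init__(self):
        self._records = []

    def save_result(self, record):
        self._records.append(dict(record))

    def leaderboard(self, limit=5):
        # Lower validation RMSE ranks higher.
        return sorted(self._records, key=lambda r: r["val_rmse"])[:limit]


class ExperimentStore:
    """Writes to the primary backend when it works, and always to the fallback."""

    def __init__(self, primary=None):
        self.primary = primary  # e.g. a database-backed object with the same methods
        self.fallback = InMemoryStore()

    def save_result(self, record):
        self.fallback.save_result(record)
        if self.primary is not None:
            try:
                self.primary.save_result(record)
            except Exception:
                # Backend unavailable: stay in in-memory mode for this session.
                self.primary = None

    def leaderboard(self, limit=5):
        if self.primary is not None:
            try:
                return self.primary.leaderboard(limit)
            except Exception:
                self.primary = None
        return self.fallback.leaderboard(limit)


class BrokenBackend:
    """Stand-in for an unreachable database, used only to exercise the fallback path."""

    def save_result(self, record):
        raise ConnectionError("database unreachable")

    def leaderboard(self, limit=5):
        raise ConnectionError("database unreachable")


store = ExperimentStore(primary=BrokenBackend())
store.save_result({"model": "ridge", "imputation": "median", "val_rmse": 1.8})
store.save_result({"model": "lasso", "imputation": "mean", "val_rmse": 2.3})
store.save_result({"model": "random_forest", "imputation": "iterative", "val_rmse": 1.2})
top = store.leaderboard(limit=2)
print([r["model"] for r in top])
```

&lt;p&gt;Because the fallback receives every record, the dashboard can still show a leaderboard for the current session when the database cannot be reached. The metric values in the example are made up for illustration.&lt;/p&gt;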



&lt;p&gt;&lt;strong&gt;Frontend dashboard structure&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The user interface is intentionally lightweight, but it is not minimal in capability. The dashboard is divided into focused pages for overview, experiment control, insights, visuals, and results. That structure keeps the workflow approachable: the user can inspect dataset summaries, configure experiments, review generated explanations, inspect charts, and download predictions without leaving the same app.&lt;/p&gt;

&lt;p&gt;This is a good fit for the project because missing-data experiments produce multiple categories of output. Putting everything into a single page would quickly become hard to scan. By separating control, interpretation, and visualization, the frontend makes the research workflow feel more like an analysis studio than a raw API demo.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F772jh2adronh1a6s93pg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F772jh2adronh1a6s93pg.png" alt=" " width="800" height="329"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Generated artifacts and evaluation outputs&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The project writes a complete output package into the &lt;code&gt;results/&lt;/code&gt; directory. That includes tabular results in CSV and JSON form, missingness visualizations, performance comparison plots, prediction-vs-actual plots, residual distribution charts, residuals-vs-predicted diagnostics, and Q-Q normality plots. These artifacts are important because they turn raw metrics into interpretable evidence about how the model behaves under different missing-data treatments.&lt;/p&gt;

&lt;p&gt;By persisting those files, the project becomes much easier to demonstrate, review, and compare across experiments. Even when the live training runtime is not available, previously saved artifacts can still power the dashboard through fallback loading behavior.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"results_dir_outputs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"latest_results.csv"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"latest_results.json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"predictions_output.csv"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"missingness_heatmap.png"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"missingness_bar.png"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"performance_comparison.png"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"actual_vs_predicted.png"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"residual_distribution.png"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"residuals_vs_predicted.png"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"qq_plot.png"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"evaluation_metrics"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"MSE"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"RMSE"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"R2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"MAE"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
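
&lt;p&gt;For reference, the four evaluation metrics listed above follow directly from the residuals. The function below is a plain-Python sketch of the standard definitions; the name &lt;code&gt;regression_metrics&lt;/code&gt; and the sample values are illustrative and not taken from the project.&lt;/p&gt;

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE, and R2 from paired true and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in residuals)
    mse = ss_res / n
    mae = sum(abs(e) for e in residuals) / n
    mean_true = sum(y_true) / n
    # R2 compares the model against always predicting the mean of y_true.
    # It is undefined when every true value is identical (ss_tot == 0).
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "R2": r2}


metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 9.0])
print(metrics)
```

&lt;p&gt;In this toy case the residuals are 0.5, 0, -0.5, and 0, which gives an MSE of 0.125, an MAE of 0.25, and an R2 of 0.975. RMSE and MAE are in the units of the target, which usually makes them easier to read on a dashboard than MSE.&lt;/p&gt;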



&lt;p&gt;&lt;strong&gt;Execution&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The project is designed to run locally as a Flask application. The dashboard is served on the local Flask port, and the generated results can be viewed directly through the interface once the backend is running. The implementation is also organized cleanly enough to support classroom demos, portfolio presentation, or later deployment work.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyqxov2p80345hie8c3a5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyqxov2p80345hie8c3a5.png" alt=" " width="800" height="236"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Missing Data Analysis Lab is a Flask-based machine learning application for missing-data diagnostics, imputation comparison, regression benchmarking, Bayesian optimization, optional MongoDB persistence, and dashboard-based result delivery.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/SangeethGoud/missingdataanalysis.git" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;&lt;br&gt;
  &lt;iframe src="https://www.youtube.com/embed/WULKk1jZR8I"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>mongodb</category>
      <category>python</category>
    </item>
  </channel>
</rss>
