<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mona Hamid</title>
    <description>The latest articles on DEV Community by Mona Hamid (@mona_hamid).</description>
    <link>https://dev.to/mona_hamid</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1992400%2F08b3c4e4-fb11-409c-a6e7-159712144992.jpg</url>
      <title>DEV Community: Mona Hamid</title>
      <link>https://dev.to/mona_hamid</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mona_hamid"/>
    <language>en</language>
    <item>
      <title>Build a Data-to-Graph Pipeline with DLT, DuckDB &amp; Cognee 🧠📈</title>
      <dc:creator>Mona Hamid</dc:creator>
      <pubDate>Wed, 09 Jul 2025 04:56:27 +0000</pubDate>
      <link>https://dev.to/mona_hamid/build-a-data-to-graph-pipeline-with-dlt-duckdb-cognee-59kg</link>
      <guid>https://dev.to/mona_hamid/build-a-data-to-graph-pipeline-with-dlt-duckdb-cognee-59kg</guid>
      <description>&lt;p&gt;What We’ll Build&lt;br&gt;
In this post, we’ll show how to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Load NYC Taxi data via a REST API&lt;/li&gt;
&lt;li&gt;Store it in DuckDB using DLT&lt;/li&gt;
&lt;li&gt;Visualize the relationships using Cognee&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Step 1 – Ingest Data with DLT&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@dlt.resource(write_disposition="replace", name="zoomcamp_data")
def zoomcamp_data():
    url = "https://us-central1-dlthub-analytics.cloudfunctions.net/data_engineering_zoomcamp_api"
    response = requests.get(url)
    df = pd.DataFrame(response.json())
    df['Trip_Pickup_DateTime'] = pd.to_datetime(df['Trip_Pickup_DateTime'])

    df['tag'] = pd.cut(
        df['Trip_Pickup_DateTime'],
        bins=[
            pd.Timestamp("2009-06-01"),
            pd.Timestamp("2009-06-10"),
            pd.Timestamp("2009-06-20"),
            pd.Timestamp("2009-06-30")
        ],
        labels=["first_10_days", "second_10_days", "last_10_days"]
    )
    yield df[df['tag'].notnull()]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
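&lt;p&gt;As a quick sanity check on the &lt;code&gt;pd.cut&lt;/code&gt; tagging above, here is a tiny standalone sketch with a few made-up pickup timestamps. Note that a trip outside June falls into no bin and gets &lt;code&gt;NaN&lt;/code&gt;, which is exactly why the final &lt;code&gt;notnull()&lt;/code&gt; filter matters:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd

# Hypothetical pickups: one in each 10-day window, plus one outside June
df = pd.DataFrame({"Trip_Pickup_DateTime": pd.to_datetime(
    ["2009-06-05", "2009-06-15", "2009-06-25", "2009-07-05"])})

df["tag"] = pd.cut(
    df["Trip_Pickup_DateTime"],
    bins=[pd.Timestamp("2009-06-01"), pd.Timestamp("2009-06-10"),
          pd.Timestamp("2009-06-20"), pd.Timestamp("2009-06-30")],
    labels=["first_10_days", "second_10_days", "last_10_days"],
)
print(df["tag"].tolist())   # the July trip comes back as NaN and is dropped
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;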



&lt;p&gt;Step 2 – Run Pipeline to DuckDB&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pipeline = dlt.pipeline(
    pipeline_name="zoomcamp_pipeline",
    destination="duckdb",
    dataset_name="zoomcamp_tagged_data"
)
pipeline.run(zoomcamp_data())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
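&lt;p&gt;To verify what landed in DuckDB, you can open the pipeline's database directly. A minimal sketch, assuming dlt's default local destination file &lt;code&gt;zoomcamp_pipeline.duckdb&lt;/code&gt; in the working directory (dlt maps the dataset name to a schema):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import duckdb

# dlt's duckdb destination writes to "&lt;pipeline_name&gt;.duckdb" by default,
# with the dataset name as the schema
con = duckdb.connect("zoomcamp_pipeline.duckdb")
con.sql(
    "SELECT tag, COUNT(*) AS trips "
    "FROM zoomcamp_tagged_data.zoomcamp_data "
    "GROUP BY tag ORDER BY tag"
).show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;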



&lt;p&gt;Step 3 – Enrich and Visualize with Cognee&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wait cognee.add(df_set1_json, node_set=["first_10_days"])
await cognee.add(df_set2_json, node_set=["second_10_days"])
await cognee.add(df_set3_json, node_set=["last_10_days"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
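&lt;p&gt;The &lt;code&gt;add&lt;/code&gt; calls above only stage the tagged data; in cognee the graph is built and queried in separate steps. A hedged sketch based on cognee's documented API at the time of writing (the &lt;code&gt;cognify&lt;/code&gt; and &lt;code&gt;search&lt;/code&gt; names may differ in your installed version, and the query string is made up):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Build the knowledge graph from everything added so far
await cognee.cognify()

# Query across the tagged node sets
results = await cognee.search("How do trips differ across the three June windows?")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;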



&lt;p&gt;Result 🎉&lt;br&gt;
Upload your notebook and see interactive graphs emerge from your dataset.&lt;/p&gt;

&lt;p&gt;🧪 DuckDB + 🧵 DLT + 🧠 Cognee = Magic!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6asao0lr522e275udf9z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6asao0lr522e275udf9z.png" alt=" " width="800" height="360"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Building an MLOps Monitoring Architecture That Actually Works</title>
      <dc:creator>Mona Hamid</dc:creator>
      <pubDate>Mon, 23 Jun 2025 15:11:48 +0000</pubDate>
      <link>https://dev.to/mona_hamid/-building-an-mlops-monitoring-architecture-that-actually-works-4j0f</link>
      <guid>https://dev.to/mona_hamid/-building-an-mlops-monitoring-architecture-that-actually-works-4j0f</guid>
      <description>&lt;p&gt;The Problem 😅&lt;/p&gt;

&lt;p&gt;You've probably been here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deploy ML model ✅&lt;/li&gt;
&lt;li&gt;Model works great initially ✅
&lt;/li&gt;
&lt;li&gt;Stakeholders are happy ✅&lt;/li&gt;
&lt;li&gt;Then... 📉 silent degradation&lt;/li&gt;
&lt;li&gt;Business metrics drop 📊&lt;/li&gt;
&lt;li&gt;"Why didn't we know sooner?" 🤔&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Traditional monitoring doesn't work for ML models.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;The Architecture 🏗️&lt;/h2&gt;

&lt;p&gt;Built a 3-layer monitoring system:&lt;/p&gt;

&lt;h3&gt;Layer 1: Models &amp;amp; Data 🤖&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────┐    ┌─────────────────┐
│   ML Model      │    │  Data Storage   │
│   (FastAPI)     │◄───┤ (PostgreSQL/S3) │
└─────────────────┘    └─────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;Layer 2: Processing ⚙️&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────┐    ┌─────────────────┐
│ Drift Detection │    │  Orchestration  │
│ (Evidently AI)  │◄───┤   (Prefect)     │
└─────────────────┘    └─────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;Layer 3: Alerts &amp;amp; Viz 📊&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────┐    ┌─────────────────┐
│   Dashboards    │    │     Alerts      │
│   (Grafana)     │◄───┤(Slack/PagerDuty)│
└─────────────────┘    └─────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;Key Monitoring Metrics 📈&lt;/h2&gt;

&lt;h3&gt;🎯 Prediction Drift&lt;/h3&gt;

&lt;p&gt;Detect when model outputs change distribution:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
python
from evidently.metrics import DatasetDriftMetric

def check_prediction_drift(reference, current):
    metric = DatasetDriftMetric()
    result = metric.calculate(reference, current)
    return result.drift_detected
📊 Feature Drift
Monitor input feature distributions:

Mean/median shifts
Standard deviation changes
Quantile-based detection

❌ Data Quality
Real-time validation:

Missing value %
Outlier detection
Schema changes

📉 Performance Metrics
When ground truth available:

Accuracy trends
F1-score evolution
Business KPI correlation

Implementation Example 💻
pythonclass MLMonitor:
    def __init__(self, reference_data):
        self.reference_data = reference_data
        self.slack_webhook = os.getenv('SLACK_WEBHOOK')

    def monitor_predictions(self, current_data):
        """Main monitoring function"""

        # 1. Check for drift
        drift_result = self.check_drift(current_data)

        # 2. Validate data quality  
        quality_result = self.check_quality(current_data)

        # 3. Send alerts if needed
        if drift_result['drift_detected']:
            self.send_alert(f"🚨 Drift detected: {drift_result['drift_score']:.3f}")

        # 4. Update dashboards
        self.update_metrics(drift_result, quality_result)

    def check_drift(self, current_data):
        """Drift detection with Evidently"""
        from evidently.report import Report
        from evidently.metric_preset import DataDriftPreset

        report = Report(metrics=[DataDriftPreset()])
        report.run(self.reference_data, current_data)

        return report.as_dict()

    def send_alert(self, message):
        """Send Slack notification"""
        import requests

        payload = {
            "text": message,
            "channel": "#ml-alerts",
            "username": "ML Monitor Bot"
        }

        requests.post(self.slack_webhook, json=payload)
Results 📊
After implementing this system:
MetricBeforeAfterDetection Time2-3 days2-3 hoursMonthly Incidents83False Positive Rate40%5%Stakeholder Confidence😐😍
Tech Stack Choices 🛠️
Why Evidently AI?

Open source &amp;amp; flexible
Excellent drift algorithms
Great documentation
Active community

Why Grafana?

Beautiful dashboards
Real-time capabilities
PostgreSQL integration
Industry standard

Why Prefect over Airflow?

Modern Python-first approach
Better error handling
Easier Kubernetes deployment
Superior observability

Lessons Learned 💡
✅ What Worked

Start simple - Basic drift detection first
Tune thresholds - Avoid alert fatigue
Pretty dashboards - Stakeholders love visuals
Automation - Let system handle simple fixes

❌ What Failed

Too many alerts initially - Alert fatigue is real
Complex metrics upfront - Confused the team
Manual processes - Doesn't scale


What's Next? 🔮
Planning to add:

Automated retraining triggers
A/B testing integration
Cost monitoring per prediction
Explainability tracking with SHAP

Conclusion 🎉
ML monitoring isn't optional anymore. This architecture has:

Caught issues 10x faster
Reduced incidents by 60%
Improved stakeholder trust
Made our ML systems actually reliable

Key takeaway: Treat monitoring as a first-class citizen in your ML pipeline.

What monitoring challenges are you facing? Share in the comments! 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
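&lt;p&gt;As a closing illustration of the quantile-based drift idea from the metrics section, here is a toy, dependency-free sketch. It is not Evidently's algorithm; the function name and the 0.5-standard-deviation threshold are made up for the example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import random
import statistics

def quantile_shift(reference, current, qs=(0.25, 0.5, 0.75), tol=0.5):
    """Flag drift when any quantile of `current` moves more than
    `tol` reference standard deviations from the matching reference quantile."""
    ref_sorted, cur_sorted = sorted(reference), sorted(current)
    ref_std = statistics.stdev(reference)

    def quantile(xs, p):
        return xs[min(int(p * len(xs)), len(xs) - 1)]

    shifts = [abs(quantile(cur_sorted, p) - quantile(ref_sorted, p)) / ref_std
              for p in qs]
    return max(shifts) &gt; tol

random.seed(0)
reference = [random.gauss(0, 1) for _ in range(1000)]
shifted = [x + 2.0 for x in reference]   # simulate a clear mean shift

print(quantile_shift(reference, reference))  # False: identical distribution
print(quantile_shift(reference, shifted))    # True: every quantile moved ~2 std
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;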

</description>
    </item>
  </channel>
</rss>
