My First Data Engineering Project: Building a Real-Time IoT Pipeline on Azure

From zero data engineering experience to deploying a streaming analytics platform powered by Azure's student tier


What I Built

I created an end-to-end IoT data pipeline that ingests simulated sensor data, detects anomalies in real time, stores everything in a database, and visualizes live metrics on a Power BI dashboard. Think of it as a complete "data journey": from sensor readings on a phone to insights on a dashboard, all happening in real time.

The Pipeline Flow

IoT Central (Simulated Devices) → Event Hubs (Ingestion) → Stream Analytics (Processing + ML) → Azure SQL (Storage) → .NET Function → Power BI (Visualization)

What It Does

  • Simulates IoT sensors using Azure IoT Central's Plug & Play templates (accelerometer, gyroscope, battery, GPS)
  • Processes streaming data in real time with Azure Stream Analytics
  • Detects anomalies using built-in ML algorithms (battery spikes, unusual acceleration patterns)
  • Stores raw data in Azure Data Lake Gen2 for future analysis
  • Curates processed data in Azure SQL Database with proper schema design
  • Streams live metrics to Power BI through a custom .NET 8 Azure Function
  • Visualizes everything on a real-time dashboard with KPIs, maps, and alerts

Why This Project?

I wanted to understand how data flows in the real world: not just theory or toy examples, but an actual production-grade pipeline that could handle real IoT scenarios. As my first data engineering project, I needed to learn:

  • How to ingest high-velocity streaming data
  • How to process and transform data in real-time
  • How to apply machine learning for anomaly detection
  • How to store data efficiently for analytics
  • How to visualize insights for decision-making

Most importantly, I wanted to build something tangible that demonstrated the entire data lifecycle.


The Azure Student Advantage

Here's the best part: This entire project cost me nothing. Azure's student tier provided everything I needed:

  • $100 in free credits (renewed annually)
  • Free tier services like Azure Functions and IoT Central
  • 12 months of free services including SQL Database and Stream Analytics hours
  • Access to enterprise-grade tools that companies actually use

How You Can Do This Too

  1. Verify your student status at azure.microsoft.com/free/students
  2. No credit card required for the initial signup
  3. Explore beyond your university: Azure is an "external organization" offering resources to students globally

The key insight: Don't limit yourself to your school's resources. Companies like Microsoft, AWS, and Google offer generous student programs specifically to help you learn their platforms. Take advantage of them.


The Architecture Journey

Phase 1: Device Simulation with IoT Central

I started with Azure IoT Central, a managed IoT platform that let me simulate devices without owning physical hardware. Using Plug & Play device templates, I modeled smartphones with:

  • Accelerometer (x, y, z axes)
  • Gyroscope readings
  • Battery percentage
  • GPS coordinates
  • Barometric pressure

IoT Central has a built-in transformation engine that let me normalize the data format before sending it downstream. This was crucial: cleaning data at the source meant less work later.
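For reference, Plug & Play device templates are defined in DTDL, a JSON-LD-based modeling language. A minimal interface covering two of the telemetry fields above might look like the following; this is a generic sketch, not the exact template IoT Central generated for my devices:

```json
{
  "@context": "dtmi:dtdl:context;2",
  "@id": "dtmi:sample:smartphone;1",
  "@type": "Interface",
  "displayName": "Smartphone",
  "contents": [
    { "@type": "Telemetry", "name": "battery", "schema": "double" },
    {
      "@type": "Telemetry",
      "name": "accelerometer",
      "schema": {
        "@type": "Object",
        "fields": [
          { "name": "x", "schema": "double" },
          { "name": "y", "schema": "double" },
          { "name": "z", "schema": "double" }
        ]
      }
    }
  ]
}
```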

Phase 2: Ingestion with Event Hubs

Azure Event Hubs acts as the front door for streaming data. It's a distributed ingestion service that can handle millions of events per second with guaranteed durability.

Key learning: Event Hubs use partitions for parallel processing. Understanding partitioning strategy was essential for scalability.
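To make partitioning concrete, here's a minimal producer sketch using the Azure.Messaging.EventHubs SDK. The connection string, hub name, and payload are placeholders; the point is the PartitionKey, which keeps all events from one device ordered within a single partition while different devices spread across partitions:

```csharp
using System.Text;
using Azure.Messaging.EventHubs;
using Azure.Messaging.EventHubs.Producer;

// Placeholder connection string and hub name
await using var producer = new EventHubProducerClient(
    "<EVENT_HUBS_CONNECTION_STRING>", "iot-telemetry");

// Events sharing a partition key land on the same partition,
// preserving per-device ordering
using EventDataBatch batch = await producer.CreateBatchAsync(
    new CreateBatchOptions { PartitionKey = "device-001" });

batch.TryAdd(new EventData(Encoding.UTF8.GetBytes(
    "{\"deviceId\":\"device-001\",\"battery\":87}")));

await producer.SendAsync(batch);
```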

Phase 3: Real-Time Processing with Stream Analytics

This is where the magic happens. Azure Stream Analytics is a real-time processing engine that you program with a SQL-like query language.

I implemented:

Magnitude Calculations - Converting 3-axis accelerometer data into a single acceleration magnitude:

```sql
SQRT(POWER(accelerometer.x, 2) + POWER(accelerometer.y, 2) + POWER(accelerometer.z, 2))
```

Anomaly Detection - Using built-in ML algorithms to flag unusual patterns:

```sql
AnomalyDetection_SpikeAndDip(battery, 95, 85, 'spikesanddips')
  OVER (LIMIT DURATION(second, 60))
```

This function analyzes a 60-second sliding window and flags battery spikes or dips at 95% confidence. No custom ML model needed; the detection is built into Stream Analytics.

Dual Outputs:

  • Raw data → Azure Data Lake Gen2 (for future ML training)
  • Processed data → Azure SQL Database (for business intelligence)

Phase 4: Data Storage Strategy

I used a two-tier storage approach:

Azure Data Lake Gen2 - Raw event archive

  • Every single event preserved
  • Parquet format for efficient querying
  • Foundation for future ML model training

Azure SQL Database - Curated analytical store

  • Two tables: Devices (metadata) and Telemetry (time-series data)
  • Proper foreign key relationships
  • Optimized for BI queries and joins

This mirrors real-world data lakehouse architecture: raw data in the lake, curated data in the warehouse.

Phase 5: .NET 8 Azure Function for Power BI Integration

Stream Analytics doesn't natively push to Power BI streaming datasets, so I built a custom Azure Function in .NET 8:

What it does:

  • Runs every minute on a timer trigger
  • Queries Azure SQL for new telemetry since last run
  • Batches records (up to 500 at a time)
  • POSTs JSON to Power BI's REST API
  • Tracks state using Azure Table Storage

Key technical decisions:

  • Isolated worker model (latest .NET Functions pattern)
  • Incremental processing to avoid duplicates
  • Batching to respect Power BI API limits
  • Idempotent operations for reliability
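To make that concrete, here is a heavily condensed sketch of such a function in the .NET 8 isolated worker model. The table and column names match the schema shown later in this post; the Power BI push URL, connection settings, and state handling are illustrative placeholders, not the repo's actual code (it assumes the SDK's default implicit usings):

```csharp
using Microsoft.Azure.Functions.Worker;
using Microsoft.Data.SqlClient;
using Microsoft.Extensions.Logging;
using System.Net.Http.Json;

public class PowerBiPushFunction(ILogger<PowerBiPushFunction> logger)
{
    private static readonly HttpClient Http = new();

    [Function("PushTelemetryToPowerBi")]
    public async Task Run([TimerTrigger("0 */1 * * * *")] TimerInfo timer)
    {
        // Placeholder: in the real pipeline the high-water mark lives in
        // Azure Table Storage so each run only picks up new rows
        long lastSeenId = 0;

        var rows = new List<object>();
        await using var conn = new SqlConnection(
            Environment.GetEnvironmentVariable("SqlConnectionString"));
        await conn.OpenAsync();

        // Batch of at most 500 rows to respect Power BI push API limits
        var cmd = new SqlCommand(
            @"SELECT TOP (500) telemetryId, deviceId, enqueuedTime,
                     battery, AccelMagnitude, Anomaly
              FROM Telemetry
              WHERE telemetryId > @lastId
              ORDER BY telemetryId", conn);
        cmd.Parameters.AddWithValue("@lastId", lastSeenId);

        await using var reader = await cmd.ExecuteReaderAsync();
        while (await reader.ReadAsync())
        {
            rows.Add(new
            {
                deviceId = reader.GetString(1),
                enqueuedTime = reader.GetDateTime(2),
                battery = reader.GetInt32(3),
                accelMagnitude = reader.GetDouble(4),
                anomaly = reader.GetBoolean(5) ? 1 : 0
            });
        }
        if (rows.Count == 0) return;

        // POST the batch to the streaming dataset's generated push URL
        var response = await Http.PostAsJsonAsync(
            Environment.GetEnvironmentVariable("PowerBiPushUrl"), rows);
        response.EnsureSuccessStatusCode();

        logger.LogInformation("Pushed {Count} rows to Power BI", rows.Count);
        // Finally, persist the new high-water mark for the next run
    }
}
```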

Phase 6: Power BI Visualization

The final piece was creating a live dashboard with:

  • Real-time KPI cards (latest battery %, acceleration)
  • Map visual showing device GPS locations
  • Time-series charts for trend analysis
  • Anomaly alerts highlighted in red

Power BI's streaming datasets update instantly; no refresh button needed.


What I Learned

Real-Time Data Engineering Patterns

  • Lambda architecture (hot path for real-time, cold path for batch)
  • Stream processing windowing (tumbling, hopping, sliding windows; see the sketch after this list)
  • Event time vs processing time semantics
  • Idempotency and exactly-once processing
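As a taste of windowing: a tumbling window chops the stream into fixed, non-overlapping intervals and emits one result per device per window. In Stream Analytics query language that looks like this (a generic illustration, not a query from the project):

```sql
-- Average battery per device over fixed, non-overlapping 60-second windows
SELECT
  deviceId,
  AVG(battery) AS avgBattery,
  System.Timestamp() AS windowEnd
FROM [IoTInput]
GROUP BY deviceId, TumblingWindow(second, 60)
```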

Azure Cloud Services

Before this project, I'd barely touched Azure. Now I'm comfortable with:

  • IoT Central device templates and exports
  • Event Hub partitioning and consumer groups
  • Stream Analytics query language and ML functions
  • Azure SQL managed databases
  • Azure Functions isolated worker model
  • Data Lake Gen2 hierarchical namespaces

Infrastructure as Code

I included a Terraform configuration in the repo as a reference. While I deployed most resources through the Azure portal for faster iteration, I learned:

  • Resource definition with HCL syntax
  • State management concepts
  • Importance of IaC for reproducibility
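For a flavor of the HCL involved, a minimal definition of the ingestion resources might look like this; it's a generic sketch with placeholder names and SKUs, not the repo's exact configuration:

```hcl
resource "azurerm_resource_group" "iot" {
  name     = "rg-iot-pipeline"
  location = "eastus"
}

resource "azurerm_eventhub_namespace" "ingest" {
  name                = "ehns-iot-pipeline"
  location            = azurerm_resource_group.iot.location
  resource_group_name = azurerm_resource_group.iot.name
  sku                 = "Basic"
}

resource "azurerm_eventhub" "telemetry" {
  name                = "iot-telemetry"
  namespace_name      = azurerm_eventhub_namespace.ingest.name
  resource_group_name = azurerm_resource_group.iot.name
  partition_count     = 2
  message_retention   = 1
}
```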

.NET Backend Development

Writing the Azure Function taught me:

  • Async/await patterns in C#
  • Dependency injection in isolated worker model
  • HTTP client best practices
  • Configuration management (avoiding secrets in code)

Machine Learning Integration

I didn't build custom ML models, but I learned how to apply pre-built anomaly detection effectively:

  • Choosing appropriate sensitivity thresholds
  • Understanding spike vs. dip detection
  • Sliding window analysis
  • Real-time inference constraints

Technical Highlights Worth Sharing

Stream Analytics Query Design

The query processes raw events into multiple outputs simultaneously:

```sql
-- Anomaly scoring in a named step (an alias can't be used in the SELECT that defines it)
WITH ScoredInput AS (
  SELECT
    deviceId,
    enqueuedTime,
    battery,
    accel,
    AnomalyDetection_SpikeAndDip(battery, 95, 85, 'spikesanddips')
      OVER (LIMIT DURATION(second, 60)) AS BatteryAnom
  FROM [IoTInput]
)

-- Raw passthrough to Data Lake
SELECT * INTO [RawOutput] FROM [IoTInput]

-- Device metadata extraction
SELECT DISTINCT deviceId, applicationId, templateId
INTO [DevicesOutput] FROM [IoTInput]

-- Enriched telemetry with anomalies
SELECT
  deviceId,
  enqueuedTime,
  battery,
  SQRT(POWER(accel.x, 2) + POWER(accel.y, 2) + POWER(accel.z, 2)) AS AccelMagnitude,
  CASE WHEN CAST(GetRecordPropertyValue(BatteryAnom, 'Score') AS FLOAT) > 0.95
       THEN 1 ELSE 0 END AS Anomaly
INTO [TelemetryOutput]
FROM ScoredInput
```

Database Schema for Time-Series Data

I designed normalized tables with proper indexing:

```sql
CREATE TABLE Devices (
  deviceId VARCHAR(50) PRIMARY KEY,
  applicationId VARCHAR(50),
  templateId VARCHAR(100)
);

CREATE TABLE Telemetry (
  telemetryId BIGINT IDENTITY PRIMARY KEY,
  deviceId VARCHAR(50) NOT NULL,
  enqueuedTime DATETIME2 NOT NULL,
  battery INT,
  AccelMagnitude FLOAT,
  Anomaly BIT DEFAULT 0,
  FOREIGN KEY (deviceId) REFERENCES Devices(deviceId)
);
```

The DATETIME2 type provides high-precision timestamps for time-series analysis, and the foreign key ensures referential integrity.
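One thing worth noting: the primary keys above give each table a clustered index, but telemetry queries usually filter by device and time range, so a nonclustered index on those columns is the natural complement. Something like this (a sketch, not necessarily the exact index in the repo's scripts):

```sql
-- Speeds up per-device time-range scans and "latest reading" lookups
CREATE NONCLUSTERED INDEX IX_Telemetry_Device_Time
ON Telemetry (deviceId, enqueuedTime DESC)
INCLUDE (battery, AccelMagnitude, Anomaly);
```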


What's Next?

This project laid the foundation. Future enhancements could include:

  • Custom ML models trained on historical ADLS data (Python + Jupyter)
  • Managed Identity authentication instead of connection strings
  • Azure Key Vault integration for secrets management
  • Complete Terraform automation including Stream Analytics inputs/outputs
  • Predictive maintenance models using historical anomaly patterns

Try It Yourself

The project is open source on GitHub with all the code, SQL scripts, and configuration examples you need. Here's what's included:

  • Stream Analytics query files
  • SQL table creation scripts
  • .NET 8 Azure Function source code
  • IoT Central transformation templates
  • Terraform configuration reference
  • Architecture diagrams and documentation

Whether you're learning data engineering, exploring Azure, or building IoT solutions, this repo provides a complete reference implementation.


Resources for Students

Azure for Students: azure.microsoft.com/free/students

Don't forget:

  • GitHub Student Developer Pack (more free cloud credits)
  • JetBrains student licenses
  • DataCamp student subscriptions
  • Coursera student programs

The lesson: Companies invest heavily in student programs because they want you to learn their tools. Don't limit yourself to your university's offerings; actively seek out external resources. These "external organizations" can supercharge your learning journey.


Final Thoughts

This was my first real data engineering project, and it taught me that the best way to learn is by building. I could have watched tutorials or read documentation, but nothing compares to wrestling with real streaming data, debugging pipeline failures, and seeing live metrics update on a dashboard.

Starting with Azure's student tier removed the financial barrier completely. I experimented freely, broke things, rebuilt them, and learned through iteration, all without spending a dollar.

If you're a student interested in data engineering, cloud computing, or IoT, I encourage you to take advantage of these resources. Build something that processes real data, solves an actual problem, and demonstrates end-to-end technical skills.

The infrastructure is accessible. The tools are free. The only thing stopping you is getting started.

GitHub: https://github.com/Humza987/Azure_IoT_Realtime_Data_Pipeline


About This Post

This blog post was compiled from the project's README documentation combined with my personal reflections on the learning journey. The narrative was structured and refined with AI assistance to create a cohesive story of my first data engineering experience and the technical decisions behind the implementation.
