<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Humza Inam</title>
    <description>The latest articles on DEV Community by Humza Inam (@humza_inam).</description>
    <link>https://dev.to/humza_inam</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3525528%2F78e480f4-dacb-42b4-893e-10445ad85f09.jpg</url>
      <title>DEV Community: Humza Inam</title>
      <link>https://dev.to/humza_inam</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/humza_inam"/>
    <language>en</language>
    <item>
      <title>Learning Azure Networking Through Code: My Spring Boot + Terraform Journey</title>
      <dc:creator>Humza Inam</dc:creator>
      <pubDate>Mon, 20 Oct 2025 21:09:52 +0000</pubDate>
      <link>https://dev.to/humza_inam/learning-azure-networking-through-code-my-spring-boot-terraform-journey-4kij</link>
      <guid>https://dev.to/humza_inam/learning-azure-networking-through-code-my-spring-boot-terraform-journey-4kij</guid>
      <description>&lt;p&gt;&lt;strong&gt;Building a production-style network from scratch to understand how the cloud really works&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I created a complete Azure network infrastructure with multiple virtual networks, security rules, load balancing, and a Spring Boot application to test it all. Think of it as building a miniature version of how companies actually structure their cloud networks, but small enough to learn from and cheap enough to run on Azure's student credits.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Setup
&lt;/h3&gt;

&lt;p&gt;Two separate virtual networks connected through VNet peering, with a Spring Boot REST API that provides real-time network diagnostics. An Application Gateway acts as the public entry point, routing traffic to a private VM that can only be accessed through specific security rules. There's also a bastion host for secure SSH access and private endpoints connecting to Azure Storage without touching the public internet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The flow:&lt;/strong&gt; Internet → Application Gateway → Private Spring Boot VM → Storage Account (via private endpoint)&lt;/p&gt;

&lt;h3&gt;
  
  
  Why This Stack?
&lt;/h3&gt;

&lt;p&gt;The combination seems unusual at first: Terraform for infrastructure, Spring Boot for the application, and Azure networking concepts all mixed together. But that's exactly why I chose it. I wanted to learn Spring Boot (Java web development) while also understanding networking fundamentals. By building the network infrastructure myself and then deploying an app that could test and report on that infrastructure, I learned both topics simultaneously.&lt;/p&gt;




&lt;h2&gt;
  
  
  Getting Started with Azure Student Credits
&lt;/h2&gt;

&lt;p&gt;This entire project runs on &lt;strong&gt;Azure's free student tier&lt;/strong&gt;. No credit card needed, and you get $100 in credits renewed annually. The resources I used (small VMs, basic networking) cost pennies per hour, so you can run this for days while learning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to get started:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to azure.microsoft.com/free/students&lt;/li&gt;
&lt;li&gt;Verify with your student email&lt;/li&gt;
&lt;li&gt;Get instant access to enterprise-grade cloud tools&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key insight I learned: don't just rely on your university's resources. External companies like Microsoft, AWS, and Google actively want students learning their platforms. Take advantage of these programs.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Two Virtual Networks (VNets)
&lt;/h3&gt;

&lt;p&gt;I created two separate networks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Primary VNet (10.0.0.0/16):&lt;/strong&gt; Contains the Spring Boot application, bastion host, and Application Gateway&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secondary VNet (10.1.0.0/16):&lt;/strong&gt; A separate network for testing cross-network communication&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These VNets are connected through &lt;strong&gt;VNet peering&lt;/strong&gt;, which lets them communicate privately without going through the public internet.&lt;/p&gt;
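&lt;p&gt;As a sketch, peering is declared once in each direction in Terraform (resource names here are illustrative, not the project's actual ones):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Peering is directional: both halves are needed for two-way traffic
resource "azurerm_virtual_network_peering" "main_to_secondary" {
  name                      = "peer-main-to-secondary"
  resource_group_name       = azurerm_resource_group.demo.name
  virtual_network_name      = azurerm_virtual_network.main.name
  remote_virtual_network_id = azurerm_virtual_network.secondary.id
}

resource "azurerm_virtual_network_peering" "secondary_to_main" {
  name                      = "peer-secondary-to-main"
  resource_group_name       = azurerm_resource_group.demo.name
  virtual_network_name      = azurerm_virtual_network.secondary.name
  remote_virtual_network_id = azurerm_virtual_network.main.id
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;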

&lt;h3&gt;
  
  
  Five Subnets for Segmentation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Primary VNet:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Public subnet (10.0.1.0/24) - Bastion host&lt;/li&gt;
&lt;li&gt;Private subnet (10.0.2.0/24) - Spring Boot app and private endpoint&lt;/li&gt;
&lt;li&gt;App Gateway subnet (10.0.3.0/24) - Load balancer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Secondary VNet:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Public subnet (10.1.1.0/24) - Reserved&lt;/li&gt;
&lt;li&gt;Private subnet (10.1.2.0/24) - Test VM running Nginx&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Subnets let you isolate different types of resources and apply different security rules to each.&lt;/p&gt;

&lt;h3&gt;
  
  
  Security Through Network Security Groups (NSGs)
&lt;/h3&gt;

&lt;p&gt;Each subnet has an NSG acting as a firewall. For example, the private subnet only allows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SSH from the public subnet (bastion host)&lt;/li&gt;
&lt;li&gt;HTTP traffic on port 8080 from within the VNet&lt;/li&gt;
&lt;li&gt;Traffic from the App Gateway subnet&lt;/li&gt;
&lt;li&gt;Communication with the secondary VNet&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything else is blocked by default. This is &lt;strong&gt;zero-trust networking&lt;/strong&gt; - nothing is allowed unless explicitly permitted.&lt;/p&gt;
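&lt;p&gt;A single NSG rule in Terraform looks roughly like this (names and the priority value are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Allow SSH into the private subnet, but only from the bastion's subnet
resource "azurerm_network_security_rule" "allow_ssh_from_bastion" {
  name                        = "allow-ssh-from-bastion"
  priority                    = 100
  direction                   = "Inbound"
  access                      = "Allow"
  protocol                    = "Tcp"
  source_port_range           = "*"
  destination_port_range      = "22"
  source_address_prefix       = "10.0.1.0/24"
  destination_address_prefix  = "10.0.2.0/24"
  resource_group_name         = azurerm_resource_group.demo.name
  network_security_group_name = azurerm_network_security_group.private.name
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Lower priority numbers win, so a broad deny rule at a high number catches everything the explicit allow rules don't.&lt;/p&gt;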

&lt;h3&gt;
  
  
  The Bastion Pattern
&lt;/h3&gt;

&lt;p&gt;The bastion host is a VM with a public IP that acts as a secure jump box: you SSH into the bastion first, and from there into the private VMs. Your application servers never need public IPs, which shrinks the attack surface.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ssh azureuser@bastion-ip
&lt;span class="c"&gt;# Then from bastion:&lt;/span&gt;
ssh azureuser@private-vm-ip
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
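&lt;p&gt;OpenSSH can chain both hops for you. With a &lt;code&gt;~/.ssh/config&lt;/code&gt; entry like this (addresses are placeholders), &lt;code&gt;ssh private-vm&lt;/code&gt; goes through the bastion automatically:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Host private-vm
  HostName 10.0.2.4
  User azureuser
  ProxyJump azureuser@BASTION_PUBLIC_IP
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The one-off equivalent is &lt;code&gt;ssh -J azureuser@BASTION_PUBLIC_IP azureuser@10.0.2.4&lt;/code&gt;.&lt;/p&gt;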



&lt;h3&gt;
  
  
  Application Gateway for Public Access
&lt;/h3&gt;

&lt;p&gt;The Application Gateway is Azure's Layer 7 load balancer. It has a public IP and routes incoming HTTP requests to the private Spring Boot VM on port 8080. It includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Health probes that check &lt;code&gt;/health&lt;/code&gt; every 30 seconds&lt;/li&gt;
&lt;li&gt;Automatic routing rules&lt;/li&gt;
&lt;li&gt;Protection for the backend (the private VM stays private)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Private Endpoints for Secure Storage Access
&lt;/h3&gt;

&lt;p&gt;The coolest part: the private VM connects to Azure Storage without using the public internet. A &lt;strong&gt;private endpoint&lt;/strong&gt; creates a network interface in your VNet that maps to the storage account. When the VM resolves &lt;code&gt;storageaccount.blob.core.windows.net&lt;/code&gt;, it gets a private IP (10.x.x.x) instead of a public one.&lt;/p&gt;

&lt;p&gt;This is achieved through:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Private DNS zone for &lt;code&gt;privatelink.blob.core.windows.net&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Private endpoint in the private subnet&lt;/li&gt;
&lt;li&gt;DNS resolution that returns private IPs&lt;/li&gt;
&lt;/ol&gt;
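The check the app performs can be sketched with nothing but the JDK: resolve a hostname and test whether the answer is a private (RFC 1918) address. The default hostname below is a placeholder; on the VM you would pass the storage account's FQDN.

```java
import java.net.InetAddress;

// Resolves a hostname and reports whether the answer is a private
// (site-local) address. On the VM, a 10.x answer for the storage
// FQDN means the private endpoint's DNS zone is in effect.
public class DnsCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder default; pass e.g. demostorage.blob.core.windows.net on the VM
        String host = args.length > 0 ? args[0] : "localhost";
        InetAddress addr = InetAddress.getByName(host);
        System.out.println(host + " resolves to " + addr.getHostAddress()
                + " private=" + addr.isSiteLocalAddress());
    }
}
```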

&lt;h3&gt;
  
  
  NAT Gateway for Outbound Internet
&lt;/h3&gt;

&lt;p&gt;Private VMs need internet access for things like software updates. The &lt;strong&gt;NAT Gateway&lt;/strong&gt; provides a single public IP that all private subnet resources share for outbound connections. This way, they can reach the internet but the internet can't reach them.&lt;/p&gt;
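&lt;p&gt;Wiring it up takes three small resources in Terraform (names are illustrative; the public IP resource is assumed to be defined elsewhere in the config):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;resource "azurerm_nat_gateway" "outbound" {
  name                = "nat-demo"
  location            = azurerm_resource_group.demo.location
  resource_group_name = azurerm_resource_group.demo.name
}

# Give the gateway its single shared public IP
resource "azurerm_nat_gateway_public_ip_association" "outbound" {
  nat_gateway_id       = azurerm_nat_gateway.outbound.id
  public_ip_address_id = azurerm_public_ip.nat.id
}

# Everything in the private subnet now egresses through it
resource "azurerm_subnet_nat_gateway_association" "private" {
  subnet_id      = azurerm_subnet.private.id
  nat_gateway_id = azurerm_nat_gateway.outbound.id
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;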




&lt;h2&gt;
  
  
  The Spring Boot Application
&lt;/h2&gt;

&lt;p&gt;I built a REST API with several diagnostic endpoints to test the network:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GET /health&lt;/strong&gt; - Basic health check for the Application Gateway probe&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"UP"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1234567890&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Hello from Azure Private VM!"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"storage_account"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"demostorageXXXX"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;GET /api/vm-info&lt;/strong&gt; - Reports the VM's network details&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hostname"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"vm-private-demo"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"private_ip"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"10.0.2.4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"vnet"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"vnet-demo"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"resource_group"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"rg-demo-networking"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;GET /api/network-test&lt;/strong&gt; - Tests DNS resolution of the storage account&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"storage_fqdn"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"demostorage.blob.core.windows.net"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"resolved_ip"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"10.0.2.5"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"dns_resolution"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SUCCESS"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"using_private_endpoint"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;GET /api/storage-test&lt;/strong&gt; - Validates private endpoint connectivity&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Spring Boot?
&lt;/h3&gt;

&lt;p&gt;I wanted to learn Java web development while building this infrastructure project. Spring Boot makes it easy to create REST APIs with minimal boilerplate. The &lt;code&gt;@RestController&lt;/code&gt; and &lt;code&gt;@GetMapping&lt;/code&gt; annotations handle all the HTTP routing automatically.&lt;/p&gt;
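&lt;p&gt;The &lt;code&gt;/health&lt;/code&gt; endpoint, for instance, needs only a few lines. This is a sketch of the pattern rather than the project's exact code; Spring serializes the returned record to JSON:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;@RestController
public class HealthController {

    record Health(String status, long timestamp) {}

    // Matches the Application Gateway probe path
    @GetMapping("/health")
    Health health() {
        return new Health("UP", System.currentTimeMillis());
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;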

&lt;p&gt;The application runs as a systemd service on Ubuntu, managed through a bash setup script that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Installs Java and Maven&lt;/li&gt;
&lt;li&gt;Creates the Spring Boot project structure&lt;/li&gt;
&lt;li&gt;Builds the JAR file with Maven&lt;/li&gt;
&lt;li&gt;Configures a systemd service for automatic startup&lt;/li&gt;
&lt;li&gt;Creates test scripts for network validation&lt;/li&gt;
&lt;/ol&gt;
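&lt;p&gt;The resulting systemd unit is short; roughly (the JAR path here is an assumption, since the real script detects the built JAR):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;[Unit]
Description=Spring Boot network demo
After=network.target

[Service]
User=azureuser
ExecStart=/usr/bin/java -jar /opt/app/app.jar
Restart=always

[Install]
WantedBy=multi-user.target
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;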




&lt;h2&gt;
  
  
  Infrastructure as Code with Terraform
&lt;/h2&gt;

&lt;p&gt;The entire environment, more than 700 lines of Terraform, is defined in code. This means I can destroy everything and rebuild it identically in minutes. Here's how Terraform structures the resources:&lt;/p&gt;

&lt;h3&gt;
  
  
  Resource Dependencies
&lt;/h3&gt;

&lt;p&gt;Terraform automatically figures out what order to create resources based on dependencies. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;VNet must exist before subnets&lt;/li&gt;
&lt;li&gt;Subnets must exist before VMs&lt;/li&gt;
&lt;li&gt;NSGs must exist before NSG associations&lt;/li&gt;
&lt;li&gt;Private DNS zone must be linked to VNet before private endpoint&lt;/li&gt;
&lt;/ul&gt;
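&lt;p&gt;Most of that ordering comes from implicit references rather than explicit &lt;code&gt;depends_on&lt;/code&gt;. For example (illustrative names):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;resource "azurerm_subnet" "private" {
  name                 = "snet-private"
  resource_group_name  = azurerm_resource_group.demo.name
  # Referencing the VNet's name is what creates the dependency
  virtual_network_name = azurerm_virtual_network.main.name
  address_prefixes     = ["10.0.2.0/24"]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;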

&lt;h3&gt;
  
  
  Templating in Cloud-Init Scripts
&lt;/h3&gt;

&lt;p&gt;The Spring Boot setup script uses Terraform's &lt;code&gt;templatefile()&lt;/code&gt; function to inject variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;custom_data&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;base64encode&lt;/span&gt;&lt;span class="err"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;templatefile&lt;/span&gt;&lt;span class="err"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"spring-boot-setup.sh"&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;storage_account_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;azurerm_storage_account&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;demo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
  &lt;span class="nx"&gt;container_name&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;azurerm_storage_container&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;demo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
  &lt;span class="nx"&gt;vnet_name&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;azurerm_virtual_network&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This lets the Java application know which storage account to test against without hardcoding values.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Networking Fundamentals
&lt;/h3&gt;

&lt;p&gt;Before this project, concepts like subnets, CIDR notation, and routing tables were abstract. Now I understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Why subnets matter:&lt;/strong&gt; They isolate resources and let you apply different security policies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How DNS resolution works:&lt;/strong&gt; Especially with private DNS zones overriding public DNS&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The difference between Layer 4 and Layer 7 load balancing:&lt;/strong&gt; Azure Load Balancer works at the transport layer, while Application Gateway understands HTTP; the NAT Gateway, by contrast, only handles outbound address translation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network security groups vs firewalls:&lt;/strong&gt; NSGs filter traffic at the subnet/NIC level&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Spring Boot Development
&lt;/h3&gt;

&lt;p&gt;I learned how to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Structure a Spring Boot project with Maven&lt;/li&gt;
&lt;li&gt;Create REST controllers with proper HTTP methods&lt;/li&gt;
&lt;li&gt;Handle JSON serialization automatically&lt;/li&gt;
&lt;li&gt;Deploy Java applications as Linux services&lt;/li&gt;
&lt;li&gt;Build projects in CI/CD-style automation scripts&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Azure Cloud Services
&lt;/h3&gt;

&lt;p&gt;Working with Azure taught me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How VNet peering enables private cross-network communication&lt;/li&gt;
&lt;li&gt;Private endpoints eliminate public internet exposure for PaaS services&lt;/li&gt;
&lt;li&gt;Application Gateway health probes and backend pools&lt;/li&gt;
&lt;li&gt;Service endpoints vs private endpoints&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Infrastructure as Code Best Practices
&lt;/h3&gt;

&lt;p&gt;I learned to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use variables and outputs effectively&lt;/li&gt;
&lt;li&gt;Structure Terraform for readability&lt;/li&gt;
&lt;li&gt;Handle dependencies and resource ordering&lt;/li&gt;
&lt;li&gt;Inject configuration into VMs via cloud-init&lt;/li&gt;
&lt;li&gt;Manage SSH keys securely&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Challenges I Faced
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Maven Build Issues
&lt;/h3&gt;

&lt;p&gt;The cloud-init script had to handle different Java versions and ensure the JAR file path was correctly detected. I added fallback logic to find any built JAR file instead of hardcoding the name.&lt;/p&gt;

&lt;h3&gt;
  
  
  Application Gateway Health Probes
&lt;/h3&gt;

&lt;p&gt;Initially, the health probe kept failing because I forgot to configure the probe path to match the &lt;code&gt;/health&lt;/code&gt; endpoint. The gateway needs explicit configuration for backend health checks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Private DNS Resolution
&lt;/h3&gt;

&lt;p&gt;Understanding when to use service endpoints vs private endpoints took time. Service endpoints secure the service but still use public IPs. Private endpoints actually inject a private IP into your VNet, which is more secure but requires private DNS zones.&lt;/p&gt;

&lt;h3&gt;
  
  
  Network Security Group Rules
&lt;/h3&gt;

&lt;p&gt;Getting the NSG priorities and port ranges right required iteration. I learned to be explicit with source and destination address prefixes rather than using wildcards.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;This project taught me that the best way to learn cloud networking is to build something real. Reading documentation helps, but nothing compares to troubleshooting why your private endpoint DNS isn't resolving or why the Application Gateway can't reach your backend.&lt;/p&gt;

&lt;p&gt;Combining Spring Boot with networking infrastructure might seem unusual, but it forced me to understand both topics deeply. The application wasn't just sitting on top of the network - it was actively testing and reporting on network behavior.&lt;/p&gt;

&lt;p&gt;Azure's student credits made this possible without any financial barrier. If you're a student interested in cloud infrastructure, networking, or backend development, I encourage you to build something similar. Start small, break things, fix them, and learn through doing.&lt;/p&gt;




&lt;h2&gt;
  
  
  About This Post
&lt;/h2&gt;

&lt;p&gt;This blog post was written based on the project's Terraform configuration, Spring Boot setup scripts, and my personal reflections throughout the learning process. The narrative was structured and refined with AI assistance to create a clear and approachable explanation of the technical implementation.&lt;/p&gt;

</description>
      <category>springboot</category>
      <category>networking</category>
      <category>terraform</category>
      <category>azure</category>
    </item>
    <item>
      <title>My First Data Engineering Project: Building a Real-Time IoT Pipeline on Azure</title>
      <dc:creator>Humza Inam</dc:creator>
      <pubDate>Mon, 20 Oct 2025 20:26:04 +0000</pubDate>
      <link>https://dev.to/humza_inam/my-first-data-engineering-project-building-a-real-time-iot-pipeline-on-azure-2kho</link>
      <guid>https://dev.to/humza_inam/my-first-data-engineering-project-building-a-real-time-iot-pipeline-on-azure-2kho</guid>
      <description>&lt;p&gt;&lt;strong&gt;From zero data engineering experience to deploying a streaming analytics platform powered by Azure's student tier&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I created an end-to-end IoT data pipeline that ingests simulated sensor data, detects anomalies in real time, stores everything in a database, and visualizes live metrics on a Power BI dashboard. Think of it as a complete "data journey": from sensor readings on a phone to insights on a dashboard, all happening in real time.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Pipeline Flow
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;IoT Central (Simulated Devices)&lt;/strong&gt; → &lt;strong&gt;Event Hub (Ingestion)&lt;/strong&gt; → &lt;strong&gt;Stream Analytics (Processing + ML)&lt;/strong&gt; → &lt;strong&gt;Azure SQL (Storage)&lt;/strong&gt; → &lt;strong&gt;.NET Function&lt;/strong&gt; → &lt;strong&gt;Power BI (Visualization)&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What It Does
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Simulates IoT sensors&lt;/strong&gt; using Azure IoT Central's Plug &amp;amp; Play templates (accelerometer, gyroscope, battery, GPS)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Processes streaming data&lt;/strong&gt; in real-time with Azure Stream Analytics&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Detects anomalies&lt;/strong&gt; using built-in ML algorithms (battery spikes, unusual acceleration patterns)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stores raw data&lt;/strong&gt; in Azure Data Lake Gen2 for future analysis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Curates processed data&lt;/strong&gt; in Azure SQL Database with proper schema design&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streams live metrics&lt;/strong&gt; to Power BI through a custom .NET 8 Azure Function&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visualizes everything&lt;/strong&gt; on a real-time dashboard with KPIs, maps, and alerts&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why This Project?
&lt;/h2&gt;

&lt;p&gt;I wanted to understand &lt;strong&gt;how data flows in the real world&lt;/strong&gt;. Not just theory or toy examples, but an actual production-grade pipeline that could handle real IoT scenarios. As my first data engineering project, I needed to learn:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How to ingest high-velocity streaming data&lt;/li&gt;
&lt;li&gt;How to process and transform data in real-time&lt;/li&gt;
&lt;li&gt;How to apply machine learning for anomaly detection&lt;/li&gt;
&lt;li&gt;How to store data efficiently for analytics&lt;/li&gt;
&lt;li&gt;How to visualize insights for decision-making&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most importantly, I wanted to build something tangible that demonstrated the entire data lifecycle.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Azure Student Advantage
&lt;/h2&gt;

&lt;p&gt;Here's the best part: &lt;strong&gt;This entire project cost me nothing.&lt;/strong&gt; Azure's student tier provided everything I needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;$100 in free credits&lt;/strong&gt; (renewed annually)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Free tier services&lt;/strong&gt; like Azure Functions and IoT Central&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;12 months of free services&lt;/strong&gt; including SQL Database and Stream Analytics hours&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Access to enterprise-grade tools&lt;/strong&gt; that companies actually use&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How You Can Do This Too
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Verify your student status&lt;/strong&gt; at &lt;a href="https://azure.microsoft.com/free/students" rel="noopener noreferrer"&gt;azure.microsoft.com/free/students&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No credit card required&lt;/strong&gt; for the initial signup&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explore beyond your university&lt;/strong&gt; - Azure is an "external organization" offering resources to students globally&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key insight: &lt;strong&gt;Don't limit yourself to your school's resources.&lt;/strong&gt; Companies like Microsoft, AWS, and Google offer generous student programs specifically to help you learn their platforms. Take advantage of them.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture Journey
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Phase 1: Device Simulation with IoT Central
&lt;/h3&gt;

&lt;p&gt;I started with &lt;strong&gt;Azure IoT Central&lt;/strong&gt;, a managed IoT platform that let me simulate devices without owning physical hardware. Using Plug &amp;amp; Play device templates, I modeled smartphones with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Accelerometer (x, y, z axes)&lt;/li&gt;
&lt;li&gt;Gyroscope readings&lt;/li&gt;
&lt;li&gt;Battery percentage&lt;/li&gt;
&lt;li&gt;GPS coordinates&lt;/li&gt;
&lt;li&gt;Barometric pressure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;IoT Central has a built-in &lt;strong&gt;transformation engine&lt;/strong&gt; that let me normalize the data format before sending it downstream. This was crucial: cleaning data at the source meant less work later.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 2: Ingestion with Event Hub
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Azure Event Hub&lt;/strong&gt; acts as the front door for streaming data. It's a distributed ingestion system that can handle millions of events per second with guaranteed durability.&lt;/p&gt;

&lt;p&gt;Key learning: Event Hubs use &lt;strong&gt;partitions&lt;/strong&gt; for parallel processing. Understanding partitioning strategy was essential for scalability.&lt;/p&gt;
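&lt;p&gt;The hub itself is a small resource; the partition count is fixed at creation, which is why the strategy matters up front. A sketch in the style of the repo's reference Terraform (values illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;resource "azurerm_eventhub" "telemetry" {
  name                = "eh-telemetry"
  namespace_name      = azurerm_eventhub_namespace.demo.name
  resource_group_name = azurerm_resource_group.demo.name
  partition_count     = 4   # fixed at creation - plan ahead
  message_retention   = 1   # days
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;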

&lt;h3&gt;
  
  
  Phase 3: Real-Time Processing with Stream Analytics
&lt;/h3&gt;

&lt;p&gt;This is where the magic happens. &lt;strong&gt;Azure Stream Analytics&lt;/strong&gt; is a SQL-like query engine that processes streams in real-time.&lt;/p&gt;

&lt;p&gt;I implemented:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Magnitude Calculations&lt;/strong&gt; - Converting 3-axis accelerometer data into a single acceleration magnitude:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;SQRT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;accelerometer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="err"&gt;²&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;accelerometer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="err"&gt;²&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;accelerometer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="err"&gt;²&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Anomaly Detection&lt;/strong&gt; - Using built-in ML algorithms to flag unusual patterns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;AnomalyDetection_SpikeAndDip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;battery&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;95&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;85&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'spikesanddips'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="n"&gt;DURATION&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This function analyzes a 60-second sliding window and flags battery spikes or dips; the 95 and 85 arguments set the confidence level and the history size (in events). No custom ML model needed; anomaly detection is built into Stream Analytics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dual Outputs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Raw data&lt;/strong&gt; → Azure Data Lake Gen2 (for future ML training)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Processed data&lt;/strong&gt; → Azure SQL Database (for business intelligence)&lt;/li&gt;
&lt;/ul&gt;
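&lt;p&gt;In Stream Analytics, this fan-out is just two queries over the same input (input/output aliases and column names are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Everything, untouched, to the lake
SELECT *
INTO [datalake-output]
FROM [eventhub-input]

-- Curated columns to SQL
SELECT deviceId,
       EventEnqueuedUtcTime AS enqueuedAt,
       battery,
       SQRT(POWER(accelerometer.x, 2) + POWER(accelerometer.y, 2) + POWER(accelerometer.z, 2)) AS accelMagnitude
INTO [sql-output]
FROM [eventhub-input]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;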

&lt;h3&gt;
  
  
  Phase 4: Data Storage Strategy
&lt;/h3&gt;

&lt;p&gt;I used a &lt;strong&gt;two-tier storage approach&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Azure Data Lake Gen2&lt;/strong&gt; - Raw event archive&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every single event preserved&lt;/li&gt;
&lt;li&gt;Parquet format for efficient querying&lt;/li&gt;
&lt;li&gt;Foundation for future ML model training&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Azure SQL Database&lt;/strong&gt; - Curated analytical store&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Two tables: &lt;code&gt;Devices&lt;/code&gt; (metadata) and &lt;code&gt;Telemetry&lt;/code&gt; (time-series data)&lt;/li&gt;
&lt;li&gt;Proper foreign key relationships&lt;/li&gt;
&lt;li&gt;Optimized for BI queries and joins&lt;/li&gt;
&lt;/ul&gt;
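&lt;p&gt;The schema itself is small; roughly (column names are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;CREATE TABLE Devices (
    DeviceId     NVARCHAR(64) PRIMARY KEY,
    Model        NVARCHAR(128),
    RegisteredAt DATETIME2
);

CREATE TABLE Telemetry (
    TelemetryId    BIGINT IDENTITY PRIMARY KEY,
    DeviceId       NVARCHAR(64) NOT NULL REFERENCES Devices(DeviceId),
    EnqueuedAt     DATETIME2 NOT NULL,
    Battery        FLOAT,
    AccelMagnitude FLOAT,
    IsAnomaly      BIT
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;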

&lt;p&gt;This mirrors real-world &lt;strong&gt;data lakehouse architecture&lt;/strong&gt;: raw data in the lake, curated data in the warehouse.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 5: .NET 8 Azure Function for Power BI Integration
&lt;/h3&gt;

&lt;p&gt;Stream Analytics doesn't natively push to Power BI streaming datasets, so I built a custom &lt;strong&gt;Azure Function&lt;/strong&gt; in .NET 8:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Runs every minute on a timer trigger&lt;/li&gt;
&lt;li&gt;Queries Azure SQL for new telemetry since last run&lt;/li&gt;
&lt;li&gt;Batches records (up to 500 at a time)&lt;/li&gt;
&lt;li&gt;POSTs JSON to Power BI's REST API&lt;/li&gt;
&lt;li&gt;Tracks state using Azure Table Storage&lt;/li&gt;
&lt;/ul&gt;
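&lt;p&gt;The push itself is a plain HTTP POST of a JSON array; a streaming dataset exposes a push URL in its API info. The function does this from .NET, but the equivalent as a one-liner (URL and field names are placeholders):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl -X POST "$POWERBI_PUSH_URL" \
  -H "Content-Type: application/json" \
  -d '[{"deviceId": "phone-1", "battery": 87.5, "enqueuedAt": "2025-10-20T20:00:00Z"}]'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;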

&lt;p&gt;&lt;strong&gt;Key technical decisions:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Isolated worker model&lt;/strong&gt; (latest .NET Functions pattern)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Incremental processing&lt;/strong&gt; to avoid duplicates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Batching&lt;/strong&gt; to respect Power BI API limits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Idempotent operations&lt;/strong&gt; for reliability&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Phase 6: Power BI Visualization
&lt;/h3&gt;

&lt;p&gt;The final piece was creating a &lt;strong&gt;live dashboard&lt;/strong&gt; with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Real-time KPI cards (latest battery %, acceleration)&lt;/li&gt;
&lt;li&gt;Map visual showing device GPS locations&lt;/li&gt;
&lt;li&gt;Time-series charts for trend analysis&lt;/li&gt;
&lt;li&gt;Anomaly alerts highlighted in red&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Power BI's &lt;strong&gt;streaming datasets&lt;/strong&gt; update instantly; no refresh button needed.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Real-Time Data Engineering Patterns
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lambda architecture&lt;/strong&gt; (hot path for real-time, cold path for batch)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stream processing windowing&lt;/strong&gt; (tumbling, hopping, sliding windows)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Event time vs processing time&lt;/strong&gt; semantics&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Idempotency and exactly-once processing&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
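&lt;p&gt;To make the windowing vocabulary concrete, here is a minimal, illustrative Python sketch of a tumbling window: fixed-size, non-overlapping buckets keyed by each event's own timestamp (event time) rather than when it arrived (processing time). The names and data shapes are assumptions, not Stream Analytics code.&lt;/p&gt;

```python
# Toy tumbling-window aggregation over event time. Each event lands in
# exactly one fixed-size window determined by its own "ts" field.
from collections import defaultdict

def tumbling_avg(events, size_s):
    """Average `value` per [start, start + size_s) window of event time."""
    windows = defaultdict(list)
    for e in events:
        start = (e["ts"] // size_s) * size_s   # window this event belongs to
        windows[start].append(e["value"])
    return {start: sum(vals) / len(vals) for start, vals in sorted(windows.items())}
```

&lt;p&gt;Hopping and sliding windows differ only in that a single event can then belong to several overlapping windows.&lt;/p&gt;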

&lt;h3&gt;
  
  
  Azure Cloud Services
&lt;/h3&gt;

&lt;p&gt;Before this project, I'd barely touched Azure. Now I'm comfortable with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;IoT Central device templates and exports&lt;/li&gt;
&lt;li&gt;Event Hub partitioning and consumer groups&lt;/li&gt;
&lt;li&gt;Stream Analytics query language and ML functions&lt;/li&gt;
&lt;li&gt;Azure SQL managed databases&lt;/li&gt;
&lt;li&gt;Azure Functions isolated worker model&lt;/li&gt;
&lt;li&gt;Data Lake Gen2 hierarchical namespaces&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Infrastructure as Code
&lt;/h3&gt;

&lt;p&gt;I included a &lt;strong&gt;Terraform configuration&lt;/strong&gt; in the repo as reference. While I deployed most resources through the Azure portal for faster iteration, I learned:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Resource definition with HCL syntax&lt;/li&gt;
&lt;li&gt;State management concepts&lt;/li&gt;
&lt;li&gt;Importance of IaC for reproducibility&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  .NET Backend Development
&lt;/h3&gt;

&lt;p&gt;Writing the Azure Function taught me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Async/await patterns in C#&lt;/li&gt;
&lt;li&gt;Dependency injection in isolated worker model&lt;/li&gt;
&lt;li&gt;HTTP client best practices&lt;/li&gt;
&lt;li&gt;Configuration management (avoiding secrets in code)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Machine Learning Integration
&lt;/h3&gt;

&lt;p&gt;I didn't build custom ML models, but I learned how to apply &lt;strong&gt;pre-built anomaly detection&lt;/strong&gt; effectively:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Choosing appropriate sensitivity thresholds&lt;/li&gt;
&lt;li&gt;Understanding spike vs. dip detection&lt;/li&gt;
&lt;li&gt;Sliding window analysis&lt;/li&gt;
&lt;li&gt;Real-time inference constraints&lt;/li&gt;
&lt;/ul&gt;
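&lt;p&gt;The project relied on Stream Analytics' pre-built anomaly detection, but the intuition behind sliding-window spike detection can be shown with a toy z-score version in Python (purely illustrative; every name here is an assumption): flag a reading when it deviates from the recent window's mean by more than &lt;code&gt;k&lt;/code&gt; standard deviations, which is also where sensitivity threshold tuning comes in.&lt;/p&gt;

```python
# Toy sliding-window spike detector, not the pre-built ML functions the
# pipeline actually used. `window` and `k` play the role of the
# sensitivity knobs discussed above.
import statistics

def flag_spikes(values, window=5, k=3.0):
    flags = []
    for i, v in enumerate(values):
        hist = values[max(0, i - window):i]     # the sliding window of history
        if len(hist) >= 2:
            mu, sd = statistics.mean(hist), statistics.pstdev(hist)
            flags.append(sd > 0 and abs(v - mu) > k * sd)
        else:
            flags.append(False)                 # not enough history yet
    return flags
```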




&lt;h2&gt;
  
  
  Technical Highlights Worth Sharing
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Stream Analytics Query Design
&lt;/h3&gt;

&lt;p&gt;The query processes raw events into multiple outputs simultaneously:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Raw passthrough to Data Lake&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;RawOutput&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;IoTInput&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;-- Device metadata extraction&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;deviceId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;applicationId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;templateId&lt;/span&gt;
&lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;DevicesOutput&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;IoTInput&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;-- Enriched telemetry with anomalies&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="n"&gt;deviceId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;enqueuedTime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;battery&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;SQRT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;accel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="err"&gt;²&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;accel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="err"&gt;²&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;accel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="err"&gt;²&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;AccelMagnitude&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;BatteryAnom&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;95&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;Anomaly&lt;/span&gt;
&lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;TelemetryOutput&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;IoTInput&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Database Schema for Time-Series Data
&lt;/h3&gt;

&lt;p&gt;I designed normalized tables with proper indexing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;Devices&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;deviceId&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;applicationId&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;templateId&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;Telemetry&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;telemetryId&lt;/span&gt; &lt;span class="nb"&gt;BIGINT&lt;/span&gt; &lt;span class="k"&gt;IDENTITY&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;deviceId&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;enqueuedTime&lt;/span&gt; &lt;span class="n"&gt;DATETIME2&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;battery&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;AccelMagnitude&lt;/span&gt; &lt;span class="nb"&gt;FLOAT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;Anomaly&lt;/span&gt; &lt;span class="nb"&gt;BIT&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;FOREIGN&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;deviceId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;Devices&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;deviceId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;DATETIME2&lt;/code&gt; type provides precision for time-series analysis, and the foreign key ensures referential integrity.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next?
&lt;/h2&gt;

&lt;p&gt;This project laid the foundation. Future enhancements could include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Custom ML models&lt;/strong&gt; trained on historical ADLS data (Python + Jupyter)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Managed Identity authentication&lt;/strong&gt; instead of connection strings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Azure Key Vault integration&lt;/strong&gt; for secrets management&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complete Terraform automation&lt;/strong&gt; including Stream Analytics inputs/outputs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Predictive maintenance models&lt;/strong&gt; using historical anomaly patterns&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;The project is open source on GitHub with all the code, SQL scripts, and configuration examples you need. Here's what's included:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stream Analytics query files&lt;/li&gt;
&lt;li&gt;SQL table creation scripts&lt;/li&gt;
&lt;li&gt;.NET 8 Azure Function source code&lt;/li&gt;
&lt;li&gt;IoT Central transformation templates&lt;/li&gt;
&lt;li&gt;Terraform configuration reference&lt;/li&gt;
&lt;li&gt;Architecture diagrams and documentation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Whether you're learning data engineering, exploring Azure, or building IoT solutions, this repo provides a complete reference implementation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources for Students
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Azure for Students:&lt;/strong&gt; &lt;a href="https://azure.microsoft.com/free/students" rel="noopener noreferrer"&gt;azure.microsoft.com/free/students&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Don't forget:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub Student Developer Pack (more free cloud credits)&lt;/li&gt;
&lt;li&gt;JetBrains student licenses&lt;/li&gt;
&lt;li&gt;DataCamp student subscriptions&lt;/li&gt;
&lt;li&gt;Coursera student programs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The lesson:&lt;/strong&gt; Companies invest heavily in student programs because they want you to learn their tools. Don't limit yourself to your university's offerings; actively seek out external resources. These "external organizations" can supercharge your learning journey.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;This was my first real data engineering project, and it taught me that &lt;strong&gt;the best way to learn is by building&lt;/strong&gt;. I could have watched tutorials or read documentation, but nothing compares to wrestling with real streaming data, debugging pipeline failures, and seeing live metrics update on a dashboard.&lt;/p&gt;

&lt;p&gt;Starting with Azure's student tier removed the financial barrier completely. I experimented freely, broke things, rebuilt them, and learned through iteration, all without spending a dollar.&lt;/p&gt;

&lt;p&gt;If you're a student interested in data engineering, cloud computing, or IoT, I encourage you to take advantage of these resources. Build something that processes real data, solves an actual problem, and demonstrates end-to-end technical skills.&lt;/p&gt;

&lt;p&gt;The infrastructure is accessible. The tools are free. The only thing stopping you is getting started.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/Humza987/Azure_IoT_Realtime_Data_Pipeline" rel="noopener noreferrer"&gt;https://github.com/Humza987/Azure_IoT_Realtime_Data_Pipeline&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  About This Post
&lt;/h2&gt;

&lt;p&gt;This blog post was compiled from the project's README documentation combined with my personal reflections on the learning journey. The narrative was structured and refined with AI assistance to create a cohesive story of my first data engineering experience and the technical decisions behind the implementation.&lt;/p&gt;

</description>
      <category>azure</category>
      <category>iot</category>
      <category>dataengineering</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Building Mindryx: From Local AWS Emulation to Production SaaS AI Quiz Generator</title>
      <dc:creator>Humza Inam</dc:creator>
      <pubDate>Mon, 20 Oct 2025 17:32:58 +0000</pubDate>
      <link>https://dev.to/humza_inam/building-mindryx-from-local-aws-emulation-to-production-saas-ai-quiz-generator-38eo</link>
      <guid>https://dev.to/humza_inam/building-mindryx-from-local-aws-emulation-to-production-saas-ai-quiz-generator-38eo</guid>
      <description>&lt;p&gt;&lt;strong&gt;A journey through serverless architecture, AI-powered learning, and the evolution of a full-stack quiz generation platform&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What is Mindryx?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://mindryxify.vercel.app" rel="noopener noreferrer"&gt;Mindryx&lt;/a&gt; is an AI-powered quiz generation platform that transforms how students interact with their study materials. Upload a PDF or enter any topic, and Mindryx generates tailored multiple-choice quizzes to help you learn more effectively.&lt;/p&gt;

&lt;h3&gt;
  
  
  Core Features
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;📝 Intelligent Quiz Generation&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Create quizzes from any topic or PDF document. The platform uses advanced OCR (Optical Character Recognition) to extract text from even image-based PDFs, then leverages Google's Gemini Flash 2.0 API to generate contextually relevant questions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;📄 Unlimited PDF Processing&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Unlike traditional LLM interfaces with restrictive file upload limits, Mindryx preprocesses PDFs using Tesseract.js OCR before sending them to the AI model. This approach bypasses typical file size constraints while taking advantage of Gemini's generous free tier.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;📊 Progress Tracking&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Review past quizzes, monitor your learning progress, and identify areas that need more attention. All quiz history and results are stored securely in a Supabase PostgreSQL database.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🤖 Experimental WebLLM Chat&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
An in-browser AI assistant runs entirely on your local machine: no API costs, no server calls, just pure client-side inference. This experimental feature showcases the future of accessible AI.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tech Stack
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Frontend:&lt;/strong&gt; Next.js 15 with App Router, TailwindCSS 4&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend:&lt;/strong&gt; Supabase Edge Functions (Deno runtime)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database:&lt;/strong&gt; Supabase PostgreSQL&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Authentication:&lt;/strong&gt; Clerk Auth with server-side API protection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI/ML:&lt;/strong&gt; Google Gemini Flash 2.0, WebLLM, Tesseract.js OCR&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment:&lt;/strong&gt; Vercel&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Payments:&lt;/strong&gt; Stripe (sandbox experimentation)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Journey: Solving a Real Problem
&lt;/h2&gt;

&lt;p&gt;The spark for Mindryx came from a frustration every student faces: &lt;strong&gt;file upload limits&lt;/strong&gt;. When studying with AI tools like ChatGPT, I constantly hit walls trying to upload lengthy PDFs or image-based documents. I wanted to generate study materials from my lecture notes and textbooks, but the limitations made it impractical.&lt;/p&gt;

&lt;p&gt;That's the problem Mindryx set out to solve: unlimited PDF processing combined with AI-generated quizzes, all while leveraging free-tier APIs.&lt;/p&gt;




&lt;h2&gt;
  
  
  Phase 1: Learning Cloud Architecture with LocalStack
&lt;/h2&gt;

&lt;p&gt;When I started this project, I had a clear goal: understand cloud infrastructure without prematurely committing to AWS costs. I wanted to learn Lambda, API Gateway, DynamoDB, and SQS (the building blocks of serverless architecture), but in a safe, local environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enter LocalStack:&lt;/strong&gt; A Docker-based AWS emulator that let me experiment freely.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I Built
&lt;/h3&gt;

&lt;p&gt;The testing branch became a fully functional serverless application running entirely on my machine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AWS Lambda functions&lt;/strong&gt; for quiz generation and data processing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DynamoDB&lt;/strong&gt; for NoSQL data storage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SQS queues&lt;/strong&gt; for asynchronous job processing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Gateway&lt;/strong&gt; for RESTful endpoints&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SNS&lt;/strong&gt; for notification experiments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This setup taught me the fundamentals of microservices architecture, event-driven design, queue-based processing, and stateless function execution. More importantly, &lt;strong&gt;it taught me Docker&lt;/strong&gt;, which became crucial for containerized development and understanding modern DevOps practices.&lt;/p&gt;
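&lt;p&gt;The queue-based async processing pattern above can be sketched in a few lines of Python (an illustration of the SQS-plus-Lambda flow, not the project's actual code; all names are placeholders): a producer enqueues jobs and a stateless worker drains them independently.&lt;/p&gt;

```python
# Minimal producer/consumer sketch of queue-based async processing.
# In the LocalStack setup, SQS played the role of `jobs` and a Lambda
# function played the role of `worker_drain`.
import queue

jobs = queue.Queue()

def enqueue_job(topic):
    """Producer side: record the work to do and return immediately."""
    jobs.put({"topic": topic, "status": "queued"})

def worker_drain(handle):
    """Consumer side: process queued jobs with a stateless handler."""
    done = []
    while not jobs.empty():
        job = jobs.get()
        job["result"] = handle(job["topic"])
        job["status"] = "done"
        done.append(job)
    return done
```

&lt;p&gt;The point of the pattern is that the producer never waits on the consumer, which is what makes slow work like quiz generation feel instant to the caller.&lt;/p&gt;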

&lt;h3&gt;
  
  
  Key Takeaways
&lt;/h3&gt;

&lt;p&gt;Working with LocalStack gave me hands-on experience with serverless patterns before spending a dollar on cloud services. I learned how to structure Lambda functions, manage environment variables, handle cold starts, and design resilient async workflows. The testing branch remains a valuable learning sandbox for trying new AWS patterns.&lt;/p&gt;




&lt;h2&gt;
  
  
  Phase 2: Migration to Production Infrastructure
&lt;/h2&gt;

&lt;p&gt;After proving the concept locally, I faced a decision: continue with AWS or explore alternatives? I chose &lt;strong&gt;Supabase and Vercel&lt;/strong&gt; for production deployment, and this migration taught me an invaluable lesson: the value of &lt;strong&gt;understanding multiple architectural approaches&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Supabase + Vercel?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Supabase Edge Functions:&lt;/strong&gt; Deno-based serverless functions with excellent PostgreSQL integration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built-in PostgreSQL:&lt;/strong&gt; More familiar than DynamoDB for complex queries and relational data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vercel Deployment:&lt;/strong&gt; Seamless Next.js deployment with automatic optimizations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Developer Experience:&lt;/strong&gt; Faster iteration cycles and simpler environment management&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Migration Challenge
&lt;/h3&gt;

&lt;p&gt;Translating my LocalStack architecture to Supabase wasn't just a copy-paste job. I had to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Convert Python Lambda functions to TypeScript/Deno Edge Functions&lt;/li&gt;
&lt;li&gt;Redesign data models from NoSQL (DynamoDB) to SQL (PostgreSQL)&lt;/li&gt;
&lt;li&gt;Rethink authentication flow with Clerk's server-side protection&lt;/li&gt;
&lt;li&gt;Implement proper error handling for production environments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This process solidified my understanding of &lt;strong&gt;serverless fundamentals that transcend specific platforms&lt;/strong&gt;: concepts like stateless execution, API design, and async processing apply whether you're using AWS Lambda or Supabase Edge Functions.&lt;/p&gt;




&lt;h2&gt;
  
  
  Authentication &amp;amp; Payment Integration
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Clerk Auth
&lt;/h3&gt;

&lt;p&gt;Implementing authentication from scratch is complex and error-prone. Clerk simplified this dramatically with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pre-built UI components for sign-up/sign-in&lt;/li&gt;
&lt;li&gt;Automatic JWT handling&lt;/li&gt;
&lt;li&gt;Server-side API route protection&lt;/li&gt;
&lt;li&gt;User management dashboard&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Integrating Clerk taught me about OAuth flows, session management, and securing API endpoints, all without building everything from scratch.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stripe Experimentation
&lt;/h3&gt;

&lt;p&gt;While Mindryx isn't monetized yet, I integrated Stripe's sandbox environment to understand payment flows. Though the integration isn't fully set up, Stripe's developer-friendly approach makes it obvious why it's a favorite for many SaaS applications.&lt;/p&gt;




&lt;h2&gt;
  
  
  The WebLLM Experiment: AI Without Servers
&lt;/h2&gt;

&lt;p&gt;The most fascinating technical feature in Mindryx is &lt;strong&gt;WebLLM&lt;/strong&gt;, an in-browser AI chat that runs entirely on client-side compute.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why This Matters
&lt;/h3&gt;

&lt;p&gt;Traditional AI features require:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Server infrastructure to host models&lt;/li&gt;
&lt;li&gt;API calls that cost money per request&lt;/li&gt;
&lt;li&gt;Network latency for every interaction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;WebLLM eliminates all of this.&lt;/strong&gt; The model downloads once, then runs locally using WebGPU. No API costs, no server dependency, instant responses.&lt;/p&gt;

&lt;p&gt;This technology is still experimental, but it represents a paradigm shift. Imagine AI-powered features in web apps that work offline, respect privacy completely, and cost nothing to operate at scale. I included it in Mindryx to raise awareness and showcase what's possible.&lt;/p&gt;




&lt;h2&gt;
  
  
  Technical Highlights
&lt;/h2&gt;

&lt;h3&gt;
  
  
  OCR Pipeline
&lt;/h3&gt;

&lt;p&gt;The PDF processing pipeline was the most technically challenging component:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;PDF.js&lt;/strong&gt; extracts text from text-based PDFs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Canvas rendering&lt;/strong&gt; converts PDF pages to images&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tesseract.js&lt;/strong&gt; performs OCR on images to extract text&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Preprocessing&lt;/strong&gt; cleans and structures the extracted content&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini API&lt;/strong&gt; generates quiz questions from the processed text&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This multi-stage approach handles both native PDFs and scanned documents, working around file size limitations while maximizing the free tier.&lt;/p&gt;
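&lt;p&gt;The extraction strategy behind those stages can be sketched in Python (the real implementation uses PDF.js and Tesseract.js in the browser; every name below is a placeholder, not the project's API): prefer a page's embedded text layer, fall back to OCR for scanned pages, and clean the result before it is sent to the model.&lt;/p&gt;

```python
# Illustrative sketch of the text-or-OCR fallback per page. `ocr` is an
# injected callable standing in for Tesseract; "text"/"image" are assumed
# page fields, not the app's real data model.
import re

def clean(text):
    """Collapse whitespace and strip OCR line-break artifacts."""
    return re.sub(r"\s+", " ", text).strip()

def extract_page_text(page, ocr):
    native = page.get("text", "").strip()
    if native:                         # text-based PDF page
        return clean(native)
    return clean(ocr(page["image"]))   # image-based page: render + OCR

def extract_document(pages, ocr):
    return "\n".join(extract_page_text(p, ocr) for p in pages)
```

&lt;p&gt;Doing this preprocessing client-side is what sidesteps upload limits: only the cleaned text, not the PDF itself, ever reaches the Gemini API.&lt;/p&gt;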

&lt;h3&gt;
  
  
  Serverless Edge Functions
&lt;/h3&gt;

&lt;p&gt;Supabase Edge Functions run on Deno at the edge, close to users globally. I structured them as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;quiz/new&lt;/code&gt; - Validates input and queues quiz generation&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;quiz/{quizId}&lt;/code&gt; - Retrieves quiz data with caching&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;submit&lt;/code&gt; - Processes answers and calculates scores&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;past-quizzes&lt;/code&gt; - Aggregates user history with pagination&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each function is stateless, meaning it scales automatically and stays fast under load.&lt;/p&gt;
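&lt;p&gt;Statelessness in practice means every request carries everything the function needs; for example, the &lt;code&gt;past-quizzes&lt;/code&gt; endpoint's pagination can be sketched like this (a hedged Python illustration of the idea; the real functions are TypeScript on Deno and the names are assumptions):&lt;/p&gt;

```python
# Sketch of cursor-free, stateless pagination: the page number arrives
# with each request, so no server-side session is ever needed.
def paginate(items, page, per_page=10):
    """Return one page of `items` plus whether another page follows."""
    start = (page - 1) * per_page
    chunk = items[start:start + per_page]
    has_more = len(items) > start + per_page
    return {"page": page, "items": chunk, "has_more": has_more}
```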




&lt;h2&gt;
  
  
  What This Project Taught Me
&lt;/h2&gt;

&lt;h3&gt;
  
  
  MVP to Polished Product
&lt;/h3&gt;

&lt;p&gt;Mindryx evolved from a rough proof-of-concept to a near-production SaaS application. This journey taught me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Feature prioritization:&lt;/strong&gt; Focus on core value before adding bells and whistles&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Iterative refinement:&lt;/strong&gt; Ship something that works, then improve it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User-centric design:&lt;/strong&gt; Every technical decision should serve the end user&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cloud Architecture Fundamentals
&lt;/h3&gt;

&lt;p&gt;By working with both LocalStack and Supabase, I developed platform-agnostic knowledge:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Serverless function design patterns&lt;/li&gt;
&lt;li&gt;Async processing with queues&lt;/li&gt;
&lt;li&gt;Database schema design (both NoSQL and SQL)&lt;/li&gt;
&lt;li&gt;API security and authentication&lt;/li&gt;
&lt;li&gt;Error handling and observability&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Docker &amp;amp; DevOps
&lt;/h3&gt;

&lt;p&gt;LocalStack's Docker-based approach gave me practical experience with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Container orchestration&lt;/li&gt;
&lt;li&gt;Environment isolation&lt;/li&gt;
&lt;li&gt;Reproducible development setups&lt;/li&gt;
&lt;li&gt;Docker Compose workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These skills are transferable to any modern development environment.&lt;/p&gt;




&lt;h2&gt;
  
  
  Current Status &amp;amp; Future Roadmap
&lt;/h2&gt;

&lt;p&gt;Mindryx is &lt;strong&gt;nearly feature-complete&lt;/strong&gt; as a SaaS product. The core functionality works reliably, authentication is secure, and the UI is polished. A few finishing touches would make it fully production-ready:&lt;/p&gt;

&lt;h3&gt;
  
  
  Planned Enhancements
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lesson Generation:&lt;/strong&gt; Expand beyond quizzes to flashcards and study notes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enhanced OCR:&lt;/strong&gt; Better handling of tables, diagrams, and complex layouts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spaced Repetition:&lt;/strong&gt; Smart scheduling algorithms for optimal retention&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Collaborative Features:&lt;/strong&gt; Share quizzes and study together&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Export Options:&lt;/strong&gt; Generate PDFs and Anki decks from quizzes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analytics Dashboard:&lt;/strong&gt; Visualize learning progress and weak areas&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;Mindryx is live at &lt;strong&gt;&lt;a href="https://mindryxify.vercel.app" rel="noopener noreferrer"&gt;mindryxify.vercel.app&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The codebase is open source (MIT License) and available on GitHub. Whether you're interested in serverless architecture, AI integration, or building educational tools, feel free to explore the code and contribute.&lt;/p&gt;




&lt;h2&gt;
  
  
  Acknowledgments
&lt;/h2&gt;

&lt;p&gt;This project wouldn't exist without incredible open-source tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Supabase&lt;/strong&gt; for backend infrastructure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google AI Studio&lt;/strong&gt; for accessible Gemini API access&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clerk&lt;/strong&gt; for authentication&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tesseract.js&lt;/strong&gt; for OCR capabilities&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WebLLM&lt;/strong&gt; for browser-based AI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Next.js team&lt;/strong&gt; for the excellent framework&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LocalStack&lt;/strong&gt; for AWS emulation during development&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Building Mindryx was about more than creating a quiz app; it was about understanding modern web development from first principles. By starting with LocalStack's AWS emulation and migrating to Supabase, I gained perspective on different architectural approaches and learned which patterns are universal versus platform-specific.&lt;/p&gt;

&lt;p&gt;The biggest lesson? &lt;strong&gt;Ship something that solves a real problem.&lt;/strong&gt; I built Mindryx because I needed it as a student. That authentic need guided every technical decision and kept the project focused on delivering value.&lt;/p&gt;

&lt;p&gt;If you're working on your own project, I encourage you to experiment with different architectures, learn by building, and don't be afraid to migrate when you find a better solution. The journey is just as valuable as the destination.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/Humza987/Mindryx" rel="noopener noreferrer"&gt;https://github.com/Humza987/Mindryx&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  About This Post
&lt;/h2&gt;

&lt;p&gt;This blog post was compiled from project README files documenting both the testing and production branches of Mindryx, combined with my personal reflections and scattered thoughts collected throughout the development journey. The narrative was structured and refined with AI assistance to create a cohesive story of the technical evolution and learning experiences behind the project.&lt;/p&gt;

</description>
      <category>nextjs</category>
      <category>webllm</category>
      <category>learning</category>
      <category>ai</category>
    </item>
    <item>
      <title>Building SafeScript: Our Journey Creating a Security Tool for AI-Generated Code</title>
      <dc:creator>Humza Inam</dc:creator>
      <pubDate>Mon, 20 Oct 2025 17:28:53 +0000</pubDate>
      <link>https://dev.to/humza_inam/building-safescript-our-journey-creating-a-security-tool-for-ai-generated-code-2hk7</link>
      <guid>https://dev.to/humza_inam/building-safescript-our-journey-creating-a-security-tool-for-ai-generated-code-2hk7</guid>
      <description>&lt;p&gt;What six engineering students learned about security, AI, and teamwork over eight months&lt;/p&gt;




&lt;p&gt;When our team started this capstone project, we had a simple observation: AI code generators like ChatGPT and GitHub Copilot are incredibly powerful, but they sometimes produce code with security vulnerabilities. What if we could catch those issues right in the IDE, as developers write?&lt;/p&gt;

&lt;p&gt;Eight months later, we've built SafeScript, a VS Code extension that analyzes C code for common security vulnerabilities in real-time. Here's what we learned along the way.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem We Tackled
&lt;/h2&gt;

&lt;p&gt;AI-powered code generation has revolutionized development, but it comes with a catch. These tools can generate code that lacks essential safeguards against vulnerabilities like buffer overflows, path traversal, or insecure API calls. Developers might trust the generated code without thoroughly validating its security implications.&lt;/p&gt;

&lt;p&gt;We wanted to build something that would help developers catch these issues early, without disrupting their workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why We Focused on C
&lt;/h2&gt;

&lt;p&gt;We narrowed our scope to C for a good reason: it's notoriously unsafe. With no memory safety, unprotected pointers, and manual resource handling, C is prone to critical vulnerabilities like buffer overflows, use-after-free, and integer overflows.&lt;/p&gt;

&lt;p&gt;By focusing on C, we could leverage language-specific syntax parsing to enable accurate, fine-grained analysis rather than relying on generic patterns.&lt;/p&gt;

&lt;p&gt;This constraint actually helped us. Instead of building a shallow tool for many languages, we could create something truly useful for C developers.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Technical Journey
&lt;/h2&gt;

&lt;h3&gt;
  
  
  From Line-by-Line to Method-Level Analysis
&lt;/h3&gt;

&lt;p&gt;Our first version analyzed code line by line. It worked, but we quickly hit a wall: too many false positives. The tool would flag code as insecure without understanding the broader context of the function.&lt;/p&gt;

&lt;p&gt;The breakthrough came when we switched to method-level analysis using Tree-sitter, an incremental parsing library that gives us a real syntax tree to walk. This allowed us to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Track buffer sizes throughout a function&lt;/li&gt;
&lt;li&gt;Detect whether input validation was actually happening&lt;/li&gt;
&lt;li&gt;Understand pointer relationships and memory usage patterns&lt;/li&gt;
&lt;li&gt;Provide context-aware warnings that made sense&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The difference was night and day. Our false positive rate dropped significantly, and the feedback became actually useful for developers.&lt;/p&gt;
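&lt;p&gt;As a rough illustration of what "method-level" buys you (this is not SafeScript's actual engine, which walks a Tree-sitter AST; plain regexes stand in for it here), the key idea is remembering each buffer's declared size while scanning the rest of the function body:&lt;/p&gt;

```typescript
// Simplified sketch of method-level buffer tracking. SafeScript's real
// engine walks a Tree-sitter AST; here plain regexes stand in for it.
function findOverflows(fnBody: string): string[] {
  const warnings: string[] = [];
  const bufferSizes: { [name: string]: number } = {};

  // Record declared buffer sizes, e.g. `char buf[64];`
  for (const m of fnBody.matchAll(/char\s+(\w+)\s*\[(\d+)\]/g)) {
    bufferSizes[m[1]] = parseInt(m[2], 10);
  }

  // Flag strcpy into a tracked buffer from a string literal that cannot fit.
  for (const m of fnBody.matchAll(/strcpy\s*\(\s*(\w+)\s*,\s*"([^"]*)"/g)) {
    const size = bufferSizes[m[1]];
    if (size !== undefined) {
      if (m[2].length + 1 > size) {
        warnings.push(`possible overflow: "${m[2]}" needs ${m[2].length + 1} bytes, ${m[1]} holds ${size}`);
      }
    }
  }
  return warnings;
}
```

&lt;p&gt;A line-by-line scanner would flag every strcpy; with the sizes tracked across the whole function, the safe copies stay quiet.&lt;/p&gt;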

&lt;h3&gt;
  
  
  What SafeScript Detects
&lt;/h3&gt;

&lt;p&gt;We implemented detection for over 10 types of C-specific vulnerabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Unsafe functions like strcpy, gets, sprintf with suggestions for safer alternatives&lt;/li&gt;
&lt;li&gt;Buffer overflows through AST analysis of array indexing and memory operations&lt;/li&gt;
&lt;li&gt;Hardcoded credentials and sensitive values&lt;/li&gt;
&lt;li&gt;Heap overflows, race conditions, weak hashing, integer overflows&lt;/li&gt;
&lt;li&gt;Each mapped to its corresponding CWE identifier&lt;/li&gt;
&lt;/ul&gt;
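&lt;p&gt;For the unsafe-function checks, the heart of the detector is a lookup table. A hypothetical slice of it, with the CWE each function is commonly associated with (this is not SafeScript's exact rule set):&lt;/p&gt;

```typescript
// Illustrative lookup of unsafe C functions, safer replacements, and the
// CWE each is commonly associated with (not SafeScript's exact table).
interface UnsafeFunctionRule {
  unsafe: string;
  saferAlternative: string;
  cwe: string;
}

const rules: UnsafeFunctionRule[] = [
  { unsafe: "gets",    saferAlternative: "fgets",    cwe: "CWE-242" },
  { unsafe: "strcpy",  saferAlternative: "strncpy",  cwe: "CWE-120" },
  { unsafe: "sprintf", saferAlternative: "snprintf", cwe: "CWE-120" },
];

// Report which rules fire on a snippet; the \b boundary keeps, say,
// snprintf from matching the sprintf rule.
function flagUnsafeCalls(code: string): UnsafeFunctionRule[] {
  return rules.filter((r) => new RegExp("\\b" + r.unsafe + "\\s*\\(").test(code));
}
```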

&lt;h3&gt;
  
  
  The Architecture
&lt;/h3&gt;

&lt;p&gt;We built SafeScript using a modular factory pattern. Each security check is its own class implementing a common interface, making it easy to add new tests or maintain existing ones independently.&lt;/p&gt;
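&lt;p&gt;A sketch of that shape (class and method names are invented for illustration, not SafeScript's actual API): every check implements one interface, and the factory assembles the active set, so adding a new test means adding one class.&lt;/p&gt;

```typescript
// Hypothetical shape of the modular factory pattern described above;
// class and method names are invented for illustration.
interface SecurityCheck {
  id: string;                   // e.g. a CWE identifier
  run(code: string): string[];  // returns human-readable findings
}

class GetsCheck implements SecurityCheck {
  id = "CWE-242";
  run(code: string): string[] {
    return /\bgets\s*\(/.test(code) ? ["gets() is unsafe; use fgets()"] : [];
  }
}

class CheckFactory {
  private registry: (() => SecurityCheck)[] = [];
  register(make: () => SecurityCheck): void {
    this.registry.push(make);
  }
  createAll(): SecurityCheck[] {
    return this.registry.map((make) => make());
  }
}
```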

&lt;p&gt;Tech stack: TypeScript, VS Code API, Tree-sitter, Jest&lt;/p&gt;

&lt;p&gt;The UI includes three key panels:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Security Analysis Panel – Shows detected issues with CWE/CVE details&lt;/li&gt;
&lt;li&gt;AI Suggestion History – Tracks AI-generated improvements&lt;/li&gt;
&lt;li&gt;Dual-mode Interface – Analyze existing code or generate new secure code&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Challenges
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Server Security Wake-Up Call
&lt;/h3&gt;

&lt;p&gt;We had a critical security issue early on. Our server was accepting direct requests without filtering, running everything as root. Anyone with the IP address could send malicious requests.&lt;/p&gt;

&lt;p&gt;The fix? We implemented a reverse proxy that filters requests before they hit the main server. It was a humbling reminder that security tools need to be secure themselves.&lt;/p&gt;
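&lt;p&gt;We won't reproduce our exact setup here, but the idea looks roughly like this nginx sketch (ports and paths are made up for illustration): the proxy is the only public listener, and it forwards nothing except the one endpoint the extension needs.&lt;/p&gt;

```nginx
# Illustrative reverse-proxy sketch (ports and paths are made up);
# only the analysis endpoint is forwarded, everything else is rejected.
server {
    listen 443 ssl;

    location /api/analyze {
        limit_except POST { deny all; }   # only POST reaches the backend
        proxy_pass http://127.0.0.1:8080; # app server bound to localhost only
    }

    location / {
        return 404;                       # no other paths are exposed
    }
}
```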

&lt;h3&gt;
  
  
  The Machine Learning Pivot
&lt;/h3&gt;

&lt;p&gt;Originally, we planned to implement a deep learning model for vulnerability detection. Reality check: we didn't have the time, compute resources, or training data.&lt;/p&gt;

&lt;p&gt;Instead, we pivoted to a shallow Random Forest model that classifies code by CWE category. It's not as ambitious, but it works and gave us room to focus on what mattered most: the core detection engine.&lt;/p&gt;
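&lt;p&gt;Our actual feature set is beyond the scope of this post, but a shallow classifier like this typically consumes simple token counts rather than learned embeddings. A toy sketch of that idea (the token list is invented for illustration):&lt;/p&gt;

```typescript
// Toy feature extraction of the kind a shallow CWE classifier might consume:
// counts of security-relevant tokens in a snippet (illustrative only).
const FEATURE_TOKENS = ["strcpy", "malloc", "free", "gets", "sprintf", "password"];

function extractFeatures(code: string): number[] {
  return FEATURE_TOKENS.map((token) => {
    const matches = code.match(new RegExp("\\b" + token + "\\b", "g"));
    return matches ? matches.length : 0;
  });
}
```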

&lt;h3&gt;
  
  
  UI: Harder Than It Looks
&lt;/h3&gt;

&lt;p&gt;We underestimated UI development. Building an intuitive interface in VS Code's constrained environment took multiple iterations. What seemed simple on paper—showing vulnerabilities and suggestions—required careful design to avoid overwhelming developers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Stats
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;8 months of development&lt;/li&gt;
&lt;li&gt;4 agile sprints&lt;/li&gt;
&lt;li&gt;10+ vulnerability types detected&lt;/li&gt;
&lt;li&gt;Analysis time under 3 seconds&lt;/li&gt;
&lt;li&gt;161 story points completed&lt;/li&gt;
&lt;li&gt;$850 in free credits spent on server hosting&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What We Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Start with an MVP, Really&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We spent too long perfecting security rules before getting user feedback. If we could do it again, we'd ship a working prototype faster and iterate based on actual usage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Constraints Are Liberating&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Focusing exclusively on C might seem limiting, but it allowed us to build something genuinely useful instead of a mediocre multi-language tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Communication Makes or Breaks Teams&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With six people juggling different schedules and skills, clear communication was essential. Regular standups, detailed sprint retrospectives, and thorough documentation kept us aligned.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Testing Prevents Embarrassment&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Unit testing with Jest caught issues before they hit production. Our regression tests were crucial for reducing false positives as we refined detection rules.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;SafeScript is live, but we're not done. Future improvements we're considering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dynamic code analysis for runtime vulnerability detection&lt;/li&gt;
&lt;li&gt;Project-level analysis across multiple files&lt;/li&gt;
&lt;li&gt;Deeper machine learning integration for autonomous code fixes&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;SafeScript is available now on the VS Code Marketplace. If you write C code or work with AI-generated code, give it a try and let us know what you think.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;This project taught us that building developer tools isn't just about the technology—it's about understanding workflows, earning trust, and making security feel like a helpful assistant rather than a nagging critic.&lt;/p&gt;

&lt;p&gt;Looking back, we're proud of what we built. SafeScript started as a class project and became a real tool that addresses a genuine need in modern software development. The journey taught us more about engineering, security, and collaboration than any lecture could.&lt;/p&gt;

&lt;p&gt;To anyone building developer tools: start small, focus on real problems, and don't be afraid to pivot when reality challenges your assumptions. The best tools emerge from iteration, not perfection.&lt;/p&gt;




&lt;p&gt;Built by Team 13&lt;br&gt;
York University - Lassonde School of Engineering&lt;br&gt;
Capstone Project 2024-2025&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This article was structured with AI assistance from our technical capstone report. All content, data, and experiences are from our team's actual development of SafeScript.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Connect with me on LinkedIn:&lt;br&gt;
&lt;a href="https://www.linkedin.com/in/humza-inam/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/humza-inam/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>security</category>
      <category>vscode</category>
      <category>extensions</category>
      <category>c</category>
    </item>
  </channel>
</rss>
