DEV Community

Richard
What Three Fictional Startups Taught Me About Structured, Semi-Structured, and Unstructured Data on Azure

I'm currently transitioning into cloud engineering and DevOps, and one thing I've learned quickly is that understanding a concept in theory is very different from being able to reason through it in real-life scenarios — the kind you'd actually get on the job.

I don't have real production systems to practice on yet, so I use AI to generate creative briefs for fictional startups that I can work through and get constructive feedback on right away. Instead of memorizing definitions, I run myself through a case-based exercise: for each fictional company, I classify its data as structured, semi-structured, or unstructured, and pick the right Azure service for each kind.

The Three Data Types

Structured data has a fixed schema — you'll typically have the same fields every time, arranged in rows and columns. For example: customer records with name, age, and account number.

Semi-structured data has some organization, but doesn't necessarily fit into a rigid table. A good example is a JSON object, where fields can vary between records.

Unstructured data has no inherent structure at all — think a photo, a video file, or a PDF.
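To make the three definitions concrete, here is a minimal Python sketch. The record names and values (`CustomerRecord`, `events`, `photo_bytes`) are made up for illustration and are not from any of the briefs. It checks how many distinct sets of field names each collection has: one for structured data, more than one for semi-structured data, and none to inspect for raw bytes.

```python
import json
from dataclasses import dataclass, asdict


# Structured: every record has the same fields, like a row in a table.
@dataclass
class CustomerRecord:
    name: str
    age: int
    account_number: str


customers = [
    CustomerRecord("Ada", 34, "ACC-001"),
    CustomerRecord("Kofi", 41, "ACC-002"),
]

# Semi-structured: JSON objects that share some keys but not all of them.
events = [
    json.loads('{"id": 1, "type": "login", "device": "android"}'),
    json.loads('{"id": 2, "type": "error", "code": 500, "detail": {"path": "/pay"}}'),
]

# Unstructured: just bytes. Nothing in the payload names any fields.
# (These four bytes are the usual opening of a JPEG file, used as a stand-in.)
photo_bytes = bytes([0xFF, 0xD8, 0xFF, 0xE0])


def field_sets(records):
    """Return the distinct sets of field names seen across dict records."""
    return {frozenset(r.keys()) for r in records}


structured_shapes = field_sets([asdict(c) for c in customers])
semi_shapes = field_sets(events)

print(len(structured_shapes))  # one shared shape
print(len(semi_shapes))        # shapes differ between records
```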

That part felt easy enough, and I figured I could just walk through some real-life scenarios and apply it cleanly. But I quickly learned that the type of data and the right database for that data are actually two separate decisions — and it wasn't quite as intuitive as I initially thought.

Case One: Telemedicine

The first creative brief was a telemedicine platform storing patient records, consultation videos, symptom photos, doctor's notes, app logs, and insurance PDFs.

Most of it sorted out without much issue. Patient records were easy to classify as structured data — the same fields every time (patient ID, name, age, diagnosis) falling into clean rows and columns. On Azure, that's a job for Azure SQL Database.

Consultation videos, symptom photos, and insurance PDFs have no fixed schema that a database can query. I also learned something here: even when a PDF displays a table, a storage service or database treats the file as one opaque object and, without a separate extraction step, can't query whatever rows and columns appear inside it. A PDF is a flattened file, so it falls under the unstructured umbrella and goes to Azure Blob Storage.

The one that stretched my thinking was the app logs tracking user logins, errors, and API calls. My first instinct was that there's a pattern here, but it isn't as rigid as a database table. That instinct was correct: logs like these are a textbook example of semi-structured data, and a good fit for Azure Table Storage or Azure Cosmos DB.
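A small sketch of why logs land in the semi-structured bucket. The three log lines below are invented for illustration (the brief only says the logs track logins, errors, and API calls). The code separates the keys every entry shares from the keys that only some entries carry.

```python
import json

# Hypothetical log lines: same general pattern, different fields per event type.
raw_log_lines = [
    '{"ts": "2024-01-01T09:00:00Z", "event": "login", "user_id": "u1"}',
    '{"ts": "2024-01-01T09:00:05Z", "event": "api_call", "user_id": "u1", '
    '"path": "/appointments", "status": 200}',
    '{"ts": "2024-01-01T09:00:09Z", "event": "error", "message": "timeout", "retry": true}',
]

entries = [json.loads(line) for line in raw_log_lines]

# Keys present in every entry: the "pattern" part.
common_keys = set.intersection(*(set(e) for e in entries))

# Keys present in only some entries: the part a rigid table handles badly.
all_keys = set.union(*(set(e) for e in entries))
optional_keys = all_keys - common_keys

print(sorted(common_keys))
print(sorted(optional_keys))
```

Forcing these into one SQL table would mean a column for every optional key, mostly filled with nulls, which is the practical reason a schema-flexible store tends to fit better.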

Case Two: Food Delivery

The second brief was a food delivery startup — think of the Glovo or Jumia Food type of app. This is where I hit my first real curveball.

On one end, you have restaurants registered on the platform with structured information — location, offerings, and so on. On the other end, you have customers logging in, placing orders, specifying delivery locations, generating user IDs. Once again, structured data.

But then there was the GPS tracking data for delivery drivers, captured every 30 seconds — latitude, longitude, timestamp, speed, driver ID — used to tell you the driver is five or ten minutes away.

Initially, this seemed structured, because it is: the same five fields every time. But I realized that one write per driver every 30 seconds adds up fast, and at scale that sustained write load can strain a single Azure SQL database. So this was a situation where the shape of the data, which is structured, doesn't by itself dictate the architecture needed to handle it.
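Some back-of-the-envelope arithmetic helps here. The 30-second interval comes from the brief; the fleet sizes and hours online are assumptions picked purely for illustration, and the numbers say nothing about where any particular database's limit actually sits.

```python
PING_INTERVAL_SECONDS = 30  # from the brief: one GPS reading every 30 seconds


def pings_per_day(active_drivers: int, hours_online: float = 24.0) -> int:
    """Total GPS rows written per day for a fleet of a given (assumed) size."""
    pings_per_driver = int(hours_online * 3600 // PING_INTERVAL_SECONDS)
    return active_drivers * pings_per_driver


def average_writes_per_second(active_drivers: int) -> float:
    """Steady-state write rate while all of those drivers are online."""
    return active_drivers / PING_INTERVAL_SECONDS


# One driver online around the clock.
print(pings_per_day(1))

# A hypothetical fleet of 5,000 drivers working 8-hour shifts.
print(pings_per_day(5000, hours_online=8))

# Sustained writes per second with 3,000 drivers online at once.
print(average_writes_per_second(3000))
```

The point is less the exact figures and more that the load is write-heavy, constant, and grows linearly with the fleet, which is a different problem from storing a restaurant's menu.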

Azure Cosmos DB became the choice — not because the data became less structured, but because Cosmos DB is built for that kind of high-velocity ingestion.

This gave me a new rule: data type is about the shape of the data, but database choice is about shape plus volume, velocity, and query pattern. Structured data doesn't automatically mean "put it in SQL." Sometimes it needs a NoSQL-style engine purely for performance reasons.

Case Three: Digital Banking

The third brief sharpened these lessons further, and it produced a couple of catches I initially missed.

Onboarding data — KYC details, account information, name, and so on — was straightforwardly structured. KYC compliance documents, like uploaded driver's licenses and utility bills, were straightforwardly unstructured, headed for Blob Storage.

Transactions and fraud detection logs followed the same pattern as the GPS data: structured, but at a volume best suited for Azure Cosmos DB rather than traditional SQL.

The real twist came with the monthly account statements. Because I could picture my own bank statements — everything arranged in neat rows and columns — I figured this was structured data, destined for Azure SQL. That was wrong, or at least only half right. The underlying data is indeed structured. But once it's rendered into a PDF, it becomes a flattened file. Azure doesn't see rows and columns anymore — it just sees a file. That belongs in Azure Blob Storage.
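The raw-versus-rendered distinction can be sketched in a few lines. This is a plain-text stand-in, not real PDF generation, and the transactions are made up. While the data is still rows, a question like "how much did I spend?" is a one-liner; after rendering, all that exists is a block of bytes.

```python
# Hypothetical transactions: the structured data behind a statement.
transactions = [
    {"date": "2024-03-01", "description": "Salary", "amount": 2500.00},
    {"date": "2024-03-03", "description": "Groceries", "amount": -82.40},
    {"date": "2024-03-10", "description": "Rent", "amount": -900.00},
]

# While the data is still rows, "total spend" is a single expression.
total_spend = sum(t["amount"] for t in transactions if t["amount"] < 0)


def render_statement(rows) -> bytes:
    """Stand-in for PDF rendering: flatten rows into one byte payload."""
    lines = ["MONTHLY STATEMENT"]
    for r in rows:
        lines.append(f'{r["date"]}  {r["description"]:<12}{r["amount"]:>10.2f}')
    return "\n".join(lines).encode("utf-8")


# After rendering there are no fields left to query, only bytes to store.
statement_file = render_statement(transactions)

print(round(total_spend, 2))
print(type(statement_file).__name__, len(statement_file))
```

The rows would still live in a database; the rendered file is a separate artifact that goes wherever files go.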

There was one more lesson from this case that's worth going back for: customer support chat transcripts. When a customer reaches out to support, they type free text and can attach screenshots. My first thought was: free text and screenshots, unstructured, Blob Storage. But I soon realized every message also carries metadata (a timestamp, a sender ID, an agent ID), which is structured and lets you look up a specific conversation later by its unique identifier.

So even though it initially looks like unstructured data, the fact that it carries structured metadata makes it more accurately semi-structured, best handled with Cosmos DB. And in real systems, it's quite possible that the metadata alone lives in Cosmos DB or Table Storage, while the raw content — the free text and screenshots — lives separately in Blob Storage.
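Here is a rough sketch of that split, using two in-memory dicts as stand-ins for the real services. Nothing below calls an Azure SDK. The names `ChatMessageMetadata`, `split_message`, `message_id`, and the blob naming scheme are all hypothetical; only the timestamp, sender ID, and agent ID come from the brief.

```python
import hashlib
from dataclasses import dataclass, asdict


@dataclass
class ChatMessageMetadata:
    message_id: str         # hypothetical unique identifier
    timestamp: str
    sender_id: str
    agent_id: str
    content_blob_name: str  # pointer to where the raw content is kept


def split_message(message_id, timestamp, sender_id, agent_id, text, blob_store):
    """Store the raw text as a blob and return only the structured metadata."""
    payload = text.encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()[:8]
    blob_name = f"chat/{message_id}-{digest}.txt"
    blob_store[blob_name] = payload
    return ChatMessageMetadata(message_id, timestamp, sender_id, agent_id, blob_name)


blob_store = {}      # stand-in for a blob container
metadata_store = {}  # stand-in for a document or table store

meta = split_message(
    "m-001",
    "2024-03-12T10:15:00Z",
    "cust-42",
    "agent-7",
    "My transfer failed, screenshot attached.",
    blob_store,
)
metadata_store[meta.message_id] = asdict(meta)

# Look up by ID, then follow the pointer to the raw content.
record = metadata_store["m-001"]
print(record["sender_id"], blob_store[record["content_blob_name"]].decode("utf-8"))
```

The metadata record stays small and queryable, and the pointer field is what ties the two stores together.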

Three Takeaways

  1. Ask whether you're looking at a raw input or a rendered output. A rendered output, like a PDF, can carry fully structured data underneath but still count as a flattened, unstructured file once it's generated.
  2. High volume and velocity can push structured data into NoSQL territory — not because the data changes shape, but because the architecture needs to handle the load differently.
  3. Real-world data can be layered. As with the support chat example, structured metadata and unstructured content can — and often should — live across multiple Azure services rather than being forced into one.
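The three takeaways can be written down as a tiny decision helper. This is only a sketch of the rules of thumb from these exercises, not an official Azure sizing guide; the `Dataset` fields and the `suggest_store` function are hypothetical names, and real decisions would weigh cost, consistency needs, and query patterns too.

```python
from dataclasses import dataclass
from enum import Enum


class Shape(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


@dataclass
class Dataset:
    name: str
    shape: Shape                 # shape of the underlying data
    rendered_file: bool = False  # takeaway 1: is this a rendered output like a PDF?
    high_velocity: bool = False  # takeaway 2: heavy, constant write load?


def suggest_store(ds: Dataset) -> str:
    """Apply the post's rules of thumb in order."""
    # Takeaway 1: a rendered file is stored as a file, whatever data is behind it.
    if ds.rendered_file or ds.shape is Shape.UNSTRUCTURED:
        return "Azure Blob Storage"
    # Takeaway 2: velocity can push even structured data toward a NoSQL engine.
    if ds.shape is Shape.SEMI_STRUCTURED or ds.high_velocity:
        return "Azure Cosmos DB"
    return "Azure SQL Database"


# Takeaway 3: layered data is modelled as separate parts, each routed on its own.
examples = [
    Dataset("patient records", Shape.STRUCTURED),
    Dataset("driver GPS pings", Shape.STRUCTURED, high_velocity=True),
    Dataset("monthly statement PDF", Shape.STRUCTURED, rendered_file=True),
    Dataset("support chat metadata", Shape.SEMI_STRUCTURED),
    Dataset("support chat content", Shape.UNSTRUCTURED),
]

for ds in examples:
    print(f"{ds.name}: {suggest_store(ds)}")
```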

For someone who has done cloud engineering for many years, this might not seem like a big deal. But for somebody a few months into a full career transition, working through realistic scenarios and decisions in a low-stakes, case-based way is something I'm starting to enjoy. It's helping me build capacity, confidence, and systems-level thinking.

More case studies will follow, and hopefully more mistakes along the way, as I keep building toward being an effective cloud engineer.
