What is Structured Data?

Structured data is data that is structured - perhaps the most succinct summary possible, though it tells you nothing you didn't already expect. To take away more than that, it helps to break down what structured data can look like.

When we say a set of data is structured, we mean it has properties and values that hold a specific meaning. One value might represent an amount of money, another might represent a date, and a third might be an address. When labelled appropriately, these values can represent a greater context - perhaps the details on an invoice or payslip.

{
  "amount": 20.00,
  "datePaid": "2017-08-12T12:56:00",
  "address": "123 Example Street, Some Suburb, Some City"
}

This data could be in a database, a spreadsheet, a JSON blob - it doesn't matter. If there is data and a defined data model, that is structured data. When data is structured well, it is easier to query, process and generally use.
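
As a rough sketch of why a defined data model helps, the Python snippet below loads a few records shaped like the example above and filters them by date. The extra records and the `paid_after` helper are made up for illustration.

```python
import json
from datetime import datetime

# Hypothetical records that follow the same model as the example above.
raw = """
[
  {"amount": 20.00, "datePaid": "2017-08-12T12:56:00", "address": "123 Example Street, Some Suburb, Some City"},
  {"amount": 45.50, "datePaid": "2017-09-01T09:30:00", "address": "7 Sample Road, Other Suburb, Some City"},
  {"amount": 12.25, "datePaid": "2017-07-20T16:05:00", "address": "9 Demo Lane, Some Suburb, Some City"}
]
"""

records = json.loads(raw)


def paid_after(items, cutoff):
    """Return the records whose datePaid is later than the cutoff."""
    return [r for r in items if datetime.fromisoformat(r["datePaid"]) > cutoff]


# Because every record has a datePaid and an amount, the query is short.
recent = paid_after(records, datetime(2017, 8, 1))
total = sum(r["amount"] for r in recent)
print(len(recent), total)  # prints: 2 65.5
```

The query only works because each property has a known name and a known value format. The address value is still an opaque string here.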

In contrast, there is also unstructured data. The value for our address property in the example above is itself unstructured data.

123 Example Street, Some Suburb, Some City

That single value has multiple internal components we would need to parse out, depending on what we want to do. If we wanted to group all the records on the same street, we'd need to carefully parse the street, suburb and city out of our data. Different countries can have different formats and rules for addresses, which makes them hard to process. Making the wrong assumption when parsing the data can lead to irregularities in the data, causing problems for whatever consumes that data. For instance, depending on where you are in the world, can you assume all streets even have names?
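
To make the risk concrete, here is a deliberately naive parser in Python. The `naive_parse` function and its "street, suburb, city" assumption are illustrative only, as are the sample addresses.

```python
def naive_parse(address):
    """Split an address on commas, assuming 'street, suburb, city' order.

    The assumed order is for illustration and is wrong for many real addresses.
    """
    parts = [p.strip() for p in address.split(",")]
    if len(parts) != 3:
        return None  # the assumed format clearly does not apply
    street, suburb, city = parts
    return {"street": street, "suburb": suburb, "city": city}


# Matches the assumed format, so the result is sensible.
ok = naive_parse("123 Example Street, Some Suburb, Some City")

# Also has three parts, so it parses without error, but the unit number
# ends up as the street and the street ends up as the suburb.
wrong = naive_parse("Unit 4, 123 Example Street, Some City")

print(ok["street"])     # prints: 123 Example Street
print(wrong["street"])  # prints: Unit 4
```

The second case is the troublesome one: nothing fails, yet the output is wrong, and grouping by street would quietly produce irregular results.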

While structured data can definitely be easier to work with, it doesn't make unstructured data useless. Instead, think of unstructured data as untapped potential - useful data exists but is difficult to get to.

Structured Data and Web Pages

Web pages are an interesting example of both structured and unstructured data. There are specific elements one could look at for certain information, like the <title> element or other semantic elements like <article> or <section>. The problem, though, is that these elements are more like our "address" example earlier - they often contain more than just the strict data we are looking for. A title might have a prefix or suffix of the website's name. An article or section might have many other layers of <div>, <span> or any other elements to help form the site's structure. To top it off, the HTML structure can vary wildly from site to site. If you want to extract data from multiple websites, it can get very hard very fast.
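
The title problem can be sketched with Python's standard `html.parser` module. The page markup, the "Example Blog" site name and the " - " separator below are all assumptions made for the example.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


# Hypothetical page: the site name is appended to the title after " - ".
html = (
    "<html><head><title>What is Structured Data? - Example Blog</title></head>"
    "<body><article><div><span>Body text</span></div></article></body></html>"
)

parser = TitleParser()
parser.feed(html)

# Stripping the suffix only works because we already know this site's convention.
site_suffix = " - Example Blog"
if parser.title.endswith(site_suffix):
    clean = parser.title[: -len(site_suffix)]
else:
    clean = parser.title
print(clean)  # prints: What is Structured Data?
```

The suffix rule is specific to one site. Another site might use a prefix, a different separator, or no site name at all, so each source tends to need its own handling.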

That said, there are a number of ways to embed structured data into web pages. A web page could use Microdata, RDFa, JSON-LD or Open Graph to express structured data. More than that, a web page can use several of these at the same time. Open Graph is commonly used to define the details for a link preview, while the others might express more complex data like product pricing or reviews.
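
As one possible illustration, Open Graph data lives in `<meta>` tags whose `property` attribute starts with `og:`, so it can be read with Python's standard `html.parser`. The `OpenGraphParser` class and the page values below are hypothetical.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs into a dict."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        if prop.startswith("og:") and attr_map.get("content") is not None:
            self.properties[prop] = attr_map["content"]


# Hypothetical page head; the URL and values are placeholders.
html = """
<head>
  <meta property="og:title" content="What is Structured Data?">
  <meta property="og:type" content="article">
  <meta property="og:url" content="https://example.com/structured-data">
  <meta name="viewport" content="width=device-width">
</head>
"""

parser = OpenGraphParser()
parser.feed(html)
print(parser.properties["og:title"])  # prints: What is Structured Data?
```

Unlike the <title> element, the value here needs no site-specific clean-up, because the property name already states what the value means.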

Having standard formats like Microdata or JSON-LD is a good start, but they only define how the data is encoded - we need a common vocabulary so we can understand the data those formats encode. One common vocabulary is Schema.org, which provides over 700 types, including types to describe people, places, products, recipes, reviews, vehicles, movies and medical devices. Using Schema.org for structured data on a website can help search engines provide richer experiences in the search results.
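
A minimal sketch of how format and vocabulary fit together: JSON-LD is the format, embedded in a `<script type="application/ld+json">` element, while `Product` and `Offer` are Schema.org types. The `JsonLdParser` class and the product values are invented for this example, and the parser assumes each block holds valid JSON.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self.in_json_ld = False
        self.buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.in_json_ld = True
            self.buffer = []

    def handle_data(self, data):
        if self.in_json_ld:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self.in_json_ld:
            self.in_json_ld = False
            # Assumes the block is valid JSON; real pages may need error handling.
            self.items.append(json.loads("".join(self.buffer)))


# Hypothetical product page fragment using Schema.org types.
html = """
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Widget",
  "offers": {"@type": "Offer", "price": "20.00", "priceCurrency": "USD"}
}
</script>
"""

parser = JsonLdParser()
parser.feed(html)
product = parser.items[0]
print(product["@type"], product["offers"]["price"])  # prints: Product 20.00
```

Because the vocabulary is shared, the same lookup should work on any site that marks up its products this way, regardless of how the visible HTML is laid out.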

Summary

Structured data, through standardising expected properties and value formats, makes the sharing and processing of data easier. Web pages in particular benefit from encoding structured data in their mark-up where it can be used by search engines and other tools.

DEV Community
