James Turner for BrandVantage

Posted on • Updated on • Originally published at brandvantage.co

What is Microdata?

Microdata is an HTML standard, created by WHATWG, for describing rich metadata in web pages. Search engines and other computer systems can use this metadata to better understand the content of a page.

Microdata is made up of a number of attributes, including itemscope, itemprop and itemtype. Below is an example of a basic web page using Microdata.

<html>
  <head>
    <title>What is Microdata?</title>
  </head>
  <body itemscope itemtype="https://schema.org/WebPage">
    <article itemscope itemtype="http://schema.org/Article" itemprop="mainEntity">
      <meta itemprop="url" content="https://brandvantage.co/blog/what-is-microdata">
      <meta itemprop="image" content="https://brandvantage.co/blog/2020/images/what-is-microdata-cover.png">
      <h1 itemprop="name headline">What is Microdata?</h1>
      <time itemprop="datePublished" datetime="2020-09-20">20th of September, 2020</time>
      <div itemprop="articleBody">
        Hello and welcome to this example!
      </div>
    </article>
  </body>
</html>

The "itemprop" Attribute

The itemprop attribute defines a name-value pair. The value associated with the property can be derived from:

  • The inner text content of the tag
  • The content attribute (if defined)
  • The src attribute (for img, audio, video, iframe and similar tags)
  • The href attribute (for link or a tags)
  • The value attribute (for data or meter tags)
  • The datetime attribute (for time tags)
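
As a rough sketch of the attribute-based cases in the list above, the fragment below reads values from src, href and value. The example.com URLs, the product and the SKU are made up for illustration; the property names ("image", "url", "sku") come from the Schema.org Product type.

```html
<div itemscope itemtype="https://schema.org/Product">
  <!-- src attribute: the value of "image" is the image URL -->
  <img itemprop="image" src="https://example.com/images/widget.png" alt="A blue widget">

  <!-- href attribute: the value of "url" is the link target, not the link text -->
  <a itemprop="url" href="https://example.com/products/widget">View the widget</a>

  <!-- value attribute: the value of "sku" is "W-1001", not the visible text -->
  <data itemprop="sku" value="W-1001">Widget, model 1001</data>
</div>
```

In each case the human-readable text and the machine-readable value can differ, which is the main reason to reach for these attributes rather than the inner text.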

An item property can also contain a group of name-value pairs through the use of the itemscope attribute. Additionally, a single itemprop attribute can list several space-separated property names, in which case each of them receives the same value.

You can see several of these uses of itemprop in the web page example at the start of this article. The sections below walk through them one at a time.

Grouping Properties Together

<article itemscope itemtype="http://schema.org/Article" itemprop="mainEntity">

Because the tag also has the itemscope attribute, the "mainEntity" property groups other name-value pairs together as its value.

Value from a "content" Attribute

<meta itemprop="url" content="https://brandvantage.co/blog/what-is-microdata">
<meta itemprop="image" content="https://brandvantage.co/blog/2020/images/what-is-microdata-cover.png">

Here we have two properties ("url" and "image") with their values defined from the content attribute.

Setting Two Properties at Once

<h1 itemprop="name headline">What is Microdata?</h1>

The itemprop here is setting two properties at once ("name" and "headline") with the value from the inner text.

Value from a "datetime" Attribute

<time itemprop="datePublished" datetime="2020-09-20">20th of September, 2020</time>

The "datePublished" property uses the time tag with the datetime attribute. The date is formatted as an ISO 8601 date.

Value from Inner Text

<div itemprop="articleBody">
    Hello and welcome to this example!
</div>

The "articleBody" property uses the inner content for its value.

The "itemtype" Attribute

While itemprop sets the values of properties, those properties are of limited use unless a consumer knows which properties to expect and what they mean. This is where itemtype comes in: it gives context to the properties through a URL that identifies the vocabulary.

One common vocabulary is Schema.org, a joint venture between Google, Microsoft, Yahoo and Yandex. A shared vocabulary like Schema.org makes interoperability easier for third parties who want or need to use the data. That said, nothing stops you from using other vocabularies or even making your own. Keep in mind that if a third party doesn't understand your vocabulary, your metadata may not be processed or used.
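
To make the "making your own" option concrete, here is a minimal sketch of a custom vocabulary. Everything in it is hypothetical: https://example.com/vocab/ is a placeholder rather than a real published vocabulary, and the "Podcast" type with its "showTitle" and "hostName" properties is invented for illustration.

```html
<!-- Hypothetical vocabulary: the itemtype URL and property names below are made up -->
<section itemscope itemtype="https://example.com/vocab/Podcast">
  <h2 itemprop="showTitle">The Metadata Hour</h2>
  <span itemprop="hostName">Alex Example</span>
</section>
```

This is valid Microdata, but only a consumer that has been taught what https://example.com/vocab/Podcast means can do anything useful with it.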

In the previous example, there were two uses of itemtype:

  <body itemscope itemtype="https://schema.org/WebPage">
    <article itemscope itemtype="http://schema.org/Article" itemprop="mainEntity">

The first defines the scope as a WebPage schema object, which has a number of useful properties including "breadcrumb" and "significantLink". In the Schema.org vocabulary, WebPage extends CreativeWork, which has a property "mainEntity". This property can be any Thing, the base type in the Schema.org vocabulary.

For the "mainEntity" property, we define the type of its scope to be an Article schema object. This means that all properties declared inside the article tag are properties of our Article, unless they sit inside a further nested itemscope.
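
The way properties attach to their nearest enclosing scope can be sketched with one more level of nesting. This fragment is an illustration rather than part of the original example page; it assumes the Schema.org "author" property and Person type.

```html
<article itemscope itemtype="https://schema.org/Article">
  <!-- "headline" belongs to the Article -->
  <h1 itemprop="headline">What is Microdata?</h1>

  <!-- "author" is a property of the Article; its value is a nested Person item -->
  <div itemprop="author" itemscope itemtype="https://schema.org/Person">
    <!-- "name" here belongs to the Person, not the Article -->
    <span itemprop="name">James Turner</span>
  </div>
</article>
```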

The "itemscope" Attribute

The itemscope attribute defines a group (scope) of name-value pairs (properties), effectively treating the property value as an object. It doesn't require the presence of an itemtype. However, without the common vocabulary an itemtype provides, interoperability between systems can be very limited.
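
A small sketch of the difference, using an invented product: both items below are valid Microdata, but only the second tells a consumer which vocabulary defines "name" and "color" (here, the Schema.org Product type).

```html
<!-- Untyped item: valid, but "name" and "color" have no shared meaning -->
<div itemscope>
  <span itemprop="name">Blue widget</span>
  <span itemprop="color">Blue</span>
</div>

<!-- Typed item: the same properties, now defined by the Schema.org Product type -->
<div itemscope itemtype="https://schema.org/Product">
  <span itemprop="name">Blue widget</span>
  <span itemprop="color">Blue</span>
</div>
```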

Summary

Microdata allows website developers to enrich a web page with metadata that search engines and other systems can integrate into their own data. You can define individual properties with different kinds of values, including nested values through scopes.

There is more to Microdata, including additional attributes such as itemid and itemref, but this should serve as a basic introduction.

Top comments (9)

Ravavyr

For anyone curious what this microdata is for and what it does:

  • Google uses it for tagging page content, so it helps with Google search results when a page is indexed. It can definitely help your page rankings, and Google's console shows you the errors to help you understand what needs to be fixed too.

  • Facebook and Twitter (and, I figure, other social media sites) also use this to grab the main image you see when you share a URL, and the description text for it as well.

Various other tools use it to clarify what information is present on the page, but it's largely used for search engines and social media.

James Turner

Yep! Google uses structured data like Microdata (in combination with other sources) for the Google Knowledge Panel - the panel that sits on the right of search content which displays things like logos and names for organizations through to mapping information, reviews, ratings for businesses or actors etc for movies.

Roel Hogervorst

Do you know of any validator for Microdata on a web page, apart from Google's? I never really found others.

Ravavyr

Hey Roel,
Facebook has a Microdata debugger, but it's mainly there so you can verify content for Facebook, so it's not a general debugger. business.facebook.com/ads/microdat...

Roel Hogervorst

Thanks, I did see that one too! Unfortunately you need a Facebook account for it. I just searched and found linter.structured-data.org

Mark Smith • Edited

Nice article, well written, has lots of useful info.

Do you happen to know if there are any JavaScript tools to extract the Microdata into some kind of data structure?

James Turner

Sorry, I haven't worked with any JS tools to extract Microdata. That said, you might be interested in JSON-LD, a different form of structured data that would be a lot easier to parse in JS. (I'll be writing a post about that soon, as well as a few other types.)

Ali Sherief • Edited

Are there any readily available parsers on GitHub to load an HTML page as a WebPage schema? I'm looking for something with an API that lets me query things like properties and attributes.

James Turner

On GitHub, I haven't come across any that handle multiple different types at once (Microdata, RDFa, JSON-LD). For JSON-LD though, I definitely can recommend Schema.NET if you deal with C# (full disclosure, I'm one of the two main collaborators on the project).

That said, I actually launched a business the other day (no, literally the other day) called BrandVantage, where I run an API that converts multiple types of structured (and unstructured) data to the WebPage schema. brandvantage.co/ (There is a URL test on the homepage you can try.)

It supports Microdata, RDFa, JSON-LD, Open Graph, Twitter Cards, meta tags and a few bits of unstructured data (with more enhancements coming).