Microdata is an HTML standard, created by the WHATWG, for describing rich metadata in web pages. Search engines and other computer systems can use this metadata to better understand the content of the web page.
Microdata is made up of a number of attributes including itemscope, itemprop and itemtype. Below is an example of a basic web page using Microdata.
<html>
  <head>
    <title>What is Microdata?</title>
  </head>
  <body itemscope itemtype="https://schema.org/WebPage">
    <article itemscope itemtype="http://schema.org/Article" itemprop="mainEntity">
      <meta itemprop="url" content="https://brandvantage.co/blog/what-is-microdata">
      <meta itemprop="image" content="https://brandvantage.co/blog/2020/images/what-is-microdata-cover.png">
      <h1 itemprop="name headline">What is Microdata?</h1>
      <time itemprop="datePublished" datetime="2020-09-20">20th of September, 2020</time>
      <div itemprop="articleBody">
        Hello and welcome to this example!
      </div>
    </article>
  </body>
</html>
The "itemprop" Attribute
The itemprop attribute defines a name-value pair for data. The value associated with the property can be derived from:
- The inner text content of the tag
- The content attribute (if defined)
- The src attribute (for img, audio, video, iframe etc. tags)
- The href attribute (for link or a tags)
- The value attribute (for data or meter tags)
- The datetime attribute (for time tags)
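The main example on this page does not use the src, href or value sources, so here is a minimal sketch of them. The Product type and its url, image, sku and name properties come from Schema.org, while the example.com URLs and the SKU value are made up for illustration.

```html
<div itemscope itemtype="https://schema.org/Product">
  <!-- "url" takes its value from the href attribute, not the link text -->
  <a itemprop="url" href="https://example.com/products/widget">View the widget</a>
  <!-- "image" takes its value from the src attribute -->
  <img itemprop="image" src="https://example.com/images/widget.png" alt="A widget">
  <!-- "sku" takes its value from the value attribute, not the visible text -->
  <data itemprop="sku" value="W-1001">Widget, standard edition</data>
  <!-- "name" has no special attribute, so the inner text is used -->
  <span itemprop="name">Widget</span>
</div>
```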
An item property can also contain a group of name-value pairs through the use of the itemscope attribute. Additionally, the value of the itemprop attribute may list multiple property names, separated by spaces.
You can see a number of different uses of itemprop in the previous example.
Grouping Properties Together
<article itemscope itemtype="http://schema.org/Article" itemprop="mainEntity">
Because the itemscope attribute is used here, the "mainEntity" property groups other name-value pairs together.
Value from a "content" Attribute
<meta itemprop="url" content="https://brandvantage.co/blog/what-is-microdata">
<meta itemprop="image" content="https://brandvantage.co/blog/2020/images/what-is-microdata-cover.png">
Here we have two properties ("url" and "image") with their values defined from the content attribute.
Setting Two Properties at Once
<h1 itemprop="name headline">What is Microdata?</h1>
The itemprop here is setting two properties at once ("name" and "headline") with the value from the inner text.
Value from a "datetime" Attribute
<time itemprop="datePublished" datetime="2020-09-20">20th of September, 2020</time>
The "datePublished" property uses the time tag with the datetime attribute. The date is formatted as an ISO 8601 date.
Value from Inner Text
<div itemprop="articleBody">
Hello and welcome to this example!
</div>
The "articleBody" property uses the inner content for its value.
The "itemtype" Attribute
While itemprop sets the values of properties, those values are of limited use unless a consumer knows what properties to expect. This is where itemtype comes in, giving context to the properties used through a URL which identifies the vocabulary.
One common vocabulary to use is Schema.org, a joint venture between Google, Microsoft, Yahoo and Yandex. A shared vocabulary like Schema.org makes interoperability easier for third parties who want or need to use the data. That said, there is nothing stopping you from using other vocabularies or even making your own. Keep in mind that if a third party doesn't understand your vocabulary, your metadata may not be processed and used.
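As a rough sketch of what a custom vocabulary could look like, the snippet below uses a made-up type URL. The example.com URL and the "title", "difficulty" and "prepMinutes" property names are hypothetical and not part of any published vocabulary, so a consumer would need to know about them in advance to make use of the data.

```html
<!-- Hypothetical vocabulary: the type URL and property names are invented for this sketch -->
<div itemscope itemtype="https://example.com/vocab/Recipe">
  <h2 itemprop="title">Pancakes</h2>
  <span itemprop="difficulty">Easy</span>
  <!-- The machine-readable value (15) comes from the value attribute -->
  <data itemprop="prepMinutes" value="15">15 minutes</data>
</div>
```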
In the previous example, there were two uses of itemtype:
<body itemscope itemtype="https://schema.org/WebPage">
<article itemscope itemtype="http://schema.org/Article" itemprop="mainEntity">
The first defines the scope as a WebPage schema object, which has a number of useful properties including "breadcrumb" and "significantLink". In the Schema.org vocabulary, a WebPage extends CreativeWork, which has a property "mainEntity". This property can be any Thing, the base type in the Schema.org vocabulary.
For the "mainEntity" property, we define the type of the scope to be an Article schema object. This means that the properties inside the article tag (unless they sit inside a further nested scope) are properties of our Article.
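To illustrate how scoping assigns properties to items, here is a hedged sketch of an Article with a nested author. The Person type and the author property are from Schema.org, and "Jane Doe" is a placeholder value. A property belongs to the nearest enclosing itemscope, so the inner "name" describes the Person rather than the Article.

```html
<article itemscope itemtype="https://schema.org/Article">
  <!-- This "name" belongs to the Article -->
  <h1 itemprop="name">What is Microdata?</h1>
  <!-- A new itemscope starts a nested item, used as the value of "author" -->
  <div itemprop="author" itemscope itemtype="https://schema.org/Person">
    <!-- This "name" belongs to the Person, not the Article -->
    <span itemprop="name">Jane Doe</span>
  </div>
</article>
```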
The "itemscope" Attribute
The itemscope attribute defines a group (scope) of name-value pairs (properties), effectively treating the property value as an object. It doesn't require the presence of an itemtype; however, without one there is no common vocabulary, so interoperability between systems can be very limited.
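A minimal sketch of an untyped item follows. It relies only on itemscope and itemprop, and the property names and values are arbitrary placeholders since there is no vocabulary to define them.

```html
<!-- No itemtype: still a valid item, but consumers have no vocabulary to interpret it -->
<div itemscope>
  <p>My name is <span itemprop="name">Elizabeth</span>.</p>
  <p>I live in <span itemprop="city">Brisbane</span>.</p>
</div>
```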
Summary
Microdata allows website developers to enrich a webpage with rich metadata, so that search engines and other systems can integrate the data into their own. You can define individual properties with different values, including nesting values through scopes.
While there are a number of other aspects of Microdata, including additional attributes, this should serve as a basic introduction to the world of Microdata.
Top comments (9)
For anyone curious what this microdata is for and what it does:
Google uses it for tagging page content, so it helps with Google search results when a page is indexed. It can definitely help your page rankings, and Google's console shows you the errors to help you understand what needs to be fixed too.
Facebook and Twitter [I figure other social media sites too] also use this to grab the main image you see when you share a URL, and the description text for it as well.
Various other tools use it to clarify what information is present on the page, but it's largely used for search engines and social media.
Yep! Google uses structured data like Microdata (in combination with other sources) for the Google Knowledge Panel - the panel that sits on the right of search content which displays things like logos and names for organizations through to mapping information, reviews, ratings for businesses or actors etc for movies.
Do you have any validator for microdata on a webpage, apart from Google's? I never really found others
Hey Roel,
Facebook has a microdata debugger, but it's mainly so you can verify content for facebook, so it's not a general debugger. business.facebook.com/ads/microdat...
Thanks, I did see that one too, thanks! Unfortunately you'll need a Facebook account for that one. I just searched and found linter.structured-data.org
Nice article, well written, has lots of useful info.
Do you happen to know if there are any JavaScript tools to extract the microdata into some kind of data structure?
Sorry, I haven't worked with any JS tools to extract microdata. That said, you might be interested in JSON-LD though, a different form of structured data that would be a lot easier to parse in JS. (I'll be writing a post about that soon, as well as a few other types)
Are there any readily available parsers on GitHub to load an HTML page as a WebPage schema? I'm looking for something with an API that lets me query things like properties and attributes.
On GitHub, I haven't come across any that handle multiple different types at once (Microdata, RDFa, JSON-LD). For JSON-LD though, I definitely can recommend Schema.NET if you deal with C# (full disclosure, I'm one of the two main collaborators on the project).
That said, I actually launched a business the other day (no, literally the other day) called BrandVantage where I run an API which converts multiple different types of structured (and unstructured) data to WebPage schema. brandvantage.co/ (There is a URL test on the homepage you can try).
It supports Microdata, RDFa, JSON-LD, Open Graph, Twitter Cards, meta tags and a few bits of unstructured data (with more enhancements coming).