Mark McNeece

Posted on Jan 11

Using Schema.org's DefinedTermSet for Industry Terminology: A Case Study

#schemaorg #structureddata #seo #webdev

How we implemented structured data for emerging AI Visibility terminology — and what we learned about making definitions machine-readable.

When we set out to publish the canonical definition of AI Visibility, we faced a poorly documented question: How do you mark up a terminology definition page so that search engines and AI systems treat it as authoritative?

The answer turned out to be Schema.org's DefinedTermSet and DefinedTerm types — but finding practical implementation examples proved surprisingly difficult. So here's what we built, what worked, and what we'd do differently.

The Problem: Defining New Industry Terms

We'd created precise definitions for six interrelated concepts:

AI Visibility (the overarching goal)
AI Visibility Checking (infrastructure validation)
AI Discovery Files (the mechanism — llms.txt, ai.txt, etc.)
AI Visibility Tracking (outcome measurement)
AI Visibility Monitoring (real-time tracking)
AI Retrieval Testing (prompt-based observation)

These terms describe a new domain and don't yet appear in established dictionaries. We wanted search engines and AI systems to understand:

That these are formal definitions, not casual mentions
That they form a controlled vocabulary (a related set)
That each term has a unique code for citation
That there's a canonical source for each definition
That the definitions are versioned and timestamped

Why DefinedTermSet?

Schema.org's DefinedTermSet is designed for exactly this use case. From the spec:

"A set of defined terms, for example a set of categories or a classification scheme, a glossary, dictionary, or enumeration."

The related DefinedTerm type lets you mark up individual terms with properties like:

name — the term itself
description — the formal definition
termCode — a short identifier (we used AV-001 through AV-006)
inDefinedTermSet — links back to the parent set
url — canonical URL for that specific term

This creates a machine-readable knowledge structure, not just a page of text.

The Implementation

Here's the JSON-LD we deployed:

{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "DefinedTermSet",
      "@id": "https://www.365i.co.uk/ai-visibility-definition/#termset",
      "name": "AI Visibility Terminology",
      "description": "A controlled vocabulary for AI Visibility concepts and related terms",
      "url": "https://www.365i.co.uk/ai-visibility-definition/",
      "hasDefinedTerm": [
        {"@id": "https://www.365i.co.uk/ai-visibility-definition/#ai-visibility"},
        {"@id": "https://www.365i.co.uk/ai-visibility-definition/#ai-visibility-checking"},
        {"@id": "https://www.365i.co.uk/ai-visibility-definition/#ai-discovery-files"}
      ]
    },
    {
      "@type": "DefinedTerm",
      "@id": "https://www.365i.co.uk/ai-visibility-definition/#ai-visibility",
      "name": "AI Visibility",
      "description": "The degree to which a website or digital entity can be discovered, correctly interpreted, accurately represented, and safely cited by AI systems including large language models, AI search engines, and retrieval-augmented generation systems.",
      "inDefinedTermSet": {"@id": "https://www.365i.co.uk/ai-visibility-definition/#termset"},
      "termCode": "AV-001",
      "url": "https://www.365i.co.uk/ai-visibility-definition/#definition-ai-visibility",
      "sameAs": "https://www.wikidata.org/wiki/Q137757467"
    }
  ]
}
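The excerpt above lists three of the six terms in hasDefinedTerm and defines only the AV-001 node. One way to keep both directions in sync across the whole vocabulary is to generate the @graph from a single list. This is a hypothetical Python sketch, not the tooling behind the live page. The base URL, set name and AV-001 description come from this article. The other five descriptions are placeholders built from the glosses in the term list, AV-002 to AV-006 assume that list's order, and the `#definition-<slug>` url pattern is extrapolated from the single AV-001 example.

```python
import json
import re

BASE = "https://www.365i.co.uk/ai-visibility-definition/"
SET_ID = BASE + "#termset"

# Names come from the article. Only AI Visibility's description is quoted there;
# the others are placeholders, and AV-002..AV-006 assume the article's list order.
TERMS = [
    ("AI Visibility",
     "The degree to which a website or digital entity can be discovered, correctly "
     "interpreted, accurately represented, and safely cited by AI systems including "
     "large language models, AI search engines, and retrieval-augmented generation systems."),
    ("AI Visibility Checking", "Placeholder: infrastructure validation"),
    ("AI Discovery Files", "Placeholder: the mechanism (llms.txt, ai.txt, etc.)"),
    ("AI Visibility Tracking", "Placeholder: outcome measurement"),
    ("AI Visibility Monitoring", "Placeholder: real-time tracking"),
    ("AI Retrieval Testing", "Placeholder: prompt-based observation"),
]


def slugify(name):
    """'AI Visibility Checking' -> 'ai-visibility-checking' (matches the @id fragments shown)."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def build_graph(terms):
    """Build one DefinedTermSet plus its DefinedTerm nodes from a single list."""
    term_nodes = []
    for number, (name, description) in enumerate(terms, start=1):
        slug = slugify(name)
        term_nodes.append({
            "@type": "DefinedTerm",
            "@id": f"{BASE}#{slug}",
            "name": name,
            "description": description,
            "inDefinedTermSet": {"@id": SET_ID},
            "termCode": f"AV-{number:03d}",
            "url": f"{BASE}#definition-{slug}",  # pattern extrapolated from AV-001
        })
    term_set = {
        "@type": "DefinedTermSet",
        "@id": SET_ID,
        "name": "AI Visibility Terminology",
        "description": "A controlled vocabulary for AI Visibility concepts and related terms",
        "url": BASE,
        # Derived from the same nodes, so the set lists every term exactly once.
        "hasDefinedTerm": [{"@id": node["@id"]} for node in term_nodes],
    }
    return {"@context": "https://schema.org", "@graph": [term_set, *term_nodes]}


if __name__ == "__main__":
    print(json.dumps(build_graph(TERMS), indent=2))
```

The sketch leaves out sameAs because the article gives a Wikidata ID only for AV-001. A per-term sameAs value would be added wherever an item exists.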

Key Implementation Details

1. Using @graph for Multiple Entities
We used @graph to define multiple connected entities in a single JSON-LD block. The DefinedTermSet references each DefinedTerm via hasDefinedTerm, and each term links back via inDefinedTermSet.
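Because the set-to-term and term-to-set links are maintained separately, they can drift apart as terms are added. This minimal Python checker walks a parsed JSON-LD graph and reports links that don't match in both directions. It is a hypothetical helper (`check_links`, `ref_id` are illustrative names), and it assumes the @graph layout used in this article. Running it on the excerpt as printed in this post flags the two terms that are referenced but not defined there. That is expected, because the excerpt is abridged.

```python
BASE = "https://www.365i.co.uk/ai-visibility-definition/"


def ref_id(value):
    """Accept either {"@id": "..."} or a bare IRI string and return the IRI."""
    if isinstance(value, dict):
        return value.get("@id")
    return value


def check_links(graph):
    """List places where hasDefinedTerm and inDefinedTermSet disagree."""
    nodes = graph.get("@graph", [])
    by_id = {node["@id"]: node for node in nodes if "@id" in node}
    problems = []
    for term_set in (n for n in nodes if n.get("@type") == "DefinedTermSet"):
        set_id = term_set["@id"]
        listed = {ref_id(ref) for ref in term_set.get("hasDefinedTerm", [])}
        # Set -> term: every listed term must exist and point back to the set.
        for term_id in sorted(listed):
            term = by_id.get(term_id)
            if term is None:
                problems.append(f"{term_id}: listed in hasDefinedTerm but not defined in @graph")
            elif ref_id(term.get("inDefinedTermSet")) != set_id:
                problems.append(f"{term_id}: inDefinedTermSet does not point back to {set_id}")
        # Term -> set: every term claiming membership must be listed by the set.
        for term in nodes:
            if (term.get("@type") == "DefinedTerm"
                    and ref_id(term.get("inDefinedTermSet")) == set_id
                    and term.get("@id") not in listed):
                problems.append(f"{term.get('@id')}: claims membership but is missing from hasDefinedTerm")
    return problems


# The article's excerpt, reduced to the linking properties.
EXCERPT = {
    "@context": "https://schema.org",
    "@graph": [
        {"@type": "DefinedTermSet", "@id": BASE + "#termset",
         "hasDefinedTerm": [{"@id": BASE + "#ai-visibility"},
                            {"@id": BASE + "#ai-visibility-checking"},
                            {"@id": BASE + "#ai-discovery-files"}]},
        {"@type": "DefinedTerm", "@id": BASE + "#ai-visibility",
         "inDefinedTermSet": {"@id": BASE + "#termset"}, "termCode": "AV-001"},
    ],
}

for problem in check_links(EXCERPT):
    print(problem)
```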

2. Fragment Identifiers for Deep Linking
Each term gets its own @id with a fragment identifier (#ai-visibility, #ai-visibility-checking, etc.). This allows other pages — and other sites — to link directly to specific definitions.
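An @id fragment names a node in the graph. A browser will only jump to a specific definition if the HTML also contains an element whose id matches that fragment. This is a minimal Python sketch that lists the @id fragments with no matching HTML anchor, using only the standard library's `html.parser` and `urllib.parse.urldefrag`. The `PAGE` markup is hypothetical and is not taken from the live page.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class IdCollector(HTMLParser):
    """Collect every id="..." attribute value in an HTML document."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def missing_anchors(term_ids, html):
    """Return the @id fragments that have no element with a matching id."""
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    missing = []
    for iri in term_ids:
        _page, fragment = urldefrag(iri)
        if fragment not in collector.ids:
            missing.append(fragment)
    return missing


BASE = "https://www.365i.co.uk/ai-visibility-definition/"
TERM_IDS = [BASE + "#ai-visibility", BASE + "#ai-visibility-checking", BASE + "#ai-discovery-files"]

# Hypothetical markup for illustration only.
PAGE = """
<section id="ai-visibility"><dfn>AI Visibility</dfn></section>
<section id="ai-visibility-checking"><dfn>AI Visibility Checking</dfn></section>
"""

print(missing_anchors(TERM_IDS, PAGE))  # ['ai-discovery-files']
```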

3. Term Codes for Citation
The termCode property (AV-001, AV-002, etc.) gives each term a short, citable identifier. This is particularly useful for documentation, academic citation, and machine processing.
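Citable codes only help if they are unique and consistently formatted. This minimal Python sketch indexes terms by termCode, rejects malformed or duplicate codes, and resolves codes cited in free text back to their @id. The three-digit `AV-NNN` pattern is inferred from the article's examples. The mapping of AV-002 to AI Visibility Checking is an assumption, and the function names are illustrative.

```python
import re

# Matches the AV-001 style used in the article: "AV-" plus three digits.
TERM_CODE = re.compile(r"AV-\d{3}")

# Hypothetical sample: AV-001 is from the article; mapping AV-002 to
# "AI Visibility Checking" assumes codes follow the article's list order.
TERMS = [
    {"termCode": "AV-001", "@id": "https://www.365i.co.uk/ai-visibility-definition/#ai-visibility"},
    {"termCode": "AV-002", "@id": "https://www.365i.co.uk/ai-visibility-definition/#ai-visibility-checking"},
]


def index_by_code(terms):
    """Map termCode -> @id, rejecting malformed or duplicate codes."""
    index = {}
    for term in terms:
        code = term["termCode"]
        if not TERM_CODE.fullmatch(code):
            raise ValueError(f"malformed termCode: {code!r}")
        if code in index:
            raise ValueError(f"duplicate termCode: {code}")
        index[code] = term["@id"]
    return index


def find_citations(text, index):
    """Return (code, @id) pairs for known term codes cited in free text."""
    found = []
    for match in re.finditer(r"\bAV-\d{3}\b", text):
        code = match.group(0)
        if code in index:
            found.append((code, index[code]))
    return found


index = index_by_code(TERMS)
print(find_citations("See AV-001 and AV-002; AV-999 is not defined.", index))
```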

4. Wikidata sameAs Links
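We created corresponding Wikidata items for each term and linked to them via sameAs. On its own, sameAs only points one way, from our markup to Wikidata. The connection between our definitions and the open knowledge graph becomes bidirectional when each Wikidata item also references the definition page.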
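The link back lives on Wikidata, outside the page's markup. Checking it therefore starts with knowing which items the markup points to. This small Python helper pulls Wikidata item IDs out of a node's sameAs value, whether that value is a single string or a list. It accepts both the `/wiki/Q...` page URL used in the article and the `/entity/Q...` concept URI form. It is an illustrative sketch that makes no network calls. Comparing the result against the Wikidata items themselves is left to whatever process you already use.

```python
import re

# Accept both the page URL (/wiki/Q...) and the concept URI (/entity/Q...) forms.
WIKIDATA_ITEM = re.compile(r"https?://www\.wikidata\.org/(?:wiki|entity)/(Q[1-9][0-9]*)")


def wikidata_qids(term):
    """Return the Wikidata item IDs found in a node's sameAs value(s)."""
    same_as = term.get("sameAs", [])
    if isinstance(same_as, str):
        same_as = [same_as]
    qids = []
    for url in same_as:
        match = WIKIDATA_ITEM.fullmatch(url)
        if match:
            qids.append(match.group(1))
    return qids


term = {
    "@id": "https://www.365i.co.uk/ai-visibility-definition/#ai-visibility",
    "sameAs": "https://www.wikidata.org/wiki/Q137757467",
}
print(wikidata_qids(term))  # ['Q137757467']
```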

5. Combining with Other Schema Types
The same page also includes:

WebPage — for the page itself
TechArticle — to indicate it's a technical specification
Organization — for the publisher
BreadcrumbList — for navigation context
SpeakableSpecification — to identify key content for voice search

What We Learned

DefinedTerm vs CategoryCode
Schema.org also has CategoryCode and CategoryCodeSet, which serve a similar purpose. The distinction:

DefinedTerm is for textual definitions (glossaries, dictionaries)
CategoryCode is for classification codes (industry codes, enumerated values)

For terminology definitions with prose explanations, DefinedTerm is the better fit.

The inDefinedTermSet Issue

During validation, we discovered that inDefinedTermSet belongs only on the individual DefinedTerm items, not on the DefinedTermSet itself. Schema.org defines it as a property of DefinedTerm, and the Schema.org validator flags it as a warning when it appears on the set.
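The same mistake can be caught before the page ever reaches a validator. This is a minimal Python lint, offered as a sketch rather than a replacement for the Schema.org validator. It warns when a DefinedTermSet node carries inDefinedTermSet or termCode, both of which schema.org defines on DefinedTerm. The `bad` graph is a hypothetical example of the mistake and is not the authors' original markup.

```python
# Per schema.org, these properties are defined on DefinedTerm, not DefinedTermSet.
TERM_ONLY_PROPERTIES = {"inDefinedTermSet", "termCode"}


def misplaced_term_properties(graph):
    """Warn about DefinedTerm-only properties placed on a DefinedTermSet node."""
    warnings = []
    for node in graph.get("@graph", []):
        types = node.get("@type")
        types = types if isinstance(types, list) else [types]
        if "DefinedTermSet" not in types:
            continue
        for prop in sorted(node.keys() & TERM_ONLY_PROPERTIES):
            node_id = node.get("@id", "<blank node>")
            warnings.append(f"{node_id}: {prop} belongs on DefinedTerm, not DefinedTermSet")
    return warnings


# Hypothetical example of the mistake.
bad = {"@graph": [{
    "@type": "DefinedTermSet",
    "@id": "https://www.365i.co.uk/ai-visibility-definition/#termset",
    "inDefinedTermSet": {"@id": "https://www.365i.co.uk/ai-visibility-definition/#termset"},
}]}
print(misplaced_term_properties(bad))
```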

HTML Microdata Alignment

We also aligned the HTML with the JSON-LD using itemscope, itemtype, and itemprop attributes on the visible content:

<div class="definition-block" itemscope itemtype="https://schema.org/DefinedTerm">
  <meta itemprop="termCode" content="AV-001">
  <link itemprop="inDefinedTermSet" href="https://www.365i.co.uk/ai-visibility-definition/#termset">

  <h3 class="definition-term" itemprop="name">
    <dfn id="dfn-ai-visibility">AI Visibility</dfn>
  </h3>

  <p class="definition-text" itemprop="description">
    The degree to which a website or digital entity can be discovered...
  </p>
</div>

This creates redundant structured data — JSON-LD and microdata marking up the same content. Google recommends JSON-LD, but the microdata doesn't hurt and may help with semantic HTML parsing.
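When the same term is marked up twice, the two copies can drift apart. This is a rough Python sketch, using only the standard library's `html.parser`, that reads the microdata from a snippet shaped like the one above and reports any property whose value differs from the JSON-LD node. It deliberately handles just one flat itemscope: no nested items and no itemref. Values come from `content` on meta, from `href` on link, and from normalized text content otherwise. It is a simplification of the microdata rules, not a full parser. The class and function names are illustrative. The sample snippet omits the description paragraph, because the article's version is abridged.

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class FlatMicrodataReader(HTMLParser):
    """Reads ONE itemscope with no nested items and no itemref (a simplification)."""

    def __init__(self):
        super().__init__()
        self.itemtype = None
        self.props = {}
        self.stack = []  # open elements as [tag, itemprop or None, text chunks]

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "itemscope" in attributes and self.itemtype is None:
            self.itemtype = attributes.get("itemtype")
        prop = attributes.get("itemprop")
        if tag == "meta":  # value comes from the content attribute
            if prop:
                self.props[prop] = attributes.get("content", "")
            return
        if tag == "link":  # value comes from the href attribute
            if prop:
                self.props[prop] = attributes.get("href", "")
            return
        if tag not in VOID_TAGS:
            self.stack.append([tag, prop, []])

    def handle_data(self, data):
        # Text counts toward every open element that carries an itemprop.
        for _tag, prop, chunks in self.stack:
            if prop:
                chunks.append(data)

    def handle_endtag(self, tag):
        if not any(frame[0] == tag for frame in self.stack):
            return  # ignore stray end tags
        while self.stack:
            open_tag, prop, chunks = self.stack.pop()
            if prop:
                self.props[prop] = " ".join("".join(chunks).split())
            if open_tag == tag:
                break


def compare_with_jsonld(term, html):
    """Return {property: (json_ld_value, microdata_value)} for every disagreement."""
    reader = FlatMicrodataReader()
    reader.feed(html)
    reader.close()
    mismatches = {}
    expected_type = "https://schema.org/" + term["@type"]
    if reader.itemtype != expected_type:
        mismatches["itemtype"] = (expected_type, reader.itemtype)
    for prop, value in reader.props.items():
        expected = term.get(prop)
        if isinstance(expected, dict):  # {"@id": ...} references compare by IRI
            expected = expected.get("@id")
        if expected != value:
            mismatches[prop] = (expected, value)
    return mismatches


TERM = {
    "@type": "DefinedTerm",
    "name": "AI Visibility",
    "termCode": "AV-001",
    "inDefinedTermSet": {"@id": "https://www.365i.co.uk/ai-visibility-definition/#termset"},
}

SNIPPET = """
<div class="definition-block" itemscope itemtype="https://schema.org/DefinedTerm">
  <meta itemprop="termCode" content="AV-001">
  <link itemprop="inDefinedTermSet" href="https://www.365i.co.uk/ai-visibility-definition/#termset">
  <h3 class="definition-term" itemprop="name">
    <dfn id="dfn-ai-visibility">AI Visibility</dfn>
  </h3>
</div>
"""

print(compare_with_jsonld(TERM, SNIPPET))  # {}
```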

The Results

After deployment and indexing:

Google's Rich Results Test successfully parsed the structured data
Schema.org's validator passed with no errors (after fixing the inDefinedTermSet issue)
The Wikidata items are now live and bidirectionally linked
AI systems with web search can now discover and cite the definitions with their term codes

Recommendations for Similar Implementations

If you're defining industry terminology and want it to be machine-readable:

Use DefinedTermSet + DefinedTerm — it's the right tool for glossaries and vocabularies
Assign term codes — short identifiers make citation and referencing easier
Version your definitions — include version on the containing WebPage or TechArticle
Create Wikidata items — cross-link with sameAs to connect to the broader knowledge graph
License permissively — we used CC BY 4.0 and specified it in the schema via license
Provide machine-readable formats — we also published JSON and YAML versions
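Several of these recommendations can be handled together in code. This is a hypothetical Python sketch, not the authors' pipeline. It builds a TechArticle node that carries the schema.org `version` and `license` properties (CC BY 4.0, as above), and it flattens the DefinedTerm nodes into a plain JSON document keyed by termCode. The `#article` fragment, the use of `about` to point at the term set, the `"1.0"` version string and the exported JSON shape are all assumptions. YAML export would need a third-party library, so it is left out.

```python
import json

BASE = "https://www.365i.co.uk/ai-visibility-definition/"
CC_BY_4 = "https://creativecommons.org/licenses/by/4.0/"


def article_node(version):
    """TechArticle node carrying the version and license for the definitions."""
    return {
        "@type": "TechArticle",
        "@id": BASE + "#article",  # hypothetical fragment
        "url": BASE,
        "version": version,
        "license": CC_BY_4,
        "about": {"@id": BASE + "#termset"},  # assumption: link article to the set
    }


def export_terms_json(graph):
    """Flatten DefinedTerm nodes into plain JSON keyed by termCode (hypothetical shape)."""
    terms = {}
    for node in graph.get("@graph", []):
        if node.get("@type") == "DefinedTerm":
            terms[node["termCode"]] = {
                "name": node["name"],
                "description": node.get("description", ""),
                "url": node.get("url", node["@id"]),
            }
    return json.dumps(dict(sorted(terms.items())), indent=2, ensure_ascii=False)


GRAPH = {
    "@context": "https://schema.org",
    "@graph": [
        article_node("1.0"),  # placeholder version, not the live page's value
        {"@type": "DefinedTerm", "@id": BASE + "#ai-visibility",
         "name": "AI Visibility", "termCode": "AV-001",
         "url": BASE + "#definition-ai-visibility"},
    ],
}

print(export_terms_json(GRAPH))
```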

The full implementation is live at https://www.365i.co.uk/ai-visibility-definition/.

Mark McNeece is founder of 365i, an AI site identity service that helps businesses communicate accurately with AI systems. The company publishes the reference implementation of AI Visibility Checking.

Tags: schema.org, structured-data, json-ld, seo, ai-visibility, definedterm, vocabulary, web-standards

DEV Community