DEV Community

Mahmut Gündüzalp

Schema.org NewsArticle: A Complete Implementation Guide for Google News in 2026

Most news sites that fail to get into Google News don't fail because of their content. They fail because their structured data is wrong, incomplete, or missing — and nobody told them, because the failure is silent. No error, no email, just no traffic.

This is a field guide to getting NewsArticle structured data right. It comes from running it across 200+ production news portals over the last 18 months at Alesta WEB, where a single malformed datePublished field can quietly drop a story out of the news index for a publisher who has no idea why.

I'll cover every field that matters, the publisher markup that ties it together, the news sitemap's brutal 48-hour window, how AMP and canonical interact in 2026, IndexNow for instant Bing/Yandex pickup, and the validation pipeline we run before anything ships.

1. Why Structured Data Matters Beyond Google

It's tempting to think of NewsArticle JSON-LD as "the thing Google wants." It is, but that framing undersells it.

Structured data is now the machine-readable contract for your content across the entire discovery layer: Google News and Top Stories, Bing News, the knowledge graphs that feed voice assistants, and, increasingly, the LLM-based tools that summarize current events. A system answering "what happened in city X today" has a much easier job with articles that are cleanly typed, dated, and attributed. Ambiguous HTML doesn't get parsed reliably; clean JSON-LD does.

So the payoff isn't one channel. Getting NewsArticle right is the cheapest single thing you can do to make a story legible to every automated consumer at once.

2. NewsArticle: Every Field That Matters

Here is a complete, valid NewsArticle block. I'll annotate the fields that people get wrong.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://example.com/news/city-council-approves-budget"
  },
  "headline": "City Council Approves 2026 Budget After Three-Hour Debate",
  "image": [
    "https://example.com/img/budget-16x9.jpg",
    "https://example.com/img/budget-4x3.jpg",
    "https://example.com/img/budget-1x1.jpg"
  ],
  "datePublished": "2026-06-01T08:30:00+03:00",
  "dateModified": "2026-06-01T09:15:00+03:00",
  "author": {
    "@type": "Person",
    "name": "Ayşe Yılmaz",
    "url": "https://example.com/author/ayse-yilmaz"
  },
  "publisher": {
    "@type": "NewsMediaOrganization",
    "name": "Example Daily",
    "logo": {
      "@type": "ImageObject",
      "url": "https://example.com/logo-600x60.png",
      "width": 600,
      "height": 60
    }
  },
  "description": "The council passed the budget 7-4 after debate over transit funding.",
  "articleSection": "Local",
  "inLanguage": "en"
}
</script>
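If you generate this block from code instead of a hand-edited template, serialize it with a real JSON encoder. Here is a minimal Python sketch (the function name `render_json_ld` is hypothetical). `json.dumps` never emits the trailing commas mentioned in the validation checklist, and escaping `<` keeps a headline that happens to contain `</script>` from closing the element early.

```python
import json


def render_json_ld(data: dict) -> str:
    """Serialize a schema.org dict into an ld+json <script> element.

    json.dumps never emits trailing commas, so the output always parses.
    Escaping "<" as \\u003c stops a string containing "</script>" from
    closing the element early; JSON parsers decode it back to "<".
    """
    # ensure_ascii=False keeps names such as "Ayşe Yılmaz" readable in the source.
    body = json.dumps(data, ensure_ascii=False, indent=2).replace("<", "\\u003c")
    return '<script type="application/ld+json">\n' + body + "\n</script>"
```

The `\u003c` escape is only ever needed inside JSON strings, because `<` cannot appear anywhere else in valid JSON.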

The fields people break, in order of how often I see them broken:

datePublished without a timezone. This is the number one cause of silent failure. "2026-06-01T08:30:00" is ambiguous. Google may interpret it as UTC, your server may mean local time, and the gap can push a story outside the freshness window or make it look hours old at publication. Always include the offset: +03:00, Z, whatever is correct — but never omit it.
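If your CMS stores naive local times, attach the offset once, at serialization time, instead of hoping every consumer guesses the same zone. A minimal Python sketch, assuming a hypothetical newsroom that publishes in a fixed UTC+03:00 zone:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical newsroom setting: a fixed UTC+03:00 offset. A fixed offset is
# only correct for zones without daylight saving time; otherwise use
# zoneinfo.ZoneInfo("Area/City") instead.
NEWSROOM_TZ = timezone(timedelta(hours=3))


def to_iso_with_offset(cms_value: str, tz: timezone = NEWSROOM_TZ) -> str:
    """Return an ISO-8601 timestamp that always carries a UTC offset."""
    dt = datetime.fromisoformat(cms_value)  # accepts "T" or " " as the separator
    if dt.tzinfo is None:
        # Naive value: interpret it as newsroom local time rather than UTC.
        dt = dt.replace(tzinfo=tz)
    return dt.isoformat(timespec="seconds")
```

Values that already carry an offset pass through unchanged; only naive ones get the newsroom zone.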

dateModified that goes backwards, or that moves when nothing else does. If you genuinely edit an article, update dateModified, and never set it earlier than its previous value. But don't fake freshness by bumping it on every page load or re-save: Google's guidance is to change the date only when the content meaningfully changes, and a modification date that moves without any visible edit makes all of your dates less trustworthy. Set it when the content actually changes.
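One way to enforce this is to fingerprint the editorial content and move dateModified only when the fingerprint changes. A sketch in Python; the record shape (`fingerprint`, `dateModified`) and the choice of headline plus body as "content" are assumptions you would adapt to your CMS:

```python
import hashlib
from datetime import datetime


def content_fingerprint(headline: str, body: str) -> str:
    """Hash of the fields that count as editorial content (an assumption)."""
    return hashlib.sha256(f"{headline}\n{body}".encode("utf-8")).hexdigest()


def next_date_modified(stored: dict, headline: str, body: str, now: datetime) -> dict:
    """Return the article's tracking record, bumping dateModified only on real edits.

    `stored` is a hypothetical per-article record:
    {"fingerprint": <sha256 hex>, "dateModified": <ISO-8601 with offset>}.
    """
    fingerprint = content_fingerprint(headline, body)
    if fingerprint == stored["fingerprint"]:
        return stored  # re-render or no-op save: keep the existing date
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    previous = datetime.fromisoformat(stored["dateModified"])
    newest = max(previous, now)  # never let dateModified go backwards
    return {"fingerprint": fingerprint, "dateModified": newest.isoformat(timespec="seconds")}
```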

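headline that runs too long. Google truncates long headlines on some devices. The 110-character cap that is still widely quoted came from Google's older article guidelines; the current Article docs only advise a concise headline because long ones may be cut off. It remains a sensible ceiling, and our pipeline enforces it as a hard rule.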

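image with a single small image. Provide multiple aspect ratios (16x9, 4x3, 1x1). Google's Article docs recommend high-resolution images of at least 50K pixels (width × height), and its guidance for large image previews in Discover asks for images at least 1200px wide, so we use 1200px as the minimum width for every ratio. A 600px-wide thumbnail rules out large-image treatment.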
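To check the actual asset rather than trusting a URL or a CMS field, you can read the dimensions straight from the file header. This Python sketch handles PNG only (JPEG and WebP need their own header parsing or an imaging library) and uses the 1200px floor and Google's 50K-pixel recommendation as thresholds; the helper names are hypothetical:

```python
import math
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> tuple[int, int]:
    """Return (width, height) from a PNG's IHDR chunk without decoding pixels."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", then
    # width and height as big-endian unsigned 32-bit integers.
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG with a leading IHDR chunk")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def aspect(width: int, height: int) -> str:
    """Reduce dimensions to a ratio label such as "16x9"."""
    g = math.gcd(width, height)
    return f"{width // g}x{height // g}"


def image_problems(width: int, height: int, min_width: int = 1200) -> list[str]:
    """List threshold violations for one image (empty list = pass)."""
    problems = []
    if width < min_width:
        problems.append(f"width {width}px < {min_width}px")
    if width * height < 50_000:
        problems.append("fewer than 50K pixels")
    return problems
```

Run it on each image listed in `image` and require the ratio set {"16x9", "4x3", "1x1"}. Exact labels only come out of exact crops, for example 1200x675 for 16x9.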

author.url missing. An author object with just a name is weak. Give every author a real, crawlable profile page and link it via url. This is also an E-E-A-T signal — the author needs to be a verifiable entity, not a string.

3. NewsMediaOrganization: The Publisher Half

The publisher inside each article should be a NewsMediaOrganization, and that organization should also exist as a standalone entity on your home page or a dedicated /about page. The two reinforce each other.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "NewsMediaOrganization",
  "name": "Example Daily",
  "url": "https://example.com",
  "logo": {
    "@type": "ImageObject",
    "url": "https://example.com/logo-600x60.png",
    "width": 600,
    "height": 60
  },
  "sameAs": [
    "https://twitter.com/exampledaily",
    "https://www.facebook.com/exampledaily"
  ],
  "diversityPolicy": "https://example.com/diversity-policy",
  "ethicsPolicy": "https://example.com/ethics-policy",
  "masthead": "https://example.com/masthead"
}
</script>

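The logo constraints trip people up, partly because they have moved. The raster 600x60 limit used in these examples (PNG/JPG, no wider than 600px, no taller than 60px) comes from Google's older AMP-era article guidelines, while Google's current Organization markup documentation asks for a logo of at least 112x112px in a format Google Images supports. Check the current docs for the surface you target. The ethicsPolicy, diversityPolicy, and masthead properties are optional, but they are genuine trust signals for news specifically: real pages behind them back up the transparency about ownership and editorial standards that Google News content policies ask for.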

One rule we enforce in production: the publisher name and logo must be byte-identical across every article and the organization entity. Inconsistency here — "Example Daily" in one place, "ExampleDaily" in another — is read as two different publishers.
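Comparing raw bytes is brittle because key order and whitespace differ between templates, so a check like this one compares the parsed name and logo values after a stable re-serialization. It is a sketch that assumes both blocks are already parsed into Python dicts; the function names are illustrative:

```python
import json


def canonical_json(value) -> str:
    # Sorted keys and fixed separators: equal objects always serialize identically.
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def publisher_mismatches(article: dict, organization: dict) -> list[str]:
    """Compare an article's publisher name/logo with the standalone organization."""
    publisher = article.get("publisher", {})
    problems = []
    for key in ("name", "logo"):
        if canonical_json(publisher.get(key)) != canonical_json(organization.get(key)):
            problems.append(f"publisher.{key} differs from organization.{key}")
    return problems
```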

4. Sitemap-news.xml: The 48-Hour Window

A news sitemap is not a regular sitemap. It only lists articles published in the last 48 hours, and it carries extra <news:news> metadata.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>https://example.com/news/city-council-approves-budget</loc>
    <news:news>
      <news:publication>
        <news:name>Example Daily</news:name>
        <news:language>en</news:language>
      </news:publication>
      <news:publication_date>2026-06-01T08:30:00+03:00</news:publication_date>
      <news:title>City Council Approves 2026 Budget After Three-Hour Debate</news:title>
    </news:news>
  </url>
</urlset>

Two things make or break this:

Drop articles older than 48 hours. Google asks that a news sitemap list only articles published in the last two days; stale entries are noise at best, and we treat them as a quality problem. The sitemap must be generated dynamically and prune itself. We regenerate ours on publish and on a short cron, never as a static file.

news:publication_date must match datePublished. Same timezone, same value. If your JSON-LD says one time and your sitemap says another, you've told Google two contradictory things about the same article.
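Both rules fall out of rendering the sitemap from the same article records that feed the JSON-LD. A minimal Python sketch using only the standard library; the article dict shape and the publication defaults are assumptions:

```python
from datetime import datetime, timedelta
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"
ET.register_namespace("", SITEMAP_NS)   # default xmlns
ET.register_namespace("news", NEWS_NS)  # news: prefix


def build_news_sitemap(articles, now, publication="Example Daily", language="en"):
    """Render sitemap-news.xml containing only articles from the last 48 hours.

    Each article is a dict with "url", "title" and "datePublished", where
    datePublished is the same offset-bearing string used in the JSON-LD.
    `now` must be timezone-aware.
    """
    cutoff = now - timedelta(hours=48)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for art in articles:
        if datetime.fromisoformat(art["datePublished"]) < cutoff:
            continue  # prune anything outside the 48-hour window
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = art["url"]
        news = ET.SubElement(url, f"{{{NEWS_NS}}}news")
        pub = ET.SubElement(news, f"{{{NEWS_NS}}}publication")
        ET.SubElement(pub, f"{{{NEWS_NS}}}name").text = publication
        ET.SubElement(pub, f"{{{NEWS_NS}}}language").text = language
        # Copy the JSON-LD string verbatim so the two dates cannot disagree.
        ET.SubElement(news, f"{{{NEWS_NS}}}publication_date").text = art["datePublished"]
        ET.SubElement(news, f"{{{NEWS_NS}}}title").text = art["title"]
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Calling this from the publish hook and from the cron job gives you the self-pruning behavior without a static file.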

5. AMP vs Canonical in 2026

This used to be a real decision. In 2026 it mostly isn't.

Google dropped the AMP requirement for Top Stories in 2021 and made page experience, including Core Web Vitals, a ranking signal there instead. If your canonical pages are fast (good LCP, low CLS, responsive), you do not need AMP to appear in Top Stories. We removed AMP from most sites and saw no ranking loss, and we deleted an entire parallel rendering path along with its bugs.

The honest guidance:

  • Default: ship fast canonical HTML, no AMP. One source of truth, less to maintain.
  • Keep AMP only if you have a specific downstream consumer that still requires it, or your canonical pages genuinely can't hit good Core Web Vitals and you can't fix the root cause.

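If you do serve both, link them in both directions: the canonical page declares its AMP version with rel="amphtml" (and may point to itself with rel="canonical"), and the AMP page points back to the canonical URL with rel="canonical". Getting that backwards, for example pointing the canonical page's rel="canonical" at the AMP URL, is a common way to deindex your real pages.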
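The pairing is easy to check automatically. A Python sketch using the standard library's `html.parser`; it assumes you have already fetched both pages' HTML, and the function names are hypothetical:

```python
from html.parser import HTMLParser


class LinkRelCollector(HTMLParser):
    """Collect the first href for each rel value among <link> elements."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> also lands here via handle_startendtag.
        if tag == "link":
            attr = dict(attrs)
            for rel in (attr.get("rel") or "").lower().split():
                self.links.setdefault(rel, attr.get("href"))


def links_of(html: str) -> dict:
    parser = LinkRelCollector()
    parser.feed(html)
    return parser.links


def amp_pair_problems(canonical_url, canonical_html, amp_url, amp_html):
    canon, amp = links_of(canonical_html), links_of(amp_html)
    problems = []
    if canon.get("amphtml") != amp_url:
        problems.append("canonical page does not link to the AMP page via rel=amphtml")
    if amp.get("canonical") != canonical_url:
        problems.append("AMP page does not point back via rel=canonical")
    if canon.get("canonical") not in (None, canonical_url):
        problems.append("canonical page's rel=canonical points elsewhere")
    return problems
```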

6. IndexNow: Instant Pickup on Bing and Yandex

Google still crawls on its own schedule. Bing and Yandex, however, accept a push: IndexNow lets you notify them the instant an article goes live, instead of waiting for a crawl.

The setup is trivial. Host a key file at your root, then notify IndexNow on publish. A single URL can be sent as a GET request:

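# 1. Host the key at https://example.com/<key>.txt, containing just the key
# 2. On every publish, ping. -G sends the --data-urlencode values as a
#    URL-encoded GET query string:
curl -G "https://api.indexnow.org/indexnow" \
  --data-urlencode "url=https://example.com/news/city-council-approves-budget" \
  --data-urlencode "key=<key>"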

Or POST a batch of URLs as JSON to the same endpoint (Content-Type: application/json; charset=utf-8):

{
  "host": "example.com",
  "key": "your-key-here",
  "urlList": [
    "https://example.com/news/article-1",
    "https://example.com/news/article-2"
  ]
}

For a news site where being first matters, the minutes you save on Bing/Yandex indexing are real. We wire IndexNow into the same publish hook that regenerates the news sitemap — one event, both actions.
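For the publish hook itself, here is a minimal Python sketch that builds the batch request with `urllib.request`. Sending it is left to the caller so retries and timeouts fit your own job runner. The single-host check is an assumption based on the key file proving ownership of one host; the function name is hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlsplit

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_request(key: str, urls: list[str]) -> urllib.request.Request:
    """Build (but don't send) an IndexNow batch POST for URLs on one host."""
    hosts = {urlsplit(u).hostname for u in urls}
    if len(hosts) != 1:
        raise ValueError("batch must contain URLs from exactly one host")
    payload = {"host": hosts.pop(), "key": key, "urlList": urls}
    return urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )

# In the publish hook, send it with a timeout and your own error handling:
#   urllib.request.urlopen(build_indexnow_request(key, urls), timeout=10)
```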

7. A Validation Pipeline That Catches Errors Before Deploy

Hand-checking structured data doesn't scale past a few articles. Across hundreds of sites it has to be automated, and it has to run before content reaches users.

What our pipeline checks on every article render in staging:

  1. JSON-LD parses. A trailing comma silently disables the whole block. Parse it as JSON; fail the build if it throws.
  2. Core fields present. Google's Article docs list these as recommended rather than required, but we treat headline, image, datePublished, author, and publisher as mandatory: assert each exists and is non-empty.
  3. datePublished has a timezone offset. Regex-reject any ISO timestamp without Z or ±HH:MM.
  4. headline ≤ 110 characters.
  5. image width ≥ 1200px (check the actual asset, not just the URL).
  6. Publisher name/logo match the canonical organization entity.
  7. Sitemap date == JSON-LD date for the same URL.

A minimal version of check 3, the highest-value one:

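<?php
declare(strict_types=1);

// Reject ISO-8601 timestamps that end in neither "Z" nor a ±HH:MM offset.
// The D modifier makes "$" match only at the very end (not before a trailing "\n").
function assertHasTimezone(string $iso): void
{
    if (!preg_match('/(Z|[+\-]\d{2}:\d{2})$/D', $iso)) {
        throw new \RuntimeException("datePublished missing timezone: $iso");
    }
}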
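The other structural checks fit in one function. A Python sketch covering checks 1, 2, 4, and 7, plus check 3 again mirroring the PHP regex above, run on the text of a single ld+json block. The function name and problem strings are hypothetical, and a production version would also need to handle `@graph` arrays:

```python
import json
import re
from typing import Optional

REQUIRED = ("headline", "image", "datePublished", "author", "publisher")
TZ_SUFFIX = re.compile(r"(Z|[+-]\d{2}:\d{2})\Z")  # same rule as the PHP check


def validate_news_article(raw: str, sitemap_date: Optional[str] = None) -> list[str]:
    """Return a list of problems for one ld+json block (empty list = pass)."""
    try:
        data = json.loads(raw)  # check 1: a trailing comma raises here
    except json.JSONDecodeError as exc:
        return [f"JSON-LD does not parse: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON-LD is not a single object"]
    # check 2: every core field present and non-empty
    problems = [f"missing or empty: {field}" for field in REQUIRED if not data.get(field)]
    published = data.get("datePublished", "")
    if published and not TZ_SUFFIX.search(published):  # check 3
        problems.append(f"datePublished has no timezone offset: {published}")
    if len(data.get("headline", "")) > 110:  # check 4
        problems.append("headline longer than 110 characters")
    if sitemap_date is not None and sitemap_date != published:  # check 7
        problems.append("sitemap publication_date differs from datePublished")
    return problems
```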

Beyond the build, run Google's Rich Results Test and the Schema Markup Validator (validator.schema.org) on a sample of live URLs weekly. The build catches structural errors; the external validators catch rule changes on Google's side that your own checks don't know about yet.

8. Passing vs Failing: A Side-by-Side

Failing markup — and why:

{
  "@type": "NewsArticle",
  "headline": "City Council Approves The 2026 Municipal Budget After A Long And Contentious Three-Hour Public Debate Session At City Hall",
  "datePublished": "2026-06-01 08:30:00",
  "author": "Ayşe Yılmaz",
  "image": "https://example.com/thumb.jpg"
}

Four problems: headline over 110 chars, datePublished with no timezone and a space instead of T, author as a bare string instead of a Person object with a URL, and a single thumbnail-sized image. Each one individually can keep this out of Top Stories.

Passing markup is the full block from section 2: typed author with a profile URL, ISO-8601 date with offset, headline under the limit, and multiple large images. The difference between these two blocks is the difference between being indexed and being invisible.

Wrapping Up

NewsArticle structured data isn't glamorous, but for a news publisher it's the highest-leverage SEO work there is. The content is yours to write; the markup is what makes machines trust it.

Get the five required fields right, give every date a timezone, keep your news sitemap pruned to 48 hours, push to IndexNow on publish, and validate before you deploy. Do that consistently and the silent failures stop being silent — they stop happening.

If you run a single site, do this by hand once and template it. If you run many, build the validation pipeline first. We learned the hard way that across 200+ portals, the cost of one wrong datePublished format multiplied by every article is a traffic problem you'll spend weeks tracing back to one missing +03:00.
