DEV Community

Barack Okaka Obama

Posted on • Originally published at bigtowa.gumroad.com

Advanced SEO: Everything I Learnt About Anchor Text Optimization from Google Content Warehouse API Docs

In May 2024, the SEO industry faced its "Library of Alexandria" moment. A massive repository of Google’s internal API documentation—specifically the Content Warehouse API—was leaked.

For years, SEOs have operated on correlation studies and "best practices." We guessed that surrounding text mattered. We theorized that footer links were devalued. But we never knew.

The leaked documentation changed that. It described the protocol buffer ("proto") messages behind the API: the actual data structures Google uses to store and process the web.

I spent weeks digging through the Anchors and DocInfo modules of the leak. Here is everything I learned about how Google stores and evaluates anchor text, and how you should optimize for it in 2025.


1. The Anatomy of an Anchor: It’s Not Just a String

In HTML, a link is simple: <a href="/page">Anchor Text</a>.

In Google’s Content Warehouse, a link is a complex data object containing dozens of signals. The leak shows that Google stores each anchor as a structured record, part of what SEOs often call the Anchor Graph. Crucially, an anchor is defined not just by its text but by its environment.

The Anchor Proto Definition

The real definitions are extensive. Simplified for clarity, the core fields of an anchor object look something like this (an illustrative sketch, not a verbatim copy of the leaked definitions):

```proto
syntax = "proto2";

message Anchor {
  optional string text = 1;               // The visible anchor text
  optional string full_left_context = 2;  // Text immediately preceding the link
  optional string full_right_context = 3; // Text immediately following the link
  optional float quality_weight = 4;      // The calculated value of this link
  optional bool is_local = 5;             // true = internal link, false = external
  optional bool is_spam = 6;              // Whether the anchor is flagged as spam
  optional int32 offset = 7;              // Position of the anchor in the document
}
```

What This Means for SEO

Google doesn't just read your anchor text. It also records where the link sits in the document: its offset.

  • The Finding: The API tracks the position of the link in the document (often referenced as offset or term_location).
  • The Strategy: Links in the "Main Content" (MC) block sit at different offsets than links in the footer or sidebar. If you bury your "money links" at the very bottom of the page's code (the footer), Google may use that offset to classify them as "supplemental" or "redundant."
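To see your own pages through this lens, here is a minimal Python sketch that records each link's character offset in a page's text. It assumes "offset" means "characters of text before the anchor"; the leak does not define the unit, and AnchorOffsetParser and anchor_positions are hypothetical helper names, not Google code.

```python
from html.parser import HTMLParser


class AnchorOffsetParser(HTMLParser):
    """Collects each <a> element's text, href, and character offset in the page text.

    Simplification: script/style contents are not excluded from the character count.
    """

    def __init__(self):
        super().__init__()
        self.text_len = 0     # characters of text seen so far
        self.anchors = []     # (anchor_text, href, offset) in document order
        self._open = None     # (href, offset, chunks) while inside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._open = (dict(attrs).get("href"), self.text_len, [])

    def handle_endtag(self, tag):
        if tag == "a" and self._open is not None:
            href, offset, chunks = self._open
            self.anchors.append(("".join(chunks).strip(), href, offset))
            self._open = None

    def handle_data(self, data):
        if self._open is not None:
            self._open[2].append(data)  # text that belongs to the current link
        self.text_len += len(data)


def anchor_positions(html):
    """Return (text, href, offset, relative_position) for every anchor in html."""
    parser = AnchorOffsetParser()
    parser.feed(html)
    parser.close()
    total = max(parser.text_len, 1)  # avoid division by zero on empty pages
    return [(text, href, off, off / total) for text, href, off in parser.anchors]


page = (
    "<main><p>Our tests show the <a href='/pegasus'>Nike Pegasus 40</a> lasts longest.</p></main>"
    "<footer><a href='/services'>Services</a></footer>"
)
for text, href, off, rel in anchor_positions(page):
    print(f"{text!r} -> {href}: offset {off}, {rel:.0%} of the way down")
```

The footer link lands near the end of the text, while the in-content link sits about a third of the way down.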

2. The "Context Window" is Real (and Stored)

One of the most vindicating discoveries in the leak was the explicit existence of context fields. SEOs have long debated: "Does the text near the link help?"

The docs strongly suggest YES: the fields exist, even though their exact weighting isn't documented.

The Fields: fullLeftContext and fullRightContext

The API docs explicitly list fields for capturing the text to the left and right of the anchor. This indicates that Google stores a Context Window around each hyperlink it records.

Why does Google do this?
Imagine two links with the anchor text "click here."

Context A: "To download our 2025 SEO Guide, [click here]."

Context B: "To claim your lottery winnings, [click here]."

Without context, both anchors are identical. With context, Google can infer that one is informational and the other is likely spam.
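A context window is easy to approximate. The sketch below assumes anchors are marked with [square brackets], as in the two examples above, and keeps k words on each side (k=5 is an arbitrary default, not a documented value). context_windows is a hypothetical helper.

```python
import re

# Anchors are marked as [anchor text], matching the examples in this section.
ANCHOR = re.compile(r"\[([^\]]+)\]")


def context_windows(sentence, k=5):
    """Return (left_context, anchor_text, right_context) for each bracketed anchor,
    keeping at most k whitespace-separated words on each side."""
    windows = []
    for m in ANCHOR.finditer(sentence):
        left = sentence[:m.start()].split()
        right = sentence[m.end():].split()
        left = left[max(len(left) - k, 0):]  # last k words (a -0 slice would keep everything)
        windows.append((" ".join(left), m.group(1), " ".join(right[:k])))
    return windows


for s in ("To download our 2025 SEO Guide, [click here].",
          "To claim your lottery winnings, [click here]."):
    print(context_windows(s))
```

The anchor text is identical in both results; only the left context separates the guide download from the lottery pitch.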

The Strategy: "Sentence Embedding" Optimization

Stop optimizing just the blue underlined text. Optimize the sentence.

  • Old Way: ...read our guide on <a href="...">best running shoes</a>.
  • New Way (API Optimized): ...in our tests for marathon durability and foam density, the <a href="...">Nike Pegasus 40</a> outperformed peers.

By placing semantically rich keywords (marathon, durability, foam density) in the fullLeftContext, you pass relevance signals to the target page even if the anchor text itself is branded or generic.
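One rough way to check whether a sentence "feeds" the target page is to measure how many of the page's key terms appear in the anchor or its left context. This sketch swaps in simple word overlap as a crude stand-in for whatever semantic model Google actually uses; context_relevance and the term list are illustrative assumptions.

```python
import re


def tokens(text):
    """Lowercase alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def context_relevance(left_context, anchor_text, target_terms):
    """Share of the target page's key terms (single words) found in the anchor
    or its left context. A crude lexical proxy, not a semantic model."""
    targets = {t.lower() for t in target_terms}
    if not targets:
        return 0.0
    found = tokens(left_context) | tokens(anchor_text)
    return len(targets & found) / len(targets)


target = ["marathon", "durability", "foam", "density", "pegasus"]
old = context_relevance("read our guide on", "best running shoes", target)
new = context_relevance(
    "in our tests for marathon durability and foam density, the", "Nike Pegasus 40", target
)
print(old, new)
```

The "Old Way" sentence shares none of the target terms, while the "New Way" sentence covers all of them even though its anchor is a product name.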


3. The "Supplemental" Filter: How Google Ignores Your Footer

If you have a sitewide footer link on 10,000 pages pointing to your "Services" page, do you get 10,000 votes of authority?

The leak suggests: No.

The Field: supplementalAnchorsDropped

Within the Anchors module, there is a specific metric referencing "dropped" anchors, often tied to "supplemental" or "redundant" data. This implies a bucketing system. A plausible reading: when Google crawls your site, it identifies the "boilerplate" areas (headers, footers, sidebars), and links in these regions are likely:

  1. Aggregated: Counted as 1 link instead of 10,000.
  2. Dropped: Completely ignored for ranking purposes (PageRank calculation) to save processing power.
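The two outcomes above can be modeled with a small vote counter. This sketch assumes you have already labeled each link with a page region (the names in BOILERPLATE are hypothetical) and implements the "aggregated" reading: boilerplate links from one host to one target collapse into a single vote, while in-content links count once per source page.

```python
from collections import defaultdict
from urllib.parse import urlsplit

BOILERPLATE = {"header", "nav", "footer", "sidebar"}  # hypothetical region labels


def effective_votes(links):
    """links: iterable of (source_url, target_url, region).

    In-content links count once per source page. Boilerplate links from the same
    host to the same target collapse into a single vote (the "aggregated" reading).
    """
    content_sources = defaultdict(set)    # target -> distinct source pages
    boilerplate_hosts = defaultdict(set)  # target -> distinct source hosts
    for source, target, region in links:
        if region in BOILERPLATE:
            boilerplate_hosts[target].add(urlsplit(source).netloc)
        else:
            content_sources[target].add(source)
    targets = content_sources.keys() | boilerplate_hosts.keys()
    return {
        t: len(content_sources.get(t, ())) + len(boilerplate_hosts.get(t, ()))
        for t in targets
    }


links = [(f"https://example.com/page-{i}", "https://example.com/services", "footer")
         for i in range(10_000)]
links += [(f"https://example.com/blog/{i}", "https://example.com/services", "main")
          for i in range(3)]
print(effective_votes(links))
```

With 10,000 footer links and three in-content links to the Services page, the counter reports 4 effective votes rather than 10,003. Under the "dropped" reading, you would skip boilerplate links entirely and get 3.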

The Strategy: The "First Link" Priority

The docs also hint at how duplicate links are handled. If Page A links to Page B three times, Google likely processes the first anchor it encounters in the DOM (Document Object Model) and "drops" the redundant ones.

Action Item:
Ensure your primary keyword anchor is the first one in the HTML code.

  • Problem: If your navigation menu (which loads first in HTML) uses the anchor "Home," but your content body uses the anchor "Best SEO Agency," Google might prioritize "Home" because it appeared first.
  • Fix: Use descriptive navigation anchors, or ensure your main content appears before your navigation in the DOM (using CSS for visual positioning).
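The "first link" hypothesis amounts to a keep-the-first de-duplication pass over anchors in DOM order. A minimal sketch, assuming you have already extracted (anchor_text, target) pairs in document order; first_anchor_per_target is a hypothetical name:

```python
def first_anchor_per_target(anchors):
    """anchors: (anchor_text, target_url) pairs in DOM order.

    Keeps the first anchor seen for each target and ignores later duplicates,
    mirroring the "first link" hypothesis (not a confirmed Google rule).
    """
    kept = {}
    for text, target in anchors:
        kept.setdefault(target, text)  # only the first anchor per target is stored
    return kept


nav_first = [("Home", "/"), ("Pricing", "/pricing"), ("Best SEO Agency", "/")]
content_first = [("Best SEO Agency", "/"), ("Home", "/"), ("Pricing", "/pricing")]
print(first_anchor_per_target(nav_first)["/"])
print(first_anchor_per_target(content_first)["/"])
```

Moving the content block ahead of the navigation in the DOM changes which anchor survives for "/".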

4. Local vs. Global: The Internal Linking Distinction

The API makes a hard distinction between Local Anchors (Internal Links) and Global Anchors (External Backlinks).
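The local/global split boils down to whether a link stays on the same site. Here is a simplified Python check, assuming "same site" means "same hostname" after resolving relative URLs; Google's real definition may group subdomains or related hosts differently, and is_local here only borrows the field name from the Anchor sketch.

```python
from urllib.parse import urljoin, urlsplit


def is_local(page_url, href):
    """True if href, resolved against page_url, points to the same hostname.

    Simplification: subdomains count as different sites here.
    """
    target = urljoin(page_url, href)
    return urlsplit(target).hostname == urlsplit(page_url).hostname


print(is_local("https://example.com/blog/post", "/pricing"))
print(is_local("https://example.com/blog/post", "https://www.nytimes.com/"))
print(is_local("https://example.com/blog/post", "https://blog.example.com/"))
```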

The "NavBoost" Connection

One of the most powerful systems revealed in the U.S. v. Google antitrust trial, and referenced again in the leak, was NavBoost. This system uses click data to adjust rankings.

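  • How it might relate to anchors: Internal anchors are the primary way users navigate your site. If users frequently click an internal link with the anchor "Pricing" to reach your pricing page, that behavior may reinforce the page's relevance for the query "Pricing." Treat this as a hypothesis: the trial testimony describes NavBoost as learning from clicks on search results, not on internal links.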
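You cannot see NavBoost's data, but you can see your own. If your analytics records which internal anchor a visitor clicked, a sketch like this shows which anchor texts actually drive traffic to each page. It assumes a click log of (anchor_text, target_url) pairs, and anchor_click_share is a hypothetical audit helper, not a model of NavBoost.

```python
from collections import Counter, defaultdict


def anchor_click_share(click_log):
    """click_log: iterable of (anchor_text, target_url) click events from your analytics.

    Returns {target_url: {anchor_text: share_of_clicks}}.
    """
    counts = defaultdict(Counter)
    for anchor, target in click_log:
        counts[target][anchor] += 1
    shares = {}
    for target, counter in counts.items():
        total = sum(counter.values())
        shares[target] = {anchor: n / total for anchor, n in counter.items()}
    return shares


log = [("Pricing", "/pricing")] * 3 + [("See plans", "/pricing"), ("Blog", "/blog")]
print(anchor_click_share(log))
```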

The Strategy: Aggressive Internal Anchors

While you should be cautious with external anchor text (to avoid Penguin penalties), the leak suggests you can be much more aggressive with internal anchors.

  • External: Use branded/natural anchors to look organic.
  • Internal: Use exact-match keywords. The API stores internal links separately (see is_local in the Anchor definition above), which suggests Google treats local anchors as navigational, descriptive signals for mapping your site structure.

5. The "News" Tier: Not All Sites Are Equal

A fascinating field found in the documentation was encodedNewsAnchorData. This suggests that Google keeps a specific data structure for links coming from news publishers.

The Implication

A link from the New York Times is not just "more powerful" because of PageRank. It is likely structurally different in the database, carrying extra metadata that a link from a random blog does not.

This supports the Digital PR strategy. Getting links from "News" entities likely triggers this specific data encoding, possibly passing a "Trust" signal that third-party authority metrics (like Moz's DA or Ahrefs' DR) cannot measure.


6. Spam Signals are Explicit

The docs didn't just show positive signals; they showed the guns Google points at spammers. Fields like spam_rank, penalty, and demotion were present in various forms. Crucially, there is a difference between Ignoring a link and Demoting a site.

  • Ignoring: The supplementalAnchorsDropped field suggests Google just ignores low-quality/redundant links. No harm done, just no value.
  • Demotion: High concentrations of "spammy" anchors (likely defined by AnchorSpam probability fields) can trigger active demotion.

The "Penguin" in the Code

The API appears to let Google attach a "Spam Probability" to each anchor. A plausible mechanism: if the ratio of "Commercial Keywords" to "Branded Text" across the GlobalAnchor set exceeds some threshold, the is_spam flag flips to TRUE. The docs do not publish that threshold.
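The ratio test described above can be sketched in a few lines to audit your own backlink profile. Everything here is an assumption for illustration: the lowercase brand and commercial word sets, the rule that a brand word wins over a commercial one, and the default threshold of 1.0, which is a placeholder rather than a value from the leak.

```python
import re


def classify_anchor(text, brand_terms, commercial_terms):
    """Label an anchor 'branded', 'commercial', or 'other' by word matching.
    A brand word wins if both kinds appear (e.g. "Acme SEO agency")."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    if words & brand_terms:
        return "branded"
    if words & commercial_terms:
        return "commercial"
    return "other"


def commercial_to_branded_ratio(anchors, brand_terms, commercial_terms):
    labels = [classify_anchor(a, brand_terms, commercial_terms) for a in anchors]
    branded, commercial = labels.count("branded"), labels.count("commercial")
    if branded == 0:
        return float("inf") if commercial else 0.0
    return commercial / branded


def flag_anchor_profile(anchors, brand_terms, commercial_terms, threshold=1.0):
    """True when the commercial:branded ratio exceeds threshold (a placeholder value)."""
    return commercial_to_branded_ratio(anchors, brand_terms, commercial_terms) > threshold


brand = {"acme"}
commercial = {"best", "cheap", "buy", "agency"}
profile = ["Acme", "acme.com", "best seo agency", "cheap seo", "buy backlinks", "this guide"]
print(commercial_to_branded_ratio(profile, brand, commercial))
print(flag_anchor_profile(profile, brand, commercial))
```

Three commercial anchors against two branded ones give a ratio of 1.5, which trips the placeholder threshold; "this guide" counts toward neither side.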


Summary: The 2025 Anchor Text Checklist

Based on the engineering reality of the API docs, here is your optimization checklist:

  • Feed the Context Window: Never leave a link "naked." Make sure the 5-10 words before and after every link are semantically relevant to the target page.
  • Prune Redundant Anchors: Remove sitewide footer links to "Money Pages." They are likely being dropped by the supplementalAnchorsDropped filter.
  • Front-Load the DOM: Ensure your most descriptive anchors appear as high up in the HTML code as possible to beat the de-duplication filter.
  • Go Exact on Internal: Use descriptive internal links (stored as local anchors) to tell Google exactly what your pages are about. Don't be shy with exact-match internal links.
  • Audit for "News" Data: Prioritize backlinks from sites that Google classifies as "News" entities to trigger the encodedNewsAnchorData signal.

The leak didn't change the game; it revealed the rules. We now know that Google sees more than just text. It sees context, position, and intent. Optimize accordingly.
