Scoring a Page's Meta Tags 0-100: The Rubric Behind Our Analyzer

#frontend #html #tooling #webdev

A meta tag audit is a pile of binary checks. Title present, yes or no. Title in range, yes or no. Description present. One H1. og:image set. Canonical present. Run them all and you get a few dozen booleans. The problem is that a wall of green and red checkmarks does not motivate anyone. People glance at it, feel vaguely bad, and close the tab. A single number does motivate. "You are at 62" is a thing a person will act on. But a number only works if it is honest, and a number is only honest if it is explainable.

So we set one hard constraint before writing any scoring code: every point a page loses has to trace back to a named check with a specific fix. No mystery deductions. If you are at 62 and not 100, the tool can point at the exact items that cost you the other 38. That constraint shaped every decision that followed, and it is the reason the rubric looks the way it does. This is the write-up of how we got from a pile of booleans to a number we are willing to defend.

Choosing the dimensions and the weights

The first decision was how to group the checks. We landed on five dimensions, each with a fixed weight, and the overall score is their weighted average:

Basic meta, 30 percent. Title tag and meta description.
Headings, 20 percent. H1 count and heading-level hierarchy.
Open Graph, 20 percent. og:title, og:description, og:image, og:url.
Twitter Card, 15 percent. twitter:card, twitter:title, twitter:description, twitter:image.
Technical, 15 percent. Canonical, html lang, viewport, robots.

The weights are the opinionated part, and they encode what we actually believe about how pages get found now. Basic meta gets 30 percent, the largest slice, because the title and description are the strings an AI engine quotes when it summarizes or cites a page. They are the highest-value characters on the whole page, so a gap there should cost the most.

Technical gets the smallest slice at 15 percent, but for a subtler reason than "it matters least." Technical failures are rarer, and most of the time they are minor: a missing canonical is a warning, not a catastrophe. The exception is the robots meta, where a noindex does not just dent the dimension, it ends the page's chances entirely. We did not solve that by inflating the technical weight, which would have penalized every page for a risk most do not carry. We solved it inside the dimension, by letting the noindex check fail hard rather than nudge an average. More on that split below.
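As a minimal sketch of that weighted average (not the analyzer's actual code), the five weights can sit in a dictionary of integer percentages. The names `WEIGHTS` and `overall_score` and the dimension keys are hypothetical, chosen for illustration, and each dimension is assumed to arrive already scored on a 0 to 100 scale.

```python
# Dimension weights from the rubric, as integer percentages (they sum to 100).
WEIGHTS = {
    "basic_meta": 30,
    "headings": 20,
    "open_graph": 20,
    "twitter_card": 15,
    "technical": 15,
}


def overall_score(dimension_scores: dict) -> float:
    """Weighted average of per-dimension scores, each assumed to be 0-100."""
    return sum(WEIGHTS[name] * dimension_scores[name] for name in WEIGHTS) / 100


# Hypothetical page: weak basic meta, everything else clean.
scores = {
    "basic_meta": 60,
    "headings": 100,
    "open_graph": 100,
    "twitter_card": 100,
    "technical": 100,
}
print(overall_score(scores))  # 88.0
```

A 40-point gap in the heaviest dimension costs 12 points overall, which is the sense in which a gap in basic meta "costs the most."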

Setting the thresholds

Within each dimension, individual checks need thresholds, and we tried to make them match observed reality rather than a style guide.

The character windows are the clearest case. A title passes between 50 and 60 characters, and a description passes between 150 and 160. A tag that is present but outside its band gets a warning, not a fail, because a slightly long title still works; it just risks truncation. A tag that is missing entirely is a fail. The windows come from where truncation actually bites in search results and citation cards, not from a round number. The character count stands in as a proxy for the pixel width that does the real clipping.
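The three-way outcome can be sketched as a single helper. This is an illustration under stated assumptions: the function names are hypothetical, the band edges are treated as inclusive, and an empty or whitespace-only value is treated the same as a missing tag.

```python
def check_length(text, low, high):
    """Return 'fail' if missing, 'pass' if length is in [low, high], else 'warning'."""
    # Assumption: an empty or whitespace-only value counts as missing.
    if text is None or not text.strip():
        return "fail"
    length = len(text.strip())
    return "pass" if low <= length <= high else "warning"


def check_title(title):
    # Title window from the rubric: 50 to 60 characters.
    return check_length(title, 50, 60)


def check_description(description):
    # Description window from the rubric: 150 to 160 characters.
    return check_length(description, 150, 160)


print(check_title("x" * 55))     # pass
print(check_title("x" * 72))     # warning
print(check_description(None))   # fail
```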

Headings use two checks. Exactly one H1 is a pass. Zero H1s or more than one is flagged as a fail or a warning, because the H1 is the page's single claim, and splitting or dropping it muddies what the page asserts. The second check walks the heading levels looking for skips, such as an H2 jumping straight to an H4, and flags them, because a broken outline is harder to parse as a structured document.
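Both heading checks reduce to simple operations on the sequence of heading levels in document order. The sketch below assumes the levels have already been extracted into a list of integers (1 for H1, 2 for H2, and so on). The function names are hypothetical, and a "skip" is taken to mean a jump of more than one level deeper than the previous heading.

```python
def has_single_h1(levels):
    """True when the page has exactly one H1."""
    return levels.count(1) == 1


def find_skips(levels):
    """Return (previous, current) pairs where the outline jumps down more than one level."""
    # Moving back up the outline (for example H4 to H2) is not treated as a skip.
    return [(prev, cur) for prev, cur in zip(levels, levels[1:]) if cur > prev + 1]


outline = [1, 2, 4, 2, 3]  # H1, H2, H4, H2, H3
print(has_single_h1(outline))  # True
print(find_skips(outline))     # [(2, 4)]
```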

The penalties inside a dimension are graded, not binary. A failing item subtracts 40 from its dimension. A warning subtracts 20. That ordering is deliberate. It means a fail always outranks a warning when the tool sorts what to fix, and it means a dimension with one warning still scores well above a dimension with one outright fail. Graded penalties carry more information than a flat "this dimension is broken," and that information is what lets us rank fixes later.
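The arithmetic can be sketched in a few lines. Two assumptions here are not stated in the rubric: every dimension starts at 100, and the result is floored at 0 so that several fails cannot push a dimension negative. The names `PENALTY` and `dimension_score` are hypothetical.

```python
# Penalty per check outcome, from the rubric.
PENALTY = {"pass": 0, "warning": 20, "fail": 40}


def dimension_score(results):
    """Start at 100, subtract each check's penalty, floor at 0 (the floor is an assumption)."""
    return max(0, 100 - sum(PENALTY[outcome] for outcome in results))


print(dimension_score(["pass", "warning"]))  # 80
print(dimension_score(["pass", "fail"]))     # 60
print(dimension_score(["fail"] * 4))         # 0
```

One warning leaves the dimension at 80 and one fail leaves it at 60, which is the gap the sorting step relies on.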

Why a weighted average, not min()

The tempting alternative to a weighted average is to take the minimum across dimensions, or to zero a dimension on any failure. We tried thinking in those terms and rejected them, because they lie about the page.

Consider a page with flawless title and description, clean headings, a solid canonical, and no Twitter Card tags at all. Under a min() model, the missing Twitter Card drags the whole page toward zero, and the score screams emergency at a page that is in good shape and missing one nice-to-have. That is not honest, and worse, it is not actionable, because it tells you nothing about what to do first. A weighted average treats that page correctly: it is strong, with one soft spot worth 15 percent, and the score reflects that. Averaging ranks your fixes by impact. min() just punishes you for your weakest link regardless of how much that link matters.
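To put numbers on that example, here is a small comparison of the two models. It assumes the Twitter Card dimension bottoms out at 0 when all four of its tags are missing, and that Open Graph, which the example does not mention, scores 100. The dictionary keys are hypothetical labels.

```python
# Dimension weights from the rubric, as integer percentages.
WEIGHTS = {
    "basic_meta": 30,
    "headings": 20,
    "open_graph": 20,
    "twitter_card": 15,
    "technical": 15,
}

# The page from the example: everything clean, no Twitter Card tags at all.
page = {
    "basic_meta": 100,
    "headings": 100,
    "open_graph": 100,   # assumption: not mentioned in the example
    "twitter_card": 0,   # assumption: four missing tags floor the dimension at 0
    "technical": 100,
}

weighted = sum(WEIGHTS[name] * page[name] for name in WEIGHTS) / 100
weakest = min(page.values())

print(weighted)  # 85.0
print(weakest)   # 0
```

The weighted average reports 85, a strong page with one soft spot. The min() model reports 0 for the same page.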

The counter-case is noindex, and it is the one place we deliberately let a single check dominate. A noindex tag means the page has asked engines not to index it at all. No amount of perfect metadata matters if the page has opted out of being found. So that one check is allowed to fail its dimension hard, because unlike a missing og:url, it does zero the page's prospects. The rule we settled on: a weighted average everywhere, except the one check whose real-world effect is binary and total. Encoding that exception explicitly was cleaner than bending the whole model to accommodate it.
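One way to encode that exception is a short-circuit inside the technical dimension. This sketch assumes "fail hard" means the dimension drops straight to 0. The text does not pin down the exact mechanism, so treat that, the floor at 0, and the name `technical_score` as illustrative assumptions.

```python
# Penalty per check outcome, from the rubric.
PENALTY = {"pass": 0, "warning": 20, "fail": 40}


def technical_score(other_results, robots_content=""):
    """Score the technical dimension, letting noindex override the graded penalties.

    other_results: outcomes for the remaining technical checks.
    robots_content: the content attribute of the robots meta tag, or "" if absent.
    """
    directives = {part.strip().lower() for part in robots_content.split(",")}
    if "noindex" in directives:
        # Assumption: the hard fail zeroes the dimension outright.
        return 0
    return max(0, 100 - sum(PENALTY[outcome] for outcome in other_results))


print(technical_score(["pass", "pass", "pass"], "noindex, follow"))  # 0
print(technical_score(["warning", "pass", "pass"], "index, follow")) # 80
```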

Generating exactly three recommendations

The score tells you where you stand. The recommendations tell you what to do, and there we capped the list at three.

The cap is a behavioral decision, not a technical limit. A list of fifteen fixes does not get done. It gets bookmarked. Three fixes get done this afternoon. So the tool returns at most three Top Recommendations, aimed at the categories where a fix is worth the most.

The ranking is a simple priority score: dimension weight multiplied by the score deficit in that dimension. A big gap in a heavy dimension floats to the top, a small gap in a light dimension sinks. That product captures both halves of "what should I fix first," how much room there is to improve and how much that improvement is worth, in one number. We sort by it, take the top three, and attach the specific fix to each. The user gets a short, ordered, finishable list instead of an exhaustive one nobody completes.
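The ranking step can be sketched as follows. The deficit is taken to be 100 minus the dimension score, dimensions already at 100 are skipped, and the function name `top_recommendations` is hypothetical. Attaching the specific fix text to each entry is left out.

```python
# Dimension weights from the rubric, as integer percentages.
WEIGHTS = {
    "basic_meta": 30,
    "headings": 20,
    "open_graph": 20,
    "twitter_card": 15,
    "technical": 15,
}


def top_recommendations(dimension_scores, limit=3):
    """Rank dimensions by weight x deficit and return the top few names."""
    priority = {
        name: WEIGHTS[name] * (100 - score)
        for name, score in dimension_scores.items()
        if score < 100  # nothing to recommend for a perfect dimension
    }
    ranked = sorted(priority, key=priority.get, reverse=True)
    return ranked[:limit]


scores = {
    "basic_meta": 80,    # priority 30 * 20 = 600
    "headings": 60,      # priority 20 * 40 = 800
    "open_graph": 100,   # skipped
    "twitter_card": 0,   # priority 15 * 100 = 1500
    "technical": 80,     # priority 15 * 20 = 300
}
print(top_recommendations(scores))  # ['twitter_card', 'headings', 'basic_meta']
```

Basic meta and technical have the same 20-point deficit here, but basic meta makes the list and technical does not, because the weight doubles its priority.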

What we left out on purpose

Two checks we could have added and chose not to: keyword density and a meta keywords audit.

Both are dead signals. Search engines stopped using the meta keywords tag for ranking many years ago, and keyword density as a target is a relic of an era when stuffing a term ten times moved rankings. Including either would have padded the check count and made the tool look thorough. It would also have cost us credibility with exactly the people we built this for. An SEO who knows the field would see "meta keywords" in our output and quietly conclude we did not know what year it was. A rubric earns trust partly by what it refuses to score. Leaving the dead signals out was as much a design decision as any weight we picked.

For the record on scope: extraction runs on server-rendered HTML, parsed with BeautifulSoup. We read the markup the page ships, not a post-JavaScript DOM, which is worth knowing if your tags are injected client-side.
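To make the scope concrete, here is a small extraction sketch. It uses the standard library's `html.parser` as a dependency-free stand-in for BeautifulSoup, so it is not the analyzer's actual code, and the class name `MetaExtractor` is hypothetical. It reads the title and meta tags from the shipped markup and never executes scripts.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect the <title> text and <meta> name/property -> content pairs."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
            self.title = ""
        elif tag == "meta":
            # Standard meta tags use name=, Open Graph tags use property=.
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content") is not None:
                self.meta[key.lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


html = """<html><head>
<title>Example page</title>
<meta name="description" content="A short description.">
<meta property="og:title" content="Example page">
<script>document.head.innerHTML += '<meta name="robots" content="noindex">';</script>
</head><body><h1>Example</h1></body></html>"""

parser = MetaExtractor()
parser.feed(html)
print(parser.title)             # Example page
print(parser.meta)
print("robots" in parser.meta)  # False
```

The robots tag in this example exists only inside a script, so a static parse never sees it. That is the client-side injection caveat in practice.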

Try it

You can run a page through it. The Meta Tag Analyzer is free and takes a URL, no signup. You get the 0 to 100 score, a pass, warning, or fail status for each item with a specific fix, and up to three prioritized recommendations.

The Open Graph and Twitter Card scores tell you the tags are present and correct, but not how the card actually looks. For that, run the page through the OG Previewer, which renders the link card as it will appear on Facebook, X, LinkedIn, and Slack. "Are the tags present and in range" and "does the card render right" turned out to be two different questions, so we built two tools rather than bolt a renderer onto a rubric. If you disagree with the weights, that is the interesting part: run a page you know well, see what it scores, and tell us where the number feels wrong.

Mehul Jain is an AI entrepreneur and product builder. He works on Geology, a GEO platform.