Every time you paste a link into Slack, Twitter, or LinkedIn, a small miracle happens behind the scenes. The platform fetches the page, parses its HTML, pulls out the title, description, and preview image, and renders a rich card -- all in under a second. But what happens when the card shows the wrong image? Or no description at all? Or when you need to audit hundreds of pages for SEO compliance?
That is the problem urlmeta-cli solves. One command, one URL, and you get everything: the page title, meta description, Open Graph tags, Twitter Card data, Schema.org markup, content statistics, and an SEO score with actionable recommendations. In this article, I will walk through why URL metadata matters, what the tool extracts, how its SEO scoring works, and how to use batch processing and JSON output to build metadata extraction into your workflows.
Why URL Metadata Matters
Metadata is the first impression your page makes when it never gets a visit. A link shared on social media, embedded in a Slack message, or indexed by a search engine is judged entirely by its metadata. Three areas make this critical:
Link previews. When someone shares your URL, platforms like Twitter, Facebook, LinkedIn, and Discord read your Open Graph and Twitter Card tags to generate a preview card. If og:title is missing, the platform guesses -- often badly. If og:image is absent, the card renders with a generic placeholder. A single missing tag can mean the difference between a click and a scroll-past.
SEO. Search engines use your <title>, <meta name="description">, canonical URL, heading structure, and language attributes to understand and rank your page. A title that is 90 characters long gets truncated in search results. A missing canonical URL can cause duplicate content issues. A page with three <h1> tags confuses crawlers about what the page is actually about.
Content auditing. If you manage a blog, documentation site, or marketing page, you need to verify that every page has the right metadata before it goes live. Doing this manually across dozens or hundreds of pages is not realistic. You need a tool that can check them all at once and flag what is broken.
What urlmeta-cli Extracts
Install it globally and point it at any URL:
npm install -g urlmeta-cli
urlmeta https://github.com
The output is a structured report that covers six categories of metadata.
HTML Meta Tags
The tool parses the fundamental meta tags that every page should have: <title>, <meta name="description">, canonical URL (<link rel="canonical">), language (<html lang="...">), author, published date, modified date, favicon, charset, viewport, robots directive, generator, and theme color. These are the baseline tags that search engines and browsers rely on.
The extraction logic handles common inconsistencies. Meta tags can use name or property attributes. Some sites capitalize tag names. The tool checks both meta[name="description"] and meta[name="Description"], so it works even with non-standard markup.
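The spirit of that fallback can be sketched in a few lines of Node. This is an illustrative regex-based lookup, not the tool's actual source (which may well use a proper HTML parser); it accepts either the name or property attribute, in either position, case-insensitively:

```javascript
// Illustrative sketch, not the tool's actual source: a case-tolerant
// meta lookup that accepts either the name or property attribute.
function getMeta(html, key) {
  // Try name/property before content, then the reverse attribute order.
  const patterns = [
    new RegExp(`<meta[^>]+(?:name|property)=["']${key}["'][^>]*content=["']([^"']*)["']`, 'i'),
    new RegExp(`<meta[^>]+content=["']([^"']*)["'][^>]*(?:name|property)=["']${key}["']`, 'i'),
  ];
  for (const re of patterns) {
    const match = html.match(re);
    if (match) return match[1];
  }
  return null;
}

console.log(getMeta('<meta name="Description" content="A demo page.">', 'description'));
// → A demo page.
```

The 'i' flag is what makes `Description`, `description`, and `DESCRIPTION` all resolve to the same tag.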
Open Graph Protocol
Open Graph tags control how your page appears when shared on Facebook, LinkedIn, Discord, and most other platforms. The tool extracts seven OG properties:
- og:title -- the title shown in the preview card
- og:description -- the snippet below the title
- og:image -- the preview image (resolved to an absolute URL)
- og:type -- article, website, product, etc.
- og:site_name -- the name of your site
- og:url -- the canonical URL for the shared content
- og:locale -- the language/region of the content
Image URLs are resolved relative to the page URL, so even relative paths like /images/og.png are converted to their full absolute form in the output.
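Node's built-in WHATWG URL class does exactly this kind of resolution, and is presumably what any Node tool would reach for here:

```javascript
// Node's built-in WHATWG URL class resolves a relative path against a
// base URL -- the same behavior described above for og:image values.
const pageUrl = 'https://example.com/blog/post/';
const absolute = new URL('/images/og.png', pageUrl).href;
console.log(absolute); // → https://example.com/images/og.png
```

Root-relative paths (`/images/og.png`) resolve against the origin, while bare relative paths (`og.png`) resolve against the page's directory.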
Twitter Cards
Twitter (and Bluesky, Mastodon, and others) use their own set of meta tags for card rendering. The tool extracts:
- twitter:card -- the card type (summary, summary_large_image, etc.)
- twitter:title and twitter:description -- override OG tags specifically for Twitter
- twitter:image -- can differ from the OG image
- twitter:site and twitter:creator -- the @handles associated with the content
The extraction handles both name and property attribute variants, since different CMS platforms generate these differently.
Schema.org / JSON-LD
Many modern sites embed structured data using JSON-LD <script> blocks. The tool parses these to extract the @type, name, and description fields. This gives you a quick read on whether a page has rich snippet potential for Google search results.
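A minimal sketch of that step, assuming a regex scan for the script blocks rather than whatever parser the tool actually uses:

```javascript
// Illustrative sketch of the JSON-LD step: find each ld+json script
// block, parse its body, and keep the three fields of interest.
function extractJsonLd(html) {
  const results = [];
  const re = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  let m;
  while ((m = re.exec(html)) !== null) {
    try {
      const data = JSON.parse(m[1]);
      results.push({ type: data['@type'], name: data.name, description: data.description });
    } catch {
      // Skip malformed JSON-LD instead of failing the whole page.
    }
  }
  return results;
}
```

The try/catch matters in practice: malformed JSON-LD is common in the wild, and one broken block should not sink the rest of the report.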
Content Statistics
Beyond metadata, the tool analyzes the actual page content:
- Word count -- calculated after stripping scripts, styles, navigation, and footer elements
- H1 tag -- the primary heading text and total count
- H2 count -- secondary headings for content structure
- Image count -- total <img> elements
- Link count -- total <a> elements with href attributes
These numbers tell you whether a page has thin content, poor heading structure, or an unusually high link density.
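A rough version of the word-count step might look like this; the real stripping rules may differ, but the shape is the same: remove non-content blocks, then remove tags, then count tokens.

```javascript
// Rough sketch of the word-count step (the tool's actual stripping
// rules may differ): drop non-content blocks, drop remaining tags,
// then count whitespace-separated tokens.
function wordCount(html) {
  const text = html
    .replace(/<(script|style|nav|footer)\b[\s\S]*?<\/\1>/gi, ' ') // non-content blocks
    .replace(/<[^>]+>/g, ' ')                                     // remaining tags
    .replace(/\s+/g, ' ')
    .trim();
  return text ? text.split(' ').length : 0;
}

console.log(wordCount('<p>Hello wide world</p><script>ignored()</script>')); // → 3
```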
Technical Details
The tool also reports server-side information: HTTP status code, response time in milliseconds, Content-Type header, Content-Length (formatted as human-readable bytes), and the charset encoding. A page that takes 4 seconds to respond has an SEO problem regardless of how good its tags are.
The SEO Score Calculator
This is where urlmeta-cli goes beyond simple extraction. After gathering all metadata, the tool computes an SEO score from 0 to 100, weighted across ten factors:
| Factor | Points | Criteria |
|---|---|---|
| Title | 15 | Present (10) + ideal length 30-60 chars (5) |
| Description | 15 | Present (10) + ideal length 120-160 chars (5) |
| Open Graph | 20 | og:title (5) + og:description (5) + og:image (7) + og:type (3) |
| Twitter Card | 10 | twitter:card (4) + twitter:title (3) + twitter:image (3) |
| Canonical URL | 10 | Present |
| H1 Heading | 10 | Exactly one (10), more than one (3) |
| Language | 5 | lang attribute present |
| Favicon | 5 | Detected via link tags or /favicon.ico |
| Viewport | 5 | viewport meta tag present |
| Performance | 5 | Under 1s (5), under 3s (3), under 5s (1) |
The score maps to a letter grade: 90+ is A+, 80+ is A, 70+ is B, 60+ is C, 50+ is D, and below 50 is F.
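The title row of the rubric and the grade mapping are simple enough to restate as code; this is a transcription of the table above, not the tool's source:

```javascript
// The Title row of the rubric: present (10) + ideal length 30-60 (5).
function titleScore(title) {
  if (!title) return 0;
  let points = 10;
  if (title.length >= 30 && title.length <= 60) points += 5;
  return points;
}

// The score-to-grade mapping described in the article.
function grade(score) {
  if (score >= 90) return 'A+';
  if (score >= 80) return 'A';
  if (score >= 70) return 'B';
  if (score >= 60) return 'C';
  if (score >= 50) return 'D';
  return 'F';
}

console.log(grade(87)); // → A
```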
More importantly, the tool lists specific issues with actionable detail. Instead of just saying "title problem," it tells you: Title too long (73 chars, recommended: 30-60). Instead of a vague "social tags missing," you get: Missing og:image -- link previews will have no image.
This makes the tool practical for both quick spot-checks and systematic auditing.
Batch Processing Multiple URLs
Real-world metadata work almost never involves a single page. You need to audit your entire blog, check a set of landing pages, or compare competitors. Pass multiple URLs and the tool processes them sequentially:
urlmeta https://github.com https://npmjs.com https://dev.to
Each URL gets a full detailed report. At the end, a summary table shows all results side by side:
Batch Summary
────────────────────────────────────────────────────────────────────────
URL Title SEO Time Status
────────────────────────────────────────────────────────────────────────
https://github.com GitHub: Let's build... 87 376ms 200
https://npmjs.com npm 72 980ms 200
https://dev.to DEV Community 91 412ms 200
────────────────────────────────────────────────────────────────────────
If you only want the summary table without the individual detailed reports, use the --summary flag:
urlmeta https://github.com https://npmjs.com https://dev.to --summary
Failed URLs (DNS failures, timeouts, server errors) are shown inline with the error message rather than crashing the entire batch. This is important when you are processing a list and cannot guarantee every URL is reachable.
You can combine batch mode with a file of URLs using shell expansion:
urlmeta $(cat urls.txt)
JSON Output for Scripting
The --json flag outputs all metadata as structured JSON, making the tool composable with other command-line utilities:
urlmeta https://github.com --json
This returns a single JSON object with every field. For multiple URLs, it returns a JSON array. Pipe it to jq for extraction:
# Get just the SEO-critical fields
urlmeta https://example.com --json | jq '{title, description, ogImage, ogTitle}'
# Check if og:image exists
urlmeta https://example.com --json | jq '.ogImage // "MISSING"'
# Batch extract all titles
urlmeta https://github.com https://npmjs.com --json | jq '.[].meta.title'
The --compact flag removes indentation for smaller payloads, useful when piping to another process or storing in a database:
urlmeta https://example.com --json --compact >> metadata-log.jsonl
CI/CD Integration
JSON output makes it straightforward to add metadata checks to your deployment pipeline. A simple approach:
#!/bin/bash
STATUS=$(urlmeta https://your-staging-site.com --json | jq -r '.statusCode')
if [ "$STATUS" != "200" ]; then
  echo "Staging site returned non-200 status"
  exit 1
fi
Or check that all required OG tags are present before merging a PR that changes your landing page.
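One way to express that gate as a small Node check. The field names here (ogTitle, ogDescription, ogImage) follow the jq examples earlier in the article; verify them against your own --json output before relying on this:

```javascript
// Field names (ogTitle, ogDescription, ogImage) are assumed from the
// jq examples above -- confirm against your actual --json output.
function missingOgTags(report) {
  const required = ['ogTitle', 'ogDescription', 'ogImage'];
  return required.filter((key) => !report[key]);
}

// Example report with a missing image:
const report = { ogTitle: 'My page', ogDescription: 'What it does', ogImage: null };
console.log(missingOgTags(report)); // → [ 'ogImage' ]
```

In CI you would pipe `urlmeta <url> --json` into a script built around this check and fail the build when the returned list is non-empty.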
Additional Options
A few more flags worth knowing:
- --timeout <ms> -- set a custom request timeout (default is 10 seconds). Useful for slow sites or tight CI budgets.
- --user-agent <string> -- send a custom User-Agent header. Some sites serve different content to bots versus browsers; this lets you test both scenarios.
Getting Started
npm install -g urlmeta-cli
urlmeta https://your-site.com
That is it. One command, a full metadata report, and an SEO score with specific issues to fix. For batch work, pass multiple URLs. For automation, add --json. The tool has no configuration files, no API keys, and no dependencies beyond Node.js.
The source is MIT-licensed and available on GitHub. If URL metadata is part of your workflow -- whether for link previews, SEO, content auditing, or competitive analysis -- give it a try.
More CLI tools from the same author:
- urlmeta-cli -- extract metadata, Open Graph, Twitter Cards, and SEO score from any URL
- websnap-reader -- capture and convert web pages into clean, readable Markdown
- ghbounty -- find open-source bounties on GitHub issues
- devpitch -- generate professional pitch decks for developer tools
- pricemon -- monitor product prices and get alerts on drops
- repo-readme-gen -- auto-generate polished README files from repository contents