A sitemap is an XML file that lists the URLs you want search engines to index. It's not required, because Google can discover pages by following links. But a sitemap helps search engines find orphaned pages (pages with no inbound links), deep pages (more than 3 clicks from the homepage), and newly created pages sooner than link discovery alone would. It improves discovery; it does not guarantee indexing.
The format
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page-1/</loc>
    <lastmod>2026-03-25</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
loc: The full URL. Required. Must be absolute (include the protocol and domain).
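lastmod: The last modification date. Optional but recommended. Uses W3C Datetime format, a profile of ISO 8601: YYYY-MM-DD, optionally followed by a time and timezone offset. Google uses this as a recrawl signal, but only if the dates are consistently accurate.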
changefreq: How often the page changes (always, hourly, daily, weekly, monthly, yearly, never). Optional. Google's documentation says it ignores this value. Including it does no harm, but don't rely on it.
priority: A value from 0.0 to 1.0 indicating the page's importance relative to other pages on your site. Optional. Google ignores this too. Many SEOs set everything to 0.8, which makes a relative value meaningless anyway.
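Here's a minimal sketch of producing that format from Python's standard library, assuming your pages are available as (URL, last-modified date) pairs. The function name `build_sitemap` and the input shape are illustrative choices, not part of any sitemap tooling. It writes only loc and lastmod, since those are the fields that carry weight.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML as bytes.

    entries: iterable of (absolute_url, last_modified) pairs, where
    last_modified is a datetime.date or None. (Hypothetical input shape.)
    """
    # Serialize the sitemap namespace as the default xmlns instead of "ns0:".
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, last_modified in entries:
        # loc is required and must include the protocol and domain.
        if not loc.startswith(("http://", "https://")):
            raise ValueError(f"loc must be an absolute URL: {loc!r}")
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        # lastmod is optional; date.isoformat() yields YYYY-MM-DD.
        if last_modified is not None:
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = last_modified.isoformat()
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    xml_bytes = build_sitemap([
        ("https://example.com/page-1/", date(2026, 3, 25)),
        ("https://example.com/page-2/", None),
    ])
    print(xml_bytes.decode("utf-8"))
```

ElementTree escapes special characters such as `&` in URLs for you, which is one of the easier things to get wrong when building the XML by string concatenation.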
What to include
Include:
- Every canonical page you want indexed
- All major content pages (blog posts, product pages, landing pages)
- Category and tag archive pages (if they have unique content)
Exclude:
- Pages blocked by robots.txt (contradictory to include them)
- Pages with noindex meta tags
- Redirect URLs (301/302 sources)
- Duplicate pages (non-canonical URLs)
- Internal search result pages
- Login, registration, and account pages
- Paginated pages beyond page 1 (debatable, but generally exclude)
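The include/exclude rules above can be sketched as a single filter function. The `Page` record and all of its field names are hypothetical; you would populate them from your own CMS or crawler. Treating every non-200 status as excluded is an assumption that covers redirects and also drops error pages.

```python
from dataclasses import dataclass


@dataclass
class Page:
    # Hypothetical fields; map them from your own CMS or crawl data.
    url: str
    canonical_url: str
    status_code: int = 200
    noindex: bool = False
    blocked_by_robots: bool = False
    is_internal_search: bool = False
    is_account_page: bool = False  # login, registration, account pages
    page_number: int = 1           # position within a paginated series


def belongs_in_sitemap(page: Page) -> bool:
    """Apply the exclusion rules; anything that survives is included."""
    if page.status_code != 200:
        return False  # redirect sources (301/302) and error pages
    if page.blocked_by_robots or page.noindex:
        return False  # listing these would send contradictory signals
    if page.url != page.canonical_url:
        return False  # duplicate (non-canonical) URL
    if page.is_internal_search or page.is_account_page:
        return False
    if page.page_number > 1:
        return False  # debatable; remove this check to keep deep pagination
    return True
```

Usage is a one-line filter, for example `urls = [p.url for p in pages if belongs_in_sitemap(p)]`.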
Sitemap index for large sites
A single sitemap is limited to 50,000 URLs and 50 MB uncompressed. Sites that exceed either limit split their URLs across several sitemaps and list them in a sitemap index:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml</loc>
    <lastmod>2026-03-25</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2026-03-24</lastmod>
  </sitemap>
</sitemapindex>
Splitting by content type (posts, products, pages, categories) is the most common approach. Because each entry in the index carries its own lastmod, this also lets Google identify which sections of your site have been updated without re-fetching unchanged sitemaps.
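As a rough sketch of that split, the code below groups URLs by content type, chunks each group at the 50,000-URL limit, and builds the index. The function names and the `-1`, `-2` filename suffix for oversized groups are my own illustrative conventions, and the 50 MB size limit is not checked here.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000


def plan_sitemap_files(urls_by_type, max_urls=MAX_URLS_PER_SITEMAP):
    """Return {filename: [url, ...]} with at most max_urls per file.

    urls_by_type: dict like {"posts": [...], "products": [...]}.
    """
    files = {}
    for content_type, urls in urls_by_type.items():
        chunks = [urls[i:i + max_urls] for i in range(0, len(urls), max_urls)]
        for number, chunk in enumerate(chunks, start=1):
            # Hypothetical naming: add a numeric suffix only when one
            # content type needs more than one file.
            suffix = f"-{number}" if len(chunks) > 1 else ""
            files[f"sitemap-{content_type}{suffix}.xml"] = chunk
    return files


def build_sitemap_index(base_url, lastmod_by_file):
    """Return sitemap index XML as bytes.

    lastmod_by_file: {filename: datetime.date of that file's latest change}.
    """
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for filename, last_modified in lastmod_by_file.items():
        sitemap = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = (
            f"{base_url.rstrip('/')}/{filename}"
        )
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}lastmod").text = last_modified.isoformat()
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)
```

Setting each file's lastmod to the newest modification date among the URLs it contains keeps the index honest about which sections actually changed.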
Dynamic generation
For sites with frequently changing content, generate sitemaps dynamically. Most CMS platforms have sitemap plugins. For custom sites, generate on build (static sites) or on request with caching (dynamic sites).
The sitemap should update automatically when content is published, modified, or deleted. Stale sitemaps with incorrect lastmod dates or removed URLs reduce Google's trust in the file.
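For the on-request case, one way to combine caching with automatic updates is a small time-based cache that content hooks can invalidate. This is a sketch under assumptions: `CachedSitemap`, its `render` callback, and the one-hour default are all hypothetical, and a multi-process deployment would need a shared cache rather than this in-memory one.

```python
import time


class CachedSitemap:
    """Serve a rendered sitemap from memory, re-rendering when stale."""

    def __init__(self, render, ttl_seconds=3600.0, clock=time.monotonic):
        self._render = render      # callable returning the sitemap bytes
        self._ttl = ttl_seconds    # maximum age before a re-render
        self._clock = clock        # injectable so tests can fake time
        self._body = None
        self._rendered_at = 0.0

    def get(self):
        """Return the cached body, rebuilding it if missing or expired."""
        now = self._clock()
        if self._body is None or now - self._rendered_at >= self._ttl:
            self._body = self._render()
            self._rendered_at = now
        return self._body

    def invalidate(self):
        """Call from publish/update/delete hooks to force a fresh render."""
        self._body = None
```

A request handler would return `cache.get()` with an XML content type, while your publish, edit, and delete handlers call `cache.invalidate()` so the next request never serves removed URLs or outdated lastmod values.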
I built a sitemap generator at zovo.one/free-tools/sitemap-generator that creates valid XML sitemaps from a URL list, handles sitemap indexes for large sites, and validates the output against the sitemaps.org schema. Enter your URLs or crawl your site, configure the optional fields, and download the XML.
I'm Michael Lip. I build free developer tools at zovo.one. 500+ tools, all private, all free.