
Sebastián Aliaga

Preparing Sitecore Data for Search Engine Crawling: A Complete Metadata Implementation Guide

Search engine crawlers can only index what they can read, so a Sitecore headless site needs well-structured metadata on every page. This guide walks through a metadata system that turns Sitecore layout data into meta tags and JSON-LD structured data that crawlers can consume.

Table of Contents

  1. Architecture Overview
  2. Core Metadata Service
  3. Template-Specific Metadata
  4. Structured Data Implementation
  5. Integration with Next.js
  6. Best Practices
  7. Implementation Examples

Architecture Overview

The metadata implementation follows a layered architecture that separates concerns and provides flexibility for different content types:

┌─────────────────────────────────────┐
│           Page Rendering            │
├─────────────────────────────────────┤
│       Meta Props Plugin Layer       │
├─────────────────────────────────────┤
│    Search Metadata Service Layer    │
├─────────────────────────────────────┤
│       Constellation Services        │
├─────────────────────────────────────┤
│        Sitecore Layout Data         │
└─────────────────────────────────────┘

Key Components

  1. SearchMetadataService: Core service that extracts and transforms Sitecore field data into search-friendly metadata
  2. MetaPropsPlugin: Plugin that aggregates metadata from multiple sources during page props generation
  3. Template-specific handlers: Specialized logic for different content types (Articles, Products, Events)
  4. Structured Data Schemas: JSON-LD implementation for rich search results

Core Metadata Service

The SearchMetadataService is the heart of our metadata implementation. It processes Sitecore layout data and generates standardized metadata tags for search engines.

Service Structure

export class SearchMetadataService {
  protected layoutData: LayoutServiceData;

  constructor(layoutData: LayoutServiceData) {
    this.layoutData = layoutData;
  }

  getMetaTags(): MetaTag[] {
    const metaTags: MetaTag[] = [];

    // Bail out when there is no route to read fields from
    if (!this.layoutData?.sitecore?.route) return metaTags;

    // Extract base page fields
    const pageModel = castItem<SearchBasePageFields>(
      this.layoutData.sitecore.route as Item
    );

    // Generate standard metadata
    this.addStandardMetaTags(metaTags, pageModel);

    // Add template-specific metadata
    this.getEventMetaTags(metaTags);
    this.getProductMetaTags(metaTags);
    this.getArticleMetaTags(metaTags);

    return metaTags;
  }
}
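
The MetaTag type is not shown in this guide. A minimal sketch, assuming it is a plain name/content pair and that each tag ends up as a meta element in the page head. The helpers renderMetaTag and escapeAttribute are hypothetical names used for illustration, not part of Sitecore JSS:

```typescript
// Assumed shape of a tag produced by SearchMetadataService.getMetaTags().
interface MetaTag {
  name: string;
  content: string;
}

// Escape characters that would break out of a double-quoted HTML attribute.
function escapeAttribute(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: serialize one tag the way a crawler would see it.
function renderMetaTag(tag: MetaTag): string {
  return `<meta name="${escapeAttribute(tag.name)}" content="${escapeAttribute(tag.content)}" />`;
}

console.log(renderMetaTag({ name: "template", content: "Article Detail Page" }));
// <meta name="template" content="Article Detail Page" />
```

In a React layout the same tags would normally be rendered as JSX inside the head, where attribute escaping is handled for you.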

Standard Metadata Fields

Every page includes these foundational metadata fields:

// Core identification fields
metaTags.push({
  name: SEARCH_FIELDS.NAME,
  content: pageModel?.NavigationTitle?.value || "",
});

metaTags.push({
  name: SEARCH_FIELDS.NAVIGATION_TITLE,
  content: pageModel?.NavigationTitle?.value || "",
});

// Content classification
metaTags.push({
  name: SEARCH_FIELDS.LANGUAGE,
  content: this.layoutData.sitecore.route?.itemLanguage || "",
});

metaTags.push({
  name: SEARCH_FIELDS.TEMPLATE,
  content: this.layoutData.sitecore.route?.templateName || "",
});

// SEO essentials
metaTags.push({
  name: SEARCH_FIELDS.BROWSER_TITLE,
  content: pageModel?.browserTitle?.value || "",
});

metaTags.push({
  name: SEARCH_FIELDS.TEASER,
  content: pageModel?.metaDescription?.value || "",
});

// Keywords processing
const keywords = pageModel?.keywords?.value?.split(",");
metaTags.push({
  name: SEARCH_FIELDS.KEYWORDS,
  content: keywords?.join("|") || "",
});

Search Field Constants

All metadata field names are centralized in a constants file to ensure consistency:

export const SEARCH_FIELDS = {
  // General fields
  LANGUAGE: "language",
  TEMPLATE: "template",
  NAVIGATION_TITLE: "navigation_title",
  TEASER: "teaser",
  LISTING_IMAGE: "image",
  DATE: "date",
  BROWSER_TITLE: "browser_title",
  SEARCHABLE: "searchable",
  BREADCRUMBS_TITLE: "breadcrumbs_title",
  KEYWORDS: "keywords",
  NAME: "name",

  // Event-specific fields
  EVENT_DATE: "event_date",
  EVENT_LOCATION: "event_location",
  EVENT_FULL_DATE: "event_full_date",

  // Product-specific fields
  PRODUCT_TYPE: "product_type",
  PRODUCT_COLLECTION: "product_collection",
  PDP_PRODUCT_NAME: "pdp_product_name",
  PRODUCT_CATEGORY: "product_category",

  // Article-specific fields
  ARTICLE_TOPIC: "article_topic",
  ARTICLE_DATE: "article_date",
  ARTICLE_FULL_DATE: "article_full_date",
};
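
One way to get compile-time checking on these names, sketched with a trimmed-down copy of the constant: marking the object with as const lets TypeScript derive a union of the allowed field names, so a misspelled tag name fails type checking. SearchFieldName and makeTag are hypothetical names introduced here:

```typescript
// Trimmed-down copy of the constants above, frozen with `as const`
// so each value keeps its literal type instead of widening to string.
const SEARCH_FIELDS = {
  LANGUAGE: "language",
  TEMPLATE: "template",
  KEYWORDS: "keywords",
} as const;

// Union of every allowed field name: "language" | "template" | "keywords".
type SearchFieldName = (typeof SEARCH_FIELDS)[keyof typeof SEARCH_FIELDS];

interface MetaTag {
  name: SearchFieldName;
  content: string;
}

// Hypothetical factory: the compiler rejects names outside the union.
function makeTag(name: SearchFieldName, content: string): MetaTag {
  return { name, content };
}

const languageTag = makeTag(SEARCH_FIELDS.LANGUAGE, "en");
// makeTag("langauge", "en"); // would not compile: not a SearchFieldName
console.log(languageTag);
```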

Template-Specific Metadata

Different content types need different metadata. Each handler below compares the route's template ID with a known template GUID and adds its tags only when the two match.
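
The toGuid helper used in these comparisons is not shown in this guide. Sitecore IDs tend to appear in more than one spelling (braced and uppercase in some places, lowercase and hyphenated in others), so a comparison is only reliable after both sides are normalized. A minimal sketch of that idea, where normalizeGuid and isTemplate are hypothetical stand-ins and the GUID is invented for the example:

```typescript
// Hypothetical stand-in for the project's toGuid helper: reduce any common
// GUID spelling to lowercase, hyphenated form without braces.
function normalizeGuid(id: string): string {
  const hex = id.replace(/[^0-9a-fA-F]/g, "").toLowerCase();
  if (hex.length !== 32) return "";
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}

// Compare two IDs regardless of braces, hyphens, or letter case.
function isTemplate(routeTemplateId: string | undefined, expectedId: string): boolean {
  if (!routeTemplateId) return false;
  const actual = normalizeGuid(routeTemplateId);
  return actual !== "" && actual === normalizeGuid(expectedId);
}

// Example GUID invented for illustration only.
const ARTICLE_TEMPLATE = "{3F2504E0-4F89-11D3-9A0C-0305E82C3301}";
console.log(isTemplate("3f2504e0-4f89-11d3-9a0c-0305e82c3301", ARTICLE_TEMPLATE)); // true
```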

Product Detail Pages

Products require rich metadata for e-commerce search optimization:

private getProductMetaTags(metaTags: MetaTag[]): MetaTag[] {
  if (this?.layoutData?.sitecore?.route?.templateId ===
      toGuid(IDS.TEMPLATES.PRODUCT_DETAIL_PAGE)) {

    const productDetailProps = castItem<ProductDetailPage>(
      this.layoutData.sitecore.route as Item
    );

    // Product identification
    metaTags.push({
      name: SEARCH_FIELDS.NAME,
      content: productDetailProps?.productName?.value || '',
    });

    // Product imagery
    if (productDetailProps?.productCardImage) {
      metaTags.push({
        name: SEARCH_FIELDS.LISTING_IMAGE,
        content: productDetailProps?.productCardImage.value?.src || '',
      });
    }

    // Product taxonomy - multiple values joined with pipe
    if (productDetailProps?.productType) {
      metaTags.push({
        name: SEARCH_FIELDS.PRODUCT_TYPE,
        content: productDetailProps.productType
          .map((type) => type.fields?.Title?.value)
          .filter(Boolean) // skip items without a title
          .join('|'),
      });
    }

    // Product collections
    if (productDetailProps?.productCollection) {
      metaTags.push({
        name: SEARCH_FIELDS.PRODUCT_COLLECTION,
        content: productDetailProps.productCollection
          .map((collection) => collection.fields?.Title?.value)
          .filter(Boolean) // skip items without a title
          .join('|'),
      });
    }
  }

  return metaTags;
}

Event Detail Pages

Events need temporal and location-based metadata:

private getEventMetaTags(metaTags: MetaTag[]): MetaTag[] {
  if (this?.layoutData?.sitecore?.route?.templateId ===
      toGuid(IDS.TEMPLATES.EVENT_DETAIL_PAGE)) {

    const eventDetailProps = castItem<EventDetailPage>(
      this.layoutData.sitecore.route as Item
    );

    // Event identification
    metaTags.push({
      name: SEARCH_FIELDS.NAME,
      content: eventDetailProps?.eventName?.value || '',
    });

    // Temporal metadata
    if (eventDetailProps?.eventDateTime) {
      metaTags.push({
        name: SEARCH_FIELDS.EVENT_FULL_DATE,
        content: eventDetailProps?.eventDateTime.value,
      });

      metaTags.push({
        name: SEARCH_FIELDS.EVENT_DATE,
        content: getEventMonth(eventDetailProps?.eventDateTime.value),
      });
    }

    // Location metadata
    if (eventDetailProps?.eventLocation?.fields?.Title?.value) {
      metaTags.push({
        name: SEARCH_FIELDS.EVENT_LOCATION,
        content: eventDetailProps.eventLocation.fields.Title.value,
      });
    }
  }

  return metaTags;
}
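
The getEventMonth helper is not shown in this guide. A minimal sketch of what such a helper might look like, assuming the date field holds an ISO 8601 string and that the search index wants a month-and-year label for faceting. The name toMonthLabel and the output format are assumptions:

```typescript
// Hypothetical helper: turn an ISO date string into a "Month Year" label.
// Formatting in UTC keeps the result independent of the server's time zone.
function toMonthLabel(isoDate: string | undefined): string {
  if (!isoDate) return "";
  const date = new Date(isoDate);
  // Invalid input yields an "Invalid Date" whose timestamp is NaN.
  if (Number.isNaN(date.getTime())) return "";
  return new Intl.DateTimeFormat("en-US", {
    month: "long",
    year: "numeric",
    timeZone: "UTC",
  }).format(date);
}

console.log(toMonthLabel("2024-03-15T18:00:00Z")); // March 2024
console.log(toMonthLabel("not a date")); // (empty string)
```

Returning an empty string for unparseable input keeps the tag harmless instead of throwing during page props generation.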

Article Detail Pages

Articles focus on content classification and publication data:

private getArticleMetaTags(metaTags: MetaTag[]): MetaTag[] {
  if (this?.layoutData?.sitecore?.route?.templateId ===
      toGuid(IDS.TEMPLATES.ARTICLE_DETAIL_PAGE)) {

    const articleDetailProps = castItem<ArticleDetailPage>(
      this.layoutData.sitecore.route as Item
    );

    // Article identification
    metaTags.push({
      name: SEARCH_FIELDS.NAME,
      content: articleDetailProps?.title?.value || '',
    });

    // Content classification
    if (articleDetailProps?.topic?.fields?.Title?.value) {
      metaTags.push({
        name: SEARCH_FIELDS.ARTICLE_TOPIC,
        content: articleDetailProps.topic.fields.Title.value,
      });
    }

    // Publication timing
    if (articleDetailProps?.date) {
      metaTags.push({
        name: SEARCH_FIELDS.ARTICLE_FULL_DATE,
        content: articleDetailProps?.date.value,
      });

      metaTags.push({
        name: SEARCH_FIELDS.ARTICLE_DATE,
        content: getEventMonth(articleDetailProps?.date.value),
      });
    }
  }

  return metaTags;
}

Structured Data Implementation

Beyond meta tags, JSON-LD structured data gives search engines an explicit, machine-readable description of each page and makes the page eligible for rich results.

Schema Service Architecture

export const getSchemas = (
  layoutData: LayoutServiceData,
  siteSettings: SiteSettingsInfo,
  baseURL: string,
  hospitalitySchemas?: SchemaData[]
): SchemaData[] => {
  const schemas: SchemaData[] = [];
  const { route } = layoutData.sitecore;

  if (!route?.templateId || !route?.fields) return schemas;

  const templateId = toGuid(route.templateId);

  switch (templateId) {
    case toGuid(IDS.TEMPLATES.PRODUCT_DETAIL_PAGE):
      schemas.push(createProductSchema(layoutData, siteSettings, baseURL));
      break;
    case toGuid(IDS.TEMPLATES.ARTICLE_DETAIL_PAGE):
      schemas.push(
        createArticleSchema(
          route.fields as unknown as ArticleDetailPage,
          siteSettings,
          baseURL
        )
      );
      break;
    case toGuid(IDS.TEMPLATES.EVENT_DETAIL_PAGE):
      schemas.push(
        createEventSchema(route.fields as unknown as EventDetailPage)
      );
      schemas.push(createOrganizationSchema(siteSettings, baseURL));
      schemas.push(createWebSiteSchema(siteSettings, baseURL));
      break;
  }

  return schemas;
};
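
The Schemas component that renders these objects is not shown. A minimal sketch of the serialization step, assuming each schema is emitted as a script tag of type application/ld+json. serializeJsonLd and renderJsonLdScript are hypothetical helpers. Escaping the < character guards against field content that contains a closing script tag:

```typescript
// Loose stand-in for the SchemaData type used above.
type SchemaData = Record<string, unknown>;

// Hypothetical helper: serialize a schema for embedding in a <script> tag.
// "<" is replaced with its unicode escape so text such as "</script>"
// inside a field value cannot terminate the surrounding element.
function serializeJsonLd(schema: SchemaData): string {
  return JSON.stringify(schema).replace(/</g, "\\u003c");
}

function renderJsonLdScript(schema: SchemaData): string {
  return `<script type="application/ld+json">${serializeJsonLd(schema)}</script>`;
}

const articleSchema: SchemaData = {
  "@context": "https://schema.org/",
  "@type": "Article",
  headline: "Closing tags like </script> are escaped",
};

console.log(renderJsonLdScript(articleSchema));
```

The escaped output is still valid JSON, so a parser reading the script body recovers the original headline unchanged.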

Product Schema Example

export const createProductSchema = (
  layoutData: LayoutServiceData,
  siteSettings: SiteSettingsInfo,
  baseURL: string
): SchemaData => {
  const { route } = layoutData.sitecore;
  const fields = route?.fields as unknown as ProductDetailPage;

  return {
    "@context": "https://schema.org/",
    "@type": "Product",
    name: fields.productName?.value || "",
    image: [fields.productCardImage?.value?.src || ""],
    description: fields.productDescription?.value || "",
    brand: {
      "@type": "Brand",
      name: siteSettings.organizationName?.value || "",
    },
    sku: fields.mikMakProductIds?.value?.split(",")[0] || "",
    url: baseURL,
    offers: {
      "@type": "Offer",
      url: baseURL,
      itemCondition: "https://schema.org/NewCondition",
      availability: "https://schema.org/InStock",
      seller: {
        "@type": "Organization",
        name: siteSettings.organizationName?.value || "",
      },
    },
  };
};

Website Schema with Search Action

export const createWebSiteSchema = (
  siteSettings: SiteSettingsInfo,
  baseURL: string
): SchemaData => {
  // Guard searchPage as well: it may be unset in site settings
  const searchPageUrl = siteSettings.searchPage?.value?.href || "/search";

  return {
    "@context": "https://schema.org/",
    "@type": "WebSite",
    name: siteSettings.organizationName?.value || "",
    url: baseURL,
    potentialAction: {
      "@type": "SearchAction",
      target: `${baseURL}${searchPageUrl}?q={search_term_string}`,
      "query-input": "required name=search_term_string",
    },
  };
};
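
Plain string concatenation produces a malformed target when baseURL ends with a slash or the search path lacks a leading one. A sketch that joins the two parts with the standard URL constructor instead. buildSearchTarget is a hypothetical helper, and the {search_term_string} placeholder is appended afterwards so its braces are not percent-encoded:

```typescript
// Hypothetical helper: join a site origin and a search page path into a
// SearchAction target, tolerating missing or duplicated slashes.
function buildSearchTarget(baseURL: string, searchPath: string): string {
  // new URL resolves the path against the base and normalizes slashes.
  const searchUrl = new URL(searchPath, baseURL);
  return `${searchUrl.origin}${searchUrl.pathname}?q={search_term_string}`;
}

console.log(buildSearchTarget("https://example.com/", "/search"));
// https://example.com/search?q={search_term_string}
```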

Integration with Next.js

The metadata system plugs into Next.js through the page props factory pattern: the plugin below runs while page props are generated and collects metadata from every source into props.metaProps.

Meta Props Plugin

class MetaPropsPlugin implements Plugin {
  order = 11;

  async exec(
    props: SitecorePageProps,
    context: GetServerSidePropsContext | GetStaticPropsContext
  ) {
    props.metaProps = [];
    if (!props.layoutData.sitecore.route) return props;

    // Social metadata from Constellation
    const service = new PageTaggingService(props.layoutData);
    const metaProps = await service.getMetaProps();
    props.metaProps = metaProps;

    // Search engine directives
    const searchEngineDirectiveProp =
      await service.getSearchEngineDirectiveProp();
    if (searchEngineDirectiveProp) {
      props.metaProps.push(searchEngineDirectiveProp);
    }

    // Social metadata for Open Graph and Twitter Cards
    const pageMetadata = await service.getSocialMetadataProps(context, {
      languageEmbedding: false,
    });
    if (pageMetadata) {
      props.metaProps = [...props.metaProps, ...pageMetadata];
    }

    // Custom search metadata
    const searchMetadataService = new SearchMetadataService(props.layoutData);
    const searchMetaTags = searchMetadataService.getMetaTags();
    if (searchMetaTags) {
      props.metaProps = [...props.metaProps, ...searchMetaTags];
    }

    return props;
  }
}
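
As written, getMetaTags pushes SEARCH_FIELDS.NAME once in the standard block and again in the matching template handler, so a detail page ends up with two name tags. One possible way to collapse duplicates after aggregation, sketched with a hypothetical dedupeMetaTags helper in which the last non-empty value for a given name wins:

```typescript
interface MetaTag {
  name: string;
  content: string;
}

// Hypothetical helper: keep one tag per name. A later tag replaces an
// earlier one unless the later tag is empty and the earlier one is not.
function dedupeMetaTags(tags: MetaTag[]): MetaTag[] {
  const byName = new Map<string, MetaTag>();
  for (const tag of tags) {
    const existing = byName.get(tag.name);
    if (!existing || tag.content !== "" || existing.content === "") {
      byName.set(tag.name, tag);
    }
  }
  // A Map keeps each key at the position where it was first inserted.
  return Array.from(byName.values());
}

const merged = dedupeMetaTags([
  { name: "name", content: "Products" },
  { name: "language", content: "en" },
  { name: "name", content: "Trail Running Shoe" },
]);
console.log(merged);
```

Whether the template-specific value should win is a project decision. The sketch only shows where such a rule could live.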

Layout Integration

const Layout = ({ layoutData, headLinks, metaProps, siteSettings }) => {
  const canonicalURL = // ... canonical URL logic
  const baseURL = canonicalURL ? new URL(canonicalURL).origin : '';

  return (
    <div>
      {/* Structured data schemas */}
      <Schemas
        layoutData={layoutData}
        siteSettings={siteSettings}
        baseURL={baseURL}
      />

      {/* Meta tags rendering */}
      <PageMetadata metaProps={metaProps} />

      <Head>
        <title>{fields?.browserTitle?.value?.toString() || siteName}</title>
        {/* Additional head elements */}
      </Head>

      {/* Page content */}
    </div>
  );
};

Best Practices

1. Field Standardization

Ensure consistent field naming across all Sitecore templates:

export interface SearchBasePageFields {
  NavigationTitle: Field<string>;
  browserTitle: Field<string>;
  searchable: Field<boolean>;
  breadcrumbsTitle: Field<string>;
  keywords: Field<string>;
  metaDescription: Field<string>;
  socialThumbnail: ImageField;
}

2. Fallback Values

Always provide fallback values to prevent empty metadata:

metaTags.push({
  name: SEARCH_FIELDS.BROWSER_TITLE,
  content:
    pageModel?.browserTitle?.value ||
    pageModel?.NavigationTitle?.value ||
    "Default Site Title",
});
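
A chain of || operators treats a whitespace-only field as a real value, because a string of spaces is truthy. A small sketch of a stricter fallback, using a hypothetical firstNonEmpty helper:

```typescript
// Hypothetical helper: return the first candidate that is non-empty after
// trimming, or the supplied default when every candidate is blank.
function firstNonEmpty(
  candidates: Array<string | null | undefined>,
  fallback: string
): string {
  for (const candidate of candidates) {
    const trimmed = candidate?.trim();
    if (trimmed) return trimmed;
  }
  return fallback;
}

console.log(firstNonEmpty([undefined, "   ", "Navigation Title"], "Default Site Title"));
// Navigation Title
```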

3. Data Sanitization

Clean and validate data before including in metadata:

const keywords = pageModel?.keywords?.value
  ?.split(",")
  ?.map((keyword) => keyword.trim())
  ?.filter((keyword) => keyword.length > 0);

metaTags.push({
  name: SEARCH_FIELDS.KEYWORDS,
  content: keywords?.join("|") || "",
});
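
The same cleanup can be packaged as a reusable function. A sketch that also drops case-insensitive duplicates, which is an extra step not present in the snippet above. normalizeKeywords is a hypothetical name:

```typescript
// Hypothetical helper: split a comma-separated keyword field, trim each
// entry, drop blanks and case-insensitive duplicates, and join with "|".
function normalizeKeywords(raw: string | undefined): string {
  if (!raw) return "";
  const seen = new Set<string>();
  const result: string[] = [];
  for (const part of raw.split(",")) {
    const keyword = part.trim();
    const key = keyword.toLowerCase();
    if (keyword.length === 0 || seen.has(key)) continue;
    seen.add(key);
    result.push(keyword);
  }
  return result.join("|");
}

console.log(normalizeKeywords(" sitecore, SEO ,, seo , headless "));
// sitecore|SEO|headless
```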

4. Performance Optimization

Cache computed metadata when possible:

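// Cache expensive operations per service instance. React's useMemo only
// works inside function components, so a class caches in a private field.
private cachedTemplateId?: string;

private get templateId(): string {
  this.cachedTemplateId ??= toGuid(this.layoutData.sitecore.route?.templateId);
  return this.cachedTemplateId;
}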
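
When the same layout data object is processed more than once, the computed tags can also be cached per object. A sketch using a WeakMap keyed by the layout data, so an entry can be garbage-collected together with its key. getCachedMetaTags and the compute callback are hypothetical:

```typescript
interface MetaTag {
  name: string;
  content: string;
}

// Cache keyed by object identity; an entry does not keep its key alive.
const metaTagCache = new WeakMap<object, MetaTag[]>();

// Hypothetical wrapper: run `compute` once per layout data object.
function getCachedMetaTags<T extends object>(
  layoutData: T,
  compute: (data: T) => MetaTag[]
): MetaTag[] {
  const cached = metaTagCache.get(layoutData);
  if (cached) return cached;
  const tags = compute(layoutData);
  metaTagCache.set(layoutData, tags);
  return tags;
}

let computeCalls = 0;
const layoutData = { templateName: "Article Detail Page" };
const compute = (data: { templateName: string }): MetaTag[] => {
  computeCalls += 1;
  return [{ name: "template", content: data.templateName }];
};

getCachedMetaTags(layoutData, compute);
getCachedMetaTags(layoutData, compute);
console.log(computeCalls); // 1
```

The cached array is returned by reference, so callers should treat it as read-only.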

5. SEO-Friendly URLs

Ensure your metadata includes proper canonical URLs:

// Include canonical URL in structured data
const canonicalUrl = `${baseURL}${route.url}`;

return {
  "@context": "https://schema.org/",
  "@type": "Article",
  url: canonicalUrl,
  mainEntityOfPage: {
    "@type": "WebPage",
    "@id": canonicalUrl,
  },
  // ... other properties
};
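
A canonical URL is only useful if every variant of a page resolves to the same string. A sketch of one possible normalization, assuming the site policy is to drop query strings, fragments, and trailing slashes. buildCanonicalUrl is a hypothetical helper and the policy itself is an assumption:

```typescript
// Hypothetical helper: build an absolute canonical URL from a site origin
// and an item path, dropping query string, fragment, and trailing slash.
function buildCanonicalUrl(baseURL: string, path: string): string {
  const url = new URL(path, baseURL);
  url.search = "";
  url.hash = "";
  // Keep the root "/" intact; strip a trailing slash everywhere else.
  if (url.pathname.length > 1 && url.pathname.endsWith("/")) {
    url.pathname = url.pathname.slice(0, -1);
  }
  return url.toString();
}

console.log(buildCanonicalUrl("https://www.example.com", "/articles/seo-guide/?utm_source=mail#top"));
// https://www.example.com/articles/seo-guide
```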

Implementation Examples

Complete Product Page Setup

// 1. Define your product template fields
interface ProductPageFields extends SearchBasePageFields {
  productName: Field<string>;
  productDescription: Field<string>;
  productPrice: Field<number>;
  productCategory: Tag[];
  productImage: ImageField;
}

// 2. Implement metadata extraction
private getProductMetadata(pageModel: ProductPageFields): MetaTag[] {
  const metaTags: MetaTag[] = [];

  // Product-specific fields
  metaTags.push({
    name: 'product_name',
    content: pageModel.productName?.value || '',
  });

  metaTags.push({
    name: 'product_price',
    content: pageModel.productPrice?.value?.toString() || '',
  });

  // Category handling
  if (pageModel.productCategory?.length) {
    metaTags.push({
      name: 'product_categories',
      content: pageModel.productCategory
        .map(cat => cat.fields.Title.value)
        .join('|'),
    });
  }

  return metaTags;
}

// 3. Create structured data
const productSchema = {
  '@context': 'https://schema.org/',
  '@type': 'Product',
  // Optional chaining keeps a missing field from throwing at render time
  name: pageModel.productName?.value || '',
  description: pageModel.productDescription?.value || '',
  image: pageModel.productImage?.value?.src || '',
  offers: {
    '@type': 'Offer',
    price: pageModel.productPrice?.value,
    priceCurrency: 'USD',
  },
};
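
A Product object with an empty name or an undefined price may be flagged by structured data validators. A sketch of a builder that skips the schema entirely when the name is missing and only adds an Offer when a usable price exists. ProductInput and buildProductSchema are hypothetical, and the USD currency is carried over from the example above:

```typescript
interface ProductInput {
  name?: string;
  description?: string;
  imageSrc?: string;
  price?: number;
}

// Hypothetical builder: returns null when the product has no name, and
// only adds an Offer when a finite, non-negative price is available.
function buildProductSchema(input: ProductInput): Record<string, unknown> | null {
  const name = input.name?.trim();
  if (!name) return null;

  const schema: Record<string, unknown> = {
    "@context": "https://schema.org/",
    "@type": "Product",
    name,
  };
  if (input.description) schema.description = input.description;
  if (input.imageSrc) schema.image = input.imageSrc;
  if (typeof input.price === "number" && Number.isFinite(input.price) && input.price >= 0) {
    schema.offers = {
      "@type": "Offer",
      price: input.price.toFixed(2), // fixed two decimals, dot separator
      priceCurrency: "USD", // currency assumed, as in the example above
    };
  }
  return schema;
}

console.log(buildProductSchema({ name: "Trail Running Shoe", price: 19.9 }));
```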

Event Page Metadata

// Event-specific metadata with rich temporal data
private getEventMetadata(eventData: EventPageFields): MetaTag[] {
  const metaTags: MetaTag[] = [];

  // Event timing
  if (eventData.eventStartDate?.value) {
    const startDate = new Date(eventData.eventStartDate.value);

    metaTags.push({
      name: 'event_start_date',
      content: startDate.toISOString(),
    });

    metaTags.push({
      name: 'event_month',
      content: startDate.toLocaleDateString('en-US', { month: 'long' }),
    });

    metaTags.push({
      name: 'event_year',
      content: startDate.getFullYear().toString(),
    });
  }

  // Location data
  if (eventData.eventVenue?.fields) {
    metaTags.push({
      name: 'event_venue',
      content: eventData.eventVenue.fields.Name.value,
    });

    metaTags.push({
      name: 'event_city',
      content: eventData.eventVenue.fields.City.value,
    });
  }

  return metaTags;
}
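
The createEventSchema function referenced earlier is not shown in this guide. A minimal sketch of a JSON-LD counterpart to these meta tags, assuming the start date is an ISO 8601 string. EventInput and buildEventSchema are hypothetical names, and the sample values are invented:

```typescript
interface EventInput {
  name: string;
  startDate: string; // assumed to be an ISO 8601 string
  venueName?: string;
  city?: string;
}

// Hypothetical builder for a schema.org Event object. Returns null when
// the name is missing or the start date cannot be parsed.
function buildEventSchema(input: EventInput): Record<string, unknown> | null {
  const start = new Date(input.startDate);
  if (!input.name || Number.isNaN(start.getTime())) return null;

  const schema: Record<string, unknown> = {
    "@context": "https://schema.org/",
    "@type": "Event",
    name: input.name,
    startDate: start.toISOString(),
  };

  // Only describe a location when a venue name is available.
  if (input.venueName) {
    schema.location = {
      "@type": "Place",
      name: input.venueName,
      ...(input.city
        ? { address: { "@type": "PostalAddress", addressLocality: input.city } }
        : {}),
    };
  }
  return schema;
}

const conference = buildEventSchema({
  name: "Headless Summit",
  startDate: "2024-09-12T09:00:00Z",
  venueName: "Main Hall",
  city: "Lima",
});
console.log(JSON.stringify(conference, null, 2));
```

Checking the parsed date before calling toISOString matters because that method throws on an invalid date.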

Conclusion

Implementing a robust metadata system for Sitecore headless applications requires careful planning and systematic execution. By following the patterns outlined in this guide, you'll create a foundation that:

  1. Maximizes search visibility through comprehensive metadata coverage
  2. Supports rich search results via structured data implementation
  3. Maintains consistency across different content types
  4. Scales efficiently as your content grows
  5. Integrates seamlessly with modern frontend frameworks

The key to success lies in establishing clear standards, implementing fallback mechanisms, and regularly auditing your metadata output to ensure search engines can effectively crawl and understand your content.

Metadata is not a "set it and forget it" implementation. It needs ongoing maintenance as search engine algorithms evolve and your content strategy develops. Regularly monitoring search performance and crawler behavior will show where the metadata can be improved.
