Introduction
Over the last year, I have been participating in dev.to challenges. Publishing challenge submissions on dev.to created an interesting technical problem, though: I wanted a way to seamlessly combine that third-party content with the existing markdown-based posts in my Astro blog's Content Collections.
This post walks through how I solved the challenge of integrating external API data with Astro Content Collections. You'll learn how to fetch posts from dev.to's API and combine them with your local content, all while maintaining Astro's static generation benefits. Whether you're working with dev.to posts or any other third-party data source, this guide will show you how to integrate external content with your Astro Content Collections without requiring a database or traditional CMS.
Prerequisites
This article is written for an intermediate/advanced audience. For this article to make the most sense, readers should have some background knowledge of:
- TypeScript
- Astro basics
- API fundamentals
- Rich Search Results
- HTML/CSS
Fetching dev.to Posts via API
The first thing to understand is how to fetch dev.to posts programmatically. Without this capability, the whole exercise would become much more difficult. Specifically, there are two endpoints which need to be used:

- `GET https://dev.to/api/articles?username={username}`, which lists a user's published articles.
- `GET https://dev.to/api/articles/{id}`, which retrieves a single article, including its rendered HTML.

By combining these two endpoints I can retrieve all the published articles for my username, and then retrieve the content of those articles. To help with this, I made a utility script called `src/utils/devto.ts`. Here's a basic view of the code:
```ts
// ... Imports and some type definitions here.

// First, I get the list of published articles. I don't want to display articles that are already
// published on my site, so I filter out the dev articles where the canonical URL starts with my site's URL.
const devArticles = (await (await fetch(`https://dev.to/api/articles?username=logarithmicspirals`)).json())
  .filter((post: { canonical_url: string; }) => !post.canonical_url.startsWith(SITE));

const devPosts: DevPost[] = []; // Variable to store the processed dev posts.

for (let i = 0; i < devArticles.length; i++) {
  const post = devArticles[i];

  // Second, I retrieve the article data using the post ID.
  const article: DevArticle = await (await fetch(`https://dev.to/api/articles/${post.id}`)).json();

  const reshapedDevPost = {}; // Skipping this for now for simplicity.

  devPosts.push(reshapedDevPost); // Add the post to the collection.
}

// Export the collection.
export { devPosts as DEV_POSTS, type DevArticle, type DevPost };
```
Now you may be wondering what these `DevArticle` and `DevPost` types are. They are custom types I wrote based on the shape of the JSON returned from the API. The `DevArticle` type reflects the shape returned from the API, and the `DevPost` type reflects the shape I use to build the static cross-post pages. Here's what they look like:
```ts
type DevArticle = {
  id: number,
  title: string,
  description: string,
  published_at: string,
  cover_image: string | null,
  tags: string[],
  slug: string,
  canonical_url: string,
  reading_time_minutes: number,
  body_html: string
};

type DevPost = {
  id: string,
  collection: string,
  data: {
    title: string,
    description: string,
    pubDate: Date,
    heroImage: ImageMetadata,
    tags: string[],
    publish: boolean,
    heroImageAlt: string,
    updatedDate?: Date,
    videos: VideoData[],
    images: DevImage[]
  },
  canonicalUrl: string,
  readingTimeMinutes: number,
  slug: string,
  body: string,
  headings: MarkdownHeading[],
};
```
Processing dev.to Posts: Markdown vs HTML
Initially, I thought markdown would be the way to go with this. My intuition was that it would be easiest to render the markdown from dev.to using my existing markdown configuration, keeping everything in sync with minimal additional effort. However, I was thrown a curveball, and that curveball was the `{% embed %}` tags supported by dev.to. The embed tag is a Liquid tag, one of several dev.to supports; they document their support in their Editor Guide. Unfortunately, this syntax seems to be unique to dev.to, and writing a custom plugin to parse it proved to be quite time-consuming.
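For example, embedding a YouTube video looks like this in a dev.to post's markdown source (the URL here is purely illustrative):

```text
{% embed https://www.youtube.com/watch?v=dQw4w9WgXcQ %}
```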
Thankfully, the API response has a `body_html` field which contains the rendered HTML. The advantage of using this field is that I don't have to write any custom parsers for the Liquid tags. A disadvantage is that I have no control over the class names, IDs, or structure of the rendered HTML. Ultimately, I chose to go with the prerendered HTML returned by the API, accepting the risk that my code might break in the future if dev.to changes how it renders markdown.
After deciding on HTML, I had to make some updates to `src/utils/devto.ts`. Inside the for-loop from earlier, I added the following code:
```ts
// All my posts have a "hero" (a.k.a. cover) image.
const heroImage = await getHeroImage(article);

// Need to know headings for the table-of-contents. Videos and images are going to be used to generate JSON-LD.
const { headings, updatedHtml, videos, images } = await processHtml(article.body_html);

const reshapedDevPost: DevPost = {
  id: `${article.slug}-${article.id}`,
  data: {
    title: article.title,
    description: article.description,
    pubDate: new Date(article.published_at),
    heroImage: heroImage,
    tags: article.tags,
    publish: true,
    heroImageAlt: article.title,
    videos,
    images
  },
  canonicalUrl: post.canonical_url,
  readingTimeMinutes: post.reading_time_minutes,
  slug: article.slug,
  collection: "dev-to",
  body: updatedHtml,
  headings
};
```
There are a few things to take note of:

1. The hero image is extracted from the article object with the `getHeroImage` function.
2. Headings, modified HTML, videos, and images are retrieved from the `processHtml` function.
3. A new object, `reshapedDevPost`, is created using information from steps 1 and 2 as well as information from the original article object.
Retrieving the Hero Image
The `getHeroImage` function serves an important role: I need to take the cover image from the dev.to article and use it as the hero image for the post. The dynamic rendering of blog post pages requires a specific shape for the image object:
```ts
// Note, this is for example purposes. The actual shape in practice is ImageMetadata from Astro.
type HeroImage = {
  src: string,
  width: number,
  height: number,
  format: 'jpeg' | 'jpg' | 'png' | 'webp' | 'gif' | 'tiff' | 'avif' | 'svg'
}
```
Since the Forem API only returns the src for the image, I have to do some additional processing:
```ts
const getHeroImage = async (article: DevArticle) => {
  let heroImage: ImageMetadata | undefined = undefined;

  if (article.cover_image) {
    try {
      const response = await fetch(article.cover_image);

      if (!response.ok) {
        throw new Error(`Failed to fetch image: ${response.statusText}`);
      }

      const arrayBuffer = await response.arrayBuffer();
      const imageBuffer = Buffer.from(arrayBuffer);
      const metadata = await sharp(imageBuffer).metadata();

      if (!metadata.width || !metadata.height || !metadata.format) {
        throw new Error('Incomplete image metadata.');
      }

      const format = formatMap[metadata.format];

      if (!format) {
        throw new Error(`Unsupported image format: ${metadata.format}`);
      }

      heroImage = {
        src: article.cover_image,
        width: metadata.width,
        height: metadata.height,
        format: format,
      };
    } catch (error) {
      console.error(`Error processing image for article ${article.title}:`, error);
    }
  }

  if (!heroImage) {
    throw new Error(`There is no hero image for article ${article.title}`);
  }

  return heroImage;
};
```
The above code:

1. Downloads the image from the `src` attribute.
2. Converts the image to a buffer so that metadata can be extracted using the `sharp` library.
3. Retrieves the format from the format map.

The format map is a map I wrote to convert the format reported by sharp into one which matches the `ImageMetadata` type. Here's what that looks like:
```ts
const formatMap = {
  jpeg: 'jpeg',
  jpg: 'jpg',
  png: 'png',
  webp: 'webp',
  gif: 'gif',
  tiff: 'tiff',
  avif: 'avif',
  heif: 'avif', // heif/heic aren't in the ImageMetadata format union,
  heic: 'avif', // so they're approximated as avif.
  svg: 'svg',
};
```
With that out of the way, I can turn my attention towards processing the HTML.
Processing the HTML
To maximize compatibility with my existing code on the dynamic route page for blog posts, I have to extract some data from the HTML and also modify it. The steps are:
- Modify the HTML to ensure parity with the markdown rendering pipeline.
- Extract headings for the table of contents.
- Extract images and videos for JSON-LD generation.
The third step is an extra one. Before working on this project, my pages didn't have JSON-LD for images embedded in the articles, and I didn't have videos before the dev challenges. Previously, the pages only had JSON-LD for the hero images and breadcrumbs. I figured this would be a good time to build out the schema markup.
For the `processHtml` function, I created a new file called `src/utils/process-devto-html.ts`. To process the HTML, I had to install cheerio. Here's what the code for the function looks like:
```ts
export default async function processHtml(html: string) {
  const $ = cheerio.load(html);
  const headings: { depth: number, text: string, slug: string }[] = [];

  $('h1, h2, h3, h4, h5, h6').each((_, element) => {
    const heading = $(element);

    // Skip headings that come from an embedded GitHub README.
    if (heading.closest('div.ltag-github-readme-tag').length > 0) {
      return;
    }

    const tagName = heading[0].tagName;
    const depth = parseInt(tagName.slice(1), 10);
    const text = heading.text().trim();
    let slug = heading.attr('id');

    if (!slug) {
      slug = slugify(text, { lower: true, strict: true });
      heading.attr('id', slug);
    }

    const anchor = $(heading).find('a');
    anchor.addClass(autolinkOptions.properties.class);

    const anchorIcon = $('<i></i>');
    anchorIcon.addClass(autolinkOptions.content.properties.class);
    anchorIcon.append(linkIcon.html[0]);
    anchor.append(anchorIcon);

    // Detach the anchor and re-append it so it sits at the end of the heading.
    anchor.remove();
    heading.append(anchor);

    headings.push({
      depth,
      text,
      slug,
    });
  });

  $('a').each((_, element) => {
    const link = $(element);

    if (link.closest('div.ltag-github-readme-tag').length > 0) {
      return;
    }

    const href = link.attr('href');
    const classes = link.attr('class');

    if (href && externalLinkTest(href) && !classes?.includes("article-body-image-wrapper")) {
      link.attr('target', externalLinksOptions.target);
      link.attr('rel', externalLinksOptions.rel);

      const externalIcon = $('<i></i>');
      externalIcon.addClass(externalLinksOptions.content.properties.class);
      externalIcon.append(arrowUpRightFromSquareIcon.html[0]);
      link.append(externalIcon);
    }
  });

  const youtubeUrls: string[] = [];

  $('iframe').each((_, element) => {
    const src = $(element).attr('src');

    if (src && src.startsWith('https://www.youtube.com')) {
      youtubeUrls.push(src);
    }
  });

  const videos: VideoData[] = [];

  for (let i = 0; i < youtubeUrls.length; i++) {
    const data = await getVideoData(youtubeUrls[i]);
    videos.push(data);
  }

  const images: DevImage[] = [];

  $('img').each((_, element) => {
    const src = $(element).attr('src');

    if (src) {
      images.push({ src });
    }
  });

  return {
    updatedHtml: $.html(),
    headings,
    videos,
    images
  };
}
```
What does this function do? Here are the steps:

1. Loop over the headings embedded in the page, but ignore headings embedded in a GitHub README file.
   - Find the heading anchor.
   - Add an anchor icon.
   - Change the position of the anchor from the beginning of the heading to the end.
   - Add the heading to the headings collection.
   - Note the class names coming from `autolinkOptions`. This is from the markdown configuration.
2. Loop over the anchor tags.
   - Ignore anchor tags which are in a README or wrapping an image.
   - Append an icon indicating the link is external.
3. Loop over the iframes and collect all the YouTube video URLs.
4. Convert the video URLs to `VideoData` objects.
5. Loop over the images and create `DevImage` objects.
6. Return the results.
For this code to work, I also had to generate a YouTube API key. See Google's documentation, Calling the API, for more information on how to set this up.
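The key then has to be supplied at build time. A minimal sketch, assuming the key lives in a `.env` file at the project root, which is how `import.meta.env.YOUTUBE_API_KEY` gets resolved in the `getVideoData` code shown below:

```text
# .env (kept out of version control); hypothetical value
YOUTUBE_API_KEY=your-api-key-here
```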
For clarification, `externalLinkTest` is a simple function which checks whether a link goes somewhere outside my website. Here's what that function and the `DevImage` type look like:
```ts
export type DevImage = {
  src: string
};

function externalLinkTest(href: string) {
  return !href.includes(SITE) && !href.startsWith("#");
}
```
Now, what about the `VideoData` type? That's another custom type I made for holding data retrieved from the YouTube video data API. For the extraction, I created a function called `getVideoData` and put it inside a file called `src/utils/youtube.ts`. Here's what that file looks like:
```ts
export type VideoData = {
  publishedAt: string,
  title: string,
  description: string,
  thumbnailUrls: string[],
  embedUrl: string,
  contentUrl: string
};

export const getVideoData = async (embedUrl: string): Promise<VideoData> => {
  const embedUrlParts = embedUrl.split("/");
  const videoId = embedUrlParts[embedUrlParts.length - 1];
  const key = import.meta.env.YOUTUBE_API_KEY;
  const requestUrl = `https://www.googleapis.com/youtube/v3/videos?part=id%2C+snippet&id=${videoId}&key=${key}`;
  const response = await fetch(requestUrl);
  const responseData = await response.json();
  const snippet = responseData.items[0].snippet;
  const thumbnails = snippet.thumbnails;
  const contentUrl = `https://www.youtube.com/watch?v=${videoId}`;

  return {
    publishedAt: snippet.publishedAt,
    title: snippet.title,
    description: snippet.description,
    thumbnailUrls: [
      thumbnails.default.url,
      thumbnails.medium.url,
      thumbnails.high.url,
      thumbnails.standard.url
    ],
    embedUrl,
    contentUrl
  };
};
```
As I have shown, the `VideoData` type is just a pared-down version of what the YouTube API returns. The function is relatively simple: I give it the embed URL extracted from the dev.to HTML and use it to fetch additional information like the title, description, and thumbnails. Why do I want this data? I'll use it later on to generate the JSON-LD for rich results in Google search.
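As a quick usage sketch (the embed URL and video ID here are purely illustrative):

```ts
// Hypothetical usage; the video ID is only an example.
const video = await getVideoData('https://www.youtube.com/embed/dQw4w9WgXcQ');
console.log(`${video.title}, published ${video.publishedAt}`);
```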
Integrating dev.to Posts with Astro's Dynamic Routing
After figuring out how to get the dev posts, the next step was to combine them with my site's local posts. The local posts are stored as markdown and rendered with Astro's collection API. As such, they have a type defined by the frontmatter. For simplicity, I defined the `DevPost` type to mimic the shape of the collection entry type. The specific type for my local posts is `CollectionEntry<'blog'>`.
To build my site, I store all the posts in a constant exported from a file called `src/content/tags-and-posts.ts`. For this project, I updated the code which builds the post collection. Here's what it looks like:
```ts
type Post = CollectionEntry<'blog'> | DevPost; // Define a unified type for posts.

const localPosts: CollectionEntry<'blog'>[] = await getCollection('blog', ({ data }) => {
  return data.publish; // Check the frontmatter if it's okay to publish the post.
});

const posts: Post[] = [...localPosts, ...DEV_POSTS]; // DEV_POSTS is exported from src/utils/devto.ts.

posts.sort((a, b) => {
  const dateA = a.data.pubDate.valueOf(); // Local posts use Zod to coerce strings in the frontmatter to dates.
  const dateB = b.data.pubDate.valueOf();
  return dateB - dateA;
});

// There's also some code around tags I'm skipping.
export {
  // ... tag exports here
  posts as POSTS,
  type Post
};
```
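With the unified `POSTS` export in place, the dynamic route can generate one page per post regardless of where it came from. The original `getStaticPaths` isn't shown in this post, but a minimal sketch might look like this (assuming both post shapes expose `slug`, which their types do):

```ts
// Hypothetical sketch; the actual getStaticPaths wasn't shown.
import { POSTS } from '../../content/tags-and-posts';

export async function getStaticPaths() {
  return POSTS.map((post) => ({
    params: { slug: post.slug },
    props: post, // Each page receives its post via Astro.props.
  }));
}
```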
With the updated collection, I was able to make some changes to my dynamic blog post pages. The next file I turned my attention to was `src/pages/blog/[...slug].astro`, where I had to change how I pull data from the post collection. For example, I previously pulled data like this:
```ts
const post: CollectionEntry<"blog"> = Astro.props;
const { Content, headings, remarkPluginFrontmatter } = await post.render();
```
However, I now had to account for the new `Post` type, which meant adding some additional logic:
```ts
const post: Post = Astro.props;

let Content: AstroComponentFactory | undefined = undefined;
let headings: MarkdownHeading[] = [];
let minutesRead: string;
let canonicalUrl: string | undefined;
let images: string[] | undefined;

if ("render" in post) {
  const renderResult = await post.render();

  Content = renderResult.Content;
  headings = renderResult.headings;
  minutesRead = renderResult.remarkPluginFrontmatter.minutesRead;
  images = renderResult.remarkPluginFrontmatter.images;
} else {
  minutesRead = `${post.readingTimeMinutes} min read`;
  headings = post.headings;
  canonicalUrl = post.canonicalUrl;
}
```
In the updated code, I treat content rendering differently depending on whether or not the `render` method is available. Since the render method only exists on local posts, I may or may not have the `<Content />` component. So the markup outside the code fence looks like this:
```astro
{Content ? <Content /> : <div set:html={post.body} />}
```
As for `renderResult.remarkPluginFrontmatter.images`, that comes from a custom remark plugin, `src/utils/remark-extract-images.ts`, which exports a function called `remarkExtractImages()`:
```ts
// Imports shown here for completeness (assumed; they weren't in the original snippet).
import { visit } from 'unist-util-visit';
import type { Node } from 'unist';
import type { Image } from 'mdast';

export function remarkExtractImages() {
  return function (tree: Node, { data }: any) {
    const images: string[] = [];

    // Collect the URL of every image node in the markdown AST.
    visit(tree, 'image', (node: Image) => {
      if (node.url) {
        images.push(node.url);
      }
    });

    // Expose the collected URLs to pages via remarkPluginFrontmatter.
    if ('astro' in data) {
      data.astro.frontmatter.images = images;
    }
  }
}
```
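The plugin only takes effect once it's registered in the Astro config. That wiring isn't shown in this post, but it would look something like this sketch of a standard `astro.config.mjs`:

```js
// astro.config.mjs; hypothetical wiring for the plugin.
import { defineConfig } from 'astro/config';
import { remarkExtractImages } from './src/utils/remark-extract-images';

export default defineConfig({
  markdown: {
    remarkPlugins: [remarkExtractImages],
  },
});
```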
Just like how I had to process the HTML to maintain parity with the markdown, I had to create this plugin to have the markdown maintain parity with the HTML. Having access to the images is important for generating JSON-LD. If I ever start publishing videos in my markdown-based posts, I'll have to do the same thing for videos.
Styling dev.to Embeds on My Blog
As far as styling goes, I'll refrain from posting all of the CSS here. To help others working on a similar project, keep an eye out for the following:

- README files from GitHub embeds are wrapped in a tag with the class name `.ltag-github-readme-tag`.
- Heading elements already have anchor tags embedded in them.
- Video embeds are rendered as iframes.
I used ChatGPT to help expedite the CSS writing process.
SEO Optimization
Canonical URL
I also had to account for the canonical URL from the dev.to posts. Previously, the canonical URL would always default to the slug and site domain. However, I had to update the code to check if a canonical URL is available:
```ts
const { title, description, image = '/profile-pic.png', canonicalUrl } = Astro.props;

// Default to the page's URL on my own domain. (This declaration wasn't in the
// original snippet; the exact default is an assumption based on the description above.)
let canonicalUrlToUse = new URL(Astro.url.pathname, Astro.site);

if (canonicalUrl) {
  canonicalUrlToUse = new URL(canonicalUrl);
}
```
Then I could display it like this:
```astro
<link rel="canonical" href={canonicalUrlToUse} />
```
The canonical URL is important for SEO because it tells search engines that the page is a copy of another page, in this case one hosted on a different domain, and that the original should receive the ranking credit.
JSON-LD
In a previous article, Boost Your Blog's SEO with JSON-LD: How I Added Rich Results Using Structured Data, I talked about how I added JSON-LD for rich results to my blog. Building off of that, I enhanced my posts to have JSON-LD for both images and videos embedded in the articles. Previously, the articles only had JSON-LD for the hero images and the breadcrumbs.
To add the images and videos to the page schema, all I had to do was set the `image` and `video` attributes:
```ts
const getBlogPostingSchema = (post: Post, url: string, images?: string[]): WithContext<BlogPosting> => {
  const { title, description, tags, pubDate, updatedDate, heroImage } = post.data;
  const schemaImages = [getSimpleImageObjectSchema(heroImage.src)];

  const schema: WithContext<BlogPosting> = {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    "headline": title,
    "description": description,
    "keywords": tags,
    "author": defaultCreator,
    "datePublished": pubDate.toISOString(),
    ...(updatedDate && { "dateModified": updatedDate.toISOString() }),
    "inLanguage": "en-US",
    "url": url
  }

  // CollectionEntry<'blog'> doesn't have videos in the frontmatter right now.
  if ("videos" in post.data) {
    schema.video = post.data.videos.map(getVideoObjectSchema);
  }

  if ("images" in post.data) {
    // Only posts from dev.to have an images attribute under the data attribute.
    post.data.images.map(image => getSimpleImageObjectSchema(image.src)).forEach(image => schemaImages.push(image));
  } else if (images) {
    // CollectionEntry<'blog'> doesn't have images in the frontmatter. The images come from ./remark-extract-images.ts
    images.map(image => getSimpleImageObjectSchema(image)).forEach(image => schemaImages.push(image));
  }

  schema.image = schemaImages;

  return schema;
};

// Build a basic ImageObject schema using an image URL.
const getSimpleImageObjectSchema = (url: string): WithContext<ImageObject> => {
  return {
    "@context": "https://schema.org",
    "@type": "ImageObject",
    "contentUrl": url,
    "creator": defaultCreator,
    "creditText": defaultCreatorName,
    "copyrightNotice": defaultCreatorName
  };
};

// Build a VideoObject schema from a VideoData object. This is why the YouTube API response had to be reshaped.
const getVideoObjectSchema = (data: VideoData): WithContext<VideoObject> => {
  return {
    "@context": "https://schema.org",
    "@type": "VideoObject",
    name: data.title,
    description: data.description,
    thumbnailUrl: data.thumbnailUrls,
    uploadDate: data.publishedAt,
    contentUrl: data.contentUrl,
    embedUrl: data.embedUrl
  };
};
```
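This post doesn't show how the finished schema gets onto the page, but the typical Astro pattern is to serialize it into a JSON-LD script tag in the page head. A sketch, assuming the helpers above and the `post`, `url`, and `images` values from the dynamic route:

```astro
<script type="application/ld+json" set:html={JSON.stringify(getBlogPostingSchema(post, url, images))} />
```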
Future Plans
With Astro version 5 already in beta, I will most likely have to make significant changes to this code in the near future. In version 5, Astro will have the Content Layer API instead of content collections (which is what I'm currently using). Astro has an article titled Content Layer: Deep Dive which goes into detail about the new design.
From what I can see, the big change is that I'll be able to create a unified type for posts more easily. Since the API entries will be loaded through the Content Layer, I'll be able to enforce a common schema across local and remote posts.
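As a rough, untested sketch of what that might look like, an inline loader could replace much of `src/utils/devto.ts` (the entry shape here is an assumption based on the current code and the deep-dive article):

```ts
import { defineCollection, z } from 'astro:content';

// Hypothetical Astro 5 collection backed by an inline Content Layer loader.
const devTo = defineCollection({
  loader: async () => {
    const response = await fetch('https://dev.to/api/articles?username=logarithmicspirals');
    const articles: { id: number; slug: string; title: string }[] = await response.json();

    // Each entry needs a unique id; the remaining fields become the entry data.
    return articles.map((article) => ({
      id: `${article.slug}-${article.id}`,
      title: article.title,
    }));
  },
  schema: z.object({
    title: z.string(),
    // ... the rest of the shared post schema.
  }),
});
```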
Other than that, this project has helped me realize I can use a similar technique to publish posts to other sites. At the moment, I use RSS to import my posts into accounts on other platforms. However, I think I should be able to use REST APIs to publish posts to sites like dev.to and Hashnode, which should make the cross-posting process easier.
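For dev.to, that would mean calling the Forem API's article-creation endpoint. A rough, untested sketch, where the field values and the `DEVTO_API_KEY` variable are assumptions:

```ts
// Hypothetical publish call against the Forem (dev.to) API.
const response = await fetch('https://dev.to/api/articles', {
  method: 'POST',
  headers: {
    'api-key': import.meta.env.DEVTO_API_KEY, // hypothetical env var
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    article: {
      title: 'My post title',
      body_markdown: '# Hello from my blog',
      published: false, // create as a draft first
      canonical_url: 'https://example.com/blog/my-post',
    },
  }),
});
```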
Conclusion
Integrating my dev.to posts into my Astro blog has been a rewarding challenge, allowing me to keep my content in sync across platforms while still enjoying the benefits of a static site. By tapping into the dev.to API, I was able to fetch articles, process them for display, and seamlessly merge them with my local markdown posts, all without needing a traditional CMS. Choosing to use the `body_html` field made rendering straightforward, and adding steps for extracting metadata like headings and multimedia not only improved the content presentation but also boosted SEO through structured data. This project has shown me the potential for further automation, such as using APIs for cross-posting to other platforms, and I'm excited to see how Astro's upcoming Content Layer will make these integrations even more powerful.