Henrique Ramos

Building Veloren SA: An Astro blog fed by a Google Cloud Translation Pipeline

As the owner of a long-standing Veloren community server in South America, I had a cool idea: create a front page to keep non-English-speaking players updated on the latest devlogs. The community was missing out on news from the official Veloren team, since devlogs are published only in English. My goal was to get them translated into Spanish and Portuguese. Devlogs, however, are huge! If only I could automate this...

Turns out, I could!

Solution

The pipeline is an event-driven system built on Google Cloud services. It works in three steps, each sketched with illustrative code after the list:

  1. Ingestion: A Cloud Function runs weekly via Cloud Scheduler. It checks the RSS feed for new devlogs and saves their HTML content into a Cloud Storage bucket. To avoid reprocessing old posts, the service first downloads a file containing a list of previously processed URLs, then filters out any posts that have been seen before, ensuring only new content enters the pipeline.
  2. Translation: A second Cloud Function is automatically triggered whenever a new file is saved to the ingestion bucket. It starts a batch Cloud Translation job to convert the HTML files into Spanish and Portuguese, then saves the translated files into a second Cloud Storage bucket. To prevent concurrency errors that can occur when multiple files are uploaded at once, the function uses a unique timestamp for each job's output directory, ensuring all files are translated successfully.
  3. Publishing: A third Cloud Function watches the translation output bucket. When new translated files appear, it downloads them and extracts each post's metadata from the HTML, including the title, publication date, and URL. It then combines the metadata and the raw HTML content into a single JSON file, which is saved to a third bucket. This JSON structure is perfectly suited to an Astro content collection. Finally, it triggers Netlify's build hook, getting the new posts live on the website.
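
Here is a minimal sketch of the ingestion step, assuming rss-parser for the feed. The bucket name, feed URL, and processed.json state file are illustrative, not the actual implementation:

import Parser from "rss-parser";
import { Storage } from "@google-cloud/storage";

const bucket = new Storage().bucket("veloren-devlogs-raw"); // illustrative bucket name

export const ingest = async () => {
  // Load the list of already-processed URLs (empty on the first run)
  const stateFile = bucket.file("processed.json");
  const [exists] = await stateFile.exists();
  const processed: string[] = exists
    ? JSON.parse((await stateFile.download())[0].toString())
    : [];

  // Fetch the RSS feed and keep only posts we haven't seen before
  const feed = await new Parser().parseURL("https://veloren.net/rss.xml"); // illustrative feed URL
  const fresh = feed.items.filter(({ link }) => link && !processed.includes(link));

  // Save each new devlog's HTML into the ingestion bucket
  for (const { link } of fresh) {
    const html = await (await fetch(link!)).text();
    await bucket.file(`${encodeURIComponent(link!)}.html`).save(html);
    processed.push(link!);
  }

  // Persist the updated list so the next run skips these posts
  await stateFile.save(JSON.stringify(processed));
};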
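
The translation step boils down to one batch job against the v3 TranslationServiceClient; the project ID and bucket names below are placeholders:

import { TranslationServiceClient } from "@google-cloud/translate";

const client = new TranslationServiceClient();

// Triggered by Cloud Storage when a new HTML file lands in the ingestion bucket
export const translate = async (fileName: string) => {
  // A timestamped output prefix gives every job its own directory,
  // avoiding collisions when several files are uploaded at once
  const outputUriPrefix = `gs://veloren-devlogs-translated/${Date.now()}/`;

  const [operation] = await client.batchTranslateText({
    parent: client.locationPath("my-gcp-project", "us-central1"), // placeholder project ID
    sourceLanguageCode: "en",
    targetLanguageCodes: ["es", "pt"],
    inputConfigs: [
      {
        mimeType: "text/html", // lets the API preserve tags, images, and links
        gcsSource: { inputUri: `gs://veloren-devlogs-raw/${fileName}` },
      },
    ],
    outputConfig: { gcsDestination: { outputUriPrefix } },
  });

  // Batch translation is a long-running operation; wait for it to finish
  await operation.promise();
};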
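
And a sketch of the publishing step, using cheerio to pull metadata out of the translated HTML. The selectors, output bucket, and environment variable are assumptions for illustration:

import { Storage } from "@google-cloud/storage";
import * as cheerio from "cheerio";

const storage = new Storage();

// Triggered by Cloud Storage when a translated file appears
export const publish = async (bucketName: string, fileName: string) => {
  const [html] = await storage.bucket(bucketName).file(fileName).download();
  const $ = cheerio.load(html.toString());

  // Combine the extracted metadata and the raw HTML into one JSON document
  const post = {
    slug: fileName.replace(/\.html$/, ""),
    title: $("h1").first().text(),
    date: $("time").first().attr("datetime"),
    source_url: $('link[rel="canonical"]').attr("href"),
    content: $("body").html(),
  };

  await storage
    .bucket("veloren-articles") // placeholder output bucket
    .file(`${post.slug}.json`)
    .save(JSON.stringify(post));

  // Ping Netlify's build hook so the site rebuilds with the new post
  await fetch(process.env.NETLIFY_BUILD_HOOK!, { method: "POST" });
};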

Service architecture diagram

Tools

  • Nx Monorepo with pnpm workspaces: This setup made managing functions so much easier. It put all code in one place, simplified testing, and gave a clean way to handle deployment with GitHub Actions. Adopting a monorepo meant that services could share dependencies and tooling, which reduced boilerplate.
  • Google Cloud Functions: Cloud Functions gave me a simple, event-based system that integrates seamlessly with Cloud Storage, which was key to the pipeline. This architecture also means we only pay for the resources used when a function is triggered, making it extremely cost-effective.
  • Google Cloud Translation API (Batch Mode): A perfect fit for this kind of work. Batch mode is designed to translate large volumes of documents asynchronously, which is far more efficient than processing files one at a time. Because the API handles HTML directly, we didn't have to write any complex parsing logic to preserve the original formatting, images, or links within the devlogs.
  • Netlify Build Hooks: This simple feature was the final piece of the puzzle. It gives a straightforward way to trigger a website rebuild from GCS, completing the automated loop (a one-line example follows this list).
  • Astro: The final content is converted to JSON files so Astro can consume them as a content collection. This is a powerful feature of Astro that simplifies building a static content-heavy website.
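
For reference, a build hook is just a URL that queues a new deploy when you POST to it; no body or auth header is required (the hook ID below is a placeholder):

// Fire-and-forget: POSTing to the hook URL queues a new Netlify build
await fetch("https://api.netlify.com/build_hooks/<hook-id>", { method: "POST" });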

Astro Integration

One of the coolest parts of this project was using Astro's Content Collections to pull in the remote JSON data. Instead of manually downloading and managing files, I configured Astro to do it all during the build process.

The src/content.config.ts file tells Astro where to find the content and what to do with it.

import { defineCollection, z } from "astro:content";
import { type FileMetadata } from "@google-cloud/storage";
import he from "he";

const blog = defineCollection({
  loader: async () => {
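    // List the article JSON files stored in the Cloud Storage bucket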
    const response = await fetch(import.meta.env.VELOREN_ARTICLES_URL);

    const data = await response.json();

    return Promise.all(
      (data.items ?? []).map(async ({ mediaLink, name }: FileMetadata) => {
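        // Download each listed article's JSON via its mediaLink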
        const item = await (await fetch(mediaLink as string)).json();

        return {
          id: name,
          ...item,
          cover: item.cover,
          summary: item.content.replace(/<[^>]*>/g, "").slice(0, 250),
          content: he.encode(item.content, { allowUnsafeSymbols: true }),
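          // The first folder in the object name encodes the language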
          language: name?.split("/")[0].toLowerCase(),
        };
      }),
    );
  },
  schema: z.object({
    slug: z.string(),
    title: z.string(),
    content: z.string(),
    date: z.string().transform((date) => new Date(date)),
    source_url: z.string().optional(),
    cover: z.string().url().optional(),
    summary: z.string().optional(),
    language: z.enum(["es", "pt-br"]),
  }),
});

export const collections = { blog };

In this code:

  1. The loader function fetches the list of all the JSON files from my Cloud Storage bucket.
  2. It then iterates over the list, fetching the content of each JSON file.
  3. A z.object from the Zod library acts as a schema to validate the incoming data, ensuring every post has a title, date, content, and the correct language.
  4. The cleaned-up and validated data is then returned as a collection of blog posts. Astro automatically handles the rest, creating static pages for each post during the build process.
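
Once the collection is defined, rendering it is standard Astro. For example, a per-language listing page might look like this (the route and link paths are illustrative, not taken from the actual site):

---
// src/pages/[lang]/index.astro (illustrative route)
import { getCollection } from "astro:content";

export async function getStaticPaths() {
  // One static page per supported language
  return [{ params: { lang: "es" } }, { params: { lang: "pt-br" } }];
}

const { lang } = Astro.params;

// Only posts in this page's language, newest first
const posts = (
  await getCollection("blog", ({ data }) => data.language === lang)
).sort((a, b) => b.data.date.getTime() - a.data.date.getTime());
---

<ul>
  {posts.map((post) => (
    <li>
      <a href={`/${lang}/${post.data.slug}`}>{post.data.title}</a>
    </li>
  ))}
</ul>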

This approach gives me the best of both worlds: a dynamic, automated content pipeline and a fast, static-first website.

Result

The final result can be found here.

The complete source code for my project is available on GitHub. You can explore the project structure, the TypeScript code for each function, and the GitHub Actions workflow that automates the deployment.

  • hnrq / veloren-translate: GCP infrastructure to translate TWIV posts to pt-BR and ES.
  • hnrq / veloren-website: The Veloren South America website, the Astro front end for the veloren-translate pipeline.

Also, feel free to join the server at any time at play.veloren.net.br.
