DEV Community

Ulises Gascón

Posted on • Originally published at blog.ulisesgascon.com

How does the Official Node.js News Feeder work?

Node.js has a new RSS feed that consolidates all the releases and news from the different teams, working groups, and projects inside the org.

The Challenge

Node.js as an organization always has a lot going on. Many projects, teams, and working groups are active at the same time, which makes it hard to keep track of everything that is happening, so the community has a recurring need for a better way to stay informed. This discussion has been going on for a while and has produced many ideas, but we decided that RSS is a good place to start, as it will also help promote our activities and achievements outside the Node.js org itself.

Requirements

  • The teams and working groups should be able to add their own news without having to change their way of working (no PRs, forms...)
  • The information should be available in a valid RSS feed.
  • The feed should be updated automatically, but allow for manual news additions and easy content curation.

Decisions made

  • Use GitHub as the source of truth for the news, so we will use the GitHub API to fetch the relevant information from issues, discussions, releases...
  • Use a GitHub Action to generate the RSS feed and publish it to GitHub Pages.
  • Use a GitHub Action to update the feed automatically every week or manually when needed, generating a new commit with the changes in a PR that will be reviewed and curated by the team.
  • Avoid external dependencies as much as possible, so the solution should be self-contained and easy to maintain.

The Solution

The full source code can be found in this repository. I will explain the most relevant parts of the solution here.

The Architecture

Figure: Architectural overview, described below.

In general terms, the solution is composed of the following parts:

Community

The community is the source of the news. Its members reply to the specific issues or discussions tied to the news feed, and they also manage the new releases.

Curators

The curators are the ones who review the changes and merge the PRs that update the feed. The feed is automatically updated every week, but it can also be updated manually when needed. Several scripts collect, process, validate, and publish the feed.

Readers

The readers are the ones who consume the feed. They can be humans or bots. Readers can subscribe to the feed using the following URL: https://nodejs.github.io/nodejs-news-feeder/feed.xml. We also provide a Slack channel where the feed is automatically published, so the community can stay aware of the latest news.

The Structure

Configuration

There is a config.json file that stores all the references to the external resources (discussions, issues, releases...), API rate limits, and the configuration of the last execution time (lastCheckTimestamp).

The last execution time (lastCheckTimestamp) prevents us from including already processed information in the feed. This spares us from relying on third-party software or reconciling the feed to avoid duplicates.

{
  "lastCheckTimestamp": 1688584036809,
  "reposPaginationLimit": 250,
  "releasePaginationLimit": 10,
  "commentsPaginationLimit": 100,
  "breakDelimiter": "</image>",
  "discussionsInScope": [],
  "issuesInScope": []
}
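As a minimal sketch of how lastCheckTimestamp can act as a cut-off, the snippet below keeps only the items published after the last run. The helper name isNewerThanLastCheck and the sample items are hypothetical and not taken from the repository; only the timestamp value comes from the sample config.json above.

```javascript
// Hypothetical helper (not from the repository): true when an item was
// published after the last recorded run.
function isNewerThanLastCheck (publishedAt, lastCheckTimestamp) {
  return new Date(publishedAt).getTime() > lastCheckTimestamp
}

// Value copied from the sample config.json above.
const config = { lastCheckTimestamp: 1688584036809 }

// Made-up items, only for illustration.
const items = [
  { title: 'older item', publishedAt: '2023-07-01T00:00:00Z' },
  { title: 'newer item', publishedAt: '2023-07-10T00:00:00Z' }
]

// Only items newer than the cut-off survive, so nothing is processed twice.
const fresh = items.filter(item =>
  isNewerThanLastCheck(item.publishedAt, config.lastCheckTimestamp)
)

console.log(fresh.map(item => item.title))
```

Because the comparison is strict, an item published exactly at the stored timestamp is treated as already processed.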

Modularity

The solution is divided into different scripts that do different things, which allows us to reuse the code and make it easier to maintain.

The structure becomes clear when looking at the package.json.

{
    "scripts": {
        "collect:releases": "node scripts/collect-releases.js",
        "collect:issues": "node scripts/collect-issues.js",
        "collect:discussions": "node scripts/collect-discussions.js",
        "rss:validate": "node scripts/validate.js",
        "rss:build": "node scripts/build.js",
        "rss:format": "node scripts/format.js",
        "rss:format-check": "node scripts/format-check.js"
    }
}

Fetching content from GitHub

Releases

Node.js uses GitHub Releases to publish new versions of different projects. There are many projects in the organization, and we keep adding more on a regular basis.

So, this script will do the following:

  1. Fetch all the repositories in the organization.
  2. Fetch the latest releases for each repository.
  3. Filter the releases by the ones that are newer than the last execution time (lastCheckTimestamp).
  4. Format the releases to be included in the feed.
  5. Add the releases to the feed.
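Steps 3 and 4 above could look roughly like the sketch below. It assumes release objects shaped like the GitHub REST API response (name, html_url, published_at); the function names and the exact set of RSS elements are illustrative assumptions, and the real script may format items differently.

```javascript
// Escape the characters that are not allowed verbatim in XML text nodes.
function escapeXml (text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;')
}

// Step 3: keep only the releases published after the last execution.
function selectNewReleases (releases, lastCheckTimestamp) {
  return releases.filter(
    release => new Date(release.published_at).getTime() > lastCheckTimestamp
  )
}

// Step 4: turn one release into an RSS <item> (hypothetical layout).
function releaseToRssItem ({ name, html_url: url, published_at: publishedAt }) {
  return [
    '<item>',
    `<title>${escapeXml(name)}</title>`,
    `<link>${escapeXml(url)}</link>`,
    `<guid>${escapeXml(url)}</guid>`,
    // toUTCString() produces the RFC 822 style date that RSS expects.
    `<pubDate>${new Date(publishedAt).toUTCString()}</pubDate>`,
    '</item>'
  ].join('')
}
```

Escaping the title and link matters here because release names are free text and may contain characters such as & or <, which would otherwise make the feed invalid.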

Issues

Each project publishes its news as comments on GitHub issues.

So, this script:

  1. Fetches all the comments in the issues that are in scope.
  2. Filters the comments by the ones that are newer than the last execution time (lastCheckTimestamp).
  3. Formats the comments to be included in the feed.
  4. Adds the comments to the feed.
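A rough sketch of the first two steps, assuming the GitHub REST endpoint for issue comments and its since and per_page query parameters. The function names and the output shape are hypothetical, and the repository's script may be organized differently.

```javascript
// Build the REST URL that lists the comments of one issue.
// `since` narrows the response to comments updated after the given time.
function buildCommentsUrl ({ owner, repo, issueNumber, perPage, since }) {
  const url = new URL(
    `https://api.github.com/repos/${owner}/${repo}/issues/${issueNumber}/comments`
  )
  url.searchParams.set('per_page', String(perPage))
  url.searchParams.set('since', new Date(since).toISOString())
  return url.toString()
}

// `since` matches on the update time, so we still filter on created_at
// to drop old comments that were merely edited after the last run.
function selectNewComments (comments, lastCheckTimestamp, team) {
  return comments
    .filter(comment => new Date(comment.created_at).getTime() > lastCheckTimestamp)
    .map(comment => ({
      body: comment.body,
      createdAt: comment.created_at,
      url: comment.html_url,
      team // hypothetical label, mirroring the `team` field used for discussions
    }))
}
```

The response of a request to the URL returned by buildCommentsUrl (sent with the token-based authorization header) would then be passed to selectNewComments.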

Discussions

Discussions are very similar to issues, but they are not supported by the GitHub REST API, so we used the GitHub GraphQL API to fetch the comments.

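// Excerpt: assumes a `graphql` client (for example the one exported by @octokit/graphql)
// and that discussionsInScope and lastCheckTimestamp were read from config.json.
const comments = await Promise.all(discussionsInScope.map(async ({ discussionId, team }) => {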
  const { repository } = await graphql(
    `
    {
      repository(name: "node", owner: "nodejs") {
        discussion(number: ${discussionId}) {
          comments(last: 100) {
            edges {
              node {
                body
                publishedAt
                updatedAt
                databaseId
              }
            }
          }
        }
      }
    }
    `,
    {
      headers: {
        authorization: `token ${process.env.GITHUB_TOKEN}`
      }
    }
  )

  return repository.discussion.comments.edges
    .filter(comment => new Date(comment.node.publishedAt).getTime() > lastCheckTimestamp)
    .map(comment => ({ ...comment.node, team, discussionId }))
}))

See the full file for more details

Updating the feed

To update the feed, we split the current feed content at the breakDelimiter defined in the config.json file and insert the new items right after it.

// ...OMITTED...
const feedContent = getFeedContent()
const [before, after] = feedContent.split(breakDelimiter)
const updatedFeedContent = `${before}${breakDelimiter}${relevantReleases}${after}`
overwriteFeedContent(updatedFeedContent)

See the full file for more details
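The same idea can be written as a small pure function. This is a sketch under the assumption that the delimiter should appear in the feed and that a missing delimiter is an error worth failing on; insertAfterDelimiter is a hypothetical name, not a helper from the repository.

```javascript
// Insert `newItems` right after the first occurrence of `delimiter`.
// Everything before and after that occurrence is left untouched.
function insertAfterDelimiter (feedContent, delimiter, newItems) {
  const index = feedContent.indexOf(delimiter)
  if (index === -1) {
    // Fail loudly instead of silently writing a broken feed.
    throw new Error(`Delimiter "${delimiter}" not found in the feed`)
  }
  const cut = index + delimiter.length
  return feedContent.slice(0, cut) + newItems + feedContent.slice(cut)
}

// Made-up, shortened feed for illustration.
const currentFeed = '<channel><image><url>x</url></image><item>old</item></channel>'
const updatedFeed = insertAfterDelimiter(currentFeed, '</image>', '<item>new</item>')

console.log(updatedFeed)
```

Using indexOf and slice instead of split keeps the rest of the feed intact even if the delimiter text happens to appear more than once.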

Formatting the feed

We use the library xml-formatter to normalize the feed content. This will help us curate the content later on when reviewing the PR.

import xmlFormat from 'xml-formatter'
import { getFeedContent, overwriteFeedContent } from '../utils/index.js'

const xml = getFeedContent()
const formattedXml = xmlFormat(xml, { indentation: '  ', collapseContent: true })
overwriteFeedContent(formattedXml)

See the full file for more details

Validating the feed

To validate the feed, we send it to the W3C Feed Validation Service through an HTTP request that simulates the form submission (using the got library), and then parse the HTML response.

  // Excerpt: assumes `got`, `JSDOM` (from jsdom) and the feed content `xml` are already in scope.
  const data = await got.post('https://validator.w3.org/feed/check.cgi', {
    form: {
      rawdata: xml,
      manual: 1
    }
  }).text()

  // Avoid importing CSS in the document
  const dom = new JSDOM(data.replace(/@import.*/gm, ''))

  const title = dom.window.document.querySelector('h2').textContent
  const recommendations = dom.window.document.querySelector('ul').textContent

  console.log(recommendations)

  if (title === 'Sorry') {
    console.log('🚨 Feed is invalid!')
    process.exit(1)
  } else {
    console.log('✅ Feed is valid!')
  }

Note: To scrape the HTML response with the jsdom library, we first strip the CSS @import statements from it.

The GitHub Action

Cron Job and Manual Trigger

The GitHub Action is configured to run every week (the cron expression below fires on Sundays at 00:00 UTC), but it can also be triggered manually through the workflow_dispatch event. This is useful when we want to update the feed by hand, for example to add a news item that is not available on GitHub or to promote some news quickly.

on:
    workflow_dispatch:
    schedule:
        - cron: '0 0 * * 0'
# ...OMITTED...

See the full file for more details

API Limits

The GitHub API enforces rate limits. This process makes many requests to the API, so we authenticate them with a GitHub token to stay within those limits.

This token can be created by a user and then added to the repository secrets. The GitHub Action will use this token to authenticate the requests to the API, and it will have a higher limit than the anonymous requests.

The best solution, however, is to use the token that GitHub Actions already makes available to the workflow, as follows:

# ...OMITTED...
permissions:
  contents: write
  pull-requests: write
  issues: read
  packages: none

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    # ...OMITTED...

    - name: Collect Releases
      run: npm run collect:releases
      env:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

    # ...OMITTED...

See the full file for more details

We pass secrets.GITHUB_TOKEN to the scripts as the GITHUB_TOKEN environment variable.
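On the script side, reading that variable could look like the sketch below. The helper name buildGitHubHeaders is hypothetical; the authorization format mirrors the one used in the discussions snippet earlier in this post, and the accept value is the media type GitHub recommends for its REST API.

```javascript
// Build request headers for the GitHub API.
// `env` defaults to process.env so the function is easy to test.
function buildGitHubHeaders (env = process.env) {
  const headers = { accept: 'application/vnd.github+json' }
  if (env.GITHUB_TOKEN) {
    // Authenticated requests get a higher rate limit than anonymous ones.
    headers.authorization = `token ${env.GITHUB_TOKEN}`
  }
  return headers
}

// Falls back to anonymous headers when no token is set (for example, locally).
console.log(Object.keys(buildGitHubHeaders({})))
```

Leaving the header out when the variable is missing lets the scripts still run locally, at the cost of the lower anonymous rate limit.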

Slack Notifications

The feed is published on Slack using the RSS App. The app watches the feed and pushes new items to the configured channel(s). In our case, we are using the channel #nodejs-news-feed.

Figure: Screenshot of the Node.js News Feeder Slack channel showing the feed items published in the channel, including a rich preview of the releases.

Acknowledgment

Thanks a lot to the Node.js Next 10 team for the support and feedback on this project, especially to Michael Dawson for the guidelines, reviews, and suggestions.
