Converting Spotify Web API Objects to GraphQL Schema Object Type Definitions

#graphql #spotifywebapi

I made a web application that scrapes and displays the Spotify Web API objects in two formats: GraphQL Schema Definition Language (SDL) and JSON.

Example: SimplifiedTrackObject

Spotify official documentation https://developer.spotify.com/documentation/web-api/reference/#object-simplifiedtrackobject
GraphQL schema format https://wsywh.sse.codesandbox.io/SimplifiedTrackObject/schema
JSON format https://wsywh.sse.codesandbox.io/SimplifiedTrackObject/json

The stack: 🥞

Node.js app with the Express framework
Nunjucks templating engine for the index view
Cheerio for parsing data from HTML
Hosted on CodeSandbox

Why did I make this app?

As part of learning about the GraphQL schema

Schema is a key concept in GraphQL. It specifies what kinds of data clients can request and what operations clients can perform through a GraphQL API. It consists of type definitions, written in a syntax called the Schema Definition Language (SDL). In many implementations, we write the type definitions in a template literal and pass it to a template literal tag such as gql in Apollo Server or to a function such as createTypes in Gatsby.

Example from Apollo Server documentation:

const { gql } = require('apollo-server');

// Every field in SDL needs a type, e.g. String
const typeDefs = gql`
  type Author {
    name: String
  }
`;

Example from Gatsby documentation:

exports.createSchemaCustomization = ({ actions }) => {
  const typeDefs = `
    type AuthorJson implements Node {
      joinedAt: Date
    }
  `;
  actions.createTypes(typeDefs);
}

Took me a while 😬, but I just realised that since template literals are strings, a schema type definition is actually, literally (pun intended), a regular string. It is then processed by functions such as gql and createTypes above into a schema format required by respective libraries.

My hypothesis:

const { gql } = require('apollo-server');

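// We can do this...
const typeDefs = gql`
  type Author {
    name: String
  }
`;

// ...and we can also do this?? 🤔
const typeDefsString = `
  type Author {
    name: String
  }
`;
// Different variable name: redeclaring typeDefs with const is a SyntaxError
const typeDefsFromString = gql(typeDefsString);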

If true, this means full flexibility in where and how we get the data for our schema, without requiring a transpiling/compiling step. Instead of manually writing the type definitions in SDL, for instance, we may have a "single source of truth" file in any format readable by JavaScript. We can even programmatically scrape a documentation web page to build the schema.
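To make the "single source of truth" idea concrete, here is a minimal sketch that builds an SDL string from plain JavaScript data. The data shape and the function names (renderObjectType, renderSchema) are assumptions made up for this illustration, not the code of the app; the three fields are a small subset of Spotify's SimplifiedArtistObject.

```javascript
// Hypothetical "single source of truth": plain data describing one object type.
// The shape of this data is an assumption made for this sketch.
const objectTypes = [
  {
    name: "SimplifiedArtistObject",
    fields: [
      { name: "id", type: "String" },
      { name: "name", type: "String" },
      { name: "uri", type: "String" },
    ],
  },
];

// Render one object type as an SDL type definition.
function renderObjectType({ name, fields }) {
  const fieldLines = fields.map((field) => `  ${field.name}: ${field.type}`);
  return [`type ${name} {`, ...fieldLines, "}"].join("\n");
}

// Join all type definitions into one SDL string.
function renderSchema(types) {
  return types.map(renderObjectType).join("\n\n");
}

const generatedTypeDefs = renderSchema(objectTypes);
console.log(generatedTypeDefs);
```

The resulting string is the kind of value that could then be handed to a function such as gql or createTypes.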

And thus, I got the idea to scrape the Spotify API documentation page and transform the data into GraphQL SDL.

When this app was finished, I used the resulting object type definitions in my local Apollo GraphQL server, and it worked, confirming my hypothesis above that the schema can be built from any data source.
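Why can the same function be used both as a tag and as a regular call? A rough sketch, using a hypothetical stand-in called sdlTag: JavaScript passes a tag function an array of string chunks plus any interpolated values, so a function that also accepts a plain string can serve both styles. This is only an illustration of the mechanism; the real gql additionally parses the SDL, which this sketch does not attempt.

```javascript
// Hypothetical stand-in for a tag such as gql. It only rebuilds the string
// and is NOT how Apollo processes SDL.
function sdlTag(stringsOrText, ...values) {
  // Called as a plain function: sdlTag("type Author { ... }")
  if (typeof stringsOrText === "string") {
    return stringsOrText.trim();
  }
  // Called as a tag: sdlTag`type Author { ... }`
  // JavaScript passes the literal chunks as an array, followed by the
  // interpolated values, so we stitch them back together.
  return stringsOrText
    .reduce(
      (text, chunk, i) => text + chunk + (i < values.length ? String(values[i]) : ""),
      ""
    )
    .trim();
}

const fromTaggedTemplate = sdlTag`
  type Author {
    name: String
  }
`;

const authorTypeString = `
  type Author {
    name: String
  }
`;
const fromPlainString = sdlTag(authorTypeString);

console.log(fromTaggedTemplate === fromPlainString); // true
```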

To see how a large-scale production app’s data models and relations are depicted in a schema

Documentation and beginner-level tutorials, understandably, focus on the API/concept and use highly simplified, shortened schemas. Converting the Spotify API's objects to schema format gave me an idea of what I will be dealing with when setting up a schema for a production app. This leads to the next point...

To better understand REST and GraphQL API behaviours

Even with identical database structure, models, and relations, different API architectures require different ways of thinking. Observing object types designed for a large production app's REST API and converting them to a GraphQL schema highlights the different perspectives of the two.

For example, the Spotify API has ArtistObject and SimplifiedArtistObject. The former is used, e.g., when returning a single artist, and the latter when including artist data in a track or an album. The former contains fields like followers and genres, which are never needed in the latter use cases. This distinction is necessary for a REST API, but may not be needed for a GraphQL one, where clients simply request the fields they need from the ArtistObject type.
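A rough way to picture this in plain JavaScript: with one full artist record, each client picks only the fields it needs, so a separate "simplified" type is not required. The selectFields helper and the sample values are made up for this sketch; an actual GraphQL server resolves selections per field and handles nesting, which this does not.

```javascript
// Hypothetical full artist record. Field names follow Spotify's ArtistObject;
// the values are made up for illustration.
const fullArtist = {
  id: "artist-123",
  name: "Example Artist",
  genres: ["indie pop"],
  followers: { total: 1000 },
  popularity: 42,
};

// Return only the requested top-level fields, roughly like a GraphQL
// selection set does for a single object.
function selectFields(source, fieldNames) {
  return Object.fromEntries(
    fieldNames
      .filter((fieldName) => fieldName in source)
      .map((fieldName) => [fieldName, source[fieldName]])
  );
}

// Artist page: wants the detailed fields.
const artistPageData = selectFields(fullArtist, ["id", "name", "genres", "followers"]);

// Track listing: only needs what SimplifiedArtistObject would carry.
const trackListingData = selectFields(fullArtist, ["id", "name"]);

console.log(trackListingData); // { id: 'artist-123', name: 'Example Artist' }
```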

How does this app work?

Scrape the Spotify documentation page with request
- For the sake of speed, I copied the HTML into a local file in the node app’s public directory and read it with fs instead.
Parse the HTML string with cheerio to find the objects data.
- It doesn't matter where the data comes from (remote URL, local file), as long as we pass an HTML string to cheerio: const $ = cheerio.load(scraper.scrapeLocal()).
- Cheerio is based on core jQuery, so we use jQuery selectors, which for the most part resemble DOM element selectors, to find the data we need: const unformattedObjectTypes = $("#reference-index + .left-split-container > .row:last-child ul li").
- Heads up: As Cheerio relies on markup selectors to parse data, if we use a remote source (i.e. the Spotify documentation page) and the markup changes, our code breaks. I use a local file for this app, so this will not happen.
Now I've got a list of object types; each object type has multiple fields, each of which has a name, description, and type. Then I prepare the data to send to our routes. The data preparation process consists of two groups of tasks:
- The first group is minor adjustment/formatting tasks such as getting the innerText of target elements, trimming whitespace, replacing double quote marks, etc.
- I did not have much time to work on this MVP, so I removed double quote marks and omitted descriptions with complex markup (e.g. the reason field in TrackRestrictionObject has lists and formatted content in its description). Note that we can have multi-line descriptions in a GraphQL schema by wrapping them in three double quote marks, and we could convert HTML markup to Markdown syntax for the descriptions.
- The second group is related to syntax and specification differences between Spotify's documentation and SDL requirements.
  - Minor syntax differences: [SomeObjectType] instead of Array[SomeObjectType], Int instead of Integer.
  - SDL requires a named union type, e.g. union TrackOrEpisode = TrackObject | EpisodeObject, instead of using TrackObject | EpisodeObject directly as the field type.
  - A field in SDL cannot have an empty type. TrackObject has an empty type for linked_from, which should be replaced with something else (LinkedTrackObject maybe?).
  - SDL has no generic Object or "any" type. PagingObject has an Array[Object] type, which could be replaced either once we know what Object refers to, or by making a custom generic scalar type.
- I have not fixed these, but I found these interesting to note. Of course, if I were refactoring a REST API codebase for my own/my workplace's app, I would have more information (and time!) to address them.
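The syntax differences in the second group could be handled with small conversion helpers along these lines. This is a sketch under assumptions, not the app's code: toSdlType and toUnion are hypothetical names, and the union naming rule (strip the "Object" suffix, join with "Or") is simply chosen to reproduce the TrackOrEpisode example.

```javascript
// Map documentation type names to SDL built-in scalars.
// Only Integer -> Int is taken from the list above.
const SCALAR_NAMES = new Map([["Integer", "Int"]]);

// Convert "Array[SomeType]" to "[SomeType]" and rename known scalars.
function toSdlType(docType) {
  const name = docType.trim();
  const arrayMatch = /^Array\[(.+)\]$/.exec(name);
  if (arrayMatch) {
    // Recurse so that nested arrays and scalars inside arrays are converted too.
    return `[${toSdlType(arrayMatch[1])}]`;
  }
  return SCALAR_NAMES.has(name) ? SCALAR_NAMES.get(name) : name;
}

// Turn "TrackObject | EpisodeObject" into a named union plus its definition.
// The naming rule is an assumption made for this sketch.
function toUnion(docType) {
  const members = docType.split("|").map((member) => member.trim());
  if (members.length < 2) {
    return null; // not a union
  }
  const name = members.map((member) => member.replace(/Object$/, "")).join("Or");
  return { name, definition: `union ${name} = ${members.join(" | ")}` };
}

console.log(toSdlType("Array[SimplifiedArtistObject]")); // [SimplifiedArtistObject]
console.log(toSdlType("Integer")); // Int
console.log(toUnion("TrackObject | EpisodeObject").definition);
// union TrackOrEpisode = TrackObject | EpisodeObject
```

The field itself would then use the union's name (TrackOrEpisode) as its type, with the union definition emitted once alongside the object types.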
Finally, I return the data in the appropriate format for each route (schema in plain text, JSON, and an HTML page for the home page). I use Nunjucks for the home page, which is not necessary but makes the code tidier. I also add a parameter to save the data to static files (object-types.txt and object-types.json) for faster access.
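For the plain-text schema output, descriptions containing double quotes or line breaks could be kept instead of stripped by emitting them as block strings, as noted earlier. A minimal sketch, assuming hypothetical helpers toBlockDescription and renderField and a made-up description; it is not what the app currently does.

```javascript
// Format a description as a GraphQL block string (triple double quotes).
// Inside a block string, regular double quotes are allowed; only a literal
// triple-quote sequence needs escaping as \""".
function toBlockDescription(text, indent = "  ") {
  const escaped = text.replace(/"""/g, '\\"""');
  const lines = escaped.split("\n").map((line) => `${indent}${line.trim()}`);
  return [`${indent}"""`, ...lines, `${indent}"""`].join("\n");
}

// Render one field, with its description (if any) placed above it.
function renderField({ name, type, description }) {
  const fieldLine = `  ${name}: ${type}`;
  return description ? `${toBlockDescription(description)}\n${fieldLine}` : fieldLine;
}

// Made-up description, used only to show quotes and a line break.
const nameField = {
  name: "name",
  type: "String",
  description: 'The "display" name\nof the artist.',
};

console.log(renderField(nameField));
```

Each rendered field can then be placed between the type's opening and closing braces when building the schema text.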

Caveat/notes:

This app simply returns the object type data as a string in the SDL format. It does not fetch, serve, or interact with data in any way. It demonstrates what a GraphQL type definition looks like without getting into the implementation.
Related to the point above, a GraphQL schema does not only contain object type definitions but also GraphQL-specific types like Query, Mutation, and Subscription. This app does not address those.
This app is hosted on a free CodeSandbox account, so it shuts down after a few minutes of inactivity. If I were using it for a production app, I would host it on a server that is always on.

As I learn more about GraphQL, I plan to build a GraphQL server of some sort with the schema from this app. Until then, thank you for reading and take care.