
Soham Thaker

Diving into Docusaurus codebase

What is Docusaurus?

Docusaurus is an open-source static site generator (SSG) backed by Meta that lets you build and deploy a documentation website, a personal portfolio or a blog at lightning-fast speed. It provides support for SEO, Markdown, browsers, versioning, localization, and content search, among many other features, out of the box.

Lab

The lab I was working on involved examining the code, test cases, build or config settings and docs in the Docusaurus repo for a feature that I'd want to implement in my own SSG project.

Feature

I picked the feature that converts the Markdown image syntax, ![Image alt text](Link to image), to an <img> tag when a Markdown document is converted to HTML. A generic way of implementing this would be to use a regular expression to search a file's contents for the Markdown image syntax and replace each match with the equivalent HTML tag.
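To make the regular-expression idea concrete, here is a minimal sketch in TypeScript. It is not taken from Docusaurus; the names MD_IMAGE, escapeAttr and convertImages are made up for illustration, and it assumes simple images with no title and no whitespace in the link.

```typescript
// Hypothetical helper, not Docusaurus code.
// Matches ![alt](src) where src has no spaces or closing parenthesis.
const MD_IMAGE = /!\[([^\]]*)\]\(([^)\s]+)\)/g;

// Escape characters that would break out of an HTML attribute value.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Replace every Markdown image in the input with an <img> tag.
function convertImages(markdown: string): string {
  return markdown.replace(
    MD_IMAGE,
    (_match: string, alt: string, src: string) =>
      `<img src="${escapeAttr(src)}" alt="${escapeAttr(alt)}">`,
  );
}
```

For example, convertImages('![Logo](img/logo.png)') returns '<img src="img/logo.png" alt="Logo">'. A regex like this knows nothing about context, so it would also rewrite image syntax that appears inside a code block, which is one reason a real Markdown parser is the safer choice.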

Strategies used to find feature implementation

I used a bunch of strategies to find how the feature I picked might be implemented by the Docusaurus maintainers.

  1. I asked ChatGPT for ideas on how to navigate this large codebase. It suggested that Docusaurus is possibly using a markdown engine that handles the markdown to HTML conversion. It also suggested I search for keywords like "image conversion", "convert to image tag" or "markdown to HTML", and go through changelogs, PRs, filed issues, etc. to find the feature implementation.
  2. Docusaurus has nice documentation on how to use the tool itself, but not so much on how the codebase is glued together, which was a major setback for me. I went through the documentation to see if it mentions how this feature is implemented in the codebase.
  3. I searched for the image and <img keywords throughout the codebase in VSCode to see if I could find something related to the feature implementation.
  4. I also tried GitHub's new code search feature to dig deeper into the files and find references to my feature.

The strategy that worked best for me was number 3. It initially gave me a lot of results, so I skimmed through them to find the ones that were about the MD to HTML image conversion.

Findings/Learning Outcomes

I unfortunately couldn't find how the feature is implemented end to end. I could only find the bits and pieces of the feature implementation and lost my way as I was navigating through this deep codebase. Some of my findings are as below:

  1. They seem to break their features down into separate packages, all located within the /packages directory. For example, they suggest everyone use the classic theme when setting up a project for the first time, as it covers almost all of a user's needs. The code for this classic theme can be found under /packages/docusaurus-theme-classic. There are also a lot of other packages that they created and use themselves. The origin of the feature that I was trying to find resides in this package, at /packages/docusaurus-theme-classic/src/theme/MDXComponents/Img/index.tsx. The code can be found below,

    import React from 'react';
    import clsx from 'clsx';
    import type {Props} from '@theme/MDXComponents/Img';
    
    import styles from './styles.module.css';
    
    function transformImgClassName(className?: string): string {
      return clsx(className, styles.img);
    }
    
    export default function MDXImg(props: Props): JSX.Element {
      return (
        // eslint-disable-next-line jsx-a11y/alt-text
        <img
          loading="lazy"
          {...props}
          className={transformImgClassName(props.className)}
        />
      );
    }
    

    The above code shows how they create their own MDX component, which simply returns the HTML image tag. There's another component in the same directory which simply returns an anchor tag. All these components seem to be wrappers around HTML tags. The components are then exported together as the key-value pairs of a typed object. From this point onward, as I kept trying to decipher how the MDXImg component is used, I failed time and time again to find an explicit usage of it in the codebase. I tried finding out how their packages are configured and used elsewhere in the codebase but couldn't find anything there either.

    However, as I was reading the code in /packages/docusaurus-theme-classic, I went into the test file packages/docusaurus-theme-classic/src/__tests__/options.test.ts and found something interesting. They wrote a test case where users can specify theme options using a config object. I noticed that the logo key is assigned an object that holds the link to the image as a key-value pair, with src as the key and the path to the image as the value. This logo object is part of a bigger object with other keys like navbar, footer, etc. The config object is possibly then mapped to the individual wrapper components, which take the options as props and pass their values on as HTML tag attributes. A shorter version of the config object is below,

        const userConfig = {
          image: 'img/docusaurus-social-card.jpg',
          navbar: {
            style: 'primary',
            hideOnScroll: true,
            title: 'Docusaurus',
            logo: {
              alt: 'Docusaurus Logo',
              src: 'img/docusaurus.svg',
              srcDark: 'img/docusaurus_keytar.svg',
              target: '_self',
              className: 'navbar__logo__custom',
              style: {
                maxWidth: 42,
              },
            },
          },
          footer: {
            style: 'dark',
          }
        };
    

    This config object is then possibly used by the ThemedImage component located at packages/docusaurus-theme-classic/src/theme/ThemedImage/index.tsx, since its code clearly uses props to set the HTML attributes of the <img> tag it creates and returns. This entire process is possibly one of the ways they let users construct an img tag. The code for the component is below,

    import React from 'react';
    import {ThemedComponent} from '@docusaurus/theme-common';
    import type {Props} from '@theme/ThemedImage';
    
    export default function ThemedImage(props: Props): JSX.Element {
      const {sources, className: parentClassName, alt, ...propsRest} = props;
      return (
        <ThemedComponent className={parentClassName}>
          {({theme, className}) => (
            <img
              src={sources[theme]}
              alt={alt}
              className={className}
              {...propsRest}
            />
          )}
        </ThemedComponent>
      );
    }
    
  2. While looking for the implementation of my feature, I also found other things related to it. One example is admin/scripts/resizeImage.js. The code in this file resizes an image's width and height. It also checks whether the image path is valid and whether the file extension is jpg, jpeg or png, among other validations and functionality.

  3. I also found a plugin named transformImage located at packages/docusaurus-mdx-loader/src/remark/transformImage, which is possibly the closest link to the feature implementation that I've found so far. As I was reading its test file, located at packages/docusaurus-mdx-loader/src/remark/transformImage/__tests__/index.test.ts, I saw a very specific test case titled transform MD images to <img /> and started digging deeper into the code. From my investigation, it seems that the test takes an MD file consisting entirely of images in markdown syntax, located at packages/docusaurus-mdx-loader/src/remark/transformImage/__tests__/__fixtures__/img.md, parses and converts its contents, and matches the result against a snapshot file located at packages/docusaurus-mdx-loader/src/remark/transformImage/__tests__/__snapshots__/index.test.ts.snap. The snapshot file has the images converted to HTML <img> tags. As I was trying to figure out how the MD to HTML parsing was performed, I noticed that they were using an external dependency called remark which, judging by its documentation, is a Markdown processor. Outside of the test case, I couldn't find where the transformImage package itself uses remark to parse MD to HTML, which was bizarre, unless I missed it. The test case calls a function called processFixture() which performs the MD to HTML conversion and returns the converted content. Its code is below,

    const processFixture = async (
      name: string,
      options: Partial<PluginOptions>,
    ) => {
      const {remark} = await import('remark');
      const {default: mdx} = await import('remark-mdx');
      const filePath = path.join(__dirname, `__fixtures__/${name}.md`);
      const file = await vfile.read(filePath);
    
      const result = await remark()
        .use(mdx)
        .use(plugin, {siteDir: __dirname, staticDirs: [], ...options})
        .process(file);
    
      return result.value;
    };
    
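Going back to finding 1, here is a rough sketch of how a logo config object might be turned into the attributes of an <img> tag, with the dark image picked when the dark theme is active. This is my guess at the shape of the mapping, not code from Docusaurus: the LogoConfig and ImgAttributes types and the logoToImgAttributes function are hypothetical, and only the key names (alt, src, srcDark, className) come from the config object shown in finding 1.

```typescript
type Theme = 'light' | 'dark';

// Hypothetical subset of the logo object from the test config.
interface LogoConfig {
  alt?: string;
  src: string;
  srcDark?: string;
  className?: string;
}

// Hypothetical shape of the attributes that end up on the <img> tag.
interface ImgAttributes {
  src: string;
  alt: string;
  className?: string;
}

// Pick the image for the active theme, falling back to `src`
// when no dark variant is configured (an assumption of this sketch).
function logoToImgAttributes(logo: LogoConfig, theme: Theme): ImgAttributes {
  const sources: Record<Theme, string> = {
    light: logo.src,
    dark: logo.srcDark ?? logo.src,
  };
  return {
    src: sources[theme],
    alt: logo.alt ?? '',
    // Only include className when the config provides one.
    ...(logo.className !== undefined ? {className: logo.className} : {}),
  };
}
```

With the logo object from the config above, asking for the 'dark' theme would select img/docusaurus_keytar.svg, and asking for 'light' would select img/docusaurus.svg. The sources[theme] lookup mirrors the src={sources[theme]} line in ThemedImage.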

Experience

One thing that I noticed was that all of my findings came from reading the test files. I don't know the reason for that, but it's quite interesting to me. This was the first time I got a probable answer by reading test files. It was also a refresher on jest for me. It's been a while since I read test code written in jest, and it was nice to visit it again.

Besides, this experiment didn't go as planned. I wasn't expecting the codebase to be this challenging for me, but I learned how to navigate a large codebase. It felt like I was hunting for a feature in a black box. I asked for help in Docusaurus' Discord channel too, and a member pointed me to some parts of the code where I should start looking, which was a good start for me. All of the above findings took me about a day's worth of effort split over 2 days. I was quite frustrated trying to find the feature on the 1st day, so I stepped away from it to start afresh the next day. I feel sad and heartbroken as I write this blog, thinking I failed to decipher the end-to-end feature implementation, because the explanations that I gave above are my assumptions. I can't say for sure that I'm looking at the right place for its implementation or that this is how it is actually implemented. I really wish I could've found out how they glued all their packages together and made use of them to deliver this tiny feature among the many larger features that the tool provides. Nevertheless, I'm keeping my hopes high. I'm proud of what I've accomplished and will continue to plough through the new challenges that the open source world has to throw at me, one challenge at a time.

P.S.

One of the moderators on Docusaurus' Discord channel replied to my question, "Hi, I wanted to know how is the markdown image conversion in the form of ![alt text](link to image) is done to <img> tags. Where can I find the code that does end to end processing for this in the repo?" They said, "You can't it's inside the markdown renderer which is a third party library, we don't handle this ourselves since it's core markdown syntax." When I followed up with "You mean the remark?", they replied, "somewhere in the core remark plugins - maybe remark-gfm - maybe one of the others." This confirms my 3rd finding: they were indeed using a third-party library, essentially a markdown processor, which ChatGPT had also suggested could be how they handle MD to HTML processing.

Digging deeper into this, I found out that there's a collective known as unifiedjs which maintains an ecosystem of plugins for creating and manipulating content. It does this by taking Markdown, HTML, or plain text prose, turning it into structured data, and making it available to over 100 plugins. Plugins do tasks such as spellchecking, linting, or minifying. Two popular ecosystems under unifiedjs are remark and rehype, which work on Markdown and HTML respectively. A plugin called remark-rehype bridges the two to turn Markdown into HTML, as explained in its usage docs. I see quite a lot of unifiedjs plugins being used throughout the Docusaurus codebase. The closest I got to how this feature is implemented is documented in my 3rd finding, where the test literally runs an entire MD file full of images through remark. So chances are that remark-rehype is used somewhere underneath to convert the MD into HTML, considering all these plugins belong to the unifiedjs family.
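To picture what "turning it into structured data" means, here is a toy model of such a pipeline in TypeScript. It does not use the unified, remark or rehype packages, and every name in it (MdNode, HtmlNode, parse, toHtmlTree, stringify) is made up for illustration. It only understands images and plain text, but it follows the same three steps: parse Markdown into a tree, transform that tree into an HTML tree, then serialize it.

```typescript
// Toy Markdown-side nodes (hypothetical, far simpler than a real syntax tree).
type MdNode =
  | {type: 'text'; value: string}
  | {type: 'image'; url: string; alt: string};

// Toy HTML-side nodes.
type HtmlNode =
  | {type: 'text'; value: string}
  | {type: 'element'; tagName: string; properties: Record<string, string>};

// Step 1: parse Markdown into nodes (toy: only images and text).
function parse(markdown: string): MdNode[] {
  const nodes: MdNode[] = [];
  const image = /!\[([^\]]*)\]\(([^)\s]+)\)/g;
  let last = 0;
  let m: RegExpExecArray | null;
  while ((m = image.exec(markdown)) !== null) {
    if (m.index > last) {
      nodes.push({type: 'text', value: markdown.slice(last, m.index)});
    }
    nodes.push({type: 'image', alt: m[1] ?? '', url: m[2] ?? ''});
    last = m.index + m[0].length;
  }
  if (last < markdown.length) {
    nodes.push({type: 'text', value: markdown.slice(last)});
  }
  return nodes;
}

// Step 2: transform Markdown nodes into HTML nodes.
function toHtmlTree(nodes: MdNode[]): HtmlNode[] {
  return nodes.map((node): HtmlNode =>
    node.type === 'image'
      ? {type: 'element', tagName: 'img', properties: {src: node.url, alt: node.alt}}
      : node,
  );
}

function escapeHtml(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Step 3: serialize HTML nodes to a string (toy: elements have no children).
function stringify(nodes: HtmlNode[]): string {
  return nodes
    .map((node) => {
      if (node.type === 'text') return escapeHtml(node.value);
      const attrs = Object.entries(node.properties)
        .map(([key, value]) => ` ${key}="${escapeHtml(value)}"`)
        .join('');
      return `<${node.tagName}${attrs}>`;
    })
    .join('');
}
```

Running stringify(toHtmlTree(parse('Hi ![Logo](a.png)!'))) gives 'Hi <img src="a.png" alt="Logo">!'. The point of the tree in the middle is that plugins can inspect or rewrite nodes between the steps, which is presumably where something like transformImage hooks in.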
