Interview Prep: Converting Plain-text to Markup

Introduction

In this post I'll go over a question I received in a tech interview. It's a great question that ties together some trends I've uncovered going through prep on LeetCode, InterviewCake, and other practice sites.

The goal is to write a function, which takes in a string of characters, such as 'ABCDE', and an object representing markup for that string, such as {'b': [[0,2]], 'i': [[1,3]]}, where the keys represent the markup codes and the values represent start/stop positions for that markup. The expected output of the function for the two input values provided should be "ABCDE"

Approach

We will ultimately loop through the raw string and insert the markup open/close tags specified in the input object.

But there are two challenges which stand out to me with this problem.

The first is that, because of the structure of markup, we will sometimes need to close one tag before we can close another, and that information isn't necessarily going to be encoded in the input object which specifies markup positions. Consider the AB pattern from the example, where the  tag needed to be closed and re-opened to accommodate the closing of the  tag. A stack is an ideal data structure to model markup tags that are currently open, since markup tags need to be closed in a first-in-last-out pattern. In my solution, I implement the stack with an array that I pop off of.

The second challenge is that the structure of the input object that specifies markup positions is not currently a great interface. It looks like the engineer who wrote the function that 'encoded' the rich text optimized to make it easy to encode the positions of markups, at the cost of making it more difficult to 'decode' markups on our end. When we go to loop through the raw string, we want to know is there a relevant tag for this character? We would be better served by an object that is structured so open/close positions are keys and markup or tags; said another way, we want to invert this dictionary. There are many ways to do this. I opted to restructure the dictionary so it had openPositions and closePositions, e.g. {'b': [[0,2]], 'i': [[1,3]]} would become { openPositions: { 0: ["b"], 1: ["i"] }, closePositions: { 2: ["b"], 3: ["i"] }

Solution

You can checkout and fork my solution (with tests) at this CodePen, or read the code below:

const convertRichTextToHTML = (rawString, markupDictionary) => {
  const positionDictionary = translateMarkupDictionary(markupDictionary);
  let htmlOutput = "";
  const unclosedTags = [];

  for (let i = 0; i < rawString.length; i++) {
    let openTagsAtPosition = positionDictionary.openPositions[i];
    let closeTagsAtPosition = positionDictionary.closePositions[i];
    if (openTagsAtPosition || closeTagsAtPosition) {
      let lastUnclosedTag = unclosedTags[unclosedTags.length - 1] || null;
      if (openTagsAtPosition) {
        for (const tag of openTagsAtPosition) {
          unclosedTags.push(tag);
          htmlOutput += `<${tag}>`;
        }
      }
      if (closeTagsAtPosition) {
        for (const tag of closeTagsAtPosition) {
          if (lastUnclosedTag && lastUnclosedTag === tag) {
            htmlOutput += `</${tag}>`;
            unclosedTags.pop();
          }
          if (lastUnclosedTag && lastUnclosedTag !== tag) {
            htmlOutput += `</${lastUnclosedTag}></${tag}><${lastUnclosedTag}>`;
          }
        }
      }
    }
    htmlOutput += rawString[i];
  }
  return cleanUp(htmlOutput);
};

const translateMarkupDictionary = (markupDictionary) => {
  const translation = { openPositions: {}, closePositions: {} };
  for (const tag in markupDictionary) {
    let tagPositions = markupDictionary[tag];
    for (const position of tagPositions) {
      if (translation.openPositions[position[0]]) {
        translation.openPositions[position[0]].push(tag);
      } else {
        translation.openPositions[position[0]] = [tag];
      }
      if (translation.closePositions[position[1]]) {
        translation.closePositions[position[1]].push(tag);
      } else {
        translation.closePositions[position[1]] = [tag];
      }
    }
  }
  return translation;
};

const cleanUp = (htmlOutput) => {
  htmlOutput = htmlOutput.replace("<i></i>", "");
  htmlOutput = htmlOutput.replace("<b></b>", "");
  htmlOutput = htmlOutput.replace("<u></u>", "");
  return htmlOutput;
};

Conclusion

There are of course libraries that do a great job of converting between rich text and markup, but the great joy of this community is to take things apart for yourself and put them back together. If you wanted to take my approach further, consider: how would you implement anchor links?