Element(ary), my dear Watson.

#opensource

Today I took on the task of fixing my HTML output files for Release 0.1 of textToHTML, my command-line SSG (static site generator) tool.

A couple of days ago I thought I had it working. I tested some .txt files I wrote myself, and the paragraph tags appeared around each line, which looked right.

Then... I tested the Sherlock Holmes texts to see how it handled a larger input. It turns out my program was wrapping every single line in its own paragraph tag. To fix this, I had to ask: "what is a paragraph?" In these text files, a paragraph is a block of one or more lines that ends with a blank line. That means two consecutive newlines, rather than a single newline followed by more text.

So a combination of split() on the string and map() was the solution:
const html = data.split(/\r?\n\r?\n/) // a blank line separates paragraphs
  .map(para =>
    `<p>${para.replace(/\r?\n/g, ' ')}</p>` // g flag: replace every line break, not just the first
  ).join(' ');

To break it down line by line:

Split the data on an optional \r, then \n, then another optional \r, then \n. In other words, split wherever there is a blank line, whether the file uses Unix (\n) or Windows (\r\n) line endings. This returns an array of paragraph strings.
Map over the returned array, wrapping each paragraph in <p> tags and replacing every line break inside it (\r\n or just \n) with a space.
.join(' ') turns the array into one string separated by spaces. Without it, converting the array to a string would separate the paragraph tags with commas.
The result, stored in html, is a single string of <p> elements.
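To see what each step does on its own, here is a small sketch that can be pasted into Node or a browser console. The sample strings and the `toParagraphs` function name are made up for illustration and are not part of textToHTML.

```javascript
// Step 1: split on a blank line, for different line endings.
const splitter = /\r?\n\r?\n/;
const unix = 'one\n\ntwo'.split(splitter);        // ['one', 'two']
const windows = 'one\r\n\r\ntwo'.split(splitter); // ['one', 'two']
// Two blank lines in a row: the third newline stays attached to the next chunk.
const extra = 'one\n\n\ntwo'.split(splitter);     // ['one', '\ntwo']

// Step 2: without the g flag, replace() only changes the first match.
const para = 'line one\nline two\nline three';
const firstOnly = para.replace(/\r?\n/, ' ');   // 'line one line two\nline three'
const everyBreak = para.replace(/\r?\n/g, ' '); // 'line one line two line three'

// Step 3: an array converted to a string is joined with commas.
const tags = ['<p>one</p>', '<p>two</p>'];
const withCommas = `${tags}`;      // '<p>one</p>,<p>two</p>'
const withSpaces = tags.join(' '); // '<p>one</p> <p>two</p>'

// All three steps together (hypothetical helper name).
function toParagraphs(data) {
  return data
    .split(/\r?\n\r?\n/)                            // blank line = paragraph boundary
    .map(p => `<p>${p.replace(/\r?\n/g, ' ')}</p>`) // wrapped lines become spaces
    .join(' ');                                     // one string, no commas
}

const sample =
  'To Sherlock Holmes she is always\r\nthe woman.\r\n\r\n' +
  'I have seldom heard him mention her\r\nunder any other name.';
console.log(toParagraphs(sample));
```

Running it prints `<p>To Sherlock Holmes she is always the woman.</p> <p>I have seldom heard him mention her under any other name.</p>`. The `extra` case shows one limitation: a run of blank lines leaves a stray newline at the start of the next paragraph.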

It was fun to see regular expressions put to use like this in web development, since I'm used to seeing them in Linux and bash scripts hehe.

Seems to be working well... though there's still room to improve. :)
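One way the splitter could be hardened is sketched below. It rests on assumptions not in the post and does not reflect what textToHTML currently does. It normalizes line endings first, treats any run of blank or whitespace-only lines as a single break, and drops empty chunks so blank lines at the start or end of a file don't turn into empty <p></p> tags. The function name `toParagraphsRobust` is hypothetical.

```javascript
// Hypothetical, more tolerant variant; not textToHTML's actual code.
function toParagraphsRobust(data) {
  return data
    .replace(/\r\n?/g, '\n')          // normalize \r\n and lone \r to \n
    .split(/\n[ \t]*\n\s*/)           // one or more blank (or whitespace-only) lines
    .map(para => para.trim())         // strip leading/trailing whitespace
    .filter(para => para.length > 0)  // skip empty chunks from leading/trailing blank lines
    .map(para => `<p>${para.replace(/\n/g, ' ')}</p>`) // join wrapped lines with spaces
    .join(' ');
}

// Extra blank lines and a whitespace-only line no longer produce stray tags.
const messy = '\r\n\r\nOne\r\nline.\r\n\r\n\r\nTwo.\n \n';
console.log(toParagraphsRobust(messy)); // <p>One line.</p> <p>Two.</p>
```

Normalizing line endings up front keeps the splitting regex simple. The trade-off is one extra pass over the text.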

All the best, friends!!!

Andre Willomitzer

DEV Community
