Today I took on the task of fixing the HTML output for Release 0.1 of textToHTML, my command-line SSG (static site generator) tool.
A couple of days ago I thought I had it working, because I tested some txt files I wrote myself and the paragraph HTML tags seemed to be in the right places.
Then... I tested the Sherlock Holmes texts to see how it handled large inputs. And it turned out my program was wrapping every single line in a paragraph tag, even lines in the middle of a paragraph. To fix this, I had to ask "what is a paragraph?". A paragraph is one or more lines of text followed by a blank line (2 newlines in a row) rather than by another line of text.
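To make the problem concrete, here is a minimal sketch of what wrapping every line looks like. This is an assumed reconstruction of the behaviour, not the original code from the tool, and the sample text and the names `data` and `perLine` are made up for illustration.

```javascript
// Assumed reconstruction of the buggy behaviour (not the tool's real code).
// The sample has two paragraphs: the first spans two lines.
const data = 'First line\nsecond line of the same paragraph\n\nA new paragraph';

// Splitting on single line breaks treats every line as its own paragraph.
const perLine = data
  .split(/\r?\n/)
  .map(line => `<p>${line}</p>`);

console.log(perLine.length); // 4 tags for only 2 real paragraphs
console.log(perLine[2]);     // <p></p> (the blank line became an empty tag)
```

Short test files where every paragraph fits on one line hide this, which is probably why my own txt files looked fine.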
So the solution was a combination of split on the string and map over the result:
const html = data.split(/\r?\n\r?\n/)
  .map(para =>
    `<p>${para.replace(/\r?\n/g, ' ')}</p>`
  ).join(' ');
To break it down line by line:
- Split the data on 0 or 1 occurrences of \r, followed by \n, followed by 0 or 1 occurrences of \r, followed by \n. This matches a blank line with either Unix (\n\n) or Windows (\r\n\r\n) line endings, or a mix of the two. Returns an array with one string per paragraph.
- Map over the returned array and, within each paragraph string, replace every remaining line break (either \r and \n together, or just \n) with a space, then wrap the result in paragraph tags. The g flag on the regex matters here: without it, replace only changes the first match and the rest of the line breaks stay in the paragraph.
- .join(' ') turns the array into 1 string with the paragraphs separated by spaces. Without it, converting the array to a string would put "," between the paragraph tags.
- const html holds the final string of paragraphs.
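Here is the same idea as a small self-contained sketch you can run with Node. The function name `textToParagraphs` and the sample string are hypothetical, just for illustration; in the tool itself the code is inline as shown above.

```javascript
// Hypothetical helper wrapping the split/map/join steps described above.
function textToParagraphs(data) {
  return data
    .split(/\r?\n\r?\n/)                                   // a blank line ends a paragraph
    .map(para => `<p>${para.replace(/\r?\n/g, ' ')}</p>`)  // join wrapped lines with spaces
    .join(' ');                                            // one string, no commas
}

// Sample input mixing Unix (\n) and Windows (\r\n) line endings.
const sample =
  'First line\nstill the first paragraph\n\n' +
  'Second paragraph\r\nwith Windows line endings';

console.log(textToParagraphs(sample));
// <p>First line still the first paragraph</p> <p>Second paragraph with Windows line endings</p>
```

One thing this sketch does not handle: 3 or more newlines in a row would produce an empty or oddly split paragraph, so that is a case to look at later.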
It was fun seeing a use of regex like that in web development, considering I am used to seeing them in Linux and bash scripts hehe.
Seems to be working well... to be improved. :)
All the best, friends!!!
Andre Willomitzer