Turning unstructured content into clean, well-formatted output is a common problem in software development. One elegant solution is the pipeline programming pattern. A pipeline organizes data transformations into discrete, reusable steps, which makes the process easier to maintain, test, and extend.
In this article, we’ll implement a realistic example: transforming a messy email containing images, links, and text into a structured blog post.
The Scenario
We receive an email with:
- Images mixed with text
- Links to various websites
- Embedded YouTube links
- Inconsistent HTML formatting
Our goal:
- Extract all images and store them in a gallery.
- Detect links and render them as clickable anchors, or as embedded players when they come from special providers such as YouTube.
- Extract the body as clean plain text, with all formatting stripped.
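To make these goals concrete, it may help to picture the object the processing will fill in. The sketch below is only an illustration: `createPost` is a hypothetical helper, and the field names are assumptions chosen so that each goal maps to one field.

```javascript
// Hypothetical helper: builds the empty result object that later steps fill in.
// The field names are assumptions for this sketch.
function createPost(raw) {
  return {
    raw,               // the original email HTML, kept untouched
    images: [],        // goal 1: image URLs for the gallery
    embeddedLinks: [], // goal 2: <a> tags or provider embeds, as HTML strings
    text: "",          // goal 3: the body as plain text
  };
}

const post = createPost("<p>Hi</p>");
console.log(Object.keys(post).join(", ")); // raw, images, embeddedLinks, text
```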
Why Pipelines Work
Instead of writing one large function that handles everything, we’ll create a sequence of small processors. Each processor:
- Takes the shared data object as its input
- Returns that object with its own results added
- Hands the result to the next processor
This structure makes the process easier to read, debug, and extend.
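One way to see the debugging benefit: because every processor has the same shape, a single wrapper can observe all of them. The following is a minimal sketch under assumptions, not part of the article's pipeline; `traced`, `countWords`, and `shout` are hypothetical names made up for the illustration.

```javascript
// Hypothetical helper: wraps a step so every call records which keys it added.
function traced(step, log = []) {
  return function tracedStep(data) {
    const result = step(data);
    const added = Object.keys(result).filter(key => !(key in data));
    log.push({ step: step.name, added });
    return result;
  };
}

// Two toy steps that follow the contract: take the data object, return a new one.
function countWords(data) {
  return { ...data, wordCount: data.text.split(/\s+/).filter(Boolean).length };
}

function shout(data) {
  return { ...data, loud: data.text.toUpperCase() };
}

const log = [];
const steps = [countWords, shout].map(step => traced(step, log));

// Running the steps in order is a left fold over the list.
const result = steps.reduce((data, step) => step(data), { text: "hello pipeline world" });

console.log(result.wordCount); // 3
console.log(log.map(entry => `${entry.step} added ${entry.added.join(", ")}`));
```

The wrapper needs no knowledge of what a step does, which is the practical payoff of giving every processor the same input and output format.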
Implementation
Here’s an example in modern JavaScript:
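```javascript
// A pipeline is an ordered list of steps. Each step takes the data object
// and returns a new one with its own results added.
class Pipeline {
  constructor(steps = []) {
    this.steps = steps;
  }

  add(step) {
    this.steps.push(step);
    return this; // returning the pipeline allows chained .add() calls
  }

  run(input) {
    return this.steps.reduce((data, step) => step(data), input);
  }
}

// Processing steps

// Collect the src attribute of every <img> tag.
function extractImages(data) {
  const images = [...data.raw.matchAll(/<img[^>]+src="([^">]+)"/g)]
    .map(m => m[1]);
  return { ...data, images };
}

// Collect every http(s) URL, stopping at whitespace, quotes, and angle brackets
// so that URLs inside attributes do not pick up a trailing quote.
function extractLinks(data) {
  const links = [...data.raw.matchAll(/https?:\/\/[^\s<>"']+/g)]
    .map(m => m[0]);
  return { ...data, links };
}

// Return the player URL for a YouTube watch link, or null for anything else.
// Watch pages refuse to load inside an iframe; the embeddable player is at /embed/ID.
function youtubeEmbedUrl(link) {
  try {
    const url = new URL(link);
    const id = url.searchParams.get("v");
    const isYouTube = /^(www\.)?youtube\.com$/.test(url.hostname);
    return isYouTube && id
      ? `https://www.youtube.com/embed/${encodeURIComponent(id)}`
      : null;
  } catch {
    return null; // not a parseable URL
  }
}

function embedSpecialLinks(data) {
  const embeddedLinks = data.links.map(link => {
    const embedUrl = youtubeEmbedUrl(link);
    if (embedUrl) {
      return `<iframe src="${embedUrl}" frameborder="0"></iframe>`;
    }
    return `<a href="${link}">${link}</a>`;
  });
  return { ...data, embeddedLinks };
}

function cleanText(data) {
  const text = data.raw
    .replace(/<[^>]*>/g, "") // strip tags
    .replace(/\s+/g, " ")    // collapse line breaks and runs of spaces
    .trim();
  return { ...data, text };
}

// Build the pipeline
const contentPipeline = new Pipeline()
  .add(extractImages)
  .add(extractLinks)
  .add(embedSpecialLinks)
  .add(cleanText);

// Example usage
const rawEmail = `
<p>Hello <b>World</b></p>
<img src="photo1.jpg" />
https://youtube.com/watch?v=abc123
https://example.com
`;

const result = contentPipeline.run({ raw: rawEmail });
console.log(result);
/*
{
  raw: "...",
  images: ["photo1.jpg"],
  links: [
    "https://youtube.com/watch?v=abc123",
    "https://example.com"
  ],
  embeddedLinks: [
    "<iframe src=\"https://www.youtube.com/embed/abc123\" frameborder=\"0\"></iframe>",
    "<a href=\"https://example.com\">https://example.com</a>"
  ],
  text: "Hello World https://youtube.com/watch?v=abc123 https://example.com"
}
*/
```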
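The result object holds the pieces of the post but not the post itself. As an illustration of how the pipeline might be extended, here is a sketch of one more step that assembles them into HTML. `renderPost`, `escapeHtml`, and the markup (element choice, class names) are all assumptions made for this example, not something the pipeline above prescribes.

```javascript
// Escape the characters that are significant in HTML text and attribute values.
function escapeHtml(value) {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical step: assembles the collected pieces into one HTML fragment.
// embeddedLinks are already HTML strings, so they are inserted as-is.
function renderPost(data) {
  const gallery = data.images
    .map(src => `<img src="${escapeHtml(src)}" alt="">`)
    .join("\n");
  const html = [
    `<article>`,
    `<p>${escapeHtml(data.text)}</p>`,
    `<div class="gallery">`, gallery, `</div>`,
    `<div class="links">`, data.embeddedLinks.join("\n"), `</div>`,
    `</article>`,
  ].join("\n");
  return { ...data, html };
}

// Sample input in the shape produced by the earlier steps.
const sample = {
  images: ["photo1.jpg"],
  embeddedLinks: ['<a href="https://example.com">https://example.com</a>'],
  text: "Hello <World>",
};
console.log(renderPost(sample).html);
```

Because it takes and returns the same data object as every other step, it could be appended with `contentPipeline.add(renderPost)` without changing any existing step.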