DEV Community

Merlijn Vos


Crowdsourcing the evolution of text parsing with unified

New projects and how you, like ZEIT and Gatsby, can help out

Written by Merlijn and Titus, with the help of Jen, John, and countless others. Originally posted on Medium.

unified's interface used to transform markdown to HTML while adding other features

unified is a tool for manipulating content using syntax trees. MDX is an extension that lets you use JSX in markdown. micromark is a new parser we're planning, built to make manipulation super fast. Today, we're announcing the unified collective to fund the development of all three. And we'd love you to get involved.

First, what's unified?

unified is a friendly interface backed by an ecosystem of plugins built for creating and manipulating content. unified does this by taking markdown, HTML, or plain text prose, turning it into structured data, and making it available to over 100 plugins. Tasks like text analysis, preprocessing, spellchecking, linting, and more can all be done through compatible tools, and even chained together.

This and more is possible thanks to unified's plugin pipeline, which typically lets you chain a feature into this process with a single line of code. It's also possible to stitch together content from different sources and output it as a single document.

Bottom line: with unified, you don't manually handle syntax or parsing.
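To make the pipeline idea concrete, here is a toy sketch in TypeScript. It is not unified's actual API: the `Pipeline` class, the `MdNode` tree shape, and the two transforms are hypothetical, and they only mimic the parse, transform, stringify flow in which each feature is chained in with one `.use(...)` call.

```typescript
// Hypothetical, heavily simplified tree: a root holding headings and paragraphs.
type MdNode =
  | { type: "root"; children: MdNode[] }
  | { type: "heading"; depth: number; value: string }
  | { type: "paragraph"; value: string };

// A transform receives the tree and may change it in place.
type Transform = (tree: MdNode) => void;

class Pipeline {
  private transforms: Transform[] = [];

  // Chain one feature into the process; returning `this` allows `.use(a).use(b)`.
  use(transform: Transform): this {
    this.transforms.push(transform);
    return this;
  }

  // Parse: blocks are separated by blank lines; "# text" blocks become headings.
  parse(source: string): MdNode {
    const children = source
      .split(/\n{2,}/)
      .map((block) => block.trim())
      .filter((block) => block.length > 0)
      .map((block): MdNode => {
        const match = /^(#{1,6})\s+(.*)$/.exec(block);
        return match
          ? { type: "heading", depth: match[1].length, value: match[2] }
          : { type: "paragraph", value: block };
      });
    return { type: "root", children };
  }

  // Stringify: serialize the tree to HTML (no escaping, so toy input only).
  stringify(tree: MdNode): string {
    switch (tree.type) {
      case "root":
        return tree.children.map((child) => this.stringify(child)).join("\n");
      case "heading":
        return `<h${tree.depth}>${tree.value}</h${tree.depth}>`;
      case "paragraph":
        return `<p>${tree.value}</p>`;
    }
  }

  // Run parse, then every transform in order, then stringify.
  process(source: string): string {
    const tree = this.parse(source);
    for (const transform of this.transforms) transform(tree);
    return this.stringify(tree);
  }
}

// Example transform: demote every heading one level (h1 becomes h2).
const demoteHeadings: Transform = (tree) => {
  if (tree.type !== "root") return;
  for (const node of tree.children) {
    if (node.type === "heading") node.depth = Math.min(node.depth + 1, 6);
  }
};

// Example transform: drop paragraphs that start with "TODO".
const dropTodos: Transform = (tree) => {
  if (tree.type !== "root") return;
  tree.children = tree.children.filter(
    (node) => !(node.type === "paragraph" && node.value.startsWith("TODO")),
  );
};

const html = new Pipeline()
  .use(demoteHeadings)
  .use(dropTodos)
  .process("# Hello\n\nSome prose.\n\nTODO: finish this");

console.log(html);
// <h2>Hello</h2>
// <p>Some prose.</p>
```

In unified itself, parsing and stringifying are plugins too (for example remark-parse and rehype-stringify), and the trees follow specifications such as mdast and hast; this sketch hardcodes both steps to stay short.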

How it's used today

unified has been in active development for the last five years, but this year it has gained a lot of traction. It's used to create websites like freeCodeCamp, Node.js, and WebFundamentals (Google). It's powering exciting new projects like Gatsby to pull in markdown, MDX to embed JSX, and Prettier to format it. It's used to check code for Storybook, debugger.html (Mozilla), (GitHub), and many more.

To spark your imagination, here are some of the more common plugins these projects use to do interesting things:

From a maintainer's perspective, all this new traction comes with an immense amount of customer support: maintainers spend their evenings answering questions posted as issues. The stress of working on heavily used open source ecosystems, combined with an ever-increasing number of issues, means more time spent maintaining existing code instead of creating new things.

Announcing the unified collective

Organisations under the unified umbrella

Today, we are pleased to announce the creation of the unified collective. It's an effort to bring together like-minded organisations to collaboratively work on the innovation of content through seamless, interchangeable, and extendible tooling. We build parsers, transformers, and utilities so that others don't have to worry about syntax. We make it easier for developers to develop.

The humans

The people who will initially take the lead in advancing the unified collective are Titus (@wooorm, original author of unified), John (@johno, original author of mdx), Stephan (@zcei), Merlijn (@murderlon), Richard (@richardlit), Victor (@vhf), Mudit (@zeusdeux), Christian (@christianmurphy), and, …you?

Of course, we also want to thank all the lovely contributors across the ecosystem who have helped us get to this point by reporting issues, writing utilities and plugins, and submitting all kinds of improvements!

To be able to deliver on our mission, we need to start maintaining unified in a sustainable way, create a better ecosystem, and grow by adding new projects. We're doing just that today: unified is expanding, with MDX and micromark.

MDX joins forces with unified

Next to the existing low-level organisations under unified, such as remark for markdown, rehype for HTML, and retext for natural language, we're excited to announce that we are partnering with high-level projects as well. MDX is joining unified 🎉

A large part of MDX's success has been leveraging the unified and remark ecosystem. I was able to get a prototype working in a few hours because I didn't have to worry about markdown parsing: remark gave it to me for free. It provided the primitives to build on. It makes sense for these projects to come together and make each other better.

John Otander, author of mdx-js/mdx

MDX is powerful. It's markdown for the component era. It lets you write JSX embedded inside markdown. That's a great combination because it allows you to use markdown's often terse syntax (such as # heading) for the little things and JSX for more advanced components. MDX is useful for JAMstack applications, for injecting dynamic data into a document, or for building slides in mdx-deck.

mdx-deck by Brent Jackson
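As a rough illustration of mixing the two syntaxes, this toy TypeScript function splits a document into blank-line-separated blocks and labels each one as markdown or JSX. It is not how MDX actually parses (the real grammar is far more involved); the `splitMdx` name and the "starts with `<` plus an uppercase letter" rule are assumptions made for the sketch.

```typescript
// Each block of the document is treated as either markdown or JSX.
type MdxBlock = { kind: "markdown" | "jsx"; source: string };

// Toy rule (an assumption, not MDX's real grammar): a block that starts with
// "<" followed by an uppercase letter is a JSX component; everything else is markdown.
function splitMdx(input: string): MdxBlock[] {
  return input
    .split(/\n{2,}/)
    .map((block) => block.trim())
    .filter((block) => block.length > 0)
    .map((source): MdxBlock => ({
      kind: /^<[A-Z]/.test(source) ? "jsx" : "markdown",
      source,
    }));
}

const doc = [
  "# Quarterly report",
  "Revenue grew this year; the chart shows the details.",
  "<Chart year={2018} />",
].join("\n\n");

for (const block of splitMdx(doc)) {
  console.log(`${block.kind}: ${block.source}`);
}
// markdown: # Quarterly report
// markdown: Revenue grew this year; the chart shows the details.
// jsx: <Chart year={2018} />
```

The terse `#` heading and the prose stay plain markdown, while the chart is a component; a real MDX compiler turns the whole file, markdown included, into a JSX component.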

Introducing micromark

micromark is a new, tiny, and fast markdown parser, to be written in TypeScript, under the unified umbrella: micromark/micromark.

We believe evolving unified shouldn't just be about new high-level features, like MDX, but also about rethinking core mechanisms. That's where micromark comes in.

In March 2019, markdown will turn 15. Over the years it has become ubiquitous, but because it was never formally specified, many flavours emerged. Most of these flavours continue to serve their purpose, but ever since GFM (GitHub Flavored Markdown) settled on CommonMark as its base, CommonMark has become more or less the de facto standard.

The original, and CommonMark as well, focused on making writing websites as easy as writing an email. Nowadays, markdown is used for all kinds of things. It's used to create slides or to generate man pages. It's supported in major CMSs and is the language most developers document their code in. Projects like Gatsby and MDX show that the syntax is entering a new era.

A new project is needed to support standards like CommonMark and GFM but also support extensions like MDX, while still being fast, small, and modern.

Something like remark, but on a lower level: a lexer (in nerdy terms 🤓). Syntax trees have many benefits, but they come with downsides: they have a big memory footprint and are sometimes more than you need.
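No micromark code existed yet when this was written, so the TypeScript below only illustrates the two ideas in this section; it is not micromark's design. It is a lexer that walks the source once and emits a flat list of tokens pointing into the source by offset (no tree is built), and it is extendible through a table of constructs keyed by the character that can start them, so a GFM-style strikethrough extension just adds an entry. All names (`lex`, `combine`, `ConstructTable`, the token types) are hypothetical.

```typescript
// A token records what was found and where, as offsets into the source;
// no tree nodes are allocated.
interface LexToken {
  type: string;
  start: number; // inclusive
  end: number; // exclusive
}

// A construct tries to match at `index` and returns its end offset, or null.
interface Construct {
  type: string;
  tokenize: (source: string, index: number) => number | null;
}

// Constructs are looked up by the character that can start them.
type ConstructTable = Record<string, Construct[]>;

// Hypothetical core construct: a code span delimited by single backticks.
const core: ConstructTable = {
  "`": [
    {
      type: "codeSpan",
      tokenize: (source, index) => {
        const close = source.indexOf("`", index + 1);
        return close === -1 ? null : close + 1;
      },
    },
  ],
};

// Hypothetical extension: GFM-style ~~strikethrough~~.
const strikethrough: ConstructTable = {
  "~": [
    {
      type: "strikethrough",
      tokenize: (source, index) => {
        if (!source.startsWith("~~", index)) return null;
        const close = source.indexOf("~~", index + 2);
        return close === -1 ? null : close + 2;
      },
    },
  ],
};

// Merge tables so an extension adds constructs instead of replacing them.
function combine(...tables: ConstructTable[]): ConstructTable {
  const merged: ConstructTable = {};
  for (const table of tables) {
    for (const [char, constructs] of Object.entries(table)) {
      merged[char] = [...(merged[char] ?? []), ...constructs];
    }
  }
  return merged;
}

// Walk the source once; runs of characters no construct claims become "data".
function lex(source: string, table: ConstructTable): LexToken[] {
  const tokens: LexToken[] = [];
  let index = 0;
  let dataStart = 0;
  while (index < source.length) {
    let end: number | null = null;
    let type = "";
    for (const construct of table[source[index]] ?? []) {
      end = construct.tokenize(source, index);
      if (end !== null) {
        type = construct.type;
        break;
      }
    }
    if (end === null) {
      index += 1;
      continue;
    }
    if (dataStart < index) tokens.push({ type: "data", start: dataStart, end: index });
    tokens.push({ type, start: index, end });
    index = end;
    dataStart = end;
  }
  if (dataStart < source.length) tokens.push({ type: "data", start: dataStart, end: source.length });
  return tokens;
}

const input = "Use `code` and ~~old~~ text";
const show = (tokens: LexToken[]) =>
  tokens.map((t) => `${t.type}(${input.slice(t.start, t.end)})`).join(" ");

console.log(show(lex(input, core)));
// data(Use ) codeSpan(`code`) data( and ~~old~~ text)
console.log(show(lex(input, combine(core, strikethrough))));
// data(Use ) codeSpan(`code`) data( and ) strikethrough(~~old~~) data( text)
```

Because tokens only hold offsets, a consumer can still recover every detail of the source, while the cost of building a tree is left to whoever actually needs one.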

We're launching micromark as just an idea. The first line of code still needs to be written. But we imagine it to be:

  • small in file size, max 10 kB minzipped, and tiny in memory use
  • fast in speed, compared to existing parsers on real world documents
  • safe to use, it should safely work on untrusted content by default
  • compliant with CommonMark but extendible for GFM, MDX, etc.
  • complete, in that it should give access to all info in the source document

But it's not:

  • something that creates HTML and the like: other projects will use micromark for that
  • something that creates a syntax tree: remark will use it to do just that

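The division of labour in this list can be sketched in TypeScript. The token stream is hard-coded, and its enter/exit format is an assumption standing in for lexer output; `toHtml` plays the part of the "other projects" that create HTML, and `toTree` the part the post assigns to remark, building a syntax tree. Neither function is micromark or remark code.

```typescript
// Hypothetical lexer output: a flat stream of enter/exit/text tokens.
// (Hard-coded here; this format is an assumption, not micromark's.)
type StreamToken =
  | { kind: "enter"; type: "heading" | "paragraph"; depth?: number }
  | { kind: "exit"; type: "heading" | "paragraph" }
  | { kind: "text"; value: string };

const stream: StreamToken[] = [
  { kind: "enter", type: "heading", depth: 1 },
  { kind: "text", value: "Hello" },
  { kind: "exit", type: "heading" },
  { kind: "enter", type: "paragraph" },
  { kind: "text", value: "World" },
  { kind: "exit", type: "paragraph" },
];

// Consumer 1: compile straight to HTML, never building a tree (no escaping; toy only).
function toHtml(tokens: StreamToken[]): string {
  let html = "";
  const openDepths: number[] = []; // heading depths still open, so exits close the right tag
  for (const token of tokens) {
    if (token.kind === "text") {
      html += token.value;
    } else if (token.type === "paragraph") {
      html += token.kind === "enter" ? "<p>" : "</p>";
    } else if (token.kind === "enter") {
      const depth = token.depth ?? 1;
      openDepths.push(depth);
      html += `<h${depth}>`;
    } else {
      html += `</h${openDepths.pop() ?? 1}>`;
    }
  }
  return html;
}

// A minimal node shape, loosely modeled on mdast (heading, paragraph, text).
interface TreeNode {
  type: string;
  depth?: number;
  value?: string;
  children?: TreeNode[];
}

// Consumer 2: build a syntax tree with a stack of currently open nodes.
function toTree(tokens: StreamToken[]): TreeNode {
  const root: TreeNode = { type: "root", children: [] };
  const stack: TreeNode[] = [root];
  for (const token of tokens) {
    const parent = stack[stack.length - 1];
    if (token.kind === "text") {
      parent.children?.push({ type: "text", value: token.value });
    } else if (token.kind === "enter") {
      const node: TreeNode = { type: token.type, children: [] };
      if (token.depth !== undefined) node.depth = token.depth;
      parent.children?.push(node);
      stack.push(node);
    } else {
      stack.pop();
    }
  }
  return root;
}

console.log(toHtml(stream));
// <h1>Hello</h1><p>World</p>
console.log(JSON.stringify(toTree(stream)));
```

Because both consumers read the same flat stream, a tool that only needs HTML never pays for allocating a tree, which is the memory trade-off described earlier in this section.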
micromark will likely not be something you'd directly interact with, unless you're interested in working on parsers, but it will make high-level tooling better.

Be part of the change

We're invested in making unified and the ecosystem under it better. We believe micromark should exist. And we need your help.

For example, you could contribute in the following ways:

  • Use the projects, and let us know through Spectrum or GitHub issues what was hard to figure out, so we can improve the docs
  • Discuss. Just excited but want to keep it simple for now? Head over to Spectrum and start a conversation!
  • Fix existing issues. You can check out all the open issues in the ecosystem. All suggestions are welcome, no matter how small!
  • Submit new ideas. unified and the organisations in it have dedicated repositories for new ideas
  • Support us financially by becoming a backer or sponsor on Open Collective

Being an open collective

Open Collective allows unified to collect money from backers and sponsors in a transparent way. We need your support…

  • to pay out core maintainers for project leadership
  • to finance non-coding work, like technical writing, community consulting, etc.
  • to get our remote team together in real life
  • to do fun things for the community, such as getting stickers to people that contribute

Both individuals and companies can back our mission. You can help make unified sustainable by becoming a backer, starting at $2 per month, or an official unified sponsor, starting at $100 per month. As our way of saying thanks, we list backers and sponsors on our main GitHub repositories. Sponsors will also appear on and get a shout-out on Twitter. 🥈 Silver ($500+) and 🥇 Gold ($1000+) sponsors additionally get access to help chats with core maintainers.

The early and amazing sponsors of unified through Open Collective.

We're super excited that ZEIT, Gatsby, Compositor, and Holloway are helping us to become sustainable.

Join our early sponsors in sustaining the future of unified on Open Collective.

This is just the beginning

With our early sponsorship we'll be able to make the ecosystem better starting today. micromark will go into development shortly and should be ready by markdown's birthday, March 15, 2019. In the meantime, we hope to be as transparent as possible about what we will be doing, and you can expect more blog posts to keep you in the loop. For more information, find us on GitHub and visit If you have any questions already, you can ask them on Spectrum or tweet at us @unifiedjs.

These are exciting times for unified and open source in general. We strive to improve the quality and possibilities of the organisations that make up a sustainable unified collective. Rethinking its core with micromark and joining with high-level organisations like MDX are the first two steps we're taking to do just that.

Together, thanks to sponsors, we can build the most friendly, secure, fast, and extensive bridges between content formats.
