How would you build a Medium-esque highlighting feature?

Medium's highlighting feature is pretty cool and there's even more you could do with this on a platform like dev.to in my opinion. But I've long been unsure about how you'd go about building it with these things in mind:

  • What happens if the author edits the highlight? Does it remain with some form of fuzzy-matching? At what point is a passage fully changed?
  • How would you approach collisions? Like, I highlight this half of the passage and you highlight another. This is something I've wondered about Genius's annotations.

I haven't used these features on Medium/Genius much, so maybe the solution is clearer to the user, but either way I'd love some thoughts about how this would be done.

 

How would you approach collisions? Like, I highlight this half of the passage and you highlight another. This is something I've wondered about Genius's annotations.

I haven't used them much either, but from the little I have used, here's how I would theoretically approach highlighting collisions. In this example, let's have User A highlight Passage A and User B highlight Passage B. We will assume a table or storage medium that can store highlights along with extra information such as comments.

When a user highlights a section, you store the highlighted substring along with its start and end points in our 'database'. You then render a single highlight over the entire highlighted area, whether it is covered by one highlight or several: as long as the stored highlights stack up with no breaks between them, they belong to the same area, and the area ends where the first break appears. Store that entire highlighted area as well. When a user hovers over the highlighted area, you check which highlights fall within its range and display each substring together with whoever highlighted it.

For a passage
She sells sea shells at the sea shore.

Let's assume User A highlighted "She sells sea shells at the" and User B highlighted "sea shells at the sea shore."

We would store highlight1 with start_index=0 and end_index=x (bummer, this was a pain to count so I just used x; might edit in the actual value).

  • We would also store any information related to that highlight here.

Same thing for highlight2.

How we would end up highlighting the whole string would be something close to:

//Tired at the moment, 3.23am here, will attempt writing pseudocode when I can

But basically that was my thought process for the collision problem. Of course, feel free to share any issues that might arise with this approach so that we can arrive at a better solution.
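A rough sketch of the approach described above, not the commenter's own pseudocode: store each highlight with its offsets, merge highlights that stack up with no break into one display area, and look up the contributing highlights on hover. The `Highlight` class and both function names are made up for illustration, offsets are computed with `str.find` rather than counted by hand, and the highlight text follows the spacing of the passage as written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Highlight:  # hypothetical record, one row per stored highlight
    user: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def merge_areas(highlights):
    """Union overlapping or touching highlights into (start, end) display areas."""
    areas = []
    for h in sorted(highlights, key=lambda h: (h.start, h.end)):
        if areas and h.start <= areas[-1][1]:
            # No break since the previous area: extend it.
            areas[-1] = (areas[-1][0], max(areas[-1][1], h.end))
        else:
            # A break: start a new display area.
            areas.append((h.start, h.end))
    return areas


def highlights_in_area(highlights, area):
    """Return the stored highlights that lie within one display area (for hover)."""
    start, end = area
    return [h for h in highlights if start <= h.start and h.end <= end]


passage = "She sells sea shells at the sea shore."
text_a = "She sells sea shells at the"
text_b = "sea shells at the sea shore."
a_start = passage.find(text_a)
b_start = passage.find(text_b)
stored = [
    Highlight("User A", a_start, a_start + len(text_a)),
    Highlight("User B", b_start, b_start + len(text_b)),
]

areas = merge_areas(stored)
for area in areas:
    for h in highlights_in_area(stored, area):
        print(h.user, repr(passage[h.start:h.end]))
```

Because the two highlights overlap, they collapse into one display area covering the whole passage, and hovering it would list both users. Storing only the offsets and slicing the substring on demand is one way to sidestep the duplication, as long as the text does not change.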

 

I think storing the whole string is bad practice, since it would inflate the database size significantly.

I know that if you don't store the string, you can't find where the annotation was if the author decides to change the content. But I don't find that very important compared to the storage cost it would incur, plus the reprocessing you would have to do to re-find the annotations each time an author edits the content.

 

That's reasonable. The algorithm should probably do a decent job of taking care of the easy cases and raise the right question if it's not sure. It's mostly edge cases where this would come into play anyway.
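One way to picture "handle the easy cases and ask when unsure" is a three-way outcome: keep the highlight, ask someone, or drop it. This is only a sketch under assumptions: it works on plain text, uses `difflib.SequenceMatcher` from the standard library as the similarity measure, and the thresholds `keep_at` and `ask_at` are arbitrary values picked for illustration. The brute-force window scan is slow on long posts.

```python
from difflib import SequenceMatcher


def reanchor(quote, new_text, keep_at=0.9, ask_at=0.6):
    """Return (status, start, end) for a highlighted quote after an edit.

    status is "kept", "ask" or "dropped". Thresholds are illustrative guesses.
    """
    # Easy case: the quoted text still appears verbatim.
    exact = new_text.find(quote)
    if exact != -1:
        return ("kept", exact, exact + len(quote))

    # Otherwise slide a window of the quote's length and keep the best match.
    width = len(quote)
    best_score, best_start = 0.0, 0
    for start in range(0, max(1, len(new_text) - width + 1)):
        window = new_text[start:start + width]
        score = SequenceMatcher(None, quote, window).ratio()
        if score > best_score:
            best_score, best_start = score, start

    end = min(len(new_text), best_start + width)
    if best_score >= keep_at:
        return ("kept", best_start, end)
    if best_score >= ask_at:
        return ("ask", best_start, end)  # not sure: raise the question
    return ("dropped", -1, -1)
```

The "ask" outcome is where the platform could prompt the author or the highlighter instead of guessing.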

 

I think the one on Medium is actually really cool. It allows readers to highlight snippets that really stand out and gives an excellent way of sharing said snippet (Twitter etc.). Oftentimes, when shared on social media, these highlights are what draw people to the post more than the original meta description. They also allow readers to identify with particular points in the copy, which I find really nice.

 

I think one of the most important things to take care of is how you store the content in the DB. For example, you could store posts in a single text field holding a bunch of Markdown, or use some other format, like WordPress's mix of HTML and shortcodes (keywords). Ghost changed the way it stores content from one huge Markdown field to a structured object using the Mobiledoc standard.

From here you can start thinking about how that info is going to change.

My guess is that Medium stores the position of each user annotation (first and last character position), while the platform stores the content in a structured way, like Mobiledoc, so you can expose a link to a specific paragraph of that content. That way, if the author edits the article, you can detect whether the annotated paragraph has changed and choose to remove (or keep) the annotation display in the content. The reference to that paragraph in shared annotation links would still work.
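To make that guess concrete, here is a minimal sketch of paragraph-level storage. It is not how Medium or Mobiledoc actually work: the document is modelled as a plain dict of paragraph ID to text, and `Annotation`, `annotate`, `is_displayable` and `share_link` are hypothetical names. A digest of the paragraph is saved with the annotation so a later edit can be detected cheaply.

```python
import hashlib
from dataclasses import dataclass


def digest(text):
    """Fingerprint of a paragraph's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class Annotation:  # hypothetical record
    paragraph_id: str
    start: int             # first char position inside the paragraph
    end: int               # position just past the last char
    paragraph_digest: str  # digest of the paragraph when it was annotated


def annotate(doc, paragraph_id, start, end):
    return Annotation(paragraph_id, start, end, digest(doc[paragraph_id]))


def is_displayable(doc, ann):
    """Show the annotation only if its paragraph still exists unchanged."""
    current = doc.get(ann.paragraph_id)
    return current is not None and digest(current) == ann.paragraph_digest


def share_link(base_url, ann):
    # The link targets the paragraph, so it keeps working after text edits.
    return f"{base_url}#{ann.paragraph_id}"
```

Editing one paragraph only invalidates the annotations attached to it; annotations on other paragraphs are unaffected.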

Without a structured object, with only one huge text field, I would reference the whole article in shared annotation links and invalidate the display of previous annotations whenever the author edits it.

For the collisions, Kindle does that stuff too, showing how many people have highlighted some text.

I think the best approach is to have some rules that trigger a visual annotation in the content, such as requiring at least 5 annotations over similar ranges/positions. Once that condition is met, take the minimum start position and the maximum end position of that annotation group to generate a display annotation, and avoid generating display annotations longer than 250 chars (for example).

So you would have the real annotations made by users in one place and the display annotations (generated from the real ones) for visual purposes in another. The latter could even be referenced by URL.
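The rules above could be sketched roughly like this. Two assumptions are mine: "similar ranges" is read as "ranges that overlap", and a group that grows past the length cap is skipped rather than trimmed. The function name and parameters are made up for the example.

```python
def display_annotations(ranges, min_count=5, max_length=250):
    """Derive display annotations from users' real (start, end) annotations.

    Returns (start, end, count) for each popular group within the length cap.
    """
    groups = []
    for start, end in sorted(ranges):
        if groups and start < groups[-1]["end"]:
            # Overlaps the current group: widen it and count the annotation.
            group = groups[-1]
            group["end"] = max(group["end"], end)
            group["count"] += 1
        else:
            groups.append({"start": start, "end": end, "count": 1})
    return [
        (g["start"], g["end"], g["count"])
        for g in groups
        if g["count"] >= min_count and g["end"] - g["start"] <= max_length
    ]
```

The count comes along for free, which is enough to show a Kindle-style "N people highlighted this" label next to each display annotation.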

 

I've never even figured out how the feature works. I think one of my popup blockers breaks it.

 
 

What's really annoying about the feature is that you can only highlight one paragraph at a time. This is fine if the content of interest is not split, but I've found a few people adding code or diagrams that split a section up, meaning you can't highlight the whole area of interest.

This is one of my biggest gripes with Medium highlights. Just tried to highlight more than one paragraph which it wouldn't let me do. It seems like an arbitrary limit to me.

Yep, exactly. They want to stop people highlighting the whole thing, I expect.

 

Take a look at Hypothes.is, an open-source project to bring annotations to the whole web. The core parts are the client (for creating and displaying annotations) and the h server for storing annotations.

Hypothes.is contributes to the Web Annotation standard, which attempts to solve exactly the kinds of issues you've mentioned. The basic idea is that the annotation is an entity that marks a part of a document using a combination of selectors; see the Web Annotation Model for specifics and the Web Annotation WG for more W3C recommendations.
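As an illustration of the "combination of selectors" idea, the sketch below builds an annotation shaped like the Web Annotation Data Model, with a `TextPositionSelector` (offsets) and a `TextQuoteSelector` (exact text plus surrounding prefix/suffix). The helper functions and the fallback order in `resolve` are my own assumptions, not the Hypothes.is client's algorithm, and the 20-character context is an arbitrary choice.

```python
def make_annotation(source_url, text, start, end, context=20):
    """Build a Web Annotation style dict targeting text[start:end]."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "target": {
            "source": source_url,
            "selector": [
                {"type": "TextPositionSelector", "start": start, "end": end},
                {
                    "type": "TextQuoteSelector",
                    "exact": text[start:end],
                    "prefix": text[max(0, start - context):start],
                    "suffix": text[end:end + context],
                },
            ],
        },
    }


def resolve(annotation, text):
    """Return (start, end) in text, or None if no selector matches."""
    selectors = {s["type"]: s for s in annotation["target"]["selector"]}
    pos = selectors["TextPositionSelector"]
    quote = selectors["TextQuoteSelector"]
    # 1. Offsets still point at the quoted text.
    if text[pos["start"]:pos["end"]] == quote["exact"]:
        return pos["start"], pos["end"]
    # 2. The quote with its surrounding context still appears somewhere.
    found = text.find(quote["prefix"] + quote["exact"] + quote["suffix"])
    if found != -1:
        start = found + len(quote["prefix"])
        return start, start + len(quote["exact"])
    # 3. The quote alone appears exactly once, so it is unambiguous.
    if text.count(quote["exact"]) == 1:
        start = text.find(quote["exact"])
        return start, start + len(quote["exact"])
    return None
```

The prefix and suffix are what let the second "sea" in "She sells sea shells at the sea shore." be told apart from the first once the offsets have shifted.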

Ideally there should be an open ecosystem for web annotations instead of proprietary, locked-in solutions like Genius, or site-specific, like Medium.

 

After finding some way to store it, use HTML diffing to convert location data after edits are saved. Here's an HTML diffing implementation I ported to vanilla JS that could give you a start: github.com/frattaro/htmldiff.js
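To show the idea of converting location data through a diff, here is a small sketch that uses Python's standard `difflib` on plain text. It is a stand-in, not htmldiff.js and not a real HTML diff: tags are ignored entirely, and `shift_highlight` is a made-up name. It gives up (returns `None`) when either end of the highlight was itself edited.

```python
from difflib import SequenceMatcher


def shift_highlight(old, new, start, end):
    """Map the highlight old[start:end] onto new via a character diff.

    Returns (start, end) in new, or None if an endpoint was replaced or deleted.
    """
    opcodes = SequenceMatcher(None, old, new, autojunk=False).get_opcodes()

    def map_char(index):
        # Only characters inside an unchanged ("equal") block can be mapped.
        for tag, i1, i2, j1, _j2 in opcodes:
            if tag == "equal" and i1 <= index < i2:
                return j1 + (index - i1)
        return None

    first = map_char(start)
    last = map_char(end - 1)  # end is exclusive, so map the last character
    if first is None or last is None:
        return None
    return first, last + 1
```

Running this once per stored highlight when an edit is saved keeps the offsets current, so nothing has to be searched for at read time.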

Overlapping might be a pain to code but should be doable. Maybe use data attributes on the wrapper tags.
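One possible way to handle the overlap, sketched under the assumption that the content is plain text: cut the text at every highlight boundary so the wrapper tags never have to nest, and list the highlights covering each piece in a data attribute. The attribute name `data-highlight-ids` and the `render` function are invented for this example.

```python
from html import escape


def render(text, highlights):
    """Wrap highlighted text in non-nested <mark> tags.

    highlights maps a highlight ID to its (start, end) character offsets.
    """
    # Every highlight boundary becomes a cut point between segments.
    cuts = sorted({0, len(text), *(p for rng in highlights.values() for p in rng)})
    parts = []
    for seg_start, seg_end in zip(cuts, cuts[1:]):
        # IDs of all highlights that fully cover this segment.
        ids = sorted(
            hid
            for hid, (start, end) in highlights.items()
            if start <= seg_start and seg_end <= end
        )
        chunk = escape(text[seg_start:seg_end])
        if ids:
            joined = " ".join(str(i) for i in ids)
            parts.append(f'<mark data-highlight-ids="{joined}">{chunk}</mark>')
        else:
            parts.append(chunk)
    return "".join(parts)


passage = "She sells sea shells at the sea shore."
print(render(passage, {"a": (0, 27), "b": (10, 38)}))
```

With the two overlapping highlights from earlier in the thread, the middle segment carries both IDs, so a hover handler can read the attribute to find out whose highlights it is sitting on.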

 

As you said, you didn't really use the feature, and neither did I. So, do people actually want it?

 

I'm not looking to necessarily implement the exact same feature, but the technical considerations are about the same either way. I thought this was the easiest way to describe it.

Whether or not it lands in the app, I don't want the reason to be that I don't know how to build it.

Ben Halpern
A Canadian software developer who thinks he’s funny. He/Him.