Justin Mancinelli

Posted on Jan 23, 2020 • Edited on Jun 19, 2023 • Originally published at piannaf.com

The Dev.to Feed Algorithm 🤖

#meta #rails #beginners #webdev

TL;DR👇

I used to develop apps. I still do, but I used to, too

Back in 2007/2008, I learned Ruby on Rails and developed two prototype sites that didn't end up in production. Since then, I did extensive work on non-Ruby, non-Rails server applications and learned enough about Android and iOS apps to manage the development of mobile apps in my current role.

I never touched Ruby on Rails again...until @anshbansal asked a question that I had asked myself a few times before.

What is the algorithm for dev.to's feed?

Aseem Bansal ・ Jan 12 '20 ・ 1 min read

The following is my deep dive into the dev.to codebase to answer this question. There are probably a few things wrong; please point them out in the comments so I can correct them. Thank you.

Start at the beginning

And it doesn't get much earlier than the root route:
root "stories#index"

Taking control

Rails follows a Model View Controller (MVC) architecture. When you ask dev.to to show you the root page, it will ask the stories controller to run the index action.

That action sets up a bunch of state, then renders the articles/index template:
render template: "articles/index"
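For readers new to Rails, the "stories#index" string reads as controller#action. The toy below is not how Rails routing is implemented. It is a plain-Ruby sketch in which the CONTROLLERS registry and the dispatch helper are hypothetical names of mine, and the index action only returns the name of the template instead of rendering it.

```ruby
# Toy stand-in for the real controller: it only reports which template
# the action would render.
class StoriesController
  def index
    "articles/index"
  end
end

# Hypothetical registry mapping the part before "#" to a controller class.
CONTROLLERS = { "stories" => StoriesController }.freeze

# Split "controller#action" and call that action on a fresh controller.
def dispatch(route)
  controller_name, action = route.split("#")
  CONTROLLERS.fetch(controller_name).new.public_send(action)
end

puts dispatch("stories#index") # the template the toy action names
```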

Show me the stories

If you inspect your dev.to home screen, you'll notice all the articles/stories are listed within an articles-list div. You can find it in the articles/index view as expected.

And here's where we start to see how the feed is populated.

OK, first show me the featured story

The first story in the article list is a featured story.

The algorithm to get the featured story for a logged-in user comes from the stories controller and the articles/index view. I've simplified it by substituting some variables and reorganizing some statements.

@stories = Article.published.limited_column_select.page(1).per(35)
@stories = @stories.
  where("score > ? OR featured = ?", 9, true).
  order("hotness_score DESC")
offset = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
          1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 
          5, 6, 7, 8, 9, 10, 11].sample # random offset, weighted more towards zero
@stories = @stories.offset(offset)

@featured_story = @stories.where.not(main_image: nil).first&.decorate || Article.new

In English:

Fetch a collection of stories that score above 9 or are featured

Order them, starting with the "hottest" one

Randomly skip the first 0 to 11 stories, weighted more towards 0

The featured story is the first story that has a main image

Leaving how score, featured, and hotness are determined as an exercise for the reader

Notice the featured article has nothing to do with which people, organizations, or tags you follow.
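To put numbers on "weighted more towards zero", here is a small sketch that counts how often each value appears in the offsets array. The array is copied from the snippet above; the OFFSETS constant and the offset_probabilities helper are names I made up for the illustration.

```ruby
# The offsets array from the snippet above, copied verbatim.
OFFSETS = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
           1, 1, 1, 1, 2, 2, 2, 3, 3, 4,
           5, 6, 7, 8, 9, 10, 11].freeze

# Chance of each offset when one element is picked uniformly at random,
# which is what Array#sample does.
def offset_probabilities(offsets = OFFSETS)
  counts = offsets.each_with_object(Hash.new(0)) { |offset, tally| tally[offset] += 1 }
  counts.transform_values { |count| Rational(count, offsets.size) }
end

offset_probabilities.each do |offset, probability|
  puts format("offset %2d: %4.1f%%", offset, probability.to_f * 100)
end
```

An offset of 0 comes up 10 times out of 27 (about 37%), while each of 5 through 11 comes up only once (about 3.7%).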

Now show me the rest of the stories?

After rendering the featured story, the articles/index view creates a substories div and then renders the stories/main_stories_feed partial:
<%= render "stories/main_stories_feed" %>

These are not the divs you are looking for

I was scratching my head while reading through the _main_stories_feed partial.

It populates the data attributes of a new-articles-object div and a home-articles-object div, then a bunch of other divs that have no contents. And the divs I do see when inspecting the home screen have the single-article single-article-small-pic class, but don't look like what's in this file.

Evil action-at-a-distance like this can only mean one thing: JavaScript

Nobody expects the Spanish Inquisition

Searching the repo for new-articles-object and home-articles-object, we find them both in initializeFetchFollowedArticles, called very early when a page is initialized.

And there is a lot of logic here which I did not expect.

The new stories are not the old stories

The stories controller populated the @stories collection used for the featured story. It is also used to populate the data attributes of the home-articles-object div. But that comes next, not now.

Instead, the first stories we see after the featured article are populated from a query directly in the view.

@new_stories = Article.published.
  where("published_at > ? AND score > ?", rand(2..6).hours.ago, -15).
  limited_column_select.
  order("published_at DESC").
  limit(rand(15..80))

In English:

Fetch a collection of stories that have been published some time in the last 2 to 6 hours and score above -15

Order them by most recent first

Return the first 15 to 80 of them
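The same three steps can be sketched on an in-memory array, without a database. This is only an illustration: the Story struct, the new_stories helper, and the sample data are hypothetical stand-ins, and the window and limit are passed in as fixed arguments (the view draws them with rand) so the result is repeatable.

```ruby
# Plain-Ruby stand-in for an article row; not the real Article model.
Story = Struct.new(:title, :published_at, :score)

# In-memory equivalent of the query: keep recent stories scoring above -15,
# newest first, capped at `limit`.
def new_stories(stories, now:, window_hours:, limit:)
  cutoff = now - (window_hours * 3600)
  stories
    .select { |story| story.published_at > cutoff && story.score > -15 }
    .sort_by { |story| -story.published_at.to_f }
    .first(limit)
end

now = Time.at(1_579_780_800) # arbitrary fixed "now" so the example is repeatable
stories = [
  Story.new("one hour old", now - 3600, 5),
  Story.new("downvoted", now - 7200, -20),
  Story.new("five hours old", now - (5 * 3600), 0),
  Story.new("half an hour old", now - 1800, 1)
]

# A 4-hour window drops the five-hour-old story; the score filter drops "downvoted".
puts new_stories(stories, now: now, window_hours: 4, limit: 15).map(&:title).inspect
```

Because the window is drawn at random on each page load, a story that is five hours old can appear on one load and be missing on the next.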

Then the JavaScript function insertNewArticles takes over:

articlesJSON.forEach(function(article){
      var articlePoints = 0
      var containsUserID = findOne([article.user_id], user.followed_user_ids || [])
      var containsOrganizationID = findOne([article.organization_id], user.followed_organization_ids || [])
      var intersectedTags = intersect_arrays(user.followed_tag_names, article.cached_tag_list_array)
      var followedPoints = 1
      var experienceDifference = Math.abs(article['experience_level_rating'] - user.experience_level || 5)
      var containsPreferredLanguage = findOne([article.language || 'en'], user.preferred_languages_array || ['en']);
      JSON.parse(user.followed_tags).map(function(tag) {
        if (intersectedTags.includes(tag.name)) {
          followedPoints = followedPoints + tag.points
        }
      })
      articlePoints = articlePoints + (followedPoints*2) + article.positive_reactions_count
      if (containsUserID || article.user_id === user.id) {
        articlePoints = articlePoints + 16
      }
      if (containsOrganizationID) {
        articlePoints = articlePoints + 16
      }
      if (containsPreferredLanguage) {
        articlePoints = articlePoints + 1
      } else {
        articlePoints = articlePoints - 10
      }
      var rand = Math.random();
      if (rand < 0.3) {
        articlePoints = articlePoints + 3
      } else if (rand < 0.6) {
        articlePoints = articlePoints + 6
      }
      articlePoints = articlePoints - (experienceDifference/2);
      article['points'] = articlePoints
    });
    var sortedArticles = articlesJSON.sort(function(a, b) {
      return b.points - a.points;
    });
    sortedArticles.forEach(function(article){
      var parent = insertPlace.parentNode;
      if ( article.points > 12 && !document.getElementById("article-link-"+article.id) ) {
        insertArticle(article,parent,insertPlace);
      }
    });

In English:

Give each article 0 points to start off with

Start from 1, add the weight (which can also be negative) of each tag the user follows and this article is tagged with, then double it

Now add to that, the number of positive reactions the article currently has

If the user follows the article's author, or is the article's author, add 16 points

If the user follows the article's organization, add 16 points

If the article is written in the user's language, add 1 point, otherwise, subtract 10 points

Randomly give the article an extra 3 points (30% chance), 6 points (30% chance), or nothing (40% chance).

Subtract half the difference between this article's experience level and the user's experience level

Order the articles by most points first

If the article has more than 12 points, show it to the user
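To see the arithmetic on concrete numbers, here is a Ruby port of the scoring step. It is my own sketch, not code from the dev.to repo: the hash keys are simplified, the roll argument stands in for Math.random() so results can be reproduced, and both experience levels are assumed to be present. One detail worth noting: the JavaScript expression `a - b || 5` parses as `(a - b) || 5`, so when the two experience levels are equal the difference is treated as 5 rather than 0, and the port mirrors that.

```ruby
# Ruby port of the per-article scoring. `roll` stands in for Math.random().
def article_points(article, user, roll: rand)
  followed_points = 1
  user[:followed_tags].each do |tag|
    followed_points += tag[:points] if article[:tags].include?(tag[:name])
  end

  points = (followed_points * 2) + article[:positive_reactions_count]
  follows_author = user[:followed_user_ids].include?(article[:user_id])
  points += 16 if follows_author || article[:user_id] == user[:id]
  points += 16 if user[:followed_organization_ids].include?(article[:organization_id])
  points += user[:preferred_languages].include?(article[:language] || "en") ? 1 : -10

  # 30% chance of +3, 30% chance of +6, 40% chance of no bonus.
  if roll < 0.3
    points += 3
  elsif roll < 0.6
    points += 6
  end

  # Mirrors the JavaScript `(rating - level) || 5`: a zero difference becomes 5.
  difference = article[:experience_level_rating] - user[:experience_level]
  difference = 5 if difference.zero?
  points - (difference.abs / 2.0)
end

user = {
  id: 1,
  followed_tags: [{ name: "ruby", points: 1.0 }, { name: "javascript", points: 2.0 }],
  followed_user_ids: [42],
  followed_organization_ids: [],
  preferred_languages: ["en"],
  experience_level: 3
}
by_followed_author = { tags: %w[ruby rails], positive_reactions_count: 4, user_id: 42,
                       organization_id: nil, language: "en", experience_level_rating: 5 }
by_stranger = { tags: %w[go], positive_reactions_count: 3, user_id: 7,
                organization_id: nil, language: "fr", experience_level_rating: 3 }

puts article_points(by_followed_author, user, roll: 0.5) # (1 + 1.0) * 2 + 4 + 16 + 1 + 6 - 1.0
puts article_points(by_stranger, user, roll: 0.9)        # 1 * 2 + 3 - 10 + 0 - 2.5
```

With these inputs the first article clears the 12-point threshold and the second does not, so only the first would be inserted into the feed.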

What about the rest?

The next batch of initialized articles comes from the same collection we got the featured article from and is processed by a new (but familiar) algorithm in insertTopArticles.

When you get to the bottom of that list, articles are populated from an algoliasearch index of ordered articles. The definition of that index is found in the Article model.

Finally, scrolling kicks in (see initScrolling.js.erb) and populates more articles from the algoliasearch index.

Leaving the details of these as an exercise for the reader

TL;DR

For the first article in the list:

Fetch a collection of stories that score above 9 or are featured

Order them, starting with the "hottest" one

Randomly skip the first 0 to 11 stories, weighted more towards 0

The featured story is the first story that has a main image

For the next batch of articles:

Fetch a collection of stories that have been published some time in the last 2 to 6 hours and score above -15

Order them by most recent first

Return the first 15 to 80 of them

Give each article 0 points to start off with

Start from 1, add the weight (which can also be negative) of each tag the user follows and this article is tagged with, then double it

Now add to that, the number of positive reactions the article currently has

If the user follows the article's author, or is the article's author, add 16 points

If the user follows the article's organization, add 16 points

If the article is written in the user's language, add 1 point, otherwise, subtract 10 points

Randomly give the article an extra 3 points (30% chance), 6 points (30% chance), or nothing (40% chance).

Subtract half the difference between this article's experience level and the user's experience level

Order the articles by most points first

If the article has more than 12 points, show it to the user

If you've scrolled past all of those,

Using the same collection the featured article came from

Process with an algorithm similar, but not identical, to the one used for the previous batch

And, finally

All articles ordered by hotness

Closing remarks

This could change at any time. For example, on 2019-09-19, @ben merged a PR to add more variation to home feed. All links to GitHub point to the commit that was in master at the time of writing but, by the time you read this, master has probably moved on.

Top comments (23)

Ben Halpern • Jan 23 '20

This is really timely because we've just begun the phase of overhauling this. @nickytonline and @joshpuetz should check this out 😄

Justin Mancinelli • Jan 23 '20

Ha, really glad I added the disclaimer

by the time you read this, master has probably moved on.

Really happy you all keep iterating on every aspect, with community involvement 🙌, to keep improving

Madza • Jan 23 '20 • Edited

Reminds me of Jose Aguinaga's famous article about JS development.
First line: No new frameworks were made during the writing.
Top comment below: I highly doubt that.

Nick Taylor • Jan 23 '20

Thanks for sharing Justin. 🔥

Madza • Jan 23 '20 • Edited

Someone should do this about YouTube recommendations algo.

Emma Goto 🍙 • Jan 23 '20

Fetch a collection of stories that have been published some time in the last 2 to 6 hours and score above -15

I set up an RSS feed from my website and was using that to publish to DEV, but since I don't have timestamps set up properly on my RSS feed when it published to DEV it would immediately show as published "20 hours ago". Once I started going in and manually modifying the timestamps it seems like more people have been viewing my posts. I guess this is why!

Justin Mancinelli • Jan 24 '20

Definitely. This reminds me of a PR I saw not too long ago

Ability to backdate a post #3455

janedotbiz commented on Jul 11, 2019

Is your feature request related to a problem? Please describe. Unable to change publish date. I personally wish to back date a post, but there is no way to set the publish date (or time) for a post.

Describe the solution you'd like Add a custom variable for publish_date

Describe alternatives you've considered Time travel?

Additional context In lieu of being able to delete/edit comments on an old post I have duplicated and republished it as a new post, but the date does not/cannot reflect the origin publish date.

Semi related to #3274 and #1363

And @jess posed a great question

if a post is backdated, would we still surface it as new content?

Josh Puetz • Jan 28 '20

Just wanted to add another thanks for this deep dive @piannaf as I've been referencing it over the past day. First up is getting all of these pieces in the same place: while technically someone could change the feed algorithm right now, one would need to change code in multiple places. That's part of what we're trying to improve!

Justin Mancinelli • Jan 28 '20

Wow, thanks! That's an unintended side-effect I'm really glad has been beneficial.

Aseem Bansal • Jan 23 '20

Thank you for taking the time to answer the question in such detail. I believe many users were expecting to see only the tags and users they are following, in chronological order, perhaps like an RSS feed.

Justin Mancinelli • Jan 23 '20

Yeah, when I first joined, that's what I expected "feed" to mean. Pretty quickly discovered that wasn't the case. But I've been happy with the recommendations because I like seeing things outside my chosen bubble from time to time.

Can understand, though, people getting upset if they put -100 on a tag and still saw it anywhere