DEV Community

DEV 2020 Year-in-Review: Scraping data using the Console

Tyler Smith (tylerlwsmith) ・6 min read

The year is almost over, and I wanted to see how many posts I wrote in 2020 and how many views/comments/reactions they received. DEV doesn't offer this as a feature, so I hacked together a script in the Dev Tools to help me find out.

This post will show you how to use JavaScript's query selector methods in combination with the map(), filter() and reduce() array methods to pull data off of a page and process it into the following shape:

{reactions: 23, comments: 4, views: 740, numberOfStories: 5}

The query selector methods are tightly coupled to the page's markup, so if DEV changes the class names it uses on the Dashboard page, this code may break. However, the techniques themselves will still work, and you can adapt the code as the markup changes.

This article will assume you have a functional knowledge of CSS selectors and you are using the Chrome browser.

Step 1: Go to your DEV dashboard

On the top right corner of DEV, click on your profile icon, then click "Dashboard" to go to your dashboard page.

Find the class name of the stories

On the Dashboard page, right click on the page and then click "Inspect" in the context menu. This will open the browser Dev Tools.

Once in the Dev Tools, click the "select element" icon at the top-left of the Dev Tools panel, then click on the containing element for one of the stories. The selected element will be highlighted in the "Elements" tab.

[Screenshot: Select element instructions]

We can see that the stories all have the class name dashboard-story. We'll need that in the next step.

Make an array of stories in the console

On the top of the Dev Tools dashboard, click "Console." The console gives you an interactive environment to write JavaScript commands.

[Screenshot: Open your console]

We want to use JavaScript to get all of the elements with the dashboard-story class. We'll use the querySelectorAll method.

The document.querySelectorAll() method gives you a jQuery-like API to get all of the HTML elements matching a CSS selector.

Type the following into the console. Press Shift + Enter for new lines, then press Enter to execute the code.

let stories = document.querySelectorAll('.dashboard-story');
console.log(stories);

In the console, you'll see that the querySelectorAll() method returns a NodeList data structure that contains all of your stories. However, NodeLists don't let us use array methods like map(), filter() and reduce(). We must convert the NodeList into an array. We can do this by spreading the NodeList into an array.

let stories = [...document.querySelectorAll('.dashboard-story')];
console.log(stories);

This returns an array of all the story elements.
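
If the spread syntax is unfamiliar, Array.from() is another way to make the same conversion. The sketch below uses a plain array-like object as a stand-in for a NodeList (an assumption, so that it can run outside the browser). On the Dashboard you would pass document.querySelectorAll('.dashboard-story') instead.

```javascript
// A stand-in for the NodeList that querySelectorAll() returns.
// It is array-like (indexed keys plus length) but has no map() method.
const fakeNodeList = { 0: 'first story', 1: 'second story', length: 2 };

// Array.from() copies any array-like or iterable value into a real array.
const storiesArray = Array.from(fakeNodeList);

console.log(typeof fakeNodeList.map); // "undefined"
console.log(Array.isArray(storiesArray)); // true
console.log(storiesArray.map((story) => story.toUpperCase()));
```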

Removing unpublished posts

We only want stats on our published posts, so we want to remove unpublished posts from our array. While we could do that with JavaScript filtering, it's easiest to remove them by changing our selector string.

If you don't have any posts in draft, just follow along using the screenshot below.

[Screenshot: Finding a draft post]

In the Dev Tools "Elements" tab, inspect an unpublished post. You can see that unpublished posts have the class story-unpublished on them. Back in the Console, we'll update our querySelectorAll() selector string to exclude unpublished posts using :not().

let publishedStories = [...document.querySelectorAll('.dashboard-story:not(.story-unpublished)')];
console.log(publishedStories);

Now your array will only include published posts.
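
For comparison, the JavaScript filtering approach mentioned earlier could look roughly like this. The fakeStory() objects are hypothetical stand-ins for story elements so the sketch runs anywhere; real elements expose the same classList.contains() method.

```javascript
// Hypothetical stand-in for a DOM element: only classList.contains() is modeled.
function fakeStory(...classNames) {
    return { classList: { contains: (name) => classNames.includes(name) } };
}

const allStories = [
    fakeStory('dashboard-story'),
    fakeStory('dashboard-story', 'story-unpublished'),
    fakeStory('dashboard-story'),
];

// Keep only the stories that do NOT carry the unpublished class.
const publishedOnly = allStories.filter(function(story) {
    return !story.classList.contains('story-unpublished');
});

console.log(publishedOnly.length); // 2
```

Both approaches should give the same result; the :not() selector simply keeps the filtering in one place.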

Creating data objects from the markup

We have our published stories in an array, but we need to get data out of the markup. We'll use JavaScript's map() method, which creates a new array by running each array item through a function.

let publishedStories = document.querySelectorAll('.dashboard-story:not(.story-unpublished)');

let storyData = [...publishedStories].map(function(story) {
    return {
        title: story.querySelector('.dashboard-story__title').innerText,
        published: story.querySelector('time').dateTime,
        reactions: story.querySelector('[title="Reactions"]').innerText,
        comments: story.querySelector('[title="Comments"]').innerText,
        views: story.querySelector('[title="Views"]').innerText,
    }
});

console.log(storyData);

One of the powerful features of the querySelector() and querySelectorAll() methods is that you can call them on any DOM element to query its descendants.

Here, we are using querySelector, which returns the first matching element, and we're using class, element, and attribute selectors to get the data we need from each story. Then we are returning the text or datetime from each element. Use the inspector to find these HTML elements and their corresponding attributes.

Mapping through this returns an array of data objects from each story.
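
Note that querySelector() returns null when nothing matches, so reading .innerText would throw if DEV renames one of these elements. One possible guard, sketched here with a hypothetical readText() helper and a stand-in story object (neither is part of the original script), uses optional chaining to fall back to null:

```javascript
// Hypothetical helper: returns the matched element's text, or null if the
// selector matches nothing.
function readText(element, selector) {
    return element.querySelector(selector)?.innerText ?? null;
}

// Stand-in for a story element so the sketch runs outside the browser.
// It only "contains" a reactions element.
const fakeStoryElement = {
    querySelector(selector) {
        return selector === '[title="Reactions"]' ? { innerText: '12' } : null;
    },
};

console.log(readText(fakeStoryElement, '[title="Reactions"]')); // "12"
console.log(readText(fakeStoryElement, '[title="Views"]')); // null
```

In the console, you could call readText(story, '[title="Views"]') inside the map() callback in place of the direct querySelector() call.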

Filtering to stories published in 2020

Now we need to filter to stories published in 2020. The filter() array method lets you evaluate an expression as true or false, and create a new array from any item that evaluates as true.

We will convert the timestamp into a JavaScript Date object so we can easily extract the year, then compare it to 2020.

// Include the previous code...
let storiesFrom2020 = storyData.filter(function(story) {
    // Parse the timestamp so we can read its year.
    let publicationDate = new Date(story.published);
    return publicationDate.getFullYear() === 2020;
});
console.log(storiesFrom2020);

This will leave you with an array that only includes stories from 2020.
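
To see the filter in isolation, here is a small sketch that runs on invented sample data in the same shape as storyData (the titles and timestamps are made up for illustration):

```javascript
// Invented sample data in the same shape as storyData.
const sampleStories = [
    { title: 'Older post', published: '2019-11-02T12:00:00Z' },
    { title: 'First 2020 post', published: '2020-03-10T12:00:00Z' },
    { title: 'Second 2020 post', published: '2020-08-21T12:00:00Z' },
];

// Keep only the stories whose publication year is 2020.
const sampleFrom2020 = sampleStories.filter(function(story) {
    return new Date(story.published).getFullYear() === 2020;
});

// Logs the titles of the two 2020 posts.
console.log(sampleFrom2020.map((story) => story.title));
```

Keep in mind that getFullYear() uses your local time zone, so a post published within a few hours of New Year's could land in either year depending on where you are.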

Reducing your data to the totals

We now have the data from each story published in 2020 as an object, but we want the totals. We can use JavaScript's reduce() array method to create a new object with totals.

The reduce() array method passes each item of an array to a function that does a transformation on it, then returns a result that can be any shape: a string, a number, a new array, etc. That result is then passed into the next call of the reduce function, through an accumulator. The reduce() method also takes in an initial value for the accumulator.
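
As a warm-up, here is a minimal sketch of reduce() summing a plain array of numbers. The view counts are invented for illustration:

```javascript
// Invented view counts for three posts.
const viewCounts = [120, 45, 300];

// The accumulator starts at 0 (the initial value) and carries the running total.
const totalViews = viewCounts.reduce(function(accumulator, current) {
    return accumulator + current;
}, 0);

console.log(totalViews); // 465
```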

We'll use it to return a totals object, which adds together the totals from each post.

Look at the code below:

// Include the previous code...
let totals = storiesFrom2020.reduce((accumulator, current) => {
    return {
        reactions: accumulator.reactions + +current.reactions,
        comments: accumulator.comments + +current.comments,
        views: accumulator.views + (Number.isNaN(+current.views) ? 0 : +current.views),
    }
}, {
    reactions: 0,
    comments: 0,
    views: 0,
});

console.log(totals);

There are a few "gotchas" here that the code is handling:

  1. Initial value. We need to explicitly pass in the initial value of what we want our returned object to look like.
  2. String to number conversion. See the plus signs in front of the current values? Those take the string form of a number (example: "42") and convert it to a proper number, so the values are added instead of concatenated.
  3. Unknown view counts. Any post with fewer than 25 views will display as "< 25". This is not a number, so we use the ternary operator to count it as zero.
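
The second and third gotchas can be seen in isolation in the short sketch below. The toCount() helper is a hypothetical name used only for illustration; the reduce() code above does the same thing inline.

```javascript
console.log('42' + '8');   // "428" (string concatenation)
console.log(+'42' + +'8'); // 50 (numeric addition)

// Hypothetical helper: converts a stat's text to a number, using 0 when unknown.
function toCount(text) {
    const value = +text;
    return Number.isNaN(value) ? 0 : value;
}

console.log(toCount('740'));  // 740
console.log(toCount('< 25')); // 0
```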

Finally, we can add the total number of posts from the year by adding a numberOfStories key to the totals object.

totals['numberOfStories'] = storiesFrom2020.length;
console.log(totals);

Putting it all together

Here is the complete code you'll end up with in the console:


let publishedStories = document.querySelectorAll('.dashboard-story:not(.story-unpublished)');

let storyData = [...publishedStories].map(function(story) {
    return {
        title: story.querySelector('.dashboard-story__title').innerText,
        published: story.querySelector('time').dateTime,
        reactions: story.querySelector('[title="Reactions"]').innerText,
        comments: story.querySelector('[title="Comments"]').innerText,
        views: story.querySelector('[title="Views"]').innerText,
    }
});

let storiesFrom2020 = storyData.filter(function(story) {
    // Parse the timestamp so we can read its year.
    let publicationDate = new Date(story.published);
    return publicationDate.getFullYear() === 2020;
});

let totals = storiesFrom2020.reduce((accumulator, current) => {
    return {
        reactions: accumulator.reactions + +current.reactions,
        comments: accumulator.comments + +current.comments,
        views: accumulator.views + (Number.isNaN(+current.views) ? 0 : +current.views),
    }
}, {
    reactions: 0,
    comments: 0,
    views: 0,
});

totals['numberOfStories'] = storiesFrom2020.length;
console.log(totals);

Here's how I did in 2020:

{reactions: 193, comments: 52, views: 8269, numberOfStories: 14}

I want to reiterate: this is closely tied to DEV's markup and will almost certainly break in the future. There are likely also cases that this script doesn't handle that would cause it to fail. However, you can use the steps from this post to adjust the script as the site changes.

I hope you found this helpful. Feel free to ask questions below or even leave a comment with your 2020 stats!
