DEV Community

Scrape your Dev.to pageviews with Rust

Ben Lovy (deciduously)

Here's a quick 'n' dirty way to dump your new-fangled post analytics to a CSV using Rust. Y'know, for graphs and stuff. Who doesn't like graphs? You have to save the page source to src/page.html first, because include_str! reads that file at compile time.

This ain't polished; it was my "one-hour-before-my-day-job-starts" project today. Snag the regex for your own real version, or improve this one and show me!

extern crate chrono;
extern crate csv;
#[macro_use]
extern crate lazy_static;
extern crate regex;
extern crate select;
extern crate serde;
#[macro_use]
extern crate serde_derive;

use chrono::prelude::*;
use regex::Regex;
use select::{
    document::Document,
    predicate::{Class, Name},
};
use std::{
    error::Error,
    fs::{File, OpenOptions},
};

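lazy_static! {
    // Timestamp shared by every record written in this run.
    static ref NOW: DateTime<Local> = Local::now();
    // Captures three numbers separated by `//`, in the order views, reactions, comments.
    static ref STAT_RE: Regex = Regex::new(".+?([0-9]+).+//.?([0-9]+).+//.?([0-9]+).+").unwrap();
}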

#[derive(Debug, Serialize)]
struct Record {
    time: String,
    title: String,
    views: i32,
    reactions: i32,
    comments: i32,
}

impl Record {
    fn new(time: String, title: String, views: i32, reactions: i32, comments: i32) -> Self {
        Self {
            time,
            title,
            views,
            reactions,
            comments,
        }
    }
}

// Serialize each record as one CSV row into the given file.
fn write_entries(rs: Vec<Record>, f: File) -> Result<(), Box<dyn Error>> {
    let mut wtr = csv::Writer::from_writer(f);
    for r in rs {
        wtr.serialize(r)?;
    }
    wtr.flush()?;
    Ok(())
}

// Build one Record per pageviews indicator found in the saved page.
fn scrape_page(doc: &Document) -> Result<Vec<Record>, Box<dyn Error>> {
    let mut ret = Vec::new();
    for node in doc.find(Class("dashboard-pageviews-indicator")) {
        let text = node.text();
        if STAT_RE.is_match(&text) {
            // Walk up two levels, then take the title from the first <a>'s <h2>.
            let title = node
                .parent()
                .unwrap()
                .parent()
                .unwrap()
                .find(Name("a"))
                .next()
                .unwrap()
                .find(Name("h2"))
                .next()
                .unwrap()
                .text();
            for cap in STAT_RE.captures_iter(&text) {
                let r = Record::new(
                    NOW.to_rfc2822(),
                    title.clone(),
                    cap[1].parse::<i32>()?,
                    cap[2].parse::<i32>()?,
                    cap[3].parse::<i32>()?,
                );
                ret.push(r);
            }
        }
    }
    Ok(ret)
}

// Parse the embedded page, then append the scraped stats to stats.csv.
fn run() -> Result<(), Box<dyn Error>> {
    let doc = Document::from(include_str!("page.html"));
    let file = OpenOptions::new()
        .write(true)
        .create(true)
        .append(true)
        .open("stats.csv")?;
    let entries = scrape_page(&doc)?;
    write_entries(entries, file)?;
    Ok(())
}

fn main() {
    if let Err(e) = run() {
        eprintln!("Error: {}", e);
        ::std::process::exit(1);
    }
}
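
If you'd rather not pull in the regex crate for your own version, the same extraction can be sketched with the standard library alone. This is a rough sketch, not the code above: it assumes the indicator text has the shape `<views> ... // <reactions> ... // <comments> ...` (inferred from the regex, not checked against the live page), and `first_number` and `parse_stats` are made-up helper names.

```rust
/// Return the first run of ASCII digits in `s`, parsed as an i32.
fn first_number(s: &str) -> Option<i32> {
    let digits: String = s
        .chars()
        .skip_while(|c| !c.is_ascii_digit())
        .take_while(|c| c.is_ascii_digit())
        .collect();
    // An empty string fails to parse, which gives us None.
    digits.parse().ok()
}

/// Split the indicator text on "//" and take one number from each part.
/// The "views // reactions // comments" layout is an assumption.
fn parse_stats(text: &str) -> Option<(i32, i32, i32)> {
    let mut parts = text.split("//").map(first_number);
    let views = parts.next()??;
    let reactions = parts.next()??;
    let comments = parts.next()??;
    Some((views, reactions, comments))
}

fn main() {
    // Hypothetical sample text; the real page may differ.
    let sample = "  123 views // 4 reactions // 5 comments ";
    match parse_stats(sample) {
        Some((v, r, c)) => println!("views={} reactions={} comments={}", v, r, c),
        None => eprintln!("no stats found"),
    }
}
```

Unlike the regex, this doesn't need any characters before the first number, so it also copes with text that starts directly with a digit.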

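One thing to watch when running this more than once: stats.csv is opened in append mode and a fresh csv::Writer is built each run, so the header row will most likely be repeated on every run. Below is a minimal std-only sketch of the usual workaround, which writes the header only when the file is missing or empty. `needs_header` and `append_row` are hypothetical names, and the rows are plain strings rather than serialized records.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// True when the file is missing or empty, i.e. no header has been written yet.
fn needs_header(path: &Path) -> bool {
    fs::metadata(path).map(|m| m.len() == 0).unwrap_or(true)
}

/// Append one row, writing the header first only on a fresh file.
fn append_row(path: &Path, header: &str, row: &str) -> io::Result<()> {
    // Check before opening, because `create(true)` would make the file exist.
    let write_header = needs_header(path);
    let mut f = OpenOptions::new().create(true).append(true).open(path)?;
    if write_header {
        writeln!(f, "{}", header)?;
    }
    writeln!(f, "{}", row)
}

fn main() -> io::Result<()> {
    // Scratch file in the temp dir so the sketch doesn't touch a real stats.csv.
    let path = std::env::temp_dir().join("stats_sketch.csv");
    let _ = fs::remove_file(&path);
    let header = "time,title,views,reactions,comments";
    append_row(&path, header, "run1,Some post,1,2,3")?;
    append_row(&path, header, "run2,Some post,4,5,6")?;
    print!("{}", fs::read_to_string(&path)?);
    Ok(())
}
```

With the csv crate itself, `csv::WriterBuilder::new().has_headers(false)` should switch off the automatic header row, so a check like `needs_header` could decide which kind of writer to build.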

Edit: finished off the error handling.
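
The title lookup still leans on a chain of `.unwrap()` calls, which will panic if the page layout changes. One way to fold those into the same `Result` flow is `Option::ok_or` followed by `?`. The sketch below shows the pattern on `std::path::Path`, whose `parent()` also returns an `Option`; it is a stand-in to keep the example std-only, not the select API, and `grandparent` is a hypothetical name.

```rust
use std::error::Error;
use std::path::Path;

/// Walk up two levels, turning each missing parent into an error instead of a panic.
fn grandparent(p: &Path) -> Result<&Path, Box<dyn Error>> {
    // `ok_or` converts Option into Result; `?` then converts the &str into Box<dyn Error>.
    let parent = p.parent().ok_or("no parent")?;
    let grand = parent.parent().ok_or("no grandparent")?;
    Ok(grand)
}

fn main() {
    match grandparent(Path::new("/a/b/c")) {
        Ok(g) => println!("grandparent: {}", g.display()),
        Err(e) => eprintln!("Error: {}", e),
    }
    // The root has no parent, so this reports an error rather than panicking.
    if let Err(e) = grandparent(Path::new("/")) {
        eprintln!("Error: {}", e);
    }
}
```

The same shape should carry over to the node lookups in `scrape_page`, since each step there also yields an `Option`.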

Discussion (11)

Ben Halpern

You may have given me an excuse to finally execute some Rust code! I’ve done lots of reading but haven’t actually tried using Rust yet.

Ben Lovy (Author)

Evangelizing: complete.

Ben Halpern

I'll make a post about my hello DEV scrapity-scrape world experience once I'm done. 🙂

Meghan (she/her)

Adding the view count to https://dev.to/deciduously/scrape-your-devto-pageviews-with-rust-2dgc.json?

Ben Lovy (Author)

That's a good idea!

Anton

Ben, do you have any interesting Rust open source projects on your radar? Something that you yourself learn from or contribute to?

I recently googled "Emacs written in Rust" and found this one: remacs.

Ben Lovy (Author)

Remacs is what I would have suggested! I haven't gotten too involved, but there are a lot of little bite-sized translations of all the Lisp functions from C to look at. There's also always the Servo project, and ripgrep.

Anton

Another one that's interesting is bat (as an alternative to cat).

Anton

Thanks, this is interesting.

Ben Lovy (Author)

Full docs for select.rs here. It's a viable alternative to, say, Python, in my opinion.

John Bull

This is really cool, Ben, thanks for sharing!