Early on in the pandemic, I decided that I wanted a way to track the moving average of cases per day in my state, Mississippi, since that wasn't so...
Great work Kayla!
A quick tip. If you use the async-await pattern, you don't need the ".then" pattern.
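The snippet from this comment isn't reproduced above, but the tip amounts to something like this (a sketch with a placeholder URL, not the post's actual code):

```js
// .then style
function getPageThen(url) {
  return fetch(url).then((response) => response.text());
}

// Same logic with async/await -- no .then chain needed
async function getPageAwait(url) {
  const response = await fetch(url);
  return response.text();
}
```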
The problem with this approach is that it blocks all the next await chains. Instead, opt for something like this.
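The code from this reply isn't shown here either, but judging from the discussion below it was a Promise.all version along these lines (again a sketch, not the actual snippet):

```js
// Kick off all requests at once and wait for them together
async function getAllPages(urls) {
  const responses = await Promise.all(urls.map((url) => fetch(url)));
  return Promise.all(responses.map((response) => response.text()));
}
```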
You could do that in certain cases. But there are only four requests here. Plus, by doing them serially to the same domain, you're not overwhelming the one server endpoint in this case.
Actually, as a robustness strategy, with multiple calls to the same domain, you typically want to insert manual delays to prevent server rejections. I'll usually place a "sleep" between calls (see the sketch after this comment). It's not a race to get the calls done as fast as possible. The goal is robustness.
Also, this is a background job, so saving a second or two is not a critical performance criterion here. In this case one request isn't dependent on another, so it's fine, but if it were, you'd want to run them serially.
Also, with Promise.all, if one fails, they all fail. It's less robust. With the serial approach, each request is atomic and can succeed on its own. I.e., getting 3 out of 4 successful results is better than getting 0 out of 4 successes, even though some of them succeeded.
Also, you now have an array of resolved promises that you still have to loop through and process. With the serial approach, it was done in 4 lines. Much easier to grok. Much easier to read. Much easier to debug.
If I had to do dozens of requests that had no order dependency on each other, and they all went to various domains, and 100% of them had to succeed otherwise all fail, then Promise.all is certainly the way to go. If you have 2 or 3 or 4 requests, there's really no compelling benefit. Default to simple.
So there are pros and cons to each approach to consider.
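For illustration, here's a minimal sketch of the serial-with-delay approach described in that comment (the sleep helper, error handling, and URLs are placeholders, not the post's actual code):

```js
// Simple delay helper
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Fetch each page one at a time, pausing between calls
async function getAllPagesSerially(urls) {
  const pages = [];
  for (const url of urls) {
    try {
      const response = await fetch(url);
      pages.push(await response.text());
    } catch (error) {
      // One failed request doesn't sink the others
      console.error(`Request to ${url} failed:`, error);
    }
    await sleep(1000); // be polite to the server between calls
  }
  return pages;
}
```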
Thanks for the input!
Ahan, thank you for presenting the case in such detail. These things never occurred to me and now I know better. :)
The point about robustness is valid, but for the sake of it, I'll mention that the potential issue with Promise.all can be avoided by using Promise.allSettled instead.
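For reference, a rough sketch of what that could look like (the URLs and the shape of the data are placeholders):

```js
// Fire all requests together, but keep whatever succeeds
async function getAllPagesSettled(urls) {
  const results = await Promise.allSettled(
    urls.map((url) => fetch(url).then((response) => response.text()))
  );
  // Rejected requests don't fail the whole batch
  return results
    .filter((result) => result.status === 'fulfilled')
    .map((result) => result.value);
}
```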
Good to know! Thanks for the tip! I'll change that in my code!
This is a really nice write-up and a great idea.
Are you running Ubuntu or some other Linux version as your desktop OS? Just curious.
I run Ubuntu 20.04 (I switched to Ubuntu about 2 years ago, after using Windows for over 25 years...seriously).
Thanks for sharing this interesting article.
Thanks! Nope, Ubuntu was just the default option in that YAML file, and I didn't change it because it worked just fine. I'm an OSX person.
Tip, there's no point in using a try-catch if you're just going to re-throw the error. In other words, a function that wraps its body in a try-catch only to re-throw is functionally identical to one with no try-catch at all.
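The two snippets from that comment aren't shown here, but the point looks roughly like this (hypothetical function names and URL):

```js
// With a try-catch that only re-throws...
async function getDataWithTryCatch(url) {
  try {
    const response = await fetch(url);
    return await response.json();
  } catch (error) {
    throw error; // adds nothing -- the caller sees the same rejection
  }
}

// ...the behavior is identical to not catching at all
async function getData(url) {
  const response = await fetch(url);
  return response.json();
}
```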
Tip, if you're going to give unsolicited advice on someone's code in a blog post, at least comment about the core of the post. Like, "hey great job! This is something you could do instead:" or "hey this blog sucked and here's why:"
I see your account is brand new, so I'm hoping to help educate you on a little bit of etiquette in regards to constructive criticism. Hope that helps!
Well seeing as you asked, here's why this blog sucks:
A web crawler is not a technically challenging task. As we can see from the post above, it's just a combination of basic web programming steps such as making HTTP requests and HTML parsing.
That it took someone whose core competency is front-end development 9 years to tackle something so straightforward is fairly risible. I don't see how you can claim to be a "senior-ish" front-end engineer yet be apparently ignorant of core concepts such as error-handling and asynchronous programming.
Hope this helps!
Thanks for reading!
I had to write some crawlers when I first started. It's definitely one of those things where you start out having no clue how to do it and thinking it's going to be extremely tough.
I like the simple approach you took to store the data in a JSON file and use GitHub Actions to update it (see the sketch after this comment). It's nice when we can keep things simple instead of spinning up databases and complex infrastructure.
Also, congratulations on getting over the intimidation of doing this. We senior developers definitely do get afraid of tasks. We need to remember that we can always learn how to do something, just like how we didn't know how to write code before we were developers.
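As a rough sketch of that kind of simple JSON-file storage (not the post's actual code; the file name and record shape are made up):

```js
const fs = require('fs/promises');

// Append today's numbers to a flat JSON file committed to the repo
async function appendRecord(record, file = 'data.json') {
  let records = [];
  try {
    records = JSON.parse(await fs.readFile(file, 'utf8'));
  } catch {
    // First run: the file doesn't exist yet
  }
  records.push(record);
  await fs.writeFile(file, JSON.stringify(records, null, 2));
}
```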
Definitely! Keeping it simple was really important to me because I didn't want to spend a long time getting it set up or potentially spend even longer maintaining it.
Thanks! Yeah, this is one that's been haunting me for sure, so it gave me a good confidence boost! And that's very true!
Thanks for reading!
Thank you so much for sharing. I have been really curious about writing cron jobs, and it was so helpful to read through your thought process; it seems a lot less intimidating now.
Awesome!!! So glad I could help. Thank you for reading!
I used cheerio for a while but realized it's very slow when processing a lot of pages.
I'd recommend using regex instead; I cut my processing time down by 80% (see the sketch after this comment).
Good job though, it's a great start!
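Purely as an illustration of that regex suggestion (the HTML structure and pattern below are invented for the example, and regex extraction can break if the markup changes):

```js
// cheerio-style extraction (for comparison):
//   const $ = cheerio.load(html);
//   const count = $('#total-cases').text();

// Regex-style extraction of the same hypothetical element
function extractTotalCases(html) {
  const match = html.match(/id="total-cases"[^>]*>([\d,]+)</);
  return match ? match[1] : null;
}
```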
Good to know! I'm just collecting data from one page for now, but I'll definitely keep that in mind. Thank you!
As you say, I've not done it, and thank you for showing how!!
Thank you for reading!
Amazing tutorial, Kayla. And congratulations on getting over the intimidation you had to live with for 9 years! It's a win and I'm happy you got over it.
Thank you so much! I'm happy I got over it too.
That's awesome! I love your initiative. Also, I'm glad that cases seem to be going down lately.
Thanks! Yeah, I'm pretty thankful for that. I believe we have more people vaccinated than have tested positive in Mississippi at this point, so I'm optimistic!