Early on in the pandemic, I decided that I wanted a way to track the moving average of cases per day in my state, Mississippi, since that wasn't so...
Great work Kayla!
A quick tip. If you use the async-await pattern, you don't need the ".then" pattern.
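The snippet from this comment isn't reproduced above, but the tip amounts to something like this (a sketch with a placeholder URL, not the post's actual code):

```js
// .then style
function getPageThen(url) {
  return fetch(url).then((response) => response.text());
}

// Same logic with async/await -- no .then chain needed
async function getPageAwait(url) {
  const response = await fetch(url);
  return response.text();
}
```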
The problem with this approach is that it blocks all the next await chains. Instead, opt for something like this.
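The code from this reply isn't shown here either, but judging from the discussion below it was a Promise.all version along these lines (again a sketch, not the actual snippet):

```js
// Kick off all requests at once and wait for them together
async function getAllPages(urls) {
  const responses = await Promise.all(urls.map((url) => fetch(url)));
  return Promise.all(responses.map((response) => response.text()));
}
```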
You could do that in certain cases. But there are only four requests here. Plus, by doing them serially to the same domain, you're not overwhelming the one server endpoint in this case.
Actually, as a robustness strategy, with multiple calls to the same domain, you typically want to insert manual delays to prevent server rejections. I'll usually place a "sleep" between calls (see the sketch after this comment). It's not a race to get the calls done as fast as possible. The goal is robustness.
Also, this is a background job, so saving a second or two is not a critical performance criterion here. In this case one request isn't dependent on another, so it's fine, but if it were, you'd want to run them serially.
Also, with Promise.all, if one fails, they all fail. It's less robust. With the serial approach, each request is atomic and can succeed on its own. I.e., getting 3 out of 4 successful results is better than getting 0 out of 4 successes, even though some of them succeeded.
Also, you now have an array of resolved promises that you still have to loop through and process. With the serial approach, it was done in 4 lines. Much easier to grok. Much easier to read. Much easier to debug.
If I had to do dozens of requests that had no order dependency on each other, and they all went to various domains, and 100% of them had to succeed otherwise all fail, then Promise.all is certainly the way to go. If you have 2 or 3 or 4 requests, there's really no compelling benefit. Default to simple.
So there are pros and cons to each approach to consider.
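For illustration, here's a minimal sketch of the serial-with-delay approach described in that comment (the sleep helper, error handling, and URLs are placeholders, not the post's actual code):

```js
// Simple delay helper
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Fetch each page one at a time, pausing between calls
async function getAllPagesSerially(urls) {
  const pages = [];
  for (const url of urls) {
    try {
      const response = await fetch(url);
      pages.push(await response.text());
    } catch (error) {
      // One failed request doesn't sink the others
      console.error(`Request to ${url} failed:`, error);
    }
    await sleep(1000); // be polite to the server between calls
  }
  return pages;
}
```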
Thanks for the input!
Ahan, thank you for presenting the case in such detail. These things never occurred to me and now I know better. :)
The point about robustness is valid, but for the sake of it, I'll mention that the potential issue with Promise.all can be avoided by using Promise.allSettled instead.
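For reference, a rough sketch of what that could look like (the URLs and the shape of the data are placeholders):

```js
// Fire all requests together, but keep whatever succeeds
async function getAllPagesSettled(urls) {
  const results = await Promise.allSettled(
    urls.map((url) => fetch(url).then((response) => response.text()))
  );
  // Rejected requests don't fail the whole batch
  return results
    .filter((result) => result.status === 'fulfilled')
    .map((result) => result.value);
}
```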
Good to know! Thanks for the tip! I'll change that in my code!
This is a really nice write-up and a great idea.
Are you running Ubuntu or some other Linux version as your desktop OS? Just curious.
I run Ubuntu 20.04 (I switched to Ubuntu about 2 years ago, after using Windows for over 25 years...seriously).
Thanks for sharing this interesting article.
Thanks! Nope, Ubuntu was just the default option in that YAML file, and I didn't change it because it worked just fine. I'm an OSX person.
Tip, there's no point in using a try-catch if you're just going to re-throw the error. In other words, a function that wraps its body in a try-catch only to re-throw is functionally identical to one with no try-catch at all.
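The two snippets from that comment aren't shown here, but the point looks roughly like this (hypothetical function names and URL):

```js
// With a try-catch that only re-throws...
async function getDataWithTryCatch(url) {
  try {
    const response = await fetch(url);
    return await response.json();
  } catch (error) {
    throw error; // adds nothing -- the caller sees the same rejection
  }
}

// ...the behavior is identical to not catching at all
async function getData(url) {
  const response = await fetch(url);
  return response.json();
}
```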
Tip, if you're going to give unsolicited advice on someone's code in a blog post, at least comment about the core of the post. Like, "hey great job! This is something you could do instead:" or "hey this blog sucked and here's why:"
I see your account is brand new, so I'm hoping to help educate you on a little bit of etiquette in regards to constructive criticism. Hope that helps!
Well seeing as you asked, here's why this blog sucks:
A web crawler is not a technically challenging task. As we can see from the post above, it's just a combination of basic web programming steps such as making HTTP requests and HTML parsing.
That it took someone whose core competency is front-end development 9 years to tackle something so straightforward is fairly risible. I don't see how you can claim to be a "senior-ish" front-end engineer yet be apparently ignorant of core concepts such as error-handling and asynchronous programming.
Hope this helps!
Thanks for reading!
I had to write some crawlers when I first started. It's definitely one of those things where you start out having no clue how to do it and thinking it's going to be extremely tough.
I like the simple approach you took to store the data in a JSON file and use GitHub Actions to update it (see the sketch after this comment). It's nice when we can keep things simple instead of spinning up databases and complex infrastructure.
Also, congratulations on getting over the intimidation of doing this. We senior developers definitely do get afraid of tasks. We need to remember that we can always learn how to do something, just like how we didn't know how to write code before we were developers.
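As a rough sketch of that kind of simple JSON-file storage (not the post's actual code; the file name and record shape are made up):

```js
const fs = require('fs/promises');

// Append today's numbers to a flat JSON file committed to the repo
async function appendRecord(record, file = 'data.json') {
  let records = [];
  try {
    records = JSON.parse(await fs.readFile(file, 'utf8'));
  } catch {
    // First run: the file doesn't exist yet
  }
  records.push(record);
  await fs.writeFile(file, JSON.stringify(records, null, 2));
}
```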
Definitely! Keeping it simple was really important to me because I didn't want to spend a long time getting it set up or potentially spend even longer maintaining it.
Thanks! Yeah, this is one that's been haunting me for sure, so it gave me a good confidence boost! And that's very true!
Thanks for reading!
Thank you so much for sharing. I have been really curious about writing cron jobs, and it was so helpful to read through your thought process; it seems a lot less intimidating now.
Awesome!!! So glad I could help. Thank you for reading!
I used cheerio for a while but realized it's very slow when processing a lot of pages.
I'd recommend using regex instead; I cut my processing time down by 80% (see the sketch after this comment).
Good job though, it's a great start!
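Purely as an illustration of that regex suggestion (the HTML structure and pattern below are invented for the example, and regex extraction can break if the markup changes):

```js
// cheerio-style extraction (for comparison):
//   const $ = cheerio.load(html);
//   const count = $('#total-cases').text();

// Regex-style extraction of the same hypothetical element
function extractTotalCases(html) {
  const match = html.match(/id="total-cases"[^>]*>([\d,]+)</);
  return match ? match[1] : null;
}
```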
Good to know! I'm just collecting data from one page for now, but I'll definitely keep that in mind. Thank you!
As you say, I've not done it, and thank you for showing how!!
Thank you for reading!
Amazing tutorial, Kayla. And congratulations on getting over the intimidation you had to live with for 9 years! It's a win and I'm happy you got over it.
Thank you so much! I'm happy I got over it too.
That's awesome! I love your initiative. Also, I'm glad that cases seem to be going down lately.
Thanks! Yeah, I'm pretty thankful for that. I believe we have more people vaccinated than have tested positive in Mississippi at this point, so I'm optimistic!