DEV Community

How I made a web scraper because LinkedIn

Ricardo A. Mercado on December 26, 2018

Having lots of LinkedIn connections can be convenient for many people. You and your connection agreed to be connected through the platform, thus ...

Anthony Bouvier

1) Why? What use is having all of their emails? Especially 2000+ of them at once? Maybe this is why LinkedIn stopped exporting that data?

2) You know you broke the User Agreement, right?

See linkedin.com/legal/user-agreement and search for "scrape".

I'm a big fan of scrapers. I've written tons of them too.

But you have to pay attention to TOS/EULAs/etc.

Ricardo A. Mercado

1) Just because you can't think of a use for having all of their emails doesn't mean there aren't any.

2) I guess they'll have to suspend/ban me.

Anthony Bouvier

1) I didn't say there aren't uses. I asked what yours was. Since we're having a technical discussion, I figured the typical "why am I doing this" would be a good part of the back and forth. As you mentioned in the article you think they used to export this info, but stopped. So maybe this is a time to step back and say "should I?". Also a healthy part of the discussion.

2) I suppose. Rather, I think it'd be best to once again examine the possible why and note that you are purposefully breaking an agreement you signed up for. For a fun comparison, what are the terms of service or user agreement used by AccountBerry? Do you have a similar agreement that might not allow for scraping either? And what if someone scraped you anyway? You might not notice, but what if you did, because they made a coding error and slammed your system?

Like I've said, I've created lots of spiders/bots/scrapers. It is fun. And there are great reasons to make them.

But a discussion of the ethics of building them to use to scrape data from sites that you agreed not to scrape is an interesting article-worthy thing to think about. Hopefully an aspiring scraper-maker reads your article and this discussion and keeps it in mind.

Ricardo A. Mercado

1) I can't be too specific, but it's for data analytics purposes. Why wouldn't they want them to be exported if I could get them by going to each connection one by one manually? The scraper basically automates that tedious process. I mean, connections agreed to share certain info, and email is just one piece of that information (they could even set it so the email is not shown).

2) I completely understand your point and I agree completely. I did break the agreement unknowingly (until you pointed it out), but there was no malicious intent. I only automated a process I am allowed to do manually. I find that if you write some code to automate a process you can carry out manually, then there shouldn't be any restriction on it. It's like a post I read yesterday: a person had 400 unread messages and couldn't select them all to mark them as read, so he just opened the dev tools and wrote some simple code to loop through all the messages and click them. My response, "I guess they'll have to suspend/ban me," is based on the fact that what's done is done.

Maybe adding "For educational purposes" changes the whole context of what is written?

Ermir Beqiraj

Lol.. "For educational purposes" & "Don't try this at home, especially in the kitchen"

Jordan Hansen

I actually loved this. Nice article.

So do you perform a login with Nightmarejs and then just search from there?

I realize it's against TOS, but I do believe it's still legal.

arstechnica.com/tech-policy/2017/0...

The above article says you're good legally but I believe anything behind a password is where the line is drawn. I'm not sure if that means other people's passwords (hacking their accounts?) or your own. I've taken the former approach and I think the use you are doing is a perfect example of something that would be legal. You have access to all of the data already, this just speeds it up.

Anyway, great article!

Ricardo A. Mercado

Thanks!

Yeah, you are prompted to fill in your personal LinkedIn credentials. The script logs you in and gets the emails from your personal connections. It's basically automating a process I could do manually.

lilmissblockchain

Very good article Ricardo. Thank you.

Karolis

I had similar needs a few months ago :) I created a Chrome extension to accomplish several things for me:

  1. Search for people I'd like to connect with, and connect
  2. Endorse all their skills

It was quite an interesting exercise for me, as I hadn't tried developing browser extensions before. Also, I never encountered any rate limiting, so I deem browser extensions to be quite safe to use.

Ricardo A. Mercado

Awesome! Is the chrome extension public?

lilmissblockchain

Sounds super interesting, would love to read a blog about this.

James Turner

FYI, it seems that LinkedIn does actually allow you to download emails via the CSV you mentioned; however, each connection must opt in for that.

LinkedIn Email Settings

Ricardo A. Mercado

Interesting! Thanks for pointing this out.

Crewxx

PLEASE SORRY FOR THE DUMB QUESTION, AS YOU KNOW NOT ALL IS TECH SAVVY, I'M JUST IN NEED OF GETTING MY CONTACT WHICH IS STRESSFUL GETTING THEM ONE AFTER THE OTHER. PLEASE I HAVE BEEN TRYING TO FIGURE OUT THE PROCESS IN MAKING THE CHANGES YOU TALKED ABOUT BUT I HAVE NO IDEA ON STEPS TO TAKE.

PLEASE KINDLY WORK ME THROUGH THE PROCESS, A DIRECTION OF WHERE TO CHECK TO CONFIRM THE LINKEDIN CHANGES AND REPLACING WOULD BE REALLY APPRECIATED.

Jan Wedel

You seem to have accidentally enabled your Caps lock...

Crewxx

Nope, not really; I just wanted bold text. Any help, please? Thanks.

Roger K.

While I kind of agree, I also don’t agree.

I also don’t connect with people I don’t know, and that has nothing to do with his behaviour; it's a matter of practicing harm reduction for myself. If any one of my connections (minus the recruiters, of course) were to do the same as the author, I would assume it’s a reasonable use case and be fine with it.

Fundamentally I have no issue with someone wanting the access they were granted, but if you connect with randoms then you get what you get. Maybe it’s a little like dating ;)

lilmissblockchain

Didn't every date start out as a random connection, though?
🤔

ItsASine (Kayla)

Email isn't a completely unused field, though it looks like they only provide publicly available emails rather than any you're privy to as a connection.

I downloaded my 216 connections and had 1 email address (a chronic startup founder, so he wants to be seen) and 1 completely empty line other than connection date. I just reused that field as one for describing, manually, how I know them since for some awful reason LinkedIn removed the ability to tag people.
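For anyone who wants to repeat that check on their own export, here is a minimal Python sketch that counts how many rows of the connections CSV actually carry an email. The column name `Email Address` is an assumption about LinkedIn's export header (check the first line of your own file), and the sample rows below are made up for illustration.

```python
import csv
import io

# Assumed header name for the email column in the connections export;
# adjust it to match the header row of your own CSV.
EMAIL_COLUMN = "Email Address"


def count_emails(csv_text, email_column=EMAIL_COLUMN):
    """Return (total_rows, rows_with_a_nonblank_email) for a connections CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    total = 0
    with_email = 0
    for row in reader:
        total += 1
        # Missing or whitespace-only values count as "no email".
        if (row.get(email_column) or "").strip():
            with_email += 1
    return total, with_email


if __name__ == "__main__":
    sample = (
        "First Name,Last Name,Email Address,Connected On\n"
        "Ada,L,ada@example.com,01 Jan 2018\n"
        "Bob,M,,02 Jan 2018\n"
    )
    print(count_emails(sample))  # (2, 1)
```

To run it on a real export, read the file with `open(path, newline="", encoding="utf-8").read()` and pass the text in.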

ZatriX

Hm... Firstly - thanks a lot, Ricardo!

Some code needed to be changed indeed, to account for renamed fields, but then it did start working.

The problem I'm having atm, however, is that it seems to get stuck after scraping about 180 records (see screen). It throws a few extraction errors (even though the emails exist on those profiles) and then just sits there.

Any ideas?

screen

Taha Yasin KÜÇÜK

TL;DR was too long 😀

Marco Afonso

Nightmare vs cheeriojs?

Ricardo A. Mercado

Either one; use whichever works best for what you need to do. I used Nightmare because it was the first one that came to mind.

Crewxx

The script is no longer working. I tried it out and followed all the instructions, but the folder for the scraped list is empty. Any tips on how to get it working would be great.

Thanks

Ricardo A. Mercado

I'll check it out. The issue is that this is a scraper, so if LinkedIn updates its page and changes the class of an element used in the script, it will stop working. You can check out the source code and verify whether any class has changed on LinkedIn.
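As a rough illustration of that check, here is a hypothetical Python sketch (separate from the Nightmare script itself) that takes the HTML of a saved LinkedIn page and reports which of the expected class names no longer appear anywhere in it. The names in `EXPECTED_CLASSES` are placeholders, not LinkedIn's real class names; swap in whatever selectors the script actually queries.

```python
from html.parser import HTMLParser

# Placeholder class names (hypothetical); replace them with the classes
# the scraper's selectors actually depend on.
EXPECTED_CLASSES = {"connection-name", "contact-email"}


class ClassCollector(HTMLParser):
    """Collects every CSS class name that appears on any start tag."""

    def __init__(self):
        super().__init__()
        self.seen = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # A class attribute can hold several space-separated names.
                self.seen.update(value.split())


def missing_classes(html, expected=EXPECTED_CLASSES):
    """Return, sorted, the expected class names that no longer occur in the page."""
    collector = ClassCollector()
    collector.feed(html)
    collector.close()
    return sorted(set(expected) - collector.seen)


if __name__ == "__main__":
    page = '<div class="connection-name">Ada</div><a class="mailto email">x</a>'
    print(missing_classes(page))  # ['contact-email']
```

An empty list means every expected class is still present; any name it prints is a selector that probably needs updating in the script.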

Crewxx

PLEASE POINT ME TO THE RIGHT DIRECTION SO I CAN BE ABLE TO CHANGE WHAT YOU MENTION. I HAVE SEARCH TRYING NOT TO BOTHER YOU, BUT I DON'T GET IT BECAUSE I AM NOT A PROGRAMMER, JUST A REGULAR USER.

PLEASE HELP ME OUT DISTINGUISHED

lilmissblockchain

Or, attach your spam email account to LinkedIn. I say yes to everyone, knowing that most of my data on LinkedIn never points to me.