DEV Community

Discussion on: Automate Your Job Search: Scraping 400+ LinkedIn Jobs with Python

Denys Bochko

I have been looking into a programmatic approach to job searching, very much like what you describe here, but I always run into the risk of being locked out or blocked by Indeed and LinkedIn for scraping.
How does it work for you when you scrape from the same IP address that you use to log in?

I am definitely going to give it a try, that would simplify the search by far.

Francisco Moretti

I've just pushed version 0.1.4, which hooks into JobSpy's proxy option:
E.g. --proxies '208.195.175.46:65095' --proxies '208.195.175.45:65095'

Thanks for your comment, I totally missed this possibility.
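
For anyone wondering what a list of proxies buys you, here is a minimal sketch of round-robin rotation, so that consecutive requests leave from different addresses. It does not show how JobSpy or this tool handles `--proxies` internally; the helper names `to_proxy_mapping` and `proxy_rotation` are made up for illustration, and the mapping format is the `{"http": ..., "https": ...}` shape that HTTP clients such as `requests` accept.

```python
from itertools import cycle
from typing import Dict, Iterator, List


def to_proxy_mapping(address: str) -> Dict[str, str]:
    """Turn 'host:port' into a scheme-to-proxy-URL mapping (hypothetical helper)."""
    # Assumption: a bare 'host:port' is a plain HTTP proxy.
    url = address if "://" in address else f"http://{address}"
    return {"http": url, "https": url}


def proxy_rotation(addresses: List[str]) -> Iterator[Dict[str, str]]:
    """Yield proxy mappings in round-robin order, forever (hypothetical helper)."""
    if not addresses:
        raise ValueError("need at least one proxy address")
    for address in cycle(addresses):
        yield to_proxy_mapping(address)


# The two addresses from the example flags above.
PROXIES = ["208.195.175.46:65095", "208.195.175.45:65095"]

rotation = proxy_rotation(PROXIES)
first = next(rotation)   # mapping for the first proxy
second = next(rotation)  # mapping for the second proxy
third = next(rotation)   # wraps around to the first proxy again
```

Each mapping would then be passed to whatever HTTP client performs the request. Rotation only spreads traffic across addresses; it does not change whether the site permits automated access.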

Onorio Catenacci

This would be my concern as well, on top of the fact that scraping the site might violate the robots.txt convention. Doesn't LinkedIn support some sort of approved API for job listings? If they do, and they charge for using it, then scraping the site definitely seems unethical and possibly slightly illegal.

Denys Bochko

Well, job listings and job search are two different things. They surely do support a job listings API, since there are tons of HR systems that post to LinkedIn.

It's all about money: since they have promoted job listings, LinkedIn does not offer a publicly available API to search for jobs, which forces people to use their site with its very broken search.
I disagree that it's unethical, since the postings are not their proprietary information and are basically public.

Onorio Catenacci

This is the plain language of their robots.txt:

"# Notice: The use of robots or other automated means to access LinkedIn without
the express permission of LinkedIn is strictly prohibited.
See linkedin.com/legal/user-agreement.
LinkedIn may, in its discretion, permit certain automated access to certain LinkedIn pages,
for the limited purpose of including content in approved publicly available search engines.
If you would like to apply for permission to crawl LinkedIn, please email whitelist-crawl@linkedin.com.
Any and all permitted crawling of LinkedIn is subject to LinkedIn's Crawling Terms and Conditions.
See linkedin.com/legal/crawling-terms."

Like it or not, unless you've got their permission to scrape the site, you're violating their robots.txt. That's at least unethical and probably illegal (although I don't believe they'd bother to sue anyone).
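
If you want to check what a robots.txt permits before running a scraper, Python's standard library can evaluate the rules for you. This is only a sketch: the rules and the crawler name `ExampleSearchBot` below are invented for illustration (they imitate the "deny by default, allow approved crawlers" pattern the notice describes) and are not LinkedIn's actual file, and `is_allowed` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# Made-up rules for illustration: one approved crawler, everyone else denied.
ROBOTS_TXT = """\
User-agent: ExampleSearchBot
Disallow: /private/
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


approved = is_allowed(ROBOTS_TXT, "ExampleSearchBot", "https://example.com/jobs/view/1")
blocked_path = is_allowed(ROBOTS_TXT, "ExampleSearchBot", "https://example.com/private/data")
unlisted = is_allowed(ROBOTS_TXT, "my-job-scraper", "https://example.com/jobs/view/1")
```

Against a live site you would call `set_url(...)` and `read()` on the parser instead of `parse()`. Note that robots.txt only covers the crawling convention; a site's user agreement can impose further limits that this check does not see.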

Denys Bochko

I don't disagree with what you said.

SeongKuk Han

As far as I know, this is an ethical concern. If we ignore it, we might get some restrictions. But I think we need to consider why they restricted it in the first place. They added that notice to prevent software from doing something harmful by scraping their website. But if a tool simply provides another way to use the website, it would be good for both providers like LinkedIn and their clients.

Denys Bochko

I am not sure what you mean by "get some restrictions". It is a closed system; I don't see how it could be more restricted.
The why is clear. LinkedIn has promoted jobs that appear in your results no matter what you search for. That's their income. It is a company, so it's understandable that it is about money. If they provided an API to search for jobs, a ton of recruiting services would offer proper search to their clients and LI would lose its promotional income.
What I don't like is that LI keeps a massive number of jobs within its system without providing a proper way to search them. I think that's unfair.
I would be happy to use their search and notifications if they were properly done and useful, especially when the IT industry is a mess.

SeongKuk Han • Edited

Yes, I see. I was thinking of small kinds of support. I think you're right. It would be a loss for both providers and clients if users don't use the service as the provider expected. Thanks for sharing your thoughts 👍