Anuoluwapo Balogun

Scraping LinkedIn Data with Proxycurl, Python, and Node.js

Today, I want to show you how to scrape data from LinkedIn using the Proxycurl API, Python, and Node.js.

Let's start with Python and the requests library.

I am going to use the Employee Count Endpoint of the Proxycurl Company API.

Install the requests package:

!pip install requests

Next, get a Proxycurl API key: create an account with Proxycurl and generate your key.

Let's count the number of employees working at Apple Inc.

Using the requests library:
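
Rather than pasting the key into the script, one option is to read it from an environment variable. The sketch below assumes a variable named `PROXYCURL_API_KEY` (a name chosen here for illustration, not one Proxycurl requires) and builds the same `Authorization` header used in the examples in this post.

```python
import os


def build_auth_header(env_var="PROXYCURL_API_KEY"):
    """Return the Bearer header, reading the key from the environment.

    The variable name is an assumption made for this sketch.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"Authorization": "Bearer " + api_key}
```

The returned dictionary can be passed as `headers=` to `requests.get`, which keeps the key out of the source file.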

import requests

api_endpoint ='https://nubela.co/proxycurl/api/linkedin/company/employees/count/'

api_key = 'YOUR_API_KEY_HERE'

header_dic = {'Authorization': 'Bearer ' + api_key}

params = {
    'linkedin_employee_count': 'include',
    'employment_status': 'current',
    'url': 'https://www.linkedin.com/company/apple/',
}

response = requests.get(api_endpoint,
                        params=params,
                        headers=header_dic)

# Show the parsed JSON body
print(response.json())


The output response is:

{
'total_employee': 94262,
'linkedin_employee_count': 567686,
'linkdb_employee_count': 94262
}
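
To work with these numbers in code, the JSON body can be treated as a plain dictionary. The sketch below uses the sample output above as its input rather than a live call, and `coverage_ratio` is a hypothetical helper name; it simply compares the two counts the endpoint returned.

```python
def coverage_ratio(payload):
    """Return linkdb_employee_count / linkedin_employee_count, or None.

    None is returned when either field is missing or the LinkedIn
    count is zero, so callers never hit a division error.
    """
    linkedin_count = payload.get("linkedin_employee_count")
    linkdb_count = payload.get("linkdb_employee_count")
    if not linkedin_count or linkdb_count is None:
        return None
    return linkdb_count / linkedin_count


# Sample values copied from the Apple output shown in this post
sample = {
    "total_employee": 94262,
    "linkedin_employee_count": 567686,
    "linkdb_employee_count": 94262,
}
print(f"{coverage_ratio(sample):.1%}")
```

With a live request, the same helper could be called as `coverage_ratio(response.json())`.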

Now let's count the number of employees working at Twitter:

import requests

api_endpoint = 'https://nubela.co/proxycurl/api/linkedin/company/employees/count/'
api_key = 'YOUR_API_KEY_HERE'
header_dic = {'Authorization': 'Bearer ' + api_key}
params = {
    'linkedin_employee_count': 'include',
    'employment_status': 'current',
    'url': 'https://www.linkedin.com/company/twitter/',
}
response = requests.get(api_endpoint,
                        params=params,
                        headers=header_dic)

# Show the parsed JSON body
print(response.json())


The output is:

{'total_employee': 7472,
'linkedin_employee_count': 7992,
'linkdb_employee_count': 7472
}

You can try this with as many companies as you like.
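
To avoid copying the script for each company, the request can be wrapped in a function and called in a loop. This is a minimal sketch under a few assumptions: the names `fetch_employee_count` and `fetch_many` are made up for illustration, and the `get` parameter exists only so the function can be tried without network access (it falls back to `requests.get`).

```python
API_ENDPOINT = "https://nubela.co/proxycurl/api/linkedin/company/employees/count/"


def fetch_employee_count(company_url, api_key, get=None):
    """Call the Employee Count Endpoint for one LinkedIn company URL."""
    if get is None:
        import requests  # imported lazily so the sketch loads without it
        get = requests.get
    response = get(
        API_ENDPOINT,
        params={
            "linkedin_employee_count": "include",
            "employment_status": "current",
            "url": company_url,
        },
        headers={"Authorization": "Bearer " + api_key},
    )
    response.raise_for_status()  # raise on HTTP errors before parsing
    return response.json()


def fetch_many(company_urls, api_key, get=None):
    """Return {company_url: parsed_response} for each URL, in order."""
    return {url: fetch_employee_count(url, api_key, get) for url in company_urls}
```

For example, `fetch_many(['https://www.linkedin.com/company/apple/', 'https://www.linkedin.com/company/twitter/'], 'YOUR_API_KEY_HERE')` would make one request per company and return the results keyed by URL.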

Next, let's try scraping data from LinkedIn using Proxycurl and Node.js.

  • Change into your project folder
cd c:\\User\user\Folder name
  • Install the packages
npm install express axios dotenv

or with Yarn

yarn add express axios dotenv
  • Create a .env file in the project folder containing your API key
API_KEY = 'YOUR_API_KEY_HERE'
  • Code snippet
import express from 'express';
import axios from 'axios';
import dotenv from 'dotenv';

const app = express();

dotenv.config();

app.listen(8000, () => {
    console.log('App connected successfully!');
});
// Getting the company's job listings

const TWITTER_URL = 'https://www.linkedin.com/company/twitter/';

const COMPANY_PROFILE_ENDPOINT = 'https://nubela.co/proxycurl/api/linkedin/company';

const JOBS_LISTING_ENDPOINT = 'https://nubela.co/proxycurl/api/v2/linkedin/company/job';

const JOB_PROFILE_ENDPOINT = 'https://nubela.co/proxycurl/api/linkedin/job';

// Request config for the company profile lookup
const companyProfileConfig = {
    url: COMPANY_PROFILE_ENDPOINT,
    method: 'get',
    headers: {'Authorization': 'Bearer ' + process.env.API_KEY},
    params: {
        url: TWITTER_URL
    }
};

// Fetch Twitter's company profile
const getTwitterProfile = async () => {
    return await axios(companyProfileConfig);
};

const profile = await getTwitterProfile();

const twitterID = profile.data.search_id;

console.log('Twitter ID:', twitterID);


// Request config for the job listings, keyed by the company's search_id
const jobListingsConfig = {
    url: JOBS_LISTING_ENDPOINT,
    method: 'get',
    headers: {'Authorization': 'Bearer ' + process.env.API_KEY},
    params: {
        search_id: twitterID
    }
};

// Fetch the jobs the company has listed
const getTwitterListings = async () => {
    return await axios(jobListingsConfig);
};

const jobListings = await getTwitterListings();

const jobs = jobListings.data.job;

console.log(jobs);
// Specific job listing code snippet

// Request config for the first job returned in the listings
const jobProfileConfig = {
    url: JOB_PROFILE_ENDPOINT,
    method: 'get',
    headers: { 'Authorization': 'Bearer ' + process.env.API_KEY },
    params: {
        url: jobs[0].job_url
    }
};

// Fetch the details of that job
const getJobDetails = async () => {
    return await axios(jobProfileConfig);
};

const jobDetails = await getJobDetails(); 

console.log(jobDetails.data);
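
For readers following the Python half of this post, the same three-step chain (company profile, then job listings, then one job's details) might look like the sketch below. It reuses the endpoint URLs and the `search_id`, `job`, and `job_url` fields from the Node.js code above; the names `first_job_details` and `_get_json`, and the injectable `get` parameter, are assumptions added for illustration.

```python
COMPANY_PROFILE_ENDPOINT = "https://nubela.co/proxycurl/api/linkedin/company"
JOBS_LISTING_ENDPOINT = "https://nubela.co/proxycurl/api/v2/linkedin/company/job"
JOB_PROFILE_ENDPOINT = "https://nubela.co/proxycurl/api/linkedin/job"


def _get_json(get, endpoint, params, headers):
    """Send one GET request and return the parsed JSON body."""
    response = get(endpoint, params=params, headers=headers)
    response.raise_for_status()
    return response.json()


def first_job_details(company_url, api_key, get=None):
    """Return details of the first listed job, or None if there are none."""
    if get is None:
        import requests  # imported lazily so the sketch loads without it
        get = requests.get
    headers = {"Authorization": "Bearer " + api_key}

    # Step 1: look up the company profile to obtain its search_id
    profile = _get_json(get, COMPANY_PROFILE_ENDPOINT, {"url": company_url}, headers)

    # Step 2: list the company's jobs using that search_id
    listings = _get_json(
        get, JOBS_LISTING_ENDPOINT, {"search_id": profile["search_id"]}, headers
    )
    jobs = listings.get("job") or []
    if not jobs:
        return None

    # Step 3: fetch the full profile of the first job
    return _get_json(get, JOB_PROFILE_ENDPOINT, {"url": jobs[0]["job_url"]}, headers)
```

Returning `None` for an empty listing avoids the index error that `jobs[0]` would raise when a company has no open jobs.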


Here is how the package.json should look:

{
  "name": "nubela",
  "version": "1.0.0",
  "type": "module",
  "description": "",
  "main": "proxycurl.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "dependencies": {
    "axios": "^1.1.3",
    "dotenv": "^16.0.3",
    "express": "^4.18.2"
  }
}

You can try scraping any data of your choice from LinkedIn using the Proxycurl API.

