In today's tutorial, I'm going to show you how to efficiently scrape LinkedIn for up-to-date job postings, straight from the source! Whether you're a developer looking to automate your job search or simply want a powerful tool for staying ahead of the competition, this guide has got you covered.
What's more, the core logic behind this process is the same technology powering my app, CoverAI: AI-Powered Cover Letter Generator.
With CoverAI, not only can you instantly generate tailored cover letters, but you can also integrate your resume and analyze job postings to extract key details, saving you time while increasing your chances of landing that dream job.
If you're ready to streamline your application process and make every job application count, check out CoverAI.
We're going to use Jsoup, a Java HTML parser that is fully compatible with Kotlin.
Setup
libs.versions.toml
[versions]
jsoup = "1.17.2"
[libraries]
jsoup = { module = "org.jsoup:jsoup", version.ref = "jsoup"}
build.gradle.kts
implementation(libs.jsoup)
Base definition
If you plan to use web scraping for other purposes as well (apart from the one discussed here), it's a good idea to put the main logic into an interface:
import org.jsoup.Connection
import org.jsoup.nodes.Document

interface Scraper {
    fun Connection.getPageDocument(): Document? = ignoreContentType(true).get()
}
The code above executes a GET request on a Jsoup Connection. ignoreContentType(true) tells Jsoup not to throw an exception when the response has a content type it does not support.
In order to efficiently store job posting data, define an appropriate data class:
import java.util.UUID

data class LinkedInJob(
    val id: String = UUID.randomUUID().toString(),
    val title: String,
    val company: String,
    val location: String,
    val link: String
)
Next, we'll define a LinkedInJobsScraper class:
class LinkedInJobsScraper(
    private val connection: (String) -> Connection
) : Scraper {
    fun getJobs(country: String, jobTitle: String, hiringOrganization: String): List<LinkedInJob> {
        val url = constructUrl(country, jobTitle, hiringOrganization)
        // Stop early with an empty list if no document could be loaded.
        val document = connection(url).getPageDocument() ?: return emptyList()
        // The job card extraction shown later in this post goes here.
    }

    private fun constructUrl(country: String, jobTitle: String, hiringOrganization: String): String {
        // Parentheses make replace() cover the whole URL, not only the last segment.
        return ("https://www.linkedin.com/jobs/search?keywords=$jobTitle $hiringOrganization" +
            "&location=$country").replace(" ", "%20")
    }
}
The connection parameter has a function type: it takes a String URL as input and returns a Connection object. getJobs calls that function to establish the connection and then fetches the Document.
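As a rough sketch of what a caller might pass for connection: Jsoup.connect already has the shape (String) -> Connection, so a function reference works, and a lambda can apply request settings first. The names defaultConnection and configuredConnection, the user agent string, and the 10-second timeout are illustrative assumptions, not values from this post.

```kotlin
import org.jsoup.Connection
import org.jsoup.Jsoup

// Simplest option: a reference to Jsoup.connect, which takes a URL and returns a Connection.
val defaultConnection: (String) -> Connection = Jsoup::connect

// A lambda that configures each request before returning it.
// The user agent and timeout are placeholder values chosen for illustration.
val configuredConnection: (String) -> Connection = { url ->
    Jsoup.connect(url)
        .userAgent("Mozilla/5.0 (compatible; JobScraperExample/1.0)")
        .timeout(10_000) // milliseconds
}

fun main() {
    // Building a Connection does not send a request; that only happens on get().
    val connection = configuredConnection("https://www.linkedin.com/jobs/search?keywords=Kotlin")
    println(connection.request().url().host)
    println(connection.request().timeout())
}
```

Either value could then be handed to the constructor, for example LinkedInJobsScraper(configuredConnection). Passing the factory in this way also makes it possible to swap in a different Connection setup in tests.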
Here's what a url requesting jobs could look like: https://www.linkedin.com/jobs/search?keywords=Android%20Developer%20Google&location=Germany
Replacing spaces with %20 matters because a literal space is not a valid character in a URL. URLs use percent-encoding for such characters: a percent sign followed by the character's byte value in hexadecimal. The ASCII code for a space is 0x20, so a space becomes %20, which lets browsers and servers interpret the URL correctly.
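Spaces are not the only characters that need encoding; characters such as & or + inside a job title would also break the query string. A minimal sketch of a more general approach uses the JDK's URLEncoder. It targets form encoding, so it writes a space as +, which the sketch swaps for %20 to match the URL above. The helpers encodeQueryValue and buildSearchUrl are hypothetical names, not part of the scraper in this post.

```kotlin
import java.net.URLEncoder

// Hypothetical helper: percent-encodes a single query parameter value.
// URLEncoder uses form encoding, which writes a space as "+", so the "+"
// is swapped for "%20". A literal "+" in the input is already "%2B" by then.
fun encodeQueryValue(value: String): String =
    URLEncoder.encode(value, "UTF-8").replace("+", "%20")

// Hypothetical helper mirroring constructUrl, with each value encoded separately.
fun buildSearchUrl(country: String, jobTitle: String, hiringOrganization: String): String {
    val keywords = encodeQueryValue("$jobTitle $hiringOrganization")
    val location = encodeQueryValue(country)
    return "https://www.linkedin.com/jobs/search?keywords=$keywords&location=$location"
}

fun main() {
    println(buildSearchUrl("Germany", "Android Developer", "Google"))
    // https://www.linkedin.com/jobs/search?keywords=Android%20Developer%20Google&location=Germany
}
```

Encoding each value on its own, instead of the assembled URL, keeps the ? and & separators intact while still escaping those characters when they appear inside a value.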
The last step is extracting the relevant text with CSS queries. The following code goes inside the getJobs function:
var jobCards = document.select("div.base-card.base-card--link.job-search-card")
if (jobCards.isEmpty()) {
    jobCards = document.select("li a.base-card")
}
val linkedInJobs = jobCards.map { card ->
    val titleElement = card.selectFirst("h3.base-search-card__title")
    val companyElement = card.selectFirst("h4.base-search-card__subtitle")
    val locationElement = card.selectFirst("span.job-search-card__location")
    val linkElement = card.selectFirst("a.base-card__full-link")
    LinkedInJob(
        title = titleElement?.text() ?: "",
        company = companyElement?.text() ?: "",
        location = locationElement?.text() ?: "",
        link = linkElement?.attr("href") ?: card.attr("href") ?: ""
    )
}
return linkedInJobs
I've tested job card extraction with different CSS queries and have come to the conclusion that the structure of the HTML downloaded from LinkedIn can vary, which is why the fallbacks document.select("li a.base-card") and linkElement?.attr("href") ?: card.attr("href") are used.
Inside the map function, we extract titleElement, companyElement, locationElement, and linkElement from each individual Element, pass their values to the LinkedInJob data class, and collect the results in the linkedInJobs variable.
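To try the selectors without sending a request to LinkedIn, a sketch like the one below parses a hand-written HTML fragment with Jsoup.parse. The markup is an assumption modelled on the class names used above, not a captured LinkedIn response, and JobCard, sampleHtml, and parseJobCards are hypothetical names introduced only for this example.

```kotlin
import org.jsoup.Jsoup

// Hypothetical stand-in for LinkedInJob, kept local so the sketch is self-contained.
data class JobCard(val title: String, val company: String, val location: String, val link: String)

// Hand-written sample markup; the real LinkedIn HTML may differ.
val sampleHtml = """
    <ul>
      <li>
        <div class="base-card base-card--link job-search-card">
          <a class="base-card__full-link" href="https://www.linkedin.com/jobs/view/123"></a>
          <h3 class="base-search-card__title"> Android Developer </h3>
          <h4 class="base-search-card__subtitle">Google</h4>
          <span class="job-search-card__location">Berlin, Germany</span>
        </div>
      </li>
    </ul>
""".trimIndent()

fun parseJobCards(html: String): List<JobCard> {
    val document = Jsoup.parse(html)
    // Same primary selector and fallback as in getJobs.
    var cards = document.select("div.base-card.base-card--link.job-search-card")
    if (cards.isEmpty()) cards = document.select("li a.base-card")
    return cards.map { card ->
        JobCard(
            // text() collapses and trims whitespace; a missing element yields "".
            title = card.selectFirst("h3.base-search-card__title")?.text().orEmpty(),
            company = card.selectFirst("h4.base-search-card__subtitle")?.text().orEmpty(),
            location = card.selectFirst("span.job-search-card__location")?.text().orEmpty(),
            // Fall back to the card's own href when there is no dedicated link element.
            link = card.selectFirst("a.base-card__full-link")?.attr("href") ?: card.attr("href")
        )
    }
}

fun main() {
    parseJobCards(sampleHtml).forEach(::println)
}
```

Keeping the parsing step separate from the network call like this makes it easier to check new selectors against saved HTML when the page structure changes.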