Joerg Rech

How you can extract company signals from job postings

Identifying New Office Locations in Job Postings

As software developers, we know the importance of staying ahead in the tech industry, whether it's through learning new programming languages or understanding market trends. By analyzing companies' job postings, we can uncover new office locations, reveal growing tech hubs, or even identify potential business partners for freelancing.

Imagine knowing that a major tech company is setting up a new office before everyone else does—that's the kind of edge we’re talking about. Let's dive into how we can do this using practical steps and tools.

The Basics: Job Postings and Company Signals

Job postings are primarily aimed at potential new employees. They create awareness that a company exists and is hiring for a specific role, and they explain why it is hiring, what skills an applicant should have, and what the company works on. Along the way, they reveal what a company does, plans, wants, uses, and builds.

But job postings also contain many signals about what a company is up to. These company signals are indicators that provide insights into a company's activities, strategies, and future plans. They help to identify sales opportunities, understand market trends, and develop targeted sales strategies. Such signals can be derived from various sources, including job postings, press releases, financial reports, social media activity, and more.

In the remainder of this post, we use the free Luxembourg job posting data feed from Techmap on AWS Data Exchange (ADX), which gives access to all our historical data since January 2020. More specifically, we use the data from June 2024 with 14.7k job postings, where the main sources are LinkedIn, CareerJet, Indeed, Eures, and SmartRecruiters. For Luxembourg, the compressed data files typically have a size between 100 KB and 1.5 MB per day.

Identifying New Office Locations in Job Postings

As Luxembourg has multiple official languages, we look for location signals in English, French, and German. Furthermore, the regular expression covers synonyms used by hiring companies, such as “bureau” or “premise” instead of “office”. The pattern is split across several lines here for readability only; to use it, join it into a single line.

(?i)\b(
  (new (\w+ ){0,2}(office|premise|location|facilit(y|ie)|site)s?)
  |(nouveau (\w+ ){0,2}(bureau|locaux|facilité|site)s?)
  |(neue(n|m|s)? (\w+ ){0,2}(Büro|Office|Räum(en|lichkeiten)|Standorte?|Einrichtung(en)?))
  |(we (are )?(expand(ing)?) to)|(nous (\w+ )?(étendons|développons?) (à|en))
  |(wir (\w+ )?(expandieren) nach)
)\b
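To sanity-check the pattern before running it on real data, one option is to feed it a few sentences with `grep`. The following is a minimal sketch, not part of the original pipeline: the sample sentences are made up for illustration, and it assumes GNU grep, whose `-E` mode supports the `\b` and `\w` extensions. Here `-i` stands in for the `(?i)` flag, and `-o` prints only the matched text.

```shell
#!/bin/bash
# The location-signal pattern from above, joined into a single line.
REGEX='\b((new (\w+ ){0,2}(office|premise|location|facilit(y|ie)|site)s?)|(nouveau (\w+ ){0,2}(bureau|locaux|facilité|site)s?)|(neue(n|m|s)? (\w+ ){0,2}(Büro|Office|Räum(en|lichkeiten)|Standorte?|Einrichtung(en)?))|(we (are )?(expand(ing)?) to)|(nous (\w+ )?(étendons|développons?) (à|en))|(wir (\w+ )?(expandieren) nach))\b'

# Hypothetical sample sentences (not taken from the data feed).
samples='Join the team at our new regional office in Kirchberg.
Nous ouvrons un nouveau bureau au Luxembourg.
Wir expandieren nach Belgien.
We offer a competitive salary and free coffee.'

# -E: extended regex, -i: case-insensitive, -o: print only the matched part.
matches=$(printf '%s\n' "$samples" | grep -Eio "$REGEX")
printf '%s\n' "$matches"
```

The first three sentences each produce one match (`new regional office`, `nouveau bureau`, `Wir expandieren nach`), while the last sentence contains no location signal and is skipped.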

Extracting Location Signals from Job Postings

To use the job postings, you can follow AWS’s tutorial on subscribing to AWS Data Exchange for Amazon S3, but use our free Luxembourg data feed on ADX instead of their test product.

Then you can download the files from ADX and decompress all files from June 2024 (which results in 130 MB):

aws s3 sync \
    s3://<YOUR_BUCKET_ALIAS>/lu/ . \
    --request-payer requester \
    --exclude "*" \
    --include "techmap_jobs_lu_2024-06-*.jsonl.gz"

gzip -d *.gz

Now that we have the job postings as text files in JSON Lines format, we can script the identification of location signals:

#!/bin/bash

# Define the regex as an environment variable (no (?i) flag here, so jq matches case-sensitively)
export REGEX='\b((new (\w+ ){0,2}(office|premise|location|facilit(y|ie)|site)s?)|(nouveau (\w+ ){0,2}(bureau|locaux|facilité|site)s?)|(neue(n|m|s)? (\w+ ){0,2}(Büro|Office|Räum(en|lichkeiten)|Standorte?|Einrichtung(en)?))|(we (are )?(expand(ing)?) to)|(nous (\w+ )?(étendons|développons?) (à|en))|(wir (\w+ )?(expandieren) nach))\b' 

# Define and clear output file for the results 
export OUTPUT_FILE="location_signals.txt"
printf '' > "$OUTPUT_FILE"

# Loop over all files matching the pattern
for file in techmap_jobs_lu_2024-06-*.jsonl; do
    # Check if the file exists to avoid errors if no files match the pattern
    if [[ -e "$file" ]]; then
        # Filter the JSON lines with jq and extract the relevant fields
        cat "$file" | jq -r --arg regex ".{0,20}$REGEX.{0,20}" '
            select(
                . | to_entries[] | select(.value | type == "string" and test($regex))
            ) | {
                job_name: .name,
                job_url: .url,
                company_name: .company.name,
                location: (.location.orgAddress.addressLine // .location.orgAddress.city),
                matched_text: [
                    . | to_entries[] | select(.value | type == "string" and test($regex)) | .value | match($regex).string
                ] | unique | map("..." + . + "...") | join(", ")
            }
        ' >> "$OUTPUT_FILE"
    else
        echo "No files matching the pattern found."
    fi
done

To extract unique companies from job postings, regardless of the number of postings they placed, use the following code snippet:

grep company_name location_signals.txt | sort -u

And get a result that looks like this:

  "company_name": "Allianz Global Investors",
  "company_name": "Amazon EU Sarl - A84",
  "company_name": "Amazon EU Sarl",
  "company_name": "Amazon",
  "company_name": "ArcelorMittal",
  "company_name": "Astel Medica",
  "company_name": "Deloitte",
  "company_name": "Euro Exim Bank",
  "company_name": "Koch Global Services",
  "company_name": "Koch Industries",
  "company_name": "Luxscan Weinig Group",
  "company_name": "MD Skin Solutions",
  "company_name": "ROTAREX",
  "company_name": "Thales",
  "company_name": "e-Consulting RH, Sourcing & Recrutement de Profils pénuriques",
  "company_name": "myGwork - LGBTQ+ Business Community",
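The listing above shows each company once. If you also want to see how many matching postings each company placed, `uniq -c` can count the duplicates before they are collapsed. This is a minimal sketch: the temporary file and the company names in it are hypothetical stand-ins for `location_signals.txt`.

```shell
#!/bin/bash
# Hypothetical stand-in for location_signals.txt with three result lines.
tmp_file=$(mktemp)
cat > "$tmp_file" <<'EOF'
  "company_name": "Example Corp",
  "company_name": "Sample Bank",
  "company_name": "Example Corp",
EOF

# Count the postings per company and list the most frequent company first.
counts=$(grep company_name "$tmp_file" | sort | uniq -c | sort -rn)
printf '%s\n' "$counts"

# Clean up the temporary file.
rm -f "$tmp_file"
```

On the real data, replace `"$tmp_file"` with `location_signals.txt`. Note that name variants of the same company are still counted separately.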

In summary, in under a minute, we identified 144 job postings in Luxembourg that mentioned new offices in June 2024. After deduplicating the company names, we pinpointed 16 unique companies with location signals, showcasing the effectiveness of our extraction method.

Conclusion

In this article, we showed how to identify 144 job postings mentioning a location change for 16 unique companies in Luxembourg in June 2024. A similar analysis in the USA revealed 1241 signals from 918k job postings (i.e., roughly 1.4 signals per 1k job postings).
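The signal rates follow directly from the counts quoted in this article. As a quick check, here is a minimal sketch that computes signals per 1,000 postings. `rate_per_1k` is a hypothetical helper, and the posting totals are the rounded figures from the text (14.7k and 918k), so the results are approximate.

```shell
#!/bin/bash
# Signals per 1,000 job postings, using the counts quoted in this article.
rate_per_1k() {
    # $1 = number of signals, $2 = number of job postings
    awk -v s="$1" -v p="$2" 'BEGIN { printf "%.2f\n", s / p * 1000 }'
}

lu_rate=$(rate_per_1k 144 14700)     # Luxembourg, June 2024 (14.7k postings, rounded)
us_rate=$(rate_per_1k 1241 918000)   # USA (918k postings, rounded)

echo "Luxembourg: $lu_rate signals per 1k postings"
echo "USA:        $us_rate signals per 1k postings"
```

With these inputs, the sketch prints 9.80 for Luxembourg and 1.35 for the USA.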

For more details, check out our full article on how to extract company signals from job postings.

If you are interested in further exploring the potential of job postings or applying machine learning / AI techniques, we encourage you to experiment with our data feed and source code provided in this article. Happy analyzing!
