Freshactors

Posted on Jun 11

How to scrape Personio career-portal jobs in Python — no API key


Personio is the de facto standard ATS for small and mid-sized businesses in the German-speaking world: thousands of companies in Germany, Austria, and Switzerland run their careers page on a {tenant}.jobs.personio.de portal. Here's the part most people miss: every one of those portals serves a public XML feed of its published positions. No API key, no login, no headless browser. A single GET returns the whole board, with departments, seniority levels, and full descriptions. In this tutorial we'll pull a company's job board as clean, structured JSON in a few lines of Python.

The endpoint

Every Personio career portal serves this feed at the /xml path:

GET https://{tenant}.jobs.personio.de/xml
GET https://{tenant}.jobs.personio.de/xml?language=en   (optional localization)
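If you want to hit the feed yourself, building the URL and downloading it needs nothing beyond the standard library. This is a minimal sketch that assumes only the two URL shapes above. personio_feed_url and fetch_feed are hypothetical helper names, and the User-Agent string is an arbitrary placeholder.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def personio_feed_url(tenant: str, language: Optional[str] = None) -> str:
    """Public XML feed URL for one Personio career portal (hypothetical helper)."""
    base = f"https://{tenant}.jobs.personio.de/xml"
    if language:
        # Optional localization, e.g. .../xml?language=en
        return f"{base}?{urlencode({'language': language})}"
    return base


def fetch_feed(tenant: str, language: Optional[str] = None, timeout: float = 30.0) -> bytes:
    """Download the raw <workzag-jobs> XML. No API key or login involved."""
    req = Request(
        personio_feed_url(tenant, language),
        headers={"User-Agent": "personio-feed-sketch/0.1"},  # placeholder UA string
    )
    with urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

Usage: `xml_bytes = fetch_feed("teamative", language="en")` returns the raw document as bytes, ready for any XML parser.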

The response is a <workzag-jobs> document with one <position> per job — id, office, department, name, employmentType, seniority, schedule, createdAt, and labeled description sections as CDATA HTML.
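A standard-library parser for that document fits in a few lines. The element names for the position fields are the ones listed above. The nesting of the labeled description sections (jobDescriptions › jobDescription › name/value) is an assumption about the feed's layout, and SAMPLE_FEED is an illustrative document modeled on the teamative record later in this post, not a captured response. ElementTree returns CDATA content as ordinary element text, so each section's HTML comes back as a plain string.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Illustrative sample, shaped like the feed described above (assumed layout)
SAMPLE_FEED = """<?xml version="1.0" encoding="UTF-8"?>
<workzag-jobs>
  <position>
    <id>2623782</id>
    <office>DE - Stuttgart</office>
    <department>Marketing</department>
    <name>Initiativbewerbung (m/w/d)</name>
    <jobDescriptions>
      <jobDescription>
        <name>Über uns</name>
        <value><![CDATA[<p>teamative bietet Beratung &amp; Entwicklung.</p>]]></value>
      </jobDescription>
    </jobDescriptions>
    <employmentType>permanent</employmentType>
    <seniority>experienced</seniority>
    <schedule>full-or-part-time</schedule>
    <createdAt>2026-05-05T08:58:57+00:00</createdAt>
  </position>
</workzag-jobs>"""


def _text(el: ET.Element, tag: str) -> Optional[str]:
    """Stripped text of a direct child element, or None if absent/empty."""
    child = el.find(tag)
    if child is None or child.text is None:
        return None
    return child.text.strip() or None


def parse_feed(xml_bytes: bytes) -> List[Dict]:
    """Turn a <workzag-jobs> document into one dict per <position>."""
    root = ET.fromstring(xml_bytes)
    jobs = []
    for pos in root.iter("position"):
        # Each labeled section keeps its raw CDATA HTML for later cleaning
        sections = [
            {"name": _text(jd, "name"), "html": _text(jd, "value")}
            for jd in pos.iter("jobDescription")
        ]
        jobs.append({
            "id": _text(pos, "id"),
            "office": _text(pos, "office"),
            "department": _text(pos, "department"),
            "name": _text(pos, "name"),
            "employmentType": _text(pos, "employmentType"),
            "seniority": _text(pos, "seniority"),
            "schedule": _text(pos, "schedule"),
            "createdAt": _text(pos, "createdAt"),
            "sections": sections,
        })
    return jobs
```

Note that `find("name")` only matches direct children, so a position's title is never confused with the `<name>` of a description section.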

So why not just call requests.get() on it yourself? You can, but then you own the parser: handling CDATA sections, stripping the HTML, decoding entities, splitting multi-office strings, and fixing all of it the day the feed shape shifts and your pipeline quietly goes empty. The cleaner path is to hand a list of tenants to an Apify actor that returns one stable schema, the same schema our Greenhouse, Lever, Workable, SmartRecruiters, Recruitee, and Teamtailor scrapers emit. Here's how to do that with the Personio Jobs Scraper.
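If you do go the DIY route, the HTML-stripping and entity-decoding step might look like the sketch below, built on the standard library's html.parser. The set of block tags, the "- " bullet prefix, and the "Section name:" layout in sections_to_text are assumptions chosen to resemble the descriptionText field shown later. They do not describe the actor's actual implementation.

```python
from html.parser import HTMLParser
from typing import Iterable, List, Optional, Tuple

# Tags that should start/end a new line in the plain-text output (assumption)
BLOCK_TAGS = {"p", "br", "li", "ul", "ol", "div", "h1", "h2", "h3", "h4"}


class _TextExtractor(HTMLParser):
    def __init__(self) -> None:
        # convert_charrefs=True decodes &amp;, &Uuml;, &#8211; etc. before handle_data
        super().__init__(convert_charrefs=True)
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")
        if tag == "li":
            self.parts.append("- ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment: Optional[str]) -> Optional[str]:
    """Strip tags, decode entities, collapse whitespace, drop blank lines."""
    if not fragment:
        return None
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    lines = [" ".join(line.split()) for line in "".join(parser.parts).splitlines()]
    return "\n".join(line for line in lines if line) or None


def sections_to_text(sections: Iterable[Tuple[Optional[str], Optional[str]]]) -> Optional[str]:
    """Join (label, html) sections as 'Label:\\ntext' blocks; empty sections are skipped."""
    blocks = []
    for name, html in sections:
        body = html_to_text(html)
        if body:
            blocks.append(f"{name}:\n{body}" if name else body)
    return "\n\n".join(blocks) or None
```

This covers only the text-cleaning part; multi-office handling and schema normalization would still be yours to write.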

Step 1 — Install the Apify client

pip install apify-client

Grab your Apify API token (Console → Settings → Integrations) and export it as an environment variable, so your code reads it from there instead of hard-coding it:

export APIFY_TOKEN="apify_api_xxx"

Step 2 — Run the actor with a list of tenants

The companies input accepts either bare tenant subdomains (e.g. lanch) or full {tenant}.jobs.personio.de URLs, and the two forms can be mixed:

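import os
from apify_client import ApifyClient

# Authenticate with the token exported in Step 1
client = ApifyClient(os.environ["APIFY_TOKEN"])

run_input = {
    # Bare tenant subdomains and full portal URLs can be mixed
    "companies": ["teamative", "https://lanch.jobs.personio.de"],
    "includeDescription": True,   # include the cleaned description text
    "maxJobsPerCompany": 100,     # cap on postings returned per portal
}

# .call() starts the actor and waits for the run to finish
run = client.actor("freshactors/personio-jobs-scraper").call(run_input=run_input)
print("Dataset id:", run["defaultDatasetId"])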
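If you build the companies list from a spreadsheet or CRM export, you may want to canonicalize it client-side before passing it in, so duplicates like lanch and https://lanch.jobs.personio.de collapse into a single entry. to_tenant and normalize_companies are hypothetical helpers, not part of the actor or apify-client. They assume, as this post does, that only {tenant}.jobs.personio.de hosts are valid portals.

```python
from typing import Iterable, List
from urllib.parse import urlparse

PERSONIO_SUFFIX = ".jobs.personio.de"  # portal host pattern used throughout this post


def to_tenant(value: str) -> str:
    """Map 'lanch' or 'https://lanch.jobs.personio.de' to the bare tenant 'lanch'."""
    v = value.strip()
    if not v:
        raise ValueError("empty company entry")
    if "://" not in v and "." not in v and "/" not in v:
        return v.lower()  # already a bare subdomain
    # urlparse only fills .hostname when a scheme is present, so add one if missing
    host = urlparse(v if "://" in v else f"https://{v}").hostname or ""
    if not host.endswith(PERSONIO_SUFFIX):
        raise ValueError(f"not a Personio career-portal URL: {value!r}")
    return host[: -len(PERSONIO_SUFFIX)]


def normalize_companies(values: Iterable[str]) -> List[str]:
    """Canonicalize and de-duplicate entries, keeping first-seen order."""
    return list(dict.fromkeys(to_tenant(v) for v in values))
```

Usage: `run_input["companies"] = normalize_companies(raw_entries)` before calling the actor. Raising on non-Personio hosts surfaces typos early instead of producing an empty result for that entry.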

Step 3 — Read the normalized output (departments + seniority included)

Every position comes back in the same shape. When Personio's feed lacks a field, the key is still present with a null value, never missing:

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f'{item["company"]:<11} {item["title"]}  [{item.get("department") or "n/a"} / {item.get("seniority") or "n/a"}]')
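For a hiring-signal dashboard you usually want counts rather than rows. The sketch below tallies open positions per (company, department) from any iterable of items, such as the iterate_items() iterator above. Because a missing department arrives as null, the explicit "n/a" fallback keeps those postings in the tally. openings_by_department is a hypothetical helper name.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def openings_by_department(items: Iterable[Dict]) -> "Counter[Tuple[str, str]]":
    """Tally postings per (company, department); null departments count as 'n/a'."""
    counts: "Counter[Tuple[str, str]]" = Counter()
    for item in items:
        counts[(item["company"], item.get("department") or "n/a")] += 1
    return counts


# With the run from Step 2:
# tally = openings_by_department(client.dataset(run["defaultDatasetId"]).iterate_items())
# for (company, dept), n in tally.most_common():
#     print(f"{company:<11} {dept:<20} {n}")
```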

A single record (a real one, from teamative's portal):

{
  "_type": "job",
  "_schemaVersion": "1.0",
  "_source": "personio",
  "company": "teamative",
  "jobId": "2623782",
  "title": "Initiativbewerbung (m/w/d)",
  "department": "Marketing",
  "seniority": "experienced",
  "location": "DE - Stuttgart",
  "allLocations": ["DE - Stuttgart"],
  "commitment": "Full-or-part-time",
  "url": "https://teamative.jobs.personio.de/job/2623782",
  "applyUrl": "https://teamative.jobs.personio.de/job/2623782",
  "postedAt": "2026-05-05T08:58:57.000Z",
  "descriptionText": "Über uns:\nteamative bietet Beratung, Entwicklung und... (labeled sections, clean text)",
  "_scrapedAt": "2026-06-10T12:34:05.149Z"
}

This is the same record shape our Greenhouse & Lever, Workable, SmartRecruiters, Recruitee, and Teamtailor scrapers emit, plus two Personio segmentation fields that most ATS feeds don't expose: department and seniority.
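Since every key is always present, a completeness check gives you an early warning if a record ever deviates from the schema. This is a sketch using typing.TypedDict: the key list is copied from the sample record above, while the field types (including which fields are Optional) are inferred from that one record and are assumptions. missing_keys and null_fields are hypothetical helper names.

```python
from typing import List, Optional, TypedDict

# Keys copied from the sample record; types inferred from it (assumption)
PersonioJob = TypedDict("PersonioJob", {
    "_type": str,
    "_schemaVersion": str,
    "_source": str,
    "company": str,
    "jobId": str,
    "title": str,
    "department": Optional[str],
    "seniority": Optional[str],
    "location": Optional[str],
    "allLocations": List[str],
    "commitment": Optional[str],
    "url": str,
    "applyUrl": Optional[str],
    "postedAt": Optional[str],
    "descriptionText": Optional[str],
    "_scrapedAt": str,
})

EXPECTED_KEYS = frozenset(PersonioJob.__annotations__)


def missing_keys(record: dict) -> List[str]:
    """Keys the schema promises but this record lacks (should always be empty)."""
    return sorted(EXPECTED_KEYS.difference(record))


def null_fields(record: dict) -> List[str]:
    """Keys present but null, i.e. fields the Personio feed did not provide."""
    return sorted(k for k in EXPECTED_KEYS if k in record and record[k] is None)
```

A non-empty missing_keys() result in a scheduled job is a cheap signal to stop and investigate before the data reaches a dashboard.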

Step 4 — Localized or lighter output

Want English titles/descriptions where the company maintains them? Pass a language code. Only need metadata for a hiring-signal dashboard? Drop the descriptions:

run_input = {
    "companies": ["teamative", "lanch"],
    "language": "en",               # localized where provided
    "includeDescription": False,    # smaller records; same cost & speed
    "maxJobsPerCompany": 100,
}

Prefer Node.js?

npm install apify-client

// Save as an ES module (e.g. personio.mjs) so top-level await works
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Start the actor and wait for the run to finish
const run = await client.actor('freshactors/personio-jobs-scraper').call({
    companies: ['teamative', 'https://lanch.jobs.personio.de'],
    includeDescription: true,
    maxJobsPerCompany: 100,
});

// Read the normalized records from the run's default dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();
for (const job of items) {
    console.log(`${job.company} - ${job.title} [${job.department ?? 'n/a'} / ${job.seniority ?? 'n/a'}]`);
}

What about cost?

Pricing is pay-per-event: $0.02 per company portal fetched plus $0.0005 per job posting returned. Five companies returning 100 postings in total therefore cost 5 × $0.02 + 100 × $0.0005 = $0.15, with departments, seniority, and full descriptions included. No subscription.
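The formula is simple enough to budget for in code. This sketch uses Decimal so that cent amounts don't pick up float rounding noise. The two prices are the ones quoted above. max_cost treats companies × maxJobsPerCompany as the worst case, which assumes that maxJobsPerCompany caps the billable postings per portal.

```python
from decimal import Decimal

PRICE_PER_COMPANY = Decimal("0.02")  # USD per company portal fetched
PRICE_PER_JOB = Decimal("0.0005")    # USD per job posting returned


def estimate_cost(companies: int, jobs: int) -> Decimal:
    """Pay-per-event cost in USD for one run."""
    return companies * PRICE_PER_COMPANY + jobs * PRICE_PER_JOB


def max_cost(companies: int, max_jobs_per_company: int) -> Decimal:
    """Upper bound, assuming maxJobsPerCompany caps billed postings per portal."""
    return estimate_cost(companies, companies * max_jobs_per_company)


print(f"5 companies, 100 postings: ${estimate_cost(5, 100):.2f}")
print(f"2 companies, cap 100 each: ${max_cost(2, 100):.2f}")
```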

Why use the actor instead of the feed directly?

You can parse the XML yourself. The reason to use the actor is maintenance: it normalizes everything into one schema (shared with our five other ATS scrapers), handles CDATA, entities, and multi-office strings, isolates per-company failures (an unknown tenant never kills your run), and is monitored by a daily canary, so a silent feed change doesn't quietly empty your pipeline.
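Both reliability properties are easy to express in code, even though running them every day is the real work. The sketch below illustrates the general pattern and is not the actor's implementation. fetch_jobs stands for any per-tenant function you supply, for example one that downloads and parses a feed. The canary tenant is a hypothetical portal that you know has open roles.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Job = Dict[str, object]


def scrape_all(
    tenants: Iterable[str],
    fetch_jobs: Callable[[str], List[Job]],
) -> Tuple[Dict[str, List[Job]], Dict[str, str]]:
    """Run fetch_jobs per tenant; a failing tenant is recorded, not raised."""
    results: Dict[str, List[Job]] = {}
    errors: Dict[str, str] = {}
    for tenant in tenants:
        try:
            results[tenant] = fetch_jobs(tenant)
        except Exception as exc:  # e.g. HTTP 404 for an unknown tenant, timeouts, bad XML
            errors[tenant] = f"{type(exc).__name__}: {exc}"
    return results, errors


def canary_ok(results: Dict[str, List[Job]], canary_tenant: str) -> bool:
    """A tenant known to have open roles should never come back empty."""
    return bool(results.get(canary_tenant))
```

Checking canary_ok on a schedule catches the failure mode described above: a feed change that silently returns nothing rather than raising an error.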

The actor is here: Personio Jobs Scraper on Apify. Point it at your target companies and consume one normalized JSON feed.

Happy scraping.
