장백호
Stop Writing Custom Importers: Import Multilingual Data in Drupal with Migrate API

Most Drupal developers still write custom importers for external data.

In many cases, that’s unnecessary. Drupal’s Migrate API already provides a robust and extensible solution, even for complex, multilingual imports.

This post is intended for beginner and mid-level Drupal developers. The examples follow currently supported Drupal standards.

What you'll learn

  • How to import XML feeds using Drupal Migrate API
  • How to handle multilingual content in a single migration
  • How to keep content in sync with remote systems
  • How to automate imports using cron

The Challenge

In many enterprise projects, you’ll need to regularly import data from third-party APIs or feeds. Having worked extensively on such integrations, I want to show how you can build a clean, maintainable importer in Drupal without reinventing the wheel.

Here is an example feed of job postings that needs to be imported into the application:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Fictional HR feed (Publication per locale). Each Publication has an ID attribute. -->
<Jobs>
  <Job id="DEMO-1001" ID="DEMO-1001">
    <ExternalPublication>
      <Publication ID="DEMO-1001-en_US" language="en_US">
        <Jobname>Developer Advocate</Jobname>
        <ShortDescription>Help developers succeed with our APIs and tools.</ShortDescription>
        <Leadaftertitle>Remote-first, documentation-focused team.</Leadaftertitle>
        <Publicationdate>2026-01-15</Publicationdate>
        <URL>https://www.example.org/jobs/apply/demo-1001</URL>
      </Publication>
      <Publication ID="DEMO-1001-nl_NL" language="nl_NL">
        <Jobname>Developer Advocate</Jobname>
        <ShortDescription>Ontwikkelaars helpen slagen met onze API's en tools.</ShortDescription>
        <Leadaftertitle>Remote-first, focus op documentatie.</Leadaftertitle>
        <Publicationdate>2026-01-15</Publicationdate>
        <URL>https://www.example.org/jobs/apply/demo-1001</URL>
      </Publication>
    </ExternalPublication>
  </Job>
  <Job id="DEMO-1002" ID="DEMO-1002">
    <ExternalPublication>
      <Publication ID="DEMO-1002-en_US" language="en_US">
        <Jobname>Site Reliability Engineer</Jobname>
        <ShortDescription>Keep production calm, observable, and boring—in a good way.</ShortDescription>
        <Leadaftertitle>On-call rotation with strong blameless culture.</Leadaftertitle>
        <Publicationdate>2026-02-01</Publicationdate>
        <URL>https://www.example.org/jobs/apply/demo-1002</URL>
      </Publication>
    </ExternalPublication>
  </Job>
  <Job id="DEMO-1003" ID="DEMO-1003">
    <ExternalPublication>
      <Publication ID="DEMO-1003-fr_FR" language="fr_FR">
        <Jobname>Ingénieur logiciel</Jobname>
        <ShortDescription>Concevoir et maintenir des services fiables, du prototype à la production.</ShortDescription>
        <Leadaftertitle>Équipe produit, stack moderne, télétravail possible.</Leadaftertitle>
        <Publicationdate>2026-03-01</Publicationdate>
        <URL>https://www.example.org/jobs/apply/demo-1003</URL>
      </Publication>
    </ExternalPublication>
  </Job>
</Jobs>

Several questions immediately come to mind: How much data? How often? Are incremental updates required? Does the data need to stay in sync with remote removals? All of these inform the choice of implementation.

Assuming all of these requirements apply to our importer, the Drupal landscape offers a few solutions:

  • Migrate API (semi custom - migrate scaffolding - extensible)
  • Feeds (click and assemble - interface driven - extensible)
  • Fully custom (write from scratch - high maintenance)

Why Migrate API?

We chose the Migrate API because it lives in core and has matured over the years, with many community-contributed building blocks and tooling on top. For our client's needs it was the right choice: low effort, high quality, and easy to maintain.

For the custom integration, we will rely on the following stack:

  • Migrate - Core ecosystem and scaffolding
  • Migrate Plus - Extended set of plugins for fetching, parsing, auth, etc.
  • Migrate Tools - Extra commands to aid the importer
  • Ultimate Cron - Advanced scheduled task administration (optional)
XML Feed → Migrate Source → Process Plugins → Node (translations)
                                ↓
                           Cron (Ultimate Cron)

Implementation

I created a demo to support this blog post; below I expand on its most notable pieces of functionality:

Jobs Import Demo: https://github.com/baikho/drupal-jobs_import-demo

Install the demo by running:

composer require baikho/drupal-jobs_import_demo
drush en jobs_import_demo -y
drush cr

We start with the migration YAML that ties the ETL process together:

id: jobs
label: 'Jobs (demo)'
migration_group: jobs_import_demo

Using a migration group is not required, but it helps when you have an advanced migration with many referenced entities of different types.
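For reference, a migration group itself is just a small config entity. The sketch below assumes a file named migrate_plus.migration_group.jobs_import_demo.yml with a hypothetical shared tag; check the demo repository for the actual definition:

```yaml
# Minimal Migrate Plus migration group (sketch).
# Anything under shared_configuration is merged into every migration
# that declares migration_group: jobs_import_demo.
id: jobs_import_demo
label: 'Jobs (demo)'
description: 'Imports job postings from an external HR feed.'
source_type: 'XML feed'
shared_configuration:
  migration_tags:
    - jobs_import_demo
```

Shared configuration is handy once multiple migrations (jobs, departments, locations) need the same credentials or tags.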

Next up is our migration source definition, where we specify our custom source plugin along with the fetcher and parser plugins that are available out of the box. We also want to track changes, so we set the track_changes flag to true.

For the sake of the demo we use an internal demo URL for the http fetcher, but in real-world scenarios this would be a remote URL with an authentication layer:

source:
  plugin: job_feed_url
  data_fetcher_plugin: http
  data_parser_plugin: simple_xml
  method: GET
  track_changes: true
  headers:
    Accept: 'application/xml, text/xml;q=0.9, */*;q=0.8'
  namespaces: { }
  item_selector: '//Job/ExternalPublication/Publication'
  fields:
    -
      name: publication_id
      label: 'Publication ID'
      selector: '@ID'
    -
      name: job_id
      label: 'Job ID'
      selector: 'ancestor::Job/@id'
    -
      name: locale
      label: 'Publication locale'
      selector: '@language'
    -
      name: title
      label: Title
      selector: 'Jobname'
...

Next we define our migration ids, using a composite key: entities are uniquely identified by the combination of job_id and locale, which matches Drupal's translation architecture:

  ids:
    job_id:
      type: string
    locale:
      type: string

Our data transformation happens in the process definition, where we map the titles and languages. Already we can see some useful plugins at work, such as skip_on_empty and static_map. We won't cover all fields here, but you can find them in the demo repository:

process:
  title:
    plugin: skip_on_empty
    source: title
    method: row
    message: 'Missing title'

  langcode:
    plugin: static_map
    source: locale
    map:
      en_US: en
      nl_NL: nl
      fr_FR: fr
    default_value: en

To make the multilingual data land as entity translations, we write a custom plugin and pipe the result of the migration_lookup plugin through it. This resolves the first unique id from the lookup as the translation source:

  # Same job_id exists for every locale row; lookup by job_id alone can return
  # several nids. Translation rows need one nid — use first id (same job).
  nid:
    -
      plugin: migration_lookup
      migration: jobs
      source_ids:
        jobs:
          - job_id
      no_stub: true
    -
      plugin: migration_lookup_first_nid
    -
      plugin: skip_on_empty
      method: process

See the demo code for the migration_lookup_first_nid helper plugin.
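To give an idea of its size: such a plugin can be just a few lines. The following is a sketch of what migration_lookup_first_nid might look like; the demo repository holds the actual implementation, which may differ in detail:

```php
<?php

declare(strict_types=1);

namespace Drupal\jobs_import_demo\Plugin\migrate\process;

use Drupal\migrate\MigrateExecutableInterface;
use Drupal\migrate\ProcessPluginBase;
use Drupal\migrate\Row;

/**
 * Reduces a migration_lookup result to a single nid (sketch).
 *
 * Looking up by job_id alone against a composite-key migration can return
 * several destination ids; translation rows only need the first one.
 *
 * @MigrateProcessPlugin(
 *   id = "migration_lookup_first_nid"
 * )
 */
final class MigrationLookupFirstNid extends ProcessPluginBase {

  /**
   * {@inheritdoc}
   */
  public function transform($value, MigrateExecutableInterface $migrate_executable, Row $row, $destination_property) {
    // migration_lookup may return NULL, a scalar id, or an array of ids.
    if (is_array($value)) {
      $value = reset($value);
      // A composite destination id is itself an array; take the nid.
      if (is_array($value)) {
        $value = reset($value);
      }
    }
    return $value ?: NULL;
  }

}
```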

Finally, to map the transformed data onto the Drupal entity type, we add the following destination definition:

destination:
  plugin: 'entity:node'
  default_bundle: vacancy
  translations: true

Now that our ETL process is complete, we can run it on demand by executing the Drush migrate command:

drush mim jobs
 [notice] Processed 4 items (4 created, 0 updated, 0 failed, 0 ignored) - done with 'jobs'

Great! Just like that, our vacancy nodes are created from the feed's 4 publication rows, with entity translations respected, all from a single migrate YAML file.

If this were a one-off migration, this would be sufficient and all done!

However, the client needs this synchronized on an hourly basis. We achieve this with a small service exposing a callback for Ultimate Cron, which runs the adjusted command with the --sync option provided by Migrate Tools:

<?php

declare(strict_types=1);

namespace Drupal\jobs_import_demo\Service;

/**
 * Cron-style import runner: spawns Drush migrate:import in the background.
 *
 * Ultimate Cron should call {@see self::jobsImportCron()}.
 */
final class ImportCronService {

  /**
   * Ultimate Cron callback: import jobs for the jobs_import_demo group.
   */
  public function jobsImportCron(): void {
    $this->runGroup('jobs_import_demo');
  }

  /**
   * Spawns drush mim in the background (non-blocking).
   *
   * @param string $migrationGroup
   *   Migrate Plus group id (e.g. jobs_import_demo).
   */
  private function runGroup(string $migrationGroup): void {
    $drush = DRUPAL_ROOT . '/../vendor/bin/drush';
    if (!is_executable($drush)) {
      return;
    }

    $command = $drush . ' mim --group=' . escapeshellarg($migrationGroup) . ' --update --sync > /dev/null 2>&1 &';
    exec($command);
  }

}

With this extra bit of code and the Ultimate Cron config in place, we can navigate to Drupal's cron interface to administer or manually trigger the scheduled import.
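The Ultimate Cron job itself can also be shipped as config. The sketch below assumes a crontab scheduler and a callback resolved via a jobs_import_demo.import_cron service id; the exact keys and names in the demo repository may differ:

```yaml
# ultimate_cron.job.jobs_import_demo.yml (sketch)
id: jobs_import_demo
title: 'Jobs import (demo)'
status: true
module: jobs_import_demo
# Service id and method from the ImportCronService above.
callback: 'jobs_import_demo.import_cron:jobsImportCron'
scheduler:
  id: crontab
  configuration:
    # Run at the top of every hour.
    rules:
      - '0 * * * *'
```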

Under the hood, the Migrate API takes care of remote changes and keeps all entities in sync. None of this did we have to write ourselves, leaving us free to focus solely on the business-logic mapping.

Why this approach scales

One of the biggest advantages of using Migrate API is that it treats imports as first-class citizens in Drupal.

Instead of writing one-off scripts, you get:

  • Change tracking
  • Rollbacks
  • Re-runs
  • Extensibility via plugins

This makes it especially suitable for long-lived enterprise integrations.
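Each of those capabilities maps to tooling you get for free from core Migrate and Migrate Tools. For example, run against your Drupal site:

```shell
# Show import status and tracking counts for the group.
drush migrate:status --group=jobs_import_demo

# Re-run the migration, updating previously imported items.
drush migrate:import jobs --update

# Roll back everything the migration created.
drush migrate:rollback jobs
```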

Conclusion

The implementation demonstrates how quickly you can build a clean import solution in Drupal with minimal effort, and how well the Migrate API works with Drupal entities and translations while tracking upstream changes.

You can find the demo on my GitHub profile, which is easily installable as a module. Instructions are found in the README.

Final words

I want to thank all of the core maintainers and contributors who work actively on enriching Drupal.

The Drupal ecosystem thrives on community and contributions, so if you want to help it grow make sure to look at the issue queue and start reviewing or testing bugfixes and features.

In another blog post I will build similar functionality with Feeds, just to show how diverse the Drupal landscape really is, with its various solutions and approaches.
