For several years I used a small Drupal instance as my personal knowledge base. The main reason I chose Drupal in the first place was its powerful content modeling capabilities. Out of the box it provides content types, custom fields, and flexible taxonomy. Since I was already working with Drupal professionally, using it for personal knowledge management felt natural and efficient.
This post was originally published on my blog - https://dzhebrak.com/blog/from-drupal-to-obsidian/
Over time, however, I started to feel that the system was heavier than necessary for personal note-taking. Running a full CMS requires maintenance: updates, backups, hosting, and database management.
Obsidian appealed to me for several reasons. It stores notes as plain Markdown files, which means the data is future-proof and easy to manipulate with scripts. The entire knowledge base lives locally in a simple folder structure, yet the application provides powerful features like backlinking, graph visualization, and plugins.
The challenge was exporting structured content from Drupal and converting it into Markdown files that Obsidian could understand. My Drupal instance had a few content types, some custom fields, and tags. The goal was to export every node as a Markdown file with YAML front matter, so that Obsidian could index the metadata and the content would be immediately readable.
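For illustration, a single exported note might look like this (the title, tags, and URL here are made up):

```markdown
---
title: 'Example Note'
tags:
    - drupal
    - migration
source: 'https://example.com/some-reference'
---
The body content, converted from HTML to Markdown.
```

Obsidian reads everything between the `---` markers as front matter and treats the rest as the note body.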
Preparing Drupal for Export
Drupal's JSON:API module (included in core since Drupal 8.7) exposes all your content as a RESTful API with a consistent, well-documented response format. It was the obvious choice for programmatic access.
To authenticate the requests, I enabled the HTTP Basic Authentication module — also in Drupal core. This lets you pass credentials via standard HTTP headers, which is straightforward to use in a script without setting up OAuth or tokens.
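Under the hood there is no magic: HTTP Basic Authentication is just an Authorization header carrying the Base64-encoded username:password pair. A minimal sketch in plain PHP, with placeholder credentials:

```php
<?php
// HTTP Basic Authentication boils down to one header with the
// Base64-encoded "username:password" pair (placeholder credentials).
$username = 'admin';
$password = 'secret';

$header = 'Authorization: Basic ' . base64_encode($username . ':' . $password);

echo $header, "\n"; // Authorization: Basic YWRtaW46c2VjcmV0
```

HTTP clients like Symfony's build this header for you when given an `auth_basic` option, which is what the export script below relies on.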
Both modules can be enabled in one command:
drush en jsonapi basic_auth -y
Or through the admin UI at Extend (/admin/modules).
With these active, your content is accessible at endpoints like /jsonapi/node/article, returning a paginated JSON response with all field data included.
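To give a feel for the shape of that response, here is a trimmed-down sample already decoded to a PHP array (the UUID and field values are made up; real responses carry many more attributes per node):

```php
<?php
// Trimmed-down shape of a JSON:API collection response after decoding.
$response = [
    'data' => [
        [
            'type' => 'node--article',
            'id' => '11111111-1111-1111-1111-111111111111',
            'attributes' => [
                'title' => 'First note',
                'body' => ['value' => '<p>Hello</p>'],
            ],
        ],
    ],
    'links' => [
        // Present only while more pages remain.
        'next' => ['href' => 'https://DRUPAL_DOMAIN/jsonapi/node/article?page%5Boffset%5D=50'],
    ],
];

// Each node's field values live under 'attributes'.
foreach ($response['data'] as $node) {
    echo $node['attributes']['title'], "\n"; // First note
}
```

The `links.next.href` entry is what makes pagination trivial: keep requesting it until it disappears.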
What Data I Needed
The example fields I cared about were:
- Title — the node title, naturally
- Body — the main article content in HTML (which I converted to Markdown)
- Tags — a taxonomy term reference field, for Obsidian tags
- field_related_links — custom multi-value link field containing external reference URLs
The JSON:API response nests taxonomy terms as relationships, so fetching tags required a quick include parameter: ?include=field_tags. Custom fields appear as first-class attributes in the response, so field_related_links came through without any special configuration.
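In other words, a node's relationships carry only term UUIDs, while the names arrive separately in the top-level included array. The lookup is a two-step map-and-resolve; a sketch with made-up UUIDs:

```php
<?php
// Term names arrive in the top-level 'included' array; nodes reference
// them only by UUID, so build an id => name map first.
$included = [
    ['type' => 'taxonomy_term--tags', 'id' => 'aaa-1', 'attributes' => ['name' => 'PHP']],
    ['type' => 'taxonomy_term--tags', 'id' => 'bbb-2', 'attributes' => ['name' => 'Drupal']],
];

$tagsById = [];
foreach ($included as $item) {
    if ($item['type'] === 'taxonomy_term--tags') {
        $tagsById[$item['id']] = $item['attributes']['name'];
    }
}

// A node's relationship data holds only the UUIDs.
$nodeTagIds = ['bbb-2'];
$nodeTags = array_map(fn ($id) => $tagsById[$id], $nodeTagIds);

echo implode(', ', $nodeTags), "\n"; // Drupal
```

The export script below does exactly this resolution, page by page.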
The Export Script
Below is the script that handles the full export. It paginates through the JSON:API endpoint, converts each node's body HTML to Markdown, and writes individual .md files with YAML front matter to an output directory.
<?php declare(strict_types=1);

require_once __DIR__.'/vendor/autoload.php';

use League\HTMLToMarkdown\HtmlConverter;
use Symfony\Component\HttpClient\HttpClient;
use Symfony\Component\Yaml\Yaml;

use function Symfony\Component\String\u;

////////
$url = 'https://DRUPAL_DOMAIN/jsonapi/node/article?include=field_tags';
$username = '';
$password = '';
////////

set_time_limit(0);

$client = HttpClient::create();
$converter = new HtmlConverter();

$requestOptions = empty($username) && empty($password) ? [] : [
    'auth_basic' => [$username, $password],
];

// Make sure the target directory exists before writing files.
$outputDir = __DIR__.'/output';
if (!is_dir($outputDir)) {
    mkdir($outputDir, 0777, true);
}

do {
    $response = $client->request('GET', $url, $requestOptions)->toArray();

    // Build a UUID => name map of the taxonomy terms delivered
    // alongside the nodes via ?include=field_tags.
    $tags = [];
    foreach ($response['included'] ?? [] as $included) {
        $type = $included['type'] ?? null;
        $name = $included['attributes']['name'] ?? null;

        if ($type !== 'taxonomy_term--tags' || $name === null) {
            continue;
        }

        $tags[$included['id']] = $name;
    }

    foreach ($response['data'] ?? [] as $node) {
        $id = $node['id'] ?? null;
        $title = $node['attributes']['title'] ?? null;
        $html = $node['attributes']['body']['value'] ?? null;

        // Resolve the node's tag UUIDs to term names.
        $nodeTagsIds = array_column($node['relationships']['field_tags']['data'] ?? [], 'id');
        $nodeTags = array_filter(array_map(
            fn ($tagId) => $tags[$tagId] ?? null,
            $nodeTagsIds
        ));

        $markdownContent = $html ? $converter->convert($html) : null;

        $links = array_map(
            fn (array $link) => $link['uri'] ?? null,
            $node['attributes']['field_related_links'] ?? []
        );

        // Strip characters that are invalid in file names.
        $title = u($title)->replaceMatches('/[\/:\*\?"<>|\\\\]/', ' ')->trim()->toString();

        $frontMatter = [
            'title' => $title,
            'tags' => array_map(
                fn (string $tag) => u($tag)->ascii()->lower()->replaceMatches('/(\s+|\.)/', '-')->trim()->toString(),
                $nodeTags,
            ),
        ];

        $linksCount = count($links);
        if ($linksCount === 1) {
            $frontMatter['source'] = current($links);
        } elseif ($linksCount > 1) {
            $frontMatter['links'] = $links;
        }

        $frontMatterYaml = Yaml::dump($frontMatter);

        $outputFilename = sprintf('%s.md', $title);
        $outputFilepath = sprintf('%s/%s', $outputDir, $outputFilename);
        $outputContent = sprintf("---\n%s---\n%s", $frontMatterYaml, $markdownContent);

        file_put_contents($outputFilepath, $outputContent);

        printf("%s\n", $outputFilename);
    }

    // JSON:API paginates; follow the "next" link until it disappears.
    $url = $response['links']['next']['href'] ?? null;
} while (!empty($url));
The code is not the most elegant, but it gets the job done, and for a one-off migration that is enough.
The script relies on several external packages, which can be installed via Composer:
composer require league/html-to-markdown symfony/http-client symfony/string symfony/translation-contracts symfony/yaml
I’ve also published the export script on GitHub, including a Docker setup to make it easier to run.
How to Use It
There are three things to configure at the top:
- $url — set this to your Drupal site's JSON:API article endpoint, for example: https://your-drupal-site.com/jsonapi/node/article?include=field_tags. If you have a different content type, replace article with its machine name.
- $username and $password — your Drupal user credentials, if the website requires authentication. The account needs at least the access content permission.
Run the script with:
php export.php
The script will paginate automatically through all nodes of the content type specified in the URL. When it finishes, you'll find one .md file per node in the output/ directory. File names are derived from the node titles, with characters that are invalid in file names stripped out, so they stay readable.
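Concretely, the title-to-filename step can be sketched in plain PHP (the script itself does the equivalent with symfony/string): replace characters that are invalid in file names with spaces, then trim.

```php
<?php
// Mirror the export script's filename sanitization: replace
// characters that are invalid in file names with spaces, then trim.
function sanitizeTitle(string $title): string
{
    return trim(preg_replace('/[\/:*?"<>|\\\\]/', ' ', $title));
}

echo sanitizeTitle('What is PHP?') . '.md', "\n"; // What is PHP.md
```

Titles keep their original casing and spaces, so the resulting notes remain easy to find in the vault.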
Once the export is done, copy the contents of output/ into your Obsidian vault folder.
Final Thoughts
This migration turned out to be surprisingly straightforward thanks to Drupal’s strong API support. By enabling JSON:API and writing a small export script, it was possible to extract structured content and convert it into a portable Markdown format.
Moving from Drupal to Obsidian removed the overhead of maintaining a CMS while keeping the knowledge itself intact. Everything now lives in simple text files that are easy to version, sync, and manipulate with scripts.
For a personal knowledge base, that simplicity is hard to beat.