<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: DecDEPO</title>
    <description>The latest articles on DEV Community by DecDEPO (@c_d_084d360f424581c9995).</description>
    <link>https://dev.to/c_d_084d360f424581c9995</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3884594%2F42fd7e56-edb4-4830-9ef2-06f11bf392ae.png</url>
      <title>DEV Community: DecDEPO</title>
      <link>https://dev.to/c_d_084d360f424581c9995</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/c_d_084d360f424581c9995"/>
    <language>en</language>
    <item>
      <title>Distributing an open dataset across 12 platforms in one day: a playbook</title>
      <dc:creator>DecDEPO</dc:creator>
      <pubDate>Fri, 17 Apr 2026 16:00:22 +0000</pubDate>
      <link>https://dev.to/c_d_084d360f424581c9995/distributing-an-open-dataset-across-12-platforms-in-one-day-a-playbook-2493</link>
      <guid>https://dev.to/c_d_084d360f424581c9995/distributing-an-open-dataset-across-12-platforms-in-one-day-a-playbook-2493</guid>
      <description>&lt;p&gt;We shipped the &lt;strong&gt;&lt;a href="https://github.com/zaragoza-ab/swedish-construction-faq-1000" rel="noopener noreferrer"&gt;Swedish Construction FAQ&lt;/a&gt;&lt;/strong&gt; dataset (503 bilingual Q&amp;amp;A pairs, CC BY 4.0, DOI &lt;a href="https://doi.org/10.5281/zenodo.19630803" rel="noopener noreferrer"&gt;10.5281/zenodo.19630803&lt;/a&gt;) yesterday. By end of day it was on &lt;strong&gt;twelve&lt;/strong&gt; platforms.&lt;/p&gt;

&lt;p&gt;This is the checklist, in order. Each step takes 10-30 minutes. Most are scriptable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 12 platforms
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;What it gives you&lt;/th&gt;
&lt;th&gt;Effort&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;GitHub&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Canonical source, issues, Pages&lt;/td&gt;
&lt;td&gt;base&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Zenodo&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;DOI, Scholar indexing, archival&lt;/td&gt;
&lt;td&gt;10 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Hugging Face Datasets&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ML audience, &lt;code&gt;load_dataset()&lt;/code&gt; UX&lt;/td&gt;
&lt;td&gt;15 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;PyPI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;pip install&lt;/code&gt; for Python users&lt;/td&gt;
&lt;td&gt;20 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Kaggle&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Data-science audience&lt;/td&gt;
&lt;td&gt;10 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Wikidata&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Knowledge graph, LLM indexing&lt;/td&gt;
&lt;td&gt;30 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;GitHub Pages&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Static landing page, SEO&lt;/td&gt;
&lt;td&gt;5 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Colab notebook&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;One-click "try it" UX&lt;/td&gt;
&lt;td&gt;15 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Hugging Face Space&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Live interactive demo&lt;/td&gt;
&lt;td&gt;20 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Awesome-list PRs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Organic discovery&lt;/td&gt;
&lt;td&gt;15 min/list&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Dev.to / Mastodon / HN&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Human announcement reach&lt;/td&gt;
&lt;td&gt;10 min each&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;LinkedIn / Medium&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Longer-form professional reach&lt;/td&gt;
&lt;td&gt;15 min each&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What to ship before you announce
&lt;/h2&gt;

&lt;p&gt;Before posting &lt;em&gt;anywhere&lt;/em&gt;, finalize these files in the repo:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[x] &lt;code&gt;README.md&lt;/code&gt; with badges (license, DOI, language, format)&lt;/li&gt;
&lt;li&gt;[x] &lt;code&gt;LICENSE&lt;/code&gt; (CC BY 4.0 for data, MIT for any code)&lt;/li&gt;
&lt;li&gt;[x] &lt;code&gt;CITATION.cff&lt;/code&gt; with the DOI and a &lt;code&gt;preferred-citation&lt;/code&gt; block&lt;/li&gt;
&lt;li&gt;[x] &lt;code&gt;.zenodo.json&lt;/code&gt; so Zenodo picks up rich metadata on release&lt;/li&gt;
&lt;li&gt;[x] &lt;code&gt;llms.txt&lt;/code&gt; at repo root — tells LLMs where to find the canonical metadata&lt;/li&gt;
&lt;li&gt;[x] Data files in multiple formats: JSON, JSONL, CSV, Alpaca, ShareGPT&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you skip these, every downstream platform will have thin, inconsistent metadata.&lt;/p&gt;
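
&lt;p&gt;For reference, a minimal &lt;code&gt;CITATION.cff&lt;/code&gt; sketch for this dataset (organization author; adjust names, version, and DOI to your own):&lt;/p&gt;

```yaml
cff-version: 1.2.0
message: "If you use this dataset, please cite it."
title: "Swedish Construction FAQ"
type: dataset
license: CC-BY-4.0
doi: 10.5281/zenodo.19630803
authors:
  - name: "Zaragoza AB"
preferred-citation:
  type: dataset
  title: "Swedish Construction FAQ"
  doi: 10.5281/zenodo.19630803
  authors:
    - name: "Zaragoza AB"
```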

&lt;h2&gt;
  
  
  The scripting order
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. GitHub first
&lt;/h3&gt;

&lt;p&gt;Create the repo, push everything, tag a release. Zenodo listens for GitHub releases via webhook.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gh repo create myorg/mydataset &lt;span class="nt"&gt;--public&lt;/span&gt; &lt;span class="nt"&gt;--source&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;.&lt;/span&gt;
git push &lt;span class="nt"&gt;-u&lt;/span&gt; origin main
gh release create v1.0.0 &lt;span class="nt"&gt;--title&lt;/span&gt; &lt;span class="s2"&gt;"Initial release"&lt;/span&gt; &lt;span class="nt"&gt;--notes&lt;/span&gt; &lt;span class="s2"&gt;"..."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Zenodo — before anything else downstream
&lt;/h3&gt;

&lt;p&gt;Why second? Because the DOI becomes part of every downstream card. Don't mirror first and then retrofit the DOI everywhere.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Link your GitHub account to Zenodo&lt;/li&gt;
&lt;li&gt;Flip the switch on your repo&lt;/li&gt;
&lt;li&gt;Cut a release → Zenodo mints a DOI automatically&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.zenodo.json&lt;/code&gt; in the repo controls metadata (creators, keywords, related_identifiers)&lt;/li&gt;
&lt;/ul&gt;
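
&lt;p&gt;A minimal &lt;code&gt;.zenodo.json&lt;/code&gt; sketch; keys follow Zenodo's deposit metadata, and the values here mirror this rollout:&lt;/p&gt;

```json
{
  "title": "Swedish Construction FAQ",
  "upload_type": "dataset",
  "license": "cc-by-4.0",
  "creators": [{"name": "Zaragoza AB"}],
  "keywords": ["construction", "swedish", "faq", "bilingual"],
  "related_identifiers": [
    {
      "identifier": "https://github.com/zaragoza-ab/swedish-construction-faq-1000",
      "relation": "isSupplementTo",
      "scheme": "url"
    }
  ]
}
```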

&lt;h3&gt;
  
  
  3. Hugging Face Datasets
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create the dataset repo via API&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://huggingface.co/api/repos/create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$HF_TOKEN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"type":"dataset","name":"mydataset","organization":"MYORG"}'&lt;/span&gt;

&lt;span class="c"&gt;# Then it's just git&lt;/span&gt;
git clone https://USER:&lt;span class="nv"&gt;$HF_TOKEN&lt;/span&gt;@huggingface.co/datasets/MYORG/mydataset
&lt;span class="c"&gt;# Add your data/, add README.md with the YAML card, push&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The dataset card YAML matters. Use &lt;code&gt;configs:&lt;/code&gt; with split paths — it unlocks the in-browser dataset viewer.&lt;/p&gt;
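
&lt;p&gt;A minimal card-frontmatter sketch (the split paths are illustrative; match your own repo layout):&lt;/p&gt;

```yaml
---
license: cc-by-4.0
language:
  - sv
  - en
configs:
  - config_name: default
    data_files:
      - split: train
        path: data/train.jsonl
---
```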

&lt;h3&gt;
  
  
  4. PyPI wrapper
&lt;/h3&gt;

&lt;p&gt;Ship a tiny Python package that just bundles the data and exposes a loader:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# pyproject.toml + one module that does:
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;importlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;resources&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__package__&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;joinpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;faq.json&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;read_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Publish with &lt;code&gt;python -m build &amp;amp;&amp;amp; twine upload dist/*&lt;/code&gt;. Users get &lt;code&gt;pip install your-dataset-name&lt;/code&gt;.&lt;/p&gt;
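
&lt;p&gt;The packaging side is equally small. A &lt;code&gt;pyproject.toml&lt;/code&gt; sketch (package and project names are placeholders); &lt;code&gt;project.urls&lt;/code&gt; doubles as the cross-link back to the canonical GitHub:&lt;/p&gt;

```toml
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "your-dataset-name"
version = "1.0.0"
description = "Data and loader for the dataset"
license = {text = "CC-BY-4.0"}

[project.urls]
Homepage = "https://github.com/zaragoza-ab/swedish-construction-faq-1000"

[tool.setuptools.package-data]
"your_dataset_name" = ["faq.json"]
```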

&lt;h3&gt;
  
  
  5. Kaggle mirror
&lt;/h3&gt;

&lt;p&gt;One-time manual upload via the web form, then the Kaggle API for updates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kaggle datasets version &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"Update to v1.0.0"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tag generously. Kaggle's search weights tags heavily.&lt;/p&gt;
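
&lt;p&gt;Updates via the API expect a &lt;code&gt;dataset-metadata.json&lt;/code&gt; next to the data. A minimal sketch (the slug is a placeholder):&lt;/p&gt;

```json
{
  "title": "Swedish Construction FAQ",
  "id": "yourusername/swedish-construction-faq",
  "licenses": [{"name": "CC-BY-4.0"}]
}
```

&lt;p&gt;The &lt;code&gt;licenses&lt;/code&gt; name must be one of Kaggle's accepted identifiers; check the dataset metadata docs if the upload is rejected.&lt;/p&gt;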

&lt;h3&gt;
  
  
  6. Wikidata
&lt;/h3&gt;

&lt;p&gt;This is where most datasets stop — and why their metadata never gets into LLM training pipelines. The MediaWiki API is a bit arcane but the full flow is ~30 lines of Node. I wrote the how-to &lt;a href="https://dev.to/c_d_084d360f424581c9995/how-i-put-my-open-dataset-on-the-wikidata-knowledge-graph-and-why-you-should-too-3f9i"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Create at minimum:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The dataset entity (&lt;code&gt;P31&lt;/code&gt; → &lt;code&gt;Q1172284&lt;/code&gt; "dataset")&lt;/li&gt;
&lt;li&gt;The publisher entity (your company / yourself)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;P123&lt;/code&gt; (publisher) + &lt;code&gt;P170&lt;/code&gt; (creator) links between them&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;P275&lt;/code&gt; (license), &lt;code&gt;P356&lt;/code&gt; (DOI), &lt;code&gt;P407&lt;/code&gt; (language), &lt;code&gt;P856&lt;/code&gt; (official website)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  7. GitHub Pages
&lt;/h3&gt;

&lt;p&gt;A one-page static landing site helps human discoverability and SEO. Enable Pages from the repo's &lt;code&gt;main&lt;/code&gt; branch in the repository settings.&lt;/p&gt;

&lt;h3&gt;
  
  
  8. Colab notebook
&lt;/h3&gt;

&lt;p&gt;Drop a &lt;code&gt;notebooks/quickstart.ipynb&lt;/code&gt; that loads the raw JSON from GitHub and does three example queries. Add the Colab badge to README:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;![Open In Colab&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;https://colab.research.google.com/assets/colab-badge.svg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;](https://colab.research.google.com/github/ORG/REPO/blob/main/notebooks/quickstart.ipynb)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;People click it. It dramatically lowers the "try it" barrier.&lt;/p&gt;

&lt;h3&gt;
  
  
  9. Hugging Face Space (Static)
&lt;/h3&gt;

&lt;p&gt;A static Space is HTML + JS + CSS, no backend. Three files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;index.html&lt;/code&gt; — the UI&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;app.js&lt;/code&gt; — fetch dataset from GitHub raw, render&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;README.md&lt;/code&gt; with &lt;code&gt;sdk: static&lt;/code&gt; in the YAML frontmatter&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cost: zero. Reach: Spaces are indexed, browsable, and can trend. Ours is at &lt;code&gt;DecDEPO/swedish-construction-faq-search&lt;/code&gt;.&lt;/p&gt;
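
&lt;p&gt;The core of &lt;code&gt;app.js&lt;/code&gt; is small. A sketch (the raw URL and element ids are placeholders), assuming the dataset is an array of &lt;code&gt;question&lt;/code&gt;/&lt;code&gt;answer&lt;/code&gt; objects:&lt;/p&gt;

```javascript
// app.js - client-side search over the raw dataset, no backend.
// RAW_URL is a placeholder; point it at your repo's raw JSON.
const RAW_URL = 'https://raw.githubusercontent.com/ORG/REPO/main/data/faq.json';

// Case-insensitive substring match over question and answer text.
function search(pairs, term) {
  const t = term.toLowerCase();
  return pairs.filter(p =>
    p.question.toLowerCase().includes(t) || p.answer.toLowerCase().includes(t));
}

// Plain-text rendering keeps this testable outside a browser.
function renderText(pairs) {
  return pairs.map(p => 'Q: ' + p.question + '\nA: ' + p.answer).join('\n\n');
}

// Browser wiring, guarded so the file also loads under Node.
if (typeof document !== 'undefined') {
  fetch(RAW_URL)
    .then(res => res.json())
    .then(pairs => {
      const input = document.getElementById('q');
      const out = document.getElementById('results');
      const update = () => { out.textContent = renderText(search(pairs, input.value)); };
      input.addEventListener('input', update);
      update();
    });
}
```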

&lt;h3&gt;
  
  
  10. Awesome-list PRs
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;awesome-X&lt;/code&gt; repos are still the organic-discovery hack of 2026. Find the relevant ones:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gh search repos &lt;span class="s2"&gt;"awesome-YOUR-TOPIC"&lt;/span&gt; &lt;span class="nt"&gt;--json&lt;/span&gt; fullName,stargazersCount &lt;span class="nt"&gt;--limit&lt;/span&gt; 50
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fork, add your entry under the matching section, open a PR. Keep the line format aligned with existing entries — maintainers reject inconsistent PRs fast.&lt;/p&gt;

&lt;h3&gt;
  
  
  11. Dev.to + Mastodon + HN
&lt;/h3&gt;

&lt;p&gt;Three short announcements, roughly the same copy at different lengths:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mastodon:&lt;/strong&gt; 500 chars, one link, 3-5 hashtags. Warning: em-dash (—) triggers HTTP 400 on the API. Use hyphens.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HN:&lt;/strong&gt; &lt;code&gt;Show HN: &amp;lt;name&amp;gt; — &amp;lt;one-line&amp;gt;&lt;/code&gt;. One submission, from an account that already has some karma; don't spam.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dev.to:&lt;/strong&gt; a proper blog post with frontmatter. The API accepts the whole markdown file in one POST. Use &lt;code&gt;canonical_url&lt;/code&gt; if the same content lives elsewhere.&lt;/li&gt;
&lt;/ul&gt;
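
&lt;p&gt;For Dev.to, the single POST can be sketched like this, against the documented &lt;code&gt;POST /api/articles&lt;/code&gt; endpoint (the markdown content is whatever file you wrote; stdlib only):&lt;/p&gt;

```python
# Publish a markdown file (frontmatter included) to Dev.to in one request.
import json
import urllib.request

def build_payload(markdown: str) -> bytes:
    # The API takes the whole markdown file, frontmatter and all, in body_markdown.
    return json.dumps({"article": {"body_markdown": markdown}}).encode("utf-8")

def post_article(markdown: str, api_key: str):
    req = urllib.request.Request(
        "https://dev.to/api/articles",
        data=build_payload(markdown),
        headers={"api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    return urllib.request.urlopen(req)  # network call; needs a real API key
```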

&lt;h3&gt;
  
  
  12. LinkedIn / Medium
&lt;/h3&gt;

&lt;p&gt;Last, because they're manual. Write once for LinkedIn (short), once for Medium (long-form tutorial). Crosslink both back to the canonical GitHub.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons from this particular rollout
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The DOI is the glue.&lt;/strong&gt; Once the DOI exists, every platform accepts the same metadata block. Mint it first, copy-paste everywhere.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wikidata moves the needle more than expected.&lt;/strong&gt; Google, Siri, Perplexity, and LLM training pipelines all read from Wikidata. A single QID gets your dataset into places you can't even list.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write a tutorial article mid-rollout.&lt;/strong&gt; We wrote the Wikidata how-to &lt;em&gt;between&lt;/em&gt; rolling out to other platforms. It became its own distribution channel (and ranked for "wikidata open dataset tutorial" in under 24 hours).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Static Spaces &amp;gt; Gradio Spaces for demos.&lt;/strong&gt; No build step, no dependencies, no cold start. Instant load.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don't forget &lt;code&gt;CITATION.cff&lt;/code&gt; for companion datasets.&lt;/strong&gt; We did it for the flagship only. Had to go back and enrich four companions a day later. Do them all at once.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The scorecard
&lt;/h2&gt;

&lt;p&gt;From our own rollout:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;17 GitHub repos, all topic-tagged&lt;/li&gt;
&lt;li&gt;1 flagship + 5 companion datasets on Hugging Face&lt;/li&gt;
&lt;li&gt;6 Wikidata entities in a connected graph&lt;/li&gt;
&lt;li&gt;1 DOI, auto-updated on each release&lt;/li&gt;
&lt;li&gt;1 Colab notebook, 1 Static Space demo&lt;/li&gt;
&lt;li&gt;8 awesome-list PRs open&lt;/li&gt;
&lt;li&gt;4 Mastodon toots, 3 Dev.to posts, 1 HN submission, 1 LinkedIn post&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of it in 36 hours, mostly automated. Not a single paid channel.&lt;/p&gt;

&lt;h2&gt;
  
  
  The one step everyone skips
&lt;/h2&gt;

&lt;p&gt;Cross-linking. After you've shipped to 12 platforms, go back and add the other 11 links to each one's metadata. Wikidata's &lt;code&gt;P856&lt;/code&gt;, HF's &lt;code&gt;homepage:&lt;/code&gt;, PyPI's &lt;code&gt;project_urls&lt;/code&gt;, Kaggle's description. This is what makes the graph a graph.&lt;/p&gt;

&lt;p&gt;That's it. One day, twelve platforms, zero spend. Go ship.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Swedish Construction FAQ&lt;/strong&gt; is maintained by &lt;a href="https://zaragoza.se" rel="noopener noreferrer"&gt;Zaragoza AB&lt;/a&gt;, Helsingborg. All datasets CC BY 4.0. Try the live demo: &lt;a href="https://huggingface.co/spaces/DecDEPO/swedish-construction-faq-search" rel="noopener noreferrer"&gt;Space&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>opendata</category>
      <category>devops</category>
      <category>productivity</category>
    </item>
    <item>
      <title>How I put my open dataset on the Wikidata knowledge graph (and why you should too)</title>
      <dc:creator>DecDEPO</dc:creator>
      <pubDate>Fri, 17 Apr 2026 15:48:30 +0000</pubDate>
      <link>https://dev.to/c_d_084d360f424581c9995/how-i-put-my-open-dataset-on-the-wikidata-knowledge-graph-and-why-you-should-too-3f9i</link>
      <guid>https://dev.to/c_d_084d360f424581c9995/how-i-put-my-open-dataset-on-the-wikidata-knowledge-graph-and-why-you-should-too-3f9i</guid>
      <description>&lt;p&gt;Last week we released the &lt;strong&gt;&lt;a href="https://github.com/zaragoza-ab/swedish-construction-faq-1000" rel="noopener noreferrer"&gt;Swedish Construction FAQ&lt;/a&gt;&lt;/strong&gt; — 503 bilingual Q&amp;amp;A pairs, CC BY 4.0, DOI &lt;a href="https://doi.org/10.5281/zenodo.19630803" rel="noopener noreferrer"&gt;10.5281/zenodo.19630803&lt;/a&gt;. The dataset got onto Hugging Face, Zenodo, Kaggle, PyPI — the usual distribution stack.&lt;/p&gt;

&lt;p&gt;But one step moved the needle more than the others: &lt;strong&gt;putting it on Wikidata as a first-class entity&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This post is a practical walkthrough. No philosophy, just what we did and why.&lt;/p&gt;

&lt;h2&gt;
  
  
  What a Wikidata entity gets you
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;A stable identifier&lt;/strong&gt; — &lt;code&gt;Q139393633&lt;/code&gt; is the QID for our dataset. It'll outlive our GitHub account, our domain, our company.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Machine-readable citability&lt;/strong&gt; — Wikidata is the knowledge graph that Google, Siri, Alexa, OpenAI, Anthropic, Perplexity, and basically every LLM training pipeline reads from.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A place for the DOI and license to live together&lt;/strong&gt; — P275 (license) + P356 (DOI) + P407 (language) + P1476 (title) as proper RDF triples.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Free cross-linking&lt;/strong&gt; — every related entity (the company, the companion datasets, the subject matter) shows up in the &lt;code&gt;Special:WhatLinksHere&lt;/code&gt; graph.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It costs nothing&lt;/strong&gt; — no API key, no app registration, no fees. Just a SUL account.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The entities we created
&lt;/h2&gt;

&lt;p&gt;For the FAQ dataset alone, we ended up with six connected entities:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;QID&lt;/th&gt;
&lt;th&gt;Entity&lt;/th&gt;
&lt;th&gt;What it is&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Q139393633&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Swedish Construction FAQ&lt;/td&gt;
&lt;td&gt;The flagship dataset&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Q139393658&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Zaragoza AB&lt;/td&gt;
&lt;td&gt;The publisher/creator (our company)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Q139393817&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Construction Terminology Glossary&lt;/td&gt;
&lt;td&gt;Companion trilingual glossary&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Q139393818&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Building Materials Specifications&lt;/td&gt;
&lt;td&gt;Companion spec dataset&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Q139393819&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Construction Inspection Templates&lt;/td&gt;
&lt;td&gt;Companion template dataset&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Q139393821&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Renovation Timeline Reference&lt;/td&gt;
&lt;td&gt;Companion timeline dataset&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each companion dataset has &lt;code&gt;P123&lt;/code&gt; (publisher) and &lt;code&gt;P170&lt;/code&gt; (creator) pointing to &lt;code&gt;Q139393658&lt;/code&gt;, which means the company's entity automatically aggregates everything we publish.&lt;/p&gt;
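
&lt;p&gt;That aggregation is directly queryable. A sketch against the public Wikidata Query Service (stdlib only; the User-Agent string is a placeholder you should set to identify your script):&lt;/p&gt;

```python
# List everything a publisher entity has published, via the Wikidata Query Service.
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

def build_query(publisher_qid: str) -> str:
    # P123 = publisher; returns each item plus its English label.
    return (
        "SELECT ?item ?itemLabel WHERE { "
        f"?item wdt:P123 wd:{publisher_qid} . "
        'SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } '
        "}"
    )

def run_query(publisher_qid: str):
    params = urllib.parse.urlencode({"format": "json", "query": build_query(publisher_qid)})
    req = urllib.request.Request(
        ENDPOINT + "?" + params,
        headers={"User-Agent": "dataset-rollout-script/1.0"},  # identify yourself
    )
    return urllib.request.urlopen(req)  # network call
```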

&lt;h2&gt;
  
  
  The minimum viable dataset entity
&lt;/h2&gt;

&lt;p&gt;Here's the shape we used — it's the minimum that actually gets you into the graph in a way that indexers respect.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"labels"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"en"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"language"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"en"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Swedish Construction FAQ"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"sv"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"language"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sv"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Svensk byggbransch FAQ"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"descriptions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"en"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"language"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"en"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"open bilingual Q&amp;amp;A dataset on Swedish construction"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"claims"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"property"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"P31"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Q1172284"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;instance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;dataset&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"property"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"P275"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Q20007257"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;license&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;CC&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;BY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;4.0&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"property"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"P356"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"10.5281/zenodo.19630803"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;DOI&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"property"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"P407"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Q9027"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;language&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Swedish&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"property"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"P407"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Q1860"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;language&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;English&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"property"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"P123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Q139393658"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;publisher&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Zaragoza&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;AB&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"property"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"P170"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Q139393658"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;creator&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Zaragoza&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;AB&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"property"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"P856"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"github.com/..."&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;official&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;website&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"property"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"P1324"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"github.com/..."&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;source&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;repo&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"property"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"P577"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"+2026-04-17"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;publication&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;date&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Calling the MediaWiki API from Node
&lt;/h2&gt;

&lt;p&gt;Authenticating takes three requests: fetch a login token, log in with it, then fetch a CSRF token. After that you can &lt;code&gt;wbeditentity&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;API&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://www.wikidata.org/w/api.php&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;jar&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cookie&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="nx"&gt;jar&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(([&lt;/span&gt;&lt;span class="nx"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;k&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;; &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;method&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;URLSearchParams&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;opts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;method&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;User-Agent&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;your-app/1.0&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Cookie&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;cookie&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;method&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/x-www-form-urlencoded&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nx"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;method&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;GET&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;API&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;?&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;API&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;// Parse cookies into the jar so subsequent calls are authenticated&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;set-cookie&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/,&lt;/span&gt;&lt;span class="se"&gt;(?=[^&lt;/span&gt;&lt;span class="sr"&gt;;&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;+=&lt;/span&gt;&lt;span class="se"&gt;)&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;kv&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;;&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;kv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;=&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;k&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;jar&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;k&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// 1. Login token&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;lt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;query&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;tokens&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;login&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;GET&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// 2. Login (SUL credentials work — no bot password required on Wikidata)&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;login&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;lgname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;WIKI_USER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;lgpassword&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;WIKI_PASS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;lgtoken&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;lt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;logintoken&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// 3. CSRF token for the edit&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ct&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;query&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;tokens&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;csrf&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;GET&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// 4. Create the entity&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;wbeditentity&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;new&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;item&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;entity&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;token&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;csrftoken&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Create entity for Swedish Construction FAQ (CC BY 4.0)&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;entity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// → "Q139393633"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the whole thing. About 30 lines of code, one HTTP session, and your dataset has a permanent identity in the global knowledge graph.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gotchas we hit
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Abuse filter 11 ("badwords")&lt;/strong&gt; will flag entities with the word "Awesome" in the label; we had to reword one repo's entity description. Whichever word trips it, a filter warning means your write fails with &lt;code&gt;abusefilter-warning&lt;/code&gt; and you have to retry with different wording.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CAPTCHAs appear on newer accounts&lt;/strong&gt; when you add external links. On &lt;code&gt;sv.wikipedia.org&lt;/code&gt;, we had to strip URLs from our first edit. On &lt;code&gt;wikidata.org&lt;/code&gt;, none of our edits hit a CAPTCHA — the Wikibase API is more permissive here.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SUL (Single User Login) isn't automatic across all projects.&lt;/strong&gt; Our account worked on &lt;code&gt;wikidata.org&lt;/code&gt; from day one but not on &lt;code&gt;sv.wikipedia.org&lt;/code&gt; (login failed with "wrongpassword"). If you want to edit multiple projects, expect one manual login per project to "attach" your SUL.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rate limits are generous.&lt;/strong&gt; We created six entities in under five minutes with a 2-second sleep between them. No throttling.&lt;/li&gt;
&lt;/ul&gt;
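&lt;p&gt;The abuse-filter and CAPTCHA gotchas are both recoverable if you inspect the API response before giving up. A minimal sketch of that decision logic, in Python for brevity (the response shapes follow MediaWiki's documented JSON error format; the helper name and return values are ours):&lt;/p&gt;

```python
def classify_write(resp):
    """Decide what to do with a wbeditentity response.

    Returns one of: "ok", "retry-reword", "fail".
    """
    if "error" not in resp:
        return "ok"
    code = resp["error"].get("code", "")
    # Abuse filters answer with a warning code; a reworded retry usually passes.
    if code.startswith("abusefilter"):
        return "retry-reword"
    # CAPTCHA-gated edits surface as a captcha-flavored code; dropping URLs helped us.
    if "captcha" in code:
        return "retry-reword"
    return "fail"

print(classify_write({"entity": {"id": "Q1"}}))                    # ok
print(classify_write({"error": {"code": "abusefilter-warning"}}))  # retry-reword
```

&lt;p&gt;Wire this into a loop that rewords the label or strips URLs on "retry-reword" and you rarely need a human in the loop.&lt;/p&gt;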

&lt;h2&gt;
  
  
  Connecting the graph
&lt;/h2&gt;

&lt;p&gt;Once the entities exist, the real value comes from linking them. We made the dataset a "work by" the company entity:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Q139393633&lt;/code&gt; (dataset) — has &lt;code&gt;P123&lt;/code&gt; and &lt;code&gt;P170&lt;/code&gt; → &lt;code&gt;Q139393658&lt;/code&gt; (company)&lt;/li&gt;
&lt;li&gt;All companion datasets do the same&lt;/li&gt;
&lt;li&gt;The company entity itself has &lt;code&gt;P31&lt;/code&gt; (aktiebolag, a Swedish limited company), &lt;code&gt;P17&lt;/code&gt; (Sweden), &lt;code&gt;P159&lt;/code&gt; (Helsingborg), &lt;code&gt;P452&lt;/code&gt; (construction industry)&lt;/li&gt;
&lt;/ul&gt;
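&lt;p&gt;In &lt;code&gt;wbeditentity&lt;/code&gt; terms, each of those links is just another claim in the entity payload. A sketch of a helper for building them, in Python for brevity (the snak and datavalue shapes follow the Wikibase JSON data model; the helper name is ours):&lt;/p&gt;

```python
def item_claim(prop, qid):
    """Build one Wikibase claim linking property `prop` to another item."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": prop,
            "datavalue": {
                "type": "wikibase-entityid",
                "value": {"entity-type": "item", "id": qid},
            },
        },
        "type": "statement",
        "rank": "normal",
    }

# Dataset entity: publisher (P123) and creator (P170) both point at the company.
claims = [item_claim("P123", "Q139393658"),
          item_claim("P170", "Q139393658")]
```

&lt;p&gt;These go under the &lt;code&gt;claims&lt;/code&gt; key of the entity JSON you pass as &lt;code&gt;data&lt;/code&gt;.&lt;/p&gt;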

&lt;p&gt;Run a &lt;code&gt;SPARQL&lt;/code&gt; query on &lt;code&gt;query.wikidata.org&lt;/code&gt; and the whole graph falls out:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sparql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;?dataset&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;?datasetLabel&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nv"&gt;?dataset&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;wdt&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="ss"&gt;P123&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;wd&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="ss"&gt;Q139393658&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nv"&gt;?dataset&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;wdt&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="ss"&gt;P31&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;wd&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="ss"&gt;Q1172284&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="k"&gt;SERVICE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;wikibase&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="ss"&gt;label&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;bd&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="ss"&gt;serviceParam&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;wikibase&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="ss"&gt;language&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"en,sv"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's how external tools find everything we've published — one query.&lt;/p&gt;

&lt;h2&gt;
  
  
  What changed after we did this
&lt;/h2&gt;

&lt;p&gt;Small, obvious things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Hugging Face dataset card now links to six Wikidata QIDs — readers click through.&lt;/li&gt;
&lt;li&gt;Our Zenodo record has &lt;code&gt;related_identifiers&lt;/code&gt; pointing at the Wikidata URI — so when someone cites the DOI, the graph picks up the citation.&lt;/li&gt;
&lt;li&gt;Wikidata's own "What links here" page now serves as a free backlink-monitoring tool for us.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of this is rocket science. But it's the one layer most open datasets skip.&lt;/p&gt;

&lt;h2&gt;
  
  
  Go do it
&lt;/h2&gt;

&lt;p&gt;Your dataset already has a DOI, probably a GitHub repo, probably a Hugging Face mirror. Putting it on Wikidata takes ~30 lines of Node and one afternoon.&lt;/p&gt;

&lt;p&gt;The entity you create today will outlive your current URLs. That's worth the afternoon.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Swedish Construction FAQ&lt;/strong&gt; is maintained by &lt;a href="https://zaragoza.se" rel="noopener noreferrer"&gt;Zaragoza AB&lt;/a&gt;, Helsingborg. All datasets CC BY 4.0. Explore the graph: &lt;a href="https://www.wikidata.org/wiki/Q139393633" rel="noopener noreferrer"&gt;Q139393633&lt;/a&gt; · &lt;a href="https://www.wikidata.org/wiki/Q139393658" rel="noopener noreferrer"&gt;Q139393658&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>opendata</category>
      <category>nlp</category>
      <category>python</category>
    </item>
    <item>
      <title>Building an Open Bilingual Q&amp;A Dataset for Swedish Construction Law (503 entries, CC BY 4.0)</title>
      <dc:creator>DecDEPO</dc:creator>
      <pubDate>Fri, 17 Apr 2026 13:31:37 +0000</pubDate>
      <link>https://dev.to/c_d_084d360f424581c9995/building-an-open-bilingual-qa-dataset-for-swedish-construction-law-503-entries-cc-by-40-1c51</link>
      <guid>https://dev.to/c_d_084d360f424581c9995/building-an-open-bilingual-qa-dataset-for-swedish-construction-law-503-entries-cc-by-40-1c51</guid>
      <description>&lt;p&gt;I spent the last few weeks building something that felt missing in the Swedish AI ecosystem: &lt;strong&gt;an open, bilingual, legally-grounded Q&amp;amp;A dataset for the construction industry.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Just released v1.2.1 with 503 question-answer pairs across 39 categories, in both Swedish and English, under CC BY 4.0. Here's what I learned building it, how it's structured, and how to use it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;Swedish construction law (PBL, BBR, ABS 18, AB 04) is dense, fragmented, and lives in PDFs scattered across municipal websites, Boverket, Skatteverket, and court archives.&lt;/p&gt;

&lt;p&gt;If you've ever tried to answer "do I need a bygglov for this renovation?" you know the pain — three websites, two PDFs, one Skatteverket hotline, and maybe an answer.&lt;/p&gt;

&lt;p&gt;This is exactly the kind of problem LLMs can help with — &lt;strong&gt;but only if there's grounded training data&lt;/strong&gt;. Most open multilingual datasets barely include Swedish at all, and when they do, construction/legal Swedish is a rounding error.&lt;/p&gt;

&lt;p&gt;So I built the data.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's in the dataset
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;503 Q&amp;amp;A pairs × 2 languages = 1,006 entries&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;39 categories covering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Permits:&lt;/strong&gt; bygglov, attefallshus, tillbyggnad, marklov&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Taxes:&lt;/strong&gt; ROT-avdrag, RUT-avdrag, F-skatt, omvänd moms, personalliggare&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trades:&lt;/strong&gt; takläggning, fasadrenovering, köksrenovering, badrumsrenovering, isolering, VVS, elinstallation, ventilation, värmesystem&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Legal:&lt;/strong&gt; dolda fel, verifiera byggfirma, ABS18/AB04/ABT06 contracts, arbetsmiljö (AFS)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Regulation:&lt;/strong&gt; BBR, PBL, Miljöbalken, Energideklaration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Costs &amp;amp; disputes:&lt;/strong&gt; kostnader, offerter, ARN, dispute resolution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;30–150 words&lt;/li&gt;
&lt;li&gt;Grounded in a specific Swedish statute or authority guidance&lt;/li&gt;
&lt;li&gt;Cites the source (PBL § 9:2, BBR 6:5321, Skatteverket handledning)&lt;/li&gt;
&lt;li&gt;Hand-reviewed for factual accuracy&lt;/li&gt;
&lt;/ul&gt;
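&lt;p&gt;Those constraints are easy to enforce mechanically before hand review. A rough validator along these lines catches most drift (field names mirror our schema; the citation regex is deliberately loose and illustrative):&lt;/p&gt;

```python
import re

# Matches the citation styles used in answers: "PBL 9 kap. 2 §", "BBR 6:5321",
# or an authority name. Deliberately loose; tighten for your own sources.
CITATION = re.compile(r"(PBL|BBR|ABS|AB ?04|ABT|Skatteverket|Boverket|Miljöbalken)")

def check_entry(entry):
    """Return a list of problems for one Q-and-A record (empty list = passes)."""
    problems = []
    n_words = len(entry["answer"].split())
    if n_words not in range(30, 151):
        problems.append(f"answer is {n_words} words, want 30-150")
    if not CITATION.search(entry["answer"]):
        problems.append("no inline citation found")
    return problems
```

&lt;p&gt;Run it over the whole file in CI; anything that returns a non-empty list goes back to review.&lt;/p&gt;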

&lt;h2&gt;
  
  
  Design choices (and mistakes I learned from)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Don't translate legal terms
&lt;/h3&gt;

&lt;p&gt;Early attempt: I translated "bygglov" to "building permit" everywhere in the English set. &lt;strong&gt;Bad idea.&lt;/strong&gt; A Swede reading the English set wants to see &lt;code&gt;bygglov (building permit)&lt;/code&gt; so they can map to the original. And an English-speaking researcher working on Swedish legal NLP wants the Swedish term preserved.&lt;/p&gt;

&lt;p&gt;Rule I landed on: &lt;strong&gt;keep Swedish legal terminology in the English set, with English gloss in parentheses the first time it appears&lt;/strong&gt;.&lt;/p&gt;
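&lt;p&gt;The gloss-on-first-use rule is mechanical enough to script. A sketch (the two-entry glossary here is illustrative, not our full term list):&lt;/p&gt;

```python
GLOSS = {"bygglov": "building permit", "attefallshus": "Attefall house"}

def gloss_first_use(text, glossary=GLOSS):
    """Append an English gloss in parentheses after the first use of each Swedish term."""
    done = set()
    words = []
    for w in text.split():
        bare = w.strip(".,;:").lower()
        if bare in glossary and bare not in done:
            words.append(f"{w} ({glossary[bare]})")
            done.add(bare)
        else:
            words.append(w)
    return " ".join(words)

print(gloss_first_use("You need a bygglov before extending; a bygglov has a fee."))
# prints: You need a bygglov (building permit) before extending; a bygglov has a fee.
```

&lt;p&gt;Later occurrences stay untouched, so the English set reads naturally while keeping the Swedish term searchable.&lt;/p&gt;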

&lt;h3&gt;
  
  
  2. Cite sources inside the answer, not just a metadata field
&lt;/h3&gt;

&lt;p&gt;The initial structure had a separate &lt;code&gt;sources: [...]&lt;/code&gt; array. That worked for human readers, but a model fine-tuned on it doesn't reliably learn to carry the citation into its output.&lt;/p&gt;

&lt;p&gt;Now: citations appear &lt;strong&gt;inline in the answer text&lt;/strong&gt; (&lt;code&gt;"Enligt PBL 9 kap. 2 §..."&lt;/code&gt;) AND in the metadata field. The model learns to cite, not just to answer.&lt;/p&gt;
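&lt;p&gt;If you already have the separate sources array, folding it into the answer text is one pass over the records. A sketch (field names mirror our schema; the "Enligt ..." phrasing is just one way to do it):&lt;/p&gt;

```python
def inline_citation(record):
    """Prefix the answer with its first source so the citation lives in the text too."""
    src = record.get("sources", [])
    if src and src[0] not in record["answer"]:
        record = dict(record, answer=f"Enligt {src[0]}: {record['answer']}")
    return record

r = inline_citation({"answer": "Bygglov kravs.", "sources": ["PBL 9 kap. 2 §"]})
print(r["answer"])  # prints: Enligt PBL 9 kap. 2 §: Bygglov kravs.
```

&lt;p&gt;Answers that already cite the source inline pass through unchanged, so the pass is idempotent.&lt;/p&gt;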

&lt;h3&gt;
  
  
  3. 30–150 words per answer
&lt;/h3&gt;

&lt;p&gt;We tested shorter (~15 words) and longer (~500 words) answers. Shorter loses grounding; longer drifts. 30–150 words is the sweet spot for factual legal Q&amp;amp;A.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Multi-format release
&lt;/h3&gt;

&lt;p&gt;Shipped in 5 formats:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;faq.json&lt;/code&gt; — master with metadata&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;faq.jsonl&lt;/code&gt; — HuggingFace-native (one record per line)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;faq-alpaca.jsonl&lt;/code&gt; — Alpaca instruction format&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;faq-sharegpt.jsonl&lt;/code&gt; — ShareGPT conversation format&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;faq.csv&lt;/code&gt; — for non-ML users (Excel / Google Sheets)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Same data, 5 pipelines, zero conversion friction.&lt;/p&gt;
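&lt;p&gt;Keeping five formats in sync is only sane if every one of them is generated from the master file. A sketch of the two fine-tuning converters (record fields mirror our schema; your keys may differ):&lt;/p&gt;

```python
def to_alpaca(rec):
    """Master record to Alpaca instruction format."""
    return {"instruction": rec["question"], "input": "", "output": rec["answer"]}

def to_sharegpt(rec):
    """Master record to a ShareGPT-style two-turn conversation."""
    return {"conversations": [
        {"role": "user", "content": rec["question"]},
        {"role": "assistant", "content": rec["answer"]},
    ]}

rec = {"question": "Behover jag bygglov?", "answer": "Oftast, enligt PBL 9 kap."}
print(to_alpaca(rec)["output"])  # prints: Oftast, enligt PBL 9 kap.
```

&lt;p&gt;The JSONL files are then just these dicts dumped one per line; nothing is edited by hand downstream of the master.&lt;/p&gt;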

&lt;h2&gt;
  
  
  How to use it
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Via HuggingFace datasets:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datasets&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dataset&lt;/span&gt;
&lt;span class="n"&gt;ds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DecDEPO/swedish-construction-faq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Swedish: ds["train"] (503 rows)
# English: load_dataset(..., "english")
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Via pip:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;zaragoza-construction-faq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;zaragoza_construction_faq&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;zcf&lt;/span&gt;

&lt;span class="n"&gt;zcf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;                    &lt;span class="c1"&gt;# 503 SV Q&amp;amp;A as list of dicts
&lt;/span&gt;&lt;span class="n"&gt;zcf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lang&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;en&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;           &lt;span class="c1"&gt;# 503 EN
&lt;/span&gt;&lt;span class="n"&gt;zcf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bygglov&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;           &lt;span class="c1"&gt;# filter by category
&lt;/span&gt;&lt;span class="n"&gt;zcf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;categories&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;              &lt;span class="c1"&gt;# all 39 categories
&lt;/span&gt;
&lt;span class="c1"&gt;# Iterators for LLM fine-tuning
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;rec&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;zcf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;iter_alpaca&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# rec = {"instruction": "...", "output": "..."}
&lt;/span&gt;    &lt;span class="nf"&gt;train&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rec&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;rec&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;zcf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;iter_sharegpt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lang&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;en&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# rec = {"conversations": [{"role": "user", ...}, {"role": "assistant", ...}]}
&lt;/span&gt;    &lt;span class="nf"&gt;train&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rec&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Via Kaggle / CSV:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.kaggle.com/datasets/decdepo/swedish-construction-faq" rel="noopener noreferrer"&gt;Kaggle dataset page&lt;/a&gt; — download as zip, drop into your notebook.&lt;/p&gt;

&lt;h2&gt;
  
  
  Academic citation (DOI assigned)
&lt;/h2&gt;

&lt;p&gt;Zenodo assigned a permanent DOI, so it's citable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight bibtex"&gt;&lt;code&gt;&lt;span class="nc"&gt;@dataset&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;zaragoza_swedish_construction_faq_2026&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;author&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;{{Zaragoza AB}}&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;title&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;{Swedish Construction FAQ — Open Q\&amp;amp;A Dataset (SV + EN)}&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;year&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;{2026}&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;publisher&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;{Zenodo}&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;doi&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;{10.5281/zenodo.19630803}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;DOI: &lt;a href="https://doi.org/10.5281/zenodo.19630803" rel="noopener noreferrer"&gt;10.5281/zenodo.19630803&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  License
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;CC BY 4.0&lt;/strong&gt; — free for commercial and research use, attribution required.&lt;/p&gt;

&lt;p&gt;I intentionally chose BY over BY-SA because I want this usable in commercial products (fine-tuning a chatbot for a construction firm, building a RAG system for a municipal permit office, whatever) with no copyleft friction.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where it lives
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;🐙 GitHub: &lt;a href="https://github.com/zaragoza-ab/swedish-construction-faq-1000" rel="noopener noreferrer"&gt;zaragoza-ab/swedish-construction-faq-1000&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🤗 HuggingFace: &lt;a href="https://huggingface.co/datasets/DecDEPO/swedish-construction-faq" rel="noopener noreferrer"&gt;DecDEPO/swedish-construction-faq&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📦 PyPI: &lt;a href="https://pypi.org/project/zaragoza-construction-faq/" rel="noopener noreferrer"&gt;zaragoza-construction-faq&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📊 Kaggle: &lt;a href="https://www.kaggle.com/datasets/decdepo/swedish-construction-faq" rel="noopener noreferrer"&gt;decdepo/swedish-construction-faq&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📜 Zenodo: &lt;a href="https://doi.org/10.5281/zenodo.19630803" rel="noopener noreferrer"&gt;DOI 10.5281/zenodo.19630803&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;Target is &lt;strong&gt;1000+ Q&amp;amp;As&lt;/strong&gt; (v2.0). The areas most underrepresented right now:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Kommun-specific rules (each of Sweden's 290 municipalities has its own bygglov process quirks)&lt;/li&gt;
&lt;li&gt;Post-2020 case law (big shifts in the doctrine of dolda fel, hidden defects)&lt;/li&gt;
&lt;li&gt;Cross-border cases (what if your contractor is Polish, Romanian, or Baltic?)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;If you're Swedish and find something outdated or wrong, open a PR or an issue.&lt;/strong&gt; I'll merge within a day.&lt;/p&gt;

&lt;p&gt;Also releasing a 510-entry &lt;strong&gt;trilingual construction glossary&lt;/strong&gt; (Swedish / English / Polish) in a sibling repo, because Polish construction workers in Sweden are a huge demographic and there's zero open terminology for them.&lt;/p&gt;




&lt;p&gt;Built this for Zaragoza AB (Helsingborg) — a small construction firm that's using the dataset internally for their customer Q&amp;amp;A chatbot. Open-sourced because Swedish AI needs more domain data and there's no business reason to keep it closed.&lt;/p&gt;

&lt;p&gt;Feedback welcome. Especially if you're working on Swedish NLP, building a Swedish legal RAG system, or just trying to renovate your kök (kitchen) and wondering if bygglov applies.&lt;/p&gt;

</description>
      <category>dataset</category>
      <category>machinelearning</category>
      <category>nlp</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
