In March 2025, Japan's Ministry of Health, Labour and Welfare (MHLW) published a structured JSON schema for Safety Data Sheet data exchange. The schema covers roughly 200 deeply nested fields and is intended to standardize how SDS information moves between chemical management systems.
Most SDS tooling was not built for this.
What makes Japan's SDS requirements different
Japan's SDS requirements come from two laws: the Industrial Safety and Health Act (ISHA, 労働安全衛生法) and the Chemical Substances Control Law (化審法). Both mandate SDS for regulated chemicals, with format requirements governed by JIS Z 7253, Japan's implementation of the UN Globally Harmonized System (GHS).
JIS Z 7253 follows the standard 16-section GHS structure. In principle, any GHS-compliant SDS satisfies the content requirements. What makes Japanese compliance distinct is a digital layer: the MHLW schema specifies how SDS content should be structured as machine-readable data, with field-level granularity that PDF documents cannot capture.
How GHS looks different by country
GHS uses a "building block" approach — each country adopts the elements it chooses. The result is that the same GHS-aligned document varies by jurisdiction:
| Country/Region | Standard | GHS basis | Notable difference |
|---|---|---|---|
| Japan | JIS Z 7253:2019 | GHS Rev. 6 | MHLW digital schema; revised to GHS Rev. 9 in Dec 2025 |
| United States | OSHA HazCom 2012 | GHS Rev. 3 | Updated to GHS Rev. 7 in 2024 |
| European Union | CLP Regulation | GHS-aligned | Stricter on environmental hazards |
| China | GB 13690-2009 | GHS Rev. 4 equivalent | Moving to GB 30000.1-2024 (GHS Rev. 8), mandatory from August 2025 |
| Taiwan | CNS 15030 | GHS-aligned | — |
Japan-specific regulatory fields
The MHLW schema includes fields with no equivalent in EU REACH or US OSHA HazCom formats. These are the main reason international SDS tooling does not cover the schema out of the box:
| Law | Example fields | What they capture |
|---|---|---|
| Chemical Substances Control Law (化審法) | CaSCL.ClassificationStatus, CaSCL.RegistrationNumber | Regulatory classification and registration numbers under this law |
| Industrial Safety and Health Act (安衛法) | ISHAct.PublicationOfName, ISHAct.Notification | Name disclosure and notification obligations |
| Poisonous and Deleterious Substances Control Law | ControlledSubstancesAct.Applicability | Whether the substance is classified as poison, deleterious, or specific poison |
| PRTR Law | — | Chemical release and transfer reporting obligations |
Section 15 (Regulatory Information) is the most complex section in the schema — it contains separate subsections for each of these laws, each with its own field structure.
Why this matters now: the 2022 law revision
The MHLW published the schema in 2025, but the driver was a 2022 amendment to the Industrial Safety and Health Act. The amendment shifted Japan's chemical substance regulation from a prescriptive model (government designates specific hazardous substances) to an autonomous management model (companies assess and manage risk themselves).
The practical impact:
| Enforcement date | Change |
|---|---|
| April 2023 | Shift to autonomous management model — all substances with confirmed GHS hazard classifications brought progressively into scope |
| April 2024 | SDS must now specify concentration ranges numerically (not just qualitatively) |
| April 2025 | Protective equipment mandatory for substances with skin/eye hazards |
| April 2027 | Risk assessment obligations expand to all regulated substances |
With risk assessment coverage expanding significantly, companies need to process SDS data faster and more accurately. Manual PDF entry does not scale. The JSON schema is the infrastructure layer for automating this.
Where existing tools stop
Commercial SDS platforms
The major SDS authoring platforms — Sphera, EcoOnline, Chemwatch, Verisk 3E — have broad international coverage. Japanese is typically a supported output language. What they do not provide, as far as I have found, is export to the MHLW JSON schema. They produce Word or PDF output in the correct section structure, which satisfies the document requirement but not the structured data exchange requirement.
Japanese-market products like SDS Meister and SmartSDS support MHLW JSON output, but their PDF-to-JSON conversion coverage is limited — they are primarily SDS authoring tools, not bulk conversion tools for incoming supplier documents.
Open-source options
| Tool | Language | MHLW JSON | PDF → JSON | Approach |
|---|---|---|---|---|
| sds_parser | Python | No | Yes | Regex, per-manufacturer rules |
| tungsten | Python | No | Yes | Rule-based, English-only |
| sds-converter | Rust | Yes | Yes | LLM-based extraction |
sds_parser and tungsten solve a different problem: extracting SDS data in English, for specific known manufacturer formats. Neither targets the MHLW schema.
The format inconsistency problem
Even within JIS Z 7253-compliant documents, format varies by manufacturer:
| Source of variation | Example |
|---|---|
| Section heading labels | "2. 危険有害性の要約" (JIS Z 7253) vs "2. Hazard(s) identification" (OSHA HazCom) vs "第2部分 危险性概述" (GB/T 16483) — all mean the same thing |
| Section order | The standard fixes the order of the 16 sections, but real-world documents do not always follow it |
| Concentration notation | "≥95%", "1〜5%", "約100%", "企業秘密" (trade secret) all need different handling |
| Language mixing | Japanese SDS documents regularly contain English chemical names and CAS numbers |
A rule-based parser must enumerate every variant. In practice, manufacturer-specific headings add another layer of variation on top of the standard differences.
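To make the concentration row concrete, here is a minimal Python sketch of what a rule-based parser has to do for just the four notations in the table. The `Concentration` type and `parse_concentration` function are hypothetical names for illustration, and treating "≥95%" as having an upper bound of 100 is an assumption rather than something the standard or the schema states.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Concentration:
    low: Optional[float]         # lower bound in percent, if stated
    high: Optional[float]        # upper bound in percent, if stated
    approximate: bool = False    # True for "約" (approximately)
    trade_secret: bool = False   # True for "企業秘密" (trade secret)


_NUM = r"(\d+(?:\.\d+)?)"


def parse_concentration(text: str) -> Concentration:
    # Normalize full-width percent signs and drop spaces before matching.
    s = text.strip().replace("％", "%").replace(" ", "")
    if "企業秘密" in s:
        return Concentration(None, None, trade_secret=True)
    # Range such as "1〜5%" (wave dash, full-width tilde, "~" or "-").
    m = re.fullmatch(_NUM + r"%?[〜～~\-]" + _NUM + r"%", s)
    if m:
        return Concentration(float(m.group(1)), float(m.group(2)))
    # Lower bound such as "≥95%"; the 100.0 upper bound is an assumption.
    m = re.fullmatch(r"(?:≥|>=)" + _NUM + r"%", s)
    if m:
        return Concentration(float(m.group(1)), 100.0)
    # Approximate value such as "約100%".
    m = re.fullmatch(r"約" + _NUM + r"%", s)
    if m:
        value = float(m.group(1))
        return Concentration(value, value, approximate=True)
    # Anything else is a variant this rule set has not enumerated.
    raise ValueError(f"unrecognized concentration notation: {text!r}")
```

Every notation that falls through to the final `ValueError` is a new rule someone has to write, which is the maintenance cost described above.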
The schema itself
Two properties of the MHLW schema are worth knowing before implementing against it.
Section 3 (composition) is the hardest part
Section 3 stores component information as a repeating array. Each component object has nested fields for chemical identity, concentration range, and hazard classification. The same data appears differently depending on whether the source document covers a pure substance, a mixture, or a trade secret formulation.
```json
{
  "Composition": {
    "CompositionAndConcentration": [
      {
        "ChemicalIdentity": {
          "CASNumber": "64-17-5",
          "ISHActNotificationNumber": "2-396"
        },
        "ConcentrationRange": {
          "ConcentrationRangeFrom": 95.0,
          "ConcentrationRangeTo": 100.0,
          "ConcentrationRangeUnit": "%"
        },
        "TradeSecretFlag": false
      }
    ]
  }
}
```
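As a rough illustration of consuming this structure, the Python sketch below walks the component array and prints one line per component. It uses only the field names from the example above. Which fields may be absent (for instance, whether a trade secret entry omits `ConcentrationRange`) is an assumption here, and `summarize_components` is a hypothetical helper, not part of any published tooling.

```python
import json
from typing import List

SAMPLE = """
{
  "Composition": {
    "CompositionAndConcentration": [
      {
        "ChemicalIdentity": {"CASNumber": "64-17-5", "ISHActNotificationNumber": "2-396"},
        "ConcentrationRange": {
          "ConcentrationRangeFrom": 95.0,
          "ConcentrationRangeTo": 100.0,
          "ConcentrationRangeUnit": "%"
        },
        "TradeSecretFlag": false
      }
    ]
  }
}
"""


def summarize_components(doc: dict) -> List[str]:
    lines = []
    for comp in doc["Composition"]["CompositionAndConcentration"]:
        cas = comp.get("ChemicalIdentity", {}).get("CASNumber", "unknown CAS")
        # Assumption: a trade secret component may carry no usable range.
        if comp.get("TradeSecretFlag"):
            lines.append(f"{cas}: trade secret")
            continue
        rng = comp.get("ConcentrationRange")
        if rng is None:
            lines.append(f"{cas}: no concentration given")
            continue
        unit = rng.get("ConcentrationRangeUnit", "")
        low = rng["ConcentrationRangeFrom"]
        high = rng["ConcentrationRangeTo"]
        lines.append(f"{cas}: {low}-{high}{unit}")
    return lines


if __name__ == "__main__":
    for line in summarize_components(json.loads(SAMPLE)):
        print(line)
```

Handling the pure substance, mixture, and trade secret cases in one loop is a design choice for brevity; a real consumer would likely validate against the schema first.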
Typos locked into v1.0
The schema contains field name errors that are now part of the specification:
```text
HumanExposureAndEmergencyMeasuress   ← trailing double-s
TestGuidline                         ← missing 'e' (not Guideline)
Desclaimer                           ← transposed letters (not Disclaimer)
gazetteNo                            ← lowercase first character
```
Correcting these would break all existing implementations, so they cannot be fixed in v1.0. An implementation that normalizes these to standard English spellings will fail schema validation.
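One way to guard against accidental normalization is a pre-validation check that flags "corrected" key names anywhere in the output. The Python sketch below is a hypothetical helper: the spec spellings are the four listed above, while the corrected spellings on the left are my guess at what a well-meaning developer or serializer would produce.

```python
from typing import List, Tuple

# Corrected spelling (assumed) -> spelling the v1.0 schema actually requires.
SPEC_SPELLING = {
    "HumanExposureAndEmergencyMeasures": "HumanExposureAndEmergencyMeasuress",
    "TestGuideline": "TestGuidline",
    "Disclaimer": "Desclaimer",
    "GazetteNo": "gazetteNo",
}


def find_normalized_keys(node, path: str = "") -> List[Tuple[str, str]]:
    """Return (path, required spelling) for every key that was 'fixed'."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            here = f"{path}.{key}" if path else key
            if key in SPEC_SPELLING:
                hits.append((here, SPEC_SPELLING[key]))
            hits.extend(find_normalized_keys(value, here))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_normalized_keys(item, f"{path}[{index}]"))
    return hits


if __name__ == "__main__":
    # "Tests" is a made-up container key used only for this demonstration.
    doc = {"Desclaimer": "...", "Tests": [{"TestGuideline": "OECD 401"}]}
    for where, required in find_normalized_keys(doc):
        print(f"{where}: schema requires {required!r}")
```

Running a check like this before schema validation turns an opaque validation failure into a message that names the offending key.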
sds-converter
I built sds-converter to address the MHLW schema gap. It handles both directions: PDF/DOCX/XLSX to MHLW JSON, and MHLW JSON to a JIS Z 7253-compliant Word document.
The core approach: rather than enumerating format variants with rules, the tool passes raw section text and the corresponding MHLW schema fields to an LLM and asks it to map values. The LLM handles heading label variation naturally. The output is validated against the schema before writing.
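The shape of that loop can be sketched in a few lines of Python. This is an illustration of the idea only, not sds-converter's code (the tool itself is written in Rust): `build_prompt` and `map_section` are hypothetical names, the LLM is an injected callable so no particular provider API is assumed, and the unknown-field check is a stand-in for real validation against the MHLW schema.

```python
import json
from typing import Callable, Dict, List


def build_prompt(section_text: str, field_names: List[str]) -> str:
    # Give the model the raw section text plus the fields it may fill.
    fields = "\n".join(f"- {name}" for name in field_names)
    return (
        "Map the SDS section text to the schema fields below.\n"
        "Reply with one JSON object; use null for values not present.\n\n"
        f"Fields:\n{fields}\n\nSection text:\n{section_text}"
    )


def map_section(
    section_text: str,
    field_names: List[str],
    llm: Callable[[str], str],
) -> Dict[str, object]:
    reply = llm(build_prompt(section_text, field_names))
    data = json.loads(reply)  # raises ValueError if the reply is not JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    # Stand-in validation: reject any field the schema does not define.
    unknown = set(data) - set(field_names)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return data


if __name__ == "__main__":
    # A canned reply keeps the sketch runnable without a model.
    fake_llm = lambda prompt: '{"CASNumber": "64-17-5"}'
    print(map_section("エタノール CAS 64-17-5", ["CASNumber"], fake_llm))
```

Validating before writing matters because a model can return well-formed JSON that still uses field names the schema does not contain.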
```sh
cargo install sds-converter

# PDF → MHLW JSON
sds-converter to-json --input input.pdf --output output.json

# MHLW JSON → JIS Z 7253 Word document
sds-converter to-docx --input output.json --output result.docx --lang ja
```
The LLM backend is pluggable — Claude, GPT, Gemini, Mistral, Groq, or local models via Ollama. A --quality flag adjusts cost versus accuracy for batch workloads.
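For batch workloads, a thin wrapper around the CLI is enough to process a folder of supplier PDFs. The Python sketch below uses only the `to-json`, `--input`, and `--output` arguments shown above. It assumes that `sds-converter` is on the PATH and that it exits with a non-zero status on failure, which I have not confirmed here. The function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def to_json_command(pdf: Path, out_dir: Path) -> List[str]:
    # One output file per input, named after the PDF.
    out = out_dir / (pdf.stem + ".json")
    return ["sds-converter", "to-json", "--input", str(pdf), "--output", str(out)]


def convert_directory(in_dir: Path, out_dir: Path) -> List[Path]:
    """Convert every PDF in in_dir and return the inputs that failed."""
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for pdf in sorted(in_dir.glob("*.pdf")):
        result = subprocess.run(to_json_command(pdf, out_dir))
        # Assumption: a non-zero exit status signals a failed conversion.
        if result.returncode != 0:
            failed.append(pdf)
    return failed


if __name__ == "__main__":
    for path in convert_directory(Path("incoming"), Path("converted")):
        print(f"failed: {path}")
```

Collecting failures instead of stopping at the first one suits supplier batches, where a few scanned or oddly formatted files should not block the rest.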
Known limitations:
| Issue | Status |
|---|---|
| Scanned PDFs without a text layer | Not supported — requires upstream OCR |
| Section 3 tables with merged cells | Extraction sometimes fails on complex DOCX layouts |
| Precision fields mixed with "not measured" entries | Occasional type errors in Section 9 output |
These are open problems, not design decisions.
The open gap
The MHLW schema represents a real need for anyone handling chemical compliance in Japan at volume. Commercial tools cover the authoring side. For bulk conversion of incoming supplier PDFs to structured data, the only open-source implementation targeting this schema that I am aware of is sds-converter, which I developed.
The repository is open. Contributions on the extraction side — particularly Section 3 table handling — are welcome. If you work in cheminformatics or chemical compliance and have approached the MHLW compliance problem differently, I would be interested to hear it.