Hugo Russel

Building a CAD viewer: the DWG format is hostile by design

The DWG Parsing Nightmare: A War Story from Building a CAD Viewer

When I set out to build a web-based CAD viewer, I thought the hard part would be rendering complex geometries in WebGL. I was wrong. The hardest part was getting the data out of DWG files in the first place.

DWG is the native file format for AutoCAD, arguably the most widely used CAD software in the world. It's been around since 1982, used by architects, engineers, and designers everywhere. You'd think parsing a 40-year-old format would be straightforward. It's not.

The Proprietary Wall

DWG is a proprietary, undocumented binary format owned by Autodesk. There is no official specification. The format has evolved through dozens of versions, each with subtle (and sometimes not-so-subtle) changes. Autodesk has historically been... let's say "protective" of the format, including sending cease-and-desist letters to projects attempting to reverse-engineer it.

This means if you want to parse DWG files, you have exactly two options:

  1. License the official SDK from Autodesk (expensive, restrictive licensing)
  2. Use a reverse-engineered library (buggy, incomplete, crashes frequently)

I went with option 2: LibreDWG, an open-source library that has been painstakingly reverse-engineered over many years. It's genuinely impressive work. It's also a minefield. As far as I know, its development has stalled at least twice for lack of developers.

The Architecture of Desperation

My first realization was that I couldn't parse DWG directly in the browser. LibreDWG is written in C, and while WebAssembly ports exist, they're incomplete and even more crash-prone than the native version. So I built a conversion service:

DWG file → Cloud service (LibreDWG) → DXF file → Browser (custom parser)

DXF is the "open" exchange format for CAD files - also from Autodesk, but at least it's documented and text-based. The plan was simple: convert DWG to DXF server-side, then parse the DXF in JavaScript.
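The server-side step can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the service's actual code: it assumes LibreDWG's `dwg2dxf` command-line tool is on the PATH and that its `-o` option names the output file. `buildConvertArgs` and `convertDwgToDxf` are hypothetical helper names.

```javascript
const { execFile } = require('child_process');
const path = require('path');

// Hypothetical helper: derive the DXF output path and the argument list.
// Assumption: dwg2dxf accepts `-o <outfile> <infile>`.
function buildConvertArgs(inputPath, outputDir) {
  const base = path.basename(inputPath, path.extname(inputPath));
  const outputPath = path.join(outputDir, `${base}.dxf`);
  return { outputPath, args: ['-o', outputPath, inputPath] };
}

// Run the converter without a shell, so uploaded file names are never
// interpreted as shell syntax. Always resolves; the caller decides what
// to do with `error` (the output file may still be usable).
function convertDwgToDxf(inputPath, outputDir, timeoutMs = 120000) {
  const { outputPath, args } = buildConvertArgs(inputPath, outputDir);
  return new Promise((resolve) => {
    execFile(
      'dwg2dxf',
      args,
      { timeout: timeoutMs, maxBuffer: 50 * 1024 * 1024 },
      (error, stdout, stderr) => resolve({ outputPath, error, stderr })
    );
  });
}
```

Passing an argument array to `execFile` instead of a command string is a design choice worth considering for any service that converts user-supplied files.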

Simple plans rarely survive contact with reality.

The Segfault Epidemic

LibreDWG crashes. A lot. Not because it's poorly written - the maintainers have done heroic work - but because the DWG format is a moving target full of undocumented features, proprietary extensions, and vendor-specific entities.

Here's what happens when someone opens a DWG file created with specialized CAD software (think: electrical schematics, structural engineering, piping):

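// server.js - the error handling that keeps me up at night
const { execSync } = require('child_process');
const fs = require('fs');

// `command` (the converter invocation) and `outputPath` are built earlier
// in the request handler and are not shown here.
let conversionHadErrors = false;
let conversionErrorMessage = '';

try {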
  execSync(command, {
    timeout: 120000, // 2 minutes for complex files
    stdio: ['pipe', 'pipe', 'pipe'],
    maxBuffer: 50 * 1024 * 1024,
  });
} catch (conversionError) {
  conversionHadErrors = true;
  conversionErrorMessage = conversionError.message;
  console.warn(`Conversion had errors (checking for partial output):`,
    conversionErrorMessage.substring(0, 500));
}

// Check for common crashes
if (conversionErrorMessage.includes('Segmentation fault') ||
    conversionErrorMessage.includes('Unknown Class entity')) {
  throw new Error(
    'This DWG file contains custom/proprietary entities that are not supported. ' +
    'The file may have been created with specialized CAD software or plugins.'
  );
}
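Matching on the error text works, but Node also reports how a child process ended. Here is a hedged alternative sketch, assuming the converter is run directly with `child_process.spawnSync` (no shell), whose result object exposes `status`, `signal`, `error`, and `stderr`. `classifyConversion` is a hypothetical helper, and the "Unknown Class entity" string is simply the one matched above.

```javascript
// Hypothetical helper: interpret the object returned by
// child_process.spawnSync(cmd, args, { encoding: 'utf8', timeout }).
function classifyConversion(result) {
  const stderr = String(result.stderr || '');
  // On timeout, spawnSync sets result.error with code 'ETIMEDOUT'.
  if (result.error && result.error.code === 'ETIMEDOUT') return 'timeout';
  // A segfault terminates the child with SIGSEGV.
  if (result.signal === 'SIGSEGV') return 'crashed';
  // Any other terminating signal.
  if (result.signal) return 'killed';
  if (stderr.includes('Unknown Class entity')) return 'unsupported-entities';
  if (result.status !== 0) return 'failed';
  return 'ok';
}
```

A `'crashed'` result does not have to be fatal: the caller can still check whether a partial output file was written before giving up.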

The key insight here is that LibreDWG often leaves usable output behind even when it crashes. The process may segfault after writing 90% of the file. So instead of giving up, we check if the output file exists:

// Check if output file was created (might exist even after errors)
if (!fs.existsSync(outputPath)) {
  // Only fail if we truly have nothing
  throw new Error(`DWG conversion failed`);
}

// Verify output file has content
const outputStats = fs.statSync(outputPath);
if (outputStats.size < 100) {
  throw new Error('Conversion produced an empty or invalid output file');
}

if (conversionHadErrors) {
  console.log(`Output file created despite errors - proceeding with partial conversion`);
}

Repairing Truncated Files

But here's where it gets really interesting. When LibreDWG crashes mid-write, it leaves behind truncated DXF files. A valid DXF file must end with an EOF marker:

  0
EOF

Without it, parsers choke. So I wrote a repair function - 100 lines of code to salvage partial output:

function repairDxfFile(filePath, requestId) {
  let content = fs.readFileSync(filePath, 'utf-8');

  // Check if file has a valid EOF at the end
  const cleanEofPattern = /\r?\n\s*0\r?\n\s*EOF\s*$/i;

  if (cleanEofPattern.test(content)) {
    return; // File is fine
  }

  console.log(`DXF file needs repair - checking structure...`);

  // Strategy: search backwards for the start of the last entity and cut just
  // before it, because that last entity may have been written only partially
  const entityTypes = new Set([
    'LINE', 'CIRCLE', 'ARC', 'POLYLINE', 'LWPOLYLINE', 'TEXT', 'MTEXT',
    'INSERT', 'DIMENSION', 'SPLINE', 'ELLIPSE', 'HATCH', 'POINT', 'SOLID',
    '3DFACE', 'ATTRIB', 'SEQEND', 'VERTEX', 'LEADER', 'TOLERANCE', 'TRACE',
    'VIEWPORT', 'IMAGE', 'WIPEOUT', 'XLINE', 'RAY', 'REGION', 'BODY',
    '3DSOLID', 'ENDSEC', 'ENDBLK', 'ATTDEF', 'BLOCK', 'ENDTAB', 'TABLE'
  ]);

  const lines = content.split(/\r?\n/);
  let lastValidEntityEnd = lines.length;

  // Search backwards to find the last complete entity
  for (let i = lines.length - 1; i >= 1; i--) {
    const prevLine = lines[i - 1].trim();
    const currLine = lines[i].trim().toUpperCase();

    if (prevLine === '0' && entityTypes.has(currLine)) {
      lastValidEntityEnd = i - 1;
      break;
    }

    // Check for obvious truncation (binary garbage)
    if (prevLine === '0' && currLine.length > 0 && !/^[A-Z0-9_]+$/.test(currLine)) {
      lastValidEntityEnd = i - 1;
      break;
    }
  }

  // Truncate and add EOF
  const truncatedLines = lines.slice(0, lastValidEntityEnd);
  content = truncatedLines.join('\n') + '\n  0\nEOF\n';

  fs.writeFileSync(filePath, content);
}

This function has saved countless files that would otherwise be completely unusable.
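Why does looking for a `0` line followed by an entity name work? A text DXF file is a flat sequence of two-line pairs: a numeric group code, then its value. Group code 0 starts every entity and every section marker. A minimal sketch of that view follows (`readPairs` and `endsWithEof` are hypothetical names, and real files need more validation than this):

```javascript
// Split DXF text into { code, value } pairs. A trailing line without a
// partner (typical of a truncated file) is reported as `dangling`.
function readPairs(text) {
  const lines = text.split(/\r?\n/);
  // Ignore the empty string produced by a final newline.
  if (lines.length > 0 && lines[lines.length - 1] === '') lines.pop();
  const pairs = [];
  for (let i = 0; i + 1 < lines.length; i += 2) {
    pairs.push({ code: Number(lines[i].trim()), value: lines[i + 1].trim() });
  }
  return { pairs, dangling: lines.length % 2 === 1 };
}

// A file is cleanly terminated when its last complete pair is (0, EOF).
function endsWithEof(text) {
  const { pairs, dangling } = readPairs(text);
  const last = pairs[pairs.length - 1];
  return !dangling && Boolean(last) && last.code === 0 && last.value.toUpperCase() === 'EOF';
}
```

Seen this way, a crash mid-write shows up either as a dangling group code or as a final pair that is not `0 / EOF`, which is exactly what the repair function has to clean up.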

The Encoding Nightmare

You'd think once you have a valid DXF file, the hard part is over. You'd be wrong.

DXF files can contain text in any encoding. The format predates Unicode, so files might be encoded in Windows-1252 (Western European), Windows-1257 (Baltic), Windows-1251 (Cyrillic), or any number of legacy code pages. Sometimes the encoding is specified in the file header. Sometimes it isn't. Sometimes it's specified incorrectly.
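When the header does declare an encoding, it does so in the `$DWGCODEPAGE` header variable, with values such as `ANSI_1252`. A minimal sketch of turning that into a `TextDecoder` label is shown below. It assumes only the `ANSI_125x` family matters and falls back to Windows-1252 otherwise; `encodingFromHeader` and `decodeDxf` are hypothetical names. The header itself is plain ASCII, so it can be read with any single-byte decoding before the real encoding is known.

```javascript
// Map the value of the DXF header variable $DWGCODEPAGE (e.g. "ANSI_1257")
// to a WHATWG encoding label usable with TextDecoder.
// Assumption: only ANSI_1250..ANSI_1258 are handled; anything else falls back.
function encodingFromHeader(headerText, fallback = 'windows-1252') {
  const match = /\$DWGCODEPAGE\s*\r?\n\s*3\s*\r?\n\s*([^\r\n]+)/i.exec(headerText);
  if (!match) return fallback;
  const ansi = /^ANSI_(125[0-8])$/i.exec(match[1].trim());
  return ansi ? `windows-${ansi[1]}` : fallback;
}

// Decode raw file bytes (Uint8Array or Buffer) with the detected label.
function decodeDxf(bytes, headerText) {
  return new TextDecoder(encodingFromHeader(headerText)).decode(bytes);
}
```

This only covers the "specified correctly" case. Files with a missing or wrong code page still need the heuristics described next.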

And then there's mojibake - what happens when UTF-8 bytes are interpreted as a legacy encoding. A Lithuanian street name like "Šiaulių" becomes "Å iauliÅ³". A Spanish word like "señal" becomes "seÃ±al".

I wrote 400 lines of code to detect and fix these encoding issues:

function fixUtf8Mojibake(text: string): string {
  // Try to reverse UTF-8 mojibake by re-encoding as Latin-1 and decoding as UTF-8
  try {
    const bytes: number[] = [];
    for (let i = 0; i < text.length; i++) {
      const code = text.charCodeAt(i);
      if (code < 256) {
        bytes.push(code);
      } else {
        // Windows-1252 has special chars in 0x80-0x9F that map to different Unicode points
        const win1252ToBytes: Record<number, number> = {
          0x20AC: 0x80, // €
          0x0160: 0x8A, // Š (important for Baltic!)
          0x0161: 0x9A, // š
          0x017D: 0x8E, // Ž
          0x017E: 0x9E, // ž
          // ... 30 more mappings
        };
        const byteVal = win1252ToBytes[code];
        if (byteVal !== undefined) {
          bytes.push(byteVal);
        } else {
          return text; // Can't convert - not mojibake
        }
      }
    }

    const uint8 = new Uint8Array(bytes);
    const decoded = new TextDecoder('utf-8', { fatal: true }).decode(uint8);

    if (decoded !== text && !decoded.includes('\uFFFD')) {
      return decoded;
    }
  } catch {
    // UTF-8 decoding failed - text wasn't mojibake
  }

  return text;
}
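The round trip is easier to see on a tiny example. The sketch below uses Node's `Buffer` and Latin-1 only, so unlike the function above it does not handle the Windows-1252 characters in the 0x80-0x9F range. `makeMojibake` and `undoMojibake` are hypothetical names used for illustration.

```javascript
// Produce mojibake on purpose: encode as UTF-8, then misread the bytes as Latin-1.
function makeMojibake(text) {
  return Buffer.from(text, 'utf8').toString('latin1');
}

// Reverse it: turn each character back into one byte, then decode as UTF-8.
// Returns the input unchanged when it cannot be mojibake.
function undoMojibake(text) {
  for (const ch of text) {
    if (ch.codePointAt(0) > 0xff) return text; // not representable as a single byte
  }
  const bytes = Buffer.from(text, 'latin1');
  try {
    return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch {
    return text; // bytes were not valid UTF-8, so the text was fine as it was
  }
}

console.log(makeMojibake('se\u00F1al'));               // the garbled form of "señal"
console.log(undoMojibake(makeMojibake('se\u00F1al'))); // back to "señal"
```

The `fatal: true` option is what keeps correctly decoded text such as "café" from being mangled: a lone 0xE9 byte is not valid UTF-8, so decoding throws and the original string is returned.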

The code also includes explicit pattern replacements for common mojibake sequences:

const printableMojibake: Record<string, string> = {
  // Lowercase accented vowels
  'Ã¡': 'á', 'Ã©': 'é', 'Ã\u00AD': 'í', 'Ã³': 'ó', 'Ãº': 'ú',

  // Lithuanian/Baltic characters
  'Å\u00A0': 'Š',  // S with caron
  'Å¡': 'š',       // s with caron
  'Ä…': 'ą',       // a with ogonek
  'Ä™': 'ę',       // e with ogonek

  // Common symbols
  'Â°': '°', 'Â±': '±', 'Â²': '²', 'Â³': '³',
  // ... 100+ more patterns
};
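One way such a table could be applied is in a single pass, trying longer keys first. This is a sketch under that assumption, not necessarily how the production code does it. `escapeRegExp` and `buildReplacer` are hypothetical helpers, and the keys in the usage line are written as escapes for 'Ã±' and 'Â°'.

```javascript
// Escape a string for literal use inside a RegExp.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a single-pass replacer from a mojibake table. Longer keys come first
// in the alternation so a short key never consumes the start of a longer one.
function buildReplacer(table) {
  const keys = Object.keys(table).sort((a, b) => b.length - a.length);
  if (keys.length === 0) return (text) => text;
  const pattern = new RegExp(keys.map(escapeRegExp).join('|'), 'g');
  return (text) => text.replace(pattern, (match) => table[match]);
}

const fix = buildReplacer({ '\u00C3\u00B1': '\u00F1', '\u00C2\u00B0': '\u00B0' });
console.log(fix('se\u00C3\u00B1al 45\u00C2\u00B0')); // "señal 45°"
```

Doing all replacements in one pass also means the output of one replacement is never re-scanned and "fixed" a second time.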

The Entity Zoo

Even after solving conversion and encoding, there's the matter of actually parsing the DXF content. The dxf-parser npm package handles basic entities, but CAD files are full of complex entity types that it doesn't support:

  • LEADER/MULTILEADER: Annotation arrows with attached text
  • HATCH: Filled regions with pattern fills
  • ATTRIB: Block attributes with text values
  • ACAD_TABLE: Embedded tables

I wrote custom parsers for each of these. The HATCH parser alone is 450 lines of code, handling polyline boundaries, arc edges, pattern definitions, and nested holes (islands within islands).
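The "islands within islands" part comes down to a parity rule. As a minimal sketch, assuming every boundary has already been flattened to a closed polygon and that the hatch uses the odd-parity ("Normal") style, a point is filled when an odd number of boundary loops contain it. `pointInLoop` and `isFilled` are hypothetical names; other hatch styles fill differently.

```javascript
// Ray-casting point-in-polygon test for a closed loop of [x, y] vertices.
function pointInLoop([px, py], loop) {
  let inside = false;
  for (let i = 0, j = loop.length - 1; i < loop.length; j = i++) {
    const [xi, yi] = loop[i];
    const [xj, yj] = loop[j];
    // Does a horizontal ray from the point towards +x cross edge (j -> i)?
    const crosses =
      (yi > py) !== (yj > py) &&
      px < ((xj - xi) * (py - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

// Even-odd rule: filled when an odd number of boundary loops contain the point.
function isFilled(point, loops) {
  const depth = loops.filter((loop) => pointInLoop(point, loop)).length;
  return depth % 2 === 1;
}

// Outer boundary, a hole inside it, and an island inside the hole.
const square = (a, b) => [[a, a], [b, a], [b, b], [a, b]];
const loops = [square(0, 10), square(2, 8), square(4, 6)];
console.log(isFilled([1, 1], loops), isFilled([3, 3], loops), isFilled([5, 5], loops));
// true false true
```

Arc and spline edges would have to be approximated by line segments before a test like this can be used.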

The MULTILEADER parser is particularly nightmarish because the entity format changed between AutoCAD versions, and the data is encoded in nested "sections" that can appear in variable order:

// Track if we're inside a LEADER_LINE section
let inLeaderLine = false;

// Code 303 with "LEADER_LINE" marks the start
if (code === 303 && value.includes('LEADER_LINE')) {
  inLeaderLine = true;
  currentLeaderLineVertices = [];
}

// Closing brace ends the current section
// Be lenient - some formats have extra chars
if ((code === 304 || code === 305) && value.includes('}')) {
  if (inLeaderLine && currentLeaderLineVertices.length >= 2) {
    allLeaderLines.push([...currentLeaderLineVertices]);
  }
  inLeaderLine = false;
}
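A stack makes the "variable order" problem easier to reason about. The sketch below is an illustration only: it recognises sections by their text (a value ending in `{` opens one, a value starting with `}` closes the innermost one) for any group code in the 300-309 range, and it assumes leader-line vertices arrive as group codes 10 and 20. `collectLeaderLines` is a hypothetical name, and real MULTILEADER text content can contain braces that this sketch would misread.

```javascript
// Walk { code, value } pairs and collect the vertices of every LEADER_LINE
// section, wherever it appears in the entity.
function collectLeaderLines(pairs) {
  const stack = [];     // names of the currently open sections
  const lines = [];     // finished vertex lists
  let current = null;   // vertices of the LEADER_LINE being read, if any
  let pendingX = null;  // x coordinate waiting for its y

  for (const { code, value } of pairs) {
    const text = String(value).trim();
    const isMarker = code >= 300 && code <= 309;
    if (isMarker && text.endsWith('{')) {
      const name = text.slice(0, -1);
      stack.push(name);
      if (name === 'LEADER_LINE') current = [];
    } else if (isMarker && text.startsWith('}')) {
      const closed = stack.pop();
      if (closed === 'LEADER_LINE') {
        // Keep only lines with at least two vertices, as in the code above.
        if (current && current.length >= 2) lines.push(current);
        current = null;
      }
    } else if (current && code === 10) {
      pendingX = Number(value);
    } else if (current && code === 20 && pendingX !== null) {
      current.push([pendingX, Number(value)]);
      pendingX = null;
    }
  }
  return lines;
}
```

Because the closing brace always pops the innermost open section, it no longer matters which group code carries it or in which order sibling sections appear.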

What I Still Can't Parse

After all this work, there are still entity types I've given up on:

const unsupportedTypes = [
  'VIEWPORT', 'WIPEOUT',           // Display entities
  'REGION', 'BODY', '3DSOLID',     // ACIS solids (would need a solid-modeling kernel)
  'RAY', 'XLINE',                  // Infinite lines
  'IMAGE', 'OLE2FRAME',            // Embedded objects
];

3D solids in particular are out of reach. These entities don't store plain geometry: they embed ACIS (SAT) boundary-representation data, a proprietary solid-modeling format. Rendering them would mean decoding that data and tessellating it with something close to a full geometry kernel.
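Giving up on a type does not have to mean failing the file. A minimal sketch of skipping unsupported entities while keeping a count for the user-facing message is shown below. It assumes parsed entities are plain objects with a `type` string; `partitionEntities` is a hypothetical helper.

```javascript
// Split parsed entities into renderable ones and a per-type count of skipped ones.
function partitionEntities(entities, unsupportedTypes) {
  const unsupported = new Set(unsupportedTypes);
  const supported = [];
  const skipped = {};
  for (const entity of entities) {
    if (unsupported.has(entity.type)) {
      skipped[entity.type] = (skipped[entity.type] || 0) + 1;
    } else {
      supported.push(entity);
    }
  }
  return { supported, skipped };
}

const { supported, skipped } = partitionEntities(
  [{ type: 'LINE' }, { type: '3DSOLID' }, { type: '3DSOLID' }, { type: 'CIRCLE' }],
  ['3DSOLID', 'REGION', 'BODY']
);
console.log(supported.length, skipped); // 2 { '3DSOLID': 2 }
```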

Lessons Learned

  1. Embrace partial success. A file that renders 90% correctly is infinitely more useful than an error message.

  2. Layer your defenses. Validate at every stage: file size limits, conversion timeouts, output validation, content repair, encoding detection.

  3. Log everything. When users report "this file doesn't work," you need to know exactly where in the pipeline it failed.

  4. Accept your limits. Some files will never work. Proprietary plugins, extremely old format versions, corrupted files - know when to show a helpful error message and move on.

  5. Compression is your friend. DXF files are text-based and compress 80-90% with gzip. This turned what would be a 50MB transfer into a 5MB transfer.
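The compression point is easy to check with Node's built-in `zlib`. The sketch below measures the gzip ratio on a synthetic, DXF-like payload. `fakeDxf` and `gzipRatio` are hypothetical helpers, and real drawings will compress differently from this artificial input.

```javascript
const zlib = require('zlib');

// Build a synthetic DXF-like payload: many LINE entities in group-code form.
function fakeDxf(lineCount) {
  const parts = ['  0', 'SECTION', '  2', 'ENTITIES'];
  for (let i = 0; i < lineCount; i++) {
    parts.push(
      '  0', 'LINE', '  8', 'WALLS',
      ' 10', String(i), ' 20', String(i * 2),
      ' 11', String(i + 1), ' 21', String(i * 2 + 1)
    );
  }
  parts.push('  0', 'ENDSEC', '  0', 'EOF');
  return parts.join('\n') + '\n';
}

// Compressed size divided by original size (smaller is better).
function gzipRatio(text) {
  const raw = Buffer.from(text, 'utf8');
  const packed = zlib.gzipSync(raw);
  return packed.length / raw.length;
}

console.log(`gzip ratio: ${(gzipRatio(fakeDxf(5000)) * 100).toFixed(1)}%`);
```

In a web service the same effect usually comes from enabling gzip at the HTTP layer rather than compressing by hand.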

The Silver Lining

Despite all these challenges, building a DWG viewer has been deeply satisfying. Every time I fix a parsing edge case, another class of files starts working. Engineers who previously couldn't share their drawings without expensive software can now view them in a browser.

The DWG format may be hostile, but the engineering community's determination to make it accessible is stronger. Projects like LibreDWG represent years of painstaking reverse-engineering work. The Open Design Alliance maintains another implementation. The community keeps pushing forward.

If you're thinking about building something that handles DWG files, my advice is: don't underestimate the challenge, but don't be deterred either. Start with DXF support, add DWG conversion when you're ready, and build up your error handling iteratively. Every edge case you fix helps another user.


The code examples in this article are from a production CAD viewer. Names have been shortened for clarity, but the logic is real.
