Yangholmes

Posted on Aug 10 • Edited on Sep 3

How to Use GDAL in Web Applications (Part 2)

#webassembly #gdal

This article was translated, original article here.

Part 1 covered the compilation process. This article explores how to use the compiled output.

Basic Usage of WebAssembly

Instantiating wasm

As its name suggests, WebAssembly resembles assembly language: it is a low-level format that sits between a compiler's intermediate representation and machine code. Because it is not JavaScript, a .wasm file cannot be loaded through a <script> tag or an ordinary import the way a typical JavaScript library can.

Illustration from "Creating and working with WebAssembly modules", an excellent article that made my head spin.

Browsers provide a complete WebAssembly API for loading code. Assuming we have a some.wasm file:

fetch("some.wasm")
  .then((response) => response.arrayBuffer())
  .then((bytes) => WebAssembly.instantiate(bytes, importObject)) // importObject: the imports the module expects, if any
  .then(({instance}) => {
    // Assuming a some_func function is exported
    instance.exports.some_func();
  });

All exports are mounted under instance.exports. We can discover callable interfaces and their parameters by reviewing source code or documentation provided by the wasm author.
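To try the API without a compiled file, a minimal sketch can embed a tiny hand-assembled module as bytes. The module here is an assumption made for illustration (it is not one of the outputs from Part 1): it exports a single function, add(a, b), that returns the i32 sum. The synchronous constructors are used only because the module is a few dozen bytes; for real files the asynchronous calls shown above are the better fit.

```javascript
// Hand-assembled wasm binary exporting add(a: i32, b: i32) -> i32.
// Illustrative only; real modules come from a compiler such as Emscripten.
const addModuleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic "\0asm", version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function 0 uses type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add" = function 0
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // local.get 0; local.get 1; i32.add
]);

const addModule = new WebAssembly.Module(addModuleBytes);

// List what the module exports before instantiating it.
const exportNames = WebAssembly.Module.exports(addModule).map((e) => `${e.kind}:${e.name}`);

// This module needs no imports, so the import object is empty.
const addInstance = new WebAssembly.Instance(addModule, {});
const sum = addInstance.exports.add(2, 3);

console.log(exportNames, sum);
```

WebAssembly.Module.exports is handy when no documentation is available: it reports each export's name and kind, though not its parameter types.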

Memory Management

JavaScript developers rarely worry about memory management because the garbage collector handles it for them. WebAssembly, in contrast, requires manual memory management. It uses contiguous, untyped linear memory, much like one large byte array, which is read and written through load and store instructions at numeric offsets, similar to C/C++ pointers. Memory can be allocated in JavaScript, sized in 64 KiB pages, and passed to the wasm instance during initialization:

const memory = new WebAssembly.Memory({ initial: 10, maximum: 100 });

fetch("some.wasm")
  .then((response) => response.arrayBuffer())
  .then((bytes) => WebAssembly.instantiate(bytes, {
    env: { memory: memory }
  }));

WebAssembly runs on the same thread as its calling code. Unlike workers, which exchange data by copying or transferring it, WebAssembly and JavaScript can access the same memory block, so data can be passed through shared memory. Note: WebAssembly can only access its linear memory, whether that memory was passed in from JavaScript or defined by the module itself; it cannot reach other JavaScript objects directly.

Providing initial memory is not mandatory; a wasm module can define its own. JavaScript-managed memory offers advantages in data sharing and replication. For graphics and audio/video processing, which move large amounts of data, passing a memory address is far more efficient than copying the data through function parameters. Additionally, JavaScript-allocated memory can be shared among multiple wasm modules, enabling inter-module collaboration.
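The sharing described above can be sketched with another hand-assembled module. Everything about this module is an assumption for illustration: it imports env.memory and exports a hypothetical load8(ptr) function that reads one unsigned byte at the given offset. JavaScript writes a byte into the buffer, and the wasm function reads the same byte back without any copying.

```javascript
// Hand-assembled module (illustrative): imports env.memory and exports
// load8(ptr: i32) -> i32, which reads one unsigned byte at ptr.
const loadModuleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic "\0asm", version 1
  0x01, 0x06, 0x01, 0x60, 0x01, 0x7f, 0x01, 0x7f, // type: (i32) -> i32
  0x02, 0x0f, 0x01, 0x03, 0x65, 0x6e, 0x76, // one import, from "env"
  0x06, 0x6d, 0x65, 0x6d, 0x6f, 0x72, 0x79, 0x02, 0x00, 0x01, // "memory", at least 1 page
  0x03, 0x02, 0x01, 0x00, // function 0 uses type 0
  0x07, 0x09, 0x01, 0x05, 0x6c, 0x6f, 0x61, 0x64, 0x38, 0x00, 0x00, // export "load8" = function 0
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x2d, 0x00, 0x00, 0x0b, // local.get 0; i32.load8_u
]);

// Sizes are given in pages of 64 KiB, so this memory starts at 65536 bytes.
const sharedMemory = new WebAssembly.Memory({ initial: 1, maximum: 4 });
const heap = new Uint8Array(sharedMemory.buffer);

const loader = new WebAssembly.Instance(new WebAssembly.Module(loadModuleBytes), {
  env: { memory: sharedMemory },
});

// JavaScript writes a byte; wasm reads it back from the same buffer.
heap[16] = 42;
const seenByWasm = loader.exports.load8(16);

console.log(sharedMemory.buffer.byteLength, seenByWasm);
```

One caveat worth keeping in mind: if the memory grows, the old ArrayBuffer is detached, so typed-array views such as heap should be recreated from memory.buffer after a grow.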

Table Mechanism

Beyond exposing its own interfaces, WebAssembly may need to call JavaScript functions, for example to route C's standard output to console.log:

WebAssembly.instantiate(wasmBlob, {
  env: {
    js_callback: (value) => console.log(value) // Inject JS function directly
  }
});

Calling js_callback in C logs messages to console. However, this approach has risks:

Potential pointer leaks and arbitrary code execution via js_callback
Bound functions become immutable
wasm can't detect if bound functions are garbage collected, causing crashes

WebAssembly addresses this with Tables. A table stores function references outside linear memory, and wasm code refers to them by index instead of by raw pointer. Example:

// C (illustrative)
#include <stdint.h>

// Define function signature matching console.log
typedef void (*log_func_ptr)(const char* message);

// Table index of the log function (set in JS)
extern uint32_t log_function_index;

void safe_log(const char* message) {
    // In wasm, a function pointer is an index into the table
    log_func_ptr log_ptr = (log_func_ptr)(uintptr_t)log_function_index;

    // Call function through table
    log_ptr(message);
}

// JavaScript

function sanitizedConsoleLog(messagePtr) {
  // Byte view over linear memory (wasmMemory is a WebAssembly.Memory)
  const heap = new Uint8Array(wasmMemory.buffer);

  // Boundary check - limit string length
  const maxLen = 256;
  let length = 0;

  // Safe string reading: stop at the NUL terminator, at maxLen, or at the end of memory
  while (length < maxLen && messagePtr + length < heap.length) {
      if (heap[messagePtr + length] === 0) break;
      length++;
  }

  // Extract safe-range string
  const messageBytes = heap.subarray(messagePtr, messagePtr + length);

  // Convert to string
  const message = new TextDecoder('utf-8', { fatal: true }).decode(messageBytes);

  // Actual console.log call
  console.log(message);
}

const table = new WebAssembly.Table({
  initial: 3,
  maximum: 10,
  element: 'anyfunc'  // Stores function references only
});

let index = 0;
table.set(
  index,
  new WebAssembly.Function(
    { parameters: ["i32"], results: [] },
    sanitizedConsoleLog
  )
); // Bind sanitizedConsoleLog to table index 0

// ...

const { instance } = await WebAssembly.instantiate(bytes, {
    env: {
        table: table,
        memory: wasmMemory,
        log_function_index: index // log_func_ptr points to table index 0
    }
});

// ...

// Call safe logging function
instance.exports.safe_log(messagePtr);

JavaScript functions cannot be placed in a table directly because they carry no type signature. WebAssembly.Function would attach one, but as of this writing it is still at the proposal stage and not generally available, so the table.set call above is illustrative.
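Until typed JavaScript functions are available, a table can still hold functions exported by a wasm instance, since those already carry a signature. The sketch below assumes a hand-assembled module exporting add(a, b), made up for illustration, and also shows that a plain JavaScript function is rejected.

```javascript
// Hand-assembled module exporting add(a: i32, b: i32) -> i32 (illustrative stand-in
// for real compiled code).
const tableDemoBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic "\0asm", version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function 0 uses type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add" = function 0
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // local.get 0; local.get 1; i32.add
]);
const tableDemo = new WebAssembly.Instance(new WebAssembly.Module(tableDemoBytes), {});

const funcTable = new WebAssembly.Table({ initial: 2, element: 'anyfunc' });

// An exported wasm function already has a type, so set() accepts it.
funcTable.set(0, tableDemo.exports.add);
const viaTable = funcTable.get(0)(20, 22);

// A plain JavaScript function has no wasm type and is rejected with a TypeError.
let plainFunctionRejected = false;
try {
  funcTable.set(1, (a, b) => a + b);
} catch (err) {
  plainFunctionRejected = err instanceof TypeError;
}

console.log(viaTable, plainFunctionRejected);
```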

Thread Management

WebAssembly execution times are unpredictable. Since it runs in the same thread as the caller, calling wasm interfaces on the main JavaScript thread may block UI rendering. Typically, we run WebAssembly inside a Web Worker, as the hands-on example later in this article does.
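On the main-thread side, talking to such a worker usually comes down to a postMessage round trip. A minimal sketch, assuming a worker that answers each request with exactly one message; callWorker and the worker file name are hypothetical and not part of any library.

```javascript
// Wraps one postMessage round trip in a Promise.
// Assumes the worker replies exactly once per request (hypothetical protocol).
function callWorker(worker, payload) {
  return new Promise((resolve, reject) => {
    worker.onmessage = (event) => resolve(event.data);
    worker.onerror = (error) => reject(error);
    worker.postMessage(payload);
  });
}

// Browser usage (the file name is an assumption):
//   const worker = new Worker(new URL('./wasm.worker.js', import.meta.url), { type: 'module' });
//   const result = await callWorker(worker, { cmd: 'run' });
```

The handlers are installed before postMessage is called so that a fast reply cannot be missed. A worker that handles several requests in flight would need request ids, which this sketch leaves out.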

Emscripten Glue Code

As seen, WebAssembly usage involves complexity: managing exports, memory, and function mapping with security considerations. Is there a simpler approach? Yes, friends, yes.

Remember the .js file in the compilation output from the previous article? This is the glue code – the messenger between JavaScript and WebAssembly. It simplifies:

WebAssembly module loading/initialization
JavaScript↔WebAssembly interfacing
C/C++ standard library implementations (file I/O, memory management)

The glue code exports a factory function that accepts an injection object and returns a Promise resolving to a Module object with all wasm exports:

(moduleArg = {}): Promise<Module>

The glue code merges moduleArg into the Module object. For example, to route standard output (such as printf) to console.log:

let moduleArg = {
  print: function(text: string) { console.log('stdout: ' + text); }
};

Glue code can be emitted as an IIFE (global Module), ESM, or UMD. The export mode depends on compilation parameters such as -s MODULARIZE and -s EXPORT_ES6.

Exported WebAssembly functions are attached to the resulting Module. They can be called via ccall or converted to JavaScript functions with cwrap. Additional utilities, such as the virtual filesystem API (FS), are attached according to the -s EXPORTED_RUNTIME_METHODS compilation parameter.

With this understanding, using WebAssembly becomes straightforward.

Hands-on Example

We'll read a TIFF file and extract its information. Directory structure:

├── CANYrelief1-geo.tif
├── gdal.worker.ts
├── gdal3WebAssembly.data
├── gdal3WebAssembly.js
├── gdal3WebAssembly.wasm
└── index.ts

index.ts is the entry point, initializing a GdalWorker. The gdal3WebAssembly.* files are compilation outputs (see previous article). Core functionality resides in gdal.worker.ts:

import CModule from './gdal3WebAssembly.js';
// Import wasm file as resource URL
import wasm from './gdal3WebAssembly.wasm?url';

// GDAL object mapping GDAL exports
let GDAL = {};

// Point to Emscripten virtual filesystem
let FS = {};
const SRCPATH = '/src';

let Module = {
  locateFile: () => wasm, // Critical for build tooling
  onRuntimeInitialized() {
    // Register all GDAL drivers
    Module.ccall('GDALAllRegister', null, [], []);

    GDAL.GDALOpen = Module.cwrap('GDALOpen', 'number', ['string', 'number']); // (path, access mode)
    GDAL.GDALClose = Module.cwrap('GDALClose', 'number', ['number']);
    // Register gdalinfo command
    GDAL.GDALInfo = Module.cwrap('GDALInfo', 'string', ['number', 'number']);

    // Mount FS object
    FS = Module.FS;
  }
}

/**
 * Initialize Module 
 */
function init() {
  return CModule(Module);
}

/**
 * Read TIFF file info
 * @param files TIFF files to mount; the first one is opened
 */
function getTiffInfo(files: File[]) {
  // Create working directory
  FS.mkdir(SRCPATH);
  // mount tiff file
  FS.mount(Module.WORKERFS, {
    files: files
  }, SRCPATH);

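  // open file read-only (0 = GA_ReadOnly) and get handle
  const dataset = GDAL.GDALOpen(SRCPATH + '/' + files[0].name, 0);
  // read info
  const info = GDAL.GDALInfo(dataset);
  // release the dataset handle now that the report has been copied to a JS string
  GDAL.GDALClose(dataset);

  return info;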
}

/**
 * fetch tiff
 */
function fetchtiff() {
  return fetch('/api/tiff/CANYrelief1-geo.tif').then(res => res.blob());
}

self.onmessage = () => {
  fetchtiff().then(blob => {
    console.log(blob);
    const file = new File([blob], 'CANYrelief1-geo.tiff', {
        type: 'image/tiff'
    });

    init().then(() => {
      const result = getTiffInfo([file]);
      console.log(result);
    })
  })
}

Note the locateFile function in the Module object: it tells the glue code where to find the wasm file, which matters when a build tool rewrites asset URLs.
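The callback in the listing returns the wasm URL for every request. Emscripten calls locateFile with the requested file name and the glue script's directory, and it may use the same hook for other files such as the .data package, so a per-file mapping can be safer. A minimal sketch; makeLocateFile and the URLs are hypothetical.

```javascript
// Builds a locateFile callback. Emscripten calls it as (path, prefix), where
// path is the requested file name and prefix is the glue script's directory.
function makeLocateFile(overrides) {
  return (path, prefix = '') => overrides[path] ?? prefix + path;
}

// Hypothetical URLs; a bundler would normally supply the hashed wasm URL.
const locateFile = makeLocateFile({
  'gdal3WebAssembly.wasm': '/assets/gdal3WebAssembly.abc123.wasm',
});

const wasmUrl = locateFile('gdal3WebAssembly.wasm', '/js/');
const dataUrl = locateFile('gdal3WebAssembly.data', '/js/');

console.log(wasmUrl, dataUrl);
```

Files without an override fall back to the default location next to the glue script, which is what Emscripten would use if locateFile were not set.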

After successful wasm instantiation, onRuntimeInitialized() executes. At this point:

GDAL drivers are registered via GDALAllRegister()
Key functions (GDALOpen, GDALClose, GDALInfo) are wrapped with cwrap() and attached to GDAL
Emscripten's virtual filesystem (FS) is stored for later use

Note: cwrap only creates JavaScript wrappers for the named C functions. Other exported C functions remain available and can still be called through ccall.
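Because GDALOpen hands back a raw handle, it helps to pair every open with a close, even when reading fails. A minimal sketch, assuming an object that exposes cwrap'ed GDALOpen and GDALClose as in the listing above; withDataset is a hypothetical helper, not part of GDAL or Emscripten.

```javascript
// Opens a dataset, runs `use`, and always closes the handle afterwards.
// `gdal` is expected to expose cwrap'ed GDALOpen and GDALClose.
function withDataset(gdal, path, use) {
  const dataset = gdal.GDALOpen(path);
  if (dataset === 0) {
    // GDALOpen returns NULL (0 on the JavaScript side) when it cannot open the file.
    throw new Error(`GDALOpen failed for ${path}`);
  }
  try {
    return use(dataset);
  } finally {
    gdal.GDALClose(dataset);
  }
}

// Usage inside the worker (names taken from the listing above):
//   const info = withDataset(GDAL, SRCPATH + '/' + file.name, (ds) => GDAL.GDALInfo(ds));
```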

The getTiffInfo function demonstrates GDAL usage. We:

Create workspace via FS.mkdir()
Mount TIFF file using WORKERFS
Open file with GDALOpen() to get dataset handle
Extract the report with GDALInfo(), the library equivalent of the gdalinfo command

Example output:

Driver: GTiff/GeoTIFF
Files: /src/CANYrelief1-geo.tiff
Size is 2800, 2800
Coordinate System is:
ENGCRS["WGS 84 / Pseudo-Mercator",
    EDATUM["Unknown engineering datum"],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
Data axis to CRS axis mapping: 1,2
Origin = (-12249462.599999999627471,4629559.794860946945846)
Pixel Size = (13.284000000000001,-13.285397060378999)
Metadata:
  AREA_OR_POINT=Area
  TIFFTAG_DATETIME=2017:04:01 20:24:57
  TIFFTAG_RESOLUTIONUNIT=2 (pixels/inch)
  TIFFTAG_SOFTWARE=Adobe Photoshop CC (Macintosh)
  TIFFTAG_XRESOLUTION=72
  TIFFTAG_YRESOLUTION=72
Image Structure Metadata:
  COMPRESSION=LZW
  INTERLEAVE=PIXEL
  PREDICTOR=2
Corner Coordinates:
Upper Left  (-12249462.600, 4629559.795) 
Lower Left  (-12249462.600, 4592360.683) 
Upper Right (-12212267.400, 4629559.795) 
Lower Right (-12212267.400, 4592360.683) 
Center      (-12230865.000, 4610960.239) 
Band 1 Block=2800x31 Type=Byte, ColorInterp=Red
Band 2 Block=2800x31 Type=Byte, ColorInterp=Green
Band 3 Block=2800x31 Type=Byte, ColorInterp=Blue

Conclusion

This article introduced WebAssembly usage. Future topics will cover:

Compilation optimization strategies for smaller outputs
Deep dive into emscripten glue code capabilities
