Let's step beyond the scope of GDAL and explore the features of Emscripten.
File System
In computing, a File System is a method for managing and accessing data in the form of files. Data is stored on a variety of hardware devices, each with its own way of accessing data. The file system abstracts the complexities of data management and access into a unified interface, allowing users to manage and access stored data without understanding the underlying physical device parameters. Different operating systems have developed various file systems over time—for example, Linux supports ext, ext2, etc.; Windows supports NTFS, FAT32, etc.; and macOS supports HFS+, APFS, etc. These are not entirely compatible with each other.
To enable software to run across different UNIX-like operating systems, the IEEE developed the POSIX standard. Linux largely adheres to the POSIX specification, including its file system conventions. Linux implements abstraction through the VFS (Virtual File System) layer, a software layer in the kernel that provides a common interface for all types of file systems. Applications and system calls interact only with the VFS, which routes operations to specific file system drivers (e.g., ext4, NTFS).
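The routing idea behind a VFS layer can be sketched in a few lines. The following is a toy model only, not kernel or Emscripten code: `ToyVfs` and `makeMemoryDriver` are hypothetical names, and each "driver" is just an in-memory map. It shows how a single entry point can forward a path to whichever driver is mounted at the longest matching prefix.

```javascript
// Toy VFS: routes each path to the driver mounted at the longest matching prefix.
// ToyVfs and makeMemoryDriver are illustrative names, not a real API.
class ToyVfs {
  constructor() {
    this.mounts = []; // entries of the form { mountPoint, driver }
  }

  mount(mountPoint, driver) {
    this.mounts.push({ mountPoint, driver });
    // Longest mount point first, so "/mnt/usb" wins over "/".
    this.mounts.sort((a, b) => b.mountPoint.length - a.mountPoint.length);
  }

  resolve(path) {
    for (const { mountPoint, driver } of this.mounts) {
      const prefix = mountPoint === "/" ? "/" : mountPoint + "/";
      if (path === mountPoint || path.startsWith(prefix)) {
        // Path relative to the driver's own root.
        const rel = path.slice(mountPoint.length).replace(/^\/+/, "");
        return { driver, rel };
      }
    }
    throw new Error(`no file system mounted for ${path}`);
  }

  write(path, data) {
    const { driver, rel } = this.resolve(path);
    driver.write(rel, data);
  }

  read(path) {
    const { driver, rel } = this.resolve(path);
    return driver.read(rel);
  }
}

// A stand-in "driver" that keeps files in a Map.
function makeMemoryDriver(name) {
  const files = new Map();
  return {
    name,
    write: (rel, data) => files.set(rel, data),
    read: (rel) => files.get(rel),
  };
}

const vfs = new ToyVfs();
const rootDriver = makeMemoryDriver("rootfs");
const usbDriver = makeMemoryDriver("usbfs");
vfs.mount("/", rootDriver);
vfs.mount("/mnt/usb", usbDriver);

// The caller only sees paths; the VFS decides which driver handles them.
vfs.write("/etc/hostname", "demo");
vfs.write("/mnt/usb/photo.txt", "pixels");
```

The application-facing calls (`read`, `write`) never mention a driver, which is the property the VFS layer is meant to provide.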
Why discuss Linux and POSIX? Two main reasons:
- Most algorithm libraries are compiled in operating systems that support POSIX-compliant file systems.
- Emscripten’s file system design is inspired by Linux’s POSIX compatibility.
How Applications Access the File System
Operating systems provide library functions for file access to applications. In C/C++, this library is libc/libc++. These library functions further encapsulate the details of file system operations, turning file operations into operations on file handles. This approach offers several advantages:
- Reduces kernel system calls.
- Facilitates compatibility across different operating systems.
- Simplifies operations.
To read a file in a C program, the process is: Open File → Read Data → Close File.
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const char *path = "input.txt";
    FILE *fp = fopen(path, "r"); // Open text file (read-only)
    if (!fp) {
        perror("fopen failed");
        return 1;
    }
    char buffer[1024]; // Buffer to store each line
    while (fgets(buffer, sizeof(buffer), fp)) {
        // Data is now in buffer; use as needed
        // e.g., process the string, parse content, etc.
    }
    if (ferror(fp)) { // Check for read errors
        perror("read error");
        fclose(fp);
        return 1;
    }
    fclose(fp); // Close the file
    return 0;
}
libc (the C standard library) is split across a few dozen header files: 29 as of C11, with C23 adding two more. <stdio.h> contains core input/output functions, and <stdlib.h> includes functions for number conversion, memory allocation, process control, etc. Other commonly used headers include <math.h> for math functions and <assert.h> for assertions.
How WebAssembly Reads and Writes Files
In JavaScript, files are typically represented as File objects, which inherit from Blob and are essentially large chunks of binary data. If designing an algorithm from scratch, one might copy the file's bytes (an ArrayBuffer) into the module's memory and pass a pointer to them when calling functions. However, mature libraries often rely on file system operations, using file paths to locate files and file handles to pass them around, which makes it difficult to switch to a pointer-based approach.
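For comparison, here is a minimal sketch of the pointer-based approach. It is an illustration under assumptions, not how any particular library works: `toyMalloc` is a hypothetical bump allocator standing in for an exported `malloc`, and `countNewlines` is a plain JavaScript function standing in for a compiled routine that takes a pointer and a length.

```javascript
// Stand-in for a module's linear memory (one 64 KiB page).
const memory = new WebAssembly.Memory({ initial: 1 });
const heap = new Uint8Array(memory.buffer);

// Hypothetical bump allocator; a real module would export malloc/free.
let nextFree = 16;
function toyMalloc(size) {
  const ptr = nextFree;
  nextFree += size;
  if (nextFree > heap.length) throw new Error("out of memory");
  return ptr;
}

// Stand-in for a compiled function with the signature (pointer, length).
function countNewlines(ptr, len) {
  let count = 0;
  for (let i = ptr; i < ptr + len; i++) {
    if (heap[i] === 10) count++; // 10 is '\n'
  }
  return count;
}

// In a browser these bytes would typically come from `await file.arrayBuffer()`.
const fileBytes = new TextEncoder().encode("a,b\n1,2\n3,4\n");
const ptr = toyMalloc(fileBytes.length);
heap.set(fileBytes, ptr); // copy the file into linear memory
const lines = countNewlines(ptr, fileBytes.length);
```

This works well for code written with it in mind, but a library that calls `fopen(path, ...)` internally has no place to accept such a pointer.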
To address this, Emscripten provides a set of interfaces compatible with standard file operations. Inspired by POSIX, these interfaces closely resemble Linux’s file operations. For applications, the file system is transparent; they only know how to read and write files via libc interfaces, unaware of the underlying data storage mechanisms. During compilation, Emscripten performs a "bait-and-switch," replacing libc interfaces with syscalls and substituting the operating system’s VFS calls with Emscripten VFS calls, enabling WebAssembly file operations.
Emscripten File System
With the file operation interfaces in place, how is the data stored? Emscripten offers a flexible virtual file system architecture:
MEMFS
The Memory File System is Emscripten’s default file system, automatically mounted at the root directory /. Data is stored in memory and is lost when the page is refreshed.
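A typical use of MEMFS is to stage input before calling compiled code and to collect output afterwards. The sketch below assumes the Emscripten `FS` object is available to the caller (how it is exposed depends on build settings); `stageInput` and `collectOutput` are hypothetical helper names, while `FS.mkdir`, `FS.writeFile`, and `FS.readFile` are part of Emscripten's File System API.

```javascript
// `FS` is the Emscripten file system object, passed in by the caller.
function stageInput(FS, dir, name, bytes) {
  FS.mkdir(dir);                 // new directories under / live in MEMFS by default
  const path = `${dir}/${name}`;
  FS.writeFile(path, bytes);     // bytes: a Uint8Array or a string
  return path;                   // hand this path to the compiled code
}

function collectOutput(FS, path) {
  return FS.readFile(path);      // returns a Uint8Array unless an encoding is requested
}
```

For example, `stageInput(FS, "/work", "in.tif", bytes)` would return `"/work/in.tif"`, which can then be passed to a function that expects a file path.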
NODEFS / NODERAWFS
These file systems can only be used in a Node.js environment.
NODEFS proxies the host’s file system into Emscripten’s virtual file system using Node.js’s synchronous file APIs, providing indirect read/write access to the local disk.
NODERAWFS bypasses Emscripten’s proxying and directly uses Node.js’s fs module. The key difference is that NODEFS requires calling FS.mount() to mount the virtual file system and access files via virtual paths, while NODERAWFS uses absolute physical paths directly without mounting.
NODERAWFS is faster than NODEFS, but NODEFS uses file caching to reduce system calls. Use NODERAWFS for reading/writing large files from disk, and NODEFS for handling small, fragmented files.
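Mounting with NODEFS might look like the following sketch. It assumes a module built with NODEFS support, with `FS` and `NODEFS` obtained from that module; `mountHostDirectory` is a hypothetical helper name.

```javascript
// FS and NODEFS are assumed to come from an Emscripten module running in Node.js.
function mountHostDirectory(FS, NODEFS, hostDir, mountPoint) {
  FS.mkdir(mountPoint);                            // the mount point must exist first
  FS.mount(NODEFS, { root: hostDir }, mountPoint); // map hostDir onto the virtual path
  return FS.readdir(mountPoint);                   // list what is now visible there
}
```

After `mountHostDirectory(FS, NODEFS, ".", "/host")`, compiled code would open files as `/host/...`, whereas with NODERAWFS it would use the host path directly.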
IDBFS
IDBFS can only be used in browsers, including Web Workers.
IDBFS stores data in an IndexedDB instance. IndexedDB provides an asynchronous interface, while POSIX file operations are synchronous, so the two cannot be mapped onto each other directly. When using IDBFS, Emscripten first stores data in MEMFS and tracks file changes. The user must call FS.syncfs() to flush changes to IndexedDB. If the user forgets to call FS.syncfs() before closing or refreshing the page, changes in MEMFS will be lost. This can be mitigated by listening to pagehide or beforeunload events to force a sync.
When mounting IDBFS, the autoPersist: true parameter can be set to automatically save changes after each file modification. However, frequent file changes may impact performance.
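The manual sync flow can be sketched as two small helpers. `mountPersistentDir` and `saveToIndexedDB` are hypothetical names; `FS.syncfs(populate, callback)` is the Emscripten call, where `populate = true` loads from IndexedDB into memory and `populate = false` writes memory back to IndexedDB. `FS` and `IDBFS` are assumed to come from the loaded module.

```javascript
// Mount IDBFS and load whatever was persisted in an earlier session.
function mountPersistentDir(FS, IDBFS, dir, onReady) {
  FS.mkdir(dir);
  FS.mount(IDBFS, {}, dir);
  FS.syncfs(true, onReady);   // populate = true: IndexedDB -> memory
}

// Write in-memory changes back to IndexedDB.
function saveToIndexedDB(FS, onDone) {
  FS.syncfs(false, onDone);   // populate = false: memory -> IndexedDB
}

// Possible browser wiring (not executed here):
// window.addEventListener("pagehide", () => {
//   saveToIndexedDB(FS, (err) => { if (err) console.error(err); });
// });
```

Because the write to IndexedDB is asynchronous, a sync started while the page is unloading may not finish, so saving at natural checkpoints (for example after each completed operation) is the safer habit.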
WORKERFS
WORKERFS can only be used within Workers.
This file system provides read-only access to File and Blob objects inside a Worker without copying the entire file data into memory, making it suitable for handling large files.
PROXYFS
PROXYFS enables file sharing between multiple WebAssembly modules.
// Module 2 can use the path "/fs1" to access and modify Module 1's filesystem
module2.FS.mkdir("/fs1");
module2.FS.mount(module2.PROXYFS, {
    root: "/",
    fs: module1.FS
}, "/fs1");
Virtual File System Analysis
The core data structure of Emscripten’s file system is FSNode, which mimics the inode structure in Linux file systems. The basic structure is:
class FSNode {
    node_ops = {};   // Node operations (e.g., lookup, create)
    stream_ops = {}; // Stream operations (e.g., read, write, seek)
    mounted = null;  // Set when another file system is mounted on this node
    constructor(parent, name, mode, rdev) {
        if (!parent) {
            parent = this; // Root node sets parent to itself
        }
        this.parent = parent; // Parent node (directory node)
        this.name = name; // Node name (file or directory name)
        this.mode = mode; // File type and permissions
        this.rdev = rdev; // Major/minor device numbers (0 for non-device files)
        this.id = FS.nextInode++; // Globally unique node ID
        this.contents = null; // File content (ArrayBuffer) or list of directory entries
        this.size = 0; // File size in bytes
        this.mount = parent.mount; // Mount this node belongs to, inherited from the parent
        this.atime = this.mtime = this.ctime = Date.now(); // Access, modification, and status change times
    }
}
During file system initialization, FS.mount(MEMFS, {}, '/') is called to mount the memory file system at the root directory. Other file systems can be mounted as needed within MEMFS, e.g.,
FS.mount(WORKERFS, {
    files: files // Array of File objects or FileList
}, '/worker'); // Mount WORKERFS at /worker
Other file operations, such as mkdir, rmdir, chmod, and symlink, are implemented in the FS object and can be called directly. The file system is hierarchical; unless it is a mount point, child nodes inherit the file system type from their parent:
mkdir() -> mknod() -> lookupPath() -> new FSNode()
Application calls to open, read, write, and close are ultimately directed to FS.open, FS.read, FS.write, and FS.close.
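The same functions can be called from JavaScript. As a rough sketch, assuming `FS` is the Emscripten file system object and using a hypothetical helper name `readFirstBytes`, reading the start of a file looks like this:

```javascript
// Read up to `count` bytes from the start of `path` using the stream API.
function readFirstBytes(FS, path, count) {
  const stream = FS.open(path, "r");
  const buffer = new Uint8Array(count);
  // Arguments: stream, destination buffer, offset into the buffer,
  // number of bytes to read, position in the file.
  const bytesRead = FS.read(stream, buffer, 0, count, 0);
  FS.close(stream);
  return buffer.subarray(0, bytesRead);
}
```

This mirrors the open, read, close sequence of the C example, with the stream object playing the role of the file handle.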
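The mode field records the file type and permissions following the POSIX convention and is stored as an integer. The file type occupies the bits selected by the mask 0o170000 (S_IFMT), and the low 12 bits hold the permissions: nine read/write/execute bits for owner, group, and others, plus the setuid, setgid, and sticky bits.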
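To make the layout concrete, here is a small decoder for a POSIX-style mode value. The constants are the standard values from `<sys/stat.h>` on Linux (`S_IFMT = 0o170000` selects the type bits, and the low 12 bits hold permissions); `describeMode` is a hypothetical helper written for this example.

```javascript
const S_IFMT = 0o170000;  // mask selecting the file type bits
const S_IFREG = 0o100000; // regular file
const S_IFDIR = 0o040000; // directory
const S_IFCHR = 0o020000; // character device
const S_IFLNK = 0o120000; // symbolic link

function describeMode(mode) {
  const typeBits = mode & S_IFMT;
  const type =
    typeBits === S_IFDIR ? "directory" :
    typeBits === S_IFREG ? "regular file" :
    typeBits === S_IFCHR ? "character device" :
    typeBits === S_IFLNK ? "symlink" : "other";

  // Low 12 bits: setuid/setgid/sticky plus rwx for owner, group, others.
  const perms = mode & 0o7777;
  const letters = "rwxrwxrwx";
  const rwx = [8, 7, 6, 5, 4, 3, 2, 1, 0]
    .map((bit) => ((perms >> bit) & 1 ? letters[8 - bit] : "-"))
    .join("");

  return { type, octal: perms.toString(8).padStart(4, "0"), rwx };
}

const fileInfo = describeMode(0o100644);
const dirInfo = describeMode(0o040755);
```

Here `fileInfo` describes a regular file with permissions `0644` (`rw-r--r--`), and `dirInfo` a directory with `0755` (`rwxr-xr-x`).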
Hardware Devices
Everything is a file. Like other Unix-like operating systems, Emscripten’s virtual file system can register hardware devices. For example, to simulate a serial communication device in the browser:
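// Generate a device number (major 1, minor 8)
const dev = FS.makedev(1, 8);
// Register device operations
FS.registerDevice(dev, {
    read(stream, buffer, offset, length) {
        // TODO: copy incoming bytes into buffer and return the number of bytes read
    },
    write(stream, buffer, offset, length) {
        // TODO: consume bytes from buffer and return the number of bytes written
    },
    ioctl() {
        // TODO: simulate getting the baud rate
    }
});
// Create the device node. /dev already exists in Emscripten's default layout,
// and the node path itself must not be created as a directory beforehand.
FS.mkdev('/dev/ttyUSB0', dev);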
Now, the serial port /dev/ttyUSB0 can be read from and written to in C.
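As an illustration of what the TODO bodies might contain, the sketch below implements read and write against an in-memory queue. Everything here is hypothetical (`rxQueue`, `txLog`, `serialOps`); a real simulation would feed the queue from whatever transport it is emulating. The handlers follow the convention of returning the number of bytes transferred.

```javascript
// Hypothetical in-memory "serial port": rxQueue holds bytes waiting to be read,
// txLog collects bytes written by the program.
const rxQueue = [0x4f, 0x4b, 0x0a]; // "OK\n"
const txLog = [];

const serialOps = {
  // Copy up to `length` queued bytes into buffer[offset...]; return bytes read.
  read(stream, buffer, offset, length, pos) {
    let n = 0;
    while (n < length && rxQueue.length > 0) {
      buffer[offset + n] = rxQueue.shift();
      n++;
    }
    return n;
  },
  // Consume `length` bytes from buffer[offset...]; return bytes written.
  write(stream, buffer, offset, length, pos) {
    for (let i = 0; i < length; i++) {
      txLog.push(buffer[offset + i]);
    }
    return length;
  },
};

// With Emscripten loaded, this object would be passed to FS.registerDevice(dev, serialOps).
// Here the handlers are exercised directly:
const scratch = new Uint8Array(8);
const got = serialOps.read(null, scratch, 0, scratch.length, 0);
serialOps.write(null, new Uint8Array([0x41, 0x54]), 0, 2, 0); // "AT"
```

After this runs, `got` is 3 and `txLog` holds the two bytes that were written.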
Future Development
Currently, Emscripten’s virtual file system is implemented in JavaScript, which has a significant drawback: it does not support multithreading. Emscripten is developing a new file system, WASMFS, though it is not yet complete. In the future, WASMFS will support multithreading and is expected to deliver significant performance improvements.