From WebAssembly.org:

WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable target for compilation of high-level languages like C/C++/Rust, enabling deployment on the web for client and server applications.

WebAssembly is incredibly simple. I imagine that if you're writing the WebAssembly VM for a browser or Node.js, this can be a "good" simple. If you're a JavaScript programmer trying to incorporate WebAssembly into your application, this is more like a "frustrating" simple.

WebAssembly requires some glue code at the boundary layer between fetching a compiled wasm file and application JavaScript. One thing I found lacking in many guides was a focus on that boundary layer, and that left me hesitant to try to use any of the higher level toolchains that seem to magically generate interop code that I didn't understand.

How do you get wasm onto the web?

Ultimately you need to fetch a wasm file, and there are a few ways to generate one:

  • WABT, or WebAssembly Binary Toolkit, includes the wat2wasm tool that converts WebAssembly's text representation into binary. It also includes wasm2wat to reverse the process.
  • Rust's wasm32-unknown-unknown target will compile and link a Rust library into a wasm file. It will not generate any JavaScript glue.
  • Presumably any language that is ultimately compiled with LLVM can target WebAssembly, same as Rust.
  • wasm-bindgen, js-sys, and stdweb are all Rust libraries that aim to generate glue code.
  • If those above generate glue, Emscripten glues together a complete popsicle-stick house. For instance, if your desktop app needs filesystem access, it will generate one for you on top of IndexedDB.

The examples in this article will be either hand written, or compiled from Rust to the wasm32-unknown-unknown target.

I'll just say it: Rust is by far the easiest WebAssembly toolkit to get up and running, whether or not you use wasm-bindgen and its ilk. It's a literal 15-minute install from scratch, compared with Emscripten's requirement of 11GB of disk space and 45 minutes of your CPU at 100%. Look, it was 104℉ a week ago and I do not need that in my living space.

What WebAssembly can't do, and why glue code is needed

WebAssembly can't handle errors. When something like a divide by zero or a failed allocation occurs, execution halts (WebAssembly calls this a trap) and an exception is raised on the JavaScript side.

WebAssembly can't represent any values other than the integer types i32 and i64, and the floating point types f32 and f64. This also limits what types of values can be passed into exported WebAssembly functions, as well as their return values. The vast majority of glue code deals with sending Arrays and Strings across FFI boundaries.

WebAssembly can't access outside of the memory space it's been explicitly granted.
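To make the first limitation concrete, here is a minimal sketch of a trap surfacing as a JavaScript exception. The byte array is a hand-assembled module of my own (not one of this article's demo files) that exports a single integer division function, and `safeDiv` is a hypothetical wrapper name. The synchronous `WebAssembly.Module` and `WebAssembly.Instance` constructors are used only to keep the sketch self-contained.

```javascript
// Hand-assembled bytes for this illustrative module:
//   (module (func (export "div") (param i32 i32) (result i32)
//     local.get 0  local.get 1  i32.div_s))
const divBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // "\0asm", version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type 0: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // one function of type 0
  0x07, 0x07, 0x01, 0x03, 0x64, 0x69, 0x76, 0x00, 0x00, // export "div" = func 0
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6d, 0x0b, // function body
]);
const { div } = new WebAssembly.Instance(new WebAssembly.Module(divBytes)).exports;

// Hypothetical glue: turn the trap into a null result.
function safeDiv(a, b) {
  try {
    return div(a, b);
  } catch (e) {
    // The wasm code had no chance to handle this itself.
    if (e instanceof WebAssembly.RuntimeError) return null;
    throw e;
  }
}

safeDiv(7, 2); // 3
safeDiv(1, 0); // null, the divide by zero trapped
```

Whether to swallow the error like this or let it propagate is a design choice for the glue layer; the point is only that the decision has to be made in JavaScript.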

"Hello, type coercion!"

The "Hello, world!" program of WebAssembly is returning the same number that you call a function with.

(module
    (func (export "passthroughI32") (param $value i32) (result i32)
      get_local $value)
    (func (export "passthroughF32") (param $value f32) (result f32)
      get_local $value))

WebAssembly.instantiateStreaming(fetch("/wasm-demo/hello_wasm.wasm"), {})
.then(wasm => { window.hello_wasm = wasm.instance.exports; });

From here on out, every JavaScript snippet in this article is also live on the page. If you open your dev tools, you can (and should!) interact with window.hello_wasm and all the examples that follow.

If we invoke the exported passthroughI32, we get just what we expect.

hello_wasm.passthroughI32(5) // => 5

Yup, checks out. The function is declared to take a single i32 param, and return an i32. The value we pass in will get coerced to that type. For instance:

hello_wasm.passthroughI32(5.0) // => 5
hello_wasm.passthroughI32(5.9) // => 5

And since an i32 can only hold integers, anything that isn't a finite number gets passed into the function as 0.

hello_wasm.passthroughI32(NaN) // => 0
hello_wasm.passthroughI32(-Infinity) // => 0
hello_wasm.passthroughI32("Hello, world!") // => 0
hello_wasm.passthroughI32(window.navigator) // => 0

When coercing to a f32 or f64, Infinity and -Infinity are valid values. Non-numbers are passed into functions as NaN.

hello_wasm.passthroughF32("bees!") // NaN
hello_wasm.passthroughF32(1/0) // Infinity
hello_wasm.passthroughF32(-1/0) // -Infinity
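If you want to predict these results without a wasm file at hand, the conversions can be approximated in plain JavaScript. This is a sketch under the assumption that i32 arguments go through the same ToInt32 conversion as the bitwise operators, and that f32 arguments are rounded the way `Math.fround` rounds. The names `asI32` and `asF32` are mine.

```javascript
// JavaScript-side stand-ins for the argument conversions (hypothetical helper names).
const asI32 = (v) => v | 0;          // ToInt32: truncate, wrap modulo 2^32
const asF32 = (v) => Math.fround(v); // ToNumber, then round to the nearest f32

asI32(5.9);             // 5
asI32(-5.9);            // -5, truncation goes toward zero
asI32("Hello, world!"); // 0
asI32(2 ** 32 + 5);     // 5, large values wrap around
asI32(2 ** 31);         // -2147483648
asF32("bees!");         // NaN
asF32(-1 / 0);          // -Infinity
asF32(0.1) === 0.1;     // false, 0.1 is not exactly representable as an f32
```

The wrapping cases are the ones most likely to bite: a number that is a perfectly good JavaScript integer can arrive in WebAssembly as something else entirely.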

Importing functions

Compilers like Rust's wasm32-unknown-unknown are opinionated about how to import functions, and this confused me because it differed from MDN's sample code.

Take this function that intentionally wastes time (by incrementing a number by 1 repeatedly), and imports some functionality from JavaScript in order to report just how much time it's wasting.

(func $logTime (import "import" "logTime") (param f64))
(func $getTimestamp (import "import" "getTimestamp") (result f64))
;;                                    ^^^^^^^^^^^^ imported function
;;                           ^^^^^^ module name

This can be read as "from the 'import' module, use 'logTime' and 'getTimestamp'". In this case, the string "import" directly corresponds to a property in the import object (below, where we can see the program get instantiated), and the strings "logTime" and "getTimestamp" correspond to properties on that object.

const importedFuncImportObject = {
    import: {
        getTimestamp: Date.now,
        logTime: (value) => console.log(`Elapsed time: ${value} ms`),
    }
};
WebAssembly.instantiateStreaming(
    fetch("/wasm-demo/imported_func.wasm"),
    importedFuncImportObject,
).then(wasm => { window.imported_func = wasm.instance.exports; });

To clear up things I was confused about:

  • "module" here refers only to the property names of the import object. It has nothing to do with JS modules, or the fact that WebAssembly programs are called modules, or the (module) expression that wraps all WebAssembly programs.
  • The name of the module is completely arbitrary. Here we chose "import". Some examples on MDN use "js". If you're compiling through LLVM (as with the Rust toolchain), it will choose "env" for you.
  • You can use more than one module.
  • You can't import a function without putting it in a module.

imported_func.doWork(10); // 10
// log: "Elapsed time: 0 ms"
imported_func.doWork(10000000); // 10000000
// log: "Elapsed time: 5 ms"

Not only does it return the value passed in, but it console.logs the elapsed time. So WebAssembly can use imported functions not only for their return values, but also for side effects.
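The relationship between import "modules" and the import object can also be inspected at runtime. The sketch below uses a hand-assembled module of my own (it imports `env.log` and exports `run`; neither name comes from this article's demos) to show `WebAssembly.Module.imports` listing what a module expects, and what happens when a function is missing from the import object.

```javascript
// Hand-assembled bytes for this hypothetical module:
//   (module
//     (func $log (import "env" "log") (param i32))
//     (func (export "run") (param i32) local.get 0 call $log))
const importBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm", version 1
  0x01, 0x05, 0x01, 0x60, 0x01, 0x7f, 0x00,       // type 0: (i32) -> ()
  0x02, 0x0b, 0x01, 0x03, 0x65, 0x6e, 0x76,       // import from module "env"...
  0x03, 0x6c, 0x6f, 0x67, 0x00, 0x00,             // ...the function "log" of type 0
  0x03, 0x02, 0x01, 0x00,                         // one local function of type 0
  0x07, 0x07, 0x01, 0x03, 0x72, 0x75, 0x6e, 0x00, 0x01, // export "run" = func 1
  0x0a, 0x08, 0x01, 0x06, 0x00, 0x20, 0x00, 0x10, 0x00, 0x0b, // body: call $log
]);
const importModule = new WebAssembly.Module(importBytes);

// Ask the compiled module what it wants to import.
WebAssembly.Module.imports(importModule);
// one entry: module "env", name "log", kind "function"

// Satisfy the import with a side-effecting JavaScript function.
const seen = [];
const importInstance = new WebAssembly.Instance(importModule, {
  env: { log: (value) => seen.push(value) },
});
importInstance.exports.run(42); // seen is now [42]

// Leave the function out and instantiation fails.
let linkFailure;
try {
  new WebAssembly.Instance(importModule, { env: {} });
} catch (e) {
  linkFailure = e;
}
linkFailure instanceof WebAssembly.LinkError; // true
```

Listing imports this way is a handy first step when a toolchain picked the module names for you.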

Using WebAssembly.Memory

In order to work with data types other than numbers, we'll need to store that data in memory, and access it indirectly through pointers and offsets.

Importing memory

Here is a program that exports a function double that takes a pointer and a length and doubles all of the i32 elements within that slice. The only import it requires is an instance of WebAssembly.Memory, which is imported exactly like a function is.

It's an error if:

  • You import something that isn't Memory in its place
  • Your imported Memory doesn't have the required number of memory pages (the number 1 in the expression (memory (import "import" "memory") 1) )
  • You import a second Memory

const importedMemory = new WebAssembly.Memory({ initial: 1 });
const importedMemoryArray = new Uint32Array(importedMemory.buffer);
const importedMemoryImportObject = {
    import: {
        memory: importedMemory
    }
};
WebAssembly.instantiateStreaming(
    fetch("/wasm-demo/imported_memory.wasm"),
    importedMemoryImportObject,
).then(wasm => { window.imported_memory = wasm.instance.exports; });
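The first two error cases from the list above can be reproduced without the demo file. This is a sketch using a hand-assembled module of my own that does nothing except import one page of memory; `tryImport` is a hypothetical helper that reports how instantiation went.

```javascript
// Hand-assembled bytes for: (module (memory (import "import" "memory") 1))
const memBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm", version 1
  0x02, 0x12, 0x01,                               // import section, one entry
  0x06, 0x69, 0x6d, 0x70, 0x6f, 0x72, 0x74,       // module name "import"
  0x06, 0x6d, 0x65, 0x6d, 0x6f, 0x72, 0x79,       // field name "memory"
  0x02, 0x00, 0x01,                               // kind: memory, no maximum, 1 page minimum
]);
const memModule = new WebAssembly.Module(memBytes);

// Hypothetical helper: returns "ok" or the kind of failure.
function tryImport(memory) {
  try {
    new WebAssembly.Instance(memModule, { import: { memory } });
    return "ok";
  } catch (e) {
    return e instanceof WebAssembly.LinkError ? "LinkError" : String(e);
  }
}

tryImport(new WebAssembly.Memory({ initial: 1 })); // "ok"
tryImport(new WebAssembly.Memory({ initial: 0 })); // "LinkError", too few pages
tryImport(new Uint8Array(65536));                  // "LinkError", not a Memory
```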

The ArrayBuffer starts zeroed, so let's fill a slice of it with 32-bit ints, then call imported_memory.double with the slice's address and its length.

const ptr = 5;
const len = 10;
for (let n = 0; n < len; n++) {
    importedMemoryArray[ptr + n] = n;
}
importedMemoryArray.slice(ptr, ptr+len);
// Uint32Array(10) [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
imported_memory.double(ptr, len);
importedMemoryArray.slice(ptr, ptr+len);
// Uint32Array(10) [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

Note that our indices count the positions of 32-bit numbers. If you look back at the program you can see that it's the responsibility of the WebAssembly module to convert those numbers to byte indices, and advance the loop cursor by 4 instead of 1 every iteration.
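To see the difference between element indices and byte offsets from the JavaScript side, here is a small sketch that writes through a Uint32Array and reads the same value back by byte offset with a DataView. The variable names are mine, and it assumes a little-endian host, which matches the byte order WebAssembly memory uses.

```javascript
const demoMemory = new WebAssembly.Memory({ initial: 1 });
const words = new Uint32Array(demoMemory.buffer);
const byteView = new DataView(demoMemory.buffer);

// Write using an element index, the way the JavaScript above does.
const elementIndex = 5;
words[elementIndex] = 0xdeadbeef;

// Read using a byte offset, the way WebAssembly load instructions address memory.
const byteOffset = elementIndex * Uint32Array.BYTES_PER_ELEMENT; // 20
byteView.getUint32(byteOffset, true) === 0xdeadbeef; // true (true = little-endian)
```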

Exporting, growing, and expiring memory (and running out of it)

In the previous example, we created memory in JavaScript and imported it into a WebAssembly module, but a WebAssembly module can also create its own memory. It can then export that memory, but it doesn't have to.

(module
  (memory $memory 1 3)
  (export "memory" (memory $memory))
  (func (export "grow") (result i32)
    (grow_memory (i32.const 1)))
  (func (export "store") (param $val f64)
    (f64.store (i32.const 0) (get_local $val)))
)

This program creates memory with 1 initial page and a maximum of 3 pages. This same memory could have been created in JavaScript with the code new WebAssembly.Memory({ initial: 1, maximum: 3 }).

There's one quirk about growing memory: when it grows, the existing contents are carried over into a larger space, and the memory object gets a completely new ArrayBuffer while the old one is detached. So any references to the old buffer (and any TypedArrays that were created with it) become invalidated and appear empty: their length is 0, and indexing into one always returns undefined, because every index is now outside of the array.

WebAssembly.instantiateStreaming(fetch("/wasm-demo/grow_memory.wasm"), {})
.then(wasm => { window.grow_memory = wasm.instance.exports; });

// What size is the memory initially? 65536 bytes
grow_memory.memory.buffer.byteLength;

// Store a number at address 0 and confirm it's there
grow_memory.store(0.75);
array = new Float64Array(grow_memory.memory.buffer);
array[0]; // 0.75

// Grow memory with the exported grow function;
grow_memory.grow();

// What size is the memory now? 131072 bytes
grow_memory.memory.buffer.byteLength;

// But when memory is resized, its buffer is invalidated and a new
// one is created. So `array` points to nothing now.
array.buffer.byteLength === 0; // true

// We need to recreate it after every grow
array = new Float64Array(grow_memory.memory.buffer);

WebAssembly.Memory has a grow method we can access from JavaScript as well.

// Unlike the exported function, the JavaScript API takes
// the number of pages to grow by.
grow_memory.memory.grow(1);

// What is the memory size after growing twice? 196608
grow_memory.memory.buffer.byteLength;

And there is one more thing: if memory was initialized with a maximum number of pages, it can't grow beyond that. Our memory is now at 3 pages, the maximum we allowed it to go. Let's grow it once more, for science!

// First grow using the JavaScript API
try {
  grow_memory.memory.grow(1);
} catch(e) {
  // RangeError: WebAssembly.Memory.grow(): maximum memory size exceeded
}

// Now try growing from the WebAssembly side
grow_memory.grow(); // -1
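
The JavaScript half of this sequence can be reproduced without a compiled module, since the memory could just as well have been created in JavaScript. A minimal sketch, with variable names of my own choosing:

```javascript
const jsMemory = new WebAssembly.Memory({ initial: 1, maximum: 3 });
const oldBuffer = jsMemory.buffer;
const staleView = new Float64Array(oldBuffer);
staleView[0] = 0.75;

const previousPages = jsMemory.grow(1); // 1: grow returns the size before growing, in pages

oldBuffer.byteLength;                 // 0, the old buffer is detached
staleView.length;                     // 0
staleView[0];                         // undefined
jsMemory.buffer.byteLength;           // 131072
new Float64Array(jsMemory.buffer)[0]; // 0.75, the contents carried over

jsMemory.grow(1); // now at 3 pages, the maximum

let growError;
try {
  jsMemory.grow(1);
} catch (e) {
  growError = e;
}
growError instanceof RangeError; // true
```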

Keeping memory slices valid

Our WebAssembly application can grow its memory on its own, without notifying its JavaScript environment, so our js-to-wasm infrastructure needs an easy way to keep our TypedArrays valid.

function alwaysValid(memory) {
  let buffer = memory.buffer;
  return {
    get buffer() {
      if (!buffer.byteLength) {
        buffer = memory.buffer;
      }
      return buffer;
    },
    get Uint32() {
      return new Uint32Array(this.buffer);
    },
    get Float32() {
      return new Float32Array(this.buffer);
    },
  }
}

Try it on grow_memory.
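
If you don't have grow_memory at hand, the same idea can be tried on a memory created in JavaScript. The sketch below is a simpler variant rather than the helper above: since the buffer property of a WebAssembly.Memory always returns the current buffer, it just reads that property on every access. `liveViews` is a name I made up.

```javascript
// Hypothetical variant: build a fresh view over the current buffer on every access.
function liveViews(memory) {
  return {
    get Uint32() {
      return new Uint32Array(memory.buffer);
    },
    get Float64() {
      return new Float64Array(memory.buffer);
    },
  };
}

const growable = new WebAssembly.Memory({ initial: 1, maximum: 2 });
const views = liveViews(growable);

views.Uint32[0] = 7;
growable.grow(1);    // detaches the buffer the previous view was built on
views.Uint32[0];     // 7, read through a view over the new buffer
views.Uint32.length; // 32768, two pages of 32-bit elements
```

The trade-off is a new TypedArray object per access, so hold on to a view for the duration of a loop rather than calling the getter on every iteration.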

Put all of our state logic in WebAssembly

If we're going to compile a non-toy program from some language to WebAssembly, it's likely that:

  • we'll want to keep the memory allocator we're probably already using
  • we'll want to keep all of our state living in WebAssembly memory

For our allocator, turns out that letting our toolchain bundle an allocator into our app is completely fine. I'll admit I didn't realize they were so small. In Rust, wee_alloc seems to be the slimmest allocator you can use, though you can do without it too.

The general pattern we'll probably want to use:

  • call an initial create or init WebAssembly function that allocates app state
  • that function returns the address of the state in memory; we'll save it as our state handle
  • when we call functions that need to read or modify state, we'll pass that handle as a param so that the app can find the state in memory
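
To make "a handle is just an address" concrete, here is a JavaScript stand-in for the exports of such a module. Everything in it is an assumption for illustration: the two-field state layout, the trivial bump allocator, and the name `makeFakeExports`. A real program compiled from Rust would do this work inside WebAssembly.

```javascript
// Hypothetical stand-in. State lives in linear memory as two i32 fields:
// [value, modificationCount].
function makeFakeExports() {
  const memory = new WebAssembly.Memory({ initial: 1 });
  const i32 = () => new Int32Array(memory.buffer);
  let nextFree = 8; // bump allocator; the first 8 bytes are left unused
  return {
    memory,
    create(value) {
      const handle = nextFree;
      nextFree += 8;
      const m = i32();
      m[handle / 4] = value;
      m[handle / 4 + 1] = 0;
      return handle; // the handle is nothing more than a byte address
    },
    set(handle, value) {
      const m = i32();
      m[handle / 4] = value;
      return ++m[handle / 4 + 1];
    },
    increment(handle, by) {
      const m = i32();
      m[handle / 4] += by;
      return ++m[handle / 4 + 1];
    },
  };
}

const fake = makeFakeExports();
const handleA = fake.create(10);  // 8
const handleB = fake.create(500); // 16
fake.set(handleA, 20);            // 1
fake.increment(handleA, 1);       // 2

// Two independent states, found purely by address.
new Int32Array(fake.memory.buffer, handleA, 2); // [21, 2]
new Int32Array(fake.memory.buffer, handleB, 2); // [500, 0]
```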

Here's a program compiled from Rust which has a small State object, and exported functions to manipulate it.

WebAssembly.instantiateStreaming(fetch("/wasm-demo/allocating_memory.wasm"), {})
.then(wasm => { window.allocating_memory = wasm.instance.exports; });
handle = allocating_memory.create(10); // 1179640 (probably)

/* The `set` and `increment` functions modify the state, and return the number of times the state has been modified. */
allocating_memory.set(handle, 20) // 1
allocating_memory.set(handle, 100) // 2
allocating_memory.increment(handle, 1) // 3
allocating_memory.increment(handle, 1) // 4
/* `unwrap` returns the final modified value */
allocating_memory.unwrap(handle) // 102

The initial address 1179640 is interesting because, for a pointer, it's pretty small. But then WebAssembly memory is measured in 64KB pages. It's also, curiously, not zero despite being the first thing we allocated (there are no null pointers in WebAssembly).

Part of that can be explained by having an allocator in our program. It has its own state, and that state is kept in memory. But it's not taking up 1,179,640 bytes of memory. Let's investigate!

// After we refresh the page or just re-instantiate the module
initialSize = allocating_memory.memory.buffer.byteLength; // 1114112
pageSize = 65536
initialSize / pageSize // 17 pages

handle = allocating_memory.create(10); // 1179640
newSize = allocating_memory.memory.buffer.byteLength // 1179648
newSize - initialSize // 65536

We start with 17 pages initialized and none of that is being used for our app: when we allocated 8 bytes of data, we saw memory grow by a page to make room for it. As it turns out, these phantom pages are due to something called the shadow stack. Basically, because the WebAssembly VM can't store complex structures on its own stack, the compiled program keeps them in its memory instead, even if your source program never calls for allocation. 17 is apparently just the number of pages the toolchain picks as a good amount of memory for this.

Okay, one last thing that is probably obvious to people who've written in C-like languages.

allocating_memory.unwrap(handle) // 10
/* Um, wasn't `handle` already deallocated when we last called `unwrap`? */
allocating_memory.unwrap(handle) // 1171456 (garbage data!)
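
One way to guard against this on the JavaScript side is to keep the raw handle private and forget it once the state has been unwrapped. This is a sketch, assuming an exports object with the same create/set/increment/unwrap shape as allocating_memory. `createState` is a hypothetical name, and the stub exists only so the sketch runs on its own.

```javascript
// Hypothetical wrapper that refuses to use a handle after `unwrap`.
function createState(wasmExports, initial) {
  let handle = wasmExports.create(initial);
  const live = () => {
    if (handle === null) throw new Error("state was already unwrapped");
    return handle;
  };
  return {
    set: (value) => wasmExports.set(live(), value),
    increment: (by) => wasmExports.increment(live(), by),
    unwrap() {
      const value = wasmExports.unwrap(live());
      handle = null; // the memory behind the handle is gone; forget the address
      return value;
    },
  };
}

// Stub standing in for real WebAssembly exports (illustration only).
let stored;
const stubExports = {
  create: (value) => { stored = value; return 8; },
  set: (handle, value) => { stored = value; return 1; },
  increment: (handle, by) => { stored += by; return 1; },
  unwrap: (handle) => stored,
};

const state = createState(stubExports, 10);
state.increment(5);
state.unwrap(); // 15
// Calling state.unwrap() again now throws instead of reading freed memory.
```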

What could go wrong with strings? ⛈

The answer to that rhetorical question is obviously "Unicode" because not everyone limits themselves to the Latin-1 charset, Brad.

It's easy to convert a string to and from an array of UTF-8 bytes: just use new TextDecoder('utf-8') and new TextEncoder()! (Per the spec, TextEncoder only outputs UTF-8, hence the lack of an encoding argument.)
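
A quick round trip shows why the byte count, not the string's length, is what matters at the boundary. The sample text is written with escapes so the exact code points are unambiguous; the variable names are mine.

```javascript
// "☎️ Hello, world! 🌐" spelled out as code points
const sample = "\u260E\uFE0F Hello, world! \u{1F310}";
const encoded = new TextEncoder().encode(sample);

sample.length;  // 19 UTF-16 code units
encoded.length; // 25 UTF-8 bytes

new TextDecoder("utf-8").decode(encoded) === sample; // true
```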

Passing a string to a WebAssembly function

Aside from the encoding step, it's exactly like giving an array to a WebAssembly function because, like an array, you know the position and size of what you're passing in.

WebAssembly.instantiateStreaming(fetch("/wasm-demo/strings.wasm"), {})
.then(wasm => { window.strings = wasm.instance.exports; });

text = "“It's 2018 and we don't all write in ASCII anymore,” she said.";
bytes = (new TextEncoder()).encode(text);

stringsMemoryBytes = new Uint8Array(strings.memory.buffer);
stringsMemoryBytes.set(bytes, 256); // copy one TypedArray into another
strings.giveString(256, bytes.length);
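
The copy step can be wrapped in a small helper that also checks the string actually fits. This is a sketch: `writeString` is a hypothetical name, and it assumes the caller already knows that the region starting at `ptr` is free to overwrite.

```javascript
// Hypothetical helper: copy `text` into `memory` at byte offset `ptr`
// and return the number of bytes written.
function writeString(memory, ptr, text) {
  const bytes = new TextEncoder().encode(text);
  const target = new Uint8Array(memory.buffer);
  if (ptr + bytes.length > target.length) {
    throw new RangeError(`string of ${bytes.length} bytes does not fit at ${ptr}`);
  }
  target.set(bytes, ptr);
  return bytes.length;
}

const scratch = new WebAssembly.Memory({ initial: 1 });
const written = writeString(scratch, 256, "\u201C2018\u201D"); // 10 bytes for 6 characters
// With the demo module loaded, strings.giveString(256, written) would come next.
```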

Reading a string from a WebAssembly function's return value

A problem arises if a WebAssembly function returns a pointer to a string it created: UTF-8 is variable width so we don't necessarily know how many bytes to decode.

The solution to that is to use NUL-terminated strings like in C-based languages, and scan bytes until a NUL is encountered. Just like implementing a string from scratch in CS101. Never did that in college? Now's a good time to try it!

function decodeString(memory, ptr) {
  const array = new Uint8Array(memory.buffer, ptr);
  function* readUntilNul() {
    let offset = 0;
    while (array[offset] !== 0) {
      if (array[offset] === undefined) {
        throw new Error(`String ${ptr} continued into undefined memory at ${ptr+offset}`);
      }
      yield array[offset];
      offset++;
    }
  }

  const bytes = new Uint8Array(readUntilNul());
  return (new TextDecoder('utf-8')).decode(bytes);
}

Aside: A nifty trick I didn't know until I saw it done on Hello, Rust! is that TypedArray constructors can accept an iterable that yields bytes. Like, say, the generator function readUntilNul above!
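
For comparison, here is an alternative sketch that finds the terminator with TypedArray's indexOf and decodes a subarray, which avoids copying the bytes one at a time. `decodeCString` is a hypothetical name, and the memory here is created in JavaScript purely for the demonstration.

```javascript
// Hypothetical alternative to the generator-based decoder.
function decodeCString(memory, ptr) {
  const bytes = new Uint8Array(memory.buffer);
  const end = bytes.indexOf(0, ptr); // first NUL at or after ptr
  if (end === -1) {
    throw new Error(`String at ${ptr} has no NUL terminator`);
  }
  return new TextDecoder("utf-8").decode(bytes.subarray(ptr, end));
}

const cMemory = new WebAssembly.Memory({ initial: 1 });
// Fresh memory is zeroed, so the byte after the text acts as the terminator.
new Uint8Array(cMemory.buffer).set(new TextEncoder().encode("h\u00e9llo"), 1024);

decodeCString(cMemory, 1024); // "héllo"
```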

Let's try it out.

strings.getString(); // 1024

// Peek at the bytes starting at that address
new Uint8Array(strings.memory.buffer, 1024);
// Uint8Array(64512) [226, 152, 142, 239, 184, 143, 32, 72, 101, 108, 108, 111, 44, 32, 119, 111, 114, 108, 100, 33, 32, 240, 159, 140, 144, 0, 0, 0, 0, 0...

decodeString(strings.memory, 1024) // "☎️ Hello, world! 🌐"

I'll be honest and say that the easiest way to get Unicode strings out of WebAssembly memory is to just find a reason to not need to do it at all. At the time of writing, Edge, despite supporting WebAssembly, doesn't implement TextDecoder at all, and neither Edge nor Safari implements TextEncoder.

Wrapping it all up

Okay, finally, a real app that's not a contrived example: a game of hangman. It's a more practical example of a program that needs to pass strings as arguments and return values, keep its memory buffers valid, and keep track of app state in WebAssembly memory.

hangman.js is the main app scaffolding that holds all the glue code.

The app doesn't need to export its memory at all, but since it does we can peek inside. After the game state is allocated, we can take a look at the bytes in memory.

  • The Hangman array is a look into the game struct, which consists of three vectors (themselves a pointer, capacity, and length) and a count of the number of wrong guesses.
  • The Phrase array is UTF-8 encoded text.
  • The Mask array is a list of booleans representing which letters have been guessed.
  • The Guessed array is a list of characters, both right and wrong, that the player has guessed. I deliberately made this a bit too small, so you can see it being reallocated while the game progresses.

More resources

The MDN WebAssembly Concepts page is pretty comprehensive.

I am delighted that the "TodoApp" of the WebAssembly world is turning out to be Conway's Game of Life. Here are two fun guides, Writing WebAssembly By Hand by Colin Eberhardt, and the Rust and WebAssembly book, which take different approaches to running the game of life in the browser.

Hello, Rust! hosts a number of bite sized examples of Rust compiled to WebAssembly.