Speeding up with ArrayBuffers and Typed Arrays

This post is about how ArrayBuffers and their views allow you to outperform standard Arrays.

Whenever I implement an algorithm in JavaScript and want to improve the execution time, I always start by trying to find a way to convert my Arrays to Typed Arrays. It's an easy, low-effort way to consistently improve performance in numeric-heavy workloads. This blog post will investigate exactly why you get this free performance boost, where you can use it, and what its limitations are.

To understand Typed Arrays, we first need to understand ArrayBuffers.

ArrayBuffers

ArrayBuffers are JavaScript's byte arrays. An ArrayBuffer represents a contiguous chunk of raw binary data, initialised to 0, with no assumptions or knowledge about whether it contains 8-bit or 64-bit numbers. The buffer may either be fixed-length, or resizable up to a pre-defined maxByteLength. Both kinds only allocate memory for the length provided, but resizable ArrayBuffers may later reallocate their memory to a larger block elsewhere - except in Firefox, which allocates maxByteLength from the start.

To initialise an ArrayBuffer, you simply construct it with a given byte length, and optionally provide a maxByteLength option to make it resizable. Once initialised, you cannot modify or read from it directly. You must use a View – more on that later.

The only permitted operations on an ArrayBuffer itself are transferring it to a new buffer (detaching the old one along the way), slicing part of it into a copy in a new ArrayBuffer, and a couple of resize-related functions.
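A minimal sketch of those operations (the transfer call is guarded, since it only exists in newer engines):

```javascript
const buf = new ArrayBuffer(8);

// slice copies a byte range into a brand-new ArrayBuffer; buf is untouched.
const copy = buf.slice(0, 4);
console.log(copy.byteLength); // 4
console.log(buf.byteLength);  // 8

// transfer moves the contents to a new buffer and detaches the old one,
// leaving buf unusable.
if (typeof buf.transfer === 'function') {
  const moved = buf.transfer();
  console.log(moved.byteLength, buf.detached); // 8 true
}
```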

Purpose of ArrayBuffers

The purpose of ArrayBuffers is to provide a way of accessing and modifying memory in a performant, consistent and predictable way.

Compared to ArrayBuffers, regular non-optimised JavaScript Arrays carry overhead: values may need to be unboxed when read, type guards verify value types, and size and memory allocation are handled ad hoc, at the mercy of the garbage collector. Engines usually optimise those things away once they figure out they are not needed. Chrome's V8 engine has SMI (SMall Integer) arrays that come quite close to, if not outperform, the ArrayBuffer data structure. But these optimisations are opaque, differ between engines and hardware, and can be quite finicky to keep applied - and once an array is de-optimised, e.g. from an SMI array to a DOUBLE array, it never goes back to its optimised variant.

The ArrayBuffer, on the other hand, makes the developer define a contiguous block of fixed-length memory, where performant data access and predictable GC handling are baked in. Generally, ArrayBuffers are closer to the hardware representation, with guaranteed simplicity in how values are handled, and JIT compilers can produce much simpler machine code for them.
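That simplicity is easy to observe: a typed array view stores exactly its declared element type and coerces everything else on write:

```javascript
const ints = new Int32Array(3);
ints[0] = 3.7;         // truncated towards zero
ints[1] = '42';        // coerced to a number
ints[2] = 2 ** 32 + 5; // wrapped to 32 bits
console.log(Array.from(ints)); // [3, 42, 5]
```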

Views on ArrayBuffers

Having established the storage aspect of individual bytes, let's move on to the abstractions, or views, on top of the buffer. If you know the constraints of your values, you can map them to integers of a specific bit length, either through Typed Arrays, which enforce the same bit length for every element, or a DataView, which lets you read and write values of varying bit lengths anywhere in the buffer.

Typed Arrays

Typed Arrays are views on an ArrayBuffer for 8-, 16-, 32- and 64-bit integers, and for 16-, 32- and 64-bit floats. They support most of the regular Array operations, except for length-modifying ones like .push, .pop or .splice. They are instantiated by simply creating a new array, e.g. new Int32Array(size), which returns an array of 32-bit integers initialised to 0.
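They can also be constructed over an existing ArrayBuffer, and several views can share the same underlying bytes:

```javascript
const buffer = new ArrayBuffer(8);
const asInt32 = new Int32Array(buffer); // 2 elements of 4 bytes each
const asUint8 = new Uint8Array(buffer); // 8 elements of 1 byte each

asInt32[0] = 258; // 0x00000102
// Both views read the same bytes; on little-endian platforms (most of
// them) this prints 2 1.
console.log(asUint8[0], asUint8[1]);
```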

DataView

If your data is of a more complex nature, such as files of various media formats, you may need to access different bit-length values in the same buffer. Here, a DataView can read values of any bit-length anywhere in the buffer, and of any endianness:

function parseWav(buffer) {
  const view = new DataView(buffer);

  return {
    sampleRate: view.getUint32(24, true), // read 4 bytes at byte offset 24 as a little-endian uint32
    bitsPerSample: view.getUint16(34, true),
    numChannels: view.getUint16(22, true),
    dataSize: view.getUint32(40, true)
  };
}

Performance of Typed Arrays

How Typed Arrays Achieve Better Performance

As explained above, ArrayBuffers have no unboxing overhead, a smaller memory footprint and contiguous memory allocation. CPUs are optimised for sequential, contiguous data access, with multi-level caches loading data in 64-byte cache lines. The more data you can pack into the lower-level caches, the faster your computations run. With a Typed Array of 16- or 8-bit values, you can pack far more integers into those caches than with the relatively large 64-bit heap-allocated doubles that a standard Array points to through another 64-bit pointer - or, in the case of optimised arrays, the 32- or 64-bit integers inlined in the Array.
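The packing difference is easy to quantify from each view's per-element size (a 64-byte cache line is assumed, as on most current CPUs):

```javascript
// A 64-byte cache line fits 16 Int32 values, 32 Int16 values or 64 Int8 values.
const CACHE_LINE = 64;
console.log(CACHE_LINE / Int32Array.BYTES_PER_ELEMENT); // 16
console.log(CACHE_LINE / Int16Array.BYTES_PER_ELEMENT); // 32
console.log(CACHE_LINE / Int8Array.BYTES_PER_ELEMENT);  // 64

// The total footprint is just as predictable: 4 bytes per element, no pointers.
console.log(new Int32Array(1_000_000).byteLength); // 4000000
```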

Comparisons

The best way to prove the efficiency of Typed Arrays is to benchmark them against standard Arrays in various scenarios. This is not to prove that Typed Arrays are always better than standard Arrays - but rather to demonstrate that changing the data structure in a few places can give instant performance boosts.

Method

The following benchmarks are made with the benchmark package. They were run on a Ryzen 9 9900X @ 4.4 GHz CPU, 64 GB DDR5-6000 RAM and Windows 11 Home 64-bit. The runtime is Node.js 24.4.0.

Benchmarks

Disjoint-set Algorithm / Union-Find

The following is a well-optimised Union-Find algorithm. The only differences between the implementations are in initialiseUnionFind, where the parent and rank arrays are initialised as either Int32Arrays or standard Arrays, and in the input array tuples:

export function initialiseUnionFind(N) {
  let parent = new Int32Array(N); // new Array(N)
  let rank = new Int32Array(N); // new Array(N).fill(0) - rank must start at 0
  for (let i = 0; i < N; i++) {
    parent[i] = i;
  }
  return { parent, rank };
}
export function find(i, parent) {
  while (parent[i] !== i) {
    let tmpI = parent[i];
    parent[i] = parent[parent[i]];
    i = tmpI;
  }
  return i;
}
export function union(x, y, parent, rank) {
  let xroot = find(x, parent);
  let yroot = find(y, parent);
  if (xroot === yroot) {
    return 0;
  }
  let xrootrank = rank[xroot];
  let yrootrank = rank[yroot];
  if (xrootrank < yrootrank) {
    parent[xroot] = yroot;
  } else if (xrootrank > yrootrank) {
    parent[yroot] = xroot;
  } else {
    parent[yroot] = xroot;
    rank[xroot] = rank[xroot] + 1;
  }
  return 1;
}

export function countComponents(N, operations) {
  let { parent, rank } = initialiseUnionFind(N);
  let unionsDone = 0;
  for (let [x, y] of operations) { // This is a Int32Array tuple for the Int32Array implementation, and a regular Array tuple for the regular implementation
    // It's important that both operations and parent/rank are Int32Arrays, otherwise the engine has to perform type checks, unbox numbers, and generate slower generic code instead of tight integer operations.
    unionsDone += union(x, y, parent, rank);
  }
  return N - unionsDone;
}

Input: 2 000 000 unions on N=1 000 000 elements

Results

Standard Array: 5.84 runs / second
Typed Array: 11.65 runs / second (99% faster)

Dynamic Programming

Next up is dynamic programming. DP problems are obvious candidates for Typed Array optimisations, as they are very Array-centric.

Here is a 2-dimensional DP problem implemented using a space-optimised 1-dimensional DP array for the classic "shortest path in a matrix" problem. The idea is to find the shortest path from the top left to the bottom right using only right- and down-movements. The input is a 5000x5000 matrix of random integers from 0 to 9.

export function shortestPathInMatrix(matrix) {
  let dp = new Int32Array(matrix[0].length); // new Array(matrix[0].length).fill(0)
  for (let i = 0; i < matrix.length; i++) {
    for (let j = 0; j < matrix[0].length; j++) {
      if (i === 0 && j === 0) {
        dp[j] = matrix[i][j];
      } else if (i === 0) {
        dp[j] = matrix[i][j] + dp[j-1];
      } else if (j === 0) {
        dp[j] = matrix[i][j] + dp[j];
      } else {
        dp[j] = matrix[i][j] + Math.min(dp[j], dp[j-1]);
      }
    }
  }
  return dp[dp.length-1];
}

Input:

const randomMatrix = new Array(5000)
  .fill(0)
  .map(() =>
    new Array(5000).fill(0).map(() => Math.floor(Math.random() * 10))
  );

Results

Standard Array: 7.30 runs / second
Typed Array: 12.15 runs / second (66% faster)

Prefix Sums

A prefix sum array allows you to quickly calculate the sum of a range of numbers in an array. By storing the sum of elements 0 to i for every index i, you can subtract the prefix sum at the range's left edge from the prefix sum just past its right edge to get the sum of the numbers in between. The inputs are 10 000 000 random numbers between 0 and 99, and 10 000 000 random range queries on the array.
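As a tiny worked illustration of that subtraction trick:

```javascript
// Prefix sums over [3, 1, 4, 1, 5].
const numbers = new Int32Array([3, 1, 4, 1, 5]);
const prefix = new Int32Array(numbers.length + 1); // becomes [0, 3, 4, 8, 9, 14]
for (let i = 0; i < numbers.length; i++) {
  prefix[i + 1] = prefix[i] + numbers[i];
}
// Sum of numbers[1..3] inclusive: prefix[4] - prefix[1] = 9 - 3 = 6.
console.log(prefix[4] - prefix[1]); // 6
```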

export function rangeSumQuery(numbers, queries) {
  const prefixSums = new Int32Array(numbers.length + 1); // Array(numbers.length + 1)
  // prefixSums[0] = 0;
  for (let i = 1; i < numbers.length + 1; i++) {
    prefixSums[i] = prefixSums[i - 1] + numbers[i - 1];
  }
  const result = new Int32Array(queries.length); // Array(queries.length)
  for (let q = 0; q < queries.length; q++) {
    let [start, end] = queries[q];
    result[q] = prefixSums[end + 1] - prefixSums[start];
  }
  return result;
}

Input:

const randomMatrix = new Array(10_000_000).fill(0).map(() => Math.floor(Math.random() * 100));
const randomInt32Matrix = new Int32Array(randomMatrix);
const queries = new Array(10_000_000).fill(0).map(() => {
  const a = Math.floor(Math.random() * randomMatrix.length);
  const b = Math.floor(Math.random() * randomMatrix.length);
  return [a, b < a ? a : b]; // ensure start <= end
});

Results

Standard Array: 6.26 runs / second
Typed Array: 7.14 runs / second (14% faster)

Limitations

While Typed Arrays can provide speed-ups by changing just a few lines of code, they are actually quite finicky. Once you start mixing Typed Arrays with standard Arrays, the engine has to convert between their types to do any logic or math on them.

Sparse data

Typed Arrays allocate and initialise the full array length to serve the indices you need. So if you just need indices 0 and 999, you still have to initialise an array of size 1000. Standard Arrays, on the other hand, don't necessarily need to initialise every index. Instead, engines use different representations for packed arrays (every index used), holey arrays (some, but not all, indices used) and dictionaries (sparse arrays, which are slower but use far less memory). Each of these representations is optimised to a varying degree for the use case the engine guesses you have.
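A rough illustration of the difference (the exact representations are up to the engine):

```javascript
// A typed array must allocate every index up front: 1000 × 4 bytes.
const typed = new Int32Array(1000);
typed[999] = 1;
console.log(typed.byteLength); // 4000

// A standard Array can stay holey: only indices 0 and 999 are ever set,
// and the engine may pick a sparse representation behind the scenes.
const sparse = [];
sparse[0] = 1;
sparse[999] = 1;
console.log(sparse.length);  // 1000
console.log(500 in sparse);  // false - a hole, not a stored value
```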

Mixed data

Storing objects like {node, value, weight} is impossible with Typed Arrays. You may be able to do something with DataViews, using pointers and your own simulated heap, but at that point things get too complicated, as you're implementing a runtime inside a runtime.
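For completeness, here is a sketch of what the DataView route might look like for such a record - workable, but already far more laborious than plain objects (the field layout is invented for illustration):

```javascript
// One record: node (uint32), value (float32), weight (uint16) → 10 bytes.
const RECORD_SIZE = 10;
const buffer = new ArrayBuffer(RECORD_SIZE * 2); // room for 2 records
const view = new DataView(buffer);

function writeRecord(i, node, value, weight) {
  const base = i * RECORD_SIZE;
  view.setUint32(base, node, true);       // bytes 0-3, little-endian
  view.setFloat32(base + 4, value, true); // bytes 4-7
  view.setUint16(base + 8, weight, true); // bytes 8-9
}

writeRecord(0, 7, 1.5, 300);
console.log(view.getUint32(0, true));  // 7
console.log(view.getFloat32(4, true)); // 1.5
console.log(view.getUint16(8, true));  // 300
```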

Conclusion

That's it. We've covered how the ArrayBuffer data structure can outperform regular Arrays. It's important to note that regular Arrays are by no means generally slow. While writing this blog post, I actually ran into algorithms where the Typed Array was significantly outperformed. So don't go out and convert every Array to a Typed Array without measuring the difference. It's an easy optimisation, but not a silver bullet.

You can find the algorithms and benchmark code here.