DEV Community

Cover image for Show DEV: Interactive Mandelbrot Set Visualization
Aleksei Berezkin
Aleksei Berezkin

Posted on

Show DEV: Interactive Mandelbrot Set Visualization

https://mandelbrot.run/

Source: https://github.com/aleksei-berezkin/mandelbrot

The Mandelbrot set is one of the best known mathematical fractals. With very simple equations, it plots convoluted and beautiful curves, which is why visualizing it is a popular programming task. However, don't be misled β€” the seemingly trivial formulas hide a wealth of complexity πŸ˜‰ It is the story of striving for performance and precision.

The math

The fractal is plotted on a complex plane. Each pixel color depends on if/when the zz diverges:

z0=0zn+1=zn2+x0+y0i, \begin{align} z_0 &= 0 \newline z_{n+1} &= z_n^2 + x_0 + {y_0}i, \end{align}

where x0x_0 and y0y_0 are pixel coordinates on a screen. We may rewrite (2)(2) to more familiar real numbers, having substituted zn=xn+yniz_n = x_n + y_ni :

xn+1=xn2βˆ’yn2+x0yn+1=2xnyn+y0 \begin{align} x_{n+1} &= x_n^2 - y_n^2 +x_0 \newline y_{n+1} &= 2x_n y_n + y_0 \end{align}

The zz is said to diverge when the following condition is met:

∣zn∣>2β€…β€ŠβŸΊβ€…β€Šxn2+yn2>2β€…β€ŠβŸΊβ€…β€Šxn2+yn2>4, \begin{align} |z_n| &> 2 \newline \iff \sqrt{x_n^2 + y_n^2} &> 2 \newline \iff x_n^2 + y_n^2 &> 4, \end{align}

because subsequent items will inevitably grow to infinity. The pixel color is coded in the following way:
  • If the zz never diverges, the color is black
  • If the zz diverges at iteration nn , the color is some function of nn

So, let's write some JavaScript?

It's quite easy to write several dozen simple JS lines to get it working. But if you try, you quickly discover where the real problems are:

  • When zooming into the set, the required number of iterations rapidly grows, causing your animation to first lose smoothness and then freeze
  • At a certain level of zoom, the JS number (64-bit floating point) becomes too imprecise, and the picture becomes pixelated

How to approach these? Let's see!

1. Parallel workers

These days, even inexpensive smartphones have several CPU cores, and utilizing them for computation-heavy tasks is a must. For web-based tasks, you can take advantage of the multiple CPU cores by running parallel workers. Additionally, there is an API that suggests the optimal number of workers to use.

For my case, I divide the whole picture into approximately 8*workersNumber rectangles, and shuffle them for better load balance. This picture shows different workers' tasks with different hues:

Regions rendered by workers

2. From JS to Webassembly

Webassembly (WASM) is the format for binary instructions that browsers can execute very effectively. You can write WASM programs with an assembly-like language or compile to WASM from a high-level language. I chose AssemblyScript, which is very similar to JS and TS and is well documented.

As a low-level format, raw WASM lacks convenient data structures such as arrays and even strings, and requires manual memory management, which can be quite complex. Much of WASM compilers can emulate these features for you, but of course, that comes at a cost β€” each additional feature consumes valuable CPU. That's why, if you strive for performance, it's better not to use these features. That's exactly what I did. Here's a function which reads the rendered value of a pixel directly from memory:

function loadRendered(xy: u64): u32 {
  const y = (xy >>> 32) as u32;
  const x = xy as u32;
  return load<u32>(4 * (y * canvasW + x));
}
Enter fullscreen mode Exit fullscreen mode

Is it TS? Or maybe C? πŸ™‚

3. From number to arbitrary-precision arithmetic

That's the image of 10^14 zoom when used only 64-bit floating point numbers:

Pixelated Mandelbrot
Looks like an UFO trace from early 90s game πŸ™‚

To address this problem we need to extend the width of a number which is known as arbitrary-precision arithmetic. Unfortunately, there's no such thing in WASM standard library, so I implemented my own heavily fitted for the problem.

Very generally, a number consists of multiple 32-bit β€œdigits”:

             |  integer | fractional part
             |  part    | 
-------------+----------+----------------------------
32-bit items |  digit0  | digit1  digit2  digit3  ...
bits         |  0-31    | 32-63   64-95   96-127
Enter fullscreen mode Exit fullscreen mode

In fact that's similar to how we write fractional numbers, but β€œdigits” are not 10-based but 2^32-based instead. Or, 2-based, if we think of every bit as a β€œdigit”; the decimal values of such 2-based β€œdigits” are: 2^31, 2^30, ..., 2, 1, (decimal separator), 1/2, 1/4, ...

Here is the example:

decimal =   10.5
hex     =    a.8
binary  = 1010.1
Enter fullscreen mode Exit fullscreen mode

And another:

hex      = c.8000_0000_8 (width = 3)
decimal  = 12
           + 1/2    (bit 32)
           + 1/2^33 (bit 64)
         β‰ˆ 12.5000000001 
Enter fullscreen mode Exit fullscreen mode

Arithmetic operations

Operations work the same way we were taught in primary school. For example, the addition:

  a0 a1 a2 ...
+ b0 b1 b2 ...
  ============
  c0 c1 c2 ...
Enter fullscreen mode Exit fullscreen mode

Note: the overflow (carry-out from c0) is discarded because it's not needed for the task.

How to write the code for this with arbitrary width? Well, there are two ways:

Operands in memory: use loops

Each number is stored in consecutive memory cells, and we may use indexed loops to access digits:

function add(a: u32, b: u32, c: u32, width: u32) {
  let carryOver: u64 = 0;
  for (let i: i32 = width - 1; i >= 0; i--) {
    let x: u64 = (load<u32>(a + i) as u64)
               + (load<u32>(b + i) as u64)
               + carryOver;
    carryOver = x >>> 32;
    store<u32>(c + i, x as u32);
  }
}
Enter fullscreen mode Exit fullscreen mode

Operands are local vars: unroll loops

const width = 3
let a0: u64, a1: u64, a2: u64,
    b0: u64, b1: u64, b2: u64,
    c0: u64, c1: u64, c2: u64,
    carryOver: u64;
// ...
carryOver = 0;

c2 = a2 + b2;
carryOver = c2 >>> 32;
c2 &= 0xffffffff;

c1 = a1 + b1 + carryOver;
carryOver = c1 >>> 32;
c1 &= 0xffffffff;

c0 = a0 + b0 + carryOver;
c0 &= 0xffffffff;
Enter fullscreen mode Exit fullscreen mode

Note: although digits are 32-bit, variables are 64-bit to retain carry-over.

Which is faster?

The approach with unrolled loops is faster (15-20% for my case). However, it requires having the separate code for each width. Happily, when the code is almost the same, you can easily generate it. I generate computations for width up to 12. I doubt anyone zooms deeper, but if you do, please file an issue πŸ˜‰

Now, the fixed image. Indeed an UFO but much nicer πŸ™‚

Fixed precision image

3. WASM vectorization

β€œSingle instruction, multiple data” (SIMD) or β€œvectorization” is simply when your CPU executes the same instructions against multiple variables at once. In WASM you work with 128-bit vectors that contain 2 to 16 data items (β€œlanes”) depending on their size. For example, if your data items are 8-bit integers, one vector contains 16 lanes. In my case, data items are 64-bit (floating-point or integer), so I'm able to compute series for 2 pixels at once.

Here's the example. First, non-vectorized code for 64-bit float:

for (let i: u32 = 0; i < maxIterations; i++) {
  const xSqr: f64 = x * x;
  const ySqr: f64 = y * y;
  if (xSqr + ySqr > 4) {
    // Diverged
    break;
  }
  // Compute next (x, y) ...
}
Enter fullscreen mode Exit fullscreen mode

Now the vectorized version:

for (let i: u32 = 0; i < maxIterations; i++) {
  const xSqr = v128.mul<f64>(x, x);
  const ySqr = v128.mul<f64>(y, y);
  const g4 = v128.g<f64>(v128.add<f64>(xSqr, ySqr), FOUR);

  if (!point0DivergedAt && v128.extract_lane<u64>(g4, 0)) {
    point0DivergedAt = i;
  }
  if (!point1DivergedAt && v128.extract_lane<u64>(g4, 1)) {
    point1DivergedAt = i;
  }
  if (point0DivergedAt && point1DivergedAt) {
    // Both diverged
    break;
  }
  // Compute next (x, y) ...
}
Enter fullscreen mode Exit fullscreen mode

How faster did it get?

Floating-point arithmetic became exactly 2x faster. BigNumber got only 15–20% faster, and I don't know why, honestly. Perhaps my code has a problem that I cannot spot, but I hope to find it someday.

4. Optimization: filling same-color rectangles

Note how many same-color areas there are in the following example:

A lot of same color regions

Computing almost same series for each pixel seems ineffective. Is it possible to optimize it? Happily, yes. With this optimization, rendering is coded as a recursive process of painting nested rectangles. The first rectangle is the whole render task given to the worker. Then, the recursive algorithm goes like this:

  • Render only the contour of the current rectangle + mid point; or, for small rectangles, only corners + mid point
    • If all pixels are of the same color:
      • Fill the entire rectangle with that color
    • Otherwise:
      • If the current rectangle is still not too small, split the current rectangle into two and recursively render each one
      • Otherwise, render all pixels of the current rectangle as usual

Here's the same image with fast-filled rectangles outlined:

Same color regions outlined

This enormously boosts performance, especially for β€œstable” parts of the image, i.e., parts that do not reveal too much color variability.

Conclusion: is it finally fast enough?

The issue with the Mandelbrot set is that, as the iteration count and the BigNumber width constantly increase, it is not possible to eliminate lags, but only to postpone them. Optimizations allow you to explore it more deeply, but inevitably there will come a point where the required amount of computations exceeds any CPU power. This means that there will always be a room for optimizations, but at present, I seem to have exhausted all available options. Or, am I wrong? Can you suggest anything to improve? And thank you for reading!

Mandelbrot set picture

Top comments (2)

Collapse
 
dmitriiprusakov profile image
Dmitry Prusakov

That's awesome!
Nice visualization!

Collapse
 
alekseiberezkin profile image
Aleksei Berezkin

Glad you liked it!