Raymarching WebGPU Stack Machine
I’ve built something horrendous, so, naturally, I’ve decided to share it with the world. For some background, you will need to be familiar with raymarching1. For a brief introduction, raymarching2 is a strategy for rendering a scene of objects using rays cast from the camera/viewer’s eye. Raymarching, specifically, provides an optimization for taking as few steps as possible.
It does this through the use of signed distance functions3 (SDFs), which are essentially composable mathematical functions that describe an object. Below is a quick animation of this in action. The growing circles represent the evaluation of this function which returns a distance to the nearest object (but not a direction vector). This means our ray can safely “step” that distance without colliding with any object in the scene. Notice how it slows down slightly as it passes the rectangle. Without the rectangle, the ray could immediately jump to the circle’s radius.
A really powerful characteristic of these SDFs is that they can be composed with
set operations like union, intersection and subtraction. This is achieved with
the use of simple min
and max
functions. Not only that, but determining
shadows and doing other lighting calculations, even approximations, feels
significantly more straightforward and intuitive with ray marching than it does
in a regular scene with meshes.
The horrendous idea
Now, I implemented a simple raymarching algorithm using WebGPU. This can actually be achieved with only two triangles. Since all that’s needed is a fragment shader. I started by building the algorithm inside of the shader. That means all objects, composition of objects and other visual traits were hard-coded inside of the shader.
I thought to myself, “I could push this data from the CPU.” The open question was, “In what format?” I realized that I don’t need anything more complicated than a stack machine. These are easy to represent with an array, which perfectly suits the limitations of GPU programming. So, I modelled a number of shapes and commands that could be performed on those shapes. Of course, to add new shapes and commands, I would have to update the shader to support these. What I send from the CPU is just a data description of the shapes and commands.
There are some gotchas here, like the fact that you have to pad your structs on 16 byte boundaries. There’s a really nice website for double-checking your padding. I also made it such that all shapes and commands use the same amount of memory. In other words, every shape and command is as big as the biggest shape and command, respectively. Overall, they’re tiny for what they represent, so I’m not worried.
For a taste of what this looks like, I drive all calls to fetching the data from
the stack through this shape_sdf
function:
fn shape_sdf(p: vec3f, shape: Shape) -> Sdf {
let color = shape.color;
switch u32(shape.id) {
case 0: { // composite
return Sdf(shape.a, color);
}
case 1: { // box
return Sdf(box(p - vec3f(shape.a, shape.b, shape.c), vec3f(shape.d, shape.e, shape.f)), color);
}
case 2: { // cylinder
return Sdf(cylinder(p - vec3f(shape.a, shape.b, shape.c), shape.d, shape.e), color);
}
case 3: { // plane
return Sdf(plane(p, vec3f(shape.a, shape.b, shape.c), shape.d), color);
}
case 4: { // sphere
return Sdf(sphere(p - vec3f(shape.a, shape.b, shape.c), shape.d), color);
}
default: {
return Sdf(MAX_DIST + EPSILON, color);
}
}
}
The virtual stack machine itself exists inside of the function for actually
evaluating the overall SDF. I’ve called it scene
. I’ve modelled an accumulator
for easily emitting shapes which are just unioned. This wasn’t necessary but it
felt easier for grouping logical “entities”.
fn scene(p: vec3f) -> Sdf {
let shape_count = i32(uniforms.shape_count);
if (shape_count == 0) {
default_sdf();
}
var acc = default_sdf();
var stack_pointer: i32 = shape_count - 1;
// a mutable variable representing the top of the stack, since we can't
// modify the underlying array.
var top_of_stack: Shape = shapes[stack_pointer];
// Points to the second item on the stack (instead of the top).
stack_pointer -= 1;
let command_count = i32(uniforms.command_count);
for (var i = 0; i < command_count; i++) {
let command = commands[i];
switch u32(command.id) {
case 0: { // accumulate
acc = sdf_min(acc, shape_sdf(p, top_of_stack));
if (stack_pointer > 0) {
top_of_stack = shapes[stack_pointer];
stack_pointer -= 1;
}
}
case 1: { // union
top_of_stack = composite_shape(sdf_min(shape_sdf(p, top_of_stack), shape_sdf(p, shapes[stack_pointer])));
stack_pointer -= 1;
}
case 2: { // intersection
top_of_stack = composite_shape(sdf_max(shape_sdf(p, top_of_stack), shape_sdf(p, shapes[stack_pointer])));
stack_pointer -= 1;
}
...
default: { // unknown
return Sdf(0, vec3f(1, 1, 1));
}
}
}
return acc;
}
So, from the JavaScript side, I can just generate the data and operations as I want:
shapes.push({
type: ShapeType.Sphere,
position: new Vec3(0, 0, 0),
radius: 0.5,
});
shapes.push({
type: ShapeType.Box,
position: new Vec3(0.5, 0, 0),
dimensions: Vec3f.fill(0.25),
});
commands.push({type: CommandType.Subtration});
commands.push({type: CommandType.Accumulate});
The above is a very imperative way of building up objects. It’s not an issue, and may be your preference, but there’s also an option to generate these commands from a more tree-like data structure which represents the composition of an entity rather than how to build it. I stopped at the stack machine, however.
For a visual, here’s a video from the scene configured in the repository linked below. The great thing is that these objects can be morphed, merged and sheared in ways you could not easily achieve with meshes.
Conclusion
This is a really interesting way of rendering a real-time scene in an unconventional way. However, it’s slow. With the current implementation, on my computer, I can’t have more than maybe 20 objects in the scene without noticeable slowdown. I confirmed with a performance benchmark that it wasn’t actually the JavaScript causing the issue. JavaScript is only using roughly 1ms of frame time, but I’m seeing frame drops of below 10 FPS (100 ms per frame).
Nevertheless, you could probably make an interesting game even with this limitation. There’s also nothing stopping one from composing a rasterized scene with a raymarched one! One game in recent memory that does this is, “It Takes Two,” with its lava lamps.
Links
Footnotes
-
Inigo Quilez’s website is a treasure trove of information on this topic: https://iquilezles.org/. ↩