Implement GPU Inference with WebGPU

Fo shure, straight to WebGPU it is. Bold choice — I like it. Let me lay out the refined design.

The Design

Project: vibe-infer — GPU inference from scratch

Architecture:

index.html       - canvas for drawing/loading digits + result display
inference.js     - WebGPU init, buffer management, forward pass orchestration
shaders.js       - WGSL compute shader strings (matmul, relu, softmax)
weights/         - pre-trained weight files (raw float32 binary)
train_export.py  - (optional) tiny script to train & export weights
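
To make "raw float32 binary" concrete, here is a rough sketch of how inference.js might carve a weight file into per-layer arrays. The single-file layout (W1, b1, W2, b2 stored back to back) and the name splitWeights are assumptions for illustration only; the real export format is whatever train_export.py ends up writing.

```javascript
// Assumed (hypothetical) layout: W1 [784x128], b1 [128], W2 [128x10], b2 [10],
// stored back to back as float32 in the machine's native byte order.
const LAYOUT = [
  ["w1", 784 * 128],
  ["b1", 128],
  ["w2", 128 * 10],
  ["b2", 10],
];

// Splits one ArrayBuffer into named Float32Array views (no copying).
function splitWeights(arrayBuffer) {
  const expectedFloats = LAYOUT.reduce((sum, [, count]) => sum + count, 0);
  if (arrayBuffer.byteLength !== expectedFloats * 4) {
    throw new Error(
      `expected ${expectedFloats * 4} bytes, got ${arrayBuffer.byteLength}`
    );
  }
  const parts = {};
  let byteOffset = 0;
  for (const [name, count] of LAYOUT) {
    parts[name] = new Float32Array(arrayBuffer, byteOffset, count);
    byteOffset += count * 4; // 4 bytes per float32
  }
  return parts;
}
```

The size check is there because a wrong-sized file is the most likely failure, and it is much easier to catch on load than after the numbers have gone through a shader.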

The forward pass, step by step:

[28x28 image] → flatten to [784]
  → GPU buffer
  → matmul kernel (784×128) + bias (128)
  → ReLU kernel
  → matmul kernel (128×10) + bias (10)
  → softmax kernel
  → read back [10 probabilities]
  → argmax on CPU → "It's a 7!"
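
For orientation, the same pipeline can be sketched on the CPU in plain JavaScript. This is a reference to compare GPU output against, not the GPU implementation itself. The function names and the row-major [inputs x outputs] weight layout are assumptions made for this sketch.

```javascript
// Dense layer: y[j] = b[j] + sum_i x[i] * w[i * outDim + j].
// Assumes w is row-major with shape [inDim x outDim].
function dense(x, w, b, inDim, outDim) {
  const y = new Float32Array(outDim);
  for (let j = 0; j < outDim; j++) {
    let acc = b[j];
    for (let i = 0; i < inDim; i++) acc += x[i] * w[i * outDim + j];
    y[j] = acc;
  }
  return y;
}

// Element-wise max(0, a).
function relu(v) {
  return v.map((a) => Math.max(0, a));
}

// Softmax with the usual max subtraction so exp() cannot overflow.
function softmax(v) {
  const max = Math.max(...v);
  const exps = v.map((a) => Math.exp(a - max));
  const sum = exps.reduce((s, a) => s + a, 0);
  return exps.map((a) => a / sum);
}

// Index of the largest entry (first one wins on ties).
function argmax(v) {
  let best = 0;
  for (let i = 1; i < v.length; i++) if (v[i] > v[best]) best = i;
  return best;
}

// Full forward pass; dims defaults to the 784 -> 128 -> 10 network above.
function forward(image, weights, dims = { input: 784, hidden: 128, output: 10 }) {
  const { w1, b1, w2, b2 } = weights;
  const hidden = relu(dense(image, w1, b1, dims.input, dims.hidden));
  const probs = softmax(dense(hidden, w2, b2, dims.hidden, dims.output));
  return { probs, digit: argmax(probs) };
}
```

Passing smaller dims (say 2, 2, 2) lets the same function run on hand-checkable inputs before any real weights are involved.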

Learning sequence (what we'll build, in order):

  1. WebGPU bootstrap — get a device, create a buffer, prove the GPU is alive
  2. First compute shader — a trivial "add two numbers" shader to learn the WGSL syntax and dispatch model
  3. Matrix multiply kernel — the big one. This is 80% of inference.
  4. ReLU kernel — dead simple, but teaches element-wise GPU operations
  5. Softmax kernel — trickier (requires a reduction), teaches GPU coordination
  6. Chain them together — the actual forward pass
  7. Load real weights & test images — the payoff moment

Key decision I want to flag: We'll start with tiny hardcoded test data (like 2x2 matrices) to verify each shader is correct before plugging in real MNIST weights. Debug small, then scale up. This is how real GPU developers work.
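
As an illustration of "debug small", one possible 2x2 test case with a by-hand expected result and a tolerance check might look like the sketch below. The names A, B, EXPECTED, maxAbsDiff and isClose are made up for this example, and the 1e-5 tolerance is an arbitrary starting point rather than a recommended value.

```javascript
// A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]], both row-major.
const A = new Float32Array([1, 2, 3, 4]);
const B = new Float32Array([5, 6, 7, 8]);

// A x B worked by hand:
// [[1*5 + 2*7, 1*6 + 2*8], [3*5 + 4*7, 3*6 + 4*8]] = [[19, 22], [43, 50]]
const EXPECTED = new Float32Array([19, 22, 43, 50]);

// Largest element-wise absolute difference between two equal-length arrays.
function maxAbsDiff(actual, expected) {
  if (actual.length !== expected.length) {
    throw new Error(`length mismatch: ${actual.length} vs ${expected.length}`);
  }
  let worst = 0;
  for (let i = 0; i < actual.length; i++) {
    worst = Math.max(worst, Math.abs(actual[i] - expected[i]));
  }
  return worst;
}

// True when every element matches within tol.
function isClose(actual, expected, tol = 1e-5) {
  return maxAbsDiff(actual, expected) <= tol;
}
```

Once the matmul shader exists, whatever it reads back for A and B would be checked with isClose(readback, EXPECTED). A tolerance is used instead of exact equality because GPU floating-point results can differ slightly from CPU results.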

Does this design and learning sequence look right to you? Ready to start with step 1 (WebGPU bootstrap)?

[Request interrupted by user]

One caveat: we're partners. You don't write code, you just guide me and explain what the steps are, etc.

Fo shure, Vlad. You write every line, I explain what and why. I'm your GPS, you're driving.

Let's start.

Step 1: WebGPU Bootstrap

The very first thing you need is proof that your browser can talk to the GPU. Here's what you're going to do:

Create index.html: just a bare-bones HTML file with a <script type="module"> tag. Nothing fancy. No canvas yet, just a page that runs JS.

Inside the script, you need two things:

  1. navigator.gpu.requestAdapter(): this asks the browser "hey, is there a GPU available?" An adapter is like a handle to a physical GPU. The promise resolves to null if no suitable adapter is found. If the browser doesn't support WebGPU at all, navigator.gpu itself is undefined, so check for that first.

  2. adapter.requestDevice(): this gives you a logical device, your interface to actually do things on the GPU. Think of it like opening a database connection. The adapter is "which database server," the device is "an active connection to it."

Both calls are async, so you'll want an async function init() that awaits both.

For now, just log the device to the console. If you see a GPUDevice object, you're in business. If you get null or an error, we'll troubleshoot.
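
Write your own version first. If you want something to check it against afterwards, a minimal sketch of the two calls with both failure cases handled could look like this. The function name initWebGPU and the optional gpu parameter are inventions for this sketch; the parameter exists only so the function can be exercised outside a browser.

```javascript
// Returns a GPUDevice, or throws with a message saying which step failed.
// `gpu` defaults to navigator.gpu, which is undefined where WebGPU is unsupported.
async function initWebGPU(gpu = globalThis.navigator?.gpu) {
  if (!gpu) {
    throw new Error("WebGPU is not available: navigator.gpu is undefined");
  }

  // Resolves to null when no suitable adapter is found.
  const adapter = await gpu.requestAdapter();
  if (!adapter) {
    throw new Error("No suitable GPU adapter found");
  }

  // The logical device: the object buffers and pipelines are created from.
  const device = await adapter.requestDevice();
  return device;
}
```

Inside the module script, console.log(await initWebGPU()) should print a GPUDevice object if everything is working.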

Go ahead and create that file. Make sure you're opening it in Chrome (or Edge). Firefox's WebGPU support may still be behind a flag, depending on your version and platform.

claude-opus-4-6