For sure, straight to WebGPU it is. Bold choice — I like it. Let me lay out the refined design.
The Design
Project: vibe-infer — GPU inference from scratch
Architecture:
The forward pass, step by step (assuming a single-hidden-layer MLP, which is all MNIST needs):
- Flatten the 28x28 input image into a 784-element vector
- Matrix multiply by the hidden-layer weights (plus bias)
- ReLU
- Matrix multiply by the output-layer weights (plus bias)
- Softmax over the 10 logits — the biggest one is the predicted digit
Learning sequence (what we'll build, in order):
- WebGPU bootstrap — get a device, create a buffer, prove the GPU is alive
- First compute shader — a trivial "add two numbers" shader to learn the WGSL syntax and dispatch model
- Matrix multiply kernel — the big one. Matmuls are where nearly all the inference FLOPs live.
- ReLU kernel — dead simple, but teaches element-wise GPU operations
- Softmax kernel — trickier (requires a reduction), teaches cross-thread coordination with workgroup memory and barriers
- Chain them together — the actual forward pass
- Load real weights & test images — the payoff moment
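Before any of those kernels touch the GPU, it helps to have CPU reference versions of the same math to diff against. Here's a minimal sketch — the function names and flat row-major layout are my choices, not fixed APIs, and biases are omitted for brevity:

```typescript
// CPU reference implementations of the three kernels, for checking GPU
// output against. Matrices are flat arrays in row-major order, matching
// how they'd sit in a GPU buffer.

// C (m x n) = A (m x k) times B (k x n)
function matmul(a: number[], b: number[], m: number, k: number, n: number): number[] {
  const c = new Array(m * n).fill(0);
  for (let i = 0; i < m; i++)
    for (let j = 0; j < n; j++)
      for (let p = 0; p < k; p++)
        c[i * n + j] += a[i * k + p] * b[p * n + j];
  return c;
}

// Element-wise max(0, x)
function relu(x: number[]): number[] {
  return x.map((v) => Math.max(0, v));
}

// Subtract the max before exponentiating for numerical stability
function softmax(x: number[]): number[] {
  const m = Math.max(...x);
  const exps = x.map((v) => Math.exp(v - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((v) => v / sum);
}

// Step 6, "chain them together", is literally just composition
// (biases omitted here to keep the sketch short):
function forward(x: number[], w1: number[], w2: number[],
                 nIn: number, nHidden: number, nOut: number): number[] {
  const h = relu(matmul(x, w1, 1, nIn, nHidden));
  return softmax(matmul(h, w2, 1, nHidden, nOut));
}
```

Each GPU kernel then has a single source of truth to match, and a bug in the WGSL shows up as a diff against these, not as a mysteriously wrong digit prediction.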
Key decision I want to flag: We'll start with tiny hardcoded test data (like 2x2 matrices) to verify each shader is correct before plugging in real MNIST weights. Debug small, then scale up. This is how real GPU developers work.