Rastrigin
Standard optimization benchmark, embarrassingly parallel (POP=4096, DIM=2000)
N-Body Simulation
Gravitational physics, 512 bodies, 200 timesteps fused (SEQUENTIAL)
Acrobot-v1
Standard Gym RL, double pendulum, 500 steps with RK4 physics (SEQUENTIAL)
MountainCar-v0
Standard Gym RL, 200 timesteps, linear policy (SEQUENTIAL)
CartPole-v1
Standard Gym RL, inverted pendulum, 500 steps, 4→8→2 NN policy (SEQUENTIAL)
Monte Carlo Pi
Classic parallel estimation, 100K samples per worker (PARALLEL)
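For readers who want to see what the two embarrassingly parallel workloads above compute, here is a minimal CPU sketch in plain Python. It is not the site's WebGPU shader code. The function names `rastrigin` and `estimate_pi`, the seed handling, and the tiny population and dimension in the demo are all assumptions chosen for illustration; the real benchmark uses POP=4096, DIM=2000 and 100K samples per worker.

```python
import math
import random


def rastrigin(x):
    """Standard Rastrigin function: 10*n + sum(x_i^2 - 10*cos(2*pi*x_i)).

    The global minimum is 0 at the origin. Each candidate is scored
    independently of the others, which is why the workload parallelizes.
    """
    return 10.0 * len(x) + sum(
        xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) for xi in x
    )


def estimate_pi(samples, seed=None):
    """Monte Carlo estimate of pi from `samples` points in the unit square.

    The fraction of points inside the quarter circle approximates pi/4.
    One call here stands in for one worker (hypothetical mapping).
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples


if __name__ == "__main__":
    # Illustrative sizes only: 4 candidates of 8 dimensions each,
    # drawn from the usual Rastrigin search domain [-5.12, 5.12].
    population = [
        [random.uniform(-5.12, 5.12) for _ in range(8)] for _ in range(4)
    ]
    print([round(rastrigin(candidate), 2) for candidate in population])
    print(estimate_pi(100_000, seed=0))
```

In both cases no candidate or worker depends on another, so each can map to its own GPU thread. The benchmarks marked SEQUENTIAL differ in that each timestep depends on the previous one.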
When you click Run, your GPU model and benchmark results are saved anonymously. No personal information is collected. Privacy policy
The science behind the benchmarks
Cross-vendor medians, fused vs unfused on the same device: 71× Apple Silicon, 56× NVIDIA, 20× phones (92 devices, 7 vendors). Controlled M2 Pro vs PyTorch MPS in the paper: 159× WebGPU, 720× CUDA on T4.
Gunaydin, A.B. (2026)
Single-Kernel Fusion for Sequential Fitness Evaluation via WebGPU Compute Shaders
doi:10.5281/zenodo.19331833
Every result is public
We don't cherry-pick. Every benchmark run from every device is published — GPU name, score, browser, OS, timestamp. No data is hidden. Verify any claim yourself.
Browse all results →
The research line and the end-to-end projects that build on it.
kernelfusion.dev
The research line. Two published preprints, one npm SDK, 92 unique devices across 7 GPU vendors. The theory that all the applied projects build on.
webgpudna.com
Electron track-structure simulation ported from the CNRS/IN2P3 Geant4-DNA toolkit to WebGPU. One thread per primary, full 10 keV history in a single for-loop. Radiolysis chemistry and DNA damage scoring live in a browser tab.
See the simulation →
LLM inference
zerotvm.com
Phi-3-mini (3.8B) running end-to-end via 10 kernel roles across 27 WGSL files, replacing the 85 TVM-autotuned shaders WebLLM needs. ~40 tok/s on M2 Pro, 22% behind WebLLM.
Run it live →
Visualization
neuropulse.live
A real forward pass of Phi-3-mini visualised tensor-by-tensor. 3.8 billion parameters, your GPU, your browser — every glow is a live activation read back from WebGPU. Zero server, zero API key.
Watch it think →
Quantum
webgpu-q.vercel.app
Statevector + MPS quantum simulator running on commodity hardware via WebGPU compute. Six-level research ladder from bandwidth-bound statevector through MPS, kernel fusion, WebRTC swarm, IBM hardware cross-verify, to chemistry/VQE. No CUDA, no install.
Open the simulator →
Personal
barisgunaydin.com
Personal site and project hub.
About →