Benchmarks

Rastrigin

Standard optimization benchmark, embarrassingly parallel (POP=4096, DIM=2000)
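
For reference, the Rastrigin function is f(x) = 10n + Σ (x_i² - 10 cos(2π x_i)), with a global minimum of 0 at the origin and the usual search domain [-5.12, 5.12]. The Python sketch is a CPU illustration only, not the benchmark's WGSL kernel. The helper names and the reduced population size are assumptions. It shows why the workload is embarrassingly parallel: each individual's fitness depends only on its own genome.

```python
import math
import random


def rastrigin(x, a=10.0):
    """f(x) = a*n + sum(x_i^2 - a*cos(2*pi*x_i)); global minimum 0 at x = 0."""
    return a * len(x) + sum(xi * xi - a * math.cos(2.0 * math.pi * xi) for xi in x)


def evaluate_population(population):
    # Every individual is scored independently of the others, so a GPU port
    # can assign one thread per individual with no synchronization between them.
    return [rastrigin(individual) for individual in population]


# Illustrative sizes only; the benchmark uses POP=4096, DIM=2000.
POP, DIM = 16, 20
rng = random.Random(0)
population = [[rng.uniform(-5.12, 5.12) for _ in range(DIM)] for _ in range(POP)]
fitness = evaluate_population(population)
best = min(fitness)
```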

N-Body Simulation

Gravitational physics, 512 bodies, 200 timesteps fused (SEQUENTIAL)
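
Unlike Rastrigin, this workload is sequential in time: step t+1 needs the positions produced by step t. A minimal CPU sketch in Python, assuming a softened direct-sum force law, G = 1 simulation units, and a semi-implicit Euler integrator (the benchmark's actual integrator and constants are not stated on this page). The hypothetical `simulate_fused` keeps the whole time loop inside one call, the analogue of running every timestep in a single GPU dispatch instead of one dispatch per step.

```python
import math

G = 1.0            # gravitational constant in simulation units (assumption)
SOFTENING = 1e-3   # softening length to avoid infinite forces at r = 0 (assumption)


def accelerations(pos, mass):
    """Direct O(n^2) sum of pairwise gravitational accelerations."""
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = d[0] ** 2 + d[1] ** 2 + d[2] ** 2 + SOFTENING ** 2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * mass[j] * d[k] * inv_r3
    return acc


def step(pos, vel, mass, dt):
    """One semi-implicit Euler step: update velocities first, then positions."""
    acc = accelerations(pos, mass)
    for i in range(len(pos)):
        for k in range(3):
            vel[i][k] += acc[i][k] * dt
            pos[i][k] += vel[i][k] * dt


def simulate_fused(pos, vel, mass, dt, steps):
    # The time loop lives inside one call. An unfused version would return to
    # the host after every step (one dispatch per timestep) before launching the next.
    for _ in range(steps):
        step(pos, vel, mass, dt)
    return pos, vel


# Two equal masses at rest on the x-axis (the benchmark uses 512 bodies, 200 steps).
pos = [[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
vel = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
mass = [1.0, 1.0]
simulate_fused(pos, vel, mass, dt=0.01, steps=10)
```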

Acrobot-v1

Standard Gym RL, double pendulum, 500 steps with RK4 physics (SEQUENTIAL)
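
The classic fourth-order Runge-Kutta (RK4) step evaluates the derivative four times per timestep and combines the slopes with weights 1/6, 2/6, 2/6, 1/6. As a hedged Python sketch, it is shown here on a single frictionless pendulum rather than the full double-pendulum Acrobot dynamics; the pendulum model, `dt`, and the helper names are illustrative assumptions.

```python
import math


def rk4_step(f, state, dt):
    """One RK4 step for an autonomous system ds/dt = f(s); state is a list of floats."""
    k1 = f(state)
    k2 = f([s + 0.5 * dt * k for s, k in zip(state, k1)])
    k3 = f([s + 0.5 * dt * k for s, k in zip(state, k2)])
    k4 = f([s + dt * k for s, k in zip(state, k3)])
    return [
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    ]


def pendulum(state, g=9.8, length=1.0):
    """Single pendulum: theta'' = -(g / L) * sin(theta)."""
    theta, omega = state
    return [omega, -(g / length) * math.sin(theta)]


def energy(state, g=9.8, length=1.0):
    """Energy divided by m*L^2; conserved by the exact dynamics."""
    theta, omega = state
    return 0.5 * omega ** 2 - (g / length) * math.cos(theta)


# Each step consumes the previous state, so a rollout is inherently sequential.
state = [1.0, 0.0]          # initial angle 1 rad, at rest
e0 = energy(state)
for _ in range(500):        # 500 steps, matching the episode length on the card
    state = rk4_step(pendulum, state, 0.01)
drift = abs(energy(state) - e0)
```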

MountainCar-v0

Standard Gym RL, 200 timesteps, linear policy (SEQUENTIAL)
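
A hedged sketch of a MountainCar-v0 fitness rollout in Python. The dynamics constants follow Gymnasium's MountainCar-v0 (force 0.001, gravity 0.0025, speed limit 0.07, goal at x = 0.5). The 3-action linear policy layout, the fixed start position of -0.5 (Gym samples it from [-0.6, -0.4]), and the helper names are assumptions for illustration.

```python
import math

MIN_POS, MAX_POS = -1.2, 0.6
MAX_SPEED = 0.07
GOAL_POS, GOAL_VEL = 0.5, 0.0
FORCE, GRAVITY = 0.001, 0.0025


def step(position, velocity, action):
    """Advance one timestep; action is 0 (push left), 1 (no push) or 2 (push right)."""
    velocity += (action - 1) * FORCE + math.cos(3 * position) * (-GRAVITY)
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position += velocity
    position = max(MIN_POS, min(MAX_POS, position))
    if position == MIN_POS and velocity < 0:
        velocity = 0.0  # inelastic collision with the left wall
    return position, velocity


def linear_policy(weights, position, velocity):
    """Hypothetical linear policy: one (w_pos, w_vel, bias) row per action, argmax wins."""
    scores = [w[0] * position + w[1] * velocity + w[2] for w in weights]
    return max(range(len(scores)), key=lambda a: scores[a])


def fitness(weights, start_position=-0.5, max_steps=200):
    """Total episode reward: -1 per step until the goal is reached or time runs out."""
    position, velocity = start_position, 0.0
    total_reward = 0.0
    for _ in range(max_steps):  # step t+1 depends on step t, so the loop is sequential
        action = linear_policy(weights, position, velocity)
        position, velocity = step(position, velocity, action)
        total_reward -= 1.0
        if position >= GOAL_POS and velocity >= GOAL_VEL:
            break
    return total_reward
```

For example, weights `[[0, -1, 0], [0, 0, 0], [0, 1, 0]]` encode "push in the direction of motion", a well-known hand-crafted strategy for this task; an optimizer searches over such weight matrices.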

CartPole-v1

Standard Gym RL, inverted pendulum, 500 steps, 4→8→2 NN policy (SEQUENTIAL)
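
A minimal Python sketch of a CartPole-v1 rollout driven by a 4→8→2 network. The physics constants and the explicit Euler update follow Gymnasium's CartPole-v1. The tanh hidden activation, the flat 58-parameter layout, and the fixed initial state (Gym samples each state variable from [-0.05, 0.05]) are assumptions.

```python
import math

GRAVITY, MASS_CART, MASS_POLE = 9.8, 1.0, 0.1
TOTAL_MASS = MASS_CART + MASS_POLE
HALF_LENGTH = 0.5                      # half the pole length
POLEMASS_LENGTH = MASS_POLE * HALF_LENGTH
FORCE_MAG, TAU = 10.0, 0.02            # push strength, seconds per step
THETA_LIMIT = 12 * 2 * math.pi / 360   # 12 degrees
X_LIMIT = 2.4

N_IN, N_HID, N_OUT = 4, 8, 2
N_PARAMS = N_IN * N_HID + N_HID + N_HID * N_OUT + N_OUT  # 58


def policy(params, obs):
    """4 -> 8 (tanh) -> 2 network; returns the index of the larger output."""
    w1 = params[: N_IN * N_HID]
    b1 = params[N_IN * N_HID : N_IN * N_HID + N_HID]
    off = N_IN * N_HID + N_HID
    w2 = params[off : off + N_HID * N_OUT]
    b2 = params[off + N_HID * N_OUT :]
    hidden = [
        math.tanh(sum(w1[h * N_IN + i] * obs[i] for i in range(N_IN)) + b1[h])
        for h in range(N_HID)
    ]
    out = [sum(w2[o * N_HID + h] * hidden[h] for h in range(N_HID)) + b2[o] for o in range(N_OUT)]
    return 0 if out[0] >= out[1] else 1


def cartpole_step(state, action):
    x, x_dot, theta, theta_dot = state
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    temp = (force + POLEMASS_LENGTH * theta_dot ** 2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t ** 2 / TOTAL_MASS)
    )
    x_acc = temp - POLEMASS_LENGTH * theta_acc * cos_t / TOTAL_MASS
    # Explicit Euler, as in the Gym reference implementation.
    return [x + TAU * x_dot, x_dot + TAU * x_acc, theta + TAU * theta_dot, theta_dot + TAU * theta_acc]


def fitness(params, max_steps=500):
    """Number of steps survived (reward +1 per step), up to the 500-step limit."""
    state = [0.01, 0.0, 0.01, 0.0]  # fixed start for determinism (assumption)
    total = 0
    for _ in range(max_steps):
        state = cartpole_step(state, policy(params, state))
        total += 1
        x, _, theta, _ = state
        if abs(x) > X_LIMIT or abs(theta) > THETA_LIMIT:
            break
    return total
```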

Monte Carlo Pi

Classic parallel estimation, 100K samples per worker (PARALLEL)
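
A hedged Python sketch of the estimator: each worker draws points uniformly in the unit square and counts those inside the quarter circle, so π ≈ 4 × (points inside) / (total points). Workers share nothing until the final sum. Seeding each worker's generator with its index is an assumption made here for reproducibility.

```python
import math
import random

SAMPLES_PER_WORKER = 100_000  # matches the card


def worker_hits(seed, samples=SAMPLES_PER_WORKER):
    """Count random points in [0, 1)^2 that fall inside the unit quarter circle."""
    rng = random.Random(seed)  # independent stream per worker
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(num_workers, samples=SAMPLES_PER_WORKER):
    # Workers are independent; only this final reduction combines their results.
    total_hits = sum(worker_hits(seed, samples) for seed in range(num_workers))
    return 4.0 * total_hits / (num_workers * samples)


estimate = estimate_pi(num_workers=4)
error = abs(estimate - math.pi)
```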

Running a benchmark saves your GPU model and benchmark results anonymously. No personal information is collected.

Research

The science behind the benchmarks

Cross-vendor medians, fused vs. unfused on the same device: 71× on Apple Silicon, 56× on NVIDIA, and 20× on phones (92 devices, 7 vendors). The paper's controlled comparisons against PyTorch report 159× for WebGPU on an M2 Pro (vs. PyTorch MPS) and 720× for CUDA on a T4.

Gunaydin, A.B. (2026)

Single-Kernel Fusion for Sequential Fitness Evaluation via WebGPU Compute Shaders

doi:10.5281/zenodo.19331833

Every result is public

We don't cherry-pick. Every benchmark run from every device is published — GPU name, score, browser, OS, timestamp. No data is hidden. Verify any claim yourself.

Companion projects

The research line and the end-to-end projects that build on it.

Theory (research line)

kernelfusion.dev

The research line: two published preprints, one npm SDK, and 92 unique devices across 7 GPU vendors. It is the theory that all the applied projects build on.

webgpudna.com

Electron track-structure simulation ported from the CNRS/IN2P3 Geant4-DNA toolkit to WebGPU. Each thread follows one primary through its full 10 keV history in a single for-loop. Radiolysis chemistry and DNA damage scoring run in a browser tab.

LLM inference

zerotvm.com

Phi-3-mini (3.8B parameters) runs end-to-end on 10 kernel roles across 27 WGSL files, replacing the 85 TVM-autotuned shaders that WebLLM needs. Throughput is about 40 tok/s on an M2 Pro, 22% behind WebLLM.

Visualization

neuropulse.live

A real forward pass of Phi-3-mini visualised tensor-by-tensor. 3.8 billion parameters, your GPU, your browser — every glow is a live activation read back from WebGPU. Zero server, zero API key.

Quantum

webgpu-q.vercel.app

Statevector and MPS (matrix product state) quantum simulator running on commodity hardware via WebGPU compute. A six-level research ladder runs from bandwidth-bound statevector simulation through MPS, kernel fusion, a WebRTC swarm, and cross-verification on IBM hardware, up to chemistry/VQE. No CUDA, no install.
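
Why statevector simulation is bandwidth-bound: applying one single-qubit gate to an n-qubit register reads and rewrites all 2^n complex amplitudes while doing only a few multiply-adds per amplitude. A minimal Python sketch of that update; the little-endian qubit indexing and the function name are assumptions, not the simulator's API.

```python
import math


def apply_single_qubit_gate(state, gate, target):
    """Apply a 2x2 gate [[a, b], [c, d]] to qubit `target` of a 2**n statevector in place."""
    (a, b), (c, d) = gate
    stride = 1 << target
    for i in range(len(state)):
        if (i & stride) == 0:  # visit each (|...0...>, |...1...>) amplitude pair once
            j = i | stride
            s0, s1 = state[i], state[j]
            state[i] = a * s0 + b * s1
            state[j] = c * s0 + d * s1
    return state


INV_SQRT2 = 1 / math.sqrt(2)
H = [[INV_SQRT2, INV_SQRT2], [INV_SQRT2, -INV_SQRT2]]  # Hadamard gate

# Hadamard on every qubit of |000> gives a uniform superposition of 8 basis states.
n = 3
state = [0j] * (1 << n)
state[0] = 1 + 0j
for q in range(n):
    apply_single_qubit_gate(state, H, q)
```

Every gate is a full sweep over memory, which is why larger registers motivate the MPS and kernel-fusion rungs of the ladder.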

Personal

barisgunaydin.com

Personal site and project hub.