We proved that browsers can run GPU computation at near-native speed. No download. No account. No expensive hardware. Just open a page.
For decades, running computation on a graphics card meant navigating three barriers that kept most of the world locked out.
Hardware lock-in: CUDA only runs on NVIDIA GPUs. If you have a Mac, an AMD card, or an Intel integrated chip, you're out.
Setup complexity: Python environments, CUDA drivers, cuDNN versions, framework dependencies. One mismatch and nothing works.
Cost: Cloud GPU instances run $2–4/hour. A university lab without a GPU budget simply can't participate.
WebGPU is a new browser standard that gives web pages direct access to your graphics card. We showed it's fast enough for real scientific computation.
“For fitness functions with sequential dependencies — common in reinforcement learning, financial simulation, and control systems — custom compute shaders dramatically outperform framework-based GPU code.”
From the paper, Section 4.2
This isn't about replacing data centers. It's about giving GPU access to the people who never had it.
A computer science student in Lagos, Bangalore, or rural anywhere can now run real GPU-accelerated experiments on their $400 laptop. Open a browser tab, not a grant application.
Instead of slides about parallel computing, show it running live in the classroom. Every student's laptop becomes a GPU workstation. No lab setup, no admin permissions.
“To reproduce our results, open this URL.” Not “install Python 3.10.12, CUDA 12.1, cuDNN 8.9.7, match the exact driver version.” Reproducibility should be one click.
A biotech team in Nairobi running molecular screening. A fintech in São Paulo backtesting strategies. Their scientists' laptops ARE the compute cluster. No AWS required.
The technical details are in the paper. Here's the intuition.
The graphics chip in your laptop — whether it's Apple Silicon, AMD, Intel, or NVIDIA — is a parallel processor with thousands of cores. It spends most of its time idle.
WebGPU, a web standard shipping in Chrome since 2023, lets web pages run computation directly on that GPU. Not just graphics: real number crunching.
Instead of sending thousands of small tasks to the GPU one by one (as PyTorch does), we fuse the entire computation into a single GPU dispatch: one dispatch instead of 22,500. The browser adds only 48% overhead versus native code, and it's still faster than PyTorch.
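A rough mental model of why this matters. The per-dispatch overhead and per-step compute costs below are hypothetical illustrative numbers, not measurements from the paper; only the dispatch count (22,500) comes from our results.

```javascript
// Toy model: fixed launch overhead dominates when a sequential workload
// is split into one GPU dispatch per step.
const DISPATCH_OVERHEAD_US = 50; // HYPOTHETICAL fixed cost to launch one GPU task
const COMPUTE_US_PER_STEP = 2;   // HYPOTHETICAL useful work per simulation step
const STEPS = 22500;             // dispatch count reported in the paper

// Framework style: one dispatch per step, so overhead is paid 22,500 times.
const perStepDispatch = STEPS * (DISPATCH_OVERHEAD_US + COMPUTE_US_PER_STEP);

// Fused shader style: one dispatch runs the whole loop on the GPU.
const fusedDispatch = DISPATCH_OVERHEAD_US + STEPS * COMPUTE_US_PER_STEP;

console.log(`per-step dispatching: ${(perStepDispatch / 1e6).toFixed(2)} s`);
console.log(`single fused dispatch: ${(fusedDispatch / 1e6).toFixed(2)} s`);
console.log(`modeled speedup: ${(perStepDispatch / fusedDispatch).toFixed(0)}x`);
```

Even with these made-up costs, the model shows the shape of the problem: the per-step version pays the launch tax on every iteration, the fused version pays it once. The real speedups depend on actual hardware and driver overheads.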
30 independent runs per experiment. Statistical tests. Comparisons against 8 systems on 2 hardware platforms. Not hype — evidence.
This isn't theoretical. Here's what's different tomorrow.
“Reproduce our results” → Install Python 3.10.12 → Install CUDA 12.1 → Install cuDNN 8.9.7 → Match driver version → Debug for 3 hours → Maybe it works
Click this link. Results run in your browser. Verified in 30 seconds.
“I need GPU compute” → AWS account → GPU instance ($2–4/hr) → DevOps setup → $5,000/month bill → Need funding before building
Users' own GPUs do the work. Server cost: $0. Ship on day one.
“Today we'll learn parallel computing” → Show slides → Students nod → Nobody runs anything because the school has no GPU lab
“Open this URL on your laptop.” 30 students run GPU computation simultaneously. They see it, touch it, modify it.
“Run screening on patient data” → 6-month legal review to upload to cloud → HIPAA/GDPR audit → $200K contract → Finally start work
Data never leaves the laptop. GPU compute runs in the browser. Compliance by architecture, not by contract.
On the same GPU, a browser running a compute shader is 159× faster than PyTorch on sequential workloads — because PyTorch launches 22,500 separate GPU tasks where we launch one.
We tested on the same Apple M2 Pro: WebGPU at 46.2 generations per second versus PyTorch MPS at 0.29 on a 1,500-step financial simulation. Same GPU, same memory, no hardware tricks. On a Tesla T4, JAX with loop fusion gets 13× over PyTorch CUDA; fusion helps, but our hand-fused shader still leads. An ablation (fused vs. unfused shader on the same hardware) isolates 2.18× from fusion alone. PyTorch's own compiler crashes when it tries to fuse at this scale (RecursionError at 1,000 steps, out of memory at 5,000 steps).
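The headline multiplier is just the ratio of the two measured throughputs reported above:

```javascript
// Speedup implied by the throughput figures reported in this post.
const webgpuGenPerSec = 46.2;  // WebGPU compute shader, Apple M2 Pro
const pytorchGenPerSec = 0.29; // PyTorch MPS backend, same GPU

const speedup = webgpuGenPerSec / pytorchGenPerSec;
console.log(`${speedup.toFixed(0)}x`); // prints "159x"
```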
Run real GPU benchmarks on your hardware, right now, in your browser.