In February 2026, Hugging Face shipped Transformers.js v4, a C++ WebGPU runtime built with Microsoft's ONNX Runtime team, as the production answer for browser LLM inference. Zero-TVM shows that for Phi-3 Mini specifically, the answer can instead be 10 kernel roles across 27 WGSL files and ~2,000 lines of TypeScript: no compiler, no WASM, no server. It requires WebGPU with the shader-f16 feature; the first load downloads ~2.1 GB of Q4F16 weights, which are cached afterwards.
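The shader-f16 requirement can be probed before downloading any weights. A minimal sketch of such a check (the helper `supportsShaderF16` is hypothetical, not part of the Zero-TVM source; the typed parameter just mirrors the shape of the standard `navigator.gpu` object so the logic is testable outside a browser):

```typescript
// Hypothetical capability probe: does this environment expose WebGPU,
// and does its adapter advertise the optional "shader-f16" feature
// that the Q4F16 kernels need?
type GPULike = {
  requestAdapter(): Promise<{ features: Set<string> } | null>;
};

async function supportsShaderF16(gpu: GPULike | undefined): Promise<boolean> {
  if (!gpu) return false;                     // no WebGPU at all
  const adapter = await gpu.requestAdapter(); // may resolve to null
  if (!adapter) return false;                 // adapter denied or unavailable
  return adapter.features.has("shader-f16");  // f16 arithmetic in WGSL
}

// In a browser, this would be called as:
//   const ok = await supportsShaderF16(navigator.gpu);
//   if (ok) { /* requestDevice with requiredFeatures: ["shader-f16"] */ }
```

In real use, a positive result would be followed by `adapter.requestDevice({ requiredFeatures: ["shader-f16"] })`, which is the standard WebGPU way to opt in to the feature.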