2× Faster GPT‑OSS with Minimal BLEU Loss
March 26, 2026 · 4 min read
The new Puzzle extension adds dynamic routing for Mixture‑of‑Experts models, spreading work across experts so that no single expert becomes a bottleneck. On GPT‑OSS it speeds up inference by up to 2× while lowering BLEU by less than 0.3%, and it matches or beats the original model on language and reasoning benchmarks.
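To make the idea of load-balanced routing concrete, here is a minimal sketch of one common approach: each token goes to its highest-scoring expert that still has spare capacity. This is an illustration only, not Puzzle's actual routing algorithm, which this post does not describe. The function name `route_with_capacity`, the `capacity` parameter, and the toy scores are all assumptions made for the example.

```python
from collections import Counter


def route_with_capacity(scores, capacity):
    """Assign each token to its best-scoring expert that still has room.

    Hypothetical helper for illustration; not part of Puzzle or GPT-OSS.

    scores:   list of per-token lists, scores[t][e] is the router score
              of expert e for token t.
    capacity: maximum number of tokens any single expert may receive.
    Returns a list with one expert index per token.
    """
    if not scores:
        return []
    num_experts = len(scores[0])
    if capacity * num_experts < len(scores):
        raise ValueError("total capacity is smaller than the number of tokens")

    load = [0] * num_experts
    assignment = []
    for token_scores in scores:
        # Try experts from highest to lowest score for this token.
        ranked = sorted(range(num_experts),
                        key=lambda e: token_scores[e],
                        reverse=True)
        for expert in ranked:
            if load[expert] < capacity:
                load[expert] += 1
                assignment.append(expert)
                break
    return assignment


# Four tokens that all prefer expert 0; two experts with capacity 2 each.
scores = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]
assignment = route_with_capacity(scores, capacity=2)
print(assignment)           # [0, 0, 1, 1]
print(Counter(assignment))  # each expert receives two tokens
```

Without the capacity cap, all four tokens would land on expert 0 and expert 1 would sit idle. The cap trades a little routing quality (two tokens get their second choice) for even load, which is the kind of trade-off a small BLEU drop alongside a large speedup would be consistent with.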