Xiaomi MiMo and TileRT Push a 1-Trillion-Parameter Model Past 1000 Tokens Per Second on Commodity GPUs

Kwon Crash

Published Jun 8, 2026, 7:03 PM UTC

Source: AISource
- Xiaomi just proved that "commodity hardware" isn't a death sentence for AI inference. Their MiMo-V2.5-Pro-UltraSpeed squeezes 1,000+ tokens per second out of a 1-trillion-parameter model on standard GPUs, not exotic silicon. They got there by combining three things: FP4 quantization of the experts, DFlash speculative decoding, and TileRT’s persistent engine. It’s fast, it’s open-source (mostly), and it costs three times the standard rate. While moonboys wait for their bags to pump, engineers are actually optimizing latency. This is the kind of infrastructure work that matters, unlike the regulatory circus happening elsewhere. If you’re building agents, take notes. If you’re just trading memes, keep scrolling.
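
For readers who haven't met speculative decoding before, here is a minimal sketch of the general draft-and-verify idea in Python. It is not DFlash itself, whose internals aren't described here, and it is not Xiaomi's or TileRT's code. The two "models" (`target_next`, `draft_next`) are hypothetical toy functions invented for illustration, and decoding is greedy. The point is only that a cheap draft proposes several tokens, the expensive target checks them, and the output matches what the target would have produced alone.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]


def target_next(ctx: List[Token]) -> Token:
    # Hypothetical stand-in for the large model: a deterministic toy rule.
    return (ctx[-1] * 3 + len(ctx)) % 11


def draft_next(ctx: List[Token]) -> Token:
    # Hypothetical stand-in for a small draft model: it agrees with the
    # target except when the context length is a multiple of 4.
    t = target_next(ctx)
    return t if len(ctx) % 4 else (t + 1) % 11


def greedy_decode(next_token: NextToken, prompt: List[Token], n_new: int) -> List[Token]:
    # Baseline: one model call per generated token.
    out = list(prompt)
    for _ in range(n_new):
        out.append(next_token(out))
    return out


def speculative_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[Token],
    n_new: int,
    k: int = 4,
) -> Tuple[List[Token], int]:
    out = list(prompt)
    goal = len(prompt) + n_new
    verify_rounds = 0
    while len(out) < goal:
        # 1) The draft proposes up to k tokens, one after another.
        proposal: List[Token] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))

        # 2) The target checks the proposal. A real engine would score all
        #    positions in one batched forward pass; this toy loops instead
        #    and counts the whole check as a single verification round.
        verify_rounds += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First mismatch: keep the target's token, drop the rest.
                accepted.append(expected)
                break
        out.extend(accepted)
    return out, verify_rounds


if __name__ == "__main__":
    tokens, rounds = speculative_decode(target_next, draft_next, [1], 12)
    print("same as plain greedy:", tokens == greedy_decode(target_next, [1], 12))
    print("verification rounds:", rounds, "for 12 new tokens")
```

Each round emits at least one token the target itself would have chosen, so the result is identical to plain greedy decoding. Any speedup comes from how often the draft guesses right, which is why the quality of the draft model matters so much in practice.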