Google DeepMind Releases Gemma 4 QAT Checkpoints: Q4_0 and a New Mobile Format Cut On-Device Memory

Kwon Crash

Published Jun 5, 2026, 7:11 PM UTC

Source: AISource

- Google DeepMind just dropped Gemma 4 QAT checkpoints, proving that "on-device AI" is finally less of a scam than most utility tokens. By using Quantization-Aware Training (QAT), which trains the model with low-precision weights in the loop instead of quantizing after the fact, they’ve squeezed the E2B model down to a measly 1 GB for mobile and 3.2 GB for consumer GPUs. This isn’t just compression; it’s what makes local inference actually viable without a data center in your closet. While regulators debate if Bitcoin is a security, Google is quietly letting you run 4-bit models on a Raspberry Pi. The quality claims are unverified, but the memory savings are real. If you’re still waiting for a moonboy to explain why your phone can’t run LLMs, show them this. The tech works; your patience is the only bottleneck.
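
For anyone who wants to sanity-check the memory math, here is a rough back-of-the-envelope sketch. It assumes the standard GGUF Q4_0 layout (blocks of 32 weights, each stored as one 16-bit scale plus 32 packed 4-bit values, so 18 bytes per block). The parameter count below is a hypothetical placeholder, not a figure from the release, and real checkpoints may keep some tensors at higher precision, so actual file sizes will differ.

```python
import math

# Assumed GGUF Q4_0 layout: 32 weights per block,
# stored as one fp16 scale (2 bytes) + 32 four-bit values (16 bytes).
Q4_0_BLOCK_WEIGHTS = 32
Q4_0_BLOCK_BYTES = 2 + 16


def q4_0_bytes(n_weights: int) -> int:
    """Bytes needed to store n_weights in Q4_0 (partial blocks are padded)."""
    blocks = math.ceil(n_weights / Q4_0_BLOCK_WEIGHTS)
    return blocks * Q4_0_BLOCK_BYTES


def bf16_bytes(n_weights: int) -> int:
    """Bytes needed to store n_weights at 16 bits each."""
    return 2 * n_weights


def to_gib(n_bytes: int) -> float:
    """Convert bytes to GiB."""
    return n_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical parameter count for illustration only.
    n = 1_000_000_000
    print(f"bf16: {to_gib(bf16_bytes(n)):.2f} GiB")
    print(f"Q4_0: {to_gib(q4_0_bytes(n)):.2f} GiB")
    print(f"ratio: {q4_0_bytes(n) / bf16_bytes(n):.3f}")
```

Under these assumptions Q4_0 works out to 4.5 bits per weight, a bit over a quarter of the 16-bit footprint. That covers weights only; the KV cache and activations come on top.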