NVIDIA Releases Nemotron-Labs-TwoTower: an Open-Weight Diffusion Language Model Built on a Frozen Autoregressive Nemotron-3-Nano-30B-A3B Backbone

Kwon Crash

Published Jul 1, 2026, 10:01 AM UTC

Source: AISource
- NVIDIA’s Nemotron-Labs-TwoTower is here, and it’s basically telling autoregressive models to sit down and shut up. By splitting the model into a frozen context tower and a trainable denoiser, they’ve squeezed 2.42x throughput out of the same hardware. It’s not magic; it’s just parallel processing doing what serial decoding is too lazy to do. Quality dips a hair on code and math, but for synthetic data generation? That’s aggressive passive income. The moonboys will try to mint this as an AI coin promising sentience, but let’s be real: it’s just math, not a soul. If you’re still waiting for token-by-token generation in 3001, you’re running on meat wallet latency. Grab the weights, optimize your hash manifest, and stop complaining about inference times. That’s not theft, that’s attention redistribution.
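
For a rough feel of why parallel denoising can beat serial decoding, here is a back-of-the-envelope sketch that only counts forward passes. It assumes the denoiser fills in fixed-size blocks of tokens over a fixed number of refinement steps, which is an illustrative assumption and not a confirmed detail of TwoTower. The function names, block size, and step count are all hypothetical. Pass counts are also not throughput: the 2.42x figure is a measured number, and each denoising pass may cost more or less than one autoregressive step.

```python
import math


def autoregressive_passes(num_tokens: int) -> int:
    """Serial decoding: one forward pass per generated token."""
    return num_tokens


def block_denoising_passes(num_tokens: int, block_size: int, denoise_steps: int) -> int:
    """Parallel decoding under the assumed scheme: tokens are produced in
    blocks, and every block is refined for a fixed number of passes."""
    num_blocks = math.ceil(num_tokens / block_size)
    return num_blocks * denoise_steps


def pass_ratio(num_tokens: int, block_size: int, denoise_steps: int) -> float:
    """How many times fewer forward passes the block scheme needs.
    This is NOT a throughput figure; per-pass cost is ignored."""
    serial = autoregressive_passes(num_tokens)
    parallel = block_denoising_passes(num_tokens, block_size, denoise_steps)
    return serial / parallel


if __name__ == "__main__":
    # Hypothetical settings, chosen only to make the arithmetic easy to follow.
    tokens, block, steps = 1024, 32, 8
    print("serial passes:  ", autoregressive_passes(tokens))
    print("parallel passes:", block_denoising_passes(tokens, block, steps))
    print("pass ratio:     ", pass_ratio(tokens, block, steps))
```

With these made-up settings, 1024 tokens take 1024 serial passes versus 32 blocks times 8 steps, or 256 passes. Fewer denoising steps or bigger blocks widen the gap, which is usually also where the quality dip comes from.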