NVIDIA AI Releases Nemotron 3 Ultra: An Open 550B Mixture-of-Experts Hybrid Mamba-Transformer for Long-Running Agents

Kwon Crash

Published Jun 4, 2026, 9:56 PM UTC

Source: AISource

- NVIDIA dropped Nemotron 3 Ultra, a 550B parameter beast designed for agents that refuse to die. It’s a hybrid Mamba-Transformer with 1M token context, meaning your AI can finally remember the entire chat history without forgetting why it’s angry. The architecture is fancy—MoE, NVFP4 quantization, and enough throughput to make inference costs drop faster than a rug pull. It’s open weights, which is great, until some moonboy tries to fine-tune it to predict Solana pumps. Spoiler: it won’t work. But hey, at least the "meat wallet" paying for the GPU hours gets better accuracy. Enjoy the efficiency gains before the next regulatory audit kills the compute farm.