MiniMax Sparse Attention (MSA): A Two-Branch Block-Sparse Attention Trained on a 109B-Parameter MoE With a 3T-Token Budget

Kwon Crash

Published Jun 17, 2026, 11:37 AM UTC

Source: AISource
- MiniMax released MSA, a sparse attention method that finally treats long-context attention like the bloated, inefficient mess it is. A two-branch design picks only the top 16 key-value blocks for each query, and MiniMax reports a 28.4x cut in compute without tanking benchmarks. It's not magic; it's basic math applied to a model that previously attended to every token like a dragon hoarding gold. The open-source kernel targets NVIDIA SM100 GPUs, proving that efficiency doesn't require a Chrome Syndicate debt contract. While moonboys chase 2000% pumps on garbage tokens, real infrastructure is getting leaner. Stop paying for quadratic attention when a fixed per-query block budget works. Where's my cut?
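The core trick, keeping only the top 16 key-value blocks per query, can be sketched generically. Score each block against the query, keep the highest-scoring blocks, and run ordinary softmax attention over just the keys in those blocks. This is a minimal pure-Python sketch under stated assumptions, not MiniMax's kernel. The mean-pooled block summaries used for scoring, the default block size, and every function name here are hypothetical. The sketch also models a single selection path, because the article does not describe what MSA's two branches actually do.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def block_summaries(keys: Sequence[Vector], block_size: int) -> List[Vector]:
    """Hypothetical scoring proxy: mean-pool the keys in each contiguous block."""
    summaries = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        dim = len(block[0])
        summaries.append([sum(k[d] for k in block) / len(block) for d in range(dim)])
    return summaries


def select_top_blocks(query: Vector, summaries: Sequence[Vector], top_k: int) -> List[int]:
    """Return the indices of the top_k highest-scoring blocks, in ascending order."""
    scores = [dot(query, s) for s in summaries]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:top_k])


def block_sparse_attention(
    query: Vector,
    keys: Sequence[Vector],
    values: Sequence[Vector],
    block_size: int = 64,  # hypothetical; the article does not state MSA's block size
    top_k: int = 16,       # "top 16 key-value blocks per query", per the article
) -> Tuple[Vector, List[int]]:
    """Single-query attention restricted to the top_k selected key-value blocks."""
    chosen = select_top_blocks(query, block_summaries(keys, block_size), top_k)
    # Expand the chosen block ids into token positions; the last block may be short.
    positions = [
        i
        for b in chosen
        for i in range(b * block_size, min((b + 1) * block_size, len(keys)))
    ]
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) * scale for i in positions])
    out = [
        sum(w * values[i][d] for w, i in zip(weights, positions))
        for d in range(len(values[0]))
    ]
    return out, chosen


# Usage: 8 tokens, block_size=2 -> 4 blocks; only block 2 aligns with the query.
keys = [[0.0, 1.0], [0.0, 1.0], [0.0, -1.0], [0.0, -1.0],
        [5.0, 0.0], [5.0, 0.0], [-5.0, 0.0], [-5.0, 0.0]]
values = [[float(i)] for i in range(8)]
out, chosen = block_sparse_attention([1.0, 0.0], keys, values, block_size=2, top_k=1)
print(chosen, out)  # [2] [4.5]
```

When `top_k` covers every block, the sketch reduces to dense attention. The savings come from each query touching at most `top_k * block_size` keys instead of the whole context, while block scoring only has to look at one summary per block.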