Meet Flash-KMeans: An IO-Aware, Exact K-Means That Runs Over 200× Faster Than FAISS on GPUs

Kwon Crash

Published Jun 15, 2026, 11:35 AM UTC

Source: AISource

- Flash-KMeans just dropped, and it’s making FAISS look like a dial-up modem. Researchers from UC Berkeley and UT Austin realized that the k-means math hasn’t changed, but on GPUs the bottleneck is memory bandwidth, not arithmetic. FlashAssign skips materializing the massive point-to-centroid distance matrix, and Sort-Inverse Update fixes the atomic contention in the centroid update. The reported result: exact Lloyd’s iterations running over 200× faster than FAISS on an H200. This isn’t some vaporware AI scam; it’s pure IO optimization. If you’re still waiting on k-means to finish while the market moves, you’re wasting time. Optimize your dataflow or stay poor.
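
For anyone who wants the two tricks in concrete form, here is a minimal NumPy sketch of the general idea. It is a CPU illustration under stated assumptions, not the authors' GPU kernels: the names `assign_tiled` and `update_sorted` and the `tile` parameter are hypothetical. The first function finds each point's nearest centroid one tile at a time, so the full point-to-centroid distance matrix never exists in memory. The second sorts points by label and reduces each contiguous segment, standing in for a contention-free alternative to scattering atomic adds into the centroids.

```python
import numpy as np

def assign_tiled(X, C, tile=1024):
    """Nearest-centroid labels, computed tile by tile (hypothetical helper).

    Only a (tile x K) block of distances is alive at any moment.
    """
    n = X.shape[0]
    labels = np.empty(n, dtype=np.int64)
    c_sq = (C * C).sum(axis=1)  # ||c||^2, reused for every tile
    for start in range(0, n, tile):
        xb = X[start:start + tile]
        # ||x||^2 is constant within a row, so it cannot change the argmin
        d = c_sq[None, :] - 2.0 * (xb @ C.T)
        labels[start:start + tile] = d.argmin(axis=1)
    return labels

def update_sorted(X, labels, C_old):
    """Centroid update via sort + segmented sum (hypothetical helper).

    Empty clusters keep their previous centroid.
    """
    order = np.argsort(labels, kind="stable")
    sorted_labels = labels[order]
    # First index of each label's contiguous run in the sorted order
    present, starts = np.unique(sorted_labels, return_index=True)
    sums = np.add.reduceat(X[order], starts, axis=0)
    counts = np.diff(np.append(starts, len(labels)))
    C_new = C_old.copy()
    C_new[present] = sums / counts[:, None]
    return C_new

# A few exact Lloyd iterations on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
C = X[:5].copy()
for _ in range(3):
    labels = assign_tiled(X, C, tile=64)
    C = update_sorted(X, labels, C)
```

Both steps return the same answer as the textbook versions; only the order of memory accesses changes, which is the sense in which the speedup is an IO optimization and not an approximation.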