Microsoft open-sources its popular 1.58-bit large-model inference framework! After quantization, a 100-billion-parameter model can run on a single CPU at 5-7 tokens per second
Xifeng from Aofei Temple
Quantum Bit | Public Account QbitAI
Microsoft has open-sourced a 1-bit large model inference framework!
Now, after quantization, a large model with 100 billion parameters can run on a single CPU at a speed of 5-7 tokens per second.
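The "1.58-bit" figure comes from constraining each weight to one of three values {-1, 0, +1}, which carries log2(3) ≈ 1.58 bits of information. As a rough illustration (not Microsoft's actual implementation), the BitNet b1.58 paper describes an "absmean" quantization that scales weights by their mean absolute value before rounding and clipping:

```python
import numpy as np

def quantize_ternary(w, eps=1e-8):
    # absmean scaling factor: mean absolute value of the weights
    gamma = np.mean(np.abs(w))
    # scale, round to the nearest integer, then clip to {-1, 0, +1}
    q = np.clip(np.round(w / (gamma + eps)), -1, 1)
    return q.astype(np.int8), gamma

# toy weight vector for illustration
w = np.array([0.9, -0.05, -1.2, 0.4])
q, gamma = quantize_ternary(w)
# gamma = (0.9 + 0.05 + 1.2 + 0.4) / 4 = 0.6375
# q = [1, 0, -1, 1]
```

Because every weight is -1, 0, or +1, matrix multiplication reduces to additions and subtractions with no floating-point multiplies, which is what makes fast CPU inference possible.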
For example, running the BitNet b1.58 3B model on an Apple M2 looks like this: