
Microsoft open-sources its popular 1.58-bit large model inference framework! After quantization, a 100-billion-parameter model can run on a single CPU at 5-7 tokens per second

Last updated: 2024-10-22
Xifeng, reporting from Aofei Temple
QbitAI | WeChat Official Account QbitAI

Microsoft has open-sourced its 1-bit large model inference framework!

Now, after quantization, a large model with 100 billion parameters can run on a single CPU at 5 to 7 tokens per second.
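To make the "1.58-bit" name concrete: BitNet b1.58 constrains weights to the three values {-1, 0, +1}, and storing one of three states takes log2(3) ≈ 1.58 bits per weight. Below is a minimal sketch of the absmean-style ternary quantization described in the BitNet b1.58 paper; the function name and the simplified per-matrix scaling are illustrative, not the exact code of the open-sourced framework.

```python
import numpy as np

def absmean_ternary_quantize(W: np.ndarray, eps: float = 1e-8):
    """Quantize a weight matrix to ternary values {-1, 0, +1}.

    Each weight is divided by the mean absolute value of the matrix and
    rounded to the nearest value in {-1, 0, +1}. Three states per weight
    correspond to log2(3) ~= 1.58 bits, hence "1.58-bit".
    """
    scale = np.abs(W).mean() + eps               # per-matrix absmean scale
    W_q = np.clip(np.round(W / scale), -1, 1)    # ternary weights
    return W_q.astype(np.int8), scale            # dequantize as W_q * scale

# Tiny usage example on random weights
W = np.random.randn(4, 4).astype(np.float32)
W_q, scale = absmean_ternary_quantize(W)
print(W_q)          # entries are only -1, 0, or +1
print(W_q * scale)  # rough reconstruction of W
```

Because the quantized weights are ternary, matrix multiplications reduce largely to additions and subtractions, which is what lets a CPU-only runtime reach usable token rates.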

For example, here is the BitNet b1.58 3B model running on the new Apple M2: