Large Language Models Quantization

Alibaba's new open source Qwen3.5-Medium models offer Sonnet 4.5 performance on local computers

This leap is made possible by near-lossless accuracy under 4-bit weight and KV cache quantization, allowing developers to process massive datasets without server-grade infrastructure.

TechBooky

Alibaba Expands Qwen Lineup with New Mid-Sized AI Models

Alibaba’s Qwen AI team has introduced a new Qwen3.5 Medium model series, adding fresh competition to the large language model ...

SiliconANGLE

Meta debuts slimmed-down Llama models for low-powered devices

Meta Platforms Inc. is striving to make its popular open-source large language models more accessible with the release of “quantized” versions of the Llama 3.2 1B and Llama 3B models, designed to run ...

Semiconductor Engineering

The On-Device LLM Revolution

Users running a quantized 7B model on a laptop expect 40+ tokens per second. A 30B MoE model on a high-end mobile device ...

Results that may be inaccessible to you are currently showing.

Hide inaccessible results