Quantization of LLMs - Search Videos

Quantization in modern LLMs - Advanced Quantization Techniques for Large Language Models Video Tutorial | LinkedIn Learning, formerly Lynda.com

The magic behind LLMs from tokenization all the way to final prediction in under 1 min ✨ I’m obsessed with digging into every detail of LLMs and I found this visualization incredibly helpful for… | Amer H. | 11 comments

11 views · 1 month ago

#edgeai #gguf #quantization #aiengineering #llm #ondeviceai #llamacpp #ollama #modeloptimization #aiinfrastructure #huggingface #unsloth #generativeai | Ashraf Nabout

MLX MiniMax 2.5 running LOCALLY on a single M3 Ultra 512GB! Writing a poem on LLMs at 6bit quantization! 🔥 Let's start some coding, context and distributed tests! Generation: 40.2 tokens-per-sec Peak memory: 186 GB Source: Ivan Fioravanti | Thanh Hoang

1.1K views · 2 weeks ago

Facebook · Thanh Hoang

What is Quantization? | IBM

LLM Model Quantization: An Overview

28 views · Apr 3, 2024

LLMs can take gigabytes of memory to store, which limits what can be run on consumer hardware. But quantization can dramatically compress models, making a wider selection of models available to developers. You can often reduce model size by 4x or more while maintaining reasonable performance. In our new short course Quantization Fundamentals taught by Hugging Face's Younes Belkada and Marc Sun, you'll: - Learn how to quantize nearly any open source model - Use int8 and bfloat16 (Brain float 16)

6.8K views · Apr 15, 2024

Facebook · Andrew Ng
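
The "4x or more" figure in the course description above matches what you would expect when 32-bit float weights are stored as 8-bit integers. The sketch below is only an illustration of that arithmetic, not the course's method: it uses symmetric absmax quantization on a small made-up weight list, and the names `quantize_int8` and `dequantize` are hypothetical helpers, not part of any library.

```python
from array import array


def quantize_int8(weights):
    """Symmetric absmax quantization of a list of floats to int8 (illustrative)."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range used here.
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize(q, scale):
    """Map int8 codes back to approximate float values."""
    return [v * scale for v in q]


# Made-up weights, purely for demonstration.
weights = [0.12, -0.98, 0.45, 0.0, 0.73, -0.31]

fp32 = array("f", weights)          # 32-bit float storage
q, scale = quantize_int8(weights)   # 8-bit integer storage plus one scale
restored = dequantize(q, scale)

fp32_bytes = fp32.itemsize * len(fp32)
int8_bytes = q.itemsize * len(q)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("float32 bytes:", fp32_bytes)
print("int8 bytes:   ", int8_bytes)
print("size ratio:   ", fp32_bytes / int8_bytes)
print("max abs error:", max_error)
```

On platforms where a C float is 4 bytes, the weight storage shrinks by a factor of four, at the cost of a rounding error of at most half a quantization step per weight. Real toolchains typically use per-channel or per-group scales rather than the single scale assumed here.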

"Fine-tuning LLMs on AMD Strix Halo with Framework Deskto…

6.7K views · 292 reactions | Have you used quantization with an ope…

2.6K views · 2 weeks ago

Facebook · Andrew Ng

Optimize LLMs for faster AI inference

351 views · 1 month ago

[LoRA] Unsloth Fine-Tuning: LoRA and QLoRA Guide. Efficient LLM fi…

389 views · 1 month ago

YouTube · AI Podcast Series. Byte Goose AI.

LLM Inference on a Budget: Speed vs. Cost! #llm #inference #optimiz…

YouTube · The Code Architect

Run Giant AI Models on Your Laptop 🚀 (INT8 Explained)

6 views · 1 month ago

YouTube · Forward Logic

🤯 Run LLMs on Your Laptop?! The Quantization Secret! #Shorts

YouTube · CodeTapasya

[IDSL Seminar'25] M-ANT: Efficient Low-bit Group Quantization for LL…

20 views · 2 months ago

LLMs for Quantitative Investment Research: A Practitioner’s Guide

38 views · 2 months ago

YouTube · The AI Shift

Quantization Making LLMs Lightning Fast & Tiny

8 views · 1 month ago

YouTube · The Code Architect

Why Your LLM Crashes Google Colab | VRAM, Quantization Explai…

208 views · 3 weeks ago

YouTube · Analytics Vidhya

What Is Quantization | Quantization | TensorTeach

300 views · Nov 20, 2024

YouTube · TensorTeach

Understanding Symmetric Quantization | Quantization | Tens…

276 views · Nov 20, 2024

YouTube · TensorTeach

SmoothQuant

4.3K views · Oct 25, 2023

YouTube · MIT HAN Lab

Host a AI Server

453 views · Mar 27, 2024

YouTube · AI Arcade

What is LLM Quantization ?

3K views · 11 months ago

YouTube · New Machina

How Does an AI Work

YouTube · Azure cloud, AI, Devops channel.

LLMs Naming Convention Explained

1.8K views · Sep 15, 2023

YouTube · AI Readme

AGI Dreams Podcast – September 07, 2025

YouTube · Robert Lee

AWQ for LLM Quantization

12.4K views · Oct 25, 2023

YouTube · MIT HAN Lab

Free Course: Training & Finetuning LLMs

96.9K views · Oct 5, 2023

YouTube · Weights & Biases

LLMs On The Edge

1.6K views · 8 months ago

YouTube · Semiconductor Engineering

Optimize Your AI - Quantization Explained

382.6K views · Dec 28, 2024

YouTube · Matt Williams
