Tag: vLLM

Enhancing DeepSeek Models with MLA and FP8 Optimizations in vLLM

A Compressed Summary. Enhanced Performance: DeepSeek models see up to 3x throughput…

Klenance February 24, 2025

Introducing AIBrix: Cost-Effective and Scalable Control Plane for vLLM

Open-source large language models (LLMs) such as LLaMA, DeepSeek, Qwen, and Mistral…

Klenance February 22, 2025