CVSS Vector
CVSS:3.1/AV:N/AC:H/PR:L/UI:N/S:U/C:N/I:H/A:L
Lifecycle Timeline
3Description
vLLM is an inference and serving engine for large language models (LLMs). From version 0.5.5 to before version 0.18.0, Librosa defaults to using numpy.mean for mono downmixing (to_mono), while the international standard ITU-R BS.775-4 specifies a weighted downmixing algorithm. This discrepancy results in inconsistency between audio heard by humans (e.g., through headphones/regular speakers) and audio processed by AI models (Which infra via Librosa, such as vllm, transformer). This issue has been patched in version 0.18.0.
Analysis
vLLM versions 0.5.5 through 0.17.x use incorrect mono audio downmixing via numpy.mean instead of the ITU-R BS.775-4 weighted standard, causing audio processed by AI models to diverge from human perception. An authenticated remote attacker with low privileges can exploit this inconsistency to manipulate audio-based model outputs or infer mismatches between expected and actual audio processing, affecting integrity of audio-driven inference pipelines. …
Sign in for full analysis, threat intelligence, and remediation guidance.
Priority Score
Share
External POC / Exploit Code
Leaving vuln.today
EUVD-2026-18522