CVE-2025-62426
MEDIUMCVSS Vector
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
Lifecycle Timeline
3Tags
Description
vLLM is an inference and serving engine for large language models (LLMs). From version 0.5.5 to before 0.11.1, the /v1/chat/completions and /tokenize endpoints allow a chat_template_kwargs request parameter that is used in the code before it is properly validated against the chat template. With the right chat_template_kwargs parameters, it is possible to block processing of the API server for long periods of time, delaying all other requests. This issue has been patched in version 0.11.1.
Analysis
vLLM is an inference and serving engine for large language models (LLMs). Rated medium severity (CVSS 6.5), this vulnerability is remotely exploitable, low attack complexity. This Allocation of Resources Without Limits vulnerability could allow attackers to exhaust system resources through uncontrolled allocation.
Technical Context
This vulnerability is classified as Allocation of Resources Without Limits (CWE-770), which allows attackers to exhaust system resources through uncontrolled allocation. vLLM is an inference and serving engine for large language models (LLMs). From version 0.5.5 to before 0.11.1, the /v1/chat/completions and /tokenize endpoints allow a chat_template_kwargs request parameter that is used in the code before it is properly validated against the chat template. With the right chat_template_kwargs parameters, it is possible to block processing of the API server for long periods of time, delaying all other requests. This issue has been patched in version 0.11.1. Affected products include: Vllm. Version information: version 0.5.5.
Affected Products
Vllm.
Remediation
A vendor patch is available. Apply the latest security update as soon as possible. Set resource limits, implement rate limiting, validate input sizes.
Priority Score
Vendor Status
Share
External POC / Exploit Code
Leaving vuln.today