CVE-2026-34756

EUVD-2026-19351 | MEDIUM | CVSS 3.1: 6.5
2026-04-03 | https://github.com/vllm-project/vllm | GHSA-3mwp-wvh9-7528

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
Attack Vector: Network
Attack Complexity: Low
Privileges Required: Low
User Interaction: None
Scope: Unchanged
Confidentiality: None
Integrity: None
Availability: High
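
As a cross-check on the 6.5 score, the CVSS v3.1 base-score arithmetic for this vector can be sketched as follows. The metric weights and the round-up rule are taken from the CVSS v3.1 specification; the names `WEIGHTS`, `roundup`, and `base_score` are illustrative, and this sketch only handles vectors with Scope: Unchanged.

```python
import math

# CVSS v3.1 metric weights. The PR values shown apply when Scope is Unchanged.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, using the spec's integer-based rule."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score of a CVSS:3.1 vector with Scope: Unchanged."""
    parts = vector.split("/")
    if parts[0] != "CVSS:3.1":
        raise ValueError("expected a CVSS:3.1 vector")
    metrics = dict(part.split(":") for part in parts[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch handles Scope: Unchanged only")

    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))
```

For this vector the impact sub-score is 6.42 × 0.56 ≈ 3.60 and the exploitability sub-score is 8.22 × 0.85 × 0.77 × 0.62 × 0.85 ≈ 2.84; their sum, about 6.43, rounds up to 6.5.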

Lifecycle Timeline

Analysis Generated: Apr 03, 2026 - 16:00 (vuln.today)
EUVD ID Assigned: Apr 03, 2026 - 16:00 (euvd), EUVD-2026-19351
Patch Released: Apr 03, 2026 - 16:00 (nvd), patch available
CVE Published: Apr 03, 2026 - 15:35 (nvd), MEDIUM 6.5

Description

### Summary

A Denial of Service vulnerability exists in the vLLM OpenAI-compatible API server. Due to the lack of an upper bound validation on the `n` parameter in the `ChatCompletionRequest` and `CompletionRequest` Pydantic models, an unauthenticated attacker can send a single HTTP request with an astronomically large `n` value. This completely blocks the Python `asyncio` event loop and causes immediate Out-Of-Memory crashes by allocating millions of request object copies in the heap before the request even reaches the scheduling queue.

### Details

The root cause of this vulnerability lies in the missing upper bound checks across the request parsing and asynchronous scheduling layers:

1. **Protocol Layer:** In `vllm/entrypoints/openai/chat_completion/protocol.py`, the `n` parameter is defined simply as an integer without any `pydantic.Field` constraints for an upper bound.

   ```python
   class ChatCompletionRequest(OpenAIBaseModel):
       # Ordered by official OpenAI API documentation
       # https://platform.openai.com/docs/api/reference/chat/create
       messages: list[ChatCompletionMessageParam]
       model: str | None = None
       frequency_penalty: float | None = 0.0
       logit_bias: dict[str, float] | None = None
       logprobs: bool | None = False
       top_logprobs: int | None = 0
       max_tokens: int | None = Field(
           default=None,
           deprecated="max_tokens is deprecated in favor of "
           "the max_completion_tokens field",
       )
       max_completion_tokens: int | None = None
       n: int | None = 1
       presence_penalty: float | None = 0.0
   ```

2. **SamplingParams Layer (Incomplete Validation):** When the API request is converted to internal `SamplingParams` in `vllm/sampling_params.py`, the `_verify_args` method only checks the lower bound (`self.n < 1`), entirely omitting an upper bounds check.

   ```python
   def _verify_args(self) -> None:
       if not isinstance(self.n, int):
           raise ValueError(f"n must be an int, but is of type {type(self.n)}")
       if self.n < 1:
           raise ValueError(f"n must be at least 1, got {self.n}.")
   ```

3. **Engine Layer (The OOM Trigger):** When the malicious request reaches the core engine (`vllm/v1/engine/async_llm.py`), the engine attempts to fan out the request `n` times to generate identical independent sequences within a synchronous loop.

   ```python
   # Fan out child requests (for n>1).
   parent_request = ParentRequest(request)
   for idx in range(parent_params.n):
       request_id, child_params = parent_request.get_child_info(idx)
       child_request = request if idx == parent_params.n - 1 else copy(request)
       child_request.request_id = request_id
       child_request.sampling_params = child_params
       await self._add_request(
           child_request, prompt_text, parent_request, idx, queue
       )
   return queue
   ```

Because Python's `asyncio` runs on a single thread and event loop, this monolithic `for`-loop monopolizes the CPU thread. The server stops responding to all other connections (including liveness probes). Simultaneously, the memory allocator is overwhelmed by cloning millions of request object instances via `copy(request)`, driving the host's Resident Set Size (RSS) up by gigabytes per second until the OS `OOM-killer` terminates the vLLM process.

### Impact

**Vulnerability Type:** Resource Exhaustion / Denial of Service

**Impacted Parties:**

- Any individual or organization hosting a public-facing vLLM API server (`vllm.entrypoints.openai.api_server`), which happens to be the primary entrypoint for OpenAI-compatible setups.
- SaaS / AI-as-a-Service platforms acting as reverse proxies sitting in front of vLLM without strict HTTP body payload validation or rate limitations.

Because this vulnerability exploits the control plane rather than the data plane, an unauthenticated remote attacker can achieve a high success rate in taking down production inference hosts with a single HTTP request. This effectively circumvents any hardware-level capacity planning and conventional bandwidth stress limitations.
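
The missing check described for `_verify_args` can be illustrated with a minimal, standard-library-only sketch. This is not vLLM's actual patch: the class `SamplingParamsSketch` and the cap `MAX_N` are hypothetical, and the advisory does not state which limit the fix enforces. At the protocol layer, a comparable bound could presumably be expressed with the `ge` and `le` arguments of `pydantic.Field`.

```python
from dataclasses import dataclass

# Hypothetical cap chosen for illustration; the real limit is not given
# in the advisory and would depend on deployment capacity.
MAX_N = 16


@dataclass
class SamplingParamsSketch:
    """Hypothetical stand-in for a sampling-parameters object."""

    n: int = 1

    def __post_init__(self) -> None:
        self._verify_args()

    def _verify_args(self) -> None:
        if not isinstance(self.n, int):
            raise ValueError(f"n must be an int, but is of type {type(self.n)}")
        if self.n < 1:
            raise ValueError(f"n must be at least 1, got {self.n}.")
        # The upper-bound check that the vulnerable code path lacks.
        if self.n > MAX_N:
            raise ValueError(f"n must be at most {MAX_N}, got {self.n}.")


if __name__ == "__main__":
    print(SamplingParamsSketch(n=4))
    try:
        SamplingParamsSketch(n=10**9)
    except ValueError as exc:
        print(f"rejected: {exc}")
```

Rejecting the value during validation means the request fails before any fan-out loop runs, so no copies are allocated for an oversized `n`.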

Analysis

Denial of Service in vLLM OpenAI-compatible API server allows unauthenticated remote attackers to crash the service via a single HTTP request containing an extremely large n parameter. The lack of upper bound validation causes the asyncio event loop to freeze while allocating millions of request object copies, leading to rapid Out-Of-Memory crashes. …
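
The event-loop freeze can be reproduced in miniature with the standard library alone. The sketch below is an illustration under simplifying assumptions, not vLLM code: `add_request` is a hypothetical coroutine that never suspends, so awaiting it in a loop never hands control back to the event loop, and a concurrent `heartbeat` task (standing in for other connections or liveness probes) gets no chance to run.

```python
import asyncio


async def add_request(i: int, sink: list) -> None:
    # Hypothetical stand-in: does its work without ever suspending,
    # so awaiting it does not yield to the event loop.
    sink.append(i)


async def fan_out(n: int, yield_every: int = 0) -> None:
    sink: list = []
    for i in range(n):
        await add_request(i, sink)
        if yield_every and (i + 1) % yield_every == 0:
            # Explicitly hand control back to the event loop.
            await asyncio.sleep(0)


async def heartbeat(ticks: list, stop: asyncio.Event) -> None:
    # Stands in for any other work the server should keep doing.
    while not stop.is_set():
        ticks.append(1)
        await asyncio.sleep(0)


async def ticks_during_fan_out(n: int, yield_every: int = 0) -> int:
    """Return how many heartbeat ticks happen while fan_out runs."""
    ticks: list = []
    stop = asyncio.Event()
    task = asyncio.create_task(heartbeat(ticks, stop))
    await asyncio.sleep(0)  # let the heartbeat task start
    before = len(ticks)
    await fan_out(n, yield_every)
    during = len(ticks) - before
    stop.set()
    await task
    return during


if __name__ == "__main__":
    print("blocking loop:", asyncio.run(ticks_during_fan_out(100_000)))
    print("yielding loop:", asyncio.run(ticks_during_fan_out(100_000, 1_000)))
```

In the blocking variant the heartbeat records no ticks while the loop runs, however large `n` is. Yielding periodically restores responsiveness but does nothing about the memory growth, which is why bounding `n` up front is the more direct fix.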


Priority Score

32
KEV: 0
EPSS: +0.0
CVSS: +32
POC: 0


