vLLM CVE-2026-54235
MEDIUM
Severity by source
CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:N/VI:N/VA:L/SC:N/SI:N/SA:N/E:X/CR:X/IR:X/AR:X/MAV:X/MAC:X/MAT:X/MPR:X/MUI:X/MVC:X/MVI:X/MVA:X/MSC:X/MSI:X/MSA:X/S:X/AU:X/R:X/V:X/RE:X/U:X
The network-reachable API has no confirmed authentication requirement; a single crafted parameter crashes the worker (A:H); no confidentiality or integrity impact was identified.
Primary rating from Vendor (https://github.com/vllm-project/vllm).
Blast Radius
Ecosystem impact: 4 PyPI packages depend on vllm (3 direct, 1 indirect).
Ecosystem-wide dependent count for version 0.23.0.
Description (CVE.org)
Summary
All temperature validation gates rely on ordering comparisons (<, >). Under Python's IEEE 754 float semantics, every such comparison involving NaN evaluates to False, and the specific checks used here also evaluate to False for positive Infinity. Both values therefore pass every guard and propagate to GPU sampling kernels, where they produce undefined behavior or CUDA errors that can crash the inference worker. Note: -Infinity is correctly caught.
Root Cause
sampling_params.py:384:
    if 0 < self.temperature < _MAX_TEMP:  # NaN → False; +Inf → False
sampling_params.py:462:
    if self.temperature < 0.0:  # NaN → False; +Inf → False
        raise VLLMValidationError(...)
No math.isnan() or math.isinf() check exists anywhere in sampling_params.py.
Python semantics (verified): float('nan') < 0.0 → False, float('inf') < 0.0 → False.
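The two comparisons quoted in this section can be evaluated on their own to see why they do not separate non-finite values from valid ones. This is a minimal sketch, not vLLM's code: `guard_results` is a hypothetical helper, and the value given to `_MAX_TEMP` is a placeholder, since the advisory does not state the real constant.

```python
import math

_MAX_TEMP = 0.01  # placeholder value; the real constant is not given in the advisory


def guard_results(temperature: float) -> tuple[bool, bool]:
    """Evaluate the two comparisons quoted from sampling_params.py (hypothetical helper)."""
    range_check = 0 < temperature < _MAX_TEMP  # condition quoted at line 384
    negative_check = temperature < 0.0         # condition quoted at line 462
    return range_check, negative_check


for value in (float("nan"), float("inf"), float("-inf"), 0.7):
    # math.isfinite is the only check here that tells NaN/+Inf apart from 0.7.
    print(value, guard_results(value), math.isfinite(value))
```

With this placeholder, NaN, +Inf, and the ordinary value 0.7 all yield `(False, False)`, so neither comparison fires for them, while -Infinity trips the negative check. Only `math.isfinite` distinguishes the valid value from the non-finite ones.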
Impact
Crash of inference worker on GPU kernel execution with NaN/Inf softmax input, degrading service for all concurrent users.
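To illustrate why a non-finite temperature is a problem for sampling, the sketch below applies temperature scaling and softmax in plain Python. It is a CPU-only toy under stated assumptions: `softmax_with_temperature` is a hypothetical function, it omits the usual max-subtraction for numerical stability, and it does not reproduce vLLM's GPU kernels or the CUDA errors described in this advisory.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Plain-Python temperature softmax (illustrative only, not vLLM's kernel)."""
    scaled = [x / temperature for x in logits]  # NaN temperature makes every entry NaN
    exps = [math.exp(s) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [1.0, 2.0, 3.0]
nan_probs = softmax_with_temperature(logits, float("nan"))
inf_probs = softmax_with_temperature(logits, float("inf"))
print(nan_probs)  # every entry is NaN, so no valid distribution exists to sample from
print(inf_probs)  # in this toy model the logits collapse to 0 and the result is uniform
```

In this simplified model a NaN temperature poisons every probability, while +Inf merely flattens the distribution; how the real GPU code path treats either value is outside what this sketch can show.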
Remediation
Add a math.isfinite(self.temperature) check in _verify_args() and reject non-finite float values with a 400 error.
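A finiteness check of the kind recommended here might look like the following. This is a sketch under assumptions, not the merged patch: `InvalidTemperatureError` and `verify_temperature` are hypothetical names, and translating the exception into an HTTP 400 response is left to the API layer.

```python
import math


class InvalidTemperatureError(ValueError):
    """Hypothetical stand-in for the project's validation error type."""


def verify_temperature(temperature: float) -> float:
    """Return the temperature unchanged if it is finite and non-negative."""
    # Check finiteness first: NaN and +Inf slip past ordering comparisons.
    if not math.isfinite(temperature):
        raise InvalidTemperatureError(
            f"temperature must be a finite number, got {temperature!r}"
        )
    if temperature < 0.0:
        raise InvalidTemperatureError(
            f"temperature must be non-negative, got {temperature!r}"
        )
    return temperature
```

Running the finiteness test before any ordering comparison means NaN, +Infinity, and -Infinity are all rejected by one explicit check, so the later comparisons only ever see finite values.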
Fix
A fix for this vulnerability was merged here: https://github.com/vllm-project/vllm/pull/45116
Analysis (AI)
Temperature parameter validation in vLLM (pip/vllm ≤ 0.23.0) can be bypassed by supplying NaN or positive Infinity as the temperature value, because Python's IEEE 754 float comparison operators silently return False for these inputs, allowing the values to propagate unchecked into GPU CUDA sampling kernels. The invalid inputs trigger undefined behavior or fatal CUDA errors that crash the inference worker process, dropping all in-flight requests and degrading service for every concurrent user sharing that worker. …
Vulnerability Assessment (AI)
| Exploitation | Exploitation requires the ability to submit an inference request to the vLLM API and supply an arbitrary floating-point value for the temperature parameter. … |
| Risk Assessment | No official CVSS score or vector was provided in any source, so all metric assessments in this analysis are independently inferred and should be treated accordingly. … |
| Exploit Scenario | An attacker with network access to an exposed vLLM inference API endpoint submits a standard generation request with the temperature field set to NaN or positive Infinity, expressible in JSON via a non-compliant NaN literal or through a Python client that serializes float('nan') or float('inf'). The value bypasses all comparison-based guards in SamplingParams._verify_args(), is forwarded directly to the CUDA softmax sampling kernel, and produces a fatal CUDA error that terminates the inference worker process, immediately dropping all in-flight requests from every concurrent user. … |
| Remediation | An upstream fix is available via GitHub PR #45116 (https://github.com/vllm-project/vllm/pull/45116) and the associated commit d598d239737cfa37bcfcb98886ec3f3557fc7198, which adds math.isfinite() guards for both temperature and repetition_penalty in _verify_args(). … |
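The exploit scenario mentions non-compliant JSON literals. Python's standard `json` module emits and accepts them by default, which the sketch below demonstrates along with two ways to refuse them. The payload field names and the `strict_loads` and `reject_constant` helpers are hypothetical, and nothing here describes how vLLM's own server parses requests.

```python
import json
import math

# Hypothetical request body; only the temperature field matters here.
payload = {"prompt": "hello", "temperature": float("nan")}

# The default encoder writes the non-standard literal NaN.
encoded = json.dumps(payload)
print(encoded)

# The default decoder reads it back as a float NaN.
decoded = json.loads(encoded)
print(math.isnan(decoded["temperature"]))


def reject_constant(name: str) -> float:
    """Called by json.loads for NaN, Infinity, and -Infinity literals."""
    raise ValueError(f"non-finite JSON constant not allowed: {name}")


def strict_loads(text: str):
    """Parse JSON while refusing non-finite numeric literals (hypothetical helper)."""
    return json.loads(text, parse_constant=reject_constant)
```

On the sending side, `json.dumps(payload, allow_nan=False)` raises `ValueError` instead of producing the literal; on the receiving side, `strict_loads` rejects it at parse time, before any parameter validation runs.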
GHSA-7h4p-rffg-7823