llama.cpp CVE-2024-21836
HIGH · Severity by source
AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
Primary rating from NVD · only source for this CVE.
CVSS Vector (NVD)
CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
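The HIGH rating follows from the vector by the CVSS v3.1 base-score formula. The sketch below reproduces that arithmetic using the metric weights from the FIRST specification. It handles only Scope: Unchanged vectors such as this one, and the function names are illustrative rather than part of any official tool.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope: Unchanged only).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR = {"N": 0.85, "L": 0.62, "H": 0.27}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value):
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for a Scope: Unchanged CVSS v3.1 vector."""
    parts = vector.split("/")
    if parts[0].startswith("CVSS:"):
        parts = parts[1:]  # drop the "CVSS:3.1" prefix if present
    m = dict(p.split(":") for p in parts)
    if m["S"] != "U":
        raise ValueError("this sketch only covers Scope: Unchanged")
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * PR[m["PR"]] * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


def severity(score):
    """Map a base score to its qualitative rating."""
    if score == 0:
        return "NONE"
    if score < 4.0:
        return "LOW"
    if score < 7.0:
        return "MEDIUM"
    if score < 9.0:
        return "HIGH"
    return "CRITICAL"
```

For this CVE, `base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H")` evaluates to 8.8, which `severity` maps to HIGH. The UI:R metric (the victim must load the file) is what keeps the score below the CRITICAL band.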
Description (CVE.org)
A heap-based buffer overflow vulnerability exists in the GGUF library header.n_tensors functionality of llama.cpp Commit 18c2e17. A specially crafted .gguf file can lead to code execution. An attacker can provide a malicious file to trigger this vulnerability.
Analysis (AI)
Heap-based buffer overflow in llama.cpp's GGUF library header parser (commit 18c2e17) enables code execution when a victim loads a maliciously crafted .gguf model file. A CWE-190 integer overflow triggered by the n_tensors header field produces an undersized heap allocation, so the writes that follow corrupt the heap with attacker-controlled data. Publicly available exploit code exists, but EPSS remains low at 0.15% (35th percentile), and CISA KEV does not list the CVE as actively exploited.
Technical Context (AI)
llama.cpp is a widely used C/C++ inference engine for LLaMA-family large language models, distributed by the ggml project (CPE cpe:2.3:a:ggml:llama.cpp). The GGUF file format is its native serialization for model weights and metadata, replacing the older GGML format. The vulnerability resides in header parsing, where the n_tensors field (an attacker-controlled count) feeds into heap allocation arithmetic. CWE-190 (Integer Overflow or Wraparound) describes the root cause: oversized n_tensors values wrap during size computation, producing an undersized buffer that subsequent tensor metadata writes overflow, corrupting adjacent heap chunks.
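The wraparound can be modeled in a few lines. This is a sketch of the arithmetic only, not llama.cpp's code: it assumes a 64-bit size_t, and both the 88-byte element size and the function names are hypothetical stand-ins for whatever per-tensor structure the parser allocates.

```python
SIZE_MAX = 2**64 - 1  # assumption: size_t is 64 bits wide


def wrapped_alloc_size(n_tensors, elem_size):
    """Size an unchecked C multiplication in size_t would yield (mod 2**64)."""
    return (n_tensors * elem_size) & SIZE_MAX


def checked_alloc_size(n_tensors, elem_size):
    """Same computation, but reject counts whose product would not fit."""
    if elem_size != 0 and n_tensors > SIZE_MAX // elem_size:
        raise OverflowError("n_tensors * elem_size exceeds SIZE_MAX")
    return n_tensors * elem_size


# Hypothetical per-tensor record size, chosen only to make the numbers easy.
ELEM_SIZE = 88
# 88 * 2**61 is an exact multiple of 2**64, so only the "+ 1" survives.
HOSTILE_COUNT = 2**61 + 1
```

With these values, `wrapped_alloc_size(HOSTILE_COUNT, ELEM_SIZE)` is 88: room for a single record, while the parsing loop would still expect to fill 2**61 + 1 of them. `checked_alloc_size` raises `OverflowError` for the same input. That division-based test is the usual shape of a fix for this bug class, though the actual upstream patch may differ.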
Remediation (AI)
An upstream fix is available (PR/commit), but a released patched version could not be independently confirmed from the provided data. Update llama.cpp to a current main-branch build later than commit 18c2e17 and rebuild any downstream wrappers (Python bindings, server binaries) that statically link the GGUF parser. Consult the Cisco Talos advisory referenced by talos-cna@cisco.com for the exact remediation commit. Until updated, treat .gguf files as untrusted input: load model files only from cryptographically verified sources, validate checksums against publisher signatures, and isolate inference workloads in containers or sandboxed user accounts with no access to credentials or sensitive data. Avoid auto-loading models supplied through web uploads or chat attachments; this removes a convenient delivery vector at the cost of some workflow friction.
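As an interim measure, the checksum validation above can be combined with a quick look at the header before the file reaches llama.cpp. The sketch below is a pre-screen, not a substitute for patching. It assumes the little-endian GGUF v2+ header layout (4-byte magic, uint32 version, uint64 tensor count, uint64 key/value count). `MAX_TENSORS` is a hypothetical policy limit, and the expected digest must come from a publisher channel you already trust.

```python
import hashlib
import hmac
import struct

GGUF_MAGIC = b"GGUF"
MAX_TENSORS = 1_000_000  # hypothetical policy limit, not an upstream constant


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in chunks so large models need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_counts(path):
    """Return (version, n_tensors, n_kv) from a little-endian GGUF v2+ header."""
    with open(path, "rb") as fh:
        header = fh.read(24)
    if len(header) < 24 or header[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    return struct.unpack("<IQQ", header[4:24])


def is_acceptable(path, expected_sha256):
    """True only if the digest matches and the tensor count looks sane."""
    actual = sha256_of(path)
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        return False
    try:
        version, n_tensors, _n_kv = read_counts(path)
    except ValueError:
        return False
    return version >= 2 and n_tensors <= MAX_TENSORS
```

A loader wrapper would call `is_acceptable(path, published_digest)` and refuse to pass the file on when it returns `False`. The digest check is the stronger control; the tensor-count bound only filters the most obvious malformed headers.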