llama.cpp CVE-2024-21825
HIGH · Severity by source
AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
Primary rating from NVD · only source for this CVE.
CVSS Vector (NVD)
CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
Description (CVE.org)
A heap-based buffer overflow vulnerability exists in the GGUF library GGUF_TYPE_ARRAY/GGUF_TYPE_STRING parsing functionality of llama.cpp Commit 18c2e17. A specially crafted .gguf file can lead to code execution. An attacker can provide a malicious file to trigger this vulnerability.
Analysis (AI)
Remote code execution in llama.cpp (commit 18c2e17) is possible when a victim loads a malicious .gguf model file, triggering a heap-based buffer overflow in the GGUF library's GGUF_TYPE_ARRAY/GGUF_TYPE_STRING parsing routines. Publicly available exploit code exists, though EPSS rates near-term mass exploitation probability as low (0.19%, 41st percentile) and the issue is not listed in CISA KEV.
Technical Context (AI)
llama.cpp is a widely used C/C++ inference runtime for LLaMA-family large language models, and GGUF (GPT-Generated Unified Format) is its native serialization format for model weights and metadata. The CPE cpe:2.3:a:ggml:llama.cpp confirms the ggml/llama.cpp project as the affected codebase. The root cause is classified as CWE-190 (Integer Overflow or Wraparound): during parsing of GGUF_TYPE_ARRAY and GGUF_TYPE_STRING fields, attacker-controlled length values are used in size calculations that wrap or are mishandled, producing an undersized heap allocation followed by an oversized copy - the classic integer-overflow-to-heap-buffer-overflow pattern. Because GGUF files are routinely shared on hubs like Hugging Face and dropped into local inference tooling, the parser sits directly on an untrusted-input boundary.
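The wraparound described above can be illustrated with a short sketch. This is not llama.cpp's actual code: the function names, the 64-bit width, and the `bytes_remaining` parameter are assumptions chosen for illustration, and Python's unbounded integers are masked to emulate C's unsigned arithmetic.

```python
U64_MASK = (1 << 64) - 1  # emulate 64-bit unsigned arithmetic (assumed width)


def wrapped_alloc_size(count: int, elem_size: int) -> int:
    """Size an unchecked 64-bit multiplication would produce (hypothetical helper)."""
    return (count * elem_size) & U64_MASK


def checked_alloc_size(count: int, elem_size: int, bytes_remaining: int) -> int:
    """Reject counts whose product would wrap or exceed the data left in the file."""
    if elem_size <= 0 or count < 0:
        raise ValueError("invalid element size or count")
    if count > U64_MASK // elem_size:
        raise ValueError("count * elem_size overflows 64 bits")
    size = count * elem_size
    if size > bytes_remaining:
        raise ValueError("declared array is larger than the remaining file data")
    return size


# Attacker-chosen element count: (2**61 + 1) elements of 8 bytes each.
evil_count = (1 << 61) + 1
print(wrapped_alloc_size(evil_count, 8))  # the product wraps to 8 bytes
```

With the unchecked calculation, the parser would allocate 8 bytes and then attempt to copy far more. The checked variant rejects the same input before any allocation, and also bounds the declared size by the bytes actually left in the file.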
RemediationAI
An upstream fix is available (PR/commit), but a released patched version has not been independently confirmed. Rebuild llama.cpp from a current upstream master that postdates commit 18c2e17 and includes the GGUF parser hardening referenced in the Talos disclosure (see https://talosintelligence.com/vulnerability_reports/ for the corresponding TALOS report). Downstream consumers such as llama-cpp-python, Ollama, LM Studio, and text-generation-webui should be updated to a build that vendors the fixed llama.cpp commit. Until rebuilt, the most effective compensating control is to load only GGUF files from trusted, integrity-verified sources (signed hashes from the original model publisher) and to refuse files of unexpected size or origin; treat GGUFs from anonymous Hugging Face uploads or chat attachments as executable content. Where possible, run inference in a sandbox (container with no network, seccomp/AppArmor profile, dedicated low-privilege user) so that a parser compromise cannot pivot; the trade-off is added operational complexity and possible GPU passthrough friction.
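The integrity and size checks suggested as a compensating control might look like the following minimal sketch. The function name `verify_model_file`, its parameters, and the idea of a caller-supplied size cap are assumptions for illustration; the expected digest must come from a source you already trust. The four-byte `GGUF` magic check is only a cheap sanity test.

```python
import hashlib
import hmac
import os

GGUF_MAGIC = b"GGUF"  # first four bytes of a GGUF file


def verify_model_file(path: str, expected_sha256: str, max_bytes: int) -> None:
    """Raise ValueError unless the file passes size, magic, and SHA-256 checks.

    Hypothetical pre-load gate; it does not make a vulnerable parser safe.
    """
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(f"file is {size} bytes, larger than the allowed {max_bytes}")

    digest = hashlib.sha256()
    with open(path, "rb") as f:
        head = f.read(len(GGUF_MAGIC))
        if head != GGUF_MAGIC:
            raise ValueError("file does not start with the GGUF magic bytes")
        digest.update(head)
        # Hash the rest in 1 MiB chunks so large models are not read into memory at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)

    if not hmac.compare_digest(digest.hexdigest(), expected_sha256.strip().lower()):
        raise ValueError("SHA-256 digest does not match the expected value")
```

A caller would run this before handing the path to any llama.cpp-based loader and refuse to load on `ValueError`. A matching hash only establishes that the file is the one the publisher released; a malicious file from a trusted-looking publisher would still reach the parser, so this complements rather than replaces rebuilding and sandboxing.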