
llama.cpp CVE-2024-23605

HIGH
Integer Overflow or Wraparound (CWE-190)
2024-02-26 talos-cna@cisco.com
8.8
CVSS 3.1 · NVD

Severity by source

NVD PRIMARY
8.8 HIGH
AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H

Primary rating from NVD · only source for this CVE.

CVSS Vector (NVD)

CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
Attack Vector: Network
Attack Complexity: Low
Privileges Required: None
User Interaction: Required
Scope: Unchanged
Confidentiality: High
Integrity: High
Availability: High
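
The 8.8 base score can be reproduced from the vector using the CVSS v3.1 base equations for an unchanged scope. The sketch below works through that arithmetic. The function names are illustrative only; the numeric weights are the standard v3.1 values for the metric values listed (Network 0.85, Low complexity 0.77, no privileges 0.85, interaction required 0.62, High impact 0.56).

```c
/* CVSS v3.1 "Roundup": smallest one-decimal value >= x for non-negative x.
   Working on a scaled integer avoids floating-point edge cases. */
static double cvss_roundup(double x) {
    long scaled = (long)(x * 100000.0 + 0.5);   /* round to 5 decimals */
    if (scaled % 10000 == 0)
        return scaled / 100000.0;
    return (scaled / 10000 + 1) / 10.0;
}

/* Base score for a Scope:Unchanged vector, given numeric metric weights.
   Hypothetical helper name, used only for this illustration. */
double cvss31_base_scope_unchanged(double av, double ac, double pr, double ui,
                                   double c, double i, double a) {
    double iss = 1.0 - (1.0 - c) * (1.0 - i) * (1.0 - a);
    double impact = 6.42 * iss;
    double exploitability = 8.22 * av * ac * pr * ui;
    double sum = impact + exploitability;

    if (impact <= 0.0)
        return 0.0;
    if (sum > 10.0)
        sum = 10.0;
    return cvss_roundup(sum);
}

/* AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H expressed as v3.1 weights. */
double cve_2024_23605_base_score(void) {
    return cvss31_base_scope_unchanged(0.85, 0.77, 0.85, 0.62,
                                       0.56, 0.56, 0.56);
}
```

With these inputs the impact sub-score is about 5.87 and the exploitability sub-score about 2.84; their sum, roughly 8.71, rounds up to 8.8.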

Description (CVE.org)

A heap-based buffer overflow vulnerability exists in the GGUF library header.n_kv functionality of llama.cpp Commit 18c2e17. A specially crafted .gguf file can lead to code execution. An attacker can provide a malicious file to trigger this vulnerability.

Analysis (AI)

A heap-based buffer overflow in llama.cpp's GGUF library lets an attacker execute arbitrary code by tricking a user into loading a maliciously crafted .gguf model file. The flaw lies in the header.n_kv parsing logic at commit 18c2e17. Publicly available exploit code exists, though EPSS rates the probability of real-world exploitation as low at 0.15% (35th percentile), which is consistent with the user-interaction requirement. The flaw was reported by Cisco Talos and affects the confidentiality, integrity, and availability of any system that loads untrusted GGUF models.

Technical Context (AI)

llama.cpp is a widely used C/C++ inference engine for running LLaMA-family large language models locally. GGUF is its native binary model format, which encodes tensor data alongside key-value metadata in the file header. The vulnerability sits in the GGUF parser's handling of the header.n_kv field, which declares how many key-value metadata entries follow. The CWE-190 (Integer Overflow or Wraparound) classification indicates that an attacker-controlled count is used in arithmetic, likely a multiplication for allocation sizing, that wraps around and produces an undersized heap buffer. Populating the entries then writes past the end of that buffer. The affected CPE cpe:2.3:a:ggml:llama.cpp covers the upstream ggml-org project. Because GGUF is the de facto model format consumed by many downstream tools (Ollama, LM Studio, text-generation-webui, llama-cpp-python bindings), the parser code is embedded well beyond the upstream binary itself.
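
The wraparound pattern described above can be sketched in a few lines of C. This is an illustration under assumptions, not the actual llama.cpp code: `struct kv_entry`, `unchecked_alloc_size`, and `alloc_kv_checked` are hypothetical names, and the struct layout is a stand-in for whatever the real parser allocates per metadata entry.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one parsed metadata entry;
   not llama.cpp's real structure. */
struct kv_entry {
    char    *key;
    uint32_t type;
    uint64_t value;
};

/* Unchecked sizing: the multiplication is done in size_t, so a large
   attacker-supplied count wraps around and yields a small byte count. */
size_t unchecked_alloc_size(uint64_t n_kv) {
    return (size_t)n_kv * sizeof(struct kv_entry);
}

/* Checked allocation: reject counts whose byte size would not fit in
   size_t before allocating. Returns NULL on rejection or on
   allocation failure. */
struct kv_entry *alloc_kv_checked(uint64_t n_kv) {
    if (n_kv == 0 || n_kv > SIZE_MAX / sizeof(struct kv_entry))
        return NULL;
    return calloc((size_t)n_kv, sizeof(struct kv_entry));
}
```

A count just above `SIZE_MAX / sizeof(struct kv_entry)` makes the unchecked product smaller than a single entry, which is how a loop that then writes `n_kv` entries ends up far outside the buffer. A real parser would probably also cap the count at a plausible maximum rather than rely on the overflow check alone.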

Remediation (AI)

An upstream fix is available as a commit, per the Cisco Talos disclosure (TALOS-2024-1903); a patched release version could not be independently confirmed from the provided data. Operators should therefore update to a llama.cpp main-branch revision newer than commit 18c2e17 and rebuild, then verify that downstream wrappers (llama-cpp-python, Ollama, etc.) have moved to a llama.cpp revision that includes the GGUF parser fix. As a compensating control until patched, restrict .gguf model loading to files from cryptographically verified sources (for example, signed releases from official Hugging Face repos with known publishers), and avoid loading community-uploaded or unsigned GGUF files; the trade-off is less flexibility for model experimentation. Network-segment inference workloads so they cannot fetch arbitrary models from the public internet, accepting the operational cost of curating an allowlist of trusted model sources. Refer to the Cisco Talos advisory referenced in the CVE record for the exact commit hash of the fix.
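
One way to enforce an allowlist of trusted models is to compare a file's SHA-256 digest against known-good values before the file ever reaches the GGUF parser. The sketch below covers only the comparison step and assumes the digest has already been computed by a separate tool such as `sha256sum`. The function names are hypothetical, and the single allowlist entry is a placeholder (the well-known SHA-256 of empty input), not the digest of any real model.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Placeholder allowlist. The entry below is the SHA-256 of empty input,
   used only as a sample value; replace with digests of vetted models. */
static const char *const TRUSTED_SHA256[] = {
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
};

/* Case-insensitive comparison of two 64-character hex digests.
   Returns 0 if the candidate contains a non-hex character. */
static int hex_digest_equal(const char *candidate, const char *trusted) {
    for (size_t i = 0; i < 64; i++) {
        unsigned char c = (unsigned char)candidate[i];
        unsigned char t = (unsigned char)trusted[i];
        if (!isxdigit(c) || tolower(c) != tolower(t))
            return 0;
    }
    return 1;
}

/* Returns 1 if hex_digest matches an allowlisted digest, else 0. */
int model_digest_is_trusted(const char *hex_digest) {
    size_t count = sizeof TRUSTED_SHA256 / sizeof TRUSTED_SHA256[0];

    if (hex_digest == NULL || strlen(hex_digest) != 64)
        return 0;
    for (size_t i = 0; i < count; i++) {
        if (hex_digest_equal(hex_digest, TRUSTED_SHA256[i]))
            return 1;
    }
    return 0;
}
```

A loader wrapper would call `model_digest_is_trusted` and refuse to open the file on a zero result. A digest check only establishes that the file matches one that was previously vetted; it does not make an unpatched parser safe against a file that was malicious when it was added to the list.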


CVE-2024-23605 vulnerability details – vuln.today
