
llama.cpp CVE-2024-23496

Severity: HIGH (8.8, CVSS 3.1 · NVD)
Weakness: Integer Overflow or Wraparound (CWE-190)
2024-02-26 · talos-cna@cisco.com

Severity by source

NVD (primary): 8.8 HIGH
AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H

Primary rating from NVD · only source for this CVE.

CVSS Vector (NVD)

CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
Attack Vector: Network
Attack Complexity: Low
Privileges Required: None
User Interaction: Required
Scope: Unchanged
Confidentiality: High
Integrity: High
Availability: High

Description (CVE.org)

A heap-based buffer overflow vulnerability exists in the GGUF library gguf_fread_str functionality of llama.cpp Commit 18c2e17. A specially crafted .gguf file can lead to code execution. An attacker can provide a malicious file to trigger this vulnerability.

Analysis (AI)

llama.cpp at commit 18c2e17 is vulnerable to remote code execution when the GGUF library's gguf_fread_str function parses a maliciously crafted .gguf model file: an integer overflow in length handling (CWE-190) leads to a heap-based buffer overflow. Any user or service that loads an untrusted GGUF model into a vulnerable llama.cpp build can be compromised. Publicly available exploit code lowers the barrier to attack, even though the EPSS score is low (0.15%).

Technical Context (AI)

llama.cpp is a widely deployed C/C++ inference engine for LLaMA-family large language models, used in local LLM runners, chat frontends, and embedded AI tooling. GGUF (GPT-Generated Unified Format) is its binary model serialization format, and gguf_fread_str is the helper that reads length-prefixed strings from .gguf files. CWE-190 (Integer Overflow or Wraparound) indicates that an attacker-controlled length field is used in size arithmetic without sufficient bounds checking. The arithmetic wraps around, producing an undersized heap allocation that is followed by an oversized copy. This is a classic heap buffer overflow primitive that can corrupt adjacent heap metadata or function pointers and yield arbitrary code execution. The affected CPE cpe:2.3:a:ggml:llama.cpp confirms the ggml-maintained upstream project as the impacted component.
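The wraparound described above can be illustrated with a minimal C sketch. This is not the upstream llama.cpp source: the function names, the SKETCH_MAX_STR_LEN cap, and the host-byte-order read of the 8-byte length prefix are all assumptions made for illustration. It shows how a length of UINT64_MAX turns "length + 1" into 0, and how a reader might validate the length before allocating.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical cap chosen for this sketch; not a value taken from llama.cpp. */
#define SKETCH_MAX_STR_LEN ((uint64_t)1 << 20)

/* The unchecked pattern: size = length + 1 for a trailing NUL.
 * Unsigned arithmetic wraps modulo 2^64, so UINT64_MAX + 1 becomes 0. */
uint64_t naive_alloc_size(uint64_t len) {
    return len + 1;
}

/* Reads an 8-byte length prefix followed by that many bytes.
 * `remaining` is the number of bytes the caller knows are left in the file.
 * Returns a NUL-terminated heap buffer (caller frees) or NULL on any
 * failed check. Assumes the prefix is stored in host byte order. */
char *read_str_checked(FILE *f, uint64_t remaining, uint64_t *out_len) {
    assert(f != NULL && out_len != NULL);

    uint64_t len = 0;
    if (remaining < sizeof len) return NULL;
    if (fread(&len, sizeof len, 1, f) != 1) return NULL;
    remaining -= sizeof len;

    /* Reject lengths that cannot fit in the file or exceed the cap.
     * After this check, len + 1 cannot wrap. */
    if (len > remaining || len > SKETCH_MAX_STR_LEN) return NULL;

    char *buf = calloc((size_t)len + 1, 1);
    if (buf == NULL) return NULL;

    if (len > 0 && fread(buf, 1, (size_t)len, f) != (size_t)len) {
        free(buf);
        return NULL;
    }
    *out_len = len;
    return buf;
}
```

The key design choice in this sketch is to compare the length against the bytes actually remaining in the file before doing any arithmetic on it, so the allocation size and the read size can never disagree.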

Remediation (AI)

An upstream fix is available (PR/commit), but a released patched version could not be independently confirmed from the provided data. Consult the Talos Intelligence advisory and the ggerganov/llama.cpp GitHub repository for the commit that follows 18c2e17, and rebuild against that commit or a later release. As a compensating control, load .gguf files only from trusted, signature-verified sources and reject models obtained from untrusted hubs or user uploads; this limits attack surface at the cost of model availability. Where third-party models must be supported, run llama.cpp inside a sandbox (seccomp, a container with no network, or a separate low-privilege user) so that successful exploitation does not yield the calling application's privileges, accepting the operational complexity that sandboxing adds. Disabling GGUF file ingestion at upload boundaries, or pre-screening it with size/length sanity checks, provides partial mitigation but is not a substitute for patching the parser.
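As an illustration of the size/length sanity checks mentioned above, the following C sketch pre-screens an uploaded buffer before it is handed to the parser. It assumes the GGUF v2/v3 header layout (4-byte "GGUF" magic, 32-bit version, 64-bit tensor count, 64-bit metadata key/value count, all little-endian); sketch_gguf_prescreen and load_le64 are hypothetical names, not llama.cpp APIs. A check like this only rejects obviously implausible headers and does not validate the individual string lengths that gguf_fread_str consumes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode a little-endian 64-bit value byte by byte, so the result does
 * not depend on host endianness. */
uint64_t load_le64(const unsigned char *p) {
    assert(p != NULL);
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Hypothetical pre-screen: returns false if the header is malformed or
 * declares more entries than the file could possibly hold. */
bool sketch_gguf_prescreen(const unsigned char *buf, size_t size) {
    const size_t header = 4 + 4 + 8 + 8; /* magic, version, two counts */
    if (buf == NULL || size < header) return false;
    if (memcmp(buf, "GGUF", 4) != 0) return false;

    uint64_t tensor_count = load_le64(buf + 8);
    uint64_t kv_count     = load_le64(buf + 16);
    uint64_t body         = (uint64_t)size - header;

    /* Every metadata key and tensor name starts with an 8-byte length,
     * so each entry needs at least 8 bytes. Dividing instead of
     * multiplying keeps this check itself free of overflow. */
    if (kv_count > body / 8) return false;
    if (tensor_count > body / 8) return false;
    return true;
}
```

A file that passes this screen can still be malicious, so it is best treated as a cheap first filter in front of a patched, sandboxed parser.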


CVE-2024-23496 vulnerability details – vuln.today
