Linux CVE-2025-38436
MEDIUM
CVSS Vector (NVD)
CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
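The MEDIUM rating follows from the vector above. As a minimal sketch, the base score can be recomputed with the metric weights and round-up rule published in the CVSS v3.1 specification. The function names `base_score`, `roundup`, and `severity` are illustrative, and the sketch only handles Scope: Unchanged vectors such as this one.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope: Unchanged values for PR).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value):
    """Round up to one decimal place, following the v3.1 integer-based rule."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score of a CVSS v3.1 vector with Scope: Unchanged."""
    # Skip the "CVSS:3.1" prefix and map each metric name to its value.
    parts = dict(item.split(":") for item in vector.split("/")[1:])
    if parts["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {name: WEIGHTS[name][parts[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


def severity(score):
    """Map a base score to its qualitative severity rating."""
    if score == 0:
        return "NONE"
    if score < 4.0:
        return "LOW"
    if score < 7.0:
        return "MEDIUM"
    if score < 9.0:
        return "HIGH"
    return "CRITICAL"


score = base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H")
print(score, severity(score))
```

Only the availability metric contributes to impact here (C:N, I:N, A:H), which is why a local, low-privilege denial of service lands in the MEDIUM band.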
Description (NVD)
In the Linux kernel, the following vulnerability has been resolved:
drm/scheduler: signal scheduled fence when kill job
When an entity from application B is killed, drm_sched_entity_kill() removes all jobs belonging to that entity through drm_sched_entity_kill_jobs_work(). If application A's job depends on a scheduled fence from application B's job, and that fence is not properly signaled during the killing process, application A's dependency cannot be cleared.
This leads to application A hanging indefinitely while waiting for a dependency that will never be resolved. Fix this issue by ensuring that scheduled fences are properly signaled when an entity is killed, allowing dependent applications to continue execution.
Analysis (AI)
The Linux kernel DRM scheduler fails to signal the scheduled fence of a job when the job's entity is killed, so applications whose jobs depend on that fence hang indefinitely. A local user with low privileges can trigger this denial of service by killing an application whose queued GPU jobs other applications depend on. The vulnerability affects multiple Linux kernel versions and has been patched upstream.
Technical Context (AI)
The vulnerability is in the Linux kernel's Direct Rendering Manager (DRM) scheduler, in the code that tears down a scheduler entity. The DRM scheduler coordinates GPU job execution across multiple applications and uses fences to track dependencies between jobs. When drm_sched_entity_kill() terminates an entity (representing one application's GPU work), it removes all of that entity's jobs through drm_sched_entity_kill_jobs_work(). The cleanup does not signal the scheduled fence of each killed job: if Application A's job depends on the scheduled fence of Application B's job and Application B's entity is killed, Application A's job stays blocked on a signal that never arrives. The entry labels this CWE-667 (Improper Locking), although the defect is more precisely a missed notification: cleanup in one execution context fails to wake the contexts that depend on it.
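The dependency hang can be illustrated with a small toy model. This is a sketch under stated assumptions, not kernel code: `Fence`, `Job`, `kill_job`, and `simulate` are hypothetical names, each job carries one "scheduled" and one "finished" fence, and the `signal_scheduled` flag stands in for the behaviour before and after the fix.

```python
class Fence:
    """Toy one-shot fence: signaled at most once, runs callbacks when signaled."""

    def __init__(self, name):
        self.name = name
        self.signaled = False
        self._callbacks = []

    def add_callback(self, callback):
        # A callback added after signaling runs immediately.
        if self.signaled:
            callback()
        else:
            self._callbacks.append(callback)

    def signal(self):
        if self.signaled:
            return
        self.signaled = True
        callbacks, self._callbacks = self._callbacks, []
        for callback in callbacks:
            callback()


class Job:
    """Toy job with a scheduled fence and a finished fence (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.scheduled = Fence(name + ".scheduled")
        self.finished = Fence(name + ".finished")
        self.runnable = True  # no pending dependency yet

    def depend_on(self, fence):
        # The job becomes runnable again only once the fence is signaled.
        self.runnable = False
        fence.add_callback(self._dependency_cleared)

    def _dependency_cleared(self):
        self.runnable = True


def kill_job(job, signal_scheduled):
    """Tear down a queued job; signal_scheduled=False models the unfixed path."""
    if signal_scheduled:
        job.scheduled.signal()
    job.finished.signal()


def simulate(signal_scheduled):
    """Return True if application A's job can proceed after B's job is killed."""
    job_b = Job("B")
    job_a = Job("A")
    job_a.depend_on(job_b.scheduled)
    kill_job(job_b, signal_scheduled)
    return job_a.runnable


print("without scheduled-fence signal:", simulate(False))
print("with scheduled-fence signal:", simulate(True))
```

In the model, signaling only the finished fence leaves A blocked because A was waiting on the scheduled fence, which mirrors the failure mode in the description.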
Remediation (AI)
Update the Linux kernel to a version that includes the upstream fix. Apply kernel security updates from the stable tree branches, and check your distribution's kernel update channel for patched versions corresponding to the referenced commits (471db2c2d4f, 8342127a8a6, aa382a8b6ed, aefd0a9356, c5734f9bab6).
For systems that cannot be patched immediately, compensating controls can limit the impact. Isolate unrelated applications (for example in separate cgroups and namespaces) and disable GPU sharing between untrusted applications where feasible, so that cross-application GPU job dependency chains are less likely to form. Add application health monitoring with restart policies, such as systemd Restart= settings (optionally with WatchdogSec= for services that send watchdog keep-alives), so that hung applications are terminated and restarted rather than holding system resources indefinitely. These controls reduce the availability impact but do not address the underlying vulnerability, and aggressive timeouts may also kill legitimate long-running GPU workloads.
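The monitoring-and-restart control described above can be sketched as a small supervisor. This is an illustration only: `run_with_watchdog` is a hypothetical helper, the timeout and restart count are arbitrary, and a process blocked in an uninterruptible kernel wait may not respond to the kill, in which case this approach does not help.

```python
import subprocess
import sys


def run_with_watchdog(command, timeout_s, max_restarts):
    """Run command; if it exceeds timeout_s, kill it and start it again.

    Returns a list with one (status, returncode) tuple per attempt.
    """
    outcomes = []
    for _attempt in range(max_restarts + 1):
        try:
            # subprocess.run kills the child and waits for it when the timeout expires.
            result = subprocess.run(command, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            outcomes.append(("timed_out", None))
            continue
        outcomes.append(("exited", result.returncode))
        break
    return outcomes


if __name__ == "__main__":
    # Stand-in for a GPU application that never returns.
    hung_app = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_watchdog(hung_app, timeout_s=1.0, max_restarts=2))
```

A supervisor like this only bounds how long a hang lasts; the dependent job is still lost each time the application is restarted.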