CWE-36 CVEs and Security Vulnerabilities

Search

Use the Query Builder to create your own search query, or check out the documentation to learn the search syntax.

Search Examples

CVEs in KEV CVEs with EPSS >= 80% Crit. Microsoft High Apache SQL Injection (CWE-89) Linux Kernel High (CVSS 3.1) Apache Struts RCE (Remote Code Execution) XSS (CWE-79) Critical (CVSS 4.0) CVEs to check

Search Results (4047 CVEs found)

Export CSV

CVE	Vendors	Products	Updated	CVSS v3.1
CVE-2026-53259	1 Linux	1 Linux Kernel	2026-06-28	7.8 High
In the Linux kernel, the following vulnerability has been resolved: ipv6: anycast: insert aca into global hash under idev->lock syzbot reported a splat [1]: a slab-use-after-free in ipv6_chk_acast_addr(), which walks the global inet6_acaddr_lst[] hash under RCU and dereferences a struct ifacaddr6 that has already been freed while still linked in the hash, so a later reader walks into a dangling node. In __ipv6_dev_ac_inc() the aca is allocated with refcount 1, then aca_get() bumps it to 2 to keep it alive across the unlocked region. It is published to idev->ac_list under idev->lock, but ipv6_add_acaddr_hash() runs after write_unlock_bh(). A concurrent teardown (ipv6_ac_destroy_dev() from addrconf_ifdown(), under RTNL) can slip into that window: CPU0 __ipv6_dev_ac_inc CPU1 ipv6_ac_destroy_dev (RTNL) ------------------------------ ------------------------------------ aca_alloc() refcnt 1 aca_get() refcnt 2 write_lock_bh(idev->lock) add aca to ac_list write_unlock_bh(idev->lock) write_lock_bh(idev->lock) pull aca off ac_list write_unlock_bh(idev->lock) ipv6_del_acaddr_hash(aca) hlist_del_init_rcu() is a no-op, aca is not in the hash yet aca_put() refcnt 2->1 ipv6_add_acaddr_hash(aca) aca now inserted into the hash aca_put() refcnt 1->0 call_rcu(aca_free_rcu) -> kfree(aca) The hash removal becomes a no-op because the insertion has not happened yet, so once CPU0 inserts and drops the last reference, the aca is freed while still linked in inet6_acaddr_lst[], and readers dereference freed memory after the slab slot is reused. This window opened once RTNL stopped serializing the join path against device teardown. Move ipv6_add_acaddr_hash() inside the idev->lock section so the ac_list and hash insertions are atomic with respect to teardown: a racing remover now either misses the aca entirely or finds it in both lists. acaddr_hash_lock is now nested under idev->lock, which is acquired in softirq context, so switch all acaddr_hash_lock sites to spin_lock_bh() to avoid the irq lock inversion reported in [2]. [1] https://syzkaller.appspot.com/bug?extid=a01df04303c131efbf3a [2] https://lore.kernel.org/netdev/[email protected]/
CVE-2026-53256	1 Linux	1 Linux Kernel	2026-06-28	8 High
In the Linux kernel, the following vulnerability has been resolved: Bluetooth: RFCOMM: hold listener socket in rfcomm_connect_ind() rfcomm_get_sock_by_channel() scans rfcomm_sk_list under the list lock, but returns the selected listener after dropping that lock without taking a reference. rfcomm_connect_ind() then locks the listener, queues a child socket on it, and may notify it after unlocking it. The buggy scenario involves two paths, with each column showing the order within that path: rfcomm_connect_ind(): listener close: 1. Find parent in 1. close() enters rfcomm_get_sock_by_channel() rfcomm_sock_release(). 2. Drop rfcomm_sk_list.lock 2. rfcomm_sock_shutdown() without pinning parent. closes the listener. 3. Call lock_sock(parent) and 3. rfcomm_sock_kill() bt_accept_enqueue(parent, unlinks and puts parent. sk, true). 4. Read parent flags and may 4. parent can be freed. call sk_state_change(). If close wins the race, parent can be freed before rfcomm_connect_ind() reaches lock_sock(), bt_accept_enqueue(), or the deferred-setup callback. Take a reference on the listener before leaving rfcomm_sk_list.lock. After lock_sock() succeeds, recheck that it is still in BT_LISTEN before queueing a child, cache the deferred-setup bit while the parent is locked, and drop the reference after the last parent use. KASAN reported a slab-use-after-free in lock_sock_nested() from rfcomm_connect_ind(), with the freeing stack going through rfcomm_sock_kill() and rfcomm_sock_release().
CVE-2026-53240	1 Linux	1 Linux Kernel	2026-06-28	8.8 High
In the Linux kernel, the following vulnerability has been resolved: xfrm: iptfs: fix use-after-free on first_skb in __input_process_payload __input_process_payload() stores first_skb into xtfs->ra_newskb under drop_lock when starting partial reassembly, then unlocks and breaks out of the processing loop. The post-loop check reads xtfs->ra_newskb without the lock to decide whether first_skb is still owned: if (first_skb && first_iplen && !defer && first_skb != xtfs->ra_newskb) Between spin_unlock and this read, a concurrent CPU running iptfs_reassem_cont() (or the drop_timer hrtimer) can complete reassembly, NULL xtfs->ra_newskb, and free the skb. The check then evaluates first_skb != NULL as true, and pskb_trim/ip_summed/consume_skb operate on the freed skb — a use-after-free in skbuff_head_cache. Replace the unlocked read with a local bool that records whether first_skb was handed to the reassembly state in the current call. The flag is set after the existing spin_unlock, before the break, using the pointer equality that is stable at that point (first_skb == skb iff first_skb was stored in ra_newskb).
CVE-2026-53239	1 Linux	1 Linux Kernel	2026-06-28	7.8 High
In the Linux kernel, the following vulnerability has been resolved: xfrm: policy: fix use-after-free on inexact bin in xfrm_policy_bysel_ctx() Fix the race by pruning the bin while still holding xfrm_policy_lock, before dropping it. Use __xfrm_policy_inexact_prune_bin() directly since the lock is already held. The wrapper xfrm_policy_inexact_prune_bin() becomes unused and is removed. Race: CPU0 (XFRM_MSG_DELPOLICY) CPU1 (XFRM_MSG_NEWSPDINFO) ========================== ========================== xfrm_policy_bysel_ctx(): spin_lock_bh(xfrm_policy_lock) bin = xfrm_policy_inexact_lookup() __xfrm_policy_unlink(pol) spin_unlock_bh(xfrm_policy_lock) xfrm_policy_kill(ret) // wide window, lock not held xfrm_hash_rebuild(): spin_lock_bh(xfrm_policy_lock) __xfrm_policy_inexact_flush(): kfree_rcu(bin) // bin freed spin_unlock_bh(xfrm_policy_lock) xfrm_policy_inexact_prune_bin(bin) // UAF: bin is freed
CVE-2026-53189	1 Linux	1 Linux Kernel	2026-06-28	7.8 High
In the Linux kernel, the following vulnerability has been resolved: mm/huge_memory: update file PMD counter before folio_put() __split_huge_pmd_locked() updates the file/shmem RSS counter after dropping the PMD mapping's folio reference. If folio_put() drops the last reference, mm_counter_file() can later read freed folio state via folio_test_swapbacked(). Move the counter update before folio_put().
CVE-2026-53185	1 Linux	1 Linux Kernel	2026-06-28	7.8 High
In the Linux kernel, the following vulnerability has been resolved: zram: fix use-after-free in zram_bvec_write_partial() zram_read_page() picks the sync or async backing device read path based on whether the parent bio is NULL. zram_bvec_write_partial() passes its parent bio down, so for ZRAM_WB slots the read is dispatched asynchronously and zram_read_page() returns 0 while the bio is still in flight. The caller then runs memcpy_from_bvec(), zram_write_page() and __free_page() on the buffer, leaving the async read to write into a freed page. zram_bvec_read_partial() was switched to NULL in commit 4e3c87b9421d ("zram: fix synchronous reads") for the same reason; the write_partial counterpart was missed.
CVE-2026-53162	1 Linux	1 Linux Kernel	2026-06-28	7.8 High
In the Linux kernel, the following vulnerability has been resolved: memcg: use round-robin victim selection in refill_stock Harry Yoo reported that get_random_u32_below() is not safe to call in the nmi context and memcg charge draining can happen in nmi context. More specifically get_random_u32_below() is neither reentrant- nor NMI-safe: it acquires a per-cpu local_lock via local_lock_irqsave() on the batched_entropy_u32 state. An NMI that lands on a CPU mid-update of the ChaCha batch state and recurses into the random subsystem would corrupt that state. The memcg_stock local_trylock prevents re-entry on the percpu stock itself, but cannot protect an unrelated subsystem's per-cpu lock. Replace the random pick with a per-cpu round-robin counter stored in memcg_stock_pcp and serialized by the same local_trylock that already guards cached[] and nr_pages[]. No atomics, no random calls, no extra locks needed.
CVE-2026-53160	1 Linux	1 Linux Kernel	2026-06-28	7.8 High
In the Linux kernel, the following vulnerability has been resolved: misc: fastrpc: fix use-after-free race in fastrpc_map_create fastrpc_map_lookup returns a raw pointer after releasing fl->lock. The caller fastrpc_map_create then calls fastrpc_map_get (kref_get_unless_zero) on this unprotected pointer. A concurrent MEM_UNMAP can free the map between the lock release and the kref operation, resulting in a use-after-free on the freed slab object. Restore the take_ref parameter to fastrpc_map_lookup so the reference is acquired atomically under fl->lock before the pointer is exposed to the caller.
CVE-2026-53145	1 Linux	1 Linux Kernel	2026-06-28	7.8 High
In the Linux kernel, the following vulnerability has been resolved: drm/gem: Try to fix change_handle ioctl, attempt 4 [airlied: just added some comments on how to reenable] On-list because the cat is out of the bag and we're clearly not good enough to figure this out in private. The story thus far: 5e28b7b94408 ("drm: Set old handle to NULL before prime swap in change_handle") tried to fix a race condition between the gem_close and gem_change_handle ioctls, but got a few things wrong: - There's a confusion with the local variable handle, which is actually the new handle, and so the two-stage trick was actually applied to the wrong idr slot. 7164d78559b0 ("drm/gem: fix race between change_handle and handle_delete") tried to fix that by adding yet another code block, but forgot to add the error handling. Which meant we now have two paths, both kinda wrong. - dc366607c41c ("drm: Replace old pointer to new idr") tried to apply another fix, but inconsistently, again because of the handle confusion - this would be the right fix (kinda, somewhat, it's a mess) if we'd do the two-stage approach for the new handle. Except that wasn't the intent of the original fix. We also didn't have an igt merged for the original ioctl, which is a big no-go. This was attempted to address off-list in the original bugfix, and amd QA people claimed the bug was fixed now. Very clearly that's not the case. Here's my attempt to sort this out: - Rename the local variable to new_handle, the old aliasing with args->handle is just too dangerously confusing. - Merge the gem obj lookup with the two-stage idr_replace so that we avoid getting ourselves confused there. - This means we don't have a surplus temporary reference anymore, only an inherited from the idr. A concurrent gem_close on the new_handle could steal that. Fix that with the same two-stage approach create_tail uses. This is a bit overkill as documented in the comment, but I also don't trust my ability to understand this all correctly, so go with the established pattern we have from other ioctls instead for maximum paranoia. - Adjust error paths. I've tried to make the error and success paths common, because they are identical except for which handle is removed and on which we call idr_replace to (re)install the object again. But that made things messier to read, so I've left it at the more verbose version, which unfortunately hides the symmetry in the entire code flow a bit. - While at it, also replace the 7 space indent with 1 tab. And finally, because I flat out don't trust my abilities here at all anymore: - Disable the ioctl until we have the igt situation and everything else sorted out on-list and with full consensus. v2: Sashiko noticed that I didn't handle the error path for idr_replace correctly, it must be checked with IS_ERR_OR_NULL like in gem_handle_delete. So yeah, definitely should just the existing paths 1:1 because this is endless amounts of tricky. Also add the Fixes: line for the original ioctl, I forgot that too.
CVE-2026-53086	1 Linux	1 Linux Kernel	2026-06-28	9.8 Critical
In the Linux kernel, the following vulnerability has been resolved: net: bcmgenet: fix racing timeout handler The bcmgenet_timeout handler tries to take down all tx queues when a single queue times out. This is over zealous and causes many race conditions with queues that are still chugging along. Instead lets only restart the timed out queue.
CVE-2026-53072	1 Linux	1 Linux Kernel	2026-06-28	8.8 High
In the Linux kernel, the following vulnerability has been resolved: Bluetooth: fix locking in hci_conn_request_evt() with HCI_PROTO_DEFER When protocol sets HCI_PROTO_DEFER, hci_conn_request_evt() calls hci_connect_cfm(conn) without hdev->lock. Generally hci_connect_cfm() assumes it is held, and if conn is deleted concurrently -> UAF. Only SCO and ISO set HCI_PROTO_DEFER and only for defer setup listen, and HCI_EV_CONN_REQUEST is not generated for ISO. In the non-deferred listening socket code paths, hci_connect_cfm(conn) is called with hdev->lock held. Fix by holding the lock.
CVE-2026-53050	1 Linux	1 Linux Kernel	2026-06-28	7.8 High
In the Linux kernel, the following vulnerability has been resolved: quota: Fix race of dquot_scan_active() with quota deactivation dquot_scan_active() can race with quota deactivation in quota_release_workfn() like: CPU0 (quota_release_workfn) CPU1 (dquot_scan_active) ============================== ============================== spin_lock(&dq_list_lock); list_replace_init( &releasing_dquots, &rls_head); /* dquot X on rls_head, dq_count == 0, DQ_ACTIVE_B still set / spin_unlock(&dq_list_lock); synchronize_srcu(&dquot_srcu); spin_lock(&dq_list_lock); list_for_each_entry(dquot, &inuse_list, dq_inuse) { / finds dquot X */ dquot_active(X) -> true atomic_inc(&X->dq_count); } spin_unlock(&dq_list_lock); spin_lock(&dq_list_lock); dquot = list_first_entry(&rls_head); WARN_ON_ONCE(atomic_read(&dquot->dq_count)); The problem is not only a cosmetic one as under memory pressure the caller of dquot_scan_active() can end up working on freed dquot. Fix the problem by making sure the dquot is removed from releasing list when we acquire a reference to it.
CVE-2026-52982	1 Linux	1 Linux Kernel	2026-06-28	9.8 Critical
In the Linux kernel, the following vulnerability has been resolved: net: usb: rtl8150: fix use-after-free in rtl8150_start_xmit() syzbot reported a KASAN slab-use-after-free read in rtl8150_start_xmit() when accessing skb->len for tx statistics after usb_submit_urb() has been called: BUG: KASAN: slab-use-after-free in rtl8150_start_xmit+0x71f/0x760 drivers/net/usb/rtl8150.c:712 Read of size 4 at addr ffff88810eb7a930 by task kworker/0:4/5226 The URB completion handler write_bulk_callback() frees the skb via dev_kfree_skb_irq(dev->tx_skb). The URB may complete on another CPU in softirq context before usb_submit_urb() returns in the submitter, so by the time the submitter reads skb->len the skb has already been queued to the per-CPU completion_queue and freed by net_tx_action(): CPU A (xmit) CPU B (USB completion softirq) ------------ ------------------------------ dev->tx_skb = skb; usb_submit_urb() --+ \|-------> write_bulk_callback() \| dev_kfree_skb_irq(dev->tx_skb) \| net_tx_action() \| napi_skb_cache_put() <-- free netdev->stats.tx_bytes \| += skb->len; <-- UAF read Fix it by caching skb->len before submitting the URB and using the cached value when updating the tx_bytes counter. The pre-existing tx_bytes semantics are preserved: the counter tracks the original frame length (skb->len), not the ETH_ZLEN/USB-alignment padded "count" value that is handed to the device. Changing that would be a user-visible accounting change and is out of scope for this UAF fix.
CVE-2026-52945	1 Linux	1 Linux Kernel	2026-06-28	7.5 High
In the Linux kernel, the following vulnerability has been resolved: Revert "wireguard: device: enable threaded NAPI" This reverts commit 933466fc50a8e4eb167acbd0d8ec96a078462e9c which is commit db9ae3b6b43c79b1ba87eea849fd65efa05b4b2e upstream. We have had three independent production user reports in combination with Cilium utilizing WireGuard as encryption underneath that k8s Pod E/W traffic to certain peer nodes fully stalled. The situation appears as follows: - Occurs very rarely but at random times under heavy networking load. - Once the issue triggers the decryption side stops working completely for that WireGuard peer, other peers keep working fine. The stall happens also for newly initiated connections towards that particular WireGuard peer. - Only the decryption side is affected, never the encryption side. - Once it triggers, it never recovers and remains in this state, the CPU/mem on that node looks normal, no leak, busy loop or crash. - bpftrace on the affected system shows that wg_prev_queue_enqueue fails, thus the MAX_QUEUED_PACKETS (1024 skbs!) for the peer's rx_queue is reached. - Also, bpftrace shows that wg_packet_rx_poll for that peer is never called again after reaching this state for that peer. For other peers wg_packet_rx_poll does get called normally. - Commit db9ae3b ("wireguard: device: enable threaded NAPI") switched WireGuard to threaded NAPI by default. The default has not been changed for triggering the issue, neither did CPU hotplugging occur (i.e. 5bd8de2 ("wireguard: queueing: always return valid online CPU in wg_cpumask_choose_online()")). - The issue has been observed with stable kernels of v5.15 as well as v6.1. It was reported to us that v5.10 stable is working fine, and no report on v6.6 stable either (somewhat related discussion in [0] though). - In the WireGuard driver the only material difference between v5.10 stable and v5.15 stable is the switch to threaded NAPI by default. [0] https://lore.kernel.org/netdev/CA+wXwBTT74RErDGAnj98PqS=wvdh8eM1pi4q6tTdExtjnokKqA@mail.gmail.com/ Breakdown of the problem: 1) skbs arriving for decryption are enqueued to the peer->rx_queue in wg_packet_consume_data via wg_queue_enqueue_per_device_and_peer. 2) The latter only moves the skb into the MPSC peer queue if it does not surpass MAX_QUEUED_PACKETS (1024) which is kept track in an atomic counter via wg_prev_queue_enqueue. 3) In case enqueueing was successful, the skb is also queued up in the device queue, round-robin picks a next online CPU, and schedules the decryption worker. 4) The wg_packet_decrypt_worker, once scheduled, picks these up from the queue, decrypts the packets and once done calls into wg_queue_enqueue_per_peer_rx. 5) The latter updates the state to PACKET_STATE_CRYPTED on success and calls napi_schedule on the per peer->napi instance. 6) NAPI then polls via wg_packet_rx_poll. wg_prev_queue_peek checks on the peer->rx_queue. It will wg_prev_queue_dequeue if the queue->peeked skb was not cached yet, or just return the latter otherwise. (wg_prev_queue_drop_peeked later clears the cache.) 7) From an ordering perspective, the peer->rx_queue has skbs in order while the device queue with the per-CPU worker threads from a global ordering PoV can finish the decryption and signal the skb PACKET_STATE_CRYPTED out of order. 8) A situation can be observed that the first packet coming in will be stuck waiting for the decryption worker to be scheduled for a longer time when the system is under pressure. 9) While this is the case, the other CPUs in the meantime finish decryption and call into napi_schedule. 10) Now in wg_packet_rx_poll it picks up the first in-order skb from the peer->rx_queue and sees that its state is still PACKET_STATE_UNCRYPTED. The NAPI poll routine then exits e ---truncated---
CVE-2026-4878	2 Libcap Project, Redhat	18 Libcap, Ai Inference Server, Cost Management and 15 more	2026-06-27	6.7 Medium
A flaw was found in libcap. A local unprivileged user can exploit a Time-of-check-to-time-of-use (TOCTOU) race condition in the `cap_set_file()` function. This allows an attacker with write access to a parent directory to redirect file capability updates to an attacker-controlled file. By doing so, capabilities can be injected into or stripped from unintended executables, leading to privilege escalation.
CVE-2026-53293	1 Linux	1 Linux Kernel	2026-06-27	5.5 Medium
In the Linux kernel, the following vulnerability has been resolved: drm/amdgpu: fix AMDGPU_INFO_READ_MMR_REG There were multiple issues in that code. First of all the order between the reset semaphore and the mm_lock was wrong (e.g. copy_to_user) was called while holding the lock. Then we allocated memory while holding the reset semaphore which is also a pretty big bug and can deadlock. Then we used down_read_trylock() instead of waiting for the reset to finish. (cherry picked from commit 361b6e6b303d4b691f6c5974d3eaab67ca6dd90e)
CVE-2026-53056	1 Linux	1 Linux Kernel	2026-06-27	N/A
In the Linux kernel, the following vulnerability has been resolved: drm/msm/dpu: fix mismatch between power and frequency During DPU runtime suspend, calling dev_pm_opp_set_rate(dev, 0) drops the MMCX rail to MIN_SVS while the core clock frequency remains at its original (highest) rate. When runtime resume re-enables the clock, this may result in a mismatch between the rail voltage and the clock rate. For example, in the DPU bind path, the sequence could be: cpu0: dev_sync_state -> rpmhpd_sync_state cpu1: dpu_kms_hw_init timeline 0 ------------------------------------------------> t After rpmhpd_sync_state, the voltage performance is no longer guaranteed to stay at the highest level. During dpu_kms_hw_init, calling dev_pm_opp_set_rate(dev, 0) drops the voltage, causing the MMCX rail to fall to MIN_SVS while the core clock is still at its maximum frequency. When the power is re-enabled, only the clock is enabled, leading to a situation where the MMCX rail is at MIN_SVS but the core clock is at its highest rate. In this state, the rail cannot sustain the clock rate, which may cause instability or system crash. Remove the call to dev_pm_opp_set_rate(dev, 0) from dpu_runtime_suspend to ensure the correct vote is restored when DPU resumes. Patchwork: https://patchwork.freedesktop.org/patch/710077/
CVE-2026-52979	1 Linux	1 Linux Kernel	2026-06-27	N/A
In the Linux kernel, the following vulnerability has been resolved: net: psp: check for device unregister when creating assoc psp_assoc_device_get_locked() obtains a psp_dev reference via psp_dev_get_for_sock() (which uses psp_dev_tryget() under RCU); it then acquires psd->lock and drops the reference. Before the lock is taken, psp_dev_unregister() can run to completion: take psd->lock, clear out state, unlock, drop the registration reference. The expectation is that the lock prevents device unregistration, but much like with netdevs special care has to be taken when "upgrading" a reference to a locked device. Add the missing check if device is still alive. psp_dev_is_registered() exists already but had no callers, which makes me wonder if I either forgot to add this or lost the check during refactoring...
CVE-2026-53051	1 Linux	1 Linux Kernel	2026-06-27	5.5 Medium
In the Linux kernel, the following vulnerability has been resolved: PCI: tegra194: Fix CBB timeout caused by DBI access before core power-on When PERST# is deasserted twice (assert -> deassert -> assert -> deassert), a CBB (Control Backbone) timeout occurs at DBI register offset 0x8bc (PCIE_MISC_CONTROL_1_OFF). This happens because pci_epc_deinit_notify() and dw_pcie_ep_cleanup() are called before reset_control_deassert() powers on the controller core. The call chain that causes the timeout: pex_ep_event_pex_rst_deassert() pci_epc_deinit_notify() pci_epf_test_epc_deinit() pci_epf_test_clear_bar() pci_epc_clear_bar() dw_pcie_ep_clear_bar() __dw_pcie_ep_reset_bar() dw_pcie_dbi_ro_wr_en() <- Accesses 0x8bc DBI register reset_control_deassert(pcie->core_rst) <- Core powered on HERE The DBI registers, including PCIE_MISC_CONTROL_1_OFF (0x8bc), are only accessible after the controller core is powered on via reset_control_deassert(pcie->core_rst). Accessing them before this point results in a CBB timeout because the hardware is not yet operational. Fix this by moving pci_epc_deinit_notify() and dw_pcie_ep_cleanup() to after reset_control_deassert(pcie->core_rst), ensuring the controller is fully powered on before any DBI register accesses occur.
CVE-2026-53109	1 Linux	1 Linux Kernel	2026-06-27	5.5 Medium
In the Linux kernel, the following vulnerability has been resolved: powerpc/pgtable-frag: Fix bad page state in pte_frag_destroy powerpc uses pt_frag_refcount as a reference counter for tracking it's pte and pmd page table fragments. For PTE table, in case of Hash with 64K pagesize, we have 16 fragments of 4K size in one 64K page. Patch series [1] "mm: free retracted page table by RCU" added pte_free_defer() to defer the freeing of PTE tables when retract_page_tables() is called for madvise MADV_COLLAPSE on shmem range. [1]: https://lore.kernel.org/all/[email protected]/ pte_free_defer() sets the active flag on the corresponding fragment's folio & calls pte_fragment_free(), which reduces the pt_frag_refcount. When pt_frag_refcount reaches 0 (no active fragment using the folio), it checks if the folio active flag is set, if set, it calls call_rcu to free the folio, it the active flag is unset then it calls pte_free_now(). Now, this can lead to following problem in a corner case... [ 265.351553][ T183] BUG: Bad page state in process a.out pfn:20d62 [ 265.353555][ T183] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x20d62 [ 265.355457][ T183] flags: 0x3ffff800000100(active\|node=0\|zone=0\|lastcpupid=0x7ffff) [ 265.358719][ T183] raw: 003ffff800000100 0000000000000000 5deadbeef0000122 0000000000000000 [ 265.360177][ T183] raw: 0000000000000000 c0000000119caf58 00000000ffffffff 0000000000000000 [ 265.361438][ T183] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set [ 265.362572][ T183] Modules linked in: [ 265.364622][ T183] CPU: 0 UID: 0 PID: 183 Comm: a.out Not tainted 6.18.0-rc3-00141-g1ddeaaace7ff-dirty #53 VOLUNTARY [ 265.364785][ T183] Hardware name: IBM pSeries (emulated by qemu) POWER10 (architected) 0x801200 0xf000006 of:SLOF,git-ee03ae pSeries [ 265.364908][ T183] Call Trace: [ 265.364955][ T183] [c000000011e6f7c0] [c000000001cfaa18] dump_stack_lvl+0x130/0x148 (unreliable) [ 265.365202][ T183] [c000000011e6f7f0] [c000000000794758] bad_page+0xb4/0x1c8 [ 265.365384][ T183] [c000000011e6f890] [c00000000079c020] __free_frozen_pages+0x838/0xd08 [ 265.365554][ T183] [c000000011e6f980] [c0000000000a70ac] pte_frag_destroy+0x298/0x310 [ 265.365729][ T183] [c000000011e6fa30] [c0000000000aa764] arch_exit_mmap+0x34/0x218 [ 265.365912][ T183] [c000000011e6fa80] [c000000000751698] exit_mmap+0xb8/0x820 [ 265.366080][ T183] [c000000011e6fc30] [c0000000001b1258] __mmput+0x98/0x300 [ 265.366244][ T183] [c000000011e6fc80] [c0000000001c81f8] do_exit+0x470/0x1508 [ 265.366421][ T183] [c000000011e6fd70] [c0000000001c95e4] do_group_exit+0x88/0x148 [ 265.366602][ T183] [c000000011e6fdc0] [c0000000001c96ec] pid_child_should_wake+0x0/0x178 [ 265.366780][ T183] [c000000011e6fdf0] [c00000000003a270] system_call_exception+0x1b0/0x4e0 [ 265.366958][ T183] [c000000011e6fe50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec The bad page state error occurs when such a folio gets freed (with active flag set), from do_exit() path in parallel. ... this can happen when the pte fragment was allocated from this folio, but when all the fragments get freed, the pte_frag_refcount still had some unused fragments. Now, if this process exits, with such folio as it's cached pte_frag in mm->context, then during pte_frag_destroy(), we simply call pagetable_dtor() and pagetable_free(), meaning it doesn't clear the active flag. This, can lead to the above bug. Since we are anyway in do_exit() path, then if the refcount is 0, then I guess it should be ok to simply clear the folio active flag before calling pagetable_dtor() & pagetable_free().

« first previous Page 2 of 203. next last »