summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-11-01erofs: avoid duplicated permission check for "trusted." xattrsGao Xiang1-2/+0
commit d578b46db69d125a654f509bdc9091d84e924dc8 upstream. Don't recheck it since xattr_permission() already checks CAP_SYS_ADMIN capability. Just follow 5d3ce4f70172 ("f2fs: avoid duplicated permission check for "trusted." xattrs") Reported-by: Hongyu Jin <hongyu.jin@unisoc.com> [ Gao Xiang: since it could cause some complex Android overlay permission issue as well on android-5.4+, it'd be better to backport to 5.4+ rather than pure cleanup on mainline. ] Cc: <stable@vger.kernel.org> # 5.4+ Link: https://lore.kernel.org/r/20200811070020.6339-1-hsiangkao@redhat.com Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01net: protect tcf_block_unbind with block lockLeon Romanovsky1-2/+2
[ Upstream commit d6535dca28859d8d9ef80894eb287b2ac35a32e8 ] The tcf_block_unbind() expects that the caller will take block->cb_lock before calling it, however the code took RTNL lock and dropped cb_lock instead. This causes to the following kernel panic. WARNING: CPU: 1 PID: 13524 at net/sched/cls_api.c:1488 tcf_block_unbind+0x2db/0x420 Modules linked in: mlx5_ib mlx5_core mlxfw ptp pps_core act_mirred act_tunnel_key cls_flower vxlan ip6_udp_tunnel udp_tunnel dummy sch_ingress openvswitch nsh xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad ib_ipoib rdma_cm iw_cm ib_cm ib_uverbs ib_core overlay [last unloaded: mlxfw] CPU: 1 PID: 13524 Comm: test-ecmp-add-v Tainted: G W 5.9.0+ #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:tcf_block_unbind+0x2db/0x420 Code: ff 48 83 c4 40 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8d bc 24 30 01 00 00 be ff ff ff ff e8 7d 7f 70 00 85 c0 0f 85 7b fd ff ff <0f> 0b e9 74 fd ff ff 48 c7 c7 dc 6a 24 84 e8 02 ec fe fe e9 55 fd RSP: 0018:ffff888117d17968 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88812f713c00 RCX: 1ffffffff0848d5b RDX: 0000000000000001 RSI: ffff88814fbc8130 RDI: ffff888107f2b878 RBP: 1ffff11022fa2f3f R08: 0000000000000000 R09: ffffffff84115a87 R10: fffffbfff0822b50 R11: ffff888107f2b898 R12: ffff88814fbc8000 R13: ffff88812f713c10 R14: ffff888117d17a38 R15: ffff88814fbc80c0 FS: 00007f6593d36740(0000) GS:ffff8882a4f00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005607a00758f8 CR3: 0000000131aea006 CR4: 0000000000170ea0 Call Trace: tc_block_indr_cleanup+0x3e0/0x5a0 ? tcf_block_unbind+0x420/0x420 ? __mutex_unlock_slowpath+0xe7/0x610 flow_indr_dev_unregister+0x5e2/0x930 ? mlx5e_restore_tunnel+0xdf0/0xdf0 [mlx5_core] ? mlx5e_restore_tunnel+0xdf0/0xdf0 [mlx5_core] ? flow_indr_block_cb_alloc+0x3c0/0x3c0 ? mlx5_db_free+0x37c/0x4b0 [mlx5_core] mlx5e_cleanup_rep_tx+0x8b/0xc0 [mlx5_core] mlx5e_detach_netdev+0xe5/0x120 [mlx5_core] mlx5e_vport_rep_unload+0x155/0x260 [mlx5_core] esw_offloads_disable+0x227/0x2b0 [mlx5_core] mlx5_eswitch_disable_locked.cold+0x38e/0x699 [mlx5_core] mlx5_eswitch_disable+0x94/0xf0 [mlx5_core] mlx5_device_disable_sriov+0x183/0x1f0 [mlx5_core] mlx5_core_sriov_configure+0xfd/0x230 [mlx5_core] sriov_numvfs_store+0x261/0x2f0 ? sriov_drivers_autoprobe_store+0x110/0x110 ? sysfs_file_ops+0x170/0x170 ? sysfs_file_ops+0x117/0x170 ? sysfs_file_ops+0x170/0x170 kernfs_fop_write+0x1ff/0x3f0 ? rcu_read_lock_any_held+0x6e/0x90 vfs_write+0x1f3/0x620 ksys_write+0xf9/0x1d0 ? __x64_sys_read+0xb0/0xb0 ? lockdep_hardirqs_on_prepare+0x273/0x3f0 ? syscall_enter_from_user_mode+0x1d/0x50 do_syscall_64+0x2d/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 <...> ---[ end trace bfdd028ada702879 ]--- Fixes: 0fdcf78d5973 ("net: use flow_indr_dev_setup_offload()") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20201026123327.1141066-1-leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01tipc: fix memory leak caused by tipc_buf_append()Tung Nguyen1-3/+2
[ Upstream commit ceb1eb2fb609c88363e06618b8d4bbf7815a4e03 ] Commit ed42989eab57 ("tipc: fix the skb_unshare() in tipc_buf_append()") replaced skb_unshare() with skb_copy() to not reduce the data reference counter of the original skb intentionally. This is not the correct way to handle the cloned skb because it causes memory leak in 2 following cases: 1/ Sending multicast messages via broadcast link The original skb list is cloned to the local skb list for local destination. After that, the data reference counter of each skb in the original list has the value of 2. This causes each skb not to be freed after receiving ACK: tipc_link_advance_transmq() { ... /* release skb */ __skb_unlink(skb, &l->transmq); kfree_skb(skb); <-- memory exists after being freed } 2/ Sending multicast messages via replicast link Similar to the above case, each skb cannot be freed after purging the skb list: tipc_mcast_xmit() { ... __skb_queue_purge(pkts); <-- memory exists after being freed } This commit fixes this issue by using skb_unshare() instead. Besides, to avoid use-after-free error reported by KASAN, the pointer to the fragment is set to NULL before calling skb_unshare() to make sure that the original skb is not freed after freeing the fragment 2 times in case skb_unshare() returns NULL. Fixes: ed42989eab57 ("tipc: fix the skb_unshare() in tipc_buf_append()") Acked-by: Jon Maloy <jmaloy@redhat.com> Reported-by: Thang Hoang Ngo <thang.h.ngo@dektech.com.au> Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Reviewed-by: Xin Long <lucien.xin@gmail.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Link: https://lore.kernel.org/r/20201027032403.1823-1-tung.q.nguyen@dektech.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01tcp: Prevent low rmem stalls with SO_RCVLOWAT.Arjun Roy2-1/+4
[ Upstream commit 435ccfa894e35e3d4a1799e6ac030e48a7b69ef5 ] With SO_RCVLOWAT, under memory pressure, it is possible to enter a state where: 1. We have not received enough bytes to satisfy SO_RCVLOWAT. 2. We have not entered buffer pressure (see tcp_rmem_pressure()). 3. But, we do not have enough buffer space to accept more packets. In this case, we advertise 0 rwnd (due to #3) but the application does not drain the receive queue (no wakeup because of #1 and #2) so the flow stalls. Modify the heuristic for SO_RCVLOWAT so that, if we are advertising rwnd<=rcv_mss, force a wakeup to prevent a stall. Without this patch, setting tcp_rmem to 6143 and disabling TCP autotune causes a stalled flow. With this patch, no stall occurs. This is with RPC-style traffic with large messages. Fixes: 03f45c883c6f ("tcp: avoid extra wakeups for SO_RCVLOWAT users") Signed-off-by: Arjun Roy <arjunroy@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20201023184709.217614-1-arjunroy.kdev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01ravb: Fix bit fields checking in ravb_hwtstamp_get()Andrew Gabbasov1-3/+7
[ Upstream commit 68b9f0865b1ef545da180c57d54b82c94cb464a4 ] In the function ravb_hwtstamp_get() in ravb_main.c with the existing values for RAVB_RXTSTAMP_TYPE_V2_L2_EVENT (0x2) and RAVB_RXTSTAMP_TYPE_ALL (0x6) if (priv->tstamp_rx_ctrl & RAVB_RXTSTAMP_TYPE_V2_L2_EVENT) config.rx_filter = HWTSTAMP_FILTER_PTP_V2_L2_EVENT; else if (priv->tstamp_rx_ctrl & RAVB_RXTSTAMP_TYPE_ALL) config.rx_filter = HWTSTAMP_FILTER_ALL; if the test on RAVB_RXTSTAMP_TYPE_ALL should be true, it will never be reached. This issue can be verified with 'hwtstamp_config' testing program (tools/testing/selftests/net/hwtstamp_config.c). Setting filter type to ALL and subsequent retrieving it gives incorrect value: $ hwtstamp_config eth0 OFF ALL flags = 0 tx_type = OFF rx_filter = ALL $ hwtstamp_config eth0 flags = 0 tx_type = OFF rx_filter = PTP_V2_L2_EVENT Correct this by converting if-else's to switch. Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") Reported-by: Julia Lawall <julia.lawall@inria.fr> Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com> Link: https://lore.kernel.org/r/20201026102130.29368-1-andrew_gabbasov@mentor.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01r8169: fix issue with forced threading in combination with shared interruptsHeiner Kallweit1-2/+2
[ Upstream commit 2734a24e6e5d18522fbf599135c59b82ec9b2c9e ] As reported by Serge flag IRQF_NO_THREAD causes an error if the interrupt is actually shared and the other driver(s) don't have this flag set. This situation can occur if a PCI(e) legacy interrupt is used in combination with forced threading. There's no good way to deal with this properly, therefore we have to remove flag IRQF_NO_THREAD. For fixing the original forced threading issue switch to napi_schedule(). Fixes: 424a646e072a ("r8169: fix operation under forced interrupt threading") Link: https://www.spinics.net/lists/netdev/msg694960.html Reported-by: Serge Belyshev <belyshev@depni.sinp.msu.ru> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Tested-by: Serge Belyshev <belyshev@depni.sinp.msu.ru> Link: https://lore.kernel.org/r/b5b53bfe-35ac-3768-85bf-74d1290cf394@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01net/sched: act_mpls: Add softdep on mpls_gso.koGuillaume Nault1-0/+1
TCA_MPLS_ACT_PUSH and TCA_MPLS_ACT_MAC_PUSH might be used on gso packets. Such packets will thus require mpls_gso.ko for segmentation. v2: Drop dependency on CONFIG_NET_MPLS_GSO in Kconfig (from Jakub and David). Fixes: 2a2ea50870ba ("net: sched: add mpls manipulation actions to TC") Signed-off-by: Guillaume Nault <gnault@redhat.com> Link: https://lore.kernel.org/r/1f6cab15bbd15666795061c55563aaf6a386e90e.1603708007.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01net: ipa: command payloads already mappedAlex Elder1-6/+15
[ Upstream commit df833050cced27e1b343cc8bc41f90191b289334 ] IPA transactions describe actions to be performed by the IPA hardware. Three cases use IPA transactions: transmitting a socket buffer; providing a page to receive packet data; and issuing an IPA immediate command. An IPA transaction contains a scatter/gather list (SGL) to hold the set of actions to be performed. We map buffers in the SGL for DMA at the time they are added to the transaction. For skb TX transactions, we fill the SGL with a call to skb_to_sgvec(). Page RX transactions involve a single page pointer, and that is recorded in the SGL with sg_set_page(). In both of these cases we then map the SGL for DMA with a call to dma_map_sg(). Immediate commands are different. The payload for an immediate command comes from a region of coherent DMA memory, which must *not* be mapped for DMA. For that reason, gsi_trans_cmd_add() sort of hand-crafts each SGL entry added to a command transaction. This patch fixes a problem with the code that crafts the SGL entry for an immediate command. Previously a portion of the SGL entry was updated using sg_set_buf(). However this is not valid because it includes a call to virt_to_page() on the buffer, but the command buffer pointer is not a linear address. Since we never actually map the SGL for command transactions, there are very few fields in the SGL we need to fill. Specifically, we only need to record the DMA address and the length, so they can be used by __gsi_trans_commit() to fill a TRE. We additionally need to preserve the SGL flags so for_each_sg() still works. For that we can simply assign a null page pointer for command SGL entries. Fixes: 9dd441e4ed575 ("soc: qcom: ipa: GSI transactions") Reported-by: Stephen Boyd <swboyd@chromium.org> Tested-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Alex Elder <elder@linaro.org> Link: https://lore.kernel.org/r/20201022010029.11877-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01net: hns3: Clear the CMDQ registers before unmapping BAR regionZenghui Yu1-1/+1
[ Upstream commit e3364c5ff3ff975b943a7bf47e21a2a4bf20f3fe ] When unbinding the hns3 driver with the HNS3 VF, I got the following kernel panic: [ 265.709989] Unable to handle kernel paging request at virtual address ffff800054627000 [ 265.717928] Mem abort info: [ 265.720740] ESR = 0x96000047 [ 265.723810] EC = 0x25: DABT (current EL), IL = 32 bits [ 265.729126] SET = 0, FnV = 0 [ 265.732195] EA = 0, S1PTW = 0 [ 265.735351] Data abort info: [ 265.738227] ISV = 0, ISS = 0x00000047 [ 265.742071] CM = 0, WnR = 1 [ 265.745055] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000009b54000 [ 265.751753] [ffff800054627000] pgd=0000202ffffff003, p4d=0000202ffffff003, pud=00002020020eb003, pmd=00000020a0dfc003, pte=0000000000000000 [ 265.764314] Internal error: Oops: 96000047 [#1] SMP [ 265.830357] CPU: 61 PID: 20319 Comm: bash Not tainted 5.9.0+ #206 [ 265.836423] Hardware name: Huawei TaiShan 2280 V2/BC82AMDDA, BIOS 1.05 09/18/2019 [ 265.843873] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO BTYPE=--) [ 265.843890] pc : hclgevf_cmd_uninit+0xbc/0x300 [ 265.861988] lr : hclgevf_cmd_uninit+0xb0/0x300 [ 265.861992] sp : ffff80004c983b50 [ 265.881411] pmr_save: 000000e0 [ 265.884453] x29: ffff80004c983b50 x28: ffff20280bbce500 [ 265.889744] x27: 0000000000000000 x26: 0000000000000000 [ 265.895034] x25: ffff800011a1f000 x24: ffff800011a1fe90 [ 265.900325] x23: ffff0020ce9b00d8 x22: ffff0020ce9b0150 [ 265.905616] x21: ffff800010d70e90 x20: ffff800010d70e90 [ 265.910906] x19: ffff0020ce9b0080 x18: 0000000000000004 [ 265.916198] x17: 0000000000000000 x16: ffff800011ae32e8 [ 265.916201] x15: 0000000000000028 x14: 0000000000000002 [ 265.916204] x13: ffff800011ae32e8 x12: 0000000000012ad8 [ 265.946619] x11: ffff80004c983b50 x10: 0000000000000000 [ 265.951911] x9 : ffff8000115d0888 x8 : 0000000000000000 [ 265.951914] x7 : ffff800011890b20 x6 : c0000000ffff7fff [ 265.951917] x5 : ffff80004c983930 x4 : 0000000000000001 [ 265.951919] x3 : ffffa027eec1b000 x2 : 2b78ccbbff369100 [ 265.964487] x1 : 0000000000000000 x0 : ffff800054627000 [ 265.964491] Call trace: [ 265.964494] hclgevf_cmd_uninit+0xbc/0x300 [ 265.964496] hclgevf_uninit_ae_dev+0x9c/0xe8 [ 265.964501] hnae3_unregister_ae_dev+0xb0/0x130 [ 265.964516] hns3_remove+0x34/0x88 [hns3] [ 266.009683] pci_device_remove+0x48/0xf0 [ 266.009692] device_release_driver_internal+0x114/0x1e8 [ 266.030058] device_driver_detach+0x28/0x38 [ 266.034224] unbind_store+0xd4/0x108 [ 266.037784] drv_attr_store+0x40/0x58 [ 266.041435] sysfs_kf_write+0x54/0x80 [ 266.045081] kernfs_fop_write+0x12c/0x250 [ 266.049076] vfs_write+0xc4/0x248 [ 266.052378] ksys_write+0x74/0xf8 [ 266.055677] __arm64_sys_write+0x24/0x30 [ 266.059584] el0_svc_common.constprop.3+0x84/0x270 [ 266.064354] do_el0_svc+0x34/0xa0 [ 266.067658] el0_svc+0x38/0x40 [ 266.070700] el0_sync_handler+0x8c/0xb0 [ 266.074519] el0_sync+0x140/0x180 It looks like the BAR memory region had already been unmapped before we start clearing CMDQ registers in it, which is pretty bad and the kernel happily kills itself because of a Current EL Data Abort (on arm64). Moving the CMDQ uninitialization a bit early fixes the issue for me. Fixes: 862d969a3a4d ("net: hns3: do VF's pci re-initialization while PF doing FLR") Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/20201023051550.793-1-yuzenghui@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01netem: fix zero division in tabledistAleksandr Nogikh1-1/+8
[ Upstream commit eadd1befdd778a1eca57fad058782bd22b4db804 ] Currently it is possible to craft a special netlink RTM_NEWQDISC command that can result in jitter being equal to 0x80000000. It is enough to set the 32 bit jitter to 0x02000000 (it will later be multiplied by 2^6) or just set the 64 bit jitter via TCA_NETEM_JITTER64. This causes an overflow during the generation of uniformly distributed numbers in tabledist(), which in turn leads to division by zero (sigma != 0, but sigma * 2 is 0). The related fragment of code needs 32-bit division - see commit 9b0ed89 ("netem: remove unnecessary 64 bit modulus"), so switching to 64 bit is not an option. Fix the issue by keeping the value of jitter within the range that can be adequately handled by tabledist() - [0;INT_MAX]. As negative std deviation makes no sense, take the absolute value of the passed value and cap it at INT_MAX. Inside tabledist(), switch to unsigned 32 bit arithmetic in order to prevent overflows. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Aleksandr Nogikh <nogikh@google.com> Reported-by: syzbot+ec762a6342ad0d3c0d8f@syzkaller.appspotmail.com Acked-by: Stephen Hemminger <stephen@networkplumber.org> Link: https://lore.kernel.org/r/20201028170731.1383332-1-aleksandrnogikh@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01mlxsw: core: Fix memory leak on module removalIdo Schimmel1-0/+2
[ Upstream commit adc80b6cfedff6dad8b93d46a5ea2775fd5af9ec ] Free the devlink instance during the teardown sequence in the non-reload case to avoid the following memory leak. unreferenced object 0xffff888232895000 (size 2048): comm "modprobe", pid 1073, jiffies 4295568857 (age 164.871s) hex dump (first 32 bytes): 00 01 00 00 00 00 ad de 22 01 00 00 00 00 ad de ........"....... 10 50 89 32 82 88 ff ff 10 50 89 32 82 88 ff ff .P.2.....P.2.... backtrace: [<00000000c704e9a6>] __kmalloc+0x13a/0x2a0 [<00000000ee30129d>] devlink_alloc+0xff/0x760 [<0000000092ab3e5d>] 0xffffffffa042e5b0 [<000000004f3f8a31>] 0xffffffffa042f6ad [<0000000092800b4b>] 0xffffffffa0491df3 [<00000000c4843903>] local_pci_probe+0xcb/0x170 [<000000006993ded7>] pci_device_probe+0x2c2/0x4e0 [<00000000a8e0de75>] really_probe+0x2c5/0xf90 [<00000000d42ba75d>] driver_probe_device+0x1eb/0x340 [<00000000bcc95e05>] device_driver_attach+0x294/0x300 [<000000000e2bc177>] __driver_attach+0x167/0x2f0 [<000000007d44cd6e>] bus_for_each_dev+0x148/0x1f0 [<000000003cd5a91e>] driver_attach+0x45/0x60 [<000000000041ce51>] bus_add_driver+0x3b8/0x720 [<00000000f5215476>] driver_register+0x230/0x4e0 [<00000000d79356f5>] __pci_register_driver+0x190/0x200 Fixes: a22712a96291 ("mlxsw: core: Fix devlink unregister flow") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reported-by: Vadim Pasternak <vadimp@nvidia.com> Tested-by: Oleksandr Shamray <oleksandrs@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01ibmvnic: fix ibmvnic_set_macLijun Pan1-2/+6
[ Upstream commit 8fc3672a8ad3e782bac80e979bc2a2c10960cbe9 ] Jakub Kicinski brought up a concern in ibmvnic_set_mac(). ibmvnic_set_mac() does this: ether_addr_copy(adapter->mac_addr, addr->sa_data); if (adapter->state != VNIC_PROBED) rc = __ibmvnic_set_mac(netdev, addr->sa_data); So if state == VNIC_PROBED, the user can assign an invalid address to adapter->mac_addr, and ibmvnic_set_mac() will still return 0. The fix is to validate ethernet address at the beginning of ibmvnic_set_mac(), and move the ether_addr_copy to the case of "adapter->state != VNIC_PROBED". Fixes: c26eba03e407 ("ibmvnic: Update reset infrastructure to support tunable parameters") Signed-off-by: Lijun Pan <ljp@linux.ibm.com> Link: https://lore.kernel.org/r/20201027220456.71450-1-ljp@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01ibmveth: Fix use of ibmveth in a bridge.Thomas Bogendoerfer1-6/+0
[ Upstream commit 2ac8af0967aaa2b67cb382727e784900d2f4d0da ] The check for src mac address in ibmveth_is_packet_unsupported is wrong. Commit 6f2275433a2f wanted to shut down messages for loopback packets, but now suppresses bridged frames, which are accepted by the hypervisor otherwise bridging won't work at all. Fixes: 6f2275433a2f ("ibmveth: Detect unsupported packets before sending to the hypervisor") Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de> Link: https://lore.kernel.org/r/20201026104221.26570-1-msuchanek@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01gtp: fix an use-before-init in gtp_newlink()Masahiro Fujiwara1-8/+8
[ Upstream commit 51467431200b91682b89d31317e35dcbca1469ce ] *_pdp_find() from gtp_encap_recv() would trigger a crash when a peer sends GTP packets while creating new GTP device. RIP: 0010:gtp1_pdp_find.isra.0+0x68/0x90 [gtp] <SNIP> Call Trace: <IRQ> gtp_encap_recv+0xc2/0x2e0 [gtp] ? gtp1_pdp_find.isra.0+0x90/0x90 [gtp] udp_queue_rcv_one_skb+0x1fe/0x530 udp_queue_rcv_skb+0x40/0x1b0 udp_unicast_rcv_skb.isra.0+0x78/0x90 __udp4_lib_rcv+0x5af/0xc70 udp_rcv+0x1a/0x20 ip_protocol_deliver_rcu+0xc5/0x1b0 ip_local_deliver_finish+0x48/0x50 ip_local_deliver+0xe5/0xf0 ? ip_protocol_deliver_rcu+0x1b0/0x1b0 gtp_encap_enable() should be called after gtp_hastable_new() otherwise *_pdp_find() will access the uninitialized hash table. Fixes: 1e3a3abd8b28 ("gtp: make GTP sockets in gtp_newlink optional") Signed-off-by: Masahiro Fujiwara <fujiwara.masahiro@gmail.com> Link: https://lore.kernel.org/r/20201027114846.3924-1-fujiwara.masahiro@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01cxgb4: set up filter action after rewritesRaju Rangoju2-29/+31
[ Upstream commit 937d8420588421eaa5c7aa5c79b26b42abb288ef ] The current code sets up the filter action field before rewrites are set up. When the action 'switch' is used with rewrites, this may result in initial few packets that get switched out don't have rewrites applied on them. So, make sure filter action is set up along with rewrites or only after everything else is set up for rewrites. Fixes: 12b276fbf6e0 ("cxgb4: add support to create hash filters") Signed-off-by: Raju Rangoju <rajur@chelsio.com> Link: https://lore.kernel.org/r/20201023115852.18262-1-rajur@chelsio.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01chelsio/chtls: fix tls record info to userVinay Kumar Yadav1-2/+5
[ Upstream commit 4f3391ce8f5a69e7e6d66d0a3fc654eb6dbdc919 ] chtls_pt_recvmsg() receives a skb with tls header and subsequent skb with data, need to finalize the data copy whenever next skb with tls header is available. but here current tls header is overwritten by next available tls header, ends up corrupting user buffer data. fixing it by finalizing current record whenever next skb contains tls header. v1->v2: - Improved commit message. Fixes: 17a7d24aa89d ("crypto: chtls - generic handling of data and hdr") Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com> Link: https://lore.kernel.org/r/20201022190556.21308-1-vinay.yadav@chelsio.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01chelsio/chtls: fix memory leaks in CPL handlersVinay Kumar Yadav1-15/+12
[ Upstream commit 6daa1da4e262b0cd52ef0acc1989ff22b5540264 ] CPL handler functions chtls_pass_open_rpl() and chtls_close_listsrv_rpl() should return CPL_RET_BUF_DONE so that caller function will do skb free to avoid leak. Fixes: cc35c88ae4db ("crypto : chtls - CPL handler definition") Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com> Link: https://lore.kernel.org/r/20201025194228.31271-1-vinay.yadav@chelsio.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01chelsio/chtls: fix deadlock issueVinay Kumar Yadav1-2/+0
[ Upstream commit 28e9dcd9172028263c8225c15c4e329e08475e89 ] In chtls_pass_establish() we hold child socket lock using bh_lock_sock and we are again trying bh_lock_sock in add_to_reap_list, causing deadlock. Remove bh_lock_sock in add_to_reap_list() as lock is already held. Fixes: cc35c88ae4db ("crypto : chtls - CPL handler definition") Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com> Link: https://lore.kernel.org/r/20201025193538.31112-1-vinay.yadav@chelsio.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01bnxt_en: Send HWRM_FUNC_RESET fw command unconditionally.Vasundhara Volam1-1/+2
[ Upstream commit 825741b071722f1c8ad692cead562c4b5f5eaa93 ] In the AER or firmware reset flow, if we are in fatal error state or if pci_channel_offline() is true, we don't send any commands to the firmware because the commands will likely not reach the firmware and most commands don't matter much because the firmware is likely to be reset imminently. However, the HWRM_FUNC_RESET command is different and we should always attempt to send it. In the AER flow for example, the .slot_reset() call will trigger this fw command and we need to try to send it to effect the proper reset. Fixes: b340dc680ed4 ("bnxt_en: Avoid sending firmware messages when AER error is detected.") Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01bnxt_en: Re-write PCI BARs after PCI fatal error.Vasundhara Volam2-1/+19
[ Upstream commit f75d9a0aa96721d20011cd5f8c7a24eb32728589 ] When a PCIe fatal error occurs, the internal latched BAR addresses in the chip get reset even though the BAR register values in config space are retained. pci_restore_state() will not rewrite the BAR addresses if the BAR address values are valid, causing the chip's internal BAR addresses to stay invalid. So we need to zero the BAR registers during PCIe fatal error to force pci_restore_state() to restore the BAR addresses. These write cycles to the BAR registers will cause the proper BAR addresses to latch internally. Fixes: 6316ea6db93d ("bnxt_en: Enable AER support.") Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01bnxt_en: Invoke cancel_delayed_work_sync() for PFs also.Vasundhara Volam1-11/+2
[ Upstream commit 631ce27a3006fc0b732bfd589c6df505f62eadd9 ] As part of the commit b148bb238c02 ("bnxt_en: Fix possible crash in bnxt_fw_reset_task()."), cancel_delayed_work_sync() is called only for VFs to fix a possible crash by cancelling any pending delayed work items. It was assumed by mistake that the flush_workqueue() call on the PF would flush delayed work items as well. As flush_workqueue() does not cancel the delayed workqueue, extend the fix for PFs. This fix will avoid the system crash, if there are any pending delayed work items in fw_reset_task() during driver's .remove() call. Unify the workqueue cleanup logic for both PF and VF by calling cancel_work_sync() and cancel_delayed_work_sync() directly in bnxt_remove_one(). Fixes: b148bb238c02 ("bnxt_en: Fix possible crash in bnxt_fw_reset_task().") Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01bnxt_en: Fix regression in workqueue cleanup logic in bnxt_remove_one().Vasundhara Volam1-4/+5
[ Upstream commit 21d6a11e2cadfb8446265a3efff0e2aad206e15e ] A recent patch has moved the workqueue cleanup logic before calling unregister_netdev() in bnxt_remove_one(). This caused a regression because the workqueue can be restarted if the device is still open. Workqueue cleanup must be done after unregister_netdev(). The workqueue will not restart itself after the device is closed. Call bnxt_cancel_sp_work() after unregister_netdev() and call bnxt_dl_fw_reporters_destroy() after that. This fixes the regession and the original NULL ptr dereference issue. Fixes: b16939b59cc0 ("bnxt_en: Fix NULL ptr dereference crash in bnxt_fw_reset_task()") Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01bnxt_en: Check abort error state in bnxt_open_nic().Michael Chan1-1/+4
[ Upstream commit a1301f08c5acf992d9c1fafddc84c3a822844b04 ] bnxt_open_nic() is called during configuration changes that require the NIC to be closed and then opened. This call is protected by rtnl_lock. Firmware reset can be happening at the same time. Only critical portions of the entire firmware reset sequence are protected by the rtnl_lock. It is possible that bnxt_open_nic() can be called when the firmware reset sequence is aborting. In that case, bnxt_open_nic() needs to check if the ABORT_ERR flag is set and abort if it is. The configuration change that resulted in the bnxt_open_nic() call will fail but the NIC will be brought to a consistent IF_DOWN state. Without this patch, if bnxt_open_nic() were to continue in this error state, it may crash like this: [ 1648.659736] BUG: unable to handle kernel NULL pointer dereference at (null) [ 1648.659768] IP: [<ffffffffc01e9b3a>] bnxt_alloc_mem+0x50a/0x1140 [bnxt_en] [ 1648.659796] PGD 101e1b3067 PUD 101e1b2067 PMD 0 [ 1648.659813] Oops: 0000 [#1] SMP [ 1648.659825] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc dell_smbios dell_wmi_descriptor dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper vfat cryptd fat pcspkr ipmi_ssif sg k10temp i2c_piix4 wmi ipmi_si ipmi_devintf ipmi_msghandler tpm_crb acpi_power_meter sch_fq_codel ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci drm libahci megaraid_sas crct10dif_pclmul crct10dif_common [ 1648.660063] tg3 libata crc32c_intel bnxt_en(OE) drm_panel_orientation_quirks devlink ptp pps_core dm_mirror dm_region_hash dm_log dm_mod fuse [ 1648.660105] CPU: 13 PID: 3867 Comm: ethtool Kdump: loaded Tainted: G OE ------------ 3.10.0-1152.el7.x86_64 #1 [ 1648.660911] Hardware name: Dell Inc. PowerEdge R7515/0R4CNN, BIOS 1.2.14 01/28/2020 [ 1648.661662] task: ffff94e64cbc9080 ti: ffff94f55df1c000 task.ti: ffff94f55df1c000 [ 1648.662409] RIP: 0010:[<ffffffffc01e9b3a>] [<ffffffffc01e9b3a>] bnxt_alloc_mem+0x50a/0x1140 [bnxt_en] [ 1648.663171] RSP: 0018:ffff94f55df1fba8 EFLAGS: 00010202 [ 1648.663927] RAX: 0000000000000000 RBX: ffff94e6827e0000 RCX: 0000000000000000 [ 1648.664684] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff94e6827e08c0 [ 1648.665433] RBP: ffff94f55df1fc20 R08: 00000000000001ff R09: 0000000000000008 [ 1648.666184] R10: 0000000000000d53 R11: ffff94f55df1f7ce R12: ffff94e6827e08c0 [ 1648.666940] R13: ffff94e6827e08c0 R14: ffff94e6827e08c0 R15: ffffffffb9115e40 [ 1648.667695] FS: 00007f8aadba5740(0000) GS:ffff94f57eb40000(0000) knlGS:0000000000000000 [ 1648.668447] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1648.669202] CR2: 0000000000000000 CR3: 0000001022772000 CR4: 0000000000340fe0 [ 1648.669966] Call Trace: [ 1648.670730] [<ffffffffc01f1d5d>] ? bnxt_need_reserve_rings+0x9d/0x170 [bnxt_en] [ 1648.671496] [<ffffffffc01fa7ea>] __bnxt_open_nic+0x8a/0x9a0 [bnxt_en] [ 1648.672263] [<ffffffffc01f7479>] ? bnxt_close_nic+0x59/0x1b0 [bnxt_en] [ 1648.673031] [<ffffffffc01fb11b>] bnxt_open_nic+0x1b/0x50 [bnxt_en] [ 1648.673793] [<ffffffffc020037c>] bnxt_set_ringparam+0x6c/0xa0 [bnxt_en] [ 1648.674550] [<ffffffffb8a5f564>] dev_ethtool+0x1334/0x21a0 [ 1648.675306] [<ffffffffb8a719ff>] dev_ioctl+0x1ef/0x5f0 [ 1648.676061] [<ffffffffb8a324bd>] sock_do_ioctl+0x4d/0x60 [ 1648.676810] [<ffffffffb8a326bb>] sock_ioctl+0x1eb/0x2d0 [ 1648.677548] [<ffffffffb8663230>] do_vfs_ioctl+0x3a0/0x5b0 [ 1648.678282] [<ffffffffb8b8e678>] ? __do_page_fault+0x238/0x500 [ 1648.679016] [<ffffffffb86634e1>] SyS_ioctl+0xa1/0xc0 [ 1648.679745] [<ffffffffb8b93f92>] system_call_fastpath+0x25/0x2a [ 1648.680461] Code: 9e 60 01 00 00 0f 1f 40 00 45 8b 8e 48 01 00 00 31 c9 45 85 c9 0f 8e 73 01 00 00 66 0f 1f 44 00 00 49 8b 86 a8 00 00 00 48 63 d1 <48> 8b 14 d0 48 85 d2 0f 84 46 01 00 00 41 8b 86 44 01 00 00 c7 [ 1648.681986] RIP [<ffffffffc01e9b3a>] bnxt_alloc_mem+0x50a/0x1140 [bnxt_en] [ 1648.682724] RSP <ffff94f55df1fba8> [ 1648.683451] CR2: 0000000000000000 Fixes: ec5d31e3c15d ("bnxt_en: Handle firmware reset status during IF_UP.") Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01efivarfs: Replace invalid slashes with exclamation marks in dentries.Michael Schaller1-0/+3
commit 336af6a4686d885a067ecea8c3c3dd129ba4fc75 upstream. Without this patch efivarfs_alloc_dentry creates dentries with slashes in their name if the respective EFI variable has slashes in its name. This in turn causes EIO on getdents64, which prevents a complete directory listing of /sys/firmware/efi/efivars/. This patch replaces the invalid shlashes with exclamation marks like kobject_set_name_vargs does for /sys/firmware/efi/vars/ to have consistently named dentries under /sys/firmware/efi/vars/ and /sys/firmware/efi/efivars/. Signed-off-by: Michael Schaller <misch@google.com> Link: https://lore.kernel.org/r/20200925074502.150448-1-misch@google.com Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: dann frazier <dann.frazier@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01x86/copy_mc: Introduce copy_mc_enhanced_fast_string()Dan Williams3-9/+60
commit 5da8e4a658109e3b7e1f45ae672b7c06ac3e7158 upstream. The motivations to go rework memcpy_mcsafe() are that the benefit of doing slow and careful copies is obviated on newer CPUs, and that the current opt-in list of CPUs to instrument recovery is broken relative to those CPUs. There is no need to keep an opt-in list up to date on an ongoing basis if pmem/dax operations are instrumented for recovery by default. With recovery enabled by default the old "mcsafe_key" opt-in to careful copying can be made a "fragile" opt-out. Where the "fragile" list takes steps to not consume poison across cachelines. The discussion with Linus made clear that the current "_mcsafe" suffix was imprecise to a fault. The operations that are needed by pmem/dax are to copy from a source address that might throw #MC to a destination that may write-fault, if it is a user page. So copy_to_user_mcsafe() becomes copy_mc_to_user() to indicate the separate precautions taken on source and destination. copy_mc_to_kernel() is introduced as a non-SMAP version that does not expect write-faults on the destination, but is still prepared to abort with an error code upon taking #MC. The original copy_mc_fragile() implementation had negative performance implications since it did not use the fast-string instruction sequence to perform copies. For this reason copy_mc_to_kernel() fell back to plain memcpy() to preserve performance on platforms that did not indicate the capability to recover from machine check exceptions. However, that capability detection was not architectural and now that some platforms can recover from fast-string consumption of memory errors the memcpy() fallback now causes these more capable platforms to fail. Introduce copy_mc_enhanced_fast_string() as the fast default implementation of copy_mc_to_kernel() and finalize the transition of copy_mc_fragile() to be a platform quirk to indicate 'copy-carefully'. With this in place, copy_mc_to_kernel() is fast and recovery-ready by default regardless of hardware capability. Thanks to Vivek for identifying that copy_user_generic() is not suitable as the copy_mc_to_user() backend since the #MC handler explicitly checks ex_has_fault_handler(). Thanks to the 0day robot for catching a performance bug in the x86/copy_mc_to_user implementation. [ bp: Add the "why" for this change from the 0/2th message, massage. ] Fixes: 92b0729c34ca ("x86/mm, x86/mce: Add memcpy_mcsafe()") Reported-by: Erwin Tsaur <erwin.tsaur@intel.com> Reported-by: 0day robot <lkp@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Tested-by: Erwin Tsaur <erwin.tsaur@intel.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/160195562556.2163339.18063423034951948973.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()Dan Williams37-530/+674
commit ec6347bb43395cb92126788a1a5b25302543f815 upstream. In reaction to a proposal to introduce a memcpy_mcsafe_fast() implementation Linus points out that memcpy_mcsafe() is poorly named relative to communicating the scope of the interface. Specifically what addresses are valid to pass as source, destination, and what faults / exceptions are handled. Of particular concern is that even though x86 might be able to handle the semantics of copy_mc_to_user() with its common copy_user_generic() implementation other archs likely need / want an explicit path for this case: On Fri, May 1, 2020 at 11:28 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Thu, Apr 30, 2020 at 6:21 PM Dan Williams <dan.j.williams@intel.com> wrote: > > > > However now I see that copy_user_generic() works for the wrong reason. > > It works because the exception on the source address due to poison > > looks no different than a write fault on the user address to the > > caller, it's still just a short copy. So it makes copy_to_user() work > > for the wrong reason relative to the name. > > Right. > > And it won't work that way on other architectures. On x86, we have a > generic function that can take faults on either side, and we use it > for both cases (and for the "in_user" case too), but that's an > artifact of the architecture oddity. > > In fact, it's probably wrong even on x86 - because it can hide bugs - > but writing those things is painful enough that everybody prefers > having just one function. Replace a single top-level memcpy_mcsafe() with either copy_mc_to_user(), or copy_mc_to_kernel(). Introduce an x86 copy_mc_fragile() name as the rename for the low-level x86 implementation formerly named memcpy_mcsafe(). It is used as the slow / careful backend that is supplanted by a fast copy_mc_generic() in a follow-on patch. One side-effect of this reorganization is that separating copy_mc_64.S to its own file means that perf no longer needs to track dependencies for its memcpy_64.S benchmarks. [ bp: Massage a bit. ] Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: <stable@vger.kernel.org> Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01x86/PCI: Fix intel_mid_pci.c build error when ACPI is not enabledRandy Dunlap1-0/+1
commit 035fff1f7aab43e420e0098f0854470a5286fb83 upstream. Fix build error when CONFIG_ACPI is not set/enabled by adding the header file <asm/acpi.h> which contains a stub for the function in the build error. ../arch/x86/pci/intel_mid_pci.c: In function ‘intel_mid_pci_init’: ../arch/x86/pci/intel_mid_pci.c:303:2: error: implicit declaration of function ‘acpi_noirq_set’; did you mean ‘acpi_irq_get’? [-Werror=implicit-function-declaration] acpi_noirq_set(); Fixes: a912a7584ec3 ("x86/platform/intel-mid: Move PCI initialization to arch_init()") Link: https://lore.kernel.org/r/ea903917-e51b-4cc9-2680-bc1e36efa026@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reviewed-by: Jesse Barnes <jsbarnes@google.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org # v4.16+ Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Len Brown <lenb@kernel.org> Cc: Jesse Barnes <jsbarnes@google.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01arm64: link with -z norelro regardless of CONFIG_RELOCATABLENick Desaulniers1-2/+2
commit 3b92fa7485eba16b05166fddf38ab42f2ff6ab95 upstream. With CONFIG_EXPERT=y, CONFIG_KASAN=y, CONFIG_RANDOMIZE_BASE=n, CONFIG_RELOCATABLE=n, we observe the following failure when trying to link the kernel image with LD=ld.lld: error: section: .exit.data is not contiguous with other relro sections ld.lld defaults to -z relro while ld.bfd defaults to -z norelro. This was previously fixed, but only for CONFIG_RELOCATABLE=y. Fixes: 3bbd3db86470 ("arm64: relocatable: fix inconsistencies in linker script and options") Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20201016175339.2429280-1-ndesaulniers@google.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01arm64: Run ARCH_WORKAROUND_2 enabling code on all CPUsMarc Zyngier1-0/+7
commit 39533e12063be7f55e3d6ae21ffe067799d542a4 upstream. Commit 606f8e7b27bf ("arm64: capabilities: Use linear array for detection and verification") changed the way we deal with per-CPU errata by only calling the .matches() callback until one CPU is found to be affected. At this point, .matches() stop being called, and .cpu_enable() will be called on all CPUs. This breaks the ARCH_WORKAROUND_2 handling, as only a single CPU will be mitigated. In order to address this, forcefully call the .matches() callback from a .cpu_enable() callback, which brings us back to the original behaviour. Fixes: 606f8e7b27bf ("arm64: capabilities: Use linear array for detection and verification") Cc: <stable@vger.kernel.org> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01arm64: Run ARCH_WORKAROUND_1 enabling code on all CPUsMarc Zyngier1-0/+8
commit 18fce56134c987e5b4eceddafdbe4b00c07e2ae1 upstream. Commit 73f381660959 ("arm64: Advertise mitigation of Spectre-v2, or lack thereof") changed the way we deal with ARCH_WORKAROUND_1, by moving most of the enabling code to the .matches() callback. This has the unfortunate effect that the workaround gets only enabled on the first affected CPU, and no other. In order to address this, forcefully call the .matches() callback from a .cpu_enable() callback, which brings us back to the original behaviour. Fixes: 73f381660959 ("arm64: Advertise mitigation of Spectre-v2, or lack thereof") Cc: <stable@vger.kernel.org> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01fs/kernel_read_file: Remove FIRMWARE_EFI_EMBEDDED enumKees Cook2-2/+1
commit 06e67b849ab910a49a629445f43edb074153d0eb upstream. The "FIRMWARE_EFI_EMBEDDED" enum is a "where", not a "what". It should not be distinguished separately from just "FIRMWARE", as this confuses the LSMs about what is being loaded. Additionally, there was no actual validation of the firmware contents happening. Fixes: e4c2c0ff00ec ("firmware: Add new platform fallback mechanism and firmware_request_platform()") Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Acked-by: Scott Branden <scott.branden@broadcom.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20201002173828.2099543-3-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01efi/arm64: libstub: Deal gracefully with EFI_RNG_PROTOCOL failureArd Biesheuvel2-6/+6
commit d32de9130f6c79533508e2c7879f18997bfbe2a0 upstream. Currently, on arm64, we abort on any failure from efi_get_random_bytes() other than EFI_NOT_FOUND when it comes to setting the physical seed for KASLR, but ignore such failures when obtaining the seed for virtual KASLR or for early seeding of the kernel's entropy pool via the config table. This is inconsistent, and may lead to unexpected boot failures. So let's permit any failure for the physical seed, and simply report the error code if it does not equal EFI_NOT_FOUND. Cc: <stable@vger.kernel.org> # v5.8+ Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01scripts/setlocalversion: make git describe output more reliableRasmus Villemoes1-5/+16
commit 548b8b5168c90c42e88f70fcf041b4ce0b8e7aa8 upstream. When building for an embedded target using Yocto, we're sometimes observing that the version string that gets built into vmlinux (and thus what uname -a reports) differs from the path under /lib/modules/ where modules get installed in the rootfs, but only in the length of the -gabc123def suffix. Hence modprobe always fails. The problem is that Yocto has the concept of "sstate" (shared state), which allows different developers/buildbots/etc. to share build artifacts, based on a hash of all the metadata that went into building that artifact - and that metadata includes all dependencies (e.g. the compiler used etc.). That normally works quite well; usually a clean build (without using any sstate cache) done by one developer ends up being binary identical to a build done on another host. However, one thing that can cause two developers to end up with different builds [and thus make one's vmlinux package incompatible with the other's kernel-dev package], which is not captured by the metadata hashing, is this `git describe`: The output of that can be affected by (1) git version: before 2.11 git defaulted to a minimum of 7, since 2.11 (git.git commit e6c587) the default is dynamic based on the number of objects in the repo (2) hence even if both run the same git version, the output can differ based on how many remotes are being tracked (or just lots of local development branches or plain old garbage) (3) and of course somebody could have a core.abbrev config setting in ~/.gitconfig So in order to avoid `uname -a` output relying on such random details of the build environment which are rather hard to ensure are consistent between developers and buildbots, make sure the abbreviated sha1 always consists of exactly 12 hex characters. That is consistent with the current rule for -stable patches, and is almost always enough to identify the head commit unambigously - in the few cases where it does not, the v5.4.3-00021- prefix would certainly nail it down. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01io_uring: Convert advanced XArray uses to the normal APIMatthew Wilcox (Oracle)1-12/+2
commit 5e2ed8c4f45093698855b1f45cdf43efbf6dd498 upstream. There are no bugs here that I've spotted, it's just easier to use the normal API and there are no performance advantages to using the more verbose advanced API. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01io_uring: Fix XArray usage in io_uring_add_task_fileMatthew Wilcox (Oracle)1-12/+9
commit 236434c3438c4da3dfbd6aeeab807577b85e951a upstream. The xas_store() wasn't paired with an xas_nomem() loop, so if it couldn't allocate memory using GFP_NOWAIT, it would leak the reference to the file descriptor. Also the node pointed to by the xas could be freed between the call to xas_load() under the rcu_read_lock() and the acquisition of the xa_lock. It's easier to just use the normal xa_load/xa_store interface here. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> [axboe: fix missing assign after alloc, cur_uring -> tctx rename] Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01io_uring: Fix use of XArray in __io_uring_files_cancelMatthew Wilcox (Oracle)1-14/+5
commit ce765372bc443573d1d339a2bf4995de385dea3a upstream. We have to drop the lock during each iteration, so there's no advantage to using the advanced API. Convert this to a standard xa_for_each() loop. Reported-by: syzbot+27c12725d8ff0bfe1a13@syzkaller.appspotmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01io_uring: no need to call xa_destroy() on empty xarrayJens Axboe1-1/+0
commit ca6484cd308a671811bf39f3119e81966eb476e3 upstream. The kernel test robot reports this lockdep issue: [child1:659] mbind (274) returned ENOSYS, marking as inactive. [child1:659] mq_timedsend (279) returned ENOSYS, marking as inactive. [main] 10175 iterations. [F:7781 S:2344 HI:2397] [ 24.610601] [ 24.610743] ================================ [ 24.611083] WARNING: inconsistent lock state [ 24.611437] 5.9.0-rc7-00017-g0f2122045b9462 #5 Not tainted [ 24.611861] -------------------------------- [ 24.612193] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. [ 24.612660] ksoftirqd/0/7 [HC0[0]:SC1[3]:HE0:SE0] takes: [ 24.613086] f00ed998 (&xa->xa_lock#4){+.?.}-{2:2}, at: xa_destroy+0x43/0xc1 [ 24.613642] {SOFTIRQ-ON-W} state was registered at: [ 24.614024] lock_acquire+0x20c/0x29b [ 24.614341] _raw_spin_lock+0x21/0x30 [ 24.614636] io_uring_add_task_file+0xe8/0x13a [ 24.614987] io_uring_create+0x535/0x6bd [ 24.615297] io_uring_setup+0x11d/0x136 [ 24.615606] __ia32_sys_io_uring_setup+0xd/0xf [ 24.615977] do_int80_syscall_32+0x53/0x6c [ 24.616306] restore_all_switch_stack+0x0/0xb1 [ 24.616677] irq event stamp: 939881 [ 24.616968] hardirqs last enabled at (939880): [<8105592d>] __local_bh_enable_ip+0x13c/0x145 [ 24.617642] hardirqs last disabled at (939881): [<81b6ace3>] _raw_spin_lock_irqsave+0x1b/0x4e [ 24.618321] softirqs last enabled at (939738): [<81b6c7c8>] __do_softirq+0x3f0/0x45a [ 24.618924] softirqs last disabled at (939743): [<81055741>] run_ksoftirqd+0x35/0x61 [ 24.619521] [ 24.619521] other info that might help us debug this: [ 24.620028] Possible unsafe locking scenario: [ 24.620028] [ 24.620492] CPU0 [ 24.620685] ---- [ 24.620894] lock(&xa->xa_lock#4); [ 24.621168] <Interrupt> [ 24.621381] lock(&xa->xa_lock#4); [ 24.621695] [ 24.621695] *** DEADLOCK *** [ 24.621695] [ 24.622154] 1 lock held by ksoftirqd/0/7: [ 24.622468] #0: 823bfb94 (rcu_callback){....}-{0:0}, at: rcu_process_callbacks+0xc0/0x155 [ 24.623106] [ 24.623106] stack backtrace: [ 24.623454] CPU: 0 PID: 7 Comm: ksoftirqd/0 Not tainted 5.9.0-rc7-00017-g0f2122045b9462 #5 [ 24.624090] Call Trace: [ 24.624284] ? show_stack+0x40/0x46 [ 24.624551] dump_stack+0x1b/0x1d [ 24.624809] print_usage_bug+0x17a/0x185 [ 24.625142] mark_lock+0x11d/0x1db [ 24.625474] ? print_shortest_lock_dependencies+0x121/0x121 [ 24.625905] __lock_acquire+0x41e/0x7bf [ 24.626206] lock_acquire+0x20c/0x29b [ 24.626517] ? xa_destroy+0x43/0xc1 [ 24.626810] ? lock_acquire+0x20c/0x29b [ 24.627110] _raw_spin_lock_irqsave+0x3e/0x4e [ 24.627450] ? xa_destroy+0x43/0xc1 [ 24.627725] xa_destroy+0x43/0xc1 [ 24.627989] __io_uring_free+0x57/0x71 [ 24.628286] ? get_pid+0x22/0x22 [ 24.628544] __put_task_struct+0xf2/0x163 [ 24.628865] put_task_struct+0x1f/0x2a [ 24.629161] delayed_put_task_struct+0xe2/0xe9 [ 24.629509] rcu_process_callbacks+0x128/0x155 [ 24.629860] __do_softirq+0x1a3/0x45a [ 24.630151] run_ksoftirqd+0x35/0x61 [ 24.630443] smpboot_thread_fn+0x304/0x31a [ 24.630763] kthread+0x124/0x139 [ 24.631016] ? sort_range+0x18/0x18 [ 24.631290] ? kthread_create_worker_on_cpu+0x17/0x17 [ 24.631682] ret_from_fork+0x1c/0x28 which is complaining about xa_destroy() grabbing the xa lock in an IRQ disabling fashion, whereas the io_uring uses cases aren't interrupt safe. This is really an xarray issue, since it should not assume the lock type. But for our use case, since we know the xarray is empty at this point, there's no need to actually call xa_destroy(). So just get rid of it. Fixes: 0f2122045b94 ("io_uring: don't rely on weak ->files references") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01io-wq: fix use-after-free in io_wq_worker_runningHillf Danton1-58/+58
commit c4068bf898ddaef791049a366828d9b84b467bda upstream. The smart syzbot has found a reproducer for the following issue: ================================================================== BUG: KASAN: use-after-free in instrument_atomic_write include/linux/instrumented.h:71 [inline] BUG: KASAN: use-after-free in atomic_inc include/asm-generic/atomic-instrumented.h:240 [inline] BUG: KASAN: use-after-free in io_wqe_inc_running fs/io-wq.c:301 [inline] BUG: KASAN: use-after-free in io_wq_worker_running+0xde/0x110 fs/io-wq.c:613 Write of size 4 at addr ffff8882183db08c by task io_wqe_worker-0/7771 CPU: 0 PID: 7771 Comm: io_wqe_worker-0 Not tainted 5.9.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x198/0x1fd lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383 __kasan_report mm/kasan/report.c:513 [inline] kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530 check_memory_region_inline mm/kasan/generic.c:186 [inline] check_memory_region+0x13d/0x180 mm/kasan/generic.c:192 instrument_atomic_write include/linux/instrumented.h:71 [inline] atomic_inc include/asm-generic/atomic-instrumented.h:240 [inline] io_wqe_inc_running fs/io-wq.c:301 [inline] io_wq_worker_running+0xde/0x110 fs/io-wq.c:613 schedule_timeout+0x148/0x250 kernel/time/timer.c:1879 io_wqe_worker+0x517/0x10e0 fs/io-wq.c:580 kthread+0x3b5/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 Allocated by task 7768: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48 kasan_set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:461 kmem_cache_alloc_node_trace+0x17b/0x3f0 mm/slab.c:3594 kmalloc_node include/linux/slab.h:572 [inline] kzalloc_node include/linux/slab.h:677 [inline] io_wq_create+0x57b/0xa10 fs/io-wq.c:1064 io_init_wq_offload fs/io_uring.c:7432 [inline] io_sq_offload_start fs/io_uring.c:7504 [inline] io_uring_create fs/io_uring.c:8625 [inline] io_uring_setup+0x1836/0x28e0 fs/io_uring.c:8694 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Freed by task 21: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48 kasan_set_track+0x1c/0x30 mm/kasan/common.c:56 kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355 __kasan_slab_free+0xd8/0x120 mm/kasan/common.c:422 __cache_free mm/slab.c:3418 [inline] kfree+0x10e/0x2b0 mm/slab.c:3756 __io_wq_destroy fs/io-wq.c:1138 [inline] io_wq_destroy+0x2af/0x460 fs/io-wq.c:1146 io_finish_async fs/io_uring.c:6836 [inline] io_ring_ctx_free fs/io_uring.c:7870 [inline] io_ring_exit_work+0x1e4/0x6d0 fs/io_uring.c:7954 process_one_work+0x94c/0x1670 kernel/workqueue.c:2269 worker_thread+0x64c/0x1120 kernel/workqueue.c:2415 kthread+0x3b5/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 The buggy address belongs to the object at ffff8882183db000 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 140 bytes inside of 1024-byte region [ffff8882183db000, ffff8882183db400) The buggy address belongs to the page: page:000000009bada22b refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2183db flags: 0x57ffe0000000200(slab) raw: 057ffe0000000200 ffffea0008604c48 ffffea00086a8648 ffff8880aa040700 raw: 0000000000000000 ffff8882183db000 0000000100000002 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8882183daf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff8882183db000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff8882183db080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8882183db100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8882183db180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== which is down to the comment below, /* all workers gone, wq exit can proceed */ if (!nr_workers && refcount_dec_and_test(&wqe->wq->refs)) complete(&wqe->wq->done); because there might be multiple cases of wqe in a wq and we would wait for every worker in every wqe to go home before releasing wq's resources on destroying. To that end, rework wq's refcount by making it independent of the tracking of workers because after all they are two different things, and keeping it balanced when workers come and go. Note the manager kthread, like other workers, now holds a grab to wq during its lifetime. Finally to help destroy wq, check IO_WQ_BIT_EXIT upon creating worker and do nothing for exiting wq. Cc: stable@vger.kernel.org # v5.5+ Reported-by: syzbot+45fa0a195b941764e0f0@syzkaller.appspotmail.com Reported-by: syzbot+9af99580130003da82b1@syzkaller.appspotmail.com Cc: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Hillf Danton <hdanton@sina.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01io_wq: Make io_wqe::lock a raw_spinlock_tSebastian Andrzej Siewior1-26/+26
commit 95da84659226d75698a1ab958be0af21d9cc2a9c upstream. During a context switch the scheduler invokes wq_worker_sleeping() with disabled preemption. Disabling preemption is needed because it protects access to `worker->sleeping'. As an optimisation it avoids invoking schedule() within the schedule path as part of possible wake up (thus preempt_enable_no_resched() afterwards). The io-wq has been added to the mix in the same section with disabled preemption. This breaks on PREEMPT_RT because io_wq_worker_sleeping() acquires a spinlock_t. Also within the schedule() the spinlock_t must be acquired after tsk_is_pi_blocked() otherwise it will block on the sleeping lock again while scheduling out. While playing with `io_uring-bench' I didn't notice a significant latency spike after converting io_wqe::lock to a raw_spinlock_t. The latency was more or less the same. In order to keep the spinlock_t it would have to be moved after the tsk_is_pi_blocked() check which would introduce a branch instruction into the hot path. The lock is used to maintain the `work_list' and wakes one task up at most. Should io_wqe_cancel_pending_work() cause latency spikes, while searching for a specific item, then it would need to drop the lock during iterations. revert_creds() is also invoked under the lock. According to debug cred::non_rcu is 0. Otherwise it should be moved outside of the locked section because put_cred_rcu()->free_uid() acquires a sleeping lock. Convert io_wqe::lock to a raw_spinlock_t.c Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01io_uring: reference ->nsproxy for file table commandsJens Axboe3-0/+8
commit 9b8284921513fc1ea57d87777283a59b05862f03 upstream. If we don't get and assign the namespace for the async work, then certain paths just don't work properly (like /dev/stdin, /proc/mounts, etc). Anything that references the current namespace of the given task should be assigned for async work on behalf of that task. Cc: stable@vger.kernel.org # v5.5+ Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01io_uring: don't rely on weak ->files referencesJens Axboe7-36/+340
commit 0f2122045b946241a9e549c2a76cea54fa58a7ff upstream. Grab actual references to the files_struct. To avoid circular references issues due to this, we add a per-task note that keeps track of what io_uring contexts a task has used. When the tasks execs or exits its assigned files, we cancel requests based on this tracking. With that, we can grab proper references to the files table, and no longer need to rely on stashing away ring_fd and ring_file to check if the ring_fd may have been closed. Cc: stable@vger.kernel.org # v5.5+ Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01io_uring: enable task/files specific overflow flushingJens Axboe1-15/+26
commit e6c8aa9ac33bd7c968af7816240fc081401fddcd upstream. This allows us to selectively flush out pending overflows, depending on the task and/or files_struct being passed in. No intended functional changes in this patch. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01io_uring: return cancelation status from poll/timeout/files handlersJens Axboe1-6/+24
commit 76e1b6427fd8246376a97e3227049d49188dfb9c upstream. Return whether we found and canceled requests or not. This is in preparation for using this information, no functional changes in this patch. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01io_uring: unconditionally grab req->taskJens Axboe1-23/+3
commit e3bc8e9dad7f2f83cc807111d4472164c9210153 upstream. Sometimes we assign a weak reference to it, sometimes we grab a reference to it. Clean this up and make it unconditional, and drop the flag related to tracking this state. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01io_uring: stash ctx task reference for SQPOLLJens Axboe1-10/+29
commit 2aede0e417db846793c276c7a1bbf7262c8349b0 upstream. We can grab a reference to the task instead of stashing away the task files_struct. This is doable without creating a circular reference between the ring fd and the task itself. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01io_uring: move dropping of files into separate helperJens Axboe1-10/+16
commit f573d384456b3025d3f8e58b3eafaeeb0f510784 upstream. No functional changes in this patch, prep patch for grabbing references to the files_struct. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01io_uring: allow timeout/poll/files killing to take task into accountJens Axboe1-8/+22
commit f3606e3a92ddd36299642c78592fc87609abb1f6 upstream. We currently cancel these when the ring exits, and we cancel all of them. This is in preparation for killing only the ones associated with a given task. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01io_uring: don't run task work on an exiting taskJens Axboe1-0/+8
commit 6200b0ae4ea28a4bfd8eb434e33e6201b7a6a282 upstream. This isn't safe, and isn't needed either. We are guaranteed that any work we queue is on a live task (and will be run), or it goes to our backup io-wq threads if the task is exiting. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01netfilter: nftables_offload: KASAN slab-out-of-bounds Read in ↵Saeed Mirzamohammadi3-5/+11
nft_flow_rule_create commit 31cc578ae2de19c748af06d859019dced68e325d upstream. This patch fixes the issue due to: BUG: KASAN: slab-out-of-bounds in nft_flow_rule_create+0x622/0x6a2 net/netfilter/nf_tables_offload.c:40 Read of size 8 at addr ffff888103910b58 by task syz-executor227/16244 The error happens when expr->ops is accessed early on before performing the boundary check and after nft_expr_next() moves the expr to go out-of-bounds. This patch checks the boundary condition before expr->ops that fixes the slab-out-of-bounds Read issue. Add nft_expr_more() and use it to fix this problem. Signed-off-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-29Linux 5.8.17v5.8.17Greg Kroah-Hartman1-1/+1
Tested-by: Jeffrin Jose T <jeffrin@rajagiritech.edu.in> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Jon Hunter <jonathanh@nvidia.com> Link: https://lore.kernel.org/r/20201027135522.655719020@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>