summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-08-31Merge tag 'v5.8.5' into dev-5.8dev-5.8Joel Stanley879-4198/+8003
This is the 5.8.5 stable release Signed-off-by: Joel Stanley <joel@jms.id.au>
2020-08-27Linux 5.8.5v5.8.5Greg Kroah-Hartman1-1/+1
Tested-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-27binfmt_flat: revert "binfmt_flat: don't offset the data start"Max Filippov1-8/+12
commit 2217b982624680d19a80ebb4600d05c8586c4f96 upstream. binfmt_flat loader uses the gap between text and data to store data segment pointers for the libraries. Even in the absence of shared libraries it stores at least one pointer to the executable's own data segment. Text and data can go back to back in the flat binary image and without offsetting data segment last few instructions in the text segment may get corrupted by the data segment pointer. Fix it by reverting commit a2357223c50a ("binfmt_flat: don't offset the data start"). Cc: stable@vger.kernel.org Fixes: a2357223c50a ("binfmt_flat: don't offset the data start") Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Greg Ungerer <gerg@linux-m68k.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-27io_uring: fix missing ->mm on exitPavel Begunkov1-1/+2
Upstream commits: 8eb06d7e8dd85 ("io_uring: fix missing ->mm on exit") cbcf72148da4a ("io_uring: return locked and pinned page accounting") do_exit() first drops current->mm and then runs task_work, from where io_sq_thread_acquire_mm() would try to set mm for a user dying process. [ 208.004249] WARNING: CPU: 2 PID: 1854 at kernel/kthread.c:1238 kthread_use_mm+0x244/0x270 [ 208.004287] kthread_use_mm+0x244/0x270 [ 208.004288] io_sq_thread_acquire_mm.part.0+0x54/0x80 [ 208.004290] io_async_task_func+0x258/0x2ac [ 208.004291] task_work_run+0xc8/0x210 [ 208.004294] do_exit+0x1b8/0x430 [ 208.004295] do_group_exit+0x44/0xac [ 208.004296] get_signal+0x164/0x69c [ 208.004298] do_signal+0x94/0x1d0 [ 208.004299] do_notify_resume+0x18c/0x340 [ 208.004300] work_pending+0x8/0x3d4 Reported-by: Roman Gershman <romange@gmail.com> Tested-by: Roman Gershman <romange@gmail.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-27netlink: fix state reallocation in policy exportJohannes Berg1-0/+3
[ Upstream commit d1fb55592909ea249af70170c7a52e637009564d ] Evidently, when I did this previously, we didn't have more than 10 policies and didn't run into the reallocation path, because it's missing a memset() for the unused policies. Fix that. Fixes: d07dcf9aadd6 ("netlink: add infrastructure to expose policies to userspace") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-27ethtool: Don't omit the netlink reply if no features were changedMaxim Mikityanskiy1-7/+4
[ Upstream commit f01204ec8be7ea5e8f0230a7d4200e338d563bde ] The legacy ethtool userspace tool shows an error when no features could be changed. It's useful to have a netlink reply to be able to show this error when __netdev_update_features wasn't called, for example: 1. ethtool -k eth0 large-receive-offload: off 2. ethtool -K eth0 rx-fcs on 3. ethtool -K eth0 lro on Could not change any device features rx-lro: off [requested on] 4. ethtool -K eth0 lro on # The output should be the same, but without this patch the kernel # doesn't send the reply, and ethtool is unable to detect the error. This commit makes ethtool-netlink always return a reply when requested, and it still avoids unnecessary calls to __netdev_update_features if the wanted features haven't changed. Fixes: 0980bfcd6954 ("ethtool: set netdev features with FEATURES_SET request") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-27ethtool: Account for hw_features in netlink interfaceMaxim Mikityanskiy1-1/+2
[ Upstream commit 2847bfed888fbb8bf4c8e8067fd6127538c2c700 ] ethtool-netlink ignores dev->hw_features and may confuse the drivers by asking them to enable features not in the hw_features bitmask. For example: 1. ethtool -k eth0 tls-hw-tx-offload: off [fixed] 2. ethtool -K eth0 tls-hw-tx-offload on tls-hw-tx-offload: on 3. ethtool -k eth0 tls-hw-tx-offload: on [fixed] Fitler out dev->hw_features from req_wanted to fix it and to resemble the legacy ethtool behavior. Fixes: 0980bfcd6954 ("ethtool: set netdev features with FEATURES_SET request") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-27ethtool: Fix preserving of wanted feature bits in netlink interfaceMaxim Mikityanskiy1-4/+7
[ Upstream commit 840110a4eae190dcbb9907d68216d5d1d9f25839 ] Currently, ethtool-netlink calculates new wanted bits as: (req_wanted & req_mask) | (old_active & ~req_mask) It completely discards the old wanted bits, so they are forgotten with the next ethtool command. Sample steps to reproduce: 1. ethtool -k eth0 tx-tcp-segmentation: on # TSO is on from the beginning 2. ethtool -K eth0 tx off tx-tcp-segmentation: off [not requested] 3. ethtool -k eth0 tx-tcp-segmentation: off [requested on] 4. ethtool -K eth0 rx off # Some change unrelated to TSO 5. ethtool -k eth0 tx-tcp-segmentation: off # "Wanted on" is forgotten This commit fixes it by changing the formula to: (req_wanted & req_mask) | (old_wanted & ~req_mask), where old_active was replaced by old_wanted to account for the wanted bits. The shortcut condition for the case where nothing was changed now compares wanted bitmasks, instead of wanted to active. Fixes: 0980bfcd6954 ("ethtool: set netdev features with FEATURES_SET request") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-27net: ena: Make missed_tx stat incrementalShay Agroskin1-1/+4
[ Upstream commit ccd143e5150f24b9ba15145c7221b61dd9e41021 ] Most statistics in ena driver are incremented, meaning that a stat's value is a sum of all increases done to it since driver/queue initialization. This patch makes all statistics this way, effectively making missed_tx statistic incremental. Also added a comment regarding rx_drops and tx_drops to make it clearer how these counters are calculated. Fixes: 11095fdb712b ("net: ena: add statistics for missed tx packets") Signed-off-by: Shay Agroskin <shayagr@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-27tipc: fix uninit skb->data in tipc_nl_compat_dumpit()Cong Wang1-1/+11
[ Upstream commit 47733f9daf4fe4f7e0eb9e273f21ad3a19130487 ] __tipc_nl_compat_dumpit() has two callers, and it expects them to pass a valid nlmsghdr via arg->data. This header is artificial and crafted just for __tipc_nl_compat_dumpit(). tipc_nl_compat_publ_dump() does so by putting a genlmsghdr as well as some nested attribute, TIPC_NLA_SOCK. But the other caller tipc_nl_compat_dumpit() does not, this leaves arg->data uninitialized on this call path. Fix this by just adding a similar nlmsghdr without any payload in tipc_nl_compat_dumpit(). This bug exists since day 1, but the recent commit 6ea67769ff33 ("net: tipc: prepare attrs in __tipc_nl_compat_dumpit()") makes it easier to appear. Reported-and-tested-by: syzbot+0e7181deafa7e0b79923@syzkaller.appspotmail.com Fixes: d0796d1ef63d ("tipc: convert legacy nl bearer dump to nl compat") Cc: Jon Maloy <jmaloy@redhat.com> Cc: Ying Xue <ying.xue@windriver.com> Cc: Richard Alpe <richard.alpe@ericsson.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-27tipc: call rcu_read_lock() in tipc_aead_encrypt_done()Xin Long1-0/+2
[ Upstream commit f6db9096416209474090d64d8284e7c16c3d8873 ] b->media->send_msg() requires rcu_read_lock(), as we can see elsewhere in tipc, tipc_bearer_xmit, tipc_bearer_xmit_skb and tipc_bearer_bc_xmit(). Syzbot has reported this issue as: net/tipc/bearer.c:466 suspicious rcu_dereference_check() usage! Workqueue: cryptd cryptd_queue_worker Call Trace: tipc_l2_send_msg+0x354/0x420 net/tipc/bearer.c:466 tipc_aead_encrypt_done+0x204/0x3a0 net/tipc/crypto.c:761 cryptd_aead_crypt+0xe8/0x1d0 crypto/cryptd.c:739 cryptd_queue_worker+0x118/0x1b0 crypto/cryptd.c:181 process_one_work+0x94c/0x1670 kernel/workqueue.c:2269 worker_thread+0x64c/0x1120 kernel/workqueue.c:2415 kthread+0x3b5/0x4a0 kernel/kthread.c:291 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293 So fix it by calling rcu_read_lock() in tipc_aead_encrypt_done() for b->media->send_msg(). Fixes: fc1b6d6de220 ("tipc: introduce TIPC encryption & authentication") Reported-by: syzbot+47bbc6b678d317cccbe0@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-27net/smc: Prevent kernel-infoleak in __smc_diag_dump()Peilin Ye1-7/+9
[ Upstream commit ce51f63e63c52a4e1eee4dd040fb0ba0af3b43ab ] __smc_diag_dump() is potentially copying uninitialized kernel stack memory into socket buffers, since the compiler may leave a 4-byte hole near the beginning of `struct smcd_diag_dmbinfo`. Fix it by initializing `dinfo` with memset(). Fixes: 4b1b7d3b30a6 ("net/smc: add SMC-D diag support") Suggested-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-27net: sctp: Fix negotiation of the number of data streams.David Laight1-2/+4
[ Upstream commit ab921f3cdbec01c68705a7ade8bec628d541fc2b ] The number of output and input streams was never being reduced, eg when processing received INIT or INIT_ACK chunks. The effect is that DATA chunks can be sent with invalid stream ids and then discarded by the remote system. Fixes: 2075e50caf5ea ("sctp: convert to genradix") Signed-off-by: David Laight <david.laight@aculab.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-27net/sched: act_ct: Fix skb double-free in tcf_ct_handle_fragments() error flowAlaa Hleihel1-1/+1
[ Upstream commit eda814b97dfb8d9f4808eb2f65af9bd3705c4cae ] tcf_ct_handle_fragments() shouldn't free the skb when ip_defrag() call fails. Otherwise, we will cause a double-free bug. In such cases, just return the error to the caller. Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct") Signed-off-by: Alaa Hleihel <alaa@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-27net: qrtr: fix usage of idr in port assignment to socketNecip Fazil Yildiran1-9/+11
[ Upstream commit 8dfddfb79653df7c38a9c8c4c034f242a36acee9 ] Passing large uint32 sockaddr_qrtr.port numbers for port allocation triggers a warning within idr_alloc() since the port number is cast to int, and thus interpreted as a negative number. This leads to the rejection of such valid port numbers in qrtr_port_assign() as idr_alloc() fails. To avoid the problem, switch to idr_alloc_u32() instead. Fixes: bdabad3e363d ("net: Add Qualcomm IPC router") Reported-by: syzbot+f31428628ef672716ea8@syzkaller.appspotmail.com Signed-off-by: Necip Fazil Yildiran <necip@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-27net: nexthop: don't allow empty NHA_GROUPNikolay Aleksandrov1-1/+4
[ Upstream commit eeaac3634ee0e3f35548be35275efeca888e9b23 ] Currently the nexthop code will use an empty NHA_GROUP attribute, but it requires at least 1 entry in order to function properly. Otherwise we end up derefencing null or random pointers all over the place due to not having any nh_grp_entry members allocated, nexthop code relies on having at least the first member present. Empty NHA_GROUP doesn't make any sense so just disallow it. Also add a WARN_ON for any future users of nexthop_create_group(). BUG: kernel NULL pointer dereference, address: 0000000000000080 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP CPU: 0 PID: 558 Comm: ip Not tainted 5.9.0-rc1+ #93 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014 RIP: 0010:fib_check_nexthop+0x4a/0xaa Code: 0f 84 83 00 00 00 48 c7 02 80 03 f7 81 c3 40 80 fe fe 75 12 b8 ea ff ff ff 48 85 d2 74 6b 48 c7 02 40 03 f7 81 c3 48 8b 40 10 <48> 8b 80 80 00 00 00 eb 36 80 78 1a 00 74 12 b8 ea ff ff ff 48 85 RSP: 0018:ffff88807983ba00 EFLAGS: 00010213 RAX: 0000000000000000 RBX: ffff88807983bc00 RCX: 0000000000000000 RDX: ffff88807983bc00 RSI: 0000000000000000 RDI: ffff88807bdd0a80 RBP: ffff88807983baf8 R08: 0000000000000dc0 R09: 000000000000040a R10: 0000000000000000 R11: ffff88807bdd0ae8 R12: 0000000000000000 R13: 0000000000000000 R14: ffff88807bea3100 R15: 0000000000000001 FS: 00007f10db393700(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000080 CR3: 000000007bd0f004 CR4: 00000000003706f0 Call Trace: fib_create_info+0x64d/0xaf7 fib_table_insert+0xf6/0x581 ? __vma_adjust+0x3b6/0x4d4 inet_rtm_newroute+0x56/0x70 rtnetlink_rcv_msg+0x1e3/0x20d ? rtnl_calcit.isra.0+0xb8/0xb8 netlink_rcv_skb+0x5b/0xac netlink_unicast+0xfa/0x17b netlink_sendmsg+0x334/0x353 sock_sendmsg_nosec+0xf/0x3f ____sys_sendmsg+0x1a0/0x1fc ? copy_msghdr_from_user+0x4c/0x61 ___sys_sendmsg+0x63/0x84 ? handle_mm_fault+0xa39/0x11b5 ? sockfd_lookup_light+0x72/0x9a __sys_sendmsg+0x50/0x6e do_syscall_64+0x54/0xbe entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f10dacc0bb7 Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb cd 66 0f 1f 44 00 00 8b 05 9a 4b 2b 00 85 c0 75 2e 48 63 ff 48 63 d2 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 b1 f2 2a 00 f7 d8 64 89 02 48 RSP: 002b:00007ffcbe628bf8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007ffcbe628f80 RCX: 00007f10dacc0bb7 RDX: 0000000000000000 RSI: 00007ffcbe628c60 RDI: 0000000000000003 RBP: 000000005f41099c R08: 0000000000000001 R09: 0000000000000008 R10: 00000000000005e9 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 00007ffcbe628d70 R15: 0000563a86c6e440 Modules linked in: CR2: 0000000000000080 CC: David Ahern <dsahern@gmail.com> Fixes: 430a049190de ("nexthop: Add support for nexthop groups") Reported-by: syzbot+a61aa19b0c14c8770bd9@syzkaller.appspotmail.com Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-27net: Fix potential wrong skb->protocol in skb_vlan_untag()Miaohe Lin1-2/+2
[ Upstream commit 55eff0eb7460c3d50716ed9eccf22257b046ca92 ] We may access the two bytes after vlan_hdr in vlan_set_encap_proto(). So we should pull VLAN_HLEN + sizeof(unsigned short) in skb_vlan_untag() or we may access the wrong data. Fixes: 0d5501c1c828 ("net: Always untag vlan-tagged traffic on input.") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-27gre6: Fix reception with IP6_TNL_F_RCV_DSCP_COPYMark Tomlinson1-1/+9
[ Upstream commit 272502fcb7cda01ab07fc2fcff82d1d2f73d43cc ] When receiving an IPv4 packet inside an IPv6 GRE packet, and the IP6_TNL_F_RCV_DSCP_COPY flag is set on the tunnel, the IPv4 header would get corrupted. This is due to the common ip6_tnl_rcv() function assuming that the inner header is always IPv6. This patch checks the tunnel protocol for IPv4 inner packets, but still defaults to IPv6. Fixes: 308edfdf1563 ("gre6: Cleanup GREv6 receive path, call common GRE functions") Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-26Linux 5.8.4v5.8.4Greg Kroah-Hartman1-1/+1
Tested-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-26Revert "drm/amd/display: Improve DisplayPort monitor interop"Alex Deucher3-23/+8
This reverts commit 1adb2ff1f6b170cdbc3925a359c8f39d2215dc20. This breaks display wake up in stable kernels (5.7.x and 5.8.x). Note that there is no upstream equivalent to this revert. This patch was targeted for stable by Sasha's stable patch process. Presumably there are some other changes necessary for this patch to work properly on stable kernels. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1266 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 5.7.x, 5.8.x Cc: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-26KVM: arm64: Only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is not setWill Deacon1-4/+13
commit b5331379bc62611d1026173a09c73573384201d9 upstream. When an MMU notifier call results in unmapping a range that spans multiple PGDs, we end up calling into cond_resched_lock() when crossing a PGD boundary, since this avoids running into RCU stalls during VM teardown. Unfortunately, if the VM is destroyed as a result of OOM, then blocking is not permitted and the call to the scheduler triggers the following BUG(): | BUG: sleeping function called from invalid context at arch/arm64/kvm/mmu.c:394 | in_atomic(): 1, irqs_disabled(): 0, non_block: 1, pid: 36, name: oom_reaper | INFO: lockdep is turned off. | CPU: 3 PID: 36 Comm: oom_reaper Not tainted 5.8.0 #1 | Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 | Call trace: | dump_backtrace+0x0/0x284 | show_stack+0x1c/0x28 | dump_stack+0xf0/0x1a4 | ___might_sleep+0x2bc/0x2cc | unmap_stage2_range+0x160/0x1ac | kvm_unmap_hva_range+0x1a0/0x1c8 | kvm_mmu_notifier_invalidate_range_start+0x8c/0xf8 | __mmu_notifier_invalidate_range_start+0x218/0x31c | mmu_notifier_invalidate_range_start_nonblock+0x78/0xb0 | __oom_reap_task_mm+0x128/0x268 | oom_reap_task+0xac/0x298 | oom_reaper+0x178/0x17c | kthread+0x1e4/0x1fc | ret_from_fork+0x10/0x30 Use the new 'flags' argument to kvm_unmap_hva_range() to ensure that we only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is set in the notifier flags. Cc: <stable@vger.kernel.org> Fixes: 8b3405e345b5 ("kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd") Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Message-Id: <20200811102725.7121-3-will@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-26KVM: Pass MMU notifier range flags to kvm_unmap_hva_range()Will Deacon10-10/+17
commit fdfe7cbd58806522e799e2a50a15aee7f2cbb7b6 upstream. The 'flags' field of 'struct mmu_notifier_range' is used to indicate whether invalidate_range_{start,end}() are permitted to block. In the case of kvm_mmu_notifier_invalidate_range_start(), this field is not forwarded on to the architecture-specific implementation of kvm_unmap_hva_range() and therefore the backend cannot sensibly decide whether or not to block. Add an extra 'flags' parameter to kvm_unmap_hva_range() so that architectures are aware as to whether or not they are permitted to block. Cc: <stable@vger.kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Message-Id: <20200811102725.7121-2-will@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-26do_epoll_ctl(): clean the failure exits up a bitAl Viro1-13/+6
commit 52c479697c9b73f628140dcdfcd39ea302d05482 upstream. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-26efi/libstub: Handle unterminated cmdlineArvind Sankar1-2/+4
commit 8a8a3237a78cbc0557f0eb16a89f16d616323e99 upstream. Make the command line parsing more robust, by handling the case it is not NUL-terminated. Use strnlen instead of strlen, and make sure that the temporary copy is NUL-terminated before parsing. Cc: <stable@vger.kernel.org> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Link: https://lore.kernel.org/r/20200813185811.554051-4-nivedita@alum.mit.edu Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-26efi/libstub: Handle NULL cmdlineArvind Sankar1-1/+5
commit a37ca6a2af9df2972372b918f09390c9303acfbd upstream. Treat a NULL cmdline the same as empty. Although this is unlikely to happen in practice, the x86 kernel entry does check for NULL cmdline and handles it, so do it here as well. Cc: <stable@vger.kernel.org> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Link: https://lore.kernel.org/r/20200729193300.598448-1-nivedita@alum.mit.edu Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-26efi/libstub: Stop parsing arguments at "--"Arvind Sankar1-0/+2
commit 1fd9717d75df68e3c3509b8e7b1138ca63472f88 upstream. Arguments after "--" are arguments for init, not for the kernel. Cc: <stable@vger.kernel.org> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Link: https://lore.kernel.org/r/20200725155916.1376773-1-nivedita@alum.mit.edu Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-26efi: add missed destroy_workqueue when efisubsys_init failsLi Heng1-0/+2
commit 98086df8b70c06234a8f4290c46064e44dafa0ed upstream. destroy_workqueue() should be called to destroy efi_rts_wq when efisubsys_init() init resources fails. Cc: <stable@vger.kernel.org> Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Li Heng <liheng40@huawei.com> Link: https://lore.kernel.org/r/1595229738-10087-1-git-send-email-liheng40@huawei.com Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-26efi/x86: Mark kernel rodata non-executable for mixed modeArvind Sankar1-0/+2
commit c8502eb2d43b6b9b1dc382299a4d37031be63876 upstream. When remapping the kernel rodata section RO in the EFI pagetables, the protection flags that were used for the text section are being reused, but the rodata section should not be marked executable. Cc: <stable@vger.kernel.org> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Link: https://lore.kernel.org/r/20200717194526.3452089-1-nivedita@alum.mit.edu Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-26EDAC/{i7core,sb,pnd2,skx}: Fix error event severityTony Luck4-7/+7
commit 45bc6098a3e279d8e391d22428396687562797e2 upstream. IA32_MCG_STATUS.RIPV indicates whether the return RIP value pushed onto the stack as part of machine check delivery is valid or not. Various drivers copied a code fragment that uses the RIPV bit to determine the severity of the error as either HW_EVENT_ERR_UNCORRECTED or HW_EVENT_ERR_FATAL, but this check is reversed (marking errors where RIPV is set as "FATAL"). Reverse the tests so that the error is marked fatal when RIPV is not set. Reported-by: Gabriele Paoloni <gabriele.paoloni@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200707194324.14884-1-tony.luck@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-26powerpc/pseries: Do not initiate shutdown when system is running on UPSVasant Hegde1-1/+0
commit 90a9b102eddf6a3f987d15f4454e26a2532c1c98 upstream. As per PAPR we have to look for both EPOW sensor value and event modifier to identify the type of event and take appropriate action. In LoPAPR v1.1 section 10.2.2 includes table 136 "EPOW Action Codes": SYSTEM_SHUTDOWN 3 The system must be shut down. An EPOW-aware OS logs the EPOW error log information, then schedules the system to be shut down to begin after an OS defined delay internal (default is 10 minutes.) Then in section 10.3.2.2.8 there is table 146 "Platform Event Log Format, Version 6, EPOW Section", which includes the "EPOW Event Modifier": For EPOW sensor value = 3 0x01 = Normal system shutdown with no additional delay 0x02 = Loss of utility power, system is running on UPS/Battery 0x03 = Loss of system critical functions, system should be shutdown 0x04 = Ambient temperature too high All other values = reserved We have a user space tool (rtas_errd) on LPAR to monitor for EPOW_SHUTDOWN_ON_UPS. Once it gets an event it initiates shutdown after predefined time. It also starts monitoring for any new EPOW events. If it receives "Power restored" event before predefined time it will cancel the shutdown. Otherwise after predefined time it will shutdown the system. Commit 79872e35469b ("powerpc/pseries: All events of EPOW_SYSTEM_SHUTDOWN must initiate shutdown") changed our handling of the "on UPS/Battery" case, to immediately shutdown the system. This breaks existing setups that rely on the userspace tool to delay shutdown and let the system run on the UPS. Fixes: 79872e35469b ("powerpc/pseries: All events of EPOW_SYSTEM_SHUTDOWN must initiate shutdown") Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> [mpe: Massage change log and add PAPR references] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200820061844.306460-1-hegdevasant@linux.vnet.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-26powerpc: Fix P10 PVR revision in /proc/cpuinfo for SMT4 coresMichael Neuling1-0/+1
commit 030a2c689fb46e1690f7ded8b194bab7678efb28 upstream. On POWER10 bit 12 in the PVR indicates if the core is SMT4 or SMT8. Bit 12 is set for SMT4. Without this patch, /proc/cpuinfo on a SMT4 DD1 POWER10 looks like this: cpu : POWER10, altivec supported revision : 17.0 (pvr 0080 1100) Fixes: a3ea40d5c736 ("powerpc: Add POWER10 architected mode") Cc: stable@vger.kernel.org # v5.8 Signed-off-by: Michael Neuling <mikey@neuling.org> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200803035600.1820371-1-mikey@neuling.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-26epoll: Keep a reference on files added to the check listMarc Zyngier1-2/+9
commit a9ed4a6560b8562b7e2e2bed9527e88001f7b682 upstream. When adding a new fd to an epoll, and that this new fd is an epoll fd itself, we recursively scan the fds attached to it to detect cycles, and add non-epool files to a "check list" that gets subsequently parsed. However, this check list isn't completely safe when deletions can happen concurrently. To sidestep the issue, make sure that a struct file placed on the check list sees its f_count increased, ensuring that a concurrent deletion won't result in the file disapearing from under our feet. Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-26net: dsa: b53: check for timeoutTom Rix1-0/+2
[ Upstream commit 774d977abfd024e6f73484544b9abe5a5cd62de7 ] clang static analysis reports this problem b53_common.c:1583:13: warning: The left expression of the compound assignment is an uninitialized value. The computed value will also be garbage ent.port &= ~BIT(port); ~~~~~~~~ ^ ent is set by a successful call to b53_arl_read(). Unsuccessful calls are caught by an switch statement handling specific returns. b32_arl_read() calls b53_arl_op_wait() which fails with the unhandled -ETIMEDOUT. So add -ETIMEDOUT to the switch statement. Because b53_arl_op_wait() already prints out a message, do not add another one. Fixes: 1da6df85c6fb ("net: dsa: b53: Implement ARL add/del/dump operations") Signed-off-by: Tom Rix <trix@redhat.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26hv_netvsc: Fix the queue_mapping in netvsc_vf_xmit()Haiyang Zhang1-1/+1
[ Upstream commit c3d897e01aef8ddc43149e4d661b86f823e3aae7 ] netvsc_vf_xmit() / dev_queue_xmit() will call VF NIC’s ndo_select_queue or netdev_pick_tx() again. They will use skb_get_rx_queue() to get the queue number, so the “skb->queue_mapping - 1” will be used. This may cause the last queue of VF not been used. Use skb_record_rx_queue() here, so that the skb_get_rx_queue() called later will get the correct queue number, and VF will be able to use all queues. Fixes: b3bf5666a510 ("hv_netvsc: defer queue selection to VF") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26selftests/bpf: Remove test_align leftoversVeronika Kabatova2-2/+1
[ Upstream commit 5597432dde62befd3ab92e6ef9e073564e277ea8 ] Calling generic selftests "make install" fails as rsync expects all files from TEST_GEN_PROGS to be present. The binary is not generated anymore (commit 3b09d27cc93d) so we can safely remove it from there and also from gitignore. Fixes: 3b09d27cc93d ("selftests/bpf: Move test_align under test_progs") Signed-off-by: Veronika Kabatova <vkabatov@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/20200819160710.1345956-1-vkabatov@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26net: gemini: Fix missing free_netdev() in error path of ↵Wang Hai1-3/+1
gemini_ethernet_port_probe() [ Upstream commit cf96d977381d4a23957bade2ddf1c420b74a26b6 ] Replace alloc_etherdev_mq with devm_alloc_etherdev_mqs. In this way, when probe fails, netdev can be freed automatically. Fixes: 4d5ae32f5e1e ("net: ethernet: Add a driver for Gemini gigabit ethernet") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26net: ena: Change WARN_ON expression in ena_del_napi_in_range()Shay Agroskin1-7/+4
[ Upstream commit 8b147f6f3e7de4e51113e3e9ec44aa2debc02c58 ] The ena_del_napi_in_range() function unregisters the napi handler for rings in a given range. This function had the following WARN_ON macro: WARN_ON(ENA_IS_XDP_INDEX(adapter, i) && adapter->ena_napi[i].xdp_ring); This macro prints the call stack if the expression inside of it is true [1], but the expression inside of it is the wanted situation. The expression checks whether the ring has an XDP queue and its index corresponds to a XDP one. This patch changes the expression to !ENA_IS_XDP_INDEX(adapter, i) && adapter->ena_napi[i].xdp_ring which indicates an unwanted situation. Also, change the structure of the function. The napi handler is unregistered for all rings, and so there's no need to check whether the index is an XDP index or not. By removing this check the code becomes much more readable. Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action") Signed-off-by: Shay Agroskin <shayagr@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26net: ena: Prevent reset after device destructionShay Agroskin1-9/+10
[ Upstream commit 63d4a4c145cca2e84dc6e62d2ef5cb990c9723c2 ] The reset work is scheduled by the timer routine whenever it detects that a device reset is required (e.g. when a keep_alive signal is missing). When releasing device resources in ena_destroy_device() the driver cancels the scheduling of the timer routine without destroying the reset work explicitly. This creates the following bug: The driver is suspended and the ena_suspend() function is called -> This function calls ena_destroy_device() to free the net device resources -> The driver waits for the timer routine to finish its execution and then cancels it, thus preventing from it to be called again. If, in its final execution, the timer routine schedules a reset, the reset routine might be called afterwards,and a redundant call to ena_restore_device() would be made. By changing the reset routine we allow it to read the device's state accurately. This is achieved by checking whether ENA_FLAG_TRIGGER_RESET flag is set before resetting the device and making both the destruction function and the flag check are under rtnl lock. The ENA_FLAG_TRIGGER_RESET is cleared at the end of the destruction routine. Also surround the flag check with 'likely' because we expect that the reset routine would be called only when ENA_FLAG_TRIGGER_RESET flag is set. The destruction of the timer and reset services in __ena_shutoff() have to stay, even though the timer routine is destroyed in ena_destroy_device(). This is to avoid a case in which the reset routine is scheduled after free_netdev() in __ena_shutoff(), which would create an access to freed memory in adapter->flags. Fixes: 8c5c7abdeb2d ("net: ena: add power management ops to the ENA driver") Signed-off-by: Shay Agroskin <shayagr@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26bonding: fix active-backup failover for current ARP slaveJiri Wiesner1-2/+16
[ Upstream commit 0410d07190961ac526f05085765a8d04d926545b ] When the ARP monitor is used for link detection, ARP replies are validated for all slaves (arp_validate=3) and fail_over_mac is set to active, two slaves of an active-backup bond may get stuck in a state where both of them are active and pass packets that they receive to the bond. This state makes IPv6 duplicate address detection fail. The state is reached thus: 1. The current active slave goes down because the ARP target is not reachable. 2. The current ARP slave is chosen and made active. 3. A new slave is enslaved. This new slave becomes the current active slave and can reach the ARP target. As a result, the current ARP slave stays active after the enslave action has finished and the log is littered with "PROBE BAD" messages: > bond0: PROBE: c_arp ens10 && cas ens11 BAD The workaround is to remove the slave with "going back" status from the bond and re-enslave it. This issue was encountered when DPDK PMD interfaces were being enslaved to an active-backup bond. I would be possible to fix the issue in bond_enslave() or bond_change_active_slave() but the ARP monitor was fixed instead to keep most of the actions changing the current ARP slave in the ARP monitor code. The current ARP slave is set as inactive and backup during the commit phase. A new state, BOND_LINK_FAIL, has been introduced for slaves in the context of the ARP monitor. This allows administrators to see how slaves are rotated for sending ARP requests and attempts are made to find a new active slave. Fixes: b2220cad583c9 ("bonding: refactor ARP active-backup monitor") Signed-off-by: Jiri Wiesner <jwiesner@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26powerpc/pseries/hotplug-cpu: wait indefinitely for vCPU deathMichael Roth1-6/+12
[ Upstream commit 801980f6497946048709b9b09771a1729551d705 ] For a power9 KVM guest with XIVE enabled, running a test loop where we hotplug 384 vcpus and then unplug them, the following traces can be seen (generally within a few loops) either from the unplugged vcpu: cpu 65 (hwid 65) Ready to die... Querying DEAD? cpu 66 (66) shows 2 list_del corruption. next->prev should be c00a000002470208, but was c00a000002470048 ------------[ cut here ]------------ kernel BUG at lib/list_debug.c:56! Oops: Exception in kernel mode, sig: 5 [#1] LE SMP NR_CPUS=2048 NUMA pSeries Modules linked in: fuse nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 ... CPU: 66 PID: 0 Comm: swapper/66 Kdump: loaded Not tainted 4.18.0-221.el8.ppc64le #1 NIP: c0000000007ab50c LR: c0000000007ab508 CTR: 00000000000003ac REGS: c0000009e5a17840 TRAP: 0700 Not tainted (4.18.0-221.el8.ppc64le) MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 28000842 XER: 20040000 ... NIP __list_del_entry_valid+0xac/0x100 LR __list_del_entry_valid+0xa8/0x100 Call Trace: __list_del_entry_valid+0xa8/0x100 (unreliable) free_pcppages_bulk+0x1f8/0x940 free_unref_page+0xd0/0x100 xive_spapr_cleanup_queue+0x148/0x1b0 xive_teardown_cpu+0x1bc/0x240 pseries_mach_cpu_die+0x78/0x2f0 cpu_die+0x48/0x70 arch_cpu_idle_dead+0x20/0x40 do_idle+0x2f4/0x4c0 cpu_startup_entry+0x38/0x40 start_secondary+0x7bc/0x8f0 start_secondary_prolog+0x10/0x14 or on the worker thread handling the unplug: pseries-hotplug-cpu: Attempting to remove CPU <NULL>, drc index: 1000013a Querying DEAD? cpu 314 (314) shows 2 BUG: Bad page state in process kworker/u768:3 pfn:95de1 cpu 314 (hwid 314) Ready to die... page:c00a000002577840 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 flags: 0x5ffffc00000000() raw: 005ffffc00000000 5deadbeef0000100 5deadbeef0000200 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffff7f 0000000000000000 page dumped because: nonzero mapcount Modules linked in: kvm xt_CHECKSUM ipt_MASQUERADE xt_conntrack ... CPU: 0 PID: 548 Comm: kworker/u768:3 Kdump: loaded Not tainted 4.18.0-224.el8.bz1856588.ppc64le #1 Workqueue: pseries hotplug workque pseries_hp_work_fn Call Trace: dump_stack+0xb0/0xf4 (unreliable) bad_page+0x12c/0x1b0 free_pcppages_bulk+0x5bc/0x940 page_alloc_cpu_dead+0x118/0x120 cpuhp_invoke_callback.constprop.5+0xb8/0x760 _cpu_down+0x188/0x340 cpu_down+0x5c/0xa0 cpu_subsys_offline+0x24/0x40 device_offline+0xf0/0x130 dlpar_offline_cpu+0x1c4/0x2a0 dlpar_cpu_remove+0xb8/0x190 dlpar_cpu_remove_by_index+0x12c/0x150 dlpar_cpu+0x94/0x800 pseries_hp_work_fn+0x128/0x1e0 process_one_work+0x304/0x5d0 worker_thread+0xcc/0x7a0 kthread+0x1ac/0x1c0 ret_from_kernel_thread+0x5c/0x80 The latter trace is due to the following sequence: page_alloc_cpu_dead drain_pages drain_pages_zone free_pcppages_bulk where drain_pages() in this case is called under the assumption that the unplugged cpu is no longer executing. To ensure that is the case, and early call is made to __cpu_die()->pseries_cpu_die(), which runs a loop that waits for the cpu to reach a halted state by polling its status via query-cpu-stopped-state RTAS calls. It only polls for 25 iterations before giving up, however, and in the trace above this results in the following being printed only .1 seconds after the hotplug worker thread begins processing the unplug request: pseries-hotplug-cpu: Attempting to remove CPU <NULL>, drc index: 1000013a Querying DEAD? cpu 314 (314) shows 2 At that point the worker thread assumes the unplugged CPU is in some unknown/dead state and procedes with the cleanup, causing the race with the XIVE cleanup code executed by the unplugged CPU. Fix this by waiting indefinitely, but also making an effort to avoid spurious lockup messages by allowing for rescheduling after polling the CPU status and printing a warning if we wait for longer than 120s. Fixes: eac1e731b59ee ("powerpc/xive: guest exploitation of the XIVE interrupt controller") Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Tested-by: Greg Kurz <groug@kaod.org> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org> [mpe: Trim oopses in change log slightly for readability] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200811161544.10513-1-mdroth@linux.vnet.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26bpf: Use get_file_rcu() instead of get_file() for task_file iteratorYonghong Song1-1/+2
[ Upstream commit cf28f3bbfca097d956f9021cb710dfad56adcc62 ] With latest `bpftool prog` command, we observed the following kernel panic. BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor instruction fetch in kernel mode #PF: error_code(0x0010) - not-present page PGD dfe894067 P4D dfe894067 PUD deb663067 PMD 0 Oops: 0010 [#1] SMP CPU: 9 PID: 6023 ... RIP: 0010:0x0 Code: Bad RIP value. RSP: 0000:ffffc900002b8f18 EFLAGS: 00010286 RAX: ffff8883a405f400 RBX: ffff888e46a6bf00 RCX: 000000008020000c RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8883a405f400 RBP: ffff888e46a6bf50 R08: 0000000000000000 R09: ffffffff81129600 R10: ffff8883a405f300 R11: 0000160000000000 R12: 0000000000002710 R13: 000000e9494b690c R14: 0000000000000202 R15: 0000000000000009 FS: 00007fd9187fe700(0000) GS:ffff888e46a40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 0000000de5d33002 CR4: 0000000000360ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> rcu_core+0x1a4/0x440 __do_softirq+0xd3/0x2c8 irq_exit+0x9d/0xa0 smp_apic_timer_interrupt+0x68/0x120 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0033:0x47ce80 Code: Bad RIP value. RSP: 002b:00007fd9187fba40 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13 RAX: 0000000000000002 RBX: 00007fd931789160 RCX: 000000000000010c RDX: 00007fd9308cdfb4 RSI: 00007fd9308cdfb4 RDI: 00007ffedd1ea0a8 RBP: 00007fd9187fbab0 R08: 000000000000000e R09: 000000000000002a R10: 0000000000480210 R11: 00007fd9187fc570 R12: 00007fd9316cc400 R13: 0000000000000118 R14: 00007fd9308cdfb4 R15: 00007fd9317a9380 After further analysis, the bug is triggered by Commit eaaacd23910f ("bpf: Add task and task/file iterator targets") which introduced task_file bpf iterator, which traverses all open file descriptors for all tasks in the current namespace. The latest `bpftool prog` calls a task_file bpf program to traverse all files in the system in order to associate processes with progs/maps, etc. When traversing files for a given task, rcu read_lock is taken to access all files in a file_struct. But it used get_file() to grab a file, which is not right. It is possible file->f_count is 0 and get_file() will unconditionally increase it. Later put_file() may cause all kind of issues with the above as one of sympotoms. The failure can be reproduced with the following steps in a few seconds: $ cat t.c #include <stdio.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <unistd.h> #define N 10000 int fd[N]; int main() { int i; for (i = 0; i < N; i++) { fd[i] = open("./note.txt", 'r'); if (fd[i] < 0) { fprintf(stderr, "failed\n"); return -1; } } for (i = 0; i < N; i++) close(fd[i]); return 0; } $ gcc -O2 t.c $ cat run.sh #/bin/bash for i in {1..100} do while true; do ./a.out; done & done $ ./run.sh $ while true; do bpftool prog >& /dev/null; done This patch used get_file_rcu() which only grabs a file if the file->f_count is not zero. This is to ensure the file pointer is always valid. The above reproducer did not fail for more than 30 minutes. Fixes: eaaacd23910f ("bpf: Add task and task/file iterator targets") Suggested-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/bpf/20200817174214.252601-1-yhs@fb.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26powerpc/fixmap: Fix the size of the early debug areaChristophe Leroy1-1/+1
[ Upstream commit fdc6edbb31fba76fd25d7bd016b675a92908d81e ] Commit ("03fd42d458fb powerpc/fixmap: Fix FIX_EARLY_DEBUG_BASE when page size is 256k") reworked the setup of the early debug area and mistakenly replaced 128 * 1024 by SZ_128. Change to SZ_128K to restore the original 128 kbytes size of the area. Fixes: 03fd42d458fb ("powerpc/fixmap: Fix FIX_EARLY_DEBUG_BASE when page size is 256k") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/996184974d674ff984643778cf1cdd7fe58cc065.1597644194.git.christophe.leroy@csgroup.eu Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26ARM64: vdso32: Install vdso32 from vdso_installStephen Boyd2-1/+2
[ Upstream commit 8d75785a814241587802655cc33e384230744f0c ] Add the 32-bit vdso Makefile to the vdso_install rule so that 'make vdso_install' installs the 32-bit compat vdso when it is compiled. Fixes: a7f71a2c8903 ("arm64: compat: Add vDSO") Signed-off-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Acked-by: Will Deacon <will@kernel.org> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lore.kernel.org/r/20200818014950.42492-1-swboyd@chromium.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26afs: Fix NULL deref in afs_dynroot_depopulate()David Howells1-9/+11
[ Upstream commit 5e0b17b026eb7c6de9baa9b0d45a51b05f05abe1 ] If an error occurs during the construction of an afs superblock, it's possible that an error occurs after a superblock is created, but before we've created the root dentry. If the superblock has a dynamic root (ie. what's normally mounted on /afs), the afs_kill_super() will call afs_dynroot_depopulate() to unpin any created dentries - but this will oops if the root hasn't been created yet. Fix this by skipping that bit of code if there is no root dentry. This leads to an oops looking like: general protection fault, ... KASAN: null-ptr-deref in range [0x0000000000000068-0x000000000000006f] ... RIP: 0010:afs_dynroot_depopulate+0x25f/0x529 fs/afs/dynroot.c:385 ... Call Trace: afs_kill_super+0x13b/0x180 fs/afs/super.c:535 deactivate_locked_super+0x94/0x160 fs/super.c:335 afs_get_tree+0x1124/0x1460 fs/afs/super.c:598 vfs_get_tree+0x89/0x2f0 fs/super.c:1547 do_new_mount fs/namespace.c:2875 [inline] path_mount+0x1387/0x2070 fs/namespace.c:3192 do_mount fs/namespace.c:3205 [inline] __do_sys_mount fs/namespace.c:3413 [inline] __se_sys_mount fs/namespace.c:3390 [inline] __x64_sys_mount+0x27f/0x300 fs/namespace.c:3390 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 which is oopsing on this line: inode_lock(root->d_inode); presumably because sb->s_root was NULL. Fixes: 0da0b7fd73e4 ("afs: Display manually added cells in dynamic root mount") Reported-by: syzbot+c1eff8205244ae7e11a6@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26kconfig: qconf: remove qInfo() to get back Qt4 supportMasahiro Yamada1-2/+0
[ Upstream commit 53efe2e76ca2bfad7f35e0b5330e2ccd44a643e3 ] qconf is supposed to work with Qt4 and Qt5, but since commit c4f7398bee9c ("kconfig: qconf: make debug links work again"), building with Qt4 fails as follows: HOSTCXX scripts/kconfig/qconf.o scripts/kconfig/qconf.cc: In member function ‘void ConfigInfoView::clicked(const QUrl&)’: scripts/kconfig/qconf.cc:1241:3: error: ‘qInfo’ was not declared in this scope; did you mean ‘setInfo’? 1241 | qInfo() << "Clicked link is empty"; | ^~~~~ | setInfo scripts/kconfig/qconf.cc:1254:3: error: ‘qInfo’ was not declared in this scope; did you mean ‘setInfo’? 1254 | qInfo() << "Clicked symbol is invalid:" << data; | ^~~~~ | setInfo make[1]: *** [scripts/Makefile.host:129: scripts/kconfig/qconf.o] Error 1 make: *** [Makefile:606: xconfig] Error 2 qInfo() does not exist in Qt4. In my understanding, these call-sites should be unreachable. Perhaps, qWarning(), assertion, or something is better, but qInfo() is not the right one to use here, I think. Fixes: c4f7398bee9c ("kconfig: qconf: make debug links work again") Reported-by: Ronald Warsow <rwarsow@gmx.de> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26afs: Fix key ref leak in afs_put_operation()David Howells1-0/+1
[ Upstream commit ba8e42077bbe046a09bdb965dbfbf8c27594fe8f ] The afs_put_operation() function needs to put the reference to the key that's authenticating the operation. Fixes: e49c7b2f6de7 ("afs: Build an abstraction around an "operation" concept") Reported-by: Dave Botsch <botsch@cnf.cornell.edu> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26Revert "RDMA/hns: Reserve one sge in order to avoid local length error"Weihang Li5-14/+8
[ Upstream commit 6da06c6291f38be4df6df2efb76ba925096d2691 ] This patch caused some issues on SEND operation, and it should be reverted to make the drivers work correctly. There will be a better solution that has been tested carefully to solve the original problem. This reverts commit 711195e57d341e58133d92cf8aaab1db24e4768d. Fixes: 711195e57d34 ("RDMA/hns: Reserve one sge in order to avoid local length error") Link: https://lore.kernel.org/r/1597829984-20223-1-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26RDMA/bnxt_re: Do not add user qps to flushlistSelvin Xavier1-1/+2
[ Upstream commit a812f2d60a9fb7818f9c81f967180317b52545c0 ] Driver shall add only the kernel qps to the flush list for clean up. During async error events from the HW, driver is adding qps to this list without checking if the qp is kernel qp or not. Add a check to avoid user qp addition to the flush list. Fixes: 942c9b6ca8de ("RDMA/bnxt_re: Avoid Hard lockup during error CQE processing") Fixes: c50866e2853a ("bnxt_re: fix the regression due to changes in alloc_pbl") Link: https://lore.kernel.org/r/1596689148-4023-1-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26Fix build error when CONFIG_ACPI is not set/enabled:Randy Dunlap1-0/+1
[ Upstream commit ee87e1557c42dc9c2da11c38e11b87c311569853 ] ../arch/x86/pci/xen.c: In function ‘pci_xen_init’: ../arch/x86/pci/xen.c:410:2: error: implicit declaration of function ‘acpi_noirq_set’; did you mean ‘acpi_irq_get’? [-Werror=implicit-function-declaration] acpi_noirq_set(); Fixes: 88e9ca161c13 ("xen/pci: Use acpi_noirq_set() helper to avoid #ifdef") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: xen-devel@lists.xenproject.org Cc: linux-pci@vger.kernel.org Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26efi: avoid error message when booting under XenJuergen Gross1-1/+1
[ Upstream commit 6163a985e50cb19d5bdf73f98e45b8af91a77658 ] efifb_probe() will issue an error message in case the kernel is booted as Xen dom0 from UEFI as EFI_MEMMAP won't be set in this case. Avoid that message by calling efi_mem_desc_lookup() only if EFI_MEMMAP is set. Fixes: 38ac0287b7f4 ("fbdev/efifb: Honour UEFI memory map attributes when mapping the FB") Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>