summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2022-06-10i3c: device: Add SETMWL/GETMWL and SETMRL/GETMRL supportOleksandr Shulzhenko1-0/+5
Add the GETMRL/SETMRL and GETMWL/SETMWL CCC commands functions. The functions are needed to configure maximum read/write length for the specific I3C device. Signed-off-by: Oleksandr Shulzhenko <oleksandr.shulzhenko.viktorovych@intel.com>
2022-06-09soc: aspeed: mctp: Add IOCTL to set own EIDIwona Winiarska1-1/+7
The driver doesn't implement MCTP logic, which means that it requires its clients to prepare valid MCTP over PCIe packets. Because we support MCTP stacks in both kernel and userspace, and since MCTP protocol is a part of userspace, we need userspace to update own EID information in kernel to provide it for in-kernel aspeed-mctp clients. Previously, own EID information was updated by userspace using SET_EID_INFO which now is deprecated and replaced by SET_EID_EXT_INFO to pass CPU EID and Domain ID mappings. To provide own EID information for in-kernel clients, add a dedicated IOCTL. Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
2022-06-09soc: aspeed: mctp: Mark GET_EID_INFO/SET_EID_INFO as deprecatedIwona Winiarska1-6/+8
ASPEED_MCTP_GET_EID_INFO and ASPEED_MCTP_SET_EID_INFO shouldn't be used in future projects - they are replaced by EXT_INFO variants. Update documentation for aspeed-mctp UAPI header to mark them as deprecated. Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
2022-06-01i3c: mctp: Add PECI-related functionality to I3C MCTP driverOleksandr Shulzhenko1-0/+12
- Handler for PECI messages have been added; - i3c_mctp_client and platform_device for PECI have been added. Signed-off-by: Oleksandr Shulzhenko <oleksandr.shulzhenko.viktorovych@intel.com>
2022-06-01i3c: mctp: Extend MCTP o. I3C driver with the TX/RX functionsOleksandr Shulzhenko1-0/+38
Extend the driver with the functionality necessary to send and receive MCTP over I3C packets by another component in kernel: - i3c_mctp_client concept needed to implement packet queue so that the driver can manage and classify packets for several clients respectively; - the function that provides MCTP target EID; - the functions to alloc and free mctp_i3c_packet; - the functions to send and receive MCTP packets over I3C. Signed-off-by: Oleksandr Shulzhenko <oleksandr.shulzhenko.viktorovych@intel.com>
2022-05-23peci: Extend peci client with Domain ID informationIwona Winiarska2-3/+5
Since PECI Target addressing has been changed to use Domain ID information, each peci_client instance should correspond with both CPU ID and Domain ID. Let's extend peci_client structure, sysfs API and intel-peci-client to use Domain ID. Make MFD device ID as a combination of peci adapter number, CPU ID and Domain ID. Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
2022-05-20i3c: master: Enable support for PECZbigniew Lukwinski2-0/+6
PEC (Packet Error Check) support enabled in I3C controller driver. API for I3C device driver to control (enable or disable) PEC support in lower layer driver added. Signed-off-by: Zbigniew Lukwinski <zbigniew.lukwinski@linux.intel.com>
2022-05-12Merge commit '49caedb668e476c100d727f2174724e0610a2b92' of ↵Sujoy Ray106-273/+1080
https://github.com/openbmc/linux into openbmc/dev-5.15-intel-bump_v5.15.36 Signed-off-by: Sujoy Ray <sujoy.ray@intel.com>
2022-04-29i2c: mux: Add mux deselect support on timeoutArun P. Mohanan1-0/+1
Add support to deselect the mux when there is a timeout. The mux idle_state settings will be configured on startup. In case of MCTP it is MUX_IDLE_DISCONNECT. But when there is a timeout, mux ends up in connected position and the devices behind the mux will appear under different muxes connected to the same bus. This change fix the same. Signed-off-by: Arun P. Mohanan <arun.p.m@linux.intel.com>
2022-04-28Merge tag 'v5.15.36' into dev-5.15Joel Stanley12-18/+69
This is the 5.15.36 stable release Signed-off-by: Joel Stanley <joel@jms.id.au>
2022-04-27netfilter: conntrack: avoid useless indirection during conntrack destructionFlorian Westphal2-2/+8
commit 6ae7989c9af0d98ab64196f4f4c6f6499454bd23 upstream. nf_ct_put() results in a usesless indirection: nf_ct_put -> nf_conntrack_put -> nf_conntrack_destroy -> rcu readlock + indirect call of ct_hooks->destroy(). There are two _put helpers: nf_ct_put and nf_conntrack_put. The latter is what should be used in code that MUST NOT cause a linker dependency on the conntrack module (e.g. calls from core network stack). Everyone else should call nf_ct_put() instead. A followup patch will convert a few nf_conntrack_put() calls to nf_ct_put(), in particular from modules that already have a conntrack dependency such as act_ct or even nf_conntrack itself. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27netfilter: conntrack: convert to refcount_t apiFlorian Westphal1-4/+4
commit 719774377622bc4025d2a74f551b5dc2158c6c30 upstream. Convert nf_conn reference counting from atomic_t to refcount_t based api. refcount_t api provides more runtime sanity checks and will warn on certain constructs, e.g. refcount_inc() on a zero reference count, which usually indicates use-after-free. For this reason template allocation is changed to init the refcount to 1, the subsequenct add operations are removed. Likewise, init_conntrack() is changed to set the initial refcount to 1 instead refcount_inc(). This is safe because the new entry is not (yet) visible to other cpus. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanupNico Pache1-0/+1
commit e4a38402c36e42df28eb1a5394be87e6571fb48a upstream. The pthread struct is allocated on PRIVATE|ANONYMOUS memory [1] which can be targeted by the oom reaper. This mapping is used to store the futex robust list head; the kernel does not keep a copy of the robust list and instead references a userspace address to maintain the robustness during a process death. A race can occur between exit_mm and the oom reaper that allows the oom reaper to free the memory of the futex robust list before the exit path has handled the futex death: CPU1 CPU2 -------------------------------------------------------------------- page_fault do_exit "signal" wake_oom_reaper oom_reaper oom_reap_task_mm (invalidates mm) exit_mm exit_mm_release futex_exit_release futex_cleanup exit_robust_list get_user (EFAULT- can't access memory) If the get_user EFAULT's, the kernel will be unable to recover the waiters on the robust_list, leaving userspace mutexes hung indefinitely. Delay the OOM reaper, allowing more time for the exit path to perform the futex cleanup. Reproducer: https://gitlab.com/jsavitz/oom_futex_reproducer Based on a patch by Michal Hocko. Link: https://elixir.bootlin.com/glibc/glibc-2.35/source/nptl/allocatestack.c#L370 [1] Link: https://lkml.kernel.org/r/20220414144042.677008-1-npache@redhat.com Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently") Signed-off-by: Joel Savitz <jsavitz@redhat.com> Signed-off-by: Nico Pache <npache@redhat.com> Co-developed-by: Joel Savitz <jsavitz@redhat.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Rafael Aquini <aquini@redhat.com> Cc: Waiman Long <longman@redhat.com> Cc: Herton R. Krzesinski <herton@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joel Savitz <jsavitz@redhat.com> Cc: Darren Hart <dvhart@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27mm, hugetlb: allow for "high" userspace addressesChristophe Leroy1-0/+8
commit 5f24d5a579d1eace79d505b148808a850b417d4c upstream. This is a fix for commit f6795053dac8 ("mm: mmap: Allow for "high" userspace addresses") for hugetlb. This patch adds support for "high" userspace addresses that are optionally supported on the system and have to be requested via a hint mechanism ("high" addr parameter to mmap). Architectures such as powerpc and x86 achieve this by making changes to their architectural versions of hugetlb_get_unmapped_area() function. However, arm64 uses the generic version of that function. So take into account arch_get_mmap_base() and arch_get_mmap_end() in hugetlb_get_unmapped_area(). To allow that, move those two macros out of mm/mmap.c into include/linux/sched/mm.h If these macros are not defined in architectural code then they default to (TASK_SIZE) and (base) so should not introduce any behavioural changes to architectures that do not define them. For the time being, only ARM64 is affected by this change. Catalin (ARM64) said "We should have fixed hugetlb_get_unmapped_area() as well when we added support for 52-bit VA. The reason for commit f6795053dac8 was to prevent normal mmap() from returning addresses above 48-bit by default as some user-space had hard assumptions about this. It's a slight ABI change if you do this for hugetlb_get_unmapped_area() but I doubt anyone would notice. It's more likely that the current behaviour would cause issues, so I'd rather have them consistent. Basically when arm64 gained support for 52-bit addresses we did not want user-space calling mmap() to suddenly get such high addresses, otherwise we could have inadvertently broken some programs (similar behaviour to x86 here). Hence we added commit f6795053dac8. But we missed hugetlbfs which could still get such high mmap() addresses. So in theory that's a potential regression that should have bee addressed at the same time as commit f6795053dac8 (and before arm64 enabled 52-bit addresses)" Link: https://lkml.kernel.org/r/ab847b6edb197bffdfe189e70fb4ac76bfe79e0d.1650033747.git.christophe.leroy@csgroup.eu Fixes: f6795053dac8 ("mm: mmap: Allow for "high" userspace addresses") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Steve Capper <steve.capper@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: <stable@vger.kernel.org> [5.0.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27memcg: sync flush only if periodic flush is delayedShakeel Butt1-0/+5
commit 9b3016154c913b2e7ec5ae5c9a42eb9e732d86aa upstream. Daniel Dao has reported [1] a regression on workloads that may trigger a lot of refaults (anon and file). The underlying issue is that flushing rstat is expensive. Although rstat flush are batched with (nr_cpus * MEMCG_BATCH) stat updates, it seems like there are workloads which genuinely do stat updates larger than batch value within short amount of time. Since the rstat flush can happen in the performance critical codepaths like page faults, such workload can suffer greatly. This patch fixes this regression by making the rstat flushing conditional in the performance critical codepaths. More specifically, the kernel relies on the async periodic rstat flusher to flush the stats and only if the periodic flusher is delayed by more than twice the amount of its normal time window then the kernel allows rstat flushing from the performance critical codepaths. Now the question: what are the side-effects of this change? The worst that can happen is the refault codepath will see 4sec old lruvec stats and may cause false (or missed) activations of the refaulted page which may under-or-overestimate the workingset size. Though that is not very concerning as the kernel can already miss or do false activations. There are two more codepaths whose flushing behavior is not changed by this patch and we may need to come to them in future. One is the writeback stats used by dirty throttling and second is the deactivation heuristic in the reclaim. For now keeping an eye on them and if there is report of regression due to these codepaths, we will reevaluate then. Link: https://lore.kernel.org/all/CA+wXwBSyO87ZX5PVwdHm-=dBjZYECGmfnydUicUyrQqndgX2MQ@mail.gmail.com [1] Link: https://lkml.kernel.org/r/20220304184040.1304781-1-shakeelb@google.com Fixes: 1f828223b799 ("memcg: flush lruvec stats in the refault") Signed-off-by: Shakeel Butt <shakeelb@google.com> Reported-by: Daniel Dao <dqminh@cloudflare.com> Tested-by: Ivan Babrou <ivan@cloudflare.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Koutný <mkoutny@suse.com> Cc: Frank Hofmann <fhofmann@cloudflare.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27scsi: iscsi: Fix NOP handling during conn recoveryMike Christie1-1/+1
[ Upstream commit 44ac97109e42f87b1a34954704b81b6c8eca80c4 ] If a offload driver doesn't use the xmit workqueue, then when we are doing ep_disconnect libiscsi can still inject PDUs to the driver. This adds a check for if the connection is bound before trying to inject PDUs. Link: https://lore.kernel.org/r/20220408001314.5014-9-michael.christie@oracle.com Tested-by: Manish Rangankar <mrangankar@marvell.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Chris Leech <cleech@redhat.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-27scsi: iscsi: Merge suspend fieldsMike Christie1-4/+5
[ Upstream commit 5bd856256f8c03e329f8ff36d8c8efcb111fe6df ] Move the tx and rx suspend fields into one flags field. Link: https://lore.kernel.org/r/20220408001314.5014-8-michael.christie@oracle.com Tested-by: Manish Rangankar <mrangankar@marvell.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Chris Leech <cleech@redhat.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-27scsi: iscsi: Release endpoint ID when its freedMike Christie1-1/+1
[ Upstream commit 3c6ae371b8a1ffba1fc415989fd581ebf841ed0a ] We can't release the endpoint ID until all references to the endpoint have been dropped or it could be allocated while in use. This has us use an idr instead of looping over all conns to find a free ID and then free the ID when all references have been dropped instead of when the device is only deleted. Link: https://lore.kernel.org/r/20220408001314.5014-4-michael.christie@oracle.com Tested-by: Manish Rangankar <mrangankar@marvell.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Chris Leech <cleech@redhat.com> Reviewed-by: Wu Bo <wubo40@huawei.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-27ipv6: make ip6_rt_gc_expire an atomic_tEric Dumazet1-2/+2
[ Upstream commit 9cb7c013420f98fa6fd12fc6a5dc055170c108db ] Reads and Writes to ip6_rt_gc_expire always have been racy, as syzbot reported lately [1] There is a possible risk of under-flow, leading to unexpected high value passed to fib6_run_gc(), although I have not observed this in the field. Hosts hitting ip6_dst_gc() very hard are under pretty bad state anyway. [1] BUG: KCSAN: data-race in ip6_dst_gc / ip6_dst_gc read-write to 0xffff888102110744 of 4 bytes by task 13165 on cpu 1: ip6_dst_gc+0x1f3/0x220 net/ipv6/route.c:3311 dst_alloc+0x9b/0x160 net/core/dst.c:86 ip6_dst_alloc net/ipv6/route.c:344 [inline] icmp6_dst_alloc+0xb2/0x360 net/ipv6/route.c:3261 mld_sendpack+0x2b9/0x580 net/ipv6/mcast.c:1807 mld_send_cr net/ipv6/mcast.c:2119 [inline] mld_ifc_work+0x576/0x800 net/ipv6/mcast.c:2651 process_one_work+0x3d3/0x720 kernel/workqueue.c:2289 worker_thread+0x618/0xa70 kernel/workqueue.c:2436 kthread+0x1a9/0x1e0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 read-write to 0xffff888102110744 of 4 bytes by task 11607 on cpu 0: ip6_dst_gc+0x1f3/0x220 net/ipv6/route.c:3311 dst_alloc+0x9b/0x160 net/core/dst.c:86 ip6_dst_alloc net/ipv6/route.c:344 [inline] icmp6_dst_alloc+0xb2/0x360 net/ipv6/route.c:3261 mld_sendpack+0x2b9/0x580 net/ipv6/mcast.c:1807 mld_send_cr net/ipv6/mcast.c:2119 [inline] mld_ifc_work+0x576/0x800 net/ipv6/mcast.c:2651 process_one_work+0x3d3/0x720 kernel/workqueue.c:2289 worker_thread+0x618/0xa70 kernel/workqueue.c:2436 kthread+0x1a9/0x1e0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 value changed: 0x00000bb3 -> 0x00000ba9 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 11607 Comm: kworker/0:21 Not tainted 5.18.0-rc1-syzkaller-00037-g42e7a03d3bad-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: mld mld_ifc_work Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220413181333.649424-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-27esp: limit skb_page_frag_refill use to a single pageSabrina Dubroca1-2/+0
[ Upstream commit 5bd8baab087dff657e05387aee802e70304cc813 ] Commit ebe48d368e97 ("esp: Fix possible buffer overflow in ESP transformation") tried to fix skb_page_frag_refill usage in ESP by capping allocsize to 32k, but that doesn't completely solve the issue, as skb_page_frag_refill may return a single page. If that happens, we will write out of bounds, despite the check introduced in the previous patch. This patch forces COW in cases where we would end up calling skb_page_frag_refill with a size larger than a page (first in esp_output_head with tailen, then in esp_output_tail with skb->data_len). Fixes: cac2661c53f3 ("esp4: Avoid skb_cow_data whenever possible") Fixes: 03e2a30f6a27 ("esp6: Avoid skb_cow_data whenever possible") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-27mm, kfence: support kmem_dump_obj() for KFENCE objectsMarco Elver1-0/+24
commit 2dfe63e61cc31ee59ce951672b0850b5229cd5b0 upstream. Calling kmem_obj_info() via kmem_dump_obj() on KFENCE objects has been producing garbage data due to the object not actually being maintained by SLAB or SLUB. Fix this by implementing __kfence_obj_info() that copies relevant information to struct kmem_obj_info when the object was allocated by KFENCE; this is called by a common kmem_obj_info(), which also calls the slab/slub/slob specific variant now called __kmem_obj_info(). For completeness, kmem_dump_obj() now displays if the object was allocated by KFENCE. Link: https://lore.kernel.org/all/20220323090520.GG16885@xsang-OptiPlex-9020/ Link: https://lkml.kernel.org/r/20220406131558.3558585-1-elver@google.com Fixes: b89fb5ef0ce6 ("mm, kfence: insert KFENCE hooks for SLUB") Fixes: d3fb45f370d9 ("mm, kfence: insert KFENCE hooks for SLAB") Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reported-by: kernel test robot <oliver.sang@intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> [slab] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27etherdevice: Adjust ether_addr* prototypes to silence -Wstringop-overeadKees Cook1-3/+2
commit 2618a0dae09ef37728dab89ff60418cbe25ae6bd upstream. With GCC 12, -Wstringop-overread was warning about an implicit cast from char[6] to char[8]. However, the extra 2 bytes are always thrown away, alignment doesn't matter, and the risk of hitting the edge of unallocated memory has been accepted, so this prototype can just be converted to a regular char *. Silences: net/core/dev.c: In function ‘bpf_prog_run_generic_xdp’: net/core/dev.c:4618:21: warning: ‘ether_addr_equal_64bits’ reading 8 bytes from a region of size 6 [-Wstringop-overread] 4618 | orig_host = ether_addr_equal_64bits(eth->h_dest, > skb->dev->dev_addr); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ net/core/dev.c:4618:21: note: referencing argument 1 of type ‘const u8[8]’ {aka ‘const unsigned char[8]’} net/core/dev.c:4618:21: note: referencing argument 2 of type ‘const u8[8]’ {aka ‘const unsigned char[8]’} In file included from net/core/dev.c:91: include/linux/etherdevice.h:375:20: note: in a call to function ‘ether_addr_equal_64bits’ 375 | static inline bool ether_addr_equal_64bits(const u8 addr1[6+2], | ^~~~~~~~~~~~~~~~~~~~~~~ Reported-by: Marc Kleine-Budde <mkl@pengutronix.de> Tested-by: Marc Kleine-Budde <mkl@pengutronix.de> Link: https://lore.kernel.org/netdev/20220212090811.uuzk6d76agw2vv73@pengutronix.de Cc: Jakub Kicinski <kuba@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Khem Raj <raj.khem@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27block: simplify the block device syncing codeChristoph Hellwig1-0/+4
[ Upstream commit 1e03a36bdff4709c1bbf0f57f60ae3f776d51adf ] Get rid of the indirections and just provide a sync_bdevs helper for the generic sync code. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211019062530.2174626-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-27block: remove __sync_blockdevChristoph Hellwig1-0/+5
[ Upstream commit 70164eb6ccb76ab679b016b4b60123bf4ec6c162 ] Instead offer a new sync_blockdev_nowait helper for the !wait case. This new helper is exported as it will grow modular callers in a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211019062530.2174626-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-27Merge tag 'v5.15.35' into dev-5.15Joel Stanley9-6/+69
This is the 5.15.35 stable release Signed-off-by: Joel Stanley <joel@jms.id.au>
2022-04-20ax25: fix reference count leaks of ax25_devDuoming Zhou1-3/+5
commit 87563a043cef044fed5db7967a75741cc16ad2b1 upstream. The previous commit d01ffb9eee4a ("ax25: add refcount in ax25_dev to avoid UAF bugs") introduces refcount into ax25_dev, but there are reference leak paths in ax25_ctl_ioctl(), ax25_fwd_ioctl(), ax25_rt_add(), ax25_rt_del() and ax25_rt_opt(). This patch uses ax25_dev_put() and adjusts the position of ax25_addr_ax25dev() to fix reference cout leaks of ax25_dev. Fixes: d01ffb9eee4a ("ax25: add refcount in ax25_dev to avoid UAF bugs") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20220203150811.42256-1-duoming@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org> [OP: backport to 5.15: adjust context] Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20ax25: add refcount in ax25_dev to avoid UAF bugsDuoming Zhou1-0/+10
commit d01ffb9eee4af165d83b08dd73ebdf9fe94a519b upstream. If we dereference ax25_dev after we call kfree(ax25_dev) in ax25_dev_device_down(), it will lead to concurrency UAF bugs. There are eight syscall functions suffer from UAF bugs, include ax25_bind(), ax25_release(), ax25_connect(), ax25_ioctl(), ax25_getname(), ax25_sendmsg(), ax25_getsockopt() and ax25_info_show(). One of the concurrency UAF can be shown as below: (USE) | (FREE) | ax25_device_event | ax25_dev_device_down ax25_bind | ... ... | kfree(ax25_dev) ax25_fillin_cb() | ... ax25_fillin_cb_from_dev() | ... | The root cause of UAF bugs is that kfree(ax25_dev) in ax25_dev_device_down() is not protected by any locks. When ax25_dev, which there are still pointers point to, is released, the concurrency UAF bug will happen. This patch introduces refcount into ax25_dev in order to guarantee that there are no pointers point to it when ax25_dev is released. Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net> [OP: backport to 5.15: adjusted context] Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20SUNRPC: Fix NFSD's request deferral on RDMA transportsChuck Lever1-0/+1
commit 773f91b2cf3f52df0d7508fdbf60f37567cdaee4 upstream. Trond Myklebust reports an NFSD crash in svc_rdma_sendto(). Further investigation shows that the crash occurred while NFSD was handling a deferred request. This patch addresses two inter-related issues that prevent request deferral from working correctly for RPC/RDMA requests: 1. Prevent the crash by ensuring that the original svc_rqst::rq_xprt_ctxt value is available when the request is revisited. Otherwise svc_rdma_sendto() does not have a Receive context available with which to construct its reply. 2. Possibly since before commit 71641d99ce03 ("svcrdma: Properly compute .len and .buflen for received RPC Calls"), svc_rdma_recvfrom() did not include the transport header in the returned xdr_buf. There should have been no need for svc_defer() and friends to save and restore that header, as of that commit. This issue is addressed in a backport-friendly way by simply having svc_rdma_recvfrom() set rq_xprt_hlen to zero unconditionally, just as svc_tcp_recvfrom() does. This enables svc_deferred_recv() to correctly reconstruct an RPC message received via RPC/RDMA. Reported-by: Trond Myklebust <trondmy@hammerspace.com> Link: https://lore.kernel.org/linux-nfs/82662b7190f26fb304eb0ab1bb04279072439d4e.camel@hammerspace.com/ Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20tlb: hugetlb: Add more sizes to tlb_remove_huge_tlb_entrySteve Capper1-3/+7
[ Upstream commit 697a1d44af8ba0477ee729e632f4ade37999249a ] tlb_remove_huge_tlb_entry only considers PMD_SIZE and PUD_SIZE when updating the mmu_gather structure. Unfortunately on arm64 there are two additional huge page sizes that need to be covered: CONT_PTE_SIZE and CONT_PMD_SIZE. Where an end-user attempts to employ contiguous huge pages, a VM_BUG_ON can be experienced due to the fact that the tlb structure hasn't been correctly updated by the relevant tlb_flush_p.._range() call from tlb_remove_huge_tlb_entry. This patch adds inequality logic to the generic implementation of tlb_remove_huge_tlb_entry s.t. CONT_PTE_SIZE and CONT_PMD_SIZE are effectively covered on arm64. Also, as well as ptes, pmds and puds; p4ds are now considered too. Reported-by: David Hildenbrand <david@redhat.com> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/linux-mm/811c5c8e-b3a2-85d2-049c-717f17c3a03a@redhat.com/ Signed-off-by: Steve Capper <steve.capper@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220330112543.863-1-steve.capper@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20vfio/pci: Fix vf_token mechanism when device-specific VF drivers are usedJason Gunthorpe1-0/+2
[ Upstream commit 1ef3342a934e235aca72b4bcc0d6854d80a65077 ] get_pf_vdev() tries to check if a PF is a VFIO PF by looking at the driver: if (pci_dev_driver(physfn) != pci_dev_driver(vdev->pdev)) { However now that we have multiple VF and PF drivers this is no longer reliable. This means that security tests realted to vf_token can be skipped by mixing and matching different VFIO PCI drivers. Instead of trying to use the driver core to find the PF devices maintain a linked list of all PF vfio_pci_core_device's that we have called pci_enable_sriov() on. When registering a VF just search the list to see if the PF is present and record the match permanently in the struct. PCI core locking prevents a PF from passing pci_disable_sriov() while VF drivers are attached so the VFIO owned PF becomes a static property of the VF. In common cases where vfio does not own the PF the global list remains empty and the VF's pointer is statically NULL. This also fixes a lockdep splat from recursive locking of the vfio_group::device_lock between vfio_device_get_from_name() and vfio_device_get_from_dev(). If the VF and PF share the same group this would deadlock. Fixes: ff53edf6d6ab ("vfio/pci: Split the pci_driver code out of vfio_pci_core.c") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/0-v3-876570980634+f2e8-vfio_vf_token_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20scsi: iscsi: Fix conn cleanup and stop race during iscsid restartMike Christie1-0/+2
[ Upstream commit 7c6e99c18167ed89729bf167ccb4a7e3ab3115ba ] If iscsid is doing a stop_conn at the same time the kernel is starting error recovery we can hit a race that allows the cleanup work to run on a valid connection. In the race, iscsi_if_stop_conn sees the cleanup bit set, but it calls flush_work on the clean_work before iscsi_conn_error_event has queued it. The flush then returns before the queueing and so the cleanup_work can run later and disconnect/stop a conn while it's in a connected state. The patch: Commit 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely in kernel space") added the late stop_conn call bug originally, and the patch: Commit 23d6fefbb3f6 ("scsi: iscsi: Fix in-kernel conn failure handling") attempted to fix it but only fixed the normal EH case and left the above race for the iscsid restart case. For the normal EH case we don't hit the race because we only signal userspace to start recovery after we have done the queueing, so the flush will always catch the queued work or see it completed. For iscsid restart cases like boot, we can hit the race because iscsid will call down to the kernel before the kernel has signaled any error, so both code paths can be running at the same time. This adds a lock around the setting of the cleanup bit and queueing so they happen together. Link: https://lore.kernel.org/r/20220408001314.5014-6-michael.christie@oracle.com Fixes: 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely in kernel space") Tested-by: Manish Rangankar <mrangankar@marvell.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Chris Leech <cleech@redhat.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20net/sched: flower: fix parsing of ethertype following VLAN headerVlad Buslov1-0/+2
[ Upstream commit 2105f700b53c24aa48b65c15652acc386044d26a ] A tc flower filter matching TCA_FLOWER_KEY_VLAN_ETH_TYPE is expected to match the L2 ethertype following the first VLAN header, as confirmed by linked discussion with the maintainer. However, such rule also matches packets that have additional second VLAN header, even though filter has both eth_type and vlan_ethtype set to "ipv4". Looking at the code this seems to be mostly an artifact of the way flower uses flow dissector. First, even though looking at the uAPI eth_type and vlan_ethtype appear like a distinct fields, in flower they are all mapped to the same key->basic.n_proto. Second, flow dissector skips following VLAN header as no keys for FLOW_DISSECTOR_KEY_CVLAN are set and eventually assigns the value of n_proto to last parsed header. With these, such filters ignore any headers present between first VLAN header and first "non magic" header (ipv4 in this case) that doesn't result FLOW_DISSECT_RET_PROTO_AGAIN. Fix the issue by extending flow dissector VLAN key structure with new 'vlan_eth_type' field that matches first ethertype following previously parsed VLAN header. Modify flower classifier to set the new flow_dissector_key_vlan->vlan_eth_type with value obtained from TCA_FLOWER_KEY_VLAN_ETH_TYPE/TCA_FLOWER_KEY_CVLAN_ETH_TYPE uAPIs. Link: https://lore.kernel.org/all/Yjhgi48BpTGh6dig@nanopsycho/ Fixes: 9399ae9a6cb2 ("net_sched: flower: Add vlan support") Fixes: d64efd0926ba ("net/sched: flower: Add supprt for matching on QinQ vlan headers") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20SUNRPC: Fix the svc_deferred_event trace classChuck Lever1-3/+4
[ Upstream commit 4d5004451ab2218eab94a30e1841462c9316ba19 ] Fix a NULL deref crash that occurs when an svc_rqst is deferred while the sunrpc tracing subsystem is enabled. svc_revisit() sets dr->xprt to NULL, so it can't be relied upon in the tracepoint to provide the remote's address. Unfortunately we can't revert the "svc_deferred_class" hunk in commit ece200ddd54b ("sunrpc: Save remote presentation address in svc_xprt for trace events") because there is now a specific check of event format specifiers for unsafe dereferences. The warning that check emits is: event svc_defer_recv has unsafe dereference of argument 1 A "%pISpc" format specifier with a "struct sockaddr *" is indeed flagged by this check. Instead, take the brute-force approach used by the svcrdma_qp_error tracepoint. Convert the dr::addr field into a presentation address in the TP_fast_assign() arm of the trace event, and store that as a string. This fix can be backported to -stable kernels. In the meantime, commit c6ced22997ad ("tracing: Update print fmt check to handle new __get_sockaddr() macro") is now in v5.18, so this wonky fix can be replaced with __sockaddr() and friends properly during the v5.19 merge window. Fixes: ece200ddd54b ("sunrpc: Save remote presentation address in svc_xprt for trace events") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20ALSA: core: Add snd_card_free_on_error() helperTakashi Iwai1-0/+1
commit fee2b871d8d6389c9b4bdf9346a99ccc1c98c9b8 upstream. This is a small helper function to handle the error path more easily when an error happens during the probe for the device with the device-managed card. Since devres releases in the reverser order of the creations, usually snd_card_free() gets called at the last in the probe error path unless it already reached snd_card_register() calls. Due to this nature, when a driver expects the resource releases in card->private_free, this might be called too lately. As a workaround, one should call the probe like: static int __some_probe(...) { // do real probe.... } static int some_probe(...) { return snd_card_free_on_error(dev, __some_probe(dev, ...)); } so that the snd_card_free() is called explicitly at the beginning of the error path from the probe. This function will be used in the upcoming fixes to address the regressions by devres usages. Fixes: e8ad415b7a55 ("ALSA: core: Add managed card creation") Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220412093141.8008-2-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20soc: qcom: aoss: Expose send for generic usecaseDeepak Kumar Singh1-0/+38
commit 8c75d585b931ac874fbe4ee5a8f1811d20c2817f upstream. Not all upcoming usecases will have an interface to allow the aoss driver to hook onto. Expose the send api and create a get function to enable drivers to send their own messages to aoss. Signed-off-by: Chris Lew <clew@codeaurora.org> Signed-off-by: Deepak Kumar Singh <deesin@codeaurora.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/1630420228-31075-2-git-send-email-deesin@codeaurora.org Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-15peci: mctp: Add DomainId support for peci-mctpKrzysztof Richert1-0/+3
Get DomainId from peci subsystem message to send message to requested EID. Required EID is taken from aspeed-mctp based on the discovery performed by user-space application. For each particular agent behind master, information about its BDF/DomainId is sufficient to get its EID. Signed-off-by: Krzysztof Richert <krzysztof.richert@intel.com>
2022-04-15soc: aspeed: mctp: Extend internal aspeed-mctp driver APIKrzysztof Richert1-0/+14
New API allows to read DomainId information by internal kernel clients (for example peci-mctp). Signed-off-by: Krzysztof Richert <krzysztof.richert@intel.com>
2022-04-15soc: aspeed: mctp: Add IOCTLs for extended EID informationKrzysztof Richert1-0/+11
For PECI over MCTP purposes, aspeed-mctp driver needs to be extended that allows to set Domain ID information. Signed-off-by: Krzysztof Richert <krzysztof.richert@intel.com>
2022-04-15soc: aspeed: xdma: Add trace eventsEddie James1-0/+139
Trace the flow of the driver to aid debugging after an error. OpenBMC-Staging-Count: 1 Signed-off-by: Eddie James <eajames@linux.ibm.com> Acked-by: Andrew Jeffery <andrew@aj.id.au> Link: https://lore.kernel.org/r/20220412215331.42491-1-eajames@linux.ibm.com Signed-off-by: Joel Stanley <joel@jms.id.au>
2022-04-14Merge tag 'v5.15.34' into dev-5.15Joel Stanley45-123/+308
This is the 5.15.34 stable release Signed-off-by: Joel Stanley <joel@jms.id.au>
2022-04-13stacktrace: move filter_irq_stacks() to kernel/stacktrace.cMarco Elver2-2/+1
commit f39f21b3ddc7fc0f87eb6dc75ddc81b5bbfb7672 upstream. filter_irq_stacks() has little to do with the stackdepot implementation, except that it is usually used by users (such as KASAN) of stackdepot to reduce the stack trace. However, filter_irq_stacks() itself is not useful without a stack trace as obtained by stack_trace_save() and friends. Therefore, move filter_irq_stacks() to kernel/stacktrace.c, so that new users of filter_irq_stacks() do not have to start depending on STACKDEPOT only for filter_irq_stacks(). Link: https://lkml.kernel.org/r/20210923104803.2620285-1-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Acked-by: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Jann Horn <jannh@google.com> Cc: Aleksandr Nogikh <nogikh@google.com> Cc: Taras Madan <tarasmadan@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-13static_call: Don't make __static_call_return0 staticChristophe Leroy1-4/+1
commit 8fd4ddda2f49a66bf5dd3d0c01966c4b1971308b upstream. System.map shows that vmlinux contains several instances of __static_call_return0(): c0004fc0 t __static_call_return0 c0011518 t __static_call_return0 c00d8160 t __static_call_return0 arch_static_call_transform() uses the middle one to check whether we are setting a call to __static_call_return0 or not: c0011520 <arch_static_call_transform>: c0011520: 3d 20 c0 01 lis r9,-16383 <== r9 = 0xc001 << 16 c0011524: 39 29 15 18 addi r9,r9,5400 <== r9 += 0x1518 c0011528: 7c 05 48 00 cmpw r5,r9 <== r9 has value 0xc0011518 here So if static_call_update() is called with one of the other instances of __static_call_return0(), arch_static_call_transform() won't recognise it. In order to work properly, global single instance of __static_call_return0() is required. Fixes: 3f2a8fc4b15d ("static_call/x86: Add __static_call_return0()") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/30821468a0e7d28251954b578e5051dc09300d04.1647258493.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-13mm/sparsemem: fix 'mem_section' will never be NULL gcc 12 warningWaiman Long1-4/+7
commit a431dbbc540532b7465eae4fc8b56a85a9fc7d17 upstream. The gcc 12 compiler reports a "'mem_section' will never be NULL" warning on the following code: static inline struct mem_section *__nr_to_section(unsigned long nr) { #ifdef CONFIG_SPARSEMEM_EXTREME if (!mem_section) return NULL; #endif if (!mem_section[SECTION_NR_TO_ROOT(nr)]) return NULL; : It happens with CONFIG_SPARSEMEM_EXTREME off. The mem_section definition is #ifdef CONFIG_SPARSEMEM_EXTREME extern struct mem_section **mem_section; #else extern struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT]; #endif In the !CONFIG_SPARSEMEM_EXTREME case, mem_section is a static 2-dimensional array and so the check "!mem_section[SECTION_NR_TO_ROOT(nr)]" doesn't make sense. Fix this warning by moving the "!mem_section[SECTION_NR_TO_ROOT(nr)]" check up inside the CONFIG_SPARSEMEM_EXTREME block and adding an explicit NR_SECTION_ROOTS check to make sure that there is no out-of-bound array access. Link: https://lkml.kernel.org/r/20220331180246.2746210-1-longman@redhat.com Fixes: 3e347261a80b ("sparsemem extreme implementation") Signed-off-by: Waiman Long <longman@redhat.com> Reported-by: Justin Forbes <jforbes@redhat.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Rafael Aquini <aquini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-13SUNRPC: Don't call connect() more than once on a TCP socketTrond Myklebust1-0/+1
commit 89f42494f92f448747bd8a7ab1ae8b5d5520577d upstream. Avoid socket state races due to repeated calls to ->connect() using the same socket. If connect() returns 0 due to the connection having completed, but we are in fact in a closing state, then we may leave the XPRT_CONNECTING flag set on the transport. Reported-by: Enrico Scholz <enrico.scholz@sigma-chemnitz.de> Fixes: 3be232f11a3c ("SUNRPC: Prevent immediate close+reconnect") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-13rtc: mc146818-lib: fix signedness bug in mc146818_get_time()Dan Carpenter1-1/+1
commit 7372971c1be5b7d4fdd8ad237798bdc1d1d54162 upstream. The mc146818_get_time() function returns zero on success or negative a error code on failure. It needs to be type int. Fixes: d35786b3a28d ("rtc: mc146818-lib: change return values of mc146818_get_time()") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Mateusz Jończyk <mat.jonczyk@o2.pl> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20220111071922.GE11243@kili Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-13bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wideJakub Sitnicki1-1/+2
commit 9a69e2b385f443f244a7e8b8bcafe5ccfb0866b4 upstream. remote_port is another case of a BPF context field documented as a 32-bit value in network byte order for which the BPF context access converter generates a load of a zero-padded 16-bit integer in network byte order. First such case was dst_port in bpf_sock which got addressed in commit 4421a582718a ("bpf: Make dst_port field in struct bpf_sock 16-bit wide"). Loading 4-bytes from the remote_port offset and converting the value with bpf_ntohl() leads to surprising results, as the expected value is shifted by 16 bits. Reduce the confusion by splitting the field in two - a 16-bit field holding a big-endian integer, and a 16-bit zero-padding anonymous field that follows it. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220209184333.654927-2-jakub@cloudflare.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-13gpio: Restrict usage of GPIO chip irq members before initializationShreeya Patel1-0/+9
commit 5467801f1fcbdc46bc7298a84dbf3ca1ff2a7320 upstream. GPIO chip irq members are exposed before they could be completely initialized and this leads to race conditions. One such issue was observed for the gc->irq.domain variable which was accessed through the I2C interface in gpiochip_to_irq() before it could be initialized by gpiochip_add_irqchip(). This resulted in Kernel NULL pointer dereference. Following are the logs for reference :- kernel: Call Trace: kernel: gpiod_to_irq+0x53/0x70 kernel: acpi_dev_gpio_irq_get_by+0x113/0x1f0 kernel: i2c_acpi_get_irq+0xc0/0xd0 kernel: i2c_device_probe+0x28a/0x2a0 kernel: really_probe+0xf2/0x460 kernel: RIP: 0010:gpiochip_to_irq+0x47/0xc0 To avoid such scenarios, restrict usage of GPIO chip irq members before they are completely initialized. Signed-off-by: Shreeya Patel <shreeya.patel@collabora.com> Cc: stable@vger.kernel.org Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-13rtc: mc146818-lib: fix RTC presence checkMateusz Jończyk1-0/+1
[ Upstream commit ea6fa4961aab8f90a8aa03575a98b4bda368d4b6 ] To prevent an infinite loop in mc146818_get_time(), commit 211e5db19d15 ("rtc: mc146818: Detect and handle broken RTCs") added a check for RTC availability. Together with a later fix, it checked if bit 6 in register 0x0d is cleared. This, however, caused a false negative on a motherboard with an AMD SB710 southbridge; according to the specification [1], bit 6 of register 0x0d of this chipset is a scratchbit. This caused a regression in Linux 5.11 - the RTC was determined broken by the kernel and not used by rtc-cmos.c [3]. This problem was also reported in Fedora [4]. As a better alternative, check whether the UIP ("Update-in-progress") bit is set for longer then 10ms. If that is the case, then apparently the RTC is either absent (and all register reads return 0xff) or broken. Also limit the number of loop iterations in mc146818_get_time() to 10 to prevent an infinite loop there. The functions mc146818_get_time() and mc146818_does_rtc_work() will be refactored later in this patch series, in order to fix a separate problem with reading / setting the RTC alarm time. This is done so to avoid a confusion about what is being fixed when. In a previous approach to this problem, I implemented a check whether the RTC_HOURS register contains a value <= 24. This, however, sometimes did not work correctly on my Intel Kaby Lake laptop. According to Intel's documentation [2], "the time and date RAM locations (0-9) are disconnected from the external bus" during the update cycle so reading this register without checking the UIP bit is incorrect. [1] AMD SB700/710/750 Register Reference Guide, page 308, https://developer.amd.com/wordpress/media/2012/10/43009_sb7xx_rrg_pub_1.00.pdf [2] 7th Generation Intel ® Processor Family I/O for U/Y Platforms [...] Datasheet Volume 1 of 2, page 209 Intel's Document Number: 334658-006, https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/7th-and-8th-gen-core-family-mobile-u-y-processor-lines-i-o-datasheet-vol-1.pdf [3] Functions in arch/x86/kernel/rtc.c apparently were using it. [4] https://bugzilla.redhat.com/show_bug.cgi?id=1936688 Fixes: 211e5db19d15 ("rtc: mc146818: Detect and handle broken RTCs") Fixes: ebb22a059436 ("rtc: mc146818: Dont test for bit 0-5 in Register D") Signed-off-by: Mateusz Jończyk <mat.jonczyk@o2.pl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20211210200131.153887-5-mat.jonczyk@o2.pl Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-13NFS: nfsiod should not block forever in mempool_alloc()Trond Myklebust1-1/+1
[ Upstream commit 515dcdcd48736576c6f5c197814da6f81c60a21e ] The concern is that since nfsiod is sometimes required to kick off a commit, it can get locked up waiting forever in mempool_alloc() instead of failing gracefully and leaving the commit until later. Try to allocate from the slab first, with GFP_KERNEL | __GFP_NORETRY, then fall back to a non-blocking attempt to allocate from the memory pool. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-13NFS: swap IO handling is slightly different for O_DIRECT IONeilBrown1-4/+4
[ Upstream commit 64158668ac8b31626a8ce48db4cad08496eb8340 ] 1/ Taking the i_rwsem for swap IO triggers lockdep warnings regarding possible deadlocks with "fs_reclaim". These deadlocks could, I believe, eventuate if a buffered read on the swapfile was attempted. We don't need coherence with the page cache for a swap file, and buffered writes are forbidden anyway. There is no other need for i_rwsem during direct IO. So never take it for swap_rw() 2/ generic_write_checks() explicitly forbids writes to swap, and performs checks that are not needed for swap. So bypass it for swap_rw(). Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>