summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2022-04-11net: icmp: add skb drop reasons to icmp protocolMenglong Dong3-1/+8
Replace kfree_skb() used in icmp_rcv() and icmpv6_rcv() with kfree_skb_reason(). In order to get the reasons of the skb drops after icmp message handle, we change the return type of 'handler()' in 'struct icmp_control' from 'bool' to 'enum skb_drop_reason'. This may change its original intention, as 'false' means failure, but 'SKB_NOT_DROPPED_YET' means success now. Therefore, all 'handler' and the call of them need to be handled. Following 'handler' functions are involved: icmp_unreach() icmp_redirect() icmp_echo() icmp_timestamp() icmp_discard() And following new drop reasons are added: SKB_DROP_REASON_ICMP_CSUM SKB_DROP_REASON_INVALID_PROTO The reason 'INVALID_PROTO' is introduced for the case that the packet doesn't follow rfc 1122 and is dropped. This is not a common case, and I believe we can locate the problem from the data in the packet. For now, this 'INVALID_PROTO' is used for the icmp broadcasts with wrong types. Maybe there should be a document file for these reasons. For example, list all the case that causes the 'UNHANDLED_PROTO' and 'INVALID_PROTO' drop reason. Therefore, users can locate their problems according to the document. Reviewed-by: Hao Peng <flyingpeng@tencent.com> Reviewed-by: Jiang Biao <benbjiang@tencent.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11net: skb: rename SKB_DROP_REASON_PTYPE_ABSENTMenglong Dong2-6/+4
As David Ahern suggested, the reasons for skb drops should be more general and not be code based. Therefore, rename SKB_DROP_REASON_PTYPE_ABSENT to SKB_DROP_REASON_UNHANDLED_PROTO, which is used for the cases of no L3 protocol handler, no L4 protocol handler, version extensions, etc. From previous discussion, now we have the aim to make these reasons more abstract and users based, avoiding code based. Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11net: sock: introduce sock_queue_rcv_skb_reason()Menglong Dong1-1/+8
In order to report the reasons of skb drops in 'sock_queue_rcv_skb()', introduce the function 'sock_queue_rcv_skb_reason()'. As the return value of 'sock_queue_rcv_skb()' is used as the error code, we can't make it as drop reason and have to pass extra output argument. 'sock_queue_rcv_skb()' is used in many places, so we can't change it directly. Introduce the new function 'sock_queue_rcv_skb_reason()' and make 'sock_queue_rcv_skb()' an inline call to it. Reviewed-by: Hao Peng <flyingpeng@tencent.com> Reviewed-by: Jiang Biao <benbjiang@tencent.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-10tls: rx: simplify async waitJakub Kicinski1-1/+0
Since we are protected from async completions by decrypt_compl_lock we can drop the async_notify and reinit the completion before we start waiting. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-09Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski1-2/+2
Daniel Borkmann says: ==================== pull-request: bpf-next 2022-04-09 We've added 63 non-merge commits during the last 9 day(s) which contain a total of 68 files changed, 4852 insertions(+), 619 deletions(-). The main changes are: 1) Add libbpf support for USDT (User Statically-Defined Tracing) probes. USDTs are an abstraction built on top of uprobes, critical for tracing and BPF, and widely used in production applications, from Andrii Nakryiko. 2) While Andrii was adding support for x86{-64}-specific logic of parsing USDT argument specification, Ilya followed-up with USDT support for s390 architecture, from Ilya Leoshkevich. 3) Support name-based attaching for uprobe BPF programs in libbpf. The format supported is `u[ret]probe/binary_path:[raw_offset|function[+offset]]`, e.g. attaching to libc malloc can be done in BPF via SEC("uprobe/libc.so.6:malloc") now, from Alan Maguire. 4) Various load/store optimizations for the arm64 JIT to shrink the image size by using arm64 str/ldr immediate instructions. Also enable pointer authentication to verify return address for JITed code, from Xu Kuohai. 5) BPF verifier fixes for write access checks to helper functions, e.g. rd-only memory from bpf_*_cpu_ptr() must not be passed to helpers that write into passed buffers, from Kumar Kartikeya Dwivedi. 6) Fix overly excessive stack map allocation for its base map structure and buckets which slipped-in from cleanups during the rlimit accounting removal back then, from Yuntao Wang. 7) Extend the unstable CT lookup helpers for XDP and tc/BPF to report netfilter connection tracking tuple direction, from Lorenzo Bianconi. 8) Improve bpftool dump to show BPF program/link type names, Milan Landaverde. 9) Minor cleanups all over the place from various others. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (63 commits) bpf: Fix excessive memory allocation in stack_map_alloc() selftests/bpf: Fix return value checks in perf_event_stackmap test selftests/bpf: Add CO-RE relos into linked_funcs selftests libbpf: Use weak hidden modifier for USDT BPF-side API functions libbpf: Don't error out on CO-RE relos for overriden weak subprogs samples, bpf: Move routes monitor in xdp_router_ipv4 in a dedicated thread libbpf: Allow WEAK and GLOBAL bindings during BTF fixup libbpf: Use strlcpy() in path resolution fallback logic libbpf: Add s390-specific USDT arg spec parsing logic libbpf: Make BPF-side of USDT support work on big-endian machines libbpf: Minor style improvements in USDT code libbpf: Fix use #ifdef instead of #if to avoid compiler warning libbpf: Potential NULL dereference in usdt_manager_attach_usdt() selftests/bpf: Uprobe tests should verify param/return values libbpf: Improve string parsing for uprobe auto-attach libbpf: Improve library identification for uprobe binary path resolution selftests/bpf: Test for writes to map key from BPF helpers selftests/bpf: Test passing rdonly mem to global func bpf: Reject writes for PTR_TO_MAP_KEY in check_helper_mem_access bpf: Check PTR_TO_MEM | MEM_RDONLY in check_helper_mem_access ... ==================== Link: https://lore.kernel.org/r/20220408231741.19116-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-08net/sched: act_skbedit: Add extack messages for offload failureIdo Schimmel1-0/+12
For better error reporting to user space, add extack messages when skbedit action offload fails. Example: # echo 1 > /sys/kernel/tracing/events/netlink/netlink_extack/enable # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action skbedit queue_mapping 1234 Error: cls_matchall: Failed to setup flow action. We have an error talking to the kernel # cat /sys/kernel/tracing/trace_pipe tc-185 [002] b..1. 31.802414: netlink_extack: msg=act_skbedit: Offload not supported when "queue_mapping" option is used tc-185 [002] ..... 31.802418: netlink_extack: msg=cls_matchall: Failed to setup flow action # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action skbedit inheritdsfield Error: cls_matchall: Failed to setup flow action. We have an error talking to the kernel # cat /sys/kernel/tracing/trace_pipe tc-187 [002] b..1. 45.985145: netlink_extack: msg=act_skbedit: Offload not supported when "inheritdsfield" option is used tc-187 [002] ..... 45.985160: netlink_extack: msg=cls_matchall: Failed to setup flow action Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08net/sched: act_gact: Add extack messages for offload failureIdo Schimmel1-0/+15
For better error reporting to user space, add extack messages when gact action offload fails. Example: # echo 1 > /sys/kernel/tracing/events/netlink/netlink_extack/enable # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action continue Error: cls_matchall: Failed to setup flow action. We have an error talking to the kernel # cat /sys/kernel/tracing/trace_pipe tc-181 [002] b..1. 105.493450: netlink_extack: msg=act_gact: Offload of "continue" action is not supported tc-181 [002] ..... 105.493466: netlink_extack: msg=cls_matchall: Failed to setup flow action # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action reclassify Error: cls_matchall: Failed to setup flow action. We have an error talking to the kernel # cat /sys/kernel/tracing/trace_pipe tc-183 [002] b..1. 124.126477: netlink_extack: msg=act_gact: Offload of "reclassify" action is not supported tc-183 [002] ..... 124.126489: netlink_extack: msg=cls_matchall: Failed to setup flow action # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action pipe action drop Error: cls_matchall: Failed to setup flow action. We have an error talking to the kernel # cat /sys/kernel/tracing/trace_pipe tc-185 [002] b..1. 137.097791: netlink_extack: msg=act_gact: Offload of "pipe" action is not supported tc-185 [002] ..... 137.097804: netlink_extack: msg=cls_matchall: Failed to setup flow action Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08net/sched: act_api: Add extack to offload_act_setup() callbackIdo Schimmel2-3/+6
The callback is used by various actions to populate the flow action structure prior to offload. Pass extack to this callback so that the various actions will be able to report accurate error messages to user space. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08tls: rx: use a define for tag lengthJakub Kicinski1-0/+1
TLS 1.3 has to strip padding, and it starts out 16 bytes from the end of the record. Make it clear this is because of the auth tag. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08tls: rx: don't store the decryption status in socket contextJakub Kicinski2-1/+1
Similar justification to previous change, the information about decryption status belongs in the skb. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08tls: rx: don't store the record type in socket contextJakub Kicinski2-7/+6
Original TLS implementation was handling one record at a time. It stashed the type of the record inside tls context (per socket structure) for convenience. When async crypto support was added [1] the author had to use skb->cb to store the type per-message. The use of skb->cb overlaps with strparser, however, so a hybrid approach was taken where type is stored in context while parsing (since we parse a message at a time) but once parsed its copied to skb->cb. Recently a workaround for sockmaps [2] exposed the previously private struct _strp_msg and started a trend of adding user fields directly in strparser's header. This is cleaner than storing information about an skb in the context. This change is not strictly necessary, but IMHO the ownership of the context field is confusing. Information naturally belongs to the skb. [1] commit 94524d8fc965 ("net/tls: Add support for async decryption of tls records") [2] commit b2c4618162ec ("bpf, sockmap: sk_skb data_end access incorrect when src_reg = dst_reg") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski4-9/+4
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-08Merge tag 'net-5.18-rc2' of ↵Linus Torvalds2-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf and netfilter. Current release - new code bugs: - mctp: correct mctp_i2c_header_create result - eth: fungible: fix reference to __udivdi3 on 32b builds - eth: micrel: remove latencies support lan8814 Previous releases - regressions: - bpf: resolve to prog->aux->dst_prog->type only for BPF_PROG_TYPE_EXT - vrf: fix packet sniffing for traffic originating from ip tunnels - rxrpc: fix a race in rxrpc_exit_net() - dsa: revert "net: dsa: stop updating master MTU from master.c" - eth: ice: fix MAC address setting Previous releases - always broken: - tls: fix slab-out-of-bounds bug in decrypt_internal - bpf: support dual-stack sockets in bpf_tcp_check_syncookie - xdp: fix coalescing for page_pool fragment recycling - ovs: fix leak of nested actions - eth: sfc: - add missing xdp queue reinitialization - fix using uninitialized xdp tx_queue - eth: ice: - clear default forwarding VSI during VSI release - fix broken IFF_ALLMULTI handling - synchronize_rcu() when terminating rings - eth: qede: confirm skb is allocated before using - eth: aqc111: fix out-of-bounds accesses in RX fixup - eth: slip: fix NPD bug in sl_tx_timeout()" * tag 'net-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits) drivers: net: slip: fix NPD bug in sl_tx_timeout() bpf: Adjust bpf_tcp_check_syncookie selftest to test dual-stack sockets bpf: Support dual-stack sockets in bpf_tcp_check_syncookie myri10ge: fix an incorrect free for skb in myri10ge_sw_tso net: usb: aqc111: Fix out-of-bounds accesses in RX fixup qede: confirm skb is allocated before using net: ipv6mr: fix unused variable warning with CONFIG_IPV6_PIMSM_V2=n net: phy: mscc-miim: reject clause 45 register accesses net: axiemac: use a phandle to reference pcs_phy dt-bindings: net: add pcs-handle attribute net: axienet: factor out phy_node in struct axienet_local net: axienet: setup mdio unconditionally net: sfc: fix using uninitialized xdp tx_queue rxrpc: fix a race in rxrpc_exit_net() net: openvswitch: fix leak of nested actions net: ethernet: mv643xx: Fix over zealous checking of_get_mac_address() net: openvswitch: don't send internal clone attribute to the userspace. net: micrel: Fix KS8851 Kconfig ice: clear cmd_type_offset_bsz for TX rings ice: xsk: fix VSI state check in ice_xsk_wakeup() ...
2022-04-08tcp: Add tracepoint for tcp_set_ca_statePing Gan2-9/+48
The congestion status of a tcp flow may be updated since there is congestion between tcp sender and receiver. It makes sense to add tracepoint for congestion status set function to summate cc status duration and evaluate the performance of network and congestion algorithm. the backgound of this patch is below. Link: https://github.com/iovisor/bcc/pull/3899 Signed-off-by: Ping Gan <jacky_gam_2001@163.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220406010956.19656-1-jacky_gam_2001@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-08net-core: rx_otherhost_dropped to core_statsJeffrey Ji2-0/+7
Increment rx_otherhost_dropped counter when packet dropped due to mismatched dest MAC addr. An example when this drop can occur is when manually crafting raw packets that will be consumed by a user space application via a tap device. For testing purposes local traffic was generated using trafgen for the client and netcat to start a server Tested: Created 2 netns, sent 1 packet using trafgen from 1 to the other with "{eth(daddr=$INCORRECT_MAC...}", verified that iproute2 showed the counter was incremented. (Also had to modify iproute2 to show the stat, additional patch for that coming next.) Signed-off-by: Jeffrey Ji <jeffreyji@google.com> Reviewed-by: Brian Vazquez <brianvv@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220406172600.1141083-1-jeffreyjilinux@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-08net: extract a few internals from netdevice.hJakub Kicinski1-70/+2
There's a number of functions and static variables used under net/core/ but not from the outside. We currently dump most of them into netdevice.h. That bad for many reasons: - netdevice.h is very cluttered, hard to figure out what the APIs are; - netdevice.h is very long; - we have to touch netdevice.h more which causes expensive incremental builds. Create a header under net/core/ and move some declarations. The new header is also a bit of a catch-all but that's fine, if we create more specific headers people will likely over-think where their declaration fit best. And end up putting them in netdevice.h, again. More work should be done on splitting netdevice.h into more targeted headers, but that'd be more time consuming so small steps. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-07Merge tag 'hyperv-fixes-signed-20220407' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - Correctly propagate coherence information for VMbus devices (Michael Kelley) - Disable balloon and memory hot-add on ARM64 temporarily (Boqun Feng) - Use barrier to prevent reording when reading ring buffer (Michael Kelley) - Use virt_store_mb in favour of smp_store_mb (Andrea Parri) - Fix VMbus device object initialization (Andrea Parri) - Deactivate sysctl_record_panic_msg on isolated guest (Andrea Parri) - Fix a crash when unloading VMbus module (Guilherme G. Piccoli) * tag 'hyperv-fixes-signed-20220407' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: Drivers: hv: vmbus: Replace smp_store_mb() with virt_store_mb() Drivers: hv: balloon: Disable balloon and hot-add accordingly Drivers: hv: balloon: Support status report for larger page sizes Drivers: hv: vmbus: Prevent load re-ordering when reading ring buffer PCI: hv: Propagate coherence from VMbus device to PCI device Drivers: hv: vmbus: Propagate VMbus coherence to each VMbus device Drivers: hv: vmbus: Fix potential crash on module unload Drivers: hv: vmbus: Fix initialization of device object in vmbus_device_register() Drivers: hv: vmbus: Deactivate sysctl_record_panic_msg by default in isolated guests
2022-04-07ipv6: fix locking issues with loops over idev->addr_listNiels Dossche1-0/+8
idev->addr_list needs to be protected by idev->lock. However, it is not always possible to do so while iterating and performing actions on inet6_ifaddr instances. For example, multiple functions (like addrconf_{join,leave}_anycast) eventually call down to other functions that acquire the idev->lock. The current code temporarily unlocked the idev->lock during the loops, which can cause race conditions. Moving the locks up is also not an appropriate solution as the ordering of lock acquisition will be inconsistent with for example mc_lock. This solution adds an additional field to inet6_ifaddr that is used to temporarily add the instances to a temporary list while holding idev->lock. The temporary list can then be traversed without holding idev->lock. This change was done in two places. In addrconf_ifdown, the list_for_each_entry_safe variant of the list loop is also no longer necessary as there is no deletion within that specific loop. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Niels Dossche <dossche.niels@gmail.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20220403231523.45843-1-dossche.niels@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-07Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski1-1/+3
Alexei Starovoitov says: ==================== pull-request: bpf 2022-04-06 We've added 8 non-merge commits during the last 8 day(s) which contain a total of 9 files changed, 139 insertions(+), 36 deletions(-). The main changes are: 1) rethook related fixes, from Jiri and Masami. 2) Fix the case when tracing bpf prog is attached to struct_ops, from Martin. 3) Support dual-stack sockets in bpf_tcp_check_syncookie, from Maxim. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Adjust bpf_tcp_check_syncookie selftest to test dual-stack sockets bpf: Support dual-stack sockets in bpf_tcp_check_syncookie bpf: selftests: Test fentry tracing a struct_ops program bpf: Resolve to prog->aux->dst_prog->type only for BPF_PROG_TYPE_EXT rethook: Fix to use WRITE_ONCE() for rethook:: Handler selftests/bpf: Fix warning comparing pointer to 0 bpf: Fix sparse warnings in kprobe_multi_resolve_syms bpftool: Explicit errno handling in skeletons ==================== Link: https://lore.kernel.org/r/20220407031245.73026-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-06tcp: add accessors to read/set tp->snd_cwndEric Dumazet2-5/+16
We had various bugs over the years with code breaking the assumption that tp->snd_cwnd is greater than zero. Lately, syzbot reported the WARN_ON_ONCE(!tp->prior_cwnd) added in commit 8b8a321ff72c ("tcp: fix zero cwnd in tcp_cwnd_reduction") can trigger, and without a repro we would have to spend considerable time finding the bug. Instead of complaining too late, we want to catch where and when tp->snd_cwnd is set to an illegal value. Signed-off-by: Eric Dumazet <edumazet@google.com> Suggested-by: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Link: https://lore.kernel.org/r/20220405233538.947344-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-06net: ethernet: mtk_eth_soc: implement flow offloading to WED devicesFelix Fietkau1-0/+7
This allows hardware flow offloading from Ethernet to WLAN on MT7622 SoC Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-06net: ethernet: mtk_eth_soc: add support for Wireless Ethernet Dispatch (WED)Felix Fietkau1-0/+131
The Wireless Ethernet Dispatch subsystem on the MT7622 SoC can be configured to intercept and handle access to the DMA queues and PCIe interrupts for a MT7615/MT7915 wireless card. It can manage the internal WDMA (Wireless DMA) controller, which allows ethernet packets to be passed from the packet switch engine (PSE) to the wireless card, bypassing the CPU entirely. This can be used to implement hardware flow offloading from ethernet to WLAN. Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-06net, uapi: remove inclusion of arpa/inet.hNick Desaulniers1-16/+12
In include/uapi/linux/tipc_config.h, there's a comment that it includes arpa/inet.h for ntohs; but ntohs is not defined in any UAPI header. For now, reuse the definitions from include/linux/byteorder/generic.h, since the various conversion functions do exist in UAPI headers: include/uapi/linux/byteorder/big_endian.h include/uapi/linux/byteorder/little_endian.h We would like to get to the point where we can build UAPI header tests with -nostdinc, meaning that kernel UAPI headers should not have a circular dependency on libc headers. Link: https://android-review.googlesource.com/c/platform/bionic/+/2048127 Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-06net: remove noblock parameter from skb_recv_datagram()Oliver Hartkopp1-2/+1
skb_recv_datagram() has two parameters 'flags' and 'noblock' that are merged inside skb_recv_datagram() by 'flags | (noblock ? MSG_DONTWAIT : 0)' As 'flags' may contain MSG_DONTWAIT as value most callers split the 'flags' into 'flags' and 'noblock' with finally obsolete bit operations like this: skb_recv_datagram(sk, flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT, &rc); And this is not even done consistently with the 'flags' parameter. This patch removes the obsolete and costly splitting into two parameters and only performs bit operations when really needed on the caller side. One missing conversion thankfully reported by kernel test robot. I missed to enable kunit tests to build the mctp code. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-06net: ensure net_todo_list is processed quicklyJohannes Berg1-1/+2
In [1], Will raised a potential issue that the cfg80211 code, which does (from a locking perspective) rtnl_lock() wiphy_lock() rtnl_unlock() might be suspectible to ABBA deadlocks, because rtnl_unlock() calls netdev_run_todo(), which might end up calling rtnl_lock() again, which could then deadlock (see the comment in the code added here for the scenario). Some back and forth and thinking ensued, but clearly this can't happen if the net_todo_list is empty at the rtnl_unlock() here. Clearly, the code here cannot actually put an entry on it, and all other users of rtnl_unlock() will empty it since that will always go through netdev_run_todo(), emptying the list. So the only other way to get there would be to add to the list and then unlock the RTNL without going through rtnl_unlock(), which is only possible through __rtnl_unlock(). However, this isn't exported and not used in many places, and none of them seem to be able to unregister before using it. Therefore, add a WARN_ON() in the code to ensure this invariant won't be broken, so that the cfg80211 (or any similar) code stays safe. [1] https://lore.kernel.org/r/Yjzpo3TfZxtKPMAG@google.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20220404113847.0ee02e4a70da.Ic73d206e217db20fd22dcec14fe5442ca732804b@changeid Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-05Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-6/+0
Pull virtio fixes from Michael Tsirkin: "Fixes and cleanups: - A couple of mlx5 fixes related to cvq - A couple of reverts dropping useless code (code that used it got reverted earlier)" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vdpa: mlx5: synchronize driver status with CVQ vdpa: mlx5: prevent cvq work from hogging CPU Revert "virtio_config: introduce a new .enable_cbs method" Revert "virtio: use virtio_device_ready() in virtio_device_restore()"
2022-04-04bpf: Correct the comment for BTF kind bitfieldHaiyue Wang1-2/+2
The commit 8fd886911a6a ("bpf: Add BTF_KIND_FLOAT to uapi") has extended the BTF kind bitfield from 4 to 5 bits, correct the comment. Signed-off-by: Haiyue Wang <haiyue.wang@intel.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220403115327.205964-1-haiyue.wang@intel.com
2022-04-03Merge tag 'trace-v5.18-2' of ↵Linus Torvalds11-68/+29
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull more tracing updates from Steven Rostedt: - Rename the staging files to give them some meaning. Just stage1,stag2,etc, does not show what they are for - Check for NULL from allocation in bootconfig - Hold event mutex for dyn_event call in user events - Mark user events to broken (to work on the API) - Remove eBPF updates from user events - Remove user events from uapi header to keep it from being installed. - Move ftrace_graph_is_dead() into inline as it is called from hot paths and also convert it into a static branch. * tag 'trace-v5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Move user_events.h temporarily out of include/uapi ftrace: Make ftrace_graph_is_dead() a static branch tracing: Set user_events to BROKEN tracing/user_events: Remove eBPF interfaces tracing/user_events: Hold event_mutex during dyn_event_add proc: bootconfig: Add null pointer check tracing: Rename the staging files for trace_events
2022-04-03Merge tag 'core-urgent-2022-04-03' of ↵Linus Torvalds1-3/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RT signal fix from Thomas Gleixner: "Revert the RT related signal changes. They need to be reworked and generalized" * tag 'core-urgent-2022-04-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "signal, x86: Delay calling signals in atomic on RT enabled kernels"
2022-04-03Merge tag 'dma-mapping-5.18-1' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds2-131/+1
Pull more dma-mapping updates from Christoph Hellwig: - fix a regression in dma remap handling vs AMD memory encryption (me) - finally kill off the legacy PCI DMA API (Christophe JAILLET) * tag 'dma-mapping-5.18-1' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: move pgprot_decrypted out of dma_pgprot PCI/doc: cleanup references to the legacy PCI DMA API PCI: Remove the deprecated "pci-dma-compat.h" API
2022-04-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2-23/+48
Pull kvm fixes from Paolo Bonzini: - Only do MSR filtering for MSRs accessed by rdmsr/wrmsr - Documentation improvements - Prevent module exit until all VMs are freed - PMU Virtualization fixes - Fix for kvm_irq_delivery_to_apic_fast() NULL-pointer dereferences - Other miscellaneous bugfixes * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits) KVM: x86: fix sending PV IPI KVM: x86/mmu: do compare-and-exchange of gPTE via the user address KVM: x86: Remove redundant vm_entry_controls_clearbit() call KVM: x86: cleanup enter_rmode() KVM: x86: SVM: fix tsc scaling when the host doesn't support it kvm: x86: SVM: remove unused defines KVM: x86: SVM: move tsc ratio definitions to svm.h KVM: x86: SVM: fix avic spec based definitions again KVM: MIPS: remove reference to trap&emulate virtualization KVM: x86: document limitations of MSR filtering KVM: x86: Only do MSR filtering when access MSR by rdmsr/wrmsr KVM: x86/emulator: Emulate RDPID only if it is enabled in guest KVM: x86/pmu: Fix and isolate TSX-specific performance event logic KVM: x86: mmu: trace kvm_mmu_set_spte after the new SPTE was set KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs KVM: x86: Trace all APICv inhibit changes and capture overall status KVM: x86: Add wrappers for setting/clearing APICv inhibits KVM: x86: Make APICv inhibit reasons an enum and cleanup naming KVM: X86: Handle implicit supervisor access with SMAP KVM: X86: Rename variable smap to not_smap in permission_fault() ...
2022-04-02tracing: mark user_events as BROKENSteven Rostedt (Google)1-0/+0
After being merged, user_events become more visible to a wider audience that have concerns with the current API. It is too late to fix this for this release, but instead of a full revert, just mark it as BROKEN (which prevents it from being selected in make config). Then we can work finding a better API. If that fails, then it will need to be completely reverted. To not have the code silently bitrot, still allow building it with COMPILE_TEST. And to prevent the uapi header from being installed, then later changed, and then have an old distro user space see the old version, move the header file out of the uapi directory. Surround the include with CONFIG_COMPILE_TEST to the current location, but when the BROKEN tag is taken off, it will use the uapi directory, and fail to compile. This is a good way to remind us to move the header back. Link: https://lore.kernel.org/all/20220330155835.5e1f6669@gandalf.local.home Link: https://lkml.kernel.org/r/20220330201755.29319-1-mathieu.desnoyers@efficios.com Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-02tracing: Move user_events.h temporarily out of include/uapiSteven Rostedt (Google)1-0/+0
While user_events API is under development and has been marked for broken to not let the API become fixed, move the header file out of the uapi directory. This is to prevent it from being installed, then later changed, and then have an old distro user space update with a new kernel, where applications see the user_events being available, but the old header is in place, and then they get compiled incorrectly. Also, surround the include with CONFIG_COMPILE_TEST to the current location, but when the BROKEN tag is taken off, it will use the uapi directory, and fail to compile. This is a good way to remind us to move the header back. Link: https://lore.kernel.org/all/20220330155835.5e1f6669@gandalf.local.home Link: https://lkml.kernel.org/r/20220330201755.29319-1-mathieu.desnoyers@efficios.com Link: https://lkml.kernel.org/r/20220401143903.188384f3@gandalf.local.home Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-04-02ftrace: Make ftrace_graph_is_dead() a static branchChristophe Leroy1-1/+15
ftrace_graph_is_dead() is used on hot paths, it just reads a variable in memory and is not worth suffering function call constraints. For instance, at entry of prepare_ftrace_return(), inlining it avoids saving prepare_ftrace_return() parameters to stack and restoring them after calling ftrace_graph_is_dead(). While at it using a static branch is even more performant and is rather well adapted considering that the returned value will almost never change. Inline ftrace_graph_is_dead() and replace 'kill_ftrace_graph' bool by a static branch. The performance improvement is noticeable. Link: https://lkml.kernel.org/r/e0411a6a0ed3eafff0ad2bc9cd4b0e202b4617df.1648623570.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-04-02tracing/user_events: Remove eBPF interfacesBeau Belgrave1-53/+0
Remove eBPF interfaces within user_events to ensure they are fully reviewed. Link: https://lore.kernel.org/all/20220329165718.GA10381@kbox/ Link: https://lkml.kernel.org/r/20220329173051.10087-1-beaub@linux.microsoft.com Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-04-02tracing: Rename the staging files for trace_eventsSteven Rostedt (Google)9-14/+14
When looking for implementation of different phases of the creation of the TRACE_EVENT() macro, it is pretty useless when all helper macro redefinitions are in files labeled "stageX_defines.h". Rename them to state which phase the files are for. For instance, when looking for the defines that are used to create the event fields, seeing "stage4_event_fields.h" gives the developer a good idea that the defines are in that file. Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-04-02KVM: Remove dirty handling from gfn_to_pfn_cache completelyDavid Woodhouse2-10/+5
It isn't OK to cache the dirty status of a page in internal structures for an indefinite period of time. Any time a vCPU exits the run loop to userspace might be its last; the VMM might do its final check of the dirty log, flush the last remaining dirty pages to the destination and complete a live migration. If we have internal 'dirty' state which doesn't get flushed until the vCPU is finally destroyed on the source after migration is complete, then we have lost data because that will escape the final copy. This problem already exists with the use of kvm_vcpu_unmap() to mark pages dirty in e.g. VMX nesting. Note that the actual Linux MM already considers the page to be dirty since we have a writeable mapping of it. This is just about the KVM dirty logging. For the nesting-style use cases (KVM_GUEST_USES_PFN) we will need to track which gfn_to_pfn_caches have been used and explicitly mark the corresponding pages dirty before returning to userspace. But we would have needed external tracking of that anyway, rather than walking the full list of GPCs to find those belonging to this vCPU which are dirty. So let's rely *solely* on that external tracking, and keep it simple rather than laying a tempting trap for callers to fall into. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220303154127.202856-3-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-02KVM: Use enum to track if cached PFN will be used in guest and/or hostSean Christopherson2-10/+16
Replace the guest_uses_pa and kernel_map booleans in the PFN cache code with a unified enum/bitmask. Using explicit names makes it easier to review and audit call sites. Opportunistically add a WARN to prevent passing garbage; instantating a cache without declaring its usage is either buggy or pointless. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220303154127.202856-2-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-02KVM: Don't actually set a request when evicting vCPUs for GFN cache invdSean Christopherson1-7/+31
Don't actually set a request bit in vcpu->requests when making a request purely to force a vCPU to exit the guest. Logging a request but not actually consuming it would cause the vCPU to get stuck in an infinite loop during KVM_RUN because KVM would see the pending request and bail from VM-Enter to service the request. Note, it's currently impossible for KVM to set KVM_REQ_GPC_INVALIDATE as nothing in KVM is wired up to set guest_uses_pa=true. But, it'd be all too easy for arch code to introduce use of kvm_gfn_to_pfn_cache_init() without implementing handling of the request, especially since getting test coverage of MMU notifier interaction with specific KVM features usually requires a directed test. Opportunistically rename gfn_to_pfn_cache_invalidate_start()'s wake_vcpus to evict_vcpus. The purpose of the request is to get vCPUs out of guest mode, it's supposed to _avoid_ waking vCPUs that are blocking. Opportunistically rename KVM_REQ_GPC_INVALIDATE to be more specific as to what it wants to accomplish, and to genericize the name so that it can used for similar but unrelated scenarios, should they arise in the future. Add a comment and documentation to explain why the "no action" request exists. Add compile-time assertions to help detect improper usage. Use the inner assertless helper in the one s390 path that makes requests without a hardcoded request. Cc: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220223165302.3205276-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-02Merge branch 'work.misc' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "Assorted bits and pieces" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: aio: drop needless assignment in aio_read() clean overflow checks in count_mounts() a bit seq_file: fix NULL pointer arithmetic warning uml/x86: use x86 load_unaligned_zeropad() asm/user.h: killed unused macros constify struct path argument of finish_automount()/do_add_mount() fs: Remove FIXME comment in generic_write_checks()
2022-04-02Merge tag 'for-5.18/drivers-2022-04-01' of git://git.kernel.dk/linux-blockLinus Torvalds2-2/+3
Pull block driver fixes from Jens Axboe: "Followup block driver updates and fixes for the 5.18-rc1 merge window. In detail: - NVMe pull request - Fix multipath hang when disk goes live over reconnect (Anton Eidelman) - fix RCU hole that allowed for endless looping in multipath round robin (Chris Leech) - remove redundant assignment after left shift (Colin Ian King) - add quirks for Samsung X5 SSDs (Monish Kumar R) - fix the read-only state for zoned namespaces with unsupposed features (Pankaj Raghav) - use a private workqueue instead of the system workqueue in nvmet (Sagi Grimberg) - allow duplicate NSIDs for private namespaces (Sungup Moon) - expose use_threaded_interrupts read-only in sysfs (Xin Hao)" - nbd minor allocation fix (Zhang) - drbd fixes and maintainer addition (Lars, Jakob, Christoph) - n64cart build fix (Jackie) - loop compat ioctl fix (Carlos) - misc fixes (Colin, Dongli)" * tag 'for-5.18/drivers-2022-04-01' of git://git.kernel.dk/linux-block: drbd: remove check of list iterator against head past the loop body drbd: remove usage of list iterator variable after loop nbd: fix possible overflow on 'first_minor' in nbd_dev_add() MAINTAINERS: add drbd co-maintainer drbd: fix potential silent data corruption loop: fix ioctl calls using compat_loop_info nvme-multipath: fix hang when disk goes live over reconnect nvme: fix RCU hole that allowed for endless looping in multipath round robin nvme: allow duplicate NSIDs for private namespaces nvmet: remove redundant assignment after left shift nvmet: use a private workqueue instead of the system workqueue nvme-pci: add quirks for Samsung X5 SSDs nvme-pci: expose use_threaded_interrupts read-only in sysfs nvme: fix the read-only state for zoned namespaces with unsupposed features n64cart: convert bi_disk to bi_bdev->bd_disk fix build xen/blkfront: fix comment for need_copy xen-blkback: remove redundant assignment to variable i
2022-04-02Merge tag 'for-5.18/block-2022-04-01' of git://git.kernel.dk/linux-blockLinus Torvalds2-2/+5
Pull block fixes from Jens Axboe: "Either fixes or a few additions that got missed in the initial merge window pull. In detail: - List iterator fix to avoid leaking value post loop (Jakob) - One-off fix in minor count (Christophe) - Fix for a regression in how io priority setting works for an exiting task (Jiri) - Fix a regression in this merge window with blkg_free() being called in an inappropriate context (Ming) - Misc fixes (Ming, Tom)" * tag 'for-5.18/block-2022-04-01' of git://git.kernel.dk/linux-block: blk-wbt: remove wbt_track stub block: use dedicated list iterator variable block: Fix the maximum minor value is blk_alloc_ext_minor() block: restore the old set_task_ioprio() behaviour wrt PF_EXITING block: avoid calling blkg_free() in atomic context lib/sbitmap: allocate sb->map via kvzalloc_node
2022-04-02Merge tag 'for-5.18/io_uring-2022-04-01' of git://git.kernel.dk/linux-blockLinus Torvalds1-2/+0
Pull io_uring fixes from Jens Axboe: "A little bit all over the map, some regression fixes for this merge window, and some general fixes that are stable bound. In detail: - Fix an SQPOLL memory ordering issue (Almog) - Accept fixes (Dylan) - Poll fixes (me) - Fixes for provided buffers and recycling (me) - Tweak to IORING_OP_MSG_RING command added in this merge window (me) - Memory leak fix (Pavel) - Misc fixes and tweaks (Pavel, me)" * tag 'for-5.18/io_uring-2022-04-01' of git://git.kernel.dk/linux-block: io_uring: defer msg-ring file validity check until command issue io_uring: fail links if msg-ring doesn't succeeed io_uring: fix memory leak of uid in files registration io_uring: fix put_kbuf without proper locking io_uring: fix invalid flags for io_put_kbuf() io_uring: improve req fields comments io_uring: enable EPOLLEXCLUSIVE for accept poll io_uring: improve task work cache utilization io_uring: fix async accept on O_NONBLOCK sockets io_uring: remove IORING_CQE_F_MSG io_uring: add flag for disabling provided buffer recycling io_uring: ensure recv and recvmsg handle MSG_WAITALL correctly io_uring: don't recycle provided buffer if punted to async worker io_uring: fix assuming triggered poll waitqueue is the single poll io_uring: bump poll refs to full 31-bits io_uring: remove poll entry from list when canceling all io_uring: fix memory ordering when SQPOLL thread goes to sleep io_uring: ensure that fsnotify is always called io_uring: recycle provided before arming poll
2022-04-02Merge tag 'for-5.18/dm-fixes' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - Fix DM integrity shrink crash due to journal entry not being marked unused. - Fix DM bio polling to handle possibility that underlying device(s) return BLK_STS_AGAIN during submission. - Fix dm_io and dm_target_io flags race condition on Alpha. - Add some pr_err debugging to help debug cases when DM ioctl structure is corrupted. * tag 'for-5.18/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm: fix bio polling to handle possibile BLK_STS_AGAIN dm: fix dm_io and dm_target_io flags race condition on Alpha dm integrity: set journal entry unused when shrinking device dm ioctl: log an error if the ioctl structure is corrupted
2022-04-01Merge tag 'folio-5.18d' of git://git.infradead.org/users/willy/pagecacheLinus Torvalds4-33/+21
Pull more filesystem folio updates from Matthew Wilcox: "A mixture of odd changes that didn't quite make it into the original pull and fixes for things that did. Also the readpages changes had to wait for the NFS tree to be pulled first. - Remove ->readpages infrastructure - Remove AOP_FLAG_CONT_EXPAND - Move read_descriptor_t to networking code - Pass the iocb to generic_perform_write - Minor updates to iomap, btrfs, ext4, f2fs, ntfs" * tag 'folio-5.18d' of git://git.infradead.org/users/willy/pagecache: btrfs: Remove a use of PAGE_SIZE in btrfs_invalidate_folio() ntfs: Correct mark_ntfs_record_dirty() folio conversion f2fs: Get the superblock from the mapping instead of the page f2fs: Correct f2fs_dirty_data_folio() conversion ext4: Correct ext4_journalled_dirty_folio() conversion filemap: Remove AOP_FLAG_CONT_EXPAND fs: Pass an iocb to generic_perform_write() fs, net: Move read_descriptor_t to net.h fs: Remove read_actor_t iomap: Simplify is_partially_uptodate a little readahead: Update comments mm: remove the skip_page argument to read_pages mm: remove the pages argument to read_pages fs: Remove ->readpages address space operation readahead: Remove read_cache_pages()
2022-04-01Merge tag 'xarray-5.18' of git://git.infradead.org/users/willy/xarrayLinus Torvalds1-0/+1
Pull XArray updates from Matthew Wilcox: - Documentation update - Fix test-suite build after move of bitmap.h - Fix xas_create_range() when a large entry is already present - Fix xas_split() of a shadow entry * tag 'xarray-5.18' of git://git.infradead.org/users/willy/xarray: XArray: Update the LRU list in xas_split() XArray: Fix xas_create_range() when multi-order entry present XArray: Include bitmap.h from xarray.h XArray: Document the locking requirement for the xa_state
2022-04-01Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-3/+1
Merge still more updates from Andrew Morton: "16 patches. Subsystems affected by this patch series: ofs2, nilfs2, mailmap, and mm (madvise, mlock, mfence, memory-failure, kasan, debug, kmemleak, and damon)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm/damon: prevent activated scheme from sleeping by deactivated schemes mm/kmemleak: reset tag when compare object pointer doc/vm/page_owner.rst: remove content related to -c option tools/vm/page_owner_sort.c: remove -c option mm, kasan: fix __GFP_BITS_SHIFT definition breaking LOCKDEP mm,hwpoison: unmap poisoned page before invalidation mailmap: update Kirill's email mm: kfence: fix objcgs vector allocation mm/munlock: protect the per-CPU pagevec by a local_lock_t mm/munlock: update Documentation/vm/unevictable-lru.rst mm/munlock: add lru_add_drain() to fix memcg_stat_test nilfs2: get rid of nilfs_mapping_init() nilfs2: fix lockdep warnings during disk space reclamation nilfs2: fix lockdep warnings in page operations for btree nodes ocfs2: fix crash when mount with quota enabled Revert "mm: madvise: skip unmapped vma holes passed to process_madvise"
2022-04-01mm, kasan: fix __GFP_BITS_SHIFT definition breaking LOCKDEPAndrey Konovalov1-3/+1
KASAN changes that added new GFP flags mistakenly updated __GFP_BITS_SHIFT as the total number of GFP bits instead of as a shift used to define __GFP_BITS_MASK. This broke LOCKDEP, as __GFP_BITS_MASK now gets the 25th bit enabled instead of the 28th for __GFP_NOLOCKDEP. Update __GFP_BITS_SHIFT to always count KASAN GFP bits. In the future, we could handle all combinations of KASAN and LOCKDEP to occupy as few bits as possible. For now, we have enough GFP bits to be inefficient in this quick fix. Link: https://lkml.kernel.org/r/462ff52742a1fcc95a69778685737f723ee4dfb3.1648400273.git.andreyknvl@google.com Fixes: 9353ffa6e9e9 ("kasan, page_alloc: allow skipping memory init for HW_TAGS") Fixes: 53ae233c30a6 ("kasan, page_alloc: allow skipping unpoisoning for HW_TAGS") Fixes: f49d9c5bb15c ("kasan, mm: only define ___GFP_SKIP_KASAN_POISON with HW_TAGS") Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-01filemap: Remove AOP_FLAG_CONT_EXPANDMatthew Wilcox (Oracle)1-1/+0
This flag is no longer used, so remove it. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Al Viro <viro@zeniv.linux.org.uk>
2022-04-01fs: Pass an iocb to generic_perform_write()Matthew Wilcox (Oracle)1-1/+1
We can extract both the file pointer and the pos from the iocb. This simplifies each caller as well as allowing generic_perform_write() to see more of the iocb contents in the future. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Al Viro <viro@zeniv.linux.org.uk>