summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-06-09tls/sw: Convert tls_sw_sendpage() to use MSG_SPLICE_PAGESDavid Howells1-138/+35
Convert tls_sw_sendpage() and tls_sw_sendpage_locked() to use sendmsg() with MSG_SPLICE_PAGES rather than directly splicing in the pages itself. [!] Note that tls_sw_sendpage_locked() appears to have the wrong locking upstream. I think the caller will only hold the socket lock, but it should hold tls_ctx->tx_lock too. This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> cc: Chuck Lever <chuck.lever@oracle.com> cc: Boris Pismenny <borisp@nvidia.com> cc: John Fastabend <john.fastabend@gmail.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09tls/sw: Support MSG_SPLICE_PAGESDavid Howells1-0/+41
Make TLS's sendmsg() support MSG_SPLICE_PAGES. This causes pages to be spliced from the source iterator if possible. This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. Signed-off-by: David Howells <dhowells@redhat.com> cc: Chuck Lever <chuck.lever@oracle.com> cc: Boris Pismenny <borisp@nvidia.com> cc: John Fastabend <john.fastabend@gmail.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09splice, net: Fix SPLICE_F_MORE signalling in splice_direct_to_actor()David Howells1-8/+10
splice_direct_to_actor() doesn't manage SPLICE_F_MORE correctly[1] - and, as a result, it incorrectly signals/fails to signal MSG_MORE when splicing to a socket. The problem I'm seeing happens when a short splice occurs because we got a short read due to hitting the EOF on a file: as the length read (read_len) is less than the remaining size to be spliced (len), SPLICE_F_MORE (and thus MSG_MORE) is set. The issue is that, for the moment, we have no way to know *why* the short read occurred and so can't make a good decision on whether we *should* keep MSG_MORE set. MSG_SENDPAGE_NOTLAST was added to work around this, but that is also set incorrectly under some circumstances - for example if a short read fills a single pipe_buffer, but the next read would return more (seqfile can do this). This was observed with the multi_chunk_sendfile tests in the tls kselftest program. Some of those tests would hang and time out when the last chunk of file was less than the sendfile request size: build/kselftest/net/tls -r tls.12_aes_gcm.multi_chunk_sendfile This has been observed before[2] and worked around in AF_TLS[3]. Fix this by making splice_direct_to_actor() always signal SPLICE_F_MORE if we haven't yet hit the requested operation size. SPLICE_F_MORE remains signalled if the user passed it in to splice() but otherwise gets cleared when we've read sufficient data to fulfill the request. If, however, we get a premature EOF from ->splice_read(), have sent at least one byte and SPLICE_F_MORE was not set by the caller, ->splice_eof() will be invoked. Signed-off-by: David Howells <dhowells@redhat.com> cc: Linus Torvalds <torvalds@linux-foundation.org> cc: Jens Axboe <axboe@kernel.dk> cc: Christoph Hellwig <hch@lst.de> cc: Al Viro <viro@zeniv.linux.org.uk> cc: Matthew Wilcox <willy@infradead.org> cc: Jan Kara <jack@suse.cz> cc: Jeff Layton <jlayton@kernel.org> cc: David Hildenbrand <david@redhat.com> cc: Christian Brauner <brauner@kernel.org> cc: Chuck Lever <chuck.lever@oracle.com> cc: Boris Pismenny <borisp@nvidia.com> cc: John Fastabend <john.fastabend@gmail.com> cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/499791.1685485603@warthog.procyon.org.uk/ [1] Link: https://lore.kernel.org/r/1591392508-14592-1-git-send-email-pooja.trivedi@stackpath.com/ [2] Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=d452d48b9f8b1a7f8152d33ef52cfd7fe1735b0a [3] Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09kcm: Use splice_eof() to flushDavid Howells1-0/+15
Allow splice to undo the effects of MSG_MORE after prematurely ending a splice/sendfile due to getting an EOF condition (->splice_read() returned 0) after splice had called sendmsg() with MSG_MORE set when the user didn't set MSG_MORE. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/ Signed-off-by: David Howells <dhowells@redhat.com> cc: Tom Herbert <tom@herbertland.com> cc: Tom Herbert <tom@quantonium.net> cc: Cong Wang <cong.wang@bytedance.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09chelsio/chtls: Use splice_eof() to flushDavid Howells3-0/+11
Allow splice to end a Chelsio TLS record after prematurely ending a splice/sendfile due to getting an EOF condition (->splice_read() returned 0) after splice had called sendmsg() with MSG_MORE set when the user didn't set MSG_MORE. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/ Signed-off-by: David Howells <dhowells@redhat.com> cc: Ayush Sawal <ayush.sawal@chelsio.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09ipv4, ipv6: Use splice_eof() to flushDavid Howells10-0/+71
Allow splice to undo the effects of MSG_MORE after prematurely ending a splice/sendfile due to getting an EOF condition (->splice_read() returned 0) after splice had called sendmsg() with MSG_MORE set when the user didn't set MSG_MORE. For UDP, a pending packet will not be emitted if the socket is closed before it is flushed; with this change, it be flushed by ->splice_eof(). For TCP, it's not clear that MSG_MORE is actually effective. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/ Signed-off-by: David Howells <dhowells@redhat.com> cc: Kuniyuki Iwashima <kuniyu@amazon.com> cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> cc: David Ahern <dsahern@kernel.org> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09tls/device: Use splice_eof() to flushDavid Howells3-0/+26
Allow splice to end a TLS record after prematurely ending a splice/sendfile due to getting an EOF condition (->splice_read() returned 0) after splice had called TLS with a sendmsg() with MSG_MORE set when the user didn't set MSG_MORE. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/ Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> cc: Chuck Lever <chuck.lever@oracle.com> cc: Boris Pismenny <borisp@nvidia.com> cc: John Fastabend <john.fastabend@gmail.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09tls/sw: Use splice_eof() to flushDavid Howells3-0/+77
Allow splice to end a TLS record after prematurely ending a splice/sendfile due to getting an EOF condition (->splice_read() returned 0) after splice had called TLS with a sendmsg() with MSG_MORE set when the user didn't set MSG_MORE. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/ Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> cc: Chuck Lever <chuck.lever@oracle.com> cc: Boris Pismenny <borisp@nvidia.com> cc: John Fastabend <john.fastabend@gmail.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09splice, net: Add a splice_eof op to file-ops and socket-opsDavid Howells6-1/+44
Add an optional method, ->splice_eof(), to allow splice to indicate the premature termination of a splice to struct file_operations and struct proto_ops. This is called if sendfile() or splice() encounters all of the following conditions inside splice_direct_to_actor(): (1) the user did not set SPLICE_F_MORE (splice only), and (2) an EOF condition occurred (->splice_read() returned 0), and (3) we haven't read enough to fulfill the request (ie. len > 0 still), and (4) we have already spliced at least one byte. A further patch will modify the behaviour of SPLICE_F_MORE to always be passed to the actor if either the user set it or we haven't yet read sufficient data to fulfill the request. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/ Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> cc: Jens Axboe <axboe@kernel.dk> cc: Christoph Hellwig <hch@lst.de> cc: Al Viro <viro@zeniv.linux.org.uk> cc: Matthew Wilcox <willy@infradead.org> cc: Jan Kara <jack@suse.cz> cc: Jeff Layton <jlayton@kernel.org> cc: David Hildenbrand <david@redhat.com> cc: Christian Brauner <brauner@kernel.org> cc: Chuck Lever <chuck.lever@oracle.com> cc: Boris Pismenny <borisp@nvidia.com> cc: John Fastabend <john.fastabend@gmail.com> cc: linux-mm@kvack.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09splice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage()David Howells4-57/+131
Replace generic_splice_sendpage() + splice_from_pipe + pipe_to_sendpage() with a net-specific handler, splice_to_socket(), that calls sendmsg() with MSG_SPLICE_PAGES set instead of calling ->sendpage(). MSG_MORE is used to indicate if the sendmsg() is expected to be followed with more data. This allows multiple pipe-buffer pages to be passed in a single call in a BVEC iterator, allowing the processing to be pushed down to a loop in the protocol driver. This helps pave the way for passing multipage folios down too. Protocols that haven't been converted to handle MSG_SPLICE_PAGES yet should just ignore it and do a normal sendmsg() for now - although that may be a bit slower as it may copy everything. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09tls: Allow MSG_SPLICE_PAGES but treat it as normal sendmsgDavid Howells2-2/+3
Allow MSG_SPLICE_PAGES to be specified to sendmsg() but treat it as normal sendmsg for now. This means the data will just be copied until MSG_SPLICE_PAGES is handled. Signed-off-by: David Howells <dhowells@redhat.com> cc: Chuck Lever <chuck.lever@oracle.com> cc: Boris Pismenny <borisp@nvidia.com> cc: John Fastabend <john.fastabend@gmail.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09net: Block MSG_SENDPAGE_* from being passed to sendmsg() by userspaceDavid Howells1-1/+3
It is necessary to allow MSG_SENDPAGE_* to be passed into ->sendmsg() to allow sendmsg(MSG_SPLICE_PAGES) to replace ->sendpage(). Unblocking them in the network protocol, however, allows these flags to be passed in by userspace too[1]. Fix this by marking MSG_SENDPAGE_NOPOLICY, MSG_SENDPAGE_NOTLAST and MSG_SENDPAGE_DECRYPTED as internal flags, which causes sendmsg() to object if they are passed to sendmsg() by userspace. Network protocol ->sendmsg() implementations can then allow them through. Note that it should be possible to remove MSG_SENDPAGE_NOTLAST once sendpage is removed as a whole slew of pages will be passed in in one go by splice through sendmsg, with MSG_MORE being set if it has more data waiting in the pipe. Signed-off-by: David Howells <dhowells@redhat.com> cc: Chuck Lever <chuck.lever@oracle.com> cc: Boris Pismenny <borisp@nvidia.com> cc: John Fastabend <john.fastabend@gmail.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Link: https://lore.kernel.org/r/20230526181338.03a99016@kernel.org/ [1] Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09tcp: let tcp_mtu_probe() build headless packetsEric Dumazet1-2/+58
tcp_mtu_probe() is still copying payload from skbs in the write queue, using skb_copy_bits(), ignoring potential errors. Modern TCP stack wants to only deal with payload found in page frags, as this is a prereq for TCPDirect (host stack might not have access to the payload) Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230607214113.1992947-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09Merge tag 'mlx5-updates-2023-06-06' of ↵Jakub Kicinski18-103/+201
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2023-06-06 1) Support 4 ports VF LAG, part 2/2 2) Few extra trivial cleanup patches Shay Drory Says: ================ Support 4 ports VF LAG, part 2/2 This series continues the series[1] "Support 4 ports VF LAG, part1/2". This series adds support for 4 ports VF LAG (single FDB E-Switch). This series of patches refactoring LAG code that make assumptions about VF LAG supporting only two ports and then enable 4 ports VF LAG. Patch 1: - Fix for ib rep code Patches 2-5: - Refactors LAG layer. Patches 6-7: - Block LAG types which doesn't support 4 ports. Patch 8: - Enable 4 ports VF LAG. This series specifically allows HCAs with 4 ports to create a VF LAG with only 4 ports. It is not possible to create a VF LAG with 2 or 3 ports using HCAs that have 4 ports. Currently, the Merged E-Switch feature only supports HCAs with 2 ports. However, upcoming patches will introduce support for HCAs with 4 ports. In order to activate VF LAG a user can execute: devlink dev eswitch set pci/0000:08:00.0 mode switchdev devlink dev eswitch set pci/0000:08:00.1 mode switchdev devlink dev eswitch set pci/0000:08:00.2 mode switchdev devlink dev eswitch set pci/0000:08:00.3 mode switchdev ip link add name bond0 type bond ip link set dev bond0 type bond mode 802.3ad ip link set dev eth2 master bond0 ip link set dev eth3 master bond0 ip link set dev eth4 master bond0 ip link set dev eth5 master bond0 Where eth2, eth3, eth4 and eth5 are net-interfaces of pci/0000:08:00.0 pci/0000:08:00.1 pci/0000:08:00.2 pci/0000:08:00.3 respectively. User can verify LAG state and type via debugfs: /sys/kernel/debug/mlx5/0000\:08\:00.0/lag/state /sys/kernel/debug/mlx5/0000\:08\:00.0/lag/type [1] https://lore.kernel.org/netdev/20230601060118.154015-1-saeed@kernel.org/T/#mf1d2083780970ba277bfe721554d4925f03f36d1 ================ * tag 'mlx5-updates-2023-06-06' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: simplify condition after napi budget handling change mlx5/core: E-Switch, Allocate ECPF vport if it's an eswitch manager net/mlx5: Skip inline mode check after mlx5_eswitch_enable_locked() failure net/mlx5e: TC, refactor access to hash key net/mlx5e: Remove RX page cache leftovers net/mlx5e: Expose catastrophic steering error counters net/mlx5: Enable 4 ports VF LAG net/mlx5: LAG, block multiport eswitch LAG in case ldev have more than 2 ports net/mlx5: LAG, block multipath LAG in case ldev have more than 2 ports net/mlx5: LAG, change mlx5_shared_fdb_supported() to static net/mlx5: LAG, generalize handling of shared FDB net/mlx5: LAG, check if all eswitches are paired for shared FDB {net/RDMA}/mlx5: introduce lag_for_each_peer RDMA/mlx5: Free second uplink ib port ==================== Link: https://lore.kernel.org/r/20230607210410.88209-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09ethtool: ioctl: improve error checking for set_wolJustin Chen1-2/+12
The netlink version of set_wol checks for not supported wolopts and avoids setting wol when the correct wolopt is already set. If we do the same with the ioctl version then we can remove these checks from the driver layer. Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Justin Chen <justin.chen@broadcom.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/1686179653-29750-1-git-send-email-justin.chen@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09Merge branch 'complete-lynx-mdio-device-handling'Jakub Kicinski4-50/+58
Russell King says: ==================== complete Lynx mdio device handling This series completes the mdio device lifetime handling for Lynx PCS users which do not create their own mdio device, but instead fetch it using a firmware description - namely the DPAA2 and FMAN_MEMAC drivers. In a previous patch set, lynx_pcs_create() was modified to increase the mdio device refcount, and lynx_pcs_destroy() to drop that refcount. The first two patches change these two drivers to put the reference which they hold immediately after lynx_pcs_create(), effectively handing the responsibility for maintaining the refcount to the Lynx PCS driver. A side effect of the first two patches is that lynx_get_mdio_device() is no longer used, so patch 3 removes it. Patch 4 adds a new helper - lynx_pcs_create_fwnode(), which creates a Lynx PCS instance from the fwnode. Patch 5 and 6 convert the two drivers to make use of this new helper, which simply has to find the mdio device, and then create the Lynx PCS from that. With those conversions done, lynx_pcs_create() is no longer required outside pcs-lynx.c, so remove it from public view. Patch 8 we changes lynx_pcs_create() to return an error-pointer rather than NULL to bring consistency to the return style, and means that we can remove the NULL-to-error-pointer conversion from both lynx_pcs_create_fwnode() and lynx_pcs_create_mdiodev(). Patch 9 adds a check for the fwnode being available, and returns an -ENODEV error pointer if unavailable. Patch 10 removes this check from DPAA2, detecting the error pointer value to continue printing the helpful message. Patch 11 removes this check from fman_memac, and in doing so fixes a bug where if the node is unavailable, the reference count is not dropped. ==================== Link: https://lore.kernel.org/r/ZIBwuw+IuGQo5yV8@shell.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09net: fman_memac: use pcs-lynx's check for fwnode availabilityRussell King (Oracle)1-1/+1
Use pcs-lynx's check rather than our own when determining if the device is available. This fixes a bug where the reference gained by of_parse_phandle() is not dropped if the device is not available. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09net: dpaa2: use pcs-lynx's check for fwnode availabilityRussell King (Oracle)1-6/+5
Use pcs-lynx's check rather than our own when determining if the device is available. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09net: pcs: lynx: check that the fwnode is available prior to useRussell King (Oracle)1-0/+14
Check that the fwnode is marked as available prior to trying to lookup the PCS device, and return -ENODEV if unavailable. Document the return codes from lynx_pcs_create_fwnode(). Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09net: pcs: lynx: change lynx_pcs_create() to return error-pointersRussell King (Oracle)1-13/+1
Change lynx_pcs_create() to return an error-pointer on failure to allocate memory, rather than returning NULL. This allows the removal of the conversion in lynx_pcs_create_fwnode() and lynx_pcs_create_mdiodev(). Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09net: pcs: lynx: make lynx_pcs_create() staticRussell King (Oracle)2-3/+1
We no longer need to export lynx_pcs_create() for drivers to use as we now have all the functionality we need in the two new creation helpers. Remove the export and prototype, and make it static. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09net: fman_memac: use lynx_pcs_create_fwnode()Russell King (Oracle)1-9/+4
Use lynx_pcs_create_fwnode() to create a lynx PCS from a fwnode handle. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09net: dpaa2-mac: use lynx_pcs_create_fwnode()Russell King (Oracle)1-8/+10
Use lynx_pcs_create_fwnode() to create a lynx PCS from a fwnode handle. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09net: pcs: lynx: add lynx_pcs_create_fwnode()Russell King (Oracle)2-0/+30
Add a helper to create a lynx PCS from a fwnode handle. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09net: pcs: lynx: remove lynx_get_mdio_device()Russell King (Oracle)2-10/+0
lynx_get_mdio_device() is no longer necessary, let's remove it so the lynx PCS code is always managing the lifetime of the mdiodev. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09net: fman_memac: allow lynx PCS to handle mdiodev lifetimeRussell King (Oracle)1-6/+1
Put the mdiodev after lynx_pcs_create() so that the Lynx PCS driver can manage the lifetime of the mdiodev its using. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09net: dpaa2-mac: allow lynx PCS to manage mdiodev lifetimeRussell King (Oracle)1-4/+1
Put the mdiodev after lynx_pcs_create() so that the Lynx PCS driver can manage the lifetime of the mdiodev its using. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09net: pch_gbe: Allow build on MIPS_GENERIC kernelJiaxun Yang2-2/+2
MIPS Boston board, which is using MIPS_GENERIC kernel is using EG20T PCH and thus need this driver. Dependency of PCH_GBE, PTP_1588_CLOCK_PCH is also fixed for MIPS_GENERIC. Note that CONFIG_PCH_GBE is selected in arch/mips/configs/generic/ board-boston.config for a while, some how it's never wired up in Kconfig. Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230607055953.34110-1-jiaxun.yang@flygoat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09mlxsw: spectrum_nve_vxlan: Fix unsupported flag regressionIdo Schimmel1-2/+4
The recently added 'VXLAN_F_LOCALBYPASS' flag is set by default on VXLAN devices and denotes a behavior that is irrelevant for the hardware data path. Add it to the lists of IPv4 and IPv6 supported flags to avoid rejecting offload of VXLAN devices which have this flag set. Fixes: 69474a8a5837 ("net: vxlan: Add nolocalbypass option to vxlan.") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/5533e63643bf719bbe286fef60f749c9cad35005.1686139716.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09Merge branch 'tools-ynl-generate-code-for-the-devlink-family'Jakub Kicinski11-58/+1101
Jakub Kicinski says: ==================== tools: ynl: generate code for the devlink family Another chunk of changes to support more capabilities in the YNL code gen. Devlink brings in deep nesting and directional messages (requests and responses have different IDs). We need a healthy dose of codegen changes to support those (I wasn't planning to support code gen for "directional" families initially, but the importance of devlink and ethtool is undeniable). ==================== Link: https://lore.kernel.org/r/20230607202403.1089925-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09tools: ynl: add sample for devlinkJakub Kicinski2-0/+61
Add a sample to show off how to issue basic devlink requests. For added testing issue get requests while walking a dump. $ ./devlink netdevsim/netdevsim1: driver: netdevsim running fw: fw.mgmt: 10.20.30 ... netdevsim/netdevsim2: driver: netdevsim running fw: fw.mgmt: 10.20.30 ... Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09tools: ynl: generate code for the devlink familyJakub Kicinski3-1/+932
Admittedly the devlink.yaml spec is fairly limitted, it only covers basic device get and info-get ops. That's sufficient to be useful (monitoring FW versions in the fleet). Plus it gives us a chance to exercise deep nesting and directional messaging in YNL. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09tools: ynl-gen: don't generate forward declarations for policies - regenJakub Kicinski3-8/+0
Renegerate code after dropping forward declarations for policies. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09tools: ynl-gen: don't generate forward declarations for policiesJakub Kicinski1-9/+3
Now that all nested types have structs and are sorted topologically there should be no need to generate forward declarations for policies. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09tools: ynl-gen: walk nested types in depthJakub Kicinski1-12/+29
So far we had only created structures for nested types nested directly in messages (second level of attrs so to speak). Walk types in depth to support deeper nesting. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09tools: ynl-gen: inherit struct use infoJakub Kicinski1-0/+8
We only render parse and netlink generation helpers as needed, to avoid generating dead code. Propagate the information from first- and second-layer attribute sets onto all children. Otherwise devlink won't work, it has a lot more levels of nesting. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09tools: ynl-gen: try to sort the types more intelligentlyJakub Kicinski1-2/+24
We need to sort the structures to avoid the need for forward declarations. While at it remove the sort of structs when rendering, it doesn't do anything. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09tools: ynl-gen: enable code gen for directional specsJakub Kicinski2-6/+11
I think that user space code gen for directional specs works after recent changes. Let them through. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09tools: ynl-gen: refactor strmap helper generationJakub Kicinski1-19/+17
Move generating strmap lookup function to a helper. No functional changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09tools: ynl-gen: use enum names in op strmap more carefullyJakub Kicinski3-2/+9
In preparation for supporting families which use different msg ids to and from the kernel - make sure the ids in op strmap are correct. The map is expected to be used mostly for notifications, don't generate a separate map for the "to kernel" direction. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-09netlink: specs: devlink: fill in some details important for CJakub Kicinski1-0/+8
Python YNL is much more forgiving than the C code gen in terms of the spec completeness. Fill in a handful of devlink details to make the spec usable in C. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski319-1351/+2815
Cross-merge networking fixes after downstream PR. Conflicts: net/sched/sch_taprio.c d636fc5dd692 ("net: sched: add rcu annotations around qdisc->qdisc_sleeping") dced11ef84fb ("net/sched: taprio: don't overwrite "sch" variable in taprio_dump_class_stats()") net/ipv4/sysctl_net_ipv4.c e209fee4118f ("net/ipv4: ping_group_range: allow GID from 2147483648 to 4294967294") ccce324dabfe ("tcp: make the first N SYN RTO backoffs linear") https://lore.kernel.org/all/20230605100816.08d41a7b@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-08Merge tag 'net-6.4-rc6' of ↵Linus Torvalds105-386/+788
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from can, wifi, netfilter, bluetooth and ebpf. Current release - regressions: - bpf: sockmap: avoid potential NULL dereference in sk_psock_verdict_data_ready() - wifi: iwlwifi: fix -Warray-bounds bug in iwl_mvm_wait_d3_notif() - phylink: actually fix ksettings_set() ethtool call - eth: dwmac-qcom-ethqos: fix a regression on EMAC < 3 Current release - new code bugs: - wifi: mt76: fix possible NULL pointer dereference in mt7996_mac_write_txwi() Previous releases - regressions: - netfilter: fix NULL pointer dereference in nf_confirm_cthelper - wifi: rtw88/rtw89: correct PS calculation for SUPPORTS_DYNAMIC_PS - openvswitch: fix upcall counter access before allocation - bluetooth: - fix use-after-free in hci_remove_ltk/hci_remove_irk - fix l2cap_disconnect_req deadlock - nic: bnxt_en: prevent kernel panic when receiving unexpected PHC_UPDATE event Previous releases - always broken: - core: annotate rfs lockless accesses - sched: fq_pie: ensure reasonable TCA_FQ_PIE_QUANTUM values - netfilter: add null check for nla_nest_start_noflag() in nft_dump_basechain_hook() - bpf: fix UAF in task local storage - ipv4: ping_group_range: allow GID from 2147483648 to 4294967294 - ipv6: rpl: fix route of death. - tcp: gso: really support BIG TCP - mptcp: fixes for user-space PM address advertisement - smc: avoid to access invalid RMBs' MRs in SMCRv1 ADD LINK CONT - can: avoid possible use-after-free when j1939_can_rx_register fails - batman-adv: fix UaF while rescheduling delayed work - eth: qede: fix scheduling while atomic - eth: ice: make writes to /dev/gnssX synchronous" * tag 'net-6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits) bnxt_en: Implement .set_port / .unset_port UDP tunnel callbacks bnxt_en: Prevent kernel panic when receiving unexpected PHC_UPDATE event bnxt_en: Skip firmware fatal error recovery if chip is not accessible bnxt_en: Query default VLAN before VNIC setup on a VF bnxt_en: Don't issue AP reset during ethtool's reset operation bnxt_en: Fix bnxt_hwrm_update_rss_hash_cfg() net: bcmgenet: Fix EEE implementation eth: ixgbe: fix the wake condition eth: bnxt: fix the wake condition lib: cpu_rmap: Fix potential use-after-free in irq_cpu_rmap_release() bpf: Add extra path pointer check to d_path helper net: sched: fix possible refcount leak in tc_chain_tmplt_add() net: sched: act_police: fix sparse errors in tcf_police_dump() net: openvswitch: fix upcall counter access before allocation net: sched: move rtm_tca_policy declaration to include file ice: make writes to /dev/gnssX synchronous net: sched: add rcu annotations around qdisc->qdisc_sleeping rfs: annotate lockless accesses to RFS sock flow table rfs: annotate lockless accesses to sk->sk_rxhash virtio_net: use control_buf for coalesce params ...
2023-06-08Merge tag 'xfs-6.4-rc5-fixes' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds24-229/+427
Pull xfs fixes from Dave Chinner: "These are a set of regression fixes discovered on recent kernels. I was hoping to send this to you a week and half ago, but events out of my control delayed finalising the changes until early this week. Whilst the diffstat looks large for this stage of the merge window, a large chunk of it comes from moving the guts of one function from one file to another i.e. it's the same code, it is just run in a different context where it is safe to hold a specific lock. Otherwise the individual changes are relatively small and straigtht forward. Summary: - Propagate unlinked inode list corruption back up to log recovery (regression fix) - improve corruption detection for AGFL entries, AGFL indexes and XEFI extents (syzkaller fuzzer oops report) - Avoid double perag reference release (regression fix) - Improve extent merging detection in scrub (regression fix) - Fix a new undefined high bit shift (regression fix) - Fix for AGF vs inode cluster buffer deadlock (regression fix)" * tag 'xfs-6.4-rc5-fixes' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: collect errors from inodegc for unlinked inode recovery xfs: validate block number being freed before adding to xefi xfs: validity check agbnos on the AGFL xfs: fix agf/agfl verification on v4 filesystems xfs: fix double xfs_perag_rele() in xfs_filestream_pick_ag() xfs: fix broken logic when detecting mergeable bmap records xfs: Fix undefined behavior of shift into sign bit xfs: fix AGF vs inode cluster buffer deadlock xfs: defered work could create precommits xfs: restore allocation trylock iteration xfs: buffer pins need to hold a buffer reference
2023-06-08Merge branch 'crypto-splice-net-make-af_alg-handle-sendmsg-msg_splice_pages'Paolo Abeni11-445/+451
David Howells says: ==================== crypto, splice, net: Make AF_ALG handle sendmsg(MSG_SPLICE_PAGES) Here are patches to make AF_ALG handle the MSG_SPLICE_PAGES internal sendmsg flag. MSG_SPLICE_PAGES is an internal hint that tells the protocol that it should splice the pages supplied if it can. The sendpage functions are then turned into wrappers around that. This set consists of the following parts: (1) Move netfs_extract_iter_to_sg() to somewhere more general and rename it to drop the "netfs" prefix. We use this to extract directly from an iterator into a scatterlist. (2) Make AF_ALG use iov_iter_extract_pages(). This has the additional effect of pinning pages obtained from userspace rather than taking refs on them. Pages from kernel-backed iterators would not be pinned, but AF_ALG isn't really meant for use by kernel services. (3) Change AF_ALG still further to use extract_iter_to_sg(). (4) Make af_alg_sendmsg() support MSG_SPLICE_PAGES support and make af_alg_sendpage() just a wrapper around sendmsg(). This has to take refs on the pages pinned for the moment. (5) Make hash_sendmsg() support MSG_SPLICE_PAGES by simply ignoring it. hash_sendpage() is left untouched to be removed later, after the splice core has been changed to call sendmsg(). ==================== Link: https://lore.kernel.org/r/20230606130856.1970660-1-dhowells@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-08crypto: af_alg/hash: Support MSG_SPLICE_PAGESDavid Howells2-41/+70
Make AF_ALG sendmsg() support MSG_SPLICE_PAGES in the hashing code. This causes pages to be spliced from the source iterator if possible. This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. Signed-off-by: David Howells <dhowells@redhat.com> cc: Herbert Xu <herbert@gondor.apana.org.au> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> cc: linux-crypto@vger.kernel.org cc: netdev@vger.kernel.org Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-08crypto: af_alg: Convert af_alg_sendpage() to use MSG_SPLICE_PAGESDavid Howells1-44/+8
Convert af_alg_sendpage() to use sendmsg() with MSG_SPLICE_PAGES rather than directly splicing in the pages itself. This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. Signed-off-by: David Howells <dhowells@redhat.com> cc: Herbert Xu <herbert@gondor.apana.org.au> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> cc: linux-crypto@vger.kernel.org cc: netdev@vger.kernel.org Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-08crypto: af_alg: Support MSG_SPLICE_PAGESDavid Howells3-17/+37
Make AF_ALG sendmsg() support MSG_SPLICE_PAGES. This causes pages to be spliced from the source iterator. This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. Signed-off-by: David Howells <dhowells@redhat.com> cc: Herbert Xu <herbert@gondor.apana.org.au> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> cc: linux-crypto@vger.kernel.org cc: netdev@vger.kernel.org Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-08crypto: af_alg: Indent the loop in af_alg_sendmsg()David Howells1-24/+27
Put the loop in af_alg_sendmsg() into an if-statement to indent it to make the next patch easier to review as that will add another branch to handle MSG_SPLICE_PAGES to the if-statement. Signed-off-by: David Howells <dhowells@redhat.com> cc: Herbert Xu <herbert@gondor.apana.org.au> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> cc: linux-crypto@vger.kernel.org cc: netdev@vger.kernel.org Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-08crypto: af_alg: Use extract_iter_to_sg() to create scatterlistsDavid Howells5-59/+40
Use extract_iter_to_sg() to decant the destination iterator into a scatterlist in af_alg_get_rsgl(). af_alg_make_sg() can then be removed. Signed-off-by: David Howells <dhowells@redhat.com> cc: Herbert Xu <herbert@gondor.apana.org.au> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> cc: linux-crypto@vger.kernel.org cc: netdev@vger.kernel.org Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Paolo Abeni <pabeni@redhat.com>