summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-10-16geneve: add dsfield helper functionBeniamino Galvani1-11/+18
Add a helper function to compute the tos/dsfield. In this way, we can factor out some duplicate code. Also, the helper will be called from more places in the next commit. Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Beniamino Galvani <b.galvani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-16ipv4: use tunnel flow flags for tunnel route lookupsBeniamino Galvani1-0/+1
Commit 451ef36bd229 ("ip_tunnels: Add new flow flags field to ip_tunnel_key") added a new field to struct ip_tunnel_key to control route lookups. Currently the flag is used by vxlan and geneve tunnels; use it also in udp_tunnel_dst_lookup() so that it affects all tunnel types relying on this function. Signed-off-by: Beniamino Galvani <b.galvani@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-16ipv4: add new arguments to udp_tunnel_dst_lookup()Beniamino Galvani3-20/+25
We want to make the function more generic so that it can be used by other UDP tunnel implementations such as geneve and vxlan. To do that, add the following arguments: - source and destination UDP port; - ifindex of the output interface, needed by vxlan; - the tos, because in some cases it is not taken from struct ip_tunnel_info (for example, when it's inherited from the inner packet); - the dst cache, because not all tunnel types (e.g. vxlan) want to use the one from struct ip_tunnel_info. With these parameters, the function no longer needs the full struct ip_tunnel_info as argument and we can pass only the relevant part of it (struct ip_tunnel_key). Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Beniamino Galvani <b.galvani@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-16ipv4: remove "proto" argument from udp_tunnel_dst_lookup()Beniamino Galvani3-5/+5
The function is now UDP-specific, the protocol is always IPPROTO_UDP. Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Beniamino Galvani <b.galvani@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-16ipv4: rename and move ip_route_output_tunnel()Beniamino Galvani5-58/+58
At the moment ip_route_output_tunnel() is used only by bareudp. Ideally, other UDP tunnel implementations should use it, but to do so the function needs to accept new parameters that are specific for UDP tunnels, such as the ports. Prepare for these changes by renaming the function to udp_tunnel_dst_lookup() and move it to file net/ipv4/udp_tunnel_core.c. Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Beniamino Galvani <b.galvani@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-16selftests: net: remove unused variableszhujun23-5/+3
These variables are never referenced in the code, just remove them Signed-off-by: zhujun2 <zhujun2@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-16net: cxgb3: simplify logic for rspq_check_napiChristian Marangi1-6/+1
Simplify logic for rspq_check_napi. Drop redundant and wrong napi_is_scheduled call as it's not race free and directly use the output of napi_schedule to understand if a napi is pending or not. rspq_check_napi main logic is to check if is_new_response is true and check if a napi is not scheduled. The result of this function is then used to detect if we are missing some interrupt and act on top of this... With this knowing, we can rework and simplify the logic and make it less problematic with testing an internal bit for napi. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15Merge branch 'ptp-multiple-readers'David S. Miller9-60/+261
Xabier Marquiegui says: ==================== ptp: Support for multiple filtered timestamp event queue readers On systems with multiple timestamp event channels, there can be scenarios where multiple userspace readers want to access the timestamping data for various purposes. One such example is wanting to use a pps out for time synchronization, and wanting to timestamp external events with the synchronized time base simultaneously. Timestmp event consumers on the other hand, are often interested in a subset of the available timestamp channels. linuxptp ts2phc, for example, is not happy if more than one timestamping channel is active on the device it is reading from. Linked lists are introduced to support multiple timestamp event queue consumers, and timestamp event channel filters through IOCTLs, as well as a debugfs interface to do some simple verifications. Xabier Marquiegui (6): posix-clock: introduce posix_clock_context concept ptp: Replace timestamp event queue with linked list ptp: support multiple timestamp event readers ptp: support event queue reader channel masks ptp: add debugfs interface to see applied channel masks ptp: add testptp mask test drivers/ptp/ptp_chardev.c | 129 ++++++++++++++++---- drivers/ptp/ptp_clock.c | 45 ++++++- drivers/ptp/ptp_private.h | 28 +++-- drivers/ptp/ptp_sysfs.c | 13 +- include/linux/posix-clock.h | 35 ++++-- include/uapi/linux/ptp_clock.h | 2 + kernel/time/posix-clock.c | 36 ++++-- tools/testing/selftests/ptp/ptpchmaskfmt.sh | 14 +++ tools/testing/selftests/ptp/testptp.c | 19 ++- 9 files changed, 261 insertions(+), 60 deletions(-) create mode 100644 tools/testing/selftests/ptp/ptpchmaskfmt.sh --- v6: - correct commit message - correct coding style v5: https://lore.kernel.org/netdev/cover.1696804243.git.reibax@gmail.com/ - fix spelling on commit message - fix memory leak on ptp_open v4: https://lore.kernel.org/netdev/cover.1696511486.git.reibax@gmail.com/ - split modifications in different patches for improved organization - rename posix_clock_user to posix_clock_context - remove unnecessary flush_users clock operation - remove unnecessary tests - simpler queue clean procedure - fix/clean comment lines - simplified release procedures - filter modifications exclusive to currently open instance for simplicity and security - expand mask to 2048 channels - make more secure and simple: mask is only applied to the testptp instance. Use debugfs to verify effects. v3: https://lore.kernel.org/netdev/20230928133544.3642650-1-reibax@gmail.com/ - add this patchset overview file - fix use of safe and non safe linked lists for loops - introduce new posix_clock private_data and ida object ids for better dicrimination of timestamp consumers - safer resource release procedures - filter application by object id, aided by process id - friendlier testptp implementation of event queue channel filters v2: https://lore.kernel.org/netdev/20230912220217.2008895-1-reibax@gmail.com/ - fix ptp_poll() return value - Style changes to comform to checkpatch strict suggestions - more coherent ptp_read error exit routines - fix testptp compilation error: unknown type name 'pid_t' - rename mask variable for easier code traceability - more detailed commit message with two examples v1: https://lore.kernel.org/netdev/20230906104754.1324412-2-reibax@gmail.com/ ==================== Signed-off-by: Xabier Marquiegui <reibax@gmail.com> Suggested-by: Richard Cochran <richardcochran@gmail.com> Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15ptp: add testptp mask testXabier Marquiegui1-1/+18
Add option to test timestamp event queue mask manipulation in testptp. Option -F allows the user to specify a single channel that will be applied on the mask filter via IOCTL. The test program will maintain the file open until user input is received. This allows checking the effect of the IOCTL in debugfs. eg: Console 1: ``` Channel 12 exclusively enabled. Check on debugfs. Press any key to continue ``` Console 2: ``` 0x00000000 0x00000001 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 ``` Signed-off-by: Xabier Marquiegui <reibax@gmail.com> Suggested-by: Richard Cochran <richardcochran@gmail.com> Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15ptp: add debugfs interface to see applied channel masksXabier Marquiegui4-0/+39
Use debugfs to be able to view channel mask applied to every timestamp event queue. Every time the device is opened, a new entry is created in `$DEBUGFS_MOUNTPOINT/ptpN/$INSTANCE_ADDRESS/mask`. The mask value can be viewed grouped in 32bit decimal values using cat, or converted to hexadecimal with the included `ptpchmaskfmt.sh` script. 32 bit values are listed from least significant to most significant. Signed-off-by: Xabier Marquiegui <reibax@gmail.com> Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15ptp: support event queue reader channel masksXabier Marquiegui4-2/+41
On systems with multiple timestamp event channels, some readers might want to receive only a subset of those channels. Add the necessary modifications to support timestamp event channel filtering, including two IOCTL operations: - Clear all channels - Enable one channel The mask modification operations will be applied exclusively on the event queue assigned to the file descriptor used on the IOCTL operation, so the typical procedure to have a reader receiving only a subset of the enabled channels would be: - Open device file - ioctl: clear all channels - ioctl: enable one channel - start reading Calling the enable one channel ioctl more than once will result in multiple enabled channels. Signed-off-by: Xabier Marquiegui <reibax@gmail.com> Suggested-by: Richard Cochran <richardcochran@gmail.com> Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15ptp: support multiple timestamp event readersXabier Marquiegui4-29/+55
Use linked lists to create one event queue per open file. This enables simultaneous readers for timestamp event queues. Signed-off-by: Xabier Marquiegui <reibax@gmail.com> Suggested-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15ptp: Replace timestamp event queue with linked listXabier Marquiegui4-6/+42
Introduce linked lists to access the timestamp event queue. Signed-off-by: Xabier Marquiegui <reibax@gmail.com> Suggested-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15posix-clock: introduce posix_clock_context conceptXabier Marquiegui4-32/+76
Add the necessary structure to support custom private-data per posix-clock user. The previous implementation of posix-clock assumed all file open instances need access to the same clock structure on private_data. The need for individual data structures per file open instance has been identified when developing support for multiple timestamp event queue users for ptp_clock. Signed-off-by: Xabier Marquiegui <reibax@gmail.com> Suggested-by: Richard Cochran <richardcochran@gmail.com> Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15Merge branch 'dpll-phase-offset-phase-adjust'David S. Miller9-18/+517
Arkadiusz Kubalewski says: ==================== dpll: add phase-offset and phase-adjust Improve monitoring and control over dpll devices. Allow user to receive measurement of phase difference between signals on pin and dpll (phase-offset). Allow user to receive and control adjustable value of pin's signal phase (phase-adjust). v4->v5: - rebase series on top of net-next/main, fix conflict - remove redundant attribute type definition in subset definition v3->v4: - do not increase do version of uAPI header as it is not needed (v3 did not have this change) - fix spelling around commit messages, argument descriptions and docs - add missing extack errors on failure set callbacks for pin phase adjust and frequency - remove ice check if value is already set, now redundant as checked in the dpll subsystem v2->v3: - do not increase do version of uAPI header as it is not needed v1->v2: - improve handling for error case of requesting the phase adjust set - align handling for error case of frequency set request with the approach introduced for phase adjust ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15dpll: netlink/core: change pin frequency set behaviorArkadiusz Kubalewski1-8/+42
Align the approach of pin frequency set behavior with the approach introduced with pin phase adjust set. Fail the request if any of devices did not registered the callback ops. If callback op on any pin's registered device fails, return error and rollback the value to previous one. Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15ice: dpll: implement phase related callbacksArkadiusz Kubalewski2-4/+226
Implement new callback ops related to measurement and adjustment of signal phase for pin-dpll in ice driver. Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15dpll: netlink/core: add support for pin-dpll signal phase offset/adjustArkadiusz Kubalewski2-1/+155
Add callback ops for pin-dpll phase measurement. Add callback for pin signal phase adjustment. Add min and max phase adjustment values to pin proprties. Invoke callbacks in dpll_netlink.c when filling the pin details to provide user with phase related attribute values. Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15dpll: spec: add support for pin-dpll signal phase offset/adjustArkadiusz Kubalewski4-4/+42
Add attributes for providing the user with: - measurement of signals phase offset between pin and dpll - ability to adjust the phase of pin signal Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15dpll: docs: add support for pin signal phase offset/adjustArkadiusz Kubalewski1-1/+52
Add documentation on: - measurement of phase of signal between pin and dpll - adjustment of pin signal phase Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15Merge branch 'i40e-devlink'David S. Miller10-76/+414
Ivan Vecera says: ==================== i40e: Add basic devlink support The series adds initial support for devlink to i40e driver. Patch-set overview: Patch 1: Adds initial devlink support (devlink and port registration) Patch 2: Refactors and split i40e_nvm_version_str() Patch 3: Adds support for 'devlink dev info' Patch 4: Refactors existing helper function to read PBA ID Patch 5: Adds 'board.id' to 'devlink dev info' using PBA ID ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15i40e: Add PBA as board id info to devlink .info_getIvan Vecera1-0/+16
Expose stored PBA ID string as unique board identifier via devlink's .info_get command. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15i40e: Refactor and rename i40e_read_pba_string()Ivan Vecera4-26/+39
Function i40e_read_pba_string() is currently unused but will be used by subsequent patch to provide board ID via devlink device info. The function reads PBA block from NVM so it cannot be called during adapter reset and as we would like to provide PBA ID via devlink info it is better to read the PBA ID during i40e_probe() and cache it in i40e_hw structure to avoid a waiting for potential adapter reset in devlink info callback. So... - Remove pba_num and pba_num_size arguments from the function, allocate resource managed buffer to store PBA ID string and save resulting pointer to i40e_hw->pba_id field - Make the function void as the PBA ID can be missing and in this case (or in case of NVM reading failure) the i40e_hw->pba_id will be NULL - Rename the function to i40e_get_pba_string() to align with other functions like i40e_get_oem_version() i40e_get_port_mac_addr()... - Call this function on init during i40e_probe() Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15i40e: Add handler for devlink .info_getIvan Vecera1-0/+90
Provide devlink .info_get callback to allow the driver to report detailed version information. The following info is reported: "serial_number" -> The PCI DSN of the adapter "fw.mgmt" -> The version of the firmware "fw.mgmt.api" -> The API version of interface exposed over the AdminQ "fw.psid" -> The version of the NVM image "fw.bundle_id" -> Unique identifier for the combined flash image "fw.undi" -> The combo image version With this, 'devlink dev info' provides at least the same amount information as is reported by ETHTOOL_GDRVINFO: $ ethtool -i enp2s0f0 | egrep '(driver|firmware)' driver: i40e firmware-version: 9.30 0x8000e5f3 1.3429.0 $ devlink dev info pci/0000:02:00.0 pci/0000:02:00.0: driver i40e serial_number c0-de-b7-ff-ff-ef-ec-3c versions: running: fw.mgmt 9.130.73618 fw.mgmt.api 1.15 fw.psid 9.30 fw.bundle_id 0x8000e5f3 fw.undi 1.3429.0 Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15i40e: Split and refactor i40e_nvm_version_str()Ivan Vecera3-44/+105
The function formats NVM version string according adapter's EETrackID value. If this value OEM specific (0xffffffff) then the reported version is with format: "<gen>.<snap>.<release>" and in other case "<nvm_maj>.<nvm_min> <eetrackid> <cvid_maj>.<cvid_bld>.<cvid_min>" These versions are reported in the subsequent patch in this series that implements devlink .info_get but separately. So split the function into separate ones, refactor it to use them and remove ugly static string buffer. Additionally convert NVM/OEM version mask macros to use GENMASK and use FIELD_GET/FIELD_PREP for them in i40e_nvm_version_str() and i40e_get_oem_version(). This makes code more readable and allows us to remove related shift macros. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15i40e: Add initial devlink supportIvan Vecera6-6/+164
Add an initial support for devlink interface to i40e driver. Similarly to ice driver the implementation doe not enable devlink to manage device-wide configuration and devlink instance is created for each physical function of PCIe device. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15tg3: Improve PTP TX timestamping logicPavan Chebbi2-16/+68
When we are trying to timestamp a TX packet, there may be occasions when the TX timestamp register is still not updated with the latest timestamp even if the timestamp packet descriptor is marked as complete. This usually happens in cases where the system is under stress or flow control is affecting the transmit side. We will solve this problem by saving the snapshot of the timestamp register when we are posting the TX descriptor. At this time, the register contains previously timestamped packet's value and valid timestamp of the current packet must be different than this. Upon completion of the current descriptor, we will check if the timestamp register is updated or not before timestamping the skb. If not updated, we will schedule the ptp worker to fetch the updated time later and timestamp the skb. Also now we restrict number of outstanding PTP TX packet requests to 1. Reported-by: Simon White <Simon.White@viavisolutions.com> Link: https://lore.kernel.org/netdev/CACKFLikGdN9XPtWk-fdrzxdcD=+bv-GHBvfVfSpJzHY7hrW39g@mail.gmail.com/ Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15docs: try to encourage (netdev?) reviewersJakub Kicinski2-0/+33
Add a section to netdev maintainer doc encouraging reviewers to chime in on the mailing list. The questions about "when is it okay to share feedback" keep coming up (most recently at netconf) and the answer is "pretty much always". Extend the section of 7.AdvancedTopics.rst which deals with reviews a little bit to add stuff we had been recommending locally. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15Merge branch 'sfc-conntrack-offload'David S. Miller4-3/+101
Edward Cree says: ==================== sfc: support conntrack NAT offload The EF100 MAE supports performing NAT (and NPT) on packets which match in the conntrack table. This series adds that capability to the driver. ==================== Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15sfc: support offloading ct(nat) action in RHS rulesEdward Cree3-1/+12
If an IP address and/or L4 port for NAPT is available from a CT match, the MAE will perform the edits; if no CT lookup has been performed for this packet, the CT lookup did not return a match, or the matched CT entry did not include NAPT, the action will have no effect. Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15sfc: parse mangle actions (NAT) in conntrack entriesEdward Cree1-2/+89
The MAE can edit either address, L4 port, or both, for either source or destination. These can't be mixed; i.e. it can edit source addr and source port, but not (say) source addr and dest port. Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15Merge branch 'vsock-virtio-vhost-zerocopy'David S. Miller19-17/+1170
Arseniy Krasnov says: ==================== vsock/virtio/vhost: MSG_ZEROCOPY preparations this patchset is first of three parts of another big patchset for MSG_ZEROCOPY flag support: https://lore.kernel.org/netdev/20230701063947.3422088-1-AVKrasnov@sberdevices.ru/ During review of this series, Stefano Garzarella <sgarzare@redhat.com> suggested to split it for three parts to simplify review and merging: 1) virtio and vhost updates (for fragged skbs) <--- this patchset 2) AF_VSOCK updates (allows to enable MSG_ZEROCOPY mode and read tx completions) and update for Documentation/. 3) Updates for tests and utils. This series enables handling of fragged skbs in virtio and vhost parts. Newly logic won't be triggered, because SO_ZEROCOPY options is still impossible to enable at this moment (next bunch of patches from big set above will enable it). I've included changelog to some patches anyway, because there were some comments during review of last big patchset from the link above. Head for this patchset is: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=f2fa1c812c91e99d0317d1fc7d845e1e05f39716 Link to v1: https://lore.kernel.org/netdev/20230717210051.856388-1-AVKrasnov@sberdevices.ru/ Link to v2: https://lore.kernel.org/netdev/20230718180237.3248179-1-AVKrasnov@sberdevices.ru/ Link to v3: https://lore.kernel.org/netdev/20230720214245.457298-1-AVKrasnov@sberdevices.ru/ Link to v4: https://lore.kernel.org/netdev/20230727222627.1895355-1-AVKrasnov@sberdevices.ru/ Link to v5: https://lore.kernel.org/netdev/20230730085905.3420811-1-AVKrasnov@sberdevices.ru/ Link to v6: https://lore.kernel.org/netdev/20230814212720.3679058-1-AVKrasnov@sberdevices.ru/ Link to v7: https://lore.kernel.org/netdev/20230827085436.941183-1-avkrasnov@salutedevices.com/ Link to v8: https://lore.kernel.org/netdev/20230911202234.1932024-1-avkrasnov@salutedevices.com/ Changelog: v3 -> v4: * Patchset rebased and tested on new HEAD of net-next (see hash above). v4 -> v5: * See per-patch changelog after ---. v5 -> v6: * Patchset rebased and tested on new HEAD of net-next (see hash above). * See per-patch changelog after ---. v6 -> v7: * Patchset rebased and tested on new HEAD of net-next (see hash above). * See per-patch changelog after ---. v7 -> v8: * Patchset rebased and tested on new HEAD of net-next (see hash above). * See per-patch changelog after ---. v8 -> v9: * Patchset rebased and tested on new HEAD of net-next (see hash above). * See per-patch changelog after ---. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15test/vsock: io_uring rx/tx testsArseniy Krasnov3-2/+348
This adds set of tests which use io_uring for rx/tx. This test suite is implemented as separated util like 'vsock_test' and has the same set of input arguments as 'vsock_test'. These tests only cover cases of data transmission (no connect/bind/accept etc). Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15test/vsock: MSG_ZEROCOPY support for vsock_perfArseniy Krasnov2-10/+72
To use this option pass '--zerocopy' parameter: ./vsock_perf --zerocopy --sender <cid> ... With this option MSG_ZEROCOPY flag will be passed to the 'send()' call. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15test/vsock: MSG_ZEROCOPY flag testsArseniy Krasnov8-1/+633
This adds three tests for MSG_ZEROCOPY feature: 1) SOCK_STREAM tx with different buffers. 2) SOCK_SEQPACKET tx with different buffers. 3) SOCK_STREAM test to read empty error queue of the socket. Patch also works as preparation for the next patches for tools in this patchset: vsock_perf and vsock_uring_test: 1) Adds several new functions to util.c - they will be also used by vsock_uring_test. 2) Adds two new functions for MSG_ZEROCOPY handling to a new source file - such source will be shared between vsock_test, vsock_perf and vsock_uring_test, thus avoiding code copy-pasting. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15docs: net: description of MSG_ZEROCOPY for AF_VSOCKArseniy Krasnov1-2/+11
This adds description of MSG_ZEROCOPY flag support for AF_VSOCK type of socket. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15vsock: enable setting SO_ZEROCOPYArseniy Krasnov1-2/+43
For AF_VSOCK, zerocopy tx mode depends on transport, so this option must be set in AF_VSOCK implementation where transport is accessible (if transport is not set during setting SO_ZEROCOPY: for example socket is not connected, then SO_ZEROCOPY will be enabled, but once transport will be assigned, support of this type of transmission will be checked). To handle SO_ZEROCOPY, AF_VSOCK implementation uses SOCK_CUSTOM_SOCKOPT bit, thus handling SOL_SOCKET option operations, but all of them except SO_ZEROCOPY will be forwarded to the generic handler by calling 'sock_setsockopt()'. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15vsock/loopback: support MSG_ZEROCOPY for transportArseniy Krasnov1-0/+6
Add 'msgzerocopy_allow()' callback for loopback transport. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15vsock/virtio: support MSG_ZEROCOPY for transportArseniy Krasnov1-0/+7
Add 'msgzerocopy_allow()' callback for virtio transport. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15vhost/vsock: support MSG_ZEROCOPY for transportArseniy Krasnov1-0/+7
Add 'msgzerocopy_allow()' callback for vhost transport. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15vsock: enable SOCK_SUPPORT_ZC bitArseniy Krasnov1-0/+6
This bit is used by io_uring in case of zerocopy tx mode. io_uring code checks, that socket has this feature. This patch sets it in two places: 1) For socket in 'connect()' call. 2) For new socket which is returned by 'accept()' call. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15vsock: check for MSG_ZEROCOPY support on sendArseniy Krasnov2-0/+13
This feature totally depends on transport, so if transport doesn't support it, return error. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15vsock: read from socket's error queueArseniy Krasnov3-0/+24
This adds handling of MSG_ERRQUEUE input flag in receive call. This flag is used to read socket's error queue instead of data queue. Possible scenario of error queue usage is receiving completions for transmission with MSG_ZEROCOPY flag. This patch also adds new defines: 'SOL_VSOCK' and 'VSOCK_RECVERR'. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15vsock: set EPOLLERR on non-empty error queueArseniy Krasnov1-1/+1
If socket's error queue is not empty, EPOLLERR must be set. Otherwise, reader of error queue won't detect data in it using EPOLLERR bit. Currently for AF_VSOCK this is actual only with MSG_ZEROCOPY, as this feature is the only user of an error queue of the socket. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-14appletalk: remove special handling code for ipddpLukas Bulwahn1-36/+0
After commit 1dab47139e61 ("appletalk: remove ipddp driver") removes the config IPDDP, there is some minor code clean-up possible in the appletalk network layer. Remove some code in appletalk layer after the ipddp driver is gone. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231012063443.22368-1-lukas.bulwahn@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-14qed: replace uses of strncpyJustin Stitt1-4/+3
strncpy() is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. This patch eliminates three uses of strncpy(): Firstly, `dest` is expected to be NUL-terminated which is evident by the manual setting of a NUL-byte at size - 1. For this use specifically, strscpy() is a viable replacement due to the fact that it guarantees NUL-termination on the destination buffer. The next two cases should simply be memcpy() as the size of the src string is always 3 and the destination string just wants the first 3 bytes changed. To be clear, there are no buffer overread bugs in the current code as the sizes and offsets are carefully managed such that buffers are NUL-terminated. However, with these changes, the code is now more robust and less ambiguous (and hopefully easier to read). Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20231012-strncpy-drivers-net-ethernet-qlogic-qed-qed_debug-c-v2-1-16d2c0162b80@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-14r8169: fix rare issue with broken rx after link-down on RTL8125Heiner Kallweit1-0/+4
In very rare cases (I've seen two reports so far about different RTL8125 chip versions) it seems the MAC locks up when link goes down and requires a software reset to get revived. Realtek doesn't publish hw errata information, therefore the root cause is unknown. Realtek vendor drivers do a full hw re-initialization on each link-up event, the slimmed-down variant here was reported to fix the issue for the reporting user. It's not fully clear which parts of the NIC are reset as part of the software reset, therefore I can't rule out side effects. Fixes: f1bce4ad2f1c ("r8169: add support for RTL8125") Reported-by: Martin Kjær Jørgensen <me@lagy.org> Link: https://lore.kernel.org/netdev/97ec2232-3257-316c-c3e7-a08192ce16a6@gmail.com/T/ Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/9edde757-9c3b-4730-be3b-0ef3a374ff71@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-14Merge branch 'net-netconsole-configfs-entries-for-boot-target'Jakub Kicinski2-56/+121
Breno Leitao says: ==================== net: netconsole: configfs entries for boot target There is a limitation in netconsole, where it is impossible to disable or modify the target created from the command line parameter. (netconsole=...). "netconsole" cmdline parameter sets the remote IP, and if the remote IP changes, the machine needs to be rebooted (with the new remote IP set in the command line parameter). This allows the user to modify a target without the need to restart the machine. This functionality sits on top of the dynamic target reconfiguration that is already implemented in netconsole. The way to modify a boot time target is creating special named configfs directories, that will be associated with the targets coming from `netconsole=...`. Example: Let's suppose you have two netconsole targets defined at boot time:: netconsole=4444@10.0.0.1/eth1,9353@10.0.0.2/12:34:56:78:9a:bc;4444@10.0.0.1/eth1,9353@10.0.0.3/12:34:56:78:9a:bc You can modify these targets in runtime by creating the following targets:: $ mkdir cmdline1 $ cat cmdline1/remote_ip 10.0.0.3 $ echo 0 > cmdline1/enabled $ echo 10.0.0.4 > cmdline1/remote_ip $ echo 1 > cmdline1/enabled ==================== Link: https://lore.kernel.org/r/20231012111401.333798-1-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-14Documentation: netconsole: add support for cmdline targetsBreno Leitao1-3/+19
With the previous patches, there is no more limitation at modifying the targets created at boot time (or module load time). Document the way on how to create the configfs directories to be able to modify these netconsole targets. The design discussion about this topic could be found at: https://lore.kernel.org/all/ZRWRal5bW93px4km@gmail.com/ Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20231012111401.333798-5-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-14netconsole: Attach cmdline target to dynamic targetBreno Leitao1-0/+28
Enable the attachment of a dynamic target to the target created during boot time. The boot-time targets are named as "cmdline\d", where "\d" is a number starting at 0. If the user creates a dynamic target named "cmdline0", it will attach to the first target created at boot time (as defined in the `netconsole=...` command line argument). `cmdline1` will attach to the second target and so forth. If there is no netconsole target created at boot time, then, the target name could be reused. Relevant design discussion: https://lore.kernel.org/all/ZRWRal5bW93px4km@gmail.com/ Suggested-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20231012111401.333798-4-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>