summaryrefslogtreecommitdiff
path: root/Documentation/networking
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2024-03-13 03:44:08 +0300
committerLinus Torvalds <torvalds@linux-foundation.org>2024-03-13 03:44:08 +0300
commit9187210eee7d87eea37b45ea93454a88681894a4 (patch)
tree31b4610e62cdd5e1dfb700014aa619e41145d7d3 /Documentation/networking
parent1f440397665f4241346e4cc6d93f8b73880815d1 (diff)
parented1f164038b50c5864aa85389f3ffd456f050cca (diff)
downloadlinux-9187210eee7d87eea37b45ea93454a88681894a4.tar.xz
Merge tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski: "Core & protocols: - Large effort by Eric to lower rtnl_lock pressure and remove locks: - Make commonly used parts of rtnetlink (address, route dumps etc) lockless, protected by RCU instead of rtnl_lock. - Add a netns exit callback which already holds rtnl_lock, allowing netns exit to take rtnl_lock once in the core instead of once for each driver / callback. - Remove locks / serialization in the socket diag interface. - Remove 6 calls to synchronize_rcu() while holding rtnl_lock. - Remove the dev_base_lock, depend on RCU where necessary. - Support busy polling on a per-epoll context basis. Poll length and budget parameters can be set independently of system defaults. - Introduce struct net_hotdata, to make sure read-mostly global config variables fit in as few cache lines as possible. - Add optional per-nexthop statistics to ease monitoring / debug of ECMP imbalance problems. - Support TCP_NOTSENT_LOWAT in MPTCP. - Ensure that IPv6 temporary addresses' preferred lifetimes are long enough, compared to other configured lifetimes, and at least 2 sec. - Support forwarding of ICMP Error messages in IPSec, per RFC 4301. - Add support for the independent control state machine for bonding per IEEE 802.1AX-2008 5.4.15 in addition to the existing coupled control state machine. - Add "network ID" to MCTP socket APIs to support hosts with multiple disjoint MCTP networks. - Re-use the mono_delivery_time skbuff bit for packets which user space wants to be sent at a specified time. Maintain the timing information while traversing veth links, bridge etc. - Take advantage of MSG_SPLICE_PAGES for RxRPC DATA and ACK packets. - Simplify many places iterating over netdevs by using an xarray instead of a hash table walk (hash table remains in place, for use on fastpaths). - Speed up scanning for expired routes by keeping a dedicated list. - Speed up "generic" XDP by trying harder to avoid large allocations. - Support attaching arbitrary metadata to netconsole messages. Things we sprinkled into general kernel code: - Enforce VM_IOREMAP flag and range in ioremap_page_range and introduce VM_SPARSE kind and vm_area_[un]map_pages (used by bpf_arena). - Rework selftest harness to enable the use of the full range of ksft exit code (pass, fail, skip, xfail, xpass). Netfilter: - Allow userspace to define a table that is exclusively owned by a daemon (via netlink socket aliveness) without auto-removing this table when the userspace program exits. Such table gets marked as orphaned and a restarting management daemon can re-attach/regain ownership. - Speed up element insertions to nftables' concatenated-ranges set type. Compact a few related data structures. BPF: - Add BPF token support for delegating a subset of BPF subsystem functionality from privileged system-wide daemons such as systemd through special mount options for userns-bound BPF fs to a trusted & unprivileged application. - Introduce bpf_arena which is sparse shared memory region between BPF program and user space where structures inside the arena can have pointers to other areas of the arena, and pointers work seamlessly for both user-space programs and BPF programs. - Introduce may_goto instruction that is a contract between the verifier and the program. The verifier allows the program to loop assuming it's behaving well, but reserves the right to terminate it. - Extend the BPF verifier to enable static subprog calls in spin lock critical sections. - Support registration of struct_ops types from modules which helps projects like fuse-bpf that seeks to implement a new struct_ops type. - Add support for retrieval of cookies for perf/kprobe multi links. - Support arbitrary TCP SYN cookie generation / validation in the TC layer with BPF to allow creating SYN flood handling in BPF firewalls. - Add code generation to inline the bpf_kptr_xchg() helper which improves performance when stashing/popping the allocated BPF objects. Wireless: - Add SPP (signaling and payload protected) AMSDU support. - Support wider bandwidth OFDMA, as required for EHT operation. Driver API: - Major overhaul of the Energy Efficient Ethernet internals to support new link modes (2.5GE, 5GE), share more code between drivers (especially those using phylib), and encourage more uniform behavior. Convert and clean up drivers. - Define an API for querying per netdev queue statistics from drivers. - IPSec: account in global stats for fully offloaded sessions. - Create a concept of Ethernet PHY Packages at the Device Tree level, to allow parameterizing the existing PHY package code. - Enable Rx hashing (RSS) on GTP protocol fields. Misc: - Improvements and refactoring all over networking selftests. - Create uniform module aliases for TC classifiers, actions, and packet schedulers to simplify creating modprobe policies. - Address all missing MODULE_DESCRIPTION() warnings in networking. - Extend the Netlink descriptions in YAML to cover message encapsulation or "Netlink polymorphism", where interpretation of nested attributes depends on link type, classifier type or some other "class type". Drivers: - Ethernet high-speed NICs: - Add a new driver for Marvell's Octeon PCI Endpoint NIC VF. - Intel (100G, ice, idpf): - support E825-C devices - nVidia/Mellanox: - support devices with one port and multiple PCIe links - Broadcom (bnxt): - support n-tuple filters - support configuring the RSS key - Wangxun (ngbe/txgbe): - implement irq_domain for TXGBE's sub-interrupts - Pensando/AMD: - support XDP - optimize queue submission and wakeup handling (+17% bps) - optimize struct layout, saving 28% of memory on queues - Ethernet NICs embedded and virtual: - Google cloud vNIC: - refactor driver to perform memory allocations for new queue config before stopping and freeing the old queue memory - Synopsys (stmmac): - obey queueMaxSDU and implement counters required by 802.1Qbv - Renesas (ravb): - support packet checksum offload - suspend to RAM and runtime PM support - Ethernet switches: - nVidia/Mellanox: - support for nexthop group statistics - Microchip: - ksz8: implement PHY loopback - add support for KSZ8567, a 7-port 10/100Mbps switch - PTP: - New driver for RENESAS FemtoClock3 Wireless clock generator. - Support OCP PTP cards designed and built by Adva. - CAN: - Support recvmsg() flags for own, local and remote traffic on CAN BCM sockets. - Support for esd GmbH PCIe/402 CAN device family. - m_can: - Rx/Tx submission coalescing - wake on frame Rx - WiFi: - Intel (iwlwifi): - enable signaling and payload protected A-MSDUs - support wider-bandwidth OFDMA - support for new devices - bump FW API to 89 for AX devices; 90 for BZ/SC devices - MediaTek (mt76): - mt7915: newer ADIE version support - mt7925: radio temperature sensor support - Qualcomm (ath11k): - support 6 GHz station power modes: Low Power Indoor (LPI), Standard Power) SP and Very Low Power (VLP) - QCA6390 & WCN6855: support 2 concurrent station interfaces - QCA2066 support - Qualcomm (ath12k): - refactoring in preparation for Multi-Link Operation (MLO) support - 1024 Block Ack window size support - firmware-2.bin support - support having multiple identical PCI devices (firmware needs to have ATH12K_FW_FEATURE_MULTI_QRTR_ID) - QCN9274: support split-PHY devices - WCN7850: enable Power Save Mode in station mode - WCN7850: P2P support - RealTek: - rtw88: support for more rtw8811cu and rtw8821cu devices - rtw89: support SCAN_RANDOM_SN and SET_SCAN_DWELL - rtlwifi: speed up USB firmware initialization - rtwl8xxxu: - RTL8188F: concurrent interface support - Channel Switch Announcement (CSA) support in AP mode - Broadcom (brcmfmac): - per-vendor feature support - per-vendor SAE password setup - DMI nvram filename quirk for ACEPC W5 Pro" * tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2255 commits) nexthop: Fix splat with CONFIG_DEBUG_PREEMPT=y nexthop: Fix out-of-bounds access during attribute validation nexthop: Only parse NHA_OP_FLAGS for dump messages that require it nexthop: Only parse NHA_OP_FLAGS for get messages that require it bpf: move sleepable flag from bpf_prog_aux to bpf_prog bpf: hardcode BPF_PROG_PACK_SIZE to 2MB * num_possible_nodes() selftests/bpf: Add kprobe multi triggering benchmarks ptp: Move from simple ida to xarray vxlan: Remove generic .ndo_get_stats64 vxlan: Do not alloc tstats manually devlink: Add comments to use netlink gen tool nfp: flower: handle acti_netdevs allocation failure net/packet: Add getsockopt support for PACKET_COPY_THRESH net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID selftests/bpf: Add bpf_arena_htab test. selftests/bpf: Add bpf_arena_list test. selftests/bpf: Add unit tests for bpf_arena_alloc/free_pages bpf: Add helper macro bpf_addr_space_cast() libbpf: Recognize __arena global variables. bpftool: Recognize arena map type ...
Diffstat (limited to 'Documentation/networking')
-rw-r--r--Documentation/networking/af_xdp.rst33
-rw-r--r--Documentation/networking/bonding.rst12
-rw-r--r--Documentation/networking/can.rst34
-rw-r--r--Documentation/networking/device_drivers/ethernet/amazon/ena.rst6
-rw-r--r--Documentation/networking/device_drivers/ethernet/index.rst1
-rw-r--r--Documentation/networking/device_drivers/ethernet/intel/ice.rst21
-rw-r--r--Documentation/networking/device_drivers/ethernet/marvell/octeon_ep_vf.rst24
-rw-r--r--Documentation/networking/device_drivers/wwan/t7xx.rst46
-rw-r--r--Documentation/networking/devlink/mlx5.rst9
-rw-r--r--Documentation/networking/index.rst1
-rw-r--r--Documentation/networking/ip-sysctl.rst14
-rw-r--r--Documentation/networking/l2tp.rst135
-rw-r--r--Documentation/networking/multi-pf-netdev.rst174
-rw-r--r--Documentation/networking/netconsole.rst66
-rw-r--r--Documentation/networking/netdevices.rst4
-rw-r--r--Documentation/networking/sfp-phylink.rst147
-rw-r--r--Documentation/networking/statistics.rst15
-rw-r--r--Documentation/networking/xfrm_device.rst4
18 files changed, 684 insertions, 62 deletions
diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst
index dceeb0d763aa..72da7057e4cf 100644
--- a/Documentation/networking/af_xdp.rst
+++ b/Documentation/networking/af_xdp.rst
@@ -329,23 +329,24 @@ XDP_SHARED_UMEM option and provide the initial socket's fd in the
sxdp_shared_umem_fd field as you registered the UMEM on that
socket. These two sockets will now share one and the same UMEM.
-There is no need to supply an XDP program like the one in the previous
-case where sockets were bound to the same queue id and
-device. Instead, use the NIC's packet steering capabilities to steer
-the packets to the right queue. In the previous example, there is only
-one queue shared among sockets, so the NIC cannot do this steering. It
-can only steer between queues.
-
-In libbpf, you need to use the xsk_socket__create_shared() API as it
-takes a reference to a FILL ring and a COMPLETION ring that will be
-created for you and bound to the shared UMEM. You can use this
-function for all the sockets you create, or you can use it for the
-second and following ones and use xsk_socket__create() for the first
-one. Both methods yield the same result.
+In this case, it is possible to use the NIC's packet steering
+capabilities to steer the packets to the right queue. This is not
+possible in the previous example as there is only one queue shared
+among sockets, so the NIC cannot do this steering as it can only steer
+between queues.
+
+In libxdp (or libbpf prior to version 1.0), you need to use the
+xsk_socket__create_shared() API as it takes a reference to a FILL ring
+and a COMPLETION ring that will be created for you and bound to the
+shared UMEM. You can use this function for all the sockets you create,
+or you can use it for the second and following ones and use
+xsk_socket__create() for the first one. Both methods yield the same
+result.
Note that a UMEM can be shared between sockets on the same queue id
and device, as well as between queues on the same device and between
-devices at the same time.
+devices at the same time. It is also possible to redirect to any
+socket as long as it is bound to the same umem with XDP_SHARED_UMEM.
XDP_USE_NEED_WAKEUP bind flag
-----------------------------
@@ -822,6 +823,10 @@ A: The short answer is no, that is not supported at the moment. The
switch, or other distribution mechanism, in your NIC to direct
traffic to the correct queue id and socket.
+ Note that if you are using the XDP_SHARED_UMEM option, it is
+ possible to switch traffic between any socket bound to the same
+ umem.
+
Q: My packets are sometimes corrupted. What is wrong?
A: Care has to be taken not to feed the same buffer in the UMEM into
diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
index f7a73421eb76..e774b48de9f5 100644
--- a/Documentation/networking/bonding.rst
+++ b/Documentation/networking/bonding.rst
@@ -444,6 +444,18 @@ arp_missed_max
The default value is 2, and the allowable range is 1 - 255.
+coupled_control
+
+ Specifies whether the LACP state machine's MUX in the 802.3ad mode
+ should have separate Collecting and Distributing states.
+
+ This is by implementing the independent control state machine per
+ IEEE 802.1AX-2008 5.4.15 in addition to the existing coupled control
+ state machine.
+
+ The default value is 1. This setting does not separate the Collecting
+ and Distributing states, maintaining the bond in coupled control.
+
downdelay
Specifies the time, in milliseconds, to wait before disabling
diff --git a/Documentation/networking/can.rst b/Documentation/networking/can.rst
index d7e1ada905b2..62519d38c58b 100644
--- a/Documentation/networking/can.rst
+++ b/Documentation/networking/can.rst
@@ -444,6 +444,24 @@ definitions are specified for CAN specific MTUs in include/linux/can.h:
#define CANFD_MTU (sizeof(struct canfd_frame)) == 72 => CAN FD frame
+Returned Message Flags
+----------------------
+
+When using the system call recvmsg(2) on a RAW or a BCM socket, the
+msg->msg_flags field may contain the following flags:
+
+MSG_DONTROUTE:
+ set when the received frame was created on the local host.
+
+MSG_CONFIRM:
+ set when the frame was sent via the socket it is received on.
+ This flag can be interpreted as a 'transmission confirmation' when the
+ CAN driver supports the echo of frames on driver level, see
+ :ref:`socketcan-local-loopback1` and :ref:`socketcan-local-loopback2`.
+ (Note: In order to receive such messages on a RAW socket,
+ CAN_RAW_RECV_OWN_MSGS must be set.)
+
+
.. _socketcan-raw-sockets:
RAW Protocol Sockets with can_filters (SOCK_RAW)
@@ -693,22 +711,6 @@ where the CAN_INV_FILTER flag is set in order to notch single CAN IDs or
CAN ID ranges from the incoming traffic.
-RAW Socket Returned Message Flags
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-When using recvmsg() call, the msg->msg_flags may contain following flags:
-
-MSG_DONTROUTE:
- set when the received frame was created on the local host.
-
-MSG_CONFIRM:
- set when the frame was sent via the socket it is received on.
- This flag can be interpreted as a 'transmission confirmation' when the
- CAN driver supports the echo of frames on driver level, see
- :ref:`socketcan-local-loopback1` and :ref:`socketcan-local-loopback2`.
- In order to receive such messages, CAN_RAW_RECV_OWN_MSGS must be set.
-
-
Broadcast Manager Protocol Sockets (SOCK_DGRAM)
-----------------------------------------------
diff --git a/Documentation/networking/device_drivers/ethernet/amazon/ena.rst b/Documentation/networking/device_drivers/ethernet/amazon/ena.rst
index b842bcb14255..a4c7d0c65fd7 100644
--- a/Documentation/networking/device_drivers/ethernet/amazon/ena.rst
+++ b/Documentation/networking/device_drivers/ethernet/amazon/ena.rst
@@ -211,10 +211,16 @@ Documentation/networking/net_dim.rst
RX copybreak
============
+
The rx_copybreak is initialized by default to ENA_DEFAULT_RX_COPYBREAK
and can be configured by the ETHTOOL_STUNABLE command of the
SIOCETHTOOL ioctl.
+This option controls the maximum packet length for which the RX
+descriptor it was received on would be recycled. When a packet smaller
+than RX copybreak bytes is received, it is copied into a new memory
+buffer and the RX descriptor is returned to HW.
+
Statistics
==========
diff --git a/Documentation/networking/device_drivers/ethernet/index.rst b/Documentation/networking/device_drivers/ethernet/index.rst
index 43de285b8a92..6932d8c043c2 100644
--- a/Documentation/networking/device_drivers/ethernet/index.rst
+++ b/Documentation/networking/device_drivers/ethernet/index.rst
@@ -42,6 +42,7 @@ Contents:
intel/ice
marvell/octeontx2
marvell/octeon_ep
+ marvell/octeon_ep_vf
mellanox/mlx5/index
microsoft/netvsc
neterion/s2io
diff --git a/Documentation/networking/device_drivers/ethernet/intel/ice.rst b/Documentation/networking/device_drivers/ethernet/intel/ice.rst
index 5038e54586af..934752f675ba 100644
--- a/Documentation/networking/device_drivers/ethernet/intel/ice.rst
+++ b/Documentation/networking/device_drivers/ethernet/intel/ice.rst
@@ -368,15 +368,28 @@ more options for Receive Side Scaling (RSS) hash byte configuration.
# ethtool -N <ethX> rx-flow-hash <type> <option>
Where <type> is:
- tcp4 signifying TCP over IPv4
- udp4 signifying UDP over IPv4
- tcp6 signifying TCP over IPv6
- udp6 signifying UDP over IPv6
+ tcp4 signifying TCP over IPv4
+ udp4 signifying UDP over IPv4
+ gtpc4 signifying GTP-C over IPv4
+ gtpc4t signifying GTP-C (include TEID) over IPv4
+ gtpu4 signifying GTP-U over IPV4
+ gtpu4e signifying GTP-U and Extension Header over IPV4
+ gtpu4u signifying GTP-U PSC Uplink over IPV4
+ gtpu4d signifying GTP-U PSC Downlink over IPV4
+ tcp6 signifying TCP over IPv6
+ udp6 signifying UDP over IPv6
+ gtpc6 signifying GTP-C over IPv6
+ gtpc6t signifying GTP-C (include TEID) over IPv6
+ gtpu6 signifying GTP-U over IPV6
+ gtpu6e signifying GTP-U and Extension Header over IPV6
+ gtpu6u signifying GTP-U PSC Uplink over IPV6
+ gtpu6d signifying GTP-U PSC Downlink over IPV6
And <option> is one or more of:
s Hash on the IP source address of the Rx packet.
d Hash on the IP destination address of the Rx packet.
f Hash on bytes 0 and 1 of the Layer 4 header of the Rx packet.
n Hash on bytes 2 and 3 of the Layer 4 header of the Rx packet.
+ e Hash on GTP Packet on TEID (4bytes) of the Rx packet.
Accelerated Receive Flow Steering (aRFS)
diff --git a/Documentation/networking/device_drivers/ethernet/marvell/octeon_ep_vf.rst b/Documentation/networking/device_drivers/ethernet/marvell/octeon_ep_vf.rst
new file mode 100644
index 000000000000..603133d0b92f
--- /dev/null
+++ b/Documentation/networking/device_drivers/ethernet/marvell/octeon_ep_vf.rst
@@ -0,0 +1,24 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+=======================================================================
+Linux kernel networking driver for Marvell's Octeon PCI Endpoint NIC VF
+=======================================================================
+
+Network driver for Marvell's Octeon PCI EndPoint NIC VF.
+Copyright (c) 2020 Marvell International Ltd.
+
+Overview
+========
+This driver implements networking functionality of Marvell's Octeon PCI
+EndPoint NIC VF.
+
+Supported Devices
+=================
+Currently, this driver support following devices:
+ * Network controller: Cavium, Inc. Device b203
+ * Network controller: Cavium, Inc. Device b403
+ * Network controller: Cavium, Inc. Device b103
+ * Network controller: Cavium, Inc. Device b903
+ * Network controller: Cavium, Inc. Device ba03
+ * Network controller: Cavium, Inc. Device bc03
+ * Network controller: Cavium, Inc. Device bd03
diff --git a/Documentation/networking/device_drivers/wwan/t7xx.rst b/Documentation/networking/device_drivers/wwan/t7xx.rst
index dd5b731957ca..f346f5f85f15 100644
--- a/Documentation/networking/device_drivers/wwan/t7xx.rst
+++ b/Documentation/networking/device_drivers/wwan/t7xx.rst
@@ -39,6 +39,34 @@ command and receive response:
- open the AT control channel using a UART tool or a special user tool
+Sysfs
+=====
+The driver provides sysfs interfaces to userspace.
+
+t7xx_mode
+---------
+The sysfs interface provides userspace with access to the device mode, this interface
+supports read and write operations.
+
+Device mode:
+
+- ``unknown`` represents that device in unknown status
+- ``ready`` represents that device in ready status
+- ``reset`` represents that device in reset status
+- ``fastboot_switching`` represents that device in fastboot switching status
+- ``fastboot_download`` represents that device in fastboot download status
+- ``fastboot_dump`` represents that device in fastboot dump status
+
+Read from userspace to get the current device mode.
+
+::
+ $ cat /sys/bus/pci/devices/${bdf}/t7xx_mode
+
+Write from userspace to set the device mode.
+
+::
+ $ echo fastboot_switching > /sys/bus/pci/devices/${bdf}/t7xx_mode
+
Management application development
==================================
The driver and userspace interfaces are described below. The MBIM protocol is
@@ -97,6 +125,20 @@ The driver exposes an AT port by implementing AT WWAN Port.
The userspace end of the control port is a /dev/wwan0at0 character
device. Application shall use this interface to issue AT commands.
+fastboot port userspace ABI
+---------------------------
+
+/dev/wwan0fastboot0 character device
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The driver exposes a fastboot protocol interface by implementing
+fastboot WWAN Port. The userspace end of the fastboot channel pipe is a
+/dev/wwan0fastboot0 character device. Application shall use this interface for
+fastboot protocol communication.
+
+Please note that driver needs to be reloaded to export /dev/wwan0fastboot0
+port, because device needs a cold reset after enter ``fastboot_switching``
+mode.
+
The MediaTek's T700 modem supports the 3GPP TS 27.007 [4] specification.
References
@@ -118,3 +160,7 @@ speak the Mobile Interface Broadband Model (MBIM) protocol"*
[4] *Specification # 27.007 - 3GPP*
- https://www.3gpp.org/DynaReport/27007.htm
+
+[5] *fastboot "a mechanism for communicating with bootloaders"*
+
+- https://android.googlesource.com/platform/system/core/+/refs/heads/main/fastboot/README.md
diff --git a/Documentation/networking/devlink/mlx5.rst b/Documentation/networking/devlink/mlx5.rst
index 702f204a3dbd..456985407475 100644
--- a/Documentation/networking/devlink/mlx5.rst
+++ b/Documentation/networking/devlink/mlx5.rst
@@ -97,6 +97,10 @@ parameters.
When metadata is disabled, the above use cases will fail to initialize if
users try to enable them.
+
+ Note: Setting this parameter does not take effect immediately. Setting
+ must happen in legacy mode and eswitch port metadata takes effect after
+ enabling switchdev mode.
* - ``hairpin_num_queues``
- u32
- driverinit
@@ -246,7 +250,7 @@ them in realtime.
Description of the vnic counters:
-- total_q_under_processor_handle
+- total_error_queues
number of queues in an error state due to
an async error or errored command.
- send_queue_priority_update_flow
@@ -255,7 +259,8 @@ Description of the vnic counters:
number of times CQ entered an error state due to an overflow.
- async_eq_overrun
number of times an EQ mapped to async events was overrun.
- comp_eq_overrun number of times an EQ mapped to completion events was
+- comp_eq_overrun
+ number of times an EQ mapped to completion events was
overrun.
- quota_exceeded_command
number of commands issued and failed due to quota exceeded.
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index 69f3d6dcd9fd..473d72c36d61 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -74,6 +74,7 @@ Contents:
mpls-sysctl
mptcp-sysctl
multiqueue
+ multi-pf-netdev
napi
net_cachelines/index
netconsole
diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index 7afff42612e9..bd50df6a5a42 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -2503,7 +2503,7 @@ use_tempaddr - INTEGER
temp_valid_lft - INTEGER
valid lifetime (in seconds) for temporary addresses. If less than the
- minimum required lifetime (typically 5 seconds), temporary addresses
+ minimum required lifetime (typically 5-7 seconds), temporary addresses
will not be created.
Default: 172800 (2 days)
@@ -2511,7 +2511,7 @@ temp_valid_lft - INTEGER
temp_prefered_lft - INTEGER
Preferred lifetime (in seconds) for temporary addresses. If
temp_prefered_lft is less than the minimum required lifetime (typically
- 5 seconds), temporary addresses will not be created. If
+ 5-7 seconds), the preferred lifetime is the minimum required. If
temp_prefered_lft is greater than temp_valid_lft, the preferred lifetime
is temp_valid_lft.
@@ -2535,6 +2535,16 @@ max_desync_factor - INTEGER
Default: 600
+regen_min_advance - INTEGER
+ How far in advance (in seconds), at minimum, to create a new temporary
+ address before the current one is deprecated. This value is added to
+ the amount of time that may be required for duplicate address detection
+ to determine when to create a new address. Linux permits setting this
+ value to less than the default of 2 seconds, but a value less than 2
+ does not conform to RFC 8981.
+
+ Default: 2
+
regen_max_retry - INTEGER
Number of attempts before give up attempting to generate
valid temporary addresses.
diff --git a/Documentation/networking/l2tp.rst b/Documentation/networking/l2tp.rst
index 7f383e99dbad..8496b467dea4 100644
--- a/Documentation/networking/l2tp.rst
+++ b/Documentation/networking/l2tp.rst
@@ -386,12 +386,19 @@ Sample userspace code:
- Create session PPPoX data socket::
+ /* Input: the L2TP tunnel UDP socket `tunnel_fd`, which needs to be
+ * bound already (both sockname and peername), otherwise it will not be
+ * ready.
+ */
+
struct sockaddr_pppol2tp sax;
- int fd;
+ int session_fd;
+ int ret;
+
+ session_fd = socket(AF_PPPOX, SOCK_DGRAM, PX_PROTO_OL2TP);
+ if (session_fd < 0)
+ return -errno;
- /* Note, the tunnel socket must be bound already, else it
- * will not be ready
- */
sax.sa_family = AF_PPPOX;
sax.sa_protocol = PX_PROTO_OL2TP;
sax.pppol2tp.fd = tunnel_fd;
@@ -406,12 +413,128 @@ Sample userspace code:
/* session_fd is the fd of the session's PPPoL2TP socket.
* tunnel_fd is the fd of the tunnel UDP / L2TPIP socket.
*/
- fd = connect(session_fd, (struct sockaddr *)&sax, sizeof(sax));
- if (fd < 0 ) {
+ ret = connect(session_fd, (struct sockaddr *)&sax, sizeof(sax));
+ if (ret < 0 ) {
+ close(session_fd);
+ return -errno;
+ }
+
+ return session_fd;
+
+L2TP control packets will still be available for read on `tunnel_fd`.
+
+ - Create PPP channel::
+
+ /* Input: the session PPPoX data socket `session_fd` which was created
+ * as described above.
+ */
+
+ int ppp_chan_fd;
+ int chindx;
+ int ret;
+
+ ret = ioctl(session_fd, PPPIOCGCHAN, &chindx);
+ if (ret < 0)
+ return -errno;
+
+ ppp_chan_fd = open("/dev/ppp", O_RDWR);
+ if (ppp_chan_fd < 0)
+ return -errno;
+
+ ret = ioctl(ppp_chan_fd, PPPIOCATTCHAN, &chindx);
+ if (ret < 0) {
+ close(ppp_chan_fd);
return -errno;
}
+
+ return ppp_chan_fd;
+
+LCP PPP frames will be available for read on `ppp_chan_fd`.
+
+ - Create PPP interface::
+
+ /* Input: the PPP channel `ppp_chan_fd` which was created as described
+ * above.
+ */
+
+ int ifunit = -1;
+ int ppp_if_fd;
+ int ret;
+
+ ppp_if_fd = open("/dev/ppp", O_RDWR);
+ if (ppp_if_fd < 0)
+ return -errno;
+
+ ret = ioctl(ppp_if_fd, PPPIOCNEWUNIT, &ifunit);
+ if (ret < 0) {
+ close(ppp_if_fd);
+ return -errno;
+ }
+
+ ret = ioctl(ppp_chan_fd, PPPIOCCONNECT, &ifunit);
+ if (ret < 0) {
+ close(ppp_if_fd);
+ return -errno;
+ }
+
+ return ppp_if_fd;
+
+IPCP/IPv6CP PPP frames will be available for read on `ppp_if_fd`.
+
+The ppp<ifunit> interface can then be configured as usual with netlink's
+RTM_NEWLINK, RTM_NEWADDR, RTM_NEWROUTE, or ioctl's SIOCSIFMTU, SIOCSIFADDR,
+SIOCSIFDSTADDR, SIOCSIFNETMASK, SIOCSIFFLAGS, or with the `ip` command.
+
+ - Bridging L2TP sessions which have PPP pseudowire types (this is also called
+ L2TP tunnel switching or L2TP multihop) is supported by bridging the PPP
+ channels of the two L2TP sessions to be bridged::
+
+ /* Input: the session PPPoX data sockets `session_fd1` and `session_fd2`
+ * which were created as described further above.
+ */
+
+ int ppp_chan_fd;
+ int chindx1;
+ int chindx2;
+ int ret;
+
+ ret = ioctl(session_fd1, PPPIOCGCHAN, &chindx1);
+ if (ret < 0)
+ return -errno;
+
+ ret = ioctl(session_fd2, PPPIOCGCHAN, &chindx2);
+ if (ret < 0)
+ return -errno;
+
+ ppp_chan_fd = open("/dev/ppp", O_RDWR);
+ if (ppp_chan_fd < 0)
+ return -errno;
+
+ ret = ioctl(ppp_chan_fd, PPPIOCATTCHAN, &chindx1);
+ if (ret < 0) {
+ close(ppp_chan_fd);
+ return -errno;
+ }
+
+ ret = ioctl(ppp_chan_fd, PPPIOCBRIDGECHAN, &chindx2);
+ close(ppp_chan_fd);
+ if (ret < 0)
+ return -errno;
+
return 0;
+It can be noted that when bridging PPP channels, the PPP session is not locally
+terminated, and no local PPP interface is created. PPP frames arriving on one
+channel are directly passed to the other channel, and vice versa.
+
+The PPP channel does not need to be kept open. Only the session PPPoX data
+sockets need to be kept open.
+
+More generally, it is also possible in the same way to e.g. bridge a PPPoL2TP
+PPP channel with other types of PPP channels, such as PPPoE.
+
+See more details for the PPP side in ppp_generic.rst.
+
Old L2TPv2-only API
-------------------
diff --git a/Documentation/networking/multi-pf-netdev.rst b/Documentation/networking/multi-pf-netdev.rst
new file mode 100644
index 000000000000..be8e4bcadf11
--- /dev/null
+++ b/Documentation/networking/multi-pf-netdev.rst
@@ -0,0 +1,174 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+===============
+Multi-PF Netdev
+===============
+
+Contents
+========
+
+- `Background`_
+- `Overview`_
+- `mlx5 implementation`_
+- `Channels distribution`_
+- `Observability`_
+- `Steering`_
+- `Mutually exclusive features`_
+
+Background
+==========
+
+The Multi-PF NIC technology enables several CPUs within a multi-socket server to connect directly to
+the network, each through its own dedicated PCIe interface. Through either a connection harness that
+splits the PCIe lanes between two cards or by bifurcating a PCIe slot for a single card. This
+results in eliminating the network traffic traversing over the internal bus between the sockets,
+significantly reducing overhead and latency, in addition to reducing CPU utilization and increasing
+network throughput.
+
+Overview
+========
+
+The feature adds support for combining multiple PFs of the same port in a Multi-PF environment under
+one netdev instance. It is implemented in the netdev layer. Lower-layer instances like pci func,
+sysfs entry, and devlink are kept separate.
+Passing traffic through different devices belonging to different NUMA sockets saves cross-NUMA
+traffic and allows apps running on the same netdev from different NUMAs to still feel a sense of
+proximity to the device and achieve improved performance.
+
+mlx5 implementation
+===================
+
+Multi-PF or Socket-direct in mlx5 is achieved by grouping PFs together which belong to the same
+NIC and has the socket-direct property enabled, once all PFs are probed, we create a single netdev
+to represent all of them, symmetrically, we destroy the netdev whenever any of the PFs is removed.
+
+The netdev network channels are distributed between all devices, a proper configuration would utilize
+the correct close NUMA node when working on a certain app/CPU.
+
+We pick one PF to be a primary (leader), and it fills a special role. The other devices
+(secondaries) are disconnected from the network at the chip level (set to silent mode). In silent
+mode, no south <-> north traffic flowing directly through a secondary PF. It needs the assistance of
+the leader PF (east <-> west traffic) to function. All Rx/Tx traffic is steered through the primary
+to/from the secondaries.
+
+Currently, we limit the support to PFs only, and up to two PFs (sockets).
+
+Channels distribution
+=====================
+
+We distribute the channels between the different PFs to achieve local NUMA node performance
+on multiple NUMA nodes.
+
+Each combined channel works against one specific PF, creating all its datapath queues against it. We
+distribute channels to PFs in a round-robin policy.
+
+::
+
+ Example for 2 PFs and 5 channels:
+ +--------+--------+
+ | ch idx | PF idx |
+ +--------+--------+
+ | 0 | 0 |
+ | 1 | 1 |
+ | 2 | 0 |
+ | 3 | 1 |
+ | 4 | 0 |
+ +--------+--------+
+
+
+The reason we prefer round-robin is, it is less influenced by changes in the number of channels. The
+mapping between a channel index and a PF is fixed, no matter how many channels the user configures.
+As the channel stats are persistent across channel's closure, changing the mapping every single time
+would turn the accumulative stats less representing of the channel's history.
+
+This is achieved by using the correct core device instance (mdev) in each channel, instead of them
+all using the same instance under "priv->mdev".
+
+Observability
+=============
+The relation between PF, irq, napi, and queue can be observed via netlink spec:
+
+$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml --dump queue-get --json='{"ifindex": 13}'
+[{'id': 0, 'ifindex': 13, 'napi-id': 539, 'type': 'rx'},
+ {'id': 1, 'ifindex': 13, 'napi-id': 540, 'type': 'rx'},
+ {'id': 2, 'ifindex': 13, 'napi-id': 541, 'type': 'rx'},
+ {'id': 3, 'ifindex': 13, 'napi-id': 542, 'type': 'rx'},
+ {'id': 4, 'ifindex': 13, 'napi-id': 543, 'type': 'rx'},
+ {'id': 0, 'ifindex': 13, 'napi-id': 539, 'type': 'tx'},
+ {'id': 1, 'ifindex': 13, 'napi-id': 540, 'type': 'tx'},
+ {'id': 2, 'ifindex': 13, 'napi-id': 541, 'type': 'tx'},
+ {'id': 3, 'ifindex': 13, 'napi-id': 542, 'type': 'tx'},
+ {'id': 4, 'ifindex': 13, 'napi-id': 543, 'type': 'tx'}]
+
+$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml --dump napi-get --json='{"ifindex": 13}'
+[{'id': 543, 'ifindex': 13, 'irq': 42},
+ {'id': 542, 'ifindex': 13, 'irq': 41},
+ {'id': 541, 'ifindex': 13, 'irq': 40},
+ {'id': 540, 'ifindex': 13, 'irq': 39},
+ {'id': 539, 'ifindex': 13, 'irq': 36}]
+
+Here you can clearly observe our channels distribution policy:
+
+$ ls /proc/irq/{36,39,40,41,42}/mlx5* -d -1
+/proc/irq/36/mlx5_comp1@pci:0000:08:00.0
+/proc/irq/39/mlx5_comp1@pci:0000:09:00.0
+/proc/irq/40/mlx5_comp2@pci:0000:08:00.0
+/proc/irq/41/mlx5_comp2@pci:0000:09:00.0
+/proc/irq/42/mlx5_comp3@pci:0000:08:00.0
+
+Steering
+========
+Secondary PFs are set to "silent" mode, meaning they are disconnected from the network.
+
+In Rx, the steering tables belong to the primary PF only, and it is its role to distribute incoming
+traffic to other PFs, via cross-vhca steering capabilities. Still maintain a single default RSS table,
+that is capable of pointing to the receive queues of a different PF.
+
+In Tx, the primary PF creates a new Tx flow table, which is aliased by the secondaries, so they can
+go out to the network through it.
+
+In addition, we set default XPS configuration that, based on the CPU, selects an SQ belonging to the
+PF on the same node as the CPU.
+
+XPS default config example:
+
+NUMA node(s): 2
+NUMA node0 CPU(s): 0-11
+NUMA node1 CPU(s): 12-23
+
+PF0 on node0, PF1 on node1.
+
+- /sys/class/net/eth2/queues/tx-0/xps_cpus:000001
+- /sys/class/net/eth2/queues/tx-1/xps_cpus:001000
+- /sys/class/net/eth2/queues/tx-2/xps_cpus:000002
+- /sys/class/net/eth2/queues/tx-3/xps_cpus:002000
+- /sys/class/net/eth2/queues/tx-4/xps_cpus:000004
+- /sys/class/net/eth2/queues/tx-5/xps_cpus:004000
+- /sys/class/net/eth2/queues/tx-6/xps_cpus:000008
+- /sys/class/net/eth2/queues/tx-7/xps_cpus:008000
+- /sys/class/net/eth2/queues/tx-8/xps_cpus:000010
+- /sys/class/net/eth2/queues/tx-9/xps_cpus:010000
+- /sys/class/net/eth2/queues/tx-10/xps_cpus:000020
+- /sys/class/net/eth2/queues/tx-11/xps_cpus:020000
+- /sys/class/net/eth2/queues/tx-12/xps_cpus:000040
+- /sys/class/net/eth2/queues/tx-13/xps_cpus:040000
+- /sys/class/net/eth2/queues/tx-14/xps_cpus:000080
+- /sys/class/net/eth2/queues/tx-15/xps_cpus:080000
+- /sys/class/net/eth2/queues/tx-16/xps_cpus:000100
+- /sys/class/net/eth2/queues/tx-17/xps_cpus:100000
+- /sys/class/net/eth2/queues/tx-18/xps_cpus:000200
+- /sys/class/net/eth2/queues/tx-19/xps_cpus:200000
+- /sys/class/net/eth2/queues/tx-20/xps_cpus:000400
+- /sys/class/net/eth2/queues/tx-21/xps_cpus:400000
+- /sys/class/net/eth2/queues/tx-22/xps_cpus:000800
+- /sys/class/net/eth2/queues/tx-23/xps_cpus:800000
+
+Mutually exclusive features
+===========================
+
+The nature of Multi-PF, where different channels work with different PFs, conflicts with
+stateful features where the state is maintained in one of the PFs.
+For example, in the TLS device-offload feature, special context objects are created per connection
+and maintained in the PF. Transitioning between different RQs/SQs would break the feature. Hence,
+we disable this combination for now.
diff --git a/Documentation/networking/netconsole.rst b/Documentation/networking/netconsole.rst
index 390730a74332..d55c2a22ec7a 100644
--- a/Documentation/networking/netconsole.rst
+++ b/Documentation/networking/netconsole.rst
@@ -15,6 +15,8 @@ Extended console support by Tejun Heo <tj@kernel.org>, May 1 2015
Release prepend support by Breno Leitao <leitao@debian.org>, Jul 7 2023
+Userdata append support by Matthew Wood <thepacketgeek@gmail.com>, Jan 22 2024
+
Please send bug reports to Matt Mackall <mpm@selenic.com>
Satyam Sharma <satyam.sharma@gmail.com>, and Cong Wang <xiyou.wangcong@gmail.com>
@@ -171,6 +173,70 @@ You can modify these targets in runtime by creating the following targets::
cat cmdline1/remote_ip
10.0.0.3
+Append User Data
+----------------
+
+Custom user data can be appended to the end of messages with netconsole
+dynamic configuration enabled. User data entries can be modified without
+changing the "enabled" attribute of a target.
+
+Directories (keys) under `userdata` are limited to 53 character length, and
+data in `userdata/<key>/value` are limited to 200 bytes::
+
+ cd /sys/kernel/config/netconsole && mkdir cmdline0
+ cd cmdline0
+ mkdir userdata/foo
+ echo bar > userdata/foo/value
+ mkdir userdata/qux
+ echo baz > userdata/qux/value
+
+Messages will now include this additional user data::
+
+ echo "This is a message" > /dev/kmsg
+
+Sends::
+
+ 12,607,22085407756,-;This is a message
+ foo=bar
+ qux=baz
+
+Preview the userdata that will be appended with::
+
+ cd /sys/kernel/config/netconsole/cmdline0/userdata
+ for f in `ls userdata`; do echo $f=$(cat userdata/$f/value); done
+
+If a `userdata` entry is created but no data is written to the `value` file,
+the entry will be omitted from netconsole messages::
+
+ cd /sys/kernel/config/netconsole && mkdir cmdline0
+ cd cmdline0
+ mkdir userdata/foo
+ echo bar > userdata/foo/value
+ mkdir userdata/qux
+
+The `qux` key is omitted since it has no value::
+
+ echo "This is a message" > /dev/kmsg
+ 12,607,22085407756,-;This is a message
+ foo=bar
+
+Delete `userdata` entries with `rmdir`::
+
+ rmdir /sys/kernel/config/netconsole/cmdline0/userdata/qux
+
+.. warning::
+ When writing strings to user data values, input is broken up per line in
+ configfs store calls and this can cause confusing behavior::
+
+ mkdir userdata/testing
+ printf "val1\nval2" > userdata/testing/value
+ # userdata store value is called twice, first with "val1\n" then "val2"
+ # so "val2" is stored, being the last value stored
+ cat userdata/testing/value
+ val2
+
+ It is recommended to not write user data values with newlines.
+
Extended console:
=================
diff --git a/Documentation/networking/netdevices.rst b/Documentation/networking/netdevices.rst
index 9e4cccb90b87..c2476917a6c3 100644
--- a/Documentation/networking/netdevices.rst
+++ b/Documentation/networking/netdevices.rst
@@ -252,8 +252,8 @@ ndo_eth_ioctl:
Context: process
ndo_get_stats:
- Synchronization: rtnl_lock() semaphore, dev_base_lock rwlock, or RCU.
- Context: atomic (can't sleep under rwlock or RCU)
+ Synchronization: rtnl_lock() semaphore, or RCU.
+ Context: atomic (can't sleep under RCU)
ndo_start_xmit:
Synchronization: __netif_tx_lock spinlock.
diff --git a/Documentation/networking/sfp-phylink.rst b/Documentation/networking/sfp-phylink.rst
index 8054d33f449f..5bf285d73e8a 100644
--- a/Documentation/networking/sfp-phylink.rst
+++ b/Documentation/networking/sfp-phylink.rst
@@ -231,16 +231,136 @@ this documentation.
For further information on these methods, please see the inline
documentation in :c:type:`struct phylink_mac_ops <phylink_mac_ops>`.
-9. Remove calls to of_parse_phandle() for the PHY,
- of_phy_register_fixed_link() for fixed links etc. from the probe
- function, and replace with:
+9. Fill-in the :c:type:`struct phylink_config <phylink_config>` fields with
+ a reference to the :c:type:`struct device <device>` associated to your
+ :c:type:`struct net_device <net_device>`:
.. code-block:: c
- struct phylink *phylink;
priv->phylink_config.dev = &dev.dev;
priv->phylink_config.type = PHYLINK_NETDEV;
+ Fill-in the various speeds, pause and duplex modes your MAC can handle:
+
+ .. code-block:: c
+
+ priv->phylink_config.mac_capabilities = MAC_SYM_PAUSE | MAC_10 | MAC_100 | MAC_1000FD;
+
+10. Some Ethernet controllers work in pair with a PCS (Physical Coding Sublayer)
+ block, that can handle among other things the encoding/decoding, link
+ establishment detection and autonegotiation. While some MACs have internal
+ PCS whose operation is transparent, some other require dedicated PCS
+ configuration for the link to become functional. In that case, phylink
+ provides a PCS abstraction through :c:type:`struct phylink_pcs <phylink_pcs>`.
+
+ Identify if your driver has one or more internal PCS blocks, and/or if
+ your controller can use an external PCS block that might be internally
+ connected to your controller.
+
+ If your controller doesn't have any internal PCS, you can go to step 11.
+
+ If your Ethernet controller contains one or several PCS blocks, create
+ one :c:type:`struct phylink_pcs <phylink_pcs>` instance per PCS block within
+ your driver's private data structure:
+
+ .. code-block:: c
+
+ struct phylink_pcs pcs;
+
+ Populate the relevant :c:type:`struct phylink_pcs_ops <phylink_pcs_ops>` to
+ configure your PCS. Create a :c:func:`pcs_get_state` function that reports
+ the inband link state, a :c:func:`pcs_config` function to configure your
+ PCS according to phylink-provided parameters, and a :c:func:`pcs_validate`
+ function that report to phylink all accepted configuration parameters for
+ your PCS:
+
+ .. code-block:: c
+
+ struct phylink_pcs_ops foo_pcs_ops = {
+ .pcs_validate = foo_pcs_validate,
+ .pcs_get_state = foo_pcs_get_state,
+ .pcs_config = foo_pcs_config,
+ };
+
+ Arrange for PCS link state interrupts to be forwarded into
+ phylink, via:
+
+ .. code-block:: c
+
+ phylink_pcs_change(pcs, link_is_up);
+
+ where ``link_is_up`` is true if the link is currently up or false
+ otherwise. If a PCS is unable to provide these interrupts, then
+ it should set ``pcs->pcs_poll = true;`` when creating the PCS.
+
+11. If your controller relies on, or accepts the presence of an external PCS
+ controlled through its own driver, add a pointer to a phylink_pcs instance
+ in your driver private data structure:
+
+ .. code-block:: c
+
+ struct phylink_pcs *pcs;
+
+ The way of getting an instance of the actual PCS depends on the platform,
+ some PCS sit on an MDIO bus and are grabbed by passing a pointer to the
+ corresponding :c:type:`struct mii_bus <mii_bus>` and the PCS's address on
+ that bus. In this example, we assume the controller attaches to a Lynx PCS
+ instance:
+
+ .. code-block:: c
+
+ priv->pcs = lynx_pcs_create_mdiodev(bus, 0);
+
+ Some PCS can be recovered based on firmware information:
+
+ .. code-block:: c
+
+ priv->pcs = lynx_pcs_create_fwnode(of_fwnode_handle(node));
+
+12. Populate the :c:func:`mac_select_pcs` callback and add it to your
+ :c:type:`struct phylink_mac_ops <phylink_mac_ops>` set of ops. This function
+ must return a pointer to the relevant :c:type:`struct phylink_pcs <phylink_pcs>`
+ that will be used for the requested link configuration:
+
+ .. code-block:: c
+
+ static struct phylink_pcs *foo_select_pcs(struct phylink_config *config,
+ phy_interface_t interface)
+ {
+ struct foo_priv *priv = container_of(config, struct foo_priv,
+ phylink_config);
+
+ if ( /* 'interface' needs a PCS to function */ )
+ return priv->pcs;
+
+ return NULL;
+ }
+
+ See :c:func:`mvpp2_select_pcs` for an example of a driver that has multiple
+ internal PCS.
+
+13. Fill-in all the :c:type:`phy_interface_t <phy_interface_t>` (i.e. all MAC to
+ PHY link modes) that your MAC can output. The following example shows a
+ configuration for a MAC that can handle all RGMII modes, SGMII and 1000BaseX.
+ You must adjust these according to what your MAC and all PCS associated
+ with this MAC are capable of, and not just the interface you wish to use:
+
+ .. code-block:: c
+
+ phy_interface_set_rgmii(priv->phylink_config.supported_interfaces);
+ __set_bit(PHY_INTERFACE_MODE_SGMII,
+ priv->phylink_config.supported_interfaces);
+ __set_bit(PHY_INTERFACE_MODE_1000BASEX,
+ priv->phylink_config.supported_interfaces);
+
+14. Remove calls to of_parse_phandle() for the PHY,
+ of_phy_register_fixed_link() for fixed links etc. from the probe
+ function, and replace with:
+
+ .. code-block:: c
+
+ struct phylink *phylink;
+
phylink = phylink_create(&priv->phylink_config, node, phy_mode, &phylink_ops);
if (IS_ERR(phylink)) {
err = PTR_ERR(phylink);
@@ -249,14 +369,14 @@ this documentation.
priv->phylink = phylink;
- and arrange to destroy the phylink in the probe failure path as
- appropriate and the removal path too by calling:
+ and arrange to destroy the phylink in the probe failure path as
+ appropriate and the removal path too by calling:
- .. code-block:: c
+ .. code-block:: c
phylink_destroy(priv->phylink);
-10. Arrange for MAC link state interrupts to be forwarded into
+15. Arrange for MAC link state interrupts to be forwarded into
phylink, via:
.. code-block:: c
@@ -264,17 +384,16 @@ this documentation.
phylink_mac_change(priv->phylink, link_is_up);
where ``link_is_up`` is true if the link is currently up or false
- otherwise. If a MAC is unable to provide these interrupts, then
- it should set ``priv->phylink_config.pcs_poll = true;`` in step 9.
+ otherwise.
-11. Verify that the driver does not call::
+16. Verify that the driver does not call::
netif_carrier_on()
netif_carrier_off()
- as these will interfere with phylink's tracking of the link state,
- and cause phylink to omit calls via the :c:func:`mac_link_up` and
- :c:func:`mac_link_down` methods.
+ as these will interfere with phylink's tracking of the link state,
+ and cause phylink to omit calls via the :c:func:`mac_link_up` and
+ :c:func:`mac_link_down` methods.
Network drivers should call phylink_stop() and phylink_start() via their
suspend/resume paths, which ensures that the appropriate
diff --git a/Documentation/networking/statistics.rst b/Documentation/networking/statistics.rst
index 551b3cc29a41..75e017dfa825 100644
--- a/Documentation/networking/statistics.rst
+++ b/Documentation/networking/statistics.rst
@@ -41,6 +41,15 @@ If `-s` is specified once the detailed errors won't be shown.
`ip` supports JSON formatting via the `-j` option.
+Queue statistics
+~~~~~~~~~~~~~~~~
+
+Queue statistics are accessible via the netdev netlink family.
+
+Currently no widely distributed CLI exists to access those statistics.
+Kernel development tools (ynl) can be used to experiment with them,
+see `Documentation/userspace-api/netlink/intro-specs.rst`.
+
Protocol-specific statistics
----------------------------
@@ -147,6 +156,12 @@ Statistics are reported both in the responses to link information
requests (`RTM_GETLINK`) and statistic requests (`RTM_GETSTATS`,
when `IFLA_STATS_LINK_64` bit is set in the `.filter_mask` of the request).
+netdev (netlink)
+~~~~~~~~~~~~~~~~
+
+`netdev` generic netlink family allows accessing page pool and per queue
+statistics.
+
ethtool
-------
diff --git a/Documentation/networking/xfrm_device.rst b/Documentation/networking/xfrm_device.rst
index 535077cbeb07..bfea9d8579ed 100644
--- a/Documentation/networking/xfrm_device.rst
+++ b/Documentation/networking/xfrm_device.rst
@@ -71,9 +71,9 @@ Callbacks to implement
bool (*xdo_dev_offload_ok) (struct sk_buff *skb,
struct xfrm_state *x);
void (*xdo_dev_state_advance_esn) (struct xfrm_state *x);
+ void (*xdo_dev_state_update_stats) (struct xfrm_state *x);
/* Solely packet offload callbacks */
- void (*xdo_dev_state_update_curlft) (struct xfrm_state *x);
int (*xdo_dev_policy_add) (struct xfrm_policy *x, struct netlink_ext_ack *extack);
void (*xdo_dev_policy_delete) (struct xfrm_policy *x);
void (*xdo_dev_policy_free) (struct xfrm_policy *x);
@@ -191,6 +191,6 @@ xdo_dev_policy_free() on any remaining offloaded states.
Outcome of HW handling packets, the XFRM core can't count hard, soft limits.
The HW/driver are responsible to perform it and provide accurate data when
-xdo_dev_state_update_curlft() is called. In case of one of these limits
+xdo_dev_state_update_stats() is called. In case of one of these limits
occuried, the driver needs to call to xfrm_state_check_expire() to make sure
that XFRM performs rekeying sequence.