summaryrefslogtreecommitdiff
path: root/Documentation/networking
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/networking')
-rw-r--r--Documentation/networking/device_drivers/ethernet/marvell/octeon_ep.rst4
-rw-r--r--Documentation/networking/device_drivers/ethernet/neterion/s2io.rst4
-rw-r--r--Documentation/networking/device_drivers/index.rst1
-rw-r--r--Documentation/networking/device_drivers/qlogic/index.rst18
-rw-r--r--Documentation/networking/device_drivers/qlogic/qlge.rst118
-rw-r--r--Documentation/networking/index.rst2
-rw-r--r--Documentation/networking/netlink_spec/.gitignore1
-rw-r--r--Documentation/networking/netlink_spec/readme.txt4
-rw-r--r--Documentation/networking/smc-sysctl.rst20
-rw-r--r--Documentation/networking/tcp_ao.rst444
10 files changed, 473 insertions, 143 deletions
diff --git a/Documentation/networking/device_drivers/ethernet/marvell/octeon_ep.rst b/Documentation/networking/device_drivers/ethernet/marvell/octeon_ep.rst
index cad96c8d1f97..613a818d5db6 100644
--- a/Documentation/networking/device_drivers/ethernet/marvell/octeon_ep.rst
+++ b/Documentation/networking/device_drivers/ethernet/marvell/octeon_ep.rst
@@ -24,6 +24,10 @@ Supported Devices
Currently, this driver support following devices:
* Network controller: Cavium, Inc. Device b200
* Network controller: Cavium, Inc. Device b400
+ * Network controller: Cavium, Inc. Device b900
+ * Network controller: Cavium, Inc. Device ba00
+ * Network controller: Cavium, Inc. Device bc00
+ * Network controller: Cavium, Inc. Device bd00
Interface Control
=================
diff --git a/Documentation/networking/device_drivers/ethernet/neterion/s2io.rst b/Documentation/networking/device_drivers/ethernet/neterion/s2io.rst
index c5673ec4559b..d731b5a98561 100644
--- a/Documentation/networking/device_drivers/ethernet/neterion/s2io.rst
+++ b/Documentation/networking/device_drivers/ethernet/neterion/s2io.rst
@@ -64,8 +64,8 @@ c. Multi-buffer receive mode. Scattering of packet across multiple
IBM xSeries).
d. MSI/MSI-X. Can be enabled on platforms which support this feature
- (IA64, Xeon) resulting in noticeable performance improvement(up to 7%
- on certain platforms).
+ resulting in noticeable performance improvement (up to 7% on certain
+ platforms).
e. Statistics. Comprehensive MAC-level and software statistics displayed
using "ethtool -S" option.
diff --git a/Documentation/networking/device_drivers/index.rst b/Documentation/networking/device_drivers/index.rst
index 2f0285a5bc80..0dd30a84ce25 100644
--- a/Documentation/networking/device_drivers/index.rst
+++ b/Documentation/networking/device_drivers/index.rst
@@ -15,7 +15,6 @@ Contents:
ethernet/index
fddi/index
hamradio/index
- qlogic/index
wifi/index
wwan/index
diff --git a/Documentation/networking/device_drivers/qlogic/index.rst b/Documentation/networking/device_drivers/qlogic/index.rst
deleted file mode 100644
index ad05b04286e4..000000000000
--- a/Documentation/networking/device_drivers/qlogic/index.rst
+++ /dev/null
@@ -1,18 +0,0 @@
-.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
-
-QLogic QLGE Device Drivers
-===============================================
-
-Contents:
-
-.. toctree::
- :maxdepth: 2
-
- qlge
-
-.. only:: subproject and html
-
- Indices
- =======
-
- * :ref:`genindex`
diff --git a/Documentation/networking/device_drivers/qlogic/qlge.rst b/Documentation/networking/device_drivers/qlogic/qlge.rst
deleted file mode 100644
index 0b888253d152..000000000000
--- a/Documentation/networking/device_drivers/qlogic/qlge.rst
+++ /dev/null
@@ -1,118 +0,0 @@
-.. SPDX-License-Identifier: GPL-2.0
-
-=======================================
-QLogic QLGE 10Gb Ethernet device driver
-=======================================
-
-This driver use drgn and devlink for debugging.
-
-Dump kernel data structures in drgn
------------------------------------
-
-To dump kernel data structures, the following Python script can be used
-in drgn:
-
-.. code-block:: python
-
- def align(x, a):
- """the alignment a should be a power of 2
- """
- mask = a - 1
- return (x+ mask) & ~mask
-
- def struct_size(struct_type):
- struct_str = "struct {}".format(struct_type)
- return sizeof(Object(prog, struct_str, address=0x0))
-
- def netdev_priv(netdevice):
- NETDEV_ALIGN = 32
- return netdevice.value_() + align(struct_size("net_device"), NETDEV_ALIGN)
-
- name = 'xxx'
- qlge_device = None
- netdevices = prog['init_net'].dev_base_head.address_of_()
- for netdevice in list_for_each_entry("struct net_device", netdevices, "dev_list"):
- if netdevice.name.string_().decode('ascii') == name:
- print(netdevice.name)
-
- ql_adapter = Object(prog, "struct ql_adapter", address=netdev_priv(qlge_device))
-
-The struct ql_adapter will be printed in drgn as follows,
-
- >>> ql_adapter
- (struct ql_adapter){
- .ricb = (struct ricb){
- .base_cq = (u8)0,
- .flags = (u8)120,
- .mask = (__le16)26637,
- .hash_cq_id = (u8 [1024]){ 172, 142, 255, 255 },
- .ipv6_hash_key = (__le32 [10]){},
- .ipv4_hash_key = (__le32 [4]){},
- },
- .flags = (unsigned long)0,
- .wol = (u32)0,
- .nic_stats = (struct nic_stats){
- .tx_pkts = (u64)0,
- .tx_bytes = (u64)0,
- .tx_mcast_pkts = (u64)0,
- .tx_bcast_pkts = (u64)0,
- .tx_ucast_pkts = (u64)0,
- .tx_ctl_pkts = (u64)0,
- .tx_pause_pkts = (u64)0,
- ...
- },
- .active_vlans = (unsigned long [64]){
- 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 52780853100545, 18446744073709551615,
- 18446619461681283072, 0, 42949673024, 2147483647,
- },
- .rx_ring = (struct rx_ring [17]){
- {
- .cqicb = (struct cqicb){
- .msix_vect = (u8)0,
- .reserved1 = (u8)0,
- .reserved2 = (u8)0,
- .flags = (u8)0,
- .len = (__le16)0,
- .rid = (__le16)0,
- ...
- },
- .cq_base = (void *)0x0,
- .cq_base_dma = (dma_addr_t)0,
- }
- ...
- }
- }
-
-coredump via devlink
---------------------
-
-
-And the coredump obtained via devlink in json format looks like,
-
-.. code:: shell
-
- $ devlink health dump show DEVICE reporter coredump -p -j
- {
- "Core Registers": {
- "segment": 1,
- "values": [ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ]
- },
- "Test Logic Regs": {
- "segment": 2,
- "values": [ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ]
- },
- "RMII Registers": {
- "segment": 3,
- "values": [ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ]
- },
- ...
- "Sem Registers": {
- "segment": 50,
- "values": [ 0,0,0,0 ]
- }
- }
-
-When the module parameter qlge_force_coredump is set to be true, the MPI
-RISC reset before coredumping. So coredumping will much longer since
-devlink tool has to wait for 5 secs for the resetting to be
-finished.
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index 2ffc5ad10295..cb435c141794 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -55,6 +55,7 @@ Contents:
filter
generic-hdlc
generic_netlink
+ netlink_spec/index
gen_stats
gtp
ila
@@ -106,6 +107,7 @@ Contents:
sysfs-tagging
tc-actions-env-rules
tc-queue-filters
+ tcp_ao
tcp-thin
team
timestamping
diff --git a/Documentation/networking/netlink_spec/.gitignore b/Documentation/networking/netlink_spec/.gitignore
new file mode 100644
index 000000000000..30d85567b592
--- /dev/null
+++ b/Documentation/networking/netlink_spec/.gitignore
@@ -0,0 +1 @@
+*.rst
diff --git a/Documentation/networking/netlink_spec/readme.txt b/Documentation/networking/netlink_spec/readme.txt
new file mode 100644
index 000000000000..6763f99d216c
--- /dev/null
+++ b/Documentation/networking/netlink_spec/readme.txt
@@ -0,0 +1,4 @@
+SPDX-License-Identifier: GPL-2.0
+
+This file is populated during the build of the documentation (htmldocs) by the
+tools/net/ynl/ynl-gen-rst.py script.
diff --git a/Documentation/networking/smc-sysctl.rst b/Documentation/networking/smc-sysctl.rst
index 6d8acdbe9be1..a874d007f2db 100644
--- a/Documentation/networking/smc-sysctl.rst
+++ b/Documentation/networking/smc-sysctl.rst
@@ -44,18 +44,30 @@ smcr_testlink_time - INTEGER
wmem - INTEGER
Initial size of send buffer used by SMC sockets.
- The default value inherits from net.ipv4.tcp_wmem[1].
The minimum value is 16KiB and there is no hard limit for max value, but
only allowed 512KiB for SMC-R and 1MiB for SMC-D.
- Default: 16K
+ Default: 64KiB
rmem - INTEGER
Initial size of receive buffer (RMB) used by SMC sockets.
- The default value inherits from net.ipv4.tcp_rmem[1].
The minimum value is 16KiB and there is no hard limit for max value, but
only allowed 512KiB for SMC-R and 1MiB for SMC-D.
- Default: 128K
+ Default: 64KiB
+
+smcr_max_links_per_lgr - INTEGER
+ Controls the max number of links can be added to a SMC-R link group. Notice that
+ the actual number of the links added to a SMC-R link group depends on the number
+ of RDMA devices exist in the system. The acceptable value ranges from 1 to 2. Only
+ for SMC-R v2.1 and later.
+
+ Default: 2
+
+smcr_max_conns_per_lgr - INTEGER
+ Controls the max number of connections can be added to a SMC-R link group. The
+ acceptable value ranges from 16 to 255. Only for SMC-R v2.1 and later.
+
+ Default: 255
diff --git a/Documentation/networking/tcp_ao.rst b/Documentation/networking/tcp_ao.rst
new file mode 100644
index 000000000000..cfa5bf1cc542
--- /dev/null
+++ b/Documentation/networking/tcp_ao.rst
@@ -0,0 +1,444 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+========================================================
+TCP Authentication Option Linux implementation (RFC5925)
+========================================================
+
+TCP Authentication Option (TCP-AO) provides a TCP extension aimed at verifying
+segments between trusted peers. It adds a new TCP header option with
+a Message Authentication Code (MAC). MACs are produced from the content
+of a TCP segment using a hashing function with a password known to both peers.
+The intent of TCP-AO is to deprecate TCP-MD5 providing better security,
+key rotation and support for variety of hashing algorithms.
+
+1. Introduction
+===============
+
+.. table:: Short and Limited Comparison of TCP-AO and TCP-MD5
+
+ +----------------------+------------------------+-----------------------+
+ | | TCP-MD5 | TCP-AO |
+ +======================+========================+=======================+
+ |Supported hashing |MD5 |Must support HMAC-SHA1 |
+ |algorithms |(cryptographically weak)|(chosen-prefix attacks)|
+ | | |and CMAC-AES-128 (only |
+ | | |side-channel attacks). |
+ | | |May support any hashing|
+ | | |algorithm. |
+ +----------------------+------------------------+-----------------------+
+ |Length of MACs (bytes)|16 |Typically 12-16. |
+ | | |Other variants that fit|
+ | | |TCP header permitted. |
+ +----------------------+------------------------+-----------------------+
+ |Number of keys per |1 |Many |
+ |TCP connection | | |
+ +----------------------+------------------------+-----------------------+
+ |Possibility to change |Non-practical (both |Supported by protocol |
+ |an active key |peers have to change | |
+ | |them during MSL) | |
+ +----------------------+------------------------+-----------------------+
+ |Protection against |No |Yes: ignoring them |
+ |ICMP 'hard errors' | |by default on |
+ | | |established connections|
+ +----------------------+------------------------+-----------------------+
+ |Protection against |No |Yes: pseudo-header |
+ |traffic-crossing | |includes TCP ports. |
+ |attack | | |
+ +----------------------+------------------------+-----------------------+
+ |Protection against |No |Sequence Number |
+ |replayed TCP segments | |Extension (SNE) and |
+ | | |Initial Sequence |
+ | | |Numbers (ISNs) |
+ +----------------------+------------------------+-----------------------+
+ |Supports |Yes |No. ISNs+SNE are needed|
+ |Connectionless Resets | |to correctly sign RST. |
+ +----------------------+------------------------+-----------------------+
+ |Standards |RFC 2385 |RFC 5925, RFC 5926 |
+ +----------------------+------------------------+-----------------------+
+
+
+1.1 Frequently Asked Questions (FAQ) with references to RFC 5925
+----------------------------------------------------------------
+
+Q: Can either SendID or RecvID be non-unique for the same 4-tuple
+(srcaddr, srcport, dstaddr, dstport)?
+
+A: No [3.1]::
+
+ >> The IDs of MKTs MUST NOT overlap where their TCP connection
+ identifiers overlap.
+
+Q: Can Master Key Tuple (MKT) for an active connection be removed?
+
+A: No, unless it's copied to Transport Control Block (TCB) [3.1]::
+
+ It is presumed that an MKT affecting a particular connection cannot
+ be destroyed during an active connection -- or, equivalently, that
+ its parameters are copied to an area local to the connection (i.e.,
+ instantiated) and so changes would affect only new connections.
+
+Q: If an old MKT needs to be deleted, how should it be done in order
+to not remove it for an active connection? (As it can be still in use
+at any moment later)
+
+A: Not specified by RFC 5925, seems to be a problem for key management
+to ensure that no one uses such MKT before trying to remove it.
+
+Q: Can an old MKT exist forever and be used by another peer?
+
+A: It can, it's a key management task to decide when to remove an old key [6.1]::
+
+ Deciding when to start using a key is a performance issue. Deciding
+ when to remove an MKT is a security issue. Invalid MKTs are expected
+ to be removed. TCP-AO provides no mechanism to coordinate their removal,
+ as we consider this a key management operation.
+
+also [6.1]::
+
+ The only way to avoid reuse of previously used MKTs is to remove the MKT
+ when it is no longer considered permitted.
+
+Linux TCP-AO will try its best to prevent you from removing a key that's
+being used, considering it a key management failure. But sine keeping
+an outdated key may become a security issue and as a peer may
+unintentionally prevent the removal of an old key by always setting
+it as RNextKeyID - a forced key removal mechanism is provided, where
+userspace has to supply KeyID to use instead of the one that's being removed
+and the kernel will atomically delete the old key, even if the peer is
+still requesting it. There are no guarantees for force-delete as the peer
+may yet not have the new key - the TCP connection may just break.
+Alternatively, one may choose to shut down the socket.
+
+Q: What happens when a packet is received on a new connection with no known
+MKT's RecvID?
+
+A: RFC 5925 specifies that by default it is accepted with a warning logged, but
+the behaviour can be configured by the user [7.5.1.a]::
+
+ If the segment is a SYN, then this is the first segment of a new
+ connection. Find the matching MKT for this segment, using the segment's
+ socket pair and its TCP-AO KeyID, matched against the MKT's TCP connection
+ identifier and the MKT's RecvID.
+
+ i. If there is no matching MKT, remove TCP-AO from the segment.
+ Proceed with further TCP handling of the segment.
+ NOTE: this presumes that connections that do not match any MKT
+ should be silently accepted, as noted in Section 7.3.
+
+[7.3]::
+
+ >> A TCP-AO implementation MUST allow for configuration of the behavior
+ of segments with TCP-AO but that do not match an MKT. The initial default
+ of this configuration SHOULD be to silently accept such connections.
+ If this is not the desired case, an MKT can be included to match such
+ connections, or the connection can indicate that TCP-AO is required.
+ Alternately, the configuration can be changed to discard segments with
+ the AO option not matching an MKT.
+
+[10.2.b]::
+
+ Connections not matching any MKT do not require TCP-AO. Further, incoming
+ segments with TCP-AO are not discarded solely because they include
+ the option, provided they do not match any MKT.
+
+Note that Linux TCP-AO implementation differs in this aspect. Currently, TCP-AO
+segments with unknown key signatures are discarded with warnings logged.
+
+Q: Does the RFC imply centralized kernel key management in any way?
+(i.e. that a key on all connections MUST be rotated at the same time?)
+
+A: Not specified. MKTs can be managed in userspace, the only relevant part to
+key changes is [7.3]::
+
+ >> All TCP segments MUST be checked against the set of MKTs for matching
+ TCP connection identifiers.
+
+Q: What happens when RNextKeyID requested by a peer is unknown? Should
+the connection be reset?
+
+A: It should not, no action needs to be performed [7.5.2.e]::
+
+ ii. If they differ, determine whether the RNextKeyID MKT is ready.
+
+ 1. If the MKT corresponding to the segment’s socket pair and RNextKeyID
+ is not available, no action is required (RNextKeyID of a received
+ segment needs to match the MKT’s SendID).
+
+Q: How current_key is set and when does it change? It is a user-triggered
+change, or is it by a request from the remote peer? Is it set by the user
+explicitly, or by a matching rule?
+
+A: current_key is set by RNextKeyID [6.1]::
+
+ Rnext_key is changed only by manual user intervention or MKT management
+ protocol operation. It is not manipulated by TCP-AO. Current_key is updated
+ by TCP-AO when processing received TCP segments as discussed in the segment
+ processing description in Section 7.5. Note that the algorithm allows
+ the current_key to change to a new MKT, then change back to a previously
+ used MKT (known as "backing up"). This can occur during an MKT change when
+ segments are received out of order, and is considered a feature of TCP-AO,
+ because reordering does not result in drops.
+
+[7.5.2.e.ii]::
+
+ 2. If the matching MKT corresponding to the segment’s socket pair and
+ RNextKeyID is available:
+
+ a. Set current_key to the RNextKeyID MKT.
+
+Q: If both peers have multiple MKTs matching the connection's socket pair
+(with different KeyIDs), how should the sender/receiver pick KeyID to use?
+
+A: Some mechanism should pick the "desired" MKT [3.3]::
+
+ Multiple MKTs may match a single outgoing segment, e.g., when MKTs
+ are being changed. Those MKTs cannot have conflicting IDs (as noted
+ elsewhere), and some mechanism must determine which MKT to use for each
+ given outgoing segment.
+
+ >> An outgoing TCP segment MUST match at most one desired MKT, indicated
+ by the segment’s socket pair. The segment MAY match multiple MKTs, provided
+ that exactly one MKT is indicated as desired. Other information in
+ the segment MAY be used to determine the desired MKT when multiple MKTs
+ match; such information MUST NOT include values in any TCP option fields.
+
+Q: Can TCP-MD5 connection migrate to TCP-AO (and vice-versa):
+
+A: No [1]::
+
+ TCP MD5-protected connections cannot be migrated to TCP-AO because TCP MD5
+ does not support any changes to a connection’s security algorithm
+ once established.
+
+Q: If all MKTs are removed on a connection, can it become a non-TCP-AO signed
+connection?
+
+A: [7.5.2] doesn't have the same choice as SYN packet handling in [7.5.1.i]
+that would allow accepting segments without a sign (which would be insecure).
+While switching to non-TCP-AO connection is not prohibited directly, it seems
+what the RFC means. Also, there's a requirement for TCP-AO connections to
+always have one current_key [3.3]::
+
+ TCP-AO requires that every protected TCP segment match exactly one MKT.
+
+[3.3]::
+
+ >> An incoming TCP segment including TCP-AO MUST match exactly one MKT,
+ indicated solely by the segment’s socket pair and its TCP-AO KeyID.
+
+[4.4]::
+
+ One or more MKTs. These are the MKTs that match this connection’s
+ socket pair.
+
+Q: Can a non-TCP-AO connection become a TCP-AO-enabled one?
+
+A: No: for already established non-TCP-AO connection it would be impossible
+to switch using TCP-AO as the traffic key generation requires the initial
+sequence numbers. Paraphrasing, starting using TCP-AO would require
+re-establishing the TCP connection.
+
+2. In-kernel MKTs database vs database in userspace
+===================================================
+
+Linux TCP-AO support is implemented using ``setsockopt()s``, in a similar way
+to TCP-MD5. It means that a userspace application that wants to use TCP-AO
+should perform ``setsockopt()`` on a TCP socket when it wants to add,
+remove or rotate MKTs. This approach moves the key management responsibility
+to userspace as well as decisions on corner cases, i.e. what to do if
+the peer doesn't respect RNextKeyID; moving more code to userspace, especially
+responsible for the policy decisions. Besides, it's flexible and scales well
+(with less locking needed than in the case of an in-kernel database). One also
+should keep in mind that mainly intended users are BGP processes, not any
+random applications, which means that compared to IPsec tunnels,
+no transparency is really needed and modern BGP daemons already have
+``setsockopt()s`` for TCP-MD5 support.
+
+.. table:: Considered pros and cons of the approaches
+
+ +----------------------+------------------------+-----------------------+
+ | | ``setsockopt()`` | in-kernel DB |
+ +======================+========================+=======================+
+ | Extendability | ``setsockopt()`` | Netlink messages are |
+ | | commands should be | simple and extendable |
+ | | extendable syscalls | |
+ +----------------------+------------------------+-----------------------+
+ | Required userspace | BGP or any application | could be transparent |
+ | changes | that wants TCP-AO needs| as tunnels, providing |
+ | | to perform | something like |
+ | | ``setsockopt()s`` | ``ip tcpao add key`` |
+ | | and do key management | (delete/show/rotate) |
+ +----------------------+------------------------+-----------------------+
+ |MKTs removal or adding| harder for userspace | harder for kernel |
+ +----------------------+------------------------+-----------------------+
+ | Dump-ability | ``getsockopt()`` | Netlink .dump() |
+ | | | callback |
+ +----------------------+------------------------+-----------------------+
+ | Limits on kernel | equal |
+ | resources/memory | |
+ +----------------------+------------------------+-----------------------+
+ | Scalability | contention on | contention on |
+ | | ``TCP_LISTEN`` sockets | the whole database |
+ +----------------------+------------------------+-----------------------+
+ | Monitoring & warnings| ``TCP_DIAG`` | same Netlink socket |
+ +----------------------+------------------------+-----------------------+
+ | Matching of MKTs | half-problem: only | hard |
+ | | listen sockets | |
+ +----------------------+------------------------+-----------------------+
+
+
+3. uAPI
+=======
+
+Linux provides a set of ``setsockopt()s`` and ``getsockopt()s`` that let
+userspace manage TCP-AO on a per-socket basis. In order to add/delete MKTs
+``TCP_AO_ADD_KEY`` and ``TCP_AO_DEL_KEY`` TCP socket options must be used
+It is not allowed to add a key on an established non-TCP-AO connection
+as well as to remove the last key from TCP-AO connection.
+
+``setsockopt(TCP_AO_DEL_KEY)`` command may specify ``tcp_ao_del::current_key``
++ ``tcp_ao_del::set_current`` and/or ``tcp_ao_del::rnext``
++ ``tcp_ao_del::set_rnext`` which makes such delete "forced": it
+provides userspace a way to delete a key that's being used and atomically set
+another one instead. This is not intended for normal use and should be used
+only when the peer ignores RNextKeyID and keeps requesting/using an old key.
+It provides a way to force-delete a key that's not trusted but may break
+the TCP-AO connection.
+
+The usual/normal key-rotation can be performed with ``setsockopt(TCP_AO_INFO)``.
+It also provides a uAPI to change per-socket TCP-AO settings, such as
+ignoring ICMPs, as well as clear per-socket TCP-AO packet counters.
+The corresponding ``getsockopt(TCP_AO_INFO)`` can be used to get those
+per-socket TCP-AO settings.
+
+Another useful command is ``getsockopt(TCP_AO_GET_KEYS)``. One can use it
+to list all MKTs on a TCP socket or use a filter to get keys for a specific
+peer and/or sndid/rcvid, VRF L3 interface or get current_key/rnext_key.
+
+To repair TCP-AO connections ``setsockopt(TCP_AO_REPAIR)`` is available,
+provided that the user previously has checkpointed/dumped the socket with
+``getsockopt(TCP_AO_REPAIR)``.
+
+A tip here for scaled TCP_LISTEN sockets, that may have some thousands TCP-AO
+keys, is: use filters in ``getsockopt(TCP_AO_GET_KEYS)`` and asynchronous
+delete with ``setsockopt(TCP_AO_DEL_KEY)``.
+
+Linux TCP-AO also provides a bunch of segment counters that can be helpful
+with troubleshooting/debugging issues. Every MKT has good/bad counters
+that reflect how many packets passed/failed verification.
+Each TCP-AO socket has the following counters:
+- for good segments (properly signed)
+- for bad segments (failed TCP-AO verification)
+- for segments with unknown keys
+- for segments where an AO signature was expected, but wasn't found
+- for the number of ignored ICMPs
+
+TCP-AO per-socket counters are also duplicated with per-netns counters,
+exposed with SNMP. Those are ``TCPAOGood``, ``TCPAOBad``, ``TCPAOKeyNotFound``,
+``TCPAORequired`` and ``TCPAODroppedIcmps``.
+
+RFC 5925 very permissively specifies how TCP port matching can be done for
+MKTs::
+
+ TCP connection identifier. A TCP socket pair, i.e., a local IP
+ address, a remote IP address, a TCP local port, and a TCP remote port.
+ Values can be partially specified using ranges (e.g., 2-30), masks
+ (e.g., 0xF0), wildcards (e.g., "*"), or any other suitable indication.
+
+Currently Linux TCP-AO implementation doesn't provide any TCP port matching.
+Probably, port ranges are the most flexible for uAPI, but so far
+not implemented.
+
+4. ``setsockopt()`` vs ``accept()`` race
+========================================
+
+In contrast with TCP-MD5 established connection which has just one key,
+TCP-AO connections may have many keys, which means that accepted connections
+on a listen socket may have any amount of keys as well. As copying all those
+keys on a first properly signed SYN would make the request socket bigger, that
+would be undesirable. Currently, the implementation doesn't copy keys
+to request sockets, but rather look them up on the "parent" listener socket.
+
+The result is that when userspace removes TCP-AO keys, that may break
+not-yet-established connections on request sockets as well as not removing
+keys from sockets that were already established, but not yet ``accept()``'ed,
+hanging in the accept queue.
+
+The reverse is valid as well: if userspace adds a new key for a peer on
+a listener socket, the established sockets in accept queue won't
+have the new keys.
+
+At this moment, the resolution for the two races:
+``setsockopt(TCP_AO_ADD_KEY)`` vs ``accept()``
+and ``setsockopt(TCP_AO_DEL_KEY)`` vs ``accept()`` is delegated to userspace.
+This means that it's expected that userspace would check the MKTs on the socket
+that was returned by ``accept()`` to verify that any key rotation that
+happened on listen socket is reflected on the newly established connection.
+
+This is a similar "do-nothing" approach to TCP-MD5 from the kernel side and
+may be changed later by introducing new flags to ``tcp_ao_add``
+and ``tcp_ao_del``.
+
+Note that this race is rare for it needs TCP-AO key rotation to happen
+during the 3-way handshake for the new TCP connection.
+
+5. Interaction with TCP-MD5
+===========================
+
+A TCP connection can not migrate between TCP-AO and TCP-MD5 options. The
+established sockets that have either AO or MD5 keys are restricted for
+adding keys of the other option.
+
+For listening sockets the picture is different: BGP server may want to receive
+both TCP-AO and (deprecated) TCP-MD5 clients. As a result, both types of keys
+may be added to TCP_CLOSED or TCP_LISTEN sockets. It's not allowed to add
+different types of keys for the same peer.
+
+6. SNE Linux implementation
+===========================
+
+RFC 5925 [6.2] describes the algorithm of how to extend TCP sequence numbers
+with SNE. In short: TCP has to track the previous sequence numbers and set
+sne_flag when the current SEQ number rolls over. The flag is cleared when
+both current and previous SEQ numbers cross 0x7fff, which is 32Kb.
+
+In times when sne_flag is set, the algorithm compares SEQ for each packet with
+0x7fff and if it's higher than 32Kb, it assumes that the packet should be
+verified with SNE before the increment. As a result, there's
+this [0; 32Kb] window, when packets with (SNE - 1) can be accepted.
+
+Linux implementation simplifies this a bit: as the network stack already tracks
+the first SEQ byte that ACK is wanted for (snd_una) and the next SEQ byte that
+is wanted (rcv_nxt) - that's enough information for a rough estimation
+on where in the 4GB SEQ number space both sender and receiver are.
+When they roll over to zero, the corresponding SNE gets incremented.
+
+tcp_ao_compute_sne() is called for each TCP-AO segment. It compares SEQ numbers
+from the segment with snd_una or rcv_nxt and fits the result into a 2GB window around them,
+detecting SEQ numbers rolling over. That simplifies the code a lot and only
+requires SNE numbers to be stored on every TCP-AO socket.
+
+The 2GB window at first glance seems much more permissive compared to
+RFC 5926. But that is only used to pick the correct SNE before/after
+a rollover. It allows more TCP segment replays, but yet all regular
+TCP checks in tcp_sequence() are applied on the verified segment.
+So, it trades a bit more permissive acceptance of replayed/retransmitted
+segments for the simplicity of the algorithm and what seems better behaviour
+for large TCP windows.
+
+7. Links
+========
+
+RFC 5925 The TCP Authentication Option
+ https://www.rfc-editor.org/rfc/pdfrfc/rfc5925.txt.pdf
+
+RFC 5926 Cryptographic Algorithms for the TCP Authentication Option (TCP-AO)
+ https://www.rfc-editor.org/rfc/pdfrfc/rfc5926.txt.pdf
+
+Draft "SHA-2 Algorithm for the TCP Authentication Option (TCP-AO)"
+ https://datatracker.ietf.org/doc/html/draft-nayak-tcp-sha2-03
+
+RFC 2385 Protection of BGP Sessions via the TCP MD5 Signature Option
+ https://www.rfc-editor.org/rfc/pdfrfc/rfc2385.txt.pdf
+
+:Author: Dmitry Safonov <dima@arista.com>