summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2021-08-26Merge tag 'mac80211-next-for-net-next-2021-08-26' of ↵David S. Miller9-3/+441
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== A few more things: * Use correct DFS domain for self-managed devices * some preparations for transmit power element handling and other 6 GHz regulatory handling * TWT support in AP mode in mac80211 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-26sock: remove one redundant SKB_FRAG_PAGE_ORDER macroYunsheng Lin1-1/+0
Both SKB_FRAG_PAGE_ORDER are defined to the same value in net/core/sock.c and drivers/vhost/net.c. Move the SKB_FRAG_PAGE_ORDER definition to net/core/sock.h, as both net/core/sock.c and drivers/vhost/net.c include it, and it seems a reasonable file to put the macro. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-26cfg80211: use wiphy DFS domain if it is self-managedSriram R1-1/+8
Currently during CAC start or other radar events, the DFS domain is fetched from cfg based on global DFS domain, even if the wiphy regdomain disagrees. But this could be different in case of self managed wiphy's in case the self managed driver updates its database or supports regions which has DFS domain set to UNSET in cfg80211 local regdomain. So for explicitly self-managed wiphys, just use their DFS domain. Signed-off-by: Sriram R <srirrama@codeaurora.org> Link: https://lore.kernel.org/r/1629934730-16388-1-git-send-email-srirrama@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-08-26mac80211: parse transmit power envelope elementWen Gong2-0/+15
Parse and store the transmit power envelope element. Signed-off-by: Wen Gong <wgong@codeaurora.org> Link: https://lore.kernel.org/r/20210820122041.12157-8-wgong@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-08-25net: dsa: tag_sja1105: stop asking the sja1105 driver in sja1105_xmit_tpidVladimir Oltean1-4/+34
Introduced in commit 38b5beeae7a4 ("net: dsa: sja1105: prepare tagger for handling DSA tags and VLAN simultaneously"), the sja1105_xmit_tpid function solved quite a different problem than our needs are now. Then, we used best-effort VLAN filtering and we were using the xmit_tpid to tunnel packets coming from an 8021q upper through the TX VLAN allocated by tag_8021q to that egress port. The need for a different VLAN protocol depending on switch revision came from the fact that this in itself was more of a hack to trick the hardware into accepting tunneled VLANs in the first place. Right now, we deny 8021q uppers (see sja1105_prechangeupper). Even if we supported them again, we would not do that using the same method of {tunneling the VLAN on egress, retagging the VLAN on ingress} that we had in the best-effort VLAN filtering mode. It seems rather simpler that we just allocate a VLAN in the VLAN table that is simply not used by the bridge at all, or by any other port. Anyway, I have 2 gripes with the current sja1105_xmit_tpid: 1. When sending packets on behalf of a VLAN-aware bridge (with the new TX forwarding offload framework) plus untagged (with the tag_8021q VLAN added by the tagger) packets, we can see that on SJA1105P/Q/R/S and later (which have a qinq_tpid of ETH_P_8021AD), some packets sent through the DSA master have a VLAN protocol of 0x8100 and others of 0x88a8. This is strange and there is no reason for it now. If we have a bridge and are therefore forced to send using that bridge's TPID, we can as well blend with that bridge's VLAN protocol for all packets. 2. The sja1105_xmit_tpid introduces a dependency on the sja1105 driver, because it looks inside dp->priv. It is desirable to keep as much separation between taggers and switch drivers as possible. Now it doesn't do that anymore. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-25net: dsa: sja1105: drop untagged packets on the CPU and DSA portsVladimir Oltean1-1/+40
The sja1105 driver is a bit special in its use of VLAN headers as DSA tags. This is because in VLAN-aware mode, the VLAN headers use an actual TPID of 0x8100, which is understood even by the DSA master as an actual VLAN header. Furthermore, control packets such as PTP and STP are transmitted with no VLAN header as a DSA tag, because, depending on switch generation, there are ways to steer these control packets towards a precise egress port other than VLAN tags. Transmitting control packets as untagged means leaving a door open for traffic in general to be transmitted as untagged from the DSA master, and for it to traverse the switch and exit a random switch port according to the FDB lookup. This behavior is a bit out of line with other DSA drivers which have native support for DSA tagging. There, it is to be expected that the switch only accepts DSA-tagged packets on its CPU port, dropping everything that does not match this pattern. We perhaps rely a bit too much on the switches' hardware dropping on the CPU port, and place no other restrictions in the kernel data path to avoid that. For example, sja1105 is also a bit special in that STP/PTP packets are transmitted using "management routes" (sja1105_port_deferred_xmit): when sending a link-local packet from the CPU, we must first write a SPI message to the switch to tell it to expect a packet towards multicast MAC DA 01-80-c2-00-00-0e, and to route it towards port 3 when it gets it. This entry expires as soon as it matches a packet received by the switch, and it needs to be reinstalled for the next packet etc. All in all quite a ghetto mechanism, but it is all that the sja1105 switches offer for injecting a control packet. The driver takes a mutex for serializing control packets and making the pairs of SPI writes of a management route and its associated skb atomic, but to be honest, a mutex is only relevant as long as all parties agree to take it. With the DSA design, it is possible to open an AF_PACKET socket on the DSA master net device, and blast packets towards 01-80-c2-00-00-0e, and whatever locking the DSA switch driver might use, it all goes kaput because management routes installed by the driver will match skbs sent by the DSA master, and not skbs generated by the driver itself. So they will end up being routed on the wrong port. So through the lens of that, maybe it would make sense to avoid that from happening by doing something in the network stack, like: introduce a new bit in struct sk_buff, like xmit_from_dsa. Then, somewhere around dev_hard_start_xmit(), introduce the following check: if (netdev_uses_dsa(dev) && !skb->xmit_from_dsa) kfree_skb(skb); Ok, maybe that is a bit drastic, but that would at least prevent a bunch of problems. For example, right now, even though the majority of DSA switches drop packets without DSA tags sent by the DSA master (and therefore the majority of garbage that user space daemons like avahi and udhcpcd and friends create), it is still conceivable that an aggressive user space program can open an AF_PACKET socket and inject a spoofed DSA tag directly on the DSA master. We have no protection against that; the packet will be understood by the switch and be routed wherever user space says. Furthermore: there are some DSA switches where we even have register access over Ethernet, using DSA tags. So even user space drivers are possible in this way. This is a huge hole. However, the biggest thing that bothers me is that udhcpcd attempts to ask for an IP address on all interfaces by default, and with sja1105, it will attempt to get a valid IP address on both the DSA master as well as on sja1105 switch ports themselves. So with IP addresses in the same subnet on multiple interfaces, the routing table will be messed up and the system will be unusable for traffic until it is configured manually to not ask for an IP address on the DSA master itself. It turns out that it is possible to avoid that in the sja1105 driver, at least very superficially, by requesting the switch to drop VLAN-untagged packets on the CPU port. With the exception of control packets, all traffic originated from tag_sja1105.c is already VLAN-tagged, so only STP and PTP packets need to be converted. For that, we need to uphold the equivalence between an untagged and a pvid-tagged packet, and to remember that the CPU port of sja1105 uses a pvid of 4095. Now that we drop untagged traffic on the CPU port, non-aggressive user space applications like udhcpcd stop bothering us, and sja1105 effectively becomes just as vulnerable to the aggressive kind of user space programs as other DSA switches are (ok, users can also create 8021q uppers on top of the DSA master in the case of sja1105, but in future patches we can easily deny that, but it still doesn't change the fact that VLAN-tagged packets can still be injected over raw sockets). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-25mptcp: add the mibs for MP_FAILGeliang Tang4-0/+6
This patch added the mibs for MP_FAIL: MPTCP_MIB_MPFAILTX and MPTCP_MIB_MPFAILRX. Signed-off-by: Geliang Tang <geliangtang@xiaomi.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-25mptcp: send out MP_FAIL when data checksum failsGeliang Tang2-0/+28
When a bad checksum is detected, set the send_mp_fail flag to send out the MP_FAIL option. Add a new function mptcp_has_another_subflow() to check whether there's only a single subflow. When multiple subflows are in use, close the affected subflow with a RST that includes an MP_FAIL option and discard the data with the bad checksum. Set the sk_state of the subsocket to TCP_CLOSE, then the flag MPTCP_WORK_CLOSE_SUBFLOW will be set in subflow_sched_work_if_closed, and the subflow will be closed. When a single subfow is in use, temporarily handled by sending MP_FAIL with a RST too. Signed-off-by: Geliang Tang <geliangtang@xiaomi.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-25mptcp: MP_FAIL suboption receivingGeliang Tang3-0/+24
This patch added handling for receiving MP_FAIL suboption. Add a new members mp_fail and fail_seq in struct mptcp_options_received. When MP_FAIL suboption is received, set mp_fail to 1 and save the sequence number to fail_seq. Then invoke mptcp_pm_mp_fail_received to deal with the MP_FAIL suboption. Signed-off-by: Geliang Tang <geliangtang@xiaomi.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-25mptcp: MP_FAIL suboption sendingGeliang Tang2-4/+58
This patch added the MP_FAIL suboption sending support. Add a new flag named send_mp_fail in struct mptcp_subflow_context. If this flag is set, send out MP_FAIL suboption. Add a new member fail_seq in struct mptcp_out_options to save the data sequence number to put into the MP_FAIL suboption. An MP_FAIL option could be included in a RST or on the subflow-level ACK. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@xiaomi.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-25mptcp: optimize out option generationPaolo Abeni2-109/+121
Currently we have several protocol constraints on MPTCP options generation (e.g. MPC and MPJ subopt are mutually exclusive) and some additional ones required by our implementation (e.g. almost all ADD_ADDR variant are mutually exclusive with everything else). We can leverage the above to optimize the out option generation: we check DSS/MPC/MPJ presence in a mutually exclusive way, avoiding many unneeded conditionals in the common cases. Additionally extend the existing constraints on ADD_ADDR opt on all subvariants, so that it becomes fully mutually exclusive with the above and we can skip another conditional statement for the common case. This change is also needed by the next patch. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-25net-next: When a bond have a massive amount of VLANs with IPv6 addresses, ↵Gilad Naaman1-46/+98
performance of changing link state, attaching a VRF, changing an IPv6 address, etc. go down dramtically. The source of most of the slow down is the `dev_addr_lists.c` module, which mainatins a linked list of HW addresses. When using IPv6, this list grows for each IPv6 address added on a VLAN, since each IPv6 address has a multicast HW address associated with it. When performing any modification to the involved links, this list is traversed many times, often for nothing, all while holding the RTNL lock. Instead, this patch adds an auxilliary rbtree which cuts down traversal time significantly. Performance can be seen with the following script: #!/bin/bash ip netns del test || true 2>/dev/null ip netns add test echo 1 | ip netns exec test tee /proc/sys/net/ipv6/conf/all/keep_addr_on_down > /dev/null set -e ip -n test link add foo type veth peer name bar ip -n test link add b1 type bond ip -n test link add florp type vrf table 10 ip -n test link set bar master b1 ip -n test link set foo up ip -n test link set bar up ip -n test link set b1 up ip -n test link set florp up VLAN_COUNT=1500 BASE_DEV=b1 echo Creating vlans ip netns exec test time -p bash -c "for i in \$(seq 1 $VLAN_COUNT); do ip -n test link add link $BASE_DEV name foo.\$i type vlan id \$i; done" echo Bringing them up ip netns exec test time -p bash -c "for i in \$(seq 1 $VLAN_COUNT); do ip -n test link set foo.\$i up; done" echo Assiging IPv6 Addresses ip netns exec test time -p bash -c "for i in \$(seq 1 $VLAN_COUNT); do ip -n test address add dev foo.\$i 2000::\$i/64; done" echo Attaching to VRF ip netns exec test time -p bash -c "for i in \$(seq 1 $VLAN_COUNT); do ip -n test link set foo.\$i master florp; done" On an Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz machine, the performance before the patch is (truncated): Creating vlans real 108.35 Bringing them up real 4.96 Assiging IPv6 Addresses real 19.22 Attaching to VRF real 458.84 After the patch: Creating vlans real 5.59 Bringing them up real 5.07 Assiging IPv6 Addresses real 5.64 Attaching to VRF real 25.37 Cc: David S. Miller <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Lu Wei <luwei32@huawei.com> Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com> Cc: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Gilad Naaman <gnaaman@drivenets.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-25net: bridge: change return type of br_handle_ingress_vlan_tunnelKangmin Park3-16/+11
br_handle_ingress_vlan_tunnel() is only referenced in br_handle_frame(). If br_handle_ingress_vlan_tunnel() is called and return non-zero value, goto drop in br_handle_frame(). But, br_handle_ingress_vlan_tunnel() always return 0. So, the routines that check the return value and goto drop has no meaning. Therefore, change return type of br_handle_ingress_vlan_tunnel() to void and remove if statement of br_handle_frame(). Signed-off-by: Kangmin Park <l4stpr0gr4m@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20210823102118.17966-1-l4stpr0gr4m@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-24ethtool: extend coalesce setting uAPI with CQE modeYufeng Mo2-6/+19
In order to support more coalesce parameters through netlink, add two new parameter kernel_coal and extack for .set_coalesce and .get_coalesce, then some extra info can return to user with the netlink API. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-24ethtool: add two coalesce attributes for CQE modeYufeng Mo2-3/+18
Currently, there are many drivers who support CQE mode configuration, some configure it as a fixed when initialized, some provide an interface to change it by ethtool private flags. In order to make it more generic, add two new 'ETHTOOL_A_COALESCE_USE_CQE_TX' and 'ETHTOOL_A_COALESCE_USE_CQE_RX' coalesce attributes, then these parameters can be accessed by ethtool netlink coalesce uAPI. Also add an new structure kernel_ethtool_coalesce, then the new parameter can be added into this struct. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-24page_pool: use relaxed atomic for release side accountingYunsheng Lin1-1/+1
There is no need to synchronize the account updating, so use the relaxed atomic to avoid some memory barrier in the data path. Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-24net: dsa: let drivers state that they need VLAN filtering while standaloneVladimir Oltean2-9/+24
As explained in commit e358bef7c392 ("net: dsa: Give drivers the chance to veto certain upper devices"), the hellcreek driver uses some tricks to comply with the network stack expectations: it enforces port separation in standalone mode using VLANs. For untagged traffic, bridging between ports is prevented by using different PVIDs, and for VLAN-tagged traffic, it never accepts 8021q uppers with the same VID on two ports, so packets with one VLAN cannot leak from one port to another. That is almost fine*, and has worked because hellcreek relied on an implicit behavior of the DSA core that was changed by the previous patch: the standalone ports declare the 'rx-vlan-filter' feature as 'on [fixed]'. Since most of the DSA drivers are actually VLAN-unaware in standalone mode, that feature was actually incorrectly reflecting the hardware/driver state, so there was a desire to fix it. This leaves the hellcreek driver in a situation where it has to explicitly request this behavior from the DSA framework. We configure the ports as follows: - Standalone: 'rx-vlan-filter' is on. An 8021q upper on top of a standalone hellcreek port will go through dsa_slave_vlan_rx_add_vid and will add a VLAN to the hardware tables, giving the driver the opportunity to refuse it through .port_prechangeupper. - Bridged with vlan_filtering=0: 'rx-vlan-filter' is off. An 8021q upper on top of a bridged hellcreek port will not go through dsa_slave_vlan_rx_add_vid, because there will not be any attempt to offload this VLAN. The driver already disables VLAN awareness, so that upper should receive the traffic it needs. - Bridged with vlan_filtering=1: 'rx-vlan-filter' is on. An 8021q upper on top of a bridged hellcreek port will call dsa_slave_vlan_rx_add_vid, and can again be vetoed through .port_prechangeupper. *It is not actually completely fine, because if I follow through correctly, we can have the following situation: ip link add br0 type bridge vlan_filtering 0 ip link set lan0 master br0 # lan0 now becomes VLAN-unaware ip link set lan0 nomaster # lan0 fails to become VLAN-aware again, therefore breaking isolation This patch fixes that corner case by extending the DSA core logic, based on this requested attribute, to change the VLAN awareness state of the switch (port) when it leaves the bridge. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-24net: dsa: don't advertise 'rx-vlan-filter' when not neededVladimir Oltean3-5/+111
There have been multiple independent reports about dsa_slave_vlan_rx_add_vid being called (and consequently calling the drivers' .port_vlan_add) when it isn't needed, and sometimes (not always) causing problems in the process. Case 1: mv88e6xxx_port_vlan_prepare is stubborn and only accepts VLANs on bridged ports. That is understandably so, because standalone mv88e6xxx ports are VLAN-unaware, and VTU entries are said to be a scarce resource. Otherwise said, the following fails lamentably on mv88e6xxx: ip link add br0 type bridge vlan_filtering 1 ip link set lan3 master br0 ip link add link lan10 name lan10.1 type vlan id 1 [485256.724147] mv88e6085 d0032004.mdio-mii:12: p10: hw VLAN 1 already used by port 3 in br0 RTNETLINK answers: Operation not supported This has become a worse issue since commit 9b236d2a69da ("net: dsa: Advertise the VLAN offload netdev ability only if switch supports it"). Up to that point, the driver was returning -EOPNOTSUPP and DSA was reconverting that error to 0, making the 8021q upper think all is ok (but obviously the error message was there even prior to this change). After that change the -EOPNOTSUPP is propagated to vlan_vid_add, and it is a hard error. Case 2: Ports that don't offload the Linux bridge (have a dp->bridge_dev = NULL because they don't implement .port_bridge_{join,leave}). Understandably, a standalone port should not offload VLANs either, it should remain VLAN unaware and any VLAN should be a software VLAN (as long as the hardware is not quirky, that is). In fact, dsa_slave_port_obj_add does do the right thing and rejects switchdev VLAN objects coming from the bridge when that bridge is not offloaded: case SWITCHDEV_OBJ_ID_PORT_VLAN: if (!dsa_port_offloads_bridge_port(dp, obj->orig_dev)) return -EOPNOTSUPP; err = dsa_slave_vlan_add(dev, obj, extack); But it seems that the bridge is able to trick us. The __vlan_vid_add from br_vlan.c has: /* Try switchdev op first. In case it is not supported, fallback to * 8021q add. */ err = br_switchdev_port_vlan_add(dev, v->vid, flags, extack); if (err == -EOPNOTSUPP) return vlan_vid_add(dev, br->vlan_proto, v->vid); So it says "no, no, you need this VLAN in your life!". And we, naive as we are, say "oh, this comes from the vlan_vid_add code path, it must be an 8021q upper, sure, I'll take that". And we end up with that bridge VLAN installed on our port anyway. But this time, it has the wrong flags: if the bridge was trying to install VLAN 1 as a pvid/untagged VLAN, failed via switchdev, retried via vlan_vid_add, we have this comment: /* This API only allows programming tagged, non-PVID VIDs */ So what we do makes absolutely no sense. Backtracing a bit, we see the common pattern. We allow the network stack to think that our standalone ports are VLAN-aware, but they aren't, for the vast majority of switches. The quirky ones should not dictate the norm. The dsa_slave_vlan_rx_add_vid and dsa_slave_vlan_rx_kill_vid methods exist for drivers that need the 'rx-vlan-filter: on' feature in ethtool -k, which can be due to any of the following reasons: 1. vlan_filtering_is_global = true, and some ports are under a VLAN-aware bridge while others are standalone, and the standalone ports would otherwise drop VLAN-tagged traffic. This is described in commit 061f6a505ac3 ("net: dsa: Add ndo_vlan_rx_{add, kill}_vid implementation"). 2. the ports that are under a VLAN-aware bridge should also set this feature, for 8021q uppers having a VID not claimed by the bridge. In this case, the driver will essentially not even know that the VID is coming from the 8021q layer and not the bridge. 3. Hellcreek. This driver needs it because in standalone mode, it uses unique VLANs per port to ensure separation. For separation of untagged traffic, it uses different PVIDs for each port, and for separation of VLAN-tagged traffic, it never accepts 8021q uppers with the same vid on two ports. If a driver does not fall under any of the above 3 categories, there is no reason why it should advertise the 'rx-vlan-filter' feature, therefore no reason why it should offload the VLANs added through vlan_vid_add. This commit fixes the problem by removing the 'rx-vlan-filter' feature from the slave devices when they operate in standalone mode, and when they offload a VLAN-unaware bridge. The way it works is that vlan_vid_add will now stop its processing here: vlan_add_rx_filter_info: if (!vlan_hw_filter_capable(dev, proto)) return 0; So the VLAN will still be saved in the interface's VLAN RX filtering list, but because it does not declare VLAN filtering in its features, the 8021q module will return zero without committing that VLAN to hardware. This gives the drivers what they want, since it keeps the 8021q VLANs away from the VLAN table until VLAN awareness is enabled (point at which the ports are no longer standalone, hence in the mv88e6xxx case, the check in mv88e6xxx_port_vlan_prepare passes). Since the issue predates the existence of the hellcreek driver, case 3 will be dealt with in a separate patch. The main change that this patch makes is to no longer set NETIF_F_HW_VLAN_CTAG_FILTER unconditionally, but toggle it dynamically (for most switches, never). The second part of the patch addresses an issue that the first part introduces: because the 'rx-vlan-filter' feature is now dynamically toggled, and our .ndo_vlan_rx_add_vid does not get called when 'rx-vlan-filter' is off, we need to avoid bugs such as the following by replaying the VLANs from 8021q uppers every time we enable VLAN filtering: ip link add link lan0 name lan0.100 type vlan id 100 ip addr add 192.168.100.1/24 dev lan0.100 ping 192.168.100.2 # should work ip link add br0 type bridge vlan_filtering 0 ip link set lan0 master br0 ping 192.168.100.2 # should still work ip link set br0 type bridge vlan_filtering 1 ping 192.168.100.2 # should still work but doesn't As reported by Florian, some drivers look at ds->vlan_filtering in their .port_vlan_add() implementation. So this patch also makes sure that ds->vlan_filtering is committed before calling the driver. This is the reason why it is first committed, then restored on the failure path. Reported-by: Tobias Waldekranz <tobias@waldekranz.com> Reported-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-24net: dsa: properly fall back to software bridgingVladimir Oltean2-2/+9
If the driver does not implement .port_bridge_{join,leave}, then we must fall back to standalone operation on that port, and trigger the error path of dsa_port_bridge_join. This sets dp->bridge_dev = NULL. In turn, having a non-NULL dp->bridge_dev when there is no offloading support makes the following things go wrong: - dsa_default_offload_fwd_mark make the wrong decision in setting skb->offload_fwd_mark. It should set skb->offload_fwd_mark = 0 for ports that don't offload the bridge, which should instruct the bridge to forward in software. But this does not happen, dp->bridge_dev is incorrectly set to point to the bridge, so the bridge is told that packets have been forwarded in hardware, which they haven't. - switchdev objects (MDBs, VLANs) should not be offloaded by ports that don't offload the bridge. Standalone ports should behave as packet-in, packet-out and the bridge should not be able to manipulate the pvid of the port, or tag stripping on egress, or ingress filtering. This should already work fine because dsa_slave_port_obj_add has: case SWITCHDEV_OBJ_ID_PORT_VLAN: if (!dsa_port_offloads_bridge_port(dp, obj->orig_dev)) return -EOPNOTSUPP; err = dsa_slave_vlan_add(dev, obj, extack); but since dsa_port_offloads_bridge_port works based on dp->bridge_dev, this is again sabotaging us. All the above work in case the port has an unoffloaded LAG interface, so this is well exercised code, we should apply it for plain unoffloaded bridge ports too. Reported-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-24net: dsa: don't call switchdev_bridge_port_unoffload for unoffloaded bridge ↵Vladimir Oltean1-0/+4
ports For ports that have a NULL dp->bridge_dev, dsa_port_to_bridge_port() also returns NULL as expected. Issue #1 is that we are performing a NULL pointer dereference on brport_dev. Issue #2 is that these are ports on which switchdev_bridge_port_offload has not been called, so we should not call switchdev_bridge_port_unoffload on them either. Both issues are addressed by checking against a NULL brport_dev in dsa_port_pre_bridge_leave and exiting early. Fixes: 2f5dc00f7a3e ("net: bridge: switchdev: let drivers inform which bridge ports are offloaded") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-24mac80211: introduce individual TWT support in AP modeLorenzo Bianconi7-2/+418
Introduce TWT action frames parsing support to mac80211. Currently just individual TWT agreement are support in AP mode. Whenever the AP receives a TWT action frame from an associated client, after performing sanity checks, it will notify the underlay driver with requested parameters in order to check if they are supported and if there is enough room for a new agreement. The driver is expected to set the agreement result and report it to mac80211. Drivers supporting this have two new callbacks: - add_twt_setup (mandatory) - twt_teardown_request (optional) mac80211 will send an action frame reply according to the result reported by the driver. Tested-by: Peter Chiu <chui-hao.chiu@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/257512f2e22ba42b9f2624942a128dd8f141de4b.1629741512.git.lorenzo@kernel.org [use le16p_replace_bits(), minor cleanups, use (void *) casts, fix to use ieee80211_get_he_iftype_cap() correctly] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-08-24mptcp: remove MPTCP_ADD_ADDR_IPV6 and MPTCP_ADD_ADDR_PORTYonglong Li3-21/+3
MPTCP_ADD_ADDR_IPV6 and MPTCP_ADD_ADDR_PORT are not necessary, we can get these info from pm.local or pm.remote. Drop mptcp_pm_should_add_signal_ipv6 and mptcp_pm_should_add_signal_port too. Co-developed-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-24mptcp: build ADD_ADDR/echo-ADD_ADDR option according pm.add_signalYonglong Li2-6/+10
According to the MPTCP_ADD_ADDR_SIGNAL or MPTCP_ADD_ADDR_ECHO flag, build the ADD_ADDR/ADD_ADDR_ECHO option. In mptcp_pm_add_addr_signal(), use opts->addr to save the announced ADD_ADDR or ADD_ADDR_ECHO address. Co-developed-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-24mptcp: fix ADD_ADDR and RM_ADDR maybe flush addr_signal each otherYonglong Li1-3/+10
ADD_ADDR shares pm.addr_signal with RM_ADDR, so after RM_ADDR/ADD_ADDR has done, we should not clean ADD_ADDR/RM_ADDR's addr_signal. Co-developed-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-24mptcp: make MPTCP_ADD_ADDR_SIGNAL and MPTCP_ADD_ADDR_ECHO separateYonglong Li3-8/+18
Use MPTCP_ADD_ADDR_SIGNAL only for the action of sending ADD_ADDR, and use MPTCP_ADD_ADDR_ECHO only for the action of sending ADD_ADDR echo. Use msk->pm.local to save the announced ADD_ADDR address only, and reuse msk->pm.remote to save the announced ADD_ADDR_ECHO address. To prepare for the next patch. Co-developed-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-24mptcp: move drop_other_suboptions check under pm lockYonglong Li3-18/+31
This patch moved the drop_other_suboptions check from mptcp_established_options_add_addr() into mptcp_pm_add_addr_signal(), do it under the PM lock to avoid the race between this check and mptcp_pm_add_addr_signal(). For this, added a new parameter for mptcp_pm_add_addr_signal() to get the drop_other_suboptions value. And drop the other suboptions after the option length check if drop_other_suboptions is true. Additionally, always drop the other suboption for TCP pure ack: that makes both the code simpler and the MPTCP behaviour more consistent. Co-developed-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-24net: ipv4: Move ip_options_fragment() out of loopYajun Deng1-15/+4
The ip_options_fragment() only called when iter->offset is equal to zero, so move it out of loop, and inline 'Copy the flags to each fragment.' As also, remove the unused parameter in ip_frag_ipcb(). Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-23net: dsa: track unique bridge numbers across all DSA switch treesVladimir Oltean3-34/+55
Right now, cross-tree bridging setups work somewhat by mistake. In the case of cross-tree bridging with sja1105, all switch instances need to agree upon a common VLAN ID for forwarding a packet that belongs to a certain bridging domain. With TX forwarding offload, the VLAN ID is the bridge VLAN for VLAN-aware bridging, and the tag_8021q TX forwarding offload VID (a VLAN which has non-zero VBID bits) for VLAN-unaware bridging. The VBID for VLAN-unaware bridging is derived from the dp->bridge_num value calculated by DSA independently for each switch tree. If ports from one tree join one bridge, and ports from another tree join another bridge, DSA will assign them the same bridge_num, even though the bridges are different. If cross-tree bridging is supported, this is an issue. Modify DSA to calculate the bridge_num globally across all switch trees. This has the implication for a driver that the dp->bridge_num value that DSA will assign to its ports might not be contiguous, if there are boards with multiple DSA drivers instantiated. Additionally, all bridge_num values eat up towards each switch's ds->num_fwd_offloading_bridges maximum, which is potentially unfortunate, and can be seen as a limitation introduced by this patch. However, that is the lesser evil for now. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-20Merge tag 'mac80211-next-for-net-next-2021-08-20' of ↵Jakub Kicinski13-63/+518
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Minor updates: * BSS coloring support * MEI commands for Intel platforms * various fixes/cleanups * tag 'mac80211-next-for-net-next-2021-08-20' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next: cfg80211: fix BSS color notify trace enum confusion mac80211: Fix insufficient headroom issue for AMSDU mac80211: add support for BSS color change nl80211: add support for BSS coloring mac80211: Use flex-array for radiotap header bitmap mac80211: radiotap: Use BIT() instead of shifts mac80211: Remove unnecessary variable and label mac80211: include <linux/rbtree.h> mac80211: Fix monitor MTU limit so that A-MSDUs get through mac80211: remove unnecessary NULL check in ieee80211_register_hw() mac80211: Reject zero MAC address in sta_info_insert_check() nl80211: vendor-cmd: add Intel vendor commands for iwlmei usage ==================== Link: https://lore.kernel.org/r/20210820105329.48674-1-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-20net: bridge: vlan: convert mcast router global option to per-vlan entryNikolay Aleksandrov4-18/+62
The per-vlan router option controls the port/vlan and host vlan entries' mcast router config. The global option controlled only the host vlan config, but that is unnecessary and incosistent as it's not really a global vlan option, but rather bridge option to control host router config, so convert BRIDGE_VLANDB_GOPTS_MCAST_ROUTER to BRIDGE_VLANDB_ENTRY_MCAST_ROUTER which can be used to control both host vlan and port vlan mcast router config. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-20net: bridge: mcast: br_multicast_set_port_router takes multicast context as ↵Nikolay Aleksandrov4-8/+11
argument Change br_multicast_set_port_router to take port multicast context as its first argument so we can later use it to control port/vlan mcast router option. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-20Merge tag 'for-net-next-2021-08-19' of ↵David S. Miller9-180/+298
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Luiz Augusto von Dentz says: ==================== bluetooth-next pull request for net-next: - Add support for Foxconn Mediatek Chip - Add support for LG LGSBWAC92/TWCM-K505D - hci_h5 flow control fixes and suspend support - Switch to use lock_sock for SCO and RFCOMM - Various fixes for extended advertising - Reword Intel's setup on btusb unifying the supported generations ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-20Merge tag 'batadv-next-pullrequest-20210819' of ↵David S. Miller26-454/+362
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This cleanup patchset includes the following patches: - bump version strings, by Simon Wunderlich - update docs about move IRC channel away from freenode, by Sven Eckelmann - Switch to kstrtox.h for kstrtou64, by Sven Eckelmann - Update NULL checks, by Sven Eckelmann (2 patches) - remove remaining skb-copy calls for broadcast packets, by Linus Lüssing ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski7-43/+22
drivers/ptp/Kconfig: 55c8fca1dae1 ("ptp_pch: Restore dependency on PCI") e5f31552674e ("ethernet: fix PTP_1588_CLOCK dependencies") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-19Bluetooth: Fix return value in hci_dev_do_close()Kangmin Park1-3/+4
hci_error_reset() return without calling hci_dev_do_open() when hci_dev_do_close() return error value which is not 0. Also, hci_dev_close() return hci_dev_do_close() function's return value. But, hci_dev_do_close() return always 0 even if hdev->shutdown return error value. So, fix hci_dev_do_close() to save and return the return value of the hdev->shutdown when it is called. Signed-off-by: Kangmin Park <l4stpr0gr4m@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-08-19Bluetooth: add timeout sanity check to hci_inquiryPavel Skripkin1-0/+6
Syzbot hit "task hung" bug in hci_req_sync(). The problem was in unreasonable huge inquiry timeout passed from userspace. Fix it by adding sanity check for timeout value to hci_inquiry(). Since hci_inquiry() is the only user of hci_req_sync() with user controlled timeout value, it makes sense to check timeout value in hci_inquiry() and don't touch hci_req_sync(). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-and-tested-by: syzbot+be2baed593ea56c6a84c@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-08-19Bluetooth: mgmt: Pessimize compile-time bounds-checkKees Cook1-1/+1
After gaining __alloc_size hints, GCC thinks it can reach a memcpy() with eir_len == 0 (since it can't see into the rewrite of status). Instead, check eir_len == 0, avoiding this future warning: In function 'eir_append_data', inlined from 'read_local_oob_ext_data_complete' at net/bluetooth/mgmt.c:7210:12: ./include/linux/fortify-string.h:54:29: warning: '__builtin_memcpy' offset 5 is out of the bounds [0, 3] [-Warray-bounds] ... net/bluetooth/hci_request.h:133:2: note: in expansion of macro 'memcpy' 133 | memcpy(&eir[eir_len], data, data_len); | ^~~~~~ Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Cc: Luiz Augusto von Dentz <luiz.dentz@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: linux-bluetooth@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-08-19net: Fix offloading indirect devices dependency on qdisc order creationEli Cohen4-1/+91
Currently, when creating an ingress qdisc on an indirect device before the driver registered for callbacks, the driver will not have a chance to register its filter configuration callbacks. To fix that, modify the code such that it keeps track of all the ingress qdiscs that call flow_indr_dev_setup_offload(). When a driver calls flow_indr_dev_register(), go through the list of tracked ingress qdiscs and call the driver callback entry point so as to give it a chance to register its callback. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19net/core: Remove unused field from struct flow_indr_devEli Cohen1-1/+0
rcu field is not used. Remove it. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19mptcp: full fully established support after ADD_ADDRMatthieu Baerts1-7/+3
If directly after an MP_CAPABLE 3WHS, the client receives an ADD_ADDR with HMAC from the server, it is enough to switch to a "fully established" mode because it has received more MPTCP options. It was then OK to enable the "fully_established" flag on the MPTCP socket. Still, best to check if the ADD_ADDR looks valid by looking if it contains an HMAC (no 'echo' bit). If an ADD_ADDR echo is received while we are not in "fully established" mode, it is strange and then we should not switch to this mode now. But that is not enough. On one hand, the path-manager has be notified the state has changed. On the other hand, the "fully_established" flag on the subflow socket should be turned on as well not to re-send the MP_CAPABLE 3rd ACK content with the next ACK. Fixes: 84dfe3677a6f ("mptcp: send out dedicated ADD_ADDR packet") Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19mptcp: fix memory leak on address flushPaolo Abeni1-32/+12
The endpoint cleanup path is prone to a memory leak, as reported by syzkaller: BUG: memory leak unreferenced object 0xffff88810680ea00 (size 64): comm "syz-executor.6", pid 6191, jiffies 4295756280 (age 24.138s) hex dump (first 32 bytes): 58 75 7d 3c 80 88 ff ff 22 01 00 00 00 00 ad de Xu}<...."....... 01 00 02 00 00 00 00 00 ac 1e 00 07 00 00 00 00 ................ backtrace: [<0000000072a9f72a>] kmalloc include/linux/slab.h:591 [inline] [<0000000072a9f72a>] mptcp_nl_cmd_add_addr+0x287/0x9f0 net/mptcp/pm_netlink.c:1170 [<00000000f6e931bf>] genl_family_rcv_msg_doit.isra.0+0x225/0x340 net/netlink/genetlink.c:731 [<00000000f1504a2c>] genl_family_rcv_msg net/netlink/genetlink.c:775 [inline] [<00000000f1504a2c>] genl_rcv_msg+0x341/0x5b0 net/netlink/genetlink.c:792 [<0000000097e76f6a>] netlink_rcv_skb+0x148/0x430 net/netlink/af_netlink.c:2504 [<00000000ceefa2b8>] genl_rcv+0x24/0x40 net/netlink/genetlink.c:803 [<000000008ff91aec>] netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] [<000000008ff91aec>] netlink_unicast+0x537/0x750 net/netlink/af_netlink.c:1340 [<0000000041682c35>] netlink_sendmsg+0x846/0xd80 net/netlink/af_netlink.c:1929 [<00000000df3aa8e7>] sock_sendmsg_nosec net/socket.c:704 [inline] [<00000000df3aa8e7>] sock_sendmsg+0x14e/0x190 net/socket.c:724 [<000000002154c54c>] ____sys_sendmsg+0x709/0x870 net/socket.c:2403 [<000000001aab01d7>] ___sys_sendmsg+0xff/0x170 net/socket.c:2457 [<00000000fa3b1446>] __sys_sendmsg+0xe5/0x1b0 net/socket.c:2486 [<00000000db2ee9c7>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<00000000db2ee9c7>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80 [<000000005873517d>] entry_SYSCALL_64_after_hwframe+0x44/0xae We should not require an allocation to cleanup stuff. Rework the code a bit so that the additional RCU work is no more needed. Fixes: 1729cf186d8a ("mptcp: create the listening socket for new port") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19net/rds: dma_map_sg is entitled to merge entriesGerd Rausch1-2/+2
Function "dma_map_sg" is entitled to merge adjacent entries and return a value smaller than what was passed as "nents". Subsequently "ib_map_mr_sg" needs to work with this value ("sg_dma_len") rather than the original "nents" parameter ("sg_len"). This old RDS bug was exposed and reliably causes kernel panics (using RDMA operations "rds-stress -D") on x86_64 starting with: commit c588072bba6b ("iommu/vt-d: Convert intel iommu driver to the iommu ops") Simply put: Linux 5.11 and later. Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Link: https://lore.kernel.org/r/60efc69f-1f35-529d-a7ef-da0549cad143@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-18batman-adv: bcast: remove remaining skb-copy callsLinus Lüssing1-2/+10
We currently have two code paths for broadcast packets: A) self-generated, via batadv_interface_tx()-> batadv_send_bcast_packet(). B) received/forwarded, via batadv_recv_bcast_packet()-> batadv_forw_bcast_packet(). For A), self-generated broadcast packets: The only modifications to the skb data is the ethernet header which is added/pushed to the skb in batadv_send_broadcast_skb()->batadv_send_skb_packet(). However before doing so, batadv_skb_head_push() is called which calls skb_cow_head() to unshare the space for the to be pushed ethernet header. So for this case, it is safe to use skb clones. For B), received/forwarded packets: The same applies as in A) for the to be forwarded packets. Only the ethernet header is added. However after (queueing for) forwarding the packet in batadv_recv_bcast_packet()->batadv_forw_bcast_packet(), a packet is additionally decapsulated and is sent up the stack through batadv_recv_bcast_packet()->batadv_interface_rx(). Protocols higher up the stack are already required to check if the packet is shared and create a copy for further modifications. When the next (protocol) layer works correctly, it cannot happen that it tries to operate on the data behind the skb clone which is still queued up for forwarding. Co-authored-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2021-08-18pktgen: Remove fill_imix_distribution() CONFIG_XFRM dependencyNick Richardson1-27/+26
Currently, the declaration of fill_imix_distribution() is dependent on CONFIG_XFRM. This is incorrect. Move fill_imix_distribution() declaration out of #ifndef CONFIG_XFRM block. Signed-off-by: Nick Richardson <richardsonnick@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-18net-memcg: pass in gfp_t mask to mem_cgroup_charge_skmem()Wei Wang3-6/+16
Add gfp_t mask as an input parameter to mem_cgroup_charge_skmem(), to give more control to the networking stack and enable it to change memcg charging behavior. In the future, the networking stack may decide to avoid oom-kills when fallbacks are more appropriate. One behavior change in mem_cgroup_charge_skmem() by this patch is to avoid force charging by default and let the caller decide when and if force charging is needed through the presence or absence of __GFP_NOFAIL. Signed-off-by: Wei Wang <weiwan@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-18ovs: clear skb->tstamp in forwarding pathkaixi.fan1-0/+1
fq qdisc requires tstamp to be cleared in the forwarding path. Now ovs doesn't clear skb->tstamp. We encountered a problem with linux version 5.4.56 and ovs version 2.14.1, and packets failed to dequeue from qdisc when fq qdisc was attached to ovs port. Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC") Signed-off-by: kaixi.fan <fankaixi.li@bytedance.com> Signed-off-by: xiexiaohui <xiexiaohui.xxh@bytedance.com> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-18net: net_namespace: Optimize the codeYajun Deng1-28/+24
There is only one caller for ops_free(), so inline it. Separate net_drop_ns() and net_free(), so the net_free() can be called directly. Add free_exit_list() helper function for free net_exit_list. ==================== v2: - v1 does not apply, rebase it. ==================== Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-18net: dsa: tag_sja1105: be dsa_loop-safeVladimir Oltean2-13/+28
Add support for tag_sja1105 running on non-sja1105 DSA ports, by making sure that every time we dereference dp->priv, we check the switch's dsa_switch_ops (otherwise we access a struct sja1105_port structure that is in fact something else). This adds an unconditional build-time dependency between sja1105 being built as module => tag_sja1105 must also be built as module. This was there only for PTP before. Some sane defaults must also take place when not running on sja1105 hardware. These are: - sja1105_xmit_tpid: the sja1105 driver uses different VLAN protocols depending on VLAN awareness and switch revision (when an encapsulated VLAN must be sent). Default to 0x8100. - sja1105_rcv_meta_state_machine: this aggregates PTP frames with their metadata timestamp frames. When running on non-sja1105 hardware, don't do that and accept all frames unmodified. - sja1105_defer_xmit: calls sja1105_port_deferred_xmit in sja1105_main.c which writes a management route over SPI. When not running on sja1105 hardware, bypass the SPI write and send the frame as-is. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-18sch_cake: fix srchost/dsthost hashing modeToke Høiland-Jørgensen1-1/+1
When adding support for using the skb->hash value as the flow hash in CAKE, I accidentally introduced a logic error that broke the host-only isolation modes of CAKE (srchost and dsthost keywords). Specifically, the flow_hash variable should stay initialised to 0 in cake_hash() in pure host-based hashing mode. Add a check for this before using the skb->hash value as flow_hash. Fixes: b0c19ed6088a ("sch_cake: Take advantage of skb->hash where appropriate") Reported-by: Pete Heist <pete@heistp.net> Tested-by: Pete Heist <pete@heistp.net> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-18net: procfs: add seq_puts() statement for dev_mcastYajun Deng1-11/+13
Add seq_puts() statement for dev_mcast, make it more readable. As also, keep vertical alignment for {dev, ptype, dev_mcast} that under /proc/net. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>