summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-02-04i40e: Add new versions of send ASQ command functionsJedrzej Jagielski3-8/+150
ASQ send command functions are returning only i40e status codes yet some calling functions also need Admin Queue status that is stored in hw->aq.asq_last_status. Since hw object is stored on a heap it introduces a possibility for a race condition in access to hw if calling function is not fast enough to read hw->aq.asq_last_status before next send ASQ command is executed. Add new versions of send ASQ command functions that return Admin Queue status on the stack to avoid race conditions in access to hw->aq.asq_last_status. Add new _v2 version of i40e_aq_remove_macvlan that is using new _v2 versions of ASQ send command functions and returns the Admin Queue status on the stack. Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-04i40e: Add sending commands in atomic contextJedrzej Jagielski1-9/+12
Change functions: - i40e_aq_add_macvlan - i40e_aq_remove_macvlan - i40e_aq_delete_element - i40e_aq_add_vsi - i40e_aq_update_vsi_params to explicitly use i40e_asq_send_command_atomic(..., true) instead of i40e_asq_send_command, as they use mutexes and do some work in an atomic context. Without this change setting vlan via netdev will fail with call trace cased by bug "BUG: scheduling while atomic". Signed-off-by: Witold Fijalkowski <witoldx.fijalkowski@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-04i40e: remove enum i40e_client_stateJakub Kicinski1-10/+0
It's not used. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-04i40e: Remove unused RX realloc statJoe Damato2-3/+1
After commit 1a557afc4dd5 ("i40e: Refactor receive routine"), rx_stats.realloc_count is no longer being incremented, so remove it. The debugfs string was left, but hardcoded to 0. This is intended to prevent breaking any existing code / scripts that are parsing debugfs for i40e. Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-04i40e: Disable hw-tc-offload feature on driver loadMateusz Palczewski1-1/+4
After loading driver hw-tc-offload is enabled by default. Change the behaviour of driver to disable hw-tc-offload by default as this is the expected state. Additionally since this impacts ntuple feature state change the way of checking NETIF_F_HW_TC flag. Signed-off-by: Norbert Zulinski <norbertx.zulinski@intel.com> Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Dave Switzer <david.switzer@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-03printk: Fix incorrect __user type in proc_dointvec_minmax_sysadmin()Mickaël Salaün1-1/+1
The move of proc_dointvec_minmax_sysadmin() from kernel/sysctl.c to kernel/printk/sysctl.c introduced an incorrect __user attribute to the buffer argument. I spotted this change in [1] as well as the kernel test robot. Revert this change to please sparse: kernel/printk/sysctl.c:20:51: warning: incorrect type in argument 3 (different address spaces) kernel/printk/sysctl.c:20:51: expected void * kernel/printk/sysctl.c:20:51: got void [noderef] __user *buffer Fixes: faaa357a55e0 ("printk: move printk sysctl to printk/sysctl.c") Link: https://lore.kernel.org/r/20220104155024.48023-2-mic@digikod.net [1] Reported-by: kernel test robot <lkp@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Ogness <john.ogness@linutronix.de> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com> Link: https://lore.kernel.org/r/20220203145029.272640-1-mic@digikod.net Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-03Revert "module, async: async_synchronize_full() on module init iff async is ↵Igor Pylypiv3-24/+5
used" This reverts commit 774a1221e862b343388347bac9b318767336b20b. We need to finish all async code before the module init sequence is done. In the reverted commit the PF_USED_ASYNC flag was added to mark a thread that called async_schedule(). Then the PF_USED_ASYNC flag was used to determine whether or not async_synchronize_full() needs to be invoked. This works when modprobe thread is calling async_schedule(), but it does not work if module dispatches init code to a worker thread which then calls async_schedule(). For example, PCI driver probing is invoked from a worker thread based on a node where device is attached: if (cpu < nr_cpu_ids) error = work_on_cpu(cpu, local_pci_probe, &ddi); else error = local_pci_probe(&ddi); We end up in a situation where a worker thread gets the PF_USED_ASYNC flag set instead of the modprobe thread. As a result, async_synchronize_full() is not invoked and modprobe completes without waiting for the async code to finish. The issue was discovered while loading the pm80xx driver: (scsi_mod.scan=async) modprobe pm80xx worker ... do_init_module() ... pci_call_probe() work_on_cpu(local_pci_probe) local_pci_probe() pm8001_pci_probe() scsi_scan_host() async_schedule() worker->flags |= PF_USED_ASYNC; ... < return from worker > ... if (current->flags & PF_USED_ASYNC) <--- false async_synchronize_full(); Commit 21c3c5d28007 ("block: don't request module during elevator init") fixed the deadlock issue which the reverted commit 774a1221e862 ("module, async: async_synchronize_full() on module init iff async is used") tried to fix. Since commit 0fdff3ec6d87 ("async, kmod: warn on synchronous request_module() from async workers") synchronous module loading from async is not allowed. Given that the original deadlock issue is fixed and it is no longer allowed to call synchronous request_module() from async we can remove PF_USED_ASYNC flag to make module init consistently invoke async_synchronize_full() unless async module probe is requested. Signed-off-by: Igor Pylypiv <ipylypiv@google.com> Reviewed-by: Changyuan Lyu <changyuanl@google.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-03Merge branch 'for-5.17-fixes' of ↵Linus Torvalds2-14/+65
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: - Eric's fix for a long standing cgroup1 permission issue where it only checks for uid 0 instead of CAP which inadvertently allows unprivileged userns roots to modify release_agent userhelper - Fixes for the fallout from Waiman's recent cpuset work * 'for-5.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup/cpuset: Fix "suspicious RCU usage" lockdep warning cgroup-v1: Require capabilities to set release_agent cpuset: Fix the bug that subpart_cpus updated wrongly in update_cpumask() cgroup/cpuset: Make child cpusets restrict parents on v1 hierarchy
2022-02-03Merge branch 'net-ipa-enable-register-retention'Jakub Kicinski4-0/+70
Alex Elder says: ==================== net: ipa: enable register retention With runtime power management in place, we sometimes need to issue a command to enable retention of IPA register values before power collapse. This requires a new Device Tree property, whose presence will also be used to signal that the command is required. ==================== Link: https://lore.kernel.org/r/20220201150205.468403-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-03net: ipa: request IPA register values be retainedAlex Elder3-0/+64
In some cases, the IPA hardware needs to request the always-on subsystem (AOSS) to coordinate with the IPA microcontroller to retain IPA register values at power collapse. This is done by issuing a QMP request to the AOSS microcontroller. A similar request ondoes that request. We must get and hold the "QMP" handle early, because we might get back EPROBE_DEFER for that. But the actual request should be sent while we know the IPA clock is active, and when we know the microcontroller is operational. Fixes: 1aac309d3207 ("net: ipa: use autosuspend") Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-03dt-bindings: net: qcom,ipa: add optional qcom,qmp propertyAlex Elder1-0/+6
For some systems, the IPA driver must make a request to ensure that its registers are retained across power collapse of the IPA hardware. On such systems, we'll use the existence of the "qcom,qmp" property as a signal that this request is required. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-03cgroup/cpuset: Fix "suspicious RCU usage" lockdep warningWaiman Long1-0/+10
It was found that a "suspicious RCU usage" lockdep warning was issued with the rcu_read_lock() call in update_sibling_cpumasks(). It is because the update_cpumasks_hier() function may sleep. So we have to release the RCU lock, call update_cpumasks_hier() and reacquire it afterward. Also add a percpu_rwsem_assert_held() in update_sibling_cpumasks() instead of stating that in the comment. Fixes: 4716909cc5c5 ("cpuset: Track cpusets that use parent's effective_cpus") Signed-off-by: Waiman Long <longman@redhat.com> Tested-by: Phil Auld <pauld@redhat.com> Reviewed-by: Phil Auld <pauld@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2022-02-03tools/resolve_btfids: Do not print any commands when building silentlyNathan Chancellor1-1/+5
When building with 'make -s', there is some output from resolve_btfids: $ make -sj"$(nproc)" oldconfig prepare MKDIR .../tools/bpf/resolve_btfids/libbpf/ MKDIR .../tools/bpf/resolve_btfids//libsubcmd LINK resolve_btfids Silent mode means that no information should be emitted about what is currently being done. Use the $(silent) variable from Makefile.include to avoid defining the msg macro so that there is no information printed. Fixes: fbbb68de80a4 ("bpf: Add resolve_btfids tool to resolve BTF IDs in ELF object") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220201212503.731732-1-nathan@kernel.org
2022-02-03Revert "mm/gup: small refactoring: simplify try_grab_page()"John Hubbard1-5/+30
This reverts commit 54d516b1d62ff8f17cee2da06e5e4706a0d00b8a That commit did a refactoring that effectively combined fast and slow gup paths (again). And that was again incorrect, for two reasons: a) Fast gup and slow gup get reference counts on pages in different ways and with different goals: see Linus' writeup in commit cd1adf1b63a1 ("Revert "mm/gup: remove try_get_page(), call try_get_compound_head() directly""), and b) try_grab_compound_head() also has a specific check for "FOLL_LONGTERM && !is_pinned(page)", that assumes that the caller can fall back to slow gup. This resulted in new failures, as recently report by Will McVicker [1]. But (a) has problems too, even though they may not have been reported yet. So just revert this. Link: https://lore.kernel.org/r/20220131203504.3458775-1-willmcvicker@google.com [1] Fixes: 54d516b1d62f ("mm/gup: small refactoring: simplify try_grab_page()") Reported-and-tested-by: Will McVicker <willmcvicker@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Minchan Kim <minchan@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: stable@vger.kernel.org # 5.15 Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-03Merge tag 'mips-fixes-5.17_2' of ↵Linus Torvalds2-4/+10
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fixes from Thomas Bogendoerfer: - fix missed change for PTR->PTR_WD conversion - kernel-doc fixes * tag 'mips-fixes-5.17_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: KVM: fix vz.c kernel-doc notation MIPS: octeon: Fix missed PTR->PTR_WD conversion
2022-02-03Merge branch 'dsa-mv88e6xxx-phylink_generic_validate'David S. Miller6-140/+294
Russell King says: ==================== net: dsa: mv88e6xxx: convert to phylink_generic_validate() The overall objective of this series is to convert the mv88e6xxx DSA driver to use phylink_generic_validate(). Patch 1 adds a new helper mv88e6352_g2_scratch_port_has_serdes() which indicates whether an 88e6352 port has a serdes associated with it. This is necessary as ports 4 and 5 will normally be in automedia mode, where the CMODE field in the port status register will change e.g. between 15 (internal PHY) and 9 (1000base-X) depending on whether the serdes has link. The existing code caches the cmode field, and depending whether the serdes has link at probe time, determines whether we allow things such as the serdes statistics to be accessed. This means if the link isn't up at probe time, the serdes is essentially unavailable. Patch 1 addresses this by reading the pin configuration to find out whether the serdes is attached to port 4 or port 5. Patch 2 is a joint effort between myself and Marek Behún, adding the supported interfaces and MAC capabilities to all mv88e6xxx supported switch devices. This is slightly more restrictive than the original code as we didn't used to care too much about the interface mode, but with this we do - which is why we must know if there's a serdes associated now. Patch 3 switches mv88e6xxx to use the generic validation by removing the initialisation of the phylink_validate pointer in the dsa_ops struct. Patch 4 updates the statistics code to use the new helper in patch 1, so the serdes statistics are available even if the link was down at driver probe time. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03net: dsa: mv88e6xxx: improve 88e6352 serdes statistics detectionRussell King (Oracle)1-20/+23
The decision whether to report serdes statistics currently depends on the cached C_Mode value for the port, read at probe time or updated by configuration. However, port 4 can be in "automedia" mode when it is used as a serdes port, meaning it switches between the internal PHY and the serdes, changing the read-only C_Mode value depending on which first gains link. Consequently, the C_Mode value read at probe does not accurately reflect whether the port has the serdes associated with it. In "net: dsa: mv88e6xxx: add mv88e6352_g2_scratch_port_has_serdes()", we added a way to read the hardware configuration to determine which port has the serdes associated with it. Use this to determine which port reports the serdes statistics. Reviewed-by: Marek Behún <kabel@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03net: dsa: mv88e6xxx: convert to phylink_generic_validate()Russell King (Oracle)2-156/+0
Now that the mv88e6xxx chip drivers are supplying the supported interfaces and MAC capabilities, switch the driver to use the generic phylink validation implementation by removing our own validation implementations. This causes DSA to call phylink_generic_validate() on our behalf. Reviewed-by: Marek Behún <kabel@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03net: dsa: mv88e6xxx: populate supported_interfaces and mac_capabilitiesRussell King (Oracle)3-3/+279
Populate the supported interfaces and MAC capabilities for the Marvell MV88E6xxx DSA switches in preparation to using these for the validation functionality. Patch co-authored by Marek. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Marek Behún <kabel@kernel.org> [ fixed 6341 and 6393x ] Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03net: dsa: mv88e6xxx: add mv88e6352_g2_scratch_port_has_serdes()Russell King (Oracle)2-0/+31
Read the hardware configuration to determine which port is attached to the serdes. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03Merge branch 'dsa-mv88e6xxx-port-isolation'David S. Miller11-34/+110
Tobias Waldekranz says: ==================== net: dsa: mv88e6xxx: Improve standalone port isolation The ideal isolation between standalone ports satisfies two properties: 1. Packets from one standalone port must not be forwarded to any other port. 2. Packets from a standalone port must be sent to the CPU port. mv88e6xxx solves (1) by isolating standalone ports using the PVT. Up to this point though, (2) has not guaranteed; as the ATU is still consulted, there is a chance that incoming packets never reach the CPU if its DA has previously been used as the SA of an earlier packet (see 1/5 for more details). This is typically not a problem, except for one very useful setup in which switch ports are looped in order to run the bridge kselftests in tools/testing/selftests/net/forwarding. This series attempts to solve (2). Ideally, we could simply use the "ForceMap" bit of more modern chips (Agate and newer) to classify all incoming packets as MGMT. This is not available on older silicon that is still widely used (Opal Plus chips like the 6097 for example). Instead, this series takes a two pronged approach: 1/5: Always clear MapDA on standalone ports to make sure that no ATU entry can lead packets astray. This solves (2) for single-chip systems. 2/5: Trivial prep work for 4/5. 3/5: Trivial prep work for 4/5. 4/5: On multi-chip systems though, this is not enough. On the incoming chip, the packet will be forced out towards the CPU thanks to 1/5, but on any intermediate chips the ATU is still consulted. We override this behavior by marking the reserved standalone VID (0) as a policy VID, the DSA ports' VID policy is set to TRAP. This will cause the packet to be reclassified as MGMT on the first intermediate chip, after which it's a straight shot towards the CPU. Finally, we allow more tests to be run on mv88e6xxx: 5/5: The bridge_vlan{,un}aware suites sets an ageing_time of 10s on the bridge it creates, but mv88e6xxx has a minimum supported time of 15s. Allow this time to be overridden in forwarding.config. With this series in place, mv88e6xxx passes the following kselftest suites: - bridge_port_isolation.sh - bridge_sticky_fdb.sh - bridge_vlan_aware.sh - bridge_vlan_unaware.sh v1 -> v2: - Wording/spelling (Vladimir) - Use standard iterator in dsa_switch_upstream_port (Vladimir) - Limit enabling of VTU port policy to downstream DSA ports (Vladimir) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03selftests: net: bridge: Parameterize ageing timeoutTobias Waldekranz4-4/+9
Allow the ageing timeout that is set on bridges to be customized from forwarding.config. This allows the tests to be run on hardware which does not support a 10s timeout (e.g. mv88e6xxx). Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03net: dsa: mv88e6xxx: Improve multichip isolation of standalone portsTobias Waldekranz2-19/+51
Given that standalone ports are now configured to bypass the ATU and forward all frames towards the upstream port, extend the ATU bypass to multichip systems. Load VID 0 (standalone) into the VTU with the policy bit set. Since VID 4095 (bridged) is already loaded, we now know that all VIDs in use are always available in all VTUs. Therefore, we can safely enable 802.1Q on DSA ports. Setting the DSA ports' VTU policy to TRAP means that all incoming frames on VID 0 will be classified as MGMT - as a result, the ATU is bypassed on all subsequent switches. With this isolation in place, we are able to support configurations that are simultaneously very quirky and very useful. Quirky because it involves looping cables between local switchports like in this example: CPU | .------. .---0---. | .----0----. | sw0 | | | sw1 | '-1-2-3-' | '-1-2-3-4-' $ @ '---' $ @ % % We have three physically looped pairs ($, @, and %). This is very useful because it allows us to run the kernel's kselftests for the bridge on mv88e6xxx hardware. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03net: dsa: mv88e6xxx: Enable port policy support on 6097Tobias Waldekranz1-0/+1
This chip has support for the same per-port policy actions found in later versions of LinkStreet devices. Fixes: f3a2cd326e44 ("net: dsa: mv88e6xxx: introduce .port_set_policy") Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03net: dsa: mv88e6xxx: Support policy entries in the VTUTobias Waldekranz3-1/+6
A VTU entry with policy enabled is used in combination with a port's VTU policy setting to override normal switching behavior for frames assigned to the entry's VID. A typical example is to Treat all frames in a particular VLAN as control traffic, and trap them to the CPU. In which case the relevant user port's VTU policy would be set to TRAP. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03net: dsa: mv88e6xxx: Improve isolation of standalone portsTobias Waldekranz4-10/+43
Clear MapDA on standalone ports to bypass any ATU lookup that might point the packet in the wrong direction. This means that all packets are flooded using the PVT config. So make sure that standalone ports are only allowed to communicate with the local upstream port. Here is a scenario in which this is needed: CPU | .----. .---0---. | .--0--. | sw0 | | | sw1 | '-1-2-3-' | '-1-2-' '---' - sw0p1 and sw1p1 are bridged - sw0p2 and sw1p2 are in standalone mode - Learning must be enabled on sw0p3 in order for hardware forwarding to work properly between bridged ports 1. A packet with SA :aa comes in on sw1p2 1a. Egresses sw1p0 1b. Ingresses sw0p3, ATU adds an entry for :aa towards port 3 1c. Egresses sw0p0 2. A packet with DA :aa comes in on sw0p2 2a. If an ATU lookup is done at this point, the packet will be incorrectly forwarded towards sw0p3. With this change in place, the ATU is bypassed and the packet is forwarded in accordance with the PVT, which only contains the CPU port. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03Merge branch 'ptp-virtual-clock-improvements'David S. Miller2-5/+62
Miroslav Lichvar says: ==================== Virtual PTP clock improvements and fix v2: - dropped patch changing initial time of virtual clocks The first patch fixes an oops when unloading a driver with PTP clock and enabled virtual clocks. The other patches add missing features to make synchronization with virtual clocks work as well as with the physical clock. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03ptp: add getcrosststamp() to virtual clocks.Miroslav Lichvar1-0/+24
If the physical clock supports cross timestamping (it has the getcrosststamp() function), provide a wrapper in the virtual clock to enable cross timestamping. This adds support for the PTP_SYS_OFFSET_PRECISE ioctl. Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Cc: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03ptp: add gettimex64() to virtual clocks.Miroslav Lichvar1-1/+28
If the physical clock has the gettimex64() function, provide a gettimex64() wrapper in the virtual clock to enable more accurate and stable synchronization. This adds support for the PTP_SYS_OFFSET_EXTENDED ioctl. Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Cc: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03ptp: increase maximum adjustment of virtual clocks.Miroslav Lichvar1-2/+1
Increase the maximum frequency offset of virtual clocks to 50% to enable faster slewing corrections. This value cannot be represented as scaled ppm when long has 32 bits, but that is already the case for other drivers, even those that provide the adjfine() function, i.e. 32-bit applications are expected to check for the limit. Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Cc: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03ptp: unregister virtual clocks when unregistering physical clock.Miroslav Lichvar1-2/+9
When unregistering a physical clock which has some virtual clocks, unregister the virtual clocks with it. This fixes the following oops, which can be triggered by unloading a driver providing a PTP clock when it has enabled virtual clocks: BUG: unable to handle page fault for address: ffffffffc04fc4d8 Oops: 0000 [#1] PREEMPT SMP NOPTI RIP: 0010:ptp_vclock_read+0x31/0xb0 Call Trace: timecounter_read+0xf/0x50 ptp_vclock_refresh+0x2c/0x50 ? ptp_clock_release+0x40/0x40 ptp_aux_kworker+0x17/0x30 kthread_worker_fn+0x9b/0x240 ? kthread_should_park+0x30/0x30 kthread+0xe2/0x110 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 Fixes: 73f37068d540 ("ptp: support ptp physical/virtual clocks conversion") Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Cc: Yangbo Lu <yangbo.lu@nxp.com> Cc: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03page_pool: Refactor page_pool to enable fragmenting after allocationAlexander Duyck2-43/+62
This change is meant to permit a driver to perform "fragmenting" of the page from within the driver instead of the current model which requires pre-partitioning the page. The main motivation behind this is to support use cases where the page will be split up by the driver after DMA instead of before. With this change it becomes possible to start using page pool to replace some of the existing use cases where multiple references were being used for a single page, but the number needed was unknown as the size could be dynamic. For example, with this code it would be possible to do something like the following to handle allocation: page = page_pool_alloc_pages(); if (!page) return NULL; page_pool_fragment_page(page, DRIVER_PAGECNT_BIAS_MAX); rx_buf->page = page; rx_buf->pagecnt_bias = DRIVER_PAGECNT_BIAS_MAX; Then we would process a received buffer by handling it with: rx_buf->pagecnt_bias--; Once the page has been fully consumed we could then flush the remaining instances with: if (page_pool_defrag_page(page, rx_buf->pagecnt_bias)) continue; page_pool_put_defragged_page(pool, page -1, !!budget); The general idea is that we want to have the ability to allocate a page with excess fragment count and then trim off the unneeded fragments. Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03Merge branch 'dsa-phylink_generic_validate'David S. Miller5-172/+67
Russell King says: ==================== Trivial DSA conversions to phylink_generic_validate() This series converts five DSA drivers to use phylink_generic_validate(). No feedback or testing reports were received from the CFT posting. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03net: dsa: xrs700x: convert to phylink_generic_validate()Russell King (Oracle)1-18/+11
Populate the supported interfaces and MAC capabilities for the xrs700x family of DSA switches and remove the old validate implementation to allow DSA to use phylink_generic_validate() for this switch driver. According to commit ee00b24f32eb ("net: dsa: add Arrow SpeedChips XRS700x driver") the switch supports one RMII port and up to three RGMII ports. This commit assumes that port 0 is the RMII port and the remainder are RGMII. This commit also results in the Autoneg bit being set in the ethtool link modes, which wasn't in the original; if this switch supports RGMII to a 10/100/1G PHY, then surely we want to allow Autoneg on the PHY. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03net: dsa: qca8k: convert to phylink_generic_validate()Russell King (Oracle)1-47/+19
Populate the supported interfaces and MAC capabilities for the QCA8K DSA switch and remove the old validate implementation to allow DSA to use phylink_generic_validate() for this switch driver. In making this change, we bring consistency to the ethtool linkmodes that phylink's validate step produces, thereby following the expected behaviour as the phylink documentation has explained. Specifically, the ethtool 1000baseX_Full capability is now permitted for all interface modes, as it is a property of the PHY driver whether 1000baseX fiber connections can be supported. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03net: dsa: ksz8795: convert to phylink_generic_validate()Russell King (Oracle)1-33/+12
Populate the supported interfaces and MAC capabilities for the Microchip KSZ8795 DSA switch and remove the old validate implementation to allow DSA to use phylink_generic_validate() for this switch driver. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03net: dsa: bcm_sf2: convert to phylink_generic_validate()Russell King (Oracle)1-39/+15
Populate the supported interfaces and MAC capabilities for the bcm_sf2 DSA switch and remove the old validate implementation to allow DSA to use phylink_generic_validate() for this switch driver. The exclusion of Gigabit linkmodes for MII and Reverse MII links is handled within phylink_generic_validate() in phylink, so there is no need to make them conditional on the interface mode in the driver. Thanks to Florian Fainelli for suggesting how to populate the supported interfaces. Link: https://lore.kernel.org/r/3b3fed98-0c82-99e9-dc72-09fe01c2bcf3@gmail.com Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03net: dsa: ar9331: convert to phylink_generic_validate()Russell King (Oracle)1-35/+10
Populate the supported interfaces and MAC capabilities for the AR9331 DSA switch and remove the old validate implementation to allow DSA to use phylink_generic_validate() for this switch driver. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03Merge branch 'mptcp-next'David S. Miller4-37/+121
Mat Martineau says: ==================== mptcp: Miscellaneous changes for 5.18 Patch 1 has some minor cleanup in mptcp_write_options(). Patch 2 moves a rarely-needed branch to optimize mptcp_write_options(). Patch 3 adds a comment explaining which combinations of MPTCP option headers are expected. Patch 4 adds a pr_debug() for the MPTCP_RST option. Patches 5-7 allow setting MPTCP_PM_ADDR_FLAG_FULLMESH with the "set flags" netlink command. This allows changing the behavior of existing path manager endpoints. The flag was previously only set at endpoint creation time. Associated selftests also updated. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03selftests: mptcp: add fullmesh setting testsGeliang Tang1-6/+43
This patch added the fullmesh setting and clearing selftests in mptcp_join.sh. Now we can set both backup and fullmesh flags, so avoid using the words 'backup' and 'bkup'. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03selftests: mptcp: set fullmesh flag in pm_nl_ctlGeliang Tang1-3/+5
This patch added the fullmesh flag setting and clearing support in pm_nl_ctl: # pm_nl_ctl set ip flags fullmesh # pm_nl_ctl set ip flags nofullmesh Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03mptcp: set fullmesh flag in pm_netlinkGeliang Tang1-9/+28
This patch added the fullmesh flag setting support in pm_netlink. If the fullmesh flag of the address is changed, remove all the related subflows, update the fullmesh flag and create subflows again. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03mptcp: print out reset infos of MP_RSTGeliang Tang1-0/+2
This patch printed out the reset infos, reset_transient and reset_reason, of MP_RST in mptcp_parse_option() to show that MP_RST is received. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03mptcp: clarify when options can be usedMatthieu Baerts1-2/+21
RFC8684 doesn't seem to clearly specify which MPTCP options can be used together. Some options are mutually exclusive -- e.g. MP_CAPABLE and MP_JOIN --, some can be used together -- e.g. DSS + MP_PRIO --, some can but we prefer not to -- e.g. DSS + ADD_ADDR -- and some have to be used together at some points -- e.g. MP_FAIL and DSS. We need to clarify this as a base before allowing other modifications. For example, does it make sense to send a RM_ADDR with an MPC or MPJ? This remains open for possible future discussions. Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03mptcp: reduce branching when writing MP_FAIL optionMatthieu Baerts1-11/+19
MP_FAIL should be use in very rare cases, either when the TCP RST flag is set -- with or without an MP_RST -- or with a DSS, see mptcp_established_options(). Here, we do the same in mptcp_write_options(). Co-developed-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03mptcp: move the declarations of ssk and subflowGeliang Tang1-6/+3
Move the declarations of ssk and subflow in MP_FAIL and MP_PRIO to the beginning of the function mptcp_write_options(). Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03bpf: Use VM_MAP instead of VM_ALLOC for ringbufHou Tao1-1/+1
After commit 2fd3fb0be1d1 ("kasan, vmalloc: unpoison VM_ALLOC pages after mapping"), non-VM_ALLOC mappings will be marked as accessible in __get_vm_area_node() when KASAN is enabled. But now the flag for ringbuf area is VM_ALLOC, so KASAN will complain out-of-bound access after vmap() returns. Because the ringbuf area is created by mapping allocated pages, so use VM_MAP instead. After the change, info in /proc/vmallocinfo also changes from [start]-[end] 24576 ringbuf_map_alloc+0x171/0x290 vmalloc user to [start]-[end] 24576 ringbuf_map_alloc+0x171/0x290 vmap user Fixes: 457f44363a88 ("bpf: Implement BPF ring buffer and verifier support for it") Reported-by: syzbot+5ad567a418794b9b5983@syzkaller.appspotmail.com Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220202060158.6260-1-houtao1@huawei.com
2022-02-03Merge branch 'net-ipa-support-variable-rx-buffer-size'Jakub Kicinski8-42/+79
Alex Elder says: ==================== net: ipa: support variable RX buffer size Specify the size of receive buffers used for RX endpoints in the configuration data, rather than using 8192 bytes for all of them. Increase the size of the AP receive buffer for the modem to 32KB. ==================== Link: https://lore.kernel.org/r/20220201153737.601149-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-03net: ipa: set IPA v4.11 AP<-modem RX buffer size to 32KBAlex Elder1-1/+1
Increase the receive buffer size used for data received from the modem to 32KB, to improve download performance by allowing much greater aggregation. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-03net: ipa: define per-endpoint receive buffer sizeAlex Elder8-42/+79
Allow RX endpoints to have differing receive buffer sizes. Define the receive buffer size in the configuration data, and use that rather than IPA_RX_BUFFER_SIZE when configuring the endpoint. Add verification in ipa_endpoint_data_valid_one() that the receive buffer specified for AP RX endpoints is both big enough to handle at least one full packet, and not so big in an aggregating endpoint that its size can't be represented when programming the hardware. Move aggr_byte_limit_max() up in "ipa_endpoint.c" so it can be used earlier in the file without a forward-reference. Initially we'll just keep the 8KB receive buffer size already in use for all AP RX endpoints.. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>