summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
AgeCommit message (Collapse)AuthorFilesLines
2021-10-29mlxsw: spectrum_qdisc: Offload root TBF as port shaperPetr Machata1-18/+37
The Spectrum ASIC allows configuration of maximum shaper on all levels of the scheduling hierarchy: TCs, subgroups, groups and also ports. Currently, TBF always configures a subgroup. But a user could reasonably express the intent to configure port shaper by putting TBF to a root position, around ETS / PRIO. Accept this usage and offload appropriately. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-19mlxsw: spectrum_qdisc: Make RED, TBF offloads classfulPetr Machata1-1/+31
Permit offloading qdiscs below RED and TBF. In order to avoid having to implement trivial propagating callbacks for get_prio_bitmap and get_tclass_num, extend mlxsw_sp_qdisc_get_prio_bitmap() and ..._get_tclass_num() to handle the lack of the callback as a cue to forward the request to the parent. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19mlxsw: spectrum_qdisc: Validate qdisc topologyPetr Machata1-9/+82
A following patch will enable offloading qdiscs that are deeper than directly under root qdisc. Currently the topology validation consists of demanding a root qdisc position for ETS and PRIO. Since RED and TBF are considered classless, this is enough. In order to prevent some nonsensical combinations when RED and TBF become classful, introduce a more general topology validator. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19mlxsw: spectrum_qdisc: Clean stats recursively when priomap changesPetr Machata1-8/+29
On Spectrum, there are no per-TC TX counters. Instead, mlxsw uses per-prio counters and aggregates them according to the priomap. Therefore when priomap changes, the counter base values need to be reset to reflect the change. Previously, this was only done for the sole child qdisc, but a following patch makes RED and TBF classful. Thus apply the request to the whole sub-tree. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19mlxsw: spectrum_qdisc: Unify graft validationPetr Machata1-18/+19
Qdisc graft operations have so far been reported at PRIO, ETS and RED, with RED events ignored, because RED was not considered a classful qdisc. A following patch will make mlxsw recognize RED and TBF as classful qdiscs, and thus it is necessary to validate grafting at these qdiscs as well. Rename the existing graft validator to make it clear that it is a generic function, and invoke for RED and TBF graft events as well. Drop the unnecessary PRIO helper and invoke the graft validator directly for PRIO as well. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19mlxsw: spectrum_qdisc: Destroy children in mlxsw_sp_qdisc_destroy()Petr Machata1-2/+4
Currently ETS and PRIO are the only offloaded classful qdiscs. Since they are both similar, their destroy handler is the same, and it handles children destruction itself. But now it is possible to do it generically for any classful qdisc. Therefore promote the recursive destruction from the ETS handler to mlxsw_sp_qdisc_destroy(), so that RED and TBF pick it up in follow-up patches. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19mlxsw: spectrum_qdisc: Extract two helpers for handling future FIFOsPetr Machata1-15/+33
Extract from __mlxsw_sp_qdisc_ets_replace() two helpers for handling of one future FIFO resp. reinitializing the array of future FIFOs. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19mlxsw: spectrum_qdisc: Query tclass / priomap instead of caching itPetr Machata1-34/+146
Currently when keeping track of qdiscs, mlxsw notes the TC and priomap corresponding to each qdisc. That is fine currently, as there only ever is one level of qdiscs to update: the direct children of ETS / PRIO. However as deeper structures are made offloadable, ETS would need to update these values for the complete subtree, and interim qdiscs would need to remember to propagate the value. Instead, reverse the responsibility: child qdiscs can ask their parent what their TC and priomap are. ETS / PRIO know the answer right away, or there are defaults for when the root qdisc does not assign them (e.g. when RED is used as root qdisc). When RED and TBF become classful, they will simply forward the request up to their parent. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-14mlxsw: spectrum_qdisc: Introduce per-TC ECN countersPetr Machata1-2/+7
The Qdisc code in mlxsw used to report a number of packets ECN-marked on a port. Because reporting a per-port value as a per-TC value was misleading, this was removed in commit 8a29581eb001 ("mlxsw: spectrum: Move the ECN-marked packet counter to ethtool"). On Spectrum-3, a per-TC number of ECN-marked packets is available in per-TC congestion counter group. Add a new array for the ECN counter, fetch the values from the per-TC congestion group, and pick the value indicated by tclass_num as appropriate. On Spectrum-1 and Spectrum-2, this per-TC value is not available, and zeroes will be reported, as they currently are. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12mlxsw: spectrum_qdisc: Offload RED qevent markPetr Machata1-0/+10
The RED "mark" qevent can be offloaded under similar conditions as the RED "early_drop" qevent. Therefore recognize its binding type in the TC_SETUP_BLOCK handler and translate to the right SPAN trigger, with the right set of supported actions. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-12mlxsw: spectrum_qdisc: Track permissible actions per bindingPetr Machata1-11/+33
One block can be bound to several qevents. The qevent type that the block is bound to determines which actions make sense in a given context. In the particular case of mlxsw, trap cannot be offloaded on a RED mark qevent, because the trap contract specifies that the packet is dropped in the HW datapath, and the HW trigger that the action is offloaded to is always forwarding the packet (in addition to marking in). Therefore keep track of which actions are permissible at each binding block. When an attempt is made to bind a certain action at a binding point where it is not supported, bounce the request. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-12mlxsw: spectrum_qdisc: Distinguish between ingress and egress triggersPetr Machata1-8/+15
The following patches will configure the MLXSW_SP_SPAN_TRIGGER_ECN mirroring trigger. This trigger is considered "egress", unlike the previously-offloaded _EARLY_DROP. Add a helper to spectrum_span, mlxsw_sp_span_trigger_is_ingress(), to classify triggers to ingress and egress. Pass result of this instead of hardcoding true when calling mlxsw_sp_span_analyzed_port_get()/_put(). Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-12mlxsw: spectrum_qdisc: Pass extack to mlxsw_sp_qevent_entry_configure()Petr Machata1-10/+19
This function will report a new failure in the following patches. Pass extack so that the failure is explicable. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-07mlxsw: spectrum_qdisc: Pass handle, not band number to find_class()Petr Machata1-1/+4
In mlxsw Qdisc offload, find_class() is an operation that yields a qdisc offload descriptor given a parental qdisc descriptor and a class handle. In __mlxsw_sp_qdisc_ets_graft() however, a band number is passed to that function instead of a handle. This can lead to a trigger of a WARN_ON with the following splat: WARNING: CPU: 3 PID: 808 at drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c:1356 __mlxsw_sp_qdisc_ets_graft+0x115/0x130 [mlxsw_spectrum] [...] Call Trace: mlxsw_sp_setup_tc_prio+0xe3/0x100 [mlxsw_spectrum] qdisc_offload_graft_helper+0x35/0xa0 prio_graft+0x176/0x290 [sch_prio] qdisc_graft+0xb3/0x540 tc_modify_qdisc+0x56a/0x8a0 rtnetlink_rcv_msg+0x12c/0x370 netlink_rcv_skb+0x49/0xf0 netlink_unicast+0x1f6/0x2b0 netlink_sendmsg+0x1fb/0x410 ____sys_sendmsg+0x1f3/0x220 ___sys_sendmsg+0x70/0xb0 __sys_sendmsg+0x54/0xa0 do_syscall_64+0x3a/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xae Since the parent handle is not passed with the offload information, compute it from the band number and qdisc handle. Fixes: 28052e618b04 ("mlxsw: spectrum_qdisc: Track children per qdisc") Reported-by: Maksym Yaremchuk <maksymy@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-21mlxsw: spectrum_qdisc: Index future FIFOs by band numberPetr Machata1-7/+6
mlxsw used to hold an array of qdiscs indexed by the TC number. In the previous patch, it was changed to allocate child qdiscs dynamically, and they are now indexed by band number. Follow suit with the array of future FIFOs. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-21mlxsw: spectrum_qdisc: Allocate child qdiscs dynamicallyPetr Machata1-32/+83
Instead of keeping qdiscs in globally-preallocated arrays, introduce a per-qdisc-kind value num_classes, and then allocate the necessary child qdiscs (if any) based on that value. Since now dynamic allocation is involved, mlxsw_sp_qdisc_replace() gets messy enough that it is worth it to split it to two cases: a new qdisc allocation and a change of existing qdisc. (Note that the change also includes what TC formally calls replace, if the qdisc kind is the same.) Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-21mlxsw: spectrum_qdisc: Guard all qdisc accesses with a lockPetr Machata1-16/+73
The FIFO handler currently guards accesses to the future FIFO tracking by asserting RTNL. In the future, the changes to the qdisc state will be more thorough, so other qdiscs will need this guarding is as well. In order to not further the RTNL infestation, instead convert to a custom lock that will guard accesses to the qdisc state. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-21mlxsw: spectrum_qdisc: Track children per qdiscPetr Machata1-46/+118
mlxsw currently allows a two-level structure of qdiscs: the root and possibly a number of children. In order to support offloading more general qdisc trees, introduce to struct mlxsw_sp_qdisc a pointer to child qdiscs. Refer to the child qdiscs through this pointer, instead of going through the tclass_qdiscs in qdisc_state. Additionally introduce a field num_classes, which holds number of given qdisc's children. Also introduce a generic function for walking qdisc trees. Rewrite mlxsw_sp_qdisc_find() and _find_by_handle() to use the generic walker. For now, keep the qdisc_state.tclass_qdisc, and just point root_qdiscs's children to this array. Following patches will make the allocation dynamic. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-21mlxsw: spectrum_qdisc: Promote backlog reduction to mlxsw_sp_qdisc_destroy()Petr Machata1-30/+18
When a qdisc is removed, it is necessary to update the backlog value at its parent--unless the qdisc is at root position. RED, TBF and FIFO all do that, each separately. Since all of them need to do this, just promote the operation directly to mlxsw_sp_qdisc_destroy(), instead of deferring it to individual destructors. Since FIFO dtor thus becomes trivial, remove it. Add struct mlxsw_sp_qdisc.parent to point at the parent qdisc. This will be handy later as deeper structures are offloaded. Use the parent qdisc to find the chain of parents whose backlog value needs to be updated. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-21mlxsw: spectrum_qdisc: Track tclass_num as int, not u8Petr Machata1-6/+6
tclass_num is just a number, a value that would be ordinarily passed around as an int. (Which is unlike a u8 prio_bitmap.) In several places, tclass_num already is an int. Convert the remaining instances. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-21mlxsw: spectrum_qdisc: Drop an always-true conditionPetr Machata1-4/+1
The function mlxsw_sp_qdisc_compare() is invoked a couple lines above this check, which will bounce any requests where this condition does not hold. Therefore drop it. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-21mlxsw: spectrum_qdisc: Simplify mlxsw_sp_qdisc_compare()Petr Machata1-15/+7
The purpose of this function is to filter out events that are related to qdiscs that are not offloaded, or are not offloaded anymore. But the function is unnecessarily thorough: - mlxsw_sp_qdisc pointer is never NULL in the context where it is called - Two qdiscs with the same handle will never have different types. Even when replacing one qdisc with another in the same class, Linux will not permit handle reuse unless the qdisc type also matches. Simplify the function by omitting these two unnecessary conditions. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-21mlxsw: spectrum_qdisc: Drop one argument from check_params callbackPetr Machata1-7/+1
The mlxsw_sp_qdisc argument is not used in any of the actual callbacks. Drop it. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12mlxsw: spectrum_span: Add SPAN probability rate supportIdo Schimmel1-0/+1
Currently, every packet that matches a mirroring trigger (e.g., received packets, buffer dropped packets) is mirrored. Spectrum-2 and later ASICs support mirroring with probability, where every 1 in N matched packets is mirrored. Extend the API that creates the binding between the trigger and the SPAN agent with a probability rate parameter, which is an attribute of the trigger. Set it to '1' to maintain existing behavior. Subsequent patches will use it to perform more sophisticated sampling, by mirroring packets to the CPU with probability. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12mlxsw: spectrum_span: Add SPAN session identifier supportIdo Schimmel1-1/+3
When packets are mirrored to the CPU, the trap identifier with which the packets are trapped is determined according to the session identifier of the SPAN agent performing the mirroring. Packets that are trapped for the same logical reason (e.g., buffer drops) should use the same session identifier. Currently, a single session is implicitly supported (identifier 0) and is used for packets that are mirrored to the CPU due to buffer drops (e.g., early drop). Subsequent patches are going to mirror packets to the CPU due to sampling, which will require a different session identifier. Prepare for that by making the session identifier an attribute of the SPAN agent. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-18mlxsw: spectrum_qdisc: Disable port buffer autoresize with qdiscsPetr Machata1-1/+33
There are two interfaces to configure ETS: qdiscs and DCB. Historically, DCB ETS configuration was projected to ingress as well, and configured port buffers. Qdisc was not. Keep qdiscs behaving this way, and if an offloaded qdisc is configured on a port, move this port's headroom to a manual mode, thus allowing configuration of port buffers through dcbnl_setbuffer. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-04mlxsw: spectrum_qdisc: Offload action trap for qeventsPetr Machata1-13/+62
When offloading action trap on a qevent, pass to_dev of NULL to the SPAN module to trigger the mirror to the CPU port. Query the buffer drops policer and use it for policing of the trapped traffic. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-15mlxsw: spectrum_span: Allow passing parameters to SPAN agentsIdo Schimmel1-1/+4
Currently, the only parameter of a SPAN agent is the netdev which the SPAN agent should mirror to. The next patch will add the ability to request a SPAN agent that mirrors to a specific netdev and has a specific policer ID bound to it. This is required when mirroring packets to the CPU port. Therefore, encapsulate the sole parameter to mlxsw_sp_span_agent_get() in a structure, so that it could later be extended with policer information. Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-14mlxsw: spectrum_qdisc: Offload mirroring on RED qevent early_dropPetr Machata1-0/+472
The RED qevents early_drop and mark can be offloaded under the following fairly strict conditions: - At most one filter is configured at the qevent block - The protocol is "any" - The classifier is matchall - The action is trap, sample, or mirror with the same conditions as with other SPAN offloads - The hw_counters type is none In this patchset, implement offload of mirror for early_drop qevent. The ECN trigger is currently not implemented in the FW and therefore the mark qevent is not supported. The qevent notifications look exactly like regular block binding notifications with a binder type that identifies them as qevents. Therefore the details of processing this binding are fairly similar to the matchall offload. struct flow_block_offload.sch points at the qdisc in question. Use it to figure out if the qdisc is offloaded at all and what TC it configures. Bounce bindings on not-offloaded qdiscs. Individual bindings are kept in a list so that several qevents can share the same block and all binding points get configured as the configured filters change. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15mlxsw: spectrum_qdisc: Offload RED ECN nodrop modePetr Machata1-4/+5
RED ECN nodrop mode means that non-ECT traffic should not be early-dropped, but enqueued normally instead. In Spectrum systems, this is achieved by disabling CWTPM.ew (enable WRED) for a given traffic class. So far CWTPM.ew was unconditionally enabled. Instead disable it when the RED qdisc is in nodrop mode. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-06mlxsw: spectrum_qdisc: Support offloading of FIFO QdiscPetr Machata1-1/+145
There are two peculiarities about offloading FIFO: - sometimes the qdisc has an unspecified handle (it is "invisible") - it may be created before the qdisc that it will be a child of These features make the offload a bit more tricky. The approach chosen in this patch is to make note of all the FIFOs that needed to be rejected because their parents were not known. Later when the parent is created, they are offloaded FIFO is only offloaded for its counters, queue length is ignored. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-06mlxsw: spectrum_qdisc: Add handle parameter to ..._ops.replacePetr Machata1-9/+9
PRIO and ETS will need to check the value of qdisc handle in their handlers. Add it to the callback and propagate through. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-06mlxsw: spectrum_qdisc: Introduce struct mlxsw_sp_qdisc_statePetr Machata1-42/+43
In order to have a tidy structure where to put information related to Qdisc offloads, introduce a new structure. Move there the two existing pieces of data: root_qdisc and tclass_qdiscs. Embed them directly, because there's no reason to go through pointer anymore. Convert users, update init/fini functions. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27mlxsw: spectrum: Move the ECN-marked packet counter to ethtoolPetr Machata1-7/+2
Spectrum-1 and Spectrum-2 do not have a per-TC counter of number of packets marked by ECN. The value reported currently is the total number of marked packets. Showing this value at individual TC Qdiscs is misleading. Move the counter to ethtool instead. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-31mlxsw: spectrum_qdisc: Fix 64-bit division error in mlxsw_sp_qdisc_tbf_rate_kbpsNathan Chancellor1-1/+1
When building arm32 allmodconfig: ERROR: "__aeabi_uldivmod" [drivers/net/ethernet/mellanox/mlxsw/mlxsw_spectrum.ko] undefined! rate_bytes_ps has type u64, we need to use a 64-bit division helper to avoid a build error. Fixes: a44f58c41bfb ("mlxsw: spectrum_qdisc: Support offloading of TBF Qdisc") Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Tested-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-01-25mlxsw: spectrum_qdisc: Support offloading of TBF QdiscPetr Machata1-0/+199
React to the TC messages that were introduced in a preceding patch and configure egress maximum shaper as appropriate. TBF can be used as a root qdisc or under one of PRIO or strict ETS bands. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-25mlxsw: spectrum_qdisc: Extract a common leaf unoffload functionPetr Machata1-5/+14
When the RED Qdisc is unoffloaded, it needs to reduce the reported backlog by the amount that is in the HW, so that only the SW backlog is contained in the counter. The same thing will need to be done by TBF, and likely any other leaf Qdisc as well. Extract a helper mlxsw_sp_qdisc_leaf_unoffload() and call it from mlxsw_sp_qdisc_red_unoffload(). Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-25mlxsw: spectrum_qdisc: Add mlxsw_sp_qdisc_get_class_stats()Petr Machata1-10/+19
Add a wrapper around mlxsw_sp_qdisc_collect_tc_stats() and mlxsw_sp_qdisc_update_stats() for the simple case of doing both in one go: mlxsw_sp_qdisc_get_class_stats(). Dispatch to that function from mlxsw_sp_qdisc_get_red_stats(). This new function will be useful for other leaf Qdiscs as well. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-25mlxsw: spectrum_qdisc: Extract a per-TC stat functionPetr Machata1-49/+70
Extract from mlxsw_sp_qdisc_get_prio_stats() two new functions: mlxsw_sp_qdisc_collect_tc_stats() to accumulate stats for that one TC only, and mlxsw_sp_qdisc_update_stats() that makes the stats relative to base values stored earlier. Use them from mlxsw_sp_qdisc_get_red_stats(). Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-20Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/netDavid S. Miller1-7/+23
2020-01-15mlxsw: spectrum_qdisc: Include MC TCs in Qdisc countersPetr Machata1-7/+23
mlxsw configures Spectrum in such a way that BUM traffic is passed not through its nominal traffic class TC, but through its MC counterpart TC+8. However, when collecting statistics, Qdiscs only look at the nominal TC and ignore the MC TC. Add two helpers to compute the value for logical TC from the constituents, one for backlog, the other for tail drops. Use them throughout instead of going through the xstats pointer directly. Counters for TX bytes and packets are deduced from packet priority counters, and therefore already include BUM traffic. wred_drop counter is irrelevant on MC TCs, because RED is not enabled on them. Fixes: 7b8195306694 ("mlxsw: spectrum: Configure MC-aware mode on mlxsw ports") Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller1-0/+7
The ungrafting from PRIO bug fixes in net, when merged into net-next, merge cleanly but create a build failure. The resolution used here is from Petr Machata. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-08mlxsw: spectrum_qdisc: Ignore grafting of invisible FIFOPetr Machata1-0/+7
The following patch will change PRIO to replace a removed Qdisc with an invisible FIFO, instead of NOOP. mlxsw will see this replacement due to the graft message that is generated. But because FIFO does not issue its own REPLACE message, when the graft operation takes place, the Qdisc that mlxsw tracks under the indicated band is still the old one. The child handle (0:0) therefore does not match, and mlxsw rejects the graft operation, which leads to an extack message: Warning: Offloading graft operation failed. Fix by ignoring the invisible children in the PRIO graft handler. The DESTROY message of the removed Qdisc is going to follow shortly and handle the removal. Fixes: 32dc5efc6cb4 ("mlxsw: spectrum: qdiscs: prio: Handle graft command") Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19mlxsw: spectrum_qdisc: Support offloading of ETS QdiscPetr Machata1-0/+84
Handle TC_SETUP_QDISC_ETS, add a new ops structure for the ETS Qdisc. Invoke the extended prio handlers implemented in the previous patch. For stats ops, invoke directly the prio callbacks, which are not sensitive to differences between PRIO and ETS. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19mlxsw: spectrum_qdisc: Generalize PRIO offload to support ETSPetr Machata1-23/+81
Thanks to the similarity between PRIO and ETS it is possible to simply reuse most of the code for offloading PRIO Qdisc. Extract the common functionality into separate functions, making the current PRIO handlers thin API adapters. Extend the new functions to pass quanta for individual bands, which allows configuring a subset of bands as WRR. Invoke mlxsw_sp_port_ets_set() as appropriate to de/configure WRR-ness and weight of individual bands. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19mlxsw: spectrum_qdisc: Clarify a commentPetr Machata1-7/+24
Expand the comment at mlxsw_sp_qdisc_prio_graft() to make the problem that this function is trying to handle clearer. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-24mlxsw: spectrum: Use guaranteed buffer size as pool size limitPetr Machata1-1/+2
There are two resources associated with shared buffer size: cap_total_buffer_size, and cap_guaranteed_shared_buffer. So far, mlxsw has been using the former as a limit to determine how large a pool size is allowed to be. However, the total size also includes headrooms and reserved space, which really cannot be used for shared buffer pools. Therefore convert mlxsw to use the latter resource as a limit. Adjust hard-coded pool sizes to be the guaranteed size minus 256000 bytes for CPU port pool. On Spectrum-1 that actually leads to an increase. A follow-up patch will have this size calculated automatically. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09mlxsw: Replace license text with SPDX identifiers and adjust copyrightsJiri Pirko1-33/+2
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-13treewide: kzalloc() -> kcalloc()Kees Cook1-1/+2
The kzalloc() function has a 2-factor argument form, kcalloc(). This patch replaces cases of: kzalloc(a * b, gfp) with: kcalloc(a * b, gfp) as well as handling cases of: kzalloc(a * b * c, gfp) with: kzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kzalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(char) * COUNT + COUNT , ...) | kzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc + kcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc(C1 * C2 * C3, ...) | kzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc(sizeof(THING) * C2, ...) | kzalloc(sizeof(TYPE) * C2, ...) | kzalloc(C1 * C2 * C3, ...) | kzalloc(C1 * C2, ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - (E1) * E2 + E1, E2 , ...) | - kzalloc + kcalloc ( - (E1) * (E2) + E1, E2 , ...) | - kzalloc + kcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-02-28mlxsw: spectrum: qdiscs: prio: Handle graft commandNogah Frankel1-0/+54
Handle graft command for an offloaded sch_prio. Grafting a qdisc to any place other than under its original parent is not supported by mlxsw and will cause the grafted qdisc to stop being offloaded. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>