From 1e5ae3507225f5560e45817155e77f496473850c Mon Sep 17 00:00:00 2001
From: Arthur Kiyanovski
Date: Tue, 21 Jul 2020 16:38:04 +0300
Subject: net: ena: avoid unnecessary rearming of interrupt vector when busy-polling

For an overview of the race created by this patch, see the
"synchronization" section below.

In napi busy-poll mode, the kernel invokes the napi handler of the
device repeatedly to poll the NIC's receive queues. This process
repeats until a timeout, specific to each connection, expires.
By polling packets in busy-poll mode the user may gain lower latency
and higher throughput (since the kernel no longer waits for interrupts
to poll the queues), at the expense of CPU usage.

Upon completing a napi routine, the driver checks whether the routine
was called by an interrupt handler. If so, the driver re-enables
interrupts for the device. This is needed since an interrupt routine
invocation disables future invocations until they are explicitly
re-enabled. The driver avoids re-enabling the interrupts if they were
not disabled in the first place (e.g. if the driver is in busy-poll
mode).

Originally, the driver checked whether interrupt re-enabling was needed
by reading the 'ena_napi->unmask_interrupt' variable. This atomic
variable was set upon interrupt and cleared after the interrupts were
re-enabled.

In Linux 4.10, the 'napi_complete_done' call was changed so that it
returns 'false' when the device should not re-enable interrupts, and
'true' otherwise. The change includes reading the
"NAPIF_STATE_IN_BUSY_POLL" flag to check whether the napi call is in
busy-poll mode and, if so, returning 'false'. The driver was changed to
re-enable interrupts according to this routine's return value.
The Linux community rejected the use of the
'ena_napi->unmask_interrupt' variable to determine whether unmasking is
needed, and urged to use the napi_complete_done() return value solely.
See https://lore.kernel.org/patchwork/patch/741149/ for more details.

As explained, a busy-poll session lasts for a specified timeout, after
which the connection exits and later re-enters busy-poll mode. This
leads to many invocations of the napi handler in which
napi_complete_done() falsely indicates that interrupts should be
re-enabled. This creates a bug in which the interrupts are re-enabled
unnecessarily.

To reproduce this bug:
1) echo 50 | sudo tee /proc/sys/net/core/busy_poll
2) echo 50 | sudo tee /proc/sys/net/core/busy_read
3) Add counters that check whether
   'ena_unmask_interrupt(tx_ring, rx_ring);'
   is called without the interrupts having been disabled in the first
   place (i.e. without the interrupt routine ena_intr_msix_io() having
   been invoked)

Steps 1+2 enable busy-poll as the default mode for new connections.

The busy-poll routine rearms the interrupts after every session by
design, and so an extra check is needed that the interrupts were
actually masked in the first place.

synchronization:
This patch introduces a race between the interrupt handler
ena_intr_msix_io() and the napi routine ena_io_poll(). Memory-barrier
macros and instructions were added to prevent this race from leaving
the interrupts masked. The following specifies the different race
scenarios in this patch:

1) The interrupt handler and the napi routine run sequentially
   i) The interrupt handler is called, sets the 'interrupts_masked'
      flag and successfully schedules the napi handler via softirq.
      In this scenario the napi routine might not see the flag change
      for several reasons:
      a) The flag is kept in a register by the compiler. For this case
         the WRITE_ONCE macro is used, which prevents this.
      b) The compiler might reorder the instructions. For this case the
         smp_wmb() instruction was used, which implies a compiler
         memory barrier.
      c) On archs with a weak memory-consistency model (like ARM64),
         the napi routine might be scheduled and start running before
         the flag STORE instruction is committed to cache/memory. To
         ensure this doesn't happen, the smp_wmb() instruction was
         added. It ensures that the flag-set instruction is committed
         before napi is scheduled.
   ii) The compiler reorders the flag's value check in the 'if'
       statement with the flag write that follows it in the napi
       routine. This scenario is prevented by the smp_rmb() call after
       the flag check.

2) The interrupt handler and the napi routine run in parallel (this can
   happen when the busy-poll routine invokes the napi handler)
   i) The interrupt handler sets the flag on one core while the napi
      routine reads it on another core. This scenario is also divided
      into two cases:
      a) napi_complete_done() doesn't finish running, in which case
         napi_schedule() would just set NAPIF_STATE_MISSED and the napi
         routine would reschedule itself without changing the flag's
         value.
      b) napi_complete_done() finishes running. In this case the napi
         routine might override the flag's value. This doesn't present
         any risk since it later unmasks the interrupt vector.
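Concretely, the pattern the scenarios above describe looks roughly as
follows. The hunk in this excerpt only adds the flag to struct
ena_napi; the handler and poll-routine changes live in ena_netdev.c and
are not included here, so this is a condensed sketch reconstructed from
the description above, not the verbatim patch. ena_maybe_unmask() is a
hypothetical helper standing in for the tail of ena_io_poll(), and the
rx_work_done/tx_ring/rx_ring parameters are assumed from context:

    static irqreturn_t ena_intr_msix_io(int irq, void *data)
    {
            struct ena_napi *ena_napi = data;

            ena_napi->first_interrupt = true;

            /* WRITE_ONCE forces a real store so the flag cannot be
             * kept in a register (scenario 1.i.a).
             */
            WRITE_ONCE(ena_napi->interrupts_masked, true);

            /* Commit the flag store before scheduling napi, against
             * both compiler reordering and weakly ordered CPUs such as
             * ARM64 (scenarios 1.i.b and 1.i.c).
             */
            smp_wmb();

            napi_schedule_irqoff(&ena_napi->napi);

            return IRQ_HANDLED;
    }

    /* Hypothetical helper, standing in for the end of ena_io_poll() */
    static void ena_maybe_unmask(struct napi_struct *napi,
                                 struct ena_napi *ena_napi,
                                 struct ena_ring *tx_ring,
                                 struct ena_ring *rx_ring,
                                 int rx_work_done)
    {
            /* Rearm only if an interrupt actually masked the vector,
             * so busy-poll iterations don't re-enable interrupts.
             */
            if (napi_complete_done(napi, rx_work_done) &&
                READ_ONCE(ena_napi->interrupts_masked)) {
                    smp_rmb(); /* order the check before the clear (1.ii) */
                    WRITE_ONCE(ena_napi->interrupts_masked, false);
                    ena_unmask_interrupt(tx_ring, rx_ring);
            }
    }

The smp_wmb() in the handler pairs with the smp_rmb() in the poll
routine: a napi invocation scheduled from the interrupt is guaranteed
to observe the flag and rearm the vector, while invocations driven
purely by busy-polling skip the unmask.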
Signed-off-by: Shay Agroskin
Signed-off-by: Arthur Kiyanovski
Cc: Eric Dumazet
Signed-off-by: David S. Miller
---
 drivers/net/ethernet/amazon/ena/ena_netdev.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'drivers/net/ethernet/amazon/ena/ena_netdev.h')

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h
index ba030d260940..89304b403995 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.h
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h
@@ -167,6 +167,7 @@ struct ena_napi {
         struct ena_ring *rx_ring;
         struct ena_ring *xdp_ring;
         bool first_interrupt;
+        bool interrupts_masked;
         u32 qid;
         struct dim dim;
 };
--
cgit v1.2.3


From 0e3a3f6dacf01cbad9ecc26149ffe8cda3137540 Mon Sep 17 00:00:00 2001
From: Arthur Kiyanovski
Date: Tue, 21 Jul 2020 16:38:11 +0300
Subject: net: ena: support new LLQ acceleration mode

New devices add a new hardware acceleration engine, which adds some
restrictions to the driver. A metadata descriptor must be present for
each packet, and the maximum burst size between two doorbells is now
limited to a number advertised by the device.

This patch adds:
1. A handshake protocol between the driver and the device, so the
   device will enable the accelerated queues only when both sides
   support them.
2. Driver support for the new acceleration engine:
   2.1. Send a metadata descriptor for each Tx packet.
   2.2. Limit the number of packets sent between doorbells. (*)

(*) A previous driver implementation of this feature was committed in
commit 05d62ca218f8 ("net: ena: add handling of llq max tx burst
size"); however, the design of the interface between the driver and the
device has changed since then. This change is reflected in this commit.
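Condensed, the handshake in point 1 works like this: the driver derives
its local settings from the device's advertised supported_flags, then
reports its chosen enabled_flags back through the LLQ admin command.
The helper below is an illustration only, assembled from the ena_com.c
hunks that follow; in the patch this logic sits inline in
ena_com_config_llq_info() and ena_com_set_llq():

    /* Illustrative helper; not a function introduced by this patch */
    static void ena_negotiate_accel_mode(struct ena_com_llq_info *llq_info,
                                         struct ena_admin_feature_llq_desc *llq_features,
                                         struct ena_llq_configurations *llq_default_cfg,
                                         struct ena_admin_set_feat_cmd *cmd)
    {
            struct ena_admin_accel_mode_get get = llq_features->accel_mode.u.get;

            /* 1) Adopt the restrictions the device advertises */
            llq_info->disable_meta_caching =
                    !!(get.supported_flags & BIT(ENA_ADMIN_DISABLE_META_CACHING));

            if (get.supported_flags & BIT(ENA_ADMIN_LIMIT_TX_BURST))
                    llq_info->max_entries_in_tx_burst =
                            get.max_tx_burst_size /
                            llq_default_cfg->llq_ring_entry_size_value;

            /* 2) Report back which acceleration features the driver
             * enables, so the device turns on the accelerated queues
             * only when both sides support them.
             */
            cmd->u.llq.accel_mode.u.set.enabled_flags =
                    BIT(ENA_ADMIN_DISABLE_META_CACHING) |
                    BIT(ENA_ADMIN_LIMIT_TX_BURST);
    }

Overlaying the get and set views on the same two dwords (u32 raw[2] in
ena_admin_accel_mode_req below) keeps the admin ABI size fixed while
serving both directions of the handshake.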
Signed-off-by: Netanel Belgazal
Signed-off-by: Arthur Kiyanovski
Signed-off-by: David S. Miller
---
 drivers/net/ethernet/amazon/ena/ena_admin_defs.h | 39 ++++++++++++++++--
 drivers/net/ethernet/amazon/ena/ena_com.c        | 19 ++++++++-
 drivers/net/ethernet/amazon/ena/ena_com.h        |  3 ++
 drivers/net/ethernet/amazon/ena/ena_eth_com.c    | 51 +++++++++++++++++-------
 drivers/net/ethernet/amazon/ena/ena_eth_com.h    |  3 +-
 drivers/net/ethernet/amazon/ena/ena_netdev.c     | 16 ++++++--
 drivers/net/ethernet/amazon/ena/ena_netdev.h     |  2 +
 7 files changed, 109 insertions(+), 24 deletions(-)

(limited to 'drivers/net/ethernet/amazon/ena/ena_netdev.h')

diff --git a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
index 7f7978b135a9..b818a169c193 100644
--- a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
+++ b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
@@ -491,6 +491,36 @@ enum ena_admin_llq_stride_ctrl {
         ENA_ADMIN_MULTIPLE_DESCS_PER_ENTRY = 2,
 };
 
+enum ena_admin_accel_mode_feat {
+        ENA_ADMIN_DISABLE_META_CACHING = 0,
+        ENA_ADMIN_LIMIT_TX_BURST = 1,
+};
+
+struct ena_admin_accel_mode_get {
+        /* bit field of enum ena_admin_accel_mode_feat */
+        u16 supported_flags;
+
+        /* maximum burst size between two doorbells. The size is in bytes */
+        u16 max_tx_burst_size;
+};
+
+struct ena_admin_accel_mode_set {
+        /* bit field of enum ena_admin_accel_mode_feat */
+        u16 enabled_flags;
+
+        u16 reserved;
+};
+
+struct ena_admin_accel_mode_req {
+        union {
+                u32 raw[2];
+
+                struct ena_admin_accel_mode_get get;
+
+                struct ena_admin_accel_mode_set set;
+        } u;
+};
+
 struct ena_admin_feature_llq_desc {
         u32 max_llq_num;
 
@@ -536,10 +566,13 @@ struct ena_admin_feature_llq_desc {
         /* the stride control the driver selected to use */
         u16 descriptors_stride_ctrl_enabled;
 
-        /* Maximum size in bytes taken by llq entries in a single tx burst.
-         * Set to 0 when there is no such limit.
+        /* reserved */
+        u32 reserved1;
+
+        /* accelerated low latency queues requirement. driver needs to
+         * support those requirements in order to use accelerated llq
          */
-        u32 max_tx_burst_size;
+        struct ena_admin_accel_mode_req accel_mode;
 };
 
 struct ena_admin_queue_ext_feature_fields {
diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c
index 432f143559a1..435bf05a853c 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.c
+++ b/drivers/net/ethernet/amazon/ena/ena_com.c
@@ -403,6 +403,8 @@ static int ena_com_init_io_sq(struct ena_com_dev *ena_dev,
                        0x0, io_sq->llq_info.desc_list_entry_size);
                 io_sq->llq_buf_ctrl.descs_left_in_line =
                         io_sq->llq_info.descs_num_before_header;
+                io_sq->disable_meta_caching =
+                        io_sq->llq_info.disable_meta_caching;
 
                 if (io_sq->llq_info.max_entries_in_tx_burst > 0)
                         io_sq->entries_in_tx_burst_left =
@@ -626,6 +628,10 @@ static int ena_com_set_llq(struct ena_com_dev *ena_dev)
         cmd.u.llq.desc_num_before_header_enabled = llq_info->descs_num_before_header;
         cmd.u.llq.descriptors_stride_ctrl_enabled = llq_info->desc_stride_ctrl;
 
+        cmd.u.llq.accel_mode.u.set.enabled_flags =
+                BIT(ENA_ADMIN_DISABLE_META_CACHING) |
+                BIT(ENA_ADMIN_LIMIT_TX_BURST);
+
         ret = ena_com_execute_admin_command(admin_queue,
                                             (struct ena_admin_aq_entry *)&cmd,
                                             sizeof(cmd),
@@ -643,6 +649,7 @@ static int ena_com_config_llq_info(struct ena_com_dev *ena_dev,
                                    struct ena_llq_configurations *llq_default_cfg)
 {
         struct ena_com_llq_info *llq_info = &ena_dev->llq_info;
+        struct ena_admin_accel_mode_get llq_accel_mode_get;
         u16 supported_feat;
         int rc;
 
@@ -742,9 +749,17 @@ static int ena_com_config_llq_info(struct ena_com_dev *ena_dev,
                               llq_default_cfg->llq_num_decs_before_header,
                               supported_feat, llq_info->descs_num_before_header);
         }
+        /* Check for accelerated queue supported */
+        llq_accel_mode_get = llq_features->accel_mode.u.get;
+
+        llq_info->disable_meta_caching =
+                !!(llq_accel_mode_get.supported_flags &
+                   BIT(ENA_ADMIN_DISABLE_META_CACHING));
 
-        llq_info->max_entries_in_tx_burst =
-                (u16)(llq_features->max_tx_burst_size / llq_default_cfg->llq_ring_entry_size_value);
+        if (llq_accel_mode_get.supported_flags & BIT(ENA_ADMIN_LIMIT_TX_BURST))
+                llq_info->max_entries_in_tx_burst =
+                        llq_accel_mode_get.max_tx_burst_size /
+                        llq_default_cfg->llq_ring_entry_size_value;
 
         rc = ena_com_set_llq(ena_dev);
         if (rc)
diff --git a/drivers/net/ethernet/amazon/ena/ena_com.h b/drivers/net/ethernet/amazon/ena/ena_com.h
index 4c98f6f07882..4287d47b2b0b 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.h
+++ b/drivers/net/ethernet/amazon/ena/ena_com.h
@@ -127,6 +127,7 @@ struct ena_com_llq_info {
         u16 descs_num_before_header;
         u16 descs_per_entry;
         u16 max_entries_in_tx_burst;
+        bool disable_meta_caching;
 };
 
 struct ena_com_io_cq {
@@ -189,6 +190,8 @@ struct ena_com_io_sq {
         enum queue_direction direction;
         enum ena_admin_placement_policy_type mem_queue_type;
 
+        bool disable_meta_caching;
+
         u32 msix_vector;
         struct ena_com_tx_meta cached_tx_meta;
         struct ena_com_llq_info llq_info;
diff --git a/drivers/net/ethernet/amazon/ena/ena_eth_com.c b/drivers/net/ethernet/amazon/ena/ena_eth_com.c
index ec8ea25e988d..ccd440589565 100644
--- a/drivers/net/ethernet/amazon/ena/ena_eth_com.c
+++ b/drivers/net/ethernet/amazon/ena/ena_eth_com.c
@@ -285,11 +285,10 @@ static u16 ena_com_cdesc_rx_pkt_get(struct ena_com_io_cq *io_cq,
         return count;
 }
 
-static int ena_com_create_and_store_tx_meta_desc(struct ena_com_io_sq *io_sq,
-                                                 struct ena_com_tx_ctx *ena_tx_ctx)
+static int ena_com_create_meta(struct ena_com_io_sq *io_sq,
+                               struct ena_com_tx_meta *ena_meta)
 {
         struct ena_eth_io_tx_meta_desc *meta_desc = NULL;
-        struct ena_com_tx_meta *ena_meta = &ena_tx_ctx->ena_meta;
 
         meta_desc = get_sq_desc(io_sq);
         memset(meta_desc, 0x0, sizeof(struct ena_eth_io_tx_meta_desc));
@@ -309,12 +308,13 @@ static int ena_com_create_and_store_tx_meta_desc(struct ena_com_io_sq *io_sq,
 
         /* Extended meta desc */
         meta_desc->len_ctrl |= ENA_ETH_IO_TX_META_DESC_ETH_META_TYPE_MASK;
-        meta_desc->len_ctrl |= ENA_ETH_IO_TX_META_DESC_META_STORE_MASK;
         meta_desc->len_ctrl |= (io_sq->phase <<
                 ENA_ETH_IO_TX_META_DESC_PHASE_SHIFT) &
                 ENA_ETH_IO_TX_META_DESC_PHASE_MASK;
 
         meta_desc->len_ctrl |= ENA_ETH_IO_TX_META_DESC_FIRST_MASK;
+        meta_desc->len_ctrl |= ENA_ETH_IO_TX_META_DESC_META_STORE_MASK;
+
         meta_desc->word2 |= ena_meta->l3_hdr_len &
                 ENA_ETH_IO_TX_META_DESC_L3_HDR_LEN_MASK;
         meta_desc->word2 |= (ena_meta->l3_hdr_offset <<
@@ -325,13 +325,36 @@ static int ena_com_create_and_store_tx_meta_desc(struct ena_com_io_sq *io_sq,
                 ENA_ETH_IO_TX_META_DESC_L4_HDR_LEN_IN_WORDS_SHIFT) &
                 ENA_ETH_IO_TX_META_DESC_L4_HDR_LEN_IN_WORDS_MASK;
 
-        meta_desc->len_ctrl |= ENA_ETH_IO_TX_META_DESC_META_STORE_MASK;
+        return ena_com_sq_update_tail(io_sq);
+}
+
+static int ena_com_create_and_store_tx_meta_desc(struct ena_com_io_sq *io_sq,
+                                                 struct ena_com_tx_ctx *ena_tx_ctx,
+                                                 bool *have_meta)
+{
+        struct ena_com_tx_meta *ena_meta = &ena_tx_ctx->ena_meta;
 
-        /* Cached the meta desc */
-        memcpy(&io_sq->cached_tx_meta, ena_meta,
-               sizeof(struct ena_com_tx_meta));
+        /* When disable meta caching is set, don't bother to save the meta and
+         * compare it to the stored version, just create the meta
+         */
+        if (io_sq->disable_meta_caching) {
+                if (unlikely(!ena_tx_ctx->meta_valid))
+                        return -EINVAL;
 
-        return ena_com_sq_update_tail(io_sq);
+                *have_meta = true;
+                return ena_com_create_meta(io_sq, ena_meta);
+        }
+
+        if (ena_com_meta_desc_changed(io_sq, ena_tx_ctx)) {
+                *have_meta = true;
+                /* Cache the meta desc */
+                memcpy(&io_sq->cached_tx_meta, ena_meta,
+                       sizeof(struct ena_com_tx_meta));
+                return ena_com_create_meta(io_sq, ena_meta);
+        }
+
+        *have_meta = false;
+        return 0;
 }
 
 static void ena_com_rx_set_flags(struct ena_com_rx_ctx *ena_rx_ctx,
@@ -402,12 +425,10 @@ int ena_com_prepare_tx(struct ena_com_io_sq *io_sq,
         if (unlikely(rc))
                 return rc;
 
-        have_meta = ena_tx_ctx->meta_valid && ena_com_meta_desc_changed(io_sq,
-                        ena_tx_ctx);
-        if (have_meta) {
-                rc = ena_com_create_and_store_tx_meta_desc(io_sq, ena_tx_ctx);
-                if (unlikely(rc))
-                        return rc;
+        rc = ena_com_create_and_store_tx_meta_desc(io_sq, ena_tx_ctx, &have_meta);
+        if (unlikely(rc)) {
+                pr_err("failed to create and store tx meta desc\n");
+                return rc;
         }
 
         /* If the caller doesn't want to send packets */
diff --git a/drivers/net/ethernet/amazon/ena/ena_eth_com.h b/drivers/net/ethernet/amazon/ena/ena_eth_com.h
index 8b1afd3b32f2..b6592cb93b04 100644
--- a/drivers/net/ethernet/amazon/ena/ena_eth_com.h
+++ b/drivers/net/ethernet/amazon/ena/ena_eth_com.h
@@ -157,7 +157,8 @@ static inline bool ena_com_is_doorbell_needed(struct ena_com_io_sq *io_sq,
         llq_info = &io_sq->llq_info;
         num_descs = ena_tx_ctx->num_bufs;
 
-        if (unlikely(ena_com_meta_desc_changed(io_sq, ena_tx_ctx)))
+        if (llq_info->disable_meta_caching ||
+            unlikely(ena_com_meta_desc_changed(io_sq, ena_tx_ctx)))
                 ++num_descs;
 
         if (num_descs > llq_info->descs_num_before_header) {
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 913d698b9834..6478c1e0d137 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -655,6 +655,7 @@ static void ena_init_io_rings(struct ena_adapter *adapter,
                 txr->sgl_size = adapter->max_tx_sgl_size;
                 txr->smoothed_interval =
                         ena_com_get_nonadaptive_moderation_interval_tx(ena_dev);
+                txr->disable_meta_caching = adapter->disable_meta_caching;
 
                 /* Don't init RX queues for xdp queues */
                 if (!ENA_IS_XDP_INDEX(adapter, i)) {
@@ -2783,7 +2784,9 @@ int ena_update_queue_count(struct ena_adapter *adapter, u32 new_channel_count)
         return dev_was_up ? ena_open(adapter->netdev) : 0;
 }
 
-static void ena_tx_csum(struct ena_com_tx_ctx *ena_tx_ctx, struct sk_buff *skb)
+static void ena_tx_csum(struct ena_com_tx_ctx *ena_tx_ctx,
+                        struct sk_buff *skb,
+                        bool disable_meta_caching)
 {
         u32 mss = skb_shinfo(skb)->gso_size;
         struct ena_com_tx_meta *ena_meta = &ena_tx_ctx->ena_meta;
@@ -2827,7 +2830,9 @@ static void ena_tx_csum(struct ena_com_tx_ctx *ena_tx_ctx, struct sk_buff *skb)
                 ena_meta->l3_hdr_len = skb_network_header_len(skb);
                 ena_meta->l3_hdr_offset = skb_network_offset(skb);
                 ena_tx_ctx->meta_valid = 1;
-
+        } else if (disable_meta_caching) {
+                memset(ena_meta, 0, sizeof(*ena_meta));
+                ena_tx_ctx->meta_valid = 1;
         } else {
                 ena_tx_ctx->meta_valid = 0;
         }
@@ -3011,7 +3016,7 @@ static netdev_tx_t ena_start_xmit(struct sk_buff *skb, struct net_device *dev)
         ena_tx_ctx.header_len = header_len;
 
         /* set flags and meta data */
-        ena_tx_csum(&ena_tx_ctx, skb);
+        ena_tx_csum(&ena_tx_ctx, skb, tx_ring->disable_meta_caching);
 
         rc = ena_xmit_common(dev,
                              tx_ring,
@@ -4260,6 +4265,11 @@ static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
         adapter->xdp_num_queues = 0;
 
         adapter->rx_copybreak = ENA_DEFAULT_RX_COPYBREAK;
+        if (ena_dev->tx_mem_queue_type == ENA_ADMIN_PLACEMENT_POLICY_DEV)
+                adapter->disable_meta_caching =
+                        !!(get_feat_ctx.llq.accel_mode.u.get.supported_flags &
+                           BIT(ENA_ADMIN_DISABLE_META_CACHING));
+
         adapter->wd_state = wd_state;
 
         snprintf(adapter->name, ENA_NAME_MAX_LEN, "ena_%d", adapters_found);
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h
index 89304b403995..0c8504006247 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.h
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h
@@ -298,6 +298,7 @@ struct ena_ring {
         u8 tx_max_header_size;
 
         bool first_interrupt;
+        bool disable_meta_caching;
         u16 no_interrupt_event_cnt;
 
         /* cpu for TPH */
@@ -399,6 +400,7 @@ struct ena_adapter {
 
         bool wd_state;
         bool dev_up_before_reset;
+        bool disable_meta_caching;
         unsigned long last_keep_alive_jiffies;
 
         struct u64_stats_sync syncp;
--
cgit v1.2.3