From 1cb483a5cc84b497afb51a6c5dfb5a38a0b67086 Mon Sep 17 00:00:00 2001
From: Håkon Bugge
Date: Tue, 7 Nov 2017 16:33:34 +0100
Subject: rds: ib: Fix NULL pointer dereference in debug code
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

rds_ib_recv_refill() is a function that refills an IB receive queue.
It can be called from both the CQE handler (tasklet) and a worker
thread.

Just after the call to ib_post_recv(), a debug message is printed with
rdsdebug():

    ret = ib_post_recv(ic->i_cm_id->qp, &recv->r_wr, &failed_wr);
    rdsdebug("recv %p ibinc %p page %p addr %lu ret %d\n", recv,
             recv->r_ibinc, sg_page(&recv->r_frag->f_sg),
             (long) ib_sg_dma_address(
                    ic->i_cm_id->device,
                    &recv->r_frag->f_sg),
             ret);

Now consider an invocation of rds_ib_recv_refill() from the worker
thread, which is preemptible. Further, assume that the worker thread
is preempted between the ib_post_recv() and rdsdebug() statements.

Then, if the preemption is due to a receive CQE event,
rds_ib_recv_cqe_handler() will be invoked. This function processes
receive completions, including freeing up data structures, such as the
recv->r_frag.

In this scenario, rds_ib_recv_cqe_handler() will process the receive
WR posted above. That implies that recv->r_frag has been freed before
the above rdsdebug() statement has been executed. When it is later
executed, we will have a NULL pointer dereference:

[ 4088.068008] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[ 4088.076754] IP: rds_ib_recv_refill+0x87/0x620 [rds_rdma]
[ 4088.082686] PGD 0 P4D 0
[ 4088.085515] Oops: 0000 [#1] SMP
[ 4088.089015] Modules linked in: rds_rdma(OE) rds(OE) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) fscache(E) mlx4_ib(E) ib_ipoib(E) rdma_ucm(E) ib_ucm(E) ib_uverbs(E) ib_umad(E) rdma_cm(E) ib_cm(E) iw_cm(E) ib_core(E) binfmt_misc(E) sb_edac(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) pcbc(E) aesni_intel(E) crypto_simd(E) iTCO_wdt(E) glue_helper(E) iTCO_vendor_support(E) sg(E) cryptd(E) pcspkr(E) ipmi_si(E) ipmi_devintf(E) ipmi_msghandler(E) shpchp(E) ioatdma(E) i2c_i801(E) wmi(E) lpc_ich(E) mei_me(E) mei(E) mfd_core(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) ip_tables(E) ext4(E) mbcache(E) jbd2(E) fscrypto(E) mgag200(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E)
[ 4088.168486]  fb_sys_fops(E) ahci(E) ixgbe(E) libahci(E) ttm(E) mdio(E) ptp(E) pps_core(E) drm(E) sd_mod(E) libata(E) crc32c_intel(E) mlx4_core(E) i2c_core(E) dca(E) megaraid_sas(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) [last unloaded: rds]
[ 4088.193442] CPU: 20 PID: 1244 Comm: kworker/20:2 Tainted: G OE 4.14.0-rc7.master.20171105.ol7.x86_64 #1
[ 4088.205097] Hardware name: Oracle Corporation ORACLE SERVER X5-2L/ASM,MOBO TRAY,2U, BIOS 31110000 03/03/2017
[ 4088.216074] Workqueue: ib_cm cm_work_handler [ib_cm]
[ 4088.221614] task: ffff885fa11d0000 task.stack: ffffc9000e598000
[ 4088.228224] RIP: 0010:rds_ib_recv_refill+0x87/0x620 [rds_rdma]
[ 4088.234736] RSP: 0018:ffffc9000e59bb68 EFLAGS: 00010286
[ 4088.240568] RAX: 0000000000000000 RBX: ffffc9002115d050 RCX: ffffc9002115d050
[ 4088.248535] RDX: ffffffffa0521380 RSI: ffffffffa0522158 RDI: ffffffffa0525580
[ 4088.256498] RBP: ffffc9000e59bbf8 R08: 0000000000000005 R09: 0000000000000000
[ 4088.264465] R10: 0000000000000339 R11: 0000000000000001 R12: 0000000000000000
[ 4088.272433] R13: ffff885f8c9d8000 R14: ffffffff81a0a060 R15: ffff884676268000
[ 4088.280397] FS: 0000000000000000(0000) GS:ffff885fbec80000(0000) knlGS:0000000000000000
[ 4088.289434] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4088.295846] CR2: 0000000000000020 CR3: 0000000001e09005 CR4: 00000000001606e0
[ 4088.303816] Call Trace:
[ 4088.306557]  rds_ib_cm_connect_complete+0xe0/0x220 [rds_rdma]
[ 4088.312982]  ? __dynamic_pr_debug+0x8c/0xb0
[ 4088.317664]  ? __queue_work+0x142/0x3c0
[ 4088.321944]  rds_rdma_cm_event_handler+0x19e/0x250 [rds_rdma]
[ 4088.328370]  cma_ib_handler+0xcd/0x280 [rdma_cm]
[ 4088.333522]  cm_process_work+0x25/0x120 [ib_cm]
[ 4088.338580]  cm_work_handler+0xd6b/0x17aa [ib_cm]
[ 4088.343832]  process_one_work+0x149/0x360
[ 4088.348307]  worker_thread+0x4d/0x3e0
[ 4088.352397]  kthread+0x109/0x140
[ 4088.355996]  ? rescuer_thread+0x380/0x380
[ 4088.360467]  ? kthread_park+0x60/0x60
[ 4088.364563]  ret_from_fork+0x25/0x30
[ 4088.368548] Code: 48 89 45 90 48 89 45 98 eb 4d 0f 1f 44 00 00 48 8b 43 08 48 89 d9 48 c7 c2 80 13 52 a0 48 c7 c6 58 21 52 a0 48 c7 c7 80 55 52 a0 <4c> 8b 48 20 44 89 64 24 08 48 8b 40 30 49 83 e1 fc 48 89 04 24
[ 4088.389612] RIP: rds_ib_recv_refill+0x87/0x620 [rds_rdma] RSP: ffffc9000e59bb68
[ 4088.397772] CR2: 0000000000000020
[ 4088.401505] ---[ end trace fe922e6ccf004431 ]---

This bug was provoked by compiling rds out-of-tree with
EXTRA_CFLAGS="-DRDS_DEBUG -DDEBUG" and inserting an artificial delay
between the ib_post_recv() and rdsdebug() statements:

    /* XXX when can this fail? */
    ret = ib_post_recv(ic->i_cm_id->qp, &recv->r_wr, &failed_wr);
+   if (can_wait)
+           usleep_range(1000, 5000);
    rdsdebug("recv %p ibinc %p page %p addr %lu ret %d\n", recv,
             recv->r_ibinc, sg_page(&recv->r_frag->f_sg),
             (long) ib_sg_dma_address(

The fix is simply to move the rdsdebug() statement up before the
ib_post_recv() and remove the printing of ret, which is taken care of
anyway by the non-debug code.

Signed-off-by: Håkon Bugge
Reviewed-by: Knut Omang
Reviewed-by: Wei Lin Guay
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller
---
 net/rds/ib_recv.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/rds/ib_recv.c b/net/rds/ib_recv.c
index 9722bf839d9d..b4e421aa9727 100644
--- a/net/rds/ib_recv.c
+++ b/net/rds/ib_recv.c
@@ -410,14 +410,14 @@ void rds_ib_recv_refill(struct rds_connection *conn, int prefill, gfp_t gfp)
                        break;
                }

-               /* XXX when can this fail? */
-               ret = ib_post_recv(ic->i_cm_id->qp, &recv->r_wr, &failed_wr);
-               rdsdebug("recv %p ibinc %p page %p addr %lu ret %d\n", recv,
+               rdsdebug("recv %p ibinc %p page %p addr %lu\n", recv,
                         recv->r_ibinc, sg_page(&recv->r_frag->f_sg),
                         (long) ib_sg_dma_address(
                                ic->i_cm_id->device,
-                               &recv->r_frag->f_sg),
-                        ret);
+                               &recv->r_frag->f_sg));
+
+               /* XXX when can this fail? */
+               ret = ib_post_recv(ic->i_cm_id->qp, &recv->r_wr, &failed_wr);
                if (ret) {
                        rds_ib_conn_error(conn, "recv post on "
                               "%pI4 returned %d, disconnecting and "
--
cgit v1.2.3

From b8cce68bf1f1b773ac1a535707f968512b3c1e5f Mon Sep 17 00:00:00 2001
From: Huy Nguyen
Date: Sun, 29 Oct 2017 22:40:56 -0500
Subject: net/mlx5: Loop over temp list to release delay events

list_splice_init() reinitializes waiting_events_list after splicing it
onto the temp list, so we must loop over the temp list to fire the
events.
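The list_head semantics that matter here, as a minimal standalone
sketch (a hedged illustration; the struct and function names are
invented for the example and are not the mlx5 code):

    #include <linux/list.h>
    #include <linux/slab.h>

    struct delayed_event {              /* illustrative, not mlx5's struct */
            struct list_head list;
            int event;
    };

    static void fire_all(struct list_head *waiting)
    {
            struct delayed_event *de, *n;
            LIST_HEAD(temp);

            /* list_splice_init() moves every entry onto 'temp' and
             * re-initializes 'waiting' to an empty list.
             */
            list_splice_init(waiting, &temp);

            /* Iterating 'waiting' at this point visits nothing; the
             * entries now live on 'temp', so that is the list to walk.
             */
            list_for_each_entry_safe(de, n, &temp, list) {
                    list_del(&de->list);
                    kfree(de);          /* fire, then release */
            }
    }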
Fixes: 4ca637a20a52 ("net/mlx5: Delay events till mlx5 interface's add complete for pci resume")
Signed-off-by: Huy Nguyen
Signed-off-by: Feras Daoud
Signed-off-by: Saeed Mahameed
---
 drivers/net/ethernet/mellanox/mlx5/core/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/dev.c b/drivers/net/ethernet/mellanox/mlx5/core/dev.c
index fc281712869b..17b723218b0c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/dev.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/dev.c
@@ -93,7 +93,7 @@ static void delayed_event_release(struct mlx5_device_context *dev_ctx,
        list_splice_init(&priv->waiting_events_list, &temp);
        if (!dev_ctx->context)
                goto out;
-       list_for_each_entry_safe(de, n, &priv->waiting_events_list, list)
+       list_for_each_entry_safe(de, n, &temp, list)
                dev_ctx->intf->event(dev, dev_ctx->context, de->event,
                                     de->param);
 out:
--
cgit v1.2.3

From d2aa060d40fa060e963f9a356d43481e43ba3dac Mon Sep 17 00:00:00 2001
From: Huy Nguyen
Date: Tue, 26 Sep 2017 15:11:56 -0500
Subject: net/mlx5: Cancel health poll before sending panic teardown command

After the panic teardown firmware command, health_care detects the
error on the PCI bus and calls mlx5_pci_err_detected. This health_care
flow is no longer needed, because the panic teardown firmware command
will bring down the PCI bus communication with the HCA.

The solution is to cancel the health care timer and its pending
workqueue request before sending the panic teardown firmware command.

Kernel trace:
mlx5_core 0033:01:00.0: Shutdown was called
mlx5_core 0033:01:00.0: health_care:154:(pid 9304): handling bad device here
mlx5_core 0033:01:00.0: mlx5_handle_bad_state:114:(pid 9304): NIC state 1
mlx5_core 0033:01:00.0: mlx5_pci_err_detected was called
mlx5_core 0033:01:00.0: mlx5_enter_error_state:96:(pid 9304): start
mlx5_3:mlx5_ib_event:3061:(pid 9304): warning: event on port 0
mlx5_core 0033:01:00.0: mlx5_enter_error_state:104:(pid 9304): end
Unable to handle kernel paging request for data at address 0x0000003f
Faulting instruction address: 0xc0080000434b8c80

Fixes: 8812c24d28f4 ('net/mlx5: Add fast unload support in shutdown flow')
Signed-off-by: Huy Nguyen
Reviewed-by: Moshe Shemesh
Signed-off-by: Saeed Mahameed
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 0d2c8dcd6eae..06562c9a6b9c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1482,9 +1482,16 @@ static int mlx5_try_fast_unload(struct mlx5_core_dev *dev)
                return -EAGAIN;
        }

+       /* Panic tear down fw command will stop the PCI bus communication
+        * with the HCA, so the health poll is no longer needed.
+        */
+       mlx5_drain_health_wq(dev);
+       mlx5_stop_health_poll(dev);
+
        ret = mlx5_cmd_force_teardown_hca(dev);
        if (ret) {
                mlx5_core_dbg(dev, "Firmware couldn't do fast unload error: %d\n", ret);
+               mlx5_start_health_poll(dev);
                return ret;
        }
--
cgit v1.2.3

From 2a8d6065e7b90ad9d5540650944d802b0f86bdfe Mon Sep 17 00:00:00 2001
From: Saeed Mahameed
Date: Tue, 31 Oct 2017 15:34:00 -0700
Subject: net/mlx5e: Fix napi poll with zero budget

napi->poll can be called with budget 0, e.g. in netpoll scenarios
where the caller only wants to poll TX rings
(poll_one_napi@net/core/netpoll.c).

The below commit changed RX polling from a "while" loop to
"do {} while", which caused the initial budget to be ignored and at
least one RX packet to be handled.
This fixes the following warning:

[ 2852.049194] mlx5e_napi_poll+0x0/0x260 [mlx5_core] exceeded budget in poll
[ 2852.049195] ------------[ cut here ]------------
[ 2852.049195] WARNING: CPU: 0 PID: 25691 at net/core/netpoll.c:171 netpoll_poll_dev+0x18a/0x1a0

Fixes: 4b7dfc992514 ("net/mlx5e: Early-return on empty completion queues")
Signed-off-by: Saeed Mahameed
Reviewed-by: Tariq Toukan
Reported-by: Martin KaFai Lau
Tested-by: Martin KaFai Lau
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed
---
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index e906b754415c..ab92298eafc3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -49,7 +49,7 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
        struct mlx5e_channel *c = container_of(napi, struct mlx5e_channel,
                                               napi);
        bool busy = false;
-       int work_done;
+       int work_done = 0;
        int i;

        for (i = 0; i < c->num_tc; i++)
@@ -58,15 +58,17 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
        if (c->xdp)
                busy |= mlx5e_poll_xdpsq_cq(&c->rq.xdpsq.cq);

-       work_done = mlx5e_poll_rx_cq(&c->rq.cq, budget);
-       busy |= work_done == budget;
+       if (likely(budget)) { /* budget=0 means: don't poll rx rings */
+               work_done = mlx5e_poll_rx_cq(&c->rq.cq, budget);
+               busy |= work_done == budget;
+       }

        busy |= c->rq.post_wqes(&c->rq);

        if (busy) {
                if (likely(mlx5e_channel_no_affinity_change(c)))
                        return budget;
-               if (work_done == budget)
+               if (budget && work_done == budget)
                        work_done--;
        }
--
cgit v1.2.3

From 2e50b2619538ea0224c037f6fa746023089e0654 Mon Sep 17 00:00:00 2001
From: Inbar Karmy
Date: Sun, 15 Oct 2017 17:30:59 +0300
Subject: net/mlx5e: Set page to null in case dma mapping fails

Currently, when dma mapping fails, put_page is called, but the page is
not set to null. Later, in the page_reuse treatment in
mlx5e_free_rx_descs(), mlx5e_page_release() is called for the second
time, improperly doing dma_unmap (for a non-mapped address) and an
extra put_page. Prevent this by nullifying the page pointer when
dma_map fails.
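The pattern behind the fix, sketched with invented names (a hedged
illustration, not the driver's exact structures): recording NULL on
the error path is what lets a later, unconditional release path tell a
mapped page from one that was already put:

    #include <linux/dma-mapping.h>
    #include <linux/mm.h>
    #include <linux/skbuff.h>

    struct buf_info {                   /* stands in for mlx5e_dma_info */
            struct page *page;
            dma_addr_t addr;
    };

    static int buf_alloc_mapped(struct device *dev, struct buf_info *bi)
    {
            bi->page = dev_alloc_pages(0);
            if (unlikely(!bi->page))
                    return -ENOMEM;

            bi->addr = dma_map_page(dev, bi->page, 0, PAGE_SIZE,
                                    DMA_FROM_DEVICE);
            if (unlikely(dma_mapping_error(dev, bi->addr))) {
                    put_page(bi->page);
                    bi->page = NULL;    /* mark "nothing to release" */
                    return -ENOMEM;
            }
            return 0;
    }

    static void buf_release(struct device *dev, struct buf_info *bi)
    {
            if (!bi->page)              /* avoids double unmap/put */
                    return;
            dma_unmap_page(dev, bi->addr, PAGE_SIZE, DMA_FROM_DEVICE);
            put_page(bi->page);
            bi->page = NULL;
    }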
Fixes: accd58833237 ("net/mlx5e: Introduce RX Page-Reuse")
Signed-off-by: Inbar Karmy
Reviewed-by: Tariq Toukan
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 15a1687483cc..91b1b0938931 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -215,22 +215,20 @@ static inline bool mlx5e_rx_cache_get(struct mlx5e_rq *rq,
 static inline int mlx5e_page_alloc_mapped(struct mlx5e_rq *rq,
                                          struct mlx5e_dma_info *dma_info)
 {
-       struct page *page;
-
        if (mlx5e_rx_cache_get(rq, dma_info))
                return 0;

-       page = dev_alloc_pages(rq->buff.page_order);
-       if (unlikely(!page))
+       dma_info->page = dev_alloc_pages(rq->buff.page_order);
+       if (unlikely(!dma_info->page))
                return -ENOMEM;

-       dma_info->addr = dma_map_page(rq->pdev, page, 0,
+       dma_info->addr = dma_map_page(rq->pdev, dma_info->page, 0,
                                      RQ_PAGE_SIZE(rq), rq->buff.map_dir);
        if (unlikely(dma_mapping_error(rq->pdev, dma_info->addr))) {
-               put_page(page);
+               put_page(dma_info->page);
+               dma_info->page = NULL;
                return -ENOMEM;
        }

-       dma_info->page = page;
        return 0;
 }
--
cgit v1.2.3

From d1c61e6d79ea0d4d53dc18bcd2db30ef2d99cfa7 Mon Sep 17 00:00:00 2001
From: Eugenia Emantayev
Date: Thu, 12 Jan 2017 17:11:45 +0200
Subject: net/mlx5e: Increase Striding RQ minimum size limit to 4 multi-packet WQEs

This is to prevent the case of working with a single MPWQE (one WQE is
always reserved, as the RQ is a linked list). When the WQE is fully
consumed, HW should still have an available buffer in order not to
drop packets.

Fixes: 461017cb006a ("net/mlx5e: Support RX multi-packet WQE (Striding RQ)")
Signed-off-by: Eugenia Emantayev
Reviewed-by: Tariq Toukan
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index cc13d3dbd366..13b5ef9d8703 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -67,7 +67,7 @@
 #define MLX5E_PARAMS_DEFAULT_LOG_RQ_SIZE     0xa
 #define MLX5E_PARAMS_MAXIMUM_LOG_RQ_SIZE     0xd

-#define MLX5E_PARAMS_MINIMUM_LOG_RQ_SIZE_MPW 0x1
+#define MLX5E_PARAMS_MINIMUM_LOG_RQ_SIZE_MPW 0x2
 #define MLX5E_PARAMS_DEFAULT_LOG_RQ_SIZE_MPW 0x3
 #define MLX5E_PARAMS_MAXIMUM_LOG_RQ_SIZE_MPW 0x6
--
cgit v1.2.3

From fb5f0b3ef69b95e665e4bbe8a3de7201f09f1071 Mon Sep 17 00:00:00 2001
From: Richard Schütz
Date: Sun, 29 Oct 2017 13:03:22 +0100
Subject: can: c_can: don't indicate triple sampling support for D_CAN
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The D_CAN controller doesn't provide a triple sampling mode, so don't
set the CAN_CTRLMODE_3_SAMPLES flag in ctrlmode_supported. Currently
enabling triple sampling is a no-op.
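For context, ctrlmode_supported is the mask the CAN core checks when
user space requests a mode via netlink; a simplified, hedged sketch of
that gate (modeled on can_changelink() in drivers/net/can/dev.c; the
standalone function here is illustrative):

    #include <linux/errno.h>
    #include <linux/types.h>

    static int check_requested_ctrlmode(u32 ctrlmode_supported, u32 mask)
    {
            /* Bits not advertised by the driver are rejected outright. */
            if (mask & ~ctrlmode_supported)
                    return -EOPNOTSUPP;
            /* An advertised bit the hardware ignores, however, is
             * accepted here and then silently does nothing - exactly
             * the no-op this patch removes for D_CAN triple sampling.
             */
            return 0;
    }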
Signed-off-by: Richard Schütz
Cc: linux-stable # >= v3.6
Signed-off-by: Marc Kleine-Budde
---
 drivers/net/can/c_can/c_can_pci.c      | 1 -
 drivers/net/can/c_can/c_can_platform.c | 1 -
 2 files changed, 2 deletions(-)

diff --git a/drivers/net/can/c_can/c_can_pci.c b/drivers/net/can/c_can/c_can_pci.c
index cf7c18947189..d065c0e2d18e 100644
--- a/drivers/net/can/c_can/c_can_pci.c
+++ b/drivers/net/can/c_can/c_can_pci.c
@@ -178,7 +178,6 @@ static int c_can_pci_probe(struct pci_dev *pdev,
                break;
        case BOSCH_D_CAN:
                priv->regs = reg_map_d_can;
-               priv->can.ctrlmode_supported |= CAN_CTRLMODE_3_SAMPLES;
                break;
        default:
                ret = -EINVAL;
diff --git a/drivers/net/can/c_can/c_can_platform.c b/drivers/net/can/c_can/c_can_platform.c
index 46a746ee80bb..b5145a7f874c 100644
--- a/drivers/net/can/c_can/c_can_platform.c
+++ b/drivers/net/can/c_can/c_can_platform.c
@@ -320,7 +320,6 @@ static int c_can_plat_probe(struct platform_device *pdev)
                break;
        case BOSCH_D_CAN:
                priv->regs = reg_map_d_can;
-               priv->can.ctrlmode_supported |= CAN_CTRLMODE_3_SAMPLES;
                priv->read_reg = c_can_plat_read_reg_aligned_to_16bit;
                priv->write_reg = c_can_plat_write_reg_aligned_to_16bit;
                priv->read_reg32 = d_can_plat_read_reg32;
--
cgit v1.2.3

From 4dcf924c2eda0c47a5c53b7703e3dc65ddaa8920 Mon Sep 17 00:00:00 2001
From: Gerhard Bertelsmann
Date: Mon, 6 Nov 2017 18:16:56 +0100
Subject: can: sun4i: handle overrun in RX FIFO

The SUN4I's CAN IP has a 64-byte-deep FIFO buffer. If the buffer is
not drained fast enough (overrun), it gets mangled. Already received
frames are dropped - the data can't be restored.

Signed-off-by: Gerhard Bertelsmann
Cc: linux-stable
Signed-off-by: Marc Kleine-Budde
---
 drivers/net/can/sun4i_can.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/net/can/sun4i_can.c b/drivers/net/can/sun4i_can.c
index b0c80859f746..1ac2090a1721 100644
--- a/drivers/net/can/sun4i_can.c
+++ b/drivers/net/can/sun4i_can.c
@@ -539,6 +539,13 @@ static int sun4i_can_err(struct net_device *dev, u8 isrc, u8 status)
                }
                stats->rx_over_errors++;
                stats->rx_errors++;
+
+               /* reset the CAN IP by entering reset mode
+                * ignoring timeout error
+                */
+               set_reset_mode(dev);
+               set_normal_mode(dev);
+
                /* clear bit */
                sun4i_can_write_cmdreg(priv, SUN4I_CMD_CLEAR_OR_FLAG);
        }
@@ -653,8 +660,9 @@ static irqreturn_t sun4i_can_interrupt(int irq, void *dev_id)
                        netif_wake_queue(dev);
                        can_led_event(dev, CAN_LED_EVENT_TX);
                }
-               if (isrc & SUN4I_INT_RBUF_VLD) {
-                       /* receive interrupt */
+               if ((isrc & SUN4I_INT_RBUF_VLD) &&
+                   !(isrc & SUN4I_INT_DATA_OR)) {
+                       /* receive interrupt - don't read if overrun occurred */
                        while (status & SUN4I_STA_RBUF_RDY) {
                                /* RX buffer is not empty */
                                sun4i_can_rx(dev);
--
cgit v1.2.3

From 4cbdd0ee67191481ec57ceed94febdfef95c9f25 Mon Sep 17 00:00:00 2001
From: Stephane Grosjean
Date: Thu, 9 Nov 2017 14:42:14 +0100
Subject: can: peak: Add support for new PCIe/M2 CAN FD interfaces

This adds support for the following PEAK-System CAN FD interfaces:

PCAN-cPCIe FD         CAN FD Interface for cPCI Serial (2 or 4 channels)
PCAN-PCIe/104-Express CAN FD Interface for PCIe/104-Express (1, 2 or 4 ch.)
PCAN-miniPCIe FD      CAN FD Interface for PCIe Mini (1, 2 or 4 channels)
PCAN-PCIe FD OEM      CAN FD Interface for PCIe OEM version (1, 2 or 4 ch.)
PCAN-M.2              CAN FD Interface for M.2 (1 or 2 channels)

Like the PCAN-PCIe FD interface, all of these boards run the same IP
Core that is able to handle CAN FD (see also http://www.peak-system.com).
Signed-off-by: Stephane Grosjean
Cc: linux-stable
Signed-off-by: Marc Kleine-Budde
---
 drivers/net/can/peak_canfd/peak_pciefd_main.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/net/can/peak_canfd/peak_pciefd_main.c b/drivers/net/can/peak_canfd/peak_pciefd_main.c
index 51c2d182a33a..b4efd711f824 100644
--- a/drivers/net/can/peak_canfd/peak_pciefd_main.c
+++ b/drivers/net/can/peak_canfd/peak_pciefd_main.c
@@ -29,14 +29,19 @@
 #include "peak_canfd_user.h"

 MODULE_AUTHOR("Stephane Grosjean ");
-MODULE_DESCRIPTION("Socket-CAN driver for PEAK PCAN PCIe FD family cards");
-MODULE_SUPPORTED_DEVICE("PEAK PCAN PCIe FD CAN cards");
+MODULE_DESCRIPTION("Socket-CAN driver for PEAK PCAN PCIe/M.2 FD family cards");
+MODULE_SUPPORTED_DEVICE("PEAK PCAN PCIe/M.2 FD CAN cards");
 MODULE_LICENSE("GPL v2");

 #define PCIEFD_DRV_NAME        "peak_pciefd"

 #define PEAK_PCI_VENDOR_ID     0x001c  /* The PCI device and vendor IDs */
 #define PEAK_PCIEFD_ID         0x0013  /* for PCIe slot cards */
+#define PCAN_CPCIEFD_ID        0x0014  /* for Compact-PCI Serial slot cards */
+#define PCAN_PCIE104FD_ID      0x0017  /* for PCIe-104 Express slot cards */
+#define PCAN_MINIPCIEFD_ID     0x0018  /* for mini-PCIe slot cards */
+#define PCAN_PCIEFD_OEM_ID     0x0019  /* for PCIe slot OEM cards */
+#define PCAN_M2_ID             0x001a  /* for M2 slot cards */

 /* PEAK PCIe board access description */
 #define PCIEFD_BAR0_SIZE       (64 * 1024)
@@ -203,6 +208,11 @@ struct pciefd_board {
 /* supported device ids. */
 static const struct pci_device_id peak_pciefd_tbl[] = {
        {PEAK_PCI_VENDOR_ID, PEAK_PCIEFD_ID, PCI_ANY_ID, PCI_ANY_ID,},
+       {PEAK_PCI_VENDOR_ID, PCAN_CPCIEFD_ID, PCI_ANY_ID, PCI_ANY_ID,},
+       {PEAK_PCI_VENDOR_ID, PCAN_PCIE104FD_ID, PCI_ANY_ID, PCI_ANY_ID,},
+       {PEAK_PCI_VENDOR_ID, PCAN_MINIPCIEFD_ID, PCI_ANY_ID, PCI_ANY_ID,},
+       {PEAK_PCI_VENDOR_ID, PCAN_PCIEFD_OEM_ID, PCI_ANY_ID, PCI_ANY_ID,},
+       {PEAK_PCI_VENDOR_ID, PCAN_M2_ID, PCI_ANY_ID, PCI_ANY_ID,},
        {0,}
 };
--
cgit v1.2.3

From 7ec318feeed10a64c0359ec4d10889cb4defa39a Mon Sep 17 00:00:00 2001
From: Eric Dumazet
Date: Tue, 7 Nov 2017 15:15:04 -0800
Subject: tcp: gso: avoid refcount_t warning from tcp_gso_segment()

When a GSO skb of truesize O is segmented into 2 new skbs of truesize
N1 and N2, we want to transfer socket ownership to the new fresh skbs.

In order to avoid expensive atomic operations on a cache line subject
to cache bouncing, we replace the sequence:

    refcount_add(N1, &sk->sk_wmem_alloc);
    refcount_add(N2, &sk->sk_wmem_alloc); // repeated by number of segments
    refcount_sub(O, &sk->sk_wmem_alloc);

by a single:

    refcount_add(sum_of(N) - O, &sk->sk_wmem_alloc);

Problem is: in some pathological cases, sum(N) - O might be a negative
number, and the syzkaller bot was apparently able to trigger this
trace [1].

atomic_t was ok with this construct, but we need to take care of the
negative delta with refcount_t.

[1]
refcount_t: saturated; leaking memory.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 8404 at lib/refcount.c:77 refcount_add_not_zero+0x198/0x200 lib/refcount.c:77
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 8404 Comm: syz-executor2 Not tainted 4.14.0-rc5-mm1+ #20
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x194/0x257 lib/dump_stack.c:52
 panic+0x1e4/0x41c kernel/panic.c:183
 __warn+0x1c4/0x1e0 kernel/panic.c:546
 report_bug+0x211/0x2d0 lib/bug.c:183
 fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:177
 do_trap_no_signal arch/x86/kernel/traps.c:211 [inline]
 do_trap+0x260/0x390 arch/x86/kernel/traps.c:260
 do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:297
 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:310
 invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
RIP: 0010:refcount_add_not_zero+0x198/0x200 lib/refcount.c:77
RSP: 0018:ffff8801c606e3a0 EFLAGS: 00010282
RAX: 0000000000000026 RBX: 0000000000001401 RCX: 0000000000000000
RDX: 0000000000000026 RSI: ffffc900036fc000 RDI: ffffed0038c0dc68
RBP: ffff8801c606e430 R08: 0000000000000001 R09: 0000000000000000
R10: ffff8801d97f5eba R11: 0000000000000000 R12: ffff8801d5acf73c
R13: 1ffff10038c0dc75 R14: 00000000ffffffff R15: 00000000fffff72f
 refcount_add+0x1b/0x60 lib/refcount.c:101
 tcp_gso_segment+0x10d0/0x16b0 net/ipv4/tcp_offload.c:155
 tcp4_gso_segment+0xd4/0x310 net/ipv4/tcp_offload.c:51
 inet_gso_segment+0x60c/0x11c0 net/ipv4/af_inet.c:1271
 skb_mac_gso_segment+0x33f/0x660 net/core/dev.c:2749
 __skb_gso_segment+0x35f/0x7f0 net/core/dev.c:2821
 skb_gso_segment include/linux/netdevice.h:3971 [inline]
 validate_xmit_skb+0x4ba/0xb20 net/core/dev.c:3074
 __dev_queue_xmit+0xe49/0x2070 net/core/dev.c:3497
 dev_queue_xmit+0x17/0x20 net/core/dev.c:3538
 neigh_hh_output include/net/neighbour.h:471 [inline]
 neigh_output include/net/neighbour.h:479 [inline]
 ip_finish_output2+0xece/0x1460 net/ipv4/ip_output.c:229
 ip_finish_output+0x85e/0xd10 net/ipv4/ip_output.c:317
 NF_HOOK_COND include/linux/netfilter.h:238 [inline]
 ip_output+0x1cc/0x860 net/ipv4/ip_output.c:405
 dst_output include/net/dst.h:459 [inline]
 ip_local_out+0x95/0x160 net/ipv4/ip_output.c:124
 ip_queue_xmit+0x8c6/0x18e0 net/ipv4/ip_output.c:504
 tcp_transmit_skb+0x1ab7/0x3840 net/ipv4/tcp_output.c:1137
 tcp_write_xmit+0x663/0x4de0 net/ipv4/tcp_output.c:2341
 __tcp_push_pending_frames+0xa0/0x250 net/ipv4/tcp_output.c:2513
 tcp_push_pending_frames include/net/tcp.h:1722 [inline]
 tcp_data_snd_check net/ipv4/tcp_input.c:5050 [inline]
 tcp_rcv_established+0x8c7/0x18a0 net/ipv4/tcp_input.c:5497
 tcp_v4_do_rcv+0x2ab/0x7d0 net/ipv4/tcp_ipv4.c:1460
 sk_backlog_rcv include/net/sock.h:909 [inline]
 __release_sock+0x124/0x360 net/core/sock.c:2264
 release_sock+0xa4/0x2a0 net/core/sock.c:2776
 tcp_sendmsg+0x3a/0x50 net/ipv4/tcp.c:1462
 inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:763
 sock_sendmsg_nosec net/socket.c:632 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:642
 ___sys_sendmsg+0x31c/0x890 net/socket.c:2048
 __sys_sendmmsg+0x1e6/0x5f0 net/socket.c:2138

Fixes: 14afee4b6092 ("net: convert sock.sk_wmem_alloc from atomic_t to refcount_t")
Signed-off-by: Eric Dumazet
Reported-by: syzbot
Signed-off-by: David S. Miller
---
 net/ipv4/tcp_offload.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 11f69bbf9307..b6a2aa1dcf56 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -149,11 +149,19 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
         * is freed by GSO engine
         */
        if (copy_destructor) {
+               int delta;
+
                swap(gso_skb->sk, skb->sk);
                swap(gso_skb->destructor, skb->destructor);
                sum_truesize += skb->truesize;
-               refcount_add(sum_truesize - gso_skb->truesize,
-                            &skb->sk->sk_wmem_alloc);
+               delta = sum_truesize - gso_skb->truesize;
+               /* In some pathological cases, delta can be negative.
+                * We need to either use refcount_add() or refcount_sub_and_test()
+                */
+               if (likely(delta >= 0))
+                       refcount_add(delta, &skb->sk->sk_wmem_alloc);
+               else
+                       WARN_ON_ONCE(refcount_sub_and_test(-delta, &skb->sk->sk_wmem_alloc));
        }

        delta = htonl(oldlen + (skb_tail_pointer(skb) -
--
cgit v1.2.3

From 0eb96bf754d7fa6635aa0b0f6650c74b8a6b1cc9 Mon Sep 17 00:00:00 2001
From: Yuchung Cheng
Date: Tue, 7 Nov 2017 15:33:43 -0800
Subject: tcp: fix tcp_fastretrans_alert warning

This patch fixes the cause of a WARNING indicating that TCP has
pending retransmission in Open state in tcp_fastretrans_alert().

The root cause is a bad interaction between path MTU probing, if
enabled, and the RACK loss detection. Upon receiving a SACK above the
sequence of the MTU probing packet, RACK could mark the probe packet
lost in tcp_fastretrans_alert(), prior to calling
tcp_simple_retransmit().

tcp_simple_retransmit() only enters Loss state if it newly marks the
probe packet lost. If the probe packet is already identified as lost
by RACK, the sender remains in Open state with some packets marked
lost and retransmitted. Then the next SACK would trigger the warning.
The likely scenario is that the probe packet was lost due to its size
or network congestion. The actual impact of this warning is small:
fast recovery is potentially entered one ACK later.

The simple fix is to always enter recovery (Loss) state if some packet
is marked lost during path MTU probing.

Fixes: a0370b3f3f2c ("tcp: enable RACK loss detection to trigger recovery")
Reported-by: Oleksandr Natalenko
Reported-by: Alexei Starovoitov
Reported-by: Roman Gushchin
Signed-off-by: Yuchung Cheng
Reviewed-by: Eric Dumazet
Acked-by: Neal Cardwell
Signed-off-by: David S. Miller
---
 net/ipv4/tcp_input.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b2fc7163bd40..b6bb3cdfad09 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2615,7 +2615,6 @@ void tcp_simple_retransmit(struct sock *sk)
        struct tcp_sock *tp = tcp_sk(sk);
        struct sk_buff *skb;
        unsigned int mss = tcp_current_mss(sk);
-       u32 prior_lost = tp->lost_out;

        tcp_for_write_queue(skb, sk) {
                if (skb == tcp_send_head(sk))
@@ -2632,7 +2631,7 @@ void tcp_simple_retransmit(struct sock *sk)

        tcp_clear_retrans_hints_partial(tp);

-       if (prior_lost == tp->lost_out)
+       if (!tp->lost_out)
                return;

        if (tcp_is_reno(tp))
--
cgit v1.2.3

From 4f7116757b4bd99e4ef2636c7d957a6d63035d11 Mon Sep 17 00:00:00 2001
From: Marek Vasut
Date: Fri, 10 Nov 2017 11:22:39 +0100
Subject: can: ifi: Fix transmitter delay calculation

The CANFD transmitter delay calculation formula was updated in the
latest software drop from IFI and improves the behavior of the IFI
CANFD core during bitrate switching. Use the new formula to improve
stability of the CANFD operation.
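In bit-timing terms the change is this (a hedged sketch; the field
names come from struct can_bittiming, and the masking mirrors the diff
below):

    #include <linux/can/dev.h>

    /* Old: brp * (phase_seg1 + 1) quanta.
     * New: brp * (prop_seg + phase_seg1) quanta, i.e. the propagation
     * segment is now included and the extra sync quantum is dropped.
     */
    static u32 tdc_old(const struct can_bittiming *dbt)
    {
            return dbt->brp * (dbt->phase_seg1 + 1);
    }

    static u32 tdc_new(const struct can_bittiming *dbt)
    {
            return dbt->brp * (dbt->prop_seg + dbt->phase_seg1);
    }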
Signed-off-by: Marek Vasut
Cc: Markus Marb
Cc: linux-stable
Signed-off-by: Marc Kleine-Budde
---
 drivers/net/can/ifi_canfd/ifi_canfd.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/can/ifi_canfd/ifi_canfd.c b/drivers/net/can/ifi_canfd/ifi_canfd.c
index 4d1fe8d95042..2772d05ff11c 100644
--- a/drivers/net/can/ifi_canfd/ifi_canfd.c
+++ b/drivers/net/can/ifi_canfd/ifi_canfd.c
@@ -670,9 +670,9 @@ static void ifi_canfd_set_bittiming(struct net_device *ndev)
               priv->base + IFI_CANFD_FTIME);

        /* Configure transmitter delay */
-       tdc = (dbt->brp * (dbt->phase_seg1 + 1)) & IFI_CANFD_TDELAY_MASK;
-       writel(IFI_CANFD_TDELAY_EN | IFI_CANFD_TDELAY_ABS | tdc,
-              priv->base + IFI_CANFD_TDELAY);
+       tdc = dbt->brp * (dbt->prop_seg + dbt->phase_seg1);
+       tdc &= IFI_CANFD_TDELAY_MASK;
+       writel(IFI_CANFD_TDELAY_EN | tdc, priv->base + IFI_CANFD_TDELAY);
 }

 static void ifi_canfd_set_filter(struct net_device *ndev, const u32 id,
--
cgit v1.2.3

From b0b38a1c6684b10dd0462bef4fef038917115012 Mon Sep 17 00:00:00 2001
From: Vivien Didelot
Date: Wed, 8 Nov 2017 10:49:56 -0500
Subject: net: dsa: return after mdb prepare phase

The current code does not return after successfully preparing the MDB
addition on every port that is a member of the multicast group. Fix
this.

Fixes: a1a6b7ea7f2d ("net: dsa: add cross-chip multicast support")
Reported-by: Egil Hjelmeland
Signed-off-by: Vivien Didelot
Reviewed-by: Andrew Lunn
Signed-off-by: David S. Miller
---
 net/dsa/switch.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/dsa/switch.c b/net/dsa/switch.c
index e6c06aa349a6..73746fa148f1 100644
--- a/net/dsa/switch.c
+++ b/net/dsa/switch.c
@@ -133,6 +133,8 @@ static int dsa_switch_mdb_add(struct dsa_switch *ds,
                        if (err)
                                return err;
                }
+
+               return 0;
        }

        for_each_set_bit(port, group, ds->num_ports)
--
cgit v1.2.3

From 2118df93b5e1742931da358c2b8804c46fd64490 Mon Sep 17 00:00:00 2001
From: Vivien Didelot
Date: Wed, 8 Nov 2017 10:50:10 -0500
Subject: net: dsa: return after vlan prepare phase

The current code does not return after successfully preparing the VLAN
addition on every port that is a member of it. Fix this.

Fixes: 1ca4aa9cd4cc ("net: dsa: check VLAN capability of every switch")
Signed-off-by: Vivien Didelot
Reviewed-by: Andrew Lunn
Signed-off-by: David S. Miller
---
 net/dsa/switch.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/dsa/switch.c b/net/dsa/switch.c
index 73746fa148f1..1e2929f4290a 100644
--- a/net/dsa/switch.c
+++ b/net/dsa/switch.c
@@ -182,6 +182,8 @@ static int dsa_switch_vlan_add(struct dsa_switch *ds,
                        if (err)
                                return err;
                }
+
+               return 0;
        }

        for_each_set_bit(port, members, ds->num_ports)
--
cgit v1.2.3

From 052d41c01b3a2e3371d66de569717353af489d63 Mon Sep 17 00:00:00 2001
From: Cong Wang
Date: Thu, 9 Nov 2017 16:43:13 -0800
Subject: vlan: fix a use-after-free in vlan_device_event()

After the refcnt reaches zero, vlan_vid_del() could free
dev->vlan_info via RCU:

    RCU_INIT_POINTER(dev->vlan_info, NULL);
    call_rcu(&vlan_info->rcu, vlan_info_rcu_free);

However, the pointer 'grp' still points to that memory, since it is
set before vlan_vid_del():

    vlan_info = rtnl_dereference(dev->vlan_info);
    if (!vlan_info)
            goto out;
    grp = &vlan_info->grp;

Depending on when that RCU callback is scheduled, we could trigger a
use-after-free in vlan_group_for_each_dev() right after this
vlan_vid_del(). Fix it by moving vlan_vid_del() before setting grp.
This is also symmetric to the vlan_vid_add() we call in
vlan_device_event().
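The ordering rule the fix restores, reduced to a generic RCU sketch
(hedged; 'struct info' and 'global_info' are invented stand-ins for
vlan_info and dev->vlan_info):

    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    struct info {
            struct rcu_head rcu;
            int grp;                    /* stand-in for vlan_info->grp */
    };

    static struct info __rcu *global_info;

    static void info_free_cb(struct rcu_head *head)
    {
            kfree(container_of(head, struct info, rcu));
    }

    /* Unpublish first, then defer the free past a grace period. */
    static void info_release(void)
    {
            struct info *old = rcu_dereference_protected(global_info, 1);

            RCU_INIT_POINTER(global_info, NULL);
            if (old)
                    call_rcu(&old->rcu, info_free_cb);
    }

    /* A caller that snapshots a pointer into the object and only then
     * calls info_release() keeps using memory that info_free_cb() may
     * reclaim once the grace period ends - the use-after-free above.
     * Releasing first and re-reading global_info afterwards (now NULL)
     * is the safe order, which is what the patch adopts.
     */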
Reported-by: Fengguang Wu
Fixes: efc73f4bbc23 ("net: Fix memory leak - vlan_info struct")
Cc: Alexander Duyck
Cc: Linus Torvalds
Cc: Girish Moodalbail
Signed-off-by: Cong Wang
Reviewed-by: Girish Moodalbail
Tested-by: Fengguang Wu
Signed-off-by: David S. Miller
---
 net/8021q/vlan.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 9649579b5b9f..4a72ee4e2ae9 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -376,6 +376,9 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event,
                         dev->name);
                vlan_vid_add(dev, htons(ETH_P_8021Q), 0);
        }
+       if (event == NETDEV_DOWN &&
+           (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER))
+               vlan_vid_del(dev, htons(ETH_P_8021Q), 0);

        vlan_info = rtnl_dereference(dev->vlan_info);
        if (!vlan_info)
@@ -423,9 +426,6 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event,
                struct net_device *tmp;
                LIST_HEAD(close_list);

-               if (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER)
-                       vlan_vid_del(dev, htons(ETH_P_8021Q), 0);
-
                /* Put all VLANs for this dev in the down state too. */
                vlan_group_for_each_dev(grp, i, vlandev) {
                        flgs = vlandev->flags;
--
cgit v1.2.3