summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2018-06-15l2tp: prevent pppol2tp_connect() from creating kernel socketsGuillaume Nault1-0/+9
If 'fd' is negative, l2tp_tunnel_create() creates a tunnel socket using the configuration passed in 'tcfg'. Currently, pppol2tp_connect() sets the relevant fields to zero, tricking l2tp_tunnel_create() into setting up an unusable kernel socket. We can't set 'tcfg' with the required fields because there's no way to get them from the current connect() parameters. So let's restrict kernel sockets creation to the netlink API, which is the original use case. Fixes: 789a4a2c61d8 ("l2tp: Add support for static unmanaged L2TPv3 tunnels") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-15l2tp: only accept PPP sessions in pppol2tp_connect()Guillaume Nault1-0/+6
l2tp_session_priv() returns a struct pppol2tp_session pointer only for PPPoL2TP sessions. In particular, if the session is an L2TP_PWTYPE_ETH pseudo-wire, l2tp_session_priv() returns a pointer to an l2tp_eth_sess structure, which is much smaller than struct pppol2tp_session. This leads to invalid memory dereference when trying to lock ps->sk_lock. Fixes: d9e31d17ceba ("l2tp: Add L2TP ethernet pseudowire support") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-15l2tp: fix pseudo-wire type for sessions created by pppol2tp_connect()Guillaume Nault1-0/+1
Define cfg.pw_type so that the new session is created with its .pwtype field properly set (L2TP_PWTYPE_PPP). Not setting the pseudo-wire type had several annoying effects: * Invalid value returned in the L2TP_ATTR_PW_TYPE attribute when dumping sessions with the netlink API. * Impossibility to delete the session using the netlink API (because l2tp_nl_cmd_session_delete() gets the deletion callback function from an array indexed by the session's pseudo-wire type). Also, there are several cases where we should check a session's pseudo-wire type. For example, pppol2tp_connect() should refuse to connect a session that is not PPPoL2TP, but that requires the session's .pwtype field to be properly set. Fixes: f7faffa3ff8e ("l2tp: Add L2TPv3 protocol support") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-15Merge branch 'emaclite-fixes'David S. Miller1-8/+4
Radhey Shyam Pandey says: ==================== emaclite bug fixes and code cleanup This patch series fixes bug in emaclite remove and mdio_setup routines. It does minor code cleanup. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-15net: emaclite: Remove xemaclite_mdio_setup return checkRadhey Shyam Pandey1-3/+1
Errors are already reported in xemaclite_mdio_setup so avoid reporting it again. Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com> Signed-off-by: Michal Simek <michal.simek@xilinx.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-15net: emaclite: Remove unused 'has_mdio' flag.Radhey Shyam Pandey1-2/+0
Remove unused 'has_mdio' flag. Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com> Signed-off-by: Michal Simek <michal.simek@xilinx.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-15net: emaclite: Fix MDIO bus unregister bugRadhey Shyam Pandey1-1/+1
Since 'has_mdio' flag is not used,sequence insmod->rmmod-> insmod leads to failure as MDIO unregister doesn't happen in .remove(). Fix it by checking MII bus pointer instead. Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com> Signed-off-by: Michal Simek <michal.simek@xilinx.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-15net: emaclite: Fix position of lp->mii_bus assignmentRadhey Shyam Pandey1-2/+2
To ensure MDIO bus is not double freed in remove() path assign lp->mii_bus after MDIO bus registration. Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com> Signed-off-by: Michal Simek <michal.simek@xilinx.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-15tcp: verify the checksum of the first data segment in a new connectionFrank van der Linden2-0/+8
commit 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") introduced an optimization for the handling of child sockets created for a new TCP connection. But this optimization passes any data associated with the last ACK of the connection handshake up the stack without verifying its checksum, because it calls tcp_child_process(), which in turn calls tcp_rcv_state_process() directly. These lower-level processing functions do not do any checksum verification. Insert a tcp_checksum_complete call in the TCP_NEW_SYN_RECEIVE path to fix this. Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Frank van der Linden <fllinden@amazon.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-15net: qcom/emac: Add missing of_node_put()YueHaibing1-0/+1
Add missing of_node_put() call for device node returned by of_parse_phandle(). Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Timur Tabi <timur@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-14sctp: define sctp_packet_gso_append to build GSO framesXin Long2-10/+23
Now sctp GSO uses skb_gro_receive() to append the data into head skb frag_list. However it actually only needs very few code from skb_gro_receive(). Besides, NAPI_GRO_CB has to be set while most of its members are not needed here. This patch is to add sctp_packet_gso_append() to build GSO frames instead of skb_gro_receive(), and it would avoid many unnecessary checks and make the code clearer. Note that sctp will use page frags instead of frag_list to build GSO frames in another patch. But it may take time, as sctp's GSO frames may have different size. skb_segment() can only split it into the frags with the same size, which would break the border of sctp chunks. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller13-23/+52
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter patches for your net tree: 1) Fix NULL pointer dereference from nf_nat_decode_session() if NAT is not loaded, from Prashant Bhole. 2) Fix socket extension module autoload. 3) Don't bogusly reject sets with the NFT_SET_EVAL flag set on from the dynset extension. 4) Fix races with nf_tables module removal and netns exit path, patches from Florian Westphal. 5) Don't hit BUG_ON if jumpstack goes too deep, instead hit WARN_ON_ONCE, from Taehee Yoo. 6) Another NULL pointer dereference from ctnetlink, again if NAT is not loaded, from Florian Westphal. 7) Fix x_tables match list corruption in xt_connmark module removal path, also from Florian. 8) nf_conncount doesn't properly deal with conntrack zones, hence garbage collector may get rid of entries in a different zone. From Yi-Hung Wei. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-13xen/netfront: raise max number of slots in xennet_get_responses()Juergen Gross1-2/+2
The max number of slots used in xennet_get_responses() is set to MAX_SKB_FRAGS + (rx->status <= RX_COPY_THRESHOLD). In old kernel-xen MAX_SKB_FRAGS was 18, while nowadays it is 17. This difference is resulting in frequent messages "too many slots" and a reduced network throughput for some workloads (factor 10 below that of a kernel-xen based guest). Replacing MAX_SKB_FRAGS by XEN_NETIF_NR_SLOTS_MIN for calculation of the max number of slots to use solves that problem (tests showed no more messages "too many slots" and throughput was as high as with the kernel-xen based guest system). Replace MAX_SKB_FRAGS-2 by XEN_NETIF_NR_SLOTS_MIN-1 in netfront_tx_slot_available() for making it clearer what is really being tested without actually modifying the tested value. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-13smc: convert to ->poll_maskCong Wang1-9/+3
smc->clcsock is an internal TCP socket, after TCP socket converts to ->poll_mask, ->poll doesn't exist any more. So just convert smc socket to ->poll_mask too. Fixes: 2c7d3dacebd4 ("net/tcp: convert to ->poll_mask") Reported-by: syzbot+f5066e369b2d5fff630f@syzkaller.appspotmail.com Cc: Christoph Hellwig <hch@lst.de> Cc: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-13net: stmmac: dwmac-meson8b: Fix an error handling path in ↵Christophe JAILLET1-3/+4
'meson8b_dwmac_probe()' If 'of_device_get_match_data()' fails, we need to release some resources as done in the other error handling path of this function. Fixes: efacb568c962 ("net: stmmac: dwmac-meson: extend phy mode setting") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-13tc-testing: ife: fix wrong teardown command in test b7b8Davide Caratti1-1/+1
fix failures in the 'teardown' stage of test b7b8, probably a leftover of commit 7c5995b33d6e ("tc-testing: fixed copy-pasting error in ife tests") Fixes: a56e6bcd34b55 ("tc-testing: updated ife test cases") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-13net: thunderx: prevent concurrent data re-writing by nicvf_set_rx_modeVadim Lomovtsev2-14/+38
For each network interface linux network stack issue ndo_set_rx_mode call in order to configure MAC address filters (e.g. for multicast filtering). Currently ThunderX NICVF driver has only one ordered workqueue to process such requests for all VFs. And because of that it is possible that subsequent call to ndo_set_rx_mode would corrupt data which is currently in use by nicvf_set_rx_mode_task. Which in turn could cause following issue: [...] [ 48.978341] Unable to handle kernel paging request at virtual address 1fffff0000000000 [ 48.986275] Mem abort info: [ 48.989058] Exception class = DABT (current EL), IL = 32 bits [ 48.994965] SET = 0, FnV = 0 [ 48.998020] EA = 0, S1PTW = 0 [ 49.001152] Data abort info: [ 49.004022] ISV = 0, ISS = 0x00000004 [ 49.007869] CM = 0, WnR = 0 [ 49.010826] [1fffff0000000000] address between user and kernel address ranges [ 49.017963] Internal error: Oops: 96000004 [#1] SMP [...] [ 49.072138] task: ffff800fdd675400 task.stack: ffff000026440000 [ 49.078051] PC is at prefetch_freepointer.isra.37+0x28/0x3c [ 49.083613] LR is at kmem_cache_alloc_trace+0xc8/0x1fc [...] [ 49.272684] [<ffff0000082738f0>] prefetch_freepointer.isra.37+0x28/0x3c [ 49.279286] [<ffff000008276bc8>] kmem_cache_alloc_trace+0xc8/0x1fc [ 49.285455] [<ffff0000082c0c0c>] alloc_fdtable+0x78/0x134 [ 49.290841] [<ffff0000082c15c0>] dup_fd+0x254/0x2f4 [ 49.295709] [<ffff0000080d1954>] copy_process.isra.38.part.39+0x64c/0x1168 [ 49.302572] [<ffff0000080d264c>] _do_fork+0xfc/0x3b0 [ 49.307524] [<ffff0000080d29e8>] SyS_clone+0x44/0x50 [...] This patch is to prevent such concurrent data write with spinlock. Reported-by: Dean Nelson <dnelson@redhat.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-13net: phy: mdio-gpio: Cut surplus includesLinus Walleij1-3/+0
The GPIO MDIO driver now needs only <linux/gpio/consumer.h> so cut the legacy <linux/gpio.h> and <linux/of_gpio.h> includes that are no longer used. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-13Merge branch 'hv_netvsc-notification-and-namespace-fixes'David S. Miller3-63/+184
Stephen Hemminger says: ==================== hv_netvsc: notification and namespace fixes This set of patches addresses two set of fixes. First it backs out the common callback model which was merged in net-next without completing all the review feedback or getting maintainer approval. Then it fixes the transparent VF management code to handle network namespaces. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-13hv_netvsc: move VF to same namespace as netvsc deviceStephen Hemminger1-1/+20
When VF is added, the paravirtual device is already present and may have been moved to another network namespace. For example, sometimes the management interface is put in another net namespace in some environments. The VF should get moved to where the netvsc device is when the VF is discovered. The user can move it later (if desired). Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-13hv_netvsc: fix network namespace issues with VF supportStephen Hemminger2-23/+22
When finding the parent netvsc device, the search needs to be across all netvsc device instances (independent of network namespace). Find parent device of VF using upper_dev_get routine which searches only adjacent list. Fixes: e8ff40d4bff1 ("hv_netvsc: improve VF device matching") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> netns aware byref Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-13hv_netvsc: drop common code until callback model fixedStephen Hemminger3-62/+165
The callback model of handling network failover is not suitable in the current form. 1. It was merged without addressing all the review feedback. 2. It was merged without approval of any of the netvsc maintainers. 3. Design discussion on how to handle PV/VF fallback is still not complete. 4. IMHO the code model using callbacks is trying to make something common which isn't. Revert the netvsc specific changes for now. Does not impact ongoing development of failover model for virtio. Revisit this after a simpler library based failover kernel routines are extracted. This reverts commit 9c6ffbacdb57 ("hv_netvsc: fix error return code in netvsc_probe()") and commit 1ff78076d8dd ("netvsc: refactor notifier/event handling code to use the failover framework") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-13Merge branch 'nfp-fixes'David S. Miller5-7/+11
Jakub Kicinski says: ==================== nfp: fix a warning, stats, naming and route leak Various fixes for the NFP. Patch 1 fixes a harmless GCC 8 warning. Patch 2 ensures statistics are correct after users decrease the number of channels/rings. Patch 3 restores phy_port_name behaviour for flower, ndo_get_phy_port_name used to return -EOPNOTSUPP on one of the netdevs, and we need to keep it that way otherwise interface names may change. Patch 4 fixes refcnt leak in flower tunnel offload code. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-13nfp: flower: free dst_entry in route tablePieter Jansen van Vuuren1-0/+2
We need to release the refcnt on dst_entry in the route table, otherwise we will leak the route. Fixes: 8e6a9046b66a ("nfp: flower vxlan neighbour offload") Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Signed-off-by: Louis Peens <louis.peens@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-13nfp: remove phys_port_name on flower's vNICJakub Kicinski3-1/+6
.ndo_get_phys_port_name was recently extended to support multi-vNIC FWs. These are firmwares which can have more than one vNIC per PF without associated port (e.g. Adaptive Buffer Management FW), therefore we need a way of distinguishing the vNICs. Unfortunately, it's too late to make flower use the same naming. Flower users may depend on .ndo_get_phys_port_name returning -EOPNOTSUPP, for example the name udev gave the PF vNIC was just the bare PCI device-based name before the change, and will have 'nn0' appended after. To ensure flower's vNIC doesn't have phys_port_name attribute, add a flag to vNIC struct and set it in flower code. New projects will not set the flag adhere to the naming scheme from the start. Fixes: 51c1df83e35c ("nfp: assign vNIC id as phys_port_name of vNICs which are not ports") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-13nfp: include all ring counters in interface statsJakub Kicinski1-1/+1
We are gathering software statistics on per-ring basis. .ndo_get_stats64 handler adds the rings up. Unfortunately we are currently only adding up active rings, which means that if user decreases the number of active rings the statistics from deactivated rings will no longer be counted and total interface statistics may go backwards. Always sum all possible rings, the stats are allocated statically for max number of rings, so we don't have to worry about them being removed. We could add the stats up when user changes the ring count, but it seems unnecessary.. Adding up inactive rings will be very quick since no datapath will be touching them. Fixes: 164d1e9e5d52 ("nfp: add support for ethtool .set_channels") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-13nfp: don't pad strings in nfp_cpp_resource_find() to avoid gcc 8 warningJakub Kicinski1-5/+2
Once upon a time nfp_cpp_resource_find() took a name parameter, which could be any user-chosen string. Resources are identified by a CRC32 hash of a 8 byte string, so we had to pad user input with zeros to make sure CRC32 gave the correct result. Since then nfp_cpp_resource_find() was made to operate on allocated resources only (struct nfp_resource). We kzalloc those so there is no need to pad the strings and use memcmp. This avoids a GCC 8 stringop-truncation warning: In function ‘nfp_cpp_resource_find’, inlined from ‘nfp_resource_try_acquire’ at .../nfpcore/nfp_resource.c:153:8, inlined from ‘nfp_resource_acquire’ at .../nfpcore/nfp_resource.c:206:9: .../nfpcore/nfp_resource.c:108:2: warning: strncpy’ output may be truncated copying 8 bytes from a string of length 8 [-Wstringop-truncation] strncpy(name_pad, res->name, sizeof(name_pad)); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-12Revert "net: do not allow changing SO_REUSEADDR/SO_REUSEPORT on bound sockets"Bart Van Assche1-14/+1
Revert the patch mentioned in the subject because it breaks at least the Avahi mDNS daemon. That patch namely causes the Ubuntu 18.04 Avahi daemon to fail to start: Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Successfully called chroot(). Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Successfully dropped remaining capabilities. Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: No service file found in /etc/avahi/services. Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: SO_REUSEADDR failed: Structure needs cleaning Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: SO_REUSEADDR failed: Structure needs cleaning Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Failed to create server: No suitable network protocol available Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: avahi-daemon 0.7 exiting. Jun 12 09:49:24 ubuntu-vm systemd[1]: avahi-daemon.service: Main process exited, code=exited, status=255/n/a Jun 12 09:49:24 ubuntu-vm systemd[1]: avahi-daemon.service: Failed with result 'exit-code'. Jun 12 09:49:24 ubuntu-vm systemd[1]: Failed to start Avahi mDNS/DNS-SD Stack. Fixes: f396922d862a ("net: do not allow changing SO_REUSEADDR/SO_REUSEPORT on bound sockets") Cc: Maciej Żenczykowski <maze@google.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-12netfilter: nf_conncount: Fix garbage collection with zonesYi-Hung Wei3-6/+12
Currently, we use check_hlist() for garbage colleciton. However, we use the ‘zone’ from the counted entry to query the existence of existing entries in the hlist. This could be wrong when they are in different zones, and this patch fixes this issue. Fixes: e59ea3df3fc2 ("netfilter: xt_connlimit: honor conntrack zone if available") Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-12netfilter: xt_connmark: fix list corruption on rmmodFlorian Westphal1-1/+1
This needs to use xt_unregister_targets, else new revision is left on the list which then causes list to point to a target struct that has been free'd. Fixes: 472a73e00757 ("netfilter: xt_conntrack: Support bit-shifting for CONNMARK & MARK targets.") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-12netfilter: ctnetlink: avoid null pointer dereferenceFlorian Westphal1-1/+2
Dan Carpenter points out that deref occurs after NULL check, we should re-fetch the pointer and check that instead. Fixes: 2c205dd3981f7 ("netfilter: add struct nf_nat_hook and use it") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-12netfilter: nf_tables: use WARN_ON_ONCE instead of BUG_ON in nft_do_chain()Taehee Yoo1-1/+2
When depth of chain is bigger than NFT_JUMP_STACK_SIZE, the nft_do_chain crashes. But there is no need to crash hard here. Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-12netfilter: nf_tables: close race between netns exit and rmmodFlorian Westphal2-3/+15
If net namespace is exiting while nf_tables module is being removed we can oops: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040 IP: nf_tables_flowtable_event+0x43/0xf0 [nf_tables] PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI Modules linked in: nf_tables(-) nfnetlink [..] unregister_netdevice_notifier+0xdd/0x130 nf_tables_module_exit+0x24/0x3a [nf_tables] SyS_delete_module+0x1c5/0x240 do_syscall_64+0x74/0x190 Avoid this by attempting to take reference on the net namespace from the notifiers. If it fails the namespace is exiting already, and nft core is taking care of cleanup work. We also need to make sure the netdev hook type gets removed before netns ops removal, else notifier might be invoked with device event for a netns where net->nft was never initialised (because pernet ops was removed beforehand). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-12netfilter: nf_tables: fix module unload raceFlorian Westphal2-6/+16
We must first remove the nfnetlink protocol handler when nf_tables module is unloaded -- we don't want userspace to submit new change requests once we've started to tear down nft state. Furthermore, nfnetlink must not call any subsystem function after call_batch returned -EAGAIN. EAGAIN means the subsys mutex was dropped, so its unlikely but possible that nf_tables subsystem was removed due to 'rmmod nf_tables' on another cpu. Therefore, we must abort batch completely and not move on to next part of the batch. Last, we can't invoke ->abort unless we've checked that the subsystem is still registered. Change netns exit path of nf_tables to make sure any incompleted transaction gets removed on exit. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-12netfilter: nft_dynset: do not reject set updates with NFT_SET_EVALPablo Neira Ayuso2-4/+2
NFT_SET_EVAL is signalling the kernel that this sets can be updated from the evaluation path, even if there are no expressions attached to the element. Otherwise, set updates with no expressions fail. Update description to describe the right semantics. Fixes: 22fe54d5fefc ("netfilter: nf_tables: add support for dynamic set updates") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-12netfilter: nft_socket: fix module autoloadPablo Neira Ayuso1-0/+1
Add alias definition for module autoload when adding socket rules. Fixes: 554ced0a6e29 ("netfilter: nf_tables: add support for native socket matching") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-12netfilter: fix null-ptr-deref in nf_nat_decode_sessionPrashant Bhole1-1/+1
Add null check for nat_hook in nf_nat_decode_session() [ 195.648098] UBSAN: Undefined behaviour in ./include/linux/netfilter.h:348:14 [ 195.651366] BUG: KASAN: null-ptr-deref in __xfrm_policy_check+0x208/0x1d70 [ 195.653888] member access within null pointer of type 'struct nf_nat_hook' [ 195.653896] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.17.0-rc6+ #5 [ 195.656320] Read of size 8 at addr 0000000000000008 by task ping/2469 [ 195.658715] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 195.658721] Call Trace: [ 195.661087] [ 195.669341] <IRQ> [ 195.670574] dump_stack+0xc6/0x150 [ 195.672156] ? dump_stack_print_info.cold.0+0x1b/0x1b [ 195.674121] ? ubsan_prologue+0x31/0x92 [ 195.676546] ubsan_epilogue+0x9/0x49 [ 195.678159] handle_null_ptr_deref+0x11a/0x130 [ 195.679800] ? sprint_OID+0x1a0/0x1a0 [ 195.681322] __ubsan_handle_type_mismatch_v1+0xd5/0x11d [ 195.683146] ? ubsan_prologue+0x92/0x92 [ 195.684642] __xfrm_policy_check+0x18ef/0x1d70 [ 195.686294] ? rt_cache_valid+0x118/0x180 [ 195.687804] ? __xfrm_route_forward+0x410/0x410 [ 195.689463] ? fib_multipath_hash+0x700/0x700 [ 195.691109] ? kvm_sched_clock_read+0x23/0x40 [ 195.692805] ? pvclock_clocksource_read+0xf6/0x280 [ 195.694409] ? graph_lock+0xa0/0xa0 [ 195.695824] ? pvclock_clocksource_read+0xf6/0x280 [ 195.697508] ? pvclock_read_flags+0x80/0x80 [ 195.698981] ? kvm_sched_clock_read+0x23/0x40 [ 195.700347] ? sched_clock+0x5/0x10 [ 195.701525] ? sched_clock_cpu+0x18/0x1a0 [ 195.702846] tcp_v4_rcv+0x1d32/0x1de0 [ 195.704115] ? lock_repin_lock+0x70/0x270 [ 195.707072] ? pvclock_read_flags+0x80/0x80 [ 195.709302] ? tcp_v4_early_demux+0x4b0/0x4b0 [ 195.711833] ? lock_acquire+0x195/0x380 [ 195.714222] ? ip_local_deliver_finish+0xfc/0x770 [ 195.716967] ? raw_rcv+0x2b0/0x2b0 [ 195.718856] ? lock_release+0xa00/0xa00 [ 195.720938] ip_local_deliver_finish+0x1b9/0x770 [...] Fixes: 2c205dd3981f ("netfilter: add struct nf_nat_hook and use it") Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-12tcp: Do not reload skb pointer after skb_gro_receive().David Miller1-2/+0
This is not necessary. skb_gro_receive() will never change what 'head' points to. In it's original implementation (see commit 71d93b39e52e ("net: Add skb_gro_receive")), it did: ==================== + *head = nskb; + nskb->next = p->next; + p->next = NULL; ==================== This sequence was removed in commit 58025e46ea2d ("net: gro: remove obsolete code from skb_gro_receive()") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Eric Dumazet <edumazet@google.com>
2018-06-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller3-6/+15
Daniel Borkmann says: ==================== pull-request: bpf 2018-06-12 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Avoid an allocation warning in AF_XDP by adding __GFP_NOWARN for the umem setup, from Björn. 2) Silence a warning in bpf fs when an application tries to open(2) a pinned bpf obj due to missing fops. Add a dummy open fop that continues to just bail out in such case, from Daniel. 3) Fix a BPF selftest urandom_read build issue where gcc complains that it gets built twice, from Anders. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-12Merge branch '10GbE' of ↵David S. Miller5-25/+48
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2018-06-11 This series contains fixes to ixgbe IPsec and MACVLAN. Alex provides the 5 fixes in this series, starting with fixing an issue where num_rx_pools was not being populated until after the queues and interrupts were reinitialized when enabling MACVLAN interfaces. Updated to use CONFIG_XFRM_OFFLOAD instead of CONFIG_XFRM, since the code requires CONFIG_XFRM_OFFLOAD to be enabled. Moved the IPsec initialization function to be more consistent with the placement of similar initialization functions and before the call to reset the hardware, which will clean up any link issues that may have been introduced. Fixed the boolean logic that was testing for transmit OR receive ready bits, when it should have been testing for transmit AND receive ready bits. Fixed the bit definitions for SECTXSTAT and SECRXSTAT registers and ensure that if IPsec is disabled on the part, do not enable it. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-12net/ipv6: Ensure cfg is properly initialized in ipv6_create_tempaddrDavid Ahern1-1/+1
Valdis reported a BUG in ipv6_add_addr: [ 1820.832682] BUG: unable to handle kernel NULL pointer dereference at 0000000000000209 [ 1820.832728] RIP: 0010:ipv6_add_addr+0x280/0xd10 [ 1820.832732] Code: 49 8b 1f 0f 84 6a 0a 00 00 48 85 db 0f 84 4e 0a 00 00 48 8b 03 48 8b 53 08 49 89 45 00 49 8b 47 10 49 89 55 08 48 85 c0 74 15 <48> 8b 50 08 48 8b 00 49 89 95 b8 01 00 00 49 89 85 b0 01 00 00 4c [ 1820.832847] RSP: 0018:ffffaa07c2fd7880 EFLAGS: 00010202 [ 1820.832853] RAX: 0000000000000201 RBX: ffffaa07c2fd79b0 RCX: 0000000000000000 [ 1820.832858] RDX: a4cfbfba2cbfa64c RSI: 0000000000000000 RDI: ffffffff8a8e9fa0 [ 1820.832862] RBP: ffffaa07c2fd7920 R08: 000000000000017a R09: ffffffff8a555300 [ 1820.832866] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888d18e71c00 [ 1820.832871] R13: ffff888d0a9b1200 R14: 0000000000000000 R15: ffffaa07c2fd7980 [ 1820.832876] FS: 00007faa51bdb800(0000) GS:ffff888d1d400000(0000) knlGS:0000000000000000 [ 1820.832880] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1820.832885] CR2: 0000000000000209 CR3: 000000021e8f8001 CR4: 00000000001606e0 [ 1820.832888] Call Trace: [ 1820.832898] ? __local_bh_enable_ip+0x119/0x260 [ 1820.832904] ? ipv6_create_tempaddr+0x259/0x5a0 [ 1820.832912] ? __local_bh_enable_ip+0x139/0x260 [ 1820.832921] ipv6_create_tempaddr+0x2da/0x5a0 [ 1820.832926] ? ipv6_create_tempaddr+0x2da/0x5a0 [ 1820.832941] manage_tempaddrs+0x1a5/0x240 [ 1820.832951] inet6_addr_del+0x20b/0x3b0 [ 1820.832959] ? nla_parse+0xce/0x1e0 [ 1820.832968] inet6_rtm_deladdr+0xd9/0x210 [ 1820.832981] rtnetlink_rcv_msg+0x1d4/0x5f0 Looking at the code I found 1 element (peer_pfx) of the newly introduced ifa6_config struct that is not initialized. Use a memset rather than hard coding an init for each struct element. Reported-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Fixes: e6464b8c63619 ("net/ipv6: Convert ipv6_add_addr to struct ifa6_config") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-12tls: fix NULL pointer dereference on pollDaniel Borkmann3-15/+12
While hacking on kTLS, I ran into the following panic from an unprivileged netserver / netperf TCP session: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 PGD 800000037f378067 P4D 800000037f378067 PUD 3c0e61067 PMD 0 Oops: 0010 [#1] SMP KASAN PTI CPU: 1 PID: 2289 Comm: netserver Not tainted 4.17.0+ #139 Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET47W (1.21 ) 11/28/2016 RIP: 0010: (null) Code: Bad RIP value. RSP: 0018:ffff88036abcf740 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffff88036f5f6800 RCX: 1ffff1006debed26 RDX: ffff88036abcf920 RSI: ffff8803cb1a4f00 RDI: ffff8803c258c280 RBP: ffff8803c258c280 R08: ffff8803c258c280 R09: ffffed006f559d48 R10: ffff88037aacea43 R11: ffffed006f559d49 R12: ffff8803c258c280 R13: ffff8803cb1a4f20 R14: 00000000000000db R15: ffffffffc168a350 FS: 00007f7e631f4700(0000) GS:ffff8803d1c80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 00000003ccf64005 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? tls_sw_poll+0xa4/0x160 [tls] ? sock_poll+0x20a/0x680 ? do_select+0x77b/0x11a0 ? poll_schedule_timeout.constprop.12+0x130/0x130 ? pick_link+0xb00/0xb00 ? read_word_at_a_time+0x13/0x20 ? vfs_poll+0x270/0x270 ? deref_stack_reg+0xad/0xe0 ? __read_once_size_nocheck.constprop.6+0x10/0x10 [...] Debugging further, it turns out that calling into ctx->sk_poll() is invalid since sk_poll itself is NULL which was saved from the original TCP socket in order for tls_sw_poll() to invoke it. Looks like the recent conversion from poll to poll_mask callback started in 152524231023 ("net: add support for ->poll_mask in proto_ops") missed to eventually convert kTLS, too: TCP's ->poll was converted over to the ->poll_mask in commit 2c7d3dacebd4 ("net/tcp: convert to ->poll_mask") and therefore kTLS wrongly saved the ->poll old one which is now NULL. Convert kTLS over to use ->poll_mask instead. Also instead of POLLIN | POLLRDNORM use the proper EPOLLIN | EPOLLRDNORM bits as the case in tcp_poll_mask() as well that is mangled here. Fixes: 2c7d3dacebd4 ("net/tcp: convert to ->poll_mask") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Watson <davejwatson@fb.com> Tested-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-12xsk: silence warning on memory allocation failureBjörn Töpel1-1/+2
syzkaller reported a warning from xdp_umem_pin_pages(): WARNING: CPU: 1 PID: 4537 at mm/slab_common.c:996 kmalloc_slab+0x56/0x70 mm/slab_common.c:996 ... __do_kmalloc mm/slab.c:3713 [inline] __kmalloc+0x25/0x760 mm/slab.c:3727 kmalloc_array include/linux/slab.h:634 [inline] kcalloc include/linux/slab.h:645 [inline] xdp_umem_pin_pages net/xdp/xdp_umem.c:205 [inline] xdp_umem_reg net/xdp/xdp_umem.c:318 [inline] xdp_umem_create+0x5c9/0x10f0 net/xdp/xdp_umem.c:349 xsk_setsockopt+0x443/0x550 net/xdp/xsk.c:531 __sys_setsockopt+0x1bd/0x390 net/socket.c:1935 __do_sys_setsockopt net/socket.c:1946 [inline] __se_sys_setsockopt net/socket.c:1943 [inline] __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1943 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x49/0xbe This is a warning about attempting to allocate more than KMALLOC_MAX_SIZE memory. The request originates from userspace, and if the request is too big, the kernel is free to deny its allocation. In this patch, the failed allocation attempt is silenced with __GFP_NOWARN. Fixes: c0c77d8fb787 ("xsk: add user memory registration support sockopt") Reported-by: syzbot+4abadc5d69117b346506@syzkaller.appspotmail.com Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-06-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller15-36/+99
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for your net tree: 1) Reject non-null terminated helper names from xt_CT, from Gao Feng. 2) Fix KASAN splat due to out-of-bound access from commit phase, from Alexey Kodanev. 3) Missing conntrack hook registration on IPVS FTP helper, from Julian Anastasov. 4) Incorrect skbuff allocation size in bridge nft_reject, from Taehee Yoo. 5) Fix inverted check on packet xmit to non-local addresses, also from Julian. 6) Fix ebtables alignment compat problems, from Alin Nastac. 7) Hook mask checks are not correct in xt_set, from Serhey Popovych. 8) Fix timeout listing of element in ipsets, from Jozsef. 9) Cap maximum timeout value in ipset, also from Jozsef. 10) Don't allow family option for hash:mac sets, from Florent Fourcot. 11) Restrict ebtables to work with NFPROTO_BRIDGE targets only, this Florian. 12) Another bug reported by KASAN in the rbtree set backend, from Taehee Yoo. 13) Missing __IPS_MAX_BIT update doesn't include IPS_OFFLOAD_BIT. From Gao Feng. 14) Missing initialization of match/target in ebtables, from Florian Westphal. 15) Remove useless nft_dup.h file in include path, from C. Labbe. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-12net: dsa: add error handling for pskb_trim_rcsumZhouyang Jia1-1/+2
When pskb_trim_rcsum fails, the lack of error-handling code may cause unexpected results. This patch adds error-handling code after calling pskb_trim_rcsum. Signed-off-by: Zhouyang Jia <jiazhouyang09@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-12ipv6: allow PMTU exceptions to local routesJulian Anastasov1-3/+0
IPVS setups with local client and remote tunnel server need to create exception for the local virtual IP. What we do is to change PMTU from 64KB (on "lo") to 1460 in the common case. Suggested-by: Martin KaFai Lau <kafai@fb.com> Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception") Fixes: 7343ff31ebf0 ("ipv6: Don't create clones of host routes.") Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: David Ahern <dsahern@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-11ixgbe: Fix bit definitions and add support for testing for ipsec supportAlexander Duyck2-3/+17
This patch addresses two issues. First it adds the correct bit definitions for the SECTXSTAT and SECRXSTAT registers. Then it makes use of those definitions to test for if IPsec has been disabled on the part and if so we do not enable it. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Reported-by: Andre Tomt <andre@tomt.net> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-06-11ixgbe: Avoid loopback and fix boolean logic in ipsec_stop_dataAlexander Duyck1-2/+11
This patch fixes two issues. First we add an early test for the Tx and Rx security block ready bits. By doing this we can avoid the need for waits or loopback in the event that the security block is already flushed out. Secondly we fix the boolean logic that was testing for the Tx OR Rx ready bits being set and change it so that we only exit if the Tx AND Rx ready bits are both set. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-06-11ixgbe: Move ipsec init function to before reset callAlexander Duyck2-9/+9
This patch moves the IPsec init function in ixgbe_sw_init. This way it is a bit more consistent with the placement of similar initialization functions and is placed before the reset_hw call which should allow us to clean up any link issues that may be introduced by the fact that we force the link up if somehow the device had IPsec still enabled before the driver was loaded. In addition to the function move it is necessary to change the assignment of netdev->features. The easiest way to do this is to just test for the existence of adapter->ipsec and if it is present we set the feature bits. Fixes: 49a94d74d948 ("ixgbe: add ipsec engine start and stop routines") Reported-by: Andre Tomt <andre@tomt.net> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-06-11ixgbe: Use CONFIG_XFRM_OFFLOAD instead of CONFIG_XFRMAlexander Duyck2-3/+3
There is no point in adding code if CONFIG_XFRM is defined that we won't use unless CONFIG_XFRM_OFFLOAD is defined. So instead of leaving this code floating around I am replacing the ifdef with what I believe is the correct one so that we only include the code and variables if they will actually be used. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>