summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2018-06-05net: skbuff.h: drop unneeded <linux/slab.h>Randy Dunlap1-2/+0
<linux/skbuff.h> does not use nor need <linux/slab.h>, so drop this header file from skbuff.h. <linux/skbuff.h> is currently #included in around 1200 C source and header files, making it the 31st most-used header file. Build tested [allmodconfig] on 20 arch-es. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-03vlan: use non-archaic spelling of failesThadeu Lima de Souza Cascardo1-2/+2
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-3/+3
Filling in the padding slot in the bpf structure as a bug fix in 'ne' overlapped with actually using that padding area for something in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-02Merge tag 'staging-4.17-rc8' of ↵Linus Torvalds1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull IIO driver fixes from Greg KH: "Here are some old IIO driver fixes that were sitting in my tree for a few weeks. Sorry about not getting them to you sooner. They fix a number of small IIO driver issues that have been reported. All of these have been in linux-next for a while with no reported problems" * tag 'staging-4.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: iio: adc: select buffer for at91-sama5d2_adc iio: hid-sensor-trigger: Fix sometimes not powering up the sensor after resume iio: adc: at91-sama5d2_adc: fix channel configuration for differential channels iio:kfifo_buf: check for uint overflow iio:buffer: make length types match kfifo types iio: adc: stm32-dfsdm: fix sample rate for div2 spi clock iio: adc: stm32-dfsdm: fix successive oversampling settings iio: ad7793: implement IIO_CHAN_INFO_SAMP_FREQ
2018-06-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller1-0/+1
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for your net-next tree, the most relevant things in this batch are: 1) Compile masquerade infrastructure into NAT module, from Florian Westphal. Same thing with the redirection support. 2) Abort transaction if early initialization of the commit phase fails. Also from Florian. 3) Get rid of synchronize_rcu() by using rule array in nf_tables, from Florian. 4) Abort nf_tables batch if fatal signal is pending, from Florian. 5) Use .call_rcu nfnetlink from nf_tables to make dumps fully lockless. From Florian Westphal. 6) Support to match transparent sockets from nf_tables, from Máté Eckl. 7) Audit support for nf_tables, from Phil Sutter. 8) Validate chain dependencies from commit phase, fall back to fine grain validation only in case of errors. 9) Attach dst to skbuff from netfilter flowtable packet path, from Jason A. Donenfeld. 10) Use artificial maximum attribute cap to remove VLA from nfnetlink. Patch from Kees Cook. 11) Add extension to allow to forward packets through neighbour layer. 12) Add IPv6 conntrack helper support to IPVS, from Julian Anastasov. 13) Add IPv6 FTP conntrack support to IPVS, from Julian Anastasov. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01netfilter: nf_tables: fix chain dependency validationPablo Neira Ayuso1-0/+1
The following ruleset: add table ip filter add chain ip filter input { type filter hook input priority 4; } add chain ip filter ap add rule ip filter input jump ap add rule ip filter ap masquerade results in a panic, because the masquerade extension should be rejected from the filter chain. The existing validation is missing a chain dependency check when the rule is added to the non-base chain. This patch fixes the problem by walking down the rules from the basechains, searching for either immediate or lookup expressions, then jumping to non-base chains and again walking down the rules to perform the expression validation, so we make sure the full ruleset graph is validated. This is done only once from the commit phase, in case of problem, we abort the transaction and perform fine grain validation for error reporting. This patch requires 003087911af2 ("netfilter: nfnetlink: allow commit to fail") to achieve this behaviour. This patch also adds a cleanup callback to nfnl batch interface to reset the validate state from the exit path. As a result of this patch, nf_tables_check_loops() doesn't use ->validate to check for loops, instead it just checks for immediate expressions. Reported-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-31net/mlx5: Add FPGA QP error eventIlan Tayari3-0/+18
The FPGA queue pair (QP) event fires whenever a QP on the FPGA transitions to the error state. At this stage, this event is unrecoverable, it may become recoverable in the future. Signed-off-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: Adi Nissim <adin@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-31net/mlx5: Add temperature warning event to logIlan Tayari2-1/+8
Temperature warning event is sent by FW to indicate high temperature as detected by one of the sensors on the board. Add handling of this event by writing the numbers of the alert sensors to the kernel log. Signed-off-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: Adi Nissim <adin@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-31qed*: Add link change count value to ethtool statistics display.Sudarsana Reddy Kalluru1-0/+1
This patch adds driver changes for capturing the link change count in ethtool statistics display. Please consider applying this to "net-next". Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-29net/ipv4: Add support for specifying metric of connected routesDavid Ahern1-0/+1
Add support for IFA_RT_PRIORITY to ipv4 addresses. If the metric is changed on an existing address then the new route is inserted before removing the old one. Since the metric is one of the route keys, the prefix route can not be replaced. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-29net: sched: mq: add simple offload notificationJakub Kicinski1-0/+1
mq offload is trivial, we just need to let the device know that the root qdisc is mq. Alternative approach would be to export qdisc_lookup() and make drivers check the root type themselves, but notification via ndo_setup_tc is more in line with other qdiscs. Note that mq doesn't hold any stats on it's own, it just adds up stats of its children. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-29Merge tag 'mlx5e-updates-2018-05-25' of ↵David S. Miller2-5/+15
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5e-updates-2018-05-25 This series includes updates for mlx5e netdev driver. 1) Allowr flow based VF vport mirroring under sriov switchdev scheme, added support for offloading the TC mirred mirror sub-action, from Chris Mi. ================= From: Or Gerlitz <ogerlitz@mellanox.com> The user will typically set the actions order such that the mirror port (mirror VF) sees packets as the original port (VF under mirroring) sent them or as it will receive them. In the general case, it means that packets are potentially sent to the mirror port before or after some actions were applied on them. To properly do that, we follow on the exact action order as set for the flow and make sure this will also be the case when we program the HW offload. If all the actions should apply before forwarding to the mirror and dest port, mirroring is just multicasting to the two vports. Otherwise, we split the TC flow to two HW rules, where the 1st applies only the actions needed up to the mirror (if there are such) and the 2nd the rest of the actions plus the forwarding to the dest vport. ================= 2) Move to order-0 only allocations (using fragmented work queues) for all work queues used by the driver, RX and TX descriptor rings (RQs, SQs and Completion Queues (CQs)), from Tariq Toukan. 3) Avoid resetting netdevice statistics on netdevice state changes, from Eran Ben Elisha. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-29ptp_qoriq: move some definitions to header fileYangbo Lu1-0/+141
This patch is to move some definitions in ptp_qoriq.c to the header file. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-29net: Introduce generic failover moduleSridhar Samudrala1-0/+16
The failover module provides a generic interface for paravirtual drivers to register a netdev and a set of ops with a failover instance. The ops are used as event handlers that get called to handle netdev register/ unregister/link change/name change events on slave pci ethernet devices with the same mac address as the failover netdev. This enables paravirtual drivers to use a VF as an accelerated low latency datapath. It also allows migration of VMs with direct attached VFs by failing over to the paravirtual datapath when the VF is unplugged. Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller10-9/+36
Lots of easy overlapping changes in the confict resolutions here. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-26Merge branch 'akpm' (patches from Andrew)Linus Torvalds2-4/+6
Merge misc fixes from Andrew Morton: "16 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: kasan: fix memory hotplug during boot kasan: free allocated shadow memory on MEM_CANCEL_ONLINE checkpatch: fix macro argument precedence test init/main.c: include <linux/mem_encrypt.h> kernel/sys.c: fix potential Spectre v1 issue mm/memory_hotplug: fix leftover use of struct page during hotplug proc: fix smaps and meminfo alignment mm: do not warn on offline nodes unless the specific node is explicitly requested mm, memory_hotplug: make has_unmovable_pages more robust mm/kasan: don't vfree() nonexistent vm_area MAINTAINERS: change hugetlbfs maintainer and update files ipc/shm: fix shmat() nil address after round-down when remapping Revert "ipc/shm: Fix shmat mmap nil-page protection" idr: fix invalid ptr dereference on item delete ocfs2: revert "ocfs2/o2hb: check len for bio_add_page() to avoid getting incorrect bio" mm: fix nr_rotate_swap leak in swapon() error case
2018-05-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-1/+1
Pull networking fixes from David Miller: "Let's begin the holiday weekend with some networking fixes: 1) Whoops need to restrict cfg80211 wiphy names even more to 64 bytes. From Eric Biggers. 2) Fix flags being ignored when using kernel_connect() with SCTP, from Xin Long. 3) Use after free in DCCP, from Alexey Kodanev. 4) Need to check rhltable_init() return value in ipmr code, from Eric Dumazet. 5) XDP handling fixes in virtio_net from Jason Wang. 6) Missing RTA_TABLE in rtm_ipv4_policy[], from Roopa Prabhu. 7) Need to use IRQ disabling spinlocks in mlx4_qp_lookup(), from Jack Morgenstein. 8) Prevent out-of-bounds speculation using indexes in BPF, from Daniel Borkmann. 9) Fix regression added by AF_PACKET link layer cure, from Willem de Bruijn. 10) Correct ENIC dma mask, from Govindarajulu Varadarajan. 11) Missing config options for PMTU tests, from Stefano Brivio" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (48 commits) ibmvnic: Fix partial success login retries selftests/net: Add missing config options for PMTU tests mlx4_core: allocate ICM memory in page size chunks enic: set DMA mask to 47 bit ppp: remove the PPPIOCDETACH ioctl ipv4: remove warning in ip_recv_error net : sched: cls_api: deal with egdev path only if needed vhost: synchronize IOTLB message with dev cleanup packet: fix reserve calculation net/mlx5: IPSec, Fix a race between concurrent sandbox QP commands net/mlx5e: When RXFCS is set, add FCS data into checksum calculation bpf: properly enforce index mask to prevent out-of-bounds speculation net/mlx4: Fix irq-unsafe spinlock usage net: phy: broadcom: Fix bcm_write_exp() net: phy: broadcom: Fix auxiliary control register reads net: ipv4: add missing RTA_TABLE to rtm_ipv4_policy net/mlx4: fix spelling mistake: "Inrerface" -> "Interface" and rephrase message ibmvnic: Only do H_EOI for mobility events tuntap: correctly set SOCKWQ_ASYNC_NOSPACE virtio-net: fix leaking page for gso packet during mergeable XDP ...
2018-05-26mm/memory_hotplug: fix leftover use of struct page during hotplugJonathan Cameron1-3/+5
The case of a new numa node got missed in avoiding using the node info from page_struct during hotplug. In this path we have a call to register_mem_sect_under_node (which allows us to specify it is hotplug so don't change the node), via link_mem_sections which unfortunately does not. Fix is to pass check_nid through link_mem_sections as well and disable it in the new numa node path. Note the bug only 'sometimes' manifests depending on what happens to be in the struct page structures - there are lots of them and it only needs to match one of them. The result of the bug is that (with a new memory only node) we never successfully call register_mem_sect_under_node so don't get the memory associated with the node in sysfs and meminfo for the node doesn't report it. It came up whilst testing some arm64 hotplug patches, but appears to be universal. Whilst I'm triggering it by removing then reinserting memory to a node with no other elements (thus making the node disappear then appear again), it appears it would happen on hotplugging memory where there was none before and it doesn't seem to be related the arm64 patches. These patches call __add_pages (where most of the issue was fixed by Pavel's patch). If there is a node at the time of the __add_pages call then all is well as it calls register_mem_sect_under_node from there with check_nid set to false. Without a node that function returns having not done the sysfs related stuff as there is no node to use. This is expected but it is the resulting path that fails... Exact path to the problem is as follows: mm/memory_hotplug.c: add_memory_resource() The node is not online so we enter the 'if (new_node)' twice, on the second such block there is a call to link_mem_sections which calls into drivers/node.c: link_mem_sections() which calls drivers/node.c: register_mem_sect_under_node() which calls get_nid_for_pfn and keeps trying until the output of that matches the expected node (passed all the way down from add_memory_resource) It is effectively the same fix as the one referred to in the fixes tag just in the code path for a new node where the comments point out we have to rerun the link creation because it will have failed in register_new_memory (as there was no node at the time). (actually that comment is wrong now as we don't have register_new_memory any more it got renamed to hotplug_memory_register in Pavel's patch). Link: http://lkml.kernel.org/r/20180504085311.1240-1-Jonathan.Cameron@huawei.com Fixes: fc44f7f9231a ("mm/memory_hotplug: don't read nid from struct page during hotplug") Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-26mm: do not warn on offline nodes unless the specific node is explicitly ↵Michal Hocko1-1/+1
requested Oscar has noticed that we splat WARNING: CPU: 0 PID: 64 at ./include/linux/gfp.h:467 vmemmap_alloc_block+0x4e/0xc9 [...] CPU: 0 PID: 64 Comm: kworker/u4:1 Tainted: G W E 4.17.0-rc5-next-20180517-1-default+ #66 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 Workqueue: kacpi_hotplug acpi_hotplug_work_fn Call Trace: vmemmap_populate+0xf2/0x2ae sparse_mem_map_populate+0x28/0x35 sparse_add_one_section+0x4c/0x187 __add_pages+0xe7/0x1a0 add_pages+0x16/0x70 add_memory_resource+0xa3/0x1d0 add_memory+0xe4/0x110 acpi_memory_device_add+0x134/0x2e0 acpi_bus_attach+0xd9/0x190 acpi_bus_scan+0x37/0x70 acpi_device_hotplug+0x389/0x4e0 acpi_hotplug_work_fn+0x1a/0x30 process_one_work+0x146/0x340 worker_thread+0x47/0x3e0 kthread+0xf5/0x130 ret_from_fork+0x35/0x40 when adding memory to a node that is currently offline. The VM_WARN_ON is just too loud without a good reason. In this particular case we are doing alloc_pages_node(node, GFP_KERNEL|__GFP_RETRY_MAYFAIL|__GFP_NOWARN, order) so we do not insist on allocating from the given node (it is more a hint) so we can fall back to any other populated node and moreover we explicitly ask to not warn for the allocation failure. Soften the warning only to cases when somebody asks for the given node explicitly by __GFP_THISNODE. Link: http://lkml.kernel.org/r/20180523125555.30039-3-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Oscar Salvador <osalvador@techadventures.net> Tested-by: Oscar Salvador <osalvador@techadventures.net> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-26net/mlx5: Use order-0 allocations for all WQ typesTariq Toukan1-4/+12
Complete the transition of all WQ types to use fragmented order-0 coherent memory instead of high-order allocations. CQ-WQ already uses order-0. Here we do the same for cyclic and linked-list WQs. This allows the driver to load cleanly on systems with a highly fragmented coherent memory. Performance tests: ConnectX-5 100Gbps, CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz Packet rate of 64B packets, single transmit ring, size 8K. No degradation is sensed. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-26net/mlx5: Add cap bits for flow table destination in FDB tableChris Mi1-1/+3
If set, the FDB table supports the forward action with a destination list that includes a flow table. Signed-off-by: Chris Mi <chrism@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-25Merge tag 'mlx5e-updates-2018-05-19' of ↵David S. Miller3-0/+52
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5e-updates-2018-05-19 This series contains updates for mlx5e netdevice driver with one subject, DSCP to priority mapping, in the first patch Huy adds the needed API in dcbnl, the second patch adds the needed mlx5 core capability bits for the feature, and all other patches are mlx5e (netdev) only changes to add support for the feature. From: Huy Nguyen Dscp to priority mapping for Ethernet packet: These patches enable differentiated services code point (dscp) to priority mapping for Ethernet packet. Once this feature is enabled, the packet is routed to the corresponding priority based on its dscp. User can combine this feature with priority flow control (pfc) feature to have priority flow control based on the dscp. Firmware interface: Mellanox firmware provides two control knobs for this feature: QPTS register allow changing the trust state between dscp and pcp mode. The default is pcp mode. Once in dscp mode, firmware will route the packet based on its dscp value if the dscp field exists. QPDPM register allow mapping a specific dscp (0 to 63) to a specific priority (0 to 7). By default, all the dscps are mapped to priority zero. Software interface: This feature is controlled via application priority TLV. IEEE specification P802.1Qcd/D2.1 defines priority selector id 5 for application priority TLV. This APP TLV selector defines DSCP to priority map. This APP TLV can be sent by the switch or can be set locally using software such as lldptool. In mlx5 drivers, we add the support for net dcb's getapp and setapp call back. Mlx5 driver only handles the selector id 5 application entry (dscp application priority application entry). If user sends multiple dscp to priority APP TLV entries on the same dscp, the last sent one will take effect. All the previous sent will be deleted. This attribute combined with pfc attribute allows advanced user to fine tune the qos setting for specific priority queue. For example, user can give dedicated buffer for one or more priorities or user can give large buffer to certain priorities. The dcb buffer configuration will be controlled by lldptool. >> lldptool -T -i eth2 -V BUFFER prio 0,2,5,7,1,2,3,6 maps priorities 0,1,2,3,4,5,6,7 to receive buffer 0,2,5,7,1,2,3,6 >> lldptool -T -i eth2 -V BUFFER size 87296,87296,0,87296,0,0,0,0 sets receive buffer size for buffer 0,1,2,3,4,5,6,7 respectively After discussion on mailing list with Jakub, Jiri, Ido and John, we agreed to choose dcbnl over devlink interface since this feature is intended to set port attributes which are governed by the netdev instance of that port, where devlink API is more suitable for global ASIC configurations. The firmware trust state (in QPTS register) is changed based on the number of dscp to priority application entries. When the first dscp to priority application entry is added by the user, the trust state is changed to dscp. When the last dscp to priority application entry is deleted by the user, the trust state is changed to pcp. When the port is in DSCP trust state, the transmit queue is selected based on the dscp of the skb. When the port is in DSCP trust state and vport inline mode is not NONE, firmware requires mlx5 driver to copy the IP header to the wqe ethernet segment inline header if the skb has it. This is done by changing the transmit queue sq's min inline mode to L3. Note that the min inline mode of sqs that belong to other features such as xdpsq, icosq are not modified. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25qed*: Support drop action classificationManish Chopra2-0/+4
With this patch, User can configure for the supported flows to be dropped. Added a stat "gft_filter_drop" as well to be populated in ethtool for the dropped flows. For example - ethtool -N p5p1 flow-type udp4 dst-port 8000 action -1 ethtool -N p5p1 flow-type tcp4 scr-ip 192.168.8.1 action -1 Signed-off-by: Manish Chopra <manish.chopra@cavium.com> Signed-off-by: Shahed Shaikh <shahed.shaikh@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25qed*: Support other classification modes.Manish Chopra1-0/+1
Currently, driver supports flow classification to PF receive queues based on TCP/UDP 4 tuples [src_ip, dst_ip, src_port, dst_port] only. This patch enables to configure different flow profiles [For example - only UDP dest port or src_ip based] on the adapter so that classification can be done according to just those fields as well. Although, at a time just one type of flow configuration is supported due to limited number of flow profiles available on the device. For example - ethtool -N enp7s0f0 flow-type udp4 dst-port 45762 action 2 ethtool -N enp7s0f0 flow-type tcp4 src-ip 192.16.4.10 action 1 ethtool -N enp7s0f0 flow-type udp6 dst-port 45762 action 3 Signed-off-by: Manish Chopra <manish.chopra@cavium.com> Signed-off-by: Shahed Shaikh <shahed.shaikh@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller1-1/+1
Daniel Borkmann says: ==================== pull-request: bpf 2018-05-24 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix a bug in the original fix to prevent out of bounds speculation when multiple tail call maps from different branches or calls end up at the same tail call helper invocation, from Daniel. 2) Two selftest fixes, one in reuseport_bpf_numa where test is skipped in case of missing numa support and another one to update kernel config to properly support xdp_meta.sh test, from Anders. ... Would be great if you have a chance to merge net into net-next after that. The verifier fix would be needed later as a dependency in bpf-next for upcomig work there. When you do the merge there's a trivial conflict on BPF side with 849fa50662fb ("bpf/verifier: refine retval R0 state for bpf_get_stack helper"): Resolution is to keep both functions, the do_refine_retval_range() and record_func_map(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25net: bridge: add support for port isolationNikolay Aleksandrov1-0/+1
This patch adds support for a new port flag - BR_ISOLATED. If it is set then isolated ports cannot communicate between each other, but they can still communicate with non-isolated ports. The same can be achieved via ACLs but they can't scale with large number of ports and also the complexity of the rules grows. This feature can be used to achieve isolated vlan functionality (similar to pvlan) as well, though currently it will be port-wide (for all vlans on the port). The new test in should_deliver uses data that is already cache hot and the new boolean is used to avoid an additional source port test in should_deliver. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25net: include hash policy in LAG changeupper infoJohn Hurley1-0/+11
LAG upper event notifiers contain the tx type used by the LAG device. Extend this to also include the hash policy used for tx types that utilize hashing. Signed-off-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller6-12/+54
Alexei Starovoitov says: ==================== pull-request: bpf-next 2018-05-24 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Björn Töpel cleans up AF_XDP (removes rebind, explicit cache alignment from uapi, etc). 2) David Ahern adds mtu checks to bpf_ipv{4,6}_fib_lookup() helpers. 3) Jesper Dangaard Brouer adds bulking support to ndo_xdp_xmit. 4) Jiong Wang adds support for indirect and arithmetic shifts to NFP 5) Martin KaFai Lau cleans up BTF uapi and makes the btf_header extensible. 6) Mathieu Xhonneux adds an End.BPF action to seg6local with BPF helpers allowing to edit/grow/shrink a SRH and apply on a packet generic SRv6 actions. 7) Sandipan Das adds support for bpf2bpf function calls in ppc64 JIT. 8) Yonghong Song adds BPF_TASK_FD_QUERY command for introspection of tracing events. 9) other misc fixes from Gustavo A. R. Silva, Sirio Balmelli, John Fastabend, and Magnus Karlsson ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25xdp: change ndo_xdp_xmit API to support bulkingJesper Dangaard Brouer1-5/+9
This patch change the API for ndo_xdp_xmit to support bulking xdp_frames. When kernel is compiled with CONFIG_RETPOLINE, XDP sees a huge slowdown. Most of the slowdown is caused by DMA API indirect function calls, but also the net_device->ndo_xdp_xmit() call. Benchmarked patch with CONFIG_RETPOLINE, using xdp_redirect_map with single flow/core test (CPU E5-1650 v4 @ 3.60GHz), showed performance improved: for driver ixgbe: 6,042,682 pps -> 6,853,768 pps = +811,086 pps for driver i40e : 6,187,169 pps -> 6,724,519 pps = +537,350 pps With frames avail as a bulk inside the driver ndo_xdp_xmit call, further optimizations are possible, like bulk DMA-mapping for TX. Testing without CONFIG_RETPOLINE show the same performance for physical NIC drivers. The virtual NIC driver tun sees a huge performance boost, as it can avoid doing per frame producer locking, but instead amortize the locking cost over the bulk. V2: Fix compile errors reported by kbuild test robot <lkp@intel.com> V4: Isolated ndo, driver changes and callers. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-25xdp: add tracepoint for devmap like cpumap haveJesper Dangaard Brouer1-2/+4
Notice how this allow us get XDP statistic without affecting the XDP performance, as tracepoint is no-longer activated on a per packet basis. V5: Spotted by John Fastabend. Fix 'sent' also counted 'drops' in this patch, a later patch corrected this, but it was a mistake in this intermediate step. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-25bpf: devmap introduce dev_map_enqueueJesper Dangaard Brouer1-3/+13
Functionality is the same, but the ndo_xdp_xmit call is now simply invoked from inside the devmap.c code. V2: Fix compile issue reported by kbuild test robot <lkp@intel.com> V5: Cleanups requested by Daniel - Newlines before func definition - Use BUILD_BUG_ON checks - Remove unnecessary use return value store in dev_map_enqueue Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-25bpf: introduce bpf subcommand BPF_TASK_FD_QUERYYonghong Song1-0/+17
Currently, suppose a userspace application has loaded a bpf program and attached it to a tracepoint/kprobe/uprobe, and a bpf introspection tool, e.g., bpftool, wants to show which bpf program is attached to which tracepoint/kprobe/uprobe. Such attachment information will be really useful to understand the overall bpf deployment in the system. There is a name field (16 bytes) for each program, which could be used to encode the attachment point. There are some drawbacks for this approaches. First, bpftool user (e.g., an admin) may not really understand the association between the name and the attachment point. Second, if one program is attached to multiple places, encoding a proper name which can imply all these attachments becomes difficult. This patch introduces a new bpf subcommand BPF_TASK_FD_QUERY. Given a pid and fd, if the <pid, fd> is associated with a tracepoint/kprobe/uprobe perf event, BPF_TASK_FD_QUERY will return . prog_id . tracepoint name, or . k[ret]probe funcname + offset or kernel addr, or . u[ret]probe filename + offset to the userspace. The user can use "bpftool prog" to find more information about bpf program itself with prog_id. Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-25perf/core: add perf_get_event() to return perf_event given a struct fileYonghong Song1-0/+5
A new extern function, perf_get_event(), is added to return a perf event given a struct file. This function will be used in later patches. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-25net/mlx5: PPTB and PBMC register firmware command supportHuy Nguyen2-0/+37
Add firmware command interface to read and write PPTB and PBMC registers. PPTB register enables mappings priority to a specific receive buffer. PBMC registers enables changing the receive buffer's configuration such as buffer size, xon/xoff thresholds, buffer's lossy property and buffer's shared property. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-25net/mlx5: Add pbmc and pptb in the port_access_reg_cap_maskHuy Nguyen2-0/+15
Add pbmc and pptb in the port_access_reg_cap_mask. These two bits determine if device supports receive buffer configuration. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-24net: phy: replace bool members in struct phy_device with bit-fieldsHeiner Kallweit1-12/+11
In struct phy_device we have a number of flags being defined as type bool. Similar to e.g. struct pci_dev we can save some space by using bit-fields. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24Revert "mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE"Joonsoo Kim2-1/+3
This reverts the following commits that change CMA design in MM. 3d2054ad8c2d ("ARM: CMA: avoid double mapping to the CMA area if CONFIG_HIGHMEM=y") 1d47a3ec09b5 ("mm/cma: remove ALLOC_CMA") bad8c6c0b114 ("mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE") Ville reported a following error on i386. Inode-cache hash table entries: 65536 (order: 6, 262144 bytes) microcode: microcode updated early to revision 0x4, date = 2013-06-28 Initializing CPU#0 Initializing HighMem for node 0 (000377fe:00118000) Initializing Movable for node 0 (00000001:00118000) BUG: Bad page state in process swapper pfn:377fe page:f53effc0 count:0 mapcount:-127 mapping:00000000 index:0x0 flags: 0x80000000() raw: 80000000 00000000 00000000 ffffff80 00000000 00000100 00000200 00000001 page dumped because: nonzero mapcount Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0-rc5-elk+ #145 Hardware name: Dell Inc. Latitude E5410/03VXMC, BIOS A15 07/11/2013 Call Trace: dump_stack+0x60/0x96 bad_page+0x9a/0x100 free_pages_check_bad+0x3f/0x60 free_pcppages_bulk+0x29d/0x5b0 free_unref_page_commit+0x84/0xb0 free_unref_page+0x3e/0x70 __free_pages+0x1d/0x20 free_highmem_page+0x19/0x40 add_highpages_with_active_regions+0xab/0xeb set_highmem_pages_init+0x66/0x73 mem_init+0x1b/0x1d7 start_kernel+0x17a/0x363 i386_start_kernel+0x95/0x99 startup_32_smp+0x164/0x168 The reason for this error is that the span of MOVABLE_ZONE is extended to whole node span for future CMA initialization, and, normal memory is wrongly freed here. I submitted the fix and it seems to work, but, another problem happened. It's so late time to fix the later problem so I decide to reverting the series. Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Acked-by: Laura Abbott <labbott@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-24bpf: properly enforce index mask to prevent out-of-bounds speculationDaniel Borkmann1-1/+1
While reviewing the verifier code, I recently noticed that the following two program variants in relation to tail calls can be loaded. Variant 1: # bpftool p d x i 15 0: (15) if r1 == 0x0 goto pc+3 1: (18) r2 = map[id:5] 3: (05) goto pc+2 4: (18) r2 = map[id:6] 6: (b7) r3 = 7 7: (35) if r3 >= 0xa0 goto pc+2 8: (54) (u32) r3 &= (u32) 255 9: (85) call bpf_tail_call#12 10: (b7) r0 = 1 11: (95) exit # bpftool m s i 5 5: prog_array flags 0x0 key 4B value 4B max_entries 4 memlock 4096B # bpftool m s i 6 6: prog_array flags 0x0 key 4B value 4B max_entries 160 memlock 4096B Variant 2: # bpftool p d x i 20 0: (15) if r1 == 0x0 goto pc+3 1: (18) r2 = map[id:8] 3: (05) goto pc+2 4: (18) r2 = map[id:7] 6: (b7) r3 = 7 7: (35) if r3 >= 0x4 goto pc+2 8: (54) (u32) r3 &= (u32) 3 9: (85) call bpf_tail_call#12 10: (b7) r0 = 1 11: (95) exit # bpftool m s i 8 8: prog_array flags 0x0 key 4B value 4B max_entries 160 memlock 4096B # bpftool m s i 7 7: prog_array flags 0x0 key 4B value 4B max_entries 4 memlock 4096B In both cases the index masking inserted by the verifier in order to control out of bounds speculation from a CPU via b2157399cc98 ("bpf: prevent out-of-bounds speculation") seems to be incorrect in what it is enforcing. In the 1st variant, the mask is applied from the map with the significantly larger number of entries where we would allow to a certain degree out of bounds speculation for the smaller map, and in the 2nd variant where the mask is applied from the map with the smaller number of entries, we get buggy behavior since we truncate the index of the larger map. The original intent from commit b2157399cc98 is to reject such occasions where two or more different tail call maps are used in the same tail call helper invocation. However, the check on the BPF_MAP_PTR_POISON is never hit since we never poisoned the saved pointer in the first place! We do this explicitly for map lookups but in case of tail calls we basically used the tail call map in insn_aux_data that was processed in the most recent path which the verifier walked. Thus any prior path that stored a pointer in insn_aux_data at the helper location was always overridden. Fix it by moving the map pointer poison logic into a small helper that covers both BPF helpers with the same logic. After that in fixup_bpf_calls() the poison check is then hit for tail calls and the program rejected. Latter only happens in unprivileged case since this is the *only* occasion where a rewrite needs to happen, and where such rewrite is specific to the map (max_entries, index_mask). In the privileged case the rewrite is generic for the insn->imm / insn->code update so multiple maps from different paths can be handled just fine since all the remaining logic happens in the instruction processing itself. This is similar to the case of map lookups: in case there is a collision of maps in fixup_bpf_calls() we must skip the inlined rewrite since this will turn the generic instruction sequence into a non- generic one. Thus the patch_call_imm will simply update the insn->imm location where the bpf_map_lookup_elem() will later take care of the dispatch. Given we need this 'poison' state as a check, the information of whether a map is an unpriv_array gets lost, so enforcing it prior to that needs an additional state. In general this check is needed since there are some complex and tail call intensive BPF programs out there where LLVM tends to generate such code occasionally. We therefore convert the map_ptr rather into map_state to store all this w/o extra memory overhead, and the bit whether one of the maps involved in the collision was from an unpriv_array thus needs to be retained as well there. Fixes: b2157399cc98 ("bpf: prevent out-of-bounds speculation") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-24ipv6: sr: Add seg6local action End.BPFMathieu Xhonneux1-0/+1
This patch adds the End.BPF action to the LWT seg6local infrastructure. This action works like any other seg6local End action, meaning that an IPv6 header with SRH is needed, whose DA has to be equal to the SID of the action. It will also advance the SRH to the next segment, the BPF program does not have to take care of this. Since the BPF program may not be a source of instability in the kernel, it is important to ensure that the integrity of the packet is maintained before yielding it back to the IPv6 layer. The hook hence keeps track if the SRH has been altered through the helpers, and re-validates its content if needed with seg6_validate_srh. The state kept for validation is stored in a per-CPU buffer. The BPF program is not allowed to directly write into the packet, and only some fields of the SRH can be altered through the helper bpf_lwt_seg6_store_bytes. Performances profiling has shown that the SRH re-validation does not induce a significant overhead. If the altered SRH is deemed as invalid, the packet is dropped. This validation is also done before executing any action through bpf_lwt_seg6_action, and will not be performed again if the SRH is not modified after calling the action. The BPF program may return 3 types of return codes: - BPF_OK: the End.BPF action will look up the next destination through seg6_lookup_nexthop. - BPF_REDIRECT: if an action has been executed through the bpf_lwt_seg6_action helper, the BPF program should return this value, as the skb's destination is already set and the default lookup should not be performed. - BPF_DROP : the packet will be dropped. Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com> Acked-by: David Lebrun <dlebrun@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-24bpf: Split lwt inout verifier structuresMathieu Xhonneux1-2/+2
The new bpf_lwt_push_encap helper should only be accessible within the LWT BPF IN hook, and not the OUT one, as this may lead to a skb under panic. At the moment, both LWT BPF IN and OUT share the same list of helpers, whose calls are authorized by the verifier. This patch separates the verifier ops for the IN and OUT hooks, and allows the IN hook to call the bpf_lwt_push_encap helper. This patch is also the occasion to put all lwt_*_func_proto functions together for clarity. At the moment, socks_op_func_proto is in the middle of lwt_inout_func_proto and lwt_xmit_func_proto. Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com> Acked-by: David Lebrun <dlebrun@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller2-7/+33
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree, they are: 1) Remove obsolete nf_log tracing from nf_tables, from Florian Westphal. 2) Add support for map lookups to numgen, random and hash expressions, from Laura Garcia. 3) Allow to register nat hooks for iptables and nftables at the same time. Patchset from Florian Westpha. 4) Timeout support for rbtree sets. 5) ip6_rpfilter works needs interface for link-local addresses, from Vincent Bernat. 6) Add nf_ct_hook and nf_nat_hook structures and use them. 7) Do not drop packets on packets raceing to insert conntrack entries into hashes, this is particularly a problem in nfqueue setups. 8) Address fallout from xt_osf separation to nf_osf, patches from Florian Westphal and Fernando Mancera. 9) Remove reference to struct nft_af_info, which doesn't exist anymore. From Taehee Yoo. This batch comes with is a conflict between 25fd386e0bc0 ("netfilter: core: add missing __rcu annotation") in your tree and 2c205dd3981f ("netfilter: add struct nf_nat_hook and use it") coming in this batch. This conflict can be solved by leaving the __rcu tag on __netfilter_net_init() - added by 25fd386e0bc0 - and remove all code related to nf_nat_decode_session_hook - which is gone after 2c205dd3981f, as described by: diff --cc net/netfilter/core.c index e0ae4aae96f5,206fb2c4c319..168af54db975 --- a/net/netfilter/core.c +++ b/net/netfilter/core.c @@@ -611,7 -580,13 +611,8 @@@ const struct nf_conntrack_zone nf_ct_zo EXPORT_SYMBOL_GPL(nf_ct_zone_dflt); #endif /* CONFIG_NF_CONNTRACK */ - static void __net_init __netfilter_net_init(struct nf_hook_entries **e, int max) -#ifdef CONFIG_NF_NAT_NEEDED -void (*nf_nat_decode_session_hook)(struct sk_buff *, struct flowi *); -EXPORT_SYMBOL(nf_nat_decode_session_hook); -#endif - + static void __net_init + __netfilter_net_init(struct nf_hook_entries __rcu **e, int max) { int h; I can also merge your net-next tree into nf-next, solve the conflict and resend the pull request if you prefer so. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23gso: limit udp gso to egress-only virtual devicesWillem de Bruijn1-1/+0
Until the udp receive stack supports large packets (UDP GRO), GSO packets must not loop from the egress to the ingress path. Revert the change that added NETIF_F_GSO_UDP_L4 to various virtual devices through NETIF_F_GSO_ENCAP_ALL as this included devices that may loop packets, such as veth and macvlan. Instead add it to specific devices that forward to another device's egress path, bonding and team. Fixes: 83aa025f535f ("udp: add gso support to virtual devices") CC: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23net: add skeleton of bpfilter kernel moduleAlexei Starovoitov1-0/+15
bpfilter.ko consists of bpfilter_kern.c (normal kernel module code) and user mode helper code that is embedded into bpfilter.ko The steps to build bpfilter.ko are the following: - main.c is compiled by HOSTCC into the bpfilter_umh elf executable file - with quite a bit of objcopy and Makefile magic the bpfilter_umh elf file is converted into bpfilter_umh.o object file with _binary_net_bpfilter_bpfilter_umh_start and _end symbols Example: $ nm ./bld_x64/net/bpfilter/bpfilter_umh.o 0000000000004cf8 T _binary_net_bpfilter_bpfilter_umh_end 0000000000004cf8 A _binary_net_bpfilter_bpfilter_umh_size 0000000000000000 T _binary_net_bpfilter_bpfilter_umh_start - bpfilter_umh.o and bpfilter_kern.o are linked together into bpfilter.ko bpfilter_kern.c is a normal kernel module code that calls the fork_usermode_blob() helper to execute part of its own data as a user mode process. Notice that _binary_net_bpfilter_bpfilter_umh_start - end is placed into .init.rodata section, so it's freed as soon as __init function of bpfilter.ko is finished. As part of __init the bpfilter.ko does first request/reply action via two unix pipe provided by fork_usermode_blob() helper to make sure that umh is healthy. If not it will kill it via pid. Later bpfilter_process_sockopt() will be called from bpfilter hooks in get/setsockopt() to pass iptable commands into umh via bpfilter.ko If admin does 'rmmod bpfilter' the __exit code bpfilter.ko will kill umh as well. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23umh: introduce fork_usermode_blob() helperAlexei Starovoitov2-0/+13
Introduce helper: int fork_usermode_blob(void *data, size_t len, struct umh_info *info); struct umh_info { struct file *pipe_to_umh; struct file *pipe_from_umh; pid_t pid; }; that GPLed kernel modules (signed or unsigned) can use it to execute part of its own data as swappable user mode process. The kernel will do: - allocate a unique file in tmpfs - populate that file with [data, data + len] bytes - user-mode-helper code will do_execve that file and, before the process starts, the kernel will create two unix pipes for bidirectional communication between kernel module and umh - close tmpfs file, effectively deleting it - the fork_usermode_blob will return zero on success and populate 'struct umh_info' with two unix pipes and the pid of the user process As the first step in the development of the bpfilter project the fork_usermode_blob() helper is introduced to allow user mode code to be invoked from a kernel module. The idea is that user mode code plus normal kernel module code are built as part of the kernel build and installed as traditional kernel module into distro specified location, such that from a distribution point of view, there is no difference between regular kernel modules and kernel modules + umh code. Such modules can be signed, modprobed, rmmod, etc. The use of this new helper by a kernel module doesn't make it any special from kernel and user space tooling point of view. Such approach enables kernel to delegate functionality traditionally done by the kernel modules into the user space processes (either root or !root) and reduces security attack surface of the new code. The buggy umh code would crash the user process, but not the kernel. Another advantage is that umh code of the kernel module can be debugged and tested out of user space (e.g. opening the possibility to run clang sanitizers, fuzzers or user space test suites on the umh code). In case of the bpfilter project such architecture allows complex control plane to be done in the user space while bpf based data plane stays in the kernel. Since umh can crash, can be oom-ed by the kernel, killed by the admin, the kernel module that uses them (like bpfilter) needs to manage life time of umh on its own via two unix pipes and the pid of umh. The exit code of such kernel module should kill the umh it started, so that rmmod of the kernel module will cleanup the corresponding umh. Just like if the kernel module does kmalloc() it should kfree() it in the exit code. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23bpf: btf: Rename btf_key_id and btf_value_id in bpf_map_infoMartin KaFai Lau1-2/+2
In "struct bpf_map_info", the name "btf_id", "btf_key_id" and "btf_value_id" could cause confusion because the "id" of "btf_id" means the BPF obj id given to the BTF object while "btf_key_id" and "btf_value_id" means the BTF type id within that BTF object. To make it clear, btf_key_id and btf_value_id are renamed to btf_key_type_id and btf_value_type_id. Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-23bpf: Expose check_uarg_tail_zero()Martin KaFai Lau1-0/+2
This patch exposes check_uarg_tail_zero() which will be reused by a later BTF patch. Its name is changed to bpf_check_uarg_tail_zero(). Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-23netfilter: nfnetlink_queue: resolve clash for unconfirmed conntracksPablo Neira Ayuso1-0/+5
In nfqueue, two consecutive skbuffs may race to create the conntrack entry. Hence, the one that loses the race gets dropped due to clash in the insertion into the hashes from the nf_conntrack_confirm() path. This patch adds a new nf_conntrack_update() function which searches for possible clashes and resolve them. NAT mangling for the packet losing race is corrected by using the conntrack information that won race. In order to avoid direct module dependencies with conntrack and NAT, the nf_ct_hook and nf_nat_hook structures are used for this purpose. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-23netfilter: add struct nf_nat_hook and use itPablo Neira Ayuso1-5/+16
Move decode_session() and parse_nat_setup_hook() indirections to struct nf_nat_hook structure. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-23netfilter: add struct nf_ct_hook and use itPablo Neira Ayuso1-1/+6
Move the nf_ct_destroy indirection to the struct nf_ct_hook. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-23netfilter: lift one-nat-hook-only restrictionFlorian Westphal1-1/+0
This reverts commit f92b40a8b2645 ("netfilter: core: only allow one nat hook per hook point"), this limitation is no longer needed. The nat core now invokes these functions and makes sure that hook evaluation stops after a mapping is created and a null binding is created otherwise. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>