summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2018-04-24ipconfig: Create /proc/net/ipconfig directoryChris Novakovic1-0/+15
To allow ipconfig to report IP configuration details to user space processes without cluttering /proc/net, create a new subdirectory /proc/net/ipconfig. All files containing IP configuration details should be written to this directory. Signed-off-by: Chris Novakovic <chris@chrisn.me.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-24ipconfig: Correctly initialise ic_nameserversChris Novakovic1-0/+13
ic_nameservers, which stores the list of name servers discovered by ipconfig, is initialised (i.e. has all of its elements set to NONE, or 0xffffffff) by ic_nameservers_predef() in the following scenarios: - before the "ip=" and "nfsaddrs=" kernel command line parameters are parsed (in ip_auto_config_setup()); - before autoconfiguring via DHCP or BOOTP (in ic_bootp_init()), in order to clear any values that may have been set after parsing "ip=" or "nfsaddrs=" and are no longer needed. This means that ic_nameservers_predef() is not called when neither "ip=" nor "nfsaddrs=" is specified on the kernel command line. In this scenario, every element in ic_nameservers remains set to 0x00000000, which is indistinguishable from ANY and causes pnp_seq_show() to write the following (bogus) information to /proc/net/pnp: #MANUAL nameserver 0.0.0.0 nameserver 0.0.0.0 nameserver 0.0.0.0 This is potentially problematic for systems that blindly link /etc/resolv.conf to /proc/net/pnp. Ensure that ic_nameservers is also initialised when neither "ip=" nor "nfsaddrs=" are specified by calling ic_nameservers_predef() in ip_auto_config(), but only when ip_auto_config_setup() was not called earlier. This causes the following to be written to /proc/net/pnp, and is consistent with what gets written when ipconfig is configured manually but no name servers are specified on the kernel command line: #MANUAL Signed-off-by: Chris Novakovic <chris@chrisn.me.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-24ipconfig: BOOTP: Request CONF_NAMESERVERS_MAX name serversChris Novakovic1-2/+4
When ipconfig is autoconfigured via BOOTP, the request packet initialised by ic_bootp_init_ext() always allocates 8 bytes for the name server option, limiting the BOOTP server to responding with at most 2 name servers even though ipconfig in fact supports an arbitrary number of name servers (as defined by CONF_NAMESERVERS_MAX, which is currently 3). Only request name servers in the request packet if CONF_NAMESERVERS_MAX is positive (to comply with [1, §3.8]), and allocate enough space in the packet for CONF_NAMESERVERS_MAX name servers to indicate the maximum number we can accept in response. [1] RFC 2132, "DHCP Options and BOOTP Vendor Extensions": https://tools.ietf.org/rfc/rfc2132.txt Signed-off-by: Chris Novakovic <chris@chrisn.me.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-24ipconfig: BOOTP: Don't request IEN-116 name serversChris Novakovic1-1/+1
When ipconfig is autoconfigured via BOOTP, the request packet initialised by ic_bootp_init_ext() allocates 8 bytes for tag 5 ("Name Server" [1, §3.7]), but tag 5 in the response isn't processed by ic_do_bootp_ext(). Instead, allocate the 8 bytes to tag 6 ("Domain Name Server" [1, §3.8]), which is processed by ic_do_bootp_ext(), and appears to have been the intended tag to request. This won't cause any breakage for existing users, as tag 5 responses provided by BOOTP servers weren't being processed anyway. [1] RFC 2132, "DHCP Options and BOOTP Vendor Extensions": https://tools.ietf.org/rfc/rfc2132.txt Signed-off-by: Chris Novakovic <chris@chrisn.me.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-24ipconfig: Tidy up reporting of name serversChris Novakovic1-8/+11
Commit 5e953778a2aab04929a5e7b69f53dc26e39b079e ("ipconfig: add nameserver IPs to kernel-parameter ip=") adds the IP addresses of discovered name servers to the summary printed by ipconfig when configuration is complete. It appears the intention in ip_auto_config() was to print the name servers on a new line (especially given the spacing and lack of comma before "nameserver0="), but they're actually printed on the same line as the NFS root filesystem configuration summary: [ 0.686186] IP-Config: Complete: [ 0.686226] device=eth0, hwaddr=xx:xx:xx:xx:xx:xx, ipaddr=10.0.0.2, mask=255.255.255.0, gw=10.0.0.1 [ 0.686328] host=test, domain=example.com, nis-domain=(none) [ 0.686386] bootserver=10.0.0.1, rootserver=10.0.0.1, rootpath= nameserver0=10.0.0.1 This makes it harder to read and parse ipconfig's output. Instead, print the name servers on a separate line: [ 0.791250] IP-Config: Complete: [ 0.791289] device=eth0, hwaddr=xx:xx:xx:xx:xx:xx, ipaddr=10.0.0.2, mask=255.255.255.0, gw=10.0.0.1 [ 0.791407] host=test, domain=example.com, nis-domain=(none) [ 0.791475] bootserver=10.0.0.1, rootserver=10.0.0.1, rootpath= [ 0.791476] nameserver0=10.0.0.1 Signed-off-by: Chris Novakovic <chris@chrisn.me.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-24tcp: md5: only call tp->af_specific->md5_lookup() for md5 socketsEric Dumazet1-12/+14
RETPOLINE made calls to tp->af_specific->md5_lookup() quite expensive, given they have no result. We can omit the calls for sockets that have no md5 keys. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-24Revert "net: init sk_cookie for inet socket"Yafang Shao1-7/+1
This reverts commit <c6849a3ac17e> ("net: init sk_cookie for inet socket") Per discussion with Eric, when update sock_net(sk)->cookie_gen, the whole cache cache line will be invalidated, as this cache line is shared with all cpus, that may cause great performace hit. Bellow is the data form Eric. "Performance is reduced from ~5 Mpps to ~3.8 Mpps with 16 RX queues on my host" when running synflood test. Have to revert it to prevent from cache line false sharing. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-24net: fib_rules: fix l3mdev netlink attr processingRoopa Prabhu1-0/+2
Fixes: b16fb418b1bf ("net: fib_rules: add extack support") Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23net/ipv6: Fix missing rcu dereferences on fromDavid Ahern1-5/+10
kbuild test robot reported 2 uses of rt->from not properly accessed using rcu_dereference: 1. add rcu_dereference_protected to rt6_remove_exception_rt and make sure it is always called with rcu lock held. 2. change rt6_do_redirect to take a reference on 'from' when accessed the first time so it can be used the sceond time outside of the lock Fixes: a68886a69180 ("net/ipv6: Make from in rt6_info rcu protected") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23net/ipv6: add rcu locking to ip6_negative_adviceDavid Ahern1-0/+2
syzbot reported a suspicious rcu_dereference_check: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 lockdep_rcu_suspicious+0x14a/0x153 kernel/locking/lockdep.c:4592 rt6_check_expired+0x38b/0x3e0 net/ipv6/route.c:410 ip6_negative_advice+0x67/0xc0 net/ipv6/route.c:2204 dst_negative_advice include/net/sock.h:1786 [inline] sock_setsockopt+0x138f/0x1fe0 net/core/sock.c:1051 __sys_setsockopt+0x2df/0x390 net/socket.c:1899 SYSC_setsockopt net/socket.c:1914 [inline] SyS_setsockopt+0x34/0x50 net/socket.c:1911 Add rcu locking around call to rt6_check_expired in ip6_negative_advice. Fixes: a68886a69180 ("net/ipv6: Make from in rt6_info rcu protected") Reported-by: syzbot+2422c9e35796659d2273@syzkaller.appspotmail.com Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23net: init sk_cookie for inet socketYafang Shao1-1/+7
With sk_cookie we can identify a socket, that is very helpful for traceing and statistic, i.e. tcp tracepiont and ebpf. So we'd better init it by default for inet socket. When using it, we just need call atomic64_read(&sk->sk_cookie). Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23net: fib_rules: add extack supportRoopa Prabhu6-21/+61
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23fib_rules: move common handling of newrule delrule msgs into fib_nl2ruleRoopa Prabhu1-244/+192
This reduces code duplication in the fib rule add and del paths. Get rid of validate_rulemsg. This became obvious when adding duplicate extack support in fib newrule/delrule error paths. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23net: introduce a new tracepoint for tcp_rcv_space_adjustYafang Shao1-0/+2
tcp_rcv_space_adjust is called every time data is copied to user space, introducing a tcp tracepoint for which could show us when the packet is copied to user. When a tcp packet arrives, tcp_rcv_established() will be called and with the existed tracepoint tcp_probe we could get the time when this packet arrives. Then this packet will be copied to user, and tcp_rcv_space_adjust will be called and with this new introduced tracepoint we could get the time when this packet is copied to user. With these two tracepoints, we could figure out whether the user program processes this packet immediately or there's latency. Hence in the printk message, sk_cookie is printed as a key to relate tcp_rcv_space_adjust with tcp_probe. Maybe we could export sockfd in this new tracepoint as well, then we could relate this new tracepoint with epoll/read/recv* tracepoints, and finally that could show us the whole lifespan of this packet. But we could also implement that with pid as these functions are executed in process context. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller44-251/+457
Conflicts were simple overlapping changes in microchip driver. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-21net/ipv6: Remove unncessary check on f6i in fib6_checkDavid Ahern1-2/+1
Dan reported an imbalance in fib6_check on use of f6i and checking whether it is null. Since fib6_check is only called if f6i is non-null, remove the unnecessary check. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-21net/ipv6: Make from in rt6_info rcu protectedDavid Ahern3-31/+82
When a dst entry is created from a fib entry, the 'from' in rt6_info is set to the fib entry. The 'from' reference is used most notably for cookie checking - making sure stale dst entries are updated if the fib entry is changed. When a fib entry is deleted, the pcpu routes on it are walked releasing the fib6_info reference. This is needed for the fib6_info cleanup to happen and to make sure all device references are released in a timely manner. There is a race window when a FIB entry is deleted and the 'from' on the pcpu route is dropped and the pcpu route hits a cookie check. Handle this race using rcu on from. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-21net/ipv6: Move release of fib6_info from pcpu routes to helperDavid Ahern1-18/+23
Code move only; no functional change intended. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-21net/ipv6: Move rcu locking to callers of fib6_get_cookie_safeDavid Ahern1-3/+10
A later patch protects 'from' in rt6_info and this simplifies the locking needed by it. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-21net/ipv6: Move rcu_read_lock to callers of ip6_rt_cache_allocDavid Ahern1-7/+6
A later patch protects 'from' in rt6_info and this simplifies the locking needed by it. With the move, the fib6_info_hold for the uncached_rt is no longer needed since the rcu_lock is still held. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-21net/ipv6: Rename rt6_get_cookie_safeDavid Ahern1-2/+2
rt6_get_cookie_safe takes a fib6_info and checks the sernum of the node. Update the name to reflect its purpose. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-21net/ipv6: Clean up rt expires helpersDavid Ahern1-0/+9
rt6_clean_expires and rt6_set_expires are no longer used. Removed them. rt6_update_expires has 1 caller in route.c, so move it from the header. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller3-12/+42
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-04-21 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Initial work on BPF Type Format (BTF) is added, which is a meta data format which describes the data types of BPF programs / maps. BTF has its roots from CTF (Compact C-Type format) with a number of changes to it. First use case is to provide a generic pretty print capability for BPF maps inspection, later work will also add BTF to bpftool. pahole support to convert dwarf to BTF will be upstreamed as well (https://github.com/iamkafai/pahole/tree/btf), from Martin. 2) Add a new xdp_bpf_adjust_tail() BPF helper for XDP that allows for changing the data_end pointer. Only shrinking is currently supported which helps for crafting ICMP control messages. Minor changes in drivers have been added where needed so they recalc the packet's length also when data_end was adjusted, from Nikita. 3) Improve bpftool to make it easier to feed hex bytes via cmdline for map operations, from Quentin. 4) Add support for various missing BPF prog types and attach types that have been added to kernel recently but neither to bpftool nor libbpf yet. Doc and bash completion updates have been added as well for bpftool, from Andrey. 5) Proper fix for avoiding to leak info stored in frame data on page reuse for the two bpf_xdp_adjust_{head,meta} helpers by disallowing to move the pointers into struct xdp_frame area, from Jesper. 6) Follow-up compile fix from BTF in order to include stdbool.h in libbpf, from Björn. 7) Few fixes in BPF sample code, that is, a typo on the netdevice in a comment and fixup proper dump of XDP action code in the tracepoint exception, from Wang and Jesper. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds28-138/+218
Pull networking fixes from David Miller: 1) Unbalanced refcounting in TIPC, from Jon Maloy. 2) Only allow TCP_MD5SIG to be set on sockets in close or listen state. Once the connection is established it makes no sense to change this. From Eric Dumazet. 3) Missing attribute validation in neigh_dump_table(), also from Eric Dumazet. 4) Fix address comparisons in SCTP, from Xin Long. 5) Neigh proxy table clearing can deadlock, from Wolfgang Bumiller. 6) Fix tunnel refcounting in l2tp, from Guillaume Nault. 7) Fix double list insert in team driver, from Paolo Abeni. 8) af_vsock.ko module was accidently made unremovable, from Stefan Hajnoczi. 9) Fix reference to freed llc_sap object in llc stack, from Cong Wang. 10) Don't assume netdevice struct is DMA'able memory in virtio_net driver, from Michael S. Tsirkin. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (62 commits) net/smc: fix shutdown in state SMC_LISTEN bnxt_en: Fix memory fault in bnxt_ethtool_init() virtio_net: sparse annotation fix virtio_net: fix adding vids on big-endian virtio_net: split out ctrl buffer net: hns: Avoid action name truncation docs: ip-sysctl.txt: fix name of some ipv6 variables vmxnet3: fix incorrect dereference when rxvlan is disabled llc: hold llc_sap before release_sock() MAINTAINERS: Direct networking documentation changes to netdev atm: iphase: fix spelling mistake: "Tansmit" -> "Transmit" net: qmi_wwan: add Wistron Neweb D19Q1 net: caif: fix spelling mistake "UKNOWN" -> "UNKNOWN" net: stmmac: Disable ACS Feature for GMAC >= 4 net: mvpp2: Fix DMA address mask size net: change the comment of dev_mc_init net: qualcomm: rmnet: Fix warning seen with fill_info tun: fix vlan packet truncation tipc: fix infinite loop when dumping link monitor summary tipc: fix use-after-free in tipc_nametbl_stop ...
2018-04-20net/ipv6: Fix ip6_convert_metrics() bugEric Dumazet1-11/+9
If ip6_convert_metrics() fails to allocate memory, it should not overwrite rt->fib6_metrics or we risk a crash later as syzbot found. BUG: KASAN: null-ptr-deref in atomic_read include/asm-generic/atomic-instrumented.h:21 [inline] BUG: KASAN: null-ptr-deref in refcount_sub_and_test+0x92/0x330 lib/refcount.c:179 Read of size 4 at addr 0000000000000044 by task syzkaller832429/4487 CPU: 1 PID: 4487 Comm: syzkaller832429 Not tainted 4.16.0+ #6 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 kasan_report_error mm/kasan/report.c:352 [inline] kasan_report.cold.7+0x6d/0x2fe mm/kasan/report.c:412 check_memory_region_inline mm/kasan/kasan.c:260 [inline] check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267 kasan_check_read+0x11/0x20 mm/kasan/kasan.c:272 atomic_read include/asm-generic/atomic-instrumented.h:21 [inline] refcount_sub_and_test+0x92/0x330 lib/refcount.c:179 refcount_dec_and_test+0x1a/0x20 lib/refcount.c:212 fib6_info_destroy+0x2d0/0x3c0 net/ipv6/ip6_fib.c:206 fib6_info_release include/net/ip6_fib.h:304 [inline] ip6_route_info_create+0x677/0x3240 net/ipv6/route.c:3020 ip6_route_add+0x23/0xb0 net/ipv6/route.c:3030 inet6_rtm_newroute+0x142/0x160 net/ipv6/route.c:4406 rtnetlink_rcv_msg+0x466/0xc10 net/core/rtnetlink.c:4648 netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2448 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4666 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] netlink_unicast+0x58b/0x740 net/netlink/af_netlink.c:1336 netlink_sendmsg+0x9f0/0xfa0 net/netlink/af_netlink.c:1901 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:639 ___sys_sendmsg+0x805/0x940 net/socket.c:2117 __sys_sendmsg+0x115/0x270 net/socket.c:2155 SYSC_sendmsg net/socket.c:2164 [inline] SyS_sendmsg+0x29/0x30 net/socket.c:2162 do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 Fixes: d4ead6b34b67 ("net/ipv6: move metrics from dst to rt6_info") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Ahern <dsa@cumulusnetworks.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-20tipc: confgiure and apply UDP bearer MTU on running linksGhantaKrishnamurthy MohanKrishna3-5/+25
Currently, we have option to configure MTU of UDP media. The configured MTU takes effect on the links going up after that moment. I.e, a user has to reset bearer to have new value applied across its links. This is confusing and disturbing on a running cluster. We now introduce the functionality to change the default UDP bearer MTU in struct tipc_bearer. Additionally, the links are updated dynamically, without any need for a reset, when bearer value is changed. We leverage the existing per-link functionality and the design being symetrical to the confguration of link tolerance. Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-20tipc: implement configuration of UDP media MTUGhantaKrishnamurthy MohanKrishna3-0/+30
In previous commit, we changed the default emulated MTU for UDP bearers to 14k. This commit adds the functionality to set/change the default value by configuring new MTU for UDP media. UDP bearer(s) have to be disabled and enabled back for the new MTU to take effect. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-20tipc: set default MTU for UDP mediaGhantaKrishnamurthy MohanKrishna1-2/+2
Currently, all bearers are configured with MTU value same as the underlying L2 device. However, in case of bearers with media type UDP, higher throughput is possible with a fixed and higher emulated MTU value than adapting to the underlying L2 MTU. In this commit, we introduce a parameter mtu in struct tipc_media and a default value is set for UDP. A default value of 14k was determined by experimentation and found to have a higher throughput than 16k. MTU for UDP bearers are assigned the above set value of media MTU. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19net/smc: fix shutdown in state SMC_LISTENUrsula Braun1-6/+4
Calling shutdown with SHUT_RD and SHUT_RDWR for a listening SMC socket crashes, because commit 127f49705823 ("net/smc: release clcsock from tcp_listen_worker") releases the internal clcsock in smc_close_active() and sets smc->clcsock to NULL. For SHUT_RD the smc_close_active() call is removed. For SHUT_RDWR the kernel_sock_shutdown() call is omitted, since the clcsock is already released. Fixes: 127f49705823 ("net/smc: release clcsock from tcp_listen_worker") Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Reported-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19net/ipv6: Fix gfp_flags arg to addrconf_prefix_routeDavid Ahern1-2/+2
Eric noticed that __ipv6_ifa_notify is called under rcu_read_lock, so the gfp argument to addrconf_prefix_route can not be GFP_KERNEL. While scrubbing other calls I noticed addrconf_addr_gen has one place with GFP_ATOMIC that can be GFP_KERNEL. Fixes: acb54e3cba404 ("net/ipv6: Add gfp_flags to route add functions") Reported-by: syzbot+2add39b05179b31f912f@syzkaller.appspotmail.com Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19net/ipv6: Remove fib6_idevDavid Ahern2-21/+47
fib6_idev can be obtained from __in6_dev_get on the nexthop device rather than caching it in the fib6_info. Remove it. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19net/ipv6: Remove unnecessary checks on fib6_idevDavid Ahern1-22/+2
Prior to 4832c30d5458 ("net: ipv6: put host and anycast routes on device with address") host routes and anycast routes were installed with the device set to loopback (or VRF device once that feature was added). In the older code dst.dev was set to loopback (needed for packet tx) and rt6i_idev was used to denote the actual interface. Commit 4832c30d5458 changed the code to have dst.dev pointing to the real device with the switch to lo or vrf device done on dst clones. As a consequence of this change a couple of device checks during route lookups are no longer needed. Remove them. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19net/ipv6: Remove aca_idevDavid Ahern2-5/+2
aca_idev has only 1 user - inet6_fill_ifacaddr - and it only wants the device index which can be extracted from the fib6_info nexthop. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19net/ipv6: Rename addrconf_dst_allocDavid Ahern3-43/+43
addrconf_dst_alloc now returns a fib6_info. Update the name and its users to reflect the change. Rename only; no functional change intended. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19net/ipv6: Rename fib6_info struct elementsDavid Ahern5-238/+238
Change the prefix for fib6_info struct elements from rt6i_ to fib6_. rt6i_pcpu and rt6i_exception_bucket are left as is given that they point to rt6_info entries. Rename only; not functional change intended. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19llc: hold llc_sap before release_sock()Cong Wang1-0/+7
syzbot reported we still access llc->sap in llc_backlog_rcv() after it is freed in llc_sap_remove_socket(): Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412 __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:430 llc_conn_ac_send_sabme_cmd_p_set_x+0x3a8/0x460 net/llc/llc_c_ac.c:785 llc_exec_conn_trans_actions net/llc/llc_conn.c:475 [inline] llc_conn_service net/llc/llc_conn.c:400 [inline] llc_conn_state_process+0x4e1/0x13a0 net/llc/llc_conn.c:75 llc_backlog_rcv+0x195/0x1e0 net/llc/llc_conn.c:891 sk_backlog_rcv include/net/sock.h:909 [inline] __release_sock+0x12f/0x3a0 net/core/sock.c:2335 release_sock+0xa4/0x2b0 net/core/sock.c:2850 llc_ui_release+0xc8/0x220 net/llc/af_llc.c:204 llc->sap is refcount'ed and llc_sap_remove_socket() is paired with llc_sap_add_socket(). This can be amended by holding its refcount before llc_sap_remove_socket() and releasing it after release_sock(). Reported-by: <syzbot+6e181fc95081c2cf9051@syzkaller.appspotmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friendsEric Dumazet1-0/+14
After working on IP defragmentation lately, I found that some large packets defeat CHECKSUM_COMPLETE optimization because of NIC adding zero paddings on the last (small) fragment. While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed to CHECKSUM_NONE, forcing a full csum validation, even if all prior fragments had CHECKSUM_COMPLETE set. We can instead compute the checksum of the part we are trimming, usually smaller than the part we keep. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19net: caif: fix spelling mistake "UKNOWN" -> "UNKNOWN"Colin Ian King1-1/+1
Trivial fix to spelling mistake Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19tcp: export packets delivery infoYuchung Cheng3-2/+13
Export data delivered and delivered with CE marks to 1) SNMP TCPDelivered and TCPDeliveredCE 2) getsockopt(TCP_INFO) 3) Timestamping API SOF_TIMESTAMPING_OPT_STATS Note that for SCM_TSTAMP_ACK, the delivery info in SOF_TIMESTAMPING_OPT_STATS is reported before the info was fully updated on the ACK. These stats help application monitor TCP delivery and ECN status on per host, per connection, even per message level. Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19tcp: track total bytes delivered with ECN CE marksYuchung Cheng2-0/+3
Introduce a new delivered_ce stat in tcp socket to estimate number of packets being marked with CE bits. The estimation is done via ACKs with ECE bit. Depending on the actual receiver behavior, the estimation could have biases. Since the TCP sender can't really see the CE bit in the data path, so the sender is technically counting packets marked delivered with the "ECE / ECN-Echo" flag set. With RFC3168 ECN, because the ECE bit is sticky, this count can drastically overestimate the nummber of CE-marked data packets With DCTCP-style ECN this should be reasonably precise unless there is loss in the ACK path, in which case it's not precise. With AccECN proposal this can be made still more precise, even in the case some degree of ACK loss. However this is sender's best estimate of CE information. Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19tcp: new helper to calculate newly deliveredYuchung Cheng1-2/+15
Add new helper tcp_newly_delivered() to prepare the ECN accounting change. Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19tcp: better delivery accounting for SYN-ACK and SYN-dataYuchung Cheng1-3/+7
the tcp_sock:delivered has inconsistent accounting for SYN and FIN. 1. it counts pure FIN 2. it counts pure SYN 3. it counts SYN-data twice 4. it does not count SYN-ACK For congestion control perspective it does not matter much as C.C. only cares about the difference not the aboslute value. But the next patch would export this field to user-space so it's better to report the absolute value w/o these caveats. This patch counts SYN, SYN-ACK, or SYN-data delivery once always in the "delivered" field. Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19net: change the comment of dev_mc_initsunlianwen1-1/+1
The comment of dev_mc_init() is wrong. which use dev_mc_flush instead of dev_mc_init. Signed-off-by: Lianwen Sun <sunlw.fnst@cn.fujitsu.com Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19bpf: reserve xdp_frame size in xdp headroomJesper Dangaard Brouer1-9/+3
Commit 6dfb970d3dbd ("xdp: avoid leaking info stored in frame data on page reuse") tried to allow user/bpf_prog to (re)use area used by xdp_frame (stored in frame headroom), by memset clearing area when bpf_xdp_adjust_head give bpf_prog access to headroom area. The mentioned commit had two bugs. (1) Didn't take bpf_xdp_adjust_meta into account. (2) a combination of bpf_xdp_adjust_head calls, where xdp->data is moved into xdp_frame section, can cause clearing xdp_frame area again for area previously granted to bpf_prog. After discussions with Daniel, we choose to implement a simpler solution to the problem, which is to reserve the headroom used by xdp_frame info. This also avoids the situation where bpf_prog is allowed to adjust/add headers, and then XDP_REDIRECT later drops the packet due to lack of headroom for the xdp_frame. This would likely confuse the end-user. Fixes: 6dfb970d3dbd ("xdp: avoid leaking info stored in frame data on page reuse") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-19ipv6: frags: fix a lockdep false positiveEric Dumazet1-11/+12
lockdep does not know that the locks used by IPv4 defrag and IPv6 reassembly units are of different classes. It complains because of following chains : 1) sch_direct_xmit() (lock txq->_xmit_lock) dev_hard_start_xmit() xmit_one() dev_queue_xmit_nit() packet_rcv_fanout() ip_check_defrag() ip_defrag() spin_lock() (lock frag queue spinlock) 2) ip6_input_finish() ipv6_frag_rcv() (lock frag queue spinlock) ip6_frag_queue() icmpv6_param_prob() (lock txq->_xmit_lock at some point) We could add lockdep annotations, but we also can make sure IPv6 calls icmpv6_param_prob() only after the release of the frag queue spinlock, since this naturally makes frag queue spinlock a leaf in lock hierarchy. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19hv_netvsc: propogate Hyper-V friendly name into interface aliasStephen Hemminger1-0/+1
This patch implement the 'Device Naming' feature of the Hyper-V network device API. In Hyper-V on the host through the GUI or PowerShell it is possible to enable the device naming feature which causes the host to make available to the guest the name of the device. This shows up in the RNDIS protocol as the friendly name. The name has no particular meaning and is limited to 256 characters. The value can only be set via PowerShell on the host, but could be scripted for mass deployments. The default value is the string 'Network Adapter' and since that is the same for all devices and useless, the driver ignores it. In Windows, the value goes into a registry key for use in SNMP ifAlias. For Linux, this patch puts the value in the network device alias property; where it is visible in ip tools and SNMP. The host provided ifAlias is just a suggestion, and can be overridden by later ip commands. Also requires exporting dev_set_alias in netdev core. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19bpf: making bpf_prog_test run aware of possible data_end ptr changeNikita V. Shirokov1-1/+2
after introduction of bpf_xdp_adjust_tail helper packet length could be changed not only if xdp->data pointer has been changed but xdp->data_end as well. making bpf_prog_test_run aware of this possibility Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-19bpf: make generic xdp compatible w/ bpf_xdp_adjust_tailNikita V. Shirokov1-1/+9
w/ bpf_xdp_adjust_tail helper xdp's data_end pointer could be changed as well (only "decrease" of pointer's location is going to be supported). changing of this pointer will change packet's size. for generic XDP we need to reflect this packet's length change by adjusting skb's tail pointer Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-19bpf: adding bpf_xdp_adjust_tail helperNikita V. Shirokov1-1/+28
Adding new bpf helper which would allow us to manipulate xdp's data_end pointer, and allow us to reduce packet's size indended use case: to generate ICMP messages from XDP context, where such message would contain truncated original packet. Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-18tipc: fix infinite loop when dumping link monitor summaryTung Nguyen2-8/+5
When configuring the number of used bearers to MAX_BEARER and issuing command "tipc link monitor summary", the command enters infinite loop in user space. This issue happens because function tipc_nl_node_dump_monitor() returns the wrong 'prev_bearer' value when all potential monitors have been scanned. The correct behavior is to always try to scan all monitors until either the netlink message is full, in which case we return the bearer identity of the affected monitor, or we continue through the whole bearer array until we can return MAX_BEARERS. This solution also caters for the case where there may be gaps in the bearer array. Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>