summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2016-04-25ipv6: Revert optional address flusing on ifdown.David S. Miller1-150/+12
This reverts the following three commits: 70af921db6f8835f4b11c65731116560adb00c14 799977d9aafbf0ca0b9c39b04cbfb16db71302c9 f1705ec197e705b79ea40fe7a2cc5acfa1d3bfac The feature was ill conceived, has terrible semantics, and has added nothing but regressions to the already fragile ipv6 stack. Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional") Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-25wireless: use nla_put_u64_64bit()Nicolas Dichtel1-36/+55
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-25netfilter/ipvs: use nla_put_u64_64bit()Nicolas Dichtel1-12/+24
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-25ieee802154: use nla_put_u64_64bit()Nicolas Dichtel2-7/+13
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-25l2tp: use nla_put_u64_64bit()Nicolas Dichtel1-32/+48
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-25bridge: use nla_put_u64_64bit()Nicolas Dichtel1-26/+36
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-25ovs: use nla_put_u64_64bit()Nicolas Dichtel1-1/+2
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-25ipv6: use nla_put_u64_64bit()Nicolas Dichtel2-7/+11
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-25sched: use nla_put_u64_64bit()Nicolas Dichtel3-5/+10
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-25rtnl: use nla_put_u64_64bit()Nicolas Dichtel1-18/+18
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-25soreuseport: Resolve merge conflict for v4/v6 ordering fixCraig Gallek1-1/+5
d894ba18d4e4 ("soreuseport: fix ordering for mixed v4/v6 sockets") was merged as a bug fix to the net tree. Two conflicting changes were committed to net-next before the above fix was merged back to net-next: ca065d0cf80f ("udp: no longer use SLAB_DESTROY_BY_RCU") 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood") These changes switched the datastructure used for TCP and UDP sockets from hlist_nulls to hlist. This patch applies the necessary parts of the net tree fix to net-next which were not automatic as part of the merge. Fixes: 1602f49b58ab ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net") Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-25ipv4/fib: don't warn when primary address is missing if in_dev is deadPaolo Abeni1-1/+5
After commit fbd40ea0180a ("ipv4: Don't do expensive useless work during inetdev destroy.") when deleting an interface, fib_del_ifaddr() can be executed without any primary address present on the dead interface. The above is safe, but triggers some "bug: prim == NULL" warnings. This commit avoids warning if the in_dev is dead Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-24tcp-tso: do not split TSO packets at retransmit timeEric Dumazet3-38/+32
Linux TCP stack painfully segments all TSO/GSO packets before retransmits. This was fine back in the days when TSO/GSO were emerging, with their bugs, but we believe the dark age is over. Keeping big packets in write queues, but also in stack traversal has a lot of benefits. - Less memory overhead, because write queues have less skbs - Less cpu overhead at ACK processing. - Better SACK processing, as lot of studies mentioned how awful linux was at this ;) - Less cpu overhead to send the rtx packets (IP stack traversal, netfilter traversal, drivers...) - Better latencies in presence of losses. - Smaller spikes in fq like packet schedulers, as retransmits are not constrained by TCP Small Queues. 1 % packet losses are common today, and at 100Gbit speeds, this translates to ~80,000 losses per second. Losses are often correlated, and we see many retransmit events leading to 1-MSS train of packets, at the time hosts are already under stress. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-24tipc: fix stale links after re-enabling bearerParthasarathy Bhuvaragan1-2/+1
Commit 42b18f605fea ("tipc: refactor function tipc_link_timeout()"), introduced a bug which prevents sending of probe messages during link synchronization phase. This leads to hanging links, if the bearer is disabled/enabled after links are up. In this commit, we send the probe messages correctly. Fixes: 42b18f605fea ("tipc: refactor function tipc_link_timeout()") Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-24bridge: mdb: Marking port-group as offloadedElad Raz3-32/+71
There is a race-condition when updating the mdb offload flag without using the mulicast_lock. This reverts commit 9e8430f8d60d98 ("bridge: mdb: Passing the port-group pointer to br_mdb module"). This patch marks offloaded MDB entry as "offload" by changing the port- group flags and marks it as MDB_PG_FLAGS_OFFLOAD. When switchdev PORT_MDB succeeded and adds a multicast group, a completion callback is been invoked "br_mdb_complete". The completion function locks the multicast_lock and finds the right net_bridge_port_group and marks it as offloaded. Fixes: 9e8430f8d60d98 ("bridge: mdb: Passing the port-group pointer to br_mdb module") Reported-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-24bridge: mdb: Common function for mdb entry translationElad Raz1-18/+15
There is duplicate code that translates br_mdb_entry to br_ip let's wrap it in a common function. Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-24switchdev: Adding complete operation to deferred switchdev opsElad Raz1-0/+6
When using switchdev deferred operation (SWITCHDEV_F_DEFER), the operation is executed in different context and the application doesn't have any way to get the operation real status. Adding a completion callback fixes that. This patch adds fields to switchdev_attr and switchdev_obj "complete_priv" field which is used by the "complete" callback. Application can set a complete function which will be called once the operation executed. Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-24tcp: Merge txstamp_ack in tcp_skb_collapse_tstampMartin KaFai Lau1-0/+2
When collapsing skbs, txstamp_ack also needs to be merged. Retrans Collapse Test: ~~~~~~ 0.200 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0 0.200 write(4, ..., 730) = 730 +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0 0.200 write(4, ..., 730) = 730 +0 setsockopt(4, SOL_SOCKET, 37, [2176], 4) = 0 0.200 write(4, ..., 11680) = 11680 0.200 > P. 1:731(730) ack 1 0.200 > P. 731:1461(730) ack 1 0.200 > . 1461:8761(7300) ack 1 0.200 > P. 8761:13141(4380) ack 1 0.300 < . 1:1(0) ack 1 win 257 <sack 1461:2921,nop,nop> 0.300 < . 1:1(0) ack 1 win 257 <sack 1461:4381,nop,nop> 0.300 < . 1:1(0) ack 1 win 257 <sack 1461:5841,nop,nop> 0.300 > P. 1:1461(1460) ack 1 0.400 < . 1:1(0) ack 13141 win 257 BPF Output Before: ~~~~~ <No output due to missing SCM_TSTAMP_ACK timestamp> BPF Output After: ~~~~~ <...>-2027 [007] d.s. 79.765921: : ee_data:1459 Sacks Collapse Test: ~~~~~ 0.200 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0 0.200 write(4, ..., 1460) = 1460 +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0 0.200 write(4, ..., 13140) = 13140 +0 setsockopt(4, SOL_SOCKET, 37, [2176], 4) = 0 0.200 > P. 1:1461(1460) ack 1 0.200 > . 1461:8761(7300) ack 1 0.200 > P. 8761:14601(5840) ack 1 0.300 < . 1:1(0) ack 1 win 257 <sack 1461:14601,nop,nop> 0.300 > P. 1:1461(1460) ack 1 0.400 < . 1:1(0) ack 14601 win 257 BPF Output Before: ~~~~~ <No output due to missing SCM_TSTAMP_ACK timestamp> BPF Output After: ~~~~~ <...>-2049 [007] d.s. 89.185538: : ee_data:14599 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Tested-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-24tcp: Carry txstamp_ack in tcp_fragment_tstampMartin KaFai Lau1-0/+2
When a tcp skb is sliced into two smaller skbs (e.g. in tcp_fragment() and tso_fragment()), it does not carry the txstamp_ack bit to the newly created skb if it is needed. The end result is a timestamping event (SCM_TSTAMP_ACK) will be missing from the sk->sk_error_queue. This patch carries this bit to the new skb2 in tcp_fragment_tstamp(). BPF Output Before: ~~~~~~ <No output due to missing SCM_TSTAMP_ACK timestamp> BPF Output After: ~~~~~~ <...>-2050 [000] d.s. 100.928763: : ee_data:14599 Packetdrill Script: ~~~~~~ +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10` +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1` +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7> 0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7> 0.200 < . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0 +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0 0.200 write(4, ..., 14600) = 14600 +0 setsockopt(4, SOL_SOCKET, 37, [2176], 4) = 0 0.200 > . 1:7301(7300) ack 1 0.200 > P. 7301:14601(7300) ack 1 0.300 < . 1:1(0) ack 14601 win 257 0.300 close(4) = 0 0.300 > F. 14601:14601(0) ack 1 0.400 < F. 1:1(0) ack 16062 win 257 0.400 > . 14602:14602(0) ack 2 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Tested-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller11-814/+578
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree, mostly from Florian Westphal to sort out the lack of sufficient validation in x_tables and connlabel preparation patches to add nf_tables support. They are: 1) Ensure we don't go over the ruleset blob boundaries in mark_source_chains(). 2) Validate that target jumps land on an existing xt_entry. This extra sanitization comes with a performance penalty when loading the ruleset. 3) Introduce xt_check_entry_offsets() and use it from {arp,ip,ip6}tables. 4) Get rid of the smallish check_entry() functions in {arp,ip,ip6}tables. 5) Make sure the minimal possible target size in x_tables. 6) Similar to #3, add xt_compat_check_entry_offsets() for compat code. 7) Check that standard target size is valid. 8) More sanitization to ensure that the target_offset field is correct. 9) Add xt_check_entry_match() to validate that matches are well-formed. 10-12) Three patch to reduce the number of parameters in translate_compat_table() for {arp,ip,ip6}tables by using a container structure. 13) No need to return value from xt_compat_match_from_user(), so make it void. 14) Consolidate translate_table() so it can be used by compat code too. 15) Remove obsolete check for compat code, so we keep consistent with what was already removed in the native layout code (back in 2007). 16) Get rid of target jump validation from mark_source_chains(), obsoleted by #2. 17) Introduce xt_copy_counters_from_user() to consolidate counter copying, and use it from {arp,ip,ip6}tables. 18,22) Get rid of unnecessary explicit inlining in ctnetlink for dump functions. 19) Move nf_connlabel_match() to xt_connlabel. 20) Skip event notification if connlabel did not change. 21) Update of nf_connlabels_get() to make the upcoming nft connlabel support easier. 23) Remove spinlock to read protocol state field in conntrack. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-24xfrm: align nlattr properly when neededNicolas Dichtel1-4/+6
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-24libnl: nla_put_msecs(): align on a 64-bit areaNicolas Dichtel3-12/+16
nla_data() is now aligned on a 64-bit area. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-24libnl: nla_put_be64(): align on a 64-bit areaNicolas Dichtel11-34/+60
nla_data() is now aligned on a 64-bit area. A temporary version (nla_put_be64_32bit()) is added for nla_put_net64(). This function is removed in the next patch. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-24libnl: nla_put_le64(): align on a 64-bit areaNicolas Dichtel1-5/+8
nla_data() is now aligned on a 64-bit area. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller31-136/+306
Conflicts were two cases of simple overlapping changes, nothing serious. In the UDP case, we need to add a hlist_add_tail_rcu() to linux/rculist.h, because we've moved UDP socket handling away from using nulls lists. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds29-133/+298
Pull networking fixes from David Miller: 1) Fix memory leak in iwlwifi, from Matti Gottlieb. 2) Add missing registration of netfilter arp_tables into initial namespace, from Florian Westphal. 3) Fix potential NULL deref in DecNET routing code. 4) Restrict NETLINK_URELEASE to truly bound sockets only, from Dmitry Ivanov. 5) Fix dst ref counting in VRF, from David Ahern. 6) Fix TSO segmenting limits in i40e driver, from Alexander Duyck. 7) Fix heap leak in PACKET_DIAG_MCLIST, from Mathias Krause. 8) Ravalidate IPV6 datagram socket cached routes properly, particularly with UDP, from Martin KaFai Lau. 9) Fix endian bug in RDS dp_ack_seq handling, from Qing Huang. 10) Fix stats typing in bcmgenet driver, from Eric Dumazet. 11) Openvswitch needs to orphan SKBs before ipv6 fragmentation handing, from Joe Stringer. 12) SPI device reference leak in spi_ks8895 PHY driver, from Mark Brown. 13) atl2 doesn't actually support scatter-gather, so don't advertise the feature. From Ben Hucthings. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (72 commits) openvswitch: use flow protocol when recalculating ipv6 checksums Driver: Vmxnet3: set CHECKSUM_UNNECESSARY for IPv6 packets atl2: Disable unimplemented scatter/gather feature net/mlx4_en: Split SW RX dropped counter per RX ring net/mlx4_core: Don't allow to VF change global pause settings net/mlx4_core: Avoid repeated calls to pci enable/disable net/mlx4_core: Implement pci_resume callback net: phy: spi_ks8895: Don't leak references to SPI devices net: ethernet: davinci_emac: Fix platform_data overwrite net: ethernet: davinci_emac: Fix Unbalanced pm_runtime_enable qede: Fix single MTU sized packet from firmware GRO flow qede: Fix setting Skb network header qede: Fix various memory allocation error flows for fastpath tcp: Merge tx_flags and tskey in tcp_shifted_skb tcp: Merge tx_flags and tskey in tcp_collapse_retrans drivers: net: cpsw: fix wrong regs access in cpsw_ndo_open tcp: Fix SOF_TIMESTAMPING_TX_ACK when handling dup acks openvswitch: Orphan skbs before IPv6 defrag Revert "Prevent NUll pointer dereference with two PHYs on cpsw" VSOCK: Only check error on skb_recv_datagram when skb is NULL ...
2016-04-21openvswitch: use flow protocol when recalculating ipv6 checksumsSimon Horman1-2/+2
When using masked actions the ipv6_proto field of an action to set IPv6 fields may be zero rather than the prevailing protocol which will result in skipping checksum recalculation. This patch resolves the problem by relying on the protocol in the flow key rather than that in the set field action. Fixes: 83d2b9ba1abc ("net: openvswitch: Support masked set actions.") Cc: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21net: Add support for IP ID mangling TSO in cases that require encapsulationAlexander Duyck1-0/+11
This patch adds support for NETIF_F_TSO_MANGLEID if a given tunnel supports NETIF_F_TSO. This way if needed a device can then later enable the TSO with IP ID mangling and the tunnels on top of that device can then also make use of the IP ID mangling as well. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21tcp: Merge tx_flags and tskey in tcp_shifted_skbMartin KaFai Lau2-2/+3
After receiving sacks, tcp_shifted_skb() will collapse skbs if possible. tx_flags and tskey also have to be merged. This patch reuses the tcp_skb_collapse_tstamp() to handle them. BPF Output Before: ~~~~~ <no-output-due-to-missing-tstamp-event> BPF Output After: ~~~~~ <...>-2024 [007] d.s. 88.644374: : ee_data:14599 Packetdrill Script: ~~~~~ +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10` +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1` +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7> 0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7> 0.200 < . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0 0.200 write(4, ..., 1460) = 1460 +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0 0.200 write(4, ..., 13140) = 13140 0.200 > P. 1:1461(1460) ack 1 0.200 > . 1461:8761(7300) ack 1 0.200 > P. 8761:14601(5840) ack 1 0.300 < . 1:1(0) ack 1 win 257 <sack 1461:14601,nop,nop> 0.300 > P. 1:1461(1460) ack 1 0.400 < . 1:1(0) ack 14601 win 257 0.400 close(4) = 0 0.400 > F. 14601:14601(0) ack 1 0.500 < F. 1:1(0) ack 14602 win 257 0.500 > . 14602:14602(0) ack 2 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Tested-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21tcp: Merge tx_flags and tskey in tcp_collapse_retransMartin KaFai Lau1-0/+16
If two skbs are merged/collapsed during retransmission, the current logic does not merge the tx_flags and tskey. The end result is the SCM_TSTAMP_ACK timestamp could be missing for a packet. The patch: 1. Merge the tx_flags 2. Overwrite the prev_skb's tskey with the next_skb's tskey BPF Output Before: ~~~~~~ <no-output-due-to-missing-tstamp-event> BPF Output After: ~~~~~~ packetdrill-2092 [001] d.s. 453.998486: : ee_data:1459 Packetdrill Script: ~~~~~~ +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10` +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1` +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7> 0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7> 0.200 < . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0 0.200 write(4, ..., 730) = 730 +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0 0.200 write(4, ..., 730) = 730 +0 setsockopt(4, SOL_SOCKET, 37, [2176], 4) = 0 0.200 write(4, ..., 11680) = 11680 +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0 0.200 > P. 1:731(730) ack 1 0.200 > P. 731:1461(730) ack 1 0.200 > . 1461:8761(7300) ack 1 0.200 > P. 8761:13141(4380) ack 1 0.300 < . 1:1(0) ack 1 win 257 <sack 1461:2921,nop,nop> 0.300 < . 1:1(0) ack 1 win 257 <sack 1461:4381,nop,nop> 0.300 < . 1:1(0) ack 1 win 257 <sack 1461:5841,nop,nop> 0.300 > P. 1:1461(1460) ack 1 0.400 < . 1:1(0) ack 13141 win 257 0.400 close(4) = 0 0.400 > F. 13141:13141(0) ack 1 0.500 < F. 1:1(0) ack 13142 win 257 0.500 > . 13142:13142(0) ack 2 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Tested-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21ip6mr: align RTA_MFC_STATS on 64-bitNicolas Dichtel1-2/+2
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21ipmr: align RTA_MFC_STATS on 64-bitNicolas Dichtel1-2/+2
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21rtnl: use the new API to align IFLA_STATS*Nicolas Dichtel1-17/+5
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21NLA_BINARY misuse bug in HSRPeter Heise1-4/+3
Removed .type field from NLA to do proper length checking. Reported by Daniel Borkmann and Julia Lawall. Signed-off-by: Peter Heise <peter.heise@airbus.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21net: use jiffies_to_msecs to replace EXPIRES_IN_MS in inet/sctp_diagXin Long2-10/+8
EXPIRES_IN_MS macro comes from net/ipv4/inet_diag.c and dates back to before jiffies_to_msecs() has been introduced. Now we can remove it and use jiffies_to_msecs(). Suggested-by: Jakub Sitnicki <jkbs@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jakub Sitnicki <jkbs@redhat.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21tcp: Fix SOF_TIMESTAMPING_TX_ACK when handling dup acksMartin KaFai Lau1-1/+2
Assuming SOF_TIMESTAMPING_TX_ACK is on. When dup acks are received, it could incorrectly think that a skb has already been acked and queue a SCM_TSTAMP_ACK cmsg to the sk->sk_error_queue. In tcp_ack_tstamp(), it checks 'between(shinfo->tskey, prior_snd_una, tcp_sk(sk)->snd_una - 1)'. If prior_snd_una == tcp_sk(sk)->snd_una like the following packetdrill script, between() returns true but the tskey is actually not acked. e.g. try between(3, 2, 1). The fix is to replace between() with one before() and one !before(). By doing this, the -1 offset on the tcp_sk(sk)->snd_una can also be removed. A packetdrill script is used to reproduce the dup ack scenario. Due to the lacking cmsg support in packetdrill (may be I cannot find it), a BPF prog is used to kprobe to sock_queue_err_skb() and print out the value of serr->ee.ee_data. Both the packetdrill and the bcc BPF script is attached at the end of this commit message. BPF Output Before Fix: ~~~~~~ <...>-2056 [001] d.s. 433.927987: : ee_data:1459 #incorrect packetdrill-2056 [001] d.s. 433.929563: : ee_data:1459 #incorrect packetdrill-2056 [001] d.s. 433.930765: : ee_data:1459 #incorrect packetdrill-2056 [001] d.s. 434.028177: : ee_data:1459 packetdrill-2056 [001] d.s. 434.029686: : ee_data:14599 BPF Output After Fix: ~~~~~~ <...>-2049 [000] d.s. 113.517039: : ee_data:1459 <...>-2049 [000] d.s. 113.517253: : ee_data:14599 BCC BPF Script: ~~~~~~ #!/usr/bin/env python from __future__ import print_function from bcc import BPF bpf_text = """ #include <uapi/linux/ptrace.h> #include <net/sock.h> #include <bcc/proto.h> #include <linux/errqueue.h> #ifdef memset #undef memset #endif int trace_err_skb(struct pt_regs *ctx) { struct sk_buff *skb = (struct sk_buff *)ctx->si; struct sock *sk = (struct sock *)ctx->di; struct sock_exterr_skb *serr; u32 ee_data = 0; if (!sk || !skb) return 0; serr = SKB_EXT_ERR(skb); bpf_probe_read(&ee_data, sizeof(ee_data), &serr->ee.ee_data); bpf_trace_printk("ee_data:%u\\n", ee_data); return 0; }; """ b = BPF(text=bpf_text) b.attach_kprobe(event="sock_queue_err_skb", fn_name="trace_err_skb") print("Attached to kprobe") b.trace_print() Packetdrill Script: ~~~~~~ +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10` +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1` +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7> 0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7> 0.200 < . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0 +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0 0.200 write(4, ..., 1460) = 1460 0.200 write(4, ..., 13140) = 13140 0.200 > P. 1:1461(1460) ack 1 0.200 > . 1461:8761(7300) ack 1 0.200 > P. 8761:14601(5840) ack 1 0.300 < . 1:1(0) ack 1 win 257 <sack 1461:2921,nop,nop> 0.300 < . 1:1(0) ack 1 win 257 <sack 1461:4381,nop,nop> 0.300 < . 1:1(0) ack 1 win 257 <sack 1461:5841,nop,nop> 0.300 > P. 1:1461(1460) ack 1 0.400 < . 1:1(0) ack 14601 win 257 0.400 close(4) = 0 0.400 > F. 14601:14601(0) ack 1 0.500 < F. 1:1(0) ack 14602 win 257 0.500 > . 14602:14602(0) ack 2 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Soheil Hassas Yeganeh <soheil.kdev@gmail.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Tested-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21net: dsa: remove tag_protocol from dsa_switchVivien Didelot1-3/+2
Having the tag protocol in dsa_switch_driver for setup time and in dsa_switch_tree for runtime is enough. Remove dsa_switch's one. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21openvswitch: Orphan skbs before IPv6 defragJoe Stringer1-0/+1
This is the IPv6 counterpart to commit 8282f27449bf ("inet: frag: Always orphan skbs inside ip_defrag()"). Prior to commit 029f7f3b8701 ("netfilter: ipv6: nf_defrag: avoid/free clone operations"), ipv6 fragments sent to nf_ct_frag6_gather() would be cloned (implicitly orphaning) prior to queueing for reassembly. As such, when the IPv6 message is eventually reassembled, the skb->sk for all fragments would be NULL. After that commit was introduced, rather than cloning, the original skbs were queued directly without orphaning. The end result is that all frags except for the first and last may have a socket attached. This commit explicitly orphans such skbs during nf_ct_frag6_gather() to prevent BUG_ON(skb->sk) during a later call to ip6_fragment(). kernel BUG at net/ipv6/ip6_output.c:631! [...] Call Trace: <IRQ> [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0 [<ffffffffa042c7c0>] ? do_output.isra.28+0x1b0/0x1b0 [openvswitch] [<ffffffff810bb8a2>] ? __lock_is_held+0x52/0x70 [<ffffffffa042c587>] ovs_fragment+0x1f7/0x280 [openvswitch] [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0 [<ffffffff817be416>] ? _raw_spin_unlock_irqrestore+0x36/0x50 [<ffffffff81697ea0>] ? dst_discard_out+0x20/0x20 [<ffffffff81697e80>] ? dst_ifdown+0x80/0x80 [<ffffffffa042c703>] do_output.isra.28+0xf3/0x1b0 [openvswitch] [<ffffffffa042d279>] do_execute_actions+0x709/0x12c0 [openvswitch] [<ffffffffa04340a4>] ? ovs_flow_stats_update+0x74/0x1e0 [openvswitch] [<ffffffffa04340d1>] ? ovs_flow_stats_update+0xa1/0x1e0 [openvswitch] [<ffffffff817be387>] ? _raw_spin_unlock+0x27/0x40 [<ffffffffa042de75>] ovs_execute_actions+0x45/0x120 [openvswitch] [<ffffffffa0432d65>] ovs_dp_process_packet+0x85/0x150 [openvswitch] [<ffffffff817be387>] ? _raw_spin_unlock+0x27/0x40 [<ffffffffa042def4>] ovs_execute_actions+0xc4/0x120 [openvswitch] [<ffffffffa0432d65>] ovs_dp_process_packet+0x85/0x150 [openvswitch] [<ffffffffa04337f2>] ? key_extract+0x442/0xc10 [openvswitch] [<ffffffffa043b26d>] ovs_vport_receive+0x5d/0xb0 [openvswitch] [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0 [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0 [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0 [<ffffffff817be416>] ? _raw_spin_unlock_irqrestore+0x36/0x50 [<ffffffffa043c11d>] internal_dev_xmit+0x6d/0x150 [openvswitch] [<ffffffffa043c0b5>] ? internal_dev_xmit+0x5/0x150 [openvswitch] [<ffffffff8168fb5f>] dev_hard_start_xmit+0x2df/0x660 [<ffffffff8168f5ea>] ? validate_xmit_skb.isra.105.part.106+0x1a/0x2b0 [<ffffffff81690925>] __dev_queue_xmit+0x8f5/0x950 [<ffffffff81690080>] ? __dev_queue_xmit+0x50/0x950 [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0 [<ffffffff81690990>] dev_queue_xmit+0x10/0x20 [<ffffffff8169a418>] neigh_resolve_output+0x178/0x220 [<ffffffff81752759>] ? ip6_finish_output2+0x219/0x7b0 [<ffffffff81752759>] ip6_finish_output2+0x219/0x7b0 [<ffffffff817525a5>] ? ip6_finish_output2+0x65/0x7b0 [<ffffffff816cde2b>] ? ip_idents_reserve+0x6b/0x80 [<ffffffff8175488f>] ? ip6_fragment+0x93f/0xc50 [<ffffffff81754af1>] ip6_fragment+0xba1/0xc50 [<ffffffff81752540>] ? ip6_flush_pending_frames+0x40/0x40 [<ffffffff81754c6b>] ip6_finish_output+0xcb/0x1d0 [<ffffffff81754dcf>] ip6_output+0x5f/0x1a0 [<ffffffff81754ba0>] ? ip6_fragment+0xc50/0xc50 [<ffffffff81797fbd>] ip6_local_out+0x3d/0x80 [<ffffffff817554df>] ip6_send_skb+0x2f/0xc0 [<ffffffff817555bd>] ip6_push_pending_frames+0x4d/0x50 [<ffffffff817796cc>] icmpv6_push_pending_frames+0xac/0xe0 [<ffffffff8177a4be>] icmpv6_echo_reply+0x42e/0x500 [<ffffffff8177acbf>] icmpv6_rcv+0x4cf/0x580 [<ffffffff81755ac7>] ip6_input_finish+0x1a7/0x690 [<ffffffff81755925>] ? ip6_input_finish+0x5/0x690 [<ffffffff817567a0>] ip6_input+0x30/0xa0 [<ffffffff81755920>] ? ip6_rcv_finish+0x1a0/0x1a0 [<ffffffff817557ce>] ip6_rcv_finish+0x4e/0x1a0 [<ffffffff8175640f>] ipv6_rcv+0x45f/0x7c0 [<ffffffff81755fe6>] ? ipv6_rcv+0x36/0x7c0 [<ffffffff81755780>] ? ip6_make_skb+0x1c0/0x1c0 [<ffffffff8168b649>] __netif_receive_skb_core+0x229/0xb80 [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0 [<ffffffff8168c07f>] ? process_backlog+0x6f/0x230 [<ffffffff8168bfb6>] __netif_receive_skb+0x16/0x70 [<ffffffff8168c088>] process_backlog+0x78/0x230 [<ffffffff8168c0ed>] ? process_backlog+0xdd/0x230 [<ffffffff8168db43>] net_rx_action+0x203/0x480 [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0 [<ffffffff817c156e>] __do_softirq+0xde/0x49f [<ffffffff81752768>] ? ip6_finish_output2+0x228/0x7b0 [<ffffffff817c070c>] do_softirq_own_stack+0x1c/0x30 <EOI> [<ffffffff8106f88b>] do_softirq.part.18+0x3b/0x40 [<ffffffff8106f946>] __local_bh_enable_ip+0xb6/0xc0 [<ffffffff81752791>] ip6_finish_output2+0x251/0x7b0 [<ffffffff81754af1>] ? ip6_fragment+0xba1/0xc50 [<ffffffff816cde2b>] ? ip_idents_reserve+0x6b/0x80 [<ffffffff8175488f>] ? ip6_fragment+0x93f/0xc50 [<ffffffff81754af1>] ip6_fragment+0xba1/0xc50 [<ffffffff81752540>] ? ip6_flush_pending_frames+0x40/0x40 [<ffffffff81754c6b>] ip6_finish_output+0xcb/0x1d0 [<ffffffff81754dcf>] ip6_output+0x5f/0x1a0 [<ffffffff81754ba0>] ? ip6_fragment+0xc50/0xc50 [<ffffffff81797fbd>] ip6_local_out+0x3d/0x80 [<ffffffff817554df>] ip6_send_skb+0x2f/0xc0 [<ffffffff817555bd>] ip6_push_pending_frames+0x4d/0x50 [<ffffffff81778558>] rawv6_sendmsg+0xa28/0xe30 [<ffffffff81719097>] ? inet_sendmsg+0xc7/0x1d0 [<ffffffff817190d6>] inet_sendmsg+0x106/0x1d0 [<ffffffff81718fd5>] ? inet_sendmsg+0x5/0x1d0 [<ffffffff8166d078>] sock_sendmsg+0x38/0x50 [<ffffffff8166d4d6>] SYSC_sendto+0xf6/0x170 [<ffffffff8100201b>] ? trace_hardirqs_on_thunk+0x1b/0x1d [<ffffffff8166e38e>] SyS_sendto+0xe/0x10 [<ffffffff817bebe5>] entry_SYSCALL_64_fastpath+0x18/0xa8 Code: 06 48 83 3f 00 75 26 48 8b 87 d8 00 00 00 2b 87 d0 00 00 00 48 39 d0 72 14 8b 87 e4 00 00 00 83 f8 01 75 09 48 83 7f 18 00 74 9a <0f> 0b 41 8b 86 cc 00 00 00 49 8# RIP [<ffffffff8175468a>] ip6_fragment+0x73a/0xc50 RSP <ffff880072803120> Fixes: 029f7f3b8701 ("netfilter: ipv6: nf_defrag: avoid/free clone operations") Reported-by: Daniele Di Proietto <diproiettod@vmware.com> Signed-off-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-20rtnetlink: add new RTM_GETSTATS message to dump link statsRoopa Prabhu1-0/+158
This patch adds a new RTM_GETSTATS message to query link stats via netlink from the kernel. RTM_NEWLINK also dumps stats today, but RTM_NEWLINK returns a lot more than just stats and is expensive in some cases when frequent polling for stats from userspace is a common operation. RTM_GETSTATS is an attempt to provide a light weight netlink message to explicity query only link stats from the kernel on an interface. The idea is to also keep it extensible so that new kinds of stats can be added to it in the future. This patch adds the following attribute for NETDEV stats: struct nla_policy ifla_stats_policy[IFLA_STATS_MAX + 1] = { [IFLA_STATS_LINK_64] = { .len = sizeof(struct rtnl_link_stats64) }, }; Like any other rtnetlink message, RTM_GETSTATS can be used to get stats of a single interface or all interfaces with NLM_F_DUMP. Future possible new types of stat attributes: link af stats: - IFLA_STATS_LINK_IPV6 (nested. for ipv6 stats) - IFLA_STATS_LINK_MPLS (nested. for mpls/mdev stats) extended stats: - IFLA_STATS_LINK_EXTENDED (nested. extended software netdev stats like bridge, vlan, vxlan etc) - IFLA_STATS_LINK_HW_EXTENDED (nested. extended hardware stats which are available via ethtool today) This patch also declares a filter mask for all stat attributes. User has to provide a mask of stats attributes to query. filter mask can be specified in the new hdr 'struct if_stats_msg' for stats messages. Other important field in the header is the ifindex. This api can also include attributes for global stats (eg tcp) in the future. When global stats are included in a stats msg, the ifindex in the header must be zero. A single stats message cannot contain both global and netdev specific stats. To easily distinguish them, netdev specific stat attributes name are prefixed with IFLA_STATS_LINK_ Without any attributes in the filter_mask, no stats will be returned. This patch has been tested with mofified iproute2 ifstat. Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-20VSOCK: Only check error on skb_recv_datagram when skb is NULLJorgen Hansen1-5/+2
If skb_recv_datagram returns an skb, we should ignore the err value returned. Otherwise, datagram receives will return EAGAIN when they have to wait for a datagram. Acked-by: Adit Ranadive <aditr@vmware.com> Signed-off-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-20net: dsa: kill circular reference with slave privVivien Didelot2-10/+4
The dsa_slave_priv structure does not need a pointer to its net_device. Kill it. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-20bpf: add event output helper for notifications/sampling/loggingDaniel Borkmann1-0/+2
This patch adds a new helper for cls/act programs that can push events to user space applications. For networking, this can be f.e. for sampling, debugging, logging purposes or pushing of arbitrary wake-up events. The idea is similar to a43eec304259 ("bpf: introduce bpf_perf_event_output() helper") and 39111695b1b8 ("samples: bpf: add bpf_perf_event_output example"). The eBPF program utilizes a perf event array map that user space populates with fds from perf_event_open(), the eBPF program calls into the helper f.e. as skb_event_output(skb, &my_map, BPF_F_CURRENT_CPU, raw, sizeof(raw)) so that the raw data is pushed into the fd f.e. at the map index of the current CPU. User space can poll/mmap/etc on this and has a data channel for receiving events that can be post-processed. The nice thing is that since the eBPF program and user space application making use of it are tightly coupled, they can define their own arbitrary raw data format and what/when they want to push. While f.e. packet headers could be one part of the meta data that is being pushed, this is not a substitute for things like packet sockets as whole packet is not being pushed and push is only done in a single direction. Intention is more of a generically usable, efficient event pipe to applications. Workflow is that tc can pin the map and applications can attach themselves e.g. after cls/act setup to one or multiple map slots, demuxing is done by the eBPF program. Adding this facility is with minimal effort, it reuses the helper introduced in a43eec304259 ("bpf: introduce bpf_perf_event_output() helper") and we get its functionality for free by overloading its BPF_FUNC_ identifier for cls/act programs, ctx is currently unused, but will be made use of in future. Example will be added to iproute2's BPF example files. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-20net/ipv6/addrconf: fix sysctl table indentationKonstantin Khlebnikov1-309/+307
Separated from previous patch for readability. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-20net/ipv6/addrconf: simplify sysctl registrationKonstantin Khlebnikov1-26/+17
Struct ctl_table_header holds pointer to sysctl table which could be used for freeing it after unregistration. IPv4 sysctls already use that. Remove redundant NULL assignment: ndev allocated using kzalloc. This also saves some bytes: sysctl table could be shorter than DEVCONF_MAX+1 if some options are disable in config. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-20net: Add helpers for 64-bit aligning netlink attributes.David S. Miller1-19/+5
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Suggested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-19net: Align IFLA_STATS64 attributes properly on architectures that need it.David S. Miller1-0/+19
Since the nlattr header is 4 bytes in size, it can cause the netlink attribute payload to not be 8-byte aligned. This is particularly troublesome for IFLA_STATS64 which contains 64-bit statistic values. Solve this by creating a dummy IFLA_PAD attribute which has a payload which is zero bytes in size. When HAVE_EFFICIENT_UNALIGNED_ACCESS is false, we insert an IFLA_PAD attribute into the netlink response when necessary such that the IFLA_STATS64 payload will be properly aligned. With help and suggestions from Eric Dumazet. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-19netfilter: conntrack: don't acquire lock during seq_printfFlorian Westphal2-14/+2
read access doesn't need any lock here. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-04-18netfilter: ctnetlink: restore inlining for netlink message size calculationPablo Neira Ayuso1-5/+5
Calm down gcc warnings: net/netfilter/nf_conntrack_netlink.c:529:15: warning: 'ctnetlink_proto_size' defined but not used [-Wunused-function] static size_t ctnetlink_proto_size(const struct nf_conn *ct) ^ net/netfilter/nf_conntrack_netlink.c:546:15: warning: 'ctnetlink_acct_size' defined but not used [-Wunused-function] static size_t ctnetlink_acct_size(const struct nf_conn *ct) ^ net/netfilter/nf_conntrack_netlink.c:556:12: warning: 'ctnetlink_secctx_size' defined but not used [-Wunused-function] static int ctnetlink_secctx_size(const struct nf_conn *ct) ^ net/netfilter/nf_conntrack_netlink.c:572:15: warning: 'ctnetlink_timestamp_size' defined but not used [-Wunused-function] static size_t ctnetlink_timestamp_size(const struct nf_conn *ct) ^ So gcc compiles them out when CONFIG_NF_CONNTRACK_EVENTS and CONFIG_NETFILTER_NETLINK_GLUE_CT are not set. Fixes: 4054ff45454a9a4 ("netfilter: ctnetlink: remove unnecessary inlining") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Arnd Bergmann <arnd@arndb.de>
2016-04-18net: ethtool: export conversion function between u32 and link modePhilippe Reynes1-9/+12
The function convert_legacy_u32_to_link_mode and convert_link_mode_to_legacy_u32 may be used outside of ethtool.c. We rename them to ethtool_convert_... and export them, so we could use them in others drivers and modules. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-18netfilter: connlabels: change nf_connlabels_get bit arg to 'highest used'Florian Westphal4-6/+9
nf_connlabel_set() takes the bit number that we would like to set. nf_connlabels_get() however took the number of bits that we want to support. So e.g. nf_connlabels_get(32) support bits 0 to 31, but not 32. This changes nf_connlabels_get() to take the highest bit that we want to set. Callers then don't have to cope with a potential integer wrap when using nf_connlabels_get(bit + 1) anymore. Current callers are fine, this change is only to make folloup nft ct label set support simpler. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>