summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2012-06-10af_unix: remove unix_iter_stateEric Dumazet1-5/+1
As pointed out by Michael Tokarev , struct unix_iter_state is no longer needed. Suggested-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-10ipv6: Do not mark ipv6_inetpeer_ops as __net_initdata.David S. Miller1-1/+1
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-10inet: Consolidate inetpeer_invalidate_tree() interfaces.David S. Miller3-12/+4
We only need one interface for this operation, since we always know which inetpeer root we want to flush. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-10inet: Initialize per-netns inetpeer roots in net/ipv{4,6}/route.cDavid S. Miller3-49/+74
Instead of net/ipv4/inetpeer.c Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-10[PATCH] tcp: Cache inetpeer in timewait socket, and only when necessary.David S. Miller3-30/+20
Since it's guarenteed that we will access the inetpeer if we're trying to do timewait recycling and TCP options were enabled on the connection, just cache the peer in the timewait socket. In the future, inetpeer lookups will be context dependent (per routing realm), and this helps facilitate that as well. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-09tcp: Get rid of inetpeer special cases.David S. Miller3-30/+17
The get_peer method TCP uses is full of special cases that make no sense accommodating, and it also gets in the way of doing more reasonable things here. First of all, if the socket doesn't have a usable cached route, there is no sense in trying to optimize timewait recycling. Likewise for the case where we have IP options, such as SRR enabled, that make the IP header destination address (and thus the destination address of the route key) differ from that of the connection's destination address. Just return a NULL peer in these cases, and thus we're also able to get rid of the clumsy inetpeer release logic. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-09inet: Create and use rt{,6}_get_peer_create().David S. Miller8-52/+24
There's a lot of places that open-code rt{,6}_get_peer() only because they want to set 'create' to one. So add an rt{,6}_get_peer_create() for their sake. There were also a few spots open-coding plain rt{,6}_get_peer() and those are transformed here as well. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-09af_unix: speedup /proc/net/unixEric Dumazet2-48/+68
/proc/net/unix has quadratic behavior, and can hold unix_table_lock for a while if high number of unix sockets are alive. (90 ms for 200k sockets...) We already have a hash table, so its quite easy to use it. Problem is unbound sockets are still hashed in a single hash slot (unix_socket_table[UNIX_HASH_TABLE]) This patch also spreads unbound sockets to 256 hash slots, to speedup both /proc/net/unix and unix_diag. Time to read /proc/net/unix with 200k unix sockets : (time dd if=/proc/net/unix of=/dev/null bs=4k) before : 520 secs after : 2 secs Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-09inetpeer: add parameter net for inet_getpeer_v4,v6Gao feng6-10/+21
add struct net as a parameter of inet_getpeer_v[4,6], use net to replace &init_net. and modify some places to provide net for inet_getpeer_v[4,6] Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-09inetpeer: add namespace support for inetpeerGao feng2-19/+51
now inetpeer doesn't support namespace,the information will be leaking across namespace. this patch move the global vars v4_peers and v6_peers to netns_ipv4 and netns_ipv6 as a field peers. add struct pernet_operations inetpeer_ops to initial pernet inetpeer data. and change family_to_base and inet_getpeer to support namespace. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-08Added kernel support in EEE Ethtool commandsYuval Mintz1-0/+40
This patch extends the kernel's ethtool interface by adding support for 2 new EEE commands - get_eee and set_eee. Thanks goes to Giuseppe Cavallaro for his original patch adding this support. Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-08net: Update kernel-doc for __alloc_skb()Ben Hutchings1-2/+2
__alloc_skb() now extends tailroom to allow the use of padding added by the heap allocator. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller13-95/+142
2012-06-07Merge branch 'for-davem' of ↵David S. Miller11-22/+97
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John Linville says: ==================== Amitkumar Karwar gives us a cfg80211 fix that changes some state tracking in order to avoid a WARNING. Arik Nemtsov provide a mac80211 fix for an RCU-related race. Avinash Patil shares a pair of mwifiex fixes, one which invalidates some stale configuration data before a channel change and another to restrict hidden SSID support to zero-length SSIDs only. Chun-Yeow Yeoh brings a mac80211 fix for a mesh problem triggered when combining multiple mesh networks into one. Felix Fietkau provides a mac80211 lockdep fix. Joe Perches fixes a couple of thinkos related to bitwise operations. Johannes Berg comes through with a flurry of fixes. The iwlwifi ones address a problem Linus recently reported, and some of the fallout discovered while fixing it. The mac80211 fix properly cleans-up remain-on-channel work on an interface that is stopped. The others are clean-ups for regressions caused by stricter checking of possible virtual interfaces supported by wireless drivers. Meenakshi Venkataraman provides a mac80211 fix for an off-by-one error. Seth Forshee provides a fix to make the wireless adapters used in some Mac boxes work after being in S3 power saving state. Stanislaw Gruszka offers a copule of fixes, a fix for a mac80211 scanning regression and an rt2x00 fix to avoid some lockdep spew. Last but not least, Vinicius Costa Gomes provides a bluetooth fix for a typo that "was preventing important features of Bluetooth from working". ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-06Merge branch 'master' of git://gitorious.org/linux-can/linux-can-nextDavid S. Miller1-5/+5
2012-06-06Merge branch 'master' of ↵John W. Linville11-22/+97
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
2012-06-06Merge branch 'master' of ↵John W. Linville1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
2012-06-06inetpeer: fix a race in inetpeer_gc_worker()Eric Dumazet1-4/+12
commit 5faa5df1fa2024 (inetpeer: Invalidate the inetpeer tree along with the routing cache) added a race : Before freeing an inetpeer, we must respect a RCU grace period, and make sure no user will attempt to increase refcnt. inetpeer_invalidate_tree() waits for a RCU grace period before inserting inetpeer tree into gc_list and waking the worker. At that time, no concurrent lookup can find a inetpeer in this tree. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-05cfg80211: fix interface combinations checkJohannes Berg1-1/+18
If a given interface combination doesn't contain a required interface type then we missed checking that and erroneously allowed it even though iface type wasn't there at all. Add a check that makes sure that all interface types are accounted for. Cc: stable@kernel.org Reported-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-06-05Bluetooth: Fix checking the wrong flag when accepting a socketVinicius Costa Gomes1-1/+1
Most probably a typo, the check should have been for BT_SK_DEFER_SETUP instead of BT_DEFER_SETUP (which right now only represents a socket option). Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org> Acked-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-06-04mac80211: fix non RCU-safe sta_list manipulationArik Nemtsov1-2/+2
sta_info_cleanup locks the sta_list using rcu_read_lock however the delete operation isn't rcu safe. A race between sta_info_cleanup timer being called and a STA being removed can occur which leads to a panic while traversing sta_list. Fix this by switching to the RCU-safe versions. Cc: stable@vger.kernel.org Reported-by: Eyal Shapira <eyal@wizery.com> Signed-off-by: Arik Nemtsov <arik@wizery.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-06-04mac80211: Fix likely misuse of | for &Joe Perches1-3/+3
Using | with a constant is always true. Likely this should have be &. cc: Ben Greear <greearb@candelatech.com> Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-06-04mac80211: add missing rcu_read_lock/unlock in agg-rx session timerFelix Fietkau1-1/+6
Fixes a lockdep warning: =================================================== [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- net/mac80211/agg-rx.c:148 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 1 1 lock held by arecord/11226: #0: (&tid_agg_rx->session_timer){+.-...}, at: [<ffffffff81066bb0>] call_timer_fn+0x0/0x360 stack backtrace: Pid: 11226, comm: arecord Not tainted 3.1.0-kml #16 Call Trace: <IRQ> [<ffffffff81093454>] lockdep_rcu_dereference+0xa4/0xc0 [<ffffffffa02778c9>] sta_rx_agg_session_timer_expired+0xc9/0x110 [mac80211] [<ffffffffa0277800>] ? ieee80211_process_addba_resp+0x220/0x220 [mac80211] [<ffffffff81066c3a>] call_timer_fn+0x8a/0x360 [<ffffffff81066bb0>] ? init_timer_deferrable_key+0x30/0x30 [<ffffffff81477bb0>] ? _raw_spin_unlock_irq+0x30/0x70 [<ffffffff81067049>] run_timer_softirq+0x139/0x310 [<ffffffff81091d5e>] ? put_lock_stats.isra.25+0xe/0x40 [<ffffffff810922ac>] ? lock_release_holdtime.part.26+0xdc/0x160 [<ffffffffa0277800>] ? ieee80211_process_addba_resp+0x220/0x220 [mac80211] [<ffffffff8105cb78>] __do_softirq+0xc8/0x3c0 [<ffffffff8108f088>] ? tick_dev_program_event+0x48/0x110 [<ffffffff8108f16f>] ? tick_program_event+0x1f/0x30 [<ffffffff81153b15>] ? putname+0x35/0x50 [<ffffffff8147a43c>] call_softirq+0x1c/0x30 [<ffffffff81004c55>] do_softirq+0xa5/0xe0 [<ffffffff8105d1ee>] irq_exit+0xae/0xe0 [<ffffffff8147ac6b>] smp_apic_timer_interrupt+0x6b/0x98 [<ffffffff81479ab3>] apic_timer_interrupt+0x73/0x80 <EOI> [<ffffffff8146aac6>] ? free_debug_processing+0x1a1/0x1d5 [<ffffffff81153b15>] ? putname+0x35/0x50 [<ffffffff8146ab2b>] __slab_free+0x31/0x2ca [<ffffffff81477c3a>] ? _raw_spin_unlock_irqrestore+0x4a/0x90 [<ffffffff81253b8f>] ? __debug_check_no_obj_freed+0x15f/0x210 [<ffffffff81097054>] ? lock_release_nested+0x84/0xc0 [<ffffffff8113ec55>] ? kmem_cache_free+0x105/0x250 [<ffffffff81153b15>] ? putname+0x35/0x50 [<ffffffff81153b15>] ? putname+0x35/0x50 [<ffffffff8113ed8f>] kmem_cache_free+0x23f/0x250 [<ffffffff81153b15>] putname+0x35/0x50 [<ffffffff81146d8d>] do_sys_open+0x16d/0x1d0 [<ffffffff81146e10>] sys_open+0x20/0x30 [<ffffffff81478f42>] system_call_fastpath+0x16/0x1b Reported-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-06-04mac80211: clean up remain-on-channel on interface stopJohannes Berg2-0/+28
When any interface goes down, it could be the one that we were doing a remain-on-channel with. We therefore need to cancel the remain-on-channel and flush the related work structs so they don't run after the interface has been removed or even destroyed. It's also possible in this case that an off-channel SKB was never transmitted, so free it if this is the case. Note that this can also happen if the driver finishes the off-channel period without ever starting it. Cc: stable@kernel.org Reported-by: Nirav Shah <nirav.j2.shah@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-06-04mac80211: fix error in station state transitions during reconfigMeenakshi Venkataraman1-1/+1
As part of hardware reconfig mac80211 tries to restore the station state to its values before the hardware reconfig, but it only goes to the last-state - 1. Fix this off-by-one error. Cc: stable@kernel.org [3.4] Signed-off-by: Meenakshi Venkataraman <meenakshi.venkataraman@intel.com> Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-06-04mac80211: Fix Unreachable Mesh Station Problem when joining to another MBSSChun-Yeow Yeoh1-3/+6
Mesh station that joins an MBSS is reachable using mesh portal with 6 address frame by mesh stations from another MBSS if these two different MBSSes are bridged. However, if the mesh station later moves into the same MBSS of those mesh stations, it is unreachable by mesh stations in the MBSS due to the mpp_paths table is not deleted. A quick fix is to perform mesh_path_lookup, if it is available for the target destination, mpp_path_lookup is not performed. When the mesh station moves back to its original MBSS, the mesh_paths will be deleted once expired. So, it will be reachable using mpp_path_lookup again. Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-06-04cfg80211: use sme_state in ibss start/join pathAmitkumar Karwar1-1/+5
CFG80211_DEV_WARN_ON() at "net/wireless/ibss.c line 63" is unnecessarily triggered even after successful connection, when cfg80211_ibss_joined() is called by driver inside .join_ibss handler. This patch fixes the problem by changing 'sme_state' in ibss path and having WARN_ON() check for 'sme_state' similar to infra association. Signed-off-by: Amitkumar Karwar <akarwar@marvell.com> Signed-off-by: Bing Zhao <bzhao@marvell.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-06-04mac80211: run scan after finish connection monitoringStanislaw Gruszka1-9/+27
commit 133d40f9a22bdfd2617a446f1e3209537c5415ec Author: Stanislaw Gruszka <sgruszka@redhat.com> Date: Wed Mar 28 16:01:19 2012 +0200 mac80211: do not scan and monitor connection in parallel add bug, which make possible to start a scan and never finish it, so make every new scanning request finish with -EBUSY error. This can happen on code paths where we finish connection monitoring and clear IEEE80211_STA_*_POLL flags, but do not check if scan was deferred. This patch fixes those code paths. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-06-04net/9p: Add __force to cast of __user pointerJoe Perches1-1/+1
A recent commit that removed unnecessary casts of pointers to the same type uncovered a missing __force cast. Add it. Reported by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-04net: Remove casts to same typeJoe Perches14-36/+32
Adding casts of objects to the same type is unnecessary and confusing for a human reader. For example, this cast: int y; int *p = (int *)&y; I used the coccinelle script below to find and remove these unnecessary casts. I manually removed the conversions this script produces of casts with __force and __user. @@ type T; T *p; @@ - (T *)p + p Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-04drop_monitor: dont sleep in atomic contextEric Dumazet1-69/+33
drop_monitor calls several sleeping functions while in atomic context. BUG: sleeping function called from invalid context at mm/slub.c:943 in_atomic(): 1, irqs_disabled(): 0, pid: 2103, name: kworker/0:2 Pid: 2103, comm: kworker/0:2 Not tainted 3.5.0-rc1+ #55 Call Trace: [<ffffffff810697ca>] __might_sleep+0xca/0xf0 [<ffffffff811345a3>] kmem_cache_alloc_node+0x1b3/0x1c0 [<ffffffff8105578c>] ? queue_delayed_work_on+0x11c/0x130 [<ffffffff815343fb>] __alloc_skb+0x4b/0x230 [<ffffffffa00b0360>] ? reset_per_cpu_data+0x160/0x160 [drop_monitor] [<ffffffffa00b022f>] reset_per_cpu_data+0x2f/0x160 [drop_monitor] [<ffffffffa00b03ab>] send_dm_alert+0x4b/0xb0 [drop_monitor] [<ffffffff810568e0>] process_one_work+0x130/0x4c0 [<ffffffff81058249>] worker_thread+0x159/0x360 [<ffffffff810580f0>] ? manage_workers.isra.27+0x240/0x240 [<ffffffff8105d403>] kthread+0x93/0xa0 [<ffffffff816be6d4>] kernel_thread_helper+0x4/0x10 [<ffffffff8105d370>] ? kthread_freezable_should_stop+0x80/0x80 [<ffffffff816be6d0>] ? gs_change+0xb/0xb Rework the logic to call the sleeping functions in right context. Use standard timer/workqueue api to let system chose any cpu to perform the allocation and netlink send. Also avoid a loop if reset_per_cpu_data() cannot allocate memory : use mod_timer() to wait 1/10 second before next try. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-04sock_diag: add SK_MEMINFO_BACKLOGEric Dumazet1-0/+1
Adding socket backlog len in INET_DIAG_SKMEMINFO is really useful to diagnose various TCP problems. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-04net: use consume_skb() in place of kfree_skb()Eric Dumazet7-12/+14
Remove some dropwatch/drop_monitor false positives. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-04tcp: tcp_make_synack() consumes dst parameterEric Dumazet3-6/+14
tcp_make_synack() clones the dst, and callers release it. We can avoid two atomic operations per SYNACK if tcp_make_synack() consumes dst instead of cloning it. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-04tcp: tcp_make_synack() can use alloc_skb()Eric Dumazet1-1/+1
There is no value using sock_wmalloc() in tcp_make_synack(). A listener socket only sends SYNACK packets, they are not queued in a socket queue, only in Qdisc and device layers, so the number of in flight packets is limited in these layers. We used sock_wmalloc() with the %force parameter set to 1 to ignore socket limits anyway. This patch removes two atomic operations per SYNACK packet. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds4-9/+19
Pull networking updates from David Miller: 1) Make syn floods consume significantly less resources by a) Not pre-COW'ing routing metrics for SYN/ACKs b) Mirroring the device queue mapping of the SYN for the SYN/ACK reply. Both from Eric Dumazet. 2) Fix calculation errors in Byte Queue Limiting, from Hiroaki SHIMODA. 3) Validate the length requested when building a paged SKB for a socket, so we don't overrun the page vector accidently. From Jason Wang. 4) When netlabel is disabled, we abort all IP option processing when we see a CIPSO option. This isn't the right thing to do, we should simply skip over it and continue processing the remaining options (if any). Fix from Paul Moore. 5) SRIOV fixes for the mellanox driver from Jack orgenstein and Marcel Apfelbaum. 6) 8139cp enables the receiver before the ring address is properly programmed, which potentially lets the device crap over random memory. Fix from Jason Wang. 7) e1000/e1000e fixes for i217 RST handling, and an improper buffer address reference in jumbo RX frame processing from Bruce Allan and Sebastian Andrzej Siewior, respectively. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: fec_mpc52xx: fix timestamp filtering mcs7830: Implement link state detection e1000e: fix Rapid Start Technology support for i217 e1000: look into the page instead of skb->data for e1000_tbi_adjust_stats() r8169: call netif_napi_del at errpaths and at driver unload tcp: reflect SYN queue_mapping into SYNACK packets tcp: do not create inetpeer on SYNACK message 8139cp/8139too: terminate the eeprom access with the right opmode 8139cp: set ring address before enabling receiver cipso: handle CIPSO options correctly when NetLabel is disabled net: sock: validate data_len before allocating skb in sock_alloc_send_pskb() bql: Avoid possible inconsistent calculation. bql: Avoid unneeded limit decrement. bql: Fix POSDIFF() to integer overflow aware. net/mlx4_core: Fix obscure mlx4_cmd_box parameter in QUERY_DEV_CAP net/mlx4_core: Check port out-of-range before using in mlx4_slave_cap net/mlx4_core: Fixes for VF / Guest startup flow net/mlx4_en: Fix improper use of "port" parameter in mlx4_en_event net/mlx4_core: Fix number of EQs used in ICM initialisation net/mlx4_core: Fix the slave_id out-of-range test in mlx4_eq_int
2012-06-03tty: Revert the tty locking series, it needs more workLinus Torvalds1-2/+2
This reverts the tty layer change to use per-tty locking, because it's not correct yet, and fixing it will require some more deep surgery. The main revert is d29f3ef39be4 ("tty_lock: Localise the lock"), but there are several smaller commits that built upon it, they also get reverted here. The list of reverted commits is: fde86d310886 - tty: add lockdep annotations 8f6576ad476b - tty: fix ldisc lock inversion trace d3ca8b64b97e - pty: Fix lock inversion b1d679afd766 - tty: drop the pty lock during hangup abcefe5fc357 - tty/amiserial: Add missing argument for tty_unlock() fd11b42e3598 - cris: fix missing tty arg in wait_event_interruptible_tty call d29f3ef39be4 - tty_lock: Localise the lock The revert had a trivial conflict in the 68360serial.c staging driver that got removed in the meantime. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-01tcp: reflect SYN queue_mapping into SYNACK packetsEric Dumazet2-6/+12
While testing how linux behaves on SYNFLOOD attack on multiqueue device (ixgbe), I found that SYNACK messages were dropped at Qdisc level because we send them all on a single queue. Obvious choice is to reflect incoming SYN packet @queue_mapping to SYNACK packet. Under stress, my machine could only send 25.000 SYNACK per second (for 200.000 incoming SYN per second). NIC : ixgbe with 16 rx/tx queues. After patch, not a single SYNACK is dropped. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Hans Schillstrom <hans.schillstrom@ericsson.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-01tcp: do not create inetpeer on SYNACK messageEric Dumazet1-1/+2
Another problem on SYNFLOOD/DDOS attack is the inetpeer cache getting larger and larger, using lots of memory and cpu time. tcp_v4_send_synack() ->inet_csk_route_req() ->ip_route_output_flow() ->rt_set_nexthop() ->rt_init_metrics() ->inet_getpeer( create = true) This is a side effect of commit a4daad6b09230 (net: Pre-COW metrics for TCP) added in 2.6.39 Possible solution : Instruct inet_csk_route_req() to remove FLOWI_FLAG_PRECOW_METRICS Before patch : # grep peer /proc/slabinfo inet_peer_cache 4175430 4175430 192 42 2 : tunables 0 0 0 : slabdata 99415 99415 0 Samples: 41K of event 'cycles', Event count (approx.): 30716565122 + 20,24% ksoftirqd/0 [kernel.kallsyms] [k] inet_getpeer + 8,19% ksoftirqd/0 [kernel.kallsyms] [k] peer_avl_rebalance.isra.1 + 4,81% ksoftirqd/0 [kernel.kallsyms] [k] sha_transform + 3,64% ksoftirqd/0 [kernel.kallsyms] [k] fib_table_lookup + 2,36% ksoftirqd/0 [ixgbe] [k] ixgbe_poll + 2,16% ksoftirqd/0 [kernel.kallsyms] [k] __ip_route_output_key + 2,11% ksoftirqd/0 [kernel.kallsyms] [k] kernel_map_pages + 2,11% ksoftirqd/0 [kernel.kallsyms] [k] ip_route_input_common + 2,01% ksoftirqd/0 [kernel.kallsyms] [k] __inet_lookup_established + 1,83% ksoftirqd/0 [kernel.kallsyms] [k] md5_transform + 1,75% ksoftirqd/0 [kernel.kallsyms] [k] check_leaf.isra.9 + 1,49% ksoftirqd/0 [kernel.kallsyms] [k] ipt_do_table + 1,46% ksoftirqd/0 [kernel.kallsyms] [k] hrtimer_interrupt + 1,45% ksoftirqd/0 [kernel.kallsyms] [k] kmem_cache_alloc + 1,29% ksoftirqd/0 [kernel.kallsyms] [k] inet_csk_search_req + 1,29% ksoftirqd/0 [kernel.kallsyms] [k] __netif_receive_skb + 1,16% ksoftirqd/0 [kernel.kallsyms] [k] copy_user_generic_string + 1,15% ksoftirqd/0 [kernel.kallsyms] [k] kmem_cache_free + 1,02% ksoftirqd/0 [kernel.kallsyms] [k] tcp_make_synack + 0,93% ksoftirqd/0 [kernel.kallsyms] [k] _raw_spin_lock_bh + 0,87% ksoftirqd/0 [kernel.kallsyms] [k] __call_rcu + 0,84% ksoftirqd/0 [kernel.kallsyms] [k] rt_garbage_collect + 0,84% ksoftirqd/0 [kernel.kallsyms] [k] fib_rules_lookup Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Hans Schillstrom <hans.schillstrom@ericsson.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-01Merge branch 'for-linus' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs changes from Al Viro. "A lot of misc stuff. The obvious groups: * Miklos' atomic_open series; kills the damn abuse of ->d_revalidate() by NFS, which was the major stumbling block for all work in that area. * ripping security_file_mmap() and dealing with deadlocks in the area; sanitizing the neighborhood of vm_mmap()/vm_munmap() in general. * ->encode_fh() switched to saner API; insane fake dentry in mm/cleancache.c gone. * assorted annotations in fs (endianness, __user) * parts of Artem's ->s_dirty work (jff2 and reiserfs parts) * ->update_time() work from Josef. * other bits and pieces all over the place. Normally it would've been in two or three pull requests, but signal.git stuff had eaten a lot of time during this cycle ;-/" Fix up trivial conflicts in Documentation/filesystems/vfs.txt (the 'truncate_range' inode method was removed by the VM changes, the VFS update adds an 'update_time()' method), and in fs/btrfs/ulist.[ch] (due to sparse fix added twice, with other changes nearby). * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (95 commits) nfs: don't open in ->d_revalidate vfs: retry last component if opening stale dentry vfs: nameidata_to_filp(): don't throw away file on error vfs: nameidata_to_filp(): inline __dentry_open() vfs: do_dentry_open(): don't put filp vfs: split __dentry_open() vfs: do_last() common post lookup vfs: do_last(): add audit_inode before open vfs: do_last(): only return EISDIR for O_CREAT vfs: do_last(): check LOOKUP_DIRECTORY vfs: do_last(): make ENOENT exit RCU safe vfs: make follow_link check RCU safe vfs: do_last(): use inode variable vfs: do_last(): inline walk_component() vfs: do_last(): make exit RCU safe vfs: split do_lookup() Btrfs: move over to use ->update_time fs: introduce inode operation ->update_time reiserfs: get rid of resierfs_sync_super reiserfs: mark the superblock as dirty a bit later ...
2012-06-01Merge branch 'for-3.5' of git://linux-nfs.org/~bfields/linuxLinus Torvalds6-58/+75
Pull the rest of the nfsd commits from Bruce Fields: "... and then I cherry-picked the remainder of the patches from the head of my previous branch" This is the rest of the original nfsd branch, rebased without the delegation stuff that I thought really needed to be redone. I don't like rebasing things like this in general, but in this situation this was the lesser of two evils. * 'for-3.5' of git://linux-nfs.org/~bfields/linux: (50 commits) nfsd4: fix, consolidate client_has_state nfsd4: don't remove rebooted client record until confirmation nfsd4: remove some dprintk's and a comment nfsd4: return "real" sequence id in confirmed case nfsd4: fix exchange_id to return confirm flag nfsd4: clarify that renewing expired client is a bug nfsd4: simpler ordering of setclientid_confirm checks nfsd4: setclientid: remove pointless assignment nfsd4: fix error return in non-matching-creds case nfsd4: fix setclientid_confirm same_cred check nfsd4: merge 3 setclientid cases to 2 nfsd4: pull out common code from setclientid cases nfsd4: merge last two setclientid cases nfsd4: setclientid/confirm comment cleanup nfsd4: setclientid remove unnecessary terms from a logical expression nfsd4: move rq_flavor into svc_cred nfsd4: stricter cred comparison for setclientid/exchange_id nfsd4: move principal name into svc_cred nfsd4: allow removing clients not holding state nfsd4: rearrange exchange_id logic to simplify ...
2012-06-01sch_atm.c: get rid of poinless externAl Viro1-2/+0
sockfd_lookup() is declared in linux/net.h, which is pulled by linux/skbuff.h (and needed for a lot of other stuff in sch_atm.c anyway). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01Merge branch 'for-3.5-take-2' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2-45/+68
Pull nfsd update from Bruce Fields. * 'for-3.5-take-2' of git://linux-nfs.org/~bfields/linux: (23 commits) nfsd: trivial: use SEEK_SET instead of 0 in vfs_llseek SUNRPC: split upcall function to extract reusable parts nfsd: allocate id-to-name and name-to-id caches in per-net operations. nfsd: make name-to-id cache allocated per network namespace context nfsd: make id-to-name cache allocated per network namespace context nfsd: pass network context to idmap init/exit functions nfsd: allocate export and expkey caches in per-net operations. nfsd: make expkey cache allocated per network namespace context nfsd: make export cache allocated per network namespace context nfsd: pass pointer to export cache down to stack wherever possible. nfsd: pass network context to export caches init/shutdown routines Lockd: pass network namespace to creation and destruction routines NFSd: remove hard-coded dereferences to name-to-id and id-to-name caches nfsd: pass pointer to expkey cache down to stack wherever possible. nfsd: use hash table from cache detail in nfsd export seq ops nfsd: pass svc_export_cache pointer as private data to "exports" seq file ops nfsd: use exp_put() for svc_export_cache put nfsd: use cache detail pointer from svc_export structure on cache put nfsd: add link to owner cache detail to svc_export structure nfsd: use passed cache_detail pointer expkey_parse() ...
2012-06-01nfsd4: move rq_flavor into svc_credJ. Bruce Fields2-3/+3
Move the rq_flavor into struct svc_cred, and use it in setclientid and exchange_id comparisons as well. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-06-01nfsd4: move principal name into svc_credJ. Bruce Fields2-19/+8
Instead of keeping the principal name associated with a request in a structure that's private to auth_gss and using an accessor function, move it to svc_cred. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-06-01svcrpc: fix a comment typoJ. Bruce Fields1-1/+1
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-06-01sunrpc: do array overrun check in svc_recv before allocating pagesJeff Layton1-1/+1
There's little point in waiting until after we allocate all of the pages to see if we're going to overrun the array. In the event that this calculation is really off we could end up scribbling over a bunch of memory and make it tougher to debug. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-06-01SUNRPC: move per-net operations from svc_destroy()Stanislav Kinsbursky1-4/+0
The idea is to separate service destruction and per-net operations, because these are two different things and the mix looks ugly. Notes: 1) For NFS server this patch looks ugly (sorry for that). But these place will be rewritten soon during NFSd containerization. 2) LockD per-net counter increase int lockd_up() was moved prior to make_socks() to make lockd_down_net() call safe in case of error. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-06-01SUNRPC: new svc_bind() routine introducedStanislav Kinsbursky2-14/+17
This new routine is responsible for service registration in a specified network context. The idea is to separate service creation from per-net operations. Note also: since registering service with svc_bind() can fail, the service will be destroyed and during destruction it will try to unregister itself from rpcbind. In this case unregistration has to be skipped. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-06-01rpc: handle rotated gss data for Windows interoperabilityJ. Bruce Fields1-16/+45
The data in Kerberos gss tokens can be rotated. But we were lazy and rejected any nonzero rotation value. It wasn't necessary for the implementations we were testing against at the time. But it appears that Windows does use a nonzero value here. So, implement rotation to bring ourselves into compliance with the spec and to interoperate with Windows. Signed-off-by: J. Bruce Fields <bfields@redhat.com>