summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2011-01-07fs: dcache reduce branches in lookup pathNick Piggin1-0/+6
Reduce some branches and memory accesses in dcache lookup by adding dentry flags to indicate common d_ops are set, rather than having to check them. This saves a pointer memory access (dentry->d_op) in common path lookup situations, and saves another pointer load and branch in cases where we have d_op but not the particular operation. Patched with: git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: dcache remove d_mountedNick Piggin1-21/+21
Rather than keep a d_mounted count in the dentry, set a dentry flag instead. The flag can be cleared by checking the hash table to see if there are any mounts left, which is not time critical because it is performed at detach time. The mounted state of a dentry is only used to speculatively take a look in the mount hash table if it is set -- before following the mount, vfsmount lock is taken and mount re-checked without races. This saves 4 bytes on 32-bit, nothing on 64-bit but it does provide a hole I might use later (and some configs have larger than 32-bit spinlocks which might make use of the hole). Autofs4 conversion and changelog by Ian Kent <raven@themaw.net>: In autofs4, when expring direct (or offset) mounts we need to ensure that we block user path walks into the autofs mount, which is covered by another mount. To do this we clear the mounted status so that follows stop before walking into the mount and are essentially blocked until the expire is completed. The automount daemon still finds the correct dentry for the umount due to the follow mount logic in fs/autofs4/root.c:autofs4_follow_link(), which is set as an inode operation for direct and offset mounts only and is called following the lookup that stopped at the covered mount. At the end of the expire the covering mount probably has gone away so the mounted status need not be restored. But we need to check this and only restore the mounted status if the expire failed. XXX: autofs may not work right if we have other mounts go over the top of it? Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: fs_struct use seqlockNick Piggin1-0/+3
Use a seqlock in the fs_struct to enable us to take an atomic copy of the complete cwd and root paths. Use this in the RCU lookup path to avoid a thread-shared spinlock in RCU lookup operations. Multi-threaded apps may now perform path lookups with scalability matching multi-process apps. Operations such as stat(2) become very scalable for multi-threaded workload. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: rcu-walk for path lookupNick Piggin3-9/+46
Perform common cases of path lookups without any stores or locking in the ancestor dentry elements. This is called rcu-walk, as opposed to the current algorithm which is a refcount based walk, or ref-walk. This results in far fewer atomic operations on every path element, significantly improving path lookup performance. It also avoids cacheline bouncing on common dentries, significantly improving scalability. The overall design is like this: * LOOKUP_RCU is set in nd->flags, which distinguishes rcu-walk from ref-walk. * Take the RCU lock for the entire path walk, starting with the acquiring of the starting path (eg. root/cwd/fd-path). So now dentry refcounts are not required for dentry persistence. * synchronize_rcu is called when unregistering a filesystem, so we can access d_ops and i_ops during rcu-walk. * Similarly take the vfsmount lock for the entire path walk. So now mnt refcounts are not required for persistence. Also we are free to perform mount lookups, and to assume dentry mount points and mount roots are stable up and down the path. * Have a per-dentry seqlock to protect the dentry name, parent, and inode, so we can load this tuple atomically, and also check whether any of its members have changed. * Dentry lookups (based on parent, candidate string tuple) recheck the parent sequence after the child is found in case anything changed in the parent during the path walk. * inode is also RCU protected so we can load d_inode and use the inode for limited things. * i_mode, i_uid, i_gid can be tested for exec permissions during path walk. * i_op can be loaded. When we reach the destination dentry, we lock it, recheck lookup sequence, and increment its refcount and mountpoint refcount. RCU and vfsmount locks are dropped. This is termed "dropping rcu-walk". If the dentry refcount does not match, we can not drop rcu-walk gracefully at the current point in the lokup, so instead return -ECHILD (for want of a better errno). This signals the path walking code to re-do the entire lookup with a ref-walk. Aside from the final dentry, there are other situations that may be encounted where we cannot continue rcu-walk. In that case, we drop rcu-walk (ie. take a reference on the last good dentry) and continue with a ref-walk. Again, if we can drop rcu-walk gracefully, we return -ECHILD and do the whole lookup using ref-walk. But it is very important that we can continue with ref-walk for most cases, particularly to avoid the overhead of double lookups, and to gain the scalability advantages on common path elements (like cwd and root). The cases where rcu-walk cannot continue are: * NULL dentry (ie. any uncached path element) * parent with d_inode->i_op->permission or ACLs * dentries with d_revalidate * Following links In future patches, permission checks and d_revalidate become rcu-walk aware. It may be possible eventually to make following links rcu-walk aware. Uncached path elements will always require dropping to ref-walk mode, at the very least because i_mutex needs to be grabbed, and objects allocated. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07kernel: optimise seqlockNick Piggin1-7/+73
Add branch annotations for seqlock read fastpath, and introduce __read_seqcount_begin and __read_seqcount_end functions, that can avoid the smp_rmb() if used carefully. These will be used by store-free path walking algorithm performance is critical and seqlocks are in use. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: avoid inode RCU freeing for pseudo fsNick Piggin2-0/+2
Pseudo filesystems that don't put inode on RCU list or reachable by rcu-walk dentries do not need to RCU free their inodes. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: icache RCU free inodesNick Piggin2-2/+4
RCU free the struct inode. This will allow: - Subsequent store-free path walking patch. The inode must be consulted for permissions when walking, so an RCU inode reference is a must. - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want to take i_lock no longer need to take sb_inode_list_lock to walk the list in the first place. This will simplify and optimize locking. - Could remove some nested trylock loops in dcache code - Could potentially simplify things a bit in VM land. Do not need to take the page lock to follow page->mapping. The downsides of this is the performance cost of using RCU. In a simple creat/unlink microbenchmark, performance drops by about 10% due to inability to reuse cache-hot slab objects. As iterations increase and RCU freeing starts kicking over, this increases to about 20%. In cases where inode lifetimes are longer (ie. many inodes may be allocated during the average life span of a single inode), a lot of this cache reuse is not applicable, so the regression caused by this patch is smaller. The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU, however this adds some complexity to list walking and store-free path walking, so I prefer to implement this at a later date, if it is shown to be a win in real situations. I haven't found a regression in any non-micro benchmark so I doubt it will be a problem. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: dcache rationalise dget variantsNick Piggin1-12/+3
dget_locked was a shortcut to avoid the lazy lru manipulation when we already held dcache_lock (lru manipulation was relatively cheap at that point). However, how that the lru lock is an innermost one, we never hold it at any caller, so the lock cost can now be avoided. We already have well working lazy dcache LRU, so it should be fine to defer LRU manipulations to scan time. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: dcache remove dcache_lockNick Piggin5-11/+14
dcache_lock no longer protects anything. remove it. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: Use rename lock and RCU for multi-step operationsNick Piggin1-0/+1
The remaining usages for dcache_lock is to allow atomic, multi-step read-side operations over the directory tree by excluding modifications to the tree. Also, to walk in the leaf->root direction in the tree where we don't have a natural d_lock ordering. This could be accomplished by taking every d_lock, but this would mean a huge number of locks and actually gets very tricky. Solve this instead by using the rename seqlock for multi-step read-side operations, retry in case of a rename so we don't walk up the wrong parent. Concurrent dentry insertions are not serialised against. Concurrent deletes are tricky when walking up the directory: our parent might have been deleted when dropping locks so also need to check and retry for that. We can also use the rename lock in cases where livelock is a worry (and it is introduced in subsequent patch). Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: scale inode alias listNick Piggin1-0/+1
Add a new lock, dcache_inode_lock, to protect the inode's i_dentry list from concurrent modification. d_alias is also protected by d_lock. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: dcache scale subdirsNick Piggin1-0/+1
Protect d_subdirs and d_child with d_lock, except in filesystems that aren't using dcache_lock for these anyway (eg. using i_mutex). Note: if we change the locking rule in future so that ->d_child protection is provided only with ->d_parent->d_lock, it may allow us to reduce some locking. But it would be an exception to an otherwise regular locking scheme, so we'd have to see some good results. Probably not worthwhile. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: dcache scale dentry refcountNick Piggin1-14/+15
Make d_count non-atomic and protect it with d_lock. This allows us to ensure a 0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when we start protecting many other dentry members with d_lock. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: dcache scale hashNick Piggin1-33/+2
Add a new lock, dcache_hash_lock, to protect the dcache hash table from concurrent modification. d_hash is also protected by d_lock. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07hostfs: simplify lockingNick Piggin1-1/+1
Remove dcache_lock locking from hostfs filesystem, and move it into dcache helpers. All that is required is a coherent path name. Protection from concurrent modification of the namespace after path name generation is not provided in current code, because dcache_lock is dropped before the path is used. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: change d_hash for rcu-walkNick Piggin1-1/+2
Change d_hash so it may be called from lock-free RCU lookups. See similar patch for d_compare for details. For in-tree filesystems, this is just a mechanical change. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: change d_compare for rcu-walkNick Piggin2-9/+7
Change d_compare so it may be called from lock-free RCU lookups. This does put significant restrictions on what may be done from the callback, however there don't seem to have been any problems with in-tree fses. If some strange use case pops up that _really_ cannot cope with the rcu-walk rules, we can just add new rcu-unaware callbacks, which would cause name lookup to drop out of rcu-walk mode. For in-tree filesystems, this is just a mechanical change. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: name case update methodNick Piggin1-0/+2
smpfs and ncpfs want to update a live dentry name in-place. Rather than have them open code the locking, provide a documented dcache API. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: change d_delete semanticsNick Piggin1-3/+3
Change d_delete from a dentry deletion notification to a dentry caching advise, more like ->drop_inode. Require it to be constant and idempotent, and not take d_lock. This is how all existing filesystems use the callback anyway. This makes fine grained dentry locking of dput and dentry lru scanning much simpler. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: dcache documentation cleanupNick Piggin1-12/+6
Remove redundant (and incorrect, since dcache RCU lookup) dentry locking documentation and point to the canonical document. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07kernel: kmem_ptr_validate considered harmfulNick Piggin1-2/+0
This is a nasty and error prone API. It is no longer used, remove it. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-03Merge branch 'v4l_for_linus' of ↵Linus Torvalds1-3/+0
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6 * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6: [media] em28xx: radio_fops should also use unlocked_ioctl [media] wm8775: Revert changeset fcb9757333 to avoid a regression [media] cx25840: Prevent device probe failure due to volume control ERANGE error
2011-01-03Merge branch 'fixes' of ↵Linus Torvalds1-3/+10
git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: dmaengine: provide dummy functions for DMA_ENGINE=n mv_xor: fix race in tasklet function
2011-01-03[media] wm8775: Revert changeset fcb9757333 to avoid a regressionMauro Carvalho Chehab1-3/+0
It seems that cx88 and ivtv use wm8775 on some different modes. The patch that added support for a board with wm8775 broke ivtv boards with this device. As we're too close to release 2.6.37, let's just revert it. Reported-by: Andy Walls <awalls@md.metrocast.net> Reported-by: Eric Sharkey <eric@lisaneric.org> Reported-by: Auric <auric@aanet.com.au> Reported by: David Gesswein <djg@pdp8online.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-01-03dmaengine: provide dummy functions for DMA_ENGINE=nGuennadi Liakhovetski1-3/+10
This lets drivers, optionally using the dmaengine, build with DMA_ENGINE unselected. Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2010-12-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds7-12/+42
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (42 commits) ipv4: dont create routes on down devices epic100: hamachi: yellowfin: Fix skb allocation size sundance: Fix oopses with corrupted skb_shared_info Revert "ipv4: Allow configuring subnets as local addresses" USB: mcs7830: return negative if auto negotiate fails irda: prevent integer underflow in IRLMP_ENUMDEVICES tcp: fix listening_get_next() atl1c: Do not use legacy PCI power management mac80211: fix mesh forwarding MAINTAINERS: email address change net: Fix range checks in tcf_valid_offset(). net_sched: sch_sfq: fix allot handling hostap: remove netif_stop_queue from init mac80211/rt2x00: add ieee80211_tx_status_ni() typhoon: memory corruption in typhoon_get_drvinfo() net: Add USB PID for new MOSCHIP USB ethernet controller MCS7832 variant net_sched: always clone skbs ipv6: Fragment locally generated tunnel-mode IPSec6 packets as needed. netlink: fix gcc -Wconversion compilation warning asix: add USB ID for Logitec LAN-GTJ U2A ...
2010-12-24Merge branch 'for-linus' of ↵Linus Torvalds1-10/+35
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu: print out alloc information with KERN_DEBUG instead of KERN_INFO kthread_work: make lockdep happy
2010-12-23Revert "ipv4: Allow configuring subnets as local addresses"David S. Miller1-1/+0
This reverts commit 4465b469008bc03b98a1b8df4e9ae501b6c69d4b. Conflicts: net/ipv4/fib_frontend.c As reported by Ben Greear, this causes regressions: > Change 4465b469008bc03b98a1b8df4e9ae501b6c69d4b caused rules > to stop matching the input device properly because the > FLOWI_FLAG_MATCH_ANY_IIF is always defined in ip_dev_find(). > > This breaks rules such as: > > ip rule add pref 512 lookup local > ip rule del pref 0 lookup local > ip link set eth2 up > ip -4 addr add 172.16.0.102/24 broadcast 172.16.0.255 dev eth2 > ip rule add to 172.16.0.102 iif eth2 lookup local pref 10 > ip rule add iif eth2 lookup 10001 pref 20 > ip route add 172.16.0.0/24 dev eth2 table 10001 > ip route add unreachable 0/0 table 10001 > > If you had a second interface 'eth0' that was on a different > subnet, pinging a system on that interface would fail: > > [root@ct503-60 ~]# ping 192.168.100.1 > connect: Invalid argument Reported-by: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-23taskstats: pad taskstats netlink response for aligment issues on ia64Jeff Mahoney1-1/+2
The taskstats structure is internally aligned on 8 byte boundaries but the layout of the aggregrate reply, with two NLA headers and the pid (each 4 bytes), actually force the entire structure to be unaligned. This causes the kernel to issue unaligned access warnings on some architectures like ia64. Unfortunately, some software out there doesn't properly unroll the NLA packet and assumes that the start of the taskstats structure will always be 20 bytes from the start of the netlink payload. Aligning the start of the taskstats structure breaks this software, which we don't want. So, for now the alignment only happens on architectures that require it and those users will have to update to fixed versions of those packages. Space is reserved in the packet only when needed. This ifdef should be removed in several years e.g. 2012 once we can be confident that fixed versions are installed on most systems. We add the padding before the aggregate since the aggregate is already a defined type. Commit 85893120 ("delayacct: align to 8 byte boundary on 64-bit systems") previously addressed the alignment issues by padding out the pid field. This was supposed to be a compatible change but the circumstances described above mean that it wasn't. This patch backs out that change, since it was a hack, and introduces a new NULL attribute type to provide the padding. Padding the response with 4 bytes avoids allocating an aligned taskstats structure and copying it back. Since the structure weighs in at 328 bytes, it's too big to do it on the stack. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reported-by: Brian Rogers <brian@xyzw.org> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Guillaume Chazarain <guichaz@gmail.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-12-23include/linux/unaligned: pack the whole struct rather than just the fieldWill Newton1-3/+3
The current packed struct implementation of unaligned access adds the packed attribute only to the field within the unaligned struct rather than to the struct as a whole. This is not sufficient to enforce proper behaviour on architectures with a default struct alignment of more than one byte. For example, the current implementation of __get_unaligned_cpu16 when compiled for arm with gcc -O1 -mstructure-size-boundary=32 assumes the struct is on a 4 byte boundary so performs the load of the 16bit packed field as if it were on a 4 byte boundary: __get_unaligned_cpu16: ldrh r0, [r0, #0] bx lr Moving the packed attribute to the struct rather than the field causes the proper unaligned access code to be generated: __get_unaligned_cpu16: ldrb r3, [r0, #0] @ zero_extendqisi2 ldrb r0, [r0, #1] @ zero_extendqisi2 orr r0, r3, r0, asl #8 bx lr Signed-off-by: Will Newton <will.newton@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-12-23Merge branch 'master' of ↵David S. Miller1-4/+24
ssh://master.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
2010-12-22kthread_work: make lockdep happyYong Zhang1-10/+35
spinlock in kthread_worker and wait_queue_head in kthread_work both should be lockdep sensible, so change the interface to make it suiltable for CONFIG_LOCKDEP. tj: comment update Reported-by: Nicolas <nicolas.mailhot@laposte.net> Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Signed-off-by: Andy Walls <awalls@md.metrocast.net> Tested-by: Andy Walls <awalls@md.metrocast.net> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2010-12-21net: Fix range checks in tcf_valid_offset().David S. Miller1-1/+3
This function has three bugs: 1) The offset should be valid most of the time, this is just a sanity check, therefore we should use "likely" not "unlikely" 2) This is the only place where we can check for arithmetic overflow of the pointer plus the length. 3) The existing range checks are off by one, the valid range is skb->head to skb_tail_pointer(), inclusive. Based almost entirely upon a patch by Ralph Loader. Reported-by: Ralph Loader <suckfish@ihug.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-21Merge branch 'for-linus' of ↵Linus Torvalds1-2/+4
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: handle partial result from get_user_pages ceph: mark user pages dirty on direct-io reads ceph: fix null pointer dereference in ceph_init_dentry for nfs reexport ceph: fix direct-io on non-page-aligned buffers ceph: fix msgr_init error path
2010-12-20mac80211/rt2x00: add ieee80211_tx_status_ni()Johannes Stezenbach1-4/+24
All rt2x00 drivers except rt2800pci call ieee80211_tx_status() from a workqueue, which causes "NOHZ: local_softirq_pending 08" messages. To fix it, add ieee80211_tx_status_ni() similar to ieee80211_rx_ni() which can be called from process context, and call it from rt2x00lib_txdone(). For the rt2800pci special case a driver flag is introduced. https://bugzilla.kernel.org/show_bug.cgi?id=24892 Signed-off-by: Johannes Stezenbach <js@sig21.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-20Merge branch 'v4l_for_linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6 * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6: [media] gspca - sonixj: Better handling of the bridge registers 0x01 and 0x17 [media] gspca - sonixj: Add the bit definitions of the bridge reg 0x01 and 0x17 [media] gspca - sonixj: Set the flag for some devices [media] gspca - sonixj: Add a flag in the driver_info table [media] gspca - sonixj: Fix a bad probe exchange [media] gspca - sonixj: Move bridge init to sd start [media] bttv: remove unneeded locking comments [media] bttv: fix mutex use before init (BZ#24602) [media] Don't export format_by_forcc on two different drivers
2010-12-20net_sched: always clone skbsChangli Gao1-5/+1
Pawel reported a panic related to handling shared skbs in ixgbe incorrectly. So we need to revert my previous patch to work around this bug. Instead of reverting the patch completely, I just revert the essential lines, so we can add the previous optimization back more easily in future. commit 3511c9132f8b1e1b5634e41a3331c44b0c13be70 Author: Changli Gao <xiaosuo@gmail.com> Date: Sat Oct 16 13:04:08 2010 +0000 net_sched: remove the unused parameter of qdisc_create_dflt() Reported-by: Pawel Staszewski <pstaszewski@itcare.pl> Signed-off-by: Changli Gao <xiaosuo@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-20Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds1-3/+7
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: cciss: fix cciss_revalidate panic block: max hardware sectors limit wrapper block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits instead blk-throttle: Correct the placement of smp_rmb() blk-throttle: Trim/adjust slice_end once a bio has been dispatched block: check for proper length of iov entries earlier in blk_rq_map_user_iov() drbd: fix for spin_lock_irqsave in endio callback drbd: don't recvmsg with zero length
2010-12-20clarify a usage constraint for cnt32_to_63()Nicolas Pitre1-1/+19
The cnt32_to_63 algorithm relies on proper counter data evaluation ordering to work properly. This was missing from the provided documentation. Let's augment the documentation with the missing usage constraint and fix the only instance that got it wrong. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-12-20ipv6: Fragment locally generated tunnel-mode IPSec6 packets as needed.David Stevens1-0/+10
This patch modifies IPsec6 to fragment IPv6 packets that are locally generated as needed. This version of the patch only fragments in tunnel mode, so that fragment headers will not be obscured by ESP in transport mode. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-19Merge branches 'x86-fixes-for-linus' and 'perf-fixes-for-linus' of ↵Linus Torvalds2-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-32: Make sure we can map all of lowmem if we need to x86, vt-d: Handle previous faults after enabling fault handling x86: Enable the intr-remap fault handling after local APIC setup x86, vt-d: Fix the vt-d fault handling irq migration in the x2apic mode x86, vt-d: Quirk for masking vtd spec errors to platform error handling logic x86, xsave: Use alloc_bootmem_align() instead of alloc_bootmem() bootmem: Add alloc_bootmem_align() x86, gcc-4.6: Use gcc -m options when building vdso x86: HPET: Chose a paranoid safe value for the ETIME check x86: io_apic: Avoid unused variable warning when CONFIG_GENERIC_PENDING_IRQ=n * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf: Fix off by one in perf_swevent_init() perf: Fix duplicate events with multiple-pmu vs software events ftrace: Have recordmcount honor endianness in fn_ELF_R_INFO scripts/tags.sh: Add magic for trace-events tracing: Fix panic when lseek() called on "trace" opened for writing
2010-12-19Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Fix the irqtime code for 32bit sched: Fix the irqtime code to deal with u64 wraps nohz: Fix get_next_timer_interrupt() vs cpu hotplug Sched: fix skip_clock_update optimization sched: Cure more NO_HZ load average woes
2010-12-18Merge branch 'for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: x86: avoid high BIOS area when allocating address space x86: avoid E820 regions when allocating address space x86: avoid low BIOS area when allocating address space resources: add arch hook for preventing allocation in reserved areas Revert "resources: support allocating space within a region from the top down" Revert "PCI: allocate bus resources from the top down" Revert "x86/PCI: allocate space from the end of a region, not the beginning" Revert "x86: allocate space within a region top-down" Revert "PCI: fix pci_bus_alloc_resource() hang, prefer positive decode" PCI: Update MCP55 quirk to not affect non HyperTransport variants
2010-12-17netlink: fix gcc -Wconversion compilation warningDmitry V. Levin1-1/+1
$ cat << EOF | gcc -Wconversion -xc -S -o/dev/null - unsigned f(void) {return NLMSG_HDRLEN;} EOF <stdin>: In function 'f': <stdin>:3:26: warning: negative integer implicitly converted to unsigned type Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-17resources: add arch hook for preventing allocation in reserved areasBjorn Helgaas1-0/+1
This adds arch_remove_reservations(), which an arch can implement if it needs to protect part of the address space from allocation. Sometimes that can be done by just putting a region in the resource tree, but there are cases where that doesn't work well. For example, x86 BIOS E820 reservations are not related to devices, so they may overlap part of, all of, or more than a device resource, so they may not end up at the correct spot in the resource tree. Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-12-17Revert "resources: support allocating space within a region from the top down"Bjorn Helgaas1-1/+0
This reverts commit e7f8567db9a7f6b3151b0b275e245c1cef0d9c70. Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-12-17ceph: mark user pages dirty on direct-io readsHenry C Chang1-2/+4
For read operation, we have to set the argument _write_ of get_user_pages to 1 since we will write data to pages. Also, we need to SetPageDirty before releasing these pages. Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>
2010-12-17Merge branch 'pm-fixes' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: PM / Runtime: Fix pm_runtime_suspended() PM / Hibernate: Restore old swap signature to avoid user space breakage PM / Hibernate: Fix PM_POST_* notification with user-space suspend
2010-12-17Merge branch 'bkl_removal' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6 * 'bkl_removal' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6: [media] uvcvideo: Convert to unlocked_ioctl [media] uvcvideo: Lock stream mutex when accessing format-related information [media] uvcvideo: Move mmap() handler to uvc_queue.c [media] uvcvideo: Move mutex lock/unlock inside uvc_free_buffers [media] uvcvideo: Lock controls mutex when querying menus [media] v4l2-dev: fix race condition [media] V4L: improve the BKL replacement heuristic [media] v4l2-dev: use mutex_lock_interruptible instead of plain mutex_lock [media] cx18: convert to unlocked_ioctl [media] radio-timb: convert to unlocked_ioctl [media] sh_vou: convert to unlocked_ioctl [media] cafe_ccic: replace ioctl by unlocked_ioctl [media] et61x251_core: trivial conversion to unlocked_ioctl [media] sn9c102: convert to unlocked_ioctl [media] BKL: trivial ioctl -> unlocked_ioctl video driver conversions [media] typhoon: convert to unlocked_ioctl [media] si4713: convert to unlocked_ioctl [media] tea5764: convert to unlocked_ioctl [media] cadet: use unlocked_ioctl [media] BKL: trivial BKL removal from V4L2 radio drivers
2010-12-17block: max hardware sectors limit wrapperMike Snitzer1-0/+1
Implement blk_limits_max_hw_sectors() and make blk_queue_max_hw_sectors() a wrapper around it. DM needs this to avoid setting queue_limits' max_hw_sectors and max_sectors directly. dm_set_device_limits() now leverages blk_limits_max_hw_sectors() logic to establish the appropriate max_hw_sectors minimum (PAGE_SIZE). Fixes issue where DM was incorrectly setting max_sectors rather than max_hw_sectors (which caused dm_merge_bvec()'s max_hw_sectors check to be ineffective). Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@kernel.org Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>