summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2012-01-12Merge git://git.infradead.org/battery-2.6Linus Torvalds7-9/+223
* git://git.infradead.org/battery-2.6: (68 commits) power_supply: Mark da9052 driver as broken power_supply: Drop usage of nowarn variant of sysfs_create_link() s3c_adc_battery: Average over more than one adc sample power_supply: Add DA9052 battery driver isp1704_charger: Fix missing check jz4740-battery: Fix signedness bug power_supply: Assume mains power by default sbs-battery: Fix devicetree match table ARM: rx51: Add bq27200 i2c board info sbs-battery: Change power supply name devicetree-bindings: Propagate bq20z75->sbs rename to dt bindings devicetree-bindings: Add vendor entry for Smart Battery Systems sbs-battery: Rename internals to new name bq20z75: Rename to sbs-battery wm97xx_battery: Use DEFINE_MUTEX() for work_lock max8997_charger: Remove duplicate module.h lp8727_charger: Some minor fixes for the header lp8727_charger: Add header file power_supply: Convert drivers/power/* to use module_platform_driver() power_supply: Add "unknown" in power supply type ...
2012-01-12Merge branch 'linux-next' of ↵Linus Torvalds3-21/+54
git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci: (80 commits) x86/PCI: Expand the x86_msi_ops to have a restore MSIs. PCI: Increase resource array mask bit size in pcim_iomap_regions() PCI: DEVICE_COUNT_RESOURCE should be equal to PCI_NUM_RESOURCES PCI: pci_ids: add device ids for STA2X11 device (aka ConneXT) PNP: work around Dell 1536/1546 BIOS MMCONFIG bug that breaks USB x86/PCI: amd: factor out MMCONFIG discovery PCI: Enable ATS at the device state restore PCI: msi: fix imbalanced refcount of msi irq sysfs objects PCI: kconfig: English typo in pci/pcie/Kconfig PCI/PM/Runtime: make PCI traces quieter PCI: remove pci_create_bus() xtensa/PCI: convert to pci_scan_root_bus() for correct root bus resources x86/PCI: convert to pci_create_root_bus() and pci_scan_root_bus() x86/PCI: use pci_scan_bus() instead of pci_scan_bus_parented() x86/PCI: read Broadcom CNB20LE host bridge info before PCI scan sparc32, leon/PCI: convert to pci_scan_root_bus() for correct root bus resources sparc/PCI: convert to pci_create_root_bus() sh/PCI: convert to pci_scan_root_bus() for correct root bus resources powerpc/PCI: convert to pci_create_root_bus() powerpc/PCI: split PHB part out of pcibios_map_io_space() ... Fix up conflicts in drivers/pci/msi.c and include/linux/pci_regs.h due to the same patches being applied in other branches.
2012-01-11Merge branch 'for-linus' of git://selinuxproject.org/~jmorris/linux-securityLinus Torvalds4-1/+214
* 'for-linus' of git://selinuxproject.org/~jmorris/linux-security: (32 commits) ima: fix invalid memory reference ima: free duplicate measurement memory security: update security_file_mmap() docs selinux: Casting (void *) value returned by kmalloc is useless apparmor: fix module parameter handling Security: tomoyo: add .gitignore file tomoyo: add missing rcu_dereference() apparmor: add missing rcu_dereference() evm: prevent racing during tfm allocation evm: key must be set once during initialization mpi/mpi-mpow: NULL dereference on allocation failure digsig: build dependency fix KEYS: Give key types their own lockdep class for key->sem TPM: fix transmit_cmd error logic TPM: NSC and TIS drivers X86 dependency fix TPM: Export wait_for_stat for other vendor specific drivers TPM: Use vendor specific function for status probe tpm_tis: add delay after aborting command tpm_tis: Check return code from getting timeouts/durations tpm: Introduce function to poll for result of self test ... Fix up trivial conflict in lib/Makefile due to addition of CONFIG_MPI and SIGSIG next to CONFIG_DQL addition.
2012-01-11Merge branch 'for-linus' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: autofs4: deal with autofs4_write/autofs4_write races autofs4: catatonic_mode vs. notify_daemon race autofs4: autofs4_wait() vs. autofs4_catatonic_mode() race hfsplus: creation of hidden dir on mount can fail block_dev: Suppress bdev_cache_init() kmemleak warninig fix shrink_dcache_parent() livelock coda: switch coda_cnode_make() to sane API as well, clean coda_lookup() coda: deal correctly with allocation failure from coda_cnode_makectl() securityfs: fix object creation races
2012-01-11Merge tag 'for-linux-3.3-merge-window' of ↵Linus Torvalds1-0/+1
git://linux-c6x.org/git/projects/linux-c6x-upstreaming * tag 'for-linux-3.3-merge-window' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming: (29 commits) C6X: replace tick_nohz_stop/restart_sched_tick calls C6X: add register_cpu call C6X: deal with memblock API changes C6X: fix timer64 initialization C6X: fix layout of EMIFA registers C6X: MAINTAINERS C6X: DSCR - Device State Configuration Registers C6X: EMIF - External Memory Interface C6X: general SoC support C6X: library code C6X: headers C6X: ptrace support C6X: loadable module support C6X: cache control C6X: clocks C6X: build infrastructure C6X: syscalls C6X: interrupt handling C6X: time management C6X: signal management ...
2012-01-11Merge branch 'writeback-for-linus' of ↵Linus Torvalds2-5/+5
git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux * 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux: writeback: move MIN_WRITEBACK_PAGES to fs-writeback.c writeback: balanced_rate cannot exceed write bandwidth writeback: do strict bdi dirty_exceeded writeback: avoid tiny dirty poll intervals writeback: max, min and target dirty pause time writeback: dirty ratelimit - think time compensation btrfs: fix dirtied pages accounting on sub-page writes writeback: fix dirtied pages accounting on redirty writeback: fix dirtied pages accounting on sub-page writes writeback: charge leaked page dirties to active tasks writeback: Include all dirty inodes in background writeback
2012-01-11Merge branch 'akpm' (aka "Andrew's patch-bomb")Linus Torvalds17-35/+156
Andrew elucidates: - First installmeant of MM. We have a HUGE number of MM patches this time. It's crazy. - MAINTAINERS updates - backlight updates - leds - checkpatch updates - misc ELF stuff - rtc updates - reiserfs - procfs - some misc other bits * akpm: (124 commits) user namespace: make signal.c respect user namespaces workqueue: make alloc_workqueue() take printf fmt and args for name procfs: add hidepid= and gid= mount options procfs: parse mount options procfs: introduce the /proc/<pid>/map_files/ directory procfs: make proc_get_link to use dentry instead of inode signal: add block_sigmask() for adding sigmask to current->blocked sparc: make SA_NOMASK a synonym of SA_NODEFER reiserfs: don't lock root inode searching reiserfs: don't lock journal_init() reiserfs: delay reiserfs lock until journal initialization reiserfs: delete comments referring to the BKL drivers/rtc/interface.c: fix alarm rollover when day or month is out-of-range drivers/rtc/rtc-twl.c: add DT support for RTC inside twl4030/twl6030 drivers/rtc/: remove redundant spi driver bus initialization drivers/rtc/rtc-jz4740.c: make jz4740_rtc_driver static drivers/rtc/rtc-mc13xxx.c: make mc13xxx_rtc_idtable static rtc: convert drivers/rtc/* to use module_platform_driver() drivers/rtc/rtc-wm831x.c: convert to devm_kzalloc() drivers/rtc/rtc-wm831x.c: remove unused period IRQ handler ...
2012-01-11workqueue: make alloc_workqueue() take printf fmt and args for nameTejun Heo1-16/+31
alloc_workqueue() currently expects the passed in @name pointer to remain accessible. This is inconvenient and a bit silly given that the whole wq is being dynamically allocated. This patch updates alloc_workqueue() and friends to take printf format string instead of opaque string and matching varargs at the end. The name is allocated together with the wq and formatted. alloc_ordered_workqueue() is converted to a macro to unify varargs handling with alloc_workqueue(), and, while at it, add comment to alloc_workqueue(). None of the current in-kernel users pass in string with '%' as constant name and this change shouldn't cause any problem. [akpm@linux-foundation.org: use __printf] Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-11procfs: add hidepid= and gid= mount optionsVasiliy Kulikov1-0/+2
Add support for mount options to restrict access to /proc/PID/ directories. The default backward-compatible "relaxed" behaviour is left untouched. The first mount option is called "hidepid" and its value defines how much info about processes we want to be available for non-owners: hidepid=0 (default) means the old behavior - anybody may read all world-readable /proc/PID/* files. hidepid=1 means users may not access any /proc/<pid>/ directories, but their own. Sensitive files like cmdline, sched*, status are now protected against other users. As permission checking done in proc_pid_permission() and files' permissions are left untouched, programs expecting specific files' modes are not confused. hidepid=2 means hidepid=1 plus all /proc/PID/ will be invisible to other users. It doesn't mean that it hides whether a process exists (it can be learned by other means, e.g. by kill -0 $PID), but it hides process' euid and egid. It compicates intruder's task of gathering info about running processes, whether some daemon runs with elevated privileges, whether another user runs some sensitive program, whether other users run any program at all, etc. gid=XXX defines a group that will be able to gather all processes' info (as in hidepid=0 mode). This group should be used instead of putting nonroot user in sudoers file or something. However, untrusted users (like daemons, etc.) which are not supposed to monitor the tasks in the whole system should not be added to the group. hidepid=1 or higher is designed to restrict access to procfs files, which might reveal some sensitive private information like precise keystrokes timings: http://www.openwall.com/lists/oss-security/2011/11/05/3 hidepid=1/2 doesn't break monitoring userspace tools. ps, top, pgrep, and conky gracefully handle EPERM/ENOENT and behave as if the current user is the only user running processes. pstree shows the process subtree which contains "pstree" process. Note: the patch doesn't deal with setuid/setgid issues of keeping preopened descriptors of procfs files (like https://lkml.org/lkml/2011/2/7/368). We rely on that the leaked information like the scheduling counters of setuid apps doesn't threaten anybody's privacy - only the user started the setuid program may read the counters. Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Greg KH <greg@kroah.com> Cc: Theodore Tso <tytso@MIT.EDU> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: James Morris <jmorris@namei.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-11procfs: introduce the /proc/<pid>/map_files/ directoryPavel Emelyanov1-0/+12
This one behaves similarly to the /proc/<pid>/fd/ one - it contains symlinks one for each mapping with file, the name of a symlink is "vma->vm_start-vma->vm_end", the target is the file. Opening a symlink results in a file that point exactly to the same inode as them vma's one. For example the ls -l of some arbitrary /proc/<pid>/map_files/ | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80403000-7f8f80404000 -> /lib64/libc-2.5.so | lr-x------ 1 root root 64 Aug 26 06:40 7f8f8061e000-7f8f80620000 -> /lib64/libselinux.so.1 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80826000-7f8f80827000 -> /lib64/libacl.so.1.1.0 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80a2f000-7f8f80a30000 -> /lib64/librt-2.5.so | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80a30000-7f8f80a4c000 -> /lib64/ld-2.5.so This *helps* checkpointing process in three ways: 1. When dumping a task mappings we do know exact file that is mapped by particular region. We do this by opening /proc/$pid/map_files/$address symlink the way we do with file descriptors. 2. This also helps in determining which anonymous shared mappings are shared with each other by comparing the inodes of them. 3. When restoring a set of processes in case two of them has a mapping shared, we map the memory by the 1st one and then open its /proc/$pid/map_files/$address file and map it by the 2nd task. Using /proc/$pid/maps for this is quite inconvenient since it brings repeatable re-reading and reparsing for this text file which slows down restore procedure significantly. Also as being pointed in (3) it is a way easier to use top level shared mapping in children as /proc/$pid/map_files/$address when needed. [akpm@linux-foundation.org: coding-style fixes] [gorcunov@openvz.org: make map_files depend on CHECKPOINT_RESTORE] Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Reviewed-by: Vasiliy Kulikov <segoon@openwall.com> Reviewed-by: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Tejun Heo <tj@kernel.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-11procfs: make proc_get_link to use dentry instead of inodeCyrill Gorcunov1-1/+1
Prepare the ground for the next "map_files" patch which needs a name of a link file to analyse. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-11signal: add block_sigmask() for adding sigmask to current->blockedMatt Fleming1-0/+1
Abstract the code sequence for adding a signal handler's sa_mask to current->blocked because the sequence is identical for all architectures. Furthermore, in the past some architectures actually got this code wrong, so introduce a wrapper that all architectures can use. Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Tejun Heo <tj@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-11leds: add driver for TCA6507 LED controllerNeilBrown1-0/+34
TI's TCA6507 is the LED driver in the GTA04 Openmoko motherboard. The driver provides full support for brightness levels and hardware blinking. This driver can drive each of 7 outputs as an LED or a GPIO output, and provides hardware-assist blinking. [akpm@linux-foundation.org: fix __mod_i2c_device_table alias] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NeilBrown <neilb@suse.de> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-11mm/mempolicy.c: mpol_equal(): use boolKOSAKI Motohiro1-5/+5
mpol_equal() logically returns a boolean. Use a bool type to slightly improve readability. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Stephen Wilson <wilsons@start.ca> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-11mremap: enforce rmap src/dst vma ordering in case of vma_merge() succeeding ↵Andrea Arcangeli1-0/+1
in copy_vma() migrate was doing an rmap_walk with speculative lock-less access on pagetables. That could lead it to not serializing properly against mremap PT locks. But a second problem remains in the order of vmas in the same_anon_vma list used by the rmap_walk. If vma_merge succeeds in copy_vma, the src vma could be placed after the dst vma in the same_anon_vma list. That could still lead to migrate missing some pte. This patch adds an anon_vma_moveto_tail() function to force the dst vma at the end of the list before mremap starts to solve the problem. If the mremap is very large and there are a lots of parents or childs sharing the anon_vma root lock, this should still scale better than taking the anon_vma root lock around every pte copy practically for the whole duration of mremap. Update: Hugh noticed special care is needed in the error path where move_page_tables goes in the reverse direction, a second anon_vma_moveto_tail() call is needed in the error path. This program exercises the anon_vma_moveto_tail: === int main() { static struct timeval oldstamp, newstamp; long diffsec; char *p, *p2, *p3, *p4; if (posix_memalign((void **)&p, 2*1024*1024, SIZE)) perror("memalign"), exit(1); if (posix_memalign((void **)&p2, 2*1024*1024, SIZE)) perror("memalign"), exit(1); if (posix_memalign((void **)&p3, 2*1024*1024, SIZE)) perror("memalign"), exit(1); memset(p, 0xff, SIZE); printf("%p\n", p); memset(p2, 0xff, SIZE); memset(p3, 0x77, 4096); if (memcmp(p, p2, SIZE)) printf("error\n"); p4 = mremap(p+SIZE/2, SIZE/2, SIZE/2, MREMAP_FIXED|MREMAP_MAYMOVE, p3); if (p4 != p3) perror("mremap"), exit(1); p4 = mremap(p4, SIZE/2, SIZE/2, MREMAP_FIXED|MREMAP_MAYMOVE, p+SIZE/2); if (p4 != p+SIZE/2) perror("mremap"), exit(1); if (memcmp(p, p2, SIZE)) printf("error\n"); printf("ok\n"); return 0; } === $ perf probe -a anon_vma_moveto_tail Add new event: probe:anon_vma_moveto_tail (on anon_vma_moveto_tail) You can now use it on all perf tools, such as: perf record -e probe:anon_vma_moveto_tail -aR sleep 1 $ perf record -e probe:anon_vma_moveto_tail -aR ./anon_vma_moveto_tail 0x7f2ca2800000 ok [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.043 MB perf.data (~1860 samples) ] $ perf report --stdio 100.00% anon_vma_moveto [kernel.kallsyms] [k] anon_vma_moveto_tail Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Nai Xia <nai.xia@gmail.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Pawel Sikora <pluto@agmk.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-11mm: try to distribute dirty pages fairly across zonesJohannes Weiner2-1/+4
The maximum number of dirty pages that exist in the system at any time is determined by a number of pages considered dirtyable and a user-configured percentage of those, or an absolute number in bytes. This number of dirtyable pages is the sum of memory provided by all the zones in the system minus their lowmem reserves and high watermarks, so that the system can retain a healthy number of free pages without having to reclaim dirty pages. But there is a flaw in that we have a zoned page allocator which does not care about the global state but rather the state of individual memory zones. And right now there is nothing that prevents one zone from filling up with dirty pages while other zones are spared, which frequently leads to situations where kswapd, in order to restore the watermark of free pages, does indeed have to write pages from that zone's LRU list. This can interfere so badly with IO from the flusher threads that major filesystems (btrfs, xfs, ext4) mostly ignore write requests from reclaim already, taking away the VM's only possibility to keep such a zone balanced, aside from hoping the flushers will soon clean pages from that zone. Enter per-zone dirty limits. They are to a zone's dirtyable memory what the global limit is to the global amount of dirtyable memory, and try to make sure that no single zone receives more than its fair share of the globally allowed dirty pages in the first place. As the number of pages considered dirtyable excludes the zones' lowmem reserves and high watermarks, the maximum number of dirty pages in a zone is such that the zone can always be balanced without requiring page cleaning. As this is a placement decision in the page allocator and pages are dirtied only after the allocation, this patch allows allocators to pass __GFP_WRITE when they know in advance that the page will be written to and become dirty soon. The page allocator will then attempt to allocate from the first zone of the zonelist - which on NUMA is determined by the task's NUMA memory policy - that has not exceeded its dirty limit. At first glance, it would appear that the diversion to lower zones can increase pressure on them, but this is not the case. With a full high zone, allocations will be diverted to lower zones eventually, so it is more of a shift in timing of the lower zone allocations. Workloads that previously could fit their dirty pages completely in the higher zone may be forced to allocate from lower zones, but the amount of pages that "spill over" are limited themselves by the lower zones' dirty constraints, and thus unlikely to become a problem. For now, the problem of unfair dirty page distribution remains for NUMA configurations where the zones allowed for allocation are in sum not big enough to trigger the global dirty limits, wake up the flusher threads and remedy the situation. Because of this, an allocation that could not succeed on any of the considered zones is allowed to ignore the dirty limits before going into direct reclaim or even failing the allocation, until a future patch changes the global dirty throttling and flusher thread activation so that they take individual zone states into account. Test results 15M DMA + 3246M DMA32 + 504 Normal = 3765M memory 40% dirty ratio 16G USB thumb drive 10 runs of dd if=/dev/zero of=disk/zeroes bs=32k count=$((10 << 15)) seconds nr_vmscan_write (stddev) min| median| max xfs vanilla: 549.747( 3.492) 0.000| 0.000| 0.000 patched: 550.996( 3.802) 0.000| 0.000| 0.000 fuse-ntfs vanilla: 1183.094(53.178) 54349.000| 59341.000| 65163.000 patched: 558.049(17.914) 0.000| 0.000| 43.000 btrfs vanilla: 573.679(14.015) 156657.000| 460178.000| 606926.000 patched: 563.365(11.368) 0.000| 0.000| 1362.000 ext4 vanilla: 561.197(15.782) 0.000|2725438.000|4143837.000 patched: 568.806(17.496) 0.000| 0.000| 0.000 Signed-off-by: Johannes Weiner <jweiner@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Michal Hocko <mhocko@suse.cz> Tested-by: Wu Fengguang <fengguang.wu@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Jan Kara <jack@suse.cz> Cc: Shaohua Li <shaohua.li@intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-11mm: exclude reserved pages from dirtyable memoryJohannes Weiner2-0/+7
Per-zone dirty limits try to distribute page cache pages allocated for writing across zones in proportion to the individual zone sizes, to reduce the likelihood of reclaim having to write back individual pages from the LRU lists in order to make progress. This patch: The amount of dirtyable pages should not include the full number of free pages: there is a number of reserved pages that the page allocator and kswapd always try to keep free. The closer (reclaimable pages - dirty pages) is to the number of reserved pages, the more likely it becomes for reclaim to run into dirty pages: +----------+ --- | anon | | +----------+ | | | | | | -- dirty limit new -- flusher new | file | | | | | | | | | -- dirty limit old -- flusher old | | | +----------+ --- reclaim | reserved | +----------+ | kernel | +----------+ This patch introduces a per-zone dirty reserve that takes both the lowmem reserve as well as the high watermark of the zone into account, and a global sum of those per-zone values that is subtracted from the global amount of dirtyable pages. The lowmem reserve is unavailable to page cache allocations and kswapd tries to keep the high watermark free. We don't want to end up in a situation where reclaim has to clean pages in order to balance zones. Not treating reserved pages as dirtyable on a global level is only a conceptual fix. In reality, dirty pages are not distributed equally across zones and reclaim runs into dirty pages on a regular basis. But it is important to get this right before tackling the problem on a per-zone level, where the distance between reclaim and the dirty pages is mostly much smaller in absolute numbers. [akpm@linux-foundation.org: fix highmem build] Signed-off-by: Johannes Weiner <jweiner@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jan Kara <jack@suse.cz> Cc: Shaohua Li <shaohua.li@intel.com> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-11mm, debug: test for online nid when allocating on single nodeDavid Rientjes1-1/+1
Calling alloc_pages_exact_node() means the allocation only passes the zonelist of a single node into the page allocator. If that node isn't online, it's zonelist may never have been initialized causing a strange oops that may not immediately be clear. I recently debugged an issue where node 0 wasn't online and an allocator was passing 0 to alloc_pages_exact_node() and it resulted in a NULL pointer on zonelist->_zoneref. If CONFIG_DEBUG_VM is enabled, though, it would be nice to catch this a bit earlier. Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-11mm: more intensive memory corruption debuggingStanislaw Gruszka2-1/+20
With CONFIG_DEBUG_PAGEALLOC configured, the CPU will generate an exception on access (read,write) to an unallocated page, which permits us to catch code which corrupts memory. However the kernel is trying to maximise memory usage, hence there are usually few free pages in the system and buggy code usually corrupts some crucial data. This patch changes the buddy allocator to keep more free/protected pages and to interlace free/protected and allocated pages to increase the probability of catching corruption. When the kernel is compiled with CONFIG_DEBUG_PAGEALLOC, debug_guardpage_minorder defines the minimum order used by the page allocator to grant a request. The requested size will be returned with the remaining pages used as guard pages. The default value of debug_guardpage_minorder is zero: no change from current behaviour. [akpm@linux-foundation.org: tweak documentation, s/flg/flag/] Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-11kernel.h: add BUILD_BUG() macroDavid Daney3-1/+20
We can place this in definitions that we expect the compiler to remove by dead code elimination. If this assertion fails, we get a nice error message at build time. The GCC function attribute error("message") was added in version 4.3, so we define a new macro __linktime_error(message) to expand to this for GCC-4.3 and later. This will give us an error diagnostic from the compiler on the line that fails. For other compilers __linktime_error(message) expands to nothing, and we have to be content with a link time error, but at least we will still get a build error. BUILD_BUG() expands to the undefined function __build_bug_failed() and will fail at link time if the compiler ever emits code for it. On GCC-4.3 and later, attribute((error())) is used so that the failure will be noted at compile time instead. Signed-off-by: David Daney <david.daney@cavium.com> Acked-by: David Rientjes <rientjes@google.com> Cc: DM <dm.n9107@gmail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-11mm: avoid livelock on !__GFP_FS allocationsMel Gorman1-0/+16
Colin Cross reported; Under the following conditions, __alloc_pages_slowpath can loop forever: gfp_mask & __GFP_WAIT is true gfp_mask & __GFP_FS is false reclaim and compaction make no progress order <= PAGE_ALLOC_COSTLY_ORDER These conditions happen very often during suspend and resume, when pm_restrict_gfp_mask() effectively converts all GFP_KERNEL allocations into __GFP_WAIT. The oom killer is not run because gfp_mask & __GFP_FS is false, but should_alloc_retry will always return true when order is less than PAGE_ALLOC_COSTLY_ORDER. In his fix, he avoided retrying the allocation if reclaim made no progress and __GFP_FS was not set. The problem is that this would result in GFP_NOIO allocations failing that previously succeeded which would be very unfortunate. The big difference between GFP_NOIO and suspend converting GFP_KERNEL to behave like GFP_NOIO is that normally flushers will be cleaning pages and kswapd reclaims pages allowing GFP_NOIO to succeed after a short delay. The same does not necessarily apply during suspend as the storage device may be suspended. This patch special cases the suspend case to fail the page allocation if reclaim cannot make progress and adds some documentation on how gfp_allowed_mask is currently used. Failing allocations like this may cause suspend to abort but that is better than a livelock. [mgorman@suse.de: Rework fix to be suspend specific] [rientjes@google.com: Move suspended device check to should_alloc_retry] Reported-by: Colin Cross <ccross@android.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: David Rientjes <rientjes@google.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-11mm: remove unused pagevec_freeKonstantin Khlebnikov1-7/+0
It not exported and now nobody uses it. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-11mm: add free_hot_cold_page_list() helperKonstantin Khlebnikov1-0/+1
This patch adds helper free_hot_cold_page_list() to free list of 0-order pages. It frees pages directly from list without temporary page-vector. It also calls trace_mm_pagevec_free() to simulate pagevec_free() behaviour. bloat-o-meter: add/remove: 1/1 grow/shrink: 1/3 up/down: 267/-295 (-28) function old new delta free_hot_cold_page_list - 264 +264 get_page_from_freelist 2129 2132 +3 __pagevec_free 243 239 -4 split_free_page 380 373 -7 release_pages 606 510 -96 free_page_list 188 - -188 Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-11mm/page-writeback.c: make determine_dirtyable_memory static againJohannes Weiner1-2/+0
The tracing ring-buffer used this function briefly, but not anymore. Make it local to the writeback code again. Also, move the function so that no forward declaration needs to be reintroduced. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-11Merge tag 'ext4_for_linus' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Ext4 commits for 3.3 merge window * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (32 commits) ext4: fix undefined behavior in ext4_fill_flex_info() ext4: make more symbols static ext4: make local symbol ext4_initxattrs static jbd2: fix hung processes in jbd2_journal_lock_updates() ext4: reserve new feature flag codepoints ext4: Report max_batch_time option correctly ext4: add missing ext4_resize_end on error paths ext4: let ext4_group_add() use common code ext4: let ext4_group_extend() use common code ext4: add new online resize interface ext4: add a new function which adds a flex group to a fs ext4: add a new function which allocates bitmaps and inode tables ext4: pass verify_reserved_gdb() the number of group decriptors ext4: add a function which updates the super block during online resizing ext4: add a function which sets up a block group descriptors of a flex bg ext4: add a function which sets up group blocks of a flex bg ext4: add a structure which will be used by 64bit-resize interface ext4: add a function which adds a new group descriptors to a fs ext4: add a function which extends a group without checking parameters ext4: use proper little-endian bitops ...
2012-01-11Merge branch 'nfs-for-3.3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds6-7/+31
* 'nfs-for-3.3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4: Change the default setting of the nfs4_disable_idmapping parameter NFSv4: Save the owner/group name string when doing open NFS: Remove pNFS bloat from the generic write path pnfs-obj: Must return layout on IO error pnfs-obj: pNFS errors are communicated on iodata->pnfs_error NFS: Cache state owners after files are closed NFS: Clean up nfs4_find_state_owners_locked() NFSv4: include bitmap in nfsv4 get acl data nfs: fix a minor do_div portability issue NFSv4.1: cleanup comment and debug printk NFSv4.1: change nfs4_free_slot parameters for dynamic slots NFSv4.1: cleanup init and reset of session slot tables NFSv4.1: fix backchannel slotid off-by-one bug nfs: fix regression in handling of context= option in NFSv4 NFS - fix recent breakage to NFS error handling. NFS: Retry mounting NFSROOT SUNRPC: Clean up the RPCSEC_GSS service ticket requests
2012-01-11Merge branch 'for-linus' of ↵Linus Torvalds1-5/+66
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: add recovery callbacks dlm: add node slots and generation dlm: move recovery barrier calls dlm: convert rsb list to rb_tree
2012-01-11Merge tag 'for-linus-3.3' of git://git.infradead.org/mtd-2.6Linus Torvalds6-135/+296
MTD pull for 3.3 * tag 'for-linus-3.3' of git://git.infradead.org/mtd-2.6: (113 commits) mtd: Fix dependency for MTD_DOC200x mtd: do not use mtd->block_markbad directly logfs: do not use 'mtd->block_isbad' directly mtd: introduce mtd_can_have_bb helper mtd: do not use mtd->suspend and mtd->resume directly mtd: do not use mtd->lock, unlock and is_locked directly mtd: do not use mtd->sync directly mtd: harmonize mtd_writev usage mtd: do not use mtd->lock_user_prot_reg directly mtd: mtd->write_user_prot_reg directly mtd: do not use mtd->read_*_prot_reg directly mtd: do not use mtd->get_*_prot_info directly mtd: do not use mtd->read_oob directly mtd: mtdoops: do not use mtd->panic_write directly romfs: do not use mtd->get_unmapped_area directly mtd: do not use mtd->get_unmapped_area directly mtd: do use mtd->point directly mtd: introduce mtd_has_oob helper mtd: mtdcore: export symbols cleanup mtd: clean-up the default_mtd_writev function ... Fix up trivial edit/remove conflict in drivers/staging/spectra/lld_mtd.c
2012-01-10Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommuLinus Torvalds7-25/+192
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (53 commits) iommu/amd: Set IOTLB invalidation timeout iommu/amd: Init stats for iommu=pt iommu/amd: Remove unnecessary cache flushes in amd_iommu_resume iommu/amd: Add invalidate-context call-back iommu/amd: Add amd_iommu_device_info() function iommu/amd: Adapt IOMMU driver to PCI register name changes iommu/amd: Add invalid_ppr callback iommu/amd: Implement notifiers for IOMMUv2 iommu/amd: Implement IO page-fault handler iommu/amd: Add routines to bind/unbind a pasid iommu/amd: Implement device aquisition code for IOMMUv2 iommu/amd: Add driver stub for AMD IOMMUv2 support iommu/amd: Add stat counter for IOMMUv2 events iommu/amd: Add device errata handling iommu/amd: Add function to get IOMMUv2 domain for pdev iommu/amd: Implement function to send PPR completions iommu/amd: Implement functions to manage GCR3 table iommu/amd: Implement IOMMUv2 TLB flushing routines iommu/amd: Add support for IOMMUv2 domain mode iommu/amd: Add amd_iommu_domain_direct_map function ...
2012-01-10Merge branch 'drm-core-next' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds1-1/+1
* 'drm-core-next' of git://people.freedesktop.org/~airlied/linux: (307 commits) drm/nouveau/pm: fix build with HWMON off gma500: silence gcc warnings in mid_get_vbt_data() drm/ttm: fix condition (and vs or) drm/radeon: double lock typo in radeon_vm_bo_rmv() drm/radeon: use after free in radeon_vm_bo_add() drm/sis|via: don't return stack garbage from free_mem ioctl drm/radeon/kms: remove pointless CS flags priority struct drm/radeon/kms: check if vm is supported in VA ioctl drm: introduce drm_can_sleep and use in intel/radeon drivers. (v2) radeon: Fix disabling PCI bus mastering on big endian hosts. ttm: fix agp since ttm tt rework agp: Fix multi-line warning message whitespace drm/ttm/dma: Fix accounting error when calling ttm_mem_global_free_page and don't try to free freed pages. drm/ttm/dma: Only call set_pages_array_wb when the page is not in WB pool. drm/radeon/kms: sync across multiple rings when doing bo moves v3 drm/radeon/kms: Add support for multi-ring sync in CS ioctl (v2) drm/radeon: GPU virtual memory support v22 drm: make DRM_UNLOCKED ioctls with their own mutex drm: no need to hold global mutex for static data drm/radeon/benchmark: common modes sweep ignores 640x480@32 ... Fix up trivial conflicts in radeon/evergreen.c and vmwgfx/vmwgfx_kms.c
2012-01-10Merge branch 'for-linus' of ↵Linus Torvalds7-4/+250
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (64 commits) Input: tc3589x-keypad - add missing kerneldoc Input: ucb1400-ts - switch to using dev_xxx() for diagnostic messages Input: ucb1400_ts - convert to threaded IRQ Input: ucb1400_ts - drop inline annotations Input: usb1400_ts - add __devinit/__devexit section annotations Input: ucb1400_ts - set driver owner Input: ucb1400_ts - convert to use dev_pm_ops Input: psmouse - make sure we do not use stale methods Input: evdev - do not block waiting for an event if fd is nonblock Input: evdev - if no events and non-block, return EAGAIN not 0 Input: evdev - only allow reading events if a full packet is present Input: add driver for pixcir i2c touchscreens Input: samsung-keypad - implement runtime power management support Input: tegra-kbc - report wakeup key for some platforms Input: tegra-kbc - add device tree bindings Input: add driver for AUO In-Cell touchscreens using pixcir ICs Input: mpu3050 - configure the sampling method Input: mpu3050 - ensure we enable interrupts Input: mpu3050 - add of_match table for device-tree probing Input: sentelic - document the latest hardware ... Fix up fairly trivial conflicts (device tree matching conflicting with some independent cleanups) in drivers/input/keyboard/samsung-keypad.c
2012-01-10Merge branch 'for-linus' of ↵Linus Torvalds1-0/+21
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (68 commits) hid-input/battery: add FEATURE quirk hid-input/battery: remove battery_val hid-input/battery: power-supply type really *is* a battery hid-input/battery: make the battery setup common for INPUTs and FEATUREs hid-input/battery: deal with both FEATURE and INPUT report batteries hid-input/battery: add quirks for battery hid-input/battery: remove apparently redundant kmalloc hid-input: add support for HID devices reporting Battery Strength HID: hid-multitouch: add support 9 new Xiroku devices HID: multitouch: add support for 3M 32" HID: multitouch: add support of Atmel multitouch panels HID: usbhid: defer LED setting to a workqueue HID: usbhid: hid-core: submit queued urbs before suspend HID: usbhid: remove LED_ON HID: emsff: use symbolic name instead of hardcoded PID constant HID: Enable HID_QUIRK_MULTI_INPUT for Trio Linker Plus II HID: Kconfig: fix syntax HID: introduce proper dependency of HID_BATTERY on POWER_SUPPLY HID: multitouch: support PixArt optical touch screen HID: make parser more verbose about parsing errors by default ... Fix up rename/delete conflict in drivers/hid/hid-hyperv.c (removed in staging, moved in this branch) and similarly for the rules for same file in drivers/staging/hv/{Kconfig,Makefile}.
2012-01-10Merge git://www.linux-watchdog.org/linux-watchdogLinus Torvalds1-5/+16
* git://www.linux-watchdog.org/linux-watchdog: watchdog: omap_wdt.c: fix the WDIOC_GETBOOTSTATUS ioctl if not implemented. watchdog: new driver for VIA chipsets watchdog: ath79_wdt: flush register writes drivers/watchdog/lantiq_wdt.c: drop iounmap for devm_ allocated data watchdog: documentation: describe nowayout in coversion-guide watchdog: documentation: update index file watchdog: Convert wm831x driver to devm_kzalloc() watchdog: add nowayout helpers to Watchdog Timer Driver Kernel API watchdog: convert drivers/watchdog/* to use module_platform_driver() watchdog: Use DEFINE_SPINLOCK() for static spinlocks watchdog: Convert Wolfson drivers to module_platform_driver
2012-01-10Merge branch 'for-linus' of ↵Linus Torvalds5-2/+68
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (40 commits) regulator: set constraints.apply_uV to 0 in of_get_fixed_voltage_config regulator: max8925: fix enabled/disabled judgement mistake regulator: add regulator_bulk_force_disable function regulator: pass regulator_register of_node in fixed voltage driver regulator: add regulator_force_disable() definition for !CONFIG_REGULATOR regulator: Enable supply regulator if child rail is enabled. regulator: mc13892: Convert to devm_kzalloc() regulator: mc13783: Convert to devm_kzalloc() regulator: Fix checking return value of create_regulator regulator: Fix the error handling if create_regulator fails regulator: Export regulator_is_supported_voltage() regulator: mc13892: add device tree probe support regulator: mc13892: remove the unnecessary prefix from regulator name regulator: Convert wm831x regulator drivers to devm_kzalloc() regulator: da9052: Staticize non-exported symbols regulator: Replace kzalloc with devm_kzalloc and if-else with a switch-case for da9052-regulator regulator: Update da9052-regulator for DT changes regulator: DA9052/53 Regulator support regulator: pass device_node to of_get_regulator_init_data() regulator: If a single voltage is set with device tree then set apply_uV ...
2012-01-10Merge branch 'for-next' of ↵Linus Torvalds4-28/+143
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (31 commits) pinctrl: remove unnecessary max pin number pinctrl: correct a offset while enumerating pins pinctrl: some typo fixes pinctrl: rename U300 and SIRF pin controllers pinctrl: pass name instead of device to pin_config_* pinctrl: add "struct seq_file;" to pinconf.h pinctrl: conjure names for unnamed pins pinctrl: add a group-specific hog macro pinctrl: don't create a device for each pin controller arm/u300: don't use PINMUX_MAP_PRIMARY* pinctrl: implement PINMUX_MAP_SYS_HOG pinctrl: add a pin config interface pinctrl/coh901: driver to request its pins pinctrl: u300-pinmux: register proper GPIO ranges pinctrl: move the U300 GPIO driver to pinctrl ARM: u300: localize GPIO assignments pinctrl: make it possible to add multiple maps pinctrl: make a copy of pinmux map pinctrl: GPIO direction support for muxing pinctrl: print pin range in GPIO range debugs ...
2012-01-10Merge branch 'upstream-linus' of git://github.com/jgarzik/libata-devLinus Torvalds1-0/+2
* 'upstream-linus' of git://github.com/jgarzik/libata-dev: ahci: support the STA2X11 I/O Hub pata_bf54x: fix BMIDE status register emulation ata: add ata port hibernate callbacks ata: update ata port's runtime status during system resume [SCSI] runtime resume parent for child's system-resume ahci: platform support for suspend/resume libata-core: kill duplicate statement in ata_do_set_mode() pata_of_platform: remove direct dependency on OF_IRQ SATA/PATA: convert drivers/ata/* to use module_platform_driver() pata_cs5536: forward port changes from cs5536 libata-sff: use ATAPI_{COD|IO} ata: add ata port runtime PM callbacks ata: add ata port system PM callbacks [SCSI] sd: check runtime PM status in sd_shutdown [SCSI] check runtime PM status in system PM [SCSI] add flag to skip the runtime PM calls on the host ata: make ata port as parent device of scsi host ahci: start engine only during soft/hard resets
2012-01-10fix shrink_dcache_parent() livelockMiklos Szeredi1-0/+1
Two (or more) concurrent calls of shrink_dcache_parent() on the same dentry may cause shrink_dcache_parent() to loop forever. Here's what appears to happen: 1 - CPU0: select_parent(P) finds C and puts it on dispose list, returns 1 2 - CPU1: select_parent(P) locks P->d_lock 3 - CPU0: shrink_dentry_list() locks C->d_lock dentry_kill(C) tries to lock P->d_lock but fails, unlocks C->d_lock 4 - CPU1: select_parent(P) locks C->d_lock, moves C from dispose list being processed on CPU0 to the new dispose list, returns 1 5 - CPU0: shrink_dentry_list() finds dispose list empty, returns 6 - Goto 2 with CPU0 and CPU1 switched Basically select_parent() steals the dentry from shrink_dentry_list() and thinks it found a new one, causing shrink_dentry_list() to think it's making progress and loop over and over. One way to trigger this is to make udev calls stat() on the sysfs file while it is going away. Having a file in /lib/udev/rules.d/ with only this one rule seems to the trick: ATTR{vendor}=="0x8086", ATTR{device}=="0x10ca", ENV{PCI_SLOT_NAME}="%k", ENV{MATCHADDR}="$attr{address}", RUN+="/bin/true" Then execute the following loop: while true; do echo -bond0 > /sys/class/net/bonding_masters echo +bond0 > /sys/class/net/bonding_masters echo -bond1 > /sys/class/net/bonding_masters echo +bond1 > /sys/class/net/bonding_masters done One fix would be to check all callers and prevent concurrent calls to shrink_dcache_parent(). But I think a better solution is to stop the stealing behavior. This patch adds a new dentry flag that is set when the dentry is added to the dispose list. The flag is cleared in dentry_lru_del() in case the dentry gets a new reference just before being pruned. If the dentry has this flag, select_parent() will skip it and let shrink_dentry_list() retry pruning it. With select_parent() skipping those dentries there will not be the appearance of progress (new dentries found) when there is none, hence shrink_dcache_parent() will not loop forever. Set the flag is also set in prune_dcache_sb() for consistency as suggested by Linus. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-10Merge branch 'kvm-updates/3.3' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2-5/+35
* 'kvm-updates/3.3' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (74 commits) KVM: PPC: Whitespace fix for kvm.h KVM: Fix whitespace in kvm_para.h KVM: PPC: annotate kvm_rma_init as __init KVM: x86 emulator: implement RDPMC (0F 33) KVM: x86 emulator: fix RDPMC privilege check KVM: Expose the architectural performance monitoring CPUID leaf KVM: VMX: Intercept RDPMC KVM: SVM: Intercept RDPMC KVM: Add generic RDPMC support KVM: Expose a version 2 architectural PMU to a guests KVM: Expose kvm_lapic_local_deliver() KVM: x86 emulator: Use opcode::execute for Group 9 instruction KVM: x86 emulator: Use opcode::execute for Group 4/5 instructions KVM: x86 emulator: Use opcode::execute for Group 1A instruction KVM: ensure that debugfs entries have been created KVM: drop bsp_vcpu pointer from kvm struct KVM: x86: Consolidate PIT legacy test KVM: x86: Do not rely on implicit inclusions KVM: Make KVM_INTEL depend on CPU_SUP_INTEL KVM: Use memdup_user instead of kmalloc/copy_from_user ...
2012-01-10Merge git://git.infradead.org/users/cbou/battery-urgentAnton Vorontsov22-41/+46
2012-01-10Merge branch 'for_linus' into for_linus_mergedTheodore Ts'o1-0/+1
Conflicts: fs/ext4/ioctl.c
2012-01-10Merge branch 'for-linus' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vfs: new helper - d_make_root() dcache: use a dispose list in select_parent ceph: d_alloc_root() may fail ext4: fix failure exits isofs: inode leak on mount failure
2012-01-10vfs: new helper - d_make_root()Al Viro1-0/+1
d_alloc_root() with iput() in case of allocation failure... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2-2/+5
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: igmp: Avoid zero delay when receiving odd mixture of IGMP queries netdev: make net_device_ops const bcm63xx: make ethtool_ops const usbnet: make ethtool_ops const net: Fix build with INET disabled. net: introduce netif_addr_lock_nested() and call if when appropriate net: correct lock name in dev_[uc/mc]_sync documentations. net: sk_update_clone is only used in net/core/sock.c 8139cp: fix missing napi_gro_flush. pktgen: set correct max and min in pktgen_setup_inject() smsc911x: Unconditionally include linux/smscphy.h in smsc911x.h asix: fix infinite loop in rx_fixup() net: Default UDP and UNIX diag to 'n'. r6040: fix typo in use of MCR0 register bits net: fix sock_clone reference mismatch with tcp memcontrol
2012-01-10Merge tag 'clk' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-socLinus Torvalds1-0/+22
clock management changes for i.MX Another simple series related to clock management, this time only for imx. * tag 'clk' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: ARM: mxs: select HAVE_CLK_PREPARE for clock clk: add config option HAVE_CLK_PREPARE into Kconfig ASoC: mxs-saif: convert to clk_prepare/clk_unprepare video: mxsfb: convert to clk_prepare/clk_unprepare serial: mxs-auart: convert to clk_prepare/clk_unprepare net: flexcan: convert to clk_prepare/clk_unprepare mtd: gpmi-lib: convert to clk_prepare/clk_unprepare mmc: mxs-mmc: convert to clk_prepare/clk_unprepare dma: mxs-dma: convert to clk_prepare/clk_unprepare net: fec: add clk_prepare/clk_unprepare ARM: mxs: convert platform code to clk_prepare/clk_unprepare clk: add helper functions clk_prepare_enable and clk_disable_unprepare Fix up trivial conflicts in drivers/net/ethernet/freescale/fec.c due to commit 0ebafefcaa7a ("net: fec: add clk_prepare/clk_unprepare") clashing trivially with commit e163cc97f9ac ("net/fec: fix the .remove code").
2012-01-10Merge tag 'drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-socLinus Torvalds1-0/+16
Driver specific changes Again, a lot of platforms have changes in here: pxa, samsung, omap, at91, imx, ... * tag 'drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (54 commits) ARM: sa1100: clean up of the clock support ARM: pxa: add dummy clock for sa1100-rtc RTC: sa1100: support sa1100, pxa and mmp soc families RTC: sa1100: remove redundant code of setting alarm RTC: sa1100: Clean out ost register Input: zylonite-wm97xx - replace IRQ_GPIO() with gpio_to_irq() pcmcia: pxa: replace IRQ_GPIO() with gpio_to_irq() ARM: EXYNOS: Modified files for SPI consolidation work ARM: S5P64X0: Enable SDHCI support ARM: S5P64X0: Add lookup of sdhci-s3c clocks using generic names ARM: S5P64X0: Add HSMMC setup for host Controller ARM: EXYNOS: Add USB OHCI support to ORIGEN board USB: Add Samsung Exynos OHCI diver ARM: EXYNOS: Add USB OHCI support to SMDKV310 board ARM: EXYNOS: Add USB OHCI device net: macb: fix build break with !CONFIG_OF i2c: tegra: Support DVC controller in device tree i2c: tegra: Add __devinit/exit to probe/remove net/at91_ether: use gpio_is_valid for phy IRQ line ARM: at91/net: add macb ethernet controller in 9g45/9g20 DT ...
2012-01-10Merge tag 'devel' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-socLinus Torvalds2-4/+12
New feature development This adds support for new features, and contains stuff from most platforms. A number of these patches could have fit into other branches, too, but were small enough not to cause too much confusion here. * tag 'devel' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (28 commits) mfd/db8500-prcmu: remove support for early silicon revisions ARM: ux500: fix the smp_twd clock calculation ARM: ux500: remove support for early silicon revisions ARM: ux500: update register files ARM: ux500: register DB5500 PMU dynamically ARM: ux500: update ASIC detection for U5500 ARM: ux500: support DB8520 ARM: picoxcell: implement watchdog restart ARM: OMAP3+: hwmod data: Add the default clockactivity for I2C ARM: OMAP3: hwmod data: disable multiblock reads on MMC1/2 on OMAP34xx/35xx <= ES2.1 ARM: OMAP: USB: EHCI and OHCI hwmod structures for OMAP4 ARM: OMAP: USB: EHCI and OHCI hwmod structures for OMAP3 ARM: OMAP: hwmod data: Add support for AM35xx UART4/ttyO3 ARM: Orion: Remove address map info from all platform data structures ARM: Orion: Get address map from plat-orion instead of via platform_data ARM: Orion: mbus_dram_info consolidation ARM: Orion: Consolidate the address map setup ARM: Kirkwood: Add configuration for MPP12 as GPIO ARM: Kirkwood: Recognize A1 revision of 6282 chip ARM: ux500: update the MOP500 GPIO assignments ...
2012-01-10Merge tag 'dt' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-socLinus Torvalds1-10/+5
Device tree conversions for samsung and tegra Both platforms had some initial device tree support, but this adds much more to actually make it usable. * tag 'dt' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (45 commits) ARM: dts: Add intial dts file for EXYNOS4210 SoC, SMDKV310 and ORIGEN ARM: EXYNOS: Add Exynos4 device tree enabled board file rtc: rtc-s3c: Add device tree support input: samsung-keypad: Add device tree support ARM: S5PV210: Modify platform data for pl330 driver ARM: S5PC100: Modify platform data for pl330 driver ARM: S5P64x0: Modify platform data for pl330 driver ARM: EXYNOS: Add a alias for pdma clocks ARM: EXYNOS: Limit usage of pl330 device instance to non-dt build ARM: SAMSUNG: Add device tree support for pl330 dma engine wrappers DMA: PL330: Add device tree support ARM: EXYNOS: Modify platform data for pl330 driver DMA: PL330: Infer transfer direction from transfer request instead of platform data DMA: PL330: move filter function into driver serial: samsung: Fix build for non-Exynos4210 devices serial: samsung: add device tree support serial: samsung: merge probe() function from all SoC specific extensions serial: samsung: merge all SoC specific port reset functions ARM: SAMSUNG: register uart clocks to clock lookup list serial: samsung: remove all uses of get_clksrc and set_clksrc ... Fix up fairly trivial conflicts in arch/arm/mach-s3c2440/clock.c and drivers/tty/serial/Kconfig both due to just adding code close to changes.
2012-01-10Merge tag 'cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-socLinus Torvalds1-0/+17
Cleanups on various subarchitectures Cleanup patches for various ARM platforms and some of their associated drivers, the bulk of these is for mach-91. Arnd ended up pulling in the restart branch from Russell in order to fix up some simple but annoying merge conflicts. * tag 'cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (44 commits) arm/at91: fix build of stamp9g20 ARM: u300: delete memory.h MAINTAINERS: add maintainer entry for Picochip picoxcell ARM: picoxcell: move io mappings to common.c ARM: picoxcell: don't reserve irq_descs ARM: picoxcell: remove mach/memory.h ARM: at91: delete the pcontrol_g20_defconfig arm/tegra: Remove code that's ifndef CONFIG_ARM_GIC arm/tegra: remove unused defines arm/tegra: fix variable formatting in makefile ARM: davinci: vpif: move code to driver core header from platform ARM: at91/gpio: fix display of number of irq setuped ARM: at91/gpio: drop PIN_BASE ARM: at91/udc: use gpio_is_valid to check the gpio ARM: at91/ohci: use gpio_is_valid to check the gpio ARM: at91/nand: use gpio_is_valid to check the gpio ARM: at91/mmc: use gpio_is_valid to check the gpio ARM: at91/ide: use gpio_is_valid to check the gpio ARM: at91/pata: use gpio_is_valid to check the gpio ARM: at91/soc: use gpio_is_valid to check the gpio ...
2012-01-10tracing/mm: Move include of trace/events/kmem.h out of header into slab.cSteven Rostedt1-2/+0
Including trace/events/*.h TRACE_EVENT() macro headers in other headers can cause strange side effects if another trace/event/*.h header includes that header. Having trace/events/kmem.h inside slab_def.h caused a compile error in sparc64 when changes were done to some header files. Moving the kmem.h trace header out of slab.h and into slab.c fixes the problem. Note, both slub.c and slob.c already include the trace/events/kmem.h file. Only slab.c had it missing. Link: http://lkml.kernel.org/r/20120105190405.1e3191fb5a43b2a0f1655e1f@canb.auug.org.au Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10net: Fix build with INET disabled.David S. Miller1-2/+0
> net/core/sock.c: In function 'sk_update_clone': > net/core/sock.c:1278:3: error: implicit declaration of function 'sock_update_memcg' Reported-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>