From c5f88bd29ab42d5d1e77085b5f69d5c6da20324e Mon Sep 17 00:00:00 2001 From: Vegard Nossum Date: Tue, 2 Aug 2016 14:02:22 -0700 Subject: mm: fail prefaulting if page table allocation fails I ran into this: BUG: sleeping function called from invalid context at mm/page_alloc.c:3784 in_atomic(): 0, irqs_disabled(): 0, pid: 1434, name: trinity-c1 2 locks held by trinity-c1/1434: #0: (&mm->mmap_sem){......}, at: [] __do_page_fault+0x1ce/0x8f0 #1: (rcu_read_lock){......}, at: [] filemap_map_pages+0xd6/0xdd0 CPU: 0 PID: 1434 Comm: trinity-c1 Not tainted 4.7.0+ #58 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 Call Trace: dump_stack+0x65/0x84 panic+0x185/0x2dd ___might_sleep+0x51c/0x600 __might_sleep+0x90/0x1a0 __alloc_pages_nodemask+0x5b1/0x2160 alloc_pages_current+0xcc/0x370 pte_alloc_one+0x12/0x90 __pte_alloc+0x1d/0x200 alloc_set_pte+0xe3e/0x14a0 filemap_map_pages+0x42b/0xdd0 handle_mm_fault+0x17d5/0x28b0 __do_page_fault+0x310/0x8f0 trace_do_page_fault+0x18d/0x310 do_async_page_fault+0x27/0xa0 async_page_fault+0x28/0x30 The important bits from the above is that filemap_map_pages() is calling into the page allocator while holding rcu_read_lock (sleeping is not allowed inside RCU read-side critical sections). According to Kirill Shutemov, the prefaulting code in do_fault_around() is supposed to take care of this, but missing error handling means that the allocation failure can go unnoticed. We don't need to return VM_FAULT_OOM (or any other error) here, since we can just let the normal fault path try again. Fixes: 7267ec008b5c ("mm: postpone page table allocation until we have page to map") Link: http://lkml.kernel.org/r/1469708107-11868-1-git-send-email-vegard.nossum@oracle.com Signed-off-by: Vegard Nossum Acked-by: Kirill A. Shutemov Cc: "Hillf Danton" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memory.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'mm') diff --git a/mm/memory.c b/mm/memory.c index 4425b6059339..04004834e985 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3133,6 +3133,8 @@ static int do_fault_around(struct fault_env *fe, pgoff_t start_pgoff) if (pmd_none(*fe->pmd)) { fe->prealloc_pte = pte_alloc_one(fe->vma->vm_mm, fe->address); + if (!fe->prealloc_pte) + goto out; smp_wmb(); /* See comment in __pte_alloc() */ } -- cgit v1.2.3 From 1a8018fb4c6976559c3f04bcf760822381be501d Mon Sep 17 00:00:00 2001 From: Minchan Kim Date: Tue, 2 Aug 2016 14:02:25 -0700 Subject: mm: move swap-in anonymous page into active list Every swap-in anonymous page starts from inactive lru list's head. It should be activated unconditionally when VM decide to reclaim because page table entry for the page always usually has marked accessed bit. Thus, their window size for getting a new referece is 2 * NR_inactive + NR_active while others is NR_inactive + NR_active. It's not fair that it has more chance to be referenced compared to other newly allocated page which starts from active lru list's head. Johannes: : The page can still have a valid copy on the swap device, so prefering to : reclaim that page over a fresh one could make sense. But as you point : out, having it start inactive instead of active actually ends up giving it : *more* LRU time, and that seems to be without justification. Rik: : The reason newly read in swap cache pages start on the inactive list is : that we do some amount of read-around, and do not know which pages will : get used. : : However, immediately activating the ones that DO get used, like your patch : does, is the right thing to do. Link: http://lkml.kernel.org/r/1469762740-17860-1-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim Acked-by: Johannes Weiner Acked-by: Rik van Riel Cc: Nadav Amit Cc: Mel Gorman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memory.c | 1 + 1 file changed, 1 insertion(+) (limited to 'mm') diff --git a/mm/memory.c b/mm/memory.c index 04004834e985..83be99d9d8a1 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2642,6 +2642,7 @@ int do_swap_page(struct fault_env *fe, pte_t orig_pte) if (page == swapcache) { do_page_add_anon_rmap(page, vma, fe->address, exclusive); mem_cgroup_commit_charge(page, memcg, true, false); + activate_page(page); } else { /* ksm created a completely new copy */ page_add_new_anon_rmap(page, vma, fe->address, false); mem_cgroup_commit_charge(page, memcg, false, false); -- cgit v1.2.3 From 649920c6ab93429b94bc7c1aa7c0e8395351be32 Mon Sep 17 00:00:00 2001 From: Jia He Date: Tue, 2 Aug 2016 14:02:31 -0700 Subject: mm/hugetlb: avoid soft lockup in set_max_huge_pages() In powerpc servers with large memory(32TB), we watched several soft lockups for hugepage under stress tests. The call traces are as follows: 1. get_page_from_freelist+0x2d8/0xd50 __alloc_pages_nodemask+0x180/0xc20 alloc_fresh_huge_page+0xb0/0x190 set_max_huge_pages+0x164/0x3b0 2. prep_new_huge_page+0x5c/0x100 alloc_fresh_huge_page+0xc8/0x190 set_max_huge_pages+0x164/0x3b0 This patch fixes such soft lockups. It is safe to call cond_resched() there because it is out of spin_lock/unlock section. Link: http://lkml.kernel.org/r/1469674442-14848-1-git-send-email-hejianet@gmail.com Signed-off-by: Jia He Reviewed-by: Naoya Horiguchi Acked-by: Michal Hocko Acked-by: Dave Hansen Cc: Mike Kravetz Cc: "Kirill A. Shutemov" Cc: Paul Gortmaker Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/hugetlb.c | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'mm') diff --git a/mm/hugetlb.c b/mm/hugetlb.c index f904246a8fd5..619e00d82c5d 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2216,6 +2216,10 @@ static unsigned long set_max_huge_pages(struct hstate *h, unsigned long count, * and reducing the surplus. */ spin_unlock(&hugetlb_lock); + + /* yield cpu to avoid soft lockup */ + cond_resched(); + if (hstate_is_gigantic(h)) ret = alloc_fresh_gigantic_page(h, nodes_allowed); else -- cgit v1.2.3 From 4e666314d286765a9e61818b488c7372326654ec Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Tue, 2 Aug 2016 14:02:34 -0700 Subject: mm, hugetlb: fix huge_pte_alloc BUG_ON Zhong Jiang has reported a BUG_ON from huge_pte_alloc hitting when he runs his database load with memory online and offline running in parallel. The reason is that huge_pmd_share might detect a shared pmd which is currently migrated and so it has migration pte which is !pte_huge. There doesn't seem to be any easy way to prevent from the race and in fact seeing the migration swap entry is not harmful. Both callers of huge_pte_alloc are prepared to handle them. copy_hugetlb_page_range will copy the swap entry and make it COW if needed. hugetlb_fault will back off and so the page fault is retries if the page is still under migration and waits for its completion in hugetlb_fault. That means that the BUG_ON is wrong and we should update it. Let's simply check that all present ptes are pte_huge instead. Link: http://lkml.kernel.org/r/20160721074340.GA26398@dhcp22.suse.cz Signed-off-by: Michal Hocko Reported-by: zhongjiang Acked-by: Naoya Horiguchi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/hugetlb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'mm') diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 619e00d82c5d..ef968306fd5b 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4310,7 +4310,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, pte = (pte_t *)pmd_alloc(mm, pud, addr); } } - BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte)); + BUG_ON(pte && pte_present(*pte) && !pte_huge(*pte)); return pte; } -- cgit v1.2.3 From d6507ff5331c002430cc20ab25922479453baae7 Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Tue, 2 Aug 2016 14:02:37 -0700 Subject: memcg: put soft limit reclaim out of way if the excess tree is empty We've had a report about soft lockups caused by lock bouncing in the soft reclaim path: BUG: soft lockup - CPU#0 stuck for 22s! [kav4proxy-kavic:3128] RIP: 0010:[] [] _raw_spin_lock+0x18/0x20 Call Trace: mem_cgroup_soft_limit_reclaim+0x25a/0x280 shrink_zones+0xed/0x200 do_try_to_free_pages+0x74/0x320 try_to_free_pages+0x112/0x180 __alloc_pages_slowpath+0x3ff/0x820 __alloc_pages_nodemask+0x1e9/0x200 alloc_pages_vma+0xe1/0x290 do_wp_page+0x19f/0x840 handle_pte_fault+0x1cd/0x230 do_page_fault+0x1fd/0x4c0 page_fault+0x25/0x30 There are no memcgs created so there cannot be any in the soft limit excess obviously: [...] memory 0 1 1 so all this just seems to be mem_cgroup_largest_soft_limit_node trying to get spin_lock_irq(&mctz->lock) just to find out that the soft limit excess tree is empty. This is just pointless wasting of cycles and cache line bouncing during heavy parallel reclaim on large machines. The particular machine wasn't very healthy and most probably suffering from a memory leak which just caused the memory reclaim to trash heavily. But bouncing on the lock certainly didn't help... Fix this by optimistic lockless check and bail out early if the tree is empty. This is theoretically racy but that shouldn't matter all that much. First of all soft limit is a best effort feature and it is slowly getting deprecated and its usage should be really scarce. Bouncing on a lock without a good reason is surely much bigger problem, especially on large CPU machines. Link: http://lkml.kernel.org/r/1470073277-1056-1-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko Acked-by: Vladimir Davydov Acked-by: Johannes Weiner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memcontrol.c | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'mm') diff --git a/mm/memcontrol.c b/mm/memcontrol.c index c265212bec8c..66beca1ad92f 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2559,6 +2559,15 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order, return 0; mctz = soft_limit_tree_node(pgdat->node_id); + + /* + * Do not even bother to check the largest node if the root + * is empty. Do it lockless to prevent lock bouncing. Races + * are acceptable as soft limit is best effort anyway. + */ + if (RB_EMPTY_ROOT(&mctz->rb_root)) + return 0; + /* * This loop can run a while, specially if mem_cgroup's continuously * keep exceeding their soft limit and putting the system under -- cgit v1.2.3 From 4a3d308d6674fabf213bce9c1a661ef43a85e515 Mon Sep 17 00:00:00 2001 From: Andrey Ryabinin Date: Tue, 2 Aug 2016 14:02:40 -0700 Subject: mm/kasan: fix corruptions and false positive reports Once an object is put into quarantine, we no longer own it, i.e. object could leave the quarantine and be reallocated. So having set_track() call after the quarantine_put() may corrupt slab objects. BUG kmalloc-4096 (Not tainted): Poison overwritten ----------------------------------------------------------------------------- Disabling lock debugging due to kernel taint INFO: 0xffff8804540de850-0xffff8804540de857. First byte 0xb5 instead of 0x6b ... INFO: Freed in qlist_free_all+0x42/0x100 age=75 cpu=3 pid=24492 __slab_free+0x1d6/0x2e0 ___cache_free+0xb6/0xd0 qlist_free_all+0x83/0x100 quarantine_reduce+0x177/0x1b0 kasan_kmalloc+0xf3/0x100 kasan_slab_alloc+0x12/0x20 kmem_cache_alloc+0x109/0x3e0 mmap_region+0x53e/0xe40 do_mmap+0x70f/0xa50 vm_mmap_pgoff+0x147/0x1b0 SyS_mmap_pgoff+0x2c7/0x5b0 SyS_mmap+0x1b/0x30 do_syscall_64+0x1a0/0x4e0 return_from_SYSCALL_64+0x0/0x7a INFO: Slab 0xffffea0011503600 objects=7 used=7 fp=0x (null) flags=0x8000000000004080 INFO: Object 0xffff8804540de848 @offset=26696 fp=0xffff8804540dc588 Redzone ffff8804540de840: bb bb bb bb bb bb bb bb ........ Object ffff8804540de848: 6b 6b 6b 6b 6b 6b 6b 6b b5 52 00 00 f2 01 60 cc kkkkkkkk.R....`. Similarly, poisoning after the quarantine_put() leads to false positive use-after-free reports: BUG: KASAN: use-after-free in anon_vma_interval_tree_insert+0x304/0x430 at addr ffff880405c540a0 Read of size 8 by task trinity-c0/3036 CPU: 0 PID: 3036 Comm: trinity-c0 Not tainted 4.7.0-think+ #9 Call Trace: dump_stack+0x68/0x96 kasan_report_error+0x222/0x600 __asan_report_load8_noabort+0x61/0x70 anon_vma_interval_tree_insert+0x304/0x430 anon_vma_chain_link+0x91/0xd0 anon_vma_clone+0x136/0x3f0 anon_vma_fork+0x81/0x4c0 copy_process.part.47+0x2c43/0x5b20 _do_fork+0x16d/0xbd0 SyS_clone+0x19/0x20 do_syscall_64+0x1a0/0x4e0 entry_SYSCALL64_slow_path+0x25/0x25 Fix this by putting an object in the quarantine after all other operations. Fixes: 80a9201a5965 ("mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB") Link: http://lkml.kernel.org/r/1470062715-14077-1-git-send-email-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin Reported-by: Dave Jones Reported-by: Vegard Nossum Reported-by: Sasha Levin Acked-by: Alexander Potapenko Cc: Dmitry Vyukov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/kasan/kasan.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'mm') diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c index b6f99e81bfeb..3019cecc0833 100644 --- a/mm/kasan/kasan.c +++ b/mm/kasan/kasan.c @@ -543,9 +543,9 @@ bool kasan_slab_free(struct kmem_cache *cache, void *object) switch (alloc_info->state) { case KASAN_STATE_ALLOC: alloc_info->state = KASAN_STATE_QUARANTINE; - quarantine_put(free_info, cache); set_track(&free_info->track, GFP_NOWAIT); kasan_poison_slab_free(cache, object); + quarantine_put(free_info, cache); return true; case KASAN_STATE_QUARANTINE: case KASAN_STATE_FREE: -- cgit v1.2.3 From 4b3ec5a3f4b1d5c9d64b9ab704042400d050d432 Mon Sep 17 00:00:00 2001 From: Andrey Ryabinin Date: Tue, 2 Aug 2016 14:02:43 -0700 Subject: mm/kasan: don't reduce quarantine in atomic contexts Currently we call quarantine_reduce() for ___GFP_KSWAPD_RECLAIM (implied by __GFP_RECLAIM) allocation. So, basically we call it on almost every allocation. quarantine_reduce() sometimes is heavy operation, and calling it with disabled interrupts may trigger hard LOCKUP: NMI watchdog: Watchdog detected hard LOCKUP on cpu 2irq event stamp: 1411258 Call Trace: dump_stack+0x68/0x96 watchdog_overflow_callback+0x15b/0x190 __perf_event_overflow+0x1b1/0x540 perf_event_overflow+0x14/0x20 intel_pmu_handle_irq+0x36a/0xad0 perf_event_nmi_handler+0x2c/0x50 nmi_handle+0x128/0x480 default_do_nmi+0xb2/0x210 do_nmi+0x1aa/0x220 end_repeat_nmi+0x1a/0x1e <> __kernel_text_address+0x86/0xb0 print_context_stack+0x7b/0x100 dump_trace+0x12b/0x350 save_stack_trace+0x2b/0x50 set_track+0x83/0x140 free_debug_processing+0x1aa/0x420 __slab_free+0x1d6/0x2e0 ___cache_free+0xb6/0xd0 qlist_free_all+0x83/0x100 quarantine_reduce+0x177/0x1b0 kasan_kmalloc+0xf3/0x100 Reduce the quarantine_reduce iff direct reclaim is allowed. Fixes: 55834c59098d("mm: kasan: initial memory quarantine implementation") Link: http://lkml.kernel.org/r/1470062715-14077-2-git-send-email-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin Reported-by: Dave Jones Acked-by: Alexander Potapenko Cc: Dmitry Vyukov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/kasan/kasan.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'mm') diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c index 3019cecc0833..c99ef40ebdfa 100644 --- a/mm/kasan/kasan.c +++ b/mm/kasan/kasan.c @@ -565,7 +565,7 @@ void kasan_kmalloc(struct kmem_cache *cache, const void *object, size_t size, unsigned long redzone_start; unsigned long redzone_end; - if (flags & __GFP_RECLAIM) + if (gfpflags_allow_blocking(flags)) quarantine_reduce(); if (unlikely(object == NULL)) @@ -596,7 +596,7 @@ void kasan_kmalloc_large(const void *ptr, size_t size, gfp_t flags) unsigned long redzone_start; unsigned long redzone_end; - if (flags & __GFP_RECLAIM) + if (gfpflags_allow_blocking(flags)) quarantine_reduce(); if (unlikely(ptr == NULL)) -- cgit v1.2.3 From f7376aed6c032aab820fa36806a89e16e353a0d9 Mon Sep 17 00:00:00 2001 From: Andrey Ryabinin Date: Tue, 2 Aug 2016 14:02:46 -0700 Subject: mm/kasan, slub: don't disable interrupts when object leaves quarantine SLUB doesn't require disabled interrupts to call ___cache_free(). Link: http://lkml.kernel.org/r/1470062715-14077-3-git-send-email-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin Acked-by: Alexander Potapenko Cc: Dmitry Vyukov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/kasan/quarantine.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) (limited to 'mm') diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c index 65793f150d1f..4852625ff851 100644 --- a/mm/kasan/quarantine.c +++ b/mm/kasan/quarantine.c @@ -147,10 +147,14 @@ static void qlink_free(struct qlist_node *qlink, struct kmem_cache *cache) struct kasan_alloc_meta *alloc_info = get_alloc_info(cache, object); unsigned long flags; - local_irq_save(flags); + if (IS_ENABLED(CONFIG_SLAB)) + local_irq_save(flags); + alloc_info->state = KASAN_STATE_FREE; ___cache_free(cache, object, _THIS_IP_); - local_irq_restore(flags); + + if (IS_ENABLED(CONFIG_SLAB)) + local_irq_restore(flags); } static void qlist_free_all(struct qlist_head *q, struct kmem_cache *cache) -- cgit v1.2.3 From 47b5c2a0f021e90a79845d1a1353780e5edd0bce Mon Sep 17 00:00:00 2001 From: Andrey Ryabinin Date: Tue, 2 Aug 2016 14:02:49 -0700 Subject: mm/kasan: get rid of ->alloc_size in struct kasan_alloc_meta Size of slab object already stored in cache->object_size. Note, that kmalloc() internally rounds up size of allocation, so object_size may be not equal to alloc_size, but, usually we don't need to know the exact size of allocated object. In case if we need that information, we still can figure it out from the report. The dump of shadow memory allows to identify the end of allocated memory, and thereby the exact allocation size. Link: http://lkml.kernel.org/r/1470062715-14077-4-git-send-email-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin Cc: Alexander Potapenko Cc: Dmitry Vyukov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/kasan/kasan.c | 1 - mm/kasan/kasan.h | 3 +-- mm/kasan/report.c | 8 +++----- 3 files changed, 4 insertions(+), 8 deletions(-) (limited to 'mm') diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c index c99ef40ebdfa..388e812ccaca 100644 --- a/mm/kasan/kasan.c +++ b/mm/kasan/kasan.c @@ -584,7 +584,6 @@ void kasan_kmalloc(struct kmem_cache *cache, const void *object, size_t size, get_alloc_info(cache, object); alloc_info->state = KASAN_STATE_ALLOC; - alloc_info->alloc_size = size; set_track(&alloc_info->track, flags); } } diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h index 31972cdba433..aa175460c8f9 100644 --- a/mm/kasan/kasan.h +++ b/mm/kasan/kasan.h @@ -75,8 +75,7 @@ struct kasan_track { struct kasan_alloc_meta { struct kasan_track track; - u32 state : 2; /* enum kasan_state */ - u32 alloc_size : 30; + u32 state; }; struct qlist_node { diff --git a/mm/kasan/report.c b/mm/kasan/report.c index 861b9776841a..d67a7e020905 100644 --- a/mm/kasan/report.c +++ b/mm/kasan/report.c @@ -136,7 +136,9 @@ static void kasan_object_err(struct kmem_cache *cache, struct page *page, struct kasan_free_meta *free_info; dump_stack(); - pr_err("Object at %p, in cache %s\n", object, cache->name); + pr_err("Object at %p, in cache %s size: %d\n", object, cache->name, + cache->object_size); + if (!(cache->flags & SLAB_KASAN)) return; switch (alloc_info->state) { @@ -144,15 +146,11 @@ static void kasan_object_err(struct kmem_cache *cache, struct page *page, pr_err("Object not allocated yet\n"); break; case KASAN_STATE_ALLOC: - pr_err("Object allocated with size %u bytes.\n", - alloc_info->alloc_size); pr_err("Allocation:\n"); print_track(&alloc_info->track); break; case KASAN_STATE_FREE: case KASAN_STATE_QUARANTINE: - pr_err("Object freed, allocated with size %u bytes\n", - alloc_info->alloc_size); free_info = get_free_info(cache, object); pr_err("Allocation:\n"); print_track(&alloc_info->track); -- cgit v1.2.3 From b3cbd9bf77cd1888114dbee1653e79aa23fd4068 Mon Sep 17 00:00:00 2001 From: Andrey Ryabinin Date: Tue, 2 Aug 2016 14:02:52 -0700 Subject: mm/kasan: get rid of ->state in struct kasan_alloc_meta The state of object currently tracked in two places - shadow memory, and the ->state field in struct kasan_alloc_meta. We can get rid of the latter. The will save us a little bit of memory. Also, this allow us to move free stack into struct kasan_alloc_meta, without increasing memory consumption. So now we should always know when the last time the object was freed. This may be useful for long delayed use-after-free bugs. As a side effect this fixes following UBSAN warning: UBSAN: Undefined behaviour in mm/kasan/quarantine.c:102:13 member access within misaligned address ffff88000d1efebc for type 'struct qlist_node' which requires 8 byte alignment Link: http://lkml.kernel.org/r/1470062715-14077-5-git-send-email-aryabinin@virtuozzo.com Reported-by: kernel test robot Signed-off-by: Andrey Ryabinin Cc: Alexander Potapenko Cc: Dmitry Vyukov Cc: Christoph Lameter Cc: Pekka Enberg Cc: David Rientjes Cc: Joonsoo Kim Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/kasan.h | 3 +++ mm/kasan/kasan.c | 61 +++++++++++++++++++++++---------------------------- mm/kasan/kasan.h | 12 ++-------- mm/kasan/quarantine.c | 2 -- mm/kasan/report.c | 23 +++++-------------- mm/slab.c | 4 +++- mm/slub.c | 1 + 7 files changed, 42 insertions(+), 64 deletions(-) (limited to 'mm') diff --git a/include/linux/kasan.h b/include/linux/kasan.h index c9cf374445d8..d600303306eb 100644 --- a/include/linux/kasan.h +++ b/include/linux/kasan.h @@ -56,6 +56,7 @@ void kasan_cache_destroy(struct kmem_cache *cache); void kasan_poison_slab(struct page *page); void kasan_unpoison_object_data(struct kmem_cache *cache, void *object); void kasan_poison_object_data(struct kmem_cache *cache, void *object); +void kasan_init_slab_obj(struct kmem_cache *cache, const void *object); void kasan_kmalloc_large(const void *ptr, size_t size, gfp_t flags); void kasan_kfree_large(const void *ptr); @@ -102,6 +103,8 @@ static inline void kasan_unpoison_object_data(struct kmem_cache *cache, void *object) {} static inline void kasan_poison_object_data(struct kmem_cache *cache, void *object) {} +static inline void kasan_init_slab_obj(struct kmem_cache *cache, + const void *object) {} static inline void kasan_kmalloc_large(void *ptr, size_t size, gfp_t flags) {} static inline void kasan_kfree_large(const void *ptr) {} diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c index 388e812ccaca..92750e3b0083 100644 --- a/mm/kasan/kasan.c +++ b/mm/kasan/kasan.c @@ -442,11 +442,6 @@ void kasan_poison_object_data(struct kmem_cache *cache, void *object) kasan_poison_shadow(object, round_up(cache->object_size, KASAN_SHADOW_SCALE_SIZE), KASAN_KMALLOC_REDZONE); - if (cache->flags & SLAB_KASAN) { - struct kasan_alloc_meta *alloc_info = - get_alloc_info(cache, object); - alloc_info->state = KASAN_STATE_INIT; - } } static inline int in_irqentry_text(unsigned long ptr) @@ -510,6 +505,17 @@ struct kasan_free_meta *get_free_info(struct kmem_cache *cache, return (void *)object + cache->kasan_info.free_meta_offset; } +void kasan_init_slab_obj(struct kmem_cache *cache, const void *object) +{ + struct kasan_alloc_meta *alloc_info; + + if (!(cache->flags & SLAB_KASAN)) + return; + + alloc_info = get_alloc_info(cache, object); + __memset(alloc_info, 0, sizeof(*alloc_info)); +} + void kasan_slab_alloc(struct kmem_cache *cache, void *object, gfp_t flags) { kasan_kmalloc(cache, object, cache->object_size, flags); @@ -529,34 +535,27 @@ static void kasan_poison_slab_free(struct kmem_cache *cache, void *object) bool kasan_slab_free(struct kmem_cache *cache, void *object) { + s8 shadow_byte; + /* RCU slabs could be legally used after free within the RCU period */ if (unlikely(cache->flags & SLAB_DESTROY_BY_RCU)) return false; - if (likely(cache->flags & SLAB_KASAN)) { - struct kasan_alloc_meta *alloc_info; - struct kasan_free_meta *free_info; + shadow_byte = READ_ONCE(*(s8 *)kasan_mem_to_shadow(object)); + if (shadow_byte < 0 || shadow_byte >= KASAN_SHADOW_SCALE_SIZE) { + pr_err("Double free"); + dump_stack(); + return true; + } - alloc_info = get_alloc_info(cache, object); - free_info = get_free_info(cache, object); + kasan_poison_slab_free(cache, object); - switch (alloc_info->state) { - case KASAN_STATE_ALLOC: - alloc_info->state = KASAN_STATE_QUARANTINE; - set_track(&free_info->track, GFP_NOWAIT); - kasan_poison_slab_free(cache, object); - quarantine_put(free_info, cache); - return true; - case KASAN_STATE_QUARANTINE: - case KASAN_STATE_FREE: - pr_err("Double free"); - dump_stack(); - break; - default: - break; - } - } - return false; + if (unlikely(!(cache->flags & SLAB_KASAN))) + return false; + + set_track(&get_alloc_info(cache, object)->free_track, GFP_NOWAIT); + quarantine_put(get_free_info(cache, object), cache); + return true; } void kasan_kmalloc(struct kmem_cache *cache, const void *object, size_t size, @@ -579,13 +578,9 @@ void kasan_kmalloc(struct kmem_cache *cache, const void *object, size_t size, kasan_unpoison_shadow(object, size); kasan_poison_shadow((void *)redzone_start, redzone_end - redzone_start, KASAN_KMALLOC_REDZONE); - if (cache->flags & SLAB_KASAN) { - struct kasan_alloc_meta *alloc_info = - get_alloc_info(cache, object); - alloc_info->state = KASAN_STATE_ALLOC; - set_track(&alloc_info->track, flags); - } + if (cache->flags & SLAB_KASAN) + set_track(&get_alloc_info(cache, object)->alloc_track, flags); } EXPORT_SYMBOL(kasan_kmalloc); diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h index aa175460c8f9..9b7b31e25fd2 100644 --- a/mm/kasan/kasan.h +++ b/mm/kasan/kasan.h @@ -59,13 +59,6 @@ struct kasan_global { * Structures to keep alloc and free tracks * */ -enum kasan_state { - KASAN_STATE_INIT, - KASAN_STATE_ALLOC, - KASAN_STATE_QUARANTINE, - KASAN_STATE_FREE -}; - #define KASAN_STACK_DEPTH 64 struct kasan_track { @@ -74,8 +67,8 @@ struct kasan_track { }; struct kasan_alloc_meta { - struct kasan_track track; - u32 state; + struct kasan_track alloc_track; + struct kasan_track free_track; }; struct qlist_node { @@ -86,7 +79,6 @@ struct kasan_free_meta { * Otherwise it might be used for the allocator freelist. */ struct qlist_node quarantine_link; - struct kasan_track track; }; struct kasan_alloc_meta *get_alloc_info(struct kmem_cache *cache, diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c index 4852625ff851..7fd121d13b88 100644 --- a/mm/kasan/quarantine.c +++ b/mm/kasan/quarantine.c @@ -144,13 +144,11 @@ static void *qlink_to_object(struct qlist_node *qlink, struct kmem_cache *cache) static void qlink_free(struct qlist_node *qlink, struct kmem_cache *cache) { void *object = qlink_to_object(qlink, cache); - struct kasan_alloc_meta *alloc_info = get_alloc_info(cache, object); unsigned long flags; if (IS_ENABLED(CONFIG_SLAB)) local_irq_save(flags); - alloc_info->state = KASAN_STATE_FREE; ___cache_free(cache, object, _THIS_IP_); if (IS_ENABLED(CONFIG_SLAB)) diff --git a/mm/kasan/report.c b/mm/kasan/report.c index d67a7e020905..f437398b685a 100644 --- a/mm/kasan/report.c +++ b/mm/kasan/report.c @@ -133,7 +133,6 @@ static void kasan_object_err(struct kmem_cache *cache, struct page *page, void *object, char *unused_reason) { struct kasan_alloc_meta *alloc_info = get_alloc_info(cache, object); - struct kasan_free_meta *free_info; dump_stack(); pr_err("Object at %p, in cache %s size: %d\n", object, cache->name, @@ -141,23 +140,11 @@ static void kasan_object_err(struct kmem_cache *cache, struct page *page, if (!(cache->flags & SLAB_KASAN)) return; - switch (alloc_info->state) { - case KASAN_STATE_INIT: - pr_err("Object not allocated yet\n"); - break; - case KASAN_STATE_ALLOC: - pr_err("Allocation:\n"); - print_track(&alloc_info->track); - break; - case KASAN_STATE_FREE: - case KASAN_STATE_QUARANTINE: - free_info = get_free_info(cache, object); - pr_err("Allocation:\n"); - print_track(&alloc_info->track); - pr_err("Deallocation:\n"); - print_track(&free_info->track); - break; - } + + pr_err("Allocated:\n"); + print_track(&alloc_info->alloc_track); + pr_err("Freed:\n"); + print_track(&alloc_info->free_track); } static void print_address_description(struct kasan_access_info *info) diff --git a/mm/slab.c b/mm/slab.c index 09771ed3e693..ca135bd47c35 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -2604,9 +2604,11 @@ static void cache_init_objs(struct kmem_cache *cachep, } for (i = 0; i < cachep->num; i++) { + objp = index_to_obj(cachep, page, i); + kasan_init_slab_obj(cachep, objp); + /* constructor could break poison info */ if (DEBUG == 0 && cachep->ctor) { - objp = index_to_obj(cachep, page, i); kasan_unpoison_object_data(cachep, objp); cachep->ctor(objp); kasan_poison_object_data(cachep, objp); diff --git a/mm/slub.c b/mm/slub.c index 74e7c8c30db8..26eb6a99540e 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -1384,6 +1384,7 @@ static void setup_object(struct kmem_cache *s, struct page *page, void *object) { setup_object_debug(s, page, object); + kasan_init_slab_obj(s, object); if (unlikely(s->ctor)) { kasan_unpoison_object_data(s, object); s->ctor(object); -- cgit v1.2.3 From 7e088978933ee186533355ae03a9dc1de99cf6c7 Mon Sep 17 00:00:00 2001 From: Andrey Ryabinin Date: Tue, 2 Aug 2016 14:02:55 -0700 Subject: kasan: improve double-free reports Currently we just dump stack in case of double free bug. Let's dump all info about the object that we have. [aryabinin@virtuozzo.com: change double free message per Alexander] Link: http://lkml.kernel.org/r/1470153654-30160-1-git-send-email-aryabinin@virtuozzo.com Link: http://lkml.kernel.org/r/1470062715-14077-6-git-send-email-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin Cc: Alexander Potapenko Cc: Dmitry Vyukov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/kasan/kasan.c | 3 +-- mm/kasan/kasan.h | 2 ++ mm/kasan/report.c | 54 ++++++++++++++++++++++++++++++++++++++---------------- 3 files changed, 41 insertions(+), 18 deletions(-) (limited to 'mm') diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c index 92750e3b0083..88af13c00d3c 100644 --- a/mm/kasan/kasan.c +++ b/mm/kasan/kasan.c @@ -543,8 +543,7 @@ bool kasan_slab_free(struct kmem_cache *cache, void *object) shadow_byte = READ_ONCE(*(s8 *)kasan_mem_to_shadow(object)); if (shadow_byte < 0 || shadow_byte >= KASAN_SHADOW_SCALE_SIZE) { - pr_err("Double free"); - dump_stack(); + kasan_report_double_free(cache, object, shadow_byte); return true; } diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h index 9b7b31e25fd2..e5c2181fee6f 100644 --- a/mm/kasan/kasan.h +++ b/mm/kasan/kasan.h @@ -99,6 +99,8 @@ static inline bool kasan_report_enabled(void) void kasan_report(unsigned long addr, size_t size, bool is_write, unsigned long ip); +void kasan_report_double_free(struct kmem_cache *cache, void *object, + s8 shadow); #if defined(CONFIG_SLAB) || defined(CONFIG_SLUB) void quarantine_put(struct kasan_free_meta *info, struct kmem_cache *cache); diff --git a/mm/kasan/report.c b/mm/kasan/report.c index f437398b685a..24c1211fe9d5 100644 --- a/mm/kasan/report.c +++ b/mm/kasan/report.c @@ -116,6 +116,26 @@ static inline bool init_task_stack_addr(const void *addr) sizeof(init_thread_union.stack)); } +static DEFINE_SPINLOCK(report_lock); + +static void kasan_start_report(unsigned long *flags) +{ + /* + * Make sure we don't end up in loop. + */ + kasan_disable_current(); + spin_lock_irqsave(&report_lock, *flags); + pr_err("==================================================================\n"); +} + +static void kasan_end_report(unsigned long *flags) +{ + pr_err("==================================================================\n"); + add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE); + spin_unlock_irqrestore(&report_lock, *flags); + kasan_enable_current(); +} + static void print_track(struct kasan_track *track) { pr_err("PID = %u\n", track->pid); @@ -129,8 +149,7 @@ static void print_track(struct kasan_track *track) } } -static void kasan_object_err(struct kmem_cache *cache, struct page *page, - void *object, char *unused_reason) +static void kasan_object_err(struct kmem_cache *cache, void *object) { struct kasan_alloc_meta *alloc_info = get_alloc_info(cache, object); @@ -147,6 +166,18 @@ static void kasan_object_err(struct kmem_cache *cache, struct page *page, print_track(&alloc_info->free_track); } +void kasan_report_double_free(struct kmem_cache *cache, void *object, + s8 shadow) +{ + unsigned long flags; + + kasan_start_report(&flags); + pr_err("BUG: Double free or freeing an invalid pointer\n"); + pr_err("Unexpected shadow byte: 0x%hhX\n", shadow); + kasan_object_err(cache, object); + kasan_end_report(&flags); +} + static void print_address_description(struct kasan_access_info *info) { const void *addr = info->access_addr; @@ -160,8 +191,7 @@ static void print_address_description(struct kasan_access_info *info) struct kmem_cache *cache = page->slab_cache; object = nearest_obj(cache, page, (void *)info->access_addr); - kasan_object_err(cache, page, object, - "kasan: bad access detected"); + kasan_object_err(cache, object); return; } dump_page(page, "kasan: bad access detected"); @@ -226,19 +256,13 @@ static void print_shadow_for_address(const void *addr) } } -static DEFINE_SPINLOCK(report_lock); - static void kasan_report_error(struct kasan_access_info *info) { unsigned long flags; const char *bug_type; - /* - * Make sure we don't end up in loop. - */ - kasan_disable_current(); - spin_lock_irqsave(&report_lock, flags); - pr_err("==================================================================\n"); + kasan_start_report(&flags); + if (info->access_addr < kasan_shadow_to_mem((void *)KASAN_SHADOW_START)) { if ((unsigned long)info->access_addr < PAGE_SIZE) @@ -259,10 +283,8 @@ static void kasan_report_error(struct kasan_access_info *info) print_address_description(info); print_shadow_for_address(info->first_bad_addr); } - pr_err("==================================================================\n"); - add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE); - spin_unlock_irqrestore(&report_lock, flags); - kasan_enable_current(); + + kasan_end_report(&flags); } void kasan_report(unsigned long addr, size_t size, -- cgit v1.2.3 From c3cee372282cb6bcdf19ac1457581d5dd5ecb554 Mon Sep 17 00:00:00 2001 From: Alexander Potapenko Date: Tue, 2 Aug 2016 14:02:58 -0700 Subject: kasan: avoid overflowing quarantine size on low memory systems If the total amount of memory assigned to quarantine is less than the amount of memory assigned to per-cpu quarantines, |new_quarantine_size| may overflow. Instead, set it to zero. [akpm@linux-foundation.org: cleanup: use WARN_ONCE return value] Link: http://lkml.kernel.org/r/1470063563-96266-1-git-send-email-glider@google.com Fixes: 55834c59098d ("mm: kasan: initial memory quarantine implementation") Signed-off-by: Alexander Potapenko Reported-by: Dmitry Vyukov Cc: Andrey Ryabinin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/kasan/quarantine.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) (limited to 'mm') diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c index 7fd121d13b88..b6728a33a4ac 100644 --- a/mm/kasan/quarantine.c +++ b/mm/kasan/quarantine.c @@ -198,7 +198,7 @@ void quarantine_put(struct kasan_free_meta *info, struct kmem_cache *cache) void quarantine_reduce(void) { - size_t new_quarantine_size; + size_t new_quarantine_size, percpu_quarantines; unsigned long flags; struct qlist_head to_free = QLIST_INIT; size_t size_to_free = 0; @@ -216,7 +216,12 @@ void quarantine_reduce(void) */ new_quarantine_size = (READ_ONCE(totalram_pages) << PAGE_SHIFT) / QUARANTINE_FRACTION; - new_quarantine_size -= QUARANTINE_PERCPU_SIZE * num_online_cpus(); + percpu_quarantines = QUARANTINE_PERCPU_SIZE * num_online_cpus(); + if (WARN_ONCE(new_quarantine_size < percpu_quarantines, + "Too little memory, disabling global KASAN quarantine.\n")) + new_quarantine_size = 0; + else + new_quarantine_size -= percpu_quarantines; WRITE_ONCE(quarantine_size, new_quarantine_size); last = global_quarantine.head; -- cgit v1.2.3 From b5afba2974f9ebbaaa11b5a633e55db9be3cc363 Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Tue, 2 Aug 2016 14:03:04 -0700 Subject: mm: vmscan: fix memcg-aware shrinkers not called on global reclaim We must call shrink_slab() for each memory cgroup on both global and memcg reclaim in shrink_node_memcg(). Commit d71df22b55099 accidentally changed that so that now shrink_slab() is only called with memcg != NULL on memcg reclaim. As a result, memcg-aware shrinkers (including dentry/inode) are never invoked on global reclaim. Fix that. Fixes: b2e18757f2c9 ("mm, vmscan: begin reclaiming pages on a per-node basis") Link: http://lkml.kernel.org/r/1470056590-7177-1-git-send-email-vdavydov@virtuozzo.com Signed-off-by: Vladimir Davydov Acked-by: Johannes Weiner Acked-by: Michal Hocko Cc: Mel Gorman Cc: Hillf Danton Cc: Vlastimil Babka Cc: Joonsoo Kim Cc: Minchan Kim Cc: Rik van Riel Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/vmscan.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'mm') diff --git a/mm/vmscan.c b/mm/vmscan.c index 650d26832569..374d95d04178 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2561,7 +2561,7 @@ static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc) shrink_node_memcg(pgdat, memcg, sc, &lru_pages); node_lru_pages += lru_pages; - if (!global_reclaim(sc)) + if (memcg) shrink_slab(sc->gfp_mask, pgdat->node_id, memcg, sc->nr_scanned - scanned, lru_pages); -- cgit v1.2.3 From bd721ea73e1f965569b40620538c942001f76294 Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Tue, 2 Aug 2016 14:03:33 -0700 Subject: treewide: replace obsolete _refok by __ref There was only one use of __initdata_refok and __exit_refok __init_refok was used 46 times against 82 for __ref. Those definitions are obsolete since commit 312b1485fb50 ("Introduce new section reference annotations tags: __ref, __refdata, __refconst") This patch removes the following compatibility definitions and replaces them treewide. /* compatibility defines */ #define __init_refok __ref #define __initdata_refok __refdata #define __exit_refok __ref I can also provide separate patches if necessary. (One patch per tree and check in 1 month or 2 to remove old definitions) [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/1466796271-3043-1-git-send-email-fabf@skynet.be Signed-off-by: Fabian Frederick Cc: Ingo Molnar Cc: Sam Ravnborg Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/alpha/kernel/machvec_impl.h | 2 +- arch/arc/mm/init.c | 2 +- arch/arm/mach-integrator/impd1.c | 4 ++-- arch/arm/mach-mv78xx0/common.c | 2 +- arch/blackfin/mm/init.c | 2 +- arch/hexagon/mm/init.c | 2 +- arch/ia64/kernel/mca.c | 2 +- arch/microblaze/mm/init.c | 4 ++-- arch/microblaze/mm/pgtable.c | 2 +- arch/mips/mm/init.c | 2 +- arch/mips/txx9/generic/pci.c | 2 +- arch/nios2/mm/init.c | 2 +- arch/openrisc/mm/ioremap.c | 4 ++-- arch/powerpc/lib/alloc.c | 2 +- arch/powerpc/mm/pgtable_32.c | 2 +- arch/powerpc/platforms/powermac/setup.c | 4 ++-- arch/powerpc/platforms/ps3/device-init.c | 2 +- arch/powerpc/sysdev/msi_bitmap.c | 2 +- arch/score/mm/init.c | 2 +- arch/sh/drivers/pci/pci.c | 4 ++-- arch/sh/mm/ioremap.c | 2 +- arch/x86/mm/init.c | 4 ++-- arch/x86/platform/efi/early_printk.c | 4 ++-- drivers/acpi/osl.c | 5 ++--- drivers/base/node.c | 2 +- drivers/clk/clkdev.c | 4 ++-- drivers/pci/xen-pcifront.c | 2 +- drivers/video/logo/logo.c | 4 ++-- include/acpi/acpi_io.h | 2 +- include/linux/init.h | 6 ------ include/net/net_namespace.h | 2 +- init/main.c | 2 +- mm/page_alloc.c | 4 ++-- mm/slab.c | 2 +- mm/sparse-vmemmap.c | 2 +- mm/sparse.c | 2 +- 36 files changed, 46 insertions(+), 53 deletions(-) (limited to 'mm') diff --git a/arch/alpha/kernel/machvec_impl.h b/arch/alpha/kernel/machvec_impl.h index f54bdf658cd0..d3398f6ab74c 100644 --- a/arch/alpha/kernel/machvec_impl.h +++ b/arch/alpha/kernel/machvec_impl.h @@ -137,7 +137,7 @@ #define __initmv __initdata #define ALIAS_MV(x) #else -#define __initmv __initdata_refok +#define __initmv __refdata /* GCC actually has a syntax for defining aliases, but is under some delusion that you shouldn't be able to declare it extern somewhere diff --git a/arch/arc/mm/init.c b/arch/arc/mm/init.c index 8be930394750..399e2f223d25 100644 --- a/arch/arc/mm/init.c +++ b/arch/arc/mm/init.c @@ -220,7 +220,7 @@ void __init mem_init(void) /* * free_initmem: Free all the __init memory. */ -void __init_refok free_initmem(void) +void __ref free_initmem(void) { free_initmem_default(-1); } diff --git a/arch/arm/mach-integrator/impd1.c b/arch/arm/mach-integrator/impd1.c index 38b0da300dd5..ed9a01484030 100644 --- a/arch/arm/mach-integrator/impd1.c +++ b/arch/arm/mach-integrator/impd1.c @@ -320,11 +320,11 @@ static struct impd1_device impd1_devs[] = { #define IMPD1_VALID_IRQS 0x00000bffU /* - * As this module is bool, it is OK to have this as __init_refok() - no + * As this module is bool, it is OK to have this as __ref() - no * probe calls will be done after the initial system bootup, as devices * are discovered as part of the machine startup. */ -static int __init_refok impd1_probe(struct lm_device *dev) +static int __ref impd1_probe(struct lm_device *dev) { struct impd1_module *impd1; int irq_base; diff --git a/arch/arm/mach-mv78xx0/common.c b/arch/arm/mach-mv78xx0/common.c index 45a05207b418..6af5430d0d97 100644 --- a/arch/arm/mach-mv78xx0/common.c +++ b/arch/arm/mach-mv78xx0/common.c @@ -343,7 +343,7 @@ void __init mv78xx0_init_early(void) DDR_WINDOW_CPU1_BASE, DDR_WINDOW_CPU_SZ); } -void __init_refok mv78xx0_timer_init(void) +void __ref mv78xx0_timer_init(void) { orion_time_init(BRIDGE_VIRT_BASE, BRIDGE_INT_TIMER1_CLR, IRQ_MV78XX0_TIMER_1, get_tclk()); diff --git a/arch/blackfin/mm/init.c b/arch/blackfin/mm/init.c index 166842de3dc7..b59cd7c3261a 100644 --- a/arch/blackfin/mm/init.c +++ b/arch/blackfin/mm/init.c @@ -112,7 +112,7 @@ void __init free_initrd_mem(unsigned long start, unsigned long end) } #endif -void __init_refok free_initmem(void) +void __ref free_initmem(void) { #if defined CONFIG_RAMKERNEL && !defined CONFIG_MPU free_initmem_default(-1); diff --git a/arch/hexagon/mm/init.c b/arch/hexagon/mm/init.c index 88977e42af0a..192584d5ac2f 100644 --- a/arch/hexagon/mm/init.c +++ b/arch/hexagon/mm/init.c @@ -93,7 +93,7 @@ void __init mem_init(void) * Todo: free pages between __init_begin and __init_end; possibly * some devtree related stuff as well. */ -void __init_refok free_initmem(void) +void __ref free_initmem(void) { } diff --git a/arch/ia64/kernel/mca.c b/arch/ia64/kernel/mca.c index 07a4e32ae96a..eb9220cde76c 100644 --- a/arch/ia64/kernel/mca.c +++ b/arch/ia64/kernel/mca.c @@ -1831,7 +1831,7 @@ format_mca_init_stack(void *mca_data, unsigned long offset, } /* Caller prevents this from being called after init */ -static void * __init_refok mca_bootmem(void) +static void * __ref mca_bootmem(void) { return __alloc_bootmem(sizeof(struct ia64_mca_cpu), KERNEL_STACK_SIZE, 0); diff --git a/arch/microblaze/mm/init.c b/arch/microblaze/mm/init.c index 77bc7c7e6522..434639f9a3a6 100644 --- a/arch/microblaze/mm/init.c +++ b/arch/microblaze/mm/init.c @@ -414,7 +414,7 @@ void __init *early_get_page(void) #endif /* CONFIG_MMU */ -void * __init_refok alloc_maybe_bootmem(size_t size, gfp_t mask) +void * __ref alloc_maybe_bootmem(size_t size, gfp_t mask) { if (mem_init_done) return kmalloc(size, mask); @@ -422,7 +422,7 @@ void * __init_refok alloc_maybe_bootmem(size_t size, gfp_t mask) return alloc_bootmem(size); } -void * __init_refok zalloc_maybe_bootmem(size_t size, gfp_t mask) +void * __ref zalloc_maybe_bootmem(size_t size, gfp_t mask) { void *p; diff --git a/arch/microblaze/mm/pgtable.c b/arch/microblaze/mm/pgtable.c index eb99fcc76088..cc732fe357ad 100644 --- a/arch/microblaze/mm/pgtable.c +++ b/arch/microblaze/mm/pgtable.c @@ -234,7 +234,7 @@ unsigned long iopa(unsigned long addr) return pa; } -__init_refok pte_t *pte_alloc_one_kernel(struct mm_struct *mm, +__ref pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address) { pte_t *pte; diff --git a/arch/mips/mm/init.c b/arch/mips/mm/init.c index 9b58eb5fd0d5..a5509e7dcad2 100644 --- a/arch/mips/mm/init.c +++ b/arch/mips/mm/init.c @@ -504,7 +504,7 @@ void free_initrd_mem(unsigned long start, unsigned long end) void (*free_init_pages_eva)(void *begin, void *end) = NULL; -void __init_refok free_initmem(void) +void __ref free_initmem(void) { prom_free_prom_memory(); /* diff --git a/arch/mips/txx9/generic/pci.c b/arch/mips/txx9/generic/pci.c index a77698ff2b6f..1f6bc9a3036c 100644 --- a/arch/mips/txx9/generic/pci.c +++ b/arch/mips/txx9/generic/pci.c @@ -268,7 +268,7 @@ static int txx9_i8259_irq_setup(int irq) return err; } -static void __init_refok quirk_slc90e66_bridge(struct pci_dev *dev) +static void __ref quirk_slc90e66_bridge(struct pci_dev *dev) { int irq; /* PCI/ISA Bridge interrupt */ u8 reg_64; diff --git a/arch/nios2/mm/init.c b/arch/nios2/mm/init.c index e75c75d249d6..c92fe4234009 100644 --- a/arch/nios2/mm/init.c +++ b/arch/nios2/mm/init.c @@ -89,7 +89,7 @@ void __init free_initrd_mem(unsigned long start, unsigned long end) } #endif -void __init_refok free_initmem(void) +void __ref free_initmem(void) { free_initmem_default(-1); } diff --git a/arch/openrisc/mm/ioremap.c b/arch/openrisc/mm/ioremap.c index 5b2a95116e8f..fa60b81aee3e 100644 --- a/arch/openrisc/mm/ioremap.c +++ b/arch/openrisc/mm/ioremap.c @@ -38,7 +38,7 @@ static unsigned int fixmaps_used __initdata; * have to convert them into an offset in a page-aligned mapping, but the * caller shouldn't need to know that small detail. */ -void __iomem *__init_refok +void __iomem *__ref __ioremap(phys_addr_t addr, unsigned long size, pgprot_t prot) { phys_addr_t p; @@ -116,7 +116,7 @@ void iounmap(void *addr) * the memblock infrastructure. */ -pte_t __init_refok *pte_alloc_one_kernel(struct mm_struct *mm, +pte_t __ref *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address) { pte_t *pte; diff --git a/arch/powerpc/lib/alloc.c b/arch/powerpc/lib/alloc.c index 60b0b3fc8fc1..a58abe4afbd1 100644 --- a/arch/powerpc/lib/alloc.c +++ b/arch/powerpc/lib/alloc.c @@ -6,7 +6,7 @@ #include -void * __init_refok zalloc_maybe_bootmem(size_t size, gfp_t mask) +void * __ref zalloc_maybe_bootmem(size_t size, gfp_t mask) { void *p; diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c index 7f922f557936..0ae0572bc239 100644 --- a/arch/powerpc/mm/pgtable_32.c +++ b/arch/powerpc/mm/pgtable_32.c @@ -79,7 +79,7 @@ void pgd_free(struct mm_struct *mm, pgd_t *pgd) #endif } -__init_refok pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address) +__ref pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address) { pte_t *pte; diff --git a/arch/powerpc/platforms/powermac/setup.c b/arch/powerpc/platforms/powermac/setup.c index 3de4a7c85140..6b4e9d181126 100644 --- a/arch/powerpc/platforms/powermac/setup.c +++ b/arch/powerpc/platforms/powermac/setup.c @@ -353,12 +353,12 @@ static int pmac_late_init(void) machine_late_initcall(powermac, pmac_late_init); /* - * This is __init_refok because we check for "initializing" before + * This is __ref because we check for "initializing" before * touching any of the __init sensitive things and "initializing" * will be false after __init time. This can't be __init because it * can be called whenever a disk is first accessed. */ -void __init_refok note_bootable_part(dev_t dev, int part, int goodness) +void __ref note_bootable_part(dev_t dev, int part, int goodness) { char *p; diff --git a/arch/powerpc/platforms/ps3/device-init.c b/arch/powerpc/platforms/ps3/device-init.c index 3f175e8aedb4..57caaf11a83f 100644 --- a/arch/powerpc/platforms/ps3/device-init.c +++ b/arch/powerpc/platforms/ps3/device-init.c @@ -189,7 +189,7 @@ fail_malloc: return result; } -static int __init_refok ps3_setup_uhc_device( +static int __ref ps3_setup_uhc_device( const struct ps3_repository_device *repo, enum ps3_match_id match_id, enum ps3_interrupt_type interrupt_type, enum ps3_reg_type reg_type) { diff --git a/arch/powerpc/sysdev/msi_bitmap.c b/arch/powerpc/sysdev/msi_bitmap.c index ed5234ed8d3f..5ebd3f018295 100644 --- a/arch/powerpc/sysdev/msi_bitmap.c +++ b/arch/powerpc/sysdev/msi_bitmap.c @@ -112,7 +112,7 @@ int msi_bitmap_reserve_dt_hwirqs(struct msi_bitmap *bmp) return 0; } -int __init_refok msi_bitmap_alloc(struct msi_bitmap *bmp, unsigned int irq_count, +int __ref msi_bitmap_alloc(struct msi_bitmap *bmp, unsigned int irq_count, struct device_node *of_node) { int size; diff --git a/arch/score/mm/init.c b/arch/score/mm/init.c index 9fbce49ad3bd..444c26c0f750 100644 --- a/arch/score/mm/init.c +++ b/arch/score/mm/init.c @@ -91,7 +91,7 @@ void free_initrd_mem(unsigned long start, unsigned long end) } #endif -void __init_refok free_initmem(void) +void __ref free_initmem(void) { free_initmem_default(POISON_FREE_INITMEM); } diff --git a/arch/sh/drivers/pci/pci.c b/arch/sh/drivers/pci/pci.c index d5462b7bc514..84563e39a5b8 100644 --- a/arch/sh/drivers/pci/pci.c +++ b/arch/sh/drivers/pci/pci.c @@ -221,7 +221,7 @@ pcibios_bus_report_status_early(struct pci_channel *hose, * We can't use pci_find_device() here since we are * called from interrupt context. */ -static void __init_refok +static void __ref pcibios_bus_report_status(struct pci_bus *bus, unsigned int status_mask, int warn) { @@ -256,7 +256,7 @@ pcibios_bus_report_status(struct pci_bus *bus, unsigned int status_mask, pcibios_bus_report_status(dev->subordinate, status_mask, warn); } -void __init_refok pcibios_report_status(unsigned int status_mask, int warn) +void __ref pcibios_report_status(unsigned int status_mask, int warn) { struct pci_channel *hose; diff --git a/arch/sh/mm/ioremap.c b/arch/sh/mm/ioremap.c index 0c99ec2e7ed8..d09ddfe58fd8 100644 --- a/arch/sh/mm/ioremap.c +++ b/arch/sh/mm/ioremap.c @@ -34,7 +34,7 @@ * have to convert them into an offset in a page-aligned mapping, but the * caller shouldn't need to know that small detail. */ -void __iomem * __init_refok +void __iomem * __ref __ioremap_caller(phys_addr_t phys_addr, unsigned long size, pgprot_t pgprot, void *caller) { diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index fb4c1b42fc7e..620928903be3 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -208,7 +208,7 @@ static int __meminit save_mr(struct map_range *mr, int nr_range, * adjust the page_size_mask for small range to go with * big page size instead small one if nearby are ram too. */ -static void __init_refok adjust_range_page_size_mask(struct map_range *mr, +static void __ref adjust_range_page_size_mask(struct map_range *mr, int nr_range) { int i; @@ -396,7 +396,7 @@ bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn) * This runs before bootmem is initialized and gets pages directly from * the physical memory. To access them they are temporarily mapped. */ -unsigned long __init_refok init_memory_mapping(unsigned long start, +unsigned long __ref init_memory_mapping(unsigned long start, unsigned long end) { struct map_range mr[NR_RANGE_MR]; diff --git a/arch/x86/platform/efi/early_printk.c b/arch/x86/platform/efi/early_printk.c index 524142117296..5fdacb322ceb 100644 --- a/arch/x86/platform/efi/early_printk.c +++ b/arch/x86/platform/efi/early_printk.c @@ -44,7 +44,7 @@ early_initcall(early_efi_map_fb); * In case earlyprintk=efi,keep we have the whole framebuffer mapped already * so just return the offset efi_fb + start. */ -static __init_refok void *early_efi_map(unsigned long start, unsigned long len) +static __ref void *early_efi_map(unsigned long start, unsigned long len) { unsigned long base; @@ -56,7 +56,7 @@ static __init_refok void *early_efi_map(unsigned long start, unsigned long len) return early_ioremap(base + start, len); } -static __init_refok void early_efi_unmap(void *addr, unsigned long len) +static __ref void early_efi_unmap(void *addr, unsigned long len) { if (!efi_fb) early_iounmap(addr, len); diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c index b108f1358a32..4305ee9db4b2 100644 --- a/drivers/acpi/osl.c +++ b/drivers/acpi/osl.c @@ -309,7 +309,7 @@ static void acpi_unmap(acpi_physical_address pg_off, void __iomem *vaddr) * During early init (when acpi_gbl_permanent_mmap has not been set yet) this * routine simply calls __acpi_map_table() to get the job done. */ -void __iomem *__init_refok +void __iomem *__ref acpi_os_map_iomem(acpi_physical_address phys, acpi_size size) { struct acpi_ioremap *map; @@ -362,8 +362,7 @@ out: } EXPORT_SYMBOL_GPL(acpi_os_map_iomem); -void *__init_refok -acpi_os_map_memory(acpi_physical_address phys, acpi_size size) +void *__ref acpi_os_map_memory(acpi_physical_address phys, acpi_size size) { return (void *)acpi_os_map_iomem(phys, size); } diff --git a/drivers/base/node.c b/drivers/base/node.c index 29cd96661b30..5548f9686016 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -370,7 +370,7 @@ int unregister_cpu_under_node(unsigned int cpu, unsigned int nid) #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE #define page_initialized(page) (page->lru.next) -static int __init_refok get_nid_for_pfn(unsigned long pfn) +static int __ref get_nid_for_pfn(unsigned long pfn) { struct page *page; diff --git a/drivers/clk/clkdev.c b/drivers/clk/clkdev.c index 89cc700fbc37..97ae60fa1584 100644 --- a/drivers/clk/clkdev.c +++ b/drivers/clk/clkdev.c @@ -250,7 +250,7 @@ struct clk_lookup_alloc { char con_id[MAX_CON_ID]; }; -static struct clk_lookup * __init_refok +static struct clk_lookup * __ref vclkdev_alloc(struct clk_hw *hw, const char *con_id, const char *dev_fmt, va_list ap) { @@ -287,7 +287,7 @@ vclkdev_create(struct clk_hw *hw, const char *con_id, const char *dev_fmt, return cl; } -struct clk_lookup * __init_refok +struct clk_lookup * __ref clkdev_alloc(struct clk *clk, const char *con_id, const char *dev_fmt, ...) { struct clk_lookup *cl; diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c index 5f70fee59a94..d6ff5e82377d 100644 --- a/drivers/pci/xen-pcifront.c +++ b/drivers/pci/xen-pcifront.c @@ -1086,7 +1086,7 @@ out: return err; } -static void __init_refok pcifront_backend_changed(struct xenbus_device *xdev, +static void __ref pcifront_backend_changed(struct xenbus_device *xdev, enum xenbus_state be_state) { struct pcifront_device *pdev = dev_get_drvdata(&xdev->dev); diff --git a/drivers/video/logo/logo.c b/drivers/video/logo/logo.c index 10fbfd8ab963..b6bc4a0bda2a 100644 --- a/drivers/video/logo/logo.c +++ b/drivers/video/logo/logo.c @@ -36,11 +36,11 @@ static int __init fb_logo_late_init(void) late_initcall(fb_logo_late_init); -/* logo's are marked __initdata. Use __init_refok to tell +/* logo's are marked __initdata. Use __ref to tell * modpost that it is intended that this function uses data * marked __initdata. */ -const struct linux_logo * __init_refok fb_find_logo(int depth) +const struct linux_logo * __ref fb_find_logo(int depth) { const struct linux_logo *logo = NULL; diff --git a/include/acpi/acpi_io.h b/include/acpi/acpi_io.h index dd86c5fc102d..d7d0f495a34e 100644 --- a/include/acpi/acpi_io.h +++ b/include/acpi/acpi_io.h @@ -13,7 +13,7 @@ static inline void __iomem *acpi_os_ioremap(acpi_physical_address phys, } #endif -void __iomem *__init_refok +void __iomem *__ref acpi_os_map_iomem(acpi_physical_address phys, acpi_size size); void __ref acpi_os_unmap_iomem(void __iomem *virt, acpi_size size); void __iomem *acpi_os_get_iomem(acpi_physical_address phys, unsigned int size); diff --git a/include/linux/init.h b/include/linux/init.h index aedb254abc37..6935d02474aa 100644 --- a/include/linux/init.h +++ b/include/linux/init.h @@ -77,12 +77,6 @@ #define __refdata __section(.ref.data) #define __refconst __constsection(.ref.rodata) -/* compatibility defines */ -#define __init_refok __ref -#define __initdata_refok __refdata -#define __exit_refok __ref - - #ifdef MODULE #define __exitused #else diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index 4089abc6e9c0..0933c7455a30 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -275,7 +275,7 @@ static inline struct net *read_pnet(const possible_net_t *pnet) #define __net_initconst #else #define __net_init __init -#define __net_exit __exit_refok +#define __net_exit __ref #define __net_initdata __initdata #define __net_initconst __initconst #endif diff --git a/init/main.c b/init/main.c index eae02aa03c9e..e7345dcaaf05 100644 --- a/init/main.c +++ b/init/main.c @@ -380,7 +380,7 @@ static void __init setup_command_line(char *command_line) static __initdata DECLARE_COMPLETION(kthreadd_done); -static noinline void __init_refok rest_init(void) +static noinline void __ref rest_init(void) { int pid; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index ea759b935360..39a372a2a1d6 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5276,7 +5276,7 @@ void __init setup_per_cpu_pageset(void) setup_zone_pageset(zone); } -static noinline __init_refok +static noinline __ref int zone_wait_table_init(struct zone *zone, unsigned long zone_size_pages) { int i; @@ -5903,7 +5903,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat) } } -static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat) +static void __ref alloc_node_mem_map(struct pglist_data *pgdat) { unsigned long __maybe_unused start = 0; unsigned long __maybe_unused offset = 0; diff --git a/mm/slab.c b/mm/slab.c index ca135bd47c35..261147ba156f 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -1877,7 +1877,7 @@ static struct array_cache __percpu *alloc_kmem_cache_cpus( return cpu_cache; } -static int __init_refok setup_cpu_cache(struct kmem_cache *cachep, gfp_t gfp) +static int __ref setup_cpu_cache(struct kmem_cache *cachep, gfp_t gfp) { if (slab_state >= FULL) return enable_cpucache(cachep, gfp); diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c index 68885dcbaf40..574c67b663fe 100644 --- a/mm/sparse-vmemmap.c +++ b/mm/sparse-vmemmap.c @@ -36,7 +36,7 @@ * Uses the main allocators if they are available, else bootmem. */ -static void * __init_refok __earlyonly_bootmem_alloc(int node, +static void * __ref __earlyonly_bootmem_alloc(int node, unsigned long size, unsigned long align, unsigned long goal) diff --git a/mm/sparse.c b/mm/sparse.c index 36d7bbb80e49..1e168bf2779a 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -59,7 +59,7 @@ static inline void set_section_nid(unsigned long section_nr, int nid) #endif #ifdef CONFIG_SPARSEMEM_EXTREME -static struct mem_section noinline __init_refok *sparse_index_alloc(int nid) +static noinline struct mem_section __ref *sparse_index_alloc(int nid) { struct mem_section *section = NULL; unsigned long array_size = SECTIONS_PER_ROOT * -- cgit v1.2.3 From ba093a6d9397da8eafcfbaa7d95bd34255da39a0 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Tue, 2 Aug 2016 14:04:54 -0700 Subject: mm: refuse wrapped vm_brk requests The vm_brk() alignment calculations should refuse to overflow. The ELF loader depending on this, but it has been fixed now. No other unsafe callers have been found. Link: http://lkml.kernel.org/r/1468014494-25291-3-git-send-email-keescook@chromium.org Signed-off-by: Kees Cook Reported-by: Hector Marco-Gisbert Cc: Ismael Ripoll Ripoll Cc: Alexander Viro Cc: "Kirill A. Shutemov" Cc: Oleg Nesterov Cc: Chen Gang Cc: Michal Hocko Cc: Konstantin Khlebnikov Cc: Andrea Arcangeli Cc: Andrey Ryabinin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/mmap.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) (limited to 'mm') diff --git a/mm/mmap.c b/mm/mmap.c index d44bee96a5fe..ca9d91bca0d6 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -2653,16 +2653,18 @@ static inline void verify_mm_writelocked(struct mm_struct *mm) * anonymous maps. eventually we may be able to do some * brk-specific accounting here. */ -static int do_brk(unsigned long addr, unsigned long len) +static int do_brk(unsigned long addr, unsigned long request) { struct mm_struct *mm = current->mm; struct vm_area_struct *vma, *prev; - unsigned long flags; + unsigned long flags, len; struct rb_node **rb_link, *rb_parent; pgoff_t pgoff = addr >> PAGE_SHIFT; int error; - len = PAGE_ALIGN(len); + len = PAGE_ALIGN(request); + if (len < request) + return -ENOMEM; if (!len) return 0; -- cgit v1.2.3