summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2020-04-23include/linux/swapops.h: correct guards for non_swap_entry()Steven Price1-1/+2
[ Upstream commit 3f3673d7d324d872d9d8ddb73b3e5e47fbf12e0d ] If CONFIG_DEVICE_PRIVATE is defined, but neither CONFIG_MEMORY_FAILURE nor CONFIG_MIGRATION, then non_swap_entry() will return 0, meaning that the condition (non_swap_entry(entry) && is_device_private_entry(entry)) in zap_pte_range() will never be true even if the entry is a device private one. Equally any other code depending on non_swap_entry() will not function as expected. I originally spotted this just by looking at the code, I haven't actually observed any problems. Looking a bit more closely it appears that actually this situation (currently at least) cannot occur: DEVICE_PRIVATE depends on ZONE_DEVICE ZONE_DEVICE depends on MEMORY_HOTREMOVE MEMORY_HOTREMOVE depends on MIGRATION Fixes: 5042db43cc26 ("mm/ZONE_DEVICE: new type of ZONE_DEVICE for unaddressable memory") Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: John Hubbard <jhubbard@nvidia.com> Link: http://lkml.kernel.org/r/20200305130550.22693-1-steven.price@arm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-23mm/hugetlb: fix build failure with HUGETLB_PAGE but not HUGEBTLBFSChristophe Leroy1-11/+8
[ Upstream commit bb297bb2de517e41199185021f043bbc5d75b377 ] When CONFIG_HUGETLB_PAGE is set but not CONFIG_HUGETLBFS, the following build failure is encoutered: In file included from arch/powerpc/mm/fault.c:33:0: include/linux/hugetlb.h: In function 'hstate_inode': include/linux/hugetlb.h:477:9: error: implicit declaration of function 'HUGETLBFS_SB' [-Werror=implicit-function-declaration] return HUGETLBFS_SB(i->i_sb)->hstate; ^ include/linux/hugetlb.h:477:30: error: invalid type argument of '->' (have 'int') return HUGETLBFS_SB(i->i_sb)->hstate; ^ Gate hstate_inode() with CONFIG_HUGETLBFS instead of CONFIG_HUGETLB_PAGE. Fixes: a137e1cc6d6e ("hugetlbfs: per mount huge page sizes") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Baoquan He <bhe@redhat.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Adam Litke <agl@us.ibm.com> Cc: Andi Kleen <ak@suse.de> Link: http://lkml.kernel.org/r/7e8c3a3c9a587b9cd8a2f146df32a421b961f3a2.1584432148.git.christophe.leroy@c-s.fr Link: https://patchwork.ozlabs.org/patch/1255548/#2386036 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-23f2fs: Add a new CP flag to help fsck fix resize SPO issuesSahitya Tummala1-0/+1
[ Upstream commit c84ef3c5e65ccf99a7a91a4d731ebb5d6331a178 ] Add and set a new CP flag CP_RESIZEFS_FLAG during online resize FS to help fsck fix the metadata mismatch that may happen due to SPO during resize, where SB got updated but CP data couldn't be written yet. fsck errors - Info: CKPT version = 6ed7bccb Wrong user_block_count(2233856) [f2fs_do_mount:3365] Checkpoint is polluted Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-23bpf: Reliably preserve btf_trace_xxx typesAndrii Nakryiko1-7/+11
[ Upstream commit 441420a1f0b3031f228453697406c86f110e59d4 ] btf_trace_xxx types, crucial for tp_btf BPF programs (raw tracepoint with verifier-checked direct memory access), have to be preserved in kernel BTF to allow verifier do its job and enforce type/memory safety. It was reported ([0]) that for kernels built with Clang current type-casting approach doesn't preserve these types. This patch fixes it by declaring an anonymous union for each registered tracepoint, capturing both struct bpf_raw_event_map information, as well as recording btf_trace_##call type reliably. Structurally, it's still the same content as for a plain struct bpf_raw_event_map, so no other changes are necessary. [0] https://github.com/iovisor/bcc/issues/2770#issuecomment-591007692 Fixes: e8c423fb31fa ("bpf: Add typecast to raw_tracepoints to help BTF generation") Reported-by: Wenbo Zhang <ethercflow@gmail.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200301081045.3491005-2-andriin@fb.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-23x86/Hyper-V: Report crash data in die() when panic_on_oops is setTianyu Lan1-1/+1
commit f3a99e761efa616028b255b4de58e9b5b87c5545 upstream. When oops happens with panic_on_oops unset, the oops thread is killed by die() and system continues to run. In such case, guest should not report crash register data to host since system still runs. Check panic_on_oops and return directly in hyperv_report_panic() when the function is called in the die() and panic_on_oops is unset. Fix it. Fixes: 7ed4325a44ea ("Drivers: hv: vmbus: Make panic reporting to be more useful") Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20200406155331.2105-7-Tianyu.Lan@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-23ext4: use non-movable memory for superblock readaheadRoman Gushchin1-0/+8
commit d87f639258a6a5980183f11876c884931ad93da2 upstream. Since commit a8ac900b8163 ("ext4: use non-movable memory for the superblock") buffers for ext4 superblock were allocated using the sb_bread_unmovable() helper which allocated buffer heads out of non-movable memory blocks. It was necessarily to not block page migrations and do not cause cma allocation failures. However commit 85c8f176a611 ("ext4: preload block group descriptors") broke this by introducing pre-reading of the ext4 superblock. The problem is that __breadahead() is using __getblk() underneath, which allocates buffer heads out of movable memory. It resulted in page migration failures I've seen on a machine with an ext4 partition and a preallocated cma area. Fix this by introducing sb_breadahead_unmovable() and __breadahead_gfp() helpers which use non-movable memory for buffer head allocations and use them for the ext4 superblock readahead. Reviewed-by: Andreas Dilger <adilger@dilger.ca> Fixes: 85c8f176a611 ("ext4: preload block group descriptors") Signed-off-by: Roman Gushchin <guro@fb.com> Link: https://lore.kernel.org/r/20200229001411.128010-1-guro@fb.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-21net: mscc: ocelot: fix untagged packet drops when enslaving to vlan aware bridgeVladimir Oltean1-1/+3
[ Upstream commit 87b0f983f66f23762921129fd35966eddc3f2dae ] To rehash a previous explanation given in commit 1c44ce560b4d ("net: mscc: ocelot: fix vlan_filtering when enslaving to bridge before link is up"), the switch driver operates the in a mode where a single VLAN can be transmitted as untagged on a particular egress port. That is the "native VLAN on trunk port" use case. The configuration for this native VLAN is driven in 2 ways: - Set the egress port rewriter to strip the VLAN tag for the native VID (as it is egress-untagged, after all). - Configure the ingress port to drop untagged and priority-tagged traffic, if there is no native VLAN. The intention of this setting is that a trunk port with no native VLAN should not accept untagged traffic. Since both of the above configurations for the native VLAN should only be done if VLAN awareness is requested, they are actually done from the ocelot_port_vlan_filtering function, after the basic procedure of toggling the VLAN awareness flag of the port. But there's a problem with that simplistic approach: we are trying to juggle with 2 independent variables from a single function: - Native VLAN of the port - its value is held in port->vid. - VLAN awareness state of the port - currently there are some issues here, more on that later*. The actual problem can be seen when enslaving the switch ports to a VLAN filtering bridge: 0. The driver configures a pvid of zero for each port, when in standalone mode. While the bridge configures a default_pvid of 1 for each port that gets added as a slave to it. 1. The bridge calls ocelot_port_vlan_filtering with vlan_aware=true. The VLAN-filtering-dependent portion of the native VLAN configuration is done, considering that the native VLAN is 0. 2. The bridge calls ocelot_vlan_add with vid=1, pvid=true, untagged=true. The native VLAN changes to 1 (change which gets propagated to hardware). 3. ??? - nobody calls ocelot_port_vlan_filtering again, to reapply the VLAN-filtering-dependent portion of the native VLAN configuration, for the new native VLAN of 1. One can notice that after toggling "ip link set dev br0 type bridge vlan_filtering 0 && ip link set dev br0 type bridge vlan_filtering 1", the new native VLAN finally makes it through and untagged traffic finally starts flowing again. But obviously that shouldn't be needed. So it is clear that 2 independent variables need to both re-trigger the native VLAN configuration. So we introduce the second variable as ocelot_port->vlan_aware. *Actually both the DSA Felix driver and the Ocelot driver already had each its own variable: - Ocelot: ocelot_port_private->vlan_aware - Felix: dsa_port->vlan_filtering but the common Ocelot library needs to work with a single, common, variable, so there is some refactoring done to move the vlan_aware property from the private structure into the common ocelot_port structure. Fixes: 97bb69e1e36e ("net: mscc: ocelot: break apart ocelot_vlan_port_apply") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-21net: ipv6: do not consider routes via gateways for anycast address checkTim Stallard1-0/+1
[ Upstream commit 03e2a984b6165621f287fadf5f4b5cd8b58dcaba ] The behaviour for what is considered an anycast address changed in commit 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception"). This now considers the first address in a subnet where there is a route via a gateway to be an anycast address. This breaks path MTU discovery and traceroutes when a host in a remote network uses the address at the start of a prefix (eg 2600:: advertised as 2600::/48 in the DFZ) as ICMP errors will not be sent to anycast addresses. This patch excludes any routes with a gateway, or via point to point links, like the behaviour previously from rt6_is_gw_or_nonexthop in net/ipv6/route.c. This can be tested with: ip link add v1 type veth peer name v2 ip netns add test ip netns exec test ip link set lo up ip link set v2 netns test ip link set v1 up ip netns exec test ip link set v2 up ip addr add 2001:db8::1/64 dev v1 nodad ip addr add 2001:db8:100:: dev lo nodad ip netns exec test ip addr add 2001:db8::2/64 dev v2 nodad ip netns exec test ip route add unreachable 2001:db8:1::1 ip netns exec test ip route add 2001:db8:100::/64 via 2001:db8::1 ip netns exec test sysctl net.ipv6.conf.all.forwarding=1 ip route add 2001:db8:1::1 via 2001:db8::2 ping -I 2001:db8::1 2001:db8:1::1 -c1 ping -I 2001:db8:100:: 2001:db8:1::1 -c1 ip addr delete 2001:db8:100:: dev lo ip netns delete test Currently the first ping will get back a destination unreachable ICMP error, but the second will never get a response, with "icmp6_send: acast source" logged. After this patch, both get destination unreachable ICMP replies. Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception") Signed-off-by: Tim Stallard <code@timstallard.me.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17xarray: Fix early termination of xas_for_each_markedMatthew Wilcox (Oracle)1-1/+5
commit 7e934cf5ace1dceeb804f7493fa28bb697ed3c52 upstream. xas_for_each_marked() is using entry == NULL as a termination condition of the iteration. When xas_for_each_marked() is used protected only by RCU, this can however race with xas_store(xas, NULL) in the following way: TASK1 TASK2 page_cache_delete() find_get_pages_range_tag() xas_for_each_marked() xas_find_marked() off = xas_find_chunk() xas_store(&xas, NULL) xas_init_marks(&xas); ... rcu_assign_pointer(*slot, NULL); entry = xa_entry(off); And thus xas_for_each_marked() terminates prematurely possibly leading to missed entries in the iteration (translating to missing writeback of some pages or a similar problem). If we find a NULL entry that has been marked, skip it (unless we're trying to allocate an entry). Reported-by: Jan Kara <jack@suse.cz> CC: stable@vger.kernel.org Fixes: ef8e5717db01 ("page cache: Convert delete_batch to XArray") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17signal: Extend exec_id to 64bitsEric W. Biederman1-2/+2
commit d1e7fd6462ca9fc76650fbe6ca800e35b24267da upstream. Replace the 32bit exec_id with a 64bit exec_id to make it impossible to wrap the exec_id counter. With care an attacker can cause exec_id wrap and send arbitrary signals to a newly exec'd parent. This bypasses the signal sending checks if the parent changes their credentials during exec. The severity of this problem can been seen that in my limited testing of a 32bit exec_id it can take as little as 19s to exec 65536 times. Which means that it can take as little as 14 days to wrap a 32bit exec_id. Adam Zabrocki has succeeded wrapping the self_exe_id in 7 days. Even my slower timing is in the uptime of a typical server. Which means self_exec_id is simply a speed bump today, and if exec gets noticably faster self_exec_id won't even be a speed bump. Extending self_exec_id to 64bits introduces a problem on 32bit architectures where reading self_exec_id is no longer atomic and can take two read instructions. Which means that is is possible to hit a window where the read value of exec_id does not match the written value. So with very lucky timing after this change this still remains expoiltable. I have updated the update of exec_id on exec to use WRITE_ONCE and the read of exec_id in do_notify_parent to use READ_ONCE to make it clear that there is no locking between these two locations. Link: https://lore.kernel.org/kernel-hardening/20200324215049.GA3710@pi3.com.pl Fixes: 2.3.23pre2 Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17cpu/hotplug: Ignore pm_wakeup_pending() for disable_nonboot_cpus()Thomas Gleixner1-3/+9
commit e98eac6ff1b45e4e73f2e6031b37c256ccb5d36b upstream. A recent change to freeze_secondary_cpus() which added an early abort if a wakeup is pending missed the fact that the function is also invoked for shutdown, reboot and kexec via disable_nonboot_cpus(). In case of disable_nonboot_cpus() the wakeup event needs to be ignored as the purpose is to terminate the currently running kernel. Add a 'suspend' argument which is only set when the freeze is in context of a suspend operation. If not set then an eventually pending wakeup event is ignored. Fixes: a66d955e910a ("cpu/hotplug: Abort disabling secondary CPUs if wakeup is pending") Reported-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Pavankumar Kondeti <pkondeti@codeaurora.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/874kuaxdiz.fsf@nanos.tec.linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17rcu: Make rcu_barrier() account for offline no-CBs CPUsPaul E. McKenney1-0/+1
commit 127e29815b4b2206c0a97ac1d83f92ffc0e25c34 upstream. Currently, rcu_barrier() ignores offline CPUs, However, it is possible for an offline no-CBs CPU to have callbacks queued, and rcu_barrier() must wait for those callbacks. This commit therefore makes rcu_barrier() directly invoke the rcu_barrier_func() with interrupts disabled for such CPUs. This requires passing the CPU number into this function so that it can entrain the rcu_barrier() callback onto the correct CPU's callback list, given that the code must instead execute on the current CPU. While in the area, this commit fixes a bug where the first CPU's callback might have been invoked before rcu_segcblist_entrain() returned, which would also result in an early wakeup. Fixes: 5d6742b37727 ("rcu/nocb: Use rcu_segcblist for no-CBs CPUs") Signed-off-by: Paul E. McKenney <paulmck@kernel.org> [ paulmck: Apply optimization feedback from Boqun Feng. ] Cc: <stable@vger.kernel.org> # 5.5.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17PCI: endpoint: Fix for concurrent memory allocation in OB address regionKishon Vijay Abraham I1-0/+3
commit 04e046ca57ebed3943422dee10eec9e73aec081e upstream. pci-epc-mem uses a bitmap to manage the Endpoint outbound (OB) address region. This address region will be shared by multiple endpoint functions (in the case of multi function endpoint) and it has to be protected from concurrent access to avoid updating an inconsistent state. Use a mutex to protect bitmap updates to prevent the memory allocation API from returning incorrect addresses. Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17nvme-fc: Revert "add module to ops template to allow module references"James Smart1-4/+0
commit 8c5c660529209a0e324c1c1a35ce3f83d67a2aa5 upstream. The original patch was to resolve the lldd being able to be unloaded while being used to talk to the boot device of the system. However, the end result of the original patch is that any driver unload while a nvme controller is live via the lldd is now being prohibited. Given the module reference, the module teardown routine can't be called, thus there's no way, other than manual actions to terminate the controllers. Fixes: 863fbae929c7 ("nvme_fc: add module to ops template to allow module references") Cc: <stable@vger.kernel.org> # v5.4+ Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17thermal: devfreq_cooling: inline all stubs for CONFIG_DEVFREQ_THERMAL=nMartin Blumenstingl1-1/+1
commit 3f5b9959041e0db6dacbea80bb833bff5900999f upstream. When CONFIG_DEVFREQ_THERMAL is disabled all functions except of_devfreq_cooling_register_power() were already inlined. Also inline the last function to avoid compile errors when multiple drivers call of_devfreq_cooling_register_power() when CONFIG_DEVFREQ_THERMAL is not set. Compilation failed with the following message: multiple definition of `of_devfreq_cooling_register_power' (which then lists all usages of of_devfreq_cooling_register_power()) Thomas Zimmermann reported this problem [0] on a kernel config with CONFIG_DRM_LIMA={m,y}, CONFIG_DRM_PANFROST={m,y} and CONFIG_DEVFREQ_THERMAL=n after both, the lima and panfrost drivers gained devfreq cooling support. [0] https://www.spinics.net/lists/dri-devel/msg252825.html Fixes: a76caf55e5b356 ("thermal: Add devfreq cooling") Cc: stable@vger.kernel.org Reported-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Tested-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200403205133.1101808-1-martin.blumenstingl@googlemail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17ACPICA: Allow acpi_any_gpe_status_set() to skip one GPERafael J. Wysocki1-1/+1
commit 0ce792d660bda990c675eaf14ce09594a9b85cbf upstream. The check carried out by acpi_any_gpe_status_set() is not precise enough for the suspend-to-idle implementation in Linux and in some cases it is necessary make it skip one GPE (specifically, the EC GPE) from the check to prevent a race condition leading to a premature system resume from occurring. For this reason, redefine acpi_any_gpe_status_set() to take the number of a GPE to skip as an argument. Link: https://bugzilla.kernel.org/show_bug.cgi?id=206629 Tested-by: Ondřej Caletka <ondrej@caletka.cz> Cc: 5.4+ <stable@vger.kernel.org> # 5.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17block: Fix use-after-free issue accessing struct io_cqSahitya Tummala1-0/+1
[ Upstream commit 30a2da7b7e225ef6c87a660419ea04d3cef3f6a7 ] There is a potential race between ioc_release_fn() and ioc_clear_queue() as shown below, due to which below kernel crash is observed. It also can result into use-after-free issue. context#1: context#2: ioc_release_fn() __ioc_clear_queue() gets the same icq ->spin_lock(&ioc->lock); ->spin_lock(&ioc->lock); ->ioc_destroy_icq(icq); ->list_del_init(&icq->q_node); ->call_rcu(&icq->__rcu_head, icq_free_icq_rcu); ->spin_unlock(&ioc->lock); ->ioc_destroy_icq(icq); ->hlist_del_init(&icq->ioc_node); This results into below crash as this memory is now used by icq->__rcu_head in context#1. There is a chance that icq could be free'd as well. 22150.386550: <6> Unable to handle kernel write to read-only memory at virtual address ffffffaa8d31ca50 ... Call trace: 22150.607350: <2> ioc_destroy_icq+0x44/0x110 22150.611202: <2> ioc_clear_queue+0xac/0x148 22150.615056: <2> blk_cleanup_queue+0x11c/0x1a0 22150.619174: <2> __scsi_remove_device+0xdc/0x128 22150.623465: <2> scsi_forget_host+0x2c/0x78 22150.627315: <2> scsi_remove_host+0x7c/0x2a0 22150.631257: <2> usb_stor_disconnect+0x74/0xc8 22150.635371: <2> usb_unbind_interface+0xc8/0x278 22150.639665: <2> device_release_driver_internal+0x198/0x250 22150.644897: <2> device_release_driver+0x24/0x30 22150.649176: <2> bus_remove_device+0xec/0x140 22150.653204: <2> device_del+0x270/0x460 22150.656712: <2> usb_disable_device+0x120/0x390 22150.660918: <2> usb_disconnect+0xf4/0x2e0 22150.664684: <2> hub_event+0xd70/0x17e8 22150.668197: <2> process_one_work+0x210/0x480 22150.672222: <2> worker_thread+0x32c/0x4c8 Fix this by adding a new ICQ_DESTROYED flag in ioc_destroy_icq() to indicate this icq is once marked as destroyed. Also, ensure __ioc_clear_queue() is accessing icq within rcu_read_lock/unlock so that icq doesn't get free'd up while it is still using it. Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Co-developed-by: Pradeep P V K <ppvk@codeaurora.org> Signed-off-by: Pradeep P V K <ppvk@codeaurora.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-17media: rc: add keymap for Videostrong KII ProMohammad Rasim1-0/+1
[ Upstream commit 30defecb98400575349a7d32f0526e1dc42ea83e ] This is an NEC remote control device shipped with the Videostrong KII Pro tv box as well as other devices from videostrong. Signed-off-by: Mohammad Rasim <mohammad.rasim96@gmail.com> Signed-off-by: Sean Young <sean@mess.org> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-13IB/mlx5: Replace tunnel mpls capability bits for tunnel_offloadsAlex Vesker1-1/+5
commit 41e684ef3f37ce6e5eac3fb5b9c7c1853f4b0447 upstream. Until now the flex parser capability was used in ib_query_device() to indicate tunnel_offloads_caps support for mpls_over_gre/mpls_over_udp. Newer devices and firmware will have configurations with the flexparser but without mpls support. Testing for the flex parser capability was a mistake, the tunnel_stateless capability was intended for detecting mpls and was introduced at the same time as the flex parser capability. Otherwise userspace will be incorrectly informed that a future device supports MPLS when it does not. Link: https://lore.kernel.org/r/20200305123841.196086-1-leon@kernel.org Cc: <stable@vger.kernel.org> # 4.17 Fixes: e818e255a58d ("IB/mlx5: Expose MPLS related tunneling offloads") Signed-off-by: Alex Vesker <valex@mellanox.com> Reviewed-by: Ariel Levkovich <lariel@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-13ACPI: PM: Add acpi_[un]register_wakeup_handler()Hans de Goede1-0/+5
commit ddfd9dcf270ce23ed1985b66fcfa163920e2e1b8 upstream. Since commit fdde0ff8590b ("ACPI: PM: s2idle: Prevent spurious SCIs from waking up the system") the SCI triggering without there being a wakeup cause recognized by the ACPI sleep code will no longer wakeup the system. This works as intended, but this is a problem for devices where the SCI is shared with another device which is also a wakeup source. In the past these, from the pov of the ACPI sleep code, spurious SCIs would still cause a wakeup so the wakeup from the device sharing the interrupt would actually wakeup the system. This now no longer works. This is a problem on e.g. Bay Trail-T and Cherry Trail devices where some peripherals (typically the XHCI controller) can signal a Power Management Event (PME) to the Power Management Controller (PMC) to wakeup the system, this uses the same interrupt as the SCI. These wakeups are handled through a special INT0002 ACPI device which checks for events in the GPE0a_STS for this and takes care of acking the PME so that the shared interrupt stops triggering. The change to the ACPI sleep code to ignore the spurious SCI, causes the system to no longer wakeup on these PME events. To make things worse this means that the INT0002 device driver interrupt handler will no longer run, causing the PME to not get cleared and resulting in the system hanging. Trying to wakeup the system after such a PME through e.g. the power button no longer works. Add an acpi_register_wakeup_handler() function which registers a handler to be called from acpi_s2idle_wake() and when the handler returns true, return true from acpi_s2idle_wake(). The INT0002 driver will use this mechanism to check the GPE0a_STS register from acpi_s2idle_wake() and to tell the system to wakeup if a PME is signaled in the register. Fixes: fdde0ff8590b ("ACPI: PM: s2idle: Prevent spurious SCIs from waking up the system") Cc: 5.4+ <stable@vger.kernel.org> # 5.4+ Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-08Revert "ALSA: uapi: Drop asound.h inclusion from asoc.h"Takashi Iwai1-0/+1
commit b6f69c795547f59ddf1db17cddbd2b9a15c656ed upstream. This reverts commit 645c08f17f477915f6d900b767e789852f150054 which was reported to break the build a program using this header. The original issue was addressed in the alsa-lib side recently, so we can make the header more self-contained again. Reported-by: Dmitry V. Levin <ldv@altlinux.org> Fixes: 645c08f17f47 ("ALSA: uapi: Drop asound.h inclusion from asoc.h") Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20200331090023.8112-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-08coresight: do not use the BIT() macro in the UAPI headerEugene Syromiatnikov1-2/+4
commit 9b6eaaf3db5e5888df7bca7fed7752a90f7fd871 upstream. The BIT() macro definition is not available for the UAPI headers (moreover, it can be defined differently in the user space); replace its usage with the _BITUL() macro that is defined in <linux/const.h>. Fixes: 237483aa5cf4 ("coresight: stm: adding driver for CoreSight STM component") Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com> Cc: stable <stable@vger.kernel.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Link: https://lore.kernel.org/r/20200324042213.GA10452@asgard.redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-02vt: switch vt_dont_switch to boolJiri Slaby1-1/+1
commit f400991bf872debffb01c46da882dc97d7e3248e upstream. vt_dont_switch is pure boolean, no need for whole char. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20200219073951.16151-6-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-02vt: selection, introduce vc_is_selJiri Slaby1-1/+3
commit dce05aa6eec977f1472abed95ccd71276b9a3864 upstream. Avoid global variables (namely sel_cons) by introducing vc_is_sel. It checks whether the parameter is the current selection console. This will help putting sel_cons to a struct later. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20200219073951.16151-1-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-29Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-0/+12
Merge vm fixes from Andrew Morton: "5 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm/sparse: fix kernel crash with pfn_section_valid check mm: fork: fix kernel_stack memcg stats for various stack implementations hugetlb_cgroup: fix illegal access to memory drivers/base/memory.c: indicate all memory blocks as removable mm/swapfile.c: move inode_lock out of claim_swapfile
2020-03-29mm: fork: fix kernel_stack memcg stats for various stack implementationsRoman Gushchin1-0/+12
Depending on CONFIG_VMAP_STACK and the THREAD_SIZE / PAGE_SIZE ratio the space for task stacks can be allocated using __vmalloc_node_range(), alloc_pages_node() and kmem_cache_alloc_node(). In the first and the second cases page->mem_cgroup pointer is set, but in the third it's not: memcg membership of a slab page should be determined using the memcg_from_slab_page() function, which looks at page->slab_cache->memcg_params.memcg . In this case, using mod_memcg_page_state() (as in account_kernel_stack()) is incorrect: page->mem_cgroup pointer is NULL even for pages charged to a non-root memory cgroup. It can lead to kernel_stack per-memcg counters permanently showing 0 on some architectures (depending on the configuration). In order to fix it, let's introduce a mod_memcg_obj_state() helper, which takes a pointer to a kernel object as a first argument, uses mem_cgroup_from_obj() to get a RCU-protected memcg pointer and calls mod_memcg_state(). It allows to handle all possible configurations (CONFIG_VMAP_STACK and various THREAD_SIZE/PAGE_SIZE values) without spilling any memcg/kmem specifics into fork.c . Note: This is a special version of the patch created for stable backports. It contains code from the following two patches: - mm: memcg/slab: introduce mem_cgroup_from_obj() - mm: fork: fix kernel_stack memcg stats for various stack implementations [guro@fb.com: introduce mem_cgroup_from_obj()] Link: http://lkml.kernel.org/r/20200324004221.GA36662@carbon.dhcp.thefacebook.com Fixes: 4d96ba353075 ("mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages") Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Bharata B Rao <bharata@linux.ibm.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200303233550.251375-1-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds2-2/+3
Pull networking fixes from David Miller: 1) Fix memory leak in vti6, from Torsten Hilbrich. 2) Fix double free in xfrm_policy_timer, from YueHaibing. 3) NL80211_ATTR_CHANNEL_WIDTH attribute is put with wrong type, from Johannes Berg. 4) Wrong allocation failure check in qlcnic driver, from Xu Wang. 5) Get ks8851-ml IO operations right, for real this time, from Marek Vasut. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (22 commits) r8169: fix PHY driver check on platforms w/o module softdeps net: ks8851-ml: Fix IO operations, again mlxsw: spectrum_mr: Fix list iteration in error path qlcnic: Fix bad kzalloc null test mac80211: set IEEE80211_TX_CTRL_PORT_CTRL_PROTO for nl80211 TX mac80211: mark station unauthorized before key removal mac80211: Check port authorization in the ieee80211_tx_dequeue() case cfg80211: Do not warn on same channel at the end of CSA mac80211: drop data frames without key on encrypted links ieee80211: fix HE SPR size calculation nl80211: fix NL80211_ATTR_CHANNEL_WIDTH attribute type xfrm: policy: Fix doulbe free in xfrm_policy_timer bpf: Explicitly memset some bpf info structures declared on the stack bpf: Explicitly memset the bpf_attr structure bpf: Sanitize the bpf_struct_ops tcp-cc name vti6: Fix memory leak of skb if input policy check fails esp: remove the skb from the chain when it's enqueued in cryptd_wq ipv6: xfrm6_tunnel.c: Use built-in RCU list checking xfrm: add the missing verify_sec_ctx_len check in xfrm_add_acquire xfrm: fix uctx len check in verify_sec_ctx_len ...
2020-03-28Merge branch 'i2c/for-current' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "Three more driver bugfixes, and two doc improvements fixing build warnings while we are here" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: pca-platform: Use platform_irq_get_optional i2c: st: fix missing struct parameter description i2c: nvidia-gpu: Handle timeout correctly in gpu_i2c_check_status() i2c: fix a doc warning i2c: hix5hd2: add missed clk_disable_unprepare in remove
2020-03-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller1-0/+1
Daniel Borkmann says: ==================== pull-request: bpf 2020-03-27 The following pull-request contains BPF updates for your *net* tree. We've added 3 non-merge commits during the last 4 day(s) which contain a total of 4 files changed, 25 insertions(+), 20 deletions(-). The main changes are: 1) Explicitly memset the bpf_attr structure on bpf() syscall to avoid having to rely on compiler to do so. Issues have been noticed on some compilers with padding and other oddities where the request was then unexpectedly rejected, from Greg Kroah-Hartman. 2) Sanitize the bpf_struct_ops TCP congestion control name in order to avoid problematic characters such as whitespaces, from Martin KaFai Lau. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-27Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds1-5/+5
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "A handful of clk driver fixes. Mostly they're around the i.MX drivers fixing the parents of a few clks and making KASAN happy with how the message passing code works. Besides that we have a TI driver fix for the RTC parent and a fix for the basic gate type registration functions introduced this release where they didn't actually pass the arguments in the right places to the multiplexer function down below" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: imx: Align imx sc clock parent msg structs to 4 clk: imx: Align imx sc clock msg structs to 4 clk: Pass correct arguments to __clk_hw_register_gate() clk: ti: am43xx: Fix clock parent for RTC clock clk: imx8mp: Correct the enet_qos parent clock clk: imx8mp: Correct IMX8MP_CLK_HDMI_AXI clock parent
2020-03-27Merge branch 'for-linus' of ↵Linus Torvalds2-5/+8
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: - a fix to generate proper timestamps on key autorepeat events that were broken recently - a fix for Synaptics driver to only activate reduced reporting mode when explicitly requested - a new keycode for "selective screenshot" function - other assorted fixes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: fix stale timestamp on key autorepeat events Input: move the new KEY_SELECTIVE_SCREENSHOT keycode Input: avoid BIT() macro usage in the serio.h UAPI header Input: synaptics-rmi4 - set reduced reporting mode only when requested Input: synaptics - enable RMI on HP Envy 13-ad105ng Input: allocate keycode for "Selective Screenshot" key Input: tm2-touchkey - add support for Coreriver TC360 variant dt-bindings: input: add Coreriver TC360 binding dt-bindings: vendor-prefixes: Add Coreriver vendor prefix Input: raydium_i2c_ts - fix error codes in raydium_i2c_boot_trigger()
2020-03-27Merge tag 'ceph-for-5.6-rc8' of git://github.com/ceph/ceph-clientLinus Torvalds3-5/+12
Pull ceph fixes from Ilya Dryomov: "A patch for a rather old regression in fullness handling and two memory leak fixes, marked for stable" * tag 'ceph-for-5.6-rc8' of git://github.com/ceph/ceph-client: ceph: fix memory leak in ceph_cleanup_snapid_map() libceph: fix alloc_msg_with_page_vector() memory leaks ceph: check POOL_FLAG_FULL/NEARFULL in addition to OSDMAP_FULL/NEARFULL
2020-03-26Input: move the new KEY_SELECTIVE_SCREENSHOT keycodeDmitry Torokhov1-1/+1
We should try to keep keycodes sequential unless there is a reason to leave a gap in numbering, so let's move it from 0x280 to 0x27a while we still can. Fixes: 3b059da9835c ("Input: allocate keycode for Selective Screenshot key") Acked-by: Rajat Jain <rajatja@google.com> Link: https://lore.kernel.org/r/20200326182711.GA259753@dtor-ws Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2020-03-26Merge tag 'mac80211-for-net-2020-03-26' of ↵David S. Miller1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== We have the following fixes: * drop data packets if there's no key for them anymore, after there had been one, to avoid sending them in clear when hostapd removes the key before it removes the station and the packets are still queued * check port authorization again after dequeue, to avoid sending packets if the station is no longer authorized * actually remove the authorization flag before the key so packets are also dropped properly because of this * fix nl80211 control port packet tagging to handle them as packets allowed to go out without encryption * fix NL80211_ATTR_CHANNEL_WIDTH outgoing netlink attribute width (should be 32 bits, not 8) * don't WARN in a CSA scenario that happens on some APs * fix HE spatial reuse element size calculation ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-26clk: Pass correct arguments to __clk_hw_register_gate()Stephen Boyd1-5/+5
I copy/pasted these macros and forgot to update the argument names and where they're passed to. Fix it so that these macros make sense. Reported-by: Maxime Ripard <maxime@cerno.tech> Fixes: 194efb6e2667 ("clk: gate: Add support for specifying parents via DT/pointers") Signed-off-by: Stephen Boyd <sboyd@kernel.org> Link: https://lkml.kernel.org/r/20200325022257.148244-1-sboyd@kernel.org Tested-by: Maxime Ripard <mripard@kernel.org>
2020-03-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds6-32/+54
Pull networking fixes from David Miller: 1) Fix deadlock in bpf_send_signal() from Yonghong Song. 2) Fix off by one in kTLS offload of mlx5, from Tariq Toukan. 3) Add missing locking in iwlwifi mvm code, from Avraham Stern. 4) Fix MSG_WAITALL handling in rxrpc, from David Howells. 5) Need to hold RTNL mutex in tcindex_partial_destroy_work(), from Cong Wang. 6) Fix producer race condition in AF_PACKET, from Willem de Bruijn. 7) cls_route removes the wrong filter during change operations, from Cong Wang. 8) Reject unrecognized request flags in ethtool netlink code, from Michal Kubecek. 9) Need to keep MAC in reset until PHY is up in bcmgenet driver, from Doug Berger. 10) Don't leak ct zone template in act_ct during replace, from Paul Blakey. 11) Fix flushing of offloaded netfilter flowtable flows, also from Paul Blakey. 12) Fix throughput drop during tx backpressure in cxgb4, from Rahul Lakkireddy. 13) Don't let a non-NULL skb->dev leave the TCP stack, from Eric Dumazet. 14) TCP_QUEUE_SEQ socket option has to update tp->copied_seq as well, also from Eric Dumazet. 15) Restrict macsec to ethernet devices, from Willem de Bruijn. 16) Fix reference leak in some ethtool *_SET handlers, from Michal Kubecek. 17) Fix accidental disabling of MSI for some r8169 chips, from Heiner Kallweit. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (138 commits) net: Fix CONFIG_NET_CLS_ACT=n and CONFIG_NFT_FWD_NETDEV={y, m} build net: ena: Add PCI shutdown handler to allow safe kexec selftests/net/forwarding: define libs as TEST_PROGS_EXTENDED selftests/net: add missing tests to Makefile r8169: re-enable MSI on RTL8168c net: phy: mdio-bcm-unimac: Fix clock handling cxgb4/ptp: pass the sign of offset delta in FW CMD net: dsa: tag_8021q: replace dsa_8021q_remove_header with __skb_vlan_pop net: cbs: Fix software cbs to consider packet sending time net/mlx5e: Do not recover from a non-fatal syndrome net/mlx5e: Fix ICOSQ recovery flow with Striding RQ net/mlx5e: Fix missing reset of SW metadata in Striding RQ reset net/mlx5e: Enhance ICOSQ WQE info fields net/mlx5_core: Set IB capability mask1 to fix ib_srpt connection failure selftests: netfilter: add nfqueue test case netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress netfilter: nft_fwd_netdev: validate family and chain type netfilter: nft_set_rbtree: Detect partial overlaps on insertion netfilter: nft_set_rbtree: Introduce and use nft_rbtree_interval_start() netfilter: nft_set_pipapo: Separate partial and complete overlap cases on insertion ...
2020-03-25net: Fix CONFIG_NET_CLS_ACT=n and CONFIG_NFT_FWD_NETDEV={y, m} buildPablo Neira Ayuso2-20/+32
net/netfilter/nft_fwd_netdev.c: In function ‘nft_fwd_netdev_eval’: net/netfilter/nft_fwd_netdev.c:32:10: error: ‘struct sk_buff’ has no member named ‘tc_redirected’ pkt->skb->tc_redirected = 1; ^~ net/netfilter/nft_fwd_netdev.c:33:10: error: ‘struct sk_buff’ has no member named ‘tc_from_ingress’ pkt->skb->tc_from_ingress = 1; ^~ To avoid a direct dependency with tc actions from netfilter, wrap the redirect bits around CONFIG_NET_REDIRECT and move helpers to include/linux/skbuff.h. Turn on this toggle from the ifb driver, the only existing client of these bits in the tree. This patch adds skb_set_redirected() that sets on the redirected bit on the skbuff, it specifies if the packet was redirect from ingress and resets the timestamp (timestamp reset was originally missing in the netfilter bugfix). Fixes: bcfabee1afd99484 ("netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress") Reported-by: noreply@ellerman.id.au Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25ieee80211: fix HE SPR size calculationJohannes Berg1-2/+2
The he_sr_control field is just a u8, so le32_to_cpu() shouldn't be applied to it; this was evidently copied from ieee80211_he_oper_size(). Fix it, and also adjust the type of the local variable. Fixes: ef11a931bd1c ("mac80211: HE: add Spatial Reuse element parsing support") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20200325090918.dfe483b49e06.Ia53622f23b2610a2ae6ea39a199866196fe946c1@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-03-25net: dsa: tag_8021q: replace dsa_8021q_remove_header with __skb_vlan_popVladimir Oltean1-7/+0
Not only did this wheel did not need reinventing, but there is also an issue with it: It doesn't remove the VLAN header in a way that preserves the L2 payload checksum when that is being provided by the DSA master hw. It should recalculate checksum both for the push, before removing the header, and for the pull afterwards. But the current implementation is quite dizzying, with pulls followed immediately afterwards by pushes, the memmove is done before the push, etc. This makes a DSA master with RX checksumming offload to print stack traces with the infamous 'hw csum failure' message. So remove the dsa_8021q_remove_header function and replace it with something that actually works with inet checksumming. Fixes: d461933638ae ("net: dsa: tag_8021q: Create helper function for removing VLAN header") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25Input: avoid BIT() macro usage in the serio.h UAPI headerEugene Syromiatnikov1-5/+5
The commit 19ba1eb15a2a ("Input: psmouse - add a custom serio protocol to send extra information") introduced usage of the BIT() macro for SERIO_* flags; this macro is not provided in UAPI headers. Replace if with similarly defined _BITUL() macro defined in <linux/const.h>. Fixes: 19ba1eb15a2a ("Input: psmouse - add a custom serio protocol to send extra information") Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com> Cc: <stable@vger.kernel.org> # v5.0+ Link: https://lore.kernel.org/r/20200324041341.GA32335@asgard.redhat.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2020-03-24netlink: check for null extack in cookie helpersMichal Kubecek1-0/+4
Unlike NL_SET_ERR_* macros, nl_set_extack_cookie_u64() and nl_set_extack_cookie_u32() helpers do not check extack argument for null and neither do their callers, as syzbot recently discovered for ethnl_parse_header(). Instead of fixing the callers and leaving the trap in place, add check of null extack to both helpers to make them consistent with NL_SET_ERR_* macros. v2: drop incorrect second Fixes tag Fixes: 2363d73a2f3e ("ethtool: reject unrecognized request flags") Reported-by: syzbot+258a9089477493cea67b@syzkaller.appspotmail.com Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-23libceph: fix alloc_msg_with_page_vector() memory leaksIlya Dryomov1-3/+4
Make it so that CEPH_MSG_DATA_PAGES data item can own pages, fixing a bunch of memory leaks for a page vector allocated in alloc_msg_with_page_vector(). Currently, only watch-notify messages trigger this allocation, and normally the page vector is freed either in handle_watch_notify() or by the caller of ceph_osdc_notify(). But if the message is freed before that (e.g. if the session faults while reading in the message or if the notify is stale), we leak the page vector. This was supposed to be fixed by switching to a message-owned pagelist, but that never happened. Fixes: 1907920324f1 ("libceph: support for sending notifies") Reported-by: Roman Penyaev <rpenyaev@suse.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Roman Penyaev <rpenyaev@suse.de>
2020-03-23ceph: check POOL_FLAG_FULL/NEARFULL in addition to OSDMAP_FULL/NEARFULLIlya Dryomov2-2/+8
CEPH_OSDMAP_FULL/NEARFULL aren't set since mimic, so we need to consult per-pool flags as well. Unfortunately the backwards compatibility here is lacking: - the change that deprecated OSDMAP_FULL/NEARFULL went into mimic, but was guarded by require_osd_release >= RELEASE_LUMINOUS - it was subsequently backported to luminous in v12.2.2, but that makes no difference to clients that only check OSDMAP_FULL/NEARFULL because require_osd_release is not client-facing -- it is for OSDs Since all kernels are affected, the best we can do here is just start checking both map flags and pool flags and send that to stable. These checks are best effort, so take osdc->lock and look up pool flags just once. Remove the FIXME, since filesystem quotas are checked above and RADOS quotas are reflected in POOL_FLAG_FULL: when the pool reaches its quota, both POOL_FLAG_FULL and POOL_FLAG_FULL_QUOTA are set. Cc: stable@vger.kernel.org Reported-by: Yanhu Cao <gmayyyha@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Sage Weil <sage@redhat.com>
2020-03-23i2c: fix a doc warningMauro Carvalho Chehab1-2/+2
Don't let non-letters inside a literal block without escaping it, as the toolchain would mis-interpret it: ./include/linux/i2c.h:518: WARNING: Inline strong start-string without end-string. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2020-03-22Merge branch 'akpm' (patches from Andrew)Linus Torvalds2-3/+4
Merge misc fixes from Andrew Morton: "10 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: x86/mm: split vmalloc_sync_all() mm, slub: prevent kmalloc_node crashes and memory leaks mm/mmu_notifier: silence PROVE_RCU_LIST warnings epoll: fix possible lost wakeup on epoll_ctl() path mm: do not allow MADV_PAGEOUT for CoW pages mm, memcg: throttle allocators based on ancestral memory.high mm, memcg: fix corruption on 64-bit divisor in memory.high throttling page-flags: fix a crash at SetPageError(THP_SWAP) mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event
2020-03-22x86/mm: split vmalloc_sync_all()Joerg Roedel1-2/+3
Commit 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()") introduced a call to vmalloc_sync_all() in the vunmap() code-path. While this change was necessary to maintain correctness on x86-32-pae kernels, it also adds additional cycles for architectures that don't need it. Specifically on x86-64 with CONFIG_VMAP_STACK=y some people reported severe performance regressions in micro-benchmarks because it now also calls the x86-64 implementation of vmalloc_sync_all() on vunmap(). But the vmalloc_sync_all() implementation on x86-64 is only needed for newly created mappings. To avoid the unnecessary work on x86-64 and to gain the performance back, split up vmalloc_sync_all() into two functions: * vmalloc_sync_mappings(), and * vmalloc_sync_unmappings() Most call-sites to vmalloc_sync_all() only care about new mappings being synchronized. The only exception is the new call-site added in the above mentioned commit. Shile Zhang directed us to a report of an 80% regression in reaim throughput. Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()") Reported-by: kernel test robot <oliver.sang@intel.com> Reported-by: Shile Zhang <shile.zhang@linux.alibaba.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Borislav Petkov <bp@suse.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [GHES] Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20191009124418.8286-1-joro@8bytes.org Link: https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/ Link: http://lkml.kernel.org/r/20191113095530.228959-1-shile.zhang@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-22page-flags: fix a crash at SetPageError(THP_SWAP)Qian Cai1-1/+1
Commit bd4c82c22c36 ("mm, THP, swap: delay splitting THP after swapped out") supported writing THP to a swap device but forgot to upgrade an older commit df8c94d13c7e ("page-flags: define behavior of FS/IO-related flags on compound pages") which could trigger a crash during THP swapping out with DEBUG_VM_PGFLAGS=y, kernel BUG at include/linux/page-flags.h:317! page dumped because: VM_BUG_ON_PAGE(1 && PageCompound(page)) page:fffff3b2ec3a8000 refcount:512 mapcount:0 mapping:000000009eb0338c index:0x7f6e58200 head:fffff3b2ec3a8000 order:9 compound_mapcount:0 compound_pincount:0 anon flags: 0x45fffe0000d8454(uptodate|lru|workingset|owner_priv_1|writeback|head|reclaim|swapbacked) end_swap_bio_write() SetPageError(page) VM_BUG_ON_PAGE(1 && PageCompound(page)) <IRQ> bio_endio+0x297/0x560 dec_pending+0x218/0x430 [dm_mod] clone_endio+0xe4/0x2c0 [dm_mod] bio_endio+0x297/0x560 blk_update_request+0x201/0x920 scsi_end_request+0x6b/0x4b0 scsi_io_completion+0x509/0x7e0 scsi_finish_command+0x1ed/0x2a0 scsi_softirq_done+0x1c9/0x1d0 __blk_mqnterrupt+0xf/0x20 </IRQ> Fix by checking PF_NO_TAIL in those places instead. Fixes: bd4c82c22c36 ("mm, THP, swap: delay splitting THP after swapped out") Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: "Huang, Ying" <ying.huang@intel.com> Acked-by: Rafael Aquini <aquini@redhat.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200310235846.1319-1-cai@lca.pw Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-21Merge tag 'io_uring-5.6-20200320' of git://git.kernel.dk/linux-blockLinus Torvalds2-1/+3
Pull io_uring fixes from Jens Axboe: "Two different fixes in here: - Fix for a potential NULL pointer deref for links with async or drain marked (Pavel) - Fix for not properly checking RLIMIT_NOFILE for async punted operations. This affects openat/openat2, which were added this cycle, and accept4. I did a full audit of other cases where we might check current->signal->rlim[] and found only RLIMIT_FSIZE for buffered writes and fallocate. That one is fixed and queued for 5.7 and marked stable" * tag 'io_uring-5.6-20200320' of git://git.kernel.dk/linux-block: io_uring: make sure accept honor rlimit nofile io_uring: make sure openat/openat2 honor rlimit nofile io_uring: NULL-deref for IOSQE_{ASYNC,DRAIN}
2020-03-20io_uring: make sure accept honor rlimit nofileJens Axboe1-1/+2
Just like commit 4022e7af86be, this fixes the fact that IORING_OP_ACCEPT ends up using get_unused_fd_flags(), which checks current->signal->rlim[] for limits. Add an extra argument to __sys_accept4_file() that allows us to pass in the proper nofile limit, and grab it at request prep time. Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-20io_uring: make sure openat/openat2 honor rlimit nofileJens Axboe1-0/+1
Dmitry reports that a test case shows that io_uring isn't honoring a modified rlimit nofile setting. get_unused_fd_flags() checks the task signal->rlimi[] for the limits. As this isn't easily inheritable, provide a __get_unused_fd_flags() that takes the value instead. Then we can grab it when the request is prepared (from the original task), and pass that in when we do the async part part of the open. Reported-by: Dmitry Kadashev <dkadashev@gmail.com> Tested-by: Dmitry Kadashev <dkadashev@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>