summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2019-07-21samples, bpf: fix to change the buffer size for read()Chang-Hsien Tsai1-1/+1
[ Upstream commit f7c2d64bac1be2ff32f8e4f500c6e5429c1003e0 ] If the trace for read is larger than 4096, the return value sz will be 4096. This results in off-by-one error on buf: static char buf[4096]; ssize_t sz; sz = read(trace_fd, buf, sizeof(buf)); if (sz > 0) { buf[sz] = 0; puts(buf); } Signed-off-by: Chang-Hsien Tsai <luke.tw@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-21Input: elantech - enable middle button support on 2 ThinkPadsAaron Ma1-0/+2
[ Upstream commit aa440de3058a3ef530851f9ef373fbb5f694dbc3 ] Adding 2 new touchpad PNPIDs to enable middle button support. Signed-off-by: Aaron Ma <aaron.ma@canonical.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-21crypto: talitos - rename alternative AEAD algos.Christophe Leroy1-8/+8
commit a1a42f84011fae6ff08441a91aefeb7febc984fc upstream. The talitos driver has two ways to perform AEAD depending on the HW capability. Some HW support both. It is needed to give them different names to distingish which one it is for instance when a test fails. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Fixes: 7405c8d7ff97 ("crypto: talitos - templates for AEAD using HMAC_SNOOP_NO_AFEU") Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-21drivers: base: cacheinfo: Ensure cpu hotplug work is done before Intel RDTJames Morse2-1/+3
commit 83b44fe343b5abfcb1b2261289bd0cfcfcfd60a8 upstream. The cacheinfo structures are alloced/freed by cpu online/offline callbacks. Originally these were only used by sysfs to expose the cache topology to user space. Without any in-kernel dependencies CPUHP_AP_ONLINE_DYN was an appropriate choice. resctrl has started using these structures to identify CPUs that share a cache. It updates its 'domain' structures from cpu online/offline callbacks. These depend on the cacheinfo structures (resctrl_online_cpu()->domain_add_cpu()->get_cache_id()-> get_cpu_cacheinfo()). These also run as CPUHP_AP_ONLINE_DYN. Now that there is an in-kernel dependency, move the cacheinfo work earlier so we know its done before resctrl's CPUHP_AP_ONLINE_DYN work runs. Fixes: 2264d9c74dda1 ("x86/intel_rdt: Build structures for each resource based on cache topology") Cc: <stable@vger.kernel.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: James Morse <james.morse@arm.com> Link: https://lore.kernel.org/r/20190624173656.202407-1-james.morse@arm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-21nilfs2: do not use unexported cpu_to_le32()/le32_to_cpu() in uapi headerMasahiro Yamada1-12/+12
commit c32cc30c0544f13982ee0185d55f4910319b1a79 upstream. cpu_to_le32/le32_to_cpu is defined in include/linux/byteorder/generic.h, which is not exported to user-space. UAPI headers must use the ones prefixed with double-underscore. Detected by compile-testing exported headers: include/linux/nilfs2_ondisk.h: In function `nilfs_checkpoint_set_snapshot': include/linux/nilfs2_ondisk.h:536:17: error: implicit declaration of function `cpu_to_le32' [-Werror=implicit-function-declaration] cp->cp_flags = cpu_to_le32(le32_to_cpu(cp->cp_flags) | \ ^ include/linux/nilfs2_ondisk.h:552:1: note: in expansion of macro `NILFS_CHECKPOINT_FNS' NILFS_CHECKPOINT_FNS(SNAPSHOT, snapshot) ^~~~~~~~~~~~~~~~~~~~ include/linux/nilfs2_ondisk.h:536:29: error: implicit declaration of function `le32_to_cpu' [-Werror=implicit-function-declaration] cp->cp_flags = cpu_to_le32(le32_to_cpu(cp->cp_flags) | \ ^ include/linux/nilfs2_ondisk.h:552:1: note: in expansion of macro `NILFS_CHECKPOINT_FNS' NILFS_CHECKPOINT_FNS(SNAPSHOT, snapshot) ^~~~~~~~~~~~~~~~~~~~ include/linux/nilfs2_ondisk.h: In function `nilfs_segment_usage_set_clean': include/linux/nilfs2_ondisk.h:622:19: error: implicit declaration of function `cpu_to_le64' [-Werror=implicit-function-declaration] su->su_lastmod = cpu_to_le64(0); ^~~~~~~~~~~ Link: http://lkml.kernel.org/r/20190605053006.14332-1-yamada.masahiro@socionext.com Fixes: e63e88bc53ba ("nilfs2: move ioctl interface and disk layout to uapi separately") Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Joe Perches <joe@perches.com> Cc: <stable@vger.kernel.org> [4.9+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-21Input: synaptics - enable SMBUS on T480 thinkpad trackpadCole Rogers1-0/+1
commit abbe3acd7d72ab4633ade6bd24e8306b67e0add3 upstream. Thinkpad t480 laptops had some touchpad features disabled, resulting in the loss of pinch to activities in GNOME, on wayland, and other touch gestures being slower. This patch adds the touchpad of the t480 to the smbus_pnp_ids whitelist to enable the extra features. In my testing this does not break suspend (on fedora, with wayland, and GNOME, using the rc-6 kernel), while also fixing the feature on a T480. Signed-off-by: Cole Rogers <colerogers@disroot.org> Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-21e1000e: start network tx queue only when link is upKonstantin Khlebnikov1-2/+4
commit d17ba0f616a08f597d9348c372d89b8c0405ccf3 upstream. Driver does not want to keep packets in Tx queue when link is lost. But present code only reset NIC to flush them, but does not prevent queuing new packets. Moreover reset sequence itself could generate new packets via netconsole and NIC falls into endless reset loop. This patch wakes Tx queue only when NIC is ready to send packets. This is proper fix for problem addressed by commit 0f9e980bf5ee ("e1000e: fix cyclic resets at link up with active tx"). Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Suggested-by: Alexander Duyck <alexander.duyck@gmail.com> Tested-by: Joseph Yasi <joe.yasi@gmail.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Tested-by: Oleksandr Natalenko <oleksandr@redhat.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-21Revert "e1000e: fix cyclic resets at link up with active tx"Konstantin Khlebnikov1-6/+9
commit caff422ea81e144842bc44bab408d85ac449377b upstream. This reverts commit 0f9e980bf5ee1a97e2e401c846b2af989eb21c61. That change cased false-positive warning about hardware hang: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang: TDH <0> TDT <1> next_to_use <1> next_to_clean <0> buffer_info[next_to_clean]: time_stamp <fffba7a7> next_to_watch <0> jiffies <fffbb140> next_to_watch.status <0> MAC Status <40080080> PHY Status <7949> PHY 1000BASE-T Status <0> PHY Extended Status <3000> PCI Status <10> e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx Besides warning everything works fine. Original issue will be fixed property in following patch. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Reported-by: Joseph Yasi <joe.yasi@gmail.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=203175 Tested-by: Joseph Yasi <joe.yasi@gmail.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Tested-by: Oleksandr Natalenko <oleksandr@redhat.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10Linux 4.14.133v4.14.133Greg Kroah-Hartman1-1/+1
2019-07-10stable/btrfs: fix backport bug in d819d97ea025 ("btrfs: honor ↵Stanislaw Gruszka1-2/+0
path->skip_locking in backref code") Upstream commit 38e3eebff643 ("btrfs: honor path->skip_locking in backref code") was incorrectly backported to 4.14.y . It misses removal of two lines from original commit, what cause deadlock. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203993 Reported-by: Olivier Mazouffre <olivier.mazouffre@ims-bordeaux.fr> Fixes: d819d97ea025 ("btrfs: honor path->skip_locking in backref code") Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10dmaengine: imx-sdma: remove BD_INTR for channel0Robin Gong1-2/+2
commit 3f93a4f297961c12bb17aa16cb3a4d1291823cae upstream. It is possible for an irq triggered by channel0 to be received later after clks are disabled once firmware loaded during sdma probe. If that happens then clearing them by writing to SDMA_H_INTR won't work and the kernel will hang processing infinite interrupts. Actually, don't need interrupt triggered on channel0 since it's pollling SDMA_H_STATSTOP to know channel0 done rather than interrupt in current code, just clear BD_INTR to disable channel0 interrupt to avoid the above case. This issue was brought by commit 1d069bfa3c78 ("dmaengine: imx-sdma: ack channel 0 IRQ in the interrupt handler") which didn't take care the above case. Fixes: 1d069bfa3c78 ("dmaengine: imx-sdma: ack channel 0 IRQ in the interrupt handler") Cc: stable@vger.kernel.org #5.0+ Signed-off-by: Robin Gong <yibin.gong@nxp.com> Reported-by: Sven Van Asbroeck <thesven73@gmail.com> Tested-by: Sven Van Asbroeck <thesven73@gmail.com> Reviewed-by: Michael Olbrich <m.olbrich@pengutronix.de> Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10MIPS: Add missing EHB in mtc0 -> mfc0 sequence.Dmitry Korotin1-9/+20
commit 0b24cae4d535045f4c9e177aa228d4e97bad212c upstream. Add a missing EHB (Execution Hazard Barrier) in mtc0 -> mfc0 sequence. Without this execution hazard barrier it's possible for the value read back from the KScratch register to be the value from before the mtc0. Reproducible on P5600 & P6600. The hazard is documented in the MIPS Architecture Reference Manual Vol. III: MIPS32/microMIPS32 Privileged Resource Architecture (MD00088), rev 6.03 table 8.1 which includes: Producer | Consumer | Hazard ----------|----------|---------------------------- mtc0 | mfc0 | any coprocessor 0 register Signed-off-by: Dmitry Korotin <dkorotin@wavecomp.com> [paul.burton@mips.com: - Commit message tweaks. - Add Fixes tags. - Mark for stable back to v3.15 where P5600 support was introduced.] Signed-off-by: Paul Burton <paul.burton@mips.com> Fixes: 3d8bfdd03072 ("MIPS: Use C0_KScratch (if present) to hold PGD pointer.") Fixes: 829dcc0a956a ("MIPS: Add MIPS P5600 probe support") Cc: linux-mips@vger.kernel.org Cc: stable@vger.kernel.org # v3.15+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10MIPS: Fix bounds check virt_addr_validHauke Mehrtens1-1/+1
commit d6ed083f5cc621e15c15b56c3b585fd524dbcb0f upstream. The bounds check used the uninitialized variable vaddr, it should use the given parameter kaddr instead. When using the uninitialized value the compiler assumed it to be 0 and optimized this function to just return 0 in all cases. This should make the function check the range of the given address and only do the page map check in case it is in the expected range of virtual addresses. Fixes: 074a1e1167af ("MIPS: Bounds check virt_addr_valid") Cc: stable@vger.kernel.org # v4.12+ Cc: Paul Burton <paul.burton@mips.com> Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: Paul Burton <paul.burton@mips.com> Cc: ralf@linux-mips.org Cc: jhogan@kernel.org Cc: f4bug@amsat.org Cc: linux-mips@vger.kernel.org Cc: ysu@wavecomp.com Cc: jcristau@debian.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10svcrdma: Ignore source port when computing DRC hashChuck Lever1-1/+6
commit 1e091c3bbf51d34d5d96337a59ce5ab2ac3ba2cc upstream. The DRC appears to be effectively empty after an RPC/RDMA transport reconnect. The problem is that each connection uses a different source port, which defeats the DRC hash. Clients always have to disconnect before they send retransmissions to reset the connection's credit accounting, thus every retransmit on NFS/RDMA will miss the DRC. An NFS/RDMA client's IP source port is meaningless for RDMA transports. The transport layer typically sets the source port value on the connection to a random ephemeral port. The server already ignores it for the "secure port" check. See commit 16e4d93f6de7 ("NFSD: Ignore client's source port on RDMA transports"). The Linux NFS server's DRC resolves XID collisions from the same source IP address by using the checksum of the first 200 bytes of the RPC call header. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10KVM: LAPIC: Fix pending interrupt in IRR blocked by software disable LAPICWanpeng Li1-1/+1
commit bb34e690e9340bc155ebed5a3d75fc63ff69e082 upstream. Thomas reported that: | Background: | | In preparation of supporting IPI shorthands I changed the CPU offline | code to software disable the local APIC instead of just masking it. | That's done by clearing the APIC_SPIV_APIC_ENABLED bit in the APIC_SPIV | register. | | Failure: | | When the CPU comes back online the startup code triggers occasionally | the warning in apic_pending_intr_clear(). That complains that the IRRs | are not empty. | | The offending vector is the local APIC timer vector who's IRR bit is set | and stays set. | | It took me quite some time to reproduce the issue locally, but now I can | see what happens. | | It requires apicv_enabled=0, i.e. full apic emulation. With apicv_enabled=1 | (and hardware support) it behaves correctly. | | Here is the series of events: | | Guest CPU | | goes down | | native_cpu_disable() | | apic_soft_disable(); | | play_dead() | | .... | | startup() | | if (apic_enabled()) | apic_pending_intr_clear() <- Not taken | | enable APIC | | apic_pending_intr_clear() <- Triggers warning because IRR is stale | | When this happens then the deadline timer or the regular APIC timer - | happens with both, has fired shortly before the APIC is disabled, but the | interrupt was not serviced because the guest CPU was in an interrupt | disabled region at that point. | | The state of the timer vector ISR/IRR bits: | | ISR IRR | before apic_soft_disable() 0 1 | after apic_soft_disable() 0 1 | | On startup 0 1 | | Now one would assume that the IRR is cleared after the INIT reset, but this | happens only on CPU0. | | Why? | | Because our CPU0 hotplug is just for testing to make sure nothing breaks | and goes through an NMI wakeup vehicle because INIT would send it through | the boots-trap code which is not really working if that CPU was not | physically unplugged. | | Now looking at a real world APIC the situation in that case is: | | ISR IRR | before apic_soft_disable() 0 1 | after apic_soft_disable() 0 1 | | On startup 0 0 | | Why? | | Once the dying CPU reenables interrupts the pending interrupt gets | delivered as a spurious interupt and then the state is clear. | | While that CPU0 hotplug test case is surely an esoteric issue, the APIC | emulation is still wrong, Even if the play_dead() code would not enable | interrupts then the pending IRR bit would turn into an ISR .. interrupt | when the APIC is reenabled on startup. From SDM 10.4.7.2 Local APIC State After It Has Been Software Disabled * Pending interrupts in the IRR and ISR registers are held and require masking or handling by the CPU. In Thomas's testing, hardware cpu will not respect soft disable LAPIC when IRR has already been set or APICv posted-interrupt is in flight, so we can skip soft disable APIC checking when clearing IRR and set ISR, continue to respect soft disable APIC when attempting to set IRR. Reported-by: Rong Chen <rong.a.chen@intel.com> Reported-by: Feng Tang <feng.tang@intel.com> Reported-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Thomas Gleixner <tglx@linutronix.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Rong Chen <rong.a.chen@intel.com> Cc: Feng Tang <feng.tang@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10KVM: x86: degrade WARN to pr_warn_ratelimitedPaolo Bonzini1-3/+3
commit 3f16a5c318392cbb5a0c7a3d19dff8c8ef3c38ee upstream. This warning can be triggered easily by userspace, so it should certainly not cause a panic if panic_on_warn is set. Reported-by: syzbot+c03f30b4f4c46bdf8575@syzkaller.appspotmail.com Suggested-by: Alexander Potapenko <glider@google.com> Acked-by: Alexander Potapenko <glider@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10ARC: handle gcc generated __builtin_trap for older compilerVineet Gupta1-0/+8
commit af1be2e21203867cb958aaceed5366e2e24b88e8 upstream. ARC gcc prior to GNU 2018.03 release didn't have a target specific __builtin_trap() implementation, generating default abort() call. Implement the abort() call - emulating what newer gcc does for the same, as suggested by Arnd. Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10tty: rocket: fix incorrect forward declaration of 'rp_init()'Linus Torvalds1-1/+1
[ Upstream commit 423ea3255424b954947d167681b71ded1b8fca53 ] Make the forward declaration actually match the real function definition, something that previous versions of gcc had just ignored. This is another patch to fix new warnings from gcc-9 before I start the merge window pulls. I don't want to miss legitimate new warnings just because my system update brought a new compiler with new warnings. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-10vhost: scsi: add weight supportJason Wang1-3/+3
commit c1ea02f15ab5efb3e93fc3144d895410bf79fcf2 upstream. This patch will check the weight and exit the loop if we exceeds the weight. This is useful for preventing scsi kthread from hogging cpu which is guest triggerable. This addresses CVE-2019-3900. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Fixes: 057cbf49a1f0 ("tcm_vhost: Initial merge for vhost level target fabric driver") Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Balbir Singh <sblbir@amzn.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10vhost: vsock: add weight supportJason Wang1-6/+10
commit e79b431fb901ba1106670bcc80b9b617b25def7d upstream. This patch will check the weight and exit the loop if we exceeds the weight. This is useful for preventing vsock kthread from hogging cpu which is guest triggerable. The weight can help to avoid starving the request from on direction while another direction is being processed. The value of weight is picked from vhost-net. This addresses CVE-2019-3900. Cc: Stefan Hajnoczi <stefanha@redhat.com> Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko") Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Balbir Singh <sblbir@amzn.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10vhost_net: fix possible infinite loopJason Wang1-10/+10
commit e2412c07f8f3040593dfb88207865a3cd58680c0 upstream. When the rx buffer is too small for a packet, we will discard the vq descriptor and retry it for the next packet: while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk, &busyloop_intr))) { ... /* On overrun, truncate and discard */ if (unlikely(headcount > UIO_MAXIOV)) { iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1); err = sock->ops->recvmsg(sock, &msg, 1, MSG_DONTWAIT | MSG_TRUNC); pr_debug("Discarded rx packet: len %zd\n", sock_len); continue; } ... } This makes it possible to trigger a infinite while..continue loop through the co-opreation of two VMs like: 1) Malicious VM1 allocate 1 byte rx buffer and try to slow down the vhost process as much as possible e.g using indirect descriptors or other. 2) Malicious VM2 generate packets to VM1 as fast as possible Fixing this by checking against weight at the end of RX and TX loop. This also eliminate other similar cases when: - userspace is consuming the packets in the meanwhile - theoretical TOCTOU attack if guest moving avail index back and forth to hit the continue after vhost find guest just add new buffers This addresses CVE-2019-3900. Fixes: d8316f3991d20 ("vhost: fix total length when packets are too short") Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server") Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Balbir Singh <sblbir@amzn.com>
2019-07-10vhost: introduce vhost_exceeds_weight()Jason Wang5-17/+46
commit e82b9b0727ff6d665fff2d326162b460dded554d upstream. We used to have vhost_exceeds_weight() for vhost-net to: - prevent vhost kthread from hogging the cpu - balance the time spent between TX and RX This function could be useful for vsock and scsi as well. So move it to vhost.c. Device must specify a weight which counts the number of requests, or it can also specific a byte_weight which counts the number of bytes that has been processed. Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Balbir Singh <sblbir@amzn.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10vhost_net: introduce vhost_exceeds_weight()Jason Wang1-5/+8
commit 272f35cba53d088085e5952fd81d7a133ab90789 upstream. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Balbir Singh <sblbir@amzn.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10vhost_net: use packet weight for rx handler, tooPaolo Abeni1-4/+8
commit db688c24eada63b1efe6d0d7d835e5c3bdd71fd3 upstream. Similar to commit a2ac99905f1e ("vhost-net: set packet weight of tx polling to 2 * vq size"), we need a packet-based limit for handler_rx, too - elsewhere, under rx flood with small packets, tx can be delayed for a very long time, even without busypolling. The pkt limit applied to handle_rx must be the same applied by handle_tx, or we will get unfair scheduling between rx and tx. Tying such limit to the queue length makes it less effective for large queue length values and can introduce large process scheduler latencies, so a constant valued is used - likewise the existing bytes limit. The selected limit has been validated with PVP[1] performance test with different queue sizes: queue size 256 512 1024 baseline 366 354 362 weight 128 715 723 670 weight 256 740 745 733 weight 512 600 460 583 weight 1024 423 427 418 A packet weight of 256 gives peek performances in under all the tested scenarios. No measurable regression in unidirectional performance tests has been detected. [1] https://developers.redhat.com/blog/2017/06/05/measuring-and-comparing-open-vswitch-performance/ Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Balbir Singh <sblbir@amazon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10vhost-net: set packet weight of tx polling to 2 * vq sizehaibinzhang(张海斌)1-1/+7
commit a2ac99905f1ea8b15997a6ec39af69aa28a3653b upstream. handle_tx will delay rx for tens or even hundreds of milliseconds when tx busy polling udp packets with small length(e.g. 1byte udp payload), because setting VHOST_NET_WEIGHT takes into account only sent-bytes but no single packet length. Ping-Latencies shown below were tested between two Virtual Machines using netperf (UDP_STREAM, len=1), and then another machine pinged the client: vq size=256 Packet-Weight Ping-Latencies(millisecond) min avg max Origin 3.319 18.489 57.303 64 1.643 2.021 2.552 128 1.825 2.600 3.224 256 1.997 2.710 4.295 512 1.860 3.171 4.631 1024 2.002 4.173 9.056 2048 2.257 5.650 9.688 4096 2.093 8.508 15.943 vq size=512 Packet-Weight Ping-Latencies(millisecond) min avg max Origin 6.537 29.177 66.245 64 2.798 3.614 4.403 128 2.861 3.820 4.775 256 3.008 4.018 4.807 512 3.254 4.523 5.824 1024 3.079 5.335 7.747 2048 3.944 8.201 12.762 4096 4.158 11.057 19.985 Seems pretty consistent, a small dip at 2 VQ sizes. Ring size is a hint from device about a burst size it can tolerate. Based on benchmarks, set the weight to 2 * vq size. To evaluate this change, another tests were done using netperf(RR, TX) between two machines with Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz, and vq size was tweaked through qemu. Results shown below does not show obvious changes. vq size=256 TCP_RR vq size=512 TCP_RR size/sessions/+thu%/+normalize% size/sessions/+thu%/+normalize% 1/ 1/ -7%/ -2% 1/ 1/ 0%/ -2% 1/ 4/ +1%/ 0% 1/ 4/ +1%/ 0% 1/ 8/ +1%/ -2% 1/ 8/ 0%/ +1% 64/ 1/ -6%/ 0% 64/ 1/ +7%/ +3% 64/ 4/ 0%/ +2% 64/ 4/ -1%/ +1% 64/ 8/ 0%/ 0% 64/ 8/ -1%/ -2% 256/ 1/ -3%/ -4% 256/ 1/ -4%/ -2% 256/ 4/ +3%/ +4% 256/ 4/ +1%/ +2% 256/ 8/ +2%/ 0% 256/ 8/ +1%/ -1% vq size=256 UDP_RR vq size=512 UDP_RR size/sessions/+thu%/+normalize% size/sessions/+thu%/+normalize% 1/ 1/ -5%/ +1% 1/ 1/ -3%/ -2% 1/ 4/ +4%/ +1% 1/ 4/ -2%/ +2% 1/ 8/ -1%/ -1% 1/ 8/ -1%/ 0% 64/ 1/ -2%/ -3% 64/ 1/ +1%/ +1% 64/ 4/ -5%/ -1% 64/ 4/ +2%/ 0% 64/ 8/ 0%/ -1% 64/ 8/ -2%/ +1% 256/ 1/ +7%/ +1% 256/ 1/ -7%/ 0% 256/ 4/ +1%/ +1% 256/ 4/ -3%/ -4% 256/ 8/ +2%/ +2% 256/ 8/ +1%/ +1% vq size=256 TCP_STREAM vq size=512 TCP_STREAM size/sessions/+thu%/+normalize% size/sessions/+thu%/+normalize% 64/ 1/ 0%/ -3% 64/ 1/ 0%/ 0% 64/ 4/ +3%/ -1% 64/ 4/ -2%/ +4% 64/ 8/ +9%/ -4% 64/ 8/ -1%/ +2% 256/ 1/ +1%/ -4% 256/ 1/ +1%/ +1% 256/ 4/ -1%/ -1% 256/ 4/ -3%/ 0% 256/ 8/ +7%/ +5% 256/ 8/ -3%/ 0% 512/ 1/ +1%/ 0% 512/ 1/ -1%/ -1% 512/ 4/ +1%/ -1% 512/ 4/ 0%/ 0% 512/ 8/ +7%/ -5% 512/ 8/ +6%/ -1% 1024/ 1/ 0%/ -1% 1024/ 1/ 0%/ +1% 1024/ 4/ +3%/ 0% 1024/ 4/ +1%/ 0% 1024/ 8/ +8%/ +5% 1024/ 8/ -1%/ 0% 2048/ 1/ +2%/ +2% 2048/ 1/ -1%/ 0% 2048/ 4/ +1%/ 0% 2048/ 4/ 0%/ -1% 2048/ 8/ -2%/ 0% 2048/ 8/ 5%/ -1% 4096/ 1/ -2%/ 0% 4096/ 1/ -2%/ 0% 4096/ 4/ +2%/ 0% 4096/ 4/ 0%/ 0% 4096/ 8/ +9%/ -2% 4096/ 8/ -5%/ -1% Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Haibin Zhang <haibinzhang@tencent.com> Signed-off-by: Yunfang Tai <yunfangtai@tencent.com> Signed-off-by: Lidong Chen <lidongchen@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Balbir Singh <sblbir@amazon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10btrfs: Ensure replaced device doesn't have pending chunk allocationNikolay Borisov3-10/+26
commit debd1c065d2037919a7da67baf55cc683fee09f0 upstream. Recent FITRIM work, namely bbbf7243d62d ("btrfs: combine device update operations during transaction commit") combined the way certain operations are recoded in a transaction. As a result an ASSERT was added in dev_replace_finish to ensure the new code works correctly. Unfortunately I got reports that it's possible to trigger the assert, meaning that during a device replace it's possible to have an unfinished chunk allocation on the source device. This is supposed to be prevented by the fact that a transaction is committed before finishing the replace oepration and alter acquiring the chunk mutex. This is not sufficient since by the time the transaction is committed and the chunk mutex acquired it's possible to allocate a chunk depending on the workload being executed on the replaced device. This bug has been present ever since device replace was introduced but there was never code which checks for it. The correct way to fix is to ensure that there is no pending device modification operation when the chunk mutex is acquire and if there is repeat transaction commit. Unfortunately it's not possible to just exclude the source device from btrfs_fs_devices::dev_alloc_list since this causes ENOSPC to be hit in transaction commit. Fixing that in another way would need to add special cases to handle the last writes and forbid new ones. The looped transaction fix is more obvious, and can be easily backported. The runtime of dev-replace is long so there's no noticeable delay caused by that. Reported-by: David Sterba <dsterba@suse.com> Fixes: 391cd9df81ac ("Btrfs: fix unprotected alloc list insertion during the finishing procedure of replace") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10mm/vmscan.c: prevent useless kswapd loopsShakeel Butt1-12/+15
commit dffcac2cb88e4ec5906235d64a83d802580b119e upstream. In production we have noticed hard lockups on large machines running large jobs due to kswaps hoarding lru lock within isolate_lru_pages when sc->reclaim_idx is 0 which is a small zone. The lru was couple hundred GiBs and the condition (page_zonenum(page) > sc->reclaim_idx) in isolate_lru_pages() was basically skipping GiBs of pages while holding the LRU spinlock with interrupt disabled. On further inspection, it seems like there are two issues: (1) If kswapd on the return from balance_pgdat() could not sleep (i.e. node is still unbalanced), the classzone_idx is unintentionally set to 0 and the whole reclaim cycle of kswapd will try to reclaim only the lowest and smallest zone while traversing the whole memory. (2) Fundamentally isolate_lru_pages() is really bad when the allocation has woken kswapd for a smaller zone on a very large machine running very large jobs. It can hoard the LRU spinlock while skipping over 100s of GiBs of pages. This patch only fixes (1). (2) needs a more fundamental solution. To fix (1), in the kswapd context, if pgdat->kswapd_classzone_idx is invalid use the classzone_idx of the previous kswapd loop otherwise use the one the waker has requested. Link: http://lkml.kernel.org/r/20190701201847.251028-1-shakeelb@google.com Fixes: e716f2eb24de ("mm, vmscan: prevent kswapd sleeping prematurely due to mismatched classzone_idx") Signed-off-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Hillf Danton <hdanton@sina.com> Cc: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10ftrace/x86: Remove possible deadlock between register_kprobe() and ↵Petr Mladek2-9/+4
ftrace_run_update_code() commit d5b844a2cf507fc7642c9ae80a9d585db3065c28 upstream. The commit 9f255b632bf12c4dd7 ("module: Fix livepatch/ftrace module text permissions race") causes a possible deadlock between register_kprobe() and ftrace_run_update_code() when ftrace is using stop_machine(). The existing dependency chain (in reverse order) is: -> #1 (text_mutex){+.+.}: validate_chain.isra.21+0xb32/0xd70 __lock_acquire+0x4b8/0x928 lock_acquire+0x102/0x230 __mutex_lock+0x88/0x908 mutex_lock_nested+0x32/0x40 register_kprobe+0x254/0x658 init_kprobes+0x11a/0x168 do_one_initcall+0x70/0x318 kernel_init_freeable+0x456/0x508 kernel_init+0x22/0x150 ret_from_fork+0x30/0x34 kernel_thread_starter+0x0/0xc -> #0 (cpu_hotplug_lock.rw_sem){++++}: check_prev_add+0x90c/0xde0 validate_chain.isra.21+0xb32/0xd70 __lock_acquire+0x4b8/0x928 lock_acquire+0x102/0x230 cpus_read_lock+0x62/0xd0 stop_machine+0x2e/0x60 arch_ftrace_update_code+0x2e/0x40 ftrace_run_update_code+0x40/0xa0 ftrace_startup+0xb2/0x168 register_ftrace_function+0x64/0x88 klp_patch_object+0x1a2/0x290 klp_enable_patch+0x554/0x980 do_one_initcall+0x70/0x318 do_init_module+0x6e/0x250 load_module+0x1782/0x1990 __s390x_sys_finit_module+0xaa/0xf0 system_call+0xd8/0x2d0 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(text_mutex); lock(cpu_hotplug_lock.rw_sem); lock(text_mutex); lock(cpu_hotplug_lock.rw_sem); It is similar problem that has been solved by the commit 2d1e38f56622b9b ("kprobes: Cure hotplug lock ordering issues"). Many locks are involved. To be on the safe side, text_mutex must become a low level lock taken after cpu_hotplug_lock.rw_sem. This can't be achieved easily with the current ftrace design. For example, arm calls set_all_modules_text_rw() already in ftrace_arch_code_modify_prepare(), see arch/arm/kernel/ftrace.c. This functions is called: + outside stop_machine() from ftrace_run_update_code() + without stop_machine() from ftrace_module_enable() Fortunately, the problematic fix is needed only on x86_64. It is the only architecture that calls set_all_modules_text_rw() in ftrace path and supports livepatching at the same time. Therefore it is enough to move text_mutex handling from the generic kernel/trace/ftrace.c into arch/x86/kernel/ftrace.c: ftrace_arch_code_modify_prepare() ftrace_arch_code_modify_post_process() This patch basically reverts the ftrace part of the problematic commit 9f255b632bf12c4dd7 ("module: Fix livepatch/ftrace module text permissions race"). And provides x86_64 specific-fix. Some refactoring of the ftrace code will be needed when livepatching is implemented for arm or nds32. These architectures call set_all_modules_text_rw() and use stop_machine() at the same time. Link: http://lkml.kernel.org/r/20190627081334.12793-1-pmladek@suse.com Fixes: 9f255b632bf12c4dd7 ("module: Fix livepatch/ftrace module text permissions race") Acked-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Petr Mladek <pmladek@suse.com> [ As reviewed by Miroslav Benes <mbenes@suse.cz>, removed return value of ftrace_run_update_code() as it is a void function. ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10drm/imx: only send event on crtc disable if kept disabledRobert Beckett1-1/+1
commit 5aeab2bfc9ffa72d3ca73416635cb3785dfc076f upstream. The event will be sent as part of the vblank enable during the modeset if the crtc is not being kept disabled. Fixes: 5f2f911578fb ("drm/imx: atomic phase 3 step 1: Use atomic configuration") Signed-off-by: Robert Beckett <bob.beckett@collabora.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10drm/imx: notify drm core before sending event during crtc disableRobert Beckett1-2/+2
commit 78c68e8f5cd24bd32ba4ca1cdfb0c30cf0642685 upstream. Notify drm core before sending pending events during crtc disable. This fixes the first event after disable having an old stale timestamp by having drm_crtc_vblank_off update the timestamp to now. This was seen while debugging weston log message: Warning: computed repaint delay is insane: -8212 msec This occurred due to: 1. driver starts up 2. fbcon comes along and restores fbdev, enabling vblank 3. vblank_disable_fn fires via timer disabling vblank, keeping vblank seq number and time set at current value (some time later) 4. weston starts and does a modeset 5. atomic commit disables crtc while it does the modeset 6. ipu_crtc_atomic_disable sends vblank with old seq number and time Fixes: a474478642d5 ("drm/imx: fix crtc vblank state regression") Signed-off-by: Robert Beckett <bob.beckett@collabora.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10drm/amdgpu/gfx9: use reset default for PA_SC_FIFO_SIZEAlex Deucher1-19/+0
commit 25f09f858835b0e9a06213811031190a17d8ab78 upstream. Recommended by the hw team. Reviewed-and-Tested-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10arm64: kaslr: keep modules inside module region when KASAN is enabledArd Biesheuvel1-2/+6
commit 6f496a555d93db7a11d4860b9220d904822f586a upstream. When KASLR and KASAN are both enabled, we keep the modules where they are, and randomize the placement of the kernel so it is within 2 GB of the module region. The reason for this is that putting modules in the vmalloc region (like we normally do when KASLR is enabled) is not possible in this case, given that the entire vmalloc region is already backed by KASAN zero shadow pages, and so allocating dedicated KASAN shadow space as required by loaded modules is not possible. The default module allocation window is set to [_etext - 128MB, _etext] in kaslr.c, which is appropriate for KASLR kernels booted without a seed or with 'nokaslr' on the command line. However, as it turns out, it is not quite correct for the KASAN case, since it still intersects the vmalloc region at the top, where attempts to allocate shadow pages will collide with the KASAN zero shadow pages, causing a WARN() and all kinds of other trouble. So cap the top end to MODULES_END explicitly when running with KASAN. Cc: <stable@vger.kernel.org> # 4.9+ Acked-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10tracing/snapshot: Resize spare buffer if size changedEiichi Tsukata1-4/+6
commit 46cc0b44428d0f0e81f11ea98217fc0edfbeab07 upstream. Current snapshot implementation swaps two ring_buffers even though their sizes are different from each other, that can cause an inconsistency between the contents of buffer_size_kb file and the current buffer size. For example: # cat buffer_size_kb 7 (expanded: 1408) # echo 1 > events/enable # grep bytes per_cpu/cpu0/stats bytes: 1441020 # echo 1 > snapshot // current:1408, spare:1408 # echo 123 > buffer_size_kb // current:123, spare:1408 # echo 1 > snapshot // current:1408, spare:123 # grep bytes per_cpu/cpu0/stats bytes: 1443700 # cat buffer_size_kb 123 // != current:1408 And also, a similar per-cpu case hits the following WARNING: Reproducer: # echo 1 > per_cpu/cpu0/snapshot # echo 123 > buffer_size_kb # echo 1 > per_cpu/cpu0/snapshot WARNING: WARNING: CPU: 0 PID: 1946 at kernel/trace/trace.c:1607 update_max_tr_single.part.0+0x2b8/0x380 Modules linked in: CPU: 0 PID: 1946 Comm: bash Not tainted 5.2.0-rc6 #20 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014 RIP: 0010:update_max_tr_single.part.0+0x2b8/0x380 Code: ff e8 dc da f9 ff 0f 0b e9 88 fe ff ff e8 d0 da f9 ff 44 89 ee bf f5 ff ff ff e8 33 dc f9 ff 41 83 fd f5 74 96 e8 b8 da f9 ff <0f> 0b eb 8d e8 af da f9 ff 0f 0b e9 bf fd ff ff e8 a3 da f9 ff 48 RSP: 0018:ffff888063e4fca0 EFLAGS: 00010093 RAX: ffff888066214380 RBX: ffffffff99850fe0 RCX: ffffffff964298a8 RDX: 0000000000000000 RSI: 00000000fffffff5 RDI: 0000000000000005 RBP: 1ffff1100c7c9f96 R08: ffff888066214380 R09: ffffed100c7c9f9b R10: ffffed100c7c9f9a R11: 0000000000000003 R12: 0000000000000000 R13: 00000000ffffffea R14: ffff888066214380 R15: ffffffff99851060 FS: 00007f9f8173c700(0000) GS:ffff88806d000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000714dc0 CR3: 0000000066fa6000 CR4: 00000000000006f0 Call Trace: ? trace_array_printk_buf+0x140/0x140 ? __mutex_lock_slowpath+0x10/0x10 tracing_snapshot_write+0x4c8/0x7f0 ? trace_printk_init_buffers+0x60/0x60 ? selinux_file_permission+0x3b/0x540 ? tracer_preempt_off+0x38/0x506 ? trace_printk_init_buffers+0x60/0x60 __vfs_write+0x81/0x100 vfs_write+0x1e1/0x560 ksys_write+0x126/0x250 ? __ia32_sys_read+0xb0/0xb0 ? do_syscall_64+0x1f/0x390 do_syscall_64+0xc1/0x390 entry_SYSCALL_64_after_hwframe+0x49/0xbe This patch adds resize_buffer_duplicate_size() to check if there is a difference between current/spare buffer sizes and resize a spare buffer if necessary. Link: http://lkml.kernel.org/r/20190625012910.13109-1-devel@etsukata.com Cc: stable@vger.kernel.org Fixes: ad909e21bbe69 ("tracing: Add internal tracing_snapshot() functions") Signed-off-by: Eiichi Tsukata <devel@etsukata.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10lib/mpi: Fix karactx leak in mpi_powmHerbert Xu1-4/+2
commit c8ea9fce2baf7b643384f36f29e4194fa40d33a6 upstream. Sometimes mpi_powm will leak karactx because a memory allocation failure causes a bail-out that skips the freeing of karactx. This patch moves the freeing of karactx to the end of the function like everything else so that it can't be skipped. Reported-by: syzbot+f7baccc38dcc1e094e77@syzkaller.appspotmail.com Fixes: cdec9cb5167a ("crypto: GnuPG based MPI lib - source files...") Cc: <stable@vger.kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Reviewed-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10ALSA: hda/realtek - Change front mic location for Lenovo M710qDennis Wassenberg1-0/+1
commit bef33e19203dde434bcdf21c449e3fb4f06c2618 upstream. On M710q Lenovo ThinkCentre machine, there are two front mics, we change the location for one of them to avoid conflicts. Signed-off-by: Dennis Wassenberg <dennis.wassenberg@secunet.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10ALSA: usb-audio: fix sign unintended sign extension on left shiftsColin Ian King1-2/+2
commit 2acf5a3e6e9371e63c9e4ff54d84d08f630467a0 upstream. There are a couple of left shifts of unsigned 8 bit values that first get promoted to signed ints and hence get sign extended on the shift if the top bit of the 8 bit values are set. Fix this by casting the 8 bit values to unsigned ints to stop the unintentional sign extension. Addresses-Coverity: ("Unintended sign extension") Signed-off-by: Colin Ian King <colin.king@canonical.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10ALSA: line6: Fix write on zero-sized bufferTakashi Iwai1-0/+5
commit 3450121997ce872eb7f1248417225827ea249710 upstream. LINE6 drivers allocate the buffers based on the value returned from usb_maxpacket() calls. The manipulated device may return zero for this, and this results in the kmalloc() with zero size (and it may succeed) while the other part of the driver code writes the packet data with the fixed size -- which eventually overwrites. This patch adds a simple sanity check for the invalid buffer size for avoiding that problem. Reported-by: syzbot+219f00fb49874dcaea17@syzkaller.appspotmail.com Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10ALSA: firewire-lib/fireworks: fix miss detection of received MIDI messagesTakashi Sakamoto1-1/+1
commit 7fbd1753b64eafe21cf842348a40a691d0dee440 upstream. In IEC 61883-6, 8 MIDI data streams are multiplexed into single MIDI conformant data channel. The index of stream is calculated by modulo 8 of the value of data block counter. In fireworks, the value of data block counter in CIP header has a quirk with firmware version v5.0.0, v5.7.3 and v5.8.0. This brings ALSA IEC 61883-1/6 packet streaming engine to miss detection of MIDI messages. This commit fixes the miss detection to modify the value of data block counter for the modulo calculation. For maintainers, this bug exists since a commit 18f5ed365d3f ("ALSA: fireworks/firewire-lib: add support for recent firmware quirk") in Linux kernel v4.2. There're many changes since the commit. This fix can be backported to Linux kernel v4.4 or later. I tagged a base commit to the backport for your convenience. Besides, my work for Linux kernel v5.3 brings heavy code refactoring and some structure members are renamed in 'sound/firewire/amdtp-stream.h'. The content of this patch brings conflict when merging -rc tree with this patch and the latest tree. I request maintainers to solve the conflict to replace 'tx_first_dbc' with 'ctx_data.tx.first_dbc'. Fixes: df075feefbd3 ("ALSA: firewire-lib: complete AM824 data block processing layer") Cc: <stable@vger.kernel.org> # v4.4+ Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10ALSA: seq: fix incorrect order of dest_client/dest_ports argumentsColin Ian King2-2/+2
commit c3ea60c231446663afd6ea1054da6b7f830855ca upstream. There are two occurrances of a call to snd_seq_oss_fill_addr where the dest_client and dest_port arguments are in the wrong order. Fix this by swapping them around. Addresses-Coverity: ("Arguments in wrong order") Signed-off-by: Colin Ian King <colin.king@canonical.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10crypto: cryptd - Fix skcipher instance memory leakVincent Whitchurch1-0/+1
commit 1a0fad630e0b7cff38e7691b28b0517cfbb0633f upstream. cryptd_skcipher_free() fails to free the struct skcipher_instance allocated in cryptd_create_skcipher(), leading to a memory leak. This is detected by kmemleak on bootup on ARM64 platforms: unreferenced object 0xffff80003377b180 (size 1024): comm "cryptomgr_probe", pid 822, jiffies 4294894830 (age 52.760s) backtrace: kmem_cache_alloc_trace+0x270/0x2d0 cryptd_create+0x990/0x124c cryptomgr_probe+0x5c/0x1e8 kthread+0x258/0x318 ret_from_fork+0x10/0x1c Fixes: 4e0958d19bd8 ("crypto: cryptd - Add support for skcipher") Cc: <stable@vger.kernel.org> Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10crypto: user - prevent operating on larval algorithmsEric Biggers1-0/+3
commit 21d4120ec6f5b5992b01b96ac484701163917b63 upstream. Michal Suchanek reported [1] that running the pcrypt_aead01 test from LTP [2] in a loop and holding Ctrl-C causes a NULL dereference of alg->cra_users.next in crypto_remove_spawns(), via crypto_del_alg(). The test repeatedly uses CRYPTO_MSG_NEWALG and CRYPTO_MSG_DELALG. The crash occurs when the instance that CRYPTO_MSG_DELALG is trying to unregister isn't a real registered algorithm, but rather is a "test larval", which is a special "algorithm" added to the algorithms list while the real algorithm is still being tested. Larvals don't have initialized cra_users, so that causes the crash. Normally pcrypt_aead01 doesn't trigger this because CRYPTO_MSG_NEWALG waits for the algorithm to be tested; however, CRYPTO_MSG_NEWALG returns early when interrupted. Everything else in the "crypto user configuration" API has this same bug too, i.e. it inappropriately allows operating on larval algorithms (though it doesn't look like the other cases can cause a crash). Fix this by making crypto_alg_match() exclude larval algorithms. [1] https://lkml.kernel.org/r/20190625071624.27039-1-msuchanek@suse.de [2] https://github.com/linux-test-project/ltp/blob/20190517/testcases/kernel/crypto/pcrypt_aead01.c Reported-by: Michal Suchanek <msuchanek@suse.de> Fixes: a38f7907b926 ("crypto: Add userspace configuration API") Cc: <stable@vger.kernel.org> # v3.2+ Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10ptrace: Fix ->ptracer_cred handling for PTRACE_TRACEMEJann Horn1-3/+1
commit 6994eefb0053799d2e07cd140df6c2ea106c41ee upstream. Fix two issues: When called for PTRACE_TRACEME, ptrace_link() would obtain an RCU reference to the parent's objective credentials, then give that pointer to get_cred(). However, the object lifetime rules for things like struct cred do not permit unconditionally turning an RCU reference into a stable reference. PTRACE_TRACEME records the parent's credentials as if the parent was acting as the subject, but that's not the case. If a malicious unprivileged child uses PTRACE_TRACEME and the parent is privileged, and at a later point, the parent process becomes attacker-controlled (because it drops privileges and calls execve()), the attacker ends up with control over two processes with a privileged ptrace relationship, which can be abused to ptrace a suid binary and obtain root privileges. Fix both of these by always recording the credentials of the process that is requesting the creation of the ptrace relationship: current_cred() can't change under us, and current is the proper subject for access control. This change is theoretically userspace-visible, but I am not aware of any code that it will actually break. Fixes: 64b875f7ac8a ("ptrace: Capture the ptracer's creds not PT_PTRACE_CAP") Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10drm/i915/dmc: protect against reading random memoryLucas De Marchi1-0/+18
commit bc7b488b1d1c71dc4c5182206911127bc6c410d6 upstream. While loading the DMC firmware we were double checking the headers made sense, but in no place we checked that we were actually reading memory we were supposed to. This could be wrong in case the firmware file is truncated or malformed. Before this patch: # ls -l /lib/firmware/i915/icl_dmc_ver1_07.bin -rw-r--r-- 1 root root 25716 Feb 1 12:26 icl_dmc_ver1_07.bin # truncate -s 25700 /lib/firmware/i915/icl_dmc_ver1_07.bin # modprobe i915 # dmesg| grep -i dmc [drm:intel_csr_ucode_init [i915]] Loading i915/icl_dmc_ver1_07.bin [drm] Finished loading DMC firmware i915/icl_dmc_ver1_07.bin (v1.7) i.e. it loads random data. Now it fails like below: [drm:intel_csr_ucode_init [i915]] Loading i915/icl_dmc_ver1_07.bin [drm:csr_load_work_fn [i915]] *ERROR* Truncated DMC firmware, rejecting. i915 0000:00:02.0: Failed to load DMC firmware i915/icl_dmc_ver1_07.bin. Disabling runtime power management. i915 0000:00:02.0: DMC firmware homepage: https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/i915 Before reading any part of the firmware file, validate the input first. Fixes: eb805623d8b1 ("drm/i915/skl: Add support to load SKL CSR firmware.") Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190605235535.17791-1-lucas.demarchi@intel.com (cherry picked from commit bc7b488b1d1c71dc4c5182206911127bc6c410d6) Signed-off-by: Jani Nikula <jani.nikula@intel.com> [ Lucas: backported to 4.9+ adjusting the context ] Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-10MIPS: netlogic: xlr: Remove erroneous check in nlm_fmn_send()Paul Burton1-2/+0
[ Upstream commit 02eec6c9fc0cb13169cc97a6139771768791f92b ] In nlm_fmn_send() we have a loop which attempts to send a message multiple times in order to handle the transient failure condition of a lack of available credit. When examining the status register to detect the failure we check for a condition that can never be true, which falls foul of gcc 8's -Wtautological-compare: In file included from arch/mips/netlogic/common/irq.c:65: ./arch/mips/include/asm/netlogic/xlr/fmn.h: In function 'nlm_fmn_send': ./arch/mips/include/asm/netlogic/xlr/fmn.h:304:22: error: bitwise comparison always evaluates to false [-Werror=tautological-compare] if ((status & 0x2) == 1) ^~ If the path taken if this condition were true all we do is print a message to the kernel console. Since failures seem somewhat expected here (making the console message questionable anyway) and the condition has clearly never evaluated true we simply remove it, rather than attempting to fix it to check status correctly. Signed-off-by: Paul Burton <paul.burton@mips.com> Patchwork: https://patchwork.linux-mips.org/patch/20174/ Cc: Ganesan Ramalingam <ganesanr@broadcom.com> Cc: James Hogan <jhogan@kernel.org> Cc: Jayachandran C <jnair@caviumnetworks.com> Cc: John Crispin <john@phrozen.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-10ftrace: Fix NULL pointer dereference in free_ftrace_func_mapper()Wei Li1-2/+5
[ Upstream commit 04e03d9a616c19a47178eaca835358610e63a1dd ] The mapper may be NULL when called from register_ftrace_function_probe() with probe->data == NULL. This issue can be reproduced as follow (it may be covered by compiler optimization sometime): / # cat /sys/kernel/debug/tracing/set_ftrace_filter #### all functions enabled #### / # echo foo_bar:dump > /sys/kernel/debug/tracing/set_ftrace_filter [ 206.949100] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 [ 206.952402] Mem abort info: [ 206.952819] ESR = 0x96000006 [ 206.955326] Exception class = DABT (current EL), IL = 32 bits [ 206.955844] SET = 0, FnV = 0 [ 206.956272] EA = 0, S1PTW = 0 [ 206.956652] Data abort info: [ 206.957320] ISV = 0, ISS = 0x00000006 [ 206.959271] CM = 0, WnR = 0 [ 206.959938] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000419f3a000 [ 206.960483] [0000000000000000] pgd=0000000411a87003, pud=0000000411a83003, pmd=0000000000000000 [ 206.964953] Internal error: Oops: 96000006 [#1] SMP [ 206.971122] Dumping ftrace buffer: [ 206.973677] (ftrace buffer empty) [ 206.975258] Modules linked in: [ 206.976631] Process sh (pid: 281, stack limit = 0x(____ptrval____)) [ 206.978449] CPU: 10 PID: 281 Comm: sh Not tainted 5.2.0-rc1+ #17 [ 206.978955] Hardware name: linux,dummy-virt (DT) [ 206.979883] pstate: 60000005 (nZCv daif -PAN -UAO) [ 206.980499] pc : free_ftrace_func_mapper+0x2c/0x118 [ 206.980874] lr : ftrace_count_free+0x68/0x80 [ 206.982539] sp : ffff0000182f3ab0 [ 206.983102] x29: ffff0000182f3ab0 x28: ffff8003d0ec1700 [ 206.983632] x27: ffff000013054b40 x26: 0000000000000001 [ 206.984000] x25: ffff00001385f000 x24: 0000000000000000 [ 206.984394] x23: ffff000013453000 x22: ffff000013054000 [ 206.984775] x21: 0000000000000000 x20: ffff00001385fe28 [ 206.986575] x19: ffff000013872c30 x18: 0000000000000000 [ 206.987111] x17: 0000000000000000 x16: 0000000000000000 [ 206.987491] x15: ffffffffffffffb0 x14: 0000000000000000 [ 206.987850] x13: 000000000017430e x12: 0000000000000580 [ 206.988251] x11: 0000000000000000 x10: cccccccccccccccc [ 206.988740] x9 : 0000000000000000 x8 : ffff000013917550 [ 206.990198] x7 : ffff000012fac2e8 x6 : ffff000012fac000 [ 206.991008] x5 : ffff0000103da588 x4 : 0000000000000001 [ 206.991395] x3 : 0000000000000001 x2 : ffff000013872a28 [ 206.991771] x1 : 0000000000000000 x0 : 0000000000000000 [ 206.992557] Call trace: [ 206.993101] free_ftrace_func_mapper+0x2c/0x118 [ 206.994827] ftrace_count_free+0x68/0x80 [ 206.995238] release_probe+0xfc/0x1d0 [ 206.995555] register_ftrace_function_probe+0x4a8/0x868 [ 206.995923] ftrace_trace_probe_callback.isra.4+0xb8/0x180 [ 206.996330] ftrace_dump_callback+0x50/0x70 [ 206.996663] ftrace_regex_write.isra.29+0x290/0x3a8 [ 206.997157] ftrace_filter_write+0x44/0x60 [ 206.998971] __vfs_write+0x64/0xf0 [ 206.999285] vfs_write+0x14c/0x2f0 [ 206.999591] ksys_write+0xbc/0x1b0 [ 206.999888] __arm64_sys_write+0x3c/0x58 [ 207.000246] el0_svc_common.constprop.0+0x408/0x5f0 [ 207.000607] el0_svc_handler+0x144/0x1c8 [ 207.000916] el0_svc+0x8/0xc [ 207.003699] Code: aa0003f8 a9025bf5 aa0103f5 f946ea80 (f9400303) [ 207.008388] ---[ end trace 7b6d11b5f542bdf1 ]--- [ 207.010126] Kernel panic - not syncing: Fatal exception [ 207.011322] SMP: stopping secondary CPUs [ 207.013956] Dumping ftrace buffer: [ 207.014595] (ftrace buffer empty) [ 207.015632] Kernel Offset: disabled [ 207.017187] CPU features: 0x002,20006008 [ 207.017985] Memory Limit: none [ 207.019825] ---[ end Kernel panic - not syncing: Fatal exception ]--- Link: http://lkml.kernel.org/r/20190606031754.10798-1-liwei391@huawei.com Signed-off-by: Wei Li <liwei391@huawei.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-10module: Fix livepatch/ftrace module text permissions raceJosh Poimboeuf2-1/+15
[ Upstream commit 9f255b632bf12c4dd7fc31caee89aa991ef75176 ] It's possible for livepatch and ftrace to be toggling a module's text permissions at the same time, resulting in the following panic: BUG: unable to handle page fault for address: ffffffffc005b1d9 #PF: supervisor write access in kernel mode #PF: error_code(0x0003) - permissions violation PGD 3ea0c067 P4D 3ea0c067 PUD 3ea0e067 PMD 3cc13067 PTE 3b8a1061 Oops: 0003 [#1] PREEMPT SMP PTI CPU: 1 PID: 453 Comm: insmod Tainted: G O K 5.2.0-rc1-a188339ca5 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-20181126_142135-anatol 04/01/2014 RIP: 0010:apply_relocate_add+0xbe/0x14c Code: fa 0b 74 21 48 83 fa 18 74 38 48 83 fa 0a 75 40 eb 08 48 83 38 00 74 33 eb 53 83 38 00 75 4e 89 08 89 c8 eb 0a 83 38 00 75 43 <89> 08 48 63 c1 48 39 c8 74 2e eb 48 83 38 00 75 32 48 29 c1 89 08 RSP: 0018:ffffb223c00dbb10 EFLAGS: 00010246 RAX: ffffffffc005b1d9 RBX: 0000000000000000 RCX: ffffffff8b200060 RDX: 000000000000000b RSI: 0000004b0000000b RDI: ffff96bdfcd33000 RBP: ffffb223c00dbb38 R08: ffffffffc005d040 R09: ffffffffc005c1f0 R10: ffff96bdfcd33c40 R11: ffff96bdfcd33b80 R12: 0000000000000018 R13: ffffffffc005c1f0 R14: ffffffffc005e708 R15: ffffffff8b2fbc74 FS: 00007f5f447beba8(0000) GS:ffff96bdff900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffc005b1d9 CR3: 000000003cedc002 CR4: 0000000000360ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: klp_init_object_loaded+0x10f/0x219 ? preempt_latency_start+0x21/0x57 klp_enable_patch+0x662/0x809 ? virt_to_head_page+0x3a/0x3c ? kfree+0x8c/0x126 patch_init+0x2ed/0x1000 [livepatch_test02] ? 0xffffffffc0060000 do_one_initcall+0x9f/0x1c5 ? kmem_cache_alloc_trace+0xc4/0xd4 ? do_init_module+0x27/0x210 do_init_module+0x5f/0x210 load_module+0x1c41/0x2290 ? fsnotify_path+0x3b/0x42 ? strstarts+0x2b/0x2b ? kernel_read+0x58/0x65 __do_sys_finit_module+0x9f/0xc3 ? __do_sys_finit_module+0x9f/0xc3 __x64_sys_finit_module+0x1a/0x1c do_syscall_64+0x52/0x61 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The above panic occurs when loading two modules at the same time with ftrace enabled, where at least one of the modules is a livepatch module: CPU0 CPU1 klp_enable_patch() klp_init_object_loaded() module_disable_ro() ftrace_module_enable() ftrace_arch_code_modify_post_process() set_all_modules_text_ro() klp_write_object_relocations() apply_relocate_add() *patches read-only code* - BOOM A similar race exists when toggling ftrace while loading a livepatch module. Fix it by ensuring that the livepatch and ftrace code patching operations -- and their respective permissions changes -- are protected by the text_mutex. Link: http://lkml.kernel.org/r/ab43d56ab909469ac5d2520c5d944ad6d4abd476.1560474114.git.jpoimboe@redhat.com Reported-by: Johannes Erdfelt <johannes@erdfelt.com> Fixes: 444d13ff10fb ("modules: add ro_after_init support") Acked-by: Jessica Yu <jeyu@kernel.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-10mm/mlock.c: change count_mm_mlocked_page_nr return typeswkhack1-2/+2
[ Upstream commit 0874bb49bb21bf24deda853e8bf61b8325e24bcb ] On a 64-bit machine the value of "vma->vm_end - vma->vm_start" may be negative when using 32 bit ints and the "count >> PAGE_SHIFT"'s result will be wrong. So change the local variable and return value to unsigned long to fix the problem. Link: http://lkml.kernel.org/r/20190513023701.83056-1-swkhack@gmail.com Fixes: 0cf2f6f6dc60 ("mm: mlock: check against vma for actual mlock() size") Signed-off-by: swkhack <swkhack@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-10scripts/decode_stacktrace.sh: prefix addr2line with $CROSS_COMPILEManuel Traut1-1/+1
[ Upstream commit c04e32e911653442fc834be6e92e072aeebe01a1 ] At least for ARM64 kernels compiled with the crosstoolchain from Debian/stretch or with the toolchain from kernel.org the line number is not decoded correctly by 'decode_stacktrace.sh': $ echo "[ 136.513051] f1+0x0/0xc [kcrash]" | \ CROSS_COMPILE=/opt/gcc-8.1.0-nolibc/aarch64-linux/bin/aarch64-linux- \ ./scripts/decode_stacktrace.sh /scratch/linux-arm64/vmlinux \ /scratch/linux-arm64 \ /nfs/debian/lib/modules/4.20.0-devel [ 136.513051] f1 (/linux/drivers/staging/kcrash/kcrash.c:68) kcrash If addr2line from the toolchain is used the decoded line number is correct: [ 136.513051] f1 (/linux/drivers/staging/kcrash/kcrash.c:57) kcrash Link: http://lkml.kernel.org/r/20190527083425.3763-1-manut@linutronix.de Signed-off-by: Manuel Traut <manut@linutronix.de> Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-10cpuset: restore sanity to cpuset_cpus_allowed_fallback()Joel Savitz1-1/+14
[ Upstream commit d477f8c202d1f0d4791ab1263ca7657bbe5cf79e ] In the case that a process is constrained by taskset(1) (i.e. sched_setaffinity(2)) to a subset of available cpus, and all of those are subsequently offlined, the scheduler will set tsk->cpus_allowed to the current value of task_cs(tsk)->effective_cpus. This is done via a call to do_set_cpus_allowed() in the context of cpuset_cpus_allowed_fallback() made by the scheduler when this case is detected. This is the only call made to cpuset_cpus_allowed_fallback() in the latest mainline kernel. However, this is not sane behavior. I will demonstrate this on a system running the latest upstream kernel with the following initial configuration: # grep -i cpu /proc/$$/status Cpus_allowed: ffffffff,fffffff Cpus_allowed_list: 0-63 (Where cpus 32-63 are provided via smt.) If we limit our current shell process to cpu2 only and then offline it and reonline it: # taskset -p 4 $$ pid 2272's current affinity mask: ffffffffffffffff pid 2272's new affinity mask: 4 # echo off > /sys/devices/system/cpu/cpu2/online # dmesg | tail -3 [ 2195.866089] process 2272 (bash) no longer affine to cpu2 [ 2195.872700] IRQ 114: no longer affine to CPU2 [ 2195.879128] smpboot: CPU 2 is now offline # echo on > /sys/devices/system/cpu/cpu2/online # dmesg | tail -1 [ 2617.043572] smpboot: Booting Node 0 Processor 2 APIC 0x4 We see that our current process now has an affinity mask containing every cpu available on the system _except_ the one we originally constrained it to: # grep -i cpu /proc/$$/status Cpus_allowed: ffffffff,fffffffb Cpus_allowed_list: 0-1,3-63 This is not sane behavior, as the scheduler can now not only place the process on previously forbidden cpus, it can't even schedule it on the cpu it was originally constrained to! Other cases result in even more exotic affinity masks. Take for instance a process with an affinity mask containing only cpus provided by smt at the moment that smt is toggled, in a configuration such as the following: # taskset -p f000000000 $$ # grep -i cpu /proc/$$/status Cpus_allowed: 000000f0,00000000 Cpus_allowed_list: 36-39 A double toggle of smt results in the following behavior: # echo off > /sys/devices/system/cpu/smt/control # echo on > /sys/devices/system/cpu/smt/control # grep -i cpus /proc/$$/status Cpus_allowed: ffffff00,ffffffff Cpus_allowed_list: 0-31,40-63 This is even less sane than the previous case, as the new affinity mask excludes all smt-provided cpus with ids less than those that were previously in the affinity mask, as well as those that were actually in the mask. With this patch applied, both of these cases end in the following state: # grep -i cpu /proc/$$/status Cpus_allowed: ffffffff,ffffffff Cpus_allowed_list: 0-63 The original policy is discarded. Though not ideal, it is the simplest way to restore sanity to this fallback case without reinventing the cpuset wheel that rolls down the kernel just fine in cgroup v2. A user who wishes for the previous affinity mask to be restored in this fallback case can use that mechanism instead. This patch modifies scheduler behavior by instead resetting the mask to task_cs(tsk)->cpus_allowed by default, and cpu_possible mask in legacy mode. I tested the cases above on both modes. Note that the scheduler uses this fallback mechanism if and only if _every_ other valid avenue has been traveled, and it is the last resort before calling BUG(). Suggested-by: Waiman Long <longman@redhat.com> Suggested-by: Phil Auld <pauld@redhat.com> Signed-off-by: Joel Savitz <jsavitz@redhat.com> Acked-by: Phil Auld <pauld@redhat.com> Acked-by: Waiman Long <longman@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-10platform/x86: mlx-platform: Fix parent device in i2c-mux-reg device registrationVadim Pasternak1-1/+1
[ Upstream commit 160da20b254dd4bfc5828f12c208fa831ad4be6c ] Fix the issue found while running kernel with the option CONFIG_DEBUG_TEST_DRIVER_REMOVE. Driver 'mlx-platform' registers 'i2c_mlxcpld' device and then registers few underlying 'i2c-mux-reg' devices: priv->pdev_i2c = platform_device_register_simple("i2c_mlxcpld", nr, NULL, 0); ... for (i = 0; i < ARRAY_SIZE(mlxplat_mux_data); i++) { priv->pdev_mux[i] = platform_device_register_resndata( &mlxplat_dev->dev, "i2c-mux-reg", i, NULL, 0, &mlxplat_mux_data[i], sizeof(mlxplat_mux_data[i])); But actual parent of "i2c-mux-reg" device is priv->pdev_i2c->dev and not mlxplat_dev->dev. Patch fixes parent device parameter in a call to platform_device_register_resndata() for "i2c-mux-reg". It solves the race during initialization flow while 'i2c_mlxcpld.1' is removing after probe, while 'i2c-mux-reg.0' is still in probing flow: 'i2c_mlxcpld.1' flow: probe -> remove -> probe. 'i2c-mux-reg.0' flow: probe -> ... [ 12:621096] Registering platform device 'i2c_mlxcpld.1'. Parent at platform [ 12:621117] device: 'i2c_mlxcpld.1': device_add [ 12:621155] bus: 'platform': add device i2c_mlxcpld.1 [ 12:621384] Registering platform device 'i2c-mux-reg.0'. Parent at mlxplat [ 12:621395] device: 'i2c-mux-reg.0': device_add [ 12:621425] bus: 'platform': add device i2c-mux-reg.0 [ 12:621806] Registering platform device 'i2c-mux-reg.1'. Parent at mlxplat [ 12:621828] device: 'i2c-mux-reg.1': device_add [ 12:621892] bus: 'platform': add device i2c-mux-reg.1 [ 12:621906] bus: 'platform': add driver i2c_mlxcpld [ 12:621996] bus: 'platform': driver_probe_device: matched device i2c_mlxcpld.1 with driver i2c_mlxcpld [ 12:622003] bus: 'platform': really_probe: probing driver i2c_mlxcpld with device i2c_mlxcpld.1 [ 12:622100] i2c_mlxcpld i2c_mlxcpld.1: no default pinctrl state [ 12:622293] device: 'i2c-1': device_add [ 12:627280] bus: 'i2c': add device i2c-1 [ 12:627692] device: 'i2c-1': device_add [ 12.629639] bus: 'platform': add driver i2c-mux-reg [ 12.629718] bus: 'platform': driver_probe_device: matched device i2c-mux-reg.0 with driver i2c-mux-reg [ 12.629723] bus: 'platform': really_probe: probing driver i2c-mux-reg with device i2c-mux-reg.0 [ 12.629818] i2c-mux-reg i2c-mux-reg.0: no default pinctrl state [ 12.629981] platform i2c-mux-reg.0: Driver i2c-mux-reg requests probe deferral [ 12.629986] platform i2c-mux-reg.0: Added to deferred list [ 12.629992] bus: 'platform': driver_probe_device: matched device i2c-mux-reg.1 with driver i2c-mux-reg [ 12.629997] bus: 'platform': really_probe: probing driver i2c-mux-reg with device i2c-mux-reg.1 [ 12.630091] i2c-mux-reg i2c-mux-reg.1: no default pinctrl state [ 12.630247] platform i2c-mux-reg.1: Driver i2c-mux-reg requests probe deferral [ 12.630252] platform i2c-mux-reg.1: Added to deferred list [ 12.640892] devices_kset: Moving i2c-mux-reg.0 to end of list [ 12.640900] platform i2c-mux-reg.0: Retrying from deferred list [ 12.640911] bus: 'platform': driver_probe_device: matched device i2c-mux-reg.0 with driver i2c-mux-reg [ 12.640919] bus: 'platform': really_probe: probing driver i2c-mux-reg with device i2c-mux-reg.0 [ 12.640999] i2c-mux-reg i2c-mux-reg.0: no default pinctrl state [ 12.641177] platform i2c-mux-reg.0: Driver i2c-mux-reg requests probe deferral [ 12.641187] platform i2c-mux-reg.0: Added to deferred list [ 12.641198] devices_kset: Moving i2c-mux-reg.1 to end of list [ 12.641219] platform i2c-mux-reg.1: Retrying from deferred list [ 12.641237] bus: 'platform': driver_probe_device: matched device i2c-mux-reg.1 with driver i2c-mux-reg [ 12.641247] bus: 'platform': really_probe: probing driver i2c-mux-reg with device i2c-mux-reg.1 [ 12.641331] i2c-mux-reg i2c-mux-reg.1: no default pinctrl state [ 12.641465] platform i2c-mux-reg.1: Driver i2c-mux-reg requests probe deferral [ 12.641469] platform i2c-mux-reg.1: Added to deferred list [ 12.646427] device: 'i2c-1': device_add [ 12.646647] bus: 'i2c': add device i2c-1 [ 12.647104] device: 'i2c-1': device_add [ 12.669231] devices_kset: Moving i2c-mux-reg.0 to end of list [ 12.669240] platform i2c-mux-reg.0: Retrying from deferred list [ 12.669258] bus: 'platform': driver_probe_device: matched device i2c-mux-reg.0 with driver i2c-mux-reg [ 12.669263] bus: 'platform': really_probe: probing driver i2c-mux-reg with device i2c-mux-reg.0 [ 12.669343] i2c-mux-reg i2c-mux-reg.0: no default pinctrl state [ 12.669585] device: 'i2c-2': device_add [ 12.669795] bus: 'i2c': add device i2c-2 [ 12.670201] device: 'i2c-2': device_add [ 12.671427] i2c i2c-1: Added multiplexed i2c bus 2 [ 12.671514] device: 'i2c-3': device_add [ 12.671724] bus: 'i2c': add device i2c-3 [ 12.672136] device: 'i2c-3': device_add [ 12.673378] i2c i2c-1: Added multiplexed i2c bus 3 [ 12.673472] device: 'i2c-4': device_add [ 12.673676] bus: 'i2c': add device i2c-4 [ 12.674060] device: 'i2c-4': device_add [ 12.675861] i2c i2c-1: Added multiplexed i2c bus 4 [ 12.675941] device: 'i2c-5': device_add [ 12.676150] bus: 'i2c': add device i2c-5 [ 12.676550] device: 'i2c-5': device_add [ 12.678103] i2c i2c-1: Added multiplexed i2c bus 5 [ 12.678193] device: 'i2c-6': device_add [ 12.678395] bus: 'i2c': add device i2c-6 [ 12.678774] device: 'i2c-6': device_add [ 12.679969] i2c i2c-1: Added multiplexed i2c bus 6 [ 12.680065] device: 'i2c-7': device_add [ 12.680275] bus: 'i2c': add device i2c-7 [ 12.680913] device: 'i2c-7': device_add [ 12.682506] i2c i2c-1: Added multiplexed i2c bus 7 [ 12.682600] device: 'i2c-8': device_add [ 12.682808] bus: 'i2c': add device i2c-8 [ 12.683189] device: 'i2c-8': device_add [ 12.683907] device: 'i2c-1': device_unregister [ 12.683945] device: 'i2c-1': device_unregister [ 12.684387] device: 'i2c-1': device_create_release [ 12.684536] bus: 'i2c': remove device i2c-1 [ 12.686019] i2c i2c-8: Failed to create compatibility class link [ 12.686086] ------------[ cut here ]------------ [ 12.686087] can't create symlink to mux device [ 12.686224] Workqueue: events deferred_probe_work_func [ 12.686135] WARNING: CPU: 7 PID: 436 at drivers/i2c/i2c-mux.c:416 i2c_mux_add_adapter+0x729/0x7d0 [i2c_mux] [ 12.686232] RIP: 0010:i2c_mux_add_adapter+0x729/0x7d0 [i2c_mux] [ 0x190/0x190 [i2c_mux] [ 12.686300] ? i2c_mux_alloc+0xac/0x110 [i2c_mux] [ 12.686306] ? i2c_mux_reg_set+0x200/0x200 [i2c_mux_reg] [ 12.686313] i2c_mux_reg_probe+0x22c/0x731 [i2c_mux_reg] [ 12.686322] ? i2c_mux_reg_deselect+0x60/0x60 [i2c_mux_reg] [ 12.686346] platform_drv_probe+0xa8/0x110 [ 12.686351] really_probe+0x185/0x720 [ 12.686358] driver_probe_device+0xdf/0x1f0 ... [ 12.686522] i2c i2c-1: Added multiplexed i2c bus 8 [ 12.686621] device: 'i2c-9': device_add [ 12.686626] kobject_add_internal failed for i2c-9 (error: -2 parent: i2c-1) [ 12.694729] i2c-core: adapter 'i2c-1-mux (chan_id 8)': can't register device (-2) [ 12.705726] i2c i2c-1: failed to add mux-adapter 8 as bus 9 (error=-2) [ 12.714494] device: 'i2c-8': device_unregister [ 12.714537] device: 'i2c-8': device_unregister Fixes: 6613d18e9038 ("platform/x86: mlx-platform: Move module from arch/x86") Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>