summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2019-07-28tcp: be more careful in tcp_fragment()Eric Dumazet2-2/+16
[ Upstream commit b617158dc096709d8600c53b6052144d12b89fab ] Some applications set tiny SO_SNDBUF values and expect TCP to just work. Recent patches to address CVE-2019-11478 broke them in case of losses, since retransmits might be prevented. We should allow these flows to make progress. This patch allows the first and last skb in retransmit queue to be split even if memory limits are hit. It also adds the some room due to the fact that tcp_sendmsg() and tcp_sendpage() might overshoot sk_wmem_queued by about one full TSO skb (64KB size). Note this allowance was already present in stable backports for kernels < 4.15 Note for < 4.15 backports : tcp_rtx_queue_tail() will probably look like : static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk) { struct sk_buff *skb = tcp_send_head(sk); return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk); } Fixes: f070ef2ac667 ("tcp: tcp_fragment() should apply sane memory limits") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andrew Prout <aprout@ll.mit.edu> Tested-by: Andrew Prout <aprout@ll.mit.edu> Tested-by: Jonathan Lemon <jonathan.lemon@gmail.com> Tested-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Christoph Paasch <cpaasch@apple.com> Cc: Jonathan Looney <jtl@netflix.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-28sky2: Disable MSI on ASUS P6TTakashi Iwai1-0/+7
[ Upstream commit a261e3797506bd561700be643fe1a85bf81e9661 ] The onboard sky2 NIC on ASUS P6T WS PRO doesn't work after PM resume due to the infamous IRQ problem. Disabling MSI works around it, so let's add it to the blacklist. Unfortunately the BIOS on the machine doesn't fill the standard DMI_SYS_* entry, so we pick up DMI_BOARD_* entries instead. BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1142496 Reported-and-tested-by: Marcus Seyfarth <m.seyfarth@gmail.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-28sctp: not bind the socket in sctp_connectXin Long1-21/+3
[ Upstream commit 9b6c08878e23adb7cc84bdca94d8a944b03f099e ] Now when sctp_connect() is called with a wrong sa_family, it binds to a port but doesn't set bp->port, then sctp_get_af_specific will return NULL and sctp_connect() returns -EINVAL. Then if sctp_bind() is called to bind to another port, the last port it has bound will leak due to bp->port is NULL by then. sctp_connect() doesn't need to bind ports, as later __sctp_connect will do it if bp->port is NULL. So remove it from sctp_connect(). While at it, remove the unnecessary sockaddr.sa_family len check as it's already done in sctp_inet_connect. Fixes: 644fbdeacf1d ("sctp: fix the issue that flags are ignored when using kernel_connect") Reported-by: syzbot+079bf326b38072f849d9@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-28sctp: fix error handling on stream scheduler initializationMarcelo Ricardo Leitner1-1/+8
[ Upstream commit 4d1415811e492d9a8238f8a92dd0d51612c788e9 ] It allocates the extended area for outbound streams only on sendmsg calls, if they are not yet allocated. When using the priority stream scheduler, this initialization may imply into a subsequent allocation, which may fail. In this case, it was aborting the stream scheduler initialization but leaving the ->ext pointer (allocated) in there, thus in a partially initialized state. On a subsequent call to sendmsg, it would notice the ->ext pointer in there, and trip on uninitialized stuff when trying to schedule the data chunk. The fix is undo the ->ext initialization if the stream scheduler initialization fails and avoid the partially initialized state. Although syzkaller bisected this to commit 4ff40b86262b ("sctp: set chunk transport correctly when it's a new asoc"), this bug was actually introduced on the commit I marked below. Reported-by: syzbot+c1a380d42b190ad1e559@syzkaller.appspotmail.com Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations") Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-28rxrpc: Fix send on a connected, but unbound socketDavid Howells1-2/+2
[ Upstream commit e835ada07091f40dcfb1bc735082bd0a7c005e59 ] If sendmsg() or sendmmsg() is called on a connected socket that hasn't had bind() called on it, then an oops will occur when the kernel tries to connect the call because no local endpoint has been allocated. Fix this by implicitly binding the socket if it is in the RXRPC_CLIENT_UNBOUND state, just like it does for the RXRPC_UNBOUND state. Further, the state should be transitioned to RXRPC_CLIENT_BOUND after this to prevent further attempts to bind it. This can be tested with: #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/socket.h> #include <arpa/inet.h> #include <linux/rxrpc.h> static const unsigned char inet6_addr[16] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, -1, 0xac, 0x14, 0x14, 0xaa }; int main(void) { struct sockaddr_rxrpc srx; struct cmsghdr *cm; struct msghdr msg; unsigned char control[16]; int fd; memset(&srx, 0, sizeof(srx)); srx.srx_family = 0x21; srx.srx_service = 0; srx.transport_type = AF_INET; srx.transport_len = 0x1c; srx.transport.sin6.sin6_family = AF_INET6; srx.transport.sin6.sin6_port = htons(0x4e22); srx.transport.sin6.sin6_flowinfo = htons(0x4e22); srx.transport.sin6.sin6_scope_id = htons(0xaa3b); memcpy(&srx.transport.sin6.sin6_addr, inet6_addr, 16); cm = (struct cmsghdr *)control; cm->cmsg_len = CMSG_LEN(sizeof(unsigned long)); cm->cmsg_level = SOL_RXRPC; cm->cmsg_type = RXRPC_USER_CALL_ID; *(unsigned long *)CMSG_DATA(cm) = 0; msg.msg_name = NULL; msg.msg_namelen = 0; msg.msg_iov = NULL; msg.msg_iovlen = 0; msg.msg_control = control; msg.msg_controllen = cm->cmsg_len; msg.msg_flags = 0; fd = socket(AF_RXRPC, SOCK_DGRAM, AF_INET); connect(fd, (struct sockaddr *)&srx, sizeof(srx)); sendmsg(fd, &msg, 0); return 0; } Leading to the following oops: BUG: kernel NULL pointer dereference, address: 0000000000000018 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page ... RIP: 0010:rxrpc_connect_call+0x42/0xa01 ... Call Trace: ? mark_held_locks+0x47/0x59 ? __local_bh_enable_ip+0xb6/0xba rxrpc_new_client_call+0x3b1/0x762 ? rxrpc_do_sendmsg+0x3c0/0x92e rxrpc_do_sendmsg+0x3c0/0x92e rxrpc_sendmsg+0x16b/0x1b5 sock_sendmsg+0x2d/0x39 ___sys_sendmsg+0x1a4/0x22a ? release_sock+0x19/0x9e ? reacquire_held_locks+0x136/0x160 ? release_sock+0x19/0x9e ? find_held_lock+0x2b/0x6e ? __lock_acquire+0x268/0xf73 ? rxrpc_connect+0xdd/0xe4 ? __local_bh_enable_ip+0xb6/0xba __sys_sendmsg+0x5e/0x94 do_syscall_64+0x7d/0x1bf entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: 2341e0775747 ("rxrpc: Simplify connect() implementation and simplify sendmsg() op") Reported-by: syzbot+7966f2a0b2c7da8939b4@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-28r8169: fix issue with confused RX unit after PHY power-down on RTL8411bHeiner Kallweit1-0/+137
[ Upstream commit fe4e8db0392a6c2e795eb89ef5fcd86522e66248 ] On RTL8411b the RX unit gets confused if the PHY is powered-down. This was reported in [0] and confirmed by Realtek. Realtek provided a sequence to fix the RX unit after PHY wakeup. The issue itself seems to have been there longer, the Fixes tag refers to where the fix applies properly. [0] https://bugzilla.redhat.com/show_bug.cgi?id=1692075 Fixes: a99790bf5c7f ("r8169: Reinstate ASPM Support") Tested-by: Ionut Radu <ionut.radu@gmail.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-28nfc: fix potential illegal memory accessYang Wei1-1/+1
[ Upstream commit dd006fc434e107ef90f7de0db9907cbc1c521645 ] The frags_q is not properly initialized, it may result in illegal memory access when conn_info is NULL. The "goto free_exit" should be replaced by "goto exit". Signed-off-by: Yang Wei <albin_yang@163.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-28net/tls: make sure offload also gets the keys wipedJakub Kicinski3-3/+4
[ Upstream commit acd3e96d53a24d219f720ed4012b62723ae05da1 ] Commit 86029d10af18 ("tls: zero the crypto information from tls_context before freeing") added memzero_explicit() calls to clear the key material before freeing struct tls_context, but it missed tls_device.c has its own way of freeing this structure. Replace the missing free. Fixes: 86029d10af18 ("tls: zero the crypto information from tls_context before freeing") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-28net: stmmac: Re-work the queue selection for TSO packetsJose Abreu1-10/+19
[ Upstream commit 4993e5b37e8bcb55ac90f76eb6d2432647273747 ] Ben Hutchings says: "This is the wrong place to change the queue mapping. stmmac_xmit() is called with a specific TX queue locked, and accessing a different TX queue results in a data race for all of that queue's state. I think this commit should be reverted upstream and in all stable branches. Instead, the driver should implement the ndo_select_queue operation and override the queue mapping there." Fixes: c5acdbee22a1 ("net: stmmac: Send TSO packets always from Queue 0") Suggested-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Jose Abreu <joabreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-28net_sched: unset TCQ_F_CAN_BYPASS when adding filtersCong Wang3-4/+1
[ Upstream commit 3f05e6886a595c9a29a309c52f45326be917823c ] For qdisc's that support TC filters and set TCQ_F_CAN_BYPASS, notably fq_codel, it makes no sense to let packets bypass the TC filters we setup in any scenario, otherwise our packets steering policy could not be enforced. This can be reproduced easily with the following script: ip li add dev dummy0 type dummy ifconfig dummy0 up tc qd add dev dummy0 root fq_codel tc filter add dev dummy0 parent 8001: protocol arp basic action mirred egress redirect dev lo tc filter add dev dummy0 parent 8001: protocol ip basic action mirred egress redirect dev lo ping -I dummy0 192.168.112.1 Without this patch, packets are sent directly to dummy0 without hitting any of the filters. With this patch, packets are redirected to loopback as expected. This fix is not perfect, it only unsets the flag but does not set it back because we have to save the information somewhere in the qdisc if we really want that. Note, both fq_codel and sfq clear this flag in their ->bind_tcf() but this is clearly not sufficient when we don't use any class ID. Fixes: 23624935e0c4 ("net_sched: TCQ_F_CAN_BYPASS generalization") Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-28net: phy: sfp: hwmon: Fix scaling of RX powerAndrew Lunn1-1/+1
[ Upstream commit 0cea0e1148fe134a4a3aaf0b1496f09241fb943a ] The RX power read from the SFP uses units of 0.1uW. This must be scaled to units of uW for HWMON. This requires a divide by 10, not the current 100. With this change in place, sensors(1) and ethtool -m agree: sff2-isa-0000 Adapter: ISA adapter in0: +3.23 V temp1: +33.1 C power1: 270.00 uW power2: 200.00 uW curr1: +0.01 A Laser output power : 0.2743 mW / -5.62 dBm Receiver signal average optical power : 0.2014 mW / -6.96 dBm Reported-by: chris.healy@zii.aero Signed-off-by: Andrew Lunn <andrew@lunn.ch> Fixes: 1323061a018a ("net: phy: sfp: Add HWMON support for module sensors") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-28net: openvswitch: fix csum updates for MPLS actionsJohn Hurley1-4/+2
[ Upstream commit 0e3183cd2a64843a95b62f8bd4a83605a4cf0615 ] Skbs may have their checksum value populated by HW. If this is a checksum calculated over the entire packet then the CHECKSUM_COMPLETE field is marked. Changes to the data pointer on the skb throughout the network stack still try to maintain this complete csum value if it is required through functions such as skb_postpush_rcsum. The MPLS actions in Open vSwitch modify a CHECKSUM_COMPLETE value when changes are made to packet data without a push or a pull. This occurs when the ethertype of the MAC header is changed or when MPLS lse fields are modified. The modification is carried out using the csum_partial function to get the csum of a buffer and add it into the larger checksum. The buffer is an inversion of the data to be removed followed by the new data. Because the csum is calculated over 16 bits and these values align with 16 bits, the effect is the removal of the old value from the CHECKSUM_COMPLETE and addition of the new value. However, the csum fed into the function and the outcome of the calculation are also inverted. This would only make sense if it was the new value rather than the old that was inverted in the input buffer. Fix the issue by removing the bit inverts in the csum_partial calculation. The bug was verified and the fix tested by comparing the folded value of the updated CHECKSUM_COMPLETE value with the folded value of a full software checksum calculation (reset skb->csum to 0 and run skb_checksum_complete(skb)). Prior to the fix the outcomes differed but after they produce the same result. Fixes: 25cd9ba0abc0 ("openvswitch: Add basic MPLS support to kernel") Fixes: bc7cc5999fd3 ("openvswitch: update checksum in {push,pop}_mpls") Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-28net: neigh: fix multiple neigh timer schedulingLorenzo Bianconi1-0/+2
[ Upstream commit 071c37983d99da07797294ea78e9da1a6e287144 ] Neigh timer can be scheduled multiple times from userspace adding multiple neigh entries and forcing the neigh timer scheduling passing NTF_USE in the netlink requests. This will result in a refcount leak and in the following dump stack: [ 32.465295] NEIGH: BUG, double timer add, state is 8 [ 32.465308] CPU: 0 PID: 416 Comm: double_timer_ad Not tainted 5.2.0+ #65 [ 32.465311] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-2.fc30 04/01/2014 [ 32.465313] Call Trace: [ 32.465318] dump_stack+0x7c/0xc0 [ 32.465323] __neigh_event_send+0x20c/0x880 [ 32.465326] ? ___neigh_create+0x846/0xfb0 [ 32.465329] ? neigh_lookup+0x2a9/0x410 [ 32.465332] ? neightbl_fill_info.constprop.0+0x800/0x800 [ 32.465334] neigh_add+0x4f8/0x5e0 [ 32.465337] ? neigh_xmit+0x620/0x620 [ 32.465341] ? find_held_lock+0x85/0xa0 [ 32.465345] rtnetlink_rcv_msg+0x204/0x570 [ 32.465348] ? rtnl_dellink+0x450/0x450 [ 32.465351] ? mark_held_locks+0x90/0x90 [ 32.465354] ? match_held_lock+0x1b/0x230 [ 32.465357] netlink_rcv_skb+0xc4/0x1d0 [ 32.465360] ? rtnl_dellink+0x450/0x450 [ 32.465363] ? netlink_ack+0x420/0x420 [ 32.465366] ? netlink_deliver_tap+0x115/0x560 [ 32.465369] ? __alloc_skb+0xc9/0x2f0 [ 32.465372] netlink_unicast+0x270/0x330 [ 32.465375] ? netlink_attachskb+0x2f0/0x2f0 [ 32.465378] netlink_sendmsg+0x34f/0x5a0 [ 32.465381] ? netlink_unicast+0x330/0x330 [ 32.465385] ? move_addr_to_kernel.part.0+0x20/0x20 [ 32.465388] ? netlink_unicast+0x330/0x330 [ 32.465391] sock_sendmsg+0x91/0xa0 [ 32.465394] ___sys_sendmsg+0x407/0x480 [ 32.465397] ? copy_msghdr_from_user+0x200/0x200 [ 32.465401] ? _raw_spin_unlock_irqrestore+0x37/0x40 [ 32.465404] ? lockdep_hardirqs_on+0x17d/0x250 [ 32.465407] ? __wake_up_common_lock+0xcb/0x110 [ 32.465410] ? __wake_up_common+0x230/0x230 [ 32.465413] ? netlink_bind+0x3e1/0x490 [ 32.465416] ? netlink_setsockopt+0x540/0x540 [ 32.465420] ? __fget_light+0x9c/0xf0 [ 32.465423] ? sockfd_lookup_light+0x8c/0xb0 [ 32.465426] __sys_sendmsg+0xa5/0x110 [ 32.465429] ? __ia32_sys_shutdown+0x30/0x30 [ 32.465432] ? __fd_install+0xe1/0x2c0 [ 32.465435] ? lockdep_hardirqs_off+0xb5/0x100 [ 32.465438] ? mark_held_locks+0x24/0x90 [ 32.465441] ? do_syscall_64+0xf/0x270 [ 32.465444] do_syscall_64+0x63/0x270 [ 32.465448] entry_SYSCALL_64_after_hwframe+0x49/0xbe Fix the issue unscheduling neigh_timer if selected entry is in 'IN_TIMER' receiving a netlink request with NTF_USE flag set Reported-by: Marek Majkowski <marek@cloudflare.com> Fixes: 0c5c2d308906 ("neigh: Allow for user space users of the neighbour table") Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-28net: make skb_dst_force return true when dst is refcountedFlorian Westphal2-2/+9
[ Upstream commit b60a77386b1d4868f72f6353d35dabe5fbe981f2 ] netfilter did not expect that skb_dst_force() can cause skb to lose its dst entry. I got a bug report with a skb->dst NULL dereference in netfilter output path. The backtrace contains nf_reinject(), so the dst might have been cleared when skb got queued to userspace. Other users were fixed via if (skb_dst(skb)) { skb_dst_force(skb); if (!skb_dst(skb)) goto handle_err; } But I think its preferable to make the 'dst might be cleared' part of the function explicit. In netfilter case, skb with a null dst is expected when queueing in prerouting hook, so drop skb for the other hooks. v2: v1 of this patch returned true in case skb had no dst entry. Eric said: Say if we have two skb_dst_force() calls for some reason on the same skb, only the first one will return false. This now returns false even when skb had no dst, as per Erics suggestion, so callers might need to check skb_dst() first before skb_dst_force(). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-28net: dsa: mv88e6xxx: wait after reset deactivationBaruch Siach1-0/+2
[ Upstream commit 7b75e49de424ceb53d13e60f35d0a73765626fda ] Add a 1ms delay after reset deactivation. Otherwise the chip returns bogus ID value. This is observed with 88E6390 (Peridot) chip. Signed-off-by: Baruch Siach <baruch@tkos.co.il> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-28net: bcmgenet: use promisc for unsupported filtersJustin Chen1-31/+26
[ Upstream commit 35cbef9863640f06107144687bd13151bc2e8ce3 ] Currently we silently ignore filters if we cannot meet the filter requirements. This will lead to the MAC dropping packets that are expected to pass. A better solution would be to set the NIC to promisc mode when the required filters cannot be met. Also correct the number of MDF filters supported. It should be 17, not 16. Signed-off-by: Justin Chen <justinpopo6@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-28ipv6: Unlink sibling route in case of failureIdo Schimmel1-1/+17
[ Upstream commit 54851aa90cf27041d64b12f65ac72e9f97bd90fd ] When a route needs to be appended to an existing multipath route, fib6_add_rt2node() first appends it to the siblings list and increments the number of sibling routes on each sibling. Later, the function notifies the route via call_fib6_entry_notifiers(). In case the notification is vetoed, the route is not unlinked from the siblings list, which can result in a use-after-free. Fix this by unlinking the route from the siblings list before returning an error. Audited the rest of the call sites from which the FIB notification chain is called and could not find more problems. Fixes: 2233000cba40 ("net/ipv6: Move call_fib6_entry_notifiers up for route adds") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Alexander Petrovskiy <alexpe@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-28ipv6: rt6_check should return NULL if 'from' is NULLDavid Ahern1-1/+1
[ Upstream commit 49d05fe2c9d1b4a27761c9807fec39b8155bef9e ] Paul reported that l2tp sessions were broken after the commit referenced in the Fixes tag. Prior to this commit rt6_check returned NULL if the rt6_info 'from' was NULL - ie., the dst_entry was disconnected from a FIB entry. Restore that behavior. Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes") Reported-by: Paul Donohue <linux-kernel@PaulSD.com> Tested-by: Paul Donohue <linux-kernel@PaulSD.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-28ipv4: don't set IPv6 only flags to IPv4 addressesMatteo Croce1-0/+8
[ Upstream commit 2e60546368165c2449564d71f6005dda9205b5fb ] Avoid the situation where an IPV6 only flag is applied to an IPv4 address: # ip addr add 192.0.2.1/24 dev dummy0 nodad home mngtmpaddr noprefixroute # ip -4 addr show dev dummy0 2: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 inet 192.0.2.1/24 scope global noprefixroute dummy0 valid_lft forever preferred_lft forever Or worse, by sending a malicious netlink command: # ip -4 addr show dev dummy0 2: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 inet 192.0.2.1/24 scope global nodad optimistic dadfailed home tentative mngtmpaddr noprefixroute stable-privacy dummy0 valid_lft forever preferred_lft forever Signed-off-by: Matteo Croce <mcroce@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-28igmp: fix memory leak in igmpv3_del_delrec()Eric Dumazet1-6/+2
[ Upstream commit e5b1c6c6277d5a283290a8c033c72544746f9b5b ] im->tomb and/or im->sources might not be NULL, but we currently overwrite their values blindly. Using swap() will make sure the following call to kfree_pmc(pmc) will properly free the psf structures. Tested with the C repro provided by syzbot, which basically does : socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3 setsockopt(3, SOL_IP, IP_ADD_MEMBERSHIP, "\340\0\0\2\177\0\0\1\0\0\0\0", 12) = 0 ioctl(3, SIOCSIFFLAGS, {ifr_name="lo", ifr_flags=0}) = 0 setsockopt(3, SOL_IP, IP_MSFILTER, "\340\0\0\2\177\0\0\1\1\0\0\0\1\0\0\0\377\377\377\377", 20) = 0 ioctl(3, SIOCSIFFLAGS, {ifr_name="lo", ifr_flags=IFF_UP}) = 0 exit_group(0) = ? BUG: memory leak unreferenced object 0xffff88811450f140 (size 64): comm "softirq", pid 0, jiffies 4294942448 (age 32.070s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 ff ff ff ff 00 00 00 00 ................ 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................ backtrace: [<00000000c7bad083>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline] [<00000000c7bad083>] slab_post_alloc_hook mm/slab.h:439 [inline] [<00000000c7bad083>] slab_alloc mm/slab.c:3326 [inline] [<00000000c7bad083>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553 [<000000009acc4151>] kmalloc include/linux/slab.h:547 [inline] [<000000009acc4151>] kzalloc include/linux/slab.h:742 [inline] [<000000009acc4151>] ip_mc_add1_src net/ipv4/igmp.c:1976 [inline] [<000000009acc4151>] ip_mc_add_src+0x36b/0x400 net/ipv4/igmp.c:2100 [<000000004ac14566>] ip_mc_msfilter+0x22d/0x310 net/ipv4/igmp.c:2484 [<0000000052d8f995>] do_ip_setsockopt.isra.0+0x1795/0x1930 net/ipv4/ip_sockglue.c:959 [<000000004ee1e21f>] ip_setsockopt+0x3b/0xb0 net/ipv4/ip_sockglue.c:1248 [<0000000066cdfe74>] udp_setsockopt+0x4e/0x90 net/ipv4/udp.c:2618 [<000000009383a786>] sock_common_setsockopt+0x38/0x50 net/core/sock.c:3126 [<00000000d8ac0c94>] __sys_setsockopt+0x98/0x120 net/socket.c:2072 [<000000001b1e9666>] __do_sys_setsockopt net/socket.c:2083 [inline] [<000000001b1e9666>] __se_sys_setsockopt net/socket.c:2080 [inline] [<000000001b1e9666>] __x64_sys_setsockopt+0x26/0x30 net/socket.c:2080 [<00000000420d395e>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:301 [<000000007fd83a4b>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 24803f38a5c0 ("igmp: do not remove igmp souce list info when set link down") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Hangbin Liu <liuhangbin@gmail.com> Reported-by: syzbot+6ca1abd0db68b5173a4f@syzkaller.appspotmail.com Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-28hv_netvsc: Fix extra rcu_read_unlock in netvsc_recv_callback()Haiyang Zhang1-1/+0
[ Upstream commit be4363bdf0ce9530f15aa0a03d1060304d116b15 ] There is an extra rcu_read_unlock left in netvsc_recv_callback(), after a previous patch that removes RCU from this function. This patch removes the extra RCU unlock. Fixes: 345ac08990b8 ("hv_netvsc: pass netvsc_device to receive callback") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-28caif-hsi: fix possible deadlock in cfhsi_exit_module()Taehee Yoo1-1/+1
[ Upstream commit fdd258d49e88a9e0b49ef04a506a796f1c768a8e ] cfhsi_exit_module() calls unregister_netdev() under rtnl_lock(). but unregister_netdev() internally calls rtnl_lock(). So deadlock would occur. Fixes: c41254006377 ("caif-hsi: Add rtnl support") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-28bnx2x: Prevent load reordering in tx completion processingBrian King1-0/+3
[ Upstream commit ea811b795df24644a8eb760b493c43fba4450677 ] This patch fixes an issue seen on Power systems with bnx2x which results in the skb is NULL WARN_ON in bnx2x_free_tx_pkt firing due to the skb pointer getting loaded in bnx2x_free_tx_pkt prior to the hw_cons load in bnx2x_tx_int. Adding a read memory barrier resolves the issue. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26Linux 5.1.20v5.1.20Greg Kroah-Hartman1-1/+1
2019-07-26dm bufio: fix deadlock with loop deviceJunxiao Bi1-3/+1
commit bd293d071ffe65e645b4d8104f9d8fe15ea13862 upstream. When thin-volume is built on loop device, if available memory is low, the following deadlock can be triggered: One process P1 allocates memory with GFP_FS flag, direct alloc fails, memory reclaim invokes memory shrinker in dm_bufio, dm_bufio_shrink_scan() runs, mutex dm_bufio_client->lock is acquired, then P1 waits for dm_buffer IO to complete in __try_evict_buffer(). But this IO may never complete if issued to an underlying loop device that forwards it using direct-IO, which allocates memory using GFP_KERNEL (see: do_blockdev_direct_IO()). If allocation fails, memory reclaim will invoke memory shrinker in dm_bufio, dm_bufio_shrink_scan() will be invoked, and since the mutex is already held by P1 the loop thread will hang, and IO will never complete. Resulting in ABBA deadlock. Cc: stable@vger.kernel.org Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26pstore: Fix double-free in pstore_mkfile() failure pathNorbert Manthey1-7/+6
commit 4c6d80e1144bdf48cae6b602ae30d41f3e5c76a9 upstream. The pstore_mkfile() function is passed a pointer to a struct pstore_record. On success it consumes this 'record' pointer and references it from the created inode. On failure, however, it may or may not free the record. There are even two different code paths which return -ENOMEM -- one of which does and the other doesn't free the record. Make the behaviour deterministic by never consuming and freeing the record when returning failure, allowing the caller to do the cleanup consistently. Signed-off-by: Norbert Manthey <nmanthey@amazon.de> Link: https://lore.kernel.org/r/1562331960-26198-1-git-send-email-nmanthey@amazon.de Fixes: 83f70f0769ddd ("pstore: Do not duplicate record metadata") Fixes: 1dfff7dd67d1a ("pstore: Pass record contents instead of copying") Cc: stable@vger.kernel.org [kees: also move "private" allocation location, rename inode cleanup label] Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26dt-bindings: allow up to four clocks for orion-mdioJosua Mayer1-1/+1
commit 80785f5a22e9073e2ded5958feb7f220e066d17b upstream. Armada 8040 needs four clocks to be enabled for MDIO accesses to work. Update the binding to allow the extra clock to be specified. Cc: stable@vger.kernel.org Fixes: 6d6a331f44a1 ("dt-bindings: allow up to three clocks for orion-mdio") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Josua Mayer <josua@solid-run.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26net: mvmdio: allow up to four clocks to be specified for orion-mdioJosua Mayer1-1/+1
commit 4aabed699c400810981d3dda170f05fa4d782905 upstream. Allow up to four clocks to be specified and enabled for the orion-mdio interface, which are required by the Armada 8k and defined in armada-cp110.dtsi. Fixes a hang in probing the mvmdio driver that was encountered on the Clearfog GT 8K with all drivers built as modules, but also affects other boards such as the MacchiatoBIN. Cc: stable@vger.kernel.org Fixes: 96cb43423822 ("net: mvmdio: allow up to three clocks to be specified for orion-mdio") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Josua Mayer <josua@solid-run.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26blkcg: update blkcg_print_stat() to handle larger outputsTejun Heo1-2/+6
commit f539da82f2158916e154d206054e0efd5df7ab61 upstream. Depending on the number of devices, blkcg stats can go over the default seqfile buf size. seqfile normally retries with a larger buffer but since the ->pd_stat() addition, blkcg_print_stat() doesn't tell seqfile that overflow has happened and the output gets printed truncated. Fix it by calling seq_commit() w/ -1 on possible overflows. Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: 903d23f0a354 ("blk-cgroup: allow controllers to output their own stats") Cc: stable@vger.kernel.org # v4.19+ Cc: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26blk-iolatency: clear use_delay when io.latency is set to zeroTejun Heo1-1/+3
commit 5de0073fcd50cc1f150895a7bb04d3cf8067b1d7 upstream. If use_delay was non-zero when the latency target of a cgroup was set to zero, it will stay stuck until io.latency is enabled on the cgroup again. This keeps readahead disabled for the cgroup impacting performance negatively. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Josef Bacik <jbacik@fb.com> Fixes: d70675121546 ("block: introduce blk-iolatency io controller") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26clk: imx: imx8mm: correct audio_pll2_clk to audio_pll2_outPeng Fan1-3/+3
commit 5b933e28d8b1fbdc7fbac4bfc569f3b152c3dd59 upstream. There is no audio_pll2_clk registered, it should be audio_pll2_out. Cc: <stable@vger.kernel.org> Fixes: ba5625c3e272 ("clk: imx: Add clock driver support for imx8mm") Signed-off-by: Peng Fan <peng.fan@nxp.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26blk-throttle: fix zero wait time for iops throttled groupKonstantin Khlebnikov1-6/+3
commit 3a10f999ffd464d01c5a05592a15470a3c4bbc36 upstream. After commit 991f61fe7e1d ("Blk-throttle: reduce tail io latency when iops limit is enforced") wait time could be zero even if group is throttled and cannot issue requests right now. As a result throtl_select_dispatch() turns into busy-loop under irq-safe queue spinlock. Fix is simple: always round up target time to the next throttle slice. Fixes: 991f61fe7e1d ("Blk-throttle: reduce tail io latency when iops limit is enforced") Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26usb: Handle USB3 remote wakeup for LPM enabled devices correctlyLee, Chiasheng1-2/+5
commit e244c4699f859cf7149b0781b1894c7996a8a1df upstream. With Link Power Management (LPM) enabled USB3 links transition to low power U1/U2 link states from U0 state automatically. Current hub code detects USB3 remote wakeups by checking if the software state still shows suspended, but the link has transitioned from suspended U3 to enabled U0 state. As it takes some time before the hub thread reads the port link state after a USB3 wake notification, the link may have transitioned from U0 to U1/U2, and wake is not detected by hub code. Fix this by handling U1/U2 states in the same way as U0 in USB3 wakeup handling This patch should be added to stable kernels since 4.13 where LPM was kept enabled during suspend/resume Cc: <stable@vger.kernel.org> # v4.13+ Signed-off-by: Lee, Chiasheng <chiasheng.lee@intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26dax: Fix missed wakeup with PMD faultsMatthew Wilcox (Oracle)1-20/+33
commit 23c84eb7837514e16d79ed6d849b13745e0ce688 upstream. RocksDB can hang indefinitely when using a DAX file. This is due to a bug in the XArray conversion when handling a PMD fault and finding a PTE entry. We use the wrong index in the hash and end up waiting on the wrong waitqueue. There's actually no need to wait; if we find a PTE entry while looking for a PMD entry, we can return immediately as we know we should fall back to a PTE fault (which may not conflict with the lock held). We reuse the XA_RETRY_ENTRY to signal a conflicting entry was found. This value can never be found in an XArray while holding its lock, so it does not create an ambiguity. Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/CAPcyv4hwHpX-MkUEqxwdTj7wCCZCN4RV-L4jsnuwLGyL_UEG4A@mail.gmail.com Fixes: b15cd800682f ("dax: Convert page fault handlers to XArray") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Robert Barror <robert.barror@intel.com> Reported-by: Seema Pandit <seema.pandit@intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26Bluetooth: Add SMP workaround Microsoft Surface Precision Mouse bugSzymon Janc1-0/+13
commit 1d87b88ba26eabd4745e158ecfd87c93a9b51dc2 upstream. Microsoft Surface Precision Mouse provides bogus identity address when pairing. It connects with Static Random address but provides Public Address in SMP Identity Address Information PDU. Address has same value but type is different. Workaround this by dropping IRK if ID address discrepancy is detected. > HCI Event: LE Meta Event (0x3e) plen 19 LE Connection Complete (0x01) Status: Success (0x00) Handle: 75 Role: Master (0x00) Peer address type: Random (0x01) Peer address: E0:52:33:93:3B:21 (Static) Connection interval: 50.00 msec (0x0028) Connection latency: 0 (0x0000) Supervision timeout: 420 msec (0x002a) Master clock accuracy: 0x00 .... > ACL Data RX: Handle 75 flags 0x02 dlen 12 SMP: Identity Address Information (0x09) len 7 Address type: Public (0x00) Address: E0:52:33:93:3B:21 Signed-off-by: Szymon Janc <szymon.janc@codecoup.pl> Tested-by: Maarten Fonville <maarten.fonville@gmail.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199461 Cc: stable@vger.kernel.org Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26intel_th: msu: Fix single mode with disabled IOMMUAlexander Shishkin1-1/+1
commit 918b8646497b5dba6ae82d4a7325f01b258972b9 upstream. Commit 4e0eaf239fb3 ("intel_th: msu: Fix single mode with IOMMU") switched the single mode code to use dma mapping pages obtained from the page allocator, but with IOMMU disabled, that may lead to using SWIOTLB bounce buffers and without additional sync'ing, produces empty trace buffers. Fix this by using a DMA32 GFP flag to the page allocation in single mode, as the device supports full 32-bit DMA addressing. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Fixes: 4e0eaf239fb3 ("intel_th: msu: Fix single mode with IOMMU") Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reported-by: Ammy Yi <ammy.yi@intel.com> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20190621161930.60785-4-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26mtd: spinand: read returns badly if the last page has bitflipsliaoweixiong1-1/+1
commit b83408b580eccf8d2797cd6cb9ae42c2a28656a7 upstream. In case of the last page containing bitflips (ret > 0), spinand_mtd_read() will return that number of bitflips for the last page while it should instead return max_bitflips like it does when the last page read returns with 0. Signed-off-by: Weixiong Liao <liaoweixiong@allwinnertech.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Frieder Schrempf <frieder.schrempf@kontron.de> Cc: stable@vger.kernel.org Fixes: 7529df465248 ("mtd: nand: Add core infrastructure to support SPI NANDs") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26mtd: rawnand: mtk: Correct low level time calculation of r/w cycleXiaolei Li1-3/+21
commit e1884ffddacc0424d7e785e6f8087bd12f7196db upstream. At present, the flow of calculating AC timing of read/write cycle in SDR mode is that: At first, calculate high hold time which is valid for both read and write cycle using the max value between tREH_min and tWH_min. Secondly, calculate WE# pulse width using tWP_min. Thridly, calculate RE# pulse width using the bigger one between tREA_max and tRP_min. But NAND SPEC shows that Controller should also meet write/read cycle time. That is write cycle time should be more than tWC_min and read cycle should be more than tRC_min. Obviously, we do not achieve that now. This patch corrects the low level time calculation to meet minimum read/write cycle time required. After getting the high hold time, WE# low level time will be promised to meet tWP_min and tWC_min requirement, and RE# low level time will be promised to meet tREA_max, tRP_min and tRC_min requirement. Fixes: edfee3619c49 ("mtd: nand: mtk: add ->setup_data_interface() hook") Cc: stable@vger.kernel.org # v4.17+ Signed-off-by: Xiaolei Li <xiaolei.li@mediatek.com> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26eCryptfs: fix a couple type promotion bugsDan Carpenter1-4/+8
commit 0bdf8a8245fdea6f075a5fede833a5fcf1b3466c upstream. ECRYPTFS_SIZE_AND_MARKER_BYTES is type size_t, so if "rc" is negative that gets type promoted to a high positive value and treated as success. Fixes: 778aeb42a708 ("eCryptfs: Cleanup and optimize ecryptfs_lookup_interpose()") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> [tyhicks: Use "if/else if" rather than "if/if"] Cc: stable@vger.kernel.org Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26mmc: sdhci-msm: fix mutex while in spinlockJorge Ramirez-Ortiz1-3/+6
commit 5e6b6651d22de109ebf48ca00d0373bc2c0cc080 upstream. mutexes can sleep and therefore should not be taken while holding a spinlock. move clk_get_rate (can sleep) outside the spinlock protected region. Fixes: 83736352e0ca ("mmc: sdhci-msm: Update DLL reset sequence") Cc: stable@vger.kernel.org Signed-off-by: Jorge Ramirez-Ortiz <jorge.ramirez-ortiz@linaro.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Reviewed-by: Vinod Koul <vkoul@kernel.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26powerpc/pseries: Fix oops in hotplug memory notifierNathan Lynch1-0/+3
commit 0aa82c482ab2ece530a6f44897b63b274bb43c8e upstream. During post-migration device tree updates, we can oops in pseries_update_drconf_memory() if the source device tree has an ibm,dynamic-memory-v2 property and the destination has a ibm,dynamic_memory (v1) property. The notifier processes an "update" for the ibm,dynamic-memory property but it's really an add in this scenario. So make sure the old property object is there before dereferencing it. Fixes: 2b31e3aec1db ("powerpc/drmem: Add support for ibm, dynamic-memory-v2 property") Cc: stable@vger.kernel.org # v4.16+ Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26powerpc/powernv: Fix stale iommu table base after VFIOAlexey Kardashevskiy1-0/+10
commit 5636427d087a55842c1a199dfb839e6545d30e5d upstream. The powernv platform uses @dma_iommu_ops for non-bypass DMA. These ops need an iommu_table pointer which is stored in dev->archdata.iommu_table_base. It is initialized during pcibios_setup_device() which handles boot time devices. However when a device is taken from the system in order to pass it through, the default IOMMU table is destroyed but the pointer in a device is not updated; also when a device is returned back to the system, a new table pointer is not stored in dev->archdata.iommu_table_base either. So when a just returned device tries using IOMMU, it crashes on accessing stale iommu_table or its members. This calls set_iommu_table_base() when the default window is created. Note it used to be there before but was wrongly removed (see "fixes"). It did not appear before as these days most devices simply use bypass. This adds set_iommu_table_base(NULL) when a device is taken from the system to make it clear that IOMMU DMA cannot be used past that point. Fixes: c4e9d3c1e65a ("powerpc/powernv/pseries: Rework device adding to IOMMU groups") Cc: stable@vger.kernel.org # v5.0+ Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26powerpc/powernv/npu: Fix reference leakGreg Kurz1-1/+14
commit 02c5f5394918b9b47ff4357b1b18335768cd867d upstream. Since 902bdc57451c, get_pci_dev() calls pci_get_domain_bus_and_slot(). This has the effect of incrementing the reference count of the PCI device, as explained in drivers/pci/search.c: * Given a PCI domain, bus, and slot/function number, the desired PCI * device is located in the list of PCI devices. If the device is * found, its reference count is increased and this function returns a * pointer to its data structure. The caller must decrement the * reference count by calling pci_dev_put(). If no device is found, * %NULL is returned. Nothing was done to call pci_dev_put() and the reference count of GPU and NPU PCI devices rockets up. A natural way to fix this would be to teach the callers about the change, so that they call pci_dev_put() when done with the pointer. This turns out to be quite intrusive, as it affects many paths in npu-dma.c, pci-ioda.c and vfio_pci_nvlink2.c. Also, the issue appeared in 4.16 and some affected code got moved around since then: it would be problematic to backport the fix to stable releases. All that code never cared for reference counting anyway. Call pci_dev_put() from get_pci_dev() to revert to the previous behavior. Fixes: 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn") Cc: stable@vger.kernel.org # v4.16 Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26powerpc/watchpoint: Restore NV GPRs while returning from exceptionRavi Bangoria1-2/+7
commit f474c28fbcbe42faca4eb415172c07d76adcb819 upstream. powerpc hardware triggers watchpoint before executing the instruction. To make trigger-after-execute behavior, kernel emulates the instruction. If the instruction is 'load something into non-volatile register', exception handler should restore emulated register state while returning back, otherwise there will be register state corruption. eg, adding a watchpoint on a list can corrput the list: # cat /proc/kallsyms | grep kthread_create_list c00000000121c8b8 d kthread_create_list Add watchpoint on kthread_create_list->prev: # perf record -e mem:0xc00000000121c8c0 Run some workload such that new kthread gets invoked. eg, I just logged out from console: list_add corruption. next->prev should be prev (c000000001214e00), \ but was c00000000121c8b8. (next=c00000000121c8b8). WARNING: CPU: 59 PID: 309 at lib/list_debug.c:25 __list_add_valid+0xb4/0xc0 CPU: 59 PID: 309 Comm: kworker/59:0 Kdump: loaded Not tainted 5.1.0-rc7+ #69 ... NIP __list_add_valid+0xb4/0xc0 LR __list_add_valid+0xb0/0xc0 Call Trace: __list_add_valid+0xb0/0xc0 (unreliable) __kthread_create_on_node+0xe0/0x260 kthread_create_on_node+0x34/0x50 create_worker+0xe8/0x260 worker_thread+0x444/0x560 kthread+0x160/0x1a0 ret_from_kernel_thread+0x5c/0x70 List corruption happened because it uses 'load into non-volatile register' instruction: Snippet from __kthread_create_on_node: c000000000136be8: addis r29,r2,-19 c000000000136bec: ld r29,31424(r29) if (!__list_add_valid(new, prev, next)) c000000000136bf0: mr r3,r30 c000000000136bf4: mr r5,r28 c000000000136bf8: mr r4,r29 c000000000136bfc: bl c00000000059a2f8 <__list_add_valid+0x8> Register state from WARN_ON(): GPR00: c00000000059a3a0 c000007ff23afb50 c000000001344e00 0000000000000075 GPR04: 0000000000000000 0000000000000000 0000001852af8bc1 0000000000000000 GPR08: 0000000000000001 0000000000000007 0000000000000006 00000000000004aa GPR12: 0000000000000000 c000007ffffeb080 c000000000137038 c000005ff62aaa00 GPR16: 0000000000000000 0000000000000000 c000007fffbe7600 c000007fffbe7370 GPR20: c000007fffbe7320 c000007fffbe7300 c000000001373a00 0000000000000000 GPR24: fffffffffffffef7 c00000000012e320 c000007ff23afcb0 c000000000cb8628 GPR28: c00000000121c8b8 c000000001214e00 c000007fef5b17e8 c000007fef5b17c0 Watchpoint hit at 0xc000000000136bec. addis r29,r2,-19 => r29 = 0xc000000001344e00 + (-19 << 16) => r29 = 0xc000000001214e00 ld r29,31424(r29) => r29 = *(0xc000000001214e00 + 31424) => r29 = *(0xc00000000121c8c0) 0xc00000000121c8c0 is where we placed a watchpoint and thus this instruction was emulated by emulate_step. But because handle_dabr_fault did not restore emulated register state, r29 still contains stale value in above register state. Fixes: 5aae8a5370802 ("powerpc, hw_breakpoints: Implement hw_breakpoints for 64-bit server processors") Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: stable@vger.kernel.org # 2.6.36+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26powerpc/mm/32s: fix condition that is always trueAndreas Schwab1-1/+1
commit 46c2478af610efb3212b8b08f74389d69899ef70 upstream. Move a misplaced paren that makes the condition always true. Fixes: 63b2bc619565 ("powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX") Cc: stable@vger.kernel.org # v5.1+ Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26powerpc/32s: fix suspend/resume when IBATs 4-7 are usedChristophe Leroy2-13/+128
commit 6ecb78ef56e08d2119d337ae23cb951a640dc52d upstream. Previously, only IBAT1 and IBAT2 were used to map kernel linear mem. Since commit 63b2bc619565 ("powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX"), we may have all 8 BATs used for mapping kernel text. But the suspend/restore functions only save/restore BATs 0 to 3, and clears BATs 4 to 7. Make suspend and restore functions respectively save and reload the 8 BATs on CPUs having MMU_FTR_USE_HIGH_BATS feature. Reported-by: Andreas Schwab <schwab@linux-m68k.org> Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26parisc: Fix kernel panic due invalid values in IAOQ0 or IAOQ1Helge Deller1-10/+18
commit 10835c854685393a921b68f529bf740fa7c9984d upstream. On parisc the privilege level of a process is stored in the lowest two bits of the instruction pointers (IAOQ0 and IAOQ1). On Linux we use privilege level 0 for the kernel and privilege level 3 for user-space. So userspace should not be allowed to modify IAOQ0 or IAOQ1 of a ptraced process to change it's privilege level to e.g. 0 to try to gain kernel privileges. This patch prevents such modifications by always setting the two lowest bits to one (which relates to privilege level 3 for user-space) if IAOQ0 or IAOQ1 are modified via ptrace calls in the native and compat ptrace paths. Link: https://bugs.gentoo.org/481768 Reported-by: Jeroen Roovers <jer@gentoo.org> Cc: <stable@vger.kernel.org> Tested-by: Rolf Eike Beer <eike-kernel@sf-tec.de> Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26parisc: Ensure userspace privilege for ptraced processes in regset functionsHelge Deller1-1/+2
commit 34c32fc603311a72cb558e5e337555434f64c27b upstream. On parisc the privilege level of a process is stored in the lowest two bits of the instruction pointers (IAOQ0 and IAOQ1). On Linux we use privilege level 0 for the kernel and privilege level 3 for user-space. So userspace should not be allowed to modify IAOQ0 or IAOQ1 of a ptraced process to change it's privilege level to e.g. 0 to try to gain kernel privileges. This patch prevents such modifications in the regset support functions by always setting the two lowest bits to one (which relates to privilege level 3 for user-space) if IAOQ0 or IAOQ1 are modified via ptrace regset calls. Link: https://bugs.gentoo.org/481768 Cc: <stable@vger.kernel.org> # v4.7+ Tested-by: Rolf Eike Beer <eike-kernel@sf-tec.de> Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26gpu: ipu-v3: ipu-ic: Fix saturation bit offset in TPMEMSteve Longerbeam1-1/+1
commit 3d1f62c686acdedf5ed9642b763f3808d6a47d1e upstream. The saturation bit was being set at bit 9 in the second 32-bit word of the TPMEM CSC. This isn't correct, the saturation bit is bit 42, which is bit 10 of the second word. Fixes: 1aa8ea0d2bd5d ("gpu: ipu-v3: Add Image Converter unit") Signed-off-by: Steve Longerbeam <slongerbeam@gmail.com> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26xfs: abort unaligned nowait directio earlyDarrick J. Wong1-3/+3
[ Upstream commit 1fdeaea4d92c69fb9f871a787af6ad00f32eeea7 ] Dave Chinner noticed that xfs_file_dio_aio_write returns EAGAIN without dropping the IOLOCK when its deciding not to wait, which means that we leak the IOLOCK there. Since we now make unaligned directio always wait, we have the opportunity to bail out before trying to take the lock, which should reduce the overhead of this never-gonna-work case considerably while also solving the dropped lock problem. Reported-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>