summaryrefslogtreecommitdiff
path: root/net/vmw_vsock/af_vsock.c
AgeCommit message (Collapse)AuthorFilesLines
2024-05-14net: change proto and proto_ops accept typeJens Axboe1-3/+3
Rather than pass in flags, error pointer, and whether this is a kernel invocation or not, add a struct proto_accept_arg struct as the argument. This then holds all of these arguments, and prepares accept for being able to pass back more information. No functional changes in this patch. Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-12-15virtio/vsock: send credit update during setting SO_RCVLOWATArseniy Krasnov1-2/+7
Send credit update message when SO_RCVLOWAT is updated and it is bigger than number of bytes in rx queue. It is needed, because 'poll()' will wait until number of bytes in rx queue will be not smaller than O_RCVLOWAT, so kick sender to send more data. Otherwise mutual hungup for tx/rx is possible: sender waits for free space and receiver is waiting data in 'poll()'. Rename 'set_rcvlowat' callback to 'notify_set_rcvlowat' and set 'sk->sk_rcvlowat' only in one place (i.e. 'vsock_set_rcvlowat'), so the transport doesn't need to do it. Fixes: b89d882dc9fc ("vsock/virtio: reduce credit update messages") Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15vsock: enable setting SO_ZEROCOPYArseniy Krasnov1-2/+43
For AF_VSOCK, zerocopy tx mode depends on transport, so this option must be set in AF_VSOCK implementation where transport is accessible (if transport is not set during setting SO_ZEROCOPY: for example socket is not connected, then SO_ZEROCOPY will be enabled, but once transport will be assigned, support of this type of transmission will be checked). To handle SO_ZEROCOPY, AF_VSOCK implementation uses SOCK_CUSTOM_SOCKOPT bit, thus handling SOL_SOCKET option operations, but all of them except SO_ZEROCOPY will be forwarded to the generic handler by calling 'sock_setsockopt()'. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15vsock: enable SOCK_SUPPORT_ZC bitArseniy Krasnov1-0/+6
This bit is used by io_uring in case of zerocopy tx mode. io_uring code checks, that socket has this feature. This patch sets it in two places: 1) For socket in 'connect()' call. 2) For new socket which is returned by 'accept()' call. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15vsock: check for MSG_ZEROCOPY support on sendArseniy Krasnov1-0/+6
This feature totally depends on transport, so if transport doesn't support it, return error. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15vsock: read from socket's error queueArseniy Krasnov1-0/+6
This adds handling of MSG_ERRQUEUE input flag in receive call. This flag is used to read socket's error queue instead of data queue. Possible scenario of error queue usage is receiving completions for transmission with MSG_ZEROCOPY flag. This patch also adds new defines: 'SOL_VSOCK' and 'VSOCK_RECVERR'. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-15vsock: set EPOLLERR on non-empty error queueArseniy Krasnov1-1/+1
If socket's error queue is not empty, EPOLLERR must be set. Otherwise, reader of error queue won't detect data in it using EPOLLERR bit. Currently for AF_VSOCK this is actual only with MSG_ZEROCOPY, as this feature is the only user of an error queue of the socket. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-14vsock: send SIGPIPE on write to shutdowned socketArseniy Krasnov1-0/+3
POSIX requires to send SIGPIPE on write to SOCK_STREAM socket which was shutdowned with SHUT_WR flag or its peer was shutdowned with SHUT_RD flag. Also we must not send SIGPIPE if MSG_NOSIGNAL flag is set. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-25sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES)David Howells1-3/+0
Remove ->sendpage() and ->sendpage_locked(). sendmsg() with MSG_SPLICE_PAGES should be used instead. This allows multiple pages and multipage folios to be passed through. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for net/can cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> cc: linux-afs@lists.infradead.org cc: mptcp@lists.linux.dev cc: rds-devel@oss.oracle.com cc: tipc-discussion@lists.sourceforge.net cc: virtualization@lists.linux-foundation.org Link: https://lore.kernel.org/r/20230623225513.2732256-16-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-12vsock: avoid to close connected socket after the timeoutZhuang Shengen1-1/+1
When client and server establish a connection through vsock, the client send a request to the server to initiate the connection, then start a timer to wait for the server's response. When the server's RESPONSE message arrives, the timer also times out and exits. The server's RESPONSE message is processed first, and the connection is established. However, the client's timer also times out, the original processing logic of the client is to directly set the state of this vsock to CLOSE and return ETIMEDOUT. It will not notify the server when the port is released, causing the server port remain. when client's vsock_connect timeout,it should check sk state is ESTABLISHED or not. if sk state is ESTABLISHED, it means the connection is established, the client should not set the sk state to CLOSE Note: I encountered this issue on kernel-4.18, which can be fixed by this patch. Then I checked the latest code in the community and found similar issue. Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Signed-off-by: Zhuang Shengen <zhuangshengen@huawei.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-04vsock: return errors other than -ENOMEM to socketArseniy Krasnov1-2/+2
This removes behaviour, where error code returned from any transport was always switched to ENOMEM. This works in the same way as: commit c43170b7e157 ("vsock: return errors other than -ENOMEM to socket"), but for receive calls. Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-29vsock: support sockmapBobby Eshleman1-6/+58
This patch adds sockmap support for vsock sockets. It is intended to be usable by all transports, but only the virtio and loopback transports are implemented. SOCK_STREAM, SOCK_DGRAM, and SOCK_SEQPACKET are all supported. Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-12vsock: return errors other than -ENOMEM to socketBobby Eshleman1-1/+2
This removes behaviour, where error code returned from any transport was always switched to ENOMEM. For example when user tries to send too big message via SEQPACKET socket, transport layers return EMSGSIZE, but this error code was always replaced with ENOMEM and returned to user. Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com> Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-18treewide: use get_random_u32_{above,below}() instead of manual loopJason A. Donenfeld1-2/+1
These cases were done with this Coccinelle: @@ expression E; identifier I; @@ - do { ... when != I - I = get_random_u32(); ... when != I - } while (I > E); + I = get_random_u32_below(E + 1); @@ expression E; identifier I; @@ - do { ... when != I - I = get_random_u32(); ... when != I - } while (I >= E); + I = get_random_u32_below(E); @@ expression E; identifier I; @@ - do { ... when != I - I = get_random_u32(); ... when != I - } while (I < E); + I = get_random_u32_above(E - 1); @@ expression E; identifier I; @@ - do { ... when != I - I = get_random_u32(); ... when != I - } while (I <= E); + I = get_random_u32_above(E); @@ identifier I; @@ - do { ... when != I - I = get_random_u32(); ... when != I - } while (!I); + I = get_random_u32_above(0); @@ identifier I; @@ - do { ... when != I - I = get_random_u32(); ... when != I - } while (I == 0); + I = get_random_u32_above(0); @@ expression E; @@ - E + 1 + get_random_u32_below(U32_MAX - E) + get_random_u32_above(E) Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-11-18treewide: use get_random_u32_below() instead of deprecated functionJason A. Donenfeld1-1/+1
This is a simple mechanical transformation done by: @@ expression E; @@ - prandom_u32_max + get_random_u32_below (E) Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Reviewed-by: SeongJae Park <sj@kernel.org> # for damon Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> # for arm Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-11-03vsock: fix possible infinite sleep in vsock_connectible_wait_data()Dexuan Cui1-1/+4
Currently vsock_connectible_has_data() may miss a wakeup operation between vsock_connectible_has_data() == 0 and the prepare_to_wait(). Fix the race by adding the process to the wait queue before checking vsock_connectible_has_data(). Fixes: b3f7fd54881b ("af_vsock: separate wait data loop") Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reported-by: Frédéric Dalleau <frederic.dalleau@docker.com> Tested-by: Frédéric Dalleau <frederic.dalleau@docker.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-03vsock: remove the unused 'wait' in vsock_connectible_recvmsg()Dexuan Cui1-2/+0
Remove the unused variable introduced by 19c1b90e1979. Fixes: 19c1b90e1979 ("af_vsock: separate receive data loop") Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-08-23vsock: add API call for data readyArseniy Krasnov1-0/+10
This adds 'vsock_data_ready()' which must be called by transport to kick sleeping data readers. It checks for SO_RCVLOWAT value before waking user, thus preventing spurious wake ups. Based on 'tcp_data_ready()' logic. Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-08-23vsock: pass sock_rcvlowat to notify_poll_in as targetArseniy Krasnov1-1/+2
Passing 1 as the target to notify_poll_in(), we don't honor what the user has set via SO_RCVLOWAT, going to set POLLIN and POLLRDNORM, even if we don't have the amount of bytes expected by the user. Let's use sock_rcvlowat() to get the right target to pass to notify_poll_in(); Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-08-23vsock: SO_RCVLOWAT transport set callbackArseniy Krasnov1-0/+20
This adds transport specific callback for SO_RCVLOWAT, because in some transports it may be difficult to know current available number of bytes ready to read. Thus, when SO_RCVLOWAT is set, transport may reject it. Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-08-10vsock: Set socket state back to SS_UNCONNECTED in vsock_connect_timeout()Peilin Ye1-0/+1
Imagine two non-blocking vsock_connect() requests on the same socket. The first request schedules @connect_work, and after it times out, vsock_connect_timeout() sets *sock* state back to TCP_CLOSE, but keeps *socket* state as SS_CONNECTING. Later, the second request returns -EALREADY, meaning the socket "already has a pending connection in progress", even though the first request has already timed out. As suggested by Stefano, fix it by setting *socket* state back to SS_UNCONNECTED, so that the second request will return -ETIMEDOUT. Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-10vsock: Fix memory leak in vsock_connect()Peilin Ye1-1/+8
An O_NONBLOCK vsock_connect() request may try to reschedule @connect_work. Imagine the following sequence of vsock_connect() requests: 1. The 1st, non-blocking request schedules @connect_work, which will expire after 200 jiffies. Socket state is now SS_CONNECTING; 2. Later, the 2nd, blocking request gets interrupted by a signal after a few jiffies while waiting for the connection to be established. Socket state is back to SS_UNCONNECTED, but @connect_work is still pending, and will expire after 100 jiffies. 3. Now, the 3rd, non-blocking request tries to schedule @connect_work again. Since @connect_work is already scheduled, schedule_delayed_work() silently returns. sock_hold() is called twice, but sock_put() will only be called once in vsock_connect_timeout(), causing a memory leak reported by syzbot: BUG: memory leak unreferenced object 0xffff88810ea56a40 (size 1232): comm "syz-executor756", pid 3604, jiffies 4294947681 (age 12.350s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 28 00 07 40 00 00 00 00 00 00 00 00 00 00 00 00 (..@............ backtrace: [<ffffffff837c830e>] sk_prot_alloc+0x3e/0x1b0 net/core/sock.c:1930 [<ffffffff837cbe22>] sk_alloc+0x32/0x2e0 net/core/sock.c:1989 [<ffffffff842ccf68>] __vsock_create.constprop.0+0x38/0x320 net/vmw_vsock/af_vsock.c:734 [<ffffffff842ce8f1>] vsock_create+0xc1/0x2d0 net/vmw_vsock/af_vsock.c:2203 [<ffffffff837c0cbb>] __sock_create+0x1ab/0x2b0 net/socket.c:1468 [<ffffffff837c3acf>] sock_create net/socket.c:1519 [inline] [<ffffffff837c3acf>] __sys_socket+0x6f/0x140 net/socket.c:1561 [<ffffffff837c3bba>] __do_sys_socket net/socket.c:1570 [inline] [<ffffffff837c3bba>] __se_sys_socket net/socket.c:1568 [inline] [<ffffffff837c3bba>] __x64_sys_socket+0x1a/0x20 net/socket.c:1568 [<ffffffff84512815>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff84512815>] do_syscall_64+0x35/0x80 arch/x86/entry/common.c:80 [<ffffffff84600068>] entry_SYSCALL_64_after_hwframe+0x44/0xae <...> Use mod_delayed_work() instead: if @connect_work is already scheduled, reschedule it, and undo sock_hold() to keep the reference count balanced. Reported-and-tested-by: syzbot+b03f55bf128f9a38f064@syzkaller.appspotmail.com Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Co-developed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-12vsock: each transport cycles only on its own socketsJiyong Park1-2/+7
When iterating over sockets using vsock_for_each_connected_socket, make sure that a transport filters out sockets that don't belong to the transport. There actually was an issue caused by this; in a nested VM configuration, destroying the nested VM (which often involves the closing of /dev/vhost-vsock if there was h2g connections to the nested VM) kills not only the h2g connections, but also all existing g2h connections to the (outmost) host which are totally unrelated. Tested: Executed the following steps on Cuttlefish (Android running on a VM) [1]: (1) Enter into an `adb shell` session - to have a g2h connection inside the VM, (2) open and then close /dev/vhost-vsock by `exec 3< /dev/vhost-vsock && exec 3<&-`, (3) observe that the adb session is not reset. [1] https://android.googlesource.com/device/google/cuttlefish/ Fixes: c0cfa2d8a788 ("vsock: add multi-transports support") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jiyong Park <jiyong@google.com> Link: https://lore.kernel.org/r/20220311020017.1509316-1-jiyong@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17vsock: remove vsock from connected table when connect is interrupted by a signalSeth Forshee1-0/+1
vsock_connect() expects that the socket could already be in the TCP_ESTABLISHED state when the connecting task wakes up with a signal pending. If this happens the socket will be in the connected table, and it is not removed when the socket state is reset. In this situation it's common for the process to retry connect(), and if the connection is successful the socket will be added to the connected table a second time, corrupting the list. Prevent this by calling vsock_remove_connected() if a signal is received while waiting for a connection. This is harmless if the socket is not in the connected table, and if it is in the table then removing it will prevent list corruption from a double add. Note for backporting: this patch requires d5afa82c977e ("vsock: correct removal of socket from the list"), which is in all current stable trees except 4.9.y. Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Signed-off-by: Seth Forshee <sforshee@digitalocean.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20220217141312.2297547-1-sforshee@digitalocean.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-29net: Don't include filter.h from net/sock.hJakub Kicinski1-0/+1
sock.h is pretty heavily used (5k objects rebuilt on x86 after it's touched). We can drop the include of filter.h from it and add a forward declaration of struct sk_filter instead. This decreases the number of rebuilt objects when bpf.h is touched from ~5k to ~1k. There's a lot of missing includes this was masking. Primarily in networking tho, this time. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/bpf/20211229004913.513372-1-kuba@kernel.org
2021-11-10vsock: prevent unnecessary refcnt inc for nonblocking connectEiichi Tsukata1-0/+2
Currently vosck_connect() increments sock refcount for nonblocking socket each time it's called, which can lead to memory leak if it's called multiple times because connect timeout function decrements sock refcount only once. Fixes it by making vsock_connect() return -EALREADY immediately when sock state is already SS_CONNECTING. Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-08vsock: Enable y2038 safe timeval for timeoutRichard Palethorpe1-10/+15
Reuse the timeval compat code from core/sock to handle 32-bit and 64-bit timeval structures. Also introduce a new socket option define to allow using y2038 safe timeval under 32-bit. The existing behavior of sock_set_timeout and vsock's timeout setter differ when the time value is out of bounds. vsocks current behavior is retained at the expense of not being able to share the full implementation. This allows the LTP test vsock01 to pass under 32-bit compat mode. Fixes: fe0c72f3db11 ("socket: move compat timeout handling into sock.c") Signed-off-by: Richard Palethorpe <rpalethorpe@suse.com> Cc: Richard Palethorpe <rpalethorpe@richiejp.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-08vsock: Refactor vsock_*_getsockopt to resemble sock_getsockoptRichard Palethorpe1-37/+28
In preparation for sharing the implementation of sock_get_timeout. Signed-off-by: Richard Palethorpe <rpalethorpe@suse.com> Cc: Richard Palethorpe <rpalethorpe@richiejp.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-06af_vsock: rename variables in receive loopArseny Krasnov1-5/+5
Record is supported via MSG_EOR flag, while current logic operates with message, so rename variables from 'record' to 'message'. Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20210903123306.3273757-1-arseny.krasnov@kaspersky.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-06-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+1
Trivial conflict in net/netfilter/nf_tables_api.c. Duplicate fix in tools/testing/selftests/net/devlink_port_split.py - take the net-next version. skmsg, and L4 bpf - keep the bpf code but remove the flags and err params. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-06-29net: sock: introduce sk_error_reportAlexander Aring1-1/+1
This patch introduces a function wrapper to call the sk_error_report callback. That will prepare to add additional handling whenever sk_error_report is called, for example to trace socket errors. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22vsock: notify server to shutdown when client has pending signalLongpeng(Mike)1-1/+1
The client's sk_state will be set to TCP_ESTABLISHED if the server replay the client's connect request. However, if the client has pending signal, its sk_state will be set to TCP_CLOSE without notify the server, so the server will hold the corrupt connection. client server 1. sk_state=TCP_SYN_SENT | 2. call ->connect() | 3. wait reply | | 4. sk_state=TCP_ESTABLISHED | 5. insert to connected list | 6. reply to the client 7. sk_state=TCP_ESTABLISHED | 8. insert to connected list | 9. *signal pending* <--------------------- the user kill client 10. sk_state=TCP_CLOSE | client is exiting... | 11. call ->release() | virtio_transport_close if (!(sk->sk_state == TCP_ESTABLISHED || sk->sk_state == TCP_CLOSING)) return true; *return at here, the server cannot notice the connection is corrupt* So the client should notify the peer in this case. Cc: David S. Miller <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jorgen Hansen <jhansen@vmware.com> Cc: Norbert Slusarek <nslusarek@gmx.net> Cc: Andra Paraschiv <andraprs@amazon.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: David Brazdil <dbrazdil@google.com> Cc: Alexander Popov <alex.popov@linux.com> Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lkml.org/lkml/2021/5/17/418 Signed-off-by: lixianming <lixianming5@huawei.com> Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18vsock: rename vsock_wait_data()Stefano Garzarella1-6/+8
vsock_wait_data() is used only by STREAM and SEQPACKET sockets, so let's rename it to vsock_connectible_wait_data(), using the same nomenclature (connectible) used in other functions after the introduction of SEQPACKET. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18vsock: rename vsock_has_data()Stefano Garzarella1-2/+2
vsock_has_data() is used only by STREAM and SEQPACKET sockets, so let's rename it to vsock_connectible_has_data(), using the same nomenclature (connectible) used in other functions after the introduction of SEQPACKET. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11af_vsock: update comments for stream socketsArseny Krasnov1-14/+17
Replace 'stream' to 'connection oriented' in comments as SEQPACKET is also connection oriented. Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11af_vsock: rest of SEQPACKET supportArseny Krasnov1-2/+46
Add socket ops for SEQPACKET type and .seqpacket_allow() callback to query transports if they support SEQPACKET. Also split path for data check for STREAM and SEQPACKET branches. Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11af_vsock: implement send logic for SEQPACKETArseny Krasnov1-5/+15
Update current stream enqueue function for SEQPACKET support: 1) Call transport's seqpacket enqueue callback. 2) Return value from enqueue function is whole record length or error for SOCK_SEQPACKET. Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11af_vsock: implement SEQPACKET receive loopArseny Krasnov1-1/+54
Add receive loop for SEQPACKET. It looks like receive loop for STREAM, but there are differences: 1) It doesn't call notify callbacks. 2) It doesn't care about 'SO_SNDLOWAT' and 'SO_RCVLOWAT' values, because there is no sense for these values in SEQPACKET case. 3) It waits until whole record is received. 4) It processes and sets 'MSG_TRUNC' flag. So to avoid extra conditions for two types of socket inside one loop, two independent functions were created. Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11af_vsock: separate receive data loopArseny Krasnov1-49/+67
Some code in receive data loop could be shared between SEQPACKET and STREAM sockets, while another part is type specific, so move STREAM specific data receive logic to '__vsock_stream_recvmsg()' dedicated function, while checks, that will be same for both STREAM and SEQPACKET sockets, stays in 'vsock_connectible_recvmsg()'. Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11af_vsock: separate wait data loopArseny Krasnov1-72/+84
Wait loop for data could be shared between SEQPACKET and STREAM sockets, so move it to dedicated function. While moving the code around, let's update an old comment. Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11af_vsock: update functions for connectible socketArseny Krasnov1-30/+34
Prepare af_vsock.c for SEQPACKET support: rename some functions such as setsockopt(), getsockopt(), connect(), recvmsg(), sendmsg() in general manner, because they are shared with stream sockets. Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-29net: vsock: Fix a typoLu Wei1-1/+1
Modify "occured" to "occurred" in net/vmw_vsock/af_vsock.c. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Lu Wei <luwei32@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-19selinux: vsock: Set SID for socket returned by accept()David Brazdil1-0/+1
For AF_VSOCK, accept() currently returns sockets that are unlabelled. Other socket families derive the child's SID from the SID of the parent and the SID of the incoming packet. This is typically done as the connected socket is placed in the queue that accept() removes from. Reuse the existing 'security_sk_clone' hook to copy the SID from the parent (server) socket to the child. There is no packet SID in this case. Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-10vsock: fix locking in vsock_shutdown()Stefano Garzarella1-3/+5
In vsock_shutdown() we touched some socket fields without holding the socket lock, such as 'state' and 'sk_flags'. Also, after the introduction of multi-transport, we are accessing 'vsk->transport' in vsock_send_shutdown() without holding the lock and this call can be made while the connection is in progress, so the transport can change in the meantime. To avoid issues, we hold the socket lock when we enter in vsock_shutdown() and release it when we leave. Among the transports that implement the 'shutdown' callback, only hyperv_transport acquired the lock. Since the caller now holds it, we no longer take it. Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-07net/vmw_vsock: improve locking in vsock_connect_timeout()Norbert Slusarek1-4/+1
A possible locking issue in vsock_connect_timeout() was recognized by Eric Dumazet which might cause a null pointer dereference in vsock_transport_cancel_pkt(). This patch assures that vsock_transport_cancel_pkt() will be called within the lock, so a race condition won't occur which could result in vsk->transport to be set to NULL. Fixes: 380feae0def7 ("vsock: cancel packets when failing to connect") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Norbert Slusarek <nslusarek@gmx.net> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/trinity-f8e0937a-cf0e-4d80-a76e-d9a958ba3ef1-1612535522360@3c-app-gmx-bap12 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-07net/vmw_vsock: fix NULL pointer dereferenceNorbert Slusarek1-1/+1
In vsock_stream_connect(), a thread will enter schedule_timeout(). While being scheduled out, another thread can enter vsock_stream_connect() as well and set vsk->transport to NULL. In case a signal was sent, the first thread can leave schedule_timeout() and vsock_transport_cancel_pkt() will be called right after. Inside vsock_transport_cancel_pkt(), a null dereference will happen on transport->cancel_pkt. Fixes: c0cfa2d8a788 ("vsock: add multi-transports support") Signed-off-by: Norbert Slusarek <nslusarek@gmx.net> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/trinity-c2d6cede-bfb1-44e2-85af-1fbc7f541715-1612535117028@3c-app-gmx-bap12 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02vsock: fix the race conditions in multi-transport supportAlexander Popov1-5/+12
There are multiple similar bugs implicitly introduced by the commit c0cfa2d8a788fcf4 ("vsock: add multi-transports support") and commit 6a2c0962105ae8ce ("vsock: prevent transport modules unloading"). The bug pattern: [1] vsock_sock.transport pointer is copied to a local variable, [2] lock_sock() is called, [3] the local variable is used. VSOCK multi-transport support introduced the race condition: vsock_sock.transport value may change between [1] and [2]. Let's copy vsock_sock.transport pointer to local variables after the lock_sock() call. Fixes: c0cfa2d8a788fcf4 ("vsock: add multi-transports support") Signed-off-by: Alexander Popov <alex.popov@linux.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Link: https://lore.kernel.org/r/20210201084719.2257066-1-alex.popov@linux.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-15af_vsock: Assign the vsock transport considering the vsock address flagsAndra Paraschiv1-2/+7
The vsock flags field can be set in the connect path (user space app) and the (listen) receive path (kernel space logic). When the vsock transport is assigned, the remote CID is used to distinguish between types of connection. Use the vsock flags value (in addition to the CID) from the remote address to decide which vsock transport to assign. For the sibling VMs use case, all the vsock packets need to be forwarded to the host, so always assign the guest->host transport if the VMADDR_FLAG_TO_HOST flag is set. For the other use cases, the vsock transport assignment logic is not changed. Changelog v3 -> v4 * Update the "remote_flags" local variable type to reflect the change of the "svm_flags" field to be 1 byte in size. v2 -> v3 * Update bitwise check logic to not compare result to the flag value. v1 -> v2 * Use bitwise operator to check the vsock flag. * Use the updated "VMADDR_FLAG_TO_HOST" flag naming. * Merge the checks for the g2h transport assignment in one "if" block. Signed-off-by: Andra Paraschiv <andraprs@amazon.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-15af_vsock: Set VMADDR_FLAG_TO_HOST flag on the receive pathAndra Paraschiv1-0/+12
The vsock flags can be set during the connect() setup logic, when initializing the vsock address data structure variable. Then the vsock transport is assigned, also considering this flags field. The vsock transport is also assigned on the (listen) receive path. The flags field needs to be set considering the use case. Set the value of the vsock flags of the remote address to the one targeted for packets forwarding to the host, if the following conditions are met: * The source CID of the packet is higher than VMADDR_CID_HOST. * The destination CID of the packet is higher than VMADDR_CID_HOST. Changelog v3 -> v4 * No changes. v2 -> v3 * No changes. v1 -> v2 * Set the vsock flag on the receive path in the vsock transport assignment logic. * Use bitwise operator for the vsock flag setup. * Use the updated "VMADDR_FLAG_TO_HOST" flag naming. Signed-off-by: Andra Paraschiv <andraprs@amazon.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20Merge https://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+1
Signed-off-by: Jakub Kicinski <kuba@kernel.org>