summaryrefslogtreecommitdiff
path: root/include/net/tls.h
AgeCommit message (Collapse)AuthorFilesLines
2022-08-11net/tls: Use RCU API to access tls_ctx->netdevMaxim Mikityanskiy1-1/+1
Currently, tls_device_down synchronizes with tls_device_resync_rx using RCU, however, the pointer to netdev is stored using WRITE_ONCE and loaded using READ_ONCE. Although such approach is technically correct (rcu_dereference is essentially a READ_ONCE, and rcu_assign_pointer uses WRITE_ONCE to store NULL), using special RCU helpers for pointers is more valid, as it includes additional checks and might change the implementation transparently to the callers. Mark the netdev pointer as __rcu and use the correct RCU helpers to access it. For non-concurrent access pass the right conditions that guarantee safe access (locks taken, refcount value). Also use the correct helper in mlx5e, where even READ_ONCE was missing. The transition to RCU exposes existing issues, fixed by this commit: 1. bond_tls_device_xmit could read netdev twice, and it could become NULL the second time, after the NULL check passed. 2. Drivers shouldn't stop processing the last packet if tls_device_down just set netdev to NULL, before tls_dev_del was called. This prevents a possible packet drop when transitioning to the fallback software mode. Fixes: 89df6a810470 ("net/bonding: Implement TLS TX device offload") Fixes: c55dcdd435aa ("net/tls: Fix use-after-free after the TLS device goes down and up") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Link: https://lore.kernel.org/r/20220810081602.1435800-1-maximmi@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-29net/tls: Multi-threaded calls to TX tls_dev_delTariq Toukan1-0/+2
Multiple TLS device-offloaded contexts can be added in parallel via concurrent calls to .tls_dev_add, while calls to .tls_dev_del are sequential in tls_device_gc_task. This is not a sustainable behavior. This creates a rate gap between add and del operations (addition rate outperforms the deletion rate). When running for enough time, the TLS device resources could get exhausted, failing to offload new connections. Replace the single-threaded garbage collector work with a per-context alternative, so they can be handled on several cores in parallel. Use a new dedicated destruct workqueue for this. Tested with mlx5 device: Before: 22141 add/sec, 103 del/sec After: 11684 add/sec, 11684 del/sec Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-27tls: rx: do not use the standard strparserJakub Kicinski1-2/+17
TLS is a relatively poor fit for strparser. We pause the input every time a message is received, wait for a read which will decrypt the message, start the parser, repeat. strparser is built to delineate the messages, wrap them in individual skbs and let them float off into the stack or a different socket. TLS wants the data pages and nothing else. There's no need for TLS to keep cloning (and occasionally skb_unclone()'ing) the TCP rx queue. This patch uses a pre-allocated skb and attaches the skbs from the TCP rx queue to it as frags. TLS is careful never to modify the input skb without CoW'ing / detaching it first. Since we call TCP rx queue cleanup directly we also get back the benefit of skb deferred free. Overall this results in a 6% gain in my benchmarks. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18tls: rx: async: hold onto the input skbJakub Kicinski1-0/+1
Async crypto currently benefits from the fact that we decrypt in place. When we allow input and output to be different skbs we will have to hang onto the input while we move to the next record. Clone the inputs and keep them on a list. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-18tls: rx: allow only one reader at a timeJakub Kicinski1-0/+3
recvmsg() in TLS gets data from the skb list (rx_list) or fresh skbs we read from TCP via strparser. The former holds skbs which were already decrypted for peek or decrypted and partially consumed. tls_wait_data() only notices appearance of fresh skbs coming out of TCP (or psock). It is possible, if there is a concurrent call to peek() and recv() that the peek() will move the data from input to rx_list without recv() noticing. recv() will then read data out of order or never wake up. This is not a practical use case/concern, but it makes the self tests less reliable. This patch solves the problem by allowing only one reader in. Because having multiple processes calling read()/peek() is not normal avoid adding a lock and try to fast-path the single reader case. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09tls: create an internal headerJakub Kicinski1-276/+1
include/net/tls.h is getting a little long, and is probably hard for driver authors to navigate. Split out the internals into a header which will live under net/tls/. While at it move some static inlines with a single user into the source files, add a few tls_ prefixes and fix spelling of 'proccess'. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-09tls: rx: always allocate max possible aad size for decryptJakub Kicinski1-0/+1
AAD size is either 5 or 13. Really no point complicating the code for the 8B of difference. This will also let us turn the chunked up buffer into a sane struct. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-06tls: rx: add sockopt for enabling optimistic decrypt with TLS 1.3Jakub Kicinski1-0/+3
Since optimisitic decrypt may add extra load in case of retries require socket owner to explicitly opt-in. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-19tls: Add opt-in zerocopy mode of sendfile()Boris Pismenny1-0/+1
TLS device offload copies sendfile data to a bounce buffer before transmitting. It allows to maintain the valid MAC on TLS records when the file contents change and a part of TLS record has to be retransmitted on TCP level. In many common use cases (like serving static files over HTTPS) the file contents are not changed on the fly. In many use cases breaking the connection is totally acceptable if the file is changed during transmission, because it would be received corrupted in any case. This commit allows to optimize performance for such use cases to providing a new optional mode of TLS sendfile(), in which the extra copy is skipped. Removing this copy improves performance significantly, as TLS and TCP sendfile perform the same operations, and the only overhead is TLS header/trailer insertion. The new mode can only be enabled with the new socket option named TLS_TX_ZEROCOPY_SENDFILE on per-socket basis. It preserves backwards compatibility with existing applications that rely on the copying behavior. The new mode is safe, meaning that unsolicited modifications of the file being sent can't break integrity of the kernel. The worst thing that can happen is sending a corrupted TLS record, which is in any case not forbidden when using regular TCP sockets. Sockets other than TLS device offload are not affected by the new socket option. The actual status of zerocopy sendfile can be queried with sock_diag. Performance numbers in a single-core test with 24 HTTPS streams on nginx, under 100% CPU load: * non-zerocopy: 33.6 Gbit/s * zerocopy: 79.92 Gbit/s CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz Signed-off-by: Boris Pismenny <borisp@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20220518092731.1243494-1-maximmi@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-12net: remove noblock parameter from recvmsg() entitiesOliver Hartkopp1-1/+1
The internal recvmsg() functions have two parameters 'flags' and 'noblock' that were merged inside skb_recv_datagram(). As a follow up patch to commit f4b41f062c42 ("net: remove noblock parameter from skb_recv_datagram()") this patch removes the separate 'noblock' parameter for recvmsg(). Analogue to the referenced patch for skb_recv_datagram() the 'flags' and 'noblock' parameters are unnecessarily split up with e.g. err = sk->sk_prot->recvmsg(sk, msg, size, flags & MSG_DONTWAIT, flags & ~MSG_DONTWAIT, &addr_len); or in err = INDIRECT_CALL_2(sk->sk_prot->recvmsg, tcp_recvmsg, udp_recvmsg, sk, msg, size, flags & MSG_DONTWAIT, flags & ~MSG_DONTWAIT, &addr_len); instead of simply using only flags all the time and check for MSG_DONTWAIT where needed (to preserve for the formerly separated no(n)block condition). Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/r/20220411124955.154876-1-socketcan@hartkopp.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-10tls: rx: simplify async waitJakub Kicinski1-1/+0
Since we are protected from async completions by decrypt_compl_lock we can drop the async_notify and reinit the completion before we start waiting. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08tls: rx: use a define for tag lengthJakub Kicinski1-0/+1
TLS 1.3 has to strip padding, and it starts out 16 bytes from the end of the record. Make it clear this is because of the auth tag. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08tls: rx: don't store the decryption status in socket contextJakub Kicinski1-1/+0
Similar justification to previous change, the information about decryption status belongs in the skb. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08tls: rx: don't store the record type in socket contextJakub Kicinski1-7/+3
Original TLS implementation was handling one record at a time. It stashed the type of the record inside tls context (per socket structure) for convenience. When async crypto support was added [1] the author had to use skb->cb to store the type per-message. The use of skb->cb overlaps with strparser, however, so a hybrid approach was taken where type is stored in context while parsing (since we parse a message at a time) but once parsed its copied to skb->cb. Recently a workaround for sockmaps [2] exposed the previously private struct _strp_msg and started a trend of adding user fields directly in strparser's header. This is cleaner than storing information about an skb in the context. This change is not strictly necessary, but IMHO the ownership of the context field is confusing. Information naturally belongs to the skb. [1] commit 94524d8fc965 ("net/tls: Add support for async decryption of tls records") [2] commit b2c4618162ec ("bpf, sockmap: sk_skb data_end access incorrect when src_reg = dst_reg") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-10net/tls: Provide {__,}tls_driver_ctx() unconditionallyDimitris Michailidis1-2/+0
Having the definitions of {__,}tls_driver_ctx() under an #if guard means code referencing them also needs to rely on the preprocessor. The protection doesn't appear needed so make the definitions unconditional. Fixes: db37bc177dae ("net/funeth: add the data path") Reported-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Dimitris Michailidis <dmichail@fungible.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-8/+3
include/net/sock.h 7b50ecfcc6cd ("net: Rename ->stream_memory_read to ->sock_is_readable") 4c1e34c0dbff ("vsock: Enable y2038 safe timeval for timeout") drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c 0daa55d033b0 ("octeontx2-af: cn10k: debugfs for dumping LMTST map table") e77bcdd1f639 ("octeontx2-af: Display all enabled PF VF rsrc_alloc entries.") Adjacent code addition in both cases, keep both. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-28net/tls: Fix flipped sign in tls_err_abort() callsDaniel Jordan1-7/+2
sk->sk_err appears to expect a positive value, a convention that ktls doesn't always follow and that leads to memory corruption in other code. For instance, [kworker] tls_encrypt_done(..., err=<negative error from crypto request>) tls_err_abort(.., err) sk->sk_err = err; [task] splice_from_pipe_feed ... tls_sw_do_sendpage if (sk->sk_err) { ret = -sk->sk_err; // ret is positive splice_from_pipe_feed (continued) ret = actor(...) // ret is still positive and interpreted as bytes // written, resulting in underflow of buf->len and // sd->len, leading to huge buf->offset and bogus // addresses computed in later calls to actor() Fix all tls_err_abort() callers to pass a negative error code consistently and centralize the error-prone sign flip there, throwing in a warning to catch future misuse and uninlining the function so it really does only warn once. Cc: stable@vger.kernel.org Fixes: c46234ebb4d1e ("tls: RX path for ktls") Reported-by: syzbot+b187b77c8474f9648fae@syzkaller.appspotmail.com Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26net: Rename ->stream_memory_read to ->sock_is_readableCong Wang1-1/+1
The proto ops ->stream_memory_read() is currently only used by TCP to check whether psock queue is empty or not. We need to rename it before reusing it for non-TCP protocols, and adjust the exsiting users accordingly. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211008203306.37525-2-xiyou.wangcong@gmail.com
2021-10-25net/tls: tls_crypto_context add supported algorithms contextTianjia Zhang1-0/+2
tls already supports the SM4 GCM/CCM algorithms. It is also necessary to add support for these two algorithms in tls_crypto_context to avoid potential issues caused by forced type conversion. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-28net/tls: support SM4 CCM algorithmTianjia Zhang1-1/+2
The IV of CCM mode has special requirements, this patch supports CCM mode of SM4 algorithm. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-29net: sock: introduce sk_error_reportAlexander Aring1-1/+1
This patch introduces a function wrapper to call the sk_error_report callback. That will prepare to add additional handling whenever sk_error_report is called, for example to trace socket errors. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23net/tls: Remove the __TLS_DEC_STATS() macro.Kuniyuki Iwashima1-2/+0
The commit d26b698dd3cd ("net/tls: add skeleton of MIB statistics") introduced __TLS_DEC_STATS(), but it is not used and __SNMP_DEC_STATS() is not defined also. Let's remove it. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-02net/tls: Fix use-after-free after the TLS device goes down and upMaxim Mikityanskiy1-0/+9
When a netdev with active TLS offload goes down, tls_device_down is called to stop the offload and tear down the TLS context. However, the socket stays alive, and it still points to the TLS context, which is now deallocated. If a netdev goes up, while the connection is still active, and the data flow resumes after a number of TCP retransmissions, it will lead to a use-after-free of the TLS context. This commit addresses this bug by keeping the context alive until its normal destruction, and implements the necessary fallbacks, so that the connection can resume in software (non-offloaded) kTLS mode. On the TX side tls_sw_fallback is used to encrypt all packets. The RX side already has all the necessary fallbacks, because receiving non-decrypted packets is supported. The thing needed on the RX side is to block resync requests, which are normally produced after receiving non-decrypted packets. The necessary synchronization is implemented for a graceful teardown: first the fallbacks are deployed, then the driver resources are released (it used to be possible to have a tls_dev_resync after tls_dev_del). A new flag called TLS_RX_DEV_DEGRADED is added to indicate the fallback mode. It's used to skip the RX resync logic completely, as it becomes useless, and some objects may be released (for example, resync_async, which is allocated and freed by the driver). Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-02net/tls: Replace TLS_RX_SYNC_RUNNING with RCUMaxim Mikityanskiy1-1/+0
RCU synchronization is guaranteed to finish in finite time, unlike a busy loop that polls a flag. This patch is a preparation for the bugfix in the next patch, where the same synchronize_net() call will also be used to sync with the TX datapath. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-11-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-0/+6
Trivial conflict in CAN, keep the net-next + the byteswap wrapper. Conflicts: drivers/net/can/usb/gs_usb.c Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-28net/tls: add CHACHA20-POLY1305 specific behaviorVadim Fedorenko1-3/+6
RFC 7905 defines special behavior for ChaCha-Poly TLS sessions. The differences are in the calculation of nonce and the absence of explicit IV. This behavior is like TLSv1.3 partly. Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-28net/tls: add CHACHA20-POLY1305 specific defines and structuresVadim Fedorenko1-0/+1
To provide support for ChaCha-Poly cipher we need to define specific constants and structures. Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-28net/tls: make inline helpers protocol-awareVadim Fedorenko1-14/+12
Inline functions defined in tls.h have a lot of AES-specific constants. Remove these constants and change argument to struct tls_prot_info to have an access to cipher type in later patches Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-26net/tls: Protect from calling tls_dev_del for TLS RX twiceMaxim Mikityanskiy1-0/+6
tls_device_offload_cleanup_rx doesn't clear tls_ctx->netdev after calling tls_dev_del if TLX TX offload is also enabled. Clearing tls_ctx->netdev gets postponed until tls_device_gc_task. It leaves a time frame when tls_device_down may get called and call tls_dev_del for RX one extra time, confusing the driver, which may lead to a crash. This patch corrects this racy behavior by adding a flag to prevent tls_device_down from calling tls_dev_del the second time. Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20201125221810.69870-1-saeedm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-18net/tls: Fix wrong record sn in async mode of device resyncTariq Toukan1-1/+15
In async_resync mode, we log the TCP seq of records until the async request is completed. Later, in case one of the logged seqs matches the resync request, we return it, together with its record serial number. Before this fix, we mistakenly returned the serial number of the current record instead. Fixes: ed9b7646b06a ("net/tls: Add asynchronous resync") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Boris Pismenny <borisp@nvidia.com> Link: https://lore.kernel.org/r/20201115131448.2702-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-10net/tls: remove a duplicate function prototypeRandy Dunlap1-4/+0
Remove one of the two instances of the function prototype for tls_validate_xmit_skb(). Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Boris Pismenny <borisp@nvidia.com> Cc: Aviad Yehezkel <aviadye@nvidia.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-06-30net/tls: fix sign extension issue when left shifting u16 valueColin Ian King1-1/+1
Left shifting the u16 value promotes it to a int and then it gets sign extended to a u64. If len << 16 is greater than 0x7fffffff then the upper bits get set to 1 because of the implicit sign extension. Fix this by casting len to u64 before shifting it. Addresses-Coverity: ("integer handling issues") Fixes: ed9b7646b06a ("net/tls: Add asynchronous resync") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-28net/tls: Add asynchronous resyncBoris Pismenny1-1/+37
This patch adds support for asynchronous resynchronization in tls_device. Async resync follows two distinct stages: 1. The NIC driver indicates that it would like to resync on some TLS record within the received packet (P), but the driver does not know (yet) which of the TLS records within the packet. At this stage, the NIC driver will query the device to find the exact TCP sequence for resync (tcpsn), however, the driver does not wait for the device to provide the response. 2. Eventually, the device responds, and the driver provides the tcpsn within the resync packet to KTLS. Now, KTLS can check the tcpsn against any processed TLS records within packet P, and also against any record that is processed in the future within packet P. The asynchronous resync path simplifies the device driver, as it can save bits on the packet completion (32-bit TCP sequence), and pass this information on an asynchronous command instead. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-06-28Revert "net/tls: Add force_resync for driver resync"Boris Pismenny1-11/+1
This reverts commit b3ae2459f89773adcbf16fef4b68deaaa3be1929. Revert the force resync API. Not in use. To be replaced by a better async resync API downstream. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-06-02bpf: Fix running sk_skb program types with ktlsJohn Fastabend1-0/+9
KTLS uses a stream parser to collect TLS messages and send them to the upper layer tls receive handler. This ensures the tls receiver has a full TLS header to parse when it is run. However, when a socket has BPF_SK_SKB_STREAM_VERDICT program attached before KTLS is enabled we end up with two stream parsers running on the same socket. The result is both try to run on the same socket. First the KTLS stream parser runs and calls read_sock() which will tcp_read_sock which in turn calls tcp_rcv_skb(). This dequeues the skb from the sk_receive_queue. When this is done KTLS code then data_ready() callback which because we stacked KTLS on top of the bpf stream verdict program has been replaced with sk_psock_start_strp(). This will in turn kick the stream parser again and eventually do the same thing KTLS did above calling into tcp_rcv_skb() and dequeuing a skb from the sk_receive_queue. At this point the data stream is broke. Part of the stream was handled by the KTLS side some other bytes may have been handled by the BPF side. Generally this results in either missing data or more likely a "Bad Message" complaint from the kTLS receive handler as the BPF program steals some bytes meant to be in a TLS header and/or the TLS header length is no longer correct. We've already broke the idealized model where we can stack ULPs in any order with generic callbacks on the TX side to handle this. So in this patch we do the same thing but for RX side. We add a sk_psock_strp_enabled() helper so TLS can learn a BPF verdict program is running and add a tls_sw_has_ctx_rx() helper so BPF side can learn there is a TLS ULP on the socket. Then on BPF side we omit calling our stream parser to avoid breaking the data stream for the KTLS receiver. Then on the KTLS side we call BPF_SK_SKB_STREAM_VERDICT once the KTLS receiver is done with the packet but before it posts the msg to userspace. This gives us symmetry between the TX and RX halfs and IMO makes it usable again. On the TX side we process packets in this order BPF -> TLS -> TCP and on the receive side in the reverse order TCP -> TLS -> BPF. Discovered while testing OpenSSL 3.0 Alpha2.0 release. Fixes: d829e9c4112b5 ("tls: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/159079361946.5745.605854335665044485.stgit@john-Precision-5820-Tower Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-06-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller1-0/+4
xdp_umem.c had overlapping changes between the 64-bit math fix for the calculation of npgs and the removal of the zerocopy memory type which got rid of the chunk_size_nohdr member. The mlx5 Kconfig conflict is a case where we just take the net-next copy of the Kconfig entry dependency as it takes on the ESWITCH dependency by one level of indirection which is what the 'net' conflicting change is trying to ensure. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28net/tls: Add force_resync for driver resyncTariq Toukan1-1/+11
This patch adds a field to the tls rx offload context which enables drivers to force a send_resync call. This field can be used by drivers to request a resync at the next possible tls record. It is beneficial for hardware that provides the resync sequence number asynchronously. In such cases, the packet that triggered the resync does not contain the information required for a resync. Instead, the driver requests resync for all the following TLS record until the asynchronous notification with the resync request TCP sequence arrives. A following series for mlx5e ConnectX-6DX TLS RX offload support will use this mechanism. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26net/tls: fix race condition causing kernel panicVinay Kumar Yadav1-0/+4
tls_sw_recvmsg() and tls_decrypt_done() can be run concurrently. // tls_sw_recvmsg() if (atomic_read(&ctx->decrypt_pending)) crypto_wait_req(-EINPROGRESS, &ctx->async_wait); else reinit_completion(&ctx->async_wait.completion); //tls_decrypt_done() pending = atomic_dec_return(&ctx->decrypt_pending); if (!pending && READ_ONCE(ctx->async_notify)) complete(&ctx->async_wait.completion); Consider the scenario tls_decrypt_done() is about to run complete() if (!pending && READ_ONCE(ctx->async_notify)) and tls_sw_recvmsg() reads decrypt_pending == 0, does reinit_completion(), then tls_decrypt_done() runs complete(). This sequence of execution results in wrong completion. Consequently, for next decrypt request, it will not wait for completion, eventually on connection close, crypto resources freed, there is no way to handle pending decrypt response. This race condition can be avoided by having atomic_read() mutually exclusive with atomic_dec_return(),complete().Intoduced spin lock to ensure the mutual exclution. Addressed similar problem in tx direction. v1->v2: - More readable commit message. - Corrected the lock to fix new race scenario. - Removed barrier which is not needed now. Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance") Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20net/tls: add helper for testing if socket is RX offloadedJakub Kicinski1-0/+9
There is currently no way for driver to reliably check that the socket it has looked up is in fact RX offloaded. Add a helper. This allows drivers to catch misbehaving firmware. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-29net/tls: use sg_next() to walk sg entriesJakub Kicinski1-1/+1
Partially sent record cleanup path increments an SG entry directly instead of using sg_next(). This should not be a problem today, as encrypted messages should be always allocated as arrays. But given this is a cleanup path it's easy to miss was this ever to change. Use sg_next(), and simplify the code. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-29net/tls: remove the dead inplace_crypto codeJakub Kicinski1-1/+0
Looks like when BPF support was added by commit d3b18ad31f93 ("tls: add bpf support to sk_msg handling") and commit d829e9c4112b ("tls: convert to generic sk_msg interface") it broke/removed the support for in-place crypto as added by commit 4e6d47206c32 ("tls: Add support for inplace records encryption"). The inplace_crypto member of struct tls_rec is dead, inited to zero, and sometimes set to zero again. It used to be set to 1 when record was allocated, but the skmsg code doesn't seem to have been written with the idea of in-place crypto in mind. Since non trivial effort is required to bring the feature back and we don't really have the HW to measure the benefit just remove the left over support for now to avoid confusing readers. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-0/+2
Minor conflict in drivers/s390/net/qeth_l2_main.c, kept the lock from commit c8183f548902 ("s390/qeth: fix potential deadlock on workqueue flush"), removed the code which was removed by commit 9897d583b015 ("s390/qeth: consolidate some duplicated HW cmd code"). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-20net/tls: enable sk_msg redirect to tls socket egressWillem de Bruijn1-0/+2
Bring back tls_sw_sendpage_locked. sk_msg redirection into a socket with TLS_TX takes the following path: tcp_bpf_sendmsg_redir tcp_bpf_push_locked tcp_bpf_push kernel_sendpage_locked sock->ops->sendpage_locked Also update the flags test in tls_sw_sendpage_locked to allow flag MSG_NO_SHARED_FRAGS. bpf_tcp_sendmsg sets this. Link: https://lore.kernel.org/netdev/CA+FuTSdaAawmZ2N8nfDDKu3XLpXBbMtcCT0q4FntDD2gn8ASUw@mail.gmail.com/T/#t Link: https://github.com/wdebruij/kerneltools/commits/icept.2 Fixes: 0608c69c9a80 ("bpf: sk_msg, sock{map|hash} redirect through ULP") Fixes: f3de19af0f5b ("Revert \"net/tls: remove unused function tls_sw_sendpage_locked\"") Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller1-0/+5
One conflict in the BPF samples Makefile, some fixes in 'net' whilst we were converting over to Makefile.target rules in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07net/tls: add a TX lockJakub Kicinski1-0/+5
TLS TX needs to release and re-acquire the socket lock if send buffer fills up. TLS SW TX path currently depends on only allowing one thread to enter the function by the abuse of sk_write_pending. If another writer is already waiting for memory no new ones are allowed in. This has two problems: - writers don't wake other threads up when they leave the kernel; meaning that this scheme works for single extra thread (second application thread or delayed work) because memory becoming available will send a wake up request, but as Mallesham and Pooja report with larger number of threads it leads to threads being put to sleep indefinitely; - the delayed work does not get _scheduled_ but it may _run_ when other writers are present leading to crashes as writers don't expect state to change under their feet (same records get pushed and freed multiple times); it's hard to reliably bail from the work, however, because the mere presence of a writer does not guarantee that the writer will push pending records before exiting. Ensuring wakeups always happen will make the code basically open code a mutex. Just use a mutex. The TLS HW TX path does not have any locking (not even the sk_write_pending hack), yet it uses a per-socket sg_tx_data array to push records. Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance") Reported-by: Mallesham Jatharakonda <mallesh537@gmail.com> Reported-by: Pooja Trivedi <poojatrivedi@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-07net/tls: store decrypted on a single bitJakub Kicinski1-1/+1
Use a single bit instead of boolean to remember if packet was already decrypted. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-07net/tls: store async_capable on a single bitJakub Kicinski1-2/+2
Store async_capable on a single bit instead of a full integer to save space. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-07net/tls: pass context to tls_device_decrypted()Jakub Kicinski1-2/+5
Avoid unnecessary pointer chasing and calculations, callers already have most of the state tls_device_decrypted() needs. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-06net/tls: add skeleton of MIB statisticsJakub Kicinski1-0/+13
Add a skeleton structure for adding TLS statistics. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-06net/tls: add tracing for device/offload eventsJakub Kicinski1-7/+1
Add tracing of device-related interaction to aid performance analysis, especially around resync: tls:tls_device_offload_set tls:tls_device_rx_resync_send tls:tls_device_rx_resync_nh_schedule tls:tls_device_rx_resync_nh_delay tls:tls_device_tx_resync_req tls:tls_device_tx_resync_send Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>