summaryrefslogtreecommitdiff
path: root/drivers/nvme/target/tcp.c
AgeCommit message (Collapse)AuthorFilesLines
2020-09-27nvmet-tcp: have queue io_work context run on sock incoming cpuMark Wunderlich1-11/+10
No real good need to spread queues artificially. Usually the target will serve multiple hosts, and it's better to run on the socket incoming cpu for better affinitization rather than spread queues on all online cpus. We rely on RSS to spread the work around sufficiently. Signed-off-by: Mark Wunderlich <mark.wunderlich@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-08-29nvmet-tcp: Fix NULL dereference when a connect data comes in h2cdata pduZiye Yang1-1/+9
When handling commands without in-capsule data, we assign the ttag assuming we already have the queue commands array allocated (based on the queue size information in the connect data payload). However if the connect itself did not send the connect data in-capsule we have yet to allocate the queue commands,and we will assign a bogus ttag and suffer a NULL dereference when we receive the corresponding h2cdata pdu. Fix this by checking if we already allocated commands before dereferencing it when handling h2cdata, if we didn't, its for sure a connect and we should use the preallocated connect command. Signed-off-by: Ziye Yang <ziye.yang@intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2020-07-08nvmet-tcp: simplify nvmet_process_resp_listSagi Grimberg1-9/+3
We can make it shorter and simpler without some redundant checks. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-08nvmet-tcp: remove has_keyed_sgls initializationMax Gurtovoy1-1/+0
Since the nvmet_tcp_ops is static, there is no need to initialize values to zero. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-06-12Merge tag 'block-5.8-2020-06-11' of git://git.kernel.dk/linux-blockLinus Torvalds1-2/+2
Pull block fixes from Jens Axboe: "Some followup fixes for this merge window. In particular: - Seqcount write missing preemption disable for stats (Ahmed) - blktrace fixes (Chaitanya) - Redundant initializations (Colin) - Various small NVMe fixes (Chaitanya, Christoph, Daniel, Max, Niklas, Rikard) - loop flag bug regression fix (Martijn) - blk-mq tagging fixes (Christoph, Ming)" * tag 'block-5.8-2020-06-11' of git://git.kernel.dk/linux-block: umem: remove redundant initialization of variable ret pktcdvd: remove redundant initialization of variable ret nvmet: fail outstanding host posted AEN req nvme-pci: use simple suspend when a HMB is enabled nvme-fc: don't call nvme_cleanup_cmd() for AENs nvmet-tcp: constify nvmet_tcp_ops nvme-tcp: constify nvme_tcp_mq_ops and nvme_tcp_admin_mq_ops nvme: do not call del_gendisk() on a disk that was never added blk-mq: fix blk_mq_all_tag_iter blk-mq: split out a __blk_mq_get_driver_tag helper blktrace: fix endianness for blk_log_remap() blktrace: fix endianness in get_pdu_int() blktrace: use errno instead of bi_status block: nr_sects_write(): Disable preemption on seqcount write block: remove the error argument to the block_bio_complete tracepoint loop: Fix wrong masking of status flags block/bio-integrity: don't free 'buf' if bio_integrity_add_page() failed
2020-06-11nvmet-tcp: constify nvmet_tcp_opsMax Gurtovoy1-2/+2
nvmet_tcp_ops is never modified and can be made const to allow the compiler to put it in read-only memory, as done in other transports. Before: text data bss dec hex filename 16164 160 12 16336 3fd0 drivers/nvme/target/tcp.o After: text data bss dec hex filename 16277 64 12 16353 3fe1 drivers/nvme/target/tcp.o Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Israel Rukshin <israelr@mellanox.com> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds1-44/+10
Pull networking updates from David Miller: 1) Allow setting bluetooth L2CAP modes via socket option, from Luiz Augusto von Dentz. 2) Add GSO partial support to igc, from Sasha Neftin. 3) Several cleanups and improvements to r8169 from Heiner Kallweit. 4) Add IF_OPER_TESTING link state and use it when ethtool triggers a device self-test. From Andrew Lunn. 5) Start moving away from custom driver versions, use the globally defined kernel version instead, from Leon Romanovsky. 6) Support GRO vis gro_cells in DSA layer, from Alexander Lobakin. 7) Allow hard IRQ deferral during NAPI, from Eric Dumazet. 8) Add sriov and vf support to hinic, from Luo bin. 9) Support Media Redundancy Protocol (MRP) in the bridging code, from Horatiu Vultur. 10) Support netmap in the nft_nat code, from Pablo Neira Ayuso. 11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina Dubroca. Also add ipv6 support for espintcp. 12) Lots of ReST conversions of the networking documentation, from Mauro Carvalho Chehab. 13) Support configuration of ethtool rxnfc flows in bcmgenet driver, from Doug Berger. 14) Allow to dump cgroup id and filter by it in inet_diag code, from Dmitry Yakunin. 15) Add infrastructure to export netlink attribute policies to userspace, from Johannes Berg. 16) Several optimizations to sch_fq scheduler, from Eric Dumazet. 17) Fallback to the default qdisc if qdisc init fails because otherwise a packet scheduler init failure will make a device inoperative. From Jesper Dangaard Brouer. 18) Several RISCV bpf jit optimizations, from Luke Nelson. 19) Correct the return type of the ->ndo_start_xmit() method in several drivers, it's netdev_tx_t but many drivers were using 'int'. From Yunjian Wang. 20) Add an ethtool interface for PHY master/slave config, from Oleksij Rempel. 21) Add BPF iterators, from Yonghang Song. 22) Add cable test infrastructure, including ethool interfaces, from Andrew Lunn. Marvell PHY driver is the first to support this facility. 23) Remove zero-length arrays all over, from Gustavo A. R. Silva. 24) Calculate and maintain an explicit frame size in XDP, from Jesper Dangaard Brouer. 25) Add CAP_BPF, from Alexei Starovoitov. 26) Support terse dumps in the packet scheduler, from Vlad Buslov. 27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei. 28) Add devm_register_netdev(), from Bartosz Golaszewski. 29) Minimize qdisc resets, from Cong Wang. 30) Get rid of kernel_getsockopt and kernel_setsockopt in order to eliminate set_fs/get_fs calls. From Christoph Hellwig. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits) selftests: net: ip_defrag: ignore EPERM net_failover: fixed rollback in net_failover_open() Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv" Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv" vmxnet3: allow rx flow hash ops only when rss is enabled hinic: add set_channels ethtool_ops support selftests/bpf: Add a default $(CXX) value tools/bpf: Don't use $(COMPILE.c) bpf, selftests: Use bpf_probe_read_kernel s390/bpf: Use bcr 0,%0 as tail call nop filler s390/bpf: Maintain 8-byte stack alignment selftests/bpf: Fix verifier test selftests/bpf: Fix sample_cnt shared between two threads bpf, selftests: Adapt cls_redirect to call csum_level helper bpf: Add csum_level helper for fixing up csum levels bpf: Fix up bpf_skb_adjust_room helper's skb csum setting sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf() crypto/chtls: IPv6 support for inline TLS Crypto/chcr: Fixes a coccinile check error Crypto/chcr: Fixes compilations warnings ...
2020-05-28ipv4: add ip_sock_set_tosChristoph Hellwig1-8/+2
Add a helper to directly set the IP_TOS sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28tcp: add tcp_sock_set_nodelayChristoph Hellwig1-10/+2
Add a helper to directly set the TCP_NODELAY sockopt from kernel space without going through a fake uaccess. Cleanup the callers to avoid pointless wrappers now that this is a simple function call. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me> Acked-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28net: add sock_set_priorityChristoph Hellwig1-14/+4
Add a helper to directly set the SO_PRIORITY sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28net: add sock_no_lingerChristoph Hellwig1-5/+1
Add a helper to directly set the SO_LINGER sockopt from kernel space with onoff set to true and a linger time of 0 without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28net: add sock_set_reuseaddrChristoph Hellwig1-7/+1
Add a helper to directly set the SO_REUSEADDR sockopt from kernel space without going through a fake uaccess. For this the iscsi target now has to formally depend on inet to avoid a mostly theoretical compile failure. For actual operation it already did depend on having ipv4 or ipv6 support. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27nvmet-tcp: move send/recv error handling in the send/recv methods instead of ↵Sagi Grimberg1-19/+24
call-sites Have routines handle errors and just bail out of the poll loop. This simplifies the code and will help as we may enhance the poll loop logic and these are somewhat in the way. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-05-27nvmet-tcp: set MSG_EOR if we send last payload in the batchSagi Grimberg1-0/+2
when trying to send the pdu data digest, we should set this flag. Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-05-27nvmet-tcp: set MSG_SENDPAGE_NOTLAST with MSG_MORE when we have more to sendSagi Grimberg1-4/+4
We can signal the stack that this is not the last page coming and the stack can build a larger tso segment, so go ahead and use it. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-03-30Merge tag 'for-5.7/drivers-2020-03-29' of git://git.kernel.dk/linux-blockLinus Torvalds1-3/+32
Pull block driver updates from Jens Axboe: - floppy driver cleanup series from Willy - NVMe updates and fixes (Various) - null_blk trace improvements (Chaitanya) - bcache fixes (Coly) - md fixes (via Song) - loop block size change optimizations (Martijn) - scnprintf() use (Takashi) * tag 'for-5.7/drivers-2020-03-29' of git://git.kernel.dk/linux-block: (81 commits) null_blk: add trace in null_blk_zoned.c null_blk: add tracepoint helpers for zoned mode block: add a zone condition debug helper nvme: cleanup namespace identifier reporting in nvme_init_ns_head nvme: rename __nvme_find_ns_head to nvme_find_ns_head nvme: refactor nvme_identify_ns_descs error handling nvme-tcp: Add warning on state change failure at nvme_tcp_setup_ctrl nvme-rdma: Add warning on state change failure at nvme_rdma_setup_ctrl nvme: Fix controller creation races with teardown flow nvme: Make nvme_uninit_ctrl symmetric to nvme_init_ctrl nvme: Fix ctrl use-after-free during sysfs deletion nvme-pci: Re-order nvme_pci_free_ctrl nvme: Remove unused return code from nvme_delete_ctrl_sync nvme: Use nvme_state_terminal helper nvme: release ida resources nvme: Add compat_ioctl handler for NVME_IOCTL_SUBMIT_IO nvmet-tcp: optimize tcp stack TX when data digest is used nvme-fabrics: Use scnprintf() for avoiding potential buffer overflow nvme-multipath: do not reset on unknown status nvmet-rdma: allocate RW ctxs according to mdts ...
2020-03-25nvmet-tcp: optimize tcp stack TX when data digest is usedSagi Grimberg1-2/+5
If we have a 4-byte data digest to send to the wire, but we have more data to send, set MSG_MORE to tell the stack that more is coming. Reviewed-by: Mark Wunderlich <mark.wunderlich@intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2020-03-25nvmet-tcp: fix maxh2cdata icresp parameterSagi Grimberg1-1/+1
MAXH2CDATA is not zero based. Also no reason to limit ourselves to 1M transfers as we can do more easily. Make this an arbitrary limit of 16M. Reported-by: Wenhua Liu <liuw@vmware.com> Cc: stable@vger.kernel.org # v5.0+ Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2020-03-20nvmet-tcp: set MSG_MORE only if we actually have more to sendSagi Grimberg1-3/+9
When we send PDU data, we want to optimize the tcp stack operation if we have more data to send. So when we set MSG_MORE when: - We have more fragments coming in the batch, or - We have a more data to send in this PDU - We don't have a data digest trailer - We optimize with the SUCCESS flag and omit the NVMe completion (used if sq_head pointer update is disabled) This addresses a regression in QD=1 with SUCCESS flag optimization as we unconditionally set MSG_MORE when we didn't actually have more data to send. Fixes: 70583295388a ("nvmet-tcp: implement C2HData SUCCESS optimization") Reported-by: Mark Wunderlich <mark.wunderlich@intel.com> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2020-03-04nvmet-tcp: set SO_PRIORITY for accepted socketsWunderlich, Mark1-0/+26
Enable ability to associate all sockets related to NVMf TCP traffic to a priority group that will perform optimized network processing for this traffic class. Maintain initial default behavior of using priority of zero. Signed-off-by: Kiran Patil <kiran.patil@intel.com> Signed-off-by: Mark Wunderlich <mark.wunderlich@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-11-04nvmet: Open code nvmet_req_execute()Christoph Hellwig1-3/+3
Now that nvmet_req_execute does nothing, open code it. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> [split patch, update changelog] Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04nvmet-tcp: Don't set the request's data_lenLogan Gunthorpe1-4/+2
It's not apprporiate for the transports to set the data_len field of the request which is only used by the core. In this case, just use a variable on the stack to store the length of the sgl for comparison. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04nvmet-tcp: Don't check data_len in nvmet_tcp_map_data()Logan Gunthorpe1-1/+1
None of the other transports check data_len which is verified in core code. The function should instead check that the sgl length is non-zero. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-25nvmet-tcp: remove superflous check on request sglSagi Grimberg1-8/+4
Now that sgl_free is null safe, drop the superflous check. Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2019-08-29nvmet-tcp: Add TOS for tcp transportIsrael Rukshin1-0/+11
Set the outgoing packets type of service (TOS) according to the receiving TOS. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Suggested-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2019-08-29nvmet-tcp: fix possible memory leakSagi Grimberg1-0/+1
when we uninit a command in error flow we also need to free an iovec if it was allocated. Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2019-08-29nvmet-tcp: fix possible NULL derefSagi Grimberg1-4/+8
We must only call sgl_free for sgl that we actually allocated. Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2019-04-25nvmet-tcp: don't fail maxr2t greater than 1Sagi Grimberg1-6/+0
The host may support it, but nothing prevents us from sending a single r2t at a time like we do anyways. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-04-25nvmet: rename nvme_completion instances from rsp to cqeMax Gurtovoy1-4/+4
Use NVMe namings for improving code readability. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by : Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-04-05nvmet-tcp: implement C2HData SUCCESS optimizationSagi Grimberg1-3/+21
TP 8000 says that the use of the SUCCESS flag depends on weather the controller support disabling sq_head pointer updates. Given that we support it by default, makes sense that we go the extra mile to actually use the SUCCESS flag. When we create the C2HData PDU header, we check if sqhd_disabled is set on our queue, if so, we set the SUCCESS flag in the PDU header and skip sending a completion response capsule. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Oliver Smith-Denny <osmithde@cisco.com> Tested-by: Oliver Smith-Denny <osmithde@cisco.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-01-16nvmet-tcp: fix uninitialized variable accessSagi Grimberg1-1/+1
If we end up in nvmet_tcp_try_recv_one with a bogus state queue receive state we will access result which is uninitialized. Initialize restult to 0 which will be considered as if no data was received by the tcp socket. Fixes: 872d26a391da ("nvmet-tcp: add NVMe over TCP target driver") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-18nvmet-tcp: fix endianess annotationsChristoph Hellwig1-2/+2
Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me>
2018-12-13nvmet-tcp: add NVMe over TCP target driverSagi Grimberg1-0/+1737
This patch implements the TCP transport driver for the NVMe over Fabrics target stack. This allows exporting NVMe over Fabrics functionality over good old TCP/IP. The driver implements the TP 8000 of how nvme over fabrics capsules and data are encapsulated in nvme-tcp pdus and exchaged on top of a TCP byte stream. nvme-tcp header and data digest are supported as well. Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com> Signed-off-by: Roy Shterman <roys@lightbitslabs.com> Signed-off-by: Solganik Alexander <sashas@lightbitslabs.com> Signed-off-by: Christoph Hellwig <hch@lst.de>