summaryrefslogtreecommitdiff
path: root/net/rxrpc
AgeCommit message (Collapse)AuthorFilesLines
2016-09-08rxrpc: Rewrite the data and ack handling codeDavid Howells21-3286/+1983
Rewrite the data and ack handling code such that: (1) Parsing of received ACK and ABORT packets and the distribution and the filing of DATA packets happens entirely within the data_ready context called from the UDP socket. This allows us to process and discard ACK and ABORT packets much more quickly (they're no longer stashed on a queue for a background thread to process). (2) We avoid calling skb_clone(), pskb_pull() and pskb_trim(). We instead keep track of the offset and length of the content of each packet in the sk_buff metadata. This means we don't do any allocation in the receive path. (3) Jumbo DATA packet parsing is now done in data_ready context. Rather than cloning the packet once for each subpacket and pulling/trimming it, we file the packet multiple times with an annotation for each indicating which subpacket is there. From that we can directly calculate the offset and length. (4) A call's receive queue can be accessed without taking locks (memory barriers do have to be used, though). (5) Incoming calls are set up from preallocated resources and immediately made live. They can than have packets queued upon them and ACKs generated. If insufficient resources exist, DATA packet #1 is given a BUSY reply and other DATA packets are discarded). (6) sk_buffs no longer take a ref on their parent call. To make this work, the following changes are made: (1) Each call's receive buffer is now a circular buffer of sk_buff pointers (rxtx_buffer) rather than a number of sk_buff_heads spread between the call and the socket. This permits each sk_buff to be in the buffer multiple times. The receive buffer is reused for the transmit buffer. (2) A circular buffer of annotations (rxtx_annotations) is kept parallel to the data buffer. Transmission phase annotations indicate whether a buffered packet has been ACK'd or not and whether it needs retransmission. Receive phase annotations indicate whether a slot holds a whole packet or a jumbo subpacket and, if the latter, which subpacket. They also note whether the packet has been decrypted in place. (3) DATA packet window tracking is much simplified. Each phase has just two numbers representing the window (rx_hard_ack/rx_top and tx_hard_ack/tx_top). The hard_ack number is the sequence number before base of the window, representing the last packet the other side says it has consumed. hard_ack starts from 0 and the first packet is sequence number 1. The top number is the sequence number of the highest-numbered packet residing in the buffer. Packets between hard_ack+1 and top are soft-ACK'd to indicate they've been received, but not yet consumed. Four macros, before(), before_eq(), after() and after_eq() are added to compare sequence numbers within the window. This allows for the top of the window to wrap when the hard-ack sequence number gets close to the limit. Two flags, RXRPC_CALL_RX_LAST and RXRPC_CALL_TX_LAST, are added also to indicate when rx_top and tx_top point at the packets with the LAST_PACKET bit set, indicating the end of the phase. (4) Calls are queued on the socket 'receive queue' rather than packets. This means that we don't need have to invent dummy packets to queue to indicate abnormal/terminal states and we don't have to keep metadata packets (such as ABORTs) around (5) The offset and length of a (sub)packet's content are now passed to the verify_packet security op. This is currently expected to decrypt the packet in place and validate it. However, there's now nowhere to store the revised offset and length of the actual data within the decrypted blob (there may be a header and padding to skip) because an sk_buff may represent multiple packets, so a locate_data security op is added to retrieve these details from the sk_buff content when needed. (6) recvmsg() now has to handle jumbo subpackets, where each subpacket is individually secured and needs to be individually decrypted. The code to do this is broken out into rxrpc_recvmsg_data() and shared with the kernel API. It now iterates over the call's receive buffer rather than walking the socket receive queue. Additional changes: (1) The timers are condensed to a single timer that is set for the soonest of three timeouts (delayed ACK generation, DATA retransmission and call lifespan). (2) Transmission of ACK and ABORT packets is effected immediately from process-context socket ops/kernel API calls that cause them instead of them being punted off to a background work item. The data_ready handler still has to defer to the background, though. (3) A shutdown op is added to the AF_RXRPC socket so that the AFS filesystem can shut down the socket and flush its own work items before closing the socket to deal with any in-progress service calls. Future additional changes that will need to be considered: (1) Make sure that a call doesn't hog the front of the queue by receiving data from the network as fast as userspace is consuming it to the exclusion of other calls. (2) Transmit delayed ACKs from within recvmsg() when we've consumed sufficiently more packets to avoid the background work item needing to run. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-08rxrpc: Preallocate peers, conns and calls for incoming service requestsDavid Howells8-10/+315
Make it possible for the data_ready handler called from the UDP transport socket to completely instantiate an rxrpc_call structure and make it immediately live by preallocating all the memory it might need. The idea is to cut out the background thread usage as much as possible. [Note that the preallocated structs are not actually used in this patch - that will be done in a future patch.] If insufficient resources are available in the preallocation buffers, it will be possible to discard the DATA packet in the data_ready handler or schedule a BUSY packet without the need to schedule an attempt at allocation in a background thread. To this end: (1) Preallocate rxrpc_peer, rxrpc_connection and rxrpc_call structs to a maximum number each of the listen backlog size. The backlog size is limited to a maxmimum of 32. Only this many of each can be in the preallocation buffer. (2) For userspace sockets, the preallocation is charged initially by listen() and will be recharged by accepting or rejecting pending new incoming calls. (3) For kernel services {,re,dis}charging of the preallocation buffers is handled manually. Two notifier callbacks have to be provided before kernel_listen() is invoked: (a) An indication that a new call has been instantiated. This can be used to trigger background recharging. (b) An indication that a call is being discarded. This is used when the socket is being released. A function, rxrpc_kernel_charge_accept() is called by the kernel service to preallocate a single call. It should be passed the user ID to be used for that call and a callback to associate the rxrpc call with the kernel service's side of the ID. (4) Discard the preallocation when the socket is closed. (5) Temporarily bump the refcount on the call allocated in rxrpc_incoming_call() so that rxrpc_release_call() can ditch the preallocation ref on service calls unconditionally. This will no longer be necessary once the preallocation is used. Note that this does not yet control the number of active service calls on a client - that will come in a later patch. A future development would be to provide a setsockopt() call that allows a userspace server to manually charge the preallocation buffer. This would allow user call IDs to be provided in advance and the awkward manual accept stage to be bypassed. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-08rxrpc: Add tracepoints to record received packets and end of data_readyDavid Howells1-2/+6
Add two tracepoints: (1) Record the RxRPC protocol header of packets retrieved from the UDP socket by the data_ready handler. (2) Record the outcome of the data_ready handler. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-08rxrpc: Remove skb_count from struct rxrpc_callDavid Howells2-23/+12
Remove the sk_buff count from the rxrpc_call struct as it's less useful once we stop queueing sk_buffs. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-08rxrpc: Convert rxrpc_local::services to an hlistDavid Howells5-11/+11
Convert the rxrpc_local::services list to an hlist so that it can be accessed under RCU conditions more readily. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-08rxrpc: Fix ASSERTCMP and ASSERTIFCMP to handle signed valuesDavid Howells1-11/+13
Fix ASSERTCMP and ASSERTIFCMP to be able to handle signed values by casting both parameters to the type of the first before comparing. Without this, both values are cast to unsigned long, which means that checks for values less than zero don't work. The downside of this is that the state enum values in struct rxrpc_call and struct rxrpc_connection can't be bitfields as __typeof__ can't handle them. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-07rxrpc: Add tracepoint for working out where aborts happenDavid Howells8-90/+91
Add a tracepoint for working out where local aborts happen. Each tracepoint call is labelled with a 3-letter code so that they can be distinguished - and the DATA sequence number is added too where available. rxrpc_kernel_abort_call() also takes a 3-letter code so that AFS can indicate the circumstances when it aborts a call. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-07rxrpc: Fix returns of call completion helpersDavid Howells1-5/+8
rxrpc_set_call_completion() returns bool, not int, so the ret variable should match this. rxrpc_call_completed() and __rxrpc_call_completed() should return the value of rxrpc_set_call_completion(). Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-07rxrpc: Calls shouldn't hold socket refsDavid Howells11-279/+303
rxrpc calls shouldn't hold refs on the sock struct. This was done so that the socket wouldn't go away whilst the call was in progress, such that the call could reach the socket's queues. However, we can mark the socket as requiring an RCU release and rely on the RCU read lock. To make this work, we do: (1) rxrpc_release_call() removes the call's call user ID. This is now only called from socket operations and not from the call processor: rxrpc_accept_call() / rxrpc_kernel_accept_call() rxrpc_reject_call() / rxrpc_kernel_reject_call() rxrpc_kernel_end_call() rxrpc_release_calls_on_socket() rxrpc_recvmsg() Though it is also called in the cleanup path of rxrpc_accept_incoming_call() before we assign a user ID. (2) Pass the socket pointer into rxrpc_release_call() rather than getting it from the call so that we can get rid of uninitialised calls. (3) Fix call processor queueing to pass a ref to the work queue and to release that ref at the end of the processor function (or to pass it back to the work queue if we have to requeue). (4) Skip out of the call processor function asap if the call is complete and don't requeue it if the call is complete. (5) Clean up the call immediately that the refcount reaches 0 rather than trying to defer it. Actual deallocation is deferred to RCU, however. (6) Don't hold socket refs for allocated calls. (7) Use the RCU read lock when queueing a message on a socket and treat the call's socket pointer according to RCU rules and check it for NULL. We also need to use the RCU read lock when viewing a call through procfs. (8) Transmit the final ACK/ABORT to a client call in rxrpc_release_call() if this hasn't been done yet so that we can then disconnect the call. Once the call is disconnected, it won't have any access to the connection struct and the UDP socket for the call work processor to be able to send the ACK. Terminal retransmission will be handled by the connection processor. (9) Release all calls immediately on the closing of a socket rather than trying to defer this. Incomplete calls will be aborted. The call refcount model is much simplified. Refs are held on the call by: (1) A socket's user ID tree. (2) A socket's incoming call secureq and acceptq. (3) A kernel service that has a call in progress. (4) A queued call work processor. We have to take care to put any call that we failed to queue. (5) sk_buffs on a socket's receive queue. A future patch will get rid of this. Whilst we're at it, we can do: (1) Get rid of the RXRPC_CALL_EV_RELEASE event. Release is now done entirely from the socket routines and never from the call's processor. (2) Get rid of the RXRPC_CALL_DEAD state. Calls now end in the RXRPC_CALL_COMPLETE state. (3) Get rid of the rxrpc_call::destroyer work item. Calls are now torn down when their refcount reaches 0 and then handed over to RCU for final cleanup. (4) Get rid of the rxrpc_call::deadspan timer. Calls are cleaned up immediately they're finished with and don't hang around. Post-completion retransmission is handled by the connection processor once the call is disconnected. (5) Get rid of the dead call expiry setting as there's no longer a timer to set. (6) rxrpc_destroy_all_calls() can just check that the call list is empty. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-07rxrpc: Use rxrpc_is_service_call() rather than rxrpc_conn_is_service()David Howells1-2/+2
Use rxrpc_is_service_call() rather than rxrpc_conn_is_service() if the call is available just in case call->conn is NULL. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-07rxrpc: Pass the connection pointer to rxrpc_post_packet_to_call()David Howells1-3/+4
Pass the connection pointer to rxrpc_post_packet_to_call() as the call might get disconnected whilst we're looking at it, but the connection pointer determined by rxrpc_data_read() is guaranteed by RCU for the duration of the call. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-07rxrpc: Cache the security index in the rxrpc_call structDavid Howells5-2/+7
Cache the security index in the rxrpc_call struct so that we can get at it even when the call has been disconnected and the connection pointer cleared. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-07rxrpc: Use call->peer rather than call->conn->params.peerDavid Howells1-3/+5
Use call->peer rather than call->conn->params.peer to avoid the possibility of call->conn being NULL and, whilst we're at it, check it for NULL before we access it. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-07rxrpc: Improve the call tracking tracepointDavid Howells9-43/+76
Improve the call tracking tracepoint by showing more differentiation between some of the put and get events, including: (1) Getting and putting refs for the socket call user ID tree. (2) Getting and putting refs for queueing and failing to queue the call processor work item. Note that these aren't necessarily used in this patch, but will be taken advantage of in future patches. An enum is added for the event subtype numbers rather than coding them directly as decimal numbers and a table of 3-letter strings is provided rather than a sequence of ?: operators. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-07rxrpc: Delete unused rxrpc_kernel_free_skb()David Howells1-13/+0
Delete rxrpc_kernel_free_skb() as it's unused. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-07rxrpc: Whitespace cleanupDavid Howells1-2/+1
Remove some whitespace. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-04rxrpc Move enum rxrpc_command to sendmsg.cDavid Howells2-7/+7
Move enum rxrpc_command to sendmsg.c as it's now only used in that file. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-04rxrpc: Rearrange net/rxrpc/sendmsg.cDavid Howells1-281/+277
Rearrange net/rxrpc/sendmsg.c to be in a more logical order. This makes it easier to follow and eliminates forward declarations. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-04rxrpc: Split sendmsg from packet transmission codeDavid Howells5-633/+657
Split the sendmsg code from the packet transmission code (mostly to be found in output.c). Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-04rxrpc: Don't change the epochDavid Howells1-24/+8
It seems the local epoch should only be changed on boot, so remove the code that changes it for client connections. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-04rxrpc: Randomise epoch and starting client conn ID valuesDavid Howells1-1/+8
Create a random epoch value rather than a time-based one on startup and set the top bit to indicate that this is the case. Also create a random starting client connection ID value. This will be incremented from here as new client connections are created. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-04rxrpc: The client call state must be changed before attachment to connDavid Howells2-2/+4
We must set the client call state to RXRPC_CALL_CLIENT_SEND_REQUEST before attaching the call to the connection struct, not after, as it's liable to receive errors and conn aborts as soon as the assignment is made - and these will cause its state to be changed outside of the initiating thread's control. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-03rxrpc: Fix uninitialised variable warningDavid Howells1-3/+2
Fix the following uninitialised variable warning: ../net/rxrpc/call_event.c: In function 'rxrpc_process_call': ../net/rxrpc/call_event.c:879:58: warning: 'error' may be used uninitialized in this function [-Wmaybe-uninitialized] _debug("post net error %d", error); ^ Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-03rxrpc: fix undefined behavior in rxrpc_mark_call_releasedArnd Bergmann1-1/+1
gcc -Wmaybe-initialized correctly points out a newly introduced bug through which we can end up calling rxrpc_queue_call() for a dead connection: net/rxrpc/call_object.c: In function 'rxrpc_mark_call_released': net/rxrpc/call_object.c:600:5: error: 'sched' may be used uninitialized in this function [-Werror=maybe-uninitialized] This sets the 'sched' variable to zero to restore the previous behavior. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: f5c17aaeb2ae ("rxrpc: Calls should only have one terminal state") Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-02rxrpc: Don't expose skbs to in-kernel users [ver #2]David Howells9-61/+214
Don't expose skbs to in-kernel users, such as the AFS filesystem, but instead provide a notification hook the indicates that a call needs attention and another that indicates that there's a new call to be collected. This makes the following possibilities more achievable: (1) Call refcounting can be made simpler if skbs don't hold refs to calls. (2) skbs referring to non-data events will be able to be freed much sooner rather than being queued for AFS to pick up as rxrpc_kernel_recv_data will be able to consult the call state. (3) We can shortcut the receive phase when a call is remotely aborted because we don't have to go through all the packets to get to the one cancelling the operation. (4) It makes it easier to do encryption/decryption directly between AFS's buffers and sk_buffs. (5) Encryption/decryption can more easily be done in the AFS's thread contexts - usually that of the userspace process that issued a syscall - rather than in one of rxrpc's background threads on a workqueue. (6) AFS will be able to wait synchronously on a call inside AF_RXRPC. To make this work, the following interface function has been added: int rxrpc_kernel_recv_data( struct socket *sock, struct rxrpc_call *call, void *buffer, size_t bufsize, size_t *_offset, bool want_more, u32 *_abort_code); This is the recvmsg equivalent. It allows the caller to find out about the state of a specific call and to transfer received data into a buffer piecemeal. afs_extract_data() and rxrpc_kernel_recv_data() now do all the extraction logic between them. They don't wait synchronously yet because the socket lock needs to be dealt with. Five interface functions have been removed: rxrpc_kernel_is_data_last() rxrpc_kernel_get_abort_code() rxrpc_kernel_get_error_number() rxrpc_kernel_free_skb() rxrpc_kernel_data_consumed() As a temporary hack, sk_buffs going to an in-kernel call are queued on the rxrpc_call struct (->knlrecv_queue) rather than being handed over to the in-kernel user. To process the queue internally, a temporary function, temp_deliver_data() has been added. This will be replaced with common code between the rxrpc_recvmsg() path and the kernel_rxrpc_recv_data() path in a future patch. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-30rxrpc: Pass struct socket * to more rxrpc kernel interface functionsDavid Howells2-11/+14
Pass struct socket * to more rxrpc kernel interface functions. They should be starting from this rather than the socket pointer in the rxrpc_call struct if they need to access the socket. I have left: rxrpc_kernel_is_data_last() rxrpc_kernel_get_abort_code() rxrpc_kernel_get_error_number() rxrpc_kernel_free_skb() rxrpc_kernel_data_consumed() unmodified as they're all about to be removed (and, in any case, don't touch the socket). Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-30rxrpc: Use call->peer rather than going to the connectionDavid Howells1-5/+5
Use call->peer rather than call->conn->params.peer as call->conn may become NULL. Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-30rxrpc: Provide a way for AFS to ask for the peer address of a callDavid Howells1-0/+15
Provide a function so that kernel users, such as AFS, can ask for the peer address of a call: void rxrpc_kernel_get_peer(struct rxrpc_call *call, struct sockaddr_rxrpc *_srx); In the future the kernel service won't get sk_buffs to look inside. Further, this allows us to hide any canonicalisation inside AF_RXRPC for when IPv6 support is added. Also propagate this through to afs_find_server() and issue a warning if we can't handle the address family yet. Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-30rxrpc: Trace rxrpc_call usageDavid Howells11-34/+104
Add a trace event for debuging rxrpc_call struct usage. Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-30rxrpc: Calls should only have one terminal stateDavid Howells12-184/+226
Condense the terminal states of a call state machine to a single state, plus a separate completion type value. The value is then set, along with error and abort code values, only when the call is transitioned to the completion state. Helpers are provided to simplify this. Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-30rxrpc: Fix a potential NULL-pointer deref in rxrpc_abort_callsDavid Howells1-11/+15
The call pointer in a channel on a connection will be NULL if there's no active call on that channel. rxrpc_abort_calls() needs to check for this before trying to take the call's state_lock. Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-24rxrpc: Improve management and caching of client connection objectsDavid Howells8-157/+933
Improve the management and caching of client rxrpc connection objects. From this point, client connections will be managed separately from service connections because AF_RXRPC controls the creation and re-use of client connections but doesn't have that luxury with service connections. Further, there will be limits on the numbers of client connections that may be live on a machine. No direct restriction will be placed on the number of client calls, excepting that each client connection can support a maximum of four concurrent calls. Note that, for a number of reasons, we don't want to simply discard a client connection as soon as the last call is apparently finished: (1) Security is negotiated per-connection and the context is then shared between all calls on that connection. The context can be negotiated again if the connection lapses, but that involves holding up calls whilst at least two packets are exchanged and various crypto bits are performed - so we'd ideally like to cache it for a little while at least. (2) If a packet goes astray, we will need to retransmit a final ACK or ABORT packet. To make this work, we need to keep around the connection details for a little while. (3) The locally held structures represent some amount of setup time, to be weighed against their occupation of memory when idle. To this end, the client connection cache is managed by a state machine on each connection. There are five states: (1) INACTIVE - The connection is not held in any list and may not have been exposed to the world. If it has been previously exposed, it was discarded from the idle list after expiring. (2) WAITING - The connection is waiting for the number of client conns to drop below the maximum capacity. Calls may be in progress upon it from when it was active and got culled. The connection is on the rxrpc_waiting_client_conns list which is kept in to-be-granted order. Culled conns with waiters go to the back of the queue just like new conns. (3) ACTIVE - The connection has at least one call in progress upon it, it may freely grant available channels to new calls and calls may be waiting on it for channels to become available. The connection is on the rxrpc_active_client_conns list which is kept in activation order for culling purposes. (4) CULLED - The connection got summarily culled to try and free up capacity. Calls currently in progress on the connection are allowed to continue, but new calls will have to wait. There can be no waiters in this state - the conn would have to go to the WAITING state instead. (5) IDLE - The connection has no calls in progress upon it and must have been exposed to the world (ie. the EXPOSED flag must be set). When it expires, the EXPOSED flag is cleared and the connection transitions to the INACTIVE state. The connection is on the rxrpc_idle_client_conns list which is kept in order of how soon they'll expire. A connection in the ACTIVE or CULLED state must have at least one active call upon it; if in the WAITING state it may have active calls upon it; other states may not have active calls. As long as a connection remains active and doesn't get culled, it may continue to process calls - even if there are connections on the wait queue. This simplifies things a bit and reduces the amount of checking we need do. There are a couple flags of relevance to the cache: (1) EXPOSED - The connection ID got exposed to the world. If this flag is set, an extra ref is added to the connection preventing it from being reaped when it has no calls outstanding. This flag is cleared and the ref dropped when a conn is discarded from the idle list. (2) DONT_REUSE - The connection should be discarded as soon as possible and should not be reused. This commit also provides a number of new settings: (*) /proc/net/rxrpc/max_client_conns The maximum number of live client connections. Above this number, new connections get added to the wait list and must wait for an active conn to be culled. Culled connections can be reused, but they will go to the back of the wait list and have to wait. (*) /proc/net/rxrpc/reap_client_conns If the number of desired connections exceeds the maximum above, the active connection list will be culled until there are only this many left in it. (*) /proc/net/rxrpc/idle_conn_expiry The normal expiry time for a client connection, provided there are fewer than reap_client_conns of them around. (*) /proc/net/rxrpc/idle_conn_fast_expiry The expedited expiry time, used when there are more than reap_client_conns of them around. Note that I combined the Tx wait queue with the channel grant wait queue to save space as only one of these should be in use at once. Note also that, for the moment, the service connection cache still uses the old connection management code. Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-24rxrpc: Dup the main conn list for the proc interfaceDavid Howells5-4/+11
The main connection list is used for two independent purposes: primarily it is used to find connections to reap and secondarily it is used to list connections in procfs. Split the procfs list out from the reap list. This allows us to stop using the reap list for client connections when they acquire a separate management strategy from service collections. The client connections will not be on a management single list, and sometimes won't be on a management list at all. This doesn't leave them floating, however, as they will also be on an rb-tree rooted on the socket so that the socket can find them to dispatch calls. Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-24rxrpc: Make /proc/net/rxrpc_calls saferDavid Howells4-9/+26
Make /proc/net/rxrpc_calls safer by stashing a copy of the peer pointer in the rxrpc_call struct and checking in the show routine that the peer pointer, the socket pointer and the local pointer obtained from the socket pointer aren't NULL before we use them. Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-24rxrpc: Fix conn-based retransmitDavid Howells2-1/+2
If a duplicate packet comes in for a call that has just completed on a connection's channel then there will be an oops in the data_ready handler because it tries to examine the connection struct via a call struct (which we don't have - the pointer is unset). Since the connection struct pointer is available to us, go direct instead. Also, the ACK packet to be retransmitted needs three octets of padding between the soft ack list and the ackinfo. Fixes: 18bfeba50dfd0c8ee420396f2570f16a0bdbd7de ("rxrpc: Perform terminal call ACK/ABORT retransmission from conn processor") Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-23rxrpc: Perform terminal call ACK/ABORT retransmission from conn processorDavid Howells4-4/+157
Perform terminal call ACK/ABORT retransmission in the connection processor rather than in the call processor. With this change, once last_call is set, no more incoming packets will be routed to the corresponding call or any earlier calls on that channel (call IDs must only increase on a channel on a connection). Further, if a packet's callNumber is before the last_call ID or a packet is aimed at successfully completed service call then that packet is discarded and ignored. Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-23rxrpc: Calculate serial skew on packet receptionDavid Howells5-31/+43
Calculate the serial number skew in the data_ready handler when a packet has been received and a connection looked up. The skew is cached in the sk_buff's priority field. The connection highest received serial number is updated at this time also. This can be done without locks or atomic instructions because, at this point, the code is serialised by the socket. This generates more accurate skew data because if the packet is offloaded to a work queue before this is determined, more packets may come in, bumping the highest serial number and thereby increasing the apparent skew. This also removes some unnecessary atomic ops. Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-23rxrpc: Set connection expiry on idle, not putDavid Howells2-27/+26
Set the connection expiry time when a connection becomes idle rather than doing this in rxrpc_put_connection(). This makes the put path more efficient (it is likely to be called occasionally whilst a connection has outstanding calls because active workqueue items needs to be given a ref). The time is also preset in the connection allocator in case the connection never gets used. Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-23rxrpc: Use a tracepoint for skb accounting debuggingDavid Howells9-38/+79
Use a tracepoint to log various skb accounting points to help in debugging refcounting errors. Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-23rxrpc: Drop channel number field from rxrpc_call structDavid Howells5-10/+8
Drop the channel number (channel) field from the rxrpc_call struct to reduce the size of the call struct. The field is redundant: if the call is attached to a connection, the channel can be obtained from there by AND'ing with RXRPC_CHANNELMASK. Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-23rxrpc: When clearing a socket, clear the call sets in the right orderDavid Howells1-6/+6
When clearing a socket, we should clear the securing-in-progress list first, then the accept queue and last the main call tree because that's the order in which a call progresses. Not that a call should move from the accept queue to the main tree whilst we're shutting down a socket, but it a call could possibly move from sequreq to acceptq whilst we're clearing up. Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-23rxrpc: Tidy up the rxrpc_call struct a bitDavid Howells5-20/+21
Do a little tidying of the rxrpc_call struct: (1) in_clientflag is no longer compared against the value that's in the packet, so keeping it in this form isn't necessary. Use a flag in flags instead and provide a pair of wrapper functions. (2) We don't read the epoch value, so that can go. (3) Move what remains of the data that were used for hashing up in the struct to be with the channel number. (4) Get rid of the local pointer. We can get at this via the socket struct and we only use this in the procfs viewer. Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-23rxrpc: Remove RXRPC_CALL_PROC_BUSYDavid Howells2-7/+0
Remove RXRPC_CALL_PROC_BUSY as work queue items are now 100% non-reentrant. Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-09rxrpc: Free packets discarded in data_readyDavid Howells1-0/+2
Under certain conditions, the data_ready handler will discard a packet. These need to be freed. Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-09rxrpc: Fix a use-after-push in data_ready handlerDavid Howells1-6/+11
Fix a use of a packet after it has been enqueued onto the packet processing queue in the data_ready handler. Once on a call's Rx queue, we mustn't touch it any more as it may be dequeued and freed by the call processor running on a work queue. Save the values we need before enqueuing. Without this, we can get an oops like the following: BUG: unable to handle kernel NULL pointer dereference at 000000000000009c IP: [<ffffffffa01854e8>] rxrpc_fast_process_packet+0x724/0xa11 [af_rxrpc] PGD 0 Oops: 0000 [#1] SMP Modules linked in: kafs(E) af_rxrpc(E) [last unloaded: af_rxrpc] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G E 4.7.0-fsdevel+ #1336 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 task: ffff88040d6863c0 task.stack: ffff88040d68c000 RIP: 0010:[<ffffffffa01854e8>] [<ffffffffa01854e8>] rxrpc_fast_process_packet+0x724/0xa11 [af_rxrpc] RSP: 0018:ffff88041fb03a78 EFLAGS: 00010246 RAX: ffffffffffffffff RBX: ffff8803ff195b00 RCX: 0000000000000001 RDX: ffffffffa01854d1 RSI: 0000000000000008 RDI: ffff8803ff195b00 RBP: ffff88041fb03ab0 R08: 0000000000000000 R09: 0000000000000001 R10: ffff88041fb038c8 R11: 0000000000000000 R12: ffff880406874800 R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88041fb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000009c CR3: 0000000001c14000 CR4: 00000000001406e0 Stack: ffff8803ff195ea0 ffff880408348800 ffff880406874800 ffff8803ff195b00 ffff880408348800 ffff8803ff195ed8 0000000000000000 ffff88041fb03af0 ffffffffa0186072 0000000000000000 ffff8804054da000 0000000000000000 Call Trace: <IRQ> [<ffffffffa0186072>] rxrpc_data_ready+0x89d/0xbae [af_rxrpc] [<ffffffff814c94d7>] __sock_queue_rcv_skb+0x24c/0x2b2 [<ffffffff8155c59a>] __udp_queue_rcv_skb+0x4b/0x1bd [<ffffffff8155e048>] udp_queue_rcv_skb+0x281/0x4db [<ffffffff8155ea8f>] __udp4_lib_rcv+0x7ed/0x963 [<ffffffff8155ef9a>] udp_rcv+0x15/0x17 [<ffffffff81531d86>] ip_local_deliver_finish+0x1c3/0x318 [<ffffffff81532544>] ip_local_deliver+0xbb/0xc4 [<ffffffff81531bc3>] ? inet_del_offload+0x40/0x40 [<ffffffff815322a9>] ip_rcv_finish+0x3ce/0x42c [<ffffffff81532851>] ip_rcv+0x304/0x33d [<ffffffff81531edb>] ? ip_local_deliver_finish+0x318/0x318 [<ffffffff814dff9d>] __netif_receive_skb_core+0x601/0x6e8 [<ffffffff814e072e>] __netif_receive_skb+0x13/0x54 [<ffffffff814e082a>] netif_receive_skb_internal+0xbb/0x17c [<ffffffff814e1838>] napi_gro_receive+0xf9/0x1bd [<ffffffff8144eb9f>] rtl8169_poll+0x32b/0x4a8 [<ffffffff814e1c7b>] net_rx_action+0xe8/0x357 [<ffffffff81051074>] __do_softirq+0x1aa/0x414 [<ffffffff810514ab>] irq_exit+0x3d/0xb0 [<ffffffff810184a2>] do_IRQ+0xe4/0xfc [<ffffffff81612053>] common_interrupt+0x93/0x93 <EOI> [<ffffffff814af837>] ? cpuidle_enter_state+0x1ad/0x2be [<ffffffff814af832>] ? cpuidle_enter_state+0x1a8/0x2be [<ffffffff814af96a>] cpuidle_enter+0x12/0x14 [<ffffffff8108956f>] call_cpuidle+0x39/0x3b [<ffffffff81089855>] cpu_startup_entry+0x230/0x35d [<ffffffff810312ea>] start_secondary+0xf4/0xf7 Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-09rxrpc: Once packet posted in data_ready, don't retry postingDavid Howells1-5/+3
Once a packet has been posted to a connection in the data_ready handler, we mustn't try reposting if we then find that the connection is dying as the refcount has been given over to the dying connection and the packet might no longer exist. Losing the packet isn't a problem as the peer will retransmit. Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-09rxrpc: Don't access connection from call if pointer is NULLDavid Howells1-0/+4
The call state machine processor sets up the message parameters for a UDP message that it might need to transmit in advance on the basis that there's a very good chance it's going to have to transmit either an ACK or an ABORT. This requires it to look in the connection struct to retrieve some of the parameters. However, if the call is complete, the call connection pointer may be NULL to dissuade the processor from transmitting a message. However, there are some situations where the processor is still going to be called - and it's still going to set up message parameters whether it needs them or not. This results in a NULL pointer dereference at: net/rxrpc/call_event.c:837 To fix this, skip the message pre-initialisation if there's no connection attached. Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-09rxrpc: Need to flag call as being released on connect failureDavid Howells1-0/+2
If rxrpc_new_client_call() fails to make a connection, the call record that it allocated needs to be marked as RXRPC_CALL_RELEASED before it is passed to rxrpc_put_call() to indicate that it no longer has any attachment to the AF_RXRPC socket. Without this, an assertion failure may occur at: net/rxrpc/call_object:635 Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-09rxrpc: fix uninitialized pointer dereference in debug codeArnd Bergmann1-0/+1
A newly added bugfix caused an uninitialized variable to be used for printing debug output. This is harmless as long as the debug setting is disabled, but otherwise leads to an immediate crash. gcc warns about this when -Wmaybe-uninitialized is enabled: net/rxrpc/call_object.c: In function 'rxrpc_release_call': net/rxrpc/call_object.c:496:163: error: 'sp' may be used uninitialized in this function [-Werror=maybe-uninitialized] The initialization was removed but one of the users remains. This adds back the initialization. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 372ee16386bb ("rxrpc: Fix races between skb free, ACK generation and replying") Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-06rxrpc: Fix races between skb free, ACK generation and replyingDavid Howells7-46/+45
Inside the kafs filesystem it is possible to occasionally have a call processed and terminated before we've had a chance to check whether we need to clean up the rx queue for that call because afs_send_simple_reply() ends the call when it is done, but this is done in a workqueue item that might happen to run to completion before afs_deliver_to_call() completes. Further, it is possible for rxrpc_kernel_send_data() to be called to send a reply before the last request-phase data skb is released. The rxrpc skb destructor is where the ACK processing is done and the call state is advanced upon release of the last skb. ACK generation is also deferred to a work item because it's possible that the skb destructor is not called in a context where kernel_sendmsg() can be invoked. To this end, the following changes are made: (1) kernel_rxrpc_data_consumed() is added. This should be called whenever an skb is emptied so as to crank the ACK and call states. This does not release the skb, however. kernel_rxrpc_free_skb() must now be called to achieve that. These together replace rxrpc_kernel_data_delivered(). (2) kernel_rxrpc_data_consumed() is wrapped by afs_data_consumed(). This makes afs_deliver_to_call() easier to work as the skb can simply be discarded unconditionally here without trying to work out what the return value of the ->deliver() function means. The ->deliver() functions can, via afs_data_complete(), afs_transfer_reply() and afs_extract_data() mark that an skb has been consumed (thereby cranking the state) without the need to conditionally free the skb to make sure the state is correct on an incoming call for when the call processor tries to send the reply. (3) rxrpc_recvmsg() now has to call kernel_rxrpc_data_consumed() when it has finished with a packet and MSG_PEEK isn't set. (4) rxrpc_packet_destructor() no longer calls rxrpc_hard_ACK_data(). Because of this, we no longer need to clear the destructor and put the call before we free the skb in cases where we don't want the ACK/call state to be cranked. (5) The ->deliver() call-type callbacks are made to return -EAGAIN rather than 0 if they expect more data (afs_extract_data() returns -EAGAIN to the delivery function already), and the caller is now responsible for producing an abort if that was the last packet. (6) There are many bits of unmarshalling code where: ret = afs_extract_data(call, skb, last, ...); switch (ret) { case 0: break; case -EAGAIN: return 0; default: return ret; } is to be found. As -EAGAIN can now be passed back to the caller, we now just return if ret < 0: ret = afs_extract_data(call, skb, last, ...); if (ret < 0) return ret; (7) Checks for trailing data and empty final data packets has been consolidated as afs_data_complete(). So: if (skb->len > 0) return -EBADMSG; if (!last) return 0; becomes: ret = afs_data_complete(call, skb, last); if (ret < 0) return ret; (8) afs_transfer_reply() now checks the amount of data it has against the amount of data desired and the amount of data in the skb and returns an error to induce an abort if we don't get exactly what we want. Without these changes, the following oops can occasionally be observed, particularly if some printks are inserted into the delivery path: general protection fault: 0000 [#1] SMP Modules linked in: kafs(E) af_rxrpc(E) [last unloaded: af_rxrpc] CPU: 0 PID: 1305 Comm: kworker/u8:3 Tainted: G E 4.7.0-fsdevel+ #1303 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 Workqueue: kafsd afs_async_workfn [kafs] task: ffff88040be041c0 ti: ffff88040c070000 task.ti: ffff88040c070000 RIP: 0010:[<ffffffff8108fd3c>] [<ffffffff8108fd3c>] __lock_acquire+0xcf/0x15a1 RSP: 0018:ffff88040c073bc0 EFLAGS: 00010002 RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000000 RCX: ffff88040d29a710 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88040d29a710 RBP: ffff88040c073c70 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: ffff88040be041c0 R15: ffffffff814c928f FS: 0000000000000000(0000) GS:ffff88041fa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa4595f4750 CR3: 0000000001c14000 CR4: 00000000001406f0 Stack: 0000000000000006 000000000be04930 0000000000000000 ffff880400000000 ffff880400000000 ffffffff8108f847 ffff88040be041c0 ffffffff81050446 ffff8803fc08a920 ffff8803fc08a958 ffff88040be041c0 ffff88040c073c38 Call Trace: [<ffffffff8108f847>] ? mark_held_locks+0x5e/0x74 [<ffffffff81050446>] ? __local_bh_enable_ip+0x9b/0xa1 [<ffffffff8108f9ca>] ? trace_hardirqs_on_caller+0x16d/0x189 [<ffffffff810915f4>] lock_acquire+0x122/0x1b6 [<ffffffff810915f4>] ? lock_acquire+0x122/0x1b6 [<ffffffff814c928f>] ? skb_dequeue+0x18/0x61 [<ffffffff81609dbf>] _raw_spin_lock_irqsave+0x35/0x49 [<ffffffff814c928f>] ? skb_dequeue+0x18/0x61 [<ffffffff814c928f>] skb_dequeue+0x18/0x61 [<ffffffffa009aa92>] afs_deliver_to_call+0x344/0x39d [kafs] [<ffffffffa009ab37>] afs_process_async_call+0x4c/0xd5 [kafs] [<ffffffffa0099e9c>] afs_async_workfn+0xe/0x10 [kafs] [<ffffffff81063a3a>] process_one_work+0x29d/0x57c [<ffffffff81064ac2>] worker_thread+0x24a/0x385 [<ffffffff81064878>] ? rescuer_thread+0x2d0/0x2d0 [<ffffffff810696f5>] kthread+0xf3/0xfb [<ffffffff8160a6ff>] ret_from_fork+0x1f/0x40 [<ffffffff81069602>] ? kthread_create_on_node+0x1cf/0x1cf Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>