path: root/drivers/block/rbd.c
Age | Commit message | Author | Files | Lines
2013-05-02 | rbd: make rbd spec names pointer to const | Alex Elder | 1 | -7/+9
Make the names and image id in an rbd_spec be pointers to constant data. This required the use of a local variable to hold the snapshot name in rbd_add_parse_args() to avoid a warning. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: set snapshot id in rbd_dev_probe_update_spec() | Alex Elder | 1 | -4/+22
Set the rbd spec's snapshot id for an image getting mapped in rbd_dev_probe_update_spec() rather than rbd_dev_set_mapping(). This is the more logical place for that to happen (even though it means we might look up the snapshot by name twice). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: have snap_by_name() return a snapshot | Alex Elder | 1 | -20/+15
A function called snap_by_name() ought to just look up a snapshot by name. It does that, but then it assigns some stuff to the rbd device structure as well. Change the function to do just the lookup, and have the caller do the assignments that follow. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: fix image id leak in initial probe | Alex Elder | 1 | -5/+9
If a format 2 image id is found for an image being mapped, but the subsequent probe of the image fails, rbd_dev_probe() quits without freeing the image id. Fix that. Also drop a redundant hunk of code in rbd_dev_image_id(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: have rbd_dev_image_id() set format 1 image id | Alex Elder | 1 | -30/+32
Currently, rbd_dev_probe() assumes that any error returned by rbd_dev_image_id() is most likely -ENOENT, and responds by calling the format 1 probe routine, rbd_dev_v1_probe(). Then, at the top of rbd_dev_v1_probe(), an empty string is allocated for the image id. This is sort of unbalanced. Fix this by having rbd_dev_image_id() look for -ENOENT from its "get_id" method call. If that is seen, have it allocate the empty string there rather than depending on rbd_dev_v1_probe() to do it. Given that this is effectively defining the format of the image, set rbd_dev->image_format inside rbd_dev_image_id() rather than in the format-specific probe routines. Also drop a redundant hunk of code in rbd_dev_image_id(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: avoid dropping extra reference in rbd_free_disk() | Alex Elder | 1 | -3/+5
I found during some failure injection testing that the call to rbd_free_disk() in the error path of rbd_dev_probe_finish() was dropping an extra reference to the disk queue. The problem occurred when put_disk tried to drop a reference to the disk's queue. A call to blk_cleanup_queue() just prior to that will have also dropped a reference to the queue. The problem is that the reference dropped by put_disk() is assumed to have been taken by add_disk(). Our code has error paths that can occur after the disk and its queue are initialized, but before the call to add_disk(), and in those paths we won't have that extra reference. The fix is easy though. In rbd_free_disk() we're already checking the disk's GENHD_FL_UP flag. That flag is an indication that add_disk() has been called, so just call blk_cleanup_queue() conditional on that flag being set. This resolves: http://tracker.ceph.com/issues/4800 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: use rbd_obj_method_sync() return value | Alex Elder | 1 | -13/+12
Now that rbd_obj_method_sync() returns the number of bytes returned by the method call, that value should be used by callers to ensure we don't overrun the valid portion of the buffer. Fix the two spots that remained that weren't doing that, rbd_dev_image_name() and rbd_dev_v2_snap_name(). Rearrange the error path slightly in rbd_dev_v2_snap_name(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: fix leak of format 2 snapshot names | Alex Elder | 1 | -16/+14
When the snapshot context for an rbd device gets updated (or the initial one is recorded) a list of snapshot structures is created to represent them, one entry per snapshot. Each entry includes a dynamically-allocated copy of the snapshot name. Currently the name is allocated in rbd_snap_create(), as a duplicate of the passed-in name. For format 1 images, the snapshot name provided is just a pointer to an existing name. But for format 2 images, the passed-in name is already dynamically allocated, and in the process of duplicating it here we are leaking the passed-in name. Fix this by dynamically allocating the name for format 1 snapshots also, and then stop allocating a duplicate in rbd_snap_create(). Change rbd_dev_v1_snap_info() so none of its parameters is side-effected unless it's going to return success. This is part of: http://tracker.ceph.com/issues/4803 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: rename __rbd_add_snap_dev() | Alex Elder | 1 | -8/+11
Rename __rbd_add_snap_dev() to be rbd_snap_create(). We no longer have devices for non-mapped snapshots, and we're not actually "adding" it to the list in this function, just creating it. Rename rbd_remove_snap_dev() to be rbd_snap_destroy() for reasons similar to the above. Stop having this function delete the snapshot from its list (to be symmetrical with its create counterpart) and do that in the caller instead. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: only update values on snap_info success | Alex Elder | 1 | -5/+19
Change rbd_dev_v2_snap_info() so it only ever sets values of the size and features parameters if looking up the snapshot name was successful. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: make snap_size order parameter optional | Alex Elder | 1 | -3/+3
Only one of the two callers of _rbd_dev_v2_snap_size() needs the order value returned. So make that an optional argument--a null pointer if the caller doesn't need it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
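The optional out-parameter convention described above could look roughly like the following sketch. The function name snap_size_sketch() and the fixed values it reports are hypothetical stand-ins for illustration, not the driver's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a size query that can also report the
 * object order. Passing NULL for order means the caller does not
 * need it. The reported values are fixed purely for illustration.
 */
static int snap_size_sketch(uint8_t *order, uint64_t *snap_size)
{
	assert(snap_size != NULL);	/* the size is always required */

	if (order)			/* the order is optional */
		*order = 22;		/* illustrative value */
	*snap_size = UINT64_C(1) << 30;	/* illustrative value */

	return 0;
}
```

A caller that only wants the size passes NULL for the first argument and never has to declare a throwaway variable.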
2013-05-02 | rbd: fix leak of snapshots during initial probe | Alex Elder | 1 | -20/+30
When an rbd image is initially mapped, its snapshot context is collected, and then a list of snapshot entries representing the snapshots in that context is created. The list is created using rbd_dev_snaps_update(). (This function also supports updating an existing snapshot list based on a new snapshot context.) If an error occurs, updating the list is aborted, and the list is currently left as-is, in an inconsistent state. At that point, there may be a partially-constructed list, but the calling functions (rbd_dev_probe_finish() from rbd_dev_probe() from rbd_add()) never clean them up. So this constitutes a leak. A snapshot list that is inconsistent with the current snapshot context is of no use, and might even be actively bad. So rather than just having the caller clean it up, have rbd_dev_snaps_update() just clear out the entire snapshot list in the event an error occurs. The other place rbd_dev_snaps_update() is used is when a refresh is triggered, either because of a watch callback or via a write to the /sys/bus/rbd/devices/<id>/refresh interface. An error while updating the snapshots has no substantive effect in either of those cases, but one of them issues a warning. Move that warning to the common rbd_dev_refresh() function so it gets issued regardless of how it got initiated. This is part of: http://tracker.ceph.com/issues/4803 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: don't create sysfs entries for non-mapped snapshots | Alex Elder | 1 | -133/+4
When an rbd image gets mapped a device entry gets created for it under /sys/bus/rbd/devices/<id>/. Inside that directory there are sysfs files that contain information about the image: its size, feature bits, major device number, and so on. Additionally, if that image has any snapshots, a device entry gets created for each of those as a "child" of the mapped device. Each of these is a subdirectory of the mapped device, and each directory contains a few files with information about the snapshot (its snapshot id, size, and feature mask). There is no clear benefit to having those device entries for the snapshots. The information provided via sysfs is of little real value--and all of it is available via rbd CLI commands. If we still wanted to see the kernel's view of this information it could be done much more simply by including it in a single sysfs file for the mapped image. But there *is* a clear cost to supporting them. Every time a snapshot context changes, these entries need to be updated (deleted snapshots removed, new snapshots created). The rbd driver is notified of changes to the snapshot context via callbacks from an osd, and care must be taken to coordinate removal of snapshot data structures with the possibility of one of these notifications occurring. Things would be considerably simpler if we just didn't have to maintain device entries for the snapshots. So get rid of them. The ability to map a snapshot of an rbd image will remain; the only thing lost will be the ability to query these sysfs directories for information about snapshots of mapped images. This resolves: http://tracker.ceph.com/issues/4796 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: activate support for layered images | Alex Elder | 1 | -1/+3
Now that we have most everything in place to support layered rbd images, enable support for them in the kernel client. Issue a warning to the log that the support is considered experimental whenever a format 2 layered image is mapped. Note that we also have to claim to support the STRIPINGV2 feature, due to a mistake in the way the rbd CLI set up those flags. This feature can work if it has the right parameters, and safeguards have been put in place to reject those images that do not have compatible parameters. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: get and check striping parameters | Alex Elder | 1 | -0/+61
If an rbd format 2 image indicates it supports the STRIPINGV2 feature we need to find out its stripe unit and stripe count in order to know whether we can use it. We don't yet support fancy striping fully, but if the default parameters are used the behavior is indistinguishable from non-fancy striping. This is necessary because some images require the STRIPINGV2 feature even if they use the default parameters. (Which is to say the feature bit was erroneously set even if the feature was not used.) This resolves: http://tracker.ceph.com/issues/4709 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: have rbd_obj_method_sync() return transfer count | Alex Elder | 1 | -27/+33
Callers of rbd_obj_method_sync() don't know how many bytes of data got returned by the class method call. As a result, they have been assuming enough got returned to decode whatever was expected. This isn't safe. We know how many bytes got transferred, so have rbd_obj_method_sync() return that amount (rather than just 0) if the call is successful. Change all callers to use this return value to ensure decoding of the results is done safely. On the other hand, most callers of rbd_obj_method_sync() only indicate success or failure, so all of *their* callers can simply test for non-zero result. This resolves: http://tracker.ceph.com/issues/4773 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
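The idea of checking the transfer count before decoding can be sketched in plain userspace C as follows. Both functions are hypothetical stand-ins (method_sync_sketch() is not rbd_obj_method_sync()), and returning -ERANGE for a short reply is this sketch's choice rather than something the commit message specifies.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for a synchronous class method call: copies
 * up to buf_len bytes of a canned reply into buf and returns the
 * number of bytes transferred, or a negative errno on failure.
 */
static int method_sync_sketch(const void *reply, size_t reply_len,
			      void *buf, size_t buf_len)
{
	size_t n = reply_len < buf_len ? reply_len : buf_len;

	if (!reply || !buf)
		return -EINVAL;
	memcpy(buf, reply, n);

	return (int)n;
}

/* Decode a 64-bit little-endian value only if enough bytes arrived. */
static int decode_u64_sketch(const void *reply, size_t reply_len,
			     uint64_t *value)
{
	unsigned char buf[sizeof(uint64_t)];
	uint64_t v = 0;
	int ret;
	int i;

	assert(value != NULL);

	ret = method_sync_sketch(reply, reply_len, buf, sizeof(buf));
	if (ret < 0)
		return ret;
	if ((size_t)ret < sizeof(buf))
		return -ERANGE;		/* short reply: do not decode */

	/* Assemble the value starting from the most significant byte. */
	for (i = (int)sizeof(buf) - 1; i >= 0; i--)
		v = (v << 8) | buf[i];
	*value = v;

	return 0;
}
```

The decoder never touches bytes beyond what the call reported as transferred, which is the safety property the commit is after.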
2013-05-02 | rbd: void data pointers for rbd_obj_method_sync() | Alex Elder | 1 | -24/+20
Make the inbound and outbound data parameters have void rather than character type for rbd_obj_method_sync(). This makes it more clear they don't expect typed data, and eliminates the need for some silly type casts. One more unrelated change: define the features buffer used in _rbd_dev_v2_snap_features() to be a packed data structure. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: give rbd_obj_read_sync() buffer void type | Alex Elder | 1 | -3/+2
Make the buf parameter into which the data is to be read have type void pointer. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: enforce parent overlap | Alex Elder | 1 | -10/+54
A clone image has a defined overlap point with its parent image. That is the byte offset beyond which the parent image has no defined data to back the clone, and anything thereafter can be viewed as being zero-filled by the clone image. This is needed because a clone image can be resized. If it gets resized larger than the snapshot it is based on, the overlap defines the original size. If the clone gets resized downward below the original size the new clone size defines the overlap. If the clone is subsequently resized to be larger, the overlap won't be increased because the previous resize invalidated any parent data beyond that point. This resolves: http://tracker.ceph.com/issues/4724 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
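The resize rules above amount to simple arithmetic, sketched here with hypothetical helper names. This models only the behavior the message describes and is not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Overlap after a resize: it can shrink along with the clone but
 * never grows back, because shrinking invalidates any parent data
 * beyond the new size.
 */
static uint64_t overlap_after_resize(uint64_t overlap, uint64_t new_size)
{
	return new_size < overlap ? new_size : overlap;
}

/*
 * Number of bytes in the range [offset, offset + length) that the
 * parent can back. Whatever lies at or beyond the overlap point is
 * treated as zero-filled.
 */
static uint64_t parent_backed_bytes(uint64_t overlap, uint64_t offset,
				    uint64_t length)
{
	if (offset >= overlap)
		return 0;
	if (length > overlap - offset)
		return overlap - offset;
	return length;
}
```

For example, a clone with an overlap of 100 that is grown to 150, shrunk to 60, and grown again to 200 ends up with an overlap of 60.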
2013-05-02 | rbd: issue a copyup for layered writes | Alex Elder | 1 | -12/+137
This implements the main copyup functionality for layered writes. Here we add a copyup_pages field to the object request, which is used only for copyup requests to keep track of the page array containing data read from the parent image. A copyup request is currently the only request rbd has that requires two osd operations. Because of this we handle copyup specially. All image object requests get an osd request allocated when they are created. For a write request, if a copyup is required, the osd request originally allocated is released, and a new one (with room for two osd ops) is allocated to replace it. A new function rbd_osd_req_create_copyup() allocates an osd request suitable for a copyup request. The first op is then filled with a copyup object class method call, supplying the array of pages containing data read from the parent. The second op is filled in with the original write request. The original request otherwise remains intact, and it describes the original write request (found in the second osd op). The presence of the copyup op is sort of implicit; a non-null copyup_pages field could be used to distinguish between a "normal" write request and a request containing both a copyup call and a write. This resolves: http://tracker.ceph.com/issues/3419 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: implement full object parent reads | Alex Elder | 1 | -9/+143
As a step toward implementing layered writes, implement reading the data for a target object from the parent image for a write request whose target object is known to not exist. Add a copyup_pages field to an image request to track the page array used (only) for such a request. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: revalidate_disk upon rbd resize | Laurent Barbe | 1 | -0/+1
If an rbd disk is open and an rbd resize is done, the new size is not visible to the filesystem. As is done in the virtio-blk and dm drivers, call revalidate_disk() to update the bd_inode size. Signed-off-by: Laurent Barbe <laurent@ksperis.com> Reviewed-by: Alex Elder <elder@inktank.com>
2013-05-02 | rbd: support page array image requests | Alex Elder | 1 | -20/+66
This patch adds the ability to build an image request whose data will be written from or read into memory described by a page array. (Previously only bio lists were supported.) Originally this was going to define a new function for this purpose but it was largely identical to the rbd_img_request_fill_bio(). So instead, rbd_img_request_fill_bio() has been generalized to handle both types of image request. For the moment we still only fill image requests with bio data. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: define zero_pages() | Alex Elder | 1 | -8/+47
Define a new function zero_pages() that zeroes a range of memory defined by a page array, along the lines of zero_bio_chain(). It saves and restores the irq flags like bvec_kmap_irq() does, though I'm not sure at this point that it's necessary. Update rbd_img_obj_request_read_callback() to use the new function if the object request contains page rather than bio data. For the moment, only bio data is used for osd READ ops. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: encapsulate submission of image object requests | Alex Elder | 1 | -22/+43
Object requests that are part of an image request are subject to some additional handling. Define rbd_img_obj_request_submit() to encapsulate that, and use it when initially submitting an image object request, and when re-submitting it during callback of an object existence check. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: define separate read and write format funcs | Alex Elder | 1 | -21/+28
Separate rbd_osd_req_format() into two functions, one for read requests and the other for write requests. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: issue stat request before layered write | Alex Elder | 1 | -8/+155
This is a step toward fully implementing layered writes. Add checks before request submission for the object(s) associated with an image request. For write requests, if we don't know that the target object exists, issue a STAT request to find out. When that request completes, mark the known and exists flags for the original object request accordingly and re-submit the object request. (Note that this still does the existence check only; the copyup operation is not yet done.) A new object request is created to perform the existence check. A pointer to the original request is added to that object request to allow the stat request to re-issue the original request after updating its flags. If there is a failure with the stat request the error code is stored with the original request, which is then completed. This resolves: http://tracker.ceph.com/issues/3418 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: add target object existence flags | Alex Elder | 1 | -0/+37
This creates two new flags for object requests to indicate what is known about the existence of the object to which a request is to be sent. The KNOWN flag will be true if the EXISTS flag is meaningful. That is:
    KNOWN  EXISTS
    -----  ------
      0      0     don't know whether the object exists
      0      1     (not used/invalid)
      1      0     object is known to not exist
      1      1     object is known to exist
This will be used in determining how to handle write requests for data objects for layered rbd images. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
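The KNOWN/EXISTS encoding could be modeled along these lines. The bit positions and helper names are hypothetical, chosen for this sketch only, and plain integer operations stand in for whatever bit helpers the driver actually uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit positions; the driver's actual values may differ. */
enum {
	OBJ_REQ_KNOWN_SKETCH  = 1u << 0,
	OBJ_REQ_EXISTS_SKETCH = 1u << 1,
};

/* Record the outcome of an existence check; KNOWN is always set. */
static unsigned int obj_req_existence_set(unsigned int flags, bool exists)
{
	flags |= OBJ_REQ_KNOWN_SKETCH;
	if (exists)
		flags |= OBJ_REQ_EXISTS_SKETCH;
	else
		flags &= ~(unsigned int)OBJ_REQ_EXISTS_SKETCH;

	return flags;
}

static bool obj_req_known(unsigned int flags)
{
	return (flags & OBJ_REQ_KNOWN_SKETCH) != 0;
}

static bool obj_req_exists(unsigned int flags)
{
	/* EXISTS is meaningful only once KNOWN has been set. */
	assert(obj_req_known(flags));

	return (flags & OBJ_REQ_EXISTS_SKETCH) != 0;
}
```

Because the only setter raises KNOWN together with any change to EXISTS, the invalid "0 1" combination cannot be produced through these helpers.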
2013-05-02 | rbd: always check IMG_DATA flag | Alex Elder | 1 | -21/+30
In a few spots, whether an object request's img_request pointer is null is used to determine whether an object request is being done as part of an image data request. Stop doing that, and instead always use the object request IMG_DATA flag for that purpose. Swap the order of the definition of the IMG_DATA and DONE flag helpers, because obj_request_done_set() now refers to obj_request_img_data_set() to get its rbd_dev value. This will become important because the img_request pointer is about to become part of a union. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: adjust image object request ref counting | Alex Elder | 1 | -7/+1
An extra reference is taken when an object request is added as one of the requests making up an image request. A reference is dropped again when the image's object requests get submitted. The original reference for the object request will remain throughout this period, so we don't need to add and then take away an extra one. This can be interpreted as the image request inheriting the original object request's reference. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | libceph: kill off osd data write_request parameters | Alex Elder | 1 | -2/+2
In the incremental move toward supporting distinct data items in an osd request some of the functions had "write_request" parameters to indicate, basically, whether the data belonged to in_data or the out_data. Now that we maintain the data fields in the op structure there is no need to indicate the direction, so get rid of the "write_request" parameters. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: implement layered reads | Alex Elder | 1 | -12/+85
Implement layered read requests for format 2 rbd images. If an rbd image is a clone of a snapshot, the snapshot will be the clone's "parent" image. When an object read request on a clone comes back with ENOENT it indicates that the clone is not yet populated with that portion of the image's data, and the parent image should be consulted to satisfy the read. When this occurs, a new image request is created, directed to the parent image. The offset and length of the image are the same as the image-relative offset and length of the object request that produced ENOENT. Data from the parent image therefore satisfies the object read request for the original image request. While this code works, it will not be active until we enable the layering feature (by adding RBD_FEATURE_LAYERING to the value of RBD_FEATURES_SUPPORTED). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
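The ENOENT fallback described above can be illustrated with a toy userspace model. Everything here is hypothetical: the image_sketch structure, the tiny object size, and the one-to-one mapping between clone and parent object numbers are simplifications (the driver redirects by image-relative offset and length instead). Zero-filling when no parent has the data is likewise an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define OBJ_SIZE_SKETCH 4	/* tiny objects keep the example readable */
#define OBJ_COUNT_SKETCH 4

/* Toy image: a few fixed-size objects and an optional parent. */
struct image_sketch {
	const char *objects[OBJ_COUNT_SKETCH];	/* NULL: object absent */
	const struct image_sketch *parent;	/* NULL: not a clone */
};

/* Read one object from this image only; -ENOENT if it is absent. */
static int object_read(const struct image_sketch *img, size_t objno,
		       char *buf)
{
	assert(objno < OBJ_COUNT_SKETCH);

	if (!img->objects[objno])
		return -ENOENT;
	memcpy(buf, img->objects[objno], OBJ_SIZE_SKETCH);

	return 0;
}

/* On -ENOENT, consult the parent; with no parent, report zeroes. */
static int layered_object_read(const struct image_sketch *img,
			       size_t objno, char *buf)
{
	int ret = object_read(img, objno, buf);

	if (ret != -ENOENT)
		return ret;
	if (img->parent)
		return layered_object_read(img->parent, objno, buf);
	memset(buf, 0, OBJ_SIZE_SKETCH);

	return 0;
}
```

A read of an object the clone has already populated never touches the parent; only the missing ones are redirected.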
2013-05-02 | rbd: probe the parent of an image if present | Alex Elder | 1 | -5/+76
Call the probe function for the parent device if one is present. Since we don't formally support the layering feature we won't be using this functionality just yet. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: add an object request flag for image data objects | Alex Elder | 1 | -4/+33
Add a flag to distinguish between object requests being done on standalone objects and requests being sent for objects representing rbd image data (i.e., object requests that are the result of image request). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: define an rbd object request flags field | Alex Elder | 1 | -29/+29
We're going to need some more Boolean values for object requests, so create a flags bit field and use it to record whether the request is done. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: encapsulate image object end request handling | Alex Elder | 1 | -25/+29
Encapsulate the code that completes processing of an object request that's part of an image request. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: define image request layered flag | Alex Elder | 1 | -0/+16
Define a flag indicating whether an image request is for a layered image (one with a parent image to which requests will be redirected if the target object of a request does not exist). The code that checks this flag will be added shortly. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: define image request originator flag | Alex Elder | 1 | -5/+26
Define a flag indicating whether an image request originated from the Linux block layer (from blk_fetch_request()) or whether it was initiated in order to satisfy an object request for a child image of a layered rbd device. For image requests initiated by objects of child images we'll save a pointer to the object request rather than the Linux block request. For now, only block requests are used. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: define image request flags | Alex Elder | 1 | -9/+35
There are several Boolean values we'll be maintaining for image requests. Switch from the single write_request field to a general-purpose flags field, and use one of its bits to represent the direction of I/O for the image request. Define helper functions for setting and testing that flag. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: record image-relative offset in object requests | Alex Elder | 1 | -10/+19
For an image object request we will need to know what offset within the rbd image the request covers. Record that when the object request gets created. Update the I/O error warnings so they use this so what's reported is more informative. Rename a local variable to fit the convention used everywhere else. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: record aggregate image transfer count | Alex Elder | 1 | -0/+18
Compute the total number of bytes transferred for an image request--the sum across each of the request's object requests. To avoid contention do it only when all object requests are complete, in rbd_img_request_complete(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: record overall image request result | Alex Elder | 1 | -4/+9
If any image object request produces a non-zero result, preserve that as the result of the overall image request. If multiple objects have non-zero results, save only the first one. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
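The "first non-zero result wins" rule can be sketched as below. The function name is hypothetical, and array order stands in for the order in which the driver processes completed object requests.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fold per-object results into one image result: keep the first
 * non-zero value seen and ignore any later ones.
 */
static int img_result_sketch(const int *obj_results, size_t count)
{
	int result = 0;
	size_t i;

	assert(obj_results != NULL || count == 0);

	for (i = 0; i < count; i++)
		if (obj_results[i] && !result)
			result = obj_results[i];

	return result;
}
```

With per-object results of 0, -5, 0 and -12, the image request as a whole reports -5.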
2013-05-02 | rbd: update feature bits | Alex Elder | 1 | -3/+6
There is a new rbd feature bit defined for "fancy striping." Add it to the ones defined in the kernel client. Change RBD_FEATURES_ALL so it represents the set of all feature bits (rather than just the ones we support). Define a new symbol RBD_FEATURES_SUPPORTED to indicate the supported ones. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | libceph: make method call data be a separate data item | Alex Elder | 1 | -2/+13
Right now the data for a method call is specified via a pointer and length, and it's copied--along with the class and method name--into a pagelist data item to be sent to the osd. Instead, encode the data in a data item separate from the class and method names. This will allow large amounts of data to be supplied to methods without copying. Only rbd uses the class functionality right now, and when it really needs this it will probably need to use a page array rather than a page list. But this simple implementation demonstrates the functionality on the osd client, and that's enough for now. This resolves: http://tracker.ceph.com/issues/4104 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | libceph: combine initializing and setting osd data | Alex Elder | 1 | -14/+6
This ends up being a rather large patch but what it's doing is somewhat straightforward. Basically, this is replacing two calls with one. The first of the two calls is initializing a struct ceph_osd_data with data (either a page array, a page list, or a bio list); the second is setting an osd request op so it associates that data with one of the op's parameters. In place of those two will be a single function that initializes the op directly. That means we sort of fan out a set of the needed functions:
    - extent ops with pages data
    - extent ops with pagelist data
    - extent ops with bio list data
and
    - class ops with page data for receiving a response
We also have to define another one, but it's only used internally:
    - class ops with pagelist data for request parameters
Note that we *still* haven't gotten rid of the osd request's r_data_in and r_data_out fields. All the osd ops refer to them for their data. For now, these data fields are pointers assigned to the appropriate r_data_* field when these new functions are called. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: rearrange some code for consistency | Alex Elder | 1 | -66/+62
This patch just trivially moves around some code for consistency. In preparation for initializing osd request data fields in ceph_osdc_build_request(), I wanted to verify that rbd did in fact call that immediately before it called ceph_osdc_start_request(). It was true (although image requests are built in a group and then started as a group). But I made the changes here just to make it more obvious, by making all of the calls follow a common sequence:
    osd_req_op_<optype>_init();
    ceph_osd_data_<type>_init()
    osd_req_op_<optype>_<datafield>()
    rbd_osd_req_format()
    ...
    ret = rbd_obj_request_submit()
I moved the initialization of the callback for image object requests into rbd_img_request_fill_bio(), again, for consistency. To avoid a forward reference, I moved the definition of rbd_img_obj_callback() up in the file. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: separate initialization of osd data | Alex Elder | 1 | -19/+8
The osd data for a request is currently initialized inside rbd_osd_req_create(), but that assumes an object request's data belongs in the osd request's data in or data out field. There are only three places where requests with data are set up, and it turns out it's easier to call just the osd data init routines directly there rather than handling it in rbd_osd_req_create(). (The real motivation here is moving toward getting rid of the osd request in and out data fields.) Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | rbd: don't set data in rbd_osd_req_format_op() | Alex Elder | 1 | -30/+25
Currently an object request has its osd request's data field set in rbd_osd_req_format_op(). That assumes a single osd op per object request, and that won't be the case for long. Move the code that sets this out and into the caller. Rename rbd_osd_req_format_op() to be just rbd_osd_req_format(), removing the notion that it's doing anything op-specific. This and the next patch resolve: http://tracker.ceph.com/issues/4658 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | libceph: specify osd op by index in request | Alex Elder | 1 | -20/+15
An osd request now holds all of its source op structures, and every place that initializes one of these is in fact initializing one of the entries in the osd request's array. So rather than supplying the address of the op to initialize, have the caller specify the osd request and an indication of which op it would like to initialize. This better hides the details of the op structure (and facilitates moving the data pointers they use). Since osd_req_op_init() is a common routine, and it's not used outside the osd client code, give it static scope. Also make it return the address of the specified op (so all the other init routines don't have to repeat that code). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 | libceph: add data pointers in osd op structures | Alex Elder | 1 | -4/+20
An extent type osd operation currently implies that there will be corresponding data supplied in the data portion of the request (for write) or response (for read) message. Similarly, an osd class method operation implies a data item will be supplied to receive the response data from the operation. Add a ceph_osd_data pointer to each of those structures, and assign it to point to either the incoming or the outgoing data structure in the osd message. The data is not always available when an op is initially set up, so add two new functions to allow setting them after the op has been initialized. Begin to make use of the data item pointer available in the osd operation rather than the request data in or out structure in places where it's convenient. Add some assertions to verify pointers are always set the way they're expected to be. This is a sort of stepping stone toward really moving the data into the osd request ops, to allow for some validation before making that jump. This is the first in a series of patches that resolve: http://tracker.ceph.com/issues/4657 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>