summaryrefslogtreecommitdiff
path: root/fs/ocfs2
AgeCommit message (Collapse)AuthorFilesLines
2012-03-14Treat writes as new when holes span across page boundariesGoldwyn Rodrigues1-0/+6
commit 272b62c1f0f6f742046e45b50b6fec98860208a0 upstream. When a hole spans across page boundaries, the next write forces a read of the block. This could end up reading existing garbage data from the disk in ocfs2_map_page_blocks. This leads to non-zero holes. In order to avoid this, mark the writes as new when the holes span across page boundaries. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de> Signed-off-by: jlbec <jlbec@evilplan.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-06-26Ocfs2/refcounttree: Fix a bug for refcounttree to writeback clusters in a ↵Tristan Ye1-2/+5
right number. commit acf3bb007e5636ef4c17505affb0974175108553 upstream. Current refcounttree codes actually didn't writeback the new pages out in write-back mode, due to a bug of always passing a ZERO number of clusters to 'ocfs2_cow_sync_writeback', the patch tries to pass a proper one in. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-06-26ocfs2_connection_find() returns pointer to bad structuredann frazier1-1/+1
commit 226291aa4641fa13cb5dec3bcb3379faa83009e2 upstream. If ocfs2_live_connection_list is empty, ocfs2_connection_find() will return a pointer to the LIST_HEAD, cast as a ocfs2_live_connection. This can cause an oops when ocfs2_control_send_down() dereferences c->oc_conn: Call Trace: [<ffffffffa00c2a3c>] ocfs2_control_message+0x28c/0x2b0 [ocfs2_stack_user] [<ffffffffa00c2a95>] ocfs2_control_write+0x35/0xb0 [ocfs2_stack_user] [<ffffffff81143a88>] vfs_write+0xb8/0x1a0 [<ffffffff8155cc13>] ? do_page_fault+0x153/0x3b0 [<ffffffff811442f1>] sys_write+0x51/0x80 [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b Fix by explicitly returning NULL if no match is found. Signed-off-by: dann frazier <dann.frazier@canonical.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-01-07ocfs2: Don't walk off the end of fast symlinks.Joel Becker1-1/+1
commit 1fc8a117865b54590acd773a55fbac9221b018f0 upstream. ocfs2 fast symlinks are NUL terminated strings stored inline in the inode data area. However, disk corruption or a local attacker could, in theory, remove that NUL. Because we're using strlen() (my fault, introduced in a731d1 when removing vfs_follow_link()), we could walk off the end of that string. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-01-07ocfs2: Fix incorrect checksum validation errorSunil Mushran1-1/+5
commit f5ce5a08a40f2086435858ddc80cb40394b082eb upstream. For local mounts, ocfs2_read_locked_inode() calls ocfs2_read_blocks_sync() to read the inode off the disk. The latter first checks to see if that block is cached in the journal, and, if so, returns that block. That is ok. But ocfs2_read_locked_inode() goes wrong when it tries to validate the checksum of such blocks. Blocks that are cached in the journal may not have had their checksum computed as yet. We should not validate the checksums of such blocks. Fixes ossbz#1282 http://oss.oracle.com/bugzilla/show_bug.cgi?id=1282 Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Singed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2010-08-27Fix the nested PR lock calling issue in ACLJiaju Zhang1-3/+21
commit 845b6cf34150100deb5f58c8a37a372b111f2918 upstream. Hi, Thanks a lot for all the review and comments so far;) I'd like to send the improved (V4) version of this patch. This patch fixes a deadlock in OCFS2 ACL. We found this bug in OCFS2 and Samba integration using scenario, the symptom is several smbd processes will be hung under heavy workload. Finally we found out it is the nested PR lock calling that leads to this deadlock: node1 node2 gr PR | V PR(EX)---> BAST:OCFS2_LOCK_BLOCKED | V rq PR | V wait=1 After requesting the 2nd PR lock, the process "smbd" went into D state. It can only be woken up when the 1st PR lock's RO holder equals zero. There should be an ocfs2_inode_unlock in the calling path later on, which can decrement the RO holder. But since it has been in uninterruptible sleep, the unlock function has no chance to be called. The related stack trace is: smbd D ffff8800013d0600 0 9522 5608 0x00000000 ffff88002ca7fb18 0000000000000282 ffff88002f964500 ffff88002ca7fa98 ffff8800013d0600 ffff88002ca7fae0 ffff88002f964340 ffff88002f964340 ffff88002ca7ffd8 ffff88002ca7ffd8 ffff88002f964340 ffff88002f964340 Call Trace: [<ffffffff80350425>] schedule_timeout+0x175/0x210 [<ffffffff8034f580>] wait_for_common+0xf0/0x210 [<ffffffffa03e12b9>] __ocfs2_cluster_lock+0x3b9/0xa90 [ocfs2] [<ffffffffa03e7665>] ocfs2_inode_lock_full_nested+0x255/0xdb0 [ocfs2] [<ffffffffa0446019>] ocfs2_get_acl+0x69/0x120 [ocfs2] [<ffffffffa0446368>] ocfs2_check_acl+0x28/0x80 [ocfs2] [<ffffffff800e3507>] acl_permission_check+0x57/0xb0 [<ffffffff800e357d>] generic_permission+0x1d/0xc0 [<ffffffffa03eecea>] ocfs2_permission+0x10a/0x1d0 [ocfs2] [<ffffffff800e3f65>] inode_permission+0x45/0x100 [<ffffffff800d86b3>] sys_chdir+0x53/0x90 [<ffffffff80007458>] system_call_fastpath+0x16/0x1b [<00007f34a4ef6927>] 0x7f34a4ef6927 For details, please see: https://bugzilla.novell.com/show_bug.cgi?id=614332 and http://oss.oracle.com/bugzilla/show_bug.cgi?id=1278 Signed-off-by: Jiaju Zhang <jjzhang@suse.de> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-27ocfs2/dlm: remove potential deadlock -V3Wengang Wang1-4/+0
commit b11f1f1ab73fd358b1b734a9427744802202ba68 upstream. When we need to take both dlm_domain_lock and dlm->spinlock, we should take them in order of: dlm_domain_lock then dlm->spinlock. There is pathes disobey this order. That is calling dlm_lockres_put() with dlm->spinlock held in dlm_run_purge_list. dlm_lockres_put() calls dlm_put() at the ref and dlm_put() locks on dlm_domain_lock. Fix: Don't grab/put the dlm when the initialising/releasing lockres. That grab is not required because we don't call dlm_unregister_domain() based on refcount. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-27ocfs2/dlm: avoid incorrect bit set in refmap on recovery masterWengang Wang2-25/+31
commit a524812b7eaa7783d7811198921100f079034e61 upstream. In the following situation, there remains an incorrect bit in refmap on the recovery master. Finally the recovery master will fail at purging the lockres due to the incorrect bit in refmap. 1) node A has no interest on lockres A any longer, so it is purging it. 2) the owner of lockres A is node B, so node A is sending de-ref message to node B. 3) at this time, node B crashed. node C becomes the recovery master. it recovers lockres A(because the master is the dead node B). 4) node A migrated lockres A to node C with a refbit there. 5) node A failed to send de-ref message to node B because it crashed. The failure is ignored. no other action is done for lockres A any more. For mormal, re-send the deref message to it to recovery master can fix it. Well, ignoring the failure of deref to the original master and not recovering the lockres to recovery master has the same effect. And the later is simpler. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Acked-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-27ocfs2: Count more refcount records in file system fragmentation.Tao Ma1-5/+15
commit 8a2e70c40ff58f82dde67770e6623ca45f0cb0c8 upstream. The refcount record calculation in ocfs2_calc_refcount_meta_credits is too optimistic that we can always allocate contiguous clusters and handle an already existed refcount rec as a whole. Actually because of file system fragmentation, we may have the chance to split a refcount record into 3 parts during the transaction. So consider the worst case in record calculation. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-27ocfs2 fix o2dlm dlm run purgelist (rev 3)Srinivas Eeda1-46/+34
commit 7beaf243787f85a2ef9213ccf13ab4a243283fde upstream. This patch fixes two problems in dlm_run_purgelist 1. If a lockres is found to be in use, dlm_run_purgelist keeps trying to purge the same lockres instead of trying the next lockres. 2. When a lockres is found unused, dlm_run_purgelist releases lockres spinlock before setting DLM_LOCK_RES_DROPPING_REF and calls dlm_purge_lockres. spinlock is reacquired but in this window lockres can get reused. This leads to BUG. This patch modifies dlm_run_purgelist to skip lockres if it's in use and purge next lockres. It also sets DLM_LOCK_RES_DROPPING_REF before releasing the lockres spinlock protecting it from getting reused. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Acked-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-27ocfs2/dlm: fix a dead lockWengang Wang1-3/+2
commit 6d98c3ccb52f692f1a60339dde7c700686a5568b upstream. When we have to take both dlm->master_lock and lockres->spinlock, take them in order lockres->spinlock and then dlm->master_lock. The patch fixes a violation of the rule. We can simply move taking dlm->master_lock to where we have dropped res->spinlock since when we access res->state and free mle memory we don't need master_lock's protection. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-27ocfs2: do not overwrite error codes in ocfs2_init_aclTiger Yang1-2/+7
commit 6eda3dd33f8a0ce58ee56a11351758643a698db4 upstream. Setting the acl while creating a new inode depends on the error codes of posix_acl_create_masq. This patch fix a issue of overwriting the error codes of it. Reported-by: Pawel Zawora <pzawora@gmail.com> Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ocfs2: make xattr extension work with new local alloc reservation.Tao Ma1-33/+49
commit a78f9f4668949a6588b8872f162e86685c63d023 upstream. The old ocfs2_xattr_extent_allocation is too optimistic about the clusters we can get. So actually if the file system is too fragmented, ocfs2_add_clusters_in_btree will return us with EGAIN and we need to allocate clusters once again. So this patch change it to a while loop so that we can allocate clusters until we reach clusters_to_add. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ocfs2: When zero extending, do it by page.Joel Becker2-64/+84
commit a4bfb4cf11fd2211b788af59dc8a8b4394bca227 upstream. ocfs2_zero_extend() does its zeroing block by block, but it calls a function named ocfs2_write_zero_page(). Let's have ocfs2_write_zero_page() handle the page level. From ocfs2_zero_extend()'s perspective, it is now page-at-a-time. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ocfs2: No need to zero pages past i_size.Joel Becker1-4/+18
commit 693c241a5f6aa01417f5f4caf9f82e60e316398d upstream. When ocfs2 fills a hole, it does so by allocating clusters. When a cluster is larger than the write, ocfs2 must zero the portions of the cluster outside of the write. If the clustersize is smaller than a pagecache page, this is handled by the normal pagecache mechanisms, but when the clustersize is larger than a page, ocfs2's write code will zero the pages adjacent to the write. This makes sure the entire cluster is zeroed correctly. Currently ocfs2 behaves exactly the same when writing past i_size. However, this means ocfs2 is writing zeroed pages for portions of a new cluster that are beyond i_size. The page writeback code isn't expecting this. It treats all pages past the one containing i_size as left behind due to a previous truncate operation. Thankfully, ocfs2 calculates the number of pages it will be working on up front. The rest of the write code merely honors the original calculation. We can simply trim the number of pages to only cover the actual file data. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-04ocfs2: Avoid a gcc warning in ocfs2_wipe_inode().Joel Becker1-1/+1
gcc warns that a variable is uninitialized. It's actually handled, but an early return fools gcc. Let's just initialize the variable to a garbage value that will crash if the usage is ever broken. Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-01ocfs2: Avoid direct write if we fall back to buffered I/OLi Dongyang1-11/+14
when we fall back to buffered write from direct write, we call __generic_file_aio_write() but that will end up doing direct write even we are only prepared to do buffered write because the file has the O_DIRECT flag set. This is a fix for https://bugzilla.novell.com/show_bug.cgi?id=591039 revised with Joel's comments. Signed-off-by: Li Dongyang <lidongyang@novell.com> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-01Merge branch 'skip_delete_inode' of ↵Joel Becker21-60/+84
git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2-mark into ocfs2-fixes
2010-04-24ocfs2_dlmfs: Fix math error when reading LVB.Joel Becker1-1/+1
When asked for a partial read of the LVB in a dlmfs file, we can accidentally calculate a negative count. Reported-by: Dan Carpenter <error27@gmail.com> Cc: <stable@kernel.org> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-04-24ocfs2: Update VFS inode's id info after reflink.Tao Ma1-0/+3
In reflink we update the id info on the disk but forgot to update the corresponding information in the VFS inode. Update them accordingly when we want to preserve the attributes. Reported-by: Jeff Liu <jeff.liu@oracle.com> Signed-off-by: Tao Ma <tao.ma@oracle.com> Cc: <stable@kernel.org> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-04-24ocfs2: potential ERR_PTR dereference on error pathsDan Carpenter1-0/+1
If "handle" is non null at the end of the function then we assume it's a valid pointer and pass it to ocfs2_commit_trans(); Signed-off-by: Dan Carpenter <error27@gmail.com> Cc: <stable@kernel.org> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-04-23ocfs2: Add directory entry later in ocfs2_symlink() and ocfs2_mknod()Mark Fasheh1-15/+25
If we get a failure during creation of an inode we'll allow the orphan code to remove the inode, which is correct. However, we need to ensure that we don't get any errors after the call to ocfs2_add_entry(), otherwise we could leave a dangling directory reference. The solution is simple - in both cases, all I had to do was move ocfs2_dentry_attach_lock() above the ocfs2_add_entry() call. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2010-04-23ocfs2: use OCFS2_INODE_SKIP_ORPHAN_DIR in ocfs2_mknod error pathLi Dongyang1-5/+11
Mark the inode with flag OCFS2_INODE_SKIP_ORPHAN_DIR in ocfs2_mknod, so we can kill the inode in case of error. [ Fixed up comment style -Mark ] Signed-off-by: Li Dongyang <lidongyang@novell.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2010-04-23ocfs2: use OCFS2_INODE_SKIP_ORPHAN_DIR in ocfs2_symlink error pathLi Dongyang1-0/+1
Mark the inode with flag OCFS2_INODE_SKIP_ORPHAN_DIR when we get an error after allocating one, so that we can kill the inode. Signed-off-by: Li Dongyang <lidongyang@novell.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2010-04-23ocfs2: add OCFS2_INODE_SKIP_ORPHAN_DIR flag and honor it in the inode wipe codeLi Dongyang3-29/+39
Currently in the error path of ocfs2_symlink and ocfs2_mknod, we just call iput with the inode we failed with, but the inode wipe code will complain because we don't add the inode to orphan dir. One solution would be to lock the orphan dir during the entire transaction, but that's too heavy for a rare error path. Instead, we add a flag, OCFS2_INODE_SKIP_ORPHAN_DIR which tells the inode wipe code that it won't find this inode in the orphan dir. [ Merge fixes and comment style cleanups -Mark ] Signed-off-by: Li Dongyang <lidongyang@novell.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2010-04-16ocfs2: Reset status if we want to restart file extension.Tao Ma1-0/+1
In __ocfs2_extend_allocation, we will restart our file extension if ((!status) && restart_func). But there is a bug that the status is still left as -EGAIN. This is really an old bug, but it is masked by the return value of ocfs2_journal_dirty. So it show up when we make ocfs2_journal_dirty void. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-04-01ocfs2: Compute metaecc for superblocks during online resize.Joel Becker1-0/+2
Online resize writes out the new superblock and its backups directly. The metaecc data wasn't being recomputed. Let's do that directly. Signed-off-by: Joel Becker <joel.becker@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com>[ Cc: stable@kernel.org
2010-03-30ocfs2: Check the owner of a lockres inside the spinlockWengang Wang1-3/+2
The checking of lockres owner in dlm_update_lvb() is not inside spinlock protection. I don't see problem in current call path of dlm_update_lvb(). But just for code robustness. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-03-30ocfs2: one more warning fix in ocfs2_file_aio_write(), v2Coly Li1-3/+3
This patch fixes another compiling warning in ocfs2_file_aio_write() like this, fs/ocfs2/file.c: In function ‘ocfs2_file_aio_write’: fs/ocfs2/file.c:2026: warning: suggest parentheses around ‘&&’ within ‘||’ As Joel suggested, '!ret' is unary, this version removes the wrap from '!ret'. Signed-off-by: Coly Li <coly.li@suse.de> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-03-30ocfs2_dlmfs: User DLM_* when decoding file open flags.Tao Ma1-6/+6
In commit 0016eedc4185a3cd7e578b027a6e69001b85d6c4, we have changed dlmfs to use stackglue. So when use DLM* when we decode dlm flags from open level. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo19-11/+8
implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24ocfs2: Fix a race in o2dlm lockres masterySrinivas Eeda1-3/+1
In o2dlm, the master of a lock resource keeps a map of all interested nodes. This prevents the master from purging the resource before an interested node can create a lock. A race between the mastery thread and the mastery handler allowed an interested node to discover who the master is without informing the master directly. This is easily fixed by holding the dlm spinlock a little longer in the mastery handler. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-03-24Ocfs2: Handle deletion of reflinked oprhan inodes correctly.Tristan Ye1-0/+15
The rule is that all inodes in the orphan dir have ORPHANED_FL, otherwise we treated it as an ERROR. This rule works well except for some rare cases of reflink operation: http://oss.oracle.com/bugzilla/show_bug.cgi?id=1215 The problem is caused by how reflink and our orphan_scan thread interact. * The orphan scan pulls the orphans into a queue first, then runs the queue at a later time. We only hold the orphan_dir's lock during scanning. * Reflink create a oprhaned target in orphan_dir as its first step. It removes the target and clears the flag as the final step. These two steps take the orphan_dir's lock, but it is not held for the duration. Based on the above semantics, a reflink inode can be moved out of the orphan dir and have its ORPHANED_FL cleared before the queue of orphans is run. This leads to a ERROR in ocfs2_query_wipde_inode(). This patch teaches ocfs2_query_wipe_inode() to detect previously orphaned reflink targets. If a reflink fails or a crash occurs during the relfink operation, the inode will retain ORPHANED_FL and will be properly wiped. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-03-24Ocfs2: Journaling i_flags and i_orphaned_slot when adding inode to orphan dir.Tristan Ye1-5/+23
Currently, some callers were missing to journal the dirty inode after adding it to orphan dir. Now we're going to journal such modifications within the ocfs2_orphan_add() itself, It's safe to do so, though some existing caller may duplicate this, and it makes the logic look more straightforward anyway. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-03-24ocfs2: Clear undo bits when local alloc is freedMark Fasheh4-46/+95
When the local alloc file changes windows, unused bits are freed back to the global bitmap. By defnition, those bits can not be in use by any file. Also, the local alloc will never have been able to allocate those bits if they were part of a previous truncate. Therefore it makes sense that we should clear unused local alloc bits in the undo buffer so that they can be used immediatly. [ Modified to call it ocfs2_release_clusters() -- Joel ] Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-03-20ocfs2: Init meta_ac properly in ocfs2_create_empty_xattr_block.Tao Ma1-6/+4
You can't store a pointer that you haven't filled in yet and expect it to work. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-03-20ocfs2: Fix the update of name_offset when removing xattrsTao Ma1-1/+1
When replacing a xattr's value, in some case we wipe its name/value first and then re-add it. The wipe is done by ocfs2_xa_block_wipe_namevalue() when the xattr is in the inode or block. We currently adjust name_offset for all the entries which have (offset < name_offset). This does not adjust the entrie we're replacing. Since we are replacing the entry, we don't adjust the total entry count. When we calculate a new namevalue location, we trust the entries now-wrong offset in ocfs2_xa_get_free_start(). The solution is to also adjust the name_offset for the replaced entry, allowing ocfs2_xa_get_free_start() to calculate the new namevalue location correctly. The following script can trigger a kernel panic easily. echo 'y'|mkfs.ocfs2 --fs-features=local,xattr -b 4K $DEVICE mount -t ocfs2 $DEVICE $MNT_DIR FILE=$MNT_DIR/$RANDOM for((i=0;i<76;i++)) do string_76="a$string_76" done string_78="aa$string_76" string_82="aaaa$string_78" touch $FILE setfattr -n 'user.test1234567890' -v $string_76 $FILE setfattr -n 'user.test1234567890' -v $string_78 $FILE setfattr -n 'user.test1234567890' -v $string_82 $FILE Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-03-18ocfs2: Always try for maximum bits with new local alloc windowsMark Fasheh1-2/+2
What we were doing before was to ask for the current window size as the maximum allocation. This had the effect of limiting the amount of allocation we could get for the local alloc during times when the window size was shrunk due to fragmentation. In some cases, that could actually *increase* fragmentation by artificially limiting the number of bits we can accept. So while we still want to ask for a minimum number of bits equal to window size, there is no reason why we should limit the number of bits the local alloc should accept. Hence always allow the maximum number of local alloc bits. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-03-17ocfs2: set i_mode on disk during acl operationsMark Fasheh1-5/+72
ocfs2_set_acl() and ocfs2_init_acl() were setting i_mode on the in-memory inode, but never setting it on the disk copy. Thus, acls were some times not getting propagated between nodes. This patch fixes the issue by adding a helper function ocfs2_acl_set_mode() which does this the right way. ocfs2_set_acl() and ocfs2_init_acl() are then updated to call ocfs2_acl_set_mode(). Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-03-17ocfs2: Update i_blocks in reflink operations.Tao Ma1-0/+1
In reflink, we need to upate i_blocks for the target inode. Reported-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-03-17ocfs2: Change bg_chain check for ocfs2_validate_gd_parent.Tao Ma1-5/+8
In ocfs2_validate_gd_parent, we check bg_chain against the cl_next_free_rec of the dinode. Actually in resize, we have the chance of bg_chain == cl_next_free_rec. So add some additional condition check for it. I also rename paramter "clean_error" to "resize", since the old one is not clearly enough to indicate that we should only meet with this case in resize. btw, the correpsonding bug is http://oss.oracle.com/bugzilla/show_bug.cgi?id=1230. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-03-17[PATCH] Skip check for mandatory locks when unlockingSachin Prabhu1-1/+1
ocfs2_lock() will skip locks on file which has mode set to 02666. This is a problem in cases where the mode of the file is changed after a process has obtained a lock on the file. ocfs2_lock() should skip the check for mandatory locks when unlocking a file. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-03-13Merge branch 'for-linus' of ↵Linus Torvalds2-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (56 commits) doc: fix typo in comment explaining rb_tree usage Remove fs/ntfs/ChangeLog doc: fix console doc typo doc: cpuset: Update the cpuset flag file Fix of spelling in arch/sparc/kernel/leon_kernel.c no longer needed Remove drivers/parport/ChangeLog Remove drivers/char/ChangeLog doc: typo - Table 1-2 should refer to "status", not "statm" tree-wide: fix typos "ass?o[sc]iac?te" -> "associate" in comments No need to patch AMD-provided drivers/gpu/drm/radeon/atombios.h devres/irq: Fix devm_irq_match comment Remove reference to kthread_create_on_cpu tree-wide: Assorted spelling fixes tree-wide: fix 'lenght' typo in comments and code drm/kms: fix spelling in error message doc: capitalization and other minor fixes in pnp doc devres: typo fix s/dev/devm/ Remove redundant trailing semicolons from macros fix typo "definetly" -> "definitely" in comment tree-wide: s/widht/width/g typo in comments ... Fix trivial conflict in Documentation/laptops/00-INDEX
2010-03-13fs/ocfs2/cluster/tcp.c: remove use of NIPQUAD, use %pI4Joe Perches1-2/+2
Signed-off-by: Joe Perches <joe@perches.com> Cc: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-08Merge branch 'for-next' into for-linusJiri Kosina2-2/+2
Conflicts: Documentation/filesystems/proc.txt arch/arm/mach-u300/include/mach/debug-macro.S drivers/net/qlge/qlge_ethtool.c drivers/net/qlge/qlge_main.c drivers/net/typhoon.c
2010-03-08Driver core: Constify struct sysfs_ops in struct kobj_typeEmese Revfy1-1/+1
Constify struct sysfs_ops. This is part of the ops structure constification effort started by Arjan van de Ven et al. Benefits of this constification: * prevents modification of data that is shared (referenced) by many other structure instances at runtime * detects/prevents accidental (but not intentional) modification attempts on archs that enforce read-only kernel data at runtime * potentially better optimized code as the compiler can assume that the const data cannot be changed * the compiler/linker move const data into .rodata and therefore exclude them from false sharing Signed-off-by: Emese Revfy <re.emese@gmail.com> Acked-by: David Teigland <teigland@redhat.com> Acked-by: Matt Domsch <Matt_Domsch@dell.com> Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Acked-by: Hans J. Koch <hjk@linutronix.de> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Jens Axboe <jens.axboe@oracle.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-06bitops: rename for_each_bit() to for_each_set_bit()Akinobu Mita1-1/+1
Rename for_each_bit to for_each_set_bit in the kernel source tree. To permit for_each_clear_bit(), should that ever be added. The patch includes a macro to map the old for_each_bit() onto the new for_each_set_bit(). This is a (very) temporary thing to ease the migration. [akpm@linux-foundation.org: add temporary for_each_bit()] Suggested-by: Alexey Dobriyan <adobriyan@gmail.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Russell King <rmk@arm.linux.org.uk> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Artem Bityutskiy <dedekind@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-06Merge branch 'for_linus' of ↵Linus Torvalds8-77/+71
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (33 commits) quota: stop using QUOTA_OK / NO_QUOTA dquot: cleanup dquot initialize routine dquot: move dquot initialization responsibility into the filesystem dquot: cleanup dquot drop routine dquot: move dquot drop responsibility into the filesystem dquot: cleanup dquot transfer routine dquot: move dquot transfer responsibility into the filesystem dquot: cleanup inode allocation / freeing routines dquot: cleanup space allocation / freeing routines ext3: add writepage sanity checks ext3: Truncate allocated blocks if direct IO write fails to update i_size quota: Properly invalidate caches even for filesystems with blocksize < pagesize quota: generalize quota transfer interface quota: sb_quota state flags cleanup jbd: Delay discarding buffers in journal_unmap_buffer ext3: quota_write cross block boundary behaviour quota: drop permission checks from xfs_fs_set_xstate/xfs_fs_set_xquota quota: split out compat_sys_quotactl support from quota.c quota: split out netlink notification support from quota.c quota: remove invalid optimization from quota_sync_all ... Fixed trivial conflicts in fs/namei.c and fs/ufs/inode.c
2010-03-05dquot: cleanup dquot initialize routineChristoph Hellwig5-12/+11
Get rid of the initialize dquot operation - it is now always called from the filesystem and if a filesystem really needs it's own (which none currently does) it can just call into it's own routine directly. Rename the now static low-level dquot_initialize helper to __dquot_initialize and vfs_dq_init to dquot_initialize to have a consistent namespace. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2010-03-05dquot: move dquot initialization responsibility into the filesystemChristoph Hellwig3-0/+18
Currently various places in the VFS call vfs_dq_init directly. This means we tie the quota code into the VFS. Get rid of that and make the filesystem responsible for the initialization. For most metadata operations this is a straight forward move into the methods, but for truncate and open it's a bit more complicated. For truncate we currently only call vfs_dq_init for the sys_truncate case because open already takes care of it for ftruncate and open(O_TRUNC) - the new code causes an additional vfs_dq_init for those which is harmless. For open the initialization is moved from do_filp_open into the open method, which means it happens slightly earlier now, and only for regular files. The latter is fine because we don't need to initialize it for operations on special files, and we already do it as part of the namespace operations for directories. Add a dquot_file_open helper that filesystems that support generic quotas can use to fill in ->open. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>