summaryrefslogtreecommitdiff
path: root/fs/gfs2
AgeCommit message (Collapse)AuthorFilesLines
2018-01-18gfs2: Turn trunc_dealloc into punch_holeAndreas Gruenbacher1-59/+120
Add an upper bound to the range of blocks to deallocate blocks to function trunc_dealloc so that this function can be used for truncating a file as well as for punching a hole into a file. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-18gfs2: Generalize truncate codeAndreas Gruenbacher1-47/+75
Pull the code for computing the range of metapointers to iterate out of gfs2_metapath_ra (for readahead), sweep_bh_for_rgrps (for deallocating metapointers within a block), and trunc_dealloc (for walking the metadata tree). In sweep_bh_for_rgrps, move the code for looking up the resource group descriptor of the current resource group out of the inner loop. The metatype check moves to trunc_dealloc. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17Turn gfs2_block_truncate_page into gfs2_block_zero_rangeAndreas Gruenbacher1-8/+10
Turn gfs2_block_truncate_page into a function that zeroes a range within a block rather than only the end of a block. This will be used for cleaning the end of the first partial block and the start of the last partial block when punching a hole in a file. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17gfs2: Improve non-recursive delete algorithmAndreas Gruenbacher1-20/+31
In rare cases, the current non-recursive delete algorithm doesn't deallocate empty intermediary indirect blocks. This should have very little practical effect, but deallocating all blocks correctly should still be preferable as it is cleaner and easier to validate. The fix consists of using the first block to deallocate to compute the start marker of the truncate point instead of the last block that needs to be kept. With that change, computing which indirect blocks are still needed becomes relatively easy. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17gfs2: Fix metadata read-ahead during truncateAndreas Gruenbacher1-17/+25
The metadata read-ahead algorithm broke when switching from recursive to non-recursive delete: the current algorithm reads ahead blocks at height N - 1 while deallocating the blocks at hight N. However, deallocating the blocks at height N requires a complete walk of the metadata tree, not only down to height N - 1. Consequently, all blocks below height N - 1 will be accessed without read-ahead. Fix this by issuing read-aheads as early as possible, after each metapath lookup. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17gfs2: Clean up {lookup,fillup}_metapathAndreas Gruenbacher1-44/+30
Split out the entire lookup loop from lookup_metapath and fillup_metapath. Make both functions return the actual height in mp->mp_aheight, and return 0 on success. Handle lookup errors properly in trunc_dealloc. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17gfs2: Remove minor gfs2_journaled_truncate inefficienciesAndreas Gruenbacher1-0/+13
First, this function truncates the file in chunks. When the original file size isn't block aligned, each chunk that is truncated will remain be misaligned. This is inefficient. Second, this function doesn't recognize where holes are, so it loops through them. For each chunk of a hole, it creates a new transaction. At least avoid creating another transactions whe the current one is still empty. (An better fix would be to skip large holes, of course.) Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17gfs2: truncate: Remove unnecessary oldsize parametersAndreas Gruenbacher1-10/+7
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17gfs2: Clean up trunc_start error pathAndreas Gruenbacher1-10/+5
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17gfs2: Remove pointless BUG_ONAndreas Gruenbacher1-1/+0
The current transaction is being dereferenced before asserting that is not NULL; that isn't going to help. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17gfs2: Add gfs2_blk2rgrpd comment and fix incorrect useSteven Whitehouse2-1/+8
Document when to use gfs2_blk2rgrpd for "inexact" resource group matching. Based on that, fix an incorrect use of gfs2_blk2rgrpd in sweep_bh_for_rgrps. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-12-22gfs2: Trim the ordered write list in gfs2_ordered_write()Abhi Das1-2/+5
We iterate through the entire ordered writes list in gfs2_ordered_write() to write out inodes. It's a good place to try and shrink the list by throwing out inodes that don't have any pages. Signed-off-by: Abhi Das <adas@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-12-22GFS2: Reduce code redundancy writing log headersBob Peterson3-64/+43
Before this patch, there was a lot of code redundancy between functions log_write_header (which uses bio) and clean_journal (which uses buffer_head). This patch reduces the redundancy to simplify the code and make log header writing more consistent. We want more consistency and reduced redundancy because we plan to add a bunch of new fields to improve performance (by eliminating the local statfs and quota files) improve metadata integrity (by adding new crcs and such) and for better debugging (by adding new fields to track when and where metadata was pushed through the journals.) We don't want to duplicate setting these new fields, nor allow for human error in the process. This reduction in code redundancy is accomplished by introducing a new helper function, gfs2_write_log_header which uses bio rather than bh. That simplifies recovery function clean_journal() to use the new helper function and iomap rather than redundancy and block_map (and eventually we can maybe remove block_map). It also reduces our dependency on buffer_heads. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-12-12gfs2: Add a crc field to resource group headersAndrew Price1-0/+5
Add the rg_crc field to store a crc32 of the gfs2_rgrp structure. This allows us to check resource group headers' integrity and removes the requirement to check them against the rindex entries in fsck. If this field is found to be zero, it should be ignored (or updated with an accurate value). Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-12-12gfs2: Add rindex fields to rgrp headersAndrew Price1-0/+5
Add rg_data0, rg_data and rg_bitbytes to struct gfs2_rgrp. The fields are identical to their counterparts in struct gfs2_rindex and are intended to reduce the use of the rindex. For now the fields are only written back as the in-memory equivalents in struct gfs2_rgrpd are set using values from the rindex. However, they are needed at this point so that userspace can make use of them, allowing a migration away from the rindex over time. The new fields take up previously reserved space which was explicitly zeroed on write so, in clusters with mixed kernels, these fields could get zeroed after being set and this should not be treated as an error. Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-12-12gfs2: Add a next-resource-group pointer to resource groupsAndrew Price1-1/+5
Add a new rg_skip field to struct gfs2_rgrp, replacing __pad. The rg_skip field has the following meaning: - If rg_skip is zero, it is considered unset and not useful. - If rg_skip is non-zero, its value will be the number of blocks between this rgrp's address and the next rgrp's address. This can be used as a hint by fsck.gfs2 when rebuilding a bad rindex, for example. This will provide less dependency on the rindex in future, and allow tools such as fsck.gfs2 to iterate the resource groups without keeping the rindex around. The field is updated in gfs2_rgrp_out() so that existing file systems will have it set. This means that any resource groups that aren't ever written will not be updated. The final rgrp is a special case as there is no next rgrp, so it will always have a rg_skip of 0 (unless the fs is extended). Before this patch, gfs2_rgrp_out() zeroes the __pad field explicitly, so the rg_skip field can get set back to 0 in cases where nodes with and without this patch are mixed in a cluster. In some cases, the field may bounce between being set by one node and then zeroed by another which may harm performance slightly, e.g. when two nodes create many small files. In testing this situation is rare but it becomes more likely as the filesystem fills up and there are fewer resource groups to choose from. The problem goes away when all nodes are running with this patch. Dipping into the space currently occupied by the rg_reserved field would have resulted in the same problem as it is also explicitly zeroed, so unfortunately there is no other way around it. Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-12-11rhashtable: Change rhashtable_walk_start to return voidTom Herbert1-5/+2
Most callers of rhashtable_walk_start don't care about a resize event which is indicated by a return value of -EAGAIN. So calls to rhashtable_walk_start are wrapped wih code to ignore -EAGAIN. Something like this is common: ret = rhashtable_walk_start(rhiter); if (ret && ret != -EAGAIN) goto out; Since zero and -EAGAIN are the only possible return values from the function this check is pointless. The condition never evaluates to true. This patch changes rhashtable_walk_start to return void. This simplifies code for the callers that ignore -EAGAIN. For the few cases where the caller cares about the resize event, particularly where the table can be walked in mulitple parts for netlink or seq file dump, the function rhashtable_walk_start_check has been added that returns -EAGAIN on a resize event. Signed-off-by: Tom Herbert <tom@quantonium.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-28Rename superblock flags (MS_xyz -> SB_xyz)Linus Torvalds3-14/+14
This is a pure automated search-and-replace of the internal kernel superblock flags. The s_flags are now called SB_*, with the names and the values for the moment mirroring the MS_* flags that they're equivalent to. Note how the MS_xyz flags are the ones passed to the mount system call, while the SB_xyz flags are what we then use in sb->s_flags. The script to do this was: # places to look in; re security/*: it generally should *not* be # touched (that stuff parses mount(2) arguments directly), but # there are two places where we really deal with superblock flags. FILES="drivers/mtd drivers/staging/lustre fs ipc mm \ include/linux/fs.h include/uapi/linux/bfs_fs.h \ security/apparmor/apparmorfs.c security/apparmor/include/lib.h" # the list of MS_... constants SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \ DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \ POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \ I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \ ACTIVE NOUSER" SED_PROG= for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done # we want files that contain at least one of MS_..., # with fs/namespace.c and fs/pnode.c excluded. L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c') for f in $L; do sed -i $f $SED_PROG; done Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-27gfs2: Remove unused gfs2_write_jdata_pagevec parameterAndreas Gruenbacher1-3/+2
As a follow-up to commit d2bc5b3c67a9, remove the end parameter which is now unused. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-11-27gfs2: Fix wrong error handling in init_gfs2_fs()Tetsuo Handa1-46/+44
init_gfs2_fs() is calling e.g. calling unregister_shrinker() without register_shrinker() when an error occurred during initialization. Rename goto labels and call appropriate undo function. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-11-27GFS2: Combine gfs2_free_di with gfs2_free_uninit_diBob Peterson1-8/+2
Before this patch, function gfs2_free_di was 4 lines of code, and one of those lines was to call gfs2_free_uninit_di. Although unlikely, if function gfs2_free_uninit_di encountered an error finding the block to be freed, the error was silently ignored by the caller, which went ahead and improperly did a quota-change operation and meta_wipe despite the error. This patch combines the two functions into one to make the code more readable and fixes the bug by returning from the combined function before it takes those next incorrect steps. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-11-16mm, pagevec: remove cold parameter for pagevecsMel Gorman1-1/+1
Every pagevec_init user claims the pages being released are hot even in cases where it is unlikely the pages are hot. As no one cares about the hotness of pages being released to the allocator, just ditch the parameter. No performance impact is expected as the overhead is marginal. The parameter is removed simply because it is a bit stupid to have a useless parameter copied everywhere. Link: http://lkml.kernel.org/r/20171018075952.10627-6-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-16mm: remove nr_pages argument from pagevec_lookup_{,range}_tag()Jan Kara1-1/+1
All users of pagevec_lookup() and pagevec_lookup_range() now pass PAGEVEC_SIZE as a desired number of pages. Just drop the argument. Link: http://lkml.kernel.org/r/20171009151359.31984-15-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-16gfs2: use pagevec_lookup_range_tag()Jan Kara1-18/+2
We want only pages from given range in gfs2_write_cache_jdata(). Use pagevec_lookup_range_tag() instead of pagevec_lookup_tag() and remove unnecessary code. Link: http://lkml.kernel.org/r/20171009151359.31984-9-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15Merge tag 'gfs2-4.15.fixes' of ↵Linus Torvalds11-209/+469
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Bob Peterson: "We've got a total of 17 GFS2 patches for this merge window. The patches are basically in three categories: (1) patches related to broken xfstest cases, (2) patches related to improving iomap and start using it in GFS2, and (3) general typos and clarifications. Please note that one of the iomap patches extends beyond GFS2 and affects other file systems, but it was publically reviewed by a variety of file system people in the community. From Andreas Gruenbacher: - rename variable 'bsize' to 'factor' to clarify the logic related to gfs2_block_map. - correctly set ctime in the setflags ioctl, which fixes broken xfstests test 277. - fix broken xfstest 258, due to an atime initialization problem. - fix broken xfstest 307, in which GFS2 was not setting ctime when setting acls. - switch general iomap code from blkno to disk offset for a variety of file systems. - add a new IOMAP_F_DATA_INLINE flag for iomap to indicate blocks that have data mixed with metadata. - implement SEEK_HOLE and SEEK_DATA via iomap in GFS2. - fix failing xfstest case 066, which was due to not properly syncing dirty inodes when changing extended attributes. - fix a minor typo in a comment. - partially fix xfstest 424, which involved GET_FLAGS and SET_FLAGS ioctl. This is also a cleanup and simplification of the translation of flags from fs flags to gfs2 flags. - add support for STATX_ATTR_ in statx, which fixed broken xfstest 424. - fix for failing xfstest 093 which fixes a recursive glock problem with gfs2_xattr_get and _set From me: - make inode height info part of the 'metapath' data structure to facilitate using iomap in GFS2. - start using iomap inside GFS2 and switch GFS2's block_map functions to use iomap under the covers. - switch GFS2's fiemap implementation from using block_map to using iomap under the covers. - fix journaled data pages not being properly synced to media when writing inodes. This was caught with xfstests. - fix another failing xfstest case in which switching a file from ordered_write to journaled data via set_flags caused a deadlock" * tag 'gfs2-4.15.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Allow gfs2_xattr_set to be called with the glock held gfs2: Add support for statx inode flags gfs2: Fix and clean up {GET,SET}FLAGS ioctl gfs2: Fix a harmless typo gfs2: Fix xattr fsync GFS2: Take inode off order_write list when setting jdata flag GFS2: flush the log and all pages for jdata as we do for WB_SYNC_ALL gfs2: Implement SEEK_HOLE / SEEK_DATA via iomap GFS2: Switch fiemap implementation to use iomap GFS2: Implement iomap for block_map GFS2: Make height info part of metapath gfs2: Always update inode ctime in set_acl gfs2: Support negative atimes gfs2: Update ctime in setflags ioctl gfs2: Clarify gfs2_block_map
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman2-0/+2
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-31gfs2: Allow gfs2_xattr_set to be called with the glock heldAndreas Gruenbacher1-7/+16
On the following call path: gfs2_setattr -> setattr_prepare -> ... -> cap_inode_killpriv -> ... -> gfs2_xattr_set the glock is locked in gfs2_setattr, so check for recursive locking in gfs2_xattr_set as gfs2_xattr_get already does. While at it, get rid of need_unlock in gfs2_xattr_get. Fixes xfstest generic/093. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Acked-by: Abhijith Das <adas@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-10-31gfs2: Add support for statx inode flagsAndreas Gruenbacher1-0/+14
Add support for the STATX_ATTR_ flags in statx. (Compression, encryption, and the nodump flag are not supported by gfs2.) Partially fixes xfstest generic/424. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Andrew Price <anprice@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-10-31gfs2: Fix and clean up {GET,SET}FLAGS ioctlAndreas Gruenbacher1-55/+47
Switch to a simple array for mapping between the FS_*_FL and GFS_DIF_* flags. Clarify how the mapping between FS_JOURNAL_DATA_FL and the filesystem flags works. The GFS2_DIF_SYSTEM flag cannot be set from user space, so remove it from GFS2_FLAGS_USER_SET. Fail with -EINVAL when trying to set flags that are not supported instead of silently ignoring those flags. Partially fixes xfstest generic/424. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Andrew Price <anprice@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-10-31gfs2: Fix a harmless typoAndreas Gruenbacher1-1/+1
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-10-31gfs2: Fix xattr fsyncAndreas Gruenbacher1-32/+8
Make sure that changing xattrs marks the corresponding inode dirty so that a subsequent fsync will sync those changes to disk. We set I_DIRTY_SYNC as well as I_DIRTY_DATASYNC so that both fsync and fdatasync will sync xattr changes: xattrs can contain information critical to how the data can be accessed, so we don't want fdatasync to skip them. Fixes xfstest generic/066. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Andrew Price <anprice@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-10-31GFS2: Take inode off order_write list when setting jdata flagBob Peterson1-1/+3
This patch fixes a deadlock caused when the jdata flag is set for inodes that are already on the ordered write list. Since it is on the ordered write list, log_flush calls gfs2_ordered_write which calls filemap_fdatawrite. But since the inode had the jdata flag set, that calls gfs2_jdata_writepages, which tries to start a new transaction. A new transaction cannot be started because it tries to acquire the log_flush rwsem which is already locked by the log flush operation. The bottom line is: We cannot switch an inode from ordered to jdata until we eliminate any ordered data pages (via log flush) or any log_flush operation afterward will create the circular dependency above. So we need to flush the log before setting the diskflags to switch the file mode, then we need to remove the inode from the ordered writes list. Before this patch, the log flush was done for jdata->ordered, but that's wrong. If we're going from jdata to ordered, we don't need to call gfs2_log_flush because the call to filemap_fdatawrite will do it for us: filemap_fdatawrite() -> __filemap_fdatawrite_range() __filemap_fdatawrite_range() -> do_writepages() do_writepages() -> gfs2_jdata_writepages() gfs2_jdata_writepages() -> gfs2_log_flush() This patch modifies function do_gfs2_set_flags so that if a file has its jdata flag set, and it's already on the ordered write list, the log will be flushed and it will be removed from the list before setting the flag. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Acked-by: Abhijith Das <adas@redhat.com>
2017-10-31GFS2: flush the log and all pages for jdata as we do for WB_SYNC_ALLBob Peterson1-2/+3
In function gfs2_write_inode, starting with patch a9185b41a4f84, we only flush the log and call filemap_fdatawait if we're passed in a wbc sync_mode of WB_SYNC_ALL. We also need to do these things if we're evicting a jdata inode, because we might have jdata pages still attached to bufdata descriptors that need to be revoked, but by the time it gets to evict() it's too late to start a new transaction. This patch changes it to treat jdata inodes as if WB_SYNC_ALL had been specified. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Acked-by: Abhijith Das <adas@redhat.com>
2017-10-31gfs2: Implement SEEK_HOLE / SEEK_DATA via iomapAndreas Gruenbacher3-3/+54
So far, lseek on gfs2 did not report holes. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-10-31GFS2: Switch fiemap implementation to use iomapBob Peterson2-25/+10
This patch switches GFS2's implementation of fiemap from the old block_map code to the new iomap interface. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2017-10-31GFS2: Implement iomap for block_mapBob Peterson3-68/+274
This patch implements iomap for block mapping, and switches the block_map function to use it under the covers. The additional IOMAP_F_BOUNDARY iomap flag indicates when iomap has reached a "metadata boundary" and fetching the next mapping is likely to incur an additional I/O. This flag is used for setting the bh buffer boundary flag. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2017-10-31GFS2: Make height info part of metapathBob Peterson1-47/+62
This patch eliminates height parameters from function gfs2_bmap_alloc. Function find_metapath determines the metapath's "find height", also known as the desired height. Function lookup_metapath determines the metapath's "actual height", previously known as starting height or sheight. Function gfs2_bmap_alloc now gets both height values from the metapath. This simplification was done as a step toward switching the block_map functions to using iomap. The bh_map responsibilities are also removed from function gfs2_bmap_alloc for the same reason. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2017-10-16Merge branch 'master' of ↵Andreas Gruenbacher6-8/+8
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
2017-09-26Merge tag 'gfs2-for-linus-4.14-rc3' of ↵Linus Torvalds1-9/+5
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fix from Bob Peterson: "GFS2: Fix an old regression in GFS2's debugfs interface This fixes a regression introduced by commit 88ffbf3e037e ("GFS2: Use resizable hash table for glocks"). The regression caused the glock dump in debugfs to not report all the glocks, which makes debugging extremely difficult" * tag 'gfs2-for-linus-4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Fix debugfs glocks dump
2017-09-25gfs2: Always update inode ctime in set_aclAndreas Gruenbacher1-0/+1
Three-entry POSIX ACLs can be stored in the file mode permission bits, with no need to store them in extended attributes. When a process sets such a minimal ACL, the kernel updates the file mode like chmod does, and removes any existing extended attributes for that ACL. Make sure the ctime is always updated in that case. Fixes xfstest generic/307. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-09-25gfs2: Support negative atimesAndreas Gruenbacher1-1/+2
When inodes are read from disk, GFS2 will only update in-memory atimes older than the on-disk atimes; this prevents atimes from going backwards. The atimes of newly allocated inodes are initialized to 0. This means that when an atime is explicitly set to a negative value, this value will not persist. Fix by setting the atime of newly allocated inodes to the lowest possible value instead of 0. Fixes xfstest generic/258. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-09-25gfs2: Update ctime in setflags ioctlAndreas Gruenbacher1-0/+1
The FS_IOC_SETFLAGS ioctl is supposed to update the inode ctime. Fixes xfstests generic/277. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-09-25gfs2: Clarify gfs2_block_mapAndreas Gruenbacher1-4/+10
Add a comment about the logical block size for directories. Rename "bsize" in gfs2_block_map to "factor". Fix a typo in the description of metaptr1. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-09-25gfs2: Fix debugfs glocks dumpAndreas Gruenbacher1-9/+5
The switch to rhashtables (commit 88ffbf3e03) broke the debugfs glock dump (/sys/kernel/debug/gfs2/<device>/glocks) for dumps bigger than a single buffer: the right function for restarting an rhashtable iteration from the beginning of the hash table is rhashtable_walk_enter; rhashtable_walk_stop + rhashtable_walk_start will just resume from the current position. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Cc: stable@vger.kernel.org # v4.3+
2017-09-15Merge branch 'work.mount' of ↵Linus Torvalds6-8/+8
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull mount flag updates from Al Viro: "Another chunk of fmount preparations from dhowells; only trivial conflicts for that part. It separates MS_... bits (very grotty mount(2) ABI) from the struct super_block ->s_flags (kernel-internal, only a small subset of MS_... stuff). This does *not* convert the filesystems to new constants; only the infrastructure is done here. The next step in that series is where the conflicts would be; that's the conversion of filesystems. It's purely mechanical and it's better done after the merge, so if you could run something like list=$(for i in MS_RDONLY MS_NOSUID MS_NODEV MS_NOEXEC MS_SYNCHRONOUS MS_MANDLOCK MS_DIRSYNC MS_NOATIME MS_NODIRATIME MS_SILENT MS_POSIXACL MS_KERNMOUNT MS_I_VERSION MS_LAZYTIME; do git grep -l $i fs drivers/staging/lustre drivers/mtd ipc mm include/linux; done|sort|uniq|grep -v '^fs/namespace.c$') sed -i -e 's/\<MS_RDONLY\>/SB_RDONLY/g' \ -e 's/\<MS_NOSUID\>/SB_NOSUID/g' \ -e 's/\<MS_NODEV\>/SB_NODEV/g' \ -e 's/\<MS_NOEXEC\>/SB_NOEXEC/g' \ -e 's/\<MS_SYNCHRONOUS\>/SB_SYNCHRONOUS/g' \ -e 's/\<MS_MANDLOCK\>/SB_MANDLOCK/g' \ -e 's/\<MS_DIRSYNC\>/SB_DIRSYNC/g' \ -e 's/\<MS_NOATIME\>/SB_NOATIME/g' \ -e 's/\<MS_NODIRATIME\>/SB_NODIRATIME/g' \ -e 's/\<MS_SILENT\>/SB_SILENT/g' \ -e 's/\<MS_POSIXACL\>/SB_POSIXACL/g' \ -e 's/\<MS_KERNMOUNT\>/SB_KERNMOUNT/g' \ -e 's/\<MS_I_VERSION\>/SB_I_VERSION/g' \ -e 's/\<MS_LAZYTIME\>/SB_LAZYTIME/g' \ $list and commit it with something along the lines of 'convert filesystems away from use of MS_... constants' as commit message, it would save a quite a bit of headache next cycle" * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: VFS: Differentiate mount flags (MS_*) from internal superblock flags VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb) vfs: Add sb_rdonly(sb) to query the MS_RDONLY flag on s_flags
2017-09-07Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-blockLinus Torvalds3-3/+3
Pull block layer updates from Jens Axboe: "This is the first pull request for 4.14, containing most of the code changes. It's a quiet series this round, which I think we needed after the churn of the last few series. This contains: - Fix for a registration race in loop, from Anton Volkov. - Overflow complaint fix from Arnd for DAC960. - Series of drbd changes from the usual suspects. - Conversion of the stec/skd driver to blk-mq. From Bart. - A few BFQ improvements/fixes from Paolo. - CFQ improvement from Ritesh, allowing idling for group idle. - A few fixes found by Dan's smatch, courtesy of Dan. - A warning fixup for a race between changing the IO scheduler and device remova. From David Jeffery. - A few nbd fixes from Josef. - Support for cgroup info in blktrace, from Shaohua. - Also from Shaohua, new features in the null_blk driver to allow it to actually hold data, among other things. - Various corner cases and error handling fixes from Weiping Zhang. - Improvements to the IO stats tracking for blk-mq from me. Can drastically improve performance for fast devices and/or big machines. - Series from Christoph removing bi_bdev as being needed for IO submission, in preparation for nvme multipathing code. - Series from Bart, including various cleanups and fixes for switch fall through case complaints" * 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits) kernfs: checking for IS_ERR() instead of NULL drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set drbd: Fix allyesconfig build, fix recent commit drbd: switch from kmalloc() to kmalloc_array() drbd: abort drbd_start_resync if there is no connection drbd: move global variables to drbd namespace and make some static drbd: rename "usermode_helper" to "drbd_usermode_helper" drbd: fix race between handshake and admin disconnect/down drbd: fix potential deadlock when trying to detach during handshake drbd: A single dot should be put into a sequence. drbd: fix rmmod cleanup, remove _all_ debugfs entries drbd: Use setup_timer() instead of init_timer() to simplify the code. drbd: fix potential get_ldev/put_ldev refcount imbalance during attach drbd: new disk-option disable-write-same drbd: Fix resource role for newly created resources in events2 drbd: mark symbols static where possible drbd: Send P_NEG_ACK upon write error in protocol != C drbd: add explicit plugging when submitting batches drbd: change list_for_each_safe to while(list_first_entry_or_null) drbd: introduce drbd_recv_header_maybe_unplug ...
2017-09-07Merge tag 'wberr-v4.14-1' of ↵Linus Torvalds1-2/+4
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull writeback error handling updates from Jeff Layton: "This pile continues the work from last cycle on better tracking writeback errors. In v4.13 we added some basic errseq_t infrastructure and converted a few filesystems to use it. This set continues refining that infrastructure, adds documentation, and converts most of the other filesystems to use it. The main exception at this point is the NFS client" * tag 'wberr-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: ecryptfs: convert to file_write_and_wait in ->fsync mm: remove optimizations based on i_size in mapping writeback waits fs: convert a pile of fsync routines to errseq_t based reporting gfs2: convert to errseq_t based writeback error reporting for fsync fs: convert sync_file_range to use errseq_t based error-tracking mm: add file_fdatawait_range and file_write_and_wait fuse: convert to errseq_t based error tracking for fsync mm: consolidate dax / non-dax checks for writeback Documentation: add some docs for errseq_t errseq: rename __errseq_set to errseq_set
2017-09-06Merge tag 'gfs2-4.14.fixes' of ↵Linus Torvalds20-110/+321
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull GFS2 updates from Bob Peterson: "We've got a whopping 29 GFS2 patches for this merge window, mainly because we held some back from the previous merge window until we could get them perfected and well tested. We have a couple patch sets, including my patch set for protecting glock gl_object and Andreas Gruenbacher's patch set to fix the long-standing shrink- slab hang, plus a bunch of assorted bugs and cleanups. Summary: - I fixed a bug whereby an IO error would lead to a double-brelse. - Andreas Gruenbacher made a minor cleanup to call his relatively new function, gfs2_holder_initialized, rather than doing it manually. This was just missed by a previous patch set. - Jan Kara fixed a bug whereby the SGID was being cleared when inheriting ACLs. - Andreas found a bug and fixed it in his previous patch, "Get rid of flush_delayed_work in gfs2_evict_inode". A call to flush_delayed_work was deleted from *gfs2_inode_lookup and added to gfs2_create_inode. - Wang Xibo found and fixed a list_add call in inode_go_lock that specified the parameters in the wrong order. - Coly Li submitted a patch to add the REQ_PRIO to some of GFS2's metadata reads that were accidentally missing them. - I submitted a 4-patch set to protect the glock gl_object field. GFS2 was setting and checking gl_object with no locking mechanism, so the value was occasionally stomped on, which caused file system corruption. - I submitted a small cleanup to function gfs2_clear_rgrpd. It was needlessly adding rgrp glocks to the lru list, then pulling them back off immediately. The rgrp glocks don't use the lru list anyway, so doing so was just a waste of time. - I submitted a patch that checks the GLOF_LRU flag on a glock before trying to remove it from the lru_list. This avoids a lot of unnecessary spin_lock contention. - I submitted a patch to delete GFS2's debugfs files only after we evict all the glocks. Before this patch, GFS2 would delete the debugfs files, and if unmount hung waiting for a glock, there was no way to debug the problem. Now, if a hang occurs during umount, we can examine the debugfs files to figure out why it's hung. - Andreas Gruenbacher submitted a patch to fix some trivial typos. - Andreas also submitted a five-part patch set to fix the longstanding hang involving the slab shrinker: dlm requires memory, calls the inode shrinker, which calls gfs2's evict, which calls back into DLM before it can evict an inode. - Abhi Das submitted a patch to forcibly flush the active items list to relieve memory pressure. This fixes a long-standing bug whereby GFS2 was getting hung permanently in balance_dirty_pages. - Thomas Tai submitted a patch to fix a slab corruption problem due to a residual pointer left in the lock_dlm lockstruct. - I submitted a patch to withdraw the file system if IO errors are encountered while writing to the journals or statfs system file which were previously not being sent back up. Before, some IO errors were sometimes not be detected for several hours, and at recovery time, the journal errors made journal replay impossible. - Andreas has a patch to fix an annoying format-truncation compiler warning so GFS2 compiles cleanly. - I have a patch that fixes a handful of sparse compiler warnings. - Andreas fixed up an useless gl_object warning caused by an earlier patch. - Arvind Yadav added a patch to properly constify our rhashtable params declare. - I added a patch to fix a regression caused by the non-recursive delete and truncate patch that caused file system blocks to not be properly freed. - Ernesto A. Fernández added a patch to fix a place where GFS2 would send back the wrong return code setting extended attributes. - Ernesto also added a patch to fix a case in which GFS2 was improperly setting an inode's i_mode, potentially granting access to the wrong users" * tag 'gfs2-4.14.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (29 commits) gfs2: preserve i_mode if __gfs2_set_acl() fails gfs2: don't return ENODATA in __gfs2_xattr_set unless replacing GFS2: Fix non-recursive truncate bug gfs2: constify rhashtable_params GFS2: Fix gl_object warnings GFS2: Fix up some sparse warnings gfs2: Silence gcc format-truncation warning GFS2: Withdraw for IO errors writing to the journal or statfs gfs2: fix slab corruption during mounting and umounting gfs file system gfs2: forcibly flush ail to relieve memory pressure gfs2: Clean up waiting on glocks gfs2: Defer deleting inodes under memory pressure gfs2: gfs2_evict_inode: Put glocks asynchronously gfs2: Get rid of gfs2_set_nlink gfs2: gfs2_glock_get: Wait on freeing glocks gfs2: Fix trivial typos GFS2: Delete debugfs files only after we evict the glocks GFS2: Don't waste time locking lru_lock for non-lru glocks GFS2: Don't bother trying to add rgrps to the lru list GFS2: Clear gl_object when deleting an inode in gfs2_delete_inode ...
2017-08-31gfs2: preserve i_mode if __gfs2_set_acl() failsErnesto A. Fernández1-5/+8
When changing a file's acl mask, __gfs2_set_acl() will first set the group bits of i_mode to the value of the mask, and only then set the actual extended attribute representing the new acl. If the second part fails (due to lack of space, for example) and the file had no acl attribute to begin with, the system will from now on assume that the mask permission bits are actual group permission bits, potentially granting access to the wrong users. Prevent this by only changing the inode mode after the acl has been set. Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-31gfs2: don't return ENODATA in __gfs2_xattr_set unless replacingErnesto A. Fernández1-2/+6
The function __gfs2_xattr_set() will return -ENODATA when called to remove a xattr that does not exist. The result is that setfacl will show an exit status of 1 when called to set only a file's mode bits (on a file with no ACLs), despite succeeding. A "No data available" error will be printed as well. To fix this return 0 instead, except when the XATTR_REPLACE flag is set, in which case -ENODATA is appropriate. This is consistent with how most other xattr setting functions work, in other filesystems. Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>