summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-06-10i3c: mctp: Add GETMWL/SETMWL and GETMRL/SETMRL supportHEADdev-5.15-intelOleksandr Shulzhenko1-1/+52
We need to set MRL/MWL to 69: 64 byte MCTP payload + 4 byte MCTP header + 1 byte PEC as per MCTP o. I3C spec paragraph 5.4.2. We also need to verify whether MCTP packet size meets the requirement and throw the error otherwise. Signed-off-by: Oleksandr Shulzhenko <oleksandr.shulzhenko.viktorovych@intel.com>
2022-06-10i3c: device: Add SETMWL/GETMWL and SETMRL/GETMRL supportOleksandr Shulzhenko4-4/+159
Add the GETMRL/SETMRL and GETMWL/SETMWL CCC commands functions. The functions are needed to configure maximum read/write length for the specific I3C device. Signed-off-by: Oleksandr Shulzhenko <oleksandr.shulzhenko.viktorovych@intel.com>
2022-06-09soc: aspeed: mctp: Add IOCTL to set own EIDIwona Winiarska2-1/+25
The driver doesn't implement MCTP logic, which means that it requires its clients to prepare valid MCTP over PCIe packets. Because we support MCTP stacks in both kernel and userspace, and since MCTP protocol is a part of userspace, we need userspace to update own EID information in kernel to provide it for in-kernel aspeed-mctp clients. Previously, own EID information was updated by userspace using SET_EID_INFO which now is deprecated and replaced by SET_EID_EXT_INFO to pass CPU EID and Domain ID mappings. To provide own EID information for in-kernel clients, add a dedicated IOCTL. Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
2022-06-09soc: aspeed: mctp: Mark GET_EID_INFO/SET_EID_INFO as deprecatedIwona Winiarska1-6/+8
ASPEED_MCTP_GET_EID_INFO and ASPEED_MCTP_SET_EID_INFO shouldn't be used in future projects - they are replaced by EXT_INFO variants. Update documentation for aspeed-mctp UAPI header to mark them as deprecated. Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
2022-06-06ARM: configs: intel_bmc: Enable PECI I3C driverOleksandr Shulzhenko1-0/+1
Enable PECI over I3C driver for I3C device. Signed-off-by: Oleksandr Shulzhenko <oleksandr.shulzhenko.viktorovych@intel.com>
2022-06-01peci: busses: add PECI o. MCTP o. I3C platform driverOleksandr Shulzhenko3-0/+268
Add PECI adapter that allows to send MCTP PCI VDM packets via i3c-mctp driver. Signed-off-by: Oleksandr Shulzhenko <oleksandr.shulzhenko.viktorovych@intel.com>
2022-06-01i3c: mctp: Add PECI-related functionality to I3C MCTP driverOleksandr Shulzhenko2-0/+66
- Handler for PECI messages have been added; - i3c_mctp_client and platform_device for PECI have been added. Signed-off-by: Oleksandr Shulzhenko <oleksandr.shulzhenko.viktorovych@intel.com>
2022-06-01i3c: mctp: Extend MCTP o. I3C driver with the TX/RX functionsOleksandr Shulzhenko2-60/+257
Extend the driver with the functionality necessary to send and receive MCTP over I3C packets by another component in kernel: - i3c_mctp_client concept needed to implement packet queue so that the driver can manage and classify packets for several clients respectively; - the function that provides MCTP target EID; - the functions to alloc and free mctp_i3c_packet; - the functions to send and receive MCTP packets over I3C. Signed-off-by: Oleksandr Shulzhenko <oleksandr.shulzhenko.viktorovych@intel.com>
2022-05-27Revert "i3c: mctp: Replace redundant MKDEV calls"Iwona Winiarska1-3/+4
This reverts commit 825fd72c22d729a4ad0d4d3ee488525710fe9877. Minor was passed as one of the arguments - it was not redundant and removing it introduces a regression. Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
2022-05-25ARM: configs: intel_bmc: Enable I3C MCTP driversIwona Winiarska1-0/+2
Enable I3C MCTP drivers for both Controller and Target. Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
2022-05-25i3c: mctp: Add I3C Target MCTP driverIwona Winiarska3-2/+400
Add a simple device driver that implements necessary system calls to support userspace MCTP communication when I3C HW is configured as I3C Target. Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
2022-05-25i3c: mctp: Replace redundant MKDEV callsIwona Winiarska1-4/+3
Since alloc_chrdev_region() returns devt as an output parameter, calling MKDEV() on it is a no-op. Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
2022-05-25i3c: mctp: Use ID instead of I3C devname in chardev nameIwona Winiarska1-1/+1
Using I3C device name in char device name makes it longer. There is no need to keep information about I3C device in name, since we have access to this information via sysfs. Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
2022-05-23hwmon: peci-dimmpower: Add Domain ID supportIwona Winiarska1-2/+2
Change peci-dimmpower hwmon device name to use both CPU ID and Domain ID. Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
2022-05-23hwmon: peci-cpupower: Add Domain ID supportIwona Winiarska1-2/+2
Change peci-cpupower hwmon device name to use both CPU ID and Domain ID. Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
2022-05-23hwmon: peci-dimmtemp: Add Domain ID supportIwona Winiarska1-4/+4
Change peci-dimmtemp hwmon device name to use both CPU ID and Domain ID. Use the corresponding Domain ID value in PECI commands. Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
2022-05-23hwmon: peci-cputemp: Add Domain ID supportIwona Winiarska1-5/+5
Change peci-cputemp hwmon device name to use both CPU ID and Domain ID. Use the corresponding Domain ID value in PECI commands. Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
2022-05-23peci: core: Use "domain" of-propertyIwona Winiarska1-1/+9
peci_client can be created dynamically using sysfs API or during build time using DTS. Let's use extended DTS properties to create peci_client corresponding to specific Domain ID. Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
2022-05-23dt-bindings: Extend peci-client bindings with Domain IDIwona Winiarska1-2/+9
Add "domain" field to specify the corresponding Domain ID for peci-client instance. Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
2022-05-23peci: Extend peci client with Domain ID informationIwona Winiarska4-18/+48
Since PECI Target addressing has been changed to use Domain ID information, each peci_client instance should correspond with both CPU ID and Domain ID. Let's extend peci_client structure, sysfs API and intel-peci-client to use Domain ID. Make MFD device ID as a combination of peci adapter number, CPU ID and Domain ID. Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
2022-05-23peci: core: Check PECI address immediatelyIwona Winiarska1-14/+17
PECI address parameter provided to create peci_client instance is not validated in the function that handles data passed from userspace or DTS, it is only done later on in peci_new_device(). Let's move PECI address verification before peci_new_device() is called. Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
2022-05-23clk: aspeed: Don't convert pointer-to-const-charIwona Winiarska1-1/+1
Using pointer-to-char for local variable to store pointer-to-const-char causes compilation warining: drivers/clk/clk-aspeed.c:574:16: warning: assignment discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers] 574 | parent_name = gd->parent_name; | Let's fix warning by changing local variable type to pointer-to-const-char. Fixes: 51181b65c8d6 ("Add high speed baud rate support for UART") Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
2022-05-23mtd: spi-nor: aspeed: Fix wrong print formatIwona Winiarska1-1/+1
Fix the following compilation warning: drivers/mtd/spi-nor/controllers/aspeed-smc.c:1334:17: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'unsigned int' [-Wformat=] 1334 | dev_info(dev, "tx width: %ld\n", spi_tx_width); | ^~~~~~~~~~~~~~~~~ Fixes: 9b4da1ccbb55 ("SPI Quad IO driver support AST2600") Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
2022-05-23peci: mctp: Remove useless class specifierIwona Winiarska1-1/+1
Remove "static" specifier for mctp_peci_vdm_hdr struct declaration to fix compilation warning: drivers/peci/busses/peci-mctp.c:45:1: warning: useless storage class specifier in empty declaration 45 | } __packed; | ^ Fixes: 00037aaa6440 ("peci: mctp: Add peci-mctp adapter") Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
2022-05-20i3c: mctp: Enable PECZbigniew Lukwinski1-0/+4
Following DMTF DSP0233 standard, MCTP over I3C requires PEC generation and validation for private read and write SDR transfers. To enable PEC, I3C driver PEC support is used. Signed-off-by: Zbigniew Lukwinski <zbigniew.lukwinski@linux.intel.com>
2022-05-20i3c: dw: master: Enable PEC support in hardwareZbigniew Lukwinski1-0/+6
DWC MIPI I3C Controller supports automatic PEC generation and validation for private write and read SDR transfers. This feature could be used for upper layers, e.g. MCTP over I3C to offload PEC handling to the hardware. PEC supported by DWC MIPI I3C Controller is described in JESD403-1. Signed-off-by: Zbigniew Lukwinski <zbigniew.lukwinski@linux.intel.com>
2022-05-20i3c: master: Enable support for PECZbigniew Lukwinski5-0/+45
PEC (Packet Error Check) support enabled in I3C controller driver. API for I3C device driver to control (enable or disable) PEC support in lower layer driver added. Signed-off-by: Zbigniew Lukwinski <zbigniew.lukwinski@linux.intel.com>
2022-05-12Merge commit '49caedb668e476c100d727f2174724e0610a2b92' of ↵Sujoy Ray2310-14136/+33286
https://github.com/openbmc/linux into openbmc/dev-5.15-intel-bump_v5.15.36 Signed-off-by: Sujoy Ray <sujoy.ray@intel.com>
2022-04-29i2c: mux: Add mux deselect support on timeoutArun P. Mohanan2-5/+11
Add support to deselect the mux when there is a timeout. The mux idle_state settings will be configured on startup. In case of MCTP it is MUX_IDLE_DISCONNECT. But when there is a timeout, mux ends up in connected position and the devices behind the mux will appear under different muxes connected to the same bus. This change fix the same. Signed-off-by: Arun P. Mohanan <arun.p.m@linux.intel.com>
2022-04-28Merge tag 'v5.15.36' into dev-5.15Joel Stanley163-701/+1036
This is the 5.15.36 stable release Signed-off-by: Joel Stanley <joel@jms.id.au>
2022-04-27Linux 5.15.36Greg Kroah-Hartman1-1/+1
Link: https://lore.kernel.org/r/20220426081747.286685339@linuxfoundation.org Tested-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Shuah Khan <skhan@linuxfoundation.org> Tested-by: Slade Watkins <slade@sladewatkins.com> Tested-by: Ron Economos <re@w6rz.net> Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27arm64: dts: qcom: add IPA qcom,qmp propertyAlex Elder3-0/+6
commit 73419e4d2fd1b838fcb1df6a978d67b3ae1c5c01 upstream. At least three platforms require the "qcom,qmp" property to be specified, so the IPA driver can request register retention across power collapse. Update DTS files accordingly. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/20220201140723.467431-1-elder@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27block/compat_ioctl: fix range check in BLKGETSIZEKhazhismel Kumykov1-1/+1
commit ccf16413e520164eb718cf8b22a30438da80ff23 upstream. kernel ulong and compat_ulong_t may not be same width. Use type directly to eliminate mismatches. This would result in truncation rather than EFBIG for 32bit mode for large disks. Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Khazhismel Kumykov <khazhy@google.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220414224056.2875681-1-khazhy@google.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27spi: atmel-quadspi: Fix the buswidth adjustment between spi-mem and controllerTudor Ambarus1-0/+3
commit 8c235cc25087495c4288d94f547e9d3061004991 upstream. Use the spi_mem_default_supports_op() core helper in order to take into account the buswidth specified by the user in device tree. Cc: <stable@vger.kernel.org> Fixes: 0e6aae08e9ae ("spi: Add QuadSPI driver for Atmel SAMA5D2") Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Link: https://lore.kernel.org/r/20220406133604.455356-1-tudor.ambarus@microchip.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27jbd2: fix a potential race while discarding reserved buffers after an abortYe Bin1-1/+3
commit 23e3d7f7061f8682c751c46512718f47580ad8f0 upstream. we got issue as follows: [ 72.796117] EXT4-fs error (device sda): ext4_journal_check_start:83: comm fallocate: Detected aborted journal [ 72.826847] EXT4-fs (sda): Remounting filesystem read-only fallocate: fallocate failed: Read-only file system [ 74.791830] jbd2_journal_commit_transaction: jh=0xffff9cfefe725d90 bh=0x0000000000000000 end delay [ 74.793597] ------------[ cut here ]------------ [ 74.794203] kernel BUG at fs/jbd2/transaction.c:2063! [ 74.794886] invalid opcode: 0000 [#1] PREEMPT SMP PTI [ 74.795533] CPU: 4 PID: 2260 Comm: jbd2/sda-8 Not tainted 5.17.0-rc8-next-20220315-dirty #150 [ 74.798327] RIP: 0010:__jbd2_journal_unfile_buffer+0x3e/0x60 [ 74.801971] RSP: 0018:ffffa828c24a3cb8 EFLAGS: 00010202 [ 74.802694] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 74.803601] RDX: 0000000000000001 RSI: ffff9cfefe725d90 RDI: ffff9cfefe725d90 [ 74.804554] RBP: ffff9cfefe725d90 R08: 0000000000000000 R09: ffffa828c24a3b20 [ 74.805471] R10: 0000000000000001 R11: 0000000000000001 R12: ffff9cfefe725d90 [ 74.806385] R13: ffff9cfefe725d98 R14: 0000000000000000 R15: ffff9cfe833a4d00 [ 74.807301] FS: 0000000000000000(0000) GS:ffff9d01afb00000(0000) knlGS:0000000000000000 [ 74.808338] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 74.809084] CR2: 00007f2b81bf4000 CR3: 0000000100056000 CR4: 00000000000006e0 [ 74.810047] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 74.810981] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 74.811897] Call Trace: [ 74.812241] <TASK> [ 74.812566] __jbd2_journal_refile_buffer+0x12f/0x180 [ 74.813246] jbd2_journal_refile_buffer+0x4c/0xa0 [ 74.813869] jbd2_journal_commit_transaction.cold+0xa1/0x148 [ 74.817550] kjournald2+0xf8/0x3e0 [ 74.819056] kthread+0x153/0x1c0 [ 74.819963] ret_from_fork+0x22/0x30 Above issue may happen as follows: write truncate kjournald2 generic_perform_write ext4_write_begin ext4_walk_page_buffers do_journal_get_write_access ->add BJ_Reserved list ext4_journalled_write_end ext4_walk_page_buffers write_end_fn ext4_handle_dirty_metadata ***************JBD2 ABORT************** jbd2_journal_dirty_metadata -> return -EROFS, jh in reserved_list jbd2_journal_commit_transaction while (commit_transaction->t_reserved_list) jh = commit_transaction->t_reserved_list; truncate_pagecache_range do_invalidatepage ext4_journalled_invalidatepage jbd2_journal_invalidatepage journal_unmap_buffer __dispose_buffer __jbd2_journal_unfile_buffer jbd2_journal_put_journal_head ->put last ref_count __journal_remove_journal_head bh->b_private = NULL; jh->b_bh = NULL; jbd2_journal_refile_buffer(journal, jh); bh = jh2bh(jh); ->bh is NULL, later will trigger null-ptr-deref journal_free_journal_head(jh); After commit 96f1e0974575, we no longer hold the j_state_lock while iterating over the list of reserved handles in jbd2_journal_commit_transaction(). This potentially allows the journal_head to be freed by journal_unmap_buffer while the commit codepath is also trying to free the BJ_Reserved buffers. Keeping j_state_lock held while trying extends hold time of the lock minimally, and solves this issue. Fixes: 96f1e0974575("jbd2: avoid long hold times of j_state_lock while committing a transaction") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220317142137.1821590-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27netfilter: nft_ct: fix use after free when attaching zone templateFlorian Westphal1-1/+4
commit 34243b9ec856309339172b1507379074156947e8 upstream. The conversion erroneously removed the refcount increment. In case we can use the percpu template, we need to increment the refcount, else it will be released when the skb gets freed. In case the slowpath is taken, the new template already has a refcount of 1. Fixes: 719774377622 ("netfilter: conntrack: convert to refcount_t api") Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27ext4: force overhead calculation if the s_overhead_cluster makes no senseTheodore Ts'o1-3/+12
commit 85d825dbf4899a69407338bae462a59aa9a37326 upstream. If the file system does not use bigalloc, calculating the overhead is cheap, so force the recalculation of the overhead so we don't have to trust the precalculated overhead in the superblock. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27ext4: fix overhead calculation to account for the reserved gdt blocksTheodore Ts'o1-1/+3
commit 10b01ee92df52c8d7200afead4d5e5f55a5c58b1 upstream. The kernel calculation was underestimating the overhead by not taking into account the reserved gdt blocks. With this change, the overhead calculated by the kernel matches the overhead calculation in mke2fs. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27ext4, doc: fix incorrect h_reserved sizewangjianjian (C)1-1/+1
commit 7102ffe4c166ca0f5e35137e9f9de83768c2d27d upstream. According to document and code, ext4_xattr_header's size is 32 bytes, so h_reserved size should be 3. Signed-off-by: Wang Jianjian <wangjianjian3@huawei.com> Link: https://lore.kernel.org/r/92fcc3a6-7d77-8c09-4126-377fcb4c46a5@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27ext4: limit length to bitmap_maxbytes - blocksize in punch_holeTadeusz Struk1-1/+10
commit 2da376228a2427501feb9d15815a45dbdbdd753e upstream. Syzbot found an issue [1] in ext4_fallocate(). The C reproducer [2] calls fallocate(), passing size 0xffeffeff000ul, and offset 0x1000000ul, which, when added together exceed the bitmap_maxbytes for the inode. This triggers a BUG in ext4_ind_remove_space(). According to the comments in this function the 'end' parameter needs to be one block after the last block to be removed. In the case when the BUG is triggered it points to the last block. Modify the ext4_punch_hole() function and add constraint that caps the length to satisfy the one before laster block requirement. LINK: [1] https://syzkaller.appspot.com/bug?id=b80bd9cf348aac724a4f4dff251800106d721331 LINK: [2] https://syzkaller.appspot.com/text?tag=ReproC&x=14ba0238700000 Fixes: a4bb6b64e39a ("ext4: enable "punch hole" functionality") Reported-by: syzbot+7a806094edd5d07ba029@syzkaller.appspotmail.com Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org> Link: https://lore.kernel.org/r/20220331200515.153214-1-tadeusz.struk@linaro.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27ext4: fix use-after-free in ext4_search_dirYe Bin2-2/+6
commit c186f0887fe7061a35cebef024550ec33ef8fbd8 upstream. We got issue as follows: EXT4-fs (loop0): mounted filesystem without journal. Opts: ,errors=continue ================================================================== BUG: KASAN: use-after-free in ext4_search_dir fs/ext4/namei.c:1394 [inline] BUG: KASAN: use-after-free in search_dirblock fs/ext4/namei.c:1199 [inline] BUG: KASAN: use-after-free in __ext4_find_entry+0xdca/0x1210 fs/ext4/namei.c:1553 Read of size 1 at addr ffff8881317c3005 by task syz-executor117/2331 CPU: 1 PID: 2331 Comm: syz-executor117 Not tainted 5.10.0+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:83 [inline] dump_stack+0x144/0x187 lib/dump_stack.c:124 print_address_description+0x7d/0x630 mm/kasan/report.c:387 __kasan_report+0x132/0x190 mm/kasan/report.c:547 kasan_report+0x47/0x60 mm/kasan/report.c:564 ext4_search_dir fs/ext4/namei.c:1394 [inline] search_dirblock fs/ext4/namei.c:1199 [inline] __ext4_find_entry+0xdca/0x1210 fs/ext4/namei.c:1553 ext4_lookup_entry fs/ext4/namei.c:1622 [inline] ext4_lookup+0xb8/0x3a0 fs/ext4/namei.c:1690 __lookup_hash+0xc5/0x190 fs/namei.c:1451 do_rmdir+0x19e/0x310 fs/namei.c:3760 do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x445e59 Code: 4d c7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 1b c7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fff2277fac8 EFLAGS: 00000246 ORIG_RAX: 0000000000000054 RAX: ffffffffffffffda RBX: 0000000000400280 RCX: 0000000000445e59 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000200000c0 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000002 R10: 00007fff2277f990 R11: 0000000000000246 R12: 0000000000000000 R13: 431bde82d7b634db R14: 0000000000000000 R15: 0000000000000000 The buggy address belongs to the page: page:0000000048cd3304 refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x1317c3 flags: 0x200000000000000() raw: 0200000000000000 ffffea0004526588 ffffea0004528088 0000000000000000 raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8881317c2f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8881317c2f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff8881317c3000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff8881317c3080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff8881317c3100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ================================================================== ext4_search_dir: ... de = (struct ext4_dir_entry_2 *)search_buf; dlimit = search_buf + buf_size; while ((char *) de < dlimit) { ... if ((char *) de + de->name_len <= dlimit && ext4_match(dir, fname, de)) { ... } ... de_len = ext4_rec_len_from_disk(de->rec_len, dir->i_sb->s_blocksize); if (de_len <= 0) return -1; offset += de_len; de = (struct ext4_dir_entry_2 *) ((char *) de + de_len); } Assume: de=0xffff8881317c2fff dlimit=0x0xffff8881317c3000 If read 'de->name_len' which address is 0xffff8881317c3005, obviously is out of range, then will trigger use-after-free. To solve this issue, 'dlimit' must reserve 8 bytes, as we will read 'de->name_len' to judge if '(char *) de + de->name_len' out of range. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220324064816.1209985-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27ext4: fix symlink file size not match to file contentYe Bin1-1/+3
commit a2b0b205d125f27cddfb4f7280e39affdaf46686 upstream. We got issue as follows: [home]# fsck.ext4 -fn ram0yb e2fsck 1.45.6 (20-Mar-2020) Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Symlink /p3/d14/d1a/l3d (inode #3494) is invalid. Clear? no Entry 'l3d' in /p3/d14/d1a (3383) has an incorrect filetype (was 7, should be 0). Fix? no As the symlink file size does not match the file content. If the writeback of the symlink data block failed, ext4_finish_bio() handles the end of IO. However this function fails to mark the buffer with BH_write_io_error and so when unmount does journal checkpoint it cannot detect the writeback error and will cleanup the journal. Thus we've lost the correct data in the journal area. To solve this issue, mark the buffer as BH_write_io_error in ext4_finish_bio(). Cc: stable@kernel.org Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220321144438.201685-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27ext4: fix fallocate to use file_modified to update permissions consistentlyDarrick J. Wong3-9/+32
commit ad5cd4f4ee4d5fcdb1bfb7a0c073072961e70783 upstream. Since the initial introduction of (posix) fallocate back at the turn of the century, it has been possible to use this syscall to change the user-visible contents of files. This can happen by extending the file size during a preallocation, or through any of the newer modes (punch, zero, collapse, insert range). Because the call can be used to change file contents, we should treat it like we do any other modification to a file -- update the mtime, and drop set[ug]id privileges/capabilities. The VFS function file_modified() does all this for us if pass it a locked inode, so let's make fallocate drop permissions correctly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20220308185043.GA117678@magnolia Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27netfilter: conntrack: avoid useless indirection during conntrack destructionFlorian Westphal3-8/+14
commit 6ae7989c9af0d98ab64196f4f4c6f6499454bd23 upstream. nf_ct_put() results in a usesless indirection: nf_ct_put -> nf_conntrack_put -> nf_conntrack_destroy -> rcu readlock + indirect call of ct_hooks->destroy(). There are two _put helpers: nf_ct_put and nf_conntrack_put. The latter is what should be used in code that MUST NOT cause a linker dependency on the conntrack module (e.g. calls from core network stack). Everyone else should call nf_ct_put() instead. A followup patch will convert a few nf_conntrack_put() calls to nf_ct_put(), in particular from modules that already have a conntrack dependency such as act_ct or even nf_conntrack itself. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27netfilter: conntrack: convert to refcount_t apiFlorian Westphal11-33/+27
commit 719774377622bc4025d2a74f551b5dc2158c6c30 upstream. Convert nf_conn reference counting from atomic_t to refcount_t based api. refcount_t api provides more runtime sanity checks and will warn on certain constructs, e.g. refcount_inc() on a zero reference count, which usually indicates use-after-free. For this reason template allocation is changed to init the refcount to 1, the subsequenct add operations are removed. Likewise, init_conntrack() is changed to set the initial refcount to 1 instead refcount_inc(). This is safe because the new entry is not (yet) visible to other cpus. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27KVM: SVM: Flush when freeing encrypted pages even on SME_COHERENT CPUsMingwei Zhang1-3/+6
commit d45829b351ee6ec5f54dd55e6aca1f44fe239fe6 upstream. Use clflush_cache_range() to flush the confidential memory when SME_COHERENT is supported in AMD CPU. Cache flush is still needed since SME_COHERENT only support cache invalidation at CPU side. All confidential cache lines are still incoherent with DMA devices. Cc: stable@vger.kerel.org Fixes: add5e2f04541 ("KVM: SVM: Add support for the SEV-ES VMSA") Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Mingwei Zhang <mizhang@google.com> Message-Id: <20220421031407.2516575-3-mizhang@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27KVM: nVMX: Defer APICv updates while L2 is active until L1 is activeSean Christopherson3-0/+11
commit 7c69661e225cc484fbf44a0b99b56714a5241ae3 upstream. Defer APICv updates that occur while L2 is active until nested VM-Exit, i.e. until L1 regains control. vmx_refresh_apicv_exec_ctrl() assumes L1 is active and (a) stomps all over vmcs02 and (b) neglects to ever updated vmcs01. E.g. if vmcs12 doesn't enable the TPR shadow for L2 (and thus no APICv controls), L1 performs nested VM-Enter APICv inhibited, and APICv becomes unhibited while L2 is active, KVM will set various APICv controls in vmcs02 and trigger a failed VM-Entry. The kicker is that, unless running with nested_early_check=1, KVM blames L1 and chaos ensues. In all cases, ignoring vmcs02 and always deferring the inhibition change to vmcs01 is correct (or at least acceptable). The ABSENT and DISABLE inhibitions cannot truly change while L2 is active (see below). IRQ_BLOCKING can change, but it is firmly a best effort debug feature. Furthermore, only L2's APIC is accelerated/virtualized to the full extent possible, e.g. even if L1 passes through its APIC to L2, normal MMIO/MSR interception will apply to the virtual APIC managed by KVM. The exception is the SELF_IPI register when x2APIC is enabled, but that's an acceptable hole. Lastly, Hyper-V's Auto EOI can technically be toggled if L1 exposes the MSRs to L2, but for that to work in any sane capacity, L1 would need to pass through IRQs to L2 as well, and IRQs must be intercepted to enable virtual interrupt delivery. I.e. exposing Auto EOI to L2 and enabling VID for L2 are, for all intents and purposes, mutually exclusive. Lack of dynamic toggling is also why this scenario is all but impossible to encounter in KVM's current form. But a future patch will pend an APICv update request _during_ vCPU creation to plug a race where a vCPU that's being created doesn't get included in the "all vCPUs request" because it's not yet visible to other vCPUs. If userspaces restores L2 after VM creation (hello, KVM selftests), the first KVM_RUN will occur while L2 is active and thus service the APICv update request made during VM creation. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220420013732.3308816-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27KVM: x86: Pend KVM_REQ_APICV_UPDATE during vCPU creation to fix a raceSean Christopherson1-1/+14
commit 423ecfea77dda83823c71b0fad1c2ddb2af1e5fc upstream. Make a KVM_REQ_APICV_UPDATE request when creating a vCPU with an in-kernel local APIC and APICv enabled at the module level. Consuming kvm_apicv_activated() and stuffing vcpu->arch.apicv_active directly can race with __kvm_set_or_clear_apicv_inhibit(), as vCPU creation happens before the vCPU is fully onlined, i.e. it won't get the request made to "all" vCPUs. If APICv is globally inhibited between setting apicv_active and onlining the vCPU, the vCPU will end up running with APICv enabled and trigger KVM's sanity check. Mark APICv as active during vCPU creation if APICv is enabled at the module level, both to be optimistic about it's final state, e.g. to avoid additional VMWRITEs on VMX, and because there are likely bugs lurking since KVM checks apicv_active in multiple vCPU creation paths. While keeping the current behavior of consuming kvm_apicv_activated() is arguably safer from a regression perspective, force apicv_active so that vCPU creation runs with deterministic state and so that if there are bugs, they are found sooner than later, i.e. not when some crazy race condition is hit. WARNING: CPU: 0 PID: 484 at arch/x86/kvm/x86.c:9877 vcpu_enter_guest+0x2ae3/0x3ee0 arch/x86/kvm/x86.c:9877 Modules linked in: CPU: 0 PID: 484 Comm: syz-executor361 Not tainted 5.16.13 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1~cloud0 04/01/2014 RIP: 0010:vcpu_enter_guest+0x2ae3/0x3ee0 arch/x86/kvm/x86.c:9877 Call Trace: <TASK> vcpu_run arch/x86/kvm/x86.c:10039 [inline] kvm_arch_vcpu_ioctl_run+0x337/0x15e0 arch/x86/kvm/x86.c:10234 kvm_vcpu_ioctl+0x4d2/0xc80 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3727 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:874 [inline] __se_sys_ioctl fs/ioctl.c:860 [inline] __x64_sys_ioctl+0x16d/0x1d0 fs/ioctl.c:860 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae The bug was hit by a syzkaller spamming VM creation with 2 vCPUs and a call to KVM_SET_GUEST_DEBUG. r0 = openat$kvm(0xffffffffffffff9c, &(0x7f0000000000), 0x0, 0x0) r1 = ioctl$KVM_CREATE_VM(r0, 0xae01, 0x0) ioctl$KVM_CAP_SPLIT_IRQCHIP(r1, 0x4068aea3, &(0x7f0000000000)) (async) r2 = ioctl$KVM_CREATE_VCPU(r1, 0xae41, 0x0) (async) r3 = ioctl$KVM_CREATE_VCPU(r1, 0xae41, 0x400000000000002) ioctl$KVM_SET_GUEST_DEBUG(r3, 0x4048ae9b, &(0x7f00000000c0)={0x5dda9c14aa95f5c5}) ioctl$KVM_RUN(r2, 0xae80, 0x0) Reported-by: Gaoning Pan <pgn@zju.edu.cn> Reported-by: Yongkang Jia <kangel@zju.edu.cn> Fixes: 8df14af42f00 ("kvm: x86: Add support for dynamic APICv activation") Cc: stable@vger.kernel.org Cc: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220420013732.3308816-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27KVM: x86/pmu: Update AMD PMC sample period to fix guest NMI-watchdogLike Xu3-6/+12
commit 75189d1de1b377e580ebd2d2c55914631eac9c64 upstream. NMI-watchdog is one of the favorite features of kernel developers, but it does not work in AMD guest even with vPMU enabled and worse, the system misrepresents this capability via /proc. This is a PMC emulation error. KVM does not pass the latest valid value to perf_event in time when guest NMI-watchdog is running, thus the perf_event corresponding to the watchdog counter will enter the old state at some point after the first guest NMI injection, forcing the hardware register PMC0 to be constantly written to 0x800000000001. Meanwhile, the running counter should accurately reflect its new value based on the latest coordinated pmc->counter (from vPMC's point of view) rather than the value written directly by the guest. Fixes: 168d918f2643 ("KVM: x86: Adjust counter sample period after a wrmsr") Reported-by: Dongli Cao <caodongli@kingsoft.com> Signed-off-by: Like Xu <likexu@tencent.com> Reviewed-by: Yanan Wang <wangyanan55@huawei.com> Tested-by: Yanan Wang <wangyanan55@huawei.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20220409015226.38619-1-likexu@tencent.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27arm_pmu: Validate single/group leader eventsRob Herring1-6/+4
commit e5c23779f93d45e39a52758ca593bd7e62e9b4be upstream. In the case where there is only a cycle counter available (i.e. PMCR_EL0.N is 0) and an event other than CPU cycles is opened, the open should fail as the event can never possibly be scheduled. However, the event validation when an event is opened is skipped when the group leader is opened. Fix this by always validating the group leader events. Reported-by: Al Grant <al.grant@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Rob Herring <robh@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20220408203330.4014015-1-robh@kernel.org Cc: <stable@vger.kernel.org> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>