Age | Commit message (Collapse) | Author | Files | Lines |
|
We need to set MRL/MWL to 69: 64 byte MCTP payload + 4 byte MCTP header
+ 1 byte PEC as per MCTP o. I3C spec paragraph 5.4.2.
We also need to verify whether MCTP packet size meets the requirement and
throw the error otherwise.
Signed-off-by: Oleksandr Shulzhenko <oleksandr.shulzhenko.viktorovych@intel.com>
|
|
Add the GETMRL/SETMRL and GETMWL/SETMWL CCC commands functions.
The functions are needed to configure maximum read/write length
for the specific I3C device.
Signed-off-by: Oleksandr Shulzhenko <oleksandr.shulzhenko.viktorovych@intel.com>
|
|
The driver doesn't implement MCTP logic, which means that it requires
its clients to prepare valid MCTP over PCIe packets.
Because we support MCTP stacks in both kernel and userspace, and since
MCTP protocol is a part of userspace, we need userspace to update own
EID information in kernel to provide it for in-kernel aspeed-mctp
clients.
Previously, own EID information was updated by userspace using
SET_EID_INFO which now is deprecated and replaced by SET_EID_EXT_INFO to
pass CPU EID and Domain ID mappings.
To provide own EID information for in-kernel clients, add a dedicated
IOCTL.
Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
|
|
ASPEED_MCTP_GET_EID_INFO and ASPEED_MCTP_SET_EID_INFO shouldn't be used
in future projects - they are replaced by EXT_INFO variants.
Update documentation for aspeed-mctp UAPI header to mark them as
deprecated.
Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
|
|
Enable PECI over I3C driver for I3C device.
Signed-off-by: Oleksandr Shulzhenko <oleksandr.shulzhenko.viktorovych@intel.com>
|
|
Add PECI adapter that allows to send MCTP PCI VDM packets
via i3c-mctp driver.
Signed-off-by: Oleksandr Shulzhenko <oleksandr.shulzhenko.viktorovych@intel.com>
|
|
- Handler for PECI messages have been added;
- i3c_mctp_client and platform_device for PECI have been added.
Signed-off-by: Oleksandr Shulzhenko <oleksandr.shulzhenko.viktorovych@intel.com>
|
|
Extend the driver with the functionality necessary to send
and receive MCTP over I3C packets by another component in kernel:
- i3c_mctp_client concept needed to implement packet queue so that
the driver can manage and classify packets for several clients
respectively;
- the function that provides MCTP target EID;
- the functions to alloc and free mctp_i3c_packet;
- the functions to send and receive MCTP packets over I3C.
Signed-off-by: Oleksandr Shulzhenko <oleksandr.shulzhenko.viktorovych@intel.com>
|
|
This reverts commit 825fd72c22d729a4ad0d4d3ee488525710fe9877.
Minor was passed as one of the arguments - it was not redundant and
removing it introduces a regression.
Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
|
|
Enable I3C MCTP drivers for both Controller and Target.
Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
|
|
Add a simple device driver that implements necessary system calls to
support userspace MCTP communication when I3C HW is configured as I3C
Target.
Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
|
|
Since alloc_chrdev_region() returns devt as an output parameter, calling
MKDEV() on it is a no-op.
Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
|
|
Using I3C device name in char device name makes it longer.
There is no need to keep information about I3C device in name, since we
have access to this information via sysfs.
Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
|
|
Change peci-dimmpower hwmon device name to use both CPU ID and Domain ID.
Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
|
|
Change peci-cpupower hwmon device name to use both CPU ID and Domain ID.
Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
|
|
Change peci-dimmtemp hwmon device name to use both CPU ID and Domain ID.
Use the corresponding Domain ID value in PECI commands.
Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
|
|
Change peci-cputemp hwmon device name to use both CPU ID and Domain ID.
Use the corresponding Domain ID value in PECI commands.
Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
|
|
peci_client can be created dynamically using sysfs API or during build
time using DTS.
Let's use extended DTS properties to create peci_client corresponding to
specific Domain ID.
Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
|
|
Add "domain" field to specify the corresponding Domain ID for
peci-client instance.
Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
|
|
Since PECI Target addressing has been changed to use Domain ID
information, each peci_client instance should correspond with both CPU
ID and Domain ID.
Let's extend peci_client structure, sysfs API and intel-peci-client to
use Domain ID. Make MFD device ID as a combination of peci adapter
number, CPU ID and Domain ID.
Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
|
|
PECI address parameter provided to create peci_client instance is not
validated in the function that handles data passed from userspace or
DTS, it is only done later on in peci_new_device().
Let's move PECI address verification before peci_new_device() is called.
Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
|
|
Using pointer-to-char for local variable to store pointer-to-const-char
causes compilation warining:
drivers/clk/clk-aspeed.c:574:16: warning: assignment discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
574 | parent_name = gd->parent_name;
|
Let's fix warning by changing local variable type to pointer-to-const-char.
Fixes: 51181b65c8d6 ("Add high speed baud rate support for UART")
Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
|
|
Fix the following compilation warning:
drivers/mtd/spi-nor/controllers/aspeed-smc.c:1334:17: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'unsigned int' [-Wformat=]
1334 | dev_info(dev, "tx width: %ld\n", spi_tx_width);
| ^~~~~~~~~~~~~~~~~
Fixes: 9b4da1ccbb55 ("SPI Quad IO driver support AST2600")
Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
|
|
Remove "static" specifier for mctp_peci_vdm_hdr struct declaration to
fix compilation warning:
drivers/peci/busses/peci-mctp.c:45:1: warning: useless storage class specifier in empty declaration
45 | } __packed;
| ^
Fixes: 00037aaa6440 ("peci: mctp: Add peci-mctp adapter")
Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
|
|
Following DMTF DSP0233 standard, MCTP over I3C requires PEC generation
and validation for private read and write SDR transfers. To enable PEC,
I3C driver PEC support is used.
Signed-off-by: Zbigniew Lukwinski <zbigniew.lukwinski@linux.intel.com>
|
|
DWC MIPI I3C Controller supports automatic PEC generation and
validation for private write and read SDR transfers. This feature
could be used for upper layers, e.g. MCTP over I3C to offload PEC
handling to the hardware.
PEC supported by DWC MIPI I3C Controller is described in JESD403-1.
Signed-off-by: Zbigniew Lukwinski <zbigniew.lukwinski@linux.intel.com>
|
|
PEC (Packet Error Check) support enabled in I3C controller driver. API
for I3C device driver to control (enable or disable) PEC support in
lower layer driver added.
Signed-off-by: Zbigniew Lukwinski <zbigniew.lukwinski@linux.intel.com>
|
|
https://github.com/openbmc/linux into openbmc/dev-5.15-intel-bump_v5.15.36
Signed-off-by: Sujoy Ray <sujoy.ray@intel.com>
|
|
Add support to deselect the mux when there is a timeout.
The mux idle_state settings will be configured on startup. In case of
MCTP it is MUX_IDLE_DISCONNECT. But when there is a timeout, mux ends
up in connected position and the devices behind the mux will appear under
different muxes connected to the same bus. This change fix the same.
Signed-off-by: Arun P. Mohanan <arun.p.m@linux.intel.com>
|
|
This is the 5.15.36 stable release
Signed-off-by: Joel Stanley <joel@jms.id.au>
|
|
Link: https://lore.kernel.org/r/20220426081747.286685339@linuxfoundation.org
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Shuah Khan <skhan@linuxfoundation.org>
Tested-by: Slade Watkins <slade@sladewatkins.com>
Tested-by: Ron Economos <re@w6rz.net>
Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 73419e4d2fd1b838fcb1df6a978d67b3ae1c5c01 upstream.
At least three platforms require the "qcom,qmp" property to be
specified, so the IPA driver can request register retention across
power collapse. Update DTS files accordingly.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20220201140723.467431-1-elder@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ccf16413e520164eb718cf8b22a30438da80ff23 upstream.
kernel ulong and compat_ulong_t may not be same width. Use type directly
to eliminate mismatches.
This would result in truncation rather than EFBIG for 32bit mode for
large disks.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220414224056.2875681-1-khazhy@google.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8c235cc25087495c4288d94f547e9d3061004991 upstream.
Use the spi_mem_default_supports_op() core helper in order to take into
account the buswidth specified by the user in device tree.
Cc: <stable@vger.kernel.org>
Fixes: 0e6aae08e9ae ("spi: Add QuadSPI driver for Atmel SAMA5D2")
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Link: https://lore.kernel.org/r/20220406133604.455356-1-tudor.ambarus@microchip.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 23e3d7f7061f8682c751c46512718f47580ad8f0 upstream.
we got issue as follows:
[ 72.796117] EXT4-fs error (device sda): ext4_journal_check_start:83: comm fallocate: Detected aborted journal
[ 72.826847] EXT4-fs (sda): Remounting filesystem read-only
fallocate: fallocate failed: Read-only file system
[ 74.791830] jbd2_journal_commit_transaction: jh=0xffff9cfefe725d90 bh=0x0000000000000000 end delay
[ 74.793597] ------------[ cut here ]------------
[ 74.794203] kernel BUG at fs/jbd2/transaction.c:2063!
[ 74.794886] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[ 74.795533] CPU: 4 PID: 2260 Comm: jbd2/sda-8 Not tainted 5.17.0-rc8-next-20220315-dirty #150
[ 74.798327] RIP: 0010:__jbd2_journal_unfile_buffer+0x3e/0x60
[ 74.801971] RSP: 0018:ffffa828c24a3cb8 EFLAGS: 00010202
[ 74.802694] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 74.803601] RDX: 0000000000000001 RSI: ffff9cfefe725d90 RDI: ffff9cfefe725d90
[ 74.804554] RBP: ffff9cfefe725d90 R08: 0000000000000000 R09: ffffa828c24a3b20
[ 74.805471] R10: 0000000000000001 R11: 0000000000000001 R12: ffff9cfefe725d90
[ 74.806385] R13: ffff9cfefe725d98 R14: 0000000000000000 R15: ffff9cfe833a4d00
[ 74.807301] FS: 0000000000000000(0000) GS:ffff9d01afb00000(0000) knlGS:0000000000000000
[ 74.808338] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 74.809084] CR2: 00007f2b81bf4000 CR3: 0000000100056000 CR4: 00000000000006e0
[ 74.810047] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 74.810981] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 74.811897] Call Trace:
[ 74.812241] <TASK>
[ 74.812566] __jbd2_journal_refile_buffer+0x12f/0x180
[ 74.813246] jbd2_journal_refile_buffer+0x4c/0xa0
[ 74.813869] jbd2_journal_commit_transaction.cold+0xa1/0x148
[ 74.817550] kjournald2+0xf8/0x3e0
[ 74.819056] kthread+0x153/0x1c0
[ 74.819963] ret_from_fork+0x22/0x30
Above issue may happen as follows:
write truncate kjournald2
generic_perform_write
ext4_write_begin
ext4_walk_page_buffers
do_journal_get_write_access ->add BJ_Reserved list
ext4_journalled_write_end
ext4_walk_page_buffers
write_end_fn
ext4_handle_dirty_metadata
***************JBD2 ABORT**************
jbd2_journal_dirty_metadata
-> return -EROFS, jh in reserved_list
jbd2_journal_commit_transaction
while (commit_transaction->t_reserved_list)
jh = commit_transaction->t_reserved_list;
truncate_pagecache_range
do_invalidatepage
ext4_journalled_invalidatepage
jbd2_journal_invalidatepage
journal_unmap_buffer
__dispose_buffer
__jbd2_journal_unfile_buffer
jbd2_journal_put_journal_head ->put last ref_count
__journal_remove_journal_head
bh->b_private = NULL;
jh->b_bh = NULL;
jbd2_journal_refile_buffer(journal, jh);
bh = jh2bh(jh);
->bh is NULL, later will trigger null-ptr-deref
journal_free_journal_head(jh);
After commit 96f1e0974575, we no longer hold the j_state_lock while
iterating over the list of reserved handles in
jbd2_journal_commit_transaction(). This potentially allows the
journal_head to be freed by journal_unmap_buffer while the commit
codepath is also trying to free the BJ_Reserved buffers. Keeping
j_state_lock held while trying extends hold time of the lock
minimally, and solves this issue.
Fixes: 96f1e0974575("jbd2: avoid long hold times of j_state_lock while committing a transaction")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220317142137.1821590-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 34243b9ec856309339172b1507379074156947e8 upstream.
The conversion erroneously removed the refcount increment.
In case we can use the percpu template, we need to increment
the refcount, else it will be released when the skb gets freed.
In case the slowpath is taken, the new template already has a
refcount of 1.
Fixes: 719774377622 ("netfilter: conntrack: convert to refcount_t api")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 85d825dbf4899a69407338bae462a59aa9a37326 upstream.
If the file system does not use bigalloc, calculating the overhead is
cheap, so force the recalculation of the overhead so we don't have to
trust the precalculated overhead in the superblock.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 10b01ee92df52c8d7200afead4d5e5f55a5c58b1 upstream.
The kernel calculation was underestimating the overhead by not taking
into account the reserved gdt blocks. With this change, the overhead
calculated by the kernel matches the overhead calculation in mke2fs.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7102ffe4c166ca0f5e35137e9f9de83768c2d27d upstream.
According to document and code, ext4_xattr_header's size is 32 bytes, so
h_reserved size should be 3.
Signed-off-by: Wang Jianjian <wangjianjian3@huawei.com>
Link: https://lore.kernel.org/r/92fcc3a6-7d77-8c09-4126-377fcb4c46a5@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2da376228a2427501feb9d15815a45dbdbdd753e upstream.
Syzbot found an issue [1] in ext4_fallocate().
The C reproducer [2] calls fallocate(), passing size 0xffeffeff000ul,
and offset 0x1000000ul, which, when added together exceed the
bitmap_maxbytes for the inode. This triggers a BUG in
ext4_ind_remove_space(). According to the comments in this function
the 'end' parameter needs to be one block after the last block to be
removed. In the case when the BUG is triggered it points to the last
block. Modify the ext4_punch_hole() function and add constraint that
caps the length to satisfy the one before laster block requirement.
LINK: [1] https://syzkaller.appspot.com/bug?id=b80bd9cf348aac724a4f4dff251800106d721331
LINK: [2] https://syzkaller.appspot.com/text?tag=ReproC&x=14ba0238700000
Fixes: a4bb6b64e39a ("ext4: enable "punch hole" functionality")
Reported-by: syzbot+7a806094edd5d07ba029@syzkaller.appspotmail.com
Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Link: https://lore.kernel.org/r/20220331200515.153214-1-tadeusz.struk@linaro.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c186f0887fe7061a35cebef024550ec33ef8fbd8 upstream.
We got issue as follows:
EXT4-fs (loop0): mounted filesystem without journal. Opts: ,errors=continue
==================================================================
BUG: KASAN: use-after-free in ext4_search_dir fs/ext4/namei.c:1394 [inline]
BUG: KASAN: use-after-free in search_dirblock fs/ext4/namei.c:1199 [inline]
BUG: KASAN: use-after-free in __ext4_find_entry+0xdca/0x1210 fs/ext4/namei.c:1553
Read of size 1 at addr ffff8881317c3005 by task syz-executor117/2331
CPU: 1 PID: 2331 Comm: syz-executor117 Not tainted 5.10.0+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
Call Trace:
__dump_stack lib/dump_stack.c:83 [inline]
dump_stack+0x144/0x187 lib/dump_stack.c:124
print_address_description+0x7d/0x630 mm/kasan/report.c:387
__kasan_report+0x132/0x190 mm/kasan/report.c:547
kasan_report+0x47/0x60 mm/kasan/report.c:564
ext4_search_dir fs/ext4/namei.c:1394 [inline]
search_dirblock fs/ext4/namei.c:1199 [inline]
__ext4_find_entry+0xdca/0x1210 fs/ext4/namei.c:1553
ext4_lookup_entry fs/ext4/namei.c:1622 [inline]
ext4_lookup+0xb8/0x3a0 fs/ext4/namei.c:1690
__lookup_hash+0xc5/0x190 fs/namei.c:1451
do_rmdir+0x19e/0x310 fs/namei.c:3760
do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x445e59
Code: 4d c7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 1b c7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fff2277fac8 EFLAGS: 00000246 ORIG_RAX: 0000000000000054
RAX: ffffffffffffffda RBX: 0000000000400280 RCX: 0000000000445e59
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000200000c0
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000002
R10: 00007fff2277f990 R11: 0000000000000246 R12: 0000000000000000
R13: 431bde82d7b634db R14: 0000000000000000 R15: 0000000000000000
The buggy address belongs to the page:
page:0000000048cd3304 refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x1317c3
flags: 0x200000000000000()
raw: 0200000000000000 ffffea0004526588 ffffea0004528088 0000000000000000
raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff8881317c2f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff8881317c2f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff8881317c3000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
^
ffff8881317c3080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ffff8881317c3100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================
ext4_search_dir:
...
de = (struct ext4_dir_entry_2 *)search_buf;
dlimit = search_buf + buf_size;
while ((char *) de < dlimit) {
...
if ((char *) de + de->name_len <= dlimit &&
ext4_match(dir, fname, de)) {
...
}
...
de_len = ext4_rec_len_from_disk(de->rec_len, dir->i_sb->s_blocksize);
if (de_len <= 0)
return -1;
offset += de_len;
de = (struct ext4_dir_entry_2 *) ((char *) de + de_len);
}
Assume:
de=0xffff8881317c2fff
dlimit=0x0xffff8881317c3000
If read 'de->name_len' which address is 0xffff8881317c3005, obviously is
out of range, then will trigger use-after-free.
To solve this issue, 'dlimit' must reserve 8 bytes, as we will read
'de->name_len' to judge if '(char *) de + de->name_len' out of range.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220324064816.1209985-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a2b0b205d125f27cddfb4f7280e39affdaf46686 upstream.
We got issue as follows:
[home]# fsck.ext4 -fn ram0yb
e2fsck 1.45.6 (20-Mar-2020)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Symlink /p3/d14/d1a/l3d (inode #3494) is invalid.
Clear? no
Entry 'l3d' in /p3/d14/d1a (3383) has an incorrect filetype (was 7, should be 0).
Fix? no
As the symlink file size does not match the file content. If the writeback
of the symlink data block failed, ext4_finish_bio() handles the end of IO.
However this function fails to mark the buffer with BH_write_io_error and
so when unmount does journal checkpoint it cannot detect the writeback
error and will cleanup the journal. Thus we've lost the correct data in the
journal area. To solve this issue, mark the buffer as BH_write_io_error in
ext4_finish_bio().
Cc: stable@kernel.org
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220321144438.201685-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ad5cd4f4ee4d5fcdb1bfb7a0c073072961e70783 upstream.
Since the initial introduction of (posix) fallocate back at the turn of
the century, it has been possible to use this syscall to change the
user-visible contents of files. This can happen by extending the file
size during a preallocation, or through any of the newer modes (punch,
zero, collapse, insert range). Because the call can be used to change
file contents, we should treat it like we do any other modification to a
file -- update the mtime, and drop set[ug]id privileges/capabilities.
The VFS function file_modified() does all this for us if pass it a
locked inode, so let's make fallocate drop permissions correctly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20220308185043.GA117678@magnolia
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6ae7989c9af0d98ab64196f4f4c6f6499454bd23 upstream.
nf_ct_put() results in a usesless indirection:
nf_ct_put -> nf_conntrack_put -> nf_conntrack_destroy -> rcu readlock +
indirect call of ct_hooks->destroy().
There are two _put helpers:
nf_ct_put and nf_conntrack_put. The latter is what should be used in
code that MUST NOT cause a linker dependency on the conntrack module
(e.g. calls from core network stack).
Everyone else should call nf_ct_put() instead.
A followup patch will convert a few nf_conntrack_put() calls to
nf_ct_put(), in particular from modules that already have a conntrack
dependency such as act_ct or even nf_conntrack itself.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 719774377622bc4025d2a74f551b5dc2158c6c30 upstream.
Convert nf_conn reference counting from atomic_t to refcount_t based api.
refcount_t api provides more runtime sanity checks and will warn on
certain constructs, e.g. refcount_inc() on a zero reference count, which
usually indicates use-after-free.
For this reason template allocation is changed to init the refcount to
1, the subsequenct add operations are removed.
Likewise, init_conntrack() is changed to set the initial refcount to 1
instead refcount_inc().
This is safe because the new entry is not (yet) visible to other cpus.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d45829b351ee6ec5f54dd55e6aca1f44fe239fe6 upstream.
Use clflush_cache_range() to flush the confidential memory when
SME_COHERENT is supported in AMD CPU. Cache flush is still needed since
SME_COHERENT only support cache invalidation at CPU side. All confidential
cache lines are still incoherent with DMA devices.
Cc: stable@vger.kerel.org
Fixes: add5e2f04541 ("KVM: SVM: Add support for the SEV-ES VMSA")
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Message-Id: <20220421031407.2516575-3-mizhang@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7c69661e225cc484fbf44a0b99b56714a5241ae3 upstream.
Defer APICv updates that occur while L2 is active until nested VM-Exit,
i.e. until L1 regains control. vmx_refresh_apicv_exec_ctrl() assumes L1
is active and (a) stomps all over vmcs02 and (b) neglects to ever updated
vmcs01. E.g. if vmcs12 doesn't enable the TPR shadow for L2 (and thus no
APICv controls), L1 performs nested VM-Enter APICv inhibited, and APICv
becomes unhibited while L2 is active, KVM will set various APICv controls
in vmcs02 and trigger a failed VM-Entry. The kicker is that, unless
running with nested_early_check=1, KVM blames L1 and chaos ensues.
In all cases, ignoring vmcs02 and always deferring the inhibition change
to vmcs01 is correct (or at least acceptable). The ABSENT and DISABLE
inhibitions cannot truly change while L2 is active (see below).
IRQ_BLOCKING can change, but it is firmly a best effort debug feature.
Furthermore, only L2's APIC is accelerated/virtualized to the full extent
possible, e.g. even if L1 passes through its APIC to L2, normal MMIO/MSR
interception will apply to the virtual APIC managed by KVM.
The exception is the SELF_IPI register when x2APIC is enabled, but that's
an acceptable hole.
Lastly, Hyper-V's Auto EOI can technically be toggled if L1 exposes the
MSRs to L2, but for that to work in any sane capacity, L1 would need to
pass through IRQs to L2 as well, and IRQs must be intercepted to enable
virtual interrupt delivery. I.e. exposing Auto EOI to L2 and enabling
VID for L2 are, for all intents and purposes, mutually exclusive.
Lack of dynamic toggling is also why this scenario is all but impossible
to encounter in KVM's current form. But a future patch will pend an
APICv update request _during_ vCPU creation to plug a race where a vCPU
that's being created doesn't get included in the "all vCPUs request"
because it's not yet visible to other vCPUs. If userspaces restores L2
after VM creation (hello, KVM selftests), the first KVM_RUN will occur
while L2 is active and thus service the APICv update request made during
VM creation.
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220420013732.3308816-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 423ecfea77dda83823c71b0fad1c2ddb2af1e5fc upstream.
Make a KVM_REQ_APICV_UPDATE request when creating a vCPU with an
in-kernel local APIC and APICv enabled at the module level. Consuming
kvm_apicv_activated() and stuffing vcpu->arch.apicv_active directly can
race with __kvm_set_or_clear_apicv_inhibit(), as vCPU creation happens
before the vCPU is fully onlined, i.e. it won't get the request made to
"all" vCPUs. If APICv is globally inhibited between setting apicv_active
and onlining the vCPU, the vCPU will end up running with APICv enabled
and trigger KVM's sanity check.
Mark APICv as active during vCPU creation if APICv is enabled at the
module level, both to be optimistic about it's final state, e.g. to avoid
additional VMWRITEs on VMX, and because there are likely bugs lurking
since KVM checks apicv_active in multiple vCPU creation paths. While
keeping the current behavior of consuming kvm_apicv_activated() is
arguably safer from a regression perspective, force apicv_active so that
vCPU creation runs with deterministic state and so that if there are bugs,
they are found sooner than later, i.e. not when some crazy race condition
is hit.
WARNING: CPU: 0 PID: 484 at arch/x86/kvm/x86.c:9877 vcpu_enter_guest+0x2ae3/0x3ee0 arch/x86/kvm/x86.c:9877
Modules linked in:
CPU: 0 PID: 484 Comm: syz-executor361 Not tainted 5.16.13 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1~cloud0 04/01/2014
RIP: 0010:vcpu_enter_guest+0x2ae3/0x3ee0 arch/x86/kvm/x86.c:9877
Call Trace:
<TASK>
vcpu_run arch/x86/kvm/x86.c:10039 [inline]
kvm_arch_vcpu_ioctl_run+0x337/0x15e0 arch/x86/kvm/x86.c:10234
kvm_vcpu_ioctl+0x4d2/0xc80 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3727
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:874 [inline]
__se_sys_ioctl fs/ioctl.c:860 [inline]
__x64_sys_ioctl+0x16d/0x1d0 fs/ioctl.c:860
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
The bug was hit by a syzkaller spamming VM creation with 2 vCPUs and a
call to KVM_SET_GUEST_DEBUG.
r0 = openat$kvm(0xffffffffffffff9c, &(0x7f0000000000), 0x0, 0x0)
r1 = ioctl$KVM_CREATE_VM(r0, 0xae01, 0x0)
ioctl$KVM_CAP_SPLIT_IRQCHIP(r1, 0x4068aea3, &(0x7f0000000000)) (async)
r2 = ioctl$KVM_CREATE_VCPU(r1, 0xae41, 0x0) (async)
r3 = ioctl$KVM_CREATE_VCPU(r1, 0xae41, 0x400000000000002)
ioctl$KVM_SET_GUEST_DEBUG(r3, 0x4048ae9b, &(0x7f00000000c0)={0x5dda9c14aa95f5c5})
ioctl$KVM_RUN(r2, 0xae80, 0x0)
Reported-by: Gaoning Pan <pgn@zju.edu.cn>
Reported-by: Yongkang Jia <kangel@zju.edu.cn>
Fixes: 8df14af42f00 ("kvm: x86: Add support for dynamic APICv activation")
Cc: stable@vger.kernel.org
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220420013732.3308816-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 75189d1de1b377e580ebd2d2c55914631eac9c64 upstream.
NMI-watchdog is one of the favorite features of kernel developers,
but it does not work in AMD guest even with vPMU enabled and worse,
the system misrepresents this capability via /proc.
This is a PMC emulation error. KVM does not pass the latest valid
value to perf_event in time when guest NMI-watchdog is running, thus
the perf_event corresponding to the watchdog counter will enter the
old state at some point after the first guest NMI injection, forcing
the hardware register PMC0 to be constantly written to 0x800000000001.
Meanwhile, the running counter should accurately reflect its new value
based on the latest coordinated pmc->counter (from vPMC's point of view)
rather than the value written directly by the guest.
Fixes: 168d918f2643 ("KVM: x86: Adjust counter sample period after a wrmsr")
Reported-by: Dongli Cao <caodongli@kingsoft.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Reviewed-by: Yanan Wang <wangyanan55@huawei.com>
Tested-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <20220409015226.38619-1-likexu@tencent.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e5c23779f93d45e39a52758ca593bd7e62e9b4be upstream.
In the case where there is only a cycle counter available (i.e.
PMCR_EL0.N is 0) and an event other than CPU cycles is opened, the open
should fail as the event can never possibly be scheduled. However, the
event validation when an event is opened is skipped when the group
leader is opened. Fix this by always validating the group leader events.
Reported-by: Al Grant <al.grant@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20220408203330.4014015-1-robh@kernel.org
Cc: <stable@vger.kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|