path: root/arch/powerpc/platforms
Age | Commit message | Author | Files | Lines
2024-02-05 | powerpc/pseries/iommu: Fix iommu initialisation during DLPAR add | Gaurav Batra | 1 | -0/+4
When a PCI device is dynamically added, the kernel oopses with a NULL pointer dereference: BUG: Kernel NULL pointer dereference on read at 0x00000030 Faulting instruction address: 0xc0000000006bbe5c Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: rpadlpar_io rpaphp rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs xsk_diag bonding nft_compat nf_tables nfnetlink rfkill binfmt_misc dm_multipath rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_umad ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_cm iw_cm ib_cm mlx5_ib ib_uverbs ib_core pseries_rng drm drm_panel_orientation_quirks xfs libcrc32c mlx5_core mlxfw sd_mod t10_pi sg tls ibmvscsi ibmveth scsi_transport_srp vmx_crypto pseries_wdt psample dm_mirror dm_region_hash dm_log dm_mod fuse CPU: 17 PID: 2685 Comm: drmgr Not tainted 6.7.0-203405+ #66 Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NH1060_008) hv:phyp pSeries NIP: c0000000006bbe5c LR: c000000000a13e68 CTR: c0000000000579f8 REGS: c00000009924f240 TRAP: 0300 Not tainted (6.7.0-203405+) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24002220 XER: 20040006 CFAR: c000000000a13e64 DAR: 0000000000000030 DSISR: 40000000 IRQMASK: 0 ... 
NIP sysfs_add_link_to_group+0x34/0x94 LR iommu_device_link+0x5c/0x118 Call Trace: iommu_init_device+0x26c/0x318 (unreliable) iommu_device_link+0x5c/0x118 iommu_init_device+0xa8/0x318 iommu_probe_device+0xc0/0x134 iommu_bus_notifier+0x44/0x104 notifier_call_chain+0xb8/0x19c blocking_notifier_call_chain+0x64/0x98 bus_notify+0x50/0x7c device_add+0x640/0x918 pci_device_add+0x23c/0x298 of_create_pci_dev+0x400/0x884 of_scan_pci_dev+0x124/0x1b0 __of_scan_bus+0x78/0x18c pcibios_scan_phb+0x2a4/0x3b0 init_phb_dynamic+0xb8/0x110 dlpar_add_slot+0x170/0x3b8 [rpadlpar_io] add_slot_store.part.0+0xb4/0x130 [rpadlpar_io] kobj_attr_store+0x2c/0x48 sysfs_kf_write+0x64/0x78 kernfs_fop_write_iter+0x1b0/0x290 vfs_write+0x350/0x4a0 ksys_write+0x84/0x140 system_call_exception+0x124/0x330 system_call_vectored_common+0x15c/0x2ec Commit a940904443e4 ("powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains") broke DLPAR add of PCI devices. That commit added an iommu_device structure to pci_controller. During system boot, PCI devices are discovered and this newly added iommu_device structure is initialized by a call to iommu_device_register(). During DLPAR add of a PCI device, a new pci_controller structure is allocated but iommu_device_register() is never called for it. The fix is to register the iommu device during DLPAR add as well. Fixes: a940904443e4 ("powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains") Signed-off-by: Gaurav Batra <gbatra@linux.ibm.com> [mpe: Trim oops and tweak some change log wording] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240122222407.39603-1-gbatra@linux.ibm.com
2024-01-18 | Merge tag 'tty-6.8-rc1' of ↵ | Linus Torvalds | 2 | -8/+10
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty / serial updates from Greg KH: "Here is the big set of tty and serial driver changes for 6.8-rc1. As usual, Jiri has a bunch of refactoring and cleanups for the tty core and drivers in here, along with the usual set of rs485 updates (someday this might work properly...) Along with those, in here are changes for: - sc16is7xx serial driver updates - platform driver removal api updates - amba-pl011 driver updates - tty driver binding updates - other small tty/serial driver updates and changes All of these have been in linux-next for a while with no reported issues" * tag 'tty-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (197 commits) serial: sc16is7xx: refactor EFR lock serial: sc16is7xx: reorder code to remove prototype declarations serial: sc16is7xx: refactor FIFO access functions to increase commonality serial: sc16is7xx: drop unneeded MODULE_ALIAS serial: sc16is7xx: replace hardcoded divisor value with BIT() macro serial: sc16is7xx: add explicit return for some switch default cases serial: sc16is7xx: add macro for max number of UART ports serial: sc16is7xx: add driver name to struct uart_driver serial: sc16is7xx: use i2c_get_match_data() serial: sc16is7xx: use spi_get_device_match_data() serial: sc16is7xx: use DECLARE_BITMAP for sc16is7xx_lines bitfield serial: sc16is7xx: improve do/while loop in sc16is7xx_irq() serial: sc16is7xx: remove obsolete loop in sc16is7xx_port_irq() serial: sc16is7xx: set safe default SPI clock frequency serial: sc16is7xx: add check for unsupported SPI modes during probe serial: sc16is7xx: fix invalid sc16is7xx_lines bitfield in case of probe error serial: 8250_exar: Set missing rs485_supported flag serial: omap: do not override settings for RS485 support serial: core, imx: do not set RS485 enabled if it is not supported serial: core: make sure RS485 cannot be enabled when it is not supported ...
2024-01-12 | Merge tag 'pull-dcache' of ↵ | Linus Torvalds | 1 | -2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull dcache updates from Al Viro: "Change of locking rules for __dentry_kill(), regularized refcounting rules in that area, assorted cleanups and removal of weird corner cases (e.g. now ->d_iput() on child is always called before the parent might hit __dentry_kill(), etc)" * tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits) dcache: remove unnecessary NULL check in dget_dlock() kill DCACHE_MAY_FREE __d_unalias() doesn't use inode argument d_alloc_parallel(): in-lookup hash insertion doesn't need an RCU variant get rid of DCACHE_GENOCIDE d_genocide(): move the extern into fs/internal.h simple_fill_super(): don't bother with d_genocide() on failure nsfs: use d_make_root() d_alloc_pseudo(): move setting ->d_op there from the (sole) caller kill d_instantate_anon(), fold __d_instantiate_anon() into remaining caller retain_dentry(): introduce a trimmed-down lockless variant __dentry_kill(): new locking scheme d_prune_aliases(): use a shrink list switch select_collect{,2}() to use of to_shrink_list() to_shrink_list(): call only if refcount is 0 fold dentry_kill() into dput() don't try to cut corners in shrink_lock_dentry() fold the call of retain_dentry() into fast_dput() Call retain_dentry() with refcount 0 dentry_kill(): don't bother with retain_dentry() on slow path ...
2024-01-09 | Merge tag 'mm-stable-2024-01-08-15-31' of ↵ | Linus Torvalds | 1 | -1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Many singleton patches against the MM code. The patch series which are included in this merge do the following: - Peng Zhang has done some mapletree maintenance work in the series 'maple_tree: add mt_free_one() and mt_attr() helpers' 'Some cleanups of maple tree' - In the series 'mm: use memmap_on_memory semantics for dax/kmem' Vishal Verma has altered the interworking between memory-hotplug and dax/kmem so that newly added 'device memory' can more easily have its memmap placed within that newly added memory. - Matthew Wilcox continues folio-related work (including a few fixes) in the patch series 'Add folio_zero_tail() and folio_fill_tail()' 'Make folio_start_writeback return void' 'Fix fault handler's handling of poisoned tail pages' 'Convert aops->error_remove_page to ->error_remove_folio' 'Finish two folio conversions' 'More swap folio conversions' - Kefeng Wang has also contributed folio-related work in the series 'mm: cleanup and use more folio in page fault' - Jim Cromie has improved the kmemleak reporting output in the series 'tweak kmemleak report format'. - In the series 'stackdepot: allow evicting stack traces' Andrey Konovalov permits clients (in this case KASAN) to cause eviction of no longer needed stack traces. - Charan Teja Kalla has fixed some accounting issues in the page allocator's atomic reserve calculations in the series 'mm: page_alloc: fixes for high atomic reserve calculations'. - Dmitry Rokosov has added to the samples/ directory some sample code for a userspace memcg event listener application. See the series 'samples: introduce cgroup events listeners'. - Some mapletree maintenance work from Liam Howlett in the series 'maple_tree: iterator state changes'. - Nhat Pham has improved zswap's approach to writeback in the series 'workload-specific and memory pressure-driven zswap writeback'. 
- DAMON/DAMOS feature and maintenance work from SeongJae Park in the series 'mm/damon: let users feed and tame/auto-tune DAMOS' 'selftests/damon: add Python-written DAMON functionality tests' 'mm/damon: misc updates for 6.8' - Yosry Ahmed has improved memcg's stats flushing in the series 'mm: memcg: subtree stats flushing and thresholds'. - In the series 'Multi-size THP for anonymous memory' Ryan Roberts has added a runtime opt-in feature to transparent hugepages which improves performance by allocating larger chunks of memory during anonymous page faults. - Matthew Wilcox has also contributed some cleanup and maintenance work against the buffer_head code in the series 'More buffer_head cleanups'. - Suren Baghdasaryan has done work on Andrea Arcangeli's series 'userfaultfd move option'. UFFDIO_MOVE permits userspace heap compaction algorithms to move userspace's pages around rather than UFFDIO_COPY's alloc/copy/free. - Stefan Roesch has developed a 'KSM Advisor', in the series 'mm/ksm: Add ksm advisor'. This is a governor which tunes KSM's scanning aggressiveness in response to userspace's current needs. - Chengming Zhou has optimized zswap's temporary working memory use in the series 'mm/zswap: dstmem reuse optimizations and cleanups'. - Matthew Wilcox has performed some maintenance work on the writeback code, both core and within filesystems. The series is 'Clean up the writeback paths'. - Andrey Konovalov has optimized KASAN's handling of alloc and free stack traces for secondary-level allocators, in the series 'kasan: save mempool stack traces'. - Andrey also performed some KASAN maintenance work in the series 'kasan: assorted clean-ups'. - David Hildenbrand has gone to town on the rmap code. Cleanups, more pte batching, folio conversions and more. See the series 'mm/rmap: interface overhaul'. - Kinsey Ho has contributed some maintenance work on the MGLRU code in the series 'mm/mglru: Kconfig cleanup'. 
- Matthew Wilcox has contributed lruvec page accounting code cleanups in the series 'Remove some lruvec page accounting functions'" * tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (361 commits) mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER mm, treewide: introduce NR_PAGE_ORDERS selftests/mm: add separate UFFDIO_MOVE test for PMD splitting selftests/mm: skip test if application doesn't has root privileges selftests/mm: conform test to TAP format output selftests: mm: hugepage-mmap: conform to TAP format output selftests/mm: gup_test: conform test to TAP format output mm/selftests: hugepage-mremap: conform test to TAP format output mm/vmstat: move pgdemote_* out of CONFIG_NUMA_BALANCING mm: zsmalloc: return -ENOSPC rather than -EINVAL in zs_malloc while size is too large mm/memcontrol: remove __mod_lruvec_page_state() mm/khugepaged: use a folio more in collapse_file() slub: use a folio in __kmalloc_large_node slub: use folio APIs in free_large_kmalloc() slub: use alloc_pages_node() in alloc_slab_page() mm: remove inc/dec lruvec page state functions mm: ratelimit stat flush from workingset shrinker kasan: stop leaking stack trace handles mm/mglru: remove CONFIG_TRANSPARENT_HUGEPAGE mm/mglru: add dummy pmd_dirty() ...
2024-01-09 | Merge tag 'powerpc-6.8-1' of ↵ | Linus Torvalds | 25 | -45/+799
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Add initial support to recognise the HeXin C2000 processor. - Add papr-vpd and papr-sysparm character device drivers for VPD & sysparm retrieval, so userspace tools can be adapted to avoid doing raw firmware calls from userspace. - Sched domains optimisations for shared processor partitions on P9/P10. - A series of optimisations for KVM running as a nested HV under PowerVM. - Other small features and fixes. Thanks to Aditya Gupta, Aneesh Kumar K.V, Arnd Bergmann, Christophe Leroy, Colin Ian King, Dario Binacchi, David Heidelberg, Geoff Levand, Gustavo A. R. Silva, Haoran Liu, Jordan Niethe, Kajol Jain, Kevin Hao, Kunwu Chan, Li kunyu, Li zeming, Masahiro Yamada, Michal Suchánek, Nathan Lynch, Naveen N Rao, Nicholas Piggin, Randy Dunlap, Sathvika Vasireddy, Srikar Dronamraju, Stephen Rothwell, Vaibhav Jain, and Zhao Ke. * tag 'powerpc-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (96 commits) powerpc/ps3_defconfig: Disable PPC64_BIG_ENDIAN_ELF_ABI_V2 powerpc/86xx: Drop unused CONFIG_MPC8610 powerpc/powernv: Add error handling to opal_prd_range_is_valid selftests/powerpc: Fix spelling mistake "EACCESS" -> "EACCES" powerpc/hvcall: Reorder Nestedv2 hcall opcodes powerpc/ps3: Add missing set_freezable() for ps3_probe_thread() powerpc/mpc83xx: Use wait_event_freezable() for freezable kthread powerpc/mpc83xx: Add the missing set_freezable() for agent_thread_fn() powerpc/fsl: Fix fsl,tmu-calibration to match the schema powerpc/smp: Dynamically build Powerpc topology powerpc/smp: Avoid asym packing within thread_group of a core powerpc/smp: Add __ro_after_init attribute powerpc/smp: Disable MC domain for shared processor powerpc/smp: Enable Asym packing for cores on shared processor powerpc/sched: Cleanup vcpu_is_preempted() powerpc: add cpu_spec.cpu_features to vmcoreinfo powerpc/imc-pmu: Add a null pointer check in update_events_in_group() 
powerpc/powernv: Add a null pointer check in opal_powercap_init() powerpc/powernv: Add a null pointer check in opal_event_init() powerpc/powernv: Add a null pointer check to scom_debug_init_one() ...
2024-01-09 | mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER | Kirill A. Shutemov | 1 | -1/+1
commit 23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely") has changed the definition of MAX_ORDER to be inclusive. This has caused issues with code that was not yet upstream and depended on the previous definition. To draw attention to the altered meaning of the define, rename MAX_ORDER to MAX_PAGE_ORDER. Link: https://lkml.kernel.org/r/20231228144704.14033-2-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
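The off-by-one hazard behind this rename can be shown in a few lines. The sketch below is a userspace illustration, not kernel code: the value 10 for MAX_PAGE_ORDER and the free_count array are assumptions made for the example (the kernel's value depends on configuration). With the inclusive definition, an array indexed by order needs MAX_PAGE_ORDER + 1 slots, which is what NR_PAGE_ORDERS expresses, and loops over all orders use `<=`.

```c
#include <assert.h>

/* Assumed value for illustration only; the kernel's value is configuration dependent. */
#define MAX_PAGE_ORDER 10
/* Inclusive bound: orders 0..MAX_PAGE_ORDER are all valid. */
#define NR_PAGE_ORDERS (MAX_PAGE_ORDER + 1)

/* Hypothetical per-order counters, one slot per valid order. */
static unsigned long free_count[NR_PAGE_ORDERS];

/* Number of base pages covered by one block of the given order. */
static unsigned long pages_in_order(unsigned int order)
{
	assert(order <= MAX_PAGE_ORDER);	/* the upper bound is inclusive */
	return 1UL << order;
}

/* Total base pages tracked across every order, including the largest one. */
static unsigned long total_free_pages(void)
{
	unsigned long total = 0;

	for (unsigned int order = 0; order <= MAX_PAGE_ORDER; order++)
		total += free_count[order] * pages_in_order(order);
	return total;
}
```

Code written against the older exclusive meaning would size the array with the bare constant and loop with `<`, silently skipping the largest order once the definition became inclusive.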
2023-12-29 | powerpc/86xx: Drop unused CONFIG_MPC8610 | Michael Ellerman | 1 | -7/+0
The MPC8610 symbol used to be default y if MPC8610_HPCD, but since MPC8610_HPCD was removed MPC8610 is now never used. Remove it. Fixes: 248667f8bbde ("powerpc: drop HPCD/MPC8610 evaluation platform support") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231123032902.2760818-1-mpe@ellerman.id.au
2023-12-21 | powerpc/powernv: Add error handling to opal_prd_range_is_valid | Haoran Liu | 1 | -0/+2
In the opal_prd_range_is_valid function within opal-prd.c, error handling was missing for the of_get_address call. This patch adds necessary error checking, ensuring that the function gracefully handles scenarios where of_get_address fails. Signed-off-by: Haoran Liu <liuhaoran14@163.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231127144108.29782-1-liuhaoran14@163.com
2023-12-21 | powerpc/ps3: Add missing set_freezable() for ps3_probe_thread() | Kevin Hao | 1 | -0/+1
The kernel thread function ps3_probe_thread() calls try_to_freeze() in its loop, but kernel threads are non-freezable by default. To make a kernel thread freezable, set_freezable() has to be invoked explicitly. Signed-off-by: Kevin Hao <haokexin@gmail.com> Acked-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231221044510.1802429-4-haokexin@gmail.com
2023-12-21 | powerpc/mpc83xx: Use wait_event_freezable() for freezable kthread | Kevin Hao | 1 | -2/+1
A freezable kernel thread can enter the frozen state during freezing by either calling try_to_freeze() or using wait_event_freezable() and its variants. So the following snippet of code in a kernel thread loop:

    wait_event_interruptible();
    try_to_freeze();

can be changed to a single wait_event_freezable(), which eliminates a function call.
Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231221044510.1802429-3-haokexin@gmail.com
2023-12-21 | powerpc/mpc83xx: Add the missing set_freezable() for agent_thread_fn() | Kevin Hao | 1 | -0/+2
The kernel thread function agent_thread_fn() calls try_to_freeze() in its loop, but kernel threads are non-freezable by default. To make a kernel thread freezable, set_freezable() has to be invoked explicitly. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231221044510.1802429-2-haokexin@gmail.com
2023-12-13 | powerpc/powernv: Add a null pointer check in opal_powercap_init() | Kunwu Chan | 1 | -0/+6
kasprintf() returns a pointer to dynamically allocated memory which can be NULL upon failure. Fixes: b9ef7b4b867f ("powerpc: Convert to using %pOFn instead of device_node.name") Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231126095739.1501990-1-chentao@kylinos.cn
2023-12-13 | powerpc/powernv: Add a null pointer check in opal_event_init() | Kunwu Chan | 1 | -0/+2
kasprintf() returns a pointer to dynamically allocated memory which can be NULL upon failure. Fixes: 2717a33d6074 ("powerpc/opal-irqchip: Use interrupt names if present") Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231127030755.1546750-1-chentao@kylinos.cn
2023-12-13 | powerpc/powernv: Add a null pointer check to scom_debug_init_one() | Kunwu Chan | 1 | -0/+5
kasprintf() returns a pointer to dynamically allocated memory which can be NULL upon failure. Add a null pointer check, and release 'ent' to avoid memory leaks. Fixes: bfd2f0d49aef ("powerpc/powernv: Get rid of old scom_controller abstraction") Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231208085937.107210-1-chentao@kylinos.cn
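The pattern shared by these three kasprintf() fixes can be sketched in userspace C. Everything here is a stand-in rather than the kernel code: fmt_alloc() is a hypothetical substitute for kasprintf() built on malloc() and vsnprintf(), struct debug_entry and its helpers are invented for the example, and fail_next_alloc is a test hook that simulates allocation failure. The point is only the shape of the fix: check the returned pointer, and release the partially constructed object before returning the error.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Test hook (assumption for this sketch): force the next fmt_alloc() to fail. */
static int fail_next_alloc;

/* Hypothetical userspace stand-in for kasprintf(): NULL on allocation failure. */
static char *fmt_alloc(const char *fmt, ...)
{
	va_list ap, ap2;
	char *s;
	int n;

	if (fail_next_alloc) {
		fail_next_alloc = 0;
		return NULL;
	}

	va_start(ap, fmt);
	va_copy(ap2, ap);
	n = vsnprintf(NULL, 0, fmt, ap);	/* first pass: measure */
	va_end(ap);
	if (n < 0) {
		va_end(ap2);
		return NULL;
	}

	s = malloc((size_t)n + 1);
	if (s)
		vsnprintf(s, (size_t)n + 1, fmt, ap2);	/* second pass: format */
	va_end(ap2);
	return s;
}

/* Hypothetical object that owns a formatted name. */
struct debug_entry {
	unsigned int chip;
	char *name;
};

/* Create an entry; on failure nothing is leaked and *out is left untouched. */
static int debug_entry_create(struct debug_entry **out, unsigned int chip)
{
	struct debug_entry *ent;

	assert(out != NULL);
	ent = calloc(1, sizeof(*ent));
	if (!ent)
		return -ENOMEM;

	ent->chip = chip;
	ent->name = fmt_alloc("%08x", chip);
	if (!ent->name) {
		free(ent);	/* release 'ent' so the error path does not leak */
		return -ENOMEM;
	}

	*out = ent;
	return 0;
}

/* Free an entry created by debug_entry_create(). */
static void debug_entry_destroy(struct debug_entry *ent)
{
	if (!ent)
		return;
	free(ent->name);
	free(ent);
}
```

Without the NULL check, the formatted name would be used later as if it were a valid string; without the free() on the error path, the check would trade a crash for a leak.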
2023-12-13 | powerpc/pseries/vas: Migration suspend waits for no in-progress open windows | Haren Myneni | 2 | -7/+46
The hypervisor returns migration failure unless all VAS windows are closed. During the pre-migration stage, vas_migration_handler() sets the migration_in_progress flag and closes all windows on the list. The allocate VAS window routine checks the migration flag, sets up the window and then adds it to the list. So the migration handler can miss a window that is still being set up:

  t1: Allocate and open VAS               t2: Migration event
      window

  lock vas_pseries_mutex
  If migration_in_progress set
    unlock vas_pseries_mutex
    return
  open window HCALL
  unlock vas_pseries_mutex
  Modify window HCALL                     lock vas_pseries_mutex
  setup window                            migration_in_progress=true
                                          Closes all windows from the list
                                          // May miss windows that are
                                          // not in the list
                                          unlock vas_pseries_mutex
  lock vas_pseries_mutex                  return
  if nr_closed_windows == 0
    // No DLPAR CPU or migration
    add window to the list
    // Window will be added to the
    // list after the setup is completed
    unlock vas_pseries_mutex
    return
  unlock vas_pseries_mutex
  Close VAS window
  // due to DLPAR CPU or migration
  return -EBUSY

This patch resolves the issue with the following steps:
- Set the migration_in_progress flag without holding the mutex.
- Introduce the nr_open_wins_progress counter in the VAS capabilities struct.
- This counter tracks the number of open windows that are still in progress.
- The allocate/setup window thread closes the window if migration is in progress and decrements the nr_open_wins_progress counter.
- The migration handler waits until there are no in-progress open windows.
The code flow with the fix is as follows:

  t1: Allocate and open VAS               t2: Migration event
      window

  lock vas_pseries_mutex
  If migration_in_progress set
    unlock vas_pseries_mutex
    return
  open window HCALL
  nr_open_wins_progress++
  // Window opened, but not
  // added to the list yet
  unlock vas_pseries_mutex
  Modify window HCALL                     migration_in_progress=true
  setup window                            lock vas_pseries_mutex
                                          Closes all windows from the list
                                          While nr_open_wins_progress {
                                              unlock vas_pseries_mutex
  lock vas_pseries_mutex                      sleep
  if nr_closed_windows == 0                   // Wait if any open window in
  or migration is not started                 // progress. The open window
    // No DLPAR CPU or migration              // thread closes the window without
    add window to the list                    // adding to the list and return if
    nr_open_wins_progress--                   // the migration is in progress.
    unlock vas_pseries_mutex
    return
  Close VAS window
  nr_open_wins_progress--
  unlock vas_pseries_mutex
  return -EBUSY                               lock vas_pseries_mutex
                                          }
                                          unlock vas_pseries_mutex
                                          return

Fixes: 37e6764895ef ("powerpc/pseries/vas: Add VAS migration handler") Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231125235104.3405008-1-haren@linux.ibm.com
2023-12-13 | powerpc/pseries/papr-sysparm: Expose character device to user space | Nathan Lynch | 1 | -2/+156
Until now the papr_sysparm APIs have been kernel-internal. But user space needs access to PAPR system parameters too. The only method available to user space today to get or set system parameters is using sys_rtas() and /dev/mem to pass RTAS-addressable buffers between user space and firmware. This is incompatible with lockdown and should be deprecated. So provide an alternative ABI to user space in the form of a /dev/papr-sysparm character device with just two ioctl commands (get and set). The data payloads involved are small enough to fit in the ioctl argument buffer, making the code relatively simple. Exposing the system parameters through sysfs has been considered but it would be too awkward: * The kernel currently does not have to contain an exhaustive list of defined system parameters. This is a convenient property to maintain because we don't have to update the kernel whenever a new parameter is added to PAPR. Exporting a named attribute in sysfs for each parameter would negate this. * Some system parameters are text-based and some are not. * Retrieval of at least one system parameter requires input data, which a simple read-oriented interface can't support. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231212-papr-sys_rtas-vs-lockdown-v6-11-e9eafd0c8c6c@linux.ibm.com
2023-12-13 | powerpc/pseries/papr-sysparm: Validate buffer object lengths | Nathan Lynch | 1 | -0/+47
The ability to get and set system parameters will be exposed to user space, so let's get a little more strict about malformed papr_sysparm_buf objects. * Create accessors for the length field of struct papr_sysparm_buf. The length is always stored in MSB order and this is better than spreading the necessary conversions all over. * Reject attempts to submit invalid buffers to RTAS. * Warn if RTAS returns a buffer with an invalid length, clamping the returned length to a safe value that won't overrun the buffer. These are meant as precautionary measures to mitigate both firmware and kernel bugs in this area, should they arise, but I am not aware of any. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231212-papr-sys_rtas-vs-lockdown-v6-10-e9eafd0c8c6c@linux.ibm.com
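The three measures listed above (length accessors, rejecting invalid buffers, clamping a bad returned length) can be sketched as follows. This is a userspace model under stated assumptions, not the kernel implementation: struct sysparm_buf, the helper names and the payload capacity SYSPARM_MAX_PAYLOAD are all hypothetical, and the length field is modelled as two bytes so the example behaves the same on any host endianness.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed payload capacity for this sketch. */
#define SYSPARM_MAX_PAYLOAD 4000u

/* Hypothetical stand-in for struct papr_sysparm_buf. */
struct sysparm_buf {
	uint8_t len_msb[2];			/* payload length, most significant byte first */
	uint8_t val[SYSPARM_MAX_PAYLOAD];	/* payload bytes */
};

/* Read the length field, converting from MSB order to a host integer. */
static size_t sysparm_buf_get_length(const struct sysparm_buf *buf)
{
	assert(buf != NULL);
	return ((size_t)buf->len_msb[0] << 8) | buf->len_msb[1];
}

/* Store a host integer into the MSB-ordered length field. */
static void sysparm_buf_set_length(struct sysparm_buf *buf, size_t length)
{
	assert(buf != NULL && length <= UINT16_MAX);
	buf->len_msb[0] = (uint8_t)(length >> 8);
	buf->len_msb[1] = (uint8_t)(length & 0xff);
}

/* A buffer may be submitted only if its length fits inside the payload area. */
static bool sysparm_buf_is_valid(const struct sysparm_buf *buf)
{
	return sysparm_buf_get_length(buf) <= sizeof(buf->val);
}

/*
 * After firmware has filled the buffer: if the reported length would overrun
 * the payload area, clamp it to the capacity. Returns true when it clamped,
 * which is where a warning would be logged.
 */
static bool sysparm_buf_clamp_length(struct sysparm_buf *buf)
{
	if (sysparm_buf_is_valid(buf))
		return false;
	sysparm_buf_set_length(buf, sizeof(buf->val));
	return true;
}
```

Keeping every byte-order conversion inside the two accessors means callers only ever see host-order lengths, which is the benefit the commit message attributes to the accessor approach.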
2023-12-13 | powerpc/pseries: Add papr-vpd character driver for VPD retrieval | Nathan Lynch | 2 | -0/+542
PowerVM LPARs may retrieve Vital Product Data (VPD) for system components using the ibm,get-vpd RTAS function. We can expose this to user space with a /dev/papr-vpd character device, where the programming model is:

    struct papr_location_code plc = { .str = "", }; /* obtain all VPD */
    int devfd = open("/dev/papr-vpd", O_RDONLY);
    /* the ioctl returns a new fd whose contents are the VPD */
    int vpdfd = ioctl(devfd, PAPR_VPD_CREATE_HANDLE, &plc);
    size_t size = lseek(vpdfd, 0, SEEK_END);
    char *buf = malloc(size);
    /* read from the VPD handle, not from the device fd */
    pread(vpdfd, buf, size, 0);

When a file descriptor is obtained from ioctl(PAPR_VPD_CREATE_HANDLE), the file contains the result of a complete ibm,get-vpd sequence. The file contents are immutable from the POV of user space. To get a new view of the VPD, the client must create a new handle. This design choice insulates user space from most of the complexities that ibm,get-vpd brings:
* ibm,get-vpd must be called more than once to obtain complete results.
* Only one ibm,get-vpd call sequence should be in progress at a time; interleaved sequences will disrupt each other. Callers must have a protocol for serializing their use of the function.
* A call sequence in progress may receive a "VPD changed, try again" status, requiring the client to abandon the sequence and start over.
The memory required for the VPD buffers seems acceptable, around 20KB for all VPD on one of my systems. And the value of the /rtas/ibm,vpd-size DT property (the estimated maximum size of VPD) is consistently 300KB across various systems I've checked. I've implemented support for this new ABI in the rtas_get_vpd() function in librtas, which the vpdupdate command currently uses to populate its VPD database. I've verified that an unmodified vpdupdate binary generates an identical database when using a librtas.so that prefers the new ABI. Along with the papr-vpd.h header exposed to user space, this introduces a common papr-miscdev.h uapi header to share a base ioctl ID with similar drivers to come. 
Tested-by: Michal Suchánek <msuchanek@suse.de> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231212-papr-sys_rtas-vs-lockdown-v6-9-e9eafd0c8c6c@linux.ibm.com
2023-12-08 | tty: hvc: convert to u8 and size_t | Jiri Slaby (SUSE) | 2 | -8/+10
Switch character types to u8 and sizes to size_t, to conform to the character and size types used in the rest of the tty layer. Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Amit Shah <amit@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: linuxppc-dev@lists.ozlabs.org Cc: virtualization@lists.linux.dev Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20231206073712.17776-13-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-01 | powerpc/pseries/memhp: Log more error conditions in add path | Nathan Lynch | 1 | -1/+6
When an add operation for multiple LMBs fails, there is currently little indication from the kernel of what went wrong. Be a little more verbose about error conditions in the add paths. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231114-pseries-memhp-fixes-v1-3-fb8f2bb7c557@linux.ibm.com
2023-12-01 | powerpc/pseries/memhp: Fix access beyond end of drmem array | Nathan Lynch | 1 | -4/+5
dlpar_memory_remove_by_index() may access beyond the bounds of the drmem lmb array when the LMB lookup fails to match an entry with the given DRC index. When the search fails, the cursor is left pointing to &drmem_info->lmbs[drmem_info->n_lmbs], which is one element past the last valid entry in the array. The debug message at the end of the function then dereferences this pointer: pr_debug("Failed to hot-remove memory at %llx\n", lmb->base_addr); This was found by inspection and confirmed with KASAN: pseries-hotplug-mem: Attempting to hot-remove LMB, drc index 1234 ================================================================== BUG: KASAN: slab-out-of-bounds in dlpar_memory+0x298/0x1658 Read of size 8 at addr c000000364e97fd0 by task bash/949 dump_stack_lvl+0xa4/0xfc (unreliable) print_report+0x214/0x63c kasan_report+0x140/0x2e0 __asan_load8+0xa8/0xe0 dlpar_memory+0x298/0x1658 handle_dlpar_errorlog+0x130/0x1d0 dlpar_store+0x18c/0x3e0 kobj_attr_store+0x68/0xa0 sysfs_kf_write+0xc4/0x110 kernfs_fop_write_iter+0x26c/0x390 vfs_write+0x2d4/0x4e0 ksys_write+0xac/0x1a0 system_call_exception+0x268/0x530 system_call_vectored_common+0x15c/0x2ec Allocated by task 1: kasan_save_stack+0x48/0x80 kasan_set_track+0x34/0x50 kasan_save_alloc_info+0x34/0x50 __kasan_kmalloc+0xd0/0x120 __kmalloc+0x8c/0x320 kmalloc_array.constprop.0+0x48/0x5c drmem_init+0x2a0/0x41c do_one_initcall+0xe0/0x5c0 kernel_init_freeable+0x4ec/0x5a0 kernel_init+0x30/0x1e0 ret_from_kernel_user_thread+0x14/0x1c The buggy address belongs to the object at c000000364e80000 which belongs to the cache kmalloc-128k of size 131072 The buggy address is located 0 bytes to the right of allocated 98256-byte region [c000000364e80000, c000000364e97fd0) ================================================================== pseries-hotplug-mem: Failed to hot-remove memory at 0 Log failed lookups with a separate message and dereference the cursor only when it points to a valid entry. 
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Fixes: 51925fb3c5c9 ("powerpc/pseries: Implement memory hotplug remove in the kernel") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231114-pseries-memhp-fixes-v1-1-fb8f2bb7c557@linux.ibm.com
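The bug class described in this commit, a search cursor left one element past the array when nothing matches, can be avoided by having the lookup return NULL on failure. The sketch below is a userspace model, not the kernel change itself: struct lmb is trimmed to two fields, and lmb_lookup() and lmb_remove_by_index() are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the kernel's LMB descriptor. */
struct lmb {
	uint32_t drc_index;
	uint64_t base_addr;
};

/* Return the matching entry, or NULL when no entry has the given DRC index. */
static const struct lmb *lmb_lookup(const struct lmb *lmbs, size_t n_lmbs,
				    uint32_t drc_index)
{
	assert(lmbs != NULL || n_lmbs == 0);

	for (size_t i = 0; i < n_lmbs; i++) {
		if (lmbs[i].drc_index == drc_index)
			return &lmbs[i];
	}
	return NULL;	/* never hand back &lmbs[n_lmbs] */
}

/*
 * Model of a remove-by-index path. On a failed lookup it returns -EINVAL
 * without touching *base_addr, so no field of a nonexistent entry is read.
 * On success it reports the base address of the matched entry.
 */
static int lmb_remove_by_index(const struct lmb *lmbs, size_t n_lmbs,
			       uint32_t drc_index, uint64_t *base_addr)
{
	const struct lmb *lmb = lmb_lookup(lmbs, n_lmbs, drc_index);

	if (!lmb)
		return -EINVAL;	/* log the failed lookup separately here */

	*base_addr = lmb->base_addr;
	return 0;
}
```

Because the failure case is a NULL pointer rather than an out-of-range cursor, a later debug message cannot accidentally dereference the element past the end of the array.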
2023-12-01 | powerpc/44x: select I2C for CURRITUCK | Randy Dunlap | 1 | -0/+1
Fix build errors when CURRITUCK=y and I2C is not builtin (=m or is not set). Fixes these build errors: powerpc-linux-ld: arch/powerpc/platforms/44x/ppc476.o: in function `avr_halt_system': ppc476.c:(.text+0x58): undefined reference to `i2c_smbus_write_byte_data' powerpc-linux-ld: arch/powerpc/platforms/44x/ppc476.o: in function `ppc47x_device_probe': ppc476.c:(.init.text+0x18): undefined reference to `i2c_register_driver' Fixes: 2a2c74b2efcb ("IBM Akebono: Add the Akebono platform") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Closes: lore.kernel.org/r/202312010820.cmdwF5X9-lkp@intel.com Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231201055159.8371-1-rdunlap@infradead.org
2023-12-01 | powerpc/85xx: Fix typo in code comment | Dario Binacchi | 1 | -1/+1
s/singals/signals/ Signed-off-by: Dario Binacchi <dario.binacchi@amarulasolutions.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231124100241.660374-1-dario.binacchi@amarulasolutions.com
2023-12-01 | powerpc: Add PVN support for HeXin C2000 processor | Zhao Ke | 1 | -1/+2
HeXin Tech Co. has applied for a new PVN from the OpenPower Community for its new processor, the C2000. OpenPower has assigned the new PVN 0x0066. Add PVR-related support for this PVN. Signed-off-by: Zhao Ke <ke.zhao@shingroup.cn> Link: https://discuss.openpower.foundation/t/how-to-get-a-new-pvr-for-processors-follow-power-isa/477/10 Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231129075845.57976-1-ke.zhao@shingroup.cn
2023-11-30 | powerpc/44x: Make ppc44x_idle_init() static | Michael Ellerman | 1 | -1/+1
The 44x/fsp2_defconfig build fails with: arch/powerpc/platforms/44x/idle.c:30:12: error: no previous prototype for ‘ppc44x_idle_init’ [-Werror=missing-prototypes] 30 | int __init ppc44x_idle_init(void) | ^~~~~~~~~~~~~~~~ Fix it by making ppc44x_idle_init() static. Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231129131919.2528517-4-mpe@ellerman.id.au
2023-11-30powerpc/512x: Fix missing prototype warningsMichael Ellerman1-0/+2
The mpc512x_defconfig build fails with: arch/powerpc/platforms/512x/mpc5121_ads_cpld.c:142:1: error: no previous prototype for ‘mpc5121_ads_cpld_map’ [-Werror=missing-prototypes] 142 | mpc5121_ads_cpld_map(void) | ^~~~~~~~~~~~~~~~~~~~ arch/powerpc/platforms/512x/mpc5121_ads_cpld.c:157:1: error: no previous prototype for ‘mpc5121_ads_cpld_pic_init’ [-Werror=missing-prototypes] 157 | mpc5121_ads_cpld_pic_init(void) | ^~~~~~~~~~~~~~~~~~~~~~~~~ There are prototypes for these functions but the header they are in is not included by mpc5121_ads_cpld.c. Include it to fix the build error. Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231129131919.2528517-3-mpe@ellerman.id.au
2023-11-30powerpc/512x: Make pdm360ng_init() staticMichael Ellerman1-1/+1
The mpc512x_defconfig build fails with: arch/powerpc/platforms/512x/pdm360ng.c:104:13: error: no previous prototype for ‘pdm360ng_init’ [-Werror=missing-prototypes] 104 | void __init pdm360ng_init(void) | ^~~~~~~~~~~~~ Fix it by making pdm360ng_init() static. Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231129131919.2528517-2-mpe@ellerman.id.au
2023-11-28powerpc/rtas_pci: rename and properly expose config access APIsNathan Lynch1-9/+9
The rtas_read_config() and rtas_write_config() functions in kernel/rtas_pci.c have external linkage and two users in arch/powerpc: the rtas_pci code itself and the pseries platform's "enhanced error handling" (EEH) support code. The prototypes for these functions in asm/ppc-pci.h have until now been guarded by CONFIG_EEH since the only external caller is the pseries EEH code. However, this presumably has always generated warnings when built with !CONFIG_EEH and -Wmissing-prototypes: arch/powerpc/kernel/rtas_pci.c:46:5: error: no previous prototype for function 'rtas_read_config' [-Werror,-Wmissing-prototypes] 46 | int rtas_read_config(struct pci_dn *pdn, int where, int size, u32 *val) arch/powerpc/kernel/rtas_pci.c:98:5: error: no previous prototype for function 'rtas_write_config' [-Werror,-Wmissing-prototypes] 98 | int rtas_write_config(struct pci_dn *pdn, int where, int size, u32 val) The introduction of commit c6345dfa6e3e ("Makefile.extrawarn: turn on missing-prototypes globally") forces the issue. The efika and chrp platform code have (static) functions with the same names but different signatures. We may as well eliminate the potential for conflicts and confusion by renaming the globally visible versions as their prototypes get moved out of the CONFIG_EEH-guarded region; their current names are too generic anyway. Since they operate on objects of the type 'struct pci_dn *', give them the slightly more verbose prefix "rtas_pci_dn_" and fix up all the call sites. Fixes: c6345dfa6e3e ("Makefile.extrawarn: turn on missing-prototypes globally") Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Closes: https://lore.kernel.org/linuxppc-dev/CA+G9fYt0LLXtjSz+Hkf3Fhm-kf0ZQanrhUS+zVZGa3O+Wt2+vg@mail.gmail.com/ Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231127-rtas-pci-rw-config-v1-1-385d29ace3df@linux.ibm.com
2023-11-25dentry: switch the lists of children to hlistAl Viro1-2/+3
Saves a pointer per struct dentry and actually makes the things less clumsy. Cleaned the d_walk() and dcache_readdir() a bit by use of hlist_for_... iterators. A couple of new helpers - d_first_child() and d_next_sibling(), to make the expressions less awful. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-11-21powerpc/rtas: Move post_mobility_fixup() declaration to pseriesNathan Lynch2-0/+2
This is a pseries-specific function declaration that doesn't belong in rtas.h. Move it to the pseries platform code and adjust pseries/suspend.c accordingly. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231106-rtas-trivial-v1-5-61847655c51f@linux.ibm.com
2023-11-21powerpc/powermac: mark smp_psurge_{give,take}_timebase staticArnd Bergmann1-2/+2
These functions are only called locally and should be static like the other corresponding functions are: arch/powerpc/platforms/powermac/smp.c:416:13: error: no previous prototype for 'smp_psurge_take_timebase' [-Werror=missing-prototypes] 416 | void __init smp_psurge_take_timebase(void) | ^~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/platforms/powermac/smp.c:432:13: error: no previous prototype for 'smp_psurge_give_timebase' [-Werror=missing-prototypes] 432 | void __init smp_psurge_give_timebase(void) | ^~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231108125843.3806765-20-arnd@kernel.org
2023-11-21powerpc/pasemi: mark pas_shutdown() staticArnd Bergmann1-1/+1
Allmodconfig builds show a warning about one function that is accidentally marked global: arch/powerpc/platforms/pasemi/setup.c:67:6: error: no previous prototype for 'pas_shutdown' [-Werror=missing-prototypes] Fixes: 656fdf3ad8e0 ("powerpc/pasemi: Add Nemo board device init code.") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231108125843.3806765-19-arnd@kernel.org
2023-11-21powerpc/ps3: move udbg_shutdown_ps3gelic prototypeArnd Bergmann3-13/+2
Allmodconfig kernels produce a missing-prototypes warning: arch/powerpc/platforms/ps3/gelic_udbg.c:239:6: error: no previous prototype for 'udbg_shutdown_ps3gelic' [-Werror=missing-prototypes] Move the declaration from a local header to asm/ps3.h where it can be seen from both the caller and the definition. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Geoff Levand <geoff@infradead.org> Acked-by: Jakub Kicinski <kuba@kernel.org> [mpe: Drop CONFIG_PS3GELIC_UDBG to fix build error] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231108125843.3806765-18-arnd@kernel.org
2023-11-07powerpc/pseries/rtas-work-area: Fix rtas_work_area_reserve_arena() kernel-docNathan Lynch1-0/+1
From a W=1 build: arch/powerpc/platforms/pseries/rtas-work-area.c:189: warning: Function parameter or member 'limit' not described in 'rtas_work_area_reserve_arena' Add the missing description of the limit parameter. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202309131221.Bm1pg96n-lkp@intel.com/ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231106-rtas-trivial-v1-1-61847655c51f@linux.ibm.com
2023-11-03Merge tag 'powerpc-6.7-1' of ↵Linus Torvalds14-37/+47
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Add support for KVM running as a nested hypervisor under development versions of PowerVM, using the new PAPR nested virtualisation API - Add support for the BPF prog pack allocator - A rework of the non-server MMU handling to support execute-only on all platforms - Some optimisations & cleanups for the powerpc qspinlock code - Various other small features and fixes Thanks to Aboorva Devarajan, Aditya Gupta, Amit Machhiwal, Benjamin Gray, Christophe Leroy, Dr. David Alan Gilbert, Gaurav Batra, Gautam Menghani, Geert Uytterhoeven, Haren Myneni, Hari Bathini, Joel Stanley, Jordan Niethe, Julia Lawall, Kautuk Consul, Kuan-Wei Chiu, Michael Neuling, Minjie Du, Muhammad Muzammil, Naveen N Rao, Nicholas Piggin, Nick Child, Nysal Jan K.A, Peter Lafreniere, Rob Herring, Sachin Sant, Sebastian Andrzej Siewior, Shrikanth Hegde, Srikar Dronamraju, Stanislav Kinsburskii, Vaibhav Jain, Wang Yufen, Yang Yingliang, and Yuan Tan. 
* tag 'powerpc-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (100 commits) powerpc/vmcore: Add MMU information to vmcoreinfo Revert "powerpc: add `cur_cpu_spec` symbol to vmcoreinfo" powerpc/bpf: use bpf_jit_binary_pack_[alloc|finalize|free] powerpc/bpf: rename powerpc64_jit_data to powerpc_jit_data powerpc/bpf: implement bpf_arch_text_invalidate for bpf_prog_pack powerpc/bpf: implement bpf_arch_text_copy powerpc/code-patching: introduce patch_instructions() powerpc/32s: Implement local_flush_tlb_page_psize() powerpc/pseries: use kfree_sensitive() in plpks_gen_password() powerpc/code-patching: Perform hwsync in __patch_instruction() in case of failure powerpc/fsl_msi: Use device_get_match_data() powerpc: Remove cpm_dp...() macros powerpc/qspinlock: Rename yield_propagate_owner tunable powerpc/qspinlock: Propagate sleepy if previous waiter is preempted powerpc/qspinlock: don't propagate the not-sleepy state powerpc/qspinlock: propagate owner preemptedness rather than CPU number powerpc/qspinlock: stop queued waiters trying to set lock sleepy powerpc/perf: Fix disabling BHRB and instruction sampling powerpc/trace: Add support for HAVE_FUNCTION_ARG_ACCESS_API powerpc/tools: Pass -mabi=elfv2 to gcc-check-mprofile-kernel.sh ...
2023-11-02Merge tag 'sysctl-6.7-rc1' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull sysctl updates from Luis Chamberlain: "To help make the move of sysctls out of kernel/sysctl.c not incur a size penalty sysctl has been changed to allow us to not require the sentinel, the final empty element on the sysctl array. Joel Granados has been doing all this work. On the v6.6 kernel we got the major infrastructure changes required to support this. For v6.7-rc1 we have all arch/ and drivers/ modified to remove the sentinel. Both arch and driver changes have been on linux-next for a bit less than a month. It is worth re-iterating the value: - this helps reduce the overall build time size of the kernel and run time memory consumed by the kernel by about ~64 bytes per array - the extra 64-byte penalty is no longer incurred now when we move sysctls out from kernel/sysctl.c to their own files For v6.8-rc1 expect removal of all the sentinels and also then the unneeded check for procname == NULL. The last two patches are fixes recently merged by Krister Johansen which allow us again to use softlockup_panic early on boot. This used to work but the alias work broke it. This is useful for folks who want to detect softlockups super early rather than wait and spend money on cloud solutions with nothing but an eventual hung kernel. 
Although this hadn't gone through linux-next it's also a stable fix, so we might as well roll through the fixes now" * tag 'sysctl-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (23 commits) watchdog: move softlockup_panic back to early_param proc: sysctl: prevent aliased sysctls from getting passed to init intel drm: Remove now superfluous sentinel element from ctl_table array Drivers: hv: Remove now superfluous sentinel element from ctl_table array raid: Remove now superfluous sentinel element from ctl_table array fw loader: Remove the now superfluous sentinel element from ctl_table array sgi-xp: Remove the now superfluous sentinel element from ctl_table array vrf: Remove the now superfluous sentinel element from ctl_table array char-misc: Remove the now superfluous sentinel element from ctl_table array infiniband: Remove the now superfluous sentinel element from ctl_table array macintosh: Remove the now superfluous sentinel element from ctl_table array parport: Remove the now superfluous sentinel element from ctl_table array scsi: Remove now superfluous sentinel element from ctl_table array tty: Remove now superfluous sentinel element from ctl_table array xen: Remove now superfluous sentinel element from ctl_table array hpet: Remove now superfluous sentinel element from ctl_table array c-sky: Remove now superfluous sentinel element from ctl_talbe array powerpc: Remove now superfluous sentinel element from ctl_table arrays riscv: Remove now superfluous sentinel element from ctl_table array x86/vdso: Remove now superfluous sentinel element from ctl_table array ...
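The sentinel removal described in this pull request can be illustrated with a small user-space sketch. Everything here is an assumption made for illustration: struct demo_entry is a simplified stand-in, not the kernel's struct ctl_table, and the two lookup helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical, simplified table entry; not the kernel's struct ctl_table. */
struct demo_entry {
	const char *procname;
	int *data;
};

int demo_a, demo_b;

/* Old style: the walk stops at an empty element, which costs one extra slot. */
const struct demo_entry with_sentinel[] = {
	{ "a", &demo_a },
	{ "b", &demo_b },
	{ NULL, NULL },
};

/* New style: no terminator; the caller passes the element count instead. */
const struct demo_entry without_sentinel[] = {
	{ "a", &demo_a },
	{ "b", &demo_b },
};

/* Lookup that relies on the sentinel (procname == NULL) to stop. */
int *demo_find_sentinel(const struct demo_entry *t, const char *name)
{
	for (; t->procname; t++)
		if (strcmp(t->procname, name) == 0)
			return t->data;
	return NULL;
}

/* Lookup that relies on an explicit size and never reads past the array. */
int *demo_find_sized(const struct demo_entry *t, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(t[i].procname, name) == 0)
			return t[i].data;
	return NULL;
}
```

The saving per array is exactly one element, which is where a fixed per-array figure such as the one quoted above comes from.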
2023-11-02Merge tag 'for-6.7/block-2023-10-30' of git://git.kernel.dk/linuxLinus Torvalds3-0/+138
Pull block updates from Jens Axboe: - Improvements to the queue_rqs() support, and adding null_blk support for that as well (Chengming) - Series improving badblocks support (Coly) - Key store support for sed-opal (Greg) - IBM partition string handling improvements (Jan) - Make number of ublk devices supported configurable (Mike) - Cancelation improvements for ublk (Ming) - MD pull requests via Song: - Handle timeout in md-cluster, by Denis Plotnikov - Cleanup pers->prepare_suspend, by Yu Kuai - Rewrite mddev_suspend(), by Yu Kuai - Simplify md_seq_ops, by Yu Kuai - Reduce unnecessary locking array_state_store(), by Mariusz Tkaczyk - Make rdev add/remove independent from daemon thread, by Yu Kuai - Refactor code around quiesce() and mddev_suspend(), by Yu Kuai - NVMe pull request via Keith: - nvme-auth updates (Mark) - nvme-tcp tls (Hannes) - nvme-fc annotations (Kees) - Misc cleanups and improvements (Jiapeng, Joel) * tag 'for-6.7/block-2023-10-30' of git://git.kernel.dk/linux: (95 commits) block: ublk_drv: Remove unused function md: cleanup pers->prepare_suspend() nvme-auth: allow mixing of secret and hash lengths nvme-auth: use transformed key size to create resp nvme-auth: alloc nvme_dhchap_key as single buffer nvmet-tcp: use 'spin_lock_bh' for state_lock() powerpc/pseries: PLPKS SED Opal keystore support block: sed-opal: keystore access for SED Opal keys block:sed-opal: SED Opal keystore ublk: simplify aborting request ublk: replace monitor with cancelable uring_cmd ublk: quiesce request queue when aborting queue ublk: rename mm_lock as lock ublk: move ublk_cancel_dev() out of ub->mutex ublk: make sure io cmd handled in submitter task context ublk: don't get ublk device reference in ublk_abort_queue() ublk: Make ublks_max configurable ublk: Limit dev_id/ub_number values md-cluster: check for timeout while a new disk adding nvme: rework NVME_AUTH Kconfig selection ...
2023-10-30Merge tag 'vfs-6.7.ctime' of ↵Linus Torvalds1-1/+1
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull vfs inode time accessor updates from Christian Brauner: "This finishes the conversion of all inode time fields to accessor functions as discussed on list. Changing timestamps manually as we used to do before is error prone. Using accessors function makes this robust. It does not contain the switch of the time fields to discrete 64 bit integers to replace struct timespec and free up space in struct inode. But after this, the switch can be trivially made and the patch should only affect the vfs if we decide to do it" * tag 'vfs-6.7.ctime' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (86 commits) fs: rename inode i_atime and i_mtime fields security: convert to new timestamp accessors selinux: convert to new timestamp accessors apparmor: convert to new timestamp accessors sunrpc: convert to new timestamp accessors mm: convert to new timestamp accessors bpf: convert to new timestamp accessors ipc: convert to new timestamp accessors linux: convert to new timestamp accessors zonefs: convert to new timestamp accessors xfs: convert to new timestamp accessors vboxsf: convert to new timestamp accessors ufs: convert to new timestamp accessors udf: convert to new timestamp accessors ubifs: convert to new timestamp accessors tracefs: convert to new timestamp accessors sysv: convert to new timestamp accessors squashfs: convert to new timestamp accessors server: convert to new timestamp accessors client: convert to new timestamp accessors ...
2023-10-30Merge tag 'vfs-6.7.misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfsLinus Torvalds1-4/+7
Pull misc vfs updates from Christian Brauner: "This contains the usual miscellaneous features, cleanups, and fixes for vfs and individual fses. Features: - Rename and export helpers that get write access to a mount. They are used in overlayfs to get write access to the upper mount. - Print the pretty name of the root device on boot failure. This helps in scenarios where we would usually only print "unknown-block(1,2)". - Add an internal SB_I_NOUMASK flag. This is another part in the endless POSIX ACL saga in a way. When POSIX ACLs are enabled via SB_POSIXACL the vfs cannot strip the umask because if the relevant inode has POSIX ACLs set it might take the umask from there. But if the inode doesn't have any POSIX ACLs set then we apply the umask in the filesystem itself. So we end up with: (1) no SB_POSIXACL -> strip umask in vfs (2) SB_POSIXACL -> strip umask in filesystem The umask semantics associated with SB_POSIXACL allowed filesystems that don't even support POSIX ACLs at all to raise SB_POSIXACL purely to avoid umask stripping. That specifically means NFS v4 and Overlayfs. NFS v4 does it because it delegates this to the server and Overlayfs because it needs to delegate umask stripping to the upper filesystem, i.e., the filesystem used as the writable layer. This went so far that SB_POSIXACL is raised even on kernels that don't even have POSIX ACL support at all. Stop this blatant abuse and add SB_I_NOUMASK which is an internal superblock flag that filesystems can raise to opt out of umask handling. That should really only be the two mentioned above. It's not that we want any filesystems to do this. Ideally we have all umask handling always in the vfs. - Make overlayfs use SB_I_NOUMASK too. - Now that we have SB_I_NOUMASK, stop checking for SB_POSIXACL in IS_POSIXACL() if the kernel doesn't have support for it. This is a very old patch but it's only possible to do this now with the wider cleanup that was done. 
- Follow-up work on fake path handling from last cycle. Citing mostly from Amir: When overlayfs was first merged, overlayfs files of regular files and directories, the ones that are installed in file table, had a "fake" path, namely, f_path is the overlayfs path and f_inode is the "real" inode on the underlying filesystem. In v6.5, we took another small step by introducing the backing_file container and the file_real_path() helper. This change allowed vfs and filesystem code to get the "real" path of an overlayfs backing file. With this change, we were able to make fsnotify work correctly and report events on the "real" filesystem objects that were accessed via overlayfs. This method works fine, but it still leaves the vfs vulnerable to new code that is not aware of files with fake path. A recent example is commit db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get the i_version"). This commit uses direct referencing to f_path in IMA code that otherwise uses file_inode() and file_dentry() to reference the filesystem objects that it is measuring. This contains work to switch things around: instead of having filesystem code opt-in to get the "real" path, have generic code opt-in for the "fake" path in the few places that it is needed. It is far more likely that new filesystems code that does not use the file_dentry() and file_real_path() helpers will end up causing crashes or averting LSM/audit rules if we keep the "fake" path exposed by default. This change already makes file_dentry() moot, but for now we did not change this helper, just added a WARN_ON() in ovl_d_real() to catch if we have made any wrong assumptions. After the dust settles on this change, we can make file_dentry() a plain accessor and we can drop the inode argument to ->d_real(). - Switch struct file to SLAB_TYPESAFE_BY_RCU. This looks like a small change but it really isn't and I would like to see everyone on their tippie toes for any possible bugs from this work. 
Essentially we've been doing most of what SLAB_TYPESAFE_BY_RCU does for files for a very long time because of the nasty interactions between the SCM_RIGHTS file descriptor garbage collection. So extending it makes a lot of sense but it is a subtle change. There are almost no places that fiddle with file rcu semantics directly and the ones that did mess around with struct file internal under rcu have been made to stop doing that because it really was always dodgy. I forgot to put in the link tag for this change and the discussion in the commit so adding it into the merge message: https://lore.kernel.org/r/20230926162228.68666-1-mjguzik@gmail.com Cleanups: - Various smaller pipe cleanups including the removal of a spin lock that was only used to protect against writes without pipe_lock() from O_NOTIFICATION_PIPE aka watch queues. As that was never implemented remove the additional locking from pipe_write(). - Annotate struct watch_filter with the new __counted_by attribute. - Clarify do_unlinkat() cleanup so that it doesn't look like an extra iput() is done that would cause issues. - Simplify file cleanup when the file has never been opened. - Use module helper instead of open-coding it. - Predict error unlikely for stale retry. - Use WRITE_ONCE() for mount expiry field instead of just commenting that one hopes the compiler doesn't get smart. Fixes: - Fix readahead on block devices. - Fix writeback when lazytime is enabled and inodes whose timestamp is the only thing that changed reside on wb->b_dirty_time. This caused excessively large zombie memory cgroup when lazytime was enabled as such inodes weren't handled fast enough. 
- Convert BUG_ON() to WARN_ON_ONCE() in open_last_lookups()" * tag 'vfs-6.7.misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (26 commits) file, i915: fix file reference for mmap_singleton() vfs: Convert BUG_ON to WARN_ON_ONCE in open_last_lookups writeback, cgroup: switch inodes with dirty timestamps to release dying cgwbs chardev: Simplify usage of try_module_get() ovl: rely on SB_I_NOUMASK fs: fix umask on NFS with CONFIG_FS_POSIX_ACL=n fs: store real path instead of fake path in backing file f_path fs: create helper file_user_path() for user displayed mapped file path fs: get mnt_writers count for an open backing file's real path vfs: stop counting on gcc not messing with mnt_expiry_mark if not asked vfs: predict the error in retry_estale as unlikely backing file: free directly vfs: fix readahead(2) on block devices io_uring: use files_lookup_fd_locked() file: convert to SLAB_TYPESAFE_BY_RCU vfs: shave work on failed file open fs: simplify misleading code to remove ambiguity regarding ihold()/iput() watch_queue: Annotate struct watch_filter with __counted_by fs/pipe: use spinlock in pipe_read() only if there is a watch_queue fs/pipe: remove unnecessary spinlock from pipe_write() ...
2023-10-20powerpc/pseries: use kfree_sensitive() in plpks_gen_password()Minjie Du1-2/+2
password might contain private information, so better use kfree_sensitive to free it. In plpks_gen_password() use kfree_sensitive(). Signed-off-by: Minjie Du <duminjie@vivo.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230717092648.9752-1-duminjie@vivo.com
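The idea behind this change, wiping a buffer before returning it to the allocator, can be sketched in user space. This is only an analogue under stated assumptions: demo_wipe() and demo_free_sensitive() are hypothetical helpers, and unlike kfree_sensitive(), which takes only the pointer, the caller here has to supply the length.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Overwrite len bytes through a volatile pointer so the compiler
 * cannot drop the stores as dead writes before free().
 */
void demo_wipe(void *buf, size_t len)
{
	volatile unsigned char *p = buf;

	while (len--)
		*p++ = 0;
}

/* Hypothetical user-space analogue: zero the secret, then free it. */
void demo_free_sensitive(void *buf, size_t len)
{
	if (!buf)
		return;
	demo_wipe(buf, len);
	free(buf);
}
```

A plain free() would leave the old contents in memory until the allocator reuses the block, which is the exposure the commit message is concerned with.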
2023-10-20powerpc/pseries: fix potential memory leak in init_cpu_associativity()Wang Yufen1-1/+3
If memory for vcpu_associativity is allocated successfully but the allocation for pcpu_associativity fails, the vcpu_associativity memory leaks. Fixes: d62c8deeb6e6 ("powerpc/pseries: Provide vcpu dispatch statistics") Signed-off-by: Wang Yufen <wangyufen@huawei.com> Reviewed-by: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/1671003983-10794-1-git-send-email-wangyufen@huawei.com
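The leak described above is the classic two-allocation error path. The following user-space sketch shows one way to handle it; it is not the kernel patch itself. struct demo_assoc, demo_init_assoc() and the failing allocator are hypothetical, and calloc()/free() stand in for the kernel allocators.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pair of buffers that must both exist or neither. */
struct demo_assoc {
	unsigned int *vcpu;
	unsigned int *pcpu;
};

typedef void *(*demo_alloc_fn)(size_t nmemb, size_t size);

/*
 * Allocate both buffers. If either allocation fails, free whatever
 * succeeded (free(NULL) is a no-op) so nothing leaks.
 */
int demo_init_assoc(struct demo_assoc *a, size_t n, demo_alloc_fn alloc)
{
	a->vcpu = alloc(n, sizeof(*a->vcpu));
	a->pcpu = alloc(n, sizeof(*a->pcpu));
	if (!a->vcpu || !a->pcpu) {
		free(a->vcpu);
		free(a->pcpu);
		a->vcpu = NULL;
		a->pcpu = NULL;
		return -1;
	}
	return 0;
}

/* Test double: every second call fails, to exercise the error path. */
void *demo_alloc_fail_second(size_t nmemb, size_t size)
{
	static int calls;

	return (++calls % 2 == 0) ? NULL : calloc(nmemb, size);
}
```

Without the two free() calls in the error branch, the first buffer would be lost whenever only the second allocation fails.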
2023-10-20powerpc/vas: Limit open window failure messages in log buffferHaren Myneni2-20/+18
The VAS open window call prints an error message and returns -EBUSY after the migration suspend event is initiated and until the resume event completes on the destination system. This can fill the log buffer with these error messages if user space issues continuous open window calls. The same applies to a DLPAR CPU remove event, when no credits are available until credits are freed or another DLPAR CPU add event occurs. So change the code to use pr_err_ratelimited() instead of pr_err() to display open window failure and not-available credits error messages. Use pr_fmt() and make the corresponding changes to have a consistent prefix on all pr_*() messages (vas-api.c). Fixes: 37e6764895ef ("powerpc/pseries/vas: Add VAS migration handler") Signed-off-by: Haren Myneni <haren@linux.ibm.com> [mpe: Use "vas-api" as the prefix to match the file name.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231019215033.1335251-1-haren@linux.ibm.com
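The effect of rate limiting a log message can be sketched with a small interval-and-burst counter. This is an illustrative model only and does not reproduce the kernel's ratelimit implementation: struct demo_ratelimit and demo_ratelimit_ok() are hypothetical, and time is passed in explicitly as an abstract tick count.

```c
#include <assert.h>

/* Hypothetical limiter: at most `burst` messages per `interval` ticks. */
struct demo_ratelimit {
	long interval;      /* window length in ticks */
	int burst;          /* messages allowed per window */
	long window_start;  /* tick at which the current window began */
	int printed;        /* messages emitted in the current window */
};

/* Return 1 if a message may be emitted at time `now`, else 0. */
int demo_ratelimit_ok(struct demo_ratelimit *rl, long now)
{
	if (now - rl->window_start >= rl->interval) {
		/* A new window starts: reset the budget. */
		rl->window_start = now;
		rl->printed = 0;
	}
	if (rl->printed >= rl->burst)
		return 0;
	rl->printed++;
	return 1;
}
```

A caller that retries in a tight loop gets the first few messages and then silence until the next window, so the log keeps the diagnostic without being flooded.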
2023-10-19powerpc/pseries/iommu: enable_ddw incorrectly returns direct mapping for ↵Gaurav Batra1-4/+4
SR-IOV device When a device is initialized, the driver invokes dma_supported() twice - first for streaming mappings followed by coherent mappings. For an SR-IOV device, default window is deleted and DDW created. With vPMEM enabled, TCE mappings are dynamically created for both vPMEM and SR-IOV device. There are no direct mappings. First time when dma_supported() is called with 64 bit mask, DDW is created and marked as dynamic window. The second time dma_supported() is called, enable_ddw() finds existing window for the device and incorrectly returns it as "direct mapping". This only happens when size of DDW is big enough to map max LPAR memory. This results in streaming TCEs not being dynamically mapped, since the code incorrectly assumes these are already pre-mapped. The adapter initially comes up but goes down due to EEH. Fixes: 381ceda88c4c ("powerpc/pseries/iommu: Make use of DDW for indirect mapping") Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Gaurav Batra <gbatra@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231003030802.47914-1-gbatra@linux.vnet.ibm.com
2023-10-19file: convert to SLAB_TYPESAFE_BY_RCUChristian Brauner1-4/+7
In recent discussions around some performance improvements in the file handling area we discussed switching the file cache to rely on SLAB_TYPESAFE_BY_RCU which allows us to get rid of call_rcu() based freeing for files completely. This is a pretty sensitive change overall but it might actually be worth doing. The main downside is the subtlety. The other one is that we should really wait for Jann's patch to land that enables KASAN to handle SLAB_TYPESAFE_BY_RCU UAFs. Currently it doesn't but a patch for this exists. With SLAB_TYPESAFE_BY_RCU objects may be freed and reused multiple times which requires a few changes. So it isn't sufficient anymore to just acquire a reference to the file in question under rcu using atomic_long_inc_not_zero() since the file might have already been recycled and someone else might have bumped the reference. In other words, callers might see reference count bumps from newer users. For this reason it is necessary to verify that the pointer is the same before and after the reference count increment. This pattern can be seen in get_file_rcu() and __files_get_rcu(). In addition, it isn't possible to access or check fields in struct file without first acquiring a reference on it. Not doing that was always very dodgy and it was only usable for non-pointer data in struct file. With SLAB_TYPESAFE_BY_RCU it is necessary that callers first acquire a reference under rcu or they must hold the files_lock of the fdtable. Failing to do either of these is a bug. Thanks to Jann for pointing out that we need to ensure memory ordering between reallocations and pointer check by ensuring that all subsequent loads have a dependency on the second load in get_file_rcu() and providing a fixup that was folded into this patch. Cc: Jann Horn <jannh@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
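The "take a reference, then re-check the pointer" pattern described above can be sketched with C11 atomics. This is a simplified user-space model, not the kernel code: struct demo_obj, demo_get() and the helpers are hypothetical, there is no real RCU here, and the sketch assumes the object's memory stays type-stable for the duration of the lookup, which is the guarantee SLAB_TYPESAFE_BY_RCU provides in the kernel. Where the kernel would retry, this sketch simply returns NULL when the object is already on its way out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted object living in type-stable memory. */
struct demo_obj {
	atomic_long refcount;
};

/* Increment the refcount unless it is already zero (object is dying). */
static bool demo_inc_not_zero(struct demo_obj *o)
{
	long old = atomic_load(&o->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return true;
	}
	return false;
}

static void demo_put(struct demo_obj *o)
{
	atomic_fetch_sub(&o->refcount, 1);
}

/*
 * Look up the object published in *slot. Because memory may be recycled
 * for a new object of the same type, a successful increment is not enough:
 * the slot must still point at the same object afterwards.
 */
struct demo_obj *demo_get(struct demo_obj *_Atomic *slot)
{
	for (;;) {
		struct demo_obj *o = atomic_load(slot);

		if (!o)
			return NULL;
		if (!demo_inc_not_zero(o))
			return NULL;	/* dying object; the kernel would retry */
		if (o == atomic_load(slot))
			return o;	/* same pointer before and after: safe */
		demo_put(o);		/* recycled under us: drop and retry */
	}
}
```

The second load is what distinguishes this from a plain inc-not-zero lookup: it rejects the case where the increment landed on a recycled object that now belongs to someone else.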
2023-10-19powerpc/fadump: Annotate endianness cast with __forceBenjamin Gray1-1/+1
Sparse reports an endianness error with the else case of val = (cpu_endian ? be64_to_cpu(reg_entry->reg_val) : (u64)(reg_entry->reg_val)); This is a safe operation because the code is explicitly working with dynamic endianness, so add the __force annotation to tell Sparse to ignore it. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231011053711.93427-13-bgray@linux.ibm.com
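The dynamic-endianness read that Sparse flags here can be modelled in user space. The sketch below is an assumption-laden illustration: demo_read_reg() and demo_load_be64() are hypothetical, the value is treated as eight raw bytes, and the Sparse-only __force annotation has no equivalent in plain C, so it is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assemble a 64-bit value from eight big-endian bytes, on any host. */
static uint64_t demo_load_be64(const unsigned char *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Read a register value whose byte order is only known at run time:
 * convert if it was stored big-endian, otherwise take it as a native value.
 */
uint64_t demo_read_reg(const unsigned char raw[8], int stored_big_endian)
{
	uint64_t native;

	if (stored_big_endian)
		return demo_load_be64(raw);
	memcpy(&native, raw, sizeof(native));
	return native;
}
```

The native branch is the one a static checker cannot verify on its own, since the same storage is interpreted in two ways depending on a run-time flag.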
2023-10-19powerpc: Annotate endianness of various variables and functionsBenjamin Gray2-2/+4
Sparse reports several endianness warnings on variables and functions that are consistently treated as big endian. There are no multi-endianness shenanigans going on here so fix these low hanging fruit up in one patch. All changes are just type annotations; no endianness switching operations are introduced by this patch. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231011053711.93427-7-bgray@linux.ibm.com
2023-10-19powerpc: Use NULL instead of 0 for null pointersBenjamin Gray2-5/+5
Sparse reports several uses of 0 for pointer arguments and comparisons. Replace with NULL to better convey the intent. Remove entirely if a comparison to follow the kernel style of implicit boolean conversions. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231011053711.93427-5-bgray@linux.ibm.com
2023-10-19powerpc: Untangle fixmap.h and pgtable.h and mmu.hChristophe Leroy2-0/+3
fixmap.h need pgtable.h for [un]map_kernel_page() pgtable.h need fixmap.h for FIXADDR_TOP. Untangle the two files by moving FIXADDR_TOP into pgtable.h Also move VIRT_IMMR_BASE to fixmap.h to avoid fixmap.h in mmu.h Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/5eba12392a018be28ad0a02ed844767b132589e7.1695659959.git.christophe.leroy@csgroup.eu
2023-10-18Merge branch fixes into nextMichael Ellerman2-9/+2
Merge our fixes branch to bring in commits that are prerequisites for further development or would cause conflicts.
2023-10-18spufs: convert to new timestamp accessorsJeff Layton1-1/+1
Convert to using the new inode timestamp accessor functions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20231004185347.80880-1-jlayton@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>