summaryrefslogtreecommitdiff
path: root/drivers/gpu
AgeCommit message (Collapse)AuthorFilesLines
2024-03-31Merge tag 'kbuild-fixes-v6.9' of ↵Linus Torvalds7-10/+10
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Deduplicate Kconfig entries for CONFIG_CXL_PMU - Fix unselectable choice entry in MIPS Kconfig, and forbid this structure - Remove unused include/asm-generic/export.h - Fix a NULL pointer dereference bug in modpost - Enable -Woverride-init warning consistently with W=1 - Drop KCSAN flags from *.mod.c files * tag 'kbuild-fixes-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kconfig: Fix typo HEIGTH to HEIGHT Documentation/llvm: Note s390 LLVM=1 support with LLVM 18.1.0 and newer kbuild: Disable KCSAN for autogenerated *.mod.c intermediaries kbuild: make -Woverride-init warnings more consistent modpost: do not make find_tosym() return NULL export.h: remove include/asm-generic/export.h kconfig: do not reparent the menu inside a choice block MIPS: move unselectable FIT_IMAGE_FDT_EPM5 out of the "System type" choice cxl: remove CONFIG_CXL_PMU entry in drivers/cxl/Kconfig
2024-03-31kbuild: make -Woverride-init warnings more consistentArnd Bergmann7-10/+10
The -Woverride-init warn about code that may be intentional or not, but the inintentional ones tend to be real bugs, so there is a bit of disagreement on whether this warning option should be enabled by default and we have multiple settings in scripts/Makefile.extrawarn as well as individual subsystems. Older versions of clang only supported -Wno-initializer-overrides with the same meaning as gcc's -Woverride-init, though all supported versions now work with both. Because of this difference, an earlier cleanup of mine accidentally turned the clang warning off for W=1 builds and only left it on for W=2, while it's still enabled for gcc with W=1. There is also one driver that only turns the warning off for newer versions of gcc but not other compilers, and some but not all the Makefiles still use a cc-disable-warning conditional that is no longer needed with supported compilers here. Address all of the above by removing the special cases for clang and always turning the warning off unconditionally where it got in the way, using the syntax that is supported by both compilers. Fixes: 2cd3271b7a31 ("kbuild: avoid duplicate warning options") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Andrew Jeffery <andrew@codeconstruct.com.au> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-03-29Merge tag 'drm-intel-fixes-2024-03-28' of ↵Dave Airlie22-63/+161
https://anongit.freedesktop.org/git/drm/drm-intel into drm-fixes Core/GT Fixes: - Fix for BUG_ON/BUILD_BUG_ON IN I915_memcpy.c (Joonas) - Update a MTL workaround (Tejas) - Fix locking inversion in hwmon's sysfs (Janusz) - Remove a bogus error message around PXP (Jose) - Fix UAF on VMA (Janusz) - Reset queue_priority_hint on parking (Chris) Display Fixes: - Remove duplicated audio enable/disable on SDVO and DP (Ville) - Disable AuxCCS for Xe driver (Juha-Pekka) - Revert init order of MIPI DSI (Ville) - DRRS debugfs fix with an extra refactor patch (Bhanuprakash) - VRR related fixes (Ville) - Fix a JSL eDP corruption (Jonathon) - Fix the cursor physical dma address (Ville) - BIOS VBT related fix (Ville) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/ZgYaIVgjIs30mIvS@intel.com
2024-03-28Merge tag 'drm-misc-fixes-2024-03-28' of ↵Dave Airlie7-22/+32
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Short summary of fixes pull: bridge: - select DRM_KMS_HELPER dma-buf: - fix NULL-pointer deref dp: - fix div-by-zero in DP MST unplug code fbdev: - select FB_IOMEM_FOPS for SBus nouveau: - dmem: handle kcalloc() allocation failures qxl: - remove unused variables rockchip: - vop2: remove support for AR30 and AB30 formats sched: - fix NULL-pointer deref vmwgfx: - debugfs: create ttm_resource_manager entry only if needed Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20240328134417.GA8673@localhost.localdomain
2024-03-28drm/i915/bios: Tolerate devdata==NULL in ↵Ville Syrjälä1-0/+3
intel_bios_encoder_supports_dp_dual_mode() If we have no VBT, or the VBT didn't declare the encoder in question, we won't have the 'devdata' for the encoder. Instead of oopsing just bail early. We won't be able to tell whether the port is DP++ or not, but so be it. Cc: stable@vger.kernel.org Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10464 Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240319092443.15769-1-ville.syrjala@linux.intel.com Reviewed-by: Jani Nikula <jani.nikula@intel.com> (cherry picked from commit 26410896206342c8a80d2b027923e9ee7d33b733) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-03-28drm/i915: Pre-populate the cursor physical dma addressVille Syrjälä3-3/+12
Calling i915_gem_object_get_dma_address() from the vblank evade critical section triggers might_sleep(). While we know that we've already pinned the framebuffer and thus i915_gem_object_get_dma_address() will in fact not sleep in this case, it seems reasonable to keep the unconditional might_sleep() for maximum coverage. So let's instead pre-populate the dma address during fb pinning, which all happens before we enter the vblank evade critical section. We can use u32 for the dma address as this class of hardware doesn't support >32bit addresses. Cc: stable@vger.kernel.org Fixes: 0225a90981c8 ("drm/i915: Make cursor plane registers unlocked") Reported-by: Borislav Petkov <bp@alien8.de> Closes: https://lore.kernel.org/intel-gfx/20240227100342.GAZd2zfmYcPS_SndtO@fat_crate.local/ Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240325175738.3440-1-ville.syrjala@linux.intel.com Tested-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com> (cherry picked from commit c1289a5c3594cf04caa94ebf0edeb50c62009f1f) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-03-28drm/i915/gt: Reset queue_priority_hint on parkingChris Wilson2-3/+3
Originally, with strict in order execution, we could complete execution only when the queue was empty. Preempt-to-busy allows replacement of an active request that may complete before the preemption is processed by HW. If that happens, the request is retired from the queue, but the queue_priority_hint remains set, preventing direct submission until after the next CS interrupt is processed. This preempt-to-busy race can be triggered by the heartbeat, which will also act as the power-management barrier and upon completion allow us to idle the HW. We may process the completion of the heartbeat, and begin parking the engine before the CS event that restores the queue_priority_hint, causing us to fail the assertion that it is MIN. <3>[ 166.210729] __engine_park:283 GEM_BUG_ON(engine->sched_engine->queue_priority_hint != (-((int)(~0U >> 1)) - 1)) <0>[ 166.210781] Dumping ftrace buffer: <0>[ 166.210795] --------------------------------- ... <0>[ 167.302811] drm_fdin-1097 2..s1. 165741070us : trace_ports: 0000:00:02.0 rcs0: promote { ccid:20 1217:2 prio 0 } <0>[ 167.302861] drm_fdin-1097 2d.s2. 165741072us : execlists_submission_tasklet: 0000:00:02.0 rcs0: preempting last=1217:2, prio=0, hint=2147483646 <0>[ 167.302928] drm_fdin-1097 2d.s2. 165741072us : __i915_request_unsubmit: 0000:00:02.0 rcs0: fence 1217:2, current 0 <0>[ 167.302992] drm_fdin-1097 2d.s2. 165741073us : __i915_request_submit: 0000:00:02.0 rcs0: fence 3:4660, current 4659 <0>[ 167.303044] drm_fdin-1097 2d.s1. 165741076us : execlists_submission_tasklet: 0000:00:02.0 rcs0: context:3 schedule-in, ccid:40 <0>[ 167.303095] drm_fdin-1097 2d.s1. 165741077us : trace_ports: 0000:00:02.0 rcs0: submit { ccid:40 3:4660* prio 2147483646 } <0>[ 167.303159] kworker/-89 11..... 165741139us : i915_request_retire.part.0: 0000:00:02.0 rcs0: fence c90:2, current 2 <0>[ 167.303208] kworker/-89 11..... 165741148us : __intel_context_do_unpin: 0000:00:02.0 rcs0: context:c90 unpin <0>[ 167.303272] kworker/-89 11..... 165741159us : i915_request_retire.part.0: 0000:00:02.0 rcs0: fence 1217:2, current 2 <0>[ 167.303321] kworker/-89 11..... 165741166us : __intel_context_do_unpin: 0000:00:02.0 rcs0: context:1217 unpin <0>[ 167.303384] kworker/-89 11..... 165741170us : i915_request_retire.part.0: 0000:00:02.0 rcs0: fence 3:4660, current 4660 <0>[ 167.303434] kworker/-89 11d..1. 165741172us : __intel_context_retire: 0000:00:02.0 rcs0: context:1216 retire runtime: { total:56028ns, avg:56028ns } <0>[ 167.303484] kworker/-89 11..... 165741198us : __engine_park: 0000:00:02.0 rcs0: parked <0>[ 167.303534] <idle>-0 5d.H3. 165741207us : execlists_irq_handler: 0000:00:02.0 rcs0: semaphore yield: 00000040 <0>[ 167.303583] kworker/-89 11..... 165741397us : __intel_context_retire: 0000:00:02.0 rcs0: context:1217 retire runtime: { total:325575ns, avg:0ns } <0>[ 167.303756] kworker/-89 11..... 165741777us : __intel_context_retire: 0000:00:02.0 rcs0: context:c90 retire runtime: { total:0ns, avg:0ns } <0>[ 167.303806] kworker/-89 11..... 165742017us : __engine_park: __engine_park:283 GEM_BUG_ON(engine->sched_engine->queue_priority_hint != (-((int)(~0U >> 1)) - 1)) <0>[ 167.303811] --------------------------------- <4>[ 167.304722] ------------[ cut here ]------------ <2>[ 167.304725] kernel BUG at drivers/gpu/drm/i915/gt/intel_engine_pm.c:283! <4>[ 167.304731] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI <4>[ 167.304734] CPU: 11 PID: 89 Comm: kworker/11:1 Tainted: G W 6.8.0-rc2-CI_DRM_14193-gc655e0fd2804+ #1 <4>[ 167.304736] Hardware name: Intel Corporation Rocket Lake Client Platform/RocketLake S UDIMM 6L RVP, BIOS RKLSFWI1.R00.3173.A03.2204210138 04/21/2022 <4>[ 167.304738] Workqueue: i915-unordered retire_work_handler [i915] <4>[ 167.304839] RIP: 0010:__engine_park+0x3fd/0x680 [i915] <4>[ 167.304937] Code: 00 48 c7 c2 b0 e5 86 a0 48 8d 3d 00 00 00 00 e8 79 48 d4 e0 bf 01 00 00 00 e8 ef 0a d4 e0 31 f6 bf 09 00 00 00 e8 03 49 c0 e0 <0f> 0b 0f 0b be 01 00 00 00 e8 f5 61 fd ff 31 c0 e9 34 fd ff ff 48 <4>[ 167.304940] RSP: 0018:ffffc9000059fce0 EFLAGS: 00010246 <4>[ 167.304942] RAX: 0000000000000200 RBX: 0000000000000000 RCX: 0000000000000006 <4>[ 167.304944] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009 <4>[ 167.304946] RBP: ffff8881330ca1b0 R08: 0000000000000001 R09: 0000000000000001 <4>[ 167.304947] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8881330ca000 <4>[ 167.304948] R13: ffff888110f02aa0 R14: ffff88812d1d0205 R15: ffff88811277d4f0 <4>[ 167.304950] FS: 0000000000000000(0000) GS:ffff88844f780000(0000) knlGS:0000000000000000 <4>[ 167.304952] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 167.304953] CR2: 00007fc362200c40 CR3: 000000013306e003 CR4: 0000000000770ef0 <4>[ 167.304955] PKRU: 55555554 <4>[ 167.304957] Call Trace: <4>[ 167.304958] <TASK> <4>[ 167.305573] ____intel_wakeref_put_last+0x1d/0x80 [i915] <4>[ 167.305685] i915_request_retire.part.0+0x34f/0x600 [i915] <4>[ 167.305800] retire_requests+0x51/0x80 [i915] <4>[ 167.305892] intel_gt_retire_requests_timeout+0x27f/0x700 [i915] <4>[ 167.305985] process_scheduled_works+0x2db/0x530 <4>[ 167.305990] worker_thread+0x18c/0x350 <4>[ 167.305993] kthread+0xfe/0x130 <4>[ 167.305997] ret_from_fork+0x2c/0x50 <4>[ 167.306001] ret_from_fork_asm+0x1b/0x30 <4>[ 167.306004] </TASK> It is necessary for the queue_priority_hint to be lower than the next request submission upon waking up, as we rely on the hint to decide when to kick the tasklet to submit that first request. Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy") Closes: https://gitlab.freedesktop.org/drm/intel/issues/10154 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.4+ Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240318135906.716055-2-janusz.krzysztofik@linux.intel.com (cherry picked from commit 98850e96cf811dc2d0a7d0af491caff9f5d49c1e) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-03-28drm/i915/vma: Fix UAF on destroy against retire raceJanusz Krzysztofik1-7/+43
Object debugging tools were sporadically reporting illegal attempts to free a still active i915 VMA object when parking a GT believed to be idle. [161.359441] ODEBUG: free active (active state 0) object: ffff88811643b958 object type: i915_active hint: __i915_vma_active+0x0/0x50 [i915] [161.360082] WARNING: CPU: 5 PID: 276 at lib/debugobjects.c:514 debug_print_object+0x80/0xb0 ... [161.360304] CPU: 5 PID: 276 Comm: kworker/5:2 Not tainted 6.5.0-rc1-CI_DRM_13375-g003f860e5577+ #1 [161.360314] Hardware name: Intel Corporation Rocket Lake Client Platform/RocketLake S UDIMM 6L RVP, BIOS RKLSFWI1.R00.3173.A03.2204210138 04/21/2022 [161.360322] Workqueue: i915-unordered __intel_wakeref_put_work [i915] [161.360592] RIP: 0010:debug_print_object+0x80/0xb0 ... [161.361347] debug_object_free+0xeb/0x110 [161.361362] i915_active_fini+0x14/0x130 [i915] [161.361866] release_references+0xfe/0x1f0 [i915] [161.362543] i915_vma_parked+0x1db/0x380 [i915] [161.363129] __gt_park+0x121/0x230 [i915] [161.363515] ____intel_wakeref_put_last+0x1f/0x70 [i915] That has been tracked down to be happening when another thread is deactivating the VMA inside __active_retire() helper, after the VMA's active counter has been already decremented to 0, but before deactivation of the VMA's object is reported to the object debugging tool. We could prevent from that race by serializing i915_active_fini() with __active_retire() via ref->tree_lock, but that wouldn't stop the VMA from being used, e.g. from __i915_vma_retire() called at the end of __active_retire(), after that VMA has been already freed by a concurrent i915_vma_destroy() on return from the i915_active_fini(). Then, we should rather fix the issue at the VMA level, not in i915_active. Since __i915_vma_parked() is called from __gt_park() on last put of the GT's wakeref, the issue could be addressed by holding the GT wakeref long enough for __active_retire() to complete before that wakeref is released and the GT parked. I believe the issue was introduced by commit d93939730347 ("drm/i915: Remove the vma refcount") which moved a call to i915_active_fini() from a dropped i915_vma_release(), called on last put of the removed VMA kref, to i915_vma_parked() processing path called on last put of a GT wakeref. However, its visibility to the object debugging tool was suppressed by a bug in i915_active that was fixed two weeks later with commit e92eb246feb9 ("drm/i915/active: Fix missing debug object activation"). A VMA associated with a request doesn't acquire a GT wakeref by itself. Instead, it depends on a wakeref held directly by the request's active intel_context for a GT associated with its VM, and indirectly on that intel_context's engine wakeref if the engine belongs to the same GT as the VMA's VM. Those wakerefs are released asynchronously to VMA deactivation. Fix the issue by getting a wakeref for the VMA's GT when activating it, and putting that wakeref only after the VMA is deactivated. However, exclude global GTT from that processing path, otherwise the GPU never goes idle. Since __i915_vma_retire() may be called from atomic contexts, use async variant of wakeref put. Also, to avoid circular locking dependency, take care of acquiring the wakeref before VM mutex when both are needed. v7: Add inline comments with justifications for: - using untracked variants of intel_gt_pm_get/put() (Nirmoy), - using async variant of _put(), - not getting the wakeref in case of a global GTT, - always getting the first wakeref outside vm->mutex. v6: Since __i915_vma_active/retire() callbacks are not serialized, storing a wakeref tracking handle inside struct i915_vma is not safe, and there is no other good place for that. Use untracked variants of intel_gt_pm_get/put_async(). v5: Replace "tile" with "GT" across commit description (Rodrigo), - avoid mentioning multi-GT case in commit description (Rodrigo), - explain why we need to take a temporary wakeref unconditionally inside i915_vma_pin_ww() (Rodrigo). v4: Refresh on top of commit 5e4e06e4087e ("drm/i915: Track gt pm wakerefs") (Andi), - for more easy backporting, split out removal of former insufficient workarounds and move them to separate patches (Nirmoy). - clean up commit message and description a bit. v3: Identify root cause more precisely, and a commit to blame, - identify and drop former workarounds, - update commit message and description. v2: Get the wakeref before VM mutex to avoid circular locking dependency, - drop questionable Fixes: tag. Fixes: d93939730347 ("drm/i915: Remove the vma refcount") Closes: https://gitlab.freedesktop.org/drm/intel/issues/8875 Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Nirmoy Das <nirmoy.das@intel.com> Cc: Andi Shyti <andi.shyti@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: stable@vger.kernel.org # v5.19+ Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240305143747.335367-6-janusz.krzysztofik@linux.intel.com (cherry picked from commit f3c71b2ded5c4367144a810ef25f998fd1d6c381) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-03-28drm/i915: Do not print 'pxp init failed with 0' when it succeedJosé Roberto de Souza1-1/+1
It is misleading, if the intention was to also print something in case it succeed it should have a different string. Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Fixes: 698e19da2914 ("drm/i915: Skip pxp init if gt is wedged") Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240320210547.71937-1-jose.souza@intel.com (cherry picked from commit d437099ab21cd4c6ce5d578b765df642d759c929) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-03-28drm/i915: Do not match JSL in ehl_combo_pll_div_frac_wa_needed()Jonathon Hall1-1/+1
Since commit 0c65dc062611 ("drm/i915/jsl: s/JSL/JASPERLAKE for platform/subplatform defines"), boot freezes on a Jasper Lake tablet (Librem 11), usually with graphical corruption on the eDP display, but sometimes just a black screen. This commit was included in 6.6 and later. That commit was intended to refactor EHL and JSL macros, but the change to ehl_combo_pll_div_frac_wa_needed() started matching JSL incorrectly when it was only intended to match EHL. It replaced: return ((IS_PLATFORM(i915, INTEL_ELKHARTLAKE) && IS_JSL_EHL_DISPLAY_STEP(i915, STEP_B0, STEP_FOREVER)) || with: return (((IS_ELKHARTLAKE(i915) || IS_JASPERLAKE(i915)) && IS_DISPLAY_STEP(i915, STEP_B0, STEP_FOREVER)) || Remove IS_JASPERLAKE() to fix the regression. Signed-off-by: Jonathon Hall <jonathon.hall@puri.sm> Cc: stable@vger.kernel.org Fixes: 0c65dc062611 ("drm/i915/jsl: s/JSL/JASPERLAKE for platform/subplatform defines") Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240313135424.3731410-1-jonathon.hall@puri.sm Signed-off-by: Jani Nikula <jani.nikula@intel.com> (cherry picked from commit 1ef48859317b2a77672dea8682df133abf9c44ed) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-03-28drm/i915/hwmon: Fix locking inversion in sysfs getterJanusz Krzysztofik1-18/+19
In i915 hwmon sysfs getter path we now take a hwmon_lock, then acquire an rpm wakeref. That results in lock inversion: <4> [197.079335] ====================================================== <4> [197.085473] WARNING: possible circular locking dependency detected <4> [197.091611] 6.8.0-rc7-Patchwork_129026v7-gc4dc92fb1152+ #1 Not tainted <4> [197.098096] ------------------------------------------------------ <4> [197.104231] prometheus-node/839 is trying to acquire lock: <4> [197.109680] ffffffff82764d80 (fs_reclaim){+.+.}-{0:0}, at: __kmalloc+0x9a/0x350 <4> [197.116939] but task is already holding lock: <4> [197.122730] ffff88811b772a40 (&hwmon->hwmon_lock){+.+.}-{3:3}, at: hwm_energy+0x4b/0x100 [i915] <4> [197.131543] which lock already depends on the new lock. ... <4> [197.507922] Chain exists of: fs_reclaim --> &gt->reset.mutex --> &hwmon->hwmon_lock <4> [197.518528] Possible unsafe locking scenario: <4> [197.524411] CPU0 CPU1 <4> [197.528916] ---- ---- <4> [197.533418] lock(&hwmon->hwmon_lock); <4> [197.537237] lock(&gt->reset.mutex); <4> [197.543376] lock(&hwmon->hwmon_lock); <4> [197.549682] lock(fs_reclaim); ... <4> [197.632548] Call Trace: <4> [197.634990] <TASK> <4> [197.637088] dump_stack_lvl+0x64/0xb0 <4> [197.640738] check_noncircular+0x15e/0x180 <4> [197.652968] check_prev_add+0xe9/0xce0 <4> [197.656705] __lock_acquire+0x179f/0x2300 <4> [197.660694] lock_acquire+0xd8/0x2d0 <4> [197.673009] fs_reclaim_acquire+0xa1/0xd0 <4> [197.680478] __kmalloc+0x9a/0x350 <4> [197.689063] acpi_ns_internalize_name.part.0+0x4a/0xb0 <4> [197.694170] acpi_ns_get_node_unlocked+0x60/0xf0 <4> [197.720608] acpi_ns_get_node+0x3b/0x60 <4> [197.724428] acpi_get_handle+0x57/0xb0 <4> [197.728164] acpi_has_method+0x20/0x50 <4> [197.731896] acpi_pci_set_power_state+0x43/0x120 <4> [197.736485] pci_power_up+0x24/0x1c0 <4> [197.740047] pci_pm_default_resume_early+0x9/0x30 <4> [197.744725] pci_pm_runtime_resume+0x2d/0x90 <4> [197.753911] __rpm_callback+0x3c/0x110 <4> [197.762586] rpm_callback+0x58/0x70 <4> [197.766064] rpm_resume+0x51e/0x730 <4> [197.769542] rpm_resume+0x267/0x730 <4> [197.773020] rpm_resume+0x267/0x730 <4> [197.776498] rpm_resume+0x267/0x730 <4> [197.779974] __pm_runtime_resume+0x49/0x90 <4> [197.784055] __intel_runtime_pm_get+0x19/0xa0 [i915] <4> [197.789070] hwm_energy+0x55/0x100 [i915] <4> [197.793183] hwm_read+0x9a/0x310 [i915] <4> [197.797124] hwmon_attr_show+0x36/0x120 <4> [197.800946] dev_attr_show+0x15/0x60 <4> [197.804509] sysfs_kf_seq_show+0xb5/0x100 Acquire the wakeref before the lock and hold it as long as the lock is also held. Follow that pattern across the whole source file where similar lock inversion can happen. v2: Keep hardware read under the lock so the whole operation of updating energy from hardware is still atomic (Guenter), - instead, acquire the rpm wakeref before the lock and hold it as long as the lock is held, - use the same aproach for other similar places across the i915_hwmon.c source file (Rodrigo). Fixes: 1b44019a93e2 ("drm/i915/guc: Disable PL1 power limit when loading GuC firmware") Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: <stable@vger.kernel.org> # v6.5+ Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240311203500.518675-2-janusz.krzysztofik@linux.intel.com (cherry picked from commit 71b218771426ea84c0e0148a2b7ac52c1f76e792) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-03-28drm/i915/dsb: Fix DSB vblank waits when using VRRVille Syrjälä1-0/+14
Looks like the undelayed vblank gets signalled exactly when the active period ends. That is a problem for DSB+VRR when we are already in vblank and expect DSB to start executing as soon as we send the push. Instead of starting, the DSB just keeps on waiting for the undelayed vblank which won't signal until the end of the next frame's active period, which is far too late. The end result is that DSB won't have even started executing by the time the flips/etc. have completed. We then wait for an extra 1ms, after which we terminate the DSB and report a timeout: [drm] *ERROR* [CRTC:80:pipe A] DSB 0 timed out waiting for idle (current head=0xfedf4000, head=0x0, tail=0x1080) To fix this let's configure DSB to use the so called VRR "safe window" instead of the undelayed vblank to trigger the DSB vblank logic, when VRR is enabled. Cc: stable@vger.kernel.org Fixes: 34d8311f4a1c ("drm/i915/dsb: Re-instate DSB for LUT updates") Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/9927 Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240306040806.21697-3-ville.syrjala@linux.intel.com Reviewed-by: Animesh Manna <animesh.manna@intel.com> (cherry picked from commit 41429d9b68367596eb3d6d5961e6295c284622a7) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-03-28drm/i915/vrr: Generate VRR "safe window" for DSBVille Syrjälä2-4/+5
Looks like TRANS_CHICKEN bit 31 means something totally different depending on the platform: TGL: generate VRR "safe window" for DSB ADL/DG2: make TRANS_SET_CONTEXT_LATENCY effective with VRR So far we've only set this on ADL/DG2, but when using DSB+VRR we also need to set it on TGL. And a quick test on MTL says it doesn't need this bit for either of those purposes, even though it's still documented as valid in bspec. Cc: stable@vger.kernel.org Fixes: 34d8311f4a1c ("drm/i915/dsb: Re-instate DSB for LUT updates") Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/9927 Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240306040806.21697-2-ville.syrjala@linux.intel.com Reviewed-by: Animesh Manna <animesh.manna@intel.com> (cherry picked from commit 810e4519a1b34b5a0ff0eab32e5b184f533c5ee9) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-03-28drm/i915/display/debugfs: Fix duplicate checks in i915_drrs_statusBhanuprakash Modem1-3/+2
Remove duplicate checks for debugfs entry "DRRS capable:". Fixes: 20af10845864 ("drm/i915/display/debugfs: New entry "DRRS capable" to i915_drrs_status") Cc: Jani Nikula <jani.nikula@intel.com> Cc: Ankit Nautiyal <ankit.k.nautiyal@intel.com> Cc: Mitul Golani <mitulkumar.ajitkumar.golani@intel.com> Signed-off-by: Bhanuprakash Modem <bhanuprakash.modem@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240227123833.2799647-2-bhanuprakash.modem@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com> (cherry picked from commit 3d81fceb60f20fe2ceed2198636ee6dc9ef46775) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-03-28drm/i915/drrs: Refactor CPU transcoder DRRS checkBhanuprakash Modem3-10/+14
Rename cpu_transcoder_has_drrs() to intel_cpu_transcoder_has_drrs() and move it to intel_drrs.[ch]. V2: - Move helpers to intel_drrs.[ch] (Jani) - Fix commit message (Jani) Cc: Jani Nikula <jani.nikula@intel.com> Cc: Ankit Nautiyal <ankit.k.nautiyal@intel.com> Cc: Mitul Golani <mitulkumar.ajitkumar.golani@intel.com> Signed-off-by: Bhanuprakash Modem <bhanuprakash.modem@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240228055502.2857819-1-bhanuprakash.modem@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com> (cherry picked from commit 2d04f8158548103c082190c8dbf6a19097e2423e) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-03-28drm/i915/mtl: Update workaround 14018575942Tejas Upadhyay1-0/+1
Applying WA 14018575942 only on Compute engine has impact on some apps like chrome. Updating this WA to apply on Render engine as well as it is helping with performance on Chrome. Note: There is no concern from media team thus not applying WA on media engines. We will revisit if any issues reported from media team. V2(Matt): - Use correct WA number Fixes: 668f37e1ee11 ("drm/i915/mtl: Update workaround 14018778641") Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240228103738.2018458-1-tejas.upadhyay@intel.com (cherry picked from commit 71271280175aa0ed6673e40cce7c01296bcd05f6) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-03-28drm/i915/dsi: Go back to the previous INIT_OTP/DISPLAY_ON order, mostlyVille Syrjälä2-7/+39
Reinstate commit 88b065943cb5 ("drm/i915/dsi: Do display on sequence later on icl+"), for the most part. Turns out some machines (eg. Chuwi Minibook X) really do need that updated order. It is also the order the Windows driver uses. However we can't just undo the revert since that would again break Lenovo 82TQ. After staring at the VBT sequences for both machines I've concluded that the Lenovo 82TQ sequences look somewhat broken: - INIT_OTP is not present at all - what should be in INIT_OTP is found in DISPLAY_ON - what should be in DISPLAY_ON is found in BACKLIGHT_ON (along with the actual backlight stuff) The Chuwi Minibook X on the other hand has a full complement of sequences in its VBT. So let's try to deal with the broken sequences in the Lenovo 82TQ VBT by simply swapping the (non-existent) INIT_OTP sequence with the DISPLAY_ON sequence. Thus we execute DISPLAY_ON when intending to execute INIT_OTP, and execute nothing at all when intending to execute DISPLAY_ON. That should be 100% equivalent to the revert, for such broken VBTs. Cc: stable@vger.kernel.org Fixes: 6992eb815d08 ("Revert "drm/i915/dsi: Do display on sequence later on icl+"") References: https://gitlab.freedesktop.org/drm/intel/-/issues/10071 Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10334 Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240305083659.8396-1-ville.syrjala@linux.intel.com Acked-by: Jani Nikula <jani.nikula@intel.com> (cherry picked from commit 94ae4612ea336bfc3c12b3fc68467c6711a4f39b) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-03-28drm/i915/display: Disable AuxCCS framebuffers if built for XeJuha-Pekka Heikkila1-0/+3
AuxCCS framebuffers don't work on Xe driver hence disable them from plane capabilities until they are fixed. FlatCCS framebuffers work and they are left enabled. CCS is left untouched for i915 driver. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/933 Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Tested-by: José Roberto de Souza <jose.souza@intel.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Fixes: 44e694958b95 ("drm/xe/display: Implement display support") Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240228140225.858145-1-juhapekka.heikkila@gmail.com (cherry picked from commit b7232a730fbf043f54fb46fbf4a6e92936770e79) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-03-28drm/i915: Stop doing double audio enable/disable on SDVO and g4x+ DPVille Syrjälä2-6/+0
Looks like I misplaced a few hunks when I moved the audio enable/disable out from the encoder enable/disable hooks. So we are now doing a double audio enable/disable on SDVO and g4x+ DP. Probably harmless as doing it twice shouldn't really change anything, but let's do it just once, as intended. Fixes: cff742cc6851 ("drm/i915: Hoist the encoder->audio_{enable,disable}() calls higher up") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240226193251.29619-1-ville.syrjala@linux.intel.com Reviewed-by: Jani Nikula <jani.nikula@intel.com> (cherry picked from commit 315bd0a0825776d6c66d474bf572db64fa019ad8) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-03-28drm/i915: Add includes for BUG_ON/BUILD_BUG_ON in i915_memcpy.cJoonas Lahtinen1-0/+2
Add standalone includes for BUG_ON and BUILD_BUG_ON to avoid build failure after linux-next include refactoring. Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris.p.wilson@linux.intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240308144643.137831-1-joonas.lahtinen@linux.intel.com (cherry picked from commit 4df6ac223cad36e7384ed00fe6efc114279f0df6) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-03-28drm/qxl: remove unused variable from `qxl_process_single_command()`Miguel Ojeda1-3/+1
Clang 14 in an (essentially) defconfig loongarch64 build for next-20240327 reports [1]: drivers/gpu/drm/qxl/qxl_ioctl.c:148:14: error: variable 'num_relocs' set but not used [-Werror,-Wunused-but-set-variable] The variable was originally used in the `out_free_bos` label, but commit 74d9a6335dce ("drm/qxl: Simplify cleaning qxl processing command") removed the use that happened in that label. Thus remove the unused variable. Fixes: 74d9a6335dce ("drm/qxl: Simplify cleaning qxl processing command") Closes: https://lore.kernel.org/lkml/CANiq72kqqQfUxLkHJYqeBAhpc6YcX7bfR96gmmbF=j8hEOykqw@mail.gmail.com/ [1] Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Link: https://lore.kernel.org/r/20240327175556.233126-2-ojeda@kernel.org Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-03-28drm/qxl: remove unused `count` variable from `qxl_surface_id_alloc()`Miguel Ojeda1-2/+0
Clang 14 in an (essentially) defconfig loongarch64 build for next-20240326 reports [1]: drivers/gpu/drm/qxl/qxl_cmd.c:424:6: error: variable 'count' set but not used [-Werror,-Wunused-but-set-variable] The variable is already unused in the version that got into the tree. Thus remove the unused variable. Fixes: f64122c1f6ad ("drm: add new QXL driver. (v1.4)") Closes: https://lore.kernel.org/lkml/CANiq72mjc5t4n25SQvYSrOEhxxpXYPZ4pPzneSJHEnc3qApu2Q@mail.gmail.com/ [1] Closes: https://lore.kernel.org/all/20240327163331.GB1153323@dev-arch.thelio-3990X/ Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Link: https://lore.kernel.org/r/20240327175556.233126-1-ojeda@kernel.org Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-03-28drm/i915: add bug.h include to i915_memcpy.cDave Airlie1-0/+1
This is stopping me building here for some reason, /home/airlied/devel/kernel/dim/drm-fixes/drivers/gpu/drm/i915/i915_memcpy.c: In function ‘i915_unaligned_memcpy_from_wc’: /home/airlied/devel/kernel/dim/drm-fixes/drivers/gpu/drm/i915/i915_memcpy.c:33:25: error: implicit declaration of function ‘BUG_ON’; did you mean ‘CI_BUG_ON’? [-Werror=implicit-function-declaration] 33 | #define CI_BUG_ON(expr) BUG_ON(expr) | ^~~~~~ /home/airlied/devel/kernel/dim/drm-fixes/drivers/gpu/drm/i915/i915_memcpy.c:144:9: note: in expansion of macro ‘CI_BUG_ON’ 144 | CI_BUG_ON(!i915_has_memcpy_from_wc()); | ^~~~~~~~~ engage maintainer overrides :-) Signed-off-by: Dave Airlie <airlied@redhat.com>
2024-03-28Merge tag 'amd-drm-fixes-6.9-2024-03-27' of ↵Dave Airlie35-272/+328
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.9-2024-03-27: amdgpu: - SMU 14.0.1 updates - DCN 3.5.x updates - VPE fix - eDP panel flickering fix - Suspend fix - PSR fix - DCN 3.0+ fix - VCN 4.0.6 updates - debugfs fix amdkfd: - DMA-Buf fix - GFX 9.4.2 TLB flush fix - CP interrupt fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240328025342.8700-1-alexander.deucher@amd.com
2024-03-27drm/vmwgfx: Create debugfs ttm_resource_manager entry only if neededJocelyn Falempe1-6/+9
The driver creates /sys/kernel/debug/dri/0/mob_ttm even when the corresponding ttm_resource_manager is not allocated. This leads to a crash when trying to read from this file. Add a check to create mob_ttm, system_mob_ttm, and gmr_ttm debug file only when the corresponding ttm_resource_manager is allocated. crash> bt PID: 3133409 TASK: ffff8fe4834a5000 CPU: 3 COMMAND: "grep" #0 [ffffb954506b3b20] machine_kexec at ffffffffb2a6bec3 #1 [ffffb954506b3b78] __crash_kexec at ffffffffb2bb598a #2 [ffffb954506b3c38] crash_kexec at ffffffffb2bb68c1 #3 [ffffb954506b3c50] oops_end at ffffffffb2a2a9b1 #4 [ffffb954506b3c70] no_context at ffffffffb2a7e913 #5 [ffffb954506b3cc8] __bad_area_nosemaphore at ffffffffb2a7ec8c #6 [ffffb954506b3d10] do_page_fault at ffffffffb2a7f887 #7 [ffffb954506b3d40] page_fault at ffffffffb360116e [exception RIP: ttm_resource_manager_debug+0x11] RIP: ffffffffc04afd11 RSP: ffffb954506b3df0 RFLAGS: 00010246 RAX: ffff8fe41a6d1200 RBX: 0000000000000000 RCX: 0000000000000940 RDX: 0000000000000000 RSI: ffffffffc04b4338 RDI: 0000000000000000 RBP: ffffb954506b3e08 R8: ffff8fee3ffad000 R9: 0000000000000000 R10: ffff8fe41a76a000 R11: 0000000000000001 R12: 00000000ffffffff R13: 0000000000000001 R14: ffff8fe5bb6f3900 R15: ffff8fe41a6d1200 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #8 [ffffb954506b3e00] ttm_resource_manager_show at ffffffffc04afde7 [ttm] #9 [ffffb954506b3e30] seq_read at ffffffffb2d8f9f3 RIP: 00007f4c4eda8985 RSP: 00007ffdbba9e9f8 RFLAGS: 00000246 RAX: ffffffffffffffda RBX: 000000000037e000 RCX: 00007f4c4eda8985 RDX: 000000000037e000 RSI: 00007f4c41573000 RDI: 0000000000000003 RBP: 000000000037e000 R8: 0000000000000000 R9: 000000000037fe30 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f4c41573000 R13: 0000000000000003 R14: 00007f4c41572010 R15: 0000000000000003 ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com> Fixes: af4a25bbe5e7 ("drm/vmwgfx: Add debugfs entries for various ttm resource managers") Cc: <stable@vger.kernel.org> Reviewed-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240312093551.196609-1-jfalempe@redhat.com
2024-03-27drm/amdgpu: fix deadlock while reading mqd from debugfsJohannes Weiner1-17/+29
An errant disk backup on my desktop got into debugfs and triggered the following deadlock scenario in the amdgpu debugfs files. The machine also hard-resets immediately after those lines are printed (although I wasn't able to reproduce that part when reading by hand): [ 1318.016074][ T1082] ====================================================== [ 1318.016607][ T1082] WARNING: possible circular locking dependency detected [ 1318.017107][ T1082] 6.8.0-rc7-00015-ge0c8221b72c0 #17 Not tainted [ 1318.017598][ T1082] ------------------------------------------------------ [ 1318.018096][ T1082] tar/1082 is trying to acquire lock: [ 1318.018585][ T1082] ffff98c44175d6a0 (&mm->mmap_lock){++++}-{3:3}, at: __might_fault+0x40/0x80 [ 1318.019084][ T1082] [ 1318.019084][ T1082] but task is already holding lock: [ 1318.020052][ T1082] ffff98c4c13f55f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_debugfs_mqd_read+0x6a/0x250 [amdgpu] [ 1318.020607][ T1082] [ 1318.020607][ T1082] which lock already depends on the new lock. [ 1318.020607][ T1082] [ 1318.022081][ T1082] [ 1318.022081][ T1082] the existing dependency chain (in reverse order) is: [ 1318.023083][ T1082] [ 1318.023083][ T1082] -> #2 (reservation_ww_class_mutex){+.+.}-{3:3}: [ 1318.024114][ T1082] __ww_mutex_lock.constprop.0+0xe0/0x12f0 [ 1318.024639][ T1082] ww_mutex_lock+0x32/0x90 [ 1318.025161][ T1082] dma_resv_lockdep+0x18a/0x330 [ 1318.025683][ T1082] do_one_initcall+0x6a/0x350 [ 1318.026210][ T1082] kernel_init_freeable+0x1a3/0x310 [ 1318.026728][ T1082] kernel_init+0x15/0x1a0 [ 1318.027242][ T1082] ret_from_fork+0x2c/0x40 [ 1318.027759][ T1082] ret_from_fork_asm+0x11/0x20 [ 1318.028281][ T1082] [ 1318.028281][ T1082] -> #1 (reservation_ww_class_acquire){+.+.}-{0:0}: [ 1318.029297][ T1082] dma_resv_lockdep+0x16c/0x330 [ 1318.029790][ T1082] do_one_initcall+0x6a/0x350 [ 1318.030263][ T1082] kernel_init_freeable+0x1a3/0x310 [ 1318.030722][ T1082] kernel_init+0x15/0x1a0 [ 1318.031168][ T1082] ret_from_fork+0x2c/0x40 [ 1318.031598][ T1082] ret_from_fork_asm+0x11/0x20 [ 1318.032011][ T1082] [ 1318.032011][ T1082] -> #0 (&mm->mmap_lock){++++}-{3:3}: [ 1318.032778][ T1082] __lock_acquire+0x14bf/0x2680 [ 1318.033141][ T1082] lock_acquire+0xcd/0x2c0 [ 1318.033487][ T1082] __might_fault+0x58/0x80 [ 1318.033814][ T1082] amdgpu_debugfs_mqd_read+0x103/0x250 [amdgpu] [ 1318.034181][ T1082] full_proxy_read+0x55/0x80 [ 1318.034487][ T1082] vfs_read+0xa7/0x360 [ 1318.034788][ T1082] ksys_read+0x70/0xf0 [ 1318.035085][ T1082] do_syscall_64+0x94/0x180 [ 1318.035375][ T1082] entry_SYSCALL_64_after_hwframe+0x46/0x4e [ 1318.035664][ T1082] [ 1318.035664][ T1082] other info that might help us debug this: [ 1318.035664][ T1082] [ 1318.036487][ T1082] Chain exists of: [ 1318.036487][ T1082] &mm->mmap_lock --> reservation_ww_class_acquire --> reservation_ww_class_mutex [ 1318.036487][ T1082] [ 1318.037310][ T1082] Possible unsafe locking scenario: [ 1318.037310][ T1082] [ 1318.037838][ T1082] CPU0 CPU1 [ 1318.038101][ T1082] ---- ---- [ 1318.038350][ T1082] lock(reservation_ww_class_mutex); [ 1318.038590][ T1082] lock(reservation_ww_class_acquire); [ 1318.038839][ T1082] lock(reservation_ww_class_mutex); [ 1318.039083][ T1082] rlock(&mm->mmap_lock); [ 1318.039328][ T1082] [ 1318.039328][ T1082] *** DEADLOCK *** [ 1318.039328][ T1082] [ 1318.040029][ T1082] 1 lock held by tar/1082: [ 1318.040259][ T1082] #0: ffff98c4c13f55f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_debugfs_mqd_read+0x6a/0x250 [amdgpu] [ 1318.040560][ T1082] [ 1318.040560][ T1082] stack backtrace: [ 1318.041053][ T1082] CPU: 22 PID: 1082 Comm: tar Not tainted 6.8.0-rc7-00015-ge0c8221b72c0 #17 3316c85d50e282c5643b075d1f01a4f6365e39c2 [ 1318.041329][ T1082] Hardware name: Gigabyte Technology Co., Ltd. B650 AORUS PRO AX/B650 AORUS PRO AX, BIOS F20 12/14/2023 [ 1318.041614][ T1082] Call Trace: [ 1318.041895][ T1082] <TASK> [ 1318.042175][ T1082] dump_stack_lvl+0x4a/0x80 [ 1318.042460][ T1082] check_noncircular+0x145/0x160 [ 1318.042743][ T1082] __lock_acquire+0x14bf/0x2680 [ 1318.043022][ T1082] lock_acquire+0xcd/0x2c0 [ 1318.043301][ T1082] ? __might_fault+0x40/0x80 [ 1318.043580][ T1082] ? __might_fault+0x40/0x80 [ 1318.043856][ T1082] __might_fault+0x58/0x80 [ 1318.044131][ T1082] ? __might_fault+0x40/0x80 [ 1318.044408][ T1082] amdgpu_debugfs_mqd_read+0x103/0x250 [amdgpu 8fe2afaa910cbd7654c8cab23563a94d6caebaab] [ 1318.044749][ T1082] full_proxy_read+0x55/0x80 [ 1318.045042][ T1082] vfs_read+0xa7/0x360 [ 1318.045333][ T1082] ksys_read+0x70/0xf0 [ 1318.045623][ T1082] do_syscall_64+0x94/0x180 [ 1318.045913][ T1082] ? do_syscall_64+0xa0/0x180 [ 1318.046201][ T1082] ? lockdep_hardirqs_on+0x7d/0x100 [ 1318.046487][ T1082] ? do_syscall_64+0xa0/0x180 [ 1318.046773][ T1082] ? do_syscall_64+0xa0/0x180 [ 1318.047057][ T1082] ? do_syscall_64+0xa0/0x180 [ 1318.047337][ T1082] ? do_syscall_64+0xa0/0x180 [ 1318.047611][ T1082] entry_SYSCALL_64_after_hwframe+0x46/0x4e [ 1318.047887][ T1082] RIP: 0033:0x7f480b70a39d [ 1318.048162][ T1082] Code: 91 ba 0d 00 f7 d8 64 89 02 b8 ff ff ff ff eb b2 e8 18 a3 01 00 0f 1f 84 00 00 00 00 00 80 3d a9 3c 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 53 48 83 [ 1318.048769][ T1082] RSP: 002b:00007ffde77f5c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 1318.049083][ T1082] RAX: ffffffffffffffda RBX: 0000000000000800 RCX: 00007f480b70a39d [ 1318.049392][ T1082] RDX: 0000000000000800 RSI: 000055c9f2120c00 RDI: 0000000000000008 [ 1318.049703][ T1082] RBP: 0000000000000800 R08: 000055c9f2120a94 R09: 0000000000000007 [ 1318.050011][ T1082] R10: 0000000000000000 R11: 0000000000000246 R12: 000055c9f2120c00 [ 1318.050324][ T1082] R13: 0000000000000008 R14: 0000000000000008 R15: 0000000000000800 [ 1318.050638][ T1082] </TASK> amdgpu_debugfs_mqd_read() holds a reservation when it calls put_user(), which may fault and acquire the mmap_sem. This violates the established locking order. Bounce the mqd data through a kernel buffer to get put_user() out of the illegal section. Fixes: 445d85e3c1df ("drm/amdgpu: add debugfs interface for reading MQDs") Cc: stable@vger.kernel.org # v6.5+ Reviewed-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-27drm/amdgpu: enable UMSCH 4.0.6Lang Yu3-4/+16
Share same codes with 4.0.5 and enable collaborate mode for VPE. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-27drm/amdgpu/umsch: update UMSCH 4.0 FW interfaceLang Yu2-12/+21
Align with FW changes. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-27drm/amd/display: Set DCN351 BB and IP the same as DCN35Xi Liu1-5/+1
[WHY & HOW] DCN351 and DCN35 should use the same bounding box and IP settings. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Jun Lei <jun.lei@amd.com> Acked-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Xi Liu <xi.liu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-27drm/amd/display: Fix bounds check for dcn35 DcfClocksRoman Li1-1/+1
[Why] NumFclkLevelsEnabled is used for DcfClocks bounds check instead of designated NumDcfClkLevelsEnabled. That can cause array index out-of-bounds access. [How] Use designated variable for dcn35 DcfClocks bounds check. Fixes: a8edc9cc0b14 ("drm/amd/display: Fix array-index-out-of-bounds in dcn35_clkmgr") Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Sun peng Li <sunpeng.li@amd.com> Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-27drm/amd/display: Remove MPC rate control logic from DCN30 and aboveGeorge Shen6-155/+41
[Why] MPC flow rate control is not needed for DCN30 and above. Current logic that uses it can result in underflow for certain edge cases (such as DSC N422 + ODM combine + 422 left edge pixel). [How] Remove MPC flow rate control logic and programming for DCN30 and above. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: George Shen <george.shen@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-27drm/amd/display: fix a dereference of a NULL pointerWenjing Liu1-2/+4
[why&how] In some platform out_transfer_func may not be popualted. We need to check for null before dereferencing it. Fixes: d2dea1f14038 ("drm/amd/display: Generalize new minimal transition path") Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Wenjing Liu <wenjing.liu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-27drm/amd/display: Send DTBCLK disable message on first commitTaimur Hassan1-0/+5
[Why] Previous patch to allow DTBCLK disable didn't address boot case. Driver thinks DTBCLK is disabled by default, so we don't send disable message to PMFW. DTBCLK is then enabled at idle desktop on boot, burning power. [How] Set dtbclk_en to true on boot so that disable message is sent during first commit. Fixes: 27750e176a4f ("drm/amd/display: Allow DTBCLK disable for DCN35") Reviewed-by: Charlene Liu <charlene.liu@amd.com> Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Taimur Hassan <syed.hassan@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-27drm/amd/display: Update dcn351 to latest dcn35 configSung Joon Kim3-7/+15
[why & how] There were some fixes in dcn35 that need to be ported over to dcn351 to prevent any regression. Signed-off-by: Sung Joon Kim <sungkim@amd.com> Reviewed-by: Liu, Xi (Alex) <xiliu102@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-27drm/amd/display: fix IPX enablementHamza Mahfooz2-4/+6
We need to re-enable idle power optimizations after entering PSR. Since, we get kicked out of idle power optimizations before entering PSR (entering PSR requires us to write to DCN registers, which isn't allowed while we are in IPS). Fixes: a9b1a4f684b3 ("drm/amd/display: Add more checks for exiting idle in DC") Tested-by: Mark Broadworth <mark.broadworth@amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-27drm/amd: Flush GFXOFF requests in prepare stageMario Limonciello1-0/+2
If the system hasn't entered GFXOFF when suspend starts it can cause hangs accessing GC and RLC during the suspend stage. Cc: <stable@vger.kernel.org> # 6.1.y: 5095d5418193 ("drm/amd: Evict resources during PM ops prepare() callback") Cc: <stable@vger.kernel.org> # 6.1.y: cb11ca3233aa ("drm/amd: Add concept of running prepare_suspend() sequence for IP blocks") Cc: <stable@vger.kernel.org> # 6.1.y: 2ceec37b0e3d ("drm/amd: Add missing kernel doc for prepare_suspend()") Cc: <stable@vger.kernel.org> # 6.1.y: 3a9626c816db ("drm/amd: Stop evicting resources on APUs in suspend") Cc: <stable@vger.kernel.org> # 6.6.y: 5095d5418193 ("drm/amd: Evict resources during PM ops prepare() callback") Cc: <stable@vger.kernel.org> # 6.6.y: cb11ca3233aa ("drm/amd: Add concept of running prepare_suspend() sequence for IP blocks") Cc: <stable@vger.kernel.org> # 6.6.y: 2ceec37b0e3d ("drm/amd: Add missing kernel doc for prepare_suspend()") Cc: <stable@vger.kernel.org> # 6.6.y: 3a9626c816db ("drm/amd: Stop evicting resources on APUs in suspend") Cc: <stable@vger.kernel.org> # 6.1+ Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3132 Fixes: ab4750332dbe ("drm/amdgpu/sdma5.2: add begin/end_use ring callbacks") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-27drm/amdkfd: range check cp bad op exception interruptsJonathan Kim3-3/+6
Due to a CP interrupt bug, bad packet garbage exception codes are raised. Do a range check so that the debugger and runtime do not receive garbage codes. Update the user api to guard exception code type checking as well. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Tested-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-27Revert "drm/amd/display: Fix sending VSC (+ colorimetry) packets for DP/eDP ↵Harry Wentland2-13/+8
displays without PSR" This causes flicker on a bunch of eDP panels. The info_packet code also caused regressions on other OSes that we haven't' seen on Linux yet, but that is likely due to the fact that we haven't had a chance to test those environments on Linux. We'll need to revisit this. This reverts commit 202260f64519e591b5cd99626e441b6559f571a3. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3207 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3151 Signed-off-by: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2024-03-27drm/amdkfd: fix TLB flush after unmap for GFX9.4.2Eric Huang1-1/+1
TLB flush after unmap accidentially was removed on gfx9.4.2. It is to add it back. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2024-03-27drm/amdgpu/vpe: power on vpe when hw_initPeyton Lee1-0/+6
To fix mode2 reset failure. Should power on VPE when hw_init. Signed-off-by: Peyton Lee <peytolee@amd.com> Reviewed-by: Lang Yu <lang.yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-27drm/amd/display: increase bb clock for DCN351Xi Liu1-14/+76
[Why and how] Bounding box clocks for DCN351 should be increased as per request Reviewed-by: Swapnil Patel <swapnil.patel@amd.com> Acked-by: Wayne Lin <wayne.lin@amd.com> Signed-off-by: Xi Liu <xi.liu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-27drm/amd/display: Prevent crash when disable streamChris Park1-1/+2
[Why] Disabling stream encoder invokes a function that no longer exists. [How] Check if the function declaration is NULL in disable stream encoder. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Charlene Liu <charlene.liu@amd.com> Acked-by: Wayne Lin <wayne.lin@amd.com> Signed-off-by: Chris Park <chris.park@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-27drm/amd/display: Increase Z8 watermark times.Natanel Roizenman2-4/+4
Increase Z8 watermark times from 210->250us and 320->350us. Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Acked-by: Wayne Lin <wayne.lin@amd.com> Signed-off-by: Natanel Roizenman <natanel.roizenman@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-27drm/amdkfd: Check cgroup when returning DMABuf infoMukul Joshi1-2/+2
Check cgroup permissions when returning DMA-buf info and based on cgroup info return the GPU id of the GPU that have access to the BO. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-27drm/amd/swsmu: add smu 14.0.1 vcn and jpeg msglima10024-27/+82
add new vcn and jpeg msg v2: squash in updates (Alex) v3: rework code for better compat with other smu14.x variants (Alex) Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: lima1002 <li.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-26drm/rockchip: vop2: Remove AR30 and AB30 format supportAndy Yan1-2/+0
The Alpha blending for 30 bit RGB/BGR are not functioning properly for rk3568/rk3588, so remove it from the format list. Fixes: bfd8a5c228fa ("drm/rockchip: vop2: Add more supported 10bit formats") Signed-off-by: Andy Yan <andy.yan@rock-chips.com> Acked-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Link: https://patchwork.freedesktop.org/patch/msgid/20240304100952.3592984-1-andyshrk@163.com
2024-03-25Merge drm/drm-fixes into drm-misc-fixesThomas Zimmermann1083-14553/+186042
Backmerging to get drm-misc-fixes to the state of v6.9-rc1. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2024-03-25drm/xe: Fix END redefinitionLucas De Marchi1-11/+9
mips declares an END macro in its headers so it can't be used without namespace in a driver like xe. Instead of coming up with a longer name, just remove the macro and replace its use with 0 since it's still clear what that means: set_offsets() was already using that implicitly when checking the data variable. Reported-by: Guenter Roeck <linux@roeck-us.net> Closes: http://kisskb.ellerman.id.au/kisskb/buildresult/15143996/ Tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240322145037.196548-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 35b22649eb4155ca6bcffcb2c6e2a1d311aaaf72) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-03-25drm/xe/query: fix gt_id bounds checkMatthew Auld1-1/+1
The user provided gt_id should always be less than the XE_MAX_GT_PER_TILE. Fixes: 7793d00d1bf5 ("drm/xe: Correlate engine and cpu timestamps with better accuracy") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Nirmoy Das <nirmoy.das@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Acked-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240321110629.334701-2-matthew.auld@intel.com (cherry picked from commit 4b275f502a0d3668195762fb55fa00e659ad1b0b) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-03-25drm/xe/device: fix XE_MAX_TILES_PER_DEVICE checkMatthew Auld1-1/+1
Here XE_MAX_TILES_PER_DEVICE is the gt array size, therefore the gt index should always be less than. v2 (Lucas): - Add fixes tag. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Acked-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240318180532.57522-6-matthew.auld@intel.com (cherry picked from commit a96cd71ec7be0790f9fc4039ad21be8d214b03a4) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>