summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/i915/i915_gem_request.c
AgeCommit message (Collapse)AuthorFilesLines
2017-02-16drm/i915: Check for timeout completion when waiting for the rq to submittedChris Wilson1-1/+6
We first wait for a request to be submitted to hw and assigned a seqno, before we can wait for the hw to signal completion (otherwise we don't know the hw id we need to wait upon). Whilst waiting for the request to be submitted, we may exceed the user's timeout and need to propagate the error back. v2: Make ETIME into an error from wait_for_execute for consistent exit handling. Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: 4680816be336 ("drm/i915: Wait first for submission, before waiting for request completion") Testcase: igt/gem_wait/basic-await Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+ Cc: stable@vger.kernel.org Link: http://patchwork.freedesktop.org/patch/msgid/20170208181238.7232-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> (cherry picked from commit 969bb72cbfd906d347cf76dc9b8c8dbaf83ba27a) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-01-16drm/i915: Construct a request even if the GPU is currently hungChris Wilson1-25/+3
As we now have the ability to directly reset the GPU from the waiter (and so do not need to drop the lock in order to let the reset proceed) and also do not lose requests over a reset, we can now simply queue the request to occur after the reset rather than roundtripping to userspace (or worse failing with EIO). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170114162334.10271-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-01-11drm/i915: Add a sanity check that no request is submitted in the middleChris Wilson1-0/+7
It is an error to start a new request on the same timeline (ringbuffer) as the current one before the current is submitted. If there are two requests emitting to the ringbuffer at the same time, the operation is undefined. We can catch this by checking for the timeline having a later seqno than ours when we come to submit our request. Currently we have this check at the end of __i915_add_request, but having an early check as well isolates a failure in the caller versus a failure in sealing the request (i.e. from inside __i915_add_request itself). For example, CI is currently tripping over this late assertion on ctg/ilk: [ 100.329399] [IGT] gem_cs_tlb: starting subtest basic-default [ 100.336333] ------------[ cut here ]------------ [ 100.336341] kernel BUG at drivers/gpu/drm/i915/i915_gem_request.c:908! [ 100.336347] invalid opcode: 0000 [#1] PREEMPT SMP [ 100.336351] Modules linked in: snd_hda_intel i915 snd_hda_codec_generic snd_hda_codec snd_hwdep snd_hda_core snd_pcm coretemp mei_me lpc_ich mei e1000e ptp pps_core [last unloaded: i915] [ 100.336373] CPU: 0 PID: 6308 Comm: gem_cs_tlb Tainted: G U 4.10.0-rc3-CI-CI_DRM_2045+ #1 [ 100.336380] Hardware name: LENOVO 7465CTO/7465CTO, BIOS 6DET44WW (2.08 ) 04/22/2009 [ 100.336386] task: ffff88012b738040 task.stack: ffffc90000560000 [ 100.336441] RIP: 0010:__i915_add_request+0x4aa/0x510 [i915] [ 100.336445] RSP: 0018:ffffc90000563ac0 EFLAGS: 00010212 [ 100.336451] RAX: 0000000000005d52 RBX: ffff880133bb84c0 RCX: 0000000000000001 [ 100.336456] RDX: 0000000080000001 RSI: ffff88012b738860 RDI: 00000000ffffffff [ 100.336461] RBP: ffffc90000563b00 R08: ffff880133bb8780 R09: 0000000000000000 [ 100.336466] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88012f53d950 [ 100.336472] R13: ffff88012a2b0af8 R14: ffff88012a5b0008 R15: ffff88012f53d960 [ 100.336477] FS: 00007f0d19da38c0(0000) GS:ffff88013bc00000(0000) knlGS:0000000000000000 [ 100.336483] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 100.336488] CR2: 00007f0d17706000 CR3: 000000012aa3e000 CR4: 00000000000406f0 [ 100.336496] Call Trace: [ 100.336527] i915_gem_switch_to_kernel_context+0x131/0x1b0 [i915] [ 100.336559] i915_gem_evict_vm+0x202/0x2b0 [i915] [ 100.336590] i915_gem_execbuffer_reserve.isra.9+0x3ae/0x440 [i915] [ 100.336623] i915_gem_do_execbuffer.isra.15+0x6d9/0x1b20 [i915] [ 100.336656] i915_gem_execbuffer2+0xc0/0x250 [i915] [ 100.336666] drm_ioctl+0x200/0x450 [ 100.336697] ? i915_gem_execbuffer+0x330/0x330 [i915] [ 100.336708] do_vfs_ioctl+0x90/0x6e0 [ 100.336716] ? up_read+0x1a/0x40 [ 100.336723] ? trace_hardirqs_on_caller+0x122/0x1b0 [ 100.336730] SyS_ioctl+0x3c/0x70 [ 100.336738] entry_SYSCALL_64_fastpath+0x1c/0xb1 [ 100.336745] RIP: 0033:0x7f0d187cb357 [ 100.336750] RSP: 002b:00007ffe0b2f7c28 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 100.336761] RAX: ffffffffffffffda RBX: 00007ffe0b2f7d60 RCX: 00007f0d187cb357 [ 100.336768] RDX: 00007ffe0b2f7d00 RSI: 0000000040406469 RDI: 0000000000000003 [ 100.336775] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000022 [ 100.336782] R10: 0000000000000007 R11: 0000000000000246 R12: 0000000000000002 [ 100.336789] R13: 0000000000419101 R14: 00007ffe0b2f7d60 R15: 00007ffe0b2f7d50 [ 100.336797] Code: 5f 74 1e e9 d4 fb ff ff e8 bc 1e 9c e0 e9 ae fb ff ff 4c 89 e7 e8 77 22 fd ff e9 88 fd ff ff 0f 0b e8 a3 1e 9c e0 e9 b1 fb ff ff <0f> 0b 0f 0b e8 fd af ab e0 85 c0 75 c2 48 c7 c2 80 2c 71 a0 be [ 100.336877] RIP: __i915_add_request+0x4aa/0x510 [i915] RSP: ffffc90000563ac0 [ 100.336886] ---[ end trace 22b36545479e5eb7 ]--- Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170111140858.1922-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-01-04Merge tag 'v4.10-rc2' into drm-intel-next-queuedDaniel Vetter1-1/+1
Backmerge Linux 4.10-rc2 to resync with our -fixes cherry-picks. I've done the backmerge directly because Dave is on vacation. Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2016-12-18drm/i915: Swap if(enable_execlists) in i915_gem_request_alloc for a vfuncChris Wilson1-4/+1
A fairly trivial move of a matching pair of routines (for preparing a request for construction) onto an engine vfunc. The ulterior motive is to be able to create a mock request implementation. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-7-chris@chris-wilson.co.uk
2016-12-18drm/i915: Unify active context tracking between legacy/execlists/gucChris Wilson1-13/+25
The requests conversion introduced a nasty bug where we could generate a new request in the middle of constructing a request if we needed to idle the system in order to evict space for a context. The request to idle would be executed (and waited upon) before the current one, creating a minor havoc in the seqno accounting, as we will consider the current request to already be completed (prior to deferred seqno assignment) but ring->last_retired_head would have been updated and still could allow us to overwrite the current request before execution. We also employed two different mechanisms to track the active context until it was switched out. The legacy method allowed for waiting upon an active context (it could forcibly evict any vma, including context's), but the execlists method took a step backwards by pinning the vma for the entire active lifespan of the context (the only way to evict was to idle the entire GPU, not individual contexts). However, to circumvent the tricky issue of locking (i.e. we cannot take struct_mutex at the time of i915_gem_request_submit(), where we would want to move the previous context onto the active tracker and unpin it), we take the execlists approach and keep the contexts pinned until retirement. The benefit of the execlists approach, more important for execlists than legacy, was the reduction in work in pinning the context for each request - as the context was kept pinned until idle, it could short circuit the pinning for all active contexts. We introduce new engine vfuncs to pin and unpin the context respectively. The context is pinned at the start of the request, and only unpinned when the following request is retired (this ensures that the context is idle and coherent in main memory before we unpin it). We move the engine->last_context tracking into the retirement itself (rather than during request submission) in order to allow the submission to be reordered or unwound without undue difficultly. And finally an ulterior motive for unifying context handling was to prepare for mock requests. v2: Rename to last_retired_context, split out legacy_context tracking for MI_SET_CONTEXT. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-3-chris@chris-wilson.co.uk
2016-12-13Merge tag 'drm-for-v4.10' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds1-265/+499
Pull drm updates from Dave Airlie: "This is the main pull request for drm for 4.10 kernel. New drivers: - ZTE VOU display driver (zxdrm) - Amlogic Meson Graphic Controller GXBB/GXL/GXM SoCs (meson) - MXSFB support (mxsfb) Core: - Format handling has been reworked - Better atomic state debugging - drm_mm leak debugging - Atomic explicit fencing support - fbdev helper ops - Documentation updates - MST fbcon fixes Bridge: - Silicon Image SiI8620 driver Panel: - Add support for new simple panels i915: - GVT Device model - Better HDMI2.0 support on skylake - More watermark fixes - GPU idling rework for suspend/resume - DP Audio workarounds - Scheduler prep-work - Opregion CADL handling - GPU scheduler and priority boosting amdgfx/radeon: - Support for virtual devices - New VM manager for non-contig VRAM buffers - UVD powergating - SI register header cleanup - Cursor fixes - Powermanagement fixes nouveau: - Powermangement reworks for better voltage/clock changes - Atomic modesetting support - Displayport Multistream (MST) support. - GP102/104 hang and cursor fixes - GP106 support hisilicon: - hibmc support (BMC chip for aarch64 servers) armada: - add tracing support for overlay change - refactor plane support - de-midlayer the driver omapdrm: - Timing code cleanups rcar-du: - R8A7792/R8A7796 support - Misc fixes. sunxi: - A31 SoC display engine support imx-drm: - YUV format support - Cleanup plane atomic update mali-dp: - Misc fixes dw-hdmi: - Add support for HDMI i2c master controller tegra: - IOMMU support fixes - Error handling fixes tda998x: - Fix connector registration - Improved robustness - Fix infoframe/audio compliance virtio: - fix busid issues - allocate more vbufs qxl: - misc fixes and cleanups. vc4: - Fragment shader threading - ETC1 support - VEC (tv-out) support msm: - A5XX GPU support - Lots of atomic changes tilcdc: - Misc fixes and cleanups. etnaviv: - Fix dma-buf export path - DRAW_INSTANCED support - fix driver on i.MX6SX exynos: - HDMI refactoring fsl-dcu: - fbdev changes" * tag 'drm-for-v4.10' of git://people.freedesktop.org/~airlied/linux: (1343 commits) drm/nouveau/kms/nv50: fix atomic regression on original G80 drm/nouveau/bl: Do not register interface if Apple GMUX detected drm/nouveau/bl: Assign different names to interfaces drm/nouveau/bios/dp: fix handling of LevelEntryTableIndex on DP table 4.2 drm/nouveau/ltc: protect clearing of comptags with mutex drm/nouveau/gr/gf100-: handle GPC/TPC/MPC trap drm/nouveau/core: recognise GP106 chipset drm/nouveau/ttm: wait for bo fence to signal before unmapping vmas drm/nouveau/gr/gf100-: FECS intr handling is not relevant on proprietary ucode drm/nouveau/gr/gf100-: properly ack all FECS error interrupts drm/nouveau/fifo/gf100-: recover from host mmu faults drm: Add fake controlD* symlinks for backwards compat drm/vc4: Don't use drm_put_dev drm/vc4: Document VEC DT binding drm/vc4: Add support for the VEC (Video Encoder) IP drm: Add TV connector states to drm_connector_state drm: Turn DRM_MODE_SUBCONNECTOR_xx definitions into an enum drm/vc4: Fix ->clock_select setting for the VEC encoder drm/amdgpu/dce6: Set MASTER_UPDATE_MODE to 0 in resume_mc_access as well drm/amdgpu: use pin rather than pin_restricted in a few cases ...
2016-12-05drm/i915: Hold a reference on the request for its fence chainChris Wilson1-7/+27
Currently, we have an active reference for the request until it is retired. Though it cannot be retired before it has been executed by hardware, the request may be completed before we have finished processing the execute fence, i.e. we may continue to process that fence as we free the request. Fixes: 5590af3e115a ("drm/i915: Drive request submission through fence callbacks") Fixes: 23902e49c999 ("drm/i915: Split request submit/execute phase into two") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161125131718.20978-3-chris@chris-wilson.co.uk (cherry picked from commit 48bc2a4a427ad81578f887d71d45794619a77211) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-11-25drm/i915: Integrate i915_sw_fence with debugobjectsChris Wilson1-0/+9
Add the tracking required to enable debugobjects for fences to improve error detection in BAT. The debugobject interface lets us track the lifetime and phases of the fences even while being embedded into larger structs, i.e. to check they are not used after they have been released. v2: Don't populate the stubs, debugobjects checks for a NULL pointer and treats it equivalently. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161125131718.20978-4-chris@chris-wilson.co.uk
2016-11-25drm/i915: Hold a reference on the request for its fence chainChris Wilson1-7/+27
Currently, we have an active reference for the request until it is retired. Though it cannot be retired before it has been executed by hardware, the request may be completed before we have finished processing the execute fence, i.e. we may continue to process that fence as we free the request. Fixes: 5590af3e115a ("drm/i915: Drive request submission through fence callbacks") Fixes: 23902e49c999 ("drm/i915: Split request submit/execute phase into two") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161125131718.20978-3-chris@chris-wilson.co.uk
2016-11-25drm/i915: Assert no external observers when unwind a failed request allocChris Wilson1-0/+5
Before we return the request back to the kmem_cache after a failed i915_gem_request_alloc(), we should assert that it has not been added to any global state tracking. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161125131718.20978-2-chris@chris-wilson.co.uk
2016-11-25drm/i915: Add is-completed assert to request retire entrypointChris Wilson1-0/+2
While we will check that the request is completed prior to being retired, by placing an assert that the request is complete at the entrypoint of the function we can more clearly document the function's preconditions. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161125131718.20978-1-chris@chris-wilson.co.uk
2016-11-25drm/i915: Rename i915_gem_timeline.next_seqno to .seqnoJoonas Lahtinen1-7/+7
Rename i915_gem_timeline member 'next_seqno' into 'seqno' as the variable is pre-increment. We've already had two bugs due to the confusing name, second is fixed as follow-up patch. Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20161124144750.2610-1-chris@chris-wilson.co.uk
2016-11-21drm/i915: Wipe hang stats as an embedded structMika Kuoppala1-2/+2
Bannable property, banned status, guilty and active counts are properties of i915_gem_context. Make them so. v2: rebase Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1479309634-28574-1-git-send-email-mika.kuoppala@intel.com
2016-11-21drm/i915: Use request retirement as context progressMika Kuoppala1-0/+4
As hangcheck score was removed, the active decay of score was removed also. This removed feature for hangcheck to detect if the gpu client was accidentally or maliciously causing intermittent hangs. Reinstate the scoring as a per context property, so that if one context starts to act unfavourably, ban it. v2: ban_period_secs as a gate to score check (Chris) v3: decay in proper spot. scores as tunables (Chris) Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
2016-11-18drm/i915: Check that each request phase is completed before retiringChris Wilson1-0/+2
Trying to chase an impossible bug (ivb): [ 207.765411] [drm:i915_reset_and_wakeup [i915]] resetting chip [ 207.765734] [drm:i915_gem_reset [i915]] resetting render ring to restart from tail of request 0x4ee834 [ 207.765791] [drm:intel_print_rc6_info [i915]] Enabling RC6 states: RC6 on RC6p on RC6pp off [ 207.767213] [drm:intel_guc_setup [i915]] GuC fw status: path (null), fetch NONE, load NONE [ 207.767515] kernel BUG at drivers/gpu/drm/i915/i915_gem_request.c:203! [ 207.767551] invalid opcode: 0000 [#1] PREEMPT SMP [ 207.767576] Modules linked in: snd_hda_intel i915 cdc_ncm usbnet mii x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel lpc_ich snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec snd_hwdep snd_hda_core mei_me mei snd_pcm sdhci_pci sdhci mmc_core e1000e ptp pps_core [last unloaded: i915] [ 207.767808] CPU: 3 PID: 8855 Comm: gem_ringfill Tainted: G U 4.9.0-rc5-CI-Patchwork_3052+ #1 [ 207.767854] Hardware name: LENOVO 2356GCG/2356GCG, BIOS G7ET31WW (1.13 ) 07/02/2012 [ 207.767894] task: ffff88012c82a740 task.stack: ffffc9000383c000 [ 207.767927] RIP: 0010:[<ffffffffa00a0a3a>] [<ffffffffa00a0a3a>] i915_gem_request_retire+0x2a/0x4b0 [i915] [ 207.767999] RSP: 0018:ffffc9000383fb20 EFLAGS: 00010293 [ 207.768027] RAX: 00000000004ee83c RBX: ffff880135dcb480 RCX: 00000000004ee83a [ 207.768062] RDX: ffff88012fea42a8 RSI: 0000000000000001 RDI: ffff88012c82af68 [ 207.768095] RBP: ffffc9000383fb48 R08: 0000000000000000 R09: 0000000000000000 [ 207.768129] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880135dcb480 [ 207.768163] R13: ffff88012fea42a8 R14: 0000000000000000 R15: 00000000000001d8 [ 207.768200] FS: 00007f955f658740(0000) GS:ffff88013e2c0000(0000) knlGS:0000000000000000 [ 207.768239] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 207.768258] CR2: 0000555899725930 CR3: 00000001316f6000 CR4: 00000000001406e0 [ 207.768286] Stack: [ 207.768299] ffff880135dcb480 ffff880135dcbe00 ffff88012fea42a8 0000000000000000 [ 207.768350] 00000000000001d8 ffffc9000383fb70 ffffffffa00a1339 0000000000000000 [ 207.768402] ffff88012f296c88 00000000000003f0 ffffc9000383fbb0 ffffffffa00b582d [ 207.768453] Call Trace: [ 207.768493] [<ffffffffa00a1339>] i915_gem_request_retire_upto+0x49/0x90 [i915] [ 207.768553] [<ffffffffa00b582d>] intel_ring_begin+0x15d/0x2d0 [i915] [ 207.768608] [<ffffffffa00b59cb>] intel_ring_alloc_request_extras+0x2b/0x40 [i915] [ 207.768667] [<ffffffffa00a2fd9>] i915_gem_request_alloc+0x359/0x440 [i915] [ 207.768723] [<ffffffffa008bd03>] i915_gem_do_execbuffer.isra.15+0x783/0x1a10 [i915] [ 207.768766] [<ffffffff811a6a2e>] ? __might_fault+0x3e/0x90 [ 207.768816] [<ffffffffa008d380>] i915_gem_execbuffer2+0xc0/0x250 [i915] [ 207.768854] [<ffffffff815532a6>] drm_ioctl+0x1f6/0x480 [ 207.768900] [<ffffffffa008d2c0>] ? i915_gem_execbuffer+0x330/0x330 [i915] [ 207.768939] [<ffffffff81202f6e>] do_vfs_ioctl+0x8e/0x690 [ 207.768972] [<ffffffff818193ac>] ? retint_kernel+0x2d/0x2d [ 207.769004] [<ffffffff810d6ef2>] ? trace_hardirqs_on_caller+0x122/0x1b0 [ 207.769039] [<ffffffff812035ac>] SyS_ioctl+0x3c/0x70 [ 207.769068] [<ffffffff818189ae>] entry_SYSCALL_64_fastpath+0x1c/0xb1 [ 207.769103] Code: 90 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 8b 35 fa 7b e1 e1 85 f6 0f 85 55 03 00 00 41 8b 84 24 80 02 00 00 85 c0 75 02 <0f> 0b 49 8b 94 24 a8 00 00 00 48 8b 8a e0 01 00 00 8b 89 c0 00 [ 207.769400] RIP [<ffffffffa00a0a3a>] i915_gem_request_retire+0x2a/0x4b0 [i915] [ 207.769463] RSP <ffffc9000383fb20> Let's add a couple more BUG_ONs before this to ascertain that the request did make it to hardware. The impossible part of this stacktrace is that request must have been considered completed by the i915_request_wait() before we tried to retire it. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161118143412.26508-1-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.auld@intel.com>
2016-11-18drm/i915: Be more careful to drop the GT wakerefChris Wilson1-8/+9
Since we can retire requests from multiple paths, we cannot assume that i915_gem_retire_requests() is the sole path on which we can transition to gt.active_requests == 0. A consequence of this is that we would skip the function if we had already retired all the requests and not scheduled the idle worker. This is fallout from changing the routine from considering active_engines (for which it was the only consumer) to active_requests. v2: Move kicking the idle working to i915_gem_request_retire() otherwise we could postpone the idle callback everytime we called retire_requests even though we did no work. v3: We only need to move the idle work kicking! v4: Drop the BUG_ON(!awake) as we may be called from the shrinker in the middle of constructing a request before we have marked the device awake. v5: Add a BUG_ON() for active_requests underflow upon retirement (Joonas) Fixes: 28176ef4cfa5 ("drm/i915: Reserve space in the global seqno during request allocation") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161115164620.17185-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-11-16locking/core: Remove cpu_relax_lowlatency() usersChristian Borntraeger1-1/+1
With the s390 special case of a yielding cpu_relax() implementation gone, we can now remove all users of cpu_relax_lowlatency() and replace them with cpu_relax(). Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Noam Camus <noamc@ezchip.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1477386195-32736-5-git-send-email-borntraeger@de.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-15drm/i915: Store the execution priority on the contextChris Wilson1-1/+1
In order to support userspace defining different levels of importance to different contexts, and in particular the preferred order of execution, store a priority value on each context. By default, the kernel's context, which is used for idling and other background tasks, is given minimum priority (i.e. all user contexts will execute before the kernel). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-9-chris@chris-wilson.co.uk
2016-11-15drm/i915/scheduler: Execute requests in order of prioritiesChris Wilson1-0/+5
Track the priority of each request and use it to determine the order in which we submit requests to the hardware via execlists. The priority of the request is determined by the user (eventually via the context) but may be overridden at any time by the driver. When we set the priority of the request, we bump the priority of all of its dependencies to match - so that a high priority drawing operation is not stuck behind a background task. When the request is ready to execute (i.e. we have signaled the submit fence following completion of all its dependencies, including third party fences), we put the request into a priority sorted rbtree to be submitted to the hardware. If the request is higher priority than all pending requests, it will be submitted on the next context-switch interrupt as soon as the hardware has completed the current request. We do not currently preempt any current execution to immediately run a very high priority request, at least not yet. One more limitation, is that this is first implementation is for execlists only so currently limited to gen8/gen9. v2: Replace recursive priority inheritance bumping with an iterative depth-first search list. v3: list_next_entry() for walking lists v4: Explain how the dfs solves the recursion problem with PI. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-8-chris@chris-wilson.co.uk
2016-11-15drm/i915/scheduler: Record all dependencies upon request constructionChris Wilson1-1/+90
The scheduler needs to know the dependencies of each request for the lifetime of the request, as it may choose to reschedule the requests at any time and must ensure the dependency tree is not broken. This is in additional to using the fence to only allow execution after all dependencies have been completed. One option was to extend the fence to support the bidirectional dependency tracking required by the scheduler. However the mismatch in lifetimes between the submit fence and the request essentially meant that we had to build a completely separate struct (and we could not simply reuse the existing waitqueue in the fence for one half of the dependency tracking). The extra dependency tracking simply did not mesh well with the fence, and keeping it separate both keeps the fence implementation simpler and allows us to extend the dependency tracking into a priority tree (whilst maintaining support for reordering the tree). To avoid the additional allocations and list manipulations, the use of the priotree is disabled when there are no schedulers to use it. v2: Create a dedicated slab for i915_dependency. Rename the lists. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-7-chris@chris-wilson.co.uk
2016-11-15drm/i915/scheduler: Signal the arrival of a new requestChris Wilson1-0/+13
The start of the scheduler, add a hook into request submission for the scheduler to see the arrival of new requests and prepare its runqueues. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-6-chris@chris-wilson.co.uk
2016-11-15drm/i915: Defer transfer onto execution timeline to actual hw submissionChris Wilson1-13/+25
Defer the transfer from the client's timeline onto the execution timeline from the point of readiness to the point of actual submission. For example, in execlists, a request is finally submitted to hardware when the hardware is ready, and only put onto the hardware queue when the request is ready. By deferring the transfer, we ensure that the timeline is maintained in retirement order if we decide to queue the requests onto the hardware in a different order than fifo. v2: Rebased onto distinct global/user timeline lock classes. v3: Play with the position of the spin_lock(). v4: Nesting finally resolved with distinct sw_fence lock classes. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-4-chris@chris-wilson.co.uk
2016-11-15drm/i915: Split request submit/execute phase into twoChris Wilson1-9/+24
In order to support deferred scheduling, we need to differentiate between when the request is ready to run (i.e. the submit fence is signaled) and when the request is actually run (a new execute fence). This is typically split between the request itself wanting to wait upon others (for which we use the submit fence) and the CPU wanting to wait upon the request, for which we use the execute fence to be sure the hardware is ready to signal completion. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-3-chris@chris-wilson.co.uk
2016-11-15drm/i915: Create distinct lockclasses for execution vs user timelinesChris Wilson1-1/+1
In order to simplify the lockdep annotation, as they become more complex in the future with deferred execution and multiple paths through the same functions, create a separate lockclass for the user timeline and the hardware execution timeline. We should only ever be locking the user timeline and the execution timeline in parallel so we only need to create two lock classes, rather than a separate class for every timeline. v2: Rename the lock classes to be more consistent with other lockdep. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-2-chris@chris-wilson.co.uk
2016-11-09drm/i915: Spin until breadcrumb threads are completeChris Wilson1-3/+2
When we need to reset the global seqno on wraparound, we have to wait until the current rbtrees are drained (or otherwise the next waiter will be out of sequence). The current mechanism to kick and spin until complete, may exit too early as it would break if the target thread was currently running. Instead, we must wake up the threads, but keep spinning until the trees have been deleted. In order to appease Tvrtko, busy spin rather than yield(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161108143719.32215-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2016-11-07drm/i915: Avoid early GPU idling due to already pending idle workImre Deak1-3/+3
Atm, in case an idle work handler is already pending but haven't yet started to run, retiring a new request will not extend the active period as required, rather simply leaves the pending idle work to be scheduled at the original expiration time. This may lead to idling the GPU too early. Fix this by using the delayed-work scheduler alternative which makes sure the handler's expiration time is extended in this case. Cc: Chris Wilson <chris@chris-wilson.co.uk> Requested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1478510405-11799-1-git-send-email-imre.deak@intel.com
2016-10-28drm/i915: Enable multiple timelinesChris Wilson1-28/+33
With the infrastructure converted over to tracking multiple timelines in the GEM API whilst preserving the efficiency of using a single execution timeline internally, we can now assign a separate timeline to every context with full-ppgtt. v2: Add a comment to indicate the xfer between timelines upon submission. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-35-chris@chris-wilson.co.uk
2016-10-28drm/i915: Defer setting of global seqno on request to submissionChris Wilson1-7/+23
Defer the assignment of the global seqno on a request to its submission. In the next patch, we will only allocate the global seqno at that time, here we are just enabling the wait-for-submission before wait-for-seqno paths. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-34-chris@chris-wilson.co.uk
2016-10-28drm/i915: Reserve space in the global seqno during request allocationChris Wilson1-41/+45
A restriction on our global seqno is that they cannot wrap, and that we cannot use the value 0. This allows us to detect when a request has not yet been submitted, its global seqno is still 0, and ensures that hardware semaphores are monotonic as required by older hardware. To meet these restrictions when we defer the assignment of the global seqno, we must check that we have an available slot in the global seqno space during request construction. If that test fails, we wait for all requests to be completed and reset the hardware back to 0. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-33-chris@chris-wilson.co.uk
2016-10-28drm/i915: Move the global sync optimisation to the timelineChris Wilson1-12/+17
Currently we try to reduce the number of synchronisations (now the number of requests we need to wait upon) by noting that if we have earlier waited upon a request, all subsequent requests in the timeline will be after the wait. This only applies to requests in this timeline, as other timelines will not be ordered by that waiter. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-30-chris@chris-wilson.co.uk
2016-10-28drm/i915: Defer breadcrumb emissionChris Wilson1-28/+13
Move the actual emission of the breadcrumb for closing the request from i915_add_request() to the submit callback. (It can be moved later when required.) This allows us to defer the allocation of the global_seqno from request construction to actual submission, allowing us to emit the requests out of order (wrt to the order of their construction, they still will only be executed one all of their dependencies are resolved including that all earlier requests on their timeline have been submitted.) We have to specialise how we then emit the request in order to write into the preallocated space, rather than at the tail of the ringbuffer (which will have been advanced by the addition of new requests). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-29-chris@chris-wilson.co.uk
2016-10-28drm/i915: Record space required for breadcrumb emissionChris Wilson1-0/+1
In the next patch, we will use deferred breadcrumb emission. That requires reserving sufficient space in the ringbuffer to emit the breadcrumb, which first requires us to know how large the breadcrumb is. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-28-chris@chris-wilson.co.uk
2016-10-28drm/i915: Rename ->emit_request to ->emit_breadcrumbChris Wilson1-2/+2
Now that the emission of the request tail and its submission to hardware are two separate steps, engine->emit_request() is confusing. engine->emit_request() is called to emit the breadcrumb commands for the request into the ring, name it such (engine->emit_breadcrumb). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-27-chris@chris-wilson.co.uk
2016-10-28drm/i915: Introduce a global_seqno for each requestChris Wilson1-5/+14
Though we will have multiple timelines, we still have a single timeline of execution. This we can use to provide an execution and retirement order of requests. This keeps tracking execution of requests simple, and vital for preserving a single waiter (i.e. so that we can order the waiters so that only the earliest to wakeup need be woken). To accomplish this we distinguish the seqno used to order requests per-context (external) and that used internally for execution. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-26-chris@chris-wilson.co.uk
2016-10-28drm/i915: Wait first for submission, before waiting for request completionChris Wilson1-0/+51
In future patches, we will no longer be able to wait on a static global seqno and instead have to break our wait up into phases. First we wait for the global seqno assignment (upon submission to hardware), and once submitted we wait for the hardware to complete. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-25-chris@chris-wilson.co.uk
2016-10-28drm/i915: Combine seqno + tracking into a global timeline structChris Wilson1-32/+49
Our timelines are more than just a seqno. They also provide an ordered list of requests to be executed. Due to the restriction of handling individual address spaces, we are limited to a timeline per address space but we use a fence context per engine within. Our first step to introducing independent timelines per context (i.e. to allow each context to have a queue of requests to execute that have a defined set of dependencies on other requests) is to provide a timeline abstraction for the global execution queue. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-23-chris@chris-wilson.co.uk
2016-10-28drm/i915: Move GEM activity tracking into a common struct reservation_objectChris Wilson1-19/+29
In preparation to support many distinct timelines, we need to expand the activity tracking on the GEM object to handle more than just a request per engine. We already use the struct reservation_object on the dma-buf to handle many fence contexts, so integrating that into the GEM object itself is the preferred solution. (For example, we can now share the same reservation_object between every consumer/producer using this buffer and skip the manual import/export via dma-buf.) v2: Reimplement busy-ioctl (by walking the reservation object), postpone the ABI change for another day. Similarly use the reservation object to find the last_write request (if active and from i915) for choosing display CS flips. Caveats: * busy-ioctl: busy-ioctl only reports on the native fences, it will not warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is being rendered to by external fences. It also will not report the same busy state as wait-ioctl (or polling on the dma-buf) in the same circumstances. On the plus side, it does retain reporting of which *i915* engines are engaged with this object. * non-blocking atomic modesets take a step backwards as the wait for render completion blocks the ioctl. This is fixed in a subsequent patch to use a fence instead for awaiting on the rendering, see "drm/i915: Restore nonblocking awaits for modesetting" * dynamic array manipulation for shared-fences in reservation is slower than the previous lockless static assignment (e.g. gem_exec_lut_handle runtime on ivb goes from 42s to 66s), mainly due to atomic operations (maintaining the fence refcounts). * loss of object-level retirement callbacks, emulated by VMA retirement tracking. * minor loss of object-level last activity information from debugfs, could be replaced with per-vma information if desired Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk
2016-10-28drm/i915: Markup GEM API with lockdep assertsChris Wilson1-0/+6
Add lockdep_assert_held(struct_mutex) to the API preamble of the internal GEM interfaces. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-9-chris@chris-wilson.co.uk
2016-10-28drm/i915: Rearrange i915_wait_request() accounting with callersChris Wilson1-113/+33
Our low-level wait routine has evolved from our generic wait interface that handled unlocked, RPS boosting, waits with time tracking. If we push our GEM fence tracking to use reservation_objects (required for handling multiple timelines), we lose the ability to pass the required information down to i915_wait_request(). However, if we push the extra functionality from i915_wait_request() to the individual callsites (i915_gem_object_wait_rendering and i915_gem_wait_ioctl) that make use of those extras, we can both simplify our low level wait and prepare for extending the GEM interface for use of reservation_objects. v2: Rewrite i915_wait_request() kerneldocs Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.william.auld@gmail.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-4-chris@chris-wilson.co.uk
2016-10-28drm/i915: Support asynchronous waits on struct fence from i915_gem_requestChris Wilson1-0/+48
We will need to wait on DMA completion (as signaled via struct fence) before executing our i915_gem_request. Therefore we want to expose a method for adding the await on the fence itself to the request. v2: Add a comment detailing a failure to handle a signal-on-any fence-array. v3: Pretend that magic numbers don't exist. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-1-chris@chris-wilson.co.uk
2016-10-25dma-buf: Rename struct fence to dma_fenceChris Wilson1-16/+16
I plan to usurp the short name of struct fence for a core kernel struct, and so I need to rename the specialised fence/timeline for DMA operations to make room. A consensus was reached in https://lists.freedesktop.org/archives/dri-devel/2016-July/113083.html that making clear this fence applies to DMA operations was a good thing. Since then the patch has grown a bit as usage increases, so hopefully it remains a good thing! (v2...: rebase, rerun spatch) v3: Compile on msm, spotted a manual fixup that I broke. v4: Try again for msm, sorry Daniel coccinelle script: @@ @@ - struct fence + struct dma_fence @@ @@ - struct fence_ops + struct dma_fence_ops @@ @@ - struct fence_cb + struct dma_fence_cb @@ @@ - struct fence_array + struct dma_fence_array @@ @@ - enum fence_flag_bits + enum dma_fence_flag_bits @@ @@ ( - fence_init + dma_fence_init | - fence_release + dma_fence_release | - fence_free + dma_fence_free | - fence_get + dma_fence_get | - fence_get_rcu + dma_fence_get_rcu | - fence_put + dma_fence_put | - fence_signal + dma_fence_signal | - fence_signal_locked + dma_fence_signal_locked | - fence_default_wait + dma_fence_default_wait | - fence_add_callback + dma_fence_add_callback | - fence_remove_callback + dma_fence_remove_callback | - fence_enable_sw_signaling + dma_fence_enable_sw_signaling | - fence_is_signaled_locked + dma_fence_is_signaled_locked | - fence_is_signaled + dma_fence_is_signaled | - fence_is_later + dma_fence_is_later | - fence_later + dma_fence_later | - fence_wait_timeout + dma_fence_wait_timeout | - fence_wait_any_timeout + dma_fence_wait_any_timeout | - fence_wait + dma_fence_wait | - fence_context_alloc + dma_fence_context_alloc | - fence_array_create + dma_fence_array_create | - to_fence_array + to_dma_fence_array | - fence_is_array + dma_fence_is_array | - trace_fence_emit + trace_dma_fence_emit | - FENCE_TRACE + DMA_FENCE_TRACE | - FENCE_WARN + DMA_FENCE_WARN | - FENCE_ERR + DMA_FENCE_ERR ) ( ... ) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk> Acked-by: Sumit Semwal <sumit.semwal@linaro.org> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161025120045.28839-1-chris@chris-wilson.co.uk
2016-10-14drm/i915: Allocate intel_engine_cs structure only for the enabled enginesAkash Goel1-2/+3
With the possibility of addition of many more number of rings in future, the drm_i915_private structure could bloat as an array, of type intel_engine_cs, is embedded inside it. struct intel_engine_cs engine[I915_NUM_ENGINES]; Though this is still fine as generally there is only a single instance of drm_i915_private structure used, but not all of the possible rings would be enabled or active on most of the platforms. Some memory can be saved by allocating intel_engine_cs structure only for the enabled/active engines. Currently the engine/ring ID is kept static and dev_priv->engine[] is simply indexed using the enums defined in intel_engine_id. To save memory and continue using the static engine/ring IDs, 'engine' is defined as an array of pointers. struct intel_engine_cs *engine[I915_NUM_ENGINES]; dev_priv->engine[engine_ID] will be NULL for disabled engine instances. There is a text size reduction of 928 bytes, from 1028200 to 1027272, for i915.o file (but for i915.ko file text size remain same as 1193131 bytes). v2: - Remove the engine iterator field added in drm_i915_private structure, instead pass a local iterator variable to the for_each_engine** macros. (Chris) - Do away with intel_engine_initialized() and instead directly use the NULL pointer check on engine pointer. (Chris) v3: - Remove for_each_engine_id() macro, as the updated macro for_each_engine() can be used in place of it. (Chris) - Protect the access to Render engine Fault register with a NULL check, as engine specific init is done later in Driver load sequence. v4: - Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris) - Kill the superfluous init_engine_lists(). v5: - Cleanup the intel_engines_init() & intel_engines_setup(), with respect to allocation of intel_engine_cs structure. (Chris) v6: - Rebase. v7: - Optimize the for_each_engine_masked() macro. (Chris) - Change the type of 'iter' local variable to enum intel_engine_id. (Chris) - Rebase. v8: Rebase. v9: Rebase. v10: - For index calculation use engine ID instead of pointer based arithmetic in intel_engine_sync_index() as engine pointers are not contiguous now (Chris) - For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas) - Use for_each_engine macro for cleanup in intel_engines_init() and remove check for NULL engine pointer in cleanup() routines. (Joonas) v11: Rebase. Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Akash Goel <akash.goel@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
2016-10-10drm/i915: Distinguish last emitted request from last submitted requestChris Wilson1-2/+3
In order not to trigger hangcheck on a idle-but-waiting engine, we need to distinguish between the pending request queue and the actual execution queue. This is done later in "drm/i915: Enable multiple timelines" but for now we need a temporary fix to prevent blaming the wrong engine for a GPU hang. (Note that this causes a temporary subtle change in how we decide when to allow a waitboost to be re-awarded back to the waiter, the temporary effect is that if the wait is upon the most current execution the wait is given for free, instead of checking to see if the client stalled itself. This will be repaired in "drm/i915: Enable multiple timelines".) Fixes: 0a046a0e93d2 ("drm/i915: Nonblocking request submission") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98104 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161007065327.24515-1-chris@chris-wilson.co.uk (cherry picked from commit 8687b3ec852e89630bac650f15136811c7b4c1dc) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-10-07drm/i915: Distinguish last emitted request from last submitted requestChris Wilson1-2/+3
In order not to trigger hangcheck on a idle-but-waiting engine, we need to distinguish between the pending request queue and the actual execution queue. This is done later in "drm/i915: Enable multiple timelines" but for now we need a temporary fix to prevent blaming the wrong engine for a GPU hang. (Note that this causes a temporary subtle change in how we decide when to allow a waitboost to be re-awarded back to the waiter, the temporary effect is that if the wait is upon the most current execution the wait is given for free, instead of checking to see if the client stalled itself. This will be repaired in "drm/i915: Enable multiple timelines".) Fixes: 0a046a0e93d2 ("drm/i915: Nonblocking request submission") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98104 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161007065327.24515-1-chris@chris-wilson.co.uk
2016-09-09drm/i915: Nonblocking request submissionChris Wilson1-6/+15
Now that we have fences in place to drive request submission, we can employ those to queue requests after their dependencies as opposed to stalling in the middle of an execbuf ioctl. (However, we still choose to spin before enabling the IRQ as that is faster - though contentious.) v2: Do the fence ordering first, where we can still fail. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-20-chris@chris-wilson.co.uk
2016-09-09drm/i915: Prepare object synchronisation for asynchronicityChris Wilson1-0/+87
We are about to specialize object synchronisation to enable nonblocking execbuf submission. First we make a copy of the current object synchronisation for execbuffer. The general i915_gem_object_sync() will be removed following the removal of CS flips in the near future. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: John Harrison <john.c.harrison@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-16-chris@chris-wilson.co.uk
2016-09-09drm/i915: Reorder i915_add_request to separate the phases betterChris Wilson1-14/+14
Let's avoid mixing sealing the hardware commands for the request and adding the request to the software tracking. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-15-chris@chris-wilson.co.uk
2016-09-09drm/i915: Drive request submission through fence callbacksChris Wilson1-1/+26
Drive final request submission from a callback from the fence. This way the request is queued until all dependencies are resolved, at which point it is handed to the backend for queueing to hardware. At this point, no dependencies are set on the request, so the callback is immediate. A side-effect of imposing a heavier-irqsafe spinlock for execlist submission is that we lose the softirq enabling after scheduling the execlists tasklet. To compensate, we manually kickstart the softirq by disabling and enabling the bh around the fence signaling. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: John Harrison <john.c.harrison@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-14-chris@chris-wilson.co.uk
2016-09-09drm/i915: Perform a direct reset of the GPU from the waiterChris Wilson1-0/+29
If a waiter is holding the struct_mutex, then the reset worker cannot reset the GPU until the waiter returns. We do not want to return -EAGAIN form i915_wait_request as that breaks delicate operations like i915_vma_unbind() which often cannot be restarted easily, and returning -EIO is just as useless (and has in the past proven dangerous). The remaining WARN_ON(i915_wait_request) serve as a valuable reminder that handling errors from an indefinite wait are tricky. We can keep the current semantic that knowing after a reset is complete, so is the request, by performing the reset ourselves if we hold the mutex. uevent emission is still handled by the reset worker, so it may appear slightly out of order with respect to the actual reset (and concurrent use of the device). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-11-chris@chris-wilson.co.uk