summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/i915/intel_lrc.c
AgeCommit message (Collapse)AuthorFilesLines
2017-02-16drm/i915: Restore context and pd for ringbuffer submission after resetChris Wilson1-1/+15
Following a reset, the context and page directory registers are lost. However, the queue of requests that we resubmit after the reset may depend upon them - the registers are restored from a context image, but that restore may be inhibited and may simply be absent from the request if it was in the middle of a sequence using the same context. If we prime the CCID/PD registers with the first request in the queue (even for the hung request), we prevent invalid memory access for the following requests (and continually hung engines). v2: Magic BIT(8), reserved for future use but still appears unused. v3: Some commentary on handling innocent vs guilty requests v4: Add a wait for PD_BASE fetch. The reload appears to be instant on my Ivybridge, but this bit probably exists for a reason. Fixes: 821ed7df6e2a ("drm/i915: Update reset path to fix incomplete requests") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170207152437.4252-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> (cherry picked from commit c0dcb203fb009678e5be9e7782329dcfbbf16439) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-02-16drm/i915: Let execlist_update_context() cover !FULL_PPGTT mode.Zhi Wang1-1/+2
execlist_update_context() will try to update PDPs in a context before a ELSP submission only for full PPGTT mode, while PDPs was populated during context initialization. Now the latter code path is removed. Let execlist_update_context() also cover !FULL_PPGTT mode. Fixes: 34869776c76b ("drm/i915: check ppgtt validity when init reg state") Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhiyuan Lv <zhiyuan.lv@intel.com> Signed-off-by: Zhi Wang <zhi.a.wang@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1486377436-15380-1-git-send-email-zhi.a.wang@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (cherry picked from commit 04da811b3d821567e7a9a8a0baf48a6c1718b582) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-01-19drm/i915: Remove i915_vma_create from VMA APIChris Wilson1-2/+2
With the introduce of i915_vma_instance() for obtaining the VMA singleton for a (obj, vm, view) tuple, we can remove the i915_vma_create() in favour of a single entry point. We do incur a lookup onto an empty tree, but the i915_vma_create() were being called infrequently and during initialisation, so the small overhead is negligible. v2: Drop the i915_ prefix from the now static vma_create() function Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170116152131.18089-4-chris@chris-wilson.co.uk
2017-01-12drm/i915: Invalidate the guc ggtt TLB upon insertionChris Wilson1-6/+0
Move the GuC invalidation of its ggtt TLB to where we perform the ggtt modification rather than proliferate it into all the callers of the insert (which may or may not in fact have to do the insertion). v2: Just do the guc invalidate unconditionally, (afaict) it has no impact without the guc loaded on gen8+ v3: Conditionally invalidate the guc - just in case that register has not been validated for other modes. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170112110050.25333-1-chris@chris-wilson.co.uk
2017-01-12drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround.Francisco Jerez1-10/+0
The WaDisableLSQCROPERFforOCL workaround has the side effect of disabling an L3SQ optimization that has huge performance implications and is unlikely to be necessary for the correct functioning of usual graphic workloads. Userspace is free to re-enable the workaround on demand, and is generally in a better position to determine whether the workaround is necessary than the DRM is (e.g. only during the execution of compute kernels that rely on both L3 fences and HDC R/W requests). The same workaround seems to apply to BDW (at least to production stepping G1) and SKL as well (the internal workaround database claims that it does for all steppings, while the BSpec workaround table only mentions pre-production steppings), but the DRM doesn't do anything beyond whitelisting the L3SQCREG4 register so userspace can enable it when it sees fit. Do the same on KBL platforms. Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%, and gl_4 (AKA car chase) by 14% on a KBL GT2 running Mesa master -- This is followed by a regression of 35% and 10% respectively for the same benchmarks and platform caused by my recent patch series switching userspace to use the dataport constant cache instead of the sampler to implement uniform pull constant loads, which caused us to hit more heavily the L3 cache (and on platforms other than KBL had the opposite effect of improving performance of the same two benchmarks). The overall effect on KBL of this change combined with the recent userspace change is respectively 4.6% and 2.6%. SynMark2 OglShMapPcf was affected by the constant cache changes (though it improved as it did on other platforms rather than regressing), but is not significantly affected by this patch (with statistical significance of 5% and sample size 20). v2: Drop some more code to avoid unused variable warning. Fixes: 738fa1b3123f ("drm/i915/kbl: Add WaDisableLSQCROPERFforOCL") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99256 Signed-off-by: Francisco Jerez <currojerez@riseup.net> Cc: Matthew Auld <matthew.william.auld@gmail.com> Cc: Eero Tamminen <eero.t.tamminen@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: beignet@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v4.7+ Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> [Removed double Fixes tag] Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1484217894-20505-1-git-send-email-mika.kuoppala@intel.com
2017-01-12drm/i915: check ppgtt validity when init reg stateZhenyu Wang1-8/+1
Check if ppgtt is valid for context when init reg state. For gvt context which has no i915 allocated ppgtt, failed to check that would cause kernel null ptr reference error. v2: remove !48bit ppgtt case as we'll always update before submit (Chris) Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170109131453.3943-1-zhenyuw@linux.intel.com
2017-01-10drm/i915: Replace 4096 with PAGE_SIZE or I915_GTT_PAGE_SIZEChris Wilson1-2/+3
Start converting over from the byte count to its semantic macro, either we want to allocate the size of a physical page in main memory or we want the size of a virtual page in the GTT. 4096 could mean either, but PAGE_SIZE and I915_GTT_PAGE_SIZE are explicit and should help improve code comprehension and future changes. In the future, we may want to use variable GTT page sizes and so have the challenge of knowing which hardcoded values were used to represent a physical page vs the virtual page. v2: Look for a few more 4096s to convert, discover IS_ALIGNED(). v3: 4096ul paranoia, make fence alignment a distinct value of 4096, keep bdw stolen w/a as 4096 until we know better. v4: Add asserts that i915_vma_insert() start/end are aligned to GTT page sizes. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170110144734.26052-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-01-06drm/i915: Simplify testing for am-I-the-kernel-context?Chris Wilson1-1/+1
The kernel context (dev_priv->kernel_context) is unique in that it is not associated with any user filp - it is the only one with ctx->file_priv == NULL. This is a simpler test than comparing it against dev_priv->kernel_context which involves some pointer dancing. In checking that this is true, we notice that the gvt context is allocating itself a i915_hw_ppgtt it doesn't use and not flagging that its file_priv should be invalid. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170106152013.24684-5-chris@chris-wilson.co.uk
2017-01-05drm/i915/execlists: Reorder execlists register enablingChris Wilson1-14/+4
Empirically we restart following a GPU reset more successfully if we call lrc_init_hws() (which contains a posting read) last. (The failure mode that was observed was that breadcrumb writes into the HWS from the recovered requests went astray leading to the context-switch maintaining forward progress, but the requests not being retired/completed.) For clarity, lrc_init_hws() is inlined (and the unused function then removed). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170105153023.30575-3-chris@chris-wilson.co.uk
2017-01-05drm/i915: Assert that we do create the deferred contextChris Wilson1-0/+1
In order to convince static analyzers that the allocation function returns an error or sets ce->state, assert that it is set afterwards. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170105153023.30575-2-chris@chris-wilson.co.uk
2016-12-31drm/i915: Complete kerneldoc for struct i915_gem_contextChris Wilson1-1/+1
The existing kerneldoc was outdated, so time for a refresh. v2: Use single line kdoc, mention functions for manipulation Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161231112012.29263-3-chris@chris-wilson.co.uk
2016-12-24drm/i915: re-use computed offset bias for context pinDaniele Ceraolo Spurio1-1/+3
The context has to obey the same offset requirements as the ring, so we can re-use the same bias value we computed for the ring instead of unconditionally using GUC_WOPCM_TOP. Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1482537382-28584-2-git-send-email-daniele.ceraolospurio@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-12-24drm/i915: request ring to be pinned above GUC_WOPCM_TOPDaniele Ceraolo Spurio1-1/+1
GuC will validate the ring offset and fail if it is in the [0, GUC_WOPCM_TOP) range. The bias is conditionally applied only if GuC loading is enabled (we can't check for guc submission enabled as in other cases because HuC loading requires this fix). Note that the default context is processed before enable_guc_loading is sanitized, so we might still apply the bias to its ring even if it is not needed. v2: compute the value during ctx init and pass it to intel_ring_pin (Chris), updated commit message Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com> Cc: Anusha Srivatsa <anusha.srivatsa@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1482537382-28584-1-git-send-email-daniele.ceraolospurio@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-12-18drm/i915: Swap if(enable_execlists) in i915_gem_request_alloc for a vfuncChris Wilson1-1/+3
A fairly trivial move of a matching pair of routines (for preparing a request for construction) onto an engine vfunc. The ulterior motive is to be able to create a mock request implementation. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-7-chris@chris-wilson.co.uk
2016-12-18drm/i915/execlists: Request the kernel context be pinned highChris Wilson1-2/+6
PIN_HIGH is an expensive operation (in comparison to allocating from the hole stack) unsuitable for frequent use (such as switching between contexts). However, the kernel context should be pinned just once for the lifetime of the driver, and here it is appropriate to keep it out of the mappable range (in order to maximise mappable space for users). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-6-chris@chris-wilson.co.uk
2016-12-18drm/i915: Unify active context tracking between legacy/execlists/gucChris Wilson1-41/+21
The requests conversion introduced a nasty bug where we could generate a new request in the middle of constructing a request if we needed to idle the system in order to evict space for a context. The request to idle would be executed (and waited upon) before the current one, creating a minor havoc in the seqno accounting, as we will consider the current request to already be completed (prior to deferred seqno assignment) but ring->last_retired_head would have been updated and still could allow us to overwrite the current request before execution. We also employed two different mechanisms to track the active context until it was switched out. The legacy method allowed for waiting upon an active context (it could forcibly evict any vma, including context's), but the execlists method took a step backwards by pinning the vma for the entire active lifespan of the context (the only way to evict was to idle the entire GPU, not individual contexts). However, to circumvent the tricky issue of locking (i.e. we cannot take struct_mutex at the time of i915_gem_request_submit(), where we would want to move the previous context onto the active tracker and unpin it), we take the execlists approach and keep the contexts pinned until retirement. The benefit of the execlists approach, more important for execlists than legacy, was the reduction in work in pinning the context for each request - as the context was kept pinned until idle, it could short circuit the pinning for all active contexts. We introduce new engine vfuncs to pin and unpin the context respectively. The context is pinned at the start of the request, and only unpinned when the following request is retired (this ensures that the context is idle and coherent in main memory before we unpin it). We move the engine->last_context tracking into the retirement itself (rather than during request submission) in order to allow the submission to be reordered or unwound without undue difficultly. And finally an ulterior motive for unifying context handling was to prepare for mock requests. v2: Rename to last_retired_context, split out legacy_context tracking for MI_SET_CONTEXT. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-3-chris@chris-wilson.co.uk
2016-12-18drm/i915: Move intel_lrc_context_pin() to avoid the forward declarationChris Wilson1-67/+65
Just a simple move to avoid a forward declaration, though the diff likes to present itself as a move of intel_logical_ring_alloc_request_extras() in the opposite direction. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-2-chris@chris-wilson.co.uk
2016-12-16drm/i915: Fix use after free in logical_render_ring_initTvrtko Ursulin1-6/+1
Commit 3b3f1650b1ca ("drm/i915: Allocate intel_engine_cs structure only for the enabled engines") introduced the dynanically allocated engine instances and created an potential use after free scenario in logical_render_ring_init where lrc_destroy_wa_ctx_obj could be called after the engine instance has been freed. This can only happen during engine setup/init error handling which luckily does not happen ever in practice. Fix is to not call lrc_destroy_wa_ctx_obj since it would have already been executed from the preceding engine cleanup. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 3b3f1650b1ca ("drm/i915: Allocate intel_engine_cs structure only for the enabled engines") Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1481894322-2145-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-12-05drm/i915/execlists: Use list_safe_reset_next() instead of opencodingChris Wilson1-1/+1
list.h provides a macro for updating the next element in a safe list-iter, so let's use it so that it is hopefully clearer to the reader about the unusual behaviour, and also easier to grep. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161205142941.21965-6-chris@chris-wilson.co.uk
2016-12-01drm/i915: Make GEM object create and create from data take dev_privTvrtko Ursulin1-2/+2
Makes all GEM object constructors consistent. v2: Fix compilation in GVT code. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v1)
2016-11-29Revert "drm/i915/execlists: Use a local lock for dfs_link access"Chris Wilson1-5/+2
This reverts commit 27745e829a5c ("drm/i915/execlists: Use a local lock for dfs_link access") as the struct_mutex was required to prevent concurrent retiring and freeing, now restored in the previous patch. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: David Weinehall <david.weinehall@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161128143649.4289-2-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2016-11-17drm/i915: fix the dequeue logic for single_port_submission contextMin He1-1/+2
For a single_port_submission context, GVT expects that it can only be submitted to port 0, and there shouldn't be any other context in port 1 at the same time. This is required by GVT-g context to have an opportunity to save/restore some non-hw context render registers. This patch is to workaround GVT-g. v2: optimized code by following Chris's advice, and added more comments to explain the patch. v3: followed the coding style. Signed-off-by: Min He <min.he@intel.com> Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1479305104-17049-1-git-send-email-min.he@intel.com
2016-11-17drm/i915/execlists: Use a local lock for dfs_link accessChris Wilson1-2/+5
Avoid requiring struct_mutex for exclusive access to the temporary dfs_link inside the i915_dependency as not all callers may want to touch struct_mutex. So rather than force them to take a highly contended lock, introduce a local lock for the execlists schedule operation. Reported-by: David Weinehall <david.weinehall@linux.intel.com> Fixes: 9a151987d709 ("drm/i915: Add execution priority boosting for mmioflips") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: David Weinehall <david.weinehall@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161116152721.11053-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2016-11-15drm/i915/scheduler: Execute requests in order of prioritiesChris Wilson1-8/+143
Track the priority of each request and use it to determine the order in which we submit requests to the hardware via execlists. The priority of the request is determined by the user (eventually via the context) but may be overridden at any time by the driver. When we set the priority of the request, we bump the priority of all of its dependencies to match - so that a high priority drawing operation is not stuck behind a background task. When the request is ready to execute (i.e. we have signaled the submit fence following completion of all its dependencies, including third party fences), we put the request into a priority sorted rbtree to be submitted to the hardware. If the request is higher priority than all pending requests, it will be submitted on the next context-switch interrupt as soon as the hardware has completed the current request. We do not currently preempt any current execution to immediately run a very high priority request, at least not yet. One more limitation, is that this is first implementation is for execlists only so currently limited to gen8/gen9. v2: Replace recursive priority inheritance bumping with an iterative depth-first search list. v3: list_next_entry() for walking lists v4: Explain how the dfs solves the recursion problem with PI. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-8-chris@chris-wilson.co.uk
2016-11-15drm/i915: Remove engine->execlist_lockChris Wilson1-4/+3
The execlist_lock is now completely subsumed by the engine->timeline->lock, and so we can remove the redundant layer of locking. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-5-chris@chris-wilson.co.uk
2016-11-15drm/i915: Defer transfer onto execution timeline to actual hw submissionChris Wilson1-9/+14
Defer the transfer from the client's timeline onto the execution timeline from the point of readiness to the point of actual submission. For example, in execlists, a request is finally submitted to hardware when the hardware is ready, and only put onto the hardware queue when the request is ready. By deferring the transfer, we ensure that the timeline is maintained in retirement order if we decide to queue the requests onto the hardware in a different order than fifo. v2: Rebased onto distinct global/user timeline lock classes. v3: Play with the position of the spin_lock(). v4: Nesting finally resolved with distinct sw_fence lock classes. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-4-chris@chris-wilson.co.uk
2016-11-07drm/i915: Make sure engines are idle during GPU idling in LR modeImre Deak1-0/+22
We assume that the GPU is idle once receiving the seqno via the last request's user interrupt. In execlist mode the corresponding context completed interrupt can be delayed though and until this latter interrupt arrives we consider the request to be pending on the ELSP submit port. This can cause a problem during system suspend where this last request will be seen by the resume code as still pending. Such pending requests are normally replayed after a GPU reset, but during resume we reset both SW and HW tracking of the ring head/tail pointers, so replaying the pending request with its stale tail pointer will leave the ring in an inconsistent state. A subsequent request submission can lead then to the GPU executing from uninitialized area in the ring behind the above stale tail pointer. Fix this by making sure any pending request on the ELSP port is completed before suspending. I used a polling wait since the completion time I measured was <1ms and since normally we only need to wait during system suspend. GPU idling during runtime suspend is scheduled with a delay (currently 50-100ms) after the retirement of the last request at which point the context completed interrupt must have arrived already. The chance of this bug was increased by commit 1c777c5d1dcdf8fa0223fcff35fb387b5bb9517a Author: Imre Deak <imre.deak@intel.com> Date: Wed Oct 12 17:46:37 2016 +0300 drm/i915/hsw: Fix GPU hang during resume from S3-devices state but it could happen even without the explicit GPU reset, since we disable interrupts afterwards during the suspend sequence. v2: - Do an unlocked poll-wait first. (Chris) v3-4: - s/intel_lr_engines_idle/intel_execlists_idle/ and move i915.enable_execlists check to the new helper. (Chris) Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98470 Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1478510405-11799-3-git-send-email-imre.deak@intel.com
2016-10-28drm/i915: Defer breadcrumb emissionChris Wilson1-78/+42
Move the actual emission of the breadcrumb for closing the request from i915_add_request() to the submit callback. (It can be moved later when required.) This allows us to defer the allocation of the global_seqno from request construction to actual submission, allowing us to emit the requests out of order (wrt to the order of their construction, they still will only be executed one all of their dependencies are resolved including that all earlier requests on their timeline have been submitted.) We have to specialise how we then emit the request in order to write into the preallocated space, rather than at the tail of the ringbuffer (which will have been advanced by the addition of new requests). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-29-chris@chris-wilson.co.uk
2016-10-28drm/i915: Record space required for breadcrumb emissionChris Wilson1-0/+6
In the next patch, we will use deferred breadcrumb emission. That requires reserving sufficient space in the ringbuffer to emit the breadcrumb, which first requires us to know how large the breadcrumb is. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-28-chris@chris-wilson.co.uk
2016-10-28drm/i915: Rename ->emit_request to ->emit_breadcrumbChris Wilson1-5/+5
Now that the emission of the request tail and its submission to hardware are two separate steps, engine->emit_request() is confusing. engine->emit_request() is called to emit the breadcrumb commands for the request into the ring, name it such (engine->emit_breadcrumb). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-27-chris@chris-wilson.co.uk
2016-10-28drm/i915: Introduce a global_seqno for each requestChris Wilson1-2/+2
Though we will have multiple timelines, we still have a single timeline of execution. This we can use to provide an execution and retirement order of requests. This keeps tracking execution of requests simple, and vital for preserving a single waiter (i.e. so that we can order the waiters so that only the earliest to wakeup need be woken). To accomplish this we distinguish the seqno used to order requests per-context (external) and that used internally for execution. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-26-chris@chris-wilson.co.uk
2016-10-28drm/i915: Refactor object page APIChris Wilson1-3/+3
The plan is to make obtaining the backing storage for the object avoid struct_mutex (i.e. use its own locking). The first step is to update the API so that normal users only call pin/unpin whilst working on the backing storage. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-12-chris@chris-wilson.co.uk
2016-10-28drm/i915: Reuse the active golden render state batchChris Wilson1-1/+1
The golden render state is constant, but we recreate the batch setting it up for every new context. If we keep that batch in a volatile cache we can safely reuse it whenever we need to initialise a new context. We mark the pages as purgeable and use the shrinker to recover pages from the batch whenever we face memory pressues, recreating that batch afresh on the next new context. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtien@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-8-chris@chris-wilson.co.uk
2016-10-14drm/i915: Allocate intel_engine_cs structure only for the enabled enginesAkash Goel1-5/+6
With the possibility of addition of many more number of rings in future, the drm_i915_private structure could bloat as an array, of type intel_engine_cs, is embedded inside it. struct intel_engine_cs engine[I915_NUM_ENGINES]; Though this is still fine as generally there is only a single instance of drm_i915_private structure used, but not all of the possible rings would be enabled or active on most of the platforms. Some memory can be saved by allocating intel_engine_cs structure only for the enabled/active engines. Currently the engine/ring ID is kept static and dev_priv->engine[] is simply indexed using the enums defined in intel_engine_id. To save memory and continue using the static engine/ring IDs, 'engine' is defined as an array of pointers. struct intel_engine_cs *engine[I915_NUM_ENGINES]; dev_priv->engine[engine_ID] will be NULL for disabled engine instances. There is a text size reduction of 928 bytes, from 1028200 to 1027272, for i915.o file (but for i915.ko file text size remain same as 1193131 bytes). v2: - Remove the engine iterator field added in drm_i915_private structure, instead pass a local iterator variable to the for_each_engine** macros. (Chris) - Do away with intel_engine_initialized() and instead directly use the NULL pointer check on engine pointer. (Chris) v3: - Remove for_each_engine_id() macro, as the updated macro for_each_engine() can be used in place of it. (Chris) - Protect the access to Render engine Fault register with a NULL check, as engine specific init is done later in Driver load sequence. v4: - Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris) - Kill the superfluous init_engine_lists(). v5: - Cleanup the intel_engines_init() & intel_engines_setup(), with respect to allocation of intel_engine_cs structure. (Chris) v6: - Rebase. v7: - Optimize the for_each_engine_masked() macro. (Chris) - Change the type of 'iter' local variable to enum intel_engine_id. (Chris) - Rebase. v8: Rebase. v9: Rebase. v10: - For index calculation use engine ID instead of pointer based arithmetic in intel_engine_sync_index() as engine pointers are not contiguous now (Chris) - For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas) - Use for_each_engine macro for cleanup in intel_engines_init() and remove check for NULL engine pointer in cleanup() routines. (Joonas) v11: Rebase. Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Akash Goel <akash.goel@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
2016-10-07drm/i915/guc: Unwind GuC workqueue reservation if request construction failsChris Wilson1-7/+10
We reserve space in the GuC workqueue for submitting the request in the future. However, if we fail to construct the request, we need to give that reserved space back to the system. Fixes: dadd481bfe55 ("drm/i915/guc: Prepare for nonblocking execbuf submission") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97978 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161007065327.24515-4-chris@chris-wilson.co.uk
2016-10-07drm/i915: Reset the breadcrumbs IRQ more carefullyChris Wilson1-1/+1
Along with the interrupt, we want to restore the fake-irq and wait-timeout detection. If we use the breadcrumbs interface to setup the interrupt as it wants, the auxiliary timers will also be restored. v2: Cancel both timers as well, sanitize the IMR. Fixes: 821ed7df6e2a ("drm/i915: Update reset path to fix incomplete requests") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161007065327.24515-3-chris@chris-wilson.co.uk
2016-10-05drm/i915/execlists: Move clearing submission count from reset to initChris Wilson1-4/+5
After a GPU reset, we want to replay our queue of requests. However, the GPU reset clobbered the state and we only fixup the state for the guilty request - and engines deemed innocent we try to leave untouched so that we recover as completely as possible. However, we need to clear the sw tracking of the ELSP ports even for innocent requests, so move the clear to the common path of init_hw (from reset_hw). Reported-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161004201132.21801-3-chris@chris-wilson.co.uk
2016-10-05drm/i915/execlists: Reinitialise context image after GPU hangChris Wilson1-41/+62
On Braswell, at least, we observe that the context image is written in multiple phases. The first phase is to clear the register state, and subsequently rewrite it. A GPU reset at the right moment can interrupt the context update leaving it corrupt, and our update of the RING_HEAD is not sufficient to restart the engine afterwards. To recover, we need to reset the registers back to their original values. The context state is lost. What we need is a better mechanism to serialise the reset with pending flushes from the GPU. Fixes: 821ed7df6e2a ("drm/i915: Update reset path to fix incomplete requests") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161004201132.21801-2-chris@chris-wilson.co.uk
2016-10-05drm/i915: Share the computation of ring size for RING_CTL registerChris Wilson1-1/+1
Since both legacy and execlists want to populate the RING_CTL register, share the computation of the right bits for the ring->size. We can then stop masking errors and explicitly forbid them during creation! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161004201132.21801-1-chris@chris-wilson.co.uk
2016-09-26drm/i915/skl: drop workarounds for E0 revisionJani Nikula1-3/+2
Pre-production hardware is not supported. Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/0633a02177195703502ef2396aab03efc0314334.1474034059.git.jani.nikula@intel.com
2016-09-26drm/i915/skl: drop workarounds for D0 revisionJani Nikula1-6/+4
Pre-production hardware is not supported. Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/d28d21ceddeec226b5d1a20a7382bee9a72709a4.1474034059.git.jani.nikula@intel.com
2016-09-26drm/i915/skl: drop workarounds for A0 and B0 revisionsJani Nikula1-5/+3
Pre-production hardware is not supported. Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/7929af62a68504c84038a8db1625bd96ebaa9e6f.1474034059.git.jani.nikula@intel.com
2016-09-21drm/i915/execlists: Reset RING registers upon resumeChris Wilson1-19/+31
There is a disparity in the context image saved to disk and our own bookkeeping - that is we presume the RING_HEAD and RING_TAIL match our stored ce->ring->tail value. However, as we emit WA_TAIL_DWORDS into the ring but may not tell the GPU about them, the GPU may be lagging behind our bookkeeping. Upon hibernation we do not save stolen pages, presuming that their contents are volatile. This means that although we start writing into the ring at tail, the GPU starts executing from its HEAD and there may be some garbage in between and so the GPU promptly hangs upon resume. Testcase: igt/gem_exec_suspend/basic-S4 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96526 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160921135108.29574-3-chris@chris-wilson.co.uk
2016-09-15drm/i915/guc: general tidying up (submission)Dave Gordon1-1/+1
Renaming to more consistent scheme, and updating comments, mostly about i915_guc_wq_reserve(), aka i915_guc_wq_check_space(). Signed-off-by: Dave Gordon <david.s.gordon@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1473711577-11454-4-git-send-email-david.s.gordon@intel.com Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-09-09drm/i915: Drive request submission through fence callbacksChris Wilson1-2/+3
Drive final request submission from a callback from the fence. This way the request is queued until all dependencies are resolved, at which point it is handed to the backend for queueing to hardware. At this point, no dependencies are set on the request, so the callback is immediate. A side-effect of imposing a heavier-irqsafe spinlock for execlist submission is that we lose the softirq enabling after scheduling the execlists tasklet. To compensate, we manually kickstart the softirq by disabling and enabling the bh around the fence signaling. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: John Harrison <john.c.harrison@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-14-chris@chris-wilson.co.uk
2016-09-09drm/i915: Update reset path to fix incomplete requestsChris Wilson1-5/+44
Update reset path in preparation for engine reset which requires identification of incomplete requests and associated context and fixing their state so that engine can resume correctly after reset. The request that caused the hang will be skipped and head is reset to the start of breadcrumb. This allows us to resume from where we left-off. Since this request didn't complete normally we also need to cleanup elsp queue manually. This is vital if we employ nonblocking request submission where we may have a web of dependencies upon the hung request and so advancing the seqno manually is no longer trivial. ABI: gem_reset_stats / DRM_IOCTL_I915_GET_RESET_STATS We change the way we count pending batches. Only the active context involved in the reset is marked as either innocent or guilty, and not mark the entire world as pending. By inspection this only affects igt/gem_reset_stats (which assumes implementation details) and not piglit. ARB_robustness gives this guide on how we expect the user of this interface to behave: * Provide a mechanism for an OpenGL application to learn about graphics resets that affect the context. When a graphics reset occurs, the OpenGL context becomes unusable and the application must create a new context to continue operation. Detecting a graphics reset happens through an inexpensive query. And with regards to the actual meaning of the reset values: Certain events can result in a reset of the GL context. Such a reset causes all context state to be lost. Recovery from such events requires recreation of all objects in the affected context. The current status of the graphics reset state is returned by enum GetGraphicsResetStatusARB(); The symbolic constant returned indicates if the GL context has been in a reset state at any point since the last call to GetGraphicsResetStatusARB. NO_ERROR indicates that the GL context has not been in a reset state since the last call. GUILTY_CONTEXT_RESET_ARB indicates that a reset has been detected that is attributable to the current GL context. INNOCENT_CONTEXT_RESET_ARB indicates a reset has been detected that is not attributable to the current GL context. UNKNOWN_CONTEXT_RESET_ARB indicates a detected graphics reset whose cause is unknown. The language here is explicit in that we must mark up the guilty batch, but is loose enough for us to relax the innocent (i.e. pending) accounting as only the active batches are involved with the reset. In the future, we are looking towards single engine resetting (with minimal locking), where it seems inappropriate to mark the entire world as innocent since the reset occurred on a different engine. Reducing the information available means we only have to encounter the pain once, and also reduces the information leaking from one context to another. v2: Legacy ringbuffer submission required a reset following hibernation, or else we restore stale values to the RING_HEAD and walked over stolen garbage. v3: GuC requires replaying the requests after a reset. v4: Restore engine IRQ after reset (so waiters will be woken!) Rearm hangcheck if resetting with a waiter. Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Arun Siluvery <arun.siluvery@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-13-chris@chris-wilson.co.uk
2016-09-09drm/i915: Simplify ELSP queue request trackingChris Wilson1-234/+168
Emulate HW to track and manage ELSP queue. A set of SW ports are defined and requests are assigned to these ports before submitting them to HW. This helps in cleaning up incomplete requests during reset recovery easier especially after engine reset by decoupling elsp queue management. This will become more clear in the next patch. In the engine reset case we want to resume where we left-off after skipping the incomplete batch which requires checking the elsp queue, removing element and fixing elsp_submitted counts in some cases. Instead of directly manipulating the elsp queue from reset path we can examine these ports, fix up ringbuffer pointers using the incomplete request and restart submissions again after reset. Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Arun Siluvery <arun.siluvery@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1470414607-32453-3-git-send-email-arun.siluvery@linux.intel.com Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-6-chris@chris-wilson.co.uk
2016-09-09drm/i915: Reorder submitting the requests to ELSPChris Wilson1-38/+38
Just rearrange the code to reduce churn in the next patch. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-5-chris@chris-wilson.co.uk
2016-09-09drm/i915: Compute the ELSP register location onceChris Wilson1-9/+7
Similar to the issue with reading from the context status buffer, see commit 26720ab97fea ("drm/i915: Move CSB MMIO reads out of the execlists lock"), we frequently write to the ELSP register (4 writes per interrupt) and know we hold the required spinlock and forcewake throughout. We can further reduce the cost of writing these registers beyond the I915_WRITE_FW() by precomputing the address of the ELSP register. We also note that the subsequent read serves no purpose here, and are happy to see it go. v2: Address I915_WRITE mistakes in changelog text data bss dec hex filename 1259784 4581 576 1264941 134d2d drivers/gpu/drm/i915/i915.ko 1259720 4581 576 1264877 134ced drivers/gpu/drm/i915/i915.ko Saves 64 bytes of address recomputation. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-4-chris@chris-wilson.co.uk
2016-09-09drm/i915: Record the position of the workarounds in the tail of the requestChris Wilson1-2/+2
Rather than blindly assuming we need to advance the tail for resubmitting the request via the ELSP, record the position. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-3-chris@chris-wilson.co.uk