summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/i915/gt/uc/intel_guc.h
diff options
context:
space:
mode:
authorMatthew Brost <matthew.brost@intel.com>2022-08-17 05:05:11 +0300
committerJohn Harrison <John.C.Harrison@Intel.com>2022-08-19 01:23:47 +0300
commit6a079903847cce1dd06345127d2a32f26d2cd9c6 (patch)
treed8506939a083fca3de3f6fb82f113c526632915a /drivers/gpu/drm/i915/gt/uc/intel_guc.h
parent61faec5fa66cbd1afcd5074f168f09529f8119bf (diff)
downloadlinux-6a079903847cce1dd06345127d2a32f26d2cd9c6.tar.xz
drm/i915/guc: Add delay to disable scheduling after pin count goes to zero
Add a delay, configurable via debugfs (default 34ms), to disable scheduling of a context after the pin count goes to zero. Disable scheduling is a costly operation as it requires synchronizing with the GuC. So the idea is that a delay allows the user to resubmit something before doing this operation. This delay is only done if the context isn't closed and less than a given threshold (default is 3/4) of the guc_ids are in use. As temporary WA disable this feature for the selftests. Selftests are very timing sensitive and any change in timing can cause failure. A follow up patch will fixup the selftests to understand this delay. Alan Previn: Matt Brost first introduced this series back in Oct 2021. However no real world workload with measured performance impact was available to prove the intended results. Today, this series is being republished in response to a real world workload that benefited greatly from it along with measured performance improvement. Workload description: 36 containers were created on a DG2 device where each container was performing a combination of 720p 3d game rendering and 30fps video encoding. The workload density was configured in a way that guaranteed each container to ALWAYS be able to render and encode no less than 30fps with a predefined maximum render + encode latency time. That means the totality of all 36 containers and their workloads were not saturating the engines to their max (in order to maintain just enough headrooom to meet the min fps and max latencies of incoming container submissions). Problem statement: It was observed that the CPU core processing the i915 soft IRQ work was experiencing severe load. Using tracelogs and an instrumentation patch to count specific i915 IRQ events, it was confirmed that the majority of the CPU cycles were caused by the gen11_other_irq_handler() -> guc_irq_handler() code path. The vast majority of the cycles was determined to be processing a specific G2H IRQ: i.e. INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE. These IRQs are sent by GuC in response to i915 KMD sending H2G requests: INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET. Those H2G requests are sent whenever a context goes idle so that we can unpin the context from GuC. The high CPU utilization % symptom was limiting density scaling. Root Cause Analysis: Because the incoming execution buffers were spread across 36 different containers (each with multiple contexts) but the system in totality was NOT saturated to the max, it was assumed that each context was constantly idling between submissions. This was causing a thrashing of unpinning contexts from GuC at one moment, followed quickly by repinning them due to incoming workload the very next moment. These event-pairs were being triggered across multiple contexts per container, across all containers at the rate of > 30 times per sec per context. Metrics: When running this workload without this patch, we measured an average of ~69K INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE events every 10 seconds or ~10 million times over ~25+ mins. With this patch, the count reduced to ~480 every 10 seconds or about ~28K over ~10 mins. The improvement observed is ~99% for the average counts per 10 seconds. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220817020511.2180747-3-alan.previn.teres.alexis@intel.com
Diffstat (limited to 'drivers/gpu/drm/i915/gt/uc/intel_guc.h')
-rw-r--r--drivers/gpu/drm/i915/gt/uc/intel_guc.h17
1 files changed, 16 insertions, 1 deletions
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 804133df1ac9..944b549b8797 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -113,6 +113,10 @@ struct intel_guc {
*/
struct list_head guc_id_list;
/**
+ * @guc_ids_in_use: Number single-lrc guc_ids in use
+ */
+ u16 guc_ids_in_use;
+ /**
* @destroyed_contexts: list of contexts waiting to be destroyed
* (deregistered with the GuC)
*/
@@ -132,6 +136,16 @@ struct intel_guc {
* @reset_fail_mask: mask of engines that failed to reset
*/
intel_engine_mask_t reset_fail_mask;
+ /**
+ * @sched_disable_delay_ms: schedule disable delay, in ms, for
+ * contexts
+ */
+ u64 sched_disable_delay_ms;
+ /**
+ * @sched_disable_gucid_threshold: threshold of min remaining available
+ * guc_ids before we start bypassing the schedule disable delay
+ */
+ int sched_disable_gucid_threshold;
} submission_state;
/**
@@ -461,9 +475,10 @@ void intel_guc_submission_reset_finish(struct intel_guc *guc);
void intel_guc_submission_cancel_requests(struct intel_guc *guc);
void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
+void intel_guc_dump_time_info(struct intel_guc *guc, struct drm_printer *p);
void intel_guc_write_barrier(struct intel_guc *guc);
-void intel_guc_dump_time_info(struct intel_guc *guc, struct drm_printer *p);
+int intel_guc_sched_disable_gucid_threshold_max(struct intel_guc *guc);
#endif