|  | ========================================= | 
|  | I915 GuC Submission/DRM Scheduler Section | 
|  | ========================================= | 
|  |  | 
|  | Upstream plan | 
|  | ============= | 
|  | For upstream the overall plan for landing GuC submission and integrating the | 
|  | i915 with the DRM scheduler is: | 
|  |  | 
|  | * Merge basic GuC submission | 
|  | * Basic submission support for all gen11+ platforms | 
|  | * Not enabled by default on any current platforms but can be enabled via | 
|  | modparam enable_guc | 
|  | * Lots of rework will be needed to integrate with the DRM scheduler, so | 
|  | there is no need to nitpick everything in the code; it should just be | 
|  | functional, have no major coding style / layering errors, and not | 
|  | regress execlists | 
|  | * Update IGTs / selftests as needed to work with GuC submission | 
|  | * Enable CI on supported platforms for a baseline | 
|  | * Rework / get CI healthy for GuC submission in place as needed | 
|  | * Merge new parallel submission uAPI | 
|  | * Bonding uAPI is completely incompatible with GuC submission, plus it has | 
|  | severe design issues in general, which is why we want to retire it no | 
|  | matter what | 
|  | * New uAPI adds I915_CONTEXT_ENGINES_EXT_PARALLEL context setup step | 
|  | which configures a slot with N contexts | 
|  | * After I915_CONTEXT_ENGINES_EXT_PARALLEL a user can submit N batches to | 
|  | a slot in a single execbuf IOCTL and the batches run on the GPU in | 
|  | parallel | 
|  | * Initially only for GuC submission but execlists can be supported if | 
|  | needed | 
|  | * Convert the i915 to use the DRM scheduler | 
|  | * GuC submission backend fully integrated with DRM scheduler | 
|  | * All request queues removed from backend (e.g. all backpressure | 
|  | handled in DRM scheduler) | 
|  | * Resets / cancels hook into DRM scheduler | 
|  | * Watchdog hooks into DRM scheduler | 
|  | * Lots of complexity of the GuC backend can be pulled out once | 
|  | integrated with DRM scheduler (e.g. state machine gets | 
|  | simpler, locking gets simpler, etc...) | 
|  | * Execlists backend will do the minimum required to hook into the DRM scheduler | 
|  | * Legacy interface | 
|  | * Features like timeslicing / preemption / virtual engines would | 
|  | be difficult to integrate with the DRM scheduler and these | 
|  | features are not required for GuC submission as the GuC does | 
|  | these things for us | 
|  | * ROI low on fully integrating into DRM scheduler | 
|  | * Fully integrating would add lots of complexity to DRM | 
|  | scheduler | 
|  | * Port i915 priority inheritance / boosting feature to the DRM scheduler | 
|  | * Used for i915 page flip, may be useful to other DRM drivers as | 
|  | well | 
|  | * Will be an optional feature in the DRM scheduler | 
|  | * Remove in-order completion assumptions from DRM scheduler | 
|  | * Even when using the DRM scheduler the backends will handle | 
|  | preemption, timeslicing, etc... so it is possible for jobs to | 
|  | finish out of order | 
|  | * Pull out i915 priority levels and use DRM priority levels | 
|  | * Optimize DRM scheduler as needed | 
|  |  | 
|  | TODOs for GuC submission upstream | 
|  | ================================= | 
|  |  | 
|  | * Need an update to GuC firmware / i915 to enable error state capture | 
|  | * Open source tool to decode GuC logs | 
|  | * Public GuC spec | 
|  |  | 
|  | New uAPI for basic GuC submission | 
|  | ================================= | 
|  | No major changes are required to the uAPI for basic GuC submission. The only | 
|  | change is a new scheduler attribute: I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP. | 
|  | This attribute indicates the 2k i915 user priority levels are statically mapped | 
|  | into 3 levels as follows: | 
|  |  | 
|  | * -1k to -1 Low priority | 
|  | * 0 Medium priority | 
|  | * 1 to 1k High priority | 
|  |  | 
|  | This is needed because the GuC only has 4 priority bands. The highest priority | 
|  | band is reserved for the kernel. This aligns with the DRM scheduler priority | 
|  | levels too. | 
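
The static mapping above can be sketched in a few lines of C. This is only an illustration of the sign-based split described in the list. The enum and function names are hypothetical and do not come from i915, and the sketch does not check that the input lies inside the valid user priority range.

```c
/* Hypothetical names for illustration only; not taken from i915. */
enum example_prio_band {
	EXAMPLE_BAND_LOW,
	EXAMPLE_BAND_MEDIUM,
	EXAMPLE_BAND_HIGH,
};

/*
 * Collapse a user priority into one of the three static bands:
 * negative values are low, zero is medium, positive values are high.
 * Range validation of user_prio is assumed to happen elsewhere.
 */
static enum example_prio_band example_map_user_prio(int user_prio)
{
	if (user_prio < 0)
		return EXAMPLE_BAND_LOW;
	if (user_prio > 0)
		return EXAMPLE_BAND_HIGH;
	return EXAMPLE_BAND_MEDIUM;
}
```

With this mapping, two contexts with user priorities 1 and 500 land in the same band, so userspace that sees I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP should not expect finer ordering between them.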
|  |  | 
|  | Spec references: | 
|  | ---------------- | 
|  | * https://www.khronos.org/registry/EGL/extensions/IMG/EGL_IMG_context_priority.txt | 
|  | * https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/chap5.html#devsandqueues-priority | 
|  | * https://spec.oneapi.com/level-zero/latest/core/api.html#ze-command-queue-priority-t | 
|  |  | 
|  | New parallel submission uAPI | 
|  | ============================ | 
|  | The existing bonding uAPI is completely broken with GuC submission because | 
|  | whether a submission is a single context submit or parallel submit isn't known | 
|  | until execbuf time, when it is activated via I915_SUBMIT_FENCE. To submit multiple | 
|  | contexts in parallel with the GuC the context must be explicitly registered with | 
|  | N contexts and all N contexts must be submitted in a single command to the GuC. | 
|  | The GuC interfaces do not support dynamically changing between N contexts as the | 
|  | bonding uAPI does. Hence the need for a new parallel submission interface. Also | 
|  | the legacy bonding uAPI is quite confusing and not intuitive at all. Furthermore | 
|  | I915_SUBMIT_FENCE is by design a future fence, so not really something we should | 
|  | continue to support. | 
|  |  | 
|  | The new parallel submission uAPI consists of 3 parts: | 
|  |  | 
|  | * Export engines logical mapping | 
|  | * A 'set_parallel' extension to configure contexts for parallel | 
|  | submission | 
|  | * Extend execbuf2 IOCTL to support submitting N BBs in a single IOCTL | 
|  |  | 
|  | Export engines logical mapping | 
|  | ------------------------------ | 
|  | Certain use cases require BBs to be placed on engine instances in logical order | 
|  | (e.g. split-frame on gen11+). The logical mapping of engine instances can change | 
|  | based on fusing. Rather than making UMDs be aware of fusing, simply expose the | 
|  | logical mapping with the existing query engine info IOCTL. Also the GuC | 
|  | submission interface currently only supports submitting multiple contexts to | 
|  | engines in logical order which is a new requirement compared to execlists. | 
|  | Lastly, all current platforms have at most 2 engine instances and the logical | 
|  | order is the same as uAPI order. This will change on platforms with more than 2 | 
|  | engine instances. | 
|  |  | 
|  | A single bit will be added to drm_i915_engine_info.flags indicating that the | 
|  | logical instance has been returned and a new field, | 
|  | drm_i915_engine_info.logical_instance, returns the logical instance. | 
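
A minimal sketch of how a UMD might consume the flag and the new field, assuming a simplified stand-in structure rather than the real drm_i915_engine_info. The structure, the flag name, and its bit position below are hypothetical. The real definitions live in include/uapi/drm/i915_drm.h.

```c
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of drm_i915_engine_info. */
struct example_engine_info {
	uint16_t uapi_instance;    /* instance number in uAPI order */
	uint64_t flags;
	uint16_t logical_instance; /* valid only if the flag below is set */
};

/* Assumed bit position; the real flag is defined by the uAPI header. */
#define EXAMPLE_HAS_LOGICAL_INSTANCE (1ull << 0)

/*
 * Return the logical instance if the kernel reported one, or -1 so the
 * caller can tell that no logical mapping is available.
 */
static int example_logical_instance(const struct example_engine_info *info)
{
	if (info->flags & EXAMPLE_HAS_LOGICAL_INSTANCE)
		return info->logical_instance;
	return -1;
}
```

Checking the flag before reading the field lets the same userspace code run on kernels that predate the logical mapping.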
|  |  | 
|  | A 'set_parallel' extension to configure contexts for parallel submission | 
|  | ------------------------------------------------------------------------ | 
|  | The 'set_parallel' extension configures a slot for parallel submission of N BBs. | 
|  | It is a setup step that must be called before using any of the contexts. See | 
|  | I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE or I915_CONTEXT_ENGINES_EXT_BOND for | 
|  | similar existing examples. Once a slot is configured for parallel submission the | 
|  | execbuf2 IOCTL can be called submitting N BBs in a single IOCTL. Initially this | 
|  | only supports GuC submission. Execlists support can be added later if needed. | 
|  |  | 
|  | Add I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT and | 
|  | drm_i915_context_engines_parallel_submit to the uAPI to implement this | 
|  | extension. | 
|  |  | 
|  | .. c:namespace-push:: rfc | 
|  |  | 
|  | .. kernel-doc:: include/uapi/drm/i915_drm.h | 
|  | :functions: i915_context_engines_parallel_submit | 
|  |  | 
|  | .. c:namespace-pop:: | 
|  |  | 
|  | Extend execbuf2 IOCTL to support submitting N BBs in a single IOCTL | 
|  | ------------------------------------------------------------------- | 
|  | Contexts that have been configured with the 'set_parallel' extension can only | 
|  | submit N BBs in a single execbuf2 IOCTL. The BBs are either the last N objects | 
|  | in the drm_i915_gem_exec_object2 list or the first N if I915_EXEC_BATCH_FIRST is | 
|  | set. The number of BBs is implicit based on the slot submitted and how it has | 
|  | been configured by 'set_parallel' or other extensions. No uAPI changes are | 
|  | required to the execbuf2 IOCTL. |
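
The placement rule for the BBs can be sketched as a small index calculation. The helper below is hypothetical and not part of i915. It takes the I915_EXEC_BATCH_FIRST state as a plain boolean and assumes the slot width N is already known from the 'set_parallel' configuration.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: find the index of the first batch buffer in an
 * execbuf2 object list for a slot configured with `width` batches.
 * The batches occupy indices [*first, *first + width).
 * Returns 0 on success, -1 if width is zero or the list is too short.
 */
static int example_first_batch_index(size_t buffer_count, size_t width,
				     bool batch_first, size_t *first)
{
	if (width == 0 || width > buffer_count)
		return -1;

	/* First N objects with BATCH_FIRST, otherwise the last N. */
	*first = batch_first ? 0 : buffer_count - width;
	return 0;
}
```

For example, with 5 objects and a slot of width 2, the batches are objects 3 and 4 by default, or objects 0 and 1 when I915_EXEC_BATCH_FIRST is set.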