In UE mobile development, the rendering resource initialization process after resource loading often becomes a performance bottleneck. Although resource loading is asynchronous to the Game thread, after loading completes, the Game thread sends a large number of resource initialization commands to the Render thread. When Texture/Buffer creation accumulates on the Render/RHI threads, it indirectly blocks the Game thread (UE’s multi-threading mechanism requires the Render thread to be at most 1 frame behind the Game thread).
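To make the indirect blocking concrete, here is a minimal timing sketch. It is a simplified model and not engine code: it assumes that at the end of frame i the Game thread waits until the Render thread has finished frame i-1, and that the Render thread starts frame i once the Game thread has finished its work for that frame. The function name `SimulateGameThreadStalls` and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified model (an assumption, not engine code):
// - the Game thread needs GameWorkMs of work per frame;
// - at the end of frame i it waits until render frame i-1 has finished;
// - the Render thread starts frame i at that sync point and needs RenderWorkMs[i].
// Returns how long the Game thread stalls at the end of each frame.
inline std::vector<int> SimulateGameThreadStalls(int GameWorkMs,
                                                 const std::vector<int>& RenderWorkMs)
{
    std::vector<int> StallMs;
    int GameStart = 0;      // time at which the Game thread begins the current frame
    int LastRenderEnd = 0;  // time at which the previous render frame finished

    for (std::size_t i = 0; i < RenderWorkMs.size(); ++i)
    {
        const int GameWorkDone = GameStart + GameWorkMs;
        // Frame-end sync: the Game thread may not run more than one frame ahead.
        const int SyncPoint = std::max(GameWorkDone, LastRenderEnd);
        StallMs.push_back(SyncPoint - GameWorkDone);

        // The Render thread processes frame i after the sync point.
        LastRenderEnd = SyncPoint + RenderWorkMs[i];
        GameStart = SyncPoint;
    }
    return StallMs;
}
```

With 10 ms of Game thread work per frame and render times of 5, 5, 40, and 5 ms, the 40 ms render frame (for example, one dominated by resource initialization) shows up as a 30 ms Game thread stall one frame later.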
Our project has tried several optimization approaches, such as preventing GLES buffer creation from blocking the RHI thread, creating Texture2DArray without flush and locking/unlocking all slices at once, and disabling SubmitOnTextureUnlock on the Vulkan platform. These measures can alleviate stuttering in certain scenarios. However, as map complexity increases, similar issues persist.
The following image shows a typical call stack causing Game thread stuttering due to this issue:
By analyzing the function call stack, we can see that stuttering is primarily caused by the render thread executing large amounts of FRenderResource::InitRHI. In our project, this mainly includes the following resource types:
We introduce new AuxiliaryRHI threads that, similar to the render thread, receive tasks via ENQUEUE_AUXILIARY_RHI_RESOURCE_COMMAND. If multiple auxiliary threads are enabled, we use ENQUEUE_AUXILIARY_RHI_RESOURCE_COMMAND_ROUND_ROBIN to distribute creation tasks across the auxiliary threads in a round-robin fashion.
void FColorVertexBuffer::InitRHI_AuxiliaryThread()
{
    // Create a thread-safe handle used to query whether the command has completed
    AuxiliaryRHIStatusHandle = CreateThreadSafeHandle();

    ENQUEUE_AUXILIARY_RHI_RESOURCE_COMMAND_ROUND_ROBIN(ColorVertexBuffer_CreateRHIBuffer_AuxiliaryThread)
    ([this](int MyContextIndex)
    {
        // MyContextIndex identifies the auxiliary thread (and its context) running this command
        RHIBeginResourceCommands_AuxiliaryThread(AuxiliaryRHIStatusHandle, MyContextIndex);
        QUICK_SCOPE_CYCLE_COUNTER(FColorVertexBufferInitRHI_AuxiliaryThread);

        // Store the created buffer so the SRV below can reference it
        VertexBufferRHI = RHICreateVertexBuffer_AuxiliaryThread(SizeInBytes, Usage, State, CreateInfo, MyContextIndex);
        ColorComponentsSRV = RHICreateShaderResourceView_AuxiliaryThread(
            FShaderResourceViewInitializer(VertexData ? VertexBufferRHI : nullptr, PF_R8G8B8A8),
            MyContextIndex
        );

        RHIEndResourceCommands_AuxiliaryThread(MyContextIndex);
    });
}
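The dispatch pattern itself can be sketched in standard C++, independent of the engine. This is only an illustration under stated assumptions: `FAuxiliaryThreadPool`, `EnqueueRoundRobin`, `FStatusHandle`, and `MakeStatusHandle` are hypothetical names, not the macros or types used in our implementation. Each auxiliary thread owns one queue, commands are assigned to threads in turn, and every command receives the index of the thread that runs it.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the thread-safe status handle: a shared completion flag.
using FStatusHandle = std::shared_ptr<std::atomic<bool>>;

inline FStatusHandle MakeStatusHandle()
{
    return std::make_shared<std::atomic<bool>>(false);
}

// Hypothetical pool of auxiliary threads with one command queue per thread.
class FAuxiliaryThreadPool
{
public:
    using FCommand = std::function<void(int /*ContextIndex*/)>;

    explicit FAuxiliaryThreadPool(int NumThreads) : Workers(NumThreads)
    {
        for (int i = 0; i < NumThreads; ++i)
        {
            Workers[i].Thread = std::thread([this, i] { Run(i); });
        }
    }

    ~FAuxiliaryThreadPool()
    {
        for (FWorker& W : Workers)
        {
            { std::lock_guard<std::mutex> Lock(W.Mutex); W.bStop = true; }
            W.Wake.notify_one();
        }
        for (FWorker& W : Workers) { W.Thread.join(); }
    }

    // Assigns the command to the next thread in turn and returns that thread's index.
    int EnqueueRoundRobin(FCommand Command)
    {
        const int Index = static_cast<int>(Next.fetch_add(1) % Workers.size());
        {
            std::lock_guard<std::mutex> Lock(Workers[Index].Mutex);
            Workers[Index].Queue.push_back(std::move(Command));
        }
        Workers[Index].Wake.notify_one();
        return Index;
    }

private:
    struct FWorker
    {
        std::mutex Mutex;
        std::condition_variable Wake;
        std::deque<FCommand> Queue;
        bool bStop = false;
        std::thread Thread;
    };

    void Run(int Index)
    {
        FWorker& W = Workers[Index];
        for (;;)
        {
            FCommand Command;
            {
                std::unique_lock<std::mutex> Lock(W.Mutex);
                W.Wake.wait(Lock, [&W] { return W.bStop || !W.Queue.empty(); });
                if (W.Queue.empty()) { return; }  // stop requested and queue drained
                Command = std::move(W.Queue.front());
                W.Queue.pop_front();
            }
            Command(Index);  // run outside the lock, passing this thread's index
        }
    }

    std::vector<FWorker> Workers;
    std::atomic<std::size_t> Next{0};
};
```

A caller would create a handle with `MakeStatusHandle()`, capture it in the enqueued lambda, set it to `true` as the last step of the command, and later poll it from another thread to check whether the resource is ready.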
At the RHI layer, we need to implement corresponding resource creation interfaces:
For Vulkan/Metal, command buffers must be maintained separately for each thread. We also create a corresponding CommandContext for each AuxiliaryRHI thread, and use the ContextIndex calculated by ENQUEUE_AUXILIARY_RHI_RESOURCE_COMMAND_XXX to get the command buffer for the appropriate thread.
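The per-thread lookup can be illustrated with a small sketch. All types here are hypothetical stand-ins rather than real Vulkan/Metal or engine types: a "command buffer" is just a list of recorded command names. The point is only that each auxiliary thread records into the context selected by its own ContextIndex, so no two threads share a command buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a Vulkan/Metal command buffer.
struct FCommandBufferSketch
{
    std::vector<std::string> Recorded;
};

// Hypothetical per-thread command context that owns its command buffer.
struct FAuxCommandContextSketch
{
    FCommandBufferSketch CommandBuffer;
};

// One context per auxiliary thread, addressed by ContextIndex.
class FAuxContextTableSketch
{
public:
    explicit FAuxContextTableSketch(std::size_t NumAuxThreads) : Contexts(NumAuxThreads) {}

    // Throws std::out_of_range for an invalid index instead of touching another thread's context.
    FAuxCommandContextSketch& Get(int ContextIndex)
    {
        return Contexts.at(static_cast<std::size_t>(ContextIndex));
    }

    std::size_t Num() const { return Contexts.size(); }

private:
    std::vector<FAuxCommandContextSketch> Contexts;
};

// Records an upload into the command buffer that belongs to the calling thread's context.
inline void RecordBufferUpload(FAuxContextTableSketch& Table, int ContextIndex, const std::string& Name)
{
    Table.Get(ContextIndex).CommandBuffer.Recorded.push_back("upload:" + Name);
}
```

Because the table is sized once and each thread only touches its own slot, recording needs no lock in this sketch. Submission to the GPU queue is a separate concern and is not modeled here.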
Additionally, we need to add:
We handle dependency issues by modifying the AsyncLoading module:
This approach avoids the missing draw issues from skip-draw methods seen in [1], though the downside is increased loading delay. Currently, we only enable this through script control during stages that aren’t sensitive to loading speed (such as in-game level streaming). It remains disabled in scenarios with frequent loading, like map transitions or lobbies.
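The trade-off described above can be sketched as a simple gate. This is an assumption about the general shape of such a check, not our actual AsyncLoading change: `FPendingPackageSketch`, `IsPackageReadyToFinish`, and the `bAuxiliaryInitEnabled` flag are hypothetical. When the feature is enabled, a package is reported as loaded only after every resource's status handle has been signaled, which is why nothing is drawn with a half-initialized resource and also why loading takes longer.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for the thread-safe status handle: a shared completion flag.
using FStatusHandle = std::shared_ptr<std::atomic<bool>>;

// Hypothetical record of a package whose resources are being initialized on auxiliary threads.
struct FPendingPackageSketch
{
    std::vector<FStatusHandle> ResourceHandles;
};

// Returns true when the loader may report the package as fully loaded.
// bAuxiliaryInitEnabled models the script-controlled switch: when it is off,
// initialization stays on the render thread and loading is not delayed.
inline bool IsPackageReadyToFinish(const FPendingPackageSketch& Package, bool bAuxiliaryInitEnabled)
{
    if (!bAuxiliaryInitEnabled)
    {
        return true;
    }
    for (const FStatusHandle& Handle : Package.ResourceHandles)
    {
        if (Handle && !Handle->load(std::memory_order_acquire))
        {
            return false;  // at least one resource is still being created
        }
    }
    return true;
}
```

A loader would poll this check each tick and keep the package in its pending list until it returns true.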
The initialization that was originally on the render thread has been moved to the Auxiliary thread and no longer blocks the execution of the Game thread.
[1] [[UOD2021] Mobile Real-Time GI and Multithreaded Rendering Optimization in 《黎明觉醒》 (Undawn) | LightSpeed Studios, Wei Zhixiao](https://www.bilibili.com/video/av978281593/)