Double buffering - Vulkan Guide

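Before we start implementing descriptor sets to improve how we send data to the GPU, we need to do a few things first. Right now the engine can only execute one frame at a time, which is not optimal: while the GPU is busy drawing a frame, the CPU sits idle waiting for that frame to finish. Performance takes a large hit, because the CPU spends much of its time waiting on the GPU. We are going to refactor parts of the engine to double-buffer the rendering work. While the GPU is busy drawing frame N, the CPU prepares the work for frame N+1. This way the CPU does useful work while the GPU runs instead of waiting. This adds no extra latency and improves performance significantly. It is possible to let the CPU run further ahead and prepare more frames in advance, which can help if your CPU workload varies a lot from frame to frame, but in general overlapping just the next frame is enough and works well.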

Object lifetimes

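Most Vulkan objects are used by the GPU while it performs rendering work, so they cannot be modified or deleted while in use. Command buffers are one example: once you submit a command buffer to a queue, you cannot reset or modify it until the GPU has finished executing its commands. You control this with a fence. If you submit a command buffer that will signal a fence and then wait until that fence is signaled, you can be sure the command buffer can now be reused or modified. The same applies to the other objects used by those commands.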

The Frame struct

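We are going to move some of the rendering-related structures from the core VulkanEngine class into a per-frame struct, FrameData. This gives us better control over their lifetimes.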

vk_engine.h

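struct FrameData {
	//_presentSemaphore: signaled when the swapchain image is acquired
	//_renderSemaphore: signaled when rendering finishes, waited on before presenting
	VkSemaphore _presentSemaphore, _renderSemaphore;
	//signaled by the GPU once this frame's submitted commands have finished executing
	VkFence _renderFence;

	//each frame has its own pool and command buffer, so they can be reset independently
	VkCommandPool _commandPool;
	VkCommandBuffer _mainCommandBuffer;
};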

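We are moving these structures (the semaphores, the fence, the command pool and the command buffer) out of the core class and into the struct. Remove them from the class as well.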

In their place, we add a fixed-size array of FrameData structs.

//number of frames to overlap when rendering
constexpr unsigned int FRAME_OVERLAP = 2;

class VulkanEngine {
public:

	//other code ....

	//frame storage
	FrameData _frames[FRAME_OVERLAP];

	//getter for the frame we are rendering to right now.
	FrameData& get_current_frame();

	//other code ....
};

The implementation of get_current_frame() looks like this.

FrameData& VulkanEngine::get_current_frame()
{
	return _frames[_frameNumber % FRAME_OVERLAP];
}

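Every time we render a frame, _frameNumber is incremented by 1, which is very useful here. With a frame overlap of 2 (the default), even frames use _frames[0] and odd frames use _frames[1]. While the GPU is busy executing the rendering commands of frame 0, the CPU writes the buffers of frame 1, and vice versa.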

Now we need to change the engine's synchronization structures and command buffers so that they use this _frames array.

In the init_commands() function, we turn the code into a loop that initializes the command structures for both frames.

void VulkanEngine::init_commands()
{
	//create a command pool for commands submitted to the graphics queue.
	//we also want the pool to allow for resetting of individual command buffers
	VkCommandPoolCreateInfo commandPoolInfo = vkinit::command_pool_create_info(_graphicsQueueFamily, VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT);

	for (int i = 0; i < FRAME_OVERLAP; i++) {

	
		VK_CHECK(vkCreateCommandPool(_device, &commandPoolInfo, nullptr, &_frames[i]._commandPool));

		//allocate the default command buffer that we will use for rendering
		VkCommandBufferAllocateInfo cmdAllocInfo = vkinit::command_buffer_allocate_info(_frames[i]._commandPool, 1);

		VK_CHECK(vkAllocateCommandBuffers(_device, &cmdAllocInfo, &_frames[i]._mainCommandBuffer));

		_mainDeletionQueue.push_function([=]() {
			vkDestroyCommandPool(_device, _frames[i]._commandPool, nullptr);
		});
	}
}

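Note that we are creating 2 separate command pools. This isn't strictly necessary right now, but it becomes necessary if you create multiple command buffers per frame and want to reset them all at once (resetting a command pool resets all the command buffers allocated from it).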

In the init_sync_structures() function, we likewise create a set of semaphores and a fence for each frame.

void VulkanEngine::init_sync_structures()
{
	//create the fences already signaled, so the first wait on each frame's fence returns immediately
	VkFenceCreateInfo fenceCreateInfo = vkinit::fence_create_info(VK_FENCE_CREATE_SIGNALED_BIT);

	VkSemaphoreCreateInfo semaphoreCreateInfo = vkinit::semaphore_create_info();

	for (int i = 0; i < FRAME_OVERLAP; i++) {

		VK_CHECK(vkCreateFence(_device, &fenceCreateInfo, nullptr, &_frames[i]._renderFence));

		//enqueue the destruction of the fence
		_mainDeletionQueue.push_function([=]() {
			vkDestroyFence(_device, _frames[i]._renderFence, nullptr);
		});

		VK_CHECK(vkCreateSemaphore(_device, &semaphoreCreateInfo, nullptr, &_frames[i]._presentSemaphore));
		VK_CHECK(vkCreateSemaphore(_device, &semaphoreCreateInfo, nullptr, &_frames[i]._renderSemaphore));

		//enqueue the destruction of semaphores
		_mainDeletionQueue.push_function([=]() {
			vkDestroySemaphore(_device, _frames[i]._presentSemaphore, nullptr);
			vkDestroySemaphore(_device, _frames[i]._renderSemaphore, nullptr);
		});
	}
}

With this, we have created the structures needed for multiple frames, so now we need to change the render loop to use them.

In the draw() function, change every use of _renderFence to get_current_frame()._renderFence. Do exactly the same for _mainCommandBuffer, _presentSemaphore and _renderSemaphore.

Example:

	//wait until the GPU has finished rendering the last frame that used this FrameData. Timeout of 1 second (the value is in nanoseconds)
	VK_CHECK(vkWaitForFences(_device, 1, &get_current_frame()._renderFence, true, 1000000000));
	VK_CHECK(vkResetFences(_device, 1, &get_current_frame()._renderFence));

	//now that we are sure that the commands finished executing, we can safely reset the command buffer to begin recording again.
	VK_CHECK(vkResetCommandBuffer(get_current_frame()._mainCommandBuffer, 0));

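At this point, frame overlap should work correctly. Try compiling and running the program, and check that the validation layers report no errors. If in doubt, compare your code with this chapter's sample code.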

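You can try increasing the FRAME_OVERLAP value. Increasing it adds more latency to your program whenever the CPU is faster than the GPU. The usual practice is to keep it at 2, and raise it to 3 if your frame rate is jittery. You can also set it to 1 to disable frame overlap entirely.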

Now that the CPU and GPU work overlaps better, it is time to move on to descriptor sets.
