while we're on it, how much performance impact does it have on the CPU when it has to deal with NVMe and 10GbE? o_o there's only the DMI and a x16 connection between the CPU and the rest of the system...
short answer if I get the question right: you need a really special case to see performance degradation.
long and boring explanation:
DMI 3.0 x4 goes to the PCH (plus there is a x1 PCIe uplink), so the theoretical bandwidth is something like 3.93 GB/s + 925 MB/s between the CPU and everything the PCH handles (including the 2 NVMe x4 drives in the case of the 4-DIMM version of the mobo). so yes, theoretically you can create a bottleneck if you use extremely fast m.2 drives like two 970 EVOs and/or use up the total bandwidth of the 2 USB 3.0 interfaces at the same time. in an I/O-intensive case like this, where you frequently need ~4 GB/s (gigabytes per second) through the PCH, you might want to go with a PCIe version of storage instead, installed directly in the x16 slot (with or without bifurcation). this won't happen with the 6-DIMM version tho, where there are no m.2 slots.
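to make the "you need a really special case" point concrete, here is a back-of-the-envelope sketch. the per-device figures are assumptions (vendor-spec sequential read for a 970 EVO, theoretical USB 3.0 rate), not measurements:

```python
# Rough check: can devices behind the PCH saturate the DMI 3.0 link?
# All figures are theoretical per-direction maxima in GB/s (assumptions).

DMI3_X4_GBPS = 3.93      # DMI 3.0 is electrically a PCIe 3.0 x4 link, ~3.93 GB/s

nvme_970_evo = 3.5       # ~3.5 GB/s sequential read per drive (vendor spec)
usb3_gen1 = 0.5          # USB 3.0 (5 Gbps) is ~0.5 GB/s per interface

# Example worst case from the post: 2 fast m.2 drives + 2 saturated USB 3.0 ports
demand = 2 * nvme_970_evo + 2 * usb3_gen1
print(f"peak demand behind PCH: {demand:.1f} GB/s vs DMI link: {DMI3_X4_GBPS:.2f} GB/s")
print("bottleneck" if demand > DMI3_X4_GBPS else "fine")
```

you only hit this if everything behind the PCH peaks at once, which is the "really special case" above.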
If you use PCIe bifurcation appropriately (e.g. x4 for the 10G NIC, another x4 for storage, and x8 for a GPU) you most likely will not see any performance impact on any device.
Asrock followed Intel's recommendation for connecting the CPU to the PCH, because the C621 has neither QAT (QuickAssist Technology - dedicated hardware to offload e.g. encryption from the CPU) nor an embedded 4x10G NIC.
using the PCIe lanes of the CPU typically does not create a bottleneck; I have never seen a case where using all the PCIe lanes of a CPU made them slow down. how you distribute your workload across the physical cores, however, might matter (CPU pinning, CPU isolation, etc.)
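as a tiny illustration of the pinning part: on Linux you can restrict a process to specific cores straight from the standard library (this is a Linux-only sketch; the core numbers are arbitrary examples):

```python
import os

# Linux-only: pin the current process to core 0, leaving the remaining
# cores free for other work (e.g. NIC interrupt handling) - a common
# isolation pattern. pid 0 means "this process".
os.sched_setaffinity(0, {0})

print("allowed CPUs:", sorted(os.sched_getaffinity(0)))
```

the same effect is usually achieved from the shell with `taskset`, and interrupt affinity is steered separately via /proc/irq/*/smp_affinity.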
a typical 2x10G NIC does not even eat up x4 PCIe (the theoretical bandwidth of PCIe 3.0 x4 is ~32 Gbps - gigabits per second); for that you need a 40G NIC. to use up all 16 lanes your best choice is a 100G NIC (e.g. Mellanox ConnectX-5), and even then significant headroom remains.
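the lane math above can be sketched like this (using the PCIe 3.0 figure of 8 GT/s per lane with 128b/130b encoding; real usable throughput is a bit lower due to protocol overhead):

```python
# PCIe 3.0: 8 GT/s per lane, 128b/130b encoding -> ~7.88 Gbps raw per lane.
PCIE3_GBPS_PER_LANE = 8 * 128 / 130

def lanes_needed(nic_gbps, per_lane=PCIE3_GBPS_PER_LANE):
    """Smallest standard link width (x1/x2/x4/x8/x16) that fits the NIC line rate."""
    for width in (1, 2, 4, 8, 16):
        if width * per_lane >= nic_gbps:
            return width
    return None  # line rate exceeds even a x16 Gen3 slot

for nic in (10, 2 * 10, 40, 100):
    print(f"{nic}G NIC -> x{lanes_needed(nic)}")
# 10G -> x2, 2x10G -> x4, 40G -> x8, 100G -> x16
```

which matches the claims above: a dual 10G NIC fits comfortably in x4, and only a 100G NIC comes close to needing the full x16.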
and modern systems (at least since Sandy Bridge) have DMA between the NIC and the CPU, so actually making the packet reach the CPU's L3 cache is almost 'effortless': it is done purely in hardware and does not use processing power on the actual CPU cores (apart from handling interrupts if you are in interrupt mode - but at high packet rates even linux can switch from interrupt mode to poll mode, making packet processing more efficient; that is a completely different topic again).
that said, if you choose an underpowered CPU for your workload, like a W-2125 (4 cores/8 threads), and run something like high-resolution virtual reality with streaming, then the CPU itself will not be enough to serve your storage/GPU/NIC.