To add to this,
PCPer have done some interesting testing, showing the difference between intra-CCX and inter-CCX core-to-core latency within Ryzen (and presumably within Naples, as that uses the same Infinity Fabric interconnect within the package).
That is AMD's hope. The problem is getting developers on board: it didn't work for Intel with Itanium, and it didn't work for AMD with HSA. On the one hand, splitting a workload into more threads isn't quite as hard as switching to a new instruction set, or threading to the 'embarrassingly parallel' degree HSA demands; on the other, we've had a decade of quad-core CPUs in general availability and nearly as long with 8-thread-capable CPUs (either 8-core, or 4-core with SMT), and games have stubbornly remained at utilising two or three cores for the most part. Some workloads just can't be split effectively.

We've also seen accelerated development in the smartphone industry, from single-core SoCs to dual-core, quad-core, and on to six- and eight-core designs, and the 'happy medium' most have settled on once heterogeneous cores were added to the mix is a pair of 'big' cores for high-performance tasks and a pair of smaller cores for background tasks. I believe we may see desktop processors trending this way too: we already have a similar non-silicon implementation in 'turbo boost', with one or two cores clocking higher and the others clocking down, to make use of the available power budget. And as process technology continues to shrink, the power budget becomes more and more the physical limit on performance (i.e. you can only shove so many electrons through a given die before Bad Things occur at the atomic level).
If I were to bet on it, I'd bet that in a few generations' time (maybe after Ice Lake, or maybe with one more iteration on the Core microarchitecture to wring out the last bit of value) we'll see a CPU with a mix of two to four 'big' cores, and the GPU removed and replaced with a Xeon Phi-derived cluster of small x86 cores: the ghost of Larrabee walking again.