HΛD☰S B☰ΛST - Intel Hades Canyon NUC with EXP GDC Beast eGPU

Runamok81

Runner of Moks
Original poster
Jul 27, 2015
445
621
troywitthoeft.com
So, the new Hades Canyon NUC is on the way.



Should be available in March. The big news is that it has a 24 CU Vega GPU integrated into the package, so we're looking at GTX 1060 levels of performance combined with a nearly desktop-class CPU. Basically, it should handle most current games at high detail and close to 60 FPS, all in a 100W total package.

So, my question is... does anyone know if the Hades Canyon NUC can be modified to work with EXP GDC Beast to create a Hades Beast?

Here is a teardown video of the Hades Canyon NUC.

And here is Stu Lowes' similar NUC + Beast build.

So any thoughts on pairing the Hades Canyon with an EXP GDC Beast? Possible? Practical? Or is combining Intel, AMD, and nVidia just too Pagan?

If this IS something that looks like a go, I might just give it a shot.
 
Last edited:

Kmpkt

Innovation through Miniaturization
KMPKT
Feb 1, 2016
3,382
5,935
I've gotta be honest, I'm not entirely sure why you'd do this when you can use TB3 to achieve essentially the same thing. Also, it'd probably make more sense to do this with the less powerful Intel/Vega Hades Canyon unit so you're not letting all that GPU power rot (also save $$$). What I've actually been thinking of is doing this with the new-generation NUC units that are due out soon, which will be 4c8t. If you use a simple M.2 to PCIe adapter setup, you'll be able to do this without the Beast by using a Dynamo 360. If you wanna go one further, you can do this and also add HDPlex AC-DC units for a completely brickless build.
 

Runamok81

Runner of Moks
Original poster
Jul 27, 2015
445
621
troywitthoeft.com
I've gotta be honest, I'm not entirely sure why you'd do this when you can use TB3 to achieve essentially the same thing. Also, it'd probably make more sense to do this with the less powerful Intel/Vega Hades Canyon unit so you're not letting all that GPU power rot (also save $$$). What I've actually been thinking of is doing this with the new-generation NUC units that are due out soon, which will be 4c8t. If you use a simple M.2 to PCIe adapter setup, you'll be able to do this without the Beast by using a Dynamo 360. If you wanna go one further, you can do this and also add HDPlex AC-DC units for a completely brickless build.

Good points! I'll try and address them. You'd choose PCIe instead of TB3 for performance, 20% more. You'd choose Hades Canyon over Bean Canyon for the same reason: performance. Hades Canyon should have the most powerful unlocked NUC CPU for several years, according to the Intel NUC roadmap, which shows the forthcoming Bean Canyon NUCs as having only locked, ultra-low-power (28W) CPUs. Another thing the Hades Canyon NUC has going for it is connectivity. The standard NUCs have less, and a definite lack of light-up LED skulls. ;)

One point I can't disagree with is ... Yes, disabling the Vega GPU seems a bit wasteful. If only we could crossfire it! However, my thought was that disabling it could provide the necessary thermal and power overhead needed to push the unlocked Hades CPU further. But, I may be off base here. We will have to see some real world benchmarks to get a feel for that overclocking potential.

As far as using a Dynamo 360 and HDPlex, I'd LOVE to see that! Even better to use solutions homegrown here at SFF with the help of Larry at HDPlex. Any plans to take this idea further? Or has someone done this already and I missed it? How did they house it? What chassis?

Because the EXP GDC Beast has multiple chassis options, it is the most obvious solution. But I'm definitely open to using a Dynamo 360 and HDPlex. In fact, I'd prefer that. I'm just not sure how to build and house it.
 

Kmpkt

Innovation through Miniaturization
KMPKT
Feb 1, 2016
3,382
5,935
As far as using a Dynamo 360 and HDPlex, I'd LOVE to see that! Even better to use solutions homegrown here at SFF with the help of Larry at HDPlex. Any plans to take this idea further? Or has someone done this already and I missed it? How did they house it? What chassis?

So basically were I going to do this (I do plan to make a 3D printable version of this eventually), I would do the following:

Oriented like the Dan A4 SFX, you would have an enclosure of roughly 270mm x 150mm x 90mm (~3.65L) with a central mounting plate. On one side you would obviously have a GPU, which could be of reference length (267mm) or less. This includes any of EVGA's two-fan GTX 1080 Ti units:


On the opposite side of the mounting plate, you would attach the HDPlex 300W AC-DC (62mm wide, 40mm deep), the Dynamo 360 (47mm wide, 25mm deep), and a Mini-STX motherboard (147mm wide, 55mm deep). You could sub in a Hades Canyon board if you wanted, but it would be expensive and unnecessary in this case. The way the Dynamo 360 works, you could just take the 6-pin-to-barrel connector I supply with it, run it out the back of the case, and plug it into the motherboard's rear I/O.
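For a feel of how tight that packaging is, here's a quick back-of-the-envelope check using the dimensions quoted above (the assumption that the three back-side parts sit end-to-end along the 270mm axis is mine, so treat it as a sketch rather than a layout drawing):

```python
# Rough packaging check for the layout above. All dimensions in mm and
# taken from the post; the assumption that the three back-side parts sit
# end-to-end along the 270 mm axis is mine.
enclosure = (270, 150, 90)
volume_l = enclosure[0] * enclosure[1] * enclosure[2] / 1_000_000
print(f"Enclosure volume: {volume_l:.2f} L")                 # ~3.65 L

gpu_length = 267                                 # reference-length GPU
back_side_widths = {
    "HDPlex 300W AC-DC": 62,
    "Dynamo 360": 47,
    "Mini-STX motherboard": 147,
}
print(f"Back-side width used: {sum(back_side_widths.values())} mm of {enclosure[0]} mm")  # 256 of 270
print(f"GPU length clearance: {enclosure[0] - gpu_length} mm")   # 3 mm
```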

To run the GPU, you would just use a powered M.2 to PCIe x16 adapter like the ones offered by Era Adapter combined with the PCIe connectors on the Dynamo 360:



This riser could be powered sufficiently by the SATA connector on the Dynamo Mini. You could then hook up a 2.5" HDD to the board for storage and you'd be ready to go. If we're lucky, the newest iteration of the ASRock Micro STX board (if it is ever released) will have more than one M.2 slot, which means you wouldn't have to use a 2.5" drive for this.

This would give you the benefit of being able to choose your own full-fat CPU, as well as being able to take advantage of proper cooling (i.e. non-blower). Also, once PCIe 4.0 hits, I am hopeful that M.2 and PCH speeds will double to an effective rate of PCIe 3.0 x8.
 
Last edited:

Runamok81

Runner of Moks
Original poster
Jul 27, 2015
445
621
troywitthoeft.com
@Kmpkt I think you really are describing something completely different, no? I'm summarizing, but from your description that sounds like an A4 shoebox design, albeit with an STX or other mobo (over M.2 PCIe x4) instead of mITX, and a Dynamo and HDPlex instead of SFX. That would be MUCH smaller than the Dan, so interesting. But first, I think we'd need the STX mobo form factor to take off.

Anyways, what you've described and what is in the OP are different ideas. The thought here is to modularize the CPU and GPU into distinct, self-contained components. There are some thermal advantages and power-delivery challenges to doing this, so tradeoffs. Either way the HΛD☰S B☰ΛST is but an idea. We will need one to be released first. And then we'll need some industrious modders to get their hands on them. :)
 

IntoxicatedPuma

Customizer of Titles
SFFn Staff
Feb 26, 2016
992
1,272
Either way the HΛD☰S B☰ΛST is but an idea. We will need one to be released first

But isn't that running into the same problem you mentioned with the STX? It hasn't taken off yet, so it's not totally feasible? I can understand the thermal advantage of having the GPU and CPU separated, but that's only an advantage if one of them is overheating. I think you could get more benefit by having both run on the same cooling system. Generally they won't both be 100% stressed at the same time, which would allow the cooling to be shared between the two (I'm thinking of systems like laptop cooling, where the CPU and GPU share a heatsink and fans).
 

Kmpkt

Innovation through Miniaturization
KMPKT
Feb 1, 2016
3,382
5,935
If I'm not mistaken, the EXP GDC is an M.2 interface. Unless I'm missing something, by using the Hades Canyon NUC, you'd essentially be pulling your enclosure apart, fitting the M.2 into the slot, and doing it back up each time. Not exactly functional modularity (my two cents anyhow). You could possibly do an M.2 riser to get an external interface, but I wouldn't trust signal integrity over one of them personally. How were you planning to implement this? Also worth mentioning, you might want to check the PCIe setup for the Hades Canyon NUC. Odds are the two M.2 slots share a single x4 connection through the PCH. If this is the case, then your storage (second M.2, SSD, etc.) and any peripherals will be eating into the bandwidth available to your GPU.
 

Kmpkt

Innovation through Miniaturization
KMPKT
Feb 1, 2016
3,382
5,935
You'd choose PCIe instead of TB3 for performance, 20% more.

I thought this 20% overhead only applied when you were running a TB3 eGPU on a laptop and sending the signal back to the laptop screen?
 

Reldey

Master of Cramming
Feb 14, 2017
387
405
Possibly dumb question, but this thing has two Thunderbolt 3 ports on the back. Is it possible that each would have 4 lanes of connectivity? Could a device be made that makes use of multiple TB3 ports' bandwidth?
 

Kmpkt

Innovation through Miniaturization
KMPKT
Feb 1, 2016
3,382
5,935
Coffee Lake only has 16 lanes available outside of the PCH, plus a DMI link (x4 equivalent) that the PCH divides into up to 24 lanes. I would have to imagine the bulk of these are used for the CPU-GPU interface in Coffee Lake, leaving all the other stuff (M.2, TB3, etc.) behind the chipset. That being said, I'd love to be wrong on this one.
 

Runamok81

Runner of Moks
Original poster
Jul 27, 2015
445
621
troywitthoeft.com
I thought this 20% overhead only applied when you were running a TB3 eGPU on a laptop and sending the signal back to the laptop screen?

I wish that were true. Would make TB3 more viable.

https://egpu.io/forums/mac-setup/pcie-slot-dgpu-vs-thunderbolt-3-egpu-internal-display-test/

The 20% rule of thumb is for ANY 1080p connection over TB3. That performance loss blips UP when routing back into an internal display. Say, 25%.
There is some salvation at higher resolutions, as there is less frame chatter.
 

EdZ

Virtual Realist
May 11, 2015
1,578
2,107
I wish that were true. Would make TB3 more viable.

https://egpu.io/forums/mac-setup/pcie-slot-dgpu-vs-thunderbolt-3-egpu-internal-display-test/

The 20% rule of thumb is for ANY 1080p connection over TB3. That performance loss blips UP when routing back into an internal display. Say, 25%.
There is some salvation at higher resolutions, as there is less frame chatter.
That's comparing Thunderbolt to PCIe x16, not to PCIe x4. It is also comparing a direct-from-CPU link (the x16 port) to a link that goes via the PCH (Thunderbolt). Thunderbolt itself is a PCIe x4 link so without comparing the two at the same link speed and from the same host, you cannot use that test to determine if there is a performance impact from Thunderbolt overhead. From Techpowerup's PCIe link speed performance scaling test, an x4 PCIe link from the chipset alone, without Thunderbolt, has around a 10% performance deficit at 1920x1080.

Hades Canyon does not have a PCIe x16 link available, only the m.2 slots with an x4 link, which are likely fed from the PCH rather than the CPU (which is using its own x16 link to talk to the on-package Vega GPU). The performance difference between those and the Thunderbolt link is likely to be a lot smaller than 20%.
 

Kmpkt

Innovation through Miniaturization
KMPKT
Feb 1, 2016
3,382
5,935
Hades Canyon does not have a PCIe x16 link available, only the m.2 slots with an x4 link, which are likely fed from the PCH rather than the CPU (which is using its own x16 link to talk to the on-package Vega GPU). The performance difference between those and the Thunderbolt link is likely to be a lot smaller than 20%.

This is basically what I was getting at. Thanks for putting it so nicely, @EdZ.
 

Runamok81

Runner of Moks
Original poster
Jul 27, 2015
445
621
troywitthoeft.com
@EdZ - I don't think your calculations are adjusting for the encoding/decoding overhead of TB3? It's that overhead which makes TB3 so much slower than a native PCIe x4 connection. The image below is from the eGPU article I linked to.



I initially linked to this article to demonstrate to @Kmpkt that it is TB3 encoding that causes the 20% performance drop. The internal/external monitor difference is minimal in comparison.

@EdZ ... as you pointed out, the graph above, from the article I linked to, doesn't have a native PCIe x4 bar, only full PCIe x16. Good point. We can extrapolate that data point and then compare it to TB3.

You mentioned Techpowerup's PCIe link speed performance scaling test as evidence that the jump down from x16 to x4 would incur a 10% performance loss. I'm not sure I agree with that. I say this because the article's authors reached a different conclusion. From the conclusion page of the article...

We hope our data helps settle a lot of flame wars on the forums. PCI-Express 3.0 x8 is just fine for this generation of GPUs, and so is PCI-Express 2.0 x16. You lose about 4% in performance at PCI-Express 2.0 x8 and PCI-Express 3.0 x4, but that's no deal breaker.

So, when I calculate, I use the author's conclusion.
PCIe x16 * 96% = PCIe x4
1661 * 96% = 1594

That number represents how I think native PCIe x4 would perform.

I claimed that

TB3 * 120% = PCIe x4

And if we do that math ...

1332 * 120% = 1598

So, I'm off by a few points. I don't feel like my claim is off base. If I've missed something in my calculations, please bring it to my attention. Otherwise, I think we might be splitting hairs. I'm going to stand by my claim that native PCIe x4 is 20% more performant than TB3. We can tease the numbers if we want, but it's a solid claim.
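For anyone who wants to check the arithmetic, here it is as a quick script (the 1661 and 1332 scores come from the eGPU.io chart above, and the 96% factor is TPU's x4 scaling result; it's a sanity check, not a benchmark):

```python
# Sanity-checking the extrapolation above. Scores are from the linked
# eGPU.io chart (x16 = 1661, TB3 = 1332); the 96% factor is TPU's
# PCIe 3.0 x4 scaling result relative to x16.
pcie_x16 = 1661
tb3      = 1332

pcie_x4_est = pcie_x16 * 0.96                     # extrapolated native x4
print(f"Estimated native PCIe x4 score: {pcie_x4_est:.0f}")            # ~1595
print(f"PCIe x4 advantage over TB3: {(pcie_x4_est - tb3) / tb3:.0%}")  # ~20%
```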
 
Last edited:

Kmpkt

Innovation through Miniaturization
KMPKT
Feb 1, 2016
3,382
5,935
Thanks for the clarification on that @Runamok81, I don't know why I thought eGPU scaling was on par with M.2. I presume at 1440p and 4K the differences would be smaller?

I will point out that I am nearly certain the x4 from the Hades Canyon M.2 interface will be behind the chipset. Using the TPU benches, I would expect an 8% loss in performance rather than a 4% loss, which puts it around 1528, reducing the difference in performance to ~15% (still huge, I know). I also suspect using a faster M.2 SSD like the Samsung 960 Pro in the second slot could affect performance (i.e. frame drops, etc.), as it will nearly saturate the chipset when running at peak speeds.
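Same idea as a quick check, using the scores from the posts above and TPU's ~8% chipset-x4 penalty (rough estimate only):

```python
# Checking the ~15% figure: start from the x16 score, apply TPU's ~8%
# penalty for x4 behind the chipset, then compare against the TB3 score.
pcie_x16 = 1661
tb3      = 1332

x4_behind_pch = pcie_x16 * 0.92                 # ~1528
print(f"Chipset x4 estimate: {x4_behind_pch:.0f}")
print(f"Advantage over TB3: {(x4_behind_pch - tb3) / tb3:.0%}")   # ~15%
```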

Also, I am still curious as to how you plan to make this a modular system (presuming modular means the NUC would be physically added to and removed from the Beast)?

EDIT: Nevermind. HDMI cable. Got it. This makes more sense now.
 
Last edited:

Runamok81

Runner of Moks
Original poster
Jul 27, 2015
445
621
troywitthoeft.com
But isn't that running into the same problem you mentioned with the STX? It hasn't taken off yet so it's not totally feasible? I can understand the thermal advantages of having GPU and CPU separated, but that only has the advantage if one is overheated. I think you could get more advantage by having both running on the same cooling system. Generally they won't both be 100% stressed at the same time, which would allow the cooling to be shared between the two (i'm thinking of systems like laptop cooling where CPU and GPU share heatsink and fans)

It's risky to design an STX chassis, because there aren't many STX motherboards. STX motherboards need to take off before designing an STX case makes sense. I'm not sure that "take off" risk applies to the NUC + Beast, though. We're not designing a chassis; we're just waiting for something to be released, not to take off.

As far as shared cooling... if you're talking about a situation where the CPU and GPU are TDP-matched, then yes, a shared cooling solution makes sense. We see this in Hades Canyon and in some gaming laptops. But for almost everything else, a shared solution is not common. A high-end GPU can put out 2x to 3x the heat of a CPU, and that's what we'd be doing here. Sharing a heatsink in those situations risks the GPU dumping heat into the CPU through the shared metal.

The caveat to this is water cooling, because water has very different thermal properties than metal. Water has a very high specific heat, so it can absorb a lot of energy before its temperature rises much at all. In a loop where the CPU/GPU water is shared, the water acts as a heat reservoir rather than a heat sink.
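As a rough illustration of that specific-heat point, here's a tiny sketch (the specific-heat values are standard textbook figures; the 1 kg mass and 10 kJ pulse are arbitrary example numbers, not anything measured from a real loop):

```python
# Rough illustration of the specific-heat point: temperature rise for a
# given burst of heat is dT = Q / (m * c). Specific heats are standard
# textbook values; the 1 kg mass and 10 kJ pulse are arbitrary examples.
SPECIFIC_HEAT = {        # J/(kg*K)
    "water": 4186,
    "aluminium": 897,
    "copper": 385,
}
mass_kg = 1.0
heat_j = 10_000          # a short burst of GPU heat

for material, c in SPECIFIC_HEAT.items():
    delta_t = heat_j / (mass_kg * c)
    print(f"{material:>9}: +{delta_t:.1f} K per 10 kJ")
# water: +2.4 K, aluminium: +11.1 K, copper: +26.0 K
```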
 
Last edited:

EdZ

Virtual Realist
May 11, 2015
1,578
2,107
@Runamok81
You're looking at the numbers for x4 PCIe from the CPU. You need to look at the results for x4 PCIe from the chipset as that is what is providing the m.2 ports in the Hades Canyon NUC (and the m.2 ports on any Intel non-X299 platform).
TB3 does not have any particular overhead, as it is passing PCIe lanes over an external link (PHY layer change, protocol remains intact). The performance impact is from the lanes being provided by the chipset (effectively multiplexing the totally-not-PCIe-x4 DMI 3.0 link from the PCH into the 24 lanes it exposes) rather than directly via the CPU, mainly impacting system memory access. There may be some performance overhead for GPUs when using Thunderbolt from the driver side in making sure that a sudden removal of the cable can be handled more gracefully than a total system crash, but it's far from 20%-25%.
Short math:
Impact of thunderbolt vs. x16 (from CPU): ~20% (0.8x)
Impact of x4 (from PCH) vs. x16 (from CPU): ~8% (0.92x)
Impact of Thunderbolt vs. x4 (from PCH): ~13% (0.8x/0.92x)
I'd expect there to be a +/- 5% variance at least based on benchmarking differences between TPU and eGPU.io (due to different workloads being impacted differently due to how much system memory bottlenecks performance). Some of the eGPU.io figures may also be impacted by machines using two-lane TB3 rather than four-lane. This would explain why some of the users there are only getting 10%-15% vs. x16, and some are getting more impact.
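Reduced to a quick script, using the rough 0.80 and 0.92 scaling factors from the short math above (the same +/- 5% caveat applies):

```python
# The short math above as one division. 0.80 and 0.92 are the rough
# scaling factors quoted (TB3 vs. x16-from-CPU, and chipset x4 vs.
# x16-from-CPU); their ratio gives TB3 vs. chipset x4.
tb3_vs_x16_cpu = 0.80
x4_pch_vs_x16_cpu = 0.92

tb3_vs_x4_pch = tb3_vs_x16_cpu / x4_pch_vs_x16_cpu
print(f"TB3 relative to chipset x4: {tb3_vs_x4_pch:.2f}")       # ~0.87
print(f"Deficit: about {1 - tb3_vs_x4_pch:.0%}")                # ~13%
```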

I'd expect to see more impact on performance from other devices being present and active on the PCH (e.g. NVMe SSDs) than from Thunderbolt overhead alone. While more expensive, Thunderbolt has advantages in stability (both due to official support, and due to using a PHY layer designed for the task rather than extending high-bandwidth links well beyond their intended path length) and usability (the ability to plug/unplug without complete system failure, and without disassembly).