Raven Ridge and the wait...

Jello

Airflow Optimizer
Nov 15, 2016
376
163
I don't know much about how CPUs are designed and built; I'm simply dreaming/thinking out loud. If anyone does have info I can read up on, do share.
 

IntoxicatedPuma

Customizer of Titles
SFFn Staff
Feb 26, 2016
992
1,272
Quite true. Thank goodness @ASRock System is doing a mATX TR4 motherboard. The only issue at that point would be finding a good top-down cooler such as an Arctic Accelero.

About this, I'd also be all in for a TR4-powered mATX HTPC (22 CUs + 8 CPU cores would be nice!). But then of course we'd be smashing our heads back into the memory bandwidth limitations, and also having to run against Intel's Hades Canyon, which would be equally powerful graphically, and possibly also CPU-wise if they are able to get a 6-core Coffee Lake CPU on board (no idea if the chip can actually fit that).
 
  • Like
Reactions: Soul_Est

Soul_Est

SFF Guru
SFFn Staff
Feb 12, 2016
1,534
1,928
About this, I'd also be all in for a TR4-powered mATX HTPC (22 CUs + 8 CPU cores would be nice!). But then of course we'd be smashing our heads back into the memory bandwidth limitations, and also having to run against Intel's Hades Canyon, which would be equally powerful graphically, and possibly also CPU-wise if they are able to get a 6-core Coffee Lake CPU on board (no idea if the chip can actually fit that).
That is why we have quad-channel memory on TR4. ;) While it won't help as much as we'd hope, it would help a lot.
 

IntoxicatedPuma

Customizer of Titles
SFFn Staff
Feb 26, 2016
992
1,272
Ah, I never thought about that. A Threadripper APU might be able to get close to an RX 570 then, even without the HBM.

edit: nvm, I did some math; the bandwidth of a DDR4-4000 equipped TR4 APU would still be around 128 GB/s, which is a little above the RX 560 and Hades Canyon but still way under an RX 570. Might be good enough for solid 1080p gaming, though.
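For anyone who wants to check that back-of-the-envelope number, here's a minimal sketch of the arithmetic, assuming 64-bit DDR4 channels and the commonly cited 7 Gb/s GDDR5 figures for the Polaris cards (peak theoretical numbers only):

```python
# Back-of-the-envelope peak memory bandwidth (a sketch; real-world
# sustained bandwidth is lower). A DDR4 channel is 64 bits = 8 bytes wide.
def ddr_bandwidth_gbs(transfer_rate_mts, channels, channel_bytes=8):
    """Peak bandwidth in GB/s for a DDR-style memory configuration."""
    return transfer_rate_mts * channels * channel_bytes / 1000

quad_ddr4_4000 = ddr_bandwidth_gbs(4000, channels=4)   # hypothetical TR4 APU
dual_ddr4_4000 = ddr_bandwidth_gbs(4000, channels=2)

# GDDR5 cards for comparison: 7 Gb/s per pin, bus width in bits / 8 = bytes
rx560 = 7000 * (128 // 8) / 1000   # 128-bit bus
rx570 = 7000 * (256 // 8) / 1000   # 256-bit bus

print(f"Quad-channel DDR4-4000: {quad_ddr4_4000:.0f} GB/s")   # ~128 GB/s
print(f"Dual-channel DDR4-4000: {dual_ddr4_4000:.0f} GB/s")   # ~64 GB/s
print(f"RX 560 (GDDR5):         {rx560:.0f} GB/s")            # ~112 GB/s
print(f"RX 570 (GDDR5):         {rx570:.0f} GB/s")            # ~224 GB/s
```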
 
Last edited:
  • Like
Reactions: Soul_Est

riposte

Trash Compacter
Dec 9, 2017
45
88
How about developing 'RAM' for the APU using a PCI-E x16 slot with an HBM module? I don't know if that is possible.
And maybe we could use that 'RAM' to upgrade our dGPU with extra capacity.
 

IntoxicatedPuma

Customizer of Titles
SFFn Staff
Feb 26, 2016
992
1,272
PCI Express 4.0 x16 maximum throughput is 31.5 GB/s, which is less than even the slowest DDR4 in dual channel, so it probably wouldn't work very well even if someone made such a thing.
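A quick sketch of where that comparison comes from, assuming PCIe 4.0's 16 GT/s per lane with 128b/130b encoding and DDR4-2133 as the slowest standard speed (peak, one direction):

```python
# Sketch comparing PCIe 4.0 x16 throughput to dual-channel DDR4-2133
# (peak figures, one direction; assumes 128b/130b PCIe line coding).
PCIE4_GTS_PER_LANE = 16          # GT/s per lane
ENCODING = 128 / 130             # 128b/130b line-coding overhead
LANES = 16

pcie4_x16_gbs = PCIE4_GTS_PER_LANE * ENCODING * LANES / 8   # ~31.5 GB/s
ddr4_2133_dual_gbs = 2133 * 8 * 2 / 1000                    # ~34.1 GB/s

print(f"PCIe 4.0 x16:            {pcie4_x16_gbs:.1f} GB/s")
print(f"Dual-channel DDR4-2133:  {ddr4_2133_dual_gbs:.1f} GB/s")
```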
 

ignsvn

By Toutatis!
SFFn Staff
Apr 4, 2016
1,710
1,649
How about developing 'RAM' for the APU using a PCI-E x16 slot with an HBM module? I don't know if that is possible.
And maybe we could use that 'RAM' to upgrade our dGPU with extra capacity.

Not sure if it's feasible (both technology- and market-wise), but I do like your way of thinking!
 

riposte

Trash Compacter
Dec 9, 2017
45
88
PCI Express 4.0 x16 maximum throughput is 31.5 GB/s, which is less than even the slowest DDR4 in dual channel, so it probably wouldn't work very well even if someone made such a thing.

lol, I forgot about the bandwidth.
It would sound awesome if it were possible with good bandwidth, maybe with a proprietary slot: an ITX APU capable of playing games at 1080p with high settings, without a dGPU...
 

jØrd

S̳C̳S̳I̳ ̳f̳o̳r̳ ̳l̳i̳f̳e̳
sudocide.dev
SFFn Staff
Gold Supporter
LOSIAS
Jul 19, 2015
818
1,359
IIRC part of the speed of HBM comes from it being right next to the die, w/ incredibly short traces through the silicon interposer. Whilst my knowledge on this front is far from comprehensive, replicating this over a longer distance, especially through card-edge connectors, is outside of what can currently be done, or at least what can be done cost-effectively. IIRC even DDR4 has specifications concerning not only matching the lengths of the PCB traces but also the maximum length of those traces, in order to maintain stability at speed. I also seem to remember that dealing w/ reflections from connectors was a major engineering challenge in maintaining the speed of 10GBASE-T Ethernet.

TLDR: driving those kinds of speeds over those kinds of distances, across different media and through connectors, is outside of current technology. My understanding is limited though, so if anyone here with a deeper understanding wants to chime in w/ details / corrections / etc, it would be appreciated.
 

AleksandarK

/dev/null
May 14, 2017
703
774
IIRC part of the speed of HBM comes from it being right next to the die, w/ incredibly short traces through the silicon interposer. Whilst my knowledge on this front is far from comprehensive, replicating this over a longer distance, especially through card-edge connectors, is outside of what can currently be done, or at least what can be done cost-effectively. IIRC even DDR4 has specifications concerning not only matching the lengths of the PCB traces but also the maximum length of those traces, in order to maintain stability at speed. I also seem to remember that dealing w/ reflections from connectors was a major engineering challenge in maintaining the speed of 10GBASE-T Ethernet.

TLDR: driving those kinds of speeds over those kinds of distances, across different media and through connectors, is outside of current technology. My understanding is limited though, so if anyone here with a deeper understanding wants to chime in w/ details / corrections / etc, it would be appreciated.
My knowledge is limited too, but I can confirm that length affects speed VERY much. Think of it like electricity flowing through a very long wire: due to material resistance you will lose some voltage. That's normal. It is the same with data speed.

HBM achieves higher bandwidth while using less power in a substantially smaller form factor than DDR4 or GDDR5.[6] This is achieved by stacking up to eight DRAM dies, including an optional base die with a memory controller, which are interconnected by through-silicon vias (TSV) and microbumps. The HBM technology is similar in principle but incompatible with the Hybrid Memory Cube interface developed by Micron Technology.[7]

HBM memory bus is very wide in comparison to other DRAM memories such as DDR4 or GDDR5. An HBM stack of four DRAM dies (4-Hi) has two 128-bit channels per die for a total of 8 channels and a width of 1024 bits in total. A graphics card/GPU with four 4-Hi HBM stacks would therefore have a memory bus with a width of 4096 bits. In comparison, the bus width of GDDR memories is 32 bits, with 16 channels for a graphics card with a 512-bit memory interface.[8] HBM supports up to 4 GB per package.

The larger number of connections to the memory, relative to DDR4 or GDDR5, required a new method of connecting the HBM memory to the GPU (or other processor).[9] AMD and Nvidia have both used purpose built silicon chips, called interposers, to connect the memory and GPU. This interposer has the added advantage of requiring the memory and processor to be physically close, decreasing memory paths. However, as semiconductor device fabrication is significantly more expensive than printed circuit board manufacture, this adds cost to the final product.
From Wikipedia
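A rough illustration of what that bus-width difference means for peak bandwidth; the per-pin rates below are nominal spec numbers (HBM1 around 1 Gb/s, GDDR5 around 7 Gb/s), so treat it as a sketch rather than exact figures:

```python
# Why a wide-but-slow HBM bus still beats a narrow-but-fast GDDR5 bus:
# peak bandwidth = bus width (bits) * per-pin data rate (Gb/s) / 8.
def bandwidth_gbs(bus_width_bits, gbps_per_pin):
    return bus_width_bits * gbps_per_pin / 8

hbm1_four_stacks = bandwidth_gbs(4 * 1024, 1.0)   # e.g. Fury X: ~512 GB/s
gddr5_256bit     = bandwidth_gbs(256, 7.0)        # e.g. RX 570: ~224 GB/s

print(f"4x HBM1 stacks (4096-bit @ 1 Gb/s/pin): {hbm1_four_stacks:.0f} GB/s")
print(f"GDDR5 256-bit @ 7 Gb/s/pin:             {gddr5_256bit:.0f} GB/s")
```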
 

IntoxicatedPuma

Customizer of Titles
SFFn Staff
Feb 26, 2016
992
1,272
AMD even went as far to show the overclocking headroom that the Ryzen APU can offer. During an on-site demo we saw the Ryzen 5 2400G improve its 3DMark score by 39% with memory frequency and GPU clock speed increases. Moving the GPU clock from ~1100 MHz to 1675 MHz will mean a significant increase in power consumption, and I do question the size of the audience that wants to overclock an APU. Still – cool to see!
- PCPER

That's impressive overclocking. Even if the 2400G can run DDR4-4000 and a 1675 MHz GPU clock, it will still be below the GTX 1050, though. I am guessing that if a 90W part is ever released, it'll be something like that overclocked demo.
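Rough numbers behind that guess; the GTX 1050 figure assumes the commonly listed 7 Gb/s GDDR5 on a 128-bit bus, and AM4 is dual-channel, so the APU's bandwidth is also shared with the CPU cores:

```python
# Why even an overclocked 2400G likely stays behind a GTX 1050: AM4 is
# dual-channel, and that bandwidth is shared between the CPU cores and the
# iGPU. The GTX 1050 figure assumes 7 Gb/s GDDR5 on a 128-bit bus.
apu_ddr4_4000_dual = 4000 * 8 * 2 / 1000        # ~64 GB/s, shared CPU + iGPU
gtx_1050_gddr5     = 7000 * (128 // 8) / 1000   # ~112 GB/s, GPU-only

print(f"2400G w/ DDR4-4000 (shared): {apu_ddr4_4000_dual:.0f} GB/s")
print(f"GTX 1050 (dedicated):        {gtx_1050_gddr5:.0f} GB/s")
```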
 

ChainedHope

Airflow Optimizer
Jun 5, 2016
306
459
Just to throw it out there, but if a new socket came out (I'm going to call it UTR4, for Ultimate Threadripper) that removed the DRAM slots and instead opted to use HBM as system memory, we could see a new kind of beast: CPU + GPU + HBM all on one package. It removes the trace-length issues, HBM solves the bandwidth issue, and the interposer has everything it needs without outside help. It's probably unreasonable, but even if they shoved a few HBM stacks between the GPU and CPU portions and connected the HBM to DDR4 (effectively making the HBM into a large cache), it would probably increase the speed and usefulness of the Infinity Fabric that AMD uses.
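A very rough way to picture the "HBM as a big cache in front of DDR4" idea; all of the numbers below are assumptions for illustration, not real part specs:

```python
# Toy model of HBM acting as a large cache in front of DDR4. If a fraction
# `hit` of traffic is served from HBM and the rest from DDR4, the effective
# bandwidth is the harmonic (time-weighted) mix of the two.
def effective_bandwidth(hit, hbm_gbs=256.0, ddr4_gbs=64.0):
    """Effective bandwidth when `hit` fraction of bytes come from HBM."""
    return 1.0 / (hit / hbm_gbs + (1.0 - hit) / ddr4_gbs)

for hit in (0.5, 0.8, 0.95):
    print(f"HBM hit rate {hit:.0%}: ~{effective_bandwidth(hit):.0f} GB/s effective")
```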
 

EdZ

Virtual Realist
May 11, 2015
1,578
2,107
Just to throw it out there, but if a new socket came out (I'm going to call it UTR4, for Ultimate Threadripper) that removed the DRAM slots and instead opted to use HBM as system memory, we could see a new kind of beast: CPU + GPU + HBM all on one package. It removes the trace-length issues, HBM solves the bandwidth issue, and the interposer has everything it needs without outside help. It's probably unreasonable, but even if they shoved a few HBM stacks between the GPU and CPU portions and connected the HBM to DDR4 (effectively making the HBM into a large cache), it would probably increase the speed and usefulness of the Infinity Fabric that AMD uses.
Remember interposer size limitations: GP100 and GV100 both had to use double patterning to produce interposers large enough to fit both the GPU die and the HBM dies (as the GPU dies themselves were already at the maximum reticle limit), and this is an extremely expensive process. Threadripper is already even larger than the 1200mm^2 GV100 package, at around 9600mm^2! That could be shrunk somewhat by shoving the dies closer together on the interposer, but once you start adding on HBM stacks and a big GPU die, you're going to be creating a package that not only costs Big Data money to fab the interposer for, it also has a huge number of dies to attach. Each time you add a die to an interposer, you add an opportunity for a defect that can kill the entire assembly (some salvaging is possible for HBM failures). 4x CPU dies, a GPU die, and several HBM dies mean lots of opportunities to kill your package during assembly. Even if AMD were able to license EMIB from Intel to avoid the need for a monolithic interposer, you would still have the problem of assembly, and of ending up with several hundred watts of heat to dissipate from one package.
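A toy illustration of that assembly-yield point; the per-die attach success rate here is purely an assumed number:

```python
# If each die-attach step succeeds with probability p, the chance the whole
# multi-die package survives assembly falls off exponentially with die count.
def package_yield(p_per_die, die_count):
    return p_per_die ** die_count

p = 0.98  # assumed per-die attach success rate (illustrative only)
for dies in (2, 5, 9):   # e.g. GPU + 1 HBM stack vs. 4x CPU + GPU + 4x HBM
    print(f"{dies} dies: {package_yield(p, dies):.0%} assembly yield")
```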
 

stree

Airflow Optimizer
Dec 10, 2016
307
177
One roadmap I saw had an uprated Bristol Ridge coming out about the same time as Raven Ridge........ 2x CPU and 3x GPU......... no other info. Anyone know anything about these?

Whooops, edit.......... Stoney Ridge..... I misread........ still intrigued by Stoney Ridge though.
re-edit: it's a laptop chip anyway.......... total goof. Apologies.
 
Last edited: