On the physical side, this is a case of AMD's propensity for taking an existing technique and giving it a fancy marketing brand. Existing GPUs, AMD and Nvidia, professional and consumer, have been able to use DMA for many years to access data directly across the PCIe bus (and even dip into main system RAM, or jump over to the SATA bus) without involving the CPU. That's what DMA is.
First off, I realized I made a mistake with the name. I was referring to Heterogeneous System Architecture, the point of which is to allow zero-copy operations by sharing a single virtual address space rather than copying between disparate address spaces. I'm not sure you'd exactly call it AMD renaming an existing technique, though; as I mentioned, it is more or less an extension of NUMA to different types of processors at the same time (NUMA being the arrangement where each CPU has its own memory pool, but they all share the same virtual address space so every CPU can access every bank).
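To make the distinction concrete, here's a rough sketch - CUDA used only because its calls are widely known (HIP on AMD hardware mirrors them almost name-for-name), not because this is the actual HSA runtime API; the "scale" kernel and sizes are made up. The first half copies between separate host and device address spaces; the second half uses one managed allocation that both the CPU and GPU dereference through the same pointer, which is the zero-copy, single-virtual-address-space style HSA is aiming at:

[code]
// Sketch only: explicit copies between disparate address spaces vs a
// single shared virtual address space. CUDA managed memory stands in
// for the HSA model here; "scale" is a hypothetical kernel.
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main(void)
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    /* Traditional model: two address spaces, explicit transfers. */
    float *host = (float *)malloc(bytes);
    float *dev  = NULL;
    for (int i = 0; i < n; i++) host[i] = (float)i;
    cudaMalloc((void **)&dev, bytes);
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);   /* copy in  */
    scale<<<(n + 255) / 256, 256>>>(dev, n);
    cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);   /* copy out */
    cudaFree(dev);
    free(host);

    /* Shared-virtual-address model: one pointer valid on CPU and GPU,
       no explicit copies in the source; the runtime moves the pages. */
    float *shared = NULL;
    cudaMallocManaged((void **)&shared, bytes);
    for (int i = 0; i < n; i++) shared[i] = (float)i;        /* CPU writes            */
    scale<<<(n + 255) / 256, 256>>>(shared, n);              /* GPU uses same pointer */
    cudaDeviceSynchronize();
    printf("%f\n", shared[42]);                              /* CPU reads result      */
    cudaFree(shared);
    return 0;
}
[/code]

Managed memory still migrates pages behind the scenes, so it isn't literally HSA, but the programming model - one pointer, no explicit copies in the application code - is the relevant part.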
On the logical (memory addressing) side, things are less simple. If you map memory outside the on-board vRAM transparently, applications will start dipping into pools with massive (orders of magnitude) latency penalties without knowing about it. If you rely on explicit DMA instead, applications may not use those pools at all [1], or may just fall back to traditional transfer requests because they need that data available at the lowest latency. It's only when you start adding multiple paths to the same storage (e.g. parallel PCIe and NVLink or Infinity Fabric) that a single logical pool with explicit compartmentalisation starts making more sense than plain DMA, because it moves the memory access decisions from the application into the driver, which can then make more intelligent use of the links.
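As a rough illustration of that trade-off (again CUDA as the vehicle, not the SSG's actual interface, and "scale" is again a made-up kernel): the same pinned host buffer can either be staged into vRAM with an explicit DMA copy and then accessed at local latency, or mapped straight into the GPU's address space so every access goes out over the PCIe bus:

[code]
// Sketch only: one pinned host buffer reached two ways. Option A stages
// it into vRAM with an explicit DMA copy; option B maps it into the GPU
// address space so the kernel reads it across PCIe, paying the bus
// latency on every access. "scale" is a hypothetical kernel.
#include <cuda_runtime.h>

__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main(void)
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    cudaSetDeviceFlags(cudaDeviceMapHost);      /* allow mapping host memory */

    /* Pinned host allocation the GPU is permitted to map directly. */
    float *host = NULL;
    cudaHostAlloc((void **)&host, bytes, cudaHostAllocMapped);
    for (int i = 0; i < n; i++) host[i] = (float)i;

    /* Option A: explicit DMA staging -- copy into vRAM, then work at local latency. */
    float *local = NULL;
    cudaMalloc((void **)&local, bytes);
    cudaMemcpy(local, host, bytes, cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(local, n);

    /* Option B: transparent mapping -- the kernel dereferences host memory
       over the PCIe bus, far slower per access than vRAM. */
    float *mapped = NULL;
    cudaHostGetDevicePointer((void **)&mapped, host, 0);
    scale<<<(n + 255) / 256, 256>>>(mapped, n);

    cudaDeviceSynchronize();
    cudaFree(local);
    cudaFreeHost(host);
    return 0;
}
[/code]

Which option wins depends entirely on the access pattern, which is exactly why pushing that decision down into the driver only pays off once there are multiple links for it to juggle.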
Yeah, there are always issues when applications aren't aware of the underlying system architecture.
[1] Note that gaming is one situation where data is already cached aggressively into vRAM - to the point where the only real time you will not see your vRAM 'usage' maxed out is when the entire level/chunk contents are smaller than your available vRAM - so going "hey, you can now access system RAM and backing stores through the GPU memory pool!" is likely to be met with "OK, but we're already dumping all our stuff into vRAM anyway, because we want to avoid accessing those stores directly in the first place".
Yeah, fair point. I wouldn't be surprised if that's part of the reason HSA hasn't really gone anywhere (the other part, of course, being that Nvidia tends to shun AMD stuff and push its own proprietary technologies instead, and Intel can also be picky).
Do remember, though, that GPUs are used for more than just gaming. Gaming, being a real-time process, basically HAS to put the required data as close to the GPU as possible, so even with MMU access to system memory it still has to cache locally. The kind of rendering done for CGI and general video editing isn't time sensitive, so it doesn't need to be aggressively cached, but it tends to involve larger datasets that end up in system memory or on backing storage. And even though it isn't time sensitive (frames taking too long in a game cause stuttering; frames taking too long for pre-rendered whatever just mean the job takes longer), it's still beneficial to reduce the time taken, so you can get more work done.
====
Anyways, regardless of how the data actually gets moved around, the PCIe bus still represents a latency bottleneck and a potential bandwidth bottleneck. I would say the Radeon SSG is something akin to when Intel separated the CPU cache from the system bus so that cached data could be accessed without interference from main memory or device access.