GPU Geforce 20 series (RTX) discussion thread (E: 2070 Review unbargo!)

QuantumBraced · Sep 2, 2018

Can I say, one of the things I'm genuinely excited about is NVLink. It enables a 100 GB/s two-way interconnect between the cards, which is 50x higher than SLI. This allows the cards to use a shared memory buffer, which means they can be treated as a single graphics card by the driver. We should in theory see almost perfect scaling in every game, however, NVLink still has a fraction of the local GDDR6 bus bandwidth (616 GB/s for the 2080 Ti) and 1/3rd the bandwidth of the commercial Quadro NVLink (300 GB/s). But it should still offer significant scaling and more importantly, uniform scaling across games that use similar amounts of memory.

I think this may make NVLink a decent upgrade path option. Imagine for the sake of argument that Maxwell had NVLink. Let's say you bought a 980 Ti in the summer of 2015. When the 10-series cards came out, you still had the performance of a 1070, so you decided to skip that generation. Now the 20-series are coming out and your 980 Ti is getting old. Rather than dishing out $700 for a 2080 or $1200 for a 2080 Ti, you can just spend $250 on another 980 Ti and with 85% scaling you have the performance of a 1080 Ti, so you're good for another generation. Clearly we don't have the numbers yet, but if it scales 80-90% in everything, I think that will bring dual-GPU setups back from the dead, and may make MicroATX more of a thing in SFF.

tinyitx · Sep 2, 2018

QuantumBraced said:
I think this may make NVLink a decent upgrade path option. Imagine for the sake of argument that Maxwell had NVLink. Let's say you bought a 980 Ti in the summer of 2015. When the 10-series cards came out, you still had the performance of a 1070, so you decided to skip that generation. Now the 20-series are coming out and your 980 Ti is getting old. Rather than dishing out $700 for a 2080 or $1200 for a 2080 Ti, you can just spend $250 on another 980 Ti and with 85% scaling you have the performance of a 1080 Ti....

If a 980Ti has this 'potential', then I guess its resale value would be higher than US$250 now. Maybe US$400? A card's monetary value is often determined by its 'usefulness/relevancy'.

Anyway, I notice Nvidia only offers NVLink for 2080 and 2080Ti only. That is, 2070 does not seem to have this option. I wonder why. One speculation is that NVLink scales so well that two NVLinked 2070 will deliver a better performance than a 2080Ti but with a cheaper cost. And Nvidia does not wish this to happen.

QuantumBraced · Sep 2, 2018

tinyitx said:
If a 980Ti has this 'potential', then I guess its resale value would be higher than US$250 now. Maybe US$400? A card's monetary value is often determined by its 'usefulness/relevancy'.

Anyway, I notice Nvidia only offers NVLink for 2080 and 2080Ti only. That is, 2070 does not seem to have this option. I wonder why. One speculation is that NVLink scales so well that two NVLinked 2070 will deliver a better performance than a 2080Ti but with a cheaper cost. And Nvidia does not wish this to happen.

Yeah good point... On the other hand, it can't be a lot more than half a 1080 Ti. The real value comes from using a 980 Ti for 3 years and then being able to almost double your performance for the price of a card that's highly devalued as a single GPU. Otherwise, buying 2 980 Tis for the price of a 1080 Ti would not make sense even with NVLink scaling. So it's circumstantial value if you will, which would be worth less on the market. Anyway, this is a hypothetical on top of an assumption haha.

I think the reason they don't offer NVLink on the 2070 is because 2 of them would not outperform a 2080 Ti, so you may as well get a 2080 Ti.

pavel · Sep 3, 2018

QuantumBraced said:
I think the reason they don't offer NVLink on the 2070 is because 2 of them would not outperform a 2080 Ti, so you may as well get a 2080 Ti.

Indeed, the 20xx generation did not get a qualitative improvement over 1080. 2070 is more power thirsty that baseline 1080, while having near similar size. Big part of performance increase comes just from faster memory.

They did not improve the performance per watt significantly.

EdZ · Sep 4, 2018

QuantumBraced said:
Can I say, one of the things I'm genuinely excited about is NVLink. It enables a 100 GB/s two-way interconnect between the cards, which is 50x higher than SLI. This allows the cards to use a shared memory buffer, which means they can be treated as a single graphics card by the driver. We should in theory see almost perfect scaling in every game, however, NVLink still has a fraction of the local GDDR6 bus bandwidth (616 GB/s for the 2080 Ti) and 1/3rd the bandwidth of the commercial Quadro NVLink (300 GB/s). But it should still offer significant scaling and more importantly, uniform scaling across games that use similar amounts of memory.

I think this may make NVLink a decent upgrade path option. Imagine for the sake of argument that Maxwell had NVLink. Let's say you bought a 980 Ti in the summer of 2015. When the 10-series cards came out, you still had the performance of a 1070, so you decided to skip that generation. Now the 20-series are coming out and your 980 Ti is getting old. Rather than dishing out $700 for a 2080 or $1200 for a 2080 Ti, you can just spend $250 on another 980 Ti and with 85% scaling you have the performance of a 1080 Ti, so you're good for another generation. Clearly we don't have the numbers yet, but if it scales 80-90% in everything, I think that will bring dual-GPU setups back from the dead, and may make MicroATX more of a thing in SFF.

NVLink is a really neat interconnect, but it doesn't actually do anything to tackle the issues that make multi-GPU difficult in the first place: needing to split draw calls between discrete devices (even with a high bandwidth NVLink bridge both GPUs still have to be treated as different NUMA groups for any sane scenario) without actually losing performance. For DX11 and DX12 Implicit Multiadapter that works needs to be done by the driver developer with deep access to the specific program in question, and for DX12 and Vulkan Explicit Multiadapter that work needs to be done by the engine and game developers themselves. And that work still has the drawback of only being of benefit to an extremely tiny portion of the install base.
Very rarely was the bandwidth of the SLI bridge (or DMA between cards over PCIe as was more common) limiting actual performance.

QuantumBraced · Sep 4, 2018

EdZ said:
NVLink is a really neat interconnect, but it doesn't actually do anything to tackle the issues that make multi-GPU difficult in the first place: needing to split draw calls between discrete devices (even with a high bandwidth NVLink bridge both GPUs still have to be treated as different NUMA groups for any sane scenario) without actually losing performance. For DX11 and DX12 Implicit Multiadapter that works needs to be done by the driver developer with deep access to the specific program in question, and for DX12 and Vulkan Explicit Multiadapter that work needs to be done by the engine and game developers themselves. And that work still has the drawback of only being of benefit to an extremely tiny portion of the install base.
Very rarely was the bandwidth of the SLI bridge (or DMA between cards over PCIe as was more common) limiting actual performance.

Thanks for the clarification. I didn't understand most of what you said haha, my knowledge is clearly limited. But I actually watched an interview with Tom Petersen since I posted my initial thoughts, where he addressed NVLink in some detail, and my understanding is a bit better now I think. Here are some thoughts that I posted elsewhere, I think some of it relates to what you said:

I think there's a bit of confusion here, because "SLI" is going to remain the branding, but the actual NVLink interface is very different from traditional SLI. SLI HB only has 2 GB/s bandwidth, NVLink on GeForce has 100 GB/s. That is a massive difference that allows the two cards to cross-reference each other's memory and work as one a lot more efficiently. Conversely, SLI uses tricks like AFR to render independently on the cards, then uses the link as a pass-thru to the monitor output. Further communication is carried out over PCIe, which has much less bandwidth than NVLink after everything else, and is noisy and laggy.

I thought they would implement NVLink at the driver level and games would simply see the 2 cards as a single virtual graphics card, but it seems like they're not doing that yet. Instead, they'll focus on optimizing AFR and other legacy SLI implementations in the immediate future. But he did say that NVLink's features would be accessible thru the DX APIs, so we can expect developers to start taking advantage of the new platform soon. The new APIs should allow for easier adoption, so you can expect a lot more games to support scaling and better scaling. That's really the benefit, there are already games that scale at 80% with SLI, but some games scale a lot less, some don't scale at all, and some scale negatively. NVLink should fix that if implemented properly. You can expect higher scaling and much less variation in scaling across games. Seems like he hinted in the future they'll consider a more universal implementation, perhaps at the driver level, but bandwidth at the moment is not sufficient for that -- so maybe when the 300 GB/s NVLink trickles down to consumer GPUs.

That's what I posted on [H], may have made some wrong assumptions again, but it's what I got from the interview. It's all speculation at this point anyway, I hope we'll see some good initial results.

EdZ · Sep 5, 2018

QuantumBraced said:
I think there's a bit of confusion here, because "SLI" is going to remain the branding, but the actual NVLink interface is very different from traditional SLI. SLI HB only has 2 GB/s bandwidth, NVLink on GeForce has 100 GB/s.

Correct, though I don't think actual NVLink bandwidth has been announced (GV100's PCIe varaint needed two card-edge NVlink connectors, and still didn't have a full lane breakout available).

That is a massive difference that allows the two cards to cross-reference each other's memory and work as one a lot more efficiently. Conversely, SLI uses tricks like AFR to render independently on the cards, then uses the link as a pass-thru to the monitor output. Further communication is carried out over PCIe, which has much less bandwidth than NVLink after everything else, and is noisy and laggy.

Bandwidth and cache-cohernacy are unrelated to SLI's use of AFR (or SFR as in the initial guise). There are various work division methods, and they don't generally require much inter-card bandwidth (unlike some GPGPU compute workloads) as almsot any viable method requires both cards to have identical memory contents (the scene being rendered) to start with.

I thought they would implement NVLink at the driver level and games would simply see the 2 cards as a single virtual graphics card, but it seems like they're not doing that yet.

It absolutely does not allow both cards to be treated transparently as 'one GPU' for gaming workloads, as while it allows for simplified DMA it does not come close to the bandwidth of local memory: applications not aware of the split memory pools (NUMA domains) would have no way to tell that Byte A will arrive at full speed, and Byte B will take an order of magnitude or two longer.

Instead, they'll focus on optimizing AFR and other legacy SLI implementations in the immediate future. But he did say that NVLink's features would be accessible thru the DX APIs, so we can expect developers to start taking advantage of the new platform soon. The new APIs should allow for easier adoption, so you can expect a lot more games to support scaling and better scaling. That's really the benefit, there are already games that scale at 80% with SLI, but some games scale a lot less, some don't scale at all, and some scale negatively. NVLink should fix that if implemented properly.

NVLink isn't going to make it all that much easier to implement Explicit Multiadapter as it is today (and that capability has been available for a few years and can even be implemented using PCIe only). The core problem is that while individual passes in a graphical pipeline are extremely parallel, getting between those stages must be done step-by-step. That means you can't just nicely tell your GPUs "OK, you handle geometry, and you handle lighting" as you'd end up with one waiting for the other to finish before it could start its work, and no net gain in performance. Trying to split up a single stage is also extremely hard.

The one exception may be raytracing: while you still need an entire copy of the scene for each GPU, each traced ray (and secondary rays traced from that ray through the scene) is performed in isolation, and just updates its own pixel in the buffer. This makes for a much more nicely divisible workload than raster lighting, which is made up of a bunch of discrete passes and lots of tricks involving grabbing bits of a buffer and re-suing them elsewhere (e.g. screen-space reflections and SSAO). NVLink bandwidth doesn't help you here either as you're still just passing finished buffers between cards (which are pretty tiny, like 250mb for an uncompressed 32bpp UHD buffer) and you still have the problem that you're only accelerating one stage in the rendering pipeline, and need to convince developers to implement raytracing in order to get any benefits.

QuantumBraced · Sep 5, 2018

EdZ said:
Correct, though I don't think actual NVLink bandwidth has been announced (GV100's PCIe varaint needed two card-edge NVlink connectors, and still didn't have a full lane breakout available).
Bandwidth and cache-cohernacy are unrelated to SLI's use of AFR (or SFR as in the initial guise). There are various work division methods, and they don't generally require much inter-card bandwidth (unlike some GPGPU compute workloads) as almsot any viable method requires both cards to have identical memory contents (the scene being rendered) to start with.

It absolutely does not allow both cards to be treated transparently as 'one GPU' for gaming workloads, as while it allows for simplified DMA it does not come close to the bandwidth of local memory: applications not aware of the split memory pools (NUMA domains) would have no way to tell that Byte A will arrive at full speed, and Byte B will take an order of magnitude or two longer.
NVLink isn't going to make it all that much easier to implement Explicit Multiadapter as it is today (and that capability has been available for a few years and can even be implemented using PCIe only). The core problem is that while individual passes in a graphical pipeline are extremely parallel, getting between those stages must be done step-by-step. That means you can't just nicely tell your GPUs "OK, you handle geometry, and you handle lighting" as you'd end up with one waiting for the other to finish before it could start its work, and no net gain in performance. Trying to split up a single stage is also extremely hard.

The one exception may be raytracing: while you still need an entire copy of the scene for each GPU, each traced ray (and secondary rays traced from that ray through the scene) is performed in isolation, and just updates its own pixel in the buffer. This makes for a much more nicely divisible workload than raster lighting, which is made up of a bunch of discrete passes and lots of tricks involving grabbing bits of a buffer and re-suing them elsewhere (e.g. screen-space reflections and SSAO). NVLink bandwidth doesn't help you here either as you're still just passing finished buffers between cards (which are pretty tiny, like 250mb for an uncompressed 32bpp UHD buffer) and you still have the problem that you're only accelerating one stage in the rendering pipeline, and need to convince developers to implement raytracing in order to get any benefits.

Thanks very much for the detailed response. All of that makes sense, at least to the degree I understand it haha. So what do you see the main benefit of NVLink being then? In short and layman terms.

Regarding the massive bandwidth gap between this consumer NVLink and GDDR6 that prevents the cards to be treated as one -- that's exactly what Tom Petersen said in the interview too. But he hinted that perhaps in the future we may see a more universal implementation, but didn't want to discuss further. The commercial NVLink which is only used in Quadros right now has 3x the bandwidth of the GeForce NVLink, or 300 GB/s. That's half of the bandwidth of GDDR6 352-bit. It's close to GDDR5X 256-bit used in the 1080. At that point you may actually be able to just combine the entire memory pool. Yeah, you'll see a performance penalty for sure, but the advantage will be universal implementation and every game scaling at roughly the same rate. I think...? Maybe long-term NVLink will actually catch up with video memory bandwidth.

EdZ · Sep 7, 2018

QuantumBraced said:
So what do you see the main benefit of NVLink being then? In short and layman terms.

Great for HPC use, but probably not much impact for the consumer side of things.

At that point you may actually be able to just combine the entire memory pool. Yeah, you'll see a performance penalty for sure, but the advantage will be universal implementation and every game scaling at roughly the same rate. I think...?/quote]
For transparent shared memory: as well as bandwidth, you also need to consider access latency. Even with an extremely wide NVLink interface, it's still going to take a lot more clock cycles to pull a byte from an adjacent card's memory then to pull from local memory, which again boils down to the same sort of NUMA issues seen on Ryzen and the other Zepellin-based multi-die CPUs (where memory independent workloads scale well, and other workloads hit severe performance issues).
It boils down to the same problem as in CPUs, GPUs (and cloud computing, and HPC, etc): you can treat multiple devices as multiple devices, or a single device as a single device, but if you want to treat multiple devices as a single device you need to write your code with that specific scenario in mind or accept that performance will not come close to direct scaling.

loader963 · Sep 7, 2018

https://m.newegg.com/products/N82E1...52995&PID=7706533&SID=pcg-2985080057049557040

New 1080 Ti’s under 600 after rebates. Hopefully the 2000 series are gonna have a decent bump in performance.

tinyitx · Sep 8, 2018

Subjectively speaking, some 'ugly' looking backplate for the 2080Ti are coming.

Galax has one here:-
http://www.galax.com/media/wysiwyg/corpsite/products/20_Series/2080TiOCW/2080TiOCWhite_09.jpg

EVGA has, at least, one too.
https://www.evga.com/products/product.aspx?pn=11G-P4-2487-KR

EdZ · Sep 10, 2018

It seems like once again the FE is the way to go if you don't want a GPU that is covered in plastic embaressment.

loader963 · Sep 10, 2018

^ is that... is that a digital screen in the side of the card???

Biowarejak · Sep 10, 2018

loader963 said:
^ is that... is that a digital screen in the side of the card???

Pretty sure that's not totally uncommon across the high-end GPU lineup from the manufacturers. Not sure if it's Hall of Fame exclusive though for their brand

loader963 · Sep 10, 2018

In all my 780 Ti, 980 Ti, 1080, and 1080 Ti s, I have never seen one lol. Maybe those extra tensor cores ain’t for ray tracing, but to power those screens instead

!!!

tinyitx · Sep 10, 2018

That is, indeed, a LCD for display of various things like temperature, clock speed, voltages...etc
Here is Galax 1080 Ti HOF, http://www.galax.com/tc/graphics-card/hof/galax-geforcer-gtx-1080-ti-hof.html

Personally, I do not mind having those data displayed in a little screen. But the design of the shroud is less than desirable.

tinyitx · Sep 14, 2018

Some more pics of various brand 2080(Ti). Many of them do not look good at all. Instead of looking high-class and with quality, many look cheap and plastic-ish.

https://news.mydrivers.com/class/12/

TheHig · Sep 14, 2018

The Founders cards are definitely much more pleasing aesthetically for sure. Not too much longer to find out what these can do.

Broxin · Sep 15, 2018

You guys think there is a 20. series card that has the processing power of a 1080ti but consumes less power and is therefor more quiet and hopefully will be in itx format like the zotac 1080ti mini?

tinyitx · Sep 15, 2018

Broxin said:
You guys think there is a 20. series card that has the processing power of a 1080ti but consumes less power and is therefor more quiet and hopefully will be in itx format like the zotac 1080ti mini?

We can all speculate. I think Zotac will have a mini version again, at least for their 2080.
Their 1080Ti Mini is dual 8-pin and this open box shows their 3-fan 2080 is 8+6 pin.
https://unwire.hk/2018/09/14/rtx-2080/parts/

Sometimes, I wonder, if the heatsink-fan of these 2080/2080Ti being so long and huge is only a 'marketing ploy' to catch a consumer psychology of 'bigger is better and meaner' (thus, justifying a higher price) or it is really a thermal necessity.
I think 19th (next Wednesday) is the day that reviews will be out.

GPU Geforce 20 series (RTX) discussion thread (E: 2070 Review unbargo!)

Master of Cramming

Shrink Ray Wielder

Master of Cramming

Caliper Novice

Virtual Realist

Master of Cramming

Virtual Realist

Master of Cramming

Virtual Realist

King of Cable Management

Shrink Ray Wielder

Virtual Realist

King of Cable Management

Maker of Awesome | User 1615

King of Cable Management

Shrink Ray Wielder

Shrink Ray Wielder

King of Cable Management

Cable-Tie Ninja

Shrink Ray Wielder

Similar threads