News NVIDIA Announces GP102-based TITAN X with 3,584 CUDA cores

Following the stupid naming conventions of Apple and rebooted video game franchises, the TITAN X (not that TITAN X) was announced at Stanford during an AI meetup (in what appears to have been a wooden lodge of some sort).

The new Titan X will feature 12GB of GDDR5X memory, not HBM2 as the GP100 chip has, so this is clearly a new chip with a new memory interface. NVIDIA claims it will have 480 GB/s of bandwidth, and I am guessing it is built on a 384-bit memory controller interface running at the same 10 Gbps as the GTX 1080. It's truly amazing hardware.
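For what it's worth, the claimed bandwidth lines up with that guess. A quick back-of-the-envelope check (the 384-bit bus width is my assumption, not a confirmed spec):

```python
# Rough memory bandwidth check; the bus width is a guess, not a confirmed spec.
bus_width_bits = 384      # assumed 384-bit GDDR5X interface
data_rate_gbps = 10       # 10 Gbps per pin, same as the GTX 1080
bandwidth_gb_s = bus_width_bits * data_rate_gbps / 8  # bits -> bytes
print(bandwidth_gb_s)     # 480.0, matching NVIDIA's claimed 480 GB/s
```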

And it can be yours for US$1200:




I guess we won't be seeing any consumer HBM cards from Green Team this year or probably next. The core count is also a bit strange. If the 1080 Ti is cut down from this, we could see a 1080 Ti Black Edition when Vega 11 comes out.

I am getting a sinking feeling about short top-end cards for this generation. Samsung has been producing HBM2 since January and it apparently wasn't enough, and SK Hynix only started this month. There must be more problems with TSVs now than there were with HBM1 and 28 nm chips.

Source: https://www.pcper.com/news/Graphics-Cards/NVIDIA-Announces-GP102-based-TITAN-X-3584-CUDA-cores
 

iFreilicht

FlexATX Authority
Feb 28, 2015
3,243
2,361
freilite.com
I am getting a sinking feeling about short top-end cards for this generation

I thought AMD was dead set on their high-end cards using HBM2?

Also, what does it really matter? We've got the 1070 and 1060 now, both of which deliver performance that was previously unobtainable in ITX cards, and we have the RX480, which will offer stellar performance-to-price with Vulkan. I think we can wait a year or so for HBM2 manufacturing to mature without missing out on too much.
 

Phuncz

Lord of the Boards
SFFn Staff
May 9, 2015
5,836
4,906
$1,200, that's a 20% increase in price over the previous one. But from the specs, it doesn't seem like it's going to be much more than a 10% increase in performance over the GTX 1080. This also seems to be mainly a compute card, since they dropped the "GTX" prefix that the previous model had.
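For reference, the previous TITAN X launched at US$999, so the price math roughly checks out:

```python
# Comparing against the Maxwell TITAN X's $999 launch price.
old_price, new_price = 999, 1200
print(round((new_price / old_price - 1) * 100, 1))  # 20.1 (% increase)
```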
 

EdZ

Virtual Realist
May 11, 2015
1,578
2,107
Well, in (kind of?) good news, this makes it more likely a 1080Ti will appear sometime this year as a GP102-based chip, rather than waiting until at least next year for a GP100 variant.
 

|||

King of Cable Management
Sep 26, 2015
775
759
...Vega 11 comes out.

Vega 11 or 10? With Polaris, Polaris 10 is the high-end desktop variant and Polaris 11 is the smaller, more power-efficient mobile version.

I am getting a sinking feeling about short top-end cards for this generation. Samsung has been producing HBM2 since January and it apparently wasn't enough, and SK Hynix only started this month. There must be more problems with TSVs now than there were with HBM1 and 28 nm chips.

I believe AMD has an exclusivity deal with SK Hynix, as they have been development partners since the first-generation HBM modules and their use on the Fury cards.

I thought AMD was dead set on their high-end cards using HBM2?

I would imagine they still are; I haven't seen anything indicating otherwise. Given that exclusivity deal, the supply available to others is probably very low. I wouldn't be surprised if Nvidia is using Samsung HBM2; I think Intel has an exclusive with Micron for the HMC-derived MCDRAM on Knights Landing.

Also, what does it really matter? We've got the 1070 and 1060 now, both of which deliver performance that was previously unobtainable in ITX cards, and we have the RX480, which will offer stellar performance-to-price with Vulkan. I think we can wait a year or so for HBM2 manufacturing to mature without missing out on too much.

This new Titan X card will probably be the first card to get 60+ FPS at 4K/UHD with the highest settings.
 

EdZ

Virtual Realist
May 11, 2015
1,578
2,107
I believe AMD has an exclusivity deal with SK Hynix, as they have been development partners since the first-generation HBM modules and their use on the Fury cards.
SK Hynix are also supplying HBM2 to Nvidia (Nvidia are dual-sourcing HBM2 from SK Hynix and Samsung). Nvidia are also members of the JEDEC HBM working group along with AMD, SK Hynix, Samsung and others.
This new Titan X card will probably be the first card to get 60+ FPS at 4K/UHD with the highest settings.
That rather depends on the game in question, doesn't it? For example, even a 'low end' card will be able to handle HL2 at 4k60 without breaking a sweat.
As faster cards come out, games will add more complex graphical features, and the bar for 'current games' to hit 4k @ 60fps will continue to rise. This is compounded by the 'ultra' settings generally being indistinguishable from 'high'/'very high' in practice (e.g. Doom's 'Ultra Nightmare' quality setting) while tanking performance. Whether the Titan will be "the first card to get 60+ FPS at 4K/UHD with the highest settings" will depend on whether "the first game to add settings too complex for the Titan X to get 60+ FPS at 4K/UHD when set to their highest option" is released before or after the cards ship.
 

iFreilicht

FlexATX Authority
Feb 28, 2015
3,243
2,361
freilite.com
This new Titan X card will probably be the first card to get 60+ FPS at 4K/UHD with the highest settings.

I'm risking starting an OT discussion about the performance race here, but: so what? As EdZ said, this depends heavily on the game, and I strongly believe there is a limit to how small a system can get without sacrificing performance. If you want to build a uSFF system with a single short GPU, you'll have to accept that playing Crysis 6 or whatever on 4K at 60+FPS just isn't possible right now. A lot of games will already look fantastic in 4K at medium details, and if you want the super-ultra performance you describe, short GPUs aren't the way to go anyway.
 

Phuncz

Lord of the Boards
SFFn Staff
May 9, 2015
5,836
4,906
... you'll have to accept that playing Crysis 6 or whatever on 4K at 60+FPS just isn't possible right now.
Online media have been testing Crysis' performance in reviews for many years, and I can't believe Crysis 3 (2013) still can't be played at a good pace at 4K:



I don't think we'll need to wait for Crysis 6 to find something unplayable at 4K, if this 3.5-year-old behemoth still can't be tackled. Although setting an option or two below the highest level will probably fix that.
 

PNP

Airflow Optimizer
Original poster
Oct 10, 2015
285
257
I thought AMD was dead set on their high-end cards using HBM2?

That's what they have on their roadmap, sure, but it could be Q2, maybe even Q3 2017 before we know for sure.

Also, what does it really matter?

The 1060 isn't nearly future-proof enough for me and the only ITX 1070 is wider than its PCI bracket.

Vega 11 or 10? With Polaris, Polaris 10 is the high-end desktop variant and Polaris 11 is the smaller, more power-efficient mobile version.

There's been some confusion regarding that, but as far as I know, Vega is code-named in the opposite way.

I believe AMD has an exclusivity deal with SK Hynix

Priority, not exclusivity.

I wouldn't be surprised if Nvidia is using Samsung HBM2;

I wonder how much capacity is being used for HBM2, because the lines in Korea and China are capable of churning out monstrous amounts of flash and DRAM. Certainly GP100 isn't eating it all up.
 

EdZ

Virtual Realist
May 11, 2015
1,578
2,107
I wonder how much capacity is being used for HBM2, because the lines in Korea and China are capable of churning out monstrous amounts of flash and DRAM. Certainly GP100 isn't eating it all up.
It may well be actually. There are binned versions of the Tesla P100 coming out, but they're not binned by core: they're binned by disabling one of the HBM dies. Either the interposer is the weak link (which seems unlikely, they contain few if any active components and are on very mature high-nm processes) or HBM2 yields are still not amazing (being unable to do a full integrated test until you've soldered the interposer together makes things tricky). If Nvidia are willing to gamble on marginal HBM2 dies, they must be scavenging for as many as they can get to meet demand.
 

PlayfulPhoenix

Founder of SFF.N
SFFLAB
Chimera Industries
Gold Supporter
Feb 22, 2015
1,052
1,990
It may well be actually. There are binned versions of the Tesla P100 coming out, but they're not binned by core: they're binned by disabling one of the HBM dies. Either the interposer is the weak link (which seems unlikely, they contain few if any active components and are on very mature high-nm processes) or HBM2 yields are still not amazing (being unable to do a full integrated test until you've soldered the interposer together makes things tricky). If Nvidia are willing to gamble on marginal HBM2 dies, they must be scavenging for as many as they can get to meet demand.

I'm convinced that this is what's going on. Nvidia is not going to ship disabled dies on their flagship Tesla card by choice; they must have had no other way to make it remotely viable in the market.
 

BirdofPrey

Standards Guru
Sep 3, 2015
797
493
Having multiple chips on a single package has always been problematic.
The issue is that once the package has been made, you can't easily separate the component parts out again, and the package as a whole has to be tested (I'm not sure if that's the case with HBM, since it is separate packages on an interposer, but multi-chip packages can't even have their individual dies tested before the whole unit is completed).
The interposer itself might be perfectly fine, but if one of the chips has failed, the entire unit has to be scrapped, which limits yields.
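To put rough numbers on that, here is a toy illustration of how per-component losses compound at the package level (the yield figures below are invented placeholders, not real data):

```python
# Toy model: a package is only good if every component on it is good, so
# package yield is roughly the product of the individual yields.
# These numbers are made-up placeholders for illustration only.
hbm_stack_yield = 0.95     # assumed chance each HBM2 stack is good after assembly
interposer_yield = 0.99    # assumed chance the interposer itself is good
stacks_per_package = 4

package_yield = interposer_yield * hbm_stack_yield ** stacks_per_package
print(round(package_yield, 2))  # ~0.81: roughly 1 in 5 good GPU dies is wasted
```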
 

PNP

Airflow Optimizer
Original poster
Oct 10, 2015
285
257
It may well be actually. There are binned versions of the Tesla P100 coming out, but they're not binned by core: they're binned by disabling one of the HBM dies. Either the interposer is the weak link (which seems unlikely, they contain few if any active components and are on very mature high-nm processes) or HBM2 yields are still not amazing (being unable to do a full integrated test until you've soldered the interposer together makes things tricky). If Nvidia are willing to gamble on marginal HBM2 dies, they must be scavenging for as many as they can get to meet demand.

Well, 8 high x 4 stacks = 32 chips per board, and something like the QCT QuantaPlex T21W-3U can be populated by 8 NVLink boards, so that's 256 chips. HBM2 has a die area of 91.99 sq. mm, assuming we exclude 3 mm from the wafer edge (even though I think SK Hynix doesn't use edge exclusion) and use a 100 um scribe, that's 655 dies per wafer. To sell 1 million units, that's 256 million chips or 390,840 (rounded up) wafers. Over seven months, the shipment target would be 55,834 wafers a month which is peanuts for a modern 300 mm wafer memory line at >92% automation let alone several. You could probably have a scrap rate of 10% (that's the whole wafer getting thrown out or ground back down into a test wafer) and still make the target.
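Spelling that estimate out (the 7.75 x 11.87 mm die dimensions are an assumption chosen to match the ~92 sq. mm figure; everything else follows the numbers above):

```python
import math

# Back-of-the-envelope HBM2 wafer demand, following the assumptions above.
WAFER_DIA = 300.0           # mm, standard memory wafer
EDGE_EXCLUSION = 3.0        # mm kept clear at the wafer edge
SCRIBE = 0.1                # mm (100 um) scribe lane between dies
DIE_W, DIE_H = 7.75, 11.87  # mm, assumed dimensions giving ~92 sq. mm

def dies_per_wafer():
    """Classic approximation: usable wafer area divided by die area,
    minus a correction for partial dies lost around the circumference."""
    d = WAFER_DIA - 2 * EDGE_EXCLUSION
    area = (DIE_W + SCRIBE) * (DIE_H + SCRIBE)
    return math.floor(math.pi * d**2 / (4 * area) - math.pi * d / math.sqrt(2 * area))

dpw = dies_per_wafer()             # ~655 dies per wafer
chips = 1_000_000 * 32 * 8         # 1M servers x 32 dies/board x 8 boards/server
wafers = math.ceil(chips / dpw)    # ~390,840 wafers total
print(dpw, wafers, round(wafers / 7))  # ~55,834 wafers/month over seven months
```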

As far as the interposer goes, I think assembly on the interposer is the bottleneck rather than supply. Using low melting point solder that will also wet silicon is an absolute pain.
 

EdZ

Virtual Realist
May 11, 2015
1,578
2,107
Well, 8 high x 4 stacks = 32 chips per board, and something like the QCT QuantaPlex T21W-3U can be populated by 8 NVLink boards, so that's 256 chips. HBM2 has a die area of 91.99 sq. mm, assuming we exclude 3 mm from the wafer edge (even though I think SK Hynix doesn't use edge exclusion) and use a 100 um scribe, that's 655 dies per wafer. To sell 1 million units, that's 256 million chips or 390,840 (rounded up) wafers. Over seven months, the shipment target would be 55,834 wafers a month which is peanuts for a modern 300 mm wafer memory line at >92% automation let alone several. You could probably have a scrap rate of 10% (that's the whole wafer getting thrown out or ground back down into a test wafer) and still make the target.

As far as the interposer goes, I think assembly on the interposer is the bottleneck rather than supply. Using low melting point solder that will also wet silicon is an absolute pain.
Don't forget assembly of the HBM modules themselves. Stacking chip-on-wafer is tricky enough, die-on-die with TSVs is even trickier.
 

PlayfulPhoenix

Founder of SFF.N
SFFLAB
Chimera Industries
Gold Supporter
Feb 22, 2015
1,052
1,990
Can't wait to see how well it overclocks. At 11 TFLOPS of base performance, every 10% clock improvement will deliver ~1TFLOP of additional perf.
 

PNP

Airflow Optimizer
Original poster
Oct 10, 2015
285
257
Don't forget assembly of the HBM modules themselves. Stacking chip-on-wafer is tricky enough, die-on-die with TSVs is even trickier.

Tricky, yes, but there is considerable risk management before assembly (EDS of all chips before stacking, etc.). In any case, HBM1 had redundant TSVs that allow a degree of defects while the stack still functions normally, just like how DRAM and flash are made with slightly more capacity than will be advertised. It's reasonable to assume that HBM2 also has such redundant TSVs.

Not all stacks survive, and that is true of any manufacturing process, but how much yield loss actually results from it is an open question.
 

CC Ricers

Shrink Ray Wielder
Bronze Supporter
Nov 1, 2015
2,233
2,556
Can't wait to see how well it overclocks. At 11 TFLOPS of base performance, every 10% clock improvement will deliver ~1TFLOP of additional perf.

You'll need to reach 2 GHz to hit 14 TFLOPS, and I kinda doubt it'll be done easily. Well, not with the stock cooler anyway. Otherwise I don't see much value in it at stock, for ~35% more performance than the 1080 at nearly double the price.
 

EdZ

Virtual Realist
May 11, 2015
1,578
2,107
Can't wait to see how well it overclocks. At 11 TFLOPS of base performance, every 10% clock improvement will deliver ~1TFLOP of additional perf.
To put this in perspective: if you could overclock a Titan X from its 1531 MHz boost clock (I assume the 11TFlop figure is at boost rather than base) by 30% to the ~2GHz some GTX 1080s have been achieving with additional cooling, you've just added 3.3 TFLOPS, which is like stuffing nearly an entire GTX 970 (3.5 TFLOPS) in there!
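For anyone checking the arithmetic, this just follows from the usual peak-FP32 formula of cores x 2 FMA ops x clock:

```python
# Peak FP32 throughput scaling with clock; 3584 cores and the 1531 MHz boost
# clock are from the announced specs, the +30% overclock is hypothetical.
cores = 3584

def tflops(clock_mhz):
    return cores * 2 * clock_mhz * 1e6 / 1e12

base = tflops(1531)           # ~11.0 TFLOPS at stock boost
oc = tflops(1531 * 1.30)      # ~14.3 TFLOPS at a ~2 GHz overclock
print(round(base, 1), round(oc, 1), round(oc - base, 1))  # gain of ~3.3 TFLOPS
```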
 

PNP

Airflow Optimizer
Original poster
Oct 10, 2015
285
257
I suppose it depends on what this card is supposed to be. The monikers "GTX" and "GeForce" are noticeably absent from the name. If this card is supposed to be a pseudo-Quadro/Tesla, high clock speed likely won't be a priority. A higher TDP and no AIB coolers could make for a grim outlook.

I forget, how well did the original Titan overclock?
 

PlayfulPhoenix

Founder of SFF.N
SFFLAB
Chimera Industries
Gold Supporter
Feb 22, 2015
1,052
1,990
You'll need to reach 2 GHz to hit 14 TFLOPS, and I kinda doubt it'll be done easily. Well, not with the stock cooler anyway. Otherwise I don't see much value in it at stock, for ~35% more performance than the 1080 at nearly double the price.

We'll just have to wait and see on OC performance - the architecture makes very good overclocking possible, but the TDP, cooler, and massive size of that chip are going to have significant effects.

As for the 'value', TITAN is the 'performance-over-value' king of the GPU lineup for Nvidia, the 'performance-over-value' manufacturer of the GPU space. So the pricing is not all that surprising.

To put this in perspective: if you could overclock a Titan X from its 1531 MHz boost clock (I assume the 11TFlop figure is at boost rather than base) by 30% to the ~2GHz some GTX 1080s have been achieving with additional cooling, you've just added 3.3 TFLOPS, which is like stuffing nearly an entire GTX 970 (3.5 TFLOPS) in there!

My hope is that you can get that on water. If you can, then I might be a customer.

Lots of folks might, I suspect - that's probably enough for reliable 4K 60FPS gaming. It could be the first card to actually deliver on that oft-repeated promise.