Quote:
SSG stands for 'solid state graphics', and it keeps 1TB of memory close to the GPU.
I wonder how quickly we can get to 1TB of stacked ram on the same interposer as the GPU...
Interesting, even if it is highly specialised... I would like to think it's XPoint instead of NAND, though.
We're currently on 8GB/stack with a max of 4 stacks, AFAIK, although I suspect it wouldn't be impossible to engineer a chip that could drive more stacks. If you could get up to 8 stacks then you'd need 128GB/stack to hit 1TB... that's still a 16x increase over the existing per-stack capacity :O_o1:
$9,999? That's the first time I've seen a price tag like that for AMD :O
Ah but again.....
Can it run Crysis?!!! :P
AFAIK it's Polaris 10 under the hood, so as long as you can install a DX driver for it ... YES!
EDIT: actually, if it's got a 1TB SSD under the hood and you could find a way to access it from the rest of the system, you could not only RUN Crysis on it ... you could INSTALL Crysis on it!
So they've added storage to a graphics card. Next they'll announce that they've added an x86 chip too and it'll be a full SoC (System on Card). Finally they'll announce a version in itx/matx/atx form factor...
Something I don't understand is how it goes from 17fps to 92fps. Does moving what I assume is an M.2 drive off a PCIe lane and onto the card really make that much difference?
Meh - Can it run Crysis?! Mate, it can run multiple virtual instances of Crysis all day & night long. LOL.
Good on AMD for pushing the memory tech with this nice innovation - AMD first to HBM and first to 1TB on a GPU. I think this would suit scientific applications very well, with their large data sets.
Most people are saying this is just plain old NAND. I wouldn't be surprised if it transfers to XPoint when that becomes available in volume too.
Before the data was going from SSD to RAM across PCIe, then RAM to VRAM over PCIe, processing, then back again.
This is simpler and more closely coupled to the GPU; it probably bypasses CPU involvement and a complicated filesystem, just making the SSD appear as one big flat area of memory.
So the figures are believable, apart from one thing. I presume you need to move the film from main SSD onto the GPU SSD at the start, and off again at the end. That might take significant time, but in a long editing session that time might be insignificant compared to faster processing.
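The "SSD as a big flat area of memory" idea above can be sketched in miniature with a memory-mapped file. Everything here is a stand-in: the file name and 1 MiB size are invented, and a plain file plays the role of the card's raw storage.

```python
import mmap

SSD_PATH = "fake_ssd.bin"            # hypothetical file standing in for the device
with open(SSD_PATH, "wb") as f:
    f.truncate(1024 * 1024)          # pretend this is the card's "SSD"

with open(SSD_PATH, "r+b") as f:
    flat = mmap.mmap(f.fileno(), 0)  # the whole thing as one flat byte span
    flat[0:5] = b"frame"             # no filesystem layer, just offsets
    assert flat[0:5] == b"frame"
    flat.close()
```

The point of the sketch is that once the storage is mapped, reads and writes are just indexed accesses into one address range, which is roughly what "appears as a big flat area of memory" means.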
Interesting to see this is done on a Fiji, not Polaris.
This is going to be an interesting thing to find out about.
Clearly the GPU that this card is using (which may be Polaris, or may be Vega) has the ability to be a PCIe bus master that understands PCIe attached storage enough to use it as a massive memory pool, likely with paging, etc, like a full computer system.
In effect, the GPU is reporting itself to the operating system as a 1TB memory GPU. You load your up-to-1TB workset onto it, maybe it is even transparently handled like a normal GPU, but the memory capacity is so large you can load 32-128 times as much data at the same time.
The GPU has to have a CPU subsystem to control all this. There were very vague rumours about AMD using ARM cores on future GPUs, and maybe it is for this type of application. This would likely run a highly optimised RTOS to manage the SSDs and memory mapping and management (as the GPU portion itself is still dealing with its local memory and caches).
The speedup comes from having the SSDs be highly local to the GPU and directly attached and controlled by the GPU, avoiding the main system CPU and memory and buses and operating system, and especially a lot of back and forth between the GPU memory and the CPU's memory.
The Radeon Pro SSG is using a Fury GPU.
If it is happening directly on the GPU then I presume it is making full use of the HSA that AMD have been working on for years to treat VRAM as a cache on the SSD and page fault data in and out automatically. Tie that in with their asynchronous thread management to halt and restart threads blocked on paging faults, and it could be very nice to write for.
Or it could be a horrible kludge; I don't have $10K to go find out :D
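The "VRAM as a cache on the SSD, with data page-faulted in and out" idea can be sketched as a toy LRU page cache. Everything here (class name, page and cache sizes, the backing data) is illustrative, not AMD's actual scheme; a cache miss stands in for the page fault that would stall a GPU thread.

```python
from collections import OrderedDict

class PagedMemory:
    """Toy model: a small LRU cache (the 'VRAM') over a big backing store."""
    def __init__(self, backing, cache_pages=4, page_size=4096):
        self.backing = backing            # stands in for the on-card SSD
        self.page_size = page_size
        self.cache_pages = cache_pages
        self.cache = OrderedDict()        # page number -> bytes in "VRAM"
        self.faults = 0

    def read(self, addr):
        page = addr // self.page_size
        if page not in self.cache:
            self.faults += 1              # the "page fault": fetch from backing
            start = page * self.page_size
            self.cache[page] = self.backing[start:start + self.page_size]
            if len(self.cache) > self.cache_pages:
                self.cache.popitem(last=False)   # evict least recently used
        else:
            self.cache.move_to_end(page)  # mark as recently used
        return self.cache[page][addr % self.page_size]

mem = PagedMemory(bytes(range(256)) * 1024)   # 256 KiB backing store
mem.read(0)
mem.read(1)                                   # same page: no second fault
assert mem.faults == 1
```

The asynchronous-thread angle mentioned above would correspond to parking a wavefront whenever `read` takes the fault path and resuming it once the page arrives, rather than blocking.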
Looks like a pro duo with the second GPU replaced with an SSD.
Now I thought I had read that somewhere but can't find where; it does look like that, though. If you can read from one SSD whilst writing to the other then that might help performance a bit.
Edit to add: I read it on Ars Technica, though they say it is Polaris 10 based when others say it is Fiji. Charlie likes it: http://semiaccurate.com/2016/07/25/a...pus-calls-ssg/
It's Fiji based. Look at this comparison of the RX480 and the Fury Nano:
http://core0.staticworld.net/images/...67966-orig.jpg
Not saying they're not believable, just that something more seems to be going on than simply swapping what's presumably an M.2 SSD attached to the MoBo for one attached to the GPU, even accounting for the data being routed via SSD, RAM, and across PCIe; that routing, I'd guess, adds more to latency than it causes a bandwidth bottleneck.
In other words, I suspect the over 5x increase in fps is down to more than just moving the storage from MoBo to GPU.
Latency generally *is* a bandwidth bottleneck.
There are several methods generally used to improve performance in computers:
Zero copy I/O.
Simplify the data path for the most common case.
Find processing that isn't strictly necessary, and remove it.
Pre-fetch data before you need it to where it will need to be.
Reduce interrupts/CPU context switches.
Avoid locks on a single resource.
I suspect this helps with all those techniques.
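Of the techniques listed above, zero-copy I/O is perhaps the easiest to show in a few lines. In Python, a `memoryview` hands out a window onto a buffer without duplicating the bytes, whereas plain slicing copies them; the buffer and sizes here are arbitrary.

```python
data = bytearray(b"x" * 1_000_000)

copied = bytes(data[:500_000])       # allocates and copies half a megabyte
view = memoryview(data)[:500_000]    # zero copy: just a window onto data

data[0] = ord("y")
assert view[0] == ord("y")           # the view sees the change...
assert copied[0] == ord("x")         # ...the independent copy does not
```

On a real card the same principle would mean the GPU reading straight out of the SSD controller's buffers instead of staging everything through system RAM first.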
That's not my understanding of latency. I've always considered latency the time delay between cause and effect; bandwidth, on the other hand, is how much data can be sent. If I sent ten 4TB HDDs via snail mail, that would be high latency and high bandwidth, although maybe not very practical. :)
Bandwidth is data over time. Snail mail of high volume may still give a relatively high bandwidth, but you can increase the bandwidth further by reducing the latency, because latency is part of the time measure.
Put another way, reducing latency *always* increases bandwidth, all other things being equal.
Like I said, that's not my understanding. Latency is just the time taken from cause to effect; reducing it only increases bandwidth if you're making multiple requests. Whether you send 4TB of data via snail mail or a broadband connection, you're still sending 4TB; it's just that the former has a latency of days and the latter has a latency of milliseconds.
Besides, is there even enough latency in the example they used to account for a five-fold increase?
EDIT:What you seem to be describing there is throughput, that certainly does increase when you reduce latency.
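The throughput-versus-latency disagreement can be made concrete with a toy model of repeated fixed-size requests, where each request pays a latency cost before data flows. All the numbers here are invented purely for illustration.

```python
def effective_throughput(size_bytes, link_bytes_per_s, latency_s):
    # Each request pays `latency_s` up front, then streams at link speed.
    return size_bytes / (latency_s + size_bytes / link_bytes_per_s)

link = 4e9                                            # 4 GB/s link
slow = effective_throughput(64 * 1024, link, 100e-6)  # 100 us per request
fast = effective_throughput(64 * 1024, link, 10e-6)   # 10 us per request
assert fast > slow   # same link speed; only the latency changed
```

With these invented numbers, cutting per-request latency from 100 µs to 10 µs raises effective throughput roughly 4.4x for 64 KiB requests, which is the shape of the effect being argued about; for one huge transfer the latency term washes out, so both sides of the argument hold in their respective regimes.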
I guess an analogy would be that the largest bandwidth (I guess that would be the right term?) available is a FedEx plane, according to QI. In theory, you can transport a massive amount of data from one place to another (let's ignore the overhead of actually connecting and reading all those drives), but the latency would be hours!
Works just as well with multiple requests from a single source.
Throughput would be a better word, bandwidth tends to get used quite lazily. You have to tell from context if people mean the maximum bandwidth available or actually in use.
With BeSang's super NAND aiming for 2c per GB, it would be interesting to see a desktop card fitted with a couple of hundred GBs:
http://hexus.net/tech/news/storage/94762-besang-incs-3d-super-nand-costs-just-2-per-gigabyte/