DBM - A Technology That Won't Happen *yet*

ChainedHope

Airflow Optimizer
Original poster
Jun 5, 2016
306
459
tl;dr: I made a memory subsystem a few years ago that did a lot of cool stuff but won't ever come out. Wanted to show some research being done with memory right now so people could have an idea of future tech and maybe discuss memory a bit (it's a complicated subject and one of the 3 big issues with PC performance).

So a few years back I wrote a white paper and simulated Dynamic Bandwidth Memory (DBM) for CPUs, right around the time HBM started being produced for AMD's then top-of-the-line GPUs. The paper itself isn't too important, as I ended up scrapping it after a few peer reviews and it never got published.

The basic idea (high level) was to emulate what highways do in high-traffic areas with electronically changing signs: reallocating how many lanes serve inbound versus outbound traffic. For this I had to design a new memory standard and a new memory controller, simulate them, and build a VHDL prototype on an FPGA to test with. It managed a bandwidth that could be changed in steps from 512 MB/s to 256 GB/s, moving between steps roughly every 5 million instructions depending on the percentage of data being flushed from the cache. You might be wondering why this was even proposed; the main issue it was tackling was cache thrashing. There is no point in pulling more data from memory if it just means cached data gets dropped and has to be fetched back later, wasting precious cycle time waiting.
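
To give a rough idea of the decision logic, here is a simplified Python sketch of a stepping controller like the one described above. To be clear, this is a from-memory illustration for this post, not the real design: the step table, window size, thresholds, and names are placeholders I picked here.

```python
# Simplified sketch of the stepping logic. The step table, window size,
# and thresholds are illustrative placeholders, not the real values.

# Discrete bandwidth steps in MB/s: 512 MB/s up to 256 GB/s, doubling each step.
STEPS = [512 * (2 ** i) for i in range(10)]  # 512 ... 262144 MB/s

WINDOW = 5_000_000   # instructions between re-evaluations
FLUSH_HIGH = 0.30    # too much cached data flushed unused -> step down
FLUSH_LOW = 0.05     # almost everything fetched gets used -> step up

class DBMController:
    def __init__(self):
        self.step = len(STEPS) // 2  # start mid-range
        self.instructions = 0
        self.lines_filled = 0
        self.lines_flushed = 0

    def on_cache_stats(self, filled, flushed, instructions):
        """Accumulate fill/flush counts reported by the cache model."""
        self.lines_filled += filled
        self.lines_flushed += flushed
        self.instructions += instructions
        if self.instructions >= WINDOW:
            self._reevaluate()

    def _reevaluate(self):
        flush_ratio = self.lines_flushed / max(1, self.lines_filled)
        if flush_ratio > FLUSH_HIGH and self.step > 0:
            self.step -= 1   # thrashing: pull in less data
        elif flush_ratio < FLUSH_LOW and self.step < len(STEPS) - 1:
            self.step += 1   # prefetched data is sticking: widen the pipe
        self.instructions = self.lines_filled = self.lines_flushed = 0

    @property
    def bandwidth_mb_s(self):
        return STEPS[self.step]
```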

By limiting the bandwidth we limited the amount of data that could come over the bus, so less of what the prediction algorithms speculatively fetched stood to be thrown out. This in turn meant we spent less time waiting between fetches and evicted less cached data to make room for new data, which made low-bandwidth tasks very efficient. On the other side, if we unlocked the limit and ran at full bandwidth we could pull in more data, and if the prediction algorithm was correct (say, encoding/decoding a file, where data runs through as a bit stream) the extra data didn't need to be thrown out. This made high-bandwidth tasks run faster.
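
Driving the sketch above with two synthetic workloads shows both regimes (the fill/flush numbers below are made up for illustration):

```python
# A low-locality workload (most prefetched lines get flushed unused)
# vs. a streaming one (prefetched data gets consumed). Numbers invented.
ctrl = DBMController()
for _ in range(20):
    ctrl.on_cache_stats(filled=1000, flushed=500, instructions=5_000_000)
print("thrashy workload settles at", ctrl.bandwidth_mb_s, "MB/s")    # 512

ctrl = DBMController()
for _ in range(20):
    ctrl.on_cache_stats(filled=1000, flushed=10, instructions=5_000_000)
print("streaming workload settles at", ctrl.bandwidth_mb_s, "MB/s")  # 262144
```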

The downside was the weird middle ground. Because of the stepping, there were instances where the controller would swap back and forth between two adjacent steps, because the application decided that X wasn't enough but Y was too much (X < Y). This created performance issues: while applications that eventually settled on a step gained 1-18% depending on the workload (all percentages are relative to DDR4), the ones on the line lost 3-13%. We never figured out how to fix this other than adding more steps, which made the design far more complex and costly.
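
The same toy model reproduces that ping-pong: give it a workload whose ideal bandwidth sits between two adjacent steps and the controller never settles (again, the step-dependent flush ratios here are invented):

```python
# A workload starved at step 4 but thrashing at step 5: it never settles.
ctrl = DBMController()
history = []
for _ in range(10):
    ratio = 0.02 if ctrl.step <= 4 else 0.40
    ctrl.on_cache_stats(filled=1000, flushed=int(1000 * ratio),
                        instructions=5_000_000)
    history.append(ctrl.step)
print(history)  # [4, 5, 4, 5, 4, 5, 4, 5, 4, 5] - bouncing between steps
```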

Some positives of the system were increased performance in most cases (1-18%), power requirements about 60% lower than DDR4's, heat that initial calculations showed could be dissipated with airflow alone, and a memory system better suited to both high- and low-bandwidth tasks. It also had the cool feature of per-processor bandwidth allocation thanks to the distribution system: a multi-CPU server could run high-bandwidth apps on one CPU and low-bandwidth apps on another, splitting the bandwidth between them so as not to starve the high-bandwidth CPU.
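
The distribution side can be sketched the same way: split a fixed bus budget across sockets in proportion to what each socket's current step asks for. The helper below is my illustration, not the actual scheme we built:

```python
# Proportional split of a fixed bus budget across CPUs. The helper name
# and the budget value are illustrative, not the original scheme.
def distribute_bandwidth(requested_steps, total_mb_s):
    """Grant each CPU bandwidth in proportion to its requested step, so a
    low-bandwidth socket never starves a high-bandwidth one."""
    wanted = [STEPS[s] for s in requested_steps]
    total_wanted = sum(wanted)
    if total_wanted <= total_mb_s:
        return wanted  # everything fits: grant requests in full
    # Oversubscribed: scale every request down proportionally.
    return [w * total_mb_s // total_wanted for w in wanted]

# CPU 0 streaming at the top step, CPU 1 idling at a low step:
print(distribute_bandwidth([9, 2], total_mb_s=262144))
```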

It was a really cool idea that never took off, but I wanted to share some of it with the community so people have an idea of what we could see in the future of the memory market. While my implementation will not be used (pretty much ever), there are others working on similar technology, and you might see it come to the server market in the next 8-10 years. It's a niche use case, but it would work great in something like AWS or VM farms because of how the distribution system worked with multi-CPU builds.
 

ChainedHope

Airflow Optimizer
Original poster
Jun 5, 2016
306
459
That is absolutely amazing, @ChainedHope. It is always wonderful to read up on research that is meant to bring technology forward.

Research is a pretty cool area in general but it gets hate for being boring, taking too long, or abruptly being canceled. The big issue is that you don't see this kind of information written out anywhere, because most research is held under NDA for a few years (up to a decade depending on the proprietary knowledge involved). What we usually get is a redacted white paper with limited test data that's pretty outdated by the time it comes out, to the point of not being useful to future researchers because they have already figured it out or moved on lol.

A good example is CPU architecture. Finding anything post-Nehalem (2008) with good documentation is pretty hard or requires you to sign NDA forms. While the older material is good for initially learning how everything works, it's not useful for understanding the improvements made over the last decade and into the current/next generation. This hurts researchers, students, and people working to increase performance, because they have limited information unless someone just drops stacks of files on them and waives their NDAs.
 
Reactions: Phuncz and Soul_Est

Kmpkt

Innovation through Miniaturization
KMPKT
Feb 1, 2016
3,382
5,935
Research is a pretty cool area in general but it gets hate for being boring, taking too long, or abruptly being canceled.

Unfortunately progress and profit are more often than not antithetical due to corporate impatience (only the next quarter matters). Take a look at the number of products at CES this year that could be considered innovative rather than iterative (somewhere between zero and two).
 
Reactions: jtd871 and Soul_Est

Phuncz

Lord of the Boards
SFFn Staff
May 9, 2015
5,839
4,906
Very interesting read, and I agree: academic technology research shouldn't be locked up for a long period of time so that no one benefits from it. It's something I think this community almost continuously benefits from. But in our current society, it's the greed of the many that outweighs the generosity of the few (to bastardize a Star Trek quote). I am grateful for these little nuggets of insight into technology we rarely get to witness.
 
Reactions: Soul_Est

AleksandarK

/dev/null
May 14, 2017
703
774
Wow, that's amazing.
Too bad it didn't take off. I am sure there would be a market for it, even a tiny one.

It might have gotten better performance if built as a dedicated IC; FPGAs are generally low speed. But a 1-18% performance uplift over DDR4 is pretty solid. It is hard to even come close to that, let alone beat it, so you guys did great work!
Just one question - do you still have the VHDL code?
 
Reactions: Soul_Est

ChainedHope

Airflow Optimizer
Original poster
Jun 5, 2016
306
459
Unfortunately progress and profit are more often than not antithetical due to corporate impatience (only the next quarter matters). Take a look at the number of products at CES this year that could be considered innovative rather than iterative (somewhere between zero and two).

True. It's the difference between an R&D company and a company that also does R&D. For the most part, companies that also do R&D will kill projects early if they can't see enough gain; an R&D company will usually stick it out for a year or two before killing something that isn't working.

Very interesting read, and I agree: academic technology research shouldn't be locked up for a long period of time so that no one benefits from it. It's something I think this community almost continuously benefits from. But in our current society, it's the greed of the many that outweighs the generosity of the few (to bastardize a Star Trek quote). I am grateful for these little nuggets of insight into technology we rarely get to witness.

That's one of the reasons I wanted to share some of my work. While it doesn't help many people, it does give insight into what some people are working on behind closed doors. Luckily I had no NDAs for the project, since it was a small team doing it independently as a side project (me and 2 others). Unfortunately the resources are locked up, so I can't upload any of the files, but nothing stops me from talking about it.

Wow, that's amazing.
Too bad it didn't take off. I am sure there would be a market for it, even a tiny one.

It might have gotten better performance if built as a dedicated IC; FPGAs are generally low speed. But a 1-18% performance uplift over DDR4 is pretty solid. It is hard to even come close to that, let alone beat it, so you guys did great work!
Just one question - do you still have the VHDL code?

The 1-18% figures were simulated values; in real-world usage it would have been slightly better than DDR4 in some cases and worse in others. The VHDL/FPGA sample was used for temperature measurements, power usage, etc., as well as a proof of concept on real hardware.

I do not have the VHDL project anymore. This was a few years back, and the research is currently locked in cold storage (tape) that only one of the project members has access to, for security reasons. I wouldn't be able to get the files, let alone upload them, anytime soon. But I have no NDA on the project since it was a small team doing the work on the side, so I'm free to talk about it (and if anyone took this thread, made their own version, and tried to patent it, they would fail if it was too close to ours).