
Sep 2, 2012

AMD Different CPU Design Method Experiments [Updated]

AMD has decided to concentrate on bulk manufacturing processes instead of the Silicon-on-Insulator (SOI) process that has been the company’s main technology for the past eleven years.

Back in 2001, AMD’s Palomino processors, or the famous AthlonXP as most of us knew them, adopted IBM’s SOI technology to help achieve higher working frequencies by lowering current leakage and power consumption. When designing a high-frequency microprocessor, the architect is not primarily concerned with lowering power consumption. Some of the internal units get very hot under heavy use, which can threaten the chip’s overall stability and ability to function. Designers therefore usually decide to move those units to a somewhat “cooler” area of the chip. On a microchip with tens of millions or even billions of transistors, moving functional units around can make the overall die considerably bigger.

Back when Intel and AMD were fighting head to head for the absolute performance crown, some concessions on die size and power consumption were made. Basically, the companies thought it was all right to have a 140-watt CPU TDP if that was what it took to win the benchmarks. The same went for a 10% to 20% bigger die. This was back in the days when AMD had its own foundries and all the costs were managed by the company itself. Now that AMD is paying TSMC or GlobalFoundries to manufacture its designs, a bigger die comes with a significant increase in cost per chip. The company is now working with different foundries that mostly use bulk manufacturing processes and offer various chip design tools, and it believes all this diversity comes with impressive potential. Read all about these impressive AMD simulations and potential improvements in Part 2 of our AMD report that's coming later this week.

AMD Phenom CPU Die
Image credits to AMD

Manually designing some of the functional units on a microprocessor was feasible, and was a method often used, back when a CPU had anywhere from 100,000 to tens of millions of transistors.

Now that CPUs are going well beyond the 2 billion transistor mark, moving millions of transistors around in the design is a very complex operation that usually leads to a large increase in die size. Hand-drawing the layout is also very laborious, and it is a task that specialized software can heavily optimize. When transistor count is the criterion, the best-known complex chips are GPUs, which reached and surpassed the 4 billion transistor mark back in 2011 with AMD’s Tahiti design. Since Nvidia’s and AMD’s GPUs are made at TSMC and both designers pay the foundry for each processed silicon wafer, it is very important that they get as many GPUs per wafer as possible.
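To see why die size matters so much when you pay per wafer, here is a rough sketch using a common first-order approximation for gross dies per wafer. The 300 mm wafer diameter and the 350 mm² die area are illustrative assumptions, not figures from AMD or TSMC, and the estimate ignores yield, scribe lines and edge exclusion.

```python
import math

def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """First-order estimate of gross dies per wafer.

    Wafer area divided by die area, minus a correction for partial
    dies lost along the round wafer edge. Ignores defects and yield.
    """
    radius = wafer_diameter_mm / 2
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)

# Hypothetical numbers for illustration only.
WAFER_MM = 300          # assumed wafer diameter
BIG_DIE_MM2 = 350       # assumed area of a large, GPU-class die

baseline = dies_per_wafer(WAFER_MM, BIG_DIE_MM2)
compact = dies_per_wafer(WAFER_MM, BIG_DIE_MM2 * 0.7)  # same chip, 30% less area

print(f"baseline die: {baseline} per wafer")
print(f"30% smaller die: {compact} per wafer")
```

Because the wafer price is fixed, every extra die that fits on it lowers the cost of each chip, and a smaller die also wastes less area at the wafer edge.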

For this, each foundry usually provides its customers with automated microchip design tools that take a design and rearrange the units and the transistors in the manner best suited to that specific manufacturing technology. This way, the specialized software makes the chips much smaller and also makes sure that everything works correctly. The design software usually aims for two things: the tightest possible layout, and a chip in which everything still works properly. When making networking or DSP processors, this is probably the best approach to get as many dies per wafer as possible and, therefore, the lowest manufacturing cost per chip. One thing that might not appeal to enthusiasts is that these chips work at much lower frequencies than the ones with manual intervention on the die design.

Basically, AMD’s Bulldozer might not easily go beyond 2 GHz if its die design were this crammed. At 2 GHz, the internal units would likely work fine, but raising the frequency any higher would make the “hot” units produce errors or leak electrons that would affect the surrounding transistors.

AMD Hot Chips 2012 Slide
Image credits to AMD

During this year’s Hot Chips conference in Cupertino, California, AMD presented what it has been able to achieve by using automated design tools (software) to rearrange the units inside its Bulldozer processors.

As we’ve just explained above, such a design “optimization” is mainly used for very big chips such as GPUs, and readers should keep in mind that the fastest GPUs today hardly reach above 1.2 GHz in normal conditions. Such a frequency would be catastrophically low for a CPU like AMD’s Bulldozer and, despite the lower manufacturing costs and power consumption, not many would be interested in powering a personal computer with something like this. The thing is that the Bulldozer die doesn’t have as many transistors as a Tahiti die, and as such it will be able to reach much higher frequencies than the GPU, due to having a smaller and less complex die with more modest power consumption.

So now we have a 2 GHz Bulldozer with a small, economical die and improved power consumption. Interestingly enough, AMD is not aiming for a complete overhaul of its CPU die design. As shown at the 2012 Hot Chips conference, the company is only redesigning parts of the CPU using the automated design tools. The units in question become much smaller and consume less power. AMD is showing a floating point unit (FPU) in the graphs made public at Hot Chips. The unit has been greatly reduced in size by using a different design solution (software using a High Density cell library). A 30% reduction in die area is touted along with a 15% to 30% reduction in power consumption, and these are results usually obtained by moving manufacturing from one node to the next. Getting a design from 32nm to 28nm manufacturing could take a whole year or even more.

Stacking an additional 30% shrink and power improvement from a tighter design on top of the expected 20% to 30% die shrink and power reduction that comes with such a node move would yield impressive results. Such results would be comparable with a move to 20nm manufacturing. The unknown is the frequency of such a design, but the company could opt for a differentiated clock design where some units work at a certain frequency and other units run at a much higher one. Extrapolating from AMD’s graphs, we could imagine a CPU that has more FPUs, all kept fed by very fast dispatch units with efficient branch prediction. It is all about balance and getting the right recipe, but what AMD is basically saying is that it is working with many solutions that each offer good improvements and that, combined, will offer impressive results.
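As a back-of-the-envelope check on how the two reductions combine, the sketch below multiplies them rather than adding them, since the second shrink applies to an already smaller unit. It assumes, purely for illustration, that both reductions apply fully and independently to the same unit; in practice AMD is only redesigning parts of the CPU, so whole-die gains would be smaller. The function name is our own.

```python
def combined_scale(node_reduction: float, layout_reduction: float) -> float:
    """Fraction of the original area (or power) left after both reductions.

    Assumes the two reductions are independent and compound multiplicatively.
    """
    return (1 - node_reduction) * (1 - layout_reduction)

# Figures quoted in the article: 20% to 30% from the node move,
# plus 30% from the tighter, tool-generated layout.
low = combined_scale(0.20, 0.30)
high = combined_scale(0.30, 0.30)

print(f"remaining area: {high:.0%} to {low:.0%} of the original")
print(f"total reduction: {1 - low:.0%} to {1 - high:.0%}")
```

Under these assumptions the redesigned unit ends up at roughly half its original size, which is the kind of gain normally associated with a full node jump or more.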

