
Oct 14, 2011

Ex-AMD Engineer Blames Bulldozer's Low Performance on Lack of Fine Tuning

AMD's recently launched FX-Series processors, based on the Bulldozer architecture, haven't delivered the performance everybody expected, and an ex-AMD engineer has come out to share his view of Bulldozer's performance issues.

Cliff A. Maier worked as a member of AMD's technical staff until a few years ago, when he left the company at about the same time AMD started to use automated design tools for its chips.

According to the engineer, the fact that Bulldozer arrived later than expected has little to do with its performance problems; the main issue affecting the architecture was the chip maker's adoption of automated design techniques.

Compared to the traditional design techniques that rely on hand-crafting performance-critical parts of the processor, automated tools speed up the design process, but cannot ensure maximum performance and efficiency.

"The management decided there should be such cross-engineering [between AMD and ATI teams within the company], which meant we had to stop hand-crafting our CPU designs and switch to an SoC design style," said Maier in a forum post on Insideris.com.

"This results in giving up a lot of performance, chip area, and efficiency. The reason DEC Alphas were always much faster than anything else is they designed each transistor by hand. Intel and AMD had always done so at least for the critical parts of the chip.

"That changed before I left - they started to rely on synthesis tools, automatic place and route tools, etc.," continued the engineer.

According to Maier, automatically generated designs can be 20% bigger and slower than hand-crafted silicon, leading to a higher transistor count, a larger die and lower energy efficiency.

"I had been in charge of our design flow in the years before I left, and I had tested these tools by asking the companies who sold them to design blocks (adders, multipliers, etc.) using their tools. I let them take as long as they wanted.

"They always came back to me with designs that were 20% bigger, and 20% slower than our hand-crafted designs, and which suffered from electro-migration and other problems," the former AMD engineer said.

AMD's desktop version of Bulldozer has about 2 billion transistors in total, a particularly large number that puts it close to the size of a GPU chip and lends support to Maier's theory.

Each of the Bulldozer modules, containing two computing cores and 2MB of unified L2 cache, includes 213 million transistors and measures 30.9mm², which means that the four modules of a quad-module chip should account for about 852 million transistors and take up 123.6mm² of die space.

In AMD's design, these modules are accompanied by 8MB of Level 3 cache, which should account for about 405 million transistors. Subtracting the 852 million transistors in the modules and the 405 million in the L3 cache from the total of about 2 billion leaves roughly 740 million transistors for the memory controller, I/O interfaces and various other logic.
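
The transistor budget described above is simple arithmetic on the quoted figures, and a short Python sketch makes it easy to check. The 2 billion total is the article's approximate figure, so the remainder is only as precise as that input; the function and variable names are illustrative and not taken from any AMD material.

```python
# Figures quoted in the article (transistor counts in millions, area in mm^2).
TOTAL_TRANSISTORS_M = 2000    # "about 2 billion", an approximate total
MODULE_TRANSISTORS_M = 213    # per Bulldozer module (two cores + 2MB L2)
MODULE_AREA_MM2 = 30.9        # per module
MODULE_COUNT = 4              # quad-module (eight-core) desktop chip
L3_TRANSISTORS_M = 405        # 8MB Level 3 cache


def transistor_budget(total_m, module_m, module_count, l3_m):
    """Split the total transistor count into modules, L3 cache and the rest."""
    modules_m = module_m * module_count
    other_m = total_m - modules_m - l3_m
    return {"modules": modules_m, "l3_cache": l3_m, "other": other_m}


budget = transistor_budget(
    TOTAL_TRANSISTORS_M, MODULE_TRANSISTORS_M, MODULE_COUNT, L3_TRANSISTORS_M
)
modules_area_mm2 = MODULE_AREA_MM2 * MODULE_COUNT

for part, millions in budget.items():
    print(f"{part}: {millions} million transistors")
print(f"module area: {modules_area_mm2:.1f} mm^2")
```

With these inputs the four modules come to 852 million transistors and 123.6mm², and the share left for the memory controller, I/O and other logic is 743 million.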

This is a particularly large number no matter how you look at it, and is not far short of the 995 million transistors used by Intel in its Sandy Bridge processors, which also come with an integrated graphics core and PCI Express controller.

Right now we don't know if this large number of transistors is actually necessary or if it's the result of the automated tools Maier blames for the performance of Bulldozer, but it definitely seems like something fishy is going on with Bulldozer. (via Xbit Labs)

