Can an ARM Overcome the Laws of Physics?

There has been quite a lot of talk about ARM Holdings and the ARM processor lately. Some of this is due to the pervasiveness of its architecture in mobile devices; some of it is due to extensive hype over “new technology” versus “old technology” – an unfortunate metaphor.

Are we to believe that processor designers who license ARM processor technology are going to “one up” traditional server processor architectures simply because they started out with a stripped-down, energy-efficient CPU? Let’s take a look at why not!

Benchmark results specifically targeting these low-power processors have begun to be published. Many of them are based on the Dhrystone benchmark, which was run on 8088-class processors back in the 1980s! Performance for this class of processor is usually measured in DMIPS (Dhrystone Millions of Instructions Per Second), roughly based on a VAX 11/780 MIPS. These benchmarks are a far cry from industry-standard benchmarks such as the SPEC suites or the TPC-C warehouse database benchmark. Before anyone starts yelling: how can one expect the ARM class of processors to do well on these benchmarks? One cannot simultaneously reject so-called “old technology” while extolling the wonders of a 30-hour-battery handheld tablet processor in micro servers. It would indeed be interesting to see SPECint2006 results for these processors, but none seem to exist. The same goes for TPC-C results. It is noteworthy that a dual-core 1.6 GHz Atom processor generates about 8000 DMIPS and a dual-core Cortex A9 about 4000. This means that if Intel had to drop its clock to, say, 1 GHz to be in the same heat dissipation range as the Cortex A9, the two would have “similar” performance – in a single-socket environment.
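The clock-scaling comparison above can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes DMIPS scales linearly with clock frequency, which is a simplification, and the function name `scale_dmips` is made up for illustration. The input figures are the ones quoted above; the Cortex A9 clock is not stated, so its number is used as-is.

```python
def scale_dmips(dmips: float, clock_ghz: float, target_ghz: float) -> float:
    """Estimate DMIPS at a different clock, ASSUMING linear scaling with frequency."""
    return dmips * target_ghz / clock_ghz


ATOM_DMIPS = 8000.0      # dual-core Atom at 1.6 GHz (figure quoted in the text)
ATOM_CLOCK_GHZ = 1.6
A9_DMIPS = 4000.0        # dual-core Cortex A9 (figure quoted in the text)

# Hypothetical down-clock of the Atom to 1 GHz, as suggested in the text.
atom_at_1ghz = scale_dmips(ATOM_DMIPS, ATOM_CLOCK_GHZ, 1.0)
ratio = atom_at_1ghz / A9_DMIPS

print(f"Atom scaled to 1 GHz: {atom_at_1ghz:.0f} DMIPS")
print(f"Ratio to Cortex A9:   {ratio:.2f}x")
```

Under that linear assumption the down-clocked Atom lands at about 5000 DMIPS, within roughly 25% of the Cortex A9 figure, which is the sense in which the two are “similar”.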

In reality, “new technology” (ARM) and “old technology” (Intel, AMD, IBM POWER) are simply two different technologies, and they are not chronologically distinct. If we expect to see farms of micro servers, each with 100 ARM or ARM-like systems on a chip in a 1U form factor, we should expect them to run commercial-grade applications, at a minimum web and database servers. Would we see a SPECweb2005 result published for a 1024-socket ARM-based micro web server? We had better.

Is one supposed to assume that the designers of Intel x86 or IBM POWER processors are simply wasting millions of transistors through negligence? No! Will the processors in the “new technology” micro servers use a method of cache coherency heretofore unknown to the world? I doubt it. SMP cache coherency uses transistors and consumes bandwidth. As more performance is demanded from these ARM-class micro servers, processor designers will slowly incorporate techniques from “old technology” such as huge out-of-order execution windows, complex caches, novel inter-socket communications, multi-threaded execution, and the ability to address huge memory spaces. All of these require complexity, transistors, and watts. By the time all this has been accomplished, the wheel will have been reinvented, with these micro servers dissipating about the same heat as the “old technology” processors. If it takes a given number of transistors to perform some advanced function such as wide instruction execution or complex branch prediction, ARM-class processors will need them too; they cannot perform such functions while violating the laws of solid-state physics.

The hype surrounding this “new technology” sounds strikingly similar to what Sun Microsystems claimed in the last half of the previous decade regarding its “disruptive” Niagara “technology”. Sun said Thread Level Parallelism was taking over the data center, since single-thread performance (Instruction Level Parallelism) was out of gas. Intel didn’t think so! AMD didn’t think so! IBM didn’t think so! Sun placed eight very simple SPARC cores on a die, with each core executing, at any given clock tick, one of up to eight thread contexts. Sun claimed clock speed didn’t matter because slow memory interfaces and long latencies determined system throughput, not clock. Sun could claim something on the order of a watt per [thin] thread context, versus perhaps 25W per [heavy] thread from its competition. Well, about half a decade later Sun+Oracle have reached a point where their processors dissipate basically the same amount of heat as established Intel, AMD, or IBM POWER processors, and they are considering reducing thread count and cranking up the clock to stay competitive. Sun’s [now Oracle's] competition never felt the need to sacrifice single-thread performance, all the while adding cores and real Simultaneous Multi-Threading. The IBM POWER7 now has eight cores, each capable of executing four instruction threads at the same time. A single POWER7 can execute 32 threads simultaneously at a clock rate nearly triple that of Oracle’s Niagara-based processors. So much for the hype! Something similar will have to happen with “new technologies” such as ARM-class processors in micro servers if they expect to play with the “old technology” big boys.
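To make the thread-count comparison concrete, here is a small sketch that separates thread contexts a chip holds from threads it can execute in the same clock tick. The `Chip` class and its field names are hypothetical, invented for this illustration; the core and thread counts are the ones given above, and clock speed is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    """Hypothetical model of a multi-threaded processor, for illustration only."""
    name: str
    cores: int
    contexts_per_core: int   # hardware thread contexts each core holds
    issuing_per_core: int    # threads each core executes in the same clock tick

    @property
    def total_contexts(self) -> int:
        return self.cores * self.contexts_per_core

    @property
    def threads_per_tick(self) -> int:
        return self.cores * self.issuing_per_core


# Figures taken from the paragraph above.
niagara = Chip("Niagara-style", cores=8, contexts_per_core=8, issuing_per_core=1)
power7 = Chip("POWER7", cores=8, contexts_per_core=4, issuing_per_core=4)

for chip in (niagara, power7):
    print(f"{chip.name}: {chip.total_contexts} contexts, "
          f"{chip.threads_per_tick} executing per clock tick")
```

On these numbers the Niagara-style design holds more contexts (64 versus 32) but executes only 8 of them per tick, while the POWER7 executes all 32, and does so at a much higher clock.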

As with most things, you don’t get something for nothing.
