Moore’s Law has underwritten a remarkable period of growth and stability for the computer industry. The doubling of transistor density at a predictable cadence has fueled five decades of increased processor performance and the rise of the general-purpose computing model. However, according to a pair of researchers at MIT and RWTH Aachen University, that’s all ending. Neil Thompson, a Research Scientist at MIT’s Computer Science and A.I. Lab and a Visiting Professor at Harvard, and Svenja Spanuth, a graduate student at RWTH Aachen University, contend, as we have been covering here at The Next Platform all along, that the disintegration of Moore’s Law, along with new applications like deep learning and cryptocurrency mining, is driving the industry away from general-purpose microprocessors and toward a model that favors specialized microprocessors. “The rise of general-purpose computer chips has been remarkable. So, too, could be their fall,” they argue.
As they point out, general-purpose computing was not always the norm. In the early days of supercomputing, custom-built vector-based architectures from companies like Cray dominated the HPC industry. A version of this still exists in the vector systems built by NEC. But thanks to the speed at which Moore’s Law has improved the price-performance of transistors over the last few decades, the economic forces have greatly favored general-purpose processors.
That’s mainly because the cost of developing and manufacturing a custom chip runs between $30 million and $80 million. So, even for users demanding high-performance microprocessors, the benefit of adopting a specialized architecture quickly dissipates as the shrinking transistors in general-purpose chips erase any initial performance gains afforded by customized solutions. Meanwhile, the costs incurred by transistor shrinking can be amortized across millions of processors.
However, the computational economics enabled by Moore’s Law is now changing. In recent years, shrinking transistors have become much more expensive as the physical limitations of the underlying semiconductor material begin to assert themselves. The authors point out that over the past 25 years, the cost of building a leading-edge fab has risen 11 percent per year. In 2017, the Semiconductor Industry Association estimated that constructing a new fab cost about $7 billion. Not only does that drive up the fixed costs for chipmakers, but it has also reduced the number of manufacturers with leading-edge fabs from 25 in 2002 to just four today: Intel, Taiwan Semiconductor Manufacturing Company (TSMC), Samsung, and GlobalFoundries. The team also highlights a report by the U.S. Bureau of Labor Statistics (BLS) that attempts to quantify microprocessor performance per dollar. By this metric, the BLS determined that improvements have dropped from 48 percent annually in 2000-2004 to 29 percent annually in 2004-2008 to just 8 percent annually in 2008-2013.
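To get a feel for what those annual rates mean in cumulative terms, here is a quick back-of-the-envelope compounding calculation in Python. The period lengths are our own approximations, not figures from the BLS report:

```python
# Rough illustration (not from the paper) of how the BLS annual improvement
# rates compound over each period, to show how sharply growth has slowed.
rates = {
    "2000-2004": (0.48, 4),  # 48% per year over ~4 years
    "2004-2008": (0.29, 4),  # 29% per year over ~4 years
    "2008-2013": (0.08, 5),  # 8% per year over ~5 years
}

for period, (annual_rate, years) in rates.items():
    cumulative = (1 + annual_rate) ** years  # compound improvement factor
    print(f"{period}: ~{cumulative:.1f}x better performance per dollar")

# Approximate output:
# 2000-2004: ~4.8x better performance per dollar
# 2004-2008: ~2.8x better performance per dollar
# 2008-2013: ~1.5x better performance per dollar
```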
All this has fundamentally changed the cost/benefit of shrinking transistors. As the authors note, Intel’s fixed costs have exceeded its variable costs for the first time in its history due to the escalating expense of building and operating new fabs. Even more disconcerting is that companies like Samsung and Qualcomm now believe that the cost of transistors manufactured on the latest process nodes is increasing, further discouraging the pursuit of smaller geometries. Such thinking was likely behind GlobalFoundries’s recent decision to scrap its plans for its 7nm technology.
It’s not just a deteriorating Moore’s Law. The other driver toward specialized processors is a new set of applications not amenable to general-purpose computing. For starters, platforms like mobile devices and the Internet of Things (IoT) demand energy efficiency and low cost. They are deployed in such large volumes that they necessitate customized chips even with a relatively robust Moore’s Law in place. Lower-volume applications with even more stringent requirements, such as military and aviation hardware, are also conducive to special-purpose designs. But the authors believe the real watershed moment for the industry is being enabled by deep learning, an application category that cuts across nearly every computing environment – mobile, desktop, embedded, cloud, and supercomputing.
Deep learning and its preferred hardware platform, GPUs, represent the most visible example of how computing may shift from general-purpose to specialized processors. GPUs, which can be viewed as a semi-specialized computing architecture, have become the de facto platform for training deep neural networks thanks to their ability to do data-parallel processing much more efficiently than CPUs. The authors point out that although GPUs are also being exploited to accelerate scientific and engineering applications, deep learning will be the high-volume application that makes further specialization possible. Of course, it didn’t hurt that GPUs already had a high-volume business in desktop gaming, the application for which they were originally designed.
But for deep learning, GPUs may only be the gateway drug. A.I. and deep learning chips are already in the pipeline from Intel, Fujitsu, and more than a dozen startups. Google’s own Tensor Processing Unit (TPU), purpose-built to train and use neural networks, is now in its third iteration. “Creating a customized processor was very costly for Google, with experts estimating the fixed cost as tens of millions of dollars,” write the authors. “And yet, the benefits were also great – they claim that their performance gain was equivalent to seven years of Moore’s Law – and that the avoided infrastructure costs made it worth it.”
Thompson and Spanuth also noted that specialized processors are increasingly used in supercomputing. They pointed to the November 2018 TOP500 rankings, which showed that, for the first time, specialized processors (mainly Nvidia GPUs) rather than CPUs were responsible for most of the added performance. The authors also performed a regression analysis on the list to show that supercomputers with specialized processors are “improving the number of calculations that they can perform per watt almost five times as fast as those that only use universal processors and that this result is highly statistically significant.”
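For readers curious about the shape of that analysis, here is a minimal sketch of how such a comparison could be set up. It is not the authors’ actual code; the field names (year, rmax_gflops, power_kw, has_accelerator) are hypothetical stand-ins for whatever the TOP500 dataset actually provides:

```python
import numpy as np

def annual_improvement(records):
    """Fit log(GFLOPS/W) against year and return the implied annual growth rate."""
    years = np.array([r["year"] for r in records], dtype=float)
    perf_per_watt = np.array(
        [r["rmax_gflops"] / (r["power_kw"] * 1000.0) for r in records]
    )
    # Ordinary least squares on the log scale: log(y) = a + b * year
    A = np.column_stack([np.ones_like(years), years])
    coeffs, *_ = np.linalg.lstsq(A, np.log(perf_per_watt), rcond=None)
    return np.exp(coeffs[1]) - 1.0  # fractional improvement per year

# Usage, assuming `top500` is a list of such records split by processor type:
# accel_rate = annual_improvement([r for r in top500 if r["has_accelerator"]])
# cpu_rate   = annual_improvement([r for r in top500 if not r["has_accelerator"]])
# print(f"accelerated: {accel_rate:.1%}/yr vs CPU-only: {cpu_rate:.1%}/yr")
```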
Thompson and Spanuth offer a mathematical model for determining the cost/benefit of specialization, considering the fixed cost of developing custom chips, the chip volume, the speedup delivered by the custom implementation, and the rate of processor improvement. Since the latter is tied to Moore’s Law, its slowing pace means it’s getting easier to rationalize specialized chips, even if the expected speedups are relatively modest.
“Thus, for many (but not all) applications, it will now be economically viable to get specialized processors – at least in terms of hardware,” claim the authors. “Another way of seeing this is to consider that during the 2000-2004 period, an application with a market size of ~83,000 processors would have required that specialization provide a 100x speed-up to be worthwhile. In 2008-2013, such a processor would only need a 2x speedup.”
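The intuition behind those numbers can be sketched with a small calculation. The snippet below is a deliberate simplification of the authors’ model: it takes the speedup figures from the quote and the BLS improvement rates cited earlier, and asks how long a specialized chip’s advantage survives before steadily improving general-purpose processors catch up:

```python
import math

# Simplified illustration (not the authors' full model) of why a slower Moore's
# Law makes smaller speedups worth specializing for: a fixed speedup stays ahead
# of improving general-purpose chips for much longer when annual gains are small.
def years_until_overtaken(speedup, annual_improvement):
    """Years before a general-purpose processor improving at the given annual
    rate catches up to a specialized chip's initial speedup."""
    return math.log(speedup) / math.log(1.0 + annual_improvement)

# 2000-2004 era: ~48%/yr general-purpose improvement, 100x speedup example
print(f"{years_until_overtaken(100, 0.48):.1f} years")  # ~11.7 years

# 2008-2013 era: ~8%/yr improvement, 2x speedup example
print(f"{years_until_overtaken(2, 0.08):.1f} years")    # ~9.0 years
```

Roughly the same useful lifetime in both cases, which is why, once Moore’s Law slows, the fixed design cost can be recouped with a far more modest speedup or a much smaller market.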
Some of these domains, like deep learning, will be in the fast lane because of their size and suitability for specialized hardware. Areas like database processing, by contrast, may become a backwater: although widely used, this type of transactional computation does not lend itself to specialized chips, say the authors. Still other areas, like climate modeling, are too small to warrant their own customized hardware, although they could benefit from it.
The authors anticipate that cloud computing will, to some extent, blunt the effect of these disparities by offering a variety of infrastructure for smaller and less catered-for communities. The growing availability of more specialized cloud resources like GPUs, FPGAs, and, in the case of Google, TPUs suggests that the haves and have-nots may be able to operate on a more even playing field.
None of this means CPUs or even GPUs are doomed. Although the authors didn’t delve into this aspect, specialized, semi-specialized, and general-purpose compute engines can be integrated on the same chip or in the same processor package. Some chipmakers are already pursuing this path.
Nvidia, for example, incorporated Tensor Cores, its specialized circuitry for deep learning, in its Volta-generation GPUs. By doing so, Nvidia offered a platform that serves both traditional supercomputing simulations and deep learning applications. Likewise, CPUs are being integrated with specialized logic blocks for things like encryption/decryption, graphics acceleration, signal processing, and deep learning. Expect this trend to continue.