In my recent post about Neural Processors, I noted that almost everyone who can – Google, Apple, IBM, Intel and more – has built neural processors to accelerate neural networks. They are mostly deployed as co-processors to run AI models, but systems will have to adapt further as demand for intelligent applications grows. Why? Because AI systems have unique I/O requirements. That's why neural processors typically skip caches and full floating point arithmetic – and the I/O overhead both bring. In applications such as computer vision, moving many frames of high-res video also stresses the I/O subsystem, while recurrent neural networks focus on streaming data, another bandwidth-intensive workload.
We can limp along today because these are AI's early days – much as 8-bit processors in 70s PCs worked fine – but as capacity and performance requirements grow, system architectures will have to change. Architects and computer scientists are still learning to optimize data structures and representations for performance. Even so, it is painfully clear that standard x86 architectures will never become preferred AI platforms. Unlike today's memory buses, a DNN is most efficient when DRAM bandwidth is evenly partitioned across the DRAM ports, which means that other logic designed for full memory buses, such as multiplexers, isn't required either.
Since DRAM can account for as much as 90 percent of energy consumption, minimizing memory logic and using memory efficiently can be a significant cost saving, whether for a mobile device or a warehouse-scale computer. Memory accesses aren't the only, or even always the most important, difference between traditional and AI workloads. But there is no doubt that as AI applications grow in sophistication, current architectures – x86 and ARM – will be less relevant.
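The 90 percent figure is easier to believe with a back-of-envelope comparison. Using approximate per-operation energies commonly cited from Mark Horowitz's ISSCC 2014 keynote (my numbers, not from this post), one off-chip DRAM access costs hundreds of times more than an arithmetic operation:

```python
# Rough per-operation energy at 45 nm, as commonly cited from
# Mark Horowitz's ISSCC 2014 keynote (approximate figures).
DRAM_ACCESS_PJ = 640.0  # one 32-bit read from off-chip DRAM
FP32_ADD_PJ = 0.9       # one 32-bit floating point add

ratio = DRAM_ACCESS_PJ / FP32_ADD_PJ
print(f"One DRAM access costs as much as ~{ratio:.0f} float adds")
```

If every operand comes from DRAM, arithmetic energy is a rounding error – which is exactly why accelerator designers obsess over keeping data on-chip.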
The Storage Bits take
In the next post, I'll dig deeper into the implications that widespread AI applications will have for CPU and server architectures. If AI applications become widespread – and I believe they will – a new generation of CPUs will be required to run them efficiently.