Deep learning is a relatively new technology, yet it's already driving great advances in visual recognition, natural language processing, text analysis, cybersecurity and more. Increasingly, it's the technology powering everything from driverless cars to chatbots to automatic photo tagging and marketing campaigns. And while the development of deep learning has been driven by software frameworks and scientific research, it's also being accelerated by new developments in hardware. 3rd Generation Intel® Xeon® Scalable Processors are a case in point. With a range of capabilities designed specifically to optimise deep learning, these new CPUs are the core of a hardware platform with the potential to take AI to another level.
The 3rd generation Intel® Xeon® Scalable processor line builds on the strong foundations laid by the first two generations. The first generation of Intel® Xeon® Scalable introduced the Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instruction set which, with registers twice as wide and twice as numerous as those of the earlier AVX2 instruction set, can accelerate performance on demanding workloads, including deep learning. As Walter Riviera, AI Technical Solution Specialist for Intel, puts it, AVX-512 “allows you to crunch more instructions per cycle”.
However, 2nd and 3rd generation Intel® Xeon® Scalable processors target AI applications more specifically through a technology called Intel® DL Boost. At its core is a set of new instructions, dubbed Vector Neural Network Instructions, or VNNI.
Getting to grips with VNNI
To understand how this works, we need to take a quick step back and look at the foundation of many deep learning applications: the convolutional neural network. Before they can go to work on a dataset or stream of real-time data, these neural networks have to be trained, for instance to learn the properties that make an object in an image a cat, rather than a dog or a teddy bear. Once training is complete, the application goes to work, inferring new results by applying the artificial neurons in the network to new data in a series of steps known as convolutions. Before that happens, though, the neural network needs to be optimised, in order to maximise performance and minimise latency and power draw.
One approach to this is to “prune” the neural network, so that the artificial neurons that play a less active role can be removed without affecting accuracy. However, another, increasingly popular approach involves quantization: lowering the numerical precision of the data in the neural network from 32-bit floating point (FP32) to 8-bit integer (INT8). On its own this can improve throughput and processor and memory utilisation, simply because less data has to be transferred and processed for each convolution – the fundamental inferencing step.
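To make that concrete, here is a minimal sketch of the arithmetic behind quantization, using NumPy. The single scale and zero point shown is an illustrative simplification; production toolchains typically calibrate scales per channel and per layer.

```python
import numpy as np

def quantize_int8(x_fp32):
    """Map an FP32 tensor onto INT8 using one scale and zero point.

    This affine scheme is a common approach to post-training
    quantization; real frameworks calibrate far more carefully.
    """
    x_min, x_max = x_fp32.min(), x_fp32.max()
    scale = (x_max - x_min) / 255.0            # spread the range over 256 levels
    zero_point = np.round(-x_min / scale) - 128
    q = np.clip(np.round(x_fp32 / scale + zero_point), -128, 127)
    return q.astype(np.int8), scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover an FP32 approximation of the original values."""
    return (q.astype(np.float32) - zero_point) * scale

weights = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = quantize_int8(weights)
print("max quantization error:", np.abs(weights - dequantize(q, scale, zp)).max())
```

The printed error is the “sensitivity” lost in the round trip: every value now sits on one of just 256 levels, which is exactly the trade-off discussed later in this piece.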
However, VNNI takes this further. Most processors, even first-generation Xeon Scalable processors with AVX-512, take three instructions to work through a single convolution, but – thanks to its VNNI instructions – a 3rd gen Intel® Xeon® Scalable processor can do the job in just one. “In the older generations – let’s say ten years ago, or even seven – we had to perform three operations to perform a convolution,” explains Walter Riviera. “Now we can do it in one shot, and this is a huge advantage in speeding up.”
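Here is what that “one shot” looks like at the instruction level. Pre-VNNI AVX-512 chips run an INT8 multiply-accumulate as a three-instruction sequence (VPMADDUBSW, VPMADDWD, then VPADDD), while VNNI’s VPDPBUSD instruction does all three jobs at once. The following is a rough NumPy emulation of what VPDPBUSD computes – unsigned 8-bit activations times signed 8-bit weights, with each group of four products summed into a 32-bit accumulator.

```python
import numpy as np

def vpdpbusd_emulated(acc, a_u8, b_s8):
    """Emulate the arithmetic of the AVX-512 VNNI VPDPBUSD instruction.

    For every group of four bytes, multiply unsigned 8-bit activations
    by signed 8-bit weights and add the four products into a signed
    32-bit accumulator -- the inner loop of an INT8 convolution,
    collapsed into a single instruction on VNNI-capable silicon.
    """
    prods = a_u8.astype(np.int32) * b_s8.astype(np.int32)
    return acc + prods.reshape(-1, 4).sum(axis=1)

acc = np.zeros(4, dtype=np.int32)    # four 32-bit accumulators cover 16 input bytes
activations = np.random.randint(0, 256, 16).astype(np.uint8)
weights = np.random.randint(-128, 128, 16).astype(np.int8)
print(vpdpbusd_emulated(acc, activations, weights))
```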
You can gain the benefits with any deep learning framework that supports INT8. As Riviera says, “When they perform an operation like a convolution at that level – at that high level – a system enabled with VNNI and DL Boost will be able, automatically, to detect that instruction and know the proper set of routines to compute that convolution in a fast manner.” If you’re performing inferencing in INT8, you’re going to get an instant boost in computational speed.
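In practice, “supporting INT8” usually means running a model through a framework’s quantization workflow. As one illustration, here is a sketch using PyTorch’s post-training static quantization on a toy model – the exact module paths have shifted between PyTorch releases, so treat the calls as indicative rather than definitive.

```python
import torch
import torch.nn as nn

# A toy FP32 model; QuantStub/DeQuantStub mark where tensors
# cross between floating-point and quantized representations.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.relu(self.conv(self.quant(x))))

model = TinyNet().eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")  # x86 backend
torch.quantization.prepare(model, inplace=True)

# Calibration pass: run representative data so the observers
# can choose scales and zero points for each tensor.
with torch.no_grad():
    model(torch.randn(8, 3, 32, 32))

torch.quantization.convert(model, inplace=True)  # weights are now INT8
```

From here the converted model is used exactly like the original; on VNNI-capable hardware the INT8 convolutions are dispatched to the accelerated routines Riviera describes.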
Accelerating neural networks
The actual increase will vary from neural network to neural network. “When it comes to deep learning, each different model brings a different challenge,” Riviera notes. “It really depends on the use case, because it depends on how many convolutional steps you have in your network. If you have a gigantic neural network full of convolutional steps, that’s going to be accelerated, but if you have a tiny network, again made of convolutional steps, then that’s going to be even faster. There are plenty of parameters that are going to impact the performance.”
In fact, Riviera believes that VNNI won’t just accelerate convolutional neural networks, but also the recurrent neural networks used, for example, in natural language processing. “In a recurrent neural network, you might think that VNNI isn’t worth it, because it doesn’t contain too many convolutional steps. However, we have a great example demonstrating VNNI converting and analysing multiple sentences. Any type of deep learning can benefit from VNNI.”
Precision or performance?
Of course, going with INT8 and VNNI involves a trade-off. “As soon as you quantize the model, you’re losing sensitivity,” says Riviera, “because you’re not able to capture all the tiny numbers at the end.” This could stop your deep learning application from picking up on subtle differences, resulting in less accurate recognitions or predictions. However, the impact isn’t usually enough to make a significant difference in real-world use.
There are, however, applications in fields like healthcare where any drop in accuracy might be unacceptable, and where you want to work at full precision. “The user will always have the final word,” says Riviera, “and if they have a VNNI-capable machine, what they can do is convert the model and test it out.” If they’re happy with the level of accuracy, they can go ahead and use INT8 and VNNI. If they’re not, they can simply work at full precision. As a developer, the choice is yours.
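That “convert it and test it out” step can be as simple as scoring both versions of the model on the same validation set. In the hypothetical sketch below, fp32_model, int8_model and val_loader are placeholders for your own full-precision model, its quantized counterpart and your validation data.

```python
import torch

def accuracy(model, loader):
    """Top-1 accuracy of a classifier over a validation loader."""
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total

# fp32_model, int8_model and val_loader are placeholders for your
# own models and data; this script is a sketch, not a benchmark.
fp32_acc = accuracy(fp32_model, val_loader)
int8_acc = accuracy(int8_model, val_loader)
print(f"FP32 {fp32_acc:.4f} vs INT8 {int8_acc:.4f} "
      f"(drop: {fp32_acc - int8_acc:.4f})")
```

If the drop is within your tolerance, ship the INT8 model; if not, stay at full precision – exactly the decision Riviera leaves to the user.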
A DL ecosystem
The other crucial thing about Intel’s new architecture when it comes to deep learning is that the CPU isn’t the only element in the platform; it’s supported by an ecosystem of memory, storage, connectivity and software. The move to PCIe 4.0 connectivity makes it possible for the processor to work with newer, faster flash-based storage, while the latest Intel® Optane™ Persistent Memory 200 series modules deliver the fast, high-capacity memory needed to handle massive datasets.
VNNI is also supported by a full range of optimised frameworks, including TensorFlow, MXNet and PyTorch, so the technology works out of the box. Intel’s oneAPI unified programming model also comes into play, with libraries for AI and deep learning and support for a broad range of architectures and processors, including 3rd generation Intel® Xeon® Scalable processors. “Back in 2017-2018 we started defining ourselves not as a CPU-centric company anymore, but rather as a data-centric company,” says Riviera, “and the reason was that, as a technology leader, we get to look at the future and, possibly, help in inventing it.” Not only does Intel collaborate on DL frameworks in a hardware-agnostic manner; it even provides ready-made containers through its oneContainer portal, giving coders a platform on which to build their own deep learning applications. “We truly believe in the heterogeneous system view,” Riviera says.
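“Out of the box” still assumes the silicon underneath exposes VNNI. On Linux, one quick way to check is to look for the avx512_vnni flag the kernel reports in /proc/cpuinfo – a hedged convenience check, since other operating systems surface CPU features differently.

```python
def cpu_supports_vnni(cpuinfo_path="/proc/cpuinfo"):
    """Return True if the Linux kernel reports AVX-512 VNNI support.

    The kernel exposes the extension as the 'avx512_vnni' flag;
    other operating systems report CPU features in other ways.
    """
    with open(cpuinfo_path) as f:
        return "avx512_vnni" in f.read()

print("VNNI available:", cpu_supports_vnni())
```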
It’s this kind of hardware that can drive deep learning forward: making driverless cars a reality, improving diagnostic tools and treatments for diseases, helping governments run more effective public services – or simply getting more accurate at sorting through your photos when you upload them to the cloud. And 3rd generation Intel® Xeon® Scalable processors help to do this not through specialist hardware, but through CPUs that provide strong performance across a broad range of other workloads too. In doing so, they make deep learning something a wider range of businesses and institutions can use.
Learn more about 3rd Generation Intel® Xeon® Scalable Processors
Disclaimers
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure.
Your costs and results may vary.
Intel technologies may require enabled hardware, software or service activation.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.