AI and Machine Learning Reviews

2019: A Cambrian Explosion In Deep Learning, Part 1

I started out writing a single blog on the coming year’s anticipated AI chips, and how NVIDIA might respond to the challenges, but I quickly realized it was going to be much longer than expected. Since there is so much ground to cover, I’ve decided to structure this as three hopefully more consumable articles. I’ve included links to previous missives for those wanting to dig a little deeper.

  1. Part 1: Introduction and the big players trying to attack NVIDIA: Intel, AMD, Google, Xilinx, Apple, and Qualcomm
  2. Part 2: Startups and China Inc. and the roles each may play
  3. Part 3: Potential NVIDIA strategies to fend off would-be challengers


In the last five years, NVIDIA grew its data center business into a multi-billion-dollar juggernaut without once facing a single credible competitor. This is an amazing fact, and one that is unparalleled in today’s technology world, to my recollection. Most of this meteoric growth was driven by demand for fast GPU chips for Artificial Intelligence (AI) and High-Performance Computing (HPC). NVIDIA’s CEO, Jensen Huang, likes to talk about the “Cambrian Explosion” in deep learning, referring specifically to the rapid pace of innovation in neural network algorithms. We’ll touch on what this means for NVIDIA in Part 3, but I chose to borrow the concept for the title of this series. We’re on the doorstep of an explosion in specialized AI silicon, from many large and small companies around the world. Three years ago, it was next to impossible to get venture funding for a silicon startup. Now, there are dozens of well-funded challengers building chips for AI.

Figure 1: NVIDIA likens the explosion of new types of neural networks to the Cambrian Era when life first emerged. (Credit: NVIDIA)

Last year, NVIDIA and IBM reached the pinnacle of computing with the announcement that they were powering the world’s fastest supercomputer, ORNL’s Summit (which owes some 95% of its performance to NVIDIA’s Volta GPUs). While this is an incredible accomplishment, many are beginning to wonder if this whole fairy tale can last for NVIDIA.

Figure 2: The Summit supercomputer at the Department of Energy’s Oak Ridge National Laboratory is now the fastest computer in the world. (Credit: NVIDIA)

In the latest reported quarter, NVIDIA data center revenue grew by 58% year-over-year to $792M, nearly 25% of the company’s total revenues. This amounts to a total of $2.86B over the last four quarters. If the company can maintain that growth, it could generate some $4.5B in data center revenue in 2019. Sounds like heaven, or at least heaven on earth, right?
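As a rough sanity check on that projection, here is a minimal sketch of the arithmetic. It assumes the 58% year-over-year rate simply carries forward for a full year and is applied to the trailing four-quarter total; the function name and that simplification are my own, not a forecasting method used by NVIDIA or anyone else.

```python
# Figures quoted in the paragraph above (USD billions).
trailing_four_quarters = 2.86   # data center revenue over the last four quarters
yoy_growth = 0.58               # 58% year-over-year growth in the latest quarter


def project_next_year(trailing_revenue, growth_rate):
    """Assumption: the latest year-over-year growth rate holds for a full year."""
    return trailing_revenue * (1.0 + growth_rate)


projected_2019 = project_next_year(trailing_four_quarters, yoy_growth)
print(f"Projected 2019 data center revenue: ${projected_2019:.1f}B")
```

Under that single assumption the result rounds to the roughly $4.5B figure cited; any slowdown in growth would pull the number down accordingly.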

Unquestionably, NVIDIA builds great products driven by its powerful vision of one extensible architecture. NVIDIA now enjoys a robust and self-sustaining ecosystem of software, universities, startups, and partners that have enabled it to become the master of its own newly created universe. While some would argue that this ecosystem creates an impenetrable defensive moat, storm clouds are now appearing on the horizon. Potential threats are coming from Intel, Google, AMD, and scores of US and Chinese startups, all drawn into the feeding frenzy of AI.

So far, in my opinion, the competition has mostly been smoke with very little fire. Dozens of announcements have been made by competitors, but I’m pretty confident that none of them have actually taken any revenue from NVIDIA’s coffers, outside of Google. Let’s look at the competitive landscape as it currently stands, looking towards what is shaping up to be a very interesting 2019.

The big challengers

While the New York Times counted over 40 startups entering this space, let’s be realistic: there is only room for a handful of companies to be truly successful in this market (say, revenues greater than $1B). For training Deep Neural Networks (DNNs), NVIDIA will be very hard to beat, given the strength of its products, its installed base, and its pervasive ecosystem. However, the inference market, which is currently quite small, will eventually exceed the training market in total revenue. Unlike training, inference is not a monolithic market. It is composed of a myriad of data types and associated optimized deep learning algorithms in the cloud and at the edge, each with specific performance, power, and latency requirements. Furthermore, there isn’t an 800-pound incumbent gorilla in inference, even in the automotive market where NVIDIA has laid claim to leadership. For these reasons, inference is where most of the new entrants will primarily or initially focus. Let’s look at the big players vying for a seat at the table.


Google

One of the first companies to demonstrate that a specialized chip (known as an ASIC, or Application-Specific Integrated Circuit) can counter the more programmable and general-purpose (I can’t believe I just said that!) GPU for Deep Learning was Google, which, coincidentally, is probably one of NVIDIA’s largest customers. As I’ve previously covered, Google has now released four “Tensor Processing Units” (TPUs): chips and boards that accelerate deep learning training and inference processing in the cloud and, more recently, at the edge. The performance of a Google TPU for training and processing a DNN is pretty solid, delivering up to 45 trillion operations per second, or TOPS, per chip. This compares to NVIDIA’s Volta, which peaks at 125 TOPS. The first couple of TPUs were really for internal use and bragging rights, but Google now makes them available as a service to its cloud customers on Google Cloud Platform.

While TPUs have certainly put a kick into Google’s AI step, the market they serve outside of Google’s internal use cases (which, granted, is a pretty large market) is intentionally limited. TPUs can only be used for training and running the Google TensorFlow AI framework; you cannot use them to train or run AI built with Apache MXNet or PyTorch, the fast-growing AI framework supported by Facebook and Microsoft. Nor can you use them for non-AI HPC applications, where GPUs reign supreme. Furthermore, you cannot buy TPUs for on-premises computing in corporate or government data centers and servers. But Google is happy with all that, because it views TPUs and TensorFlow as strategic to its AI leadership across the board. Software that is optimized for its hardware that is optimized for its software can make for a powerful and durable platform.

The more immediate impact of the TPU may be to validate the ASIC concept as an alternative to a GPU, at least for potential investors. The CEO of a Deep Learning chip startup shared with me that venture capital began flowing freely once Google announced its TPU. He has subsequently raised hundreds of millions of dollars.

Google has been adept at stealing some limelight from NVIDIA’s predictable announcements at the GPU Technology Conference (usually in March), and I would not be surprised to see the company at it again this year, perhaps with a 7nm TPU product with impressive performance numbers.

Not to be outdone, Amazon Web Services announced last fall that it, too, was building a custom ASIC for inference processing. However, the chip is still in development and the company did not share any details on the design or availability.


Intel

Figure 3: Former Nervana CEO Naveen Rao is leading AI product development at Intel and has been unusually transparent about the company’s strategies. (Credit: Intel)

This gets a little more complicated since Intel is such a large player and has at least one iron in every fire. While the company intends to compete for AI training and inference with Nervana chips in “late 2019,” it realizes that inference will become a larger market, and has a very strong hand to play. In addition to Xeon CPUs (which were recently updated with significantly improved inference performance), the company acquired Mobileye and Movidius, for automotive and embedded inference processing respectively. I’ve seen demos of both devices, and they are indeed impressive. Intel has also invested in a run-anywhere software stack, called OpenVINO, which allows developers to train anywhere and then optimize and run on any Intel processor. Smart.

In a revelation at CES in Las Vegas, Intel disclosed that it is working closely with Facebook on the inference version of the Nervana Neural Network Processor (NNP-I). This was surprising because many had predicted that Facebook was working on its own inference accelerator. Meanwhile, Naveen Rao, Intel’s VP and GM of AI products, shared on Twitter that the NNP-I will be an SoC (System-on-a-Chip), built in Intel’s 10nm fab, and will include Ice Lake x86 cores. Mr. Rao indicated this will be a common theme in the future for Intel, perhaps a reference to future x86/GPUs for desktop and laptop chips akin to AMD’s APUs.

For training, Intel’s original plan was to announce a “Lake Crest” Nervana NNP in 2017, a year after the Nervana acquisition. Then it slipped to 2018, and then, well, the company decided to start over. This was likely not because the first Nervana part wasn’t any good; rather, the company realized the device just wasn’t good enough to substantially outperform NVIDIA and the TensorCores it added to Volta and subsequent GPUs. We’ll see this movie play out again, I suspect, when NVIDIA unveils whatever surprises it is cooking up for its 7nm part, but I’m getting ahead of myself.

Qualcomm and Apple

I include these two companies for the sake of completeness, as both are delivering impressive AI capabilities focused on mobile handsets (and, in Qualcomm’s case, IoT devices and autonomous vehicles). Apple, of course, focuses on its A-series CPUs for iPhones and on iOS operating system support for in-phone AI. As mobile becomes a dominant platform for AI inference in speech and image processing, these two players have a lot of IP they can use to establish leadership (although Huawei is also pushing very hard on AI, as we’ll cover in Part 2).


AMD

AMD has been hard at work for the last three years getting its software house for AI in working order. When I worked there in 2015, you couldn’t even run its GPUs on a Linux server without booting Windows. The company has come a long way since then, with ROCm software and compilers to simplify migration from CUDA, and MIOpen (not to be confused with OpenML) to accelerate math libraries on its chips. Currently, however, AMD’s GPUs remain at least a generation behind NVIDIA V100s for AI, and the V100 is approaching two years old. It remains to be seen how well AMD can compete with NVIDIA TensorCores on 7nm. AMD may decide to focus more on the larger inference market, perhaps with a semi-custom silicon platform for autonomous vehicles, akin to the NVIDIA Xavier SoC. Time will tell.


Xilinx

Make no mistake, Xilinx, the leading vendor of programmable logic devices (FPGAs), had a fantastic 2018. In addition to announcing its next-generation architecture for 7nm, it scored significant design wins at Microsoft, Baidu, Amazon, Alibaba, Daimler Benz, and others. In AI inference processing, FPGAs have a distinct advantage over ASICs because they can be reconfigured on the fly for a specific job at hand. This matters a lot when the underlying technology is changing rapidly, as is the case for AI. Microsoft, for example, showed off how its FPGAs (now from Xilinx as well as Intel) can use 1-bit, 3-bit, or almost any precision math for specific layers in a deep neural network. This may sound like a nerdy nit, but it can dramatically speed processing and reduce latencies, all while using far less power. Furthermore, the upcoming 7nm chip from Xilinx, called Versal, has AI and DSP engines to speed up application-specific processing alongside the adaptable logic arrays. Versal will start shipping sometime this year, and I think it could be a game changer for inference processing.
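To make the low-precision point a bit more concrete, here is a minimal sketch of what choosing a bit width for one layer's weights can look like. It assumes plain uniform quantization over the layer's own value range, with made-up weights; the function names are hypothetical, and this is not how Microsoft's or Xilinx's FPGA toolchains actually implement it.

```python
def quantize(weights, bits):
    """Snap each weight to the nearest of 2**bits evenly spaced levels
    spanning the layer's own [min, max] range (assumed scheme)."""
    levels = 2 ** bits
    lo, hi = min(weights), max(weights)
    if hi == lo:                      # constant layer: nothing to quantize
        return list(weights)
    step = (hi - lo) / (levels - 1)   # spacing between adjacent levels
    return [lo + round((w - lo) / step) * step for w in weights]


def max_error(original, quantized):
    """Largest absolute difference introduced by quantization."""
    return max(abs(a - b) for a, b in zip(original, quantized))


# Made-up weights for one hypothetical layer (illustrative only).
weights = [-0.82, -0.31, -0.05, 0.0, 0.12, 0.47, 0.90]

for bits in (1, 3, 8):
    q = quantize(weights, bits)
    print(f"{bits}-bit: {len(set(q))} distinct values, "
          f"max error {max_error(weights, q):.3f}, "
          f"storage vs 32-bit floats: {bits / 32:.1%}")
```

The trade-off is the one described above: fewer bits mean less data to move and simpler arithmetic, at the cost of a coarser approximation, which is why being able to pick the precision layer by layer is attractive.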

In the second blog of this three-part series, I’ll explore a few of the startups in the West and in China that are lining up to play an important role in the world of AI hardware. Thanks for reading, and stay tuned!
