Nvidia Embraces Deep Neural Nets With Volta

2017-05-31

At this year's GPU Technology Conference, Nvidia's premier conference for technical computing with graphics processors, the company reserved the top keynote for its CEO, Jensen Huang. Over the years, GTC has grown from a segment in a larger, mostly gaming-oriented and somewhat scattershot conference called "nVision" into one of the key conferences mixing academic and commercial high-performance computing.

Jensen's message was that GPU-accelerated machine learning is growing to touch every aspect of computing. While it's becoming easier to use neural nets, the technology still has a way to go to reach a broader audience. It's a hard problem, but Nvidia likes to tackle hard problems.

[Photo: Nvidia CEO Jensen Huang]

The Nvidia strategy is to disperse machine learning into every market. To accomplish this, the company is investing in its Deep Learning Institute, a training program to spread the deep learning neural net programming model to a new class of developers.

Much as Sun promoted Java with an extensive series of courses, Nvidia wants all programmers to understand neural net programming. With deep neural networks (DNNs) spreading into many market segments, and with cloud support from all major cloud service suppliers, deep learning (DL) can be everywhere -- accessible any way you want it, and integrated into every framework.

DL will also come to the edge; IoT will be so ubiquitous that we will need software writing software, Jensen predicted. The future of artificial intelligence is about the automation of automation.

Deep Learning Needs More Performance

Nvidia's conference is all about building a pervasive ecosystem around its GPU architectures, and the ecosystem influences the next GPU iteration as well. With early GPUs for high-performance computing and supercomputers, the market demanded more precise computation in the form of double-precision (fp64) floating point, and Nvidia was the first to add an fp64 unit to its GPUs.

GPUs are the predominant accelerator for machine learning training, but they can also be used to accelerate the inference (decision) execution process. Inference doesn't require as much precision, but it needs fast throughput. For that need, Nvidia's Pascal architecture can perform fast 16-bit floating-point (fp16) math.
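
To make that concrete, here is a minimal sketch -- not from the article -- of the packed-fp16 intrinsics CUDA exposes on Pascal-class GPUs; the kernel name and parameters are illustrative assumptions:

    #include <cuda_fp16.h>

    // Packed fp16 math on Pascal (compute capability 6.0 and later):
    // each __half2 holds two 16-bit values, and __hfma2 performs two
    // half-precision fused multiply-adds in a single instruction,
    // which is how fp16 doubles throughput over fp32.
    __global__ void axpy_half2(int n, __half2 a, const __half2 *x, __half2 *y)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = __hfma2(a, x[i], y[i]);  // y = a*x + y, two elements per thread
    }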

The newest GPU addresses the need for faster neural net processing by incorporating a processing unit dedicated to DNN tensor operations in its newest architecture -- Volta. The Volta GPU already has more cores and processing power than the fastest Pascal GPU, but in addition, its tensor core pushes DNN performance even further. The first Volta chip, the V100, is designed for the highest performance.

The V100 is a massive 21 billion transistors, built in semiconductor company TSMC's 12nm FFN high-performance manufacturing process. The 12nm process -- a shrink of the 16nm FF process -- allows the reuse of design models from 16nm, which reduces design time.

Even with the shrink, at 815mm² Nvidia pushed the size of the V100 die to the very limit of the optical reticle.

The V100 builds on Nvidia's work with the high-performance Pascal P100 GPU, keeping the same mechanical layout, electrical connections, and power requirements. This makes the V100 an easy upgrade from the P100 in rack servers.

For traditional GPU processing, the V100 has 5,120 CUDA (compute unified device architecture) cores. The chip is capable of 7.5 teraflops of fp64 math and 15 TFLOPS of fp32 math (5,120 cores × 2 floating-point operations per clock × a boost clock of roughly 1.46GHz ≈ 15 TFLOPS, with fp64 running at half that rate).

Feeding data to the cores requires an enormous amount of memory bandwidth. The V100 uses second-generation high-bandwidth memory (HBM2) to feed the chip at 900 Gigabytes/sec from 16GB of on-package memory.

While the V100 supports the traditional PCIe interface, the chip expands on that capability by delivering 300 GB/sec over six NVLink interfaces for GPU-to-GPU or GPU-to-CPU connections (presently, only IBM's POWER8 supports Nvidia's NVLink wire-based communications protocol).
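
As a hedged illustration of how software uses those links, a CUDA peer-to-peer copy rides NVLink when the topology provides it; the device numbers and buffer names below are assumptions for the sketch:

    #include <cuda_runtime.h>

    // Sketch: a GPU-to-GPU copy that travels over NVLink when the two
    // devices are connected by it (the same call falls back to PCIe otherwise).
    void copy_between_gpus(void *dst_on_gpu0, const void *src_on_gpu1, size_t nbytes)
    {
        int can_access = 0;
        cudaDeviceCanAccessPeer(&can_access, /*device=*/0, /*peerDevice=*/1);
        if (can_access) {
            cudaSetDevice(0);
            cudaDeviceEnablePeerAccess(/*peerDevice=*/1, /*flags=*/0);
        }
        // Copy from device 1's buffer into device 0's buffer over the fastest link.
        cudaMemcpyPeer(dst_on_gpu0, /*dstDevice=*/0, src_on_gpu1, /*srcDevice=*/1, nbytes);
    }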

However, the real change in Volta is the addition of the tensor math unit, which makes it possible to perform a 4x4x4 matrix operation in one clock cycle. The tensor core takes in two 4x4 matrices of 16-bit floating-point values, multiplies them, and adds the result to a third, accumulator matrix -- all in one clock cycle.

Internal computations in the tensor unit are performed with fp32 precision to ensure accuracy over many calculations. The V100 can perform 120 TFLOPS of tensor math using its 640 tensor cores (640 cores × 64 multiply-accumulates per clock × 2 operations each × roughly 1.46GHz ≈ 120 TFLOPS). This will make Volta very fast for both deep neural net training and inference.
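
For a sense of how this is exposed to programmers, below is a minimal sketch using the warp-matrix (WMMA) API that CUDA 9 introduced alongside Volta. One warp computes a 16x16 tile, which the hardware executes as a series of 4x4x4 tensor core operations; the layouts and leading dimensions are assumptions for illustration:

    #include <mma.h>
    using namespace nvcuda;

    // One warp computes D = A*B + C on a 16x16 tile: fp16 operands,
    // fp32 accumulation -- the tensor core's mixed-precision pattern.
    __global__ void tensor_tile(const half *a, const half *b, float *d)
    {
        wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
        wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
        wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;

        wmma::fill_fragment(acc, 0.0f);            // start the accumulator at zero
        wmma::load_matrix_sync(a_frag, a, 16);     // load fp16 inputs
        wmma::load_matrix_sync(b_frag, b, 16);
        wmma::mma_sync(acc, a_frag, b_frag, acc);  // multiply and accumulate in fp32
        wmma::store_matrix_sync(d, acc, 16, wmma::mem_row_major);
    }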

Because Nvidia already has built an extensive DNN framework with its cuDNN libraries, software will be able to use the new tensor units right out of the gate with a new set of libraries.
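
As an example of how little application code needs to change, cuDNN 7 added a math-type switch that lets an existing convolution opt into the tensor cores; a hedged fragment, with error handling omitted:

    #include <cudnn.h>

    // Opt a cuDNN convolution into tensor core math (cuDNN 7 and later).
    // With fp16 tensors, the library can then dispatch the convolution
    // to Volta's tensor units instead of the regular CUDA cores.
    cudnnConvolutionDescriptor_t make_tensor_op_conv()
    {
        cudnnConvolutionDescriptor_t conv;
        cudnnCreateConvolutionDescriptor(&conv);
        cudnnSetConvolutionMathType(conv, CUDNN_TENSOR_OP_MATH);
        return conv;
    }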

Nvidia will extend its support for DNN inference with TensorRT, which takes trained neural nets and compiles them into optimized models for real-time execution. The V100 already has a home waiting for it in Oak Ridge National Laboratory's Summit supercomputer.

Nvidia Drives AI Into Toyota

Bringing DL to a wider market also drove Nvidia to build a new computer for autonomous driving. The Xavier processor is the next-generation processor powering the company's Drive PX platform.

Toyota has chosen this new platform as the basis for its future production autonomous cars. Nvidia couldn't reveal any details of when we'll see Toyota cars using Xavier on the road, but there will be various levels of autonomy, including copiloting for commuting and "guardian angel" accident avoidance.

Unique to the Xavier processor is the DLA, a deep learning accelerator that offers 10 tera-operations per second (TOPS) of performance. The custom DLA will improve power efficiency and speed for specialized functions such as computer vision.

To spread the DLA's impact, Nvidia will open-source its instruction set and RTL for any third party to integrate. In addition to the DLA, the Xavier system-on-chip will have Nvidia's custom 64-bit ARM core and the Volta GPU.

Nvidia continues to execute on its high-performance computing roadmap and is starting to make major changes to its chip architectures to support deep learning. With Volta, Nvidia has made the most flexible and robust platform for deep learning, and it will become the standard against which all other deep learning platforms are judged.


Info Source: http://www.technewsworld.com/story/84528.html
