Hardware for Machine
Learning
CS6787 Lecture 11 — Fall 2017
Recap: modern ML hardware
- Lots of different types
- CPUs
- GPUs
- FPGAs
- Specialized accelerators
- Right now, GPUs are dominant …we’ll get to why later
What does a modern machine learning pipeline
look like?
- Many different components
- Preprocessing of the training set feeds DNN training
- The trained DNN then runs inference on new examples to be processed
- A toy end-to-end sketch of these stages is given below
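Purely as an illustration (not from the lecture), here is a minimal sketch of those three stages, with a tiny logistic model standing in for the DNN; all function names and sizes are hypothetical.

    import numpy as np

    def preprocess(X):
        # Preprocessing of the training set: standardize each feature.
        return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)

    def train(X, y, lr=0.1, epochs=200):
        # "DNN training" stand-in: gradient descent on a logistic model.
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            p = 1.0 / (1.0 + np.exp(-X @ w))
            w -= lr * X.T @ (p - y) / len(y)
        return w

    def infer(w, X_new):
        # Inference: run the trained weights on new examples.
        return (1.0 / (1.0 + np.exp(-X_new @ w)) > 0.5).astype(int)

    X = np.random.randn(200, 5)
    y = (X[:, 0] > 0).astype(int)
    w = train(preprocess(X), y)
    print(infer(w, preprocess(X))[:10])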
Where can hardware help?
- Everywhere!
- There’s interest in using hardware everywhere in the pipeline
- both adapting existing hardware architectures, and
- developing new ones
- What improvements can we get?
- Lower latency inference
- Higher throughput training
- Lower power cost
Why are GPUs so popular for
machine learning?
Why are GPUs so popular for
training deep neural networks?
FLOPS: GPU vs CPU
- FLOPS: floating point operations per second
- Figure: FLOPS over time for CPUs, GPUs, and Xeon Phis, from Karl Rupp’s blog: https://www.karlrupp.net/2016/ /flops-per-cycle-for-cpus-gpus-and-xeon-phis/ (this was the best diagram I could find that shows trends over time)
- GPU FLOPS consistently exceed CPU FLOPS; a quick way to measure this on your own machine is sketched below
- Intel Xeon Phi chips are compute-heavy manycore processors that compete with GPUs
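A rough way to see where a particular machine sits on that plot is to time a dense matrix multiply and convert the timing to FLOPS. The sketch below uses NumPy, so it measures the CPU through whatever BLAS backs it; the matrix size is an arbitrary choice.

    import time
    import numpy as np

    # An n x n by n x n matrix multiply takes about 2*n^3 floating point operations.
    n = 2048
    A = np.random.rand(n, n).astype(np.float32)
    B = np.random.rand(n, n).astype(np.float32)

    A @ B                          # warm-up so one-time setup costs are not timed
    start = time.time()
    C = A @ B
    elapsed = time.time() - start

    print(f"~{2 * n**3 / elapsed / 1e9:.1f} GFLOPS achieved on this CPU (via BLAS)")
    # Running the same multiply on a GPU (e.g. through CuPy or PyTorch with CUDA)
    # typically reports roughly an order of magnitude more FLOPS.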
Memory bandwidth: CPU vs GPU
- GPUs have higher memory bandwidths than CPUs
- E.g. the new NVIDIA Tesla V100 has a claimed 900 GB/s memory bandwidth
- Whereas an Intel Xeon E7 has only about 100 GB/s of memory bandwidth
- But, this comparison is unfair!
- GPU memory bandwidth is the bandwidth to GPU memory
- E.g. over a PCIe 2 link, the bandwidth to the GPU is only about 32 GB/s (see the back-of-the-envelope numbers below)
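To see why the comparison needs care, consider a memory-bound step like an SGD-style weight update, where time is set by how many bytes move and over which link. The bandwidth figures below are the ones quoted on this slide; the parameter count is an arbitrary choice.

    # Memory-bound step: w -= lr * g over a large float32 vector.
    n = 100_000_000                      # 100M parameters (arbitrary)
    bytes_moved = 3 * n * 4              # read w, read g, write w (4 bytes each)
    flop = 2 * n                         # one multiply and one add per element

    for name, bw in [("GPU memory (V100, ~900 GB/s)", 900e9),
                     ("CPU memory (Xeon, ~100 GB/s)", 100e9),
                     ("PCIe link to the GPU (~32 GB/s)", 32e9)]:
        print(f"{name}: ~{bytes_moved / bw * 1e3:.1f} ms per update")

    intensity = flop / bytes_moved       # ~0.17 flop/byte: bandwidth-bound, not compute-bound
    print(f"arithmetic intensity: {intensity:.2f} flop/byte")
    # The 9x bandwidth gap is real, but it only helps once the data already
    # sits in GPU memory rather than on the other side of the PCIe link.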
Challengers to the GPU
- More compute-intensive CPUs
- Like Intel’s Phi line, which promises the same level of compute performance and better handling of sparsity
- Low-power devices
- Like mobile-device-targeted chips
- Configurable hardware like FPGAs and CGRAs
- Accelerators that speed up matrix-matrix multiply
Will all computation become
dense matrix-matrix multiply?
What if dense matrix multiply takes over?
- Great opportunities for new highly specialized hardware
- The TPU is already an example of this
- It’s a glorified matrix-matrix multiply engine (see the sketch below)
- Significant power savings from specialized hardware
- But not as much as we could get by exploiting something like sparsity
- It might put us all out of work
- Who cares about researching algorithms when there’s only one algorithm anyone cares about?
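One concrete reason a "glorified matrix-matrix multiply engine" covers so much: the forward pass of a fully-connected network is just a chain of dense GEMMs plus cheap elementwise operations. A minimal sketch, with layer sizes chosen arbitrarily:

    import numpy as np

    def forward(x_batch, weights):
        # Each layer is one dense matrix-matrix multiply followed by an elementwise ReLU.
        h = x_batch
        for W in weights:
            h = np.maximum(h @ W, 0.0)
        return h

    batch = np.random.randn(64, 784).astype(np.float32)
    layers = [np.random.randn(784, 256).astype(np.float32),
              np.random.randn(256, 256).astype(np.float32),
              np.random.randn(256, 10).astype(np.float32)]
    print(forward(batch, layers).shape)   # (64, 10)
    # Convolutions can be lowered to the same primitive (e.g. via im2col),
    # which is why so much of deep learning maps onto dense matmul hardware.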
What if matrix multiply doesn’t take over?
- Great opportunities for designing new heterogeneous, application-specific hardware
- We might want one chip for SVRG, one chip for low-precision
- Interesting systems/framework opportunities to give users suggestions for which chips to use
- Or even to automatically dispatch work within a heterogeneous datacenter (a toy dispatcher is sketched below)
- Community might fragment
- Into smaller subgroups working on particular problems
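As a purely hypothetical illustration of automatic dispatch in a heterogeneous datacenter, a framework might inspect a few workload features and route each job to a matching chip; every device name and threshold below is invented.

    # Toy dispatcher: route a job to a device based on simple workload features.
    # All device names and thresholds are invented for illustration.
    def pick_device(job):
        if job["sparsity"] > 0.9:
            return "sparse-accelerator"   # hypothetical chip specialized for sparse ops
        if job["precision_bits"] <= 8:
            return "low-precision-asic"   # hypothetical dedicated low-precision chip
        if job["kind"] == "training":
            return "gpu"
        return "cpu"

    jobs = [{"kind": "training", "sparsity": 0.95, "precision_bits": 32},
            {"kind": "inference", "sparsity": 0.10, "precision_bits": 8}]
    print([pick_device(j) for j in jobs])   # ['sparse-accelerator', 'low-precision-asic']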
Recent work on hardware for
machine learning
Abstracts from papers at architecture conferences this year
Questions?
- Conclusion
- Lots of interesting work on hardware for machine learning
- Lots of opportunities for interdisciplinary research
- Upcoming things
- Paper Review #10 — due today
- Project proposal — due today
- Paper Presentation #11 on Wednesday — TPU