A New Golden Age for Computer Architecture:
History, Challenges, and Opportunities

David Patterson
UC Berkeley and Google
August 22, 2019

Full Turing Lecture: https://www.acm.org/hennessy-patterson-turing-lecture

Lessons of last 50 years of Computer Architecture

1. Software advances can inspire architecture innovations

2. Raising the hardware/software interface creates opportunities for architecture innovation

3. Ultimately the marketplace settles architecture debates

IBM Compatibility Problem in Early 1960s

By the early 1960s, IBM had 4 incompatible lines of computers!

▪ 701 → 7094
▪ 650 → 7074
▪ 702 → 7080
▪ 1401 → 7010

Each system had its own:

▪ Instruction set architecture (ISA)
▪ I/O system and secondary storage: magnetic tapes, drums and disks
▪ Assemblers, compilers, libraries, ...
▪ Market niche: business, scientific, real time, ...

IBM System/360: one ISA to rule them all

Control versus Datapath

▪ Processor designs split between datapath, where numbers are stored and arithmetic operations computed, and control, which sequences operations on datapath
▪ Biggest challenge for computer designers was getting control correct

Maurice Wilkes invented the idea of microprogramming to design the control unit of a processor*

▪ Logic expensive vs. ROM or RAM
▪ ROM cheaper and faster than RAM
▪ Control design now programming

[Figure: block diagram of Control, Datapath (PC, Inst. Reg., Registers, ALU), and Main Memory, with signals labeled Condition?, Control Lines, Instruction, Busy?, Address, and Data.]

* "Micro-programming and the design of the control circuits in an electronic digital computer," M. Wilkes and J. Stringer, Mathematical Proc. of the Cambridge Philosophical Society, Vol. 49, 1953.
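
The point that control design becomes programming can be illustrated with a toy model. This is only a minimal sketch, not the Wilkes and Stringer design or any IBM 360 microcode: the three-instruction accumulator machine, the micro-operation names, and the `CONTROL_STORE` table are all hypothetical. The control store stands in for the ROM, and each instruction is simply a stored sequence of micro-operations.

```python
# Toy microprogrammed control unit (hypothetical machine, for illustration only).
# The control store plays the role of the ROM: each macro-instruction is just
# a stored sequence of micro-operations that the control unit steps through.

CONTROL_STORE = {
    "LOAD":  ["mar<-addr", "mdr<-mem[mar]", "acc<-mdr"],
    "ADD":   ["mar<-addr", "mdr<-mem[mar]", "acc<-acc+mdr"],
    "STORE": ["mar<-addr", "mdr<-acc", "mem[mar]<-mdr"],
}


def apply_micro_op(state, micro_op, addr):
    """Apply one micro-operation to the datapath state (registers + memory)."""
    if micro_op == "mar<-addr":
        state["mar"] = addr
    elif micro_op == "mdr<-mem[mar]":
        state["mdr"] = state["mem"][state["mar"]]
    elif micro_op == "acc<-mdr":
        state["acc"] = state["mdr"]
    elif micro_op == "acc<-acc+mdr":
        state["acc"] += state["mdr"]
    elif micro_op == "mdr<-acc":
        state["mdr"] = state["acc"]
    elif micro_op == "mem[mar]<-mdr":
        state["mem"][state["mar"]] = state["mdr"]
    else:
        raise ValueError(f"unknown micro-op: {micro_op}")


def run(program, memory):
    """Execute (opcode, address) pairs by sequencing their micro-ops."""
    state = {"acc": 0, "mar": 0, "mdr": 0, "mem": list(memory)}
    for opcode, addr in program:
        for micro_op in CONTROL_STORE[opcode]:
            apply_micro_op(state, micro_op, addr)
    return state


# mem[2] = mem[0] + mem[1]
final = run([("LOAD", 0), ("ADD", 1), ("STORE", 2)], [2, 3, 0])
print(final["mem"])  # [2, 3, 5]
```

In this sketch, adding an instruction means adding a row to the table rather than designing new logic, which is the sense in which control becomes a programming problem.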


Microprogramming in IBM 360

Model                     M30          M40          M50          M
Datapath width            8 bits       16 bits      32 bits      64 bits
Microcode size            4k x 50      4k x 52      2.75k x 85   2.75k x 87
Clock cycle time (ROM)    750 ns       625 ns       500 ns       200 ns
Main memory cycle time    1500 ns      2500 ns      2000 ns      750 ns
Price (1964 $)            $192,000     $216,000     $460,000     $1,080,
Price (2018 $)            $1,560,000   $1,760,000   $3,720,000   $8,720,

Fred Brooks, Jr.

IC Technology, Microcode, and CISC

▪ Logic, RAM, ROM all implemented using same transistors

▪ Semiconductor RAM ≈ same speed as ROM

▪ With Moore’s Law, memory for control store could grow

▪ Since RAM, easier to fix microcode bugs

▪ Allowed more complicated ISAs (CISC)

▪ Minicomputer (TTL server) example:

  • Digital Equipment Corp. (DEC)
  • VAX ISA in 1977
  • 5K x 96b microcode

Microprocessor Evolution

▪ Rapid progress in 1970s, fueled by advances in MOS technology, imitated minicomputers and mainframe ISAs

▪ “Microprocessor Wars”: compete by adding instructions (easy for microcode), justified given assembly language programming

▪ Intel iAPX 432: Most ambitious 1970s micro, started in 1975

  • 32-bit capability-based, object-oriented architecture, custom OS written in Ada
  • Severe performance, complexity (multiple chips), and usability problems; announced 1981

▪ Intel 8086 (1978, 8MHz, 29,000 transistors)

  • “Stopgap” 16-bit processor, 52 weeks to new chip
  • ISA architected in 3 weeks (10 person weeks), assembly-compatible with 8-bit 8080

▪ IBM PC 1981 picks Intel 8088 for 8-bit bus (and Motorola 68000 was late)

  • Estimated PC sales: 250,
  • Actual PC sales: 100,000,000 ⇒ 8086 “overnight” success

▪ Binary compatibility of PC software ⇒ bright future for 8086

Analyzing Microcoded Machines 1980s

HW/SW interface rises from assembly to HLL programming

▪ Compilers now source of measurements

▪ John Cocke group at IBM

  • Worked on a simple pipelined processor, 801 minicomputer (ECL server), and advanced compilers inside IBM
  • Ported their compiler to IBM 370, only used simple register-register and load/store instructions (similar to 801)
  • Up to 3X faster than existing compilers that used full 370 ISA!

▪ Emer and Clark at DEC in early 1980s*

  • Found VAX 11/780 average clock cycles per instruction (CPI) = 10!
  • Found 20% of VAX ISA ⇒ 60% of microcode, but only 0.2% of execution time!

* "A Characterization of Processor Performance in the VAX-11/780," J. Emer and D. Clark, ISCA, 1984.

John Cocke
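
To see what a CPI of 10 means for delivered performance, the standard relation CPU time = instruction count × CPI × clock cycle time can be evaluated directly. The sketch below takes CPI = 10 from the slide; the 200 ns cycle time and the one-million-instruction workload are assumptions chosen for illustration, not figures from the talk.

```python
# CPU time = instruction count x CPI x clock cycle time.
# CPI = 10 comes from the slide; the cycle time below is an ASSUMPTION
# used only to make the arithmetic concrete.

def cpu_time_seconds(instruction_count, cpi, cycle_time_s):
    """Execution time for a program of `instruction_count` instructions."""
    return instruction_count * cpi * cycle_time_s


def native_mips(cpi, cycle_time_s):
    """Millions of native instructions completed per second."""
    return 1.0 / (cpi * cycle_time_s) / 1e6


VAX_CPI = 10                    # from the slide
ASSUMED_CYCLE_TIME_S = 200e-9   # assumption, not from the slide

print(cpu_time_seconds(1_000_000, VAX_CPI, ASSUMED_CYCLE_TIME_S))
print(native_mips(VAX_CPI, ASSUMED_CYCLE_TIME_S))
```

Under these assumptions, halving CPI halves execution time for the same instruction count and clock, which is why measured CPI mattered so much to the RISC argument.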

Moore’s Law Slowdown in Intel Processors

Moore, Gordon E. "No exponential is forever: but ‘Forever’ can be delayed!" Solid-State Circuits Conference, 2003.

We’re now in the Post Moore’s Law Era

[Figure annotation: 15X]

Technology & Power: Dennard Scaling

Energy scaling for fixed task is better, since more and faster transistors

Power consumption based on models in “Dark Silicon and the End of Multicore Scaling,” Hadi Esmaeilzadeh, ISCA, 2011

End of Growth of Single Program Speed?

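[Figure: single-program performance over time, annotated with the growth rate of each era]

▪ CISC: 2X / 3.5 yrs (22%/yr)
▪ RISC: 2X / 1. yrs (52%/yr)
▪ End of Dennard Scaling ⇒ Multicore: 2X / 3. yrs (23%/yr)
▪ Amdahl’s Law ⇒ 2X / 6 yrs (12%/yr)
▪ End of the Line? 2X / 20 yrs (3%/yr)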

Based on SPECintCPU. Source: John Hennessy and David Patterson, Computer Architecture: A Quantitative Approach, 6/e. 2018
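
The annual growth rates and doubling times on this slide are two views of the same compound-growth number. As a rough check, a doubling time follows from an annual rate r as ln 2 / ln(1 + r). The sketch below uses only the percentages given on the slide; the function and dictionary names are illustrative, and the slide's own doubling times are rounded, so small differences are expected.

```python
import math


def doubling_time_years(annual_growth):
    """Years needed to double at a compound annual rate (0.22 means 22%/yr)."""
    return math.log(2) / math.log(1 + annual_growth)


# Annual growth rates as given on the slide.
ERAS = {
    "CISC": 0.22,
    "RISC": 0.52,
    "Multicore": 0.23,
    "Amdahl's Law": 0.12,
    "End of the Line?": 0.03,
}

for era, rate in ERAS.items():
    print(f"{era}: 2X every {doubling_time_years(rate):.1f} years")
```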

Current Security Challenge

● Spectre: speculation ⇒ timing attacks that leak ≥10 kb/s

● More microarchitecture attacks on the way*

● Spectre is bug in computer architecture definition vs chip

● Need Computer Architecture 2.0 to prevent timing leaks**

● Software not yet secure ⇒ how can hardware help?

* “A Survey of Microarchitectural Timing Attacks and Countermeasures on Contemporary Hardware,” Qian Ge, Yuval Yarom, David Cock, and Gernot Heiser, Journal of Cryptographic Engineering, April 2018

** “A Primer on the Meltdown & Spectre Hardware Security Design Flaws and their Important Implications,” Mark Hill, 2/15/18, Computer Architecture Today

Looks Bad!

"What we have before us are some breathtaking opportunities disguised as insoluble problems."

- John Gardner, 1965

What Opportunities Left? (Part I)

▪ SW-centric

  • Modern scripting languages are interpreted, dynamically-typed and encourage reuse

  • Efficient for programmers but not for execution

▪ HW-centric

  • Only path left is Domain Specific Architectures
  • Just do a few tasks, but extremely well

▪ Combination:

  • Domain Specific Languages & Architectures
  • Raises level of HW/SW Interface


What’s the Opportunity?

Matrix Multiply: relative speedup to a Python version (on 18 core Intel CPU)

[Figure annotations: successive speedups of 50X, 7X, 20X, and 9X; 63,000X overall]

from: “There’s Plenty of Room at the Top,” Leiserson et al., Science, to appear.
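
The step annotations on this figure multiply out to the headline number, which is a quick way to confirm they are successive rather than independent speedups. The sketch below uses only the factors shown on the slide; which optimization each factor corresponds to is not recoverable from the extracted text, so the list is left unlabeled.

```python
import math

# Step-by-step speedup factors as annotated on the slide's figure.
STEP_SPEEDUPS = [50, 7, 20, 9]


def overall_speedup(steps):
    """Successive speedups compound multiplicatively."""
    return math.prod(steps)


print(overall_speedup(STEP_SPEEDUPS))  # 63000
```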

What Opportunities Left?

▪ Only performance path left is Domain Specific Architectures (DSAs)

  • Just do a few tasks, but extremely well

▪ Achieve higher efficiency by tailoring the architecture to characteristics of the domain

▪ Not one application, but a domain of applications

▪ Different from strict ASIC since still runs software

Perf/Watt TPU vs CPU & GPU

Using production applications vs contemporary CPU and GPU

Measure performance of Machine Learning?

See MLPerf.org (“SPEC for ML”)

● Benchmark suite being developed by 23 companies and 7 universities
● 1st Results Public 12/12/

ML Training Trends

[Figure: ML Training compute vs. Moore’s Law trend lines; Moore’s Law performance doubles every 18 months]

Since 2012, AI training state of the art compute demand 10X per year! (Moore’s Law “only” 10X in 5 years)

From “AI and Compute,” Dario Amodei and Danny Hernandez, May 16, 2018
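
The "only 10X in 5 years" figure follows from the 18-month doubling period quoted on the slide. A short sketch of the arithmetic, with illustrative function and variable names:

```python
def growth_factor(years, doubling_period_years):
    """Total growth after `years` when the quantity doubles every period."""
    return 2 ** (years / doubling_period_years)


moore_5yr = growth_factor(5, 1.5)  # doubling every 18 months
ai_5yr = 10 ** 5                   # 10X per year, compounded over 5 years

print(f"{moore_5yr:.1f}X vs {ai_5yr:,}X")  # 10.1X vs 100,000X
```

At the rates stated on the slide, five years of training-compute demand growth would outpace the Moore's Law trend by roughly four orders of magnitude.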

Training: TPUv2 (5/2017), TPUv3 (5/2018)

▪ TPUv2: Peak 11.5 PetaFLOP/s
▪ TPUv3: Peak >100 PetaFLOP/s

ResNet-50 Speedup: Batch Size, Optimizer, Accuracy

Ying, C., Kumar, S., Chen, D., Wang, T. and Cheng, Y., December 2018. Image Classification at Supercomputer Scale. arXiv preprint arXiv:1811..

Current Neural Network Architecture Debate

● Google TPU: 1 core per chip, large 2D multiplier, software controlled memory (instead of caches)

● NVIDIA GPU: 80 cores, many threads (20MB registers), small multipliers, caches, scatter/gather & coalescing HW

● Microsoft FPGA: customize “hardware” to application

● Intel CPU: 30+ cores, 3 levels of caches, SIMD instructions

  • Also bought Altera that supplies Microsoft’s FPGAs
  • Also bought Nervana, Movidius, MobilEye to offer custom chip DSA

● > 100 startups with their own architecture bets

● #3. Ultimately the marketplace settles architecture debates

Cerebras announces ML Training “Chip” 8/19/

300 mm (12 inch) wafer

215 x 215 mm (8.5 x 8.5 inch) “chip”

What Opportunities Left? (Part II)

Software advances can inspire architecture innovations

● Why open source compilers and operating systems but not ISAs?

NVDLA: An Open DSA and Implementation

● NVDLA: NVIDIA Deep Learning Accelerator for DNN Inference

● Free & Open: All SW, HW, and documentation on GitHub

● Scalable, configurable design

● Each block operates independently or in pipeline to bypass memory

● Data type configurable: int8, int16, fp16,

● 2D MAC array configurable: 8 to 64 x 4 to 64

● Size scales 6X (0.5 - 3 mm^2), power scales 15X (20 - 300 mW)

● RISC-V core as host (optional)
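
The scaling factors quoted for NVDLA can be recomputed from the ranges on the slide. This is a small arithmetic sketch, not NVDLA configuration code: the constant names are hypothetical, and reading "8 to 64 x 4 to 64" as the two dimensions of the MAC array is an assumption.

```python
# Ranges from the slide; names are hypothetical. Interpreting
# "8 to 64 x 4 to 64" as the two MAC-array dimensions is an ASSUMPTION.
MAC_DIM_A = (8, 64)
MAC_DIM_B = (4, 64)
AREA_MM2 = (0.5, 3.0)
POWER_MW = (20.0, 300.0)


def scale(lo_hi):
    """Ratio of the largest to the smallest configuration."""
    lo, hi = lo_hi
    return hi / lo


min_macs = MAC_DIM_A[0] * MAC_DIM_B[0]  # smallest array: 32 MACs
max_macs = MAC_DIM_A[1] * MAC_DIM_B[1]  # largest array: 4096 MACs

print(max_macs // min_macs, scale(AREA_MM2), scale(POWER_MW))  # 128 6.0 15.0
```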

Security and Open Architecture

● Security community likes simple, verifiable (no trap doors), alterable, free and open architecture and implementations

● Equally important is number of people and organizations performing architecture experiments

● Want all the best minds to work on security

● Plasticity of FPGAs + open source RISC-V implementations and SW ⇒ novel architectures can be deployed online, subjected to real attacks, evaluated & iterated in weeks vs years (even 100 MHz OK)

● RISC-V may become security exemplar via HW/SW codesign by architects and security experts

What Opportunities Left? (Part III)

▪ Software advances can inspire innovations

▪ Agile: small teams do short development between working but incomplete prototypes and get customer feedback per step

▪ Scrum team organization

  • 5 - 10 person team size
  • 2 - 4 week sprints for next prototype iteration

▪ New CAD enables SW Dev techniques to make small teams productive via abstraction & reuse

⇒ Agile Hardware Development

Agile Hardware Development Methodology

[Figure: stages of the agile hardware flow: C++, FPGA, ASIC Flow, Tape-in, Tape-out, Big Chip Tape-out]

▪ Small chip tape-out: 100 chips 1x1mm @ 28nm is affordable at

▪ AWS FPGA F1 instance ⇒ develop new prototypes using cloud (nothing to buy)

Lee, Y., Waterman, A., Cook, H., Zimmer, B., Keller, B., Puggelli, A., ... & Chiu, P. F. (2016). “An agile approach to building RISC-V microprocessors.” IEEE Micro, 36(2), 8-20.

Lessons of last 50 years of Computer Architecture

1. Software advances can inspire architecture innovations

○ Microprogramming - control as SW

○ RISC, x86 ISA - (Hardware) translator vs interpreter

○ Open Architectures & Implementations

○ Agile Hardware Development

2. Raising the HW/SW interface enables arch. opportunities

○ Assembly to HLL ⇒ RISC

○ HLL to Domain Specific Language⇒DSA

3. Ultimately the marketplace settles architecture debates

○ Losers: 432

○ Winners: IBM S/360, 8086 (PC Era), RISC (Post PC Era)

○ Open vs Proprietary ISA (RISC-V vs ARM): Too soon to tell

○ ML DSA (SIMD vs GPU vs TPU vs FPGA vs startups): Too soon to tell


Questions?


Quantum Computing to the Rescue?

● Google, IBM, Microsoft pursuing Quantum Computing

● Physics, Math, Theory results are beautiful

● For Cloud, not Client

● #1 Recommendation of Quantum Workshop May 2018:*

First and foremost, there is an overarching need for new Quantum Computing algorithms that can make use of the limited qubit counts and precisions available in the foreseeable future. Without a “killer app” or at least a useful app runnable in the first ten years, progress may stall.

* “Next Steps in Quantum Computing: Computer Science’s Role,” May 22-23, 2018, Washington D.C., Computing Community Consortium

Quantum Computing to the Rescue?

● Quantum Computing - Progress and Prospects*

● 12/2018 consensus study from National Academies

● “Significant technical and financial issues remain towards building a large, fault-tolerant quantum computer and one is unlikely to be built within the coming decade.”

Gwynne, Peter (2019). “Practical quantum computers still at least a decade away.” Physics World, 32, 9. doi:10.1088/2058-7058/32/1/14.

* Mark Horowitz (Chair, NAE, Stanford, EE), Alán Aspuru-Guzik (U. Toronto, Chemistry), David Awschalom (NAE & NAS, U. Chicago, Physics), Robert Blakley (Citigroup), Dan Boneh (NAE, Stanford, CS), Susan Coppersmith (NAS, U. Wisconsin, Physics), Jungsang Kim (Duke, Physics & CS), John Martinis (UCSB & Google), Margaret Martonosi (Princeton, CS), Michele Mosca (U. Waterloo, Math & Physics), William Oliver (MIT, Physics), Krysta Svore (Microsoft), Umesh Vazirani (NAE, Berkeley, CS), National Academies, Washington D.C. https://www.nap.edu/catalog/25196/quantum-computing-progress-and-prospects

Need Free & Open Specification To Have Free & Open Designs

[Figure, built up over several slides: Specifications (Free & Open Spec, Licensable Spec, Closed Spec), Designs (“Source”) (Free & Open Designs, Licensable Designs, Closed Designs), and Products]

▪ Free & Open Spec ⇒ products based on Free & Open, Licensed, Closed Designs
▪ Licensable Spec ⇒ products based on Licensed or Closed Designs
▪ Closed Spec ⇒ products based on Closed Designs

[Figure annotations: “Open Source”; $5M + 4%; $25M]

OURS Pygmy microprocessor

  • 28nm HPC+ TSMC @ 600 MHz
    • From scratch to tapeout ~7 months (Thanks to the RISC-V infrastructure)
  • Full RISC-V based heterogeneous multicore architecture
  • 64-bit control processor (RV64g)
    • ~ 10mW active
  • 12 energy-efficient AI engines based on custom RV vector extensions
    • INT8 : ~4 TOPS/watt
    • FP16 : ~0.35 TOPS/watt
  • 1MB SRAM, LPDDR4 support
  • Retail price < $

OURS (睿思芯科) energy-efficient RISC-V AI Chip for IoT