Case Studies of Multi-core: POWER4, POWER5, Intel Montecito, and Sun Niagara, Slides of Computer Science

Case studies of multi-core architectures: POWER4, POWER5, Intel Montecito, and Sun Niagara. The slides cover features, pipeline details, cache hierarchy, power efficiency, and die photos. POWER4 has a 32 MB, 8-way associative L3 cache, while POWER5 moves the L3 cache to the processor side and increases the L2 and L3 cache sizes. Intel Montecito is a dual-core Itanium 2 with separate L2 instruction and data caches. Sun Niagara has eight pipelines or cores, each shared by four threads, and a shared 3 MB L2 cache.

Typology: Slides

2012/2013

Uploaded on 03/28/2013

ekanath

Module 8: Memory Consistency Models and Case Studies of Multi-core
Lecture 16: Case Studies of Multi-core
The Lecture Contains:
POWER4 L3 Cache
POWER4 Die Photo
IBM POWER5
POWER5 Die Photo
Features
Overview
Power Efficiency
Foxton Technology
Die Photo
Features
Pipeline Details
Cache Hierarchy

POWER4 L3 Cache

On-chip tag (IBM calls it the directory), off-chip data
32 MB, 8-way set associative, 512-byte lines
Contains eight coherence/snoop controllers
Does not maintain inclusion with L2, so the L3 must also snoop the fabric interconnect
Maintains five coherence states
Putting the L3 cache on the other side of the fabric requires every L2 cache miss (even a local miss) to cross the fabric, which increases latency considerably
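The geometry quoted above (32 MB, 8-way, 512-byte lines) fixes the number of sets and the address bits used for the line offset and the set index. The following is a minimal sketch of that arithmetic; the helper name `cache_geometry` is made up for illustration and assumes power-of-two sizes.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Return (sets, offset_bits, index_bits) for a set-associative cache.

    Assumes size, associativity and line size are powers of two.
    """
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1  # log2(line size)
    index_bits = sets.bit_length() - 1         # log2(number of sets)
    return sets, offset_bits, index_bits


# POWER4 L3 figures from the slide: 32 MB, 8-way, 512-byte lines
sets, offset_bits, index_bits = cache_geometry(32 * 1024 * 1024, 8, 512)
print(sets, offset_bits, index_bits)  # 8192 9 13
print(sets * 8)                       # 65536 lines, each needing a tag entry
```

Under these figures the on-chip directory has to hold 65536 tag entries, which is far smaller than the 32 MB of data kept off chip.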


IBM POWER5

Same pipeline structure as POWER4
Added SMT facility
Like Pentium 4, fetches from each thread in alternate cycles (8-instruction fetch per cycle, just like POWER4)
Threads share the ITLB and ICache
Larger register file than POWER4 to support two threads: 120 integer and 120 floating-point registers (POWER4 has 80 integer and 72 floating-point registers). This also improves single-thread performance compared to POWER4. The smaller technology (0.13 μm) made it possible to access a bigger register file in the same or shorter time, so the pipeline stays the same as POWER4.
Doubled associativity of L1 caches to reduce conflict misses: icache is 2-way and dcache is 4-way
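The alternate-cycle fetch policy above can be sketched as a strict round-robin over two threads. This is an idealized model: it assumes every fetch delivers the full 8 instructions, whereas real hardware can deliver fewer (for example at cache-line boundaries or taken branches). The function name `fetch_schedule` is hypothetical.

```python
def fetch_schedule(cycles, threads=2, fetch_width=8):
    """Return (cycle, thread_id, instructions_fetched) per cycle.

    Idealized: the fetching thread alternates every cycle and always
    gets a full fetch_width group of instructions.
    """
    return [(c, c % threads, fetch_width) for c in range(cycles)]


schedule = fetch_schedule(4)
per_thread = {}
for _, tid, n in schedule:
    per_thread[tid] = per_thread.get(tid, 0) + n

print(schedule)    # [(0, 0, 8), (1, 1, 8), (2, 0, 8), (3, 1, 8)]
print(per_thread)  # {0: 16, 1: 16}
```

In this model each thread sees half of the peak fetch bandwidth, which is the cost of sharing the front end between two threads.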


IBM POWER5

Dynamic power management
With SMT and CMP, the average amount of switching per cycle increases, leading to more power consumption
Need to reduce power consumption without losing performance: the simple solution is to clock the chip at a slower frequency, but that hurts performance
POWER5 employs fine-grain clock-gating: in every cycle the power management logic decides if a certain latch will be used in the next cycle; if not, it disables or gates the clock for that latch so that it will not unnecessarily switch in the next cycle
Clock-gating and power management logic themselves should be very simple
If both threads are running at priority level 1, the processor switches to a low power mode where it dispatches instructions at a much slower pace
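The effect of fine-grain clock-gating can be illustrated with a toy count of how many latches receive a clock edge. This is only a sketch under strong assumptions: the latch names and usage pattern are invented, and the gating logic is assumed to predict next-cycle usage perfectly.

```python
def clocked_latch_cycles(usage, gated):
    """Count latch clock events over a run.

    usage[c] is the set of latches actually needed in cycle c.
    Without gating every latch is clocked every cycle; with (ideal)
    gating only the latches needed in a cycle are clocked.
    """
    all_latches = set().union(*usage)
    if not gated:
        return len(all_latches) * len(usage)
    return sum(len(needed) for needed in usage)


# Hypothetical usage pattern for three latch groups over four cycles
usage = [
    {"fetch", "decode"},
    {"decode", "fpu"},
    {"fetch"},
    {"fetch", "decode", "fpu"},
]
print(clocked_latch_cycles(usage, gated=False))  # 12
print(clocked_latch_cycles(usage, gated=True))   # 8
```

The saving comes entirely from idle latches, so the same work completes in the same number of cycles, unlike lowering the clock frequency.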

POWER5 Die Photo


Power Efficiency

Foxton technology
Blind replication of Itanium 2 cores at 90 nm would lead to roughly 300 W peak power consumption (Itanium 2 consumes 130 W peak at 130 nm)
When power consumption is below the ceiling, the voltage is increased, leading to higher frequency and performance
10% boost for enterprise applications
Software or the OS can also dictate a frequency change if power saving is required
100 ms response time for the feedback loop
Frequency control is achieved by 24 voltage sensors distributed across the chip: the entire chip runs at a single frequency (other than the asynchronous L3)
Clock gating found limited application in Montecito

Foxton Technology

Embedded microcontroller runs a real-time scheduler to execute various tasks
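The feedback loop described above (raise the operating point when power is under the ceiling, back off when it is over) can be sketched as a simple step controller. This is not Intel's algorithm: the function name, step size, frequency limits and power readings are all hypothetical, and the sketch adjusts frequency directly rather than modelling the voltage change that drives it.

```python
def power_feedback_step(freq_mhz, measured_w, ceiling_w,
                        step_mhz=50, f_min=1000, f_max=2000):
    """One control iteration of a hypothetical power-capped governor.

    Speeds up when measured power is under the ceiling, slows down
    when it is over, and clamps the result to [f_min, f_max].
    """
    if measured_w < ceiling_w:
        freq_mhz += step_mhz
    elif measured_w > ceiling_w:
        freq_mhz -= step_mhz
    return max(f_min, min(f_max, freq_mhz))


freq = 1600
history = []
for watts in [80, 85, 95, 105, 100]:  # made-up power samples in watts
    freq = power_feedback_step(freq, watts, ceiling_w=100)
    history.append(freq)

print(history)  # [1650, 1700, 1750, 1700, 1700]
```

A workload that draws little power lets the controller climb toward the maximum frequency, which is where the quoted performance boost would come from.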


Die Photo

Sun Niagara or UltraSPARC T1

Features

Eight pipelines or cores, each shared by 4 threads
32-way multithreading on a single chip
Starting frequency of 1.2 GHz, consumes 60 W
Shared 3 MB L2 cache, 4-way banked, 12-way set associative, 200 GB/s bandwidth
Single-issue six-stage pipe
Target market is web service, where ILP is limited but TLP is huge (independent transactions)
Throughput matters
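Why sharing a single-issue pipeline among four threads helps throughput can be seen in a toy simulation. The sketch below assumes a round-robin pick among ready threads and a fixed miss latency. Both are illustrative assumptions, as are the instruction streams; the slide only states that four threads share each single-issue pipe.

```python
def run_core(programs, miss_latency):
    """Cycles needed to issue every instruction on one single-issue core.

    programs: one list per thread containing "op" or "miss".
    Each cycle at most one instruction issues, chosen round-robin among
    threads that are ready. A "miss" blocks its own thread for
    miss_latency cycles after it issues.
    """
    n = len(programs)
    pc = [0] * n          # next instruction index per thread
    ready_at = [0] * n    # first cycle in which each thread may issue
    cycle, last = 0, -1
    while any(pc[t] < len(programs[t]) for t in range(n)):
        for k in range(1, n + 1):
            t = (last + k) % n
            if pc[t] < len(programs[t]) and ready_at[t] <= cycle:
                if programs[t][pc[t]] == "miss":
                    ready_at[t] = cycle + 1 + miss_latency
                pc[t] += 1
                last = t
                break
        cycle += 1
    return cycle


stream = ["miss", "op", "miss", "op"]      # hypothetical instruction stream
print(8 * 4)                               # 32 hardware threads per chip
print(run_core([stream], miss_latency=3))      # 10 cycles for 4 instructions
print(run_core([stream] * 4, miss_latency=3))  # 16 cycles for 16 instructions
```

With one thread the pipe idles during every miss; with four threads the other threads fill those slots, so the pipe issues an instruction every cycle in this example.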