¹ Supported by U.S. Department of Energy grant number DE-FG02-88ER25052.
For my father, John Wilson Dickey, 1916–
4.2 Packet formats for the interconnection network in a 16×16 NYU Ultracomputer prototype.
4.3 Design of systolic combining queue.
4.4 Schematic of a single cell of the combining queue in the forward path component.
4.5 Block diagram of a combining queue implementation.
4.6 Combining queue transitions for slot j.
4.7 Behavior of chute transfer signal with IN and OUT both moving, when combining a 4-packet message.
4.8 Behavior of chute transfer signal with IN moving and OUT not moving, when combining a 4-packet message.
4.9 Behavior of chute transfer signal with OUT moving and IN not moving, when combining a 4-packet message.
4.10 Logic to produce propagate and generate signals.
4.11 Multiple output Domino CMOS gate in carry chain.
4.12 Block diagram of a return path component.
4.13 Block diagram of a wait buffer.
4.14 Slot of a wait buffer holding a two-packet message.
4.15 Schematic of a wait buffer cell.
4.16 Type B switches, molasses and susy simulations, uniform traffic, memory cycle 2.
4.17 Type B switches, molasses and susy simulations, 0.5 percent hot spot, no combining, memory cycle 2.
4.18 Type B switches, molasses and susy simulations, 0.5 percent hot spot, combining, memory cycle 2.
4.19 Type B switches, molasses and susy simulations, 1 percent hot spot, no combining, memory cycle 2.
4.20 Type B switches, molasses and susy simulations, 1 percent hot spot, combining, memory cycle 2.
Communication between hundreds or thousands of cooperating processors is the key problem in building a massively parallel processor. This thesis is concerned with the best way to design a fast VLSI switch to be used in the interconnection network of such a parallel processor. Such a switch should handle the "hot spot" problem as well as provide good performance for uniform traffic. The switch designs we consider alleviate the "hot spot" problem by adding extra logic to the switches to combine conventional loads and stores, as well as fetch-and-Φ operations, destined for the same memory location, according to the methods described in [57].

The goal of this work has been to analyze and design a switching component that is inexpensive compared to the cost of a processing node, yet provides the functionality necessary for high-bandwidth, low-latency network performance. The theoretical peak performance of a highly parallel shared-memory multiprocessor may be less than that of a message-passing multicomputer of equal component count, in which all nodes contain a processing element as well as switching hardware. However, the actual performance achieved per processor on a large class of applications should be much higher in the shared-memory multiprocessor, because the dedicated hardware of the network switches provides greater bandwidth per processing element and handles communication more efficiently.

The first section of this introductory chapter outlines the contributions of this thesis. The second section discusses related research.
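To make the combining idea concrete, the following is a minimal sketch, not taken from the thesis, of how a switch might merge two fetch-and-add requests bound for the same memory address and later split the single reply back into two. All type and function names (FaaRequest, WaitBufferEntry, combine_faa, decombine_faa) are hypothetical; the actual hardware described later handles general fetch-and-Φ operations and multi-packet messages.

#include <stdint.h>
#include <stdbool.h>

/* Hypothetical request and wait-buffer records for a combining switch.
 * Field and type names are illustrative, not taken from the thesis. */
typedef struct {
    uint32_t addr;      /* target memory address            */
    int32_t  increment; /* fetch-and-add operand            */
    int      src_port;  /* input port the request came from */
} FaaRequest;

typedef struct {
    uint32_t addr;
    int32_t  first_increment; /* operand of the request serialized first */
    int      first_port;      /* where the first reply must go           */
    int      second_port;     /* where the second reply must go          */
    bool     valid;
} WaitBufferEntry;

/* Combine two fetch-and-add requests to the same address into one.
 * The merged request carries the sum of the increments; the wait
 * buffer remembers how to split the single reply back into two. */
FaaRequest combine_faa(const FaaRequest *a, const FaaRequest *b,
                       WaitBufferEntry *wb)
{
    FaaRequest merged = *a;
    merged.increment = a->increment + b->increment;

    wb->addr            = a->addr;
    wb->first_increment = a->increment;
    wb->first_port      = a->src_port;
    wb->second_port     = b->src_port;
    wb->valid           = true;
    return merged;
}

/* When memory returns the old value V for the combined request,
 * request a receives V and request b receives V plus a's increment,
 * exactly as if the two operations had executed one after the other. */
void decombine_faa(int32_t memory_reply, const WaitBufferEntry *wb,
                   int32_t *reply_to_first, int32_t *reply_to_second)
{
    *reply_to_first  = memory_reply;
    *reply_to_second = memory_reply + wb->first_increment;
}

The key property is that the two requesters observe values consistent with some serial order of their operations, while the memory module sees only a single request; this is what relieves contention at a hot-spot location.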
1.1 Contributions
The analyses and simulations reported in this thesis were carried out in support of the design and implementation of a switching component for the NYU Ultracomputer architecture. The results are generally concerned with the trade-off between overall performance and implementation cost. The different areas in which results have been obtained are described in the following subsections.
1.1.1 Performance analysis of different switch types
Chapter 2 analyzes the effect that the arrangement and arbitration of buffers and the degree of the crossbar may have on switch performance and cost.

Switches in interconnection networks for highly parallel shared-memory computer systems may be implemented with different internal buffer structures. For a 2×2 synchronous switch, previous studies [78, 116] have often assumed a switch composed of two queues, one at each output, each of which has unbounded size and may accept two inputs every clock cycle. We call this type of switch Type A; a k×k Type A switch has k queues, one at each output, each of which may accept k inputs per cycle. Hardware implementations may actually use simpler queue designs and will have bounded size. Two additional types of switch are analyzed, both using queues that may accept only one input at a time: for k×k switches, a Type B switch uses k² queues, one for each input/output pair, while a Type C switch uses only k queues, one at each input. In both cases, a multiplexer blocks all but one queue if more than one queue desires the same output, making these models more difficult to analyze than the previous Type A model.
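As an illustration of the output-side arbitration just described, the sketch below steps a Type B switch through one clock cycle: each output's multiplexer grants at most one of the K queues feeding it and blocks the rest. The names, the bounded queue capacity, and the round-robin grant policy are assumptions for the example, not details taken from the thesis or its simulators.

#include <stddef.h>
#include <stdbool.h>

#define K    2           /* switch degree: a K x K switch (assumed) */
#define QCAP 8           /* bounded queue capacity (assumed)        */

/* One bounded FIFO of packet IDs; a Type B switch has K*K of these,
 * one per input/output pair. */
typedef struct {
    int    pkt[QCAP];
    size_t head, count;
} Fifo;

static bool fifo_push(Fifo *q, int pkt)
{
    if (q->count == QCAP) return false;          /* queue full: input blocked */
    q->pkt[(q->head + q->count) % QCAP] = pkt;
    q->count++;
    return true;
}

static bool fifo_pop(Fifo *q, int *out)
{
    if (q->count == 0) return false;
    *out = q->pkt[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    return true;
}

/* One clock cycle of output-side arbitration for a Type B switch:
 * for each output, at most one of the K queues feeding it may send;
 * the multiplexer blocks the others until a later cycle.  The winner
 * here is chosen by round-robin on the input index. */
void type_b_cycle(Fifo q[K][K], int rr_ptr[K], int sent[K])
{
    for (int out = 0; out < K; out++) {
        sent[out] = -1;                           /* -1: output idle this cycle */
        for (int i = 0; i < K; i++) {
            int in = (rr_ptr[out] + i) % K;       /* round-robin scan of inputs */
            int pkt;
            if (fifo_pop(&q[in][out], &pkt)) {
                sent[out] = pkt;
                rr_ptr[out] = (in + 1) % K;       /* advance fairness pointer   */
                break;                            /* other queues blocked       */
            }
        }
    }
}

A Type C switch can be modeled the same way with only K queues, one per input, in which case a packet blocked at the head of an input queue also delays packets behind it that are bound for other outputs.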