






Material Type: Exam; Class: Intro Machine Organization w/La; Subject: Computer Science; University: Wellesley College; Term: Fall 2008;
Computer Science 240
Fall 2008
Due: Monday, December 8

Reading. Patterson & Hennessy, §7.1 – 7.

10.1. SRAM is commonly used to implement small, fast, on-chip caches, while DRAM is used for larger, slower main memory. In the past, a common design for supercomputers was to build machines with no caches and main memories made entirely out of SRAM (the Cray C90, for example, a very fast computer in its day). If cost were no object, would you still want to design a system this way?

10.2. For each of the following parts, describe the general characteristics of a program that would exhibit the given properties. Provide an example program (pseudocode is fine) in each case.
a. Very little temporal and spatial locality with regard to data accesses.
b. High amounts of temporal locality but very little spatial locality with regard to data accesses.
c. Very little temporal but very high spatial locality with regard to data accesses.

10.3. A new processor can use either a write-through or write-back cache, selectable through software.
a. Assume the processor will run data-intensive applications with a large number of load and store operations. Explain which cache write policy should be used. Justify your answer.
b. Consider the same question, but this time for a safety-critical system in which data integrity is more important than memory performance. Justify your answer.

10.4. Here is a series of address references given as word addresses: 2, 3, 11, 16, 21, 13, 64, 48, 19, 11, 3, 22, 4, 27, 6, 11. Assuming a direct-mapped cache with 16 one-word blocks that is initially empty, label each reference in the list as a hit or a miss and show the final contents of the cache.

10.5. Compute the total number of bits required to implement the cache in Figure 7.9 on page 486. This number is different from the size of the cache, which usually refers to the number of bytes of data stored in the cache.
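For checking your work on problem 10.4, the hit/miss bookkeeping can be done mechanically. The following is a minimal sketch (not part of the assignment; the function name is ours) of a direct-mapped cache with 16 one-word blocks, where the block index is simply the word address mod 16:

```python
# Sketch for problem 10.4: simulate a direct-mapped cache with 16
# one-word blocks. A reference hits only if its slot (address mod 16)
# already holds that same address.
def simulate_direct_mapped(addresses, num_blocks=16):
    cache = [None] * num_blocks          # one word per block, initially empty
    trace = []
    for addr in addresses:
        index = addr % num_blocks        # direct mapping: address mod #blocks
        if cache[index] == addr:
            trace.append((addr, "hit"))
        else:
            trace.append((addr, "miss"))
            cache[index] = addr          # replace whatever was in the slot
    return trace, cache

refs = [2, 3, 11, 16, 21, 13, 64, 48, 19, 11, 3, 22, 4, 27, 6, 11]
trace, final = simulate_direct_mapped(refs)
for addr, result in trace:
    print(f"{addr:3d}: {result}")
```

Note how addresses 2, 18, 34, ... would all compete for block 2; here the conflicts are among 16, 64, and 48 (block 0) and among 11 and 27 (block 11).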
The number of bits needed to implement the cache represents the total amount of memory needed for storing all the data, tags, and valid bits.

10.6. Consider a memory hierarchy using one of the three organizations for main memory shown in Figure 7.11 on page 489. Assume that the cache block size is 16 words, that the width of organization (b) of the figure is four words, and that the number of banks in organization (c) is four. If the main memory latency for a new access is 10 memory bus clock cycles and the transfer time is 1 memory bus clock cycle, what are the miss penalties for each of these organizations?
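The miss penalties in problem 10.6 follow the usual Patterson & Hennessy accounting, assuming 1 bus cycle to send the address, the stated 10 cycles of access latency per memory access, and 1 cycle per transfer at the bus width. A sketch of that arithmetic under those assumptions:

```python
# Miss-penalty arithmetic for the three main memory organizations,
# assuming 1 cycle to send the address, 10 cycles latency per access,
# and 1 cycle per bus-width transfer.
BLOCK_WORDS = 16
LATENCY = 10       # memory bus cycles for a new access
TRANSFER = 1       # memory bus cycles per transfer

# (a) one-word-wide memory: pay latency + transfer for every word
penalty_a = 1 + BLOCK_WORDS * (LATENCY + TRANSFER)

# (b) four-word-wide memory: four accesses, each moving four words at once
penalty_b = 1 + (BLOCK_WORDS // 4) * (LATENCY + TRANSFER)

# (c) four interleaved banks: the four latencies overlap across banks,
#     but words still cross the one-word-wide bus one at a time
penalty_c = 1 + (BLOCK_WORDS // 4) * LATENCY + BLOCK_WORDS * TRANSFER

print(penalty_a, penalty_b, penalty_c)   # 177 45 57
```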
10.7. Consider three processors with different cache configurations:
Cache 1: Direct mapped with one-word blocks.
Cache 2: Direct mapped with four-word blocks.
Cache 3: Two-way set associative with four-word blocks.
The following miss rate measurements have been made:
Cache 1: Instruction miss rate 4%; data miss rate 6%.
Cache 2: Instruction miss rate 2%; data miss rate 4%.
Cache 3: Instruction miss rate 2%; data miss rate 3%.
For these processors, one-half of the instructions contain a data reference. Assume that the cache miss penalty is 6 + block size in words. The CPI for this workload was measured on a processor with cache 1 and was found to be 2.0. Determine which processor spends the most cycles on cache misses.

10.8. A machine has a 32-bit byte-addressable virtual address space. The page size is 8 KB. How many pages of virtual address space exist?

10.9. A virtual memory has a page size of 1024 words, eight virtual pages, and four physical page frames. The page table is as follows: (page table not preserved in this copy)
a. Make a list of all virtual addresses that will cause page faults.
b. What are the physical addresses for 0, 3728, 1023, 1024, 1025, 7800, and 4096?

10.10. A computer has 16 pages of virtual address space but only four page frames. Initially the memory is empty. A program references the virtual pages in the following order: 0, 7, 2, 7, 5, 8, 9, 4.
a. Which references cause a page fault with LRU?
b. Which references cause a page fault with FIFO?

10.11. An operating system (OS) is a program designed to promote better communication between people and computer hardware. You may think of it as a diplomat who, on the one hand, makes the computer system more convenient for people to use,
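For checking answers to problem 10.10, both replacement policies can be simulated directly. A small sketch (the function name and list-based queue are ours, not part of the assignment):

```python
# Count page faults for LRU and FIFO replacement with a fixed number
# of frames. The `resident` list is kept in eviction order: position 0
# is the next victim (least recently used, or first in, by policy).
def page_faults(refs, frames, policy):
    resident = []
    faults = []
    for page in refs:
        if page in resident:
            if policy == "LRU":        # a hit refreshes recency under LRU
                resident.remove(page)
                resident.append(page)
        else:
            faults.append(page)
            if len(resident) == frames:
                resident.pop(0)        # evict least recent / first in
            resident.append(page)
    return faults

refs = [0, 7, 2, 7, 5, 8, 9, 4]
print("LRU faults: ", page_faults(refs, 4, "LRU"))
print("FIFO faults:", page_faults(refs, 4, "FIFO"))
```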
addTwoNumbers: input X     -- read 1st num into location X
               input Y     -- read 2nd num into location Y
               load X      -- load X into the AC
               add Y       -- add Y to X and store in AC
               store SUM   -- put the sum into location SUM
               output SUM  -- write SUM in Turtle's box
               halt        -- and stop
               -- program variables
X:   000                   -- label a location for X
Y:   000                   -- label a location for Y
SUM: 000                   -- label a location for the SUM
2. Changes required for the operating system

Our goal in this assignment is to extend the TINKER system so that it serves as a multi-user timesharing system. Here, we will assume that there are a maximum of three simultaneous users, each of whom is running an independent user process. A unique integer, known as the process ID, is associated with each user process. In these notes, we will refer to these processes using the letters A, B, and C. Making this structure possible requires the addition of several new features, which are described in the following subsections.

2.1 Paging

The first problem that we must solve is that one hundred words of memory really don't give us much room to maneuver, particularly if all parts of every user program must be in memory simultaneously. To get around this, we will assume that we have some secondary memory area (like a disk, for example) and that we can swap regions out of primary memory into secondary storage and vice versa. When swapping is done, it is traditional to move a block of contiguous words as a unit. In the standard terminology, such a unit is called a page. In TINKER, we will use a page size of ten words, so that TINKER's primary memory can be viewed as a set of ten pages, each of which consists of ten words. To refer to pages, we will use the address of the beginning of the page, so that we will use designations like the "40 page".

2.2 Virtual memory

When a paging system is used in conjunction with a timesharing operating system, memory is partitioned into distinct areas that hold the operating system, operating
system data about the individual processes, and the user programs. Here, we will divide up the memory as shown:

    00-30   operating system kernel
    40      paging data for process A
    50      paging data for process B
    60      paging data for process C
    70      user page
    80      user page
    90      user page

When a user process is running, its code must be in the region of memory consisting of the 70 page, the 80 page, and the 90 page. Unfortunately, most users (in that vast community of TINKER programmers out there) do not write programs that fit in three pages, nor do they assume that their programs will live in memory addresses 70-99. The existence of paging is not sufficient to get around this part of the problem. In addition, we need some mechanism by which user addresses in the low part of memory get translated into the appropriate addresses on these three pages. This type of mechanism is called virtual addressing. When virtual addressing is in effect, a user instruction such as LOAD 43 does not refer to the real address 43, but to the appropriate address in the user memory region that contains the physical address corresponding to 43. A page map table stored in the per-process data page for the relevant process provides the translation. The first eight words on the page give the actual page numbers that correspond to user addresses on the first eight virtual pages. Thus, in order to translate address 43 to a real address, the system begins by looking at word 4 of the appropriate page map (since address 43 lies on the virtual 40 page). This entry gives the actual memory page that corresponds to that address. For example, if this entry contained 90, then the physical address would be 93, since address 43 specifies word 3 on the indicated page. Of course, not all memory pages can be loaded at the same time, so page map entries may instead contain the address of a page on the secondary storage device.
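The translation scheme just described can be sketched in a few lines. The code below is a hypothetical model, not part of TINKER itself, and the page-map contents are made up for illustration:

```python
# Model of TINKER address translation: a two-digit virtual address is
# split into a page-map index (tens digit) and a word offset within the
# 10-word page (units digit). A negative map entry means the page is on
# secondary storage, which would trigger a page fault.
def translate(virtual_addr, page_map):
    page_index = virtual_addr // 10        # which of the 8 virtual pages
    offset = virtual_addr % 10             # word within the page
    entry = page_map[page_index]
    if entry < 0:
        raise RuntimeError("page fault: page is on secondary storage")
    return entry + offset

# Example from the text: if the map entry for the virtual 40 page is 90,
# then virtual address 43 translates to physical address 93.
page_map = [70, 80, -1, -1, 90, -1, -1, -1]   # hypothetical map contents
print(translate(43, page_map))                # 93
```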
In this system, pages that have been “swapped out” in this way are indicated by negative entries in the page table. Whenever the TINKER "hardware" encounters such an entry, it performs all the operations necessary to read in the actual page, swapping out the oldest memory page remaining. In order to make the operating system itself work, it is necessary to be able to "turn off" virtual addressing and restore direct physical addressing. This is done by allowing the TINKER processor to run in two modes. In kernel mode, all addresses are
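The swap-in behavior can likewise be modeled. The sketch below is hypothetical (the disk-address encoding and data layout are invented for illustration): a negative page-map entry triggers eviction of the oldest resident page, FIFO style, and the missing page takes over its frame:

```python
# Toy model of swap-in with oldest-page (FIFO) eviction. `resident`
# lists the physical user pages in the order they were loaded; a
# negative page_map entry stands for a page out on secondary storage.
resident = [70, 80, 90]                         # oldest frame first
page_map = [70, 80, -5, 90, -2, -3, -4, -6]     # negative = on disk

def touch(page_index):
    """Return the physical page for a virtual page, swapping in if needed."""
    entry = page_map[page_index]
    if entry >= 0:
        return entry                  # already resident, no swap needed
    victim = resident.pop(0)          # evict the oldest resident page
    for i, e in enumerate(page_map):  # mark the victim's old owner as
        if e == victim:               # swapped out (made-up disk address)
            page_map[i] = -(100 + i)
    page_map[page_index] = victim     # missing page now occupies the frame
    resident.append(victim)           # and becomes the newest resident
    return victim

print(touch(2))   # faults: evicts the occupant of frame 70, returns 70
```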
In the single-user version of TINKER, it is quite reasonable for the INPUT instruction to wait until the user enters a value before continuing with the program. With multiple users, this is not such a good idea, since some other process may be cooling its heels in the meantime. Thus, in user mode, the execution of the INPUT instruction is handled by trapping to the IWAIT address in the operating system. This causes the program to enter an input wait state until it is resumed when input arrives. When input actually arrives for a process, this causes a trap to IREADY with IPID and IVAL set to the process receiving input and the input value, respectively. The IREADY code computes the address in which this data should be stored, swaps in the page, and places the process back on the run queue.
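A toy model of the IWAIT/IREADY protocol may make the bookkeeping concrete. None of this is the assignment's TINKER code; the function names and data structures here are illustrative only:

```python
# Toy model of input waiting: a process that executes INPUT in user
# mode is parked until a value arrives for it; arrival stores the value
# and puts the process back on the run queue.
waiting = {}       # process ID -> destination address for its input
run_queue = []
memory = {}

def iwait(pid, dest_addr):
    """Trap taken when a user-mode INPUT executes: park the process."""
    waiting[pid] = dest_addr

def iready(ipid, ival):
    """Trap taken when input arrives: deliver the value and requeue."""
    dest = waiting.pop(ipid)
    memory[dest] = ival        # real TINKER would swap the page in first
    run_queue.append(ipid)

iwait("A", 43)                 # process A blocks waiting for input into 43
iready("A", 99)                # input 99 arrives for A
print(run_queue, memory)       # ['A'] {43: 99}
```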
3. Extended instructions

To make all this possible, it is necessary to extend the instruction set of TINKER to include several additional operations, which are internally represented as negative instructions. Each of these extended instructions is described in detail below:

LOADI    The first two instructions in the extended set are not fundamentally
STOREI   necessary but simplify the coding. Each of these instructions functions in much the same way as its LOAD and STORE counterpart, but the effective address is determined, not by using the address field directly, but by using the contents of that word as an indirect address. Thus, LOADI PTR loads the AC from the word whose address is contained in PTR.

ENQUEUE  Appends the process ID at the indicated address to the run queue.

DEQUEUE  Takes the first process ID off the run queue and stores its number in the indicated address. If the run queue is empty, the value 0 is stored.

SSTATE   Saves the state of the indicated process by storing the AC, the pre-interrupt PC, and the kernel/user mode indicator in words 8 and 9 of the page table for that process.

RSTATE   Restores the state for the indicated process by reversing the steps used in SSTATE above.

RTI      Returns from an interrupt by restoring the previous PC and setting the mode indicator to kernel or user as appropriate. During the interrupt, these values are stored in location 00.

GETINS   The GETINS instruction takes a process number value from the specified address and loads the AC with the address associated with the instruction at which that process was suspended (presumably an INPUT instruction).
GETPAGE  The GETPAGE instruction takes an address in the AC and a process number from the specified address and returns a physical pointer in the AC which refers to that address in the kernel address space. If the necessary page is not in memory, it is swapped in from secondary storage.
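The indirect addressing used by LOADI and STOREI amounts to one extra memory reference. A hypothetical model, with Python dictionaries standing in for TINKER memory:

```python
# Model of indirect addressing: the address field names a word whose
# *contents* are the effective address.
memory = {10: 43, 43: 7, 44: 0}   # word 10 plays the role of PTR
ac = 0                            # the accumulator

def loadi(addr):
    """LOADI: load the AC from the word whose address is stored at addr."""
    global ac
    ac = memory[memory[addr]]     # two references: the pointer, then the target

def storei(addr):
    """STOREI: store the AC into the word whose address is stored at addr."""
    memory[memory[addr]] = ac

loadi(10)            # PTR at 10 holds 43, so the AC gets memory[43]
print(ac)            # 7
memory[10] = 44      # repoint PTR at word 44
storei(10)           # stores the AC indirectly through PTR
print(memory[44])    # 7
```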
4. The TINKER operating system

We start out with a system that is almost (but not quite) complete. (After all, we need to leave something for you.)

4.1 Almost TINKER

(01) SCHED:  DEQUEUE PID   ;Get next ID from run queue
(02)         LOAD PID      ;Check it
(03)         JUMPE SCHED   ;Zero means try again
(04)         LOAD C0       ;Get kernel ID (=0)
(05)         STORE LPID    ;Store so clock does not kill us
(06)         RSTATE PID    ;Restore process state
(07)         RTI           ;Resume process
(08) CLOCK:  STORE ACSAV   ;Save user's AC
(09)         LOAD LPID     ;Recover last ID
(10)         SUB PID       ;Same as this?
(11)         JUMPE SLEEP   ;Time for us to snooze
(12)         LOAD PID      ;Copy our ID
(13)         STORE LPID    ;Into LPID
(14) RESUME: LOAD ACSAV    ;Restore the AC
(15)         RTI           ;And resume process
(16) SLEEP:  LOAD ACSAV    ;Recover state
(17) DISMS:  SSTATE PID    ;Save state
(18)         ENQUEUE PID   ;Put on run queue
(19)         JUMP SCHED    ;And get next process
(20) IWAIT:
(21)
(22) IREADY:
(23)
(24)
(25)
(26)
(27)
(28)
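The SCHED/CLOCK interplay above implements a simple round-robin policy: because SCHED stores the kernel ID (0) into LPID, a freshly dispatched process survives its first clock tick and is preempted only when a second consecutive tick finds it still running. A toy model of that logic (not the assignment code; names are illustrative):

```python
# Toy model of the SCHED/CLOCK logic: preempt only when two consecutive
# clock ticks see the same running process (LPID == PID).
run_queue = ["B", "C"]
pid = "A"     # currently running process
lpid = 0      # kernel ID, as stored by SCHED at dispatch time

def clock_tick():
    global pid, lpid
    if lpid == pid:                # second tick on the same process
        run_queue.append(pid)      # SSTATE/ENQUEUE: save it and requeue
        pid = run_queue.pop(0)     # SCHED: dispatch the next process
        lpid = 0                   # kernel ID, so clock does not kill us
    else:
        lpid = pid                 # first tick: just remember who ran

clock_tick()             # first tick: A keeps running
clock_tick()             # second tick: A is preempted, B dispatched
print(pid, run_queue)    # B ['C', 'A']
```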
4.3 A sample process set

As an example, consider the file disk.toy listed below:

$AJC 111
(01) 320 ; Load the constant 0
(02) 430 ; Initialize I to 0
(03) 431 ; Initialize RESULT to 0
(04) 330 ; Load I
(05) 521 ; Add the constant 1
(06) 430 ; And store the result back in I
(07) 622 ; Subtract the constant 10
(08) 813 ; If I > 10 we're done
(09) 331 ; Load RESULT
(10) 530 ; Add I
(11) 431 ; And store in RESULT
(12) 704 ; Next I
(13) 231 ; Print the result
(14) 000 ; And stop
(20) 0   ; Constant 0
(21) 1   ; Constant 1
(22) 10  ; Upper bound for loop
(30) 0   ; Variable I
(31) 0   ; Variable RESULT
$END
$AJC 222
(01) 210 ; Print 1
(02) 220 ; Print 2
(03) 230 ; Print 3
(04) 240 ; Print 4
(05) 250 ; Print 5
(06) 000 ; Halt
(10) 1
(20) 2
(30) 3
(40) 4
(50) 5
$END
$AJC 333
(01) 340 ; \
(02) 453 ; / Let RESULT = 0
(03) 341 ; \
(04) 452 ; / Let I = 1
(05) 352 ; \
(06) 650 ; > If I - X > 0 then goto DONE
(07) 815 ; /
(08) 353 ; \
(09) 551 ; > Let RESULT = RESULT + Y
(10) 453 ; /
(11) 352 ; \
(12) 541 ; > Let I = I + 1
(13) 452 ; /
(14) 705 ; Next I
(15) 253 ; Print RESULT
(16) 000 ; And STOP
(40) 0   ; Constant 0
(41) 1   ; Constant 1
(50) 3   ; X
(51) 4   ; Y
(52) 0   ; I
(53) 0   ; RESULT
$END

Suppose we load these three programs into TINKER and set the system in motion.

Exercise 9.11.1. As things now stand, the operating system executes in an infinite loop. Does this seem reasonable to you? What about real systems? How do they terminate? In standard TINKER, the HALT command sets runflag = 0, thereby stopping the execution cycle. In kernel mode this may be all right, but it's not such a good idea in user mode. Why? What should the TINKER hardware do when it encounters a HALT instruction while executing in user mode?

Exercise 9.11.2. Only three pages of memory were allocated to user programs. This allocation would be a bit unrealistic in the real world. In fact, allocating too small a swap space was the cause of the infamous LISP debacle, when a former CS computer spent all of its time in kernel mode swapping pages and almost no time in user mode executing code. There is a name for this unhappy state of affairs: thrashing. To illustrate how nasty thrashing can get, consider the case where only a single page of memory was allocated to swap space. In the best case, the executing process may spend all its execution time on a single page. If no I/O is required and the process doesn't exceed its time quantum, then only a single swap would be needed to see the process to completion. There are instances, however, when even the execution of a single instruction may require multiple swaps. For example, suppose two instructions flanked a page boundary (say address 10 and address 11). After the instruction in location 10 completed execution, a swap would be required to get the instruction in location 11. Assume this instruction is ADD. Why might an additional swap be needed to complete execution? Are there any instructions (other than I/O) that might require three swaps to complete?