Memory Systems: Concepts, Types, and Performance, Lecture notes of Computer Architecture and Organization

Pondicherry University Computer Architecture and Organization

Computer Organization and Architecture Unit 1-5

Typology: Lecture notes

2021/2022

Available from 07/05/2022

vignesh392 🇮🇳

5 documents


UNIT IV - THE MEMORY SYSTEM: Some Basic Concepts, Semiconductor RAM Memories, Read-Only Memories, Speed, Size, and Cost, Cache Memories, Performance Considerations, Virtual Memories, Memory Management Requirements, Secondary Storage.

THE MEMORY SYSTEM

SOME BASIC CONCEPTS

• Maximum size of the Main Memory
• Byte-addressable
• CPU-Main Memory Connection
• Measures for the speed of a memory:
  o memory access time.
  o memory cycle time.
• An important design issue is to provide a computer system with as large and fast a memory as possible, within a given cost target.
• Several techniques to increase the effective size and speed of the memory:
  o Cache memory (to increase the effective speed).
  o Virtual memory (to increase the effective size).

SEMICONDUCTOR RAM MEMORIES

• Each memory cell can hold one bit of information.
• Memory cells are organized in the form of an array.
• One row is one memory word.
• All cells of a row are connected to a common line, known as the “word line”.
• The word line is connected to the address decoder.
• Sense/write circuits are connected to the data input/output lines of the memory chip.

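[Figure: Connection of the memory to the processor. The processor's MAR drives a k-bit address bus and its MDR connects to an n-bit data bus; control lines (R/W, MFC, etc.) run between processor and memory. The memory has up to 2^k addressable locations, with word length = n bits.]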

Fig 4.1 Organization of bit cells in a memory chip

SRAM cells

• Two transistor inverters are cross-connected to implement a basic flip-flop.
• The cell is connected to one word line and two bit lines by transistors T1 and T2.
• When the word line is at ground level, the transistors are turned off and the latch retains its state.
• Read operation: to read the state of the SRAM cell, the word line is activated to close switches T1 and T2. Sense/Write circuits at the bottom monitor the state of b and b’.

Fig 4.3 Internal organization of a 2M X 8 dynamic memory chip

• Each row can store 512 bytes. 12 bits select a row, and 9 bits select a group of 8 bits (one byte) in a row, for a total of 21 address bits.
• First the row address is applied, and the RAS signal latches it. Then the column address is applied, and the CAS signal latches it.
• Timing of the memory unit is controlled by a specialized unit which generates RAS and CAS.
• This is asynchronous DRAM.
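The 12-bit row and 9-bit column split described above can be illustrated with a minimal Python sketch. The function name `split_dram_address` is hypothetical, and the sketch assumes the high-order bits form the row address, as the notes state for multiplexed addresses.

```python
ROW_BITS = 12  # selects one of 4096 rows
COL_BITS = 9   # selects one of 512 bytes within the selected row


def split_dram_address(addr):
    """Split a 21-bit address into the (row, column) pair sent to the chip.

    Hypothetical helper for illustration; the row part would be latched
    by RAS first, the column part by CAS afterwards.
    """
    if not 0 <= addr < (1 << (ROW_BITS + COL_BITS)):
        raise ValueError("address must fit in 21 bits")
    row = addr >> COL_BITS               # high-order 12 bits
    col = addr & ((1 << COL_BITS) - 1)   # low-order 9 bits
    return row, col


print(split_dram_address(513))       # (1, 1)
print(split_dram_address(0x1FFFFF))  # (4095, 511)
```

Since 4096 rows of 512 bytes give 2M bytes, the two fields together cover the whole 2M x 8 chip.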

Fast Page Mode

• Suppose we want to access consecutive bytes in the selected row.
• This can be done without having to reselect the row.
• Add a latch at the output of the sense circuits in each row.
• All the latches are loaded when the row is selected.
• Different column addresses can be applied to select and place different bytes on the data lines.
• A consecutive sequence of column addresses can be applied under the control signal CAS, without reselecting the row.

• Allows a block of data to be transferred at a much faster rate than random accesses.
• A small collection/group of bytes is usually referred to as a block.
• This transfer capability is referred to as the fast page mode feature.

Synchronous DRAMs

Fig 4.4 Synchronous DRAM.

Operation is directly synchronized with processor clock signal.
The outputs of the sense circuits are connected to a latch.
During a Read operation, the contents of the cells in a row are loaded onto the latches.
During a refresh operation, the contents of the cells are refreshed without changing the contents of the latches.
Data held in the latches that correspond to the selected columns are transferred to the output.
For a burst mode of operation, successive columns are selected using column address counter and clock.

• Implement a memory unit of 2M words of 32 bits each.
• Use 512K x 8 static memory chips.
• Each column consists of 4 chips.
• Each chip implements one byte position.
• A chip is selected by setting its chip select control line to 1.
• The selected chip places its data on the data output line; the outputs of the other chips are in the high-impedance state.
• 21 bits are needed to address a 32-bit word.
• The high-order 2 bits select the row of chips by activating one of the four Chip Select signals.
• The remaining 19 bits are used to access specific byte locations inside the selected chip.
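The address decoding for this memory unit can be sketched as follows. This is an illustration only: it assumes 512K x 8 chips (2^19 locations each, which is what the 19-bit in-chip address implies), arranged as 4 rows of 4 chips, and the function names are hypothetical.

```python
CHIP_ADDR_BITS = 19    # 2**19 = 512K locations inside each chip
ROW_SELECT_BITS = 2    # 4 rows of chips
CHIPS_PER_ROW = 4      # one chip per byte position of the 32-bit word


def decode_word_address(word_addr):
    """Return (chip_row, address_inside_chip) for a 21-bit word address."""
    if not 0 <= word_addr < (1 << (ROW_SELECT_BITS + CHIP_ADDR_BITS)):
        raise ValueError("address must fit in 21 bits")
    chip_row = word_addr >> CHIP_ADDR_BITS               # high-order 2 bits
    inner = word_addr & ((1 << CHIP_ADDR_BITS) - 1)      # low-order 19 bits
    return chip_row, inner


def chip_select_signals(word_addr):
    """One Chip Select line per row of chips; exactly one is set to 1."""
    chip_row, _ = decode_word_address(word_addr)
    return [1 if r == chip_row else 0 for r in range(1 << ROW_SELECT_BITS)]


print(decode_word_address(2**19))    # (1, 0)
print(chip_select_signals(2**20))    # [0, 0, 1, 0]
print((1 << ROW_SELECT_BITS) * CHIPS_PER_ROW)  # 16 chips in total
```

All four chips in the selected row receive the same 19-bit address, and each supplies one byte of the 32-bit word.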

Dynamic memories

• Large dynamic memory systems can be implemented using DRAM chips in a similar way to static memory systems.
• Placing large memory systems directly on the motherboard will occupy a large amount of space.
• Also, this arrangement is inflexible since the memory system cannot be expanded easily.
• Packaging considerations have led to the development of larger memory units known as SIMMs (Single In-line Memory Modules) and DIMMs (Dual In-line Memory Modules).
• Memory modules are an assembly of memory chips on a small board that plugs vertically into a single socket on the motherboard.
  o They occupy less space on the motherboard.
  o They allow for easy expansion by replacement.

Memory controller

• Recall that in a dynamic memory chip, to reduce the number of pins, multiplexed addresses are used.
• The address is divided into two parts:
  o High-order address bits select a row in the array. They are provided first, and latched using the RAS signal.
  o Low-order address bits select a column in the row. They are provided later, and latched using the CAS signal.
• However, a processor issues all address bits at the same time.
• In order to achieve the multiplexing, a memory controller circuit is inserted between the processor and memory.

Fig 4.6 Use of Memory controller

READ-ONLY MEMORIES

Describe the types of ROM. (11 Marks Dec 2015)

• SRAM and SDRAM chips are volatile:
  o They lose their contents when the power is turned off.
• Many applications need memory devices that retain their contents after the power is turned off.
  o For example, when a computer is turned on, the operating system must be loaded from the disk into the memory.
  o The instructions that load the OS from the disk must themselves be stored somewhere.
  o These instructions must not be lost after the power is turned off.
  o They therefore need to be stored in a non-volatile memory.
• Non-volatile memory is read in the same manner as volatile memory.
  o A separate writing process is needed to place information in this memory.
  o Because normal operation involves only reading of data, this type of memory is called Read-Only Memory (ROM).
• Read-Only Memory (ROM):
  o Data are written into a ROM when it is manufactured.
• Programmable Read-Only Memory (PROM):
  o Allows the data to be loaded by a user.
  o The process of inserting the data is irreversible.
  o Storing information specific to a user in a ROM is expensive.
  o Providing programming capability to a user may be better.
• Erasable Programmable Read-Only Memory (EPROM):
  o Allows stored data to be erased and new data to be loaded.

Fig 4.6 Memory Hierarchy

Fastest access is to the data held in processor registers. Registers are at the top of the memory hierarchy.
Relatively small amount of memory that can be implemented on the processor chip. This is processor cache.
Two levels of cache. Level 1 (L1) cache is on the processor chip. Level 2 (L2) cache is in between main memory and processor.
Next level is main memory, implemented as SIMMs. Much larger, but much slower than cache memory.
Next level is magnetic disks. Huge amount of inexpensive storage.
Speed of memory access is critical, so the idea is to bring instructions and data that will be used in the near future as close to the processor as possible.

CACHE MEMORIES

Explain the various mapping techniques associated with cache memories with example.( Marks April 2015,Dec 2014)

• The processor is much faster than the main memory.
  o As a result, the processor has to spend much of its time waiting while instructions and data are being fetched from the main memory.
  o This is a major obstacle to achieving good performance.
• The speed of the main memory cannot be increased beyond a certain point.
• Cache memory is an architectural arrangement which makes the main memory appear faster to the processor than it really is.
• Cache memory is based on the property of computer programs known as “locality of reference”.
• Analysis of programs indicates that many instructions in localized areas of a program are executed repeatedly during some period of time, while the others are accessed relatively less frequently.
  o These instructions may be the ones in a loop, a nested loop, or a few procedures calling each other repeatedly.
  o This is called “locality of reference”.
• Temporal locality of reference:
  o A recently executed instruction is likely to be executed again very soon.
• Spatial locality of reference:
  o Instructions with addresses close to a recently executed instruction are likely to be executed soon.

Fig 4.7 Use of a cache memory

When the processor issues a Read request, a block of words is transferred from the main memory to the cache, one word at a time.
Subsequent references to the data in this block of words are found in the cache.

• If the write-back protocol is used, the block containing the addressed word is first brought into the cache. The desired word is then overwritten with new information.

Cache Coherence Problem

A bit called the “valid bit” is provided for each block.
If the block contains valid data, then the bit is set to 1, else it is 0.
Valid bits are set to 0, when the power is just turned on.
When a block is loaded into the cache for the first time, the valid bit is set to 1.
Data transfers between main memory and disk occur directly bypassing the cache.
When the data on a disk changes, the main memory block is also updated.
However, if the data is also resident in the cache, then the valid bit is set to 0.
What happens if the data in the disk and main memory changes and the write-back protocol is being used?
In this case, the data in the cache may also have changed and is indicated by the dirty bit.
The copies of the data in the cache, and the main memory are different. This is called the cache coherence problem.
One option is to force a write-back before the main memory is updated from the disk.

Mapping functions

• Mapping functions determine how memory blocks are placed in the cache.
• A simple processor example:
  o Cache consisting of 128 blocks of 16 words each.
  o Total size of cache is 2048 (2K) words.
  o Main memory is addressable by a 16-bit address.
  o Main memory has 64K words.
  o Main memory has 4K blocks of 16 words each.
• Three mapping functions:
  o Direct mapping
  o Associative mapping
  o Set-associative mapping

Direct mapping

Block j of the main memory maps to block j modulo 128 of the cache. Block 0 maps to 0, block 129 maps to 1.
More than one memory block is mapped onto the same position in the cache.

Fig 4.8 Direct mapped cache

May lead to contention for cache blocks even if the cache is not full.
Resolve the contention by allowing new block to replace the old block, leading to a trivial replacement algorithm.
Memory address is divided into three fields:
Low order 4 bits determine one of the 16 words in a block.
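As a sketch of direct mapping for the example cache (128 blocks of 16 words, 16-bit addresses), the snippet below splits an address into its fields. The 4-bit word field is from the notes; the 7-bit block field and 5-bit tag are derived here from 128 = 2^7 cache blocks and the 16-bit address width. Function names are hypothetical.

```python
WORD_BITS = 4    # 16 words per block
BLOCK_BITS = 7   # 128 cache blocks (derived: 2**7 = 128)
TAG_BITS = 5     # remaining bits of the 16-bit address (derived: 16 - 7 - 4)


def cache_block_for(memory_block):
    """Main memory block j maps to cache block j modulo 128."""
    return memory_block % (1 << BLOCK_BITS)


def direct_mapped_fields(addr):
    """Split a 16-bit address into (tag, cache_block, word)."""
    if not 0 <= addr < (1 << 16):
        raise ValueError("address must fit in 16 bits")
    word = addr & ((1 << WORD_BITS) - 1)
    block = (addr >> WORD_BITS) & ((1 << BLOCK_BITS) - 1)
    tag = addr >> (WORD_BITS + BLOCK_BITS)
    return tag, block, word


print(cache_block_for(129))                 # 1
print(direct_mapped_fields(129 * 16 + 3))   # (1, 1, 3)
```

Memory blocks 1 and 129 both land in cache block 1 and differ only in their tag, which is how the cache tells them apart.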

Associative mapping

High-order 12 bits, the tag bits, identify a memory block when it is resident in the cache.
- Flexible, and uses cache space efficiently.
- Replacement algorithms can be used to replace an existing block in the cache when the cache is full.
- Cost is higher than direct-mapped cache because of the need to search all 128 patterns to determine whether a given block is in the cache.

Set-Associative mapping

Fig 4.10 Set-associative mapped cache with two blocks per set

• Blocks of the cache are grouped into sets.
• The mapping function allows a block of the main memory to reside in any block of a specific set.
• Divide the cache into 64 sets, with two blocks per set.
• Memory blocks 0, 64, 128, etc. map to set 0, and they can occupy either of the two positions in that set.
• Memory address is divided into three fields:
  o A 6-bit field determines the set number.
  o The high-order 6 bits are compared to the tag fields of the two blocks in the set.
• Set-associative mapping is a combination of direct and associative mapping.
• The number of blocks per set is a design parameter.
  o One extreme is to have all the blocks in one set, requiring no set bits (fully associative mapping).
  o The other extreme is to have one block per set, which is the same as direct mapping.
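The effect of the blocks-per-set parameter can be seen in a small sketch for the same example cache (128 blocks, 16 words per block, 16-bit addresses). The function `set_assoc_fields` is a hypothetical helper, and the 4-bit word field is carried over from the earlier example rather than stated in this list.

```python
CACHE_BLOCKS = 128
WORD_BITS = 4       # 16 words per block
ADDRESS_BITS = 16


def set_assoc_fields(addr, blocks_per_set=2):
    """Split a 16-bit address into (tag, set_index, word)."""
    if not 0 <= addr < (1 << ADDRESS_BITS):
        raise ValueError("address must fit in 16 bits")
    if blocks_per_set <= 0 or CACHE_BLOCKS % blocks_per_set:
        raise ValueError("blocks_per_set must divide the number of cache blocks")
    num_sets = CACHE_BLOCKS // blocks_per_set
    set_bits = num_sets.bit_length() - 1      # log2, num_sets is a power of two
    memory_block = addr >> WORD_BITS
    word = addr & ((1 << WORD_BITS) - 1)
    set_index = memory_block % num_sets
    tag = memory_block >> set_bits
    return tag, set_index, word


# Two blocks per set: memory blocks 0, 64 and 128 all fall in set 0.
print([set_assoc_fields(b * 16)[1] for b in (0, 64, 128)])   # [0, 0, 0]
# One block per set behaves like direct mapping (5-bit tag, 7-bit block).
print(set_assoc_fields(129 * 16 + 3, blocks_per_set=1))      # (1, 1, 3)
# All blocks in one set is fully associative (12-bit tag, no set bits).
print(set_assoc_fields(0xFFFF, blocks_per_set=128))          # (4095, 0, 15)
```

Larger sets shrink the set field and grow the tag, which trades simpler lookup for more freedom in where a block may be placed.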

PERFORMANCE CONSIDERATIONS

• A key design objective of a computer system is to achieve the best possible performance at the lowest possible cost.
  o Price/performance ratio is a common measure of success.
• Performance of a processor depends on:
  o How fast machine instructions can be brought into the processor for execution.
  o How fast the instructions can be executed.

Interleaving

• Divides the memory system into a number of memory modules. Each module has its own address buffer register (ABR) and data buffer register (DBR).
• Arranges addressing so that successive words in the address space are placed in different modules.
• When requests for memory access involve consecutive addresses, the access will be to different modules.
• Since parallel access to these modules is possible, the average rate of fetching words from the Main Memory can be increased.
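One common way to place successive words in different modules is to use the low-order address bits as the module number. The sketch below assumes that scheme and a hypothetical system of 4 modules; neither detail is fixed by the notes.

```python
def interleaved_location(addr, num_modules=4):
    """Map a word address to (module, address_within_module).

    Assumes low-order interleaving: the module number is the address
    modulo the number of modules (num_modules=4 is an illustrative choice).
    """
    if addr < 0 or num_modules <= 0:
        raise ValueError("addr must be >= 0 and num_modules must be positive")
    return addr % num_modules, addr // num_modules


# Eight consecutive addresses cycle through the four modules.
print([interleaved_location(a)[0] for a in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]
```

Because neighbouring addresses land in different modules, a run of consecutive accesses can keep all modules busy at once.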

Hit Rate and Miss Penalty

• Hit rate
• Miss penalty
• Hit rate can be improved by increasing the block size while keeping the cache size constant.
• Block sizes that are neither very small nor very large give the best results.
• Miss penalty can be reduced if the load-through approach is used when loading new blocks into the cache.

Caches on the processor chip

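• In high-performance processors, two levels of caches are normally used.
• The average access time in a system with two levels of caches is $T_{ave} = h_1 c_1 + (1 - h_1) h_2 c_2 + (1 - h_1)(1 - h_2) M$, where $h_1$ and $h_2$ are the hit rates of the L1 and L2 caches, $c_1$ and $c_2$ are the times to access information in the L1 and L2 caches, and $M$ is the time to access information in the main memory.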
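To get a feel for the formula, here is a small sketch that evaluates it, reading h1 and h2 as the L1 and L2 hit rates, c1 and c2 as the L1 and L2 access times, and M as the main memory access time. The numbers in the example are made up for illustration and are not taken from the notes.

```python
def average_access_time(h1, h2, c1, c2, m):
    """T_ave = h1*c1 + (1 - h1)*h2*c2 + (1 - h1)*(1 - h2)*M."""
    for h in (h1, h2):
        if not 0.0 <= h <= 1.0:
            raise ValueError("hit rates must lie between 0 and 1")
    return h1 * c1 + (1 - h1) * h2 * c2 + (1 - h1) * (1 - h2) * m


# Hypothetical figures: 95% L1 hits at 1 cycle, 90% L2 hits at 10 cycles,
# and 100 cycles for main memory.
t_ave = average_access_time(h1=0.95, h2=0.90, c1=1, c2=10, m=100)
print(round(t_ave, 3))
```

With these assumed values the average works out to roughly 1.9 cycles, far closer to the L1 time than to the main memory time, because only a small fraction of accesses miss both caches.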

Other Performance Enhancements

Write buffer

• Write-through:
  o Each write operation involves writing to the main memory.
  o If the processor has to wait for the write operation to complete, it is slowed down.
  o The processor does not depend on the results of the write operation.
  o A write buffer can be included for temporary storage of write requests.
  o The processor places each write request into the buffer and continues execution.
  o If a subsequent Read request references data that is still in the write buffer, the data is read from the write buffer.
• Write-back:
  o A block is written back to the main memory when it is replaced.
  o If the processor waits for this write to complete before reading the new block, it is slowed down.
  o A fast write buffer can hold the block to be written, so the new block can be read first.

Prefetching

• New data are brought into the processor when they are first needed.
• The processor has to wait until the data transfer is complete.
• Prefetch the data into the cache before they are actually needed, that is, before a Read miss occurs.
• Prefetching can be accomplished through software by including a special instruction in the machine language of the processor.
  o Inclusion of prefetch instructions increases the length of the programs.

• Prefetching can also be accomplished using hardware:
  o Circuitry that attempts to discover patterns in memory references and then prefetches according to this pattern.

Lockup-Free Cache
Prefetching scheme does not work if it stops other accesses to the cache until the prefetch is completed.
A cache of this type is said to be “locked” while it services a miss.
Cache structure which supports multiple outstanding misses is called a lockup free cache.
Since only one miss can be serviced at a time, a lockup free cache must include circuits that keep track of all the outstanding misses.
Special registers may hold the necessary information about these misses.

VIRTUAL MEMORIES

Explain how the virtual address is converted into real address in a paged virtual memory system. (4 Marks April 2015, Nov 2015)

• An important challenge in the design of a computer system is to provide a large, fast memory system at an affordable cost.
• Architectural solutions exist to increase the effective speed and size of the memory system.
  o Cache memories were developed to increase the effective speed of the memory system.
  o Virtual memory is an architectural solution to increase the effective size of the memory system.
• Recall that the addressable memory space depends on the number of address bits in a computer.
  o For example, if a computer issues 32-bit addresses, the addressable memory space is 4G bytes.
• Physical main memory in a computer is generally not as large as the entire possible addressable space.
  o Physical memory typically ranges from a few hundred megabytes to 1G bytes.
• Large programs that cannot fit completely into the main memory have their parts stored on secondary storage devices such as magnetic disks.
  o Pieces of programs must be transferred to the main memory from secondary storage before they can be executed.
• When a new piece of a program is to be transferred to the main memory, and the main memory is full, then some other piece in the main memory must be replaced.
  o Recall that this is very similar to what we studied in the case of cache memories.
• The operating system automatically transfers data between the main memory and secondary storage.
  o The application programmer need not be concerned with this transfer.
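The question above asks how a virtual address becomes a real address in a paged system. The following is only a rough sketch of the general idea, not the scheme from the lecture: it assumes 32-bit virtual addresses, a 4 KiB page size, and a page table modelled as a plain dictionary from virtual page number to physical frame number. All names (`PAGE_BITS`, `PageFault`, `translate`) are hypothetical.

```python
PAGE_BITS = 12           # assumed page size of 4 KiB (2**12 bytes)
VIRTUAL_ADDR_BITS = 32   # 2**32 bytes = 4G bytes of addressable space


class PageFault(Exception):
    """Raised when the page is not in main memory (assumed behaviour)."""


def translate(vaddr, page_table):
    """Translate a virtual address using a dict {virtual_page: frame}."""
    if not 0 <= vaddr < (1 << VIRTUAL_ADDR_BITS):
        raise ValueError("address must fit in 32 bits")
    page = vaddr >> PAGE_BITS                  # virtual page number
    offset = vaddr & ((1 << PAGE_BITS) - 1)    # unchanged by translation
    if page not in page_table:
        # The operating system would now bring the page in from disk.
        raise PageFault(page)
    return (page_table[page] << PAGE_BITS) | offset


print(hex(translate(0x00003ABC, {3: 7})))  # 0x7abc
```

Only the page number is replaced during translation; the offset within the page is carried over as is.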
