Huffman Algorithm Analysis with Heap Priority Queue (Lecture notes, C programming)

An analysis of Huffman's algorithm using a heap priority queue. These notes discuss the time cost of the algorithm, sources of information and entropy, uniquely decodable codes, and the Huffman code for an example message. They also cover implementing Huffman code trees using arrays and heaps, and overriding operator< for HCNodes.

CSE 100: Advanced Data Structures
Lecture 13
(Based on Paul Kube course materials)

• Priority Queues in Huffman's algorithm
• Heaps and Priority Queues
• Time and space costs of coding with Huffman codes

Reading: Weiss Ch. 6, Ch. 10.1

Implementing a priority queue using a heap

• A heap is a binary tree with these properties:
  • structural property: each level is completely filled, with the possible exception of the last level, which is filled left-to-right (heaps are "complete" binary trees, and so are balanced: height H = O(log N))
  • ordering property: for each node X in the heap, the key value stored in X is greater than or equal to all key values in descendants of X (this is sometimes called a MAX-heap; for MIN-heaps, replace greater than with less than)
• Heaps are often just called "priority queues", because they are natural structures for implementing the Priority Queue ADT. They are also used to implement the heapsort sorting algorithm, which is a nice, fast N log N sorting algorithm
• Note: this heap data structure has NOTHING to do with the "heap" region of machine memory where dynamic data is allocated...
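To make the ordering property concrete, here is a minimal sketch (not from the original notes) that checks whether a linked binary tree satisfies the MAX-heap ordering property; the Node struct is a hypothetical stand-in for whatever node type an implementation uses:

// Hypothetical node type for illustration.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Returns true if every node's key is >= all keys in its descendants
// (the MAX-heap ordering property). Does not check the structural property.
bool hasMaxHeapOrder(const Node* n) {
    if (n == nullptr) return true;
    if (n->left  != nullptr && n->left->key  > n->key) return false;
    if (n->right != nullptr && n->right->key > n->key) return false;
    return hasMaxHeapOrder(n->left) && hasMaxHeapOrder(n->right);
}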

The heap structural property

• (Figure omitted: examples of trees that satisfy the structural property of heaps, and examples that do not.)

Inserting a key in a heap

• When inserting a key in a heap, we must be careful to preserve the structural and ordering properties. The result must be a heap!
• Basic algorithm (see the sketch after this list):
  1. Create a new node in the proper location, filling the last level left-to-right
  2. Put the key to be inserted in this new node
  3. "Bubble up" the key toward the root, exchanging it with the key in its parent, until it is in a node whose parent has a larger key (or it is in the root)
• Insert in a heap with N nodes has time cost O(log N)
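A minimal sketch of this insert, assuming the array representation of a complete binary tree described later in these notes (root at index 0, parent of index i at (i-1)/2); the MaxHeap name and its fields are illustrative, not from the original notes:

#include <cstddef>
#include <utility>
#include <vector>

// Illustrative array-based MAX-heap; growing the vector by one slot
// per insert automatically "fills the last level left-to-right".
struct MaxHeap {
    std::vector<int> a;

    void insert(int key) {
        a.push_back(key);                     // steps 1 & 2: new last node holds the key
        std::size_t i = a.size() - 1;
        while (i > 0) {                       // step 3: bubble up toward the root
            std::size_t parent = (i - 1) / 2;
            if (a[parent] >= a[i]) break;     // parent has a larger (or equal) key
            std::swap(a[parent], a[i]);
            i = parent;
        }
    }
};

Each iteration moves up one level, and the height is O(log N), so the loop runs O(log N) times.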

Deleting the maximum key in a heap

• When deleting the maximum valued key from a heap, we must be careful to preserve the structural and ordering properties. The result must be a heap!
• Basic algorithm (see the sketch after this list):
  1. The key in the root is the maximum key. Save its value to return
  2. Identify the node to be deleted, unfilling the bottom level right-to-left
  3. Move the key in the node to be deleted to the root, and delete the node
  4. "Trickle down" the key in the root toward the leaves, exchanging it with the key in the larger child, until it is in a node whose children have smaller keys, or it is in a leaf
• deleteMax in a MAX-heap is O(log N). (What is deleteMin in a MAX-heap?)
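Continuing the illustrative array-based sketch above (same assumptions about the array layout, not from the original notes), deleteMax could look like this:

#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

int deleteMax(std::vector<int>& a) {
    if (a.empty()) throw std::runtime_error("deleteMax on empty heap");
    int max = a[0];                        // step 1: the root holds the maximum
    a[0] = a.back();                       // steps 2 & 3: move the last key to the root...
    a.pop_back();                          // ...and delete the last node
    std::size_t i = 0, n = a.size();
    while (true) {                         // step 4: trickle down
        std::size_t left = 2 * i + 1, right = 2 * i + 2, largest = i;
        if (left  < n && a[left]  > a[largest]) largest = left;
        if (right < n && a[right] > a[largest]) largest = right;
        if (largest == i) break;           // children (if any) have smaller keys
        std::swap(a[i], a[largest]);
        i = largest;
    }
    return max;
}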

Time cost of Huffman coding

• We have analyzed the time cost of constructing the Huffman code tree
• What is the time cost of then using that tree to code the input message?
• Coding one symbol from the input message takes time proportional to the number of bits in the code for that symbol; and so the total time cost for coding the entire message is proportional to the total number of bits in the coded version of the message
• If there are K symbols in the input message, and the average number of bits per symbol in the Huffman code is H', then the time cost for coding the message is proportional to H'·K
• What can we say about H'? This requires a little information theory...

Sources of information and entropy

• A source of information emits a sequence of symbols drawn independently from some alphabet
• Suppose the alphabet is the set of symbols $\{s_1, s_2, \ldots, s_N\}$
• Suppose the probability of symbol $s_i$ occurring in the source is $p_i$
• Then the information contained in symbol $s_i$ is $\log_2(1/p_i)$ bits, and the average information per symbol is (logs are base 2):

  $H = \sum_i p_i \log_2(1/p_i)$

• This quantity $H$ is the "entropy" or "Shannon information" of the information source
• For example, suppose a source uses 3 symbols, which occur with probabilities 1/3, 1/4, 5/12
• The entropy of this source is

  $H = \tfrac{1}{3}\log_2 3 + \tfrac{1}{4}\log_2 4 + \tfrac{5}{12}\log_2 \tfrac{12}{5} \approx 1.55$ bits per symbol

Entropy and coding sources of information

• A uniquely decodable code for an information source is a function that assigns to each source symbol $s_i$ a unique sequence of bits, called the code for $s_i$
• Suppose the number of bits in the code for symbol $s_i$ is $\ell_i$
• Then the average number of bits in the code for the information source is $\sum_i p_i \ell_i$
• Let $H'$ be the smallest average number of bits over all uniquely decodable codes for an information source with entropy $H$. Then you can prove the following interesting results:
  • $H \le H' < H + 1$, i.e. you can do no better than $H$, and will never do worse by more than one bit per symbol
  • The Huffman code for this source has codes with average number of bits $H'$, i.e., there is no better uniquely decodable code than Huffman
• So Huffman-coding a message consisting of a sequence of K symbols obeying the probabilities of an information source with entropy $H$ takes at least $H \cdot K$ bits but less than $H \cdot K + K$ bits

Entropy and Huffman: an example

• Suppose the alphabet consists of the 8 symbols A,B,C,D,E,F,G,H, and the message sequence to be coded is AAAAABBAHHBCBGCCC
• Given an alphabet of 8 symbols, the entropy H of a source using those symbols must lie in the range $0 \le H \le \log_2 8 = 3$. But if the source emits symbols with probabilities as evidenced in this message, what exactly is its entropy H?
• The table of counts or frequencies of symbols in this message would be: A: 6; B: 4; C: 4; D: 0; E: 0; F: 0; G: 1; H: 2
• The message has length 17, and so the probabilities of these symbols are: A: 6/17; B: 4/17; C: 4/17; D: 0; E: 0; F: 0; G: 1/17; H: 2/17
• The entropy of this message (and any message with those symbol probabilities) is then

  $H = \tfrac{6}{17}\log_2\tfrac{17}{6} + 2 \cdot \tfrac{4}{17}\log_2\tfrac{17}{4} + \tfrac{1}{17}\log_2 17 + \tfrac{2}{17}\log_2\tfrac{17}{2} \approx 2.12$ bits per symbol

• As we saw last time, the Huffman code for this message gives $37/17 \approx 2.18$ bits per symbol, which is quite close to the message entropy
Implementing trees using arrays

• Arrays can be used to implement trees; this is a good choice in some cases, if the tree has a very regular structure, or a fixed size
• One common case is: implementing a heap
• Because heaps have a very regular structure (they are complete binary trees), the array representation can be very compact: the array entries themselves don't need to hold parent-child relational information
  • the index of the parent, left child, or right child of any node in the array can be found by simple computations on that node's index
• Huffman code trees do not have as regular a structure as a heap (they can be extremely unbalanced), but they do have some structure, and they do have a determinable size
  • structure: they are "full" binary trees; every node is either a leaf, or has 2 children
  • size: to code N possible items, they have N leaves, and N-1 internal nodes
• These features make it potentially interesting to use arrays to implement a Huffman code tree (one possible layout is sketched below)
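The notes stop short of showing such a representation; as one possible sketch (entirely illustrative, not from the original notes), a Huffman code tree for N symbols could be stored in an array of 2N-1 node records that hold indices instead of pointers:

#include <array>

// Illustrative array-based Huffman tree node: children and parent are
// indices into the node array; -1 marks "no such node".
struct ArrayHCNode {
    int count;    // symbol count/frequency for the subtree
    int parent;   // index of parent node, or -1 if root
    int child0;   // index of "0" child, or -1 if leaf
    int child1;   // index of "1" child, or -1 if leaf
};

// For an alphabet of N = 256 byte values: N leaves + (N-1) internal nodes.
constexpr int N = 256;
std::array<ArrayHCNode, 2 * N - 1> tree;

Because the tree's size is known in advance, no dynamic allocation is needed, at the cost of storing explicit parent/child indices (unlike a heap, whose regularity makes them unnecessary).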

Implementing heaps using arrays

• The heap structural property permits a neat array-based implementation of a heap
• Recall that in a heap each level is filled left-to-right, and each level is completely filled before the next level is begun (heaps are "complete" binary trees)
• One way to implement a heap with N nodes holding keys of type T is to use an N element array:

  T heap[N];

  Nodes of the heap will correspond to entries in the array as follows:
  • The root of the heap is the array element indexed 0, heap[0]
  • if a heap node corresponds to the array element indexed i, then
    • its left child corresponds to the element indexed 2*i + 1
    • its right child corresponds to the element indexed 2*i + 2
    • ...and so a node indexed k has parent indexed (k-1)/2
• The result: nodes at the same level are contiguous in the array, and the array has no "gaps"
• Now it is easy to implement the heap Insert and Delete operations (in terms of "bubbling up" and "trickling down"), and easy to implement O(N log N) "heap sort", in place, in an N-element array (the index computations are sketched below)
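A minimal sketch of these index computations (the helper names are illustrative), using integer division for the parent formula:

#include <cassert>

// 0-based array indexing for a complete binary tree.
inline int leftChild(int i)  { return 2 * i + 1; }
inline int rightChild(int i) { return 2 * i + 2; }
inline int parent(int k)     { return (k - 1) / 2; }  // integer division

int main() {
    // The root's children are at indices 1 and 2...
    assert(leftChild(0) == 1 && rightChild(0) == 2);
    // ...and both point back to the root, thanks to integer division.
    assert(parent(1) == 0 && parent(2) == 0);
    // The relationship holds for any node, e.g. index 5.
    assert(parent(leftChild(5)) == 5 && parent(rightChild(5)) == 5);
    return 0;
}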

Priority queues in C++

• It is not too hard to code up your own priority queue, and using the heap-structured array is a nice way to do it
• But the C++ STL has a priority_queue class template, which provides the functions push(), top(), pop(), size()
• A C++ priority_queue is a generic container, and can hold any kind of thing as specified with a template parameter when it is created: for example HCNodes, or pointers to HCNodes, etc.

  #include <queue>
  std::priority_queue<T> p;

• However, objects in a priority queue must be comparable to each other for priority
• By default, a priority_queue uses operator< defined for objects of type T
  • if a < b, b is taken to have higher priority than a
• So let's see how to override that operator for HCNodes
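As a tiny illustration of that default behavior (not in the original notes), a priority_queue of ints delivers the largest value first:

#include <cstdio>
#include <queue>

int main() {
    std::priority_queue<int> p;    // max-first by default, via operator<
    p.push(3);
    p.push(41);
    p.push(7);
    std::printf("%d\n", p.top());  // prints 41: the largest has highest priority
    p.pop();
    std::printf("%d\n", p.top());  // prints 7
    return 0;
}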

Overriding operator< for HCNodes: header file

In a header file, say HCNode.hpp:

#ifndef HCNODE_HPP
#define HCNODE_HPP

class HCNode {
    friend class HCTree;   // so an HCTree can access HCNode fields

private:
    HCNode* parent;        // pointer to parent; null if root
    bool isChild0;         // true if this is "0" child of its parent
    HCNode* child0;        // pointer to "0" child; null if leaf
    HCNode* child1;        // pointer to "1" child; null if leaf
    unsigned char symb;    // symbol
    int count;             // count/frequency of symbols in subtree

public:
    // for less-than comparisons between HCNodes
    bool operator<(HCNode const &) const;
};

#endif
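These notes declare operator< but do not show its definition; a minimal sketch of one common implementation (an assumption, not confirmed by the notes) reverses the comparison on count, so that the max-first priority_queue delivers the lowest-count node first, as Huffman's algorithm needs:

// In a hypothetical companion source file, HCNode.cpp:
#include "HCNode.hpp"

bool HCNode::operator<(HCNode const & other) const {
    // Reverse the sense of the comparison: a node with a HIGHER count
    // compares "less than" one with a lower count, so the default
    // max-first priority_queue returns the lowest-count node first.
    return count > other.count;
}

With that in place, Huffman's algorithm can repeatedly pop the two lowest-count nodes from the priority queue, join them under a new parent, and push the parent back in. (A real implementation would also want a tie-breaking rule for equal counts, so the tree built is deterministic.)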