These lecture notes analyze Huffman's algorithm using a heap-based priority queue. They discuss the time cost of the algorithm, sources of information and entropy, uniquely decodable codes, and the Huffman code for an example message. They also cover implementing Huffman code trees and heaps using arrays, and overriding operator< for HCNodes.
Implementing a priority queue using a heap

• A heap is a binary tree with keys in its nodes that satisfies two properties:
  - a structural property: it is a "complete" binary tree (each level is filled left-to-right, and each level is completely filled before the next level is begun)
  - an ordering property: the key in any node is at least as large as the keys in its children (in a MAX heap)
• Heaps are often just called "priority queues", because they are natural structures for implementing the Priority Queue ADT. They are also used to implement the heapsort sorting algorithm, which is a nice fast O(N log N) sorting algorithm.
• Note: this heap data structure has NOTHING to do with the "heap" region of machine memory where dynamic data is allocated...
The heap structural property

• Whether a tree satisfies the structural property of heaps depends only on its shape: every level must be completely filled, except possibly the bottom level, which is filled left-to-right.
• [This slide's figures, showing example trees that satisfy the structural property and example trees that do not, are not preserved in these notes.]
Inserting a key in a heap

• When inserting a key in a heap, you must be careful to preserve the structural and ordering properties. The result must be a heap!
• Basic algorithm:
  1. Create a new node in the next open position at the bottom level, filling the level left-to-right, and put the new key there.
  2. "Bubble up" the new key toward the root, exchanging it with the key in its parent, until it is in a node whose parent has a larger key, or it is in the root.
• Insert in a heap with N nodes has time cost O(log N).
Deleting the maximum key in a heap

• When deleting the maximum valued key from a heap, you must be careful to preserve the structural and ordering properties. The result must be a heap!
• Basic algorithm:
  1. The key in the root is the maximum key. Save its value to return.
  2. Identify the node to be deleted, unfilling the bottom level right-to-left.
  3. Move the key in the node to be deleted to the root, and delete the node.
  4. "Trickle down" the key in the root toward the leaves, exchanging with the key in the larger child, until it is in a node whose children have smaller keys, or it is in a leaf.
• deleteMax in a MAX heap is O(log N). (What is deleteMin in a MAX heap?) Both operations are sketched in code below.
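Here is a minimal sketch of these two operations, assuming a MAX heap of ints stored in a std::vector using the index arithmetic described later in these notes (children of index i at 2*i+1 and 2*i+2, parent of index k at (k-1)/2). The class and method names are illustrative, not from the original slides:

    #include <cstddef>
    #include <stdexcept>
    #include <utility>
    #include <vector>

    class MaxHeap {
        std::vector<int> heap;  // array representation of a complete binary tree

      public:
        // Insert: place the key in the next open bottom-level position,
        // then "bubble up" until its parent is no smaller. O(log N).
        void insert(int key) {
            heap.push_back(key);
            std::size_t k = heap.size() - 1;
            while (k > 0 && heap[(k - 1) / 2] < heap[k]) {
                std::swap(heap[(k - 1) / 2], heap[k]);  // exchange with parent
                k = (k - 1) / 2;
            }
        }

        // deleteMax: save the root key, move the last key to the root,
        // then "trickle down", swapping with the larger child. O(log N).
        int deleteMax() {
            if (heap.empty()) throw std::out_of_range("empty heap");
            int max = heap[0];      // the root holds the maximum key
            heap[0] = heap.back();  // unfill the bottom level right-to-left
            heap.pop_back();
            std::size_t k = 0;
            while (2 * k + 1 < heap.size()) {
                std::size_t child = 2 * k + 1;  // left child
                if (child + 1 < heap.size() && heap[child] < heap[child + 1])
                    ++child;                    // right child is larger
                if (heap[child] <= heap[k]) break;  // ordering property restored
                std::swap(heap[k], heap[child]);
                k = child;
            }
            return max;
        }
    };

(As for deleteMin in a MAX heap: the minimum key can be in any leaf, and about half the nodes are leaves, so finding it takes O(N).)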
Time cost of Huffman coding

• We have analyzed the time cost of constructing the Huffman code tree.
• What is the time cost of then using that tree to code the input message?
• Coding one symbol from the input message takes time proportional to the number of bits in the code for that symbol; and so the total time cost for coding the entire message is proportional to the total number of bits in the coded version of the message.
• If there are K symbols in the input message, and the average number of bits per symbol in the Huffman code is H′, then the time cost for coding the message is proportional to H′·K.
• What can we say about H′? This requires a little information theory...
Sources of information and entropy

• A source of information emits a sequence of symbols drawn independently from some alphabet.
• Suppose the alphabet is the set of symbols s_1, ..., s_N.
  - Suppose the probability of symbol s_i occurring in the source is p_i.
  - Then the information contained in symbol s_i is \log_2(1/p_i) bits, and the average information per symbol is (logs are base 2):

    H = \sum_{i=1}^{N} p_i \log_2(1/p_i)

• This quantity H is the "entropy" or "Shannon information" of the information source.
• For example, suppose a source uses 3 symbols, which occur with probabilities 1/3, 1/4, 5/12.
• The entropy of this source is H = (1/3)\log_2 3 + (1/4)\log_2 4 + (5/12)\log_2(12/5) ≈ 1.55 bits per symbol.
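As a quick check of that arithmetic, here is a small self-contained sketch (not from the original notes) that computes the entropy of a probability distribution:

    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Entropy H = sum over symbols of p * log2(1/p), in bits per symbol.
    // Zero-probability symbols contribute nothing: p*log2(1/p) -> 0 as p -> 0.
    double entropy(const std::vector<double>& probs) {
        double h = 0.0;
        for (double p : probs)
            if (p > 0.0) h += p * std::log2(1.0 / p);
        return h;
    }

    int main() {
        // The 3-symbol source from above: probabilities 1/3, 1/4, 5/12
        std::printf("%.3f bits/symbol\n", entropy({1.0/3, 1.0/4, 5.0/12}));  // prints 1.555
        return 0;
    }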
Entropy and coding sources of information

• A uniquely decodable code for an information source is a function that assigns to each source symbol s_i a unique sequence of bits, called the code for s_i.
  - Suppose the number of bits in the code for symbol s_i is b_i.
  - Then the average number of bits in the code for the information source is \sum_i p_i b_i.
• Let H′ be the average number of bits in the code with the smallest average number of bits, of all uniquely decodable codes for an information source with entropy H. Then you can prove the following interesting results:
  - H ≤ H′ < H + 1, i.e., you can do no better than H, and will never do worse than H by more than one bit per symbol.
  - The Huffman code for this source has codes with average number of bits H′, i.e., there is no better uniquely decodable code than Huffman's.
• So Huffman-coding a message consisting of a sequence of K symbols obeying the probabilities of an information source with entropy H takes H′·K bits, which is at least H·K bits but less than H·K + K bits.
Entropy and Huffman: an example

• Suppose the alphabet consists of the 8 symbols A,B,C,D,E,F,G,H, and the message sequence to be coded is AAAAABBAHHBCBGCCC.
• Given an alphabet of 8 symbols, the entropy can be at most \log_2 8 = 3 bits per symbol (reached only when all 8 symbols are equally probable).
• The table of counts or frequencies of symbols in this message would be:
  A: 6; B: 4; C: 4; D: 0; E: 0; F: 0; G: 1; H: 2
• The message has length 17, and so the probabilities of these symbols are:
  A: 6/17; B: 4/17; C: 4/17; D: 0; E: 0; F: 0; G: 1/17; H: 2/17
• The entropy of this message (and any message with those symbol probabilities) is then
  H = (6/17)\log_2(17/6) + 2·(4/17)\log_2(17/4) + (1/17)\log_2 17 + (2/17)\log_2(17/2) ≈ 2.12 bits per symbol.
• As we saw last time, the Huffman code for this message uses 37 bits in all, an average of 37/17 ≈ 2.18 bits per symbol, consistent with the bound H ≤ H′ < H + 1.
Implementing trees using arrays

• Arrays can be used to implement trees; this is a good choice in some cases if the tree has a very regular structure, or a fixed size.
• One common case is: implementing a heap.
• Because heaps have a very regular structure (they are complete binary trees), the array representation can be very compact: array entries themselves don't need to hold parent-child relational information
  - the index of the parent, left child, or right child of any node in the array can be found by simple computations on that node's index
• Huffman code trees do not have as regular a structure as a heap (they can be extremely unbalanced), but they do have some structure, and they do have a determinable size
  - structure: they are "full" binary trees; every node is either a leaf, or has 2 children
  - size: to code N possible items, they have N leaves, and N-1 internal nodes
• These features make it potentially interesting to use arrays to implement a Huffman code tree.
Implementing heaps using arrays

• The heap structural property permits a neat array-based implementation of a heap.
• Recall that in a heap each level is filled left-to-right, and each level is completely filled before the next level is begun (heaps are "complete" binary trees).
• One way to implement a heap with N nodes holding keys of type T is to use an N-element array of T, with the root's key in heap[0]:
  - if a heap node corresponds to array element indexed i, then
    - its left child corresponds to element indexed 2*i + 1
    - its right child corresponds to element indexed 2*i + 2
  - ...and so a node indexed k has parent indexed (k-1)/2, using integer division
• The result: nodes at the same level are contiguous in the array, and the array has no "gaps".
• Now it is easy to implement the heap Insert and Delete operations (in terms of "bubbling up" and "trickling down"), and easy to implement O(N log N) "heap sort", in place, in an N-element array, as sketched below.
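In fact, the C++ standard library provides heap operations over any array range; a minimal heap sort sketch (the example data is illustrative):

    #include <algorithm>
    #include <cstdio>

    int main() {
        int a[] = {5, 9, 1, 7, 3};
        int n = sizeof(a) / sizeof(a[0]);

        std::make_heap(a, a + n);  // rearrange in place into a max-heap: O(N)
        std::sort_heap(a, a + n);  // repeatedly move the max to the end: O(N log N)

        for (int i = 0; i < n; ++i) std::printf("%d ", a[i]);  // prints: 1 3 5 7 9
        std::printf("\n");
        return 0;
    }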
Priority queues in C++

• It is not too hard to code up your own priority queue, and using the heap-structured array is a nice way to do it.
• But the C++ STL has a priority_queue class template you can use instead.
• A C++ std::priority_queue is a container adaptor; by default it is implemented with a heap-structured std::vector.
• However, objects in a priority queue must be comparable to each other for priority.
• By default, a priority_queue compares its elements with operator<, and if a < b, b is taken to have higher priority than a.
• So let's see how to override that operator for HCNodes...
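A quick illustration of that default behavior (a sketch, not from the original slides):

    #include <cstdio>
    #include <queue>

    int main() {
        std::priority_queue<int> pq;  // orders by operator<: largest first
        pq.push(3);
        pq.push(9);
        pq.push(1);
        while (!pq.empty()) {
            std::printf("%d ", pq.top());  // prints: 9 3 1
            pq.pop();
        }
        std::printf("\n");
        return 0;
    }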
Overriding operator< for HCNodes: header file

In a header file, say HCNode.hpp:

    #ifndef HCNODE_HPP
    #define HCNODE_HPP

    class HCNode {
        friend class HCTree;  // so an HCTree can access HCNode fields

      private:
        HCNode* parent;       // pointer to parent; null if root
        bool isChild0;        // true if this is "0" child of its parent
        HCNode* child0;       // pointer to "0" child; null if leaf
        HCNode* child1;       // pointer to "1" child; null if leaf
        unsigned char symb;   // symbol
        int count;            // count/frequency of symbols in subtree

      public:
        // for less-than comparisons between HCNodes
        bool operator<(HCNode const &) const;
    };

    #endif
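The definition of operator< does not appear in these notes. A plausible sketch (an assumption, not the original code): since Huffman's algorithm repeatedly combines the two lowest-count nodes, and a std::priority_queue serves its highest-priority element first, invert the comparison on counts:

    #include "HCNode.hpp"

    // In HCNode.cpp (a sketch; the actual definition is not shown in these notes).
    bool HCNode::operator<(HCNode const & other) const {
        if (count != other.count)
            return count > other.count;  // higher count means lower priority
        return symb < other.symb;        // break ties deterministically by symbol
    }

A tree-building implementation would typically store HCNode* in the queue, with a small comparator class (the name here is illustrative) that dereferences the pointers:

    class HCNodePtrComp {
      public:
        bool operator()(HCNode* lhs, HCNode* rhs) const { return *lhs < *rhs; }
    };

    // std::priority_queue<HCNode*, std::vector<HCNode*>, HCNodePtrComp> pq;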