Introduction
这是Huffman作业的第四部分,也是最后一部分,同样分为三个小练习。
Exercise 7 Completing the Huffman Codec ADT
To complete the C The other ADT in Huffman.cc is called the Huffman Codec. The word codec is a contraction of two words: COde and DECode. This ADT does the work of coding and decoding using a Huffman tree. The Huffman Codec is a record that stores two records: the Huffman Tree thatwas created in the previous exercise, and a codebook, which was created by analysing the Huffman tree. The codebook has a place to store a C-string for every ASCII character, though most of the codebook will be filled with NULL references; only the characters in the original message will have a code. The function find_codes() builds the codebook using the Huffman Tree.
In this exercise, youll complete the encode() operations for the Huffman Codec. Heres what it looks like currently:
1 | // encode(hcdc, message) |
Your job is to complete this function.
Hints
The operation has to encode each character in the given message. The output of this function is a reference to a C-string, so we have to allocate a C-string big enough to fill with the encoded message. Use a large size, just to be sure.
To encode the message, you have to look at each character in the message, and copy its code from the Codecs codebook to the encoded C-string. Its possible to use strcat() for this, but its just as easy to write your own loop to copy the code for a character from the codebook to the coded C-string. Remember that youre looking at the message one character at a time, but you are copying several characters into the coded C-string.
The encoded C-string is probably twice as long as the original message. Thats because were using 0 and 1 in the codebook; a real compression utility would use bits 0 and 1, not characters. Thats okay for us; seeing the code is much easier to debug.
Exercise Summary
- Complete encode() as indicated.
- Build the testADT app, and run it.
- If your code does the right things, tests 45-48 should all report Passed.
What to hand in:
The file Huffman.cc containing your code for encode() described above.
Grading
- 10 marks: encode() builds a coded string using the codebook and the given message.
Exercise 8 Completing the Huffman Codec ADT
To complete the Huffman Codec, we need an algorithm to decode a coded string. The operation to decode a coded string is called decode(). Weve given it to you in the file Huffman.cc. It calls a function called decode_char(), whose job it is to step through the Huffman tree, starting at the root, and winding up at a leaf; decode_char() returns the character stored in the Frequency record stored at the leaf. (We cannot use the codebook for decoding! We have to use the Huffman Tree. To decode a character using the Huffman tree requires us to step down from tbhe root to a leaf, which should be fairly short. The complexity of stepping through the tree is O(h) where h is the height of the Huffman Tree. If we used the codebook, wed have to look at each possible code, and try to figure out if the code were looking at is a code in the codebook. This would require at least O(nh) time, where n is the number of characters in the message. So use the tree!
Heres what decode_char() looks like currently:
1 | // decode_char(tnode, message, d) { |
Your job is to complete this function.
Hints
Because were using the Huffman Tree, decode_char() is recursive. The base case is when were at a leaf, in which case, we return the character stored in the Frequency record there. If were not at a leaf, we have to go left or right, depending on whether the code has a 0 or a 1.
The tricky part of this function is that we are traversing a Tree and a coded message at the same time. Every time we go left or right, we advance along the coded message too. To do thise, we have an index, d, that tells us where we are in the coded message. Everytime we step deeper into the tree, we increase d. Its important to remember that decode_char() is called by decode(), and it needs to know how many characters of the code were used to reach a leaf. Thats why d is a reference to integer: so that when the correct code is returned, d is updated for decode() as well.
Exercise Summary
- Complete decode_char() as indicated.
- Build the testADT app, and run it.
- If your code does the right things, tests 49-52 should all report Passed.
What to hand in:
The file Huffman.cc containing your code for decode_char() described above.
Grading
- 10 marks: decode_char() correctly returns a character corresponding to the given coded message, and updates the index d.
Exercise 9 Building the Application a9app
If youve completed all the exercises, you should be able to build the main application using the following comandline:
g++ -Wall -pedantic *.cc a9app.cpp -o a9app
If youre lucky, the testADT application has caught most of the bugs in your implementation of the exercises, but theres a chance that there may be some more debugging work to do!
Weve given you a message.txt file, but you can use your own text, maybe smaller to help you debug this rather complicated application. Once you are here, you should feel free to add testing code to testADTs.cpp, to help you figure out where things are going wrong.
Exercise Summary
- Build and run the a9app.
- If your code does the right things, it should read and code and decode the first line of text in the file message.txt.
What to hand in:
The file example.txt containing a copy/paste of the output of your program running.
Grading
- 5 marks: The file example.txt shows that a9app reads, encodes and decodes a message file.