Save a copy of this lab for yourself
- Language: Intermediate Student with Lambda Expressions
- Teachpack: none
- Lab 6 Template (scm) Change the name of this template file using your CNET ID.
0. Huffman Codes
Donald Knuth has called
huffman coding
one of the fundamental ideas in Computer Science for its ubiquitous role
in coding and compression algorithms used in data communication and data
storage.
Many of the most popular
compression methods, such as ZIP, JPEG and MP3 make
use of huffman coding at some point. Huffman coding was discovered by
David Huffman, as a graduate student at MIT in 1951. He and his
classmates were given the choice of a term paper or a final exam. The
term paper was on the problem of finding the most efficient binary code.
Huffman found himself unable to show any of the known codes were most
efficient, and was just about to give-up when he had an epiphany
and the solution to the problem came to him
"It was the most singular moment of my life," Huffman says. "There was the absolute lightning of sudden realization."
A binary code is a way of representing data (called a message), such as a picture or text or music, as a sequence of 0s and 1s (bits.) For example, ASCII code represents text characters using seven bits. Pictures can be represented by lists of positions and colors (the color of each pixel), and colors can be represented using 24 bits to represent the three color components: red, green and blue.
For example, suppose we are coding messages using only the
letters A through H. We can choose a code using three bits for each of the
eight possible characters:
| A 000 | C 010 | E 100 | G 110 |
| B 001 | D 011 | F 101 | H 111 |
With this code, the message
BACADAEAFABBAAAGAH
is encoded as a sequence of 54 bits
001000010000011000100000101000001001000000000110000111
Codes such as ASCII and the A-through-H code above are known as fixed-length codes, because they represent each symbol in the message with the same number of bits.
David Huffman's basic insight was
to use variable-length codes, in which different symbols may
be represented by different numbers of bits. If some symbols appear
more frequently in the message, then we can encode the data using fewer
bits if we assign shorter codes to these symbols.
In the above message the letter A occurs 8 times, B occurs 3 times and
all other letters occur once. Using this information the
huffman coding procedure assigns the following codes to each letter:
| A 0 | C 1010 | E 1100 | G 1110 |
| B 100 | D 1011 | F 1101 | H 1111 |
Notice that A has the shortest code, B next shortest, and all other letters have the same length code. With this code the same message as above is encoded by the sequence
100010100101101100011010100100000111001111
This string contains 42 bits, so it saves more than 20% in space in comparison with the fixed-length code A-through-H.
1. Code Trees
This section will introduce you the data structure, a code-tree,
that we will use to create codes for each symbol occuring in the example
message above. We will use these trees to encode and decode messages
using these symbols.
1a. Code Tree Data Structure
;; A leaf is a
;; (make-leaf symbol number)
(define-struct leaf (symbol weight))
;; A code-tree is either a
;; --leaf, or
; (make-code-tree (listof symbols) number
; code-tree
; code-tree)
(define-struct code-tree (symbols weight left right))
;; and where the following two invariants hold
;;
(a) For each node, the sum of the weights of the left subtree
;; plus the sum of the weights of the right subtree is
;; the weight of the node.
;;
(b) For each node, the symbols in the list of symbols of the node are
;; all of the symbols occurring in leaves of the left subtree
;; and all of the symbols occurring in leaves of the right subtree.
(This is similar to the use of invariants in the data definition of
binary-search-tree.) For example, the following is a
code-tree
(make-code-tree '(A B) 2
(make-leaf 'A 1)
(make-leaf 'B 1))
1b. Basic Algorithm for Building Code Trees
The basic algorithm for building a code-tree starts
with a list of leaves and outputs a code-tree created
from these leaves. The leaves are created from our data about
the symbol frequency in the message, but for now we will
concentrate on building the code-tree from the list
of leaves. We will
use this algorithm to write a Scheme function
for building code-trees in Part 4.
- Create a list of leaves.
- Merge
code-treeson your list (as follows) until there is onecode-treeremaining on the list: - Choose two
code-treeswith smallestweightand remove from the list. - Create a new
code-treeas follows:- Let the
leftandrightsubtrees be the twocode-treesyou removed from the list. - Let
weightbe the sum of theweights of theleftandrightsubtrees. - Let
symbolsbe the list of symbols you get by appending the list of symbols in the left subtree with the list of symbols in the right subtree.
- Let the
- Insert this new
code-treeon your list (Notice that your list ofcode-treeshas shrunk by one.) - Return the remaining
code-treeon your list.
In step 1 we start-off with a list of leaves, which are
code-trees.
In step 2b we produce a code-tree
provided the two items chosen in 2a were also
code-trees. Together, these imply that the remaining
item on our list in step 3 really is a code-tree.
1c. A fixed-length code-tree
Follow the algorithm from Part 1b and create a
code-tree with the following leaves:
(list
(make-leaf 'A 1)
(make-leaf 'B 1)
(make-leaf 'C 1)
(make-leaf 'D 1)
(make-leaf 'E 1)
(make-leaf 'F 1)
(make-leaf 'G 1)
(make-leaf 'H 1))
Notice that in the code-tree you create, the depth of
each leaf (the number of make-code-trees that have
that leaf as a subtree) is always the same, 3.
I have included the code-tree tree on your template,
tree-1.
1d. A variable-length code-tree
Follow the algorithm from Part 1 and create a
code-tree with the following leaves:
(list
(make-leaf 'A 8)
(make-leaf 'B 3)
(make-leaf 'C 1)
(make-leaf 'D 1)
(make-leaf 'E 1)
(make-leaf 'F 1)
(make-leaf 'G 1)
(make-leaf 'H 1))
The weight of each leaf is the frequency that the symbol
occurs in our original message:
BACADAEAFABBAAAGAH
from Part 0. Notice that the depth of the leaf containing the symbol 'A
(the number of make-code-trees that have the leaf with this
symbol as a subtree) is only one, the depth of 'B is three, and the depth
of each other symbol is four.
I have included the code-tree tree on your template,
tree-2.
2. Structural vs. Generative Recursion
Several of the functions you will write will be based upon
generative recursion.
Functions written using structural recursion are data-directed,
your analysis of the data the function has as input drives the
organization of your function through templates.
The recursive structure of the data determines
what new subproblems need to be solved and what the immediately solvable
subproblems are.
Functions written using generative recursion are algorithm-directed,
you must start with an algorithm for generating solutions to determine
what new subproblems need to be solved and which subproblems are
immediately solvable.
The examples you give play
a crucial role in your writing the function body: by working
through simple examples by hand you are led to an understanding of the
of how new subproblems are generated and combined to produce a solution
to your original problem.
There are several questions you need to ask yourself as you set out to write the body of your function
Do I have recursive data elements in the input of my function?If you do, you may need to use structural or generative recursion to write your function. In this case you need to decide which kind of recursion will guide the construction of your function
Do I solve a given problem by solving subproblems generated by the recursive structure of some of the input data elements? If this is so, then you are likely going to use structural recursion. You need to decide which input elements drive the recursion, and produce an appropriate template for the body of your function.If the new subproblems you have generated are not generated by the recursive structure of some of the input data elements, then you will have to use generative recursion. In this case, there are three questions you need to answer:
- What are the immediately solvable cases of the problem at hand?
- How can I break the current problem into smaller subproblems whose solutions I can use to solve the current problem?
- How do I combine the solution to the smaller subproblems to provide a solution to the current problem?
3. Encoding and Decoding Messages
We can encode and decode messages using the code-trees
we construct using the algorithm in Part 1b. Here is a pictorial
representation of the code-tree we generated in
Part 1d, tree-2:

3a. Encoding
Encoding is the process of going from the message
to a binary coding of the message. Because of invariant (b), you
can use the list of symbols, symbols, to follow the
branches of the code-tree until you arrive at a leaf on
which the symbol occurs.
Each time you go down the
left child, add a 0 to your code; each time you go down the
right child add a 1 to your code. For example, given
the pictured tree representation of tree-2
the coding of each symbol is
| 'A 0 | 'C 1010 | 'E 1100 | 'G 1110 |
| 'B 100 | 'D 1011 | 'F 1101 | 'H 1111 |
The depth of the leaf with a given symbol corresponds to the length of the code assigned to that symbol.
Write the following function
;; A bit is either 0 or 1
;; encode: (list-of symbols) code-tree -> (list-of bits)
;; encode msg using ctree
;; (define (encode msg ctree) ... )
Hint: You can write this function using structural recursion.
Test encode using your tree-1
and tree-2 from Part 1.
3b. Decoding
Decoding is the process of retrieving a message given a
binary coding of the message. To decode we start with a
(list-of bits) and our code-tree.
We are trying to find the next symbol using the bits. If
our code-tree has subtrees, then we use the next bit
to select:
- The
leftsubtree if the bit is 0. - The
rightsubtree if the bit is 1.
code-tree is a leaf,
we have our next symbol. If there are remaining bits on
the list we search for our next symbol.
Write the following function
;; A bit is either 0 or 1
;; decode: (list-of bits) code-tree -> (list-of symbols)
;; decode string (of bits) using ctree
;; (define (encode string ctree) ... )
Hint: You can write this function using structural recursion
on several input data elements.
Test decode using your code-trees from Part 1.
4. Building Code Trees
In Part 1b I gave an algorithm for building a code-tree
from a list of leaves. Now, you will write a Scheme function for
implementing this algorithm and actually use the computer
to build the code-trees from Part 1c and 1d.
4a. Frequency Lists
We need to build a (list-of leaves) from information
about the frequency of each symbol.
;; An fpair is a
;; (make-fpair symbol weight)
(define-struct fpair (symbol weight))
Write the following function
;; create-leaves: (list-of fpair) -> (list-of leaf)
;; Create a list of leaves from list of fpairs
;; (define (create-leaves fpairs) ...)
You can test your function on freq-1 and freq-2
given on the template.
4b. build-code-tree
You will write a Scheme function for implementing the algorithm
in Part 1b. The first step, creating the list of leaves from
data about the frequency of each symbol is performed by
create-leaves from Part 3a.
;; build-code-tree: (list-of s-w-p) -> code-tree
;; return a code-tree based on fpairs giving the frequency of
each symbol.
;; fpairs is sorted in increasing order of weight
;; (define (build-code-tree fpairs) ... )
You can test build-code-tree on freq-1
and freq-2 found in the template.
Your output from (build-code-tree freq-1) and
(build-code-tree freq-2) may not exactly match
the examples tree-1 and tree-2. But
the code generated from each symbol in your output trees
must have the same length as these examples.
The depth of the leaf for a given symbol must be the same in your
code-tree as the depth in the example.
Hint:
Writing a helper function for step 2 is the heart of the problem.
;; merge-trees: (list-of code-trees) -> (list-of code-trees)
;; Merge alocts as in step 2 of the algorithm in Part 1b
;; until alocts contains only one code-tree
;; (define (merge-trees alocts)...)
Will this require structural or generative recursion?