CSPP 55001 Algorithms — Autumn 2013

Homework 4 (assigned October 22, due October 29)

Reading: CLRS chapter 15, sections 15.1–15.4.

Written assignment: Solve the following "Do" exercises and assigned problems. Only solutions to the assigned problems should be turned in.
Note: You are responsible for the material covered in both "Do" exercises and assigned problems.
Note: If you work with others, indicate their names at the top of your homework paper. Everyone must submit their own independently written solutions.

"Do" Exercises (not to be handed in):

1. Exercise 15.3-4, page 390.

2. Problem 15-2, page 405.

3. Problem 15-4, pages 405–406.

4. Problem 15-8, pages 409–410. YouTube video

5. Problem 15-9, page 410.

Assigned Problems (to be handed in):

Important: When describing an algorithm in pseudocode, explain the meaning of your variables in English. Comment the lines of your pseudocode. Your pseudocode should be short, clear, and complete to receive full credit. If your code is long, difficult to read, or incomplete, you will not receive full credit.

1. You are given a string of n characters S[1.. n] from a text document in which all punctuation and spaces have been removed, so that the text looks something like "longagoinalandfaraway". You wish to reconstruct the document using a dictionary, which is available in the form of a Boolean function dict( ⋅ ) such that, for any string w, dict(w) = 1 (true) if w is a valid word, and dict(w) = 0 (false) otherwise.
Give an efficient dynamic programming algorithm that determines whether an input string S[1..n] can be reconstituted as a sequence of valid words. The running time should be at most O(n2), assuming calls to dict take unit time.
• Define the subproblem to be solved. For full credit, define all variables you introduce. (5 points)
• Write a recurrence expressing the optimal solution of a general subproblem in terms of optimal solutions of smaller subproblems. Solve the base case(s). (5 points)
• Describe your algorithm in pseudocode. (4 points)
• Analyze the running time of your algorithm, including the number of subproblems and the time spent per subproblem. (1 point)
• In the case that the string is valid, make your algorithm output the corresponding sequence of words. Does the running time of your algorithm increase as a result? (5 points)

2. We define the longest common substring (not subsequence) of two strings to be the longest contiguous string that is a substring of both strings.
Example. The longest common substring of the strings "ABABC" and "ABCBA" is string "ABC" of length 3.
Give an efficient dynamic programming algorithm for the following problem. Given two strings X = x1x2xn and Y = y1y2ym , find the length of their longest common substring, i.e., the largest k for which there are indices i and j with xixi + k − 1 = yjyj + k − 1. The strings X and Y are given as arrays of characters.
• Define the subproblem to be solved. Define all variables introduced. (5 points)
• Write a recurrence expressing the optimal solution of a general subproblem in terms of optimal solutions of smaller subproblems. Solve the base case(s). (5 points)
• Describe your algorithm in pseudocode. (4 points)
• Analyze the running time of your algorithm, including the number of subproblems and the time spent per subproblem. (1 point)

3. Alice and Bob decide to play a simple card game. At the beginning of the game, n cards are dealt face up in a long row. Each card is worth a different number of points. After all the cards are dealt, Alice and Bob take turns removing either the leftmost or rightmost card from the row, until all the cards are gone. At each turn, each player can decide which of the two cards to take. The winner of the game is the player who has collected the most points when the game ends. Each player plays optimally. Neither player is allowed to skip a turn.
• Suppose Alice plays first. Give an example of a sequence of cards such that it is not optimal for Alice to start by removing the available card (either leftmost or rightmost in the row) with the higher point value. That is, show there is a game that Alice will lose if she starts by taking the rightmost card if it has higher value, or the leftmost card if it has higher value. (2 points)
• Let s1, …, sn be the initial sequence of n face up cards, where each card si has a point value vi. Assume that n is even. Give a dynamic programming algorithm to compute an optimal strategy for the player who plays first.
• Define the subproblem to be solved. Define all variables introduced. (5 points)
• Write a recurrence expressing the optimal solution of a general subproblem in terms of optimal solutions of smaller subproblems. Solve the base case(s). (5 points)
• Describe your algorithm in pseudocode. (4 points)
• Explain why your algorithm is correct. (1 point)
• Analyze the running time of your algorithm, including the number of subproblems and the time spent per subproblem. (1 point)
• Extend your algorithm to keep track of the optimal move at any point of the game. (2 points)

4. You are running a high-performance computing system capable of processing several terabytes of data per day. For each of n days, you are presented with a quantity of data; on day i, you are presented with xi terabytes. For each terabyte you process, you receive a fixed revenue, but any unprocessed data becomes unavailable at the end of the day, i.e., it cannot be worked on in any future day. In addition, the amount of data your system can process decreases with each day that passes since the most recent reboot of the system. On the first day after a reboot, your system can process s1 terabytes, on the second day after a reboot, it can process s2 terabytes, and so on, up to sn; we assume s1 > s2 > … > sn. Of course, on day i you can only process up to xi terabytes regardless of how fast your system is. To get the system back to peak performance, you can choose to reboot it, but on any day you reboot the system, you cannot process any data at all.

Problem. Given the amounts of available data x1, …, xn for the next n days, and given the profile of your system as expressed by s1, …, sn, starting from a freshly rebooted system on day 1, choose the days on which you are going to reboot so as to maximize the total number of data you process.

Example. Suppose n = 4, and the values of xi and si are given by the following table.

 Day 1 Day 2 Day 3 Day 4 x 10 1 7 7 s 8 4 2 1

The best solution would be to reboot on day 2 only; in that case, 8 terabytes would be processed on day 1, then 0 on day 2, then 7 on day 3, then 4 on day 4, for a total of 19. (Note that if you did not reboot at all, 8 + 1 + 2 + 1 = 12 terabytes would be processed.)
• Given an example of an instance with the following properties:
– There is a surplus of data in that xi > si for every i.
– The optimal solution reboots the system at least twice.
In addition to the example, you should say what the optimal solution is. You do not need to prove that it is optimal. (3 points)

• Give an efficient dynamic programming algorithm that takes values for x1, …, xn and s1, …, sn and returns the total number of terbytes processed by an optimal solution. Maximum points will be given for the most efficient (correct) algorithms.
• Define the subproblem that you will use to solve this problem. (5 points)
• State the recurrence that expresses the value of an optimal solution of the subproblem in terms of smaller subproblems. Solve the base case(s). (5 points)
• Write pseudocode for your algorithm. (4 points)
• Explain how your algorithm works and why it is correct. (2 points)
• Analyze the running time of your algorithm, including number of subproblems and time spent per subproblem. (1 point)