$\DeclareMathOperator{\adj}{adj}$ $\def\TOWER{TOWER}$ $\def\fff{\mathbb F}$ $\def\rrr{\mathbb R}$ $\def\nnn{\mathbb N}$

CMSC 37000: Algorithms -- Winter 2014

Make sure to REFRESH each time you visit this page.

What's new?

Final exam posted (3-25). Typo in problem 5b corrected.

Third quiz posted (3-25).

Quiz-3 and cumulative quiz statistics posted.

Homework set #13 posted. Due Tuesday, 3-11. (First batch of problems posted 3-8 4:30pm. Second batch posted 3-8 5:40pm.) Updated 3-13 7pm.

Quiz-2 statistics posted (3-7 1:30pm)

Homework statistics posted (3-3 8:20pm).

Second quiz posted (3-3 8:20pm). Typos fixed.

First quiz posted (2-25 10:45am). Solve the problems on your own time before the next quiz.

Homework set #12 posted (2-25 1pm). Written homework due Tuesday, 3-04. Some of the "DO" exercises do Thu Feb 27.

Homework set #11 posted (2-24 9pm) (stated in class). Due Tuesday, 2-25. It is ok if you submit it after class but the same day. (Slide it under my door.)

Homework set #10: First batch of (problems 10.1-10.8) posted 2-16, 6:30pm. Second batch (problems 10.9-10.12) added 2-18, 3:45am. Due Thursday, February 20. (Originally the problem set was labeled "ninth." Relabeled as "tenth" 2-18 11:30am, after belatedly posting the actual problem set #9 that was due 2-18.)

Text: Kleinberg - Tardos: Algorithm Design. (Coop Seminary Bookstore)

Recommended reading: Cormen - Leiserson - Rivest - Stein: Introduction to Algorithms.

Grading, tests. There will be three short (10-minute) quizzes and a final exam. The composition of the course grade: quizzes: 12% (4% each) final exam 45% homework 35% class participation (including tutorials) 8%

The tests are closed-book (no notes allowed). Schedule of tests:

Quiz 1 Thu Feb 20.
The quiz will be selected from all material covered so far except for computability (the material discussed in class on Tue, Feb 18). Please pay special attention to the "DO" exercises. Accurate and elegant pseudocodes will be required in the test. When writing pseudocode, please explain the meaning of your variables.

Quiz 2 Thu Feb 27

Quiz 3 Tue Mar 11

Last class Thu Mar 13 (attendance mandatory)

Last tutorial Thu Mar 13

Final exam Tue Mar 18 8am - 10am

Quiz-123 statistics (25 students) (cumulative score of three quizzes, posted 3-13):
       120 points: Max
       91 points: Best
       69 points: Upper quartile (75%)
       37 points: Median
       28 points: Lower quartile (25%)

Quiz-3 statistics (25 students) (quiz date 3-11, stat posted 3-13):
       40 points: Max
       34 points: Best
       20 points: Upper quartile (75%)
       10 points: Median
       05 points: Lower quartile (25%)

Quiz-2 statistics (25 students) (quiz date 2-27, stat posted 3-7):
       40 points: Max
       40 points: Best
       30 points: Upper quartile (75%)
       21 points: Median
       12 points: Lower quartile (25%)

HW statistics after 11th homework set (25 students) (posted 3-3):
       194 points: Max
       186 points: Best
       179 points: Upper quartile (75%)
       168 points: Median
       157 points: Lower quartile (25%)

Quiz-1 statistics (25 students) (quiz date 2-20, stat posted 2-27):
       40 points: Max
       28 points: Best
       19 points: Upper quartile (75%)
       13 points: Median
       05 points: Lower quartile (25%)

Thirteenth homework set. Due Tuesday, March 11 except for problem 13.8 which is due Thursday, March 13. Problem 13-3 updated 3-13 7pm.

13.0 DO: Review the proof that the language QBF (quantified Boolean formulas) is in PSPACE.

13.1 DO: Karp-reduce SAT to 3-SAT. (SAT is the set of satisfiable Boolean circuits; 3SAT is the set of satisfiable 3CNF formulas.)

13.2 CHALLENGE. Prove that QBF is PSPACE-complete (under Karp reductions).

13.3 HW (MAX-3-SAT approximation): Recall the MAX-3-SAT problem:
Input: a list of $m$ disjunctive 3-clauses over a given set of $n$ Boolean variables. (A disjunctive clause is an OR of literals. A disjunctive 3-clause is an OR of 3 literals corresponding to three distinct variables, e.g., $x_2\vee {\overline x_5} \vee x_8$. The disjunctions $x_2\vee {\overline x_2} \vee x_8$ and $x_2\vee {\overline x_5} \vee x_2$ are not 3-clauses.)
Question: Find the maximum number $k$ such that $k$ of the clauses are simultaneously satisfiable.

(a) Prove: $k\ge 7m/8$. (Hint: Use the probabilistic method.) (8 points)

(b) DO: Turn your proof into a probabilistic algorithm that finds an assignment that satisfies $7m/8$ of the clauses. The algorithm should work in polynomial expected time on all inputs. Warning: if a positive random variable has large expected value, this does not mean that its value is typically large. Hint (added 3-13 7pm): prove that the probability that a random assignment satisfies at least $7m/8$ clauses is at least $1/(m+1)$.

(c) Derandomize the algorithm in (b) (turn it into a deterministic algorithm). Here are hints for two distinct solutions. Updated 3-13 7pm.

(c1) CHALLENGE: Hint 1: use small sample-spaces for 3-wise independent random variables.

(c2) DO: Hint 2: Successively evaluate the variable to zero or one so that after each evaluation, the expected number of satisfied clauses remains at least $7m/8$.

13.4 HW (wrong NP-definitions): Recall the definition of NP (adding a needed existential quantifier on the alphabet of $L_1$):
(Correct definition) The language $L\subseteq\Sigma^*$ belongs to NP if
$(\exists$ finite alphabet $\Sigma_1) (\exists L_1\in P, L_1\subseteq\Sigma_1^*)(\exists c\in \nnn)(\forall x\in\Sigma^*) (x\in L \Leftrightarrow (\exists w\in\Sigma_1^*) (|w|\le |x|^c$ AND $(x,w)\in L_1))$.

Kyle, Larry, and Mike gave incorrect definitions of the class NP. Below we denote Kyle's class "KNP", Larry's class "LNP", and Mike's class "MNP". Here are their definitions.
(Kyle) The language $L\subseteq\Sigma^*$ belongs to KNP if
$(\exists$ finite alphabet $\Sigma_1) (\exists L_1\in P, L_1\subseteq\Sigma_1^*)(\exists c\in \nnn)(\forall x\in\Sigma^*) (x\in L \Rightarrow (\exists w\in\Sigma_1^*) (|w|\le |x|^c$ AND $(x,w)\in L_1))$.
(Larry) The language $L\subseteq\Sigma^*$ belongs to LNP if
$(\exists$ finite alphabet $\Sigma_1) (\exists L_1\in P, L_1\subseteq\Sigma_1^*)(\exists c\in \nnn)(\forall x\in\Sigma^*)(\exists w\in\Sigma_1^*) (x\in L \Leftrightarrow (|w|\le |x|^c$ AND $(x,w)\in L_1))$.
(Mike) The language $L\subseteq\Sigma^*$ belongs to MNP if
$(\exists$ finite alphabet $\Sigma_1) (\exists L_1\in P, L_1\subseteq\Sigma_1^*)(\exists c\in \nnn)(\forall x\in L) (\exists w\in\Sigma_1^*) (|w|\le |x|^c$ AND $(x,w)\in L_1)$.

(a) HW: Determine the class KNP. Give a very simple answer. (6 points)

(b,c) DO: Determine the classes LNP and MNP. Give very simple answers.

13.5 CHALLENGE (Savitch's Theorem): Prove: if $s(n)\ge \log n$ is a space-constructible function then NSPACE($s(n)$)$\subseteq$ SPACE($s(n)^2$). Corollary: NPSPACE = PSPACE. (NSPACE refers to nondeterministic space: the finite control is nondeterministic).

13.6 CHALLENGE (Immerman -- Szelepcsényi Theorem): Prove: if $s(n)\ge \log n$ is a space-constructible function then NSPACE($s(n)$) = coNSPACE($s(n)$).

13.7 DO: State the decision version of the "maximum independent set" problem in an undirected graph. Prove that this problem is NP-complete, via reduction from a probem shown NP-complete in class. (This is a very easy problem, the reduction is described by a single word.)

13.8 HW (due Thu, 3-13): The 3-CLIQUE problem asks whether or not a given undirected graph has a triangle. Decide this question in $O(n^{2.81})$ where $n$ is the number of vertices. (Hint: $2.81 > \log_2 7$.) (10 points)

13.9 DO: The MAX-ACYCLIC-SUBGRAPH problem takes a directed graph $G = (V,E)$ as input and asks to find a largest acyclic subset $F \subseteq E$ (i.e., $(V,F)$ must be a DAG (directed acyclic graph)). This problem is NP-hard. Find a factor-2 approximation, i.e., an acyclic subset $F' \subseteq E$ that is at least half as large as the largest acyclic subset. Find $F'$ in polynomial time. (The solution is very simple.)

13.10 DEFINITION: A linear program is a system of linear inequalities written as $Ax \le b$ where $A$ is a matrix, $x$ is an unknown column vector, and $b$ is a given column vector. An inequality between vectors is meant coordinatewise: $(a_1,\dots,a_n)\le (b_1,\dots,b_n)$ means $(\forall j)(a_j \le b_j)$. A linear program is feasible if it has a solution (in real numbers). Let LP denote the set of feasible linear programs with integer coefficients. Let BLP ("bounded LP") denote those linear programs in LP which include the inequalities $0\le x_i\le 1$ for every variable $x_i$. (These can be written as $-x_i\le 0$ and $x_i\le 1$ so we conform with the format $Ax\le b$.)

13.11 DO: Prove: if a bounded linear program in $n$ unknowns is feasible (i.e., it belongs to BLP) then there is a solution that satisifies $n$ linearly independent constraints with equality.

13.12 DO: Prove: BLP $\in$ NP. Do not use the known fact that LP $\in$ P (Khachiyan, 1979). What is the witness? Can you show that your witness is short? What does "short" mean? Use 13.11.

13.13 CHALLENGE. Prove: LP $\in$ NP. (Again, do not use the fact that LP $\in$ P.)

13.14 DEFINITION: Let ILP denote the set of those linear programs with integer coefficients which have a solution in integers ("integer linear programs").

13.15 CHALLENGE. ILP $\in$ NP. (Difficult!) DO: Why is this problem difficult? What is a witness? What do we need to show about the witness?

13.16 HW: (a) Let (0,1)-LP denote the set those linear programs with integer coefficients which have a solution in zeros and ones (every variable takes a value in {0,1}). Prove: (0,1)-LP $\in$ NP. (b) Assuming that 3SAT $\in$ NPC, prove that (0,1)-LP $\in$ NPC. (Give a Karp reduction. NPC denotes the class of NP-complete languages.) (3+10 points)

Twelfth homework set. Written homework due Tuesday, March 4; some of the DO exercises due Thursday, Feb 27 (indicated).

12.1 HW:Recall the FIND-3-COL problem: Input: a graph $G$; output: "NO" if $G$ is not 3-colorable, and a 3-coloring if $G$ is 3-colorable. Show that this problem is Cook-reducible to the decision problem 3-COL (given a graph, is it 3-colorable?). Recall that a Cook-reduction would use 3-COL as an oracle (black box, subroutine) and work in polynomial time (which may include a polynomial number of calls to the oracle). Give your solution in pseudocode. State the number of calls made to the oracle. (8 points)

12.2 HW: Jimmy incorrectly copied the definition of NP: he omitted the condition that the length of the witness is polynomially bounded in the length of the input. Let us call the complexity class defined by Jimmy's definition JNP. Determine JNP. (It is a class of languages that has been discussed in class.) Added 3-3 7:30pm Naturally, your solution should begin with the statement of Jimmy's definition. (6 points)

12.3 DO (due Thu, Feb 27): Count the Boolean functions in $n$ Boolean variables.

12.4 DO (due Thu Feb 27): A Boolean circuit is a DAG of which the sinks (nodes of out-degree zero) are labeled by literals and all other nodes are labeled "AND" or "OR." Given an input assignment to the variables, a value at each node is computed in the natural way peforming the Boolean operations on each node once its out-neighbors have been evaluated. The depth of the circuit is the length of the longest path from the output node to an input node. The size of the circuit is the number of wires (edges). Prove: CNF formulas can be represented as Boolean circuits of depth 2.

12.5 DO: Prove: PARITY$_n$ can be computed by a Boolean circuit of depth 3 and size $O(n 2^{\sqrt{n}})$. What can be said about depth 4?

12.6 DO (due Thu Feb 27): Prove: PARITY$_n$ can be computed by a Boolean circuit of depth $O(\log n)$ and polynomial size.

12.7 DO: Prove: Addition of $n$-bit integers can be done by Boolean circuits of depth 3 and polynomial size.

12.8 DO: A theorem of Ajtai and Furst - Saxe - Sipser (1982) says that PARITY$_n$ cannot be computed by circuits of bounded depth and polynomial size. Use this theorem to prove that multiplication of two $n$-digit integers cannot be done in bounded depth and polynomial size.

12.9 CHALLENGE (Claude Shannon): Prove: almost all Boolean functions in $n$ Boolean variables require Boolean circuits of size $\Omega(2^n/n)$.

12.10 DO (due Thu Feb 27): Complete the proof of the Karp reduction of 3-SAT to CLIQUE.

12.11 CHALLENGE (V. Pratt, 1970s): Prove that the factoring language "FACT" discussed in class belongs to NP $\cap$ coNP, without using the AKS theorem ("Primes $\in$ P"). Recall that what you need is to produce a witness of primality. If your prime has $n$ digits, estimate the lengths of your witness. Hint: Use the theorem that for every prime there is a primitive root (i.e., the multiplicative group of reduced mod p residue classes is cyclic).

Eleventh homework set. Due Tuesday, Feb 25.

11.1 DO: "Boolean variables" take the value 0 or 1. Recall that a Boolean function in $n$ (Boolean) variables is a function $f : \{0,1\}^n \to \{0,1\}$. A CNF ("conjunctive normal form") formula is a formula of the form $C_1\wedge\dots\wedge C_m$ where each $C_i$ is a (disjunctive) clause. A (disjunctive) clause has the form form $C = z_{1}\vee\dots\vee z_{t}$ where each $z_i$ is a literal. A literal is a variable or a negated variable. So for instance, $(x_1\vee \overline{x_4}\vee x_5)\wedge (\overline{x_1}\vee \overline{x_2}\vee x_3\vee x_4)\wedge \overline{x_4}$ is a CNF formula. Prove that every Boolean function can be represented by a CNF formula.

11.2 HW: The PARITY$_n$ function of $n$ Boolean variables is defined as PARITY$_n(x_1,\dots,x_n) = \sum_{i=1}^n x_i \bmod 2$. So for instance PARITY$_3(1,0,1) = 0$ and PARITY$_4(1,0,1,1) = 1$. PARITY$_n$ is a Boolean function, so it can be expressed by a CNF formula. Prove: if PARITY$_n$ is expressed as a CNF formula with $m$ clauses then $m \ge 2^{n-1}$. (8 points)

11.3 DO: Prove: P $\subseteq$ NP.

11.4 DO: Prove: if P = NP then NP = coNP.

11.5 DO: For a language $L\subseteq \Sigma^*$ over the finite alphabet $\Sigma$, let $f_L : \Sigma^*\to\{0,1\}$ denote the characteristic function of $L$ defined on the string $x\in\Sigma^*$ by $f_L(x)=1$ if $x\in L$ and $f_L(x)=0$ otherwise. Let $\cal R$ denote the class of recursive languages, i.e., the class of languages $L$ for which $f_L$ is a computable function (a.k.a. a recursive function, hance the letter $\cal R$), i.e., for which membership is decidable by an algorithm (C program, or equivalently, a Turing machine) that halts on every input. Prove: NP $\subseteq$ $\cal R$.

11.6 DO: Let $\cal{RE}$ denote the class of computably enumerable (a.k.a. recursively enumerable) languages. By definition, $L\in {\cal{RE}}$ if there exists a C program (or a Turing machine) that halts on input $x\in\Sigma^*$ if and only if $x\in L$. (a) Prove: ${\cal R} \neq {\cal RE}$. (b) Prove: ${\cal R} = {\cal RE} \cap co{\cal {RE}}$.

Tenth homework set (previously erroneously labeled "ninth"). Due Thursday, Feb 20.

10.1 DO: Study the handout on Euclid's algorithm and Repeated Squaring.

10.2 HW: Let $\ell_b(x)$ denote the number of digits to base $b$ of the positive integer $x$. For $r,s \ge 2$ determine $$\lim_{x\to\infty} \frac{\ell_r(x)}{\ell_s(x)}.$$ Prove your answer. The proof should include an explicit expression of $\ell_b(x)$ in terms of $b$ and $x$. (4 points)

10.3 DO: Given an $n$-digit (in binary) positive integer $x$, determine $\lfloor\sqrt{x}\rfloor$ in polynomial time. Write your algorithm in pseudocode. Minimize the number of comparisons made by the algorithm. Show that the number of comparisons is $\sim cn$ for some constant $c$; determine $c$. Show that the overall complexity of the algorithm (number of bit operations) is $O(n^C)$; find the smallest value of $C$ assuming multiplication of two $n$-digit integers is performed by the schoolbook algorithm and thus takes $\Theta(n^2)$.

10.4 DO (RSA): Let $p < q$ be $n$-digit prime numbers. Let $N=pq$ and $M=(p-1)(q-1)$. Given $N$ and $M$, compute $p$ and $q$ in polynomial time. Evaluate the exponent in your time bound assuming schoolbook arithmetic (multiplication and division).

10.5 DO: review basic number theory, especially greatest common divisor, multiplicative inverses, Fermat's little Theorem, and the Chinese Remainder Theorem.

10.6 HW (RSA): Alice's public key is $(N,e)$ where $N=pq$ is the product of two distinct $n$-digit primes she keeps private. Her private key $f$ is an $O(n)$-digit integer that satisfies the congruence $ef\equiv 1 \pmod M$ where $M=(p-1)(q-1)$. Alice notices that her private key $f$ has been compromised (but not the primes $p,q$). So she decides to keep $N$ but replace $e$ and $f$ by some other pair of values, $e'$ and $f'$. (Of course $e'f'\equiv 1 \pmod M$). She replaces the pair $(N,e)$ by $(N,e')$ as her public key. Prove that this is not a good idea. Show that if Eve knows $N, e, e'$, and $f$ then Eve can decode in polynomial time any message Alice receives that was encrypted using the public key $(N,e')$. (Note that we do not claim that Eve can determine $f'$.) (Warning: $f$ does not need to fall in the range $1\le f\le M-1$. All we know about $f$ is that it does not have too many digits and it is a multiplicative inverse of $e$ modulo $M$.) (9 points)

10.7 HW: (a) Evaluate the order of magnitude of the number of binary digits of $F_k$, the $k$-th Fibonacci number, in terms of the number $n=\ell_2(k)$ of binary digits of $k$. Your answer should be of the form $\ell_2(F_k)= \Theta(n^b c^n)$ where $b,c$ are constants. Determine $b,c$.
(b) Prove that $F_k$ cannot be computed in polynomial time (given $k$ in binary as input). (5+2 points)

10.8 DO: Given the positive integers $k$ and $m$ in binary, compute $F_k \bmod m$ in polynomial time. Evaluate the exponent in "polynomial time" as a function of the bit-lengths of $k$ and $m$, assuming schoolbook multiplication. Hint: consider the powers of the matrix $$ A = \left(\matrix{1 & 1 \\ 1 & 0}\right) .$$

10.9 DO: Let $G=(V,E,w,s)$ be a weighted directed graph with a source vertex $s$. Assume $k$ of the edges have negative weight but there are no negative cycles. Find the cost of the minimum-cost paths from $s$ to each vertex in time $O((k+1)!T)$ where $T$ is the maximum cost of executing Dijkstra on the nonnegative part of $G$ with arbitrary starting points, i.e., the cost of Dijkstra on $(V,E^*,w^*,s')$, where $E^*$ is the set of edges with non-negative weight and $w^*$ is the restriction of $w$ to $E^*$; we maximize over all choices of $s'\in V$.

10.10 HW (min-cost paths in a DAG): (a) Let $G=(V,E,w,s)$ be a weighted, rooted DAG with root $s\in V$, where $w : E \to \rrr$ is the weight function. (Negative weights are permitted.) For each vertex $t\in V$, find the minimum cost of reaching $t$ from $s$. Describe your algorithm in elegant pseudocode; use the white/grey/black convention. Assume $V=\{1,\dots,n\}$ and $G$ is topologically sorted (if $i\to j$ is an edge then $i< j$). (This assumption is justified by Exercise 8.3.) Your algorithm should run in linear time. Prove that your algorithm is correct. Clearly state your loop invariant.
(b) Use part (a) to solve the critical path problem: given a DAG with non-negative edge weights, find the maximum cost of reaching $t$ from $s$. (8+1 points)

10.11 DO (Bellman-Ford algorithm): Let $G=(V,E,w,s)$ be a digraph with weighted edges and root $s\in V$. Negative weights are permitted but no negative cycles. For each vertex $t\in V$, find the minimum cost of reaching $t$ from $s$. Describe your algorithm in pseudocode. The algorithm should run in $O(mn)$ where $m$ is the number of edges and $n$ the number of vertices. (Hint: dynamic programming, using recursion on the additional parameter $\ell$: find the minimum cost of reaching $t$ from $s$ in at most $\ell$ steps [using at most $\ell$ edges].) This was one of the first dynamic programming algorithms, from the 1950s.

10.12 DO: We are given an $r\times n$ array $L$ of zeros and ones. Let $S$ denote the set of rows of $L$. We want to set up a data structure that will answer membership queries in $S$ at the cost of $O(n^2)$ bit-operations, regardless of the value of $r$. The input to a membership query is an $n$-bit string $x$; the question is, does there exist $i$ such that $x$ is the $i$-th row of $L$. We are allowed to spend $O(rn)$ time and we can use $O(rn)$ space to build the data structure. (So, for instance, we cannot use an $r\times r$ array.) The algorithm must be deterministic. Describe in English how to build the data structure and how to answer the membership queries. Indicate why each task is solved within the required time bound.

Ninth homework set. Due Tuesday, Feb 18. (Announced in class 2-13. Posted 2-18, after class. If you forgot to submit this, please do later today (Tuesday, 2-18))

9.1 HW (density of Fermat witnesses):An integer $x$ is composite if $x\ge 2$ and $x$ is not a prime. Let $x\ge 2$ be an integer. A Fermat witness (of the compositeness of $x$) is an integer $w$ such that $1\le w\le x-1$, $\gcd(x,w)=1$, and $w^{x-1}\not\equiv 1 \pmod x$. By Fermat's little theorem, if a Fermat witness exists then $x$ is composite. Composite numbers that do not have a Fermat witness are called Carmichael numbers. They exist, but they are rare. Assume $x$ is composite and not a Carmichael number (i.e., a Fermat witness of the compositeness of $x$ exists). Prove: for a random integer $y$ from the interval $1\le y\le x-1$, the probability that either $y$ is not relatively prime to $x$ or $y$ is a Fermat witness of the compositeness of $x$ is at least 1/2. (8 points)

9.2 DO: Prove that an integer of the form $pq$ ($p,q$ primes) is never a Carmichael number.

9.3 DO: The smallest Carmichael number has three digits and is of the form $3pq$. Find this Carmichael number. (You don't need to prove that it is the smallest.)

Eighth homework set. Due Tuesday, Feb 11. (Announced in class 2-6. Posted 2-10, 10pm)

8.1 DO: Study Depth-First Search (DFS)

8.2 HW (strong connectedness): Recall that a digraph $G=(V,E)$ is strongly connected if $(\forall x,y\in V)(\exists x\to\dots\to y$ directed path$)$. Given a digraph $G$ (by an array of adjacency lists), decide in linear time whether or not $G$ is strongly connected. Do NOT use DFS. The essence of your answer should be two lines. (6 points)

8.3 DO: Given a digraph, find a directed cycle if there is one; otherwise, topologically sort the vertices. Do this in linear time. Describe your solution in pseudocode. Do NOT use DFS.

8.4 DO: Suppose $(\forall n\ge 1)(T(n)\ge 0)$ and $T(n)\le T(\lfloor n/2\rfloor) + O(n)$. (a) Prove: $T(n)=O(n\log n)$. Do not assume that $n$ is a power of 2. (b) Prove the same conclusion under the weaker assumption that $T(n)\le T(\lceil n/2\rceil) + O(n)$.

8.5 DO (nearest pair): In class, we studied Shamos's divide-and-conquer algorithm to find a nearest pair among a given set of points in the plane. Prove that if $p_1,\dots,p_t$ are the points in the middle strip of width $2d_0$, listed in increasing order of their $y$-coordinates, then looking ahead by 7 steps suffices, i.e., if $|i-j|\ge 8$ then the distance between $p_i$ and $p_j$ is at least $d_0$.

8.6 CHALLENGE: Can the nearest-pair problem be solved in nearly linear time in 3D?

8.7 DO (universal family of hash functions): Prove: given $z\in\fff_p^n$, $z\neq 0$, the probability that for a random $v\in\fff_p^n$ we have $z\cdot v=0$ is $1/p$. Here $\fff_p$ denotes the (residue classes of) integers modulo the prime number $p$ (the "field of order $p$") so we can take $\fff_p=\{0,1,\dots,p-1\}$ under mod$~p$ arithmetic. The dot product of $z=(z_1,\dots,z_n)$ and $v=(v_1,\dots,v_n)$ is $z\cdot v = \sum_{i=1}^n z_i v_i$ (again, in mod$~p$ arithmetic).

HW statistics after 7th homework set (26 students) (posted 2-10):
       143 points: Max
       140 points: Best
       131 points: Upper quartile (75%)
       127 points: Median
       118 points: Lower quartile (25%)

Seventh homework set. Due Thursday, Feb 6 (before class) (Posted 2-4 at 11am)

7.1 DO (radix sort): Sort $n$ words of length $k$ alphabetically in time $O(nk)$. The alphabet is the integers $1,\dots, m$ where $m \le nk$. Write your algorithm in elegant pseudocode.

7.2 DO (radix sort with variable length words): Let $w_1,\dots,w_n$ be a list of $n$ words over the alphabet $\{1,\dots,m\}$. Let $|w|$ denote the length of the words $w$ (number of letters in $w$). Sort this list in linear time, i.e., in time $O(N)$ where $N=\sum_{i=1}^n |w_i|$ is the length of the input. Assume $m\le N$.

7.3 DO: Prove: for any real numbers $x,y$ we have $$|(\lceil x \rceil + \lfloor y\rfloor)- (\lfloor x\rfloor + \lceil y \rceil)| \le 1 .$$ (This was a lemma to the analysis of Batcher's sorting network.)

7.4 HW (Batcher sort): Draw a diagram of Batcher's sorting network that sorts $n=8$ data. Your drawing should be elegant, well organized. Clearly mark the various SORT subnetworks built recursively (put a box around them). (6 points) Note: in class I only asked to draw a MERGE network. Please draw the entire SORT network.

Sixth homework set. Due Tuesday, Feb 4 (before class) (Posted 1-31 at 6:45pm, Problem 6.2 revised 2-1 at 3:30pm)

6.1 DO: Write a short and elegant pseudocode for the MERGE algorithm discussed in class.

6.2 HW ($k$-way merge): (Revised) Merge $k$ sorted lists of data (real numbers) in $O(n\log k)$ time using a heap. Here $n$ is the total number of items on the $k$ lists combined. Describe your algorithm in pseudocode. You do not need to describe how the heap executes the various heap operations (EXTRACT-MIN, etc.). (6 points)

6.3 HW (UNION-FIND) This problem is part of the analysis of the hierarchical implementation of the UNION-FIND data structure discussed in class. Prove that under the "deepest wins" rule, the rooted trees constructed in each country will have depth $\le \log_2 n$. (8 points)

6.4 HW (slow growth) Recall from class that $\log^*n = \min\{k \mid \TOWER_2(k)\ge n\}$ where $\TOWER_a(0) = 1$ and $\TOWER_a(k+1) = a^{\TOWER_a(k)}$. Prove that for all sufficiently large $n$ we have $\log^*n \le \log\log n$. State how large is "sufficiently large." Hint. First observe that this problem is equivalent to saying that for all sufficiently large $k$ we have $\TOWER_2(k) \ge 2^{2^k}$. First prove that for $k\ge 2$ we have $\TOWER_2(k) \ge k+2$ and then bootstrap from here. ("Bootstrapping" means using a relatively weak result to prove a much stronger result.) (6 points)

6.5 HW (amortized analysis): Read the notes on amortized analysis. Solve Problem 2 ("Queue via stacks") stated in the notes. (6 points)

6.6 HW (edit-distance): Solve the problem stated here. (6 points)

6.7 DO (all-ones square): Solve the problem stated here.

Fifth homework set. Due Thursday, Jan 30 (before class)

5.1 HW (necessity of non-negative weights in Dijkstra's algorithm) (posted 1-29 6:20pm, stated in class 1-23) Construct an edge-weighted DAG (directed acyclic graph) with exactly one negatively weigthed edge, on which Dijkstra's algorithm fails. Use as few edges as possible. (You don't need to prove the minimality of the number of edges used.) DRAW your DAG and describe how the configuration changes step-by-step. (6 points)

5.2 CHALLENGE. Find a bipartite graph such that under a random ordering of the vertices, the greedy coloring algorithm is likely to use a large number of colors.

5.3 DO: Let $G$ be a graph with $n$ vertices; let $\alpha(G)$ denote the idependence number of $G$ (maximum number of independent vertices). (a) Prove: $\alpha(G) \ge n/(d_{\max}+1)$ where $d_{\max}$ is the maximum degree. (b) Find an independent set of this size in linear time.

5.4 CHALLENGE: (a) Prove: $\alpha(G) \ge \sum_{i=1}^n 1/(d_i+1)$ where $d_i$ is the degree of the $i$-th vertex. (b) Find an independent set of this size (with large probability) by an efficient randomized algorithm.

Erratum (1-28-2014). As pointed out by Nathan, in class I misstated the objective function optimized by the Viterbi algorithm. The correct objective function is the conditional probability $P(X \mid Y)$ where $Y$ is the given sequence of $t$ observations and $X$ is any sequence of $t$ (hidden) states. For any fixed $Y$ the task is to maximize $P(X\mid Y)$ over all the $S^t$ sequences $X$ of $t$ states. ($S$ is the number of states.)

Fourth homework set. Due Tuesday, Jan 28 (before class)

Handout Dynamic programming: the knapsack problem posted 1-21 12:30pm. DO: Study the handout.

4.1 DO: The algorithm given in the handout describes how to find the maximum value $m$ under the given weight constraint. Modify the algorithm to also find the optimal set $S$ of items.

4.2 HW: Consider the following variant of the Knapsack problem. Assume the values are positive integers (but the weights are positive reals). Given a positive integer $V$, find the minimum weight of a set of items of total value at least $V$. Find this minimum weight in $O(nV)$ steps where $n$ is the number of items. Describe your solution in simple and elegant pseudocode. (Posted 1-21 12:40pm) (8 points)

4.3 HW: The Interval Sum problem. (Posted 1-21 12:40pm) (8 points)

4.4: The Car Race problem. DO: part (a). HW: parts (b) and (c). (Posted 1-21 1pm) (5+5 points) (A correct solution to (c) earns you the full 10 points (b+c).)

Third homework set. Due Tuesday, Jan 21 (before class)

3.1 HW The Greedy Matching algorithm. Read the sheet, solve the Problem stated at the end. (Posted 1-16 4:40pm) (2+5+5 points)

3.2 HW The Greedy Coloring algorithm. Read the sheet, solve the Problem stated at the end. (Posted 1-16 5:30pm) (2+5+4+5 points)

3.3 HW Sorting a heap. (Posted 1-20 5pm. Assigned in class with detailed explanation on 1-16.) (8 points)

3.4 (Basic manipulation of the representation of a digraph) Our input is a directed graph (digraph) given in adjacency-list representation (i.e., by an array of adjacency lists) as explained in class. The set of vertices is an array $A[1\dots n]$. Each vertex $i$ in the array is the starting point of the linked list $\adj[i]$ of the out-neighbors of $i$. We always put a link from the end of of the list to its start and vice versa. Each register is capable of strong the name of a vertex (a $\log_2 n)$-bit integer). One step of the algorithm can read a register, follow a link, copy the contents of a register. The number of vertices is $n$, the number of edges is $m$, so the total length (number of registers and links) of the description of the digraph is $O(n+m)$. An algorithm of which the digraph is the input runs in linear time if it takes $O(n+m)$ steps.

3.4(a) DO: Convert this representation to edge-list representation in linear time. ("Edge-list representation" is the list of vertices followed by the list of edges.)

3.4(b) DO: Convert the edge-list representation to adjacency-list representation in linear time.

3.4(c) HW. The "reverse" of a digraph $G$ is the digraph $G^{rev}$ which has the same vertices and every edge $(x,y)$ is replaced by its reverse, $(y,x)$. It is trivial to construct $G^{rev}$ from $G$ is the edge-list representation (in linear time). Build an adjacency-list representation of $G^{rev}$ directly from an adjacency-list representation of $G$, making a single pass through the adjacency-list representation of $G$. Write your solution is simple and elegant pseudocode. Your cycle organization should be a nested pair of for loops:
for $i=1$ to $n$
for $j\in \adj[i]$ (3 points)
Note: in class, this problem was listed as a "DO" exercise. It has been upgraded to "HW" (to hand in) because the details of your solution are essential to the solution of Problem 3.4(d).

3.4(d) HW. We say that an adjacency-list representation is monotone if for each vertex $i$ the list of outneighbors of $i$ is given in increasing order. Given an adjacency-list representation of $G$, create the monotone adjacency-list representation in linear time. The description of your algorithm should be no more than three lines with reference to 3.4(c). (3 points)

(Posted 1-20, 6pm. Assigned in class. This completes the problem set due Tuesday, Jan 21.)

Handout Binary Search to find item in sorted array posted Jan 15. DO: Study it, compare it with your solution.

Handout on The Method of Reverse Inequalities posted Jan 15. DO: Study it, answer the Questions, solve the Exercises stated in the handout (by Jan 16).

Second homework set posted Jan 9, due Jan 16. Clarification to problem 2.5 added on Jan 15 at 8:30pm.

First homework set posted Jan 9, due Jan 14