Interval Scheduling
The Greedy approach 贪心算法
-
The goal is to come up with a global solution. 提出一个全局的解法
-
The solution will be built up in small consecutive steps. 会构建连续小步
-
For each step, the solution will be the best possible myopically(目光短浅地), according to some criterion (通过一些准则).
-
Definition:
-
We start by selecting an interval [s(i), f(i)] for some request i.
-
We include this interval in the schedule.
-
This necessarily means that we can not include any other interval that is not compatible with [s(i), f(i)].
-
We will continue with some compatible interval [s(j), f(j)] and repeat the same process.
-
We terminate when there are no more compatible intervals to consider.
-
The Greedy Approach
-
Option 1: Choose the available interval that starts earliest.
-
Option 2: Choose the smallest available interval.
-
Option 3: Something more clever.
Interval Scheduling
-
A set of requests {1, 2, … , n}.
-
Each request has a starting time s(i) and a finishing time f(i).
-
Alternative view: Every request is an interval [s(i), f(i)].
-
Two requests i and j are compatible if their respective intervals do not overlap.
-
Goal: Output a schedule which maximises the number of compatible intervals.
Greedy Algorithm for interval scheduling
Arguing for optimality
-
Some notation:
-
O is the optimal schedule. Recall, that A is the schedule of the Greedy algorithm.
-
Let i1, i2, … , ik be the order in which the intervals were added to A by the algorithm.
-
Note that |A| = k.
-
Let j1, j2, … , jm be the set of requests in O.
-
Note that |O| = m.
-
We will prove that m=k. (Why is that enough?)
-
Let j1, j2, … , jm be the set of requests in O.
-
Assume wlog that this is in order of increasing s(jh).
-
Since O is feasible, this is also in order of increasing f(jh).
-
Claim: f(i1) ≤ f(j1)
- This holds because i1 is chosen to be the interval with the smallest f(ih).
-
Claim: f(i1) ≤ f(j1)
- As i1 is chosen to be the interval with the smallest f(ih).
-
Lemma: For all indices r ≤ k, it holds that f(ir) ≤ f(jr)
Induction step proof

Completing the proof
-
By contradiction: To the contrary, assume that m > k
-
For r=k, the Lemma gives us that f(ik) ≤ f(jk).
-
Since m > k , there is an extra request jk+1 in O.
-
s(jk+1) > f(jk) ≥ f(ik).
-
The greedy algorithm would have continued with jk+1.
Running Time
The running time is O(n log n).
Minimum Spanning Tree 最小生成树
-
Consider a connected graph G=(V, E), such that for every edge e=(v,w) of E, there is an associated positive cost ce.
-
Goal: Find a subset T of E so that the graph G’=(V, T) is connected and the total cost sum ce is minimised.
Claim: T is a tree
-
By definition, (V, T) is connected.
-
Suppose that it contained a cycle.
-
Let e be an edge on that cycle.
-
Take (V, T-{e}).
-
This is still connected.
- All paths that used e can be rerouted through the other direction.
-
(V, T-{e}) is a valid solution, and it is cheaper. Contradiction!
Greedy Approach 1 (Kruskal’s Algorithm)
Greedy Approach 2 (Prim’s Algorithm)
-
Start with an empty set of edges T.
-
Start with a node s.
-
Add an edge e=(s,w) to T.
-
We continue like this – growing a set of connected vertices.
- We only consider edges to neighbours that are not in the spanning tree.
The cut property




Kruskal’s algorithm is optimal
-
Consider any edge e=(u, w) that Kruskal’s algorithm adds to the output on some step.
-
Let S be the set of nodes reachable from u just before e is added to the output.
-
It holds that u is in S and w is in V-S.(Why?)
- Because otherwise adding e would create a cycle.
-
The algorithm has not found any edge crossing S and V-S. (Why?)
- Such an edge would have been added to the output by the algorithm.
-
The edge e must be the cheapest edge crossing S and V-S.
-
By the cut property, it belongs to every minimum spanning tree.
Prim’s algorithm is optimal
-
In each iteration of the algorithm, there is a set S of nodes which are the nodes of a partial spanning tree.
-
An edge is added to “expand” the partial spanning tree, which has the minimum cost.
-
This edge has one endpoint in S and one in V-S and has minimum cost.
-
So it must be part of every minimum spanning tree.
Greedy Approach 3 (Reverse-Delete Algorithm)
The cycle property
-
Assume that all edge costs are distinct.
-
Let C be any cycle of G.
-
Let e=(w,v) be the maximum cost edge of C.
-
Then e is not contained in any minimum spanning tree of G.
-
Let T be a spanning tree that contains e.
-
We will show that it does not have minimum cost.
-
We will substitute e with another edge e’, resulting in a cheaper spanning tree.
-
How to find this edge e’?
-
We delete e from T.
-
This partitions the nodes into
-
S (containing u).
-
V - S (containing w).
-
We follow the other path the cycle from u to w.
-
At some point we cross from S to V - S, following edge e’.
-
The resulting graph is a tree with smaller cost.
Reverse-Delete is optimal
-
Consider any edge e=(v, w) which is removed by Reverse- Delete.
-
Just before deleting, it lies on some cycle C.
-
It has the maximum cost among edges, so it cannot be part of any minimum spanning tree.
Non-distinct costs
-
Take the original instance with non-distinct costs.
-
Make the costs distinct by adding small numbers ε to the costs to break ties.
-
Obtain a perturbed instance.
-
Run the algorithm on the perturbed instance.
-
Output the minimum spanning tree T.
-
T is a minimum spanning tree on the original instance.
T in the original instance
-
Suppose that there was a cheaper spanning tree T* on the original instance.
-
If T* contains different edges to T but with the same costs, it is not cheaper than T on the original instance.
-
If T contains different edges with different costs, we can make ε small enough to make sure the ones we selected are still cheapest.
Prim’s algorithm running time
-
We add nodes to the expanding spanning tree S.
-
We need to figure out which node to add next.
-
We need to know the attachment cost of each node: a(v) = mine=(u,v):u is in S ce
-
Naive solution: For every step run over all candidates.
-
O(n2).
Priority Queue
-
Maintains
-
The key denotes the priority of v.
-
Operations:
-
The Priority Queue is an abstract data type.
-
In reality, we have to implement it with known data structures.
-
Many implementations exists, the usual one is with heaps.
-
For now:
- PQ operations can be implemented in O(log n) time.
-
PQ solution: Insert the nodes in a PQ, with minus the attachment cost as the keys.
Clustering
Definition of Clustering
-
Definition: Given a set U of n elements, a k-clustering of U is a partition of U into non-empty sets C1, …, Ck.
-
Definition: The spacing of a k-clustering is the minimum distance between any pair of points in different clusters. // 不同集合之间的最小距离
-
Goal: Among all possible k-clusterings, find one with the maximum possible spacing.
In greedy algorithm
-
Pick two objects pi and pj with the smallest distance d(pi,pj).
-
Connect them with an edge e=(pi,pj).
-
Continue like this until we obtain k clusters.
-
If the edge e under consideration connects two objects pi and pj already in the same component, skip it.
Kruskal’s algorithm
-
Pick an edge (pi, pj) with the smallest cost d(pi,pj).
-
Include the edge in the output.
-
Stop before including the last k-1 edges.
- i.e., in the end, remove the k-1 most expensive edges.
-
If the edge e under consideration introduces a cycle, then skip it.
-
Lemma: Let C1, C2, … , Ck be the k connected components formed by deleting the k-1 most expensive edges from a minimum spanning tree T. // 生成一个最小生成树需要删除 k - 1 条最 expensive 的边
-
Proof of the Lemma
-
Let C’ = {C’1, C’2, … , C’k} be any other k-clustering.
-
By other, there exists a cluster Cr of C which is not contained in any cluster C’s of C’.
-
This means that there exist points pi, pj in Cr that belong to different clusters in C’.
-
Let C’i and C’j denote these clusters respectively.