## Saturday, October 2, 2010

Continuing with my Facebook Engineering Puzzles series, here's the writeup for Gattaca (gattaca) (note: Facebook took down the original puzzle so I'm linking to another site with the description). The original problem statement is interesting in that it has a computational biology twist, but it's really a weighted interval scheduling problem in disguise. This one took me a while to recognize, even though I was barking up the right tree by looking for an optimal substructure that would lend to a dynamic programming solution. Ultimately, the key to the problem is that the order of the gene predictions is important, and is part of the solution.

For this writeup, I reformulated the problem in a more "classic" interval scheduling manner. I think treating the DNA strand as a timeline and the gene predictions as time intervals is a good way to gain an intuitive understanding of the problem and the solution.

Party Prioritization

Suppose you are the owner of the most popular dance hall in town. Everyone wants to hold their company party, school prom, or wedding at your venue, so you often receive more booking requests than you can handle. Being a tech-savvy business person, you decide to write a program to figure out the maximum profit you can make if you schedule events for your dance hall optimally.

Each event has a start time and end time, along with a booking fee. A first stab at the problem might involve using a greedy algorithm. That is, repeatedly  book the event with the highest booking fee that hasn't been booked yet, and reject any other events that have a time conflict with the booked event. However, it's easy to see that this approach quickly breaks down. A counter example would be three events: (A) one that takes the whole day and would net us a profit of \$1000, (B) one that occupies the morning with a fee of \$750, and (C) one that occupies the evening with a fee of \$800. The greedy algorithm would automatically choose to book event C since it has the highest fee; but it's obvious that booking events B and C would make more money for the business.

Determining Conflicts

We can develop a more sensible approach by recognizing some key properties of the problem. For a given event, we can always consider two solutions: one in which the event is booked, and one in which the event isn't booked. When the event is booked, we can subsequently reject all other conflicting events; otherwise, we only reject the given event and all other events are still fair game. The solution with the higher profit would then be the optimal solution.

With that in mind, we first need to figure out how to find all the events that conflict with a given event. Suppose events are represented by tuples of the form (start, finish, fee), where start and finish are integers representing the (inclusive) start and finish times of the event, and fee is the profit to be made. If we sort the events by their finish times, we can figure out the conflicts for each event: for a given event i, we can start examining all the events that finish earlier than i, in order; any event j that has a finish time earlier than the start time of i does not conflict with i. Once we find a event j that meets the criteria, we know that all events with earlier finish times (that is, with index less than j) do not conflict with event i either.

Thus, we can define an array compatible such that compatible[i] is the largest index in the sort order that does not conflict with the ith event. Due to the sorted nature of the events, any event with an index smaller than compatible[i] will also not conflict with the ith event.

```def compatible(events)
# Events conflicting with all earlier events have -1 values
compatible = Array.new(events.size, -1)

# Assume that events are sorted by finish time
for i in 0..events.size-1
i_start = events[i][0]

# Examine events before i, in reverse sort order
(i-1).downto(0) do |j|
j_finish = events[j][1]
if j_finish < i_start
compatible[i] = j
break
end
end
end
return compatible
end
```

To Include Or Not To Include

As determined earlier, when we examine event i we only have to consider a solution including i and a solution excluding i. For a solution including i, we need to exclude all conflicting events; equivalently, we only need to consider events up to compatible[i]. Likewise, for a solution excluding i, we only need to consider events up to compatible[i-1]. This allows us to reuse the results of potentially overlapping subproblems, and naturally leads us to a dynamic programming solution.

Our dynamic programming algorithm will produce an array profit, where profit[i] is the maximum profit we can make with an optimal schedule of events up to i.

```def profit(events)
# Note that profits is 1-indexed and compatible is 0-indexed
compatible = compatible(events)

# profits[k] = maximum possible profit when scheduling up to
# the kth event
profits = Array.new(events.size + 1)
profits[0] = 0

for i in 1..profits.size-1
# Profit if we include event i
include = events[i-1][2] + profits[compatible[i-1] + 1]

# Profit if we exclude event i
exclude = profits[i-1]

profits[i] = include > exclude ? include : exclude
end
return profits
end
```

Suppose you've been sent eight booking requests for the same day. The start and finish times are the hours at which the events are scheduled to start and finish. The fee is in thousands of dollars. We can now run our program to find the maximum amount of profit we can make, which would be \$11,000.

```#!/usr/bin/ruby
events = [
[10, 14, 1],
[14, 16, 3],
[0,  4,  3],
[0,  4,  4],
[11, 15, 4],
[8,  11, 2],
[17, 18, 2],
[11, 14, 4],
]
events.sort! { |a, b| a[1] <=> b[1] }

result = profit(events)
puts result[events.size]
```

If you enjoyed this post, you should follow me on Twitter @cloudkj. If you're interested in more puzzle fun, you should check out my posts about various other puzzles, such as Liar Liar, Dance Battle, User Bin Crash, It's A Small World, and Refrigerator Madness.

Food For Thought
• How can the program be modified to output the events that should be scheduled to maximize profit?
• What is the run-time complexity of the algorithm?