Thursday, 4 June 2015

Data Structure and Algorithm - Find the Overlapping Tasks of Job Schedules (Amazon Interview Question)

1. Problem Description
This is an Amazon interview question for software develop engineer from careercup. Here is the original thread,

"
Given set of job schedules with start and end time, write a function that returns indexes of overlapping sets. 

for ex :-
input -> [1,2][5,6][1,5][7,8][1,6]
return -> [0,1,2,4]".
2. Analysis
My solution to this problem is to find out all the overlapping intervals and then find each overlapping intervals its crossing tasks. In the above example the overlapping intervals are: [1, 6] and [7, 8]. Then find out which tasks they come cross. [1, 6] crosses the first, second, third and fifth tasks. and [7, 8] crosses the 4th task.

To find out how many overlapping intervals is very close to this Google interview question. The only difference is that this problem does not have CONTINUOUS case as the Google interview question. For example [1, 5] and [6, 9] in this case they are not overlapped but they are in Google interview question.

As I introduced the sweep line algorithm in this blog entry, Data Structure and Algorithm - Sweep Line Algorithm and I also use it solve this Google interview question. Here it works too to find out the overlapping intervals.

Problem:
Given N intervals, [X1, Y1], [X2, Y2], ..., [Xn, Yn], where
    - Xi and Yi are all integers with i's range from 1 to N, and
    - the intervals are all inclusive on both ends.

Properties of the Sorted Lower and Upper Bounds:
According to the blog entry, Data Structure and Algorithm - Sweep Line Algorithm, we group the lower bounds and the upper bounds, {X1, X2, ..., Xn} and {Y1, Y2, ..., Yn}. Now we sort both lower bounds and upper bounds in the ascending order as,
    - Lower bounds: {L1, L2, ..., Ln}
    - Upper bounds: {U1, U2, ..., Un}
Because the lower and upper bounds are sorted, the following properties hold,
    - L1 <= L2 <= ... <= Ln
    - U1 <= U2 <= ... <= Un
    - Li <= Ui, for any given i between 1 to N
    - L1 <= L2 <= ... <= Li <= Ui, for any given i betwen 1 to N

Detect the Overlap
Given the sorted lower and upper bounds,
    - Lower bounds: { L1, l2, ...,   Li, Li+1, ...., Ln}
    - Upper bounds: { U1, U2, ..., Ui, Ui+1, ..., Un]

Let's consider if Ui and Li+1 are overlapped, Based on the properties of the sorted lower and upper bounds, Ui is the largest value of {L1, U1, L2, U2, ..., Li, Ui} and Li+1 is the least value of { Li+1, Ui+1, Li+2, and Ui+2, ..., Ln, Un}. Therefore let's examine their relationship,
    - If Li+1 > Ui, then there is NO OVERLAP here,
        * An interval with upper bound of Ui should stop here.
        * A new interval with the lower bound of Li+1 should start here
    - Otherwise there is a OVERLAP

How to find out what tasks the overlapped intervals cross?
Use a hash map to contain the overlapped intervals. Use lower bound of the overlapped intervals as the key and a vector as the value to save the tasks that it crossed.
    - Key: int (lower bounds of overlapped intervals)
    - Value: std::vector<size_t> (to save the indices of tasks that this overlapped interval crosses)

Sort the lower bounds of all overlapped intervals as LB. Then go through the tasks. Search the lower bound of each task lb in LB and find the first value that, ul', is less or equal t ul. Then save the index of this task into the hash map with key value of ul'.

3. Solution
Example:
Let's take a look at the example: [1, 2], [5, 6], [1, 5], [7, 8], [1, 6]
The sorted lower and upper bounds are,
    - Lower bounds: {1, 1, 1, 5, 7}
    - Upper bounds: {2, 5, 6, 6, 8}
Here is the algorithm runs
    - Take 1 as the lower bound of the first interval
    - Ui = 2, Li+1 = 1, where  i = 1. it is a OVERLAP case and continue
    - Ui = 5, Li+1 = 1, where i = 2, it is a OVERLAP case and continue
    - Ui = 6, Li+1 = 5, where i = 3, it is a OVERLAP case and continue
    - Ui = 6, Li+1 = 7, where i = 4, it is NOT a OVERLAP case
        * The first interval stops her as [1, 6]
        * The second interval starts here with lower bound as 7
    - Reach the end, construct the second interval with the upper bound of Un, [7, 8]
Two overlapped intervals generated as [1, 6] and [7, 8]
Insert key 1 with empty vector and key 7 with empty vector into hash map,
Order the overlapped interval's lower bounds as {1, 7}. Go through the lower bound of each interval and find the first value (in this case {1, 7}) which it is equal to or greater than.
    - Task [1, 2]: the 1st value 1 >= 1 is 1. then append 0 into the value of Key 1 of hash map
    - Task [5, 6]: the 1st value 5 >= 1 is 1. then append 1 into the value of Key 1 of hash map
    - Task [1, 5]: the 1st value 1 >= 1 is 1. then append 2 into the value of Key 1 of hash map
    - Task [7, 8]: the 1st value  7 >= 7 is 7. then append 3 into the value of Key 7 of hash map
    - Task [1, 6]: the 1st value 1>= 1 is 1. then append 4 into the value of Key 1 of hash map
Go through the hash map,
    [0, 1, 2, 4] in one group
    [3] in one group

Pseudo-code:
    - 1. Group lower bounds and upper bounds as {X1, ..., Xn} and {Y1, ..., Yn}
    - 2. Sort lower and upper bounds in ascending order as {L1, ..., Ln} and {U1, ..., Un}
    - 3. Take LowerBound = L1 as the lower bound of the first interval
    - 4. Compare Li+1 and Ui where i ranges from 1 to N-1
        * 4.1 if Li+1 > Ui, then 
                 construct the interval with Ui as the upper bound, [LowerBound, Ui]
                 update LowerBound = Li+1
                 go to Step 4.3
        * 4.2 if i = N - 1, then
                 construct the interval with Un as the upper bound, [LowerBound, Un]
                 go to Step 5
        * 4.3 Increment i by 1 and repeat Step 4
    - 5. Sort the lower bounds of overlapped interval as LB
    - 6. construct hash map <int, std::vector<size_t>> with LB and empty vectors
    - 7. Go through the tasks list
        * Find the first value x which is <= the lower bound of this task
        * hashMap[x].push_back(indexOfTask)
        * Repeat 7 until exhaust the task list
    - 8. Go through the hash map. The value (std::vector) of each entry are grouped.         

This solution has O(NlogN) computation complexity and O(N) space complexity.


2 comments:

  1. Hi, thanks for posting this. There seems to be an issue with the algorithm. You mention that "Task [7, 8]: the 1st value <= 7 is 7. then append 3 into the value of Key 7 of hash map" however, the first value of LB that's <= 7 is 1, since LB = {1, 7}. Can you clarify here please? This algorithm also fails with larger timestamp values i.e.
    Pair.of(1479942506000l, 1479942507000l)
    Pair.of(1479942337000l, 1479942354000l)
    Pair.of(1479942545000l, 1479942545000l)
    Pair.of(1479942620000l, 1479942639000l)

    ReplyDelete
    Replies
    1. After finding the intervals [1, 6] and [7, 8], the form the lower bounds as {1, 7}. Then go through the lower bound of each interval, find the first value which is >=. In this case,
      [1, 2] --> 1 >= 1, then append into hash map of KEY 1
      [5, 6] --> 5 > 1, then append into hash map of KEY 1
      [1, 5] --> 1 >= 1, then append into hash map of KEY 1
      [7, 8] --> 7 >= 7, then append into hash map of KEY 7
      [1, 6] --> 1 >= 1, then append into hash map of KEY1

      For the large time stamp, just long long data type.

      Delete