Thursday 30 April 2015

Data Structure and Algorithm - Find All Values as the Sum of Two Cubes in Two Different Ways Under N (Google Interview Question)

1. Problem Description

"
"

2. Analysis
The simplest way is to use brutal force to iterate from 1 to N and check if this value is the sum of two cubes in two different combinations. Iterating each value takes a linear way. For each value i, it takes (i^1/3) to check if it is the sum of two cubes and has only two combinations. All together the computation complexity is more than O(N) and less than O(N^(4/3)).

The brutal force approach is worse than linear computation, therefore it is not good enough. As some of contributors points out, the correct way is to find out all the cubes under N and then see if value i can be sum of two and has only two combinations.

Rather than take the top-down approach. Here I would like to take bottom-up approach. After find out all the cubes under N, rather than check if value i can be sum of two and has only two combinations, I would compute all the possible sums of two cubes and their frequency. And those sums that are no more than N and the frequency equal to 2 are what I am after.

Given N and all cubes no greater than N as {X1, X2, ..., Xm}, where m = the integer no greater than (N^1/3). Then computer all possible sums, SUM = Xi + Xj, where i is not equal to j and both range from 1 to M. In maximum it would have M*(M-1)/2 possible SUMs. And those are no greater than N and appear exactly twice meet the requirement.

3. Solution
Pseudo-code
    - 1. Find out all cubes under N as {X1, X2, ..., Xm}
    - 2. Compute all possible SUM = Xi+Xj and increment its frequency
    - 3. Filter out SUM no greater than N and its frequency equal to 2
Counting the frequency associated with the SUM could use the hash map by SUM as the key and the frequency as the value. It takes constant time to search and linear time to filter out those meet the requirement.

Complexity
    - M equal to the integer no greater than N^(1/3)
    - In maximum has M*(M-1)/2 possible SUM

Therefore the computation complexity is O(N^(2/3)) and space complexity is O(N^(1/3)+N^(2/3)). Both are better than linear complexity.

Tuesday 28 April 2015

Data Structure and Algorithm - Find the minimal intervals (Google Interview Question)`

1. Problem Description
This is a Google interview question for Interns from careercup. Here is the original description of this thread,

"
From a list of integer intervals, write a function to minimize the number of overlapping or consecutive ones. 

Test Input: [4, 8], [3, 5], [-1 2], [10, 12]
Test ouput: [-1, 8], [10,12]"
2. Analysis
As I introduced the sweep line algorithm in this blog entry, Data Structure and Algorithm - Sweep Line Algorithm, I believe that this problem can be solved with the similar idea.

Problem:
Given N intervals, [X1, Y1], [X2, Y2], ..., [Xn, Yn], where
    - Xi and Yi are all integers with i's range from 1 to N, and
    - the intervals are all inclusive on both ends.

Properties of the Sorted Lower and Upper Bounds:
According to the blog entry, Data Structure and Algorithm - Sweep Line Algorithm, we group the lower bounds and the upper bounds, {X1, X2, ..., Xn} and {Y1, Y2, ..., Yn}. Now we sort both lower bounds and upper bounds in the ascending order as,
    - Lower bounds: {L1, L2, ..., Ln}
    - Upper bounds: {U1, U2, ..., Un}
Because the lower and upper bounds are sorted, the following properties hold,
    - L1 <= L2 <= ... <= Ln
    - U1 <= U2 <= ... <= Un
    - Li <= Ui, for any given i between 1 to N
    - L1 <= L2 <= ... <= Li <= Ui, for any given i betwen 1 to N

Detect the Continuous/Overlap
Given the sorted lower and upper bounds,
    - Lower bounds: { L1, l2, ...,   Li, Li+1, ...., Ln}
    - Upper bounds: { U1, U2, ..., Ui, Ui+1, ..., Un]

Let's consider if Ui and Li+1 are overlapped, Based on the properties of the sorted lower and upper bounds, Ui is the largest value of {L1, U1, L2, U2, ..., Li, Ui} and Li+1 is the least value of { Li+1, Ui+1, Li+2, and Ui+2, ..., Ln, Un}. Therefore let's examine their relationship,
    - If Li+1 > 1 + Ui, then there is NO Continuous/Overlap here,
        * A interval with upper bound of Ui should stop here.
        * A new interval with the lower bound of Li+1 should start here
    - Otherwise there is either a CONTINUOUS or OVERLAP,
        * If Li+1 = 1 + Ui, then this is a CONTINUOUS  case.
        * Otherwise it is a OVERLAP case

3. Solution
Example:
Let's take a look at the example: [4, 8], [3, 5], [-1 2], [10, 12]
The sorted lower and upper bounds are,
    - Lower bounds: {-1, 3, 4, 10}
    - Upper bounds: { 2,  5, 8, 12}
Here is the algorithm runs
    - Take -1 as the lower bound of the first interval
    - Ui = 2, Li+1 = 3, where  i = 1. it is a CONTINUOUS case and continue
    - Ui = 5, Li+1 = 4, where i = 2, it is a OVERLAP case and continue
    - Ui = 8, Li+1 = 10, where i = 3, this is none of above cases
        * The first interval stops her as [-1, 8]
        * The second interval starts here with lower bound as 10
    - Reach the end, construct the second interval with the upper bound of Un, [10, 12]

Pseudo-code:
    - 1. Group lower bounds and upper bounds as {X1, ..., Xn} and {Y1, ..., Yn}
    - 2. Sort lower and upper bounds in ascending order as {L1, ..., Ln} and {U1, ..., Un}
    - 3. Take LowerBound = L1 as the lower bound of the first interval
    - 4. Compare Li+1 and Ui where i ranges from 1 to N-1
        * 4.1 if Li+1 > 1 + Ui, then 
                 construct the interval with Ui as the upper bound, [LowerBound, Ui]
                 update LowerBound = Li+1
                 go to Step 4.3
        * 4.2 if i = N - 1, then
                 construct the interval with Un as the upper bound, [LowerBound, Un]
                 Terminate the program
        * 4.3 Increment i by 1 and repeat Step 4         
This solution has O(NlogN) computation complexity and O(N) space complexity.

Thursday 23 April 2015

Data Structure and Algorithm: Minimal Swaps from Array A to B (Google Interview Question)

1. Problem Description
This is a Google interview question for software engineer from careercup. The following is the original thread,
"
You are given two arrays - A & B. The length of the arrays is the same - N. The values in the arrays are integers from 0 to N - 1 in random order and no number is repeated. You need to list a sequence of two-element swaps required to reorder the array A into the order of the array B. Additionally, at each step of the sequence you are allowed to swap two elements only if one of them is 0.
"

2. Analysis
Given the size of array as N and all unique elements (according to the problem description), all the total different combination of arrays are N! (factorial of N). A and B are just two combinations of N! Based on the fact all the combinations can be reached from a initial state (let's say in a ascending order) via swapping there must exist a minimal swaps that can make A and B the exactly same.

As we are talking about the minimal swaps of A to reach B, all the elements in A that are already in the correct locations should stay untouched and those are not should be swapped to reach B. Here comes the problem of how to reach B via minimal swaps.

The current solution proposed by some contributors
The solution proposed by some contributes are not the solution to reach B with the minimal swaps. Here is how it is done. Go through form the beginning of A and B and comparing two elements at the same location. If they are the same then carry on to the next location. If not the same, do the following, (suppose at this location x in A and y in B and x != y)
    - Swap x with 0 in A
    - Swap 0 with y in A
Now at this location A and B have the same value y. And in this way one difference causes two swaps. In general given M difference between A and B, the minimal swaps will be 2M.

The minimal swaps solution
Let's take another look at the example appearing in the thread.
int a[] = {1, 3, 2, 4, 0};
int b[] = { 4, 0, 3, 2, 1};
The solution proposed by Diapankar and a few others needs 10 swaps to reach B. According to the above analysis, 10 is exactly twice of the number of different element between A and B, 5.

Is this really need 10 swaps. Here is the solution that it should look like,
    - 1st swap:   1 and 0 => 0 3 2 4 1
    - 2nd swap: 4 and 0 => 4 3 2 0 1
    - 3rd swap:  2 and 0 => 4 3 0 2 1
    - 4th swap:  3 and 0 => 4 0 3 2 1
This takes exactly 4 swaps. If recall my other few blog entries, the difference between A and B can be called "edit distance". As all elements are unique and A and B are composed of the same elements. The actual question can be generalized as how many swaps to reduce the "edit distance" between A and B to 0. And what is the minimal swaps?

My solution as illustrated in the above example follows,
    - Find out the element 0 in A and its location i
    - Find out the element x in B at the same location i
    - Swap x with 0 in A
    - Repeat last 3 steps until there is no difference
The number of swaps of this solution is M - 1, where M is the "edit difference".

Here is a special case, When element 0 appears in the same location in A and B at the beginning or during the swap, two extra swap is need. In this case the minimal swap will be M+1.
int a[] = {1, 0, 2, 4, 3};
int b[] = { 4, 0, 3, 2, 1};
In the above case the edit distance M is equal to 4. But as the element 0 is not appearing in the editing difference list, therefore have to make it in the list. The solution is simple swap it out. For instance swap 0 with 3 (then M' = 5), Then we need M'-1 = 4 swaps to reach B. Therefore in total M'-1+1 = 5 swaps to from A to B. In this special case the minimal swaps is M'=M+1.

3. Solution
Here is the pseudo-code,
    - 1. Extract out the different elements into the edit distance list
    - 2. Check if 0 is appearing in edit distance list
        - 2.1 If not, swap 0 with one of element in the edit distance and add it into this list
    - 3. From the list look at the location of element "0".
    - 4. Find out the value x in B at the same location
    - 5. Swap 0 with x in the edit distance list (in A)
    - 6. Remove x from the edit distance list
    - 7. Repeat last 5 steps until exhaust the edit distance list
Bear in mind that Step 2.1 can only happen once. Either it happens at the very beginning when exacting the edit distance list, or it could happen after Step 5.
    -
Quite a few contributes pointed out how to do Step 3, 4 and 5 via using a N size array to save the index of each element in A, because the range of elements is from 0 to N-1 and without duplicate. In this special case it works perfectly fine. But in a more general case that keeps the uniqueness of elements but does not have the range constraint, a hash map should be used as the data structure with the element as the key and its index in A as the value.

In all here this solution has O(N) computation complexity and O(N) space complexity and guarantees minimal swaps as M-1/M+1.

Saturday 18 April 2015

Minimize the cost of cutting a log : Google Interview Question

1. Problem Description
This is a Google interview question from careercup. Here is the original post,

"You are given a log of wood of length `n’. There are `m’ markings on the log. The log must be cut at each of the marking. The cost of cutting is equal to the length of the log that is being cut. Given such a log, determine the least cost of cutting.
 "

2. Analysis
Before diving into the anslysis, let's be clear about my understanding on this problem.
    - The log can be cut on both sides
    - The cost is accumulated on log cut off each time (total m marking, m cuts)
    - The log still in the hand is not counted as cost

Assuming given a log of length n with m markings use D to represent the length between neighboring markings, d0, d1, ..., dm.
    - Size of D: m+1
    - Sigma(di) = n, where i from 0 to m

m = 1
One marking, one cutting to d0 and d1. In this case literally decide to cut from which side. Cut d0, then cost is d0 and cut d1 the cose is d1. Therefore the least cost of cutting this log is min(d0, d1).

m = 2
Two markings, two cuttings to d0, d1 and d2. Now we have 4 ways of cutting
    - Cut d0 off, then the cost is d0 + f(1) = d0 + min(d1, d2),
        where sud-problem f(1): the remaining log d1, d2
    - Cut d2 off, then the cost is d2 + f(1)  = d2 + min(d0, d1),
        where sud-problem f(1): the remaining log d0, d1
    - Cut d0+d1 off, then the cost is d0+d1+f(1) = d0+d1+min(d0,d1),
        where sud-problem f(1): the cut-off log d0, d1
    - Cut d1+d2 off, then the cost is d1+d2+f(1) = d1+d2+min(d1, d2),
        where sud-problem f(1): the cut-off log d1, d2

Let consider the value of d0, d1, and d2
If d0 is the largest, then the cost follows
    - d0 + min(d1, d2)
    - d2 + d1
    - d0 + d1 + d1
    - d1 + d2 + min(d1, d2)
Minimize the cost, therefore d2+d1 is th least cost.

If d1 is the largest, then the cost follows,
    - d0 + d2
    - d0 + d2
    - d0 + d1 + d0
    - d1 + d2 + d2
Minize the cost, therefore d0+d2 is the least cost.

If d2 is the largest, then the cost follows,
    - d0 + d1
    - d2 + min(d0, d1)
    - d0 + d1 + min(d0, d1)
    - d1 + d2 + d1
Minimize the cost, therefore d0+d1 is the least cost.

Now we can conclude that the least cost will be sum of the m least logs. It is also equals to the length of log minus the longest log, n - max(di).

m = 3
Three markings, three cuttings to d0, d1, d2, and d3. Now we have 24 ways of cutting
    - Cut d0 off, then the cost is d0+f(2),
        where the sub-problem f(2) with remaining log: d1, d2, d3
        cost = d0 + (d1+d2+d3 - max(d1, d2, d3))
                = n - max(d1, d2, d3)
    - Cut d0 and d1 off, then the cost is d0+d1+f(1)+f(1)
        where two sub-problems f(1), cut-off log d0, d1 and remaining log d2,d3
        cost = d0 + d1 + (d0+d1 - max(d0, d1)) + (d2+d3 - max(d2, d3))
                = n + d0 + d1 - max(d0, d1) - max(d2, d3)
    - Cut d0,d1 and d2 off, then the cost is d0+d1+d2+f(2)
        where the sub problem f(2) with cut-off log d0, d1, d2
        cost = d0 + d1 + d3 + (d0+d1+d2 - max(d0, d1, d2))
                = n + d0 + d1 + d2 - d3 - max(d0, d1, d2)
    - Cut d3 off, then the cost is d3 + f(2),
        where the sub-problem f(2) with remaining log: d0, d1, d2
        cost = d3 + (d0 + d1 + d2 - max(d0, d1, d2))
                = n - max(d0, d1, d2)
    - Cut d3, d2 off, then the cost is d3+d2+f(1)+f(1)
        where two sub-problems f(1), cut-off log d3, d2 and remaining log d1, d0
        cost = d3 + d2 + (d3+d2 - max(d3, d2)) + (d1+d0 - max(d1, d0))
                = n + d3 + d2 - max(d3, d2) - max(d1, d0)
    - Cut d3, d2, d1 off, then the cost is d3+d2+d1+f(2)
        where the sub-problem f(2) with cut-off log d3, d2, d1
        cost = d3 + d2 + d1 + (d3+d2+d1 - max(d3, d2, d1))
                = n + d3 + d2 + d1 - d0 - max(d3, d2, d1)

Again lets consider the largest value of di.
If d0 is the largest one, then the cost follows,
    - n - max(d1, d2, d3)
    - n + d0 + d1 - d0 - max(d2, d3) = n + d1 - max(d2, d3)
    - n + d0 + d1 + d2 - d3 - d0 = n + d1 + d2 - d3
    - n - d0
    - n + d3 + d2 - max(d3, d2) - d0
    - n + d3 + d2 + d1 - d0 - max(d3, d2, d1)
Minimize the cost, therefore n - d0 is the least cost

If d1 is the largest one, then the cost follows,
    - n - d1
    - n + d0 + d1 - d1 - max(d2, d3) = n + d0 - max(d2, d3)
    - n + d0 + d1 + d2 - d3 - d1 = n + d0 + d2 - d3
    - n - d1
    - n + d3 + d2 - max(d3, d2) - d1
    - n + d3 + d2 + d1 - d0 - d1 = n + d3 + d2 - d0
Minimize the cost, therefore n - d1 is the least cost

If d2 is the largest one, then the cost follows,
    - n - d2
    - n + d0 + d1 - max(d0, d1) - d2
    - n + d0 + d1 + d2 - d3 - d2 = n + d0 + d1 - d3
    - n - d2
    - n + d3 + d2 - d2 - max(d1, d0) = n + d3 - max(d1, d0)
    - n + d3 + d2 + d1 - d0 - d2 = n + d3 + d1 - d0
Minimize the cost, therefore n - d2 is the least cost

If d3 is the largest one, then the cost follows,
    - n - d3
    - n + d0 + d1 - max(d0, d1) - d3
    - n + d0 + d1 + d2 - d3 - max(d0, d1, d2)
    - n - max(d0, d1, d2)
    - n + d3 + d2 - d3 - max(d1, d0) = n + d2 - max(d1, d0)
    - n + d3 + d2 + d1 - d0 - d3 = n + d2 + d1 - d0
Minimize the cost, therefore n - d3 is the least cost

Now we can conclude that the least cost is the n - max(di) if m = 3.

m as any value
Use the induction theory this can be concluded that the least cost is equal to n - max(di).

3. How to Cut?
As the solution is rolled out, the longest log will be remained in the hand and the rest will be cut off and regarded as the cost. The solution of cutting is quite simple - always pick up the shorter log to cut on the both side for each cutting. In this way it will guarantee the longest log will remain on the hand as the last.