Saturday, 30 May 2015

Data Structure and Algorithm - Trie and Palindrome (Google Interview Question)

1.Problem Description


2. Analysis
Initially I was only thinking about the case given the example list - all the elements/strings have the same length. A simple solution is to use hash map.
    - Use the string as the key and the frequency of it appearing in the list as the value.
    - Inset the first element and set the frequency as 1.
    - From the second string on, try to find its reverse string in the hash map.
        * If found, then there is palindrome and increment the counter by the frequency.
        * Insert the string into the hash map
            ~ If already in the hasp map, then increment the frequency by 1
            ~ Otherwise insert and set the frequency as 1

Unfortunately this algorithm won't work for strings with varying length, for instance {a, aa, aaa, aaaa}, - the elements can have different length. In this example {a, aa, aaa, aaaa} all the combinations are palindrome and in total it has 6 combinations. The hash map solution does not work any more because all the strings are different keys.

Let's examine what properties they share.
The strings have the same length. It is quite simple. The reverse of the second string should be equal to the first string.
The strings have different length. Assume two string with size L1, L2, and L1 > l2. Then after reverse one of them. Then the first L2 should be same and the rest of L1-l2 should be palindrome as well. In the above example
    - "a" and "aaaa", the reverse of  "aaaa" is "aaaa".
        * The red part is the same and blue part is palindrome as well. 

In summary for any two given string s1 and s2 with length of L1 and L2, to decide if s1s2 is a palindrome follows, (s2' is the reverse string of s2)
    - Case 1: L1 = L2, then  s1 = s2'
    - Case 2: L1 >  l2  then s1.substr(0, L2) = s2' and s1.substr(L2 + 1, L1 - L2) is palindrome too.
    - Case 3: L1 < L2, then s1 = s2'.substr(0, L1) and s2.substr(L1 + 1, L2 - L1) is palindreom too.
3. Solution
Some contributors mentioned using Trie structure/algorithm to solve this issue. But none of them proposed how the data structure looks like exactly and how to do the searching/traversing and insertion operations.

Data Structure
In this case I assume all the strings given are made of letters only and are either in upper case or in lower case. Based on my original Trie data structure and  the fact that there might be duplicates of strings in the list according to the problem description, the data structure is modifies as,

constexpr unsigned char NUMBER_OF_LETTERS = 26
struct DictionaryTrie {
char oneLetter;
        size_t frequency;
DictionaryTrie* children[NUMBER_OF_LETTERS];
DictionaryTrie() {
str = NULL;
for (int i = 0; i != NUMBER_OF_LETTERS; ++i) {
children[i] = NULL;
}
}
};
Each node only takes one letter, "oneLetter" as data and records how many times this string appears in the list as "frequency".

How to insert a string into Trie?
When a string is inserted into Trie, only the last letter's "frequency" set to indicate how many times this string appears in the list. For instance,
    - "abc" inserted into a empty Trie as,
        * root->(a, 0)->(b, 0)->(c, 1)
    - "abcd" inserted into this Trie
        * root->(a, 0)->(b, 0)->(c, 1)->(d, 1)
    - "abc" inserted into the Trie again
        * root->(a, 0)->(b, 0)->(c, 2)->(d, 1)
    - "abef" inserted into the Trie
        * root->(a, 0)->(b, 0)->(c, 2)->(d, 1)
                                         |->(e, 0)->(f, 1)

Pseudo-code
So far we understand how the string is inserted into the Trie and how to decide if the combination of two strings (Section 2) is a palindrome. Here is the pseudo-code:
1. Initialize counter as 0
2. Insert the first string of the list into an empty Trie
3. Take the next string S and its reverse string S'
4. Traverse the Trie with S'
    4.1 If found a node with frequency > 0, then check S'  (check Section 2)
        - Increment counter by frequency and go to Step 5, if reach the end of S' too.
                                                                                          : Case 1 in Section 2
        - Check if the rest of S' is palindrome, if does not reach the end of S'
                                                                                          : Case 2 in Section 2
            * If yes, then increment counter by frequency and go to Step 4
            * If no, then go to Step 4
    4.2 If reach the end of S', record node X                      : Case 3 in Section 2
        - 4.2.1 Continue traverse the children of X
            * If found a node Y with frequency > 0, then
                ~ If between X and Y is palindrome,  increment counter by frequency and go to Step 4.2.1
                ~ If between X and Y is not palindrome, go to Step 4.2.1
            * Reach the end of Trie, go to Step 5
5. Insert S into Trie
    - If not exist, insert it and set the last letter's frequency as 1
    - If exist. increment the last letter's frequency by 1
6. Repeat Step 3,4 and 5 until exhaust the list

4. Example
Here is an example to show how the algorithm works.
    - The black color is the Trie structure
    - The red color is the newly inserted new string (tracing back to the root node) : Step 5
    - The green color is to detect the palindrome: Step 4
        * The reverse string is to compare with the Trie before inserting it
        * For instance in Step 5: "aabb" is to compare with the Trie in Step 4
    - The blue color is indicate the result
    - Case 3: 2, 3, 4
    - Case 1: 6 @ (a, 1) of level 1 (Trie at 5)
    - Case 2: 6 @ (a, 1) of level 2 (Trie at 5)
       
Example 1

5. Complexity
I don't think O(nk) is achievable and the worst case will be (n*(k^2)). Because there is a special case like this {a, aa, aaa, aaaa, aaaaa}.
When the algorithm does with "aaaaa", it will have to decide if "aaaa", "aaa", "aa" and "a" are palindrome. This operation is quite expensive as O(k^2). 

Dynamic Programming - Minimize the Number of Canoes to Take Kids (Google Interview Question)

1. Problem Description
This is a Google interview question for software engineer from careercup. Here  is the original thread of this problem,

"If a canoe can hold 2 kids and a max weight of 150 lbs, write a function that returns the minimum number of canoes needed, given a list of kids and their weights."
2. Analysis
Some of contributors have already proposed a very good solution for this problem. It is to sort the weight - the lightest on one end and the heaviest on the other. Then try to pair the lightest and the heaviest. If the sum of two weights is not greater than the load of boat, then these two share one canoe. Otherwise the heaviest uses one canoe and move towards the middle. Then consider the lightest with the 2nd heaviest. Repeat this process until exhaust the list of weight.

I believe that this is the best solution so far and my solution proposed in this blog can't do better than this. However it will think about this problem in a different angle and try to use dynamic programming to solve this problem.

The following picture shows the mathematical formulation and constraints, and the sub-problem.

Constraints and Sub-problem
Corner cases:
    - N = 2, if the sum of two weights > 150, then f(2) = 2 otherwise f(2) = 1
    - N= 1, f(1) = 1

3. Implementation Discussion
Caching 
As usual I was trying to cache the result of the sub-problems. But unfortunately I fail to do so, because I can't find the good key for hash map that can represent the sub-problem uniquely. The sub-problem consists of a list o f weights with the list size from 1 to N-1. The candidate of the key would be either the list of indices of kids remaining in the sub-problem or the actual weights remaining in the sub-problem. Therefore the key would be a list/vector.This is really not a good solution in this case.

Sorting
Sorting should also benefit this solution. Because it will limit the number of sub-problems. As long as found the pair whose weight is greater than the load of the canoe, then can stop trying the rest of sub-problems because we already have a solution as good as the best of the rest. The following picture shows the sub-problem after sorting.

Sub-problem after Sorting


Saturday, 9 May 2015

Data Structure and Algorithm - Find the longest strictly ascending sub-sequence of array (Google Interview Question)

1. Problem Description
This is a Google interview question from careercup. Here is the original thread,

"
".

2. Examples
I am to propose a tree structure to insert all the elements of array into the tree. Only the best/longest possible solutions will remain in the tree and one of the solutions will be found when reaching the end of array. 
Let's take a look how the two examples in the thread can be worked out by hand, then we can look into the tree structure. Apologize for my handwriting as the train was tumbling when I was working on them.

Example 1: [-1, 2, 100, 100, 101, 3, 4, 5, -7]
Solution of Example 1


Example 2: [10, 8, 6, 10, 2, 10 , 6, 14, 4, 9, 5, 13, 3, 11, 7, 15, 80, 9, 10, 10, 10, 10]

Solution of Example 2 - 1
Solution of Example 2 - 2
Solution of Example 2 - 3

Let's take a look at the more difficult one - Example 2. Here is some detailed explanation why each step is taken so.
Step a: - Insert 10 into an empty node
Step b: - Insert 8 into  left of 10, because 8 < 10.
            - Remove 10 from the right branch.
              Reason: any sub-sequence involving 10 can be replaced by 8. The left branch (8)
                            is always better or at least as good/long as the right branch (10).
Step c: - Insert 6 into the left of 8, because 6 < 8
            - Remove 8 from the right branch. The same reason as Step b
Step d: - Insert 10 as the child of 6, because 10 > 6 (build up the sub-sequence)
Step e: - Insert 2 into the left branch of 6, because 2 < 6
            - Keep both branches
              Reason: the left branch (2) has the potential to overtake the right branch (6), for
                            instance 3,  4, 5 following next in the array.
Step f: - Insert 10 as child of 2. because 10  > 2
            - Ignore inserting 10 as child of 6, because 6 already has a child 10
            - Remove the right branch 6, 10
              Reason: The same as Step b. In general if two branches has the same depth and
                            each element in the left branch is =< it counterpart at the same
                            level in the right branch, then the whole right branch can be removed.
                            In this case the parent node of two branches are 2 and 6, and their
                            parent node is the root, therefore 6 and 10 can be removed.
Step g: - Insert 6 as a child of 2 and stays at the left branch
               Reason: as child of 2 because 6 > 2 and stays at the left branch because 6 < 10
            - Remove 10 as Step b
Step h: - Insert 14 as a child of 6 because of 14 > 2 and 14 > 6
Step  i: - Insert 4 as a child as 2 and stay as  left branch. Same reason as Step g
            - Keep both branches as Step e
Step  j: - Insert 9 as child of 4 and 6. And stays the left branch as child of 6
            - Remove 14 as Step b
            - Then continue to remove 6 and 9 as Step f
Step k: - Insert 5 as child of 4 and stays in the left branch
            - Remove 9 as Step b
Step  l: - Insert 13 as child of 5 as 13 is the largest value
Step m: - Insert 3 as child of 2
             - Keep both branches as Step i
Step n: - Insert 11 as child of 3 and child of 5 and stay at the left branch of 5
            - Remove 13 as Step b
Step n':- Remove 11 as child of 3
              Reason: any following number > 11 conforms the sub-sequence, the right
                            branch is preferred because it is longer than left branch. (Currently
                            the depth of left and right are 3 and 4)
Step o: - Insert 7 as child of 3 and 5. Stays at the left branch of 5
            - Remove 11 as Step b
            - Remove 7 as child of 3 as Step n'
Step p: - Insert 15 as child of 3 and child of 7
            - Remove 15 as the child 3 as Step n'
Step q: - Insert 80 as child of 3 and child of 15
            - Remove 80 as child of 3 as Steop n'
Step r: - Insert 9 as child of 3 and child of 7. Stays at the left branch of 7
            - Remove 9 as child of 3 as Step n'
Step s: - Insert 10 as child of 3 and child of 9
            - Remove 15 and 80 as Step f
            - Remove 10 as child of 3 as Step n'
Step t: - Ignore 10 as duplicate of last value
Step u: - Ignore 10 as duplicate of last value
Step v: - Ignore 10 as duplicate of last value 
End:     - Reach the end of the array
             - One of solutions 2, 4, 5, 7, 9, 10
             - Length: 6

In Step n, Step o, Step p, Step q, Step r and Step s, the insertion to the left branch can be avoided actually in the first place then no bother deleting again. It will be discussed in the next section.

3. Analysis
Now let's take a close look at the tree structure and operations based on the examples in Section 2.
Branches 
    - 2, therefore this is a binary tree
Ordered
    - Yes
    - The children are greater than parents
    - The left branches is less than the right branches
Insertion
    - Give x, Find the node, y, that is one of following cases. And insert x as a child of y
        * y is a leaf node and less than x, for instance Step d and Step h
        * y is not a leaf node but its child is greater than x, for instance Step g and Step i
    - If y is the root, then insert as its left node, for instance Step b and Step c
Deletion after insertion of x
    -  If y is not a left node and x has a sibling z (must be greater than x)
        * Delete z, if z has no children, for instance Step k
    - x has a counterpart, z, at the same level at the right branch,
        * Delete x, if x > z and z has child/children, for instance Step p and Step q
    - Compare x with its sibling z
        * Delete the right branch up to the common ancestor of x and z, if x is less
           than z and x's ancestors are less than z's ancestor at each level,
           for instance Step f, Step g and Step j 

Exemption from insertion
Just examining Step n, Step o, Step p, Step q, Step r and Step s, probably do not need to insert x into the left branch. Do not insert x, if meet all the following conditions
    - y is a left node it has a sibling z
    - y < z
    - x < y
    - x < z
    - z's child is less than x as well

Solution
Guarantee find one of the longest sub-sequences but not all the solutions. In Example 2, Step s shows that 2, 4, 5, 7, 15, 80 is also a solution.

Complexity
Assume the size of the array is N and  the size of the longest sub-sequence as M,
    - Computation complexity: worst case N*M.
        * For instance given array is sorted and ascending, then  M=N, the tree will be like
           drop line. Each note has an element and each insertion will have to  compare
           all from the top to the bottom for insertion. (This worst case can be dropped
           to a linear solution by caching the leaf node)
    - Space complexity: worst case N

4. Solution and Implementation
I will discuss them in my next two blogs.