Thursday, 8 December 2016

DSA - Move Basketball into a Continuous Row

1. Problem Description
This is a Google interview question for software engineer/developer from careercup. Here is the original thread,
"

3
of 3 votes
16
Answers

Tom works in a warehouse. A billion (1,000,000,000) boxes are arranged in a row. They are numbered from one to one billion (from left to right). Some boxes contain a basketball (at most one basketball in each box). In total, there are N basketballs.Tom wants to organize the warehouse. He would like to see all the basketballs arranged next to each other (they should occupy a consistent interval of boxes). In one move, Tom can take one basketball and move it into an empty box. What is the minimum number of moves needed to organize the basketballs in the warehouse?

Write a function:


class Solution { public int solution(int[] A); }

that, given an array A containing N integers, denotes the positions of the basketballs (the numbers of the boxes in which they are placed) and returns the minimum number of moves needed to organize the basketballs in the warehouse.

For example, given: A = [6, 4, 1, 7, 10], your function should return 2 because the minimum number of moves needed to arrange all basketballs next to each other is 2. There are several ways to do it. For example, you could move the ball from the first box to the fifth, and the ball from the tenth box to the eighth. You could also move the ball from the first box to the fifth, and the ball from the tenth box to the third instead. In any case, you need at least two moves.

To be done in O(nlogn) Time complexity and O(1) space complexity
- SmashDUNK November 20, 2016 in United States | Report Duplicate | Flag

".

2. Data Structure and Algorithm
Give an array of indices of basketball positions, P {p1, p2, ..., pn}. The question is to find the minimum move to arrange the N basketballs into a continuous row.

Pseudo code:
    - Sort the array in  the ascending order
    - Use a window of size N to detect the maximal number of basketballs covered as M
    - Then the minimum move is N - M

Use in-place heap sort to sort the array in ascending order and use to pointers to point to the beginning and end of the window. It guarantees O(nlogn) computation complexity and O(1) space complexity. The heap sort used in this blog can be found here - Data Structure and Algorithm - Sort and Binary Search.

3. C++ Implementation
// ********************************************************************************
// Implementation
// ********************************************************************************
size_t MinimunMovement(std::vector<int> &basketBalls)
{
    if (basketBalls.empty()) {
        return 0;
    }
    const size_t NumOfBasketballs = basketBalls.size();
    // use in-place heap sort to sort the array
    HeapSort().Sort(basketBalls);
    // two pointers to point the front and end of window
    auto itSlow = basketBalls.cbegin();
    auto itFast = basketBalls.cbegin();
    auto itEnd = basketBalls.end();
    size_t result = 0;
    size_t tempResult = 0;
    // std::vector<int> copy;

    // apply the window to the sorted array to find the
    // the maximal number basketballs the window covers
    for (; itFast != itEnd; ++itFast) {
        if ((*itFast - *itSlow) < NumOfBasketballs) {
            ++tempResult;
        }
        else {
            if (tempResult > result) {
                result = tempResult;
                //copy = std::vector<int>(itSlow, itFast);
            }
            ++tempResult;
            while ((*itFast - *itSlow) >= NumOfBasketballs) {
                ++itSlow;
                --tempResult;
            }
        }
    }
    if (tempResult > result) {
        result = tempResult;
        //copy = std::vector<int>(itSlow, itFast);
    }

    // the minimal number of move is the difference
    return NumOfBasketballs - result;
}

// ********************************************************************************
// Test
// ********************************************************************************
void TestBasketBall()
{
    std::vector<int> basketBalls;
    assert(MinimunMovement(basketBalls) == 0);
    basketBalls = { 6, 4, 1, 7, 10 };
    assert(MinimunMovement(basketBalls) == 2);
    basketBalls = { 8, 4, 1, 7, 10 };
    assert(MinimunMovement(basketBalls) == 2);
    basketBalls = { 8, 2, 1, 7, 10 };
    assert(MinimunMovement(basketBalls) == 2);
}

Wednesday, 7 December 2016

Identify the Posionous Bucket with Minimal Cost

1. Problem Description
This is a Micorsoft interview question for software engineer intern from careercup. Here is the description of the original thread,
"

Just a disclaimer: I doubt you will ever get this interview question. My interviewer even started off by saying, "Hmm, well this isn't really fair, but..." So don't place too much stock in whether or not you can solve this. 

Question: You have a group of pigs and buckets of food for said pigs. There are 1,000 buckets of food, and exactly 1 of them is poisoned. Your goal is to determine, by the end of 1 hour, which bucket is poisoned. 

The poison takes 30 minutes to kill a pig, and you'd like to kill as few pigs as possible. The number of pigs you can test is limitless, and you can assign a number to each bucket and each pig so that you know exactly which pig ate from which bucket(s). You determine which buckets to feed to which pigs, but you have no timer and no way to guesstimate the time. What is the minimum number of pigs you need to use to solve the problem?
- oxymoronic2012 November 02, 2016 in United States for Bing | Report Duplicate | Flag 
".

2. Analysis
I don't think there is a definite answer to this question. Because in the worst case it needs 1000 pigs to find out the poisonous bucket and the best case is 1. Therefore instead of looking of a definite answer this question is to find out the minimum of expectation of how many pigs needed to find the poisonous bucket.

Also in this problem it offers a trial because the poison takes 30 minutes to take effect and the time limit is one hour. Hence the first 30 minutes allow us to try first and then decide what to do in the next 30 minutes. Typical relative probability problem.

In the worst case (without attempting a trial) 1000 pigs eat the food of the buckets and then in 30 minutes find the answer. Then 1000 pigs with 100% assurance. Then the cost is 1000. But in a trial case, for instance 500 pigs to try the bucket first (with the probability of 50% to spot the poisonous bucket) and then terminates if found otherwise put another 500. Then expected value is 500*0.5 + 1000*0.5 = 725.

Here is the formula: assume the first try is x, where x is within [1, 1000], then the possibility of successful is x/1000. If fail, then use another (1000-x) to test to get the answer.
    - First try x pigs with probability of x/1000
    - 1000-x pigs with probability of 100%, if the first try fail (the probability of fail as (1 - x/1000))
Then the expectation E is x*(x/1000) + 1000 * (1-x/1000).

The problem can be regarded as: Min{x*x/1000 - x + 1000}, where x is within [1, 1000].

(x*x - 1000x + 1000000)/1000 = ((x - 500)*(x-500) + 725000)/1000.
Min{((x - 500) * (x- 500) + 725000)/1000)}, where x is within [1, 1000].
Therefore x=500, it has the minimum expectation as 725.

3. General Case
What if the poison takes only 20, 10 5, ...... minutes to take effect. This means that we have 2, , 5, 11, ...... trials before hitting the last resort.

Let's generalize the problem. The number of total bucket as N and the number of trial as M. In the above case N = 1000 and M = 2 (1 hour/30minutes). Then we represent the above question as,
     F(N, M) as F(1000, 2).

Let's check the case of take 20 minutes for the poison to take effect and one hour allowed. Then N = 1000 and M = 3, as F(1000, 3)
Assume the first attempt x within [1, 1000],
     - Probability to successful: x/1000
     - Expected value: x * x/1000
If the first trial fail, then the follow has possibility of (1 - x/1000). Therefor the expectation can be written as:
    E{F(1000, 3)} = E{x * x/1000 + F(1000 - x, 2) * (1 - x/1000)}, where x is within [1, 1000]
Here F(1000-x, 2) can solved as in Section 2.

Using the induction theory, for F(N, M) case,
    E{F(N, M)} = E{x * x/N + F(N - x, M - 1) * (1 - x/N)}, where x is within [1, N].
The special case:
    Min{E{F(N, 1)}} = N
    Min{E{F(N, 2)}} = 3*N/4

Saturday, 12 November 2016

BTS - Find the Longest Sequence

1. Problem Description
Find the longest sequence in a binary tree

2. Data Structure and Algorithm
Pseudo-code:
    - Generate the depth map for each node
        * maximal depth of the left child
        * maximal depth of the right child
    - Go through then map and spot the node, Nx, with maximal sum of left and right depths
    - Find the deepest path, Pl of the left child of Nx
    - Find the deepest path, Pr of the right child of Nx
    - Revert Pl and join Pl, Nx and Pr as the longest sequence

Find the deepest path check details: BTS - Find the Deepest Path

3. C++ Implementation
Find the TreeNode definition, utility functions here: BST - Find Next In-Order Value and BTS - Print the First and Last Nodes of Each Level.

// ********************************************************************************
// Implementation
// ********************************************************************************
struct TreeNodeDepth{
    TreeNodeDepth()
        : depthOfLeft(0)
        , depthOfRight(0)
    {}
    TreeNodeDepth(const size_t left, const size_t right)
        : depthOfLeft(left)
        , depthOfRight(right)
    {}

    size_t Total() const {
        return depthOfLeft + depthOfRight;
    }
    // depth of left and right children
    size_t depthOfLeft;
    size_t depthOfRight;
};

template <typename T>
using DepthMap = std::unordered_map<TreeNode<T>*, TreeNodeDepth>;

template <typename T>
size_t GenerateDepthMap(TreeNode<T> *root, DepthMap<T> &depth) {
    if (!root) {
        // special case - NULL pointer
        return 0;
    }

    // depth of this node and save into the map
    size_t leftDepth = GenerateDepthMap(root->left, depth);
    size_t rightDepth = GenerateDepthMap(root->right, depth);
    depth.insert(std::make_pair(root, TreeNodeDepth(leftDepth, rightDepth)));

    // return the maximal depth of left and right path
    return leftDepth >= rightDepth ? (leftDepth + 1) : (rightDepth + 1);
}

template <typename T>
std::list<TreeNode<T>*> FindTheLongestSequence(TreeNode<T> *root)
{
    if (!root) {
        // special case - NULL pointer
        return std::list<TreeNode<T>*>();
    }

    // Generate the depth map
    DepthMap<T> depth;
    GenerateDepthMap(root, depth);
    size_t totalLength = 0;
    TreeNode<T>* nodeWithTheLongestSeq = NULL;
    auto itEnd = depth.rend();
    // Loop through the depth map and find the longest sequence
    // Spot the node with largest sum of depth of left and right children
    for (auto it = depth.rbegin(); it != itEnd; ++it) {
        if (it->second.Total() > totalLength) {
            nodeWithTheLongestSeq = it->first;
            totalLength = it->second.Total();
        }
    }
   
    // Find the deepest path of the left child
    std::list<TreeNode<T>*> leftBranchPath = 
        FindTheDeepestPath(nodeWithTheLongestSeq->left);
    // Find the deepest path of the right child
    std::list<TreeNode<T>*> rightBranchPath =
        FindTheDeepestPath(nodeWithTheLongestSeq->right);
    // Join them to form the sequence
    leftBranchPath.reverse();
    leftBranchPath.push_back(nodeWithTheLongestSeq);
    leftBranchPath.splice(leftBranchPath.end(), rightBranchPath);
    return leftBranchPath;
}

// ********************************************************************************
// Test
// ********************************************************************************
void TestFindTheLongestSeq()
{
    std::vector<double> input = { 1.0, 2.0, 3.0, 4.0, 5.0 };
    TreeNode<double> *root = NULL;
    auto path = FindTheLongestSequence(root);
    assert(path.empty() == true);
    //              3
    //      1               4
    //          2               5
    root = ConstructTreeOnSortedValueBFS(input);
    path = FindTheLongestSequence(root);
    TestResult<double>(path, { 2.0, 1.0, 3.0, 4.0, 5.0 });
    
    input = { 6.0, 7.0, 8.0, 9.0 };
    TreeNode<double> *subTree1 = ConstructTreeOnSortedValueDFS(input);
    input = { 10.0, 11.0, 12.0, 13.0 , 14.0 };
    TreeNode<double> *subTree2 = ConstructTreeRecursive(input);
    input = { 15.0, 16.0, 17.0, 18.0 };
    TreeNode<double> *subTree3 = ConstructTreeOnSortedValueDFS(input);
    input = { 19.0, 20.0, 21.0, 22.0 , 23.0 };
    TreeNode<double> *subTree4 = ConstructTreeRecursive(input);

    TreeNode<double> *leftestNode = FindLeftestNode(root);
    assert(leftestNode != NULL);
    leftestNode->left = subTree1;
    //              3
    //         1        4
    //      8     2        5
    //    6   9
    //  7
    path = FindTheLongestSequence(root);
    TestResult<double>(path, { 7.0, 6.0, 8.0, 1.0, 3.0, 4.0, 5.0 });

    TreeNode<double> *rightestNode = FindRightestNode(root);
    assert(rightestNode != NULL);
    rightestNode->left = subTree2;
    //              3
    //         1        4
    //      8     2         5
    //    6   9          12
    //  7            10      13
    //                    11      14
    path = FindTheLongestSequence(root);
    TestResult<double>(path, { 7.0, 6.0, 8.0, 1.0, 3.0, 4.0, 5.0, 12.0, 10.0, 11.0 });
    
    TreeNode<double> *firstLeftRighestNode = FindRightestNode(root->left);
    assert(firstLeftRighestNode != NULL);
    firstLeftRighestNode->right = subTree3;
    //                    3
    //         1              4
    //      8     2                   5
    //    6   9     17          12
    //  7         15   18   10       13
    //                16          11      14
    path = FindTheLongestSequence(root);
    TestResult<double>(path, { 16.0, 15.0, 17.0, 2.0, 1.0, 3.0,
                               4.0, 5.0, 12.0, 10.0, 11.0 });
    
    TreeNode<double> *firstRightLeftestNodeOfS2 = FindRightestNode(subTree2->left);
    assert(firstRightLeftestNodeOfS2 != NULL);
    firstRightLeftestNodeOfS2->left = subTree4;
    //                    3
    //         1              4
    //      8     2                   5
    //    6   9     17          12
    //  7         15   18   10       13
    //               16           11      14
    //                         21
    //                    19     22
    //                        20       23
    path = FindTheLongestSequence(root);
    TestResult<double>(path, { 16.0, 15.0, 17.0, 2.0, 1.0, 3.0,
        4.0, 5.0, 12.0, 10.0, 11.0, 21.0, 19.0, 20.0 });
    
    TreeNode<double> *rightestNodeOfS2 = FindRightestNode(subTree2);
    assert(rightestNodeOfS2 != NULL);
    for (size_t i = 0; i < 8; ++i) {
        rightestNodeOfS2->right = new TreeNode<double>(100.0 + i);
        rightestNodeOfS2 = rightestNodeOfS2->right;
    }
    TreeNode<double> *rightestNodeOfS4 = FindRightestNode(subTree4);
    assert(rightestNodeOfS4 != NULL);
    for (size_t i = 0; i < 4; ++i) {
        rightestNodeOfS4->left = new TreeNode<double>(50.0 + i);
        rightestNodeOfS4 = rightestNodeOfS4->left;
    }
    //                    3
    //         1              4
    //      8     2                   5
    //    6   9     17          12
    //  7         15   18   10       13
    //                16          11      14
    //                         21              100
    //                    19     22             101
    //                      20       23            102
    //                             50                   103
    //                           51                       104
    //                         52                            105
    //                       53                                106
    //                                                              107
    path = FindTheLongestSequence(root);
    TestResult<double>(path, { 53.0, 52.0, 51.0, 50.0, 23.0, 22.0, 21.0,
        11.0, 10.0, 12.0, 13.0, 14.0, 100.0, 101.0, 102.0, 103.0, 104.0,
        105.0, 106.0, 107.0 });
    // clean up
    DeleteTree_TD(&root);
}


BTS - Find the Deepest Path

1. Problem Description
Find the deepest path from root to its leaf node.

2. Data Structure and Algorithm
Recursive implementation pseudo-code:
    - Get the deepest path of the path of the left child
    - Get the deepest path of the path of the right child
    - Return the deepest path of the left child if it is longer than the right
        * Return the deepest path of the right child if it is longer than the left
    - Special case: Return null path if this node is empty

Loop implementation pseudo-code:
    - Generate a stack map
        * In the order of each level
        * Each level is separated by a NULL pointer
        * Start from the root until the leaf nodes of the lowest level
    - Take the stack map from backwards
        * Take the first node of the lowest level (one of deepest path), Pn
        * Populate out node of each level (use separator of NULL pointer)
        * Find the parent of Pn -- Pn-1
        * Loop the stack map until the root

3. C++ Implementation
Find the TreeNode definition, utility functions here: BST - Find Next In-Order Value and BTS - Print the First and Last Nodes of Each Level.

// ********************************************************************************
// Implementation
// ********************************************************************************
// Recursive implementation
template <typename T>
std::list<TreeNode<T>*> FindTheDeepestPath_R(TreeNode<T> *root)
{
    if (!root) {
        return std::list<TreeNode<T>*>();
    }
    // left path
    std::list<TreeNode<T>*> leftPath = FindTheDeepestPath_R(root->left);
    leftPath.emplace_front(root);
    // right path
    std::list<TreeNode<T>*> rightPath = FindTheDeepestPath_R(root->right);
    rightPath.emplace_front(root);

    return leftPath.size() >= rightPath.size() ? leftPath : rightPath;
}

// Loop implementation
template <typename T>
std::list<TreeNode<T>*> FindTheDeepestPath(TreeNode<T> *root)
{
    if (!root) {
        return std::list<TreeNode<T>*>();
    }

    std::stack<TreeNode<T>*> nodesInLevel;
    std::queue<TreeNode<T>*> searchQueue;
    searchQueue.push(root);
    searchQueue.push(NULL);
    TreeNode<T> * curNode = NULL; // empty node as separater
    while (!searchQueue.empty()) {
        curNode = searchQueue.front();
        searchQueue.pop();
        if (curNode) {
            if (curNode->left) {
                searchQueue.push(curNode->left);
            }
            if (curNode->right) {
                searchQueue.push(curNode->right);
            }
        }
        else if (!searchQueue.empty()) {
            searchQueue.push(curNode);
        }

        nodesInLevel.push(curNode);
    }

    // nodeInLevel should have all nodes
    // each level is separated by empty nodes
    // start with root and end with empty nodes
    std::list<TreeNode<T>*> path;
    std::list<TreeNode<T>*> nodesInSameLevel;
    while (!nodesInLevel.empty()) {
        curNode = nodesInLevel.top();
        nodesInLevel.pop();
        if (!curNode) {
            // found a level
            if (!nodesInSameLevel.empty()) {
                if (path.empty()) {
                    // it might have multipile path
                    // take the first one avaialbel
                    path.push_back(*nodesInSameLevel.rbegin());
                }
                else {
                    auto itEnd = nodesInSameLevel.cend();
                    for (auto it = nodesInSameLevel.cbegin(); it != itEnd; ++it) {
                        if ((*it)->left == path.back() || (*it)->right == path.back()) {
                            // find its parent node
                            path.push_back(*it);
                            break;
                        }
                    }
                }
            }
            nodesInSameLevel.clear();
        }
        else {
            nodesInSameLevel.push_back(curNode);
        }
    }
    path.push_back(curNode);
    path.reverse();  
    return path;
}

// ********************************************************************************
// Test
// ********************************************************************************
void TestFindTheDeepestPath()
{
    std::vector<double> input = { 1.0, 2.0, 3.0, 4.0, 5.0};
    TreeNode<double> *root = NULL;
    auto path = FindTheDeepestPath(root);
    assert(path.empty() == true);
    path = FindTheDeepestPath_R(root);
    assert(path.empty() == true);
    //              3
    //      1               4
    //          2               5
    root = ConstructTreeOnSortedValueBFS(input);
    path = FindTheDeepestPath(root);
    TestResult<double>(path, {3.0, 1.0, 2.0});
    path = FindTheDeepestPath_R(root);
    TestResult<double>(path, { 3.0, 1.0, 2.0 });

    input = { 6.0, 7.0, 8.0, 9.0 };
    TreeNode<double> *subTree1 = ConstructTreeOnSortedValueDFS(input);
    input = { 10.0, 11.0, 12.0, 13.0 , 14.0};
    TreeNode<double> *subTree2 = ConstructTreeRecursive(input);
    input = { 15.0, 16.0, 17.0, 18.0 };
    TreeNode<double> *subTree3 = ConstructTreeOnSortedValueDFS(input);
    input = { 19.0, 20.0, 21.0, 22.0 , 23.0 };
    TreeNode<double> *subTree4 = ConstructTreeRecursive(input);

    TreeNode<double> *leftestNode = FindLeftestNode(root);
    assert(leftestNode != NULL);
    leftestNode->left = subTree1;
    //              3
    //         1        4
    //      8     2        5
    //    6   9
    //  7
    path = FindTheDeepestPath(root);
    TestResult<double>(path, { 3.0, 1.0, 8.0, 6.0, 7.0 });
    path = FindTheDeepestPath_R(root);
    TestResult<double>(path, { 3.0, 1.0, 8.0, 6.0, 7.0 });

    TreeNode<double> *rightestNode = FindRightestNode(root);
    assert(rightestNode != NULL);
    rightestNode->left = subTree2;
    //              3
    //         1        4
    //      8     2         5
    //    6   9          12
    //  7            10      13
    //                  11      14
    path = FindTheDeepestPath(root);
    TestResult<double>(path, { 3.0, 4.0, 5.0, 12.0, 10.0, 11.0 });
    path = FindTheDeepestPath_R(root);
    TestResult<double>(path, { 3.0, 4.0, 5.0, 12.0, 10.0, 11.0 });

    TreeNode<double> *firstLeftRighestNode = FindRightestNode(root->left);
    assert(firstLeftRighestNode != NULL);
    firstLeftRighestNode->right = subTree3;
    //                    3
    //         1              4
    //      8     2                   5
    //    6   9     17          12
    //  7         15   18   10       13
    //               16            11      14
    path = FindTheDeepestPath(root);
    TestResult<double>(path, { 3.0, 1.0, 2.0, 17.0, 15.0, 16.0 });
    path = FindTheDeepestPath_R(root);
    TestResult<double>(path, { 3.0, 1.0, 2.0, 17.0, 15.0, 16.0 });

    TreeNode<double> *firstRightLeftestNodeOfS2 = FindRightestNode(subTree2->left);
    assert(firstRightLeftestNodeOfS2 != NULL);
    firstRightLeftestNodeOfS2->left = subTree4;
    //                    3
    //         1              4
    //      8     2                   5
    //    6   9     17          12
    //  7         15   18   10       13
    //               16           11      14
    //                         21
    //                    19     22
    //                       20       23
    path = FindTheDeepestPath(root);
    TestResult<double>(path, { 3.0, 4.0, 5.0, 12.0, 10.0, 11.0, 21.0, 19.0, 20.0 });
    path = FindTheDeepestPath_R(root);
    TestResult<double>(path, { 3.0, 4.0, 5.0, 12.0, 10.0, 11.0, 21.0, 19.0, 20.0 });

    TreeNode<double> *rightestNodeOfS2 = FindRightestNode(subTree2);
    assert(rightestNodeOfS2 != NULL);
    for (size_t i = 0; i < 8; ++i) {
        rightestNodeOfS2->right = new TreeNode<double>(100.0 + i);
        rightestNodeOfS2 = rightestNodeOfS2->right;
    }
    //                    3
    //         1              4
    //      8     2                   5
    //    6   9     17          12
    //  7         15   18   10     13
    //              16           11      14
    //                        21              100
    //                    19     22            101
    //                      20       23          102
    //                                                  103
    //                                                     104
    //                                                        105
    //                                                          106
    //                                                            107
    // clean up
    path = FindTheDeepestPath(root);
    TestResult<double>(path, { 3.0, 4.0, 5.0, 12.0, 13.0, 14.0, 100.0, 101.0,
                               102.0, 103.0, 104.0, 105.0, 106.0, 107.0 });
    path = FindTheDeepestPath_R(root);
    TestResult<double>(path, { 3.0, 4.0, 5.0, 12.0, 13.0, 14.0, 100.0, 101.0,
                               102.0, 103.0, 104.0, 105.0, 106.0, 107.0 });

   // clean up
    DeleteTree_TD(&root);
}

Monday, 31 October 2016

Data Structure and Algorithm - Detect Missing Numbers in a Large Set

1. Problem Description
This is an Amazon interview question for senior software development engineer from careercup. Here is the original thread,
"
Given an array of 1 billion numbers with just a few 100 missing, find all the missing numbers. you have only enough memory to store only 10k elements
PS October 07, 2016 in United States Report Duplicate | Flag ".


2. Data Structure and Algorithm
Assuming that 1 billion numbers are given starting from 1. The 10K memory is in unit of C++ 32-bit integer.

The first approach:
The idea is very simple. Simply go through the number stream and detect numbers from 1, 2, ..., until 1 billion. 10K integer and use each bit to represent one number, therefore total range is 32K
    - Detect number 1, 2, ..., 32K
    - Detect number 32K+1, 32K+2, ..., 64K
    - Detect number 64K+1, 64K+2, ..., 96K
    - Keep until reach 1 billion

The computation complexity is O(N*(N/M), where
    -  N is the size of number stream and
     - M is the size of memory

The second approach:
Comparing with the first approach, instead of searching numbers one by one in the number stream this approach is to narrow down the range in missing number. Until the range is not greater than the given memory, then linear search the number stream to find out the missing number in the range.
    - Take the mid number and missing number in the first and second half.
         Set MID = (LowerBound+UpperBound)/2
         Start with (1+1000000000)/2 = 500000000
    - Find out count of number <= MID and count of number > MID
        [LowerBound, MID] and [MID+1, UpperBound]
    - Discard the range, if there is no missing number in this range
        If count of number in [LowerBound, MID] is equal to MID - LowerBound + 1
        If count of number in [MID+1, UpperBound] is equal to UpperBound - MID
    - Keep above 3 steps until the range is under the mem size
        In this case until the range is under 32K
    - Go through each range to find out the missing number.

The computation complexity is O(N(P + log(N/M)), where
    - N is the size of number stream
    - M is the size of memory
    - P is the size of missing number
Of course this approach takes extra memory to save the P ranges.

The third approach:
This is an improvement of the second approach. Instead of dividing the stream into 2 parts this approach is to directly divided the number stream into N/(M << 5) parts.
The second approach guarantees that the range is under mem size (32K), however the third approach guarantee that the range is equal to mem size. Therefore reduce the number of range and then reduce the times of going through the number stream linearly to find out the missing number.

The computation complexity is O(N*P), where,
    - N is the size of number stream
    - P is the size of missing number.

Section 3 shows the C++ implementation. And a Python implementation can be found here, Data Structure and Algorithm - Detect Missing Numbers in a Large Set.

3. C++ Implementation
An interface called "NumberStream" is implemented to act as the 1 billion numbers. A derived class called "NumberGenerator" is implemented to return the numbers with range of 1 billion with missing numbers.

A loop and recursive solution are implemented for Approach 2. A loop implementation is implemented for Approach 3. The test is based on a smaller scale, with 1 million number with 100 missing and 10000/100000/1000000 memory available.

// ********************************************************************************
// Implementation
// ********************************************************************************
// NumberStream.h  -- interface to get numbers
#pragma once

#include <unordered_set>

class NumberStream
{
public:
    using MissingNumberCollection = std::unordered_set<size_t>;

    NumberStream(const size_t upperBound, const size_t missing)
        : m_UpperBound(upperBound)
        , m_Missing(missing)
    {}
    virtual ~NumberStream() {}
    virtual bool HasNext() const = 0;
    virtual size_t GetNext() const = 0;
    virtual const MissingNumberCollection& GetMissingCollection() const = 0;
    virtual void ResetToHead() const = 0;

    size_t GetUpperBound() const {
        return m_UpperBound;
    }

    size_t GetMissing() const {
        return m_Missing;
    }

protected:
    const size_t m_UpperBound;
    const size_t m_Missing;
};

// NumberGenerator -- derived class
#pragma once

#include "NumberStream.h"

#include <unordered_set>

#include <ctime>

class NumberGenerator : public NumberStream
{
public:
    NumberGenerator(const size_t upperBound, const size_t missing);
    ~NumberGenerator();

    const MissingNumberCollection& GetMissingCollection() const {
        return m_MissingNumbers;
    }

    size_t GetNext() const;
    bool HasNext() const;
    void ResetToHead() const;
    
private:
    MissingNumberCollection m_MissingNumbers;
    mutable size_t m_MissingNumberDetected;
    mutable size_t m_CurVal;
};

// NumberGenerator.cpp - implementation
#include "stdafx.h"
#include "NumberGenerator.h"

#include <random>
#include <ctime>

NumberGenerator::NumberGenerator(const size_t upperBound, const size_t missing)
    : NumberStream(upperBound, missing)
    , m_MissingNumberDetected(0)
    , m_CurVal(0)
{
    // generate random numbers as the missing number
    std::mt19937_64 mtRand(time(NULL));
    std::uniform_int_distribution<size_t> gen(1, upperBound);
    for (size_t idx = 0; idx < m_Missing;) {
        const size_t temp = gen(mtRand);
        if (m_MissingNumbers.find(temp) != m_MissingNumbers.end()) {
            continue;
        }
        m_MissingNumbers.insert(temp);
        //m_MissingNumbers.insert(idx + 1);
        ++idx;
    }
}

NumberGenerator::~NumberGenerator()
{
}

size_t NumberGenerator::GetNext() const
{
    while (++m_CurVal < m_UpperBound) {
        if (m_MissingNumbers.find(m_CurVal) != m_MissingNumbers.end()) {
            ++m_MissingNumberDetected;
        }
        else {
            break;
        }
    }
    return m_CurVal;
}

bool NumberGenerator::HasNext() const {
    return m_CurVal < m_UpperBound
        && (m_UpperBound - m_CurVal) > (m_Missing - m_MissingNumberDetected);
}

void NumberGenerator::ResetToHead() const
{
    m_CurVal = 0;
    m_MissingNumberDetected = 0;
}

// DetectMissingNumbers_Amazon.cpp
#include "NumberStream.h"
#include "NumberGenerator.h"

#include <map>
#include <queue>
#include <tuple>
#include <unordered_map>
#include <vector>

// Given range is not greater than mem size
// Linearly search number stream to find the missing number
void FindMisingNumber(const NumberStream &ns,
    const std::tuple<size_t, size_t> &missNumbers,
    const size_t MemSizeInInt,
    NumberStream::MissingNumberCollection &results)
{
    size_t lowerBound, upperBound;
    std::tie(lowerBound, upperBound) = missNumbers;
    std::vector<bool> hashTable(upperBound + 1 - lowerBound);
    ns.ResetToHead();
    size_t tmp;
    while (ns.HasNext()) {
        tmp = ns.GetNext();
        if (tmp >= lowerBound && tmp <= upperBound) {
            hashTable[tmp - lowerBound] = true;
        }
    }

    tmp = 0;
    for (auto it = hashTable.cbegin(); it != hashTable.end(); ++it, ++tmp) {
        if (*it == false) {
            results.insert(tmp + lowerBound);
        }
    }
}

// Approach 2: recursive implementation
void DetectMissingNumber_R(const NumberStream &ns,
                        const size_t MemSizeInInt,
                        const size_t lowerBound,
                        const size_t upperBound,
                        NumberStream::MissingNumberCollection &results)
{
    const size_t TotalLinearDetectSize = MemSizeInInt << 5; // int has 32 bits
    if ((upperBound - lowerBound) <= TotalLinearDetectSize) {
        // find the missing number
        FindMisingNumber(ns, 
                        std::make_tuple(lowerBound, upperBound),
                        MemSizeInInt,
                        results);
        return;
    }

    // keep divide into two
    const size_t mid = (lowerBound + upperBound) >> 1;
    ns.ResetToHead();
    size_t countOfLowerHalf = 0; // [lowerBound, mid]
    size_t countOfUpperHalf = 0; // [mid + 1, upperBound]
    size_t temp;
    while (ns.HasNext()) {
        temp = ns.GetNext();
        if (temp >= lowerBound && temp <= upperBound) {
            temp > mid ? ++countOfUpperHalf : ++countOfLowerHalf;
        }
    }

    if (countOfLowerHalf < (mid + 1 - lowerBound)) {
        /* sub problem f(lowerBound,
                         mid,
                         mid + 1 - lowerBound - countOfLowerHalf)
        */
        DetectMissingNumber_R(
            ns,
            MemSizeInInt,
            lowerBound,
            mid,
            results);
    }
    if (countOfUpperHalf < (upperBound - mid)) {
        /* sub problem f(mid + 1,
                         upperBound,
                         upperBound - mid - countOfUpperHalf)
        */
        DetectMissingNumber_R(
            ns,
            MemSizeInInt,
            mid + 1,
            upperBound,
            results);
    }
}

// Approach 2: recursive implementation entry-point
NumberStream::MissingNumberCollection DetectMissingNumber_R(
    const NumberStream & ns,
    const size_t MemSizeInInt)
{
    const size_t missing = ns.GetMissing();
    const size_t upperBound = ns.GetUpperBound();
    NumberStream::MissingNumberCollection results;
    if (missing) {
        DetectMissingNumber_R(ns, MemSizeInInt, 1, upperBound, results);
    }
    return results;
}

// Approach 2: loop implementation
NumberStream::MissingNumberCollection DetectMissingNumber(
                                                          const NumberStream & ns,
                                                          const size_t MemSizeInInt)
{
    const size_t Missing = ns.GetMissing();
    const size_t UpperBound = ns.GetUpperBound();
    const size_t TotalLinearDetectSize = MemSizeInInt << 5; // int has 32 bits
    NumberStream::MissingNumberCollection results;
    if (!Missing) {
        return results;
    }

    using SearchQueue = std::queue<std::tuple<size_t, size_t>>;
    SearchQueue sq;
    // lowerBound and upperBound inclusive [lowerBound, upperBound]
    sq.push(std::make_tuple(1, UpperBound));
    size_t lowerBound, upperBound;
    size_t tryCount = 0;
    while (!sq.empty()) {
        ++tryCount;
        std::tie(lowerBound, upperBound) = sq.front();
        if ((upperBound - lowerBound) <= TotalLinearDetectSize) {
            // find the missing number for range under mem size
            FindMisingNumber(ns, sq.front(), MemSizeInInt, results);
        }
        else {
            // keep divide into two
            const size_t mid = (lowerBound + upperBound) >> 1;
            ns.ResetToHead();
            size_t countOfLowerHalf = 0; // [lowerBound, mid]
            size_t countOfUpperHalf = 0; // [mid + 1, upperBound]
            size_t temp;
            while (ns.HasNext()) {
                temp = ns.GetNext();
                if (temp >= lowerBound && temp <= upperBound) {
                    temp > mid ? ++countOfUpperHalf : ++countOfLowerHalf;
                }
            }
                
            if (countOfLowerHalf < (mid + 1 - lowerBound)) {
                // push [lowerBound, mid]
                sq.push(std::make_tuple(lowerBound,
                    mid));
            }
            if (countOfUpperHalf < (upperBound - mid)) {
                // push [mid+1, upperBound]
                sq.push(std::make_tuple(mid + 1,
                    upperBound));
            }
        }
        sq.pop();
    }

    return results;
}

// Approach 3: divide into rang (size equal to Mem << 5)
void DivideRange(const NumberStream &ns,
    const size_t MemSizeInInt,
    const size_t lowerBound,
    const size_t upperBound,
    std::vector<std::tuple<size_t, size_t>> &divides)
{
    const size_t TotalLinearDetectSize = MemSizeInInt << 5; // int has 32 bits    
    const size_t divisions = (upperBound - lowerBound + 1) / TotalLinearDetectSize;
    std::map<size_t, size_t> ladders;
    ladders.insert(std::make_pair(lowerBound, 0));
    for (size_t idx = 1; idx < divisions; ++idx) {
        ladders.insert(
            std::make_pair(lowerBound + TotalLinearDetectSize * idx, 0));
    }
    if (divisions * TotalLinearDetectSize + lowerBound < upperBound) {
        ladders.insert(
            std::make_pair(lowerBound + TotalLinearDetectSize * divisions, 0));
    }
    
    ns.ResetToHead();
    size_t temp;
    while (ns.HasNext()) {
        temp = ns.GetNext();
        if (temp < lowerBound || temp > upperBound) {
            continue;
        }
        auto er = ladders.equal_range(temp);
        er.first != er.second ?
            ++(er.first->second) : ++((--er.first)->second);
    }

    auto it = ladders.cbegin();
    auto itEnd = ladders.cend();
    auto itNext = ladders.cbegin();
    ++itNext;
    size_t totalMissing = 0;
    for (; itNext != itEnd; ++it, ++itNext) {
        if (it->second < (itNext->first - it->first)) {
            divides.push_back(
                std::make_tuple(it->first, itNext->first - 1));
        }
        totalMissing += itNext->first - it->first - it->second;
    }
    auto itR = ladders.rbegin();
    if (itR->second < (upperBound - itR->first + 1)) {
        divides.push_back(
            std::make_tuple(itR->first, upperBound));
    }
    totalMissing += upperBound - itR->first + 1 - itR->second;
}

// Approach 3 - entry point
NumberStream::MissingNumberCollection DetectMissingNumber_Div(const NumberStream & ns,
    const size_t MemSizeInInt)
{
    const size_t Missing = ns.GetMissing();
    const size_t UpperBound = ns.GetUpperBound();
    const size_t TotalLinearDetectSize = MemSizeInInt << 5; // int has 32 bits
    NumberStream::MissingNumberCollection results;
    if (!Missing) {
        return results;
    }

    using SearchQueue = std::queue<std::tuple<size_t, size_t>>;
    SearchQueue sq;
    // lowerBound and upperBound inclusive [lowerBound, upperBound]
    sq.push(std::make_tuple(1, UpperBound));
    size_t lowerBound, upperBound;
    size_t tryCount = 0;
    while (!sq.empty()) {
        ++tryCount;
        std::tie(lowerBound, upperBound) = sq.front();
        if ((upperBound - lowerBound) <= TotalLinearDetectSize) {
            // find the missing number
            FindMisingNumber(ns, sq.front(), MemSizeInInt, results);
        }
        else {
            // keep divide into MemSizeInInt
            std::vector<std::tuple<size_t, size_t>> divides;
            DivideRange(ns, MemSizeInInt, lowerBound, upperBound, divides);
            for (auto it = divides.cbegin(); it != divides.end(); ++it) {
                std::tie(lowerBound, upperBound) = *it;
                sq.push(std::make_tuple(lowerBound, upperBound));
            }
        }
        sq.pop();
    }

    return results;
}

// ********************************************************************************
// Test
// ********************************************************************************
#include <cassert>
void Test_DetectMissingNumbers()
{
    NumberGenerator ns(1000000, 100);
    NumberGenerator::MissingNumberCollection results;

    results = DetectMissingNumber(ns, 10000);
    assert(results == ns.GetMissingCollection());

    results = DetectMissingNumber(ns, 100000);
    assert(results == ns.GetMissingCollection());

    results = DetectMissingNumber(ns, 1000000);
    assert(results == ns.GetMissingCollection());
}

void Test_DetectMissingNumbers_R()
{
    NumberGenerator ns(1000000, 100);
    NumberGenerator::MissingNumberCollection results;

    results = DetectMissingNumber_R(ns, 10000);
    assert(results == ns.GetMissingCollection());

    results = DetectMissingNumber_R(ns, 100000);
    assert(results == ns.GetMissingCollection());

    results = DetectMissingNumber_R(ns, 1000000);
    assert(results == ns.GetMissingCollection());
}

void Test_DetectMissingNumbers_Div()
{
    NumberGenerator ns(1000000, 100);
    NumberGenerator::MissingNumberCollection results;

    results = DetectMissingNumber_Div(ns, 10000);
    assert(results == ns.GetMissingCollection());

    results = DetectMissingNumber_Div(ns, 100000);
    assert(results == ns.GetMissingCollection());

    results = DetectMissingNumber_Div(ns, 1000000);
    assert(results == ns.GetMissingCollection());
}

Thursday, 29 September 2016

Data Structure and Algorithm - Detect Word Permutation

1. Problem Description
This is a Microsoft interview question for software develop engineer from careercup. Here is the original thread,
"
A list of words is given and a bigger string given how can we find whether the string is a permutation of the smaller strings.
eg- s= badactorgoodacting dict[]={'actor','bad','act','good'] FALSE
eg- s= badactorgoodacting dict[]={'actor','bad','acting','good'] TRUE
The smaller words themselves don't need to be permuted. The question is whether we can find a ordering of the smaller strings such that if concatenated in that order it gives the larger string
One more constraint - some words from dict[] may also be left over unused
novicedhunnu September 15, 2016 in India Report Duplicate | Flag 
".

2. Data Structure and Algorithm
The solution is based on Trie data structure and algorithm. Details refer to my C++ post - Data Structure and Algorithm - Trie.

Pseudo-code:
    1. Save all word into Trie strucutre
    2. Initialize an empty queue/stack (breadth/depth-first search)
    3. Push starting index 0 into queue/stack (search from start of input string in Trie)
    4. Do until queue/stack is empty
        * Take the top value of queue/stack as idx and pop it out
        * Search input string starting from idx in Trie strucuture
            - Push ending index of any valid word starting from idx into queue/stack
            - Return TRUE, if 
                ~ reaching the end of input and 
                ~ having a valid word - [idx, endOfInputString)  in Trie
    5. Return FALSE otherwise

I have implemented this solution in both C++ and Python. The C++ implementation is provided in next section and the Python solution can be found on my Python post, Data Structure and Algorithm - Detect Word Permutation. Trie is a node-based data structure and algorithm. Therefore a recursive and non-recursive solution are implemented.

3. C++ Implementation
// ********************************************************************************
// Implementation
// ********************************************************************************
// PermutationDetector.h
#pragma once

#include <queue>
#include <string>
#include <vector>

struct DictionaryTrie;

class PermutationDetector
{
public:
    PermutationDetector();
    PermutationDetector(const std::vector<std::string> &words);
    ~PermutationDetector();

    void ConstructDict(const std::vector<std::string> &words, const bool appendMode);

    // recursive implementation
    bool IsPermutation_R(const std::string &word) const;
    // non-recursive implementation
    bool IsPermutation(const std::string &word) const;
private:
    bool IsPermutation_R(const std::string &word, std::queue<size_t> &nextSearch) const;
    DictionaryTrie *m_DictRoot;

};

// PermutationDetector.cpp
#include "PermutationDetector.h"
#include "DictionaryTrie.h"

#include <queue>

#include <cassert>

PermutationDetector::PermutationDetector()
    : m_DictRoot(nullptr)
{
}

PermutationDetector::PermutationDetector(const std::vector<std::string> &words)
    : m_DictRoot(nullptr)
{
    ConstructDict(words, false);
}

PermutationDetector::~PermutationDetector()
{
    if (m_DictRoot) {
        DestoryDictionaryTrie(m_DictRoot);
        m_DictRoot = nullptr;
    }
}

void PermutationDetector::ConstructDict(const std::vector<std::string> &words,
                                        const bool appendMode)
{
    if (!appendMode && m_DictRoot) {
        DestoryDictionaryTrie(m_DictRoot);
        m_DictRoot = nullptr;
    }
    if (!m_DictRoot) {
        m_DictRoot = new DictionaryTrie();
    }

    assert(m_DictRoot != nullptr);

    auto&& itEnd = words.rend();
    for (auto&& it = words.rbegin(); it != itEnd; ++it) {
        AddWord(m_DictRoot, it->c_str());
    }
}

bool PermutationDetector::IsPermutation_R(const std::string &word,
                                        std::queue<size_t> &nextSearch) const
{
    if (nextSearch.empty()) {
        return false;
    }

    const size_t startSearchIndex = nextSearch.front();
    const char *startSearchPtr = word.c_str() + startSearchIndex;
    nextSearch.pop();

    DictionaryTrie *tempPtr = m_DictRoot;
    size_t searchedLen = 1;
    while (*startSearchPtr != '\0' && tempPtr) {
        tempPtr = tempPtr->children[*startSearchPtr - 'a'];
        if (tempPtr && tempPtr->str) {
            // found a valid word/permutation and candidate for next search
            nextSearch.push(startSearchIndex + searchedLen);
        }
        ++searchedLen;
        ++startSearchPtr;
    }

    if (*startSearchPtr == '\0' && tempPtr && tempPtr->str) {
        return true;
    }

    // tail recursion
    return IsPermutation_R(word, nextSearch);
}

bool PermutationDetector::IsPermutation_R(const std::string &word) const
{
    if (word.empty()) {
        return false;
    }

    // switch queue/stack for breadth/depth-first search
    std::queue<size_t> nextSearch;
    nextSearch.push(0);
    return IsPermutation_R(word, nextSearch);
}

bool PermutationDetector::IsPermutation(const std::string &word) const
{
    if (word.empty()) {
        return false;
    }

    // switch queue/stack for breadth/depth-first search
    std::queue<size_t> nextSearch;
    nextSearch.push(0);
    while (!nextSearch.empty()) {
        size_t startSearchIndex = nextSearch.front();
        nextSearch.pop();
        const char *searchStartPtr = word.c_str() + startSearchIndex;
        DictionaryTrie *tempPtr = m_DictRoot;
        ++startSearchIndex;
        while (*searchStartPtr != '\0' && tempPtr) {
            tempPtr = tempPtr->children[*searchStartPtr - 'a'];
            if (tempPtr && tempPtr->str) {
                // found a valid word/permutation and candidate for next search
                nextSearch.push(startSearchIndex);
            }
            ++startSearchIndex;
            ++searchStartPtr;
        }
     
        if (*searchStartPtr == '\0' && tempPtr && tempPtr->str) {
            return true;
        }
    }

    return false;
}

// ********************************************************************************
// Test
// ********************************************************************************
void Test_PermutationDetector()
{
    PermutationDetector detector({"actor", "bad", "act", "good"});
    assert(detector.IsPermutation("badactorgoodacting") == false);
    assert(detector.IsPermutation_R("badactorgoodacting") == false);

    detector.ConstructDict({"actor", "bad", "acting", "good"}, false);
    assert(detector.IsPermutation("badactorgoodacting") == true);
    assert(detector.IsPermutation_R("badactorgoodacting") == true);
}