Thursday 21 January 2016

Dynamic Programming - Find If A Given Tree Has Duplicate Sub-Tree (Part II)

1. Problem Description
This blog entry is to present an alternative solution to the one proposed on Dynamic Programming - Find If A Given Tree Has Duplicate Sub-Tree (Part I). Please refer to it for the original problem.

The original idea is coming from one of my colleagues. When we were in the afternoon coffee, I mentioned what I usually do on the train and started to talk about this problem. My colleagues showed interest and one of them presented the idea of calculating the hash value of each node. And I further refined the solution and afterwards we had another brief discussion. (Just cannot image how smart they are.)

2. Algorithm
The idea is to calculate the hash code for each node and this hash value should have all information of its sub-tree. If there are duplicate sub-tree existing in the given tree, then two nodes must have the same hash value. But the same hash value does not guarantee the same sub-tree. It seems necessary to keep further node-by-node checking against those nodes calculated with the same hash value.

Pseudo-code:
1. For each node calculate the hash code
    - HashKey = 1009*currentNodeValue + 2909*HashValueOfLeft + 4021*HashValueOfRight
        * HashValueOfLeft = 0 if left child is not present
        * HashValueOfRight = 0 if right child is not present
        * Calculate the hash code based on HashKey
    - Use post-order traverse
2. Save the result into a hash map with <HashKey, HashSetOfTreeNode>
3. Go through the above hash map
    - Skip any sizeof(HashSetOfTreeNode) == 1
    - Otherwise check node-by-node of the entries in HashSetOfTreeNode
        * Skip entries that have less than 2 children
        * Return true if any two have exactly same nodes
4. Otherwise return false  

Prime number:
The reason why multiple different prime number (1009/2909/4021) with its value, HashValueOfLeft and HashValueOfRight is because it helps to remove the false entries. Example,
               1       |                                      1              |
          2            |                                           2         |
    3                  | Tree1                                     3     | Tree2

Tree1 has only left branch and Tree2 has only the right branch. Multiple left and right with different value will distinguish the left branch and right branch. This means that Tree1 and Tree2 are expected to have different hash value. (If multiply with the same value, the above example will expect Tree1 and Tree2 to have same hash value.) As for why choose 1009/2909/4021, it is arbitrary. The prime number seems well suited when hash function is involved.
But this does not remove all the possibility of different nodes but with same hash code. Just think of the equation of,
    1009*X+2909*Y+4021*Z = A, where X, Y and Z are variables and A is given number.

If A can be divided by (1009*2909*4021), there will be multiple solutions for X, Y and Z 
But 1009*2909*4021 is a really large number and the number of A (divisible by 1009*2909*4021) is relatively low. Choosing reasonable large prime numbers is expected to reduce the hash code collision.

3. C++ Implementation
// ********************************************************************************
// Implementation
// ********************************************************************************
template <typename T>
size_t CalculateHashMapOfTree_R(TreeNode<T>* root,
                                std::unordered_map<size_t,
                                std::unordered_set<TreeNode<T>*>> &hm)
{
    const size_t PrimeOfCurNode = 1009;
    const size_t PrimeOfLeftNode = 2909;
    const size_t PrimeOfRightNode = 4021;
    if (!root) {
        return 0;
    }

    // post-order traverse
    // Step 1 and 2
    size_t hashKey = *root->data * PrimeOfCurNode;
    if (root->left) {
        hashKey += PrimeOfLeftNode * CalculateHashMapOfTree_R(root->left, hm);
    }
    if (root->right) {
        hashKey += PrimeOfRightNode * CalculateHashMapOfTree_R(root->right, hm);
    }

    size_t hashValue = std::hash<size_t>{}(hashKey);
    auto foundIt = hm.find(hashValue);
    if (foundIt == hm.end()) {
        std::unordered_set<TreeNode<T>*> entry;
        entry.insert(root);
        hm.insert(std::make_pair(hashValue, entry));
    }
    else {
        std::unordered_set<TreeNode<T>*>& refEntry = foundIt->second;
        if (refEntry.find(root) == refEntry.end()) {
            refEntry.insert(root);
        }
    }
    return hashValue;
}

template <typename T>
bool FindDuplicateTreeInEntries(const std::unordered_set<TreeNode<T>*> entries,
                                TreeNode<T>* &entry1,
                                TreeNode<T>* &entry2)
{
    auto itEnd = entries.end();
    for (auto it = entries.begin(); it != itEnd; ++it) {
        auto itInner = it;
        ++itInner;
        for (; itInner != itEnd; ++itInner) {
            // enforce the rule of sub-tree
            if (MoreThanNNodes_R(*it, 2) && 
                MoreThanNNodes_R(*itInner, 2) && 
                CompareTwoTrees_R(*it, *itInner)) {
                entry1 = *it;
                entry2 = *itInner;
                return true;
            }
        }
    }

    return false;
}

template <typename T>
bool HaveDuplicateSubTree2_R(TreeNode<T> *root, TreeNode<T>* &entry1, TreeNode<T>* &entry2)
{
    std::unordered_map<size_t, std::unordered_set<TreeNode<T>*>> nodeHM;
    // Step 1 and 2: calculate the hash value and save it
    CalculateHashMapOfTree_R(root, nodeHM);
    auto itEnd = nodeHM.end();
    // Step 3: node-by-node check for those with same hash value
    for (auto it = nodeHM.begin(); it != itEnd; ++it) {
        std::unordered_set<TreeNode<T>*>& entryRef = it->second;
        if (FindDuplicateTreeInEntries(entryRef, entry1, entry2)) {
            return true;
        }
    }
    // Step 4
    return false;
}

// ********************************************************************************
// Test
// ********************************************************************************
void TestFindDuplicateSubTree2()
{
    {
        TreeNode<int>* root = nullptr;
        TreeNode<int>* entry1;
        TreeNode<int>* entry2;
        assert(HaveDuplicateSubTree2_R(root, entry1, entry2) == false);
    }

    {
        const std::vector<int> testInput = { 1, 2 };
        auto root = ConstructTreeRecursive(testInput);
        assert(root != nullptr);
        assert(GetTreeSize_R(root) == testInput.size());
        TreeNode<int>* entry1;
        TreeNode<int>* entry2;
        assert(HaveDuplicateSubTree2_R(root, entry1, entry2) == false);
        DeleteTree(&root);
    }

    {
        const std::vector<int> testInput = { 1, 2, 3 };
        auto root = ConstructTreeRecursive(testInput);
        assert(root != nullptr);
        assert(GetTreeSize_R(root) == testInput.size());
        TreeNode<int>* entry1;
        TreeNode<int>* entry2;
        assert(HaveDuplicateSubTree2_R(root, entry1, entry2) == false);
        DeleteTree(&root);
    }

    {
        const std::vector<int> testInput = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        auto root = ConstructTreeRecursive(testInput);
        assert(root != nullptr);
        assert(GetTreeSize_R(root) == testInput.size());
        TreeNode<int>* entry1;
        TreeNode<int>* entry2;
        assert(HaveDuplicateSubTree2_R(root, entry1, entry2) == false);
        DeleteTree(&root);
    }

    {
        const std::vector<int> testInput = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        auto root = ConstructTreeRecursive(testInput);
        assert(root != nullptr);
        assert(GetTreeSize_R(root) == testInput.size());
        TreeNode<int>* entry1;
        TreeNode<int>* entry2;
        assert(HaveDuplicateSubTree2_R(root, entry1, entry2) == false);

        const std::vector<int> subTreeInput = { 100, 101 };
        auto subTree1 = ConstructTreeRecursive(subTreeInput);
        auto subTree2 = ConstructTreeRecursive(subTreeInput);
        assert(GetTreeSize_R(subTree1) == subTreeInput.size());
        assert(GetTreeSize_R(subTree2) == subTreeInput.size());
        assert(CheckTwoTreesWithSameValues(subTree1, subTree2) == true);

        auto leftestNode = FindLeftestNode(root);
        auto rightestNode = FindRightestNode(root);
        assert(leftestNode != nullptr);
        assert(leftestNode->left == nullptr);
        assert(rightestNode != nullptr);
        assert(rightestNode->right == nullptr);
        assert(*leftestNode->data == 1);
        assert(*rightestNode->data == 10);

        leftestNode->left = subTree1;
        rightestNode->right = subTree2;
        assert(HaveDuplicateSubTree2_R(root, entry1, entry2) == false);

        DeleteTree(&root);
    }

    {
        const std::vector<int> testInput = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        auto root = ConstructTreeRecursive(testInput);
        assert(root != nullptr);
        assert(GetTreeSize_R(root) == testInput.size());
        TreeNode<int>* entry1;
        TreeNode<int>* entry2;
        assert(HaveDuplicateSubTree2_R(root, entry1, entry2) == false);

        const std::vector<int> subTreeInput = { 100, 101, 102 };
        auto subTree1 = ConstructTreeRecursive(subTreeInput);
        auto subTree2 = ConstructTreeRecursive(subTreeInput);
        assert(GetTreeSize_R(subTree1) == subTreeInput.size());
        assert(GetTreeSize_R(subTree2) == subTreeInput.size());
        assert(CheckTwoTreesWithSameValues(subTree1, subTree2) == true);

        auto leftestNode = FindLeftestNode(root);
        auto rightestNode = FindRightestNode(root);
        assert(leftestNode != nullptr);
        assert(leftestNode->left == nullptr);
        assert(rightestNode != nullptr);
        assert(rightestNode->right == nullptr);
        assert(*leftestNode->data == 1);
        assert(*rightestNode->data == 10);

        leftestNode->left = subTree1;
        rightestNode->right = subTree2;
        assert(HaveDuplicateSubTree2_R(root, entry1, entry2) == true);
        assert(entry1 != nullptr);
        assert(entry2 != nullptr);
        assert(CheckTwoTreesWithSameValues(entry1, subTree1));
        assert(CheckTwoTreesWithSameValues(entry2, subTree2));
        assert(CheckTwoTreesWithSameValues(entry1, entry2));

        DeleteTree(&root);
    }

    {
        const std::vector<int> testInput = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        auto root = ConstructTreeRecursive(testInput);
        assert(root != nullptr);
        assert(GetTreeSize_R(root) == testInput.size());
        TreeNode<int>* entry1;
        TreeNode<int>* entry2;
        assert(HaveDuplicateSubTree2_R(root, entry1, entry2) == false);

        const std::vector<int> subTreeInput = { 100, 101, 102 };
        auto subTree1 = ConstructTreeRecursive(subTreeInput);
        auto subTree2 = ConstructTreeRecursive(subTreeInput);
        assert(GetTreeSize_R(subTree1) == subTreeInput.size());
        assert(GetTreeSize_R(subTree2) == subTreeInput.size());
        assert(CheckTwoTreesWithSameValues(subTree1, subTree2) == true);

        auto leftestNode = FindLeftestNode(root->left);
        auto rightestNode = FindRightestNode(root->left);
        assert(leftestNode != nullptr);
        assert(leftestNode->left == nullptr);
        assert(rightestNode != nullptr);
        assert(rightestNode->right == nullptr);
        assert(*leftestNode->data == 1);
        assert(*rightestNode->data == 4);

        leftestNode->left = subTree1;
        rightestNode->right = subTree2;
        assert(HaveDuplicateSubTree2_R(root, entry1, entry2) == true);
        assert(entry1 != nullptr);
        assert(entry2 != nullptr);
        assert(CheckTwoTreesWithSameValues(entry1, subTree1));
        assert(CheckTwoTreesWithSameValues(entry2, subTree2));
        assert(CheckTwoTreesWithSameValues(entry1, entry2));

        DeleteTree(&root);
    }

    {
        const std::vector<int> testInput = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        auto root = ConstructTreeRecursive(testInput);
        assert(root != nullptr);
        assert(GetTreeSize_R(root) == testInput.size());
        TreeNode<int>* entry1;
        TreeNode<int>* entry2;
        assert(HaveDuplicateSubTree2_R(root, entry1, entry2) == false);

        const std::vector<int> subTreeInput = { 100, 101, 102 };
        auto subTree1 = ConstructTreeRecursive(subTreeInput);
        auto subTree2 = ConstructTreeRecursive(subTreeInput);
        assert(GetTreeSize_R(subTree1) == subTreeInput.size());
        assert(GetTreeSize_R(subTree2) == subTreeInput.size());
        assert(CheckTwoTreesWithSameValues(subTree1, subTree2) == true);

        auto leftestNode = FindLeftestNode(root->right);
        auto rightestNode = FindRightestNode(root->right);
        assert(leftestNode != nullptr);
        assert(leftestNode->left == nullptr);
        assert(rightestNode != nullptr);
        assert(rightestNode->right == nullptr);
        assert(*leftestNode->data == 6);
        assert(*rightestNode->data == 10);

        leftestNode->left = subTree1;
        rightestNode->right = subTree2;
        assert(HaveDuplicateSubTree2_R(root, entry1, entry2) == true);
        assert(entry1 != nullptr);
        assert(entry2 != nullptr);
        assert(CheckTwoTreesWithSameValues(entry1, subTree1));
        assert(CheckTwoTreesWithSameValues(entry2, subTree2));
        assert(CheckTwoTreesWithSameValues(entry1, entry2));

        DeleteTree(&root);
    }

    {
        const std::vector<int> testInput = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        auto root = ConstructTreeRecursive(testInput);
        assert(root != nullptr);
        assert(GetTreeSize_R(root) == testInput.size());
        TreeNode<int>* entry1 = nullptr;
        TreeNode<int>* entry2 = nullptr;
        assert(HaveDuplicateSubTree2_R(root, entry1, entry2) == false);

        const std::vector<int> subTreeInput1 = { 100, 101, 102 };
        const std::vector<int> subTreeInput2 = { 102, 101, 100 };
        auto subTree1 = ConstructTreeRecursive(subTreeInput1);
        auto subTree2 = ConstructTreeRecursive(subTreeInput2);
        assert(GetTreeSize_R(subTree1) == subTreeInput1.size());
        assert(GetTreeSize_R(subTree2) == subTreeInput2.size());
        assert(CheckTwoTreesWithSameValues(subTree1, subTree2) == true);

        auto leftestNode = FindLeftestNode(root->right);
        auto rightestNode = FindRightestNode(root->right);
        assert(leftestNode != nullptr);
        assert(leftestNode->left == nullptr);
        assert(rightestNode != nullptr);
        assert(rightestNode->right == nullptr);
        assert(*leftestNode->data == 6);
        assert(*rightestNode->data == 10);

        leftestNode->left = subTree1;
        rightestNode->right = subTree2;
        assert(HaveDuplicateSubTree2_R(root, entry1, entry2) == false);
        assert(entry1 == nullptr);
        assert(entry2 == nullptr);
        
        DeleteTree(&root);
    }

    {
        const std::vector<int> testInput = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        auto root = ConstructTreeRecursive(testInput);
        assert(root != nullptr);
        assert(GetTreeSize_R(root) == testInput.size());
        TreeNode<int>* entry1 = nullptr;
        TreeNode<int>* entry2 = nullptr;
        assert(HaveDuplicateSubTree2_R(root, entry1, entry2) == false);

        TreeNode<int>* subTree1 = new TreeNode<int>(1000);
        subTree1->left = new TreeNode<int>(1001);
        subTree1->left->left = new TreeNode<int>(1002);
        TreeNode<int>* subTree2 = new TreeNode<int>(1000);
        subTree2->right = new TreeNode<int>(1001);
        subTree2->right->right = new TreeNode<int>(1002);
        assert(GetTreeSize_R(subTree1) == 3);
        assert(GetTreeSize_R(subTree2) == 3);
        assert(CheckTwoTreesWithSameValues(subTree1, subTree2) == true);

        auto leftestNode = FindLeftestNode(root->right);
        auto rightestNode = FindRightestNode(root->right);
        assert(leftestNode != nullptr);
        assert(leftestNode->left == nullptr);
        assert(rightestNode != nullptr);
        assert(rightestNode->right == nullptr);
        assert(*leftestNode->data == 6);
        assert(*rightestNode->data == 10);

        leftestNode->left = subTree1;
        rightestNode->right = subTree2;
        assert(HaveDuplicateSubTree2_R(root, entry1, entry2) == false);
        assert(entry1 == nullptr);
        assert(entry2 == nullptr);

        DeleteTree(&root);
    }
    {
        const std::vector<int> testInput = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        auto root = ConstructTreeRecursive(testInput);
        assert(root != nullptr);
        assert(GetTreeSize_R(root) == testInput.size());
        TreeNode<int>* entry1 = nullptr;
        TreeNode<int>* entry2 = nullptr;
        assert(HaveDuplicateSubTree2_R(root, entry1, entry2) == false);

        TreeNode<int>* subTree1 = new TreeNode<int>(1000);
        subTree1->left = new TreeNode<int>(1001);
        subTree1->right = new TreeNode<int>(1002);
        TreeNode<int>* subTree2 = new TreeNode<int>(1000);
        subTree2->right = new TreeNode<int>(1001);
        subTree2->right->right = new TreeNode<int>(1002);
        assert(GetTreeSize_R(subTree1) == 3);
        assert(GetTreeSize_R(subTree2) == 3);
        assert(CheckTwoTreesWithSameValues(subTree1, subTree2) == true);

        auto leftestNode = FindLeftestNode(root->right);
        auto rightestNode = FindRightestNode(root->right);
        assert(leftestNode != nullptr);
        assert(leftestNode->left == nullptr);
        assert(rightestNode != nullptr);
        assert(rightestNode->right == nullptr);
        assert(*leftestNode->data == 6);
        assert(*rightestNode->data == 10);

        leftestNode->left = subTree1;
        rightestNode->right = subTree2;
        assert(HaveDuplicateSubTree2_R(root, entry1, entry2) == false);
        assert(entry1 == nullptr);
        assert(entry2 == nullptr);

        DeleteTree(&root);
    }

    {
        const std::vector<int> testInput = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        auto root = ConstructTreeRecursive(testInput);
        assert(root != nullptr);
        assert(GetTreeSize_R(root) == testInput.size());
        TreeNode<int>* entry1 = nullptr;
        TreeNode<int>* entry2 = nullptr;
        assert(HaveDuplicateSubTree2_R(root, entry1, entry2) == false);

        TreeNode<int>* subTree1 = new TreeNode<int>(1000);
        subTree1->right = new TreeNode<int>(1002);
        subTree1->right->right = new TreeNode<int>(1001);
        TreeNode<int>* subTree2 = new TreeNode<int>(1000);
        subTree2->right = new TreeNode<int>(1001);
        subTree2->right->right = new TreeNode<int>(1002);
        assert(GetTreeSize_R(subTree1) == 3);
        assert(GetTreeSize_R(subTree2) == 3);
        assert(CheckTwoTreesWithSameValues(subTree1, subTree2) == true);

        auto leftestNode = FindLeftestNode(root->right);
        auto rightestNode = FindRightestNode(root->right);
        assert(leftestNode != nullptr);
        assert(leftestNode->left == nullptr);
        assert(rightestNode != nullptr);
        assert(rightestNode->right == nullptr);
        assert(*leftestNode->data == 6);
        assert(*rightestNode->data == 10);

        leftestNode->left = subTree1;
        rightestNode->right = subTree2;
        assert(HaveDuplicateSubTree2_R(root, entry1, entry2) == false);
        assert(entry1 == nullptr);
        assert(entry2 == nullptr);

        DeleteTree(&root);
    }
}

No comments:

Post a Comment