Monday 2 February 2015

Dynamic Programming - List All Sub-sets with Sum Equal to Zero II (Amazon Interview Question)

1. Motivation
As I discussed in my last blog, Dynamic Programming - List All Sub-sets with Sum Equal to Zero (Amazon Interview Question), the solution has exponential computation complexity and linear space complexity given to the size of the set. It is absolutely fine for the space complexity but still it is not acceptable for computation complexity. When the size of the set is over 30, the computation will be getting too much to compute.

This blog is to examine if there is any heuristic rules to guide us to search the whole solution space. For instance there is no need to go through all the exponential combinations if all the elements of the set are either negative or positive.

2. Problem Analysis
Let's say that the given set S, {X1, X2, ..., Xm}. the problem is to find all the sub-sets of S that has the sum of all elements equal to zero.
Problem statement:
    lambda1*X1 + lambda2*X2 + ... + lambda(m)*Xm = 0, where
        - X1, X2, ..., Xm are known from given set S
        - lambda1, lambda2, lambda(m) are unknown. they have two option, either 0 or 1
           lambad(i) = 0 or 1, i is within [1, M]

Because of lack of knowledge of the elements of given set, the solution on Dynamic Programming - List All Sub-sets with Sum Equal to Zero (Amazon Interview Question) can't predict which direction the sum of sub-set so far will go higher or lower if taking the next element in. (The next element could be negative, zero or positive. It means that taking this element into the sub-set could cause the sum to become less, unchanged or higher.) Therefore we have to go through all the rest of possible combinations of sub-set to find out if possible solution/sub-set exists.

Here I would like to do something about the given set in order that we can predict which direction the sum of the sub-set will go if taking the next element in. It could be potentially valuable to guide the search. What I going to do is to sort the given set and split the ordered set into 3 portions as,
    - Set Sn, with all negative elements and sorted, {X1, X2, ..., Xn}. The size of set is N.
    - Set Sz, with all zero elements {Z1, Z2, ..., Zo}, The size of set is O,
      and  Zi = 0, i within [1, O]
    - Set Sp with all positive elements and sorted, {Y1, Y2, ...., Yp}. The size of set is P
    - N, O and P are within [0, M] and have to meet N + O + P = M

Now let's take another look the problem statement. For elements in Sz we just don't care about them because they can either appear or not appear in the sub-set and their appearance does not affect the sum of the sub-sets. Now let's just focus on the non-zero elements, simply the negative and positive sets, Sn and Sp. In order to make the sum of elements of the sub-set equal to zero the sub-set must contain negative and positive elements. Now we can restate the problem.
Revised problem statement:
    lambda1*X1 + lambda*X2 + ... + lambda(n)*Xn = theta1*Y1 + ... theta2*Y2 + ... + theta(p)* Yp
        - where lambda(i) = 0 or -1 and Sigma(lambda(i)) != 0, where i within [1, N]
        - where theta(j) = 0 or 1 and Sigma(theta(j)) != 0, where j within [1, P]

What we do now is to take a sub-set of Sn (with certain combination of lambda) with sum as SUM. And then try to find the sub-set in Sp that have the sum equal to the absolute value of SUM. Bear in mind that the Sn and Sp are both sorted. When coming to find the sub-set in Sp given to SUM. If the sub-set in Sp we are holding now has sum of elements more than SUM, then we can stop searching the rest combination because Sp has sorted positive elements and taking any extra element will only make this sum of sub-set higher. These rules could be valuable to find the solutions. Here is the list of rules that I can think of,
Heuristic rules:
    - If N or P is equal to zero, then no need to do the search.
    - If O is more than 0, then it simply expands the number of sub-sets by time of (2^N - 1)
        * If no solution of sub-set with non-zero elements found, the output will be sub-sets
           that are made of only zero elements. Total will be 2^O -1 sub-sets.
        * If found solution made of Sn and Sp and assume the number of solutions is NUM, then
           arbitrary number of zero elements can be added as solutions as well. Total will be
           NUM*(2^N) sub-sets.
    - Stop search if the current sum of sub-set of Sp is more than the absolute value of SUM.

3. Solution Complexity
The heuristic rules will certainly help speeding up the searching. But it will not help the worst case. The worst case will be
    - O = 0
    - Both N and P are > 0 and < M
    - N + P = M
Therefore the total number of sub-sets of Sn will be (2^N-1) and the total number of sub-sets of Sp will be (2^P - 1). All possible combinations will be
    - (2^N - 1)*(2^P - 1) = 2^(N+P) - 2^N - 2^P + 1 = 2^M - 2^N - 2^P + 1
which is better than (2^M - 1) but not enough to swing big-O notation, still is an exponential computation complexity given to the size of set S. And the space complexity stays unchanged - linear complexity given to the size of set S.

4. C++ Implementation
- The implementation only prints out the sub-sets of non-zero elements.
- Arbitrary number of zero elements can be added into non-zero sub-sets above as solutions as well
- The count returned includes the sub-sets with/without zero elements.

// ********************************************************************************
// IMPLEMENTATION
// ********************************************************************************
// Header Set.h
#pragma once

#include <vector>

class Set
{
public:
    using SET = std::vector<int>;
    using INDICE = std::vector<size_t>;

    Set(const SET& input);
    ~Set();

    size_t PrintSubSetOfSummOfAllElementsEqualZero_Heuristic();
private:
    void SplitInputData();
    size_t PrintSubSetOfSummOfAllElementsEqualZero_Heuristic_DP(
               const size_t curNegativeIndex, bool alreadyCounted);
    size_t FindSubsetWithPostitiveElements_Heuristic();
    size_t FindSubsetWithPostitiveElements_Heuristic_DP(const size_t curPositiveIndex);

    void PrintSubset();

    const SET& mInputRef;
    SET mSortedZeroElements;
    SET mSortedNegativeElements;
    SET mSortedPositiveElements;

    int mSumOfAllPositiveElements;
    size_t SIZE_OF_POSITIVE;
    size_t SIZE_OF_NEGATIVE;

    INDICE mNegativeSubsetIndices;
    int mNegativeSum;
    size_t mSizeOfNegativeSubsetIndices;
    INDICE mPositiveSubsetIndices;
    int mPositiveSum;
    size_t mSizeOfPositiveSubsetIndices;
};

// Cpp file Set.cpp
#include "Set.h"

#include <algorithm>
#include <iostream>
#include <numeric>

#include <cstdlib>

Set::Set(const std::vector<int>& input)
: mInputRef(input)
{
    SplitInputData();
}

Set::~Set()
{
}

size_t Set::PrintSubSetOfSummOfAllElementsEqualZero_Heuristic()
{
    SIZE_OF_NEGATIVE = mSortedNegativeElements.size();
    size_t count = 0;

    if (SIZE_OF_NEGATIVE > 0) {
        SIZE_OF_POSITIVE = mSortedPositiveElements.size();
        if (SIZE_OF_POSITIVE > 0) {
            const int sumOfNegative = std::accumulate(mSortedNegativeElements.begin(),
                        mSortedNegativeElements.end(), 0);
            if (abs(sumOfNegative) >= *mSortedPositiveElements.begin())
            {
                mSumOfAllPositiveElements = std::accumulate(mSortedPositiveElements.begin(),
                        mSortedPositiveElements.end(), 0);
                if (mSumOfAllPositiveElements >= abs(*mSortedNegativeElements.rbegin()))
                {
                    mNegativeSubsetIndices.assign(SIZE_OF_NEGATIVE, 0);
                    mPositiveSubsetIndices.assign(SIZE_OF_POSITIVE, 0);

                    for (size_t index = 0; index < SIZE_OF_NEGATIVE; ++index) {
                        mNegativeSubsetIndices[0] = index;
                        mNegativeSum = mSortedNegativeElements[index];
                        mSizeOfNegativeSubsetIndices = 1;
                        count += PrintSubSetOfSummOfAllElementsEqualZero_Heuristic_DP(
                                                    index + 1, false);
                    }
                }
            }
        }
    }
    const size_t SIZE_OF_ZERO = mSortedZeroElements.size();
    if (count > 0) {
        return count << SIZE_OF_ZERO;
    }
   
    if (SIZE_OF_ZERO > 0) {
        return (1 << SIZE_OF_ZERO) - 1;
    }

    return 0;
}

size_t Set::PrintSubSetOfSummOfAllElementsEqualZero_Heuristic_DP(
                     const size_t curNegativeIndex, bool alreadyCounted)
{
    size_t count = 0;
    if (!alreadyCounted) {
        count += FindSubsetWithPostitiveElements_Heuristic();
        alreadyCounted = true;
    }

    if (curNegativeIndex < SIZE_OF_NEGATIVE) {
        count += PrintSubSetOfSummOfAllElementsEqualZero_Heuristic_DP(
                        curNegativeIndex + 1, alreadyCounted);

        mNegativeSubsetIndices[mSizeOfNegativeSubsetIndices] = curNegativeIndex;
        mNegativeSum += mSortedNegativeElements[curNegativeIndex];
        ++mSizeOfNegativeSubsetIndices;
        count += PrintSubSetOfSummOfAllElementsEqualZero_Heuristic_DP(
                        curNegativeIndex + 1, false);
        mNegativeSum -= mSortedNegativeElements[curNegativeIndex];
        --mSizeOfNegativeSubsetIndices;
    }
    return count;
}

size_t Set::FindSubsetWithPostitiveElements_Heuristic()
{
    size_t count = 0;
    for (size_t index = 0; index < SIZE_OF_POSITIVE; ++index) {
        mPositiveSubsetIndices[0] = index;
        mPositiveSum = mSortedPositiveElements[index];
        mSizeOfPositiveSubsetIndices = 1;
        count += FindSubsetWithPostitiveElements_Heuristic_DP(index + 1);
    }

    return count;
}

size_t Set::FindSubsetWithPostitiveElements_Heuristic_DP(const size_t curPositiveIndex)
{
    if ((mNegativeSum + mSumOfAllPositiveElements) < 0) {
        return 0;
    }
   
    int result = mNegativeSum + mPositiveSum;
    if (result == 0) {
        PrintSubset();
        return 1;
    }

    if (result > 0) {
        return 0;
    }

    size_t count = 0;
    if (curPositiveIndex < SIZE_OF_POSITIVE) {
        count += FindSubsetWithPostitiveElements_Heuristic_DP(curPositiveIndex + 1);

        mPositiveSubsetIndices[mSizeOfPositiveSubsetIndices] = curPositiveIndex;
        mPositiveSum += mSortedPositiveElements[curPositiveIndex];
        ++mSizeOfPositiveSubsetIndices;
        count += FindSubsetWithPostitiveElements_Heuristic_DP(curPositiveIndex + 1);
        mPositiveSum -= mSortedPositiveElements[curPositiveIndex];
        --mSizeOfPositiveSubsetIndices;
    }

    return count;
}

void Set::SplitInputData()
{
    SET inputCopy(mInputRef);
    const auto SIZE_TO_RESERVE = inputCopy.size();
    mSortedNegativeElements.reserve(SIZE_TO_RESERVE);
    mSortedZeroElements.reserve(SIZE_TO_RESERVE);
    mSortedPositiveElements.reserve(SIZE_TO_RESERVE);

    std::sort(inputCopy.begin(), inputCopy.end());
    auto firstZeroIt = std::lower_bound(inputCopy.begin(), inputCopy.end(), 0);
    auto lastZeroIt = std::upper_bound(inputCopy.begin(), inputCopy.end(), 0);
    mSortedNegativeElements.assign(inputCopy.begin(), firstZeroIt);
    mSortedZeroElements.assign(firstZeroIt, lastZeroIt);
    mSortedPositiveElements.assign(lastZeroIt, inputCopy.end());
}

void Set::PrintSubset()
{
    std::cout << "{ ";
    for (size_t idx = 0; idx < mSizeOfNegativeSubsetIndices; ++idx) {
        std::cout << mSortedNegativeElements[mNegativeSubsetIndices[idx]] << ", ";
    }

    for (size_t idx = 0; idx < mSizeOfPositiveSubsetIndices; ++idx) {
        if ((idx + 1) != mSizeOfPositiveSubsetIndices) {
            std::cout << mSortedPositiveElements[mPositiveSubsetIndices[idx]] << ", ";
        }
        else {
            std::cout << mSortedPositiveElements[mPositiveSubsetIndices[idx]];
        }
    }
    std::cout << " }" << std::endl;
}

// ********************************************************************************
// TEST
// ********************************************************************************
#include "Set.h"
void TestPrintSubSetOfSummOfAllElementsEqualZero_Heuristic()
{
    {
        const SET testSet = { 1, 3, 5, 7};
        assert(Set(testSet).PrintSubSetOfSummOfAllElementsEqualZero_Heuristic() == 0);
    }

    {
        const SET testSet = { -1, -3, -5, -7 };
        assert(Set(testSet).PrintSubSetOfSummOfAllElementsEqualZero_Heuristic() == 0);
    }

    {
        const SET testSet = { -7, -8, -9, -10, 1, 2, 3 };
        assert(Set(testSet).PrintSubSetOfSummOfAllElementsEqualZero_Heuristic() == 0);
    }

    {
        const SET testSet = { 7, 8, 9, 10, -1, -2, -3 };
        assert(Set(testSet).PrintSubSetOfSummOfAllElementsEqualZero_Heuristic() == 0);
    }

    {
        const SET testSet = { 0, 0, 0, 0};
        assert(Set(testSet).PrintSubSetOfSummOfAllElementsEqualZero_Heuristic() == 15);
    }

    {
        const SET testSet = { 5, 3, 1, 8, -8, -4 };
        assert(Set(testSet).PrintSubSetOfSummOfAllElementsEqualZero_Heuristic() == 4);
    }

    {
        const SET testSet = { 0, 0, 0, 0 };
        assert(Set(testSet).PrintSubSetOfSummOfAllElementsEqualZero_Heuristic() == 15);
    }

    {
        const SET testSet = { -1, 1 };
        assert(Set(testSet).PrintSubSetOfSummOfAllElementsEqualZero_Heuristic() == 1);
    }

    {
        const SET testSet = { -1, 1, -3, 3 };
        assert(Set(testSet).PrintSubSetOfSummOfAllElementsEqualZero_Heuristic() == 3);
    }

    {
        const SET testSet = { -1, 1, -3, 3, -4, 4 };
        assert(Set(testSet).PrintSubSetOfSummOfAllElementsEqualZero_Heuristic() == 9);
    }

    {
        const SET testSet = { -1, 1, -2, 2, -3, 3, -4, 4 };
        assert(Set(testSet).PrintSubSetOfSummOfAllElementsEqualZero_Heuristic() == 25);
    }

    {
        const SET testSet = { -1, 1, -2, 2, -3, 3, -4, 4, -5, 5 };
        assert(Set(testSet).PrintSubSetOfSummOfAllElementsEqualZero_Heuristic() == 75);
    }
}
// ********************************************************************************

No comments:

Post a Comment