Wednesday 10 August 2016

Data Structure and Algorithm - Find Sub-array with Sum in Given Range

1. Problem Description
This is a Google interview question for software engineers from careercup. Here is the original description from the thread,
"
Write a function that takes as input an array of integers A, and two integers low and high.

Your function has to output pairs of indices: {(i,j), ...}

Where each pair of indices denotes that the subarray of A[i...j] has a sum in the range low <= sum <= high.

Apparently there are algorithms better than O(N^2).
- / July 31, 2016 in United States | Report Duplicate | Flag 
".

2. Data Structure and Algorithm
Given a list of input, then the all the sums of sub-arrays staring from the first element can be constructed. Sort sums and find the sub-arrays with sum in range of lower and upper bounds. From now on the idea is not to construct the sums array of starting from the second element and so on. Because it can be deduced from the sum array from the first element.
Given lower bound LB, upper bound UB and an input array a1, a2, ..., an, assume that the sum array of the first element is s1, s2, s3, ..., sn, then the sum array of the second element is s2-a1, s3- a1, ..., sn - a1 then find LB and UB in sorted array of s2-a1, s3-a2, ..., sn-a1. Instead of constructing the new sum array of (s2-a1, s3-a1, ..., sn-a1) and searching LB and UB, search the updated bounds of LB+a1 and UB+a1 in sorted array of s2, s3, ..., sn.
In this way the sum array is only need to construct once and all the rest of searches are binary for each element in the input array.

Pseudo-code:
Given an input array A - a1, a2, ..., an, upper bound LB and upper bound UB,   
    1. Initialize the sums of sub-arrays starting with a1 as S with the following structure
        - Sum of subarray from a1 to aj, where j in [1, n], and the ending index (endIdx)
        - (s1, endIdx1) = (a1, 1)
        - (s2, endIdx2) = (a1 + a2, 2)
        - (s3, endIdx3) = (a1 + a2 + a3, 3)
        ......
        - (sn, endIdxn) = (a1 + a2 + ...... + an, n)
    2. Sort S as SortedS with the following less comparator (in std::set)
        - Return true, if si < sj
        - Return true, if si == sj and i < j
        - Return false otherwise
        , where si is the LHS and sj is the RHS
    3. Loop from start to end in A with startIdx = 0
        - Find lower bound of (LB, 0) in SortedS as lbIt
        - Find upper bound of (UB, n+1) in SortedS as ubIt
        - For each element resultIt in [lbIt, ubIt),
            * Save pair (startIdx, resultIt->endIdx)
        - LB = LB + A[startIdx]
        - UB = UB + A[startIdx]
        - Rememove S[startIdx] from SortedS

Complexity:
    - O(NlogN) computation complexity
    - O(N) space complexity

3. C++ Implementation
// ********************************************************************************
// Implementation
// ********************************************************************************
#include <set>
#include <tuple>
#include <vector>
struct SumAndEndIndex{
    SumAndEndIndex()
        : sum(0.0)
        , index(0)
    {}
    SumAndEndIndex(const double s, const size_t i)
        : sum(s)
        , index(i)
    {}
    double sum;
    size_t index;
};
struct SumAndEndIndexCmpLess
{
    bool operator()(const SumAndEndIndex *lhs,
                    const SumAndEndIndex *rhs) const
    {
        return lhs->sum < rhs->sum ||
            (lhs->sum == rhs->sum && lhs->index < rhs->index);
    }
};
using SumAndEndIndexVec = std::vector<SumAndEndIndex>;
using SumAndEndIndexSet = std::set<const SumAndEndIndex*, SumAndEndIndexCmpLess>;
void CreateSumAndEndIndexVec(const std::vector<double> &input,
                             SumAndEndIndexVec &seiv)
{
    if (input.empty()) {
        SumAndEndIndexVec().swap(seiv);
        return;
    }
    SumAndEndIndexVec tempVec;
    tempVec.reserve(input.size());
    auto itEnd = input.cend();
    double sum = 0;
    size_t index = 0;
    for (auto it = input.cbegin(); it != itEnd; ++it, ++index) {
        sum = sum + *it;
        tempVec.push_back(SumAndEndIndex(sum, index));
    }
    tempVec.swap(seiv);
}
using PairResultVect = std::vector<std::tuple<size_t, size_t>>;
PairResultVect FindSubArrayWithSumInRange(const std::vector<double> &input,
                                          const double lowerBound,
                                          const double upperBound,
                                          const SumAndEndIndexVec &sumAndIndexVec,
                                          SumAndEndIndexSet &sortedSumAndIndices)
{
    PairResultVect result;
    auto itEnd = input.cend();
    SumAndEndIndex seiLB(lowerBound, 0);
    SumAndEndIndex seiUB(upperBound, input.size());
    size_t startIndex = 0;
    auto itVec = sumAndIndexVec.cbegin();
    for (auto it = input.cbegin(); it != itEnd; ++it, ++startIndex, ++itVec) {
        auto lbIt = sortedSumAndIndices.lower_bound(&seiLB);
        auto ubIt = sortedSumAndIndices.upper_bound(&seiUB);
        for (auto foundIt = lbIt; foundIt != ubIt; ++foundIt) {
            result.push_back(std::make_tuple(startIndex, (*foundIt)->index));
        }
        seiLB.sum += *it;
        seiUB.sum += *it;
        sortedSumAndIndices.erase(&(*itVec));
    }
    return result;
}
PairResultVect FindSubArrayWithSumInRange(const std::vector<double> &input,
                                          const double lowerBound,
                                          const double upperBound)
{
    if (input.empty() || lowerBound > upperBound) {
        return PairResultVect();
    }
    SumAndEndIndexVec sumAndIndexVec;
    CreateSumAndEndIndexVec(input, sumAndIndexVec);
    SumAndEndIndexSet sortedSumAndIndices;
    auto itEnd = sumAndIndexVec.cend();
    for (auto it = sumAndIndexVec.cbegin(); it != itEnd; ++it) {
        sortedSumAndIndices.insert(&(*it));
    }
    return FindSubArrayWithSumInRange(input,
                                      lowerBound,
                                      upperBound,
                                      sumAndIndexVec,
                                      sortedSumAndIndices);
}

// ********************************************************************************
// Test
// ********************************************************************************
#include <cassert>
void Test_FindSubArrayWithSumInRange()
{
    PairResultVect result;
    {
        const std::vector<double> input;
        result = FindSubArrayWithSumInRange(input, 0, 1);
        assert(result.empty() == true);
    }
    {
        const std::vector<double> input = { 1.0, 2.0, 3.0, 4.0,
                                            5.0, 6.0, 7.0, 8.0,
                                            9.0, 10.0 };
        result = FindSubArrayWithSumInRange(input, 20, 15);
        assert(result.empty() == true);
        result = FindSubArrayWithSumInRange(input, -10, 0);
        assert(result.empty() == true);
        result = FindSubArrayWithSumInRange(input, 15.0, 20.0);
        assert(result.size() == 8);
    }
    {
        const std::vector<double> input = {-1, 1, -1, 1, -1, 1};
        result = FindSubArrayWithSumInRange(input, -1, 1);
        assert(result.size() == 21);
    }
}