Sunday, 12 October 2014

Data Structure and Algorithm - Find the Least Sum of Absolute Value of Difference of Each Value from Three Arrays

1. Problem Description
It is an Amazon's interview question for software engineer/developer on careercup.

"Given three arrays A,B,C containing unsorted numbers. Find three numbers a, b, c from each of array A, B, C such that |a-b|, |b-c| and |c-a| are minimum 
Please provide as efficient code as you can. 
Can you better than this ???"

2. Solution
The idea is to combine these 3 arrays into a single array and in its data structure there has to be something to record which array the value comes from originally. Then sort this big array. After sorting, go through each element, together with last two values from different arrays, to calculate the sum of the absolute values. If it is less than the minimum found so far, then update the result. For instance if the current value is from C, then calculate the sum of the absolute values with last value from A and B visited so far. If it is less, then update the result. Otherwise update last C to the current and visit the next.

Here is the pseduo-code
    1. Combine the 3 arrays into one with data structure to indicate which array the value comes from
    2. Sort this big array (of data structure) with the value from A, B and C as key
    3. Assign minAbsSum,  prevA, prevB and prevC as invalid (to track last A, B and C visited so far)
    4. Go through each element in the big array.(Loop the big array)
        4.1 If curVal is from array A, pick curVal, prevB and prevC. Go to 4.4
        4.2 If curVal is from array B, pick curVal, prevA and prevC. Go to 4.4
        4.3 If curVal is from array C, pick curVal, prevA and prevB. Go to 4.4
        4.4 Calculate the sum of the absolute values of difference
              if prevB and prevC are valid in case of 4.1
              if prevA and prevC are valid in case of 4.2
              if prevA and prevB are valid in case of 4.3
        4.5 Update the minAbsSum if absSum < minAbsSum,
        4.6 Return if minAbsSum = 0, otherwise continue

The way to calculate |a - b| + |b - c| + |c - a|
As a, b , c are numbers and we always can have the relationship of smaller, big and bigger among 3. Assuming that a >= b >= c, then |a - b| + |b - c| = |a - c| = |c-a|. This is telling us that no matter what the values of a, b and c, the sum of the absolute values is equal to 2*(max(a, b, c) - min(a, b, c)).

Optimization 1
Suppose a list of sorted values like: a1, a2, b1, c1, b2, c2, ..., ai, ai+1, ai+2, ai+3, bj, ck, ...
Any combinations that has ai+1 and a+2 can be skipped, because it will be always larger than the combinations of (ai, bj-1, ck-1) and (ai+3, bj, ck). Because
    ai+1 >= ai
    For combination of (ai+1, bj-1, ck-1) and (ai, bj-1, ck-1)
    2*(ai+1 - min(bj-1, ck-1)) >= 2*(ai - min(bj-1, ck-1)
As well,
    For combination of (ai+2, bj, ck) and (ai+3, bj, ck)
    ai+2 <= ai+3
    2*(max(bj, ck) - ai+2) >= 2*(max(bj, ck) - ai+3)
Therefore any value with the same type repeating can be ignored for calculation.

Optimization 2
Suppose Array A, B and C has size of L, M, N. And we have a list of sorted value like:
a1, a2, b1, c1, c2, ..., aL, bj, ck, ..., bM, ..., cN. In this case aL, bM and cN is th maximal value of Array A, B and C. Between bj and cK there can be any number of b but no c.
Between aL to cN, we do not need to go through every elements to calculate the sum of absolute values. We only need to calculate two combination at most.
In this case bj is the first value from Array B and ck is the first value of Array C after aL, if any of them exists. Any combination later from them will be larger than (aL, bj, cK). And one more combination needs to evaluated is (cj-1, aL, bj) in this case.
Therefor as long as hit the end of one array, then at most there are two combinations remaining to be evaluated.

Optimization 3
Because the absolute value is always not less than zero. For any combination of (a, b, c), if these three values are equal, then the minimal value has been reached - zero. There is no better combination than this, therefore stop and return the combinations.

3. C++ Implementation

// Implementation
//********************************************************************************
#include <map>
#include <vector>

#include <cassert>
#include <cmath>

struct ResultTuple{
    int val[3];          // the combination of (a, b, c)
    size_t absSum;
};

// Array A, B and C are mapped into CombinedArrayMap
// the value has type of "char" to indicate which array the key
// comes from.
typedef std::multimap<int, char> CombinedArrayMap;
typedef CombinedArrayMap::const_iterator CAMIterator;

void UpdateResult(CAMIterator& abc1,
                  CAMIterator& abc2,
                  CAMIterator& abc3,
                  size_t absSum,
                  ResultTuple& result)
{
    result.absSum = absSum;
    result.val[abc1->second - 'A'] = abc1->first;
    result.val[abc2->second - 'A'] = abc2->first;
    result.val[abc3->second - 'A'] = abc3->first;
}

void CalculateAndUpdateAbsSum(CAMIterator& itEnd,
                              CAMIterator& prev1,
                              CAMIterator& prev2,
                              CAMIterator& curVal,
                              ResultTuple& result)
{
    if (prev1 == itEnd || prev2 == itEnd || curVal == itEnd) {
        return;
    }
    // calculate the sum of the absolute values of difference
    // 2*(max(a, b, c) - min(a, b, c))
    size_t absSum = (prev1->first < prev2->first) ?
        (curVal->first - prev1->first) << 1 :
        (curVal->first - prev2->first) << 1;
    if (absSum < result.absSum) {
        UpdateResult(prev1, prev2, curVal, absSum, result);
    }
}

// Find ck in Optimization 2 in Section 2
void FindNextNotXandY(CAMIterator& itEnd,
                      const char x,
                      const char y,
                      CAMIterator& notXYIter)
{
    for (; notXYIter != itEnd; ++notXYIter) {
        if (notXYIter->second != x &&
            notXYIter->second != y) {
            break;
        }
    }
}

// Find bj in Optimization 2 in Section 2
void FindNextNotX(CAMIterator& itEnd,
                  const char x,
                  CAMIterator& notXIter)
{
    for (; notXIter != itEnd; ++notXIter) {
        if (notXIter->second != x) {
            break;
        }
    }
}

// Step 4 in psedo-code
void CalculateAndUpdateAbsSum(CAMIterator& itEnd,
                              CAMIterator& prevA,
                              CAMIterator& prevB,
                              CAMIterator& prevC,
                              CAMIterator& curABC,
                              ResultTuple& result)
{
    switch (curABC->second){
    case 'A':
        CalculateAndUpdateAbsSum(itEnd, prevB, prevC, curABC, result);
        break;
    case 'B':
        CalculateAndUpdateAbsSum(itEnd, prevA, prevC, curABC, result);
        break;
    case 'C':
        CalculateAndUpdateAbsSum(itEnd, prevA, prevB, curABC, result);
        break;
    default:
        assert(false);
    }
}

// Optimization 2 in Section 2
void FindLastTwoCombinationsAndUpdate(CAMIterator& itEnd,
                                      CAMIterator& prev1,
                                      CAMIterator& prev2,
                                      CAMIterator& curIter,
                                      ResultTuple& result)
{
    CAMIterator notXIter = curIter;
    FindNextNotX(itEnd, curIter->second, notXIter);
    if (notXIter != itEnd) {
        if (prev1 != itEnd && prev1->second != notXIter->second) {
            CalculateAndUpdateAbsSum(itEnd, prev1, curIter, notXIter, result);
        }
        if (prev2 != itEnd && prev2->second != notXIter->second) {
            CalculateAndUpdateAbsSum(itEnd, prev2, curIter, notXIter, result);
        }

        CAMIterator notXYIter = notXIter;
        FindNextNotXandY(itEnd, curIter->second,
            notXIter->second, notXYIter);
        CalculateAndUpdateAbsSum(itEnd, curIter, notXIter, notXYIter, result);
    }
}

ResultTuple CalculateLeastAbsSum(const std::vector<int>& aVec,
                                 const std::vector<int>& bVec,
                                 const std::vector<int>& cVec)
{
    CombinedArrayMap valueVecMap;

    for (auto val : aVec) {
        valueVecMap.insert(std::make_pair(val, 'A'));
    }

    for (auto val : bVec) {
        valueVecMap.insert(std::make_pair(val, 'B'));
    }

    for (auto val : cVec) {
        valueVecMap.insert(std::make_pair(val, 'C'));
    }

    CAMIterator itStart = valueVecMap.begin();
    CAMIterator itEnd = valueVecMap.end();
    CAMIterator prevA = itEnd;
    CAMIterator prevB = itEnd;
    CAMIterator prevC = itEnd;
    char prevType = 0;
    size_t SizeOfA = aVec.size();
    size_t SizeOfB = bVec.size();
    size_t SizeOfC = cVec.size();
    size_t visitedA = 0;
    size_t visitedB = 0;
    size_t visitedC = 0;

    ResultTuple result = { { INT_MAX, INT_MAX, INT_MAX }, SIZE_MAX }; // SIZE_MAX?
    for (CAMIterator curIter = itStart; curIter != itEnd; ++curIter) {
        // Optimization 1 in Section 2
        if (prevType != curIter->second) {
            CalculateAndUpdateAbsSum(itEnd, prevA, prevB, prevC, curIter, result);
            // Optimization 3 in Section 2
            if (result.absSum == 0) {
                return result;
            }
        }

        if (curIter->second == 'A') {
            prevA = curIter;
            if (++visitedA == SizeOfA) {
                // Optimization 2 in Section 2
                FindLastTwoCombinationsAndUpdate(itEnd, prevB, prevC,
                                                 curIter, result);
                break;
            }
        }
        else if (curIter->second == 'B') {
            prevB = curIter;
            if (++visitedB == SizeOfB) {
                // Optimization 2 in Section 2
                FindLastTwoCombinationsAndUpdate(itEnd, prevA, prevC,
                                                 curIter, result);
                break;
            }
        }
        else if (curIter->second == 'C') {
            prevC = curIter;
            if (++visitedC == SizeOfC) {
                // Optimization 2 in Section 2
                FindLastTwoCombinationsAndUpdate(itEnd, prevA, prevB,
                                                 curIter, result);
                break;
            }
        }
        else {
            assert(false);
        }
        prevType = curIter->second;
    }

    return result;
}
//********************************************************************************

// Test
//********************************************************************************
void testCase1()
{
    std::vector<int> aVec = { 9, 7, 5, 7, 4, 8, 12, 30, 50 };
    std::vector<int> bVec = { 30, 5, 9, 20, 35, 70, 50, 12 };
    std::vector<int> cVec = { 8, 10, 30, 21, 6, 3, 6, 10, 0 };

    ResultTuple result = CalculateLeastAbsSum(aVec, bVec, cVec);
    assert(result.absSum == 0);
    assert(result.val[0] == 30);
    assert(result.val[1] == 30);
    assert(result.val[2] == 30);
}

void testCase2()
{
    std::vector<int> aVec = { 9, 7, 6, 8, 4, 5, 3, 1, 2 };
    std::vector<int> bVec = { 20, 10, 19, 12, 15, 17, 14, 12 };
    std::vector<int> cVec = { 30, 31, 21, 40, 25, 23, 36, 40, 50 };

    ResultTuple result = CalculateLeastAbsSum(aVec, bVec, cVec);
    assert(result.absSum == 24);
    assert(result.val[0] == 9);
    assert(result.val[1] == 10);
    assert(result.val[2] == 21);
}

void testCase3()
{
    // 3 vectors has 4 and 9, this case should return 4
    std::vector<int> aVec = { 9, 7, 6, 8, 4, 5, 3, 1, 2 };
    std::vector<int> bVec = { 20, 4, 19, 2, 15, 17, 9, 12 };
    std::vector<int> cVec = { 3, 13, 9, 40, 25, 23, 6, 4, 5 };

    ResultTuple result = CalculateLeastAbsSum(aVec, bVec, cVec);
    assert(result.absSum == 0);
    assert(result.val[0] == 4);
    assert(result.val[1] == 4);
    assert(result.val[2] == 4);
}

void testCase4()
{
    // 3 vectors has two sets result
    // (90, 89, 91) and (3, 4, 5)
    // in this case, it should return (3, 4, 5)
    std::vector<int> aVec = { 90, 3, 16, 28, 45, 35, 63, 71, 82 };
    std::vector<int> bVec = { 89, 4, 19, 26, 48, 37, 69, 72, 86 };
    std::vector<int> cVec = { 91, 5, 15, 29, 43, 33, 66, 74, 85 };

    ResultTuple result = CalculateLeastAbsSum(aVec, bVec, cVec);
    assert(result.absSum == 4);
    assert(result.val[0] == 3);
    assert(result.val[1] == 4);
    assert(result.val[2] == 5);
}

void testCase5()
{
    std::vector<int> aVec = { 90, 83, 16, 28, 45, 35, 63, 71, 3 };
    std::vector<int> bVec = { 95, 88, 19, 26, 48, 37, 69, 72, 1 };
    std::vector<int> cVec = { 91, 85, 15, 29, 43, 33, 66, 74, 2 };

    ResultTuple result = CalculateLeastAbsSum(aVec, bVec, cVec);
    assert(result.absSum == 4);
    assert(result.val[0] == 3);
    assert(result.val[1] == 1);
    assert(result.val[2] == 2);
}
//********************************************************************************

No comments:

Post a Comment