Saturday 10 January 2015

Dynamic Programming - Minimize the SUM of Difference between Two Arrays with Constraint : Google Interview Question

1. Problem Description
This is a Google Interview Question for Software Developer/Engineer from carrercup. Here is the original problem description from the thread.

"Given an integer array, adjust each integers so that the difference of every adjcent integers are not greater than a given number target. 

If the array before adjustment is A, the array after adjustment is B, you should minimize the sum of |A[i]-B[i]| 


You can assume each number in the array is a positive integer and not greater than 100 


Given [1,4,2,3] and target=1, one of the solutions is [2,3,2,3], the adjustment cost is 2 and it's minimal. Return 2."

2.Analysis
The problem requires to find an array B which has the same size as A and has the minimal sum of distance between each element at the same position of A and B. Each element of B is constraint to the condition of its distance to its two neighbors (except the top and end) not greater than a given target distance.

Let's restate in mathematical term. Assume array A with size of N and array B has the same size and the minimal optimization problem can be stated as,
Objective function: Min(Sigma(|Ai - Bi|)), 
                                where i is within [0, N-1]
Constraint: |Bj - Bj-1| <= TARGET,
                   where j is within [1, N-1]
Known: A and TARGET

3. Solution
This is an optimization problem and a few questions need to answer.
    1. The search space
    2. The constraints
    3. How to search (heuristic rules or brutal force)

Search space:
The search space is easy to find out. Given the array A the search space for the each element of Bi should be limited between the minimal value of A and the maximal value of A, [MinA, MaxA]. Because we are computing value of absolute value of Ai - Bi. Any Bi over the maximal value of A or under the minimal value of A can always be found a replacement between the minimal and maximal value of A, because absolute value works out the same result from two directions.

Constraint and Computation Complexity:
Now we worked out the search space. If not considering any effect of the constraint on the search space, then we would have exponential computation complexity. Suppose the difference between minimal and maximal value of A is M (equal to MaxA - MinA + 1). For N elements, then we have M power of N possible combinations, O(M^N). This is very bad if we decide to use brutal force to solve this problem. Therefore let's examine how the constraints on B shape the search space.

The constraint is |Bi-1 - Bi| <= TARGET. For a given value of Bi-1, as X, then what is the search space of Bi. Based on the constraint the search space of Bi is [max(X - TARGET, MinA), min(X+TARGET, MaxA)], given the value of Bi-1 as X. Therefore for each given value of Bi-1, Bi has maximally 2*TARGET choices, more precisely (min[X+TARGET, MaxA] - max(X-TARGET, MinA) + 1), as Y. Then the complexity will be O(M*Y^(N-1)), where Y is not greater than M.
Explanation:
    B0 has M choices
    B1 has Y choices for any given value of B0
    B2 has Y choices for any given value of B1
    ......

This is still not good enough. Let's take another look what has been searched during two value X and X+1 of Bi-1.
    Bi-1 = X: the choices of Bi is [max(X - TARGET, MinA), min(X+TARGET, MaxA)]
    Bi-1 = X+1: the choices of Bi is [max(X + 1 - TARGET, MinA), min(X+TARGET + 1, MaxA)]
*) When both upper bound  are clampped to MaxA, then Bi has the same search space, when the value of Bi-1 increase from X to X+1.
*) Otherwise Bi has one extra choice, (X+TARGET+1), when the value of Bi-1 increases from X to X+1.
So caching the result will hugely boost the performance.

As this is an optimization problem, optimizer solvers are good choices, for example usingMatlab. Here I would like to use brutal force to solve this issue with dynamic programming. It does not have to be dynamic programming. The reason why I chose it is because it is simply easier to formulate the problem and use the caching techniques to save the searched space.
The sub-problem:
    - F(n, j) = | A0 - Xj| + F(n-1, j) ,
          where n is size of Array A
                     Xj is one of values of B0, which is bounded in the search space [MinA, MaxA]
                     F(n-1) = Sigma(|Ak - Bk|), where k is within [1, N-1], and B1 is bounded as
                                                                      [Max(B0-TARGET, MinA), Min(B0+TARGET, MaxA)]
    - Min(F(n, j)), when F(n, j) is computed with Xj within [MinA, MaxA]

Corner case:
The sum of absolute value of difference will be zero, when (MaxA-MinA) <= TARGET, because simply take B equal to A, which meets the constraint and sum of absolute values can't be less than 0.

4. C++ Implementation
* Dynamic programming
* Caching technique

// ********************************************************************************
// IMPLEMENTATION
// ********************************************************************************
#include <algorithm>
#include <limits>
#include <unordered_map>

#include <cmath>

struct HashKeyPair
{
    HashKeyPair(size_t i, ptrdiff_t v) :
        index(i), value(v)
    {}
    size_t index; // index of i, |Ai - Bi|
    ptrdiff_t value; // searching value of Bi

    bool operator==(const HashKeyPair& hkp) const {
        return index == hkp.index &&
               value == hkp.value;
    }
};

struct HashKeyPairHashFunc
{
    size_t operator()(const HashKeyPair& hkp) {
        return std::hash<size_t>()(hkp.index) ^
               std::hash<ptrdiff_t>()(hkp.value);
    }

    bool operator()(const HashKeyPair& hkp1, const HashKeyPair& hkp2) const {
        return hkp1 == hkp2;
    }

};
typedef size_t MinSigmaOfAbsDiff;
typedef std::unordered_map<HashKeyPair, MinSigmaOfAbsDiff,
                           HashKeyPairHashFunc, HashKeyPairHashFunc> HashMap;

ptrdiff_t GetMinAbsValue(const ptrdiff_t ai, const ptrdiff_t target, const ptrdiff_t val)
{
    if (ai > (val + target)) {
        return ai - val - target;
    }
    else if (ai < (val - target)) {
        return val - target - ai;
    }

    return 0;
}

ptrdiff_t MinAbsDiff_DP(const std::vector<ptrdiff_t>& AI,
                     const ptrdiff_t target,
                     const ptrdiff_t minAi,
                     const ptrdiff_t maxAi,
                     const size_t curIndex,
                     const ptrdiff_t value,
                     HashMap& cache)
{
    if (curIndex == (AI.size() - 1)) {
        return GetMinAbsValue(*AI.rbegin(), target, value);
    }

    const HashKeyPair key = { curIndex, value };
    {
        // cached already?
        HashMap::const_iterator foundCIter = cache.find(key);
        if (foundCIter != cache.end()) {
            return foundCIter->second;
        }
    }

    // DP
    auto minSigmaOfAbsDiff = std::numeric_limits<ptrdiff_t>::max();
    ptrdiff_t tempMinSigmaOfAbsDiff;
    for (ptrdiff_t tempVal = value - target; tempVal <= (value + target); ++tempVal) {
        if (tempVal <= maxAi && tempVal >= minAi) {
            tempMinSigmaOfAbsDiff = abs(tempVal - AI[curIndex])
                    + MinAbsDiff_DP(AI, target, minAi, maxAi, curIndex + 1, tempVal, cache);
            if (minSigmaOfAbsDiff > tempMinSigmaOfAbsDiff) {
                minSigmaOfAbsDiff = tempMinSigmaOfAbsDiff;
            }
        }
    }

    // Cache it
    cache.insert(std::make_pair(key, minSigmaOfAbsDiff));

    return minSigmaOfAbsDiff;
}

ptrdiff_t MinAbsDiff(const std::vector<ptrdiff_t>& AI, const ptrdiff_t target)
{
    auto minVal = *(std::min_element(AI.begin(), AI.end()));
    auto maxVal = *(std::max_element(AI.begin(), AI.end()));

    if ((maxVal - minVal) <= target) {
        return 0;
    }

    HashMap cache;
    auto minSigmaOfAbsDiff = std::numeric_limits<ptrdiff_t>::max();
    ptrdiff_t temp;
    for (ptrdiff_t val = minVal; val <= maxVal; ++val) {
        temp = abs(AI[0] - val) +
               MinAbsDiff_DP(AI, target, minVal, maxVal, 1, val, cache);
        if (minSigmaOfAbsDiff > temp) {
            minSigmaOfAbsDiff = temp;
        }
        if (minSigmaOfAbsDiff == 0) {
            return 0;
        }
    }

    return minSigmaOfAbsDiff;
}

// ********************************************************************************
// TEST
// ********************************************************************************
#include <cassert>
void TestCases()
{
    {
        std::vector<ptrdiff_t> testVec = { 50, 50, 50, 50, 50, 50, 50, 50, 50, 50};
        ptrdiff_t target = 5;
        assert(MinAbsDiff(testVec, target) == 0);
    }

    {
        std::vector<ptrdiff_t> testVec = { 48, 53, 51, 50, 49, 53, 52, 48, 49, 50 };
        ptrdiff_t target = 5;
        assert(MinAbsDiff(testVec, target) == 0);
    }

    {
        std::vector<ptrdiff_t> testVec = { 56, 50};
        ptrdiff_t target = 5;
        assert(MinAbsDiff(testVec, target) == 1);
    }

    {
        std::vector<ptrdiff_t> testVec = { 56, 50, 50 };
        ptrdiff_t target = 5;
        assert(MinAbsDiff(testVec, target) == 1);
    }

    {
        std::vector<ptrdiff_t> testVec = { 56, 50, 50, 56 };
        ptrdiff_t target = 5;
        assert(MinAbsDiff(testVec, target) == 2);
    }

    {
        std::vector<ptrdiff_t> testVec = { 55, 77, 52, 61, 39, 6, 25, 60, 49, 47 };
        ptrdiff_t target = 10;
        assert(MinAbsDiff(testVec, target) == 75);
    }

    {
        std::vector<ptrdiff_t> testVec = { 94, 35, 29, 55 };
        ptrdiff_t target = 10;
        assert(MinAbsDiff(testVec, target) == 65);
    }

    {
        std::vector<ptrdiff_t> testVec = { 97, 73, 56, 56, 93, 55, 29, 47, 90, 36 };
        ptrdiff_t target = 3;
        assert(MinAbsDiff(testVec, target) == 157);
    }
}
// ********************************************************************************

No comments:

Post a Comment