Thursday 20 November 2014

Data Structure and Algorithm - Find the Largest Number within K Swaps

1. Problem Description
This is yet another Google interview question for Software Developer/Engineer on CareerCup. Here is the original description from the thread.

"Given a number M (N-digit integer) and K-swap operations(a swap 
operation can swap 2 digits), devise an algorithm to get the maximum possible integer? 
Examples: 
M = 132 K = 1 output = 312 
M = 132 K = 2 output = 321 
M = 7899 k = 2 output = 9987 
M = 8799 and K = 2 output = 9987"

2. Solution
Let's take a close look at this problem. For any number there is always a global maximal and a global minimal value. The global maximal value is the number whose digits are in descending order from left to right. The global minimal value is the number whose digits are in ascending order, except that the starting digit cannot be 0.

How many swaps are needed to reach the maximal value?
Given a number with its digits in any order, at most (N-1) swaps are needed to reach the global maximal value, where N is the number of digits. Each time, simply find the biggest remaining digit and swap it into the leftmost position that is not yet settled. For instance,
    - Number = 8765432109
        * 9765432108
        * 9865432107
        * 9875432106
        * 9876432105
        * 9876532104
        * 9876542103
        * 9876543102
        * 9876543201
        * 9876543210
This example is a worst case that needs 9 (10 digits - 1) swaps, because each swap puts the biggest remaining digit into place but moves another digit out of place. The last swap puts both of its digits into place, and this is where the -1 in (N - 1) comes from.
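The greedy procedure above can be sketched in C++ as follows. GreedySwapsToMax is a hypothetical helper written for illustration (it is not part of the implementation in section 3); it only counts the swaps it makes, which never exceeds N - 1.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: repeatedly move the largest remaining digit into the
// leftmost unsettled position and count the swaps actually made.
std::size_t GreedySwapsToMax(std::string number) {
    std::size_t swaps = 0;
    for (std::size_t i = 0; i + 1 < number.size(); ++i) {
        // Find the largest digit in number[i..]; with ties, take the rightmost.
        std::size_t best = i;
        for (std::size_t j = i + 1; j < number.size(); ++j) {
            if (number[j] >= number[best]) best = j;
        }
        // Only swap if position i does not already hold that digit.
        if (number[best] != number[i]) {
            std::swap(number[i], number[best]);
            ++swaps;
        }
    }
    return swaps;
}
```

For 8765432109 it performs exactly the 9 swaps listed above; for 9876543210 it performs none.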

Swap distance
We all know about the edit distance between two strings. It counts how many modifications are needed to make two strings equal, where each insertion, deletion and substitution counts as a distance of 1.
Then what is the swap distance? Assume we represent a number as a std::string whose characters are the digits '0' - '9'. The swap distance is the number of swaps needed to reach the global maximal value from any arrangement of the digits, with each swap counting as a distance of 1. We already know that the swap distance is no more than (N-1), where N is the size of the string.
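For the special case where all digits are distinct, the swap distance can be computed exactly: each digit has a unique target position in the global maximal value, and the minimum number of swaps is N minus the number of cycles of that permutation. SwapDistanceDistinctDigits is a hypothetical helper for this special case only; with repeated digits the target positions are no longer unique, so this simple cycle count does not apply directly.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper. Assumes every digit in `number` is distinct.
// Swap distance = N - (number of cycles of the "digit -> target position" map).
std::size_t SwapDistanceDistinctDigits(const std::string& number) {
    const std::size_t n = number.size();
    // In descending order, a digit's target index = count of larger digits.
    std::vector<std::size_t> target(n);
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t greater = 0;
        for (std::size_t j = 0; j < n; ++j) {
            if (number[j] > number[i]) ++greater;
        }
        target[i] = greater;
    }
    // Count the cycles of the permutation i -> target[i].
    std::vector<bool> visited(n, false);
    std::size_t cycles = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (visited[i]) continue;
        ++cycles;
        for (std::size_t j = i; !visited[j]; j = target[j]) visited[j] = true;
    }
    return n - cycles;
}
```

For 8765432109 all 10 digits form a single cycle, giving 10 - 1 = 9, which matches the worst case above.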

Solution of this problem
Here we know that there is a global maximal value for any number, and we can always reach it if given enough swaps. So given K swaps, we would like to reach a number as close to the global maximal value as possible. This means that, first, we need to find the right digits to swap and, second, these digits should be in descending order after the swaps.

How do we find the right digits for these K swaps, and how do we swap them? The idea is to match the global maximal value from left to right: swap as many of the biggest digits as possible to the left and keep them in descending order while swapping. Here is the pseudo-code:
    I). Find the global maximal value
    II). Compare the number with its global maximal value from left to right and locate the first K positions where they differ (the first K edit differences). (Not exactly correct. Will discuss later)
    III). Record the indices of the mismatched digits, together with the indices where the digits needed at those positions currently sit
    IV). Sort the digits at the recorded indices in descending order and put them back at those indices, from the lowest index to the highest
Here are two examples
    - Number: 8765432109, K = 3
        * 9876543210 as the global maximal value
        * The first 3 edit differences are: 876 vs. 987
        * The digits involved are 8, 7, 6 (at indices 0, 1, 2) plus the 9 they need (at index 9), so the indices of these 4 digits are: 0, 1, 2, 9
        * Sort the 4 digits in the descending order: 9, 8, 7, 6
        * Put 4 digits back: 9@position 0, 8@position 1, 7@position 2 and 6@position 9
        * After 3 swaps, the maximal number we can achieve is: 9875432106

    - Number: 876959, K = 3
        * 998765 as the global maximal value
        * The first 3 edit differences are: 876 vs. 998
        * The digits involved are 8, 7, 6 (at indices 0, 1, 2) plus the two 9s they need (at indices 3 and 5; the needed 8 is already at index 0), so the indices of these 5 digits are: 0, 1, 2, 3 and 5
        * Sort the 5 digits in the descending order: 9, 9, 8, 7, 6
        * Put 5 digits back: 9@position 0, 9@position 1, 8@position 2, 7@position 3 and 6@position 5
        * After 3 swaps, the maximal number we can achieve is: 998756
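The four pseudo-code steps, with step II still using the plain "first K edit differences" rule, can be sketched as below. KSwapsFirstKDifferences is a hypothetical helper, not the author's implementation. Like the section 3 code, it takes the wanted digits from their rightmost unused occurrences; the choice of which occurrence to use is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch of steps I-IV without the (x,y)/(y,x) pairing rule.
std::string KSwapsFirstKDifferences(const std::string& number, std::size_t k) {
    // Step I: counting sort gives the global maximal value.
    std::size_t count[10] = {0};
    for (char c : number) ++count[c - '0'];
    std::string largest;
    for (int d = 9; d >= 0; --d) largest.append(count[d], static_cast<char>('0' + d));

    // Positions of each digit, rightmost first; next[d] = next unused one.
    std::vector<std::size_t> where[10];
    for (std::size_t i = number.size(); i > 0; --i) where[number[i - 1] - '0'].push_back(i - 1);
    std::size_t next[10] = {0};

    // Steps II and III: first k mismatches plus where their wanted digits sit.
    std::set<std::size_t> indices;
    for (std::size_t i = 0; i < number.size() && k > 0; ++i) {
        if (number[i] == largest[i]) continue;
        const int want = largest[i] - '0';
        indices.insert(i);
        indices.insert(where[want].at(next[want]++));
        --k;
    }

    // Step IV: sort the chosen digits descending, refill by increasing index.
    std::string chosen;
    for (std::size_t i : indices) chosen.push_back(number[i]);
    std::sort(chosen.begin(), chosen.end(), std::greater<char>());
    std::string result(number);
    std::size_t pos = 0;
    for (std::size_t i : indices) result[i] = chosen[pos++];
    return result;
}
```

It reproduces both examples (9875432106 and 998756). For 689945932999 with K = 5, however, it returns 999999832654 instead of 999999862543, which is the weakness of step II discussed next.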

As I mentioned, finding the first K edit differences in step II is not exactly correct: it does not work in some scenarios. The reason is that the swap distance differs from the edit distance. One swap can put two digits into their correct places if each digit sits in the position the other one needs. This counts as only 1 swap but as 2 edit differences. Here is an example.
    - Number: 689945932999, K = 5
        * 999999865432 as the global maximal value
        * The first 5 edit differences are: 68xx459 vs. 99xx998
        * The key here is position 1 and position 6: pairs (8, 9) and (9, 8). Swapping the digits at these two positions puts both digits into their correct places: only one swap, but two edit differences.
        * So the differences above have not yet consumed all K swaps, and we have to keep comparing the number with the global maximal value until the K swaps are exhausted.
        * The first 5 swap differences are: 68xx4593 vs. 99xx9986
        * Now record the indices, sort the digits and put them back. The result: 999999862543

In order to count the swap distance correctly, the previously seen pairs have to be tracked. If two pairs contain the same digits in opposite order, (x, y) <=> (y, x), they count as only 1 swap. In the pseudo-code above, finding the first K swap differences instead of the first K edit differences in step II makes the algorithm work.
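The pair-tracking rule for step II can be sketched as below. MismatchesWithinKSwaps is a hypothetical helper, not the author's code: instead of the hash map with an orientation flag used in section 3, it keys a std::map by the ordered pair (digit present, digit wanted), and it returns the mismatched positions scanned before the K swaps run out.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: scan left to right, collecting mismatched positions
// until k swaps are used up. A mismatch (have = y, want = x) that mirrors an
// earlier unmatched (have = x, want = y) is fixed by that earlier swap, so it
// does not consume a new swap.
std::vector<std::size_t> MismatchesWithinKSwaps(const std::string& number,
                                                const std::string& largest,
                                                std::size_t k) {
    std::map<std::pair<char, char>, int> pending;  // (have, want) -> count
    std::vector<std::size_t> positions;
    for (std::size_t i = 0; i < number.size() && k > 0; ++i) {
        const char have = number[i];
        const char want = largest[i];
        if (have == want) continue;
        positions.push_back(i);
        int& mirror = pending[std::make_pair(want, have)];
        if (mirror > 0) {
            --mirror;  // closes an earlier (want, have) mismatch: no new swap
        } else {
            ++pending[std::make_pair(have, want)];
            --k;       // a new swap is needed
        }
    }
    return positions;
}
```

For 689945932999 with K = 5 it returns positions 0, 1, 4, 5, 6 and 7, i.e. the 68xx4593 vs. 99xx9986 span from the example. The positions where the wanted digits currently sit still have to be added, as in step III.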

Complexity analysis
It takes O(N) to find the global maximal value. Do not sort the string in descending order, because a comparison sort takes at least O(N*logN) (see Sort and Binary Search). Keep in mind that there are only 10 distinct digits: simply count the occurrences of each digit and write them out in descending order. This is an O(N) solution.
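The counting approach can be sketched as follows. GlobalMaxByCounting is a hypothetical name; it assumes the string holds only the characters '0' - '9'.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: build the global maximal value in O(N) by counting
// each digit and writing the digits out from '9' down to '0'.
std::string GlobalMaxByCounting(const std::string& number) {
    std::size_t count[10] = {0};
    for (char c : number) ++count[c - '0'];
    std::string largest;
    largest.reserve(number.size());
    for (int d = 9; d >= 0; --d) {
        largest.append(count[d], static_cast<char>('0' + d));
    }
    return largest;
}
```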

Finding the K swap distance takes O(N): simply scan from left to right. At the same time we need to track whether (x,y)/(y,x) pairs appear. My implementation uses a hash map, so tracking a pair takes O(1) on average. Overall it takes O(N) to find the K swap distance.

The digits to swap need to be sorted as well, and this can be done in the same way as finding the global maximal value, by counting the digits. The number of digits involved is proportional to K, so this is O(K).

We also need to keep track of the indices of the digits to swap. The sorted digits are put back from left to right: the largest digit goes to the lowest index and the smallest digit to the highest index, so the indices have to be sorted as well. My implementation uses a std::set (a balanced binary search tree), so this takes O(K*logK). O(K) should still be achievable by using a hash set plus a vector, in the same way the hash map above tracks the (x,y) and (y,x) pairs.
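One way to place the sorted digits without a std::set, assuming O(N) extra work is acceptable since O(N) is already spent finding the global maximal value, is to mark the chosen indices in a boolean vector and refill them from a digit count. PlaceSortedDigits is a hypothetical helper; it gives O(N + K) overall rather than the O(K) placement mentioned above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical alternative to the std::set: mark the chosen indices, count
// their digits, then refill the marked positions from left to right with
// digits from '9' down to '0'. Assumes every index is < number.size().
std::string PlaceSortedDigits(const std::string& number,
                              const std::vector<std::size_t>& indices) {
    std::vector<bool> marked(number.size(), false);
    std::size_t count[10] = {0};
    for (std::size_t i : indices) {
        if (!marked[i]) {  // ignore duplicate indices
            marked[i] = true;
            ++count[number[i] - '0'];
        }
    }
    std::string result(number);
    int digit = 9;
    for (std::size_t i = 0; i < result.size(); ++i) {
        if (!marked[i]) continue;
        while (count[digit] == 0) --digit;  // largest digit still available
        result[i] = static_cast<char>('0' + digit);
        --count[digit];
    }
    return result;
}
```

Because the marked positions are visited in increasing order, no sort of the indices is needed, and the digit count replaces the sort of the digits.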

My implementation is an O(N) + O(K*logK) solution in time complexity and O(N) in space complexity. An O(N+K) solution in time complexity is still achievable.

3. C++ Implementation
// ********************************************************************************
// Implementation
// ********************************************************************************
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

const int TOTAL_NUM_OF_DIGITS = 10;

struct SwapPair {
    SwapPair(char src, char dst)
    : a(src < dst ? src : dst),
    b(src > dst ? src : dst)
    {}

    bool operator()(const SwapPair& rhs) const {
        return a == rhs.a && b == rhs.b;
    }

    char a;
    char b;
};

struct SwapPairInfo {
    SwapPairInfo(char src, char dst, size_t f)
    : aLessThanb(src < dst), frequency(f)
    {}

    bool aLessThanb;
    size_t frequency;
};

struct SwapPairHash{
    size_t operator()(const SwapPair& key) const {
        return std::hash<long>()(key.a) ^
            std::hash<long>()(key.b);
    }

    bool operator()(const SwapPair& k1, const SwapPair& k2) const {
        return k1.operator()(k2);
    }
};

// to track the (x,y) and (y,x) pair
typedef std::unordered_map<SwapPair, SwapPairInfo, SwapPairHash, SwapPairHash> SwapPairHashMap;

std::string FindLargestNumber(const std::string& numbers,
                                                   std::vector<size_t> allDigitsMap[TOTAL_NUM_OF_DIGITS])
{
    for (size_t charIdx = numbers.size(); charIdx > 0; --charIdx) {
        assert(numbers[charIdx - 1] >= '0' && numbers[charIdx - 1] <= '9');
        allDigitsMap[numbers[charIdx - 1] - '0'].push_back(charIdx - 1);
    }

    std::string result;
    for (int num = TOTAL_NUM_OF_DIGITS - 1; num >= 0; --num) {
        if (!allDigitsMap[num].empty()) {
            for (size_t index = 0; index < allDigitsMap[num].size(); ++index) {
                result.push_back('0' + num);
            }
        }
    }

    return result;
}

std::string SwapKcharsBySorting(const std::string& numbers, const std::set<size_t>& swapIndices)
{
    size_t digitsCount[TOTAL_NUM_OF_DIGITS] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
    for (std::set<size_t>::const_iterator cIter = swapIndices.begin();
                             cIter != swapIndices.end(); ++cIter) {
        ++digitsCount[numbers[*cIter] - '0'];
    }

    std::vector<char> sortedSwapChars;
    for (int num = TOTAL_NUM_OF_DIGITS - 1; num >= 0; --num) {
        if (digitsCount[num] > 0) {
            for (size_t index = 0; index < digitsCount[num]; ++index) {
                sortedSwapChars.push_back('0' + num);
            }
        }
    }

    std::string result(numbers);
    size_t index = 0;
    for (std::set<size_t>::const_iterator cIter = swapIndices.begin();
        cIter != swapIndices.end(); ++cIter, ++index) {
        result[*cIter] = sortedSwapChars[index];
    }

    return result;
}

std::string KswapSolution(const std::string& numbers, const size_t k)
{
    if (k == 0 || numbers.empty()) {
        return numbers;
    }

    std::vector<size_t> digitsMap[TOTAL_NUM_OF_DIGITS];
    const std::string largestNum = FindLargestNumber(numbers, digitsMap);
    assert(largestNum.size() == numbers.size());

    if ((k + 1) >= numbers.size()) {
        // special case to reach the maximal value any way
        return largestNum;
    }


    // find the index to swap
    SwapPairHashMap swapPairHM;
    size_t swapCount = k;
    std::set<size_t> swapIndices;
    size_t tempIndexOfDigitsMap[TOTAL_NUM_OF_DIGITS] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
    for (size_t index = 0; index < numbers.size() && swapCount > 0; ++index) {
        if (largestNum[index] != numbers[index]) {
            const SwapPair sp(numbers[index], largestNum[index]);
            SwapPairHashMap::iterator found = swapPairHM.find(sp);
            if (found == swapPairHM.end()) {
                // keep (x,y)
                swapPairHM.insert(std::make_pair(sp,
                    SwapPairInfo(numbers[index], largestNum[index], 1)));
                --swapCount;
                swapIndices.insert(index);
                swapIndices.insert(digitsMap[largestNum[index] - '0'].at(
                    tempIndexOfDigitsMap[largestNum[index] - '0']));
                ++tempIndexOfDigitsMap[largestNum[index] - '0'];
            }
            else {
                bool aLessThanb = numbers[index] < largestNum[index];
                if (found->second.aLessThanb == aLessThanb) {
                    // (x, y) again
                    ++found->second.frequency;
                    --swapCount;
                }
                else {
                    // (y,x) found
                    if (found->second.frequency > 0) {
                        // there is (x,y) available, then do not reduce swapCount
                        --found->second.frequency;
                    }
                    else {
                        // there is not (x,y) available, then change it to (y, x) pair
                        ++found->second.frequency;
                        found->second.aLessThanb = aLessThanb;
                        --swapCount;
                    }
                }

                swapIndices.insert(index);
                swapIndices.insert(digitsMap[largestNum[index] - '0'].at(
                    tempIndexOfDigitsMap[largestNum[index] - '0']));
                ++tempIndexOfDigitsMap[largestNum[index] - '0'];
            }
        }
    }

    return SwapKcharsBySorting(numbers, swapIndices);
}

// ********************************************************************************
// TEST
// ********************************************************************************

#include <cassert>
void TestCornerCases()
{
    assert(KswapSolution("", 1).empty());
    assert(KswapSolution("9", 1) == "9");
    assert(KswapSolution("19", 1) == "91");
    assert(KswapSolution("19", 0) == "19");
    assert(KswapSolution("1234", 3) == "4321");
    assert(KswapSolution("119798699", 8) == "999987611");
}

void TestCases()
{
    assert(KswapSolution("132", 1) == "312");
    assert(KswapSolution("132", 2) == "321");
    assert(KswapSolution("7899", 2) == "9987");
    assert(KswapSolution("8799", 2) == "9987");
    assert(KswapSolution("1189119999", 5) == "9999981111");
    assert(KswapSolution("191899", 3) == "999811");
    assert(KswapSolution("191899", 2) == "999811");
    assert(KswapSolution("34155", 2) == "55143");
    assert(KswapSolution("12520", 2) == "52210");
    assert(KswapSolution("876959", 3) == "998756");
    assert(KswapSolution("6899459999", 5) == "9999998654");
    assert(KswapSolution("68994579999", 5) == "99999987654");
    assert(KswapSolution("689984599382999", 5) == "999999988382654");
    assert(KswapSolution("68994593999", 5) == "99999986543");
    assert(KswapSolution("68994597999", 5) == "99999987654");
    assert(KswapSolution("689945932999", 5) == "999999862543");
    assert(KswapSolution("876959", 2) == "996857");
    assert(KswapSolution("123456789098765432199998888777777266655", 350)
                == "999999888888777777776666655554433222110");
    assert(KswapSolution("123456789098765432199998888777777266655", 35)
                == "999999888888777777776666655554433222110");
    assert(KswapSolution("123456789098765432199998888777777266655", 9)
                == "999999888878765432165438210777777266655");
}

int main()
{
    TestCornerCases();
    TestCases();
    return 0;
}
// ********************************************************************************


2 comments:

  1. can you explain * The first 3 edit difference is: 876 vs. 987
    * The indices of these 4 digits are: 0, 1, 2, 9
    how indices are chosen

  2. 3 edit difference with 4 numbers 6, 7, 8 and 9. In the given value "8765432109", the indices of the 4 digits are 0, 1, 2 and 9.
