Sunday 14 December 2014

Dynamic Programming - Maximizing Gold : Cooperative Play and Known Pots

1. Analysis
In the last two blogs I discussed competitive play with unknown pots, cooperative play with unknown pots and competitive play with known pots. In this blog I am going to discuss cooperative play with known pots.

Compared with competitive play with known pots, the difference lies in the second player's strategy. In competitive play the second player tries to maximize his/her own gold, just as the first player does. In cooperative play the second player tries to minimize his/her own gold so that the first player can maximize his/hers. In other words, in competitive play both players have to bring their best game, whereas in cooperative play the first player has to play the best and the second player has to play the worst.

2. DP's Sub-problem
As discussed above, the strategy for the first player is to maximize his/her gold and the strategy for the second player is to minimize his/her own. In other words, the second player picks the pot that favors the first player, which allows the first player to maximize the gold.

Given a series of gold pots (P1, P2, ..., Pn), let F(Pi, ..., Pj) be the first player's return from any stage of play, where i, j are within [1, n] and i is not greater than j.
For the first player:
    - Pick the left pot, then the return is Pi + F(Pi+1, ..., Pj)
    - Pick the right pot, then the return is Pj + F(Pi, ..., Pj-1)
    - And take the bigger return
For the second player:
    - Pick the left pot, then the return for the first player is F(Pi+1, ..., Pj)
    - Pick the right pot, then the return for the first player is F(Pi, ..., Pj-1)
    - Pick the pot that results in a bigger return for the first player
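
The two cases above can also be filled in bottom-up instead of recursively. The following is only a minimal sketch: the function name MaxGoldCooperativeBottomUp and the table best[i][len] are hypothetical and are not part of the implementation in the next section. It assumes that whose turn it is can be derived from the number of pots already taken, since the players strictly alternate.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only.
// best[i][len] is the first player's gold from the len pots starting at
// index i, assuming both players cooperate from that point on.
std::size_t MaxGoldCooperativeBottomUp(const std::vector<std::size_t>& pots)
{
    const std::size_t n = pots.size();
    // Column len == 0 stays 0: no pots left means nothing more to gain.
    std::vector<std::vector<std::size_t>> best(
        n + 1, std::vector<std::size_t>(n + 1, 0));

    for (std::size_t len = 1; len <= n; ++len) {
        // n - len pots are already gone, so the first player is on the
        // move exactly when that count is even.
        const bool firstPlayersTurn = ((n - len) % 2 == 0);
        for (std::size_t i = 0; i + len <= n; ++i) {
            const std::size_t j = i + len - 1;  // index of the right-most pot
            std::size_t takeLeft = best[i + 1][len - 1];
            std::size_t takeRight = best[i][len - 1];
            if (firstPlayersTurn) {
                // Only the first player's picks add to the return.
                takeLeft += pots[i];
                takeRight += pots[j];
            }
            // Both players steer towards the bigger return for the first player.
            best[i][len] = std::max(takeLeft, takeRight);
        }
    }
    return best[0][n];
}
```

For the pots { 1, 3, 2 } this sketch returns 5: the first player takes 2, the second player takes 1 and the first player takes 3.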

3. C++ Implementation
As with competitive play with known pots, caching is used to avoid duplicate calculation.

// ********************************************************************************
// IMPLEMENTATION
// ********************************************************************************
#include <unordered_map>
#include <vector>

struct GoldPotKey {
    size_t startIndex;
    size_t endIndex;

    GoldPotKey(size_t s, size_t e)
        : startIndex(s), endIndex(e)
    {}

    bool operator==(const GoldPotKey& rhs) const {
        return startIndex == rhs.startIndex &&
               endIndex == rhs.endIndex;
    }
};

struct GoldPotKeyHash {
    size_t operator()(const GoldPotKey& key) const {
        return std::hash<size_t>()(key.startIndex) ^ std::hash<size_t>()(key.endIndex);
    }

    bool operator()(const GoldPotKey& k1, const GoldPotKey& k2) const {
        return k1 == k2;
    }
};

typedef size_t REWARD;
typedef std::unordered_map<GoldPotKey, REWARD, GoldPotKeyHash, GoldPotKeyHash>
                                 GoldPotRewardHashMap;

size_t MaxGoldCooperativeAndKnown_DP(const std::vector<size_t>& goldPots,
                                     const size_t startIndex,
                                     const size_t endIndex,
                                     const bool isFirstPlayer,
                                     GoldPotRewardHashMap& gprhm)
{
    if (startIndex == (endIndex - 1)) {
        if (isFirstPlayer) {
            return goldPots[startIndex];
        }
        return 0;
    }

    const GoldPotKey gpk{ startIndex, endIndex };
    {
        GoldPotRewardHashMap::const_iterator foundIter = gprhm.find(gpk);
        if (foundIter != gprhm.end()) {
            return foundIter->second;
        }
    }

    size_t returnOfOpponentIfTakeTheLeftPot;
    size_t returnOfOpponentIfTakeTheRightPot;
    if (isFirstPlayer)
    {
        returnOfOpponentIfTakeTheLeftPot = goldPots[startIndex] +       
                        MaxGoldCooperativeAndKnown_DP(goldPots,
                                                startIndex + 1, endIndex, false, gprhm);
        returnOfOpponentIfTakeTheRightPot = goldPots[endIndex - 1] + 
                        MaxGoldCooperativeAndKnown_DP(goldPots,
                                                startIndex, endIndex - 1, false, gprhm);
    }
    else {
        returnOfOpponentIfTakeTheLeftPot =
                        MaxGoldCooperativeAndKnown_DP(goldPots, startIndex + 1,
                                                endIndex, true, gprhm);
        returnOfOpponentIfTakeTheRightPot =
                        MaxGoldCooperativeAndKnown_DP(goldPots,
                                                startIndex, endIndex - 1, true, gprhm);
    }
    // in cooperative play both players work towards the first player's gain,
    // so keep the bigger of the two returns for the first player
    size_t expectedReward = 
                returnOfOpponentIfTakeTheLeftPot > returnOfOpponentIfTakeTheRightPot ?
                returnOfOpponentIfTakeTheLeftPot : returnOfOpponentIfTakeTheRightPot;

    gprhm.insert(std::make_pair(gpk, expectedReward));

    return expectedReward;
}

size_t MaxGoldCooperativeAndKnown(const std::vector<size_t>& goldPots)
{
    if (goldPots.empty()) {
        return 0;
    }

    size_t totalPots = goldPots.size();
    if (totalPots == 1) {
        return goldPots[0];
    }

    GoldPotRewardHashMap gprhm;
    return MaxGoldCooperativeAndKnown_DP(goldPots, 0, goldPots.size(), true, gprhm);
}

// ********************************************************************************
// TEST
// ********************************************************************************
#include <cassert>

void TestCases()
{
    std::vector<size_t> goldPots = { 1, 2 };
    assert(MaxGoldCooperativeAndKnown(goldPots) == 2);

    goldPots = { 1, 3, 2 };
    assert(MaxGoldCooperativeAndKnown(goldPots) == 5);

    goldPots = { 1, 2, 10, 3 };
    assert(MaxGoldCooperativeAndKnown(goldPots) == 13);

    goldPots = { 4, 1, 2, 10 };
    assert(MaxGoldCooperativeAndKnown(goldPots) == 14);

    goldPots = { 4, 1, 2, 10, 3 };
    assert(MaxGoldCooperativeAndKnown(goldPots) == 17);

    goldPots = { 5, 4, 1, 2, 10 };
    assert(MaxGoldCooperativeAndKnown(goldPots) == 19);

    goldPots = { 5, 4, 1, 2, 10, 3 };
    assert(MaxGoldCooperativeAndKnown(goldPots) == 19);
}
// ********************************************************************************

Wednesday 10 December 2014

Dynamic Programming - Maximizing Gold : Competitive Play and Known Pots

1. Analysis 
As the two players play competitively, both sides try to gain as much gold as possible. The first player has the privilege of picking first and therefore has the edge. The second player has to do the best he/she can with the pots remaining after the first player's pick.

The ultimate goal for the first player is to achieve the global maximal gold. As discussed in the last blog, there might be multiple ways of picking that reach that global maximum. In some scenarios, however, every pick is critical, because there is only one global maximum and only one sequence of picks that reaches it.

Let's take a look at how this game is played. The first player picks first, then the second player, then the first player again, and so on in turn until the gold pots run out. Each pick is independent of what has happened before. Even though the players may have made mistakes earlier, the pick they have to decide on now has to bring them the most gold from the remaining pots. As each pick is independent and this is a maximization problem, dynamic programming is a good solution for this kind of problem.

2. DP's Sub-problem
Given a series of gold pots (P1, P2, ..., Pn), the game play is F(P1, ..., Pn). The first player has the advantage of picking first. Two options are available: picking P1 or Pn. The goal is to maximize the gold, so pick whichever of P1 or Pn brings more gold.
    - If pick P1, then the return of gold is P1 + F(P2, ..., Pn)
    - If pick Pn, then the return of gold is Pn + F(P1, ..., Pn-1)
    - Pick the pot that has more return

After the very first pick, both players try to maximize their own return. In other words, both players try to pick the pot that is least favorable to the opponent, because the total amount of gold is fixed: the more you gain, the less the opponent gets. Consider any stage of play F(Pi, ..., Pj), where i <= j and both are within [1, n]. All the gold remaining is SUM = sigma(Pk), where k is within [i, j].
    - If pick Pi, then the return is Pi + F(Pi+1, ..., Pj)
            and the return of the opponent is SUM - (Pi + F(Pi+1, ..., Pj))
    - If pick Pj, then the return is Pj + F(Pi, ..., Pj-1)
            and the return of the opponent is SUM - (Pj + F(Pi, ..., Pj-1))
    - Pick the pot in the least favor of the opponent
            * Pick Pi if SUM - (Pi + F(Pi+1, ..., Pj)) < SUM - (Pj + F(Pi, ..., Pj-1))
            * Pick Pj if SUM - (Pi + F(Pi+1, ..., Pj)) > SUM - (Pj + F(Pi, ..., Pj-1))

Keep in mind that in this game we are looking for the return of the first player. The DP function has to return the gain of the first player even when it is the second player's turn. This is why the goal of both players is interpreted as picking the pot that is least favorable to the opponent.
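
The sub-problem can also be written as a bottom-up table. The sketch below is a restatement under stated assumptions, not the implementation in the next section: the name MaxGoldCompetitiveBottomUp and the tables prefix and mover are hypothetical. Instead of tracking the first player's gain directly, it stores the most gold the player on the move can collect from a range, which is SUM minus the best the opponent can then collect from what is left.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only.
// mover[i][len] is the most gold the player whose turn it is can collect
// from the len pots starting at index i when both sides play their best.
std::size_t MaxGoldCompetitiveBottomUp(const std::vector<std::size_t>& pots)
{
    const std::size_t n = pots.size();

    // prefix[k] holds the gold of the first k pots, so the SUM of any
    // range is a single subtraction.
    std::vector<std::size_t> prefix(n + 1, 0);
    for (std::size_t k = 0; k < n; ++k) {
        prefix[k + 1] = prefix[k] + pots[k];
    }

    std::vector<std::vector<std::size_t>> mover(
        n + 1, std::vector<std::size_t>(n + 1, 0));
    for (std::size_t len = 1; len <= n; ++len) {
        for (std::size_t i = 0; i + len <= n; ++i) {
            const std::size_t sum = prefix[i + len] - prefix[i];
            // After this pick the opponent is on the move in the remaining
            // range. Leave the opponent the less rewarding of the two ranges.
            const std::size_t opponentBest =
                std::min(mover[i + 1][len - 1], mover[i][len - 1]);
            mover[i][len] = sum - opponentBest;
        }
    }
    // The first player is on the move for the full range.
    return mover[0][n];
}
```

For the pots { 1, 2, 10, 3 } this sketch returns 11: the first player takes 1, and whichever end the second player then takes, the pot of 10 becomes available to the first player.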

3. C++ Implementation
Caching is used to avoid searching duplicate paths. Because the two players adopt the same strategy, the only thing that matters is which gold pots remain. Therefore the pair of start and end indices of the remaining gold pots is used as the hash key.
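
Since the key is just a pair of indices bounded by the number of pots, the same cache could also be kept in a flat table instead of a hash map. This is only a sketch of that alternative, assuming half-open index ranges as in the implementation; the class RangeCache and its members are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical alternative to the hash map, for illustration only.
// Every (startIndex, endIndex) pair with both indices in [0, potCount]
// maps to its own slot, so two different ranges never share an entry.
class RangeCache {
public:
    explicit RangeCache(std::size_t potCount)
        : width_(potCount + 1),
          known_(width_ * width_, false),
          value_(width_ * width_, 0)
    {}

    // Returns true and fills reward if the range has been cached before.
    bool Find(std::size_t startIndex, std::size_t endIndex,
              std::size_t& reward) const {
        const std::size_t slot = startIndex * width_ + endIndex;
        if (!known_[slot]) {
            return false;
        }
        reward = value_[slot];
        return true;
    }

    void Insert(std::size_t startIndex, std::size_t endIndex,
                std::size_t reward) {
        const std::size_t slot = startIndex * width_ + endIndex;
        known_[slot] = true;
        value_[slot] = reward;
    }

private:
    std::size_t width_;                // potCount + 1 possible index values
    std::vector<bool> known_;          // whether a slot has been filled
    std::vector<std::size_t> value_;   // cached reward per slot
};
```

The trade-off is memory: the table always reserves (n + 1) * (n + 1) slots for n pots, while the hash map only stores the ranges that are actually visited.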

// ********************************************************************************
// IMPLEMENTATION
// ********************************************************************************
#include <numeric>
#include <unordered_map>
#include <vector>

struct GoldPotKey {
    size_t startIndex;
    size_t endIndex;

    GoldPotKey(size_t s, size_t e)
        : startIndex(s), endIndex(e)
    {}

    bool operator==(const GoldPotKey& rhs) const {
        return startIndex == rhs.startIndex &&
               endIndex == rhs.endIndex;
    }
};

struct GoldPotKeyHash {
    size_t operator()(const GoldPotKey& key) const {
        return std::hash<size_t>()(key.startIndex) ^ std::hash<size_t>()(key.endIndex);
    }

    bool operator()(const GoldPotKey& k1, const GoldPotKey& k2) const {
        return k1 == k2;
    }
};

typedef size_t REWARD;
typedef std::unordered_map<GoldPotKey, REWARD, GoldPotKeyHash, GoldPotKeyHash>
                        GoldPotRewardHashMap;

size_t MaxGoldCompetitiveAndKnown_DP(const std::vector<size_t>& goldPots,
                                     const size_t startIndex,
                                     const size_t endIndex,
                                     const size_t sum,
                                     GoldPotRewardHashMap& gprhm)
{
    if (startIndex == (endIndex - 1)) {
        return 0;
    }

    const GoldPotKey gpk{ startIndex, endIndex };
    {
        GoldPotRewardHashMap::const_iterator foundIter = gprhm.find(gpk);
        if (foundIter != gprhm.end()) {
            return foundIter->second;
        }
    }

    size_t returnOfOpponentIfTakeTheLeftPot = goldPots[startIndex] +
                                MaxGoldCompetitiveAndKnown_DP(goldPots,
                                       startIndex + 1, endIndex, sum - goldPots[startIndex], gprhm);
    size_t returnOfOpponentIfTakeTheRightPot = goldPots[endIndex - 1] +
                                MaxGoldCompetitiveAndKnown_DP(goldPots,
                                       startIndex, endIndex - 1, sum - goldPots[endIndex - 1], gprhm);
    // the player to move takes the bigger return, and what is left of the
    // remaining gold (sum) goes to the other player
    REWARD expectedReward;
    if (returnOfOpponentIfTakeTheLeftPot < returnOfOpponentIfTakeTheRightPot) {
        expectedReward = sum - returnOfOpponentIfTakeTheRightPot;
    }
    else {
        expectedReward = sum - returnOfOpponentIfTakeTheLeftPot;
    }

    gprhm.insert(std::make_pair(gpk, expectedReward));

    return expectedReward;
}

size_t MaxGoldCompetitiveAndKnown(const std::vector<size_t>& goldPots)
{
    if (goldPots.empty()) {
        return 0;
    }

    size_t totalPots = goldPots.size();
    if (totalPots == 1) {
        return goldPots[0];
    }

    GoldPotRewardHashMap gprhm;
    size_t sum = std::accumulate(goldPots.begin(), goldPots.end(), size_t(0));
    size_t returnOfTakeTheLeft = goldPots[0] +
        MaxGoldCompetitiveAndKnown_DP(goldPots, 1, totalPots, sum - goldPots[0], gprhm);
    size_t returnOfTakeTheRight = goldPots[totalPots - 1] +
        MaxGoldCompetitiveAndKnown_DP(goldPots, 0, totalPots - 1,
                               sum - goldPots[totalPots - 1], gprhm);
    size_t maxGold = returnOfTakeTheLeft > returnOfTakeTheRight ?
                                  returnOfTakeTheLeft : returnOfTakeTheRight;
    return maxGold;
}
// ********************************************************************************
// TEST
// ********************************************************************************
#include <cassert>

void TestCases()
{
    std::vector<size_t> goldPots = { 1, 2 };
    assert(MaxGoldCompetitiveAndKnown(goldPots) == 2);
 
    goldPots = { 1, 3, 2 };
    assert(MaxGoldCompetitiveAndKnown(goldPots) == 3);
 
    goldPots = { 1, 2, 10, 3 };
    assert(MaxGoldCompetitiveAndKnown(goldPots) == 11);
 
    goldPots = { 4, 1, 2, 10 };
    assert(MaxGoldCompetitiveAndKnown(goldPots) == 12);
 
    goldPots = { 4, 1, 2, 10, 3 };
    assert(MaxGoldCompetitiveAndKnown(goldPots) == 9);
 
    goldPots = { 5, 4, 1, 2, 10 };
    assert(MaxGoldCompetitiveAndKnown(goldPots) == 15);
 
    goldPots = { 5, 4, 1, 2, 10, 3 };
    assert(MaxGoldCompetitiveAndKnown(goldPots) == 16);
}
// ********************************************************************************

Monday 8 December 2014

Dynamic Programming - Maximizing Gold : Competitive/Cooperative Play with Unknown Pots

1. Problem Description
Yet again this is a Google interview question for interns from careercup. The following is the original problem description from that thread.

"Question: Two players A and B are playing a game. Pots of gold, each with 
varying number of coins are placed in a single line. The rules of the game are:
1) Players play turn by turn.
2) On each turn a player can pick a pot of gold from either end of the line. He
gets all the gold in that pot. The next pot of gold on that end is now available
for picking.
What is the maximum number of gold can the first player get ?"

2. Analysis
As some contributors on that thread pointed out, the owner of the thread was not very clear about two questions:
    - Are the two players playing competitively or cooperatively?
    - Are the gold pots known to the two players?

If the gold pots are unknown to the two players and the only known pots are the two available to pick at the two ends, then there is not really much strategy to play. Only very limited information is available:
    - The two pots available to pick at the two ends
    - The gold that the two players are holding before the next pick
Assume that the amount of gold in each pot and the number of pots are completely random and that each game is independent. Therefore repeating many games does not really give any statistical hint for the next games. All the two players can do is make a pick based on the very limited information.

If the gold pots are known to the two players, then no matter whether the players play competitively or cooperatively there is a global optimal/maximal gold that the first player can gain. If there is more than one way of picking that reaches the maximal gold for the first player (for instance when all pots hold the same amount of gold), any one of those ways is what we are looking for. If there is only one way, then each pick of the first player is critical, because only that one sequence of picks takes the first player to the global maximum. This will be a game between two players who can think far ahead.

In this blog I will discuss the scenarios in which the gold pots are unknown to the two players. The two players can play either competitively or cooperatively.

3. Solution
As briefly discussed in the last section, very limited information is available to the two players. Only the two pots at the two ends and the gold they are currently holding can be considered. My strategy is rather simple as well: focus on the short run, as not enough information is available to plan for the long run.
    - Competitive play: both players pick the bigger of the two pots at the ends
    - Cooperative play: the first player picks the bigger pot and the second player picks the smaller pot
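
Both rules differ only in what the second player does, so they can be traced with a single loop. The following is a minimal sketch under its own assumptions: the function GreedyFirstPlayerGold and its cooperative flag are hypothetical names, and the loop is assumed to run until every pot has been taken. On a tie it treats the right pot as the bigger one.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only.
// Simulates the short-term strategy and returns the first player's gold.
std::size_t GreedyFirstPlayerGold(const std::vector<std::size_t>& pots,
                                  bool cooperative)
{
    std::size_t left = 0;
    std::size_t right = pots.size();  // half-open range [left, right)
    std::size_t gold = 0;
    bool firstPlayersTurn = true;

    while (left < right) {
        const bool leftIsBigger = pots[left] > pots[right - 1];
        // The first player always wants the bigger end. The second player
        // wants the bigger end in competitive play and the smaller end in
        // cooperative play.
        const bool wantBigger = firstPlayersTurn || !cooperative;
        const bool takeLeft = (wantBigger == leftIsBigger);

        if (firstPlayersTurn) {
            gold += takeLeft ? pots[left] : pots[right - 1];
        }
        if (takeLeft) {
            ++left;
        } else {
            --right;
        }
        firstPlayersTurn = !firstPlayersTurn;
    }
    return gold;
}
```

For the pots { 1, 2, 10, 3 } the sketch gives 13 in cooperative play (the first player takes 3 and 10) and 5 in competitive play (the first player takes 3, the second player then takes 10).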

4. C++ Implementation
// ********************************************************************************
// IMPLEMENTATION
// ********************************************************************************
#include <vector>

size_t MaxGoldCooperativeAndUnknown(const std::vector<size_t>& goldPots)
{
    if (goldPots.empty()) {
        return 0;
    }

    size_t endIndex = goldPots.size() - 1;
    if (endIndex == 0) {
        return goldPots[0];
    }

    size_t gold = 0;
    size_t whoseTurn = 0; // Even: first player; Odd: second player
    for (size_t startIndex = 0; startIndex < endIndex;) {
        if (goldPots[startIndex] > goldPots[endIndex]) {
            if (whoseTurn & 1) {
                --endIndex; // the second player pick up smaller one
            }
            else {
                gold += goldPots[startIndex];
                ++startIndex; // the first player pick up bigger pot
            }
        }
        else
        {
            if (whoseTurn & 1) {
                ++startIndex; // the second player pick up smaller one
            }
            else {
                gold += goldPots[endIndex];
                --endIndex; // the first player pick up bigger pot
            }
   
        }
        ++whoseTurn;
    }

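    // Exactly one pot is left at this point (startIndex has met endIndex).
    // It goes to the first player if it is his/her turn.
    if ((whoseTurn & 1) == 0) {
        gold += goldPots[endIndex];
    }

    return gold;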
}

size_t MaxGoldCompetitiveAndUnknown(const std::vector<size_t>& goldPots)
{
    if (goldPots.empty()) {
        return 0;
    }

    size_t endIndex = goldPots.size() - 1;
    if (endIndex == 0) {
        return goldPots[0];
    }

    size_t gold = 0;
    size_t whoseTurn = 0; // Even: first player; Odd: second player
    for (size_t startIndex = 0; startIndex < endIndex;) {
        if (goldPots[startIndex] > goldPots[endIndex]) {
            if (whoseTurn & 1) {
                ++startIndex; // the second player pick up bigger one
            }
            else {
                gold += goldPots[startIndex];
                ++startIndex; // the first player pick up bigger pot
            }
        }
        else
        {
            if (whoseTurn & 1) {
                --endIndex; // the second player pick up bigger one
            }
            else {
                gold += goldPots[endIndex];
                --endIndex; // the first player pick up bigger pot
            }

        }
        ++whoseTurn;
    }

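    // Exactly one pot is left at this point (startIndex has met endIndex).
    // It goes to the first player if it is his/her turn.
    if ((whoseTurn & 1) == 0) {
        gold += goldPots[endIndex];
    }

    return gold;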
}
// ********************************************************************************

5. Reflection on the Strategy
Because of the limited information available and the simple strategy, this is really not a fun game to play. The result does not depend on how well the two players play; it depends completely on how the gold pots are generated.

Another question is whether this is a win-or-lose game. If yes, then a player can consider taking a disadvantageous pick in the short run in order to gain more in the long run. For instance, if one player knows that he/she holds much less gold than the opponent, then it is probably worth taking the risk of some bold picks in order to swing the game. This can't be called a strategy; it is really just a gamble, because under the assumption in Section 3 there is no evidence that taking disadvantageous picks leads to a different game result.

Surely with known gold pots it would be much more fun to play, because each pick of the first player can be the deciding moment for whether he/she can reach the global maximal gold. I will discuss it in my next two blogs.