Wednesday 10 December 2014

Dynamic Programming - Maximizing Gold : Competitive Play and Known Pots

1. Analysis 
As the two players are to play competitively, both sides are trying to gain as much gold as possible. The first player has the privilege to pick first then has the edge. The second player has to do the best given the remaining pots after the first player's pick.

The ultimate goal for the first player is to achieve the global maximal gold. As we discussed in the last blog, there might be multiple ways of picking to reach that global maximal gold. But for some scenarios each picking will be critical if there is only one global maximal and only one way of picking to reach it.

Let's take a look at how this game is played. First of all the first player has the advantage to pick first. Afterwards the second player, the first player, ... in turn until running out of the gold pots. For each picking it is independent on what has happened before. Even thought players make some mistakes, for now this picking they have to decide has to bring them the most gold in the remaining gold pots. As we know that each picking is independent and this is a maximization problem, dynamic programming is good solution for this kind of problems

2. DP's Sub-problem
Given a serial gold pots (P1, P2, ..., Pn) the game play is F(P1, ..., Pn). The first player has the advantage as taking the pick first. Two options are available, picking P1 or Pn. The goal to maximize the gold, therefore pick P1 or Pn which can bring more gold.
    - If pick P1, then the return of gold is P1 + F(P2, ..., Pn)
    - If pick Pn, then the return of gold is Pn + F(P1, ..., Pn-1)
    - Pick the pot that has more return

After the very first pick, both players are trying to maximize their return. In other words both players are trying to pick the pot in the least favor of the opponent, because the total amount of gold is fixed. The more you gain, the less the opponent gets. At any stage of play F(Pi, ..., Pj), where i <= j and both within [1, n]. All the gold remaining is SUM = sigma(Pk), where k within[i, j].
    - If pick Pi, then the return is Pi + F(Pi+1, ..., Pj)
            and the return of the opponent is SUM - (Pi + F(Pi+1, ..., Pj)
    - If pick Pj, then the return is Pj + F(Pi, ..., Pj-1)
            and the return of the opponent is SUM - (Pj + F(Pi, ..., Pj-1)
    - Pick the pot in the least favor of the opponent
            * Pick Pi if, SUM - (Pi + F(Pi+1, ..., Pj) < SUM - (Pj + F(Pi, ..., Pj-1)
            * Pick Pj if, SUM - (Pi + F(Pi+1, ..., Pj) > SUM - (Pj + F(Pi, ..., Pj-1)

Keep in mind in this game we are looking for the return of the first player. The DP function has to return the gain of the first player even though it is the 2nd player is playing. This is why we have interpret the goal of two player is to pick the plot in the least favor of the opponent.

3. C++ Implementation
Caching techniques has been used to save the duplicate searching path. Because two players are adopting the same strategy, the real important thing is remaining gold pots. Therefore the pair of start and end indices of gold pots is used as the hash key.

// ********************************************************************************
// IMPLEMENTATION
// ********************************************************************************
#include <numeric>
#include <unordered_map>
#include <vector>

struct GoldPotKey {
    size_t startIndex;
    size_t endIndex;

    GoldPotKey(size_t s, size_t e)
        : startIndex(s), endIndex(e)
    {}

    bool operator==(const GoldPotKey& rhs) const {
        return startIndex == rhs.startIndex &&
               endIndex == rhs.endIndex;
    }
};

struct GoldPotKeyHash {
    size_t operator()(const GoldPotKey& key) const {
        return std::hash<double>()(key.startIndex) ^ std::hash<size_t>()(key.endIndex);
    }

    bool operator()(const GoldPotKey& k1, const GoldPotKey& k2) const {
        return k1 == k2;
    }
};

typedef size_t REWARD;
typedef std::unordered_map<GoldPotKey, REWARD, GoldPotKeyHash, GoldPotKeyHash>
                        GoldPotRewardHashMap;

size_t MaxGoldCompetitiveAndKnown_DP(const std::vector<size_t>& goldPots,
                                     const size_t startIndex,
                                     const size_t endIndex,
                                     const size_t sum,
                                     GoldPotRewardHashMap& gprhm)
{
    if (startIndex == (endIndex - 1)) {
        return 0;
    }

    const GoldPotKey gpk{ startIndex, endIndex };
    {
        GoldPotRewardHashMap::const_iterator foundIter = gprhm.find(gpk);
        if (foundIter != gprhm.end()) {
            return foundIter->second;
        }
    }

    size_t returnOfOpponentIfTakeTheLeftPot = goldPots[startIndex] +
                                MaxGoldCompetitiveAndKnown_DP(goldPots,
                                       startIndex + 1, endIndex, sum - goldPots[startIndex], gprhm);
    size_t returnOfOpponentIfTakeTheRightPot = goldPots[endIndex - 1] +
                                MaxGoldCompetitiveAndKnown_DP(goldPots,
                                       startIndex, endIndex - 1, sum - goldPots[endIndex - 1], gprhm);
    // the second player tries to maximize his/her gold, and take the bigger
    // return and the left is for the first player
    REWARD expectedReward;
    if (returnOfOpponentIfTakeTheLeftPot < returnOfOpponentIfTakeTheRightPot) {
        expectedReward = sum - returnOfOpponentIfTakeTheRightPot;
    }
    else {
        expectedReward = sum - returnOfOpponentIfTakeTheLeftPot;
    }

    gprhm.insert(std::make_pair(gpk, expectedReward));

    return expectedReward;
}

size_t MaxGoldCompetitiveAndKnown(const std::vector<size_t>& goldPots)
{
    if (goldPots.empty()) {
        return 0;
    }

    size_t totalPots = goldPots.size();
    if (totalPots == 1) {
        return goldPots[0];
    }

    GoldPotRewardHashMap gprhm;
    size_t sum = std::accumulate(goldPots.begin(), goldPots.end(), 0);
    size_t returnOfTakeTheLeft = goldPots[0] +
        MaxGoldCompetitiveAndKnown_DP(goldPots, 1, totalPots, sum - goldPots[0], gprhm);
    size_t reutrnOfTakeTheRight = goldPots[totalPots - 1] +
        MaxGoldCompetitiveAndKnown_DP(goldPots, 0, totalPots - 1,
                               sum - goldPots[totalPots - 1], gprhm);
    size_t maxGold = returnOfTakeTheLeft > reutrnOfTakeTheRight ?
                                  returnOfTakeTheLeft : reutrnOfTakeTheRight;
    return maxGold;
}
// ********************************************************************************
// TEST
// ********************************************************************************
#include <cmath>

void TestCases()
{
    std::vector<size_t> goldPots = { 1, 2 };
    assert(MaxGoldCompetitiveAndKnown(goldPots) == 2);
 
    goldPots = { 1, 3, 2 };
    assert(MaxGoldCompetitiveAndKnown(goldPots) == 3);
 
    goldPots = { 1, 2, 10, 3 };
    assert(MaxGoldCompetitiveAndKnown(goldPots) == 11);
 
    goldPots = { 4, 1, 2, 10 };
    assert(MaxGoldCompetitiveAndKnown(goldPots) == 12);
 
    goldPots = { 4, 1, 2, 10, 3 };
    assert(MaxGoldCompetitiveAndKnown(goldPots) == 9);
 
    goldPots = { 5, 4, 1, 2, 10 };
    assert(MaxGoldCompetitiveAndKnown(goldPots) == 15);
 
    goldPots = { 5, 4, 1, 2, 10, 3 };
    assert(MaxGoldCompetitiveAndKnown(goldPots) == 16);
  }
// ********************************************************************************

No comments:

Post a Comment