1, Problem description
The following is the original problem description on math stackexchange.
"Since future market prices, and the effect of large sales on these prices,
are very hard to predict, brokerage firms use models of the market to help them
make such decisions. In this problem, we will consider the following simple
model. Suppose we need to sell “x” shares of the stock in a company, and suppose
that we have an accurate model of the market: it predicts that the stock price
will take the values P1, P2, …, Pn over the next “n” days. Moreover, there is a
function f(.) that predicts the effect of large sales: if we sell “y” shares on
a single day, it will permanently decrease the price by f(y) from that day
onward. So, if sell “y1” shares on day 1, we obtain a price per share of
"p1-f(y1)”, for a total income of “y1•(p1-f(y1))”. Having sold “y1” shares on
day 1, we can then sell “y2” shares on day 2 for a price per share of
“p2-f(y1)-f(y2)”; this yields an additional income of “y2•(p2-f(y1)-f(y2))”.
This process continues over all “n” days. (NOTE, as in our calculation for day
2, that the decreases from earlier days are absorbed into the prices for all
later days.)
Design an efficient algorithm that takes the prices “p1,…,pn” and the
function f(•) (written as a list of values f(1), f(2), …, f(x)) and determines
the best way to sell “x” shares by day “n”. In other words, find natural numbers
“y1, y2,…, yn” so that "x=y1+..+yn” and selling y1 shares on day “i” for “i=1,
2,…,n” maximizes the total income achievable. You should assume that the share
“p1”, is monotone decreasing, and f(y) is monotone increasing; that is, selling
a larger number of shares causes a large drop in the price. Your algorithm’s
running time can have a polynomial dependence on “n” (the number of days), “x”
(the number of shares), and “y” (the peak price of the stock).
EXAMPLE: Consider the case when n=3, the prices for me three days are 90, 80,
40; and f(y) = 1 for y≤40000 and f(y) = 20 for y>40000. Assume you start whit
x=100000 shares. Selling all of them on day 1 would yield a price of 70 per
share, for a total income of 7000000. On the other hand, selling 40000 shares on
day 1 yields a price of 89 per share, and selling the remaining 60000 shares on
day 2 results in a price of 59 per share, for a total income of 7000000."
2. Analysis
From the applied mathematical point of view this is a typical optimization problem. Its objective is to maximize the sale revenue under the time and price constraint.
Objective function: max(sigma(yi*pi - sigma(f(yj))*yi)), where
i: within [1, n], - time constraint to sell all the shares
yi: shares to sale at i-th day, subject to
sigma(yi) = x, - quantity constraint to sell all the shares within n days
sigma(f(yj)): j within [1, i] - accumulated price discount due to stock sale in
previous i-1 days and the current the i-th day
Any modelling software can be applied to solve this problem, for instance Matlab[1] and gPROMS ModelBuilder[2].
Here I would like to use dynamic programming to solve this problem, simply utilizing modern tools computation power and endless memory. Here is the dynamic programming formulation:
MaxSale(P, X, N) = max{yi*pi - sigma(f(yj))*pi + MaxSale(P, X-yi, N-1)} OR
= yn*X - sigma(f(yj))*X, when come to the last day - have to sell all no
matter what price it is to maximize the revenue
, where
yi within [0, X]
3. C++ Implementation
// Code
//********************************************************************************
#include <ctime>
#include <iostream>
#include <unordered_map>
// Generate pi
double GetPredictedSharePrice(size_t day)
{
// Hard coded for the example
const double prices[] = { 90.0, 80.0, 40.0, 30.0 };
return prices[day];
}
// Generate f(yi)
double GetEffectOnSharePriceFromLargeSales(size_t shares)
{
if (shares == 0) {
return 0;
}
if (shares <= 40000) {
return 1;
}
else {
return 20;
}
}
// Key used for hash table in caching purpose
struct StockExchangeKey {
size_t cur_day;
size_t remaining_share;
double accumalted_price_discount;
bool operator() (const StockExchangeKey& right) const {
return cur_day == right.cur_day &&
remaining_share == right.remaining_share &&
accumalted_price_discount == right.accumalted_price_discount;
}
};
// Hash function for use-defined hash key
struct StockExchangeKey_Hash {
size_t operator() (const StockExchangeKey& key) const {
return std::hash<size_t>()(key.cur_day) ^
std::hash<size_t>()(key.remaining_share) ^
std::hash<double>()(key.accumalted_price_discount);
}
bool operator() (const StockExchangeKey& k1, const StockExchangeKey& k2) const {
return k1.operator()(k2);
}
};
typedef std::unordered_map<StockExchangeKey, double, StockExchangeKey_Hash, StockExchangeKey_Hash> StockExchangeHashMap;
// dynamic programming solution
double StrategyForSale(size_t duration_days,
size_t minimal_quantity_for_sale,
size_t remaining_shares,
size_t cur_day,
double accumulated_price_effect_over_sale,
StockExchangeHashMap& SE_Cache)
{
if (remaining_shares == 0) {
return 0;
}
double priceOfDay = GetPredictedSharePrice(cur_day);
if ((cur_day + 1) == duration_days) {
// At the last day, sell whatever in hand with
// whatever price to cash in
double priceForSale = priceOfDay - accumulated_price_effect_over_sale -
GetEffectOnSharePriceFromLargeSales(remaining_shares);
if (priceForSale > 0) {
return priceForSale * remaining_shares;
}
// should not hit here because the share price must be > 0
return 0;
}
// Look for computed sub-problem in the hash table
StockExchangeKey key{ cur_day,
remaining_shares,
accumulated_price_effect_over_sale };
{
StockExchangeHashMap::const_iterator constIt = SE_Cache.find(key);
if (constIt != SE_Cache.end()) {
return constIt->second;
}
}
double maxShareValue = 0;
double shareValue;
double accumPriceEffectOverSale;
for (size_t sharesForSale = 0; sharesForSale <= remaining_shares;
sharesForSale += minimal_quantity_for_sale) {
accumPriceEffectOverSale = accumulated_price_effect_over_sale +
GetEffectOnSharePriceFromLargeSales(sharesForSale);
// values of selling the number of "shareForSale" shares
shareValue = (priceOfDay - accumPriceEffectOverSale) * sharesForSale;
// add up the values of rest of share values
shareValue += StrategyForSale(duration_days,
minimal_quantity_for_sale,
remaining_shares - sharesForSale,
cur_day + 1,
accumPriceEffectOverSale,
SE_Cache);
if (shareValue > maxShareValue) {
maxShareValue = shareValue;
}
}
// cache the sub-problem
SE_Cache.insert(std::make_pair(key, maxShareValue));
// std::cout << "maxShareValue: " << maxShareValue << std::endl;
return maxShareValue;
}
double MaxValueOfStockExchange(size_t days, size_t minimalQuantityForSale, size_t shares)
{
StockExchangeHashMap seCache;
return StrategyForSale(days, minimalQuantityForSale,shares, 0, 0.0, seCache);
}
// test
int _tmain(int argc, _TCHAR* argv[])
{
clock_t startTime = clock();
double maxValue = MaxValueOfStockExchange(3, 100, 100000); // 7.42e+6
std::cout << "N = 3, step = 100, X = 100000: "
<< double(clock() - startTime) / (double)CLOCKS_PER_SEC
<< std::endl;
std::cout << "maxValue: " << maxValue << std::endl;
startTime = clock();
maxValue = MaxValueOfStockExchange(3, 10, 100000);
std::cout << "N = 3, step = 10, X = 100000: "
<< double(clock() - startTime) / (double)CLOCKS_PER_SEC
<< std::endl;
std::cout << "maxValue: " << maxValue << std::endl;
startTime = clock();
maxValue = MaxValueOfStockExchange(3, 1, 100000);
std::cout << "N = 3, step = 1, X = 100000: "
<< double(clock() - startTime) / (double)CLOCKS_PER_SEC
<< std::endl;
std::cout << "maxValue: " << maxValue << std::endl;
startTime = clock();
maxValue = MaxValueOfStockExchange(4, 100, 100000);
std::cout << "N = 4, step = 100, X = 100000: "
<< double(clock() - startTime) / (double)CLOCKS_PER_SEC
<< std::endl;
std::cout << "maxValue: " << maxValue << std::endl;
startTime = clock();
maxValue = MaxValueOfStockExchange(4, 10, 100000);
std::cout << "N = 4, step = 10, X = 100000: "
<< double(clock() - startTime) / (double)CLOCKS_PER_SEC
<< std::endl;
std::cout << "maxValue: " << maxValue << std::endl;
startTime = clock();
maxValue = MaxValueOfStockExchange(4, 1, 100000);
std::cout << "N = 4, step = 1, X = 100000: "
<< double(clock() - startTime) / (double)CLOCKS_PER_SEC
<< std::endl;
std::cout << "maxValue: " << maxValue << std::endl;
return 0;
}
//********************************************************************************
4. Further Discussion
As we discussed in Section 2. Using dynamic programming to solve this problem is simply using the brutal computing force of computer. And how complex this solution is depends on the search space and how it is quantified. Based on the nature of this problem the search space can be easily/intuitively discretized via day in time constraint and via per share in quantity constraint.
Search depth:
This is the decided by the time constraint, via per day, N days. y1 + y2 + ... + yn = X. For each solution fist fix y1, then y2, then ..., yn = X - (y1 + y2 + ... + yn-1) in order not to violate this constraint. This also means that the search depth is the how many days in advance this optimization problem is constrained to.
Search Breadth:
This is dependent on how many options for each yi has. For y1 it has (X+1) options, from 0, 1, ..., X-1, X. For y2 it has (X - y1 + 1) options, 0, 1, 2, , ..., X- y1. And finally yn it has (X - y1 - ... - yn-1 + 1) options, 0, 1, 2, ..., X - y1 - y2 - ... - yn-1.
Based on the search depth and search breadth, the complexity of this algorithm is exponential, ~O(POW(X, N)). This is not really a valid algorithm solution and something has to be done to improve this algorithm.
Caching:
As the examples I have shown caching before in Heavy Numbers (Part I) and Heavy Numbers (Part II), this problem has no exception. The caching technique is also applicable to this problem in order to avoid the repeated computing of the same sub-problems. It is worth pointing out that caching does not improve the performance when N = 3, because all the sub-problems are the leaf-nodes which are not cached in this implementation. Because I believe the simple computation should be faster than pushing into hash map, computing the hash value and locating the record. The caching will kick in when N >=4.
Here is the table of time spent to solve the example in Section 1. The search breadth is equal to the total share to sell divided by minimal sale.
// Table - Performance on Microsoft Visual Studio 2013, Win32 and Release
// Machine: Intel(R) Core(TM) i3-3227U CPU @1.9GHz 1.9GhHz, 6G RAM
// Window 8.1 64-bit
//********************************************************************************
X = 100000, X = 100000 X = 100000
Minimal Sale = 100 Minimal Sale = 10 Minimal Sale = 1
Search Breadth = 1000 Search Breadth = 10000 Search Breadth = 100000
N = 3 0.019 seconds 1.026 seconds 82.389 seconds
N = 4 0.186 seconds 8.312 seconds 1070.13 seconds
//********************************************************************************
From the performance of these experiments, we can see how the "Minimal Sale" is to affect the performance. Literally the bigger, the faster the solution runs. The explanation is quite simple - it reduce the search breadth. As for N there is really nothing we can do about it because the bigger it is the more advantage the business can gain by the assumption that this optimization does bring maximal profits. Ideally the larger N is, the better it is for business and the slower it runs.
Output:
Simply returning the maximal value of stock sale may not be the ideal for business, because a dealer may care more how to reach this maximal value. Both the strategy to sale, y1, y2, ..., yn, and the maximal value of the sale should be returned.
Bibliography:
[1] http://www.mathworks.co.uk/products/matlab/
[2] http://www.psenterprise.com/modelbuilder.html
Saturday, 20 September 2014
Saturday, 13 September 2014
Dynamic Programming - Game Value of Rolling N Dices with S Sides Until Rollong Two Same Value in Row
1. Game Description:
Dice: S sides, 1, 2, 3, ..., S
Roll N dices,
- If the sum of N dices is not equal to last rolling, then the reward is sum together
- If the sum of N dices is equal to last rolling, then the reward is 0.
Calculate the game value. Here is a concrete example.
2. Analysis
Suppose that on the current state, the accumulated rewards as R and current dice sum is K represent the current state as V(R, K). Now what is the strategy to stop rolling.
The Game Strategy: stop rolling and keep the current reward if and only if the next rolling will not increase the reward statistically.
Given N dices with S sides, where each side of the dice is from 1, 2, ..., S, then the possible value of SUM K is [N, N*S]. Let's use Qi to represent the total combination of SUM Ki given N dices and S sides and Qi can be computed as Dynamic programming - Sum K of N dices with S sides, where Qi should meet this constraint,
Sigma(Qi) = Pow(S, N), where i within [N, N*S].
Then the probability Pi of rolling Ki is Qi/Pow(S, N) : Pi = Qi/Pow(S, N).
According to the strategy, it can be represented mathematically as
V(R, Ki) >= E(V(V(R, Ki), Kj)), where
E: represents the expected value given the fact of V(R, Ki) and rolling again.
According to the game:
If Ki = Kj, then the reward is 0,
If Ki != Kj, then the reward is V(R, Ki) + Kj
Hence E(V(V(R, Ki), Kj) can be computed as
Pn*(V(R, Ki) + Kn)) + Pn+1*(V(R, Ki) + Kn+1)) + ... + Pi-1(V(R, Ki) + Ki-1)) +
Pi*0 + Pi+1(V(R, Ki) + Ki+1)) + ... + Pn*s *(V(R, Ki) + Kn*s))
= Sigma(Pj*(V(R,Ki) + Kj)) - Pi*(V(R, Ki) + Ki)) , where
j belongs to [N, N*S]
According to the strategy, now we should stop rolling if
V(R, Ki) >= E(V(V(R, Ki), Kj)) ==>
V(R, Ki) >= Sigma(Pj*(V(R, Ki) + Kj)) - Pi*(V(R, Ki) + Ki))
V(R, Ki) >= Sigma(Pj*V(R, Ki)) + Sigma(Pj*Kj) - Pi(V(R, Ki) + Ki)), where
Sigma(Pj*V(R, Ki)) = V(R, Ki) * Sigma(Pj) = V(R, Ki)
Sigma(Pj*Kj) = E(Kj), expected value of SUM of N dices with S sides
V(R, Ki) >= V(R, Ki) + Sigma(Pj*Kj) - Pi*V(R, Ki) - Pi*Ki
Pi*V(R, Ki) >= Sigma(Pj*Kj) - Pi*Ki, where
Pi = Qi/Pow(S, N) > 0, both Qi > 0 and Pow(S, N) > 0
Qi*V(R, Ki) >= Pow(S, N)*(Sigma(Pj*Kj) - Pi*Ki)
Qi*V(R, Ki) >= Pow(S, N)*(Sigma(Qj/Pow(S, N)*Kj) - Qi/Pow(S, N)*Ki)
Qi*V(R, Ki) >= Sigma(Qj*Kj) - Qi*Ki
V(R, Ki) >= Sigma(Qj*Kj)/Qi - Ki
Let's assume EE = Sigma(Qj*Kj), which is a constant value given N dices with S sides. Then the above constraint tells us that at the current state V(R, Ki), if R is not less than EE/Qi - Ki, then the player should stop rolling and keep the reward.
Calculate the game value:
GV = Sigma(Pi*V(0, Ki)), where
i: [N, N*S]
V(R, Ki) = R, if V(R, Ki) >= Sigma(Qj*Kj)/Qi - Ki, else
= V(R, Ki) + E(Kj) - Pi(V(R, Ki) + Ki))
The strategy value of this concrete example:
Let's take a look at a concrete case as N = 2 and S = 6. Then
K = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
Q = {1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1}
Pow(S, N) = Pow(6, 2) = 6*6 = 36, where
Sigma(Q) = Pow(S, N) = 36
P = {1/36, 2/36, 3/36, 4/36, 5/36, 6/36, 5/36, 4/36, 3/36, 2/36, 1/36}, where
Sigma(P) = 1
EE = 252
Then now calculate,
V(R, 2) >= 252/1 - 2 = 250
V(R, 3) >= 252/2 - 3 = 123
V(R, 4) >= 252/3 - 4 = 80
V(R, 5) >= 252/4 - 5 = 58
V(R, 6) >= 252/5 - 6 = 44.4
V(R, 7) >= 252/6 - 7 = 35
V(R, 8) >= 252/5 - 8 = 42.4
V(R, 9) >= 252/4 - 9 = 54
V(R, 10) >= 252/3 - 10 = 74
V(R, 11) >= 252/2 - 11 = 115
V(R, 12) >= 252/1 - 12 = 240
3. C++ Implementation
// ********************************************************************************
#include <iostream>
#include <math.h>
#include <numeric>
#include <time.h>
#include <unordered_map>
#include <utility>
#include <vector>
// In blue color, the implementation of
// Dynamic programming - Sum K of N dices with S sides via hash map
struct DiceKey {
size_t nDice;
size_t curSum;
bool operator()(const DiceKey& right) const {
return nDice == right.nDice && curSum == right.curSum;
}
};
struct DiceKey_Hash {
size_t operator()(const DiceKey& key) const {
return std::hash<size_t>()(key.nDice) ^ std::hash<size_t>()(key.curSum);
}
bool operator()(const DiceKey& k1, const DiceKey& k2) const {
return k1.operator()(k2);
}
};
typedef std::unordered_map<DiceKey, size_t, DiceKey_Hash, DiceKey_Hash> DiceCacheMap;
size_t SumOfDiceAlg(long s, long n, long k, DiceCacheMap &dcm)
{
if (k < n || k >(n*s)) {
// no solution found in this case
return 0;
}
if (n == 1 && k > 0 && k <= s) {
// found a solution here
return 1;
}
const DiceKey key{ n, k };
if (dcm.find(key) != dcm.end()) {
return dcm[key];
}
// this loop is the sub-problem illustration
// f(S, N, K) = sigma(f(S, N-1, K-Si))
size_t count = 0;
for (long j = 1; j <= s; ++j) {
count += SumOfDiceAlg(s, n - 1, k - j, dcm);
}
dcm.insert(std::make_pair(key, count));
return count;
}
size_t SumOfDice(long s, long n, long k)
{
DiceCacheMap dcm;
size_t count = SumOfDiceAlg(s, n, k, dcm);
return count;
}
// Calculate "Qi" and "Pi"
void CalculateQiAndPiOfSums(size_t s,
size_t n,
std::vector<size_t>& counts,
std::vector<double>& probablity)
{
if (s == 0 || n == 0) {
return;
}
counts.clear();
counts.reserve(s*n - n + 1);
probablity.clear();
probablity.reserve(s*n - n + 1);
double allCombinations = pow((double)s, (double)n);
for (size_t count, sum = n; sum <= n*s; ++sum) {
count = SumOfDice(s, n, sum);
counts.push_back(count);
probablity.push_back((double)count / allCombinations);
}
}
// Calculate "EE"
size_t CalculateSigmaQiKi(size_t sides,
size_t nDices,
const std::vector<size_t>& counts)
{
size_t sigma = 0;
for (size_t sum = nDices; sum < sides*nDices + 1; ++sum) {
sigma += sum * counts[sum - nDices];
}
return sigma;
}
// Calculate "Strategy" and "Pi"
void CalculateStrategyAndPiOfSums(size_t sides,
size_t nDices,
std::vector<double>& strategyVals,
std::vector<double>& probablity)
{
std::vector<size_t> counts;
CalculateQiAndPiOfSums(sides,
nDices,
counts,
probablity);
size_t sigma = CalculateSigmaQiKi(sides,
nDices,
counts);
strategyVals.clear();
strategyVals.reserve(counts.size());
double strategyVal;
for (size_t indx = 0, sum = nDices; indx < counts.size(); ++indx, ++sum) {
strategyVal = (double)sigma / (double)counts[indx] - sum;
strategyVals.push_back(strategyVal);
}
}
struct CurrentState{
double reward;
size_t curSum;
CurrentState(double r, size_t s)
: reward(r), curSum(s)
{}
bool operator()(const CurrentState& right) const {
return reward == right.reward && curSum == right.curSum;
}
};
struct CurrentState_HASH{
size_t operator()(const CurrentState& key) const {
return std::hash<double>()(key.reward) ^ std::hash<size_t>()(key.curSum);
}
bool operator()(const CurrentState& k1, const CurrentState& k2) const {
return k1.operator()(k2);
}
};
typedef std::unordered_map<CurrentState, double, CurrentState_HASH, CurrentState_HASH> CS_CACHE;
double DiceGameStrategy(size_t sides,
size_t nDices,
const std::vector<double>& strategyVals,
const std::vector<double>& probablity,
double reward,
size_t curSum,
CS_CACHE& cache)
{
if (reward >= strategyVals[curSum - nDices]) {
return reward;
}
const CurrentState key{ reward, curSum };
{
CS_CACHE::const_iterator cachedIter = cache.find(key);
if (cachedIter != cache.end()) {
return cachedIter->second;
}
}
double expectedReward = 0;
double p;
for (size_t sum = nDices; sum < sides*nDices + 1; ++sum) {
if (sum != curSum) {
p = probablity[sum - nDices];
expectedReward += p* DiceGameStrategy(sides,
nDices,
strategyVals,
probablity,
reward + sum,
sum,
cache);
}
}
cache.insert(std::make_pair(key, expectedReward));
return expectedReward;
}
// Calculate "GV"
double DiceGame(size_t sides, size_t nDices)
{
std::vector<double> strategyVals;
std::vector<double> probablity;
CalculateStrategyAndPiOfSums(sides,
nDices,
strategyVals,
probablity);
CS_CACHE cache;
double gameValue = 0;
// std::vector<double> gameValues;
for (size_t sum = nDices; sum < (sides * nDices + 1); ++sum) {
double val = DiceGameStrategy(sides,
nDices,
strategyVals,
probablity,
0,
sum,
cache);
gameValue += probablity[sum - nDices] * val;
//gameValues.push_back(val);
}
return gameValue;
}
// test
int _tmain(int argc, _TCHAR* argv[])
{
DiceGame(6, 2); // gameValue: 22.224974509843324
DiceGame(6, 3); // gameValue: 40.471468665548528
DiceGame(6, 4); // gameValue 61.969998311984483
return 0;
}
//********************************************************************************
4. Further Discussion
I believe that one of key deciders of the game value is the game strategy. Currently the game strategy is only to keep rolling if the next rolling is to improve the reward statistically, otherwise stop rolling and keep the reward. This might not be the optimal strategy, because I have not been able to prove it is optimal, or the global optima. The question is if the short-run setback on the reward is possible to bring the long-run optimal.
Bibliography:
[1] http://cpluspluslearning-petert.blogspot.co.uk/2014/04/dynamic-programming-sum-k-of-dice-n.html
[2] Matthew M. Conroy, "A Collection of Dice Problems", Version June 15, 2012
Dice: S sides, 1, 2, 3, ..., S
Roll N dices,
- If the sum of N dices is not equal to last rolling, then the reward is sum together
- If the sum of N dices is equal to last rolling, then the reward is 0.
Calculate the game value. Here is a concrete example.
2. Analysis
Suppose that on the current state, the accumulated rewards as R and current dice sum is K represent the current state as V(R, K). Now what is the strategy to stop rolling.
The Game Strategy: stop rolling and keep the current reward if and only if the next rolling will not increase the reward statistically.
Given N dices with S sides, where each side of the dice is from 1, 2, ..., S, then the possible value of SUM K is [N, N*S]. Let's use Qi to represent the total combination of SUM Ki given N dices and S sides and Qi can be computed as Dynamic programming - Sum K of N dices with S sides, where Qi should meet this constraint,
Sigma(Qi) = Pow(S, N), where i within [N, N*S].
Then the probability Pi of rolling Ki is Qi/Pow(S, N) : Pi = Qi/Pow(S, N).
According to the strategy, it can be represented mathematically as
V(R, Ki) >= E(V(V(R, Ki), Kj)), where
E: represents the expected value given the fact of V(R, Ki) and rolling again.
According to the game:
If Ki = Kj, then the reward is 0,
If Ki != Kj, then the reward is V(R, Ki) + Kj
Hence E(V(V(R, Ki), Kj) can be computed as
Pn*(V(R, Ki) + Kn)) + Pn+1*(V(R, Ki) + Kn+1)) + ... + Pi-1(V(R, Ki) + Ki-1)) +
Pi*0 + Pi+1(V(R, Ki) + Ki+1)) + ... + Pn*s *(V(R, Ki) + Kn*s))
= Sigma(Pj*(V(R,Ki) + Kj)) - Pi*(V(R, Ki) + Ki)) , where
j belongs to [N, N*S]
According to the strategy, now we should stop rolling if
V(R, Ki) >= E(V(V(R, Ki), Kj)) ==>
V(R, Ki) >= Sigma(Pj*(V(R, Ki) + Kj)) - Pi*(V(R, Ki) + Ki))
V(R, Ki) >= Sigma(Pj*V(R, Ki)) + Sigma(Pj*Kj) - Pi(V(R, Ki) + Ki)), where
Sigma(Pj*V(R, Ki)) = V(R, Ki) * Sigma(Pj) = V(R, Ki)
Sigma(Pj*Kj) = E(Kj), expected value of SUM of N dices with S sides
V(R, Ki) >= V(R, Ki) + Sigma(Pj*Kj) - Pi*V(R, Ki) - Pi*Ki
Pi*V(R, Ki) >= Sigma(Pj*Kj) - Pi*Ki, where
Pi = Qi/Pow(S, N) > 0, both Qi > 0 and Pow(S, N) > 0
Qi*V(R, Ki) >= Pow(S, N)*(Sigma(Pj*Kj) - Pi*Ki)
Qi*V(R, Ki) >= Pow(S, N)*(Sigma(Qj/Pow(S, N)*Kj) - Qi/Pow(S, N)*Ki)
Qi*V(R, Ki) >= Sigma(Qj*Kj) - Qi*Ki
V(R, Ki) >= Sigma(Qj*Kj)/Qi - Ki
Let's assume EE = Sigma(Qj*Kj), which is a constant value given N dices with S sides. Then the above constraint tells us that at the current state V(R, Ki), if R is not less than EE/Qi - Ki, then the player should stop rolling and keep the reward.
Calculate the game value:
GV = Sigma(Pi*V(0, Ki)), where
i: [N, N*S]
V(R, Ki) = R, if V(R, Ki) >= Sigma(Qj*Kj)/Qi - Ki, else
= V(R, Ki) + E(Kj) - Pi(V(R, Ki) + Ki))
The strategy value of this concrete example:
Let's take a look at a concrete case as N = 2 and S = 6. Then
K = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
Q = {1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1}
Pow(S, N) = Pow(6, 2) = 6*6 = 36, where
Sigma(Q) = Pow(S, N) = 36
P = {1/36, 2/36, 3/36, 4/36, 5/36, 6/36, 5/36, 4/36, 3/36, 2/36, 1/36}, where
Sigma(P) = 1
EE = 252
Then now calculate,
V(R, 2) >= 252/1 - 2 = 250
V(R, 3) >= 252/2 - 3 = 123
V(R, 4) >= 252/3 - 4 = 80
V(R, 5) >= 252/4 - 5 = 58
V(R, 6) >= 252/5 - 6 = 44.4
V(R, 7) >= 252/6 - 7 = 35
V(R, 8) >= 252/5 - 8 = 42.4
V(R, 9) >= 252/4 - 9 = 54
V(R, 10) >= 252/3 - 10 = 74
V(R, 11) >= 252/2 - 11 = 115
V(R, 12) >= 252/1 - 12 = 240
3. C++ Implementation
// ********************************************************************************
#include <iostream>
#include <math.h>
#include <numeric>
#include <time.h>
#include <unordered_map>
#include <utility>
#include <vector>
// In blue color, the implementation of
// Dynamic programming - Sum K of N dices with S sides via hash map
struct DiceKey {
size_t nDice;
size_t curSum;
bool operator()(const DiceKey& right) const {
return nDice == right.nDice && curSum == right.curSum;
}
};
struct DiceKey_Hash {
size_t operator()(const DiceKey& key) const {
return std::hash<size_t>()(key.nDice) ^ std::hash<size_t>()(key.curSum);
}
bool operator()(const DiceKey& k1, const DiceKey& k2) const {
return k1.operator()(k2);
}
};
typedef std::unordered_map<DiceKey, size_t, DiceKey_Hash, DiceKey_Hash> DiceCacheMap;
size_t SumOfDiceAlg(long s, long n, long k, DiceCacheMap &dcm)
{
if (k < n || k >(n*s)) {
// no solution found in this case
return 0;
}
if (n == 1 && k > 0 && k <= s) {
// found a solution here
return 1;
}
const DiceKey key{ n, k };
if (dcm.find(key) != dcm.end()) {
return dcm[key];
}
// this loop is the sub-problem illustration
// f(S, N, K) = sigma(f(S, N-1, K-Si))
size_t count = 0;
for (long j = 1; j <= s; ++j) {
count += SumOfDiceAlg(s, n - 1, k - j, dcm);
}
dcm.insert(std::make_pair(key, count));
return count;
}
size_t SumOfDice(long s, long n, long k)
{
DiceCacheMap dcm;
size_t count = SumOfDiceAlg(s, n, k, dcm);
return count;
}
// Calculate "Qi" and "Pi"
void CalculateQiAndPiOfSums(size_t s,
size_t n,
std::vector<size_t>& counts,
std::vector<double>& probablity)
{
if (s == 0 || n == 0) {
return;
}
counts.clear();
counts.reserve(s*n - n + 1);
probablity.clear();
probablity.reserve(s*n - n + 1);
double allCombinations = pow((double)s, (double)n);
for (size_t count, sum = n; sum <= n*s; ++sum) {
count = SumOfDice(s, n, sum);
counts.push_back(count);
probablity.push_back((double)count / allCombinations);
}
}
// Calculate "EE"
size_t CalculateSigmaQiKi(size_t sides,
size_t nDices,
const std::vector<size_t>& counts)
{
size_t sigma = 0;
for (size_t sum = nDices; sum < sides*nDices + 1; ++sum) {
sigma += sum * counts[sum - nDices];
}
return sigma;
}
// Calculate "Strategy" and "Pi"
void CalculateStrategyAndPiOfSums(size_t sides,
size_t nDices,
std::vector<double>& strategyVals,
std::vector<double>& probablity)
{
std::vector<size_t> counts;
CalculateQiAndPiOfSums(sides,
nDices,
counts,
probablity);
size_t sigma = CalculateSigmaQiKi(sides,
nDices,
counts);
strategyVals.clear();
strategyVals.reserve(counts.size());
double strategyVal;
for (size_t indx = 0, sum = nDices; indx < counts.size(); ++indx, ++sum) {
strategyVal = (double)sigma / (double)counts[indx] - sum;
strategyVals.push_back(strategyVal);
}
}
struct CurrentState{
double reward;
size_t curSum;
CurrentState(double r, size_t s)
: reward(r), curSum(s)
{}
bool operator()(const CurrentState& right) const {
return reward == right.reward && curSum == right.curSum;
}
};
struct CurrentState_HASH{
size_t operator()(const CurrentState& key) const {
return std::hash<double>()(key.reward) ^ std::hash<size_t>()(key.curSum);
}
bool operator()(const CurrentState& k1, const CurrentState& k2) const {
return k1.operator()(k2);
}
};
typedef std::unordered_map<CurrentState, double, CurrentState_HASH, CurrentState_HASH> CS_CACHE;
double DiceGameStrategy(size_t sides,
size_t nDices,
const std::vector<double>& strategyVals,
const std::vector<double>& probablity,
double reward,
size_t curSum,
CS_CACHE& cache)
{
if (reward >= strategyVals[curSum - nDices]) {
return reward;
}
const CurrentState key{ reward, curSum };
{
CS_CACHE::const_iterator cachedIter = cache.find(key);
if (cachedIter != cache.end()) {
return cachedIter->second;
}
}
double expectedReward = 0;
double p;
for (size_t sum = nDices; sum < sides*nDices + 1; ++sum) {
if (sum != curSum) {
p = probablity[sum - nDices];
expectedReward += p* DiceGameStrategy(sides,
nDices,
strategyVals,
probablity,
reward + sum,
sum,
cache);
}
}
cache.insert(std::make_pair(key, expectedReward));
return expectedReward;
}
// Calculate "GV"
double DiceGame(size_t sides, size_t nDices)
{
std::vector<double> strategyVals;
std::vector<double> probablity;
CalculateStrategyAndPiOfSums(sides,
nDices,
strategyVals,
probablity);
CS_CACHE cache;
double gameValue = 0;
// std::vector<double> gameValues;
for (size_t sum = nDices; sum < (sides * nDices + 1); ++sum) {
double val = DiceGameStrategy(sides,
nDices,
strategyVals,
probablity,
0,
sum,
cache);
gameValue += probablity[sum - nDices] * val;
//gameValues.push_back(val);
}
return gameValue;
}
// test
int _tmain(int argc, _TCHAR* argv[])
{
DiceGame(6, 2); // gameValue: 22.224974509843324
DiceGame(6, 3); // gameValue: 40.471468665548528
DiceGame(6, 4); // gameValue 61.969998311984483
return 0;
}
//********************************************************************************
4. Further Discussion
I believe that one of key deciders of the game value is the game strategy. Currently the game strategy is only to keep rolling if the next rolling is to improve the reward statistically, otherwise stop rolling and keep the reward. This might not be the optimal strategy, because I have not been able to prove it is optimal, or the global optima. The question is if the short-run setback on the reward is possible to bring the long-run optimal.
Bibliography:
[1] http://cpluspluslearning-petert.blogspot.co.uk/2014/04/dynamic-programming-sum-k-of-dice-n.html
[2] Matthew M. Conroy, "A Collection of Dice Problems", Version June 15, 2012
[3] Donald E. Knuth "The Art of Computer Programming", Volume 1, Fundamental Algorithms, Third Edition
[4] Donald E. Knuth "The Art of Computer Programming", Volume 2, Seminumerical Algorithms, Third Edition
[5] http://math.stackexchange.com/questions/302472/whats-the-optimal-strategy-of-this-dice-game
[5] http://math.stackexchange.com/questions/302472/whats-the-optimal-strategy-of-this-dice-game
Saturday, 6 September 2014
C++ - Integer Comparison: A Hidden Bug in Warning
Recently I was asked to review a fix on a C++ warning on Linux by one of my colleagues. Firstly I was put on a test if I can spot the bug, which is pointed out by a warning generated by gcc.
// Code 1 - warning/bug
//********************************************************************************
char buf[16];
// ......
if (buf[i] == 0xC0 && buf[i+1] == 0x80) {
// ......
}
//********************************************************************************
The warning points to the if statement in the above code. Before I was presented the warning message, my colleague asked if I can spot the bug on that line. I wasn't able to spot it. Then I was shown the warning message,
"warning: comparison is always false due to limited range of data type"
This message actually is very helpful. It is telling me that in the comparison statement the two data types do not match and one data type actually has limited range of representing another one. For sure buf[i] has data type of char, with 1 byte size, and by default char is singed value. (This singed/unsigned is configurable in compiler. In my case by default char/int/long/... are all signed data type.) Because signed char is the smallest integer data type, it makes me think that it must the 0xC0 and 0x80 is treated as another data type by the compiler and for sure this data type is not signed char.
1. Investigation and Fix
After googling this warning message and further checking the standard, it turns out that these constant values, 0xC0 and 0x80, can only be treated as right-hand-value (RHS). And if not data type specified the compiler will treat them as data type of signed/unsigned int. If the constant value is less than ~2 billion, then it is treated as (signed) int and if more than that, it is treated as unsigned int. (Of course the standard allows to specify the type deterministically, for instance 0xC0L as long type, 0xC0UL as unsigned long type.)
Let's come back to the above if statement. 0xC0 and 0x80 is treated as (signed) int data type by default, as they are both less than 2 billion. However buf[i] and buf[i+1] has data type of signed char, which can only represent [-128, 127]. However 0xC0 and 0x80 has value of 192 and 128 respectively, which are out of the range of signed char can represent. Therefore the comparison will be always false. The warning message is spot-on. And the fix is rather simple - casting the constant value into the data type that you want.
// Code 2 - fix of Code 1
//********************************************************************************
char buf[16];
// ......
if (buf[i] == (char)0xC0 && buf[i+1] == (char)0x80) {
// ......
}
//********************************************************************************
// Code 1 - warning/bug
//********************************************************************************
char buf[16];
// ......
if (buf[i] == 0xC0 && buf[i+1] == 0x80) {
// ......
}
//********************************************************************************
The warning points to the if statement in the above code. Before I was presented the warning message, my colleague asked if I can spot the bug on that line. I wasn't able to spot it. Then I was shown the warning message,
"warning: comparison is always false due to limited range of data type"
This message actually is very helpful. It is telling me that in the comparison statement the two data types do not match and one data type actually has limited range of representing another one. For sure buf[i] has data type of char, with 1 byte size, and by default char is singed value. (This singed/unsigned is configurable in compiler. In my case by default char/int/long/... are all signed data type.) Because signed char is the smallest integer data type, it makes me think that it must the 0xC0 and 0x80 is treated as another data type by the compiler and for sure this data type is not signed char.
1. Investigation and Fix
After googling this warning message and further checking the standard, it turns out that these constant values, 0xC0 and 0x80, can only be treated as right-hand-value (RHS). And if not data type specified the compiler will treat them as data type of signed/unsigned int. If the constant value is less than ~2 billion, then it is treated as (signed) int and if more than that, it is treated as unsigned int. (Of course the standard allows to specify the type deterministically, for instance 0xC0L as long type, 0xC0UL as unsigned long type.)
Let's come back to the above if statement. 0xC0 and 0x80 is treated as (signed) int data type by default, as they are both less than 2 billion. However buf[i] and buf[i+1] has data type of signed char, which can only represent [-128, 127]. However 0xC0 and 0x80 has value of 192 and 128 respectively, which are out of the range of signed char can represent. Therefore the comparison will be always false. The warning message is spot-on. And the fix is rather simple - casting the constant value into the data type that you want.
// Code 2 - fix of Code 1
//********************************************************************************
char buf[16];
// ......
if (buf[i] == (char)0xC0 && buf[i+1] == (char)0x80) {
// ......
}
//********************************************************************************
However this warning is not always happening. It depends on what the constant value is. In this case if the const value is within [-128, 127], then there will be no such warning, which is expected as residing in the range of (signed) char can represent.
// Code 3 - no warning
//********************************************************************************
char buf[16];
// ......
if (buf[i] == 0x50 && buf[i+1] == 0x30) {
// ......
}
//********************************************************************************
//********************************************************************************
char buf[16];
// ......
if (buf[i] == 0x50 && buf[i+1] == 0x30) {
// ......
}
//********************************************************************************
2. Data Type Truncation and Promotion
Be careful of these warning on data type truncation, promotion and data type casting. These kind of data type truncation and promotion is quite common in string/char/file/Unicode manipulation.
// Code 4 - data type truncation, promotion
//********************************************************************************
char buf[] = {0x4D, 0x5A, 0x80, 0xC0}; // warning on data type truncation
// on 0x80 and 0xC0
// on 0x80 and 0xC0
// buf[2] and buf[3] are both negative value
int negativeC0 = buf[4]; // negativeC0 = 0xffffffC0, data promotion
int positiveC0 = 0xC0; // positiveC0 = 0x000000C0, 0xC0 default type as int
//********************************************************************************
3. MSVC Compiler
Also tried Code 1 on Miscrosoft Visual Studio Expression 2013. Unfortunately MSVC does not generate any warning to pin-point out the problem as gcc does, even though all level warnings are enabled.
Even worse MSVC generates a rather misleading warning against Code 2, "warning C4310: cast truncates constant value". This warning is rather a red-herring and does not help us to spot this bug at all. Overall MSVC compiler seems not doing a good job against this hidden bug.
4. Other Useful Threads on This Warning
http://www.linuxtopia.org/online_books/an_introduction_to_gcc/gccintro_71.html
https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html
http://stackoverflow.com/questions/13666033/c-truncation-of-constant-value
http://msdn.microsoft.com/en-us/library/z8f60833.aspx
Subscribe to:
Posts (Atom)