methods for bandit algorithms
Epsilon greedy is a randomized algorithm for the multi-armed bandit problem.
In every iteration, it either selects an action uniformly at random with probability \varepsilon_t or it greedily exploits the best action seen so far with probability 1 - \varepsilon_t . In this implementation, \varepsilon_t decreases over time (number of selections performed), controlled by the epsilon parameter.
Exp.3 is a randomized selection method for the multi-armed bandit problem
Exp3 maintains a probability distribution according to which an action is drawn in every iteration. The probability distribution is a mixture between a uniform distribution and a softmax distribution based on the cumulative rewards of the actions. The weight of the uniform distribution in the mixture is controlled by the parameter \gamma , ie., setting \gamma = 1 uses a uniform distribution in every selection step. The cumulative reward for the actions can be fine-tuned by adding a general bias for all actions. The bias is given by the parameter \beta .
UCB (Upper confidence bounds) is a deterministic selection algorithm for the multi-armed bandit problem. In every iteration, UCB selects the action that maximizes a tradeoff between its performance in the past and a variance term. The influence of the variance (confidence width) can be controlled by the parameter \alpha .
Files | |
file | pub_bandit.h |
public methods for bandit algorithms | |
file | pub_bandit_epsgreedy.h |
public methods for the epsilon greedy bandit selector | |
file | pub_bandit_exp3.h |
public methods for Exp.3 | |
file | pub_bandit_ucb.h |
public methods for UCB bandit selection | |
SCIP_RETCODE SCIPbanditSelect | ( | SCIP_BANDIT * | bandit, |
int * | action | ||
) |
select the next action
bandit | bandit algorithm data structure |
action | pointer to store the selected action |
Definition at line 143 of file bandit.c.
References SCIP_CALL, SCIP_OKAY, SCIPbanditGetNActions(), and SCIP_Bandit::vtable.
Referenced by selectNeighborhood().
SCIP_RETCODE SCIPbanditUpdate | ( | SCIP_BANDIT * | bandit, |
int | action, | ||
SCIP_Real | score | ||
) |
update the score of the selected action
bandit | bandit algorithm data structure |
action | index of action for which the score should be updated |
score | observed gain of the i'th action |
Definition at line 164 of file bandit.c.
References SCIP_CALL, SCIP_OKAY, SCIPbanditGetNActions(), and SCIP_Bandit::vtable.
Referenced by updateBanditAlgorithms().
const char* SCIPbanditvtableGetName | ( | SCIP_BANDITVTABLE * | banditvtable | ) |
return the name of this bandit virtual function table
banditvtable | virtual table for bandit algorithm |
Definition at line 244 of file bandit.c.
References SCIP_BanditVTable::name.
Referenced by SCIPsetSortProps().
SCIP_RANDNUMGEN* SCIPbanditGetRandnumgen | ( | SCIP_BANDIT * | bandit | ) |
return the random number generator of a bandit algorithm
bandit | bandit algorithm data structure |
Definition at line 255 of file bandit.c.
References SCIP_Bandit::rng.
Referenced by alnsFixMoreVariables(), alnsUnfixVariables(), dataReset(), and SCIP_DECL_BANDITSELECT().
int SCIPbanditGetNActions | ( | SCIP_BANDIT * | bandit | ) |
return number of actions of this bandit algorithm
bandit | bandit algorithm data structure |
Definition at line 265 of file bandit.c.
References SCIP_Bandit::nactions.
Referenced by SCIP_DECL_BANDITFREE(), SCIP_DECL_BANDITRESET(), SCIP_DECL_BANDITSELECT(), SCIP_DECL_BANDITUPDATE(), SCIP_DECL_HEURINITSOL(), SCIPbanditReset(), SCIPbanditSelect(), SCIPbanditUpdate(), SCIPgetConfidenceBoundUcb(), and SCIPgetProbabilityExp3().
SCIP_RETCODE SCIPcreateBanditEpsgreedy | ( | SCIP * | scip, |
SCIP_BANDIT ** | epsgreedy, | ||
SCIP_Real * | priorities, | ||
SCIP_Real | eps, | ||
int | nactions, | ||
unsigned int | initseed | ||
) |
create and resets an epsilon greedy bandit algorithm
scip | SCIP data structure |
epsgreedy | pointer to store the epsilon greedy bandit algorithm |
priorities | nonnegative priorities for each action, or NULL if not needed |
eps | parameter to increase probability for exploration between all actions |
nactions | the positive number of possible actions |
initseed | initial seed for random number generation |
Definition at line 235 of file bandit_epsgreedy.c.
References BANDIT_NAME, SCIP_CALL, SCIP_INVALIDDATA, SCIP_OKAY, SCIPbanditCreateEpsgreedy(), SCIPblkmem(), SCIPbuffer(), SCIPerrorMessage, SCIPfindBanditvtable(), and SCIPinitializeRandomSeed().
Referenced by createBandit().
SCIP_Real* SCIPgetWeightsEpsgreedy | ( | SCIP_BANDIT * | epsgreedy | ) |
get weights array of epsilon greedy bandit algorithm
epsgreedy | epsilon greedy bandit algorithm |
Definition at line 262 of file bandit_epsgreedy.c.
References SCIPbanditGetData().
Referenced by printNeighborhoodStatistics().
void SCIPsetEpsilonEpsgreedy | ( | SCIP_BANDIT * | epsgreedy, |
SCIP_Real | eps | ||
) |
set epsilon parameter of epsilon greedy bandit algorithm
epsgreedy | epsilon greedy bandit algorithm |
eps | parameter to increase probability for exploration between all actions |
Definition at line 275 of file bandit_epsgreedy.c.
References eps, and SCIPbanditGetData().
SCIP_RETCODE SCIPcreateBanditExp3 | ( | SCIP * | scip, |
SCIP_BANDIT ** | exp3, | ||
SCIP_Real * | priorities, | ||
SCIP_Real | gammaparam, | ||
SCIP_Real | beta, | ||
int | nactions, | ||
unsigned int | initseed | ||
) |
creates and resets an Exp.3 bandit algorithm using scip
pointer
scip | SCIP data structure |
exp3 | pointer to store bandit algorithm |
priorities | nonnegative priorities for each action, or NULL if not needed |
gammaparam | weight between uniform (gamma ~ 1) and weight driven (gamma ~ 0) probability distribution |
beta | gain offset between 0 and 1 at every observation |
nactions | the positive number of actions for this bandit algorithm |
initseed | initial seed for random number generation |
Definition at line 299 of file bandit_exp3.c.
References BANDIT_NAME, SCIP_CALL, SCIP_INVALIDDATA, SCIP_OKAY, SCIPbanditCreateExp3(), SCIPblkmem(), SCIPbuffer(), SCIPerrorMessage, SCIPfindBanditvtable(), and SCIPinitializeRandomSeed().
Referenced by createBandit().
void SCIPsetGammaExp3 | ( | SCIP_BANDIT * | exp3, |
SCIP_Real | gammaparam | ||
) |
set gamma parameter of Exp.3 bandit algorithm to increase weight of uniform distribution
exp3 | bandit algorithm |
gammaparam | weight between uniform (gamma ~ 1) and weight driven (gamma ~ 0) probability distribution |
Definition at line 325 of file bandit_exp3.c.
References SCIPbanditGetData().
void SCIPsetBetaExp3 | ( | SCIP_BANDIT * | exp3, |
SCIP_Real | beta | ||
) |
set beta parameter of Exp.3 bandit algorithm to increase gain offset for actions that were not played
exp3 | bandit algorithm |
beta | gain offset between 0 and 1 at every observation |
Definition at line 338 of file bandit_exp3.c.
References SCIPbanditGetData().
SCIP_Real SCIPgetProbabilityExp3 | ( | SCIP_BANDIT * | exp3, |
int | action | ||
) |
returns probability to play an action
exp3 | bandit algorithm |
action | index of the requested action |
Definition at line 351 of file bandit_exp3.c.
References SCIP_Real, SCIPbanditGetData(), and SCIPbanditGetNActions().
Referenced by printNeighborhoodStatistics().
SCIP_RETCODE SCIPcreateBanditUcb | ( | SCIP * | scip, |
SCIP_BANDIT ** | ucb, | ||
SCIP_Real * | priorities, | ||
SCIP_Real | alpha, | ||
int | nactions, | ||
unsigned int | initseed | ||
) |
create and reset UCB bandit algorithm
scip | SCIP data structure |
ucb | pointer to store bandit algorithm |
priorities | nonnegative priorities for each action, or NULL if not needed |
alpha | parameter to increase confidence width |
nactions | the positive number of actions for this bandit algorithm |
initseed | initial random number seed |
Definition at line 325 of file bandit_ucb.c.
References BANDIT_NAME, SCIP_CALL, SCIP_INVALIDDATA, SCIP_OKAY, SCIPbanditCreateUcb(), SCIPblkmem(), SCIPbuffer(), SCIPerrorMessage, SCIPfindBanditvtable(), and SCIPinitializeRandomSeed().
Referenced by createBandit().
SCIP_Real SCIPgetConfidenceBoundUcb | ( | SCIP_BANDIT * | ucb, |
int | action | ||
) |
returns the upper confidence bound of a selected action
ucb | UCB bandit algorithm |
action | index of the queried action |
Definition at line 250 of file bandit_ucb.c.
References SCIP_Real, SCIPbanditGetData(), SCIPbanditGetNActions(), and sqrt().
Referenced by printNeighborhoodStatistics().
int* SCIPgetStartPermutationUcb | ( | SCIP_BANDIT * | ucb | ) |
return start permutation of the UCB bandit algorithm
ucb | UCB bandit algorithm |
Definition at line 279 of file bandit_ucb.c.
References SCIPbanditGetData().