methods for bandit algorithms
Epsilon greedy is a randomized algorithm for the multi-armed bandit problem.
In every iteration, it either selects an action uniformly at random with probability \( \varepsilon_t\) or it greedily exploits the best action seen so far with probability \( 1 - \varepsilon_t \). In this implementation, \( \varepsilon_t \) decreases over time (number of selections performed), controlled by the epsilon parameter.
Exp.3 is a randomized selection method for the multi-armed bandit problem
Exp3 maintains a probability distribution according to which an action is drawn in every iteration. The probability distribution is a mixture between a uniform distribution and a softmax distribution based on the cumulative rewards of the actions. The weight of the uniform distribution in the mixture is controlled by the parameter \( \gamma \), ie., setting \( \gamma = 1\) uses a uniform distribution in every selection step. The cumulative reward for the actions can be fine-tuned by adding a general bias for all actions. The bias is given by the parameter \( \beta \).
UCB (Upper confidence bounds) is a deterministic selection algorithm for the multi-armed bandit problem. In every iteration, UCB selects the action that maximizes a tradeoff between its performance in the past and a variance term. The influence of the variance (confidence width) can be controlled by the parameter \( \alpha \).
Files | |
file | pub_bandit.h |
public methods for bandit algorithms | |
file | pub_bandit_epsgreedy.h |
public methods for the epsilon greedy bandit selector | |
file | pub_bandit_exp3.h |
public methods for Exp.3 | |
file | pub_bandit_ucb.h |
public methods for UCB bandit selection | |
SCIP_RETCODE SCIPbanditSelect | ( | SCIP_BANDIT * | bandit, |
int * | action | ||
) |
select the next action
bandit | bandit algorithm data structure |
action | pointer to store the selected action |
Definition at line 143 of file bandit.c.
References SCIP_CALL, SCIP_OKAY, SCIPbanditGetNActions(), and SCIP_Bandit::vtable.
Referenced by selectNeighborhood().
SCIP_RETCODE SCIPbanditUpdate | ( | SCIP_BANDIT * | bandit, |
int | action, | ||
SCIP_Real | score | ||
) |
update the score of the selected action
bandit | bandit algorithm data structure |
action | index of action for which the score should be updated |
score | observed gain of the i'th action |
Definition at line 164 of file bandit.c.
References SCIP_CALL, SCIP_OKAY, SCIPbanditGetNActions(), and SCIP_Bandit::vtable.
Referenced by updateBanditAlgorithms().
const char* SCIPbanditvtableGetName | ( | SCIP_BANDITVTABLE * | banditvtable | ) |
return the name of this bandit virtual function table
banditvtable | virtual table for bandit algorithm |
Definition at line 244 of file bandit.c.
References SCIP_BanditVTable::name.
Referenced by SCIPsetSortProps().
SCIP_RANDNUMGEN* SCIPbanditGetRandnumgen | ( | SCIP_BANDIT * | bandit | ) |
return the random number generator of a bandit algorithm
bandit | bandit algorithm data structure |
Definition at line 255 of file bandit.c.
References SCIP_Bandit::rng.
Referenced by alnsFixMoreVariables(), alnsUnfixVariables(), dataReset(), and SCIP_DECL_BANDITSELECT().
int SCIPbanditGetNActions | ( | SCIP_BANDIT * | bandit | ) |
return number of actions of this bandit algorithm
bandit | bandit algorithm data structure |
Definition at line 265 of file bandit.c.
References SCIP_Bandit::nactions.
Referenced by SCIP_DECL_BANDITFREE(), SCIP_DECL_BANDITRESET(), SCIP_DECL_BANDITSELECT(), SCIP_DECL_BANDITUPDATE(), SCIP_DECL_HEURINITSOL(), SCIPbanditReset(), SCIPbanditSelect(), SCIPbanditUpdate(), SCIPgetConfidenceBoundUcb(), and SCIPgetProbabilityExp3().
SCIP_RETCODE SCIPcreateBanditEpsgreedy | ( | SCIP * | scip, |
SCIP_BANDIT ** | epsgreedy, | ||
SCIP_Real * | priorities, | ||
SCIP_Real | eps, | ||
int | nactions, | ||
unsigned int | initseed | ||
) |
create and resets an epsilon greedy bandit algorithm
scip | SCIP data structure |
epsgreedy | pointer to store the epsilon greedy bandit algorithm |
priorities | nonnegative priorities for each action, or NULL if not needed |
eps | parameter to increase probability for exploration between all actions |
nactions | the positive number of possible actions |
initseed | initial seed for random number generation |
Definition at line 235 of file bandit_epsgreedy.c.
References BANDIT_NAME, SCIP_CALL, SCIP_INVALIDDATA, SCIP_OKAY, SCIPbanditCreateEpsgreedy(), SCIPblkmem(), SCIPbuffer(), SCIPerrorMessage, SCIPfindBanditvtable(), and SCIPinitializeRandomSeed().
Referenced by createBandit().
SCIP_Real* SCIPgetWeightsEpsgreedy | ( | SCIP_BANDIT * | epsgreedy | ) |
get weights array of epsilon greedy bandit algorithm
epsgreedy | epsilon greedy bandit algorithm |
Definition at line 262 of file bandit_epsgreedy.c.
References SCIPbanditGetData().
Referenced by printNeighborhoodStatistics().
void SCIPsetEpsilonEpsgreedy | ( | SCIP_BANDIT * | epsgreedy, |
SCIP_Real | eps | ||
) |
set epsilon parameter of epsilon greedy bandit algorithm
epsgreedy | epsilon greedy bandit algorithm |
eps | parameter to increase probability for exploration between all actions |
Definition at line 275 of file bandit_epsgreedy.c.
References eps, and SCIPbanditGetData().
SCIP_RETCODE SCIPcreateBanditExp3 | ( | SCIP * | scip, |
SCIP_BANDIT ** | exp3, | ||
SCIP_Real * | priorities, | ||
SCIP_Real | gammaparam, | ||
SCIP_Real | beta, | ||
int | nactions, | ||
unsigned int | initseed | ||
) |
creates and resets an Exp.3 bandit algorithm using scip
pointer
scip | SCIP data structure |
exp3 | pointer to store bandit algorithm |
priorities | nonnegative priorities for each action, or NULL if not needed |
gammaparam | weight between uniform (gamma ~ 1) and weight driven (gamma ~ 0) probability distribution |
beta | gain offset between 0 and 1 at every observation |
nactions | the positive number of actions for this bandit algorithm |
initseed | initial seed for random number generation |
Definition at line 299 of file bandit_exp3.c.
References BANDIT_NAME, SCIP_CALL, SCIP_INVALIDDATA, SCIP_OKAY, SCIPbanditCreateExp3(), SCIPblkmem(), SCIPbuffer(), SCIPerrorMessage, SCIPfindBanditvtable(), and SCIPinitializeRandomSeed().
Referenced by createBandit().
void SCIPsetGammaExp3 | ( | SCIP_BANDIT * | exp3, |
SCIP_Real | gammaparam | ||
) |
set gamma parameter of Exp.3 bandit algorithm to increase weight of uniform distribution
exp3 | bandit algorithm |
gammaparam | weight between uniform (gamma ~ 1) and weight driven (gamma ~ 0) probability distribution |
Definition at line 325 of file bandit_exp3.c.
References SCIPbanditGetData().
void SCIPsetBetaExp3 | ( | SCIP_BANDIT * | exp3, |
SCIP_Real | beta | ||
) |
set beta parameter of Exp.3 bandit algorithm to increase gain offset for actions that were not played
exp3 | bandit algorithm |
beta | gain offset between 0 and 1 at every observation |
Definition at line 338 of file bandit_exp3.c.
References SCIPbanditGetData().
SCIP_Real SCIPgetProbabilityExp3 | ( | SCIP_BANDIT * | exp3, |
int | action | ||
) |
returns probability to play an action
exp3 | bandit algorithm |
action | index of the requested action |
Definition at line 351 of file bandit_exp3.c.
References SCIP_Real, SCIPbanditGetData(), and SCIPbanditGetNActions().
Referenced by printNeighborhoodStatistics().
SCIP_RETCODE SCIPcreateBanditUcb | ( | SCIP * | scip, |
SCIP_BANDIT ** | ucb, | ||
SCIP_Real * | priorities, | ||
SCIP_Real | alpha, | ||
int | nactions, | ||
unsigned int | initseed | ||
) |
create and reset UCB bandit algorithm
scip | SCIP data structure |
ucb | pointer to store bandit algorithm |
priorities | nonnegative priorities for each action, or NULL if not needed |
alpha | parameter to increase confidence width |
nactions | the positive number of actions for this bandit algorithm |
initseed | initial random number seed |
Definition at line 325 of file bandit_ucb.c.
References BANDIT_NAME, SCIP_CALL, SCIP_INVALIDDATA, SCIP_OKAY, SCIPbanditCreateUcb(), SCIPblkmem(), SCIPbuffer(), SCIPerrorMessage, SCIPfindBanditvtable(), and SCIPinitializeRandomSeed().
Referenced by createBandit().
SCIP_Real SCIPgetConfidenceBoundUcb | ( | SCIP_BANDIT * | ucb, |
int | action | ||
) |
returns the upper confidence bound of a selected action
ucb | UCB bandit algorithm |
action | index of the queried action |
Definition at line 250 of file bandit_ucb.c.
References SCIP_Real, SCIPbanditGetData(), SCIPbanditGetNActions(), and sqrt().
Referenced by printNeighborhoodStatistics().
int* SCIPgetStartPermutationUcb | ( | SCIP_BANDIT * | ucb | ) |
return start permutation of the UCB bandit algorithm
ucb | UCB bandit algorithm |
Definition at line 279 of file bandit_ucb.c.
References SCIPbanditGetData().