Detailed Description
methods for bandit algorithms
Epsilon greedy
Epsilon greedy is a randomized algorithm for the multi-armed bandit problem.
In every iteration, it either selects an action uniformly at random with probability \( \varepsilon_t\) or it greedily exploits the best action seen so far with probability \( 1 - \varepsilon_t \). In this implementation, \( \varepsilon_t \) decreases over time (number of selections performed), controlled by the epsilon parameter.
Exp.3
Exp.3 is a randomized selection method for the multi-armed bandit problem
Exp3 maintains a probability distribution according to which an action is drawn in every iteration. The probability distribution is a mixture between a uniform distribution and a softmax distribution based on the cumulative rewards of the actions. The weight of the uniform distribution in the mixture is controlled by the parameter \( \gamma \), ie., setting \( \gamma = 1\) uses a uniform distribution in every selection step. The cumulative reward for the actions can be fine-tuned by adding a general bias for all actions. The bias is given by the parameter \( \beta \).
Upper Confidence Bounds (UCB)
UCB (Upper confidence bounds) is a deterministic selection algorithm for the multi-armed bandit problem. In every iteration, UCB selects the action that maximizes a tradeoff between its performance in the past and a variance term. The influence of the variance (confidence width) can be controlled by the parameter \( \alpha \).
Files | |
file | pub_bandit.h |
public methods for bandit algorithms | |
file | pub_bandit_epsgreedy.h |
public methods for the epsilon greedy bandit selector | |
file | pub_bandit_exp3.h |
public methods for Exp.3 | |
file | pub_bandit_ucb.h |
public methods for UCB bandit selection | |
Function Documentation
◆ SCIPbanditSelect()
SCIP_RETCODE SCIPbanditSelect | ( | SCIP_BANDIT * | bandit, |
int * | action | ||
) |
select the next action
- Parameters
-
bandit bandit algorithm data structure action pointer to store the selected action
Definition at line 143 of file bandit.c.
References NULL, SCIP_CALL, SCIP_OKAY, SCIPbanditGetNActions(), and SCIP_Bandit::vtable.
Referenced by selectNeighborhood().
◆ SCIPbanditUpdate()
SCIP_RETCODE SCIPbanditUpdate | ( | SCIP_BANDIT * | bandit, |
int | action, | ||
SCIP_Real | score | ||
) |
update the score of the selected action
- Parameters
-
bandit bandit algorithm data structure action index of action for which the score should be updated score observed gain of the i'th action
Definition at line 164 of file bandit.c.
References NULL, SCIP_CALL, SCIP_OKAY, SCIPbanditGetNActions(), and SCIP_Bandit::vtable.
Referenced by updateBanditAlgorithm().
◆ SCIPbanditvtableGetName()
const char* SCIPbanditvtableGetName | ( | SCIP_BANDITVTABLE * | banditvtable | ) |
return the name of this bandit virtual function table
- Parameters
-
banditvtable virtual table for bandit algorithm
Definition at line 272 of file bandit.c.
References SCIP_BanditVTable::name, and NULL.
Referenced by SCIPsetSortProps().
◆ SCIPbanditGetRandnumgen()
SCIP_RANDNUMGEN* SCIPbanditGetRandnumgen | ( | SCIP_BANDIT * | bandit | ) |
return the random number generator of a bandit algorithm
- Parameters
-
bandit bandit algorithm data structure
Definition at line 283 of file bandit.c.
References NULL, and SCIP_Bandit::rng.
Referenced by alnsFixMoreVariables(), alnsUnfixVariables(), dataReset(), SCIP_DECL_BANDITRESET(), and SCIP_DECL_BANDITSELECT().
◆ SCIPbanditGetNActions()
int SCIPbanditGetNActions | ( | SCIP_BANDIT * | bandit | ) |
return number of actions of this bandit algorithm
- Parameters
-
bandit bandit algorithm data structure
Definition at line 293 of file bandit.c.
References SCIP_Bandit::nactions, and NULL.
Referenced by SCIP_DECL_BANDITFREE(), SCIP_DECL_BANDITRESET(), SCIP_DECL_BANDITSELECT(), SCIP_DECL_BANDITUPDATE(), SCIP_DECL_HEURINITSOL(), SCIPbanditReset(), SCIPbanditSelect(), SCIPbanditUpdate(), SCIPgetConfidenceBoundUcb(), and SCIPgetProbabilityExp3().
◆ SCIPcreateBanditEpsgreedy()
SCIP_RETCODE SCIPcreateBanditEpsgreedy | ( | SCIP * | scip, |
SCIP_BANDIT ** | epsgreedy, | ||
SCIP_Real * | priorities, | ||
SCIP_Real | eps, | ||
SCIP_Bool | preferrecent, | ||
SCIP_Real | decayfactor, | ||
int | avglim, | ||
int | nactions, | ||
unsigned int | initseed | ||
) |
create and resets an epsilon greedy bandit algorithm
- Parameters
-
scip SCIP data structure epsgreedy pointer to store the epsilon greedy bandit algorithm priorities nonnegative priorities for each action, or NULL if not needed eps parameter to increase probability for exploration between all actions preferrecent should the weights be updated in an exponentially decaying way? decayfactor the factor to reduce the weight of older observations if exponential decay is enabled avglim nonnegative limit on observation number before the exponential decay starts, only relevant if exponential decay is enabled nactions the positive number of possible actions initseed initial seed for random number generation
Definition at line 270 of file bandit_epsgreedy.c.
References BANDIT_NAME, NULL, SCIP_CALL, SCIP_INVALIDDATA, SCIP_OKAY, SCIPbanditCreateEpsgreedy(), SCIPblkmem(), SCIPbuffer(), SCIPerrorMessage, SCIPfindBanditvtable(), and SCIPinitializeRandomSeed().
Referenced by createBandit().
◆ SCIPgetWeightsEpsgreedy()
SCIP_Real* SCIPgetWeightsEpsgreedy | ( | SCIP_BANDIT * | epsgreedy | ) |
get weights array of epsilon greedy bandit algorithm
- Parameters
-
epsgreedy epsilon greedy bandit algorithm
Definition at line 302 of file bandit_epsgreedy.c.
References NULL, and SCIPbanditGetData().
Referenced by printNeighborhoodStatistics().
◆ SCIPsetEpsilonEpsgreedy()
void SCIPsetEpsilonEpsgreedy | ( | SCIP_BANDIT * | epsgreedy, |
SCIP_Real | eps | ||
) |
set epsilon parameter of epsilon greedy bandit algorithm
- Parameters
-
epsgreedy epsilon greedy bandit algorithm eps parameter to increase probability for exploration between all actions
Definition at line 315 of file bandit_epsgreedy.c.
References eps, NULL, and SCIPbanditGetData().
◆ SCIPcreateBanditExp3()
SCIP_RETCODE SCIPcreateBanditExp3 | ( | SCIP * | scip, |
SCIP_BANDIT ** | exp3, | ||
SCIP_Real * | priorities, | ||
SCIP_Real | gammaparam, | ||
SCIP_Real | beta, | ||
int | nactions, | ||
unsigned int | initseed | ||
) |
creates and resets an Exp.3 bandit algorithm using scip
pointer
- Parameters
-
scip SCIP data structure exp3 pointer to store bandit algorithm priorities nonnegative priorities for each action, or NULL if not needed gammaparam weight between uniform (gamma ~ 1) and weight driven (gamma ~ 0) probability distribution beta gain offset between 0 and 1 at every observation nactions the positive number of actions for this bandit algorithm initseed initial seed for random number generation
Definition at line 301 of file bandit_exp3.c.
References BANDIT_NAME, NULL, SCIP_CALL, SCIP_INVALIDDATA, SCIP_OKAY, SCIPbanditCreateExp3(), SCIPblkmem(), SCIPbuffer(), SCIPerrorMessage, SCIPfindBanditvtable(), and SCIPinitializeRandomSeed().
Referenced by createBandit().
◆ SCIPsetGammaExp3()
void SCIPsetGammaExp3 | ( | SCIP_BANDIT * | exp3, |
SCIP_Real | gammaparam | ||
) |
set gamma parameter of Exp.3 bandit algorithm to increase weight of uniform distribution
- Parameters
-
exp3 bandit algorithm gammaparam weight between uniform (gamma ~ 1) and weight driven (gamma ~ 0) probability distribution
Definition at line 327 of file bandit_exp3.c.
References SCIPbanditGetData().
◆ SCIPsetBetaExp3()
void SCIPsetBetaExp3 | ( | SCIP_BANDIT * | exp3, |
SCIP_Real | beta | ||
) |
set beta parameter of Exp.3 bandit algorithm to increase gain offset for actions that were not played
- Parameters
-
exp3 bandit algorithm beta gain offset between 0 and 1 at every observation
Definition at line 340 of file bandit_exp3.c.
References SCIPbanditGetData().
◆ SCIPgetProbabilityExp3()
SCIP_Real SCIPgetProbabilityExp3 | ( | SCIP_BANDIT * | exp3, |
int | action | ||
) |
returns probability to play an action
- Parameters
-
exp3 bandit algorithm action index of the requested action
Definition at line 353 of file bandit_exp3.c.
References SCIP_Real, SCIPbanditGetData(), and SCIPbanditGetNActions().
Referenced by printNeighborhoodStatistics().
◆ SCIPcreateBanditUcb()
SCIP_RETCODE SCIPcreateBanditUcb | ( | SCIP * | scip, |
SCIP_BANDIT ** | ucb, | ||
SCIP_Real * | priorities, | ||
SCIP_Real | alpha, | ||
int | nactions, | ||
unsigned int | initseed | ||
) |
create and reset UCB bandit algorithm
- Parameters
-
scip SCIP data structure ucb pointer to store bandit algorithm priorities nonnegative priorities for each action, or NULL if not needed alpha parameter to increase confidence width nactions the positive number of actions for this bandit algorithm initseed initial random number seed
Definition at line 329 of file bandit_ucb.c.
References BANDIT_NAME, NULL, SCIP_CALL, SCIP_INVALIDDATA, SCIP_OKAY, SCIPbanditCreateUcb(), SCIPblkmem(), SCIPbuffer(), SCIPerrorMessage, SCIPfindBanditvtable(), and SCIPinitializeRandomSeed().
Referenced by createBandit().
◆ SCIPgetConfidenceBoundUcb()
SCIP_Real SCIPgetConfidenceBoundUcb | ( | SCIP_BANDIT * | ucb, |
int | action | ||
) |
returns the upper confidence bound of a selected action
- Parameters
-
ucb UCB bandit algorithm action index of the queried action
Definition at line 254 of file bandit_ucb.c.
References LOG1P, NULL, SCIP_Real, SCIPbanditGetData(), SCIPbanditGetNActions(), and sqrt().
Referenced by printNeighborhoodStatistics().
◆ SCIPgetStartPermutationUcb()
int* SCIPgetStartPermutationUcb | ( | SCIP_BANDIT * | ucb | ) |
return start permutation of the UCB bandit algorithm
- Parameters
-
ucb UCB bandit algorithm
Definition at line 283 of file bandit_ucb.c.
References NULL, and SCIPbanditGetData().
◆ SCIPincludeBanditvtable()
SCIP_RETCODE SCIPincludeBanditvtable | ( | SCIP * | scip, |
SCIP_BANDITVTABLE ** | banditvtable, | ||
const char * | name, | ||
SCIP_DECL_BANDITFREE((*banditfree)) | , | ||
SCIP_DECL_BANDITSELECT((*banditselect)) | , | ||
SCIP_DECL_BANDITUPDATE((*banditupdate)) | , | ||
SCIP_DECL_BANDITRESET((*banditreset)) | |||
) |
includes a bandit algorithm virtual function table
- Parameters
-
scip SCIP data structure banditvtable bandit algorithm virtual function table name a name for the algorithm represented by this vtable
Definition at line 32 of file scip_bandit.c.
References NULL, SCIP_CALL, SCIP_INVALIDDATA, SCIP_OKAY, SCIPbanditvtableCreate(), SCIPerrorMessage, SCIPfindBanditvtable(), SCIPsetIncludeBanditvtable(), and Scip::set.
Referenced by SCIPincludeBanditvtableEpsgreedy(), SCIPincludeBanditvtableExp3(), and SCIPincludeBanditvtableUcb().
◆ SCIPfindBanditvtable()
SCIP_BANDITVTABLE* SCIPfindBanditvtable | ( | SCIP * | scip, |
const char * | name | ||
) |
returns the bandit virtual function table of the given name, or NULL if not existing
- Parameters
-
scip SCIP data structure name name of bandit algorithm virtual function table
Definition at line 64 of file scip_bandit.c.
References NULL, SCIPsetFindBanditvtable(), and Scip::set.
Referenced by SCIPcreateBanditEpsgreedy(), SCIPcreateBanditExp3(), SCIPcreateBanditUcb(), and SCIPincludeBanditvtable().
◆ SCIPfreeBandit()
SCIP_RETCODE SCIPfreeBandit | ( | SCIP * | scip, |
SCIP_BANDIT ** | bandit | ||
) |
calls destructor and frees memory of bandit algorithm
- Parameters
-
scip SCIP data structure bandit pointer to bandit algorithm data structure
Definition at line 91 of file scip_bandit.c.
References NULL, SCIP_CALL, SCIP_OKAY, SCIPbanditFree(), and SCIPblkmem().
Referenced by SCIP_DECL_HEURFREE(), and SCIP_DECL_HEURINITSOL().
◆ SCIPresetBandit()
SCIP_RETCODE SCIPresetBandit | ( | SCIP * | scip, |
SCIP_BANDIT * | bandit, | ||
SCIP_Real * | priorities, | ||
unsigned int | seed | ||
) |
reset the bandit algorithm
- Parameters
-
scip SCIP data structure bandit pointer to bandit algorithm data structure priorities priorities for every action, or NULL if not needed seed initial random seed for bandit selection
Definition at line 75 of file scip_bandit.c.
References NULL, SCIP_CALL, SCIP_OKAY, SCIPbanditReset(), SCIPbuffer(), and SCIPinitializeRandomSeed().
Referenced by SCIP_DECL_HEURINITSOL().