# SCIP

Solving Constraint Integer Programs

Bandit Algorithms

## Detailed Description

methods for bandit algorithms

## Epsilon greedy

Epsilon greedy is a randomized algorithm for the multi-armed bandit problem.

In every iteration, it either selects an action uniformly at random with probability $$\varepsilon_t$$ or it greedily exploits the best action seen so far with probability $$1 - \varepsilon_t$$. In this implementation, $$\varepsilon_t$$ decreases over time (number of selections performed), controlled by the epsilon parameter.

## Exp.3

Exp.3 is a randomized selection method for the multi-armed bandit problem

Exp3 maintains a probability distribution according to which an action is drawn in every iteration. The probability distribution is a mixture between a uniform distribution and a softmax distribution based on the cumulative rewards of the actions. The weight of the uniform distribution in the mixture is controlled by the parameter $$\gamma$$, ie., setting $$\gamma = 1$$ uses a uniform distribution in every selection step. The cumulative reward for the actions can be fine-tuned by adding a general bias for all actions. The bias is given by the parameter $$\beta$$.

## Exp.3-IX TODO: update

Exp.3 is a randomized selection method for the multi-armed bandit problem

Exp3 maintains a probability distribution according to which an action is drawn in every iteration. The probability distribution is a mixture between a uniform distribution and a softmax distribution based on the cumulative rewards of the actions. The weight of the uniform distribution in the mixture is controlled by the parameter $$\gamma$$, ie., setting $$\gamma = 1$$ uses a uniform distribution in every selection step. The cumulative reward for the actions can be fine-tuned by adding a general bias for all actions. The bias is given by the parameter $$\beta$$.

## Upper Confidence Bounds (UCB)

UCB (Upper confidence bounds) is a deterministic selection algorithm for the multi-armed bandit problem. In every iteration, UCB selects the action that maximizes a tradeoff between its performance in the past and a variance term. The influence of the variance (confidence width) can be controlled by the parameter $$\alpha$$.

## Functions

SCIP_RETCODE SCIPbanditSelect (SCIP_BANDIT *bandit, int *action)

SCIP_RETCODE SCIPbanditUpdate (SCIP_BANDIT *bandit, int action, SCIP_Real score)

const char * SCIPbanditvtableGetName (SCIP_BANDITVTABLE *banditvtable)

SCIP_RANDNUMGENSCIPbanditGetRandnumgen (SCIP_BANDIT *bandit)

int SCIPbanditGetNActions (SCIP_BANDIT *bandit)

SCIP_RETCODE SCIPcreateBanditEpsgreedy (SCIP *scip, SCIP_BANDIT **epsgreedy, SCIP_Real *priorities, SCIP_Real eps, SCIP_Bool usemodification, SCIP_Bool preferrecent, SCIP_Real decayfactor, int avglim, int nactions, unsigned int initseed)

SCIP_RealSCIPgetWeightsEpsgreedy (SCIP_BANDIT *epsgreedy)

void SCIPsetEpsilonEpsgreedy (SCIP_BANDIT *epsgreedy, SCIP_Real eps)

SCIP_RETCODE SCIPcreateBanditExp3 (SCIP *scip, SCIP_BANDIT **exp3, SCIP_Real *priorities, SCIP_Real gammaparam, SCIP_Real beta, int nactions, unsigned int initseed)

void SCIPsetGammaExp3 (SCIP_BANDIT *exp3, SCIP_Real gammaparam)

void SCIPsetBetaExp3 (SCIP_BANDIT *exp3, SCIP_Real beta)

SCIP_Real SCIPgetProbabilityExp3 (SCIP_BANDIT *exp3, int action)

SCIP_RETCODE SCIPcreateBanditExp3IX (SCIP *scip, SCIP_BANDIT **exp3ix, SCIP_Real *priorities, int nactions, unsigned int initseed)

SCIP_Real SCIPgetProbabilityExp3IX (SCIP_BANDIT *exp3ix, int action)

SCIP_RETCODE SCIPcreateBanditUcb (SCIP *scip, SCIP_BANDIT **ucb, SCIP_Real *priorities, SCIP_Real alpha, int nactions, unsigned int initseed)

SCIP_Real SCIPgetConfidenceBoundUcb (SCIP_BANDIT *ucb, int action)

int * SCIPgetStartPermutationUcb (SCIP_BANDIT *ucb)

SCIP_RETCODE SCIPincludeBanditvtable (SCIP *scip, SCIP_BANDITVTABLE **banditvtable, const char *name, SCIP_DECL_BANDITFREE((*banditfree)), SCIP_DECL_BANDITSELECT((*banditselect)), SCIP_DECL_BANDITUPDATE((*banditupdate)), SCIP_DECL_BANDITRESET((*banditreset)))

SCIP_BANDITVTABLESCIPfindBanditvtable (SCIP *scip, const char *name)

SCIP_RETCODE SCIPfreeBandit (SCIP *scip, SCIP_BANDIT **bandit)

SCIP_RETCODE SCIPresetBandit (SCIP *scip, SCIP_BANDIT *bandit, SCIP_Real *priorities, unsigned int seed)

## Files

file  pub_bandit.h
public methods for bandit algorithms

file  pub_bandit_epsgreedy.h
public methods for the epsilon greedy bandit selector

file  pub_bandit_exp3.h
public methods for Exp.3

file  pub_bandit_exp3ix.h
public methods for Exp.3-IX

file  pub_bandit_ucb.h
public methods for UCB bandit selection

## ◆ SCIPbanditSelect()

 SCIP_RETCODE SCIPbanditSelect ( SCIP_BANDIT * bandit, int * action )

select the next action

Parameters
 bandit bandit algorithm data structure action pointer to store the selected action

Definition at line 153 of file bandit.c.

References NULL, SCIP_CALL, SCIP_OKAY, SCIPbanditGetNActions(), and SCIP_Bandit::vtable.

Referenced by selectHeuristic(), and selectNeighborhood().

## ◆ SCIPbanditUpdate()

 SCIP_RETCODE SCIPbanditUpdate ( SCIP_BANDIT * bandit, int action, SCIP_Real score )

update the score of the selected action

Parameters
 bandit bandit algorithm data structure action index of action for which the score should be updated score observed gain of the i'th action

Definition at line 174 of file bandit.c.

References NULL, SCIP_CALL, SCIP_OKAY, SCIPbanditGetNActions(), and SCIP_Bandit::vtable.

Referenced by updateBanditAlgorithm(), and updateSelectionStrategy().

## ◆ SCIPbanditvtableGetName()

 const char* SCIPbanditvtableGetName ( SCIP_BANDITVTABLE * banditvtable )

return the name of this bandit virtual function table

Parameters
 banditvtable virtual table for bandit algorithm

Definition at line 282 of file bandit.c.

References SCIP_BanditVTable::name, and NULL.

Referenced by SCIPsetSortProps().

## ◆ SCIPbanditGetRandnumgen()

 SCIP_RANDNUMGEN* SCIPbanditGetRandnumgen ( SCIP_BANDIT * bandit )

return the random number generator of a bandit algorithm

Parameters
 bandit bandit algorithm data structure

Definition at line 293 of file bandit.c.

References NULL, and SCIP_Bandit::rng.

## ◆ SCIPbanditGetNActions()

 int SCIPbanditGetNActions ( SCIP_BANDIT * bandit )

return number of actions of this bandit algorithm

Parameters
 bandit bandit algorithm data structure

Definition at line 303 of file bandit.c.

References SCIP_Bandit::nactions, and NULL.

## ◆ SCIPcreateBanditEpsgreedy()

 SCIP_RETCODE SCIPcreateBanditEpsgreedy ( SCIP * scip, SCIP_BANDIT ** epsgreedy, SCIP_Real * priorities, SCIP_Real eps, SCIP_Bool usemodification, SCIP_Bool preferrecent, SCIP_Real decayfactor, int avglim, int nactions, unsigned int initseed )

create and resets an epsilon greedy bandit algorithm

Parameters
 scip SCIP data structure epsgreedy pointer to store the epsilon greedy bandit algorithm priorities nonnegative priorities for each action, or NULL if not needed eps parameter to increase probability for exploration between all actions usemodification TRUE if modified eps greedy should be used preferrecent should the weights be updated in an exponentially decaying way? decayfactor the factor to reduce the weight of older observations if exponential decay is enabled avglim nonnegative limit on observation number before the exponential decay starts, only relevant if exponential decay is enabled nactions the positive number of possible actions initseed initial seed for random number generation

Definition at line 313 of file bandit_epsgreedy.c.

Referenced by createBandit().

## ◆ SCIPgetWeightsEpsgreedy()

 SCIP_Real* SCIPgetWeightsEpsgreedy ( SCIP_BANDIT * epsgreedy )

get weights array of epsilon greedy bandit algorithm

Parameters
 epsgreedy epsilon greedy bandit algorithm

Definition at line 345 of file bandit_epsgreedy.c.

References NULL, and SCIPbanditGetData().

Referenced by printDivingHeurStatistics(), and printNeighborhoodStatistics().

## ◆ SCIPsetEpsilonEpsgreedy()

 void SCIPsetEpsilonEpsgreedy ( SCIP_BANDIT * epsgreedy, SCIP_Real eps )

set epsilon parameter of epsilon greedy bandit algorithm

Parameters
 epsgreedy epsilon greedy bandit algorithm eps parameter to increase probability for exploration between all actions

Definition at line 358 of file bandit_epsgreedy.c.

References eps, NULL, and SCIPbanditGetData().

## ◆ SCIPcreateBanditExp3()

 SCIP_RETCODE SCIPcreateBanditExp3 ( SCIP * scip, SCIP_BANDIT ** exp3, SCIP_Real * priorities, SCIP_Real gammaparam, SCIP_Real beta, int nactions, unsigned int initseed )

creates and resets an Exp.3 bandit algorithm using scip pointer

Parameters
 scip SCIP data structure exp3 pointer to store bandit algorithm priorities nonnegative priorities for each action, or NULL if not needed gammaparam weight between uniform (gamma ~ 1) and weight driven (gamma ~ 0) probability distribution beta gain offset between 0 and 1 at every observation nactions the positive number of actions for this bandit algorithm initseed initial seed for random number generation

Definition at line 311 of file bandit_exp3.c.

Referenced by createBandit().

## ◆ SCIPsetGammaExp3()

 void SCIPsetGammaExp3 ( SCIP_BANDIT * exp3, SCIP_Real gammaparam )

set gamma parameter of Exp.3 bandit algorithm to increase weight of uniform distribution

Parameters
 exp3 bandit algorithm gammaparam weight between uniform (gamma ~ 1) and weight driven (gamma ~ 0) probability distribution

Definition at line 337 of file bandit_exp3.c.

References SCIPbanditGetData().

## ◆ SCIPsetBetaExp3()

 void SCIPsetBetaExp3 ( SCIP_BANDIT * exp3, SCIP_Real beta )

set beta parameter of Exp.3 bandit algorithm to increase gain offset for actions that were not played

Parameters
 exp3 bandit algorithm beta gain offset between 0 and 1 at every observation

Definition at line 350 of file bandit_exp3.c.

References SCIPbanditGetData().

## ◆ SCIPgetProbabilityExp3()

 SCIP_Real SCIPgetProbabilityExp3 ( SCIP_BANDIT * exp3, int action )

returns probability to play an action

Parameters
 exp3 bandit algorithm action index of the requested action

Definition at line 363 of file bandit_exp3.c.

References SCIP_Real, SCIPbanditGetData(), and SCIPbanditGetNActions().

Referenced by printDivingHeurStatistics(), and printNeighborhoodStatistics().

## ◆ SCIPcreateBanditExp3IX()

 SCIP_RETCODE SCIPcreateBanditExp3IX ( SCIP * scip, SCIP_BANDIT ** exp3ix, SCIP_Real * priorities, int nactions, unsigned int initseed )

creates and resets an Exp.3-IX bandit algorithm using scip pointer

Parameters
 scip SCIP data structure exp3ix pointer to store bandit algorithm priorities nonnegative priorities for each action, or NULL if not needed nactions the positive number of actions for this bandit algorithm initseed initial seed for random number generation

Definition at line 255 of file bandit_exp3ix.c.

Referenced by createBandit().

## ◆ SCIPgetProbabilityExp3IX()

 SCIP_Real SCIPgetProbabilityExp3IX ( SCIP_BANDIT * exp3ix, int action )

returns probability to play an action

Parameters
 exp3ix bandit algorithm action index of the requested action

Definition at line 279 of file bandit_exp3ix.c.

References SCIPbanditGetData(), and SCIPbanditGetNActions().

Referenced by printDivingHeurStatistics(), and printNeighborhoodStatistics().

## ◆ SCIPcreateBanditUcb()

 SCIP_RETCODE SCIPcreateBanditUcb ( SCIP * scip, SCIP_BANDIT ** ucb, SCIP_Real * priorities, SCIP_Real alpha, int nactions, unsigned int initseed )

create and reset UCB bandit algorithm

Parameters
 scip SCIP data structure ucb pointer to store bandit algorithm priorities nonnegative priorities for each action, or NULL if not needed alpha parameter to increase confidence width nactions the positive number of actions for this bandit algorithm initseed initial random number seed

Definition at line 339 of file bandit_ucb.c.

Referenced by createBandit().

## ◆ SCIPgetConfidenceBoundUcb()

 SCIP_Real SCIPgetConfidenceBoundUcb ( SCIP_BANDIT * ucb, int action )

returns the upper confidence bound of a selected action

Parameters
 ucb UCB bandit algorithm action index of the queried action

Definition at line 264 of file bandit_ucb.c.

References LOG1P, NULL, SCIP_Real, SCIPbanditGetData(), and SCIPbanditGetNActions().

Referenced by printDivingHeurStatistics(), and printNeighborhoodStatistics().

## ◆ SCIPgetStartPermutationUcb()

 int* SCIPgetStartPermutationUcb ( SCIP_BANDIT * ucb )

return start permutation of the UCB bandit algorithm

Parameters
 ucb UCB bandit algorithm

Definition at line 293 of file bandit_ucb.c.

References NULL, and SCIPbanditGetData().

## ◆ SCIPincludeBanditvtable()

 SCIP_RETCODE SCIPincludeBanditvtable ( SCIP * scip, SCIP_BANDITVTABLE ** banditvtable, const char * name, SCIP_DECL_BANDITFREE((*banditfree)) , SCIP_DECL_BANDITSELECT((*banditselect)) , SCIP_DECL_BANDITUPDATE((*banditupdate)) , SCIP_DECL_BANDITRESET((*banditreset)) )

includes a bandit algorithm virtual function table

Parameters
 scip SCIP data structure banditvtable bandit algorithm virtual function table name a name for the algorithm represented by this vtable

Definition at line 48 of file scip_bandit.c.

## ◆ SCIPfindBanditvtable()

 SCIP_BANDITVTABLE* SCIPfindBanditvtable ( SCIP * scip, const char * name )

returns the bandit virtual function table of the given name, or NULL if not existing

Parameters
 scip SCIP data structure name name of bandit algorithm virtual function table

Definition at line 80 of file scip_bandit.c.

References NULL, SCIPsetFindBanditvtable(), and Scip::set.

## ◆ SCIPfreeBandit()

 SCIP_RETCODE SCIPfreeBandit ( SCIP * scip, SCIP_BANDIT ** bandit )

calls destructor and frees memory of bandit algorithm

Parameters
 scip SCIP data structure bandit pointer to bandit algorithm data structure

Definition at line 107 of file scip_bandit.c.

References NULL, SCIP_CALL, SCIP_OKAY, SCIPbanditFree(), and SCIPblkmem().

Referenced by reinitBandit(), SCIP_DECL_HEURFREE(), and SCIP_DECL_HEURINITSOL().

## ◆ SCIPresetBandit()

 SCIP_RETCODE SCIPresetBandit ( SCIP * scip, SCIP_BANDIT * bandit, SCIP_Real * priorities, unsigned int seed )

reset the bandit algorithm

Parameters
 scip SCIP data structure bandit pointer to bandit algorithm data structure priorities priorities for every action, or NULL if not needed seed initial random seed for bandit selection

Definition at line 91 of file scip_bandit.c.

References NULL, SCIP_CALL, SCIP_OKAY, SCIPbanditReset(), SCIPbuffer(), and SCIPinitializeRandomSeed().

Referenced by SCIP_DECL_HEURINITSOL().