How to employ simulated annealing to optimize reinforcement learning hyperparameters in R
The following presents an example of hyperparameter optimization for a Reinforcement Learning (RL) model. The optimization is performed with simulated annealing. Code snippets illustrate what an implementation in R might look like; however, the concept can easily be implemented in other programming languages.
RL can find the best policy for a system. The necessary input data are the four quantities State, Action, Reward, and NextState.
As for most machine learning algorithms, choosing the hyperparameters is not very intuitive. However, by wrapping the reinforcement learning in a simulated annealing loop, the ideal combination can be determined.
The first step is to create training data for the RL algorithm. Training data can be created by a simulation of the system which logs the four needed quantities.
For complex systems, a reduced simulation might be used for the hyperparameter optimization to speed up the process. The function Sim runs the simulation and returns training data. The number of iterations should be high enough to ensure that each state of the system is visited at least once. The definition of the states is crucial because the states have to describe the system well; however, if the number of states gets large, it becomes more difficult to find the optimal policy.
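As a hedged illustration (the real Sim function and its system are not shown here), a minimal stand-in that produces training data in the required four-column format could look as follows; the state space, actions, and reward rule are invented for this sketch:

```r
# Hypothetical stand-in for Sim(): a random walk over a toy state space.
# Columns match what ReinforcementLearning() expects: State, Action,
# Reward, NextState.
simToyData <- function(iterations, seed = 42) {
  set.seed(seed)
  states  <- c("s1", "s2", "s3")
  actions <- c("up", "down")
  df <- data.frame(State     = character(iterations),
                   Action    = character(iterations),
                   Reward    = numeric(iterations),
                   NextState = character(iterations),
                   stringsAsFactors = FALSE)
  s <- sample(states, 1)
  for (i in seq_len(iterations)) {
    a    <- sample(actions, 1)            # random exploration while training
    sNew <- sample(states, 1)             # toy transition: uniform next state
    r    <- if (sNew == "s3") 1 else -0.1 # toy reward: reaching s3 pays off
    df[i, ] <- list(s, a, r, sNew)
    s <- sNew
  }
  df
}

SimulationData <- simToyData(iterations = 1000)
```

With enough iterations, every state appears in the data, which is exactly the coverage requirement described above.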
SimulationData <- Sim(dataFile, iterations, trainFlag, colNames)
The training data and the actions are the inputs of the RL algorithm, which is part of the function optFunc. There, the RL policy is applied in the same simulation (implemented in the Sim function) that was used to create the training data; instead of selecting actions at random, the actions are now chosen according to the policy defined for each state.
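To make this switch concrete, here is a hedged sketch of the action selection inside such a simulation loop; the state and action names are invented, and the policy lookup assumes the named-vector form (state to action) that the ReinforcementLearning package stores in the model's Policy element:

```r
# Hypothetical action selection: random during training data generation,
# greedy from the learned policy during evaluation.
chooseAction <- function(state, actions, RlModel = NULL) {
  if (is.null(RlModel)) {
    sample(actions, 1)           # training: explore at random
  } else {
    RlModel$Policy[[state]]      # evaluation: follow the learned policy
  }
}

# Mock model standing in for the output of ReinforcementLearning()
mockModel <- list(Policy = c(s1 = "up", s2 = "down", s3 = "up"))

chooseAction("s2", c("up", "down"))              # a random action
chooseAction("s2", c("up", "down"), mockModel)   # "down", from the policy
```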
The function optFunc is called for each iteration of the simulated annealing. The return value rlModelResult is the value that is minimized in the optimization loop.
optFunc <- function(par, dataFile, SimulationData, ActionFunctionsDf) {
  # Hyperparameters proposed by the current simulated annealing iteration
  alpha   <- par[[1]]
  gamma   <- par[[2]]
  epsilon <- par[[3]]
  control <- list(alpha = alpha, gamma = gamma, epsilon = epsilon)

  # Learn a policy from the simulated training data
  RlModel <- ReinforcementLearning(SimulationData,
                                   s = "State",
                                   a = "Action",
                                   r = "Reward",
                                   s_new = "NextState",
                                   iter = 3,
                                   control = control)

  # Evaluate the learned policy in the simulation; iterations, trainFlag
  # and colNames are taken from the enclosing environment
  rlModelResult <- Sim(dataFile, iterations, trainFlag, colNames, RlModel = RlModel)
  return(rlModelResult)
}
The metric rlModelResult is minimized using simulated annealing. The optimization parameters can be freely chosen; different sets of actions and states could also be used as parameters.
In this case the optimization parameters are the hyperparameters of the ReinforcementLearning function.
Note that SimulationData is the data generated by the simulation. The parameters par are changed for each iteration of the simulated annealing.
sa_out <- GenSA(par = suggestions,
                fn = optFunc,
                lower = minOpt,
                upper = maxOpt,
                control = list(nb.stop.improvement = 3,
                               smooth = TRUE),
                dataFile = dataFile,   # passed through to optFunc via ...
                SimulationData = SimulationData,
                ActionFunctionsDf = ActionFunctionsDf)
The optimal hyperparameter set for the given task is:
The definitions of the parameters are given below.
• alpha The learning rate, set between 0 and 1. Setting it to 0 means that the Q-values are never updated and, hence, nothing is learned.
• gamma Discount factor, set between 0 and 1. Determines the importance of future rewards.
• epsilon Exploration parameter. Defines the exploration mechanism in ε-greedy action selection.
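For illustration, ε-greedy action selection can be sketched in a few lines of base R; the Q-values and action names here are invented for the example:

```r
# Hypothetical Q-values for one state: estimated value of each action.
qValues <- c(up = 0.8, down = 0.2)

# epsilon-greedy: with probability epsilon explore a random action,
# otherwise exploit the action with the highest Q-value.
epsilonGreedy <- function(qValues, epsilon) {
  if (runif(1) < epsilon) {
    sample(names(qValues), 1)    # explore
  } else {
    names(which.max(qValues))    # exploit
  }
}

epsilonGreedy(qValues, epsilon = 0.1)  # usually "up", occasionally "down"
```

A small epsilon therefore trades a little exploration against mostly exploiting what has been learned so far, which is why it is a natural candidate for the optimization above.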
For reinforcement learning: ReinforcementLearning
Performs model-free reinforcement learning in R. This implementation enables the learning of an optimal policy based on sample sequences consisting of states, actions and rewards. In addition, it supplies multiple predefined reinforcement learning algorithms, such as experience replay.
For simulated annealing: GenSA
Implementation of a function that searches for the global minimum of a very complex non-linear objective function with a very large number of optima.
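A minimal, self-contained GenSA call on a simple quadratic (independent of the RL setup above, and with an invented objective) illustrates the interface; only par, fn, lower, upper, and an optional control list are needed:

```r
library(GenSA)  # install.packages("GenSA") if needed

# Objective: a simple bowl with its global minimum at the origin.
quadratic <- function(x) sum(x^2)

out <- GenSA(par = c(2, -3),
             fn = quadratic,
             lower = c(-10, -10),
             upper = c(10, 10),
             control = list(max.call = 1e4))

out$par    # close to c(0, 0)
out$value  # close to 0
```

GenSA returns a list; the optimized parameters are in out$par and the best objective value in out$value, which is how sa_out above would be inspected as well.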
For attribution, please cite this work as
Sterr (2020, Jan. 17). Uwe's Blog: Simulated Annealing optimisation of Reinforcement Learning hyper parameters. Retrieved from http://uwesterr.de/posts/2020-01-20-simulated-annealing-optimisation-of-reinforcement-learning-hyper-parameters/
BibTeX citation
@misc{sterr2020simulated,
  author = {Sterr, Uwe},
  title  = {Uwe's Blog: Simulated Annealing optimisation of Reinforcement Learning hyper parameters},
  url    = {http://uwesterr.de/posts/2020-01-20-simulated-annealing-optimisation-of-reinforcement-learning-hyper-parameters/},
  year   = {2020}
}