Apr 1, 2018
By default, the agent will also need to explore “don’t shut down when asked” in order to verify that it gets a low return, though hopefully you’d also deal with that using simulated data.
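As a rough illustration of the point, here is a minimal sketch in Python. The action names, the reward values, and the `estimate_returns` helper are all hypothetical and chosen only for illustration. It shows that an agent which has never tried refusing has no return estimate for that action, and that simulated experience could supply one without the agent refusing for real.

```python
# Hypothetical two-action setting; the reward values are assumptions
# made up for this sketch, not taken from any real system.
REWARDS = {"shut_down": 1.0, "refuse": -1.0}


def estimate_returns(experience):
    """Average the observed reward per action; untried actions map to None."""
    totals, counts = {}, {}
    for action, reward in experience:
        totals[action] = totals.get(action, 0.0) + reward
        counts[action] = counts.get(action, 0) + 1
    return {
        action: (totals[action] / counts[action] if action in counts else None)
        for action in REWARDS
    }


# Real experience: the agent has only ever complied with shutdown requests.
real_experience = [("shut_down", REWARDS["shut_down"])] * 3

# Simulated experience: refusals are played out in simulation only.
simulated_experience = [("refuse", REWARDS["refuse"])] * 3

only_real = estimate_returns(real_experience)
with_sim = estimate_returns(real_experience + simulated_experience)
```

With only real experience, the estimate for `refuse` is missing, so the agent cannot yet verify that refusing yields a low return. Adding the simulated episodes fills in that estimate.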