Revisiting MDP Fundamentals
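Policy Function and Value Function

The optimal policy maximizes the expected total discounted reward, even if this means taking actions that yield lower immediate rewards in some states.

Recall from the previous lecture that, for all s, the optimal value function V* satisfies the Bellman equation:

$$V^*(s) = R(s) + \max_{a \in A} \gamma \sum_{s' \in S} P_{sa}(s') \, V^*(s')$$

where S is the set of states, A is the set of actions, R(s) is the reward for being in state s, γ ∈ [0, 1) is the discount factor, and P_{sa}(s') is the probability of transitioning to state s' after taking action a in state s.

Estimating Transition Probabilities

…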
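As a minimal sketch, assuming the common count-based (maximum-likelihood) estimate P_sa(s') ≈ (number of times taking action a in state s led to s') / (number of times action a was taken in s), a model can be learned from observed (s, a, s') triples and then plugged into value iteration, which repeatedly applies the Bellman update V(s) ← R(s) + γ max_a Σ_{s'} P_sa(s') V(s') until it stops changing. The function names, the uniform fallback for unvisited state-action pairs, and the toy data are illustrative assumptions, not taken from the lecture.

```python
import numpy as np


def estimate_transitions(transitions, n_states, n_actions):
    """Count-based (maximum-likelihood) estimate of P_sa(s').

    transitions: iterable of (s, a, s_next) integer triples observed while
    interacting with the MDP. Returns an array P of shape
    (n_states, n_actions, n_states) with P[s, a, s_next] = P_sa(s_next).
    Assumption: (s, a) pairs that were never tried get a uniform distribution.
    """
    counts = np.zeros((n_states, n_actions, n_states))
    for s, a, s_next in transitions:
        counts[s, a, s_next] += 1
    totals = counts.sum(axis=2, keepdims=True)  # visits to each (s, a)
    # Normalize visited rows; fall back to uniform where totals == 0.
    return np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / n_states)


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Apply V(s) <- R(s) + gamma * max_a sum_s' P_sa(s') V(s') to convergence."""
    V = np.zeros(len(R))
    for _ in range(max_iters):
        expected_next = P @ V  # shape (n_states, n_actions): sum_s' P_sa(s') V(s')
        V_new = R + gamma * expected_next.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V


# Hypothetical toy data: 2 states, 2 actions, reward only in state 1.
observed = [(0, 0, 0), (0, 0, 0), (0, 1, 1), (0, 1, 1), (0, 1, 0),
            (1, 0, 1), (1, 1, 0)]
P_hat = estimate_transitions(observed, n_states=2, n_actions=2)
R = np.array([0.0, 1.0])
V = value_iteration(P_hat, R, gamma=0.9)

# Greedy policy with respect to V: pi(s) = argmax_a sum_s' P_sa(s') V(s').
policy = (P_hat @ V).argmax(axis=1)
print(V, policy)
```

In this toy example the greedy policy takes action 1 in state 0, since that action reaches the rewarding state 1 with estimated probability 2/3, and takes action 0 in state 1 to stay there. The uniform fallback is one simple convention for unvisited pairs; another is to treat them as self-loops.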