1. What are the contributions mentioned in the paper "Stable function approximation in dynamic programming"?
The author provides a proof of convergence for a wide class of temporal difference methods involving function approximators such as k-nearest-neighbor, and shows experimentally that these methods can be useful. In addition, the author presents a novel view of approximate value iteration: an approximate algorithm for one environment turns out to be an exact algorithm for a different environment. The author's current e-mail address is ggordon @ cs. This material is based on work supported under a National Science Foundation Graduate Research Fellowship, and by NSF grant number BES-9402439. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author and do not necessarily reflect the views of the National Science Foundation or the United States Government.
2. What is the parallel value iteration operator for a discounted Markov decision process?
The parallel value iteration operator for a discounted Markov decision process is a contraction in max norm, with contraction factor equal to the discount.
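The contraction claim can be checked numerically. The following is a minimal sketch, assuming a small randomly generated MDP; the array layout (`P` indexed by action, state, next state; `R` indexed by state, action) and the name `bellman_backup` are illustrative choices, not taken from the paper.

```python
import numpy as np

def bellman_backup(V, P, R, gamma):
    """One parallel value iteration sweep over all states.

    P has shape (actions, states, states), R has shape (states, actions).
    These shapes are assumptions made for this sketch.
    """
    # Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return Q.max(axis=1)

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 3, 0.9

# Random transition probabilities, normalized so each row sums to 1.
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)
R = rng.random((n_states, n_actions))

# Two arbitrary value functions.
V1 = rng.normal(size=n_states)
V2 = rng.normal(size=n_states)

# Max-norm distance before and after one backup.
before = np.max(np.abs(V1 - V2))
after = np.max(np.abs(bellman_backup(V1, P, R, gamma)
                      - bellman_backup(V2, P, R, gamma)))

# The backup shrinks the max-norm distance by at least the discount factor.
print(after <= gamma * before + 1e-12)
```

Repeating the backup therefore brings any two starting value functions together geometrically, which is why value iteration converges to a unique fixed point in the discounted case.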
3. What is the main reason for divergence?
The chief reason for divergence is exaggeration: the more a method can exaggerate small changes in its target function, the more often it diverges under temporal differencing.
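To make "exaggeration" concrete, here is a small sketch contrasting two hypothetical linear fitting rules. The weight matrices are invented for illustration: `W` averages its targets with convex weights (as k-nearest-neighbor does), while `M` fits a line through targets at x = 0 and x = 1 and extrapolates to x = 3.

```python
import numpy as np

rng = np.random.default_rng(1)
n_points = 6

# Hypothetical averager: each fitted value is a convex combination of targets
# (nonnegative weights, each row summing to 1).
W = rng.random((n_points, n_points))
W /= W.sum(axis=1, keepdims=True)

y1 = rng.normal(size=n_points)
y2 = y1 + rng.normal(scale=0.1, size=n_points)  # slightly perturbed targets

target_gap = np.max(np.abs(y1 - y2))
fitted_gap = np.max(np.abs(W @ y1 - W @ y2))
# Averaging never enlarges the max-norm gap between two target functions.
print(fitted_gap <= target_gap + 1e-12)

# Hypothetical extrapolator: the line through (0, z[0]) and (1, z[1]),
# evaluated at x = 0, 1, 3. The prediction at x = 3 is -2*z[0] + 3*z[1].
M = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-2.0, 3.0]])

z1 = np.array([0.0, 0.0])
z2 = np.array([0.0, 0.1])  # one target moved by 0.1

small_change = np.max(np.abs(z1 - z2))
extrap_gap = np.max(np.abs(M @ z1 - M @ z2))
# The extrapolated point moves three times as far as the target did.
print(extrap_gap > small_change)
```

The averager cannot amplify a change in its targets, whereas the extrapolating fit can, and that amplification is the kind of behavior the answer above associates with divergence.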
4. What is the optimal value function for a nondiscounted Markov decision process?
If all policies in a nondiscounted Markov decision process are proper, then the parallel value iteration operator for that process is a contraction in some weighted max norm.
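For reference, the property can be written out as follows. The notation here (weights w, operator T, factor β) is assumed for illustration and may differ from the paper's own symbols.

```latex
% Weighted max norm with strictly positive weights w(s)
\|V\|_{w} = \max_{s} \frac{|V(s)|}{w(s)}, \qquad w(s) > 0 \ \text{for all } s

% Contraction of the value iteration operator T, for some \beta < 1
\|TV - TV'\|_{w} \le \beta \, \|V - V'\|_{w}
```

With all weights equal to 1 this reduces to the ordinary max norm used in the discounted case.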





