Conference item
GradientDICE: rethinking generalized offline estimation of stationary values
- Abstract:
- We present GradientDICE for estimating the density ratio between the state distribution of the target policy and the sampling distribution in off-policy reinforcement learning. GradientDICE fixes several problems of GenDICE (Zhang et al., 2020), the current state-of-the-art for estimating such density ratios. Namely, the optimization problem in GenDICE is not a convex-concave saddle-point problem once nonlinearity in optimization variable parameterization is introduced to ensure positivity, s...
- Publication status:
- Published
- Peer review status:
- Peer reviewed
Access Document
- Files:
- Accepted manuscript (1.4 MB)
- Publication website:
- http://proceedings.mlr.press/v119/zhang20r.html
Bibliographic Details
- Publisher:
- Journal of Machine Learning Research
- Series:
- Proceedings of Machine Learning Research
- Series number:
- 119
- Host title:
- International Conference on Machine Learning, 13-18 July 2020, Virtual
- Publication date:
- 2020-11-21
- Acceptance date:
- 2020-06-01
- Event title:
- 37th International Conference on Machine Learning (ICML 2020)
- Event location:
- Virtual
- Event website:
- https://icml.cc/Conferences/2020
- Event start date:
- 2020-07-12T00:00:00Z
- Event end date:
- 2020-07-18T00:00:00Z
- ISSN:
- 2640-3498
Item Description
- Language:
- English
- Pubs id:
- 1118780
- Local pid:
- pubs:1118780
- Deposit date:
- 2020-07-15
Terms of use
- Copyright holder:
- Zhang, S. et al.
- Copyright date:
- 2020
- Rights statement:
- © 2020 The Authors.
- Notes:
- This paper was presented at the 37th International Conference on Machine Learning (ICML 2020), 12-18 July 2020. This is the accepted manuscript version of the paper. The final version is available online from PMLR at: http://proceedings.mlr.press/v119/zhang20r.html