Notes on ICML conference - emphasis on RL

Conference highlights

Lots of great work on off-policy evaluation and off-policy learning (see, for instance, work by Hanna et al. [35], Le et al. [49], Fujimoto et al. [26], Gottesman et al. [32], and talks in Section 6.3). These problem settings are really important, as I (and many others) anticipate RL applications will come along with loads of data from sub-optimal policies.
Exploration was a hot topic again, and rightfully so (see work by Mavrin et al. [57], Fatemi et al. [25], Hazan et al. [37], Shani et al. [76]). Along with off-policy evaluation (and a few others), it’s one of the foundational problems in RL that we’re in a good position to make serious progress on at the moment.
Some really nice work continued to clarify distributional RL [10] (see [74, 57, 67]).
The AI for climate change workshop on Friday was fantastic and extremely well attended (standing room only for the talks I attended). I’ve said this after previous conferences, but: as we all know, there are profoundly important problems to work on, and the tools of ML can be extremely effective on them in their current form.
I really think we need to standardize evaluation in RL. This does not mean we need a single method or a single domain, but at the moment there is far too much variance in evaluation protocols.
Loved the panel at the RL for real life workshop (see Section 6.2.1).