I co-organize an RL reading group at the Vector Institute. The group meets fully online at this Zoom link: https://carleton-ca.zoom.us/j/94458342852?pwd=BALQfADPJGSfdMSNA32wp5cNnN8YBs.1 (passcode: 552958). Participants take turns presenting a recent RL research paper, library, or environment that is of interest to the group. The schedule is maintained here: https://docs.google.com/spreadsheets/d/1SX-l9vGe9jy35ibGnAohQgvIsg_nmvGjx9LfqrMzSek/edit?gid=0#gid=0. Anyone interested in RL is welcome to join future meetings or to sign up to present on that Google Sheet. Meetings take place on Mondays from 3 pm to 4 pm Eastern Time.
Here is the presentation schedule for Winter 2026.
| Date | Presenter | Paper Topic | Link | Contact |
| Jan 19 2026 | Sriram Ganapathi Subramanian | The Big World Hypothesis and its Ramifications for Artificial Intelligence | https://openreview.net/pdf?id=Sv7DazuCn8 | [email protected] |
| Jan 26 2026 | No Meeting (ICML Deadline) | |||
| Feb 2 2026 | Michal Lisicki | KL-Regularized Reinforcement Learning is Designed to Mode Collapse | https://openreview.net/forum?id=flBRtdIihA | [email protected] |
| Feb 9 2026 | Wenhao Li | A Comedy of Estimators: On KL Regularization in RL Training of LLMs | https://openreview.net/forum?id=MkLHbwSMP3 | [email protected] |
| Feb 16 2026 | No Meeting (Family Day) | |||
| Feb 23 2026 | Fae Moradi | Understanding R1-Zero-Like Training: A Critical Perspective | https://arxiv.org/abs/2503.20783 | [email protected] |
| Mar 2 2026 | Emiliano Penaloza | Privileged Information Distillation for Language Models | https://arxiv.org/abs/2602.04942 | [email protected] |
| Mar 9 2026 | No Meeting (RLC Deadline) | |||
| Mar 16 2026 | Sharan Vaswani | A Systematic Framework for Designing Policy Gradient Methods: The Case of Softmax Policy | https://arxiv.org/abs/2108.05828 https://arxiv.org/abs/2411.12042 | |
| Mar 23 2026 | Ali Rad (Cognichip) | Geometric view of RLVR; noise in Reward and Diversity Collapse | https://arxiv.org/abs/2601.04411 | [email protected] or [email protected] |
| Mar 30 2026 | No Meeting | |||
| Apr 6 2026 | Arezoo Alipanah | Adversarial Reinforcement Learning for Large Language Model Agent Safety | https://arxiv.org/abs/2510.05442 | [email protected] |
| Apr 13 2026 | Jing Dong | On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization | https://arxiv.org/abs/2405.16455 | [email protected] |
| Apr 20 2026 | Ziyi Yang | Generative Predicate Invention for Task-level Planning | https://utoronto.zoom.us/j/83848219997 | https://yzylmc.github.io/ |
