RL Reading Group - Sriram Ganapathi Subramanian

I co-organize an RL reading group at the Vector Institute, which is held fully online. Participants meet at this Zoom link: https://carleton-ca.zoom.us/j/94458342852?pwd=BALQfADPJGSfdMSNA32wp5cNnN8YBs.1 (passcode: 552958) and take turns presenting a recent RL research paper or a recent RL library/environment of interest to the group. The schedule is maintained here: https://docs.google.com/spreadsheets/d/1SX-l9vGe9jy35ibGnAohQgvIsg_nmvGjx9LfqrMzSek/edit?gid=0#gid=0. Anyone interested in RL is welcome to join future meetings or sign up to present on that Google Sheet. Meetings take place on Mondays from 3 pm – 4 pm Eastern Time.

Here is the presentation schedule for Winter 2026.

Date	Presenter	Paper Topic	Link	Email
Jan 19 2026	Sriram Ganapathi Subramanian	The Big World Hypothesis and its Ramifications for Artificial Intelligence	https://openreview.net/pdf?id=Sv7DazuCn8	[email protected]
Jan 26 2026	No Meeting (ICML deadline)
Feb 2 2026	Michal Lisicki	KL-Regularized Reinforcement Learning is Designed to Mode Collapse	https://openreview.net/forum?id=flBRtdIihA	[email protected]
Feb 9 2026	Wenhao Li	A Comedy of Estimators: On KL Regularization in RL Training of LLMs	https://openreview.net/forum?id=MkLHbwSMP3	[email protected]
Feb 16 2026	No Meeting (Family Day)
Feb 23 2026	Fae Moradi	Understanding R1-Zero-Like Training: A Critical Perspective	https://arxiv.org/abs/2503.20783	[email protected]
Mar 2 2026	Emiliano Penaloza	Privileged Information Distillation for Language Models	https://arxiv.org/abs/2602.04942	[email protected]
Mar 9 2026	No Meeting (RLC deadline)
Mar 16 2026	Sharan Vaswani	A Systematic Framework for Designing Policy Gradient Methods: The Case of Softmax Policy	https://arxiv.org/abs/2108.05828 https://arxiv.org/abs/2411.12042
Mar 23 2026	Ali Rad (Cognichip)	Geometric view of RLVR; noise in Reward and Diversity Collapse	https://arxiv.org/abs/2601.04411	[email protected] or [email protected]
Mar 30 2026	No Meeting
Apr 6 2026	Arezoo Alipanah	Adversarial Reinforcement Learning for Large Language Model Agent Safety	https://arxiv.org/abs/2510.05442	[email protected]
Apr 13 2026	Jing Dong	On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization	https://arxiv.org/abs/2405.16455	[email protected]
Apr 20 2026	Ziyi Yang	Generative Predicate Invention for Task-level Planning	https://utoronto.zoom.us/j/83848219997	https://yzylmc.github.io/