MaxCode: A Max-Reward Reinforcement Learning Framework for Automated Code Optimization
Jiefu Ou, Sapana Chaudhary, Kaj Bostrom, Nathaniel Weir, Shuai Zhang, Huzefa Rangwala, George Karypis
arXiv preprint, 2026
A reinforcement learning framework that optimizes code by maximizing reward signals through automated search and refinement.

Reward Modeling

VeriCoT: Neuro-symbolic Chain-of-Thought Validation via Logical Consistency Checks
Yu Feng, Nathaniel Weir, Kaj Bostrom, Sam Bayless, Darion Cassel, Sapana Chaudhary, Benjamin Kiesl-Reiter, Huzefa Rangwala
arXiv preprint, 2025
Combines neural and symbolic methods to validate chain-of-thought reasoning through logical consistency verification.

RL for LLM Reasoning and Alignment

Offline Learning and Forgetting for Reasoning with Large Language Models
Tianwei Ni, Allen Nie, Sapana Chaudhary, Yao Liu, Huzefa Rangwala, Rasool Fakoor
Transactions on Machine Learning Research (TMLR), 2025
Explores offline learning dynamics and catastrophic forgetting in LLM reasoning tasks using reinforcement learning techniques.

Risk-Averse Finetuning of Large Language Models
Sapana Chaudhary, Ujwal Dinesha, Dileep Kalathil, Srinivas Shakkottai
Neural Information Processing Systems (NeurIPS), 2024
Introduces risk-averse objectives for LLM finetuning to ensure robust and safe model behavior across diverse scenarios.

Pedagogical Alignment of Large Language Models
Shashank Sonkar*, Kangqi Ni*, Sapana Chaudhary, Richard G. Baraniuk (*-Equal contribution)
Empirical Methods in Natural Language Processing (EMNLP), 2024
Aligns LLMs with pedagogical principles to improve their effectiveness as educational tools and tutors.

LLM Agents and Tool Use

AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents
Ke Yang, Yao Liu, Sapana Chaudhary, Rasool Fakoor, Pratik Chaudhari, George Karypis, Huzefa Rangwala
International Conference on Learning Representations (ICLR), 2025
Presents a minimalist approach to building effective web agents using LLMs with strong empirical performance.

Safe and Adaptive Sequential Decision Making

On Safety and Adaptivity in Sequential Decision Making
Sapana Chaudhary
International Joint Conference on Artificial Intelligence (IJCAI) Doctoral Consortium, 2023
Doctoral consortium paper exploring the interplay between safety constraints and adaptive learning in sequential decision problems.

Dynamic Regret Analysis of Safe Distributed Online Optimization for Convex and Non-convex Problems
Ting-Jui Chang, Sapana Chaudhary, Dileep Kalathil, Shahin Shahrampour
Transactions on Machine Learning Research (TMLR), 2023
Analyzes regret bounds for distributed online optimization under safety constraints in both convex and non-convex settings.

Safe Online Convex Optimization with Unknown Linear Safety Constraints
Sapana Chaudhary, Dileep Kalathil
AAAI Conference on Artificial Intelligence, 2022
Develops algorithms for online convex optimization that learn and satisfy unknown linear safety constraints during execution.

Deep Meta-RL and Imitation Learning

Enhanced Meta Reinforcement Learning via Demonstrations in Sparse Reward Environments
Desik Rengarajan*, Sapana Chaudhary*, Jaewon Kim, Dileep Kalathil, Srinivas Shakkottai (*-Equal contribution)
Neural Information Processing Systems (NeurIPS), 2022
Improves meta-RL by incorporating demonstrations to accelerate learning in challenging sparse reward environments.

Smooth Imitation Learning via Smooth Costs and Smooth Policies
Sapana Chaudhary, Balaraman Ravindran
CoDS-COMAD (ACM Digital Library), 2022
Proposes smooth cost functions and policy regularization to improve stability and performance in imitation learning.

SILC: Smoother Imitation with Lipschitz Costs
Sapana Chaudhary*, Akshat Dave*, Balaraman Ravindran (*-Equal contribution)
Workshop on Goal Specification in Reinforcement Learning, ICML, 2018
Introduces Lipschitz-constrained cost functions to achieve smoother and more robust imitation learning.