Sapana Chaudhary

Coding and Inference Time Search

MaxCode: A Max-Reward Reinforcement Learning Framework for Automated Code Optimization

Jiefu Ou, Sapana Chaudhary, Kaj Bostrom, Nathaniel Weir, Shuai Zhang, Huzefa Rangwala, George Karypis

arXiv preprint, 2026

A reinforcement learning framework that optimizes code by maximizing reward signals through automated search and refinement.

[Paper]

Reward Modeling

VeriCoT: Neuro-symbolic Chain-of-Thought Validation via Logical Consistency Checks

Yu Feng, Nathaniel Weir, Kaj Bostrom, Sam Bayless, Darion Cassel, Sapana Chaudhary, Benjamin Kiesl-Reiter, Huzefa Rangwala

International Conference on Learning Representations (ICLR), 2026

Combines neural and symbolic methods to validate chain-of-thought reasoning through logical consistency verification.

[Paper]

RL for LLM Reasoning and Alignment

Offline Learning and Forgetting for Reasoning with Large Language Models

Tianwei Ni, Allen Nie, Sapana Chaudhary, Yao Liu, Huzefa Rangwala, Rasool Fakoor

Transactions on Machine Learning Research (TMLR), 2025

Explores offline learning dynamics and catastrophic forgetting in LLM reasoning tasks using reinforcement learning techniques.

[Paper]

Risk-Averse Finetuning of Large Language Models

Sapana Chaudhary, Ujwal Dinesha, Dileep Kalathil, Srinivas Shakkottai

Neural Information Processing Systems (NeurIPS), 2024

Introduces risk-averse objectives for LLM finetuning to ensure robust and safe model behavior across diverse scenarios.

[Paper]

Pedagogical Alignment of Large Language Models

Shashank Sonkar*, Kangqi Ni*, Sapana Chaudhary, Richard G. Baraniuk (*-Equal contribution)

Empirical Methods in Natural Language Processing (EMNLP), 2024

Aligns LLMs with pedagogical principles to improve their effectiveness as educational tools and tutors.

[Paper]

LLM Agents and Tool Use

AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents

Ke Yang, Yao Liu, Sapana Chaudhary, Rasool Fakoor, Pratik Chaudhari, George Karypis, Huzefa Rangwala

International Conference on Learning Representations (ICLR), 2025

Presents a minimalist approach to building effective web agents using LLMs with strong empirical performance.

[Paper]

Safe and Adaptive Sequential Decision Making

On Safety and Adaptivity in Sequential Decision Making

Sapana Chaudhary

International Joint Conference on Artificial Intelligence (IJCAI) Doctoral Consortium, 2023

Doctoral consortium paper exploring the interplay between safety constraints and adaptive learning in sequential decision problems.

[Paper]

Dynamic Regret Analysis of Safe Distributed Online Optimization for Convex and Non-convex Problems

Ting-Jui Chang, Sapana Chaudhary, Dileep Kalathil, Shahin Shahrampour

Transactions on Machine Learning Research (TMLR), 2023

Analyzes regret bounds for distributed online optimization under safety constraints in both convex and non-convex settings.

[Paper]

Safe Online Convex Optimization with Unknown Linear Safety Constraints

Sapana Chaudhary, Dileep Kalathil

AAAI Conference on Artificial Intelligence, 2022

Develops algorithms for online convex optimization that learn and satisfy unknown linear safety constraints during execution.

[Paper] [Poster]

Deep Meta-RL and Imitation Learning

Enhanced Meta Reinforcement Learning via Demonstrations in Sparse Reward Environments

Desik Rengarajan*, Sapana Chaudhary*, Jaewon Kim, Dileep Kalathil, Srinivas Shakkottai (*-Equal contribution)

Neural Information Processing Systems (NeurIPS), 2022

Improves meta-RL by incorporating demonstrations to accelerate learning in challenging sparse reward environments.

[Paper]

Smooth Imitation Learning via Smooth Costs and Smooth Policies

Sapana Chaudhary, Balaraman Ravindran

CoDS-COMAD (ACM Digital Library), 2022

Proposes smooth cost functions and policy regularization to improve stability and performance in imitation learning.

[Paper] [Slides] [Talk]

SILC: Smoother Imitation with Lipschitz Costs

Sapana Chaudhary*, Akshat Dave*, Balaraman Ravindran (*-Equal contribution)

Workshop on Goal Specification in Reinforcement Learning, ICML, 2018

Introduces Lipschitz-constrained cost functions to achieve smoother and more robust imitation learning.

[Paper]