Posts by Collection

portfolio

publications

Enhanced Meta Reinforcement Learning via Demonstrations in Sparse Reward Environments

Published in Neural Information Processing Systems (NeurIPS), 2022

Improves meta-RL by incorporating demonstrations to accelerate learning in challenging sparse reward environments.
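
To give a flavor of the general idea (a hedged illustration, not the paper's exact algorithm; the loss form, names, and weighting below are assumptions), demonstration-augmented policy learning typically adds a behavior-cloning term on demonstration data to an ordinary policy-gradient loss, so the agent receives learning signal even before it ever encounters a sparse reward:

    # Hypothetical sketch: policy-gradient loss augmented with a
    # behavior-cloning term on demonstrations, so sparse rewards are
    # not the only learning signal. Not the paper's exact method.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))

    def demo_augmented_loss(obs, acts, returns, demo_obs, demo_acts, bc_weight=0.5):
        # REINFORCE-style term: -log pi(a|s) weighted by the return.
        logp = torch.distributions.Categorical(logits=policy(obs)).log_prob(acts)
        pg_loss = -(logp * returns).mean()
        # Behavior-cloning term: match demonstrated actions; this
        # supplies gradient signal even when env rewards are sparse.
        bc_loss = F.cross_entropy(policy(demo_obs), demo_acts)
        return pg_loss + bc_weight * bc_loss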

Recommended citation: Desik Rengarajan, Sapana Chaudhary, Jaewon Kim, Dileep Kalathil, Srinivas Shakkottai. (2022). Enhanced Meta Reinforcement Learning via Demonstrations in Sparse Reward Environments. NeurIPS 2022. https://arxiv.org/abs/2209.13048

Dynamic Regret Analysis of Safe Distributed Online Optimization for Convex and Non-convex Problems

Published in Transactions on Machine Learning Research (TMLR), 2023

Analyzes regret bounds for distributed online optimization under safety constraints in both convex and non-convex settings.
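
For background, dynamic regret compares the learner's cumulative loss against a sequence of per-round minimizers rather than a single fixed comparator; a standard textbook definition (the paper's safe, distributed variant builds on this) is:

    % Dynamic regret over horizon T against per-round minimizers x_t^*.
    \[
      \mathrm{Reg}^{d}_{T} \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \sum_{t=1}^{T} f_t(x_t^{*}),
      \qquad x_t^{*} \in \operatorname*{arg\,min}_{x \in \mathcal{X}} f_t(x).
    \]

Bounds on this quantity are typically stated in terms of the path length of the comparator sequence, i.e., how much the per-round minimizers drift over time.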

Recommended citation: Ting-Jui Chang, Sapana Chaudhary, Dileep Kalathil, Shahin Shahrampour. (2023). Dynamic Regret Analysis of Safe Distributed Online Optimization for Convex and Non-convex Problems. TMLR 2023. https://openreview.net/forum?id=xiQXHvL1eN

Pedagogical Alignment of Large Language Models

Published in Empirical Methods in Natural Language Processing (EMNLP), 2024

Aligns LLMs with pedagogical principles to improve their effectiveness as educational tools and tutors.

Recommended citation: Shashank Sonkar, Kangqi Ni, Sapana Chaudhary, Richard G. Baraniuk. (2024). Pedagogical Alignment of Large Language Models. EMNLP 2024. https://arxiv.org/abs/2402.05000

Risk-Averse Finetuning of Large Language Models

Published in Neural Information Processing Systems (NeurIPS), 2024

Introduces risk-averse objectives for LLM finetuning to ensure robust and safe model behavior across diverse scenarios.
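
As background, a common way to make an objective risk-averse is to optimize the conditional value-at-risk (CVaR) of the reward rather than its mean, concentrating on the low-reward tail (a standard definition, stated here for orientation rather than as the paper's exact objective):

    % Value-at-risk and conditional value-at-risk of a reward R at level alpha.
    \[
      \mathrm{VaR}_{\alpha}(R) \;=\; \inf\{\, r : \Pr(R \le r) \ge \alpha \,\},
      \qquad
      \mathrm{CVaR}_{\alpha}(R) \;=\; \mathbb{E}\!\left[\, R \mid R \le \mathrm{VaR}_{\alpha}(R) \,\right].
    \]

Maximizing \(\mathrm{CVaR}_{\alpha}(R)\) for small \(\alpha\) penalizes a model whose worst responses are poor, even if its average reward is high.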

Recommended citation: Sapana Chaudhary, Ujwal Dinesha, Dileep Kalathil, Srinivas Shakkottai. (2024). Risk-Averse Finetuning of Large Language Models. NeurIPS 2024. https://arxiv.org/abs/2501.06911

VeriCoT: Neuro-symbolic Chain-of-Thought Validation via Logical Consistency Checks

Published as an arXiv preprint, 2025

Combines neural and symbolic methods to validate chain-of-thought reasoning through logical consistency verification.
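
In miniature, the consistency-checking idea can be pictured as parsing each reasoning step into a signed claim and rejecting the chain if any two claims contradict. This is a deliberately reduced, hypothetical sketch; the actual system works with extracted logical forms, and this toy check only conveys the flavor:

    # Toy illustration of a logical consistency check: each step is
    # assumed to yield a (proposition, truth_value) claim; a chain is
    # rejected if it asserts some proposition both true and false.
    def chain_is_consistent(claims):
        seen = {}
        for prop, value in claims:
            if seen.setdefault(prop, value) != value:
                return False  # contradiction between two steps
        return True

    assert chain_is_consistent([("x > 0", True), ("y < 2", True)])
    assert not chain_is_consistent([("x > 0", True), ("x > 0", False)])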

Recommended citation: Yu Feng, Nathaniel Weir, Kaj Bostrom, Sam Bayless, Darion Cassel, Sapana Chaudhary, Benjamin Kiesl-Reiter, Huzefa Rangwala. (2025). VeriCoT: Neuro-symbolic Chain-of-Thought Validation via Logical Consistency Checks. arXiv preprint. https://arxiv.org/abs/2511.04662

MaxCode: A Max-Reward Reinforcement Learning Framework for Automated Code Optimization

Published as an arXiv preprint, 2026

A reinforcement learning framework that optimizes code by maximizing reward signals through automated search and refinement.
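
As a rough picture of reward-driven code search (a generic greedy loop under the assumed helpers propose_edit and reward, not the paper's algorithm):

    # Generic reward-guided refinement loop: propose a candidate edit,
    # score it (e.g. tests passed, runtime), keep it if it improves.
    def refine(code, propose_edit, reward, steps=100):
        best, best_r = code, reward(code)
        for _ in range(steps):
            cand = propose_edit(best)   # e.g. a model-suggested rewrite
            r = reward(cand)
            if r > best_r:              # greedy hill climbing on reward
                best, best_r = cand, r
        return best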

Recommended citation: Jiefu Ou, Sapana Chaudhary, Kaj Bostrom, Nathaniel Weir, Shuai Zhang, Huzefa Rangwala, George Karypis. (2026). MaxCode: A Max-Reward Reinforcement Learning Framework for Automated Code Optimization. arXiv preprint. https://arxiv.org/abs/2601.05475

talks

teaching
