Coding and Inference-Time Search
MaxCode: A Max-Reward Reinforcement Learning Framework for Automated Code Optimization
arXiv preprint, 2026
A reinforcement learning framework that optimizes code by maximizing reward signals through automated search and refinement.
Reward Modeling
VeriCoT: Neuro-symbolic Chain-of-Thought Validation via Logical Consistency Checks
arXiv preprint, 2025
Combines neural and symbolic methods to validate chain-of-thought reasoning through logical consistency verification.
RL for LLM Reasoning and Alignment
Offline Learning and Forgetting for Reasoning with Large Language Models
Transactions on Machine Learning Research (TMLR), 2025
Explores offline learning dynamics and catastrophic forgetting in LLM reasoning tasks using reinforcement learning techniques.
Risk-Averse Finetuning of Large Language Models
Neural Information Processing Systems (NeurIPS), 2024
Introduces risk-averse objectives for LLM finetuning to ensure robust and safe model behavior across diverse scenarios.
Pedagogical Alignment of Large Language Models
Empirical Methods in Natural Language Processing (EMNLP), 2024
Aligns LLMs with pedagogical principles to improve their effectiveness as educational tools and tutors.
LLM Agents and Tool Use
AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents
International Conference on Learning Representations (ICLR), 2025
Presents a minimalist approach to building effective web agents using LLMs with strong empirical performance.
Safe and Adaptive Sequential Decision Making
On Safety and Adaptivity in Sequential Decision Making
International Joint Conference on Artificial Intelligence (IJCAI) Doctoral Consortium, 2023
Explores the interplay between safety constraints and adaptive learning in sequential decision problems.
Dynamic Regret Analysis of Safe Distributed Online Optimization for Convex and Non-convex Problems
Transactions on Machine Learning Research (TMLR), 2023
Analyzes regret bounds for distributed online optimization under safety constraints in both convex and non-convex settings.
Deep Meta-RL and Imitation Learning
Enhanced Meta Reinforcement Learning via Demonstrations in Sparse Reward Environments
Neural Information Processing Systems (NeurIPS), 2022
Improves meta-RL by incorporating demonstrations to accelerate learning in sparse-reward environments.
SILC: Smoother Imitation with Lipschitz Costs
Workshop on Goal Specification in Reinforcement Learning, ICML, 2018
Introduces Lipschitz-constrained cost functions to achieve smoother and more robust imitation learning.