About Me

Hello! I am Sapana, a final-year PhD student at Texas A&M University, advised by Dr. Dileep Kalathil. I am passionate about using Reinforcement Learning (RL) to solve challenging real-world problems.

I have worked on several algorithmic paradigms in RL, ranging from generative adversarial imitation learning to meta-RL. More recently, I have pivoted toward fine-tuning Large Language Models (LLMs) with Reinforcement Learning from Human Feedback (RLHF), reflecting my growing interest in the intersection of natural language processing and reinforcement learning. During an Applied Scientist internship at Amazon, I also built abstractive and extractive Q&A systems using retrieval-augmented generation (RAG) and LLMs.

Previously, I was a research fellow at MPI-SWS, Germany, with Dr. Adish Singla. I also completed an MS (Research) at IIT Madras with Dr. Balaraman Ravindran and Dr. Radha Krishna Ganti.

Aside from work, I like to hike, cook, paint, and photograph.


  • [Feb 2024] Paper on Pedagogical Alignment of LLMs out on arXiv!
  • [Aug 2023] Paper on Safe distributed OCO accepted to TMLR!
  • [May 2023] Back in Seattle for an Applied Scientist internship at Amazon!
  • [Apr 2023] Accepted to IJCAI 2023 Doctoral Consortium!
  • [Feb 2023] New paper on Safe distributed OCO out on arXiv!
  • [Feb 2023] Gave an invited talk on ‘Adaptivity and safety in sequential decision making’ at Rice University!
  • [Sep 2022] Paper on meta-RL in sparse reward environments accepted to NeurIPS 2022!
  • [Aug 2022] Spent a wonderful summer in Seattle as an Applied Scientist intern at Amazon!
  • [Dec 2021] Paper on Safe online convex optimization accepted to AAAI 2022!