AI Alignment Reading Group

Welcome to the Carnegie AI Safety Initiative (CASI) AI Alignment Reading Group for Spring 2025. This reading group focuses on topics in AI alignment, covering technical approaches to ensuring AI systems are safe.

Weekly Agendas

About the Reading Group

This 8-week program explores crucial topics in AI alignment, including:

  • Fundamentals of AI and AI safety
  • Reinforcement Learning from Human Feedback (RLHF)
  • Scalable oversight techniques
  • Robustness and unlearning
  • Mechanistic interpretability
  • Technical governance
  • AI control methods

Each session runs for two hours and combines readings with discussion. No preparation is required outside the sessions.

Facilitators

The reading group is led by Lawrence Feng.

The reading list draws inspiration from the AI Safety Fundamentals course by BlueDot Impact.