UG: Reinforcement Learning for Crowd Evacuation

Project at a glance


Level	Undergraduate (AMS 487)
Prerequisites	Python, basic probability/linear algebra; exposure to RL helpful but not required
Effort	One semester
Skills gained	Reinforcement learning, Gymnasium (the maintained successor to OpenAI Gym), simulation design, policy evaluation

Goal

Build a 2D crowd-evacuation simulator (grid-based, or optionally continuous) and train an RL agent to learn an evacuation policy that minimizes the average exit time.
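
One possible way to turn "minimize average exit time" into an RL reward, offered here as an assumption rather than a prescribed design, is to charge -1 per time step for every pedestrian still inside the room. The undiscounted return then equals minus the sum of all exit times, so maximizing return is the same as minimizing the average. The function name and the sample exit times below are hypothetical.

```python
def episode_return(exit_times):
    """Undiscounted return when each step costs -1 per pedestrian still inside.

    exit_times[i] is the (hypothetical) step at which pedestrian i leaves.
    """
    horizon = max(exit_times)
    total = 0
    for t in range(horizon):
        # Pedestrian i is inside during steps 0 .. exit_times[i] - 1.
        still_inside = sum(1 for T in exit_times if T > t)
        total -= still_inside
    return total


exit_times = [3, 5, 5, 9]  # illustrative values only
ret = episode_return(exit_times)
avg_exit_time = sum(exit_times) / len(exit_times)
print(ret, avg_exit_time)  # -> -22 5.5
```

With a discount factor below 1 the identity holds only approximately, which is worth keeping in mind when comparing learned policies by exit time rather than by return.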

Suggested milestones

Environment. Implement a small evacuation simulator as a Gymnasium environment: a room with one or two exits, an adjustable number of pedestrians, and optional obstacles.
Baseline. Implement a hand-tuned greedy policy (for example, always step toward the nearest exit) and compare it against uniformly random actions.
RL agent. Train a DQN or PPO agent with Stable-Baselines3 and plot the learning curve.
Analysis. Vary the obstacle layout and the crowd density; report how the average exit time scales with each.
(Bonus) Replace the discrete grid with a continuous model based on our planar crowd-motion sweeping-process equations.
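
As a starting point for the first two milestones, here is a minimal sketch under strong simplifying assumptions: a single pedestrian, one exit, no obstacles, and a reward of -1 per step. The class `GridEvacEnv`, its parameters, and the helper functions are hypothetical names. To stay dependency-free, the class only mirrors the Gymnasium convention (`reset` returns `(obs, info)`, `step` returns `(obs, reward, terminated, truncated, info)`). A real version would subclass `gymnasium.Env` and declare `observation_space` and `action_space` so that Stable-Baselines3 can train on it.

```python
import random

# Action index -> (dx, dy) on the grid: up, down, left, right.
MOVES = [(0, -1), (0, 1), (-1, 0), (1, 0)]


class GridEvacEnv:
    """Hypothetical one-pedestrian room using the Gymnasium reset/step convention."""

    def __init__(self, width=8, height=8, exit_cell=(7, 4), max_steps=200, seed=0):
        self.width, self.height = width, height
        self.exit_cell = exit_cell
        self.max_steps = max_steps
        self.rng = random.Random(seed)

    def reset(self):
        # Start in a random cell that is not the exit.
        while True:
            self.pos = (self.rng.randrange(self.width), self.rng.randrange(self.height))
            if self.pos != self.exit_cell:
                break
        self.steps = 0
        return self.pos, {}

    def step(self, action):
        dx, dy = MOVES[action]
        # Clamp to the room: walking into a wall leaves the pedestrian in place.
        x = min(max(self.pos[0] + dx, 0), self.width - 1)
        y = min(max(self.pos[1] + dy, 0), self.height - 1)
        self.pos = (x, y)
        self.steps += 1
        terminated = self.pos == self.exit_cell
        truncated = not terminated and self.steps >= self.max_steps
        return self.pos, -1.0, terminated, truncated, {}


def greedy_policy(env, obs):
    """Pick the move that most reduces the Manhattan distance to the exit."""
    ex, ey = env.exit_cell

    def dist_after(a):
        dx, dy = MOVES[a]
        return abs(obs[0] + dx - ex) + abs(obs[1] + dy - ey)

    return min(range(4), key=dist_after)


def random_policy(env, obs):
    """Uniformly random action, ignoring the observation."""
    return env.rng.randrange(4)


def mean_exit_time(policy, episodes=200, seed=0):
    """Average number of steps per episode (capped at max_steps)."""
    env = GridEvacEnv(seed=seed)
    total = 0
    for _ in range(episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            obs, _, terminated, truncated, _ = env.step(policy(env, obs))
            done = terminated or truncated
        total += env.steps
    return total / episodes


greedy_time = mean_exit_time(greedy_policy)
random_time = mean_exit_time(random_policy)
print(f"greedy: {greedy_time:.1f} steps, random: {random_time:.1f} steps")
```

In an empty room the greedy policy is already optimal, so the comparison only becomes interesting once obstacles and multiple interacting pedestrians are added. That is where a learned policy has room to beat the hand-tuned one.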

Why it matters

This is the small-scale, learnable cousin of the bilevel crowd-motion sweeping control problem in my research group. It is also great practice for the deep-RL skills that are increasingly valued in academia and industry.