UG: Policy Gradient for Mean-Variance Portfolio Optimization
Reproduce a regime-switching financial market and train a policy-gradient agent to balance expected return and variance — a bridge between optimization, finance, and RL.
Project at a glance
| Item | Details |
|---|---|
| Level | Undergraduate research (AMS 487 / 341) |
| Required courses | AMS 261 — Applied Calculus III and AMS 210 — Applied Linear Algebra |
| Optional | AMS 361 — Applied Calculus IV: Differential Equations |
| Preferred | AMS 326 — Numerical Analysis |
| Programming | Python (NumPy required; PyTorch helpful, can be learned during the project) |
| Effort | One semester (≈ 6–9 hrs/week) |
| Skills gained | Policy-gradient RL, stochastic optimization, PyTorch, quantitative finance modeling |
What you will actually do
Over one semester you will build, from scratch, a small but complete research pipeline:
- Simulate a financial market whose statistical behavior switches between hidden regimes (e.g. bull / bear / sideways), modeled as a Markov chain.
- Derive and code the classical mean-variance benchmark — the Markowitz-style optimal allocation when the regime is known — to serve as a yardstick.
- Train a reinforcement-learning agent (REINFORCE, then actor–critic) that learns a portfolio-allocation policy directly from simulated price paths, maximizing a reward that favors high return and penalizes variance.
- Run experiments and write up results — compare the learned policy against the benchmark on Sharpe ratio, drawdown, and per-regime behavior, and present the findings in a short report and a group talk.
By the end you will have a working RL system, a set of reproducible plots, and a written report suitable for an honors thesis, a research-experience application, or a quant-finance internship portfolio.
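To give a feel for the first step, here is a minimal sketch of a regime-switching simulator. The transition matrix `P`, the per-regime means `MU` and volatilities `SIGMA`, and the function name `simulate_market` are all illustrative assumptions, not the parameters or code used in the project. It also assumes a single risky asset with Gaussian returns.

```python
import numpy as np

# Hypothetical parameters, chosen only for illustration (not from the project).
REGIMES = ("bull", "bear", "sideways")
P = np.array([            # P[i, j] = probability of moving from regime i to j
    [0.95, 0.02, 0.03],
    [0.05, 0.90, 0.05],
    [0.04, 0.03, 0.93],
])
MU = np.array([0.0008, -0.0010, 0.0001])    # mean per-period return per regime
SIGMA = np.array([0.008, 0.020, 0.005])     # per-period volatility per regime


def simulate_market(n_steps, rng, start_regime=0):
    """Return (regimes, returns) for one risky asset over n_steps periods."""
    regimes = np.empty(n_steps, dtype=int)
    returns = np.empty(n_steps)
    state = start_regime
    for t in range(n_steps):
        regimes[t] = state
        # Gaussian return whose mean and volatility depend on the current regime.
        returns[t] = rng.normal(MU[state], SIGMA[state])
        # Draw the next regime from row `state` of the transition matrix.
        state = rng.choice(len(REGIMES), p=P[state])
    return regimes, returns


rng = np.random.default_rng(0)
regimes, returns = simulate_market(5000, rng)
print("fraction of time in each regime:",
      np.bincount(regimes, minlength=len(REGIMES)) / len(regimes))
```

In the project the agent would normally see only the returns, with the regime sequence kept aside for evaluation. The sketch returns both so that per-regime behavior can be inspected later.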
Goal
Implement a regime-switching market (Markov chain over bull/bear/sideways regimes) and train a REINFORCE / actor–critic agent to learn a portfolio policy that trades off mean return against variance.
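One common way to write this trade-off down, offered here as an assumption rather than as the project's exact formulation, uses the total return $R$ of an episode generated by a policy $\pi_\theta$ and a risk-aversion weight $\lambda > 0$ (both symbols are introduced only for this sketch):

$$
J(\theta) \;=\; \mathbb{E}_{\pi_\theta}[R] \;-\; \lambda \,\mathrm{Var}_{\pi_\theta}[R]
\;=\; \mathbb{E}[R] \;-\; \lambda\left(\mathbb{E}[R^2] - \big(\mathbb{E}[R]\big)^2\right).
$$

Applying the score-function identity $\nabla_\theta \mathbb{E}[f(R)] = \mathbb{E}\big[f(R)\, g\big]$ with $g = \sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)$ to each term gives

$$
\nabla_\theta J(\theta)
\;=\; \mathbb{E}[R\, g] \;-\; \lambda\,\mathbb{E}[R^2 g] \;+\; 2\lambda\,\mathbb{E}[R]\;\mathbb{E}[R\, g]
\;=\; \mathbb{E}\Big[\big(R - \lambda R^2 + 2\lambda\,\mathbb{E}[R]\,R\big)\, g\Big].
$$

The factor $\mathbb{E}[R]$ inside the bracket must itself be estimated, for example from a running average over episodes. This is one reason the variance term makes the problem harder than ordinary expected-return policy gradient.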
Suggested milestones
- Market simulator. Implement a 3-regime Markov-switching model with regime-dependent asset returns.
- Baseline. Compute the classical mean-variance optimal portfolio under a known regime.
- RL agent. Train a policy-gradient agent on the simulated market with a mean-variance reward.
- Comparison. Plot Sharpe ratio, max drawdown, and learned regime-specific allocations.
- (Bonus) Compare against an LSTM-augmented policy that uses recent returns as input.
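The baseline and comparison milestones rest on a few short formulas, sketched below under stated assumptions. The baseline assumes an unconstrained allocation that maximizes `w @ mu - 0.5 * risk_aversion * w @ cov @ w` for a known regime, so the weights need not sum to one. The Sharpe ratio treats its inputs as excess returns. Function names and the example numbers are hypothetical.

```python
import numpy as np


def mean_variance_weights(mu, cov, risk_aversion):
    """Unconstrained maximizer of w @ mu - 0.5 * risk_aversion * w @ cov @ w."""
    # Solve cov @ x = mu instead of forming an explicit matrix inverse.
    return np.linalg.solve(cov, mu) / risk_aversion


def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio of per-period excess returns."""
    returns = np.asarray(returns, dtype=float)
    return np.sqrt(periods_per_year) * returns.mean() / returns.std(ddof=1)


def max_drawdown(returns):
    """Largest peak-to-trough loss of the wealth curve, as a fraction of the peak."""
    returns = np.asarray(returns, dtype=float)
    # Start the wealth curve at 1 so that an initial loss counts as a drawdown.
    wealth = np.concatenate(([1.0], np.cumprod(1.0 + returns)))
    running_peak = np.maximum.accumulate(wealth)
    return float(np.max(1.0 - wealth / running_peak))


# Hypothetical "bull" regime parameters for two assets (illustration only).
mu_bull = np.array([0.02, 0.01])
cov_bull = np.array([[0.04, 0.00],
                     [0.00, 0.01]])
w_bull = mean_variance_weights(mu_bull, cov_bull, risk_aversion=2.0)
print("bull-regime weights:", w_bull)

rng = np.random.default_rng(0)
sample = rng.normal(0.0005, 0.01, size=252)   # synthetic per-period returns
print("Sharpe:", sharpe_ratio(sample), "max drawdown:", max_drawdown(sample))
```

Computing one weight vector per regime this way gives the yardstick against which the learned regime-specific allocations can be plotted.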
Why these courses?
| Course | Why it is needed |
|---|---|
| AMS 261 (required) | Multivariable calculus — gradients and the chain rule underlie every policy-gradient update. |
| AMS 210 (required) | Linear algebra — covariance matrices, the mean-variance objective, and PyTorch tensors are all linear-algebraic. |
| AMS 361 (optional) | Differential equations — helps in understanding continuous-time market dynamics and any continuous-time extension of the model. |
| AMS 326 (preferred) | Numerical analysis — gives you the stability, convergence, and floating-point intuition that makes the training loop trustworthy rather than a black box. |
If you have taken AMS 261 and AMS 210 and are comfortable writing Python, you are ready to apply; AMS 326 simply lets you go deeper, faster.
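To make the role of gradients concrete, here is a minimal NumPy-only sketch of a REINFORCE update for a softmax policy. It makes several simplifying assumptions that are not taken from the project: the agent observes the regime directly, it chooses among three discrete allocations, cash earns zero, and the per-step reward `ret - RISK_AVERSION * ret**2` penalizes the squared return as a stand-in for variance. All parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy market: 3 regimes, one risky asset, cash earns zero.
P = np.array([[0.90, 0.05, 0.05],
              [0.10, 0.80, 0.10],
              [0.05, 0.05, 0.90]])
MU = np.array([0.02, -0.03, 0.00])
SIGMA = np.array([0.03, 0.06, 0.02])
ALLOC = np.array([0.0, 0.5, 1.0])   # discrete choices: fraction in the risky asset
RISK_AVERSION = 5.0                 # weight on the squared-return penalty
HORIZON, EPISODES, LR = 20, 2000, 0.05


def softmax(z):
    z = z - z.max()                 # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def run_episode(theta):
    """Sample one episode; return total reward and the summed score function."""
    score = np.zeros_like(theta)    # accumulates grad of log pi(a_t | s_t)
    total, regime = 0.0, rng.integers(3)
    for _ in range(HORIZON):
        probs = softmax(theta[regime])
        action = rng.choice(len(ALLOC), p=probs)
        # For a softmax policy, d log pi(a|s) / d theta[s] = onehot(a) - probs.
        score[regime] -= probs
        score[regime, action] += 1.0
        ret = ALLOC[action] * rng.normal(MU[regime], SIGMA[regime])
        total += ret - RISK_AVERSION * ret**2   # assumed per-step reward
        regime = rng.choice(3, p=P[regime])
    return total, score


theta = np.zeros((3, len(ALLOC)))   # one row of action preferences per regime
baseline = 0.0
for _ in range(EPISODES):
    G, score = run_episode(theta)
    theta += LR * (G - baseline) * score        # REINFORCE gradient-ascent step
    baseline += 0.05 * (G - baseline)           # running-average baseline

policy = np.array([softmax(row) for row in theta])
print(np.round(policy, 2))          # rows: regimes, columns: allocation choices
```

The gradient line inside `run_episode` is the chain rule applied to the log of a softmax. In PyTorch the same quantity would come from automatic differentiation, but deriving it by hand once is a useful check on your understanding.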
Why it matters
This is the undergraduate-friendly version of an ongoing research project (with L. Vu) on policy-gradient methods for mean-variance portfolio optimization under regime-switching dynamics. You will gain transferable skills for both academic ML research and quantitative finance roles.
How to apply
Email me (see the contact page) with: (1) the AMS courses you have completed and your grades in AMS 261 and AMS 210, (2) a one-paragraph note on why this project interests you, and (3) any prior Python or finance experience. No published research is expected — curiosity and follow-through matter most.