UG: Policy Gradient for Mean-Variance Portfolio Optimization

Project at a glance


Level	Undergraduate research (AMS 487 / 341)
Required courses	AMS 261 — Applied Calculus III and AMS 210 — Applied Linear Algebra
Optional	AMS 361 — Applied Calculus IV: Differential Equations
Preferred	AMS 326 — Numerical Analysis
Programming	Python (NumPy required; PyTorch helpful, can be learned during the project)
Effort	One semester (≈ 6–9 hrs/week)
Skills gained	Policy-gradient RL, stochastic optimization, PyTorch, quantitative finance modeling

What you will actually do

Over one semester you will build, from scratch, a small but complete research pipeline:

Simulate a financial market whose statistical behavior switches between hidden regimes (e.g. bull / bear / sideways), modeled as a Markov chain.
Derive and code the classical mean-variance benchmark — the Markowitz-style optimal allocation when the regime is known — to serve as a yardstick.
Train a reinforcement-learning agent (REINFORCE, then actor–critic) that learns a portfolio-allocation policy directly from simulated price paths, optimizing an objective that rewards expected return and penalizes variance.
Run experiments and write up results — compare the learned policy against the benchmark on Sharpe ratio, drawdown, and per-regime behavior, and present the findings in a short report and a group talk.
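As a starting point for the simulator step, here is a minimal NumPy sketch of a 3-regime Markov-switching market. It assumes Gaussian returns within each regime and two risky assets. The transition matrix `P`, the means `MU`, the covariances `SIGMA`, and the function name `simulate_market` are illustrative placeholders, not calibrated or project-specified values.

```python
import numpy as np

# Illustrative placeholder parameters (not calibrated): regimes 0 = bull, 1 = bear, 2 = sideways.
P = np.array([[0.95, 0.03, 0.02],      # P[i, j] = Prob(next regime j | current regime i)
              [0.05, 0.90, 0.05],
              [0.04, 0.04, 0.92]])
MU = np.array([[0.0008, 0.0004],       # per-step mean returns of 2 risky assets, by regime
               [-0.0010, 0.0001],
               [0.0001, 0.0002]])
SIGMA = np.array([[[1.0e-4, 2.0e-5], [2.0e-5, 4.0e-5]],   # bull covariance
                  [[4.0e-4, 1.0e-4], [1.0e-4, 9.0e-5]],   # bear: higher volatility and correlation
                  [[6.0e-5, 1.0e-5], [1.0e-5, 3.0e-5]]])  # sideways covariance


def simulate_market(n_steps, P=P, mu=MU, sigma=SIGMA, start_regime=0, seed=None):
    """Simulate a Markov chain of regimes and Gaussian asset returns conditioned on the regime.

    Returns (regimes, returns) with shapes (n_steps,) and (n_steps, n_assets).
    """
    rng = np.random.default_rng(seed)
    n_regimes, n_assets = mu.shape
    regimes = np.empty(n_steps, dtype=int)
    returns = np.empty((n_steps, n_assets))
    k = start_regime
    for t in range(n_steps):
        regimes[t] = k
        returns[t] = rng.multivariate_normal(mu[k], sigma[k])  # returns drawn from regime k
        k = rng.choice(n_regimes, p=P[k])                      # next regime from row k of P
    return regimes, returns
```

The function returns the regime labels alongside the returns. This lets the same simulated path be used afterwards to score the agent per regime, even though the agent itself should see only the returns, since the regimes are meant to be hidden.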

By the end you will have a working RL system, a set of reproducible plots, and a written report suitable for an honors thesis, a research-experience application, or a quant-finance internship portfolio.

Goal

Implement a regime-switching market (Markov chain over bull/bear/sideways regimes) and train a REINFORCE / actor–critic agent to learn a portfolio policy that trades off mean return against variance.
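The REINFORCE half of this goal can be prototyped in plain NumPy before moving to PyTorch. The sketch below makes several simplifying assumptions that are not part of the project spec:

- one risky asset plus cash at 0%, in a toy 3-regime market with made-up parameters;
- a policy that observes the current regime (in the project the regimes are hidden, and conditioning on recent returns is the harder, realistic case);
- a discrete menu of risky-asset weights;
- an episode return G equal to the sum of per-step portfolio returns.

It ascends J(θ) = E[G] − λ·Var(G) by weighting each episode's score function by G_i − λ(G_i − Ḡ)². Up to a small bias from using the batch mean Ḡ in place of E[G], this weighting estimates ∇J. All names (`rollout_batch`, `reinforce`, `ACTIONS`, `lam`) are hypothetical.

```python
import numpy as np

# Toy market with illustrative (made-up) parameters: regimes 0 = bull, 1 = bear, 2 = sideways.
P = np.array([[0.95, 0.03, 0.02],
              [0.05, 0.90, 0.05],
              [0.04, 0.04, 0.92]])        # regime transition matrix (rows sum to 1)
MU = np.array([0.002, -0.002, 0.0])       # per-step mean return of the risky asset
SD = np.array([0.01, 0.02, 0.005])        # per-step volatility of the risky asset
ACTIONS = np.linspace(0.0, 1.0, 5)        # allowed risky weights 0, 0.25, ..., 1; rest in cash at 0%


def softmax_rows(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = np.exp(logits - logits.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)


def rollout_batch(theta, rng, batch, T):
    """Simulate `batch` episodes in parallel.

    Returns each episode's total return G and its score d/dtheta log p(trajectory),
    which for a tabular softmax policy is sum_t (onehot(a_t) - pi(.|k_t)) placed in row k_t.
    """
    k = rng.integers(3, size=batch)                   # initial regimes (observed by the policy here)
    G = np.zeros(batch)
    score = np.zeros((batch,) + theta.shape)
    rows = np.arange(batch)
    for _ in range(T):
        probs = softmax_rows(theta[k])                # pi(. | k) for every episode
        u = rng.random((batch, 1))
        a = np.minimum((probs.cumsum(axis=1) < u).sum(axis=1), len(ACTIONS) - 1)  # inverse-CDF sample
        r = rng.normal(MU[k], SD[k])                  # risky-asset return in the current regime
        G += ACTIONS[a] * r                           # additive approximation of portfolio return
        score[rows, k] -= probs
        score[rows, k, a] += 1.0
        v = rng.random((batch, 1))
        k = np.minimum((P[k].cumsum(axis=1) < v).sum(axis=1), 2)  # next regime from row k of P
    return G, score


def reinforce(lam=5.0, n_iters=200, batch=256, T=50, lr=0.5, seed=0):
    """REINFORCE gradient ascent on J(theta) = E[G] - lam * Var(G)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros((3, len(ACTIONS)))               # one row of logits per regime
    for _ in range(n_iters):
        G, score = rollout_batch(theta, rng, batch, T)
        # grad J = E[(G - lam * (G - E[G])^2) * score]; the batch mean stands in for E[G].
        w = G - lam * (G - G.mean()) ** 2
        w -= w.mean()                                 # batch-mean baseline to reduce variance
        theta += lr * np.tensordot(w, score, axes=1) / batch
    return theta
```

After `theta = reinforce()`, `softmax_rows(theta)` gives the learned allocation probabilities per regime. With these toy numbers, probability mass should drift toward the fully invested weight in the bull regime and toward cash in the bear regime. An actor–critic version would keep the same score term but replace the batch-mean baseline with a learned value estimate.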

Suggested milestones

Market simulator. Implement a 3-regime Markov-switching model with regime-dependent asset returns.
Baseline. Compute the classical mean-variance optimal portfolio under a known regime.
RL agent. Train a policy-gradient agent on the simulated market with a mean-variance reward.
Comparison. Plot Sharpe ratio, max drawdown, and learned regime-specific allocations.
(Bonus) Compare against an LSTM-augmented policy that uses recent returns as input.
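For the baseline and comparison milestones, the sketch below assumes the standard unconstrained Markowitz problem in a known regime: maximize wᵀμ − (γ/2)·wᵀΣw, with leftover wealth held in cash at 0%. Its first-order condition μ − γΣw = 0 gives w* = Σ⁻¹μ/γ. The risk-aversion parameter γ, the 252-periods-per-year annualization, the zero risk-free rate, and the function names are illustrative assumptions. Constraints such as no shorting or full investment would change the formula.

```python
import numpy as np


def mean_variance_weights(mu, sigma, gamma):
    """Unconstrained mean-variance weights w* = Sigma^{-1} mu / gamma for one known regime."""
    return np.linalg.solve(sigma, mu) / gamma   # solve instead of explicitly inverting Sigma


def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio of per-period returns, assuming a 0% risk-free rate."""
    r = np.asarray(returns, dtype=float)
    return np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1)


def max_drawdown(returns):
    """Largest peak-to-trough fall of the wealth curve, starting from wealth 1."""
    r = np.asarray(returns, dtype=float)
    wealth = np.concatenate(([1.0], np.cumprod(1.0 + r)))   # include the starting wealth
    peak = np.maximum.accumulate(wealth)                     # running maximum so far
    return np.max(1.0 - wealth / peak)
```

For example, with μ = (0.08, 0.05), a diagonal Σ with variances 0.04 and 0.01, and γ = 5, the benchmark weights are (0.4, 1.0). The returns [0.10, −0.20, 0.05] have a maximum drawdown of 0.2, since wealth falls from 1.1 to 0.88.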

Why these courses?

Course	Why it is needed
AMS 261 (required)	Multivariable calculus — gradients and the chain rule underlie every policy-gradient update.
AMS 210 (required)	Linear algebra — covariance matrices, the mean-variance objective, and PyTorch tensors are all linear-algebraic.
AMS 361 (optional)	Differential equations, which help in understanding continuous-time market dynamics and in extending the discrete-time regime-switching model to continuous time.
AMS 326 (preferred)	Numerical analysis — gives you the stability, convergence, and floating-point intuition that makes the training loop trustworthy rather than a black box.
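To make the AMS 261 row concrete, here is the standard log-derivative (score-function) identity behind REINFORCE. The notation is ours, not the project's: an episode τ has total return G(τ), the policy is π_θ, the initial-state distribution is ρ, and the market dynamics p do not depend on θ.

```latex
\begin{aligned}
J(\theta) &= \mathbb{E}_{\tau \sim p_\theta}\!\left[G(\tau)\right] = \int G(\tau)\, p_\theta(\tau)\, d\tau, \\
\nabla_\theta J(\theta) &= \int G(\tau)\, \nabla_\theta p_\theta(\tau)\, d\tau
  = \int G(\tau)\, p_\theta(\tau)\, \nabla_\theta \log p_\theta(\tau)\, d\tau
  = \mathbb{E}_{\tau \sim p_\theta}\!\left[G(\tau)\, \nabla_\theta \log p_\theta(\tau)\right], \\
p_\theta(\tau) &= \rho(s_0) \prod_{t=0}^{T-1} \pi_\theta(a_t \mid s_t)\, p(s_{t+1} \mid s_t, a_t)
  \;\Longrightarrow\;
  \nabla_\theta \log p_\theta(\tau) = \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t).
\end{aligned}
```

The middle step uses the chain rule, ∇ log p = (∇p)/p. The last step shows why the unknown market dynamics drop out of the gradient: only the policy's own log-probabilities need to be differentiated. Actor–critic methods keep the same ∇ log π_θ factor but replace G(τ) with a per-step advantage estimate from a learned critic.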

If you have taken AMS 261 and AMS 210 and are comfortable writing Python, you are ready to apply; AMS 326 simply lets you go deeper, faster.

Why it matters

This is the undergraduate-friendly version of an ongoing research project (with L. Vu) on policy-gradient methods for mean-variance portfolio optimization under regime-switching dynamics. You will gain transferable skills for both academic ML research and quantitative finance roles.

How to apply

Email me (see the contact page) with: (1) the AMS courses you have completed and your grades in AMS 261 and AMS 210, (2) a one-paragraph note on why this project interests you, and (3) any prior Python or finance experience. No published research is expected — curiosity and follow-through matter most.