# Portfolio Diversification

## Two-Asset Portfolio

Consider an investment portfolio $V$ on two assets:

We can calculate the mean and variance of the return on the portfolio, based on the mean and variance of the return on each asset.
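A reconstruction of the elided display (standard result; here $w$ denotes the allocation to asset $b$ and $1-w$ the allocation to asset $s$, consistent with the zero-variance weight quoted below):

$$\mu_p = w\mu_b + (1-w)\mu_s, \qquad \sigma_p^2 = w^2\sigma_b^2 + (1-w)^2\sigma_s^2 + 2w(1-w)\rho\sigma_b\sigma_s$$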

We can see that if $\rho<1$ we have diversification: $\mu_p$ is linear in the portfolio allocation $w$, while the portfolio standard deviation is convex in $w$.

When $\rho=-1$, the portfolio variance can be as small as desired. If we set $w=\sigma_s/(\sigma_b+\sigma_s)$ then $\sigma_p=0$ and the portfolio becomes riskless.

In the following chart, the return volatility of a two-asset portfolio is plotted against different weights and correlations.

We see that as long as the correlation between the two assets is not perfect, there exists a weight that minimizes the portfolio variance (not yet taking the portfolio return into consideration).
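A minimal numerical sketch of this claim (parameter values are assumptions for illustration; $w$ is the weight on asset $b$, and the closed-form minimizer comes from setting $d\sigma_p^2/dw=0$):

```python
import numpy as np

def port_vol(w, sig_b, sig_s, rho):
    # Volatility of a two-asset portfolio with weight w on asset b
    var = w**2*sig_b**2 + (1 - w)**2*sig_s**2 + 2*w*(1 - w)*rho*sig_b*sig_s
    return np.sqrt(np.maximum(var, 0.0))

def min_var_weight(sig_b, sig_s, rho):
    # Closed-form minimizer of the variance above (set d var / d w = 0)
    return (sig_s**2 - rho*sig_b*sig_s) / (sig_b**2 + sig_s**2 - 2*rho*sig_b*sig_s)

sig_b, sig_s = 0.10, 0.20      # assumed volatilities
w = np.linspace(0.0, 1.0, 100001)
for rho in (-1.0, 0.0, 0.5):
    w_num = w[np.argmin(port_vol(w, sig_b, sig_s, rho))]
    w_ana = min_var_weight(sig_b, sig_s, rho)
    print(f"rho={rho:+.1f}  grid argmin w={w_num:.4f}  closed form w={w_ana:.4f}")
```

At $\rho=-1$ the minimizer is $\sigma_s/(\sigma_b+\sigma_s)=2/3$ with zero volatility, matching the riskless-portfolio claim above.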

## Multi-Asset Portfolio

Consider $n$ assets with return volatility $\sigma_i$ and covariance $\sigma_{i, j}$. Let $w^i$ denote the allocation to asset $i$. Then the variance of portfolio return becomes:

In the case of an equally-weighted portfolio with $w^i=1/n$:

We define:

Therefore,
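A reconstruction of the elided derivation (standard algebra; $\bar{\sigma}^2$ and $\bar{\sigma}_{cov}$ denote the average variance and average pairwise covariance):

$$\sigma_p^2=\sum_{i=1}^n\sum_{j=1}^n w^i w^j \sigma_{i,j}\;\overset{w^i=1/n}{=}\;\frac{1}{n^2}\sum_{i=1}^n \sigma_i^2+\frac{1}{n^2}\sum_{i\neq j}\sigma_{i,j}$$

With $\bar{\sigma}^2:=\frac{1}{n}\sum_i \sigma_i^2$ and $\bar{\sigma}_{cov}:=\frac{1}{n(n-1)}\sum_{i\neq j}\sigma_{i,j}$,

$$\sigma_p^2=\frac{1}{n}\bar{\sigma}^2+\frac{n-1}{n}\bar{\sigma}_{cov}\;\xrightarrow{\;n\rightarrow\infty\;}\;\bar{\sigma}_{cov}$$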

We conclude that in an equally weighted portfolio (or any diversified portfolio where $\lim_{n\rightarrow\infty}w^i=0$) with a large number of assets,

• the individual asset return variances become unimportant to the portfolio return variance
• the portfolio variance instead now depends on the average covariances between the assets.

Here, the average return covariance is the systematic risk that cannot be eliminated through diversification, whereas the average return variance is the idiosyncratic risk that is diversifiable.

Note that complete diversification ($\sigma^2_p=0$) is achieved when:

• the average return covariance is $0$ and $n\rightarrow \infty$ for a multi-asset portfolio
• $\rho=-1$ for a two-asset portfolio

# Mean-Variance Frontier

In mean-variance space, the set of all possible portfolios of $n$ assets forms a convex set. The boundary of this set is known as the mean-variance frontier and forms a parabola.

The top half of the MV frontier is the set of efficient MV portfolios which maximize mean return given return variance.

Let us define $\boldsymbol{r}$ as the random variable vector of asset returns on $n$ assets:

A particular portfolio is defined by the weights assigned to various assets, and we denote the weights vector $\boldsymbol{w}$. The portfolio return $r_p$ is also a random variable, where:

Also,
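A reconstruction of the elided displays (standard notation, with $\boldsymbol{\mu}:=\mathbb{E}\boldsymbol{r}$):

$$r_p=\boldsymbol{w}'\boldsymbol{r},\qquad \mu_p:=\mathbb{E}r_p=\boldsymbol{w}'\boldsymbol{\mu},\qquad \sigma_p^2=Var(r_p)=\boldsymbol{w}'\boldsymbol{\Sigma w}$$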

## The GMV and Tangent Portfolios

The Global Minimum Variance (GMV) portfolio has the lowest return variance among all possible portfolios, characterized by the leftmost point on the MV frontier. It can be constructed with weight $\boldsymbol{w}_v$ which minimizes the objective function $\boldsymbol{w'\Sigma w}$ under the constraint of $\boldsymbol{w'1}=1$:
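A reconstruction of the elided solution (solving the Lagrangian first-order condition $2\boldsymbol{\Sigma w}=\lambda\boldsymbol{1}$ and normalizing so the weights sum to one):

$$\boldsymbol{w}_v=\frac{\boldsymbol{\Sigma}^{-1}\boldsymbol{1}}{\boldsymbol{1}'\boldsymbol{\Sigma}^{-1}\boldsymbol{1}}$$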

The Tangent portfolio is a portfolio $\boldsymbol{w}_t$ with the highest mean/variance ratio among all possible portfolios, characterized by the point that is tangent to the MV frontier and going through the origin:
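One standard closed form for the elided display (a reconstruction; conventions differ on whether raw or excess mean returns $\boldsymbol{\mu}$ are used):

$$\boldsymbol{w}_t=\frac{\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}{\boldsymbol{1}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}$$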

## The MV Portfolio

It turns out that any portfolio on the efficient MV frontier can be constructed as a linear combination of the GMV and tangent portfolios, by solving the following optimization:

Thus a portfolio $\boldsymbol{w}^{\ast}$ is a MV portfolio if and only if there exists $\delta$ such that:

Consider MV investors who focus only on the mean and variance of a portfolio; such investors will hold only MV portfolios (which are linear combinations of just two funds).

## Excess Return with Risk-Free Asset

Consider the existence of a risk-free asset with return $r_f$ that has zero variance and correlation with other assets. The mean excess return is defined as:

And the mean excess return of a portfolio with weight $\boldsymbol{w}$ is:

Since the risk-free asset has no variance, the return variance of the portfolio is still $\boldsymbol{w'\Sigma\ w}$

A MV portfolio with a risk-free asset is a vector $\boldsymbol{w}^{\ast}$ which solves the following optimization:

Note that the constraint that weights sum up to $1$ is now dropped, with the inclusion of the risk-free asset.

Thus a portfolio $\boldsymbol{w}^{\ast}$ with mean excess return $\tilde{\mu}_p$ is a MV portfolio with a risk-free asset if:

Where,
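A reconstruction of the elided displays: minimizing $\boldsymbol{w'\Sigma w}$ subject to $\boldsymbol{w}'\tilde{\boldsymbol{\mu}}=\tilde{\mu}_p$ gives

$$\boldsymbol{w}^{\ast}=\delta\,\boldsymbol{\Sigma}^{-1}\tilde{\boldsymbol{\mu}},\qquad \delta=\frac{\tilde{\mu}_p}{\tilde{\boldsymbol{\mu}}'\boldsymbol{\Sigma}^{-1}\tilde{\boldsymbol{\mu}}}$$

i.e. a scalar multiple of the tangency portfolio $\boldsymbol{w}_t\propto\boldsymbol{\Sigma}^{-1}\tilde{\boldsymbol{\mu}}$.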

This result shows that with a risk-free asset, any MV portfolio simply contains a position in the tangency portfolio $\boldsymbol{w}_t$ and a position in the risk-less asset (a.k.a. the Two Fund Separation).

Interesting facts w.r.t. the tangency portfolio $\boldsymbol{w}_t$:

• $\boldsymbol{w}_t$ is the unique portfolio that is on both the risky and risk-less MV frontiers
• $\boldsymbol{w}_t$ is the point on the risky MV frontier at which the tangency line goes through point (0, risk-free rate).

## Sharpe Ratio

We define the Sharpe ratio (SR) of a portfolio $\boldsymbol{w}$ as:
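A reconstruction of the elided definition (mean excess return per unit of return volatility):

$$SR(\boldsymbol{w}):=\frac{\tilde{\mu}_p}{\sigma_p}=\frac{\boldsymbol{w}'\tilde{\boldsymbol{\mu}}}{\sqrt{\boldsymbol{w'\Sigma w}}}$$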

Therefore, the tangency portfolio $\boldsymbol{w}_t$ is the portfolio on the risky MV frontier with the maximum Sharpe ratio.

On the risk-free efficient MV frontier (a.k.a. the Capital Market Line), all portfolios have the same SR as the tangency portfolio, since the frontier itself is a straight line.

## Sortino Ratio

The Sortino ratio improves upon the Sharpe ratio by penalizing the down-side volatility only.

# Linear Factor Model

Based on the First Fundamental Theorem of Asset Pricing, given no arbitrage there exists a risk neutral probability measure $\mathbb{P}$ and a change of measure $m$ (R-N derivative) such that for any tradable asset $i$,

Proposition $m$ is a linear function of $r_t$:

Such that all portfolio returns have a factor-beta representation w.r.t. the tangency portfolio,

By mathematical identity, this will hold in sample exactly.

Proof Consider the tangency portfolio from the risk-free MV frontier:

Therefore,

And since,

We can show that,

In addition, the covariance can be shown as,

Thus,

## Generalization

The factor-beta representation is not unique to the tangency portfolio. In fact, it holds for any arbitrary MV portfolio $\boldsymbol{w}^{\ast}$.

We will focus on the tangency portfolio, W.L.O.G.

## Practical Consideration

The factor-beta representation seems to provide a way to estimate the mean return of any given portfolio. However, it is difficult to calculate the tangency weight $\boldsymbol{w}_t$ in practice, due to either circularity (direct estimation) or imprecision (inverting $\boldsymbol{\Sigma}$).

The Linear Factor Model makes an assumption regarding the identity of the tangency portfolio, which avoids the issues stated above.

The LFM's assumed tangency portfolio is only used for the pricing of expected returns. However, additional assumptions can be made regarding investors' MV preferences, such that the assumed tangency portfolio is also used in actual asset allocation.

## CAPM

The most famous LFM is the Capital Asset Pricing Model, which assumes a value-weighted market portfolio of all available assets as the tangency portfolio.

The CAPM is a relative pricing formula, which states that the expected return of any asset can be expressed as the sum of the risk-free rate and a portion of the market risk premium. In other words, it says that the expected excess return/risk premium of an asset is proportional to the market risk premium. The $\beta$ factor is estimated by regression. CAPM also asserts that market beta is the only risk associated with higher average returns, and that volatility, skewness, and other covariances do not matter in determining the risk premium.

We can also re-write the formula as follows:

This shows that the Sharpe ratio earned on an asset depends only on the correlation between asset return and market returns.
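A reconstruction of the two elided displays (the CAPM equation and its Sharpe-ratio form, using $\beta_i=\rho_{i,M}\sigma_i/\sigma_M$):

$$\mathbb{E}r_i - r_f=\beta_i\,(\mathbb{E}r_M - r_f),\qquad \beta_i=\frac{Cov(r_i, r_M)}{Var(r_M)}$$

Dividing both sides by $\sigma_i$:

$$\frac{\mathbb{E}r_i - r_f}{\sigma_i}=\rho_{i,M}\cdot\frac{\mathbb{E}r_M - r_f}{\sigma_M}$$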

There are two ways to derive CAPM:

1. If we assume that returns are jointly normal, then the mean and variance are the sufficient statistics for the return distribution, and thus every investor holds a portfolio on the risk-less MV frontier, which is a combination of the tangency portfolio and the risk-free asset. Therefore aggregating across all investors, the market portfolio of all investments is the tangency portfolio.
2. If we do not assume jointly normal returns, but instead assume that investors only care about the mean and variance of returns, then all investors will again choose MV portfolios, and therefore CAPM holds.

## Fama-French Model

The Fama-French 3-factor model is a well-known multi-factor model:
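A reconstruction of the elided display (in the excess-return notation used below):

$$\mathbb{E}\tilde{r}_i=\beta_{i,M}\,\mathbb{E}\tilde{r}_M+\beta_{i,s}\,\mathbb{E}\tilde{r}_s+\beta_{i,v}\,\mathbb{E}\tilde{r}_v$$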

Where $\tilde{r}_M$ is the excess market return as in CAPM, $\tilde{r}_s$ is a portfolio that goes long small stocks and shorts large stocks, and $\tilde{r}_v$ is a portfolio that goes long value stocks (low market price per fundamental) and shorts growth stocks.

The FF model states that the premium is earned for having beta exposure to the value and size factors, NOT for being a value or small stock. In other words, the premium is earned on how a stock acts, not how it is classified.


# 📖 Option Theory

This is a study note on the fundamental theory of the pricing of a financial derivative, whose payoff is defined in terms of an underlying asset. We hereby try to compute a consistent price of the derivative in relative terms to the market price of the underlying asset.

# Option Pricing Theory

We make our first assumption that the market is frictionless, by which we mean that:

• can hold a negative quantity of an asset (shorting), and there is no margin constraint
• can hold fractional amounts of an asset
• no market impact from trading

## Arbitrage (Static Portfolio)

We assume that the market lives in a probability space $\mathbb{P}$ and it includes $N$ tradable assets with non-random time-$0$ prices and random time-$T$ prices:

A static portfolio is a vector of quantities, where each $\theta$ is non-random and constant in time:

Thus the time-$t$ value of the static portfolio $\Theta$ is:

A static portfolio $\Theta$ is an arbitrage if its value $V_t$ satisfies that:

Suppose portfolio $\Theta^a$ super-replicates portfolio $\Theta^b$, which means that $\mathbb{P}[V^a_T \geq V^b_T] = 1$. Then $V^a_0 \geq V^b_0$; otherwise an arbitrage exists. The same goes for sub-replication. Therefore, if $\Theta^a$ replicates $\Theta^b$, which means that $\mathbb{P}[V^a_T = V^b_T] = 1$, then $V^a_0 = V^b_0$. This is called the law of one price.

## Assets

### Discount Bond

A discount bond $Z$ pays $1$ at maturity $T$. Given non-random interest rate $r_t$, the no-arbitrage price of the discount bond is:
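A reconstruction of the elided formula (discounting at the non-random short rate):

$$Z_0=e^{-\int_0^T r_t\,dt}$$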

### Forward Contract

A forward contract on $S_T$ with non-random delivery price $K$ obligates its holder to pay $K$ and receive $S_T$ at time $T$. The time-$0$ value of the forward contract is $S_0 - KZ_0$.

The forward price $F_0$ is the delivery price such that the value of the forward contract at time $0$ is zero.
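Setting the forward contract's value to zero and solving for the delivery price gives:

$$S_0-F_0Z_0=0\;\Longrightarrow\;F_0=\frac{S_0}{Z_0}$$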

### European Call Option

A European call option gives its holder the right at time $T$ to pay $K$ and receive $S_T$. A call has payoff $(S_T-K)^+$, and it is in the money if $S_t>K$ at time $t\leq T$.

The time-$0$ price $C_0$ of a call option satisfies:

For strikes $K_1 < K_2$:
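A reconstruction of the standard no-arbitrage relations that presumably filled these displays:

$$(S_0-KZ_0)^+\;\leq\;C_0\;\leq\;S_0$$

and, for strikes $K_1<K_2$,

$$0\;\leq\;C_0(K_1)-C_0(K_2)\;\leq\;(K_2-K_1)Z_0$$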

### European Put Option

A European put option gives its holder the right at time $T$ to deliver $S_T$ and receive $K$. A put has payoff $(K-S_T)^+$, and it is in the money if $S_t<K$ at time $t\leq T$.

The time-$0$ price $P_0$ of a put option satisfies:

For strikes $K_1 < K_2$:
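A reconstruction of the standard no-arbitrage relations for the put:

$$(KZ_0-S_0)^+\;\leq\;P_0\;\leq\;KZ_0$$

and, for strikes $K_1<K_2$,

$$0\;\leq\;P_0(K_2)-P_0(K_1)\;\leq\;(K_2-K_1)Z_0$$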

# Binomial Tree

We can create a replicating portfolio to calculate the value of a call option under a simple binomial tree:

Where,

And,

Plugging in $\alpha$ and $\beta$:

We can interpret $p_u$ and $p_d$ as probabilities that construct a risk-neutral measure $\mathbb{P}$ such that:
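A minimal sketch of the one-period case (parameter values are assumptions; `u`, `d` are the up/down factors and `dt` the period length), showing that the replication price and the risk-neutral expectation $e^{-r\Delta t}(p_uC_u+p_dC_d)$ agree:

```python
import math

def one_period_call(S0, K, u, d, r, dt):
    # Payoffs at the up/down nodes
    Su, Sd = S0*u, S0*d
    Cu, Cd = max(Su - K, 0.0), max(Sd - K, 0.0)
    # Replicating portfolio: alpha shares of S plus beta in the riskless bond
    alpha = (Cu - Cd) / (Su - Sd)
    beta = math.exp(-r*dt) * (Cu - alpha*Su)
    price_replication = alpha*S0 + beta
    # Risk-neutral probability p_u and the discounted expectation
    pu = (math.exp(r*dt) - d) / (u - d)
    price_risk_neutral = math.exp(-r*dt) * (pu*Cu + (1 - pu)*Cd)
    return price_replication, price_risk_neutral

rep, rn = one_period_call(100.0, 100.0, 1.2, 0.8, 0.05, 1.0)
print(rep, rn)   # the two prices coincide
```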

# The Fundamental Theorem

The fundamental theorem of asset pricing states that:

no arbitrage

if and only if:

there exists a probability measure $\mathbb{P}$ equivalent to P such that the discounted prices of all tradable assets are martingales w.r.t. $\mathbb{P}$

The proof can be summarized as two ideas:

• $\exists\ M.G. \mathbb{P} \rightarrow \text{no arb}$:

A martingale is the cumulative P&L from betting on zero-expectation games, which remains zero in expectation no matter how you vary your bet size across games and time; you cannot risklessly make something from nothing.

• $\text{no arb} \rightarrow \exists\ M.G. \mathbb{P}$:

the $\mathbb{P}$ probability of an event is simply the price of an asset that pays 1 unit of $B$ iff that event happens

## Risk-Neutral Measure

The physical probability is not accurate in evaluating a payoff's true market price. Consider a 50/50 coin flip worth $1M$ or nothing: using the physical probability, the price would be $500K$.

However, the actual market price would be different. If the market is risk-averse, the price would be lower, say $300K$. We can view this market as representing a risk-neutral measure in which the down move has a higher risk-neutral probability than the up move.

We can see that risk-neutral probability is price: the risk-neutral probability of an event is the price of a one-unit payout contingent on that event. Taking a risk-neutral expectation is the same as pricing by replication.

In a discrete setting with outcomes $\{\omega_1,\dots,\omega_n\}$, the relationship between the risk-neutral measure and the physical measure $P$ can be expressed by the Radon-Nikodym derivative, or likelihood ratio:

The LR is typically larger in bad states than in good states, reflecting the premium charged for adverse events.

## The Second Fundamental Theorem

A market is said to be complete if every random variable $Y_T$ can be replicated by a static portfolio $\Theta$.

The second fundamental theorem of asset pricing states that:

a no arbitrage market is complete

if and only if:

there exists a unique measure $\mathbb{P}$ equivalent to P such that the discounted prices of all tradable assets are martingales w.r.t. $\mathbb{P}$

A filtration $\{\mathcal{F}_t\}$ represents all information revealed at or before time $t$. A stochastic process $X$ is adapted to $\{\mathcal{F}_t\}$ if $X_t$ is $\mathcal{F}_t$-measurable for each $t$, meaning that the value of $X_t$ is determined by the information in $\mathcal{F}_t$.

A trading strategy is a sequence of static portfolios $\Theta_t$ adapted to $\mathcal{F}_t$. A trading strategy is self-financing if for all $t>0$:

This implies that the change in the portfolio value is fully attributable to gains and losses in asset prices:

Therefore,

We say that a trading strategy $\Theta$ replicates a time-$T$ payoff $X_T$ if it is self-financing and its value satisfies $V_T=X_T$. By the law of one price, at any time $t$ the no-arbitrage price of an asset paying $X_T$ must equal the value of the replicating portfolio.

We now expand the previous definition of arbitrage: an arbitrage is a self-financing trading strategy $\Theta_t$ whose value $V_t$ satisfies:

# Ito Process

We define an Ito process to be a stochastic process $X$ that:

The existence and uniqueness of a solution $X$ can be guaranteed by Lipschitz-type technical conditions on $\mu_t$ and $\sigma_t$.

## Ito’s Rule

Ito's rule states that given an Ito process $X_t$ and a sufficiently smooth function $f(X_t)$:

With two processes $X_t$ and $Y_t$, and $f(X_t, Y_t)$:

In a special case where $Y_t = t$, the formula becomes:
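A reconstruction of the elided special case, for $dX_t=\mu_t\,dt+\sigma_t\,dW_t$:

$$df=\left(f_t+\mu_t f_x+\tfrac{1}{2}\sigma_t^2 f_{xx}\right)dt+\sigma_t f_x\,dW_t$$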

Note that Ito's rule applies under any probability measure; it is purely mathematical.

## Black-Scholes Model

Assumptions Consider two basic assets $B_t$ and $S_t$ in continuous time, where:

And $S_t$ follows GBM dynamics,

Conclusion Then by no-arbitrage and Ito's rule, the time-$t$ price $C_t$ of a call option with payoff $(S_T-K)^{+}$ satisfies the Black-Scholes PDE for $(S, t)\in [0, \infty)\times (0, T)$:

We can solve the call price analytically with the Black-Scholes formula:
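A reconstruction of the elided formula (with $\Phi$ the standard Normal CDF and $\tau:=T-t$):

$$C^{BS}=S_t\Phi(d_1)-Ke^{-r\tau}\Phi(d_2),\qquad d_{1,2}=\frac{\ln(S_t/K)+(r\pm\sigma^2/2)\tau}{\sigma\sqrt{\tau}}$$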

Here we plot the BS call price $C^{BS}$, the intrinsic value $(S_t - K)^{+}$, and the lower bound $(S_t - Ke^{-r(T-t)})^{+}$ against the current underlying price $S_t$, with parameters $K=100$, $T-t=1$, $\sigma=0.2$ and $r=0.05$.

# The Greeks

## Delta

Suppose an asset has a time t value $V_t(S_t, t)$, then its Delta at time $t$ is $\partial V_t(S_t, t)/\partial S_t$. Delta can be interpreted as:

• the slope of the asset value $V_t$, plotted as a function of $S_t$
• how much the asset value moves per unit move in $S_t$
• the number of units of $S_t$ needed to replicate this asset

If the asset is a call option on $S_t$ and we adopt the Black-Scholes assumptions on $S_t$, then:
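A reconstruction of the elided formula (in the notation of the B-S formula):

$$\Delta_{call}^{BS}=\frac{\partial C^{BS}}{\partial S_t}=\Phi(d_1)$$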

The Delta of a call option is strictly between 0 and 1. As the time-to-maturity decreases, the Delta curve steepens, rising toward $1$ as the option becomes more ITM. Here we plot the BS Delta for $T-t$ equal to $1$ and $0.25$ against the current underlying price $S_t$.

## Gamma

For a call option in a B-S model,

In this case, the Gamma can be interpreted as:

• the convexity of $C^{BS}$ w.r.t. $S_t$
• how much the Delta moves, per unit move in $S_t$
• how much rebalancing of the replicating portfolio is needed, per unit move in $S_t$

The Gamma of a call option is strictly positive. As the time-to-maturity decreases, the Gamma increases for ATM options. Here we plot the BS Gamma for $T-t$ equal to $1$ and $0.25$ against the current underlying price $S_t$.

## Theta

For a call in B-S model,

The Theta of a call option is strictly negative. As the time-to-maturity decreases, the Theta decreases for ATM options (faster time-decay). Here we plotted the BS Theta for $T-t$ equals $1$ and $0.25$ against the current underlying price $S_t$.

## Discrete Delta Hedge and Gamma Scalping

A discretely Delta-hedged portfolio could buy $C$ and short $\text{Delta} \cdot S$. In this case it is a Delta neutral and long Gamma/Gamma scalping portfolio:

• Delta of the portfolio is $0$
• Gamma of the portfolio is positive
• achieves a net profit only if the realized volatility of $S$ is high enough to overcome time decay; otherwise the portfolio loses money. This is the opposite of a short Gamma position, e.g. selling $C$ and going long $\text{Delta} \cdot S$

We can visualize the P&L of a long Gamma portfolio in the following graph, where the green areas indicate profits and the red areas indicate losses. The curved line is $C_{t+\Delta t}$ and the straight line is $\text{Delta}_t \cdot S_{t+\Delta t}$. As $\Delta t$ increases, $C_{t+\Delta t}$ shifts downward due to time-decay.

In addition, we can show that the P&L of such a portfolio, $dV = dC - C_S dS$, does not depend on the drift $\mu$ of the stock:
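A reconstruction of the elided step: applying Ito's rule to $C(S_t, t)$,

$$dV=dC-C_S\,dS=\left(C_t+\tfrac{1}{2}\sigma^2S^2C_{SS}\right)dt$$

The drift $\mu$ enters $dC$ only through the $C_S\,dS$ term, which the hedge removes.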


# Numerical Methods

The Taylor series of a real- or complex-valued function $f(x)$ that is infinitely differentiable at $a$ is:
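A reconstruction of the elided series:

$$f(x)=\sum_{n=0}^{\infty}\frac{f^{(n)}(a)}{n!}(x-a)^n$$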

# Implied Volatility

Given the time-$t$ price of a European call option on a non-dividend stock $S$, the time-$t$ Black Scholes implied volatility $\sigma(t)$ is the unique solution to $C_t = C^{BS}(\sigma(t))$.

Uniqueness holds because $C^{BS}$ is strictly increasing in $\sigma$, and existence holds because $C^{BS}$ covers the full range of arbitrage-free prices of the European option, $[S_0-Ke^{-rT}, S_0]$.
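Monotonicity makes bisection a natural root-finder for implied volatility. A minimal sketch (parameter values are assumptions; `bs_call` implements the B-S call formula using `math.erf` for the Normal CDF):

```python
import math

def bs_call(S, K, T, sigma, r):
    # Black-Scholes European call price (no dividends)
    d1 = (math.log(S/K) + (r + 0.5*sigma*sigma)*T) / (sigma*math.sqrt(T))
    d2 = d1 - sigma*math.sqrt(T)
    N = lambda x: 0.5*(1.0 + math.erf(x/math.sqrt(2.0)))
    return S*N(d1) - K*math.exp(-r*T)*N(d2)

def implied_vol(C, S, K, T, r, lo=1e-6, hi=5.0):
    # Bisection is valid because C_BS is strictly increasing in sigma
    for _ in range(100):
        mid = 0.5*(lo + hi)
        if bs_call(S, K, T, mid, r) < C:
            lo = mid
        else:
            hi = mid
    return 0.5*(lo + hi)

C = bs_call(100.0, 100.0, 1.0, 0.2, 0.05)
print(implied_vol(C, 100.0, 100.0, 1.0, 0.05))   # recovers sigma = 0.2
```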

If $S_t$ follows the SDE dynamics $dS_t = rS_tdt + \sigma(t)S_tdW_t$, where $\sigma(t)$ is a non-random function of $t$, then we can first find the implied volatility $\bar{\sigma}_T$ from call prices with different maturities $T$, and use the equation below to recover (not uniquely) the true function $\sigma(t)$:
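A reconstruction of the elided equation (the total-variance relation):

$$\bar{\sigma}_T^2\,T=\int_0^T\sigma(t)^2\,dt$$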

## Volatility Smile, Skew and Surface

If $S_t$ truly follows GBM with constant volatility $\sigma$, then $\sigma_{imp}(K, T) = \sigma \;\forall\;K, T$. However, empirically $\sigma_{imp}$ is lower when $K \approx S_t$ (volatility smile), possibly because

• the market prices options using a risk-neutral distribution of log-returns with fatter tails than Normal

Note that $\sigma_{imp}$ is also higher when $K < S_t$ (volatility skew), possibly due to:

• instantaneous volatility increases as price decreases
• possibility of severe crash fuels demand for downside protection

In addition, $\sigma_{imp}$ has a term structure and varies with $T$. The function $\sigma_{imp}(K, T)$ is called the implied volatility surface.

# Tree Model

## Binomial Tree

### European Option

Given option price at the $j$-th node $C_T^j = f(S_T^j)$, we can induct backward to find $C_t^j$:

### American Option - Put

Given option price at the $j$-th node $C_T^j = (K - S_T^j)^+$, we can induct backward to find $C_t^j$:

### American Option - Call

Given option price at the $j$-th node $C_T^j = (S_T^j - K)^+$. If $r>0$ and stock dividend $\delta=0$, then it is never optimal to exercise early on an American call option. Therefore $C^{American} = C^{European}$

Argument 1 At all $t>0$, the American call is worth more than the exercise payoff $S_t - K$:

Argument 2 If $C^{American} > C^{European}$ then construct portfolio $V = [-C^{American}, C^{European}]$. Then V is an arbitrage as $V_0 > 0$ and $V_T \geq 0$.

## Trinomial Tree

Let $\Delta t:= T/N$ and choose $\Delta x \approx \sigma\sqrt{3\Delta t}$ to improve accuracy.

# Finite Difference Model

## Explicit Scheme

Inducting backward from $t=T$ to $0$:

Solving for the B-S PDE: $rC = C_t + vC_S + 0.5\sigma^2C_{SS}$ where $v = (r-\sigma^2/2)$, we get:

Where:

Note that $p_u, p_m, p_d$ are trinomial tree probabilities.

## Implicit Scheme

Inducting backward from $t=T$ to $0$:

Solving for the $LHS$ requires solving a system of $2J-1$ equations in $2J-1$ unknowns.

## Crank-Nicolson Scheme

Inducting backward from $t=T$ to $0$:

If given terminal conditions, then we know the $C_{t+1}$'s and can solve for $C_t$.

# Monte Carlo Model

Let $Y$ be a discounted payoff and let $C=\mathbb{E}Y$ be the time-$0$ price of the payoff. The Monte Carlo estimator $\hat{C}_M$ of $C$ is:

By the strong law of large numbers, the sample average $\hat{C}_M$ converges almost surely to the expected value $C$ as $M \rightarrow \infty$. By the central limit theorem:

Often we need to estimate $\sigma$ with the sample estimator of the variance of $Y$:

The standard error is $SE = \hat{\sigma}_M/\sqrt{M}$, and a $95\%$ confidence interval for $C$ is $\hat{C}_M \pm 1.96\,SE$.
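A minimal sketch of the estimator and its confidence interval (assumed risk-neutral GBM dynamics and assumed parameter values; the B-S price for these inputs is about $10.45$):

```python
import math, random

def mc_call(S0, K, T, sigma, r, M, seed=42):
    # Plain Monte Carlo for a European call under risk-neutral GBM
    rng = random.Random(seed)
    disc = math.exp(-r*T)
    total = total_sq = 0.0
    for _ in range(M):
        z = rng.gauss(0.0, 1.0)
        ST = S0*math.exp((r - 0.5*sigma*sigma)*T + sigma*math.sqrt(T)*z)
        y = disc*max(ST - K, 0.0)
        total += y
        total_sq += y*y
    mean = total/M
    var = (total_sq - M*mean*mean) / (M - 1)   # sample variance of Y
    se = math.sqrt(var/M)                      # standard error
    return mean, se

mean, se = mc_call(100.0, 100.0, 1.0, 0.2, 0.05, 100_000)
print(f"C_hat = {mean:.4f} +/- {1.96*se:.4f} (95% CI)")
```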

## Variance Reduction Techniques

### Antithetic Variate

Let $\tilde{Y}$ denote the payoff recomputed with each underlying normal draw $Z = z$ replaced by $-z$. The antithetic variate estimator $\hat{C}^{av}_M$ is:
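A reconstruction of the elided estimator:

$$\hat{C}^{av}_M=\frac{1}{M}\sum_{i=1}^{M}\frac{Y_i+\tilde{Y}_i}{2}$$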

### Control Variate

A control variate $Y^\ast$ is a random variable, correlated to $Y$ such that $C^\ast := \mathbb{E}Y^\ast$ has an explicit formula.

Example Let $Y$ be the discounted payoff on a call on $S_t$, where $dS_t = \sigma(t)S_tdW_t$. We can choose $Y^\ast$ to be the discounted payoff on a call on $S_t^{\ast}$, where $dS_t^{\ast} = \sigma S^{\ast}_tdW_t$, in which case $C^{\ast}$ can be calculated explicitly through the B-S formula given a constant $\sigma$ close to $\sigma(t)$.

The control variate estimator $\hat{C}^{cv, \beta}_M$ estimates $C$ by simulating $Y - \beta Y^{\ast}$.

Choosing $\beta$ to minimize $Var(\hat{C}^{cv}_M)$, we get:

Note that when using the sample estimate $\hat{\beta}^{\ast}$, the estimator $\hat{C}^{cv, \hat{\beta}^{\ast}}_M$ is biased, though the bias matters only when $M$ is small.

### Importance Sampling

Suppose the $X_i$ are IID draws from density $f$, and $C := \mathbb{E}h(X)$. The ordinary Monte Carlo estimator is:

With importance sampling, find a density $g$ such that $g(x) > 0$ whenever $f(x)h(x) \neq 0$. Then draw $X$ from density $g$; the importance sampling estimator $\hat{C}^{is}_M$ is:
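A reconstruction of the elided estimator:

$$\hat{C}^{is}_M=\frac{1}{M}\sum_{i=1}^{M}h(X_i)\frac{f(X_i)}{g(X_i)},\qquad X_i\sim g$$

which is unbiased since $\mathbb{E}_g\left[h(X)\frac{f(X)}{g(X)}\right]=\int h(x)f(x)dx=C$.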

### Conditional Monte Carlo

Given a random variable $X$:

The conditional Monte Carlo estimator:

# Fourier Transform Model

Let $f:\mathbb{R}\rightarrow\mathbb{R}$ be integrable, meaning $\int|f(x)|dx < \infty$. The Fourier transform of $f$ is the function $\hat{f}: \mathbb{R}\rightarrow\mathbb{C}$ defined by:

Theorem If $\hat{f}$ is also integrable, then the inversion formula holds:

## Characteristic Function

The complex conjugate of a complex number $z = x + yi$ is $\bar{z} = x - yi$, so $\text{Re}(z) = \text{Re}(\bar{z})$.

The characteristic function of any random variable $X$ is the function $F_X(z)$ defined by:

Therefore if $X$ has density $f$, then $F_X(z) = \hat{f}(z)$. A characteristic function uniquely identifies a distribution. For example, $F_X(z) = e^{-z^2/2}$, if $X\sim\mathcal{N}(0,1)$

• To calculate the moments of $X$ using the CF, take the $n$-th derivative of $F_X(z)$ w.r.t. $z$:
• To calculate the CDF of $X$ using CF:
• To calculate asset-or-nothing call price using CF, given $e^X$ be the asset share price, define the share measure $\mathbb{P}^{\ast}$ with likelihood ratio $e^X/\mathbb{E}e^X$.

Therefore for any $k\in\mathbb{R}$, the asset-or-nothing call price:

• To calculate a vanilla European call price on $e^X$ struck at $K$, with $k := \log K$:

## Heston Model

Provided that:

Where $W^S$ and $W^V$ are $\mathbb{P}$-Brownian motions with correlation $\rho$, $\kappa$ is the rate of mean-reversion, $\theta$ is the long-term mean, and $\eta$ is the volatility of volatility.

We want to find the CF of $X$ in order to price options on $S_T$. The time-$t$ conditional Heston CF provides an answer:

# Discrete Time Martingales

## Conditional expectation

Definition A Borel set is any set in a topological space that can be formed from open sets through the operations of:

• complement
• countable union
• countable intersection

Definition Let $Y$ be a random vector and $X$ an integrable random variable with $\mathbb{E}|X|<\infty$. The conditional expectation of $X$ given $Y$ is the unique measurable function $f(Y)$ such that for every Borel set $\mathcal{B}$:

We denote $f(Y)$ as $\mathbb{E}(X|Y)$

Example 1 Suppose random variables $X$ and $Y$ are discrete.

Example 2 Suppose random variables $X$ and $Y$ are continuous, with joint probability density function $f_{X, Y}(x, y)$ and marginal densities $f_X(x)$ and $f_Y(y)$.
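A reconstruction of the elided formula for the continuous case:

$$\mathbb{E}[X|Y=y]=\int x\,\frac{f_{X,Y}(x,y)}{f_Y(y)}\,dx$$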

Here are some basic properties of conditional expectation:

• Linearity: $\mathbb{E}[aX + bY|\mathcal{F}] = a\mathbb{E}[X|\mathcal{F}] + b\mathbb{E}[Y|\mathcal{F}]$
• Constant: if $X = a$, then $\mathbb{E}[X|\mathcal{F}] = a$
• Independence: if $X$ is independent of $\mathcal{F}$, then $\mathbb{E}[X|\mathcal{F}] = \mathbb{E}X$
• Tower Property: if $\mathcal{G} \subset \mathcal{F}$ then $\mathbb{E}[ \mathbb{E}[X | \mathcal{F}]|\mathcal{G}] = \mathbb{E}[X | \mathcal{G}]$
• Factorization Property: if Z is $\mathcal{F}$-measurable then $\mathbb{E}[ZX|\mathcal{F}] = Z\mathbb{E}[X|\mathcal{F}]$
• Monotonicity: if $X \leq Y$, then $\mathbb{E}[X|\mathcal{F}] \leq \mathbb{E}[Y|\mathcal{F}]$ a.s.

## $L^2$ Theory

Definition A $\boldsymbol{\sigma}$-algebra is a collection $\Sigma$ of subsets of a set $\mathcal{B}$, that is closed under:

• complement, i.e. if $A \in \Sigma$, then $\mathcal{B}\backslash A \in \Sigma$
• countable unions, i.e. if $A_n \in \Sigma$, then $\cup A_n \in \Sigma$

Definition $L^2(\Omega, \mathcal{F}, \mathbb{P})$ is the set of all $\mathcal{F}$-measurable square-integrable random variables $X$, i.e. those with finite second moment $\mathbb{E}X^2$.

Definition A real Hilbert space is a real vector space $\mathcal{H}$ with an inner product $<,>$, such that $\mathcal{H}$ is a complete metric space w.r.t. the metric $d$, where:

Hilbert space examples: $\mathbb{R}^n$ with inner product $<\textbf{x}, \textbf{y}> = \sum x_iy_i$, or $L^2(\Omega, \mathcal{F}, \mathbb{P})$ with inner product $<X_1, X_2> = \mathbb{E}[X_1X_2]$. The reason we are interested in $L^2$, rather than $L^p$ for other $p$, is that the inner product $\mathbb{E}[X_1X_2]$ gives rise to orthogonality.

Proposition If $X \in L^2(\Omega, \mathcal{F}, \mathbb{P})$, then for any $\sigma$-algebra $\mathcal{G} \subset \mathcal{F}$, the conditional expectation $\mathbb{E}[X|\mathcal{G}]$ is the orthogonal projection of $X$ onto $L^2(\Omega, \mathcal{G}, \mathbb{P})$, such that:

Also, $\mathbb{E}[X|\mathcal{G}]$ can be interpreted as a $\mathcal{G}$-measurable random variable that minimizes the mean square error $\mathbb{E}[(X - \mathbb{E}[X|\mathcal{G}])^2]$.

## Martingales

Definition A filtration is an increasing sequence of $\sigma$-algebras $\mathcal{F}_n \subset \mathcal{F}$, where $\mathcal{F}$ is the $\sigma$-algebra of all events.

Definition A martingale is a sequence of integrable random variables $X_{n}$, adapted to $\{\mathcal{F}_n\}$, such that:

The tower property implies that $\mathbb{E}X_n = \mathbb{E}X_0$.

Example 1 Given I.I.D. random variables $X_n \in L^2$ with $\mathbb{E}X_n = 0$ and variance $\sigma^2$,

• the sequence $S_n = \sum_{i=1}^n X_i$, and
• the sequence $T_n = (\sum_{i=1}^n X_i)^2 - n\sigma^2$

are both martingales.

Example 2 Let $X$ be any $L^1$ random variable and $\mathcal{F}_n$ be any filtration. Then the sequence $X_n := \mathbb{E}[X|\mathcal{F}_n]$ is a closed martingale.

Note that the St. Petersburg martingale $X_n$ is not closed, where $X_0 \in \mathbb{R}$ and $P(X_{n}=2X_{n-1}) = 1/2$ and $P(X_{n}=0) = 1/2$. This is because $X \notin L^{1}$.

Example 3 Given I.I.D. random variables $X_n$ with moment generating function $M_{X}(\theta) = \mathbb{E}e^{\theta X}$, the exponential martingale $Z_{n}$ is a positive martingale with definition:

## Doob’s Identity

Definition A sequence $Z_n$ of random variables is predictable with respect to filtration $\mathcal{F}_n$ if $Z_n$ is measurable with respect to $\mathcal{F}_{n-1}$

Definition A sequence $Z_n$ of random variables is adapted to filtration $\mathcal{F}_n$ if $Z_n$ is measurable with respect to $\mathcal{F}_{n}$

Proposition If $X_n$ is a martingale with $X_0 = 0$ and $Z_n$ is a predictable sequence of bounded random variables, then the martingale transform $\{Z \cdot X\}_n$ is a martingale:

Definition A stopping time with respect to filtration $\mathcal{F}$ is a random variable $T \in \mathbb{N} \cup \{\infty\}$ such that $\{T = n\} \in \mathcal{F}_{n} \;\forall\; n \geq 0$

Lemma Let $T$ be a stopping time, then the sequence $Z_n := \textbf{1}_{T \geq n}$ is predictable.

Theorem Let $X_{n}$ be a martingale and $T$ be a stopping time. For all $m \in \mathbb{N}$, the Doob’s Identity states that $\mathbb{E}X_{T\wedge m} = \mathbb{E}X_{0}$. Note that if $|X_{T\wedge m}|$ is bounded for all $m$, DCT shows that $\mathbb{E}X_{T} = \mathbb{E}X_{0}$.

Proof. $X_{T\wedge n}$ is a martingale:

Theorem Let $f_n$ be a sequence of functions on a measure space $(\mathcal{S}, \Sigma, \mu)$ that converges point-wise to a function $f$. For $\lim_{n \rightarrow \infty} \int_{\mathcal{S}} f_{n}d\mu = \int_{\mathcal{S}} f d\mu$ to hold,

• The Dominated Convergence Theorem (DCT) requires $f_{n}$ to be dominated by an integrable function $g$: $|f_{n}(x)| \leq g(x)$

• The Monotone Convergence Theorem (MCT) requires $f_{n}$ to be monotone (increasing or decreasing): $f_{1} \leq f_{2} \leq f_{3} \dots$ or $f_{1} \geq f_{2} \geq f_{3} \dots$

Example 1 Let $S_{n} = \sum X_{i}$ be a simple random walk with $X_{i} = \pm 1$. Let stopping time $T := \min\{n: S_{n} = +A \text{ or } S_{n} = -B\}$, where $A, B>0$.

We know that $S_{n}$ is a martingale and $|S_{T\wedge n}| \leq \max(A, B)$. Applying Doob's Identity and the DCT we have:

We know that $S_{n}^2 - n$ is a martingale. Applying Doob's Identity we have $\mathbb{E}S_{T\wedge n}^2 = \mathbb{E}(T\wedge n)$. Since $S_{T\wedge n}^2$ is bounded by $\max(A^2, B^2)$ and $T\wedge n$ is monotone, applying the DCT on the LHS and the MCT on the RHS we get:

Combining both results we obtain some interesting results for the Gambler’s Ruin problem:
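A reconstruction of the elided results: writing $p:=\mathbb{P}[S_T=A]$, the two identities $\mathbb{E}S_T=0$ and $\mathbb{E}S_T^2=\mathbb{E}T$ give

$$Ap-B(1-p)=0\;\Longrightarrow\;\mathbb{P}[S_T=A]=\frac{B}{A+B},\qquad \mathbb{E}T=A^2p+B^2(1-p)=AB$$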

Example 2 Let $S_{n}$ be a simple random walk. Let stopping time $T := \min\{n: S_{n} = +A\}$, where $A>0$. Note that now the DCT fails, as $S_{T\wedge n}$ is not bounded. Hence $\mathbb{E}S_{T} \neq 0$.

In fact, $\mathbb{E}S_{T} = A$ because $S_{T} \equiv A$:

## Doob’s Maximal Inequality

Definition An adapted sequence of random variables $X_n$ is a:

• sub-martingale if $\mathbb{E}[X_n | \mathcal{F}_{n-1}] \geq X_{n-1}$
• super-martingale if $\mathbb{E}[X_n | \mathcal{F}_{n-1}] \leq X_{n-1}$

Proposition If $\varphi :\mathbb{R} \rightarrow \mathbb{R}$ is a convex function and $X_n$ is a martingale, then:

• The Jensen’s Inequality holds: $\varphi(\mathbb{E}X) \leq \mathbb{E}\varphi(X)$
• the sequence $\varphi(X_n)$ is a sub-martingale.

Proposition If $X_n$ is a martingale with $X_0 = 0$ and $Z_n$ is a predictable sequence of bounded, non-negative random variables, then the martingale transform $\{Z \cdot X\}_n$ is a sub-martingale:

Proposition If $X_n$ is a martingale with $X_0 = 0$ and $Z_n$ is a predictable sequence of random variables such that $Z_n \in [0, 1]$, then $\mathbb{E}\{Z \cdot X\}_n \leq \mathbb{E}X_n$

Corollary If $X_n$ is a non-negative sub-martingale with initial term $X_0 = 0$, then Doob's Maximal Inequality claims that for any $\alpha > 0$:

and that:

Note that this is a big improvement on Chebyshev's Inequality, which claims that for an $L^2$-bounded random variable $X$ and any $k\in\mathbb{R}^+$:

## Martingale Convergence Theorem

Definition A sequence $x_i$ of real numbers is called a Cauchy sequence if for every positive real number $\epsilon$, there is a positive integer $N$ such that for all natural numbers $m, n > N$, $|x_m - x_n| \leq \epsilon$

Definition $L^2$ martingales have orthogonal increments. Given an $L^2$ martingale $X_n$ with increments $\xi_n := X_n - X_{n-1}$ and $X_0 = 0$, then:

• $\mathbb{E}\xi_n\xi_{m} = 0$, $\forall \;n < m$, and
• $\mathbb{E}X_n^2 = \sum_{i=1}^n \mathbb{E}\xi_i^2$

Theorem Suppose $X_n$ is an $L^1$-bounded martingale; then there exists an $L^1$-bounded random variable $X_{\infty}$ such that:

Theorem Suppose $X_n$ is an $L^2$-bounded martingale; then there exists an $L^2$-bounded random variable $X_{\infty}$ such that:

(1) $\lim_{n \rightarrow \infty} X_n = X_{\infty} \;\text{a.s.}$
(2) $\lim_{n \rightarrow \infty} \mathbb{E}|X_n - X_{\infty}|^2 = 0$, and $\lim_{n \rightarrow \infty} \mathbb{E}X_n^2 = \mathbb{E}X_{\infty}^2$

## Change Of Measure

Proposition Given a probability measure $P$ and a non-negative random variable $Z$ satisfying $\mathbb{E}_{P}Z = 1$, there exists a probability measure $Q$ such that for any bounded or non-negative random variable $Y$, $\mathbb{E}_QY = \mathbb{E}_PYZ$. $Z$ is called the likelihood ratio of probability measure $Q$ w.r.t. $P$, written as $Z = dQ/dP$, and:

Proposition If the outcome space $\Omega$ is finite, then for each outcome $\omega \in \Omega$, $Q(\omega) = P(\omega)Z(\omega)$

Example 1 In an $N$-period market with a finite set of outcomes and tradable assets, let $P, Q$ denote the risk-neutral measures for USD and EUR investors. Let $S_t^i, \tilde{S}_t^i$ denote the USD and EUR prices of the (risk-less w.r.t. its own measure) asset $B^i$ at time $t$. Then $dP/dQ = S^1_0/S^1_N$

Proof. By the fundamental theorem, $\tilde{S}_t^i = \mathbb{E}_Q[\tilde{S}_N^i \mid \mathcal{F}_t]$, and $\tilde{S}_t^i=S_t^i/S_t^1$, so:

Theorem Let $P$ and $Q$ be two probability measures on the same measurable space, and let $\mathcal{F}_n$ be a filtration such that for all $n$, $Q$ is absolutely continuous w.r.t. $P$ on $\mathcal{F}_n$. Then the sequence of likelihood ratios $L_n$ is a martingale:

# Brownian Motion

## Standard Brownian Motion

Definition A standard Brownian motion (SBM) is a continuous-time random process $B_t$ such that $B_0 = 0$ and:
(a) $B_t$ has stationary increments.
(b) $B_t$ has independent increments.
(c) The sample paths $t \rightarrow B_t$ are continuous.

Note that (a), (b), and (c) imply that for some constant $\sigma^2>0$ the distribution of $B_{t+s}-B_s$ is $\mathcal{N}(0, \sigma^2t)$

Definition Given a SBM $B_t$, $W_t = \mu t + \sigma B_t$ is a Brownian motion with drift $\mu$ and variance $\sigma^2$.

Proposition Given a SBM $B_t$, its reflection $-B_t$ is also a SBM.

Proposition Given a SBM $B_t$, for any $\alpha \in\mathbb{R}^+$, $\tilde{B}_t := B_{\alpha t}/\sqrt{\alpha}$ is a SBM

Definition The nth level quadratic variation of a function $f: [0,t] \rightarrow \mathbb{R}$ is the sum of squares of the increments across intervals of length $2^{-n}$:

Theorem Given a SBM $W_t$ with drift $\mu$ and variance $\sigma^2 > 0$, then for all $t>0$ with probability $1$:

## Strong Markov Property

Definition Given a SBM $B_t$, a stopping time is a non-negative random variable $T$ such that for every fixed $t \geq 0$, the event $\{T \leq t\}$ depends only on the path $\{B_s\}_{s\leq t}$

Theorem If $B_t$ is a Brownian motion and $T$ is a stopping time, then the strong Markov property holds:
(a) the process $\{B_{t+T} - B_T\}_{t\geq 0}$ is a Brownian motion, and
(b) the process $\{B_{t+T} - B_T\}_{t\geq 0}$ is independent of the path $\{B_s\}_{s\leq T}$

Theorem Run a Brownian motion $W_t$; at the first time $\tau$ that $W_{\tau} = a > 0$, reflect the path in the line $y=a$. By the reflection principle, the new process $W^{\ast}_t$ is another Brownian motion:

• for $t \leq \tau$, $W^{\ast}_t = W_t$
• for $t > \tau$, $W^{\ast}_t = 2a - W_t$

Corollary $P[\tau \leq s] = 2P[W_s > a]$

Corollary $M_t := \max_{s \leq t} W_s$ has the same distribution as $|W_t|$

Corollary $-M_t^- := -\min_{s \leq t} W_s$ has the same distribution as $M_t$. Hence $P[M_t>a]=P[M_t^-<-a]=2P[W_t>a]>0$. Consequently, for every $t>0$, with probability 1, $M_t>0$ and $M_t^-<0$. Therefore for every $\epsilon>0$, the Brownian path crosses the $t$-axis infinitely many times by time $\epsilon$
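The first corollary can be checked numerically. The sketch below (an illustration, not part of the original notes) simulates discretized Brownian paths and compares the hitting frequency $P[\tau \leq t]$ with $2P[W_t > a]$; the discrete-time maximum slightly undercounts crossings, so a small downward bias in the first estimate is expected.

```python
import numpy as np

def reflection_check(a=1.0, t=1.0, n=20_000, steps=500, seed=0):
    # Monte Carlo check of the corollary P[tau_a <= t] = 2 P[W_t > a].
    rng = np.random.default_rng(seed)
    dW = rng.normal(0.0, np.sqrt(t / steps), size=(n, steps))
    paths = np.cumsum(dW, axis=1)
    p_hit = np.mean(paths.max(axis=1) >= a)   # estimates P[tau_a <= t]
    p_tail = 2 * np.mean(paths[:, -1] > a)    # estimates 2 P[W_t > a]
    return float(p_hit), float(p_tail)
```

For $a = t = 1$ the exact value is $2(1-\Phi(1)) \approx 0.317$; both estimates should land near it.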

## Martingales In Continuous Times

Definition A filtration is a nested family of $\sigma$-algebras indexed by time $t$.

Definition The natural filtration for a Brownian motion $W_t$ is the filtration with $\mathcal{F}_t$ the collection of all events determined by the Brownian path up to time $t$.

Definition A continuous-time stochastic process $X_t$ is a martingale relative to a filtration $\{\mathcal{F}_t\}_{t\geq 0}$ if:
(a) each random variable $X_t$ is measurable w.r.t. $\mathcal{F}_t$, and
(b) for any $s, t\geq 0$, $\mathbb{E}(X_{t+s}|\mathcal{F}_t)=X_t$

Proposition Given a SBM $B_t$ then each of these is a martingale relative to the natural filtration:
(a) $B_t$
(b) $B_t^2 - t$
(c) $e^{\theta B_t - \theta^2t/2}$

Theorem Define $P_{\theta}$ to be the probability measure with likelihood ratio $Z_t^{\theta} = dP_{\theta}/dP_0= e^{\theta B_t - \theta^2t/2}$. The Cameron-Martin theorem states that a process $B_t$ that is a SBM under $P_0$ is, under $P_{\theta}$, a Brownian motion with drift $\theta$ and variance 1.

Corollary For any real value $\theta, \eta$ and $t <\infty$

Corollary For any stopping time $\tau$ and $T <\infty$,

# Ito Calculus

## Ito Integral

Definition If $X_t$ is a uniformly bounded process with continuous paths $t \rightarrow X_t$ adapted to $\mathcal{F}_t$, then we can define the Ito Integral $I_t(X)$, where $X^{(n)}$ is $X$ truncated at $\pm n$:

Property The Ito Integral satisfies the following properties:
(1) Linearity: $\int (aX_s +bY_s)dW_s = a\int X_sdW_s + b\int Y_sdW_s$.
(2) Continuity: the paths $t \rightarrow \int_0^t X_sdW_s$ are continuous.
(3) Mean Zero: $\mathbb{E} \int_0^t X_sdW_s = 0$
(4) Variance, a.k.a. the Ito Isometry:

Definition Define the quadratic variation of the Ito Integral:

Proposition
(a) The process $I_t(X)$ is a martingale
(b) The process $I_t(X)^2 - [I_t(X), I_t(X)]$ is a martingale

Example $\int_0^T W_sdW_s = (W_T^2 - T)/2$
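This identity can be verified numerically: the left-endpoint (Ito) Riemann sums of $\int_0^T W_s dW_s$ agree with $(W_T^2 - T)/2$ path by path, up to discretization error. A minimal sketch (illustrative; the step and path counts are arbitrary choices):

```python
import numpy as np

def ito_vs_closed_form(T=1.0, steps=2000, n=2000, seed=0):
    # Compare left-endpoint Riemann sums of int_0^T W dW with (W_T^2 - T)/2.
    rng = np.random.default_rng(seed)
    dW = rng.normal(0.0, np.sqrt(T / steps), size=(n, steps))
    W = np.cumsum(dW, axis=1)
    W_left = np.hstack([np.zeros((n, 1)), W[:, :-1]])  # W at left endpoints
    ito_sum = np.sum(W_left * dW, axis=1)              # Ito Riemann sums
    closed = (W[:, -1] ** 2 - T) / 2                   # closed form
    return float(np.mean(np.abs(ito_sum - closed)))
```

The mean discrepancy shrinks like the error in $\sum (\Delta W_i)^2 \rightarrow T$ as the partition is refined.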

Example For any stopping time $\tau$ and any $t < \infty$:

Theorem Let $W_t$ be a SBM and let $\mathcal{F}_t$ be the $\sigma$−algebra of all events determined by the path $\{W_s\}_{s\leq t}$. If $Y_t$ is any random variable with mean 0 and finite variance that is measurable with respect to $\mathcal{F}_t$ , for some $t > 0$, then the Ito representation theorem claims that $\exists$ adapted process $A_s$ such that:

This theorem is of importance in finance because it implies that in the Black-Scholes setting, every contingent claim can be hedged.

## Ito Formula

Theorem Let $W_t$ be a SBM, and let $f: \mathbb{R} \rightarrow \mathbb{R}$ be a twice-continuously differentiable function such that $f, f', f''$ are all bounded (or at most have exponential growth). Then for any $t > 0$:

Theorem Let $W_t$ be a SBM, and let $U: [0, \infty) \times \mathbb{R} \rightarrow \mathbb{R}$ be a twice-continuously differentiable function whose partial derivatives are all bounded. Then for any $t > 0$:

Proposition Assume $f(t)$ is nonrandom and continuously differentiable. Then:

## Ito Process

Definition An Ito process is a stochastic process $X_t$ that satisfies a stochastic differential equation of the form:

Equivalently, $X_t$ satisfies the stochastic integral equation:

Definition For any adapted process $U_t$ define:

Theorem Let $X_t$ be an Ito process, and let $U$ be a twice-continuously differentiable function whose partial derivatives are all bounded. Then:

### The Ornstein-Uhlenbeck Process

Definition The Ornstein-Uhlenbeck SDE: $dX_t = −\alpha X_t dt + dW_t$
(a) This SDE describes a process $X_t$ that has a proportional tendency to return to an "equilibrium" position 0.
(b) In finance, the OU process is often called the Vasicek model.
(c) Solving the SDE: $X_t = e^{-\alpha t}X_0 + e^{-\alpha t} \int_0^t e^{\alpha s}dW_s$
(d) The Ornstein-Uhlenbeck process is Gaussian.
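A minimal Euler-Maruyama simulation of the OU SDE (an illustrative sketch; the parameters and step size are arbitrary choices). For large $t$ the process should settle into its stationary distribution $\mathcal{N}(0, 1/(2\alpha))$:

```python
import numpy as np

def simulate_ou(alpha=2.0, x0=1.0, T=5.0, steps=5000, n=2000, seed=0):
    # Euler-Maruyama discretization of dX_t = -alpha X_t dt + dW_t.
    rng = np.random.default_rng(seed)
    dt = T / steps
    x = np.full(n, float(x0))
    for _ in range(steps):
        x = x - alpha * x * dt + rng.normal(0.0, np.sqrt(dt), size=n)
    return x  # n samples of X_T
```

With $\alpha = 2$ the stationary variance is $1/(2\alpha) = 0.25$, and the initial condition is forgotten at rate $e^{-\alpha t}$.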

### The Exponential Martingale

Definition The Exponential Martingale SDE: $dX_t = \theta X_t dW_t$
(a) Solving the SDE: $X_t = Ce^{\theta W_t - \theta^2t/2}$

### The Diffusion Process

Definition The Diffusion SDE: $dX_t = \mu(X_t)dt+ \sigma (X_t)dW_t$

Definition The Harmonic Function is a function $f(x)$ that satisfies the ODE:

Example Let $X_t$ be a solution of the diffusion SDE with initial value $X_0 = x_0$, and for any real numbers $A \leq x_0 \leq B$ let $\tau := min\{t: X_t \notin (A, B)\}$. Find $P(X_{\tau} = B)$

We first apply the Ito Formula to $df(X_t)$ and observe that a harmonic function $f$ will force the $dt$ term to vanish. Therefore $f(X_t)$ is a martingale and that $\mathbb{E}f(X_{\tau}) = f(x_0):$

We can solve for $f(x)$:

### The Diffusion Process - Bessel Process

Definition The Diffusion SDE: $dX_t = (a/X_t)dt+ dW_t$

Example Similar problem as above:

Note that if $x_0 > 0$ and $a \geq 1/2$ then $X_t$ will never reach $0$.

## Ito Formula - Multi-Variable

Theorem Let $\textbf{W}_t =(W_t^1,W_t^2,...,W_t^K)$ be a K−dimensional SBM, and let $u: \mathbb{R}^K \rightarrow \mathbb{R}$ be a $C^2$ function with bounded first and second partial derivatives. Then the Ito Formula states:

Where:

Corollary If $\tau$ is a stopping time for the SBM $\textbf{W}_t$ then Dynkin’s Formula shows that for any fixed time $t$:

And that $u(\textbf{W}_t) - \dfrac{1}{2} \int_0^{t} \triangle u(\textbf{W}_s)ds$ is a martingale

Definition A $C^2$ function $u: \mathbb{R}^K \rightarrow \mathbb{R}$ is said to be a Harmonic Function in a region $\mathcal{U}$ if $\triangle u(x)=0, \; \forall x \in \mathcal{U}$

(a) 2D Harmonic Function Example: $u(x,y)=\log(x^2 +y^2)=2\log r$
(b) 3D Harmonic Function Example: $u(x,y,z)=1/\sqrt{x^2 +y^2 +z^2} =1/r$

Corollary Let $u$ be harmonic in an open region $\mathcal{U}$ with compact closure, and assume that $u$ and its partials extend continuously to the boundary $\partial\mathcal{U}$. Define $\tau$ to be the first exit time of Brownian motion from $\mathcal{U}$; then:

(a) the process $u(W_{t\wedge\tau})$ is a martingale, and
(b) for every $x \in \mathcal{U}$, $\;\mathbb{E}^xu(W_{\tau}) = u(x)$

Example If a 2D SBM starts at a point on the circle $C_1$ of radius 1, find the probability $p$ that it hits the concentric circle $C_2$ (radius 2) before $C_{1/2}$ (radius 1/2).

Let $u(x, y) = \log r$, which is harmonic. Then $u(W_{t\wedge\tau})$ is a martingale and $\mathbb{E} u(W_{t\wedge\tau}) = u(W_0) = \log(1) = 0$.
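Evaluating the expectation over the two exit possibilities (radius 2 with probability $p$, radius $1/2$ otherwise):

$$0 = \mathbb{E}u(W_{\tau}) = p\log 2 + (1-p)\log\tfrac{1}{2} = (2p-1)\log 2 \;\Longrightarrow\; p = \tfrac{1}{2}$$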

Example If a 3D SBM starts at a point on the sphere $C_1$ of radius 1, find the probability $p$ that it hits the concentric sphere $C_2$ (radius 2) before $C_{1/2}$ (radius 1/2).

Let $u(x, y, z) = 1/r$, which is harmonic. Then $u(W_{t\wedge\tau})$ is a martingale and $\mathbb{E} u(W_{t\wedge\tau}) = u(W_0) = 1/1 = 1$.
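Evaluating the expectation over the two exit spheres (radius 2 with probability $p$, radius $1/2$ otherwise):

$$1 = \mathbb{E}u(W_{\tau}) = \frac{p}{2} + 2(1-p) = 2 - \frac{3p}{2} \;\Longrightarrow\; p = \tfrac{2}{3}$$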

## Ito Process - Multi-Variable

Definition An Ito process is a continuous-time stochastic process $X_t$ of the form:

where the quadratic variation $d[X_t, X_t] = \textbf{N}_t \cdot \textbf{N}_t dt$

Let $\textbf{X}_t = (X^1_t,X^2_t,...,X^m_t)$ be a vector of Ito processes. For any $C^2$ function $u:\mathbb{R}^m \rightarrow \mathbb{R}$ with bounded first and second partial derivatives:

Theorem Let $\textbf{W}_t$ be a $K$-dimensional SBM, and let $\textbf{U}_t$ be an adapted, $K$-dimensional process satisfying $|\textbf{U}_t|=1, \;\;\forall t \geq 0$. Then Knight's Theorem states that the 1-dimensional Ito process $X_t$ is a SBM:

Proposition Let $\textbf{W}_t$ be a $K$-dimensional SBM and define $R_t := |\textbf{W}_t|$, the radial part of $\textbf{W}_t$. Then $R_t$ is a Bessel process with parameter $(K-1)$:

# Barrier Option

## Pricing

Definition A barrier option at time $T$ pays:
(a) $\$1$ if $\max_{0 \leq t \leq T}\;S_t \geq AS_0$,
(b) $\$0$ otherwise.

Assume that $S_t$ follows GBM:

The no-arbitrage price $V_0$ of the barrier option at $t=0$ is the expected payoff:

At time $t$, there are two possibilities:
(a) if $\max_{0 \leq r \leq t}\;S_r \geq AS_0$, then $V_t = e^{-r(T-t)}$
(b) if $\max_{0 \leq r \leq t}\;S_r < AS_0$, then $V_t$ equals the time-$0$ value $V_0$ of a barrier option with time-to-maturity $T-t$ and barrier multiple $A'=AS_0/S_t$

## Hedging

Let $v(t, S_t)$ be the value of the barrier option at time $t$. The Fundamental Theorem and the Ito Formula show that $v(t, S_t)$ satisfies the Black-Scholes PDE:

A replicating portfolio for the barrier option holds
(a) $v_S$ share of stock
(b) $e^{-rt}(v - v_SS)$ units of cash

provided that $S_t\leq AS_0$. Once $S_t\geq AS_0$, the portfolio converts all holdings to cash and holds until maturity.

# The Black-Scholes

## The Black-Scholes Formula

Theorem Under a risk-neutral $P$, the Fundamental Theorem asserts that discounted share price $S_t/M_t$ is a martingale, where:

Therefore $\mu_t \equiv r_t$:

Definition A European contingent claim with expiration date $T > 0$ and payoff function $f: \mathbb{R}\rightarrow\mathbb{R}$ is a tradeable asset with:
(a) share price at time $T$: $f(S_T)$
(b) discounted share price at time $t < T$: $\mathbb{E}[f(S_T)/M_T | \mathcal{F}_t]$

Proposition Let $W_t$ be a standard Brownian motion and $g:\mathbb{R}\rightarrow\mathbb{R}$ is a function such that $\mathbb{E}|g(W_T)| < \infty$. Then for every $0 \leq t \leq T$:

Corollary Given $dS_t = r_t S_tdt + \sigma S_tdW_t$, the Black Scholes Formula shows:

Under the risk-neutral $P$, the discounted time-$t$ option price $u(t,S_t)/M_t$ is a martingale. With the Ito Formula we can set the drift of $du$ to zero and thereby derive the Black-Scholes PDE:

## Hedging In Continuous Times

Definition A portfolio $V_t = \alpha_t M_t + \beta_t S_t$ is self-financing if $dV_t = \alpha_t dM_t + \beta_t dS_t$ for all $t \leq T$

Proposition A portfolio $V_t$ is self-financing if and only if its discounted value $V_t/M_t$ is a martingale and satisfies:

Definition A replicating portfolio $V_t$ for a payoff function $f(S_T)$ is a self-financing portfolio such that $V_T = f(S_T)$

Theorem A replicating portfolio for contingent claims $f(S_T)$ is given by:
(a) $\alpha_t = (u - u_SS_t)/M_t$ cash, and
(b) $\beta_t = u_S$ shares of stock

where u is the solution of the Black Scholes PDE satisfying $u(T, S_T) = f(S_T)$

# The Girsanov Theorem

Proposition The exponential process $Z_t$ is a positive martingale.

Applying the Ito Formula gives $Z_t = 1 + \int_0^t Z_sY_sdW_s$, and therefore $\mathbb{E}Z_t = 1$

Theorem Given a SBM $W_t$ under the $P$-measure and the likelihood ratio $Z_t$, define the $Q$-measure by $dQ/dP = Z_t$. Then Girsanov's Theorem states that under the $Q$-measure:
(a) $\tilde{W}_t = W_t - \int_0^t Y_sds$ is a SBM
(b) $W_t$ is a BM with time-dependent drift $\int_0^t Y_sds$

Example 1 Given $W_t$ a Brownian motion with $W_0 \in (0, A)$, define measure $Q$ to be the conditional probability measure on the event $\{W_T = A\}$. Then under $Q$, $W_t$ is a BM with drift $W_t^{-1}dt$.

Proof. We know that $\mathbb{P}[W_T = A] = W_0/A$; therefore by change of measure:

Therefore Girsanov’s Theorem implies that under $Q$, $\tilde{W}_t = W_t - \int_0^{T\wedge t} W_s^{-1}ds$ is a SBM.

Example 2 Given currencies $A, B$ with respective bank accounts $dA_t = r^A_tA_tdt$ and $dB_t = r^B_tB_tdt$, define the exchange rate (# of B per A) $Y_t$ with $dY_t = \mu_t Y_tdt + \sigma Y_tdW_t$

Theorem If $W_t$ is a SBM under measure $Q^B$ then $\mu_t = r^B_t - r^A_t$.

Proof. $Y_t(A_t/B_t)$ is a martingale under $Q^B$ only if $\mu_t = r^B_t - r^A_t$

Theorem

# Levy Process

## Poisson Process

Definition A Levy process is a continuous-time random process $\{X_t\}_{t\geq 0}$ such that $X_0 = 0$ and:
(a) $X_t$ has stationary increments;
(b) $X_t$ has independent increments;
(c) the sample paths $t \rightarrow X_t$ are right-continuous.

Note that Brownian motion and Poisson process are both Levy processes and the basic building blocks of Levy processes. Brownian motion is the only Levy process with continuous paths.

Example Let $W_t$ be a SBM and for $a \geq 0$ let $\tau_a$ be the first time $W_t$ hits level $a$. Then $\{\tau_a\}_{a\geq 0}$, viewed as a process indexed by $a$, is a Levy process.

Note that:
(a) $\tau_a$ has stationary, independent increments
(b) $\tau_{ab}$ has the same distribution as $b^2\tau_a$

Definition A Poisson process with rate $\lambda > 0$ is a Levy process $N_t$ such that for all $t \geq 0$ the random variable $N_t$ follows Poisson distribution with mean $\lambda t$:

Proposition If $X, Y$ are independent Poisson random variables with means $\lambda, \mu$, then $X+Y \sim Poisson(\lambda + \mu)$.

Proof. $P(X+Y=n) = \sum_{m=0}^nP(X=m \;\;\text{and}\;\; Y=n-m)$

Corollary If $N_t, M_t$ are independent Poisson processes with rates $\lambda, \mu$, then the superposition $N_t + M_t$ is a Poisson process with rate $\lambda + \mu$

Proposition Every discontinuity of a Poisson process is of size $1$

Proposition Let $N_t$ be a Poisson process of rate $\lambda > 0$, and let $\xi_i$ be an independent sequence of i.i.d. Bernoulli−$p$ random variables. Then the Thinning Theorem states that $N^S_t, N^F_t$ are independent Poisson processes with rates $\lambda p, \lambda (1-p)$:

Theorem If $n \rightarrow \infty$ and $p_n \rightarrow 0$ in such a way that $np_n \rightarrow \lambda > 0$, then the Law of Small Numbers states that the $Binomial(n, p_n)$ distribution converges to the $Poisson(\lambda)$ distribution.
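A quick numerical illustration of the Law of Small Numbers (a sketch; both pmfs are computed exactly from their definitions):

```python
import math

def binom_pmf(n, p, k):
    # exact Binomial(n, p) probability mass at k
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(lam, k):
    # exact Poisson(lambda) probability mass at k
    return math.exp(-lam) * lam ** k / math.factorial(k)

# With n large and p = lambda/n, Binomial(n, p) is close to Poisson(lambda),
# e.g. n = 10_000, p = 2/n versus Poisson(2).
```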

Proposition If $N_t$ is a rate-$\lambda$ Poisson process, then for any real number $\theta$ the process $Z_t :=e^{\theta N_t + (\lambda - \lambda e^{\theta})t}$ is a martingale.

Theorem Define $Q$ with likelihood ratio $Z_t$ such that $dQ/dP | \mathcal{F}_t = Z_t$. Then under $Q$ the process $N_t$ is a rate-$\lambda e^{\theta}$ Poisson process.

## Compound Poisson Process

Definition A compound Poisson process $X_t$ is a Levy process of the form:

where $N_t$ is a rate-$\lambda$ Poisson process and $Y_i$ are i.i.d. random variables independent of $N_t$. The distribution $F_{Y}$ is the compounding distribution and the measure $\lambda \times F_{Y}$ is the Levy measure.

At each jump time $T_i$ of $N_t$, a random $Y_i$ is drawn from $F_{Y}$; $X_t$ is the sum of all draws made by time $t$

Proposition If $\psi(\theta) = \mathbb{E}e^{\theta Y_i} < \infty$, then $\mathbb{E} e^{\theta X_t} = e^{-t\lambda (1- \psi(\theta))}$, and for $\theta \in \mathbb{R}$, $Z_t^{\theta} = e^{\theta X_t - \lambda t(\psi(\theta)-1)}$ is an exponential martingale.

## Poisson Point Process

Definition Let $\mu$ be a $\sigma$−finite Borel measure on $\mathbb{R}^n$. A Poisson point process $\mathcal{P}$ with intensity measure $\mu$ is a collection $\{N_B\}_{B\in\mathcal{B}}$ of extended nonnegative integer-valued random variables such that
(A) If $\mu(B) = \infty$ then $N_B = \infty$ a.s.
(B) If $\mu(B) < \infty$ then $N_B \sim Poisson(\mu(B))$
(C) If $\{B_i\}_{i\in\mathbb{N}}$ are pairwise disjoint, then the r.v.s $N_{B_i}$ are independent, and $N_{\cup_i B_i} = \sum_{i=1}^{\infty} N_{B_i}$

Proposition The point process $(T_n, Y_n)$ associated with a CPP is a Poisson point process with intensity measure $Lebesgue \times v$, where $v=\lambda F$ is the Levy measure for the CPP.

Theorem Let $X_t$ be any Levy process, and let $J$ be the random set of points $(t,y) \in [0,\infty) \times \mathbb{R}$ such that the Levy process $X$ has a jump discontinuity of size $y$ at time $t$, i.e.,

Then $J$ is a Poisson point process with intensity measure $Lebesgue \times v$ where $v$ is a $\sigma$−finite measure called the Levy measure of the process.

# Standard Simulation Model on Credit Portfolio

## Credit Risk

Lenders, such as banks, are subject to many kinds of risk, among which credit risk is the most likely to cause bank failure:

• Credit risk
• Market risk
• Operation risk
• Reputation risk

Each loan is part of a legal agreement that requires the borrower to pay interest and repay principal on schedule, while some borrowers are also required to obey specified covenants, such as maintaining earnings above a certain threshold.

If the borrower fails to follow the agreement, the lender holds the borrower to be in default, which can be a money default or a covenant default. Purchasers of public bonds only experience money default.

At default, the loan agreement calls for fees to be paid by the borrower, gives the bank the power to seize collateral (for secured loans), and has a cross-default provision (all loans are in default once one loan is in default).

For much of the 20th century, most banks did not precisely define default, until models emerged that could help them manage credit risk.

## Rating Agencies

There are 3 major Nationally Recognized Statistical Rating Organizations (NRSROs), which firms pay to rate their bonds to increase liquidity.

• Standard & Poor's
• Moody’s
• Fitch

Under S&P ratings, the grades are:

• Investment grade: AAA, AA, A, BBB
• Non-investment grade: BB, B, CCC, CC
• Selectively defaulted: SD
• Defaulted: D

## D and PD

Let D be the default indicator of a loan, taking only two values: 0 and 1. PD is the annual probability of default.

By mathematical identity:

• Knowing PD, we can simulate D as a Bernoulli random variable with parameter PD.
• Given data on D, we can calculate the implied PD.

In a portfolio of N firms, the portfolio default rate, DR, equals:

## Exposure, Recovery and LGD

Exposure is the amount owed by the borrower. Recovery is measured in either of two ways:

• Market price of the loan at the time of default
• Discounted future cash flows back to the time of default

LGD (Loss Given Default) is a random variable with values usually between 0 and 1:

For a defaulted loan, there are two ways to measure recovery/LGD. For a current loan, there is a distribution for LGD. The expectation is written as:

US investment grade bond LGD is about 0.20%, while non-investment grade is about 3.60%. Bank loans are almost always senior to bonds and have lower LGD.

## Loss and EL

Loss is measured as a fraction of exposure:

EL is the expected loss. Because D and LGD are independent:

Lenders often need to estimate EL and include it in the spread they charge.

## Change Of Variable

Note that LGD is often measured as a fraction. To change the measure to a dollar amount, we use the chain rule.

Given the pdf of LGD:

We define the function g such that:

Hence the function g-inverse is:

The partial derivative can be expressed as:

By definition:

Taking derivatives on both sides and applying the chain rule:

Finally:

## Simulate Portfolio Loss On One Single Loan

We know that:

To simulate loss, we first simulate D:

Then simulate LGD based on the pdf of LGD. Multiply each D and LGD to get Loss. Repeat the process to produce a distribution of Loss.
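The procedure above can be sketched as follows. The Uniform(0, 1) LGD draw is an illustrative placeholder, not the notes' actual LGD pdf:

```python
import random

def simulate_losses(pd=0.02, n=100_000, seed=0):
    # Loss = D * LGD for one loan, repeated n times.
    # LGD ~ Uniform(0, 1) here is only a placeholder distribution.
    rng = random.Random(seed)
    losses = []
    for _ in range(n):
        d = 1 if rng.random() < pd else 0   # D ~ Bernoulli(PD)
        lgd = rng.random()                  # placeholder LGD draw
        losses.append(d * lgd)
    return losses
```

With this placeholder, EL = PD x E[LGD] = 0.02 x 0.5 = 0.01 per dollar of exposure.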

## Simulate Portfolio Loss On N Independent Loan

Assume the defaults of each of the N loans are independent with the same probability of default, PD:

Then the total number of defaults follows a binomial distribution:

However, based on historical data, the variance of default rates is much higher than that of the binomial distribution. Hence default correlation needs to be introduced.

## Simulate Portfolio Loss On N Correlated Loan

Assume that there is a latent unobserved variable zi that is responsible for the default of firm i, i.e. firm i defaults if:

Assume the latent variables of any two firms i and j are jointly normal. Denote the correlation between zi and zj:

Let ri,j be the correlation between the asset returns of firms i and j; we know that almost certainly:

Denote PDJ as the probability that both firm i and j default:

To calculate PDJ with python:
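The notes' original code is not shown here; the following Monte Carlo sketch computes the same quantity under the stated latent-variable model (the `joint_default_prob` helper and its parameters are illustrative assumptions):

```python
import math, random
from statistics import NormalDist

def joint_default_prob(pd_i, pd_j, rho, n=200_000, seed=0):
    # PDJ = P[z_i <= Phi^-1(PD_i) and z_j <= Phi^-1(PD_j)] for jointly
    # normal latent variables with correlation rho, estimated by simulation.
    thr_i = NormalDist().inv_cdf(pd_i)   # default threshold for firm i
    thr_j = NormalDist().inv_cdf(pd_j)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        zi = rng.gauss(0.0, 1.0)
        # construct zj so that corr(zi, zj) = rho
        zj = rho * zi + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0)
        hits += (zi <= thr_i) and (zj <= thr_j)
    return hits / n
```

With rho = 0 the estimate should be close to PDi x PDj, and it rises as rho rises.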


Now that we have the Di, we can simulate portfolio loss rate, given the LGD distribution and exposures for each firm.

Let Dcorr denote the correlation between Di and Dj:

Note that holding PDi, PDj fixed:

• greater Dcorr => greater PDJ
• greater ρ => greater PDJ
• ρ between -1 and 1 => PDJ between 0 and min[PDi, PDj]

## Copula

When we model more than three firms, pair-wise correlation is not enough to determine the entire distribution of outcomes. For example, there are N PD's and N(N-1)/2 pair-wise correlations, while we want to determine 2^N outcomes. Hence we introduce the Gauss copula, which describes the group-wise correlations.

Consider a set of multivariate normals:

The quantiles of the set are uniformly distributed by definition:

The copula of the set (Z1, Z2, …, ZN) is defined as the joint cumulative distribution function of (Φ(Z1), Φ(Z2), …, Φ(ZN)):

The Gauss copula is as follows. Note that among all possible copulas, the Gauss copula is the one defined and supported by the Central Limit Theorem:

In fact, the copula does not contain any information about the marginal distributions. Here we set the marginal distribution FZ to standard normal only as an example, but it can be any continuous distribution such that:

And so:

In the context of default modeling, we assume that each company's default follows a Bernoulli distribution and simulate it with a standard normal latent variable:

The probability that all firms default at the same time is by definition:

Note that given a pair-wise correlation matrix Σ, this probability can take any value between 0 and the lowest single-firm default probability.

Now we assume all firms' latent variables z are connected by the Gauss copula, which pins down a single value for the probability of all firms defaulting.

With python we can either numerically evaluate the integral or use simulation to calculate the probability that all firms default at the same time.
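A simulation sketch for the special case of a single common factor with equal pairwise correlation ρ (an assumption made here for brevity; the notes' original code, which may handle a general correlation matrix, is not shown):

```python
import math, random
from statistics import NormalDist

def all_default_prob(pds, rho, n=200_000, seed=1):
    # One-factor Gauss-copula Monte Carlo: z_i = sqrt(rho) Z + sqrt(1-rho) X_i,
    # so every pair of latent variables has correlation rho.
    thresholds = [NormalDist().inv_cdf(pd) for pd in pds]
    rng = random.Random(seed)
    s, t = math.sqrt(rho), math.sqrt(1 - rho)
    hits = 0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # common systematic draw
        if all(s * z + t * rng.gauss(0.0, 1.0) <= thr for thr in thresholds):
            hits += 1
    return hits / n  # P[all firms default]
```

With rho = 0 the firms are independent and the estimate is the product of the PDs; as rho rises, joint defaults become more likely.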


Note that compared to other copulas, the Gauss copula requires only a pair-wise correlation matrix and the PDs to convey a lot of information. In most cases the Gauss copula itself has not been shown to be invalid; rather, the calibration of the marginals and the correlation matrix is often the source of error.

## Simulate Rating Transitions

The default model only has two states, 0 and 1:

To simulate rating transitions, we require two matrices:

• Transition Matrix: $P[i \rightarrow j], \forall i, j$
• Cost Matrix, e.g. the loss due to deterioration of borrowers: $cost[i \rightarrow j], \forall i, j$

# Factor Model

## Single Factor Model

We construct the single risk factor model with latent variable Zi:

The pair-wise correlation between the latent variables of two firms i and j is:

Where:

• Z and Xi are independent
• Z is the systematic factor that affects all firms. If Z increases, all Zi decrease and every firm becomes more likely to default. Z summarizes the effects of all observable macroeconomic factors plus the effects of unobservable factors.
• Xi is the idiosyncratic factor that affects only firm i's latent variable
• Zi ~ N(0, 1) by construction
• {Zi} are jointly normal and connected by a Gauss copula

## cDR and Vasicek

Define Conditional (Expected) Default Rate (cDR) as:

This gives the final form of cDR, known as the Vasicek formula, named after Oldrich Vasicek. Note that the Vasicek formula is monotonic in z and in PD, i.e., the higher z or PD, the higher the cDR.
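In code, the Vasicek formula can be sketched as follows (assuming the sign convention in which larger z means worse conditions; the notes' own sign convention may differ, which only flips z):

```python
import math
from statistics import NormalDist

N = NormalDist()

def vasicek_cdr(pd, rho, z):
    # Conditional default rate given the systematic factor z:
    # cDR = Phi((Phi^-1(PD) + sqrt(rho) z) / sqrt(1 - rho))
    return N.cdf((N.inv_cdf(pd) + math.sqrt(rho) * z) / math.sqrt(1 - rho))
```

Averaging cDR over the standard normal distribution of z recovers PD, matching the statement below that the expected default rate is always PD.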

The expected default rate for firm i is always PDi, since:

However, when Z is known, the expected default rate is cDRi. Firms are uncorrelated conditional on Z:

If there is a large number of identical firms with uniform PD and ρ, the default rate of such an asymptotic portfolio follows the unconditional Vasicek distribution.

The unconditional Vasicek pdf can be derived with the change-of-variable technique. Note that we eliminate z, and the pdf has only the parameters PD and ρ:

The mean of cDR is PD:

## Multi-factor Model

Suppose that there are two jointly normal systematic risk factors ψ and ω, and that there are two groups of firms, each depending on one of the factors:

Between the two groups:

Note that:

• If corr[ψ, ω] = 1, this reduces to the single factor model and:
• If corr[ψ, ω] < 1, the cross-correlations are less than in the single factor case. This is diversification.
• With a multi-factor model, risk becomes sub-additive, as opposed to additive in the single factor model. This means that the risk in the portfolio is less than the sum of the cDRs.
• The Moody's Factor Model attributes each Zi to about 250 factors, along with a firm-specific idiosyncratic factor.

## Basel II Capital formula

The Bank For International Settlements is in Basel, Switzerland. The Basel Committee on Bank Supervision drafted legislation requiring banks to hold minimum capital, e.g. Basel II, Basel III, etc.

The Basel II formula is an Asymptotic Single Risk Factor model, where the portfolio is large enough for the Law of Large Numbers to work; it generalizes the Vasicek distribution and allows a diverse choice of PD and ρ within the portfolio. The core of the capital requirement for credit capital is the inverse CDF of the Vasicek distribution.

Inverse Vasicek (with parameter PD and ρ):

Note:

• K is the capital requirement per dollar of wholesale loan.
• LGD is the average LGD in historical downturn conditions
• R (correlation) = 0.12 + 0.12 x exp(-50 x PD)
• b = [ 0.11852 - 0.05478 x log (PD) ]²
• M is maturity

Making sense of the Basel II formula:

• Capital requirement is for loss, as opposed to default alone, hence the formula multiplies by LGD.
• Capital requirement is for unexpected loss, hence the formula subtracts the expected loss LGD x PD. The expected portion is handled by bank reserves.
• Loans might deteriorate without defaulting, hence a maturity adjustment is added to impose higher capital for longer-maturity loans.
• The estimation of PD and LGD is performed by the banks and supervised by bank supervisor.

# Estimation, Statistical Test and Overfit

## Estimating PD

Firms differ widely in their credit quality, and PD tends to change over time as well, so a firm's PD is neither known nor fixed. We analyze analogous firms with identical credit ratings to estimate PD.

Method 1, for all A-rated firms in the dataset:

Method 2, for all A-rated firms in the dataset:

Method 3, estimate PD as a parameter in a pdf describing A-rated firms. This tries to find a distribution that best fits the data. We will focus on this method.

## Method Of Moments

Given a dataset {Xi} of N observations, we set the moments of the Vasicek distribution equal to the moments of the data.

First moment:

Second moment (unbiased, using N-1 in denominator):

Note:

• The method of moments matches the broad features of the distribution with the data
• The solution is not unique. Choices can be made between central moments/raw moments and lower moments/higher moments.
• By Jensen's Inequality, functions of moments are not moments of functions

## Maximum Likelihood Estimation

The MLE method chooses parameter values that make the data most likely under the assumed distribution. MLE matches the distribution to the data as a whole, as opposed to M.o.M., which only matches the moments. The MLE fits the pdf to the dataset.

When the data is not highly dispersed, however, the MLE estimate tends to be close to the M.o.M. estimate.

MLE is a biased estimator that chooses the parameters maximizing the likelihood function. Given a dataset {Xi} of N observations, we assume the true default rates follow the Vasicek distribution. The likelihood function is:

Often we maximize the log-likelihood function instead, i.e., find PD and ρ such that:
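A sketch of the Vasicek log-likelihood. The density below is the standard large-portfolio limit obtained by the change-of-variable technique mentioned earlier; treat the exact form as an assumption of this sketch:

```python
import math
from statistics import NormalDist

N = NormalDist()

def vasicek_pdf(x, pd, rho):
    # Large-portfolio (Vasicek) default-rate density in x, with parameters PD, rho;
    # obtained by differentiating F(x) = Phi((sqrt(1-rho) Phi^-1(x) - Phi^-1(PD)) / sqrt(rho)).
    u = N.inv_cdf(x)
    w = (math.sqrt(1 - rho) * u - N.inv_cdf(pd)) / math.sqrt(rho)
    return math.sqrt((1 - rho) / rho) * math.exp(0.5 * u * u - 0.5 * w * w)

def log_likelihood(data, pd, rho):
    # The objective the MLE maximizes over (PD, rho)
    return sum(math.log(vasicek_pdf(x, pd, rho)) for x in data)
```

Maximizing `log_likelihood` over a grid or with a numerical optimizer gives the MLE of PD and ρ.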

## Hypothesis Testing & Wilks’ Theorem

We do not assert truth, as truth is often unknown. With a given set of data, we can only assert that some models are better at predicting the future behavior of similar data.

We call the simpler model the null hypothesis and the more complicated one the alternative hypothesis. The null generally nests under the alternative, i.e., the alternative becomes the null when some parameters are set to certain values.

We prefer the null because it is simpler, and by doing so we avoid Type 1 error, the rejection of a true null.

Hence we only reject the null if the alternative fits the data significantly better through a statistical test.

Wilks' Theorem asserts that if:

• There is an asymptotically large amount of data
• The null hypothesis is true

then, given a dataset $\{X_i\}_{i=1}^N$, the statistic D has a distribution that approaches the $\chi^2$ distribution (with df = the number of extra parameters in the alternative):

The likelihood ratio is defined as follows. It is less than or equal to 1 because the alternative is more flexible and therefore assigns at least as much probability density to the data:

We reject the null hypothesis if the D statistic is a tail observation, meaning either the null is not true, or the null is true and something unlikely happened (a Type 1 error). We reject the null when:

For example, when df = 1 the critical value is 3.84, and we reject the null with 95% confidence when:

## Overfit

An overfit model makes worse forecasts than a simpler model.

We assume the population data (X, Y) follows bivariate normal distribution:

Given ρ, the population regression line is:

The sample regression line is:

From a sample of 30 observations of (X, Y), ordinary least squares (OLS) is performed to find the in-sample p-value for the coefficient and the R². MSE is used to evaluate forecast error.

• When ρ = 0.8, the sample regression line (yellow) is close to the population regression line (red):

• When ρ = 0.2, the sample regression line does NOT match well.

This shows that when the population has a weak relationship (ρ = 0.2), estimates of the slope are more dispersed.

Now we look at the relationship between statistical significance and MSE. The population mean-squared error (MSE) is an out-of-sample measure of forecast error; it does NOT depend on any in-sample data:

By taking partial derivatives, we can see that the population regression (b = ρ, a = 0) minimizes the MSE. We can also see that the higher the ρ, the lower the MSE.

A regression is significant (at 95% confidence) if the p-value for the coefficient b is less than 0.05.

We have observed that when population has a weak relationship (ρ = 0.2):

• Forecasts by significant regressions tend to have greater MSE.
• Forecasts by regressions with higher R-square tend to have greater MSE.

This is because the strong relationship suggested by the regression does NOT forecast the weak population relationship well.

When the population has a strong relationship (ρ = 0.8), however, the significant/high-R² regression holds up out-of-sample.

# Conditional LGD Risk

## cLGD

The history of bond LGD shows that LGD is elevated when default rate is elevated. The elevation is shown to be moderate and similar across different debt types:

It is important to model LGD appropriately in different economic conditions. Like cDR, we define cLGD:

Note that:

There are two ways to calculate ELGD:

Furthermore,

Where:

• EcLGD is the average LGD over conditions
• ELGD is the average LGD over different loans
• ELGD is higher than EcLGD because when cLGD is higher, cDR/PD is also higher, which increases the probability weight on the higher cLGDs; in EcLGD, a higher cLGD does not receive a higher weight.

## Frye-Jacobs

Modeling cLGD separately from cDR introduces complexity and potential overfitting into the cLoss model. Instead, the Frye-Jacobs LGD function assumes that both cDR and cLoss follow Vasicek distributions, and infers cLGD as a function of cDR.

Frye-Jacobs assumptions:

1. cDR and cLoss are comonotonic.

• If cDR goes up, cLoss must go up.
• If cDR is in its qth quantile, then cLoss must also be in its qth quantile. This implies that there is a cLGD function of cDR:

2. cDR follows the Vasicek distribution, which stems from the simplest portfolio structure:

• Large number of firms
• Each firm has the same PD
• Each pair-wise ρ is the same (same PDJ)
• Gauss copulas

3. The distribution of cLoss does NOT depend on the definition of default.

• This implies the distribution of cLoss does not have separate parameters for PD and ELGD. It does have a parameter EL.

4. cLoss follows the Vasicek distribution.

5. cLoss and cDR have the same ρ parameter.

• This ensures that the LGD function is monotonic.

Finally,
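The equation is missing here; for reference, the Frye-Jacobs LGD function as commonly published takes the following form, consistent with the definitions above (reproduced as an assumption, not from the original notes):

```latex
\mathrm{cLGD}(\mathrm{cDR})
  = \frac{\Phi\!\left(\Phi^{-1}(\mathrm{cDR}) - k\right)}{\mathrm{cDR}},
\qquad
k = \frac{\Phi^{-1}(PD) - \Phi^{-1}(EL)}{\sqrt{1-\rho}}
```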

Observations:

1. cLGD is strictly monotonic with range (0, 1), for all k.
2. cLGD increases slowly, and similarly for all k, at low cDR.
3. Elasticity is greatest for loans with low LGD.

## Frye-Jacobs: Develop Alternative Hypothesis

Introduce an additional sensitivity parameter to test the slope of the LGD function.

We know that:

In integration form:

Bring in the Frye-Jacobs cLGD function:

Note that EL appears on both the lhs and the rhs; divide both ELs by $ELGD^a$:

Note that we have identified a new LGD function:

Analyzing the choice of a:

• When a = 0, the cLGD function is the Frye-Jacobs formula.
• When a = 1, cLGD = ELGD, which implies cLGD does not depend on conditions:

## Frye-Jacobs: Hypothesis Test

We introduce a finite portfolio, which brings randomness into the D's and the $LGD^{dollar}$s.

• We assume the finite portfolio is uniform: all N loans have the same PD and ρ
• We assume that, given the portfolio cDR, the number of defaults is binomial:
• We assume that LGD is normally distributed around cLGD, with σ = 0.2. Note that under this assumption ELGD = cLGD, which corresponds to a = 1.

Under finite portfolio, the probability of 0 defaults is:

Conditional on cDR and Σ D > 0, the average portfolio LGD rate is normal:

Let Y ~ N(0, 1) be a standard normal variable, then LGD becomes:

Now calculate Loss based on DR and LGD:

Use change-of-variable technique to calculate the pdf for Loss:

Where:

Finally, the pdf of loss conditional on Σ D and cDR:

Removing the conditioning, the distribution of loss in a uniform portfolio with N loans, the same PD and ρ, and the cLGD function becomes:

Here is a plot of the unconditional loss density in a finite (N = 10) portfolio in red, and the loss density in an infinite portfolio (Vasicek) in blue. (Note that the plot uses D to denote Σ D.)

Now that we have the pdf for loss, we can test the hypothesis:

• H0: a = 0
• H1: a = MLE Based On Moody’s Loss data

As a result, MLE(a) = 0.01 based on all loan data, and the test fails to reject the null. The same holds for bond data and for combined bond/loan data. We conclude that the Frye-Jacobs model is consistent with Moody’s data.

# Vendor Estimation

## Distance-To-Default and EDF

Robert Merton argues that:

• the default of firm i depends on its asset return
• a firm defaults if and only if the value of its assets drops below the value of its liabilities, i.e. its asset return is too low
• the joint default of firms i and j depends on PD and the asset return correlation

Moody’s suggests that a loan contains the option to default, and attempts to use risk-neutral probability to estimate the probability of default. In the context of a put:

Under Moody’s assumption, the firm has an option to default on its assets once their value drops below its liabilities. Here, liability is the strike price, for which Moody’s uses D, the “default point”, defined as short-term debt plus half of long-term debt. DD stands for Distance-to-Default, as suggested by Merton. So the probability of default is:

Moody’s then estimates the value and volatility of the assets (unobservable) based on the value and volatility of the market capitalization (observable).

However, since Φ(-DD) gives a very poor estimate of the default probability, Moody’s instead sets the EDF (Expected Default Frequency) of a firm equal to the average historical default rate of firms with the same Distance-to-Default. The EDF uses DD to find historical analogs of current firms.

## Correlation

Merton assumes that the correlation ρ between the latent variable Z’s is equal to the asset return correlation r.

However, data suggests that correlation estimated from credit data is less than the correlation based on asset returns. Hence a credit portfolio model that uses asset correlation to estimate ρ overstates credit risk.

# Theoretical Pricing

## FX Spot Contract

The spot price $S_t$ is the observable market price of $1$ unit of foreign currency. Let $\mathbb{F}$ denote the foreign currency and $\mathbb{D}$ the domestic currency:

An FX spot contract is an agreement where the buyer purchases $1$ unit of foreign currency at a fixed rate $R$ at the current time $t$.

The contract value to the buyer is:

## FX Forward Contract

Denote the domestic interest rate by $r^{d}$. The price of a domestic zero-coupon bond is $P^d(\tau) = e^{-r^{d}\tau}$.

An FX forward contract is an agreement where the buyer agrees to purchase $1$ unit of foreign currency at a fixed rate $R$ at a future time $T$:

The time-$t$ value of a forward contract is:

We set $PV_t^{forward}=0$ to calculate the forward price $F_t$ at time $t$. The resulting equation is also called covered interest parity, or CIP:

### Non-Deliverable forward

A non-deliverable currency is subject to exchange restrictions imposed by local regulations. CIP does not hold, since covered interest arbitrage is not possible. For example:

Asia:

• CNY: Chinese Yuan
• TWD: New Taiwan Dollar
• KRW: South Korean Won
• INR: Indian Rupee
• PHP: Philippine Peso
• IDR: Indonesian Rupiah
• MYR: Malaysian Ringgit

Latin America:

• COP: Colombian Peso
• VEB: Venezuelan Bolívar
• BRL: Brazilian Real
• PEN: Peruvian Sol
• UYU: Uruguayan Peso
• CLP: Chilean Peso
• ARS: Argentine Peso

Europe, Middle East and Africa:

• EGP: Egyptian Pound
• KZT: Kazakhstani Tenge

Given CIP, we can calculate the implied yield: the foreign interest rate implied by the forward rate, the spot rate, and the domestic interest rate.

We know that the exponential function $e^x$ can be expressed as the sum of the Maclaurin series:

Applying this to the forward rate:

## FX Swap Contract

An FX swap contract contains two FX forward contracts at times $T_1, T_2$ with opposite directions.

For example, a buy/sell swap contract:

The present value of the swap contract is the sum of the present value of the two sub-contracts:

Note that the value of a swap contract is fairly insensitive to spot rate changes, compared with that of a forward contract.

## FX Option

An FX option conveys the right, but not the obligation, to exchange $1$ unit of foreign currency for $K$ units of domestic currency at a future date $T$.

For example, the buyer of a foreign currency call struck at $K$ has the right at maturity to buy $1$ unit of $\mathbb{F}$ at $K$, even if $S_T > K$.

This is equivalent to buying a put on $K$ units of domestic currency struck at $1/K$, which grants the buyer the right at maturity to sell $K$ units of $\mathbb{D}$ at a rate of $1/K$, even if the exchange rate $1/S_T$ falls below $1/K$.

In formula:

Visualizing the transactions on a foreign currency call:

Visualizing the transactions on a domestic currency put:

FX options also satisfy put-call parity:

### Garman-Kohlhagen

To evaluate the price of the option:

• Make assumptions on the stochastic nature of $S_t$
• Create a “risk-free” hedge portfolio, in order to find a governing PDE for the option value, which also leads to an equivalent risk-neutral probability measure
• Solve the PDE directly, with appropriate boundary conditions

We know that if a tradable asset $S_t$ follows a geometric Brownian motion:

Applying Ito's formula to the value of a derivative contract $V(S_t, t)$:

Setting the drift term to zero (as the derivative contract is tradable), we can derive the Black-Scholes PDE characterizing $V$:

However, since the foreign exchange spot rate $S_t$ is not tradable, we need to tweak the B-S formula. Let $B^d$ and $B^f$ denote bank accounts in the domestic and foreign currencies, where $dB^d = r^dB^d \;dt$ and $dB^f = r^fB^f \;dt$. Constructing a replicating portfolio and setting the drift term to $0$, we can derive the Garman-Kohlhagen PDE:

Solving the PDE:

Using the Feynman-Kac theorem with some additional derivation, we can conclude that $\exists \; \mathbb{Q}$ s.t. the arbitrage-free price of the contingent claim $V$ is uniquely determined as the expected value of the discounted final payoff under $\mathbb{Q}$, where $S_t$ obeys the stochastic differential equation:

# Practical Pricing

## FX Spot Contract

The trade date is when the terms of the transaction are agreed, and the value date is when the transaction settles, which is the trade date $+2$ for most currency pairs.

The spot rate quote $EURUSD = 1.2$ means:

• $1\;EUR = 1.2\;USD$, i.e. the higher the $EURUSD$, the stronger the $EUR$.
• $EUR$ is the base currency and is set to 1 unit, whereas $USD$ is the quote currency, used as the numeraire.

The bid-offer spread $EURUSD = 1.199 / 1.201$ means:

• The dealer is willing to buy $1\;EUR$ for $1.199\;USD$
• The dealer is willing to sell $1\;EUR$ for $1.201\;USD$

Equivalently:

• The highest price at which YOU can sell $1\;EUR$ is $1.199\;USD$
• The lowest price at which YOU can buy $1\;EUR$ is $1.201\;USD$

## FX Forward Contract

Forward points are commonly expressed in pips (points in percentage); one pip is worth $0.01\%$.

Example 1 When selling a forward on foreign currency $\mathbb{F}$, the bid-side spot rate plus the bid-side forward points equals the bid-side outright forward rate.

A market-maker would construct the short $\mathbb{F}$ forward as follows. Note that borrowing $\mathbb{F}$ and lending $\mathbb{D}$ corresponds to selling a forward, and therefore to the bid-side forward points.

| Time | Transactions |
|---|---|
| $t = 0$ | borrow $e^{-r^f_{offer}T}\mathbb{F}$; execute a short $\mathbb{F}$ spot contract; lend $S_te^{-r^f_{offer}T}\mathbb{D}$ |
| $t = T$ | receive $S_te^{(r^d_{bid}-r^f_{offer})T}\mathbb{D}$; execute a long $\mathbb{F}$ spot contract; pay $1\mathbb{F}$ |

This is the same as selling an outright forward contract:

| Time | Transactions |
|---|---|
| $t = 0$ | N/A |
| $t = T$ | receive $F_t\mathbb{D}$; pay $1\mathbb{F}$ |

## FX Swap Contract

An FX swap contract adjusts the timing of cash flows from $T_1$ to $T_2$, altering the value date on an existing trade. The near rate should be consistent with the market forward rate for the near date, and the same goes for the far rate. The swap point is equal to:

A buy/sell swap on $\mathbb{F}$ buys a forward on $\mathbb{F}$ at $T_1$ and sells a forward on $\mathbb{F}$ at $T_2$. This corresponds to borrowing $\mathbb{F}$ and lending $\mathbb{D}$.

Example 2 A short outright forward position on $\mathbb{F}$ can be thought of as a buy/sell swap on $\mathbb{F}$ with a spot transaction at the near date and $T_1=0$, similar to Example 1. Here $\tau = T_2 - T_1$:

| Time | Transactions |
|---|---|
| $t = T_1$ | borrow $e^{-r^f_{offer}\tau}\mathbb{F}$; execute a short $\mathbb{F}$ forward contract (pay $e^{-r^f_{offer}\tau}\mathbb{F}$, receive $F_{T_1}e^{-r^f_{offer}\tau}\mathbb{D}$); lend $F_{T_1}e^{-r^f_{offer}\tau}\mathbb{D}$ |
| $t = T_2$ | receive $F_{T_1}e^{(r^d_{bid}-r^f_{offer})\tau}\mathbb{D}$; execute a long $\mathbb{F}$ forward contract (pay $F_{T_1}e^{(r^d_{bid}-r^f_{offer})\tau}\mathbb{D}$, receive $(1/F_{T_2})F_{T_1}e^{(r^d_{bid}-r^f_{offer})\tau}\mathbb{F}$); pay $1\mathbb{F}$ |

This is the same as a buy/sell swap:

| Time | Transactions |
|---|---|
| $t = T_1$ | receive $e^{-r^f_{offer}\tau}\mathbb{F}$; pay $F_{T_1}e^{-r^f_{offer}\tau}\mathbb{D}$ |
| $t = T_2$ | receive $F_{T_2}\mathbb{D}$; pay $1\mathbb{F}$ |

Example 3 From a market-maker perspective:

| Contract | Swap Point | $T_1$ | $T_2$ |
|---|---|---|---|
| Buy/Sell | offer-side swap point | pay at bid-side points | sell at offer-side points |
| Sell/Buy | bid-side swap point | sell at bid-side$\ast$ points | pay at bid-side points |

Note($\ast$): because a swap has less interest rate risk than an outright forward, the market-maker can easily construct a swap with bid-side points for both near and far dates.

Example 4 Say the swap point is $-0.01$; then a party that buys/sells the foreign currency $\mathbb{F}$ is paying the swap point, because it is selling at a lower far rate.

Conversely, a party that sells/buys $\mathbb{F}$ is earning the swap point.

### Risk Characteristics

| Contract | Transactions | FX Risk | IR Spread Risk |
|---|---|---|---|
| Spot | 1 | Yes | No |
| Forward (Outright) | 1 | Yes | Yes |
| Swap | 2 | No | Yes |

## FX Option

There are four ways to express an option price:

| Price | in $\mathbb{D}$ units | in $\mathbb{F}$ units |
|---|---|---|
| Notional as $1\mathbb{F}$ | $P_{numccy}=\text{Garman-Kohlhagen}$ ($\mathbb{D}$ pips) | $P_{baseccy\%}=P_{numccy}/S_t$ ($\mathbb{F}$ %) |
| Notional as $1\mathbb{D}$ | $P_{numccy\%}=P_{numccy}/K$ ($\mathbb{D}$ %) | $P_{baseccy}=P_{numccy\%}/S_t$ ($\mathbb{F}$ pips) |

The meaning of $ATM$ can be different:

• $ATMS$: at the spot rate
• $ATMF$: at the forward rate (preferred by traders)
• $DNS$: delta-neutral

### Risk Reversal

Here a $25$-delta option is an option with a delta of $\pm25\%$. Risk reversal can also denote the difference in implied volatilities:

### Butterfly

Note that a butterfly is vega ($\partial V/\partial \sigma$) neutral, i.e. the strangle notional is usually larger than the straddle notional to create equal and offsetting vega. BF can also denote the difference in implied volatilities:

Under the Black-Scholes framework, options at the delta-neutral strike ($K=Se^{\sigma^2/2}$) have the highest vega $\mathcal{V}$:

In addition, option gamma $\Gamma = \partial^2 V/\partial S^2 = \mathcal{V}/(S^2\sigma T)$

# 📖 C++↺

C++ is a compiled (vs. interpreted: Python), general-purpose (vs. domain-specific: HTML) programming language created by Danish programmer Bjarne Stroustrup as an extension of C.

# Basic

## Compiler

A compiler translates a high-level language into a low-level language and creates an executable program.

1. Pre-processor: reads preprocessing lines such as #include "foo.hpp"
2. Compiler: turns the preprocessed code into assembly code (ASM).
• the front end creates IR (intermediate representation) with SSA (static single assignment). The runtime is $O(n)$.
• the middle end optimizes the IR, removing unnecessary operations, in $O(n^2)$ or more.
• the back end produces ASM
3. Assembler: turns ASM into binary code
4. Debugger: type checking
5. Object Copy: generates the executable, e.g. .exe (on Windows) or .bin (on Mac)

### G++

Compile with g++ at the command line:

Running the compiled result:

The C++ standard library is a collection of classes and functions exposed through different headers. For example, include the <iostream> header to handle input and output; include non-standard headers using double quotes.

### Guards

In C++, a function, class, or variable can only be defined once. We use include guards to make sure declarations are not duplicated when a header is included in multiple files.

## Namespace

Some classes and functions are grouped under the same name, dividing the global scope into sub-scopes, each with its own namespace.

Functions and classes in the C++ standard library are defined in the std namespace: for example, the cin (standard input), cout (standard output), and endl (end line) objects.

Alternatively, we can use using namespace std;.

## Data Type

Every variable must have a type in C++; the type has to be declared and cannot be changed. There are fundamental types and user-defined types (classes).

Characters In a computer, each bit stores a binary (0/1) value. A byte is 8 bits. The computer stores characters in a byte using the ASCII encoding.

Numbers The computer stores numbers in binary format with bits. The leftmost bit stores the sign of a number (see the two's-complement method). Real values are stored using a mantissa and an exponent:

Note that very few values can be exactly represented, and how close we can get depends on the number of bits available.

| Type | Size (Bytes) | Value Range |
|---|---|---|
| bool | 1 | true or false |
| char | 1 | -128 to 127 |
| short | 2 | -32,768 to 32,767 |
| int | 4 | -2,147,483,648 to 2,147,483,647 |
| float | 4 | ±3.4E38 |
| double | 8 | ±1.7E308 |

C++ is a strongly typed language, which means type errors need to be resolved for all variables at compile time.

## Function

Every console application has to have a main() function, which by default takes no arguments and returns an integer value.

A function that adds two numbers:

Overloading allows 2 or more functions to have the same name, provided they have different parameter types.

### Function Object

Function objects, or functors, are objects that behave like functions; in effect they are functions with state.

A regular function looks like this:

A function object implementation:

### Lambda

Lambdas, introduced in C++11, are inline functions that can be used as parameters or local objects.

Example 1

Example 2

Example 3

### Extern

The keyword extern means the function is declared in another file.

### Inline Function

C++ provides inline functions so that the overhead of a small function can be reduced. When an inline function is called, the entire body of the function is inserted at the call site.

## Typedef

Use typedef keyword to define a type alias.

## Operators

Standard operations:

Note the difference between i++ and ++i

## Const

Use the const keyword to define a constant value. The compiler will stop any attempt to alter the constant values.

Since C++ is a strongly typed language, it is preferred to use const int N = 4, instead of #define N 4, as the former defines a type.

## Reference

Example 1 A reference is an alias for a variable and cannot rebind to a different variable. We can change val by changing ref:

Example 2 We can also bind a const reference to a const object. An error will be raised if we attempt to change the value through the reference.

Example 3 We can also bind a const reference to a non-const object; thereafter we can NOT change the object through that reference.

Pass By Value In a function, we can pass an argument by either value or reference. When passing by value, the variable x will NOT be changed. In this case, we waste both the time to create a copy inside the function and the memory to store the copy.

Pass By Reference When passing by reference (by adding & in the function argument parameter), the variable x WILL be changed.

Pass By Const Reference We add const when we do not want a function argument to be tampered with when passed by reference. In this example, there is a compiler error because we try to change the const reference number inside the function.

## Pointer

In computer memory, each stored value has an address associated with it. We use a pointer object to store the address of another object and access it indirectly.

There are two pointer operators:

1. &: the address-of operator, used to get the address of an object
2. *: the de-reference operator, used to access the pointed-to object

Example 1

Example 2 If the object is const, a pointer cannot be used to change it.

Example 3 You can have a pointer that itself is const

## Casting

C++ allows implicit and explicit conversions of types.

However, traditional explicit type-casting allows conversions between any types and can lead to run-time errors. To control these conversions, C++ introduces four specific casting operators:

• dynamic_cast<new_type>( ): used only with pointers (and/or references to objects); can cast a derived class to its base class; base-to-derived conversions are allowed only with a polymorphic base class

• static_cast<new_type>( ): used only with pointers (and/or references to objects); can cast base-to-derived or derived-to-base, but with no safety check at run-time

• reinterpret_cast<new_type>( ): converts a pointer to another, unrelated type; often leads to unsafe de-referencing

• const_cast<new_type>( ): removes/sets the const-ness of an object

## Array (C-Style)

An array is a fixed collection of items of the same type stored in a contiguous block of memory. We define the size of the array at creation, and the array index starts at 0 in C++.

The address of the array is the same as the address of its first element. Therefore, we can traverse an array using pointer increments, which is very efficient.

## Dynamic Allocation

Dynamic memory allocation is necessary when you do NOT know the size of the array at compile time. We use the new keyword paired with the delete keyword.

Dynamically allocate a $4\times4$ matrix with a cast.

## Library

A C++ library is a package of reusable code, typically with these two components:

• header files that declare the library's interface
• a precompiled binary containing the machine code for the implementation

There are two types of C++ libraries: static and dynamic.

• a static library has a .a (.lib on Windows) extension, and the library code is compiled into the executable, so users only need the executable to run a program built with a static library.
• a dynamic library has a .so (.dll on Windows) extension and is loaded at run time. It saves space, since many programs can share one copy of the dynamic library code, and it can be upgraded to a new version without replacing all the executables that use it.

# Condition

## Switch

A switch statement tests an integral or enum value against a set of constants. We can NOT use a string in a switch statement.

## While / Do While / For Loop

While loop:

Do while loop:

For loop:

For loop with two variables:

## Enum

The enum (enumerated) type is used to define collections of named integer constants.

# Class

A class achieves data abstraction and encapsulation.

• abstraction refers to the separation of interface and implementation
• encapsulation refers to combining data and functions so that data is only accessible through functions.

## Member Variable & Function

Define a customer class with member variables and functions.

Instantiate Customer class instances to represent different customers.

## Protection Level

There are three protection levels that keep class data members internal to the class.

1. public accessible to all.
2. protected accessible in the class that defines them and in classes that inherit from that class.
3. private only accessible within the class defining them.

## Constructor / Destructor

A constructor is a special member function used to initialize the data members when an object is created. Here is an example using an initializer list to create a more efficient constructor.

## Free-Store

There are several ways to create objects on a computer:

• Automatic/Stack: int a;
• Dynamically allocated:
  • Free store: int* ptr = new int[10];, allocated and freed by new/delete
  • Heap: allocated and freed by malloc/free

Summarized in a table from GeeksforGeeks:

| Parameter | Stack | Heap |
|---|---|---|
| Basic | Memory is allocated in a contiguous block | Memory is allocated in any random order |
| Allocation and de-allocation | Automatic, by compiler instructions | Manual, by the programmer |
| Cost | Less | More |
| Access time | Faster | Slower |
| Main issue | Shortage of memory | Memory leak/fragmentation |

We use -> to access free-store object’s member functions:

## Const Member Functions

A const object can only invoke const member functions of its class. A const member function is not allowed to modify any data members of the object on which it is invoked. However, if a data member is marked mutable, it can be modified inside a const member function.

## Static Member

We use the static keyword to associate a member with the class, as opposed to class instances. A non-static data member can NOT be accessed from a static member function.

Static member variables can NOT be initialized through the class constructor; rather, they are initialized once, outside the class body. However, a const static member variable can be initialized within the class body.

## This

Every non-static member function has access to a this pointer, which is initialized with the address of the object when the member function is invoked.

## Copy Constructor

We use the copy constructor to construct an object from another already constructed object of the same type.

## Assignment Operator

We use the assignment operator to assign an object of the same type.

## Shallow / Deep Copy

The default copy constructor and assignment operator provide a shallow copy, which copies each member of the class individually. For pointer members, shallow copying copies the address stored in the pointer, resulting in both objects pointing to the same object on the free store.

A deep copy, however, creates a new object on the free store and copies the contents of the object the original pointer points to.

Deep Copy copy constructor

Deep Copy assignment operator

## The Rule of 3

There are 3 operations that control the copies of an object: copy constructor, assignment operator, and destructor. If you define one of them, you will most likely need to define the other two as well.

## Singleton Class

The Singleton design pattern makes sure only one instance of a given type is instantiated in a program, and provides a global point of access to it.

1. Change the access level of the constructor to private.
2. Add a new public member function Instance() to create and return the object.
3. Use a static member variable to hold the object.

## Inheritance

Classes related by inheritance form a hierarchy consisting of base and derived classes. The derived class inherits members from the base class, subject to protection-level restrictions, and may extend or override member functions of the base class.

## Virtual

Different derived classes may implement member functions from the base class differently. The base class uses the virtual keyword to mark a member function that may be specialized by derived classes.

## Abstract Class

The base class has to either provide a default implementation for a virtual function or declare it pure virtual. A class with one or more pure virtual functions is called an abstract class or interface, and cannot be instantiated.

## Virtual Destructor

When we delete a derived class object, we should execute both the derived class destructor and the base class destructor. A virtual base class destructor is needed to make sure the destructors are called properly when a derived class object is deleted through a pointer to the base class.

If we delete a derived class object through a pointer to a base class when the base class destructor is non-virtual, the result is undefined.

## Polymorphism

The types related by inheritance are known as polymorphic types. We can use polymorphic types interchangeably.

We can use a pointer or a reference to a base class to refer to an object of a derived class; this substitutability is known as the Liskov Substitution Principle (LSP). It allows us to write code without needing to know the dynamic type of an object.

We can write one function which applies to all account types.

# Standard Template Library (STL)

## Sequential Container

### std::array

The STL array class offers a more efficient and reliable alternative to C-style arrays: the size is known as part of the type, so we do not have to pass the array size as a separate parameter.

### std::vector

Vectors are stored contiguously, like dynamic arrays, with the ability to resize automatically when an element is inserted or deleted. The vector's capacity is typically doubled whenever it is exhausted.

### std::list

Unlike arrays and vectors, a list is a sequential container that allows non-contiguous memory allocation.

### std::string

The STL string class stores characters as a sequence of bytes, allowing access to single-byte characters. A string is terminated by a \0, so the string "foo" actually stores four characters.

### size()

Use sizeof() to get the size of an array in bytes. Use the .size() member function to get the number of elements in an STL container.

## Associative Container

### std::set

Sets are associative containers where each element is unique. The value of an element cannot be modified once it is added to the set.

### std::map

A std::map sorts its elements by the keys.

## Algorithm

The STL provides implementations of some widely used algorithms.

• <algorithm> header: sorting, searching, copying, modifying elements

## Smart Pointer

### std::unique_ptr

A unique pointer takes sole ownership of its pointed-to object. The unique pointer deletes the object it manages when the unique pointer is destroyed or reassigned.

### std::shared_ptr

The shared pointer reference-counts its pointed-to object and can store and pass a reference beyond the scope of a function. In OOP, a shared pointer can be stored as a member variable and used to reference a value outside the scope of the class.

Creating a vector of shared_ptr:

### std::weak_ptr

A weak_ptr refers to an object managed by a shared pointer, but does not increment the reference count.

# Parallel Processing

A thread is a small sequence of programmed instructions and is usually a component of a process. Multiple threads can exist within one process, executing concurrently and sharing resources such as memory, while processes do not share their resources.

The std::thread class in C++ supports multi-threading; an instance represents a single thread. We pass a callable object (function pointer, functor, or lambda) to the constructor of std::thread, and use the join() method to wait for the completion of a thread.

Here we initiate two threads. Both threads share memory and attempt to modify the balance variable at the same time, which leads to a concurrency issue.

We introduce a mutex (mutual exclusion) object, which represents a unique lock on a resource used by the program. A thread can lock the resource with the std::mutex lock() method, which prevents other threads from using the resource until the mutex is unlocked.

## Condition Variable

A condition variable is an object that can block the calling thread until notified to resume. It uses a unique_lock (over a mutex) to lock the thread when one of its wait functions is called.

Reference:

• Stochastic Calculus: An Introduction with Applications, Gregory F. Lawler
• FINM 32000, 33000, 34500, 36700, 36702, 322 Lecture Notes, the University of Chicago