📖 Notes on Financial Mathematics
01. Portfolio Theory
02. Option Theory
03. Stochastic Calculus
04. Credit Risk Model
05. Foreign Exchange
06. C++
📖 Portfolio Theory
Portfolio Diversification
Two-Asset Portfolio
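Consider an investment portfolio on two assets, with weight $w$ on asset 1 and $1-w$ on asset 2:
$$r_p = w r_1 + (1-w) r_2$$
We can calculate the mean and variance of the return on the portfolio, based on the mean and variance of the return on each asset:
$$\mu_p = w\mu_1 + (1-w)\mu_2$$
$$\sigma_p^2 = w^2\sigma_1^2 + (1-w)^2\sigma_2^2 + 2w(1-w)\rho\sigma_1\sigma_2$$
We can see that if $\rho < 1$ (and $0 < w < 1$), we have diversification, $\sigma_p < w\sigma_1 + (1-w)\sigma_2$, where $\mu_p$ is linear in the portfolio allocation while the standard deviation $\sigma_p$ is convex.
When $\rho = -1$, the portfolio variance can be as small as desired. If we set $w = \frac{\sigma_2}{\sigma_1 + \sigma_2}$ then $\sigma_p = 0$ and the portfolio becomes riskless.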
In the following chart we can see the return volatility of a two-asset portfolio plotted against different weights and correlations.
We see that as long as the correlation between the two assets is not perfect, there exists a weight that minimizes the portfolio variance (not taking portfolio return into consideration, yet).
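The two-asset variance formula and the variance-minimizing weight can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function names are hypothetical, and `w` is assumed to be the weight on asset 1 with `1 - w` on asset 2.

```python
import math


def two_asset_vol(w, sigma1, sigma2, rho):
    """Volatility of a portfolio with weight w in asset 1 and 1 - w in asset 2."""
    var = ((w * sigma1) ** 2 + ((1 - w) * sigma2) ** 2
           + 2 * w * (1 - w) * rho * sigma1 * sigma2)
    return math.sqrt(max(var, 0.0))  # clamp tiny negative rounding errors


def min_variance_weight(sigma1, sigma2, rho):
    """Weight on asset 1 that minimizes portfolio variance.

    Obtained by setting the derivative of the variance w.r.t. w to zero.
    Undefined when rho == 1 and sigma1 == sigma2 (zero denominator).
    """
    cov = rho * sigma1 * sigma2
    return (sigma2 ** 2 - cov) / (sigma1 ** 2 + sigma2 ** 2 - 2 * cov)


if __name__ == "__main__":
    for rho in (-1.0, 0.0, 0.5):
        w = min_variance_weight(0.2, 0.3, rho)
        print(rho, round(w, 4), round(two_asset_vol(w, 0.2, 0.3, rho), 4))
```

With perfectly negative correlation the minimizing weight reduces to `sigma2 / (sigma1 + sigma2)` and the portfolio volatility collapses to zero, which matches the riskless case described above.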
Multi-Asset Portfolio
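Consider $n$ assets with return volatility $\sigma_i$ and covariance $\sigma_{ij}$. Let $w_i$ denote the allocation to asset $i$. Then the variance of portfolio return becomes:
$$\sigma_p^2 = \sum_{i=1}^n w_i^2\sigma_i^2 + \sum_{i=1}^n\sum_{j \neq i} w_i w_j \sigma_{ij}$$
In the case of an equally-weighted portfolio with $w_i = \frac{1}{n}$:
$$\sigma_p^2 = \frac{1}{n^2}\sum_{i=1}^n \sigma_i^2 + \frac{1}{n^2}\sum_{i=1}^n\sum_{j \neq i}\sigma_{ij}$$
We define the average variance and the average covariance:
$$\overline{\sigma_i^2} = \frac{1}{n}\sum_{i=1}^n\sigma_i^2, \qquad \overline{\sigma_{ij}} = \frac{1}{n(n-1)}\sum_{i=1}^n\sum_{j \neq i}\sigma_{ij}$$
Therefore,
$$\sigma_p^2 = \frac{1}{n}\,\overline{\sigma_i^2} + \frac{n-1}{n}\,\overline{\sigma_{ij}}$$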
We conclude that in an equally weighted portfolio (or a diversified portfolio where every weight $w_i$ is small) with a large number of assets:
- the individual asset return variances become unimportant to the portfolio return variance
- the portfolio variance instead depends on the average covariance between the assets.
Here, the average return covariance is the systematic risk that cannot be eliminated through diversification, whereas the average return variance is the idiosyncratic risk that is diversifiable.
Note that complete diversification ($\sigma_p^2 = 0$) is achieved when:
- $n \to \infty$ and the average covariance $\overline{\sigma_{ij}} = 0$ for a multi-asset portfolio
- $\rho = -1$ for a two-asset portfolio
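The split of equally weighted portfolio variance into an average-variance term that shrinks like $1/n$ and an average-covariance term that does not can be checked numerically. The sketch below is illustrative only; the function names and the list-of-lists covariance matrix are assumptions made for the example.

```python
def equal_weight_variance(cov):
    """Variance of the 1/n portfolio computed directly as w' cov w."""
    n = len(cov)
    return sum(cov[i][j] for i in range(n) for j in range(n)) / n ** 2


def decomposed_variance(cov):
    """Same variance, written as avg_var / n + (n - 1) / n * avg_cov."""
    n = len(cov)
    avg_var = sum(cov[i][i] for i in range(n)) / n
    avg_cov = sum(cov[i][j] for i in range(n)
                  for j in range(n) if i != j) / (n * (n - 1))
    return avg_var / n + (n - 1) / n * avg_cov


def uniform_cov(n, var, cov):
    """Covariance matrix with the same variance and pairwise covariance everywhere."""
    return [[var if i == j else cov for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    for n in (2, 10, 100, 1000):
        # The portfolio variance approaches the average covariance (0.01) as n grows.
        print(n, equal_weight_variance(uniform_cov(n, 0.04, 0.01)))
```

As `n` grows the printed variance falls toward the pairwise covariance and no further, which is the systematic-risk floor.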
Mean-Variance Frontier
In a mean-variance space, the set of all possible portfolios with $n$ assets forms a convex set. The boundary of this set is known as the mean-variance frontier and forms a parabola.
The top half of the MV frontier is the set of efficient MV portfolios, which maximize mean return given return variance.
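Let us define $r$ as the random vector of asset returns on $n$ assets, with mean vector $\mu$ and covariance matrix $\Sigma$:
$$r = (r_1, \dots, r_n)', \qquad \mu = E[r], \qquad \Sigma = \mathrm{Cov}(r)$$
A particular portfolio is defined by the weights assigned to the various assets, and we denote the weights vector $w$. The portfolio return is also a random variable, where:
$$r_p = w'r$$
Also,
$$\mu_p = w'\mu, \qquad \sigma_p^2 = w'\Sigma w$$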
The GMV and Tangent Portfolios
The Global Minimum Variance (GMV) portfolio has the lowest return variance among all possible portfolios, characterized by the leftmost point on the MV frontier. It can be constructed with the weight $w_{GMV}$ which minimizes the objective function $w'\Sigma w$ under the constraint $w'\mathbf{1} = 1$:
$$w_{GMV} = \frac{\Sigma^{-1}\mathbf{1}}{\mathbf{1}'\Sigma^{-1}\mathbf{1}}$$
The Tangent portfolio is the portfolio with the highest ratio of mean to volatility among all possible portfolios, characterized by the point at which a line through the origin is tangent to the MV frontier:
$$w_t = \frac{\Sigma^{-1}\mu}{\mathbf{1}'\Sigma^{-1}\mu}$$
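As a rough sketch, both portfolios come from solving a linear system in the covariance matrix and rescaling so the weights sum to one: the GMV weights are proportional to $\Sigma^{-1}\mathbf{1}$ and the tangent weights to $\Sigma^{-1}\mu$. The code below assumes this standard closed form; the helper names are hypothetical, and a small Gauss-Jordan solver is included only to keep the example free of third-party libraries.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def normalize(x):
    """Rescale a vector so that its entries sum to one."""
    s = sum(x)
    return [v / s for v in x]


def gmv_weights(cov):
    """Weights proportional to inverse(cov) times a vector of ones."""
    return normalize(solve(cov, [1.0] * len(cov)))


def tangency_weights(cov, mu):
    """Weights proportional to inverse(cov) times mu.

    Pass mean excess returns instead of raw means when a risk-free asset exists.
    """
    return normalize(solve(cov, mu))


def portfolio_variance(w, cov):
    """Quadratic form w' cov w."""
    n = len(w)
    return sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n))


if __name__ == "__main__":
    cov = [[0.04, 0.006], [0.006, 0.09]]
    print(gmv_weights(cov), tangency_weights(cov, [0.08, 0.10]))
```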
The MV Portfolio
It turns out that any portfolio on the efficient MV frontier can be constructed as a linear combination of the GMV and tangent portfolios, which solves the following optimization:
$$\min_w \; w'\Sigma w \quad \text{s.t.} \quad w'\mu = \mu_p, \quad w'\mathbf{1} = 1$$
Thus a portfolio $w^*$ is a MV portfolio if and only if there exists $\delta \in \mathbb{R}$ such that:
$$w^* = \delta\, w_t + (1-\delta)\, w_{GMV}$$
Consider MV investors that only focus on the mean and variance of a portfolio; such investors will only hold MV portfolios (which are linear combinations of two funds only).
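Excess Return with Risk-Free Asset
Consider the existence of a risk-free asset with return $r_f$ that has zero variance and zero correlation with other assets. The mean excess return is defined as:
$$\tilde{\mu} = \mu - r_f\mathbf{1}$$
And the mean excess return of a portfolio with weight $w$ is:
$$\tilde{\mu}_p = w'\tilde{\mu}$$
Since the risk-free asset has no variance, the return variance of the portfolio is still $w'\Sigma w$.
A MV portfolio with a risk-free asset is a vector $w^*$ which solves the following optimization:
$$\min_w \; w'\Sigma w \quad \text{s.t.} \quad w'\tilde{\mu} = \tilde{\mu}_p$$
Note that the constraint that weights sum up to $1$ is now dropped, with the inclusion of the risk-free asset.
Thus a portfolio with mean excess return $\tilde{\mu}_p$ is a MV portfolio with a risk-free asset if:
$$w^* = \tilde{\delta}\, w_t$$
Where,
$$w_t = \frac{\Sigma^{-1}\tilde{\mu}}{\mathbf{1}'\Sigma^{-1}\tilde{\mu}}, \qquad \tilde{\delta} = \frac{\mathbf{1}'\Sigma^{-1}\tilde{\mu}}{\tilde{\mu}'\Sigma^{-1}\tilde{\mu}}\,\tilde{\mu}_p$$
This result shows that with a risk-free asset, any MV portfolio simply contains a position in the tangency portfolio and a position in the riskless asset (a.k.a. the Two Fund Separation).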
Interesting facts w.r.t. the tangency portfolio $w_t$:
- It is the unique portfolio that is on both the risky and riskless MV frontiers.
- It is the point on the risky MV frontier at which the tangency line goes through the point (0, risk-free rate).
Sharpe Ratio
We define the Sharpe ratio (SR) of a portfolio as its mean excess return $\tilde{\mu}_p$ over the risk-free rate per unit of return volatility $\sigma_p$:
$$SR_p = \frac{\tilde{\mu}_p}{\sigma_p}$$
Therefore, the tangency portfolio is the portfolio on the risky MV frontier with the maximum Sharpe ratio.
On the risk-free efficient MV frontier (a.k.a. the Capital Market Line) all portfolios have the same SR as the tangency portfolio, since the frontier itself is a straight line.
Sortino Ratio
The Sortino ratio
improves upon the Sharpe ratio by penalizing the downside volatility only.
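A minimal sketch of both ratios on a sample of periodic returns follows. The notes do not state a formula for the Sortino ratio, so the downside deviation used here (root mean square of the negative excess returns, with the risk-free rate as the target) is an assumption; other conventions exist. Function names are hypothetical.

```python
import math


def sharpe_ratio(returns, rf=0.0):
    """Mean excess return divided by the sample standard deviation."""
    excess = [r - rf for r in returns]
    mean = sum(excess) / len(excess)
    var = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    return mean / math.sqrt(var)


def sortino_ratio(returns, rf=0.0):
    """Mean excess return divided by the downside deviation.

    Assumed convention: only excess returns below zero contribute, and the
    squared shortfalls are averaged over all observations.
    """
    excess = [r - rf for r in returns]
    mean = sum(excess) / len(excess)
    downside = math.sqrt(sum(min(x, 0.0) ** 2 for x in excess) / len(excess))
    return mean / downside


if __name__ == "__main__":
    sample = [0.02, -0.01, 0.03, -0.02, 0.04]
    print(sharpe_ratio(sample), sortino_ratio(sample))
```

Because upside moves do not enter the denominator, a return series with mostly positive surprises gets a higher Sortino ratio than Sharpe ratio.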
Linear Factor Model
Based on the First Fundamental Theorem of Asset Pricing, given no arbitrage there exists a risk neutral probability measure and a change of measure (RN derivative) such that for any tradable asset ,
proposition is a linear function of :
Such that all portfolio returns have a factorbeta
representation w.r.t. the tangency portfolio,
By mathematical identity, this will hold in sample exactly.
proof Consider the tangency portfolio from the riskfree MV frontier:
Therefore,
And since,
We can show that,
In addition, the covariance can be shown as,
Thus,
Generalization
The factor-beta representation is not unique to the tangency portfolio. In fact, it holds for any arbitrary MV portfolio.
We will focus on the tangency portfolio, W.L.O.G.
Practical Consideration
The factor-beta representation seems to provide a way to estimate the mean return for any given portfolio. However, it is difficult to calculate the tangency weight in practice, due to either circularity (direct estimation) or imprecision (inverting the covariance matrix $\Sigma$).
The Linear Factor Model makes an assumption regarding the identity of the tangency portfolio, which avoids the issues stated above.
The LFM assumed tangency portfolio is only used for the pricing of expected returns. However, additional assumptions can be made regarding investors' MV preferences, such that the assumed tangency portfolio will also be used in actual asset allocation.
CAPM
The most famous LFM is the Capital Asset Pricing Model
, which assumes a valueweighted market portfolio of all available assets as the tangency portfolio.
The CAPM is a relative pricing formula, which states that the expected return of any asset can be expressed as the sum of the risk-free rate and a portion of the market risk premium:
$$E[r_i] = r_f + \beta_i\,(E[r_m] - r_f), \qquad \beta_i = \frac{\mathrm{Cov}(r_i, r_m)}{\mathrm{Var}(r_m)}$$
In other words, it says that the expected excess return (risk premium) of an asset is proportional to the market risk premium. The factor $\beta_i$ is estimated by regression. CAPM also asserts that market beta is the only risk associated with higher average returns, and that volatility, skewness, and other covariances do not matter in determining the risk premium.
We can also rewrite the formula as follows:
$$\frac{E[r_i] - r_f}{\sigma_i} = \rho_{i,m}\,\frac{E[r_m] - r_f}{\sigma_m}$$
This shows that the Sharpe ratio earned on an asset depends only on the correlation between the asset return and the market return.
There are two ways to derive CAPM:
- If we assume that returns are jointly normal, then the mean and variance are sufficient statistics for the return distribution, and thus every investor holds a portfolio on the riskless MV frontier, which is a combination of the tangency portfolio and the risk-free asset. Therefore, aggregating across all investors, the market portfolio of all investments is the tangency portfolio.
- If we do not assume jointly normal returns, we can instead assume that investors only care about the mean and variance of returns. In this case all investors will also choose MV portfolios, and therefore CAPM holds.
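The regression estimate of the market beta mentioned above can be sketched as an ordinary least squares fit of asset excess returns on market excess returns. The function name and the sample data are made up for illustration; under CAPM the intercept (alpha) should be close to zero.

```python
def capm_regression(asset_excess, market_excess):
    """OLS fit of asset excess returns on market excess returns.

    Returns (alpha, beta) where beta = Cov(asset, market) / Var(market).
    """
    n = len(market_excess)
    mean_a = sum(asset_excess) / n
    mean_m = sum(market_excess) / n
    cov = sum((a - mean_a) * (m - mean_m)
              for a, m in zip(asset_excess, market_excess))
    var = sum((m - mean_m) ** 2 for m in market_excess)
    beta = cov / var
    alpha = mean_a - beta * mean_m
    return alpha, beta


if __name__ == "__main__":
    market = [0.01, -0.02, 0.03, 0.00]          # hypothetical excess returns
    asset = [0.5 * m + 0.001 for m in market]   # beta 0.5 plus a constant alpha
    print(capm_regression(asset, market))
```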
Treynor’s Ratio
Fama-French Model
The Fama-French 3-factor model is a well-known multi-factor model:
$$E[\tilde{r}_i] = \beta_i^m E[\tilde{r}_m] + \beta_i^s E[r_s] + \beta_i^v E[r_v]$$
Where $\tilde{r}_m$ is the excess market return as in CAPM, $r_s$ is a portfolio that goes long small stocks and shorts large stocks, and $r_v$ is a portfolio that goes long value stocks (low market price per fundamental) and shorts growth stocks.
The FF model states that the beta to value and small stocks earns a premium, NOT being a value or small stock. In other words, the premium is earned on how a stock acts, not how it is classified.
📖 Option Theory
This is a study note on the fundamental theory of the pricing of a financial derivative
, whose payoff is defined in terms of an underlying asset. We hereby try to compute a consistent price of the derivative in relative terms to
the market price of the underlying asset.
Option Pricing Theory
We make our first assumption that the market is frictionless
, by which we mean that:
 no transaction cost (commission, bidask spread, taxes)
 can hold negative asset (shorting) and there is no margin constraint
 can hold fractional asset
 no market impact from trading
Arbitrage (Static Portfolio)
We assume that the market lives in a probability space and it includes tradable assets
with nonrandom time prices and random time prices:
A static portfolio
is a vector of quantities, where each is nonrandom and constant in time:
Thus the time value of the static portfolio is:
A static portfolio is an arbitrage
if its value satisfies that:
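Suppose portfolio $\Theta^a$ superreplicates portfolio $\Theta^b$, which means that $P(V_T^a \geq V_T^b) = 1$. Then $V_0^a \geq V_0^b$, otherwise an arbitrage exists. The same goes for a subreplication, with the inequalities reversed. Therefore, if $\Theta^a$ replicates $\Theta^b$, which means that $P(V_T^a = V_T^b) = 1$, then $V_0^a = V_0^b$. This is called the law of one price.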
Assets
Discount Bond
A discount bond pays $1$ at maturity $T$. Given a nonrandom interest rate $r$, the no-arbitrage time-$t$ price of the discount bond is:
$$Z_t = e^{-r(T-t)}$$
Forward Contract
A forward contract on $S_T$ with nonrandom delivery price $K$ obligates its holder to pay $K$ and receive $S_T$ at time $T$. For an asset paying no dividends, the time-$t$ value of the forward contract is $S_t - Ke^{-r(T-t)}$.
A forward price $F_t$ is the delivery price such that the value of the forward contract at time $t$ is zero, i.e. $F_t = S_t e^{r(T-t)}$.
European Call Option
A European call option gives its holder the right at time $T$ to pay $K$ and receive $S_T$. A call has payoff $(S_T - K)^+$, and it is in the money if $S_t > K$ at time $t$.
The time price of a call option satisfies:
For strike :
European Put Option
A European put option gives its holder the right at time $T$ to receive $K$ and deliver $S_T$. A put has payoff $(K - S_T)^+$, and it is in the money if $S_t < K$ at time $t$.
The time price of a put option satisfies:
For strike :
In addition,
Put-Call Parity
Binomial Tree
We can create a replicating portfolio
to calculate the value of a call option under a simple binomial tree:
Where,
And,
Plugging in and :
We can interpret and as probabilities that construct a risk-neutral measure and that:
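The one-period replication argument can be sketched as follows. The parametrization is an assumption made for this example: the stock moves from `s0` to `s0 * u` or `s0 * d`, and one unit of cash grows by the factor `growth` over the period (for instance $e^{rT}$). The function names are hypothetical.

```python
def replicate_one_period(s0, u, d, growth, payoff):
    """Replicate a payoff with shares of stock plus a cash position.

    Returns (shares, cash_today, price) where price = shares * s0 + cash_today.
    """
    cu, cd = payoff(s0 * u), payoff(s0 * d)
    shares = (cu - cd) / (s0 * (u - d))
    cash_today = (u * cd - d * cu) / ((u - d) * growth)
    return shares, cash_today, shares * s0 + cash_today


def risk_neutral_price(s0, u, d, growth, payoff):
    """Discounted expectation under the up-probability q = (growth - d) / (u - d)."""
    q = (growth - d) / (u - d)
    return (q * payoff(s0 * u) + (1 - q) * payoff(s0 * d)) / growth


if __name__ == "__main__":
    call = lambda s: max(s - 100.0, 0.0)
    print(replicate_one_period(100.0, 1.2, 0.8, 1.05, call))
    print(risk_neutral_price(100.0, 1.2, 0.8, 1.05, call))
```

Both routes give the same number, and neither uses the physical probability of an up move, which is the point of interpreting the weights as risk-neutral probabilities.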
The Fundamental Theorem
The fundamental theorem of asset pricing states that:
no arbitrage
if and only if:
there exists a probability measure $\mathbb{Q}$ equivalent to $P$ such that the discounted prices of all tradable assets are martingales w.r.t. $\mathbb{Q}$.
The proof can be summarized as two ideas:
- ($\Leftarrow$): a martingale is the cumulative P&L from betting on zero-expectation games, whose expected value is always zero no matter how you vary your bet size across games and time. You cannot risklessly make something from nothing.
- ($\Rightarrow$): the probability of an event is simply the price of an asset that pays 1 unit of $B$ iff that event happens.
Risk-Neutral Measure
The physical probability is not accurate in evaluating a payoff's true market price. Consider a 50/50 coin flip that pays a fixed amount or nothing. Using the physical probability (and ignoring discounting), the price would be half of that amount.
However, the actual market price would be different. If the market is risk-averse, the price would be lower. We can view this market as representing a risk-neutral measure where the down move has a higher risk-neutral probability than the up move.
We can see that a risk-neutral probability is a price: the risk-neutral probability of an event is the price of a one-unit payout contingent on the event. Taking a risk-neutral expectation is the same as pricing by replication.
Radon-Nikodym derivative
In a discrete setting with outcomes $\omega_1, \dots, \omega_M$, the relationship between the risk-neutral measure $\mathbb{Q}$ and the physical measure $P$ can be expressed by the Radon-Nikodym derivative, or likelihood ratio:
$$\frac{d\mathbb{Q}}{dP}(\omega) = \frac{\mathbb{Q}(\omega)}{P(\omega)}$$
The LR is typically larger in bad states than good states, reflecting the price margin on adverse events.
The Second Fundamental Theorem
A market is said to be complete
if every random variable can be replicated by a static portfolio .
The second fundamental theorem of asset pricing
states that:
a no arbitrage market is complete
if and only if:
there exists a unique measure $\mathbb{Q}$ equivalent to $P$ such that the discounted prices of all tradable assets are martingales w.r.t. $\mathbb{Q}$.
Trading Strategy
A filtration $\{\mathcal{F}_t\}$ represents all information revealed at or before time $t$. A stochastic process $Y_t$ is adapted to $\{\mathcal{F}_t\}$ if $Y_t$ is $\mathcal{F}_t$-measurable for each $t$, meaning that the value of $Y_t$ is determined by the information in $\mathcal{F}_t$.
A trading strategy $\Theta_t$ is a sequence of static portfolios adapted to $\{\mathcal{F}_t\}$. A trading strategy is self-financing if for all $t > 0$:
This implies that the change in the portfolio value is fully attributable to gains and losses in asset prices:
Therefore,
We define that a trading strategy replicates
a timeT payoff if it is selffinancing
and the value . By the law of one price
, at any time , the noarbitrage price of an asset paying must have the same value of the replicating portfolio.
Arbitrage (Trading Strategy)
We now expand on the previous definition of arbitrage, that an arbitrage
is a self-financing trading strategy whose value satisfies:
Ito Process
We define an Ito process to be a stochastic process $X_t$ that satisfies:
$$dX_t = \mu(X_t, t)\,dt + \sigma(X_t, t)\,dW_t$$
The existence and uniqueness of a solution of this SDE can be guaranteed by Lipschitz-type technical conditions on $\mu$ and $\sigma$.
Ito’s Rule
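Ito's rule states that given an Ito process $X_t$, and a sufficiently smooth function $f(t, x)$:
$$df(t, X_t) = \frac{\partial f}{\partial t}\,dt + \frac{\partial f}{\partial x}\,dX_t + \frac{1}{2}\frac{\partial^2 f}{\partial x^2}\,(dX_t)^2$$
With two processes $X_t$ and $Y_t$, and $f(x, y)$:
$$df(X_t, Y_t) = \frac{\partial f}{\partial x}\,dX_t + \frac{\partial f}{\partial y}\,dY_t + \frac{1}{2}\frac{\partial^2 f}{\partial x^2}\,(dX_t)^2 + \frac{1}{2}\frac{\partial^2 f}{\partial y^2}\,(dY_t)^2 + \frac{\partial^2 f}{\partial x\,\partial y}\,dX_t\,dY_t$$
In a special case where $f(x, y) = xy$, the formula becomes:
$$d(X_tY_t) = X_t\,dY_t + Y_t\,dX_t + dX_t\,dY_t$$
Note that Ito's rule applies under any probability measure; it is purely math.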
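Black-Scholes Model
Assumptions Consider two basic assets $B$ and $S$ in continuous time, where:
$$dB_t = rB_t\,dt$$
And $S$ follows GBM dynamics,
$$dS_t = \mu S_t\,dt + \sigma S_t\,dW_t$$
Conclusion Then by no-arbitrage and Ito's rule, the time-$t$ price $C(S_t, t)$ of a call option with payoff $(S_T - K)^+$ satisfies the Black-Scholes PDE for $t < T$:
$$\frac{\partial C}{\partial t} + rS\frac{\partial C}{\partial S} + \frac{1}{2}\sigma^2S^2\frac{\partial^2 C}{\partial S^2} = rC, \qquad C(S, T) = (S - K)^+$$
We can solve the call price analytically with the Black-Scholes formula:
$$C(S_t, t) = S_tN(d_1) - Ke^{-r(T-t)}N(d_2), \qquad d_{1,2} = \frac{\ln(S_t/K) + (r \pm \frac{1}{2}\sigma^2)(T-t)}{\sigma\sqrt{T-t}}$$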
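A minimal sketch of the Black-Scholes call formula for a non-dividend stock, using only the standard library. The argument `tau` is the time to maturity $T - t$; the function names are hypothetical and the normal CDF is built from `math.erf`.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s, k, r, sigma, tau):
    """Black-Scholes price of a European call on a non-dividend stock."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return s * norm_cdf(d1) - k * math.exp(-r * tau) * norm_cdf(d2)


if __name__ == "__main__":
    price = bs_call(100.0, 100.0, 0.05, 0.2, 1.0)
    lower_bound = max(100.0 - 100.0 * math.exp(-0.05), 0.0)
    print(price, lower_bound)
```

The price always sits between the lower bound $(S_t - Ke^{-r(T-t)})^+$ and the stock price itself, which gives a quick sanity check on any implementation.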
Here we plotted the BS call price , the intrinsic value and the lower bound against the current underlying price , with parameters , , and
The Greeks
Delta
Suppose an asset has a time-$t$ value $C(S_t, t)$, then its Delta at time $t$ is $\frac{\partial C}{\partial S}(S_t, t)$. Delta can be interpreted as:
- the slope of the asset value $C$, plotted as a function of $S_t$
- how much the asset value moves per unit move in $S_t$
- the number of units of $S$ needed to replicate this asset.
If the asset is a call option on $S$ and we assume the Black-Scholes assumptions on $S$, then:
$$\Delta = N(d_1), \qquad d_1 = \frac{\ln(S_t/K) + (r + \frac{1}{2}\sigma^2)(T-t)}{\sigma\sqrt{T-t}}$$
The Delta of a call option is strictly between 0 and 1. As the time-to-maturity decreases, the Delta increases faster as the option becomes more ITM. Here we plotted the BS Delta for two values of the time-to-maturity against the current underlying price.
Gamma
For a call option in a BS model,
$$\Gamma = \frac{\partial^2 C}{\partial S^2} = \frac{N'(d_1)}{S_t\,\sigma\sqrt{T-t}}$$
In this case, the Gamma can be interpreted as:
- the convexity of $C$ w.r.t. $S_t$
- how much the Delta moves, per unit move in $S_t$
- how much rebalancing of the replicating portfolio is needed, per unit move in $S_t$
The Gamma of a call option is strictly positive. As the time-to-maturity decreases, the Gamma increases for ATM options. Here we plotted the BS Gamma for two values of the time-to-maturity against the current underlying price.
Theta
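For a call in the BS model,
$$\Theta = \frac{\partial C}{\partial t} = -\frac{S_tN'(d_1)\,\sigma}{2\sqrt{T-t}} - rKe^{-r(T-t)}N(d_2)$$
The Theta of a call option is strictly negative. As the time-to-maturity decreases, the Theta decreases for ATM options (faster time-decay). Here we plotted the BS Theta for two values of the time-to-maturity against the current underlying price.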
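The closed-form Delta, Gamma and Theta of a Black-Scholes call can be cross-checked against finite differences of the price. The sketch below assumes a non-dividend stock and writes `tau` for the time to maturity, so Theta (the derivative in calendar time) is minus the derivative in `tau`. All function names are hypothetical.

```python
import math


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def d1_d2(s, k, r, sigma, tau):
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return d1, d1 - sigma * math.sqrt(tau)


def bs_call(s, k, r, sigma, tau):
    d1, d2 = d1_d2(s, k, r, sigma, tau)
    return s * norm_cdf(d1) - k * math.exp(-r * tau) * norm_cdf(d2)


def call_delta(s, k, r, sigma, tau):
    return norm_cdf(d1_d2(s, k, r, sigma, tau)[0])


def call_gamma(s, k, r, sigma, tau):
    d1, _ = d1_d2(s, k, r, sigma, tau)
    return norm_pdf(d1) / (s * sigma * math.sqrt(tau))


def call_theta(s, k, r, sigma, tau):
    """Derivative of the call price with respect to calendar time t."""
    d1, d2 = d1_d2(s, k, r, sigma, tau)
    return (-s * norm_pdf(d1) * sigma / (2.0 * math.sqrt(tau))
            - r * k * math.exp(-r * tau) * norm_cdf(d2))


if __name__ == "__main__":
    args = (100.0, 100.0, 0.05, 0.2, 1.0)
    print(call_delta(*args), call_gamma(*args), call_theta(*args))
```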
Discrete Delta Hedge and Gamma Scalping
A discretely Delta-hedged portfolio could buy one call $C$ and short $\Delta$ units of $S$. In this case it is a Delta-neutral and long Gamma (Gamma scalping) portfolio:
- Delta of the portfolio is $0$
- Gamma of the portfolio is positive
- it achieves a net profit only if the realized volatility of $S$ is high enough to overcome time decay; otherwise the portfolio loses money. This is the opposite of a short Gamma position, e.g. sell $C$ and long Delta.
We can visualize the P&L of a long Gamma portfolio in the following graph, where the green area indicates profits and the red area indicates losses. The curved line is the option value $C$ and the straight line is the Delta hedge. As $t$ increases, $C$ shifts downwards due to time decay.
In addition, we can show that the P&L of such portfolio does not depend on the drift of the stock:
Numerical Methods
The Taylor series
of a real or complex value function that is differentiable at is:
Implied Volatility
Given the time-$t$ price $C_t$ of a European call option on a non-dividend stock $S$, the time-$t$ Black-Scholes implied volatility is the unique solution $\sigma_{imp}$ to $C_t = C^{BS}(\sigma_{imp})$.
Uniqueness holds because $C^{BS}$ is strictly increasing in $\sigma$, and existence holds because $C^{BS}$ covers the full range of arbitrage-free prices of the European option.
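Because the Black-Scholes call price is strictly increasing in volatility, the implied volatility can be found with a simple bracketing search. The sketch below uses bisection, which is one possible root finder and not necessarily what the notes had in mind; the search bounds `lo` and `hi` are arbitrary assumptions, and the function names are hypothetical.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s, k, r, sigma, tau):
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return s * norm_cdf(d1) - k * math.exp(-r * tau) * norm_cdf(d2)


def implied_vol(price, s, k, r, tau, lo=1e-6, hi=5.0, tol=1e-10):
    """Bisection on sigma; relies on bs_call being increasing in sigma."""
    if not bs_call(s, k, r, lo, tau) <= price <= bs_call(s, k, r, hi, tau):
        raise ValueError("price is outside the range attainable on [lo, hi]")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(s, k, r, mid, tau) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    target = bs_call(100.0, 110.0, 0.03, 0.25, 0.5)
    print(implied_vol(target, 100.0, 110.0, 0.03, 0.5))
```

A price outside the arbitrage-free range has no implied volatility, which is why the sketch raises an error instead of returning a bound.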
If follows the SDE dynamic , where a nonrandom function of , then we can first find the implied volatility given call prices with different maturity , and use the equation below to find (not uniquely) the true function :
Volatility Smile, Skew and Surface
If $S$ truly follows GBM with constant volatility $\sigma$, then $\sigma_{imp} = \sigma$. However, empirically $\sigma_{imp}$ is lower when $K \approx S_t$ (volatility smile), possibly because
- the market prices options using a risk-neutral distribution of log-returns with fatter tails than Normal
Note that $\sigma_{imp}$ is also higher when $K < S_t$ (volatility skew), possibly due to:
- instantaneous volatility increases as price decreases
- possibility of severe crash fuels demand for downside protection
In addition, $\sigma_{imp}$ has a term structure and varies for different $T$. The function $\sigma_{imp}(K, T)$ is called the implied volatility surface.
Tree Model
Binomial Tree
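European Option
Given the option prices $C_{n+1}^u$ and $C_{n+1}^d$ at the two successor nodes at step $n+1$, we can induct backward to find $C_n$, where $q$ is the risk-neutral up probability:
$$C_n = e^{-r\Delta t}\left(qC_{n+1}^u + (1-q)C_{n+1}^d\right)$$
American Option - Put
Given the option prices $P_{n+1}^u$ and $P_{n+1}^d$ at the two successor nodes at step $n+1$, we can induct backward to find $P_n$ as the larger of the immediate exercise value and the continuation value:
$$P_n = \max\left(K - S_n,\; e^{-r\Delta t}\left(qP_{n+1}^u + (1-q)P_{n+1}^d\right)\right)$$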
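The backward induction for European and American puts can be sketched on a recombining binomial tree. The notes do not say which tree parametrization they use, so the Cox-Ross-Rubinstein choice below ($u = e^{\sigma\sqrt{\Delta t}}$, $d = 1/u$) is an assumption, as are the function and parameter names.

```python
import math


def crr_put(s0, k, r, sigma, t, steps, american):
    """Put price on a CRR binomial tree by backward induction."""
    dt = t / steps
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    q = (math.exp(r * dt) - d) / (u - d)   # risk-neutral up probability
    disc = math.exp(-r * dt)
    # Terminal payoffs; index j counts the number of up moves.
    values = [max(k - s0 * u ** j * d ** (steps - j), 0.0)
              for j in range(steps + 1)]
    for n in range(steps - 1, -1, -1):
        for j in range(n + 1):
            value = disc * (q * values[j + 1] + (1 - q) * values[j])
            if american:
                # Compare continuation value with immediate exercise.
                value = max(value, k - s0 * u ** j * d ** (n - j))
            values[j] = value
    return values[0]


if __name__ == "__main__":
    print(crr_put(100.0, 100.0, 0.05, 0.2, 1.0, 500, american=False))
    print(crr_put(100.0, 100.0, 0.05, 0.2, 1.0, 500, american=True))
```

The American value is never below the European one; the gap is the early exercise premium, which is positive for a put when the interest rate is positive.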
American Option  Call
Given option price at the th node . If and stock dividend , then it is never
optimal to exercise early on an American call option. Therefore
Argument 1
At all , the American call is worth more than the exercise payoff :
Argument 2
If then construct portfolio . Then V is an arbitrage as and .
Trinomial Tree
Let and choose to improve accuracy.
Finite Difference Model
Explicit Scheme
Inducting backward from to :
Solving for the BS PDE: where , we get:
Where:
Note that are trinomial tree probabilities.
Implicit Scheme
Inducting backward from to :
Each backward step requires solving a system of linear equations, with one unknown per interior grid point.
Crank-Nicolson Scheme
Inducting backward from to :
If given terminal conditions, then we know ‘s and can solve for .
Monte Carlo Model
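Let $Y$ be a discounted payoff and $C = E[Y]$ the time-$0$ price of the payoff. The Monte Carlo estimator of $C$, based on $M$ IID draws $Y_1, \dots, Y_M$:
$$\hat{C}_M = \frac{1}{M}\sum_{m=1}^M Y_m$$
By the strong law of large numbers, the sample average converges almost surely to the expected value as $M \to \infty$. By the central limit theorem:
$$\sqrt{M}\,(\hat{C}_M - C) \xrightarrow{d} N(0, \sigma_Y^2)$$
Oftentimes we need to estimate $\sigma_Y$ with the sample estimator for the variance of $Y$:
$$\hat{\sigma}_Y^2 = \frac{1}{M-1}\sum_{m=1}^M (Y_m - \hat{C}_M)^2$$
The standard error is $\hat{\sigma}_Y / \sqrt{M}$, and a 95% confidence interval for $C$ is $\hat{C}_M \pm 1.96\,\hat{\sigma}_Y / \sqrt{M}$.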
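As an illustration of the estimator, its standard error and a confidence interval, the sketch below prices a European call by simulating the terminal stock price under risk-neutral GBM. The GBM setting, the 95% level (1.96 multiplier), the seed and the function name are assumptions made for the example.

```python
import math
import random


def mc_call(s0, k, r, sigma, t, n_paths, seed=0):
    """Plain Monte Carlo price of a European call under risk-neutral GBM.

    Returns (estimate, standard_error, 95% confidence interval).
    """
    rng = random.Random(seed)
    total = total_sq = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)
        s_t = s0 * math.exp((r - 0.5 * sigma ** 2) * t + sigma * math.sqrt(t) * z)
        y = math.exp(-r * t) * max(s_t - k, 0.0)   # discounted payoff
        total += y
        total_sq += y * y
    mean = total / n_paths
    var = (total_sq - n_paths * mean ** 2) / (n_paths - 1)  # sample variance
    se = math.sqrt(var / n_paths)
    return mean, se, (mean - 1.96 * se, mean + 1.96 * se)


if __name__ == "__main__":
    print(mc_call(100.0, 100.0, 0.05, 0.2, 1.0, 100_000, seed=42))
```

The standard error shrinks like $1/\sqrt{M}$, so halving the width of the interval costs four times as many paths.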
Variance Reduction Techniques
Antithetic Variate
Let . The antithetic variate estimator
:
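One common form of the antithetic variate estimator pairs each standard normal draw $Z$ with $-Z$ and averages the two discounted payoffs. The sketch below assumes that form and a European call under risk-neutral GBM; it also reports the standard error that the same payoffs would have if all draws were independent, for comparison. Names and the seed are hypothetical.

```python
import math
import random


def sample_var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def mc_call_antithetic(s0, k, r, sigma, t, n_pairs, seed=0):
    """Antithetic Monte Carlo for a European call.

    Returns (estimate, se_antithetic, se_if_independent).
    """
    rng = random.Random(seed)

    def discounted_payoff(z):
        s_t = s0 * math.exp((r - 0.5 * sigma ** 2) * t + sigma * math.sqrt(t) * z)
        return math.exp(-r * t) * max(s_t - k, 0.0)

    pair_means, singles = [], []
    for _ in range(n_pairs):
        z = rng.gauss(0.0, 1.0)
        y_plus, y_minus = discounted_payoff(z), discounted_payoff(-z)
        pair_means.append(0.5 * (y_plus + y_minus))
        singles.extend((y_plus, y_minus))
    estimate = sum(pair_means) / n_pairs
    # Pairs are independent of each other, so the SE uses the pair averages.
    se_anti = math.sqrt(sample_var(pair_means) / n_pairs)
    # SE that 2 * n_pairs independent draws with the same variance would give.
    se_plain = math.sqrt(sample_var(singles) / (2 * n_pairs))
    return estimate, se_anti, se_plain


if __name__ == "__main__":
    print(mc_call_antithetic(100.0, 100.0, 0.05, 0.2, 1.0, 50_000, seed=1))
```

The variance reduction comes from the negative correlation between the paired payoffs, which holds here because the call payoff is monotone in $Z$.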
Control Variate
A control variate
is a random variable, correlated to such that has an explicit formula.
Example
Let be the discounted payoff on a call on where . We can choose to be the discounted payoff on a call on where , in which case can be calculated explicitly through the BS formula given constant close to .
The control variate estimator
estimates by simulating .
Choose to minimize , we get:
Note that when using the sample estimate , the resulting estimator is biased, though the bias matters only when the number of samples is small.
Importance Sampling
Suppose are IID draws from density , and . Ordinary Monte Carlo estimator provides:
With importance sampling, find s.t. iff . Then redraw from density and the importance sampling estimator
is:
Conditional Monte Carlo
Given a random variable :
The conditional Monte Carlo estimator
:
Fourier Transform Model
Given be integrable, meaning . The Fourier transform
of is the function defined by:
Theorem
If is also integrable, then the inversion formula
holds:
Characteristic Function
The complex conjugate
of a complex number is given by . so .
The characteristic function
of any random variable is the function defined by:
Therefore if has density , then . A characteristic function uniquely
identifies a distribution. For example, , if
 To calculate the
moments
of using CF, take the derivatives of w.r.t. :
 To calculate the
CDF
of using CF:
 To calculate
assetornothing
call price using CF, given be the asset share price, define the share measure with likelihood ratio .
Therefore for any , the assetornothing call price:
 To calculate a vanilla
European
call price on struck at with :
Heston Model
Provided that:
Where and are BM with correlation , is the rate of meanreversion, is the longterm mean, and is the volatility of volatility.
We want to find the CF of in order to price options on . The time conditional Heston CF
provides an answer:
📖 Stochastic Calculus
Discrete Time Martingales
Conditional expectation
Definition A Borel set is any set in a topological space that can be formed from open sets through the operations of:
 complement
 countable union
 countable intersection
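Definition Let $X$ be a random vector and $Y$ be an integrable random variable with $E|Y| < \infty$. The conditional expectation of $Y$ given $X$ is the unique measurable function $h(X)$ such that for every Borel set $B$:
$$E[Y\,\mathbf{1}_B(X)] = E[h(X)\,\mathbf{1}_B(X)]$$
We denote $h(X)$ as $E[Y \mid X]$.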
Example 1 Suppose random variable and are discrete.
Example 2 Suppose random variable and are continuous, with joint probability density function and marginal density and .
Here are some basic properties of conditional expectation:
 Linearity:
 Constant: if , then
 Independence: if is independent of , then
 Tower Property: if then
 Factorization Property: if Z is measurable then
 Monotonicity: if , then a.s.
Theory
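Definition A $\sigma$-algebra $\mathcal{G}$ is a collection of subsets of a set $\Omega$ that contains $\Omega$ and is closed under:
- complement, e.g. if $A \in \mathcal{G}$, then $A^c \in \mathcal{G}$
- countable unions, e.g. if $A_1, A_2, \dots \in \mathcal{G}$, then $\bigcup_n A_n \in \mathcal{G}$
Definition $L^2(\Omega, \mathcal{F}, P)$ is the set of all $\mathcal{F}$-measurable square-integrable random variables $X$, with finite 2nd moment $E[X^2] < \infty$.
Definition A real Hilbert space is a real vector space $H$ with an inner product $\langle \cdot, \cdot \rangle$, such that $H$ is a complete metric space w.r.t. the metric $d$, where:
$$d(x, y) = \sqrt{\langle x - y,\, x - y \rangle}$$
Hilbert space examples: $\mathbb{R}^n$, with inner product $\langle x, y \rangle = \sum_i x_iy_i$. Or, $L^2(\Omega, \mathcal{F}, P)$, with inner product $\langle X, Y \rangle = E[XY]$. The reason we are interested in $L^2$ rather than $L^p$ for other $p$ is that the inner product gives rise to orthogonality.
Proposition If $X \in L^2(\Omega, \mathcal{F}, P)$, then for any $\sigma$-algebra $\mathcal{G} \subseteq \mathcal{F}$, the conditional expectation $E[X \mid \mathcal{G}]$ is the orthogonal projection of $X$ onto $L^2(\Omega, \mathcal{G}, P)$, such that:
$$E\big[(X - E[X \mid \mathcal{G}])\,Z\big] = 0 \quad \text{for all } Z \in L^2(\Omega, \mathcal{G}, P)$$
Also, $E[X \mid \mathcal{G}]$ can be interpreted as the $\mathcal{G}$-measurable random variable $Z$ that minimizes the mean square error $E[(X - Z)^2]$.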
Martingales
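Definition A filtration is an increasing sequence of $\sigma$-algebras $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \dots \subseteq \mathcal{F}$, where $\mathcal{F}$ is the $\sigma$-algebra of all events.
Definition A martingale is a sequence of $\mathcal{F}_n$-measurable integrable random variables $X_n$ such that:
$$E[X_{n+1} \mid \mathcal{F}_n] = X_n$$
The tower property implies that $E[X_{n+m} \mid \mathcal{F}_n] = X_n$ for all $m \geq 1$.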
Example 1 Given I.I.D. random variable with and variance .
 Sequence , and
 Sequence
are both martingales.
Example 2 Let be any random variable and be any filtration. Then the sequence is a closed martingale.
Note that the St. Petersburg martingale is not closed, where and and . This is because .
Example 3 Given I.I.D. random variables with moment generating function . Then the exponential martingale is a positive martingale defined by:
Doob’s Identity
Definition A sequence of random variables is predictable with respect to filtration if is measurable with respect to
Definition A sequence of random variables is adapted to filtration if is measurable with respect to
Proposition If is a martingale with and is a predictable sequence of bounded random variables, then the martingale transform is a martingale:
Definition A stopping time with respect to filtration is a random variable such that
Lemma Let be a stopping time, then the sequence is predictable.
Theorem Let be a martingale and be a stopping time. For all , the Doob’s Identity states that . Note that if is bounded for all , DCT shows that .
Proof. is a martingale:
Theorem Let be a sequence of functions on measure space that converge pointwise to a function f. For ,
The Dominated Convergence Theorem (DCT) requires to be dominated by an integrable function :
The Monotone Convergence Theorem (MCT) requires to be monotone (increasing or decreasing): or
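Example 1 Let $S_n$ be a simple random walk with $S_0 = 0$. Let stopping time $T = \min\{n : S_n = -A \text{ or } S_n = B\}$, where $A, B > 0$.
We know that $S_n$ is a martingale and $|S_{T \wedge n}| \leq \max(A, B)$. Applying Doob's Identity and DCT we have:
$$E[S_T] = 0$$
We know that $S_n^2 - n$ is a martingale. Applying Doob's Identity we have $E[T \wedge n] = E[S_{T \wedge n}^2]$. Since $S_{T \wedge n}^2$ is bounded by $\max(A, B)^2$ and $T \wedge n$ is monotone, applying DCT on the RHS and MCT on the LHS we get:
$$E[T] = E[S_T^2]$$
Combining both results we get some interesting results for the Gambler's Ruin problem:
$$P(S_T = B) = \frac{A}{A + B}, \qquad E[T] = AB$$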
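For a symmetric simple random walk started at 0 and stopped the first time it reaches $-A$ or $B$, the standard Gambler's Ruin results are $P(S_T = B) = A/(A+B)$ and $E[T] = AB$. A short simulation gives a sanity check; the function name, barrier values and seed below are illustrative assumptions.

```python
import random


def gamblers_ruin(a, b, n_trials, seed=0):
    """Simulate a symmetric simple random walk from 0 until it hits -a or b.

    Returns (fraction of walks ending at b, average number of steps).
    """
    rng = random.Random(seed)
    hits_b = 0
    total_steps = 0
    for _ in range(n_trials):
        position = steps = 0
        while -a < position < b:
            position += 1 if rng.random() < 0.5 else -1
            steps += 1
        hits_b += position == b
        total_steps += steps
    return hits_b / n_trials, total_steps / n_trials


if __name__ == "__main__":
    a, b = 3, 5
    p_hat, t_hat = gamblers_ruin(a, b, 20_000, seed=7)
    print(p_hat, a / (a + b))   # simulated vs. theoretical hitting probability
    print(t_hat, a * b)         # simulated vs. theoretical expected duration
```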
Example 2 Let be a simple random walk. Let stopping time , where . Note that now DCT fails as is not bounded. Hence .
In fact, because :
Doob’s Maximal Inequality
Definition An adapted sequence of random variable is a:
 submartingale if
 supermartingale if
Proposition If is a convex function and is a martingale, then:
 The Jensen’s Inequality holds:
 the sequence is a submartingale.
Proposition If is a martingale with and is a predictable sequence of bounded, nonnegative random variables, then the martingale transform is a submartingale:
Proposition If is a martingale with and is a predictable sequence of random variables such that , then
Corollary If is a nonnegative submartingale with initial term , then Doob’s Maximal Inequality claims that for any :
and that:
Note that this is a big improvement on the Chebyshev Inequality, which claims that given bounded random variable and for any :
Martingale Convergence Theorem
Definition a sequence of real numbers is called a Cauchy sequence if for every positive real number , there is a positive integer such that for all natural numbers such that
Definition martingales have orthogonal increments. Given a martingale with increments and , then:
 , , and
Theorem Suppose is bounded martingale, then there exists a bounded random variable such that:
Theorem Suppose is bounded martingale, then there exists a bounded random variable such that:
(1)
(2)
Change Of Measure
Proposition Given a probability measure and is a nonnegative random variable satisfying , then there exists a probability measure such that for any bounded or nonnegative random variable that . Z is called the likelihood ratio of probability measure w.r.t. , written as and that:
Proposition If the outcome space is finite, then for each outcome ,
Example 1 In a period market with finite set of outcomes and tradable assets. Let denote the risk-neutral measures for USD and EUR investors. Let denote the USD and EUR price of the riskless (w.r.t. its own measure) asset at time t. Then
Proof. By fundamental theorem, , and , so:
Theorem Let and be two probability measures on the same measurable space, and let be a filtration such that for all n is absolutely continuous w.r.t. on . Then the sequence of likelihood ratios is a martingale:
Brownian Motion
Standard Brownian Motion
Definition A standard Brownian motion (SBM) is a continuoustime random process such that and:
(a) has stationary increments.
(b) has independent increments.
(c) The sample paths are continuous.
Note that (a), (b), and (c) imply that for some constant the distribution of is
Definition Given a SBM , is a Brownian motion with drift and variance .
Proposition Given a SBM , its reflection is also a SBM.
Proposition Given a SBM , then for any , is a SBM
Quadratic Variation
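Definition The nth level quadratic variation of a function $f$ on $[0, t]$ is the sum of squares of the increments across intervals of length $2^{-n}$:
$$Q_n(f; t) = \sum_{k=1}^{\lfloor 2^n t \rfloor}\left(f\!\left(\tfrac{k}{2^n}\right) - f\!\left(\tfrac{k-1}{2^n}\right)\right)^2$$
Theorem Given a Brownian motion $W_t$ with drift $\mu$ and variance $\sigma^2$, then for all $t$ with probability $1$:
$$\lim_{n \to \infty} Q_n(W; t) = \sigma^2 t$$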
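The limit $\sigma^2 t$ for the quadratic variation of a Brownian motion with drift $\mu$ and variance $\sigma^2$ can be illustrated by summing squared simulated increments over intervals of length $2^{-n}$. This sketch draws fresh increments at a single level rather than refining one fixed path, which is enough to see the drift term vanish; the names and the seed are assumptions.

```python
import math
import random


def quadratic_variation(mu, sigma, t, level, seed=0):
    """Sum of squared increments of a simulated Brownian motion with drift.

    Increments over intervals of length 2**-level are drawn as
    mu * dt + sigma * sqrt(dt) * Z with Z standard normal.
    """
    rng = random.Random(seed)
    dt = 2.0 ** -level
    n_intervals = int(t * 2 ** level)
    qv = 0.0
    for _ in range(n_intervals):
        increment = mu * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        qv += increment * increment
    return qv


if __name__ == "__main__":
    # With sigma = 2 and t = 1 the sums should settle near sigma**2 * t = 4.
    for level in (4, 8, 12, 14):
        print(level, quadratic_variation(1.0, 2.0, 1.0, level, seed=3))
```

The drift contributes only about $\mu^2 t\, 2^{-n}$ to the sum, so it disappears in the limit while the diffusion part does not.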
Strong Markov Property
Definition Given a SBM , a stopping time is a nonnegative random variable such that for every fixed , the event depends only on the path
Theorem If is a Brownian motion and is a stopping time then the strong Markov property holds:
(a) the process is a Brownian motion, and
(b) the process is independent of the path
Theorem Run Brownian motion , at the first time that , reflect the path in the line , by the reflection principle the new process is another Brownian motion:
 for ,
 for ,
Corollary
Corollary has the same distribution as
Corollary has the same distribution as . Hence . Consequently, for every with probability 1 and . Therefore for every , the Brownian path crosses the $t$-axis infinitely many times by time
Martingales In Continuous Times
Definition A filtration is a nested family of $\sigma$-algebras indexed by time .
Definition The natural filtration for a Brownian motion is the filtration with the collection of all events determined by the Brownian path up to time .
Definition A continuous-time stochastic process $X_t$ is a martingale relative to a filtration if:
(a) each random variable is measurable w.r.t. and
(b) for any ,
Proposition Given a SBM then each of these is a martingale relative to the natural filtration:
(a)
(b)
(c)
Theorem Define to be the probability measure with likelihood ratio . The Cameron-Martin theorem states that the SBM under is a Brownian motion with drift and variance under .
Corollary For any real value and
Corollary For any stopping time and ,
Ito Calculus
Ito Integral
Definition If is a uniformly bounded process with continuous paths adapted to then we can define an Ito Integral , where is truncated at :
Property The Ito Integral satisfies the following properties:
(1) Linearity: .
(2) Continuity: the paths are continuous.
(3) Mean Zero:
(4) Variance, a.k.a. Ito Isometry:
Definition Define the quadratic variation of the Ito Integral:
Proposition
(a) The process is a martingale
(b) The process is a martingale
Example
Example For any stopping time and any :
Theorem Let be a SBM and let be the $\sigma$-algebra of all events determined by the path . If is any random variable with mean 0 and finite variance that is measurable with respect to , for some , then the Ito representation theorem claims that there exists an adapted process such that:
This theorem is of importance in finance because it implies that in the BlackScholes setting, every contingent claim can be hedged.
Ito Formula
Theorem Let be a SBM, and let be a twicecontinuously differentiable function such that are all bounded (or at most have exponential growth). Then for any :
Theorem Let be a SBM, and let be a twicecontinuously differentiable function whose partial derivatives are all bounded. Then for any :
Proposition Assume is nonrandom and continuously differentiable. Then:
Ito Process
Definition An Ito process is a stochastic process that satisfies a stochastic differential equation of the form:
Equivalently, satisfies the stochastic integral equation:
Definition For any adapted process define:
Theorem Let be an Ito process, and let be a twicecontinuously differentiable function whose partial derivatives are all bounded. Then:
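The Ornstein-Uhlenbeck Process
Definition The Ornstein-Uhlenbeck SDE, with mean-reversion rate $\alpha > 0$ and volatility $\sigma$:
$$dX_t = -\alpha X_t\,dt + \sigma\,dW_t$$
(a) This SDE describes a process $X_t$ that has a proportional tendency to return to an “equilibrium” position 0.
(b) In finance, the OU process is often called the Vasicek model.
(c) Solving the SDE:
$$X_t = e^{-\alpha t}X_0 + \sigma\int_0^t e^{-\alpha(t-s)}\,dW_s$$
(d) The Ornstein-Uhlenbeck process is Gaussian.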
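Writing the Ornstein-Uhlenbeck SDE as $dX_t = -\alpha X_t\,dt + \sigma\,dW_t$ (the symbols $\alpha$ and $\sigma$ are assumed here), $X_t$ is Gaussian with mean $X_0e^{-\alpha t}$ and variance $\sigma^2(1 - e^{-2\alpha t})/(2\alpha)$. The sketch below compares these moments with an Euler-Maruyama simulation, which is an approximation chosen for simplicity; the names, step counts and seed are illustrative.

```python
import math
import random


def ou_exact_moments(x0, alpha, sigma, t):
    """Mean and variance of X_t for dX = -alpha * X dt + sigma dW."""
    mean = x0 * math.exp(-alpha * t)
    var = sigma ** 2 * (1.0 - math.exp(-2.0 * alpha * t)) / (2.0 * alpha)
    return mean, var


def ou_euler_terminal(x0, alpha, sigma, t, n_steps, n_paths, seed=0):
    """Terminal values X_t from an Euler-Maruyama discretization."""
    rng = random.Random(seed)
    dt = t / n_steps
    terminal = []
    for _ in range(n_paths):
        x = x0
        for _ in range(n_steps):
            x += -alpha * x * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        terminal.append(x)
    return terminal


if __name__ == "__main__":
    xs = ou_euler_terminal(1.0, 1.0, 0.5, 1.0, 100, 4000, seed=11)
    mean_hat = sum(xs) / len(xs)
    var_hat = sum((x - mean_hat) ** 2 for x in xs) / (len(xs) - 1)
    print(mean_hat, var_hat, ou_exact_moments(1.0, 1.0, 0.5, 1.0))
```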
The Exponential Martingale
Definition The Exponential Martingale SDE:
(a) Solving the SDE:
The Diffusion Process
Definition The Diffusion SDE:
Definition The Harmonic Function is a function that satisfies the ODE:
Example Let be a solution of the diffusion SDE with initial value , and for any real numbers let . Find
We first apply the Ito Formula to and observe that a harmonic function will force the term to vanish. Therefore is a martingale and that
We can solve for :
The Diffusion Process - Bessel Process
Definition The Diffusion SDE:
Example Similar problem as above:
Note that if and then will never reach .
Ito Formula  MultiVariable
Theorem Let be a K−dimensional SBM, and let be a function with bounded first and second partial derivatives. Then the Ito Formula states:
Where:
Corollary If is a stopping time for the SBM then Dynkin’s Formula shows that for any fixed time :
And that is a martingale
Definition A function is said to be a Harmonic Function in a region if
(a) 2D Harmonic Function Example:
(b) 3D Harmonic Function Example:
Corollary Let be harmonic in an open region with compact support, and assume that and its partials extend continuously to the boundary . Define to be the first exit time of Brownian motion from , then:
(a) the process is a martingale, and
(b) for every ,
Example If a 2D SBM starts at a point on the circle of radius 1, find out the probability that it hits concentric circles before .
Let be harmonic. Then is a martingale and that .
Example If a 3D SBM starts at a point on the sphere of radius 1, find out the probability that it hits concentric sphere before .
Let be harmonic. Then is a martingale and that .
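The two examples above can be sketched numerically. This assumes the standard radial harmonic functions $u(x) = \log|x|$ in 2D and $u(x) = 1/|x|$ in 3D, and an inner radius $r_1 < 1$ and outer radius $r_2 > 1$ (the radii and function names are illustrative, since the original symbols are not recoverable here). Optional stopping gives $u(\text{start}) = p\,u(r_1) + (1-p)\,u(r_2)$, which is solved for $p$.

```python
import math

def hit_inner_first_2d(r_inner, r_outer, r_start=1.0):
    """P(2D Brownian motion from radius r_start hits r_inner before r_outer), using u = log r."""
    return (math.log(r_outer) - math.log(r_start)) / (math.log(r_outer) - math.log(r_inner))

def hit_inner_first_3d(r_inner, r_outer, r_start=1.0):
    """Same probability in 3D, using the harmonic function u = 1 / r."""
    return (1.0 / r_start - 1.0 / r_outer) / (1.0 / r_inner - 1.0 / r_outer)
```

For radii 0.5 and 2 the 2D probability is 1/2 while the 3D probability is 1/3, reflecting that Brownian motion in 3D drifts outward more readily.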
Ito Process - Multi-Variable
Definition An Ito process is a continuoustime stochastic process of the form:
Where the quadratic variation
Let be a vector of Ito processes. For any function with bounded first and second partial derivatives, then:
Theorem Let be a K-dimensional SBM, and let be an adapted, K-dimensional process satisfying . Then Knight’s Theorem states that the 1-dimensional Ito process is a SBM:
Proposition Let be a K-dimensional SBM. Define to be the radial part of . Then is a Bessel process with parameter :
Barrier Option
Pricing
Definition A barrier option at time pays:
(a) \$1 if $\max_{0 \leq t \leq T} S_t \geq A S_0$,
(b) \$0 otherwise.
Assume that follows GBM:
The noarbitrage price of the barrier option at is the expected payoff:
At time , there are two possibilities:
(a) if , then
(b) if , then is the same as the time value of a barrier option with timetomaturity and
Hedging
Let be the value of the barrier option at time . The Fundamental Theorem and the Ito Formula show that $v(t, S_t)$ satisfies the Black-Scholes PDE:
A replicating portfolio for the barrier option holds
(a) share of stock
(b) share of cash
provided that . Once the barrier is hit, the portfolio converts all holdings to cash and holds them until maturity.
The Black-Scholes
The Black-Scholes Formula
Theorem Under a risk-neutral , the Fundamental Theorem asserts that the discounted share price is a martingale, where:
Therefore :
Definition A European contingent claim with expiration date and payoff function is a tradeable asset with:
(a) share price at time :
(b) discounted share price at time :
Proposition Let be a standard Brownian motion and is a function such that . Then for every :
Corollary Given , the Black-Scholes Formula shows:
Under risk-neutral , the time option price is a martingale. With the Ito Formula we can set the drift of to be zero and therefore derive the Black-Scholes PDE:
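As a concrete instance of the Black-Scholes formula, here is a minimal sketch for a European call, $C = S\,\Phi(d_1) - K e^{-rT}\Phi(d_2)$. The parameter names are illustrative; it assumes a constant rate, constant volatility and no dividends.

```python
import math
from statistics import NormalDist

def black_scholes_call(spot, strike, rate, vol, maturity):
    """Black-Scholes price of a European call on a non-dividend-paying stock."""
    N = NormalDist().cdf
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * maturity) / (vol * math.sqrt(maturity))
    d2 = d1 - vol * math.sqrt(maturity)
    return spot * N(d1) - strike * math.exp(-rate * maturity) * N(d2)
```

For example, `black_scholes_call(100, 100, 0.05, 0.2, 1.0)` gives roughly 10.45.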
Hedging In Continuous Times
Definition A portfolio is self-financing if for all
Proposition A portfolio is self-financing if and only if its discounted value is a martingale and satisfies:
Definition A replicating portfolio for a payoff function is a self-financing portfolio such that
Theorem A replicating portfolio for contingent claims is given by:
(a) cash, and
(b) shares of stock
where u is the solution of the Black Scholes PDE satisfying
The Girsanov Theorem
Proposition The exponential process is a positive martingale.
Applying Ito Formula and therefore
Theorem Given a SBM under measure and the likelihood ratio , define the measure where . Then Girsanov’s Theorem states that under the measure:
(a) is a SBM
(b) is a BM with timedependent drift
Example 1 Given a Brownian motion with , define the measure to be the conditional probability measure on the event . Therefore is a BM with drift .
Proof. We know that , therefore by change of measure:
Therefore Girsanov’s Theorem implies that under , is a SBM.
Example 2 Given currency and their respective bank account and . Define exchange rate (# B per A) that
Theorem If is a SBM under measure then .
Proof. is a martingale only if
Theorem
Levy Process
Poisson Process
Definition A Levy process is a continuous-time random process such that and:
(a) has stationary increments;
(b) has independent increments;
(c) the sample paths $X_t$ are right-continuous.
Note that Brownian motion and Poisson process are both Levy processes and the basic building blocks of Levy processes. Brownian motion is the only Levy process with continuous paths.
Example Let be a SBM and for , the random variable is a Levy process.
Note that:
(a) has stationary, independent increments
(b) has the same distribution as
Definition A Poisson process with rate is a Levy process such that for all the random variable follows Poisson distribution with mean :
Proposition If are independent Poisson distributions with mean , then .
Proof.
Corollary If are independent Poisson processes with rates then the superposition is a Poisson process with rate
Proposition Every discontinuity of a Poisson process is of size
Proposition Let be a Poisson process of rate , and let be an independent sequence of i.i.d. Bernoulli− random variables. Then the Thinning Theorem states that are independent Poisson processes with rates :
Theorem If and in such a way that , then the Law of Small Numbers states that the distribution converges to the distribution.
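A quick numerical illustration of the Law of Small Numbers, assuming the usual statement that Binomial($n$, $p$) approaches Poisson($\lambda$) when $np \to \lambda$. The helper names are illustrative.

```python
import math

def binomial_pmf(k, n, p):
    """P[Binomial(n, p) = k]."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P[Poisson(lam) = k]."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def max_pmf_gap(n, p, k_max=20):
    """Largest pointwise gap between Binomial(n, p) and Poisson(n * p) on k = 0..k_max."""
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, n * p)) for k in range(k_max + 1))
```

Holding $np = 3$ fixed, the gap shrinks as $n$ grows, e.g. comparing `max_pmf_gap(100, 0.03)` with `max_pmf_gap(10000, 0.0003)`.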
Proposition If is a rate− Poisson process, then for any real number the process is a martingale.
Theorem Define with likelihood ratio such that . Then under the process is a rate Poisson process.
Compound Poisson Process
Definition A compound Poisson process is a Levy process of the form:
Where is rate Poisson process and are i.i.d. random variable independent of . The distribution is the compounding distribution and the measure is the Levy measure.
At each , a random is drawn from . is the sum of all draws made by time
Proposition If , then , and , is an exponential martingale.
Poisson Point Process
Definition Let be a σ-finite Borel measure on . A Poisson point process with intensity measure is a collection of extended nonnegative integer-valued random variables such that
(A) If then a.s.
(B) If then
(C) If are pairwise disjoint, then the r.v.s are independent, and
Proposition The point process associated with a CPP is a Poisson point process with intensity measure , where is the Levy measure for the CPP.
Theorem Let be any Levy process, and let be the random set of points such that the Levy process has a jump discontinuity of size at time , i.e.,
Then is a Poisson point process with intensity measure where is a σ-finite measure called the Levy measure of the process.
📖 Credit Risk Model ^{↺}
Standard Simulation Model on Credit Portfolio
Credit Risk
Lenders, such as banks, are subject to many kinds of risk, among which credit risk is the most likely to cause bank failure.
 Credit risk
 Market risk
 Operational risk
 Reputation risk
Each loan is part of a legal agreement that requires the borrower to pay interest and repay principal on schedule, while some borrowers are required to obey specified covenants, such as maintaining earnings above a certain threshold.
If the borrower fails to follow the agreement, the lender holds the borrower to be in default, which can be money default or covenant default. Purchasers of public bonds only experience money default.
At default, the loan agreement calls for a fee to be paid by the borrower, gives the bank the power to seize collateral (for secured loans), and has a cross default provision (where all loans are in default once one loan is in default).
In the 20th century, most banks did not define default until they discovered a model that could help them manage credit risk.
Rating Agencies
There are 3 major Nationally Recognized Statistical Rating Organizations (NRSRO), which firms pay to rate their bonds in order to increase liquidity.
 Standard & Poor's
 Moody’s
 Fitch
Under S&P ratings, the grades are:
 Investment grade: AAA, AA, A, BBB
 Non-investment grade: BB, B, CCC, CC
 Selectively defaulted: SD
 Defaulted: D
D and PD
Let D be the default indicator of a loan, taking only two values: 0 and 1. PD is the annual probability of default.
By mathematical identity:
 Knowing PD, we can simulate D from a Bernoulli distribution with parameter PD.
 Given data on D, we can calculate the implied PD.
In a portfolio of N firms, the portfolio default rate, DR, equals:
Exposure, Recovery and LGD
Exposure is the amount that the borrower owes to the lender. Recovery is measured in either of two ways:
 Market price of the loan at the time of default
 Discounted future cash flows back to the time of default
LGD (Loss Given Default) is a random variable with values usually between 0 and 1:
For a defaulted loan, there are two ways to measure recovery/LGD. For a current loan, there is a distribution for LGD. The expectation is written as:
US investment grade bond LGD is about 0.20%, while non-investment grade is about 3.60%. Bank loans are almost always senior to bonds and have lower LGD.
Loss and EL
Loss is measured as a fraction of exposure:
EL is the expected loss. Because D and LGD are assumed independent:
Lenders often need to estimate EL and include it in the spread they charge.
Change Of Variable
Note that LGD is often measured as a fraction. To change the measure to a dollar amount, we use the chain rule.
Given the pdf of LGD:
We define the function g such that:
Hence the inverse function $g^{-1}$ is:
The partial derivative can be expressed as:
By definition:
Taking derivative on both sides and with chain rule:
Finally:
Simulate Portfolio Loss On One Single Loan
We know that:
To simulate loss, we first simulate D:
Draw x ~ Uniform[0, 1]; set D = 1 if x < PD, otherwise D = 0
Then simulate LGD based on the pdf of LGD. Multiply each D by the corresponding LGD to get Loss. Repeat the process to produce a distribution of Loss.
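The procedure above can be sketched as follows. The LGD pdf used in these notes is not recoverable here, so this sketch assumes LGD ~ Beta(`lgd_alpha`, `lgd_beta`) purely for illustration; the function name and parameters are hypothetical.

```python
import random

def simulate_loan_losses(pd, lgd_alpha, lgd_beta, n_trials, seed=0):
    """Simulate Loss = D * LGD for one loan, repeated n_trials times.

    Assumption: LGD ~ Beta(lgd_alpha, lgd_beta), an illustrative choice only.
    """
    rng = random.Random(seed)
    losses = []
    for _ in range(n_trials):
        defaulted = rng.random() < pd                    # D = 1 with probability PD
        lgd = rng.betavariate(lgd_alpha, lgd_beta) if defaulted else 0.0
        losses.append(lgd)                               # Loss = D * LGD
    return losses
```

The sample mean of the returned list estimates EL, which under independence of D and LGD should be close to PD times the mean of the assumed LGD distribution.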
Simulate Portfolio Loss On N Independent Loans
Assume the defaults of the N loans are independent and have the same probability of default, PD:
Then the total number of defaults follows a binomial distribution:
However, based on historical data, the variance of the default rate is much higher than that of the binomial distribution. Hence default correlation needs to be introduced.
Simulate Portfolio Loss On N Correlated Loans
Assume that there is a latent unobserved variable z_{i} that is responsible for the default of firm i, i.e. firm i defaults if:
Assume the latent variables of any two firms i and j are jointly normal. Denote the correlation between z_{i} and z_{j}:
Let r_{i, j} be the correlation between asset return of firm i and j, we know that almost certainly:
Denote PDJ
as the probability that both firm i and j default:
To calculate PDJ with Python:
import numpy as np
Returns:
Pr[D1=1, D2=1]: 0.0515
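Since only the first line of the original listing survives, here is a self-contained sketch of the same calculation using only the standard library. It assumes firm i defaults when $z_i < \Phi^{-1}(PD_i)$ and that $(z_i, z_j)$ are standard bivariate normal with correlation $\rho$. The inputs behind the 0.0515 figure are not shown in the notes, so this sketch does not attempt to reproduce it.

```python
import math
from statistics import NormalDist

_STD = NormalDist()

def joint_default_probability(pd_i, pd_j, rho, steps=4000):
    """PDJ = P[z_i < Phi^-1(PD_i), z_j < Phi^-1(PD_j)] for latent correlation rho in (-1, 1)."""
    threshold_j = _STD.inv_cdf(pd_j)
    scale = math.sqrt(1.0 - rho ** 2)
    width = pd_i / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * width                  # midpoint in quantile space of z_i
        z_i = _STD.inv_cdf(u)
        # P[z_j < threshold_j | z_i], since z_j | z_i ~ N(rho * z_i, 1 - rho^2)
        total += _STD.cdf((threshold_j - rho * z_i) / scale)
    return total * width
```

With `rho = 0` the result collapses to `pd_i * pd_j`, and it rises toward `min(pd_i, pd_j)` as `rho` approaches 1.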
Now that we have the D_{i}, we can simulate portfolio loss rate, given the LGD distribution and exposures for each firm.
Denote Dcorr to be the correlation between D_{i} and D_{j}:
Note that holding PD_{i}, PD_{j} fixed:
 greater Dcorr => greater PDJ
 greater ρ => greater PDJ
 ρ between −1 and 1 => PDJ between 0 and min[PD_{i}, PD_{j}]
Copula
When we model more than two firms, pairwise correlation is not enough to determine the entire distribution of outcomes. For example, there are N PDs and N(N−1)/2 pairwise correlations, while we want to calculate the probabilities of 2^{N} outcomes. Hence we introduce the Gauss copula, which helps describe the group-wise correlations.
Consider a set of multivariate normals:
The quantiles of the set are uniformly distributed by definition:
The copula of the set (Z_{1}, Z_{2}, …, Z_{N}) is defined as the joint cumulative distribution function of (Φ(Z_{1}), Φ(Z_{2}), …, Φ(Z_{N})):
The Gauss copula is as follows. Note that, among all possible copulas, the Gauss copula is the one motivated by the Central Limit Theorem:
In fact, the copula does not contain any information on the marginal distribution. Here we set the marginal distribution F_{Z} to follow standard normal only as an example, but it can be anything continuous such that:
And so:
In the context of default modeling, we assume that each company’s default follows Bernoulli and simulate with standard normal distribution:
The probability that all firms default at the same time is by definition:
Note that given a pairwise correlation matrix Σ, this probability can take any value between 0 and the lowest single-firm default probability.
Now we assume all firms' z are connected by the Gauss copula, which implies a single value for the probability of all defaulting.
With Python we can either numerically evaluate the integral or use simulation to calculate the probability that all firms default at the same time.
import numpy as np
Returns:
Probability Of All Default: 0.017
Note that, compared to other copulas, the Gauss copula requires only a pairwise correlation matrix and the PDs to pin down the full joint distribution. In practice the Gauss copula itself has rarely been shown to be invalid, while the calibration of the marginals and the correlation matrix often proves erroneous.
Simulate Rating Transitions
The default model only has two states, 0 and 1:
To simulate rating transitions, we require two matrices:
 Transition Matrix: $P[i \rightarrow j], \forall i, j$
 Cost Matrix, e.g. the loss due to deterioration of borrowers: $cost[i \rightarrow j], \forall i, j$
Factor Model
Single Factor Model
We construct the single risk factor model with latent variable Z_{i}:
The pairwise correlation between the latent variables of two firms i and j is:
Where:
 Z and X_{i} are independent
 Z is the systematic factor that affects all firms. If Z increases, all Z_{i} decrease and all firms become more likely to default. Z summarizes the effects of all observable macroeconomic factors plus the effects of unobservable factors.
 X_{i} is the idiosyncratic factor that affects only firm i’s latent variable
 Z_{i} ~ N(0, 1) by construction
 {Z_{i}} are jointly normal and connected by a Gauss copula
cDR and Vasicek
Define the Conditional (Expected) Default Rate (cDR) as:
This gives the final form of cDR, which is called the Vasicek formula, named after Oldrich Vasicek. Note that the Vasicek formula is monotonic in z and in PD, i.e., the higher the z or PD, the higher the cDR.
The expected default rate for firm i is always PD_{i}, since:
However, when Z is known, the expected default rate is cDR_{i}. Firms are now uncorrelated as Z is known:
If there is a large number of identical firms with uniform PD and ρ, the default rate of such an asymptotic portfolio follows the unconditional Vasicek distribution.
The unconditional Vasicek pdf can be derived with the change-of-variable technique. Note that we eliminate z and the pdf only has parameters PD and ρ:
The mean of cDR is PD:
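A small sketch of the Vasicek formula and a numerical check that its mean is PD. It assumes the sign convention described above (a larger $z$ means worse conditions), so that $cDR(z) = \Phi\big((\Phi^{-1}(PD) + \sqrt{\rho}\,z)/\sqrt{1-\rho}\big)$. The function names are illustrative.

```python
import math
from statistics import NormalDist

_STD = NormalDist()

def conditional_default_rate(pd, rho, z):
    """Vasicek formula; sign convention: larger z means worse conditions."""
    return _STD.cdf((_STD.inv_cdf(pd) + math.sqrt(rho) * z) / math.sqrt(1.0 - rho))

def mean_cdr(pd, rho, steps=20000):
    """Average cDR over Z ~ N(0, 1) by midpoint integration in quantile space."""
    total = 0.0
    for k in range(steps):
        z = _STD.inv_cdf((k + 0.5) / steps)
        total += conditional_default_rate(pd, rho, z)
    return total / steps
```

For example, `mean_cdr(0.02, 0.15)` should return a value very close to 0.02, while `conditional_default_rate(0.02, 0.15, 3.0)` shows how far the default rate can rise in a stressed state.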
Multifactor Model
Suppose that there are two jointly normal systematic risk factors ψ and ω, and that there are two groups of firms, each depending on one of the factors:
Between the two groups:
Note that:
 If corr[ψ, ω] = 1, this becomes the single factor model and that:
 If corr[ψ, ω] < 1, the cross-correlations are less than those in the single factor case. This is called diversification.
 With a multi-factor model, risk becomes subadditive, as opposed to additive in the single factor model. This means that the risk in the portfolio is less than the sum of the cDRs.
 The Moody's Factor Model attributes each Z_{i} to about 250 factors, along with a firm-specific idiosyncratic factor.
Basel II Capital formula
The Bank For International Settlements is in Basel, Switzerland. The Basel Committee on Banking Supervision drafted regulatory standards requiring banks to hold minimum capital, e.g. Basel II, Basel III, etc.
The Basel II formula is an Asymptotic Single Risk Factor model, where the portfolio is large enough for the Law of Large Numbers to work. It generalizes the Vasicek distribution to allow a diverse choice of PD and ρ within the portfolio. The core of the capital requirement for credit capital is the inverse CDF of the Vasicek distribution.
Inverse Vasicek (with parameter PD and ρ):
Note:
 K is the capital requirement per dollar of wholesale loan.
 LGD is the average LGD in historical downturn conditions
 R (correlation) = 0.12 + 0.12 × exp(−50 × PD)
 b = [0.11852 − 0.05478 × log(PD)]^{2}
 M is maturity
Making sense of the Basel II formula:
 Capital requirement is for loss, as opposed to only default, hence the formula multiplies by LGD.
 Capital requirement is for unexpected loss, hence the formula subtracts the expected loss LGD × PD. The expected portion is handled by bank reserves.
 Loans might deteriorate without defaulting, hence a maturity adjustment is added to impose higher capital for longer maturity loans.
 The estimation of PD and LGD is performed by the banks and supervised by bank supervisors.
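Putting the pieces together, a sketch of the capital calculation, assuming the standard wholesale form $K = LGD \times \big[\Phi\big((\Phi^{-1}(PD) + \sqrt{R}\,\Phi^{-1}(0.999))/\sqrt{1-R}\big) - PD\big] \times \frac{1 + (M - 2.5)\,b}{1 - 1.5\,b}$. The 99.9% confidence level and the function name are assumptions not stated in the notes above; R and b follow the expressions in the list.

```python
import math
from statistics import NormalDist

_STD = NormalDist()

def basel2_capital(pd, lgd, maturity, confidence=0.999):
    """Capital per dollar of wholesale exposure under the Basel II ASRF formula (sketch)."""
    r = 0.12 + 0.12 * math.exp(-50.0 * pd)                  # asset correlation R
    b = (0.11852 - 0.05478 * math.log(pd)) ** 2             # maturity slope b
    # Inverse Vasicek: default rate at the chosen confidence level
    stressed_dr = _STD.cdf(
        (_STD.inv_cdf(pd) + math.sqrt(r) * _STD.inv_cdf(confidence)) / math.sqrt(1.0 - r)
    )
    maturity_adj = (1.0 + (maturity - 2.5) * b) / (1.0 - 1.5 * b)
    # Unexpected loss only: subtract expected loss LGD * PD
    return lgd * (stressed_dr - pd) * maturity_adj
```

For PD = 1%, LGD = 45% and M = 2.5 years this gives K of roughly 7.4 cents per dollar of exposure.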
Estimation, Statistical Test and Overfit
Estimating PD
Firms differ widely in their credit quality, and PD tends to change over time as well. So a firm’s PD is neither known nor fixed. We analyze analogous firms with identical credit ratings to estimate PD.
Method 1, for all A-rated firms in the dataset:
Method 2, for all A-rated firms in the dataset:
Method 3, estimate PD as a parameter in a pdf describing A-rated firms. This tries to find a distribution that best fits the data. We will focus on this method.
Method Of Moments
Given a dataset {X_{i}}_{N}, we set the moments of the Vasicek distribution equal to the moments of the data.
First moment:
Second moment (unbiased, using N−1 in the denominator):
Note:
 The method of moments matches the broad features of the distribution with the data
 The solution is not unique. Choices can be made between central and raw moments, and between lower and higher moments.
 By Jensen’s Inequality, functions of moments are not moments of functions
Maximum Likelihood Estimation
The MLE method chooses parameter values that make the data most likely under the assumed distribution. MLE matches the distribution to the data as a whole, as opposed to M.o.M., which only matches the moments. The MLE fits the PDF to the dataset.
When data is not highly dispersed, however, the MLE estimate tends to be close to the M.o.M. estimate.
The MLE is in general a biased estimator; it chooses the parameters that maximize the likelihood function. Given a dataset {X_{i}}_{N}, we assume the true default rates follow the Vasicek distribution. The likelihood function is:
Often we try to maximize the log-likelihood function, i.e. find PD and ρ such that:
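A sketch of the log-likelihood under the Vasicek distribution, assuming the standard density $f(x) = \sqrt{\tfrac{1-\rho}{\rho}}\,\exp\!\Big(\tfrac12\,[\Phi^{-1}(x)]^2 - \tfrac{1}{2\rho}\big[\sqrt{1-\rho}\,\Phi^{-1}(x) - \Phi^{-1}(PD)\big]^2\Big)$. The function names are illustrative, and the maximization step (for example a grid search over PD and ρ) is left out.

```python
import math
from statistics import NormalDist

_STD = NormalDist()

def vasicek_pdf(x, pd, rho):
    """Density of the asymptotic default rate x in (0, 1) with parameters PD and rho."""
    q = _STD.inv_cdf(x)
    c = _STD.inv_cdf(pd)
    exponent = 0.5 * q ** 2 - (math.sqrt(1.0 - rho) * q - c) ** 2 / (2.0 * rho)
    return math.sqrt((1.0 - rho) / rho) * math.exp(exponent)

def log_likelihood(default_rates, pd, rho):
    """Sum of log densities over observed annual default rates."""
    return sum(math.log(vasicek_pdf(x, pd, rho)) for x in default_rates)
```

Evaluating `log_likelihood` on a grid of (PD, ρ) pairs and keeping the largest value is the simplest way to obtain the MLE from this function.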
Hypothesis Testing & Wilks’ Theorem
We do not assert truth, as truth is often unknown. With a given set of data, we can only assert that some models are better at predicting the future behavior of similar data.
We call the simpler model the null hypothesis, and the more complicated one the alternative hypothesis. The null is generally nested in the alternative, i.e. the alternative becomes the null when some parameters are set to certain values.
We prefer the null because it is simpler, and by doing so we avoid Type 1 error, which is the rejection of a true null.
Hence we only reject the null if the alternative fits the data significantly better according to a statistical test.
Wilks’ Theorem asserts that if:
 There is an asymptotic amount of data
 The null hypothesis is true
Then D has a distribution that approaches the χ^{2} distribution (with df = number of extra parameters in the alternative), given dataset {X_{i}}_{N}:
The likelihood ratio is defined as follows. It is less than or equal to 1 because the alternative is more flexible and therefore assigns at least as much probability density to the given data:
We reject the null hypothesis if the D statistic is a tail observation: either the null is not true, or the null is true and something unlikely (a Type 1 error) has happened. We reject the null when:
For example, when df = 1 the critical value is 3.84, and we reject the null with 95% confidence when:
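The test can be sketched in a few lines, assuming the usual definition $D = -2\ln\Lambda = 2\,(\ell_{alt} - \ell_{null})$, where $\ell$ denotes a maximized log-likelihood. The function name is illustrative, and the default critical value is the df = 1, 95% figure quoted above.

```python
def likelihood_ratio_test(loglik_null, loglik_alt, critical_value=3.84):
    """Return (D, reject) where D = 2 * (loglik_alt - loglik_null)."""
    d_stat = 2.0 * (loglik_alt - loglik_null)
    return d_stat, d_stat > critical_value
```

For instance, log-likelihoods of −105 under the null and −102 under the alternative give D = 6 and a rejection, while −105 versus −104 gives D = 2 and no rejection.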
Overfit
An overfit model makes worse forecasts than a simpler model.
We assume the population data (X, Y) follows bivariate normal distribution:
Given ρ, the population regression line is:
The sample regression line is:
From a sample of 30 observations of (X, Y), ordinary least squares (OLS) is performed to find the in-sample p-value for the coefficient and R^{2}. MSE is used to evaluate forecast error.
 When ρ = 0.8, the sample regression line (yellow) is close to the population regression line (red):
 When ρ = 0.2, the sample regression line does NOT match well.
This shows that when the population has a weak relationship (ρ = 0.2), estimates of slope are more dispersed.
Now we look at the relationship between statistical significance and MSE. The population Mean-Squared Error (MSE) is an out-of-sample measure of forecast errors. The population MSE does NOT depend on any in-sample data:
We can see, by taking partial derivatives, that the population regression (b = ρ, a = 0) minimizes MSE. We can also see that the higher the ρ, the lower the MSE.
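A sketch of the population MSE, assuming X and Y are standard normal (mean 0, variance 1) with correlation ρ, which is consistent with the population regression line having b = ρ and a = 0. Under that assumption $E[(Y - a - bX)^2] = 1 + b^2 - 2b\rho + a^2$.

```python
def population_mse(a, b, rho):
    """E[(Y - a - b X)^2] when X and Y are standard normal with correlation rho."""
    return 1.0 + b ** 2 - 2.0 * b * rho + a ** 2
```

At the population line the MSE equals $1 - \rho^2$, so `population_mse(0, 0.8, 0.8)` is about 0.36, and any fitted slope that overstates a weak relationship (say b = 0.5 when ρ = 0.2) forecasts worse than the true line.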
A regression is significant (at 95% confidence) if the p-value for the coefficient b is less than 0.05.
We have observed that when the population has a weak relationship (ρ = 0.2):
 Forecasts by significant regressions tend to have greater MSE.
 Forecasts by regressions with higher R-squared tend to have greater MSE.
This is because the strong relationship suggested by the regression does NOT forecast the weak population relationship well.
When the population has a strong relationship (ρ = 0.8), however, the significant regression/high R-squared holds out-of-sample.
Conditional LGD Risk
cLGD
The history of bond LGD shows that LGD is elevated when the default rate is elevated. The elevation is shown to be moderate and similar across different debt types:
It is important to model LGD appropriately in different economic conditions. Like cDR, we define cLGD:
Note that:
There are two ways to calculate ELGD:
Furthermore,
Where:
 EcLGD is the average LGD over conditions
 ELGD is the average LGD over different loans
ELGD is higher than EcLGD because when cLGD is higher, cDR/PD is also higher, which increases the probability weight on the higher cLGDs, while in EcLGD a higher cLGD does not receive a higher weight.
Frye-Jacobs
Modeling cLGD separately from cDR introduces complexity and potential overfit to the cLoss model. Instead, the Frye-Jacobs LGD function assumes that both cDR and cLoss follow the Vasicek distribution, and infers cLGD as a function of cDR.
Frye-Jacobs assumptions:
 cDR and cLoss are comonotonic. If cDR goes up, cLoss must go up. If cDR is in its q^{th} quantile, then cLoss must also be in its q^{th} quantile. This implies that there is a cLGD function of cDR:
 cDR follows the Vasicek distribution, which stems from the simplest portfolio structure: a large number of firms, each firm with the same PD, each pairwise ρ the same (same PDJ), and a Gauss copula.
 The distribution of cLoss does NOT depend on the definition of default. This implies the distribution of cLoss does not have separate parameters for PD and ELGD. It does have a parameter EL.
 cLoss follows the Vasicek distribution.
 cLoss and cDR have the same ρ parameter. This ensures that the LGD function is monotonic.
Finally,
Observations:
 cLGD is strictly monotonic with range (0, 1), for all k
 cLGD increases slowly, and similarly for all k, at low cDR
 Elasticity is greatest for loans with low LGD.
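The observations above can be checked numerically with a sketch of the LGD function, assuming the published Frye-Jacobs form $cLGD = \Phi\big(\Phi^{-1}(cDR) - k\big)/cDR$ with $k = \big(\Phi^{-1}(PD) - \Phi^{-1}(EL)\big)/\sqrt{1-\rho}$. The function name and the parameter values in the usage note are illustrative.

```python
import math
from statistics import NormalDist

_STD = NormalDist()

def frye_jacobs_clgd(cdr, pd, el, rho):
    """Conditional LGD implied by comonotonic Vasicek cDR and cLoss with a shared rho."""
    k = (_STD.inv_cdf(pd) - _STD.inv_cdf(el)) / math.sqrt(1.0 - rho)
    return _STD.cdf(_STD.inv_cdf(cdr) - k) / cdr
```

With PD = 5%, EL = 2% (so ELGD = 40%) and ρ = 0.1, evaluating the function at cDR = 1%, 5% and 20% shows cLGD rising gradually while staying inside (0, 1).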
Frye-Jacobs: Develop Alternative Hypothesis
Introduce an additional sensitivity parameter to test the slope of the LGD function.
We know that:
In integration form:
Bring in the FryeJacobs cLGD function:
Note that EL is in both lhs and rhs, divide both EL by ELGD^{a}:
Note that we have identified a new LGD function:
Analyzing the choice of a:
 When a = 0, the cLGD function is the Frye-Jacobs formula.
 When a = 1, cLGD = ELGD, which implies cLGD does not depend on conditions:
Frye-Jacobs: Hypothesis Test
We introduce a finite portfolio, which brings randomness into the D’s and LGD^{dollar}s.
 We assume the finite portfolio is uniform and all N loans have the same PD and ρ
 We assume that given portfolio cDR, the number of defaults is binomial:
 We assume that LGD is normally distributed around cLGD, with σ = 0.2. Note that under this assumption, ELGD = cLGD, which corresponds to a = 1.
Under finite portfolio, the probability of 0 defaults is:
When conditional on cDR and Σ D > 0, the average portfolio LGD rate
is normal:
Let Y ~ N(0, 1) be a standard normal variable, then LGD becomes:
Now calculate Loss based on DR and LGD:
Use the change-of-variable technique to calculate the pdf for Loss:
Where:
Finally, the pdf of loss conditional on Σ D and cDR:
Removing the conditional, the distribution of loss in a uniform portfolio, with N loans, same PD and ρ and the cLGD function, becomes:
Here is a plot of the unconditional loss density in a finite (N = 10) portfolio in red and the loss density in an infinite portfolio (Vasicek) in blue (note that the plot uses D to denote Σ D):
Now that we have the pdf for loss, we can test the hypothesis:
 H_{0}: a = 0
 H_{1}: a = MLE based on Moody’s loss data
As a result, MLE(a) = 0.01 based on all loan data, and the test fails to reject the null. The same holds for other bonds and bond/loan data combinations. We conclude that the Frye-Jacobs model is consistent with Moody’s data.
Vendor Estimation
Distance-To-Default and EDF
Robert Merton argues that:
 the default of firm i depends on its asset return
 Merton asserts that a firm defaults if and only if the value of its asset drops below the value of its liability, i.e. its asset return is too low
 joint default of firm i and j depends on PD and asset return correlation
Moody’s suggests that a loan contains the option to default, and attempts to use risk-neutral probability to estimate the probability of default. In the context of a put:
Under Moody’s assumption, the firm has an option to default on its assets once their value drops below its liability. Here, liability is the strike price, for which Moody’s uses D, or “default point”, defined as short-term debt plus half of long-term debt. DD stands for Distance-To-Default, suggested by Merton. So the probability of default is:
Moody’s then estimates the value and volatility of the assets (unobservable) based on the value and volatility of the market capitalization (observable).
However, since Φ(DD) gave very poor estimates of the default probability, Moody’s sets the EDF (Expected Default Frequency) of a firm equal to the average historical default rate of firms with the same Distance-To-Default. An EDF uses DD to find historical analogs of current firms.
Correlation
Merton assumes that the correlation ρ between the latent variable Z’s is equal to the asset return correlation r.
However, data suggests that correlation estimated from credit data is less than the correlation based on asset returns. Hence a credit portfolio model that uses asset correlation to estimate ρ overstates credit risk.
📖 Foreign Exchange ^{↺}
Theoretical Pricing
FX Spot Contract
The spot price is the observable market price of one unit of foreign currency. Let denote foreign currency and denote domestic currency:
An FX spot contract is an agreement where the buyer purchases units of foreign currency at a fixed rate at current time .
The contract value to the buyer is:
FX Forward Contract
Denote domestic interest rate = . The price of domestic zero-coupon bond
An FX forward contract is an agreement where the buyer agrees to purchase units of foreign currency at a fixed rate at future time :
The time value of a forward contract is:
We set to calculate the forward price at time . The equation is also called the covered interest parity, or CIP:
Non-Deliverable Forward
A non-deliverable currency has its exchange restricted by local regulations. CIP does not hold since covered interest arbitrage is not possible. For example:
Asia:
 CNY: China Yuan
 TWD: New Taiwan Dollar
 KRW: South Korean Won
 INR: India Rupee
 PHP: Philippine Piso
 IDR: Indonesia Rupiah
 MYR: Malaysian Ringgit
Latin America:
 COP: Colombian Peso
 VEB: Venezuelan Bolívar
 BRL: Brazilian Real
 PEN: Peru Sol
 UYU: Uruguayan Peso
 CLP: Chilean Peso
 ARS: Argentine Peso
Europe, Middle East and Africa:
 EGP: Egyptian Pound
 KZT: Kazakhstani Tenge
Given CIP, we can calculate the implied yield, which is the foreign interest rate implied by the forward rate, spot rate and domestic interest rate.
We know that the exponential function can be expressed as the sum of its Maclaurin series:
Applying this to the forward rate:
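A sketch of CIP, the implied yield and the first-order expansion, assuming continuously compounded rates and a spot rate quoted as domestic units per one unit of foreign currency, so that $F = S\,e^{(r_d - r_f)T}$. The function and parameter names are illustrative.

```python
import math

def forward_rate(spot, r_domestic, r_foreign, maturity):
    """Covered interest parity with continuous compounding."""
    return spot * math.exp((r_domestic - r_foreign) * maturity)

def implied_foreign_yield(spot, forward, r_domestic, maturity):
    """Foreign rate implied by the forward, spot and domestic rate (CIP inverted)."""
    return r_domestic - math.log(forward / spot) / maturity

def forward_rate_linear(spot, r_domestic, r_foreign, maturity):
    """First-order Maclaurin approximation: F ~ S * (1 + (r_d - r_f) * T)."""
    return spot * (1.0 + (r_domestic - r_foreign) * maturity)
```

With a spot of 1.10, a 3% domestic rate, a 1% foreign rate and one year to maturity, the exact and linear forwards differ only in the fourth decimal place, which is why forward points are often approximated as spot times the rate differential times the tenor.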
FX Swap Contract
An FX swap contract contains two FX forward contracts at time with opposite directions.
For example, a buy/sell swap contract:
The present value of the swap contract is the sum of the present value of the two subcontracts:
Note that the value of a swap contract is fairly insensitive to spot rate changes, compared to that of a forward contract.
FX Option
An FX option conveys the right, but not the obligation, to exchange units of foreign currency for units of domestic currency, at a future date .
For example, the buyer of a foreign currency call strike at , has the right at maturity to buy unit of at even if .
This is equivalent to the buyer of units of domestic currency put strike at , which grants the buyer the right at maturity to sell unit of at a rate of , even if the exchange rate falls below .
In formula:
Visualizing the transactions on a foreign currency call:
Visualizing the transactions on a domestic currency put:
FX options also satisfy put-call parity:
Garman-Kohlhagen
To evaluate the price of the option:
 Assumptions on the stochastic nature of S_{t}
 Create a “riskfree” hedge portfolio, in order to find a governing PDE for the option value, which also leads to an equivalent riskneutral probability measure
 Solve the PDE directly, with appropriate boundary conditions
We know that if a tradable asset follows the geometric Brownian motion:
Applying Ito's formula to the value of any derivative contract :
Setting the drift term to zero, as the derivative contract is tradeable, we can derive the Black-Scholes PDE as follows:
However, since the foreign exchange spot rate is not a tradable asset, we need to tweak the BS formula. Let and denote a bank account in domestic and foreign currencies, where and . Constructing a replicating portfolio and setting the drift term to be , the Garman-Kohlhagen PDE can be derived:
Solving the PDE:
Using the Feynman-Kac equation with additional derivation, we can conclude that s.t. the arbitrage-free price of the contingent claim is unequivocally determined as the expected value of the discounted final payoff under , and obeys the stochastic differential equation:
Practical Pricing
FX Spot Contract
The trade date is when the terms of the transaction are agreed, and the value date is when the transaction occurs, which is the trade date plus two business days (T+2) for most currency pairs.
The spot rate quote means:
 , i.e. the higher the , the stronger the .
 is the base currency and is set to 1 unit, whereas is the numeraire currency, in which the price is expressed.
The bid-offer spread means:
 The dealer is willing to buy for
 The dealer is willing to sell for
Equivalently:
 The highest price YOU can sell is
 The lowest price YOU can buy is
FX Forward Contract
The forward point is commonly expressed in the unit pip, or percentage in point, that is worth .
Example 1
When selling a forward for foreign currency , the bid side spot rate plus bid side forward points shall be equal to the bid side outright forward rate.
A market-maker would construct the short forward as follows. Note that borrowing and lending correspond to selling a forward and therefore the bid-side forward point.
Time  Transactions 

borrow execute a short spot contract lend 

receive execute a long spot contract pay 
This is the same as selling an outright forward contract:
Time  Transactions 

N/A  
receive pay 
FX Swap Contract
An FX swap contract intends to adjust the timing of cash flows from to and alter the value date on an existing trade. The near rate should be consistent with the market forward rate for the near date, and the same goes for the far rate. The swap point is equal to:
A buy/sell swap on means that it buys a forward on at and sells a forward on at . This corresponds to borrowing and lending .
Example 2
A short outright forward position on can be thought of as a buy/sell swap on with a spot transaction at the near date and , similar to Example 1. Here :
Time  Transactions 

borrow execute a short forward contract: pay receive lend 

receive execute a long forward contract: pay receive pay 
This is the same as a buy/sell swap:
Time  Transactions 

receive pay 

receive pay 
Example 3
From a market-maker perspective:
Contract  Swap Point  T1  T2 

Buy/Sell  offerside swap point  pay at bidside points  sell at offerside points 
Sell/Buy  bidside swap point  sell at bidside points  pay at bidside points 
Note(): because a swap has less interest rate risk than an outright forward, the marketmaker can easily construct a swap with bidside points for both near and far dates.
Example 4
Say the swap point is , then a party that buys/sells the foreign currency is paying
the swap point, because it is selling at a lower far rate.
Conversely, a party that sells/buys is earning
the swap point.
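A minimal sketch of the sign logic in Example 4, assuming the usual convention that the swap point is the far rate minus the near rate, quoted in pips of 0.0001. The function names are hypothetical.

```cpp
// Assumption: one pip is 0.0001 units of the numeraire currency.
constexpr double kPipSize = 0.0001;

// Swap point in pips, under the assumed convention far rate minus near rate.
double SwapPointsInPips(double near_rate, double far_rate) {
    return (far_rate - near_rate) / kPipSize;
}

// A buy/sell party buys at the near rate and sells at the far rate,
// so it pays the swap point whenever the far rate is lower.
bool BuySellPaysSwapPoint(double near_rate, double far_rate) {
    return far_rate < near_rate;
}
```

With a near rate of 1.1000 and a far rate of 1.0990 the swap point is about -10 pips, so the buy/sell party pays it and the sell/buy party earns it.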
Risk Characteristics
Contract  Transactions  FX Risk  IR Spread Risk 

Spot  1  Yes  No 
Forward (Outright)  1  Yes  Yes 
Swap  1  No  Yes 
FX Option
There are four ways to express an option price:
Price  in units  in units 

Notional as  

Notional as  

Straddle
The meaning of can be different:
 : at the spot rate
 : at the forward rate (preferred by traders)
 : delta-neutral
Risk Reversal
Where a delta option is an option with a delta of . Risk reversal can also denote the difference in implied volatility
:
Butterfly
Note that butterfly is vega () neutral, i.e. the strangle notional is usually larger than the straddle notional to create equal and offsetting vega. BF
can also denote the difference in implied volatility
:
Under the Black-Scholes framework, delta-neutral strike () options have the highest vega :
In addition, option gamma
📖 C++ ^{↺}
C++ is a compiled (vs. interpreted: Python), general-purpose (vs. domain-specific: HTML) programming language created by Danish programmer Bjarne Stroustrup as an extension to C.
Basic
Compiler
A compiler translates a high-level language into a low-level language and creates an executable program.
 Preprocessor: reads preprocessing lines such as
#include "foo.hpp"
 Compiler: turns the preprocessed code into assembly code (ASM).
 front end creates IR (intermediate representation) in SSA (static single assignment) form. The runtime is .
 middle end optimizes the IR, e.g. by removing unnecessary operations.
 back end produces ASM
 Assembler: turns ASM into binary object code
 Linker: links all relevant object files and libraries together
 Debugger: type checking
 Object Copy: generates .exe (for Windows), and .bin (for Mac)
G++
Compile with g++ at the command line:
$ g++ toto.cpp
Running the compiled result:
$ ./a.exe
Header
The C++ standard library
is a collection of classes and functions, represented by different headers. For example, include the <iostream>
header to handle input and output; other nonstandard headers are included using double quotes.
Macro
#define N 4
Guards
In C++, a function, class or variable can only be defined once. We use guards
to make sure we do not duplicate declarations when a header is included in multiple files.
#ifndef FOO_H
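A minimal sketch of an include guard. The header name `foo.h`, the macro `FOO_H`, and the `Point`/`Sum` declarations are hypothetical. The guarded block is pasted twice to mimic what the preprocessor sees when the same header is included twice.

```cpp
// First inclusion of the hypothetical foo.h: FOO_H is not yet defined,
// so the block is kept and FOO_H becomes defined.
#ifndef FOO_H
#define FOO_H
struct Point { double x; double y; };
inline double Sum(const Point& p) { return p.x + p.y; }
#endif  // FOO_H

// Second inclusion: FOO_H is already defined, so the preprocessor skips
// everything up to #endif. Without the guard, Point would be redefined.
#ifndef FOO_H
#define FOO_H
struct Point { double x; double y; };
inline double Sum(const Point& p) { return p.x + p.y; }
#endif  // FOO_H
```

Many compilers also accept `#pragma once` as a non-standard alternative to this pattern.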
Namespace
Some classes and functions are grouped under the same name, which divides the global scope into sub-scopes, each with its own name (a namespace).
Functions and classes in the C++ standard library are defined in the std
namespace, for example cin
(standard input), cout
(standard output) and endl
(end line).
char c;
Alternatively, we can use using namespace std;
.
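To illustrate how namespaces separate identical names, here is a small sketch. The namespaces `pricing` and `risk` and their `Name()` functions are made up for the example.

```cpp
#include <string>

namespace pricing {  // hypothetical namespace
std::string Name() { return "pricing::Name"; }
}  // namespace pricing

namespace risk {  // hypothetical namespace with a function of the same name
std::string Name() { return "risk::Name"; }
}  // namespace risk

std::string CallBoth() {
    using namespace risk;  // unqualified Name() now resolves to risk::Name
    return pricing::Name() + " / " + Name();
}
```

A `using namespace` directive is convenient in small programs, but qualified names such as `std::cout` avoid accidental clashes in larger ones.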
Data Type
Every variable has to have a type
in C++, and the type has to be declared and cannot be changed. There are fundamental types and user-defined types (classes).
Characters In a computer, each bit
stores a binary (0/1) value. A byte
is 8 bits. The computer stores characters in a byte using the ASCII format.
Numbers The computer stores numbers in binary format with bits. The leftmost
bit is used to store the sign of a number (see the two's complement method). Real values are stored using a mantissa
and an exponent:
Note that very few values can be exactly represented, and how close we can get depends on the number of bits available.
Type  Size (Bytes)  Value Range 

bool  1  true or false 
char  1  -128 to 127 
short  2  -32,768 to 32,767 
int  4  -2,147,483,648 to 2,147,483,647 
float  4  3.4E +/- 38 
double  8  1.7E +/- 308 
C++ is a strongly typed
language, which means type errors need to be resolved for all variables at compile
time.
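The sizes in the table are typical rather than guaranteed (only `sizeof(char) == 1` is fixed by the standard). The remark that few real values are represented exactly can be checked with a small sketch. The helper names below are hypothetical.

```cpp
#include <limits>

// Guaranteed by the standard; the other sizes in the table are typical only.
static_assert(sizeof(char) == 1, "char is one byte by definition");

// Largest value representable by int on this platform.
int LargestInt() { return std::numeric_limits<int>::max(); }

// True only when x + y equals `expected` exactly in binary floating point.
bool SumIsExact(double x, double y, double expected) {
    return x + y == expected;
}
```

On IEEE 754 platforms, `SumIsExact(0.1, 0.2, 0.3)` is false because none of the three values has an exact binary representation, while `SumIsExact(0.25, 0.5, 0.75)` is true.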
Function
Every console application has to have a main()
function, which takes no argument and returns an integer value by default.
A function that adds two numbers:

Overloading
allows 2 or more functions to have the same name, but they must have different input argument types
.
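A function that adds two numbers, together with an overload for a different argument type, might look like the sketch below. The name `Add` is chosen for illustration.

```cpp
// Adds two integers.
int Add(int a, int b) { return a + b; }

// Overload: same name, different argument types.
// The compiler picks the version that matches the call.
double Add(double a, double b) { return a + b; }
```

`Add(2, 3)` calls the `int` version, and `Add(0.5, 0.25)` calls the `double` version.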
Function Object
Function objects, or functors
, are objects that behave like functions; in effect, they are functions with state.
A regular function looks like this:
int AddOne(int val)
A function object implementation:
class AddOne
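Since the bodies are not shown above, here is one possible sketch of the two forms. The functor is named `AddN` (a hypothetical name) to make its state explicit.

```cpp
// A regular function: the amount added is fixed in the code.
int AddOne(int val) { return val + 1; }

// A function object: operator() makes instances callable,
// and the member n_ is the state carried between calls.
class AddN {
public:
    explicit AddN(int n) : n_(n) {}
    int operator()(int val) const { return val + n_; }

private:
    int n_;
};
```

An instance is used like a function: `AddN add_five(5); add_five(10);` yields 15.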
Lambda
Lambdas, introduced in C++11, are inline anonymous functions that can be used as a parameter or a local object.
[] (string s) // [] is the lambda introducer/capture clause
Example 1
vector<int> v{1, 3, 2, 4, 6};
Example 2
vector<int> v{1, 3, 2, 4, 6};
Example 3
vector<Person> ppl;
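The original examples are truncated, so the following is only a sketch of typical lambda usage with STL algorithms. The function names and the choice of `std::sort` and `std::count_if` are assumptions.

```cpp
#include <algorithm>
#include <vector>

// Lambda passed as a comparison parameter: sorts in descending order.
std::vector<int> SortedDescending(std::vector<int> v) {
    std::sort(v.begin(), v.end(), [](int a, int b) { return a > b; });
    return v;
}

// [threshold] captures the local variable by value for use inside the lambda.
int CountAbove(const std::vector<int>& v, int threshold) {
    return static_cast<int>(std::count_if(
        v.begin(), v.end(), [threshold](int x) { return x > threshold; }));
}
```

With `v{1, 3, 2, 4, 6}`, `SortedDescending(v)` starts with 6 and ends with 1, and `CountAbove(v, 2)` counts 3, 4 and 6.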
Extern
The keyword extern declares a function that is defined in another file.
extern int foo(int a);
Inline Function
C++ provides inline
functions so that the call overhead of a small function can be reduced. When an inline function is called, the compiler may insert the entire code of the function at the point of the call.
Typedef
Use typedef
keyword to define a type alias.
typedef double OptionPrice;
Operators
Standard operations:
Arithmetic: +, -, *, /
Note the difference between i++
and ++i
i++; // return (old) i and increment i
Const
Use the const
keyword to define a constant value. The compiler
will stop any attempt to alter the constant values.
Since C++ is a strongly typed language, it is preferred to use const int N = 4
, instead of #define N 4
, as the former defines a type.
Reference
Example 1 A reference is an alias for a variable and cannot be rebound to a different variable. We can change val
by changing ref
:
int val = 10;
Example 2 We can also bind a const reference to a const object. A compile error is raised on any attempt to change the value through the reference.
const int val = 10;
Example 3 We can also bind a const reference to a non-const object; thereafter we can NOT change the object using the reference.
int val = 10;
Pass By Value In a function, we can pass an argument by either value
or reference
. When passing by value
, the variable x
will NOT be changed. In this case, we spend both time to create a copy inside the function and memory to store the copy.
void DoubleValue(int number)
x = 5
Pass By Reference When passing by reference
(by adding &
to the function parameter), the variable x
WILL be changed.
void DoubleValue(int& number)
x = 10
Pass By Const Reference We add const when we do not want the function argument to be tampered with when passed by reference. In this example, there will be a compiler error as we are trying to change the const reference number
in the function.
void DoubleValue(const int& number)
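The three passing styles can be compared side by side in a short sketch. The function names are hypothetical variants of `DoubleValue`, and the const-reference version returns a new value because modifying its argument would not compile.

```cpp
// By value: works on a copy, so the caller's variable is unchanged.
void DoubleByValue(int number) { number *= 2; }

// By reference: works on the caller's variable directly.
void DoubleByReference(int& number) { number *= 2; }

// By const reference: no copy is made and the argument cannot be modified.
// Writing `number *= 2;` here would be a compile error.
int DoubledCopy(const int& number) { return number * 2; }
```

Starting from `x = 5`, `DoubleByValue(x)` leaves `x` at 5, `DoubleByReference(x)` sets it to 10, and `DoubledCopy(x)` then returns 20 without touching `x`.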
Pointer
In computer memory, each stored value has an address associated with it. We use a pointer
object to store the address of another object and access it indirectly.
There are two pointer operators:
 &: address-of operator, used to get the address of an object
 *: dereference operator, used to access the object
Example 1
int* ptr = nullptr; // initialize an empty (null) pointer
Example 2 If the object is const, a pointer cannot be used to change it.
const int val = 10;
Example 3 You can have a pointer that itself is const
int val = 10;
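The three pointer examples can be combined into one sketch. The function name `PointerDemo` is made up, and the comments mark which operations would not compile.

```cpp
int PointerDemo() {
    int val = 10;

    int* ptr = &val;  // & takes the address of val
    *ptr = 20;        // * dereferences the pointer and writes to val

    const int* read_only = &val;  // pointer to const: *read_only = 1 would not compile
    int* const fixed = &val;      // const pointer: fixed = nullptr would not compile

    *fixed += *read_only;  // allowed: the pointee of `fixed` is not const
    return val;            // 20 + 20
}
```

The position of `const` relative to `*` decides whether the pointed-to value or the pointer itself is protected.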
Casting
C++ allows implicit
and explicit
conversions of types.
short a = 1;
However, traditional explicit type casting allows conversions between any types, which can lead to runtime errors. To control these conversions, C++ provides four specific casting operators:
dynamic_cast<new_type>( ): used only with pointers (and/or references) to objects; can cast a derived class to its base class; base-to-derived conversions are allowed only with a polymorphic base class
class Base {virtual void foo() {} };
derived_ptr_2: 0x7fa5cec00630
static_cast<new_type>( ): can cast pointers (and/or references) base-to-derived or derived-to-base, but with no safety check at runtime
Base* base_ptr_3 = new Base;
derived_ptr_3: 0x7fc3d7400690
reinterpret_cast<new_type>( ): converts a pointer to a pointer of another, unrelated class; often leads to unsafe dereferencing
class A {};
const_cast<new_type>( ): removes/sets the constness of an object
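A small sketch of the runtime check that `dynamic_cast` performs, next to a plain `static_cast` between arithmetic types. The `Base`/`Derived` classes here are illustrative and are not the ones from the truncated examples above.

```cpp
class Base {
public:
    virtual ~Base() = default;  // a virtual function makes Base polymorphic
};

class Derived : public Base {};

// dynamic_cast checks the dynamic type at runtime and
// returns nullptr when the object is not actually a Derived.
bool IsDerived(Base* p) { return dynamic_cast<Derived*>(p) != nullptr; }

// static_cast performs the conversion without any runtime check.
int Truncate(double x) { return static_cast<int>(x); }
```

`IsDerived` returns false for a plain `Base` object and true for a `Derived` one, while `Truncate(3.9)` gives 3.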
Array (CStyle)
An array is a fixed-size collection of items of the same type that are stored in a contiguous block in memory. We define the size of the array at creation, and the array index starts at 0 in C++.
int a[10];
The address of the array is the same as the address of the first element of the array. Therefore, we can access an array using pointer increments, which is very efficient.
int a[10];
Dynamic Allocation
Dynamic memory allocation
is necessary when you do NOT know the size of the array at compile time. We use a new
keyword paired with a delete
keyword.
int* a = new int[10];
Dynamically allocate a matrix with cast.
1 1 1 1
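Because the matrix example is not shown, here is a minimal sketch of dynamic allocation with `new[]` paired with `delete[]`. The function name and the row-by-row layout are assumptions, and it does not reproduce the "with cast" variant mentioned above.

```cpp
// Allocates a rows x cols matrix of ones on the free store,
// sums its entries, and releases the memory again.
int SumOfOnes(int rows, int cols) {
    int** m = new int*[rows];  // array of row pointers
    for (int i = 0; i < rows; ++i) {
        m[i] = new int[cols];  // one allocation per row
        for (int j = 0; j < cols; ++j) m[i][j] = 1;
    }

    int total = 0;
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j) total += m[i][j];

    for (int i = 0; i < rows; ++i) delete[] m[i];  // every new[] needs a delete[]
    delete[] m;
    return total;
}
```

In modern code a `std::vector<std::vector<int>>` would usually replace the manual `new`/`delete` pairs.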
Library
A C++ library
is a package of reusable code typically with these two components:
 header file
 precompiled binary containing the machine code for the functionality implementation
There are two types of C++ libraries: static and dynamic libraries.
 a static library has a .a (.lib on Windows) extension, and the library code is compiled and linked as part of the executable, so that users only need to distribute the executable for other users to run it.
 a dynamic library has a .so (.dll on Windows) extension and is loaded at run time. It saves space as many programs can share one copy of the dynamic library code, and it can be upgraded to new versions without replacing all the executables using it.
Condition
If/Else
if (condition_1)
Switch
A switch statement tests an integral or enum value against a set of constants. We can NOT use a string in the switch statement.
int main()
While / Do While / For Loop
While loop:
int n = 0;
Do while loop:
For loop:
for (unsigned int n = 0; n < 10; ++n)
For loop with two variables:
for (unsigned int i = 0, j = 0; i < 10 && j < 10; ++i, j+=2)
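Since the loop bodies are missing above, the sketch below shows the three loop forms computing the same sum of 0 to 9. The function names are hypothetical.

```cpp
// while: the condition is checked before each iteration.
int SumWhile() {
    int n = 0, total = 0;
    while (n < 10) {
        total += n;
        ++n;
    }
    return total;
}

// do-while: the body runs at least once; the condition is checked afterwards.
int SumDoWhile() {
    int n = 0, total = 0;
    do {
        total += n;
        ++n;
    } while (n < 10);
    return total;
}

// for: initialization, condition and increment are gathered in the header.
int SumFor() {
    int total = 0;
    for (unsigned int n = 0; n < 10; ++n) total += static_cast<int>(n);
    return total;
}
```

All three return 45. They differ only in where the condition is tested and where the counter is managed.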
Enum
The enum
(enumerated) type is used to define collections of named integer constants.
enum CurrencyType {USD, EUR, GBP};
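An enum pairs naturally with a switch statement, since its values are integral constants. In this sketch the `CurrencyName` function and the returned strings are illustrative additions.

```cpp
#include <string>

enum CurrencyType { USD, EUR, GBP };  // USD == 0, EUR == 1, GBP == 2

// switch works on the enum value; a std::string could not be switched on.
std::string CurrencyName(CurrencyType c) {
    switch (c) {
        case USD: return "US Dollar";
        case EUR: return "Euro";
        case GBP: return "Pound Sterling";
    }
    return "Unknown";  // reached only for values outside the enumerators
}
```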
Class
A class
achieves data abstraction
and encapsulation
.
 abstraction refers to the separation of interface and implementation
 encapsulation refers to combining data and functions so that data is only accessible through functions.
Member Variable & Function
Define a customer class with member variables and functions.
class Customer
Instantiate Customer class instances to represent different customers.
Customer c1("Joe", "Hyde Park");
Protection Level
There are three protection levels to keep class data member internal to the class.
 public: accessible to all.
 protected: accessible in the class that defines them and in classes that inherit from that class.
 private: only accessible within the class defining them.
Constructor / Destructor
A constructor
is a special member function used to initialize the data members when an object is created. This is an example of using an initializer list
to create more efficient constructors:
Customer::Customer()
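The class body is not shown in these notes, so the following is a sketch of what a `Customer` class with initializer-list constructors could look like. The member names `name_` and `address_` and the default value "Unknown" are assumptions.

```cpp
#include <string>
#include <utility>

class Customer {
public:
    // The initializer list constructs the members directly,
    // instead of default-constructing them and assigning in the body.
    Customer() : name_("Unknown"), address_("Unknown") {}
    Customer(std::string name, std::string address)
        : name_(std::move(name)), address_(std::move(address)) {}

    // const member functions: usable on const objects.
    const std::string& Name() const { return name_; }
    const std::string& Address() const { return address_; }

private:
    std::string name_;     // private: reachable only through member functions
    std::string address_;
};
```

`Customer c1("Joe", "Hyde Park");` then gives an object whose `Name()` returns "Joe".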
FreeStore
There are several ways to create objects on a computer:
 Automatic/Stack: int a;
 Dynamically allocated
  Free Store: allocated/freed by new/delete, e.g. int* ptr = new int[10];
  Heap: allocated/freed by malloc/free
Summarized in a table from geeksforgeeks
Parameter  Stack  Heap 

Basic  Memory is allocated in a contiguous block  Memory is allocated in any random order 
Allocation and deallocation  Automatic by compiler instructions  Manual by programmer 
Cost  Less  More 
Access time  Faster  Slower 
Main issue  Shortage of memory  Memory leak/fragmentation 
We use ->
to access a free-store object's member functions:
Customer* c = new Customer("Joe", "Chicago");
Const Member Functions
A const object
can only invoke const member function
on the class. A const member function is not allowed to modify any of the data members on the object on which it is invoked. However, if a data member is marked mutable
, it then can be modified inside a const member function.
const Customer c1("Joe", "Hyde Park");
Static Member
We use static
keyword to associate a member with the class, as opposed to class instances. A static member function can NOT directly access non-static data members, because it has no this pointer.
Static member variables can NOT be initialized through the class constructor; rather, they are initialized once outside the class body. However, a const static member variable of integral type can be initialized within the class body.
class Counter
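A possible body for the `Counter` class, sketched under the assumption that it counts live instances. The member names `count_`, `Count()` and `kMax` are hypothetical.

```cpp
class Counter {
public:
    Counter() { ++count_; }   // non-static members may use static data
    ~Counter() { --count_; }

    // Static member function: callable without an object, has no this pointer.
    static int Count() { return count_; }

    // A const static integral member can be initialized in the class body.
    static const int kMax = 100;

private:
    static int count_;  // declared here, shared by all instances
};

// Static data members are defined and initialized once, outside the class body.
int Counter::count_ = 0;
```

`Counter::Count()` is 0 before any object exists, rises with each constructed `Counter`, and falls again as objects go out of scope.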
This
Every non-static member function has access to a this
pointer, which is initialized with the address of the object when the member function is invoked.
double Currency::GetExchangeRate()
Copy Constructor
We use the copy constructor to construct an object from another already constructed object of the same type.
class Customer
Assignment Operator
We use the assignment operator to assign an object of the same type.
class Customer
Shallow / Deep Copy
The default copy constructor and assignment operator provide a shallow copy
, which copies each member of the class individually. For a pointer member, shallow copying copies the address held by the pointer, resulting in both objects' members pointing to the same object on the free store.
A deep copy
, however, creates a new object on the free store and copies the contents of the object the original pointer is pointing to.
Deep Copy copy constructor
Customer::Customer(const Customer& other)
Deep Copy assignment operator
Customer& Customer::operator=(const Customer& other)
The Rule of 3
There are 3 operations that control the copies of an object: copy constructor, assignment operator, and destructor. If you define one of them, you will most likely need to define the other two as well.
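To make the Rule of 3 concrete, here is a sketch of a class that owns a free-store array and defines all three operations with deep-copy semantics. The class `PriceHistory` is hypothetical and is not the `Customer` class from the notes.

```cpp
#include <algorithm>
#include <cstddef>

class PriceHistory {
public:
    explicit PriceHistory(std::size_t n) : n_(n), data_(new double[n]()) {}

    // Copy constructor: allocate a new array and copy the contents (deep copy).
    PriceHistory(const PriceHistory& other)
        : n_(other.n_), data_(new double[other.n_]) {
        std::copy(other.data_, other.data_ + n_, data_);
    }

    // Assignment operator: guard against self-assignment, then deep copy.
    PriceHistory& operator=(const PriceHistory& other) {
        if (this != &other) {
            double* fresh = new double[other.n_];
            std::copy(other.data_, other.data_ + other.n_, fresh);
            delete[] data_;
            data_ = fresh;
            n_ = other.n_;
        }
        return *this;
    }

    // Destructor: release the array owned by this object.
    ~PriceHistory() { delete[] data_; }

    double& operator[](std::size_t i) { return data_[i]; }

private:
    std::size_t n_;
    double* data_;
};
```

With the default shallow copy, two objects would share one array and both destructors would try to delete it. The deep copy keeps each object's data independent.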
Singleton Class
The Singleton
design pattern makes sure only one instance of an object of a given type is instantiated in a program, and provides a global point of access to it:
 change the access level of the constructor to private
 add a new public member function Instance() to create the object
 use a static member variable to hold the object
class CurrencyFactory
CurrencyFactory* CurrencyFactory::Instance()
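The three steps can be sketched as follows. Only the names `CurrencyFactory` and `Instance()` come from the notes. The member `instance_` and the deleted copy operations are assumptions, and this simple version is not thread-safe.

```cpp
class CurrencyFactory {
public:
    // Global point of access: creates the object on first use.
    static CurrencyFactory* Instance() {
        if (instance_ == nullptr) {
            instance_ = new CurrencyFactory();
        }
        return instance_;
    }

    // Copying would create a second instance, so it is disabled.
    CurrencyFactory(const CurrencyFactory&) = delete;
    CurrencyFactory& operator=(const CurrencyFactory&) = delete;

private:
    CurrencyFactory() = default;         // private constructor: no outside instantiation
    static CurrencyFactory* instance_;   // static member holding the single object
};

CurrencyFactory* CurrencyFactory::instance_ = nullptr;
```

Every call to `CurrencyFactory::Instance()` returns the same pointer.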

Inheritance
Classes related by inheritance
form a hierarchy consisting of base and derived classes. The derived
class inherits some members from the base class subject to protection level restrictions, and may extend/override the implementation of member functions in the base class.
class Person
Virtual
Different derived classes may implement member functions from the base class differently. The base class uses the virtual
keyword to indicate a member function that may be specialized by derived classes.
class Base
Abstract Class
The base class has to
either provide a default implementation for that function or declare it pure virtual
. If a class has one or more pure virtual functions, it is called an abstract class
or interface
. An abstract class cannot be instantiated.
class Base
Virtual Destructor
When we delete a derived class we should execute both the derived class destructor and the base class destructor. A virtual base class destructor
is needed to make sure the destructors are called properly when a derived class object is deleted through a pointer to a base class
.
If we delete a derived class object through a pointer to a base class when the base class destructor is non-virtual, the result is undefined
.
Polymorphism
The types related by inheritance are known as polymorphic
types. We can use polymorphic types interchangeably.
We can use a pointer
or a reference
to a base class object to point to an object of a derived class – this is known as the Liskov Substitution Principle
(LSP). This allows us to write code without needing to know the dynamic type of an object.
BankAccount* acc1 = new Savings();
We can write one function which applies to all account types.
void UpdateAccount(BankAccount* acc)
void UpdateAccount(BankAccount& acc)
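The pieces above (virtual functions, an abstract base class, a virtual destructor, and one function serving all account types) can be put together in a sketch. The `Checking` class, the `MonthlyRate()` function and the rates used are invented for illustration.

```cpp
class BankAccount {
public:
    explicit BankAccount(double balance) : balance_(balance) {}
    virtual ~BankAccount() = default;         // safe deletion through a base pointer
    virtual double MonthlyRate() const = 0;   // pure virtual: BankAccount is abstract

    void Update() { balance_ *= 1.0 + MonthlyRate(); }
    double Balance() const { return balance_; }

private:
    double balance_;
};

class Savings : public BankAccount {
public:
    explicit Savings(double balance) : BankAccount(balance) {}
    double MonthlyRate() const override { return 0.01; }  // hypothetical rate
};

class Checking : public BankAccount {
public:
    explicit Checking(double balance) : BankAccount(balance) {}
    double MonthlyRate() const override { return 0.0; }   // hypothetical rate
};

// One function for all account types: the call dispatches on the dynamic type.
void UpdateAccount(BankAccount& acc) { acc.Update(); }
```

Passing a `Savings` object grows its balance by 1%, while a `Checking` object is left unchanged, even though `UpdateAccount` only knows about `BankAccount`.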
Standard Template Library (STL)
Sequential Container
std::array
The STL array class from the <array> header offers an equally efficient and more reliable alternative to C-style arrays: the size is known, so we do not have to pass the size of the array as a separate parameter.
std::vector
Vectors store their elements contiguously, like dynamic arrays, and can resize themselves automatically when an element is inserted or deleted. In common implementations the capacity grows geometrically (for example, doubling) whenever it is exhausted.
std::list
Unlike arrays and vectors, a list is a sequential container that allows non-contiguous memory allocation.
std::string
The STL string class stores the characters as a sequence of bytes, allowing access to each single-byte character. The underlying character array is terminated by a \0
, so the string foo
actually stores four characters.
size()
Use sizeof()
to return the size of an array in bytes. Use the .size()
member function to return the number of elements in an STL container.
The size of a: 20 bytes
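The difference between `sizeof()` and `.size()` can be sketched as below. The function names are hypothetical, and the byte count depends on `sizeof(int)`; the 20 bytes shown above correspond to five 4-byte ints.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// sizeof on a C-style array gives its size in bytes: 5 * sizeof(int).
std::size_t CStyleBytes() {
    int a[5] = {1, 2, 3, 4, 5};
    return sizeof(a);
}

// .size() on std::array gives the number of elements.
std::size_t ArrayCount() {
    std::array<int, 5> a = {1, 2, 3, 4, 5};
    return a.size();
}

// A vector resizes itself when elements are appended.
std::size_t VectorCountAfterPush() {
    std::vector<int> v{1, 2, 3};
    v.push_back(4);
    return v.size();
}
```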
Associative Container
std::set
Sets are an associative container where each element is unique. The value of the element cannot be modified once it is added to the set.

std::map
A std::map
sorts its elements by the keys.
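A short sketch of both associative containers. The function names, currency codes and rates are made-up sample data.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Duplicates are dropped on insertion, so only 1, 2 and 3 remain.
std::size_t UniqueCount() {
    std::set<int> s{3, 1, 3, 2, 1};
    return s.size();
}

// A map keeps its entries ordered by key, so iteration starts at the smallest key.
std::string FirstKey() {
    std::map<std::string, double> rates{{"USD", 1.0}, {"EUR", 1.1}, {"GBP", 1.3}};
    return rates.begin()->first;
}
```

`FirstKey()` returns "EUR" because the keys are compared as strings, regardless of insertion order.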
Algorithm
The STL provides implementations of some widely used algorithms.
 <algorithm> header: sorting, searching, copying, modifying elements
 <numeric> header: numeric operation
Sort
int main()
Binary Search
int main()
Copy
int main()
Replace
int main()
Numeric
int main()
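As the example bodies are truncated, here is a compact sketch that exercises several of the algorithms named above: `std::sort`, `std::binary_search`, `std::replace` and `std::accumulate`. The wrapper function names are hypothetical.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// binary_search requires a sorted range, so sort first.
bool SortedContains(std::vector<int> v, int target) {
    std::sort(v.begin(), v.end());
    return std::binary_search(v.begin(), v.end(), target);
}

// replace swaps every occurrence of old_value for new_value;
// accumulate (from <numeric>) then sums the range starting from 0.
int SumAfterReplace(std::vector<int> v, int old_value, int new_value) {
    std::replace(v.begin(), v.end(), old_value, new_value);
    return std::accumulate(v.begin(), v.end(), 0);
}
```

For `{1, 3, 2, 4, 6}`, `SortedContains` finds 4 but not 5, and replacing 6 with 0 gives a sum of 10.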
Complexity Comparison
Smart Pointer
std::unique_ptr
A unique pointer
takes unique ownership of its pointed object
. A unique pointer deletes the object it manages either when the unique pointer is destroyed or when it is reassigned to manage a different object.

std::shared_ptr
The shared pointer
counts the references to its pointed object and can store and pass a reference beyond the scope of a function. In OOP, the shared pointer is used to store a pointer as a member variable and can be used to reference a value outside the scope of the class.
std::shared_ptr<Option> sp2;
Creating a vector of shared_ptr:

std::weak_ptr
A weak_ptr
works the same as shared pointer
, but will not increment the reference count.
std::weak_ptr<Option> sp2;
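The ownership rules of the three smart pointers can be observed in a small sketch. The `Option` struct with a single `strike` field is a stand-in, since the real class is not shown in the notes.

```cpp
#include <memory>
#include <utility>

struct Option {  // hypothetical stand-in for the Option class
    double strike;
};

// Two shared_ptr owners raise the count to 2; the weak_ptr does not add to it.
long SharedCountWithWeak() {
    std::shared_ptr<Option> sp1 = std::make_shared<Option>(Option{100.0});
    std::shared_ptr<Option> sp2 = sp1;
    std::weak_ptr<Option> wp = sp1;
    return sp1.use_count();
}

// unique_ptr cannot be copied; std::move transfers ownership and empties the source.
bool MoveTransfersOwnership() {
    std::unique_ptr<Option> up1(new Option{100.0});
    std::unique_ptr<Option> up2 = std::move(up1);
    return up1 == nullptr && up2 != nullptr;
}

// Once the last shared_ptr is gone, the weak_ptr reports that the object expired.
bool WeakExpiresWithOwner() {
    std::weak_ptr<Option> wp;
    {
        std::shared_ptr<Option> sp = std::make_shared<Option>(Option{100.0});
        wp = sp;
    }
    return wp.expired();
}
```

To use the object behind a `weak_ptr`, call `lock()`, which returns a `shared_ptr` that is empty if the object has already been destroyed.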
Parallel Processing
Threading
A thread
is a small sequence of programmed instructions and is usually a component of a process
. Multiple threads
can exist within one process, executing concurrently
and sharing resources such as memory, while processes do not share their resources.
The std::thread
class in C++ supports multithreading, and can be instantiated to represent a single thread. We need to pass a callable object (function pointer, function object, or lambda) to the constructor of the std::thread class. We use the std::thread::join()
method to wait for the completion of a thread.
Here we start two threads. Both threads share memory and attempt to modify the balance
variable at the same time, which leads to a race condition.
153258
We introduce a mutex
(mutual exclusion) object, which guards the shared resource. A thread can lock
the resource with the std::mutex::lock()
method, which prevents other threads from accessing the resource until the mutex is unlocked.
0
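A sketch of the mutex-protected version. The original code is not shown, so the function name and loop structure are assumptions, and `std::lock_guard` is used here in place of explicit `lock()`/`unlock()` calls.

```cpp
#include <mutex>
#include <thread>

// One thread increments and one decrements a shared balance.
// The mutex makes each update atomic with respect to the other thread.
int BalanceAfterConcurrentUpdates(int iterations) {
    int balance = 0;
    std::mutex m;

    auto deposit = [&]() {
        for (int i = 0; i < iterations; ++i) {
            std::lock_guard<std::mutex> lock(m);  // locks here, unlocks at scope exit
            ++balance;
        }
    };
    auto withdraw = [&]() {
        for (int i = 0; i < iterations; ++i) {
            std::lock_guard<std::mutex> lock(m);
            --balance;
        }
    };

    std::thread t1(deposit);
    std::thread t2(withdraw);
    t1.join();  // wait for both threads before reading the result
    t2.join();
    return balance;
}
```

With the lock in place the deposits and withdrawals cancel exactly and the result is 0. Without it, the two threads race on `balance` and the result is unpredictable. On some toolchains the program must be built with `-pthread`.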
Condition Variable
A condition variable
is an object that can block the calling thread until notified to resume. It uses a unique_lock
(over a mutex
) to lock the thread when one of its wait
functions is called.
main() signals ready for processing
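A minimal sketch of the wait/notify pattern described above. The names `ProcessWhenReady` and `ready` and the result value 42 are illustrative, not taken from the original example.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// The worker blocks until the main thread sets `ready` and notifies it.
int ProcessWhenReady() {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;
    int result = 0;

    std::thread worker([&]() {
        std::unique_lock<std::mutex> lock(m);
        // wait releases the lock while blocked and re-acquires it on wake-up;
        // the predicate protects against spurious wake-ups.
        cv.wait(lock, [&]() { return ready; });
        result = 42;
    });

    {
        std::lock_guard<std::mutex> lock(m);
        ready = true;  // change the shared state under the mutex
    }
    cv.notify_one();   // signal the waiting thread to resume

    worker.join();
    return result;
}
```

The predicate form of `wait` also covers the case where the notification arrives before the worker starts waiting, because `ready` is checked before blocking.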
Reference:
 Stochastic Calculus: An Introduction with Applications, Gregory F. Lawler
 FINM 32000, 33000, 34500, 36700, 36702, 322 Lecture Notes, the University of Chicago