# Standard Simulation Model on Credit Portfolio

## Credit Risk

Lenders, such as banks, are subject to many kinds of risk, among which credit risk is the most likely to cause bank failure:

• Credit risk
• Market risk
• Operational risk
• Reputational risk

Each loan is part of a legal agreement that requires the borrower to pay interest and repay principal on schedule. Some borrowers are also required to obey specified covenants, such as maintaining earnings above a certain threshold.

If the borrower fails to follow the agreement, the lender holds the borrower to be in default, which can be either a money default or a covenant default. Purchasers of public bonds experience only money default.

At default, the loan agreement calls for fees to be paid by the borrower, gives the bank the power to seize collateral (for secured loans), and has a cross-default provision (all of the borrower's loans are in default once one loan is in default).

In the 20th century, most banks did not define default until they discovered a model that could help them manage credit risk.

## Rating Agencies

There are three major Nationally Recognized Statistical Rating Organizations (NRSROs), which firms pay to rate their bonds in order to increase liquidity.

• Standard & Poor's
• Moody’s
• Fitch

Under S&P ratings, the grades are:

• Investment grade: AAA, AA, A, BBB
• Non-investment grade: BB, B, CCC, CC
• Selectively defaulted: SD
• Defaulted: D

## D and PD

Let D be the default indicator of a loan, taking only two values: 0 (no default) and 1 (default). PD is the annual probability of default.

By mathematical identity, PD = E[D], so:

• Knowing PD, we can simulate D by drawing from a Bernoulli distribution with parameter PD.
• Given data on D, we can calculate the implied PD.
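The two directions above can be sketched in a few lines of Python. This is a minimal illustration, assuming NumPy; the function names and the PD of 5% are hypothetical choices, not part of the original notes.

```python
import numpy as np

def simulate_defaults(pd_annual, n_draws, seed=0):
    """Draw default indicators D ~ Bernoulli(PD); returns an array of 0s and 1s."""
    rng = np.random.default_rng(seed)
    return (rng.random(n_draws) < pd_annual).astype(int)

def implied_pd(defaults):
    """Estimate PD as the observed frequency of D = 1."""
    return float(np.mean(defaults))

# Hypothetical example: PD = 5%, 100,000 simulated loan-years
d = simulate_defaults(pd_annual=0.05, n_draws=100_000)
print(implied_pd(d))  # should be close to 0.05
```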

In a portfolio of N firms, the portfolio default rate, DR, equals the fraction of firms that default:

$$DR = \frac{1}{N} \sum_{i=1}^{N} D_i$$

## Exposure, Recovery and LGD

Exposure is the amount owed by the borrower. Recovery is measured in either of two ways:

• Market price of the loan at the time of default
• Discounted future cash flows back to the time of default

LGD (Loss Given Default) is a random variable with values usually between 0 and 1:

$$LGD = 1 - \frac{Recovery}{Exposure}$$

For a defaulted loan, there are two ways to measure recovery/LGD. For a current loan, there is a distribution for LGD. The expectation is written as:

$$ELGD = E[LGD]$$

US investment grade bond LGD is about 0.20%, while non-investment grade is about 3.60%. Bank loans are almost always senior to bonds and have lower LGD.

## Loss and EL

Loss is measured as a fraction of exposure:

$$Loss = D \times LGD$$

EL is the expected loss. Because D and LGD are assumed independent:

$$EL = E[D \times LGD] = E[D] \, E[LGD] = PD \times E[LGD]$$

Lenders often need to estimate EL and include it in the spread they charge.

## Change Of Variable

Note that LGD is often measured as a fraction of exposure. To change the measure to a dollar amount, we use the change-of-variable technique, which relies on the chain rule.

Given the pdf of LGD:

We define the function g such that:

Hence the function g-inverse is:

The partial derivative can be expressed as:

By definition:

Taking derivative on both sides and with chain rule:

Finally:
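The steps above can be made concrete under one assumption that is not spelled out in the text: that g simply scales the fractional LGD by a fixed exposure E, so that the dollar loss is $LGD^{dollar} = g(LGD) = E \times LGD$. With F denoting a cdf and f a pdf, a sketch of the derivation is:

$$
\begin{aligned}
g^{-1}(y) &= \frac{y}{E}, \qquad \frac{\partial g^{-1}(y)}{\partial y} = \frac{1}{E} \\
F_{LGD^{dollar}}(y) &= P[E \times LGD \le y] = F_{LGD}\left(g^{-1}(y)\right) \\
f_{LGD^{dollar}}(y) &= f_{LGD}\left(g^{-1}(y)\right) \frac{\partial g^{-1}(y)}{\partial y} = \frac{1}{E} \, f_{LGD}\left(\frac{y}{E}\right)
\end{aligned}
$$

For example, if LGD is uniform on (0, 1) and E = 100, the dollar LGD has density 1/100 on (0, 100).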

## Simulate Portfolio Loss On One Single Loan

We know that:

$$Loss = D \times LGD$$

To simulate loss, we first simulate D:

$$D \sim Bernoulli(PD)$$

Then simulate LGD based on the pdf of LGD. Multiply each D by the corresponding LGD to get Loss. Repeat the process to produce a distribution of Loss.
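A minimal sketch of this procedure follows. The notes do not specify the LGD pdf, so a Beta(2, 3) distribution (mean 0.4) is used purely as a stand-in; the function name and the PD value are likewise hypothetical.

```python
import numpy as np

def simulate_single_loan_loss(pd_annual, lgd_sampler, n_sims, seed=0):
    """Simulate Loss = D * LGD for one loan, n_sims times."""
    rng = np.random.default_rng(seed)
    d = rng.random(n_sims) < pd_annual   # step 1: D ~ Bernoulli(PD)
    lgd = lgd_sampler(rng, n_sims)       # step 2: LGD drawn from its pdf
    return d * lgd                       # step 3: Loss as a fraction of exposure

# Assumed LGD distribution (not from the notes): Beta(2, 3), so E[LGD] = 0.4
def beta_lgd(rng, n):
    return rng.beta(2.0, 3.0, size=n)

losses = simulate_single_loan_loss(0.05, beta_lgd, 200_000)
print(losses.mean())  # should be near PD * E[LGD] = 0.05 * 0.4 = 0.02
```

Passing the LGD sampler as a function keeps the default step separate from the severity step, so another LGD distribution can be swapped in without touching the rest.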

## Simulate Portfolio Loss On N Independent Loans

Assume the defaults of the N loans are independent and have the same probability of default, PD:

$$D_i \sim Bernoulli(PD), \quad i = 1, \dots, N, \quad \text{independent}$$

Then the total number of defaults follows a binomial distribution:

$$\sum_{i=1}^{N} D_i \sim Binomial(N, PD)$$

However, historical data show that the variance of the default rate is much higher than the binomial distribution implies. Hence default correlation needs to be introduced.
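To see what the independence assumption implies, the following sketch simulates many years of a hypothetical portfolio (1,000 loans, PD = 2%, both values chosen only for illustration) and compares the simulated mean and variance of the default count with the binomial values N·PD and N·PD·(1 − PD).

```python
import numpy as np

def simulate_default_counts(n_loans, pd_annual, n_years, seed=0):
    """Number of defaults per year when all loans default independently."""
    rng = np.random.default_rng(seed)
    d = rng.random((n_years, n_loans)) < pd_annual  # one row per simulated year
    return d.sum(axis=1)

counts = simulate_default_counts(n_loans=1000, pd_annual=0.02, n_years=5000)
print(counts.mean())  # near N * PD = 20
print(counts.var())   # near N * PD * (1 - PD) = 19.6
```

A historical default-count series with a variance well above this benchmark is the evidence that motivates introducing correlation.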

## Simulate Portfolio Loss On N Correlated Loans

Assume that there is a latent unobserved standard normal variable zi that is responsible for the default of firm i, i.e. firm i defaults if:

$$z_i < \Phi^{-1}(PD_i)$$

Assume the latent variables of any two firms i and j are jointly normal. Denote the correlation between zi and zj:

$$\rho = corr[z_i, z_j]$$

Let ri, j be the correlation between asset return of firm i and j, we know that almost certainly:

Denote PDJ as the probability that both firm i and firm j default:

$$PDJ = P\left[ z_i < \Phi^{-1}(PD_i), \; z_j < \Phi^{-1}(PD_j) \right]$$

To calculate PDJ with python:
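A minimal sketch, assuming SciPy is available: PDJ is evaluated as the bivariate standard normal cdf at the two default thresholds. The function name and the example inputs are illustrative choices, not taken from the original listing.

```python
from scipy.stats import norm, multivariate_normal

def joint_default_probability(pd_i, pd_j, rho):
    """PDJ = P[z_i < Phi^-1(PD_i), z_j < Phi^-1(PD_j)] for jointly normal z."""
    thresholds = [norm.ppf(pd_i), norm.ppf(pd_j)]
    cov = [[1.0, rho], [rho, 1.0]]
    return float(multivariate_normal(mean=[0.0, 0.0], cov=cov).cdf(thresholds))

# With rho = 0 the firms are independent, so PDJ is about PD_i * PD_j = 0.0025
print(joint_default_probability(0.05, 0.05, 0.0))
# A positive rho raises PDJ above the independent value
print(joint_default_probability(0.05, 0.05, 0.3))
```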

Returns:

Now that we have the Di, we can simulate portfolio loss rate, given the LGD distribution and exposures for each firm.

Denote Dcorr to be the correlation between Di and Dj:

$$Dcorr = \frac{PDJ - PD_i \, PD_j}{\sqrt{PD_i (1 - PD_i) \, PD_j (1 - PD_j)}}$$

Note that holding PDi, PDj fixed:

• greater Dcorr => greater PDJ
• greater ρ => greater PDJ
• ρ between -1 and 1 => PDJ between 0 and min[PDi, PDj]

## Copula

When we model more than two firms, pair-wise correlation is not enough to determine the entire distribution of outcomes. For example, there are N PDs and N(N-1)/2 pair-wise correlations, while we want to calculate the probabilities of 2^N outcomes. Hence we introduce the Gauss copula, which helps describe the group-wise correlations.

Consider a set of multivariate normals:

The quantiles of the set are uniformly distributed by definition:

The copula of the set (Z1, Z2, …, ZN) is defined as the joint cumulative distribution function of (Φ(Z1), Φ(Z2), …, Φ(ZN)):

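The Gauss copula is as follows, where $\Phi_{\Sigma}$ is the joint cdf of a multivariate normal with zero means, unit variances and correlation matrix Σ. Note that, among all possible copulas, the Gauss copula is the one supported by the Central Limit Theorem:

$$C(u_1, \dots, u_N) = \Phi_{\Sigma}\left( \Phi^{-1}(u_1), \dots, \Phi^{-1}(u_N) \right)$$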

In fact, the copula does not contain any information on the marginal distribution. Here we set the marginal distribution FZ to follow standard normal only as an example, but it can be anything continuous such that:

And so:

In the context of default modeling, we assume that each company's default indicator follows a Bernoulli distribution and simulate it with a standard normal latent variable:

$$D_i = 1 \iff Z_i < \Phi^{-1}(PD_i)$$

The probability that all firms default at the same time is by definition:

$$P[D_1 = 1, \dots, D_N = 1] = P\left[ Z_1 < \Phi^{-1}(PD_1), \dots, Z_N < \Phi^{-1}(PD_N) \right]$$

Note that given only a pair-wise correlation matrix Σ, this probability can take any value between 0 and the lowest single-firm default probability.

Now we assume all firms' z's are connected by the Gauss copula, which implies a single value for the probability that all firms default.

With python we can either numerically evaluate the integral or use simulation to calculate the probability that all firms default at the same time.
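A minimal sketch of the simulation route, assuming NumPy and SciPy: draw correlated standard normals from the Gauss copula and count the scenarios in which every latent variable falls below its default threshold. The function name, the three-firm portfolio, the PDs and the correlation of 0.5 are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def prob_all_default(pds, corr, n_sims=400_000, seed=0):
    """Monte Carlo estimate of P[all firms default] under a Gauss copula."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(np.zeros(len(pds)), corr, size=n_sims)
    thresholds = norm.ppf(pds)                 # default if z_i < Phi^-1(PD_i)
    return float(np.mean(np.all(z < thresholds, axis=1)))

pds = [0.2, 0.2, 0.2]
independent = np.eye(3)
correlated = np.full((3, 3), 0.5) + 0.5 * np.eye(3)   # 1 on diagonal, 0.5 elsewhere

p_ind = prob_all_default(pds, independent)
p_corr = prob_all_default(pds, correlated)
print(p_ind)   # near 0.2 ** 3 = 0.008 when the firms are independent
print(p_corr)  # larger, because positive correlation clusters defaults
```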

Returns:

Note that, compared to other copulas, the Gauss copula requires only a pair-wise correlation matrix and the PDs to convey a lot of information. The Gauss copula itself has rarely been shown to be invalid, while the calibration of the marginals and of the correlation matrix has often proved erroneous.

## Simulate Rating Transitions

The default model only has two states, 0 and 1:

To simulate rating transitions, we require two matrices:

• Transition Matrix: $P[i \rightarrow j], \forall i, j$
• Cost Matrix, e.g. the loss due to deterioration of borrowers: $cost[i \rightarrow j], \forall i, j$
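As an illustration of how the two matrices are used together, the sketch below walks one borrower through a three-state chain. The states, the transition probabilities and the costs are all made-up numbers chosen for readability; they are not calibrated to any rating agency data.

```python
import numpy as np

states = ["A", "B", "D"]                   # hypothetical ratings; D is absorbing
P = np.array([[0.90, 0.08, 0.02],          # P[i -> j]: each row sums to 1
              [0.10, 0.80, 0.10],
              [0.00, 0.00, 1.00]])
cost = np.array([[0.00, 0.05, 0.60],       # cost[i -> j] as a fraction of exposure
                 [-0.03, 0.00, 0.55],      # an upgrade produces a gain (negative cost)
                 [0.00, 0.00, 0.00]])

def simulate_path(start, n_years, rng):
    """Simulate yearly rating transitions and accumulate the transition costs."""
    path, total_cost = [start], 0.0
    for _ in range(n_years):
        current = path[-1]
        nxt = int(rng.choice(len(states), p=P[current]))
        total_cost += cost[current, nxt]
        path.append(nxt)
    return path, total_cost

rng = np.random.default_rng(0)
path, total_cost = simulate_path(start=0, n_years=10, rng=rng)
print([states[s] for s in path], total_cost)
```

The two-state default model is the special case in which the only states are "not defaulted" and "defaulted".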

# Factor Model

## Single Factor Model

We construct the single risk factor model with latent variable Zi:

$$Z_i = -\sqrt{\rho}\, Z + \sqrt{1 - \rho}\, X_i$$

The pair-wise correlation between the latent variables of two firms i and j is:

$$corr[Z_i, Z_j] = \rho$$

Where:

• Z and Xi are independent
• Z is the systematic factor that affects all firms. If Z increases, all Zi decrease and all firms become more likely to default. Z summarizes the effects of all observable macroeconomic factors plus the effects of unobservable factors.
• Xi is the idiosyncratic factor that affects only firm i’s latent variable
• Zi ~ N(0, 1) by construction
• {Zi} are jointly normal and connected by a Gauss copula

## cDR and Vasicek

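Define Conditional (Expected) Default Rate (cDR) as the default probability of firm i given that the systematic factor Z takes the value z:

$$cDR_i(z) = E[D_i \mid Z = z] = P\left[ Z_i < \Phi^{-1}(PD_i) \mid Z = z \right] = \Phi\left( \frac{\Phi^{-1}(PD_i) + \sqrt{\rho}\, z}{\sqrt{1 - \rho}} \right)$$

This gives the final form of cDR, which is called the Vasicek formula, named after Oldrich Vasicek. Note that the Vasicek formula is monotonic in z and in PD: the higher z or PD, the higher cDR.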
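A short sketch of the Vasicek formula, assuming the sign convention used in these notes (a higher z means worse conditions), i.e. cDR = Φ((Φ⁻¹(PD) + √ρ·z) / √(1 − ρ)). The function name and the parameter values PD = 2%, ρ = 10% are illustrative.

```python
import numpy as np
from scipy.stats import norm

def vasicek_cdr(pd_, rho, z):
    """Conditional default rate given the systematic factor Z = z."""
    return norm.cdf((norm.ppf(pd_) + np.sqrt(rho) * z) / np.sqrt(1.0 - rho))

rng = np.random.default_rng(0)
z_draws = rng.standard_normal(1_000_000)
cdr = vasicek_cdr(0.02, 0.10, z_draws)

print(vasicek_cdr(0.02, 0.10, 2.0))  # a bad year (z = 2): well above PD
print(cdr.mean())                    # averaging over z recovers roughly PD = 0.02
```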

The expected default rate for firm i is always PDi, since:

$$E[D_i] = P\left[ Z_i < \Phi^{-1}(PD_i) \right] = PD_i$$

However, when Z is known, the expected default rate is cDRi. Firms are now uncorrelated as Z is known:

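If there is a large number of identical firms with uniform PD and ρ, the default rate of such an asymptotic portfolio follows the unconditional Vasicek distribution.

The unconditional Vasicek pdf can be derived with the change-of-variable technique. Note that z is eliminated, so the pdf has only the parameters PD and ρ:

$$f(x; PD, \rho) = \sqrt{\frac{1 - \rho}{\rho}} \exp\left\{ -\frac{1}{2\rho} \left[ \sqrt{1 - \rho}\, \Phi^{-1}(x) - \Phi^{-1}(PD) \right]^2 + \frac{1}{2} \left[ \Phi^{-1}(x) \right]^2 \right\}$$

The mean of cDR is PD:

$$E[cDR] = \int_0^1 x \, f(x; PD, \rho) \, dx = PD$$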
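As a numerical check of these two statements, the sketch below codes the standard Vasicek density, f(x) = √((1 − ρ)/ρ) · exp{−[√(1 − ρ)·Φ⁻¹(x) − Φ⁻¹(PD)]² / (2ρ) + [Φ⁻¹(x)]² / 2}, and integrates it with SciPy. The parameter values are illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def vasicek_pdf(x, pd_, rho):
    """Density of the default rate in a large uniform portfolio."""
    y = norm.ppf(x)
    return np.sqrt((1.0 - rho) / rho) * np.exp(
        -(np.sqrt(1.0 - rho) * y - norm.ppf(pd_)) ** 2 / (2.0 * rho) + 0.5 * y ** 2
    )

pd_, rho = 0.02, 0.10
total, _ = quad(vasicek_pdf, 0.0, 1.0, args=(pd_, rho))
mean, _ = quad(lambda x: x * vasicek_pdf(x, pd_, rho), 0.0, 1.0)
print(total)  # the density should integrate to about 1
print(mean)   # the mean should be about PD = 0.02
```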

## Multi-factor Model

Suppose that there are two jointly normal systematic risk factors ψ and ω, and that there are two groups of firms, each depending on one of the factors:

Between the two groups:

Note that:

• If corr[ψ, ω] = 1, this becomes the single factor model and that:
• If corr[ψ, ω] < 1, the cross-correlations are less than in the single factor case. This is called diversification.
• With a multi-factor model, risk becomes sub-additive, as opposed to additive in the single factor model. This means that the risk in the portfolio is less than the sum of the cDRs.
• The Moody's Factor Model attributes each Zi to about 250 factors, along with a firm-specific idiosyncratic factor.

## Basel II Capital formula

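The Bank for International Settlements is in Basel, Switzerland. The Basel Committee on Banking Supervision drafts the accords, e.g. Basel II and Basel III, that require banks to hold minimum capital.

The Basel II formula is an Asymptotic Single Risk Factor model, where the portfolio is large enough for the Law of Large Numbers to work. It generalizes the Vasicek distribution to allow diverse values of PD and ρ within the portfolio. The core of the capital requirement for credit risk is the inverse CDF of the Vasicek distribution.

Inverse Vasicek (with parameters PD and ρ), evaluated at quantile q:

$$Vasicek^{-1}(q; PD, \rho) = \Phi\left( \frac{\Phi^{-1}(PD) + \sqrt{\rho}\, \Phi^{-1}(q)}{\sqrt{1 - \rho}} \right)$$

The Basel II capital formula evaluates it at q = 0.999 with ρ = R:

$$K = LGD \times \left[ \Phi\left( \frac{\Phi^{-1}(PD) + \sqrt{R}\, \Phi^{-1}(0.999)}{\sqrt{1 - R}} \right) - PD \right] \times \frac{1 + (M - 2.5)\, b}{1 - 1.5\, b}$$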

Note:

• K is the capital requirement per dollar of wholesale loan.
• LGD is the average LGD in historical downturn conditions
• R (correlation) = $0.12 + 0.12 \times \exp(-50 \times PD)$
• b = $[0.11852 - 0.05478 \times \ln(PD)]^2$
• M is maturity
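The pieces listed above can be combined as in the following sketch. It assumes the published Basel II wholesale form, K = LGD × [Vasicek⁻¹(0.999) − PD] × (1 + (M − 2.5)·b) / (1 − 1.5·b), together with the simplified R given above; the function name and the example inputs (PD = 1%, LGD = 45%, M = 2.5 years) are illustrative.

```python
import numpy as np
from scipy.stats import norm

def basel2_capital(pd_, lgd, maturity=2.5, quantile=0.999):
    """Capital requirement K per dollar of wholesale exposure (sketch)."""
    r = 0.12 + 0.12 * np.exp(-50.0 * pd_)           # correlation R (simplified form)
    b = (0.11852 - 0.05478 * np.log(pd_)) ** 2      # maturity slope b, natural log
    # Inverse Vasicek: default rate at the chosen quantile of the systematic factor
    stressed_dr = norm.cdf(
        (norm.ppf(pd_) + np.sqrt(r) * norm.ppf(quantile)) / np.sqrt(1.0 - r)
    )
    maturity_adj = (1.0 + (maturity - 2.5) * b) / (1.0 - 1.5 * b)
    # Unexpected loss only: the expected loss LGD * PD is subtracted
    return lgd * (stressed_dr - pd_) * maturity_adj

k = basel2_capital(pd_=0.01, lgd=0.45)
print(k)  # roughly 7% of exposure for these inputs
```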

Making sense of the Basel II formula:

• The capital requirement is for loss, as opposed to only default, hence the formula multiplies by LGD.
• The capital requirement is for unexpected loss, hence the formula subtracts the expected loss LGD × PD. The expected portion is handled by bank reserves.
• Loans might deteriorate without defaulting, hence a maturity adjustment is added to impose higher capital on longer-maturity loans.
• The estimation of PD and LGD is performed by the banks and supervised by bank supervisors.

# Estimation, Statistical Test and Overfit

## Estimating PD

Firms differ widely in their credit quality, and PD tends to change over time as well, so a firm's PD is neither known nor fixed. We analyze analogous firms with identical credit ratings to estimate PD.

Method 1, for all A-rated firms in the dataset:

Method 2, for all A-rated firms in the dataset:

Method 3, estimate PD as a parameter in a pdf describing A-rated firms. This tries to find a distribution that best fits the data. We will focus on this method.

## Method Of Moments

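Given a dataset {Xi}N of observed default rates, we set the moments of the Vasicek distribution equal to the moments of the data.

First moment:

$$\bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_i = PD$$

Second moment (unbiased, using N-1 in denominator), where $\Phi_2(\cdot, \cdot; \rho)$ is the bivariate standard normal cdf with correlation ρ:

$$\frac{1}{N - 1} \sum_{i=1}^{N} (X_i - \bar{X})^2 = \Phi_2\left( \Phi^{-1}(PD), \Phi^{-1}(PD); \rho \right) - PD^2$$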

Note:

• The method of moments matches the broad features of the distribution to the data
• The solution is not unique. Choices can be made between central moment/raw moment, lower moment/higher moment.
• By Jensen’s Inequality, functions of moments are not moments of functions

## Maximum Likelihood Estimation

The MLE method chooses the parameter values that make the data most likely under the assumed distribution. MLE matches the distribution to the data as a whole, as opposed to the method of moments (M.o.M.), which matches only the moments. In other words, MLE fits the pdf to the dataset.

When the data are not highly dispersed, however, the MLE estimate tends to be close to the M.o.M. estimate.

The MLE is in general a biased estimator. It chooses the parameters that maximize the likelihood function. Given a dataset {Xi}N, we assume the true default rates follow a Vasicek distribution with pdf f. The likelihood function is:

$$L(PD, \rho) = \prod_{i=1}^{N} f(X_i; PD, \rho)$$

Often we try to maximize the log-likelihood function, i.e. find PD and ρ such that:

$$(\widehat{PD}, \hat{\rho}) = \arg\max_{PD, \rho} \sum_{i=1}^{N} \ln f(X_i; PD, \rho)$$
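A sketch of this maximization on synthetic data, assuming SciPy. The default rates are simulated from a Vasicek distribution with known parameters (PD = 2%, ρ = 10%, chosen only for illustration) and then refitted. The optimizer works on unconstrained variables, with PD = Φ(a) and ρ = expit(b), so that both parameters stay inside (0, 1); that reparameterization and the use of Nelder-Mead are implementation choices, not something the notes prescribe.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit
from scipy.stats import norm

def vasicek_logpdf(x, pd_, rho):
    """Log of the Vasicek density at default rate(s) x."""
    y = norm.ppf(x)
    return (0.5 * np.log((1.0 - rho) / rho)
            - (np.sqrt(1.0 - rho) * y - norm.ppf(pd_)) ** 2 / (2.0 * rho)
            + 0.5 * y ** 2)

def fit_vasicek_mle(x):
    """Maximize the log-likelihood over (PD, rho); returns the two estimates."""
    def neg_loglik(theta):
        return -np.sum(vasicek_logpdf(x, norm.cdf(theta[0]), expit(theta[1])))
    start = np.array([norm.ppf(np.mean(x)), -2.0])   # rho starts near 0.12
    res = minimize(neg_loglik, start, method="Nelder-Mead")
    return float(norm.cdf(res.x[0])), float(expit(res.x[1]))

# Synthetic default rates from a Vasicek distribution with known parameters
rng = np.random.default_rng(0)
true_pd, true_rho = 0.02, 0.10
z = rng.standard_normal(2000)
rates = norm.cdf((norm.ppf(true_pd) + np.sqrt(true_rho) * z) / np.sqrt(1.0 - true_rho))

pd_hat, rho_hat = fit_vasicek_mle(rates)
print(pd_hat, rho_hat)  # should land close to 0.02 and 0.10
```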

## Hypothesis Testing & Wilks’ Theorem

We do not assert truth, as truth is often unknown. With a given set of data, we can only assert that some models are better at predicting the future behavior of similar data.

We call the simpler model the null hypothesis and the more complicated one the alternative hypothesis. The null is generally nested within the alternative, i.e. the alternative becomes the null when some parameters are set to certain values.

We prefer the null, because it is simpler, and by doing so we avoid Type 1 error, which is the rejection of a true null.

Hence we only reject the null if the alternative fits the data significantly better through a statistical test.

Wilks' Theorem asserts that if:

• The amount of data is asymptotically large
• The null hypothesis is true

Then the statistic D has a distribution that approaches the χ² distribution (with df = number of extra parameters in the alternative), given dataset {Xi}N:

$$D = -2 \ln \Lambda = 2 \left[ \ln L_{H_1} - \ln L_{H_0} \right]$$

The likelihood ratio Λ is defined as follows, with each likelihood maximized over its own parameters. It is less than or equal to 1 because the alternative is more flexible and can therefore assign at least as much probability density to the given data:

$$\Lambda = \frac{\max L_{H_0}}{\max L_{H_1}} \le 1$$

A large D statistic is a tail observation: either the null is not true, or the null is true and something unlikely has happened (in which case rejecting it is a Type 1 error). We reject the null when:

$$D > \chi^2_{df}(1 - \alpha)$$

For example, when df = 1 and α = 0.05, the critical value is 3.84, so we reject the null with 95% confidence when:

$$D > 3.84$$
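The decision rule can be sketched as a small helper, assuming SciPy and the usual form of the statistic, D = 2 × (maximized log-likelihood of the alternative − maximized log-likelihood of the null). The function name and the log-likelihood values in the example are made up.

```python
from scipy.stats import chi2

def likelihood_ratio_test(loglik_null, loglik_alt, extra_params, alpha=0.05):
    """Return the D statistic, the critical value, the p-value and the decision."""
    d_stat = 2.0 * (loglik_alt - loglik_null)
    critical = float(chi2.ppf(1.0 - alpha, df=extra_params))
    p_value = float(chi2.sf(d_stat, df=extra_params))
    return d_stat, critical, p_value, bool(d_stat > critical)

# Hypothetical maximized log-likelihoods; the alternative has one extra parameter
print(likelihood_ratio_test(-100.0, -97.0, extra_params=1))  # D = 6 > 3.84: reject
print(likelihood_ratio_test(-100.0, -99.0, extra_params=1))  # D = 2 < 3.84: keep the null
```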

## Overfit

An overfit model makes worse forecasts than a simpler model.

We assume the population data (X, Y) follow a standard bivariate normal distribution:

$$\begin{pmatrix} X \\ Y \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right)$$

Given ρ, the population regression line is:

$$E[Y \mid X] = \rho X$$

The sample regression line is:

$$\hat{Y} = a + bX$$

From a sample of 30 observations of (X, Y), ordinary least squares (OLS) is performed to find the in-sample p-value for the coefficient and R². MSE is used to evaluate forecast error.

• When ρ = 0.8, the sample regression line (yellow) is close to the population regression line (red).
• When ρ = 0.2, the sample regression line does NOT match well. This shows that when the population has a weak relationship (ρ = 0.2), estimates of the slope are more dispersed.

Now we look at the relationship between statistical significance and MSE. The population Mean-Squared Error (MSE) is an out-of-sample measure of forecast error. Given a fitted line with intercept a and slope b, the population MSE does NOT depend on any other in-sample data:

$$MSE(a, b) = E\left[ (Y - a - bX)^2 \right] = a^2 + b^2 - 2b\rho + 1$$

Setting the partial derivatives with respect to a and b to zero shows that the population regression (b = ρ, a = 0) minimizes MSE, at the value 1 − ρ². We can also see that the higher ρ is, the lower this minimum MSE.
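The experiment can be reproduced in outline as follows. The sketch assumes X and Y are standard bivariate normal with correlation ρ, so that the population MSE of a fitted line is a² + b² − 2bρ + 1; the function names and the seed are illustrative, and the p-value and R² calculations are left out for brevity.

```python
import numpy as np

def population_mse(a, b, rho):
    """E[(Y - a - bX)^2] for standard bivariate normal (X, Y) with correlation rho."""
    return a ** 2 + b ** 2 - 2.0 * b * rho + 1.0

def sample_regression(rho, n_obs, rng):
    """Draw n_obs points from the population and fit OLS; returns (a, b)."""
    x = rng.standard_normal(n_obs)
    y = rho * x + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n_obs)
    b, a = np.polyfit(x, y, 1)   # polyfit returns the slope first, then the intercept
    return a, b

rng = np.random.default_rng(0)
for rho in (0.8, 0.2):
    a, b = sample_regression(rho, 30, rng)
    # Out-of-sample MSE of the fitted line versus the best achievable 1 - rho^2
    print(rho, population_mse(a, b, rho), 1.0 - rho ** 2)
```

Because the population MSE equals a² + (b − ρ)² + 1 − ρ², any sample line is penalized by exactly how far its coefficients land from (0, ρ).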

A regression is significant (at 95% confidence) if the p-value for the coefficient b is less than 0.05.

We have observed that when the population has a weak relationship (ρ = 0.2):

• Forecasts by significant regressions tend to have greater MSE.
• Forecasts by regressions with higher R-square tend to have greater MSE.

This is because the strong relationship suggested by the regression does NOT forecast the weak population relationship well.

When the population has a strong relationship (ρ = 0.8), however, a significant regression or a high R-square holds up out-of-sample.

# Conditional LGD Risk

## cLGD

The history of bond LGD shows that LGD is elevated when default rate is elevated. The elevation is shown to be moderate and similar across different debt types:

It is important to model LGD appropriately in different economic conditions. Like cDR, we define cLGD:

Note that:

There are two ways to calculate ELGD:

Furthermore,

Where:

• EcLGD is the average LGD over conditions
• ELGD is the average LGD over different loans
• ELGD is higher than EcLGD because when cLGD is higher, cDR/PD is also higher, which increases the probability weight on the higher cLGDs. In EcLGD, by contrast, a higher cLGD does not receive a higher weight.

## Frye-Jacobs

Modeling cLGD separately from cDR introduces complexity and potential overfit to the cLoss model. Instead, the Frye-Jacobs LGD function assumes that both cDR and cLoss follow Vasicek distributions, and infers cLGD as a function of cDR.

Frye-Jacobs assumptions:

1. cDR and cLoss are comonotonic.

• If cDR goes up, cLoss must go up.
• If cDR is in its qth quantile, then cLoss must also be in its qth quantile. This implies that cLGD is a function of cDR, namely the ratio of the two at the same quantile:

$$cLGD = \frac{cLoss}{cDR}$$

2. cDR follows a Vasicek distribution, which stems from the simplest portfolio structure:

• Large number of firms
• Each firm has the same PD
• Each pair-wise ρ is the same (same PDJ)
• Gauss copula

3. The distribution of cLoss does NOT depend on the definition of default.

• This implies the distribution of cLoss does not have separate parameters for PD and ELGD. It does have a parameter EL.

4. cLoss follows a Vasicek distribution.

5. cLoss and cDR have the same ρ parameter.

• This ensures that the LGD function is monotonic.

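Finally, these assumptions give the Frye-Jacobs cLGD function:

$$cLGD = \frac{\Phi\left[ \Phi^{-1}(cDR) - k \right]}{cDR}, \qquad k = \frac{\Phi^{-1}(PD) - \Phi^{-1}(EL)}{\sqrt{1 - \rho}}$$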

Observations:

1. cLGD is strictly monotonic with range (0, 1), for all k
2. cLGD increases slowly, and similarly for all k, at low cDR
3. Elasticity is greatest for loans with low LGD.
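These observations can be checked numerically. The sketch below uses the Frye-Jacobs function in its published form, cLGD = Φ[Φ⁻¹(cDR) − k] / cDR with k = (Φ⁻¹(PD) − Φ⁻¹(EL)) / √(1 − ρ); the function name and the parameter values (PD = 3%, EL = 1.5%, ρ = 10%) are illustrative.

```python
import numpy as np
from scipy.stats import norm

def frye_jacobs_clgd(cdr, pd_, el, rho):
    """Conditional LGD implied by a conditional default rate (sketch)."""
    k = (norm.ppf(pd_) - norm.ppf(el)) / np.sqrt(1.0 - rho)
    return norm.cdf(norm.ppf(cdr) - k) / cdr

grid = np.linspace(0.001, 0.5, 50)   # range of conditional default rates
vals = frye_jacobs_clgd(grid, pd_=0.03, el=0.015, rho=0.10)
print(vals[0], vals[-1])             # cLGD rises as cDR rises
```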

## Frye-Jacobs: Develop Alternative Hypothesis

Introduce an additional sensitivity parameter to test the slope of the LGD function.

We know that:

In integration form:

Bring in the Frye-Jacobs cLGD function:

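Note that EL appears on both the left-hand and right-hand sides. Divide both occurrences of EL by $ELGD^a$:

Note that we have identified a new LGD function:

$$cLGD = ELGD^a \times \frac{\Phi\left[ \Phi^{-1}(cDR) - k_a \right]}{cDR}, \qquad k_a = \frac{\Phi^{-1}(PD) - \Phi^{-1}\left( EL / ELGD^a \right)}{\sqrt{1 - \rho}}$$

Analyzing the choice of a:

• When a = 0, the cLGD function is the Frye-Jacobs formula.
• When a = 1, EL / ELGD = PD, so $k_a = 0$ and cLGD = ELGD, which implies cLGD does not depend on conditions.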

## Frye-Jacobs: Hypothesis Test

We introduce a finite portfolio, which brings randomness into the D's and the $LGD^{dollar}$'s.

• We assume the finite portfolio is uniform and all N loans have the same PD and ρ.
• We assume that, given the portfolio cDR, the number of defaults is binomial: $\sum D \mid cDR \sim Binomial(N, cDR)$
• We assume that LGD is normally distributed around cLGD, with σ = 0.2. Note that under this assumption, ELGD = cLGD, which corresponds to a = 1.

Under a finite portfolio, the probability of 0 defaults is:
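One way to evaluate this probability numerically is sketched below, under the assumptions stated above: conditional on the systematic factor z, each of the N loans survives with probability 1 − cDR(z), so the unconditional probability of no defaults is the integral of (1 − cDR(z))^N against the standard normal density. The function name and the example values are illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def prob_zero_defaults(n_loans, pd_, rho):
    """P[no defaults] in a uniform portfolio of n_loans, integrating over z."""
    def integrand(z):
        cdr = norm.cdf((norm.ppf(pd_) + np.sqrt(rho) * z) / np.sqrt(1.0 - rho))
        return (1.0 - cdr) ** n_loans * norm.pdf(z)
    value, _ = quad(integrand, -8.0, 8.0)   # the normal tails beyond 8 are negligible
    return value

print(prob_zero_defaults(10, 0.05, 0.0))   # independent loans: about 0.95 ** 10
print(prob_zero_defaults(10, 0.05, 0.2))   # correlation makes zero defaults more likely
```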

When conditional on cDR and Σ D > 0, the average portfolio LGD rate is normal:

Let Y ~ N(0, 1) be a standard normal variable, then LGD becomes:

Now calculate Loss based on DR and LGD:

Use change-of-variable technique to calculate the pdf for Loss:

Where:

Finally, the pdf of loss conditional on Σ D and cDR:

Removing the conditional, the distribution of loss in a uniform portfolio, with N loans, same PD and ρ and the cLGD function, becomes:

Here is a plot of the unconditional loss density in a finite (N = 10) portfolio in red and the loss density in an infinite portfolio (Vasicek) in blue (note that the plot uses D to denote Σ D):

Now that we have the pdf for loss, we can test the hypothesis:

• H0: a = 0
• H1: a = MLE based on Moody’s loss data

The result is MLE(a) = 0.01 based on all loan data, and the test fails to reject the null. The same holds for other bond data and for combined bond and loan data. We conclude that the Frye-Jacobs model is consistent with Moody’s data.

# Vendor Estimation

## Distance-To-Default and EDF

Robert Merton argues that:

• the default of firm i depends on its asset return
• a firm defaults if and only if the value of its assets drops below the value of its liabilities, i.e. its asset return is too low
• the joint default of firms i and j depends on their PDs and their asset return correlation

Moody’s suggests that a loan contains an option to default, and attempts to use risk-neutral probability to estimate the probability of default. In the context of a put:

Under Moody’s assumption, the firm has an option to default once the value of its assets drops below its liabilities. Here, the liability is the strike price. Moody’s uses D, the “default point”, defined as short-term debt plus half of long-term debt, to represent the liability. DD stands for Distance-To-Default, suggested by Merton. So the probability of default is:

$$PD = \Phi(-DD)$$
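For illustration, the sketch below computes a default point, a Distance-To-Default and the corresponding Φ(−DD). It assumes the textbook Merton form with lognormal asset values, DD = [ln(A / D) + (μ − σ²/2)·T] / (σ·√T), which is not necessarily the exact specification Moody's uses; the function names and all balance-sheet numbers are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def default_point(short_term_debt, long_term_debt):
    """Default point: short-term debt plus half of long-term debt."""
    return short_term_debt + 0.5 * long_term_debt

def distance_to_default(asset_value, dpt, drift, asset_vol, horizon=1.0):
    """Number of standard deviations between asset value and the default point."""
    return (np.log(asset_value / dpt) + (drift - 0.5 * asset_vol ** 2) * horizon) / (
        asset_vol * np.sqrt(horizon)
    )

# Hypothetical firm: assets 120, short-term debt 60, long-term debt 80
dpt = default_point(60.0, 80.0)
dd = distance_to_default(120.0, dpt, drift=0.05, asset_vol=0.25)
pd_merton = norm.cdf(-dd)
print(dpt, dd, pd_merton)
```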

Moody’s then estimates the value and volatility of the assets (unobservable) based on the value and volatility of the market capitalization (observable).

However, since Φ(-DD) gives a very poor estimate of the default probability, Moody’s sets the EDF (Expected Default Frequency) of a firm equal to the average historical default rate of firms with the same Distance-To-Default. In other words, an EDF uses DD to find historical analogs of current firms.

## Correlation

Merton assumes that the correlation ρ between the latent variables Z is equal to the asset return correlation r.

However, data suggest that the correlation estimated from credit data is less than the correlation based on asset returns. Hence a credit portfolio model that uses asset correlation to estimate ρ overstates credit risk.

References:

• FINM-36702 Portfolio Management II, Jon Frye, University of Chicago