# Notes on Credit Risk

# Standard Simulation Model on Credit Portfolio

## Credit Risk

Lenders, such as banks, are subject to many kinds of risk, listed below. Among them, credit risk is the most likely to cause bank failure.

- Credit risk
- Market risk
- Operational risk
- Reputation risk

Each loan is part of a legal agreement that requires the borrower to pay interest and repay principal on schedule. Some borrowers are also required to obey specified `covenants`, such as maintaining earnings above a certain threshold.

If the borrower fails to follow the agreement, the lender holds the borrower to be in default, which can be a `money default` or a `covenant default`. A purchaser of public bonds experiences only money default.

At default, the loan agreement calls for fees to be paid by the borrower, gives the bank the power to seize collateral (for `secured loans`), and has a `cross default` provision (under which all of the borrower's loans are in default once one loan is in default).

In the 20th century, most banks did not define default until they discovered a model that could help them manage credit risk.

## Rating Agencies

There are 3 major `Nationally Recognized Statistical Rating Organizations` (NRSROs), which firms pay to rate their bonds in order to increase liquidity:

- Standard & Poor's
- Moody’s
- Fitch

Under S&P ratings, the grades are:

- Investment grade: AAA, AA, A, BBB
- Non-investment grade: BB, B, CCC, CC
- Selectively defaulted: SD
- Defaulted: D

## D and PD

Let `D` be the default indicator of a loan, taking only two values: 0 and 1. `PD` is the annual probability of default.

By mathematical identity:

- Knowing PD, we can simulate D from a Bernoulli distribution with parameter PD.
- Given data on D, we can calculate the implied PD.

In a portfolio of N firms, the portfolio default rate, DR, equals:

$$DR = \frac{1}{N}\sum_{i=1}^{N} D_i$$

## Exposure, Recovery and LGD

`Exposure` is the amount owed by the borrower. `Recovery` is measured in either of two ways:

- Market price of the loan at the time of default
- Discounted future cash flows back to the time of default

`LGD` (Loss Given Default) is a random variable with values usually between 0 and 1:

$$LGD = 1 - \frac{Recovery}{Exposure}$$

For a defaulted loan, recovery (and hence LGD) can be measured in either of the two ways above. For a current loan, LGD is not yet known and is described by a distribution, whose expectation is written as:

$$ELGD = E[LGD]$$

US investment grade bond LGD is about 0.20%, while non-investment grade is about 3.60%. Bank loans are almost always senior to bonds and have lower LGD.

## Loss and EL

`Loss` is measured as a fraction of exposure:

$$Loss = D \times LGD$$

`EL` is the expected loss. Because D and LGD are assumed to be independent:

$$EL = E[D \times LGD] = E[D] \times E[LGD] = PD \times ELGD$$

Lenders often need to estimate `EL` and include it in the spread they charge.

## Change Of Variable

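LGD is often measured as a fraction of exposure. To change the measure to a dollar amount, we use the change-of-variable technique together with the chain rule.

Given the pdf of LGD:

$$f_{LGD}(x), \quad 0 \le x \le 1$$

We define the function g such that:

$$LGD^{dollar} = g(LGD) = Exposure \times LGD$$

Hence the inverse function is:

$$g^{-1}(y) = \frac{y}{Exposure}$$

Its derivative can be expressed as:

$$\frac{d\, g^{-1}(y)}{dy} = \frac{1}{Exposure}$$

By definition:

$$F_{LGD^{dollar}}(y) = P\left[LGD^{dollar} \le y\right] = P\left[LGD \le g^{-1}(y)\right] = F_{LGD}\left(g^{-1}(y)\right)$$

Taking the derivative of both sides and applying the chain rule:

$$f_{LGD^{dollar}}(y) = f_{LGD}\left(g^{-1}(y)\right) \cdot \frac{d\, g^{-1}(y)}{dy}$$

Finally:

$$f_{LGD^{dollar}}(y) = \frac{1}{Exposure}\, f_{LGD}\left(\frac{y}{Exposure}\right)$$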
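As a numerical check of the result f_dollar(y) = f_LGD(y / Exposure) / Exposure, the sketch below assumes a hypothetical Beta(2, 3) LGD density and a hypothetical exposure of $250,000 (neither comes from these notes) and integrates the implied dollar density on a grid. If the change of variable is right, the dollar density should integrate to 1 and its mean should equal Exposure × E[LGD].

```python
import numpy as np
from math import gamma

# Assumptions (hypothetical): LGD ~ Beta(2, 3) and Exposure = $250,000.
EXPOSURE = 250_000.0
A, B = 2.0, 3.0

def lgd_pdf(x):
    """Beta(A, B) density of LGD on [0, 1]."""
    return gamma(A + B) / (gamma(A) * gamma(B)) * x ** (A - 1) * (1 - x) ** (B - 1)

def dollar_lgd_pdf(y):
    """Density of LGD in dollars via the change of variable y = Exposure * LGD."""
    return lgd_pdf(y / EXPOSURE) / EXPOSURE

y = np.linspace(0.0, EXPOSURE, 100_001)
f = dollar_lgd_pdf(y)
dy = y[1] - y[0]
total_mass = float(np.sum((f[1:] + f[:-1]) / 2) * dy)                       # trapezoid rule
mean_dollar = float(np.sum((y[1:] * f[1:] + y[:-1] * f[:-1]) / 2) * dy)
print(f"integral of dollar density: {total_mass:.6f}")
print(f"mean dollar LGD: {mean_dollar:,.0f} (Exposure x E[LGD] = {EXPOSURE * A / (A + B):,.0f})")
```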

## Simulate Loss On A Single Loan

We know that:

$$Loss = D \times LGD$$

To simulate loss, we first simulate D:

```
Draw x ~ Uniform[0, 1]
If x < PD, set D = 1; otherwise set D = 0
```

Then simulate LGD based on the pdf of LGD, multiply each D by its LGD to get Loss, and repeat the process to produce a distribution of Loss.
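A minimal Python sketch of this procedure. It assumes PD = 2% and a Beta(2, 3) LGD distribution (mean 0.4), both hypothetical choices, with LGD drawn independently of D:

```python
import numpy as np

# Sketch: loss distribution of one loan, Loss = D * LGD.
# Assumptions (hypothetical): PD = 2%, LGD ~ Beta(2, 3), D independent of LGD.
PD = 0.02
N_SIMS = 200_000
rng = np.random.default_rng(seed=42)

def simulate_single_loan_loss(pd_, n_sims, rng):
    x = rng.uniform(0.0, 1.0, size=n_sims)      # draw x ~ Uniform[0, 1]
    d = (x < pd_).astype(float)                  # D = 1 if x < PD, else 0
    lgd = rng.beta(2.0, 3.0, size=n_sims)        # hypothetical LGD distribution
    return d * lgd                               # Loss as a fraction of exposure

losses = simulate_single_loan_loss(PD, N_SIMS, rng)
print(f"simulated EL: {losses.mean():.5f} (PD x ELGD = {PD * 0.4:.5f})")
print(f"share of zero-loss paths: {(losses == 0).mean():.4f}")
```

Because the loss is zero on roughly 98% of paths, EL alone says little about the tail; the full simulated distribution is what the later portfolio models build on.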

## Simulate Portfolio Loss On N Independent Loans

Assume that the defaults of the N loans are independent and that every loan has the same probability of default, PD:

$$D_i \sim Bernoulli(PD), \quad i = 1, \dots, N, \ \text{independent}$$

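Then the total number of defaults follows a binomial distribution:

$$\sum_{i=1}^{N} D_i \sim Binomial(N, PD), \qquad Var\left[\sum_{i=1}^{N} D_i\right] = N \cdot PD \cdot (1 - PD)$$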

However, historical data show that the variance of the number of defaults is much higher than that of the binomial distribution. Hence default correlation needs to be introduced.
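To see what the binomial benchmark looks like, the sketch below simulates independent Bernoulli defaults for a hypothetical portfolio (N = 100, PD = 2%, 20,000 simulated years) and compares the simulated mean and variance of the number of defaults with N × PD and N × PD × (1 - PD):

```python
import numpy as np

# Sketch: defaults in a portfolio of N independent loans with a common PD.
# N, PD and the number of simulated years are hypothetical.
N, PD, N_YEARS = 100, 0.02, 20_000
rng = np.random.default_rng(seed=7)

# Each row is one simulated year; each column is one loan's indicator D_i ~ Bernoulli(PD).
defaults = rng.uniform(size=(N_YEARS, N)) < PD
num_defaults = defaults.sum(axis=1)               # total defaults per simulated year

print(f"mean:     simulated {num_defaults.mean():.3f} vs N*PD        {N * PD:.3f}")
print(f"variance: simulated {num_defaults.var():.3f} vs N*PD*(1-PD) {N * PD * (1 - PD):.3f}")
```

A historical default-count series whose variance is far above this benchmark is the evidence for correlation mentioned above.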

## Simulate Portfolio Loss On N Correlated Loans

Assume that there is a latent, unobserved standard normal variable z_{i} that is responsible for the default of firm i, i.e., firm i defaults if:

$$z_i < \Phi^{-1}(PD_i)$$

Assume that the latent variables of any two firms i and j are jointly normal. Denote the correlation between z_{i} and z_{j} by:

$$\rho_{i,j} = corr[z_i, z_j]$$

Let r_{i,j} be the correlation between the asset returns of firms i and j; we know that almost certainly:

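Denote `PDJ` as the probability that both firms i and j default:

$$PDJ = P[D_i = 1, D_j = 1] = P\left[z_i < \Phi^{-1}(PD_i),\ z_j < \Phi^{-1}(PD_j)\right] = \Phi_2\left(\Phi^{-1}(PD_i), \Phi^{-1}(PD_j); \rho_{i,j}\right)$$

where Φ_2(·, ·; ρ) is the bivariate standard normal CDF with correlation ρ.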

To calculate PDJ with Python:

```python
import numpy as np
```

Returns:

```
Pr[D1=1, D2=1]: 0.0515
```
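The original listing is only partly preserved, so the following is a separate sketch rather than a reconstruction of it. It computes PDJ two ways for hypothetical inputs (PD_i = PD_j = 10%, ρ = 0.3): by integrating over z_i, using z_j | z_i ~ N(ρ z_i, 1 - ρ^2), and by Monte Carlo. It is not expected to reproduce the 0.0515 above, whose inputs are not stated.

```python
import numpy as np
from statistics import NormalDist

N01 = NormalDist()
phi_cdf = np.vectorize(N01.cdf)

def pdj_integral(pd_i, pd_j, rho, n_grid=20_001):
    """PDJ = integral over z_i < k_i of phi(z_i) * P[z_j < k_j | z_i] (trapezoid rule)."""
    k_i, k_j = N01.inv_cdf(pd_i), N01.inv_cdf(pd_j)
    z = np.linspace(-10.0, k_i, n_grid)
    density = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    cond = phi_cdf((k_j - rho * z) / np.sqrt(1.0 - rho**2))   # z_j | z_i ~ N(rho z_i, 1 - rho^2)
    y = density * cond
    return float(np.sum((y[1:] + y[:-1]) * np.diff(z)) / 2)

def pdj_simulation(pd_i, pd_j, rho, n_sims=1_000_000, seed=0):
    """PDJ by Monte Carlo: share of draws where both latent variables fall below their thresholds."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n_sims)
    both = (z[:, 0] < N01.inv_cdf(pd_i)) & (z[:, 1] < N01.inv_cdf(pd_j))
    return float(both.mean())

pd_i, pd_j, rho = 0.10, 0.10, 0.30                             # hypothetical inputs
print(f"Pr[D1=1, D2=1] (integral):   {pdj_integral(pd_i, pd_j, rho):.4f}")
print(f"Pr[D1=1, D2=1] (simulation): {pdj_simulation(pd_i, pd_j, rho):.4f}")
```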

Now that we have the D_{i}'s, we can simulate the portfolio loss rate, given the LGD distribution and the exposure of each firm.

Denote `Dcorr` as the correlation between D_{i} and D_{j}:

$$Dcorr = \frac{PDJ - PD_i\, PD_j}{\sqrt{PD_i (1 - PD_i)\, PD_j (1 - PD_j)}}$$

Note that, holding PD_{i} and PD_{j} fixed:

- greater `Dcorr` => greater `PDJ`
- greater ρ => greater `PDJ`
- ρ between -1 and 1 => PDJ between 0 and min[PD_{i}, PD_{j}]
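The sketch below illustrates these relationships for hypothetical PDs of 5% and 2%. It computes PDJ with the same one-dimensional integral over z_i as a helper (`pdj`, a name introduced here for illustration) and converts PDJ to Dcorr with the Dcorr formula.

```python
import numpy as np
from statistics import NormalDist

N01 = NormalDist()
phi_cdf = np.vectorize(N01.cdf)

def pdj(pd_i, pd_j, rho, n_grid=20_001):
    """Bivariate normal probability P[z_i < k_i, z_j < k_j], integrated along z_i."""
    k_i, k_j = N01.inv_cdf(pd_i), N01.inv_cdf(pd_j)
    z = np.linspace(-10.0, k_i, n_grid)
    y = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi) * phi_cdf((k_j - rho * z) / np.sqrt(1 - rho**2))
    return float(np.sum((y[1:] + y[:-1]) * np.diff(z)) / 2)   # trapezoid rule

def dcorr(pd_i, pd_j, rho):
    """Correlation between the default indicators D_i and D_j."""
    joint = pdj(pd_i, pd_j, rho)
    return (joint - pd_i * pd_j) / np.sqrt(pd_i * (1 - pd_i) * pd_j * (1 - pd_j))

pd_i, pd_j = 0.05, 0.02                                        # hypothetical PDs
rhos = [-0.99, -0.5, 0.0, 0.3, 0.6, 0.99]
for rho in rhos:
    print(f"rho = {rho:+.2f}: PDJ = {pdj(pd_i, pd_j, rho):.5f}, Dcorr = {dcorr(pd_i, pd_j, rho):+.4f}")
```

As ρ approaches 1, PDJ approaches min[PD_i, PD_j] = 2%, while Dcorr stays well below 1 because the two PDs differ.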

## Copula

When we model three or more firms, pair-wise correlations are not enough to determine the entire distribution of outcomes. For example, there are N PDs and N(N-1)/2 pair-wise correlations, while there are 2^{N} outcomes whose probabilities we want to calculate. Hence we introduce the `Gauss copula`, which helps describe the group-wise dependence.

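Consider a set of multivariate normal variables with standard normal marginals and correlation matrix Σ:

$$(Z_1, Z_2, \dots, Z_N) \sim N(\mathbf{0}, \Sigma)$$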

Their quantiles are uniformly distributed by definition:

$$\Phi(Z_i) \sim Uniform[0, 1], \quad i = 1, \dots, N$$

The `copula` of the set (Z_{1}, Z_{2}, …, Z_{N}) is defined as the joint cumulative distribution function of (Φ(Z_{1}), Φ(Z_{2}), …, Φ(Z_{N})):

$$C(u_1, \dots, u_N) = P\left[\Phi(Z_1) \le u_1, \dots, \Phi(Z_N) \le u_N\right]$$

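The `Gauss copula` is as follows. Note that among all possible copulas, the Gauss copula is the one supported by the Central Limit Theorem:

$$C_{\Sigma}(u_1, \dots, u_N) = \Phi_{\Sigma}\left(\Phi^{-1}(u_1), \dots, \Phi^{-1}(u_N)\right)$$

where Φ_Σ is the multivariate standard normal CDF with correlation matrix Σ.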

In fact, the copula does not contain any information about the marginal distributions. Here we set the marginal distribution F_{Z} to standard normal only as an example, but it can be any continuous distribution such that:

And so:

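In the context of default modeling, we assume that each company's default follows a Bernoulli distribution and simulate it with a standard normal latent variable:

$$D_i = \begin{cases} 1 & \text{if } Z_i < \Phi^{-1}(PD_i) \\ 0 & \text{otherwise} \end{cases}$$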

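By definition, the probability that all firms default at the same time is:

$$P[D_1 = 1, \dots, D_N = 1] = P\left[Z_1 < \Phi^{-1}(PD_1), \dots, Z_N < \Phi^{-1}(PD_N)\right]$$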

Note that, given only a pair-wise correlation matrix Σ, this probability can still take a range of values between 0 and the lowest single-firm default probability.

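Now we assume that all firms' z's are connected by the `Gauss copula`, which implies a single value for the probability of all defaulting:

$$P[D_1 = 1, \dots, D_N = 1] = \Phi_{\Sigma}\left(\Phi^{-1}(PD_1), \dots, \Phi^{-1}(PD_N)\right)$$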

With Python, we can either numerically evaluate the integral or use simulation to calculate the probability that all firms default at the same time.

```python
import numpy as np
```

Returns:

```
Probability Of All Default: 0.017
```
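As with the PDJ listing, the original code is truncated, so here is an independent Monte Carlo sketch. It draws correlated latent variables through a Cholesky factor of a hypothetical 3 × 3 correlation matrix and counts the paths on which every Z_i falls below Φ^{-1}(PD_i). The PDs and correlations are illustrative and will not reproduce the 0.017 above.

```python
import numpy as np
from statistics import NormalDist

def prob_all_default(pds, corr, n_sims=500_000, seed=1):
    """P[all firms default] under a Gauss copula, estimated by simulation."""
    pds = np.asarray(pds, dtype=float)
    thresholds = np.array([NormalDist().inv_cdf(p) for p in pds])  # Phi^{-1}(PD_i)
    chol = np.linalg.cholesky(np.asarray(corr, dtype=float))      # Sigma = L L^T
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_sims, len(pds))) @ chol.T           # rows ~ N(0, Sigma)
    all_default = np.all(z < thresholds, axis=1)                   # D_i = 1 for every i
    return float(all_default.mean())

pds = [0.10, 0.15, 0.20]                  # hypothetical PDs
corr = [[1.0, 0.5, 0.4],                  # hypothetical pair-wise correlation matrix
        [0.5, 1.0, 0.3],
        [0.4, 0.3, 1.0]]
print(f"Probability Of All Default: {prob_all_default(pds, corr):.4f}")
print(f"Independent benchmark:      {np.prod(pds):.4f}")
```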

Note that, compared to other copulas, the Gauss copula requires only the PDs and a pair-wise correlation matrix to specify the full joint distribution of defaults. In most cases the Gauss copula itself has not been shown to be invalid, whereas the calibration of the marginals and the correlation matrix has often proved erroneous.

## Simulate Rating Transitions

The default model has only two states, 0 and 1. To simulate rating transitions, we require two matrices:

- Transition matrix: $P[i \rightarrow j], \forall i, j$
- Cost matrix, e.g., the loss due to deterioration of borrowers: $cost[i \rightarrow j], \forall i, j$
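A minimal sketch of one-period rating transitions, assuming a hypothetical three-state scale (A, B and an absorbing default state D) with made-up transition and cost matrices. Each loan's next state is drawn from its row of the transition matrix, and the portfolio cost is the sum of cost[i → j] over loans, which can be compared with its analytic expectation.

```python
import numpy as np

# States, transition probabilities and costs below are hypothetical.
STATES = ["A", "B", "D"]                     # D (default) is absorbing
P = np.array([[0.90, 0.09, 0.01],            # P[i -> j]; each row sums to 1
              [0.05, 0.85, 0.10],
              [0.00, 0.00, 1.00]])
COST = np.array([[0.00, 0.03, 0.45],         # cost[i -> j] as a fraction of exposure
                 [0.00, 0.00, 0.45],
                 [0.00, 0.00, 0.00]])

def simulate_transitions(start_states, n_sims, seed=3):
    """Draw each loan's next state from its row of P and total the migration costs."""
    rng = np.random.default_rng(seed)
    start = np.asarray(start_states)
    cum = P[start].cumsum(axis=1)            # cumulative transition row for each loan
    cum[:, -1] = 1.0                         # guard against floating-point rounding
    u = rng.uniform(size=(n_sims, len(start)))
    ends = (u[..., None] >= cum[None, :, :]).sum(axis=2)   # inverse-CDF draw of next state
    costs = COST[start[None, :], ends].sum(axis=1)        # portfolio cost per simulation
    return ends, costs

portfolio = [0, 0, 1, 1, 1]                  # two A-rated and three B-rated loans
ends, costs = simulate_transitions(portfolio, n_sims=100_000)
expected = sum(float(P[s] @ COST[s]) for s in portfolio)
print(f"mean simulated cost: {costs.mean():.4f}, analytic expectation: {expected:.4f}")
```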

# Factor Model

## Single Factor Model

We construct the `single risk factor` model with latent variable Z_{i}:

$$Z_i = -\sqrt{\rho_i}\, Z + \sqrt{1 - \rho_i}\, X_i$$

The pair-wise correlation between the latent variables of two firms i and j is:

$$corr[Z_i, Z_j] = \sqrt{\rho_i\, \rho_j}$$

Where:

- Z and X_{i} are independent standard normal variables
- Z is the `systematic factor` that affects all firms. If Z increases, all Z_{i} decrease and become more likely to default. Z summarizes the effects of all observable macroeconomic factors plus the effects of unobservable factors.
- X_{i} is the `idiosyncratic factor` that affects only firm i's latent variable
- Z_{i} ~ N(0, 1) by construction
- {Z_{i}} are jointly normal and connected by a `Gauss copula`
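A quick simulation check of the single factor model, assuming hypothetical ρ values of 0.04, 0.16 and 0.36 for three firms. The sample correlation matrix of the simulated Z_i should be close to √(ρ_i ρ_j) off the diagonal, and each Z_i should have unit variance.

```python
import numpy as np

def simulate_latents(rhos, n_sims, seed=11):
    """Z_i = -sqrt(rho_i) Z + sqrt(1 - rho_i) X_i for each firm, one row per draw."""
    rhos = np.asarray(rhos, dtype=float)
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_sims, 1))            # systematic factor Z, shared by all firms
    x = rng.standard_normal((n_sims, len(rhos)))    # idiosyncratic factors X_i
    return -np.sqrt(rhos) * z + np.sqrt(1.0 - rhos) * x

rhos = [0.04, 0.16, 0.36]                           # hypothetical factor loadings
latents = simulate_latents(rhos, n_sims=400_000)
empirical = np.corrcoef(latents, rowvar=False)
theoretical = np.sqrt(np.outer(rhos, rhos))         # sqrt(rho_i * rho_j)
np.fill_diagonal(theoretical, 1.0)
print(np.round(empirical, 3))
print(np.round(theoretical, 3))
```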

## cDR and Vasicek

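Define the `Conditional (Expected) Default Rate` (cDR) as:

$$cDR_i(z) = P[D_i = 1 \mid Z = z] = P\left[X_i < \frac{\Phi^{-1}(PD_i) + \sqrt{\rho_i}\, z}{\sqrt{1 - \rho_i}}\right] = \Phi\left(\frac{\Phi^{-1}(PD_i) + \sqrt{\rho_i}\, z}{\sqrt{1 - \rho_i}}\right)$$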

This gives the final form of cDR, which is called the `Vasicek` formula, named after Oldrich Vasicek. Note that the Vasicek formula is monotonically increasing in z and in PD, i.e., the higher z or PD, the higher cDR.
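The Vasicek formula is short enough to code directly. The sketch below, with hypothetical PD = 2% and ρ = 0.15, evaluates cDR at a few values of z and then averages cDR over a standard normal z on a grid; the average should come back to PD.

```python
import numpy as np
from statistics import NormalDist

N01 = NormalDist()

def cdr(z, pd_, rho):
    """Vasicek formula: conditional default rate given the systematic factor Z = z."""
    return N01.cdf((N01.inv_cdf(pd_) + np.sqrt(rho) * z) / np.sqrt(1.0 - rho))

pd_, rho = 0.02, 0.15                                    # hypothetical PD and rho
for z in (-2.0, 0.0, 2.0, 3.09):
    print(f"z = {z:+.2f}: cDR = {cdr(z, pd_, rho):.4f}")

# E[cDR(Z)] over Z ~ N(0, 1), using a Riemann sum on a wide grid.
grid = np.linspace(-8.0, 8.0, 4001)
density = np.exp(-0.5 * grid**2) / np.sqrt(2.0 * np.pi)
values = np.array([cdr(v, pd_, rho) for v in grid])
mean_cdr = float(np.sum(values * density) * (grid[1] - grid[0]))
print(f"E[cDR] = {mean_cdr:.5f} (PD = {pd_})")
```

Note that cDR at z = 0 is below PD: the median economic condition is better than average, because cDR is skewed toward bad states.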

The expected default rate for firm i is always PD_{i}, since:

$$E_Z\left[cDR_i(Z)\right] = E_Z\left[P(D_i = 1 \mid Z)\right] = P[D_i = 1] = PD_i$$

However, when Z is known, the expected default rate is cDR_{i}. Firms are then uncorrelated, because only the independent X_{i}'s remain random:

$$P[D_i = 1, D_j = 1 \mid Z = z] = cDR_i(z)\, cDR_j(z)$$

If there is a large number of identical firms with uniform PD and ρ, the default rate of such an asymptotic portfolio follows the unconditional `Vasicek distribution`.

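The unconditional `Vasicek pdf` can be derived with the change-of-variable technique. Note that z is eliminated and the pdf has only the parameters PD and ρ:

$$f(x; PD, \rho) = \sqrt{\frac{1-\rho}{\rho}}\, \exp\left\{\frac{1}{2}\left[\Phi^{-1}(x)\right]^2 - \frac{1}{2\rho}\left[\sqrt{1-\rho}\,\Phi^{-1}(x) - \Phi^{-1}(PD)\right]^2\right\}, \quad 0 < x < 1$$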

The mean of cDR is PD:

$$E[cDR] = \int_0^1 x\, f(x; PD, \rho)\, dx = PD$$
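A numerical sanity check of the Vasicek pdf, assuming hypothetical PD = 2% and ρ = 0.15: on a fine grid over (0, 1), the density should integrate to 1 and its mean should equal PD.

```python
import numpy as np
from statistics import NormalDist

N01 = NormalDist()
inv_cdf = np.vectorize(N01.inv_cdf)

def vasicek_pdf(x, pd_, rho):
    """Vasicek density of the default rate x in (0, 1), with parameters PD and rho."""
    q = inv_cdf(x)                           # Phi^{-1}(x)
    k = N01.inv_cdf(pd_)                     # Phi^{-1}(PD)
    return np.sqrt((1 - rho) / rho) * np.exp(0.5 * q**2 - (np.sqrt(1 - rho) * q - k) ** 2 / (2 * rho))

def trapezoid(y, x):
    """Trapezoid-rule integral of samples y over the grid x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)

pd_, rho = 0.02, 0.15                        # hypothetical parameters
x = np.linspace(1e-6, 1 - 1e-6, 100_001)
f = vasicek_pdf(x, pd_, rho)
print(f"total probability = {trapezoid(f, x):.5f}")
print(f"mean default rate = {trapezoid(x * f, x):.5f} (PD = {pd_})")
```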

## Multi-factor Model

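Suppose that there are two jointly normal systematic risk factors ψ and ω (each standard normal), and that there are two groups of firms, each depending on one of the factors:

$$Z_i = -\sqrt{\rho_i}\, \psi + \sqrt{1 - \rho_i}\, X_i \ \ (\text{firm } i \text{ in group 1}), \qquad Z_j = -\sqrt{\rho_j}\, \omega + \sqrt{1 - \rho_j}\, X_j \ \ (\text{firm } j \text{ in group 2})$$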

Between the two groups:

$$corr[Z_i, Z_j] = \sqrt{\rho_i\, \rho_j}\; corr[\psi, \omega]$$

Note that:

- If corr[ψ, ω] = 1, this becomes the single factor model, and:

  $$corr[Z_i, Z_j] = \sqrt{\rho_i\, \rho_j}$$

- If corr[ψ, ω] < 1, the cross-correlations are less than those in the single factor case. This is called `diversification`.
- With a multi-factor model, risk becomes `sub-additive`, as opposed to `additive` in single factor models. This means that the risk in the portfolio is less than the sum of the risks of the groups' cDRs.
- The `Moody's Factor Model` attributes each Z_{i} to about 250 factors, along with a firm-specific idiosyncratic factor.
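A sketch of the diversification effect under hypothetical parameters: PD = 2% and ρ = 0.2 in both groups, equal weights, and asymptotic groups, so that each group's default rate equals its cDR. It compares the 99.9th percentile of the combined default rate with the weighted sum of each group's own 99.9th percentile as corr[ψ, ω] falls from 1 to 0.

```python
import numpy as np
from statistics import NormalDist

N01 = NormalDist()
phi_cdf = np.vectorize(N01.cdf)

def group_cdr(factor, pd_, rho):
    """Vasicek cDR of a group, given draws of its own systematic factor."""
    return phi_cdf((N01.inv_cdf(pd_) + np.sqrt(rho) * factor) / np.sqrt(1.0 - rho))

def stressed_default_rates(factor_corr, pd_=0.02, rho=0.2, q=0.999, n_sims=100_000, seed=5):
    rng = np.random.default_rng(seed)
    e1, e2 = rng.standard_normal((2, n_sims))
    psi = e1
    omega = factor_corr * e1 + np.sqrt(1.0 - factor_corr**2) * e2   # corr[psi, omega] = factor_corr
    dr1, dr2 = group_cdr(psi, pd_, rho), group_cdr(omega, pd_, rho)
    combined = np.quantile(0.5 * dr1 + 0.5 * dr2, q)                # equal-weight portfolio
    sum_of_parts = 0.5 * np.quantile(dr1, q) + 0.5 * np.quantile(dr2, q)
    return float(combined), float(sum_of_parts)

for c in (1.0, 0.5, 0.0):
    combined, parts = stressed_default_rates(c)
    print(f"corr[psi, omega] = {c:.1f}: 99.9% combined DR = {combined:.4f}, sum of parts = {parts:.4f}")
```

With corr[ψ, ω] = 1 the two numbers coincide (additive); below 1 the combined figure is smaller (sub-additive).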

## Basel II Capital formula

The `Bank for International Settlements` is in Basel, Switzerland. The `Basel Committee on Banking Supervision` drafted the standards requiring banks to hold minimum capital, e.g., Basel II, Basel III, etc.

The `Basel II` formula is an `Asymptotic Single Risk Factor` model, in which the portfolio is large enough for the Law of Large Numbers to work. It generalizes the Vasicek distribution to include a diverse choice of PD and ρ within the portfolio. The core of the capital requirement for `credit capital` is the inverse CDF of the Vasicek distribution.

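Inverse Vasicek (with parameters PD and ρ), i.e., the q^{th} quantile of the Vasicek distribution:

$$F^{-1}(q; PD, \rho) = \Phi\left(\frac{\Phi^{-1}(PD) + \sqrt{\rho}\,\Phi^{-1}(q)}{\sqrt{1-\rho}}\right)$$

The Basel II capital requirement evaluates it at q = 0.999 with ρ = R:

$$K = \left[LGD \cdot \Phi\left(\frac{\Phi^{-1}(PD) + \sqrt{R}\,\Phi^{-1}(0.999)}{\sqrt{1-R}}\right) - LGD \cdot PD\right] \cdot \frac{1 + (M - 2.5)\, b}{1 - 1.5\, b}$$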

Note:

- K is the capital requirement per dollar of wholesale loan
- LGD is the average LGD in historical downturn conditions
- R (correlation) = 0.12 + 0.12 × exp(-50 × PD)
- b = [0.11852 - 0.05478 × ln(PD)]^{2}
- M is maturity
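A direct transcription of the formula into Python, as a sketch. It uses the approximate R given in these notes, and the LGD = 45% and M = 2.5 years in the usage lines are hypothetical inputs.

```python
from math import exp, log, sqrt
from statistics import NormalDist

N01 = NormalDist()

def basel2_capital(pd_, lgd, m):
    """Basel II capital K per dollar of wholesale exposure, following the formula above."""
    r = 0.12 + 0.12 * exp(-50.0 * pd_)                 # correlation R as given in the notes
    b = (0.11852 - 0.05478 * log(pd_)) ** 2            # maturity adjustment slope
    stressed_dr = N01.cdf((N01.inv_cdf(pd_) + sqrt(r) * N01.inv_cdf(0.999)) / sqrt(1.0 - r))
    unexpected = lgd * stressed_dr - lgd * pd_         # subtract EL = LGD x PD
    maturity_adj = (1.0 + (m - 2.5) * b) / (1.0 - 1.5 * b)
    return unexpected * maturity_adj

for pd_ in (0.001, 0.01, 0.05):
    print(f"PD = {pd_:.3f}: K = {basel2_capital(pd_, lgd=0.45, m=2.5):.4f}")
```

K is linear in LGD and rises with maturity; at M = 1 the maturity adjustment equals exactly 1.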

Making sense of the Basel II formula:

- Capital requirement is for `loss`, as opposed to only default; hence the formula multiplies by LGD.
- Capital requirement is for `unexpected loss`; hence the formula subtracts the expected loss LGD × PD. The `expected` portion is handled by bank reserves.
- Loans might deteriorate without defaulting; hence a `maturity adjustment` is added to impose higher capital for longer-maturity loans.
- The estimation of PD and LGD is performed by the banks and supervised by bank supervisors.

# Estimation, Statistical Test and Overfit

## Estimating PD

Firms differ widely in their credit quality, and PDs tend to change over time as well. So a firm’s PD is neither known nor fixed. We analyze analogous firms with `identical credit ratings` to estimate PD.

Method 1, for all A-rated firms in the dataset:

Method 2, for all A-rated firms in the dataset:

Method 3, estimate PD as a parameter in a pdf describing A-rated firms. This tries to find a distribution that best fits the data. We will focus on this method.

## Method Of Moments

Given a dataset {X_{i}}_{N}, we set the moments of the Vasicek distribution equal to the moments of the data.

First moment:

$$PD = \bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$$

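Second moment (unbiased, using N-1 in the denominator), matching the Vasicek variance to the sample variance:

$$\Phi_2\left(\Phi^{-1}(PD), \Phi^{-1}(PD); \rho\right) - PD^2 = \frac{1}{N-1}\sum_{i=1}^{N} \left(X_i - \bar{X}\right)^2$$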

Note:

- The method of moments matches the broad features of the distribution to the data
- The solution is not unique. Choices can be made between central moment/raw moment, lower moment/higher moment.
- By Jensen’s Inequality, functions of moments are not moments of functions
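A sketch of the method of moments for the Vasicek distribution. PD is set to the sample mean, and ρ is found by bisection so that Φ_2(Φ^{-1}(PD), Φ^{-1}(PD); ρ) - PD^2 matches the sample variance (N-1 denominator). The bisection relies on the Vasicek variance increasing in ρ, and the default-rate series in the usage line is hypothetical.

```python
import numpy as np
from statistics import NormalDist

N01 = NormalDist()
phi_cdf = np.vectorize(N01.cdf)

def vasicek_variance(pd_, rho, n_grid=20_001):
    """Var[cDR] = Phi_2(k, k; rho) - PD^2, with the bivariate CDF integrated along one axis."""
    k = N01.inv_cdf(pd_)
    z = np.linspace(-10.0, k, n_grid)
    y = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi) * phi_cdf((k - rho * z) / np.sqrt(1.0 - rho**2))
    return float(np.sum((y[1:] + y[:-1]) * np.diff(z)) / 2) - pd_**2

def method_of_moments(default_rates, tol=1e-6):
    """Match the Vasicek mean and variance to the sample mean and (N-1) sample variance."""
    x = np.asarray(default_rates, dtype=float)
    pd_hat = float(x.mean())                  # first moment
    target = float(x.var(ddof=1))             # second moment, N-1 denominator
    lo, hi = 1e-6, 0.999                      # Var[cDR] increases with rho, so bisect on rho
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if vasicek_variance(pd_hat, mid) < target:
            lo = mid
        else:
            hi = mid
    return pd_hat, 0.5 * (lo + hi)

rates = [0.010, 0.004, 0.025, 0.018, 0.007, 0.032, 0.012, 0.003, 0.015, 0.009]  # hypothetical
pd_hat, rho_hat = method_of_moments(rates)
print(f"M.o.M. estimates: PD = {pd_hat:.4f}, rho = {rho_hat:.3f}")
```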

## Maximum Likelihood Estimation

The MLE method chooses parameter values that make the data most likely under the assumed distribution. MLE matches the distribution to the data `as a whole`, as opposed to M.o.M., which only matches the `moments`. The MLE fits the `PDF` to the `dataset`.

When the data are not highly dispersed, however, the MLE estimate tends to be close to the M.o.M. estimate.

The MLE is a (generally biased) estimator that chooses the parameters that maximize the `likelihood function`. Given a dataset {X_{i}}_{N}, we assume the true default rates follow a Vasicek distribution. The likelihood function is:

$$L(PD, \rho) = \prod_{i=1}^{N} f(X_i; PD, \rho)$$

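Often we maximize the log-likelihood function instead, i.e., find PD and ρ such that:

$$\left(\widehat{PD}, \hat{\rho}\right) = \arg\max_{PD,\, \rho} \sum_{i=1}^{N} \ln f(X_i; PD, \rho)$$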
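A minimal MLE sketch, assuming a hypothetical sample of 1,000 default rates simulated from a Vasicek distribution with PD = 2% and ρ = 0.10. It maximizes the log-likelihood by brute-force grid search; a numerical optimizer would be the usual choice, and the grid only keeps the example free of extra dependencies.

```python
import numpy as np
from statistics import NormalDist

N01 = NormalDist()
phi_cdf = np.vectorize(N01.cdf)
inv_cdf = np.vectorize(N01.inv_cdf)

def vasicek_loglik(q, pd_, rho):
    """Vasicek log-likelihood, with the data passed as q = Phi^{-1}(default rate)."""
    k = N01.inv_cdf(pd_)
    return float(np.sum(0.5 * np.log((1 - rho) / rho) + 0.5 * q**2
                        - (np.sqrt(1 - rho) * q - k) ** 2 / (2 * rho)))

def vasicek_mle_grid(default_rates, pd_grid, rho_grid):
    """Return (PD, rho, log-likelihood) at the best grid point."""
    q = inv_cdf(np.asarray(default_rates, dtype=float))    # transform the data once
    best = max((vasicek_loglik(q, p, r), p, r) for p in pd_grid for r in rho_grid)
    return best[1], best[2], best[0]

# Hypothetical data: 1,000 annual default rates drawn from Vasicek(PD = 0.02, rho = 0.10).
rng = np.random.default_rng(9)
z = rng.standard_normal(1_000)
rates = phi_cdf((N01.inv_cdf(0.02) + np.sqrt(0.10) * z) / np.sqrt(1 - 0.10))

pd_hat, rho_hat, ll = vasicek_mle_grid(rates, np.linspace(0.01, 0.03, 41), np.linspace(0.02, 0.30, 57))
print(f"MLE: PD = {pd_hat:.4f}, rho = {rho_hat:.3f}, log-likelihood = {ll:.1f}")
```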

## Hypothesis Testing & Wilks’ Theorem

We do not assert truth, as the truth is often unknown. With a given set of data, we can only assert that some models are `better` at predicting the future behavior of similar data.

We call the simpler model the `null hypothesis` and the more complicated one the `alternative hypothesis`. The null is generally nested in the alternative, i.e., the alternative becomes the null when some of its parameters are set to certain values.

We prefer the null because it is `simpler`, and by doing so we guard against a `Type 1 error`, which is the rejection of a true null.

Hence we reject the null only if the alternative fits the data `significantly better` according to a statistical test.

`Wilks' Theorem` asserts that if:

- There is an asymptotic amount of data
- The null hypothesis is true

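Then the statistic `D` has a distribution that approaches the χ^{2} distribution (with df = number of extra parameters in the alternative), where, given the dataset {X_{i}}_{N}:

$$D = -2\ln(LR) = 2\left[\ln L_{alt} - \ln L_{null}\right] \sim \chi^2_{df}$$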

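The likelihood ratio is defined as follows. It is less than or equal to 1, because the more flexible alternative attains at least as high a maximized likelihood on the same data:

$$LR = \frac{\max_{\theta_{null}} L_{null}(\theta_{null})}{\max_{\theta_{alt}} L_{alt}(\theta_{alt})} \le 1$$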

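We reject the null hypothesis if the D statistic is a tail observation, which means that either the null is not true, or the null is true and something unlikely (a Type 1 error) has happened. We reject the null when:

$$D > \chi^2_{df,\, 1-\alpha}$$

where χ^{2}_{df, 1-α} is the (1-α) quantile of the χ^{2} distribution and α is the significance level.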

For example, when df = 1 the critical value is 3.84, so we reject the null with 95% confidence when:

$$D > 3.84$$
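A small sketch of the test for the df = 1 case, assuming the maximized log-likelihoods of the null and the alternative are already available (the two numbers in the usage line are hypothetical). For df = 1, χ^2 is the square of a standard normal, so the critical value can be computed from the normal quantile alone.

```python
from statistics import NormalDist

def chi2_df1_critical(confidence=0.95):
    """Critical value of chi^2 with df = 1: the squared two-sided normal quantile."""
    return NormalDist().inv_cdf(0.5 + confidence / 2) ** 2

def likelihood_ratio_test(loglik_null, loglik_alt, confidence=0.95):
    """Return (D, critical value, reject?) for one extra parameter in the alternative."""
    d_stat = 2.0 * (loglik_alt - loglik_null)     # D = -2 ln(LR)
    critical = chi2_df1_critical(confidence)
    return d_stat, critical, d_stat > critical

# Hypothetical maximized log-likelihoods of a null and an alternative with one extra parameter.
d_stat, critical, reject = likelihood_ratio_test(loglik_null=1520.3, loglik_alt=1522.9)
print(f"D = {d_stat:.2f}, critical value = {critical:.2f}, reject null: {reject}")
```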

## Overfit

An `overfit` model makes worse forecasts than a simpler model.

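We assume the population data (X, Y) follow a standard bivariate normal distribution with correlation ρ:

$$\begin{pmatrix} X \\ Y \end{pmatrix} \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right)$$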

Given ρ, the population regression line is:

$$E[Y \mid X] = \rho X$$

The sample regression line, with OLS estimates a and b, is:

$$\hat{Y} = a + bX$$

From a sample of 30 observations of (X, Y), `ordinary least squares` (OLS) is performed to find the in-sample p-value for the coefficient and R^{2}. MSE is used to evaluate forecast error.

- When ρ = 0.8, the sample regression line (yellow) is close to the population regression line (red):

- When ρ = 0.2, the sample regression line does NOT match well.

This shows that when the population has a weak relationship (ρ = 0.2), the estimates of the slope are more dispersed.

Now we look at the relationship between statistical significance and MSE. The population `Mean-Squared Error` (MSE) is an `out-of-sample` measure of forecast errors. The population MSE of a line a + bX does NOT depend on any in-sample data:

$$MSE(a, b) = E\left[(Y - a - bX)^2\right] = 1 - 2b\rho + b^2 + a^2$$

Taking partial derivatives shows that the population regression (b = ρ, a = 0) minimizes MSE, with minimum 1 - ρ^{2}. We can also see that the higher ρ, the lower the MSE.

A regression is significant (at 95% confidence) if the p-value for the coefficient b is less than 0.05.

We have observed that when the population has a `weak relationship` (ρ = 0.2):

- Forecasts by `significant regressions` tend to have `greater` MSE.
- Forecasts by `regressions with higher R-square` tend to have `greater` MSE.

This is because the strong relationship suggested by such a regression does NOT forecast the weak population relationship well.

When the population has a `strong relationship` (ρ = 0.8), however, the significance and high R-square of the regression hold out-of-sample.
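The experiment can be sketched as follows, under the assumptions stated above (standard bivariate normal population, samples of 30). As a simplification, significance is judged by |t| > 2.048, the two-sided 95% critical value of the t distribution with 28 degrees of freedom, instead of computing p-values, and each fitted line is scored with the population MSE 1 - 2bρ + b^2 + a^2.

```python
import numpy as np

N_OBS = 30
T_CRIT = 2.048   # two-sided 95% t critical value with 28 degrees of freedom (hard-coded)

def simulate_regressions(rho, n_samples=20_000, seed=13):
    """Fit OLS on many samples of 30 and score each fitted line out of sample."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples, N_OBS))
    y = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal((n_samples, N_OBS))  # corr[X, Y] = rho
    x_dev = x - x.mean(axis=1, keepdims=True)
    y_dev = y - y.mean(axis=1, keepdims=True)
    sxx = np.sum(x_dev**2, axis=1)
    b = np.sum(x_dev * y_dev, axis=1) / sxx                      # OLS slope
    a = y.mean(axis=1) - b * x.mean(axis=1)                      # OLS intercept
    resid = y - (a[:, None] + b[:, None] * x)
    se_b = np.sqrt(np.sum(resid**2, axis=1) / (N_OBS - 2) / sxx)
    significant = np.abs(b / se_b) > T_CRIT
    pop_mse = 1.0 - 2.0 * b * rho + b**2 + a**2                  # population (out-of-sample) MSE
    return pop_mse, significant

for rho in (0.2, 0.8):
    mse, sig = simulate_regressions(rho)
    mse_insig = mse[~sig].mean() if (~sig).any() else float("nan")
    print(f"rho = {rho}: {sig.mean():.0%} significant; mean MSE significant = "
          f"{mse[sig].mean():.4f}, not significant = {mse_insig:.4f}, minimum = {1 - rho**2:.4f}")
```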

# Conditional LGD Risk

## cLGD

The history of `bond` LGD shows that LGD is elevated when the default rate is elevated. The elevation is moderate and similar across different debt types:

It is important to model LGD appropriately under different economic conditions. Like cDR, we define `cLGD`:

Note that:

There are two ways to calculate ELGD:

Futhermore,

Where:

- EcLGD is the average LGD over conditions
- ELGD is the average LGD over different loans

`ELGD is higher than EcLGD` because when cLGD is higher, cDR/PD is also higher, which increases the probability weight on the higher cLGDs, while in EcLGD, higher cLGDs do not receive higher weight.
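A small illustration of the weighting argument, under assumptions not taken from these notes: cDR follows the Vasicek formula with PD = 3% and ρ = 0.15, cLGD = 0.3 + 0.5 × cDR is a hypothetical increasing function, ELGD is computed as the default-weighted average E[cDR × cLGD] / E[cDR], and EcLGD as the plain average E[cLGD].

```python
import numpy as np
from statistics import NormalDist

N01 = NormalDist()
phi_cdf = np.vectorize(N01.cdf)
PD, RHO = 0.03, 0.15                                                # hypothetical

rng = np.random.default_rng(17)
z = rng.standard_normal(200_000)                                    # economic conditions
cdr = phi_cdf((N01.inv_cdf(PD) + np.sqrt(RHO) * z) / np.sqrt(1 - RHO))
clgd = 0.3 + 0.5 * cdr                                              # hypothetical cLGD

ec_lgd = clgd.mean()                       # EcLGD: every condition weighted equally
e_lgd = np.sum(cdr * clgd) / np.sum(cdr)   # ELGD: conditions weighted by defaults
print(f"EcLGD = {ec_lgd:.4f}, ELGD = {e_lgd:.4f}")
```

For this linear cLGD the gap is exactly 0.5 × Var[cDR] / E[cDR], so it vanishes only if cDR does not vary with conditions.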

## Frye-Jacobs

Modeling cLGD separately from cDR introduces complexity and potential overfit into the cLoss model. Instead, the `Frye-Jacobs` LGD function assumes that both cDR and cLoss follow Vasicek distributions, and infers cLGD as a function of cDR.

Frye-Jacobs assumptions:

- cDR and cLoss are `comonotonic`:
  - If cDR goes up, cLoss must go up.
  - If cDR is in its q^{th} quantile, then cLoss must also be in its q^{th} quantile. This implies that there is a cLGD function of cDR:

    $$cLGD = \frac{cLoss}{cDR} = LGD(cDR)$$

- `cDR` follows a Vasicek distribution, which stems from the simplest portfolio structure:
  - Large number of firms
  - Each firm has the same PD
  - Each pair-wise ρ is the same (same PDJ)
  - Gauss copula
- The distribution of `cLoss` does NOT depend on the definition of default.
  - This implies that the distribution of cLoss `does not` have separate parameters for PD and ELGD. It `does` have a parameter EL.
- `cLoss` follows a Vasicek distribution.
- `cLoss` and `cDR` have the same ρ parameter.
  - This ensures that the LGD function is `monotonic`.

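Finally, the Frye-Jacobs LGD function is:

$$cLGD = LGD(cDR) = \frac{\Phi\left[\Phi^{-1}(cDR) - k\right]}{cDR}, \qquad k = \frac{\Phi^{-1}(PD) - \Phi^{-1}(EL)}{\sqrt{1 - \rho}}, \qquad EL = PD \times ELGD$$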

Observations:

- cLGD is strictly monotonic with range (0, 1), for all k

- cLGD increases slowly, and similarly for all k, at low cDR
- Elasticity is greatest for loans with low LGD.
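A sketch of the Frye-Jacobs function with hypothetical PD = 3% and ρ = 0.15 for three ELGD levels. It also checks numerically that averaging cLGD with weights cDR over the distribution of z returns the input ELGD, which follows from E[cLoss] = EL and E[cDR] = PD.

```python
import numpy as np
from statistics import NormalDist

N01 = NormalDist()

def frye_jacobs_lgd(cdr, pd_, elgd, rho):
    """Frye-Jacobs cLGD = Phi(Phi^{-1}(cDR) - k) / cDR, with EL = PD x ELGD."""
    k = (N01.inv_cdf(pd_) - N01.inv_cdf(pd_ * elgd)) / np.sqrt(1.0 - rho)
    return N01.cdf(N01.inv_cdf(cdr) - k) / cdr

def implied_elgd(pd_, elgd, rho, n_grid=4001):
    """Default-weighted average of cLGD over z ~ N(0, 1), computed on a grid."""
    z = np.linspace(-8.0, 8.0, n_grid)
    w = np.exp(-0.5 * z**2)                      # normal weights (the constant cancels in the ratio)
    k = N01.inv_cdf(pd_)
    cdr = np.array([N01.cdf((k + np.sqrt(rho) * v) / np.sqrt(1.0 - rho)) for v in z])
    closs = cdr * np.array([frye_jacobs_lgd(c, pd_, elgd, rho) for c in cdr])
    return float(np.sum(w * closs) / np.sum(w * cdr))

pd_, rho = 0.03, 0.15                             # hypothetical portfolio parameters
for elgd in (0.1, 0.4, 0.7):
    curve = [frye_jacobs_lgd(c, pd_, elgd, rho) for c in (0.005, 0.03, 0.10, 0.30)]
    print(f"ELGD = {elgd:.1f}: cLGD at cDR 0.5%, 3%, 10%, 30% = "
          + ", ".join(f"{v:.3f}" for v in curve)
          + f"; implied ELGD = {implied_elgd(pd_, elgd, rho):.4f}")
```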

## Frye-Jacobs: Develop Alternative Hypothesis

Introduce an additional sensitivity parameter to test the `slope`

of the LGD function.

We know that:

In integration form:

Bring in the Frye-Jacobs cLGD function:

Note that EL appears in both the lhs and the rhs; divide both occurrences of EL by ELGD^{a}:

Note that we have identified a new LGD function:

Analyzing the choice of a:

- When a = 0, the cLGD function is the Frye-Jacobs formula.
- When a = 1, cLGD = ELGD, which implies cLGD does not depend on conditions:

## Frye-Jacobs: Hypothesis Test

We introduce a `finite portfolio`, which brings randomness into the D’s and LGD^{dollar}s.

- We assume the finite portfolio is uniform and all N loans have the same PD and ρ
- We assume that, given the portfolio cDR, the number of defaults is binomial:

  $$\sum D \mid cDR \sim Binomial(N, cDR)$$

- We assume that LGD is normally distributed around cLGD, with σ = 0.2. Note that under this assumption, ELGD = cLGD, which corresponds to a = 1.

Under the finite portfolio, the probability of 0 defaults is:

$$P\left[\textstyle\sum D = 0\right] = \int_0^1 (1 - x)^N f(x; PD, \rho)\, dx$$

When conditional on cDR and Σ D > 0, the `average portfolio LGD rate` is normal:

$$LGD \mid cDR, \textstyle\sum D \sim N\left(cLGD, \dfrac{\sigma^2}{\sum D}\right)$$

Let Y ~ N(0, 1) be a standard normal variable; then LGD becomes:

$$LGD = cLGD + \frac{\sigma}{\sqrt{\sum D}}\, Y$$

Now calculate Loss based on DR and LGD:

$$Loss = DR \times LGD = \frac{\sum D}{N} \times LGD$$

Use change-of-variable technique to calculate the pdf for Loss:

Where:

Finally, the pdf of loss conditional on Σ D and cDR:

Removing the conditional, the distribution of loss in a uniform portfolio, with N loans, same PD and ρ and the cLGD function, becomes:

Here is a plot of the unconditional loss density in a finite (N = 10) portfolio, in red, and the loss density in an infinite (Vasicek) portfolio, in blue (note that the plot uses D to denote Σ D):
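A Monte Carlo sketch of the finite-portfolio loss under the assumptions above, with hypothetical parameters (N = 10, PD = 5%, ELGD = 40%, ρ = 0.15, σ = 0.2). It draws z, computes cDR and the Frye-Jacobs cLGD, draws Σ D from the binomial, draws the average LGD from N(cLGD, σ^2 / Σ D), and sets Loss = (Σ D / N) × LGD. Because LGD is normal, individual draws can fall outside [0, 1], as the stated assumption allows.

```python
import numpy as np
from statistics import NormalDist

N01 = NormalDist()
phi_cdf = np.vectorize(N01.cdf)
inv_cdf = np.vectorize(N01.inv_cdf)

def simulate_finite_portfolio_loss(n_loans=10, pd_=0.05, elgd=0.4, rho=0.15,
                                   sigma=0.2, n_sims=100_000, seed=21):
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_sims)
    cdr = phi_cdf((N01.inv_cdf(pd_) + np.sqrt(rho) * z) / np.sqrt(1.0 - rho))  # Vasicek cDR
    k = (N01.inv_cdf(pd_) - N01.inv_cdf(pd_ * elgd)) / np.sqrt(1.0 - rho)
    clgd = phi_cdf(inv_cdf(cdr) - k) / cdr                                    # Frye-Jacobs cLGD
    n_def = rng.binomial(n_loans, cdr)                                        # sum D | cDR
    noise = rng.standard_normal(n_sims) * sigma / np.sqrt(np.maximum(n_def, 1))
    avg_lgd = clgd + noise                                                    # N(cLGD, sigma^2 / sum D)
    return np.where(n_def > 0, n_def / n_loans * avg_lgd, 0.0)               # Loss = (sum D / N) x LGD

loss = simulate_finite_portfolio_loss()
print(f"mean loss = {loss.mean():.4f} (EL = PD x ELGD = {0.05 * 0.4:.4f})")
print(f"P[no defaults] = {(loss == 0).mean():.3f}")
print(f"99.9th percentile of loss = {np.quantile(loss, 0.999):.4f}")
```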

Now that we have the pdf for loss, we can test the hypotheses:

- H_{0}: a = 0
- H_{1}: a = MLE based on Moody’s loss data

As a result, MLE(a) = 0.01 based on all loan data, and the test failed to reject the null. The same holds for bond data and for combinations of bond and loan data. We conclude that the Frye-Jacobs model is consistent with Moody’s data.

# Vendor Estimation

## Distance-To-Default and EDF

Robert Merton argues that:

- The default of firm i depends on its asset return
  - Merton asserts that a firm defaults if and only if the value of its assets drops below the value of its liabilities, i.e., its asset return is too low
- The joint default of firms i and j depends on their PDs and asset return correlation

Moody’s suggests that a loan contains an option to default, and attempts to use the risk-neutral probability to estimate the probability of default, in the context of a put option.

Under Moody’s assumption, the firm has an option to default once the value of its assets drops below its liabilities. Here, the liability is the strike price, for which Moody’s uses `D`, or the “default point”, defined as short-term debt plus half of long-term debt. `DD` stands for `Distance-To-Default`, suggested by Merton. So the probability of default is:

$$PD = \Phi(-DD)$$
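A minimal sketch of a Merton-style distance to default, assuming lognormal asset values so that DD = [ln(V / D) + (μ - σ^2/2) T] / (σ √T). This is the textbook Merton form, not necessarily Moody’s exact specification, and the firm's asset value, debts, drift and volatility are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

def merton_distance_to_default(asset_value, default_point, mu, sigma, horizon=1.0):
    """Merton-style DD: standard deviations between the expected log asset value
    and the log default point at the horizon (lognormal asset assumption)."""
    return (log(asset_value / default_point) + (mu - 0.5 * sigma**2) * horizon) / (sigma * sqrt(horizon))

def merton_pd(dd):
    """Probability of default Phi(-DD)."""
    return NormalDist().cdf(-dd)

# Hypothetical firm: assets 120, short-term debt 60, long-term debt 50.
default_point = 60 + 0.5 * 50       # short-term debt plus half of long-term debt
dd = merton_distance_to_default(120.0, default_point, mu=0.05, sigma=0.25)
print(f"DD = {dd:.3f}, Phi(-DD) = {merton_pd(dd):.4f}")
```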

Moody’s then estimates the value and volatility of the assets (unobservable) based on the value and volatility of the market capitalization (observable).

However, since Φ(-DD) gives very poor estimates of the default probability, Moody’s sets the `EDF` (Expected Default Frequency) of a firm equal to the `average historical default rate` of firms with the same `Distance-To-Default`. An EDF uses DD to find historical analogs of current firms.

## Correlation

Merton assumes that the correlation ρ between the latent variable Z’s is equal to the asset return correlation r.

However, data suggests that correlation estimated from credit data is `less`

than the correlation based on asset returns. Hence a credit portfolio model that uses asset correlation to estimate ρ overstates credit risk.

References:

- FINM-36702 Portfolio Management II, Jon Frye, University of Chicago