In this example, we walk through how to use the elliptical copulas defined in the copulae package. We will not cover what elliptical copulas are.
We will use the residuals data from the package for this tutorial. The data is a historical realization of unknown processes. Each column follows its own distinct (and unknown) process. However, these processes are related to one another through a dependency structure. Our task is to learn that dependency structure so that we can simulate future events.
This example is essentially a stripped down case of the GARCH-Copula model, which is common in certain industries.
[1]:
from copulae.datasets import load_residuals

residuals = load_residuals()
residuals.head()
We will use both the GaussianCopula and the StudentCopula. But let’s first start off with the Gaussian copula.
An alias of the GaussianCopula is NormalCopula. We can use either as they both refer to the same underlying structure.
[2]:
from copulae import GaussianCopula

_, ndim = residuals.shape
g_cop = GaussianCopula(dim=ndim)  # initializing the copula
g_cop.fit(residuals)  # fit the copula to the data
<copulae.elliptical.gaussian.GaussianCopula at 0x209346ba370>
Internally, the fit method converts the data to pseudo-observations, so there is no need to apply that transformation beforehand. Even if your data is already in pseudo-observations, the results will not change, because the transformation is rank-based and monotonic.
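To illustrate why re-applying the transformation is harmless, here is a minimal sketch that assumes pseudo-observations are column-wise ranks scaled by n + 1. The helper `pseudo_obs_sketch` is a hypothetical name written for this illustration and is not the package's own routine; it also assumes the data has no ties.

```python
import numpy as np


def pseudo_obs_sketch(x):
    """Hypothetical helper: column-wise ranks scaled into the open interval (0, 1)."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    # argsort of argsort gives 0-based ranks per column (assumes no ties)
    ranks = x.argsort(axis=0).argsort(axis=0) + 1
    return ranks / (n + 1)


rng = np.random.default_rng(0)
raw = rng.normal(size=(100, 3))      # stand-in for raw residuals
u_once = pseudo_obs_sketch(raw)      # first transformation
u_twice = pseudo_obs_sketch(u_once)  # transforming again leaves the ranks unchanged
```

Because ranks depend only on the ordering within each column, `u_once` and `u_twice` are the same array.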
To understand the quality of the fit, you can use the summary method.
[3]:
g_cop.summary()
The pdf, cdf and random methods of every copula work in the same manner. The only thing to note is that the input data for pdf and cdf must match the dimension of the copula. In this case, we generate a 2x7 matrix; notice that the second dimension matches the dimension of the copula.
[4]:
import numpy as np

random_matrix = np.random.uniform(0, 1, size=(2, 7))
pdf = g_cop.pdf(random_matrix)  # length 2 ndarray
cdf = g_cop.cdf(random_matrix)  # length 2 ndarray
rv = g_cop.random(2)  # shape is 2 by 7
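For intuition about what random variates from a Gaussian copula look like, here is a rough sketch of the textbook construction: draw correlated normals, then push each margin through the standard normal CDF. The function `gaussian_copula_sample` and the 2x2 correlation matrix are assumptions made for this illustration; this is not necessarily how the package implements `random`.

```python
import math

import numpy as np


def gaussian_copula_sample(corr, n, seed=None):
    """Hypothetical helper: n draws from a Gaussian copula with correlation matrix corr."""
    rng = np.random.default_rng(seed)
    corr = np.asarray(corr, dtype=float)
    # Step 1: correlated standard normals, one column per dimension
    z = rng.multivariate_normal(np.zeros(corr.shape[0]), corr, size=n)
    # Step 2: the standard normal CDF maps each margin onto [0, 1]
    std_normal_cdf = np.vectorize(
        lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))
    )
    return std_normal_cdf(z)


corr = np.array([[1.0, 0.5], [0.5, 1.0]])  # illustrative correlation matrix
u = gaussian_copula_sample(corr, n=1000, seed=42)  # shape is 1000 by 2
```

Each row of `u` lies in the unit square, and the number of columns equals the dimension of the correlation matrix, mirroring the shape rule described above.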
All copulas are parameterized in their own ways. Archimedean copulas, for example, are parameterized by a single \(\theta\). The Gaussian copula is parameterized by the correlation matrix. To get the parameters of the copula, use the params property.
[5]:
g_cop.params
array([ 0.19108074, -0.36594392, 0.12820349, 0.12885289, 0.11053555, 0.30997234, 0.51268315, -0.02704055, -0.08223887, -0.0320201 , 0.20789831, 0.05828388, -0.00646736, 0.0551271 , 0.01064824, 0.62411583, 0.93611501, 0.59010122, 0.71107239, 0.41607171, 0.56243697])
In this case, we get a vector instead of a correlation matrix (even though I mentioned that Gaussian copulas are parameterized by the correlation matrix!). The explanation is simple: these numbers are the off-diagonal elements of the correlation matrix, read row by row from the upper triangle. After all, a correlation matrix is symmetric with ones on the diagonal, so only the elements above the diagonal are “unique”. For a 7-dimensional copula, that gives 7 × 6 / 2 = 21 values.
For elliptical copulas, to see the correlation matrix, use the sigma property.
[6]:
np.set_printoptions(linewidth=120)
g_cop.sigma
array([[ 1.        ,  0.19108074, -0.36594392,  0.12820349,  0.12885289,  0.11053555,  0.30997234],
       [ 0.19108074,  1.        ,  0.51268315, -0.02704055, -0.08223887, -0.0320201 ,  0.20789831],
       [-0.36594392,  0.51268315,  1.        ,  0.05828388, -0.00646736,  0.0551271 ,  0.01064824],
       [ 0.12820349, -0.02704055,  0.05828388,  1.        ,  0.62411583,  0.93611501,  0.59010122],
       [ 0.12885289, -0.08223887, -0.00646736,  0.62411583,  1.        ,  0.71107239,  0.41607171],
       [ 0.11053555, -0.0320201 ,  0.0551271 ,  0.93611501,  0.71107239,  1.        ,  0.56243697],
       [ 0.30997234,  0.20789831,  0.01064824,  0.59010122,  0.41607171,  0.56243697,  1.        ]])
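The link between the parameter vector and the correlation matrix can be checked with plain NumPy. The sketch below uses the top-left 3x3 block of the matrix above; it shows only the row-wise upper-triangle layout, not the package's internals.

```python
import numpy as np

# Top-left 3x3 block of the fitted correlation matrix shown above
sigma = np.array([
    [1.0, 0.19108074, -0.36594392],
    [0.19108074, 1.0, 0.51268315],
    [-0.36594392, 0.51268315, 1.0],
])

# Indices strictly above the diagonal, in row-major order
rows, cols = np.triu_indices(sigma.shape[0], k=1)
params = sigma[rows, cols]  # the "unique" elements as a flat vector

# Rebuild the full matrix from the vector: unit diagonal plus mirrored entries
rebuilt = np.identity(sigma.shape[0])
rebuilt[rows, cols] = params
rebuilt[cols, rows] = params
```

Here `params` holds the three values in the same order in which they open the corresponding rows of the full parameter vector, and `rebuilt` matches `sigma`.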
The parameters are fit according to the empirical data. Many times, this is fine. However, there are instances where we want to overwrite the fitted parameters, for example when domain knowledge suggests a different value.
The basic way is to overwrite via the params property setter as seen in the example below.
cop.params = 123
However, for the elliptical copulas, there is a convenience indexing syntax that makes it easier to work with the correlation matrix.
To overwrite single elements:
[7]:
assert g_cop[0, 1] == g_cop[1, 0]
g_cop[0, 1] = 0.5
g_cop.sigma
array([[ 1.        ,  0.5       , -0.36594392,  0.12820349,  0.12885289,  0.11053555,  0.30997234],
       [ 0.5       ,  1.        ,  0.51268315, -0.02704055, -0.08223887, -0.0320201 ,  0.20789831],
       [-0.36594392,  0.51268315,  1.        ,  0.05828388, -0.00646736,  0.0551271 ,  0.01064824],
       [ 0.12820349, -0.02704055,  0.05828388,  1.        ,  0.62411583,  0.93611501,  0.59010122],
       [ 0.12885289, -0.08223887, -0.00646736,  0.62411583,  1.        ,  0.71107239,  0.41607171],
       [ 0.11053555, -0.0320201 ,  0.0551271 ,  0.93611501,  0.71107239,  1.        ,  0.56243697],
       [ 0.30997234,  0.20789831,  0.01064824,  0.59010122,  0.41607171,  0.56243697,  1.        ]])
To overwrite an entire correlation matrix, follow the code snippet below:
my_corr_mat = np.identity(ndim)  # placeholder only: substitute your own correlation matrix
g_cop[:] = my_corr_mat  # this overwrites the entire correlation matrix
Behind the scenes, after overwriting the parameters, some transformations will be done to ensure that the resulting matrix remains positive semi-definite. If the matrix is already positive semi-definite, nothing will be done. But if it isn’t, there will be some shifts to ensure that the resulting matrix has the minimum difference from the original matrix whilst being positive semi-definite. Thus don’t be surprised if you change an element and notice that there are some bumps to the numbers.
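To see what such a repair can look like, here is a minimal sketch that uses eigenvalue clipping followed by rescaling to a unit diagonal. This is a simple stand-in chosen for illustration: it is not guaranteed to give the minimum-difference matrix, and it is not claimed to be the algorithm the package uses. The function name `clip_to_correlation` and the example matrix are hypothetical.

```python
import numpy as np


def clip_to_correlation(mat, eps=1e-8):
    """Hypothetical helper: repair a symmetric matrix into a valid correlation matrix."""
    mat = np.asarray(mat, dtype=float)
    sym = (mat + mat.T) / 2                  # enforce symmetry
    eigvals, eigvecs = np.linalg.eigh(sym)
    clipped = np.clip(eigvals, eps, None)    # lift negative eigenvalues up to eps
    psd = (eigvecs * clipped) @ eigvecs.T    # rebuild from the clipped spectrum
    scale = 1.0 / np.sqrt(np.diag(psd))
    corr = psd * np.outer(scale, scale)      # rescale so the diagonal is 1 again
    return (corr + corr.T) / 2


# An inconsistent set of pairwise correlations: not positive semi-definite
bad = np.array([
    [1.0, 0.9, -0.9],
    [0.9, 1.0, 0.9],
    [-0.9, 0.9, 1.0],
])
fixed = clip_to_correlation(bad)
```

The off-diagonal entries of `fixed` differ from those of `bad`, which is the kind of "bump" described above.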
An alias of the StudentCopula is TCopula. We can use either as they both refer to the same underlying structure.
[8]:
from copulae import StudentCopula

# some random number; specifying df is unnecessary but done for demonstration purposes
degrees_of_freedom = 5.5
t_cop = StudentCopula(dim=ndim, df=degrees_of_freedom)
t_cop.fit(residuals)
<copulae.elliptical.student.StudentCopula at 0x20934674250>
The Student copula differs from the Gaussian copula in that it has one additional parameter, the degrees of freedom df. This can be seen from the summary.
[9]:
t_cop.summary()
[10]:
t_cop.params
StudentParams(df=10.544336897837123, rho=array([ 0.17744812, -0.3743528 , 0.09229151, 0.11115283, 0.07197855, 0.2650173 , 0.52528912, -0.05500374, -0.07738755, -0.06585443, 0.18123705, 0.05795765, 0.01324161, 0.06096664, 0.01837413, 0.63014145, 0.9398473 , 0.57962713, 0.71668954, 0.41156475, 0.5589529 ]))
The rest of the StudentCopula API works in the same way as that of the GaussianCopula. The only thing to note is that to change the degrees of freedom, you use t_cop.df = 5.
That’s all, folks. We’ve gone through a simple example of:
How to fit a copula
How to get a summary of the fitted copula
How to get the PDF, CDF and Random Variates (these can be done even before fitting provided you set the parameters of the copula manually)
How to overwrite parameters of the copula
All the copulas follow much the same API, so you probably know how to use the rest of them already.