Usage Guide

In this sample piece, we will go through how to use the elliptical copulae as defined in the copulae package. We will not be covering what elliptical copula are.

We will use the residuals data from the package for this tutorial. The data is a historical realization of the unknown processes. Each column follows their distinct (and unknown) process. However, these processes are related (have a dependency structure) to one another. Our task is to learn the dependency structure so that we could simulate future events.

This example is essentially a stripped down case of the GARCH-Copula model, which is common in certain industries.

[1]:
from copulae.datasets import load_residuals

residuals = load_residuals()
residuals.head()
[1]:
A B C D E F G
0 0.730967 0.530860 0.287320 1.193049 0.019040 1.100507 0.278214
1 2.067853 -1.181313 -2.546173 0.381538 -0.038734 0.269874 -0.603940
2 -2.181835 0.380326 0.928632 -0.316861 0.106473 -0.324854 -0.447824
3 0.445040 0.734531 -0.133299 -0.374091 0.173616 -0.319402 -0.775106
4 0.296363 3.024053 0.815791 1.168521 0.134044 1.110424 1.705190

We will use both the GaussianCopula and the StudentCopula. But let’s first start off with the Gaussian copula.

Gaussian Copula

An alias of the GaussianCopula is NormalCopula. We can use either as they both refer to the same underlying structure.

[2]:
from copulae import GaussianCopula

_, ndim = residuals.shape
g_cop = GaussianCopula(dim=ndim)  # initializing the copula
g_cop.fit(residuals)  # fit the copula to the data
[2]:
<copulae.elliptical.gaussian.GaussianCopula at 0x209346ba370>

Internally, the fit method will convert the data to pseudo observations so there is no need to for that sort of data treatment prior. However, even if your data is already in pseudo observations, there will be no change to the results as the transformation is monotonic in nature.

To understand the quality of the fit, you can use the summary method.

[3]:
g_cop.summary()
[3]:

Gaussian Copula Summary

Gaussian Copula with 7 dimensions

Parameters

Correlation Matrix
1.000000 0.191081 -0.365944 0.128203 0.128853 0.110536 0.309972
0.191081 1.000000 0.512683 -0.027041 -0.082239 -0.032020 0.207898
-0.365944 0.512683 1.000000 0.058284 -0.006467 0.055127 0.010648
0.128203 -0.027041 0.058284 1.000000 0.624116 0.936115 0.590101
0.128853 -0.082239 -0.006467 0.624116 1.000000 0.711072 0.416072
0.110536 -0.032020 0.055127 0.936115 0.711072 1.000000 0.562437
0.309972 0.207898 0.010648 0.590101 0.416072 0.562437 1.000000

Fit Summary


Fit Summary
Log Likelihood-810.9309775286824
Variance EstimateNot Implemented Yet
MethodMaximum pseudo-likelihood
Data Points394

Optimization SetupResults
bounds[(-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001)]x[ 0.19108074 -0.36594392 0.12820349 0.12885289 0.11053555 0.30997234 0.51268315 -0.02704055 -0.08223887 -0.0320201 0.20789831 0.05828388 -0.00646736 0.0551271 0.01064824 0.62411583 0.93611501 0.59010122 0.71107239 0.41607171 0.56243697]
options{'maxiter': 20000, 'ftol': 1e-06, 'iprint': 1, 'disp': False, 'eps': 1.5e-08}fun-810.9309775286824
methodSLSQPjac[-0.00162193 -0.00035622 -0.01109584 0.00124298 0.00200847 0.00139456 -0.00116718 0.00067454 -0.00859472 0.00973917 -0.00045475 0.00281185 0.00683637 -0.01135353 -0.00180383 0.020691 -0.0533646 -0.01230849 -0.02125186 -0.00400178 0.01490055]
NoneNonenit26
NoneNonenfev631
NoneNonenjev26
NoneNonestatus0
NoneNonemessageOptimization terminated successfully
NoneNonesuccessTrue

PDF, CDF and Random Variates

All the pdf, cdf and random methods of every copula work in the same manner. The only thing to note is that the input data for pdf and cdf must match the dimensions of the copula. In this case, we generate a 2x7 matrix, notice that the second dimension matches the dimension of the copula.

[4]:
import numpy as np

random_matrix = np.random.uniform(0, 1, size=(2, 7))

pdf = g_cop.pdf(random_matrix)  # length 2 ndarray
cdf = g_cop.cdf(random_matrix)  # length 2 ndarray
rv = g_cop.random(2)  # shape is 2 by 7.

Reading Parameters

All copulas are parameterized in their own ways. Archimedeans, for example, is parameterized by a single \(\theta\). For the Gaussian Copula, it is parameterized by the correlation matrix. To get the parameters for the copula, use the params property.

[5]:
g_cop.params
[5]:
array([ 0.19108074, -0.36594392,  0.12820349,  0.12885289,  0.11053555,
        0.30997234,  0.51268315, -0.02704055, -0.08223887, -0.0320201 ,
        0.20789831,  0.05828388, -0.00646736,  0.0551271 ,  0.01064824,
        0.62411583,  0.93611501,  0.59010122,  0.71107239,  0.41607171,
        0.56243697])

In this case, we get a vector instead of a correlation matrix (even though I mentioned that Gaussian Copulas are parameterized by the correlation matrix!). The answer is simple, these numbers are actually the diagonal elements of the correlation matrix. After all, in a correlation matrix, only the elements in the diagonals are “unique”.

For elliptical copulas, to see the correlation matrix, use the sigma property.

[6]:
np.set_printoptions(linewidth=120)
g_cop.sigma
[6]:
array([[ 1.        ,  0.19108074, -0.36594392,  0.12820349,  0.12885289,  0.11053555,  0.30997234],
       [ 0.19108074,  1.        ,  0.51268315, -0.02704055, -0.08223887, -0.0320201 ,  0.20789831],
       [-0.36594392,  0.51268315,  1.        ,  0.05828388, -0.00646736,  0.0551271 ,  0.01064824],
       [ 0.12820349, -0.02704055,  0.05828388,  1.        ,  0.62411583,  0.93611501,  0.59010122],
       [ 0.12885289, -0.08223887, -0.00646736,  0.62411583,  1.        ,  0.71107239,  0.41607171],
       [ 0.11053555, -0.0320201 ,  0.0551271 ,  0.93611501,  0.71107239,  1.        ,  0.56243697],
       [ 0.30997234,  0.20789831,  0.01064824,  0.59010122,  0.41607171,  0.56243697,  1.        ]])

Overwriting Parameters

The parameters are fit according to the empirical data. Many times, this is fine. However, there are instances where we want to overwrite the fitted parameters due to better understanding of the domain problem and any other reasons.

The basic way is to overwrite via the params property setter as seen in the example below.

cop.params = 123

However, for the elliptical copulas, we have a convenience function that makes it easier to work with correlation matrix.

To overwrite single elements:

[7]:
assert g_cop[0, 1] == g_cop[1, 0]
g_cop[0, 1] = 0.5
g_cop.sigma
[7]:
array([[ 1.        ,  0.5       , -0.36594392,  0.12820349,  0.12885289,  0.11053555,  0.30997234],
       [ 0.5       ,  1.        ,  0.51268315, -0.02704055, -0.08223887, -0.0320201 ,  0.20789831],
       [-0.36594392,  0.51268315,  1.        ,  0.05828388, -0.00646736,  0.0551271 ,  0.01064824],
       [ 0.12820349, -0.02704055,  0.05828388,  1.        ,  0.62411583,  0.93611501,  0.59010122],
       [ 0.12885289, -0.08223887, -0.00646736,  0.62411583,  1.        ,  0.71107239,  0.41607171],
       [ 0.11053555, -0.0320201 ,  0.0551271 ,  0.93611501,  0.71107239,  1.        ,  0.56243697],
       [ 0.30997234,  0.20789831,  0.01064824,  0.59010122,  0.41607171,  0.56243697,  1.        ]])

To overwrite an entire correlation matrix, follow the code snippet below:

my_corr_mat = # some correlation matrix
g_cop[:] = my_corr_mat  # this overwrites the entire correlation matrix

Behind the scenes, after overwriting the parameters, some transformations will be done to ensure that the resulting matrix remains positive semi-definite. If the matrix is already positive semi-definite, nothing will be done. But if it isn’t, there will be some shifts to ensure that the resulting matrix has the minimum difference from the original matrix whilst being positive semi-definite. Thus don’t be surprised if you change an element and notice that there are some bumps to the numbers.

Student Copula

An alias of the StudentCopula is TCopula. We can use either as they both refer to the same underlying structure.

[8]:
from copulae import StudentCopula

degrees_of_freedom = 5.5  # some random number, unnecessary to specify df but done for demonstration purposes
t_cop = StudentCopula(dim=ndim, df=degrees_of_freedom)
t_cop.fit(residuals)
[8]:
<copulae.elliptical.student.StudentCopula at 0x20934674250>

The Student Copula differs from the Gaussian Copula in that it has one additonal parameter, the degrees of freedom df. This can be seen from the summary.

[9]:
t_cop.summary()
[9]:

Student Copula Summary

Student Copula with 7 dimensions

Parameters

Degree of Freedom10.544336897837123
Correlation Matrix
1.000000 0.177448 -0.374353 0.092292 0.111153 0.071979 0.265017
0.177448 1.000000 0.525289 -0.055004 -0.077388 -0.065854 0.181237
-0.374353 0.525289 1.000000 0.057958 0.013242 0.060967 0.018374
0.092292 -0.055004 0.057958 1.000000 0.630141 0.939847 0.579627
0.111153 -0.077388 0.013242 0.630141 1.000000 0.716690 0.411565
0.071979 -0.065854 0.060967 0.939847 0.716690 1.000000 0.558953
0.265017 0.181237 0.018374 0.579627 0.411565 0.558953 1.000000

Fit Summary


Fit Summary
Log Likelihood-838.7958878695256
Variance EstimateNot Implemented Yet
MethodMaximum pseudo-likelihood
Data Points394

Optimization SetupResults
bounds[(0.0, inf), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001)]x[10.5443369 0.17744812 -0.3743528 0.09229151 0.11115283 0.07197855 0.2650173 0.52528912 -0.05500374 -0.07738755 -0.06585443 0.18123705 0.05795765 0.01324161 0.06096664 0.01837413 0.63014145 0.9398473 0.57962713 0.71668954 0.41156475 0.5589529 ]
options{'maxiter': 20000, 'ftol': 1e-06, 'iprint': 1, 'disp': False, 'eps': 1.5e-08}fun-838.7958878695256
methodSLSQPjac[-0.00013642 0.00107624 -0.00327418 0.0056995 0.00177351 -0.00823093 0.00236469 0.00270575 -0.00523717 0.00107624 0.00525233 -0.00133393 0.01387737 0.00406241 -0.0178261 0.00187962 0.01274808 -0.02455636 -0.00270575 -0.01692418 -0.00370619 0.00285733]
NoneNonenit31
NoneNonenfev771
NoneNonenjev31
NoneNonestatus0
NoneNonemessageOptimization terminated successfully
NoneNonesuccessTrue
[10]:
t_cop.params
[10]:
StudentParams(df=10.544336897837123, rho=array([ 0.17744812, -0.3743528 ,  0.09229151,  0.11115283,  0.07197855,  0.2650173 ,  0.52528912, -0.05500374,
       -0.07738755, -0.06585443,  0.18123705,  0.05795765,  0.01324161,  0.06096664,  0.01837413,  0.63014145,
        0.9398473 ,  0.57962713,  0.71668954,  0.41156475,  0.5589529 ]))

The rest of the StudentCopula work in the same way as the GaussianCopula. The only thing to note is that to change the degrees of freedom, you use t_cop.df = 5.

Summary

That’s all folks. We’ve gone through a simple example on

  1. How to fit a copula

  2. How to get a summary of the fitted copula

  3. How to get the PDF, CDF and Random Variates (these can be done even before fitting provided you set the parameters of the copula manually)

  4. How to overwrite parameters of the copula

All the copulas pretty much follow a similar API so you probaby know about all of them already.