Usage Guide¶

In this sample piece, we will go through how to use the elliptical copulae as defined in the copulae package. We will not be covering what elliptical copula are.

We will use the residuals data from the package for this tutorial. The data is a historical realization of the unknown processes. Each column follows their distinct (and unknown) process. However, these processes are related (have a dependency structure) to one another. Our task is to learn the dependency structure so that we could simulate future events.

This example is essentially a stripped down case of the GARCH-Copula model, which is common in certain industries.

[1]:

from copulae.datasets import load_residuals

residuals = load_residuals()
residuals.head()

[1]:

	A	B	C	D	E	F	G
0	0.730967	0.530860	0.287320	1.193049	0.019040	1.100507	0.278214
1	2.067853	-1.181313	-2.546173	0.381538	-0.038734	0.269874	-0.603940
2	-2.181835	0.380326	0.928632	-0.316861	0.106473	-0.324854	-0.447824
3	0.445040	0.734531	-0.133299	-0.374091	0.173616	-0.319402	-0.775106
4	0.296363	3.024053	0.815791	1.168521	0.134044	1.110424	1.705190

We will use both the GaussianCopula and the StudentCopula. But let’s first start off with the Gaussian copula.

Gaussian Copula¶

An alias of the GaussianCopula is NormalCopula. We can use either as they both refer to the same underlying structure.

[2]:

from copulae import GaussianCopula

_, ndim = residuals.shape
g_cop = GaussianCopula(dim=ndim)  # initializing the copula
g_cop.fit(residuals)  # fit the copula to the data

[2]:

<copulae.elliptical.gaussian.GaussianCopula at 0x209346ba370>

Internally, the fit method will convert the data to pseudo observations so there is no need to for that sort of data treatment prior. However, even if your data is already in pseudo observations, there will be no change to the results as the transformation is monotonic in nature.

To understand the quality of the fit, you can use the summary method.

[3]:

g_cop.summary()

[3]:

Gaussian Copula Summary

Gaussian Copula with 7 dimensions

Parameters

Correlation Matrix

1.000000	0.191081	-0.365944	0.128203	0.128853	0.110536	0.309972
0.191081	1.000000	0.512683	-0.027041	-0.082239	-0.032020	0.207898
-0.365944	0.512683	1.000000	0.058284	-0.006467	0.055127	0.010648
0.128203	-0.027041	0.058284	1.000000	0.624116	0.936115	0.590101
0.128853	-0.082239	-0.006467	0.624116	1.000000	0.711072	0.416072
0.110536	-0.032020	0.055127	0.936115	0.711072	1.000000	0.562437
0.309972	0.207898	0.010648	0.590101	0.416072	0.562437	1.000000

Fit Summary

Fit Summary
Log Likelihood	-810.9309775286824
Variance Estimate	Not Implemented Yet
Method	Maximum pseudo-likelihood
Data Points	394

Optimization Setup		Results
bounds	[(-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001)]	x	[ 0.19108074 -0.36594392 0.12820349 0.12885289 0.11053555 0.30997234 0.51268315 -0.02704055 -0.08223887 -0.0320201 0.20789831 0.05828388 -0.00646736 0.0551271 0.01064824 0.62411583 0.93611501 0.59010122 0.71107239 0.41607171 0.56243697]
options	{'maxiter': 20000, 'ftol': 1e-06, 'iprint': 1, 'disp': False, 'eps': 1.5e-08}	fun	-810.9309775286824
method	SLSQP	jac	[-0.00162193 -0.00035622 -0.01109584 0.00124298 0.00200847 0.00139456 -0.00116718 0.00067454 -0.00859472 0.00973917 -0.00045475 0.00281185 0.00683637 -0.01135353 -0.00180383 0.020691 -0.0533646 -0.01230849 -0.02125186 -0.00400178 0.01490055]
None	None	nit	26
None	None	nfev	631
None	None	njev	26
None	None	status	0
None	None	message	Optimization terminated successfully
None	None	success	True

PDF, CDF and Random Variates¶

All the pdf, cdf and random methods of every copula work in the same manner. The only thing to note is that the input data for pdf and cdf must match the dimensions of the copula. In this case, we generate a 2x7 matrix, notice that the second dimension matches the dimension of the copula.

[4]:

import numpy as np

random_matrix = np.random.uniform(0, 1, size=(2, 7))

pdf = g_cop.pdf(random_matrix)  # length 2 ndarray
cdf = g_cop.cdf(random_matrix)  # length 2 ndarray
rv = g_cop.random(2)  # shape is 2 by 7.

Reading Parameters¶

All copulas are parameterized in their own ways. Archimedeans, for example, is parameterized by a single \(\theta\). For the Gaussian Copula, it is parameterized by the correlation matrix. To get the parameters for the copula, use the params property.

[5]:

g_cop.params

[5]:

array([ 0.19108074, -0.36594392,  0.12820349,  0.12885289,  0.11053555,
        0.30997234,  0.51268315, -0.02704055, -0.08223887, -0.0320201 ,
        0.20789831,  0.05828388, -0.00646736,  0.0551271 ,  0.01064824,
        0.62411583,  0.93611501,  0.59010122,  0.71107239,  0.41607171,
        0.56243697])

In this case, we get a vector instead of a correlation matrix (even though I mentioned that Gaussian Copulas are parameterized by the correlation matrix!). The answer is simple, these numbers are actually the diagonal elements of the correlation matrix. After all, in a correlation matrix, only the elements in the diagonals are “unique”.

For elliptical copulas, to see the correlation matrix, use the sigma property.

[6]:

np.set_printoptions(linewidth=120)
g_cop.sigma

[6]:

array([[ 1.        ,  0.19108074, -0.36594392,  0.12820349,  0.12885289,  0.11053555,  0.30997234],
       [ 0.19108074,  1.        ,  0.51268315, -0.02704055, -0.08223887, -0.0320201 ,  0.20789831],
       [-0.36594392,  0.51268315,  1.        ,  0.05828388, -0.00646736,  0.0551271 ,  0.01064824],
       [ 0.12820349, -0.02704055,  0.05828388,  1.        ,  0.62411583,  0.93611501,  0.59010122],
       [ 0.12885289, -0.08223887, -0.00646736,  0.62411583,  1.        ,  0.71107239,  0.41607171],
       [ 0.11053555, -0.0320201 ,  0.0551271 ,  0.93611501,  0.71107239,  1.        ,  0.56243697],
       [ 0.30997234,  0.20789831,  0.01064824,  0.59010122,  0.41607171,  0.56243697,  1.        ]])

Overwriting Parameters¶

The parameters are fit according to the empirical data. Many times, this is fine. However, there are instances where we want to overwrite the fitted parameters due to better understanding of the domain problem and any other reasons.

The basic way is to overwrite via the params property setter as seen in the example below.

cop.params = 123

However, for the elliptical copulas, we have a convenience function that makes it easier to work with correlation matrix.

To overwrite single elements:

[7]:

assert g_cop[0, 1] == g_cop[1, 0]
g_cop[0, 1] = 0.5
g_cop.sigma

[7]:

array([[ 1.        ,  0.5       , -0.36594392,  0.12820349,  0.12885289,  0.11053555,  0.30997234],
       [ 0.5       ,  1.        ,  0.51268315, -0.02704055, -0.08223887, -0.0320201 ,  0.20789831],
       [-0.36594392,  0.51268315,  1.        ,  0.05828388, -0.00646736,  0.0551271 ,  0.01064824],
       [ 0.12820349, -0.02704055,  0.05828388,  1.        ,  0.62411583,  0.93611501,  0.59010122],
       [ 0.12885289, -0.08223887, -0.00646736,  0.62411583,  1.        ,  0.71107239,  0.41607171],
       [ 0.11053555, -0.0320201 ,  0.0551271 ,  0.93611501,  0.71107239,  1.        ,  0.56243697],
       [ 0.30997234,  0.20789831,  0.01064824,  0.59010122,  0.41607171,  0.56243697,  1.        ]])

To overwrite an entire correlation matrix, follow the code snippet below:

my_corr_mat = # some correlation matrix
g_cop[:] = my_corr_mat  # this overwrites the entire correlation matrix

Behind the scenes, after overwriting the parameters, some transformations will be done to ensure that the resulting matrix remains positive semi-definite. If the matrix is already positive semi-definite, nothing will be done. But if it isn’t, there will be some shifts to ensure that the resulting matrix has the minimum difference from the original matrix whilst being positive semi-definite. Thus don’t be surprised if you change an element and notice that there are some bumps to the numbers.

Student Copula¶

An alias of the StudentCopula is TCopula. We can use either as they both refer to the same underlying structure.

[8]:

from copulae import StudentCopula

degrees_of_freedom = 5.5  # some random number, unnecessary to specify df but done for demonstration purposes
t_cop = StudentCopula(dim=ndim, df=degrees_of_freedom)
t_cop.fit(residuals)

[8]:

<copulae.elliptical.student.StudentCopula at 0x20934674250>

The Student Copula differs from the Gaussian Copula in that it has one additonal parameter, the degrees of freedom df. This can be seen from the summary.

[9]:

t_cop.summary()

[9]:

Student Copula Summary

Student Copula with 7 dimensions

Parameters

Degree of Freedom10.544336897837123
Correlation Matrix

1.000000	0.177448	-0.374353	0.092292	0.111153	0.071979	0.265017
0.177448	1.000000	0.525289	-0.055004	-0.077388	-0.065854	0.181237
-0.374353	0.525289	1.000000	0.057958	0.013242	0.060967	0.018374
0.092292	-0.055004	0.057958	1.000000	0.630141	0.939847	0.579627
0.111153	-0.077388	0.013242	0.630141	1.000000	0.716690	0.411565
0.071979	-0.065854	0.060967	0.939847	0.716690	1.000000	0.558953
0.265017	0.181237	0.018374	0.579627	0.411565	0.558953	1.000000

Fit Summary

Fit Summary
Log Likelihood	-838.7958878695256
Variance Estimate	Not Implemented Yet
Method	Maximum pseudo-likelihood
Data Points	394

Optimization Setup		Results
bounds	[(0.0, inf), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001), (-1.000001, 1.000001)]	x	[10.5443369 0.17744812 -0.3743528 0.09229151 0.11115283 0.07197855 0.2650173 0.52528912 -0.05500374 -0.07738755 -0.06585443 0.18123705 0.05795765 0.01324161 0.06096664 0.01837413 0.63014145 0.9398473 0.57962713 0.71668954 0.41156475 0.5589529 ]
options	{'maxiter': 20000, 'ftol': 1e-06, 'iprint': 1, 'disp': False, 'eps': 1.5e-08}	fun	-838.7958878695256
method	SLSQP	jac	[-0.00013642 0.00107624 -0.00327418 0.0056995 0.00177351 -0.00823093 0.00236469 0.00270575 -0.00523717 0.00107624 0.00525233 -0.00133393 0.01387737 0.00406241 -0.0178261 0.00187962 0.01274808 -0.02455636 -0.00270575 -0.01692418 -0.00370619 0.00285733]
None	None	nit	31
None	None	nfev	771
None	None	njev	31
None	None	status	0
None	None	message	Optimization terminated successfully
None	None	success	True

[10]:

t_cop.params

[10]:

StudentParams(df=10.544336897837123, rho=array([ 0.17744812, -0.3743528 ,  0.09229151,  0.11115283,  0.07197855,  0.2650173 ,  0.52528912, -0.05500374,
       -0.07738755, -0.06585443,  0.18123705,  0.05795765,  0.01324161,  0.06096664,  0.01837413,  0.63014145,
        0.9398473 ,  0.57962713,  0.71668954,  0.41156475,  0.5589529 ]))

The rest of the StudentCopula work in the same way as the GaussianCopula. The only thing to note is that to change the degrees of freedom, you use t_cop.df = 5.

Summary¶

That’s all folks. We’ve gone through a simple example on

How to fit a copula
How to get a summary of the fitted copula
How to get the PDF, CDF and Random Variates (these can be done even before fitting provided you set the parameters of the copula manually)
How to overwrite parameters of the copula

All the copulas pretty much follow a similar API so you probaby know about all of them already.

Examples Explainers