Gaussian Mixture Copula

class copulae.mixtures.gmc.gmc.GaussianMixtureCopula(n_clusters, ndim, param=None)
The Gaussian Mixture Copula (GMC).
A Gaussian Copula has many normal marginal densities bound together by a single multivariate, unimodal Gaussian density. However, if a dataset has multiple modes (peaks) with different dependence structures, the applicability of the Gaussian Copula is severely limited. A Gaussian Mixture Copula, on the other hand, allows modeling of data with many modes (peaks).
The GMC’s dependence structure is obtained from a Gaussian Mixture Model (GMM). For a GMC with \(M\) components and \(d\) dimensions, the density (PDF) is given by
\[ \begin{align}\begin{aligned}\phi &= \sum_{i=1}^M w_i \phi_i (x_1, x_2, \dots, x_d; \theta_i)\\&\text{s.t.}\\&\sum_{i=1}^M w_i = 1\\&0 \leq w_i \leq 1 \quad \forall i \in [1, \dots, M]\\&\text{where}\\w_i &= \text{weight of the marginal density}\\\phi_i &= \text{marginal density}\\\theta_i &= \text{parameters of the Gaussian marginal}\end{aligned}\end{align} \]
The Gaussian Mixture Copula is thus given by
\[ \begin{align}\begin{aligned}C(u_1, u_2, \dots, u_d; \Theta) &= \frac{\phi(\Phi_1^{-1}(u_1), \Phi_2^{-1}(u_2), \dots, \Phi_d^{-1}(u_d); \Theta)} {\phi_1(\Phi_1^{-1}(u_1)) \cdot \phi_2(\Phi_2^{-1}(u_2)) \cdots \phi_d(\Phi_d^{-1}(u_d))}\\& \text{where}\\\Phi_i^{-1} &= \text{Inverse function of GMM marginal CDF}\\\Theta &= (w_i, \theta_i) \quad \forall i \in [1, \dots, M]\end{aligned}\end{align} \]
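The ratio above can be evaluated numerically. The following is a minimal sketch, not the library's implementation: it assumes each \(\Phi_j^{-1}\) is the inverse of the GMM's marginal CDF along dimension \(j\), and it inverts that CDF by bisection because a normal mixture has no closed-form quantile function. All function names (`mvn_pdf`, `gmm_pdf`, `marginal_ppf`, `gmc_density`) and the example parameters are hypothetical and exist only for illustration.

```python
import math
import numpy as np


def mvn_pdf(x, mean, cov):
    """Multivariate normal density at a single point x."""
    d = len(mean)
    diff = x - mean
    quad = diff @ np.linalg.solve(cov, diff)
    norm = math.sqrt((2.0 * math.pi) ** d * np.linalg.det(cov))
    return math.exp(-0.5 * quad) / norm


def gmm_pdf(x, weights, means, covs):
    """Joint GMM density: weighted sum of the component densities."""
    return sum(w * mvn_pdf(x, m, c) for w, m, c in zip(weights, means, covs))


def marginal_pdf(xj, j, weights, means, covs):
    """Density of the GMM along dimension j (a univariate normal mixture)."""
    total = 0.0
    for w, m, c in zip(weights, means, covs):
        var = c[j, j]
        total += w * math.exp(-0.5 * (xj - m[j]) ** 2 / var) / math.sqrt(2.0 * math.pi * var)
    return total


def marginal_cdf(xj, j, weights, means, covs):
    """CDF of the GMM along dimension j."""
    total = 0.0
    for w, m, c in zip(weights, means, covs):
        z = (xj - m[j]) / math.sqrt(2.0 * c[j, j])
        total += w * 0.5 * (1.0 + math.erf(z))
    return total


def marginal_ppf(u, j, weights, means, covs, lo=-50.0, hi=50.0):
    """Invert the marginal CDF by bisection on the bracket [lo, hi]."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if marginal_cdf(mid, j, weights, means, covs) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gmc_density(u, weights, means, covs):
    """Joint GMM density at the quantiles, divided by the product of marginals."""
    x = np.array([marginal_ppf(uj, j, weights, means, covs) for j, uj in enumerate(u)])
    denom = math.prod(marginal_pdf(x[j], j, weights, means, covs) for j in range(len(u)))
    return gmm_pdf(x, weights, means, covs) / denom


# Illustrative two-component, two-dimensional parameters (made up for this sketch).
weights = [0.4, 0.6]
means = [np.array([-2.0, -2.0]), np.array([2.0, 1.0])]
covs = [np.array([[1.0, 0.5], [0.5, 1.0]]), np.array([[1.0, -0.3], [-0.3, 1.0]])]
density = gmc_density([0.3, 0.7], weights, means, covs)
```

As a sanity check, a single component with an identity covariance matrix has independent coordinates, so the ratio should equal 1 for every `u`.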
property bounds
Bounds is not meaningful for GaussianMixtureCopula

cdf(x, log=False)
Returns the cumulative distribution function (CDF) of the copula.
The CDF is also the probability of a RV being less than or equal to the value specified. Equivalent to the ‘p’ generic function in R.
 Parameters
x (Union[ndarray, Collection[Number], Series, Collection[Collection[Number]], DataFrame]) – Vector or matrix of the observed data. This vector must be (n x d) where d is the dimension of the copula
log – If True, the log of the probability is returned
 Returns
The CDF of the random variates
 Return type
np.ndarray or float

property clusters
Number of clusters in the GaussianMixtureCopula

property dim
Number of dimensions for each copula in the GaussianMixtureCopula

fit(data, x0=None, method='pem', optim_options=None, ties='average', verbose=1, max_iter=3000, criteria='GMCM', eps=0.0001)
Fit the copula with specified data
 Parameters
data (Union[DataFrame, ndarray]) – Array of data used to fit copula. Usually, data should not be pseudo observations as this will skew the model parameters
x0 (Union[GMCParam, ndarray, Collection[float], None]) – Initial starting point. If value is None, best starting point will be estimated
method (Literal[‘pem’, ‘sgd’, ‘kmeans’]) – Method of fitting. Supported methods are: ‘pem’ - Expectation Maximization with pseudo log-likelihood, ‘kmeans’ - K-means, ‘sgd’ - stochastic gradient descent
optim_options (dict, optional) – Keyword arguments to pass into scipy.optimize.minimize. Only applicable for gradient-descent optimizations
ties ({ 'average', 'min', 'max', 'dense', 'ordinal' }, optional) – Specifies how ranks should be computed if there are ties in any of the coordinate samples. This is effective only if the data has not been converted to its pseudo observations form
verbose – Log level for the estimator. The higher the number, the more verbose it is. 0 prints nothing.
max_iter (int) – Maximum number of iterations
criteria ({ 'GMCM', 'GMM', 'Li' }) – The stopping criteria. Only applicable for Expectation Maximization (EM). ‘GMCM’ uses the absolute difference between the current and previous GMCM log-likelihood, ‘GMM’ uses the absolute difference between the current and previous GMM log-likelihood, and ‘Li’ uses the stopping criteria defined by Li et al. (2011)
eps (float) – The tolerance below which the absolute delta is taken to mean that the model has converged
Notes
Maximizing the exact likelihood of GMCM is technically intractable using expectation maximization. The pseudolikelihood
See also
scipy.optimize.minimize
The scipy minimize function used for optimization
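The interaction of max_iter, eps and the log-likelihood based stopping criteria can be sketched as a plain loop. This is an illustration under stated assumptions, not the library's code: `run_em` and `step` are hypothetical names, `step` stands in for one EM update that returns the new log-likelihood, and the strict `<` comparison against eps is assumed.

```python
def run_em(step, max_iter=3000, eps=1e-4):
    """Iterate `step` until the log-likelihood change falls below `eps`.

    `step` is a hypothetical callable that performs one update and returns
    the resulting log-likelihood. Returns a tuple of
    (final log-likelihood, iterations used, converged flag).
    """
    prev_ll = step()
    for n_iter in range(2, max_iter + 1):
        curr_ll = step()
        # Absolute difference between the current and previous log-likelihood.
        if abs(curr_ll - prev_ll) < eps:
            return curr_ll, n_iter, True
        prev_ll = curr_ll
    # Iteration budget exhausted without meeting the tolerance.
    return prev_ll, max_iter, False


# Toy log-likelihood trace standing in for real EM updates.
trace = iter([-10.0, -5.0, -4.0, -3.99995, -3.99994])
final_ll, n_iter, converged = run_em(lambda: next(trace))
```

With this trace the loop stops at the fourth update, where the change of 0.00005 first drops below the default eps of 0.0001.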

log_lik(data, *, to_pobs=True, ties='average')
Returns the log likelihood (LL) of the copula given the data.
The greater the LL (closer to \(\infty\)) the better.
 Parameters
data (ndarray) – Data set used to calculate the log likelihood
to_pobs – If True, converts the data input to pseudo observations.
ties (Literal[‘average’, ‘min’, ‘max’, ‘dense’, ‘ordinal’]) – Specifies how ranks should be computed if there are ties in any of the coordinate samples. This is effective only if to_pobs is True.
 Returns
Log Likelihood
 Return type
float

property params
The parameter set which describes the copula
 Returns
The model parameter
 Return type
GMCParam

pdf(x, log=False)
Returns the probability density function (PDF) of the copula.
The PDF is also the density of the RV at the specified value for the particular distribution. Equivalent to the ‘d’ generic function in R.
 Parameters
x (Union[ndarray, Collection[Number], Series, Collection[Collection[Number]], DataFrame]) – Vector or matrix of observed data
log – If True, the density ‘d’ is given as log(d)
 Returns
The density (PDF) of the RV
 Return type
np.ndarray or float

static pobs(data, ties='average')
Compute the pseudo-observations for the given data matrix
 Parameters
data ({ array_like, DataFrame }) – Random variates to be converted to pseudo-observations
ties ({ 'average', 'min', 'max', 'dense', 'ordinal' }, optional) – Specifies how ranks should be computed if there are ties in any of the coordinate samples
 Returns
Matrix or vector of the same dimension as data containing the pseudo observations
 Return type
ndarray
See also
pseudo_obs()
The pseudo-observations function
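For intuition, pseudo-observations are commonly computed as column-wise ranks scaled by n + 1, which keeps every value strictly inside (0, 1). The sketch below follows that convention using scipy.stats.rankdata, whose method argument accepts the same five tie-handling options listed above. Whether the library uses exactly this scaling is an assumption, and `pseudo_observations` is a hypothetical helper name.

```python
import numpy as np
from scipy.stats import rankdata


def pseudo_observations(data, ties="average"):
    """Rank each column and scale by (n + 1) so values lie strictly in (0, 1)."""
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    # axis=0 ranks every column (coordinate sample) independently.
    ranks = rankdata(data, method=ties, axis=0)
    return ranks / (n + 1)


sample = np.array([[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]])
u = pseudo_observations(sample)
```

With ties='average', tied values share the mean of the ranks they occupy, so `[1.0, 1.0, 2.0]` is ranked `[1.5, 1.5, 3.0]` before scaling.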

property