Gaussian Mixture Copula

class copulae.mixtures.gmc.gmc.GaussianMixtureCopula(n_clusters, ndim, params=None)

The Gaussian Mixture Copula (GMC).

A Gaussian Copula binds many normal marginal densities together through a single multivariate, unimodal Gaussian density. However, if a dataset has multiple modes (peaks) with different dependence structures, the Gaussian Copula is severely limited. A Gaussian Mixture Copula, in contrast, can model data with many modes.

The GMC’s dependence structure is obtained from a Gaussian Mixture Model (GMM). For a GMC with \(M\) components and \(d\) dimensions, the density (PDF) of the underlying GMM is given by

\[ \begin{align}\begin{aligned}\phi &= \sum_{i=1}^M w_i \phi_i (x_1, x_2, \dots, x_d; \theta_i)\\&\text{s.t.}\\&\sum_{i=1}^M w_i = 1\\&0 \leq w_i \leq 1 \quad \forall i \in [1, \dots, M]\\&\text{where}\\w_i &= \text{weight of the } i\text{-th component}\\\phi_i &= \text{density of the } i\text{-th multivariate Gaussian component}\\\theta_i &= \text{parameters (mean and covariance) of the } i\text{-th Gaussian component}\end{aligned}\end{align} \]
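As a rough illustration of the mixture density above, the sketch below evaluates \(\phi\) directly with SciPy, assuming each component parameter \(\theta_i\) is a mean vector \(\mu_i\) with a covariance matrix \(\Sigma_i\). The helper `gmm_pdf` is hypothetical and is not part of the copulae API.

```python
import numpy as np
from scipy.stats import multivariate_normal


def gmm_pdf(x, weights, means, covs):
    """Evaluate phi(x) = sum_i w_i * phi_i(x; theta_i) for each row of x.

    Hypothetical helper. x: (n, d) points; weights: (M,); means: (M, d); covs: (M, d, d).
    """
    weights = np.asarray(weights, dtype=float)
    # The constraints from the formula: 0 <= w_i <= 1 and sum_i w_i = 1
    if np.any(weights < 0) or np.any(weights > 1) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must lie in [0, 1] and sum to 1")
    x = np.atleast_2d(np.asarray(x, dtype=float))
    density = np.zeros(x.shape[0])
    for w, mu, sigma in zip(weights, means, covs):
        # phi_i is a multivariate normal density with theta_i = (mu_i, Sigma_i) (assumed)
        density += w * multivariate_normal(mean=mu, cov=sigma).pdf(x)
    return density
```

Each point's value is simply the weighted sum of the component densities, so a single component with weight 1 reproduces that component's normal density.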

The density of the Gaussian Mixture Copula is thus given by

\[ \begin{align}\begin{aligned}c(u_1, u_2, \dots, u_d; \Theta) &= \frac{\phi(\Phi_1^{-1}(u_1), \Phi_2^{-1}(u_2), \dots, \Phi_d^{-1}(u_d); \Theta)} {\phi_1(\Phi_1^{-1}(u_1)) \cdot \phi_2(\Phi_2^{-1}(u_2)) \cdots \phi_d(\Phi_d^{-1}(u_d))}\\& \text{where}\\\Phi_j &= \text{marginal CDF of the GMM in dimension } j \text{, with inverse } \Phi_j^{-1}\\\phi_j &= \text{marginal density of the GMM in dimension } j\\\Theta &= (w_i, \theta_i) \quad \forall i \in [1, \dots, M]\end{aligned}\end{align} \]
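The copula density can be sketched in the same way. Each GMM marginal \(\Phi_j\) is a univariate Gaussian mixture with no closed-form inverse, so this example inverts it numerically with `scipy.optimize.brentq`. The helper names (`marginal_cdf`, `marginal_pdf`, `marginal_ppf`, `gmc_pdf`) and the \((\mu_i, \Sigma_i)\) parameterization are assumptions for illustration, not the library's implementation.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import multivariate_normal, norm


def marginal_cdf(x, j, weights, means, covs):
    # Phi_j: the GMM marginal in dimension j is a univariate Gaussian mixture
    sd = np.sqrt(covs[:, j, j])
    return float(np.sum(weights * norm.cdf(x, loc=means[:, j], scale=sd)))


def marginal_pdf(x, j, weights, means, covs):
    # phi_j: density of the same univariate mixture
    sd = np.sqrt(covs[:, j, j])
    return float(np.sum(weights * norm.pdf(x, loc=means[:, j], scale=sd)))


def marginal_ppf(u, j, weights, means, covs):
    # Phi_j^{-1}(u) has no closed form, so solve Phi_j(t) = u with a bracketing root finder.
    # 10 standard deviations beyond every component mean, the CDF is ~0 (left) or ~1 (right),
    # which brackets the root for any u not extremely close to 0 or 1.
    sd = np.sqrt(covs[:, j, j])
    lo = np.min(means[:, j] - 10 * sd)
    hi = np.max(means[:, j] + 10 * sd)
    return brentq(lambda t: marginal_cdf(t, j, weights, means, covs) - u, lo, hi)


def gmc_pdf(u, weights, means, covs):
    """Copula density c(u; Theta) at a single point u in (0, 1)^d (hypothetical helper)."""
    weights = np.asarray(weights, dtype=float)  # shape (M,)
    means = np.asarray(means, dtype=float)      # shape (M, d)
    covs = np.asarray(covs, dtype=float)        # shape (M, d, d)
    d = means.shape[1]
    # Map each uniform coordinate back to the GMM scale: z_j = Phi_j^{-1}(u_j)
    z = np.array([marginal_ppf(u[j], j, weights, means, covs) for j in range(d)])
    # Numerator: joint GMM density phi(z; Theta)
    joint = sum(w * multivariate_normal(mean=m, cov=c).pdf(z)
                for w, m, c in zip(weights, means, covs))
    # Denominator: product of the marginal GMM densities phi_j(z_j)
    margins = np.prod([marginal_pdf(z[j], j, weights, means, covs) for j in range(d)])
    return joint / margins
```

With a single component (\(M = 1\)) the GMM marginals are normal, so the sketch reduces to the ordinary Gaussian copula density; for example, `gmc_pdf([0.5, 0.5], [1.0], [[0, 0]], [[[1, 0.6], [0.6, 1]]])` is approximately \(1/\sqrt{1 - 0.6^2} = 1.25\).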

property bounds: Bounds are not meaningful for GaussianMixtureCopula

cdf(x, log=False)

Returns the cumulative distribution function (CDF) of the copula.

The CDF is the probability that each component of the random vector is less than or equal to the corresponding specified value. It is equivalent to the ‘p’ generic function in R.

Parameters

x (Union[ndarray, Collection[Number], Series, Collection[Collection[Number]], DataFrame]) – Vector or matrix of the observed data. The input must have shape (n x d), where d is the dimension of the copula
log – If True, the log of the probability is returned

Returns

The CDF of the random variates

Return type

np.ndarray or float
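For intuition about what the copula CDF computes, one option (an assumption here, not necessarily how copulae evaluates it) is to map \(u\) back through the GMM marginal inverses and evaluate the GMM CDF at that point: \(C(u) = \sum_i w_i F_i(\Phi_1^{-1}(u_1), \dots, \Phi_d^{-1}(u_d))\), where \(F_i\) is the multivariate normal CDF of component \(i\). The function names are hypothetical, and the sketch handles a single point rather than an \((n \times d)\) matrix.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import multivariate_normal, norm


def gmm_marginal_ppf(u, j, weights, means, covs):
    """Numerically invert the univariate mixture CDF Phi_j of dimension j."""
    sd = np.sqrt(covs[:, j, j])

    def shifted_cdf(t):
        return np.sum(weights * norm.cdf(t, loc=means[:, j], scale=sd)) - u

    # +/- 10 standard deviations around every component mean brackets the root
    return brentq(shifted_cdf, np.min(means[:, j] - 10 * sd), np.max(means[:, j] + 10 * sd))


def gmc_cdf(u, weights, means, covs, log=False):
    """C(u) = sum_i w_i * F_i(Phi_1^{-1}(u_1), ..., Phi_d^{-1}(u_d)) at a single point u."""
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    z = np.array([gmm_marginal_ppf(u[j], j, weights, means, covs)
                  for j in range(means.shape[1])])
    # F_i: multivariate normal CDF of component i, integrated numerically by SciPy
    prob = sum(w * multivariate_normal(mean=m, cov=c).cdf(z)
               for w, m, c in zip(weights, means, covs))
    return np.log(prob) if log else prob
```

For n points, loop over the rows of x and stack the results. Because SciPy's `multivariate_normal.cdf` uses numerical integration, the values are accurate only to a small tolerance.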

property clusters: Number of clusters in the GaussianMixtureCopula

property dim: Number of dimensions of the GaussianMixtureCopula (and of each of its components)

fit(data, x0=None, method='pem', optim_options=None, ties='average', verbose=1, max_iter=3000, criteria='GMCM', eps=0.0001)

Fit the copula to the specified data

Parameters

data (Union[DataFrame, ndarray]) – Array of data used to fit the copula. Usually, the data should not be pseudo-observations, as this will skew the model parameters
x0 (Union[Collection[float], ndarray, GMCParam, None]) – Initial starting point. If None, the best starting point will be estimated
method (Literal[‘pem’, ‘sgd’, ‘kmeans’]) – Method of fitting. Supported methods are ‘pem’ (Expectation Maximization with pseudo log-likelihood), ‘kmeans’ (K-means) and ‘sgd’ (stochastic gradient descent)
optim_options (dict, optional) – Keyword arguments to pass to scipy.optimize.minimize. Only applicable to gradient-descent optimizations
ties ({ 'average', 'min', 'max', 'dense', 'ordinal' }, optional) – Specifies how ranks are computed when there are ties in any of the coordinate samples. This only takes effect if the data has not already been converted to pseudo-observations
verbose – Log level for the estimator. The higher the number, the more verbose it is. 0 prints nothing.
max_iter (int) – Maximum number of iterations
criteria ({ 'GMCM', 'GMM', 'Li' }) – The stopping criterion. Only applicable to Expectation Maximization (EM). ‘GMCM’ uses the absolute difference between the current and previous GMCM log likelihoods, ‘GMM’ uses the absolute difference between the current and previous GMM log likelihoods, and ‘Li’ uses the stopping criterion defined by Li et al. (2011)
eps (float) – Convergence threshold: the model is considered to have converged once the absolute delta of the stopping criterion falls below this value
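The `ties`, `criteria` and `eps` parameters can be illustrated with two small helpers. The first converts raw data to pseudo-observations via column-wise ranks scaled by \(n + 1\) (a common convention, assumed here); `scipy.stats.rankdata` accepts the same tie-handling names listed for `ties`. The second shows an absolute-delta stopping rule of the kind described for the ‘GMCM’ and ‘GMM’ criteria. Both helpers are hypothetical and not part of copulae, and the ‘Li’ criterion is not shown.

```python
import numpy as np
from scipy.stats import rankdata


def pseudo_obs(data, ties="average"):
    """Convert raw data (n x d) to pseudo-observations in (0, 1) (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    n, d = data.shape
    # Rank each column separately; `ties` selects how tied values share ranks
    ranks = np.column_stack([rankdata(data[:, j], method=ties) for j in range(d)])
    # Dividing by n + 1 keeps every value strictly inside (0, 1)
    return ranks / (n + 1)


def has_converged(ll_current, ll_last, eps=1e-4):
    """Absolute-delta stopping rule on successive log-likelihood values (assumed form)."""
    return abs(ll_current - ll_last) < eps
```

With `ties="average"`, two equal values in a column share the mean of the ranks they would occupy; `"min"` gives both the lower rank instead.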

Notes

Maximizing the exact likelihood of GMCM is technically intractable using expectation maximization. The pseudo-likelihood