Gaussian Mixture Copula

class copulae.mixtures.gmc.gmc.GaussianMixtureCopula(n_clusters, ndim, param=None)[source]

The Gaussian Mixture Copula (GMC).

A Gaussian Copula has many normal marginal densities bound together by a single multivariate, unimodal Gaussian density. However, if a dataset has multiple modes (peaks) with different dependence structures, the applicability of the Gaussian Copula is severely limited. A Gaussian Mixture Copula, on the other hand, allows modeling of data with many modes (peaks).

The GMC’s dependence structure is obtained from a Gaussian Mixture Model (GMM). For a GMC with \(M\) components and \(d\) dimensions, the density (PDF) is given by

\[ \begin{align}\begin{aligned}\phi &= \sum_{i=1}^M w_i \phi_i (x_1, x_2, \dots, x_d; \theta_i)\\&\text{s.t.}\\&\sum_{i=1}^M w_i = 1\\&0 \leq w_i \leq 1 \quad \forall i \in [1, \dots, M]\\&\text{where}\\w_i &= \text{weight of the marginal density}\\\phi_i &= \text{marginal density}\\\theta_i &= \text{parameters of the Gaussian marginal}\end{aligned}\end{align} \]
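
As an illustration of the weighted sum above, the following is a minimal sketch for the bivariate case (d = 2) using only the Python standard library. The helper names `bivariate_normal_pdf` and `mixture_pdf`, and the example weights and component parameters, are assumptions made for this sketch and are not part of the copulae API.

```python
import math


def bivariate_normal_pdf(x, y, mean, sd, rho):
    """Density of a bivariate normal given means, standard deviations and correlation."""
    zx = (x - mean[0]) / sd[0]
    zy = (y - mean[1]) / sd[1]
    one_minus_r2 = 1.0 - rho * rho
    quad = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / one_minus_r2
    norm = 2.0 * math.pi * sd[0] * sd[1] * math.sqrt(one_minus_r2)
    return math.exp(-0.5 * quad) / norm


def mixture_pdf(x, y, weights, components):
    """Weighted sum of component densities; weights must lie in [0, 1] and sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9 or any(w < 0.0 or w > 1.0 for w in weights):
        raise ValueError("weights must lie in [0, 1] and sum to 1")
    return sum(w * bivariate_normal_pdf(x, y, *c) for w, c in zip(weights, components))


# Hypothetical two-component mixture: each component is (mean, sd, rho).
weights = [0.3, 0.7]
components = [
    ((-2.0, -2.0), (1.0, 1.0), 0.8),
    ((2.0, 2.0), (1.0, 1.0), -0.5),
]
density = mixture_pdf(0.0, 0.0, weights, components)
```

Each component here carries its own correlation, which is how a mixture can represent a different dependence structure around each mode.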

The Gaussian Mixture Copula is thus given by

\[ \begin{align}\begin{aligned}C(u_1, u_2, \dots, u_d; \Theta) &= \frac{\phi(\Phi_1^{-1}(u_1), \Phi_2^{-1}(u_2), \dots, \Phi_d^{-1}(u_d); \Theta)} {\phi_1(\Phi_1^{-1}(u_1)) \cdot \phi_2(\Phi_2^{-1}(u_2)) \cdots \phi_d(\Phi_d^{-1}(u_d))}\\& \text{where}\\\Phi_j^{-1} &= \text{inverse function of the GMM marginal CDF along dimension } j\\\Theta &= (w_i, \theta_i) \quad \forall i \in [1, \dots, M]\end{aligned}\end{align} \]
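
The ratio above can be sketched numerically for d = 2: invert each marginal CDF of the mixture, evaluate the joint mixture density at those quantiles, and divide by the product of the marginal densities. This is a standard-library sketch under stated assumptions, not the library's implementation. All function names are hypothetical, each component is written as `(mean_x, mean_y, sd_x, sd_y, rho)`, and the marginal CDF is inverted by plain bisection.

```python
import math


def norm_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def norm_cdf(x, mu, sd):
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))


def marginal_pdf(x, weights, mus, sds):
    """Marginal density of the mixture along one dimension."""
    return sum(w * norm_pdf(x, m, s) for w, m, s in zip(weights, mus, sds))


def marginal_cdf(x, weights, mus, sds):
    """Marginal CDF of the mixture along one dimension."""
    return sum(w * norm_cdf(x, m, s) for w, m, s in zip(weights, mus, sds))


def marginal_quantile(u, weights, mus, sds):
    """Invert the marginal CDF by bisection (it is monotone increasing)."""
    lo = min(m - 12.0 * s for m, s in zip(mus, sds))
    hi = max(m + 12.0 * s for m, s in zip(mus, sds))
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if marginal_cdf(mid, weights, mus, sds) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def joint_pdf(x, y, weights, comps):
    """Bivariate mixture density."""
    total = 0.0
    for w, (mx, my, sx, sy, rho) in zip(weights, comps):
        zx, zy = (x - mx) / sx, (y - my) / sy
        r2 = 1.0 - rho * rho
        quad = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / r2
        total += w * math.exp(-0.5 * quad) / (2.0 * math.pi * sx * sy * math.sqrt(r2))
    return total


def gmc_density(u1, u2, weights, comps):
    """Joint density at the marginal quantiles over the product of marginal densities."""
    mus_x, mus_y = [c[0] for c in comps], [c[1] for c in comps]
    sds_x, sds_y = [c[2] for c in comps], [c[3] for c in comps]
    x = marginal_quantile(u1, weights, mus_x, sds_x)
    y = marginal_quantile(u2, weights, mus_y, sds_y)
    denom = marginal_pdf(x, weights, mus_x, sds_x) * marginal_pdf(y, weights, mus_y, sds_y)
    return joint_pdf(x, y, weights, comps) / denom


# Hypothetical two-component example.
weights = [0.4, 0.6]
comps = [(-2.0, -2.0, 1.0, 1.0, 0.7), (2.0, 1.0, 1.0, 1.5, -0.3)]
value = gmc_density(0.25, 0.75, weights, comps)
```

With a single component and zero correlation the ratio reduces to 1 everywhere, which is a convenient sanity check for the sketch.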
property bounds

Bounds are not meaningful for GaussianMixtureCopula

cdf(x, log=False)[source]

Returns the cumulative distribution function (CDF) of the copula.

The CDF is the probability of a random variable being less than or equal to the specified value. Equivalent to the ‘p’ generic function in R.

Parameters
  • x (Union[ndarray, Collection[Number], Series, Collection[Collection[Number]], DataFrame]) – Vector or matrix of the observed data. A matrix must have shape (n x d), where d is the dimension of the copula

  • log – If True, the log of the probability is returned

Returns

The CDF of the random variates

Return type

np.ndarray or float

property clusters

Number of clusters in the GaussianMixtureCopula

property dim

Number of dimensions for each copula in the GaussianMixtureCopula

fit(data, x0=None, method='pem', optim_options=None, ties='average', verbose=1, max_iter=3000, criteria='GMCM', eps=0.0001)[source]

Fit the copula with specified data

Parameters
  • data (Union[DataFrame, ndarray]) – Array of data used to fit copula. Usually, data should not be pseudo observations as this will skew the model parameters

  • x0 (Union[GMCParam, ndarray, Collection[float], None]) – Initial starting point. If value is None, best starting point will be estimated

  • method (Literal[‘pem’, ‘sgd’, ‘kmeans’]) – Method of fitting. Supported methods are: ‘pem’ - Expectation Maximization with pseudo log-likelihood, ‘kmeans’ - K-means, ‘sgd’ - stochastic gradient descent

  • optim_options (dict, optional) – Keyword arguments to pass into scipy.optimize.minimize. Only applicable for gradient-descent optimizations

  • ties ({ 'average', 'min', 'max', 'dense', 'ordinal' }, optional) – Specifies how ranks should be computed if there are ties in any of the coordinate samples. This is effective only if the data has not been converted to its pseudo observations form

  • verbose – Log level for the estimator. The higher the number, the more verbose it is. 0 prints nothing.

  • max_iter (int) – Maximum number of iterations

  • criteria ({ 'GMCM', 'GMM', 'Li' }) – The stopping criterion. Only applicable for Expectation Maximization (EM). ‘GMCM’ uses the absolute difference between the current and previous GMCM log likelihood, ‘GMM’ uses the absolute difference between the current and previous GMM log likelihood, and ‘Li’ uses the stopping criterion defined by Li et al. (2011)

  • eps (float) – The convergence tolerance. The model is considered to have converged once the absolute delta falls below this value
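
The interplay of max_iter, criteria and eps can be sketched as a generic loop that stops once the absolute change in log likelihood between two consecutive iterations drops below eps. This is an illustration only: `run_until_converged` and `step` are hypothetical names, the log-likelihood values are made up, and whether the library compares with a strict or non-strict inequality is an assumption here.

```python
def run_until_converged(step, max_iter=3000, eps=1e-4):
    """Call step() until the absolute change in its return value is below eps.

    step is any callable that performs one update and returns the current
    log likelihood. Returns (iterations_used, last_value, converged).
    """
    previous = None
    for iteration in range(1, max_iter + 1):
        current = step()
        # Assumed rule: converged when |current - previous| < eps.
        if previous is not None and abs(current - previous) < eps:
            return iteration, current, True
        previous = current
    return max_iter, previous, False


# Toy sequence of log-likelihood values standing in for real EM updates.
values = iter([-10.0, -5.0, -4.0, -3.99995, -3.9])
iterations, last_ll, converged = run_until_converged(lambda: next(values))
```

In this toy run the loop stops at the fourth value, because the change from -4.0 to -3.99995 is smaller than the default tolerance of 0.0001.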

Notes

Maximizing the exact likelihood of GMCM is technically intractable using expectation maximization. The pseudo-likelihood

See also

scipy.optimize.minimize

the scipy minimize function used for optimization

log_lik(data, *, to_pobs=True, ties='average')

Returns the log likelihood (LL) of the copula given the data.

The greater the LL (closer to \(\infty\)), the better the fit.

Parameters
  • data (ndarray) – Data set used to calculate the log likelihood

  • to_pobs – If True, converts the data input to pseudo observations.

  • ties (Literal[‘average’, ‘min’, ‘max’, ‘dense’, ‘ordinal’]) – Specifies how ranks should be computed if there are ties in any of the coordinate samples. This is effective only if to_pobs is True.

Returns

Log Likelihood

Return type

float

property params

The parameter set which describes the copula

Returns

The model parameter

Return type

GMCParam

pdf(x, log=False)[source]

Returns the probability density function (PDF) of the copula.

The PDF is the density of the random variable at a given point for the particular distribution. Equivalent to the ‘d’ generic function in R.

Parameters
  • x (Union[ndarray, Collection[Number], Series, Collection[Collection[Number]], DataFrame]) – Vector or matrix of observed data

  • log – If True, the density ‘d’ is given as log(d)

Returns

The density (PDF) of the RV

Return type

np.ndarray or float

static pobs(data, ties='average')

Compute the pseudo-observations for the given data matrix

Parameters
  • data ({ array_like, DataFrame }) – Random variates to be converted to pseudo-observations

  • ties ({ 'average', 'min', 'max', 'dense', 'ordinal' }, optional) – Specifies how ranks should be computed if there are ties in any of the coordinate samples

Returns

matrix or vector of the same dimension as data containing the pseudo observations

Return type

ndarray

See also

pseudo_obs()

The pseudo-observations function
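
To make the idea of pseudo-observations concrete, here is a minimal standard-library sketch for a single column with ties='average'. It assumes the common convention of dividing the ranks by n + 1 so that every value lies strictly inside (0, 1); whether the library uses exactly this scaling is an assumption, and both function names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pseudo_observations(column):
    """Scale the ranks by n + 1 (assumed convention) to land inside (0, 1)."""
    n = len(column)
    return [r / (n + 1) for r in average_ranks(column)]


column = [3.2, 1.5, 3.2, 0.7]
u = pseudo_observations(column)
```

The two tied values of 3.2 occupy ranks 3 and 4, so each receives the average rank 3.5 before scaling. For a data matrix, the same transformation would be applied to each column independently.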

random(n, seed=None)[source]

Generate random observations for the copula

Parameters
  • n (int) – Number of observations to be generated

  • seed (int, optional) – Seed for the random generator

Returns

array of generated observations

Return type

np.ndarray
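
One common way to draw observations from a mixture copula is to sample from the underlying Gaussian mixture and then push each coordinate through the corresponding marginal CDF of the mixture. The sketch below does this for d = 2 with the standard library only. It is an assumption that the library follows this scheme; `sample_gmc` is a hypothetical name and each component is written as `(mean_x, mean_y, sd_x, sd_y, rho)`.

```python
import math
import random


def norm_cdf(x, mu, sd):
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))


def sample_gmc(n, weights, comps, seed=None):
    """Draw n pairs (u1, u2) on the unit square from a bivariate mixture."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # Pick a component with probability equal to its weight.
        idx = rng.choices(range(len(weights)), weights=weights)[0]
        mx, my, sx, sy, rho = comps[idx]
        # Correlated standard normals via the 2x2 Cholesky factor.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        x, y = mx + sx * z1, my + sy * z2
        # Map each coordinate through the marginal CDF of the mixture.
        u1 = sum(w * norm_cdf(x, c[0], c[2]) for w, c in zip(weights, comps))
        u2 = sum(w * norm_cdf(y, c[1], c[3]) for w, c in zip(weights, comps))
        samples.append((u1, u2))
    return samples


# Hypothetical two-component example with a fixed seed for reproducibility.
weights = [0.4, 0.6]
comps = [(-2.0, -2.0, 1.0, 1.0, 0.7), (2.0, 1.0, 1.0, 1.5, -0.3)]
draws = sample_gmc(5, weights, comps, seed=42)
```

Passing the same seed twice yields the same draws, mirroring the role of the seed parameter described above.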

summary()[source]

Constructs the summary information about the copula