copulae.mixtures.gmc.gmc.GaussianMixtureCopula
The Gaussian Mixture Copula (GMC).
A Gaussian Copula binds many normal marginal densities together through a single multivariate, unimodal Gaussian density. If a dataset has multiple modes (peaks) with different dependence structures, the applicability of the Gaussian Copula is severely limited. A Gaussian Mixture Copula, by contrast, can model data with many modes (peaks).
The GMC’s dependence structure is obtained from a Gaussian Mixture Model (GMM). For a GMC with \(M\) components and \(d\) dimensions, the density (PDF) is given by
The Gaussian Mixture Copula is thus given by
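In the standard formulation of a GMM and its copula, the two expressions can be written as follows. The notation is an assumption and may differ from the library's own: \(w_k\), \(\mu_k\) and \(\Sigma_k\) are the weight, mean and covariance of component \(k\), \(\phi\) is the multivariate normal density, and \(\psi_j\), \(\Psi_j\) are the marginal density and marginal CDF of the GMM along dimension \(j\).

\[
\psi(x_1, \dots, x_d; \Theta) = \sum_{k=1}^{M} w_k \, \phi(x_1, \dots, x_d; \mu_k, \Sigma_k),
\qquad w_k > 0, \quad \sum_{k=1}^{M} w_k = 1
\]

\[
c(u_1, \dots, u_d; \Theta) =
\frac{\psi\left(\Psi_1^{-1}(u_1), \dots, \Psi_d^{-1}(u_d); \Theta\right)}
     {\prod_{j=1}^{d} \psi_j\left(\Psi_j^{-1}(u_j)\right)}
\]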
bounds
Bounds are not meaningful for GaussianMixtureCopula
cdf
Returns the cumulative distribution function (CDF) of the copula.
The CDF is the probability of a RV being less than or equal to the value specified. Equivalent to the ‘p’ generic function in R.
x (Union[ndarray, Collection[Number], Series, Collection[Collection[Number]], DataFrame]) – Vector or matrix of the observed data. It must have shape (n x d), where d is the dimension of the copula
log – If True, the log of the probability is returned
The CDF of the random variates
np.ndarray or float
clusters
Number of clusters in the GaussianMixtureCopula
dim
Number of dimensions for each copula in the GaussianMixtureCopula
fit
Fit the copula with specified data
data (Union[DataFrame, ndarray]) – Array of data used to fit copula. Usually, data should not be pseudo observations as this will skew the model parameters
x0 (Union[Collection[float], ndarray, GMCParam, None]) – Initial starting point. If value is None, best starting point will be estimated
method (Literal[‘pem’, ‘sgd’, ‘kmeans’]) – Method of fitting. Supported methods are: ‘pem’ - Expectation Maximization with pseudo log-likelihood, ‘kmeans’ - K-means, ‘sgd’ - stochastic gradient descent
optim_options (dict, optional) – Keyword arguments to pass into scipy.optimize.minimize. Only applicable for gradient-descent optimizations
ties ({ 'average', 'min', 'max', 'dense', 'ordinal' }, optional) – Specifies how ranks should be computed if there are ties in any of the coordinate samples. This is effective only if the data has not been converted to its pseudo observations form
verbose – Log level for the estimator. The higher the number, the more verbose it is. 0 prints nothing.
max_iter (int) – Maximum number of iterations
criteria ({ 'GMCM', 'GMM', 'Li' }) – The stopping criterion. Only applicable for Expectation Maximization (EM). ‘GMCM’ uses the absolute difference between the current and previous GMCM log likelihood, ‘GMM’ uses the absolute difference between the current and previous GMM log likelihood, and ‘Li’ uses the stopping criterion defined by Li et al. (2011)
eps (float) – Convergence tolerance. The model is considered to have converged once the absolute delta falls below this value
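The interplay of max_iter, eps and the log-likelihood based criteria can be sketched as below. This is a minimal illustration, not the library's implementation: `run_until_converged` and `step` are hypothetical names, and `step` stands in for one EM update that returns the current log likelihood.

```python
from typing import Callable, Tuple


def run_until_converged(
    step: Callable[[], float], max_iter: int = 100, eps: float = 1e-4
) -> Tuple[float, int, bool]:
    """Call `step` until two successive log likelihoods differ by less than `eps`.

    `step` is a hypothetical callable: it performs one update and returns the
    log likelihood after that update. Returns (last value, calls made, converged).
    """
    previous = step()
    for n_calls in range(2, max_iter + 1):
        current = step()
        if abs(current - previous) < eps:  # absolute delta below tolerance
            return current, n_calls, True
        previous = current
    return previous, max_iter, False  # iteration budget exhausted


# Toy trace of log likelihoods standing in for real EM updates.
trace = iter([-10.0, -5.0, -4.0, -3.9, -3.89, -3.8899])
final_ll, n_iter, converged = run_until_converged(
    lambda: next(trace), max_iter=10, eps=0.05
)
```

With a tolerance of 0.05 the loop stops at the fifth value, since the step from -3.9 to -3.89 is the first change smaller than eps.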
Notes
Maximizing the exact likelihood of GMCM is technically intractable using expectation maximization. The pseudo-likelihood
See also
scipy.optimize.minimize
the scipy minimize function used for optimization
log_lik
Returns the log likelihood (LL) of the copula given the data.
The greater the LL (closer to \(\infty\)) the better.
data (Union[ndarray, DataFrame]) – Data set used to calculate the log likelihood
to_pobs – If True, converts the data input to pseudo observations.
ties (Literal[‘average’, ‘min’, ‘max’, ‘dense’, ‘ordinal’]) – Specifies how ranks should be computed if there are ties in any of the coordinate samples. This is effective only if to_pobs is True.
Log Likelihood
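As an illustration of what a copula log likelihood measures, the sketch below sums the log density over pseudo-observations. For brevity it uses a single bivariate Gaussian copula with correlation `rho` rather than a mixture, so it is a simplified stand-in and not the GMC computation; the function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the quantile transform


def gaussian_copula_density(u: float, v: float, rho: float) -> float:
    """Density of a bivariate Gaussian copula with correlation `rho` at (u, v)."""
    x = _STD_NORMAL.inv_cdf(u)
    y = _STD_NORMAL.inv_cdf(v)
    r2 = rho * rho
    exponent = -(r2 * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * (1.0 - r2))
    return math.exp(exponent) / math.sqrt(1.0 - r2)


def log_lik(pseudo_obs, rho: float) -> float:
    """Sum of log densities over all (u, v) pseudo-observations."""
    return sum(math.log(gaussian_copula_density(u, v, rho)) for u, v in pseudo_obs)


# Four made-up points in which u and v move together.
data = [(0.1, 0.2), (0.8, 0.9), (0.3, 0.25), (0.7, 0.6)]
ll_independent = log_lik(data, 0.0)  # independence: density is 1 everywhere
ll_dependent = log_lik(data, 0.7)    # positive dependence fits these points better
```

The model with positive correlation attains the higher log likelihood on this toy data, which is the sense in which a greater LL is better.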
params
The parameter set which describes the copula
The model parameter
pdf
Returns the probability density function (PDF) of the copula.
The PDF is the density of the RV at the given point for the particular distribution. Equivalent to the ‘d’ generic function in R.
x (Union[ndarray, Collection[Number], Series, Collection[Collection[Number]], DataFrame]) – Vector or matrix of observed data
log – If True, the density ‘d’ is given as log(d)
The density (PDF) of the RV
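A GMC density is commonly evaluated by mapping each \(u_j\) back through the inverse marginal CDF of the mixture, then dividing the joint mixture density by the product of the marginal mixture densities. The sketch below does this for two dimensions using only the standard library. It is an illustration under stated assumptions, not the library's code: the component layout `(weight, means, standard deviations, correlation)`, the bisection-based inverse CDF and all function names are choices made here.

```python
import math

# Each component: (weight, (mean_x, mean_y), (sd_x, sd_y), correlation)


def norm_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def norm_cdf(x, mu, sd):
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))


def bvn_pdf(x, y, mu, sd, rho):
    """Bivariate normal density."""
    zx = (x - mu[0]) / sd[0]
    zy = (y - mu[1]) / sd[1]
    q = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / (1.0 - rho * rho)
    norm = 2.0 * math.pi * sd[0] * sd[1] * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * q) / norm


def marginal_pdf(x, j, comps):
    return sum(w * norm_pdf(x, mu[j], sd[j]) for w, mu, sd, _ in comps)


def marginal_cdf(x, j, comps):
    return sum(w * norm_cdf(x, mu[j], sd[j]) for w, mu, sd, _ in comps)


def marginal_ppf(u, j, comps):
    """Invert the marginal mixture CDF by bisection (no closed form exists)."""
    lo = min(mu[j] - 10.0 * sd[j] for _, mu, sd, _ in comps)
    hi = max(mu[j] + 10.0 * sd[j] for _, mu, sd, _ in comps)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if marginal_cdf(mid, j, comps) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gmc_pdf(u, v, comps):
    """Copula density: joint mixture density over product of marginal densities."""
    x = marginal_ppf(u, 0, comps)
    y = marginal_ppf(v, 1, comps)
    joint = sum(w * bvn_pdf(x, y, mu, sd, rho) for w, mu, sd, rho in comps)
    return joint / (marginal_pdf(x, 0, comps) * marginal_pdf(y, 1, comps))


two_modes = [
    (0.6, (-2.0, -2.0), (1.0, 1.0), 0.8),
    (0.4, (3.0, 3.0), (1.0, 2.0), -0.5),
]
density = gmc_pdf(0.3, 0.4, two_modes)
```

With a single uncorrelated component the ratio collapses to 1 everywhere, which is the independence copula and a convenient sanity check.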
pobs
Compute the pseudo-observations for the given data matrix
data ({ array_like, DataFrame }) – Random variates to be converted to pseudo-observations
ties ({ 'average', 'min', 'max', 'dense', 'ordinal' }, optional) – Specifies how ranks should be computed if there are ties in any of the coordinate samples
matrix or vector of the same dimension as data containing the pseudo observations
pseudo_obs()
The pseudo-observations function
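A common definition of pseudo-observations is the rank of each value within its column divided by n + 1, which keeps the results strictly inside (0, 1). The sketch below implements that definition for a single column with the 'average' tie rule. Treating this as the library's exact scaling is an assumption, and `pseudo_observations` is a hypothetical name.

```python
def pseudo_observations(column):
    """Return rank / (n + 1) for one column, averaging the ranks of tied values."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])  # indices in ascending order
    ranks = [0.0] * n
    start = 0
    while start < n:
        # Extend `end` over the run of values tied with column[order[start]].
        end = start
        while end + 1 < n and column[order[end + 1]] == column[order[start]]:
            end += 1
        average_rank = (start + end) / 2.0 + 1.0  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return [r / (n + 1) for r in ranks]


u = pseudo_observations([10.0, 20.0, 20.0, 30.0])
```

Here the two tied values share the average of ranks 2 and 3, so both map to 2.5 / 5.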
random
Generate random observations for the copula
n (int) – Number of observations to be generated
seed (int, optional) – Seed for the random generator
array of generated observations
np.ndarray
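One common way to draw from a Gaussian mixture copula is to pick a component according to its weight, draw a point from that component's Gaussian, and push each coordinate through the marginal CDF of the whole mixture. The two-dimensional sketch below follows that recipe with the standard library only. It is not the library's sampler: `sample_gmc` and the component layout `(weight, means, standard deviations, correlation)` are assumptions made for this example.

```python
import math
import random


def norm_cdf(x, mu, sd):
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))


def sample_gmc(n, comps, seed=None):
    """Draw n points (u, v) on the unit square from a 2-D Gaussian mixture copula."""
    rng = random.Random(seed)
    weights = [w for w, _, _, _ in comps]
    out = []
    for _ in range(n):
        # 1. Choose a mixture component with probability equal to its weight.
        _, mu, sd, rho = rng.choices(comps, weights=weights, k=1)[0]
        # 2. Draw a correlated bivariate normal point from that component.
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = mu[0] + sd[0] * z1
        y = mu[1] + sd[1] * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        # 3. Map each coordinate through the marginal CDF of the full mixture.
        u = sum(w * norm_cdf(x, m[0], s[0]) for w, m, s, _ in comps)
        v = sum(w * norm_cdf(y, m[1], s[1]) for w, m, s, _ in comps)
        out.append((u, v))
    return out


two_modes = [
    (0.6, (-2.0, -2.0), (1.0, 1.0), 0.8),
    (0.4, (3.0, 3.0), (1.0, 2.0), -0.5),
]
draws = sample_gmc(5, two_modes, seed=1)
```

Passing the same seed reproduces the same draws, mirroring the role of the seed argument described above.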
summary
Constructs the summary information about the copula