Empirical Copula

The empirical distribution function is a natural nonparametric estimator of a distribution function. Given a copula \(C\), a non-parametric estimator of \(C\) is given by

\[\begin{split}C_n(\mathbf{u}) &= \frac{1}{n} \sum^n_{i=1} 1(\mathbf{U}_{i, n} \leq \mathbf{u} \\ &= \frac{1}{n} \sum^n_{i=1} \prod^d_{j=1} 1(U_{ij, n} \leq u_j) \quad \mathbf{u} \in [0, 1]^d\end{split}\]

The estimator above is the default used by the EmpiricalCopula class when smoothing is set to None.

The empirical copula, being a particular multivariate empirical distribution function, often exhibits a large bias when the sample size is small. One way to counteract this is to use the empirical beta copula. The estimator is given by

\[C^\beta_n(\mathbf{u}) = \frac{1}{n} \sum^n_{i=1} \prod^d_{j=1} F_{n, R_{ij}}(u_j) \quad \mathbf{u} \in [0, 1]^d\]

where \(F_{n, r}\) represents a beta distribution function with parameters \(r\) and \(n + 1 - r\) and where \(R_{ij}\) represents the rank of \(X_{ij}\) where \(\mathbf{X}\) is the original data set used to “fit” the empirical copula.

Another smooth version of the empirical copula estimator is the “checkerboard” copula. Its estimator is given by

\[C^\#_n(\mathbf{u}) = \frac{1}{n} \sum^n_{i=1} \prod^d_{j=1} \min\{\max\{nu_j - R_{i,j} + 1, 0\}, 1\} \quad \mathbf{u} \in [0, 1]^d\]
class copulae.empirical.EmpiricalCopula(data, smoothing=None, ties='average', offset=0)[source]

Given pseudo-observations from a distribution with continuous margins and copula, the empirical copula is the (default) empirical distribution function of these pseudo-observations. It is thus a natural nonparametric estimator of the copula.

Examples

>>> from copulae import EmpiricalCopula, pseudo_obs
>>> from copulae.datasets import load_marginal_data
>>> df = load_marginal_data()
>>> df.head(3)
    STUDENT      NORM       EXP
0 -0.485878  2.646041  0.393322
1 -1.088878  2.906977  0.253731
2 -0.462133  3.166951  0.480696
>>> u = pseudo_obs(df)  # data must be converted to pseudo observations
>>> emp_cop = EmpiricalCopula(u, smoothing="beta")
>>> data = emp_cop.data  # getting the pseudo-observation data (this is the converted df)
>>> data[:3]
    STUDENT      NORM       EXP
0  0.325225  0.188604  0.557814
1  0.151616  0.399533  0.409530
2  0.336221  0.656115  0.626458
# must feed pseudo-observations into cdf
>>> emp_cop.cdf(data[:2])
array([0.06865595, 0.06320104])
>>> emp_cop.pdf([[0.5, 0.5, 0.5],
...              [0.6, 0.4, 0.6]])
array([0.00926857, 0.00868833])
>>> emp_cop.random(3, seed=10)
    STUDENT      NORM       EXP
0  0.590470  0.984672  0.164945
1  0.319893  0.280906  0.090636
2  0.603799  0.617794  0.542153
property bounds

Gets the bounds for the parameters

Returns

Lower and upper bound of the copula’s parameters

Return type

(scalar or array_like, scalar or array_like)

cdf(u, log=False)[source]

Returns the cumulative distribution function (CDF) of the copulae.

The CDF is also the probability of a RV being less or equal to the value specified. Equivalent to the ‘p’ generic function in R.

Parameters
  • u (ndarray) – Vector or matrix of the pseudo-observations of the observed data. This vector must be (n x d) where d is the dimension of the copula and must have values between 0 and 1. The caller must have specified the density.

  • log (bool) – If True, the log of the probability is returned

Returns

The CDF of the random variates

Return type

ndarray or float

property dim

Number of dimensions in copula

fit(data, x0=None, method='ml', optim_options=None, ties='average', verbose=1, to_pobs=True, scale=1.0)[source]

Fit the copula with specified data

Parameters
  • data (ndarray) – Array of data used to fit copula. Usually, data should be the pseudo observations

  • x0 (ndarray) – Initial starting point. If value is None, best starting point will be estimated

  • method ({ 'ml', 'irho', 'itau' }, optional) – Method of fitting. Supported methods are: ‘ml’ - Maximum Likelihood, ‘irho’ - Inverse Spearman Rho, ‘itau’ - Inverse Kendall Tau

  • optim_options (dict, optional) – Keyword arguments to pass into scipy.optimize.minimize()

  • ties ({ 'average', 'min', 'max', 'dense', 'ordinal' }, optional) – Specifies how ranks should be computed if there are ties in any of the coordinate samples. This is effective only if the data has not been converted to its pseudo observations form

  • verbose (int, optional) – Log level for the estimator. The higher the number, the more verbose it is. 0 prints nothing.

  • to_pobs (bool) – If True, casts the input data along the column axis to uniform marginals (i.e. convert variables to values between [0, 1]). Set this to False if the input data are already uniform marginals.

  • scale (float) – Amount to scale the objective function value of the numerical optimizer. This is helpful in achieving higher accuracy as it increases the sensitivity of the optimizer. The downside is that the optimizer could likely run longer as a result. Defaults to 1.

See also

pobs()

The psuedo-observation method converts non-uniform data into uniform marginals.

scipy.optimize.minimize

the scipy minimize function use for optimization

log_lik(data, *, to_pobs=True, ties='average')

Returns the log likelihood (LL) of the copula given the data.

The greater the LL (closer to \(\infty\)) the better.

Parameters
  • data (Union[ndarray, DataFrame]) – Data set used to calculate the log likelihood

  • to_pobs – If True, converts the data input to pseudo observations.

  • ties (Literal[‘average’, ‘min’, ‘max’, ‘dense’, ‘ordinal’]) – Specifies how ranks should be computed if there are ties in any of the coordinate samples. This is effective only if to_pobs is True.

Returns

Log Likelihood

Return type

float

property params

By default, the Empirical copula has no “parameters” as everything is defined by the input data

static pobs(data, ties='average')

Compute the pseudo-observations for the given data matrix

Parameters
  • data ({ array_like, DataFrame }) – Random variates to be converted to pseudo-observations

  • ties ({ 'average', 'min', 'max', 'dense', 'ordinal' }, optional) – Specifies how ranks should be computed if there are ties in any of the coordinate samples

Returns

matrix or vector of the same dimension as data containing the pseudo observations

Return type

ndarray

See also

pseudo_obs()

The pseudo-observations function

property smoothing

The smoothing parameter. “none” provides no smoothing. “beta” and “checkerboard” provide a smoothed version of the empirical copula. See equations (2.1) - (4.1) in Segers, Sibuya and Tsukahara

References

The Empirical Beta Copula <https://arxiv.org/pdf/1607.04430.pdf>

property ties

The method used to assign ranks to tied elements. The options are ‘average’, ‘min’, ‘max’, ‘dense’ and ‘ordinal’.

‘average’:

The average of the ranks that would have been assigned to all the tied values is assigned to each value.

‘min’:

The minimum of the ranks that would have been assigned to all the tied values is assigned to each value. (This is also referred to as “competition” ranking.)

‘max’:

The maximum of the ranks that would have been assigned to all the tied values is assigned to each value.

‘dense’:

Like ‘min’, but the rank of the next highest element is assigned the rank immediately after those assigned to the tied elements. ‘ordinal’: All values are given a distinct rank, corresponding to the order that the values occur in a.

classmethod to_marginals(u, original)[source]

Transforms a sample marginal data (pseudo-observations) to empirical margins based on the input dataset

Parameters
  • u (Union[ndarray, Collection[Collection[Number]], DataFrame]) – Sample marginals (pseudo observations). Values must be between [0, 1]

  • original (Union[ndarray, Collection[Collection[Number]], DataFrame]) – The original dataset (do not cast these into pseudo-observations)

Returns

Transformed marginals

Return type

np.ndarray or pd.DataFrame

Examples

>>> from copulae import EmpiricalCopula, pseudo_obs
>>> from copulae.datasets import load_marginal_data
>>> src = load_marginal_data()
>>> src.head(3)
    STUDENT      NORM       EXP
0 -0.485878  2.646041  0.393322
1 -1.088878  2.906977  0.253731
2 -0.462133  3.166951  0.480696
# Pretend to get some psuedo marginals (~new data)
>>> sample_marginals = pseudo_obs(src)[:3] + 0.01
>>> sample_marginals
    STUDENT      NORM       EXP
0  0.335225  0.198604  0.567814
1  0.161616  0.409533  0.419530
2  0.346221  0.666115  0.636458
# Get the empirical marginals
>>> EmpiricalCopula.to_marginals(sample_marginals, src)
    STUDENT      NORM       EXP
0 -0.465186  2.663224  0.408747
1 -1.054686  2.917943  0.261550
2 -0.428498  3.176723  0.492262