Empirical Copula

The empirical distribution function is a natural nonparametric estimator of a distribution function. Given a copula \(C\), a non-parametric estimator of \(C\) is given by

\[\begin{split}C_n(\mathbf{u}) &= \frac{1}{n} \sum^n_{i=1} 1(\mathbf{U}_{i, n} \leq \mathbf{u} \\ &= \frac{1}{n} \sum^n_{i=1} \prod^d_{j=1} 1(U_{ij, n} \leq u_j) \quad \mathbf{u} \in [0, 1]^d\end{split}\]

The estimator above is the default used by the EmpiricalCopula class when smoothing is set to None.

The empirical copula, being a particular multivariate empirical distribution function, often exhibits a large bias when the sample size is small. One way to counteract this is to use the empirical beta copula. The estimator is given by

\[C^\beta_n(\mathbf{u}) = \frac{1}{n} \sum^n_{i=1} \prod^d_{j=1} F_{n, R_{ij}}(u_j) \quad \mathbf{u} \in [0, 1]^d\]

where \(F_{n, r}\) represents a beta distribution function with parameters \(r\) and \(n + 1 - r\) and where \(R_{ij}\) represents the rank of \(X_{ij}\) where \(\mathbf{X}\) is the original data set used to “fit” the empirical copula.

Another smooth version of the empirical copula estimator is the “checkerboard” copula. Its estimator is given by

\[C^\#_n(\mathbf{u}) = \frac{1}{n} \sum^n_{i=1} \prod^d_{j=1} \min\{\max\{nu_j - R_{i,j} + 1, 0\}, 1\} \quad \mathbf{u} \in [0, 1]^d\]
class copulae.empirical.EmpiricalCopula(data, smoothing=None, ties='average', offset=0)[source]

Given pseudo-observations from a distribution with continuous margins and copula, the empirical copula is the (default) empirical distribution function of these pseudo-observations. It is thus a natural nonparametric estimator of the copula.

Examples

>>> from copulae import EmpiricalCopula
>>> from copulae.datasets import load_marginal_data
>>> df = load_marginal_data()
>>> df.head(3)
    STUDENT      NORM       EXP
0 -0.485878  2.646041  0.393322
1 -1.088878  2.906977  0.253731
2 -0.462133  3.166951  0.480696
>>> emp_cop = EmpiricalCopula(3, df, smoothing="beta")
>>> data = emp_cop.data  # getting the pseudo-observation data (this is the converted df)
>>> data[:3]
array([[0.32522493, 0.1886038 , 0.55781406],
       [0.15161613, 0.39953349, 0.40953016],
       [0.33622126, 0.65611463, 0.62645785]])
# must feed pseudo-observations into cdf
>>> emp_cop.cdf(data[:2])
array([0.06865595, 0.06320104])
>>> emp_cop.pdf([[0.5, 0.5, 0.5]])
0.009268568506099015
>>> emp_cop.random(3, seed=10)
array([[0.59046984, 0.98467178, 0.16494502],
       [0.31989337, 0.28090636, 0.09063645],
       [0.60379873, 0.61779407, 0.54215262]])
property bounds

Gets the bounds for the parameters

Returns

Lower and upper bound of the copula’s parameters

Return type

(scalar or array_like, scalar or array_like)

cdf(u, log=False)[source]

Returns the cumulative distribution function (CDF) of the copulae.

The CDF is also the probability of a RV being less or equal to the value specified. Equivalent to the ‘p’ generic function in R.

Parameters
  • u (ndarray) – Vector or matrix of the pseudo-observations of the observed data. This vector must be (n x d) where d is the dimension of the copula and must have values between 0 and 1. The caller must have specified the density.

  • log (bool) – If True, the log of the probability is returned

Returns

The CDF of the random variates

Return type

ndarray or float

property dim

Number of dimensions in copula

fit(data, x0=None, method='ml', optim_options=None, ties='average', verbose=1)[source]

Fit the copula with specified data

Parameters
  • data (ndarray) – Array of data used to fit copula. Usually, data should be the pseudo observations

  • x0 (ndarray) – Initial starting point. If value is None, best starting point will be estimated

  • method ({ 'ml', 'irho', 'itau' }, optional) – Method of fitting. Supported methods are: ‘ml’ - Maximum Likelihood, ‘irho’ - Inverse Spearman Rho, ‘itau’ - Inverse Kendall Tau

  • optim_options (dict, optional) – Keyword arguments to pass into scipy.optimize.minimize()

  • ties ({ 'average', 'min', 'max', 'dense', 'ordinal' }, optional) – Specifies how ranks should be computed if there are ties in any of the coordinate samples. This is effective only if the data has not been converted to its pseudo observations form

  • verbose (int, optional) – Log level for the estimator. The higher the number, the more verbose it is. 0 prints nothing.

  • kwargs – Other keyword arguments. See Notes for more details

Notes

Other valid keyword arguments and their purpose

scale

Amount to scale the objective function value of the numerical optimizer. This is helpful in achieving higher accuracy as it increases the sensitivity of the optimizer. The downside is that the optimizer could likely run longer as a result. Defaults to 1.

See also

scipy.optimize.minimize

the scipy minimize function use for optimization

log_lik(data, *, to_pobs=True, ties='average')

Returns the log likelihood (LL) of the copula given the data.

The greater the LL (closer to \(\infty\)) the better.

Parameters
  • data (ndarray) – Data set used to calculate the log likelihood

  • to_pobs – If True, converts the data input to pseudo observations.

  • ties (Literal[‘average’, ‘min’, ‘max’, ‘dense’, ‘ordinal’]) – Specifies how ranks should be computed if there are ties in any of the coordinate samples. This is effective only if to_pobs is True.

Returns

Log Likelihood

Return type

float

property params

By default, the Empirical copula has no “parameters” as everything is defined by the input data

static pobs(data, ties='average')

Compute the pseudo-observations for the given data matrix

Parameters
  • data ({ array_like, DataFrame }) – Random variates to be converted to pseudo-observations

  • ties ({ 'average', 'min', 'max', 'dense', 'ordinal' }, optional) – Specifies how ranks should be computed if there are ties in any of the coordinate samples

Returns

matrix or vector of the same dimension as data containing the pseudo observations

Return type

ndarray

See also

pseudo_obs()

The pseudo-observations function

property smoothing

The smoothing parameter. “none” provides no smoothing. “beta” and “checkerboard” provide a smoothed version of the empirical copula. See equations (2.1) - (4.1) in Segers, Sibuya and Tsukahara

References

The Empirical Beta Copula <https://arxiv.org/pdf/1607.04430.pdf>

summary()[source]

Constructs the summary information about the copula

property ties

The method used to assign ranks to tied elements. The options are ‘average’, ‘min’, ‘max’, ‘dense’ and ‘ordinal’.

‘average’:

The average of the ranks that would have been assigned to all the tied values is assigned to each value.

‘min’:

The minimum of the ranks that would have been assigned to all the tied values is assigned to each value. (This is also referred to as “competition” ranking.)

‘max’:

The maximum of the ranks that would have been assigned to all the tied values is assigned to each value.

‘dense’:

Like ‘min’, but the rank of the next highest element is assigned the rank immediately after those assigned to the tied elements. ‘ordinal’: All values are given a distinct rank, corresponding to the order that the values occur in a.

to_marginals(u)[source]

Transforms a sample marginal data (pseudo-observations) to empirical margins based on the input dataset

Parameters

u (Union[ndarray, Collection[Collection[Number]], DataFrame]) – Sample marginals (pseudo observations). Values must be between [0, 1]

Returns

Transformed marginals

Return type

np.ndarray or pd.DataFrame