Empirical Copula¶

The empirical distribution function is a natural nonparametric estimator of a distribution function. Given a copula \(C\), a non-parametric estimator of \(C\) is given by

\[\begin{split}C_n(\mathbf{u}) &= \frac{1}{n} \sum^n_{i=1} 1(\mathbf{U}_{i, n} \leq \mathbf{u} \\ &= \frac{1}{n} \sum^n_{i=1} \prod^d_{j=1} 1(U_{ij, n} \leq u_j) \quad \mathbf{u} \in [0, 1]^d\end{split}\]

The estimator above is the default used by the EmpiricalCopula class when smoothing is set to None.

The empirical copula, being a particular multivariate empirical distribution function, often exhibits a large bias when the sample size is small. One way to counteract this is to use the empirical beta copula. The estimator is given by

\[C^\beta_n(\mathbf{u}) = \frac{1}{n} \sum^n_{i=1} \prod^d_{j=1} F_{n, R_{ij}}(u_j) \quad \mathbf{u} \in [0, 1]^d\]

where \(F_{n, r}\) represents a beta distribution function with parameters \(r\) and \(n + 1 - r\) and where \(R_{ij}\) represents the rank of \(X_{ij}\) where \(\mathbf{X}\) is the original data set used to “fit” the empirical copula.

Another smooth version of the empirical copula estimator is the “checkerboard” copula. Its estimator is given by

\[C^\#_n(\mathbf{u}) = \frac{1}{n} \sum^n_{i=1} \prod^d_{j=1} \min\{\max\{nu_j - R_{i,j} + 1, 0\}, 1\} \quad \mathbf{u} \in [0, 1]^d\]

class copulae.empirical.EmpiricalCopula(data, smoothing=None, ties='average', offset=0)[source]¶

Given pseudo-observations from a distribution with continuous margins and copula, the empirical copula is the (default) empirical distribution function of these pseudo-observations. It is thus a natural nonparametric estimator of the copula.

Examples

>>> from copulae import EmpiricalCopula, pseudo_obs
>>> from copulae.datasets import load_marginal_data
>>> df = load_marginal_data()
>>> df.head(3)
    STUDENT      NORM       EXP
0 -0.485878  2.646041  0.393322
1 -1.088878  2.906977  0.253731
2 -0.462133  3.166951  0.480696
>>> u = pseudo_obs(df)  # data must be converted to pseudo observations
>>> emp_cop = EmpiricalCopula(u, smoothing="beta")
>>> data = emp_cop.data  # getting the pseudo-observation data (this is the converted df)
>>> data[:3]
    STUDENT      NORM       EXP
0  0.325225  0.188604  0.557814
1  0.151616  0.399533  0.409530
2  0.336221  0.656115  0.626458
# must feed pseudo-observations into cdf
>>> emp_cop.cdf(data[:2])
array([0.06865595, 0.06320104])
>>> emp_cop.pdf([[0.5, 0.5, 0.5],
...              [0.6, 0.4, 0.6]])
array([0.00926857, 0.00868833])
>>> emp_cop.random(3, seed=10)
    STUDENT      NORM       EXP
0  0.590470  0.984672  0.164945
1  0.319893  0.280906  0.090636
2  0.603799  0.617794  0.542153

property bounds¶

Gets the bounds for the parameters

Returns: Lower and upper bound of the copula’s parameters
Return type: (scalar or array_like, scalar or array_like)

cdf(u, log=False)[source]¶

Returns the cumulative distribution function (CDF) of the copulae.

The CDF is also the probability of a RV being less or equal to the value specified. Equivalent to the ‘p’ generic function in R.

Parameters

u (ndarray) – Vector or matrix of the pseudo-observations of the observed data. This vector must be (n x d) where d is the dimension of the copula and must have values between 0 and 1. The caller must have specified the density.
log (bool) – If True, the log of the probability is returned

Returns

The CDF of the random variates

Return type

ndarray or float

property dim¶: Number of dimensions in copula

fit(data, x0=None, method='ml', optim_options=None, ties='average', verbose=1, to_pobs=True, scale=1.0)[source]¶

Fit the copula with specified data

Parameters

data (ndarray) – Array of data used to fit copula. Usually, data should be the pseudo observations
x0 (ndarray) – Initial starting point. If value is None, best starting point will be estimated
method ({ 'ml', 'irho', 'itau' }, optional) – Method of fitting. Supported methods are: ‘ml’ - Maximum Likelihood, ‘irho’ - Inverse Spearman Rho, ‘itau’ - Inverse Kendall Tau
optim_options (dict, optional) – Keyword arguments to pass into scipy.optimize.minimize()
ties ({ 'average', 'min', 'max', 'dense', 'ordinal' }, optional) – Specifies how ranks should be computed if there are ties in any of the coordinate samples. This is effective only if the data has not been converted to its pseudo observations form
verbose (int, optional) – Log level for the estimator. The higher the number, the more verbose it is. 0 prints nothing.
to_pobs (bool) – If True, casts the input data along the column axis to uniform marginals (i.e. convert variables to values between [0, 1]). Set this to False if the input data are already uniform marginals.
scale (float) – Amount to scale the objective function value of the numerical optimizer. This is helpful in achieving higher accuracy as it increases the sensitivity of the optimizer. The downside is that the optimizer could likely run longer as a result. Defaults to 1.

See also

pseudo_obs(): The pseudo-observations function

property smoothing¶

The smoothing parameter. “none” provides no smoothing. “beta” and “checkerboard” provide a smoothed version of the empirical copula. See equations (2.1) - (4.1) in Segers, Sibuya and Tsukahara

References

The Empirical Beta Copula <https://arxiv.org/pdf/1607.04430.pdf>

property ties¶

The method used to assign ranks to tied elements. The options are ‘average’, ‘min’, ‘max’, ‘dense’ and ‘ordinal’.

‘average’:: The average of the ranks that would have been assigned to all the tied values is assigned to each value.
‘min’:: The minimum of the ranks that would have been assigned to all the tied values is assigned to each value. (This is also referred to as “competition” ranking.)
‘max’:: The maximum of the ranks that would have been assigned to all the tied values is assigned to each value.
‘dense’:: Like ‘min’, but the rank of the next highest element is assigned the rank immediately after those assigned to the tied elements. ‘ordinal’: All values are given a distinct rank, corresponding to the order that the values occur in a.

classmethod to_marginals(u, original)[source]¶

Transforms a sample marginal data (pseudo-observations) to empirical margins based on the input dataset

Parameters

u (Union[ndarray, Collection[Collection[Number]], DataFrame]) – Sample marginals (pseudo observations). Values must be between [0, 1]
original (Union[ndarray, Collection[Collection[Number]], DataFrame]) – The original dataset (do not cast these into pseudo-observations)

Returns

Transformed marginals

Return type

np.ndarray or pd.DataFrame

Examples

>>> from copulae import EmpiricalCopula, pseudo_obs
>>> from copulae.datasets import load_marginal_data
>>> src = load_marginal_data()
>>> src.head(3)
    STUDENT      NORM       EXP
0 -0.485878  2.646041  0.393322
1 -1.088878  2.906977  0.253731
2 -0.462133  3.166951  0.480696
# Pretend to get some psuedo marginals (~new data)
>>> sample_marginals = pseudo_obs(src)[:3] + 0.01
>>> sample_marginals
    STUDENT      NORM       EXP
0  0.335225  0.198604  0.567814
1  0.161616  0.409533  0.419530
2  0.346221  0.666115  0.636458
# Get the empirical marginals
>>> EmpiricalCopula.to_marginals(sample_marginals, src)
    STUDENT      NORM       EXP
0 -0.465186  2.663224  0.408747
1 -1.054686  2.917943  0.261550
2 -0.428498  3.176723  0.492262

Student Independent Copula