Empirical Copula
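The empirical distribution function is a natural nonparametric estimator of a distribution function. Given a copula \(C\), a nonparametric estimator of \(C\) is given by
\[C_n(\mathbf{u}) = \frac{1}{n} \sum_{i=1}^{n} \prod_{j=1}^{d} \mathbf{1}\{U_{ij} \leq u_j\},\]
where \(U_{ij}\) is the \(j\)-th coordinate of the \(i\)-th pseudo-observation, \(n\) is the sample size and \(d\) is the dimension of the copula.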
The estimator above is the default used by the EmpiricalCopula class when smoothing is set to None.
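As a rough illustration of the unsmoothed estimator, the sketch below counts the fraction of pseudo-observations that are componentwise less than or equal to a point \(\mathbf{u}\). The function name `empirical_copula_cdf` and the sample values are hypothetical and chosen for illustration only; this is not the library's implementation.

```python
from typing import Sequence


def empirical_copula_cdf(pobs: Sequence[Sequence[float]], u: Sequence[float]) -> float:
    """Fraction of pseudo-observations that are componentwise <= u."""
    n = len(pobs)
    hits = sum(all(p_j <= u_j for p_j, u_j in zip(row, u)) for row in pobs)
    return hits / n


# Hypothetical pseudo-observations for n = 4 points in d = 2 dimensions.
pobs = [[0.2, 0.4], [0.4, 0.2], [0.6, 0.8], [0.8, 0.6]]
value = empirical_copula_cdf(pobs, [0.5, 0.5])  # two of the four points qualify
```

The result is a step function in \(\mathbf{u}\), which is one reason smoothed variants are attractive for small samples.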
The empirical copula, being a particular multivariate empirical distribution function, often exhibits a large bias when the sample size is small. One way to counteract this is to use the empirical beta copula. The estimator is given by
\[C_n^{\beta}(\mathbf{u}) = \frac{1}{n} \sum_{i=1}^{n} \prod_{j=1}^{d} F_{n, R_{ij}}(u_j),\]
where \(F_{n, r}\) represents a beta distribution function with parameters \(r\) and \(n + 1 - r\), and \(R_{ij}\) represents the rank of \(X_{ij}\) within the \(j\)-th column of \(\mathbf{X}\), the original data set used to “fit” the empirical copula.
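The empirical beta copula can be sketched in a few lines, assuming integer ranks without ties. For integer \(r\), the beta distribution function with parameters \(r\) and \(n + 1 - r\) equals the probability that a Binomial(\(n\), \(u\)) variable is at least \(r\), which avoids any special-function dependency. The names `beta_cdf_int` and `empirical_beta_copula_cdf` are hypothetical helpers, not part of the library.

```python
from math import comb
from typing import Sequence


def beta_cdf_int(u: float, r: int, n: int) -> float:
    """CDF of Beta(r, n + 1 - r) at u for integer 1 <= r <= n.

    Computed as the binomial tail P(Binomial(n, u) >= r).
    """
    return sum(comb(n, s) * u**s * (1 - u) ** (n - s) for s in range(r, n + 1))


def empirical_beta_copula_cdf(ranks: Sequence[Sequence[int]], u: Sequence[float]) -> float:
    """Average over observations of the product of beta CDFs, one per margin."""
    n = len(ranks)
    total = 0.0
    for row in ranks:
        prod = 1.0
        for r_ij, u_j in zip(row, u):
            prod *= beta_cdf_int(u_j, r_ij, n)
        total += prod
    return total / n


# Hypothetical ranks for n = 4 observations in d = 2 dimensions (no ties).
ranks = [[1, 2], [2, 1], [3, 4], [4, 3]]
margin_value = empirical_beta_copula_cdf(ranks, [0.3, 1.0])
```

Setting every coordinate but one to 1 recovers that coordinate (here approximately 0.3), reflecting the uniform margins of the estimator.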
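Another smooth version of the empirical copula estimator is the “checkerboard” copula. Its estimator is given by
\[C_n^{\#}(\mathbf{u}) = \frac{1}{n} \sum_{i=1}^{n} \prod_{j=1}^{d} \min(\max(n u_j - R_{ij} + 1, 0), 1),\]
where \(R_{ij}\) is the rank defined above and \(n\) is the sample size.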
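A minimal sketch of the checkerboard estimator follows, again assuming integer ranks without ties. Each observation contributes a product of clipped linear ramps, one per margin. The function name `checkerboard_copula_cdf` and the sample ranks are hypothetical.

```python
from typing import Sequence


def checkerboard_copula_cdf(ranks: Sequence[Sequence[int]], u: Sequence[float]) -> float:
    """Average of products of the ramps min(max(n * u_j - R_ij + 1, 0), 1)."""
    n = len(ranks)
    total = 0.0
    for row in ranks:
        prod = 1.0
        for r_ij, u_j in zip(row, u):
            prod *= min(max(n * u_j - r_ij + 1, 0.0), 1.0)
        total += prod
    return total / n


# Hypothetical ranks for n = 4 observations in d = 2 dimensions (no ties).
ranks = [[1, 2], [2, 1], [3, 4], [4, 3]]
center_value = checkerboard_copula_cdf(ranks, [0.5, 0.5])
```

Unlike the step-function estimator, this version is piecewise multilinear in \(\mathbf{u}\), so it is continuous while still being cheap to evaluate.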

class copulae.empirical.EmpiricalCopula(data, smoothing=None, ties='average', offset=0)
Given pseudo-observations from a distribution with continuous margins and copula, the empirical copula is the (default) empirical distribution function of these pseudo-observations. It is thus a natural nonparametric estimator of the copula.
Examples
>>> from copulae import EmpiricalCopula
>>> from copulae.datasets import load_marginal_data
>>> df = load_marginal_data()
>>> df.head(3)
    STUDENT      NORM       EXP
0 -0.485878  2.646041  0.393322
1 -1.088878  2.906977  0.253731
2 -0.462133  3.166951  0.480696
>>> emp_cop = EmpiricalCopula(df, smoothing="beta")
>>> data = emp_cop.data  # getting the pseudo-observation data (this is the converted df)
>>> data[:3]
array([[0.32522493, 0.1886038 , 0.55781406],
       [0.15161613, 0.39953349, 0.40953016],
       [0.33622126, 0.65611463, 0.62645785]])
>>> # must feed pseudo-observations into cdf
>>> emp_cop.cdf(data[:2])
array([0.06865595, 0.06320104])
>>> emp_cop.pdf([[0.5, 0.5, 0.5]])
0.009268568506099015
>>> emp_cop.random(3, seed=10)
array([[0.59046984, 0.98467178, 0.16494502],
       [0.31989337, 0.28090636, 0.09063645],
       [0.60379873, 0.61779407, 0.54215262]])

property bounds
Gets the bounds for the parameters
 Returns
Lower and upper bound of the copula’s parameters
 Return type
(scalar or array_like, scalar or array_like)

cdf(u, log=False)
Returns the cumulative distribution function (CDF) of the copula.
The CDF is also the probability of a random variable being less than or equal to the value specified. Equivalent to the ‘p’ generic function in R.
 Parameters
u (ndarray) – Vector or matrix of the pseudo-observations of the observed data. It must have shape (n x d), where d is the dimension of the copula, and its values must lie between 0 and 1. The caller must have specified the density.
log (bool) – If True, the log of the probability is returned
 Returns
The CDF of the random variates
 Return type
ndarray or float

property dim
Number of dimensions in copula

fit(data, x0=None, method='ml', optim_options=None, ties='average', verbose=1)
Fit the copula with specified data
 Parameters
data (ndarray) – Array of data used to fit copula. Usually, data should be the pseudo observations
x0 (ndarray) – Initial starting point. If value is None, best starting point will be estimated
method ({ 'ml', 'irho', 'itau' }, optional) – Method of fitting. Supported methods are: ‘ml’ - Maximum Likelihood, ‘irho’ - Inverse Spearman Rho, ‘itau’ - Inverse Kendall Tau
optim_options (dict, optional) – Keyword arguments to pass into scipy.optimize.minimize()
ties ({ 'average', 'min', 'max', 'dense', 'ordinal' }, optional) – Specifies how ranks should be computed if there are ties in any of the coordinate samples. This is effective only if the data has not been converted to its pseudo observations form
verbose (int, optional) – Log level for the estimator. The higher the number, the more verbose it is. 0 prints nothing.
kwargs – Other keyword arguments. See Notes for more details
Notes
Other valid keyword arguments and their purpose
scale
Amount to scale the objective function value of the numerical optimizer. This is helpful in achieving higher accuracy as it increases the sensitivity of the optimizer. The downside is that the optimizer could likely run longer as a result. Defaults to 1.
See also
scipy.optimize.minimize
the scipy minimize function used for optimization

log_lik(data, *, to_pobs=True, ties='average')
Returns the log likelihood (LL) of the copula given the data.
The greater the LL (closer to \(\infty\)) the better.
 Parameters
data (ndarray) – Data set used to calculate the log likelihood
to_pobs – If True, converts the data input to pseudo observations.
ties (Literal[‘average’, ‘min’, ‘max’, ‘dense’, ‘ordinal’]) – Specifies how ranks should be computed if there are ties in any of the coordinate samples. This is effective only if to_pobs is True.
 Returns
Log Likelihood
 Return type
float

property params
By default, the Empirical copula has no “parameters” as everything is defined by the input data

static pobs(data, ties='average')
Compute the pseudo-observations for the given data matrix
 Parameters
data ({ array_like, DataFrame }) – Random variates to be converted to pseudoobservations
ties ({ 'average', 'min', 'max', 'dense', 'ordinal' }, optional) – Specifies how ranks should be computed if there are ties in any of the coordinate samples
 Returns
matrix or vector of the same dimension as data containing the pseudo observations
 Return type
ndarray
See also
pseudo_obs()
The pseudoobservations function
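To make the idea of pseudo-observations concrete, the sketch below ranks a single column and scales the ranks into the open unit interval. It assumes the common convention of dividing ranks by \(n + 1\) and assumes there are no ties; whether the library uses exactly this scaling is not stated here. The name `pseudo_obs_sketch` is hypothetical.

```python
from typing import List, Sequence


def pseudo_obs_sketch(column: Sequence[float]) -> List[float]:
    """Ranks divided by (n + 1); assumes the column has no tied values."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank  # rank 1 goes to the smallest value
    return [r / (n + 1) for r in ranks]


# Hypothetical sample column.
sample = [3.1, -0.4, 1.7]
result = pseudo_obs_sketch(sample)
```

Dividing by \(n + 1\) rather than \(n\) keeps every value strictly below 1, which matters when the result is fed into densities that are unbounded at the boundary.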

property smoothing
The smoothing parameter. “none” provides no smoothing. “beta” and “checkerboard” provide a smoothed version of the empirical copula. See equations (2.1) to (4.1) in Segers, Sibuya and Tsukahara.
References
The Empirical Beta Copula <https://arxiv.org/pdf/1607.04430.pdf>

property ties
The method used to assign ranks to tied elements. The options are ‘average’, ‘min’, ‘max’, ‘dense’ and ‘ordinal’.
 ‘average’:
The average of the ranks that would have been assigned to all the tied values is assigned to each value.
 ‘min’:
The minimum of the ranks that would have been assigned to all the tied values is assigned to each value. (This is also referred to as “competition” ranking.)
 ‘max’:
The maximum of the ranks that would have been assigned to all the tied values is assigned to each value.
 ‘dense’:
Like ‘min’, but the rank of the next highest element is assigned the rank immediately after those assigned to the tied elements.
 ‘ordinal’:
All values are given a distinct rank, corresponding to the order in which the values occur in the data.
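The five tie-handling rules above can be compared side by side with a small pure-Python sketch. The helper `rank_with_ties` is hypothetical and written only to illustrate the rules; `scipy.stats.rankdata` accepts the same method names.

```python
from typing import List, Sequence


def rank_with_ties(values: Sequence[float], method: str = "average") -> List[float]:
    """Rank values (1-based) using one of the five tie-handling rules."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # stable sort keeps input order
    ranks = [0.0] * n
    dense_rank = 0
    start = 0
    while start < n:
        # order[start:end] indexes one block of tied values.
        end = start
        while end < n and values[order[end]] == values[order[start]]:
            end += 1
        dense_rank += 1
        for pos in range(start, end):
            if method == "average":
                rank = (start + 1 + end) / 2
            elif method == "min":
                rank = start + 1
            elif method == "max":
                rank = end
            elif method == "dense":
                rank = dense_rank
            elif method == "ordinal":
                rank = pos + 1
            else:
                raise ValueError(f"unknown method: {method}")
            ranks[order[pos]] = float(rank)
        start = end
    return ranks


values = [0, 2, 3, 2]  # the two 2s are tied
by_method = {m: rank_with_ties(values, m) for m in ("average", "min", "max", "dense", "ordinal")}
```

Only the tied pair receives different ranks across methods; untied values differ only under ‘dense’, where later ranks are compressed.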

to_marginals(u)
Transforms sample marginal data (pseudo-observations) to empirical margins based on the input dataset
 Parameters
u (Union[ndarray, Collection[Collection[Number]], DataFrame]) – Sample marginals (pseudo observations). Values must be in the interval [0, 1]
 Returns
Transformed marginals
 Return type
np.ndarray or pd.DataFrame

property