The empirical distribution function is a natural nonparametric estimator of a distribution function. Given a copula \(C\), a non-parametric estimator of \(C\) is given by
The estimator above is the default used by the EmpiricalCopula class when smoothing is set to None.
EmpiricalCopula
None
The empirical copula, being a particular multivariate empirical distribution function, often exhibits a large bias when the sample size is small. One way to counteract this is to use the empirical beta copula. The estimator is given by
where \(F_{n, r}\) represents a beta distribution function with parameters \(r\) and \(n + 1 - r\) and where \(R_{ij}\) represents the rank of \(X_{ij}\) where \(\mathbf{X}\) is the original data set used to “fit” the empirical copula.
Another smooth version of the empirical copula estimator is the “checkerboard” copula. Its estimator is given by
copulae.empirical.
Given pseudo-observations from a distribution with continuous margins and copula, the empirical copula is the (default) empirical distribution function of these pseudo-observations. It is thus a natural nonparametric estimator of the copula.
Examples
>>> from copulae import EmpiricalCopula, pseudo_obs >>> from copulae.datasets import load_marginal_data >>> df = load_marginal_data() >>> df.head(3) STUDENT NORM EXP 0 -0.485878 2.646041 0.393322 1 -1.088878 2.906977 0.253731 2 -0.462133 3.166951 0.480696 >>> u = pseudo_obs(df) # data must be converted to pseudo observations >>> emp_cop = EmpiricalCopula(u, smoothing="beta") >>> data = emp_cop.data # getting the pseudo-observation data (this is the converted df) >>> data[:3] STUDENT NORM EXP 0 0.325225 0.188604 0.557814 1 0.151616 0.399533 0.409530 2 0.336221 0.656115 0.626458 # must feed pseudo-observations into cdf >>> emp_cop.cdf(data[:2]) array([0.06865595, 0.06320104]) >>> emp_cop.pdf([[0.5, 0.5, 0.5], ... [0.6, 0.4, 0.6]]) array([0.00926857, 0.00868833]) >>> emp_cop.random(3, seed=10) STUDENT NORM EXP 0 0.590470 0.984672 0.164945 1 0.319893 0.280906 0.090636 2 0.603799 0.617794 0.542153
bounds
Gets the bounds for the parameters
Lower and upper bound of the copula’s parameters
(scalar or array_like, scalar or array_like)
cdf
Returns the cumulative distribution function (CDF) of the copulae.
The CDF is also the probability of a RV being less or equal to the value specified. Equivalent to the ‘p’ generic function in R.
u (ndarray) – Vector or matrix of the pseudo-observations of the observed data. This vector must be (n x d) where d is the dimension of the copula and must have values between 0 and 1. The caller must have specified the density.
log (bool) – If True, the log of the probability is returned
The CDF of the random variates
ndarray or float
dim
Number of dimensions in copula
fit
Fit the copula with specified data
data (ndarray) – Array of data used to fit copula. Usually, data should be the pseudo observations
x0 (ndarray) – Initial starting point. If value is None, best starting point will be estimated
method ({ 'ml', 'irho', 'itau' }, optional) – Method of fitting. Supported methods are: ‘ml’ - Maximum Likelihood, ‘irho’ - Inverse Spearman Rho, ‘itau’ - Inverse Kendall Tau
optim_options (dict, optional) – Keyword arguments to pass into scipy.optimize.minimize()
scipy.optimize.minimize()
ties ({ 'average', 'min', 'max', 'dense', 'ordinal' }, optional) – Specifies how ranks should be computed if there are ties in any of the coordinate samples. This is effective only if the data has not been converted to its pseudo observations form
verbose (int, optional) – Log level for the estimator. The higher the number, the more verbose it is. 0 prints nothing.
to_pobs (bool) – If True, casts the input data along the column axis to uniform marginals (i.e. convert variables to values between [0, 1]). Set this to False if the input data are already uniform marginals.
scale (float) – Amount to scale the objective function value of the numerical optimizer. This is helpful in achieving higher accuracy as it increases the sensitivity of the optimizer. The downside is that the optimizer could likely run longer as a result. Defaults to 1.
See also
pobs()
The psuedo-observation method converts non-uniform data into uniform marginals.
scipy.optimize.minimize
the scipy minimize function use for optimization
log_lik
Returns the log likelihood (LL) of the copula given the data.
The greater the LL (closer to \(\infty\)) the better.
data (Union[ndarray, DataFrame]) – Data set used to calculate the log likelihood
Union
ndarray
DataFrame
to_pobs – If True, converts the data input to pseudo observations.
ties (Literal[‘average’, ‘min’, ‘max’, ‘dense’, ‘ordinal’]) – Specifies how ranks should be computed if there are ties in any of the coordinate samples. This is effective only if to_pobs is True.
Literal
to_pobs
Log Likelihood
float
params
By default, the Empirical copula has no “parameters” as everything is defined by the input data
pobs
Compute the pseudo-observations for the given data matrix
data ({ array_like, DataFrame }) – Random variates to be converted to pseudo-observations
ties ({ 'average', 'min', 'max', 'dense', 'ordinal' }, optional) – Specifies how ranks should be computed if there are ties in any of the coordinate samples
matrix or vector of the same dimension as data containing the pseudo observations
pseudo_obs()
The pseudo-observations function
smoothing
The smoothing parameter. “none” provides no smoothing. “beta” and “checkerboard” provide a smoothed version of the empirical copula. See equations (2.1) - (4.1) in Segers, Sibuya and Tsukahara
References
The Empirical Beta Copula <https://arxiv.org/pdf/1607.04430.pdf>
ties
The method used to assign ranks to tied elements. The options are ‘average’, ‘min’, ‘max’, ‘dense’ and ‘ordinal’.
The average of the ranks that would have been assigned to all the tied values is assigned to each value.
The minimum of the ranks that would have been assigned to all the tied values is assigned to each value. (This is also referred to as “competition” ranking.)
The maximum of the ranks that would have been assigned to all the tied values is assigned to each value.
Like ‘min’, but the rank of the next highest element is assigned the rank immediately after those assigned to the tied elements. ‘ordinal’: All values are given a distinct rank, corresponding to the order that the values occur in a.
to_marginals
Transforms a sample marginal data (pseudo-observations) to empirical margins based on the input dataset
u (Union[ndarray, Collection[Collection[Number]], DataFrame]) – Sample marginals (pseudo observations). Values must be between [0, 1]
Collection
Number
original (Union[ndarray, Collection[Collection[Number]], DataFrame]) – The original dataset (do not cast these into pseudo-observations)
Transformed marginals
np.ndarray or pd.DataFrame
>>> from copulae import EmpiricalCopula, pseudo_obs >>> from copulae.datasets import load_marginal_data >>> src = load_marginal_data() >>> src.head(3) STUDENT NORM EXP 0 -0.485878 2.646041 0.393322 1 -1.088878 2.906977 0.253731 2 -0.462133 3.166951 0.480696 # Pretend to get some psuedo marginals (~new data) >>> sample_marginals = pseudo_obs(src)[:3] + 0.01 >>> sample_marginals STUDENT NORM EXP 0 0.335225 0.198604 0.567814 1 0.161616 0.409533 0.419530 2 0.346221 0.666115 0.636458 # Get the empirical marginals >>> EmpiricalCopula.to_marginals(sample_marginals, src) STUDENT NORM EXP 0 -0.465186 2.663224 0.408747 1 -1.054686 2.917943 0.261550 2 -0.428498 3.176723 0.492262