Title: | Collection of Robust Covariance and (Sparse) Precision Matrix Estimators |
---|---|
Description: | Collection of methods for robust covariance and (sparse) precision matrix estimation based on Loh and Tan (2018) <doi:10.1214/18-EJS1427>. |
Authors: | Yunyi Shen [aut, cre], David Simcha [cph] |
Maintainer: | Yunyi Shen <[email protected]> |
License: | GPL-3 |
Version: | 0.1 |
Built: | 2024-11-04 04:04:19 UTC |
Source: | https://github.com/yunyishen/robustcov |
This function samples from a normal distribution with normal contamination
conta_normal( n, Omega, byrow = FALSE, cont_rate = 0.05, mu = 10, sd = sqrt(0.2) )
n |
sample size |
Omega |
precision matrix of the normal |
byrow |
whether contamination is applied by row; FALSE stands for cellwise contamination |
cont_rate |
proportion of cells/rows that are contaminated |
mu |
mean of the contamination |
sd |
standard deviation of the contamination |
a matrix of contaminated (multivariate) normally distributed data, one sample per row
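To make the cellwise case concrete, here is a minimal base-R sketch. It assumes that contamination means replacing a random subset of cells with draws from N(mu, sd^2); the helper name `contaminate_cells` is hypothetical and this is not the package's own code, which may differ in how cells are chosen or altered.

```r
# Hypothetical helper (assumption: contaminated cells are REPLACED by
# draws from N(mu, sd^2); the package may do this differently).
contaminate_cells <- function(clean, cont_rate = 0.05, mu = 10, sd = sqrt(0.2)) {
  n_cells <- length(clean)
  n_bad <- round(cont_rate * n_cells)       # number of cells to alter
  bad <- sample.int(n_cells, n_bad)         # positions of contaminated cells
  clean[bad] <- rnorm(n_bad, mean = mu, sd = sd)
  clean
}

set.seed(1)
clean <- matrix(rnorm(200 * 5), 200, 5)     # stand-in for a clean Gaussian sample
dirty <- contaminate_cells(clean)
mean(dirty != clean)                        # fraction of altered cells
```

Row-wise contamination would follow the same pattern with whole rows selected instead of single cells.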
This routine calculates Kendall's tau
corKendall(data)
data |
the n by p raw data matrix |
a matrix with dimension p by p, Kendall's tau
corKendall(matrix(rnorm(500),100,5))
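As a reference point for what a p by p Kendall's tau matrix contains, the following base-R sketch computes tau for one pair of columns by counting concordant and discordant pairs, then compares it with `stats::cor`. The helper `kendall_pair` is hypothetical and only illustrates the definition for data without ties; it is not the package's implementation.

```r
# Kendall's tau for one pair of columns, from the signs of all pairwise
# differences (valid as written when there are no ties).
kendall_pair <- function(x, y) {
  idx <- combn(length(x), 2)                # all index pairs i < j
  s <- sign(x[idx[1, ]] - x[idx[2, ]]) * sign(y[idx[1, ]] - y[idx[2, ]])
  mean(s)                                   # (concordant - discordant) / pairs
}

set.seed(1)
X <- matrix(rnorm(500), 100, 5)
tau_hand <- kendall_pair(X[, 1], X[, 2])
tau_base <- cor(X, method = "kendall")      # full p by p matrix from base R
all.equal(tau_hand, tau_base[1, 2])
```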
This routine calculates Quadrant correlation coefficients
corQuadrant(data)
data |
the n by p raw data matrix |
a matrix with dimension p by p, Quadrant correlation coefficients
corQuadrant(matrix(rnorm(500),100,5))
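The quadrant coefficient can be sketched in a few lines of base R: center each column at its median, take signs, and average the sign products. The function `quadrant_cor` below is a hypothetical illustration of the raw coefficient only; whether the package applies a further consistency transformation is not stated here.

```r
# Raw quadrant correlation: average product of signs around the medians.
quadrant_cor <- function(data) {
  centered <- sweep(data, 2, apply(data, 2, median))  # subtract column medians
  s <- sign(centered)
  crossprod(s) / nrow(data)                 # p by p matrix of mean sign products
}

set.seed(1)
X <- matrix(rnorm(500), 100, 5)
Q <- quadrant_cor(X)
round(Q, 2)
```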
This routine calculates the Spearman correlation
corSpearman(data)
data |
the n by p raw data matrix |
a matrix with dimension p by p of Spearman correlations
corSpearman(matrix(rnorm(500),100,5))
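Spearman correlation is the Pearson correlation of the column-wise ranks. The short base-R check below illustrates that relationship with `stats::cor`; it does not call the package.

```r
set.seed(1)
X <- matrix(rnorm(500), 100, 5)

ranks <- apply(X, 2, rank)                  # rank each column separately
rho_ranks <- cor(ranks)                     # Pearson correlation of the ranks
rho_base <- cor(X, method = "spearman")     # base R Spearman correlation

all.equal(rho_ranks, rho_base)
```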
This routine calculates the Gnanadesikan-Kettenring estimator; the diagonal will be MAD
covGK(data)
data |
the n by p raw data matrix |
a matrix with dimension p by p, GK estimator; note that it is not necessarily positive semi-definite
covGK(matrix(rnorm(500),100,5))
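The Gnanadesikan-Kettenring idea replaces the variance in the identity cov(X, Y) = (var(X + Y) - var(X - Y)) / 4 with a robust squared scale. The sketch below uses `stats::mad` as that scale; the helper `gk_cov` is hypothetical, and in this sketch the diagonal comes out as the squared MAD, which may not match the package's convention.

```r
# Pairwise GK covariance with a pluggable robust scale (default: MAD).
gk_cov <- function(data, scale_fun = mad) {
  p <- ncol(data)
  out <- matrix(0, p, p)
  for (i in seq_len(p)) {
    for (j in seq_len(p)) {
      s_plus <- scale_fun(data[, i] + data[, j])
      s_minus <- scale_fun(data[, i] - data[, j])
      out[i, j] <- (s_plus^2 - s_minus^2) / 4
    }
  }
  out
}

set.seed(1)
X <- matrix(rnorm(500), 100, 5)
S <- gk_cov(X)
min(eigen(S, symmetric = TRUE)$values)      # may be negative: no PSD guarantee
```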
This routine calculates the NPD estimator for *covariance* based on Qn
covNPD(data, eigenTol = 1e-06, convTol = 1e-07, psdTol = 1e-08, maxit = 1000L)
data |
the n by p raw data matrix |
eigenTol |
tolerance in eigen system, used in finding the nearest positive semi-definite matrix |
convTol |
tolerance in cov, used in finding the nearest positive semi-definite matrix |
psdTol |
tolerance in psd, used in finding the nearest positive semi-definite matrix |
maxit |
maximum number of iterations in finding the nearest positive semi-definite matrix |
a matrix with dimension p by p, NPD estimator
covNPD(matrix(rnorm(500),100,5))
This routine calculates the Orthogonalized Gnanadesikan-Kettenring (OGK) estimator for *covariance*, using scale estimation of Gn, as in Maronna and Zamar
covOGK(data)
data |
the n by p raw data matrix |
a matrix with dimension p by p, OGK estimator
covOGK(matrix(rnorm(500),100,5))
This routine calculates SpearmanU, the pairwise covariance matrix estimator proposed in Oellerer and Croux
covSpearmanU(data)
data |
the n by p raw data matrix |
a matrix with dimension p by p, the SpearmanU estimator
covSpearmanU(matrix(rnorm(500),100,5))
This routine uses k-fold cross-validation to choose the tuning parameter
cvglasso( data, k = 10, covest = cov, rhos = seq(0.1, 1, 0.1), evaluation = negLLrobOmega, ... )
data |
The full dataset, should be a matrix or a data.frame, row as sample |
k |
number of folds |
covest |
a *function* or name of a function (string) that takes a matrix to estimate covariance |
rhos |
a vector of tuning parameters to be tested |
evaluation |
a *function* or name of a function (string) that takes exactly two arguments, the estimated covariance and the test covariance; when NULL, the negative log likelihood on the test sets is used |
... |
extra arguments sent to glasso |
a matrix with k rows, each row is the evaluation loss of that fold
cvglasso(matrix(rnorm(100),20,5))
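The shape of the returned loss matrix (one row per fold) can be illustrated with a base-R sketch of the cross-validation loop. To stay runnable without extra packages, it swaps in a simple ridge-type estimator (training covariance plus rho times the identity) where the package fits glasso; `neg_ll` and `cv_loss` are hypothetical names, and the loss is a Gaussian negative log likelihood up to constants.

```r
# Stand-in evaluation: Gaussian negative log likelihood up to constants.
neg_ll <- function(Sigma_hat, Sigma) {
  Omega_hat <- solve(Sigma_hat)
  sum(diag(Sigma %*% Omega_hat)) - as.numeric(determinant(Omega_hat)$modulus)
}

cv_loss <- function(data, k = 5, rhos = seq(0.1, 1, 0.1)) {
  fold <- sample(rep(seq_len(k), length.out = nrow(data)))  # random fold labels
  loss <- matrix(NA_real_, k, length(rhos))
  for (f in seq_len(k)) {
    S_train <- cov(data[fold != f, , drop = FALSE])
    S_test <- cov(data[fold == f, , drop = FALSE])
    for (r in seq_along(rhos)) {
      # Placeholder estimator (NOT glasso): ridge-type shrinkage
      Sigma_hat <- S_train + rhos[r] * diag(ncol(data))
      loss[f, r] <- neg_ll(Sigma_hat, S_test)
    }
  }
  loss                                      # k rows, one column per rho
}

set.seed(1)
rhos <- seq(0.1, 1, 0.1)
loss <- cv_loss(matrix(rnorm(500), 100, 5), k = 5, rhos = rhos)
best_rho <- rhos[which.min(colMeans(loss))] # rho with the lowest average loss
```

Averaging the columns of the loss matrix and taking the minimum is one common way to pick the tuning parameter from such output.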
This routine calculates the nearest positive semi-definite projection
nearPPSD(X, eigenTol = 1e-06, convTol = 1e-07, psdTol = 1e-08, maxit = 1000L)
X |
the matrix |
eigenTol |
tolerance in eigen system, used in finding the nearest positive semi-definite matrix |
convTol |
tolerance in cov, used in finding the nearest positive semi-definite matrix |
psdTol |
tolerance in psd, used in finding the nearest positive semi-definite matrix |
maxit |
maximum number of iterations in finding the nearest positive semi-definite matrix |
a matrix which is the nearest positive semi-definite matrix of input X
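For intuition, the simplest projection onto the positive semi-definite cone clips negative eigenvalues of the symmetrized matrix to zero, which gives the nearest such matrix in Frobenius norm. The sketch below shows that single step in base R; `clip_to_psd` is a hypothetical helper, while the package routine is iterative (see maxit and the tolerances) and may enforce additional constraints.

```r
# One-step projection: symmetrize, then floor the eigenvalues.
clip_to_psd <- function(X, eig_floor = 0) {
  X <- (X + t(X)) / 2                       # work with the symmetric part
  e <- eigen(X, symmetric = TRUE)
  vals <- pmax(e$values, eig_floor)         # raise negative eigenvalues to the floor
  e$vectors %*% diag(vals, nrow = length(vals)) %*% t(e$vectors)
}

# A symmetric matrix with unit diagonal that is not positive semi-definite
A <- matrix(c( 1.0, 0.9, -0.9,
               0.9, 1.0,  0.9,
              -0.9, 0.9,  1.0), 3, 3)
min(eigen(A, symmetric = TRUE)$values)      # negative before projection
A_psd <- clip_to_psd(A)
min(eigen(A_psd, symmetric = TRUE)$values)  # approximately zero afterwards
```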
The default evaluation function in cross-validation, -log likelihood on test set
negLLrobOmega(Sigma_hat, Sigma)
Sigma_hat |
the estimated *covariance* matrix of training set |
Sigma |
the *covariance* matrix of test sets |
-log likelihood
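For reference, a Gaussian negative log likelihood of a test covariance under an estimated covariance is commonly written as below, up to additive constants and a factor of one half. That the package uses exactly this form and scaling is an assumption; the symbol \hat\Omega is introduced here for the inverse of the estimated covariance.

```latex
\ell(\hat\Sigma, \Sigma)
  = \operatorname{tr}\!\left(\Sigma\,\hat\Omega\right) - \log\det\hat\Omega,
\qquad \hat\Omega = \hat\Sigma^{-1}
```

Smaller values indicate that the estimate from the training set explains the test covariance better.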
This routine samples from the alternative multivariate t distribution
raltert(n, Omega, nu)
n |
sample size |
Omega |
**precision** matrix of dimension p by p |
nu |
degrees of freedom |
a matrix with dimension n by p, each row is a sample
This routine samples from a multivariate normal distribution with mean 0, given its precision matrix
rmvnorm(n, Omega)
n |
sample size |
Omega |
**precision** matrix of dimension p by p |
a matrix with dimension n by p, each row is a sample
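Sampling directly from a precision matrix avoids inverting it: with the Cholesky factor R satisfying t(R) %*% R = Omega, solving R x = z for standard normal z gives x with covariance solve(Omega). The base-R sketch below illustrates this; `sample_from_precision` is a hypothetical name and the package's internals may differ.

```r
# Draw n mean-zero Gaussian rows whose covariance is solve(Omega).
sample_from_precision <- function(n, Omega) {
  p <- ncol(Omega)
  R <- chol(Omega)                          # upper triangular, t(R) %*% R == Omega
  z <- matrix(rnorm(p * n), p, n)           # independent standard normals
  t(backsolve(R, z))                        # solve R x = z, one sample per row
}

set.seed(1)
Omega <- matrix(c( 2, -1,  0,
                  -1,  2, -1,
                   0, -1,  2), 3, 3)
x <- sample_from_precision(5000, Omega)
round(cov(x), 2)                            # empirical covariance
solve(Omega)                                # target covariance
```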
This routine samples from the multivariate t distribution
rmvt(n, Omega, nu)
n |
sample size |
Omega |
**precision** matrix of dimension p by p |
nu |
degrees of freedom |
a matrix with dimension n by p, each row is a sample
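The difference between the classical and the alternative multivariate t can be sketched as a scaling of Gaussian draws: the classical version divides each row by one shared chi-square based factor, while the alternative version, as usually defined, uses an independent factor for every coordinate. The helper `scale_to_t` below is hypothetical, takes any n by p matrix of Gaussian draws, and assumes these standard constructions rather than reproducing the package's code.

```r
# Turn Gaussian rows into t-distributed rows by dividing by sqrt(chi^2_nu / nu).
scale_to_t <- function(y, nu, alternative = FALSE) {
  n <- nrow(y)
  p <- ncol(y)
  if (alternative) {
    tau <- matrix(rchisq(n * p, df = nu) / nu, n, p)  # one divisor per cell
  } else {
    tau <- rchisq(n, df = nu) / nu                    # one divisor per row
  }
  y / sqrt(tau)
}

set.seed(1)
y <- matrix(rnorm(100 * 4), 100, 4)         # stand-in Gaussian draws
x_classical <- scale_to_t(y, nu = 5)
x_alternative <- scale_to_t(y, nu = 5, alternative = TRUE)
```

With a shared divisor an outlying row is extreme in all coordinates at once; with independent divisors, heavy tails show up in isolated cells.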
This routine fits glasso using a robust covariance matrix
robglasso( data, covest = cov, rho = 0.1, CV = FALSE, k = 10, grids = 15, evaluation = negLLrobOmega, ... )
data |
raw data, should be a matrix or a data.frame, row as sample |
covest |
a *function* or name of a function (string) that takes a matrix to estimate covariance |
rho |
a scalar or vector of tuning parameters to choose from; if CV=FALSE, it should be a scalar; if CV=TRUE, a scalar input is overridden and the tuning parameter is chosen by CV |
CV |
logical, whether to run cross-validation for the tuning parameter; if set to TRUE and rho is a scalar, the candidates are chosen automatically by log spacing between 0.01 times the max covariance and the max covariance, with the number of candidates given by grids |
k |
fold for cross validation if applicable |
grids |
number of candidate tuning parameters in cross validation |
evaluation |
a *function* or name of a function (string) that takes exactly two arguments, the estimated *covariance* and the test *covariance*; when NULL, the negative log likelihood on the test sets is used |
... |
extra arguments sent to glasso::glasso |
a glasso return value (see ?glasso::glasso); the most important entry is $X, the estimated sparse precision, with an extra entry for the tuning parameter lambda
robglasso(matrix(rnorm(100),20,5))
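The automatic candidate grid described for CV=TRUE can be sketched as log-spaced values between 0.01 times the largest covariance and the largest covariance. The helper `rho_grid` below is hypothetical, and reading "max covariance" as the largest absolute entry of the covariance estimate is an assumption.

```r
# Log-spaced tuning parameter candidates from a covariance estimate.
rho_grid <- function(S, grids = 15) {
  top <- max(abs(S))                        # assumption: largest absolute entry
  exp(seq(log(0.01 * top), log(top), length.out = grids))
}

set.seed(1)
S <- cov(matrix(rnorm(100), 20, 5))
rhos <- rho_grid(S)
signif(rhos, 3)
```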