Robust Statistics: Theory and Methods (with R)

A new edition of this popular text on robust statistics, thoroughly updated to include new and improved methods and focus on implementation of methodology using the increasingly popular open-source software R. Classical statistics fail to cope well with outliers associated with deviations from stand...

Bibliographic Details
Other Authors: Maronna, Ricardo A., author
Format: E-book
Language: English
Published: Hoboken, New Jersey: Wiley, 2019.
Edition: Second edition
Series: Wiley series in probability and statistics.
THEi Wiley ebooks.
View in Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009631533406719
Table of Contents:
  • Note: sections marked with an asterisk can be skipped on first reading
  • Preface xv
  • Preface to the First Edition xxi
  • About the Companion Website xxix
  • 1 Introduction 1
  • 1.1 Classical and robust approaches to statistics 1
  • 1.2 Mean and standard deviation 2
  • 1.3 The “three sigma edit” rule 6
  • 1.4 Linear regression 8
  • 1.4.1 Straight-line regression 8
  • 1.4.2 Multiple linear regression 9
  • 1.5 Correlation coefficients 12
  • 1.6 Other parametric models 13
  • 1.7 Problems 16
  • 2 Location and Scale 17
  • 2.1 The location model 17
  • 2.2 Formalizing departures from normality 19
  • 2.3 M-estimators of location 22
  • 2.3.1 Generalizing maximum likelihood 22
  • 2.3.2 The distribution of M-estimators 25
  • 2.3.3 An intuitive view of M-estimators 28
  • 2.3.4 Redescending M-estimators 29
  • 2.4 Trimmed and Winsorized means 31
  • 2.5 M-estimators of scale 33
  • 2.6 Dispersion estimators 35
  • 2.7 M-estimators of location with unknown dispersion 37
  • 2.7.1 Previous estimation of dispersion 38
  • 2.7.2 Simultaneous M-estimators of location and dispersion 38
  • 2.8 Numerical computing of M-estimators 40
  • 2.8.1 Location with previously-computed dispersion estimation 40
  • 2.8.2 Scale estimators 41
  • 2.8.3 Simultaneous estimation of location and dispersion 42
  • 2.9 Robust confidence intervals and tests 42
  • 2.9.1 Confidence intervals 42
  • 2.9.2 Tests 44
  • 2.10 Appendix: proofs and complements 45
  • 2.10.1 Mixtures 45
  • 2.10.2 Asymptotic normality of M-estimators 46
  • 2.10.3 Slutsky’s lemma 47
  • 2.10.4 Quantiles 47
  • 2.10.5 Alternative algorithms for M-estimators 47
  • 2.11 Recommendations and software 48
  • 2.12 Problems 49
  • 3 Measuring Robustness 51
  • 3.1 The influence function 55
  • 3.1.1 *The convergence of the SC to the IF 57
  • 3.2 The breakdown point 58
  • 3.2.1 Location M-estimators 59
  • 3.2.2 Scale and dispersion estimators 59
  • 3.2.3 Location with previously-computed dispersion estimator 60
  • 3.2.4 Simultaneous estimation 61
  • 3.2.5 Finite-sample breakdown point 61
  • 3.3 Maximum asymptotic bias 62
  • 3.4 Balancing robustness and efficiency 64
  • 3.5 *“Optimal” robustness 66
  • 3.5.1 Bias- and variance-optimality of location estimators 66
  • 3.5.2 Bias optimality of scale and dispersion estimators 66
  • 3.5.3 The infinitesimal approach 67
  • 3.5.4 The Hampel approach 68
  • 3.5.5 Balancing bias and variance: the general problem 70
  • 3.6 Multidimensional parameters 70
  • 3.7 *Estimators as functionals 72
  • 3.8 Appendix: Proofs of results 76
  • 3.8.1 IF of general M-estimators 76
  • 3.8.2 Maximum BP of location estimators 76
  • 3.8.3 BP of location M-estimators 77
  • 3.8.4 Maximum bias of location M-estimators 79
  • 3.8.5 The minimax bias property of the median 80
  • 3.8.6 Minimizing the GES 80
  • 3.8.7 Hampel optimality 82
  • 3.9 Problems 85
  • 4 Linear Regression 1 87
  • 4.1 Introduction 87
  • 4.2 Review of the least squares method 91
  • 4.3 Classical methods for outlier detection 94
  • 4.4 Regression M-estimators 97
  • 4.4.1 M-estimators with known scale 99
  • 4.4.2 M-estimators with preliminary scale 100
  • 4.4.3 Simultaneous estimation of regression and scale 102
  • 4.5 Numerical computing of monotone M-estimators 103
  • 4.5.1 The L1 estimator 103
  • 4.5.2 M-estimators with smooth 𝜓-function 104
  • 4.6 BP of monotone regression estimators 104
  • 4.7 Robust tests for linear hypotheses 106
  • 4.7.1 Review of the classical theory 106
  • 4.7.2 Robust tests using M-estimators 108
  • 4.8 *Regression quantiles 109
  • 4.9 Appendix: Proofs and complements 110
  • 4.9.1 Why equivariance? 110
  • 4.9.2 Consistency of estimated slopes under asymmetric errors 110
  • 4.9.3 Maximum FBP of equivariant estimators 111
  • 4.9.4 The FBP of monotone M-estimators 112
  • 4.10 Recommendations and software 113
  • 4.11 Problems 113
  • 5 Linear Regression 2 115
  • 5.1 Introduction 115
  • 5.2 The linear model with random predictors 118
  • 5.3 M-estimators with a bounded 𝜌-function 119
  • 5.3.1 Properties of M-estimators with a bounded 𝜌-function 120
  • 5.4 Estimators based on a robust residual scale 124
  • 5.4.1 S-estimators 124
  • 5.4.2 L-estimators of scale and the LTS estimator 126
  • 5.4.3 𝜏-estimators 127
  • 5.5 MM-estimators 128
  • 5.6 Robust inference and variable selection for M-estimators 133
  • 5.6.1 Bootstrap robust confidence intervals and tests 134
  • 5.6.2 Variable selection 135
  • 5.7 Algorithms 138
  • 5.7.1 Finding local minima 140
  • 5.7.2 Starting values: the subsampling algorithm 141
  • 5.7.3 A strategy for faster subsampling-based algorithms 143
  • 5.7.4 Starting values: the Peña-Yohai estimator 144
  • 5.7.5 Starting values with numeric and categorical predictors 146
  • 5.7.6 Comparing initial estimators 149
  • 5.8 Balancing asymptotic bias and efficiency 150
  • 5.8.1 “Optimal” redescending M-estimators 153
  • 5.9 Improving the efficiency of robust regression estimators 155
  • 5.9.1 Improving efficiency with one-step reweighting 155
  • 5.9.2 A fully asymptotically efficient one-step procedure 156
  • 5.9.3 Improving finite-sample efficiency and robustness 158
  • 5.9.4 Choosing a regression estimator 164
  • 5.10 Robust regularized regression 164
  • 5.10.1 Ridge regression 165
  • 5.10.2 Lasso regression 168
  • 5.10.3 Other regularized estimators 171
  • 5.11 *Other estimators 172
  • 5.11.1 Generalized M-estimators 172
  • 5.11.2 Projection estimators 174
  • 5.11.3 Constrained M-estimators 175
  • 5.11.4 Maximum depth estimators 175
  • 5.12 Other topics 176
  • 5.12.1 The exact fit property 176
  • 5.12.2 Heteroskedastic errors 177
  • 5.12.3 A robust multiple correlation coefficient 180
  • 5.13 *Appendix: proofs and complements 182
  • 5.13.1 The BP of monotone M-estimators with random X 182
  • 5.13.2 Heavy-tailed x 183
  • 5.13.3 Proof of the exact fit property 183
  • 5.13.4 The BP of S-estimators 184
  • 5.13.5 Asymptotic bias of M-estimators 186
  • 5.13.6 Hampel optimality for GM-estimators 187
  • 5.13.7 Justification of RFPE∗ 188
  • 5.14 Recommendations and software 191
  • 5.15 Problems 191
  • 6 Multivariate Analysis 195
  • 6.1 Introduction 195
  • 6.2 Breakdown and efficiency of multivariate estimators 200
  • 6.2.1 Breakdown point 200
  • 6.2.2 The multivariate exact fit property 201
  • 6.2.3 Efficiency 201
  • 6.3 M-estimators 202
  • 6.3.1 Collinearity 205
  • 6.3.2 Size and shape 205
  • 6.3.3 Breakdown point 206
  • 6.4 Estimators based on a robust scale 207
  • 6.4.1 The minimum volume ellipsoid estimator 208
  • 6.4.2 S-estimators 208
  • 6.4.3 The MCD estimator 210
  • 6.4.4 S-estimators for high dimension 210
  • 6.4.5 𝜏-estimators 214
  • 6.4.6 One-step reweighting 215
  • 6.5 MM-estimators 215
  • 6.6 The Stahel-Donoho estimator 217
  • 6.7 Asymptotic bias 219
  • 6.8 Numerical computing of multivariate estimators 220
  • 6.8.1 Monotone M-estimators 220
  • 6.8.2 Local solutions for S-estimators 221
  • 6.8.3 Subsampling for estimators based on a robust scale 221
  • 6.8.4 The MVE 223
  • 6.8.5 Computation of S-estimators 223
  • 6.8.6 The MCD 223
  • 6.8.7 The Stahel-Donoho estimator 224
  • 6.9 Faster robust scatter matrix estimators 224
  • 6.9.1 Using pairwise robust covariances 224
  • 6.9.2 The Peña-Prieto procedure 228
  • 6.10 Choosing a location/scatter estimator 229
  • 6.10.1 Efficiency 230
  • 6.10.2 Behavior under contamination 231
  • 6.10.3 Computing times 232
  • 6.10.4 Tuning constants 233
  • 6.10.5 Conclusions 233
  • 6.11 Robust principal components 234
  • 6.11.1 Spherical principal components 236
  • 6.11.2 Robust PCA based on a robust scale 237
  • 6.12 Estimation of multivariate scatter and location with missing data 240
  • 6.12.1 Notation 240
  • 6.12.2 GS estimators for missing data 241
  • 6.13 Robust estimators under the cellwise contamination model 242
  • 6.14 Regularized robust estimators of the inverse of the covariance matrix 245
  • 6.15 Mixed linear models 246
  • 6.15.1 Robust estimation for MLM 248
  • 6.15.2 Breakdown point of MLM estimators 248
  • 6.15.3 S-estimators for MLMs 250
  • 6.15.4 Composite 𝜏-estimators 250
  • 6.16 *Other estimators of location and scatter 254
  • 6.16.1 Projection estimators 254
  • 6.16.2 Constrained M-estimators 255
  • 6.16.3 Multivariate depth 256
  • 6.17 Appendix: proofs and complements 256
  • 6.17.1 Why affine equivariance? 256
  • 6.17.2 Consistency of equivariant estimators 256
  • 6.17.3 The estimating equations of the MLE 257
  • 6.17.4 Asymptotic BP of monotone M-estimators 258
  • 6.17.5 The estimating equations for S-estimators 260
  • 6.17.6 Behavior of S-estimators for high p 261
  • 6.17.7 Calculating the asymptotic covariance matrix of location M-estimators 262
  • 6.17.8 The exact fit property 263
  • 6.17.9 Elliptical distributions 264
  • 6.17.10 Consistency of Gnanadesikan-Kettenring correlations 265
  • 6.17.11 Spherical principal components 266
  • 6.17.12 Fixed point estimating equations and computing algorithm for the GS estimator 267
  • 6.18 Recommendations and software 268
  • 6.19 Problems 269
  • 7 Generalized Linear Models 271
  • 7.1 Binary response regression 271
  • 7.2 Robust estimators for the logistic model 275
  • 7.2.1 Weighted MLEs 275
  • 7.2.2 Redescending M-estimators 276
  • 7.3 Generalized linear models 281
  • 7.3.1 Conditionally unbiased bounded influence estimators 283
  • 7.4 Transformed M-estimators 284
  • 7.4.1 Definition of transformed M-estimators 284
  • 7.4.2 Some examples of variance-stabilizing transformations 286
  • 7.4.3 Other estimators for GLMs 286
  • 7.5 Recommendations and software 289
  • 7.6 Problems 290
  • 8 Time Series 293
  • 8.1 Time series outliers and their impact 294
  • 8.1.1 Simple examples of outlier influence 296
  • 8.1.2 Probability models for time series outliers 298
  • 8.1.3 Bias impact of AOs 301
  • 8.2 Classical estimators for AR models 302
  • 8.2.1 The Durbin-Levinson algorithm 305
  • 8.2.2 Asymptotic distribution of classical estimators 307
  • 8.3 Classical estimators for ARMA models 308
  • 8.4 M-estimators of ARMA models 310
  • 8.4.1 M-estimators and their asymptotic distribution 310
  • 8.4.2 The behavior of M-estimators in AR processes with additive outliers 311
  • 8.4.3 The behavior of LS and M-estimators for ARMA processes with infinite innovation variance 312
  • 8.5 Generalized M-estimators 313
  • 8.6 Robust AR estimation using robust filters 315
  • 8.6.1 Naive minimum robust scale autoregression estimators 315
  • 8.6.2 The robust filter algorithm 316
  • 8.6.3 Minimum robust scale estimators based on robust filtering 318
  • 8.6.4 A robust Durbin-Levinson algorithm 319
  • 8.6.5 Choice of scale for the robust Durbin-Levinson procedure 320
  • 8.6.6 Robust identification of AR order 320
  • 8.7 Robust model identification 321
  • 8.8 Robust ARMA model estimation using robust filters 324
  • 8.8.1 𝜏-estimators of ARMA models 324
  • 8.8.2 Robust filters for ARMA models 326
  • 8.8.3 Robustly filtered 𝜏-estimators 328
  • 8.9 ARIMA and SARIMA models 329
  • 8.10 Detecting time series outliers and level shifts 333
  • 8.10.1 Classical detection of time series outliers and level shifts 334
  • 8.10.2 Robust detection of outliers and level shifts for ARIMA models 336
  • 8.10.3 REGARIMA models: estimation and outlier detection 338
  • 8.11 Robustness measures for time series 340
  • 8.11.1 Influence function 340
  • 8.11.2 Maximum bias 342
  • 8.11.3 Breakdown point 343
  • 8.11.4 Maximum bias curves for the AR(1) model 343
  • 8.12 Other approaches for ARMA models 345
  • 8.12.1 Estimators based on robust autocovariances 345
  • 8.12.2 Estimators based on memory-m prediction residuals 346
  • 8.13 High-efficiency robust location estimators 347
  • 8.14 Robust spectral density estimation 348
  • 8.14.1 Definition of the spectral density 348
  • 8.14.2 AR spectral density 349
  • 8.14.3 Classic spectral density estimation methods 349
  • 8.14.4 Prewhitening 350
  • 8.14.5 Influence of outliers on spectral density estimators 351
  • 8.14.6 Robust spectral density estimation 353
  • 8.14.7 Robust time-average spectral density estimator 354
  • 8.15 Appendix A: Heuristic derivation of the asymptotic distribution of M-estimators for ARMA models 356
  • 8.16 Appendix B: Robust filter covariance recursions 359
  • 8.17 Appendix C: ARMA model state-space representation 360
  • 8.18 Recommendations and software 361
  • 8.19 Problems 361
  • 9 Numerical Algorithms 363
  • 9.1 Regression M-estimators 363
  • 9.2 Regression S-estimators 366
  • 9.3 The LTS-estimator 366
  • 9.4 Scale M-estimators 367
  • 9.4.1 Convergence of the fixed-point algorithm 367
  • 9.4.2 Algorithms for the non-concave case 368
  • 9.5 Multivariate M-estimators 369
  • 9.6 Multivariate S-estimators 370
  • 9.6.1 S-estimators with monotone weights 370
  • 9.6.2 The MCD 371
  • 9.6.3 S-estimators with non-monotone weights 371
  • 9.6.4 *Proof of (9.27) 372
  • 10 Asymptotic Theory of M-estimators 373
  • 10.1 Existence and uniqueness of solutions 374
  • 10.1.1 Redescending location estimators 375
  • 10.2 Consistency 376
  • 10.3 Asymptotic normality 377
  • 10.4 Convergence of the SC to the IF 379
  • 10.5 M-estimators of several parameters 381
  • 10.6 Location M-estimators with preliminary scale 384
  • 10.7 Trimmed means 386
  • 10.8 Optimality of the MLE 386
  • 10.9 Regression M-estimators: existence and uniqueness 388
  • 10.10 Regression M-estimators: asymptotic normality 389
  • 10.10.1 Fixed X 389
  • 10.10.2 Asymptotic normality: random X 394
  • 10.11 Regression M-estimators: Fisher-consistency 394
  • 10.11.1 Redescending estimators 394
  • 10.11.2 Monotone estimators 396
  • 10.12 Nonexistence of moments of the sample median 398
  • 10.13 Problems 399
  • 11 Description of Datasets 401
  • References 407
  • Index 423