1 Overview
1.1 RMarkdown files
1.2 Directory structure
1.3 Acknowledgments
2 Initial data import
2.1 Import raw data
2.2 Merge data
2.3 Recoding
2.4 Exclusions
2.5 Categoricals to factors
2.6 Data structure
2.7 Extreme value review
2.8 Remove constant predictors
2.9 Tutorial-only: add random missingness
2.10 Summarize predictors
Save unimputed dataset
3 Missing values and imputation
3.1 Load data
3.2 Examine predictor missingness
3.2.1 Missingness table
3.2.2 Missingness heatmap
3.2.3 Missingness count plot
3.3 Examine outcome missingness
3.4 Impute missing predictor values
3.4.1 Missingness indicators
3.4.2 Impute to 0
3.4.3 GLRM prep
3.4.4 Start h2o
3.4.5 Load data into h2o
3.4.6 GLRM train/test split
3.4.7 Define GLRM grid
3.4.8 GLRM grid search
3.4.9 Apply best GLRM
3.4.10 Review GLRM
3.4.11 Evaluate imputation
3.4.12 Replace missing values
3.4.13 Shut down h2o
3.5 Update predictor summary
3.6 Histogram condense
3.7 Update predictor summary
Save imputed dataset
4 Dataset finalization
Load data
4.1 Factors to indicators
4.2 Remove collinear predictors
4.3 Confirm predictor matrix invertibility
Save finalized dataset
5 Exploratory data analysis
Load data
TBD
6 Modeling
Load data
6.1 Random forest
6.1.1 RF convergence plot
6.2 Ensemble
6.2.1 Prep SL library
6.2.2 Estimate SuperLearner
6.2.3 Review SL results
6.2.4 SL PR-AUC
6.2.5 SL plots
6.3 Nested ensemble
6.3.1 Ensemble weights
6.3.2 AUC analysis
6.3.3 Precision-Recall analysis
6.3.4 Brier score
6.3.5 Index of prediction accuracy
7 Calibration
TBD
8 Interpretation
Load data
8.1 Variable importance
8.1.1 Random forest
8.2 Accumulated local effect plots
Predictive modeling in R
Chapter 7 Calibration
TBD