5 Supervised Learning

using Rtemis, DataFrames

Print available algorithms:

modselect()

Note: Many algorithms have not been implemented in Julia yet.

5.1 Synthetic data

x = rnormmat(100, 10);                              # 100×10 matrix of standard normals
y = x[:, 3] .+ x[:, 5].^2 .+ randn(100);            # signal: linear in x3, quadratic in x5, plus noise
dat = hcat(DataFrame(x, :auto), DataFrame(y = y));  # columns x1–x10 plus y (assumes rnormmat returns a Matrix)
res = resample(dat);
dat_train, dat_test = split(dat, res)
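
For intuition, a train/test split like the one produced by resample() and split() can be written out by hand. This is a minimal sketch, not the Rtemis implementation, and the 75/25 proportion is an assumption:

using Random
n = nrow(dat)
idx = randperm(n)                         # random permutation of row indices
n_train = round(Int, 0.75 * n)            # assumed 75/25 split
dat_train_manual = dat[idx[1:n_train], :]
dat_test_manual = dat[idx[n_train+1:end], :]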

5.2 Training Individual Models

5.2.1 Generalized Linear Model (GLM)

mod_glm = s_GLM(dat_train, dat_test)
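
As a point of reference, the same regression can be fit directly with GLM.jl, outside Rtemis. This sketch assumes the predictor columns are named x1 through x10, as produced by DataFrame(x, :auto) above:

using GLM, StatsModels

fm = term(:y) ~ sum(term.(Symbol.("x", 1:10)))  # y ~ x1 + x2 + ... + x10
ols = lm(fm, dat_train)
yhat_glm = predict(ols, dat_test)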

5.2.2 LASSO/Elastic Net

mod_elnet = s_LASSO(dat_train, dat_test)
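
For comparison, GLMNet.jl wraps the same Fortran glmnet library used by the R package. This is a sketch outside the Rtemis code path; alpha = 1 gives the lasso, and alpha = 0.5 an elastic-net mixture:

using GLMNet

X_train = Matrix(dat_train[:, Not(:y)])
cvfit = glmnetcv(X_train, dat_train.y; alpha = 0.5)  # cross-validated elastic net
lambda_best = cvfit.lambda[argmin(cvfit.meanloss)]   # lambda minimizing CV loss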

5.2.3 Classification and Regression Tree (CART)

mod_cart = s_CART(dat_train, dat_test)
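
A comparable single regression tree can be grown with DecisionTree.jl; this is an illustrative sketch outside Rtemis, and max_depth = 5 is an arbitrary choice:

using DecisionTree

X_train = Matrix(dat_train[:, Not(:y)])
tree = DecisionTreeRegressor(max_depth = 5)
DecisionTree.fit!(tree, X_train, dat_train.y)
yhat_tree = DecisionTree.predict(tree, Matrix(dat_test[:, Not(:y)]))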

5.2.4 Random Forest

mod_rf = s_RF(dat_train, dat_test)
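
The same package also provides a random forest regressor; again an illustrative sketch, with n_trees = 500 as an arbitrary choice:

using DecisionTree

rf = RandomForestRegressor(n_trees = 500)
DecisionTree.fit!(rf, Matrix(dat_train[:, Not(:y)]), dat_train.y)
yhat_rf = DecisionTree.predict(rf, Matrix(dat_test[:, Not(:y)]))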

5.2.5 Gradient Boosting with XGBoost

mod_xgb = s_XGB(dat_train, dat_test)
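
XGBoost.jl also exposes the underlying booster directly; a minimal sketch assuming the XGBoost.jl 2.x API, with illustrative hyperparameters:

using XGBoost

bst = xgboost((Matrix(dat_train[:, Not(:y)]), dat_train.y);
              num_round = 100, max_depth = 4, eta = 0.1,
              objective = "reg:squarederror")
yhat_xgb = XGBoost.predict(bst, Matrix(dat_test[:, Not(:y)]))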

5.3 Automatic Hyperparameter Tuning

Each learner performs automatic hyperparameter tuning using gridsearch() if, instead of a single value, you pass a vector of values to any of its tunable parameters.

For example, to tune CART’s max_depth:

mod_cart = s_CART(dat_train, dat_test, max_depth = [3, 5, 7, 9])
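
If vectors are passed to more than one parameter, gridsearch() searches over their Cartesian product. For example, with a second tunable parameter (minbucket here is a hypothetical name, used for illustration only):

mod_cart = s_CART(dat_train, dat_test,
                  max_depth = [3, 5, 7, 9],
                  minbucket = [1, 5, 10])  # minbucket is a hypothetical parameter name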

5.4 Cross-validation

elevate() performs cross-validation: it resamples the input dataset using resample(), then trains and tests with the specified algorithm and parameters on each resample. If a vector of values is passed to any tunable parameter, this automatically results in nested cross-validation.

mod_el = elevate(dat, modname = :CART)
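
Any algorithm listed by modselect() can be cross-validated the same way; assuming :RF maps to s_RF just as :CART maps to s_CART:

mod_el_rf = elevate(dat, modname = :RF)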

5.4.1 Nested Cross-validation

To tune the model within each cross-validation iteration, pass vectors of values through modparams:

mod_el = elevate(dat, modname = :CART, modparams = (max_depth = [3, 5, 7, 9],))
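
Conceptually, the nested procedure looks like this (comment sketch only, not the elevate() source):

# for each outer resample produced by resample():
#     split off that resample's training and test portions
#     gridsearch() max_depth on inner resamples of the training portion
#     refit the winning max_depth on the full training portion
#     score the refit model on the held-out test portion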