5 Supervised Learning
using Rtemis, DataFrames
Print available algorithms:
modselect()
Note: Many algorithms have not been implemented in Julia yet.
5.1 Synthetic data
x = rnormmat(100, 10);                        # 100 × 10 matrix of random normal predictors
y = x[:, 3] .+ x[:, 5].^2 .+ randn(100);      # linear in column 3, quadratic in column 5, plus noise
dat = hcat(x, DataFrame(y = y));              # combine predictors and outcome in one DataFrame
res = resample(dat);                          # create resampling indices
dat_train, dat_test = split(dat, res)         # split into training and test sets
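For intuition, the resample-and-split step above corresponds to a simple holdout partition. A minimal plain-Julia sketch, assuming a 75/25 split (resample()'s actual defaults may differ):
using Random
idx = shuffle(1:nrow(dat));                   # random permutation of row indices
n_train = round(Int, 0.75 * nrow(dat));       # assumed 75% training fraction
dat_train_sketch = dat[idx[1:n_train], :];
dat_test_sketch = dat[idx[n_train+1:end], :];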
5.2 Training Individual Models
5.2.1 Generalized Linear Model (GLM)
mod_glm = s_GLM(dat_train, dat_test)
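Since this is a regression problem, the GLM here amounts to ordinary least squares (Gaussian family, identity link), which can be written directly with linear algebra. A sketch for intuition only, not the Rtemis API:
X = hcat(ones(nrow(dat_train)), Matrix(dat_train[:, 1:10]));  # design matrix with intercept
beta = X \ dat_train.y;                                       # least-squares coefficients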
5.2.2 LASSO/Elastic Net
mod_elnet = s_LASSO(dat_train, dat_test)
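For reference, the elastic net estimate minimizes a penalized least-squares objective mixing the L1 (LASSO) and L2 (ridge) penalties:

$$\hat{\beta} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \left( \alpha \lVert \beta \rVert_1 + \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 \right)$$

where α = 1 recovers the LASSO and α = 0 ridge regression.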
5.2.3 Classification and Regression Tree (CART)
mod_cart = s_CART(dat_train, dat_test)
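For regression, CART grows a binary tree by greedily choosing at each node the feature j and split point s that minimize the squared error of the two resulting child regions:

$$\min_{j,\,s} \left[ \sum_{x_i \in R_L(j,s)} (y_i - \bar{y}_L)^2 + \sum_{x_i \in R_R(j,s)} (y_i - \bar{y}_R)^2 \right]$$

The tree's depth controls how many such splits are stacked, and with it model complexity.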
5.2.4 Random Forest
mod_rf = s_RF(dat_train, dat_test)
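A random forest averages B trees, each grown on a bootstrap sample of the training data with a random subset of features considered at each split:

$$\hat{f}(x) = \frac{1}{B} \sum_{b=1}^{B} T_b(x)$$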
5.2.5 Gradient Boosting with XGBoost
mod_xgb = s_XGB(dat_train, dat_test)
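Gradient boosting instead builds trees sequentially: each new tree h_m is fit to the errors of the current ensemble and added with a shrinkage factor (learning rate) ν:

$$F_m(x) = F_{m-1}(x) + \nu \, h_m(x)$$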
5.3 Automatic Hyperparameter Tuning
Each learner performs automatic hyperparameter tuning using gridsearch() if, instead of a single value, you pass a vector of values to any of its tunable parameters. For example, to tune CART’s max_depth:
mod_cart = s_CART(dat_train, dat_test, max_depth = [3, 5, 7, 9])
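Conceptually, a grid search trains one model per candidate value, scores each on held-out data, and keeps the best performer. A minimal plain-Julia sketch of that loop, using ridge regression in place of CART purely for illustration (this is not how gridsearch() is implemented):
using LinearAlgebra
X = Matrix(dat_train[:, 1:10]); yt = dat_train.y;
n_fit = round(Int, 0.75 * length(yt));
Xf, yf = X[1:n_fit, :], yt[1:n_fit];              # inner training set
Xv, yv = X[n_fit+1:end, :], yt[n_fit+1:end];      # inner validation set
lambdas = [0.01, 0.1, 1.0, 10.0];                 # candidate hyperparameter values
mse(l) = sum(abs2, yv .- Xv * ((Xf' * Xf + l * I) \ (Xf' * yf))) / length(yv);
best_lambda = lambdas[argmin(mse.(lambdas))]      # keep the best-scoring candidate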
5.4 Cross-validation
elevate() performs cross-validation by resampling the input dataset using resample(), then training and testing with the specified algorithm and parameters on each resample. If a vector of values is passed to any parameter, this automatically results in nested cross-validation.
mod_el = elevate(dat, modname = :CART)
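Under the hood this follows the familiar k-fold pattern. A conceptual sketch, assuming 10 folds (not the elevate() internals):
k = 10;
n = nrow(dat);
folds = [collect(i:k:n) for i in 1:k];    # interleaved fold assignment
for test_idx in folds
    train_idx = setdiff(1:n, test_idx)
    # train on dat[train_idx, :], evaluate on dat[test_idx, :]
end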
5.4.1 Nested Cross-validation
Model tuning is performed within each cross-validation iteration: an inner grid search selects the best hyperparameter value on each outer training set, so the outer test sets measure performance on data never used for tuning:
mod_el = elevate(dat, modname = :CART, modparams = (max_depth = [3, 5, 7, 9],))
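A conceptual sketch of the nesting, reusing the folds from above (illustration only, not the elevate() internals):
depths = [3, 5, 7, 9];
for test_idx in folds                     # outer loop: generalization estimate
    train_idx = setdiff(1:n, test_idx)
    # inner loop: grid-search over depths using only dat[train_idx, :],
    # e.g. via a further split as in the gridsearch() sketch above; then
    # refit with the winning max_depth and score on dat[test_idx, :]
end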