How to perform feature selection and hyperparameter optimization in cross validation?

2018-06-13 19:06:17

Note: I have read many of the questions already posted on this topic, but I still have some confusion.

I want to perform feature selection and model selection for multiple models, e.g. random forests (RF), support vector machines (SVM), and lasso regression. There seem to be a few ways to do feature selection (fs) or hyperparameter optimization (hpo) through cross validation (cv). My data set has n ≈ 700 (sample size) and p = 272 (number of features); however, adding another set of features could increase p to ~20,272.

My current plan is the following:

Run some resampling method (k-fold or Monte Carlo) to get different splits into pseudo-training and pseudo-test data.
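As a concrete illustration, here is a minimal plain-Python sketch of the two resampling schemes mentioned (the fold count, iteration count, and test fraction are illustrative assumptions, not recommendations):

```python
import random

def kfold_splits(n, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold CV over n samples."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

def monte_carlo_splits(n, n_iter=20, test_frac=0.2, seed=0):
    """Yield (train_idx, test_idx) pairs from repeated random splits."""
    rng = random.Random(seed)
    n_test = max(1, int(n * test_frac))
    idx = list(range(n))
    for _ in range(n_iter):
        rng.shuffle(idx)
        yield idx[n_test:], idx[:n_test]
```

Either generator can drive the loop below; the only requirement is that each iteration hands back disjoint pseudo-training and pseudo-test index sets.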

In each iteration of resampling:

Run feature selection on the pseudo-training data

Increment counts for which top variables are selected

Train a model using those features on the pseudo-training data

Estimate how well it does by testing on the pseudo-test data
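The per-iteration steps above can be sketched in plain Python. The correlation-based feature scorer and the nearest-centroid classifier here are stand-in assumptions to keep the example self-contained; any selector/model pair slots into the same loop:

```python
from collections import Counter

def corr_scores(X, y, idx):
    """Absolute Pearson correlation of each feature with y, on rows idx only."""
    n = len(idx)
    ybar = sum(y[i] for i in idx) / n
    scores = []
    for j in range(len(X[0])):
        xbar = sum(X[i][j] for i in idx) / n
        cov = sum((X[i][j] - xbar) * (y[i] - ybar) for i in idx)
        vx = sum((X[i][j] - xbar) ** 2 for i in idx)
        vy = sum((y[i] - ybar) ** 2 for i in idx)
        scores.append(abs(cov) / ((vx * vy) ** 0.5) if vx and vy else 0.0)
    return scores

def centroid_accuracy(X, y, train, test, feats):
    """Nearest-class-centroid accuracy using only the selected features."""
    cents = {}
    for c in set(y[i] for i in train):
        rows = [i for i in train if y[i] == c]
        cents[c] = [sum(X[i][j] for i in rows) / len(rows) for j in feats]
    def predict(i):
        return min(cents, key=lambda c: sum((X[i][j] - m) ** 2
                                            for j, m in zip(feats, cents[c])))
    return sum(predict(i) == y[i] for i in test) / len(test)

def run_iteration(X, y, train, test, m, counts):
    scores = corr_scores(X, y, train)   # feature selection on pseudo-training only
    feats = sorted(range(len(scores)), key=scores.__getitem__)[-m:]
    counts.update(feats)                # tally which variables were selected
    return centroid_accuracy(X, y, train, test, feats)
```

The key property this loop preserves is that the feature scores are computed from the pseudo-training rows only, so the pseudo-test estimate is not contaminated by the selection step.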

Now we can select our feature set by taking the top k selected variables, i.e. those with the highest selection counts across iterations.
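A minimal sketch of that final aggregation step, assuming the counts were tallied in a `Counter` as above (the choice of k is left to the analyst; ties here fall back on index order):

```python
from collections import Counter

def top_k_features(counts, k):
    """Return the k feature indices with the highest selection counts."""
    # Inner sort fixes index order, so the stable outer sort breaks ties by index.
    return sorted(sorted(counts), key=lambda j: counts[j], reverse=True)[:k]
```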