27  Nested Cross-Validation Example - mtry

\(~\)


\(~\)

Install if not Function

install_if_not <- function( list.of.packages ) {
  new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
  if(length(new.packages)) { install.packages(new.packages) } else { print(paste0("the package '", list.of.packages , "' is already installed")) }
}

\(~\)

\(~\)

\(~\)

27.1 Getting Started!

\(~\)

The rsample::initial_split function works similarly to the caret::createDataPartition function.

The rsample::training and rsample::testing functions will return back the training or testing data.

diab_pop <- readRDS('C:/Users/jkyle/Documents/GitHub/Intro_Jeff_Data_Science/DATA/diab_pop.RDS')

library('tidyverse')
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
diab_pop.no_na_vals <- diab_pop %>% na.omit()

install_if_not('rsample')
[1] "the package 'rsample' is already installed"
library(rsample) 

# this will ensure our results are the same every run, to randomize you may use: set.seed(Sys.time())
set.seed(8675309)

train_test <- initial_split(diab_pop.no_na_vals, prop = .6, strata=diq010)
TRAIN <- training(train_test)

\(~\)

\(~\)


\(~\)

\(~\)

27.2 Using nested_cv

A typical scheme for splitting the data when developing a predictive model is to create an initial split of the data into a training and test set.

Then resampling is often done on subests of the training set.

In rsample, the term analysis set is used for subsets of the resampled training set; the assessment sets are the corresponding hold-out sets used to compute performance.

\(~\)

Source: https://tidymodels.github.io/rsample/articles/Applications/Nested_Resampling.html

\(~\)

Grid Search is common practiace in resampling:

  1. Create a tuning grid of tuning parameters s
  2. Apply combinations of tuning parameters on the grid tuning grid with resampling.
  3. Each time, the assessment data are used to estimate performance metrics that maybe pooled to tune model performance

\(~\)

Nested resampling does an additional layer of resampling that separates the tuning activities from the process used to estimate the efficacy of the model:

  1. An outer resampling scheme is used and, for every split in the outer resample, another full set of resampling splits are created on the original analysis set.
  2. Once the tuning results are complete, a model is fit to each of the outer resampling splits using the best parameter associated with that resample.
  3. The average of the outer method’s assessment sets are a unbiased estimate of the model.
  4. For example: To attempt to tune a hyper-parameter with 10 levels, we use a 10-fold cross-validation is used on the outside, and 5-fold cross-validation on the inside. Then:
    • The parameter tuning will be conducted 10 times and the best parameters are determined from the average of the 5 assessment sets.
    • This process occurs 10 times.
    • In total, 500 models would be fit.

\(~\)

\(~\)

27.2.1 Get Started

We will use this example to tune the tree hyper-parameter from the randomForest function in the radomForest package. We will choose from a list of 12 values for which we will vary mtry in seq(1,24,2).

We will use the follow for our nested resampling strategy:

  • OUTER: A twice-repeated 5-fold cross validation will be used as the outer resampling method; it will be used to generate estimates of the overall performance.
  • INNER: To tune the model, we will use estimates from each of the values of the tuning parameter from the 3 bootstrap iterations.

\(~\)

A schema of our sampling strategy

\(~\)

This means that for each of the 12 values that mtry can take on in seq(1,24,2) we will have 5 * 2 * 3 = 30 models to be fit.

So, in total we will fit 5 * 2 * 3 * 12 = 360 models!

\(~\)

\(~\)

27.3 Create nested_cv object

Create the tibble with the resampling specifications:

TRAIN.nested_cv <- nested_cv(TRAIN, 
                             outside = vfold_cv(v = 5, repeats = 2), 
                             inside = bootstraps(times = 3))
TRAIN.nested_cv
# Nested resampling:
#  outer: 5-fold cross-validation repeated 2 times
#  inner: Bootstrap sampling
# A tibble: 10 × 4
   splits            id      id2   inner_resamples
   <list>            <chr>   <chr> <list>         
 1 <split [900/225]> Repeat1 Fold1 <boot [3 × 2]> 
 2 <split [900/225]> Repeat1 Fold2 <boot [3 × 2]> 
 3 <split [900/225]> Repeat1 Fold3 <boot [3 × 2]> 
 4 <split [900/225]> Repeat1 Fold4 <boot [3 × 2]> 
 5 <split [900/225]> Repeat1 Fold5 <boot [3 × 2]> 
 6 <split [900/225]> Repeat2 Fold1 <boot [3 × 2]> 
 7 <split [900/225]> Repeat2 Fold2 <boot [3 × 2]> 
 8 <split [900/225]> Repeat2 Fold3 <boot [3 × 2]> 
 9 <split [900/225]> Repeat2 Fold4 <boot [3 × 2]> 
10 <split [900/225]> Repeat2 Fold5 <boot [3 × 2]> 
str(TRAIN.nested_cv,1)
nest_cv [10 × 4] (S3: nested_cv/vfold_cv/rset/tbl_df/tbl/data.frame)
 - attr(*, "v")= num 5
 - attr(*, "repeats")= num 2
 - attr(*, "breaks")= num 4
 - attr(*, "pool")= num 0.1
 - attr(*, "fingerprint")= chr "f039727a2aed6bda912104e921559e5c"
 - attr(*, "outside")= language vfold_cv(v = 5, repeats = 2)
 - attr(*, "inside")= language bootstraps(times = 3)

\(~\)

\(~\)

\(~\)

We can access all of the resamples of data:

TRAIN.nested_cv$inner_resamples
[[1]]
# Bootstrap sampling 
# A tibble: 3 × 2
  splits            id        
  <list>            <chr>     
1 <split [900/326]> Bootstrap1
2 <split [900/320]> Bootstrap2
3 <split [900/346]> Bootstrap3

[[2]]
# Bootstrap sampling 
# A tibble: 3 × 2
  splits            id        
  <list>            <chr>     
1 <split [900/324]> Bootstrap1
2 <split [900/343]> Bootstrap2
3 <split [900/352]> Bootstrap3

[[3]]
# Bootstrap sampling 
# A tibble: 3 × 2
  splits            id        
  <list>            <chr>     
1 <split [900/339]> Bootstrap1
2 <split [900/325]> Bootstrap2
3 <split [900/318]> Bootstrap3

[[4]]
# Bootstrap sampling 
# A tibble: 3 × 2
  splits            id        
  <list>            <chr>     
1 <split [900/321]> Bootstrap1
2 <split [900/340]> Bootstrap2
3 <split [900/319]> Bootstrap3

[[5]]
# Bootstrap sampling 
# A tibble: 3 × 2
  splits            id        
  <list>            <chr>     
1 <split [900/340]> Bootstrap1
2 <split [900/323]> Bootstrap2
3 <split [900/335]> Bootstrap3

[[6]]
# Bootstrap sampling 
# A tibble: 3 × 2
  splits            id        
  <list>            <chr>     
1 <split [900/333]> Bootstrap1
2 <split [900/336]> Bootstrap2
3 <split [900/333]> Bootstrap3

[[7]]
# Bootstrap sampling 
# A tibble: 3 × 2
  splits            id        
  <list>            <chr>     
1 <split [900/322]> Bootstrap1
2 <split [900/328]> Bootstrap2
3 <split [900/317]> Bootstrap3

[[8]]
# Bootstrap sampling 
# A tibble: 3 × 2
  splits            id        
  <list>            <chr>     
1 <split [900/329]> Bootstrap1
2 <split [900/335]> Bootstrap2
3 <split [900/319]> Bootstrap3

[[9]]
# Bootstrap sampling 
# A tibble: 3 × 2
  splits            id        
  <list>            <chr>     
1 <split [900/327]> Bootstrap1
2 <split [900/336]> Bootstrap2
3 <split [900/337]> Bootstrap3

[[10]]
# Bootstrap sampling 
# A tibble: 3 × 2
  splits            id        
  <list>            <chr>     
1 <split [900/351]> Bootstrap1
2 <split [900/326]> Bootstrap2
3 <split [900/330]> Bootstrap3

\(~\)

\(~\)

\(~\)

We can also access individual samples or data sets used for training or assessment:

TRAIN.nested_cv$inner_resamples[[5]]
# Bootstrap sampling 
# A tibble: 3 × 2
  splits            id        
  <list>            <chr>     
1 <split [900/340]> Bootstrap1
2 <split [900/323]> Bootstrap2
3 <split [900/335]> Bootstrap3
TRAIN.nested_cv$inner_resamples[[5]]$splits[[2]]
<Analysis/Assess/Total>
<900/323/900>
glimpse(   analysis( TRAIN.nested_cv$inner_resamples[[5]]$splits[[2]] ) ) # used to train
Rows: 900
Columns: 10
$ seqn     <dbl> 85990, 87964, 86528, 85304, 86146, 86284, 87506, 86817, 86360…
$ riagendr <fct> Female, Male, Female, Male, Male, Male, Male, Male, Male, Fem…
$ ridageyr <dbl> 50, 21, 37, 68, 40, 46, 48, 60, 50, 42, 80, 43, 48, 31, 37, 6…
$ ridreth1 <fct> Non-Hispanic White, MexicanAmerican, Non-Hispanic Black, Non-…
$ dmdeduc2 <fct> Some college or AA degrees, Some college or AA degrees, Some …
$ dmdmartl <fct> Married, Never married, Never married, Married, Married, Marr…
$ indhhin2 <fct> "$100,000+", "$15,000-$19,999", "$45,000-$54,999", "$25,000-$…
$ bmxbmi   <dbl> 33.6, 28.0, 23.9, 27.0, 26.5, 29.2, 27.7, 30.4, 41.2, 34.5, 2…
$ diq010   <fct> No Diabetes, No Diabetes, No Diabetes, No Diabetes, No Diabet…
$ lbxglu   <dbl> 87, 86, 90, 91, 101, 79, 94, 119, 116, 100, 92, 111, 94, 103,…
glimpse( assessment( TRAIN.nested_cv$inner_resamples[[5]]$splits[[2]] ) ) # used to assess
Rows: 323
Columns: 10
$ seqn     <dbl> 84424, 84443, 84627, 84685, 85192, 85363, 85471, 85521, 85928…
$ riagendr <fct> Male, Male, Male, Female, Female, Female, Male, Female, Male,…
$ ridageyr <dbl> 69, 64, 80, 27, 53, 66, 63, 71, 66, 38, 55, 80, 80, 72, 37, 7…
$ ridreth1 <fct> Non-Hispanic White, Other, MexicanAmerican, Non-Hispanic Whit…
$ dmdeduc2 <fct> College grad or above, Some college or AA degrees, Less than …
$ dmdmartl <fct> Divorced, Married, Married, Married, Living with partner, Mar…
$ indhhin2 <fct> "$25,000-$34,999", "$25,000-$34,999", "$10,000-$14,999", "$45…
$ bmxbmi   <dbl> 33.5, 33.8, 29.1, 25.4, 22.5, 23.2, 21.7, 31.7, 29.3, 37.5, 3…
$ diq010   <fct> Diabetes, Diabetes, Diabetes, Diabetes, Diabetes, Diabetes, D…
$ lbxglu   <dbl> 429, 134, 125, 100, 122, 238, 156, 147, 154, 87, 311, 125, 19…

\(~\)

\(~\)

\(~\)


\(~\)

\(~\)

\(~\)

27.4 Function mapping with purrr

We will frequently make use of map functions within the purrr library.

library(purrr)

\(~\)

\(~\)

\(~\)


\(~\)

\(~\)

\(~\)

27.5 Helper Functions

\(~\)

The majority of the work is done by using a sequence of helper-functions and purrr mappers.

We know our goal is to work with the inner_resamples of our TRAIN.nested_cv tibble.

Given a cut of data, and tuning parameters mtry %in% seq(1,24,2) we will write a function to estimate F1:

\(~\)

\(~\)

27.5.1 F1_Tree Helper Function

Here we can borrow from what we already know about a predicting and scoring models along with some helper functions.

We want the first input of our function to be a cut of data, an object from our rsplit::nested_cv function to the TRAIN.nested_cv tibble and our second to be a value of ntree:

This function will:

  • take: a cut of data and tuning-parameter
  • train a model with: a value of tuning-parameter and the corresponding training set
  • score the corresponding testing data
  • return the estimate for the f1 metric
f1_mtry <- function(object, my_mtry = 4) {  # my_tree = 761

  model_rf <- randomForest::randomForest(diq010 ~ riagendr + ridageyr + ridreth1 + dmdeduc2 + dmdmartl + indhhin2 + bmxbmi + lbxglu,
                       data = analysis(object),
                       ntree = 637,
                       mtry = my_mtry)
  
  holdout_pred_class <- predict(model_rf, assessment(object), "class")
  
  dt <- tibble::as_tibble(cbind(assessment(object), holdout_pred_class))
  
  yd_f1 <- yardstick::f_meas(data=dt, truth=diq010, holdout_pred_class)
    
  F1_Score <- yd_f1$.estimate 
  
  return(F1_Score)
}

# test the function
f1_mtry(TRAIN.nested_cv$inner_resamples[[5]]$splits[[3]])
[1] 0.6868687

\(~\)

\(~\)

27.5.2 F1_Tree_Wrapper Function

If you understood the last function, this one is even easier. This function will allow us to reverse the call order of f1_tree:

f1_mtry_wrapper <- function(my_mtry , object){ f1_mtry(object, my_mtry) }

f1_mtry_wrapper( 4, TRAIN.nested_cv$inner_resamples[[5]]$splits[[3]] )
[1] 0.6868687

\(~\)

\(~\)

27.5.3 Tune_Over_Mtry Function

We will develop the function that will allow us to tune over mtry.

Again, this function’s input expects a sample of data as it’s input.

Given a sample of data, for each value of the tuning parameter, we want to make an assessment:

tune_over_mtry <- function(object) {
  tune_over_trees_results <- tibble( mtry = seq(1,24,2) )  
  
  tune_over_trees_results$F1_Score <- map_dbl(tune_over_trees_results$mtry, 
                                             f1_mtry_wrapper,
                                             object = object)
  tune_over_trees_results
}

# test the function
tune_over_mtry( TRAIN.nested_cv$inner_resamples[[1]]$splits[[2]] )
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
# A tibble: 12 × 2
    mtry F1_Score
   <dbl>    <dbl>
 1     1    0.3  
 2     3    0.635
 3     5    0.635
 4     7    0.602
 5     9    0.593
 6    11    0.593
 7    13    0.610
 8    15    0.610
 9    17    0.602
10    19    0.610
11    21    0.610
12    23    0.610

\(~\)

\(~\)

27.5.4 Summarise_Tune_Results Function

This function will be used to summarise our results from the inner nest into the outer nest.

This function will again take an inner object from rsample::nested_cv

  1. map_df maps the sample data to the tune_over_trees function which is already waiting for it; and an inner assessment is made.
  2. Return the row-bound tibble map_df(object$splits, tune_over_trees) is our result from that has 3 bootstrap results
  3. Next we group the assessments by the value of ntree we perform a summary step. * Compute the count, max, median, and min of the F1_Score from the inner bootstrap estimate.
    • Since we have only have 3 bootstraps, the returning max, median, and min values will completely contain all the estimates from the inner nest, so we can pool them into the Outer Layer.
summarize_tune_results <- function(object) {
  map_df(object$splits, tune_over_mtry)  %>% 
    group_by(mtry) %>%
    summarize(max_F1_Score = max(F1_Score, na.rm = TRUE),
              median_F1_Score = median(F1_Score, na.rm = TRUE),
              min_F1_Score = min(F1_Score, na.rm = TRUE),
              n = length(F1_Score))
}

We can’t test this function for a slice of data, as it’s intended to be used with another function:

summarize_tune_results( TRAIN.nested_cv$inner_resamples[[1]]$splits[[2]] )
Error in `group_by()`:
! Must group by variables found in `.data`.
✖ Column `mtry` is not found.
#### Summary below

\(~\)

\(~\)


\(~\)

\(~\)

\(~\)

27.6 Run Inner Nest

Now to run the inner nest!

toc <- Sys.time() # set timer

tuning_results <- map(TRAIN.nested_cv$inner_resamples, 
                      summarize_tune_results) # run inner samples
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
tic <- Sys.time() # clock
diff.inner <- tic - toc # compute difference 
diff.inner # print difference
Time difference of 2.670991 mins
tuning_results
[[1]]
# A tibble: 12 × 5
    mtry max_F1_Score median_F1_Score min_F1_Score     n
   <dbl>        <dbl>           <dbl>        <dbl> <int>
 1     1        0.493           0.308        0.231     3
 2     3        0.635           0.613        0.609     3
 3     5        0.630           0.605        0.602     3
 4     7        0.610           0.595        0.543     3
 5     9        0.610           0.587        0.563     3
 6    11        0.610           0.583        0.568     3
 7    13        0.610           0.587        0.556     3
 8    15        0.595           0.593        0.587     3
 9    17        0.610           0.587        0.575     3
10    19        0.610           0.597        0.595     3
11    21        0.610           0.595        0.587     3
12    23        0.610           0.587        0.563     3

[[2]]
# A tibble: 12 × 5
    mtry max_F1_Score median_F1_Score min_F1_Score     n
   <dbl>        <dbl>           <dbl>        <dbl> <int>
 1     1        0.299           0.281        0.185     3
 2     3        0.660           0.612        0.588     3
 3     5        0.673           0.605        0.593     3
 4     7        0.636           0.633        0.609     3
 5     9        0.636           0.587        0.58      3
 6    11        0.636           0.602        0.596     3
 7    13        0.636           0.615        0.602     3
 8    15        0.626           0.621        0.587     3
 9    17        0.621           0.587        0.58      3
10    19        0.621           0.602        0.602     3
11    21        0.636           0.62         0.602     3
12    23        0.636           0.596        0.587     3

[[3]]
# A tibble: 12 × 5
    mtry max_F1_Score median_F1_Score min_F1_Score     n
   <dbl>        <dbl>           <dbl>        <dbl> <int>
 1     1        0.494           0.4          0.386     3
 2     3        0.681           0.658        0.609     3
 3     5        0.674           0.667        0.596     3
 4     7        0.680           0.659        0.589     3
 5     9        0.646           0.634        0.583     3
 6    11        0.673           0.651        0.577     3
 7    13        0.667           0.634        0.577     3
 8    15        0.651           0.646        0.583     3
 9    17        0.660           0.634        0.577     3
10    19        0.660           0.634        0.568     3
11    21        0.651           0.646        0.583     3
12    23        0.667           0.651        0.577     3

[[4]]
# A tibble: 12 × 5
    mtry max_F1_Score median_F1_Score min_F1_Score     n
   <dbl>        <dbl>           <dbl>        <dbl> <int>
 1     1        0.5             0.448        0.281     3
 2     3        0.722           0.667        0.667     3
 3     5        0.686           0.643        0.623     3
 4     7        0.642           0.628        0.593     3
 5     9        0.654           0.607        0.585     3
 6    11        0.660           0.615        0.614     3
 7    13        0.654           0.614        0.593     3
 8    15        0.654           0.636        0.608     3
 9    17        0.654           0.607        0.593     3
10    19        0.648           0.607        0.593     3
11    21        0.644           0.642        0.585     3
12    23        0.660           0.621        0.6       3

[[5]]
# A tibble: 12 × 5
    mtry max_F1_Score median_F1_Score min_F1_Score     n
   <dbl>        <dbl>           <dbl>        <dbl> <int>
 1     1        0.441           0.405        0.346     3
 2     3        0.673           0.618        0.554     3
 3     5        0.687           0.609        0.523     3
 4     7        0.66            0.580        0.492     3
 5     9        0.653           0.6          0.492     3
 6    11        0.647           0.592        0.492     3
 7    13        0.66            0.592        0.492     3
 8    15        0.667           0.620        0.515     3
 9    17        0.641           0.583        0.492     3
10    19        0.667           0.6          0.515     3
11    21        0.647           0.6          0.492     3
12    23        0.653           0.6          0.492     3

[[6]]
# A tibble: 12 × 5
    mtry max_F1_Score median_F1_Score min_F1_Score     n
   <dbl>        <dbl>           <dbl>        <dbl> <int>
 1     1        0.3             0.261        0.205     3
 2     3        0.658           0.604        0.584     3
 3     5        0.625           0.615        0.566     3
 4     7        0.615           0.574        0.574     3
 5     9        0.615           0.596        0.579     3
 6    11        0.615           0.574        0.574     3
 7    13        0.615           0.574        0.553     3
 8    15        0.615           0.565        0.561     3
 9    17        0.608           0.574        0.559     3
10    19        0.615           0.589        0.574     3
11    21        0.615           0.611        0.587     3
12    23        0.615           0.574        0.574     3

[[7]]
# A tibble: 12 × 5
    mtry max_F1_Score median_F1_Score min_F1_Score     n
   <dbl>        <dbl>           <dbl>        <dbl> <int>
 1     1        0.375           0.321        0.271     3
 2     3        0.682           0.627        0.568     3
 3     5        0.681           0.602        0.579     3
 4     7        0.674           0.587        0.556     3
 5     9        0.659           0.587        0.533     3
 6    11        0.630           0.579        0.533     3
 7    13        0.630           0.587        0.533     3
 8    15        0.638           0.587        0.539     3
 9    17        0.674           0.587        0.533     3
10    19        0.667           0.587        0.545     3
11    21        0.645           0.587        0.539     3
12    23        0.674           0.595        0.539     3

[[8]]
# A tibble: 12 × 5
    mtry max_F1_Score median_F1_Score min_F1_Score     n
   <dbl>        <dbl>           <dbl>        <dbl> <int>
 1     1        0.542           0.517        0.4       3
 2     3        0.694           0.667        0.667     3
 3     5        0.703           0.667        0.646     3
 4     7        0.684           0.667        0.627     3
 5     9        0.667           0.667        0.657     3
 6    11        0.684           0.667        0.638     3
 7    13        0.667           0.658        0.647     3
 8    15        0.667           0.65         0.647     3
 9    17        0.667           0.667        0.647     3
10    19        0.675           0.667        0.638     3
11    21        0.684           0.667        0.638     3
12    23        0.684           0.667        0.647     3

[[9]]
# A tibble: 12 × 5
    mtry max_F1_Score median_F1_Score min_F1_Score     n
   <dbl>        <dbl>           <dbl>        <dbl> <int>
 1     1        0.471           0.4          0.382     3
 2     3        0.7             0.659        0.575     3
 3     5        0.7             0.659        0.627     3
 4     7        0.651           0.649        0.619     3
 5     9        0.644           0.641        0.605     3
 6    11        0.649           0.644        0.595     3
 7    13        0.636           0.605        0.605     3
 8    15        0.641           0.636        0.571     3
 9    17        0.636           0.632        0.605     3
10    19        0.649           0.644        0.595     3
11    21        0.644           0.613        0.605     3
12    23        0.632           0.629        0.595     3

[[10]]
# A tibble: 12 × 5
    mtry max_F1_Score median_F1_Score min_F1_Score     n
   <dbl>        <dbl>           <dbl>        <dbl> <int>
 1     1        0.423           0.406        0.273     3
 2     3        0.667           0.644        0.525     3
 3     5        0.696           0.624        0.535     3
 4     7        0.667           0.652        0.535     3
 5     9        0.637           0.633        0.554     3
 6    11        0.667           0.652        0.558     3
 7    13        0.667           0.637        0.554     3
 8    15        0.66            0.652        0.535     3
 9    17        0.673           0.645        0.554     3
10    19        0.653           0.637        0.558     3
11    21        0.653           0.652        0.535     3
12    23        0.639           0.637        0.535     3

\(~\)

\(~\)

27.6.1 Inner Nest Results

head(str(tuning_results))
List of 10
 $ : tibble [12 × 5] (S3: tbl_df/tbl/data.frame)
  ..$ mtry           : num [1:12] 1 3 5 7 9 11 13 15 17 19 ...
  ..$ max_F1_Score   : num [1:12] 0.493 0.635 0.63 0.61 0.61 ...
  ..$ median_F1_Score: num [1:12] 0.308 0.613 0.605 0.595 0.587 ...
  ..$ min_F1_Score   : num [1:12] 0.231 0.609 0.602 0.543 0.563 ...
  ..$ n              : int [1:12] 3 3 3 3 3 3 3 3 3 3 ...
 $ : tibble [12 × 5] (S3: tbl_df/tbl/data.frame)
  ..$ mtry           : num [1:12] 1 3 5 7 9 11 13 15 17 19 ...
  ..$ max_F1_Score   : num [1:12] 0.299 0.66 0.673 0.636 0.636 ...
  ..$ median_F1_Score: num [1:12] 0.281 0.612 0.605 0.633 0.587 ...
  ..$ min_F1_Score   : num [1:12] 0.185 0.588 0.593 0.609 0.58 ...
  ..$ n              : int [1:12] 3 3 3 3 3 3 3 3 3 3 ...
 $ : tibble [12 × 5] (S3: tbl_df/tbl/data.frame)
  ..$ mtry           : num [1:12] 1 3 5 7 9 11 13 15 17 19 ...
  ..$ max_F1_Score   : num [1:12] 0.494 0.681 0.674 0.68 0.646 ...
  ..$ median_F1_Score: num [1:12] 0.4 0.658 0.667 0.659 0.634 ...
  ..$ min_F1_Score   : num [1:12] 0.386 0.609 0.596 0.589 0.583 ...
  ..$ n              : int [1:12] 3 3 3 3 3 3 3 3 3 3 ...
 $ : tibble [12 × 5] (S3: tbl_df/tbl/data.frame)
  ..$ mtry           : num [1:12] 1 3 5 7 9 11 13 15 17 19 ...
  ..$ max_F1_Score   : num [1:12] 0.5 0.722 0.686 0.642 0.654 ...
  ..$ median_F1_Score: num [1:12] 0.448 0.667 0.643 0.628 0.607 ...
  ..$ min_F1_Score   : num [1:12] 0.281 0.667 0.623 0.593 0.585 ...
  ..$ n              : int [1:12] 3 3 3 3 3 3 3 3 3 3 ...
 $ : tibble [12 × 5] (S3: tbl_df/tbl/data.frame)
  ..$ mtry           : num [1:12] 1 3 5 7 9 11 13 15 17 19 ...
  ..$ max_F1_Score   : num [1:12] 0.441 0.673 0.687 0.66 0.653 ...
  ..$ median_F1_Score: num [1:12] 0.405 0.618 0.609 0.58 0.6 ...
  ..$ min_F1_Score   : num [1:12] 0.346 0.554 0.523 0.492 0.492 ...
  ..$ n              : int [1:12] 3 3 3 3 3 3 3 3 3 3 ...
 $ : tibble [12 × 5] (S3: tbl_df/tbl/data.frame)
  ..$ mtry           : num [1:12] 1 3 5 7 9 11 13 15 17 19 ...
  ..$ max_F1_Score   : num [1:12] 0.3 0.658 0.625 0.615 0.615 ...
  ..$ median_F1_Score: num [1:12] 0.261 0.604 0.615 0.574 0.596 ...
  ..$ min_F1_Score   : num [1:12] 0.205 0.584 0.566 0.574 0.579 ...
  ..$ n              : int [1:12] 3 3 3 3 3 3 3 3 3 3 ...
 $ : tibble [12 × 5] (S3: tbl_df/tbl/data.frame)
  ..$ mtry           : num [1:12] 1 3 5 7 9 11 13 15 17 19 ...
  ..$ max_F1_Score   : num [1:12] 0.375 0.682 0.681 0.674 0.659 ...
  ..$ median_F1_Score: num [1:12] 0.321 0.627 0.602 0.587 0.587 ...
  ..$ min_F1_Score   : num [1:12] 0.271 0.568 0.579 0.556 0.533 ...
  ..$ n              : int [1:12] 3 3 3 3 3 3 3 3 3 3 ...
 $ : tibble [12 × 5] (S3: tbl_df/tbl/data.frame)
  ..$ mtry           : num [1:12] 1 3 5 7 9 11 13 15 17 19 ...
  ..$ max_F1_Score   : num [1:12] 0.542 0.694 0.703 0.684 0.667 ...
  ..$ median_F1_Score: num [1:12] 0.517 0.667 0.667 0.667 0.667 ...
  ..$ min_F1_Score   : num [1:12] 0.4 0.667 0.646 0.627 0.657 ...
  ..$ n              : int [1:12] 3 3 3 3 3 3 3 3 3 3 ...
 $ : tibble [12 × 5] (S3: tbl_df/tbl/data.frame)
  ..$ mtry           : num [1:12] 1 3 5 7 9 11 13 15 17 19 ...
  ..$ max_F1_Score   : num [1:12] 0.471 0.7 0.7 0.651 0.644 ...
  ..$ median_F1_Score: num [1:12] 0.4 0.659 0.659 0.649 0.641 ...
  ..$ min_F1_Score   : num [1:12] 0.382 0.575 0.627 0.619 0.605 ...
  ..$ n              : int [1:12] 3 3 3 3 3 3 3 3 3 3 ...
 $ : tibble [12 × 5] (S3: tbl_df/tbl/data.frame)
  ..$ mtry           : num [1:12] 1 3 5 7 9 11 13 15 17 19 ...
  ..$ max_F1_Score   : num [1:12] 0.423 0.667 0.696 0.667 0.637 ...
  ..$ median_F1_Score: num [1:12] 0.406 0.644 0.624 0.652 0.633 ...
  ..$ min_F1_Score   : num [1:12] 0.273 0.525 0.535 0.535 0.554 ...
  ..$ n              : int [1:12] 3 3 3 3 3 3 3 3 3 3 ...
NULL
tuning_results[[1]]
# A tibble: 12 × 5
    mtry max_F1_Score median_F1_Score min_F1_Score     n
   <dbl>        <dbl>           <dbl>        <dbl> <int>
 1     1        0.493           0.308        0.231     3
 2     3        0.635           0.613        0.609     3
 3     5        0.630           0.605        0.602     3
 4     7        0.610           0.595        0.543     3
 5     9        0.610           0.587        0.563     3
 6    11        0.610           0.583        0.568     3
 7    13        0.610           0.587        0.556     3
 8    15        0.595           0.593        0.587     3
 9    17        0.610           0.587        0.575     3
10    19        0.610           0.597        0.595     3
11    21        0.610           0.595        0.587     3
12    23        0.610           0.587        0.563     3
tuning_results[[6]]
# A tibble: 12 × 5
    mtry max_F1_Score median_F1_Score min_F1_Score     n
   <dbl>        <dbl>           <dbl>        <dbl> <int>
 1     1        0.3             0.261        0.205     3
 2     3        0.658           0.604        0.584     3
 3     5        0.625           0.615        0.566     3
 4     7        0.615           0.574        0.574     3
 5     9        0.615           0.596        0.579     3
 6    11        0.615           0.574        0.574     3
 7    13        0.615           0.574        0.553     3
 8    15        0.615           0.565        0.561     3
 9    17        0.608           0.574        0.559     3
10    19        0.615           0.589        0.574     3
11    21        0.615           0.611        0.587     3
12    23        0.615           0.574        0.574     3
label_nested_cv <- TRAIN.nested_cv %>% 
  mutate(RowNumber = row_number()) %>% 
  select(RowNumber,id,id2) 

label_nested_cv
# A tibble: 10 × 3
   RowNumber id      id2  
       <int> <chr>   <chr>
 1         1 Repeat1 Fold1
 2         2 Repeat1 Fold2
 3         3 Repeat1 Fold3
 4         4 Repeat1 Fold4
 5         5 Repeat1 Fold5
 6         6 Repeat2 Fold1
 7         7 Repeat2 Fold2
 8         8 Repeat2 Fold3
 9         9 Repeat2 Fold4
10        10 Repeat2 Fold5
tuning_results_2 <- NULL

for (i in 1:length(tuning_results)){
  tuning_results_2[[i]] <- cbind( tuning_results[[i]], label_nested_cv[i, c('id','id2')]  )  
  }

tuning_results_2[[1]]
   mtry max_F1_Score median_F1_Score min_F1_Score n      id   id2
1     1    0.4927536       0.3076923    0.2307692 3 Repeat1 Fold1
2     3    0.6352941       0.6133333    0.6086957 3 Repeat1 Fold1
3     5    0.6301370       0.6052632    0.6024096 3 Repeat1 Fold1
4     7    0.6097561       0.5945946    0.5428571 3 Repeat1 Fold1
5     9    0.6097561       0.5866667    0.5633803 3 Repeat1 Fold1
6    11    0.6097561       0.5833333    0.5675676 3 Repeat1 Fold1
7    13    0.6097561       0.5866667    0.5555556 3 Repeat1 Fold1
8    15    0.5945946       0.5925926    0.5866667 3 Repeat1 Fold1
9    17    0.6097561       0.5866667    0.5753425 3 Repeat1 Fold1
10   19    0.6097561       0.5974026    0.5945946 3 Repeat1 Fold1
11   21    0.6097561       0.5945946    0.5866667 3 Repeat1 Fold1
12   23    0.6097561       0.5866667    0.5633803 3 Repeat1 Fold1
library(ggplot2)
library(scales)

Attaching package: 'scales'
The following object is masked from 'package:purrr':

    discard
The following object is masked from 'package:readr':

    col_factor
pooled_inner <- tuning_results_2 %>% bind_rows() 

str(pooled_inner)
'data.frame':   120 obs. of  7 variables:
 $ mtry           : num  1 3 5 7 9 11 13 15 17 19 ...
 $ max_F1_Score   : num  0.493 0.635 0.63 0.61 0.61 ...
 $ median_F1_Score: num  0.308 0.613 0.605 0.595 0.587 ...
 $ min_F1_Score   : num  0.231 0.609 0.602 0.543 0.563 ...
 $ n              : int  3 3 3 3 3 3 3 3 3 3 ...
 $ id             : chr  "Repeat1" "Repeat1" "Repeat1" "Repeat1" ...
 $ id2            : chr  "Fold1" "Fold1" "Fold1" "Fold1" ...

\(~\)

\(~\)

\(~\)

\(~\)

27.6.2 Inner Nest Results Graphs

Check out the Facet - Fold tab below:

27.6.2.1 Facet - Fold

ggplot(pooled_inner, 
       aes(x = mtry , y = median_F1_Score, shape = id, color = id )) + 
  geom_point(aes(size = median_F1_Score)) +
  geom_line() +
  facet_wrap(.~id2)

27.6.2.2 Facet - Fold with Errors

ggplot(pooled_inner, 
       aes(x = mtry , y = median_F1_Score, shape = id, color = id )) + 
  geom_line() +
  geom_errorbar(aes(ymin=min_F1_Score, ymax=max_F1_Score)) +
  facet_wrap(.~id2)

27.6.2.3 Facet - Repeat

ggplot(pooled_inner, 
       aes(x = mtry , y = median_F1_Score, shape = id2, color = id2 )) + 
  geom_point(aes(size = median_F1_Score)) +
  geom_line() +
  facet_wrap(.~id)

27.6.2.4 Unorganized

ggplot(pooled_inner, 
       aes(x = mtry , y = median_F1_Score, shape = id2, color = id )) + 
  geom_point(aes(size = median_F1_Score)) +
  geom_line() 

27.7

#{.unnumbered}

\(~\)

\(~\)


\(~\)

\(~\)

\(~\)

27.8 Max Best F1 Value Function

We will use this helper function to select the corresponding row of data that contains the max_F1_Score from our summary in pooled_inner.

max_BEST_F1 <- function(data) { data[ which.max(data$max_F1_Score) , ] }


# test the function
tuning_results[[10]]
# A tibble: 12 × 5
    mtry max_F1_Score median_F1_Score min_F1_Score     n
   <dbl>        <dbl>           <dbl>        <dbl> <int>
 1     1        0.423           0.406        0.273     3
 2     3        0.667           0.644        0.525     3
 3     5        0.696           0.624        0.535     3
 4     7        0.667           0.652        0.535     3
 5     9        0.637           0.633        0.554     3
 6    11        0.667           0.652        0.558     3
 7    13        0.667           0.637        0.554     3
 8    15        0.66            0.652        0.535     3
 9    17        0.673           0.645        0.554     3
10    19        0.653           0.637        0.558     3
11    21        0.653           0.652        0.535     3
12    23        0.639           0.637        0.535     3
max_BEST_F1(tuning_results[[10]])
# A tibble: 1 × 5
   mtry max_F1_Score median_F1_Score min_F1_Score     n
  <dbl>        <dbl>           <dbl>        <dbl> <int>
1     5        0.696           0.624        0.535     3

\(~\)

\(~\)

\(~\)


\(~\)

\(~\)

\(~\)

27.9 Running Outer Nest From Inner Pooled

We will find the values of ntree that had the highest performing F1 by sample, we will map those and an outer split of the sample to f1_tree to obtain our outer-inner estimates.

tuning_results %>% map_df(max_BEST_F1)
# A tibble: 10 × 5
    mtry max_F1_Score median_F1_Score min_F1_Score     n
   <dbl>        <dbl>           <dbl>        <dbl> <int>
 1     3        0.635           0.613        0.609     3
 2     5        0.673           0.605        0.593     3
 3     3        0.681           0.658        0.609     3
 4     3        0.722           0.667        0.667     3
 5     5        0.687           0.609        0.523     3
 6     3        0.658           0.604        0.584     3
 7     3        0.682           0.627        0.568     3
 8     5        0.703           0.667        0.646     3
 9     3        0.7             0.659        0.575     3
10     5        0.696           0.624        0.535     3
# These are the values of `ntree` that had highest perfroming `F1` by sample :
max_BEST_F1_vals <- tuning_results %>% 
  map_df(max_BEST_F1) %>% 
  select(mtry)

TRAIN.nested_cv.results <- bind_cols(TRAIN.nested_cv, max_BEST_F1_vals)

glimpse(TRAIN.nested_cv.results)
Rows: 10
Columns: 5
$ splits          <list> [<vfold_split[900 x 225 x 1125 x 10]>], [<vfold_split…
$ id              <chr> "Repeat1", "Repeat1", "Repeat1", "Repeat1", "Repeat1",…
$ id2             <chr> "Fold1", "Fold2", "Fold3", "Fold4", "Fold5", "Fold1", …
$ inner_resamples <list> [<bootstraps[3 x 2]>], [<bootstraps[3 x 2]>], [<boots…
$ mtry            <dbl> 3, 5, 3, 3, 5, 3, 3, 5, 3, 5
toc <- Sys.time() # start clock
TRAIN.nested_cv.results$F1_SCORE <- map2_dbl(TRAIN.nested_cv.results$splits,
                                             TRAIN.nested_cv.results$mtry, 
                                             f1_mtry)#f1_tree)
tic <- Sys.time() # 

diff.inner_then_outer <- tic - toc

diff.inner - diff.inner_then_outer
Time difference of 154.9291 secs
TRAIN.nested_cv.results$F1_SCORE
 [1] 0.6666667 0.7761194 0.5531915 0.5660377 0.6545455 0.6296296 0.7457627
 [8] 0.6764706 0.6071429 0.6274510
summary(TRAIN.nested_cv.results$F1_SCORE)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.5532  0.6122  0.6421  0.6503  0.6740  0.7761 
TRAIN.nested_cv.results
# Nested resampling:
#  outer: 5-fold cross-validation repeated 2 times
#  inner: Bootstrap sampling
# A tibble: 10 × 6
   splits            id      id2   inner_resamples  mtry F1_SCORE
   <list>            <chr>   <chr> <list>          <dbl>    <dbl>
 1 <split [900/225]> Repeat1 Fold1 <boot [3 × 2]>      3    0.667
 2 <split [900/225]> Repeat1 Fold2 <boot [3 × 2]>      5    0.776
 3 <split [900/225]> Repeat1 Fold3 <boot [3 × 2]>      3    0.553
 4 <split [900/225]> Repeat1 Fold4 <boot [3 × 2]>      3    0.566
 5 <split [900/225]> Repeat1 Fold5 <boot [3 × 2]>      5    0.655
 6 <split [900/225]> Repeat2 Fold1 <boot [3 × 2]>      3    0.630
 7 <split [900/225]> Repeat2 Fold2 <boot [3 × 2]>      3    0.746
 8 <split [900/225]> Repeat2 Fold3 <boot [3 × 2]>      5    0.676
 9 <split [900/225]> Repeat2 Fold4 <boot [3 × 2]>      3    0.607
10 <split [900/225]> Repeat2 Fold5 <boot [3 × 2]>      5    0.627

\(~\)

\(~\)

\(~\)


\(~\)

\(~\)

\(~\)

27.10 Run Outer Nest

toc <- Sys.time() # start clock
  outer_only <- map(TRAIN.nested_cv$splits, tune_over_mtry) # run function
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
Warning in randomForest.default(m, y, ...): invalid mtry: reset to within valid
range
tic <- Sys.time() # time 
diff.outer <- tic-toc
diff.outer
Time difference of 1.011248 mins
diff.outer - diff.inner
Time difference of -1.659744 mins
outer_only
[[1]]
# A tibble: 12 × 2
    mtry F1_Score
   <dbl>    <dbl>
 1     1    0.423
 2     3    0.667
 3     5    0.667
 4     7    0.688
 5     9    0.667
 6    11    0.667
 7    13    0.688
 8    15    0.667
 9    17    0.667
10    19    0.667
11    21    0.667
12    23    0.688

[[2]]
# A tibble: 12 × 2
    mtry F1_Score
   <dbl>    <dbl>
 1     1    0.356
 2     3    0.788
 3     5    0.788
 4     7    0.758
 5     9    0.746
 6    11    0.746
 7    13    0.735
 8    15    0.735
 9    17    0.746
10    19    0.746
11    21    0.746
12    23    0.735

[[3]]
# A tibble: 12 × 2
    mtry F1_Score
   <dbl>    <dbl>
 1     1    0.278
 2     3    0.458
 3     5    0.490
 4     7    0.52 
 5     9    0.510
 6    11    0.510
 7    13    0.510
 8    15    0.52 
 9    17    0.490
10    19    0.52 
11    21    0.52 
12    23    0.510

[[4]]
# A tibble: 12 × 2
    mtry F1_Score
   <dbl>    <dbl>
 1     1    0.293
 2     3    0.593
 3     5    0.538
 4     7    0.538
 5     9    0.538
 6    11    0.510
 7    13    0.510
 8    15    0.538
 9    17    0.510
10    19    0.510
11    21    0.510
12    23    0.510

[[5]]
# A tibble: 12 × 2
    mtry F1_Score
   <dbl>    <dbl>
 1     1    0.596
 2     3    0.655
 3     5    0.655
 4     7    0.655
 5     9    0.643
 6    11    0.643
 7    13    0.655
 8    15    0.655
 9    17    0.655
10    19    0.618
11    21    0.655
12    23    0.655

[[6]]
# A tibble: 12 × 2
    mtry F1_Score
   <dbl>    <dbl>
 1     1    0.326
 2     3    0.604
 3     5    0.627
 4     7    0.56 
 5     9    0.588
 6    11    0.56 
 7    13    0.588
 8    15    0.588
 9    17    0.56 
10    19    0.588
11    21    0.577
12    23    0.588

[[7]]
# A tibble: 12 × 2
    mtry F1_Score
   <dbl>    <dbl>
 1     1    0.553
 2     3    0.746
 3     5    0.746
 4     7    0.746
 5     9    0.721
 6    11    0.733
 7    13    0.721
 8    15    0.733
 9    17    0.746
10    19    0.710
11    21    0.733
12    23    0.733

[[8]]
# A tibble: 12 × 2
    mtry F1_Score
   <dbl>    <dbl>
 1     1    0.5  
 2     3    0.657
 3     5    0.676
 4     7    0.667
 5     9    0.667
 6    11    0.676
 7    13    0.676
 8    15    0.667
 9    17    0.676
10    19    0.676
11    21    0.667
12    23    0.676

[[9]]
# A tibble: 12 × 2
    mtry F1_Score
   <dbl>    <dbl>
 1     1    0.391
 2     3    0.621
 3     5    0.561
 4     7    0.519
 5     9    0.536
 6    11    0.536
 7    13    0.571
 8    15    0.571
 9    17    0.545
10    19    0.536
11    21    0.561
12    23    0.509

[[10]]
# A tibble: 12 × 2
    mtry F1_Score
   <dbl>    <dbl>
 1     1    0.167
 2     3    0.6  
 3     5    0.627
 4     7    0.6  
 5     9    0.654
 6    11    0.627
 7    13    0.667
 8    15    0.627
 9    17    0.642
10    19    0.642
11    21    0.627
12    23    0.627
str(outer_only,1)
List of 10
 $ : tibble [12 × 2] (S3: tbl_df/tbl/data.frame)
 $ : tibble [12 × 2] (S3: tbl_df/tbl/data.frame)
 $ : tibble [12 × 2] (S3: tbl_df/tbl/data.frame)
 $ : tibble [12 × 2] (S3: tbl_df/tbl/data.frame)
 $ : tibble [12 × 2] (S3: tbl_df/tbl/data.frame)
 $ : tibble [12 × 2] (S3: tbl_df/tbl/data.frame)
 $ : tibble [12 × 2] (S3: tbl_df/tbl/data.frame)
 $ : tibble [12 × 2] (S3: tbl_df/tbl/data.frame)
 $ : tibble [12 × 2] (S3: tbl_df/tbl/data.frame)
 $ : tibble [12 × 2] (S3: tbl_df/tbl/data.frame)
outer_only_2 <- NULL

for (i in 1:length(outer_only)){
  outer_only_2[[i]] <- cbind( outer_only[[i]], label_nested_cv[i, c('id','id2')]  )  
  }

outer_only_3 <- outer_only_2 %>% bind_rows()

outer_summary <- outer_only_3 %>% 
  group_by(id,id2, mtry) %>% 
  summarize(outer_F1_Score = mean(F1_Score),
            n = length(F1_Score),
            .groups = 'keep') %>%
  ungroup()
            
outer_summary
# A tibble: 120 × 5
   id      id2    mtry outer_F1_Score     n
   <chr>   <chr> <dbl>          <dbl> <int>
 1 Repeat1 Fold1     1          0.423     1
 2 Repeat1 Fold1     3          0.667     1
 3 Repeat1 Fold1     5          0.667     1
 4 Repeat1 Fold1     7          0.688     1
 5 Repeat1 Fold1     9          0.667     1
 6 Repeat1 Fold1    11          0.667     1
 7 Repeat1 Fold1    13          0.688     1
 8 Repeat1 Fold1    15          0.667     1
 9 Repeat1 Fold1    17          0.667     1
10 Repeat1 Fold1    19          0.667     1
# ℹ 110 more rows

\(~\)

\(~\)

27.10.1 Outer Nest Summary Graphs

Some output summary graphs:

27.10.1.1 Facet - Fold

ggplot(outer_summary, 
       aes(x=mtry, y=outer_F1_Score, color = id , shape = id  ) ) + 
  geom_point(aes(size = outer_F1_Score)) +
  geom_line()  +
  facet_wrap(.~id2)

27.10.1.2 Unorganized

ggplot(outer_summary, 
       aes(x=mtry, y=outer_F1_Score, color = id , shape = id2  ) ) + 
  geom_point(aes(size = outer_F1_Score)) +
  geom_line()

27.10.1.3 Facet - Repeat

ggplot(outer_summary, 
       aes(x=mtry, y=outer_F1_Score, color = id2 , shape = id2  ) ) + 
  geom_point(aes(size = outer_F1_Score)) +
  geom_line()  +
  facet_wrap(.~id)

\(~\)

\(~\)

\(~\)


\(~\)

\(~\)

\(~\)

27.11 Inner V Outer

Pooled_Inner_L <- pooled_inner %>%
  select(id2, id, mtry, max_F1_Score, median_F1_Score, min_F1_Score) %>%
  gather(max_F1_Score, 
         median_F1_Score, 
         min_F1_Score, 
         key='Inner_F1_Scores', 
         value=Inner_F1_Score) %>%
  mutate(Inner_Type = ifelse( Inner_F1_Scores == 'max_F1_Score', "MAX",  
                      ifelse( Inner_F1_Scores == 'median_F1_Score', "MED",  
                      ifelse( Inner_F1_Scores == 'min_F1_Score', "MIN", "NA" ) )  ) ) 


Outer_Summary <- outer_summary %>% 
  select(id2, id, mtry, outer_F1_Score)

inner_v_outer <- Pooled_Inner_L %>% 
  left_join(Outer_Summary, by=c('id','id2','mtry')) %>%
  select(id2, id, Inner_Type, mtry, Inner_F1_Score, outer_F1_Score) %>%
  arrange(id2, id, Inner_Type, mtry, Inner_F1_Score, outer_F1_Score)

glimpse(inner_v_outer)
Rows: 360
Columns: 6
$ id2            <chr> "Fold1", "Fold1", "Fold1", "Fold1", "Fold1", "Fold1", "…
$ id             <chr> "Repeat1", "Repeat1", "Repeat1", "Repeat1", "Repeat1", …
$ Inner_Type     <chr> "MAX", "MAX", "MAX", "MAX", "MAX", "MAX", "MAX", "MAX",…
$ mtry           <dbl> 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 1, 3, 5, 7, …
$ Inner_F1_Score <dbl> 0.4927536, 0.6352941, 0.6301370, 0.6097561, 0.6097561, …
$ outer_F1_Score <dbl> 0.4230769, 0.6666667, 0.6666667, 0.6875000, 0.6666667, …
cnt_inner_v_outer_grps <- inner_v_outer %>% 
  group_by(id2,id, Inner_Type, mtry) %>% 
  tally() 
  
glimpse(cnt_inner_v_outer_grps)
Rows: 360
Columns: 5
Groups: id2, id, Inner_Type [30]
$ id2        <chr> "Fold1", "Fold1", "Fold1", "Fold1", "Fold1", "Fold1", "Fold…
$ id         <chr> "Repeat1", "Repeat1", "Repeat1", "Repeat1", "Repeat1", "Rep…
$ Inner_Type <chr> "MAX", "MAX", "MAX", "MAX", "MAX", "MAX", "MAX", "MAX", "MA…
$ mtry       <dbl> 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 1, 3, 5, 7, 9, 1…
$ n          <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…

\(~\)

\(~\)

27.11.1 Inner V Outer Summary Graphs

27.11.1.1 Facet - my_tree

inner_min_max_v_outer <- inner_v_outer %>% 
         mutate( mtry = as.character(mtry) ) %>%
         mutate( Fold_Sample = paste0(id2,'_',id)) %>%
         spread( key=Inner_Type, Inner_F1_Score)

ggplot(inner_min_max_v_outer , 
       aes(x = outer_F1_Score , y = MED, shape = id , color = id2 )) + 
  geom_point( ) + 
  geom_line( ) +
  geom_errorbar(aes(ymin=MIN, ymax=MAX)) +
  facet_wrap(.~ mtry ) 
`geom_line()`: Each group consists of only one observation.
ℹ Do you need to adjust the group aesthetic?
`geom_line()`: Each group consists of only one observation.
ℹ Do you need to adjust the group aesthetic?
`geom_line()`: Each group consists of only one observation.
ℹ Do you need to adjust the group aesthetic?
`geom_line()`: Each group consists of only one observation.
ℹ Do you need to adjust the group aesthetic?
`geom_line()`: Each group consists of only one observation.
ℹ Do you need to adjust the group aesthetic?
`geom_line()`: Each group consists of only one observation.
ℹ Do you need to adjust the group aesthetic?
`geom_line()`: Each group consists of only one observation.
ℹ Do you need to adjust the group aesthetic?
`geom_line()`: Each group consists of only one observation.
ℹ Do you need to adjust the group aesthetic?
`geom_line()`: Each group consists of only one observation.
ℹ Do you need to adjust the group aesthetic?
`geom_line()`: Each group consists of only one observation.
ℹ Do you need to adjust the group aesthetic?
`geom_line()`: Each group consists of only one observation.
ℹ Do you need to adjust the group aesthetic?
`geom_line()`: Each group consists of only one observation.
ℹ Do you need to adjust the group aesthetic?