15 Difference between pipes in R

15.1 Performance comparisons

Code

library('dplyr')


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

Code

library('microbenchmark')

15.1.1 Define a function that takes two arguments

Code

add_numbers <- function(a, b) {
  return(a + b)
}

15.1.2 Compare the performance

Code

rand_number <- runif(1, -100, 100)

Code

benchmark_results <- microbenchmark(
  "%>%" = {rand_number %>% add_numbers(rand_number)},
  "|>" = {rand_number |> add_numbers(rand_number)},
  "no pipe" = {add_numbers(rand_number, rand_number)},
  times = 10000
)

Code

benchmark_results |> str()

Classes 'microbenchmark' and 'data.frame':  30000 obs. of  2 variables:
 $ expr: Factor w/ 3 levels "%>%","|>","no pipe": 3 3 1 2 2 2 3 2 2 3 ...
 $ time: num  26000 1297200 37200 900 500 ...

Code

print(benchmark_results)

Unit: nanoseconds
    expr  min   lq    mean median   uq     max neval
     %>% 1600 1700 1840.99   1800 1900   37200 10000
      |>  300  400  421.57    400  400    1600 10000
 no pipe  300  400  554.56    400  400 1297200 10000

Code

# Keep the raw timings (one row per evaluation) for the row-wise comparison below
M <- benchmark_results

Code

base_pipe <- M %>%
  filter(expr == "|>") 

magrittr_pipe <- M %>%
  filter(expr == "%>%") 

no_pipe <- M %>%
  filter(expr == "no pipe") 

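# Overhead of each pipe relative to the direct call, in nanoseconds.
# Rows are paired by position, so each difference compares arbitrary evaluations.
diff_no_magrittr <- magrittr_pipe$time - no_pipe$time
diff_no_base <- base_pipe$time - no_pipe$time

# Difference-in-differences, as a percentage of the direct call's run time
percent_diff_in_diff <- (diff_no_magrittr - diff_no_base)  / no_pipe$time * 100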

summary(percent_diff_in_diff)

     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
   0.1311  325.0000  350.0000  343.2731  375.0000 2260.0000
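
As a rough cross-check, the same difference-in-differences can be computed from the summary medians printed earlier rather than row by row. This is only a sketch: the medians (1800, 400, and 400 ns) are copied from the benchmark table, and the names `med`, `overhead_magrittr`, `overhead_base`, and `pct` are illustrative and not part of the original analysis.

```r
# Median run times in nanoseconds, copied from the benchmark summary
med <- c("%>%" = 1800, "|>" = 400, "no pipe" = 400)

# Overhead of each pipe relative to the direct call
overhead_magrittr <- med[["%>%"]] - med[["no pipe"]]
overhead_base     <- med[["|>"]] - med[["no pipe"]]

# Difference-in-differences as a percentage of the direct call
pct <- (overhead_magrittr - overhead_base) / med[["no pipe"]] * 100
pct
```

With these medians the result is 350, which agrees with the median of the row-wise calculation.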

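Note that the raw timings are in nanoseconds, whereas the difference-in-differences above is expressed as a percentage of the no-pipe run time. In this benchmark %>% costs about 1,400 ns more per call than |> (1,800 ns versus 400 ns at the median), which is large in relative terms (about 350% of the bare call) but small in absolute terms. Because the overhead is paid on every pipe step, it can add up in long chains of pipes or in code that pipes inside a tight loop.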
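
One likely reason the native pipe times the same as the direct call is that |> is rewritten by the parser into an ordinary function call, whereas %>% is itself a function that has to be evaluated at run time. The sketch below inspects the parsed expressions without evaluating them; it assumes R 4.1.0 or later, and the names `native_call` and `magrittr_call` are illustrative.

```r
# Requires R >= 4.1.0 for the native pipe
add_numbers <- function(a, b) a + b

# The parser turns the native pipe into a plain call to add_numbers()
native_call <- quote(x |> add_numbers(x))
print(native_call)

# The magrittr pipe stays in the expression as a call to the function `%>%`
magrittr_call <- quote(x %>% add_numbers(x))
print(magrittr_call[[1]])

# The native-pipe expression is the same object as the hand-written call
identical(native_call, quote(add_numbers(x, x)))
```

Nothing here is evaluated, so the comparison says nothing about timing by itself; it only shows what each operator leaves for the interpreter to run.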

15.2 Usage differences

Code

filter_row_number <- function(df, num){
  filter(df, row_number() == num) 
}

Both pipe operators pass the object on the left as the first argument of the function call on the right.

Code

mtcars |> filter_row_number(1)

          mpg cyl disp  hp drat   wt  qsec vs am gear carb
Mazda RX4  21   6  160 110  3.9 2.62 16.46  0  1    4    4

Code

mtcars %>% filter_row_number(1)

          mpg cyl disp  hp drat   wt  qsec vs am gear carb
Mazda RX4  21   6  160 110  3.9 2.62 16.46  0  1    4    4

The %>% operator lets you place the left-hand object elsewhere in the call with the . placeholder:

Code

1 %>% filter_row_number(mtcars, .)

          mpg cyl disp  hp drat   wt  qsec vs am gear carb
Mazda RX4  21   6  160 110  3.9 2.62 16.46  0  1    4    4

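However, the |> operator does not treat . as a placeholder. It still inserts the left-hand side as the first argument, so the call below receives one argument too many: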

Code

# this will not work
1 |> filter_row_number(mtcars , .)

Error in filter_row_number(1, mtcars, .): unused argument (.)

The |> operator has its own placeholder, _ (available from R 4.2.0), but it must be passed as a named argument:

Code

1 |> filter_row_number(mtcars, num = _)

          mpg cyl disp  hp drat   wt  qsec vs am gear carb
Mazda RX4  21   6  160 110  3.9 2.62 16.46  0  1    4    4
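
The named placeholder is most useful with functions whose data argument is not first. As a minimal sketch, assuming R 4.2.0 or later, the built-in `lm()` takes the formula first and the data through `data =`; the model and the name `fit` are only illustrative.

```r
# Requires R >= 4.2.0 for the `_` placeholder
# lm() expects the formula first, so the data frame is routed to `data`
fit <- mtcars |> lm(mpg ~ wt, data = _)

# Same coefficients as calling lm(mpg ~ wt, data = mtcars) directly
coef(fit)
```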

Another important difference is that the %>% operator accepts a braced expression on the right-hand side, inside which . refers to the object on the left:

Code

mtcars %>% {subset(.$mpg, .$cyl == 6)}

[1] 21.0 21.0 21.4 18.1 19.2 17.8 19.7

The |> operator, by contrast, requires a function call on its right-hand side and rejects a braced expression:

Code

mtcars |>  {subset(_$mpg, _$cyl == 6)}

Error: function '{' not supported in RHS call of a pipe (<text>:1:12)
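
A common workaround with the native pipe is to wrap the expression in an anonymous function and call it immediately. The sketch below assumes R 4.1.0 or later for the `\(x)` shorthand; the argument name `d` and the variable `six_cyl_mpg` are illustrative.

```r
# Requires R >= 4.1.0 for `|>` and the `\(x)` lambda shorthand
# The anonymous function is wrapped in parentheses and then called with ()
six_cyl_mpg <- mtcars |> (\(d) subset(d$mpg, d$cyl == 6))()
six_cyl_mpg
```

This returns the same seven values as the braced %>% version, at the cost of naming the piped object explicitly.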