r - Programmatic way to apply list of functions to list of columns in dplyr summarize - Stack Overflow

I have about 40 columns that I want to summarize by group using dplyr::summarize(). But each column is

I have about 40 columns that I want to summarize by group using dplyr::summarize(). But each column is going to be summarized with a different function: some with mean(), some with sum(), some with min(), and some with max(). Only one function will be applied to each column. I have a lookup table to match each column name with the appropriate function. What is the best way to do this with dplyr::summarize() without having to write out each statement individually?

Here is an example lookup table using mtcars. The code below it does not give the desired result. It applies all ten summary functions to all ten columns for a total of 100 summaries, where I only want ten summaries, one for each column.

library(dplyr)

lookup_table <- tibble(
  variable = c('mpg', 'disp', 'hp', 'drat', 'wt', 'qsec', 'vs', 'am', 'gear', 'carb'),
  summ_fn = list(mean, max, min, sum, mean, mean, median, mean, sum, max)
)

mtcars %>%
  group_by(cyl) %>%
  summarize(across(all_of(lookup_table$variable), .fns = lookup_table$summ_fn))

I have about 40 columns that I want to summarize by group using dplyr::summarize(). But each column is going to be summarized with a different function: some with mean(), some with sum(), some with min(), and some with max(). Only one function will be applied to each column. I have a lookup table to match each column name with the appropriate function. What is the best way to do this with dplyr::summarize() without having to write out each statement individually?

Here is an example lookup table using mtcars. The code below it does not give the desired result. It applies all ten summary functions to all ten columns for a total of 100 summaries, where I only want ten summaries, one for each column.

library(dplyr)

lookup_table <- tibble(
  variable = c('mpg', 'disp', 'hp', 'drat', 'wt', 'qsec', 'vs', 'am', 'gear', 'carb'),
  summ_fn = list(mean, max, min, sum, mean, mean, median, mean, sum, max)
)

mtcars %>%
  group_by(cyl) %>%
  summarize(across(all_of(lookup_table$variable), .fns = lookup_table$summ_fn))
Share Improve this question edited Mar 24 at 7:05 Darren Tsai 36.3k5 gold badges25 silver badges57 bronze badges asked Mar 24 at 0:11 qdreadqdread 4,0652 gold badges26 silver badges43 bronze badges
Add a comment  | 

5 Answers 5

Reset to default 7

Instead of a tibble, initiate your lookup table to a named list where the names are the columns (i.e, lookup_table <- list("mpg" = mean, "disp" = max,...). Then you can use dplyr::curr_column to index the function to the current column:

(Note that since I was a bit lazy and you (thankfully) provided the lookup table, I just used setNames to assign the names.)

lookup_table <- setNames(list(mean, max, min, sum, mean, mean, median, mean, sum, max),
                         c('mpg', 'disp', 'hp', 'drat', 'wt', 'qsec', 'vs', 'am', 'gear', 'carb'))

mtcars %>%
  summarize(across(names(lookup_table), 
                   ~ lookup_table[[cur_column()]](.)),
            .by = cyl)

As @G. Grothendieck points out in the comments, if you already have the lookup table in tibble form, you can create a little helper function and achieve the same goal:

lookup <- function(x) {
  with(lookup_table, summ_fn[[match(x, variable)]]) 
  }

mtcars %>% 
  summarize(across(all_of(lookup_table$variable), 
                   ~ lookup(cur_column())(.)), 
            .by = cyl)

Output (for both approaches):

#   cyl      mpg  disp  hp  drat       wt     qsec vs        am gear carb
# 1   6 19.74286 258.0 105 25.10 3.117143 17.97714  1 0.4285714   27    6
# 2   4 26.66364 146.7  52 44.78 2.285727 19.13727  1 0.7272727   45    2
# 3   8 15.10000 472.0 150 45.21 3.999214 16.77214  0 0.1428571   46    8

You can create a list of defused expressions, and inject it into summarize() using the injection operator !!!.

lookup_table <- setNames(list('mean', 'max', 'min', 'sum', 'mean', 'mean', 'median', 'mean', 'sum', 'max'),
                         c('mpg', 'disp', 'hp', 'drat', 'wt', 'qsec', 'vs', 'am', 'gear', 'carb'))

exprs <- Map(\(x, y) call(x, as.name(y)), lookup_table, names(lookup_table))

# $mpg
# mean(mpg)
# 
# $disp
# max(disp)
# .
# .
# .

mtcars %>%
  summarize(!!!exprs, .by = cyl)

#   cyl      mpg  disp  hp  drat       wt     qsec vs        am gear carb
# 1   6 19.74286 258.0 105 25.10 3.117143 17.97714  1 0.4285714   27    6
# 2   4 26.66364 146.7  52 44.78 2.285727 19.13727  1 0.7272727   45    2
# 3   8 15.10000 472.0 150 45.21 3.999214 16.77214  0 0.1428571   46    8

Without dplyr, using mapply directly.

> mapply(\(f, x) tapply(mtcars[[x]], mtcars$cyl, f), lookup_table, names(lookup_table))
       mpg  disp  hp  drat       wt     qsec vs        am gear carb
4 26.66364 146.7  52 44.78 2.285727 19.13727  1 0.7272727   45    2
6 19.74286 258.0 105 25.10 3.117143 17.97714  1 0.4285714   27    6
8 15.10000 472.0 150 45.21 3.999214 16.77214  0 0.1428571   46    8

Here's a variation that adds to your original lookup_table:

library(dplyr)
library(rlang)
library(purrr)

summ_by <- function(fun, var, by) {
  exec(fun, filter(mtcars, cyl==x) %>% pull(var))
}

Reduce(inner_join, 
       map(unique(mtcars[,'cyl']), \(by) {
         lookup_table %>%
           rowwise() %>%
           mutate("cyl_{{by}}":=summ_by(summ_fn, variable, by))})
)

Giving

# A tibble: 10 × 5
# Rowwise: 
   variable summ_fn   cyl_6   cyl_4   cyl_8
   <chr>    <list>    <dbl>   <dbl>   <dbl>
 1 mpg      <fn>     19.7    26.7    15.1  
 2 disp     <fn>    258     147.    472    
 3 hp       <fn>    105      52     150    
 4 drat     <fn>     25.1    44.8    45.2  
 5 wt       <fn>      3.12    2.29    4.00 
 6 qsec     <fn>     18.0    19.1    16.8  
 7 vs       <fn>      1       1       0    
 8 am       <fn>      0.429   0.727   0.143
 9 gear     <fn>     27      45      46    
10 carb     <fn>      6       2       8

To pass a different variable to group by (instead of "cyl"):

summarize_by <- function(data, by, lookup) {
  library(purrr)
  
  summ_by <- function(f, var, by, x) 
    exec(f, filter(data, !!sym(by)==x) %>% pull(var))

  by <- deparse(substitute(by))
  
  Reduce(inner_join, 
         map(unique(data[,by]), \(x) {
           varname <- paste0(by, "_", x)
           lookup %>%
             rowwise() %>%
             mutate(!!varname:=summ_by(summ_fn, variable, by, x))}))
  }

Call this function:
summarize_by(mtcars, cyl, lookup_table)
# As above

summarize_by(mtcars, am, lookup_table)
# A tibble: 10 × 4
# Rowwise: 
   variable summ_fn   am_1   am_0
   <chr>    <list>   <dbl>  <dbl>
 1 mpg      <fn>     24.4   17.1 
 2 disp     <fn>    351    472   
 3 hp       <fn>     52     62   
 4 drat     <fn>     52.6   62.4 
 5 wt       <fn>      2.41   3.77
 6 qsec     <fn>     17.4   18.2 
 7 vs       <fn>      1      0   
 8 am       <fn>      1      0   
 9 gear     <fn>     57     61   
10 carb     <fn>      8      4

And without any group_by, we can just do:

lookup_table %>%
  rowwise() %>%
  mutate(result=exec(summ_fn, mtcars[,variable]))

# A tibble: 10 × 3
# Rowwise: 
   variable summ_fn  result
   <chr>    <list>    <dbl>
 1 mpg      <fn>     20.1  
 2 disp     <fn>    472    
 3 hp       <fn>     52    
 4 drat     <fn>    115.   
 5 wt       <fn>      3.22 
 6 qsec     <fn>     17.8  
 7 vs       <fn>      0    
 8 am       <fn>      0.406
 9 gear     <fn>    118    
10 carb     <fn>      8

Here is another base R option, using by + Map

with(
  lookup_table,
    do.call(
      rbind,
      by(
        mtcars[variable],
        mtcars$cyl,
        \(x) as.data.frame(
          Map(\(f, y) f(y), summ_fn, x),
          col.names = variable
      )
    ),
  )
)

which gives

       mpg  disp  hp  drat       wt     qsec vs        am gear carb
4 26.66364 146.7  52 44.78 2.285727 19.13727  1 0.7272727   45    2
6 19.74286 258.0 105 25.10 3.117143 17.97714  1 0.4285714   27    6
8 15.10000 472.0 150 45.21 3.999214 16.77214  0 0.1428571   46    8

发布者:admin,转转请注明出处:http://www.yc00.com/questions/1744265282a4565837.html

相关推荐

发表回复

评论列表(0条)

  • 暂无评论

联系我们

400-800-8888

在线咨询: QQ交谈

邮件:admin@example.com

工作时间:周一至周五,9:30-18:30,节假日休息

关注微信