r - Median imputation to a list by mutate() in dplyr - Stack Overflow

I want to replace missing values with the column median in each data frame stored in a list. I can do this when I type the column name directly. But how can I do it when the column has to be selected at random, as in a simulation study?

For example:

mylist <- list(structure(list(V1 = c(3L, 16L, 8L, 2L, 17L, 6L, 10L, 15L, 
7L, 11L), V2 = c(9L, NA, 14L, 18L, NA, 20L, 15L, 17L, 3L, NA), 
    V3 = c(4L, 1L, 10L, 9L, 7L, 13L, 16L, 8L, 17L, 18L)), row.names = c(NA, 
-10L), class = "data.frame"), structure(list(V1 = c(6L, 12L, 
14L, 10L, 5L, 20L, 26L, 2L, 23L, 1L), V2 = c(6L, 15L, NA, 30L, 
NA, 14L, 2L, 11L, NA, 3L), V3 = c(18L, 12L, 3L, 2L, 8L, 23L, 
13L, 16L, 17L, 7L)), row.names = c(NA, -10L), class = "data.frame"), 
    structure(list(V1 = c(18L, 26L, 9L, 28L, 8L, 4L, 29L, 24L, 
    37L, 3L), V2 = c(NA, 36L, 13L, 19L, NA, 31L, 20L, 7L, NA, 
    16L), V3 = c(NA, 25L, NA, NA, NA, 21L, 17L, 4L, 32L, 6L)), row.names = c(NA, 
    -10L), class = "data.frame"))

library(dplyr)  # mutate(), %>%
library(tidyr)  # replace_na()

newlist <- list()
for (k in 1:3) {
  newlist[[k]] <- mylist[[k]] %>%
    mutate(V2 = replace_na(V2, median(V2, na.rm = TRUE)))
}

newlist

This works for the column named V2 (as shown above).

ch_column <- sample(1:3, 1)
ch_column

How can I do this when the column is selected with sample()? I need to replace V2 in the code above with the column that ch_column points to.

asked Mar 27 at 8:08 by MetehanGungor, edited Mar 27 at 9:20 by Darren Tsai

Comments:
  • Is it the same column for all 3 frames or 3 separate choices? – margusl, Mar 27 at 9:02
  • You should not impute with constants such as the median. – jay.sf, Mar 27 at 10:17

1 Answer

You can build the column name as a character string and inject it on the left-hand side of := with !!. On the right-hand side, .data[[var]] looks the column up by that name.

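# Impute the NAs in column V<col> of df with that column's median.
imp_fun <- function(df, col) {
  var <- paste0('V', col)  # column name as a string, e.g. "V2"
  df %>%
    # !!var := sets the output column name from the string
    mutate(!!var := replace_na(.data[[var]], median(.data[[var]], na.rm = TRUE)))
}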

newlist <- lapply(mylist, imp_fun, col = ch_column)
ch_column
# [1] 2

newlist
# [[1]]
#    V1 V2 V3
# 1   3  9  4
# 2  16 15  1
# 3   8 14 10
# 4   2 18  9
# 5  17 15  7
# 6   6 20 13
# 7  10 15 16
# 8  15 17  8
# 9   7  3 17
# 10 11 15 18
# 
# [[2]]
# ...
# 
# [[3]]
# ...

If you are not familiar with how lapply works, the code above is equivalent to the following for loop.

newlist <- list()
ch_column <- sample(1:3, 1)
var <- paste0('V', ch_column)
for (k in 1:3) {
  newlist[[k]] <- mylist[[k]] %>%
    mutate(!!var := replace_na(.data[[var]], median(.data[[var]], na.rm = TRUE)))
}
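
One caveat, stated as an assumption about recent tidyr versions (1.2.0 and later, as far as I know): replace_na() casts the replacement to the type of the column, so a fractional median cannot be written into an integer column. The columns in mylist are integer, and V3 in the second data frame, for example, has a median of 12.5. A minimal sketch of one way around this is to convert the column to double before imputing. The names impute_median, impute_col and toy_list are hypothetical and not part of the original answer; the toy data is made up for illustration.

```r
library(dplyr)

# Hypothetical helper: convert to double first, then fill NAs with the
# median of the observed values, so a fractional median fits the column.
impute_median <- function(x) {
  x <- as.numeric(x)
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Same injection pattern as above, with the helper on the right-hand side.
impute_col <- function(df, col) {
  var <- paste0("V", col)
  df %>% mutate(!!var := impute_median(.data[[var]]))
}

# Toy integer data: the observed V2 values in the first frame are 1 and 4,
# so their median (2.5) is not an integer.
toy_list <- list(
  data.frame(V1 = 1:4, V2 = c(1L, NA, 4L, NA)),
  data.frame(V1 = 5:8, V2 = c(NA, 10L, 20L, 30L))
)

toy_imputed <- lapply(toy_list, impute_col, col = 2)
toy_imputed[[1]]$V2
# [1] 1.0 2.5 4.0 2.5
toy_imputed[[2]]$V2
# [1] 20 10 20 30
```

The imputed column comes back as double even when the median happens to be a whole number, which keeps the column type the same across all data frames in the list.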
