r - Problem - Dynamically creating new variables from existing ones using mutate() - Stack Overflow

I have a dataset with multiple variables named 'trig.x', 'trig.y', and so on. Basic

I have a dataset with multiple variables named 'trig.x', 'trig.y', and so on. Basically, I have many pairs of variable w/ similar names, except the '.x' or '.y'. These variables came from a merge from two other datasets.

Problem: I want to create new variables (keep the old ones) named 'trig' (just without '.x' or '.y' at the end) using the following logic (example): I want the new variable to assume the '.y' value when it is not missing, and it is missing, the new variable should assume '.x'. The variable should be missing when both are missing.

Code:


> data_merged_ex
# A tibble: 5 × 8
  seqg    cyl piip.x piip.y dalo.x dalo.y tcct.x tcct.y
  <chr> <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 7894      0    1.1    1.1    2.7   NA      242     NA
2 1597      2   NA     NA     NA      2.7     NA    196
3 2586      1    0.7    0.7    2.2    4.3    161     NA
4 4679      0    0.6   NA      1.4    2.2    239     NA
5 8371      0    0.5    1.3    1.6   NA      206     NA


data_merged_ex_t <- data_merged_ex %>%

  mutate(across(
    .cols = ends_with(".x"),
    .fns = ~ case_when(
      !is.na(.data[[cur_column()]]) & !is.na(.data[[str_replace(cur_column(), "\\.x$", ".y")]]) ~ .data[[str_replace(cur_column(), "\\.x$", ".y")]],
      is.na(.data[[cur_column()]]) & !is.na(.data[[str_replace(cur_column(), "\\.x$", ".y")]]) ~ .data[[str_replace(cur_column(), "\\.x$", ".y")]],
      !is.na(.data[[cur_column()]]) & is.na(.data[[str_replace(cur_column(), "\\.x$", ".y")]]) ~ .data[[cur_column()]],
      TRUE ~ NA_real_
    ),
    .names = "{str_remove(.col, '\\.x$')}"
  ))

Error in `cur_column()`:
! Must only be used inside `across()`.
Run `rlang::last_trace()` to see where the error occurred.
> rlang::last_trace()
<error/rlang_error>
Error in `cur_column()`:
! Must only be used inside `across()`.
---
Backtrace:
    ▆
 1. ├─mc_merged_sy %>% ...
 2. ├─dplyr::mutate(...)
 3. ├─dplyr:::mutate.data.frame(...)
 4. │ ├─dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
 5. │ │ └─base::force(dots)
 6. │ └─dplyr:::dplyr_quosures(...)
 7. │   └─rlang::quos(..., .ignore_empty = "all")
 8. └─dplyr::cur_column()
Run rlang::last_trace(drop = FALSE) to see 4 hidden frames.
> rlang::last_trace(drop = FALSE)
<error/rlang_error>
Error in `cur_column()`:
! Must only be used inside `across()`.
---
Backtrace:
     ▆
  1. ├─mc_merged_sy %>% ...
  2. ├─dplyr::mutate(...)
  3. ├─dplyr:::mutate.data.frame(...)
  4. │ ├─dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
  5. │ │ └─base::force(dots)
  6. │ └─dplyr:::dplyr_quosures(...)
  7. │   └─rlang::quos(..., .ignore_empty = "all")
  8. └─dplyr::cur_column()
  9.   └─dplyr:::peek_column()
 10.     └─dplyr:::context_peek("column", "`across()`", call)
 11.       ├─context_peek_bare(name) %||% ...
 12.       └─rlang::abort(glue("Must only be used inside {location}."), call = call)

I have a dataset with multiple variables named 'trig.x', 'trig.y', and so on. Basically, I have many pairs of variable w/ similar names, except the '.x' or '.y'. These variables came from a merge from two other datasets.

Problem: I want to create new variables (keep the old ones) named 'trig' (just without '.x' or '.y' at the end) using the following logic (example): I want the new variable to assume the '.y' value when it is not missing, and it is missing, the new variable should assume '.x'. The variable should be missing when both are missing.

Code:


> data_merged_ex
# A tibble: 5 × 8
  seqg    cyl piip.x piip.y dalo.x dalo.y tcct.x tcct.y
  <chr> <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 7894      0    1.1    1.1    2.7   NA      242     NA
2 1597      2   NA     NA     NA      2.7     NA    196
3 2586      1    0.7    0.7    2.2    4.3    161     NA
4 4679      0    0.6   NA      1.4    2.2    239     NA
5 8371      0    0.5    1.3    1.6   NA      206     NA


data_merged_ex_t <- data_merged_ex %>%

  mutate(across(
    .cols = ends_with(".x"),
    .fns = ~ case_when(
      !is.na(.data[[cur_column()]]) & !is.na(.data[[str_replace(cur_column(), "\\.x$", ".y")]]) ~ .data[[str_replace(cur_column(), "\\.x$", ".y")]],
      is.na(.data[[cur_column()]]) & !is.na(.data[[str_replace(cur_column(), "\\.x$", ".y")]]) ~ .data[[str_replace(cur_column(), "\\.x$", ".y")]],
      !is.na(.data[[cur_column()]]) & is.na(.data[[str_replace(cur_column(), "\\.x$", ".y")]]) ~ .data[[cur_column()]],
      TRUE ~ NA_real_
    ),
    .names = "{str_remove(.col, '\\.x$')}"
  ))

Error in `cur_column()`:
! Must only be used inside `across()`.
Run `rlang::last_trace()` to see where the error occurred.
> rlang::last_trace()
<error/rlang_error>
Error in `cur_column()`:
! Must only be used inside `across()`.
---
Backtrace:
    ▆
 1. ├─mc_merged_sy %>% ...
 2. ├─dplyr::mutate(...)
 3. ├─dplyr:::mutate.data.frame(...)
 4. │ ├─dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
 5. │ │ └─base::force(dots)
 6. │ └─dplyr:::dplyr_quosures(...)
 7. │   └─rlang::quos(..., .ignore_empty = "all")
 8. └─dplyr::cur_column()
Run rlang::last_trace(drop = FALSE) to see 4 hidden frames.
> rlang::last_trace(drop = FALSE)
<error/rlang_error>
Error in `cur_column()`:
! Must only be used inside `across()`.
---
Backtrace:
     ▆
  1. ├─mc_merged_sy %>% ...
  2. ├─dplyr::mutate(...)
  3. ├─dplyr:::mutate.data.frame(...)
  4. │ ├─dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
  5. │ │ └─base::force(dots)
  6. │ └─dplyr:::dplyr_quosures(...)
  7. │   └─rlang::quos(..., .ignore_empty = "all")
  8. └─dplyr::cur_column()
  9.   └─dplyr:::peek_column()
 10.     └─dplyr:::context_peek("column", "`across()`", call)
 11.       ├─context_peek_bare(name) %||% ...
 12.       └─rlang::abort(glue("Must only be used inside {location}."), call = call)
Share Improve this question edited Feb 5 at 1:59 jpsmith 18k6 gold badges23 silver badges45 bronze badges asked Feb 3 at 6:05 Ronaldo FilhoRonaldo Filho 311 silver badge1 bronze badge
Add a comment  | 

3 Answers 3

Reset to default 2

It is not clear what "I want the new variable to assume the '.y' value when it is not missing, and it is missing" means but we will assume that the new variable should equal the .x variable if it is not NA and the .y variable otherwise. Swap them in the code below if you want the reverse.

1) map2_dfc We can use map2_dfc with coalesce and then fix up the names.

Note that the tools package comes with R so it does not have to be installed.

library(dplyr)
library(purrr)
library(tools) # file_path_sans_ext

dat2 <- dat %>%
  mutate(map2_dfc(pick(ends_with(".x")), pick(ends_with(".y")), coalesce) %>% 
    rename_with(file_path_sans_ext))
dat2

##   seqg cyl piip.x piip.y dalo.x dalo.y tcct.x tcct.y piip dalo tcct
## 1 7894   0    1.1    1.1    2.7     NA    242     NA  1.1  2.7  242
## 2 1597   2     NA     NA     NA    2.7     NA    196   NA  2.7  196
## 3 2586   1    0.7    0.7    2.2    4.3    161     NA  0.7  2.2  161
## 4 4679   0    0.6     NA    1.4    2.2    239     NA  0.6  1.4  239
## 5 8371   0    0.5    1.3    1.6     NA    206     NA  0.5  1.6  206

Optionally sort the computed columns to come after the corresponding .x and .y columns. This computes the column names without any suffix giving root. Then for each column match finds the first column with the same root. Using order on that gives the required order.

dat2 %>% {
  root <- file_path_sans_ext(names(.))
  select(., order(match(root, root)))
}

##   seqg cyl piip.x piip.y piip dalo.x dalo.y dalo tcct.x tcct.y tcct
## 1 7894   0    1.1    1.1  1.1    2.7     NA  2.7    242     NA  242
## 2 1597   2     NA     NA   NA     NA    2.7  2.7     NA    196  196
## 3 2586   1    0.7    0.7  0.7    2.2    4.3  2.2    161     NA  161
## 4 4679   0    0.6     NA  0.6    1.4    2.2  1.4    239     NA  239
## 5 8371   0    0.5    1.3  0.5    1.6     NA  1.6    206     NA  206

2) across2 The dplyover padckage on github (not on CRAN) has an across2 which is like across but operates on two arguments.

library(dplyr)
# librray(removes); install_github("TimTeaFan/dplyover")
library(dplyrover)

dat %>%
  mutate(across2(ends_with(".x"), ends_with(".y"), coalesce, .names = "{pre}")

##   seqg cyl piip.x piip.y dalo.x dalo.y tcct.x tcct.y piip dalo tcct
## 1 7894   0    1.1    1.1    2.7     NA    242     NA  1.1  2.7  242
## 2 1597   2     NA     NA     NA    2.7     NA    196   NA  2.7  196
## 3 2586   1    0.7    0.7    2.2    4.3    161     NA  0.7  2.2  161
## 4 4679   0    0.6     NA    1.4    2.2    239     NA  0.6  1.4  239
## 5 8371   0    0.5    1.3    1.6     NA    206     NA  0.5  1.6  206

3) Base R Calculate root as above and then for each unique root for which there are two columns in dat compute the result and for other columns return NULL. Remove the NULL values using Filter and convert the resulting list to a data frame. As noted previously tools comes built into R. Exchange 1 and 2 in f if you want the reverse condition.

library(tools)

root <- file_path_sans_ext(names(dat))

f <- function(x, d = dat[root == x]) {
  if (ncol(d) == 2) ifelse(is.na(d[, 1]), d[, 2], d[, 1])
}

data.frame(dat, Filter(length, Map(f, unique(root))))

##   seqg cyl piip.x piip.y dalo.x dalo.y tcct.x tcct.y piip dalo tcct
## 1 7894   0    1.1    1.1    2.7     NA    242     NA  1.1  2.7  242
## 2 1597   2     NA     NA     NA    2.7     NA    196   NA  2.7  196
## 3 2586   1    0.7    0.7    2.2    4.3    161     NA  0.7  2.2  161
## 4 4679   0    0.6     NA    1.4    2.2    239     NA  0.6  1.4  239
## 5 8371   0    0.5    1.3    1.6     NA    206     NA  0.5  1.6  206

Note

The input data in a reproducible form:

dat <- data.frame(
  seqg = c(7894L, 1597L, 2586L, 4679L, 8371L),
  cyl = c(0L, 2L, 1L, 0L, 0L),
  piip.x = c(1.1, NA, 0.7, 0.6, 0.5),
  piip.y = c(1.1, NA, 0.7, NA, 1.3),
  dalo.x = c(2.7, NA, 2.2, 1.4, 1.6),
  dalo.y = c(NA, 2.7, 4.3, 2.2, NA),
  tcct.x = c(242L, NA, 161L, 239L, 206L),
  tcct.y = c(NA, 196L, NA, NA, NA)
)

You could pivot to long, then perform the logic, then pivot back:

library(dplyr)
library(tidyr)

pivot_longer(data_merged_ex, -c(seqg, cyl),
             names_to=c("name", ".value"),
             names_sep="\\.") %>%
  mutate(z=case_when(!is.na(y)~y,
                     !is.na(x)~x,
                     .default=NA), .by=c(seqg, name)) %>%
  pivot_wider(id_cols=c(seqg,cyl),
              values_from=c(x,y,z),
              names_vary = "slowest",
              names_glue = "{name}.{.value}") 

# A tibble: 5 × 11
   seqg   cyl piip.x piip.y piip.z dalo.x dalo.y dalo.z tcct.x tcct.y tcct.z
  <dbl> <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1  1597     2   NA     NA     NA     NA      2.7    2.7     NA    196    196
2  2586     1    0.7    0.7    0.7    2.2    4.3    4.3    161     NA    161
3  4679     0    0.6   NA      0.6    1.4    2.2    2.2    239     NA    239
4  8371     0    0.5    1.3    1.3    1.6   NA      1.6    206     NA    206
5  7894     0    1.1    1.1    1.1    2.7   NA      2.7    242     NA    242

Add

%>% select(seqg, cyl, ends_with(".z"))

if you don't want the original columns.

You could use base::get() there to access other columns:

library(dplyr, warn.conflicts = FALSE)
library(stringr)

data_merged_ex <- 
read.table(header = TRUE, text = "
seqg    cyl piip.x piip.y dalo.x dalo.y tcct.x tcct.y
1 7894      0    1.1    1.1    2.7   NA      242     NA
2 1597      2   NA     NA     NA      2.7     NA    196
3 2586      1    0.7    0.7    2.2    4.3    161     NA
4 4679      0    0.6   NA      1.4    2.2    239     NA
5 8371      0    0.5    1.3    1.6   NA      206     NA") |> 
as_tibble()

x2y <- \(x) str_replace(x, "\\.x$", ".y")

data_merged_ex |> 
  mutate(
    across(ends_with(".x"), 
           \(x) coalesce(get(x2y(cur_column())), x), 
           .names = '{str_sub(.col, end = -3)}')
  )
#> # A tibble: 5 × 11
#>    seqg   cyl piip.x piip.y dalo.x dalo.y tcct.x tcct.y  piip  dalo  tcct
#>   <int> <int>  <dbl>  <dbl>  <dbl>  <dbl>  <int>  <int> <dbl> <dbl> <int>
#> 1  7894     0    1.1    1.1    2.7   NA      242     NA   1.1   2.7   242
#> 2  1597     2   NA     NA     NA      2.7     NA    196  NA     2.7   196
#> 3  2586     1    0.7    0.7    2.2    4.3    161     NA   0.7   4.3   161
#> 4  4679     0    0.6   NA      1.4    2.2    239     NA   0.6   2.2   239
#> 5  8371     0    0.5    1.3    1.6   NA      206     NA   1.3   1.6   206

Depending on your dataset(s) and next planned steps, you might want to consider dplyr::rows_update() as an alternative to joins.

发布者:admin,转转请注明出处:http://www.yc00.com/questions/1745247890a4618498.html

相关推荐

发表回复

评论列表(0条)

  • 暂无评论

联系我们

400-800-8888

在线咨询: QQ交谈

邮件:admin@example.com

工作时间:周一至周五,9:30-18:30,节假日休息

关注微信