I have a dataset with multiple variables named 'trig.x', 'trig.y', and so on. Basically, I have many pairs of variable w/ similar names, except the '.x' or '.y'. These variables came from a merge from two other datasets.
Problem: I want to create new variables (keep the old ones) named 'trig' (just without '.x' or '.y' at the end) using the following logic (example): I want the new variable to assume the '.y' value when it is not missing, and it is missing, the new variable should assume '.x'. The variable should be missing when both are missing.
Code:
> data_merged_ex
# A tibble: 5 × 8
seqg cyl piip.x piip.y dalo.x dalo.y tcct.x tcct.y
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 7894 0 1.1 1.1 2.7 NA 242 NA
2 1597 2 NA NA NA 2.7 NA 196
3 2586 1 0.7 0.7 2.2 4.3 161 NA
4 4679 0 0.6 NA 1.4 2.2 239 NA
5 8371 0 0.5 1.3 1.6 NA 206 NA
data_merged_ex_t <- data_merged_ex %>%
mutate(across(
.cols = ends_with(".x"),
.fns = ~ case_when(
!is.na(.data[[cur_column()]]) & !is.na(.data[[str_replace(cur_column(), "\\.x$", ".y")]]) ~ .data[[str_replace(cur_column(), "\\.x$", ".y")]],
is.na(.data[[cur_column()]]) & !is.na(.data[[str_replace(cur_column(), "\\.x$", ".y")]]) ~ .data[[str_replace(cur_column(), "\\.x$", ".y")]],
!is.na(.data[[cur_column()]]) & is.na(.data[[str_replace(cur_column(), "\\.x$", ".y")]]) ~ .data[[cur_column()]],
TRUE ~ NA_real_
),
.names = "{str_remove(.col, '\\.x$')}"
))
Error in `cur_column()`:
! Must only be used inside `across()`.
Run `rlang::last_trace()` to see where the error occurred.
> rlang::last_trace()
<error/rlang_error>
Error in `cur_column()`:
! Must only be used inside `across()`.
---
Backtrace:
▆
1. ├─mc_merged_sy %>% ...
2. ├─dplyr::mutate(...)
3. ├─dplyr:::mutate.data.frame(...)
4. │ ├─dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
5. │ │ └─base::force(dots)
6. │ └─dplyr:::dplyr_quosures(...)
7. │ └─rlang::quos(..., .ignore_empty = "all")
8. └─dplyr::cur_column()
Run rlang::last_trace(drop = FALSE) to see 4 hidden frames.
> rlang::last_trace(drop = FALSE)
<error/rlang_error>
Error in `cur_column()`:
! Must only be used inside `across()`.
---
Backtrace:
▆
1. ├─mc_merged_sy %>% ...
2. ├─dplyr::mutate(...)
3. ├─dplyr:::mutate.data.frame(...)
4. │ ├─dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
5. │ │ └─base::force(dots)
6. │ └─dplyr:::dplyr_quosures(...)
7. │ └─rlang::quos(..., .ignore_empty = "all")
8. └─dplyr::cur_column()
9. └─dplyr:::peek_column()
10. └─dplyr:::context_peek("column", "`across()`", call)
11. ├─context_peek_bare(name) %||% ...
12. └─rlang::abort(glue("Must only be used inside {location}."), call = call)
I have a dataset with multiple variables named 'trig.x', 'trig.y', and so on. Basically, I have many pairs of variable w/ similar names, except the '.x' or '.y'. These variables came from a merge from two other datasets.
Problem: I want to create new variables (keep the old ones) named 'trig' (just without '.x' or '.y' at the end) using the following logic (example): I want the new variable to assume the '.y' value when it is not missing, and it is missing, the new variable should assume '.x'. The variable should be missing when both are missing.
Code:
> data_merged_ex
# A tibble: 5 × 8
seqg cyl piip.x piip.y dalo.x dalo.y tcct.x tcct.y
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 7894 0 1.1 1.1 2.7 NA 242 NA
2 1597 2 NA NA NA 2.7 NA 196
3 2586 1 0.7 0.7 2.2 4.3 161 NA
4 4679 0 0.6 NA 1.4 2.2 239 NA
5 8371 0 0.5 1.3 1.6 NA 206 NA
data_merged_ex_t <- data_merged_ex %>%
mutate(across(
.cols = ends_with(".x"),
.fns = ~ case_when(
!is.na(.data[[cur_column()]]) & !is.na(.data[[str_replace(cur_column(), "\\.x$", ".y")]]) ~ .data[[str_replace(cur_column(), "\\.x$", ".y")]],
is.na(.data[[cur_column()]]) & !is.na(.data[[str_replace(cur_column(), "\\.x$", ".y")]]) ~ .data[[str_replace(cur_column(), "\\.x$", ".y")]],
!is.na(.data[[cur_column()]]) & is.na(.data[[str_replace(cur_column(), "\\.x$", ".y")]]) ~ .data[[cur_column()]],
TRUE ~ NA_real_
),
.names = "{str_remove(.col, '\\.x$')}"
))
Error in `cur_column()`:
! Must only be used inside `across()`.
Run `rlang::last_trace()` to see where the error occurred.
> rlang::last_trace()
<error/rlang_error>
Error in `cur_column()`:
! Must only be used inside `across()`.
---
Backtrace:
▆
1. ├─mc_merged_sy %>% ...
2. ├─dplyr::mutate(...)
3. ├─dplyr:::mutate.data.frame(...)
4. │ ├─dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
5. │ │ └─base::force(dots)
6. │ └─dplyr:::dplyr_quosures(...)
7. │ └─rlang::quos(..., .ignore_empty = "all")
8. └─dplyr::cur_column()
Run rlang::last_trace(drop = FALSE) to see 4 hidden frames.
> rlang::last_trace(drop = FALSE)
<error/rlang_error>
Error in `cur_column()`:
! Must only be used inside `across()`.
---
Backtrace:
▆
1. ├─mc_merged_sy %>% ...
2. ├─dplyr::mutate(...)
3. ├─dplyr:::mutate.data.frame(...)
4. │ ├─dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
5. │ │ └─base::force(dots)
6. │ └─dplyr:::dplyr_quosures(...)
7. │ └─rlang::quos(..., .ignore_empty = "all")
8. └─dplyr::cur_column()
9. └─dplyr:::peek_column()
10. └─dplyr:::context_peek("column", "`across()`", call)
11. ├─context_peek_bare(name) %||% ...
12. └─rlang::abort(glue("Must only be used inside {location}."), call = call)
Share
Improve this question
edited Feb 5 at 1:59
jpsmith
18k6 gold badges23 silver badges45 bronze badges
asked Feb 3 at 6:05
Ronaldo FilhoRonaldo Filho
311 silver badge1 bronze badge
3 Answers
Reset to default 2It is not clear what "I want the new variable to assume the '.y' value when it is not missing, and it is missing" means but we will assume that the new variable should equal the .x
variable if it is not NA and the .y
variable otherwise. Swap them in the code below if you want the reverse.
1) map2_dfc We can use map2_dfc
with coalesce
and then fix up the names.
Note that the tools package comes with R so it does not have to be installed.
library(dplyr)
library(purrr)
library(tools) # file_path_sans_ext
dat2 <- dat %>%
mutate(map2_dfc(pick(ends_with(".x")), pick(ends_with(".y")), coalesce) %>%
rename_with(file_path_sans_ext))
dat2
## seqg cyl piip.x piip.y dalo.x dalo.y tcct.x tcct.y piip dalo tcct
## 1 7894 0 1.1 1.1 2.7 NA 242 NA 1.1 2.7 242
## 2 1597 2 NA NA NA 2.7 NA 196 NA 2.7 196
## 3 2586 1 0.7 0.7 2.2 4.3 161 NA 0.7 2.2 161
## 4 4679 0 0.6 NA 1.4 2.2 239 NA 0.6 1.4 239
## 5 8371 0 0.5 1.3 1.6 NA 206 NA 0.5 1.6 206
Optionally sort the computed columns to come after the corresponding .x and .y columns. This computes the column names without any suffix giving root
. Then for each column match
finds the first column with the same root. Using order
on that gives the required order.
dat2 %>% {
root <- file_path_sans_ext(names(.))
select(., order(match(root, root)))
}
## seqg cyl piip.x piip.y piip dalo.x dalo.y dalo tcct.x tcct.y tcct
## 1 7894 0 1.1 1.1 1.1 2.7 NA 2.7 242 NA 242
## 2 1597 2 NA NA NA NA 2.7 2.7 NA 196 196
## 3 2586 1 0.7 0.7 0.7 2.2 4.3 2.2 161 NA 161
## 4 4679 0 0.6 NA 0.6 1.4 2.2 1.4 239 NA 239
## 5 8371 0 0.5 1.3 0.5 1.6 NA 1.6 206 NA 206
2) across2 The dplyover padckage on github (not on CRAN) has an across2
which is like across
but operates on two arguments.
library(dplyr)
# librray(removes); install_github("TimTeaFan/dplyover")
library(dplyrover)
dat %>%
mutate(across2(ends_with(".x"), ends_with(".y"), coalesce, .names = "{pre}")
## seqg cyl piip.x piip.y dalo.x dalo.y tcct.x tcct.y piip dalo tcct
## 1 7894 0 1.1 1.1 2.7 NA 242 NA 1.1 2.7 242
## 2 1597 2 NA NA NA 2.7 NA 196 NA 2.7 196
## 3 2586 1 0.7 0.7 2.2 4.3 161 NA 0.7 2.2 161
## 4 4679 0 0.6 NA 1.4 2.2 239 NA 0.6 1.4 239
## 5 8371 0 0.5 1.3 1.6 NA 206 NA 0.5 1.6 206
3) Base R Calculate root
as above and then for each unique root for which there are two columns in dat
compute the result and for other columns return NULL. Remove the NULL values using Filter
and convert the resulting list to a data frame. As noted previously tools comes built into R. Exchange 1 and 2 in f
if you want the reverse condition.
library(tools)
root <- file_path_sans_ext(names(dat))
f <- function(x, d = dat[root == x]) {
if (ncol(d) == 2) ifelse(is.na(d[, 1]), d[, 2], d[, 1])
}
data.frame(dat, Filter(length, Map(f, unique(root))))
## seqg cyl piip.x piip.y dalo.x dalo.y tcct.x tcct.y piip dalo tcct
## 1 7894 0 1.1 1.1 2.7 NA 242 NA 1.1 2.7 242
## 2 1597 2 NA NA NA 2.7 NA 196 NA 2.7 196
## 3 2586 1 0.7 0.7 2.2 4.3 161 NA 0.7 2.2 161
## 4 4679 0 0.6 NA 1.4 2.2 239 NA 0.6 1.4 239
## 5 8371 0 0.5 1.3 1.6 NA 206 NA 0.5 1.6 206
Note
The input data in a reproducible form:
dat <- data.frame(
seqg = c(7894L, 1597L, 2586L, 4679L, 8371L),
cyl = c(0L, 2L, 1L, 0L, 0L),
piip.x = c(1.1, NA, 0.7, 0.6, 0.5),
piip.y = c(1.1, NA, 0.7, NA, 1.3),
dalo.x = c(2.7, NA, 2.2, 1.4, 1.6),
dalo.y = c(NA, 2.7, 4.3, 2.2, NA),
tcct.x = c(242L, NA, 161L, 239L, 206L),
tcct.y = c(NA, 196L, NA, NA, NA)
)
You could pivot to long, then perform the logic, then pivot back:
library(dplyr)
library(tidyr)
pivot_longer(data_merged_ex, -c(seqg, cyl),
names_to=c("name", ".value"),
names_sep="\\.") %>%
mutate(z=case_when(!is.na(y)~y,
!is.na(x)~x,
.default=NA), .by=c(seqg, name)) %>%
pivot_wider(id_cols=c(seqg,cyl),
values_from=c(x,y,z),
names_vary = "slowest",
names_glue = "{name}.{.value}")
# A tibble: 5 × 11
seqg cyl piip.x piip.y piip.z dalo.x dalo.y dalo.z tcct.x tcct.y tcct.z
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1597 2 NA NA NA NA 2.7 2.7 NA 196 196
2 2586 1 0.7 0.7 0.7 2.2 4.3 4.3 161 NA 161
3 4679 0 0.6 NA 0.6 1.4 2.2 2.2 239 NA 239
4 8371 0 0.5 1.3 1.3 1.6 NA 1.6 206 NA 206
5 7894 0 1.1 1.1 1.1 2.7 NA 2.7 242 NA 242
Add
%>% select(seqg, cyl, ends_with(".z"))
if you don't want the original columns.
You could use base::get()
there to access other columns:
library(dplyr, warn.conflicts = FALSE)
library(stringr)
data_merged_ex <-
read.table(header = TRUE, text = "
seqg cyl piip.x piip.y dalo.x dalo.y tcct.x tcct.y
1 7894 0 1.1 1.1 2.7 NA 242 NA
2 1597 2 NA NA NA 2.7 NA 196
3 2586 1 0.7 0.7 2.2 4.3 161 NA
4 4679 0 0.6 NA 1.4 2.2 239 NA
5 8371 0 0.5 1.3 1.6 NA 206 NA") |>
as_tibble()
x2y <- \(x) str_replace(x, "\\.x$", ".y")
data_merged_ex |>
mutate(
across(ends_with(".x"),
\(x) coalesce(get(x2y(cur_column())), x),
.names = '{str_sub(.col, end = -3)}')
)
#> # A tibble: 5 × 11
#> seqg cyl piip.x piip.y dalo.x dalo.y tcct.x tcct.y piip dalo tcct
#> <int> <int> <dbl> <dbl> <dbl> <dbl> <int> <int> <dbl> <dbl> <int>
#> 1 7894 0 1.1 1.1 2.7 NA 242 NA 1.1 2.7 242
#> 2 1597 2 NA NA NA 2.7 NA 196 NA 2.7 196
#> 3 2586 1 0.7 0.7 2.2 4.3 161 NA 0.7 4.3 161
#> 4 4679 0 0.6 NA 1.4 2.2 239 NA 0.6 2.2 239
#> 5 8371 0 0.5 1.3 1.6 NA 206 NA 1.3 1.6 206
Depending on your dataset(s) and next planned steps, you might want to consider dplyr::rows_update()
as an alternative to joins.
发布者:admin,转转请注明出处:http://www.yc00.com/questions/1745247890a4618498.html
评论列表(0条)