Summing a set of R dataframe rows (column-wise), while retaining the first n columns - Stack Overflow

I have a large R dataframe in which the first 21 columns are abiotic variables (incl. sample names) and columns 22-72 are species, with relative abundances as values. Due to processing of the data, each sample (i.e., col 1) has multiple rows for all species (with variable rel. ab. values). The abiotic variables are identical across all rows of a given sample.

I would like to sum the relative abundance values of each species per sample.

Below, you can find an example original dataframe and the desired outcome.

Original:

df <- data.frame(
  sample = c(1,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3),
  var1 = c(3,3,3,3,3,3,3,7,7,7,7,7,7,7,2,2,2,2,2,2),
  var2 = c(4,4,4,4,4,4,4,42,42,42,42,42,42,42,2,2,2,2,2,2),
  species1 = c(0,0,0.05,0,0,0.02,0,0,0,0,0,0,0,0,0,0.001,0.02,0.03,0.001,0),
  species2 = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.001,0.002,0.03,0,0,0)
)

desired outcome:

df_summed <- data.frame(
  sample = c(1, 2, 3),
  var1 = c(3, 7, 2),
  var2 = c(4, 42, 2),
  species1 = c(0.07, 0, 0.052),
  species2 = c(0, 0, 0.033)
)

I've tried multiple things with dplyr functions like group_by and summarise. For example:

df_summed <- df %>%
  group_by(across(1:21)) %>%
  summarise(across(22:ncol(df), sum), .groups = "drop")

but this gives me the error

Caused by error in `across()`:
! Can't subset columns past the end.
ℹ Locations 52, 53, 54, …, 71, and 72 don't exist.
ℹ There are only 51 columns.

while the df does have 72 columns... ncol(df) yields 72

Could anyone help me perform this operation?

asked Mar 3 at 9:00 by RobH

This is just aggregate(.~sample+var1+var2, df, sum). Do not name your data df; this masks stats::df(). Note from the docs (dplyr.tidyverse.org/reference/across.html) that across() is for columns, not rows. You are looking for df |> summarise(across(starts_with('species'), sum), .by = c(sample, var1, var2)) – Friede, commented Mar 3 at 9:03
Potential duplicate: stackoverflow.com/questions/78160636/… – Edward, commented Mar 3 at 12:14

2 Answers

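After your group_by, you need to adjust the columns selected in summarise accordingly: inside summarise(), across() only sees the non-grouping columns, so positions are counted among those columns, not among all columns of df. That is why the error reports only 51 columns (72 minus the 21 grouping columns).

For your toy data, the number of columns to group on is 3 (sample plus 2 variables). The columns to summarise (excluding these grouping columns) are now 1:2, not 4:5.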

library(dplyr)

df_summed <- df %>%
  group_by(across(1:3)) %>%
  summarise(across(1:2, sum), .groups = "drop")
# # A tibble: 3 × 5
#   sample  var1  var2 species1 species2
#    <dbl> <dbl> <dbl>    <dbl>    <dbl>
# 1      1     3     4    0.07     0
# 2      2     7    42    0        0
# 3      3     2     2    0.052    0.033

Although Sam's answer advises against using numbers to select columns, this answer explains why yours failed.
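
As a sketch of the same idea without counting positions inside summarise(): because across() there only covers the non-grouping columns, everything() selects exactly the columns left over after grouping. This assumes every non-grouping column is a numeric species column, which holds for the toy data but is an assumption for your real 72-column data (where the grouping would be across(1:21)).

```r
library(dplyr)

# Toy data from the question: 3 grouping columns, 2 species columns
df <- data.frame(
  sample = c(1,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3),
  var1 = c(3,3,3,3,3,3,3,7,7,7,7,7,7,7,2,2,2,2,2,2),
  var2 = c(4,4,4,4,4,4,4,42,42,42,42,42,42,42,2,2,2,2,2,2),
  species1 = c(0,0,0.05,0,0,0.02,0,0,0,0,0,0,0,0,0,0.001,0.02,0.03,0.001,0),
  species2 = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.001,0.002,0.03,0,0,0)
)

# Group on the first 3 columns by position; inside summarise(),
# everything() then means "all columns that are not grouped on"
df_summed <- df %>%
  group_by(across(1:3)) %>%
  summarise(across(everything(), sum), .groups = "drop")

df_summed
```

This keeps the positional grouping from your attempt but removes the second set of positions, which is the part that went wrong.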

As you've tagged dplyr, you can use summarise(across()) on the columns selected by starts_with("species") and take their sum. Note that the .by argument requires dplyr 1.1.0 or later.

library(dplyr)
out <- df |>
    summarise(
        across(starts_with("species"), sum),
        .by = c(sample, var1, var2)
    )

#   sample var1 var2 species1 species2
# 1      1    3    4    0.070    0.000
# 2      2    7   42    0.000    0.000
# 3      3    2    2    0.052    0.033

identical(out, df_summed)
# [1] TRUE

Regarding the difference between this and the approach in your question, here is an extract from the data.table FAQ:

You may have heard that it is generally bad practice to refer to columns by number rather than name, though. If your colleague comes along and reads your code later they may have to hunt around to find out which column is number 5. If you or they change the column ordering higher up in your R program, you may produce wrong results with no warning or error if you forget to change all the places in your code which refer to column number 5. That is your fault not R’s or data.table’s. It’s really really bad. Please don’t do it.
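
To illustrate the point, here is a minimal sketch that selects everything by name, so that reordering the columns upstream does not change the result. The helper sum_species_by_name() and the id_cols vector are hypothetical names introduced for this example; the sketch assumes every column not listed in id_cols is a numeric species column.

```r
library(dplyr)

df <- data.frame(
  sample = c(1,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3),
  var1 = c(3,3,3,3,3,3,3,7,7,7,7,7,7,7,2,2,2,2,2,2),
  var2 = c(4,4,4,4,4,4,4,42,42,42,42,42,42,42,2,2,2,2,2,2),
  species1 = c(0,0,0.05,0,0,0.02,0,0,0,0,0,0,0,0,0,0.001,0.02,0.03,0.001,0),
  species2 = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.001,0.002,0.03,0,0,0)
)

# Identifying (abiotic) columns, listed once by name
id_cols <- c("sample", "var1", "var2")

# Hypothetical helper: sum every column that is not an id column, per id group
sum_species_by_name <- function(data, id_cols) {
  species_cols <- setdiff(names(data), id_cols)
  data |>
    summarise(across(all_of(species_cols), sum), .by = all_of(id_cols))
}

res <- sum_species_by_name(df, id_cols)

# Same data with the columns shuffled: the same columns are still summed
shuffled <- df[, c("species2", "var2", "sample", "species1", "var1")]
res_shuffled <- sum_species_by_name(shuffled, id_cols)
```

Only the order of the summed columns in the output follows the input order; the values per sample are unchanged.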

Alternative approach

The above is relatively little code, but it's not really consistent with the tidyverse philosophy, which holds that data should be stored in a format where:

Each variable is a column; each column is a variable.

Each observation is a row; each row is an observation.

Each value is a cell; each cell is a single value.

This is not the case in your data. It can be rearranged into tidy format as follows:

df_long <- df |>
    tidyr::pivot_longer(starts_with("species"), names_to = "species")
# # A tibble: 40 × 5
#    sample  var1  var2 species  value
#     <dbl> <dbl> <dbl> <chr>    <dbl>
#  1      1     3     4 species1  0
#  2      1     3     4 species2  0
#  3      1     3     4 species1  0
#  4      1     3     4 species2  0
#  5      1     3     4 species1  0.05
#  6      1     3     4 species2  0
#  7      1     3     4 species1  0
#  8      1     3     4 species2  0
#  9      1     3     4 species1  0
# 10      1     3     4 species2  0
# # ℹ 30 more rows
# # ℹ Use `print(n = ...)` to see more rows

Or, if your real columns do not actually start with "species", selecting all but certain columns will lead to the same output, e.g. tidyr::pivot_longer(!c(sample, var1, var2), names_to = "species").

It is more natural to summarise() in this format as you can simply do so by group and there's no need to iterate over columns:

out_long <- df_long |>
    summarise(
        value = sum(value),
        .by = c(sample, var1, var2, species)
    )
# # A tibble: 6 × 5
#   sample  var1  var2 species  value
#    <dbl> <dbl> <dbl> <chr>    <dbl>
# 1      1     3     4 species1 0.07
# 2      1     3     4 species2 0
# 3      2     7    42 species1 0
# 4      2     7    42 species2 0
# 5      3     2     2 species1 0.052
# 6      3     2     2 species2 0.033

This is probably the format it makes sense to keep the data in for further analysis. However, if you need it in wide format to present it then you can pivot back:

out_long |>
    tidyr::pivot_wider(
        id_cols = c(sample, var1, var2),
        names_from = species
    )
# ^^ same as desired output
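
For reference, the three steps above can be chained into one self-contained sketch and checked against the desired output from the question. Here values_to and values_from are written out explicitly (they match the defaults used above), and the comparison uses all.equal() on a plain data frame, since pivot_wider() returns a tibble and sums of decimals may differ from the typed-in values by floating-point rounding.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  sample = c(1,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3),
  var1 = c(3,3,3,3,3,3,3,7,7,7,7,7,7,7,2,2,2,2,2,2),
  var2 = c(4,4,4,4,4,4,4,42,42,42,42,42,42,42,2,2,2,2,2,2),
  species1 = c(0,0,0.05,0,0,0.02,0,0,0,0,0,0,0,0,0,0.001,0.02,0.03,0.001,0),
  species2 = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.001,0.002,0.03,0,0,0)
)

df_summed <- data.frame(
  sample = c(1, 2, 3),
  var1 = c(3, 7, 2),
  var2 = c(4, 42, 2),
  species1 = c(0.07, 0, 0.052),
  species2 = c(0, 0, 0.033)
)

out_wide <- df |>
  # long format: one row per sample row and species
  pivot_longer(!c(sample, var1, var2), names_to = "species", values_to = "value") |>
  # one summed value per sample and species
  summarise(value = sum(value), .by = c(sample, var1, var2, species)) |>
  # back to one column per species
  pivot_wider(id_cols = c(sample, var1, var2), names_from = species, values_from = value)

# Compare as a plain data frame, with numeric tolerance
all.equal(as.data.frame(out_wide), df_summed)
```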
