ggplot2 - How to use values from PCA calculation as ggplot labels for axes in R - Stack Overflow

When I run a principal components analysis in R and then plot 2 of the PCs in ggplot I would like to be able to have the axis labels automatically include which PC # is on the axis and the percent of variation it explains. Right now I have to change the labels manually when I switch to different PCs.

I have example code here (I've left out quite a bit of code that I believe has nothing to do with the question but please let me know if I'm mistaken):

# Example dataset SR
SR = structure(list(Site_ID = 1:6, A = c(0.102, 1.34, 0.875, 0.564, 
0.075, 0.141), B = c(0.01, 0.05, 0.021, 0.018, 0.006, 0.144), 
    C = c(1.329, 2.029, 2.466, 6.648, 0.735, 2.49), D = c(0.025, 
    0.045, 0.039, 0.024, 0.045, 0.112), E = c(0.007, 0.001618893, 
    0.022, 0.018, 0.006, 0.035), F = c(17.52188, 27.412, 18.69, 
    118.8684, 9.7188, 2.9904)), class = "data.frame", row.names = c(NA, 
-6L))

##### PCA calculation #############################################

SR.pca <- prcomp(SR, scale=TRUE, retx=TRUE)
PCAvalues <- summary(SR.pca)

#Example output
#Importance of components:
#                          PC1    PC2    PC3     PC4     PC5       PC6
#Standard deviation     2.3467 2.1712 1.8408 1.12707 1.05835 8.756e-16
#Proportion of Variance 0.3442 0.2946 0.2118 0.07939 0.07001 0.000e+00
#Cumulative Proportion  0.3442 0.6388 0.8506 0.92999 1.00000 1.000e+00

summ <- summary(SR.pca)$importance[2,] 
#gives proportion of variance for each PC
summ

#Example output
#PC1     PC2     PC3     PC4     PC5     PC6 
#0.34420 0.29463 0.21178 0.07939 0.07001 0.00000

###### Graph the PCA ########

library(ggplot2)
ggplot(PCAvalues, aes(x = PC1, y = PC2)) +
  geom_text(data=PCAvalues, aes(x = PC1, y = PC2, label=Site_ID), size=2)+
  scale_color_gradient(low = "red", high = "blue") + 
  coord_equal() +
  labs(color="Dist. Grad.")+
  theme_bw() 
 # + labs(y = "PC2 (29.46%)", x = "PC1 (34.42%)")

Right now I have to manually change this last line of code every time I change which PCs I'm plotting. If it could somehow take the PC # (i.e. PC2) from ggplot(PCAvalues, aes(x = PC1, y = PC2)) and the % from summ for the label that would be awesome.

Asked Nov 20, 2024 at 18:43 by Bridget Wheelock; edited Nov 20, 2024 at 19:16 by Friede.

Comment from Friede (Nov 20, 2024 at 19:01): There are dozens of ways. The simplest would be paste. However, please correct your code so others can copy-paste and run without error(s).
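
As a minimal sketch of the paste approach mentioned in the comment: the proportions below are copied from the example output in the question, and pc_label() is a hypothetical helper name, not a function from any package.

```r
# Proportions of variance, copied from the example output in the question.
summ <- c(PC1 = 0.34420, PC2 = 0.29463, PC3 = 0.21178)

# pc_label() is a hypothetical helper: it combines a component name with
# its proportion of variance, expressed as a percentage.
pc_label <- function(pc, prop) {
  paste0(pc, " (", round(100 * prop[[pc]], 2), "%)")
}

# Choose the components once; the labels follow automatically.
x_pc <- "PC1"
y_pc <- "PC2"
x_lab <- pc_label(x_pc, summ)  # "PC1 (34.42%)"
y_lab <- pc_label(y_pc, summ)  # "PC2 (29.46%)"
```

The resulting strings can then be passed to labs(x = x_lab, y = y_lab) in place of the hand-written labels.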

2 Answers

I can't run your code because PCAvalues is a list, not a data frame, and ggplot() cannot use it. Here is an example of a function that takes a data frame, the names of two columns, and a named vector and makes a plot from the data frame labeled with the column names and the corresponding values in the named vector. I think this corresponds to your data.

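library(ggplot2)

# Stand-in data: three columns of scores and a named vector of percentages
DF <- data.frame(PC1 = rnorm(10), PC2 = rnorm(10), PC3 = rnorm(10))
summ <- c(PC1 = 42.3, PC2 = 23.0, PC3 = 5.2)

# Plot two columns of DATA, labelling each axis with the column name
# and the matching value from the named vector vec
plotFunc <- function(DATA, col1, col2, vec) {
  label1 <- paste(col1, round(vec[col1], 2), "%")
  label2 <- paste(col2, round(vec[col2], 2), "%")
  ggplot(DATA, aes(.data[[col1]], .data[[col2]])) +
    geom_point() +
    labs(x = label1, y = label2)
}
plotFunc(DF, "PC1", "PC3", summ)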

Created on 2024-11-20 with reprex v2.1.1
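
To feed the asker's PCA into a function like plotFunc(), the prcomp result first has to be turned into a data frame and a named vector of percentages. The following is a sketch under two assumptions that are not in the original post: Site_ID is treated as an identifier and left out of the PCA input, and the names scores and pct are made up for illustration.

```r
# Data frame from the question.
SR <- data.frame(
  Site_ID = 1:6,
  A = c(0.102, 1.34, 0.875, 0.564, 0.075, 0.141),
  B = c(0.01, 0.05, 0.021, 0.018, 0.006, 0.144),
  C = c(1.329, 2.029, 2.466, 6.648, 0.735, 2.49),
  D = c(0.025, 0.045, 0.039, 0.024, 0.045, 0.112),
  E = c(0.007, 0.001618893, 0.022, 0.018, 0.006, 0.035),
  F = c(17.52188, 27.412, 18.69, 118.8684, 9.7188, 2.9904)
)

# Assumption: Site_ID is an identifier, so it is excluded from the PCA.
SR.pca <- prcomp(SR[, setdiff(names(SR), "Site_ID")], scale. = TRUE)

# The scores live in SR.pca$x (a matrix); ggplot() needs a data frame.
# Site_ID is attached so that it can still be used for text labels.
scores <- cbind(Site_ID = SR$Site_ID, as.data.frame(SR.pca$x))

# summary() reports proportions (0 to 1); the example vector in the answer
# holds percentages, so the proportions are multiplied by 100 here.
pct <- 100 * summary(SR.pca)$importance[2, ]

str(scores)
pct
```

With these two objects, a call such as plotFunc(scores, "PC1", "PC3", pct) should label the axes with the chosen components and their percentages.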

Here is a tweak using sprintf(), which I personally like.

SR = structure(
  list(
    Site_ID = 1:6,
    A = c(0.102, 1.34, 0.875, 0.564, 0.075, 0.141),
    B = c(0.01, 0.05, 0.021, 0.018, 0.006, 0.144),
    C = c(1.329, 2.029, 2.466, 6.648, 0.735, 2.49),
    D = c(0.025, 0.045, 0.039, 0.024, 0.045, 0.112),
    E = c(0.007, 0.001618893, 0.022, 0.018, 0.006, 0.035),
    F = c(17.52188, 27.412, 18.69, 118.8684, 9.7188, 2.9904)
  ),
  class = "data.frame",
  row.names = c(NA, -6L))

SR.pca = prcomp(SR, scale=TRUE, retx=TRUE)
# Proportion of variance explained by each PC
summ = summary(SR.pca)$importance[2, ]

# ggfortify supplies a fortify() method for prcomp objects,
# which lets ggplot() accept SR.pca directly
library(ggfortify)
#> Loading required package: ggplot2
library(ggplot2)

ggplot(SR.pca, aes(x=PC1, y=PC2)) +
  geom_text(aes(label=Site_ID), size=5) +
  coord_equal() +
  theme_bw() +
  # %.2f prints two decimals; %% is a literal percent sign
  labs(x=sprintf("PCA1 (%.2f%%)", 100*summ[1]),
       y=sprintf("PCA2 (%.2f%%)", 100*summ[2]))
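
The labels above are still tied to PC1 and PC2. One possible way to make them follow the chosen components is to keep the component names in a single vector and build both the aesthetics and the labels from it. This is a sketch, not the answerer's code: the names pcs, scores and axis_labs are made up for illustration, and Site_ID is left out of the PCA on the assumption that it is only an identifier.

```r
library(ggplot2)

# Data frame from the question.
SR <- data.frame(
  Site_ID = 1:6,
  A = c(0.102, 1.34, 0.875, 0.564, 0.075, 0.141),
  B = c(0.01, 0.05, 0.021, 0.018, 0.006, 0.144),
  C = c(1.329, 2.029, 2.466, 6.648, 0.735, 2.49),
  D = c(0.025, 0.045, 0.039, 0.024, 0.045, 0.112),
  E = c(0.007, 0.001618893, 0.022, 0.018, 0.006, 0.035),
  F = c(17.52188, 27.412, 18.69, 118.8684, 9.7188, 2.9904)
)

# Assumption: Site_ID is an identifier, so it is excluded from the PCA.
SR.pca <- prcomp(SR[, setdiff(names(SR), "Site_ID")], scale. = TRUE)
summ <- summary(SR.pca)$importance[2, ]
scores <- cbind(Site_ID = SR$Site_ID, as.data.frame(SR.pca$x))

# Change only this vector to plot a different pair of components.
pcs <- c("PC1", "PC3")

# sprintf() is vectorised, so both labels are built in one call;
# summ[pcs] looks up the proportions by component name.
axis_labs <- sprintf("%s (%.2f%%)", pcs, 100 * summ[pcs])

# .data[[...]] lets aes() take the column names from the pcs vector.
p <- ggplot(scores, aes(x = .data[[pcs[1]]], y = .data[[pcs[2]]])) +
  geom_text(aes(label = Site_ID), size = 5) +
  coord_equal() +
  theme_bw() +
  labs(x = axis_labs[1], y = axis_labs[2])
p
```

Switching to another pair, for example pcs <- c("PC2", "PC4"), updates both axes and both labels without touching the plotting code.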

It remains unclear to me what the lines concerning colour should do. Add them back in please. Currently, the plot is ... More natural to {ggplot2} is {scales}, have a look.

Alternatively, you could start from autoplot() and customise it, e.g.

library(ggfortify)
library(ggplot2)
autoplot(SR.pca) +
  theme_bw() 
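
As far as I know, autoplot() for prcomp objects in {ggfortify} takes x and y arguments that select the components by number, and it adds the percentage of variance to the axis labels by default. A sketch under that assumption, again leaving Site_ID out of the PCA as an identifier (an assumption not made in the answer):

```r
library(ggfortify)
library(ggplot2)

# Data frame from the question.
SR <- data.frame(
  Site_ID = 1:6,
  A = c(0.102, 1.34, 0.875, 0.564, 0.075, 0.141),
  B = c(0.01, 0.05, 0.021, 0.018, 0.006, 0.144),
  C = c(1.329, 2.029, 2.466, 6.648, 0.735, 2.49),
  D = c(0.025, 0.045, 0.039, 0.024, 0.045, 0.112),
  E = c(0.007, 0.001618893, 0.022, 0.018, 0.006, 0.035),
  F = c(17.52188, 27.412, 18.69, 118.8684, 9.7188, 2.9904)
)

# Assumption: Site_ID is an identifier, so it is excluded from the PCA.
SR.pca <- prcomp(SR[, setdiff(names(SR), "Site_ID")], scale. = TRUE)

# x and y pick the components to plot (here PC1 against PC3);
# the axis labels are generated by ggfortify.
p <- autoplot(SR.pca, x = 1, y = 3) +
  theme_bw()
p
```

Note that autoplot() rescales the scores by default, so the axis values can differ from those in SR.pca$x even though the labels name the same components.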
