r - SHAP values for random survival forest - Stack Overflow

I want to plot the SHAP values for my RSF model; here is the code and error:

xvars <- c("RIDRETH1", "RXDLIPID", "DRXTKCAL", "DRXTPROT", "DRXTCARB", "DRXTCHOL", "DRXTFIBE", "DRXTVARA", "DRXTATOC", "DRXTSODI", "DRXTPOTA", "DRXTM161", "DRXTM181", "DRXTM201", "DRXTM221", "DRXTP182", "DRXTP183", "DRXTP184", "DRXTP204", "DRXTP205", "DRXTP225", "DRXTP226", "DRXTRET", "DRXT_G_TOTAL", "DRXT_V_STARCHY_TOTAL", "DRXTS160", "DRXTS180", "DRXTsumSFA", "INDFMPIR", "LBXCOT", "GENDERRC")

X <- Data[sample(nrow(Data), 1000), xvars]

bg_X <- Data[sample(nrow(Data), 200), ]


system.time(
  ks <- kernelshap(rf_mort_nutrients_withoutage_1018_all, X, bg_X = bg_X, type = 'prob')
)
ks

ks <- shapviz(ks)
sv_importance(ks, kind = "bee", )

Error:

Error in align_pred(pred_fun(object, bg_X, ...)) : Predictions must be numeric!
Timing stopped at: 0.03 0.05 0.11

These are my predictions:

rf_mort_nutrients_withoutage_1018_all$predicted
   [1]  81.31376  75.82491  99.35944  58.63055  67.65847  98.32906  75.33934 107.81604  62.22175  75.69875  69.99881  83.67161  81.39735  65.59381

I am not sure why it is not working. Does anyone have an idea?

asked Nov 22, 2024 at 6:49 by mtvpr; edited Nov 22, 2024 at 17:03 by Ben Reiniger

4 comments:
  • What do you get from predict(rf_mort_nutrients_withoutage_1018_all, X, type = 'prob')? You should get numeric output. Without a reproducible example, we won't be able to help. – Michael M, Nov 22, 2024 at 18:21
  • @MichaelM I get this: Sample size of test (predict) data: 1000; Number of grow trees: 200; Average no. of grow terminal nodes: 35.705; Total no. of grow variables: 31; Resampling used to grow trees: swor; Resample size used to grow trees: 37243; Analysis: RSF; Family: surv – mtvpr, Nov 25, 2024 at 9:16
  • Doesn't sound particularly numeric. Maybe you could extend the question and explain what you are trying to do. Somewhere you write "survival", but you use kernelshap as in a classification situation. – Michael M, Nov 25, 2024 at 20:11
  • @MichaelM I have an rfsrc model, rf_mort_nutrients <- rfsrc(Surv(endage, mortality_status) ~ . , data = data, ntree = 200, nodesize = 1000, importance = T), and I would like to plot the SHAP values to understand which variables contribute most to mortality and in which direction, i.e., how a feature's value alters the prediction. Is there a method compatible with random survival forests for this? – mtvpr, Nov 27, 2024 at 9:36
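
The diagnosis in the comments can be checked on a small public dataset. The following is a minimal sketch, assuming the veteran data from the survival package and a small forest in place of the asker's model (the names fit and p are illustrative only). It shows that predict() on a survival forest returns a prediction object rather than a numeric vector, which is what kernelshap's default prediction function would receive, while the $predicted component is numeric.

```r
library(randomForestSRC)
library(survival)  # supplies the veteran data

set.seed(1)
xvars <- setdiff(colnames(veteran), c("time", "status"))
fit <- rfsrc(
  reformulate(xvars, "Surv(time, status)"),
  data = veteran,
  ntree = 50,
  nodesize = 20
)

# predict() returns a list-like prediction object, not a numeric vector
p <- predict(fit, veteran[xvars])
class(p)
is.numeric(p)            # FALSE: this is what triggers "Predictions must be numeric!"

# The numeric predictions live in the $predicted component, one value per row
is.numeric(p$predicted)  # TRUE
length(p$predicted) == nrow(veteran)
```

A prediction function passed to kernelshap therefore has to extract a numeric component such as $predicted itself.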

1 Answer

To compute SHAP values for the predicted mortality of a survival forest (the $predicted component, where larger values mean higher risk), you can work like this:

library(randomForestSRC)
library(survival)
library(kernelshap)
library(shapviz)

head(veteran)
#   trt celltype time status karno diagtime age prior
# 1   1 squamous   72      1    60        7  69     0
# 2   1 squamous  411      1    70        5  64    10
# 3   1 squamous  228      1    60        3  38     0

xvars <- setdiff(colnames(veteran), c("time", "status"))

fit <- rfsrc(
  reformulate(xvars, "Surv(time, status)"),
  data = veteran,
  ntree = 50,
  nodesize = 20,
  importance = TRUE
)

# kernelshap needs numeric predictions, so return the numeric
# ensemble mortality (one value per row) instead of the prediction object
pred_fun <- function(model, data) {
  predict(model, data)$predicted
}

# Sample <=1000 rows from the training data. veteran is small enough to use all
X_explain <- veteran[xvars]
sv <- kernelshap(fit, X = X_explain, pred_fun = pred_fun) |> 
  shapviz()

sv |> sv_importance(kind = "bee")
sv |> sv_dependence(xvars)
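
If survival probabilities are easier to interpret than mortality, the same approach should work with a different prediction function. The following is a minimal sketch, assuming the prediction object exposes the $survival matrix and $time.interest vector described in the randomForestSRC documentation; pred_surv_at and its horizon argument are hypothetical names introduced here, and the horizon of 100 is an arbitrary choice.

```r
library(randomForestSRC)
library(survival)  # supplies the veteran data used in the answer

set.seed(1)
xvars <- setdiff(colnames(veteran), c("time", "status"))
fit <- rfsrc(
  reformulate(xvars, "Surv(time, status)"),
  data = veteran,
  ntree = 50,
  nodesize = 20
)

# Hypothetical helper (not part of any package): survival probability at the
# event time in time.interest that lies closest to `horizon`
pred_surv_at <- function(model, data, horizon = 100) {
  p <- predict(model, data)
  j <- which.min(abs(p$time.interest - horizon))
  surv <- p$survival
  # Guard against a dropped dimension when only one row is predicted
  if (is.null(dim(surv))) surv <- matrix(surv, nrow = 1)
  surv[, j]
}

s100 <- pred_surv_at(fit, veteran[xvars])
range(s100)  # probabilities, so within [0, 1]
```

The helper can then be passed in place of pred_fun, for example kernelshap(fit, X = X_explain, pred_fun = pred_surv_at). Note that the sign flips relative to mortality: a positive SHAP value now means a higher chance of surviving past the horizon.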
