Calculate how a species responds to environmental variables using Shapley values.
Usage
shap_dependence(
model,
var_occ,
variables,
si = 1000,
shap_nsim = 100,
visualize = FALSE,
seed = 10,
pfun = .pfun_shap
)
Arguments
- model
(isolation_forest or other model) The SDM. It could be the item model of POIsotree made by function isotree_po. It could also be any other user-fitted model, as long as pfun can work on it.
- var_occ
(data.frame, tibble) The data.frame-style table that includes values of environmental variables at occurrence locations.
- variables
(stars) The stars object of environmental variables. It should have multiple attributes instead of dims. If you have a raster object instead, you could use st_as_stars to convert it to stars, or use read_stars to read the source data directly as a stars object. You could also use the item variables of POIsotree made by function isotree_po.
- si
(integer) The number of samples used to generate response curves. If it is too small, the response curves might be biased. The default value is 1000.
- shap_nsim
(integer) The number of Monte Carlo repetitions the SHAP method uses to estimate each Shapley value. When the number of variables is large, a smaller shap_nsim could be used. See details in the documentation of function explain in package fastshap. The default is 100.
- visualize
(logical) If TRUE, plot the variable dependence plots. The default is FALSE.
- seed
(integer) The seed for any random process. The default is 10.
- pfun
(function) The predict function that requires two arguments, object and newdata. It is only required when model is not isolation_forest. The default is the wrapper function designed for iForest models in itsdm.
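For models other than isolation_forest, pfun has to follow the two-argument contract above. As a hedged illustration, here is a sketch of such a function for a base R logistic regression. The toy data, the model mod_glm and the name pfun_glm are hypothetical; the assumption is only that pfun receives a fitted model and a data.frame of environmental values and returns one numeric prediction per row.

```r
# Hypothetical example: a pfun for a logistic regression fitted with glm().
# Assumed contract: take a fitted model (object) and a data.frame of
# environmental values (newdata), return one numeric prediction per row.
set.seed(10)
toy <- data.frame(bio1 = rnorm(50), bio12 = rnorm(50))
toy$occ <- rbinom(50, 1, plogis(toy$bio1 - 0.5 * toy$bio12))

mod_glm <- glm(occ ~ bio1 + bio12, data = toy, family = binomial)

pfun_glm <- function(object, newdata) {
  # type = "response" returns probabilities instead of log-odds;
  # as.numeric() drops the row names predict() attaches
  as.numeric(predict(object, newdata = newdata, type = "response"))
}

pred <- pfun_glm(mod_glm, toy[1:5, c("bio1", "bio12")])
```

Under the same assumptions, it would be supplied as pfun = pfun_glm, together with a var_occ table and a stars object whose attribute names match the model's predictors (bio1 and bio12 in this toy case).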
Value
(ShapDependence) A list of
- dependences_cont (list) A list of Shapley values of continuous variables
- dependences_cat (list) A list of Shapley values of categorical variables
- feature_values (data.frame) A table of feature values
Details
The returned Shapley values show how each environmental variable independently affects the model prediction: for each variable, they show how its Shapley value changes as the value of that variable varies.
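To make the role of shap_nsim more concrete, the sketch below estimates a single Shapley value with the permutation-based Monte Carlo procedure of Strumbelj and Kononenko (2014), the reference listed for this function. It uses only base R. The function shapley_mc() and the toy data are hypothetical, and this is not the code used by itsdm or fastshap, whose implementations differ in details such as vectorization.

```r
# Minimal sketch of the Monte Carlo Shapley estimator of Strumbelj and
# Kononenko (2014). Illustration only; shapley_mc() is hypothetical.
#   f    : prediction function taking a data.frame, one value per row
#   x    : one-row data.frame, the observation to explain
#   X    : background data.frame (e.g. values at occurrence locations)
#   j    : column index of the variable whose Shapley value is estimated
#   nsim : number of Monte Carlo repetitions (the role shap_nsim plays)
shapley_mc <- function(f, x, X, j, nsim = 100) {
  p <- ncol(X)
  phi <- 0
  for (m in seq_len(nsim)) {
    z <- X[sample.int(nrow(X), 1), , drop = FALSE]  # random background row
    o <- sample.int(p)                              # random feature order
    before_and_j <- o[seq_len(which(o == j))]       # features preceding j, plus j
    b1 <- z
    b1[before_and_j] <- x[before_and_j]             # those features taken from x
    b2 <- b1
    b2[j] <- z[j]                                   # same instance, but j from z
    phi <- phi + f(b1) - f(b2)                      # marginal contribution of j
  }
  phi / nsim
}

# Usage on a toy additive model, where the exact Shapley value of column a
# is 2 * (x$a - mean(X$a)).
set.seed(10)
X <- data.frame(a = rnorm(200), b = rnorm(200), c = rnorm(200))
f <- function(d) 2 * d$a - d$b + 0.5 * d$c
x <- X[1, , drop = FALSE]
phi_a <- shapley_mc(f, x, X, j = 1, nsim = 2000)
exact_a <- 2 * (x$a - mean(X$a))
```

For this additive model each repetition contributes 2 * (x$a - z$a), so the estimate approaches the exact value as nsim grows. With interactions between variables, a larger nsim reduces Monte Carlo noise at a proportionally higher cost per variable, which is why a smaller shap_nsim may be acceptable when there are many variables.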
References
Strumbelj, Erik, and Igor Kononenko. "Explaining prediction models and individual predictions with feature contributions." Knowledge and Information Systems 41.3 (2014): 647-665. doi:10.1007/s10115-013-0679-x
See also
plot.ShapDependence
explain in fastshap
Examples
# \donttest{
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = "presence_only")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12, 16))
# Fit an isolation forest model in imperfect_presence mode
mod <- isotree_po(
obs_mode = "imperfect_presence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 10,
sample_size = 0.8, ndim = 2L,
seed = 123L, nthreads = 1,
response = FALSE,
spatial_response = FALSE,
check_variable = FALSE)
var_dependence <- shap_dependence(
model = mod$model,
var_occ = mod$vars_train,
variables = mod$variables)
plot(var_dependence, target_var = "bio1", related_var = "bio16")
# }
if (FALSE) { # \dontrun{
##### Use Random Forest model as an external model ########
library(randomForest)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>%
filter(usage == "train")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12)) %>%
split()
model_data <- stars::st_extract(
env_vars, at = as.matrix(obs_df %>% select(x, y))) %>%
as.data.frame()
names(model_data) <- names(env_vars)
model_data <- model_data %>%
mutate(occ = obs_df[['observation']])
model_data$occ <- as.factor(model_data$occ)
mod_rf <- randomForest(
occ ~ .,
data = model_data,
ntree = 200)
pfun <- function(object, newdata) {
# Return the predicted probability of the presence class ("1") for each row
predict(object, newdata, type = "prob")[, "1"]
}
shap_dependences <- shap_dependence(
model = mod_rf,
var_occ = model_data %>% select(-occ),
variables = env_vars,
visualize = FALSE,
seed = 10,
pfun = pfun)
} # }