Calculate Shapley value-based spatial response.
Source: R/shap_spatial_response.R
Calculate spatial, SHAP-based response maps. They help diagnose both how and where the species responds to environmental variables.
Usage
shap_spatial_response(
model,
var_occ,
variables,
target_vars = NULL,
shap_nsim = 10,
seed = 10,
pfun = .pfun_shap
)
Arguments
- model
(isolation_forest or other model) The fitted model. It can be the item model of the POIsotree object made by function isotree_po. It can also be any other user-fitted model, as long as pfun can make predictions with it.
- var_occ
(data.frame, tibble) The data.frame-style table that includes the values of environmental variables at occurrence locations.
- variables
(stars) The stars object of environmental variables. Each variable should be a separate attribute rather than a slice along a dimension such as band; split() converts such a dimension into attributes. If you have a raster object instead, you can use st_as_stars to convert it to stars, or use read_stars to read the source data directly as a stars object. You can also use the item variables of the POIsotree object made by function isotree_po.
- target_vars
(vector of character) The selected variables to process. If it is NULL, all variables are used.
- shap_nsim
(integer) The number of Monte Carlo repetitions the SHAP method uses to estimate each Shapley value. See the documentation of function explain in package fastshap for details. When the number of variables is large, a smaller shap_nsim can be used. Be aware that computing SHAP-based spatial responses is slow, because the Monte Carlo computation runs for every pixel; it is usually worth the time, though, because the result is much more informative. The default is 10, and a value of 10 to 20 is usually enough.
- seed
(integer) The seed for any random process. The default is 10.
- pfun
(function) The prediction function, which takes two arguments, object and newdata. It is only required when model is not an isolation_forest. The default is the wrapper function designed for the iForest model in itsdm.
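A minimal sketch of a user-supplied pfun, assuming a binomial glm() fitted to simulated data. The data and the names sim_df, mod_glm and pfun_glm are hypothetical, and the sketch has not been run through shap_spatial_response itself. It only illustrates the contract: the function takes the model and a data.frame of covariates and returns a plain numeric vector with one prediction per row.

```r
# Hypothetical presence/background data with two covariates
set.seed(10)
n <- 200
sim_df <- data.frame(bio1 = rnorm(n), bio12 = rnorm(n))
# Occurrence probability rises with bio1 and falls with bio12
p <- plogis(1.5 * sim_df$bio1 - 1 * sim_df$bio12)
sim_df$occ <- rbinom(n, 1, p)

mod_glm <- glm(occ ~ bio1 + bio12, data = sim_df, family = binomial())

# pfun: takes the model and a data.frame of covariates and
# returns a plain numeric vector with one value per row
pfun_glm <- function(object, newdata) {
  as.numeric(predict(object, newdata = newdata, type = "response"))
}

preds <- pfun_glm(mod_glm, sim_df[1:5, c("bio1", "bio12")])
print(preds)
```

Such a function would then be passed as pfun = pfun_glm alongside model = mod_glm, with var_occ and variables using the same variable names (bio1 and bio12 here) as the model formula.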
Value
(SHAPSpatial) A list of stars objects holding the spatial SHAP-based response of each selected variable (all variables when target_vars is NULL).
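A base-R sketch of how per-pixel SHAP values relate to the per-variable maps in the result: each pixel gets one Shapley value per variable, and each variable's column is folded back onto the raster grid. The grid, the background sample, the linear model f_lin and its closed-form Shapley values are illustrative assumptions standing in for the Monte Carlo estimates; the real function returns stars objects, not matrices.

```r
# Hypothetical 4 x 3 grid holding two environmental layers
nr <- 4
nc <- 3
bio1 <- matrix(seq(10, 21), nrow = nr, ncol = nc)
bio12 <- matrix(rev(seq(100, 1200, by = 100)), nrow = nr, ncol = nc)

# Flatten the grid into a pixel table: one row per pixel (column-major order)
pixels <- data.frame(bio1 = as.vector(bio1), bio12 = as.vector(bio12))

# Hypothetical reference sample (e.g., values at occurrence locations)
background <- data.frame(bio1 = c(12, 15, 18), bio12 = c(300, 600, 900))

# Illustrative linear model; relative to this background, its Shapley
# value for variable v is coefs[v] * (x_v - mean(background_v))
coefs <- c(bio1 = 0.05, bio12 = -0.001)
f_lin <- function(d) 0.2 + coefs[["bio1"]] * d$bio1 + coefs[["bio12"]] * d$bio12
shap <- sapply(names(coefs), function(v) {
  coefs[[v]] * (pixels[[v]] - mean(background[[v]]))
})

# Efficiency: per-pixel SHAP values sum to f(x) minus the mean background prediction
print(all.equal(rowSums(shap) + mean(f_lin(background)), f_lin(pixels)))

# Fold each column back onto the grid: one SHAP map per variable
shap_maps <- lapply(colnames(shap), function(v) matrix(shap[, v], nrow = nr, ncol = nc))
names(shap_maps) <- colnames(shap)
print(shap_maps$bio1)
```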
Details
The values show how each environmental variable affects the model prediction across space. These maps can help answer "where" questions about the species' environmental response.
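The Monte Carlo computation that shap_nsim controls can be sketched in base R. The sketch follows the permutation-sampling approximation of Štrumbelj and Kononenko, the approach behind fastshap::explain, to which itsdm refers for details; mc_shapley, pfun_lin, the toy coefficients and the background rows (presumably playing the role of var_occ) are hypothetical. Each repetition draws a random ordering of the variables and a random reference row, then compares two predictions that differ only in whether variable j takes the value at the location of interest.

```r
# Monte Carlo estimate of one Shapley value (variable j, one location x),
# using random variable orderings and random reference rows.
# x: named numeric vector; background: data.frame with the same columns.
mc_shapley <- function(pfun, object, x, background, j, nsim = 10) {
  feats <- colnames(background)
  contrib <- numeric(nsim)
  for (s in seq_len(nsim)) {
    # Draw a random reference row and a random ordering of the variables
    w <- unlist(background[sample.int(nrow(background), 1), feats])
    perm <- sample(feats)
    before <- perm[seq_len(match(j, perm) - 1)]
    # x_plus: j and the variables preceding it come from x, the rest from w
    x_plus <- w
    x_plus[c(before, j)] <- x[c(before, j)]
    # x_minus: identical to x_plus except that j comes from w
    x_minus <- x_plus
    x_minus[j] <- w[j]
    contrib[s] <- pfun(object, as.data.frame(as.list(x_plus))) -
      pfun(object, as.data.frame(as.list(x_minus)))
  }
  # Averaging over nsim repetitions reduces the Monte Carlo noise
  mean(contrib)
}

# Toy linear model standing in for a fitted SDM (hypothetical)
coefs <- c(bio1 = 0.05, bio12 = -0.001)
pfun_lin <- function(object, newdata) {
  as.numeric(0.2 + as.matrix(newdata[names(object)]) %*% object)
}
x <- c(bio1 = 20, bio12 = 400)
bg <- data.frame(bio1 = c(12, 15, 18), bio12 = c(300, 600, 900))

set.seed(10)
phi_10 <- mc_shapley(pfun_lin, coefs, x, bg, "bio1", nsim = 10)
phi_2000 <- mc_shapley(pfun_lin, coefs, x, bg, "bio1", nsim = 2000)
# For this linear model the exact value is 0.05 * (20 - mean(bg$bio1)) = 0.25
print(c(nsim_10 = phi_10, nsim_2000 = phi_2000))
```

shap_spatial_response repeats such an estimate for every selected variable at every pixel, so the cost grows roughly with shap_nsim times the number of pixels, which is why larger rasters call for a modest shap_nsim.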
Examples
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = "presence_only")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12))
# Fit the model in imperfect_presence mode
mod <- isotree_po(
obs_mode = "imperfect_presence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 10,
sample_size = 0.8, ndim = 2L,
seed = 123L, nthreads = 1,
response = FALSE,
spatial_response = FALSE,
check_variable = FALSE)
# Spatial SHAP-based responses of all variables
shap_spatial <- shap_spatial_response(
model = mod$model,
var_occ = mod$vars_train,
variables = mod$variables,
shap_nsim = 1)
# Spatial SHAP-based responses of selected variables only
shap_spatial <- shap_spatial_response(
model = mod$model,
target_vars = c("bio1", "bio12"),
var_occ = mod$vars_train,
variables = mod$variables,
shap_nsim = 1)
if (FALSE) { # \dontrun{
##### Use Random Forest model as an external model ########
library(randomForest)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>%
filter(usage == "train")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12)) %>%
split()
model_data <- stars::st_extract(
env_vars, at = as.matrix(obs_df %>% select(x, y))) %>%
as.data.frame()
names(model_data) <- names(env_vars)
model_data <- model_data %>%
mutate(occ = obs_df[['observation']])
model_data$occ <- as.factor(model_data$occ)
mod_rf <- randomForest(
occ ~ .,
data = model_data,
ntree = 200)
pfun <- function(X.model, newdata) {
# newdata is a data.frame; return the predicted probability of class "1"
predict(X.model, newdata, type = "prob")[, "1"]
}
shap_spatial <- shap_spatial_response(
model = mod_rf,
target_vars = c("bio1", "bio12"),
var_occ = model_data %>% select(-occ),
variables = env_vars,
shap_nsim = 10,
pfun = pfun)
} # }