Evaluate variable contribution for targeted observations according to SHapley Additive exPlanations (SHAP).

variable_contrib(
  model,
  var_occ,
  var_occ_analysis,
  shap_nsim = 100,
  visualize = FALSE,
  seed = 10,
  pfun = .pfun_shap
)

Arguments

model

(isolation_forest or other model) The SDM. It can be the model element of the POIsotree object returned by function isotree_po, or any other user-fitted model, as long as pfun can work with it.

var_occ

(data.frame, tibble) A data.frame-style table that includes the values of the environmental variables at occurrence locations.

var_occ_analysis

(data.frame, tibble) A data.frame-style table that includes the values of the environmental variables at the occurrence locations to analyze. It can be var_occ itself, a subset of it, or an entirely new dataset.

shap_nsim

(integer) The number of Monte Carlo repetitions used by the SHAP method to estimate each Shapley value. The default is 100. See the documentation of function explain in package fastshap for details.

visualize

(logical) If TRUE, plot the variable contributions. The default is FALSE.

seed

(integer) The seed for any random process. The default is 10L.

pfun

(function) The prediction function, which must take two arguments, object and newdata. It only needs to be supplied when model is not an isolation_forest. The default is the wrapper function designed for the iForest model in itsdm.
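As a minimal sketch of what a user-supplied pfun might look like for a model that is not an isolation_forest, the block below fits a base R logistic regression on toy data and wraps its predict method. The toy data, the column names bio1 and bio12, and the names glm_mod and pfun_glm are all hypothetical and chosen only for illustration. The sketch assumes the wrapper should return one numeric prediction per row of newdata.

```r
# Toy presence/background data; purely illustrative, not from itsdm
set.seed(1)
toy <- data.frame(
  bio1 = rnorm(100),
  bio12 = rnorm(100))
toy$presence <- rbinom(100, 1, plogis(1.5 * toy$bio1 - toy$bio12))

# A user-fitted model (here a logistic regression from base R)
glm_mod <- glm(presence ~ bio1 + bio12, data = toy, family = binomial())

# Prediction wrapper with the two required arguments, object and newdata.
# Assumption: it returns one numeric prediction per row of newdata.
pfun_glm <- function(object, newdata) {
  as.numeric(predict(
    object, newdata = as.data.frame(newdata), type = "response"))
}

# Check the wrapper on a few rows before using it elsewhere
preds <- pfun_glm(glm_mod, toy[1:5, c("bio1", "bio12")])
print(preds)
```

Under these assumptions, such a wrapper would be passed as pfun = pfun_glm together with model = glm_mod, with var_occ and var_occ_analysis holding the same predictor columns the model was fitted on.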

Value

(VariableContribution) A list of

  • shapley_values (data.frame) A table of the Shapley values of each variable for all observations

  • feature_values (tibble) A table of the values of each variable for all observations
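The Shapley values in shapley_values are Monte Carlo estimates, with shap_nsim setting the number of repetitions. The sketch below illustrates the general sampling idea in base R. It is not the code used by fastshap or itsdm. The function mc_shapley, the linear toy model f, and the simulated background data X are hypothetical. In each repetition a random background row and a random feature order are drawn, and the prediction with feature j taken from the observation is compared with the prediction with feature j taken from the background row.

```r
# Monte Carlo estimate of one Shapley value (sampling approach).
# f: prediction function taking a numeric vector of feature values
# x: numeric vector, the observation to explain
# X: numeric matrix of background observations (rows)
# j: index of the feature of interest
mc_shapley <- function(f, x, X, j, nsim = 100) {
  p <- ncol(X)
  diffs <- numeric(nsim)
  for (s in seq_len(nsim)) {
    z <- X[sample(nrow(X), 1), ]   # random background observation
    perm <- sample(p)              # random feature order
    pos <- which(perm == j)
    keep <- perm[seq_len(pos)]     # features up to and including j
    x_plus <- z
    x_plus[keep] <- x[keep]        # j and its predecessors come from x
    x_minus <- x_plus
    x_minus[j] <- z[j]             # same, but j comes from the background
    diffs[s] <- f(x_plus) - f(x_minus)
  }
  mean(diffs)
}

# Toy check with a linear model, whose exact Shapley values are
# beta * (x - column means of the background data)
set.seed(10)
X <- matrix(rnorm(600), ncol = 3)
beta <- c(2, -1, 0.5)
f <- function(v) sum(beta * v)
x <- c(1, 1, 1)

phi <- sapply(1:3, function(j) mc_shapley(f, x, X, j, nsim = 2000))
exact <- beta * (x - colMeans(X))
print(rbind(estimate = phi, exact = exact))
```

Raising nsim reduces the sampling noise of each estimate at the cost of more model evaluations, which is the trade-off that shap_nsim controls.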

Examples


# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)

# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"

# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12))

# Fit the model in imperfect_presence mode
mod <- isotree_po(
  obs_mode = "imperfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 5,
  sample_size = 0.8, ndim = 1L,
  seed = 123L, nthreads = 1,
  response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)

# Evaluate variable contributions for the first two training observations
var_contribution <- variable_contrib(
  model = mod$model,
  var_occ = mod$vars_train,
  var_occ_analysis = mod$vars_train %>% slice(1:2))
if (FALSE) {
# Plot the contributions for each observation separately
plot(var_contribution,
  num_features = 3,
  plot_each_obs = TRUE)

# Plot together
plot(var_contribution)
}