Evaluate variable contributions for targeted observations.
Source: R/variable_contrib.R
variable_contrib.Rd
Evaluate variable contributions for targeted observations using SHapley Additive exPlanations (SHAP).
Usage
variable_contrib(
  model,
  var_occ,
  var_occ_analysis,
  shap_nsim = 100,
  visualize = FALSE,
  seed = 10,
  pfun = .pfun_shap
)
Arguments
- model
(isolation_forest or other model) The SDM. It could be the item model of POIsotree made by function isotree_po. It could also be another user-fitted model, as long as pfun can work on it.
- var_occ
(data.frame, tibble) The data.frame-style table that includes the values of environmental variables at occurrence locations.
- var_occ_analysis
(data.frame, tibble) The data.frame-style table that includes the values of environmental variables at occurrence locations for analysis. It could be var_occ, a subset of it, or any new dataset.
- shap_nsim
(integer) The number of Monte Carlo repetitions that the SHAP method uses to estimate each Shapley value. See the documentation of function explain in package fastshap for details.
- visualize
(logical) If TRUE, plot the response curves. The default is FALSE.
- seed
(integer) The seed for any random process. The default is 10L.
- pfun
(function) The predict function, which requires two arguments, object and newdata. It is only required when model is not an isolation_forest. The default is the wrapper function designed for the iForest model in itsdm.
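As a rough sketch of what a custom pfun might look like for a model that is not an isolation_forest, the snippet below wraps a binomial glm. The toy data, the variable names bio1 and bio12, and the name pfun_glm are all hypothetical. The only requirement taken from the argument description is that the function accepts object and newdata; returning a plain numeric vector with one prediction per row is an assumption about what the SHAP estimation expects.

```r
# Hypothetical toy data standing in for environmental variables
set.seed(1)
toy <- data.frame(bio1 = rnorm(50), bio12 = rnorm(50))
toy$presence <- rbinom(50, 1, plogis(toy$bio1))

# A user-fitted model that is not an isolation_forest
fit <- glm(presence ~ bio1 + bio12, data = toy, family = binomial())

# Custom predict wrapper with the two required arguments.
# Assumption: one numeric prediction per row of newdata is returned.
pfun_glm <- function(object, newdata) {
  as.numeric(predict(object, newdata = newdata, type = "response"))
}

# Check the wrapper on a few rows before handing it to variable_contrib
preds <- pfun_glm(fit, toy[1:5, c("bio1", "bio12")])
```

With a wrapper like this, the call would presumably pass model = fit and pfun = pfun_glm, with var_occ and var_occ_analysis holding only the predictor columns.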
Value
(VariableContribution) A list of
- shapley_values (data.frame) A table of Shapley values of each variable for all observations
- feature_values (tibble) A table of values of each variable for all observations
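One common way to summarize such a table is the mean absolute Shapley value per variable. The sketch below uses a small made-up data.frame in place of a real shapley_values item; the numbers and the column names bio1, bio5 and bio12 are hypothetical, and the layout (one column per variable, one row per observation) is an assumption based on the description above.

```r
# Hypothetical stand-in for the shapley_values item of the result
shapley_values <- data.frame(
  bio1  = c(0.12, -0.05),
  bio5  = c(-0.02, 0.08),
  bio12 = c(0.30, -0.10)
)

# Mean absolute Shapley value per variable, largest first
importance <- sort(colMeans(abs(shapley_values)), decreasing = TRUE)
```

With a real result, var_contribution$shapley_values would take the place of the made-up table.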
See also
plot.VariableContribution, explain in fastshap
Examples
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")
# Read the environmental variables and keep three bands
env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12))
# Fit the model in imperfect_presence mode
mod <- isotree_po(
  obs_mode = "imperfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 5,
  sample_size = 0.8, ndim = 1L,
  seed = 123L, nthreads = 1,
  response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)
# Evaluate variable contributions for the first two training observations
var_contribution <- variable_contrib(
  model = mod$model,
  var_occ = mod$vars_train,
  var_occ_analysis = mod$vars_train %>% slice(1:2))
if (FALSE) { # \dontrun{
  plot(var_contribution,
       num_features = 3,
       plot_each_obs = TRUE)
  # Plot together
  plot(var_contribution)
} # }