R/variable_contrib.R
variable_contrib.Rd
Evaluate variable contribution for targeted observations according to SHapley Additive exPlanations (SHAP).
variable_contrib(
  model,
  var_occ,
  var_occ_analysis,
  shap_nsim = 100,
  visualize = FALSE,
  seed = 10,
  pfun = .pfun_shap
)
model: (isolation_forest or other model) The SDM. It can be the item model of the POIsotree object made by function isotree_po. It can also be another user-fitted model, as long as pfun works on it.
var_occ: (data.frame, tibble) The data.frame-style table that includes values of environmental variables at occurrence locations.
var_occ_analysis: (data.frame, tibble) The data.frame-style table that includes values of environmental variables at occurrence locations for analysis. It can be var_occ, a subset of it, or any new dataset.
shap_nsim: (integer) The number of Monte Carlo repetitions the SHAP method uses to estimate each Shapley value. See the documentation of function explain in package fastshap for details.
visualize: (logical) If TRUE, plot the response curves. The default is FALSE.
seed: (integer) The seed for any random process. The default is 10L.
pfun: (function) The predict function, which must take two arguments, object and newdata. It is only required when model is not an isolation_forest. The default is the wrapper function designed for the iForest model in itsdm.
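To illustrate the two-argument contract that pfun has to satisfy, here is a minimal sketch in base R. The logistic regression, the mtcars data, and the name pfun_glm are illustrative assumptions and are not part of itsdm; the only point is the shape of the function: it takes object and newdata and returns one numeric prediction per row of newdata.

```r
# Hypothetical stand-in for a user-fitted model: a logistic regression
# on the built-in mtcars data (not an SDM, purely illustrative).
fit <- glm(am ~ hp + wt, data = mtcars, family = binomial())

# A predict function with the two required arguments, `object` and `newdata`.
# It returns a plain numeric vector with one prediction per row of `newdata`.
pfun_glm <- function(object, newdata) {
  as.numeric(predict(object, newdata = newdata, type = "response"))
}

# Quick check on a few rows: one prediction per row.
preds <- pfun_glm(fit, mtcars[1:5, c("hp", "wt")])
length(preds)
```

A function of this shape is what would be supplied as pfun alongside the corresponding model; whether a particular model behaves sensibly as an SDM is a separate question that this sketch does not address.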
(VariableContribution) A list of:
shapley_values (data.frame): A table of Shapley values of each variable for all observations.
feature_values (tibble): A table of values of each variable for all observations.
See also: plot.VariableContribution, explain in fastshap
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12))
# With imperfect_presence mode,
mod <- isotree_po(
  obs_mode = "imperfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 5,
  sample_size = 0.8, ndim = 1L,
  seed = 123L, nthreads = 1,
  response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)

# Compute Shapley values for the first two training observations
var_contribution <- variable_contrib(
  model = mod$model,
  var_occ = mod$vars_train,
  var_occ_analysis = mod$vars_train %>% slice(1:2))
if (FALSE) {
  # Plot each observation separately
  plot(var_contribution,
    num_features = 3,
    plot_each_obs = TRUE)

  # Plot together
  plot(var_contribution)
}
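
The shap_nsim argument sets how many Monte Carlo repetitions are averaged for each Shapley value. The idea behind such an estimate can be sketched in base R as follows. This is an illustrative sampling estimator written for this page, not the code that fastshap or itsdm runs; the helper mc_shapley, the linear model, and the mtcars data are all assumptions made for the example. A linear model is used because this estimator then has a known target, coefficient times (value minus background mean), which the estimate can be compared against.

```r
set.seed(10)

# Hypothetical model and background data (stand-ins for an SDM and var_occ).
X <- mtcars[, c("wt", "hp")]
fit <- lm(mpg ~ wt + hp, data = mtcars)
pfun <- function(object, newdata) as.numeric(predict(object, newdata = newdata))

# Monte Carlo estimate of the Shapley value of `feature` for the single row `x`.
mc_shapley <- function(object, X, x, feature, nsim, pfun) {
  stopifnot(nrow(x) == 1, feature %in% names(X))
  diffs <- numeric(nsim)
  for (m in seq_len(nsim)) {
    w <- X[sample.int(nrow(X), 1), , drop = FALSE]  # random background row
    ord <- sample(names(X))                          # random feature ordering
    pos <- match(feature, ord)
    after <- ord[seq_along(ord) > pos]               # features ordered after `feature`
    with_j <- x
    for (v in after) with_j[[v]] <- w[[v]]           # later features come from w
    without_j <- with_j
    without_j[[feature]] <- w[[feature]]             # additionally replace `feature`
    diffs[m] <- pfun(object, with_j) - pfun(object, without_j)
  }
  mean(diffs)                                        # average marginal contribution
}

x <- X[1, , drop = FALSE]
est <- mc_shapley(fit, X, x, feature = "wt", nsim = 1000, pfun = pfun)

# Target value for a linear model under this sampling scheme.
exact <- coef(fit)[["wt"]] * (x$wt - mean(X$wt))
c(estimate = est, exact = exact)
```

Raising nsim narrows the gap between the two numbers at the cost of more prediction calls, which is the trade-off shap_nsim controls.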