Plot Shapley value-based variable dependence curves using ggplot2, optionally restricted to selected target variable(s). It can also plot the interaction between a related variable and the selected variable(s).

Usage

# S3 method for ShapDependence
plot(
  x,
  target_var = NA,
  related_var = NA,
  sample_prop = 0.3,
  sample_bin = 100,
  smooth_line = TRUE,
  seed = 123,
  ...
)

Arguments

x

(ShapDependence) The variable dependence object to plot, typically the object returned by shap_dependence.

target_var

(vector of character) The target variable(s) to plot. If NA (the default), all variables are plotted.

related_var

(character) The related variable to plot together with the target variables. If NA (the default), no related variable is plotted.

sample_prop

(numeric) The proportion of points to sample for plotting. It is ignored if there are fewer than 1000 points. The default is 0.3.

sample_bin

(integer) The number of bins to use for stratified sampling. The default is 100.

smooth_line

(logical) Whether to add a fitted smooth line. It will be ignored if there are fewer than 1000 points. The default is TRUE.

seed

(integer) The seed for sampling. It is ignored if there are fewer than 1000 points. The default is 123.

...

Other arguments passed on to geom_smooth, mainly method and formula for fitting the smooth line. The same arguments are applied to all target variables; to use different settings for each variable, plot the variables one at a time.
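The forwarding of `...` to geom_smooth can be sketched with plain ggplot2. This is a minimal illustration, assuming a data frame with hypothetical columns `value` (the variable) and `shap` (its Shapley values); `plot_dependence_sketch()` is a hypothetical helper, not the itsdm implementation.

```r
library(ggplot2)

# Hypothetical helper (not part of itsdm): scatter SHAP values against one
# variable and forward every extra argument in `...` to geom_smooth().
plot_dependence_sketch <- function(df, smooth_line = TRUE, ...) {
  p <- ggplot(df, aes(x = value, y = shap)) +
    geom_point(alpha = 0.4)
  if (smooth_line) {
    # method, formula, span, etc. reach geom_smooth() unchanged
    p <- p + geom_smooth(...)
  }
  p
}

# Simulated SHAP values for a single variable (illustrative data only)
set.seed(1)
df <- data.frame(value = runif(200, 0, 30))
df$shap <- 0.02 * (df$value - 15)^2 - 2 + rnorm(200, sd = 0.3)

# Different smoothing settings require separate calls
p_lm <- plot_dependence_sketch(df, method = "lm", formula = y ~ poly(x, 2))
p_loess <- plot_dependence_sketch(df, method = "loess", formula = y ~ x, span = 0.5)
```

Because the extra arguments are passed through as-is, one call applies one smoothing configuration; varying it per variable means separate calls, as `p_lm` and `p_loess` illustrate.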

Value

A ggplot2 figure of the dependence curves.

Details

If there are more than 1000 points, stratified sampling is used to thin the sample pool, and only the sampled subset is plotted. Users can set the sampling proportion (sample_prop) and the number of bins used for stratified sampling (sample_bin).
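The thinning step can be sketched in base R as follows. This is a minimal illustration, assuming equal-width bins over the plotted variable and a fixed share of points kept in each bin; `thin_stratified()` and its arguments are hypothetical stand-ins for sample_prop, sample_bin, and seed, and itsdm may bin or round differently.

```r
# Hypothetical helper (not the itsdm implementation): thin a data frame by
# stratified sampling on column `col`. Rows are grouped into `n_bins`
# equal-width bins and `prop` of each non-empty bin is kept (at least one
# row per bin), so sparse parts of the range are not dropped entirely.
# Note: set.seed() here also resets the global RNG state.
thin_stratified <- function(df, col, prop = 0.3, n_bins = 100,
                            threshold = 1000, seed = 123) {
  if (nrow(df) <= threshold) return(df)  # small inputs are plotted in full
  set.seed(seed)
  bins <- cut(df[[col]], breaks = n_bins, include.lowest = TRUE)
  # drop = TRUE skips empty bins, so every idx below has length >= 1
  groups <- split(seq_len(nrow(df)), bins, drop = TRUE)
  keep <- unlist(lapply(groups, function(idx) {
    n_keep <- max(1, round(length(idx) * prop))
    idx[sample.int(length(idx), n_keep)]
  }), use.names = FALSE)
  df[sort(keep), , drop = FALSE]
}

# Usage on simulated points
set.seed(42)
pts <- data.frame(value = rnorm(5000), shap = rnorm(5000))
thinned <- thin_stratified(pts, "value", prop = 0.3, n_bins = 100)
nrow(thinned)  # roughly 0.3 * 5000
```

Sampling within bins rather than uniformly keeps points from the tails of the variable's range, where a uniform 30% sample might leave gaps in the dependence curve.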

Examples

# \donttest{
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)

# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"

# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12, 16))

# Fit the model in imperfect_presence mode
mod <- isotree_po(
  obs_mode = "imperfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 20,
  sample_size = 0.8, ndim = 2L,
  seed = 123L, response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)

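# Compute Shapley value-based variable dependence on the training data
var_dependence <- shap_dependence(
  model = mod$model,
  var_occ = mod$vars_train,
  variables = mod$variables)

# Plot bio1 with bio12 as the related variable
plot(var_dependence, target_var = 'bio1', related_var = 'bio12')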
# }