Show variable dependence plots and variable interaction plots obtained from Shapley values.
Source: R/plot.R
plot.ShapDependence.Rd
Plot Shapley value-based variable dependence curves with ggplot2, optionally for selected target variable(s) only. It can also plot the interaction between a related variable and the selected variable(s).
Usage
# S3 method for class 'ShapDependence'
plot(
  x,
  target_var = NA,
  related_var = NA,
  sample_prop = 0.3,
  sample_bin = 100,
  smooth_line = TRUE,
  seed = 123,
  ...
)
Arguments
- x
  (ShapDependence) The variable dependence object to plot. It could be the return value of function shap_dependence.
- target_var
  (vector of character) The target variable(s) to plot. It could be NA. If it is NA, all variables will be plotted.
- related_var
  (character) The related variable to plot together with the target variables. It could be NA. If it is NA, no related variable will be plotted.
- sample_prop
  (numeric) The proportion of points to sample for plotting. It will be ignored if the number of points is less than 1000. The default is 0.3.
- sample_bin
  (integer) The number of bins to use for stratified sampling. It will be ignored if the number of points is less than 1000. The default is 100.
- smooth_line
  (logical) Whether to fit the smooth line or not. The default is TRUE.
- seed
  (integer) The seed for sampling. It will be ignored if the number of points is less than 1000. The default is 123.
- ...
  Other arguments passed on to geom_smooth, mainly method and formula, to fit the smooth line. Note that the same arguments will be used for all target variables. To use different arguments for different variables, plot the variables one at a time.
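As an illustration of how arguments given through ... can reach geom_smooth, here is a minimal sketch. The wrapper plot_dependence_sketch, the toy data frame and its columns x and shap are hypothetical and are not part of itsdm. Only ggplot, aes, geom_point and geom_smooth are real ggplot2 functions.

```r
library(ggplot2)

# Hypothetical wrapper, not the itsdm implementation: extra arguments in
# `...` are forwarded unchanged to geom_smooth().
plot_dependence_sketch <- function(df, smooth_line = TRUE, ...) {
  p <- ggplot(df, aes(x = x, y = shap)) +
    geom_point(alpha = 0.3)
  # Add the smooth line only when requested.
  if (smooth_line) {
    p <- p + geom_smooth(...)
  }
  p
}

# Toy data standing in for variable values and their Shapley values.
set.seed(1)
toy <- data.frame(x = runif(200))
toy$shap <- 0.5 * toy$x + rnorm(200, sd = 0.1)

# `method` and `formula` travel through `...` to geom_smooth().
p_lm <- plot_dependence_sketch(toy, method = "lm", formula = y ~ x)
# Without a smooth line, only the point layer is drawn.
p_none <- plot_dependence_sketch(toy, smooth_line = FALSE)
```

Because the same ... is applied to every panel, a call that plots several target variables fits them all with the same method and formula.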
Details
If the number of samples is more than 1000, stratified sampling is used to thin the sample pool, and only the resulting subset is plotted. The user can set the proportion of points to sample and the number of bins used for stratified sampling.
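The thinning step can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's actual code: the helper stratified_thin is hypothetical, the bins are assumed to be equal-width intervals over the variable's value range, and each bin is assumed to keep about sample_prop of its points.

```r
# Hypothetical helper, not part of itsdm: thin a set of points by
# stratified sampling along the range of one column.
stratified_thin <- function(df, value_col, sample_prop = 0.3,
                            sample_bin = 100, seed = 123) {
  # Small point sets are returned unchanged (assumed 1000-point threshold).
  if (nrow(df) <= 1000) return(df)
  # Note: this resets the global RNG state, which is fine for a sketch.
  set.seed(seed)
  # Assign each row to one of `sample_bin` equal-width bins.
  bins <- cut(df[[value_col]], breaks = sample_bin, labels = FALSE)
  # Within each non-empty bin, keep about `sample_prop` of the rows
  # (at least one row per bin).
  keep <- unlist(lapply(split(seq_len(nrow(df)), bins), function(idx) {
    n_keep <- max(1, round(length(idx) * sample_prop))
    idx[sample.int(length(idx), n_keep)]
  }), use.names = FALSE)
  df[sort(keep), , drop = FALSE]
}

# Toy data standing in for variable values and their Shapley values.
set.seed(1)
toy <- data.frame(x = runif(5000), shap = rnorm(5000))
thinned <- stratified_thin(toy, "x")
nrow(thinned)
```

Sampling within bins rather than over the whole pool keeps sparse parts of the value range represented in the plot, which plain random sampling could leave empty.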
Examples
# \donttest{
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")
# Load the environmental variables and keep four bands
env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12, 16))
# Fit the model in imperfect_presence mode
mod <- isotree_po(
  obs_mode = "imperfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 20,
  sample_size = 0.8, ndim = 2L,
  seed = 123L, response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)
# Calculate the Shapley value-based variable dependence
var_dependence <- shap_dependence(
  model = mod$model,
  var_occ = mod$vars_train,
  variables = mod$variables)
# Plot the dependence of bio1 together with its interaction with bio12
plot(var_dependence, target_var = 'bio1', related_var = 'bio12')
# }