R/plot.R
plot.ShapDependence.Rd
Plot Shapley value-based variable dependence curves with ggplot2, optionally restricted to selected target variable(s). It can also show the interaction between a related variable and the selected variable(s).
# S3 method for ShapDependence
plot(
  x,
  target_var = NA,
  related_var = NA,
  sample_prop = 0.3,
  sample_bin = 100,
  smooth_line = TRUE,
  seed = 123,
  ...
)
x
(ShapDependence) The variable dependence object to plot. It could be the return value of the function shap_dependence.
target_var
(vector of character) The target variable(s) to plot. It could be NA. If it is NA, all variables will be plotted.
related_var
(character) The dependent variable to plot together with the target variables. It could be NA. If it is NA, no related variable will be plotted.
sample_prop
(numeric) The proportion of points to sample for plotting. It will be ignored if the number of points is less than 1000. The default is 0.3.
sample_bin
(integer) The number of bins to use for stratified sampling. It will be ignored if the number of points is less than 1000. The default is 100.
smooth_line
(logical) Whether to fit the smooth line or not. The default is TRUE.
seed
(integer) The seed for sampling. It will be ignored if the number of points is less than 1000. The default is 123.
...
Other arguments passed on to geom_smooth, mainly method and formula, which control how the smooth line is fitted. Note that the same arguments are used for all target variables. To use different arguments for different variables, plot the variables one by one.
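As an illustration of the kind of arguments that `...` forwards, the following sketch calls geom_smooth directly on synthetic data. The data frame `dep` and its columns are hypothetical stand-ins for one dependence panel; only `method` and `formula` are taken from the text above.

```r
library(ggplot2)

# Hypothetical stand-in for one dependence panel:
# variable values on the x axis, Shapley values on the y axis.
set.seed(123)
dep <- data.frame(value = runif(200, 0, 30))
dep$shap <- sin(dep$value / 5) + rnorm(200, sd = 0.2)

# `method` and `formula` are regular geom_smooth() arguments.
# Here a cubic polynomial is fitted by linear regression.
p <- ggplot(dep, aes(x = value, y = shap)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", formula = y ~ poly(x, 3), se = FALSE)
```

With a ShapDependence object, the analogous call would presumably look like plot(var_dependence, target_var = 'bio1', method = "lm", formula = y ~ poly(x, 3)), repeated per variable when each one needs its own settings.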
A ggplot2 figure of the dependence curves.
If the number of samples is more than 1000, stratified sampling is used to thin the sample pool, and only the resulting subset is plotted. The user can set the proportion to sample (sample_prop) and the number of bins for stratified sampling (sample_bin).
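The thinning step can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper name `thin_by_bins`, the equal-width binning, and the rule of keeping at least one point per bin are all assumptions.

```r
# Hypothetical helper: thin points by stratified sampling over value bins.
# Returns the indices of the points to keep.
thin_by_bins <- function(values, sample_prop = 0.3, sample_bin = 100,
                         seed = 123, min_points = 1000) {
  n <- length(values)
  # Small samples are returned untouched (1000-point threshold from the text).
  if (n <= min_points) return(seq_len(n))
  # Note: this resets the global RNG state to make the draw reproducible.
  set.seed(seed)
  # Assumed binning scheme: equal-width bins over the value range.
  bins <- cut(values, breaks = sample_bin, labels = FALSE)
  # Draw the same proportion of indices from every non-empty bin,
  # keeping at least one point per bin.
  picked <- lapply(split(seq_len(n), bins), function(idx) {
    k <- max(1, round(length(idx) * sample_prop))
    idx[sample.int(length(idx), k)]
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(1)
x <- rnorm(5000)
keep <- thin_by_bins(x)
length(keep)  # close to 30% of 5000; sparse bins still contribute a point
```

Sampling within bins rather than over the whole pool keeps sparsely populated parts of the variable range visible in the plot, which plain random sampling could drop.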
# \donttest{
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")

# Read the environmental variables and keep four bands
env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12, 16))

# Fit the model in imperfect_presence mode
mod <- isotree_po(
  obs_mode = "imperfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 20,
  sample_size = 0.8, ndim = 2L,
  seed = 123L, response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)

# Compute the Shapley value-based variable dependence
var_dependence <- shap_dependence(
  model = mod$model,
  var_occ = mod$vars_train,
  variables = mod$variables)

# Plot bio1 together with the related variable bio12
plot(var_dependence, target_var = 'bio1', related_var = 'bio12')
# }