This function calculates two major types of evaluation metrics for presence-only data. The first type is metrics customized for presence-only data, such as the Contrast Validation Index (CVI), the continuous Boyce index (CBI), and ROC_ratio. The second type is presence-background evaluation metrics, calculated by using extracted background points as pseudo-absence observations.
evaluate_po(
  model,
  occ_pred,
  bg_pred = NULL,
  var_pred,
  threshold = NULL,
  visualize = FALSE
)
model (isolation_forest) The extended isolation forest SDM. It could be the item model of POIsotree made by function isotree_po.
occ_pred (vector of numeric) A vector containing predicted values at occurrence locations.
bg_pred (vector of numeric) A vector containing predicted values at background points, with one value per background point. The default is NULL.
var_pred (vector of numeric) A vector containing predicted values of the whole area. A vector is used to keep this function flexible for multiple types of output.
threshold (numeric or NULL) The threshold used to calculate threshold-based evaluation metrics. If NULL, a recommended threshold is calculated based on the optimal TSS value. The default is NULL.
visualize (logical) If TRUE, plot the evaluation figures. The default is FALSE.
(POEvaluation) A list of
  po_evaluation is presence-only evaluation metrics. It is a list of
    cvi (list) A list of CVI with 0.25, 0.5, and 0.75 as thresholds
    boyce (list) A list of items related to the continuous Boyce index (CBI)
    roc_ratio (list) A list of ROC ratio and AUC ratio
  pb_evaluation is presence-background evaluation metrics. It is a list of
    confusion matrix (table) The confusion matrix. The columns are true values, and the rows are predicted values.
    sensitivity (numeric) The sensitivity or TPR
    specificity (numeric) The specificity or TNR
    TSS (list) A list of info related to the true skill statistic (TSS)
      cutoff (vector of numeric) A vector of cutoff threshold values
      tss (vector of numeric) A vector of TSS for each cutoff threshold
      Recommended threshold (numeric) A recommended threshold according to TSS
      Optimal TSS (numeric) The best TSS value
    roc (list) A list of ROC values and AUC value
    Jaccard's similarity index (numeric) The Jaccard's similarity index
    Sørensen's similarity index (numeric) The Sørensen's similarity index or F-measure
    Overprediction rate (numeric) The overprediction rate
    Underprediction rate (numeric) The underprediction rate
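The TSS items listed above (cutoff, tss, recommended threshold, optimal TSS) can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper name `tss_scan_sketch`, the cutoff grid, the `>=` comparison, and the choice of the first maximum are all assumptions made for illustration.

```r
# Hypothetical helper, not part of itsdm: scan cutoffs and pick the one
# with the highest TSS, treating background points as pseudo-absences.
tss_scan_sketch <- function(occ_pred, bg_pred,
                            cutoff = seq(0, 1, by = 0.01)) {
  tss <- vapply(cutoff, function(t) {
    sensitivity <- mean(occ_pred >= t)  # presences predicted as present
    specificity <- mean(bg_pred < t)    # background predicted as absent
    sensitivity + specificity - 1
  }, numeric(1))
  best <- which.max(tss)  # first cutoff reaching the maximum (assumption)
  list(cutoff = cutoff,
       tss = tss,
       recommended_threshold = cutoff[best],
       optimal_tss = tss[best])
}

# Toy predictions, invented for illustration only
occ_pred <- c(0.9, 0.8, 0.7, 0.3)
bg_pred <- c(0.6, 0.4, 0.2, 0.1)
scan <- tss_scan_sketch(occ_pred, bg_pred)
```

In this toy case any cutoff between the highest background value and the third-highest occurrence value separates the two groups best, so the recommended threshold falls in that interval.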
CVI is the proportion of presence points falling in cells with a habitat suitability index above a threshold (0.5, for example) minus the proportion of all cells of the model prediction that are above the same threshold. Here we use three thresholds: 0.25, 0.5, and 0.75.
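As a rough illustration of the CVI definition, the following base R sketch computes it from the two prediction vectors. The helper name `cvi_at` is hypothetical, and whether the package compares with `>=` or `>` is an assumption here.

```r
# Hypothetical helper, not part of itsdm: CVI at a single threshold.
cvi_at <- function(occ_pred, var_pred, threshold = 0.5) {
  # Proportion of presence predictions at or above the threshold ...
  prop_presence <- mean(occ_pred >= threshold)
  # ... minus the proportion of all cells at or above the same threshold
  prop_cells <- mean(var_pred >= threshold)
  prop_presence - prop_cells
}

# Toy predictions, invented for illustration only
occ_pred <- c(0.9, 0.8, 0.6, 0.4)
var_pred <- c(0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1, 0.1)

# CVI at the three thresholds named in the text
cvi <- sapply(c(0.25, 0.5, 0.75), cvi_at,
              occ_pred = occ_pred, var_pred = var_pred)
```

A positive value means presences are concentrated in cells above the threshold more than would be expected from the share of such cells in the whole area.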
The continuous Boyce index (CBI) is calculated using moving windows with a resolution of 100 and the Kendall correlation method.
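The idea behind the CBI can be sketched in base R as below. This is a simplified illustration, not the package's code: the helper name `boyce_sketch`, the window width (one tenth of the value range), and the handling of empty windows are assumptions. Only the resolution of 100 and the Kendall method come from the description above.

```r
# Hypothetical helper, not part of itsdm: simplified continuous Boyce index.
boyce_sketch <- function(occ_pred, var_pred, res = 100, window = NULL) {
  rng <- range(c(occ_pred, var_pred))
  if (is.null(window)) window <- diff(rng) / 10  # assumed window width
  # Lower edges of `res` overlapping windows sliding across the value range
  lower <- seq(rng[1], rng[2] - window, length.out = res)
  upper <- lower + window
  # Predicted-to-expected (P/E) ratio in each window
  pe <- vapply(seq_len(res), function(i) {
    p <- mean(occ_pred >= lower[i] & occ_pred <= upper[i])
    e <- mean(var_pred >= lower[i] & var_pred <= upper[i])
    if (e == 0) NA_real_ else p / e  # skip windows without any cells
  }, numeric(1))
  mid <- (lower + upper) / 2
  keep <- !is.na(pe)
  list(mid = mid[keep], pe = pe[keep],
       cbi = cor(mid[keep], pe[keep], method = "kendall"))
}

# Toy predictions: presences skewed toward high suitability
set.seed(1)
var_pred <- runif(2000)
occ_pred <- sqrt(runif(200))
b <- boyce_sketch(occ_pred, var_pred)
```

A CBI near 1 indicates that the P/E ratio rises steadily with predicted suitability, which is the behavior expected from a useful model.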
The ROC_ratio curve plots the proportion of presences falling above a range of thresholds against the proportion of cells falling above the same thresholds. The area under this modified ROC curve is called AUC_ratio.
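The modified ROC curve described above can be approximated with a short base R sketch. The helper name `roc_ratio_sketch`, the evenly spaced thresholds, the assumption that predictions lie in [0, 1], and the trapezoid rule are illustrative choices, not the package's implementation.

```r
# Hypothetical helper, not part of itsdm: modified ROC curve and its area.
roc_ratio_sketch <- function(occ_pred, var_pred, n_thresholds = 101) {
  # Thresholds from high to low so that both proportions increase
  thresholds <- seq(1, 0, length.out = n_thresholds)
  presence <- vapply(thresholds, function(t) mean(occ_pred >= t), numeric(1))
  cells <- vapply(thresholds, function(t) mean(var_pred >= t), numeric(1))
  # Area under the curve by the trapezoid rule
  auc <- sum(diff(cells) * (head(presence, -1) + tail(presence, -1)) / 2)
  list(curve = data.frame(threshold = thresholds,
                          cells = cells,
                          presence = presence),
       auc = auc)
}

# Toy predictions: presences skewed toward high suitability
set.seed(42)
var_pred <- runif(1000)
occ_pred <- sqrt(runif(100))
rr <- roc_ratio_sketch(occ_pred, var_pred)

# A model no better than chance: presences distributed like the cells
rr_null <- roc_ratio_sketch(var_pred, var_pred)
```

When presences are distributed exactly like the cells, the curve follows the diagonal and the area is 0.5, so areas above 0.5 indicate that presences concentrate in higher-valued cells.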
Sensitivity (TPR) = TP/(TP + FN)
Specificity (TNR) = TN/(TN + FP)
True skill statistic (TSS) = Sensitivity + Specificity - 1
Jaccard's similarity index = TP/(FN + TP + FP)
Sørensen's similarity index (F-measure) = 2TP/(FN + 2TP + FP)
Overprediction rate = FP/(TP + FP)
Underprediction rate = FN/(TP + FN)
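The formulas above translate directly into base R. In the sketch below, background predictions stand in for absences; the helper name `pb_metrics_sketch` and the use of `>=` to classify a prediction as present are assumptions for illustration.

```r
# Hypothetical helper, not part of itsdm: threshold-based metrics
# from predictions at occurrences and at background points.
pb_metrics_sketch <- function(occ_pred, bg_pred, threshold) {
  tp <- sum(occ_pred >= threshold)  # presences predicted as present
  fn <- sum(occ_pred < threshold)   # presences predicted as absent
  fp <- sum(bg_pred >= threshold)   # background predicted as present
  tn <- sum(bg_pred < threshold)    # background predicted as absent
  sensitivity <- tp / (tp + fn)
  specificity <- tn / (tn + fp)
  c(sensitivity = sensitivity,
    specificity = specificity,
    tss = sensitivity + specificity - 1,
    jaccard = tp / (fn + tp + fp),
    sorensen = 2 * tp / (fn + 2 * tp + fp),
    overprediction = fp / (tp + fp),
    underprediction = fn / (tp + fn))
}

# Toy predictions, invented for illustration only
occ_pred <- c(0.9, 0.8, 0.7, 0.3)
bg_pred <- c(0.6, 0.4, 0.2, 0.1)
m <- pb_metrics_sketch(occ_pred, bg_pred, threshold = 0.5)
```

With these toy values TP = 3, FN = 1, FP = 1, and TN = 3, so each metric can be checked by hand against the formulas.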
Peterson, A. Townsend, Monica Papeş, and Jorge Soberón. "Rethinking receiver operating characteristic analysis applications in ecological niche modeling." Ecological modelling 213.1 (2008): 63-72. doi:10.1016/j.ecolmodel.2007.11.008
Hirzel, Alexandre H., et al. "Evaluating the ability of habitat suitability models to predict species presences." Ecological modelling 199.2 (2006): 142-152. doi:10.1016/j.ecolmodel.2006.05.017
Hirzel, Alexandre H., and Raphaël Arlettaz. "Modeling habitat suitability for complex species distributions by environmental-distance geometric mean." Environmental management 32.5 (2003): 614-623. doi:10.1007/s00267-003-0040-3
Leroy, Boris, et al. "Without quality presence-absence data, discrimination metrics such as TSS can be misleading measures of model performance." Journal of Biogeography 45.9 (2018): 1994-2002. doi:10.1111/jbi.13402
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")
# Read the environmental variables and keep four bands
env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12, 16))
# With perfect_presence mode,
# which should be very rare in reality.
mod <- isotree_po(
  obs_mode = "perfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 10,
  sample_size = 0.8, ndim = 2L,
  seed = 123L, nthreads = 1,
  response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)
# Without background samples or absences
eval_train <- evaluate_po(
  mod$model,
  occ_pred = mod$pred_train$prediction,
  var_pred = na.omit(as.vector(mod$prediction[[1]])))
print(eval_train)
# With background samples
# Extract predicted values at the background sample locations
bg_pred <- st_extract(
  mod$prediction, mod$background_samples) %>%
  st_drop_geometry()
eval_train <- evaluate_po(
  mod$model,
  occ_pred = mod$pred_train$prediction,
  bg_pred = bg_pred$prediction,
  var_pred = na.omit(as.vector(mod$prediction[[1]])))
plot(eval_train)