This function calculates two major types of evaluation metrics for presence-only data. The first type is metrics customized for presence-only data, such as the Contrast Validation Index (CVI), the continuous Boyce index (CBI), and ROC_ratio. The second type is presence-background evaluation metrics, computed by extracting background points as pseudo-absence observations.
Arguments
- model (isolation_forest) The extended isolation forest SDM. It could be the item model of POIsotree made by function isotree_po.
- occ_pred (vector of numeric) A vector containing predicted values at occurrence locations.
- bg_pred (vector of numeric) A vector containing predicted values, one for each background point.
- var_pred (vector of numeric) A vector containing predicted values for the whole area. A vector is used to keep this function flexible for multiple types of output.
- threshold (numeric or NULL) The threshold used to calculate threshold-based evaluation metrics. If NULL, a recommended threshold is calculated based on the optimal TSS value. The default is NULL.
- visualize (logical) If TRUE, plot the evaluation figures. The default is FALSE.
Value
(POEvaluation) A list of
- po_evaluation is presence-only evaluation metrics. It is a list of
  - cvi (list) A list of CVI with 0.25, 0.5, and 0.75 as threshold
  - boyce (list) A list of items related to continuous Boyce index (CBI)
  - roc_ratio (list) A list of ROC ratio and AUC ratio
- pb_evaluation is presence-background evaluation metrics. It is a list of
  - confusion matrix (table) A table of confusion matrix. The columns are true values, and the rows are predicted values.
  - sensitivity (numeric) The sensitivity or TPR
  - specificity (numeric) The specificity or TNR
  - TSS (list) A list of info related to true skill statistic (TSS)
    - cutoff (vector of numeric) A vector of cutoff threshold values
    - tss (vector of numeric) A vector of TSS for each cutoff threshold
    - Recommended threshold (numeric) A recommended threshold according to TSS
    - Optimal TSS (numeric) The best TSS value
  - roc (list) A list of ROC values and AUC value
  - Jaccard's similarity index (numeric) The Jaccard's similarity index
  - Sørensen's similarity index (numeric) The Sørensen's similarity index or F-measure
  - Overprediction rate (numeric) The Overprediction rate
  - Underprediction rate (numeric) The Underprediction rate
Details
CVI is the proportion of presence points falling in cells with a habitat suitability index above a threshold (0.5 for example) minus the proportion of all cells of the model prediction above the same threshold. Here we use varied thresholds: 0.25, 0.5, and 0.75.
The continuous Boyce index (CBI) is calculated with moving windows at a resolution of 100 and the Kendall method.
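The CVI definition can be illustrated with a minimal sketch. The helper name cvi_sketch and the toy vectors are hypothetical, and treating a cell exactly at the threshold as "above" is an assumption; this is not the package's internal implementation.

```r
# Hypothetical helper illustrating the CVI definition; not part of itsdm.
cvi_sketch <- function(occ_pred, var_pred, threshold = 0.5) {
  # Proportion of presence points at or above the threshold (assumed boundary rule)
  prop_presence <- mean(occ_pred >= threshold)
  # Proportion of all cells in the study area at or above the same threshold
  prop_cells <- mean(var_pred >= threshold)
  prop_presence - prop_cells
}

# Toy predictions (made up for illustration)
occ_pred <- c(0.9, 0.8, 0.7, 0.6, 0.3)
var_pred <- c(0.1, 0.2, 0.3, 0.4, 0.45, 0.55, 0.6, 0.7, 0.8, 0.9)

# CVI at the three thresholds named in the text
cvi_values <- sapply(
  c(0.25, 0.5, 0.75),
  function(t) cvi_sketch(occ_pred, var_pred, threshold = t))
print(cvi_values)
```

A positive value means presences concentrate in high-suitability cells more than the study area as a whole does.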
The ROC_ratio curve plots the proportion of presences falling above a range of thresholds against the proportion of cells falling above the same thresholds. The area under this modified ROC curve is called AUC_ratio.
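A rough sketch of the modified ROC curve described above is given here. The function name roc_ratio_sketch, the evenly spaced threshold grid, the assumption that predictions lie in [0, 1], and the trapezoidal integration are all illustrative choices rather than the package's actual code.

```r
# Hypothetical helper illustrating the ROC_ratio curve; not part of itsdm.
roc_ratio_sketch <- function(occ_pred, var_pred, n_thresholds = 101) {
  # Thresholds from high to low so both proportions grow from 0 to 1
  # (assumes predictions are scaled to [0, 1])
  thresholds <- seq(1, 0, length.out = n_thresholds)
  # x axis: proportion of cells at or above each threshold
  cell_prop <- sapply(thresholds, function(t) mean(var_pred >= t))
  # y axis: proportion of presences at or above each threshold
  presence_prop <- sapply(thresholds, function(t) mean(occ_pred >= t))
  # Area under the curve by the trapezoidal rule
  n <- length(thresholds)
  area <- sum(diff(cell_prop) * (presence_prop[-1] + presence_prop[-n]) / 2)
  list(
    curve = data.frame(
      threshold = thresholds,
      cell_prop = cell_prop,
      presence_prop = presence_prop),
    auc_ratio = area)
}

# Toy predictions (made up for illustration)
occ_pred <- c(0.9, 0.8, 0.7, 0.6, 0.3)
var_pred <- c(0.1, 0.2, 0.3, 0.4, 0.45, 0.55, 0.6, 0.7, 0.8, 0.9)
res <- roc_ratio_sketch(occ_pred, var_pred)
print(res$auc_ratio)
```

If presences were distributed exactly like the cells, the curve would follow the diagonal and the area would be 0.5, so values above 0.5 suggest the model ranks presence locations higher than the study area in general.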
Sensitivity (TPR) = TP/(TP + FN)
Specificity (TNR) = TN/(TN + FP)
True skill statistic (TSS) = Sensitivity + Specificity - 1
Jaccard's similarity index = TP/(FN + TP + FP)
Sørensen's similarity index (F-measure) = 2TP/(FN + 2TP + FP)
Overprediction rate = FP/(TP + FP)
Underprediction rate = FN/(TP + FN)
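The formulas above can be checked with a short sketch that treats background points as pseudo-absences. The helper name pb_metrics_sketch, the toy vectors, and the cutoff grid used to look for the best TSS are hypothetical; the package may count ties at the threshold or choose cutoffs differently.

```r
# Hypothetical helper applying the formulas above; not part of itsdm.
pb_metrics_sketch <- function(occ_pred, bg_pred, threshold) {
  # Confusion counts, with background points used as pseudo-absences
  tp <- sum(occ_pred >= threshold)  # presences predicted present
  fn <- sum(occ_pred < threshold)   # presences predicted absent
  fp <- sum(bg_pred >= threshold)   # background predicted present
  tn <- sum(bg_pred < threshold)    # background predicted absent
  sensitivity <- tp / (tp + fn)
  specificity <- tn / (tn + fp)
  c(sensitivity = sensitivity,
    specificity = specificity,
    tss = sensitivity + specificity - 1,
    jaccard = tp / (fn + tp + fp),
    sorensen = 2 * tp / (fn + 2 * tp + fp),
    overprediction = fp / (tp + fp),
    underprediction = fn / (tp + fn))
}

# Toy predictions (made up for illustration)
occ_pred <- c(0.9, 0.8, 0.7, 0.6, 0.3)
bg_pred <- c(0.1, 0.2, 0.3, 0.4, 0.45, 0.55, 0.6, 0.7, 0.8, 0.9)
m <- pb_metrics_sketch(occ_pred, bg_pred, threshold = 0.5)
print(round(m, 3))

# Scan an assumed grid of cutoffs and keep the one with the highest TSS
cutoffs <- seq(0.025, 0.975, by = 0.05)
tss <- sapply(
  cutoffs,
  function(t) pb_metrics_sketch(occ_pred, bg_pred, t)[["tss"]])
best_cutoff <- cutoffs[which.max(tss)]
print(best_cutoff)
```

The cutoff scan mirrors the idea behind the recommended threshold: when no threshold is supplied, one that maximizes TSS is a natural choice.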
References
Peterson, A. Townsend, Monica Papeş, and Jorge Soberón. "Rethinking receiver operating characteristic analysis applications in ecological niche modeling." Ecological Modelling 213.1 (2008): 63-72. doi:10.1016/j.ecolmodel.2007.11.008
Hirzel, Alexandre H., et al. "Evaluating the ability of habitat suitability models to predict species presences." Ecological Modelling 199.2 (2006): 142-152. doi:10.1016/j.ecolmodel.2006.05.017
Hirzel, Alexandre H., and Raphaël Arlettaz. "Modeling habitat suitability for complex species distributions by environmental-distance geometric mean." Environmental Management 32.5 (2003): 614-623. doi:10.1007/s00267-003-0040-3
Leroy, Boris, et al. "Without quality presence-absence data, discrimination metrics such as TSS can be misleading measures of model performance." Journal of Biogeography 45.9 (2018): 1994-2002. doi:10.1111/jbi.13402
Examples
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12, 16))

# With perfect_presence mode,
# which should be very rare in reality.
mod <- isotree_po(
  obs_mode = "perfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 10,
  sample_size = 0.8, ndim = 2L,
  seed = 123L, nthreads = 1,
  response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)

# Without background samples or absences
eval_train <- evaluate_po(
  mod$model,
  occ_pred = mod$pred_train$prediction,
  var_pred = na.omit(as.vector(mod$prediction[[1]])))
print(eval_train)

# With background samples
bg_pred <- st_extract(
  mod$prediction, mod$background_samples) %>%
  st_drop_geometry()
eval_train <- evaluate_po(
  mod$model,
  occ_pred = mod$pred_train$prediction,
  bg_pred = bg_pred$prediction,
  var_pred = na.omit(as.vector(mod$prediction[[1]])))
plot(eval_train)