This function calculates two major types of evaluation metrics for presence-only data. The first type is metrics customized for presence-only data, such as the Contrast Validation Index (CVI), the continuous Boyce index (CBI), and ROC_ratio. The second type is presence-background evaluation metrics, computed by treating extracted background points as pseudo-absence observations.
evaluate_po(model, occ_pred, bg_pred = NULL, var_pred, threshold = NULL, visualize = FALSE)
model (isolation_forest) The extended isolation forest SDM. It could be the item model of the POIsotree object made by isotree_po.
occ_pred A vector of predicted values at occurrence locations.
bg_pred (numeric) A vector of predicted values at background points, one value per background point.
var_pred (numeric) A vector of predicted values for the whole area. A vector is used to keep this function flexible for multiple types of output.
threshold (numeric or NULL) The threshold used to calculate threshold-based evaluation metrics. If NULL, a recommended threshold will be calculated based on the optimal TSS value. The default is NULL.
visualize (logical) If TRUE, plot the evaluation figures. The default is FALSE.
(POEvaluation) A list of:
po_evaluation is the presence-only evaluation metrics. It is a list of:
  (list) A list of CVI values with 0.25, 0.5, and 0.75 as thresholds
  (list) A list of items related to the continuous Boyce index (CBI)
  (list) A list of the ROC ratio and the AUC ratio
pb_evaluation is the presence-background evaluation metrics. It is a list of:
  confusion matrix (table) The confusion matrix. The columns are true values, and the rows are predicted values.
  (numeric) The sensitivity or TPR
  (numeric) The specificity or TNR
  (list) A list of information related to the true skill statistic (TSS):
    (numeric) A vector of cutoff threshold values
    (numeric) A vector of TSS for each cutoff threshold
    Recommended threshold (numeric) A recommended threshold according to TSS
    Optimal TSS (numeric) The best TSS value
  (list) A list of ROC values and the AUC value
  Jaccard's similarity index (numeric) The Jaccard's similarity index
  Sørensen's similarity index (numeric) The Sørensen's similarity index
  Overprediction rate (numeric) The overprediction rate
  Underprediction rate (numeric) The underprediction rate
CVI is the proportion of presence points falling in cells whose habitat suitability index is above a threshold (0.5, for example) minus the proportion of all cells of the model above that same threshold. Here we used varied thresholds: 0.25, 0.5, and 0.75.
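The CVI definition above can be sketched in a few lines of base R. This is a minimal illustration, not the package's implementation: the helper name cvi_sketch is hypothetical, the toy prediction vectors are made up, and treating "above the threshold" as "greater than or equal to" is an assumption.

```r
# Hypothetical helper (not part of itsdm): CVI at a single threshold.
# Assumes predictions are suitability values in [0, 1] and that a cell
# counts as suitable when its value is >= threshold.
cvi_sketch <- function(occ_pred, var_pred, threshold = 0.5) {
  # Proportion of presence points in cells at or above the threshold
  p_occ <- mean(occ_pred >= threshold)
  # Proportion of all cells at or above the threshold
  p_all <- mean(var_pred >= threshold)
  p_occ - p_all
}

# Toy predictions, made up for illustration only
occ_pred <- c(0.9, 0.8, 0.7, 0.6, 0.3)
var_pred <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0)

# CVI at the three thresholds mentioned in this page
cvi_values <- sapply(c(0.25, 0.5, 0.75), cvi_sketch,
                     occ_pred = occ_pred, var_pred = var_pred)
print(cvi_values)
```

A positive value means presences are concentrated in cells the model scores as suitable more often than would be expected from the share of such cells in the whole area.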
The continuous Boyce index (CBI) is computed with a moving window at a resolution of 100 and the Kendall correlation method.
The ROC_ratio curve plots the proportion of presences falling above each of a range of thresholds against the proportion of cells falling above the same thresholds. The area under this modified ROC curve is called AUC_ratio.
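As a rough sketch of the modified ROC curve described above, the following base R code builds the curve over a grid of thresholds and integrates it with the trapezoid rule. The helper name roc_ratio_sketch, the threshold grid, and the choice to close the curve at (0, 0) and (1, 1) are assumptions made for illustration; the package may compute these values differently.

```r
# Hypothetical helper (not part of itsdm): modified ROC curve and its area.
# Assumes predictions lie in [0, 1].
roc_ratio_sketch <- function(occ_pred, var_pred,
                             thresholds = seq(0, 1, by = 0.01)) {
  # x axis: proportion of cells at or above each threshold
  cell_prop <- sapply(thresholds, function(t) mean(var_pred >= t))
  # y axis: proportion of presences at or above each threshold
  pres_prop <- sapply(thresholds, function(t) mean(occ_pred >= t))
  # Reverse so x is non-decreasing, then close the curve at both corners
  x <- c(0, rev(cell_prop), 1)
  y <- c(0, rev(pres_prop), 1)
  # Trapezoid rule for the area under the curve
  auc <- sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
  list(curve = data.frame(threshold = thresholds, cell_prop, pres_prop),
       auc_ratio = auc)
}

# Toy data: cells spread evenly, presences concentrated at high values
var_pred <- (1:100) / 100
occ_pred <- c(0.85, 0.9, 0.95)
res <- roc_ratio_sketch(occ_pred, var_pred)
print(res$auc_ratio)
```

When presences are scored no differently from the cells in general, the curve follows the diagonal and the area is 0.5; larger areas indicate that presences sit in a small, highly scored share of the cells.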
Sensitivity (TPR) = TP/(TP + FN)
Specificity (TNR) = TN/(TN + FP)
True skill statistic (TSS) = Sensitivity + Specificity - 1
Jaccard's similarity index = TP/(FN + TP + FP)
Sørensen's similarity index (F-measure) = 2TP/(FN + 2TP + FP)
Overprediction rate = FP/(TP + FP)
Underprediction rate = FN/(TP + FN)
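The formulas above can be checked by hand with a short base R sketch that treats background predictions as pseudo-absences. The helper name pb_metrics_sketch, the toy vectors, the cutoff grid, and the ">= threshold means predicted present" rule are all assumptions for illustration, not the package's code.

```r
# Hypothetical helper (not part of itsdm): presence-background metrics
# at one threshold, with background points used as pseudo-absences.
pb_metrics_sketch <- function(occ_pred, bg_pred, threshold) {
  tp <- sum(occ_pred >= threshold)  # presences predicted present
  fn <- sum(occ_pred < threshold)   # presences predicted absent
  fp <- sum(bg_pred >= threshold)   # background predicted present
  tn <- sum(bg_pred < threshold)    # background predicted absent
  sensitivity <- tp / (tp + fn)
  specificity <- tn / (tn + fp)
  c(sensitivity = sensitivity,
    specificity = specificity,
    tss = sensitivity + specificity - 1,
    jaccard = tp / (fn + tp + fp),
    sorensen = 2 * tp / (fn + 2 * tp + fp),
    overprediction = fp / (tp + fp),  # NaN if nothing is predicted present
    underprediction = fn / (tp + fn))
}

# Toy predictions, made up for illustration only
occ_pred <- c(0.9, 0.8, 0.7, 0.4)
bg_pred <- c(0.6, 0.3, 0.2, 0.1, 0.5, 0.2)
metrics <- pb_metrics_sketch(occ_pred, bg_pred, threshold = 0.5)
print(round(metrics, 3))

# Sweep a grid of cutoffs and keep the one with the highest TSS,
# mirroring the idea of a TSS-based recommended threshold
cutoffs <- seq(0.05, 0.95, by = 0.1)
tss <- sapply(cutoffs, function(t) {
  pb_metrics_sketch(occ_pred, bg_pred, t)[["tss"]]
})
best_cutoff <- cutoffs[which.max(tss)]
best_tss <- max(tss)
```

Because background points are not confirmed absences, the false positive count here is only a proxy, which is the caveat raised by Leroy et al. (2018) in the references.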
Peterson, A. Townsend, Monica Papeş, and Jorge Soberón. "Rethinking receiver operating characteristic analysis applications in ecological niche modeling." Ecological Modelling 213.1 (2008): 63-72. doi:10.1016/j.ecolmodel.2007.11.008
Hirzel, Alexandre H., et al. "Evaluating the ability of habitat suitability models to predict species presences." Ecological Modelling 199.2 (2006): 142-152. doi:10.1016/j.ecolmodel.2006.05.017
Hirzel, Alexandre H., and Raphaël Arlettaz. "Modeling habitat suitability for complex species distributions by environmental-distance geometric mean." Environmental Management 32.5 (2003): 614-623. doi:10.1007/s00267-003-0040-3
Leroy, Boris, et al. "Without quality presence-absence data, discrimination metrics such as TSS can be misleading measures of model performance." Journal of Biogeography 45.9 (2018): 1994-2002. doi:10.1111/jbi.13402
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)

data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"

# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12, 16))

# With perfect_presence mode,
# which should be very rare in reality.
mod <- isotree_po(
  obs_mode = "perfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 30,
  sample_size = 0.8, ndim = 2L,
  seed = 123L, response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)

# Without background samples or absences
eval_train <- evaluate_po(
  mod$model,
  occ_pred = mod$pred_train$prediction,
  var_pred = na.omit(as.vector(mod$prediction[])))
print(eval_train)

# With background samples
bg_pred <- st_extract(
  mod$prediction, mod$background_samples) %>%
  st_drop_geometry()
eval_train <- evaluate_po(
  mod$model,
  occ_pred = mod$pred_train$prediction,
  bg_pred = bg_pred$prediction,
  var_pred = na.omit(as.vector(mod$prediction[])))
plot(eval_train)