The following is the established format for referencing this article:
Van Wilgenburg, S. L., D. A. W. Miller, D. T. Iles, S. Haché, C. M. Francis, D. D. Hope, J. D. Toms, and K. L. Drake. 2024. Evaluating trade-offs in spatial versus temporal replication when estimating avian community composition and predicting species distributions. Avian Conservation and Ecology 19(1):11.
ABSTRACT
Species distribution modeling is important for predicting species responses to environmental change, but model accuracy can be limited by a lack of data in remote areas. Hierarchically stratified surveys (cluster sampling) offer an efficient approach to sampling in remote areas, but an appropriate balance is needed between cost efficiency and statistical independence. The cost-effectiveness of cluster sampling likely varies with temporal sampling intensity (e.g., single vs. multiple repeat samples) due to differences in spatial autocorrelation. Our aim was to assess the trade-offs between spatial and temporal replication and to optimize sampling designs in which temporal replication occurs. We used bootstrap resampling to create alternative designs from multi-species avian point-count and autonomous recording unit data. We varied the number of primary sampling units (PSUs), secondary sampling units per PSU (SSUs), and temporal repeat samples (i.e., visits) at each SSU. We fit species accumulation curves to examine how spatial and temporal replication influenced species accumulation. We split data into spatially independent model training and validation datasets, fit species distribution models (SDMs) for 47 species using generalized linear models, and examined how prediction accuracy changed with sampling intensity to evaluate the cost-benefit trade-offs between spatial and temporal replication within PSUs. We found that spatial and temporal replication were partially redundant: adding more visits had less influence on predictive accuracy when there were more SSUs, and vice versa. The cost-benefit of increasing spatial replication within PSUs varied with the costs of accessing SSUs. The optimal number of SSUs per PSU varied with both temporal replication and the number of unique PSUs sampled. In general, using ≤ 3 SSUs per PSU produced the most accurate SDM predictions when the number of PSUs was high. When the number of PSUs was low and/or SSU costs were low, increasing clustering within PSUs can optimize sampling.
INTRODUCTION
Many recent species distribution modeling efforts have relied upon data from citizen science programs, such as the North American Breeding Bird Survey (Sauer et al. 2017), eBird (Sullivan et al. 2014), breeding bird atlases (Davidson et al. 2015, Mccabe et al. 2018, Artuso et al. 2019), and Journey North (Howard et al. 2010). Citizen science programs generate large quantities of data that have improved our understanding of factors influencing the distribution of various taxa. Nevertheless, the potential contribution of citizen science is limited in areas with low human population densities and poor accessibility (Amano et al. 2016). Furthermore, the data are likely to be non-representative because habitat and species occurrence generally differ between accessible (e.g., along roadsides or rivers) versus non-accessible areas (Geldmann et al. 2016, Johnston et al. 2020, Robinson et al. 2020). Statistical models incorporating species-environment relationships can potentially reduce bias associated with extrapolation to unsurveyed areas, but the reliability will depend on the ability to measure appropriate covariates, on whether differential habitat selection is modeled, and on whether species occupy all potentially suitable habitat (Weiser et al. 2020). Given these limitations, surveying geographic gaps and unsampled covariate space is important.
Gaps in coverage by citizen science efforts are most effectively filled using carefully designed expert surveys (Callaghan et al. 2019, Robinson et al. 2020). Given the high costs of expert surveys in remote areas, such surveys should optimize geographic coverage while minimizing costs (Van Wilgenburg et al. 2020). Hierarchical sampling designs are a cost-efficient sampling and allocation approach to optimize information content from surveys while minimizing costs (Pavlacky et al. 2017, Callaghan et al. 2019, Van Wilgenburg et al. 2020). Hierarchical study designs include the clustering of secondary sample units (SSU) within primary sampling units (PSU), which can greatly reduce costs associated with travel among survey locations (Johnson et al. 2009, Mccabe et al. 2018, Van Wilgenburg et al. 2020). However, the benefit of increasing sample size using cluster sampling must be weighed against the risk of increasing spatial autocorrelation (Foster et al. 2023).
Clustered sampling is often favored when the temporal sampling window is restricted and precludes long-distance travel among locations in addition to when the cost of traveling among SSUs is much lower than traveling among PSUs. For example, breeding songbirds are generally surveyed using point counts conducted between 30 minutes prior to sunrise until ~4.5 hours after sunrise during the main breeding season (Matsuoka et al. 2014). The narrow dawn chorus window often precludes visiting multiple PSUs during a single morning. In contrast, use of programmable autonomous recording units (ARUs; Shonfield and Bayne 2017) may allow greater flexibility in sampling. These technologies allow simultaneous sampling of multiple PSUs because users can deploy and retrieve hardware outside the restricted temporal sampling window. Furthermore, ARUs take repeated samples throughout the season, increasing temporal sample coverage for any given location (Shonfield and Bayne 2017, Metcalf et al. 2021). These characteristics can make ARUs an attractive alternative for sampling remote areas. However, the cost to benefit ratio of using clustered sampling with ARUs likely differs from human observers in the field. Although the implications of within-cluster spacing have been examined for spatially explicit capture-recapture data (Clark 2019, Efford and Boulanger 2019), there remains a need to determine optimal cluster sampling approaches for species distribution modeling.
The North American boreal forest is a largely remote biome covering approximately 552 million hectares (Brandt et al. 2013), which is home to over 300 species of birds (North American Bird Conservation Initiative 2016), many of which lack sufficient data for conservation management (Van Wilgenburg et al. 2020). A growing anthropogenic footprint (Pickell et al. 2015), climate change (Roy et al. 2019), and large apparent declines in the avifauna (Rosenberg et al. 2019) have increased concerns over boreal forest bird conservation. Despite this, there has been relatively sparse data collection owing to poor accessibility and low human population densities (Roy et al. 2019). As a result, historic efforts have been limited to the southern periphery of the biome (Van Wilgenburg et al. 2015, 2020, Roy et al. 2019) with few exceptions (Machtans et al. 2014). To address these gaps, recent efforts have implemented spatially balanced and hierarchically stratified sampling designs incorporating habitat stratification and access costs to improve efficiency (Van Wilgenburg et al. 2020). The design allows for either skilled observers doing traditional point counts or the use of ARUs (Shonfield and Bayne 2017, Roy et al. 2019). In general, data obtained from human observers and ARUs are comparable given identical sampling effort (Pankratz et al. 2017, Van Wilgenburg et al. 2017, Darras et al. 2018), but the efficiency of each method differs depending upon avian community composition and the number of species detected visually (Drake et al. 2021). Songbirds, nightjars, owls, and shorebirds in the boreal forest are primarily detected acoustically, making both methods suitable (Shonfield and Bayne 2017). Leaving programmable ARUs in situ for extended durations allows for temporal replication thereby improving detection of hard-to-detect species (MacKenzie and Royle 2005) and estimates of overall species richness and distribution (Metcalf et al. 2021). 
Theoretically, improved estimation of the species pool within a PSU should increase the number of species for which distributions can be modeled accurately; however, the trade-offs between spatial and temporal replication need study to ensure efficient use of resources.
Our goal is to determine how sampling design affects our ability to measure community composition and predict species distributions within a multi-species hierarchical sampling framework when temporal revisits are logistically feasible with minimal additional cost (i.e., using ARUs) versus when they are not (i.e., human point counts). We develop and demonstrate our approach using bootstrap resampling methods to simulate datasets under different sampling scenarios, using empirical data collected with a combination of human point counts and ARUs from a large-scale hierarchical monitoring program in the boreal forest biome of Canada (Van Wilgenburg et al. 2020). We evaluate how spatial and temporal replication affect spatial autocorrelation and the ability to detect species present within a local community, as well as to predict the occurrence of species in locations not used in model creation. We then develop and present an approach to combine these results with functions describing monetary program costs and logistical constraints to guide decisions about survey design. We use optimization of sampling for breeding bird atlases (Mccabe et al. 2018) as a motivating example and therefore demonstrate our approach by optimizing for the ability to predict species distributions on average (across species), and we discuss how our methods can be adapted to alternative objective functions and scenarios.
METHODS
Data collection
We developed our model using data collected in a study area stretching over approximately 1750 km (Fig. 1) from a southeastern limit in Riding Mountain National Park, Manitoba (50° 40' 57.9" N, 99° 44' 57.9" W) northwest to the vicinity of Wekweètì, Northwest Territories (64° 11' 25" N, 114° 10' 58" W). Habitat composition within the study area is described in Appendix 1.
We sampled the avian community using one of two approaches. At 60% of SSUs, we used standard 10-minute point-count surveys during which trained observers recorded all birds heard and seen following protocols recommended by Matsuoka et al. (2014; full details in Appendix 1). We sampled the remaining 40% of SSUs using ARUs, which were pre-programmed to record ≥ 6 10-minute intervals over the course of a dawn chorus during the same time period as point counts, with this schedule repeated over multiple dates (Appendix 1). Wherever possible, we attempted to leave ARUs in situ and record over ≥ 4 mornings of good survey conditions; however, we occasionally deployed ARUs for as little as a single morning. We processed six temporal replicates, which should have allowed us to achieve 95% confidence in species presence/absence for > 50% of species detected, based on the median number of temporal replicates (k) needed to achieve 95% confidence across 118 species (excluding domestic animals) as estimated by Bayne et al. (2017). We manually processed recordings by tagging cues within the WildTrax platform for processing environmental sensor data (https://www.wildtrax.ca/home). Expert listeners estimated counts of individuals using a combination of stereo effect, signal strength, timing of counter-singing events, and occasionally individual variation in song spectral characteristics. Previous work has shown that abundance estimates from ARUs correlate strongly with abundance estimates from point counts (Van Wilgenburg et al. 2017); however, experimental evidence suggests that abundance tends to be underestimated when birds are very abundant (≥ ca. 7 individuals; Drake et al. 2016). Prior to analyses, we standardized effort by taking counts exclusively from the first three minutes of each recording/point count. Further details of deployment strategies and ARU data processing are in Appendix 1.
We selected survey locations within our study area using a spatially balanced hierarchical sampling design (Van Wilgenburg et al. 2020) consisting of a sample frame of 5-km wide hexagons representing the primary sampling units (PSUs), within which grids of secondary sampling units (SSUs, or point-count sampling stations) were nested (Fig. 1; see also Van Wilgenburg et al. 2020). Further details on study site selection are in Appendix 1. We conducted sampling in Saskatchewan and Manitoba during the summers of 2014–2019 and in the Northwest Territories in 2019. We sampled 3251 SSUs distributed among 325 PSUs (Fig. 1) across the Boreal Taiga Plains (123 PSUs; 1050 SSUs), Taiga Shield and Hudson Plains (88 PSUs; 898 SSUs), and Boreal Softwood Shield (123 PSUs; 1243 SSUs) ecozones; note that the number of PSUs tallied across ecozones exceeds the total number of PSUs because one PSU contained SSUs in two ecozones. The ARUs were used at a total of 166 PSUs across the Boreal Taiga Plains (82 PSUs; 686 SSUs), Taiga Shield and Hudson Plains (10 PSUs; 123 SSUs), and Boreal Softwood Shield (74 PSUs; 509 SSUs) ecozones. We sampled the remaining PSUs with human observers. Data are available in Appendix 3 (3.1).
Study design components
We used a subset of 80 PSUs sampled with ARUs that had a minimum of four visits to each SSU (some ARUs failed prior to collecting six temporal replicates) and a minimum of six SSUs sampled. We used these data to examine the following three parameters of the spatial and temporal sampling design (range of values considered in resampling simulations):
- The number of primary sampling units (20, 30, 40, 50, 60, and 70) across the study area;
- The number of secondary sampling units sampled within each PSU (1–6 randomly selected from among 6–18 available);
- The number of temporal replicate observations for each SSU (1–4 3-minute visits within a single breeding season; for more details see Appendix 1).
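The subsampling over these three design axes can be sketched as follows. This is an illustrative Python sketch, not the authors' code (their analyses were run in R); the nested-dictionary data layout, function name, and seed handling are our own assumptions.

```python
import random

def resample_design(data, n_psu, n_ssu, n_visits, seed=1):
    """Draw one simulated dataset for a candidate design.

    `data` maps PSU id -> {SSU id -> list of per-visit observations};
    the design keeps `n_psu` PSUs, `n_ssu` SSUs per retained PSU, and
    `n_visits` visits per retained SSU, all sampled at random.
    """
    rng = random.Random(seed)
    psus = rng.sample(sorted(data), n_psu)
    design = {}
    for p in psus:
        ssus = rng.sample(sorted(data[p]), min(n_ssu, len(data[p])))
        design[p] = {
            s: rng.sample(data[p][s], min(n_visits, len(data[p][s])))
            for s in ssus
        }
    return design
```

Each call yields one simulated dataset for a given design; repeating the call with different seeds mirrors the bootstrap structure used in the paper.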
Spatial autocorrelation
Our first goal was to determine how increasing temporal replication affected spatial autocorrelation. We used a subset of 86 PSUs sampled with ARUs in which at least 4 temporal revisits were completed for subsequent estimation of spatial autocorrelation. Data from all available SSUs (n = 922) within this subset of PSUs were included in subsequent analyses. We drew 500 bootstrap resamples of visits for each SSU and calculated mean abundance across 1 versus 4 temporal samples at the SSU for each species. We fit non-parametric spline correlograms (Bjørnstad and Falck 2001) to the mean abundance estimates to estimate spatial covariance and the correlation length (the distance beyond which counts are no more similar than expected by chance across the study area) in each of the bootstrap samples. We estimated spatial covariance values for distances of 300, 420, and 600 m to evaluate logistically plausible alternative SSU spacings that could be implemented in the field.
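The spline correlogram of Bjørnstad and Falck (2001) is implemented in R; as a rough stand-in for intuition, the correlation of counts between SSU pairs at a given separation distance can be estimated empirically. This sketch (function name, binning tolerance, and data layout are hypothetical) standardizes counts and averages the cross-products over pairs near the target distance:

```python
import math
from statistics import mean

def pairwise_correlation(points, counts, dist, tol=50.0):
    """Empirical spatial correlation of counts for SSU pairs separated
    by roughly `dist` metres (within +/- tol); a crude stand-in for a
    non-parametric spline correlogram."""
    mu = mean(counts)
    sd = mean([(c - mu) ** 2 for c in counts]) ** 0.5
    products = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if abs(math.dist(points[i], points[j]) - dist) <= tol:
                # cross-product of standardized counts for this pair
                products.append(((counts[i] - mu) / sd) * ((counts[j] - mu) / sd))
    return mean(products) if products else float("nan")
```

Evaluating this at 300, 420, and 600 m would correspond to the candidate SSU spacings considered in the paper.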
Species accumulation
Our second goal was to determine how within-PSU spatial and temporal replication affected the relative proportion of the total species pool detected. Prior to analysis, we limited the species to a subset of 104 species comprising passerines, shorebirds, and woodpeckers (Appendix 2, Table A2.1), for which point count and acoustic recording methods are appropriate survey techniques. We excluded gulls, raptors, waterfowl, rails, and passerines commonly detected as flyovers. Within our resampling procedures, however, we defined the total species pool as the subset of species detected within the PSUs used as our training data, where we split data into training (85%) and test (15%) datasets within each bootstrap iteration. We used the training data for our SDMs to define the species pool to facilitate automating both the species accumulation and SDM analyses within the same bootstrap resampling exercise. For each sampling scenario (number of PSUs, SSUs, and visits), within each bootstrap sampling iteration, we recorded the number of species detected and compared this with the total number of species in the full training dataset to evaluate how species accumulation is affected by sampling design. We used species accumulation curves and 95% confidence intervals to compare between scenarios.
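The quantity tracked in each bootstrap iteration is simply the fraction of the training-data species pool detected at least once under the resampled design. A minimal sketch, assuming a hypothetical data layout in which each visit is represented as a set of species codes:

```python
def proportion_detected(design_data, species_pool):
    """Fraction of the training-data species pool detected at least once
    in one resampled design. `design_data` maps PSU -> SSU -> list of
    visits, where each visit is a set of species codes."""
    seen = set()
    for ssus in design_data.values():
        for visits in ssus.values():
            for visit in visits:
                seen |= set(visit)
    return len(seen & set(species_pool)) / len(species_pool)
```

Averaging this value over bootstrap iterations for each design yields the species accumulation comparisons reported in the Results.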
Predicting species distributions
Our third objective was to determine how spatial and temporal replication affected our ability to predict the occurrence of species in withheld PSUs. We again used the sub-sampling strategy previously defined to replicate datasets collected under alternative spatial and temporal sampling intensities and withheld 15% of the PSUs as spatially independent testing data to evaluate the predictive performance of distribution models fit to the training data. The spatial independence of the testing data produces a more accurate and conservative assessment of model performance (Ploton et al. 2020).
We generated predictions of species distributions using a simple and standardized modeling framework. The response variable for models was the observed count during a single 3-minute ARU recording at an SSU. We fit generalized linear models (GLMs) with a log link and Poisson error distribution to estimate relationships to environmental covariates (Guisan et al. 2002). We fit models using main effects only (Appendix 3; 3.2 and 3.3) and did not include parameters adjusting for spatial or temporal autocorrelation because our spatially balanced and nested bootstrapping approach was designed to propagate any autocorrelation to our results (i.e., measures of out-of-sample predictive performance), reflecting the potential reduction in statistical efficiency related to clustering. Specifically, because out-of-sample predictive accuracy was calculated on entire PSUs, spatially independent of the training data, any changes in prediction accuracy between designs associated with varying within-PSU sampling intensity would be due to spatial or temporal replication and not due to added predictive power from autocorrelation structures specified in the models. In addition, it would not be possible to fit models including spatial covariance for design combinations with little to no spatial replication within the PSU, making a common model structure without autocorrelation terms useful.
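A minimal sketch of this model fitting step, reimplementing a Poisson GLM with log link via iteratively reweighted least squares in numpy (the paper's models were fit in R; the function name is ours). The log-scale detection offset enters the linear predictor but contributes no estimated coefficient, mirroring how QPAD offsets are used:

```python
import numpy as np

def fit_poisson_glm(X, y, offset=None, n_iter=50):
    """Poisson GLM with log link via iteratively reweighted least squares.
    X must include an intercept column; `offset` is a known log-scale
    term (e.g., a detection correction) added to the linear predictor."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    off = np.zeros(len(y)) if offset is None else np.asarray(offset, float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta + off          # linear predictor
        mu = np.exp(eta)              # expected count
        W = mu                        # IRLS weights for Poisson/log link
        z = (eta - off) + (y - mu) / mu  # working response, offset removed
        WX = X * W[:, None]
        beta = np.linalg.solve(X.T @ WX, X.T @ (W * z))
    return beta
```

Predicted counts for new locations follow as `np.exp(X_new @ beta + offset_new)`.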
We included five environmental covariates in all models to reflect both biotic and abiotic factors influencing distributions and minimizing covariation among predictors based on previous work (Stralberg et al. 2015, 2018). Using the “envirem” package (Title and Bemmels 2018), we calculated mean annual temperature (°C), climate moisture index (mean annual precipitation/potential evapotranspiration), and terrain roughness index, which indexes changes in slope (Wilson et al. 2007) as climatic covariates. In addition, we represented local habitat for each SSU using (1) proportion of needle-leaved trees and (2) stand canopy closure (at 250 m resolution) from Beaudoin et al. (2014).
We accounted for detection probability in our models by including an offset term in our GLMs to correct for factors previously determined to affect detection rates during point count surveys. Survey-specific (date, time, and location) offsets were estimated in an a priori analysis using the "QPAD" approach to fitting time-removal and distance estimation models described by Sólymos et al. (2013). The QPAD-estimated offsets allowed us to correct estimates for differential detection related to date, time of day, percent tree cover, and land cover using QPAD version 0.0-3 (Sólymos 2016) in R (R Core Team 2022). We fit species distribution models for all appropriately surveyed species detected on ≥ 20 PSUs and with ≥ 50 total individuals detected (Appendix 2, Table A2.1). Models consistently failed to converge for Connecticut warbler (Oporornis agilis), and we therefore excluded it from all reported SDM results. In total, we fit distribution models to 3,384,000 simulated datasets, i.e., 47 species × 6 numbers of PSUs × 6 numbers of SSUs × 4 numbers of temporal revisits × 500 iterations.
We measured predictive performance based on the ability of the model to accurately predict the probability of species occurrence in withheld PSUs. We derived the predicted probability of occurrence from our count models as the predicted probability of obtaining a non-zero count at a location, which relates to the predicted mean count λ as 1 − e^(−λ). We used the Brier score (Σ[predicted probability − observed outcome]²/n; Brier 1950) and mean squared error (MSE) to assess model fit. We calculated Brier scores and MSE using all observations in the validation dataset. We only present Brier scores because both metrics produced similar results, and we selected Brier scores because they are strictly proper scoring rules in that they are minimized in expectation by the true probability (Gneiting and Raftery 2012).
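Both quantities follow directly from the Poisson model: the probability of a non-zero count is 1 − e^(−λ), and the Brier score averages squared differences between predicted probabilities and observed 0/1 outcomes. A minimal sketch (function names are ours):

```python
import math

def occupancy_probability(lam):
    """P(count > 0) for a Poisson count with mean `lam`: 1 - exp(-lam)."""
    return 1.0 - math.exp(-lam)

def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and
    observed binary outcomes (Brier 1950); lower is better."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)
```

Because the Brier score is strictly proper, reporting the true occurrence probabilities minimizes it in expectation, which is what makes it a suitable optimization target here.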
Measuring benefits of sampling design changes
To compare predictive accuracy (i.e., Brier score) across the full range of potential survey designs in the 3,384,000 simulated datasets, we fit a statistical model with log Brier score as the response variable, and number of PSUs, SSUs, and visits as covariates. The model was fit using a generalized additive model (GAM; Wood 2017) with an interaction between covariates as a tensor smoothed function with four bases, and a random effect for the bootstrap iteration (Appendix 3; 3.3). The GAMs were fit using the “mgcv” package (Wood 2017) in R (R Core Team 2022). We then used the fitted model to: (1) evaluate the probability that a specific baseline survey scenario would provide more accurate predictive ability than a competing scenario and (2) to find the optimal design choices under budgetary constraints.
To isolate the impact of increasing within-PSU spatial versus temporal replication from that of increasing the number of PSUs, we used simulations to test four competing scenarios in which we held the number of PSUs constant at 40 and altered the number of SSUs and visits. Specifically, we created a baseline scenario in which we assumed a single SSU would be sampled per PSU with six temporal visits, and we compared this baseline to three alternative scenarios. The first alternative scenario included six SSUs per PSU but a single temporal sample per SSU, to test whether directly swapping spatial for temporal sampling effort improved predictive accuracy. Barring equipment failures, we anticipated that repeat temporal samples would be analyzed whenever an ARU was used, and we therefore also created two scenarios holding the number of PSUs constant at 40 and the number of visits constant at 6 but including 3 versus 6 SSUs. We simulated 5000 Brier score values for each scenario, drawing values from a normal distribution whose mean was the GAM-predicted value for a given scenario and whose standard deviation was the residual standard deviation from the GAM. We also added an additional random normal error term with a mean of 0 and a standard deviation derived from the standard error of the predicted values. We used the proportion of simulations in which the alternative scenario had a lower predicted Brier score than the baseline scenario as an estimate of the probability that the alternative scenario was an improvement in the sampling design.
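This scenario comparison can be sketched as below, assuming the GAM-predicted mean log Brier scores, residual standard deviation, and prediction standard errors are available as plain numbers (all inputs in the example are hypothetical placeholders for GAM outputs):

```python
import numpy as np

def prob_improvement(mu_alt, mu_base, resid_sd, se_alt, se_base,
                     n_sim=5000, seed=1):
    """Probability that an alternative design beats the baseline.

    Each design's log Brier scores are drawn as normal(mu, resid_sd)
    plus an extra normal(0, se) term for prediction uncertainty,
    mirroring the simulation from the fitted GAM described above.
    """
    rng = np.random.default_rng(seed)
    alt = rng.normal(mu_alt, resid_sd, n_sim) + rng.normal(0, se_alt, n_sim)
    base = rng.normal(mu_base, resid_sd, n_sim) + rng.normal(0, se_base, n_sim)
    # fraction of simulations where the alternative scores lower (better)
    return float(np.mean(alt < base))
```

Values near 0.5 indicate the designs are statistically comparable; values near 1 indicate the alternative is very likely an improvement.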
Optimizing sampling designs
We considered two logistical approaches to field work in boreal Canada. In both cases, we considered an annual budget of $70,000 for 5 years, i.e., a common sampling window for periodic citizen science projects like breeding bird atlases (Mccabe et al. 2018), which constrains the total numbers of surveys that can be conducted. All monetary values are given in Canadian currency.
First, we evaluated a monitoring program deploying ARUs that can cheaply collect repeated surveys at a location, but which are expensive to deploy and retrieve. In this case, the program is interested in determining the optimal number of PSUs and SSUs at which to deploy ARUs, where each deployed ARU would collect up to 6 temporal replicates, i.e., 10-minute recordings. In this scenario, we assumed PSUs are accessed by chartered helicopter at an average cost of $2000 per PSU and each SSU cost $1000 to establish. Our SSU costs included ARU costs averaged over an anticipated 10-year lifespan and replacement of 4% destroyed in 10 years ($165), SD cards ($140), microphone replacement ($27), batteries ($27), ARU maintenance including parts and labor ($91), and within PSU helicopter costs ($550 assuming each SSU is accessed by helicopter with staff walking ≤ 100 m to deploy ARUs based on 15 minutes @$2200 hr-1). We assumed each 3-minute temporal replicate per SSU would cost $36 for processing and quality control by expert listeners.
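Under the stated assumptions ($2000 helicopter access per PSU, $1000 to establish each SSU, and $36 per processed 3-minute replicate), the total cost of an ARU design is a simple function of the three design choices. A sketch (function name is ours):

```python
def aru_scenario_cost(n_psu, n_ssu, n_visits,
                      psu_cost=2000, ssu_cost=1000, visit_cost=36):
    """Total cost (CAD) of an ARU design: helicopter access per PSU,
    establishment per SSU, and expert processing per 3-minute visit.

    The $1000 SSU cost decomposes as in the text:
    165 + 140 + 27 + 27 + 91 + 550 = 1000 (ARU amortization, SD cards,
    microphones, batteries, maintenance, within-PSU helicopter time).
    """
    return n_psu * (psu_cost + n_ssu * (ssu_cost + n_visits * visit_cost))
```

For example, 40 PSUs with 3 SSUs each and 6 visits per SSU costs 40 × (2000 + 3 × (1000 + 216)) = $225,920, within a $350,000 five-year budget.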
Second, we considered monitoring conducted using expert birders (human survey scenarios). In this situation, birders access each PSU for a single day and collect a single 10-minute survey at each SSU per common protocols (Matsuoka et al. 2014), with a team of 2 conducting up to 18 SSUs within each PSU. In this case, the program is interested in the optimal number of PSUs and SSUs human observers should access. We assumed PSUs must be accessed by chartered helicopter at an average cost of $1000 per PSU (half the cost of ARU surveys owing to not requiring a second visit), but that SSU costs were negligible because staff would walk between SSUs rather than being ferried by helicopter and because staff salaries are separate from operational budgets. We replaced transcription costs with costs for data entry and quality assurance/quality control ($10 SSU-1).
We used our GAM regression to estimate prediction accuracy (Brier scores) under the range of PSU, SSU, and visits to identify the combination that provided the greatest prediction accuracy given a fixed budget for our scenarios (Appendix 3; 3.4). We also used variance components from the fitted GAM to calculate uncertainty in predicted Brier scores, allowing us to evaluate whether cheaper survey scenarios could plausibly yield similar predictive accuracy to the optimal scenario, i.e., falling within the 95% credible interval.
For our ARU scenarios, we generated predicted Brier scores and estimated the program cost for every combination of number of PSUs (20–80), number of SSUs (1–9), and number of visits (1–6). We selected the optimal scenario for a fixed budget by first excluding all design scenarios in which the total estimated costs exceeded our total budget and subsequently selecting the design with the minimum predicted Brier score (i.e., greatest accuracy). In addition, we used the optimization algorithm to examine how the optimal number of SSUs within PSUs was influenced by the interaction of number of PSUs and number of temporal replicates by examining each combination of 30 versus 60 PSUs and 1 versus 6 temporal replicates. We used 5000 simulations from our GAM model to propagate error to construct 95% credible intervals around estimated log(Brier score) to evaluate the statistical comparability of competing scenarios (Appendix 3; 3.4).
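The budget-constrained selection described above amounts to a grid search: enumerate the design grid, discard combinations whose cost exceeds the budget, and keep the design with the lowest predicted Brier score. A sketch with the paper's ARU grid as defaults; the `predict_brier` and `cost` callables are hypothetical stand-ins for the fitted GAM and the cost model:

```python
import itertools

def optimal_design(predict_brier, cost, budget,
                   psus=range(20, 81), ssus=range(1, 10), visits=range(1, 7)):
    """Return the affordable (PSUs, SSUs, visits) design with the lowest
    predicted Brier score, or None if nothing fits the budget."""
    feasible = [(p, s, v)
                for p, s, v in itertools.product(psus, ssus, visits)
                if cost(p, s, v) <= budget]
    if not feasible:
        return None
    return min(feasible, key=lambda d: predict_brier(*d))
```

The same routine, with a different cost model and a visits grid fixed at 1, covers the human-observer scenario described below.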
Finally, we examined optimization for surveys conducted by expert birders by replicating the optimization with an altered cost model to reflect logistical differences but assuming the same overall budget ($70,000 yr-1 for 5 years). We ran 500 bootstraps fitting the same SDMs (with the addition of a factor for survey type, i.e., human vs. ARU) to the 47 species used in the ARU scenarios, but only used a single visit to each SSU. More PSUs were sampled with single-visit surveys, so we examined 9 combinations of number of PSUs (n = 20–180 in increments of 20) and 8 combinations of number of SSUs (n = 1–15 in increments of 2). We fit a GAM regression model to the bootstrap estimates to allow us to compare predictive accuracy (i.e., Brier score) among a range of survey designs not explicitly considered in our empirical bootstrapping. Specifically, we fit a GAM with log(Brier score) as the response variable, a tensor smoothed function with four bases for the interaction between number of PSUs and number of SSUs, and a random effect for the bootstrap iteration. In simulating from our GAM model, we examined the predictive accuracy for all design combinations of number of SSUs considered (1–8) and number of PSUs (20–180), but restricted the number of visits to 1 per our expert birder logistics (Appendix 3; 3.5). We again propagated error around the predicted log(Brier score) for the survey and calculated the percent reduction in Brier scores that would be achieved by switching from the optimal ARU scenario to a human survey scenario, and we report the cost differences between the two designs.
RESULTS
Spatial autocorrelation
Increasing the number of temporal replicates from 1 to 4 resulted in a median increase in correlation length of 51 m; however, there was substantial interspecific variation (inter-quartile range [IQR] = 16–390 m; range = 0–1943 m; Appendix 4, Fig. A4.1). Median correlation length with a single temporal replicate was 527 m (IQR = 202–851 m, range = 114–2340 m) compared to 740 m (IQR = 379–948 m, range = 137–2942 m) when SSUs had 4 temporal revisits. Correlation between counts at adjacent SSUs (300 m apart; Appendix 4, Fig. A4.2) was generally weak (median = 0.02; range = −0.03 to 0.16) with a single temporal replicate, and weak to moderate with 4 temporal replicates (median = 0.05; range = −0.04 to 0.28). Increasing from 1 to 4 temporal replicates increased correlation values between adjacent SSUs 300 m apart (absolute change in spatial correlation given; median = 0.02; IQR = 0.01–0.05, range = −0.02 to 0.14), and the effect was similar when SSUs were separated by 420 m (median = 0.03; IQR = 0.01–0.05, range = −0.01 to 0.10) or 600 m (median = 0.02; IQR = 0.01–0.06, range = −0.02 to 0.14).
Species accumulation
Increasing the number of secondary samples and the number of (three-minute) temporal replicates both substantially increased the proportion of the bird community observed; however, the influence of spatial replication was greater than that of temporal replication (Fig. 2). Only 52% (95% CI: 36–68%) of the bird community within the training data species pool was detected with 1 SSU/1 visit, whereas 82% (95% CI: 73–90%) was detected with 6 SSUs/4 visits. Increasing spatial replication from 1 to 6 SSUs resulted in ~21% and 14% more species detected on average for 1 versus 4 temporal replicates, respectively (Fig. 2). Increasing temporal replication at SSUs had somewhat less pronounced effects on species accumulation curves than increasing spatial replication within PSUs. For example, sampling 1 SSU with 4 visits detected 68% of species on average (95% CI: 55–80%) compared to 69% when sampling 4 SSUs with 1 visit (95% CI: 57–81%). On average, increasing temporal replication from 1 to 4 visits resulted in 16% more species detected when 1 SSU was sampled per PSU and 9% more when 6 SSUs were sampled per PSU (Fig. 2). With a single temporal replicate, increasing spatial replication from 1 SSU to 4 SSUs increased the proportion of the community detected from 52% (95% CI: 36–68%) to 69% (95% CI: 57–81%; Fig. 2). Sampling 6 SSUs with 1 temporal replicate each detected 73% (95% CI: 62–83%) of the community compared to 82% (95% CI: 74–90%) when 6 SSUs per PSU were sampled with 4 temporal replicates each.
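The resampling behind these accumulation estimates can be sketched as follows (hypothetical detection data; the analysis itself bootstrapped the empirical point-count and ARU data in R):

```python
# Sketch (hypothetical detection data) of the bootstrap behind the species
# accumulation curves: the expected proportion of the species pool detected
# when sampling `n_ssu` stations with `n_visits` repeat surveys each.
import random

def prop_detected(detections, n_ssu, n_visits, n_boot=1000, seed=1):
    # detections[ssu][visit] is the set of species recorded on that survey
    rng = random.Random(seed)
    pool = set().union(*(visit for ssu in detections for visit in ssu))
    props = []
    for _ in range(n_boot):
        seen = set()
        for s in rng.sample(range(len(detections)), n_ssu):
            for v in rng.sample(range(len(detections[s])), n_visits):
                seen |= detections[s][v]
        props.append(len(seen) / len(pool))
    return sum(props) / n_boot
```

Varying `n_ssu` and `n_visits` over the design grid and plotting the resulting proportions reproduces the shape of the accumulation curves in Fig. 2.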
Predicting species distributions
All else being equal, increasing the number of PSUs had a larger effect on predictive accuracy than changing the number of SSUs or temporal replicates (Fig. 3). For example, doubling sampling intensity from 20 to 40 PSUs while using a single visit improved the Brier score by 36% (95% CI: 26–40%) compared to only 27% (95% CI: 18–29%) when doubling sampling intensity by using 2 SSUs at 20 PSUs. Prediction accuracy substantially increased with increasing numbers of SSUs, but the effect of increasing SSUs interacted with increasing temporal replication, which also increased prediction accuracy (Figs. 4, 5). The magnitude of the increase in predictive accuracy (reduced Brier score) varied with the initial number of SSUs per PSU and the number of temporal replicates considered (Fig. 5). For example, if a PSU were sampled using only a single SSU, adding another SSU (i.e., having 2 SSUs) would reduce the Brier score by 9.3% on average with a single visit, but by 8.1% on average if both SSUs had 4 temporal replicates (Fig. 5). In contrast, adding an additional SSU would reduce Brier scores by only 1.5% if PSUs were initially sampled with 5 single-visit SSUs, versus 0.7% if the initial sample was based on 5 SSUs each with 4 temporal replicates (Fig. 5). Additional temporal replication also improved prediction accuracy, but the effect size was smaller and more variable than that of adding SSUs (Fig. 6). When a PSU was initially sampled using a single visit, the addition of another visit reduced the Brier score by 6.8% on average when a single SSU was sampled per PSU versus 3.2% on average when there were 6 SSUs (Fig. 6). Increasing temporal replication had less effect when there were already 3 temporal replicates, with an additional visit reducing Brier scores by 2.4% and 0.8% for PSUs containing 1 versus 6 SSUs, respectively (Fig. 6).
A generalized additive model fit to the bootstrap estimates of mean log(Brier score) explained 91% of the deviance in the bootstrap estimates (Fig. 3), suggesting it could accurately predict Brier scores for scenarios we did not explicitly consider in our bootstrap exercise (see Appendix 2, Table A2.2 for parameter estimates). Predicted mean log(Brier scores) for scenarios swapping spatial and temporal replication suggested that having 6 SSUs each sampled once would provide better predictive ability than sampling a single SSU six times per PSU, on average 64.5% of the time. When holding the number of temporal replicates constant at 6 visits, out-of-sample prediction accuracy was better 95.6% of the time using 6 SSUs as opposed to a single SSU per PSU. Similarly, using 3 SSUs per PSU provided better predictive accuracy than using a single SSU 95.0% of the time when both had 6 temporal replicates.
Optimizing sampling designs
When comparing a suite of alternative ARU scenarios, the study design with the highest predictive accuracy that did not exceed our total budget used 75 PSUs with 3 SSUs each and 6 temporal replicates per SSU. This design had a predicted cost of $348,600. Using 6 temporal replicates increased costs by $40,500 but resulted in a 14.5% improvement in Brier scores compared to single-visit sampling. However, the optimal number of SSUs varied with the number of PSUs and temporal replicates. Specifically, if 30 PSUs were sampled, the optimal number of SSUs was 9 when using a single temporal replicate, but 8 when using 6 temporal replicates. In contrast, the optimal number of SSUs was lower when holding the number of PSUs constant at 60: the optimum was 4 SSUs using a single temporal replicate versus 3 SSUs with 6 temporal visits each. Furthermore, the 3 SSU and 6 temporal replicate design cost $29,760 less and had 13.5% (95% CI: -32.4 to 9.7%) lower Brier scores on average than the 4 SSU single temporal replicate alternative.
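The budget-constrained design search can be sketched as below; the cost parameters and accuracy function are hypothetical stand-ins for our empirical cost model and GAM-predicted Brier scores:

```python
# Sketch of a budget-constrained design search. The cost model (fixed PSU
# access cost, per-SSU deployment cost, per-visit processing cost) and the
# accuracy function passed in are hypothetical placeholders, not the values
# or GAM used in the analysis.

def design_cost(n_psu, n_ssu, n_visits,
                psu_cost=2000.0, ssu_cost=400.0, visit_cost=36.0):
    """Total cost of a design under the hypothetical per-unit costs above."""
    return n_psu * (psu_cost + n_ssu * (ssu_cost + n_visits * visit_cost))

def best_design(budget, predicted_brier, psus, ssus, visits):
    """Among affordable (PSU, SSU, visit) combinations, return the one with
    the lowest predicted Brier score (lower = better prediction accuracy)."""
    affordable = [(p, s, v) for p in psus for s in ssus for v in visits
                  if design_cost(p, s, v) <= budget]
    return min(affordable, key=predicted_brier)

# Hypothetical accuracy surface: Brier score falls with total survey effort
toy_brier = lambda d: 1.0 / (d[0] * d[1] * d[2])
choice = best_design(300000, toy_brier, [60, 75], [3, 4], [1, 6])
```

In the analysis, `predicted_brier` was replaced by simulation from the fitted GAM, with error propagated around each predicted log(Brier score).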
The GAM fit to the bootstrap estimates of mean log(Brier score) explained 94% of the deviance in the bootstrap estimates from single-visit surveys, suggesting we could predict over a wider range of scenarios than those in the empirical bootstraps (see Appendix 2, Table A2.2 for parameter estimates). Applying predictions from this GAM to a suite of alternative human survey design scenarios suggested the highest predictive accuracy for a within-budget design was achieved with 88 PSUs and 18 SSUs per PSU, at a total cost of $103,840. Thus, if repeat-visit surveys were not required or feasible, switching to human surveys would save $244,760 compared to the optimal design for ARU surveys and result in Brier scores that were 6.6% lower on average (95% CI: -27.2 to 18.6%).
DISCUSSION
We have provided a workflow to gain insight into hierarchical sampling designs for measuring species richness and distributions. We base our results on inferences from real-world data and corresponding real-world variation rather than parametric simulations, and on the actual ability of analyses to accurately predict observations outside the sample based on cross-validation methods. We demonstrated that increasing temporal replication increased the spatial autocorrelation between adjacent SSUs, which may increase the redundancy of spatial replication. We showed the contrasting value of including more PSUs versus increasing SSUs and temporal replicates. The relative value of increasing PSUs versus SSUs and temporal replicates is driven by a combination of the sample size and the relative costs of implementing changes in each.
We found that if it is relatively cheap to collect additional temporal replicates at each location (e.g., with ARUs) and relatively expensive to establish new SSUs within each PSU (e.g., using a helicopter to travel within a PSU), then it is preferable to spend limited resources surveying more PSUs with fewer SSUs in each while increasing temporal replicates. Increasing temporal replication when it is comparatively inexpensive also increases the statistical and cost efficiency of monitoring population trends (MacKenzie and Royle 2005, Bayne et al. 2017), and would therefore be a beneficial design consideration. In contrast, if conducting repeated surveys is expensive or infeasible, it is better to conduct a larger number of spatially clustered surveys within each PSU despite some minor increases in spatial autocorrelation. Although this is generally unsurprising because cluster sampling is known to be efficient when access costs are high (Cochran 1977, Foster et al. 2023), our results highlight the budget and logistical trade-offs encountered for avian surveys in remote locations.
Similar to Wood et al. (2021), we found we could increase the proportion of the avian community detected by increasing either spatial or temporal replication. Generally, our analyses suggest that increasing the number of SSUs had a greater influence on species accumulation than did temporal replication, but adding more visits had less influence when there were more SSUs and vice versa. Our results contrast somewhat with those of Wood et al. (2021), who found in simulations that survey coverage (spatial replication) was less influential than temporal replication; however, their empirical results suggested that spatial replication was more important in data from a large study area (~6000 km²), where β-diversity was higher, than in data from a small (~0.9 km²) study area with lower β-diversity. Our study area covers ~600,000 km² and therefore crosses many range boundaries and has high habitat heterogeneity contributing to β-diversity. These contrasting results highlight that consideration should be given to how species turnover could affect optimal allocation to spatial versus temporal sampling, though the approaches applied here or in Wood et al. (2021) should be able to directly inform those decisions with pilot data.
Although increasing spatial and temporal replication likely improved species accumulation via differing mechanisms (adding samples in different habitats versus reducing false negatives in occupied habitat, respectively), both result in more locations with confirmed presences that improve inference. Given that much of the data used in species distribution modeling relates to the presence or absence of species, the increased number of species detections owing to additional SSUs within a PSU may be an efficient design consideration for the rarest species not modeled here. Our species distribution modeling provided similar conclusions: simulating from our model predicted that a direct swap of spatial (6 SSUs and 1 visit) and temporal (1 SSU with 6 visits) replicates would result in the spatially intensive sampling being favored, on average, 64.5% of the time. Realistically, once an automated device such as an ARU is in situ, the costs of additional visits are comparatively trivial, and repeat visits could be maximized when funds for additional processing are available, consistent with recommendations by MacKenzie and Royle (2005). Thus, when considering only scenarios in which 6 repeat visits were used, we found that prediction accuracy was greater when 3 or 6 SSUs were sampled (> 95% of the time on average) than when a single SSU was used; however, there were diminishing returns as spatial replication increased.
The occupancy modeling (Whittington et al. 2014) and population temporal trend estimation (Rhodes and Jonzén 2011) literature both suggest that when data are spatially independent, it is preferable to maximize spatial replication, whereas sampling fewer sites more frequently is preferable when spatial correlation is high. Our results appear generally consistent with these recommendations because we found weak to moderate spatial correlation, and spatial replication was somewhat favored over temporal replication; however, the trade-off between spatial and temporal replication was strongly influenced by sampling costs. Given a budgetary constraint, the optimal design for ARU sampling maximized the number of temporal replicates (n = 6) and nearly maximized the number of PSUs (n = 75 compared to a maximum possible of 80) but included an intermediate level of SSU replication (n = 3). It is unsurprising that our optimization suggested maximizing temporal replication given the low cost of additional samples ($36 visit-1), most likely because of the increased probability of species being detected at survey locations (MacKenzie and Royle 2005, Bailey et al. 2007, Guillera-Arroita et al. 2017). We generally used temporal replicates collected over a 7–10 day period to increase the probability of population closure while reducing temporal autocorrelation, following recommendations of Bayne et al. (2017); however, using temporal replicates within a single day or across an entire season could alter the cost-benefit ratio because temporal autocorrelation, detection rates, and the number of revisits needed to achieve a 95% probability of correctly determining species presence versus absence vary with temporal design (Bayne et al. 2017). In contrast, the scenarios considered for human observers achieved slightly better prediction accuracy at lower cost despite only a single temporal replicate, due to increased spatial replication (PSUs = 88 and SSUs = 18).
Because human observers also record species for which visual detections are important, multi-species surveys may benefit from assigning subsets of PSUs to human versus ARU sampling; optimizing the fraction of each would warrant future investigation.
Short-term (fire, forest harvest, insects, wind, etc.) and long-term (glacial) disturbance history may play a role in determining the distance over which species counts are correlated. For example, variation in fire return intervals and fire intensity contributes to spatial heterogeneity in forest age classes (Coogan et al. 2021), which could alter the distance over which counts are correlated within a PSU. Additional species accumulation analyses provide some support for this contention because our ability to represent the avian community with fewer spatial replicates was greater in the ecozone with the lowest habitat diversity (Taiga Shield) than in ecozones further south where habitat diversity is higher. Within-PSU habitat heterogeneity (landcover richness) had less influence on our ability to quantify the avian community with a given level of spatial replication (Appendix 5, Fig. A5.1). Future application of our framework within and between ecologically meaningful strata (e.g., ecozones) may be a parsimonious approach to further refine our cost-benefit analyses.
Although our framework allows design optimization given budgetary constraints, several issues bear consideration. First, our approach optimizes effort given the total budget available but does not guarantee effort is sufficient to achieve the desired level of precision for individual species distribution models. We recommend applying closed-form equations to predict the sample sizes required to achieve a desired level of power to detect set differences in species occupancy (Guillera-Arroita et al. 2017), simulation studies (Bailey et al. 2007), or pilot field work combined with the aforementioned tools to determine expected precision under various levels of achievable effort. Furthermore, our framework and associated R scripts optimize for omnibus multi-species surveys, and the optimal design (number of PSUs, SSUs, and temporal replicates) will likely vary substantially among individual species. We therefore recommend that our tools be combined with pilot sampling or initial modeling using historic data (e.g., Mccabe et al. 2018) to examine potential precision for individual species distribution models. Our scripts could optimize on Brier scores (or another objective function) for a single species rather than averaging across results for multiple species. Finally, the achieved accuracy of any sampling design will also be a function of how well covariate space is represented (Morán-Ordóñez et al. 2017, Henckel et al. 2020, Johnston et al. 2020). Probability-based designs to select sampling locations can help optimize the accuracy of species distribution models (Mccabe et al. 2018, Van Wilgenburg et al. 2020, Weiser et al. 2020).
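As an example of the closed-form checks recommended above (a standard occupancy-design relation, not output from our scripts), the cumulative probability of detecting a present species across repeat visits, and the number of visits needed for a target probability, follow directly from the per-visit detection probability p:

```python
# Sketch of a standard closed-form design check: with per-visit detection
# probability p, the chance of detecting a present species at least once in
# k independent visits is 1 - (1 - p)^k; solving for k gives the visits
# needed to reach a target cumulative detection probability.
import math

def cumulative_detection(p, k):
    """Probability of >= 1 detection in k visits, given per-visit probability p."""
    return 1.0 - (1.0 - p) ** k

def visits_needed(p, target=0.95):
    """Smallest k such that cumulative_detection(p, k) >= target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))
```

For example, a species detected on half of visits when present (p = 0.5) requires 5 visits to reach a 95% cumulative detection probability, whereas a conspicuous species with p = 0.9 requires only 2.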
Other metrics exist for measuring species distribution model accuracy, and our framework could readily be adapted to them. For example, users may be more interested in the ability of sampling and models to discriminate between occupied versus unoccupied sites at a local scale. In such cases, tracking sensitivity (the probability that the model correctly predicts species presence), specificity (the probability that the model correctly predicts species absence), and measures of users’ accuracy (i.e., how reliable the models are for out-of-sample prediction) such as positive and negative predictive value may also be useful (Liu et al. 2011). In addition, optimizing the coefficient of variation of density estimates may provide a useful metric because SDMs are increasingly used to assess population size (Sólymos et al. 2020). Optimizing the accuracy and precision of species temporal trend estimates would be another key target worth exploring. Our framework could be adapted to maximize or minimize these other metrics as is most appropriate for the question at hand. Indeed, because many agencies are tasked with multiple and sometimes competing objectives, extending our framework to multi-criteria optimization (Kaim et al. 2018) would be fruitful for further refinement.
Although we have attempted to optimize between spatial and temporal sampling, the rapid advancement of automated recognition of signals in acoustic data (Shonfield and Bayne 2017, Kahl et al. 2021) from long-duration deployments may, at first consideration, call into question the value of optimizing temporal replication. However, we are interested in optimizing the temporal subsets of acoustic data to process when the data are combined with human observer data in count regression models, which require measures of effort to standardize the data and predict species densities (Sólymos et al. 2013, Van Wilgenburg et al. 2017). These models generally assume population closure (Fogarty and Fleishman 2021), an assumption that is more likely to be violated the longer the duration represented in the data. In addition, efficient access to study areas in remote regions often dictates that not all deployments will be long term; e.g., we deployed some ARUs for less than 24 hours while using canoe access. Therefore, until new statistical approaches are available to facilitate combining long-duration ARU or camera-trap data from automated recognizers with single-visit observations from citizen and professional scientists, the need to optimize temporal replicates will remain for applications such as ours.
Species distribution modeling has become increasingly important for assessing species status and trend monitoring (Sólymos et al. 2020, Weiser et al. 2020) and predicting the outcomes of future environmental change (Porfirio et al. 2014, Micheletti et al. 2021). Given the importance of SDMs to conservation, it is crucial that predictions be as accurate as possible. With limited resources dedicated toward field surveys, this will only occur if sample designs contribute to substantial improvements in model accuracy. We have provided an approach (and associated tools) to optimize sampling for multi-species surveys given the constraints of a fixed budget. Our tools can be customized to reflect differences in sampling scenarios, logistics, and budgetary constraints. Our models did not consider variation in field staff time because we had a fixed number of staff available each season, but this could be an important consideration that could be customized by altering the cost models in our scripts. Furthermore, our approach could optimize targets other than the average improvement in model accuracy across species. Optimizing sampling for the hardest to detect species, species of greatest conservation concern, or using multi-criteria optimization to optimize sampling design across subsets of species based on combinations of rarity and detectability may also be fruitful. We encourage the customization of our tools to consider alternative designs, cost models, and logistical scenarios to improve sampling for species distribution modeling. Careful implementation of probabilistic sampling and design optimization would significantly reduce uncertainties in conservation efforts informed by species distribution models.
RESPONSES TO THIS ARTICLE
Responses to this article are invited. If accepted for publication, your response will be hyperlinked to the article.
AUTHOR CONTRIBUTIONS
All authors contributed to writing and editing. Study design and conceptualization: SVW, DTI, SH, CMF, DDH, JDT. Field work, logistics, funding, and planning: SVW and KLD. Data preparation, analysis design and implementation, and code: DTI, DAWM, SVW.
ACKNOWLEDGMENTS
This work was made possible through operating grants from Environment and Climate Change Canada. In addition, financial support for Birds Canada and the Saskatchewan Breeding Bird Atlas was provided by the Saskatchewan Ministry of Environment Fish and Wildlife Development Fund, Wildlife Habitat Canada, SaskPower, Orano Canada, Weyerhaeuser, the Government of Saskatchewan, Tolko Industries, SaskEnergy, K+S Potash Canada, Mosaic, the McLean Foundation, Mountain Equipment Coop, the Baillie Fund, TD Friends of the Environment Foundation, Saskatchewan Mining Association, Nature Regina, Pristine Gemstone Jewelry, Saskatoon Community Foundation, and Saskatoon Nature Society. We thank Rich Russell for discussions that contributed to the direction of this work. This work would not have been possible without many field contributions including C. Chutter, B. Obermayer, E. Cumming, L. Latremouille, K. Kardynal, M. D. Frey, L. Stewart, M. Godin, J. Biro, E. Sherwood, J. Pocha, S. Hnytka, T. J. Kocken, P. Des Brisay, D. Evans, A. Roberto-Charron, C. Moser-Purdy, S. Catholique, D. Poole, A. Mills, I. Cook, M. Ferguson, C. Evans, J. J. MacKenzie, L. Cardenas-Ortiz, J. E. Karst, T. Carpenter, S. Musgrave, S. Yeomans, K. Courtney, P. Davidson, S. Menu, and J. McManus. We also thank Weyerhaeuser Timberlands, Cameco, L&M Wood Products, Tolko Industries, Orano Canada, SaskPower, Mistik Management, Scott Lake Lodge, Churchill River Canoe Outfitters, Eb’s Source for Adventure, Harvest Foodworks, Łutsel K’e Dene First Nations, Wek’éezhìi Renewable Resources Board, Tłı̨chǫ Government, North Slave Métis Alliance, and the Department of National Defence for in-kind support. We would like to thank the Editor, Subject Editor, and two anonymous reviewers for comments that improved this manuscript.
DATA AVAILABILITY
All data and code are available through FigShare links provided in the appendices.
LITERATURE CITED
Amano, T., J. D. L. Lamming, and W. J. Sutherland. 2016. Spatial gaps in global biodiversity information and the role of citizen science. BioScience 66:393–400. https://doi.org/10.1093/biosci/biw022
Artuso, C., A. R. Couturier, K. D. De Smet, R. F. Koes, D. Lepage, J. McCracken, R. D. Mooi, and P. Taylor. 2019. The atlas of the breeding birds of Manitoba, 2010-2014. Bird Studies Canada, Winnipeg, Manitoba, Canada. https://www.birdatlas.mb.ca/index_en.jsp
Bailey, L. L., J. E. Hines, J. D. Nichols, and D. I. MacKenzie. 2007. Sampling design trade-offs in occupancy studies with imperfect detection: examples and software. Ecological Applications 17:281–290. https://doi.org/10.1890/1051-0761(2007)017[0281:SDTIOS]2.0.CO;2
Bayne, E. M., M. Knaggs, and P. Sólymos. 2017. How to most effectively use autonomous recording units when data are processed by human listeners. Alberta Biodiversity Monitoring Institute, Edmonton, Alberta, Canada. http://bioacoustic.abmi.ca/wp-content/uploads/2017/08/HowtoARU3-April18.pdf
Beaudoin, A., P. Y. Bernier, L. Guindon, P. Villemaire, X. J. Guo, G. Stinson, T. Bergeron, S. Magnussen, and R. J. Hall. 2014. Mapping attributes of Canada’s forests at moderate resolution through kNN and MODIS imagery. Canadian Journal of Forest Research 44:521–532. https://doi.org/10.1139/cjfr-2013-0401
Bjørnstad, O. N., and W. Falck. 2001. Nonparametric spatial covariance functions: estimation and testing. Environmental and Ecological Statistics 8:53–70. https://doi.org/10.1023/A:1009601932481
Brandt, J. P., M. D. Flannigan, D. G. Maynard, I. D. Thompson, and W. J. A. Volney. 2013. An introduction to Canada’s boreal zone: ecosystem processes, health, sustainability, and environmental issues. Environmental Reviews 21:207–226. https://doi.org/10.1139/er-2013-0040
Brier, G. W. 1950. Verification of forecasts expressed in terms of probability. Monthly Weather Review 78:1–3.
Callaghan, C. T., J. J. L. Rowley, W. K. Cornwell, A. G. B. Poore, and R. E. Major. 2019. Improving big citizen science data: moving beyond haphazard sampling. PLoS Biology 17:e3000357. https://doi.org/10.1371/journal.pbio.3000357
Clark, J. D. 2019. Comparing clustered sampling designs for spatially explicit estimation of population density. Population Ecology 61:93–101. https://doi.org/10.1002/1438-390X.1011
Cochran, W. G. 1977. Sampling techniques. Third edition. John Wiley and Sons, New York, New York, USA.
Coogan, S. C. P., L. D. Daniels, D. Boychuk, P. J. Burton, M. D. Flannigan, S. Gauthier, V. Kafka, J. S. Park, and B. M. Wotton. 2021. Fifty years of wildland fire science in Canada. Canadian Journal of Forest Research 51:283–302. https://doi.org/10.1139/cjfr-2020-0314
Darras, K., P. Batáry, B. Furnas, A. Celis-Murillo, S. L. Van Wilgenburg, Y. A. Mulyani, and T. Tscharntke. 2018. Comparing the sampling performance of sound recorders versus point counts in bird surveys: a meta-analysis. Journal of Applied Ecology 55:2575–2586. https://doi.org/10.1111/1365-2664.13229
Davidson, P. J. A., R. J. Cannings, A. R. Couturier, D. Lepage, and C. M. Di Corrado. 2015. The atlas of the breeding birds of British Columbia, 2008-2012. Bird Studies Canada, Delta, British Columbia, Canada. https://www.birdatlas.bc.ca/
Drake, A., D. R. de Zwaan, T. A. Altamirano, S. Wilson, K. Hick, C. Bravo, J. T. Ibarra, and K. Martin. 2021. Combining point counts and autonomous recording units improves avian survey efficacy across elevational gradients on two continents. Ecology and Evolution 11:8654–8682. https://doi.org/10.1002/ece3.7678
Drake, K. L., M. D. Frey, D. Hogan, and R. Hedley. 2016. Using digital recordings and sonogram analysis to obtain counts of Yellow Rails. Wildlife Society Bulletin 40:346–354. https://doi.org/10.1002/wsb.658
Efford, M. G., and J. Boulanger. 2019. Fast evaluation of study designs for spatially explicit capture-recapture. Methods in Ecology and Evolution 10:1529–1535. https://doi.org/10.1111/2041-210X.13239
Fogarty, F. A., and E. Fleishman. 2021. Bias in estimated breeding-bird abundance from closure-assumption violations. Ecological Indicators 131:108170. https://doi.org/10.1016/j.ecolind.2021.108170
Foster, S. D., E. Lawrence, and A. J. Hoskins. 2023. Spatially clustered survey designs. Journal of Agricultural, Biological and Environmental Statistics 29:130–146. https://doi.org/10.1007/s13253-023-00562-1
Geldmann, J., J. Heilmann-Clausen, T. E. Holm, I. Levinsky, B. Markussen, K. Olsen, C. Rahbek, and A. P. Tøttrup. 2016. What determines spatial bias in citizen science? Exploring four recording schemes with different proficiency requirements. Diversity and Distributions 22:1139–1149. https://doi.org/10.1111/ddi.12477
Gneiting, T., and A. E. Raftery. 2007. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association 102:359–378. https://doi.org/10.1198/016214506000001437
Grafström, A., J. Lisic, and W. Prentius. 2022. BalancedSampling: balanced and spatially balanced sampling. R package version 1.6.3. R Foundation for Statistical Computing, Vienna, Austria. https://cran.r-project.org/web/packages/BalancedSampling/index.html
Guillera-Arroita, G., J. J. Lahoz-Monfort, A. R. van Rooyen, A. R. Weeks, and R. Tingley. 2017. Dealing with false-positive and false-negative errors about species occurrence at multiple levels. Methods in Ecology and Evolution 8:1081–1091. https://doi.org/10.1111/2041-210X.12743
Guisan, A., T. C. Edwards, Jr., and T. Hastie. 2002. Generalized linear and generalized additive models in studies of species distributions: setting the scene. Ecological Modelling 157:89–100. https://doi.org/10.1016/S0304-3800(02)00204-1
Henckel, L., U. Bradter, M. Jönsson, N. J. B. Isaac, and T. Snäll. 2020. Assessing the usefulness of citizen science data for habitat suitability modelling: opportunistic reporting versus sampling based on a systematic protocol. Diversity and Distributions 26:1276–1290. https://doi.org/10.1111/ddi.13128
Howard, E., H. Aschen, and A. K. Davis. 2010. Citizen science observations of monarch butterfly overwintering in the southern United States. Psyche: A Journal of Entomology 2010:689301. https://doi.org/10.1155/2010/689301
Johnson, D. H., J. P. Gibbs, M. Herzog, S. Lor, N. D. Niemuth, C. A. Ribic, M. Seamans, T. L. Shaffer, W. G. Shriver, S. V. Stehman, and W. L. Thompson. 2009. A sampling design framework for monitoring secretive marshbirds. Waterbirds 32:203–215. https://doi.org/10.1675/063.032.0201
Johnston, A., N. Moran, A. Musgrove, D. Fink, and S. R. Baillie. 2020. Estimating species distributions from spatially biased citizen science data. Ecological Modelling 422:108927. https://doi.org/10.1016/j.ecolmodel.2019.108927
Kahl, S., C. M. Wood, M. Eibl, and H. Klinck. 2021. BirdNET: a deep learning solution for avian diversity monitoring. Ecological Informatics 61:101236. https://doi.org/10.1016/j.ecoinf.2021.101236
Kaim, A., A. F. Cord, and M. Volk. 2018. A review of multi-criteria optimization techniques for agricultural land use allocation. Environmental Modelling and Software 105:79–93. https://doi.org/10.1016/j.envsoft.2018.03.031
Liu, C., M. White, and G. Newell. 2011. Measuring and comparing the accuracy of species distribution models with presence-absence data. Ecography 34:232–243. https://doi.org/10.1111/j.1600-0587.2010.06354.x
Machtans, C. S., K. J. Kardynal, and P. A. Smith. 2014. How well do regional or national Breeding Bird Survey data predict songbird population trends at an intact boreal site? Avian Conservation and Ecology 9(1):5. https://doi.org/10.5751/ACE-00649-090105
MacKenzie, D. I., and J. A. Royle. 2005. Designing occupancy studies: general advice and allocating survey effort. Journal of Applied Ecology 42:1105–1114. https://doi.org/10.1111/j.1365-2664.2005.01098.x
Matsuoka, S. M., C. L. Mahon, C. M. Handel, P. Sólymos, E. M. Bayne, P. C. Fontaine, and C. J. Ralph. 2014. Reviving common standards in point-count surveys for broad inference across studies. Condor 116:599–608. https://doi.org/10.1650/CONDOR-14-108.1
Mccabe, J. D., N. M. Anich, R. S. Brady, and B. Zuckerberg. 2018. Raising the bar for the next generation of biological atlases: using existing data to inform the design and implementation of atlas monitoring. Ibis 160:528–541. https://doi.org/10.1111/ibi.12561
Metcalf, O. C., J. Barlow, S. Marsden, N. Gomes de Moura, E. Berenguer, J. Ferreira, and A. C. Lees. 2021. Optimizing tropical forest bird surveys using passive acoustic monitoring and high temporal resolution sampling. Remote Sensing in Ecology and Conservation 8:45–56. https://doi.org/10.1002/rse2.227
Micheletti, T., F. E. C. Stewart, S. G. Cumming, S. Haché, D. Stralberg, J. A. Tremblay, C. Barros, I. M. S. Eddy, A. M. Chubaty, M. Leblond, R. F. Pankratz, C. L. Mahon, S. L. Van Wilgenburg, E. M. Bayne, F. Schmiegelow, and E. J. B. McIntire. 2021. Assessing pathways of climate change effects in SpaDES: an application to boreal landbirds of Northwest Territories Canada. Frontiers in Ecology and Evolution 9:679673. https://doi.org/10.3389/fevo.2021.679673
Morán-Ordóñez, A., J. J. Lahoz-Monfort, J. Elith, and B. A. Wintle. 2017. Evaluating 318 continental-scale species distribution models over a 60-year prediction horizon: what factors influence the reliability of predictions? Global Ecology and Biogeography 26:371–384. https://doi.org/10.1111/geb.12545
North American Bird Conservation Initiative. 2016. The state of North America’s birds 2016. Environment and Climate Change Canada, Ottawa, Ontario, Canada. https://www.stateofthebirds.org/
Pankratz, R. F., S. Haché, P. Sólymos, and E. M. Bayne. 2017. Potential benefits of augmenting road-based breeding bird surveys with autonomous recordings. Avian Conservation and Ecology 12(2):18. https://doi.org/10.5751/ACE-01087-120218
Pavlacky, D. C., Jr., P. M. Lukacs, J. A. Blakesley, R. C. Skorkowsky, D. S. Klute, B. A. Hahn, V. J. Dreitz, T. L. George, and D. J. Hanni. 2017. A statistically rigorous sampling design to integrate avian monitoring and management within Bird Conservation Regions. PLoS ONE 12:e0185924. https://doi.org/10.1371/journal.pone.0185924
Pickell, P. D., D. W. Andison, N. C. Coops, S. E. Gergel, and P. L. Marshall. 2015. The spatial patterns of anthropogenic disturbance in the western Canadian boreal forest following oil and gas development. Canadian Journal of Forest Research 45:732–743. https://doi.org/10.1139/cjfr-2014-0546
Ploton, P., F. Mortier, M. Réjou-Méchain, N. Barbier, N. Picard, V. Rossi, C. Dormann, G. Cornu, G. Viennois, N. Bayol, A. Lyapustin, S. Gourlet-Fleury, and R. Pélissier. 2020. Spatial validation reveals poor predictive performance of large-scale ecological mapping models. Nature Communications 11:4540. https://doi.org/10.1038/s41467-020-18321-y
Porfirio, L. L., R. M. B. Harris, E. C. Lefroy, S. Hugh, S. F. Gould, G. Lee, N. L. Bindoff, and B. Mackey. 2014. Improving the use of species distribution models in conservation planning and management under climate change. PLoS ONE 9:e113749. https://doi.org/10.1371/journal.pone.0113749
R Core Team. 2022. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.r-project.org/
Rhodes, J. R., and N. Jonzén. 2011. Monitoring temporal trends in spatially structured populations: how should sampling effort be allocated between space and time? Ecography 34:1040–1048. https://doi.org/10.1111/j.1600-0587.2011.06370.x
Robinson, O. J., V. Ruiz-Gutierrez, M. D. Reynolds, G. H. Golet, M. Strimas-Mackey, and D. Fink. 2020. Integrating citizen science data with expert surveys increases accuracy and spatial extent of species distribution models. Diversity and Distributions 26:976–986. https://doi.org/10.1111/ddi.13068
Rosenberg, K. V., A. M. Dokter, P. J. Blancher, J. R. Sauer, A. C. Smith, P. A. Smith, J. C. Stanton, A. Panjabi, L. Helft, M. Parr, and P. P. Marra. 2019. Decline of the North American avifauna. Science 366:120–124. https://doi.org/10.1126/science.aaw1313
Roy, C., N. L. Michel, C. M. Handel, S. L. Van Wilgenburg, J. C. Burkhalter, K. E. B. Gurney, D. J. Messmer, K. Princé, C. S. Rushing, J. F. Saracco, R. Schuster, A. C. Smith, P. A. Smith, P. Sólymos, L. A. Venier, and B. Zuckerberg. 2019. Monitoring boreal avian populations: how can we estimate trends and trajectories from noisy data? Avian Conservation and Ecology 14(2):8. https://doi.org/10.5751/ACE-01397-140208
Sauer, J. R., K. L. Pardieck, D. J. Ziolkowski, Jr., A. C. Smith, M.-A. R. Hudson, V. Rodriguez, H. Berlanga, D. K. Niven, and W. A. Link. 2017. The first 50 years of the North American Breeding Bird Survey. Condor 119:576–593. https://doi.org/10.1650/CONDOR-17-83.1
Shonfield, J., and E. M. Bayne. 2017. Autonomous recording units in avian ecological research: current use and future applications. Avian Conservation and Ecology 12(1):14. https://doi.org/10.5751/ACE-00974-120114
Sólymos, P. 2016. QPAD version 3 documentation. Technical Report. Boreal Avian Modelling Project. University of Alberta, Edmonton, Alberta, Canada. https://zenodo.org/records/3251111
Sólymos, P., S. M. Matsuoka, E. M. Bayne, S. R. Lele, P. Fontaine, S. G. Cumming, D. Stralberg, F. K. A. Schmiegelow, and S. J. Song. 2013. Calibrating indices of avian density from non-standardized survey data: making the most of a messy situation. Methods in Ecology and Evolution 4:1047–1058. https://doi.org/10.1111/2041-210X.12106
Sólymos, P., J. D. Toms, S. M. Matsuoka, S. G. Cumming, N. K. S. Barker, W. E. Thogmartin, D. Stralberg, A. D. Crosby, F. V. Dénes, S. Haché, C. L. Mahon, F. K. A. Schmiegelow, and E. M. Bayne. 2020. Lessons learned from comparing spatially explicit models and the Partners in Flight approach to estimate population sizes of boreal birds in Alberta, Canada. Condor 122:duaa007. https://doi.org/10.1093/condor/duaa007
Stralberg, D., C. Carroll, J. H. Pedlar, C. B. Wilsey, D. W. McKenney, and S. E. Nielsen. 2018. Macrorefugia for North American trees and songbirds: climatic limiting factors and multi-scale topographic influences. Global Ecology and Biogeography 27:690–703. https://doi.org/10.1111/geb.12731
Stralberg, D., S. M. Matsuoka, A. Hamann, E. M. Bayne, P. Sólymos, F. K. A. Schmiegelow, X. Wang, S. G. Cumming, and S. J. Song. 2015. Projecting boreal bird responses to climate change: the signal exceeds the noise. Ecological Applications 25:52–69. https://doi.org/10.1890/13-2289.1
Sullivan, B. L., J. L. Aycrigg, J. H. Barry, R. E. Bonney, N. Bruns, C. B. Cooper, T. Damoulas, A. A. Dhondt, T. Dietterich, A. Farnsworth, D. Fink, J. W. Fitzpatrick, T. Fredericks, J. Gerbracht, C. Gomes, W. M. Hochachka, M. J. Iliff, C. Lagoze, F. A. La Sorte, M. Merrifield, W. Morris, T. B. Phillips, M. Reynolds, A. D. Rodewald, K. V. Rosenberg, N. M. Trautmann, A. Wiggins, D. W. Winkler, W.-K. Wong, C. L. Wood, J. Yu, and S. Kelling. 2014. The eBird enterprise: an integrated approach to development and application of citizen science. Biological Conservation 169:31–40. https://doi.org/10.1016/j.biocon.2013.11.003
Title, P. O., and J. B. Bemmels. 2018. ENVIREM: an expanded set of bioclimatic and topographic variables increases flexibility and improves performance of ecological niche modeling. Ecography 41:291–307. https://doi.org/10.1111/ecog.02880
Van Wilgenburg, S. L., E. M. Beck, B. Obermayer, T. Joyce, and B. Weddle. 2015. Biased representation of disturbance rates in the roadside sampling frame in boreal forests: implications for monitoring design. Avian Conservation and Ecology 10(2):5. https://doi.org/10.5751/ACE-00777-100205
Van Wilgenburg, S. L., C. L. Mahon, G. Campbell, L. McLeod, M. Campbell, D. Evans, W. Easton, C. M. Francis, S. Haché, C. S. Machtans, C. Mader, R. F. Pankratz, R. Russell, A. C. Smith, P. Thomas, J. D. Toms, and J. A. Tremblay. 2020. A cost efficient spatially balanced hierarchical sampling design for monitoring boreal birds incorporating access costs and habitat stratification. PLoS ONE 15:e0234494. https://doi.org/10.1371/journal.pone.0234494
Van Wilgenburg, S. L., P. Sólymos, K. J. Kardynal, and M. D. Frey. 2017. Paired sampling standardizes point count data from humans and acoustic recorders. Avian Conservation and Ecology 12(1):13. https://doi.org/10.5751/ACE-00975-120113
Weiser, E. L., J. E. Diffendorfer, L. Lopez-Hoffman, D. Semmens, and W. E. Thogmartin. 2020. Challenges for leveraging citizen science to support statistically robust monitoring programs. Biological Conservation 242:108411. https://doi.org/10.1016/j.biocon.2020.108411
Whittington, J., K. Heuer, B. Hunt, M. Hebblewhite, and P. M. Lukacs. 2014. Estimating occupancy using spatially and temporally replicated snow surveys. Animal Conservation 18:92–101. https://doi.org/10.1111/acv.12140
Wilson, M. F. J., B. O’Connell, C. Brown, J. C. Guinan, and A. J. Grehan. 2007. Multiscale terrain analysis of multibeam bathymetry data for habitat mapping on the continental slope. Marine Geodesy 30:3–35. https://doi.org/10.1080/01490410701295962
Wood, C. M., S. Kahl, P. Chaon, M. Z. Peery, and H. Klinck. 2021. Survey coverage, recording duration and community composition affect observed species richness in passive acoustic surveys. Methods in Ecology and Evolution 12:885–896. https://doi.org/10.1111/2041-210X.13571
Wood, S. N. 2017. Generalized additive models: an introduction with R. Second edition. CRC Press/Taylor and Francis Group, Boca Raton, Florida, USA.