This tutorial assumes that you have collected data following one of the survey procedures documented in the MAISRC video, “Monitoring Zebra Mussels in Lakes”. Here, we describe how to analyze data collected using dependent double-observer surveys with and without distance sampling. These designs are sometimes referred to as “removal designs” (rather than dependent double-observer designs). We demonstrate these techniques using data collected on Lake Burgan, Minnesota, in the summer of 2018. Both survey designs allow you to estimate the probability of detecting mussels and thus obtain density estimates adjusted for imperfect detection. Here, we show how to use design-based estimators of density. We have an R package to accompany the tutorial; it includes all the functions and data needed to replicate our analyses. You can download the R package here and install it from source using the following code:
install.packages(path_to_file, repos = NULL, type="source") #path_to_file is a character string that points to
#the downloaded file, e.g., install.packages("ZebraTutorial_0.1.0.tar.gz", repos = NULL, type="source")
#if the file is in your working directory
If you use RStudio, you can also install the downloaded file by clicking the Packages tab -> Install -> Install from: Package Archive File (.tar.gz). Alternatively, you can install from GitHub using the following code:
library(devtools) #devtools takes a while to install, so install from source if you don't want to bother
install_github("troutinthemilk/Zeebs-Season2/Tutorial/ZebraTutorial", build_vignettes = TRUE)
The remainder of the tutorial assumes that you have been able to install this package. All the data and functions you will need to complete the tutorial are in this package. To access the data and functions in this package type:
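A minimal load call (the package name, `ZebraTutorial`, matches the archive file installed above):

```r
library(ZebraTutorial) #loads the tutorial's data and helper functions
```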
Estimating Density and Quantifying Uncertainty
Our goal is to estimate zebra mussel density in a lake and to quantify uncertainty associated with that estimate. To do this, we need the counts in each transect, denoted as n_i for the i-th transect, the transect length, l_i, and the width, w, to determine the surveyed area, a = w ∑ l_i. We estimate the density as:

D̂ = ∑ n_i / (a p̂),

where p̂ denotes the estimated probability of detecting a zebra mussel by at least one observer.
The degree of uncertainty in our density estimate will depend on the variance of the counts among the different transects, Var(n), and the variance in the estimated detection probability, Var(p̂).
Our estimate of p̂ will depend on the survey design.
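To make the estimator concrete, here is a small base-R sketch with made-up numbers (not the Lake Burgan data):

```r
#toy illustration of the design-based density estimator (made-up numbers)
n <- c(4, 7, 2, 5)            #counts on each transect
l <- c(13, 19, 27, 25)        #transect lengths (m)
w <- 1                        #transect width (m)
p.hat <- 0.8                  #assumed detection probability
a <- w * sum(l)               #surveyed area (m^2)
D.hat <- sum(n) / (a * p.hat) #density adjusted for imperfect detection
D.hat                         #about 0.27 mussels per square meter
```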
Dependent Double-Observer (no distance data)
To conduct the dependent double-observer survey, two divers survey 30-meter x 1-meter belts delineated using two lead lines laid in parallel. The primary diver swims first, marking all the mussels she/he sees within the belt. The secondary diver follows, looking for mussels that the first observer missed. We rotated the roles of primary and secondary diver at each transect.
The Data
Data should be entered in a manner that ensures reliability. For the datasets used here, we entered data using Google Forms, which can be accessed here. This software allowed us to check that no impossible values were entered (for example, transect lengths had to be numbers between 1 and 30 because of our design). We created a datasheet of detection events and a datasheet that described each transect. The variable “Transect number” is a unique identifier for each transect that links these sheets. Below we print the first few lines from the dataframe of detection events:
data(DoubleEncounter)
head(DoubleEncounter)
#> # A tibble: 6 x 3
#> `Observer name` `Transect number` size
#> <chr> <dbl> <dbl>
#> 1 Aislyn 2 1
#> 2 Aislyn 4 2
#> 3 Aislyn 3 0
#> 4 Aislyn 6 2
#> 5 Aislyn 6 1
#> 6 Aislyn 8 1
Each row contains data from a unique detection event, with the name of the observer (`Observer name`), the transect number (`Transect number`), and the number of zebra mussels in the cluster (`size`).
We then read in the transect data containing the primary and secondary observers, the transect length, and transect number. We create new columns in the transect data that denote the observers’ name and whether that observer was primary or secondary.
data(DoubleTransect)
DoubleTransect <- DoubleTransect %>% gather(key = observer, value = name, "primary", "secondary")
head(DoubleTransect)
#> # A tibble: 6 x 4
#> length `Transect number` observer name
#> <dbl> <dbl> <chr> <chr>
#> 1 13 1 primary Austen
#> 2 19 2 primary Aislyn
#> 3 27 3 primary Austen
#> 4 25 4 primary Aislyn
#> 5 24 5 primary Austen
#> 6 19 6 primary Aislyn
Now, we need to format the data for the analysis. We will be using the `removal` function in the R fisheries stock analysis (`FSA`) package to calculate the density estimates. This function estimates the detection probability and the variance in the counts for us; we can combine these to determine the variance in density. We have written the `create.removal` function to properly format the data and have included it in the file `HelperFuncs.R`, which we source here (making the function available to us in the current R session). Tutorials on the estimator used here are available at the FishR site.
The Analysis
Here we apply the `removal` function to the combined transect counts. This function calculates the conditional variance of the counts and the detection probability; we then combine these to calculate the variance in density.
#get the length of each transect
length.vec <- DoubleTransect %>% group_by(`Transect number`) %>% summarize(length = first(length))

#format the counts for FSA's removal function using the helper function
#described above (argument names here follow create.removal.Observer, used
#later in the tutorial, and may differ in the package version)
removal.list <- create.removal(transect.dat=DoubleTransect, obs.dat=DoubleEncounter)

#fit the removal model to the counts summed over all transects
remove.ests <- removal(removal.list, just.ests=TRUE)
Nhat      <- remove.ests[1] #estimated population size in the surveyed area
Nhat.SE   <- remove.ests[2] #and its standard error
bp.hat    <- remove.ests[5] #estimated detection probability
bpSE.phat <- remove.ests[6] #and its standard error

#combine variation in detection and variation in the conditional counts
#to get the density estimate and its standard error
Dhat <- Nhat/sum(length.vec$length) #transects are 1 m wide, so area = total length
Dhat.SE <- Dhat*sqrt((Nhat.SE^2/sum(removal.list)^2) + bpSE.phat/bp.hat^2)
We get a density estimate of 0.22 mussels per square meter with a standard error of 0.07, and a detection probability estimate of 0.81 with a standard error of 0.07.
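The estimator inside `removal` generalizes the classic two-pass removal calculation (Seber's two-pass estimator), which can be sketched in base R with made-up counts. This is an illustrative sketch, not necessarily the exact method `FSA` applies by default:

```r
#two-pass removal calculation (Seber's two-pass estimator) with made-up counts;
#FSA::removal generalizes this and also provides standard errors
c1 <- 40 #mussels detected by the primary observer
c2 <- 10 #additional mussels detected by the secondary observer
p.hat <- (c1 - c2) / c1   #estimated detection probability per observer
N.hat <- c1^2 / (c1 - c2) #estimated number present in the surveyed area
c(p = p.hat, N = N.hat)   #p = 0.75, N about 53.3
```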
Dependent Double-Observer Survey with Distance Data
The dependent double-observer survey with distance data is similar to the dependent double-observer survey described above. In this case, however, both divers swim along a single transect line (lead line), and whenever a diver detects a mussel or cluster of mussels, the diver also measures the perpendicular distance from the detected mussel(s) to the transect line. The secondary diver again follows the primary diver, looking for mussels that were missed by the primary diver. These distance-to-detection measurements are then used to model how detection probability declines with distance from the transect line, and this distance model is then used to correct the counts for imperfect detection. The distancesampling.org website provides tutorials on using the R package described here, as well as several other packages that can be used to do distance sampling with either single- or double-observer designs.
The Data
Here we read in the encounter data containing the observer (`Observer name`), the transect number (`Transect #`), the perpendicular distance of each mussel or cluster of mussels from the transect line (`distance`), and the cluster size (`size`):
data("DistanceEncounter")
head(DistanceEncounter)
#> # A tibble: 6 x 4
#> `Observer name` `Transect #` distance size
#> <chr> <dbl> <dbl> <dbl>
#> 1 Aislyn 1 0.64 1
#> 2 Aislyn 1 0.38 1
#> 3 Aislyn 2 0.3 1
#> 4 Aislyn 2 0.38 1
#> 5 Aislyn 2 0.4 1
#> 6 Aislyn 2 0.19 1
And now, we read in the transect data containing primary and secondary observers, the transect length, and transect number:
data("DistanceTransect")
head(DistanceTransect)
#> # A tibble: 6 x 4
#> `Primary observer` `Secondary observer` `Transect length`
#> <chr> <chr> <dbl>
#> 1 Austen Aislyn 11
#> 2 Aislyn Austen 17
#> 3 Austen Aislyn 29
#> 4 Aislyn Austen 25
#> 5 Austen Aislyn 26
#> 6 Aislyn Austen 18
#> `Transect number`
#> <dbl>
#> 1 1
#> 2 2
#> 3 3
#> 4 4
#> 5 5
#> 6 6
# order by transect number
DistanceTransect <- DistanceTransect[order(DistanceTransect$"Transect number"),]
It is useful to construct some initial diagnostic plots to ensure data were entered correctly and assumptions of distance sampling are met (e.g., detection declines monotonically with distance from the transect line).
hist(DistanceEncounter$distance, col="black", border="white",
xlab='Distance of detection from transect (m)', main="")
Here we see a pattern consistent with our expectation: the number of detections declines with distance from the transect line.
The Analysis
We first prepare the data for use with the R package `mrds`, making use of a helper function, `create.removal.Observer`.
library(mrds) #load the mrds package
#> This is mrds 2.2.0
#> Built: R 3.5.2; ; 2019-04-25 23:38:01 UTC; unix
# Order the transects
DistanceEncounter <- DistanceEncounter[order(DistanceEncounter$`Transect #`),]
# Format distance data for use with mrds
DistanceEncounter <- DistanceEncounter %>%
mutate(detected = rep(1, dim(DistanceEncounter)[1]),
object=1:dim(DistanceEncounter)[1])
distance.rem.dat <- create.removal.Observer(transect.dat=DistanceTransect, obs.dat=DistanceEncounter)
Based on the relatively slow decline in detection with distance (Figure 1), we chose the hazard-rate model (`dsmodel=~cds(key="hr")`) to capture the drop-off of detection with distance. Alternative detection (key) functions in the `mrds` package are the half-normal (`hn`), the gamma (`gamma`), and the uniform (`unif`).
distance.model <- ddf(method="rem", dsmodel=~cds(key="hr"),
mrmodel=~glm(formula=~1),
data=distance.rem.dat,
meta.data=list(width=1))
print(summary(distance.model))
#>
#> Summary for rem.fi object
#> Number of observations : 79
#> Number seen by primary : 56
#> Number additional seen by secondary : 23
#> AIC : 99.13481
#>
#>
#> Conditional detection function parameters:
#> estimate se
#> (Intercept) 0.8898575 0.420271
#>
#> Estimate SE CV
#> Average primary p(0) 0.9152379 0.05050355 0.05518079
#>
#>
#>
#> Summary for ds object
#> Number of observations : 79
#> Distance range : 0 - 1
#> AIC : -27.39173
#>
#> Detection function:
#> Hazard-rate key function
#>
#> Detection function parameters
#> Scale coefficient(s):
#> estimate se
#> (Intercept) -0.7774321 0.1764449
#>
#> Shape coefficient(s):
#> estimate se
#> (Intercept) 1.037199 0.3349329
#>
#> Estimate SE CV
#> Average p 0.580408 0.0645321 0.111184
#>
#>
#> Summary for rem object
#>
#> Total AIC value = 71.74308
#> Estimate SE CV
#> Average p 0.5312114 0.06593617 0.1241242
#> N in covered region 148.7166908 21.72528452 0.1460850
The average probability of detecting a mussel by at least one observer is 0.53 with a standard error of 0.07 (the “Average p” line in the `rem` object summary above). Below, we overlay the estimated detection model on the histogram of detection distances recorded by the two observers.
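If you want to check the hazard-rate choice against the half-normal alternative mentioned above, you can refit the model and compare the AIC values reported in the summaries (a sketch that reuses the objects defined above; we have not run this comparison here):

```r
#sketch: refit with the half-normal key function and compare AICs
#(reuses distance.rem.dat from above; a smaller AIC indicates more support)
hn.model <- ddf(method="rem", dsmodel=~cds(key="hn"),
                mrmodel=~glm(formula=~1),
                data=distance.rem.dat,
                meta.data=list(width=1))
print(summary(hn.model)) #compare the AIC printed here with the hazard-rate fit above
```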
The variance in the total counts can be estimated using the `dht` function in the `mrds` package. We need to create two tables that link the survey effort and surveyed area to the actual count data. Below we create these tables, then run the `dht` function to get the density estimate and its standard error.
w <- 2 #width of the covered strip: the 1-m truncation distance on each side of the transect line
area <- data.frame(Region.Label=1, Area=sum(DistanceTransect$`Transect length`)*w)
counts <- table(DistanceEncounter$`Transect #`) #count in each transect
samples <- data.frame(Region.Label=rep(1, length(counts)), Sample.Label=1:length(counts), Effort=DistanceTransect$`Transect length`)
density.est <- dht(distance.model, region.table=area, sample.table=samples)
print(density.est)
#>
#> Summary for clusters
#>
#> Summary statistics:
#> Region Area CoveredArea Effort n k ER se.ER cv.ER
#> 1 1 602 602 301 79 15 0.2624585 0.05123375 0.1952071
#>
#> Abundance:
#> Label Estimate se cv lcl ucl df
#> 1 Total 148.7167 34.40232 0.2313279 93.07522 237.6213 26.80237
#>
#> Density:
#> Label Estimate se cv lcl ucl df
#> 1 Total 0.2470377 0.05714671 0.2313279 0.15461 0.3947198 26.80237
#>
#> Summary for individuals
#>
#> Summary statistics:
#> Region Area CoveredArea Effort n ER se.ER cv.ER
#> 1 1 602 602 301 93 0.3089701 0.07225861 0.2338693
#> mean.size se.mean
#> 1 1.177215 0.05332043
#>
#> Abundance:
#> Label Estimate se cv lcl ucl df
#> 1 Total 175.0715 46.35319 0.2647671 102.1349 300.0937 22.66678
#>
#> Density:
#> Label Estimate se cv lcl ucl df
#> 1 Total 0.2908165 0.07699866 0.2647671 0.1696593 0.4984946 22.66678
#>
#> Expected cluster size
#> Region Expected.S se.Expected.S cv.Expected.S
#> 1 Total 1.177215 0.06999613 0.05945908
#> 2 Total 1.177215 0.06999613 0.05945908
The output from `dht` provides two density estimates: the first is for the density of clusters, and the second is for the total density of individuals. We want the density of individuals; reading off the output table, we get an estimate of 0.29 mussels per square meter with a standard error of 0.08.
Resources
Book: Buckland et al. (2015) Distance sampling: methods and applications.
Tutorial: distancesampling.org. Tutorials on distance sampling in R.
Tutorial: FishR. Tutorial on depletion sampling in R.
Files used in this tutorial
Data and functions that are available in the ZebraTutorial R package.
Double-observer survey detections
Data sheets and data entry for conducting a new survey
Transect data sheet: used to record information associated with the surveyed transects.
Habitat data sheet: used to record habitat data along the transect. These data can be used to model variation in mussel density.
Dependent double-observer belt survey: used to record observations of zebra mussels in dependent double-observer belt surveys.
Dependent double-observer distance survey: used to record observations of zebra mussels in dependent double-observer distance surveys.