This tutorial assumes that you have collected data following one of the survey procedures documented in the MAISRC video, “Monitoring Zebra Mussels in Lakes”. Here, we describe how to analyze data collected using dependent double-observer surveys, with and without distance sampling. These designs are sometimes called “removal designs” rather than dependent double-observer designs. We demonstrate the techniques using data collected on Lake Burgan, Minnesota, in the summer of 2018. Both survey designs allow you to estimate the probability of detecting mussels and thus to obtain density estimates adjusted for imperfect detection. We show how to use design-based estimators of density. An R package accompanies the tutorial; it includes all the functions and data needed to replicate our analyses. You can download the R package here and install it from source using the following code:

If you use RStudio, you can also install the downloaded file by clicking the Packages tab -> Install -> Install from: Package Archive File (.tar.gz). Alternatively, you can install from GitHub using the following code:

The remainder of the tutorial assumes that you have been able to install this package. All the data and functions you will need to complete the tutorial are in this package. To access the data and functions in this package type:

Estimating Density and Quantifying Uncertainty

Our goal is to estimate zebra mussel density in a lake and to quantify the uncertainty associated with that estimate. To do this, we need the count in each transect, denoted $x_i$ for the $i$th transect, the transect length, $l_i$, and the transect width, $w$, which together determine the surveyed area, $a_i = w l_i$. We estimate the density as:

$$\hat{D} = \frac{\sum_i x_i}{\hat{P} \sum_i a_i}$$

where $\hat{P}$ denotes the estimated probability of detecting a zebra mussel by at least one observer.

The degree of uncertainty in our density estimate depends on the variance of the counts among transects, $\mathrm{var}(x_i)$, and the variance of the estimated detection probability, $\mathrm{var}(\hat{P})$.
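One common way to combine these two sources of uncertainty is the delta method, which treats the total count and the estimated detection probability as independent. The expression below is a sketch of that standard approximation, stated as an assumption; it is not necessarily the exact estimator implemented in the accompanying package:

```latex
\widehat{\mathrm{var}}(\hat{D}) \approx \hat{D}^2 \left[
  \frac{\widehat{\mathrm{var}}\left(\sum_i x_i\right)}{\left(\sum_i x_i\right)^2}
  + \frac{\widehat{\mathrm{var}}(\hat{P})}{\hat{P}^2}
\right]
```

Under this approximation, the squared coefficient of variation of the density estimate is the sum of the squared coefficients of variation of the total count and of the detection probability.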


Our estimate of $\mathrm{var}(\hat{P})$ will depend on the survey design.
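To make the calculation concrete, here is a minimal base-R sketch of the density estimate and an approximate standard error. All numbers are hypothetical (they are not the Lake Burgan data). The variance of the total count is approximated by treating transects as replicate samples, which ignores differences in transect length, and the standard error uses a delta-method approximation that assumes the counts and the detection probability are independent.

```r
# Hypothetical inputs: none of these numbers come from the Lake Burgan survey.
x <- c(3, 0, 5, 2, 1, 4)         # mussel counts per transect
l <- c(30, 30, 25, 30, 30, 20)   # transect lengths (m)
w <- 1                           # belt width (m)
P_hat <- 0.8                     # assumed detection probability (at least one observer)
se_P  <- 0.05                    # assumed standard error of P_hat

a <- w * l                       # surveyed area per transect (m^2)

# Detection-corrected density: total count / (P_hat * total surveyed area)
D_hat <- sum(x) / (P_hat * sum(a))

# Simple approximation to the variance of the total count:
# number of transects times the among-transect sample variance.
k <- length(x)
var_total <- k * var(x)

# Delta-method standard error (assumes counts and P_hat are independent)
se_D <- D_hat * sqrt(var_total / sum(x)^2 + se_P^2 / P_hat^2)

round(c(density = D_hat, se = se_D), 4)
```

In practice, `P_hat` and `se_P` would come from the fitted detection model for the survey design in use rather than being set by hand.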

Dependent Double-Observer (no distance data)

To conduct the dependent double-observer survey, two divers survey a 30-meter by 1-meter belt delineated by two lead lines laid in parallel. The primary diver swims first, marking all the mussels they see within the belt. The secondary diver follows, looking for mussels that the primary diver missed. We rotated the roles of primary and secondary diver at each transect.

The Data

Data should be entered in a manner that ensures reliability. For the datasets used here, we entered data using Google Forms, which can be accessed here. This software allowed us to check that no impossible values were entered (for example, transect lengths had to be numbers between 1 and 30 because of our design). We created one datasheet of detection events and another that describes each transect. The variable “Transect number” is a unique identifier for each transect and links the two sheets. Below we print the first few lines of the data frame of detection events:

Each row contains data from a unique detection event, with the name of the observer (Observer name), the transect number (Transect number), and the number of zebra mussels in the cluster (size).

We then read in the transect data containing the primary and secondary observers, the transect length, and the transect number. We create new columns in the transect data that denote each observer’s name and whether that observer was primary or secondary.
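The linking step described above can be sketched in base R as follows. The data frames and column names here (`observer`, `transect`, `primary`, `secondary`, `role`) are hypothetical stand-ins for illustration, not the names used in the package data.

```r
# Hypothetical example data; column names are illustrative only.
detections <- data.frame(
  observer = c("A", "B", "B", "A", "A"),
  transect = c(1, 1, 2, 2, 2),
  size     = c(1, 2, 1, 1, 3),
  stringsAsFactors = FALSE
)
transects <- data.frame(
  transect  = c(1, 2),
  primary   = c("A", "B"),
  secondary = c("B", "A"),
  length    = c(30, 25),
  stringsAsFactors = FALSE
)

# Link each detection to its transect through the shared identifier.
linked <- merge(detections, transects, by = "transect")

# Label the role that the detecting observer played on that transect.
linked$role <- ifelse(linked$observer == linked$primary, "primary", "secondary")

# Mussels counted by each role on each transect (cluster sizes summed).
counts <- aggregate(size ~ transect + role, data = linked, FUN = sum)
counts
```

Because the divers swap roles between transects, the role has to be assigned per transect rather than per observer.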

Now we need to format the data for the analysis. We will use the removal function in the Fisheries Stock Analysis (FSA) R package to calculate the density estimates. This function estimates the detection probability and the variance in counts for us, and we can use this information to determine the variance in density. We have written the create.removal function to format the data properly and have included it in the file HelperFuncs.R, which we source here (making the function available in the current R session). Tutorials on the estimator used here are available at the FishR site.
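To illustrate the idea behind a removal estimator, here is a minimal base-R sketch of the classical two-pass case. It assumes that both divers have the same detection probability and that the secondary diver records only mussels missed by the primary diver. The function name `two_pass_removal` and the counts are hypothetical. FSA's removal function offers several estimators, and this sketch is not guaranteed to match the method or the variance calculation used in the tutorial's analysis.

```r
# Two-pass removal sketch; assumes both divers share one detection probability.
# n1: mussels detected by the primary diver
# n2: additional mussels detected by the secondary diver
two_pass_removal <- function(n1, n2) {
  stopifnot(n1 > n2, n2 >= 0)
  p_single <- (n1 - n2) / n1         # per-diver detection probability
  N_hat    <- n1^2 / (n1 - n2)       # estimated number of mussels present
  P_any    <- 1 - (1 - p_single)^2   # detected by at least one diver
  list(p_single = p_single, N_hat = N_hat, P_any = P_any)
}

# Hypothetical counts, not the Lake Burgan data
est <- two_pass_removal(n1 = 40, n2 = 10)
est
```

The estimate is undefined when the secondary diver finds as many mussels as the primary diver, which is why the function stops in that case.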

Dependent Double-Observer Survey with Distance Data

The dependent double-observer survey with distance data is similar to the dependent double-observer survey described above. However, in this case, both divers swim along a single transect line (lead line). Whenever a diver detects a mussel or cluster of mussels, the diver also measures the perpendicular distance from the detected mussel(s) to the transect line. The secondary diver again follows the primary diver, looking for mussels that the primary diver missed. These detection-distance measurements are then used to model how detection probability declines with distance from the transect line, and the fitted distance model is used to correct counts for imperfect detection. The website provides tutorials on using the R package described here, as well as several other packages that can be used for distance sampling with either single- or double-observer designs.

The Data

Here we read in the encounter data containing the observer (Observer name), the transect number (Transect #), the perpendicular distance of each mussel or cluster of mussels from the transect line (distance), and the cluster size (size):

And now, we read in the transect data containing primary and secondary observers, the transect length, and transect number:

It is useful to construct some initial diagnostic plots to ensure data were entered correctly and assumptions of distance sampling are met (e.g., detection declines monotonically with distance from the transect line).

Figure 1. Diagnostic plot of the detection distance, the distance of each detection from the transect line.

Here we see a pattern that is consistent with our expectation - i.e., the number of detections declines with distance from the transect line.
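A diagnostic plot of this kind can be produced with a simple histogram. The sketch below uses a small vector of hypothetical distances (not the Lake Burgan data) and assumes a 1-meter truncation distance with 0.25-meter bins; both choices are illustrative.

```r
# Hypothetical perpendicular distances (m); not the Lake Burgan data.
distance <- c(0.05, 0.10, 0.12, 0.20, 0.22, 0.30, 0.35, 0.41, 0.48,
              0.55, 0.62, 0.70, 0.85, 0.95)

# Bin detections into 0.25 m intervals out to an assumed 1 m truncation distance.
h <- hist(distance, breaks = seq(0, 1, by = 0.25), plot = FALSE)

# Draw the diagnostic plot.
plot(h, main = "",
     xlab = "Distance from transect line (m)",
     ylab = "Number of detections")

# Quick check that the number of detections does not increase with distance.
non_increasing <- all(diff(h$counts) <= 0)
non_increasing
```

A spike of detections far from the line, or a gap near the line, would suggest a data-entry problem or a violated assumption that deserves a closer look before fitting a detection model.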

The Analysis

We first prepare the data for use with the R package mrds, making use of the helper function create.removal.Observer.

Based on the relatively slow decline in detection with distance (Figure 1), we chose to use the hazard rate model (dsmodel=~cds(key="hr")) to capture the drop-off of detection with distance. Alternative distance functions in the mrds package are the half-normal (hn), gamma function (gamma), and uniform (unif).
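To show how the choice of key function affects the shape of the detection curve, the following base-R sketch writes out the standard hazard-rate and half-normal forms and integrates each to get the average detection probability within the strip. The parameter values and the 1-meter truncation distance are hypothetical. The sketch covers only the distance component: it assumes detection on the line is certain and does not fit a model or account for the double-observer data.

```r
# Standard key functions used in distance sampling.
# Hazard-rate: g(x) = 1 - exp(-(x / sigma)^(-b)); it has a flat "shoulder" near the line.
hazard_rate <- function(x, sigma, b) 1 - exp(-(x / sigma)^(-b))
# Half-normal: g(x) = exp(-x^2 / (2 * sigma^2))
half_normal <- function(x, sigma) exp(-x^2 / (2 * sigma^2))

w     <- 1     # hypothetical truncation distance (m)
sigma <- 0.4   # hypothetical scale parameter (m)
b     <- 3     # hypothetical hazard-rate shape parameter

# Average detection probability within the strip: (1 / w) * integral of g from 0 to w
avg_hr <- integrate(hazard_rate, 0, w, sigma = sigma, b = b)$value / w
avg_hn <- integrate(half_normal, 0, w, sigma = sigma)$value / w

# Compare the two curves visually.
curve(hazard_rate(x, sigma, b), from = 0, to = w,
      xlab = "Distance from transect line (m)", ylab = "Detection probability")
curve(half_normal(x, sigma), from = 0, to = w, add = TRUE, lty = 2)
legend("topright", legend = c("hazard rate", "half-normal"), lty = c(1, 2))
```

With the same scale parameter, the hazard-rate curve stays near 1 for longer before dropping, which is why it suits data where detection declines slowly close to the line.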

The average probability that a mussel is detected by at least one observer is $\hat{P} = 0.53$, with a standard error of 0.06. Below, we overlay the estimated detection model on the histogram of detection distances from the two observers.

The variance in the total counts can be estimated using the dht function in the mrds package. To do so, we need to create two tables that link the survey effort and surveyed area to the count data. Below we create these tables and then run dht to obtain the density estimate and its standard error.
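As a rough guide to the table layout that dht expects, the sketch below builds a region table and a sample table from hypothetical transect data, using the column names from the mrds documentation (Region.Label, Area, Sample.Label, Effort, object). It also builds the optional observation table that links each detection to its transect. The region name, lake area, and the model name `fit` are assumptions made for illustration.

```r
# Hypothetical survey: three transects in a single region.
transects  <- data.frame(transect = 1:3, length = c(30, 30, 25))
detections <- data.frame(object = 1:4, transect = c(1, 1, 2, 3))
lake_area  <- 10000   # hypothetical region area (m^2)

# Region table: one row per region, with its total area.
region.table <- data.frame(Region.Label = "Lake", Area = lake_area)

# Sample table: one row per transect, with effort equal to transect length.
sample.table <- data.frame(Region.Label = "Lake",
                           Sample.Label = transects$transect,
                           Effort       = transects$length)

# Observation table: links each detected object to its region and transect.
obs.table <- data.frame(object       = detections$object,
                        Region.Label = "Lake",
                        Sample.Label = detections$transect)

# With a fitted ddf model (here given the hypothetical name `fit`),
# the call would look like this:
# mrds::dht(model = fit, region.table = region.table,
#           sample.table = sample.table, obs.table = obs.table)
```

Every Sample.Label in the observation table must also appear in the sample table; transects with no detections still need a row in the sample table so that their effort is counted.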

The output from dht provides two density estimates: the first is the density of clusters, and the second is the density of individuals. We want the density of individuals; reading it off the output table, we get an estimate of $\hat{D} = 0.29$ with a standard error of 0.08.


Book: Buckland et al. (2015) Distance sampling: methods and applications.

Tutorial: Tutorials on distance sampling in R.

Tutorial: FishR. Tutorial on depletion sampling in R.

Data sheets and data entry for conducting a new survey

Transect data sheet: used to record information associated with the surveyed transects.

Habitat data sheet: used to record habitat data along the transect. These data can be used to model variation in mussel density.

Dependent double-observer belt survey: used to record observations of zebra mussels in dependent double-observer belt surveys.

Dependent double-observer distance survey: used to record observations of zebra mussels in dependent double-observer distance surveys.