Temporal Precision

All effort-based eBird checklists include the time the observation period started and the duration of that period. The ‘TIME OBSERVATIONS STARTED’ column of the EBD records when the birding event began, and the ‘DURATION MINUTES’ column records how long the observers spent collecting data. Duration is assumed to be continuous. At a minimum, checklists should be filtered by duration, for example by removing very long observation periods, with the cutoff chosen to suit the needs of your particular analysis. This filtering can be done with the ‘auk’ package for R (https://github.com/CornellLabofOrnithology/auk). In analyses conducted by staff of the Lab of Ornithology, we also routinely control statistically for variation in ‘TIME OBSERVATIONS STARTED’.
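
As a rough illustration of the duration filter, the Python sketch below keeps checklists whose ‘DURATION MINUTES’ value is recorded and falls within a cutoff, and converts ‘TIME OBSERVATIONS STARTED’ into decimal hours so that it can be used as a covariate. It assumes each checklist is a dict keyed by EBD column names and that start times look like "07:30:00". The 300-minute cutoff, the derived ‘START HOURS’ field and the example rows are all hypothetical. In R, the auk package provides its own duration filter, `auk_duration()`.

```python
from datetime import time

# Hypothetical cutoff: the right maximum duration depends on your analysis.
MAX_DURATION_MINUTES = 300.0


def start_time_to_hours(hhmmss: str) -> float:
    """Convert an 'HH:MM:SS' start time into decimal hours after midnight."""
    t = time.fromisoformat(hhmmss)
    return t.hour + t.minute / 60 + t.second / 3600


def filter_by_duration(checklists, max_minutes=MAX_DURATION_MINUTES):
    """Keep checklists with a recorded duration in (0, max_minutes].

    Each checklist is a dict keyed by EBD column names. Kept rows get a
    numeric 'DURATION MINUTES' and a derived 'START HOURS' covariate
    (a field name made up for this sketch).
    """
    kept = []
    for row in checklists:
        raw = row.get("DURATION MINUTES", "")
        if raw in ("", None):
            continue  # no effort information, so skip for effort-based models
        duration = float(raw)
        if 0 < duration <= max_minutes:
            start = start_time_to_hours(row["TIME OBSERVATIONS STARTED"])
            kept.append({**row, "DURATION MINUTES": duration, "START HOURS": start})
    return kept


# Made-up example rows; real EBD files are tab-separated text.
example = [
    {"SAMPLING EVENT IDENTIFIER": "S1", "TIME OBSERVATIONS STARTED": "07:30:00", "DURATION MINUTES": "45"},
    {"SAMPLING EVENT IDENTIFIER": "S2", "TIME OBSERVATIONS STARTED": "06:00:00", "DURATION MINUTES": "720"},
    {"SAMPLING EVENT IDENTIFIER": "S3", "TIME OBSERVATIONS STARTED": "17:15:00", "DURATION MINUTES": ""},
]
kept = filter_by_duration(example)
print([r["SAMPLING EVENT IDENTIFIER"] for r in kept])  # ['S1']
print(kept[0]["START HOURS"])  # 7.5
```

Applied to the made-up rows, only S1 survives: S2 exceeds the cutoff and S3 has no recorded duration.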

In analyses, our preferred method for dealing with variation in the durations of observation periods is to include DURATION MINUTES as a predictor variable and to treat it as a continuous, non-linear variable. The reasons for this treatment of DURATION MINUTES are:

One useful way to describe the effect of variation in observation effort in a statistical model is to treat that effect as a smooth term in a generalized additive model (GAM), i.e. as a spline. By doing so, you let the model-fitting process determine how the probability of observing a species changes as effort increases, rather than arbitrarily assuming that you already know the best way to describe this relationship. Alternatively, the relationship can be described with any of a number of non-linear transformations, for example by using log(DURATION MINUTES) as the predictor variable. If you choose this approach, however, you will need to justify that it is an appropriate way of describing how the accumulation of new observations slows with each additional increase in the length of an observation period. A reasonable way to produce this justification is to compare the accuracy of your approach with the effect described by a spline, so you would have to fit a GAM anyway.
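
The comparison described above can be sketched in plain Python. The code fits two least-squares models to a synthetic, noise-free saturating effort curve (invented for illustration, not eBird data): one uses log(DURATION MINUTES) as the predictor, and the other uses a piecewise-linear regression spline whose knot positions are hypothetical. This unpenalised spline is a simple stand-in for, not a reproduction of, the penalised smooths that GAM software such as mgcv fits, and the response is treated as continuous rather than as binary detections.

```python
import math

# Synthetic, noise-free effort curve (illustration only, not eBird data):
# the expected response rises quickly at first, then levels off.
DURATIONS = [float(d) for d in range(5, 305, 5)]  # 5, 10, ..., 300 minutes
RESPONSE = [1 - math.exp(-d / 60) for d in DURATIONS]

KNOTS = [30.0, 60.0, 90.0, 120.0, 180.0]  # hypothetical knot positions (minutes)


def log_basis(d):
    """Design row for the model y = b0 + b1 * log(duration)."""
    return [1.0, math.log(d)]


def spline_basis(d):
    """Design row for a piecewise-linear regression spline:
    intercept, linear term, and one hinge max(0, d - k) per knot."""
    return [1.0, d] + [max(0.0, d - k) for k in KNOTS]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def least_squares_sse(basis, xs, ys):
    """Fit ordinary least squares via the normal equations and return the SSE."""
    rows = [basis(x) for x in xs]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    return sum((y - f) ** 2 for y, f in zip(ys, fitted))


sse_log = least_squares_sse(log_basis, DURATIONS, RESPONSE)
sse_spline = least_squares_sse(spline_basis, DURATIONS, RESPONSE)
print(f"log(duration) SSE: {sse_log:.4f}")
print(f"spline SSE:        {sse_spline:.4f}")
```

On this particular curve the spline reaches a lower sum of squared errors than the log transform, because a straight line in log(duration) cannot follow the shape of the simulated response. This is an in-sample comparison and the spline has more parameters, so a real comparison would be better made on held-out data or with an information criterion such as AIC.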

When fitting GAMs in R, staff at the Lab of Ornithology typically use either the ‘mgcv’ package or, if the model includes random effects, the ‘gamm4’ package. In our experience, the online documentation for these packages is far better than average, and the creator of these packages, Simon Wood, has written an extremely useful book, Generalized Additive Models: An Introduction with R, which is now in its second edition.

If you are fitting a machine-learning model to your data, such as a Random Forest or a gradient boosted model, then choosing the best description of the effect of changing observation effort is moot. As long as you have specified that DURATION MINUTES is to be treated as a continuous variable, the analysis will determine the shape of the effect of observation period on the accumulation of new observations from the data, in much the same way that a GAM does in a statistical model.