General RRatepol package description • RRatepol

General description

The computation of RoC in RRatepol is performed using the following steps:

Assemblage and age-model data are extracted from the original source and should be compiled together, i.e. depth, age, variable (taxon) 1, variable (taxon) 2, etc.
(optional) Smoothing of assemblage data: Each variable within the assemblage data is smoothed using one of five in-built smoothing methods: none (smooth_method = "none"), Shepard’s 5-term filter (smooth_method = "shep"; Davis, 1986; Wilkinson, 2005), moving average (smooth_method = "m.avg"), age-weighted average (smooth_method = "age.w"), Grimm’s smoothing (smooth_method = "grim"; Grimm and Jacobson, 1992).
Creation of time bins: A template for all time bins in all window movements is created.
A single run (an individual loop) is computed:

(optional) Selection of one time series from age uncertainties (see section 2.1.1.2. on randomisation)
Subsetting levels in each bin: Here the working units (WU) are defined
(optional) Standardisation of assemblage data in each WU
Calculation of RoC between WUs: RoC is calculated as the dissimilarity coefficient (dissimilarity_coefficient) standardised by age differences between WUs. Several in-built dissimilarity coefficients are available: Euclidean distance (dissimilarity_coefficient = "euc), standardised Euclidean distance (dissimilarity_coefficient = "euc.sd), Chord distance (dissimilarity_coefficient = "chord), Chi-squared coefficient (dissimilarity_coefficient = "chisq), Gower’s distance (dissimilarity_coefficient = "gower), Bray-Curtis(dissimilarity_coefficient = "bray). See vegan::vegdistfor more details. The choice of dissimilarity_coefficient depends on the type of assemblage data.
The summary of a single run is produced based on all moving windows

Step 4 is repeated multiple times (e.g. 10,000 times).
Validation and summary of results from all runs of RoC calculation are produced.
(Optional) Data beyond a certain age can be excluded.
Detection and validation of significant peak-points. There are five in-built methods to detect significant peak-points: Threshold (method = "threshold), Linear trend (method = "trend_linear), Non-linear trend (method = "trend_non_linear), first derivative of a generalised additive model (method = "GAM_deriv; f-deriv GAM; Simpson, 2018), and Signal-to-Noise Index (method = "SNI; Kelly et al., 2011).

Selection of working units

RoC is calculated between consecutive Working Units (WU). Traditionally, these WUs represent individual stratigraphical levels. However, changes in sedimentation rates and sampling strategies can result in an uneven temporal distribution of levels within a time sequence, which in turn makes the comparison of RoC between sequences problematic. There are various methods that attempt to minimise such problems. The first is interpolation of levels to evenly spaced time intervals, and the use of the interpolated data as WUs. This can lead to a loss of information when the density of levels is high. Second is binning of levels: assemblage data are pooled into age brackets of various size (i.e. time bins) and these serve as WUs. Here, the issue is a lower resolution of WUs and their uneven size in terms of total assemblage count (bins with more levels have higher assemblage counts). Third is selective binning: like classical binning, bins of selected size are created, but instead of pooling assemblage data together, only one level per time bin is selected as representative of each bin. This results in an even number of WUs in bins with a similar count size in the assemblage. However, the issue of low resolution remains. Therefore, we propose a new method of binning with a moving window, which is a compromise between using individual levels and selective binning. This method follows a simple sequence: time bins are created, levels are selected as in selective binning, and RoC between bins is calculated. However, the brackets of the time bin (window) are then moved forward by a selected amount of time (Z), levels are selected again (subset into bins), and RoC calculated for the new set of WUs. This is repeated X times (where X is the bin size divided by Z) while retaining all the results. RRatepol currently provides several options for selecting WU, namely as individual levels (working_units = "levels"), selective binning of levels (working_units = "bins"), and our new method of binning with a moving window (working_units = "MW"), which is summarised in figure.

RRatepol scheme

Randomisation

Due to the inherent statistical errors in uncertainties in the age estimates from age-depth and the assemblage datasets (e.g. pollen counts in each level; Birks and Gordon, 1985), RRatepol can be run several times and the results summarised (Steps 5-6). Therefore, two optional settings are available by using age uncertainties and assemblage data standardisation.

Age uncertainties (Step 4a)

For each run, a single age sequence from the age uncertainties is randomly selected. The calculation between two consecutive WUs (i.e. one working-unit combination) results in a RoC score and a time position (which is calculated as the mean age position of the two WUs). However, due to random sampling of the age sequence, each WU combination will result in multiple RoC values. The final RoC value for a single WU combination is calculated as the median of the scores from all randomisations. In addition, the 95th quantile from all randomisations is calculated as an error estimate.

Data standardisation (Step 4b)

Variables (taxa) in the assemblage dataset can be standardised to a certain count (e.g. number of pollen grains in each WU) by rarefaction. Random sampling without replacement is used to draw a selected number of individuals from each WU (e.g. 150 pollen grains).

Detection of peak-points in RoC sequence (Step 8)

A rapid change in composition or relative abundances of variables within the sequence can provide a means of comparing RoC between sequences and interpreting the potential drivers of assemblage change. To detect such significant peak-points of RoC scores in each sequence, each point is tested to see if it represents a significant increase in RoC values. There are various ways to detect peak-points in a time series and RRatepol is able to detect peak-points using five methods:

Threshold (method = "threshold): Each point in the RoC sequence is compared to a median of all RoC scores from the whole sequence (i.e. threshold value). The ROC value for a point is considered significant if the 95th quantile of the RoC scores from all calculations is higher than the threshold value.
Linear trend (method = "trend_linear): A linear model is fitted between the RoC values and their ages. Differences between the model and each point are calculated (residuals). The standard deviation (SD) is calculated from all the residuals. A peak is considered significant if it is 1.5 SD higher than the model (sd_threshold = 1.5).
Non-linear trend (method = "trend_non_linear): A conservative generalised additive model (GAM) is fitted through the RoC scores and their ages (GAM= RoC ~ s(age,k=3) using the mgcv package (Wood, 2011). The distance between each point and the fitted value is calculated (residuals). The standard deviation (SD) is calculated from all the residuals. A peak is considered significant if it is 1.5 SD higher than the model (sd_threshold = 1.5).
F-deriv GAM (method = "GAM_deriv): A smooth GAM model is fitted to the RoC scores and their ages (GAM= RoC ~ s(age). The first derivative as well as continuous confidence intervals are calculated from the model using the gratia package (Simpson, 2019). A peak is considered significant if the confidence intervals of the first derivative differ from 0 (for more information see Simpson, 2018).
Signal-to-noise method (method = "SNI): We adapted SNI from Kelly et al. (2011), which was developed to detect changes in charcoal stratigraphical records. SNI is calculated for the whole RoC sequence and a peak-point is considered significant if it has an SNI value higher than 3.

References

Birks, H.J.B., Gordon, A.D., 1985. Numerical Methods in Quaternary Pollen Analysis. Academic Press, London.

Davis, J.C., 1986. Statistics and Data Analysis in Geology, 2nd edn. ed. J. Wiley & Sons, New York.

Grimm, E.C., Jacobson, G.L., 1992. Fossil-pollen evidence for abrupt climate changes during the past 18000 years in eastern North America. Clim. Dyn. 6, 179–184.

Kelly, R.F., Higuera, P.E., Barrett, C.M., Feng Sheng, H., 2011. A signal-to-noise index to quantify the potential for peak detection in sediment-charcoal records. Quat. Res. 75, 11–17. https://doi.org/10.1016/j.yqres.2010.07.011

Simpson, G.L., 2019. gratia: graceful’ggplot’–based graphics and other functions for GAMs fitted using “mgcv.” R Packag. version 0.2–1.

Simpson, G.L., 2018. Modelling palaeoecological time series using generalised additive models. Front. Ecol. Evol. 6, 1–21. https://doi.org/10.3389/fevo.2018.00149

Wilkinson, L., 2005. The Grammar of Graphics. Springer-Verlag, New York, USA 37. https://doi.org/10.2307/2669493 Wood, S.N., 2011. Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. J. R. Stat. Soc. Ser. B Stat. Methodol. 73, 3–36.