Skip to contents

A function to estimate Rate of change in community data in time series

Usage

estimate_roc(
  data_source_community,
  data_source_age,
  age_uncertainty = NULL,
  smooth_method = c("none", "m.avg", "grim", "age.w", "shep"),
  smooth_n_points = 5,
  smooth_age_range = 500,
  smooth_n_max = 9,
  working_units = c("levels", "bins", "MW"),
  bin_size = 500,
  number_of_shifts = 5,
  bin_selection = c("random", "first"),
  standardise = FALSE,
  n_individuals = 150,
  dissimilarity_coefficient = c("euc", "euc.sd", "chord", "chisq", "gower", "bray"),
  tranform_to_proportions = TRUE,
  rand = NULL,
  use_parallel = FALSE,
  interest_threshold = NULL,
  time_standardisation = NULL,
  verbose = FALSE
)

Arguments

data_source_community

Data.frame. Community data with species as columns and levels (samples) as rows. First column should be sample_id (character).

data_source_age

Data.frame with two columns:

  • sample_id - unique ID of each level (character)

  • age - age of level (numeric)

age_uncertainty

Usage of age uncertainty form Age-depth models. Either:

  • matrix with number of columns as number of samples. Each column is one sample, each row is one age sequence from age-depth model. Age sequence is randomly sampled from age-depth model uncertainties at the beginning of each run.

  • NULL - Age uncertainties are not available and, therefore, will not be used.

smooth_method

Character. type of smoothing applied for the each of the pollen type

  • "none" - Pollen data is not smoothed

  • "m.avg" - Moving average

  • "grim" - Grimm's smoothing

  • "age.w"" - Age-weighted average

  • "shep" - Shepard's 5-term filter

smooth_n_points

Numeric. Number of points for used for moving average, Grimm and Age-Weighted smoothing (odd number)

smooth_age_range

Numeric. Maximal age range for both Grimm and Age-weight smoothing

smooth_n_max

Numeric. Maximal number of samples to look in Grimm smoothing

working_units

Character. Selection of units that the dissimilarity_coefficient will be calculated between.

  • "levels" - individual levels are going to be used

  • "bins" - samples in predefined bins will be pooled together and one sample will be selected from each time bin as a representation.

  • "MW" - Bins of selected size are created, starting from the beginning of the core. This is repeated many times, with each time bin (window) shifting by Z years forward. This is repeated X times, where X = bin size / Z.

bin_size

Numeric. Size of the time bin (in years)

number_of_shifts

Numeric. Value determining the number of shifts of window used in Moving window method

bin_selection

Character. Setting determining the the process of selection of samples from bins.

  • "first" - sample closest to the beginning of the bin is selected as a representation.

  • "random" - a random sample is selected as a representation.

standardise

Logical. If standardise == TRUE, then standardise each Working Unit to certain number of individuals (using random resampling without repetition)

n_individuals

Numeric. Number of grain to perform standardisation to. The N_individual is automatically adjusted to the smallest number of pollen grains in sequence.

dissimilarity_coefficient

Character. Dissimilarity coefficient. Type of calculation of differences between Working Units. See vegan::vegdist for more details.

  • "euc" - Euclidean distance

  • "euc.sd" - Standardised Euclidean distance

  • "chord" - Chord distance

  • "chisq" - Chi-squared coefficient

  • "gower" - Gower's distance

  • "bray" - Bray-Curtis distance

tranform_to_proportions

Logical. Should the community data be transformed to a proportion during calculations?

rand

Numeric. Number of runs used in randomisation.

use_parallel

Preference of usage of parallel computation of randomisation

  • [value] - selected number of cores

  • TRUE - automatically selected number of cores

  • FALSE - does not use parallel computation (only single core)

interest_threshold

Numeric. Optional. Age, after which all results of RoC are excluded.

time_standardisation

Numeric. Units scaling for result RoC values. For example, if time_standardisation = 100, the RoC will be reported as dissimilarity per 100 yr.

verbose

Logical. If TRUE, function will output messages about internal processes

Details

R-Ratepol is written as an R package and includes a range of possible settings including a novel method to evaluate RoC in a single stratigraphical sequence using assemblage data and age uncertainties for each level. There are multiple built-in dissimilarity coefficients (dissimilarity_coefficient) for different types of assemblage data, and various levels of data smoothing that can be applied depending on the type and variance of the data. In addition, R-Ratepol can use randomisation, accompanied by use of age uncertainties of each level and taxon standardisation to detect RoC patterns in datasets with high data noise or variability (i.e. numerous rapid changes in composition or sedimentation rates).

The computation of RoC in R-Ratepol is performed using the following steps:

  1. Assemblage and age-model data are extracted from the original source and should be compiled together, i.e. depth, age, variable (taxon) 1, variable (taxon) 2, etc.

  2. (optional) Smoothing of assemblage data: Each variable within the assemblage data is smoothed using one of five in-built smoothing methods:

    • none (smooth_method = "none")

    • Shepard's 5-term filter (smooth_method = "shep"; Davis, 1986; Wilkinson, 2005)

    • moving average (smooth_method = "m.avg"})

    • age-weighted average (smooth_method = "age.w")

    • Grimm's smoothing (smooth_method = "grim"; Grimm and Jacobson, 1992)

  3. Creation of time bins: A template for all time bins in all window movements is created.

  4. A single run (an individual loop) is computed:

    • (optional) Selection of one time series from age uncertainties (see section on randomisation)

    • Subsetting levels in each bin: Here the working units (WU) are defined

    • (optional) Standardisation of assemblage data in each WU

    • The summary of a single run is produced based on all moving windows

    • Calculation of RoC between WUs: RoC is calculated as the dissimilarity coefficient (dissimilarity_coefficient) standardised by age differences between WUs. Five in-built dissimilarity coefficients are available:

      • Euclidean distance (dissimilarity_coefficient = "euc")

      • standardised Euclidean distance (dissimilarity_coefficient = "euc.sd")

      • Chord distance (dissimilarity_coefficient = "chord")

      • Chi-squared coefficient (dissimilarity_coefficient = "chisq")

      • Gower's distance (dissimilarity_coefficient = "gower")

      • Bray-Curtis distance (dissimilarity_coefficient = "bray")

  5. Step 4 is repeated multiple times (e.g. 10,000 times).

  6. Validation and summary of results from all runs of RoC calculation are produced.

  7. (Optional) Data beyond a certain age can be excluded.

Selection of working units (WU; Step 3)

RoC is calculated between consecutive Working Units (WU). Traditionally, these WUs represent individual stratigraphical levels. However, changes in sedimentation rates and sampling strategies can result in an uneven temporal distribution of levels within a time sequence, which in turn makes the comparison of RoC between sequences problematic. There are various methods that attempt to minimise such problems. The first is interpolation of levels to evenly spaced time intervals, and the use of the interpolated data as WUs. This can lead to a loss of information when the density of levels is high. Second is binning of levels: assemblage data are pooled into age brackets of various size (i.e. time bins) and these serve as WUs. Here, the issue is a lower resolution of WUs and their uneven size in terms of total assemblage count (bins with more levels have higher assemblage counts). Third is selective binning: like classical binning, bins of selected size are created, but instead of pooling assemblage data together, only one level per time bin is selected as representative of each bin. This results in an even number of WUs in bins with a similar count size in the assemblage. However, the issue of low resolution remains. Therefore, we propose a new method of binning with a moving window, which is a compromise between using individual levels and selective binning. This method follows a simple sequence: time bins are created, levels are selected as in selective binning, and RoC between bins is calculated. However, the brackets of the time bin (window) are then moved forward by a selected amount of time (Z), levels are selected again (subset into bins), and RoC calculated for the new set of WUs. This is repeated X times (where X is the bin size divided by Z) while retaining all the results.

R-Ratepol currently provides several options for selecting WU, namely as i ndividual levels (working_units = "levels"), selective binning of levels (working_units = "bins"), and our new method of binning with a moving window (working_units = "MW")

Randomisation

Due to the inherent statistical errors in uncertainties in the age estimates from age-depth and the assemblage datasets (e.g. pollen counts in each level; Birks and Gordon, 1985), R-Ratepol can be run several times and the results summarised (Steps 5-6). Therefore, two optional settings are available by using age uncertainties and assemblage data standardisation.

Age uncertainties

For each run, a single age sequence from the age uncertainties is randomly selected. The calculation between two consecutive WUs (i.e. one working-unit combination) results in a RoC score and a time position (which is calculated as the mean age position of the two WUs). However, due to random sampling of the age sequence, each WU combination will result in multiple RoC values. The final RoC value for a single WU combination is calculated as the median of the scores from all randomisations. In addition, the 95th quantile from all randomisations is calculated as an error estimate.

Data standardisation (Step 4b)

Taxa in the assemblage dataset can be standardised to a certain count (e.g. number of pollen grains in each WU) by rarefaction. Random sampling without replacement is used to draw a selected number of individuals from each WU (e.g. 150 pollen grains).

References

Birks, H.J.B., Gordon, A.D., 1985. Numerical Methods in Quaternary Pollen Analysis. Academic Press, London.

Davis, J.C., 1986. Statistics and Data Analysis in Geology, 2nd edn. ed. J. Wiley & Sons, New York.

Grimm, E.C., Jacobson, G.L., 1992. Fossil-pollen evidence for abrupt climate changes during the past 18000 years in eastern North America. Clim. Dyn. 6, 179-184.

Wilkinson, L., 2005. The Grammar of Graphics. Springer-Verlag, New York, USA 37.

Examples

if (FALSE) {
example_data <- RRatepol::example_data

sequence_01 <-
  estimate_roc(
    data_source_community = example_data$pollen_data[[1]],
    data_source_age = example_data$sample_age[[1]],
    age_uncertainty = FALSE,
    smooth_method = "shep",
    working_units = "MW",
    rand = 1e3,
    use_parallel = TRUE,
    dissimilarity_coefficient = "chisq"
  )

plot_roc(
  sequence_01,
  age_threshold = 8e3,
  roc_threshold = 1
)
}