A function to estimate the rate of change (RoC) in community data in time series
Usage
estimate_roc(
  data_source_community,
  data_source_age,
  age_uncertainty = NULL,
  smooth_method = c("none", "m.avg", "grim", "age.w", "shep"),
  smooth_n_points = 5,
  smooth_age_range = 500,
  smooth_n_max = 9,
  working_units = c("levels", "bins", "MW"),
  bin_size = 500,
  number_of_shifts = 5,
  bin_selection = c("random", "first"),
  standardise = FALSE,
  n_individuals = 150,
  dissimilarity_coefficient = c("euc", "euc.sd", "chord", "chisq", "gower", "bray"),
  tranform_to_proportions = TRUE,
  rand = NULL,
  use_parallel = FALSE,
  interest_threshold = NULL,
  time_standardisation = NULL,
  verbose = FALSE
)
Arguments
- data_source_community
Data.frame. Community data with species as columns and levels (samples) as rows. The first column should be sample_id (character).
- data_source_age
Data.frame with two columns:
sample_id - unique ID of each level (character)
age - age of level (numeric)
- age_uncertainty
Use of age uncertainties from age-depth models. Either:
matrix - one column per sample and one row per age sequence from the age-depth model. An age sequence is randomly sampled from the age-depth model uncertainties at the beginning of each run.
NULL - age uncertainties are not available and, therefore, will not be used.
- smooth_method
Character. Type of smoothing applied to each pollen type:
"none" - Pollen data is not smoothed
"m.avg" - Moving average
"grim" - Grimm's smoothing
"age.w" - Age-weighted average
"shep" - Shepard's 5-term filter
- smooth_n_points
Numeric. Number of points used for moving average, Grimm's, and age-weighted smoothing (odd number)
- smooth_age_range
Numeric. Maximal age range for both Grimm's and age-weighted smoothing
- smooth_n_max
Numeric. Maximal number of samples to consider in Grimm's smoothing
- working_units
Character. Selection of the units between which the dissimilarity coefficient will be calculated.
"levels" - individual levels are used
"bins" - samples in predefined bins are pooled together and one sample is selected from each time bin as its representative
"MW" - bins of selected size are created, starting from the beginning of the core. The bins (windows) are then shifted forward by Z years and the procedure is repeated X times, where X = bin size / Z.
- bin_size
Numeric. Size of the time bin (in years)
- number_of_shifts
Numeric. Number of window shifts used in the moving-window method
- bin_selection
Character. Setting that determines how samples are selected from bins.
"first" - the sample closest to the beginning of the bin is selected as its representative.
"random" - a random sample is selected as its representative.
- standardise
Logical. If standardise == TRUE, each Working Unit is standardised to a certain number of individuals (using random resampling without replacement)
- n_individuals
Numeric. Number of grains to standardise to. n_individuals is automatically adjusted to the smallest number of pollen grains in the sequence.
- dissimilarity_coefficient
Character. Dissimilarity coefficient used to calculate differences between Working Units. See vegan::vegdist for more details.
"euc" - Euclidean distance
"euc.sd" - Standardised Euclidean distance
"chord" - Chord distance
"chisq" - Chi-squared coefficient
"gower" - Gower's distance
"bray" - Bray-Curtis distance
- tranform_to_proportions
Logical. Should the community data be transformed to a proportion during calculations?
- rand
Numeric. Number of runs used in randomisation.
- use_parallel
Preference for parallel computation of the randomisation:
[value] - use the selected number of cores
TRUE - automatically select the number of cores
FALSE - do not use parallel computation (single core only)
- interest_threshold
Numeric. Optional. Age beyond which all RoC results are excluded.
- time_standardisation
Numeric. Scaling of units for the resulting RoC values. For example, if time_standardisation = 100, the RoC will be reported as dissimilarity per 100 yr.
- verbose
Logical. If TRUE, the function will output messages about internal processes
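As a rough sketch of the input shapes described above, the following builds toy objects in base R. The taxon names, counts, and ages are invented for illustration, and the age-uncertainty matrix is simulated with random noise rather than taken from a real age-depth model.

```r
# Hypothetical toy inputs; taxa, counts, and ages are made up.
community <- data.frame(
  sample_id = c("s1", "s2", "s3"),
  Pinus = c(120, 90, 60),
  Betula = c(30, 45, 80),
  Poaceae = c(10, 25, 40),
  stringsAsFactors = FALSE
)

age <- data.frame(
  sample_id = c("s1", "s2", "s3"),
  age = c(150, 620, 1100),
  stringsAsFactors = FALSE
)

# Age uncertainty: one column per sample, one row per age sequence.
# Simulated here by adding noise to the ages (assumption for illustration).
set.seed(1)
n_sequences <- 4
age_unc <- sapply(age$age, function(a) a + rnorm(n_sequences, sd = 50))

dim(age_unc) # rows = age sequences, columns = samples
```

Objects of this shape would be passed as data_source_community, data_source_age, and age_uncertainty respectively.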
Details
R-Ratepol is written as an R package and includes a range of possible settings including a novel method to evaluate RoC in a single stratigraphical sequence using assemblage data and age uncertainties for each level. There are multiple built-in dissimilarity coefficients (dissimilarity_coefficient) for different types of assemblage data, and various levels of data smoothing that can be applied depending on the type and variance of the data. In addition, R-Ratepol can use randomisation, accompanied by use of age uncertainties of each level and taxon standardisation to detect RoC patterns in datasets with high data noise or variability (i.e. numerous rapid changes in composition or sedimentation rates).
The computation of RoC in R-Ratepol is performed using the following steps:
1. Assemblage and age-model data are extracted from the original source and should be compiled together, i.e. depth, age, variable (taxon) 1, variable (taxon) 2, etc.
2. (Optional) Smoothing of assemblage data: each variable within the assemblage data is smoothed using one of five in-built smoothing methods:
   - none (smooth_method = "none")
   - Shepard's 5-term filter (smooth_method = "shep"; Davis, 1986; Wilkinson, 2005)
   - moving average (smooth_method = "m.avg")
   - age-weighted average (smooth_method = "age.w")
   - Grimm's smoothing (smooth_method = "grim"; Grimm and Jacobson, 1992)
3. Creation of time bins: a template for all time bins in all window movements is created.
4. A single run (an individual loop) is computed:
   - (Optional) Selection of one time series from age uncertainties (see section on randomisation)
   - Subsetting levels in each bin: here the working units (WU) are defined
   - (Optional) Standardisation of assemblage data in each WU
   - The summary of a single run is produced based on all moving windows
   - Calculation of RoC between WUs: RoC is calculated as the dissimilarity coefficient (dissimilarity_coefficient) standardised by age differences between WUs. Six in-built dissimilarity coefficients are available:
     - Euclidean distance (dissimilarity_coefficient = "euc")
     - standardised Euclidean distance (dissimilarity_coefficient = "euc.sd")
     - Chord distance (dissimilarity_coefficient = "chord")
     - Chi-squared coefficient (dissimilarity_coefficient = "chisq")
     - Gower's distance (dissimilarity_coefficient = "gower")
     - Bray-Curtis distance (dissimilarity_coefficient = "bray")
5. Step 4 is repeated multiple times (e.g. 10,000 times).
6. Validation and summary of results from all runs of RoC calculation are produced.
7. (Optional) Data beyond a certain age can be excluded.
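The calculation of RoC between WUs described in the steps above can be sketched in base R. This is an illustrative simplification rather than the package's internal code: it treats each level as a working unit, uses plain Euclidean distance on proportions, and places each score at the mean age of the two WUs. The helper name roc_sketch, its default of 100 for time_standardisation, and the toy data are assumptions.

```r
# Illustrative only: dissimilarity between consecutive working units,
# divided by their age difference and rescaled to a chosen time unit.
roc_sketch <- function(counts, ages, time_standardisation = 100) {
  counts <- as.matrix(counts)
  stopifnot(nrow(counts) == length(ages), length(ages) >= 2)
  props <- counts / rowSums(counts) # counts to proportions, row by row
  n <- nrow(props)
  upper <- props[-n, , drop = FALSE] # WU i
  lower <- props[-1, , drop = FALSE] # WU i + 1
  dissimilarity <- sqrt(rowSums((upper - lower)^2)) # Euclidean distance
  age_diff <- diff(ages)
  data.frame(
    age = (ages[-n] + ages[-1]) / 2, # mean age of the two WUs
    roc = dissimilarity / age_diff * time_standardisation
  )
}

# Toy data: three levels, three taxa (values are made up)
counts <- matrix(
  c(120, 30, 10,
     90, 45, 25,
     60, 80, 40),
  ncol = 3, byrow = TRUE
)
ages <- c(150, 620, 1100)

res <- roc_sketch(counts, ages)
res
```

With time_standardisation = 100, the roc column reads as dissimilarity per 100 yr.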
Selection of working units (WU; Step 3)
RoC is calculated between consecutive Working Units (WU). Traditionally, these WUs represent individual stratigraphical levels. However, changes in sedimentation rates and sampling strategies can result in an uneven temporal distribution of levels within a time sequence, which in turn makes the comparison of RoC between sequences problematic. There are various methods that attempt to minimise such problems. The first is interpolation of levels to evenly spaced time intervals, and the use of the interpolated data as WUs. This can lead to a loss of information when the density of levels is high. Second is binning of levels: assemblage data are pooled into age brackets of various size (i.e. time bins) and these serve as WUs. Here, the issue is a lower resolution of WUs and their uneven size in terms of total assemblage count (bins with more levels have higher assemblage counts). Third is selective binning: like classical binning, bins of selected size are created, but instead of pooling assemblage data together, only one level per time bin is selected as representative of each bin. This results in an even number of WUs in bins with a similar count size in the assemblage. However, the issue of low resolution remains. Therefore, we propose a new method of binning with a moving window, which is a compromise between using individual levels and selective binning. This method follows a simple sequence: time bins are created, levels are selected as in selective binning, and RoC between bins is calculated. However, the brackets of the time bin (window) are then moved forward by a selected amount of time (Z), levels are selected again (subset into bins), and RoC calculated for the new set of WUs. This is repeated X times (where X is the bin size divided by Z) while retaining all the results.
R-Ratepol currently provides several options for selecting WU, namely as individual levels (working_units = "levels"), selective binning of levels (working_units = "bins"), and our new method of binning with a moving window (working_units = "MW").
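The moving-window idea can be sketched as follows. The sketch assumes that the shift Z equals bin_size / number_of_shifts, that bins start at the youngest level, and that levels falling before a shifted window are dropped. The helpers make_shifted_bins and select_first are hypothetical and do not mirror the package internals.

```r
# Assign levels to time bins for each window shift (illustrative only).
make_shifted_bins <- function(ages, bin_size = 500, number_of_shifts = 5) {
  shift_size <- bin_size / number_of_shifts # Z in the text
  lapply(seq_len(number_of_shifts), function(k) {
    start <- min(ages) + (k - 1) * shift_size
    breaks <- seq(from = start, to = max(ages) + bin_size, by = bin_size)
    bin_id <- findInterval(ages, breaks)
    bin_id[bin_id == 0] <- NA # level lies before the shifted window
    data.frame(age = ages, shift = k, bin_start = breaks[bin_id])
  })
}

# Keep the level closest to the start of each bin, in the spirit of
# bin_selection = "first".
select_first <- function(binned) {
  binned <- binned[!is.na(binned$bin_start), ]
  binned <- binned[order(binned$age), ]
  binned[!duplicated(binned$bin_start), ]
}

ages <- c(150, 320, 620, 700, 1100, 1480, 1900) # made-up level ages
shifted <- make_shifted_bins(ages, bin_size = 500, number_of_shifts = 5)
working_units <- lapply(shifted, select_first)

working_units[[1]] # WUs for the unshifted window
working_units[[2]] # WUs after shifting the window by Z = 100 yr
```

RoC would then be calculated between consecutive WUs within each shift, and the results from all shifts retained.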
Randomisation
Because of the inherent statistical errors in the assemblage datasets (e.g. pollen counts in each level; Birks and Gordon, 1985) and the uncertainties in the age estimates from age-depth modelling, R-Ratepol can be run several times and the results summarised (Steps 5-6). Two optional settings are therefore available: the use of age uncertainties and the standardisation of assemblage data.
Age uncertainties
For each run, a single age sequence from the age uncertainties is randomly selected. The calculation between two consecutive WUs (i.e. one working-unit combination) results in a RoC score and a time position (which is calculated as the mean age position of the two WUs). However, due to random sampling of the age sequence, each WU combination will result in multiple RoC values. The final RoC value for a single WU combination is calculated as the median of the scores from all randomisations. In addition, the 95th quantile from all randomisations is calculated as an error estimate.
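The summary across randomisations can be sketched in base R. The matrix roc_runs below is simulated purely for illustration (one row per run, one column per WU combination); in practice each row would come from one randomised run. The object and column names are assumptions.

```r
set.seed(42)
n_runs <- 1000
n_wu_pairs <- 4

# Simulated RoC scores: rows are runs, columns are WU combinations.
roc_runs <- matrix(
  rlnorm(n_runs * n_wu_pairs, meanlog = -3, sdlog = 0.3),
  nrow = n_runs
)

# Median as the final RoC value, 95th quantile as the error estimate.
roc_summary <- data.frame(
  wu_pair = seq_len(n_wu_pairs),
  roc_median = apply(roc_runs, 2, median),
  roc_upper_95 = apply(roc_runs, 2, quantile, probs = 0.95, names = FALSE)
)

roc_summary
```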
References
Birks, H.J.B., Gordon, A.D., 1985. Numerical Methods in Quaternary Pollen Analysis. Academic Press, London.
Davis, J.C., 1986. Statistics and Data Analysis in Geology, 2nd ed. J. Wiley & Sons, New York.
Grimm, E.C., Jacobson, G.L., 1992. Fossil-pollen evidence for abrupt climate changes during the past 18000 years in eastern North America. Clim. Dyn. 6, 179-184.
Wilkinson, L., 2005. The Grammar of Graphics. Springer-Verlag, New York, USA 37.
Examples
if (FALSE) {
  # load the example data from the package
  example_data <- RRatepol::example_data

  # estimate RoC for the first sequence
  sequence_01 <-
    estimate_roc(
      data_source_community = example_data$pollen_data[[1]],
      data_source_age = example_data$sample_age[[1]],
      age_uncertainty = NULL, # age uncertainties are not used
      smooth_method = "shep",
      working_units = "MW",
      rand = 1e3,
      use_parallel = TRUE,
      dissimilarity_coefficient = "chisq"
    )

  # plot the estimated RoC
  plot_roc(
    sequence_01,
    age_threshold = 8e3,
    roc_threshold = 1
  )
}