| Type: | Package |
| Title: | An R Package for the Mean Measure of Divergence (MMD) |
| Description: | Offers a graphical user interface for the calculation of the mean measure of divergence, with facilities for trait selection and graphical representations <doi:10.1002/ajpa.23336>. |
| Version: | 4.1.0 |
| Depends: | R (≥ 4.1.0) |
| Imports: | dplyr, MASS, plotrix, rlang, scatterplot3d, shiny, smacof |
| Suggests: | cluster, covr, knitr, rmarkdown, testthat (≥ 2.1.0) |
| License: | CeCILL-2 | file LICENSE |
| Encoding: | UTF-8 |
| URL: | https://gitlab.com/f-santos/anthropmmd/ |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Packaged: | 2025-10-21 11:34:17 UTC; pingouache |
| Author: | Frédéric Santos |
| Maintainer: | Frédéric Santos <frederic.santos@u-bordeaux.fr> |
| Repository: | CRAN |
| Date/Publication: | 2025-10-21 12:20:02 UTC |
An R package for the Mean Measure of Divergence (MMD)
Description
Offers a graphical user interface for the calculation of the mean measure of divergence, with facilities for trait selection and graphical representations.
Author(s)
Frédéric Santos, frederic.santos@u-bordeaux.fr
References
Harris, E. F. and Sjøvold, T. (2004) Calculation of Smith's mean measure of divergence for intergroup comparisons using nonmetric data. Dental Anthropology, 17(3), 83–93.
Irish, J. (2010) The mean measure of divergence: Its utility in model-free and model-bound analyses relative to the Mahalanobis D2 distance for nonmetric traits. American Journal of Human Biology, 22, 378–395. doi: 10.1002/ajhb.21010
Nikita, E. (2015) A critical review of the mean measure of divergence and Mahalanobis distances using artificial data and new approaches to the estimation of biodistances employing nonmetric traits. American Journal of Physical Anthropology, 157, 284–294. doi: 10.1002/ajpa.22708
Santos, F. (2018) AnthropMMD: an R package with a graphical user interface for the mean measure of divergence. American Journal of Physical Anthropology, 165(1), 200–205. doi: 10.1002/ajpa.23336
Fidalgo, D., Hubbe, M. and Wesolowski, V. (2021) Population History of Brazilian South and Southeast Shellmound Builders Inferred through Dental Morphology. American Journal of Physical Anthropology, 176(2), 192–207. doi: 10.1002/ajpa.24342
Examples
## Not run:
start_mmd()

A toy example dataset for mean measures of divergence, in a table format
Description
This artificial dataset includes 200 individuals described by 9 binary traits and split into 5 groups. To resemble the datasets commonly observed in past sciences, a substantial number of missing values have been added at random to this dataset.
Usage
data(absolute_freqs)
Format
A matrix with 10 rows and 9 columns:
Trait1 | summary statistics for this trait |
Trait2 | summary statistics for this trait |
Trait3 | summary statistics for this trait |
Trait4 | summary statistics for this trait |
Trait5 | summary statistics for this trait |
Trait6 | summary statistics for this trait |
Trait7 | summary statistics for this trait |
Trait8 | summary statistics for this trait |
Trait9 | summary statistics for this trait |
Converts a data frame of binary (i.e., presence/absence) trait information into a table of sample sizes and frequencies.
Description
This function returns a summary of sample sizes and frequencies for each trait in each group. Applying this function is also mandatory before using the mmd function, since the latter only accepts tables of frequencies and cannot work with raw binary data.
Usage
binary_to_table(data, relative = FALSE)
Arguments
data | A binary (0/1 for presence/absence of traits) data frame with |
relative | Boolean. Indicates if the last rows of the table must contain frequencies (i.e., number of individuals having a given trait) or relative frequencies (i.e., proportions). |
Value
A matrix with 2*K rows (K being the number of groups in the dataset) and p columns (one per trait). The first K rows are the sample sizes, the last K rows are trait frequencies.
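To illustrate the layout of the returned matrix, the base-R sketch below builds such a table by hand from a small made-up data frame. The data frame `toy` and the helper `counts_table` are hypothetical, and the `N_`/`Freq_` row labels are borrowed from the input format described in the help page of start_mmd. This is only an approximation of the idea, not the package's implementation.

```r
## Hypothetical miniature dataset: first column = group, others = binary traits.
toy <- data.frame(
  Group  = c("A", "A", "A", "B", "B", "B"),
  Trait1 = c(1, 0, NA, 1, 1, 0),
  Trait2 = c(0, 0, 1, NA, 1, 1)
)

## Illustrative helper (not part of AnthropMMD).
counts_table <- function(data, relative = FALSE) {
  groups <- factor(data[[1]])
  traits <- data[-1]
  ## Sample sizes: number of non-missing observations per group and trait.
  n <- sapply(traits, function(x) tapply(!is.na(x), groups, sum))
  ## Absolute frequencies: number of individuals having the trait.
  k <- sapply(traits, function(x) tapply(x == 1, groups, sum, na.rm = TRUE))
  if (relative) k <- k / n
  rownames(n) <- paste0("N_", levels(groups))
  rownames(k) <- paste0("Freq_", levels(groups))
  rbind(n, k)  # first K rows: sample sizes; last K rows: frequencies
}

tab <- counts_table(toy)
tab
```

With K = 2 groups and p = 2 traits, `tab` has 4 rows and 2 columns; missing values reduce the sample size of the corresponding trait only.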
Author(s)
Frédéric Santos, frederic.santos@u-bordeaux.fr
References
Santos, F. (2018) AnthropMMD: an R package with a graphical user interface for the mean measure of divergence. American Journal of Physical Anthropology, 165(1), 200–205. doi: 10.1002/ajpa.23336
See Also
Examples
## Load and visualize a binary dataset:
data(toyMMD)
head(toyMMD)
## Convert this dataframe into a table of sample sizes and relative frequencies:
binary_to_table(toyMMD, relative = TRUE)

Compute MMD values from a table of sample sizes and relative frequencies
Description
Compute various MMD results, typically using a table returned by the function binary_to_table with the argument relative = TRUE.
Usage
mmd(data, angular = c("Anscombe", "Freeman"), correct = TRUE, all.results = TRUE)
Arguments
data | A table of sample sizes and frequencies |
angular | Choice of a formula for angular transformation: either Anscombe or Freeman-Tukey transformation. |
correct | Boolean; whether to apply the correction for small sample sizes (should be |
all.results | Boolean; whether to compute all four matrices described below as results. If FALSE, only the matrix |
Value
A list with four components:
MMDMatrix | Following the presentation adopted in many research articles, a matrix filled with MMD values above the diagonal, and standard deviations of MMD below the diagonal. |
MMDSym | A symmetrical matrix of MMD values, where negative values are replaced by zeroes. |
MMDSignif | A matrix where any pair of groups having a significant MMD value is indicated by a star, ‘*’. |
MMDRatio | A matrix filled with MMD values above the diagonal, and ratios MMD/sd(MMD) below the diagonal. |
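As a rough illustration of what lies behind these matrices, the sketch below computes an MMD value and its standard deviation for a single pair of groups. It uses the Anscombe transformation and the small-sample correction as they are commonly presented (cf. Harris and Sjøvold, 2004). The helpers `theta_anscombe` and `mmd_pair` and all counts are hypothetical, and the package's internal code may differ in its details.

```r
## Anscombe angular transformation of k presences out of n observations.
theta_anscombe <- function(k, n) asin(1 - 2 * (k + 3 / 8) / (n + 3 / 4))

## Illustrative MMD between two groups over p traits (not the package's code).
mmd_pair <- function(k1, n1, k2, n2, correct = TRUE) {
  diff2   <- (theta_anscombe(k1, n1) - theta_anscombe(k2, n2))^2
  corr    <- 1 / (n1 + 0.5) + 1 / (n2 + 0.5)
  penalty <- if (correct) corr else 0  # correction for small sample sizes
  p       <- length(k1)
  list(
    mmd = mean(diff2 - penalty),
    sd  = sqrt(2 * sum(corr^2)) / p
  )
}

## Made-up counts for three traits in two groups.
res <- mmd_pair(k1 = c(5, 12, 3), n1 = c(20, 25, 18),
                k2 = c(15, 4, 9), n2 = c(22, 21, 20))
res

## Two groups with identical frequencies: the corrected MMD is negative.
same <- mmd_pair(c(5, 5), c(20, 20), c(5, 5), c(20, 20))
same$mmd
```

The second call shows why negative values can occur: when the transformed frequencies coincide, only the correction term remains, with a negative sign.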
Author(s)
Frédéric Santos, frederic.santos@u-bordeaux.fr
References
de Souza, P. and Houghton, P. (1977). The mean measure of divergence and the use of non-metric data in the estimation of biological distances. Journal of Archaeological Science, 4(2), 163–169. doi: 10.1016/0305-4403(77)90063-2
Harris, E. F. and Sjøvold, T. (2004) Calculation of Smith's mean measure of divergence for intergroup comparisons using nonmetric data. Dental Anthropology, 17(3), 83–93.
Nikita, E. (2015) A critical review of the mean measure of divergence and Mahalanobis distances using artificial data and new approaches to the estimation of biodistances employing nonmetric traits. American Journal of Physical Anthropology, 157, 284–294. doi: 10.1002/ajpa.22708
See Also
Examples
## Load and visualize a binary dataset:
data(toyMMD)
head(toyMMD)
## Convert this dataframe into a table of sample sizes and relative
## frequencies:
tab <- binary_to_table(toyMMD, relative = TRUE)
tab
## Compute and display a symmetrical matrix of MMD values:
mmd_out <- mmd(tab, angular = "Anscombe")
mmd_out$MMDSym
## Significant MMD values are indicated by a star:
mmd_out$MMDSignif

Implementation of Fidalgo et al.'s (2022) method of bootstrap for the Mean Measure of Divergence
Description
Compute a matrix of MMD dissimilarities among bootstrapped samples of the original groups. The input data must be a “raw binary dataset”.
Usage
mmd_boot(data, angular = c("Anscombe", "Freeman"), B = 100, ...)
Arguments
data | A “raw binary dataset”, as defined in the man page of |
angular | Choice of a formula for angular transformation: either Anscombe or Freeman-Tukey transformation. |
B | Numeric value: number of bootstrap samples. |
... | Arguments for traits selection, passed to |
Details
This function sticks very closely to Fidalgo et al.'s (2022) implementation. In particular, no correction for small sample sizes is applied in the MMD formula; see Fidalgo et al. (2021) for the rationale.
Note that only a “raw binary dataset” is allowed as input, since the resampling cannot be performed properly from a table of counts and frequencies.
To get an MDS plot of the dissimilarity matrix obtained with this function, see plot.anthropmmd_boot.
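The need for raw data can be illustrated with a short base-R sketch of a single bootstrap draw. It assumes that individuals are resampled with replacement within each group, so that group sizes are preserved. The helper `resample_groups` and the dataset `raw` are hypothetical, and Fidalgo et al.'s exact protocol may differ.

```r
set.seed(1)  # for reproducibility of this illustration only

## Hypothetical raw binary dataset: group indicator + two binary traits.
raw <- data.frame(
  Group  = rep(c("A", "B"), times = c(6, 4)),
  Trait1 = rbinom(10, 1, 0.4),
  Trait2 = rbinom(10, 1, 0.6)
)

## Illustrative helper: draw one bootstrap sample, resampling individuals
## with replacement within each group (assumption; not the package's code).
resample_groups <- function(data) {
  rows_by_group <- split(seq_len(nrow(data)), data[[1]])
  idx <- unlist(lapply(rows_by_group, function(i) {
    i[sample.int(length(i), replace = TRUE)]
  }))
  data[idx, , drop = FALSE]
}

boot1 <- resample_groups(raw)
table(boot1$Group)  # same group sizes as in `raw`
```

Such a draw needs the individual rows; a table of counts and frequencies no longer records which traits were observed together on the same individual.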
Value
A symmetrical dissimilarity matrix of MMD values among original groups and bootstrapped samples. This matrix is an R object of class anthropmmd_boot.
Author(s)
Frédéric Santos, frederic.santos@u-bordeaux.fr
References
D. Fidalgo, M. Hubbe and V. Wesolowski (2021). Population history of Brazilian south and southeast shellmound builders inferred through dental morphology. American Journal of Physical Anthropology 176(2), 192–207.
D. Fidalgo, V. Wesolowski and M. Hubbe (2022). Biological affinities of Brazilian pre-colonial coastal communities explored through boostrapped biodistances of dental non-metric traits. Journal of Archaeological Science 138, 105545.
See Also
Examples
## Not run:
## Load and visualize a raw binary dataset:
data(toyMMD)
head(toyMMD)
## Compute MMD among bootstrapped samples:
resboot <- mmd_boot(
  data = toyMMD,
  B = 50, # number of bootstrap samples
  angular = "Anscombe",
  strategy = "excludeQNPT", # strategy for trait selection
  k = 10 # minimal number of observations required per trait
)
## View part of MMD matrix among bootstrapped samples:
dim(resboot)
print(resboot[1:15, 1:15])
## End(Not run)

Display a multidimensional scaling (MDS) plot using Fidalgo et al.'s (2022) bootstrap method for MMD
Description
This function plots a 2D MDS to represent the MMD dissimilarities among the groups compared, after a bootstrap resampling performed with mmd_boot.
Usage
## S3 method for class 'anthropmmd_boot'
plot(x, method = c("classical", "interval", "ratio", "ordinal"),
  level = 0.95, pch = 16, gof = FALSE, xlab = NA, ylab = NA,
  main = "MDS plot of original and bootstrapped samples", ...)
Arguments
x | An object of class |
method | Algorithm used for MDS computation; see |
level | Numeric value between 0 and 1, confidence level for the contour lines displayed after the kernel density estimate. |
pch | Passed to |
gof | Boolean; whether to display goodness of fit statistic on the plot. |
xlab | Passed to |
ylab | Passed to |
main | Passed to |
... | Other arguments possibly passed to |
Details
In the current implementation, to stick to Fidalgo et al.'s (2022) protocol, this function does not provide as much freedom as plot.anthropmmd_result as concerns MDS parameters and other analysis options.
Value
This function returns no value by itself, and only plots an MDS in a new device.
Author(s)
Frédéric Santos, frederic.santos@u-bordeaux.fr
References
D. Fidalgo, V. Wesolowski and M. Hubbe (2022). Biological affinities of Brazilian pre-colonial coastal communities explored through boostrapped biodistances of dental non-metric traits. Journal of Archaeological Science 138, 105545.
See Also
start_mmd, stats::cmdscale
Examples
## Not run:
## Load and visualize a raw binary dataset:
data(toyMMD)
head(toyMMD)
## Compute MMD among bootstrapped samples:
resboot <- mmd_boot(
  data = toyMMD,
  B = 50, # number of bootstrap samples
  angular = "Anscombe",
  strategy = "excludeQNPT", # strategy for trait selection
  k = 10 # minimal number of observations required per trait
)
## MDS plot for bootstrapped samples:
plot(
  x = resboot,
  method = "interval", # algorithm used for MDS computation
  level = 0.95 # confidence level for the contour lines
)
## End(Not run)

Display a multidimensional scaling (MDS) plot with the MMD dissimilarities as input
Description
This function plots a 2D or 3D MDS to represent the MMD dissimilarities among the groups compared. Various MDS methods are proposed, and most of them are based on the R package smacof.
Usage
## S3 method for class 'anthropmmd_result'
plot(x, method = c("classical", "interval", "ratio", "ordinal"),
  axes = FALSE, gof = FALSE, dim = 2, asp = TRUE, xlim = NULL, ...)
Arguments
x | An object of class |
method | Specification of MDS type. |
axes | Boolean: should the axes be displayed on the plot? |
gof | Boolean: should goodness of fit statistics be displayed in the top-left corner of the plot? More details below. |
dim | Numeric value, 2 or 3. Indicates the maximal dimension desired for the MDS plot. It should be noted that, even with |
asp | Boolean. If |
xlim | Parameter passed to |
... | Other arguments possibly passed to |
Details
Axes and scale. Making all axes use the same scale is strongly recommended in all cases (Borg et al., 2013). For a 3D plot, since the third axis generally carries only a very small percentage of the total variability, you might want to uncheck this option to better visualize the distances along the third axis. In this case, the axis scales must be displayed on the plot, otherwise the plot would be misleading.
Goodness of fit values. (i) For classical metric MDS, a common statistic is given: the sum of the eigenvalues of the first two axes, divided by the sum of all eigenvalues. It indicates the fraction of the total variance of the data represented in the MDS plot. This statistic comes from the $GOF value returned by the function stats::cmdscale. (ii) For SMACOF methods, the statistic given is the $stress value returned by the function smacof::smacofSym. It indicates the final stress-1 value. A value very close to 0 corresponds to a perfect fit. (iii) For both approaches, a 'rho' value is also given, which is Spearman's correlation coefficient between real dissimilarities (i.e., MMD values) and distances observed on the MDS plot (Dzemyda et al., 2013). A value very close to 1 indicates a perfect fit.
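Statistics (i) and (iii) can be reproduced outside the package with base R only, as in the sketch below. The dissimilarity matrix `d` is made up for illustration, and the way the plotting method formats or rounds these values is not shown here.

```r
## Hypothetical symmetrical dissimilarity matrix among four groups
## (e.g., MMD values with negative values already replaced by zeros).
d <- matrix(c(0.00, 0.12, 0.30, 0.25,
              0.12, 0.00, 0.20, 0.28,
              0.30, 0.20, 0.00, 0.10,
              0.25, 0.28, 0.10, 0.00),
            nrow = 4, dimnames = list(LETTERS[1:4], LETTERS[1:4]))

## Classical metric MDS in two dimensions, keeping the eigenvalues.
mds <- stats::cmdscale(as.dist(d), k = 2, eig = TRUE)

## (i) Goodness of fit returned by cmdscale (a numeric vector of length 2).
mds$GOF

## (iii) Spearman's correlation between the input dissimilarities and
## the distances observed in the MDS configuration.
rho <- cor(as.vector(as.dist(d)), as.vector(dist(mds$points)),
           method = "spearman")
rho
```

Both `as.dist(d)` and `dist(mds$points)` store the lower triangle in the same order, so the two vectors passed to `cor` are aligned pair by pair.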
Value
This function returns no value by itself, and only plots an MDS in a new device.
Author(s)
Frédéric Santos, frederic.santos@u-bordeaux.fr
References
G. Dzemyda, O. Kurasova and J. Zilinskas (2013) Multidimensional Data Visualization, Springer, chap. 2, p. 39–40.
I. Borg, P. Groenen and P. Mair (2013) Applied Multidimensional Scaling, Springer, chap. 7, p. 79.
See Also
start_mmd, stats::cmdscale, smacof::smacofSym
Examples
## Load and visualize a binary dataset:
data(toyMMD)
head(toyMMD)
## Convert this dataframe into a table of sample sizes and relative
## frequencies:
tab <- binary_to_table(toyMMD, relative = TRUE)
tab
## Compute and display a symmetrical matrix of MMD values:
mmd_out <- mmd(tab, angular = "Freeman")
## Plot a classical metric MDS in two dimensions:
plot(x = mmd_out, method = "classical", axes = TRUE, gof = TRUE, dim = 2)

Select a subset of traits meeting certain criteria
Description
This function provides several strategies to discard uninformative traits (non-polymorphic, non-discriminatory, etc.) before the MMD analysis.
Usage
select_traits(tab, k = 10, strategy = c("none", "excludeNPT",
  "excludeQNPT", "excludeNOMD", "keepFisher"), OMDvalue = NULL, groups,
  angular = c("Anscombe", "Freeman"))
Arguments
tab | A table of sample sizes and frequencies, typically returned by the function |
k | Numeric value: the required minimal number of individuals per group. Any trait that could be recorded on fewer individuals in at least one group will be removed from the dataset. This makes it possible to select only the traits with a sufficient amount of information in each group. |
strategy | Strategy for trait selection, i.e. for the removal of non-polymorphic traits. The four options are fully described in Santos (2018) and in the help page of |
OMDvalue | To be specified if and only if |
groups | A factor or character vector, indicating the groups to be considered in the analysis. Since some groups can have a very low sample size, this makes it possible to discard those groups in order to facilitate the trait selection via the argument |
angular | Formula for angular transformation, see Harris and Sjøvold (2004). Useful only for the calculation of the overall measure of divergence. |
Value
A list with two components:
filtered | The dataset filtered according to the user-definedcriteria. |
OMD | The “overall measure of divergence” for each trait. |
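The idea behind the "keepFisher" strategy (retaining the traits that show a significant difference between at least one pair of groups, after Harris and Sjøvold, 2004) can be sketched in base R as follows. The helper `any_pair_differs`, the 0.05 threshold, the absence of any multiple-testing adjustment and the counts are all assumptions made for illustration; the package's own procedure may differ.

```r
## Hypothetical table for 3 groups and 2 traits: sample sizes, then
## absolute frequencies (number of individuals having the trait).
n <- matrix(c(30, 28, 32,   30, 29, 31), nrow = 3,
            dimnames = list(c("A", "B", "C"), c("Trait1", "Trait2")))
k <- matrix(c(25,  5, 15,   10, 10, 11), nrow = 3,
            dimnames = dimnames(n))

## Illustrative helper: TRUE if at least one pair of groups differs
## significantly for this trait (alpha = 0.05 is an assumption).
any_pair_differs <- function(k_j, n_j, alpha = 0.05) {
  pairs <- combn(length(k_j), 2)  # one column per pair of groups
  pvals <- apply(pairs, 2, function(g) {
    counts <- rbind(present = k_j[g], absent = n_j[g] - k_j[g])
    fisher.test(counts)$p.value
  })
  any(pvals < alpha)
}

keep <- sapply(colnames(n), function(j) any_pair_differs(k[, j], n[, j]))
keep
```

Here Trait1 is retained because groups A and B have very different frequencies, whereas Trait2, with nearly identical frequencies in all groups, is discarded.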
Author(s)
Frédéric Santos, frederic.santos@u-bordeaux.fr
References
Harris, E. F. and Sjøvold, T. (2004) Calculation of Smith's mean measure of divergence for intergroup comparisons using nonmetric data. Dental Anthropology, 17(3), 83–93.
Santos, F. (2018) AnthropMMD: an R package with a graphical user interface for the mean measure of divergence. American Journal of Physical Anthropology, 165(1), 200–205. doi: 10.1002/ajpa.23336
See Also
Examples
## Load and visualize a binary dataset:
data(toyMMD)
head(toyMMD)
## Convert this dataframe into a table of sample sizes and
## relative frequencies:
tab <- binary_to_table(toyMMD, relative = TRUE)
tab
## Filter this dataset to keep only those traits that have at
## least k=10 individuals in each group:
select_traits(tab, k = 10)
## Only Trait1 is excluded.
## Filter this dataset to keep only those traits that have at
## least k=11 individuals in each group, and show significant
## differences at Fisher's exact test:
select_traits(tab, k = 11, strategy = "keepFisher")
## Traits 1, 5 and 8 are excluded.

An R-Shiny application for the mean measure of divergence
Description
Launches a graphical user interface (GUI) for the calculation of the mean measure of divergence.
Usage
start_mmd()
StartMMD()
Details
The GUI of AnthropMMD is completely autonomous: reading the data file and specifying the parameters of the analysis are done through the interface. Once the dataset is loaded, the output reacts dynamically to any change in the analysis settings.
AnthropMMD accepts .CSV or .TXT data files, but does not support .ODS or .XLS(X) files. Two types of data input formats can be used:
A ‘Raw binary dataset’ (one row for each individual, one column for each variable). The first column must be the group indicator, and the other columns are binary data for the traits studied, where 1 indicates the presence of a trait, and 0 its absence. Row names are optional for this type of file. An example of a valid data file can be found as Supporting Information online in Santos (2018).
A ‘Table of n's and absolute frequencies for each group’, i.e. a dataset of sample sizes and absolute frequencies. This type of dataset has 2 × K rows (K being the number of groups compared) and p columns (p being the number of traits studied). The first K lines must be the group n's for each trait, and the last K lines are absolute frequencies for each trait (i.e. the number of times the trait is present). Row names are mandatory for this type of file. The first K rows must be labelled with names beginning with ‘N_’, such as: N_GroupA, N_GroupB, ..., N_GroupK. The last K rows should be labelled with names beginning with ‘Freq_’, such as: Freq_GroupA, ..., Freq_GroupK. An example of a valid data file can be found as Supporting Information online in Santos (2018).
For both data types, column names are strongly recommended for better interpretability of the results.
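Before loading a ‘Table of n's and absolute frequencies’ into the GUI, its layout can be checked by hand. The sketch below is a hypothetical validator written in base R, not a function of the package. The requirement that groups appear in the same order in both halves of the table is an assumption added for illustration.

```r
## Illustrative validator (not part of AnthropMMD) for the
## 'table of n's and absolute frequencies' layout.
check_freq_table <- function(tab) {
  stopifnot(nrow(tab) %% 2 == 0)  # 2 * K rows are expected
  K <- nrow(tab) / 2
  n_rows    <- rownames(tab)[seq_len(K)]
  freq_rows <- rownames(tab)[K + seq_len(K)]
  c(
    n_prefix    = all(startsWith(n_rows, "N_")),
    freq_prefix = all(startsWith(freq_rows, "Freq_")),
    ## Assumption: same groups, in the same order, in both halves.
    same_groups = identical(sub("^N_", "", n_rows),
                            sub("^Freq_", "", freq_rows)),
    ## A trait cannot be present more often than it was observed.
    freq_le_n   = all(tab[K + seq_len(K), ] <= tab[seq_len(K), ])
  )
}

## Hypothetical table for K = 2 groups and p = 3 traits.
tab <- rbind(
  N_GroupA    = c(Trait1 = 20, Trait2 = 18, Trait3 = 21),
  N_GroupB    = c(25, 24, 22),
  Freq_GroupA = c(5, 9, 20),
  Freq_GroupB = c(12, 3, 11)
)
check_freq_table(tab)
```

Each element of the returned logical vector corresponds to one requirement, so a FALSE value points directly at the part of the layout to fix.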
One can choose between the Anscombe and Freeman-Tukey formulas for angular transformation (cf. Harris and Sjøvold 2004; Irish 2010).
‘Only retain the traits with this minimal number of individuals per group’: the traits with fewer individuals in at least one active group will not be considered in the analysis.
‘Exclusion strategy’: a careful selection of traits is crucial when using MMD (cf. Harris and Sjøvold 2004 for a complete explanation), and the user should probably “exclude the traits that are nondiscriminatory across groups” (Irish 2010).
‘Exclude nonpolymorphic traits’ removes all the traits showing no variability at all, i.e. with the same value (‘0’ or ‘1’) for all individuals.
‘Exclude quasi-nonpolymorphic traits’ also removes the traits whose variability is only due to a single individual: for example, a trait with only one positive observation in the whole dataset.
‘Use Fisher's exact test’ implements the advice given by Harris and Sjøvold (2004) to select contributory traits, defined as those “showing a statistically significant difference between at least one pair of the groups being evaluated”. Fisher's exact tests are performed for each pair of groups, and the traits showing no intergroup difference at all are excluded. Note that if you have a large number of groups (say, 10 groups), a trait with strictly equal frequencies for the last 8 groups may be considered as useful according to this criterion if there is a significant difference for the first two groups. This criterion will select all traits that can be useful for a given pair of groups, even if they are nondiscriminatory for all the other ones.
‘Exclude traits with overall MD’ lower than a given threshold: it is a simple way of removing the traits with quite similar frequencies across groups (the ‘overall MD’ is defined as the sum of the variable's measures of divergence over all pairs of groups). This criterion aims to select the traits whose frequency differs substantially across most or all groups.
These four options are designed to avoid negative MMD values.
Some groups/populations can be manually excluded from the analysis. This may be useful if very few individuals belonging to a given population could be recorded for the variables retained by the criteria described above.
An MDS plot and a hierarchical clustering, done using MMD dissimilarities as inputs, are displayed in the last two tabs. As MMD can sometimes be negative, those negative values are replaced by zeros, so that the MMD matrix can be seen as a symmetrical distance matrix. Please note that the classical two-dimensional metric MDS plot cannot be displayed if there is only one positive eigenvalue. Several MDS options are proposed, cf. the help page of the smacofSym function from the R package smacof for detailed technical information.
Value
The function returns no value by itself, but all results can be individually downloaded through the graphical interface.
The ‘true’ MMD values (i.e., which can be negative in the case of small samples with similar trait frequencies, cf. Irish 2010) and their standard deviations are presented in the matrix labelled ‘MMD values (upper triangular part) and associated SD values (lower triangular part)’.
An MMD value can be considered as significant if it is greater than twice its standard deviation. Significance is assessed in another ad-hoc table of results.
The negative MMD values, if any, are replaced by zeros in the ‘Symmetrical matrix of MMD values’.
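The three tables described above are simple rearrangements of the same MMD values and standard deviations. The base-R sketch below shows one way to derive them from two made-up symmetrical matrices; the object names are hypothetical and the GUI's own tables may be formatted differently.

```r
## Hypothetical 'true' MMD values and standard deviations for 3 groups.
groups <- c("A", "B", "C")
mmd_val <- matrix(c( 0.00, 0.21, -0.01,
                     0.21, 0.00,  0.08,
                    -0.01, 0.08,  0.00), nrow = 3,
                  dimnames = list(groups, groups))
mmd_sd <- matrix(c(0.00, 0.03, 0.02,
                   0.03, 0.00, 0.05,
                   0.02, 0.05, 0.00), nrow = 3,
                 dimnames = list(groups, groups))

## MMD values above the diagonal, SD values below the diagonal.
mixed <- mmd_val
mixed[lower.tri(mixed)] <- mmd_sd[lower.tri(mmd_sd)]

## Symmetrical matrix with negative values replaced by zeros.
mmd_sym <- pmax(mmd_val, 0)

## Significance: MMD greater than twice its standard deviation.
is_signif <- mmd_val > 2 * mmd_sd
diag(is_signif) <- NA

mixed
mmd_sym
is_signif
```

In this made-up example only the pair A-B is significant, and the slightly negative value for the pair A-C becomes zero in the symmetrical matrix.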
Note
The R console is not available when the GUI is active. To exit the GUI, press Esc (on MS Windows systems) or Ctrl+C (on Linux systems) in the R console.
On 14-inch (or smaller) screens, for convenience, it may be necessary to decrease the zoom level of your web browser and/or to turn on fullscreen mode.
Author(s)
Frédéric Santos, frederic.santos@u-bordeaux.fr
References
Harris, E. F. and Sjøvold, T. (2004) Calculation of Smith's mean measure of divergence for intergroup comparisons using nonmetric data. Dental Anthropology, 17(3), 83–93.
Irish, J. (2010) The mean measure of divergence: Its utility in model-free and model-bound analyses relative to the Mahalanobis D2 distance for nonmetric traits. American Journal of Human Biology, 22, 378–395. doi: 10.1002/ajhb.21010
Nikita, E. (2015) A critical review of the mean measure of divergence and Mahalanobis distances using artificial data and new approaches to the estimation of biodistances employing nonmetric traits. American Journal of Physical Anthropology, 157, 284–294. doi: 10.1002/ajpa.22708
Santos, F. (2018) AnthropMMD: an R package with a graphical user interface for the mean measure of divergence. American Journal of Physical Anthropology, 165(1), 200–205. doi: 10.1002/ajpa.23336
Examples
## An example of valid binary dataset:
data(toyMMD)
head(toyMMD)
## An example of valid table:
data(absolute_freqs)
absolute_freqs
## Launch the GUI:
## Not run:
start_mmd()

Converts a table of sample sizes and frequencies into a table of sample sizes and relative frequencies.
Description
Mostly used as an internal function, but could also be convenient to transform frequencies (i.e., number of individuals having a given trait) into relative frequencies (i.e., proportions).
Usage
table_relfreq(tab)
Arguments
tab | A table of sample sizes and frequencies, such as the tables returned by the function |
Value
The last K rows (K being the number of groups) of tab are simply transformed to relative frequencies.
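The transformation amounts to dividing each absolute frequency by the corresponding sample size. A base-R equivalent on a made-up table might look as follows; the object names are hypothetical, and this is a sketch of the arithmetic rather than the function's actual code.

```r
## Hypothetical table: K = 2 groups, p = 2 traits.
tab <- rbind(
  N_GroupA    = c(Trait1 = 20, Trait2 = 10),
  N_GroupB    = c(25, 16),
  Freq_GroupA = c(5, 4),
  Freq_GroupB = c(10, 12)
)

## Divide the last K rows (absolute frequencies) by the first K rows
## (sample sizes); the sample sizes themselves are left untouched.
K <- nrow(tab) / 2
rel <- tab
rel[K + seq_len(K), ] <- tab[K + seq_len(K), ] / tab[seq_len(K), ]
rel
```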
Author(s)
Frédéric Santos, frederic.santos@u-bordeaux.fr
See Also
Examples
## Load and visualize a binary dataset:
data(toyMMD)
head(toyMMD)
## Convert this dataframe into a table of sample sizes and frequencies:
tab <- binary_to_table(toyMMD, relative = FALSE)
tab
## Convert this table into relative frequencies:
table_relfreq(tab)

A toy example dataset for mean measures of divergence, in a binary format
Description
This artificial dataset includes 200 individuals described by 9 binary traits and split into 5 groups. To resemble the datasets commonly observed in past sciences, a substantial number of missing values have been added at random to this dataset.
Usage
data(toyMMD)
Format
A data frame with 200 observations on the following 10 variables:
Group | a factor with 5 levels (group indicator) |
Trait1 | a numeric vector of zeroes and ones |
Trait2 | a numeric vector of zeroes and ones |
Trait3 | a numeric vector of zeroes and ones |
Trait4 | a numeric vector of zeroes and ones |
Trait5 | a numeric vector of zeroes and ones |
Trait6 | a numeric vector of zeroes and ones |
Trait7 | a numeric vector of zeroes and ones |
Trait8 | a numeric vector of zeroes and ones |
Trait9 | a numeric vector of zeroes and ones |