Identification

Input (2 files):

Input files may contain various kinds of data.

  • RAW COORDINATES of anatomical LANDMARKS
  • RAW COORDINATES of PSEUDOlandmarks (outlines)
  • GLOBAL SIZE (centroid size, perimeter, square root area)
  • TABLE OF TRADITIONAL MENSURATIONS.

Whatever the kind of input data, you will be asked to enter them in two separate files:

  • UNKNOWN:
    • The file containing the data to be tentatively identified
  • REFERENCE:
    • The file containing all the reference data. These data must be organized as successive groups in a single matrix, so you must also provide the subdivision argument, which describes the successive groups of the reference matrix.

Unknown and Reference data must have the same kind of variables in the same order (the same number of columns)
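
As a minimal sketch of this layout (not XYOM's own code), the Python snippet below assumes that both files are already loaded as NumPy arrays and that the subdivision argument is a list of group sizes. The names `split_reference`, `unknown`, `reference` and `subdivision` are hypothetical and chosen for illustration only.

```python
import numpy as np

def split_reference(reference, subdivision):
    """Cut a reference matrix into successive groups of the given sizes."""
    if sum(subdivision) != reference.shape[0]:
        raise ValueError("subdivision must sum to the number of reference rows")
    # Row indices where each new group starts (the final boundary is dropped).
    boundaries = np.cumsum(subdivision)[:-1]
    return np.split(reference, boundaries, axis=0)

# Hypothetical data: 2 unknown and 5 reference individuals, 3 variables each.
unknown = np.array([[1.0, 2.0, 3.0],
                    [4.0, 5.0, 6.0]])
reference = np.arange(15, dtype=float).reshape(5, 3)
subdivision = [2, 3]  # assumed meaning: rows 1-2 are group 1, rows 3-5 are group 2

# Both matrices must describe the same variables in the same order.
if unknown.shape[1] != reference.shape[1]:
    raise ValueError("unknown and reference must have the same number of columns")

groups = split_reference(reference, subdivision)
print([g.shape for g in groups])
```

Checking the column count before any analysis catches the most common input mistake, a mismatch between the variables of the two files.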

Analyses: identification statistics may use one of three approaches:

  • Mahalanobis distances: the unknown individuals are “projected” into the discriminant space computed on the reference matrix only
  • Maximum likelihood method: see
    • Jean-Pierre Dujardin, Sebastien Dujardin, Dramane Kaba, Soledad Santillan-Guayasamin, Anita G. Villacis, Sitha Piyaselakul, Suchada Sumruayphol, Ronald Morales Vargas. 2017. The maximum likelihood identification method applied to insect morphometric data. Zoological Systematics (2017), DOI: 10.11865/zs.2017
  • Artificial Neural Network (ANN, using the Multilayer Perceptron): requires an existing and adequate weights file.
    • Please note that there is currently no automated procedure for processing the different kinds of data. Please read here (landmarks) and there (pseudolandmarks) the steps you must carry out yourself to apply the ANN to your data.
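
The Mahalanobis approach can be illustrated with a short Python sketch. It is one common way of doing nearest-centroid assignment and is not XYOM's implementation: it assumes a pooled within-group covariance estimated from the reference data only, and a subdivision given as a list of group sizes. The function name `mahalanobis_assign` and the toy data are hypothetical.

```python
import numpy as np

def mahalanobis_assign(unknown, reference, subdivision):
    """Assign each unknown row to the reference group with the nearest centroid."""
    boundaries = np.cumsum(subdivision)[:-1]
    groups = np.split(reference, boundaries, axis=0)
    centroids = np.array([g.mean(axis=0) for g in groups])
    # Pooled within-group covariance, estimated from the reference data only.
    dof = reference.shape[0] - len(groups)
    scatter = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in groups)
    inv_cov = np.linalg.pinv(scatter / dof)
    # Squared Mahalanobis distance from every unknown to every group centroid.
    diff = unknown[:, None, :] - centroids[None, :, :]
    d2 = np.einsum("ugi,ij,ugj->ug", diff, inv_cov, diff)
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative rounding errors
    return d2.argmin(axis=1), np.sqrt(d2)

# Hypothetical reference matrix: two groups of four individuals, two variables.
reference = np.array([[0, 0], [1, 0], [0, 1], [1, 1],
                      [10, 10], [11, 10], [10, 11], [11, 11]], dtype=float)
subdivision = [4, 4]
unknown = np.array([[0.2, 0.4], [10.8, 10.1]])

labels, distances = mahalanobis_assign(unknown, reference, subdivision)
print(labels)     # index of the nearest reference group for each unknown
print(distances)  # one row per unknown, one column per reference group
```

Because the unknown individuals take no part in estimating the centroids or the covariance, they are only positioned in a space defined by the reference data.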

Output:

  • A report is issued giving the tentative assignment of each unknown individual to one of the reference groups.

IDENTIFICATION section of XYOM: Maximum Likelihood or Mahalanobis distances methods:

According to the kind of variable declared by the user, XYOM automatically performs the data transformations described below.

  • In case of RAW COORDINATES of anatomical LANDMARKS
    • Concatenating raw landmarks of unknown and reference individuals
    • GPA (Generalised Procrustes Analysis) on these concatenated data
    • PCA on resulting ORP (orthogonal projections, also called Procrustes residuals)
    • Splitting the file of PCs of shape into two files, one containing the PCs corresponding to the unknown individuals and one containing the PCs corresponding to the reference individuals.
  • In case of RAW COORDINATES of PSEUDOlandmarks (outlines)
    • EFA (Elliptic Fourier Analysis) separately on UNKNOWN and REFERENCE data.
      • For the EFA, it is not mandatory to concatenate unknown and reference data into one file for subsequent comparisons. In the process of size and shape separation, each individual is treated independently of the others: unlike the GPA, the EFA does not need the consensus of the data to generate shape variables (NEF).
    • Removing the first three columns of NEFs (Normalised Elliptic Fourier coefficients) from both UNKNOWN and REFERENCE data
      • For each individual, the first three coefficients are “degenerate” as a result of the rotation and size correction.
    • Keeping the same number of NEFs for every individual (the same number of columns in every row). This number cannot exceed the number of coefficients of the individual described by the fewest coefficients, whether it is an unknown or a reference specimen.
      • This step may reduce the accuracy of shape reconstruction for the other individuals, but it is required to perform identification statistics.
    • Concatenating NEFs of unknown and reference specimens (MISCELLANEOUS, Working on data files, Concatenate, By Rows)
      • Merging the two files (Unknown NEF and Reference NEF) is required by the next step, the Principal Component Analysis (PCA).
    • PCA on the merged NEF data
    • Splitting the file of PCs of shape into two files, one containing the PCs corresponding to the unknown individuals and one containing the PCs corresponding to the reference individuals.
  • In case of TRADITIONAL MENSURATIONS (could also be a table of NEFs)
    • Concatenating unknown and reference data
    • PCA on these data
    • Splitting the file of PCs into two files, one containing the PCs corresponding to the unknown individuals and one containing the PCs corresponding to the reference individuals.
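
The steps shared by these workflows (trimming, concatenating, PCA, splitting) can be sketched in Python as follows. This is an illustration under stated assumptions, not XYOM's code: each file is assumed to be already loaded as a NumPy array with a uniform number of columns, the PCA is computed by a plain singular value decomposition of the centred data, and the names `trim_nef` and `pca_then_split` are hypothetical. The GPA and EFA steps themselves are not shown.

```python
import numpy as np

def trim_nef(unknown_nef, reference_nef, n_degenerate=3):
    """Drop the first coefficients and keep a common number of columns."""
    # The file with the fewest coefficients sets the common width.
    n_cols = min(unknown_nef.shape[1], reference_nef.shape[1])
    return unknown_nef[:, n_degenerate:n_cols], reference_nef[:, n_degenerate:n_cols]

def pca_then_split(unknown, reference):
    """Run one PCA on the concatenated rows, then split the scores again."""
    n_unknown = unknown.shape[0]
    merged = np.vstack([unknown, reference])   # concatenate by rows
    centered = merged - merged.mean(axis=0)
    # PCA through SVD: the scores are the projections on the principal axes.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u * s
    return scores[:n_unknown], scores[n_unknown:]

# Hypothetical NEF tables with different numbers of coefficients.
rng = np.random.default_rng(0)
unknown_nef = rng.normal(size=(2, 11))
reference_nef = rng.normal(size=(6, 15))

unknown_trimmed, reference_trimmed = trim_nef(unknown_nef, reference_nef)
pcs_unknown, pcs_reference = pca_then_split(unknown_trimmed, reference_trimmed)
print(pcs_unknown.shape, pcs_reference.shape)
```

Running a single PCA on the merged table places unknown and reference individuals on the same axes, which is why the files are concatenated before the PCA and split only afterwards.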

IDENTIFICATION section of XYOM: Artificial Neural Network (ANN) method

  • RAW COORDINATES of anatomical LANDMARKS not allowed
  • RAW COORDINATES of PSEUDOlandmarks (outlines) not allowed
  • Normalized Elliptic Fourier coefficients (NEF) allowed only under some conditions
  • For any other kind of data, please remember that a prior learning step in the MACHINE LEARNING section of XYOM is mandatory.