Identification

Input (2 files):

Input files may contain various kinds of data.

  • RAW COORDINATES of anatomical LANDMARKS
  • RAW COORDINATES of PSEUDOlandmarks (outlines)
  • GLOBAL SIZE (centroid size, perimeter, square root area)
  • TABLE OF TRADITIONAL MENSURATIONS.

Whatever the kind of input data, you will be asked to enter them in two separate files:

  • UNKNOWN:
    • The file containing the data to be tentatively identified
  • REFERENCE:
    • The file containing all the reference data, organized as successive groups in a single matrix
    • You must therefore also provide the SUBDIVISION argument, which describes the successive groups of the reference matrix

Unknown and Reference data must contain the same kind of variables in the same order (and therefore the same number of columns).
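
The two-file layout can be illustrated with a minimal sketch. It is not XYOM code: the function names are hypothetical, and the SUBDIVISION argument is assumed here to be a list of group sizes (rows per group), which the text above does not specify.

```python
import numpy as np

def split_reference(reference, subdivision):
    """Split a stacked reference matrix into its successive groups.

    ASSUMPTION: `subdivision` is a list of group sizes (rows per group).
    """
    reference = np.asarray(reference, dtype=float)
    if sum(subdivision) != reference.shape[0]:
        raise ValueError("subdivision must sum to the number of reference rows")
    # Row indices at which each new group starts (the final total is dropped).
    boundaries = np.cumsum(subdivision)[:-1]
    return np.split(reference, boundaries, axis=0)

def check_same_variables(unknown, reference):
    """Raise if the two matrices do not have the same number of columns."""
    unknown = np.asarray(unknown, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if unknown.shape[1] != reference.shape[1]:
        raise ValueError("unknown and reference must have the same columns")

# Toy data: 10 reference individuals (two groups of 4 and 6), 3 variables each.
reference = np.arange(30, dtype=float).reshape(10, 3)
unknown = np.ones((2, 3))

check_same_variables(unknown, reference)
groups = split_reference(reference, [4, 6])
print([g.shape for g in groups])
```

A check of this kind only confirms that the column counts agree. It cannot verify that the variables are in the same order, which remains the user's responsibility.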

Analyses: identification statistics may use one of the following approaches:

  • Mahalanobis distances: the unknown individuals are “projected” into the discriminant space computed from the reference matrix only
  • Maximum likelihood method: see
    • Jean-Pierre Dujardin, Sebastien Dujardin, Dramane Kaba, Soledad Santillan-Guayasamin, Anita G. Villacis, Sitha Piyaselakul, Suchada Sumruayphol, Ronald Morales Vargas. 2017. The maximum likelihood identification method applied to insect morphometric data. Zoological Systematics (2017), DOI: 10.11865/zs.2017
  • Single Layer Perceptron: may be convenient for reference data containing only two groups (cf. the SUBDIVISION argument). The output is [0,1] or [1,0].
  • Neural network (Multilayer Perceptron): for reference data containing two or more groups (cf. the SUBDIVISION argument). If the reference matrix contains three groups, for instance, the output is [0,0,1], [0,1,0] or [1,0,0].
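
As an illustration of the first approach, the sketch below assigns each unknown individual to the reference group with the nearest centroid in Mahalanobis distance. It is a simplified stand-in, not XYOM's implementation: it works directly with the pooled within-group covariance of the reference data rather than with explicit discriminant axes, it assumes SUBDIVISION is a list of group sizes, and the function name is hypothetical. The assignment is also returned in the [0,1] / [1,0] form used above for the perceptron outputs.

```python
import numpy as np

def mahalanobis_identify(unknown, reference, subdivision):
    """Assign each unknown row to the closest reference group.

    ASSUMPTION: `subdivision` lists the number of rows in each successive group.
    """
    unknown = np.asarray(unknown, dtype=float)
    reference = np.asarray(reference, dtype=float)
    groups = np.split(reference, np.cumsum(subdivision)[:-1], axis=0)
    means = np.array([g.mean(axis=0) for g in groups])

    # Pooled within-group covariance, estimated from the reference data only.
    dof = reference.shape[0] - len(groups)
    pooled = sum((g - m).T @ (g - m) for g, m in zip(groups, means)) / dof
    inv = np.linalg.pinv(pooled)

    # Squared Mahalanobis distance of every unknown to every group centroid.
    diffs = unknown[:, None, :] - means[None, :, :]
    d2 = np.einsum("ugp,pq,ugq->ug", diffs, inv, diffs)

    labels = d2.argmin(axis=1)
    one_hot = np.eye(len(groups), dtype=int)[labels]
    return labels, d2, one_hot

# Toy reference: two well-separated groups of 20 individuals, 2 variables.
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 0.5, size=(20, 2))
group_b = rng.normal(5.0, 0.5, size=(20, 2))
reference = np.vstack([group_a, group_b])
unknown = np.array([[0.2, -0.1], [4.8, 5.3]])

labels, d2, one_hot = mahalanobis_identify(unknown, reference, [20, 20])
print(labels)
print(one_hot)
```

Because the unknown individuals play no part in estimating the centroids or the covariance, the result for one unknown does not depend on which other unknowns are submitted with it.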

Output:

  • A report is issued giving the tentative assignment of each unknown individual to one of the reference groups.

Detailed processing automatically applied by XYOM to the input data:

  • RAW COORDINATES of anatomical LANDMARKS
    • Concatenating raw landmarks of unknown and reference individuals
    • GPA (Generalised Procrustes Analysis) on these data
    • PCA on resulting ORP (orthogonal projections)
  • RAW COORDINATES of PSEUDOlandmarks (outlines)
    • EFA (Elliptic Fourier Analysis) separately on UNKNOWN and REFERENCE data.
      • For the EFA, it is not necessary to concatenate unknown and reference data into one file before the analysis. When size and shape are separated, each individual is treated independently of the others: unlike the GPA, the EFA does not need the consensus of the data to generate shape variables.
    • Removing the first three columns of NEFs (Normalised Elliptic Fourier coefficients) from both UNKNOWN and REFERENCE data
      • For each individual, the first three coefficients become “degenerate” in the process of rotation and size correction.
    • Keeping the same number of NEFs for every individual (the same number of columns in every row). The number retained is that of the individual, either an unknown or a reference specimen, described by the fewest coefficients.
      • This step may reduce the accuracy of shape reconstruction for the other individuals, but it is mandatory before identification statistics can be performed.
    • Concatenating NEFs of unknown and reference specimens
      • The two files (Unknown NEF and Reference NEF) must be merged because of the next step, the Principal Component Analysis (PCA).
    • PCA on the merged NEF data
  • TABLE OF TRADITIONAL MENSURATIONS.
    • Concatenating unknown and reference data
    • PCA on these data
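
The outline (pseudolandmark) branch above can be sketched as follows. This is an illustrative outline under stated assumptions, not XYOM's code: the NEFs are assumed to have been computed already and to be available as one 1-D array per individual, the function names are hypothetical, and the PCA is done with a plain singular value decomposition of the centred data.

```python
import numpy as np

def prepare_nef(unknown_nef, reference_nef, n_drop=3):
    """Drop the first coefficients, equalise lengths, and merge both sets.

    ASSUMPTION: each individual is a 1-D array of NEFs, possibly of
    different lengths from one individual to the next.
    """
    rows = list(unknown_nef) + list(reference_nef)
    # Remove the first three coefficients of every individual.
    trimmed = [np.asarray(row, dtype=float)[n_drop:] for row in rows]
    # Keep as many columns as the individual with the fewest coefficients.
    n_keep = min(len(row) for row in trimmed)
    # Unknown rows come first, reference rows after them.
    merged = np.vstack([row[:n_keep] for row in trimmed])
    return merged, len(unknown_nef)

def pca_scores(data):
    """Principal component scores of the column-centred data."""
    centered = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u * s

# Toy NEFs with unequal numbers of coefficients per individual.
rng = np.random.default_rng(1)
unknown_nef = [rng.normal(size=12), rng.normal(size=16)]
reference_nef = [rng.normal(size=20), rng.normal(size=12)]

merged, n_unknown = prepare_nef(unknown_nef, reference_nef)
scores = pca_scores(merged)

# Separate the scores again before any identification step.
unknown_scores = scores[:n_unknown]
reference_scores = scores[n_unknown:]
print(merged.shape, scores.shape)
```

The same concatenate-then-PCA pattern applies to the table of traditional mensurations, where no coefficients need to be dropped or truncated first.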