This example shows how to use the CLIC method to identify or characterize your own specimens, so that you can get the best out of the method in a short time. Apart from downloading a few images from the CLIC page (Step 1) and digitizing them (Step 2), the procedure is fast: it takes longer to read than to apply.
==============================================
You want to use geometric morphometrics to identify a single female specimen of Bactrocera tau collected in Thailand, where at least two cryptic species are known to occur: A and C (Sangvorn & Dujardin, 2009).
The steps presented here may serve as practical guidelines to avoid wasting computer time. Five main steps are necessary, and at least 4 new folders will be created:
Step 1 - DOWNLOADING REFERENCE IMAGES
Step 2 - DIGITIZING the IMAGES (COO)
Step 3 - PREPARATION of the TEST FOLDER
Step 4 - Creation of the REFERENCE file (TET)
Step 5 - IDENTIFICATION TESTS (MOG)
==============================================
Step 1 . DOWNLOADING REFERENCE IMAGES
==============================================
To find out which cryptic species your unknown female B. tau belongs to, you need reference images of female B. tau "A" and female B. tau "C".
1/1. To get these reference images, you go to the CLIC web page http://www.mpl.ird.fr/morphometrics/clic, and you download the following files:
- Bactrocera_tau_A_F.tar.gz
- Bactrocera_tau_C_F.tar.gz
1/2. After downloading the files, you extract them with either WinZip (Windows system) or tar, rar etc. (Linux system). Please extract the files one by one, and consider renaming the resulting folders to fit your style. For example:
- Bactrocera_tau_A_F.tar.gz : extracting it produces the ... folder (full of images), which you rename as A
- Bactrocera_tau_C_F.tar.gz : extracting it produces the ... folder (full of images), which you rename as C
So, here are the first two folders you created: A and C. They are full of reference images.
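If you prefer to script step 1/2, the sketch below shows one possible way to extract an archive and rename its folder in one go. It is only an illustration, not part of CLIC: the function name extract_and_rename is hypothetical, and the code assumes that each archive unpacks into exactly one top-level folder, whatever its name.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def extract_and_rename(archive, new_name, dest_dir="."):
    """Extract a .tar.gz archive and rename its single top-level folder.

    Assumption (not confirmed by CLIC): the archive unpacks into exactly
    one top-level folder. Its original name does not need to be known.
    """
    dest = Path(dest_dir)
    target = dest / new_name
    if target.exists():
        raise FileExistsError(f"{target} already exists")

    # Unpack into a scratch folder first, so archives are handled one by one.
    with tempfile.TemporaryDirectory(dir=dest) as scratch:
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(scratch)
        entries = list(Path(scratch).iterdir())
        if len(entries) != 1 or not entries[0].is_dir():
            raise ValueError("expected one top-level folder in the archive")
        shutil.move(str(entries[0]), str(target))
    return target


# Possible usage, once both archives have been downloaded:
# extract_and_rename("Bactrocera_tau_A_F.tar.gz", "A")
# extract_and_rename("Bactrocera_tau_C_F.tar.gz", "C")
```

The function refuses to overwrite an existing folder, so running it twice by mistake cannot mix two sets of reference images.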
==============================================
Step 2 . DIGITIZING IMAGES, use of the COO module
==============================================
The reference images obtained from step 1 will allow you to get the reference coordinates. You need software to collect these coordinates.
2/1. You open the module COO of the CLIC suite, and start to work on the images of the A folder.
Once you have digitized all the images of the A folder, 3 new ASCII files appear automatically in the A folder. Their names use the current date. Thus, if today is September 7, the files have the following names (**):
coord_Sept7.txt
It is located inside the A folder; it contains the coordinates of each image of the A folder, as well as the information you entered during digitization.
coord_Sept7_DB.txt
It is the previous file arranged so that you can copy-paste it into a spreadsheet, or import it into a database.
coord_Sept7_format.txt
It is the previous file excluding meta-information, with the first row containing a comment you can modify later, and the remaining rows containing the raw coordinates. This is the file format which must be used as input data for the MOG module.
2/2. You open the COO module of the CLIC suite again, and digitize the images of the C folder (the reference for B. tau species C). After collecting the landmarks of all the images of the C folder, you will end up with new files bearing the same names, if today is still September 7:
coord_Sept7.txt
coord_Sept7_format.txt
coord_Sept7_DB.txt
They are located inside the C folder.
2/3. You now need the coordinates of the unknown female B. tau specimen, the one you want to identify. Create the folder Btau_UNKN and put the image of your unknown specimen in it. Then open the COO module again, digitize your specimen and close the COO module. Let us say that today is September 8; the files produced are then:
coord_Sept8.txt
coord_Sept8_format.txt
coord_Sept8_DB.txt
These files are located in the Btau_UNKN folder.
So, now you have three recently created folders: folder A, folder C and folder Btau_UNKN. They contain images and coordinate files.
==============================================
Step 3 . PREPARATION OF THE TEST FOLDER
==============================================
Be careful: the folders A and C you just created contain different reference images, but they contain text files having the same names (see previous steps). The copies must therefore be given different NEW NAMES.
3/1. Create a new folder, Btau_TEST. Into this folder you now copy two files: one file from the folder A, one from the folder C.
- From the A folder: coord_Sept7_format.txt as Btau_TEST/coord_a_format.txt
and
- From the C folder: coord_Sept7_format.txt as Btau_TEST/coord_c_format.txt
3/2. Please also copy the coordinates of the unknown specimen (Btau_UNKN/coord_Sept8_format.txt) to the Btau_TEST folder under a NEW NAME. For instance,
- From the Btau_UNKN folder: coord_Sept8_format.txt as Btau_TEST/coord_unkn_format.txt
3/3. (*) You now have 4 folders: folder A, folder C, folder Btau_TEST and folder Btau_UNKN, containing:
folder A : images + coord_Sept7.txt , coord_Sept7_DB.txt , coord_Sept7_format.txt
folder C : images + coord_Sept7.txt , coord_Sept7_DB.txt , coord_Sept7_format.txt
folder Btau_UNKN : image(s) + coord_Sept8.txt , coord_Sept8_DB.txt , coord_Sept8_format.txt
folder Btau_TEST : coord_a_format.txt , coord_c_format.txt , coord_unkn_format.txt
Folders A and C contain files with the same names since you worked on the reference images the same day (September 7).
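The copies of Step 3 can also be made with a short script. The sketch below is one possible way to do it, not a CLIC tool: the function name prepare_test_folder is hypothetical, and the copy from the C folder is named coord_c_format.txt here so that it cannot overwrite the copy from the A folder.

```python
import shutil
from pathlib import Path


def prepare_test_folder(sources, test_dir="Btau_TEST"):
    """Copy each coordinate file into test_dir under a distinct new name.

    sources maps the NEW file name to the path of the file to copy.
    """
    test = Path(test_dir)
    test.mkdir(exist_ok=True)
    copied = []
    for new_name, src in sources.items():
        target = test / new_name
        if target.exists():
            # Refuse to overwrite: folders A and C hold files with the same names.
            raise FileExistsError(f"{target} already exists")
        shutil.copy(src, target)
        copied.append(target)
    return copied


# Possible usage with the file names of this example:
# prepare_test_folder({
#     "coord_a_format.txt": "A/coord_Sept7_format.txt",
#     "coord_c_format.txt": "C/coord_Sept7_format.txt",
#     "coord_unkn_format.txt": "Btau_UNKN/coord_Sept8_format.txt",
# })
```

Keying the mapping on the new name makes it impossible to request two copies under the same name in a single call.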
==============================================
Step 4 - Creation of the total REFERENCE file, use of TET
==============================================
4/1. If you followed the previous steps carefully, everything is now ready to create, within the Btau_TEST folder, a single reference file holding all the reference coordinates. This file is the concatenation of coord_a_format.txt and coord_c_format.txt. Concatenation is a job for TET. You open TET,
you load coord_a_format.txt, you ask for row concatenation and
you load coord_c_format.txt. You then push the red button at the bottom right, which lets you save the concatenated file under a NEW NAME, say:
a_c.txt
4/2. The following files are now present in the Btau_TEST folder:
coord_a_format.txt
coord_c_format.txt
coord_unkn_format.txt
a_c.txt
a_c_log.txt
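To make the idea of row concatenation concrete, here is a rough sketch of what stacking two _format.txt files could look like. It is not TET's code and TET's output may differ in its details. The sketch assumes the layout described in Step 2 (first row is a comment, every other non-empty row holds raw coordinates); the function name concatenate_rows and the default comment it writes are hypothetical.

```python
from pathlib import Path


def concatenate_rows(first, second, output, comment=""):
    """Stack the data rows of two *_format.txt files into one file.

    Assumption: row 1 of each input is a comment and all other non-empty
    rows are coordinates. Returns the number of data rows written.
    """
    rows = []
    for name in (first, second):
        lines = Path(name).read_text().splitlines()
        # Skip the comment row, keep every non-empty coordinate row.
        rows.extend(line for line in lines[1:] if line.strip())
    header = comment or f"{Path(first).name} + {Path(second).name}"
    Path(output).write_text("\n".join([header] + rows) + "\n")
    return len(rows)
```

The order of the two inputs matters: the rows of the first file come first in the output, so you need to remember how many specimens each reference group contributed.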
We have reached the end of the most important steps. The MOG module now has everything it needs for your identification test.
==============================================
Step 5 - IDENTIFICATION TESTS, use of MOG
==============================================
In the example at hand (see previous steps), the identification step is completely performed by one module, MOG, and from files of the same folder, the Btau_TEST folder.
With MOG, please open the " a_c.txt " file containing the coordinates of the reference images. The MOG module allows:
# 1. Obtaining centroid sizes (CS) and shape variables (Residual coordinates, Procrustes residuals, PW, RW).
Files generated will be :
a_c_CS.txt
a_c_ALIGNED.txt
a_c_PrRes.txt
a_c_PW
a_c_RW
At this stage, you can perform a PCA on Procrustes Residuals. File generated will be
a_c_PrCp
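As a reminder of what the centroid size (CS) measures, it is the square root of the sum of the squared distances between each landmark and the centroid of the configuration. The sketch below computes it for one specimen; it only illustrates the standard definition and is not MOG's code. The function name centroid_size and the input layout (a list of x, y pairs) are assumptions.

```python
import math


def centroid_size(landmarks):
    """Return the centroid size of one specimen given as (x, y) pairs."""
    n = len(landmarks)
    # Centroid = mean position of the landmarks.
    cx = sum(x for x, _ in landmarks) / n
    cy = sum(y for _, y in landmarks) / n
    # Square root of the summed squared landmark-to-centroid distances.
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in landmarks))
```

For four landmarks at the corners of a square of side 2, each landmark lies at a squared distance of 2 from the centroid, so the function returns the square root of 8. Moving the whole configuration does not change the result, which is why CS can serve as a size estimate independent of position.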
# 2. The introduction of your own, unknown specimens as external data (button EXT/UNKN).
* After you enter the external data file coord_unkn_format.txt, the files generated will be (this step is rather slow):
a_c_PW_base
a_c_PW_unkn
a_c_PW_base_unkn
These files contain the PW computed on the grand total, i.e. the reference specimens and the unknown specimen.
# 3. Identifying/classifying "unknown" specimen(s)
There are two ways of classification: one based on the Procrustes distances; the other, more powerful but requiring larger sample sizes (***), based on the Mahalanobis distances.
* Procrustes classification. After the PW on the grand total have been computed, a first classification of your unknown specimen(s) will be automatically performed on the basis of Procrustes distances. The process is rather slow. It will use two algorithms, one based on the shortest Procrustes distance to each consensus (each reference species), and another one based on the K nearest neighbors method (KNN). They do not necessarily agree completely, and the next approach (DA and Mahalanobis classification) is preferred.
* Mahalanobis classification. This classification does not start automatically. After the Procrustes classification, you must ask for a discriminant analysis (DA). You will not have the choice of the input file: the DA will use as input the PW relative to the reference images (a_c_PW_base). You will then be allowed to call your unknown specimen as supplementary data (yes, another button EXT/UNKN): the program will use the PW of the a_c_PW_unkn file as supplementary data.
* Thus, the EXT/UNKN button of the DA will automatically open the file containing the PW relative to the unknown specimen(s) and compute their position in the discriminant space obtained from the PW of the reference specimens. The classification algorithm is the one based on the shortest Mahalanobis distance to each consensus.
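The three classification rules described above can be illustrated with a small sketch. It is not MOG's code, and all function names are hypothetical. It assumes the shape variables are already available as plain lists of numbers, with the unknown superimposed together with the references. Under that assumption the Euclidean distance between aligned shape vectors stands in for the Procrustes distance, and the Mahalanobis distance uses the pooled within-group covariance of the reference groups, which is a common formulation but possibly not the exact computation done by MOG.

```python
import math
from collections import Counter


def mean_vector(rows):
    """Consensus (mean) of a list of equal-length shape vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def euclidean(a, b):
    """Distance between two aligned shape vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def nearest_consensus(unknown, groups):
    """Label of the reference group whose consensus is closest."""
    return min(groups, key=lambda g: euclidean(unknown, mean_vector(groups[g])))


def knn(unknown, groups, k=3):
    """Majority label among the k closest reference specimens."""
    scored = sorted(
        (euclidean(unknown, row), label)
        for label, rows in groups.items()
        for row in rows
    )
    return Counter(label for _, label in scored[:k]).most_common(1)[0][0]


def pooled_covariance(groups):
    """Pooled within-group covariance matrix of the reference groups."""
    p = len(next(iter(groups.values()))[0])
    total = [[0.0] * p for _ in range(p)]
    dof = 0
    for rows in groups.values():
        m = mean_vector(rows)
        for row in rows:
            d = [v - mu for v, mu in zip(row, m)]
            for i in range(p):
                for j in range(p):
                    total[i][j] += d[i] * d[j]
        dof += len(rows) - 1
    return [[v / dof for v in line] for line in total]


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular covariance: too few reference specimens?")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (a[i][n] - s) / a[i][i]
    return x


def mahalanobis_classify(unknown, groups):
    """Return (label, squared Mahalanobis distance) of the closest consensus."""
    cov = pooled_covariance(groups)
    best_label, best_d2 = None, float("inf")
    for label, rows in groups.items():
        d = [u - m for u, m in zip(unknown, mean_vector(rows))]
        d2 = sum(di * wi for di, wi in zip(d, solve(cov, d)))
        if d2 < best_d2:
            best_label, best_d2 = label, d2
    return best_label, best_d2
```

Here groups is a dictionary such as {"A": [...], "C": [...]}, each value being the list of shape vectors of one reference species. The error raised on a singular covariance matrix illustrates why the Mahalanobis approach needs enough reference specimens relative to the number of shape variables.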
==============================================
(*) At this point you can leave the CLIC environment and switch to your preferred software, since the coord_Sept7.txt from the A folder, the coord_Sept7.txt from the C folder and the coord_Sept8.txt are in the TPS format, usable for TPS and some other scripts (see http:://www.edu.).
(**) The ..._format.txt and ..._DB.txt files can also be obtained from the coord_... file by using the TET module. You open TET, from there you open the raw coordinates file and ask for the first transformation, called
.tps file (from TPSdig or COO) >> ..._format.txt + _DB.txt
(***) The sample sizes in question are those of the reference specimens, since the discriminant model is constructed from them.