Compares a list of annotations to another ontology and suggests the best match based on the EBI::FGPT::FuzzyRecogniser module. It is also possible to align one ontology to another. Accepts ontologies in both OBO and OWL formats as well as MeSH ASCII and OMIM txt.
The script runs non-interactively, and the results must be inspected manually. As a rule of thumb, a match with a similarity score above roughly 80-90% can be expected to be valid.
similarity_match.pl (-w owlfile || -o obofile || -m meshfile || -i omimfile) -t targetfile -r resultfile [--obotarget || --owltarget]
Optional '--obotarget' setting specifies that the target file is an OBO ontology. Optional '--owltarget' setting specifies that the target file is an OWL ontology.
owlfile, obofile, meshfile and omimfile are ontologies in OWL, OBO, MeSH ASCII and OMIM formats respectively. Only a single file needs to be specified.
The script expects a tab-delimited text file with headers. Only the first column will be used for matching. All other columns will be preserved in the output.
The script will produce a single tab-delimited file as set with the -r flag. The file will have four additional headers:
Accession of the source term if target file was an ontology.
Label of the source term if target file was an ontology.
The annotation (label or synonym if target file was an ontology) that was matched based on the highest similarity against the supplied ontology file.
Accession of the matched term that provided the best match.
Matched term's label.
The actual term's annotation (label or synonym) that was matched based on the highest similarity from the supplied ontology file.
Similarity score of the two matched terms, normalised by the length of the longer of the two strings and expressed in %. Higher is better.
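The normalisation described above can be illustrated with a minimal Python sketch. It assumes the underlying distance is the Levenshtein edit distance, which is an assumption made for illustration only; the metric actually used by EBI::FGPT::FuzzyRecogniser may differ, and the function names here are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance by dynamic programming (assumed metric, not confirmed by the script)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity_percent(a: str, b: str) -> float:
    """Score in %, normalised by the length of the longer string."""
    longer = max(len(a), len(b))
    if longer == 0:
        return 100.0  # two empty strings are treated as identical
    return 100.0 * (longer - levenshtein(a, b)) / longer
```

Under this assumption, identical strings score 100% and strings with no characters in common at any position score 0%, so a longer string tolerates more edits before dropping below a given threshold.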
Aligns the two data structures targetfile and ontology. Outputs the results into a file.
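As a rough illustration of the alignment step, the sketch below picks, for one annotation, the ontology term whose label or synonym scores highest. It is written in Python rather than the script's Perl, uses difflib.SequenceMatcher as a stand-in similarity measure (not the FuzzyRecogniser metric), and the function name, data layout and accessions such as EX:0001 are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def best_match(annotation: str,
               ontology: Dict[str, List[str]]) -> Tuple[str, str, float]:
    """Return (accession, matched annotation, score in %) for the closest term.

    `ontology` maps an accession to its label and synonyms. An empty
    ontology yields ("", "", -1.0).
    """
    best = ("", "", -1.0)
    for accession, annotations in ontology.items():
        for candidate in annotations:
            # Stand-in similarity: case-insensitive SequenceMatcher ratio.
            ratio = SequenceMatcher(None, annotation.lower(), candidate.lower()).ratio()
            score = 100.0 * ratio
            if score > best[2]:
                best = (accession, candidate, score)
    return best


# Hypothetical ontology: accession -> [label, synonym, ...]
ontology = {
    "EX:0001": ["heart", "cardiac muscle organ"],
    "EX:0002": ["liver", "hepatic organ"],
}

accession, matched, score = best_match("Heart", ontology)
```

Running this once per row of the target file and writing the three returned values next to the original columns would give an output of the general shape described above.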
Custom flat file parser.
Splits and joins the columns of a flat file. The first column is assigned to the first element. Concatenates the ragged end (leftover columns) into the second element or returns undef for a one-column file.
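The splitting behaviour described above might look like the following in Python. This is a sketch of the described behaviour, not the script's Perl code; the function name is hypothetical, None stands in for Perl's undef, and joining the leftover columns with tabs is an assumption.

```python
from typing import Optional, Tuple


def split_first_column(line: str) -> Tuple[str, Optional[str]]:
    """Split a tab-delimited line into (first column, remaining columns).

    The remaining columns are re-joined with tabs (assumed separator);
    a one-column line returns None as the second element.
    """
    columns = line.rstrip("\r\n").split("\t")
    first = columns[0]
    rest = "\t".join(columns[1:]) if len(columns) > 1 else None
    return first, rest
```

For example, `split_first_column("heart\tUBERON\tnote\n")` returns `("heart", "UBERON\tnote")`, so only the first element is used for matching while the second can be written back to the output unchanged.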
Emma Hastings <firstname.lastname@example.org>
Tomasz Adamusiak <email@example.com>