Compares a list of annotations to an ontology and suggests the best match for each one, based on a string similarity metric computed over normalised n-grams. It can also align one ontology to another. Accepts ontologies in OBO and OWL formats, as well as MeSH ASCII and OMIM text files.
The script runs non-interactively, and the results must be inspected manually. In practice, matches with a similarity score above roughly 80-90% can usually be expected to be valid.
similarity_match.pl (-w owlfile || -o obofile || -m meshfile || -i omimfile) -t targetfile -r resultfile [--obotarget || --owltarget]
The optional '--obotarget' flag specifies that the target file is an OBO ontology. The optional '--owltarget' flag specifies that the target file is an OWL ontology.
owlfile, obofile, meshfile and omimfile are ontologies in OWL, OBO, MeSH ASCII and OMIM text formats, respectively. Only one of them needs to be specified.
The script expects the target file to be a single-column text file with no headers.
The script produces a single tab-delimited file, named with the -r flag. The file has five columns:
Accession of the term from the targetfile if the file was an ontology, otherwise OE_VALUE repeated.
Annotation from the supplied targetfile or a term label if the file was an ontology.
Term label from the supplied ontology file that gave the highest similarity.
Accession of the ontology term that provided the best match.
Similarity score of ONTOLOGY_TERM compared to OE_VALUE. It is derived from the Levenshtein distance normalised by the length of OE_VALUE and expressed as a percentage. Higher is better.
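The text does not give the exact formula that turns a distance into a percentage. One plausible reading, offered here only as an assumption, is 100 * (len(OE_VALUE) - distance) / len(OE_VALUE), clamped at 0. This short Python sketch implements that assumed formula on top of a textbook Levenshtein edit distance. The function names are hypothetical, and the script itself uses Text::LevenshteinXS in Perl.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity_percent(oe_value: str, ontology_term: str) -> float:
    """Assumed score: distance normalised by len(oe_value), as a percentage, floored at 0."""
    if not oe_value:
        return 0.0
    d = levenshtein(oe_value, ontology_term)
    return max(0.0, 100.0 * (len(oe_value) - d) / len(oe_value))
```

For example, "heart" against "hearts" differs by one insertion and scores 80% under this assumption. Without the clamp, a very dissimilar term could produce a negative score.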
Normalises labels and synonyms in the target hash. These are stored in extra annotations on the hash, so that the original value is preserved for display.
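The point of this step is that normalised forms sit alongside the originals rather than replacing them. Here is a small Python sketch of that idea. The dictionary layout and the "norm_label"/"norm_synonyms" keys are hypothetical. The stand-in normaliser, which lowercases and collapses whitespace, replaces the script's own 2-gram normalisation only to keep the example short.

```python
from typing import Any, Dict


def normalise(text: str) -> str:
    # Hypothetical stand-in normaliser: lowercase and collapse whitespace.
    return " ".join(text.lower().split())


def normalise_target_hash(terms: Dict[str, Dict[str, Any]]) -> None:
    """Add normalised copies of labels and synonyms without touching the originals."""
    for entry in terms.values():
        entry["norm_label"] = normalise(entry["label"])
        entry["norm_synonyms"] = [normalise(s) for s in entry.get("synonyms", [])]
```

Matching then runs on the "norm_*" keys, and the untouched "label" is what ends up in the output file.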
Checks the input data, e.g. removes empty lines and warns about duplicates.
Normalises a string by converting it to lowercase and splitting it into 2-grams.
Aligns the two data structures, targetfile and ontology, and writes the results to a file.
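As a rough illustration of this input check, the Python sketch below drops blank lines and warns about repeated values. It assumes that duplicates are only reported and kept, since the description mentions a warning rather than removal. The function name is hypothetical.

```python
import warnings
from typing import List


def check_input(lines: List[str]) -> List[str]:
    """Strip whitespace, drop empty lines, and warn (but keep) duplicate entries."""
    seen = set()
    cleaned = []
    for line in lines:
        value = line.strip()
        if not value:
            continue  # skip empty or whitespace-only lines
        if value in seen:
            warnings.warn(f"Duplicate entry: {value!r}")
        seen.add(value)
        cleaned.append(value)
    return cleaned
```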
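This minimal Python sketch shows one reading of "lowercase and split into 2-grams": overlapping character bigrams of the lowercased string. The description does not say whether whitespace or punctuation is removed first, or how the 2-grams are combined afterwards, so those details are left out. The function name is hypothetical.

```python
from typing import List


def to_bigrams(text: str) -> List[str]:
    """Lowercase the string and return its overlapping character 2-grams."""
    s = text.lower()
    return [s[i:i + 2] for i in range(len(s) - 1)]
```

For instance, "Heart" yields ["he", "ea", "ar", "rt"], and a one-character string yields an empty list.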
Custom MeSH parser for the MeSH ASCII format.
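To show roughly what a MeSH ASCII parser has to handle, here is a Python sketch. It assumes the descriptor file layout in which each record starts with a "*NEWRECORD" line and fields appear as "KEY = value". MH is taken as the label, UI as the accession, and the first pipe-delimited field of ENTRY/PRINT ENTRY as a synonym. The output dictionary shape and the function name are hypothetical and do not describe the script's own Perl parser.

```python
from typing import Dict, Iterable, List


def parse_mesh_ascii(lines: Iterable[str]) -> Dict[str, Dict[str, object]]:
    """Collect {UI: {"label": MH, "synonyms": [...]}} from MeSH ASCII records."""
    terms: Dict[str, Dict[str, object]] = {}
    current: Dict[str, object] = {}

    def flush(rec: Dict[str, object]) -> None:
        # Only keep records that have both an accession and a heading.
        if "UI" in rec and "MH" in rec:
            terms[rec["UI"]] = {"label": rec["MH"], "synonyms": rec.get("synonyms", [])}

    for raw in lines:
        line = raw.rstrip("\n")
        if line == "*NEWRECORD":
            flush(current)
            current = {}
            continue
        key, sep, value = line.partition(" = ")
        if not sep:
            continue  # blank or unrecognised line
        if key in ("MH", "UI"):
            current[key] = value
        elif key in ("ENTRY", "PRINT ENTRY"):
            # The synonym text is the first field; the rest are qualifier codes.
            synonyms: List[str] = current.setdefault("synonyms", [])  # type: ignore[assignment]
            synonyms.append(value.split("|")[0])
    flush(current)
    return terms
```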
Custom OMIM parser.
Custom flat file parser.
Splits and joins the columns of a flat file. The first column is assigned to the first element. The ragged end (any leftover columns) is concatenated into the second element, which is undef for a one-column file.
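The split-and-join behaviour can be sketched in Python as follows. Python's None stands in for Perl's undef. The tab delimiter and the choice to rejoin leftover columns with a tab are assumptions, because the description does not name either. The function name is hypothetical.

```python
from typing import Optional, Tuple


def split_columns(line: str) -> Tuple[str, Optional[str]]:
    """Return (first column, remaining columns joined) or (first column, None)."""
    cols = line.rstrip("\n").split("\t")  # assumed tab-delimited input
    rest = cols[1:]
    return cols[0], ("\t".join(rest) if rest else None)
```

A one-column line such as "heart\n" gives ("heart", None), while "a\tb\tc" gives ("a", "b\tc").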
Custom OBO parser.
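Below is a minimal Python sketch of reading OBO [Term] stanzas into the same hypothetical {id: {"label", "synonyms"}} shape used in the other examples. It reads only the id:, name: and quoted synonym: tags, ignores other stanza types such as [Typedef], and leaves escaped characters inside synonym text as they are. It is an illustration, not the script's parser.

```python
import re
from typing import Dict, Iterable, Optional

# Captures the quoted text of a synonym line, allowing escaped characters.
SYNONYM_RE = re.compile(r'^synonym:\s*"((?:[^"\\]|\\.)*)"')


def parse_obo(lines: Iterable[str]) -> Dict[str, Dict[str, object]]:
    terms: Dict[str, Dict[str, object]] = {}
    stanza: Optional[str] = None
    current: Dict[str, object] = {}

    def flush(kind: Optional[str], rec: Dict[str, object]) -> None:
        if kind == "Term" and "id" in rec and "name" in rec:
            terms[rec["id"]] = {"label": rec["name"], "synonyms": rec.get("synonyms", [])}

    for raw in lines:
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            flush(stanza, current)  # close the previous stanza
            stanza, current = line[1:-1], {}
            continue
        if stanza != "Term":
            continue  # header lines and non-Term stanzas are skipped
        if line.startswith("id:"):
            current["id"] = line[3:].strip()
        elif line.startswith("name:"):
            current["name"] = line[5:].strip()
        else:
            m = SYNONYM_RE.match(line)
            if m:
                current.setdefault("synonyms", []).append(m.group(1))  # type: ignore[union-attr]
    flush(stanza, current)
    return terms
```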
Custom OWL parser.
A wrapper around the calculate_distance function. Specifies the similarity metric to be used, in this case Text::LevenshteinXS::distance.
Outputs a single line in the output file.
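This Python sketch writes one output line with the five tab-separated fields, in the column order given above. Rounding the score to a whole number is an assumption, since the original does not state how the percentage is printed. The function name is hypothetical.

```python
def format_output_line(target_accession: str, oe_value: str, ontology_term: str,
                       ontology_accession: str, similarity: float) -> str:
    """Build one tab-delimited result line (score rounded to an integer, by assumption)."""
    fields = [target_accession, oe_value, ontology_term, ontology_accession,
              f"{similarity:.0f}"]
    return "\t".join(fields) + "\n"
```

When the target is a plain list of annotations, the first column simply repeats OE_VALUE.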
Finds the best match for the supplied term in the ontology using the supplied anonymous distance function defined in find_match().
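The design described above selects the metric in one place and passes it to the matcher as an anonymous function. The Python sketch below mirrors that design with a lambda built by a small factory. It reuses the assumed percentage formula and a textbook Levenshtein distance in place of Text::LevenshteinXS, and it checks each term's label and synonyms but reports the label. All names and the dictionary layout are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def make_similarity(metric: Callable[[str, str], int]) -> Callable[[str, str], float]:
    # The metric is fixed here and handed on as an anonymous function.
    return lambda query, cand: (
        100.0 * (len(query) - metric(query, cand)) / len(query) if query else 0.0
    )


def find_best_match(term: str, ontology: Dict[str, Dict[str, object]],
                    similarity: Callable[[str, str], float]
                    ) -> Optional[Tuple[str, str, float]]:
    """Return (accession, label, score) of the highest-scoring label or synonym."""
    best: Optional[Tuple[str, str, float]] = None
    for accession, entry in ontology.items():
        for candidate in [entry["label"], *entry.get("synonyms", [])]:
            score = similarity(term, candidate)
            if best is None or score > best[2]:  # ties keep the first match found
                best = (accession, entry["label"], score)
    return best
```

With this structure, swapping in a different metric only requires changing the argument to make_similarity. The matching loop does not change.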
Tomasz Adamusiak <firstname.lastname@example.org>