NAME

similarity_match.pl

SYNOPSIS

Compares a list of annotations against an ontology and suggests the best match for each annotation, based on a similarity metric (n-grams). It can also align one ontology to another. Accepts ontologies in OBO and OWL formats, as well as MeSH ASCII and OMIM text files.

The script runs non-interactively, so the results have to be inspected manually. As a rule of thumb, anything with a similarity score above roughly 80-90% can be expected to be a valid match.

USAGE

similarity_match.pl (-w owlfile || -o obofile || -m meshfile || -i omimfile) -t targetfile -r resultfile [--obotarget || --owltarget]

Optional '--obotarget' setting specifies that the target file is an OBO ontology. Optional '--owltarget' setting specifies that the target file is an OWL ontology.

INPUT FILES

Ontologies to map the targetfile against: owlfile, obofile, meshfile and omimfile are ontologies in OWL, OBO, MeSH ASCII and OMIM text formats, respectively. Only a single file needs to be specified.
targetfile: The script expects a single-column text file with no headers, unless --obotarget or --owltarget marks it as an ontology.

OUTPUT

The script produces a single tab-delimited file, named with the -r flag. The file has five column headers:

ID: Accession of the term from the targetfile if the file was an ontology, otherwise OE_VALUE repeated.
OE_VALUE: Annotation from the supplied targetfile or a term label if the file was an ontology.
ONTOLOGY_TERM: Term label from the supplied ontology file that gave the highest similarity.
ACCESSION: Accession of the ontology term that provided the best match.
SIMILARITY%: Similarity score of ONTOLOGY_TERM compared to OE_VALUE, derived from the Levenshtein distance normalised by the length of OE_VALUE and expressed as a percentage. Higher is better.
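To make the SIMILARITY% column concrete, here is a minimal Python sketch. It assumes the percentage is computed as 100 * (1 - distance / len(OE_VALUE)) and floored at 0; the exact formula in the Perl script may differ, and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        prev = curr
    return prev[-1]


def similarity_percent(oe_value: str, ontology_term: str) -> float:
    """Assumed formula: 100 * (1 - distance / len(oe_value)), floored at 0."""
    if not oe_value:
        return 0.0
    d = levenshtein(oe_value, ontology_term)
    return max(0.0, 100.0 * (1 - d / len(oe_value)))
```

For example, "tumour" against "tumor" is one deletion away, so under this assumption the score is 100 * (1 - 1/6), about 83%.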

DESCRIPTION

Function list

normalise_hash()

Normalises labels and synonyms in the target hash. These are stored in extra annotations on the hash, so that the original value is preserved for display.
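The idea of keeping normalised copies next to the original values can be sketched in Python as follows. The key names norm_label and norm_synonyms, the dictionary layout and the pluggable normalise argument are assumptions for illustration, not the script's actual data structure.

```python
from typing import Callable, Dict


def normalise_hash(terms: Dict[str, dict], normalise: Callable[[str], str]) -> None:
    """Store normalised label/synonyms under extra keys; originals stay for display."""
    for entry in terms.values():
        entry["norm_label"] = normalise(entry["label"])
        entry["norm_synonyms"] = [normalise(s) for s in entry.get("synonyms", [])]
```

Calling normalise_hash(terms, str.lower) leaves each "label" untouched while matching can work on "norm_label".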

check_data()

Checks the input data, e.g. removing empty lines or warning about duplicates.
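A minimal Python sketch of these checks, assuming duplicates are only reported (on stderr) and not removed; the function signature is hypothetical.

```python
import sys
from collections import Counter
from typing import List


def check_data(lines: List[str]) -> List[str]:
    """Drop blank lines and warn on stderr about values that occur more than once."""
    values = [ln.strip() for ln in lines if ln.strip()]
    for value, count in Counter(values).items():
        if count > 1:
            print(f"WARNING: '{value}' occurs {count} times", file=sys.stderr)
    return values
```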

normalise()

Normalises a string by converting it to lowercase and splitting it into 2-grams.
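A Python sketch of lowercasing and splitting into character 2-grams. How the script recombines the grams is not documented here; this version assumes they are sorted and space-joined so that an ordinary string distance can compare them.

```python
from typing import List


def bigrams(text: str) -> List[str]:
    """Lowercase the text and return its overlapping character 2-grams."""
    s = text.lower()
    return [s[i:i + 2] for i in range(len(s) - 1)]


def normalise(text: str) -> str:
    # Assumption: grams are sorted and joined with spaces into a single string.
    return " ".join(sorted(bigrams(text)))
```

For "Liver" the 2-grams are li, iv, ve and er, which this sketch turns into "er iv li ve".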

align()

Aligns the two data structures (targetfile and ontology) and writes the results to a file.
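The overall alignment loop and the five-column output can be sketched in Python. To keep the example short it swaps in difflib's SequenceMatcher ratio as the similarity measure; that is not the script's Levenshtein-based score, and the function names and data layout are hypothetical.

```python
import difflib
import sys
from typing import Dict, List, Tuple

HEADER = ["ID", "OE_VALUE", "ONTOLOGY_TERM", "ACCESSION", "SIMILARITY%"]


def align(targets: List[Tuple[str, str]], ontology: Dict[str, str]) -> List[List[str]]:
    """Return the header plus one row per (id, value) target with its best-scoring term."""
    rows = [list(HEADER)]
    for tid, value in targets:
        best = ("", "", 0.0)  # (accession, label, score)
        for acc, label in ontology.items():
            # Stand-in metric (difflib ratio), NOT the script's Levenshtein-based score.
            score = 100 * difflib.SequenceMatcher(None, value.lower(), label.lower()).ratio()
            if score > best[2]:
                best = (acc, label, score)
        rows.append([tid, value, best[1], best[0], f"{best[2]:.0f}"])
    return rows


def write_tsv(rows: List[List[str]], out=sys.stdout) -> None:
    """Write rows as tab-delimited lines."""
    for row in rows:
        out.write("\t".join(row) + "\n")
```

For a plain target list the ID column simply repeats OE_VALUE, as described under OUTPUT.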

parseMeSH()

Custom MeSH parser for the MeSH ASCII format.
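MeSH ASCII descriptor files consist of records that start with a *NEWRECORD line followed by "KEY = value" lines such as MH (heading), UI (unique ID) and ENTRY (entry terms). A minimal Python sketch of reading labels and synonyms from that layout; which fields the Perl parser actually keeps is an assumption here.

```python
from typing import Dict, Iterable


def parse_mesh(lines: Iterable[str]) -> Dict[str, dict]:
    """Return {UI: {"label": MH, "synonyms": [...]}} from MeSH ASCII descriptor records."""
    terms: Dict[str, dict] = {}
    record: dict = {}

    def flush() -> None:
        if "UI" in record and "label" in record:
            terms[record["UI"]] = {"label": record["label"],
                                   "synonyms": record.get("synonyms", [])}

    for raw in lines:
        line = raw.rstrip("\n")
        if line == "*NEWRECORD":
            flush()
            record = {}
            continue
        key, sep, value = line.partition(" = ")
        if not sep:
            continue
        if key == "MH":
            record["label"] = value
        elif key == "UI":
            record["UI"] = value
        elif key in ("ENTRY", "PRINT ENTRY"):
            # The synonym text is the first '|'-separated field; the rest are qualifiers.
            record.setdefault("synonyms", []).append(value.split("|", 1)[0])
    flush()
    return terms
```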

parseOMIM()

Custom OMIM parser.
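A Python sketch of an OMIM reader. It assumes the legacy omim.txt layout, where each *RECORD* contains "*FIELD* NO" and "*FIELD* TI" sections and the first title line looks like "100050 AARSKOG SYNDROME", optionally prefixed by a symbol such as # or *. Both the layout and the choice to keep only the first title line are assumptions, not a description of the script's parser.

```python
import re
from typing import Dict, Iterable, Optional

# Optional OMIM entry-type symbol, the MIM number, then the title text.
TITLE_RE = re.compile(r"^[*#+%^]?(\d+)\s+(.*)$")


def parse_omim(lines: Iterable[str]) -> Dict[str, str]:
    """Return {MIM number: title}, assuming the legacy omim.txt record layout."""
    terms: Dict[str, str] = {}
    field: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("*FIELD*"):
            field = line.split(None, 1)[1] if " " in line else None
            continue
        if line == "*RECORD*":
            field = None
            continue
        if field == "TI" and line:
            match = TITLE_RE.match(line)
            if match:
                terms[match.group(1)] = match.group(2)
            field = None  # keep only the first title line of the record
    return terms
```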

parseFlat()

Custom flat file parser.

parseFlatColumns()

Splits a line of a flat file into columns and rejoins them: the first column becomes the first element, and the ragged end (any leftover columns) is concatenated into the second element, which is undef for a one-column file.
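A Python sketch of the same split-and-rejoin behaviour, with None standing in for Perl's undef. The tab separator and the single space used to join the leftover columns are assumptions.

```python
from typing import Optional, Tuple


def parse_flat_columns(line: str, sep: str = "\t", joiner: str = " ") -> Tuple[str, Optional[str]]:
    """Return the first column and the remaining columns joined, or None if there are none."""
    first, *rest = line.rstrip("\n").split(sep)
    return first, (joiner.join(rest) if rest else None)
```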

parseOBO()

Custom OBO parser.
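OBO files group terms into [Term] stanzas of "tag: value" lines (id, name, synonym, is_obsolete, and so on). A minimal Python sketch that collects labels and quoted synonym text; skipping obsolete terms and ignoring other stanza types are assumptions about what the script needs.

```python
import re
from typing import Dict, Iterable, Optional

# Captures the quoted text of a synonym line, allowing escaped characters.
SYNONYM_RE = re.compile(r'^synonym:\s*"((?:[^"\\]|\\.)*)"')


def parse_obo(lines: Iterable[str]) -> Dict[str, dict]:
    """Collect {id: {"label", "synonyms"}} from [Term] stanzas, skipping obsolete terms."""
    terms: Dict[str, dict] = {}
    current: Optional[dict] = None  # None outside a [Term] stanza

    def flush() -> None:
        if current and "id" in current and "label" in current and not current.get("obsolete"):
            terms[current["id"]] = {"label": current["label"], "synonyms": current["synonyms"]}

    for raw in lines:
        line = raw.strip()
        if line.startswith("["):  # a new stanza header closes the previous one
            flush()
            current = {"synonyms": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue
        tag, value = (part.strip() for part in line.split(":", 1))
        if tag == "id":
            current["id"] = value
        elif tag == "name":
            current["label"] = value
        elif tag == "is_obsolete" and value == "true":
            current["obsolete"] = True
        elif tag == "synonym":
            match = SYNONYM_RE.match(line)
            if match:
                current["synonyms"].append(match.group(1))
    flush()
    return terms
```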

parseOWL()

Custom OWL parser.
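For OWL serialised as RDF/XML, the essential step is mapping each named owl:Class to its rdfs:label. A minimal Python sketch using the standard library XML parser; it handles only RDF/XML and only the first label per class, which is an assumption and much narrower than a full OWL reader.

```python
import xml.etree.ElementTree as ET
from typing import Dict

OWL = "{http://www.w3.org/2002/07/owl#}"
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"


def parse_owl(xml_text: str) -> Dict[str, str]:
    """Map each named owl:Class (its rdf:about IRI) to its first rdfs:label."""
    root = ET.fromstring(xml_text)
    terms: Dict[str, str] = {}
    for cls in root.iter(OWL + "Class"):
        iri = cls.get(RDF + "about")  # anonymous classes have no rdf:about and are skipped
        label = cls.find(RDFS + "label")
        if iri and label is not None and label.text:
            terms[iri] = label.text.strip()
    return terms
```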

find_match()

A wrapper around the calculate_distance function. Specifies the similarity metric to be used, in this case Text::LevenshteinXS::distance.

Outputs a single line in the output file.
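The wrapper pattern (choose the metric, find the closest label, emit one output line) can be sketched in Python. The levenshtein function below is a plain-Python stand-in for Text::LevenshteinXS::distance, the best-match search is inlined with min(), and the percentage formula 100 * (1 - distance / len(OE_VALUE)) is an assumption.

```python
from typing import Dict


def levenshtein(a: str, b: str) -> int:
    # Python stand-in for Text::LevenshteinXS::distance (unit-cost edit distance).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def find_match(term_id: str, oe_value: str, ontology: Dict[str, str]) -> str:
    """Pick the metric, find the closest label and format one tab-delimited output line."""
    def distance(a: str, b: str) -> int:
        return levenshtein(a.lower(), b.lower())  # the chosen metric

    accession, label = min(ontology.items(), key=lambda kv: distance(oe_value, kv[1]))
    d = distance(oe_value, label)
    similarity = max(0, round(100 * (1 - d / max(len(oe_value), 1))))
    return "\t".join([term_id, oe_value, label, accession, str(similarity)])
```

Passing the metric in as a function keeps the search independent of the scoring, so a different distance could be swapped in without touching the matching loop.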

calculate_distance()

Finds the best match for the supplied term in the ontology using the supplied anonymous distance function defined in find_match().
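A Python sketch of a search that takes the distance function as an argument, mirroring how find_match() supplies an anonymous function. Checking synonyms as well as labels, and stopping early on an exact match, are assumptions made for illustration.

```python
from typing import Callable, Dict, Optional, Tuple

DistanceFn = Callable[[str, str], float]


def calculate_distance(term: str, ontology: Dict[str, dict],
                       distance: DistanceFn) -> Optional[Tuple[str, str, float]]:
    """Return (accession, matched_text, distance) for the closest label or synonym."""
    best: Optional[Tuple[str, str, float]] = None
    for accession, entry in ontology.items():
        for candidate in [entry["label"], *entry.get("synonyms", [])]:
            d = distance(term, candidate)
            if best is None or d < best[2]:
                best = (accession, candidate, d)
                if d == 0:
                    return best  # exact match: nothing can beat it
    return best
```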

AUTHORS

Tomasz Adamusiak <tomasz@cpan.org>

