NAME

Alvis::NLPPlatform::UserNLPWrappers - User interface for customizing the NLP wrappers used to linguistically annotate XML documents in Alvis

SYNOPSIS

use Alvis::NLPPlatform::UserNLPWrappers;

Alvis::NLPPlatform::UserNLPWrappers::tokenize($h_config,$doc_hash);

DESCRIPTION

This module is only an interface that allows customisation of the NLP wrappers. Anyone who wants to integrate a new NLP tool has to override the default wrapper. The aim of this module is to simplify the development of a specific wrapper, its integration, and its use in the platform.

Before developing a new wrapper, copy this file to a local directory, modify the copy, and add that directory to the PERL5LIB environment variable.
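As a rough illustration of the search-path step, the sketch below uses the core lib pragma, which prepends a directory to @INC in the same way as an entry in PERL5LIB. The directory name is a hypothetical placeholder, not a path used by the platform.

```perl
use strict;
use warnings;

# Hypothetical directory holding the modified copy, laid out as
# <dir>/Alvis/NLPPlatform/UserNLPWrappers.pm (assumption for illustration).
use lib '/home/user/alvis-local/lib';

# Directories added this way are searched before the installed modules,
# so the local copy is the one that gets loaded.
print "First search path: $INC[0]\n";
```

Setting PERL5LIB in the shell, as described above, has the advantage of not requiring any change to the scripts that load the module.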

METHODS

tokenize()

    tokenize($h_config, $doc_hash);

This method carries out the tokenisation of the input document. $doc_hash is the hash table containing all the annotations of the input document. See the documentation in Alvis::NLPPlatform::NLPWrappers. Overriding this method is not recommended.

$h_config is the reference to the hash table containing the variables defined in the configuration file.

The method returns the number of tokens.

scan_ne()

    scan_ne($h_config, $doc_hash);

This method wraps the named entity recognition and tagging step. $doc_hash is the hash table containing all the annotations of the input document. The step aims at annotating semantic units with syntactic and semantic types. Each text sequence corresponding to a named entity is tagged with a unique tag corresponding to its semantic value (for example, a "gene" type for gene names, a "species" type for species names, etc.). All these text sequences are also assumed to be equivalent to nouns: the tagger dynamically produces linguistic units equivalent to words or noun phrases.

$h_config is the reference to the hash table containing the variables defined in the configuration file.
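To make the idea of type tagging concrete, here is a toy stand-in for a custom scan_ne() that looks up surface forms in a small lexicon. It is only a sketch: the lexicon, the 'text' and 'named_entities' keys, and the returned count are assumptions made for this example. The actual layout of $doc_hash is described in Alvis::NLPPlatform::NLPWrappers, and a real wrapper would typically call an external tagger.

```perl
use strict;
use warnings;

# Hypothetical lexicon mapping surface forms to semantic types.
my %NE_LEXICON = (
    'Bacillus subtilis' => 'species',
    'sigK'              => 'gene',
);

# Toy replacement with the documented argument order. The 'text' and
# 'named_entities' keys are invented for this sketch.
sub scan_ne {
    my ($h_config, $doc_hash) = @_;
    my $count = 0;
    for my $form (sort keys %NE_LEXICON) {
        # Record every occurrence of the form with its character offsets.
        while ($doc_hash->{text} =~ /\Q$form\E/g) {
            push @{ $doc_hash->{named_entities} }, {
                form  => $form,
                type  => $NE_LEXICON{$form},
                start => $-[0],
                end   => $+[0],
            };
            $count++;
        }
    }
    return $count;
}

my $doc_hash = { text => 'The sigK gene of Bacillus subtilis is expressed late.' };
my $found = scan_ne({}, $doc_hash);
print "$found named entities found\n";
```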

word_segmentation()

    word_segmentation($h_config, $doc_hash);

This method wraps the default word segmentation step. $doc_hash is the hash table containing all the annotations of the input document.

$h_config is the reference to the hash table containing the variables defined in the configuration file.

sentence_segmentation()

    sentence_segmentation($h_config, $doc_hash);

This method wraps the default sentence segmentation step. $doc_hash is the hash table containing all the annotations of the input document.

$h_config is the reference to the hash table containing the variables defined in the configuration file.

pos_tag()

    pos_tag($h_config, $doc_hash);

This method wraps the part-of-speech (POS) tagging step. $doc_hash is the hash table containing all the annotations of the input document. For every input word, the wrapped part-of-speech tagger outputs its tag.

$h_config is the reference to the hash table containing the variables defined in the configuration file.

lemmatization()

    lemmatization($h_config, $doc_hash);

This method wraps the lemmatizer. $doc_hash is the hash table containing all the annotations of the input document. For every input word, the wrapped lemmatizer outputs its lemma, i.e. the canonical form of the word.

$h_config is the reference to the hash table containing the variables defined in the configuration file.
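The word-to-lemma mapping can be pictured with the following toy stand-in for a custom lemmatization(). It is a sketch under stated assumptions: the lemma table and the 'words', 'form' and 'lemma' keys are invented for the example and do not reflect the real $doc_hash layout, and an actual wrapper would delegate to an external lemmatizer.

```perl
use strict;
use warnings;

# Hypothetical lemma table standing in for an external lemmatizer.
my %LEMMA_OF = (genes => 'gene', expressed => 'express', is => 'be');

# Toy replacement with the documented argument order. The 'words',
# 'form' and 'lemma' keys are invented for this sketch.
sub lemmatization {
    my ($h_config, $doc_hash) = @_;
    for my $word (@{ $doc_hash->{words} }) {
        my $form = lc $word->{form};
        # Fall back to the lowercased form when no lemma is known.
        $word->{lemma} = exists $LEMMA_OF{$form} ? $LEMMA_OF{$form} : $form;
    }
    return scalar @{ $doc_hash->{words} };
}

my $doc_hash = { words => [ { form => 'Genes' }, { form => 'is' }, { form => 'late' } ] };
lemmatization({}, $doc_hash);
print join(' ', map { "$_->{form}/$_->{lemma}" } @{ $doc_hash->{words} }), "\n";
```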

term_tag()

    term_tag($h_config, $doc_hash);

This method wraps the term tagging step of the ALVIS NLP Platform. $doc_hash is the hash table containing all the annotations of the input document. This step aims at recognizing terms in the documents that are not named entities, such as "gene expression" or "spore coat cell".

$h_config is the reference to the hash table containing the variables defined in the configuration file.

syntactic_parsing()

    syntactic_parsing($h_config, $doc_hash);

This method wraps the sentence parsing step. It aims at exhibiting the graph of the syntactic dependency relations between the words of the sentence. $doc_hash is the hash table containing all the annotations of the input document.

$h_config is the reference to the hash table containing the variables defined in the configuration file.

Here is an example of how to tune the platform to a domain: we integrated and wrapped the BioLG parser, which is specialized for parsing biology texts.

bio_syntactic_parsing()

    bio_syntactic_parsing($h_config, $doc_hash);

This method wraps the sentence parsing step tuned for biology texts. Like the default wrapper (syntactic_parsing), it aims at exhibiting the graph of the syntactic dependency relations between the words of the sentence. $doc_hash is the hash table containing all the annotations of the input document.

$h_config is the reference to the hash table containing the variables defined in the configuration file.

We actually integrate a version of the Link Parser tuned for biology: BioLG (Sampo Pyysalo, Tapio Salakoski, Sophie Aubin and Adeline Nazarenko. Lexical Adaptation of Link Grammar to the Biomedical Sublanguage: a Comparative Evaluation of Three Approaches. Proceedings of the Second International Symposium on Semantic Mining in Biomedicine (SMBM 2006). Pages 60-67. Jena, Germany, 2006).
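One way to picture the choice between the default and the domain-tuned parser is a small dispatch routine, sketched below. The two subs are stand-ins that only return a label, and the 'domain' key is a hypothetical configuration variable introduced for this example; it is not a documented setting of the platform.

```perl
use strict;
use warnings;

# Stand-ins for the two wrappers. The real ones run actual parsers;
# these only return a label so the dispatch can be observed.
sub syntactic_parsing     { my ($h_config, $doc_hash) = @_; return 'default parser'; }
sub bio_syntactic_parsing { my ($h_config, $doc_hash) = @_; return 'BioLG parser'; }

# Hypothetical dispatch: 'domain' is an assumed configuration key.
sub parse_for_domain {
    my ($h_config, $doc_hash) = @_;
    my $domain = defined $h_config->{domain} ? $h_config->{domain} : '';
    my $parser = $domain eq 'biology' ? \&bio_syntactic_parsing
                                      : \&syntactic_parsing;
    return $parser->($h_config, $doc_hash);
}

print parse_for_domain({ domain => 'biology' }, {}), "\n";
print parse_for_domain({}, {}), "\n";
```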

semantic_feature_tagging()

    semantic_feature_tagging($h_config, $doc_hash);

This method wraps the semantic typing step, that is, the attachment of a semantic type to the words, terms and named entities (referred to as lexical items in the following) in documents, according to the conceptual hierarchies of the ontology of the domain.

$doc_hash is the hash table containing all the annotations of the input document.

$h_config is the reference to the hash table containing the variables defined in the configuration file.

semantic_relation_tagging()

    semantic_relation_tagging($h_config, $doc_hash);

This method wraps the semantic relation identification step. Semantic relation annotations give another level of semantic representation of the document: they make explicit the role that semantic units (usually named entities and/or terms) play with respect to each other, according to the ontology of the domain.

$doc_hash is the hash table containing all the annotations of the input document.

$h_config is the reference to the hash table containing the variables defined in the configuration file.

anaphora_resolution()

    anaphora_resolution($h_config, $doc_hash);

This method wraps the anaphora solver. $doc_hash is the hash table containing all the annotations of the input document. The step aims at identifying and resolving the anaphora present in a document.

$h_config is the reference to the hash table containing the variables defined in the configuration file.

SEE ALSO

Alvis web site: http://www.alvis.info

AUTHORS

Thierry Hamon <thierry.hamon@lipn.univ-paris13.fr> and Julien Deriviere <julien.deriviere@lipn.univ-paris13.fr>

LICENSE

Copyright (C) 2005 by Thierry Hamon and Julien Deriviere

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.6 or, at your option, any later version of Perl 5 you may have available.