NAME

Alvis::NLPPlatform::UserNLPWrappers - User interface for customizing the NLP wrappers used for the linguistic annotation of XML documents in Alvis

SYNOPSIS

use Alvis::NLPPlatform::UserNLPWrappers;

Alvis::NLPPlatform::UserNLPWrappers->tokenize($h_config,$doc_hash);

DESCRIPTION

This module is merely an interface that allows the NLP wrappers to be customised. Anyone who wants to integrate a new NLP tool has to override the default wrapper. The aim of this module is to ease the development of a specific wrapper, its integration and its use in the platform.

Before developing a new wrapper, it is necessary to copy this file to a local directory, modify the copy there, and add that directory to the PERL5LIB environment variable.
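As a rough illustration of the override pattern described above, the following sketch replaces a single wrapper and inherits the others unchanged. The packages My::DefaultWrappers and My::UserWrappers, the text, tokens and pos keys, and the tagging rule are all hypothetical stand-ins chosen so that the example runs without Alvis installed; the real default wrappers and the real layout of $doc_hash are described in Alvis::NLPPlatform::NLPWrappers.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the default wrappers, so that the sketch
# runs without Alvis installed.
package My::DefaultWrappers;

sub tokenize {
    my ($class, $h_config, $doc_hash) = @_;
    # Toy tokeniser: split the raw text on whitespace.
    $doc_hash->{tokens} = [ split ' ', $doc_hash->{text} ];
    return scalar @{ $doc_hash->{tokens} };
}

sub pos_tag {
    my ($class, $h_config, $doc_hash) = @_;
    # Default behaviour in this sketch: tag every token as unknown.
    my $count = @{ $doc_hash->{tokens} };
    $doc_hash->{pos} = [ ('UNK') x $count ];
    return;
}

# Stand-in for the local copy of the user wrapper: only pos_tag is
# overridden, every other method is inherited.
package My::UserWrappers;
our @ISA = ('My::DefaultWrappers');

sub pos_tag {
    my ($class, $h_config, $doc_hash) = @_;
    # Toy rule (assumption): capitalised tokens are proper nouns.
    $doc_hash->{pos} =
        [ map { /^[A-Z]/ ? 'NP' : 'UNK' } @{ $doc_hash->{tokens} } ];
    return;
}

package main;

my $h_config = {};    # would hold the configuration file variables
my $doc_hash = { text => 'Bacillus subtilis forms spores' };

my $n = My::UserWrappers->tokenize($h_config, $doc_hash);    # inherited
My::UserWrappers->pos_tag($h_config, $doc_hash);             # overridden
print "$n tokens: @{ $doc_hash->{pos} }\n";
```

Overriding one method at a time keeps the local copy small: any step that is not redefined still falls back to the parent class through @ISA.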

METHODS

tokenize()

    tokenize($h_config, $doc_hash);

This method carries out the tokenisation of the input document. $doc_hash is the hash table containing all the annotations of the input document. See the documentation in Alvis::NLPPlatform::NLPWrappers. Overriding this method is not recommended.

$h_config is a reference to the hash table containing the variables defined in the configuration file.

The method returns the number of tokens.

scan_ne()

    scan_ne($h_config, $doc_hash);

This method wraps the named entity recognition and tagging step. It aims at annotating semantic units with syntactic and semantic types. $doc_hash is the hash table containing all the annotations of the input document. Each text sequence corresponding to a named entity is tagged with a unique tag corresponding to its semantic value (for example, a "gene" type for gene names, a "species" type for species names, etc.). All these text sequences are also assumed to be equivalent to nouns: the tagger dynamically produces linguistic units equivalent to words or noun phrases.

$h_config is a reference to the hash table containing the variables defined in the configuration file.

word_segmentation()

    word_segmentation($h_config, $doc_hash);

This method wraps the default word segmentation step. $doc_hash is the hash table containing all the annotations of the input document.

$h_config is a reference to the hash table containing the variables defined in the configuration file.

sentence_segmentation()

    sentence_segmentation($h_config, $doc_hash);

This method wraps the default sentence segmentation step. $doc_hash is the hash table containing all the annotations of the input document.

$h_config is a reference to the hash table containing the variables defined in the configuration file.

pos_tag()

    pos_tag($h_config, $doc_hash);

This method wraps the part-of-speech (POS) tagging step. $doc_hash is the hash table containing all the annotations of the input document. For every input word, the wrapped POS tagger outputs its tag.

$h_config is a reference to the hash table containing the variables defined in the configuration file.

lemmatization()

    lemmatization($h_config, $doc_hash);

This method wraps the lemmatizer. $doc_hash is the hash table containing all the annotations of the input document. For every input word, the wrapped lemmatizer outputs its lemma, i.e. the canonical form of the word.

$h_config is a reference to the hash table containing the variables defined in the configuration file.

term_tag()

    term_tag($h_config, $doc_hash);

This method wraps the term tagging step of the ALVIS NLP Platform. $doc_hash is the hash table containing all the annotations of the input document. This step aims at recognizing terms in the documents that are not named entities, such as "gene expression" or "spore coat cell".

$h_config is a reference to the hash table containing the variables defined in the configuration file.

syntactic_parsing()

    syntactic_parsing($h_config, $doc_hash);

This method wraps the sentence parsing step. It aims at producing the graph of the syntactic dependency relations between the words of a sentence. $doc_hash is the hash table containing all the annotations of the input document.

$h_config is a reference to the hash table containing the variables defined in the configuration file.

semantic_feature_tagging()

    semantic_feature_tagging($h_config, $doc_hash);

This method wraps the semantic typing step, that is, the attachment of a semantic type to the words, terms and named entities (collectively, lexical items) in documents, according to the conceptual hierarchies of the domain ontology.

$doc_hash is the hash table containing all the annotations of the input document.

$h_config is a reference to the hash table containing the variables defined in the configuration file.

semantic_relation_tagging()

    semantic_relation_tagging($h_config, $doc_hash);

This method wraps the semantic relation identification step. Semantic relation annotations add another level of semantic representation to the document: they make explicit the roles that semantic units (usually named entities and/or terms) play with respect to each other, according to the domain ontology.

$doc_hash is the hash table containing all the annotations of the input document.

$h_config is a reference to the hash table containing the variables defined in the configuration file.

anaphora_resolution()

    anaphora_resolution($h_config, $doc_hash);

This method wraps the anaphora solver. It aims at identifying and resolving the anaphora present in a document. $doc_hash is the hash table containing all the annotations of the input document.

$h_config is a reference to the hash table containing the variables defined in the configuration file.
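To show how the wrappers above share the same calling convention, here is a minimal sketch that invokes each step in the order in which this document lists them, passing the same $h_config and $doc_hash to every call. The ordering, the My::LoggingWrappers package and the steps_done key are assumptions made for illustration only; the platform's own driver decides which steps actually run and in what order.

```perl
use strict;
use warnings;

# Hypothetical wrapper class: each step only records that it was called.
package My::LoggingWrappers;

# Method names as documented above, in document order (assumed order).
my @STEPS = qw(
    tokenize scan_ne word_segmentation sentence_segmentation
    pos_tag lemmatization term_tag syntactic_parsing
    semantic_feature_tagging semantic_relation_tagging
    anaphora_resolution
);

# Install one class method per step name.
for my $step (@STEPS) {
    no strict 'refs';
    *{ __PACKAGE__ . "::$step" } = sub {
        my ($class, $h_config, $doc_hash) = @_;
        push @{ $doc_hash->{steps_done} }, $step;
        return;
    };
}

sub steps { return @STEPS }

package main;

my $h_config = {};    # would hold the configuration file variables
my $doc_hash = {};    # would hold the annotations of the input document

# Every wrapper takes the same two arguments, so the steps can be
# dispatched by name in a loop.
for my $step (My::LoggingWrappers->steps) {
    My::LoggingWrappers->$step($h_config, $doc_hash);
}
print join(' -> ', @{ $doc_hash->{steps_done} }), "\n";
```

Because all wrappers accept ($h_config, $doc_hash), each step can read the annotations left in $doc_hash by earlier steps and add its own.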

SEE ALSO

Alvis web site: http://www.alvis.info

AUTHORS

Thierry Hamon <thierry.hamon@lipn.univ-paris13.fr> and Julien Deriviere <julien.deriviere@lipn.univ-paris13.fr>

LICENSE

Copyright (C) 2005 by Thierry Hamon and Julien Deriviere

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.6 or, at your option, any later version of Perl 5 you may have available.