
NAME

Lingua::FR::Ladl::Table - An object representing a Ladl Table

VERSION

This document describes Lingua::FR::Ladl::Table version 0.0.1

SYNOPSIS

    use Lingua::FR::Ladl::Table;

    my $table = Lingua::FR::Ladl::Table->new({ name => '1' });

    # load table data from an Excel file:
    $table->load({ format => 'xls', file => 't/1.xls' });

    # ... or from a Gnumeric XML file:
    $table->load({ format => 'xml', file => 't/1.xml' });

    $table->set_name('1'); 
    my $name = $table->get_name();

    my $verbCol = $table->get_verb_column();  # which column contains the verb

    my $col = $table->get_col_for_header('aux =: avoir'); # which column's header is 'aux =: avoir'?

    my $header = $table->get_header_for_col(4); # what is the column header of column 4?

    my $dbh = $table->create_db_table( { col_names => 'col_numbers' } ); # get a db handle with column numbers as column names

    # Query the table using SQL::Statement: for which verbs is column 8 empty and column 19 = '+'?
    my $query = "SELECT col_$verbCol FROM table_$name WHERE col_8 IS NULL AND col_19 = '+'";
    my $sth = $dbh->prepare($query);
    $sth->execute();
    while ( my ($verb) = $sth->fetchrow_array() ) {
        print "$verb\n";
    }

DESCRIPTION

This module provides a data structure representing a Ladl table. The Ladl tables are the digitized representation of Maurice Gross's Grammar Lexicon, a very large-scale, high-precision French linguistic resource developed over several years by a group of skilled linguists according to well-defined linguistic criteria. The grammar lexicon describes syntactic and semantic properties of basic French sentences.

A table gathers together predicative items (verbs in this case) with comparable syntactico-semantic behaviour.

In a table, columns further specify the syntactico-semantic properties of each verb in that table.

Example

1
N0 =: Nhum N0 =: Nnc 1 aux =: avoir aux =: être N0 est Upp W N0 U N1 =: Qu P N1 =: Qu Psubj Tp = Tc Tc =: passé Tc =: présent Tc =: futur Vc =: devoir Vc =: pouvoir Vc =: savoir V-inf0 W = Ppv N0 U Prép N1 N0 U Prép Nhum N0 U Prép N-hum Prép N1 = Ppv N0 U dans N1 N0 U N1 N0 U Nhum N0 U N-hum
+ - <E> achever + - - - de - - + - - - - - - - - - - - - + + + Max achève de peindre le mur
+ + <E> aller - - - - <E> - - - - - - + + + - - - - - - - - - Max va partir
+ - <E> aller - + - - jusqu'à - - + - - - - - - - - - - - - - - - La pluie va tomber
+ + ne aller Nég - + - - sans - + + - - - + + + - + - + - - - - - Cette mesure n'ira pas sans créer des troubles

The tables are available as a set of Excel spreadsheets from http://ladl.univ-mlv.fr/.

This module represents a table as a Ladl::Table object and lets you investigate and query:

which verbs belong to the table;
what the headers of the table are;
which column corresponds to which header;
which verb corresponds to which row(s);
what the value of a column is in a given row;
what the value is in a given row for a given header.

It is also possible to formulate more complex queries using the SQL dialect implemented in SQL::Statement (see SQL::Statement).

INTERFACE

Methods

For all the following methods, column and row numbering starts at 0.

new - build a new Table object
  my $table = Lingua::FR::Ladl::Table->new({ name => 'test_table' });

The constructor takes one optional argument, name, the table's name.

set/get_name

Set or get the table's name.

load
  $table->load( {format => 'xls', file=>'file_name'} )

  $table->load( {format => 'xml', file=>'file_name'} )

Load table data from a file in the given format. Format may be one of:

xls - an Excel file

In this case Spreadsheet::ParseExcel is used to parse the file.

xml - an XML file in Gnumeric XML format

The file is parsed using XML::LibXML.

The table name is also set to a value inferred from the file name by removing the suffix. The table name matters because the get_verb_column method relies on it being correct.
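A minimal sketch of how a table name can be inferred from a file name, assuming that "removing the suffix" means dropping the directory part and the last extension. table_name_from_file is a hypothetical helper, not part of the module's API; it only uses fileparse from the core File::Basename module.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical standalone helper (not part of Lingua::FR::Ladl::Table):
# drop the directory part and the last suffix of a path,
# so 't/1.xls' becomes '1'.
sub table_name_from_file {
    my ($path) = @_;
    # fileparse returns (name, directory, suffix); keep only the name.
    my ($name) = fileparse($path, qr/\.[^.]*/);
    return $name;
}

print table_name_from_file('t/1.xls'), "\n";    # prints 1
print table_name_from_file('t/1.xml'), "\n";    # prints 1
```

Loading 't/1.xls' would thus give the table the name '1', which is the header get_verb_column looks for.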

get/set_maxCol, get/set_maxRow

Get or set the maximum column or row number.

get_headers

Return a hash with the column headers as keys and the corresponding column numbers as values. When a column has no header, col_<column number> is used as its key.
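The keying rule can be sketched as a standalone function over a plain list of header cells. headers_to_columns is hypothetical, and the usage line assumes that column 2 (the particle column) of the example table has no header.

```perl
use strict;
use warnings;

# Hypothetical sketch of the documented rule: map each header to its
# column number; a column with an empty header is keyed as "col_<n>".
# If a header occurs twice, the later column wins in this sketch.
sub headers_to_columns {
    my @header_row = @_;    # header cells, indexed from 0
    my %col_for;
    for my $col (0 .. $#header_row) {
        my $header = $header_row[$col];
        my $key = (defined $header && length $header) ? $header : "col_$col";
        $col_for{$key} = $col;
    }
    return %col_for;
}

# First header cells of the example table, assuming column 2 has no header.
my %headers = headers_to_columns('N0 =: Nhum', 'N0 =: Nnc', '', '1', 'aux =: avoir');
print "$headers{'aux =: avoir'}\n";    # prints 4
print "$headers{col_2}\n";             # prints 2
```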

get_value_at($row, $col)

Return the value in row $row, column $col.

get_col_for_header( $header )

Return the column number for a given column header, or undef if $header matches no header. For the table in the example:

  $table->get_col_for_header('aux =: avoir')

  returns 4
get_header_for_col($col)

return the header for a given column.

Example:

  $table->get_header_for_col(4)

  returns 'aux =: avoir'
get_verb_column

Return the number of the column containing the verb. The verb column is assumed to be the column whose header equals the table name.

For the table in the example

  $table->get_verb_column();

  returns 3
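The assumption behind get_verb_column can be sketched as a search over the header row. find_verb_column is a hypothetical standalone function, not the module's implementation; it uses first from the core List::Util module.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical sketch: the verb column is the first column whose header
# equals the table name. Returns undef if no header matches, so callers
# should test with defined (column 0 is a valid answer).
sub find_verb_column {
    my ($table_name, @header_row) = @_;
    return first { defined $header_row[$_] && $header_row[$_] eq $table_name }
        0 .. $#header_row;
}

my $verb_col = find_verb_column('1', 'N0 =: Nhum', 'N0 =: Nnc', '', '1', 'aux =: avoir');
print "$verb_col\n";    # prints 3
```

This is also why a wrong table name makes get_verb_column fail: no header equals it.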
get_verbs

return the list of verbs of the table (as an array).

get_particle_column

The particle column contains entries such as ne, n', se, s' occurring in front of the verb. It is assumed to be the column immediately before the verb column.

Example:

  $table->get_particle_column()

  returns 2
get_example_column

The example column contains example sentences using the verb of the row. It is assumed to be the last column of the table.

For the table in the example above:

  my $col = $table->get_example_column()

would set $col to 27.

get_column_types

Columns contain either text or one of '+', '-' and '~'. The method returns a reference to a hash whose keys are the column numbers and whose values are 'text' for columns with text content and '+-~' for all other columns.

For the table in the example:

  $table->get_column_types()

returns a reference to the hash:

  {
    '0'  => '+-~',
    '1'  => '+-~',
    '2'  => 'text',
    '3'  => 'text',
    '4'  => '+-~',
    '5'  => '+-~',
    '6'  => '+-~',
    '7'  => '+-~',
    '8'  => 'text',
    '9'  => '+-~',
    '10' => '+-~',
    '11' => '+-~',
    '12' => '+-~',
    '13' => '+-~',
    '14' => '+-~',
    '15' => '+-~',
    '16' => '+-~',
    '17' => '+-~',
    '18' => '+-~',
    '19' => '+-~',
    '20' => '+-~',
    '21' => '+-~',
    '22' => '+-~',
    '23' => '+-~',
    '24' => '+-~',
    '25' => '+-~',
    '26' => '+-~',
    '27' => 'text',
  };
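The classification rule can be sketched over plain data rows (array refs of cell values, header row excluded). classify_columns is hypothetical, and the sample rows are the first cells of the example table; note that in this sketch the empty mark <E> counts as text content.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical sketch: a column is '+-~' when every row holds exactly one
# of '+', '-' or '~' in it; otherwise it is 'text'. Rows are array refs.
sub classify_columns {
    my @rows    = @_;
    my $max_col = max map { $#$_ } @rows;
    my %type;
    for my $col (0 .. $max_col) {
        # Count the rows whose cell in this column is not a single flag.
        my $non_flag = grep { !defined $_->[$col] || $_->[$col] !~ /\A[-+~]\z/ } @rows;
        $type{$col} = $non_flag ? 'text' : '+-~';
    }
    return \%type;
}

my $types = classify_columns(
    [ '+', '-', '<E>', 'achever', '+' ],
    [ '+', '+', '<E>', 'aller',   '-' ],
    [ '+', '+', 'ne',  'aller',   '-' ],
);
print join(', ', map { "$_ => $types->{$_}" } sort { $a <=> $b } keys %$types), "\n";
# prints 0 => +-~, 1 => +-~, 2 => text, 3 => text, 4 => +-~
```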
get_column_type_for_col

Return the column type for a given column: '+-~' if the column contains only the values '+', '-' or '~', and 'text' if it contains any other content.

Throws an exception if the column does not exist.

Example:

   $table->get_column_type_for_col(2)

   returns `text'
is_tilda_row($row)

A row is a tilda row if all its columns of type '+-~' contain '~', i.e. they carry no specific information about the verb.
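A standalone sketch of the tilda-row test, assuming a row is an array ref of cells and the column types come in the hash format returned by get_column_types. row_is_tilda is hypothetical, not the module's is_tilda_row.

```perl
use strict;
use warnings;

# Hypothetical sketch: a row is a tilda row when every '+-~' column holds '~'.
# $row is an array ref of cells; $types maps column number => type.
sub row_is_tilda {
    my ($row, $types) = @_;
    my @flag_cols = grep { $types->{$_} eq '+-~' } keys %$types;
    # grep counts the flag columns that do NOT hold '~'; a count of zero
    # means a tilda row. With no '+-~' columns at all this returns true.
    return !grep { !defined $row->[$_] || $row->[$_] ne '~' } @flag_cols;
}

my %types = (0 => '+-~', 1 => '+-~', 2 => 'text', 3 => 'text');
print row_is_tilda([ '~', '~', '<E>', 'aller' ], \%types) ? "tilda\n" : "not tilda\n";
# prints tilda
```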

get_verb_for_row($row)

Return the verb for a given row. For the table in the example above:

  my $verb = $table->get_verb_for_row(3)

  returns `aller'
get_rows_for_verb($verb)

Return the rows in which the verb occurs (there may be more than one). Example:

  my @rows = $table->get_rows_for_verb('devoir');

  @rows is (27, 28, 29)
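Since one verb can fill several rows, the lookup returns a list rather than a single index. A minimal sketch over a hypothetical array holding the verb-column value of each row (rows_for_verb is not part of the module):

```perl
use strict;
use warnings;

# Hypothetical sketch: return every row index whose verb cell equals $verb.
# $verbs is an array ref holding the verb-column value of each row.
sub rows_for_verb {
    my ($verbs, $verb) = @_;
    return grep { defined $verbs->[$_] && $verbs->[$_] eq $verb } 0 .. $#$verbs;
}

my @verbs = ('achever', 'aller', 'devoir', 'aller');
print join(', ', rows_for_verb(\@verbs, 'aller')), "\n";    # prints 1, 3
```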
is_column_set($row, $col)

A column may be of type text or +-~.

A text column of a row is set if its value differs from the empty string mark, which is <E> by default.
A +-~ column of a row is set if its value is +.

The empty_string_mark can be changed through the accessors described under Parameters.
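The two rules above can be sketched as a function on a single cell value and its column type. cell_is_set is hypothetical; the optional third argument stands in for a customized empty_string_mark.

```perl
use strict;
use warnings;

# Hypothetical sketch of the documented rule (not the module's code):
# a '+-~' cell is set only when it is '+'; a text cell is set when it
# differs from the empty string mark ('<E>' unless another mark is given).
sub cell_is_set {
    my ($value, $type, $empty_mark) = @_;
    $empty_mark = '<E>' unless defined $empty_mark;
    return 0 unless defined $value;
    return $type eq '+-~' ? ($value eq '+' ? 1 : 0)
                          : ($value ne $empty_mark ? 1 : 0);
}

print cell_is_set('+',   '+-~'), "\n";     # prints 1
print cell_is_set('~',   '+-~'), "\n";     # prints 0
print cell_is_set('<E>', 'text'), "\n";    # prints 0
print cell_is_set('de',  'text'), "\n";    # prints 1
```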

has_verb($verb)

Returns true if the verb is contained in the table.

has_verb_matching($regexp)

Returns true if a verb of the table matches $regexp.
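The difference between the two methods is exact comparison versus pattern matching. A minimal sketch over a plain list of verbs, with hypothetical function names:

```perl
use strict;
use warnings;

# Hypothetical sketches over a plain list of verbs (not the module's code).
# Both return the number of matching verbs, which is true when non-zero.
sub contains_verb {
    my ($verbs, $verb) = @_;
    return scalar grep { $_ eq $verb } @$verbs;
}

sub contains_verb_matching {
    my ($verbs, $regexp) = @_;
    return scalar grep { $_ =~ $regexp } @$verbs;
}

my @verbs = ('achever', 'aller', 'devoir');
print contains_verb(\@verbs, 'aller') ? "yes\n" : "no\n";            # prints yes
print contains_verb_matching(\@verbs, qr/^a/) ? "yes\n" : "no\n";    # prints yes
print contains_verb_matching(\@verbs, qr/^z/) ? "yes\n" : "no\n";    # prints no
```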

create_db_table( { col_names => 'col_numbers' } )

Provides a DB interface using DBI: returns a db handle to an in-memory table created with DBD::AnyData. The table is named table_<table name>. The column names are either

col_<column number>,

when the argument is { col_names => 'col_numbers' } (the default), or

the column headers,

when any other value is given. When a header is empty, col_<column number> is used.

Example:

   # get a db handle with columns named col_<column number>
   # default: { col_names => 'col_numbers' }
   my $dbh = $table->create_db_table();

   # get a db handle using the column headers as column names
   $dbh = $table->create_db_table( { col_names => 'headers' } );
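The naming rule for the two modes can be sketched as a function of the mode and the header row. db_column_names is hypothetical and only reproduces the rule described above, not how the module builds its DBD::AnyData table.

```perl
use strict;
use warnings;

# Hypothetical sketch of the documented naming rule: 'col_numbers'
# (the default) yields col_<n>; any other mode uses the header,
# falling back to col_<n> when the header is empty.
sub db_column_names {
    my ($mode, @header_row) = @_;
    $mode = 'col_numbers' unless defined $mode;
    return map {
        my $header = $header_row[$_];
        ($mode eq 'col_numbers' || !defined $header || $header eq '')
            ? "col_$_"
            : $header;
    } 0 .. $#header_row;
}

my @headers = ('N0 =: Nhum', '', '1');
print join(' | ', db_column_names(undef, @headers)), "\n";      # prints col_0 | col_1 | col_2
print join(' | ', db_column_names('headers', @headers)), "\n";  # prints N0 =: Nhum | col_1 | 1
```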

Once you have a db handle, you can start querying the table using SQL::Statement (see SQL::Statement and DBD::AnyData for the supported SQL statements).

Example:

    my $verbCol = $table->get_verb_column();
    my $name    = $table->get_name();
    my $query   = "SELECT col_$verbCol FROM table_$name WHERE col_8 IS NULL AND col_19 = '+'";
    my $sth     = $dbh->prepare($query);
    $sth->execute();

Note: the empty string marks (`<E>' by default) are replaced by empty strings, which are equivalent to NULL in queries.
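The replacement step can be sketched as a pass over the rows before they reach the DB layer. blank_empty_marks is a hypothetical helper, assuming rows are array refs of cell values and the mark defaults to <E>.

```perl
use strict;
use warnings;

# Hypothetical sketch: replace every cell equal to the empty string mark
# with an empty string, returning a new array ref of new row array refs.
sub blank_empty_marks {
    my ($rows, $mark) = @_;
    $mark = '<E>' unless defined $mark;
    return [ map { [ map { defined $_ && $_ eq $mark ? '' : $_ } @$_ ] } @$rows ];
}

my $rows = blank_empty_marks([ [ '+', '<E>', 'achever' ], [ '-', 'ne', 'aller' ] ]);
print join('|', @{ $rows->[0] }), "\n";    # prints +||achever
```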

Parameters

The class is parametrized by a Lingua::FR::Ladl::Parametrizer object, which can be accessed with the get_parameters and set_parameters methods. A parametrizer object provides accessors for its customization items. Currently the most important item is empty_string_mark, which defaults to `<E>'. You can change it like this:

  my $par_object = $table->get_parameters();
  $par_object->set_empty_string_mark('EMPTY');
  $table->set_parameters($par_object);

DIAGNOSTICS

Format must be one of xls, xml not $format

Thrown when trying to load the table from a format that is not currently supported.

The only supported formats are:

xls

excel table

xml

gnumeric xml format

Could not create file parser context for file "unknown"

Thrown by XML::LibXML: Ladl::Table tried to load table data by parsing an XML file, but the XML parser threw an exception. Perhaps the file is not accessible?

Couldn't load table data: error parsing file

Ladl::Table tried to load table data by parsing an Excel file, but Spreadsheet::ParseExcel returned invalid data. Perhaps the file is not accessible?

Need table data for table_name, maybe you should call the load method first?

Most methods only work, and only make sense, once table data has been loaded.

col/row must be less or equal max_row/max_col

The method was called with an invalid row or column.

CONFIGURATION AND ENVIRONMENT

Lingua::FR::Ladl::Table requires no configuration files or environment variables.

DEPENDENCIES

Class::Std
Readonly
List::Util
List::MoreUtils
XML::LibXML

if you want to load table data from a gnumeric XML file.

Spreadsheet::ParseExcel

if you want to load table data from an excel file.

DBI and DBD::AnyData

if you want to use a DB interface.

INCOMPATIBILITIES

None reported.

BUGS AND LIMITATIONS

No bugs have been reported.

Please report any bugs or feature requests to bug-lingua-fr-ladl-table@rt.cpan.org, or through the web interface at http://rt.cpan.org.

SEE ALSO

http://ladl.univ-mlv.fr/, where the Ladl tables have been developed and where they can be obtained.

Some publications on this project:

Maurice Gross' grammar lexicon and Natural Language Processing

by Claire Gardent, Bruno Guillaume, Guy Perrier, Ingrid Falk

http://hal.archives-ouvertes.fr/action/open_file.php?url=http://hal.archives-ouvertes.fr/docs/00/10/31/56/PDF/poznan05.pdf&docid=103156

Extracting subcategorisation information from Maurice Gross' grammar lexicon,

by Claire Gardent, Bruno Guillaume, Guy Perrier, Ingrid Falk in Archives of Control Sciences (2005) 289--300

A talk at the French Perl Workshop 2006 (in French ;-)

http://conferences.mongueurs.net/fpw2006/slides/lexique-syntaxique.pdf

AUTHOR

Ingrid Falk <ingrid dot falk at loria dot fr>

LICENCE AND COPYRIGHT

Copyright (c) 2007, Ingrid Falk <ingrid dot falk at loria dot fr>. All rights reserved.

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.

DISCLAIMER OF WARRANTY

BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE SOFTWARE IS WITH YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR, OR CORRECTION.

IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENCE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.