
=pod

=encoding utf-8

=head1 NAME

Treex::Manual::FAQ - Frequently asked questions about Treex

=head1 VERSION

version 0.08324

=head1 FREQUENTLY ASKED QUESTIONS

=head2 General

=head3 Is Treex (API, data formats,...) stable?

=head3 Is Treex platform independent?

=head3 Why should I use Treex and not other NLP frameworks such as GATE?

=head3 My application in Treex is slow. What shall I do?

=head3 I noticed that Treex requires on-line access to some data repository. Can Treex work off-line?

=head3 Why should I use Treex even if it does not support the language I am interested in?

=head3 I found a bug in Treex. What should I do?

Write a minimal test that reproduces the bug, state the Treex version you are using, ...

=head3 I'd like to create a commercial application using some Treex modules. Should I expect any licensing trouble?

=head2 Installation

=head2 Using Treex

=head3 How can I get a parallel corpus into Treex?

Let's say you have an English-French corpus that is sentence-aligned
and stored in plain text format, one sentence per line.

   Read::AlignedSentences en=sample_en.txt fr=sample_fr.txt
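
Since line I<n> of one file is expected to correspond to line I<n> of the other,
it may be worth checking beforehand that both files have the same number of lines.
The following is a minimal sketch in plain Perl; it is not part of Treex,
and the names C<count_lines> and C<check_aligned> are hypothetical:

   use strict;
   use warnings;

   # Hypothetical helper: count the lines of a text file.
   sub count_lines {
       my ($path) = @_;
       open my $fh, '<', $path or die "Cannot open $path: $!";
       my $count = 0;
       $count++ while <$fh>;
       close $fh;
       return $count;
   }

   # Returns the common line count, or dies if the two files differ in length.
   sub check_aligned {
       my ($path_a, $path_b) = @_;
       my ($n_a, $n_b) = (count_lines($path_a), count_lines($path_b));
       die "Line counts differ: $path_a=$n_a $path_b=$n_b\n" if $n_a != $n_b;
       return $n_a;
   }

   # Usage: perl check_aligned.pl sample_en.txt sample_fr.txt
   if (@ARGV == 2) {
       my $n = check_aligned(@ARGV);
       print "OK: $n sentence pairs\n";
   }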

=head2 Contributing to Treex

=head3 Can I add new attributes to nodes?

For the I<official> attributes (defined in the Treex PML schema),
there are accessor methods, so e.g. for I<lemma>:

  my $old_lemma = $anode->lemma;
  $anode->set_lemma('new');

If you want to use your own new attributes, you can use so-called 
L<wild attributes|Treex::Core::WildAttr>:

  $node->wild->{name_of_my_new_attribute} = $value;
  $value = $node->wild->{name_of_my_new_attribute};

=head4 side note

You can also use methods C<get_attr> and C<set_attr>
to access whatever attribute you want:  

  $node->set_attr('my_new_attribute', 'my_value');
  print $node->get_attr('my_new_attribute'); # prints "my_value"

However, when the document is saved (to *.treex), only the attributes
described in the Treex PML schema
(stored in C<treex/lib/Treex/Core/share/tred_extension/treex/resources>)
are saved, together with all wild attributes.

This means that you should not expect that a "non-schema" attribute
(except for the wild attributes)
saved in one block will be accessible in another block.
(It would work with both blocks in one scenario, but it would not work
if the attribute were saved in one scenario and loaded in another, which is hard to debug.)

Note that for temporary information you can also use a separate hash
variable as an alternative to new attributes:

  my %my_new_attribute_of;
  $my_new_attribute_of{$node} = 'my_value';
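
Perl hash keys are always strings, so C<$my_new_attribute_of{$node}> uses
the stringified reference as the key. The node cannot be recovered from such a key,
and the entry stays in the hash after the node is destroyed.
A minimal sketch that makes the key explicit with C<refaddr> from the core module
L<Scalar::Util> follows; C<My::DummyNode> is a hypothetical stand-in for a real Treex node:

  use strict;
  use warnings;
  use Scalar::Util qw(refaddr);

  # Stand-ins for Treex nodes (plain blessed hashes, for illustration only).
  my $node_a = bless { lemma => 'dog' }, 'My::DummyNode';
  my $node_b = bless { lemma => 'cat' }, 'My::DummyNode';

  # Key the side table by the address of each object.
  my %my_new_attribute_of;
  $my_new_attribute_of{ refaddr($node_a) } = 'my_value';

  for my $node ($node_a, $node_b) {
      my $value = $my_new_attribute_of{ refaddr($node) } // '(none)';
      print "$node->{lemma}: $value\n";
  }
  # prints "dog: my_value" and then "cat: (none)"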

=head3 How can a block/tool automatically download some resources?

It is quite usual that a Treex block or a tool (C<Treex::Tool::...>)
needs a pre-trained model, database, dictionary etc.
These files can be huge (several GiB), so we cannot store them in the svn repository or upload them to CPAN.
We store them in the Treex shared I<data> directory -- either
in C<resources> (for treebanks, corpora and other officially published resources)
or in C<models> (pre-trained statistical models) subdirectory.

These files will be automatically downloaded from the UFAL server
when they are first needed. To achieve this behavior, override the
L<get_required_share_files|Treex::Core::Block/get_required_share_files> method in your block,
so it returns a list of filenames, e.g.:

  my $input_file = 'data/models/tagger/my_tagger/en_penntb.model';

  sub get_required_share_files {
      return $input_file;
  }

  sub BUILD {
      my $path = "$ENV{TMT_ROOT}/share/$input_file";
      open my $I, '<:encoding(utf-8)', $path or die "Cannot open $path: $!";
      # now load the model ...
  }
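
The example above assumes that the C<TMT_ROOT> environment variable is set.
A small helper can make that assumption explicit. This is only a sketch:
C<share_path> is a hypothetical name, not part of the Treex API, and the
C<share> subdirectory is taken from the path used above:

  use strict;
  use warnings;
  use File::Spec;

  # Hypothetical helper: resolve a file name relative to $TMT_ROOT/share.
  sub share_path {
      my ($relative) = @_;
      my $root = $ENV{TMT_ROOT}
          or die "TMT_ROOT is not set\n";
      return File::Spec->catfile($root, 'share', $relative);
  }

Calling C<share_path($input_file)> then yields the same path as the string
interpolation in C<BUILD>, but fails with a clear message when C<TMT_ROOT> is missing.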

If you want to use shared files in a tool:

  package Treex::Tools;
  use Treex::Core::Resource;
  my $input_file = 'data/models/tagger/my_tagger/en_penntb.model';

  sub BUILD {
    my ($self) = @_;
    # Download the file from the share server unless it is already available.
    Treex::Core::Resource::require_file_from_share($input_file, ref($self));
    # now load the model ...
  }

The second parameter of C<require_file_from_share> is the name of the tool that needs the file.
It is purely informational and is printed when the file is downloaded.

=head2 Using C<ttred>

=head3 What does the name C<ttred> mean?

B<TrEd> is a I<tree editor> developed by Petr Pajas - see L<http://ufal.mff.cuni.cz/~pajas/tred>.
C<ttred> is a Treex-modified TrEd capable of showing C<*.treex> files.
Actually, it is just a lightweight wrapper which executes C<tred>
with a path to the pre-installed Treex extension.

=head3 How to change the order of zones (trees) as they are shown in C<ttred>?

Press I<c> and drag and drop the zones in a matrix.
(I<c> is a shortcut for a macro, which you can find also in the menu:
Macros - all modes - treex_mode - Configuration.)
You can choose both horizontal and vertical position,
which is handy e.g. for word-aligned corpora,
where you usually want to have the aligned trees above each other,
so the alignment links are not too long.

=head1 AUTHOR

Zdeněk Žabokrtský <zabokrtsky@ufal.mff.cuni.cz>

Martin Popel <popel@ufal.mff.cuni.cz>

David Mareček <marecek@ufal.mff.cuni.cz>

=head1 COPYRIGHT AND LICENSE

Copyright © 2011 by Institute of Formal and Applied Linguistics, Charles University in Prague

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

=cut