RTF::Tokenizer - Tokenize RTF
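Tokenizes RTF: reads a string of RTF and hands it back one token at a time (control words, text and groups) via get_token.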
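    use strict;
    use warnings;
    use RTF::Tokenizer;

    # Optional entity handler: turns the entity's hex value into a
    # decimal HTML numeric character reference, e.g. 'e9' -> '&#233;'
    sub entity_handler { return '&#' . hex($_[0]) . ';'; }

    # A small RTF document to tokenize
    my $rtf_string = '{\rtf1\ansi{\fonttbl{\f0 Times;}}Hello}';

    my $rtf = RTF::Tokenizer->new($rtf_string);
    # my $rtf = RTF::Tokenizer->new($rtf_string, \&entity_handler);

    # Print every token until the end of the input
    while (1) {
        my ($type, $value, $extra) = $rtf->get_token;
        last if $type eq 'eof';
        print "$type, $value, ", (defined $extra ? $extra : ''), "\n";
    }

    # Bookmarks: look ahead in a fresh tokenizer, then rewind
    $rtf = RTF::Tokenizer->new($rtf_string);
    $rtf->bookmark('save', '_font_table_original');
    $rtf->jump_to_control_word('fonttbl');
    my ($type, $value, $param) = $rtf->get_token;   # 'control', 'fonttbl'
    $rtf->bookmark('retr', '_font_table_original');
    $rtf->jump_to_control_word('rtf');
    ($type, $value, $param) = $rtf->get_token;      # 'control', 'rtf', 1
    $rtf->bookmark('retr', '_font_table_original');
    $rtf->bookmark('delete', '_font_table_original');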
new( $rtf [, \&entity_handler ] )
Creates an instance. The first argument is a string of RTF. The optional second argument is a reference to a subroutine that is called whenever an entity is found. By default an entity is converted into the character it represents, but the handler lets you produce something else instead, such as HTML entities (as in the example above). The handler receives the entity's value as a hex string.
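The two entity handlers below assume, as stated above, that the handler receives the entity's value as a hex string such as 'e9'. The names html_entity and literal_char are hypothetical. literal_char only approximates the default behaviour: it treats the hex value as a Unicode code point, which is correct for Latin-1 but not for other code pages.

```perl
use strict;
use warnings;

# Hypothetical handler: turn the hex value into a decimal HTML
# numeric character reference, e.g. 'e9' -> '&#233;'.
sub html_entity {
    my ($hex) = @_;
    return '&#' . hex($hex) . ';';
}

# Hypothetical handler approximating the default behaviour: return the
# character itself. Assumes the hex value maps directly to a code point
# (true for Latin-1, not for other code pages).
sub literal_char {
    my ($hex) = @_;
    return chr(hex($hex));
}

print html_entity('e9'), "\n";    # prints &#233;
```

To use one, pass a reference to it as the second argument, e.g. RTF::Tokenizer->new($some_rtf, \&html_entity).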
get_token
Returns a three-element list: the token type (one of control, text, group or eof), the token data, and, for a control word, its integer parameter if it has one.
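Because get_token always returns a three-element list, you can process tokens with a loop that dispatches on the type. The sketch below runs without the module. Hypothetical::TokenStream is an invented stand-in that replays fixed triples and then reports eof. The data it returns for eof and the 1/0 values on the group tokens are placeholders, not documented module values. summarize, also hypothetical, only calls get_token, so it should work the same way on a real RTF::Tokenizer object.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an RTF::Tokenizer object: it replays a fixed
# list of [type, data, param] triples and then reports eof forever.
package Hypothetical::TokenStream;

sub new {
    my ($class, @tokens) = @_;
    return bless { tokens => [@tokens] }, $class;
}

sub get_token {
    my ($self) = @_;
    my $token = shift(@{ $self->{tokens} }) || ['eof', '', undef];
    return @$token;
}

package main;

# Drain a tokenizer, collecting the text and the control words
# (with their integer parameters appended, where present).
sub summarize {
    my ($tokenizer) = @_;
    my ($text, @controls) = ('');
    while (1) {
        my ($type, $value, $param) = $tokenizer->get_token;
        last if $type eq 'eof';
        if ($type eq 'text') {
            $text .= $value;
        }
        elsif ($type eq 'control') {
            push @controls, defined $param ? "$value$param" : $value;
        }
        # 'group' tokens are ignored here
    }
    return ($text, @controls);
}

my $stream = Hypothetical::TokenStream->new(
    ['group', 1], ['control', 'rtf', 1], ['control', 'ansi'],
    ['text', 'Hello'], ['group', 0],
);
my ($text, @controls) = summarize($stream);
print "$text | @controls\n";    # prints: Hello | rtf1 ansi
```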
bookmark( $action, $name )
Manages named copies of the buffer. Possible actions are 'save', 'retr' and 'delete'. 'save' stores a copy of the current buffer in a hash inside the object, under the key $name; 'retr' restores the buffer from that copy; 'delete' removes it. Each bookmark holds a full copy of the data rather than a position in the buffer, so if you are processing a large amount of text, delete your bookmarks once you are done with them. Font.pm contains a good example.
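The following minimal model shows what the save/retr/delete behaviour above costs. Hypothetical::Buffer and its consume method are illustrative inventions, not part of RTF::Tokenizer. The point it shows is that every saved bookmark is a full copy of the remaining string. Memory use therefore grows with the buffer size times the number of live bookmarks.

```perl
use strict;
use warnings;

# Hypothetical model of the bookmark semantics: each 'save' stores a
# full copy of the remaining buffer, keyed by name.
package Hypothetical::Buffer;

sub new {
    my ($class, $buffer) = @_;
    return bless { buffer => $buffer, bookmarks => {} }, $class;
}

# Remove and return $n characters from the front of the buffer
# (stands in for get_token / jump_to_control_word eating input).
sub consume {
    my ($self, $n) = @_;
    return substr($self->{buffer}, 0, $n, '');
}

sub bookmark {
    my ($self, $action, $name) = @_;
    if    ($action eq 'save')   { $self->{bookmarks}{$name} = $self->{buffer} }
    elsif ($action eq 'retr')   { $self->{buffer} = $self->{bookmarks}{$name} }
    elsif ($action eq 'delete') { delete $self->{bookmarks}{$name} }
    else                        { die "unknown bookmark action '$action'" }
    return;
}

package main;

my $buf = Hypothetical::Buffer->new('{\rtf1 Hello}');
$buf->bookmark('save', 'start');
$buf->consume(6);                  # buffer is now ' Hello}'
$buf->bookmark('retr', 'start');   # buffer is '{\rtf1 Hello}' again
$buf->bookmark('delete', 'start'); # frees the stored copy
```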
jump_to_control_word( @control_words )
Discards input from the buffer until it reaches one of the given control words, so that the next call to get_token returns that control word. Everything before it is lost unless you saved it with a bookmark first.
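Here is a simplified model of the skipping behaviour. It assumes the input has already been split into [type, data, param] triples; the real method works on the raw buffer. skip_to_control_word is a hypothetical helper. It drops tokens until the next one is a wanted control word and returns what it dropped. That returned data is exactly what you lose unless you bookmarked it.

```perl
use strict;
use warnings;

# Hypothetical helper modelling jump_to_control_word over a token list.
# Discards tokens until the next one is a control word in @words,
# leaving that token at the front; returns the discarded tokens.
sub skip_to_control_word {
    my ($tokens, @words) = @_;
    my %wanted = map { $_ => 1 } @words;
    my @discarded;
    while (@$tokens) {
        my ($type, $value) = @{ $tokens->[0] };
        last if $type eq 'control' && $wanted{$value};
        push @discarded, shift @$tokens;
    }
    return @discarded;
}

my @tokens = (
    ['control', 'rtf', 1], ['text', 'intro'],
    ['control', 'fonttbl'], ['text', 'Times;'],
);
my @lost = skip_to_control_word(\@tokens, 'fonttbl', 'colortbl');
print scalar(@lost), " tokens skipped; next is $tokens[0][1]\n";
# prints: 2 tokens skipped; next is fonttbl
```

If none of the control words occurs, this model consumes everything. The text above does not say whether the real method behaves the same way.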
Peter Sergeant <pete@clueball.com>
Copyright 2002 Peter Sergeant.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
To install RTF::Tokenizer, copy and paste the appropriate command into your terminal.
cpanm
    cpanm RTF::Tokenizer
CPAN shell
    perl -MCPAN -e shell
    install RTF::Tokenizer