NAME

Lex - Lexical analyser generator (Alpha 1.07).

SYNOPSIS

        use Lex;

        @tokens = (
                   'ADDOP' => '[-+]', 
                   'LEFTP' => '[(]',
                   'RIGHTP' => '[)]',
                   'INTEGER' => '0|[1-9][0-9]*',
                   'NEWLINE' => '\n',
                   'STRING' => '["]', sub {
                     my $self = shift;
                     my $string = $';
                     my $buffer = $string;
                     while($string !~ /"/) {
                           $string = $self->readline;
                           $buffer .= $string;
                     }
                     $buffer =~ s/^[^"]*"//;
                     $self->set($buffer);
                     qq!"$&!;           # token content
                   },
                   'ERROR' => '.+',
                  );

        $lexer = Lex->new(@tokens);
        $lexer->from(\*DATA);
        print "Tokenization of DATA:\n";

        TOKEN:while (1) {
          $token = $lexer->nextToken;
          if (not $lexer->eof) {
            print "Line $.\t";
            print "Type: ", $token->name, "\t";
            print "Content:->", $token->get, "<-\n";
          } else {
            last TOKEN;
          }
        }

        __END__
        1+2-5
        "multiline
        string"

DESCRIPTION

The package Lex allows the definition of lexical analysers. It handles reading and consuming the data. The method from() allows you to specify the input, for example a filehandle.

The lexical analyser recognises tokens defined by regular expressions passed to the method new(). These regular expressions are tried in the order in which they are given, so an earlier definition takes precedence over a later one that would also match.
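
The effect of this ordering can be sketched without the module itself. The following is a minimal, self-contained illustration and not Lex's actual implementation: the `@rules` table and the `tokenize` function are hypothetical names, and the separator pattern is borrowed from the documented default of skip().

```perl
use strict;
use warnings;

# Hypothetical stand-in for the token list given to new():
# (name, regex) pairs, tried in the order in which they appear.
my @rules = (
    [ INTEGER => qr/0|[1-9][0-9]*/ ],
    [ ADDOP   => qr/[-+]/ ],
    [ ERROR   => qr/.+/ ],            # catch-all, deliberately last
);

# Return a list of [name, text] pairs for one line of input.
sub tokenize {
    my ($input) = @_;
    my @found;
    pos($input) = 0;
    while (pos($input) < length $input) {
        next if $input =~ /\G[ \t]+/gc;      # skip separators
        my $matched = 0;
        for my $rule (@rules) {
            my ($name, $re) = @$rule;
            if ($input =~ /\G($re)/gc) {     # match at the current position
                push @found, [ $name, $1 ];
                $matched = 1;
                last;                        # first matching rule wins
            }
        }
        last unless $matched;                # nothing matched: stop scanning
    }
    return @found;
}

print join(' ', map { "$_->[0]($_->[1])" } tokenize('1+2 - 50')), "\n";
# prints: INTEGER(1) ADDOP(+) INTEGER(2) ADDOP(-) INTEGER(50)
```

If the ERROR rule were moved to the top of the table it would swallow every line, which is why a catch-all belongs at the end of the list.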

Methods

chomp()

Activate or deactivate the removal of the newline character from each input line.

debug()

Activate or deactivate a trace indicating which tokens have been consumed.

eof()

Return true if the end of file is encountered.

from()

Specify the data source. The argument is either a string or a reference to a filehandle. For example:

            $lexer->from(\*DATA);

or

            $lexer->from('the data to analyse');

less(EXPR)

The argument is an expression whose value is put at the start of the data stream.

new()

Create a new analyser. The argument is a list of triples, each consisting of the symbolic name of the token, the regular expression that recognises it and, optionally, an anonymous function executed when the token is recognised. new() creates an object of type Token for each triple.

reset()

Empty Lex's internal buffer.

buffer()
buffer(EXPR)

Return the contents of Lex's internal buffer. With an expression as argument, put the result of the expression in the buffer.

readline()

Read data from the specified input (see method from()). Return the result of the read.

singleline()

If active, read only a single line.

skip(RE)

Define the lexeme separator (default: [ \t]+).

token()

Return the object corresponding to the last token consumed. If there is none, return a special token whose symbolic name is "default token".

PACKAGE TOKEN

The package Token allows the definition of tokens used by the lexical analyser. Objects of this type are created by the method new() of the package Lex.

Methods

debug()

Activate or deactivate a trace showing which tokens have been found.

get()

Return the content of the object.

mean()

Return the anonymous function associated with the Token object.

name()

Return the symbolic name of the object.

next()

Read, consume and return the token defined by the regular expression in the object.

new()

Create an object of type Token. The arguments of new() are, in order: a symbolic name, a regular expression, and (optionally) an anonymous function. The anonymous function is executed when the token is consumed by the lexical analyser. The return value of this function defines the string stored in the object and accessible through the method get().
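
The role of the anonymous function's return value can be sketched as follows. This is a simplified illustration under stated assumptions, not the module's code: `consume` is a hypothetical helper, and the sketch passes the matched text to the function directly, whereas the function in the SYNOPSIS receives the lexer object and reads the match through `$&`. That the matched text itself is stored when no function is given is also an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lex: match $regex at the start of
# $text and return the token content, or undef if there is no match.
sub consume {
    my ($regex, $action, $text) = @_;
    return undef unless $text =~ /^($regex)/;
    my $matched = $1;
    # With an action, its return value becomes the content;
    # without one, the matched text is used as-is (assumption).
    return $action ? $action->($matched) : $matched;
}

print consume('[0-9]+', undef,              '42abc'), "\n";   # prints 42
print consume('[0-9]+', sub { $_[0] * 2 },  '42abc'), "\n";   # prints 84
```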

regexp()

Return the regular expression used for token recognition.

status()

Indicate whether the last token search succeeded.

ERROR HANDLING

To handle input that no token recognises, you can define a dedicated Token object, e.g.

            $ERROR = Token->new('ERROR', '.*');

If the search for this token succeeds, you can then call an error function.
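
The fallback logic can be sketched in plain Perl, independently of the module. Everything here is illustrative: `on_error` is a hypothetical error function, and the patterns are matched directly with regular expressions rather than through Token objects.

```perl
use strict;
use warnings;

my @problems;                          # collected error messages

# Hypothetical error function; a real one might warn or die instead.
sub on_error {
    my ($text) = @_;
    push @problems, "unrecognised input: '$text'";
}

my $integer = qr/0|[1-9][0-9]*/;       # an ordinary token
my $error   = qr/.*/;                  # the catch-all pattern

for my $input ('42', '@#!') {
    if ($input =~ /^$integer/) {
        next;                          # recognised, nothing to report
    }
    if ($input =~ /^($error)/) {       # fallback succeeded: report it
        on_error($1);
    }
}

print "$_\n" for @problems;
# prints: unrecognised input: '@#!'
```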

EXAMPLES

tokenizer.pl - Shows tokenisation using the package Lex.

AUTHOR

Philippe Verdret

SEE ALSO

LLg package.

BUGS

REFERENCES

Groc, B., & Bouhier, M. - Programmation par la syntaxe. Dunod 1990.

Mason, T., & Brown, D. - Lex & Yacc. O'Reilly & Associates, Inc. 1990.

COPYRIGHT

Copyright (c) 1995-1996 Philippe Verdret. All rights reserved. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.