
App::Mowyw::Lexer - Simple Lexer


    use strict;
    use warnings;
    use App::Mowyw::Lexer qw(lex);

    # suppose you want to parse simple math expressions
    my @input_tokens = (
        ['Int',         qr/(?:-|\+)?\d+/],
        ['Op',          qr/\+|\*|-|\//],
        ['Brace_Open',  qr/\(/],
        ['Brace_Close', qr/\)/],
        # returning undef from the callback drops whitespace from the output
        ['Whitespace',  qr/\s/, sub { return undef; }],
    );
    my $text = "-12 * (3+4)";
    foreach (lex($text, \@input_tokens)) {
        my ($name, $match, $position, $line) = @$_;
        print "Found Token $name: '$match'\n";
        print "    at position $position line $line\n";
    }


App::Mowyw::Lexer is a simple lexer that breaks text into tokens according to regexes you provide.

The only exported subroutine is lex, which expects the input text as its first argument and an array reference as its second argument. That array contains one array reference per token type, each holding a token name and a regex.

Each input token consists of a token name (which you can choose freely), a regex which matches the desired token, and optionally a reference to a function that takes the matched token text as its argument. The token text is replaced by the return value of that function. If the function returns undef, that token is not included in the list of output tokens.

lex returns a list of output tokens. Each output token is a reference to an array containing the token name, the matched text, the position of the match in the input string (zero-based, suitable for passing to substr), and the line number where the match starts (one-based, suitable for humans).
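The position and the line number describe the same location in two ways: the line number is one more than the number of newlines that precede the zero-based position. A minimal sketch of that relationship, using a hypothetical helper `line_of` that is not part of the module's API:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of App::Mowyw::Lexer): convert a zero-based
# character offset into a one-based line number by counting the newlines
# that occur before the offset.
sub line_of {
    my ($text, $pos) = @_;
    my $prefix   = substr($text, 0, $pos);
    my $newlines = () = $prefix =~ /\n/g;   # count matches in list context
    return $newlines + 1;
}

my $input = "a = 1\nb = 2\n";
my $pos   = index($input, 'b');             # zero-based offset of 'b'
print "offset $pos is on line ", line_of($input, $pos), "\n";
# prints "offset 6 is on line 2"
```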

If there is unmatched text, it is returned with the token name UNMATCHED.
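To make the contract above concrete, here is a small self-contained sketch of a lexer with the same inputs and outputs. It is a hypothetical re-implementation named `mini_lex`, not the module's source. Two behaviors are assumptions the documentation does not spell out: at each position the first definition in list order that matches wins, and consecutive unmatched characters are merged into one UNMATCHED token.

```perl
use strict;
use warnings;

# Hypothetical sketch of the documented contract; NOT App::Mowyw::Lexer's code.
# Assumption: the first definition (in list order) matching at a position wins.
sub mini_lex {
    my ($text, $defs) = @_;
    my @tokens;
    my ($pos, $line) = (0, 1);
    my ($junk, $junk_pos, $junk_line) = ('', undef, undef);

    # Emit pending unmatched characters as a single UNMATCHED token.
    my $flush = sub {
        return if $junk eq '';
        push @tokens, ['UNMATCHED', $junk, $junk_pos, $junk_line];
        $junk = '';
    };

    CHAR: while ($pos < length $text) {
        my $rest = substr($text, $pos);
        for my $def (@$defs) {
            my ($name, $re, $cb) = @$def;
            next unless $rest =~ /\A(?:$re)/;
            my $len = $+[0];                  # end offset of the whole match
            next if $len == 0;                # skip empty matches to avoid looping forever
            my $match = substr($rest, 0, $len);
            $flush->();
            # The optional callback replaces the text; undef drops the token.
            my $value = $cb ? $cb->($match) : $match;
            push @tokens, [$name, $value, $pos, $line] if defined $value;
            $line += () = $match =~ /\n/g;
            $pos  += $len;
            next CHAR;
        }
        # No definition matched here: collect one character as unmatched text.
        my $ch = substr($text, $pos, 1);
        ($junk_pos, $junk_line) = ($pos, $line) if $junk eq '';
        $junk .= $ch;
        $line += 1 if $ch eq "\n";
        $pos  += 1;
    }
    $flush->();
    return @tokens;
}

# Usage: '?' matches no definition, so it comes back as UNMATCHED.
my @defs = (
    ['Int',        qr/\d+/],
    ['Op',         qr/[-+*\/]/],
    ['Whitespace', qr/\s+/, sub { return undef }],
);
for my $tok (mini_lex("1 + 2 ? 3", \@defs)) {
    my ($name, $match, $p, $l) = @$tok;
    print "$name '$match' at $p, line $l\n";
}
```

Trying definitions in list order keeps the sketch short, but it means more specific patterns must be listed before more general ones; a longest-match rule is the usual alternative.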


Copyright (C) 2007, 2009 by Moritz Lenz.

This program and its documentation are free software. You may distribute them under the terms of the Artistic License 2.0 as published by The Perl Foundation.

However, all code examples are in the public domain, so you can use them in any way you want.
