NAME

RPSL::Parser - Router Policy Specification Language (RFC2622) Parser

SYNOPSIS

        use RPSL::Parser;
        # Create a parser
        my $parser = RPSL::Parser->new;
        # Use it
        my $data_structure = $parser->parse($data);

DESCRIPTION

This is a rather simplistic lexer and tokenizer for the RPSL language.

It currently does not validate the object in any way; it just tries (rather hard) to grab as much information as it can from the text presented and place it in a parse tree, which can be passed to other objects from the RPSL namespace for validation and other RFC 2622 related functionality.

PUBLIC METHODS

new()

Constructor. Handles the accessor creation and returns a new RPSL::Parser object.

parse( [ $rpsl_source | IO::Handle | GLOB ] )

Parses one RPSL object per call, using the parser's internal fields to store the data gathered. This is the method you need to call to transform your RPSL text into a Perl data structure.

It accepts a list or a scalar containing the strings representing the RPSL source code you want to parse, and can read it directly from any IO::Handle or GLOB representing an open file handle.

ACCESSOR METHODS

comment

Stores an array reference containing all the inline comments found in the RPSL text.

object

Stores a hash reference containing all the RPSL attributes found in the RPSL text.

omit_key

Stores an array reference containing the positions of all the keys that must be omitted from the original RPSL text.

order

Stores an array reference containing an ordered list of RPSL attribute names, to enable the RPSL to be rebuilt from the parsed data version.

key

Stores the value found in the first RPSL attribute parsed. This is sometimes referred to as the RPSL object key.

text

Stores a scalar containing the RPSL text to be parsed.

tokens

Stores an array reference containing an ordered list of tokens and token values produced by the tokenize method.

type

Stores a string representing the name of the first RPSL attribute found in the RPSL text parsed. RFC 2622 requires that the first attribute declare the "data type" of the RPSL object.

PRIVATE INTERFACE

_read_text( @input )

Checks whether the first element of @input is an IO::Handle or a GLOB and, if so, reads from it. If the first element is not a file handle of any kind, it assumes @input is an array of scalars containing the text of the RPSL object to be parsed, join()s them together, and feeds the result to the parser.
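The dispatch described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the module's actual code: the name read_text_sketch is hypothetical, and the handle test (a GLOB, a GLOB reference, or an object derived from IO::Handle) is one plausible reading of the description.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical stand-in for _read_text(); an illustration of the
# behaviour described above, not the module's actual code.
sub read_text_sketch {
    my @input = @_;
    my $first = $input[0];

    # Assumption: "file handle" means a GLOB, a GLOB reference,
    # or an object derived from IO::Handle.
    my $is_handle = ref($first) eq 'GLOB'
        || ref( \$first ) eq 'GLOB'
        || ( blessed($first) && $first->isa('IO::Handle') );

    if ($is_handle) {
        local $/;    # slurp the whole handle at once
        my $text = <$first>;
        return defined $text ? $text : '';
    }

    # Not a handle: treat every element as a piece of RPSL text.
    return join '', @input;
}

# Reading from an in-memory handle and from a list of strings
# should produce the same text.
open my $fh, '<', \"person: I. M. A. Fool\nsource: RIPE\n"
    or die "cannot open in-memory handle: $!";
my $from_handle = read_text_sketch($fh);
my $from_list   = read_text_sketch( "person: I. M. A. Fool\n", "source: RIPE\n" );
print $from_handle eq $from_list ? "same text\n" : "different text\n";
```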

_tokenize()

This method breaks down the RPSL source code read by _read_text() into tokens and stores them internally. For convenience, it returns a reference to the object itself, so you can chain method calls.

_cleanup_attribute( $value )

Returns a cleaned-up version of the attribute passed in: no trailing or leading whitespace or newlines.

_store_attribute( $attribute_name, $attribute_value )

Auxiliary method. It cleans up the value, stores the attribute in the data structure being built, and does the necessary storage upkeep.

_store_comment( $comment_position_index, $attribute_and_comment_text )

This method extracts inline comments from the inline part of an object and stores those comments in the parse tree being built. It returns the attribute passed in with the comments stripped, so it can be stored in the appropriate place afterwards.
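As a rough illustration of the idea, a value and its inline comment can be separated at the first "#" sign. The helper below is hypothetical and is not the module's implementation; it assumes a single-line value and does not record the position index the real method receives.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of RPSL::Parser: split an attribute
# value from its inline comment at the first "#".
sub split_inline_comment {
    my ($text) = @_;
    my ( $value, $comment ) = split /\s*#\s*/, $text, 2;
    return ( $value, $comment );    # $comment is undef when there is no "#"
}

my ( $value, $comment )
    = split_inline_comment(q{XXX007-RIPE # Look, ma, I'm 007! ;)});
print "value:   $value\n";
print "comment: $comment\n";
```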

_build_parse_tree()

This method consumes the tokens produced by _tokenize() and builds a data structure containing all the information needed to re-build the RPSL object back.

It returns a reference to the parser object itself, making it easy to chain method calls again.

_parse_tree()

This method assembles all the information gathered during the RPSL source code tokenization and parsing into a hash reference containing the following keys:

data

Holds a hash reference whose keys are the RPSL attributes found, and whose values are the strings passed in as values to the respective attributes in the RPSL text. Multi-valued attributes are represented by array references. As this parser doesn't enforce all the RPSL business rules, you must take care when fiddling with this structure, as any value could be an array reference.

order

Holds an array reference containing the key names from the data hash, in the order they were found in the RPSL text. This is stored here because RFC 2622 states that the order of the attributes in a RPSL object is important.

type

Holds a string containing the name of the first RPSL attribute found in the RPSL text. RFC 2622 requires that the first attribute be the type of the object declared. Knowing the type of an object allows other RPSL namespace modules to handle the different RPSL object types properly.

key

Holds the value contained by the first attribute of an RPSL object. This is sometimes the "primary key" of a RPSL object, but not always.

comment

Comment is a hash structure where the keys are index positions in the order array, and values are the inline comments extracted during the parsing stage. Preserving inline comments is not a requirement from RFC 2622, just a nice thing to have.

omit_key

RFC 2622 allows some attributes to contain multiple values. For every new value, a new line must be inserted into the RPSL object. For brevity, and to allow humans to read and write RPSL, RFC 2622 allows the attribute name to be omitted and replaced by whitespace. It also dictates that lines beginning with a "+" sign must be considered part of a multi-line RPSL attribute.

This array reference stores integers representing index positions in the order array, marking the attributes whose names must be omitted when generating RPSL text back from this parse tree. Since RFC 2622 does not require that attributes whose names were omitted (by starting a line with whitespace or "+") preserve this characteristic, this is only a nice-to-have feature.
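To show how these keys fit together, here is a minimal sketch that regenerates RPSL text from such a hash reference. It is not part of RPSL::Parser: the function name rebuild_rpsl_sketch, the 14-column layout, and the choice to replace omitted names with whitespace (rather than "+") are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of RPSL::Parser: walk the "order" array
# and emit one line per position, using "omit_key" and "comment".
sub rebuild_rpsl_sketch {
    my ($tree) = @_;
    my %omit     = map { $_ => 1 } @{ $tree->{omit_key} || [] };
    my $comments = $tree->{comment} || {};
    my %seen;    # how many values of each attribute were already emitted
    my @lines;

    for my $i ( 0 .. $#{ $tree->{order} } ) {
        my $name   = $tree->{order}[$i];
        my $stored = $tree->{data}{$name};

        # Multi-valued attributes are array references: take the next value.
        my $value = ref($stored) eq 'ARRAY' ? $stored->[ $seen{$name}++ ] : $stored;

        # Assumption: names are padded to 14 columns; omitted names
        # become plain whitespace.
        my $label = $omit{$i} ? '' : "$name:";
        my $line  = sprintf '%-14s%s', $label, $value;
        $line .= " # $comments->{$i}" if exists $comments->{$i};
        push @lines, $line;
    }
    return join( "\n", @lines ) . "\n";
}

my $tree = {
    order    => [ 'person', 'address', 'address', 'nic-hdl' ],
    data     => {
        'person'  => 'I. M. A. Fool',
        'address' => [ '10DD10 Nevercity', 'Neverland' ],
        'nic-hdl' => 'XXX007-RIPE',
    },
    omit_key => [2],
    comment  => { 3 => 'Look, ma' },
};
print rebuild_rpsl_sketch($tree);
```

The per-attribute counter is needed because the order array repeats the name of a multi-valued attribute once per value, while the data hash stores those values together in one array reference.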

EXAMPLES

Example #1: retrieving information from the Whois Database

Suppose you retrieve a RPSL Person object from the RIPE NCC WHOIS Database, like the one below. It's a simple query and I will not explain how you could do it here, because it's a bit out of scope for this module. Anyway, you have the following text as a result:

        person:       I. M. A. Fool
        address:      F.A.K.E Corporation
        address:      226 Nowhere st
        address:      10DD10 Nevercity
                      Neverland
        phone:        +99-99-999-9999
        fax-no:       +99-99-999-9999
        e-mail:       xxx@somewhere.com
        nic-hdl:      XXX007-RIPE # Look, ma, I'm 007! ;)
        mnt-by:       NICE-GUY-MNT
        changed:      xxx@somewhere.com 20001016
        source:       RIPE

Let's assume you need to send an email (for example, to report routing problems) to Mr Fool. It means that you need to retrieve the e-mail field value from this RPSL object.

Let's assume you have the previous text in the $text variable. In order to parse the contents of the text into a nice Perl data structure, all you need to do is instantiate a parser with:

        use RPSL::Parser;
        my $parser = RPSL::Parser->new;

And then pass it the contents of the $text variable, and collect the resulting data structure back:

        my $data_structure = $parser->parse( $text );

It will give you something that will look more or less like this (dumped by Data::Dumper):

        $data_structure = {
            'omit_key' => [ 4 ],
            'comment'  => { 8 => q{Look, ma, I'm 007! ;)} },
            'order'    => [
                'person', 'address', 'address', 'address', 'address', 'phone',
                'fax-no', 'e-mail',  'nic-hdl', 'mnt-by',  'changed', 'source'
            ],
            'type' => 'person',
            'data' => {
                'source'  => 'RIPE',
                'mnt-by'  => 'NICE-GUY-MNT',
                'phone'   => '+99-99-999-9999',
                'nic-hdl' => 'XXX007-RIPE',
                'fax-no'  => '+99-99-999-9999',
                'e-mail'  => 'xxx@somewhere.com',
                'changed' => 'xxx@somewhere.com 20001016',
                'person'  => 'I. M. A. Fool',
                'address' => [
                    'F.A.K.E Corporation',
                    '226 Nowhere st',
                    '10DD10 Nevercity',
                    'Neverland'
                ]
            },
            'key' => 'I. M. A. Fool'
        };
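With the structure in hand, the e-mail address can be read from the data hash. Because any value may be either a plain string or an array reference, the sketch below goes through a small helper that always returns a list. The helper name attribute_values is hypothetical (it is not provided by RPSL::Parser), and the hash is a trimmed copy of the dump above so that the snippet runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of RPSL::Parser: return the values of an
# attribute as a list, whether it was stored as a string or an array ref.
sub attribute_values {
    my ( $tree, $name ) = @_;
    my $stored = $tree->{data}{$name};
    return ()       unless defined $stored;
    return @$stored if ref($stored) eq 'ARRAY';
    return ($stored);
}

# Trimmed copy of the parse result shown above.
my $data_structure = {
    type => 'person',
    key  => 'I. M. A. Fool',
    data => {
        'person'  => 'I. M. A. Fool',
        'e-mail'  => 'xxx@somewhere.com',
        'address' => [ 'F.A.K.E Corporation', '226 Nowhere st' ],
    },
};

my ($email) = attribute_values( $data_structure, 'e-mail' );
print "Contact $data_structure->{key} at $email\n";
```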

In the near future, there will be other objects that know how to interpret this as the specific RPSL object declared, and how to write the corresponding RPSL representation of a data structure similar to this one.

SEE ALSO

RFC2622 http://www.ietf.org/rfc/rfc2622.txt, for the full RPSL specification.

Object::Accessor, for the accessor implementation used in this module.

AUTHOR

Luis Motta Campos, <lmc@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2008 by Luis Motta Campos

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.