NAME

Math::String::Charset::Wordlist - A dictionary charset for Math::String

SYNOPSIS

    use Math::String::Charset::Wordlist;

    my $x = Math::String::Charset::Wordlist->new ( {
        file => 'path/dictionary.lst' } );

REQUIRES

perl5.005, DynaLoader, Math::BigInt, Math::String::Charset

EXPORTS

Exports nothing.

DESCRIPTION

This module lets you create a charset object, which is used to construct Math::String objects.

This object maps an external wordlist (a dictionary file in which each line contains one word) to a simple charset, i.e. each word is one character in the charset.

The wordlist file must be sorted alphabetically (just like sort -u does), otherwise the results from converting between string and number form are unpredictable.
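
One way to check a wordlist before passing it to the constructor is sketched below. It assumes plain byte-wise string comparison (Perl's le operator), which matches sort -u only in the C locale. The helper name first_unsorted_line is hypothetical and not part of this module.

        use strict;
        use warnings;

        # Hypothetical helper, not part of this module: returns the 1-based
        # number of the first line that is out of order or duplicated, or 0
        # if all lines are strictly ascending.
        sub first_unsorted_line {
            my ($fh) = @_;
            my ($prev, $n) = (undef, 0);
            while (defined(my $word = <$fh>)) {
                chomp $word;
                $n++;
                return $n if defined $prev && $word le $prev;
                $prev = $word;
            }
            return 0;
        }

        # Demonstration on an in-memory file; a real check would open the
        # dictionary file instead.
        my $data = "apple\nbanana\nbanana\ncherry\n";
        open my $fh, '<', \$data or die "cannot open in-memory file: $!";
        my $bad = first_unsorted_line($fh);
        close $fh;
        print $bad ? "line $bad is out of order or duplicated\n" : "sorted\n";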

ERRORS

Upon error, the field _error stores the error message, then die() is called with this message. If you do not want the program to die (for instance, because you want to handle the error yourself), use the following:

        use Math::String::Charset::Wordlist;

        $Math::String::Charset::Wordlist::die_on_error = 0;

        # $a is reserved for sort(), so a lexical variable is used instead
        my $charset = Math::String::Charset::Wordlist->new();   # error, empty set!
        print $charset->error(),"\n";
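
Since the error is raised with die(), it can also be trapped with Perl's standard eval block while die_on_error keeps its default value. This is a minimal sketch, assuming that calling new() without arguments fails as in the example above. The module is loaded with require inside the eval, so a missing module is reported the same way.

        use strict;
        use warnings;

        my $charset = eval {
            require Math::String::Charset::Wordlist;
            Math::String::Charset::Wordlist->new();     # expected to die
        };
        my $error = $@;         # copy at once, later code may reset $@

        print "construction failed: $error\n" unless defined $charset;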

INTERNAL DETAILS

This object caches certain calculation results (for instance, which word is stored at which offset in the file), thus greatly speeding up sequential Math::String conversions from string to number, and vice versa.
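
The idea of remembering word offsets can be illustrated in plain Perl. This is a sketch of the general technique only, not the module's actual implementation. The names build_index and word_at are hypothetical.

        use strict;
        use warnings;

        # Record the byte offset at which every line starts.
        sub build_index {
            my ($fh) = @_;
            my @offsets;
            seek $fh, 0, 0;
            while (1) {
                my $pos = tell $fh;
                last unless defined readline($fh);
                push @offsets, $pos;
            }
            return \@offsets;
        }

        # Fetch the n'th word (0-based) with one seek instead of a scan.
        sub word_at {
            my ($fh, $index, $n) = @_;
            return undef if $n < 0 || $n > $#$index;
            seek $fh, $index->[$n], 0;
            my $word = readline($fh);
            chomp $word;
            return $word;
        }

        # Demonstration on an in-memory file.
        my $data = "apple\nbanana\ncherry\n";
        open my $fh, '<', \$data or die "cannot open in-memory file: $!";
        my $index = build_index($fh);
        my $third = word_at($fh, $index, 2);
        close $fh;
        print "$third\n";

Building the index costs one pass over the file. After that, each lookup is a single seek and read, which is what makes repeated conversions cheap.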

METHODS

new()

            Math::String::Charset::Wordlist->new();

Create a new Math::String::Charset::Wordlist object.

The constructor takes a HASH reference. The following keys can be used:

        minlen          Minimum string length, for now always 0
        maxlen          Maximum string length, for now always 1
        file            path/filename of wordlist file
        sep             separator character, none if undef

The resulting charset will always be of order 1, type 2.

The wordlist file must be sorted alphabetically (just like sort -u does), otherwise the results from converting between string and number form are unpredictable.

minlen

Optional minimum string length. Any string shorter than this will be invalid. Must be smaller than a (possibly defined) maxlen. If not given, it is set to -inf. Note that minlen might be adjusted to a greater number if it is set to 1 or greater but there are no valid strings of length 2, 3 etc. In this case minlen will be set to the first non-empty class of the charset.

For wordlists, the minlen is always 0 (thus making '' the first valid string).

maxlen

Optional maximum string length. Any string longer than this will be invalid. Must be greater than a (possibly defined) minlen. If not given, it is set to +inf.

For wordlists, the maxlen is always 1 (thus making the last word in the dictionary the last valid string).

minlen()

        $charset->minlen();

Return minimum string length.

maxlen()

        $charset->maxlen();

Return maximum string length.

length()

        $charset->length();

Return the number of items in the charset. For higher-order charsets, this is the number of valid strings that are one character long. Shortcut for $charset->class(1).

count()

Returns the count of all possible strings described by the charset as a positive BigInt. Returns 'inf' if no maxlen is defined, because there is then no upper bound on how many strings are possible.

If maxlen is defined, this forces a calculation of all possible class() values and may therefore be very slow on the first call. It also caches a potentially large number of values if maxlen is very high.

class()

        $charset->class($order);

Return the number of items in a class.

        print $charset->class(5);       # how many strings with length 5?

char()

        $charset->char($nr);

Returns the character number $nr from the set, or undef.

        print $charset->char(0);        # first char
        print $charset->char(1);        # second char
        print $charset->char(-1);       # last one

lowest()

        $charset->lowest($length);

Return the number of the first string of length $length. This is equivalent to (but much faster than):

        $str = $charset->first($length);
        $number = $charset->str2num($str);

highest()

        $charset->highest($length);

Return the number of the last string of length $length. This is equivalent to (but much faster than):

        $str = $charset->first($length+1);
        $number = $charset->str2num($str);
        $number--;

order()

        $order = $charset->order();

Return the order of the charset: this is always 1 for wordlist charsets. See also type.

type()

        $type = $charset->type();

Return the type of the charset: this is always 2 for wordlist charsets. See also order.

charlen()

        $character_length = $charset->charlen();

Return the length of one character in the set. 1 or greater. All charsets used in a grouped charset must have the same length, unless you specify a separator character.

seperator()

        $sep = $charset->seperator();

Returns the separator string, or undefined if none is used.

chars()

        $chars = $charset->chars( $bigint );

Returns the number of characters that the string would have if you converted $bigint (Math::BigInt or Math::String object) back to a string. This is much faster than doing

        $chars = length ("$math_string");

since it does not need to actually construct the string.

first()

        $charset->first( $length );

Return the first string with a length of $length, according to the charset. See lowest() for the corresponding number.

last()

        $charset->last( $length );

Return the last string with a length of $length, according to the charset. See highest() for the corresponding number.

is_valid()

        $charset->is_valid();

Check whether a string conforms to the charset or not.

error()

        $charset->error();

Returns "" if there was no error, or the error message if construction of the charset failed. Set $Math::String::Charset::die_on_error to 0 to get the error message, otherwise the program will die.

start()

        $charset->start();

In list context, returns a list of all characters in the start set, that is the ones used at the first string position. In scalar context returns the length of the start set.

Think of the start set as the set of all characters that can start a string with one or more characters. The set for one-character strings is called ones and you can access it via $charset->ones().

end()

        $charset->end();

In list context, returns a list of all characters in the end set, that is all characters a string can end with. In scalar context returns the length of the end set.

ones()

        $charset->ones();

In list context, returns a list of all strings consisting of one character. In scalar context returns the length of the ones set.

This list is the intersection of start and end.

Think of a string of only one character as if it starts with and ends in this character at the same time.

The order of the chars in ones is the same ordering as in start.

prev()

        $string = Math::String->new( );
        $charset->prev($string);

Given the charset and a string, calculates the previous string in the sequence. This is faster than decrementing the number of the string and converting the new number to a string. This routine is mainly used internally by Math::String and updates the cache of the given Math::String.

next()

        $string = Math::String->new( );
        $charset->next($string);

Given the charset and a string, calculates the next string in the sequence. This is faster than incrementing the number of the string and converting the new number to a string. This routine is mainly used internally by Math::String and updates the cache of the given Math::String.

file()

        $file = $charset->file();

Return the path/name of the dictionary file being used in constructing this character set.

num2str()

        my ($string,$length) = $charset->num2str($number);

Converts a Math::BigInt/Math::String to a string. In list context it returns the string and the length, in scalar context only the string.

str2num()

        $number = $charset->str2num($str);

Converts a string (literal string or Math::String object) to the corresponding number form (as Math::BigInt).
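
The mapping performed by num2str() and str2num() can be pictured with a small in-memory list. The sketch below assumes that '' is number 0 and the first word is number 1, as implied by minlen being 0 and by the EXAMPLES section. It uses plain integers and a binary search over a sorted array, which is one possible technique and not necessarily what the module does internally. The names word2num and num2word are hypothetical.

        use strict;
        use warnings;

        my @words = qw(apple banana cherry);    # must be sorted and unique

        # Hypothetical stand-in for str2num(): '' is 0, the first word is 1.
        # Returns undef for a string that is not in the list.
        sub word2num {
            my ($str) = @_;
            return 0 if $str eq '';
            my ($lo, $hi) = (0, $#words);
            while ($lo <= $hi) {
                my $mid = int(($lo + $hi) / 2);
                my $cmp = $str cmp $words[$mid];
                return $mid + 1 if $cmp == 0;
                if ($cmp < 0) { $hi = $mid - 1 } else { $lo = $mid + 1 }
            }
            return undef;
        }

        # Hypothetical stand-in for num2str().
        sub num2word {
            my ($num) = @_;
            return '' if $num == 0;
            return undef if $num < 0 || $num > @words;
            return $words[$num - 1];
        }

        print word2num('banana'), "\n";         # 2
        print num2word(3), "\n";                # cherry

The binary search only works if the list is sorted, which is why an unsorted wordlist leads to unpredictable conversions.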

offset()

        my $offset = $charset->offset($number);

Returns the offset of the n'th word in the dictionary file.

EXAMPLES

        use Math::String;
        use Math::String::Charset::Wordlist;

        my $cs =
          Math::String::Charset::Wordlist->new( { file => 'big.sorted' } );
        my $x =
          Math::String->new('',$cs)->binc();    # $x is now the first word

        # $x starts at 1 (the first word), so <= 10 covers ten words
        while ($x <= Math::BigInt->new(10))     # Math::BigInt->new() necessary!
          {
          # print the first 10 words
          print $x++,"\n";
          }

BUGS

None discovered yet.

AUTHOR

If you use this module in one of your projects, then please email me. I want to hear about how my code helps you ;)

This module is (C) Copyright by Tels http://bloodgate.com 2003-2008.