NAME

Devel::Tokenizer::C - Generate C source for fast keyword tokenizer

SYNOPSIS

  use Devel::Tokenizer::C;
  
  $t = new Devel::Tokenizer::C TokenFunc => sub { "return \U$_[0];\n" };
  
  $t->add_tokens( qw( bar baz ) )->add_tokens( ['for'] );
  $t->add_tokens( [qw( foo )], 'defined DIRECTIVE' );
  
  print $t->generate;

DESCRIPTION

The Devel::Tokenizer::C module provides a small class for creating the essential ANSI C source code for a fast keyword tokenizer.

The generated code is optimized for speed. On the ANSI C keyword set, it is 2-3 times faster than equivalent code generated with the gperf utility.

The above example would print the following C source code:

  switch( tokstr[0] )
  {
    case 'b':
      switch( tokstr[1] )
      {
        case 'a':
          switch( tokstr[2] )
          {
            case 'r':
              if( tokstr[3] == '\0' )
              {                                     /* bar        */
                return BAR;
              }
  
              goto unknown;
  
            case 'z':
              if( tokstr[3] == '\0' )
              {                                     /* baz        */
                return BAZ;
              }
  
              goto unknown;
  
            default:
              goto unknown;
          }
  
        default:
          goto unknown;
      }
  
    case 'f':
      switch( tokstr[1] )
      {
        case 'o':
          switch( tokstr[2] )
          {
  #if defined DIRECTIVE
            case 'o':
              if( tokstr[3] == '\0' )
              {                                     /* foo        */
                return FOO;
              }
  
              goto unknown;
  #endif /* defined DIRECTIVE */
  
            case 'r':
              if( tokstr[3] == '\0' )
              {                                     /* for        */
                return FOR;
              }
  
              goto unknown;
  
            default:
              goto unknown;
          }
  
        default:
          goto unknown;
      }
  
    default:
      goto unknown;
  }

The generated code contains only the main switch statement of the tokenizer. The enclosing function, the token constants and the unknown label are left to you. Most of the generated code can be configured to fit your application.
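Because only the switch statement is generated, it has to be placed inside a function that supplies the tokstr array and the unknown label. The following is a minimal sketch of such a wrapper around the output printed above. The enum, the TOK_UNKNOWN value and the function name lookup_keyword are hypothetical and not part of the module.

```c
/* Hypothetical token codes. BAR, BAZ, FOO and FOR are the names produced by
   the TokenFunc in the SYNOPSIS; TOK_UNKNOWN and the enum are assumptions. */
enum token { TOK_UNKNOWN, BAR, BAZ, FOO, FOR };

/* Defined here so that the guarded "foo" branch is compiled in. */
#define DIRECTIVE 1

/* Hypothetical wrapper. The parameter name matches the default TokenString
   and the label at the end matches the default UnknownLabel. */
enum token lookup_keyword(const char *tokstr)
{
  switch( tokstr[0] )
  {
    case 'b':
      switch( tokstr[1] )
      {
        case 'a':
          switch( tokstr[2] )
          {
            case 'r':
              if( tokstr[3] == '\0' )
              {                                     /* bar        */
                return BAR;
              }
              goto unknown;

            case 'z':
              if( tokstr[3] == '\0' )
              {                                     /* baz        */
                return BAZ;
              }
              goto unknown;

            default:
              goto unknown;
          }

        default:
          goto unknown;
      }

    case 'f':
      switch( tokstr[1] )
      {
        case 'o':
          switch( tokstr[2] )
          {
#if defined DIRECTIVE
            case 'o':
              if( tokstr[3] == '\0' )
              {                                     /* foo        */
                return FOO;
              }
              goto unknown;
#endif /* defined DIRECTIVE */

            case 'r':
              if( tokstr[3] == '\0' )
              {                                     /* for        */
                return FOR;
              }
              goto unknown;

            default:
              goto unknown;
          }

        default:
          goto unknown;
      }

    default:
      goto unknown;
  }

unknown:
  /* Every non-matching path of the switch jumps here. */
  return TOK_UNKNOWN;
}
```

Each character is read only after the previous one matched a non-terminator, so the function never reads past the end of a null-terminated string.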

CONFIGURATION

TokenFunc => SUBROUTINE

A reference to the subroutine that returns the code for each token match. The only parameter to the subroutine is the token string.

This is the default subroutine:

  TokenFunc => sub { "return $_[0];\n" }

TokenString => STRING

Identifier of the C character array that contains the token string. The default is tokstr.

UnknownLabel => STRING

Label that should be jumped to via goto if there's no keyword matching the token. The default is unknown.

TokenEnd => STRING

Character that defines the end of each token. The default is the null character '\0'.
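To illustrate how TokenString, UnknownLabel and TokenEnd show up in the C code, here is a hand-written sketch for a single keyword "if". It is not actual module output. It assumes the three options were set so that the array is called p, the label is called not_keyword, and tokens end at a space character. The function name and the KW_ constants are hypothetical.

```c
/* Hypothetical token codes, chosen only for this illustration. */
enum { KW_NONE, KW_IF };

/* Assumed settings: TokenString names `p`, UnknownLabel names `not_keyword`,
   and TokenEnd makes the comparison use ' ' instead of '\0'. */
int match_if(const char *p)
{
  switch( p[0] )
  {
    case 'i':
      switch( p[1] )
      {
        case 'f':
          if( p[2] == ' ' )
          {                                     /* if         */
            return KW_IF;
          }
          goto not_keyword;

        default:
          goto not_keyword;
      }

    default:
      goto not_keyword;
  }

not_keyword:
  return KW_NONE;
}
```

With these assumed settings, "if x" is recognized because the keyword is followed by a space, while a bare "if" is not, since its third character is the null terminator rather than the configured end character.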

CaseSensitive => 0 | 1

Boolean defining whether the generated tokenizer should be case sensitive or not. This will only affect the letters A-Z. The default is 1, so the generated tokenizer is case sensitive.
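One plausible way for a switch-based tokenizer to ignore case is to list both the upper-case and the lower-case label for every letter. The sketch below is hand-written under that assumption for the single keyword "bar". The module's actual output for CaseSensitive => 0 may look different, and the function name and CI_ constants are hypothetical.

```c
/* Hypothetical token codes, chosen only for this illustration. */
enum { CI_UNKNOWN, CI_BAR };

/* Matches "bar" regardless of letter case by pairing case labels. */
int match_bar_nocase(const char *tokstr)
{
  switch( tokstr[0] )
  {
    case 'B':
    case 'b':
      switch( tokstr[1] )
      {
        case 'A':
        case 'a':
          switch( tokstr[2] )
          {
            case 'R':
            case 'r':
              if( tokstr[3] == '\0' )
              {                                     /* bar        */
                return CI_BAR;
              }
              goto unknown;

            default:
              goto unknown;
          }

        default:
          goto unknown;
      }

    default:
      goto unknown;
  }

unknown:
  return CI_UNKNOWN;
}
```

Pairing the labels keeps the lookup a pure switch on the input characters, with no call to a case-conversion function at run time.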

ADDING TOKENS

You can add tokens using the add_tokens method.

The method takes either a list of token strings or a reference to an array of token strings. In the second form, the array reference can optionally be followed by a preprocessor directive string, which guards the code generated for those tokens.

Calls to add_tokens can be chained together, as the method returns a reference to its object.
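The effect of the preprocessor directive string is easiest to see in the C code. The sketch below is reduced by hand to the 'f' branch of the example output, with DIRECTIVE deliberately left undefined, so the guarded token "foo" is compiled out. The function name and the T_ constants are hypothetical.

```c
/* Hypothetical token codes, chosen only for this illustration. */
enum { T_UNKNOWN, T_FOO, T_FOR };

/* Make sure the guard is off for this sketch. */
#undef DIRECTIVE

int match_fo(const char *tokstr)
{
  switch( tokstr[0] )
  {
    case 'f':
      switch( tokstr[1] )
      {
        case 'o':
          switch( tokstr[2] )
          {
#if defined DIRECTIVE
            case 'o':
              if( tokstr[3] == '\0' )
              {                                     /* foo        */
                return T_FOO;
              }
              goto unknown;
#endif /* defined DIRECTIVE */

            case 'r':
              if( tokstr[3] == '\0' )
              {                                     /* for        */
                return T_FOR;
              }
              goto unknown;

            default:
              goto unknown;
          }

        default:
          goto unknown;
      }

    default:
      goto unknown;
  }

unknown:
  return T_UNKNOWN;
}
```

Compiled this way, "for" is still recognized while "foo" falls through to the default case and is reported as unknown. Defining DIRECTIVE at compile time would enable the guarded branch again.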

GENERATING THE CODE

The generate method will return a string with the tokenizer switch statement. If no tokens were added, it will return an empty string.

AUTHOR

Marcus Holland-Moritz <mhx@cpan.org>

BUGS

I hope none, since the code is pretty short. Perhaps lack of functionality ;-)

COPYRIGHT
