The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Parse::Eyapp::datagenerationtut - Tutorial on Using Parse::Eyapp as a Data Generator for Testing

INTRODUCTION

The examples for this tutorial can be found in the directory examples/generator in the distribution of Parse::Eyapp.

To understand the code you will need some familiarity with Test::LectroTest::Generator, however, we will make an attempt to introduce the basics of Test::LectroTest::Generator needed.

While parsing is the process of determining the membership of a string to a language, generation is the reverse problem. Using a context free grammar it is possible to generate strings belonging to the language described by that grammar.

Context free grammars can be used to generate tests. The programmer designs a grammar that defines a set of inputs that will be able to find some set of bugs.

This tutorial shows how to use Parse::Eyapp to generate phrases belonging to the language defined by a given grammar. We will generate inputs to test a simple calculator.

Compiling and Running the Example

The grammar describing the language is in the file Generator.eyp. Calling eyapp with option -c will show the contents of the file without the semantic actions:

  Parse-Eyapp/examples/generator$ eyapp -c Generator.eyp
  # file: Generator.eyp
  # compile with: eyapp -b '' Generator.eyp
  # then run: ./Generator.pm
  %strict
  %token NUM VARDEF VAR
  %right '='
  %left '-' '+'
  %left '*' '/'
  %left NEG
  %right '^'

  %%

  stmts:
        stmt
      | stmts ';'  stmt
  ;
  stmt:
        VARDEF '=' exp
  ;
  exp:
        NUM
      | VAR
      | exp '+' exp
      | exp '-' exp
      | exp '*' exp
      | exp '/' exp
      | '-'  exp %prec NEG
      | exp '^' exp
      | '('  exp ')'
  ;

  %%

This grammar defines a language of sequences of semicolon separated assignments. The right hand side of an assignment can be any valid arithmetic expression including numbers and variables.

First we compile the grammar with option -b '' to produce a modulino:

  Parse-Eyapp/examples/generator$ eyapp -b '' Generator.eyp

Now, the module has execution permits and its first line contains the #! header:

  Parse-Eyapp/examples/generator$ ls -ltr | tail -1
  -rwxr-xr-x 1 lusasoft lusasoft 7844 2009-01-12 08:30 Generator.pm
  Parse-Eyapp/examples/generator$ head -1 Generator.pm
  #!/usr/bin/perl

The use of option -b '' combined with the fact that we have added these lines

    67  unless (caller) {
    68    __PACKAGE__->main(@ARGV);
    69  }

at the end of the grammar file Generator.eyp provide the generated file with a dual nature: it is a module and an executable at the same time. This is what is know as a modulino (term coined by Brian d Foy).

Here follows the results of several executions. Each run produces a set of assignments. The first output line reports the result of the randomly generated program.

  Parse-Eyapp/examples/generator$ ./Generator.pm
  # result: -3
  SC=-3

  # result: error. Division by zero.
  M=(-4/6+4)+9+2*8*9+4;
  XQ=8/3;
  EI=XQ*2/0/0;
  BL=5+EI+4/5/XQ

As you can see in the former run, only variables that were defined in previous assignments are used in later assignments. However, the generated source may produce run-time errors and exceptions (which is good thing when testing a calculator).

  Parse-Eyapp/examples/generator$ ./Generator.pm

  # result: 6
  CF=(6)

  Parse-Eyapp/examples/generator$ ./Generator.pm

  # result: -710.2
  I=(3*-8+7/5);
  R=2+8*I*4+5*2+I/I

  Parse-Eyapp/examples/generator$ ./Generator.pm

  # result: Calculator syntax differs from Perl... 
  RY=2--2+(3+6)+(7*7*4^1+2*0/8*5/3)

GENERATING PHRASES FROM A CONTEXT FREE GRAMMAR

Using YYExpect and Test::LectroTest::Generator to generate tokens

The basic idea of using Parse::Eyapp to generate phrases for the language defined by a given context free grammar is simple: change the lexer by a token generator. Instead of reading from some input, randomly generate one of the valid tokens.

We can use the method YYExpect to know what tokens are valid. For versions 1.137 and later of Parse::Eyapp, the method YYExpect returns the set of valid tokens at the time it is called. For previous versions (and this is also true for Parse::Yapp), YYExpect only returns a subset of the whole set of valid tokens.

In this example, the token generator has been isolated in the sub gen_lexer in the file GenSupport.pm:

    47  sub gen_lexer {
    48    my $parser = shift;
    49
    50    my $token = $parser->generate_token;
    51    my $attr = $parser->generate_attribute($token);
    52    #$attr = $WHITESPACES->generate.$attr;
    53
    54    return ($token, $attr);
    55  }

The token and its attribute are generated in lines 50 and 51. The methods generate_token and generate_attribute are also in the module GenSupport.pm. They are methods of the parser object since the grammar Generator.eyp not only uses but inherits this module. See line 3 of Generator.eyp:

  Parse-Eyapp/examples/generator$ sed -ne '19,24p' Generator.eyp | cat -n
     1  %{
     2  use base q{Parse::Eyapp::TokenGen};
     3  use base q{GenSupport};
     4  %}
     5
     6  %%

The method generate_token obtains the set of valid tokens using YYExpect (line 29). Then uses the Frequency function in Test::LectroTest::Generator to produce a Test::LectroTest::Generator object (line 31). The method generate of such object is used to generate the actual token (line 33).

    26  sub generate_token {
    27    my $parser = shift;
    28
    29    my @token = $parser->YYExpect;
    30
    31    my $tokengen = Frequency( map { [$parser->token_weight($_), Unit($_)] } @token);
    32
    33    return $tokengen->generate;
    34  }

The Parse::Eyapp::TokenGen method token_weight returns the weight associated with a token, assuming it was previously set using one of the Parse::Eyapp::TokenGen methods like set_tokenweightsandgenerators or set_tokenweights. See the code of method main in GenSupport.pm:

  examples/generator$ sed -ne '98,/^ *)/p' GenSupport.pm | cat -n
     1    my $parser = $package->new();
     2
     3    $parser->set_tokenweightsandgenerators(
     4      NUM => [ 2, Int(range=>[0, 9], sized=>0)],
     5      VAR => [
     6                0,  # At the beginning, no variables are defined
     7                Gen {
     8                  return  Elements(keys %st)->generate if keys %st;
     9                  return Int(range=>[0, 9], sized=>0)->generate;
    10                },
    11              ],
    12      VARDEF => [
    13                  2,
    14                  String( length=>[1,2], charset=>"A-NP-Z", size => 100 )
    15                ],
    16      '=' => 2, '-' => 1, '+' => 2,
    17      '*' => 4, '/' => 2, '^' => 0.5,
    18      ';' => 1, '(' => 1, ')' => 2,
    19      ''  => 2, 'error' => 0,
    20    );

A Brief Introduction to Test::LectroTest::Generator

The module GenSupport.pm uses Test::LectroTest::Generator to build generators for the required tokens. Thus the call to

                   Int(range=>[0, 9], sized=>0)

builds a Test::LectroTest::Generator object that produces integers in the range [0,9]. Such objects have a method generate that produces the actual item. The following debugger session illustrates the way to use Test::LectroTest::Generator:

  pl@europa:~/LEyapp$ perl -wde 0
  main::(-e:1):   0
    DB<1> use Test::LectroTest::Generator qw{:all}
    DB<2> $i = Int(range=>[0, 9], sized=>0)
    DB<3> p $i->generate
  6
    DB<4> p $i->generate
  9

The String method builds a Test::LectroTest::Generator object that produces strings:

    DB<5> $v = String( length=>[1,2], charset=>"A-NP-Z", size => 100 )
    DB<6> p $v->generate
  HM
    DB<7> p $v->generate
  Y
    DB<8> p $v->generate
  KE

The Elements method builds a Test::LectroTest::Generator object that produces one of a given list of elements:

    DB<9> @a = map { $v->generate } 1..10
    DB<10> x @a
  0  'UC'
  1  'P'
  2  'IF'
  3  'EJ'
  4  'H'
  5  'VC'
  6  'CF'
  7  'K'
  8  'T'
  9  'IG'
    DB<11> $x = Elements(@a)
    DB<12> p $x->generate
  P
    DB<13> p $x->generate
  P
    DB<14> p $x->generate
  EJ
    DB<15> p $x->generate
  VC

Even more interesting for our purpose is the Frequency method, which produces one of a given list of elements with a given probability distribution.

The following example illustrates its use. First we build a weight list where the odd elements have weight 2 and the even elements have weight 1:

  DB<16> @w = map { $_ % 2 ? 2 : 1 } 0..9
  DB<21> @w{@a} = @w
  DB<24>  x \%w
    0  HASH(0xd3cc80)
       'CF' => 1
       'EJ' => 2
       'H' => 1
       'IF' => 1
       'IG' => 2
       'K' => 2
       'P' => 2
       'T' => 1
       'UC' => 1
       'VC' => 2

We now use Frequency to build a Test::LectroTest::Generator object that produces one of the given list of elements @a according to the specified probability:

  DB<29> $f = Frequency( map { [$w{$_}, Unit($_)] } @a)

Let us generate 10 items. We see that odd elements appear more frequently than even elements:

  DB<30> @r = map { $f->generate } 1..10
  DB<31> p "@r"
    VC UC K UC VC VC K EJ P P

Generating Token Attributes

Once the token was generated through the call to generate_token at line 50:

    45  #my $WHITESPACES = String( length=>[0,1], charset=>" \t\n", size => 100 );
    46
    47  sub gen_lexer {
    48    my $parser = shift;
    49
    50    my $token = $parser->generate_token;
    51    my $attr = $parser->generate_attribute($token);
    52    #$attr = $WHITESPACES->generate.$attr;
    53
    54    return ($token, $attr);
    55  }

the associated attributed is generated via the generate_attribute method in GenSupport.pm. If needed, random combination of white spaces can be added to the generated attribute via an appropriate generator (line 52).

The generate_attribute method uses the method generate of the generator associated with such token. If no generator object was set, the attribute returned is the token itself (line 42):

    36  sub generate_attribute {
    37    my $parser = shift;
    38    my $token = shift;
    39
    40    my $gen = $parser->token_generator($token);
    41    return $gen->generate  if defined($gen);
    42    return $token;
    43  }

Holding Semantic Constraints

The attribute generator associated with the token VAR is more complex than the others. It was defined in the call to set_tokenweightsandgenerators:

  examples/generator$ sed -ne '98,/^ *)/p' GenSupport.pm | cat -n
     1    my $parser = $package->new();
     2
     3    $parser->set_tokenweightsandgenerators(
     4      NUM => [ 2, Int(range=>[0, 9], sized=>0)],
     5      VAR => [
     6                0,  # At the beginning, no variables are defined
     7                Gen {
     8                  return  Elements(keys %st)->generate if keys %st;
     9                  return Int(range=>[0, 9], sized=>0)->generate;
    10                },
    11              ],
    12      VARDEF => [
    13                  2,
    14                  String( length=>[1,2], charset=>"A-NP-Z", size => 100 )
    15                ],
    16      '=' => 2, '-' => 1, '+' => 2,
    17      '*' => 4, '/' => 2, '^' => 0.5,
    18      ';' => 1, '(' => 1, ')' => 2,
    19      ''  => 2, 'error' => 0,
    20    );

The Gen function of Test::LectroTest::Generator creates a new generator from a given code. Since a variable can't be used unless it is defined, we use a symbol table %st to keep record of the variables that were defined in previous assignments. If no defined variables exists, the defined generator returns a digit between 0 and 9.

Each time a new assignment to a variable occurs, such variable is added to the symbol table. This is achieved through the semantic action associated with the assignment production rule:

  examples/generator$ sed -ne '35,41p' Generator.eyp | cat -n
     1  stmt:
     2      VARDEF '=' exp
     3        {
     4          my $parser = shift;
     5          $parser->defined_variable($_[0]);
     6          "$_[0]=$_[2]";
     7        }

The defined_variable method in GenSupport.pm simply sets the corresponding entry in the symbol table:

  examples/generator$ sed -ne '19,24p' GenSupport.pm | cat -n
     1  my %st; # Symbol Table
     2  sub defined_variable {
     3    my ($parser, $var) = @_;
     4
     5    $st{$var} = 1;
     6  }

The semantic action associated with VARDEF '=' exp returns the string "$_[0]=$_[2]" containing the actual phrase. Since this is the semantic action required for most productions we make it our default action:

  examples/generator$ sed -ne '13,17p' Generator.eyp | cat -n
     1  %defaultaction {
     2    my $parser = shift;
     3
     4    return join '', @_;
     5  }

The syntactic variable stmts generates sequences of stmt separated by semicolons:

  examples/generator$ sed -ne '26,33p' Generator.eyp | cat -n
     1  stmts:
     2      stmt
     3        {
     4          $_[0]->deltaweight(VAR => +1); # At least one variable is defined now
     5          $_[1];
     6        }
     7    | stmts ';' { "\n" } stmt
     8  ;

The second production is left recursive. As a consequence, the stmt in the first production (line 2) is the first statement of the sequence. A small derivation can convince you of this property:

                                               stmts-> stmt
   stmts => stmts';' stmt => stmts';' stmt ';' stmt => stmt ';' stmt ';' stmt 
                                                       ----
     

Thus, when the reduction by the production stmts -> stmt occurs, we are sure that the first statement has been processed. In such case we increase the weight of token VAR one unit (which was initially zero, see the call to set_tokenweightsandgenerators),

           $_[0]->deltaweight(VAR => +1); 

The weight of VAR is now 1, giving chances for variables to appear in the right hand side of an assignment. The Parse::Eyapp::Tokengen method deltaweight increases (decreases if negative) the weight of the given tokens using the associated values.

Dynamically Changing the Probability Distribution

The semantic actions for the productions

               exp -> '(' exp ')' 

and

               exp -> '-' exp

show a way to modify the weights associated with some tokens:

 43 exp:
 44     NUM
 45   | VAR
 46   | exp '+' exp
 47   | exp '-' exp
 48   | exp '*' exp
 49   | exp '/' exp
 50   | '-' { $_[0]->pushdeltaweight('-' => -1) } exp %prec NEG
 51       {
 52         $_[0]->popweight();
 53         "-$_[3]"
 54       }
 55   | exp '^' exp
 56   | '('   { $_[0]->pushdeltaweight('(' => -1, ')' => +1, '+' => +1, ); }
 57       exp
 58     ')'
 59       {
 60          $_[0]->popweight;
 61          "($_[3])"
 62       }
 63 ;

After seeing a '(' we decrease by one the weight of '(' to avoid expressions with nested parenthesis. We also increase the weight of token '+', since parenthesis are often used to give more priority to a sum over a multiplication or division. This is achieved via the pushdeltaweight method. The old weight is recovered after the closing parenthesis is seen using the popweight method.

Computing the Expected Result

Function evaluate_using_perl in GenSupport.pm finds the expected value for the generated expression. The calculator expression is roughly translated to a Perl expression and evaluated using the Perl interpreter:

 57 sub evaluate_using_perl { # if possible
 58   my $perlexp = shift;
 59
 60   $perlexp =~ s/\b([a-zA-Z])/\$$1/g; # substitute A by $A everywhere
 61   $perlexp =~ s/\^/**/g;             # substitute power operator: ^ by **
 62
 63   my $res = eval "no warnings; no strict;$perlexp";
 64   if ($@ =~ /Illegal division/) {
 65     $res = "error. Division by zero.";
 66   }
 67   elsif ($@) { # Our calc notation is incompatible with perl in a few gotchas
 68     # Perl interprets -- in a different way
 69     $@ =~ m{(.*)}; # Show only the first line of error message
 70     $res = "Calculator syntax differs from Perl. Can't compute the result: $1";
 71   }
 72
 73   $res;
 74 }

The calculator language differs from Perl. In the calculator, two consecutive minus like in 2--3 are interpreted as 2+3 while for Perl the former expression is an error. This limitation is here to illustrate a limitation of the approach: it gives a way to generate complex structured inputs but the programmer must find a way to compute what the expected value is.

APPENDIX: FILES

File GenSupport.pm

  Parse-Eyapp/examples/generator$ cat -n GenSupport.pm
     1  package GenSupport;
     2  use strict;
     3  use warnings;
     4
     5  use Getopt::Long;
     6  use Test::LectroTest::Generator qw(:all);
     7  use Parse::Eyapp::TokenGen;
     8
     9  sub _Error {
    10    my $parser = shift;
    11
    12    my $t = $parser->YYCurval;
    13    my @e = $parser->YYExpect();
    14    my $attr = $parser->YYSemval(0);
    15    local $" = " ";
    16    warn "Error:\nCurrent attribute: <$attr>\nCurrent token: <$t>\nExpected: <@e>\n";
    17  }
    18
    19  my %st; # Symbol Table
    20  sub defined_variable {
    21    my ($parser, $var) = @_;
    22
    23    $st{$var} = 1;
    24  }
    25
    26  sub generate_token {
    27    my $parser = shift;
    28
    29    my @token = $parser->YYExpect;
    30
    31    my $tokengen = Frequency( map { [$parser->token_weight($_), Unit($_)] } @token);
    32
    33    return $tokengen->generate;
    34  }
    35
    36  sub generate_attribute {
    37    my $parser = shift;
    38    my $token = shift;
    39
    40    my $gen = $parser->token_generator($token);
    41    return $gen->generate  if defined($gen);
    42    return $token;
    43  }
    44
    45  #my $WHITESPACES = String( length=>[0,1], charset=>" \t\n", size => 100 );
    46
    47  sub gen_lexer {
    48    my $parser = shift;
    49
    50    my $token = $parser->generate_token;
    51    my $attr = $parser->generate_attribute($token);
    52    #$attr = $WHITESPACES->generate.$attr;
    53
    54    return ($token, $attr);
    55  }
    56
    57  sub evaluate_using_perl { # if possible
    58    my $perlexp = shift;
    59
    60    $perlexp =~ s/\b([a-zA-Z])/\$$1/g; # substitute A by $A everywhere
    61    $perlexp =~ s/\^/**/g;             # substitute power operator: ^ by **
    62
    63    my $res = eval "no warnings; no strict;$perlexp";
    64    if ($@ =~ /Illegal division/) {
    65      $res = "error. Division by zero.";
    66    }
    67    elsif ($@) { # Our calc notation is incompatible with perl in a few gotchas
    68      # Perl interprets -- in a different way
    69      $@ =~ m{(.*)}; # Show only the first line of error message
    70      $res = "Calculator syntax differs from Perl. Can't compute the result: $1";
    71    }
    72
    73    $res;
    74  }
    75
    76
    77  sub Run {
    78      my($self)=shift;
    79      my $yydebug = shift || 0;
    80
    81      return $self->YYParse(
    82        yylex => \&gen_lexer,
    83        yyerror => \&_Error,
    84        yydebug => $yydebug, # 0x1F
    85      );
    86  }
    87
    88  sub main {
    89    my $package = shift;
    90
    91    my $debug = shift || 0;
    92    my $result = GetOptions (
    93      "debug!" => \$debug,
    94    );
    95
    96    $debug = 0x1F if $debug;
    97
    98    my $parser = $package->new();
    99
   100    $parser->set_tokenweightsandgenerators(
   101      NUM => [ 2, Int(range=>[0, 9], sized=>0)],
   102      VAR => [
   103                0,  # At the beginning, no variables are defined
   104                Gen {
   105                  return  Elements(keys %st)->generate if keys %st;
   106                  return Int(range=>[0, 9], sized=>0)->generate;
   107                },
   108              ],
   109      VARDEF => [
   110                  2,
   111                  String( length=>[1,2], charset=>"A-NP-Z", size => 100 )
   112                ],
   113      '=' => 2, '-' => 1, '+' => 2,
   114      '*' => 4, '/' => 2, '^' => 0.5,
   115      ';' => 1, '(' => 1, ')' => 2,
   116      ''  => 2, 'error' => 0,
   117    );
   118
   119    my $exp = $parser->Run( $debug );
   120
   121    my $res = evaluate_using_perl($exp);
   122
   123    print "\n# result: $res\n$exp\n";
   124  }
   125
   126  1;

File Generator.eyp

  Parse-Eyapp/examples/generator$ cat -n Generator.eyp
     1  # file: Generator.eyp
     2  # compile with: eyapp -b '' Generator.eyp
     3  # then run: ./Generator.pm
     4  %strict
     5  %token NUM VARDEF VAR
     6
     7  %right  '='
     8  %left   '-' '+'
     9  %left   '*' '/'
    10  %left   NEG
    11  %right  '^'
    12
    13  %defaultaction {
    14    my $parser = shift;
    15
    16    return join '', @_;
    17  }
    18
    19  %{
    20  use base q{Parse::Eyapp::TokenGen};
    21  use base q{GenSupport};
    22  %}
    23
    24  %%
    25
    26  stmts:
    27      stmt
    28        {
    29          $_[0]->deltaweight(VAR => +1); # At least one variable is defined now
    30          $_[1];
    31        }
    32    | stmts ';' { "\n" } stmt
    33  ;
    34
    35  stmt:
    36      VARDEF '=' exp
    37        {
    38          my $parser = shift;
    39          $parser->defined_variable($_[0]);
    40          "$_[0]=$_[2]";
    41        }
    42  ;
    43  exp:
    44      NUM
    45    | VAR
    46    | exp '+' exp
    47    | exp '-' exp
    48    | exp '*' exp
    49    | exp '/' exp
    50    | '-' { $_[0]->pushdeltaweight('-' => -1) } exp %prec NEG
    51        {
    52          $_[0]->popweight();
    53          "-$_[3]"
    54        }
    55    | exp '^' exp
    56    | '('   { $_[0]->pushdeltaweight('(' => -1, ')' => +1, '+' => +1, ); }
    57        exp
    58      ')'
    59        {
    60           $_[0]->popweight;
    61           "($_[3])"
    62        }
    63  ;
    64
    65  %%
    66
    67  unless (caller) {
    68    __PACKAGE__->main(@ARGV);
    69  }

SEE ALSO

CONTRIBUTORS

AUTHOR

Casiano Rodriguez-Leon (casiano@ull.es)

ACKNOWLEDGMENTS

This work has been supported by CEE (FEDER) and the Spanish Ministry of Educacion y Ciencia through Plan Nacional I+D+I number TIN2005-08818-C04-04 (ULL::OPLINK project http://www.oplink.ull.es/). Support from Gobierno de Canarias was through GC02210601 (Grupos Consolidados). The University of La Laguna has also supported my work in many ways and for many years.

A large percentage of code is verbatim taken from Parse::Yapp 1.05. The author of Parse::Yapp is Francois Desarmenien.

I wish to thank Francois Desarmenien for his Parse::Yapp module, to my students at La Laguna and to the Perl Community. Thanks to the people who have contributed to improve the module (see "CONTRIBUTORS" in Parse::Eyapp). Thanks to Larry Wall for giving us Perl. Special thanks to Juana.

LICENCE AND COPYRIGHT

Copyright (c) 2006-2008 Casiano Rodriguez-Leon (casiano@ull.es). All rights reserved.

Parse::Yapp copyright is of Francois Desarmenien, all rights reserved. 1998-2001

These modules are free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.