Herbert Breunung > Perl6-Doc > Perl6::Overview::Rule



Perl6::Overview::Rule - Grammar Rules


Match object

    $/  -- can be accessed as an array of sub-matches, e.g. $/[0], $/[1] ...
        -- can be accessed as a hash of named subrules, e.g. $/<foo>

    $0, $1, $2, ... -- aliases for $/[0], $/[1], $/[2], ...

    Context     Behaviour
    String      Stringifies to entire match
    Number      The numeric value of the matched string (e.g. given
                "foo123bar" ~~ /\d+/, $/ in numeric context is 123)
    Boolean     True if success, false otherwise
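
The contexts in the table can be tried out with a minimal sketch. It assumes present-day Raku (Rakudo), whose syntax may differ from the draft summarized here; the variable name $m and the sample string are illustrative.

```raku
# Assumes present-day Raku; $m is an illustrative name.
my $m = "foo123bar" ~~ / (\d+) /;

say ~$m;      # string context: the whole matched text, "123"
say +$m;      # numeric context: the matched text as a number, 123
say ?$m;      # boolean context: True, because the match succeeded
say ~$m[0];   # first sub-match; $0 is an alias for $/[0]
```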


Backtracking control

    :       Prevents backtracking over previous atom
    ::      Fails entire group if previous atom is backtracked over
    :::     Fails entire rule if previous atom is backtracked over
    Also <commit> and <cut> (see below).
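
The effect of the single colon can be seen in a small sketch. It assumes present-day Raku, where ":" directly after a quantifier stops the engine from backtracking into that atom. Only that form is shown; the availability of the other forms is not assumed here. The sample string is illustrative.

```raku
# Assumes present-day Raku; $text is an illustrative name.
my $text = "aaab";

say so $text ~~ / a+  'ab' /;   # True: a+ may give back one "a", so 'ab' can match
say so $text ~~ / a+: 'ab' /;   # False: a+ keeps every "a", so 'ab' cannot match
```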


Modifiers

    Used as adverbial modifiers:

        "foO bar" ~~ m:s:ignorecase:g/[o | r] ./

    and can appear within rules:

        sub rule { <foo> [:i <bar>] }   # ignore case for <bar> only

    :b, :basechar       Match base char ignoring accents and such
    :bytes              Match individual bytes
    :c, :continue       Start scanning from string's current .pos
    :codes              Match individual codepoints
    :ex, :exhaustive    Match every possible way (including overlapping)
    :g, :global         Find all non-overlapping matches
    :graphs             Match individual graphemes
    :i, :ignorecase     Ignore letter case
    :keepall            Force rule and all subrules to remember everything
    :chars              Match maximally abstract characters allowed by pragma
    :nth(N)             Find Nth occurrence. N is an integer; you can
                        also use 1st, 2nd, 3rd, 4th, as well as 1th, 2th.
                        (junctions allowed, e.g. :nth(1|2|3|5).)
    :once               Only match first time
    :p, :pos            Only try to match at string's current .pos
    :perl5              Use Perl 5 syntax for regex
    :ov, :overlap       Match at all possible character positions, including
                        overlapping; return all matches in list context,
                        disjunction in item context
    :rw                 Claim string for modification, instead of copy-on-write
    :s, :sigspace       Replaces every sequence of literal whitespace in
                        pattern with \s+ or \s* according to <?ws> rule
    :x(N)               Repetition -- find N times (N is an integer)

    [:ex vs :ov could use clarification.  see
     http://www.nntp.perl.org/group/perl.perl6.language/20985 ]
    [re :s, except where there is already an explicit space rule]
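
A few of the modifiers above can be compared side by side. This is a sketch under the assumption of present-day Raku adverb syntax, which may not match every entry in the table; the sample strings are illustrative.

```raku
# Assumes present-day Raku; all sample strings are illustrative.
say so "foO bar" ~~ m:i/ foo /;            # True: letter case is ignored
say so "fooBAR" ~~ / foo [:i bar] /;       # True: :i applies inside the group only
say so "FOObar" ~~ / foo [:i bar] /;       # False: "foo" is still case-sensitive

say ("aaa" ~~ m:g/ aa /).elems;            # 1: non-overlapping matches only
say ("aaa" ~~ m:ov/ aa /).elems;           # 2: one match per starting position

say ~("a1b2c3" ~~ m:nth(2)/ \d /);         # 2: the second digit found
say ("a1b2c3" ~~ m:x(2)/ \d /).map(~*);    # (1 2): exactly two matches
```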

Built-in assertions

    '...'               Matches ... as a literal string
    "..."               Matches ... as a literal string (after interpolation)
    <sp>                Matches a literal space
    <ws>                Matches any sequence of whitespace, like the :s modifier
    <wb>                Matches any word boundary (Perl 5's \b semantics)
    <dot>               Matches a literal . (same as '.')
    <lt>                Matches a literal < (same as '<')
    <gt>                Matches a literal > (same as '>')
    <prior>             Match whatever the most recent successful match did
    <after pattern>     Matches only after pattern (zero-width)
    <before pattern>    Matches only before pattern (zero-width)
    <commit>            Fails the entire match if backtracked over
    <cut>               Fails the entire match if backtracked over,
                        and removes the portion of the string matched until then
    <fail>              Fails the entire match if reached
    <null>              Matches null string
    <ident>             Matches an "identifier", same as
                        ([ [<alpha> | _] \w* | " [<alpha>|_] \w* " ])
    <self>              Matches the same pattern as the current rule
                        (useful for recursion)
    <!XXX>              Zero width assertion that XXX *doesn't* match at 
                        this location
    <alnum>             Alphanumeric character
    <alpha>             Alphabetic character
    <ascii>             ASCII character
    <blank>             Horizontal whitespace ([ \t])
    <cntrl>             Control character
    <digit>             Numeric character
    <graph>             An alphanumeric character or punctuation
    <lower>             Lowercase character
    <print>             Printable character -- alphanumeric, punctuation or space
    <space>             Whitespace character ([\s\ck])
    <upper>             Uppercase character
    <word>              Word character (alphanumeric + _)
    <xdigit>            Hexadecimal digit
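
Some of these assertions can be exercised in a short sketch. It assumes present-day Raku, which writes the zero-width lookaround forms with a leading "?" (<?before ...>, <?after ...>) rather than the bare form listed above. The variable names and sample strings are illustrative.

```raku
# Assumes present-day Raku; $before, $after and $name are illustrative names.
my $before = "42 EUR, 17 USD" ~~ / \d+ <?before ' USD'> /;
my $after  = "EUR 42, USD 17" ~~ / <?after 'USD '> \d+ /;
my $name   = "foo_bar = 1"    ~~ / <ident> /;

say ~$before;         # 17: the digits directly before " USD"
say ~$after;          # 17: the digits directly after "USD "
say ~$name<ident>;    # foo_bar
```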

    Named rules are stored in the match object unless the rule name is
    prefixed with a ?

        /<?item> <quantity> <price>/

    would store values in $/{'quantity'} and $/{'price'} but not $/{'item'}
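
The difference between capturing and non-capturing subrule calls can be sketched as follows. This assumes present-day Raku, where the non-capturing call is spelled <.item> instead of the <?item> form shown above. The grammar name Order and the token bodies are hypothetical and chosen only for illustration.

```raku
# Assumes present-day Raku; Order and the token bodies are hypothetical.
grammar Order {
    token TOP      { <.item> \s+ <quantity> \s+ <price> }
    token item     { \w+ }
    token quantity { \d+ }
    token price    { \d+ '.' \d\d }
}

my $order = Order.parse("apple 3 0.50");
say $order.hash.keys.sort;    # (price quantity): no entry for item
say ~$order<quantity>;        # 3
say ~$order<price>;           # 0.50
```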

Character classes

    <[abcd]>            Matches one of the characters a, b, c, or d.  Ranges
                        may be used as <[a..z]>.  Can be combined with +
                        and - like so: <[a..z]-[m..p]> (which is the same
                        as <[a..l]+[q..z]>)
    <-XXX>              Matches XXX as a negated character class.  For
                        instance, <-alpha> would match one non-alpha
                        character.  May also be combined as above:
                        <-alpha+[qrst]> will match any non-alpha character
                        as well as the characters q, r, s, and t.
    <+XXX>              Matches XXX as a character class. <+alpha> matches
                        one alpha character.

    [This does not seem to match S05.  Which one is wrong?]
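
The equivalence of the subtraction and addition forms can be checked with a small sketch. It assumes present-day Raku; the array names @minus and @plus are illustrative.

```raku
# Assumes present-day Raku; @minus and @plus are illustrative names.
my @minus = ('a'..'z').grep(/ ^ <[a..z]-[m..p]> $ /);
my @plus  = ('a'..'z').grep(/ ^ <[a..l]+[q..z]> $ /);

say @minus.join;                  # abcdefghijklqrstuvwxyz
say @minus.join eq @plus.join;    # True: both forms describe the same class
say so "7" ~~ / <-[abcd]> /;      # True: "7" is not one of a, b, c, d
```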

Hypothetical variables

    Assign value to variable only if entire pattern succeeds.  
    Syntax: let $foo = value

        my $x;
        / (\S*) { let $x = .pos } \s* foo /

    The common case

        let $foo = $1

    can be written as a binding:

        $foo := $1

    For example:

        my $x;
        / $x := (\S*) \s* foo /

Can use arrays:

    / @x := [ (\S+) \s* ]* /
    # returns anonymous list in item context for *, +, **{n,m}:
    / $x := [ (\S+) \s* ]* /
    # ? does not pluralize -- result or undef
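
How quantifiers pluralize a capture can be sketched as follows. This assumes present-day Raku and reads the numbered capture $0 instead of using the @x := binding form above. The name @words and the sample strings are illustrative.

```raku
# Assumes present-day Raku; @words is an illustrative name.
"alpha beta gamma" ~~ / ^ [ (\S+) \s* ]* $ /;
my @words = $0.map(~*);
say @words;            # [alpha beta gamma]: * turns the capture into a list

"alpha" ~~ / ^ (\S+)? $ /;
say $0.^name;          # Match: ? gives a single result, not a list
```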


Can use hashes:

    / %x := [ (\S+)\: \s* (.*) ]* / # key/value pairs
    # $0 = list of keys
    # $1 = list of values
    / %x := [ (\S+) \s* ]* /        # capture only keys, values = undef
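
The key/value case can be approximated with a sketch that builds the hash by hand from $0 (keys) and $1 (values). It assumes present-day Raku and does not use the %x := binding form above. The sample string is illustrative, and \w+ is used for the values to keep the example simple.

```raku
# Assumes present-day Raku; the input string is illustrative.
"name: Ann  city: Oslo" ~~ / ^ [ (\w+) ':' \s* (\w+) \s* ]* $ /;

# $0 holds the list of keys, $1 the list of values; zip them into pairs.
my %x = $0.map(~*) Z=> $1.map(~*);
say %x<name>;    # Ann
say %x<city>;    # Oslo
```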

Can capture return values of closures:

    / $x := { "Item context" } /
    / @x := { "List", "context" } /
    / %x := { "Hash" => "context" } /
    # note: no parens around closure

Reorder paren groups:

    / $1 := (.*?), \h* $0 := (.*) /
    # renumbering occurs
    / $2 := (.*?), \h* (.*) /       # $3 = (.*)

Relative to current location: $-1, $-2, $-3...

Named subrules:

    / <key>\: <value> { let $hash{$<key>} = $<value> } /
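
A rough equivalent of the named-subrule pattern is sketched below. It assumes present-day Raku and uses a plain assignment inside the closure, so the assignment is not hypothetical: it is not undone if a later part of the pattern fails. The bodies of the key and value regexes are hypothetical.

```raku
# Assumes present-day Raku; the bodies of key and value are hypothetical.
my regex key   { \w+ }
my regex value { \w+ }

my %hash;
"color: red" ~~ / <key> ':' \s* <value> { %hash{~$<key>} = ~$<value> } /;
say %hash<color>;    # red
```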