
NAME

Lingua::Phonology::Rules - a module for defining and applying phonological rules.

SYNOPSIS

        use Lingua::Phonology;
        my $phono = Lingua::Phonology->new;

        my $rules = $phono->rules;

        # Adding and manipulating rules is discussed in the "WRITING RULES"
        # section

DESCRIPTION

This module allows for the creation of linguistic rules, and the application of those rules to "words" of Segment objects. You, the user, add rules to a Rules object, defining various parameters and code references that actually perform the action of the rule. Lingua::Phonology::Rules will take care of the guts of applying and creating rules.

The rules you create may have the following parameters. This is just a brief description of the parameters--a more detailed discussion of their effect is in the "WRITING RULES" section.

  • domain

    Defines the domain within which the rule applies. This should be the name of a feature in the feature set of the segments that the rule is applied to.

  • tier

    Defines the tier on which the rule applies. Must be the name of a feature in the feature set for the segments of the word you pass in.

  • direction

    Defines the direction that the rule applies in. Must be either 'leftward' or 'rightward'. If no direction is given, defaults to 'rightward'.

  • filter

    Defines a filter for the segments that the rule applies on. Must be a code reference that returns a truth value.

  • linguistic

    Defines a linguistic-style rule to be parsed. When you provide a linguistic-style rule, it is parsed into code references that take the place of the where and do properties listed below. The format of linguistic rules is described in "LINGUISTIC-STYLE RULES" in Lingua::Phonology::FileFormatPOD.

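  • where

    Defines the condition or conditions where the rule applies. Must be a code reference that returns a truth value. If no value is given, defaults to always true.

  • do

    Defines the action to take when the where condition is met. Must be a code reference. If no value is given, does nothing.

  • result

    Defines a condition that is checked after each potential application of do. Must be a code reference that returns a truth value. If it returns false, the rule is undone. See "Overview of the rule algorithm".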
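As a rough illustration of the defaults listed above, the following sketch fills in a rule's parameters the way this list describes them: direction defaults to 'rightward', where to always true, and do to doing nothing. The helper normalize_rule() is hypothetical and is not part of Lingua::Phonology::Rules. It only models the documented defaults on a plain hash.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::Phonology::Rules): fill in the
# documented defaults for a rule's parameters and reject a bad direction.
sub normalize_rule {
    my (%p) = @_;
    $p{direction} //= 'rightward';    # documented default
    die "direction must be 'leftward' or 'rightward'\n"
        unless $p{direction} eq 'leftward' || $p{direction} eq 'rightward';
    $p{where} //= sub { 1 };          # default: always true
    $p{do}    //= sub { };            # default: do nothing
    return \%p;
}

my $rule = normalize_rule(tier => 'nasal');
print "direction: $rule->{direction}\n";    # direction: rightward
```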

Lingua::Phonology::Rules is flexible and powerful enough to handle any sequential type of rule system. It cannot handle Optimality Theory-style processes, because those require a fundamentally different kind of algorithm.

METHODS

new

Returns a new Lingua::Phonology::Rules object. This method accepts no arguments.

add_rule

Adds one or more rules to the list. Takes a series of key-value pairs, where the keys are the names of rules to be added, and the values are hashrefs. Any of the parameters mentioned above may be used, so a single rule has the following maximal structure:

        'Name of Rule' => {
            domain    => 'some_feature',
            tier      => 'some_other_feature',
            direction => 'rightward', # Can only be 'rightward' or 'leftward'
            filter    => \&filter,
            where     => \&where,
            do        => \&do_this,
            result    => \&result
        }

If you are using a linguistic rule, then the where and do keys are unnecessary and should not be used. In that case, the rule has the following maximal structure:

    'Linguistic Rule' => {
        domain     => 'some_feature',
        tier       => 'some_other_feature',
        direction  => 'rightward',
        filter     => \&filter,
        linguistic => '[foo] => [bar] / [baz]',
        result     => \&result
    }

A detailed explanation of how to use these to make useful rules is in "WRITING RULES". A typical call to add_rule might look like what follows. Assume that 'nasal' and 'SYLLABLE' are defined in the feature set you're using, and that nasalized() and denasalize() are subroutines defined elsewhere.

        $rules->add_rule(
                Denasalization => {
                        tier => 'nasal',
                        domain => 'SYLLABLE',
                        direction => 'rightward',
                        where => \&nasalized,
                        do => \&denasalize
                }
        );

This method returns true if all rules were added successfully, otherwise false. If a rule already exists with the name you're attempting to add, it is first dropped.

drop_rule

    $rules->drop_rule('Rule');

Takes one argument, the name of a rule, and removes that rule. Returns the hash reference of the properties of that rule, or undef if no such rule actually existed.

change_rule

    $rules->change_rule(
        Denasalization => {
            tier => undef,
            where => undef,
            filter => \&nasalized
        }
    );

This method is exactly like add_rule(), except that it may be used to change parameters on an existing rule. If the method call given above were used after the one shown for add_rule(), then the 'Denasalization' rule would be changed to have no tier or 'where' condition, but to have a filter defined by the subroutine nasalized. The other properties of the rule would be unchanged. If you attempt to use change_rule() with a rule that does not yet exist, you will get an error.

Returns true if all changes succeed, otherwise false.
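The merging behaviour of change_rule() can be pictured with a plain hash of rules. In this sketch, change_rule() and the %rules layout are hypothetical stand-ins, not the module's internals. Keys that are passed overwrite the stored property, and an explicit undef clears it. Other keys are left alone, and an unknown rule name dies.

```perl
use strict;
use warnings;

# A plain hash standing in for the rule table (hypothetical data layout).
my %rules = (
    Denasalization => {
        tier  => 'nasal', domain => 'SYLLABLE', direction => 'rightward',
        where => sub { 1 }, do => sub { },
    },
);

# Hypothetical change_rule: overwrite only the keys that are passed in;
# an explicit undef clears a property. Unknown rules are an error.
sub change_rule {
    my (%changes) = @_;
    while (my ($name, $props) = each %changes) {
        die "No such rule '$name'\n" unless exists $rules{$name};
        $rules{$name}{$_} = $props->{$_} for keys %$props;
    }
    return 1;
}

my $nasalized = sub { 1 };    # placeholder filter
change_rule(Denasalization => { tier => undef, where => undef, filter => $nasalized });
```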

loadfile

    $rules->loadfile('phono.xml');

Loads rule definitions from a file. Returns true on success, false on failure. This feature is new as of v0.3, and comes with new capability for reading rules in a readable linguistic format. This is far too complex to describe here: please consult Lingua::Phonology::FileFormatPOD for details.

clear

    $rules->clear;

Resets the Lingua::Phonology::Rules object by deleting all rules and all rule ordering.

tier

See below.

domain

See below.

direction

See below.

filter

See below.

where

See below.

do

See below.

result

All of the above methods behave identically. They may take one or two arguments. The first argument is the name of a rule. If only one argument is given, then these return the property of the rule that they name. If two arguments are given, then they set that property to the second argument. For example:

        $rules->tier('Rule');                 # Returns the tier
        $rules->tier('Rule', 'feature');      # Sets the tier to 'feature'
        $rules->domain('Rule');               # Returns the domain
        $rules->domain('Rule', 'feature');    # Sets the domain to 'feature'
        # Etc.
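A get/set accessor of this kind is a common Perl idiom. The sketch below generates one accessor per property on a hypothetical MiniRules class. It is meant to show the one-argument and two-argument behaviour described above, not how Lingua::Phonology::Rules itself is written.

```perl
package MiniRules;    # hypothetical stand-in class, not Lingua::Phonology::Rules
use strict;
use warnings;

sub new { return bless { rules => {} }, shift }

# Generate one get/set accessor per rule property.
for my $prop (qw(tier domain direction filter where do result)) {
    no strict 'refs';
    *{"MiniRules::$prop"} = sub {
        my ($self, $name, @value) = @_;
        $self->{rules}{$name}{$prop} = $value[0] if @value;    # two args: set
        return $self->{rules}{$name}{$prop};                   # one arg: get
    };
}

package main;

my $rules = MiniRules->new;
$rules->tier('Rule', 'feature');     # sets the tier
print $rules->tier('Rule'), "\n";    # prints "feature"
```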

apply

    $rules->apply('Denasalization', \@word);

Applies a rule to a "word". The first argument to this function is the name of a rule, and the second argument is a reference to an array of Lingua::Phonology::Segment objects. apply() will take the rule named and apply it to each segment in the array, after doing some appropriate magic with the tiers and the domains, if specified. For a full explanation of how apply() works and how to exploit it, see "WRITING RULES" below.

As of v0.2, the return value of apply() is an array with the modified contents of the array that was passed as a reference in the call to apply(). Thus, the return value of the rule above, if it were captured, would be the same as the contents of @word after apply() was called.

This method will set count, clobbering any earlier value. See "count" below.

Applying rules by name

You may also call rule names themselves as methods, in which case the only needed argument is an array reference to the word. Thus, the following is exactly identical to the preceding example:

        $rules->Denasalization(\@word);
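One common Perl technique for calling names that are not defined as methods is AUTOLOAD. This page does not say whether Lingua::Phonology::Rules uses AUTOLOAD, so treat the following as an assumed illustration. A hypothetical NamedRules class forwards any unknown method to apply(). This also means that a misspelled rule name fails at run time rather than at compile time.

```perl
package NamedRules;    # hypothetical class, not the module's implementation
use strict;
use warnings;

our $AUTOLOAD;

sub new   { return bless { rules => {} }, shift }
sub add   { my ($self, $name, $code) = @_; $self->{rules}{$name} = $code }
sub apply {
    my ($self, $name, $word) = @_;
    my $code = $self->{rules}{$name} or die "No such rule '$name'\n";
    return $code->($word);
}

# Turn $obj->SomeRule(\@word) into $obj->apply('SomeRule', \@word).
sub AUTOLOAD {
    my $self = shift;
    (my $name = $AUTOLOAD) =~ s/.*:://;
    return if $name eq 'DESTROY';    # never treat object cleanup as a rule
    return $self->apply($name, @_);
}

package main;

my $rules = NamedRules->new;
$rules->add(Reverse => sub { reverse @{ $_[0] } });
my @out = $rules->Reverse([qw(b a n d)]);
print "@out\n";    # d n a b
```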

apply_all

    $rules->apply_all(\@word);

When used with persist() and order(), this method can be used to apply all rules to a word with one call. The argument to this method should be a reference to an array of Segment objects, just as with apply().

Calling apply_all() applies the rules in the order specified by order(), applying the rules in persist() before and after every block of rules given to order(). Rules that are part of the current object but which aren't specified in order() or persist() are not applied. See "order" and "persist" for details on those methods.

For example, say you had the following code:

    $rules->persist('Persist 1', 'Persist 2');
    $rules->order(['A-1', 'A-2', 'A-3'], 'B', ['C-1', 'C-2']);
    $rules->apply_all(\@word);

When you call apply_all, the rules would be applied in this order:

    Persist 1
    Persist 2
    A-1
    A-2
    A-3
    Persist 1
    Persist 2
    B
    Persist 1
    Persist 2
    C-1
    C-2
    Persist 1
    Persist 2
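The order above can be derived mechanically from the two lists. The hypothetical schedule() function below sketches that derivation only. It returns rule names rather than applying anything.

```perl
use strict;
use warnings;

# Hypothetical helper: list the order in which apply_all() would run rules,
# given the persistent rules and the blocks passed to order().
sub schedule {
    my ($persist, @blocks) = @_;
    my @run = @$persist;                                 # persistent rules first
    for my $block (@blocks) {
        my @names = ref $block ? @$block : ($block);     # a string is a one-rule block
        push @run, @names, @$persist;                    # block, then persistent rules
    }
    return @run;
}

my @run = schedule(['Persist 1', 'Persist 2'],
                   ['A-1', 'A-2', 'A-3'], 'B', ['C-1', 'C-2']);
print "$_\n" for @run;
```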

In v0.2, the return value of apply_all() has changed (again). Now, apply_all() always returns a hash reference whose keys are the names of rules and whose values are the number of times that those rules were applied. This is the same thing that count() returns after a call to apply_all(). See "count" below.

order

    $rules->order(['A-1', 'A-2', 'A-3'], 'B', ['C-1', 'C-2']);

If called with no arguments, returns an array of the current order in which rules apply when calling apply_all(). If called with one or more arguments, this sets the order in which rules apply.

The arguments to order() should be array references or strings. If you pass an array reference, the elements in the array should be strings that are the names of rules. A string is interpreted as an array reference of one element. When "apply_all" is called, all rules that are bundled together in one array will be applied, then the persistent rules will be applied, as described above.

Any strings that you pass will be converted to single-element array references when they are returned. Calling this:

    $rules->order(1, 2, 3);

actually returns this:

    ([1], [2], [3])
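The normalization described above amounts to wrapping every plain string in an array reference. The helper normalize_order() below is hypothetical. Copying incoming array references instead of reusing them is a choice made for this sketch, not something the module documents.

```perl
use strict;
use warnings;

# Hypothetical sketch: every plain string becomes a one-element array
# reference; array references are copied.
sub normalize_order {
    return map { ref($_) eq 'ARRAY' ? [@$_] : [$_] } @_;
}

my @order = normalize_order(1, 2, 3);
# @order is ([1], [2], [3])
```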

persist

    $rules->persist('Persist 1', 'Persist 2');

If called with no arguments, returns an array of the current order in which persistent rules apply when calling apply_all(). Persistent rules are applied at the beginning and end of rule processing and between every block of rules in the middle. Calling this with one or more arguments replaces the existing list of persistent rules. Unlike order(), persist() should not be called with array reference arguments.

count

After a call to apply() or apply_all(), this method can be used to find out how many times the rule was applied. After apply(), the return value of this method will be an integer. After apply_all(), the return value of this method will be a hash reference, the keys of which are the rules that were applied, and the values of which are the number of times that those rules applied. Whatever value is there will be clobbered in the next call to apply() or apply_all(), so get it while you can.

WRITING RULES

Overview of the rule algorithm

The details of the algorithm, of course, are the module's business. But here's a general overview of what goes on in the execution of a rule:

  • The segments of the input word are broken up into domains, if a domain is specified. This is discussed in "Using domains".

  • The segments of each domain are taken and the tier, if there is one, is applied to them. This generally reduces the number of segments being evaluated. Details of this process are discussed below in "Using tiers".

  • The segments remaining after the tier is applied are passed through the filter. Segments for which the filter evaluates to true are passed on to the executer.

  • Executing the rule involves examining every segment in turn and deciding if the criteria for applying the rule, defined by the where property, are met. If so, the action defined by do is performed. If the direction of the rule is specified as "rightward", then the criterion-checking and rule execution begin with the leftmost segment and proceed to the right. If the direction is "leftward", the opposite occurs: focus begins on the rightmost segment and proceeds to the left.

  • If a result is specified, after each potential application of the do code, the result condition will be checked. If that condition is true, the rule application goes on to the next segment. If the result condition is false, then the rule is "undone", leaving the input word exactly the way that it was before.

The crucial point is that the rule mechanism has focus on one segment at a time, and that this focus proceeds across each available segment in turn. Criterion checking and execution are done for every segment. According to the order given above, where and do are almost the last things to be executed, but they're the most fundamental, so we'll examine them first.
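The filtering step and the direction-ordered visit of each segment can be modelled in a few lines. This run_rule() is a simplified, hypothetical model. It passes where and do only the focus segment, whereas the module passes the rotated list described under "Using 'where' and 'do'". It also ignores domains, tiers and result.

```perl
use strict;
use warnings;

# Hypothetical, simplified model of the executor: filter once, then visit the
# surviving segments in the rule's direction, running `do` wherever `where`
# is true. Only the focus segment is passed to where/do here.
sub run_rule {
    my ($rule, @segments) = @_;
    my $filter = $rule->{filter} // sub { 1 };
    my $where  = $rule->{where}  // sub { 1 };
    my $do     = $rule->{do}     // sub { };
    my @visible = grep { $filter->($_) } @segments;
    my @order   = 0 .. $#visible;
    @order = reverse @order if ($rule->{direction} // 'rightward') eq 'leftward';
    for my $i (@order) {
        $do->($visible[$i]) if $where->($visible[$i]);
    }
    return;
}

# Record which segments receive focus, and in what order.
my @word = map { +{ spell => $_ } } qw(b a n d);
my @seen;
run_rule({ direction => 'leftward',
           do        => sub { push @seen, $_[0]{spell} } }, @word);
print "@seen\n";    # d n a b
```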

Using linguistic rules

The linguistic rule format is a powerful way to write out phonological processes that is often easier to write and understand than using pure Perl. This format is described in "LINGUISTIC-STYLE RULES" in Lingua::Phonology::FileFormatPOD. When you include a linguistic rule, it replaces the where and do properties, but the other properties may still exist.

Using 'where' and 'do'

If you are not using a linguistic rule, the actual criteria and execution are done by the coderefs that you supply. So you have to know how to write reasonable criteria and actions.

Lingua::Phonology::Rules will pass an array of segments to both of the coderefs that you give it. This array of segments will be arranged so that the segment that currently has focus will be at index 0, the following segment will be at 1, and the preceding segment at -1, etc. The ends of the "word" (or domain, if you're using domains) are indicated by special segments that have the feature BOUNDARY, and no other features.

For example, let's say we had applied a rule to a simple four-segment word as in the following example:

        $rules->apply('MyRule', [$b, $a, $n, $d]);

If MyRule applies rightward and there are no tiers or domains, then the contents of @_ will be as follows on each of the four turns. Boundary segments are indicated by '_B_':

                 $_[-2]   $_[-1]   $_[0]   $_[1]   $_[2]   $_[3]
        
        turn 1    _B_      _B_      $b      $a      $n      $d
        turn 2    _B_      $b       $a      $n      $d      _B_
        turn 3    $b       $a       $n      $d      _B_     _B_
        turn 4    $a       $n       $d      _B_     _B_     $b

This makes it easy and intuitive to refer to things like 'current segment' and 'preceding segment'. The current segment is $_[0], the preceding one is $_[-1], the following segment is $_[1], etc.
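The table is consistent with padding the word with two boundary markers, treating the padded list as circular, and rotating it so that the focus lands at index 0. The sketch below reproduces the table that way. This is an inference from the table, not a description of the module's source. The strings '_B_' stand in for boundary segments.

```perl
use strict;
use warnings;

# Sketch inferred from the table (not taken from the module's source):
# pad the word with two boundary markers, treat the padded list as circular,
# and rotate it so the focus segment lands at index 0.
sub views {
    my @word   = @_;
    my @padded = (@word, '_B_', '_B_');
    my @views;
    for my $focus (0 .. $#word) {
        push @views, [ @padded[$focus .. $#padded], @padded[0 .. $focus - 1] ];
    }
    return @views;
}

my @views = views(qw(b a n d));
for my $turn (0 .. $#views) {
    my $v = $views[$turn];
    # columns: $_[-2] $_[-1] $_[0] $_[1] $_[2] $_[3], as in the table
    print join("\t", "turn " . ($turn + 1), @$v[-2, -1, 0 .. 3]), "\n";
}
```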

It's true that if the focus is on the first segment of the word, $_[-3] refers to the last segment of the word. So be careful. Besides, you should rarely, if ever, need to refer to something that far away. If you think you do, then you're probably better off using a tier or filter.

Boundary segments themselves are impervious to any attempt to alter or delete them. However, there is nothing that prevents you from setting some other segment to be a boundary, which will do very strange and probably undesirable things. Don't say I didn't warn you.

Using our same example, then, we could write a rule that devoices final consonants very easily.

        # Create the rule with two simple code references
        $final = sub { $_[1]->BOUNDARY };
        $devoice = sub { $_[0]->delink('voice') };
        $rules->add_rule(FinalDevoicing => { where => $final,
                                             do    => $devoice });
        
        @word = $symbols->segment('b', 'a', 'n', 'd');
        $rules->FinalDevoicing(\@word);
        print $symbols->spell(@word); # Prints 'bant'

It is recommended that you follow the intent of the design: use the 'where' property only to check conditions, and use the 'do' property to actually effect changes. We have no way of enforcing this, however.

Note that, since the code in 'where' and 'do' simply operates on a local subset of the segments that you provided as the word, doing something like delete($_[0]) doesn't have any effect. Neither does adding segments to @_ do anything. To properly perform insertion and deletion, see "Writing insertion and deletion rules" below.
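The reason is ordinary Perl behaviour. @_ is an array of its own, so splicing it or pushing onto it changes only @_ and never the caller's array. The following plain-Perl demonstration uses strings to stand in for segments.

```perl
use strict;
use warnings;

my @word = qw(b a n d);

# A rule body that tries to delete and insert by editing @_ directly.
my $do = sub {
    splice @_, 0, 1;     # "delete" the focus segment from @_
    push @_, 'x';        # "insert" a new segment into @_
    return scalar @_;    # @_ itself did change: 4 elements again
};

my $seen = $do->(@word);
print "@word\n";         # b a n d (the caller's array is unchanged)
```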

Using domains

Domains change the segments that are visible to your rules by splitting the word given into parts.

The value for a domain is the name of a feature. If the domain property is specified for a rule, the input word given to the rule will be broken into groups of segments whose values for that feature are references to the same value. For the execution of the rule, those groups of segments act as complete words with their own boundaries. For example:

        @word = $symbols->segment('b','a','r','d','a','m');

        # We make two groups of segments whose SYLLABLE features are all references
        # to the same value. Note that something very much like this is done
        # automatically with the Lingua::Phonology::Syllable module.
        #
        # Syllable 1
        $word[0]->SYLLABLE(1);
        $word[1]->SYLLABLE($word[0]->value_ref('SYLLABLE'));
        $word[2]->SYLLABLE($word[0]->value_ref('SYLLABLE'));

        # Syllable 2
        $word[3]->SYLLABLE(1);
        $word[4]->SYLLABLE($word[3]->value_ref('SYLLABLE'));
        $word[5]->SYLLABLE($word[3]->value_ref('SYLLABLE'));

        # Now we make a rule to drop the last consonant in any syllable
        $rules->add_rule(
            'Drop Final C' => {
                domain => 'SYLLABLE',
                where  => sub { $_[1]->BOUNDARY },
                do     => sub { $_[0]->DELETE }
            }
        );

        $rules->apply('Drop Final C', \@word);
        print $symbols->spell(@word); # Prints 'bada'

In this example, if we hadn't specified the domain 'SYLLABLE', only the /m/ would have been deleted, because only the /m/ would have been at a boundary. With the SYLLABLE domain, however, the input word is broken up into the two syllables, which act as their own words with respect to boundaries.
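Domain splitting can be sketched with plain hashes. Here each segment's SYLLABLE entry is a scalar reference, and two segments are in the same domain when those references are identical (compared with ==). The domains() helper is hypothetical. It assumes that the segments of one domain are adjacent and that every segment has the feature defined.

```perl
use strict;
use warnings;

# Sketch (an assumption, not the module's code): group consecutive segments
# whose domain feature holds the *same reference*; each group is then treated
# as a word of its own. Segments are plain hashes here.
sub domains {
    my ($feature, @word) = @_;
    my @groups;
    for my $seg (@word) {
        my $ref = $seg->{$feature};
        if (@groups && $groups[-1][0]{$feature} == $ref) {
            push @{ $groups[-1] }, $seg;    # same reference: same domain
        }
        else {
            push @groups, [$seg];           # new reference: new domain
        }
    }
    return @groups;
}

# Two syllables for /bardam/: equal values, but two distinct references.
my ($v1, $v2) = (1, 1);
my @word = map { +{ spell => $_->[0], SYLLABLE => $_->[1] } }
    ['b', \$v1], ['a', \$v1], ['r', \$v1],
    ['d', \$v2], ['a', \$v2], ['m', \$v2];

# "Drop Final C": drop the segment that sits just before a boundary.
my @kept;
for my $group (domains('SYLLABLE', @word)) {
    my @g = @$group;
    pop @g;                                 # its right neighbour is a boundary
    push @kept, @g;
}
print join('', map { $_->{spell} } @kept), "\n";    # bada
```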

Using tiers

Many linguistic rules behave transparently with respect to some segments or classes of segments. Within the Rules class, this is accomplished by setting the "tier" property of a rule.

The argument given to a tier is the name of a feature. When you specify a tier for a rule and then apply that rule to an array of segments, the rule will only apply to those segments that are defined for that feature. Note that I said 'defined'--binary or scalar features that are set to 0 will still appear on the tier.

This is primarily useful for defining rules that apply across many intervening segments. For example, let's say that you have a vowel harmony rule that applies across any number of intervening consonants. The best solution is to specify that the rule has the tier 'vocoid'. This will cause the rule to completely ignore all non-vocoids: non-vocoids won't even appear in the array that the rule works on. For example:

        # Make a rather contrived word
        @word = $symbols->segment('b','u','l','k','t','r','i');

Note that if we were doing this without tiers, we would have to specify $_[5] to see the final /i/ from the /u/. No such nonsense is necessary when using the 'vocoid' tier, because the only segments that the rule "sees" are ('u','i'). Thus, the following rule spreads frontness from right to left.

        # Make the rule, being sure to specify the tier
        $rules->add_rule(
            VowelHarmony => {
                tier      => 'vocoid',
                direction => 'leftward',

                # We specify that the last vowel in a word should never change
                where => sub { not $_[1]->BOUNDARY },

                # All vowels before the last copy the front/backness of the vowel
                # after them. Front/back position is dominated by the 'Lingual'
                # node, so we just copy the whole node.
                do => sub { $_[0]->Lingual( $_[1]->value_ref('Lingual') ) }
            }
        );

        # Apply the rule and print out the result
        $rules->VowelHarmony(\@word);
        print $symbols->spell(@word); # prints 'bylktri'

Tiers include one more bit of magic. When you define a tier, if consecutive segments have references to the same value for that tier, Lingua::Phonology::Rules will combine them into one segment. Once such a segment is constructed, you can assign or test values for the tier feature itself, or any features that are children of the tier (if the tier is a node). Assigning or testing other values will generally fail and return undef, but it may succeed if the return values of the assignment or test are the same for every segment. Be careful.

This (hopefully) makes linguistic sense--if you're using the tier 'SYLLABLE', what you're really interested in are interactions between whole syllables. So that's what you see in your rule: "segments" that are really syllables and include all of the true segments inside them.

When using domains and tiers together, the word is broken up into domains before the tier is applied. Thus, two segments which might otherwise have been combined into a single pseudo-segment on a tier will not be combined if they fall into different domains.
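The following sketch uses plain hashes to show the two tier effects described above. Segments whose tier feature is undefined are dropped. Adjacent survivors that hold the same reference are merged into a single pseudo-segment, represented here as an array of segments. The tier() helper is hypothetical, and merging only adjacent survivors is an assumption about what "consecutive" means.

```perl
use strict;
use warnings;

# Sketch (an assumption, not the module's code) of how a tier narrows a word:
# keep segments whose tier feature is *defined* (0 counts), then merge
# neighbouring survivors that hold the same reference into one pseudo-segment.
sub tier {
    my ($feature, @segments) = @_;
    my @tier;
    for my $seg (grep { defined $_->{$feature} } @segments) {
        my $val  = $seg->{$feature};
        my $prev = @tier ? $tier[-1][0]{$feature} : undef;
        if (ref $val && ref $prev && $val == $prev) {
            push @{ $tier[-1] }, $seg;    # same value reference: merge
        }
        else {
            push @tier, [$seg];           # starts a new pseudo-segment
        }
    }
    return @tier;
}

# /bulktri/ on a 'vocoid' tier; the 0 on /l/ is only there to show that a
# defined-but-zero value keeps a segment on the tier.
my @word = (
    { spell => 'b' }, { spell => 'u', vocoid => 1 }, { spell => 'l', vocoid => 0 },
    { spell => 'k' }, { spell => 't' }, { spell => 'r' }, { spell => 'i', vocoid => 1 },
);
my @vocoid = tier('vocoid', @word);
print join(' ', map { $_->[0]{spell} } @vocoid), "\n";    # u l i

# Three segments sharing one SYLLABLE reference become one pseudo-segment.
my $syl = 1;
my @syllables = tier('SYLLABLE', map { +{ spell => $_, SYLLABLE => \$syl } } qw(b a r));
print scalar(@syllables), "\n";                            # 1
```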

Using filters

Filters are a more flexible, but less magical, way of doing the same thing that a tier does. You define a filter as a code reference, and all of the segments in the input word are put through that code before going on to the rule execution. Your code reference should accept a single Lingua::Phonology::Segment object as an argument and return some sort of truth value that determines whether the segment should be included.

A filter is a little like a tier and a little like a where, so here's how it differs from both of those:

  • Unlike a tier, the filter property is a code reference. That means that your test can be arbitrarily complex, and is not limited to simply testing for whether a property is defined, which is what a tier does. On the other hand, there is no magical combination of segments with a tier.

  • Also, the rule algorithm takes the filter and goes over the whole word with it once, picking out those segments that pass through the filter. It then hands the filtered list of segments to be evaluated by where and do. A where property, on the other hand, is evaluated for each segment in turn, and if the where evaluates to true, the do code is immediately executed.

Filters are primarily useful when you want to only see segments that meet a certain binary or scalar feature value, or when you want to avoid the magical segment-joining of a tier.
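Logging each call shows the difference in evaluation order between a filter and a where. This plain-Perl sketch assumes that the executor works as described above: one filtering pass, then where and do for each segment in turn. The closures and strings are illustrative only.

```perl
use strict;
use warnings;

# Plain-Perl sketch of the evaluation order described above (an assumed
# simplification of the executor): a filter runs over the whole word first,
# while `where` and `do` alternate segment by segment.
my @log;
my @word   = qw(b a n d);
my $filter = sub { push @log, "filter:$_[0]"; 1 };
my $where  = sub { push @log, "where:$_[0]";  1 };
my $do     = sub { push @log, "do:$_[0]" };

my @visible = grep { $filter->($_) } @word;    # one pass, up front
for my $seg (@visible) {
    $do->($seg) if $where->($seg);             # checked and run per segment
}
print "$_\n" for @log;
```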

Writing insertion and deletion rules

The arguments provided to the coderefs in where and do are in a simple list, which means that it's not really possible to insert and delete segments in the word from the coderef. Segments added or deleted in @_ will disappear once the subroutine exits. Lingua::Phonology::Rules provides a workaround for both of these cases.

Deletion is accomplished by calling the special method DELETE() on the segment to be deleted. A rule deleting coda consonants can be written thus:

        # Assume that we have already assigned coda consonants to have the
        # feature 'coda'
        $rules->add_rule(
            DeleteCodaC => {
                where => sub { $_[0]->coda },
                do    => sub { $_[0]->DELETE }
            }
        );

In previous versions of Lingua::Phonology::Rules, deletion was accomplished by calling clear() on the segment. This still works--if you call clear(), your segment will also be deleted from output. However, using DELETE has an advantage over using clear(), namely that if you call clear() on a segment, any other copies of the segment will also have their features cleared. When you call DELETE, only the copy of the segment in the rule is dropped, while other copies of the segment are unaffected.
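The distinction between clear() and DELETE mirrors ordinary Perl reference semantics. In this analogy, which is not the module's code, emptying a shared hash is visible through every list that holds it. Removing the hash from one list leaves the other lists untouched.

```perl
use strict;
use warnings;

# Plain-Perl analogy (not the module's code): two lists holding the same
# segment reference.
my $seg   = { spell => 'n', nasal => 1 };
my @word  = ($seg);
my @saved = ($seg);    # another copy of the same segment

# Like DELETE: drop the segment from this word only.
my @after_delete = grep { $_ != $seg } @word;
print scalar(keys %{ $saved[0] }), "\n";    # 2: the copy is untouched

# Like clear(): empty the shared segment itself.
%$seg = ();
print scalar(keys %{ $saved[0] }), "\n";    # 0: the copy was cleared too
```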

Insertion can be accomplished using the special methods INSERT_RIGHT() and INSERT_LEFT() on a segment. The argument to INSERT_RIGHT() or INSERT_LEFT() must be a Lingua::Phonology::Segment object, which will be added to the right or the left of the segment on which the method is called. For example, the following rule inserts a schwa to the left of a segment that is unsyllabified (does not have its SYLL feature set):

        $rules->add_rule(
            Epenthesize => {
                where => sub { not $_[0]->SYLL },
                do    => sub { $_[0]->INSERT_LEFT($symbols->segment('@')) }
            }
        );

Note that the methods DELETE(), INSERT_RIGHT() and INSERT_LEFT() don't exist except during the application of a rule.

When the segments you insert or delete (dis)appear depends on the settings for the rule. When a domain is in effect, segments are not added to or removed from the working copy of the word until the current rule exits. In all other situations, the segments appear/disappear "immediately", which means "as soon as the current iteration of the rule finishes." For example, consider these rules:

    $rules->add_rule(
        Instant => {
            where => sub { $_[0]->spell eq 's' },
            do => sub { $_[0]->INSERT_RIGHT($symbols->segment('i')) }
        },
        Delayed => {
            where => sub { $_[0]->spell eq 's' },
            do => sub { $_[0]->INSERT_RIGHT($symbols->segment('i')) },
            domain => 'SYLL'
        }
    );

    @word = $symbols->segment(split //, 'kasta');

When the rule 'Instant' is applied to @word, the inserted 'i' appears as soon as the code reference that inserts it finishes. After focus moves off of the 's', it moves onto the 'i' that was inserted. When the rule 'Delayed' is applied, however, the 'i' does not appear immediately because a domain exists for the rule, and focus moves from the 's' onto the 't'. The inserted 'i' does not appear until the whole rule finishes.

This behavior is necessary for several reasons. It is generally desirable to have segments appear as soon as possible. However, when a domain is in effect the calculation time for rebuilding the domains is prohibitive. Additionally, the insertion of a segment can move domain boundaries and have bizarre and unpredictable effects. For these reasons, segment insertion/deletion is delayed when domains are used.
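The two timings can be modelled on a list of strings. The hypothetical insert_after_s() below records which segments receive focus. It either splices the new segment in immediately, so that the next iteration sees it, or queues it and splices it in after the pass. This is a sketch of the behaviour described above, not the module's implementation.

```perl
use strict;
use warnings;

# Simplified model (an assumption, not the module's code) of the two timings:
# "immediate" splices the new segment in before focus moves on; "delayed"
# queues it and splices everything in after the whole pass.
sub insert_after_s {
    my ($delayed, @word) = @_;
    my (@focus, @pending);
    for (my $i = 0; $i < @word; $i++) {
        push @focus, $word[$i];
        next unless $word[$i] eq 's';
        if ($delayed) { push @pending, $i }
        else          { splice @word, $i + 1, 0, 'i' }
    }
    # Insert queued segments from right to left so earlier indices stay valid.
    splice @word, $_ + 1, 0, 'i' for reverse @pending;
    return (join('', @word), join('', @focus));
}

my ($instant, $instant_focus) = insert_after_s(0, split //, 'kasta');
my ($delayed, $delayed_focus) = insert_after_s(1, split //, 'kasta');
print "$instant visited $instant_focus\n";    # kasita visited kasita
print "$delayed visited $delayed_focus\n";    # kasita visited kasta
```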

You CANNOT insert or delete segments when a tier is in effect. Such rules are usually nonsensical, since a tier encapsulates several segments, and it's impossible to know how or where to insert a new segment. If you attempt to call INSERT_RIGHT, INSERT_LEFT, or DELETE while a tier is in effect, you will get a warning and the call will be ignored.

Much of this behavior is new as of v0.32. Earlier versions were not nearly as consistent or predictable with respect to insertion or deletion.

Developer goodies

There are a couple of things here that are probably of no use to the average user, but have come in handy when developing code for other modules or scripts to use. And who knows, you may have a use for them.

All segments have the method _RULE during the execution of a rule. This method returns a hash reference that has keys corresponding to the properties of the currently executing rule. These properties include do, where, domain, tier, direction, etc. If for some reason you need to know one of these during the execution of a rule, you can use this to do so. Note that altering the hash reference will NOT alter the actual properties of the current rule.

Here's a silly example:

        sub print_direction {
            print $_[0]->_RULE->{direction}, "\n";
        }

        $rules->add_rule(
            PrintLeft => {
                direction => 'leftward',
                do        => \&print_direction
            },
            PrintRight => {
                direction => 'rightward',
                do        => \&print_direction
            }
        );

        $rules->PrintLeft(\@word);    # Prints 'leftward' several times
        $rules->PrintRight(\@word);   # Prints 'rightward' several times
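Handing out a shallow copy of the rule's properties would explain why altering the _RULE hash does not change the rule. The text does not say how the module achieves this. The following is a minimal sketch of that idea, assuming that a copy is used.

```perl
use strict;
use warnings;

# Sketch (assumption): code running inside a rule gets a shallow copy of the
# rule's properties, so edits to that copy cannot change the stored rule.
my %rule = (direction => 'leftward', tier => undef);

my $view = { %rule };                # what code running inside the rule might see
$view->{direction} = 'rightward';    # edit the copy only

print "$rule{direction}\n";          # still "leftward"
```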

BUGS

When you call clear() during the execution of a rule, the segment is deleted even if you restore some feature values to it immediately afterwards.

There are no diagnostics for finding syntax errors in linguistic rules.

The documentation is confusing and poorly written.

AUTHOR

Jesse S. Bangs <jaspax@cpan.org>

LICENSE

This module is free software. You can distribute and/or modify it under the same terms as Perl itself.