Locale::MakePhrase::RuleManager - Language rule sort and evaluation object.
The Locale::MakePhrase module uses this plugin module to implement the evaluation and sorting phases of text selection. This document explains the rule expression syntax and the evaluation procedure.
It sorts the language rules so that they can be evaluated in order and the first rule that succeeds can be selected.
It evaluates the program arguments against the expressions in the rule.
To allow an argument to be placed within the middle of a string, we use square brackets as the notation, plus an underscore, then the argument number, as in:
"Please select [_1] files"
where [_1] refers to the first program variable supplied as an argument with the text to be translated.
To display square brackets within the text string, you will need to escape the square bracket by using the ~ (tilde) character, as in:
"This is ~[ bracketed text ~]"
this will print:
This is [ bracketed text ]
Of course, if you need to display the ~ character, you will need to use two of them, as in:
"Tilde needs escaping as in ~~"
which ends up printing:
Tilde needs escaping as in ~
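The placeholder and escape notation above can be sketched in a few lines of Python. This is an illustration only, not the module's Perl implementation; the helper name render and its regular expression are assumptions made for this example.

```python
import re

# Hypothetical helper, not part of Locale::MakePhrase: resolves "~" escapes
# and expands [_N] placeholders in a single left-to-right pass.
_TOKEN = re.compile(r"~(.)|\[_(\d+)\]", re.DOTALL)

def render(text, *args):
    def expand(match):
        if match.group(1) is not None:
            # "~[" becomes "[", "~]" becomes "]", "~~" becomes "~"
            return match.group(1)
        # "[_N]" becomes the N-th argument (argument numbers are 1-based)
        return str(args[int(match.group(2)) - 1])
    return _TOKEN.sub(expand, text)

print(render("Please select [_1] files", 3))
print(render("This is ~[ bracketed text ~]"))
print(render("Tilde needs escaping as in ~~"))
```

Handling the escape and the placeholder in one pattern means an escaped bracket is consumed before it can be mistaken for the start of a placeholder.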
We have coined the term linguistic rules as a means to describe the technique which decides which piece of text is displayed, for a given input text phrase and any/all program arguments.
To understand why we need to generate linguistic rules, consider the 'singular vs. plural' example shown in the "REQUIREMENTS" section of Locale::MakePhrase.
In this example, we needed four different text strings, for the trivial case of what to display for a given program value.
For other examples, the URLs mentioned in that section describe why there is a need for applying rules on a per-language basis (they also describe why most current language translation systems fail).
A linguistic rule is the evaluation of the context of a phrase by using program arguments, for a given program string. The arguments are evaluated left-to-right and top-to-bottom. The first rule to succeed has its corresponding translated text applied in place of the input text.
Note that if a program string takes no arguments, the rule becomes rather simplistic (in that no arguments need to be evaluated).
Rules can be tested in a number of ways. The 'Operators' and 'Functions' sections list the operators and functions available for use within each rule expression.
Previously we mentioned that the language translation system used syntax with the form [_1]. You will notice that we use an underscore in the placeholder. This may appear to be meaningless, but as we will see, we use this rule property to help understand how rules are evaluated.
Let's show an example of a simple expression:
_1 == 0
The use of the underscore signifies that this value is to be classified as an argument number and is not to be treated literally. This expression says, 'Does the first argument have a value equal to zero?'
[Note that we use double-equals; the double-equals operator will use a numeric context in the equality test.]
Since an argument can also be a string, we could define an expression to be:
_1 eq "some text"
Notice that we use a different operator depending on whether the argument is numeric or a string. This is because we need to be able to figure out what context the argument needs to be evaluated in.
[In this case we use 'eq' as the text context equality operator.]
In some cases we need to be able to specify the translated string, based on an alternate representation of the argument. This is handled by using a function. For example, you may use the term 'houses', which is the main keyword within your application.
To handle alternations of the word 'houses' (such as 'house') we can define an expression of:
left(_1,5) eq 'house'
However, in some cases we will use the terms 'apartments' or 'flats'. In these cases, we only care if the value is in the plural or singular case:
right(_1,1) eq "s"
Thus, we are provided with a set of functions which allow some manipulation of the argument value, prior to evaluation.
In many cases, more than one argument is supplied (as well as the text to translate) to Locale::MakePhrase. In those cases, an expression can be created which tests each argument, as in:
_1 == 0 && _2 != 1
As we can see here, by using && we combine multiple expression evaluations into a single rule expression. In this case the expression is effectively saying "if argument one is equal to zero AND argument two is not equal to one".
We support an unlimited number of arguments within the expression evaluation capability.
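The expression forms described so far (numeric and string comparisons, a few functions, and && to join tests) can be sketched as a small evaluator. This is a simplified illustration in Python, not the module's Perl implementation: the names evaluate and operand are hypothetical, only three of the functions are modelled, and a quoted string containing " && " or a comma inside a function call is not handled.

```python
import operator
import re

NUMERIC_OPS = {"==": operator.eq, "!=": operator.ne, ">": operator.gt,
               "<": operator.lt, ">=": operator.ge, "<=": operator.le}
STRING_OPS = {"eq": operator.eq, "ne": operator.ne}

# Only a subset of the documented functions, for illustration.
FUNCTIONS = {
    "length": lambda s: len(str(s)),
    "left": lambda s, n: str(s)[:int(n)],
    "right": lambda s, n: str(s)[-int(n):] if int(n) > 0 else "",
}

# A clause is "<lhs> <operator> <rhs>", with single spaces between symbols.
CLAUSE = re.compile(r"^(\S+) (==|!=|>=|<=|>|<|eq|ne) (.+)$")
CALL = re.compile(r"^(\w+)\((.*)\)$")

def operand(token, args):
    token = token.strip()
    call = CALL.match(token)
    if call:  # function call such as left(_1,5)
        params = [operand(p, args) for p in call.group(2).split(",")]
        return FUNCTIONS[call.group(1)](*params)
    if re.fullmatch(r"_\d+", token):  # argument number such as _1
        return args[int(token[1:]) - 1]
    if token[0] in "\"'" and token[-1] == token[0]:  # quoted string
        return token[1:-1]
    return float(token)  # anything else is treated as a number

def evaluate(expression, args):
    # Every clause joined by && must hold for the expression to be true.
    for clause in expression.split(" && "):
        lhs, op, rhs = CLAUSE.match(clause.strip()).groups()
        left, right = operand(lhs, args), operand(rhs, args)
        if op in NUMERIC_OPS:
            ok = NUMERIC_OPS[op](float(left), float(right))
        else:
            ok = STRING_OPS[op](str(left), str(right))
        if not ok:
            return False
    return True

print(evaluate("_1 == 0 && _2 != 1", [0, 5]))
print(evaluate("left(_1,5) eq 'house'", ["houses"]))
print(evaluate('right(_1,1) eq "s"', ["flat"]))
```

The operator decides the context: operands of == and friends are converted to numbers, while operands of eq and ne are compared as strings.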
Consider the following expressions:
  _1 > 0    produces the output "Lots of files"
  _1 == 0   produces "No files"
  _1 == 1   produces "One file"
Each expression is valid, but if we evaluate this set of expressions in the wrong order, we will never be able to produce the text "One file", as the _1 > 0 expression would evaluate to true before we try to evaluate the _1 == 1 expression.
To counter this problem, whenever we define a rule expression (including when there is no rule expression as would be the case when no arguments are supplied), we must also define a rule priority (where a larger number gives higher priority).
Knowing this, let's re-consider the previous set of expressions, this time adding a suitable priority of evaluation for each expression:
  expression:    priority:
    _1 > 1           1
    _1 == 0          2
    _1 == 1          2
Now that we have a rule priority, we can see that the _1 == 0 expression and the _1 == 1 expression will get evaluated before the _1 > 1 expression.
You will notice that two rules have the same priority (i.e. we can have any number of rules having the same priority); in this case, the rules are evaluated in a non-deterministic (first found, first evaluated) manner. Thus it is important to make sure that each rule expression has a suitable rule priority within its rule set.
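The effect of the priority can be sketched as follows. This Python fragment is illustrative only: the rule records and the select function are hypothetical, the rule tests are written as plain Python callables rather than rule expressions, and the _1 > 0 test from the first list is used so that the ordering problem is visible.

```python
# Hypothetical rule records: (priority, test on the first argument, output text).
rules = [
    (1, lambda n: n > 0, "Lots of files"),
    (2, lambda n: n == 0, "No files"),
    (2, lambda n: n == 1, "One file"),
]

def select(rules, value):
    # A larger priority number is evaluated first. Python's sort is stable,
    # so rules with equal priority keep their input order here; the text
    # above makes no such promise for the module itself.
    ordered = sorted(rules, key=lambda rule: rule[0], reverse=True)
    for priority, test, text in ordered:
        if test(value):
            return text
    return None

def select_unordered(rules, value):
    # Without priorities the first matching rule in list order wins,
    # which hides "One file" behind the broader n > 0 test.
    for priority, test, text in rules:
        if test(value):
            return text
    return None

print(select(rules, 1))
print(select_unordered(rules, 1))
```

Rules sharing a priority should be mutually exclusive, as the two priority-2 rules are here, so that the order in which they happen to be tried does not matter.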
Now that we know what a linguistic rule is, we need to explain some minor but important points.
Each symbol in a rule expression needs to be separated with a space, i.e. these work:
  _1 > 4
  left(_2,1) eq "f"
Whenever we are using a string operator, we must enquote the value that we are testing, i.e. this works:
  _1 eq "fred"
this doesn't:
  _1 eq fred
We support single and double quote characters, including mixed quoting (for simplistic cases), i.e. these work:
  _1 eq "some text"
  _1 eq 'some text'
  _1 eq "someone's dog"
  _1 eq '"john spoke"'
this doesn't (i.e. there is no quote 'escape' capability):
_1 eq "\"something\""
Note that expressions are not unary: a function call on its own, with no operator and right-hand value, is not a valid expression. To check whether the first argument has any length, the expression should look like:
length(_1) > 0
The following description of rule evaluation is correct at the time of writing. However, as this module evolves, we may alter the implementation as we get feedback. If you have used this module and found that the rule evaluation order is not what you expect, please contact the maintainer.
So far we have discussed the concept that a translation exists for a language/dialect combination. However, the application may not be translated into the specific language requested by the user. In these cases, Locale::MakePhrase tries to use fallback languages as the source language for this translation request. This allows languages derived from other base languages (e.g. Spanish and Mexican share common words) and dialect-specific variations of languages (such as variations of English) to use the parent language as a source for possible translations.
Thus whenever a phrase cannot be translated directly into the requested localisation, Locale::MakePhrase will use a fallback mechanism for the input phrase.
Also, to support variations in output text which can exist in locale-specific translations, non-expression rules should be evaluated after rules which have an expression.
The implementation of which rule to select has been abstracted into a separate module so that you can implement your own process of which rule is selected based on the available rules. The default implementation is defined in Locale::MakePhrase::RuleManager. It contains a description of the current implementation.
Shown below are examples of various rules; some rules have no expressions and/or arguments; all rules must have at least a priority of zero or more.
Rule 1:
  Language:     en_US
  Input text:   Please select some colours.
  Expression:   (none)
  Priority:     0
  Output text:  Please select some colors.
Rule 2:
  Language:     en
  Input text:   Please select some colours.
  Expression:   (none)
  Priority:     0
  Output text:  Please select colours.
Rule 3:
  Language:     en_AU
  Input text:   Please select [_1] colours.
  Expression:   (none)
  Priority:     0
  Output text:  Please select [_1] colours.
Rule 4:
  Language:     en
  Input text:   Please select [_1] colours.
  Expression:   _1 > 0
  Priority:     0
  Output text:  Select [_1] colours.
Given that the preferred language is 'en_US', if you compare rule 1 vs rule 2, the linguistic rule evaluation mechanism will be applied to rule 1 before being applied to rule 2, as it has a higher language-order.
Compare rule 3 vs rule 4. Given that there is no expression associated with rule 3, but that the 'en' version does have an expression, rule 4 will be evaluated (and found to be true in some cases) before rule 3 is evaluated.
These examples show that it is important to consider the interactions of the linguistic rules, as they are applied to the current localisation.
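The two comparisons above can be reproduced with a sort key. The sketch below is in Python and is an assumption-laden illustration, not the module's sorting routine: the Rule record and sort_rules function are hypothetical, and the key order (expression before no expression, then language order, then higher priority) is chosen only because it is consistent with the two comparisons described; the actual implementation may order rules differently.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    language: str
    expression: str  # an empty string stands for "(none)"
    priority: int
    output: str

def sort_rules(rules, languages):
    # languages: the preferred language first, followed by its fallbacks.
    order = {language: index for index, language in enumerate(languages)}
    # Assumed key order: rules with an expression come first, then the
    # language order, then the larger priority number.
    return sorted(
        rules,
        key=lambda r: (r.expression == "", order[r.language], -r.priority),
    )

rule1 = Rule("en_US", "", 0, "Please select some colors.")
rule2 = Rule("en", "", 0, "Please select colours.")
rule3 = Rule("en_AU", "", 0, "Please select [_1] colours.")
rule4 = Rule("en", "_1 > 0", 0, "Select [_1] colours.")

# Rule 1 precedes rule 2 because en_US is earlier in the language order.
print([r.language for r in sort_rules([rule2, rule1], ["en_US", "en"])])
# Rule 4 precedes rule 3 because it has an expression.
print([r.language for r in sort_rules([rule3, rule4], ["en_AU", "en"])])
```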
With any text translation system, there comes a time when it is necessary to apply the values of the arguments 'in situ', replacing the square-bracket argument number, with the corresponding argument value, so that the output will say something useful. This happens after all rules have been applied (if there were any), and after the output text string has been chosen.
  Input text:   "Selected [_2] files, [_1] directories"
  Arguments:    3  21
  Rule text:    "Selected [_2] files, [_1] directories"
  Output text:  "Selected 21 files, 3 directories"
This is a list of all operators:
  Operator  Context  Meaning                    Example
  ----------------------------------------------------------------------
  ==        Numeric  Equal to                   _1 == 4
  !=        Numeric  Not equal to               _1 != 2
  >         Numeric  Greater than               _2 > 1
  <         Numeric  Less than                  _1 < 7
  >=        Numeric  Greater than or equal to   _4 >= 21
  <=        Numeric  Less than or equal to      _3 <= 12
  eq        String   Equal to                   _1 eq "some text"
  ne        String   Not equal to               _2 ne "something else"
This is a list of available functions:
  Function       Context  Meaning                              Example
  ----------------------------------------------------------------------
  defined(x)     -        Is the argument defined/not-null,    defined(_1)
                          returns 0 or 1
  length(x)      -        Length of value of the argument,     length(_1)
                          returns an integer >= 0
  abs(n)         Number   Numerical absolute of argument       abs(_3)
  lc(s)          String   Lowercase version                    lc(_1)
  uc(s)          String   Uppercase version                    uc(_2)
  left(s,n)      String   LHS of argument from start           left(_3,4)
  right(s,n)     String   RHS of argument from end             right(_1,2)
  substr(s,n)    String   RHS of argument from start           substr(_2,7)
  substr(s,n,l)  String   Sub-part of argument from 'n',       substr(_2,7,4)
                          up to 'l' characters
Construct a new instance of Locale::MakePhrase::RuleManager object; arguments are passed to the init() method.
Allows a sub-class to control construction of the object. You must return a reference to $self to allow the construction to complete (should you decide to derive from it).
This is the expression evaluation engine. It takes an expression as described above (for example _1 == 4 && _2 eq 'fred'), substitutes any program arguments in place of the _X placeholders, and returns true or false based on the result of evaluating the expression.
The guts of the sorter; by subclassing this module, you can implement your own sorting routine.
This module implements the following rules for deciding the sorted order of the rules. The aim is to return a list which can be evaluated in-order.
This applies any/all arguments, to the outgoing text phrase; if the argument is text, it (optionally) undergoes the translation process; if the argument is numeric, it is formatted by the Locale::MakePhrase