ZMachine::ZSCII - an encoder/decoder for Z-Machine text
ZMachine::ZSCII is a class for objects that are encoders/decoders of Z-Machine text. Right now, ZMachine::ZSCII only implements Version 5 (and thus 7 and 8), and even that partially. There is no abbreviation support yet.
The Z-Machine's text strings are composed of ZSCII characters. There are 1024 ZSCII codepoints, although only the bottom eight bits' worth are ever used. Codepoints 0x20 through 0x7E are identical to the same codepoints in ASCII or Unicode.
ZSCII codepoints are then encoded as strings of five-bit Z-characters. The most common ZSCII characters, the lowercase English alphabet, can be encoded with one Z-character. Uppercase letters, numbers, and common punctuation ZSCII characters require two Z-characters each. Any other ZSCII character can be encoded with four Z-characters.
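The one, two, and four Z-character cases described above can be sketched in plain Perl. This is an illustration only, not the module's implementation: the helper name sketch_zscii_to_zchars is hypothetical, the alphabet table is the default V5 table from the Z-Machine standard (an assumption, not data read from ZMachine::ZSCII), and Z-characters are returned as a list of integers rather than as a Perl string.

```perl
use strict;
use warnings;

# Default V5 alphabet table as given in the Z-Machine standard. This is an
# assumption of the sketch, not data copied out of ZMachine::ZSCII.
my @ALPHABET = (
  'a' .. 'z',                                            # alphabet 0
  'A' .. 'Z',                                            # alphabet 1
  "\0", "\x0D", 0 .. 9, split(//, q{.,!?_#'"/\\-:()}),   # alphabet 2
);

my %INDEX_OF;
$INDEX_OF{ $ALPHABET[$_] } = $_ for 0 .. $#ALPHABET;
delete $INDEX_OF{"\0"};    # slot 52 marks the escape sequence, not a character

# Hypothetical helper: turn a ZSCII string into a list of Z-character
# values (integers 0..31).
sub sketch_zscii_to_zchars {
  my ($zscii) = @_;
  my @zchars;

  for my $char (split //, $zscii) {
    # Z-character 0 is always a space.
    if ($char eq ' ') { push @zchars, 0; next }

    my $index = $INDEX_OF{$char};
    if (defined $index) {
      my $alphabet = int($index / 26);
      push @zchars, 4 if $alphabet == 1;    # shift to alphabet 1
      push @zchars, 5 if $alphabet == 2;    # shift to alphabet 2
      push @zchars, 6 + $index % 26;
      next;
    }

    # Anything else: shift to alphabet 2, escape (Z-character 6), then the
    # ten-bit ZSCII codepoint split into two five-bit halves.
    my $codepoint = ord $char;
    die "codepoint $codepoint is outside the ZSCII range\n" if $codepoint > 0x3FF;
    push @zchars, 5, 6, $codepoint >> 5, $codepoint & 0x1F;
  }

  return @zchars;
}

print join(' ', sketch_zscii_to_zchars('a')), "\n";    # 6
print join(' ', sketch_zscii_to_zchars('A')), "\n";    # 4 6
print join(' ', sketch_zscii_to_zchars('@')), "\n";    # 5 6 2 0
```

Under these assumptions a lowercase letter costs one Z-character, an uppercase letter or digit costs two, and a character outside the alphabet table costs four.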
For storage on disk or in memory, the five-bit Z-characters are packed together, three in a word, and laid out in bytestrings. The last word in a string has its top bit set to mark the ending. When a bytestring would end without enough Z-characters to pack a full word, it is padded. (ZMachine::ZSCII pads with Z-character 0x05, a shift character.)
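The packing scheme above can be sketched with Perl's built-in pack. The helper name sketch_pack_zchars is hypothetical, and the sketch takes Z-characters as a list of integers instead of a Perl string. It assumes big-endian 16-bit words, as the Z-Machine uses, and is not the module's own code.

```perl
use strict;
use warnings;

# Hypothetical helper: pack a list of Z-character values (0..31) into a
# bytestring, three Z-characters per big-endian 16-bit word.
sub sketch_pack_zchars {
  my @zchars = @_;

  # Pad with Z-character 5 until the list fills a whole number of words.
  push @zchars, 5 while @zchars % 3;

  my @words;
  while (my @triple = splice @zchars, 0, 3) {
    push @words, ($triple[0] << 10) | ($triple[1] << 5) | $triple[2];
  }

  # Mark the end of the string by setting the top bit of the last word.
  $words[-1] |= 0x8000 if @words;

  return pack 'n*', @words;
}

print unpack('H*', sketch_pack_zchars(6, 7, 8)), "\n";    # 98e8
print unpack('H*', sketch_pack_zchars(6)), "\n";          # 98a5
```

In the second call the single Z-character 6 is padded to 6, 5, 5 before packing.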
Later versions of the Z-Machine allow the mapping of ZSCII codepoints to Unicode codepoints to be customized. ZMachine::ZSCII does not yet support this feature.
ZMachine::ZSCII does allow conversion between all four relevant representations: Unicode text, ZSCII text, Z-character strings, and packed Z-character bytestrings. All four forms are represented by Perl strings.
my $z = ZMachine::ZSCII->new;
my $z = ZMachine::ZSCII->new(\%arg);
my $z = ZMachine::ZSCII->new($version);
This returns a new codec. If the only argument is a number, it is treated as a version specification. If no arguments are given, a Version 5 codec is made.
Valid named arguments are:
The number of the Z-Machine targeted; at present, only 5, 7, or 8 are permitted values.
This is a reference to an array of between 0 and 97 Unicode characters. These will be the characters to which ZSCII characters 155 through 251 are mapped. They may not duplicate any characters represented by the default ZSCII set. No Unicode codepoint above U+FFFF is permitted, as it would not be representable in the Z-Machine Unicode substitution table.
If no extra characters are given, the default table is used.
This is a string of 78 characters, representing the three 26-character alphabets used to encode ZSCII compactly into Z-characters. The first 26 characters are alphabet 0, for the most common characters. The rest of the characters are alphabets 1 and 2.
No character with a ZSCII value greater than 0xFF may be included in the alphabet. Character 52 (A2's first character) should be NUL.
If no alphabet is given, the default alphabet is used.
By default, the values in the alphabet are assumed to be ZSCII characters, so that the contents of the alphabet table from the Z-Machine's memory can be used directly. The alphabet_is_unicode option specifies that the characters in the alphabet string are Unicode characters. They will be converted to ZSCII internally by the unicode_to_zscii method, and if characters appear in the alphabet that are not in the default ZSCII set or the extra characters, an exception will be raised.
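The constraints on the extra-character table and the alphabet string can be sketched as two small checks. Both helper names are hypothetical and neither reflects how ZMachine::ZSCII validates its arguments internally. The duplicate check is deliberately partial: it only rejects 0x20 through 0x7E, which the text above says are always part of the default set. The sketch also treats a non-NUL character 52 as a hard error, which is stricter than "should".

```perl
use strict;
use warnings;

# Hypothetical helper: check an arrayref of extra Unicode characters and
# return a hashref mapping each one to its ZSCII character (155, 156, ...).
sub sketch_check_extra_characters {
  my ($table) = @_;
  die "at most 97 extra characters are allowed\n" if @$table > 97;

  my %zscii_for;
  for my $i (0 .. $#$table) {
    my $codepoint = ord $table->[$i];
    die sprintf("U+%04X is above U+FFFF\n", $codepoint)
      if $codepoint > 0xFFFF;
    # Partial check only: 0x20..0x7E always belong to the default set.
    die sprintf("U+%04X duplicates a default ZSCII character\n", $codepoint)
      if $codepoint >= 0x20 && $codepoint <= 0x7E;
    $zscii_for{ $table->[$i] } = chr(155 + $i);
  }
  return \%zscii_for;
}

# Hypothetical helper: check a 78-character alphabet string of ZSCII
# characters and return three arrayrefs of 26 characters (alphabets 0..2).
sub sketch_split_alphabet {
  my ($alphabet) = @_;
  die "alphabet must be exactly 78 characters\n"
    unless length($alphabet) == 78;
  die "alphabet contains a character above 0xFF\n"
    if grep { ord($_) > 0xFF } split //, $alphabet;
  die "character 52 should be NUL\n"
    unless substr($alphabet, 52, 1) eq "\0";
  return map { [ split //, substr($alphabet, 26 * $_, 26) ] } 0 .. 2;
}

my $extra = sketch_check_extra_characters([ "\x{E4}", "\x{20AC}" ]);
printf "%d\n", ord $extra->{"\x{20AC}"};    # 156
```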
my $packed_zchars = $z->encode( $unicode_text );
This method takes a string of text and encodes it to a bytestring of packed Z-characters.
Internally, it converts the Unicode text to ZSCII, then to Z-characters, and then packs them. Before this processing, any native newline characters (the value of \n) are converted to U+000D to match the Z-Machine's use of character 0x00D for newline.
my $text = $z->decode( $packed_zchars );
This method takes a bytestring of packed Z-characters and returns a string of text.
Internally, it unpacks the Z-characters, converts them to ZSCII, and then converts those to Unicode. Any ZSCII characters 0x00D are converted to the value of \n.
my $zscii_string = $z->unicode_to_zscii( $unicode_string );
This method converts a Unicode string to a ZSCII string, using the ZSCII dialect defined by the ZMachine::ZSCII object's configuration.
If the Unicode input contains any characters that cannot be mapped to ZSCII, an exception is raised.
my $unicode_string = $z->zscii_to_unicode( $zscii_string );
This method converts a ZSCII string to a Unicode string, using the ZSCII dialect defined by the ZMachine::ZSCII object's configuration.
If the ZSCII input contains any characters that cannot be mapped to Unicode, an exception is raised. In the future, it may be possible to request a Unicode replacement character instead.
my $zchars = $z->zscii_to_zchars( $zscii_string );
Given a string of ZSCII characters, this method will return an (unpacked) string of Z-characters.
It will raise an exception on ZSCII codepoints that cannot be represented as Z-characters, which should not be possible with legal ZSCII.
my $zscii = $z->zchars_to_zscii( $zchars_string, \%arg );
Given a string of (unpacked) Z-characters, this method will return a string of ZSCII characters.
It will raise an exception when the right thing to do can't be determined. Right now, that could mean lots of things.
Valid arguments are:
If allow_early_termination is true, no exception is thrown if the Z-character string ends in the middle of a four-Z-character sequence. This is useful when dealing with dictionary words.
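To illustrate what early termination means, here is a minimal decoding sketch. It is not the module's implementation: sketch_zchars_to_zscii is a hypothetical helper, it takes an arrayref of integers rather than a Perl string, it assumes the default V5 alphabet table from the Z-Machine standard, and it simply refuses abbreviation Z-characters (1 through 3).

```perl
use strict;
use warnings;

# Default V5 alphabet table (an assumption of this sketch).
my @ALPHABET = (
  'a' .. 'z',
  'A' .. 'Z',
  "\0", "\x0D", 0 .. 9, split(//, q{.,!?_#'"/\\-:()}),
);

# Hypothetical helper: decode an arrayref of Z-character values into a
# ZSCII string. $arg is an optional hashref of options.
sub sketch_zchars_to_zscii {
  my ($zchars, $arg) = @_;
  $arg ||= {};

  my @queue    = @$zchars;
  my $alphabet = 0;      # shifts apply to the next Z-character only
  my $zscii    = '';

  while (@queue) {
    my $zchar = shift @queue;

    if ($zchar == 0) { $zscii .= ' '; $alphabet = 0; next }
    die "abbreviations are not handled in this sketch\n" if $zchar < 4;
    if ($zchar == 4 || $zchar == 5) { $alphabet = $zchar - 3; next }

    if ($alphabet == 2 && $zchar == 6) {
      # Escape: the next two Z-characters hold a ten-bit ZSCII codepoint.
      if (@queue < 2) {
        last if $arg->{allow_early_termination};
        die "string ended inside a four-Z-character sequence\n";
      }
      my ($high, $low) = splice @queue, 0, 2;
      $zscii .= chr(($high << 5) | $low);
    }
    else {
      $zscii .= $ALPHABET[ 26 * $alphabet + $zchar - 6 ];
    }
    $alphabet = 0;
  }

  return $zscii;
}

print sketch_zchars_to_zscii([ 4, 13, 14, 5, 20 ]), "\n";    # Hi!
print sketch_zchars_to_zscii([ 6, 5, 6, 2 ], { allow_early_termination => 1 }), "\n";    # a
```

Without the option, the second call would die, because the escape sequence 5, 6, 2 is missing its final Z-character.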
my $zchars = $z->make_dict_length( $zchars_string );
This method returns the Z-character string fit to dictionary length for the Z-Machine version being handled. It will trim excess characters or pad with Z-character 5 to be the right length.
When converting such strings back to ZSCII, you should pass the allow_early_termination option to zchars_to_zscii, as a four-Z-character sequence may have been terminated early.
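The trim-or-pad behaviour can be sketched as follows. The helper name is hypothetical, Z-characters are a list of integers, and the length of nine Z-characters is the dictionary word size the Z-Machine standard gives for Version 4 and later. The text above does not state the number, so treat it as an assumption of the sketch.

```perl
use strict;
use warnings;

# Dictionary words hold nine Z-characters in Version 4 and later
# (assumed from the Z-Machine standard, not read from the module).
my $DICT_LENGTH = 9;

# Hypothetical helper: trim or pad a list of Z-character values so that
# it is exactly $DICT_LENGTH long.
sub sketch_make_dict_length {
  my @zchars = @_;
  splice @zchars, $DICT_LENGTH if @zchars > $DICT_LENGTH;    # trim excess
  push @zchars, 5 while @zchars < $DICT_LENGTH;              # pad with 5
  return @zchars;
}

# Eight letters followed by a four-Z-character sequence (5 6 2 0): only
# the leading 5 of that sequence survives the trim.
print join(' ', sketch_make_dict_length((6) x 8, 5, 6, 2, 0)), "\n";
# 6 6 6 6 6 6 6 6 5
```

The trimmed result shows why early termination has to be tolerated when such a string is decoded again.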
my $packed_zchars = $z->pack_zchars( $zchars_string );
This method takes a string of unpacked Z-characters and packs them into a bytestring with three Z-characters per word. The final word will have its top bit set.
my $zchars_string = $z->unpack_zchars( $packed_zchars );
Given a bytestring of packed Z-characters, this method will unpack them into a string of unpacked Z-characters.
Exceptions are raised if the input bytestring isn't made of an even number of octets, or if the string continues past the first word with its top bit set.
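Unpacking and its two error conditions can be sketched with Perl's built-in unpack. The helper name sketch_unpack_zchars is hypothetical and the sketch returns a list of integers rather than a Perl string. It assumes big-endian 16-bit words and is not the module's own code.

```perl
use strict;
use warnings;

# Hypothetical helper: unpack a bytestring of big-endian 16-bit words
# into a list of Z-character values (0..31).
sub sketch_unpack_zchars {
  my ($bytes) = @_;
  die "bytestring must contain an even number of octets\n"
    if length($bytes) % 2;

  my @words = unpack 'n*', $bytes;
  my @zchars;

  for my $i (0 .. $#words) {
    my $word = $words[$i];
    push @zchars, ($word >> 10) & 0x1F, ($word >> 5) & 0x1F, $word & 0x1F;

    # A set top bit marks the final word; nothing may follow it.
    die "input continues past the terminating word\n"
      if ($word & 0x8000) && $i < $#words;
  }

  return @zchars;
}

print join(' ', sketch_unpack_zchars("\x98\xE8")), "\n";    # 6 7 8
```

Any padding Z-characters at the end are returned as they are; removing them is left to the decoding step.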
Ricardo SIGNES <email@example.com>
This software is copyright (c) 2013 by Ricardo SIGNES.