NAME

Language::Zcode::Parser::Perl - Z-code parser in pure Perl

DESCRIPTION

Finding subroutine starts and ends

Things we know:

1a

We understand the syntax of all opcodes that are in the spec (modulo bugs).

1b

0 is not a legal opcode (almost every other 1-byte number is, depending on version -- but see NOTES)

2a

Subs must start at packed addresses. Bytes between subs are always zero (I hope!)

2b

Subs must start with a header byte in the range 0-15 (the number of local variables).

2c

If the header byte is zero, the next byte CAN'T be zero: there are no locals, so the next byte has to be a command, and 0 isn't a command.

2d

Subs must be called with call* opcodes, although it is legal to call a variable (like "call_2n sp 1 2")

3a

There is no way for the program to get past a ret, rfalse (etc.) or backwards jump command except by jumping past it.

3b

jump opcodes cannot take variable args

3c

There may be code after a sub-ender that is not jumped into. This is a (rare, but existent) orphan fragment.

The upshot of this is that, if we propose that a sub starts at a given address, we can unambiguously read (the header and) commands until we hit a sub-ender that is not jumped past. If we find unexpected 0 bytes, for example, then we were wrong about the sub's starting address.
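
Rules 2a, 2b and 2c can be turned into a cheap plausibility test on a proposed starting address. The following is only a sketch, not the module's actual code: the names packing_factor and plausible_sub_start are made up for illustration, and the story file is assumed to be held in a plain byte string. The packing factors (2 for versions 1-3, 4 for versions 4-7, 8 for version 8) come from the Z-machine spec; versions 6 and 7 also add a routine offset, which this sketch ignores because it does not change the alignment.

```perl
use strict;
use warnings;

# Packing factor by Z-machine version: 2 for v1-3, 4 for v4-7, 8 for v8.
sub packing_factor {
    my ($version) = @_;
    return $version <= 3 ? 2 : $version <= 7 ? 4 : 8;
}

# Hypothetical helper: could a sub start at byte address $addr?
# $mem is the whole story file as a byte string.
sub plausible_sub_start {
    my ($mem, $addr, $version) = @_;
    return 0 if $addr % packing_factor($version);    # 2a: packed address
    return 0 if $addr + 1 >= length($mem);           # need header byte plus one more
    my ($locals, $next) = unpack "C2", substr($mem, $addr, 2);
    return 0 if $locals > 15;                        # 2b: header byte is 0-15
    return 0 if $locals == 0 && $next == 0;          # 2c: no locals, so next byte is a command
    return 1;
}
```

A true result only means the address has not been ruled out; the commands that follow still have to parse cleanly before the sub is accepted.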

So:

    read a command. (Note if it has a sub call or a jump)

    if next byte is a known start of sub {
       we finished this sub! Celebrate

    } else if next byte is a 0 {   
       # there must be a sub next
       if there's more than one 0 { 
          skip to the last 0 in the series 
          again, if we get to known start of sub, we're done
       }
       if last 0 is on packed address {
          start a sub here # 0 local vars, so next byte must be (non-zero) cmd
       } else if next byte is on packed address and is 1-15 {
          start a sub at that byte
       } else error!

    } else if not on packed address OR next byte is not 1-15 { # must be command
       read next command

    } else { # start doing things I'm less sure about
       # During this less sure part, if I get a parsing error, try
       # the other possibility
       if previous command was a ret, rfalse etc. that we have not jumped past {
          read sub
       } else {
          read command
       }
    }
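
The zero-byte branch above is the most mechanical part, so here is a rough Perl sketch of it. The function name sub_start_after_zeros and its arguments are hypothetical, the story file is again assumed to be a byte string, and the "known start of sub" check from the pseudocode is left out to keep the example short.

```perl
use strict;
use warnings;

# Hypothetical helper. $pos is the address of a zero byte found right after
# a command; $factor is the packing factor (2, 4 or 8). Returns the address
# where the next sub should start, or undef if neither case applies.
sub sub_start_after_zeros {
    my ($mem, $pos, $factor) = @_;

    # Skip to the last 0 in the series.
    my $last = $pos;
    $last++ while $last + 1 < length($mem)
        && ord(substr($mem, $last + 1, 1)) == 0;

    # Last 0 on a packed address: a sub with 0 local vars starts there.
    return $last if $last % $factor == 0;

    # Otherwise the byte after the zeros must be a 1-15 header byte
    # sitting on a packed address.
    my $next = $last + 1;
    return if $next >= length($mem);
    my $byte = ord(substr($mem, $next, 1));
    return $next if $next % $factor == 0 && $byte >= 1 && $byte <= 15;

    return;    # the "else error!" case
}
```

Returning undef rather than dying leaves the caller free to treat the failure as evidence that the current sub's proposed starting address was wrong.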

Also stop if we get to a known string address or the end of the file. The first string may be referenced in a sub we don't see, or may not be referenced at all (Zork1 always calls print_paddr with variables, not constant string addresses), in which case we'll run past the end of the last sub and into the strings.

An address used as the arg to a call is considered the most authoritative demonstration that a sub exists. A 0..15 byte at a packed address is slightly less sure, especially if there are no 0 bytes separating it from the previous sub (it could be an orphan fragment).

end_of_dictionary

Find the first packed address after the end of the dictionary. (This is a likely place for the lowest-address subroutine.)
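
As a rough illustration of that calculation (not the module's actual code), the sketch below reads the dictionary address from the header word at byte 0x08, walks the dictionary header as laid out in the spec (separator count, separators, entry length, entry count), and rounds the end up to the next packed address. The function name and the byte-string representation of the story file are assumptions, and the packing factor (2, 4 or 8) is passed in by the caller.

```perl
use strict;
use warnings;

# Hypothetical helper: first packed address at or after the first byte
# that follows the dictionary. $mem is the story file as a byte string.
sub first_packed_after_dictionary {
    my ($mem, $factor) = @_;
    my $dict  = unpack "n", substr($mem, 0x08, 2);   # header: dictionary address
    my $n_sep = ord(substr($mem, $dict, 1));         # number of word separators
    my ($entry_len, $count) =
        unpack "Cn", substr($mem, $dict + 1 + $n_sep, 3);
    # The entry count is a signed word; a negative count is treated here
    # as its absolute value.
    $count = 0x10000 - $count if $count >= 0x8000;
    # 1 byte count + separators + 3 bytes (length, count) + entries
    my $end = $dict + 1 + $n_sep + 3 + $entry_len * $count;
    my $rem = $end % $factor;
    return $rem ? $end + $factor - $rem : $end;
}
```

For example, a dictionary at address 64 with 3 separators and 2 entries of 7 bytes ends at byte 84, so the first candidate address is 86 with a packing factor of 2 and 88 with a factor of 4.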

NOTES

Actually, the remarks on section 14 of the spec say, "The 2OP opcode 0 was possibly intended for setting break-points in debugging (and may be used for this again). It was not nop." So in theory my algorithm may not be right. Oh well.

TODO

This will break if there's data interleaved between the subs. See SPEC comments on section 1.

Start at the byte after the end of the dictionary. Look at every packed address that's not included in a subroutine I've already found, up until we get to the strings. If I find something that looks like a sub, start parsing commands as above, except with a "not sure" flag set. If we find calls in that sub, follow them, but propagate the "not sure" flag.
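
The first step of that scan, listing the packed addresses not already claimed by a known sub, might look something like the sketch below. Nothing here exists in the module yet: the function name, the [start, end] range representation and the arguments are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: packed addresses in [$from, $to) that do not fall
# inside any sub found so far. $known is an array ref of [start, end]
# byte ranges (inclusive); $factor is the packing factor (2, 4 or 8).
sub unclaimed_packed_addresses {
    my ($from, $to, $factor, $known) = @_;
    my @candidates;
    my $rem  = $from % $factor;
    my $addr = $rem ? $from + $factor - $rem : $from;   # round up to packed address
    for (; $addr < $to; $addr += $factor) {
        next if grep { $addr >= $_->[0] && $addr <= $_->[1] } @$known;
        push @candidates, $addr;
    }
    return @candidates;
}

# Dictionary ends at 84, strings start at 100, one known sub at 88..93.
my @todo = unclaimed_packed_addresses(85, 100, 2, [ [ 88, 93 ] ]);
print "@todo\n";    # prints "86 94 96 98"
```

Each address returned would still need to look like a sub and parse cleanly, and anything found this way would carry the "not sure" flag.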