NAME

Net::IMP::ProtocolPinning - IMP plugin for simple protocol matching

SYNOPSIS

    use Net::IMP::ProtocolPinning;

    my $factory = Net::IMP::ProtocolPinning->new_factory( rules => [
        # HTTP request from client (dir=0)
        [ 0,9,qr{(GET|POST|OPTIONS) \S} ],
    ]);

    my $factory = Net::IMP::ProtocolPinning->new_factory( rules => [
        # SSHv2 prompt from server
        [ 1,6,qr{SSH-2\.} ],
    ]);

    my $factory = Net::IMP::ProtocolPinning->new_factory(
        rules => [
            # SMTP initial handshake
            # greeting line from server
            { dir => 1, rxlen => 512, rx => qr{220 [^\n]*\n} },
            # HELO|EHLO from client
            { dir => 0, rxlen => 512, rx => qr{(HELO|EHLO)[^\n]*\n}i },
            # response to helo|ehlo
            { dir => 1, rxlen => 512, rx => qr{250-?[^\n]*\n} },
        ],
        # some clients send w/o initially waiting for server
        ignore_order => 1,
        max_unbound => [ 1024,0 ],
        # for UDP use this
        allow_dup => 1,
        allow_reorder => 1,
    );

DESCRIPTION

Net::IMP::ProtocolPinning implements an analyzer for very simple protocol verification using rules with regular expressions. The idea is to only check the first data in the connection for protocol conformance and then let the rest through without further checks.

Calls to new_factory or new_analyzer can contain the following arguments specific to this module:

rules ARRAY

Specifies the rules to use for protocol verification. The rules are given as an array of direction-specific rules, i.e. each rule consists of [dir,rxlen,rx], or of a hash with the same keys as shown in the SYNOPSIS, with

dir

the direction, i.e. 0 for data from the client and 1 for data from the server

rxlen

the length of data the regular expression might need for the match. E.g. if the regex is qr/foo(?=bar)/, 6 bytes are needed for a successful match, even though the regex matches only 3 bytes.

rx

the regular expression itself. The regex will be applied against the not-yet-forwarded data with an implicit \A in front, so look-behind will not work.
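The difference between the bytes a regex needs and the bytes it consumes can be checked in plain Perl without the module. The following is a minimal sketch; the helper match_at_start is hypothetical and only mimics the implicit \A described above, it is not the module's code.

```perl
use strict;
use warnings;

# Hypothetical helper: apply $rx to $buf with an implicit \A in front and
# return the number of bytes matched, or undef if there is no match.
sub match_at_start {
    my ($rx, $buf) = @_;
    return $buf =~ m{\A(?:$rx)} ? $+[0] : undef;
}

my $rx = qr/foo(?=bar)/;

# Only 3 bytes buffered: the look-ahead cannot be satisfied yet.
my $short = match_at_start($rx, 'foo');

# 6 bytes buffered: the regex matches but consumes only 3 bytes,
# which is why rxlen should be 6 here and not 3.
my $full = match_at_start($rx, 'foobar');

printf "short=%s full=%s\n",
    map { defined $_ ? $_ : 'undef' } $short, $full;
```

This prints short=undef full=3, so a rule using this regex would be written as [ $dir, 6, qr/foo(?=bar)/ ].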

ignore_order BOOLEAN

If true, it will use the next rule for the direction from which data arrive, even if the current rule is for the other direction. If false, it will cause a DENY if data arrive from one direction but the current rule is for the other direction.

allow_dup BOOLEAN

If true, it will tolerate that the last rule (or, with allow_reorder, any previous rule) matches again instead of the next rule. Only packet data will be checked for duplicates.

allow_reorder BOOLEAN

If true, it will tolerate rules matching in a different order than given in the rule list. Unless ignore_order is also set, it will still enforce the order of data transfer between the directions.

max_unbound [SIZE0,SIZE1]

If there are no more active rules for a direction and ignore_order is true, the application needs to buffer the data for this direction until all remaining rules for the other direction are matched. This parameter limits, per direction, the amount of buffered data which cannot be bound to a rule.

If not set, a default of unlimited will be used. In this case it will send IMP_PAUSE to the data provider if it is necessary to buffer data, so that the provider can temporarily stop receiving data. If max_unbound is not unlimited, it will not send IMP_PAUSE, so that it can enforce the limit.

Process of Matching Input Against Rules

When new data arrive from a direction, it will try to match them against the rule list as follows and stop as soon as a rule matches:

  • If there is a previously matching rule which might extend its match, it will be tried first (only for stream data).

  • If the next rule in the list of rules is for the incoming direction, it will be tried. If ignore_order is true, the next rule for the incoming direction will be used instead.

  • If allow_reorder is true, then all other rules until the next direction change in the rule list will be tried in the order of the rule list. If ignore_order is true, direction changes in the rule list are ignored, i.e. all remaining rules for the incoming direction are considered.

  • If allow_dup is true, then all already matched rules from the incoming direction are allowed to match again, but only if no other rules match. To detect duplicates, a hash over each matched packet will be saved and later checked. To avoid targeted collisions, the hash consists of the md5 of an analyzer-specific random seed and the data.
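The seeded hash used for duplicate detection in the last step can be sketched as follows. This is only an illustration of the idea, not the module's code: the helper names, the way the seed is generated and the example packets are assumptions.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Assumption: a random per-analyzer seed of 16 bytes; the module may
# derive its seed differently.
my $seed = join '', map { chr int rand 256 } 1 .. 16;
my %seen;    # digests of packets which already matched a rule

# Hypothetical helper: remember a packet which matched a rule.
sub remember_packet {
    my ($data) = @_;
    $seen{ md5($seed . $data) } = 1;
    return;
}

# Hypothetical helper: true if exactly this packet was matched before.
sub is_duplicate {
    my ($data) = @_;
    return exists $seen{ md5($seed . $data) } ? 1 : 0;
}

remember_packet("HELLO 1\n");
print is_duplicate("HELLO 1\n") ? "dup\n" : "new\n";
print is_duplicate("HELLO 2\n") ? "dup\n" : "new\n";
```

Because the seed is unknown to the peer, it cannot precompute a different packet with the same digest as an already accepted one.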

If a rule matches the incoming data, the data will be passed using IMP_PASS. How the match gets executed and what happens if no rule matches depends on the data type:

Stream Data

For stream data it will match as much data as possible, i.e. the rule which matched last will be considered again when new data arrive, in case the match can be extended. The rule will only be considered done if rxlen is reached, if a direction change occurred and ignore_order is false, or if it is the last rule for the direction.

The rules are matched one after another, i.e. the new match will start where the last match finished.

A sensible value for rxlen is necessary to avoid buffering too much data, because the module is unable to detect if a rule does not even match the beginning of the incoming data. If no rule matches the incoming data, it will buffer up to the maximum rxlen and only fail if it got more than rxlen bytes of unmatched data and still no rule matches.

allow_dup and allow_reorder will behave as documented, but because they usually do not fit the behavior of stream data, they are better kept false.
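The buffering decision described above can be sketched for a single rule as follows. The helper check_stream_rule and the example data are hypothetical; the sketch follows the description in this section and does not claim to mirror the module's internals.

```perl
use strict;
use warnings;

# Hypothetical helper: decide what to do with the buffered stream data
# for a single rule. Returns ('match', $bytes), ('wait') or ('fail').
sub check_stream_rule {
    my ($buf, $rxlen, $rx) = @_;
    return ('match', $+[0]) if $buf =~ m{\A(?:$rx)};
    # No match yet: as long as no more than rxlen bytes are buffered,
    # more data might still complete the match.
    return ('wait') if length($buf) <= $rxlen;
    return ('fail');
}

# SMTP greeting rule from the SYNOPSIS
my $greeting = qr{220 [^\n]*\n};

for my $buf (
    "220 mail.exam",                   # incomplete line
    "220 mail.example.com ESMTP\n",    # complete greeting
    "HTTP/1.1 200 OK\r\n",             # wrong protocol, but still short
    'x' x 513,                         # more than rxlen without a match
) {
    my ($status) = check_stream_rule($buf, 512, $greeting);
    print "$status\n";
}
```

This prints wait, match, wait and fail. The third case shows why rxlen should be kept small: data which obviously belong to another protocol are still buffered until more than rxlen bytes have arrived.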

Packet Data

For packet data each rule will be matched against the whole packet, i.e. with an implicit \A at the beginning and \Z at the end of the regular expression. So there cannot be multiple rules matching the same packet one after another, nor can there be a rule spanning multiple packets.

If no rule matches the incoming packet, the matching will fail, i.e. there is no buffering and waiting for more data.

If the packet stream is based on a protocol like UDP, it is recommended to set allow_dup and allow_reorder, so that protocols match even if packets get resent or arrive out of order.

Only once all rules have matched will the remaining data be passed using IMP_PASS with IMP_MAXOFFSET. If the matching failed, an IMP_DENY is issued.

If only the rules for one direction have matched, so that there are still outstanding rules for the other direction, the data for the completed direction will not be passed yet. If the amount of unbound data should be limited, max_unbound should be set. Buffering more data than max_unbound for this direction will cause a DENY. If max_unbound is not set, it will use flow control (i.e. IMP_PAUSE) to make the data provider temporarily stop receiving data.
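Whole-packet matching can be illustrated with a small sketch. The helper packet_matches and the PING protocol are made up for this example; only the implicit \A and \Z are taken from the description above.

```perl
use strict;
use warnings;

# Hypothetical helper: a packet matches only if the rule covers all of it.
# Note that in Perl \Z also matches before a final newline.
sub packet_matches {
    my ($rx, $packet) = @_;
    return $packet =~ m{\A(?:$rx)\Z} ? 1 : 0;
}

# Rule for an imaginary protocol
my $rx = qr{PING \d{1,5}};

print packet_matches($rx, 'PING 42'), "\n";          # whole packet matches
print packet_matches($rx, 'PING 42 extra'), "\n";    # trailing data: no match
print packet_matches($rx, 'PI'), "\n";               # partial packet: no match
```

This prints 1, 0 and 0. Unlike with stream data, the partial packet is rejected immediately instead of being buffered.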

Rules for Writing the Regular Expressions

Because the match will be tried whenever new data come in (i.e. the buffer might have a size of less than, equal to or greater than rxlen), care should be taken when constructing the regular expression and determining rxlen. The regular expression should not match data longer than rxlen, e.g. instead of specifying \d+ one should specify a bounded size with \d{1,10}.

Care should also be taken if you have consecutive rules for the same direction (i.e. either the next rule is for the same direction or ignore_order is true). Here you need to make sure that the first rule will not match data needed by the next rule, e.g. \w{1,2} followed by \d will not work because \w also matches digits, while [a-z]{1,2} followed by \d will be fine.

Please note also that the regular expression in the rule will be implicitly anchored at the beginning of the buffered data, e.g. \d will only match if the first character is a digit, not if any character but the first in the buffer is a digit. If you want the latter behavior, you have to explicitly allow other characters and need to limit their number, e.g. "(?s).{0,10}\d".
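The problem with consecutive rules can be reproduced in plain Perl. The following sketch uses a hypothetical helper which applies the rules one after another to the same buffer, each starting where the previous match ended; it is a simplification and not the module's code.

```perl
use strict;
use warnings;

# Hypothetical helper: match rules one after another against the same
# buffer and return the number of rules that matched.
sub count_consecutive_matches {
    my ($buf, @rules) = @_;
    my $matched = 0;
    for my $rx (@rules) {
        last unless $buf =~ m{\A(?:$rx)};
        substr($buf, 0, $+[0], '');    # drop the bytes this rule consumed
        $matched++;
    }
    return $matched;
}

# \w{1,2} also consumes the digit, so nothing is left for \d
my $greedy = count_consecutive_matches('a1', qr/\w{1,2}/, qr/\d/);

# [a-z]{1,2} stops before the digit, so \d can still match
my $strict = count_consecutive_matches('a1', qr/[a-z]{1,2}/, qr/\d/);

print "greedy=$greedy strict=$strict\n";
```

This prints greedy=1 strict=2: with \w{1,2} only the first rule matches, while with [a-z]{1,2} both rules match.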
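The effect of the implicit anchor can be tried out directly. This is a minimal sketch in which the \A is written out explicitly to mimic the behavior described above; the buffers are made-up test data.

```perl
use strict;
use warnings;

my $buf = "abc7";

# Anchored \d: fails because the first character is not a digit.
my $anchored = $buf =~ m{\A(?:\d)} ? 1 : 0;

# Explicitly allow up to 10 arbitrary characters in front of the digit.
my $relaxed = $buf =~ m{\A(?:(?s).{0,10}\d)} ? 1 : 0;

# With 11 characters in front of the digit the bounded prefix is too short.
my $too_far = ('x' x 11 . '7') =~ m{\A(?:(?s).{0,10}\d)} ? 1 : 0;

print "anchored=$anchored relaxed=$relaxed too_far=$too_far\n";
```

This prints anchored=0 relaxed=1 too_far=0. Since the relaxed pattern matches at most 11 bytes, an rxlen of 11 would fit it.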

AUTHOR

Steffen Ullrich <sullr@cpan.org>

COPYRIGHT

Copyright by Steffen Ullrich.

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.