
NAME

File::AptFetch - Perl interface to APT-Methods

SYNOPSIS

    use File::AptFetch::Simple; # No, seriously.

DESCRIPTION

In short:

  • Methods are ordinary executables. Hence F::AF forks.

  • There's no command-line interface for methods. The IPC is two pipes (STDIN and STDOUT from the method's point of view).

  • Each unit of communication (called a message) consists of a numeric code with explanatory text, followed by a sequence of colon-separated (':') header lines. A message is terminated with an empty line.

  • File::AptFetch::Cookbook has more.
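The following is a minimal sketch in plain Perl, not F::AF's own parser, that makes the shape of a message concrete. It splits one message into its numeric code, its explanatory text, and its header pairs. The sample header names and values are illustrative assumptions. Only the layout (status line, 'Name: value' lines, terminating empty line) comes from the description above.

```perl
use strict;
use warnings;

# Illustrative message text; the header values are made up for the example.
my $raw = "100 Capabilities\nVersion: 1.0\nSingle-Instance: true\n\n";

# Split one message into its numeric code, explanatory text and
# [name, value] header pairs; return an empty list if the shape is wrong.
sub split_message {
    my ($text) = @_;
    my @lines  = split /\n/, $text, -1;
    my $status = shift(@lines) // '';
    my ($code, $info) = $status =~ /^(\d{3})\s+(.+)$/
        or return;                           # not a status line
    my @headers;
    for my $line (@lines) {
        last if $line eq '';                 # empty line ends the message
        my ($name, $value) = $line =~ /^([^:\s]+):\s*(.*)$/
            or return;                       # not a header line
        push @headers, [ $name, $value ];
    }
    return ($code, $info, \@headers);
}

my ($code, $info, $headers) = split_message($raw);
print "$code $info: ", join(', ', map { "$_->[0]=$_->[1]" } @$headers), "\n";
```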

(disclaimer) Right now, F::AF is in a "proof-of-concept" state. It surely works with local methods (file and copy). I hoped it would work with trivial cases of remote methods (as of v0.1.9 I've stopped hoping: it totally does; no manual interaction (credentials and/or tray closing) is provided). F::AF has no means to accept (let alone pass along) authentication credentials, so if your upstream needs authentication, F::AF is of no help here. And one more warning: you're supposed to do all the dirty work of managing yourself; F::AF is only for communication. Hopefully, someday there will be a kind of super-module that simplifies all this.

(warning) You should understand one potential tension with F::AF. wget(1), curl(1), various FTP clients, or whatever else constitutes a fetcher are (I hope so) thoroughly tested against a monkey-wrench on the other side of the connection. APT methods are not. APT talks to repositories, and those repositories are mostly mirrors; administrators of mirrors and mirror-network roots have at least a basic clue. How APT methods behave when they face idiots on the other side of the connection is still to be discovered.

There's a list of known bugs, caveats, and deficiencies.

  • (v0.1.9) There were some concerns about signals. Surprisingly, they're gone now. The only corner case left to try is a child that ignores signals entirely (stuck in a syscall?).

  • It seems that during normal operation no zombies are left behind. However, I'm not sure waitpid would work as expected. (What if some method takes a long time to die after being signaled?)

  • Methods are supposed (or not?) to write extra diagnostics to their STDERR, which is the same as your process's STDERR. However, I still haven't seen any output. So, first, I (and you) have nothing to worry about and, second, I have nothing to work with. That issue may well stay a caveat.

  • @$log is fragile. Don't touch it. Still, @$log can get corrupted, like this: if a method goes insane and outputs unparsable messages, then "gain()" gives up immediately, leaving @$log non-empty. In that case you're supposed to recreate the F::AF object (or give up). If you don't, strange things can happen (mostly, more give-ups). So, please, do.

  • @$diag grows. The next release will provide some means to maintain it. For now, clean @$diag yourself if that becomes an issue.

  • You're supposed to keep requests and fetches balanced. If you call "gain()" when there are no unfinished requests, the method will time out. There's nothing to worry about, really, except hanging for 120sec.

(note) Documentation of this library distinguishes 4 namespaces:

Function/method parameter list (@_)

Within a section they always refer to parameter names and keys (if @_ is a hash) mentioned in the nearest synopsis.

Explicit values in descriptive codes

They always refer to some value in the nearest code. $method, $?, etc. mean that there is some value related to the thing named. POD markup in descriptions means exactly that.

Keys of File::AptFetch blessed object

Whatever is missing from the nearest synopsis fits here. Each key has an explicit dereference attached. So @$log means that the key named log holds an ARRAY reference, %$message holds a HASH reference, and $status holds a plain scalar (it's not a reference to SCALAR, or it would be $$status).

Keys of File::AptFetch::ConfigData configuration module

Upon introduction within each section, they are explicitly marked as such. The explanation above about explicit dereference applies here too.

(note) Message headers are referred to as keys of some fake global %$message. So Filename becomes $message{filename}, and Last-Modified becomes $message{last_modified}. I hope it's clear from context whether a header is down- or upstream.

(note) Throughout this POD "log item" means one line in @$log; "log entry" means a sequence of log items including the terminating empty item.

(note) Throughout this POD "120sec timeout" means "$timeout in File::AptFetch::ConfigData, whether left as set in the stock distribution, overridden during pre-build configuration, or set at run-time".

IMPORTANT NOTE ON PERL-5.10.0

It's neither a bug nor a caveat. And it's out of my hands, really. perl-5.10.0 exits application code differently from perl-5.10.1 (unbelievable?). My understanding is that v5.10.0 closes handles first, then DESTROYs. Sometimes that filehandle closing happens in the right order, but most probably the application is killed with $SIG{CHLD}. END{} doesn't help: that filehandle massacre happens before those blocks are run. I believe any tinkering with the global $SIG{CHLD} is a bad idea. And terminating every method right after transfers have finished is just as stupid. Thus, if you run perl-5.10.0 (probably any earlier version too), destroy the File::AptFetch object explicitly before exiting the app, if you care about not being $SIG{CHLD}ed.

(note) Some believe that since v0.1.11 it ain't no issue anymore.

IMPORTANT NOTE ON LINUX

Your script (or, more probably, one-liner) could exit with $CHILD_ERROR equal to $SIG{TERM} (or whatever signal was configured in $F::AF::ConfigData{signal}). It would look like your script was killed. It's not. I've strace'd it, and I don't see an incoming signal.

My understanding is that fork on Linux is too thready. When an object (it has to be global) is DESTROYed, a method (which is a child) is indeed killed, and its $CHILD_ERROR somehow propagates up to the parent. However, that propagation isn't reliable. For some combinations of kernel, libc, and/or perl and (that's important) *your* code, the probability of propagation approaches 1; for other combinations it approaches 0. E.g. compare these: the only differences are the size of nvtype (double vs. long double), the version of ExtUtils::MakeMaker, and the definition of $ENV{LANG}. But there are failures with nvtype=long double too.

If that's ever a problem you should apply a simple work-around:

    $faf = File::AptFetch->init( ... );
    ...
    undef $faf;
    $faf = '';

The last assignment is essential. I'm not suggesting that DESTROY would be optimized away. Rather, without that assignment DESTROY sneaks into the final destroy-everything phase, and that's where the propagation arises from.

METHODS

init()
    ref( my $fetch = File::AptFetch->init( $method )) or die $fetch;

That's the initialization stuff. APT-Methods are userspace executables, you know, hence it forks. If fork fails, it dies. If all preparations succeed, it returns a File::AptFetch blessed object; otherwise, a string describing the issue is returned. Any diagnostics from the forked instance and, later, from the exec'ed $method go through STDERR. (And see "_cache_configuration()".)

The idea behind this ridiculous construct is that someday, in some future, there will be lots of concurrency. Hence it's impossible to maintain one package-wide store for failure descriptions. All methods of File::AptFetch return descriptive strings in case of errors; init() just follows them.

$method is saved in the same-named key for reuse.
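The return-an-object-or-a-string convention can be sketched with a stand-in class. Demo::Fetcher is hypothetical and not part of F::AF. It only mimics how init() reports failure, and how the ref() idiom from the synopsis above tells the two outcomes apart.

```perl
use strict;
use warnings;

package Demo::Fetcher;    # hypothetical stand-in, not File::AptFetch

sub init {
    my ($class, $method) = @_;
    # Failure is reported by returning a plain string, not by dying.
    return 'method is unspecified' unless defined $method && length $method;
    return bless { method => $method }, $class;
}

package main;

# The calling idiom: ref() is TRUE only for the blessed object.
ref(my $fetch = Demo::Fetcher->init('file')) or die $fetch;
print "got a ", ref($fetch), " for method $fetch->{method}\n";

my $bad = Demo::Fetcher->init();
print "error: $bad\n" unless ref $bad;
```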

Give-up codes:

($method): (lib_method): neither preset nor found

$lib_method (in File::AptFetch::ConfigData) points to the directory where APT-Methods reside. Without that knowledge File::AptFetch can't do anything. It's picked either from configuration (build-time) or from apt-config output (run-time), in that order. It wasn't found in either place: fairly strange APT you have.

($method) is unspecified

$method is a required argument, so, please, provide it.

($method): ($?): died without handshake

Start-up configuration is essential. If $method disconnects early, then that's a problem. The exit code (with no postprocessing at all) is provided in parentheses.

($method): timeouted without handshake

$method failed to configure within the time frame provided. (v0.0.8) "_read()" has more about timeouts.

($method): ($Status): that's supposed to be (100 Capabilities)

As described in "APT Method Interface", Section 2.2, a method starts with the '100 Capabilities' Status Code. $method didn't. Thus it's not an APT-Method, and File::AptFetch has given up.

Also refer to "_parse_status_code()", "_parse_message()", and "_cache_configuration()". Those can emit their own give-up codes, which init() passes up immediately, without postprocessing.

DESTROY()
    undef $fetch;
    # or leave the scope

Cleanups. A method is killed and waitpid'ed, and the pipes are explicitly closed. If anything goes wrong, it carps, for obvious reasons. waitpid is unconditional and isn't timeout-protected.

The actual signal sent to $pid is configured with $signal in File::AptFetch::ConfigData. One can override it (at build time) or explicitly set it to any desired name or number (at run time). Refer to File::AptFetch::ConfigData for details.
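This is a minimal sketch of the kind of cleanup DESTROY() performs, using core Perl only; it is not F::AF's code. A forked child stands in for a method, and the local $signal stands in for $signal from File::AptFetch::ConfigData. kill() accepts a signal name or a number, so both forms of that setting work. Decoding the raw $? afterwards is also how to tell a signal number from an exit code in a status like the one described in the Linux note above.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical stand-in for $ConfigData{signal}: a name or a number both
# work, because kill() accepts either form.
my $signal = 'TERM';

my $pid = fork;
die "[fork]: $!" unless defined $pid;
if ($pid == 0) {
    sleep 30;              # child: pretend to be a method waiting for requests
    POSIX::_exit(0);
}

kill($signal, $pid) or warn "[kill] ($pid): nothing to kill or $!";
waitpid($pid, 0);          # unconditional, no timeout, like DESTROY()

# Decode the raw wait status: low 7 bits carry the signal, the high byte
# carries the exit code.
my $killed_by = $? & 127;
my $exit_code = $? >> 8;
printf "status=%d signal=%d exit=%d\n", $?, $killed_by, $exit_code;
```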

set_callback()
    File::AptFetch::set_callback %callbacks;

(v0.1.6) Sets (whatever known) callbacks. Semantics and procedures are documented where appropriate. Keys of %callbacks are tags (subject to hash handling by perl, so don't mess with that); a key must be among the known ones (or else). Values are either

  • CODE: whatever the previous value was is discarded;

  • undef: resets the callback to the default, if any;

  • anything else: croaks.

Known tags are:

gain

(v0.1.7) "gain()" has more.

read

"_read()" has more.

select

(v0.1.8) "_read()" has more.
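This is a minimal sketch of the callback rules above, built on a hypothetical lexical registry rather than F::AF's internals. The sketch validates every pair before storing any of them; that is a choice made here, since the documentation doesn't say whether set_callback() is atomic. The default read callback is a placeholder.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical registry: known tags from the list above, placeholder default.
my %default  = ( read => sub { 0 } );
my %callback = %default;
my %known    = map { $_ => 1 } qw(gain read select);

sub set_callback {
    my %new = @_;
    # Validate everything first, so a bad pair changes nothing.
    for my $tag (keys %new) {
        croak "($tag): unknown callback" unless $known{$tag};
        croak "($tag): neither CODE nor (undef)"
            if defined $new{$tag} && ref($new{$tag}) ne 'CODE';
    }
    for my $tag (keys %new) {
        if    (defined $new{$tag})    { $callback{$tag} = $new{$tag} }      # replace
        elsif (exists $default{$tag}) { $callback{$tag} = $default{$tag} }  # reset
        else                          { delete $callback{$tag} }            # no default
    }
    return;
}
```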

request()
    my $rc = $fetch->request(
      $target0 => $source,
      $target1 => { uri => $source } );
    $rc and die $rc;

(bug) In this section the abbreviation "URI" actually refers to "scheme-specific-part". Beware.

That files requests for transfer. Each request is a pair of $target and either of

$source

A simple scalar. It MUST NOT provide a scheme, only a pure filename (either local or remote). It MUST provide all (and no more than) the needed leading slashes, though (a double slash for remotes).

$source is preprocessed: $method (with the obvious colon) is prepended. (It seems APT methods become very nervous if requested with a scheme that doesn't match the method's name.) (bug) That requirement will be slightly relaxed in the next release.

%$source HASH ref

The following keys are known:

$uri

The same requirements as for $source apply.

There are yet other keys that must be supported. Right now I'm unaware of any (pending real-life testing).

(v0.1.5) If the request list is empty, request() silently succeeds without doing anything.

The actual request is filed at once (subject to buffering, though), in one big (or not so big) chunk (as required by the API). The @$diag field is updated accordingly.

Give-up codes:

($method): ($filename): URI is undefined

Either $source or $source{uri} evaluated to FALSE. (What is the request supposed to be?)

(caveat) While undef and the empty string are invalid URIs, is 0 a valid URI? No: a URI is supposed to have at least one leading slash.

request() pretends to be atomic: the request happens only if @_ has been parsed successfully.
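Here is a hedged sketch of how such a request batch could be assembled. The method name and a colon are prepended to each source, a FALSE URI gives up, and nothing is produced unless every pair is valid. The '600 URI Acquire' message with URI and Filename headers is how the APT method interface describes an acquire request. The helper name build_request() and its two-value return are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (chunk, '') on success or (undef, give-up) on
# failure; the chunk is written only by the caller, so a bad pair anywhere
# means nothing is sent at all.
sub build_request {
    my ($method, @pairs) = @_;
    my $chunk = '';
    while (my ($target, $source) = splice(@pairs, 0, 2)) {
        my $uri = ref($source) eq 'HASH' ? $source->{uri} : $source;
        return (undef, "($method): ($target): URI is undefined") unless $uri;
        # One acquire message per pair, each ended by an empty line.
        $chunk .= "600 URI Acquire\nURI: $method:$uri\nFilename: $target\n\n";
    }
    return ($chunk, '');    # an empty request list yields an empty chunk
}
```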

gain()
    $rc = $fetch->gain;
    $rc and die $rc;

That gains something. 'Something' means it's unknown what kind of message the APT method will return: a 'URI Start', 'URI Done', or 'URI Failure' message. Either way, the message is stored in the @$diag and %$message fields of the object; $Status and $status are set too.

Give-up codes:

($method): ($CHLD_error): died

Something went wrong: the APT method has died, and more diagnostics might have gone to STDERR. Even if $CHLD_error is 0, the method still died on us; it's not supposed to exit.

($method): timeouted without responce

The APT method has quit without properly terminating a message with an empty line, or failed to output anything at all. Supposedly that shouldn't happen. Otherwise, it's your fault: you asked for an entry without a reason.

($method): timeouted

The APT method has sat silent all the time. A possible cause is that you've run out of requests (then the method has nothing to do at all; methods don't tick, after all).

"_parse_status_code()" and "_parse_message()" can emit their own give-up codes.

Unless there were problems, the gain callback (if any) is tried just before returning. That CODE is given the object as an argument. There's no default callback. The RV is ignored. (note) That might change in the future, so better return TRUE.
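The message kinds that gain() deals with can be told apart by their Status Code. The sketch below uses a hypothetical helper, after_gain(). It runs an optional callback with the object and ignores its return value, as described above. The code numbers (200, 201, 400) come from the APT method interface; any other code is simply reported.

```perl
use strict;
use warnings;

# Status Codes for the three message kinds named above.
my %meaning = (
    200 => 'URI Start',
    201 => 'URI Done',
    400 => 'URI Failure',
);

# Hypothetical helper: run the callback (if any) with the object, ignore its
# RV, then label the message by its Status Code.
sub after_gain {
    my ($self, $callback) = @_;
    $callback->($self) if $callback;          # RV ignored
    return $meaning{ $self->{Status} } // "unexpected ($self->{Status})";
}
```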

_parse_status_code()
    $rc = $self->_parse_status_code;
    return $rc if $rc;

Internal. Picks one item from @$log and attempts to process it as a Status Code. Subsequent items are unaffected.

Give-up codes:

($method): ($log_item): that's not a Status Code

The $log_item must be qr/^\d{3}\s+.+/. No luck this time.

Sets the appropriate fields ($Status with the Status Code, $status with the informational string), then backs up the processed item.
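This is a minimal sketch of that step. It assumes the object is a plain HASH with method, log, Status, status, and diag keys. Keeping the backup in @$diag is an assumption of this sketch; the documentation doesn't state where the item goes.

```perl
use strict;
use warnings;

# Take the first log item, require qr/^\d{3}\s+.+/, split it into the code
# and the informational text; on failure leave @$log untouched.
sub parse_status_code {
    my ($self) = @_;
    my $item = $self->{log}[0] // '';
    my ($Status, $status) = $item =~ /^(\d{3})\s+(.+)/
        or return "($self->{method}): ($item): that's not a Status Code";
    @$self{qw(Status status)} = ($Status, $status);
    push @{ $self->{diag} }, shift @{ $self->{log} };   # assumed backup spot
    return '';
}
```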

_parse_message()
    $rc = $self->_parse_message;
    return $rc if $rc;

Internal. Processes the log entry. Atomically sets either %$capabilities (if $Status is 100) or %$message (any other). Each key is lowercased. (v0.1.4) Since "_read()" has been rewritten, there can be multiple messages in @$log; those are preserved for the next turn.

(v0.1.2) Each hyphen (-) is replaced with an underscore (_), for convenience (compare 'last-modified' => $time with last_modified => $time). (bug) What if a method yields Foo-Bar: and Foo_Bar: headers? (RFC2822 header names are anything but space and colon, after all.) Right now, _parse_message() will fail as if a message header had been reset. But those headers are different and should be handled appropriately. They aren't.

Give-up codes:

($method): ($log_item): message must be terminated by empty line

The APT method API dictates that messages must be terminated by an empty line. This one is not. Shouldn't happen.

($method): ($log_item): that resets header ($header)

The leading message header ($header) has been seen before. That's a panic. The offending item and all subsequent ones are left on @$log. Shouldn't happen.

($method): ($log_item): that's not a Message

The $log_item must match qr/^[0-9a-z-]+:(?>\s+).+/i. It doesn't. No luck this time. The offending item and all subsequent ones are left on @$log.

The $log_items are backed up and removed from @$log.

(bug) If the last item isn't an empty line, then undef will be pushed. Beware, and prevent that before going for parsing.
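The rules of this section can be sketched on a plain ARRAY of log items. This is an illustration, not F::AF's code. It uses the regex quoted above, with captures added. A normalized name seen twice is refused, and on failure nothing is consumed. The (value, error) return pair is a choice of this sketch.

```perl
use strict;
use warnings;

sub parse_message {
    my ($log) = @_;
    return (undef, 'nothing to parse') unless @$log;
    my %message;
    my $i = 0;
    for (; $i < @$log; $i++) {
        my $item = $log->[$i];
        last if $item eq '';                      # end of the entry
        my ($name, $value) = $item =~ /^([0-9a-z-]+):(?>\s+)(.+)/i
            or return (undef, "($item): that's not a Message");
        (my $key = lc $name) =~ tr/-/_/;          # Last-Modified -> last_modified
        return (undef, "($item): that resets header ($key)")
            if exists $message{$key};
        $message{$key} = $value;
    }
    return (undef, "($log->[-1]): message must be terminated by empty line")
        if $i == @$log;
    splice @$log, 0, $i + 1;                      # consume entry and terminator
    return (\%message, '');
}
```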

_cache_configuration()
    $rc = $self->_cache_configuration;
    return $rc if $rc;

Internal. Forks; dies if fork fails. The forked child execs the array set in @$config_source (from File::AptFetch::ConfigData). If $ConfigData{lib_method} is unset, it then parses the prepared cache for the Dir::Bin::methods item and (if found) sets $lib_method. It doesn't complain if $lib_method happens to be left unset. If the cache is already set, it returns without any activity.

@$config_source is subject to build-time configuration. It's preset with qw[ /usr/bin/apt-config dump ] (YMMV; refer to F::AF::CD to be sure). @$config_source must provide reasonable output; that's the only requirement (see below for details).

(bug) While @$config_source is configurable, all give-up codes and diagnostic messages refer to 'apt-config'.

@$config_source's output is postprocessed: configuration names and values are stored as equal-sign ('=') separated pairs in scalars and pushed into an intermediate array. If everything finishes OK, then the package-wide cache is set. That cache is lexical (possibly, some time later, I'll find a reason to make some kind of iterator; no such iterator exists right now).

(v0.1.2) The parsing cycle has been totally rewritten. First, a line is split on space into $name and $value (or else). Then comes validation (it wouldn't be needed if @{$ConfigData{config_source}} were hardcoded; it's not):

  • $name must consist of alphanumerics, underscores, pluses, minuses, dots, colons, and slashes (qr[\w/:.+-]) (or else);

  • (that's a heuristic) colons come in pairs (or else);

  • $value must be enclosed in double-quotes ("), with no double-quote allowed inside (or else);

  • there must be a terminating semicolon (;) (or else).

Then comes cooking (all cooking was found by observation; it mimics how APT talks to methods):

  • a trailing double pair-of-colons in $name is trimmed to a single pair;

  • every space in $value is percent-escaped (%20);

  • every equal sign in $value is percent-escaped (%3d).

That last one needs some explanation. apt.conf(5) clearly states: "Values must not include backslashes or extra quotation marks".

    apt-config dump | grep \\\\

disagrees on backslashes (if you're upgraded enough). So does F::AF: backslashes are passed through. After some experiments, double-quote handling looks roughly like this:

  • double-quotes must come in pairs;

  • those double-quotes are dropped from $value without any visible effect (the double-quotes, that is, not the enclosed content, which stays intact whatever it is; an empty string is content too);

  • any odd double-quote makes parsing fail.

F::AF doesn't need to do anything about it: @{$ConfigData{config_source}} is supposed to handle those itself.

(bug) What should be investigated:

  • what if a double-quote is explicitly percent-escaped in apt.conf?

  • how are percents in $value handled?

Pending.
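Here is a hedged sketch of the validation and the space and equal-sign cooking for a single line of apt-config dump output; cook_config_line() is a hypothetical name. The trailing-colons trimming rule is deliberately left out, because its exact form isn't spelled out above.

```perl
use strict;
use warnings;

# Validate one chomped line and return "name=value", or undef if the line
# breaks any of the rules (or else).
sub cook_config_line {
    my ($line) = @_;
    my ($name, $value) = split / /, $line, 2;        # split on the first space
    return unless defined $value;
    return unless $name =~ m{^[\w/:.+-]+$};          # allowed characters only
    return if grep { length($_) % 2 } $name =~ /(:+)/g;  # colons come in pairs
    my ($content) = $value =~ /^"([^"]*)";$/         # quoted, then a semicolon
        or return;
    $content =~ s/ /%20/g;                           # escape spaces
    $content =~ s/=/%3d/g;                           # escape equal signs
    return "$name=$content";
}
```

Finding $lib_method in such a list is then a matter of looking for the item that starts with Dir::Bin::methods=.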

Give-up codes:

($method): ($line): that's unparsable

Validation (described above) has failed.

($method): [close] (apt-config) failed: $!

After processing the input, the pipe is closed. That close failed with $!.

($method): (apt-config): timeouted

While processing, a fair 120sec timeout is given (it's reset after each $line). @$config_source hung for that long.

($method): (apt-config) died: ($?)

@$config_source has exited uncleanly. More diagnostics are supposed to be on STDERR.

($method): (apt-config): failed to output anything

@$config_source has exited cleanly, but failed to provide any output to parse at all.

_uncache_configuration()
    File::AptFetch::_uncache_configuration;
    # or
    $self->_uncache_configuration;
    # or
    $fetch->_uncache_configuration;

Internal. This cleans APT's configuration cache. It doesn't trigger recaching; recaching happens whenever the cache is required again (subject to the natural control flow).

(caveat) _cache_configuration sets $lib_method (in File::AptFetch::ConfigData) if it happens to be undefined. &_uncache_configuration leaves it untouched.
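The lexical cache with lazy refill can be pictured with closures over a file-scoped lexical. load_config() is a hypothetical stand-in for running @$config_source and cooking its output. The only point shown is that clearing the cache doesn't reload it; the next lookup does.

```perl
use strict;
use warnings;

{
    my $cache;          # visible only to the subs in this block
    my $loads = 0;

    # Hypothetical stand-in for running @$config_source and cooking output.
    sub load_config { $loads++; return [ 'Dir::Bin::methods=/usr/lib/apt/methods/' ] }

    sub cache_configuration {
        return '' if $cache;            # already cached: no activity
        $cache = load_config();
        return '';
    }

    sub uncache_configuration { undef $cache; return }   # no recaching here

    sub config_items { cache_configuration(); return @$cache }
    sub load_count   { return $loads }
}
```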

_read()
    $fetch->_read;
    $fetch->{ALRM_error} and
      die "internal error: requesting read while there shouldn't be any";
    $fetch->{CHLD_error} and
      die "external error: method has gone nuts and AWOLed";

Internal. Refactored. It attempts to read the log entry. Whatever has been read is split into items, chomped, and pushed onto @$log. Item consumption finishes if:

empty-line separator has been found

(v0.1.9: there was a major breakage at this point after v0.1.4.) Somewhere in @$log there's at least one empty-line separator. For technical reasons it doesn't have to be the last item. For more confusion, the last item might be non-empty. It's up to you whether to consume everything in @$log, only the complete entries (those with empty-line separators), or only the first complete entry; _read doesn't care. In any case, you can be sure that if _read returns clean (see below), there's at least one complete entry.
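The sketch below turns raw chunks from sysread() into log items and checks for a complete entry. absorb() is hypothetical. The carry-over buffer for a partial trailing line is this sketch's own addition; the documentation above doesn't say how F::AF deals with a line split across reads.

```perl
use strict;
use warnings;

# Append a chunk to the carry-over buffer, move every finished line onto the
# log (without its newline), and return the number of empty-line separators
# now present in the log.
sub absorb {
    my ($log, $buffer_ref, $chunk) = @_;
    $$buffer_ref .= $chunk;
    while ($$buffer_ref =~ s/^([^\n]*)\n//) {
        push @$log, $1;
    }
    return scalar grep { $_ eq '' } @$log;
}
```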

child has timed out

If the child times out, then $ALRM_error is set (to TRUE, otherwise meaningless). Technically speaking, the method just has nothing to say. It's up to the caller to decide what to do (and it's the caller's fault that there was an attempt to get an entry when there was no reason for one). Anyway, $ALRM_error is forced to FALSE upon entering the select loop.

(v0.0.8) And more about what the timeout is. It was believed that methods pulse their progress. That belief was in vain. Thus, for now:

  • The timeout is configurable through $ConfigData{timeout} (120sec by stock configuration; no defaults). The timeout is cached in each instance of the File::AptFetch object.

  • (v0.1.6) Target filenames are cached in the F::AF object. For each target there's a HASH in which the key filename is set to the target filename.

  • (v0.1.4) The timeout (the big one, $timeout) is made of supposedly small $ConfigData{tick}s (5sec by stock configuration; no defaults). The small timeout is made with 4-arg select.

  • (v0.1.6) If there's no input from the method, then routing is done as follows:

    +

    Each target's cached HASH is passed to read callback ("set_callback()" has more).

    +

    If any callback returns TRUE, then the timeout counter is reset and it goes for the next $tick-long select (IOW, a file transfer (whatever that means) is in progress).

    +

    If every callback returns FALSE, then it advances toward the timeout and goes for the next $tick-long select.

    +

    (not implemented) If any callback returns undef then fails entirely.
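The routing above can be sketched as a loop of $tick-long 4-arg select calls on the method's pipe. wait_for_input() and its arguments are hypothetical. Every target's HASH is passed to the callback, and any TRUE resets the counter. The unimplemented undef case is not modeled.

```perl
use strict;
use warnings;

# Hypothetical arguments: fh, tick, timeout (seconds), targets => [HASH...],
# callback => CODE. Returns 'input', 'timeout' or 'error'.
sub wait_for_input {
    my (%arg) = @_;
    my $rin = '';
    vec($rin, fileno($arg{fh}), 1) = 1;
    my $waited = 0;
    while ($waited < $arg{timeout}) {
        my $nfound = select(my $rout = $rin, undef, undef, $arg{tick});
        return 'error' if $nfound < 0;
        return 'input' if $nfound > 0;
        # No input: ask every target's callback; any TRUE means progress.
        my $busy = grep { $arg{callback}->($_) } @{ $arg{targets} };
        $waited = $busy ? 0 : $waited + $arg{tick};
    }
    return 'timeout';
}
```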

child has exited

The child is waitpid'ed and then $CHLD_error is set. It may be normal for the child to exit; it's up to the caller to decide. Anyway, after the child has exited there's nothing to read from.

unknown error has happened

(v0.1.4) It used to be read-with-alarm-in-eval. It's not anymore, thus any signal(7) will kill the process. Then it dies.

_read_callback()

(v0.1.6) Internal. It's the default read callback ("_read()" has more). It was supposed to be simple. In vain.

The primary objective is avoiding false negatives at all costs. Here's the list of avoided false negatives:

  • Somewhere in the lenny/squeeze time span, APT methods changed behaviour. In the past they opened the target for writing instantly. Now they create a temporary file and, upon finishing, rename it to the target. For obvious reasons, methods communicate neither progress nor the temporary's filename. If the naming or handling of unfinished transfers ever changes, there will be breakage.

  • Then, when the transfer has finished *physically*, it isn't reported just yet (even though the temporary has been renamed). The method calculates hashes, and for obvious reasons methods don't communicate that progress either. A naive approach would be to check the size and then just wait forever, but the size may not be known beforehand. So _read_callback() increases the number of ticks before signaling timeout. That increase is a function of the tick length ($ConfigData{tick}), the current file size, and a supposed IO speed. The IO speed is hardcoded to be 15MB/sec. So if the media is really slow (like a diskette or something), breakage is possible. However, those nitty-gritty manipulations will never result in a timeout decrease.

For now it's not clear whether _read_callback() ought to provide some diagnostics. Right now it doesn't.
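The exact formula isn't documented, so the following is only a guess at its shape. It grants enough ticks to read the current file once at the hardcoded 15MB/sec (taken here as 15 MiB), and never fewer ticks than already granted. ticks_for() is hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Assumed interpretation of "15MB/sec": 15 MiB per second.
use constant IO_SPEED => 15 * 1024 * 1024;

# Ticks needed to get through $size bytes at IO_SPEED, given a tick of $tick
# seconds; the result never drops below what was already granted.
sub ticks_for {
    my ($granted, $size, $tick) = @_;
    my $needed = ceil($size / (IO_SPEED * $tick));
    return $needed > $granted ? $needed : $granted;
}
```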

DIAGNOSTICS

Most error communication is done through give-up codes. However, some conditions aren't worth keeping the process alive for; those are marked as (fatal). Others happen (mostly) in a freshly forked process that couldn't boot properly; those are communicated back (somehow).

(%s): candiate to pass is neither CODE nor (undef)

(fatal) In "set_callback()". Tag %s (possibly unknown) tries to set something as a callback. That must be either CODE or undef. It's not.

(%s): unknown callback

(fatal) In "set_callback()". Tag %s is unknown. Nothing to do with it but croak.

[close] (reader): $!

In "DESTROY()" (that's why it's not fatal). Closing the child's STDIN has failed. Nothing to do about it except blast ahead (it will probably get stuck in waitpid then).

[close] (writer): $!

In "DESTROY()" (that's why it's not fatal). Closing the child's STDOUT has failed. Nothing to do about it except blast ahead (it will probably get stuck in waitpid then).

[dup] (STDIN): $!

In "init()". Turning the reader pipe into STDIN has failed. The parent will express it with the ($method): ($?): died without handshake give-up code.

[dup] (STDOUT): $!

In "init()" or "_cache_configuration()". Turning the writer pipe into STDOUT has failed. The parent will express it with the ($method): ($?): died without handshake or ($method): (apt-config) died: ($?) give-up code.

[exec] ($method): $!

In "init()". Executing the requested $method has failed. The parent will express it with the ($method): ($?): died without handshake give-up code.

[fork] ($method): $!
[fork] (apt-config): $!

(fatal) In "init()" (or "_cache_configuration()" if it talks about apt-config). fork has failed. Nothing can be done about it.

[kill] ($pid): nothing to kill or $!

In "DESTROY()" (that's why it's not fatal). The child has somehow been reaped already. Probably OK for your *nix.

[open] (STDIN): failed: $!

In "_cache_configuration()". Turning the STDIN of the upcoming $config_source (in File::AptFetch::ConfigData) into /dev/null has failed. The parent will express it with the ($method): (apt-config) died: ($?) give-up code.

should not be here at .../File/AptFetch.pm line %i

(fatal) In "_read()". By implementation there's a chain of if-elsif-else. That else covers routes I haven't thought of. Purely my fault.

[sysread] ($method): $!

In "_read()". That's what happened: sysread() has failed for some reason.

SEE ALSO

File::AptFetch::Cookbook, "APT Method Interface" in libapt-pkg-doc package, apt-config(1), apt.conf(5)

AUTHOR

Eric Pozharski, <whynot@cpan.org>

COPYRIGHT & LICENSE

Copyright 2009, 2010, 2014 by Eric Pozharski

This library is free in this sense: AS-IS, NO-WARRANTY, HOPE-TO-BE-USEFUL. This library is released under GNU LGPLv3.