NAME

Statistics::R::IO::REXPFactory - Functions for parsing R data files

VERSION

version 0.06

SYNOPSIS

    use feature 'say';
    use Statistics::R::IO::REXPFactory qw( unserialize );

    # Assume $data was created by reading, say, an RDS file
    my ($rexp, $state) = @{unserialize($data)}
        or die "couldn't parse";
    
    # If we're reading an RDS file, there should be no data left
    # unparsed
    die 'Unread data remaining in the RDS file' unless $state->eof;

    # the result of the unserialization is a REXP
    say $rexp;

    # REXPs can be converted to the closest native Perl data type
    print $rexp->to_pl;

DESCRIPTION

This module implements the actual reading of serialized R objects and their conversion to a Statistics::R::REXP. You are not expected to use it directly, as it is normally wrapped by "readRDS" and "readRData" in Statistics::R::IO.

SUBROUTINES

unserialize $data

Constructs a Statistics::R::REXP object from its serialization in $data. Returns a reference to a two-element array holding the object and the Statistics::R::IO::ParserState reached at the end of the serialization.

intsxp, langsxp, lglsxp, listsxp, rawsxp, realsxp, refsxp, strsxp, symsxp, vecsxp, envsxp, charsxp

Parsers for the corresponding R SEXP-types.

object_content

Parses object info and its data by sequencing "unpack_object_info" and "object_data".

unpack_object_info

Parser for the serialized object info structure. Returns a hash with keys "is_object", "has_attributes", "has_tag", "object_type", and "levels", each corresponding to a field of the R serialization format described in http://cran.r-project.org/doc/manuals/r-release/R-ints.html#Serialization-Formats. An additional key "flags" contains the full 32-bit value as stored in the file.
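To make the layout of the flags word concrete, here is a minimal sketch in plain Perl. It assumes the bit positions given in the R-ints manual (type in bits 0-7, then the object, attributes and tag bits, and the "gp" field in bits 12-27) and assumes that "levels" corresponds to that "gp" field. The helper name decode_flags is hypothetical and is not part of this module.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module's API: splits the 32-bit
# flags word into the fields described in R-ints.
sub decode_flags {
    my ($flags) = @_;
    return {
        object_type    => $flags & 0xFF,            # bits 0-7: SEXP type
        is_object      => ($flags >> 8)  & 1,       # bit 8
        has_attributes => ($flags >> 9)  & 1,       # bit 9
        has_tag        => ($flags >> 10) & 1,       # bit 10
        levels         => ($flags >> 12) & 0xFFFF,  # bits 12-27 ("gp"; assumed mapping)
        flags          => $flags,                   # the untouched word
    };
}

# In XDR format the word is stored big-endian, hence the 'N' template.
my ($word) = unpack 'N', "\x00\x00\x03\x0d";
my $info   = decode_flags($word);
printf "type=%d object=%d attributes=%d tag=%d levels=%d\n",
    @{$info}{qw(object_type is_object has_attributes has_tag levels)};
```

The sample word 0x30D describes an integer vector (type 13) that has the object bit set and carries attributes.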

object_data $obj_info

Parser for a serialized R object. It dispatches to the correct parser for the type stored under the "object_type" key of the $obj_info hash.
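The dispatch on "object_type" can be pictured as a lookup table keyed by R's SEXP type codes. The sketch below is a simplified illustration, not the module's implementation: the module's parsers work on a Statistics::R::IO::ParserState, whereas the hypothetical helpers here (parse_int_vector, parse_real_vector, parse_object_data) take a plain byte string in XDR (big-endian) layout and cover only two types.

```perl
use strict;
use warnings;

# Hypothetical element parsers: each reads a 32-bit length followed by
# that many big-endian values and returns them as an array reference.
sub parse_int_vector {
    my ($bytes) = @_;
    my ($n) = unpack 'N', $bytes;
    return [ unpack "x4 l>$n", $bytes ];
}

sub parse_real_vector {
    my ($bytes) = @_;
    my ($n) = unpack 'N', $bytes;
    return [ unpack "x4 d>$n", $bytes ];
}

# R's SEXP type codes for the two vector types handled here.
my %parser_for = (
    13 => \&parse_int_vector,     # INTSXP
    14 => \&parse_real_vector,    # REALSXP
);

# Hypothetical dispatcher: picks the parser matching the object type.
sub parse_object_data {
    my ($obj_info, $bytes) = @_;
    my $parser = $parser_for{ $obj_info->{object_type} }
        or die "unsupported SEXP type $obj_info->{object_type}\n";
    return $parser->($bytes);
}

my $ints = parse_object_data({ object_type => 13 },
                             pack('N l>3', 3, 10, -20, 30));
print "@$ints\n";
```

A table like this keeps the per-type parsers independent, so supporting another SEXP type means adding one entry rather than extending a conditional chain.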

vector_and_attributes $object_info, $element_parser, $rexp_class

Convenience parser for vectors, which are serialized first with a SEXP for the vector elements, followed by attributes stored as a tagged pairlist. Attributes are stored only if $object_info indicates their presence, while vector elements are parsed using $element_parser. Finally, the parsed attributes and elements are used as arguments to the constructor of the $rexp_class, which should be a subclass of Statistics::R::REXP::Vector.

header

Parser for the header of an R serialization: the serialization format (XDR, binary, etc.), the version number of the serialization (currently 2), and two 32-bit integers indicating the version of R that wrote the file, followed by the minimal version of R needed to read the format.
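As an illustration of the header layout described above, the following sketch decodes an XDR-format header with core Perl only. It assumes the two-byte format marker "X\n" that R writes for XDR data and R's packing of a version number as major * 65536 + minor * 256 + patch. The helper names parse_xdr_header and decode_r_version are hypothetical and not part of this module.

```perl
use strict;
use warnings;

# Hypothetical helper: decodes an XDR-format header from the start of $bytes.
sub parse_xdr_header {
    my ($bytes) = @_;
    my ($magic, $format_version, $writer, $min_reader)
        = unpack 'a2 N N N', $bytes;
    die "not an XDR serialization\n"
        unless defined $min_reader && $magic eq "X\n";
    return {
        format_version => $format_version,
        writer_version => decode_r_version($writer),
        min_reader     => decode_r_version($min_reader),
    };
}

# R packs a version number as major * 65536 + minor * 256 + patch.
sub decode_r_version {
    my ($v) = @_;
    return join '.', $v >> 16, ($v >> 8) & 0xFF, $v & 0xFF;
}

my $header = parse_xdr_header("X\n" . pack('N N N', 2, 0x030002, 0x020300));
print "format $header->{format_version}, ",
      "written by R $header->{writer_version}, ",
      "readable since R $header->{min_reader}\n";
```

The sample bytes describe a version-2 serialization written by R 3.0.2 that any R from 2.3.0 onwards can read.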

xdr, bin

Parsers for RDS header indicating files in XDR or native-binary format.

maybe_long_length

Parser for vector length, allowing for the encoding of 64-bit long vectors introduced in R 3.0.
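The long-vector encoding can be sketched as follows. This assumes the scheme used by R's own serializer: an ordinary length is a single signed 32-bit integer, while a long vector is announced by the value -1 followed by the upper and lower 32 bits of the real length. The helper read_length is hypothetical, works on an XDR (big-endian) byte string rather than on a parser state, and needs a Perl with 64-bit integers to represent large lengths exactly.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (length, bytes consumed) for the length
# field found at the start of $bytes.
sub read_length {
    my ($bytes) = @_;
    my ($len) = unpack 'l>', $bytes;         # signed 32-bit, big-endian
    return ($len, 4) unless $len == -1;      # ordinary vector

    # Long vector: two more 32-bit words hold the upper and lower halves.
    my ($upper, $lower) = unpack 'x4 N N', $bytes;
    return ($upper * 2**32 + $lower, 12);
}

my ($short) = read_length(pack 'l>', 1000);
my ($long)  = read_length(pack 'l> N N', -1, 1, 5);
print "short=$short long=$long\n";
```

Returning the number of bytes consumed lets a caller continue reading the vector elements from the right offset in either case.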

tagged_pairlist_to_rexp_hash

Converts a pairlist to a REXP hash whose keys are the pairlist's element tags and whose values are the pairlist elements themselves.

tagged_pairlist_to_attribute_hash

Converts object attributes, which are serialized as a pairlist with the attribute name in each element's tag, to a hash that can be used as the attributes argument to Statistics::R::REXP constructors.

Some attributes are serialized using a compact encoding (for instance, when a table's row names are just integers 1:nrows), and this function will decode them to a complete REXP.
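The row-names case can be illustrated with a short sketch. It assumes R's compact form for automatic row names, an integer vector of two elements in which the first is NA_integer_ (stored as the smallest 32-bit integer) and the second is the row count, usually negated. The helper expand_row_names is hypothetical, operates on a plain Perl list of integers instead of a REXP, and is not the module's own decoder.

```perl
use strict;
use warnings;

use constant NA_INTEGER => -2**31;    # R stores NA_integer_ as INT_MIN

# Hypothetical helper: expands compact row names to the full 1..n
# sequence and returns any other integer vector unchanged.
sub expand_row_names {
    my (@values) = @_;
    if (@values == 2 && $values[0] == NA_INTEGER) {
        return (1 .. abs($values[1]));
    }
    return @values;
}

print join(',', expand_row_names(NA_INTEGER, -3)), "\n";
print join(',', expand_row_names(7, 8, 9)), "\n";
```

The first call stands for a three-row table with automatic row names, the second for row names that were stored explicitly.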

BUGS AND LIMITATIONS

There are no known bugs in this module. Please see Statistics::R::IO for bug reporting.

SUPPORT

See Statistics::R::IO for support and contact information.

AUTHOR

Davor Cubranic <cubranic@stat.ubc.ca>

COPYRIGHT AND LICENSE

This software is Copyright (c) 2014 by University of British Columbia.

This is free software, licensed under:

  The GNU General Public License, Version 3, June 2007