The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.
package Tie::FieldVals;
use strict;
use warnings;

=head1 NAME

Tie::FieldVals - an array tie for a file of enhanced Field:Value data

=head1 VERSION

This describes version B<0.6202> of Tie::FieldVals.

=cut

our $VERSION = '0.6202';

=head1 SYNOPSIS

    use Tie::FieldVals;
    use Tie::FieldVals::Row;

    # tie the array
    my @records;
    my $recs_obj = tie @records, 'Tie::FieldVals', datafile=>$datafile;

    # object methods
    my @field_names = $recs_obj->field_names();

=head1 DESCRIPTION

This is a Tie object to map the records in an enhanced Field:Value data
file into an array.  Each file has multiple records, each record has its
values defined by a Field:Value pair, with the enhancements that (a) the
Value part can extend over more than one line (because the Field names
are predefined) and (b) Fields can have multiple values by repeating
the Field:Value part for a given field.

Because of its use of the Tie::File module, access to each record is
reasonably fast. The Tie::File module also ensures that (a) the whole file
doesn't have to be read into memory (b) record changes are written to the
file straight away (c) record changes don't require the whole file to be
rewritten, just the part of the file after the change.

The advantage of this setup is that one can have useful data files which
are plain text, human readable, human editable, and at the same time able
to be accessed faster than using XML (I know, I wrote a version of my
reporting software using XML data, and even the fastest XML parsers weren't
as fast as this setup, once there were a reasonable number of records).
This also has advantages over a simpler setup where values are given one
per line with no indication of what value belongs to what field; the
problems with that is that it is harder to fix corrupted data by hand, and
it is harder to add new fields, and one can't have multi-line data.

It is likewise better than a CSV (Comma-Separated Values) file, because
again, with a CSV file, the data is positional and therefore harder to fix
and harder to change, and again one can't have multi-line data.

This module is both better and worse than file-oriented databases like
L<DB_File> and its variants and extensions (such as L<MLDBM>).  This module
does not require that each record have a unique key, and the fact that a
DBM file is binary makes it not only less correctable, but also less
portable.  On the downside, this module isn't as fast.

Naturally, if one's data needs are more complex, it is probably better to
use a fully-fledged database; this is oriented towards those who don't wish
to have the overhead of setting up and maintaining a relational database
server, and wish to use something more straightforward.

This comes bundled with other support modules, such as the
Tie::FieldVals::Row module.  The Tie::FieldVals::Select module is for
selecting and sorting a sub-set from a Tie::FieldVals array, and the
Tie::FieldVals::Join is a very simple method of joining two files on a
common field.

This distribution includes the fv2xml script, which converts a
Tie::FieldVals data file into an XML file, and xml2fv which
converts an XML file into a Tie::FieldVals data file.

=head1 FILE FORMAT

The data file is in the form of Field:Value pairs, with each
record separated by a line with '=' on it. The first record
is an "empty" record, which just contains the field names;
this lets us know what the legal fields are.
A line which doesn't start with a recognised field is
considered to be part of the value of the most recent Field.

=head2 Example 1

    Name:
    Entry:
    =
    Name:fanzine
    Entry:Fanzines are amateur magazines produced by fans.
    =
    Name:fan fiction (fanfic)
    Entry:Original fiction written by fans of a particular
    TV Show/Movie set in the universe depicted by that work.
    =

The first record just contains Name: and Entry: fields to show that those
are the legal fields for this file.  The third record gives an example
of a value that goes over more than one line.

=head2 Example 2

    Author:
    AuthorEmail:
    AuthorURL:
    AuthorURLName:
    =
    Author:Adele
    AuthorEmail:adele@example.com
    AuthorEmail:adele@example.tas.edu
    AuthorURL:
    AuthorURLName:
    =
    Author:Danzer,Brenda
    AuthorEmail:
    AuthorURL:http://www.example.com/~danzer
    AuthorURLName:Danzer Dancing
    AuthorURL:http://www.brendance.com/
    AuthorURLName:BrenDance
    =

This one gives examples of multi-valued fields.

=head2 Gotchas

Field names cannot have spaces in them, indeed, they must
consist of plain alphanumeric characters or underscores.
They are case-sensitive.

The record separator (=) must be on a line by itself, and the last record
in the file must also have a record-separator after it.

=cut

use 5.006;
use strict;
use Carp;
use Tie::Array;
use Tie::File;
use Tie::FieldVals::Row;
use Fcntl qw(:DEFAULT);
use Data::Dumper;

our @ISA = qw(Tie::Array);

# to make taint happy
$ENV{PATH} = "/bin:/usr/bin:/usr/local/bin";
$ENV{CDPATH} = '';
$ENV{BASH_ENV} = '';

# for debugging
my $DEBUG = 0;

=head1 PUBLIC FUNCTIONS

=head2 find_field_names

    my @field_names = Tie::FieldVals::find_field_names($datafile);

Read the field-name information from the file, if the file
exists and is readable.

=cut
sub find_field_names ($) {
    carp &whowasi if $DEBUG;
    my $datafile = shift;

    my @field_names = ();
    if (-r $datafile)
    {
	# make a temporary file object to look at
	my @records;
	my $file_obj = tie @records, 'Tie::File', "$datafile",
	    recsep =>"\n=\n", mode=>O_RDONLY, memory=>0
	    or croak "Tie::FieldVals::find_field_names - Could not open '",
		$datafile, "'.";

	# the field info is in the first record
	my %row = ();
	my $row_obj = tie %row,
	   'Tie::FieldVals::Row', fields=>['dummy'];
	my $rec_str = $records[0];
	if (defined $rec_str)
	{
	    $row_obj->set_from_string($rec_str,
				      override_keys=>1);
	    @field_names = @{$row_obj->field_names()};
	}
	undef $file_obj;
	untie @records;
	undef $row_obj;
	untie %row;
    }

    return @field_names;

} # find_field_names

=head1 OBJECT METHODS

=head2 field_names

Get the field names of this data.

my @field_names = $recs_obj->field_names();

=cut
sub field_names {
    carp &whowasi if $DEBUG;
    my $self = shift;

    @{$self->{field_names}};
}

=head2 flock

    $recs_obj->flock(MODE);

Locks the data file.  "MODE" has the same meaning as the second
argument to the Perl built-in "flock" function; for example
"LOCK_SH" or "LOCK_EX | LOCK_NB". (These constants are provided
by the "use Fcntl ':flock';" declaration.)

"MODE" is optional; the default is "LOCK_EX".

When you use "flock" to lock the file, "Tie::FieldVals" assumes that the
record cache is no longer trustworthy, because another process might have
modified the file since the last time it was read.  Therefore, a successful
call to "flock" discards the contents of the record cache.

The best way to unlock a file is to discard the object and untie the
array.  It is probably unsafe to unlock the file without also untying
it, because if you do, changes may remain unwritten inside the object.
That is why there is no shortcut for unlocking.  If you really want to
unlock the file prematurely, you know what to do; if you don't know
what to do, then don't do it.

See L<Tie::File/flock> for more information (this calls the 
flock method of that module).

=cut
sub flock {
    carp &whowasi if $DEBUG;
    my $self = shift;

    # call the Tie::File flock method
    if ($self->{FILE_OBJ}->flock(@_))
    {
	# clear the cache
	$self->{REC_CACHE} = {};
    }
}

=head1 TIE-ARRAY METHODS

=head2 TIEARRAY

Create a new instance of the object as tied to an array.

    tie @people, 'Tie::FieldVals', datafile=>$datafile;

    tie @people, 'Tie::FieldVals', datafile=>$datafile,
	mode=>O_RDONLY, cache_size=>1000, memory=>0;

    tie @people, 'Tie::FieldVals', datafile=>$datafile,
	fields=>[qw(Name Email)], mode=>(O_RDWR|O_CREAT);

    tie @people, 'Tie::FieldVals', datafile=>$datafile,
	mode=>O_RDWR, cache_all=>1;

Arguments:

=over

=item datafile

The file with the data in it. (required)

=item fields

Field defintions for creating a new file.  This is ignored if the
file already exists.

=item mode

The mode to open the file with. O_RDONLY means that the file is read-only.
O_RDWR means that the file is read-write.
(default: O_RDONLY)

=item cache_all

If true, cache all the records in the file.  This will speed things up,
but consume more memory. (default: false)

Note that this merely sets the cache_size to the size of the file when
the tie is initially made: if you add more records to the file, the
cache size will not be increased.

=item cache_size

The size of the cache (if we aren't caching all the records).
(default: 100)  As ever, there is a trade-off between space and time.

=item memory

The upper limit on the memory consumed by C<Tie::File>.
(See L<Tie::File>).
(default: 10,000,000)

Note that there are two caches: the cache of unparsed records maintained
by Tie::File, and the cache of parsed records maintained by Tie::FieldVals.
The B<memory> option affects the Tie::File cache, and the B<cache_*>
options affect the Tie::FieldVals cache.

=back

=cut
sub TIEARRAY {
    carp &whowasi if $DEBUG;
    my $class = shift;
    my %args = (
	datafile=>'',
	mode=>(O_RDONLY),
	cache_size=>100,
	cache_all=>0,
	memory=>10_000_000,
	fields=>undef,
	@_
    );

    my $self = {};

    # check if the file is readable while existing
    if (-e $args{datafile} && !-r $args{datafile})
    {
	croak "Tie::FieldVals::TIEARRAY - Could not read '", $args{datafile}, "'.";
    }
    my @records;
    if (-e $args{datafile})
    {
	@{$self->{field_names}} = find_field_names($args{datafile});
	$self->{FILE_OBJ} = tie @records, 'Tie::File', "$args{datafile}",
	    recsep =>"\n=\n", mode=>$args{mode}, memory=>$args{memory}
	or croak "Tie::FieldVals - Could not open '", $args{datafile}, "'.";
	$self->{FILE_RECS} = \@records;
    }
    else
    {
	# check that the fields have been given
	if (!defined $args{fields}
	    || ref $args{fields} ne 'ARRAY')
	{
	    croak "Tie::FieldVals - ", $args{datafile},
		" does not exist and no field names were given";
	}
	# set the fields and tie the file
	@{$self->{field_names}} = @{$args{fields}};

	$self->{FILE_OBJ} = tie @records, 'Tie::File', "$args{datafile}",
	    recsep =>"\n=\n", mode=>$args{mode}, memory=>$args{memory}
	or croak "Tie::FieldVals - Could not open '", $args{datafile}, "'.";
	$self->{FILE_RECS} = \@records;

	set_field_names($self);
    }

    $self->{OPTIONS} = \%args;

    # set a hash of the field names
    foreach my $fn (@{$self->{field_names}})
    {
	$self->{field_names_hash}->{$fn} = 1;
    }

    $self->{REC_CACHE} = {};
    if ($args{cache_all}) # set the cache to the size of the file
    {
	my $count = @records;
	$self->{OPTIONS}->{cache_size} = $count;
    }

    bless ($self, (ref $class || $class));
} # TIEARRAY

=head2 FETCH

Get a row from the array.

    $val = $array[$ind];

Returns a reference to a Tie::FieldVals::Row hash, or undef.

=cut
sub FETCH {
    carp &whowasi if $DEBUG;
    my ($self, $ind) = @_;

    if (defined $self->{REC_CACHE}->{$ind})
    {
	return $self->{REC_CACHE}->{$ind};
    }
    else # not cached, add to cache
    {
	# remove one from cache if cache full
	my @cached = keys %{$self->{REC_CACHE}};
	if (@cached >= $self->{OPTIONS}->{cache_size})
	{
	    delete $self->{REC_CACHE}->{shift @cached};
	}
	%{$self->{REC_CACHE}->{$ind}} = ();
	my $row_obj = tie %{$self->{REC_CACHE}->{$ind}},
	    'Tie::FieldVals::Row', fields=>$self->{field_names};
	# remember, the 0 record is the empty fields record
	my $rec_str = $self->{FILE_RECS}->[$ind + 1];
	if (defined $rec_str)
	{
	    $row_obj->set_from_string($rec_str);
	    return $self->{REC_CACHE}->{$ind};
	}
	else
	{
	    delete $self->{REC_CACHE}->{$ind};
	    return undef;
	}
    }
    return undef;
} # FETCH

=head2 STORE

Add a value to the array.  Value must be a Tie::FieldVals::Row hash.

    $array[$ind] = $val;

If $ind is bigger than the array, then just push, don't extend.

=cut
sub STORE {
    carp &whowasi if $DEBUG;
    my ($self, $ind, $val) = @_;

    # only store a hash and if writing
    if (ref $val eq 'HASH'
	&& $self->{OPTIONS}->{mode} & O_RDWR)
    {
	if ($ind > $self->FETCHSIZE())
	{
	    $ind = $self->FETCHSIZE();
	    $self->{REC_CACHE}->{$ind} = $val;
	    my $row_obj = tied %{$val};
	    my $rec_str = $row_obj->get_as_string();
	    $self->{FILE_OBJ}->PUSH($rec_str);
	}
	else
	{
	    $self->{REC_CACHE}->{$ind} = $val;
	    my $row_obj = tied %{$val};
	    my $rec_str = $row_obj->get_as_string();
	    # remember record 0 is the empty fields record
	    $self->{FILE_OBJ}->STORE($ind + 1, $rec_str);
	}
    }
} # STORE

=head2 FETCHSIZE

Get the size of the array.

=cut
sub FETCHSIZE {
    carp &whowasi if $DEBUG;
    my $self = shift;

    # remember record 0 is the empty fields record
    return ($self->{FILE_OBJ}->FETCHSIZE() - 1);
} # FETCHSIZE

=head2 STORESIZE

Set the size of the array, if the file is writeable.

=cut
sub STORESIZE {
    carp &whowasi if $DEBUG;
    my $self = shift;
    my $count = shift;

    if ($self->{OPTIONS}->{mode} & O_RDWR)
    {
	# remember record 0 is the empty fields record
	$self->{FILE_OBJ}->STORESIZE($count + 1);
    }
} # STORESIZE

=head2 EXISTS

    exists $array[$ind];

=cut
sub EXISTS {
    carp &whowasi if $DEBUG;
    my $self = shift;
    my $ind = shift;

    # remember record 0 is the empty fields record
    return $self->{FILE_OBJ}->EXISTS($ind + 1);
} # EXISTS

=head2 DELETE

    delete $array[$ind];

Delete the value at $ind if the file is writeable.

=cut
sub DELETE {
    carp &whowasi if $DEBUG;
    my $self = shift;
    my $ind = shift;

    if ($self->{OPTIONS}->{mode} & O_RDWR)
    {
	if (exists $self->{REC_CACHE}->{$ind})
	{
	    delete $self->{REC_CACHE}->{$ind};
	}
	# remember record 0 is the empty fields record
	$self->{FILE_OBJ}->DELETE($ind + 1);
    }
} # DELETE

=head2 CLEAR

    @array = ();

Clear the array if the file is writeable.

=cut
sub CLEAR {
    carp &whowasi if $DEBUG;
    my $self = shift;
    my $ind = shift;

    if ($self->{OPTIONS}->{mode} & O_RDWR)
    {
	$self->{REC_CACHE} = {};
	# remember record 0 is the empty fields record
	my $rec_str = $self->{FILE_RECS}->[0];
	$self->{FILE_OBJ}->CLEAR();
	$self->{FILE_RECS}->[0] = $rec_str;
    }
} # CLEAR

=head2 UNTIE

    untie @array;

Untie the array.

=cut
sub UNTIE {
    carp &whowasi if $DEBUG;
    my $self = shift;
    my $count = shift;

    carp "untie attempted while $count inner references still exist" if $count;
    $self->{REC_CACHE} = {};
    undef $self->{FILE_OBJ};
    untie @{$self->{FILE_RECS}};
} # UNTIE

=head1 PRIVATE METHODS

This documentation is for developer reference only.

=head2 debug

Set debugging on.

=cut
sub debug { $DEBUG = @_ ? shift : 1 }

=head2 whowasi

For debugging: say who called this 

=cut
sub whowasi { (caller(1))[3] . '()' }

=head2 set_field_names

Set the field names in the data-file to be the given field names.
(Assumes the file didn't exist before).

=cut
sub set_field_names ($) {
    carp &whowasi if $DEBUG;
    my $self = shift;

    my %row = ();
    # set the row fields from the given fields
    my $row_obj = tie %row,
       'Tie::FieldVals::Row', fields=>$self->{field_names};
    # give the row fields values of the empty string
    # (right now they are undefined)
    foreach my $fn (@{$self->{field_names}})
    {
	$row{$fn} = '';
    }
    # get the empty row as a string, and set the file record[0]
    # to that string
    my $rec_str = $row_obj->get_as_string();
    $self->{FILE_RECS}->[0] = $rec_str;

} # set_field_names

=head1 REQUIRES

    Test::More

    Carp
    Tie::Array
    Tie::File
    Fcntl
    Data::Dumper

    Getopt::Long
    Pod::Usage
    Getopt::ArgvFile
    File::Basename

=head1 INSTALLATION

To install this module, run the following commands:

    perl Build.PL
    ./Build
    ./Build test
    ./Build install

Or, if you're on a platform (like DOS or Windows) that doesn't like the
"./" notation, you can do this:

   perl Build.PL
   perl Build
   perl Build test
   perl Build install

In order to install somewhere other than the default, such as
in a directory under your home directory, like "/home/fred/perl"
go

   perl Build.PL --install_base /home/fred/perl

as the first step instead.

This will install the files underneath /home/fred/perl.

You will then need to make sure that you alter the PERL5LIB variable to
find the modules, and the PATH variable to find the script.

Therefore you will need to change:
your path, to include /home/fred/perl/script (where the script will be)

	PATH=/home/fred/perl/script:${PATH}

the PERL5LIB variable to add /home/fred/perl/lib

	PERL5LIB=/home/fred/perl/lib:${PERL5LIB}


=head1 SEE ALSO

perl(1).
L<Tie::FieldVals::Row>
L<Tie::FieldVals::Select>
L<Tie::FieldVals::Join>
L<Tie::FieldVals::Row::Join>

=head1 BUGS

Please report any bugs or feature requests to the author.

=head1 AUTHOR

    Kathryn Andersen (RUBYKAT)
    perlkat AT katspace dot com
    http://www.katspace.com

=head1 COPYRIGHT AND LICENCE

Copyright (c) 2004-2008 by Kathryn Andersen

This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.


=cut

1; # End of Tie::FieldVals
# vim: ts=8 sts=4 sw=4
__END__