
PTools::SDF::SDF - Implements a Simple Data File as a 'Self Defining File'

This document describes version 0.32, released February, 2006.

use PTools::SDF::SDF;
# or, to disable per-field IFS parsing (see Performance Note, below):
# use PTools::SDF::SDF qw( noparse );
my $fileName = "/etc/passwd";
my @fields   = qw( uname passwd uid gid gcos dir shell );
my $sdfObj   = new PTools::SDF::SDF( $fileName, "", "", @fields );
$sdfObj->sort( "", "", 'uname' );           # sort on user name
foreach my $idx ( 0 .. $sdfObj->param ) {   # param() returns the last index
    my $uname = $sdfObj->param( $idx, 'uname' );
    my $gcos  = $sdfObj->param( $idx, 'gcos' );
    printf( " %10s %-30s\n", $uname, $gcos );
}

PTools::SDF::SDF is used to eliminate dependence on field positions within file records. This package reads and writes files with an arbitrary character used as the 'internal field separator,' or 'IFS,' usually a colon (':') or perhaps a pipe ('|') character.
A given data file becomes 'self defining' when it includes one or more special comment headers that define the file's characteristics. This includes naming each of the fields within a record, and specifying the IFS character used within each record.
As shown in the SYNOPSIS, above, for files where it is not feasible to embed the header within the file, the fields are named as the file is loaded during object instantiation.
Optional field definition headers must appear before the first record. For example, an application log file might have the following fields; in this case, record fields are separated by an exclamation mark:
#FieldNames date:uname:pid:event_message
#IFSChar !
The #FieldNames field1:field2:field3... header is read by this class and used to name each field within records in the file during object creation. By default the save method writes this back into the file.
This class also allows for special cases with the '#IFSChar' header. A single white space character can be used, either as a literal space or tab, or as Perl's '\s' metacharacter. In addition, runs of one or more white space characters can be matched using Perl's '\s+' syntax. The IFS character can be enclosed in double quotes within the header.
#IFSChar "	"     # single tab character
#IFSChar " "     # single space character
#IFSChar "\s"    # single white space character
#IFSChar "\s+"   # one or more white space characters
This implies that the double quote character can not be used as a field separator within a data file.
Warning: When specifying a single white space character as the IFS, make sure there is exactly one such character between adjacent fields within a record.
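To illustrate how the two headers can drive record parsing, here is a minimal sketch that reads '#FieldNames' and '#IFSChar' lines and splits each data line into a hash. The helper names ifs_pattern and parse_sdf are hypothetical; this is not the module's own loader, only an approximation of the behavior described above (colon default, optional double quotes, '\s' and '\s+' treated as patterns, any other value treated as a literal character).

```perl
use strict;
use warnings;

# Hypothetical helper: turn an "#IFSChar" header value into a split() pattern.
sub ifs_pattern {
    my ($spec) = @_;
    $spec =~ s/^"(.*)"$/$1/;              # strip optional double quotes
    return qr/\s+/ if $spec eq '\s+';     # one or more white space characters
    return qr/\s/  if $spec eq '\s';      # exactly one white space character
    return qr/\Q$spec\E/;                 # any other value is taken literally
}

# Hypothetical loader: header comments first, then one hash per data line.
sub parse_sdf {
    my (@lines) = @_;
    my (@names, @records);
    my $ifs = qr/:/;                      # default IFS is a colon
    foreach my $line (@lines) {
        chomp $line;
        if ($line =~ /^#FieldNames\s+(\S+)/) {
            @names = split /:/, $1;       # header names are always colon-separated
        } elsif ($line =~ /^#IFSChar\s+(.+?)\s*$/) {
            $ifs = ifs_pattern($1);
        } elsif ($line !~ /^#/ and length $line) {
            my %rec;
            @rec{@names} = split $ifs, $line, -1;   # -1 keeps trailing empty fields
            push @records, \%rec;
        }
    }
    return \@records;
}
```

For example, parse_sdf("#FieldNames date:uname:pid:event_message", "#IFSChar !", "2006-02-01!cobb!123!started") yields one record whose uname field is 'cobb'.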
Performance Note: Parsing every field in each record to encode and decode the IFS character adds significant overhead. To disable this parsing, import the 'noparse' option as shown in the SYNOPSIS section, above. Only do this when it is safe to do so, i.e., when no field can contain an embedded IFS character.
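This document does not say how the module encodes an embedded IFS character, so the sketch below uses a hypothetical percent-escape scheme purely to show the kind of per-field work that 'noparse' skips: every field is scanned on save and again on load. It assumes the IFS is a single character other than '%'; encode_field, decode_field, join_record and split_record are illustrative names only.

```perl
use strict;
use warnings;

# Hypothetical scheme: escape '%' first, then the IFS character, e.g. ':' -> '%3A'.
sub encode_field {
    my ($value, $ifs) = @_;
    $value =~ s/%/%25/g;
    $value =~ s/\Q$ifs\E/sprintf('%%%02X', ord $ifs)/ge;
    return $value;
}

# Reverse the escapes in a single pass.
sub decode_field {
    my ($value) = @_;
    $value =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $value;
}

# Every field is touched on the way out ...
sub join_record {
    my ($ifs, @fields) = @_;
    return join $ifs, map { encode_field($_, $ifs) } @fields;
}

# ... and again on the way back in.
sub split_record {
    my ($ifs, $line) = @_;
    return map { decode_field($_) } split /\Q$ifs\E/, $line, -1;
}
```

With this scheme, join_record(':', 'a:b', '50%', 'c') produces 'a%3Ab:50%25:c', and split_record recovers the original three fields.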
Other modules exist to manipulate PTools::SDF::SDF objects in various ways including user defined indices and a 'Simple Data Base' definition. Other modules also exist that implement other types of 'Simple Data Files' including Windows '.INI' files, 'tagged' data files, and others. See PTools::SDF::Overview for further details.
Creates a new PTools::SDF::SDF object and, if a FileName parameter is specified and the file currently exists, loads file data into the object.
The FileName parameter is optional. When specified and the data file exists, the file is loaded into the new object. If the specified file does not exist when the object is created, the filename will be used by the save method to create a new file, if possible.
This class is often used simply to store data in memory during the execution of a script. It is often convenient to use familiar data structures, such as this file format, without any actual disk file.
To load only a subset of the data file, pass a Match value. The match can either be for the entire data record or limited to a specific field within each record. Any valid Perl expression will work, including 'regular expression' matches.
Examples:
my @fields = qw( name passwd uid gid gcos dir shell );
# Single quotes keep Perl from interpolating $uid, $shell and $/
# before the expression is passed to the constructor.
# Load passwd entries for "root" users only
$sdfObj = new PTools::SDF::SDF( "/etc/passwd", '$uid eq "0"', undef, @fields );
# Load passwd entries for "C Shell" users only
$sdfObj = new PTools::SDF::SDF( "/etc/passwd", '$shell =~ /csh$/', undef, @fields );
# Load entries where "smith" is found in any of the fields
$sdfObj = new PTools::SDF::SDF( "/etc/passwd", '/smith/i', undef, @fields );
Note that when a 'subset' of a data file is loaded, the save method is disabled. Use the Force parameter of the save method to override, or use the ctrl method to specify a different FileName prior to calling save.
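One plausible way a Match string like those above can be applied is to compile it once into a closure in which each field name becomes a lexical variable and $_ holds the raw record, so that both '$uid eq "0"' and '/smith/i' work. The sketch below, with the hypothetical function load_matching and a colon IFS, is an assumption about the mechanism rather than the module's actual code.

```perl
use strict;
use warnings;

# Hypothetical sketch: build "sub { my $uname = ...; ...; local $_ = $line; MATCH }"
# once, then call it for every line of the file.
sub load_matching {
    my ($match, $fields, @lines) = @_;
    my $src = 'sub { my ($rec, $line) = @_; '
            . join('', map { "my \$$_ = \$rec->{'$_'}; " } @$fields)
            . 'local $_ = $line; ' . $match . ' }';
    my $test = eval $src or die "bad match expression: $@";
    my @kept;
    foreach my $line (@lines) {
        my %rec;
        @rec{@$fields} = split /:/, $line, -1;
        push @kept, \%rec if $test->(\%rec, $line);   # keep only matching records
    }
    return @kept;
}
```

Because the expression is compiled once per load rather than once per record, the cost of string eval stays small even for large files.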
The 'internal field separator' (IFS) character that is used to delimit fields within each record in the data file. The default is a colon (':') character. See the 'Performance Note' in the Description section, above, regarding parsing each and every field for this character.
Note that when the '#FieldNames' header is used for the 'Self Defining File' format, always use a colon (':') character to separate field names within the header.
When it is not appropriate to embed a comment header within a data file to define the field names for each record, pass a FieldNames parameter to the new method. This can be either a colon-separated list of fields or an array.
Examples:
$sdfObj = new PTools::SDF::SDF;
$sdfObj = new PTools::SDF::SDF( "/home/cobb/data/testfile.sdf" );
Fetch or set field values within a record. When called without any parameters, returns the index of the last record, i.e., a zero-based count that is one less than the number of entries. Use the count method to obtain a one-based count of entries.
Specify the relative record number within the PTools::SDF::SDF object.
Specify the field name to access within the indexed record.
The Value is an optional parameter that is used to set the value of the specified FieldName. Without this parameter, the current value of the field is returned.
Examples:
$fieldValue = $sdfObj->param( 0, 'fieldname' );
$sdfObj->param( 0, 'fieldname', "new value" );
There is a special form of the param method. If a hash reference is passed for the FieldName parameter, this hash ref will replace the data record specified by the Index parameter. Note that no checking is done. It is up to the programmer to ensure appropriate key names and values.
This mechanism has many uses, one of which is with the tag2sdf method in the PTools::SDF::TAG class. For example,
use PTools::SDF::SDF;
use PTools::SDF::TAG;
$sdfObj = new PTools::SDF::SDF;
$tagObj = new PTools::SDF::TAG( "myFile.tag" );
$tagHashRef = $tagObj->tag2sdf;
$nextRecord = $sdfObj->count; # (one-based count)
$sdfObj->param( $nextRecord, $tagHashRef );
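To make the zero-based and one-based counts concrete, here is a hypothetical stand-in class, MiniSDF, that stores records as an array of hash references. The storage layout is an assumption, not taken from PTools::SDF::SDF; the sketch only mirrors the param and count behavior described above, including the hash-reference form used to append a record at index count.

```perl
package MiniSDF;    # hypothetical stand-in, NOT PTools::SDF::SDF
use strict;
use warnings;

sub new   { my ($class) = @_; return bless +{ data => [] }, $class }
sub count { my ($self) = @_; return scalar @{ $self->{data} } }   # one-based

sub param {
    my ($self, $idx, $field, $value) = @_;
    return $#{ $self->{data} } unless defined $idx;   # index of last record
    if (ref($field) eq 'HASH') {                      # replace the whole record
        $self->{data}[$idx] = $field;
        return $field;
    }
    $self->{data}[$idx]{$field} = $value if @_ > 3;   # set only when a value is passed
    my $rec = $self->{data}[$idx];
    return $rec ? $rec->{$field} : undef;
}

package main;
```

In this sketch an empty object reports -1 from param and 0 from count, so a loop over 0 .. param simply does nothing.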
Fetch the entire record indexed by RecNumber as a hash reference. This can then be used to access and update field values.
WARNING: Modifying data values in the returned hash reference will update the values in the corresponding data record.
Example:
$hashRef = $sdfObj->getRecEntry( $index );
$hashRef->{shell} = "/bin/ksh"; # updates the $sdfObj, too.
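The warning above follows from ordinary Perl reference semantics: a hash reference is an alias for the stored record, not a copy. This self-contained sketch uses a plain array of hashes in place of an object (an assumption made only for illustration) and shows how to take a shallow copy when the stored record should stay untouched.

```perl
use strict;
use warnings;

my @records = ( { uname => 'cobb', shell => '/bin/csh' } );   # one stored record

my $hashRef = $records[0];         # a reference, as getRecEntry is documented to return
$hashRef->{shell} = '/bin/ksh';    # the stored record changes too

my %copy = %{ $records[0] };       # shallow copy for independent edits
$copy{shell} = '/bin/sh';          # the stored record is unaffected
```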
Fetch or set control field parameters within an object. This can also be used to cache temporary data in the current PTools::SDF::SDF object. Just be sure to use a unique attribute name and remember that this data will not be saved with the file data. See the dump method for an example of displaying control fields and values.
Specify the name of the control field to access.
The CtrlValue is an optional parameter that is used to set the value of the specified CtrlField. Without this parameter, the current value of the field is returned.
Examples:
# Specify a new file name for the current PTools::SDF::SDF object
$fieldValue = $sdfObj->ctrl( "fileName", '/tmp/newDataFilename' );
# Fetch a colon-separated list of field names in the file.
# In list context, an array of field names is returned.
$fieldList = $sdfObj->ctrl( "dataFields" );
(@fieldList) = $sdfObj->ctrl( "dataFields" );
# Specify a new list of fieldnames for the current object
# WARN: this will *not* change any existing field names, and
# only existing fields that appear in this list will be written
# to the data file via the "save" method. This is provided as
# a way to create a subset of a file, to add new fields, and/or
# to re-arrange the field order when file is saved to disk.
$sdfObj->ctrl( "dataFields", "colon:separated:list:of:names" );
$sdfObj->ctrl( "dataFields", @fieldNameList );
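The effect of the dataFields list on save can be sketched as follows: only listed fields are written, in the listed order, and a listed field that no record has yet comes out empty. The function format_records is hypothetical, writes only the '#FieldNames' header (omitting any '#IFSChar' or user/date headers), and is not the module's save implementation.

```perl
use strict;
use warnings;

# Hypothetical sketch: format records for output using a dataFields list.
sub format_records {
    my ($ifs, $dataFields, @records) = @_;
    my @out = ('#FieldNames ' . join(':', @$dataFields));   # header names use ':'
    foreach my $rec (@records) {
        push @out, join $ifs,
            map { defined $rec->{$_} ? $rec->{$_} : '' } @$dataFields;
    }
    return @out;
}
```

Here a record with uname, uid and shell, written with the list (shell, uname, dept), drops uid, reorders the remaining fields, and adds an empty dept column.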
Delete the value for a named data field within a record.
Example:
# Delete the value for the 'proddesc' field in record 24
$sdfObj->fieldDelete( 24, 'proddesc');
Delete the value for a named control field in the current object.
Example:
# Loading a subset of a data file sets an attribute to disable the
# "save" method. This removes the attribute and re-enables "save":
$sdfObj->ctrlDelete('readOnly');
Delete one or more entire records from the current PTools::SDF::SDF object. The deleted records are returned as a list of hash references.
Record number at which to start deleting.
Number of record entries to delete. Defaults to 1.
Examples:
$hashRef = $sdfObj->delete( 5 );
(@hashRefs) = $sdfObj->delete( 10, 30 );
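If records are held in an array of hash references (an assumption used only for this illustration), deleting a range maps directly onto Perl's splice, which removes the elements and returns them. The helper delete_recs is hypothetical and applies the documented default count of 1.

```perl
use strict;
use warnings;

# Hypothetical sketch: remove $count records starting at $start and return them.
sub delete_recs {
    my ($data, $start, $count) = @_;
    $count = 1 unless defined $count;     # documented default
    return splice @$data, $start, $count;
}

my @data = map { +{ id => $_ } } 0 .. 9;  # ten records, ids 0 through 9
my @gone = delete_recs(\@data, 2, 3);     # removes ids 2, 3 and 4
```

In scalar context splice returns the last element removed, which is one way a single-record call could yield a lone hash reference.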
Write the data in the current object out to a disk file.
Note: Only those fields that have an entry in the dataFields control parameter will be written to the disk file. See the ctrl method, above, for details on using this attribute.
In addition, the only control parameters that are saved with the file are the field names.
By default the PTools::SDF::SDF module adds a header that includes the uname of the person running the current script. Use this parameter to log a different user name.
Example:
$webUserid = $ENV{'REMOTE_USER'}; # (from Web Server Basic Auth)
$sdfObj->save( $webUserid, $filename );
By default the PTools::SDF::SDF module saves the file using the original name specified when creating the current object. This can be changed by passing a new file name here.
By default the PTools::SDF::SDF module adds a header that includes a date stamp (Unix 'epoch' number) of when the file was saved. To write a different header, pass the text here.
If a 'subset' of the original data file was loaded using 'match' criteria, the save method is disabled by default. Pass any non-null parameter here to override this default and force a save.
WARN: This will cause any records omitted during the load to be lost.
Examples:
$sdfObj->save;
($stat,$err) = $sdfObj->save;
$sdfObj->save( undef, "newfilename" );
Another Example:
$sdfObj->ctrl('fileName', "newfilename" );
$sdfObj->save;
($stat,$err) = $sdfObj->status;
The options used for sorting depend entirely on which sort module is loaded at the time of the call. There are several sort modules available (listed in the See Also section, below).
Options used by the default PTools::SDF::Sort::Bubble module include the following. See description of the extend method, in the PTools::SDF::File class, on how to select other sort modules. See descriptions of the other sort modules for details of the parameters they expect.
The sort modules that accompany the SDF modules will ONLY work with 'PTools::SDF::SDF type' objects.
The Mode parameter can be any of the following. Remember that this list is for the default sorter only. Other sort modules may not allow a mode parameter.
Reverse the sort order. Note that when reversing the sort order, the KeyFields should still be listed in order of precedence (primary, secondary, tertiary, etc.).
Ignore upper/lower case when sorting.
Both of the above.
For the default sort module, this parameter accepts a list of field names starting with the primary sort key. Other sort modules included with the PTools::SDF::SDF module will only accept a single sort key.
There must be at least two records in a PTools::SDF::SDF object for a sort to be effective.
Example:
$sdfObj->isSortable and $sdfObj->sort( $mode, @keyFields );
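The multi-key behavior of the default sorter can be approximated with a single comparison routine that tries each key in order of precedence. This is a sketch using Perl's built-in sort, not the bubble sort in PTools::SDF::Sort::Bubble; the $reverse and $nocase flags are hypothetical stand-ins for the Mode values, and records are assumed to be hash references.

```perl
use strict;
use warnings;

# Hypothetical sketch: sort hash records on several keys, primary key first.
sub sort_records {
    my ($records, $reverse, $nocase, @keys) = @_;
    my $cmp = sub {
        my ($x, $y) = @_;
        foreach my $k (@keys) {
            my ($p, $q) = map { defined $_ ? $_ : '' } ($x->{$k}, $y->{$k});
            ($p, $q) = (lc $p, lc $q) if $nocase;   # ignore upper/lower case
            my $c = $p cmp $q;
            return $c if $c;                        # first differing key decides
        }
        return 0;
    };
    my @sorted = sort { $cmp->($a, $b) } @$records;
    return $reverse ? reverse @sorted : @sorted;
}
```

Reversing the fully sorted list, rather than reversing the key list, is what keeps the keys in order of precedence even in reverse mode.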
These methods exist for convenience to query the state of the object.
Examples:
# The following two examples are equivalent
$sdfObj->hasData and do { ... }
$sdfObj->param and do { ... }
# The following two examples are equivalent
$sdfObj->noData and do { ... }
$sdfObj->param or do { ... }
Determine whether an error occurred during the last call to a method on this object. The stat method returns different values depending on the calling context.
($stat,$err) = $sdfObj->status;
$stat  = $sdfObj->stat;    # scalar context returns status number
($err) = $sdfObj->stat;    # list context returns error message
$err   = $sdfObj->err;
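Context-dependent return values like those of stat are normally built with Perl's wantarray. The class below, StatusDemo, and its setErr method are hypothetical; they only show how a single method can return a status number in scalar context and an error message in list context.

```perl
package StatusDemo;    # hypothetical, for illustration only
use strict;
use warnings;

sub new    { my ($class) = @_; return bless +{ stat => 0, err => '' }, $class }
sub setErr { my ($self, $stat, $err) = @_; @{$self}{qw(stat err)} = ($stat, $err); return }
sub status { my ($self) = @_; return ($self->{stat}, $self->{err}) }
sub err    { my ($self) = @_; return $self->{err} }

# wantarray is true in list context and false in scalar context.
sub stat   { my ($self) = @_; return wantarray ? ($self->{err}) : $self->{stat} }

package main;
```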
Display the contents of the current PTools::SDF::SDF object. This is useful during testing and debugging, but does not produce a 'pretty' format. For large data files, limit the output by passing the optional parameters.
Examples:
print $sdfObj->dump; # can produce a *lot* of output
print $sdfObj->dump( 0, -1 );   # dump only the "control field" values
print $sdfObj->dump( 10, 5 );   # dump recs 10 through 15.

Warning: When specifying a single white space IFS character, make sure there is exactly one delimiter character between adjacent fields within a record.

This PTools::SDF::SDF class inherits from the PTools::SDF::File abstract base class. Additional methods are available via this parent class.
The following SDF classes inherit from this class, either directly or indirectly: PTools::SDF::ARRAY, PTools::SDF::DIR, PTools::SDF::DSET and PTools::SDF::IDX. These are contained in the 'PTools-SDF-DB' distribution available on CPAN.

See PTools::SDF::Overview, PTools::SDF::ARRAY, PTools::SDF::CSV, PTools::SDF::DB, PTools::SDF::DIR, PTools::SDF::DSET, PTools::SDF::File, PTools::SDF::IDX, PTools::SDF::INI, PTools::SDF::TAG, PTools::SDF::Lock::Advisory, PTools::SDF::Sort::Bubble, PTools::SDF::Sort::Quick, PTools::SDF::Sort::Random and PTools::SDF::Sort::Shell.
In addition, several implementation examples are available. See PTools::SDF::File::AutoHome, PTools::SDF::File::Mnttab and PTools::SDF::File::Passwd. These are contained in the 'PTools-File-Cmd' distribution available on CPAN.

Chris Cobb, <nospamplease@ccobb.net>

Copyright (c) 1997-2007 by Chris Cobb. All rights reserved. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.