File::Tabular - searching and editing flat tabular files
  use File::Tabular;
  my $f = new File::Tabular($filename);

  my $row = $f->fetchrow;
  print $row->{field1}, $row->{field2};

  $row = $f->fetchrow(where => 'someWord');
  $row = $f->fetchrow(where => 'field1 > 4 AND field2 >= "01.01.2001"');
  $row = $f->fetchrow(where => qr/some\s+(complex\s*)?(regex|regular expression)/i);

  $f->rewind;
  my $rows = $f->fetchall(where => 'someField =~ ^[abc]+');
  print $_->{someField} foreach @$rows;

  $f->rewind;
  $rows = $f->fetchall(where   => '+field1:someWord -field2:otherWord',
                       orderBy => 'field3, field6:num, field5:-alpha');

  $f->rewind;
  my $hashRows = $f->fetchall(where => 'foo AND NOT bar',
                              key   => 'someField');
  print $hashRows->{someKey}{someOtherField};

  # open for updates, and remember the updates in a journal file
  $f = new File::Tabular("+<$filename", {journal => ">>$journalFile"});

  # updates at specific positions (line numbers)
  $f->splices(4  => 2, undef,                 # delete 2 lines from position 4
              7  => 1, {f1 => $v1, f2 => $v2, ...}, # replace line 7
              9  => 0, { ... },               # insert 1 new line at position 9
              22 => 0, [{...}, {...}, ...],   # insert several lines at pos. 22
              ...
              -1 => 0, [{...}, {...}, ...]    # append at the end
             );

  # shorthand to add new data at the end
  $f->append({f1 => $v1, f2 => $v2, ...});

  # same thing, but use the "Hash::Type" associated to the file
  $f->append($f->ht->new($v1, $v2, ...));

  $f->clear; # removes all data (but keeps the header line)

  # updates at specific keys, corresponding to @keyFields
  $f->writeKeys({key1 => {f1 => $v1, f2 => $v2, ...}, # add or update
                 key2 => undef,                       # remove
                 ...
                }, @keyFields);

  # replay the updates on a backup file
  my $bck = new File::Tabular("+<$backupFile");
  $bck->playJournal($journalFile);

  # get info from associated filehandle
  printf "%d size, %d blocks", $f->stat->{size}, $f->stat->{blocks};
  my $mtime = $f->mtime;
  printf "time last modified : %02d:%02d:%02d", @{$mtime}{qw(hour min sec)};
A tabular file is a flat text file containing data organised in rows (records) and columns (fields).
This module provides database-like functionality for managing tabular files: retrieving, searching, writing, autonumbering, journaling. However, unlike modules such as DBD::CSV, it does not try to make a tabular file look like a database: rather, the API was designed specifically for working with tabular files. Instead of SQL, search queries are specified in a web-like fashion, with support for regular expressions and cross-field searches. Queries are compiled internally into Perl closures before being applied to the data records, which makes filtering quite fast.
Write operations take a list of modifications as argument; then they apply the whole list atomically in a single rewrite of the data file.
Here are some of the reasons why you might choose to work with a tabular file rather than a regular database:
no need to install a database system (not even buy one)!
easy portability and data exchange with external tools (text editor, spreadsheet, etc.)
search queries immediately ready for a web application
good search performance, even with several thousand records
On the other hand, tabular files will probably be inappropriate if you need very large volumes of data, complex multi-table data models or frequent write operations.
new (open1, open2, ..., {opt1 => v1, opt2 => v2, ...})
Creates a new tabular file object. The list of arguments open1, open2, ... is fed directly to "open" in perlfunc for opening the associated file. The first argument can also be a reference to an already opened filehandle.
The final hash ref is a collection of optional parameters, taken from the following list :
field separator: any character except '%' ('|' by default). Escape sequences like \t are admitted.
record separator ('\n' by default).
string to substitute if fieldSep is met in the data (by default, the URL encoding of fieldSep, i.e. '%7C')
string to substitute if recordSep is met in the data (by default, the URL encoding of recordSep, i.e. '%0A')
name of the field for which autonumbering is turned on (none by default). This is useful to generate keys: when you write a record, the character '#' in that field will be replaced by a fresh number, incremented automatically. This number will be 1 + the largest number read so far (it is your responsibility to read all records before the first write operation).
initial value of the counter for autonumbering (1 by default).
character that will be substituted by an autonumber when writing records ('#' by default).
mode for locking the file, see "flock" in perlfunc. By default, this will be LOCK_EX if open1 contains '>' or '+<', LOCK_SH otherwise.
Number of attempts to lock the file, at 1 second intervals, before returning an error. Zero by default. If nonzero, LOCK_NB is added to flockMode; if zero, a single locking attempt will be made, blocking until the lock is available.
reference to an array of field names. If not present, headers will be read from the first line of the file.
if true, the headers will be printed to the file. If not specified, treated as 'true' if open1 contains '>'.
name of the journaling file, or reference to a list of arguments for "open" in perlfunc. The journaling file will log all write operations. If specified as a simple file name, it will be opened in '>>' mode.
A journal file can then be replayed through method "playJournal" (this is useful to recover after a crash, by playing the journal on a backup copy of your data).
Regular expression for matching a date. Default value is qr/^\d\d?\.\d\d?\.\d\d\d?\d?$/. This will be used by "compileFilter" to perform appropriate comparisons.
Ref to a function for transforming dates into strings suitable for sorting (i.e. year-month-day). Default is:

  sub {
    my ($d, $m, $y) = ($_[0] =~ /(\d\d?)\.(\d\d?)\.(\d\d\d?\d?)$/);
    $y += ($y > 50) ? 1900 : 2000 if defined($y) && $y < 100;
    return sprintf "%04d%02d%02d", $y, $m, $d;
  }
Regular expression for matching a number. Default value is qr/^[-+]?\d+(?:\.\d*)?$/. This will be used by "compileFilter" to perform appropriate comparisons.
Strings to insert before and after a match when filtering rows. This only applies to the search operator ':' on the whole line: a query like "foo OR bar" will highlight both "foo" and "bar", but "~ 'foo' OR someField:bar" will not highlight anything; furthermore, a match-all request containing just '*' will not highlight anything either.
If true, searches will not match on the first field. So a request like $ft->fetchall(where => '123 OR 456') will not find the record with key 123, unless the word '123' appears somewhere in the other fields. This is useful when queries come from a Web application and users should not be able to match a purely technical key field.
This search behaviour will not apply to regex searches. So requests like $ft->fetchall(where => qr/\b(123|456)\b/) or $ft->fetchall(where => ' ~ 123 OR ~ 456') will actually find the record with key 123.
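As a sketch of the constructor in action (the file names people.txt and people.journal are made up for this illustration):

```perl
use strict;
use warnings;
use File::Tabular;

# create a small sample file: headers on line 1, default '|' separator
open my $fh, '>', 'people.txt' or die $!;
print $fh "Id|Name|Age\n1|Alice|30\n2|Bob|25\n";
close $fh;

# read-only: LOCK_SH by default, headers read from the first line
my $f = File::Tabular->new('people.txt');

# read-write ('+<' implies LOCK_EX), with all writes logged to a journal
my $upd = File::Tabular->new("+<people.txt",
                             {journal => '>>people.journal'});
```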
fetchrow(where => filter)
returns the next record matching the (optional) filter. If there is no filter, just returns the next record.
The filter is either a code reference generated by "compileFilter", or a string which will be automatically fed as argument to "compileFilter"; this string can contain just a word, a regular expression, a complex boolean query involving field names and operators, etc., as explained below.
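For instance, a minimal sketch with a made-up people.txt file; the string 'Age > 28' is recognized as a numeric comparison because both sides match rxNum:

```perl
use strict;
use warnings;
use File::Tabular;

open my $fh, '>', 'people.txt' or die $!;
print $fh "Id|Name|Age\n1|Alice|30\n2|Bob|25\n3|Carol|35\n";
close $fh;

my $f = File::Tabular->new('people.txt');

# pass the query as a string: it is compiled through compileFilter ...
while (my $row = $f->fetchrow(where => 'Age > 28')) {
  print "$row->{Name}\n";                   # Alice, then Carol
}

# ... or precompile it once and reuse the code reference
$f->rewind;
my $filter = $f->compileFilter('Age > 28');
while (my $row = $f->fetchrow(where => $filter)) {
  print "$row->{Name}\n";
}
```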
fetchall(where => filter, orderBy => cmp)
fetchall(where => filter, key => keySpecif)
finds all remaining records matching the (optional) filter. If there is no filter, finds all remaining records.
The filter is either a code reference generated by "compileFilter", or a string which will be automatically fed as argument to "compileFilter".
The return value depends on context and on arguments :
if no key parameter is given, and we are in a scalar context, then fetchall returns a reference to an array of records.
The optional orderBy parameter can be a field name, a ref to a list of field names, a string like "field1: -alpha, field2:-num, ...", or, more generally, a user-provided comparison function; see "cmp" in Hash::Type for a fully detailed explanation.
Otherwise, the resulting array is in data source order.
if no key parameter is given, and we are in a list context, then fetchall returns a pair: the first item is a reference to an array of records, as explained above; the second item is a reference to an array of line numbers corresponding to those records (the first data line has number 0). These line numbers might be useful later if you update the records through the "splices" method. No orderBy is allowed if fetchall is called in list context.
if a key parameter is given, then fetchall returns a reference to a hash, whose values are the retrieved records, and whose keys are built according to the keySpecif argument. This must be either a single field name (scalar), or a list of field names (ref to an array of scalars). Values corresponding to those field names will form the key for each entry of the hash; if necessary, multiple values are joined together through $;. No orderBy argument is allowed, because hashes have no ordering.
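The three calling forms can be sketched as follows (hypothetical people.txt file):

```perl
use strict;
use warnings;
use File::Tabular;

open my $fh, '>', 'people.txt' or die $!;
print $fh "Id|Name|Age\n1|Alice|30\n2|Bob|25\n3|Carol|35\n";
close $fh;

my $f = File::Tabular->new('people.txt');

# scalar context, no key: array ref of records, here sorted numerically
my $rows = $f->fetchall(orderBy => 'Age:num');
print join(',', map {$_->{Name}} @$rows), "\n";   # Bob,Alice,Carol

# list context: records plus their line numbers (first data line is 0)
$f->rewind;
my ($recs, $lineNumbers) = $f->fetchall(where => 'Age > 28');

# key parameter: hash ref keyed by the 'Id' field
$f->rewind;
my $byId = $f->fetchall(key => 'Id');
print $byId->{2}{Name}, "\n";                     # Bob
```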
rewind
Rewinds the file to the first data line (after the headers)
ht
Returns the instance of Hash::Type associated with the file.
headers
returns the list of field names
stat
returns a hash ref corresponding to a call of stat on the associated filehandle. Keys of the hash have names as documented in stat. Ex:
printf "%d size, %d blocks", $f->stat->{size}, $f->stat->{blocks};
atime
mtime
ctime
each of these methods returns a hash ref corresponding to a call of localtime on the last access time, last modified time, or last inode change time of the associated filehandle (see stat for explanations). Keys of the hash have names as documented in localtime. Ex:
  my $mtime = $f->mtime;
  printf "time last modified : %02d:%02d:%02d", @{$mtime}{qw(hour min sec)};
splices
  splices(pos1 => 2, undef,            # delete 2 lines
          pos2 => 1, row,              # replace 1 line
          pos3 => 0, [row1, row2 ...], # insert lines
          ...
          -1   => 0, [row1, ... ]      # append lines
         );                            # special case: autonum if pos == -1
Updates the data, in a spirit similar to "splice" in perlfunc (hence the name of the method). The whole file is rewritten in an atomic operation, deleting, replacing or appending data lines as specified by the "splice instructions". Returns the number of "splice instructions" performed.
A splice instruction is a triple composed of :
a position (line number) that specifies the place where modifications will occur. Line numbers start at 0. Position -1 means end of data.
a number of lines to delete (might be zero).
a ref to a hash or to a list of hashes containing new data to insert (or undef if there is no new data).
If there are several splice instructions, their positions must be sorted in increasing order (except of course position -1, meaning "end of data", which must appear last).
Positions always refer to line numbers in the original file, before any modifications. Therefore, it makes no sense to write
splices(10 => 5, undef, 12 => 0, $myRow)
because after deleting 5 rows at line 10, we cannot insert a new row at line 12.
The whole collection of splice instructions may also be passed as an array ref instead of a list.
If you intend to fetch rows again after a splice, you must rewind the file first.
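A minimal sketch (made-up data file; remember that positions refer to the original data lines and must be listed in increasing order):

```perl
use strict;
use warnings;
use File::Tabular;

open my $fh, '>', 'people.txt' or die $!;
print $fh "Id|Name|Age\n1|Alice|30\n2|Bob|25\n3|Carol|35\n";
close $fh;

my $f = File::Tabular->new("+<people.txt");

$f->splices(0  => 1, undef,                                  # delete line 0 (Alice)
            2  => 1, {Id => 3, Name => 'Carla', Age => 36},  # replace line 2 (Carol)
            -1 => 0, [{Id => 4, Name => 'Dan',  Age => 40}]  # append at the end
           );

$f->rewind;                          # mandatory before fetching again
my $rows = $f->fetchall;
print join(',', map {$_->{Name}} @$rows), "\n";   # Bob,Carla,Dan
```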
append(row1, row2, ...)
This appends new records at the end of data, i.e. it is a shorthand for
splices(-1 => 0, [row1, row2, ...])
clear
removes all data (but keeps the header line)
writeKeys({key1 => row1, key2 => ...}, @keyFields)
Rewrites the whole file, applying modifications as specified in the hash ref passed as first argument. Keys in this hash are compared to keys built from the original data, according to @keyFields. Therefore, row1 may replace an existing row, if the key corresponding to key1 was found; otherwise, a new row is added. If row1 is undef, the corresponding row is deleted from the file.
@keyFields must contain the name of one or several fields that build up the primary key. For each data record, the values corresponding to those fields are taken and joined together through $;, and then compared to key1, key2, etc.
If you intend to fetch rows again after a writeKeys, you must rewind the file first.
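A sketch with 'Id' as the single key field (made-up data file):

```perl
use strict;
use warnings;
use File::Tabular;

open my $fh, '>', 'people.txt' or die $!;
print $fh "Id|Name|Age\n1|Alice|30\n2|Bob|25\n";
close $fh;

my $f = File::Tabular->new("+<people.txt");

$f->writeKeys({1 => {Id => 1, Name => 'Alice', Age => 31},  # update existing key 1
               2 => undef,                                  # delete key 2
               3 => {Id => 3, Name => 'Carol', Age => 35},  # unknown key: row added
              }, 'Id');

$f->rewind;                                                 # before fetching again
my $byId = $f->fetchall(key => 'Id');
print join(',', sort keys %$byId), "\n";   # 1,3
```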
playJournal(open1, open2, ...)
Reads a sequence of update instructions from a journal file and applies them to the current tabular file. Arguments open1, open2, ... will be passed to "open" in perlfunc for opening the journal file; in most cases, just give the filename.
The journal file must contain a sequence of instructions as encoded by the automatic journaling function of this module; to activate journaling, see the journal parameter of the "new" method.
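A recovery sketch (file names are made up; releasing the first object before the replay is an assumption made here so that the journal is flushed and closed):

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Tabular;

# make a data file and a backup copy of it
open my $fh, '>', 'people.txt' or die $!;
print $fh "Id|Name|Age\n1|Alice|30\n";
close $fh;
copy('people.txt', 'backup.txt') or die $!;

# journaled update on the main file
my $f = File::Tabular->new("+<people.txt", {journal => '>>people.journal'});
$f->append({Id => 2, Name => 'Bob', Age => 25});
undef $f;   # assumption: release the object to flush and close the journal

# later, e.g. after a crash: replay the journal on the backup copy
my $bck = File::Tabular->new("+<backup.txt");
$bck->playJournal('people.journal');
```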
compileFilter(query [, implicitPlus])
Compiles a query into a filter (code reference) that can be passed to "fetchrow" or "fetchall".
The query can be
a regular expression compiled through qr/.../. The regex will be applied to whole data lines, and therefore covers all fields at once. This is the fastest way to filter lines, because it avoids systematic splitting into data records.
a data structure resulting from a previous call to Search::QueryParser::parse
a string of shape K_E_Y : value (without any spaces before or after ':'). This will be compiled into a regex matching value in the first column. The special spelling is meant to avoid collision with a real field hypothetically named 'KEY'.
a string that will be analyzed through Search::QueryParser, and then compiled into a filter function. The query string can contain boolean combinators, parentheses, comparison operators, etc., as documented in Search::QueryParser. The optional second argument implicitPlus is passed to Search::QueryParser::parse; if true, an implicit '+' is added in front of every query item (therefore the whole query is a big AND).
Notice that in addition to usual comparison operators, you can also use regular expressions in queries like
+field1=~'^[abc]+' +field2!~'foobar$'
The query compiler needs to distinguish between word and non-word characters; therefore it is important to use locale in your scripts (see perllocale). The compiler tries to be clever about a number of details:
Words in queries become regular expressions enclosed by \b (word boundaries); so a query for foo OR bar will not match foobar.
A '*' in a word is compiled into the regular expression \w*; so the queries foo* or *bar will both match foobar.
Iso-latin-1 accented characters are translated into character classes, so for example hétaïre becomes qr/h[ée]ta[ïi]re/i. Furthermore, as shown in this example, the i flag is turned on (case-insensitive). Therefore this query will also match HETAIRE.
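The word-to-regex translation described above can be sketched in plain Perl; this is an illustration of the idea with a deliberately tiny folding table, not the module's actual code:

```perl
use strict;
use warnings;
use utf8;

# assumption: a simplified iso-latin-1 folding table (the real one is larger)
my %fold = ('é' => '[ée]', 'è' => '[èe]', 'ï' => '[ïi]', 'à' => '[àa]');

sub word_to_regex {
  my ($word) = @_;
  my $pat = join '', map { $fold{$_} // quotemeta($_) } split //, $word;
  return qr/\b$pat\b/i;        # word boundaries + case-insensitive
}

print 'HETAIRE' =~ word_to_regex('hétaïre') ? "match\n" : "no match\n";  # match
print 'foobar'  =~ word_to_regex('foo')     ? "match\n" : "no match\n";  # no match
```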
When compiling a subquery like fieldname >= 'value', the compiler checks the value against rxNum and rxDate (as specified in the "new" method). Depending on these tests, the subquery is translated into a string comparison, a numerical comparison, or a date comparison (more precisely, {date2str($a) cmp date2str($b)}).
Operator # means comparison with a set of integers; internally this is implemented with a bit vector. So query Id#2,3,5,7,11,13,17 will return records where field Id contains one of the listed integers. The field name may be omitted if it is the first field (usually the key field).
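Putting the pieces together, a compiled query can mix a regex comparison with the integer-set operator (a sketch with a made-up data file):

```perl
use strict;
use warnings;
use File::Tabular;

open my $fh, '>', 'people.txt' or die $!;
print $fh "Id|Name|Age\n1|Alice|30\n2|Bob|25\n3|Carol|35\n";
close $fh;

my $f = File::Tabular->new('people.txt');

# '+' makes both subqueries mandatory; 'Id#1,3,5' tests membership in a set
my $filter = $f->compileFilter(q{+Name=~'^[AC]' +Id#1,3,5});
my $rows   = $f->fetchall(where => $filter);
print join(',', map {$_->{Name}} @$rows), "\n";   # Alice,Carol
```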
Words matched by a query can be highlighted; see parameters preMatch and postMatch in the "new" method.
Laurent Dami, <laurent.dami AT etat ge ch>
Copyright (C) 2005 by Laurent Dami.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.