Genealogy::Gedcom::Date - Parse GEDCOM dates
    my($parser) = Genealogy::Gedcom::Date -> new;

or, in debug mode, which prints progress reports:

    my($parser) = Genealogy::Gedcom::Date -> new(debug => 1);

    # These samples are from t/value.t.

    for my $candidate (
    '(Unknown date)', # Use parse_interpreted_date().
    'Abt 1 Jan 2001', # Use parse_approximate_date().
    'Aft 1 Jan 2001', # Use parse_date_range().
    'From 0'          # Use parse_date_period().
    )
    {
        my($hashref) = $parser -> parse_date_value(date => $candidate);
    }
See the "FAQ"'s first Q and A for the definition of $hashref.
Genealogy::Gedcom::Date ships with t/date.t, t/escape.t and t/value.t. You are strongly encouraged to peruse them, and perhaps to set the debug option in each to see extra progress reports.
Genealogy::Gedcom::Date provides a parser for GEDCOM dates.
See the GEDCOM Specification Ged551-5.pdf.
Install Genealogy::Gedcom::Date as you would for any Perl module:
Run:
cpanm Genealogy::Gedcom::Date
or run:
sudo cpan Genealogy::Gedcom::Date
or unpack the distro, and then either:
    perl Build.PL
    ./Build
    ./Build test
    sudo ./Build install
or:
    perl Makefile.PL
    make (or dmake or nmake)
    make test
    make install
new() is called as my($parser) = Genealogy::Gedcom::Date -> new(k1 => v1, k2 => v2, ...).
It returns a new object of type Genealogy::Gedcom::Date.
Key-value pairs accepted in the parameter list (see corresponding methods for details [e.g. debug()]):
date => $a_string
The string to be parsed.
This string is always converted to lower case before being processed.
Default: ''.
This parameter is optional. It can be supplied to new() or to parse_approximate_date([%arg]), parse_date_period([%arg]) or parse_date_range([%arg]).
debug => 0 or 1
Turn debugging prints off or on.
Default: 0.
This parameter is optional.
The [] indicate an optional parameter.
Get or set the debug flag.
Returns an arrayref of 2 arrayrefs, the first being the month names in English and the second being the month abbreviations.
parse_approximate_date([%arg])
Here, the [] indicate an optional parameter.
Parse the candidate date and return a hashref.
The date is expected to be an approximate date as per p. 45 of the GEDCOM Specification Ged551-5.pdf.
Key => value pairs for %arg:
Specify the string to parse.
The candidate can be passed in to new as new(date => $a_string), or into this method as parse_approximate_date(date => $a_string).
The string in parse_approximate_date(date => $a_string) takes precedence over the one in new(date => $a_string).
Throw an exception if the string cannot be parsed.
Specify the case-insensitive words, in your language, which indicate an approximate date.
This lets you specify a candidate as 'Abt 1999', 'Cal 2000' or 'Est 1999', and have the code recognize 'Abt', 'Cal' and 'Est'.
This parameter is optional. If supplied, it must be a 3-element arrayref.
The elements of this arrayref are:
Default: 'Abt', for 'About'.
Default: 'Cal', for 'Calculated'.
Default: 'Est', for 'Estimated'.
You must use the abbreviated forms of those words.
Note: These arrayref elements are not the same as used by parse_date_period([%arg]) nor as used by parse_date_range([%arg]).
These strings are always converted to lower case before being processed.
This key is explained in the "FAQ".
The string in parse_approximate_date(style => $a_string) takes precedence over the one in new(style => $a_string).
Default: 'english'.
The return value is a hashref as described in the "FAQ"'s first Q and A.
Since a single date is provided, with 'Abt 1999', 'Cal 1999' or 'Est 2000 BC', the date is stored - in the returned hashref - under the 2 keys 'one' and 'one_date'. The other date in the hashref ('two', 'two_date') is an object of type DateTime::Infinite::Future.
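The prefix handling described for approximate dates can be sketched in plain Perl. This is only an illustration of the documented behavior (a 3-element arrayref of words, compared case-insensitively after lower-casing), not the module's own code. The name split_approximate_prefix() is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Genealogy::Gedcom::Date.
# Split a candidate into (prefix, remainder), given a 3-element
# arrayref of words such as ['Abt', 'Cal', 'Est'].

sub split_approximate_prefix
{
    my($candidate, $words) = @_;
    $words ||= ['Abt', 'Cal', 'Est'];

    # Lower-case everything first, as the documentation describes.

    my($lc_candidate) = lc $candidate;

    for my $word (map { lc($_) } @$words)
    {
        # The word must be followed by whitespace and then the date.

        if ($lc_candidate =~ /^\Q$word\E\s+(.+)$/)
        {
            return ($word, $1);
        }
    }

    die "Candidate '$candidate' does not start with an approximate-date word\n";
}

my($prefix, $rest) = split_approximate_prefix('Abt 1 Jan 2001');

print "$prefix / $rest\n"; # Prints: abt / 1 jan 2001
```

Passing your own arrayref as the second argument mimics supplying words in another language. An unrecognized candidate dies, which mirrors the "throw an exception" contract above.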
parse_date_period([%arg])
Parse the candidate period and return a hashref.
The date is expected to be a date period as per p. 46 of the GEDCOM Specification Ged551-5.pdf.
The candidate period can be passed in to new as new(date => $a_string), or into this method as parse_date_period(date => $a_string).
The string in parse_date_period(date => $a_string) takes precedence over the one in new(date => $a_string).
Specify the case-insensitive words, in your language, which indicate a date period.
This lets you specify a period as 'From 1999', 'To 2000' or 'From 1999 to 2000', and have the code recognize 'From' and 'To'.
This parameter is optional. If supplied, it must be a 2-element arrayref.
The 'From' and 'To' strings can be passed in to new as new(from_to => $arrayref), or into this method as parse_date_period(from_to => $arrayref).
Default: 'From'.
Default: 'To'.
Note: These arrayref elements are not the same as used by parse_approximate_date([%arg]) nor as used by parse_date_range([%arg]).
The string in parse_date_period(style => $a_string) takes precedence over the one in new(style => $a_string).
The return value is a hashref as described in the "FAQ"'s first Q and A.
parse_date_range([%arg])
Parse the candidate range and return a hashref.
The date is expected to be a date range as per p. 47 of the GEDCOM Specification Ged551-5.pdf.
The candidate range can be passed in to new as new(date => $a_string), or into this method as parse_date_range(date => $a_string).
The string in parse_date_range(date => $a_string) takes precedence over the one in new(date => $a_string).
Specify the case-insensitive words, in your language, which indicate a date range.
This lets you specify a range as 'Bef 1999', 'Aft 2000' or 'Bet 1999 and 2000', and have the code recognize 'Bef', 'Aft', 'Bet' and 'And'.
Default: ['Aft', 'Bef', 'Bet'], which stand for 'After', 'Before' and 'Between'.
Default: 'And'.
Note: These arrayref elements are not the same as used by parse_approximate_date([%arg]) nor as used by parse_date_period([%arg]).
The string in parse_date_range(style => $a_string) takes precedence over the one in new(style => $a_string).
When a single date is provided, with 'Aft 1999' or 'Bef 2000 BC', the date is stored - in the returned hashref - under the 2 keys 'one' and 'one_date'. The other date in the hashref ('two', 'two_date') is an object of type DateTime::Infinite::Future.
Parse the candidate date using a series of methods, until one succeeds or we run out of methods.
See the definition of date_value on p. 47 of the GEDCOM Specification Ged551-5.pdf.
The methods are, in this order:
In the hash %arg, only the 'date' key is passed to the named method. In each case, the algorithm must use the default for the other key, since the name and format of that other key depends on the method.
See t/value.t for details.
Throw an exception if the date cannot be parsed.
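The "try each method until one succeeds" idea can be sketched as follows. The two parsers here are hypothetical stand-ins, and their order is purely illustrative. The real method order and return values are those of the module itself.

```perl
use strict;
use warnings;

# Hypothetical driver: call each parser in turn, returning the first
# hashref produced. A parser signals failure by dying.

sub parse_with_fallback
{
    my($candidate, @attempt) = @_;

    for my $attempt (@attempt)
    {
        my($name, $code) = @$attempt;

        # eval traps the exception, leaving $result undefined.

        my($result) = eval { $code -> ($candidate) };

        if ($result)
        {
            $$result{method} = $name;

            return $result;
        }
    }

    die "Unable to parse '$candidate'\n";
}

# Stand-ins for real parsers. These are assumptions for the demo only.

my(@attempt) =
(
    [period => sub { $_[0] =~ /^from\s+(.+)$/i ? {one => $1} : die "Not a period\n" }],
    [range  => sub { $_[0] =~ /^aft\s+(.+)$/i  ? {one => $1} : die "Not a range\n" }],
);

my($hashref) = parse_with_fallback('Aft 1 Jan 2001', @attempt);

print "$$hashref{method}: $$hashref{one}\n"; # Prints: range: 1 Jan 2001
```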
Parse the string and return a hashref as described in the "FAQ"'s first Q and A.
The candidate can be passed in to new as new(date => $a_string), or into this method as parse_datetime($a_string) or parse_datetime(date => $a_string).
The string in parse_datetime($a_string) takes precedence over the one in new(date => $a_string).
The date is expected to be an exact date as per p. 45 of the GEDCOM Specification Ged551-5.pdf.
The date string is mandatory.
Throw an exception if the date string cannot be parsed.
Further, the 'style' key can be passed in as parse_datetime(date => $a_string, style => 'standard').
The string in parse_datetime(style => $a_string) takes precedence over the one in new(style => $a_string).
The date is expected to be an interpreted date as per the definition of date_value on p. 47 of the GEDCOM Specification Ged551-5.pdf.
The candidate can be passed in to new as new(date => $a_string), or into this method as parse_interpreted_date(date => $a_string).
The string in parse_interpreted_date(date => $a_string) takes precedence over the one in new(date => $a_string).
Specify a case-insensitive word, in your language, which indicates an interpreted date.
This lets you specify a candidate as 'Int 1999', 'Int 2000 (more or less)' or '(Date not known)', and have the code recognize 'Int'.
This parameter is optional. If supplied, it must be a string meaning 'Int'.
Default: 'Int'.
The string in parse_interpreted_date(style => $a_string) takes precedence over the one in new(style => $a_string).
Since a single date is provided, with 'Int 1999' or 'Int 1999 (more or less)', the date is stored - in the returned hashref - under the 2 keys 'one' and 'one_date'. The other date in the hashref ('two', 'two_date') is an object of type DateTime::Infinite::Future.
Also in the returned hashref, the key 'phrase' will have the value of the text between '(' and ')', if any.
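As a rough illustration of how the text between '(' and ')' might be separated from the rest of an interpreted date, here is a small sketch. The function extract_phrase() is hypothetical and is not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: return (remaining text, phrase), where phrase
# is the text between the first '(' and ')' or '' if there is none.

sub extract_phrase
{
    my($candidate) = @_;
    my($phrase)    = '';

    # Capture, then remove, the first parenthesised chunk.

    if ($candidate =~ s/\s*\(([^)]*)\)\s*/ /)
    {
        $phrase = $1;
    }

    # Trim leading and trailing whitespace.

    $candidate =~ s/^\s+|\s+$//g;

    return ($candidate, $phrase);
}

my($date, $phrase) = extract_phrase('Int 1999 (more or less)');

print "$date / $phrase\n"; # Prints: Int 1999 / more or less
```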
process_date_escape(@field)
Parse the fields of the date, already split on ' ', '-' and '/', and return the fields as an array.
In the process, convert month full names and abbreviations to Gregorian abbreviations, to make parsing easier.
Supported calendars:
Notes:
It has these key => value pairs:
Returns the first (or only) date as a string, after 'Abt', 'Bef', 'From' or whatever.
This is for cases like '1999' in 'abt 1999', '1999' in 'bef 1999', '1999' in 'from 1999', and for '1999' in 'from 1999 to 2000'.
A missing month defaults to 01. A missing day defaults to 01.
'500BC' will be returned as '0500-01-01', with the 'one_bc' flag set. See also the key 'one_date'.
Default: DateTime::Infinite::Past -> new, which stringifies to '-inf'.
Note: On some systems (MS Windows), DateTime::Infinite::Past -> new stringifies to '-1.#INF', but, as of V 1.02, the code changes this to '-inf'. Likewise, on some systems (Solaris), DateTime::Infinite::Past -> new stringifies to '-Infinity', but, as of V 1.07, the code changes this to '-inf'.
The default value does not set the one_ambiguous and one_bc flags.
Returns 1 if the first (or only) date is ambiguous. Possibilities:
This is checked for by testing whether or not the day is <= 12, since in that case it could be a month.
Obviously, the 'one_ambiguous' flag can be set for a date specified in a non-ambiguous way, e.g. 'From 1 Jan 2000', since the numeric value of the month is 1 and the day is also 1.
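The ambiguity test described above amounts to a single comparison. A minimal sketch, with the hypothetical name is_ambiguous_day(), might look like this:

```perl
use strict;
use warnings;

# Hypothetical helper: a day of 12 or less could also be read as a
# month number, so it is flagged as ambiguous.

sub is_ambiguous_day
{
    my($day) = @_;

    return (defined($day) && $day <= 12) ? 1 : 0;
}

# The day 1 in 'From 1 Jan 2000' is flagged even though the month is
# spelled out, which matches the caveat in the text.

print is_ambiguous_day(1), ' ', is_ambiguous_day(25), "\n"; # Prints: 1 0
```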
Returns 1 if the first date is followed by one of (case-insensitive): 'B.C.', 'BC.' or 'BC'. 'BC' may be written as 'BCE', with or without full-stops.
In the input, this suffix can be separated from the year by spaces, so both '500BC' and '500 B.C.' are accepted.
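One way to recognize these suffixes is a single case-insensitive regular expression. The sketch below is an assumption about how such matching could be done, not the module's own pattern. The name split_bc() is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return (year, bc_flag). The pattern accepts
# 'BC' or 'BCE', with or without full-stops, optionally separated
# from the year by spaces.

sub split_bc
{
    my($field) = @_;

    if ($field =~ /^(\d+)\s*b\.?c\.?(?:e\.?)?$/i)
    {
        return ($1, 1);
    }

    return ($field, 0);
}

for my $candidate ('500BC', '500 B.C.', '500 bce', '1999')
{
    my($year, $bc) = split_bc($candidate);

    print "$candidate => year $year, bc $bc\n";
}
```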
This object is of type DateTime.
Warning: Since these objects only accept 4-digit years, any year 0 .. 999 will have 1000 added to it. Of course, the value for the 'one' key will not have 1000 added to it.
This means that if the value of the 'one' key does not match the stringified value of the 'one_date' key (assuming the latter is not '-inf'), then the year is < 1000.
Alternately, if the stringified value of the 'one_date' key is '-inf', the period supplied did not have a 'From' date.
Returns 1 if the input date had no value for the first date's day. The code sets the default day to 1.
Returns 1 if the input date had no value for the first date's month. The code sets the default month to 1.
This holds the text, if any, between '(' and ')' in an interpreted date.
Possible values for the prefix:
Returns the second (or only) date as a string, after 'and' in 'bet 1999 and 2000', or 'to' in 'from 1999 to 2000', or '2000' in 'to 2000'.
'500BC' will be returned as '0500-01-01', with the 'two_bc' flag set. See also the key 'two_date'.
Default: DateTime::Infinite::Future -> new, which stringifies to 'inf'.
Note: On some systems (MS Windows), DateTime::Infinite::Future -> new stringifies to '1.#INF', but, as of V 1.03, the code changes this to 'inf'. Likewise, on some systems (Solaris), DateTime::Infinite::Future -> new stringifies to 'Infinity', but, as of V 1.07, the code changes this to 'inf'.
The default value does not set the two_ambiguous and two_bc flags.
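The platform-specific spellings of infinity mentioned in the notes can be mapped to 'inf' and '-inf' with one substitution. This is a sketch of that idea only. The name normalize_infinity() is hypothetical, and the module's own conversion may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: map '1.#INF' and 'Infinity', with or without a
# leading minus sign, to 'inf' or '-inf'. Other strings pass through.

sub normalize_infinity
{
    my($string) = @_;

    $string =~ s/^(-?)(?:1\.#INF|Infinity)$/${1}inf/i;

    return $string;
}

print join(', ', map { normalize_infinity($_) } '-1.#INF', 'Infinity', '2011-06-21'), "\n";
# Prints: -inf, inf, 2011-06-21
```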
Returns 1 if the second (or only) date is ambiguous. Possibilities:
Obviously, the 'two_ambiguous' flag can be set for a date specified in a non-ambiguous way, e.g. 'To 1 Jan 2000', since the numeric value of the month is 1 and the day is also 1.
Returns 1 if the second date is followed by one of (case-insensitive): 'B.C.', 'BC.' or 'BC'. 'BC' may be written as 'BCE', with or without full-stops.
Warning: Since these objects only accept 4-digit years, any year 0 .. 999 will have 1000 added to it. Of course, the value for the 'two' key will not have 1000 added to it.
This means that if the value of the 'two' key does not match the stringified value of the 'two_date' key (assuming the latter is not 'inf'), then the year is < 1000.
Alternately, if the stringified value of the 'two_date' key is 'inf', the period supplied did not have a 'To' date.
Returns 1 if the input date had no value for the second date's day. The code sets the default day to 1.
Returns 1 if the input date had no value for the second date's month. The code sets the default month to 1.
So far (as reported by CPAN Testers):
Possible values:
Expect dates in 'month day year' format, as in 'From Jan 2 2011 BC to Mar 4 2011'.
Expect dates in 'day month year' format, as in 'From 1 Jan 2001 to 25 Dec 2002'.
This is the default.
Expect dates in 'year month day' format, as in '2011-01-02 to 2011-03-04'.
The string in parse_*(style => $a_string) takes precedence over the one in new(style => $a_string).
Use the hashref keys 'one' and 'two' to get dates in the form 2011-06-21. Re-format as necessary.
Such a hashref is returned from all parse_*() methods.
No, not yet. See "process_date_escape(@field)" for more details.
Firstly, all commas are deleted from incoming dates.
Then, dates are split on ' ', '-' and '/', and the resultant fields are analyzed one at a time.
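The two preprocessing steps above can be sketched like this. Treating a run of separators as a single separator is an assumption made for the sketch, and split_date() is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: delete commas, then split on spaces, hyphens
# and slashes, discarding any empty fields.

sub split_date
{
    my($date) = @_;

    $date =~ tr/,//d;

    return grep { length } split(m|[-\s/]+|, $date);
}

print join('|', split_date('From Jan 2, 2011 to Mar 4, 2011') ), "\n";
# Prints: From|Jan|2|2011|to|Mar|4|2011
```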
The 'style' key can be used to force the code to assume a certain type of date format. This option is explained above, in this FAQ.
A missing month is set to 1 and a missing day is set to 1.
Further, in the hashref returned by the parse_*() methods, the flags one_default_month, one_default_day, two_default_month and two_default_day are set to 1, as appropriate, so you can tell that the code supplied the value.
Note: These flags take a Boolean value; it is only by coincidence that they can take the value of the default month or day.
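To illustrate how defaults and their flags fit together, here is a small sketch. The function fill_defaults() and its unprefixed key names are hypothetical. The module's real keys are one_default_month, one_default_day and so on.

```perl
use strict;
use warnings;

# Hypothetical helper: supply month 1 and day 1 when they are
# missing, and record that fact in Boolean flags.

sub fill_defaults
{
    my($year, $month, $day) = @_;
    my(%result) = (default_month => 0, default_day => 0);

    if (! defined $month)
    {
        $month                 = 1;
        $result{default_month} = 1;
    }

    if (! defined $day)
    {
        $day                 = 1;
        $result{default_day} = 1;
    }

    $result{date} = sprintf('%04d-%02d-%02d', $year, $month, $day);

    return \%result;
}

my($result) = fill_defaults(500);

print "$$result{date} $$result{default_month} $$result{default_day}\n";
# Prints: 0500-01-01 1 1
```

The flags stay Boolean even if the defaults change, which is the point of the note above.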
Because such objects have the sophistication required to handle such a complex topic.
See DateTime and http://datetime.perl.org/wiki/datetime/dashboard for details.
Then the returned hashref will have:
Clearly then, the code does not reorder the dates.
The DateTime suite of modules isn't designed, IMHO, for GEDCOM-like applications. It was a mistake to use that name in the first place.
By releasing under the Genealogy::Gedcom::* namespace, I can be much more targeted in the data types I choose as method return values.
My policy is to use the lightweight Hash::FieldHash for stand-alone modules and Moose for applications.
Sample code to overload '<' and '>' is in Gedcom::Date.
See p. 65 of the GEDCOM Specification Ged551-5.pdf.
Genealogy::Gedcom.
Gedcom::Date.
See "References" in Genealogy::Gedcom::Reader::Lexer.
The file CHANGES was converted into Changelog.ini by Module::Metadata::Changes.
Version numbers < 1.00 represent development versions. From 1.00 up, they are production versions.
Email the author, or log a bug on RT:
https://rt.cpan.org/Public/Dist/Display.html?Name=Genealogy::Gedcom::Date.
Thanx to Eugene van der Pijll, the author of the Gedcom::Date::* modules.
Thanx also to the authors of the DateTime::* family of modules. See http://datetime.perl.org/wiki/datetime/dashboard for details.
Genealogy::Gedcom::Date was written by Ron Savage <ron@savage.net.au> in 2011.
Home page: http://savage.net.au/index.html.
Australian copyright (c) 2011, Ron Savage.
All Programs of mine are 'OSI Certified Open Source Software'; you can redistribute them and/or modify them under the terms of The Artistic License, a copy of which is available at: http://www.opensource.org/licenses/index.html