NAME

Genealogy::Gedcom::Date - Parse GEDCOM dates

Synopsis

        my($parser) = Genealogy::Gedcom::Date -> new;

        # Or, in debug mode, which prints progress reports:

        my($parser) = Genealogy::Gedcom::Date -> new(debug => 1);

        # These samples are from t/value.t.

        for my $candidate (
        '(Unknown date)', # Use parse_interpreted_date().
        'Abt 1 Jan 2001', # Use parse_approximate_date().
        'Aft 1 Jan 2001', # Use parse_date_range().
        'From 0'          # Use parse_date_period().
        )
        {
                my($hashref) = $parser -> parse_date_value(date => $candidate);
        }

See the "FAQ"'s first Q and A for the definition of $hashref.

Genealogy::Gedcom::Date ships with t/date.t, t/escape.t and t/value.t. You are strongly encouraged to peruse them, and perhaps to set the debug option in each to see extra progress reports.

Description

Genealogy::Gedcom::Date provides a parser for GEDCOM dates.

See the GEDCOM Specification Ged551-5.pdf.

Installation

Install Genealogy::Gedcom::Date as you would for any Perl module:

Run:

        cpanm Genealogy::Gedcom::Date

or run:

        sudo cpan Genealogy::Gedcom::Date

or unpack the distro, and then either:

        perl Build.PL
        ./Build
        ./Build test
        sudo ./Build install

or:

        perl Makefile.PL
        make (or dmake or nmake)
        make test
        make install

Constructor and Initialization

new() is called as my($parser) = Genealogy::Gedcom::Date -> new(k1 => v1, k2 => v2, ...).

It returns a new object of type Genealogy::Gedcom::Date.

Key-value pairs accepted in the parameter list (see corresponding methods for details [e.g. debug()]):

o date => $a_string

The string to be parsed.

This string is always converted to lower case before being processed.

Default: ''.

This parameter is optional. It can be supplied to new() or to parse_approximate_date([%arg]), parse_date_period([%arg]) or parse_date_range([%arg]).

o debug => $Boolean

Turn debugging prints off or on.

Default: 0.

This parameter is optional.

Methods

debug([$Boolean])

The [] indicate an optional parameter.

Get or set the debug flag.

month_names_in_gregorian()

Returns an arrayref of 2 arrayrefs, the first being the month names in English and the second being the month abbreviations.

parse_approximate_date([%arg])

Here, the [] indicate an optional parameter.

Parse the candidate date and return a hashref.

The date is expected to be an approximate date as per p. 45 of the GEDCOM Specification Ged551-5.pdf.

Key => value pairs for %arg:

o date => $a_string

Specify the string to parse.

This parameter is optional.

The candidate can be passed in to new as new(date => $a_string), or into this method as parse_approximate_date(date => $a_string).

The string in parse_approximate_date(date => $a_string) takes precedence over the one in new(date => $a_string).

This string is always converted to lower case before being processed.

Throw an exception if the string cannot be parsed.

o prefix => $arrayref

Specify the case-insensitive words, in your language, which indicate an approximate date.

This lets you specify a candidate as 'Abt 1999', 'Cal 2000' or 'Est 1999', and have the code recognize 'Abt', 'Cal' and 'Est'.

This parameter is optional. If supplied, it must be a 3-element arrayref.

The elements of this arrayref are:

o A string

Default: 'Abt', for 'About'.

o A string

Default: 'Cal', for 'Calculated'.

o A string

Default: 'Est', for 'Estimated'.

You must use the abbreviated forms of those words.

Note: These arrayref elements are not the same as used by parse_date_period([%arg]) nor as used by parse_date_range([%arg]).

These strings are always converted to lower case before being processed.

o style => /^(american|english|standard)$/

This key is explained in the "FAQ".

The string in parse_approximate_date(style => $a_string) takes precedence over the one in new(style => $a_string).

Default: 'english'.

The return value is a hashref as described in the "FAQ"'s first Q and A.

Since a single date is provided, with 'Abt 1999', 'Cal 1999' or 'Est 2000 BC', the date is stored - in the returned hashref - under the 2 keys 'one' and 'one_date'. The other date in the hashref ('two', 'two_date') is an object of type DateTime::Infinite::Future.

parse_date_period([%arg])

Here, the [] indicate an optional parameter.

Parse the candidate period and return a hashref.

The date is expected to be a date period as per p. 46 of the GEDCOM Specification Ged551-5.pdf.

Key => value pairs for %arg:

o date => $a_string

Specify the string to parse.

This parameter is optional.

The candidate period can be passed in to new as new(date => $a_string), or into this method as parse_date_period(date => $a_string).

The string in parse_date_period(date => $a_string) takes precedence over the one in new(date => $a_string).

This string is always converted to lower case before being processed.

Throw an exception if the string cannot be parsed.

o from_to => $arrayref

Specify the case-insensitive words, in your language, which indicate a date period.

This lets you specify a period as 'From 1999', 'To 2000' or 'From 1999 to 2000', and have the code recognize 'From' and 'To'.

This parameter is optional. If supplied, it must be a 2-element arrayref.

The 'From' and 'To' strings can be passed in to new as new(from_to => $arrayref), or into this method as parse_date_period(from_to => $arrayref).

The elements of this arrayref are:

o A string

Default: 'From'.

o A string

Default: 'To'.

Note: These arrayref elements are not the same as used by parse_approximate_date([%arg]) nor as used by parse_date_range([%arg]).

These strings are always converted to lower case before being processed.

o style => /^(american|english|standard)$/

This key is explained in the "FAQ".

The string in parse_date_period(style => $a_string) takes precedence over the one in new(style => $a_string).

Default: 'english'.

The return value is a hashref as described in the "FAQ"'s first Q and A.

parse_date_range([%arg])

Here, the [] indicate an optional parameter.

Parse the candidate range and return a hashref.

The date is expected to be a date range as per p. 47 of the GEDCOM Specification Ged551-5.pdf.

Key => value pairs for %arg:

o date => $a_string

Specify the string to parse.

This parameter is optional.

The candidate range can be passed in to new as new(date => $a_string), or into this method as parse_date_range(date => $a_string).

The string in parse_date_range(date => $a_string) takes precedence over the one in new(date => $a_string).

This string is always converted to lower case before being processed.

Throw an exception if the string cannot be parsed.

o from_to => $arrayref

Specify the case-insensitive words, in your language, which indicate a date range.

This lets you specify a range as 'Bef 1999', 'Aft 2000' or 'Bet 1999 and 2000', and have the code recognize 'Bef', 'Aft', 'Bet' and 'And'.

This parameter is optional. If supplied, it must be a 2-element arrayref.

The elements of this arrayref are:

o An arrayref

Default: ['Aft', 'Bef', 'Bet'], which stand for 'After', 'Before' and 'Between'.

You must use the abbreviated forms of those words.

o A string

Default: 'And'.

Note: These arrayref elements are not the same as used by parse_approximate_date([%arg]) nor as used by parse_date_period([%arg]).

These strings are always converted to lower case before being processed.

o style => /^(american|english|standard)$/

This key is explained in the "FAQ".

The string in parse_date_range(style => $a_string) takes precedence over the one in new(style => $a_string).

Default: 'english'.

The return value is a hashref as described in the "FAQ"'s first Q and A.

When a single date is provided, with 'Aft 1999' or 'Bef 2000 BC', the date is stored - in the returned hashref - under the 2 keys 'one' and 'one_date'. The other date in the hashref ('two', 'two_date') is an object of type DateTime::Infinite::Future.

parse_date_value(%arg)

Parse the candidate date using a series of methods, until one succeeds or we run out of methods.

See the definition of date_value on p. 47 of the GEDCOM Specification Ged551-5.pdf.

The methods are, in this order:

o parse_date_period
o parse_date_range
o parse_approximate_date
o parse_interpreted_date

In the hash %arg, only the 'date' key is passed to the named method. In each case, the algorithm must use the default for the other key, since the name and format of that other key depend on the method.

See t/value.t for details.

Throw an exception if the date cannot be parsed.
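The 'try each method until one succeeds' behaviour described above can be sketched as follows. This is only an illustration: make_parser() and first_successful_parse() are hypothetical names, and the 4 stand-in parsers just test the default English prefixes rather than calling the real parse_*() methods.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the real parse_*() methods. Each one
# returns a hashref on success and dies on failure.
sub make_parser {
    my($kind, $regexp) = @_;

    return sub {
        my($date) = @_;

        die "Not a date $kind: $date\n" if ($date !~ $regexp);

        return {kind => $kind, date => $date};
    };
}

# Same order as the list above: period, range, approximate, interpreted.
my(@parsers) = (
    make_parser(period      => qr/^(?:from|to)\b/i),
    make_parser(range       => qr/^(?:aft|bef|bet)\b/i),
    make_parser(approximate => qr/^(?:abt|cal|est)\b/i),
    make_parser(interpreted => qr/^(?:int\b|\()/i),
);

# Try each parser in turn. The first one which does not die wins.
sub first_successful_parse {
    my($date) = @_;

    for my $parser (@parsers) {
        my($result) = eval { $parser -> ($date) };

        return $result if (defined $result);
    }

    die "Unable to parse date: $date\n";
}

for my $candidate ('(Unknown date)', 'Abt 1 Jan 2001', 'Aft 1 Jan 2001', 'From 0') {
    print "$candidate => ", first_successful_parse($candidate) -> {kind}, "\n";
}
```

Wrapping each attempt in eval means a failure in one parser just moves the loop on to the next, and only the final die escapes to the caller.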

parse_datetime($a_string)

Parse the string and return a hashref as described in the "FAQ"'s first Q and A.

The candidate can be passed in to new as new(date => $a_string), or into this method as parse_datetime($a_string) or parse_datetime(date => $a_string).

The string in parse_datetime($a_string) takes precedence over the one in new(date => $a_string).

The date is expected to be an exact date as per p. 45 of the GEDCOM Specification Ged551-5.pdf.

The date string is mandatory.

Throw an exception if the date string cannot be parsed.

Further, the 'style' key can be passed in as parse_datetime(date => $a_string, style => 'standard').

The string in parse_datetime(style => $a_string) takes precedence over the one in new(style => $a_string).

Default: 'english'.

parse_interpreted_date([%arg])

Here, the [] indicate an optional parameter.

Parse the candidate date and return a hashref.

The date is expected to be an interpreted date as per the definition of date_value on p. 47 of the GEDCOM Specification Ged551-5.pdf.

Key => value pairs for %arg:

o date => $a_string

Specify the string to parse.

This parameter is optional.

The candidate can be passed in to new as new(date => $a_string), or into this method as parse_interpreted_date(date => $a_string).

The string in parse_interpreted_date(date => $a_string) takes precedence over the one in new(date => $a_string).

This string is always converted to lower case before being processed.

Throw an exception if the string cannot be parsed.

o prefix => $a_string

Specify a case-insensitive word, in your language, which indicates an interpreted date.

This lets you specify a candidate as 'Int 1999', 'Int 2000 (more or less)' or '(Date not known)', and have the code recognize 'Int'.

This parameter is optional. If supplied, it must be a string meaning 'Int'.

This string is always converted to lower case before being processed.

Default: 'Int'.

o style => /^(american|english|standard)$/

This key is explained in the "FAQ".

The string in parse_interpreted_date(style => $a_string) takes precedence over the one in new(style => $a_string).

Default: 'english'.

The return value is a hashref as described in the "FAQ"'s first Q and A.

Since a single date is provided, with 'Int 1999' or 'Int 1999 (more or less)', the date is stored - in the returned hashref - under the 2 keys 'one' and 'one_date'. The other date in the hashref ('two', 'two_date') is an object of type DateTime::Infinite::Future.

Also in the returned hashref, the key 'phrase' will have the value of the text between '(' and ')', if any.

process_date_escape(@field)

Parse the fields of the date, already split on ' ', '-' and '/', and return the fields as an array.

In the process, convert month full names and abbreviations to Gregorian abbreviations, to make parsing easier.

Supported calendars:

o Gregorian, using the escape @#DGregorian@

Notes:

o Non-Gregorian date escapes are ignored at this stage
o See t/escape.t for sample code
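As a rough sketch of the month-name conversion described above, and not the module's actual code, the following maps full English month names and abbreviations to 3-letter abbreviations. The name normalise_fields() is hypothetical, and it is an assumption here that the Gregorian escape is simply dropped and that output is in lower case. See t/escape.t for the real behaviour.

```perl
use strict;
use warnings;

# Full English month names. The abbreviation is the first 3 letters.
my(@month_names) = qw(january february march april may june
                      july august september october november december);
my(%abbreviation);

for my $name (@month_names) {
    my($short) = substr($name, 0, 3);

    $abbreviation{$name}  = $short;
    $abbreviation{$short} = $short;
}

# Drop a leading Gregorian escape (an assumption), then convert any
# field which is a month name. Other fields pass through unchanged.
sub normalise_fields {
    my(@field) = @_;

    shift @field if (@field && lc($field[0]) eq '@#dgregorian@');

    return map { $abbreviation{lc $_} // $_ } @field;
}

print join(' ', normalise_fields('@#DGregorian@', '1', 'January', '2001')), "\n";
```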

FAQ

What is the format of the hashref returned by parse_*()?

It has these key => value pairs:

o one => $first_date_in_range

Returns the first (or only) date as a string, after 'Abt', 'Bef', 'From' or whatever.

This is for cases like '1999' in 'abt 1999', '1999' in 'bef 1999', '1999' in 'from 1999', and for '1999' in 'from 1999 to 2000'.

A missing month defaults to 01. A missing day defaults to 01.

'500BC' will be returned as '0500-01-01', with the 'one_bc' flag set. See also the key 'one_date'.

Default: DateTime::Infinite::Past -> new, which stringifies to '-inf'.

Note: On some systems (MS Windows), DateTime::Infinite::Past -> new stringifies to '-1.#INF', but, as of V 1.02, the code changes this to '-inf'. Likewise, on some systems (Solaris), DateTime::Infinite::Past -> new stringifies to '-Infinity', but, as of V 1.07, the code changes this to '-inf'.

The default value does not set the one_ambiguous and one_bc flags.

o one_ambiguous => $Boolean

Returns 1 if the first (or only) date is ambiguous. Possibilities:

o Only the year is present
o Only the year and month are present
o The day and month are reversible

This is checked for by testing whether or not the day is <= 12, since in that case it could be a month.

Obviously, the 'one_ambiguous' flag can be set for a date specified in a non-ambiguous way, e.g. 'From 1 Jan 2000', since the numeric value of the month is 1 and the day is also 1.

Default: 0.

o one_bc => $Boolean

Returns 1 if the first date is followed by one of (case-insensitive): 'B.C.', 'BC.' or 'BC'. 'BC' may be written as 'BCE', with or without full-stops.

In the input, this suffix can be separated from the year by spaces, so both '500BC' and '500 B.C.' are accepted.

Default: 0.

o one_date => $a_date_object

This object is of type DateTime.

Warning: Since these objects only accept 4-digit years, any year 0 .. 999 will have 1000 added to it. Of course, the value for the 'one' key will not have 1000 added to it.

This means that if the value of the 'one' key does not match the stringified value of the 'one_date' key (assuming the latter is not '-inf'), then the year is < 1000.

Alternatively, if the stringified value of the 'one_date' key is '-inf', the period supplied did not have a 'From' date.

Default: DateTime::Infinite::Past -> new, which stringifies to '-inf'.

Note: On some systems (MS Windows), DateTime::Infinite::Past -> new stringifies to '-1.#INF', but, as of V 1.02, the code changes this to '-inf'. Likewise, on some systems (Solaris), DateTime::Infinite::Past -> new stringifies to '-Infinity', but, as of V 1.07, the code changes this to '-inf'.

o one_default_day => $Boolean

Returns 1 if the input date had no value for the first date's day. The code sets the default day to 1.

Default: 0.

o one_default_month => $Boolean

Returns 1 if the input date had no value for the first date's month. The code sets the default month to 1.

Default: 0.

o phrase => $string

This holds the text, if any, between '(' and ')' in an interpreted date.

Default: ''.

o prefix => $string

Possible values for the prefix:

o 'abt', given the approximate date 'Abt 1999'
o 'aft', given the date range 'Aft 1999'
o 'bef', given the date range 'Bef 1999'
o 'bet', given the date range 'Bet 1999 and 2000'
o 'cal', given the approximate date 'Cal 1999'
o 'est', given the approximate date 'Est 1999'
o 'from', given the date period 'From 1999' or 'From 1999 to 2000'
o 'int', given the interpreted date 'Int 1999 (Guesswork)'
o 'phrase', given the date phrase '(Unknown)'
o 'to', given the date period 'To 2000'

Default: ''.

o two => $second_date_in_range

Returns the second (or only) date as a string: the date after 'and' in 'bet 1999 and 2000', the date after 'to' in 'from 1999 to 2000', or '2000' in 'to 2000'.

A missing month defaults to 01. A missing day defaults to 01.

'500BC' will be returned as '0500-01-01', with the 'two_bc' flag set. See also the key 'two_date'.

Default: DateTime::Infinite::Future -> new, which stringifies to 'inf'.

Note: On some systems (MS Windows), DateTime::Infinite::Future -> new stringifies to '1.#INF', but, as of V 1.03, the code changes this to 'inf'. Likewise, on some systems (Solaris), DateTime::Infinite::Future -> new stringifies to 'Infinity', but, as of V 1.07, the code changes this to 'inf'.

The default value does not set the two_ambiguous and two_bc flags.

o two_ambiguous => $Boolean

Returns 1 if the second (or only) date is ambiguous. Possibilities:

o Only the year is present
o Only the year and month are present
o The day and month are reversible

This is checked for by testing whether or not the day is <= 12, since in that case it could be a month.

Obviously, the 'two_ambiguous' flag can be set for a date specified in a non-ambiguous way, e.g. 'To 1 Jan 2000', since the numeric value of the month is 1 and the day is also 1.

Default: 0.

o two_bc => $Boolean

Returns 1 if the second date is followed by one of (case-insensitive): 'B.C.', 'BC.' or 'BC'. 'BC' may be written as 'BCE', with or without full-stops.

In the input, this suffix can be separated from the year by spaces, so both '500BC' and '500 B.C.' are accepted.

Default: 0.

o two_date => $a_date_object

This object is of type DateTime.

Warning: Since these objects only accept 4-digit years, any year 0 .. 999 will have 1000 added to it. Of course, the value for the 'two' key will not have 1000 added to it.

This means that if the value of the 'two' key does not match the stringified value of the 'two_date' key (assuming the latter is not 'inf'), then the year is < 1000.

Alternatively, if the stringified value of the 'two_date' key is 'inf', the period supplied did not have a 'To' date.

Default: DateTime::Infinite::Future -> new, which stringifies to 'inf'.

Note: On some systems (MS Windows), DateTime::Infinite::Future -> new stringifies to '1.#INF', but, as of V 1.03, the code changes this to 'inf'. Likewise, on some systems (Solaris), DateTime::Infinite::Future -> new stringifies to 'Infinity', but, as of V 1.07, the code changes this to 'inf'.

o two_default_day => $Boolean

Returns 1 if the input date had no value for the second date's day. The code sets the default day to 1.

Default: 0.

o two_default_month => $Boolean

Returns 1 if the input date had no value for the second date's month. The code sets the default month to 1.

Default: 0.
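The warnings above, about 1000 being added to years 0 .. 999 in 'one_date' and 'two_date', suggest a simple check. The sketch below is an illustration only: year_was_offset() is a hypothetical helper, and it assumes both strings start with a 4-digit year. It compares the year in the date string with the year in the stringified object.

```perl
use strict;
use warnings;

# Return 1 if the year in the 'one' (or 'two') string differs from the
# year in the stringified 'one_date' (or 'two_date') object.
sub year_was_offset {
    my($date_string, $object_string) = @_;

    # Infinite dates carry no year, so there is nothing to compare.
    return 0 if ($object_string =~ /^-?inf$/);

    my($year)        = $date_string   =~ /^(\d{4})/;
    my($object_year) = $object_string =~ /^(\d{4})/;

    return 0 if (! defined $year || ! defined $object_year);

    return $year == $object_year ? 0 : 1;
}

print year_was_offset('0500-01-01', '1500-01-01T00:00:00'), "\n";
print year_was_offset('1999-01-01', '1999-01-01T00:00:00'), "\n";
```

With a real hashref, the call would presumably look like year_was_offset($$hashref{one}, "$$hashref{one_date}"), where the quotes force the object to stringify.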

On what systems do DateTime::Infinite::(Past, Future) return '-1.#INF' and '1.#INF'?

So far (as reported by CPAN Testers):

o Win32::GetOSName = Win7
o Win32::GetOSName = WinXP/.Net

On what systems do DateTime::Infinite::(Past, Future) return '-Infinity' and 'Infinity'?

So far (as reported by CPAN Testers):

o osname=solaris, osvers=2.11
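The platform-specific strings listed in the 2 answers above can be mapped onto '-inf' and 'inf' with a single regexp. This is a sketch of the idea, not the module's own code, and normalise_infinity() is a hypothetical name.

```perl
use strict;
use warnings;

# Map the platform-specific spellings of infinity to '-inf' or 'inf'.
# Any other string is returned unchanged.
sub normalise_infinity {
    my($string) = @_;

    return $string =~ /^(-?)(?:1\.#INF|Infinity|inf)$/i ? "${1}inf" : $string;
}

for my $candidate ('-1.#INF', '1.#INF', '-Infinity', 'Infinity', '-inf', '1999-01-01') {
    print "$candidate => ", normalise_infinity($candidate), "\n";
}
```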

What is the meaning of the 'style' key in calls to the new() and parse_*() methods?

Possible values:

o style => 'american'

Expect dates in 'month day year' format, as in From Jan 2 2011 BC to Mar 4 2011.

o style => 'english'

Expect dates in 'day month year' format, as in From 1 Jan 2001 to 25 Dec 2002.

This is the default.

o style => 'standard'

Expect dates in 'year month day' format, as in 2011-01-02 to 2011-03-04.

The string in parse_*(style => $a_string) takes precedence over the one in new(style => $a_string).
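One way to picture the 3 styles is as a table giving the order of the fields. The sketch below is hypothetical (label_fields() is not part of the module) and only handles complete, 3-field dates.

```perl
use strict;
use warnings;

# The order in which each style expects the 3 fields of a date.
my(%field_order) = (
    american => [qw(month day year)],
    english  => [qw(day month year)],
    standard => [qw(year month day)],
);

# Label the fields of a complete date according to the style.
sub label_fields {
    my($style, @field) = @_;
    my($order) = $field_order{$style} || die "Unknown style: $style\n";
    my(%date);

    @date{@$order} = @field;

    return \%date;
}

my($date) = label_fields('american', qw(Jan 2 2011));

print "day: $$date{day}, month: $$date{month}, year: $$date{year}\n";
```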

How do I format dates for output?

Use the hashref keys 'one' and 'two', to get dates in the form 2011-06-21. Re-format as necessary.

Such a hashref is returned from all parse_*() methods.
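As an example of such re-formatting, the sketch below turns a string starting with 'yyyy-mm-dd' into 'day month year' form. reformat_date() is a hypothetical helper, not part of the module. Anything which does not start with such a date, e.g. '-inf' or 'inf', is returned unchanged.

```perl
use strict;
use warnings;

my(@abbreviation) = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Convert 'yyyy-mm-dd' (with or without a trailing time) into
# 'd Mon yyyy'. Leading zeros are dropped from the day and the year.
sub reformat_date {
    my($date) = @_;

    # Leave '-inf', 'inf' and anything unexpected untouched.
    return $date if ($date !~ /^(\d{4})-(\d{2})-(\d{2})/);

    my($year, $month, $day) = ($1, $2, $3);

    return $date if ($month < 1 || $month > 12);

    return sprintf('%d %s %d', $day, $abbreviation[$month - 1], $year);
}

print reformat_date('2011-06-21'), "\n";
```

Real code would probably also consult the 'one_default_day' and 'one_default_month' flags, so as not to print a day or month the input never contained.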

Does this module handle non-Gregorian calendars?

No, not yet. See "process_date_escape(@field)" for more details.

How are the various date formats handled?

Firstly, all commas are deleted from incoming dates.

Then, dates are split on ' ', '-' and '/', and the resultant fields are analyzed one at a time.

The 'style' key can be used to force the code to assume a certain type of date format. This option is explained above, in this FAQ.
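The first 2 steps can be sketched as below. split_date() is a hypothetical name, and it is an assumption that lower-casing happens before the split; the documentation only says the string is always converted to lower case before being processed.

```perl
use strict;
use warnings;

# Lower-case the date, delete all commas, and split on ' ', '-' and '/'.
sub split_date {
    my($date) = @_;

    $date = lc $date;
    $date =~ tr/,//d;

    # Discard any empty field produced by a leading separator.
    return grep { length } split(/[ \/-]+/, $date);
}

print join('|', split_date('From 1 Jan, 2001')), "\n";
print join('|', split_date('2011-01-02')), "\n";
```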

How are incomplete dates handled?

A missing month is set to 1 and a missing day is set to 1.

Further, in the hashref returned by the parse_*() methods, the flags one_default_month, one_default_day, two_default_month and two_default_day are set to 1, as appropriate, so you can tell that the code supplied the value.

Note: These flags take a Boolean value; it is only by coincidence that they can take the value of the default month or day.
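To make the defaults and flags concrete, here is a minimal sketch for the first date. apply_defaults() is a hypothetical helper, not the module's code; it takes whichever of year, month and day were found, and fills in the rest.

```perl
use strict;
use warnings;

# Fill in a missing month and day, and record that we did so.
sub apply_defaults {
    my(%date)   = @_;
    my(%result) = (one_default_day => 0, one_default_month => 0);

    if (! defined $date{month}) {
        $date{month}               = 1;
        $result{one_default_month} = 1;
    }

    if (! defined $date{day}) {
        $date{day}               = 1;
        $result{one_default_day} = 1;
    }

    $result{one} = sprintf('%04d-%02d-%02d', @date{qw(year month day)});

    return \%result;
}

my($result) = apply_defaults(year => 1999);

print "$$result{one}, default day: $$result{one_default_day}, default month: $$result{one_default_month}\n";
```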

Why are dates returned as objects of type DateTime?

Because such objects have the sophistication required to handle such a complex topic.

See DateTime and http://datetime.perl.org/wiki/datetime/dashboard for details.

What happens if parse_date_period() is given a string like 'From 2000 to 1999'?

Then the returned hashref will have:

o one => '2000-01-01T00:00:00'
o two => '1999-01-01T00:00:00'

Clearly then, the code does not reorder the dates.
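If you want the dates in order, you can swap them yourself. The sketch below is a hypothetical helper, order_dates(), which works on a copy of the hashref. It assumes the 'one' and 'two' keys are present, and it only swaps when both dates are finite and neither is BC, since a plain string comparison is not meaningful otherwise.

```perl
use strict;
use warnings;

# Return a copy of the hashref with 'one' and 'two' (and their
# companion keys) swapped if 'one' is later than 'two'.
sub order_dates {
    my($hashref) = @_;
    my(%copy)    = %$hashref;

    # Infinite dates do not start with yyyy-mm-dd, so leave them alone.
    return \%copy if (grep { $copy{$_} !~ /^\d{4}-\d{2}-\d{2}/ } qw(one two));

    # String comparison gives the wrong answer for BC dates.
    return \%copy if ($copy{one_bc} || $copy{two_bc});

    if ($copy{one} gt $copy{two}) {
        for my $suffix ('', qw(_ambiguous _bc _date _default_day _default_month)) {
            my($first, $second) = ("one$suffix", "two$suffix");

            next if (! exists $copy{$first} || ! exists $copy{$second});

            @copy{$first, $second} = @copy{$second, $first};
        }
    }

    return \%copy;
}

my($ordered) = order_dates({one => '2000-01-01T00:00:00', two => '1999-01-01T00:00:00'});

print "$$ordered{one} .. $$ordered{two}\n";
```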

Why was this module renamed from DateTime::Format::Gedcom?

The DateTime suite of modules isn't designed, IMHO, for GEDCOM-like applications. It was a mistake to use that name in the first place.

By releasing under the Genealogy::Gedcom::* namespace, I can be much more targeted in the data types I choose as method return values.

Why did you choose Hash::FieldHash over Moose?

My policy is to use the lightweight Hash::FieldHash for stand-alone modules and Moose for applications.

TODO

o Comparisons between dates

Sample code to overload '<' and '>' is in Gedcom::Date.

o Handle Gregorian years of the form 1699/00

See p. 65 of the GEDCOM Specification Ged551-5.pdf.

See Also

Genealogy::Gedcom.

Gedcom::Date.

References

See "References" in Genealogy::Gedcom::Reader::Lexer.

Machine-Readable Change Log

The file CHANGES was converted into Changelog.ini by Module::Metadata::Changes.

Version Numbers

Version numbers < 1.00 represent development versions. From 1.00 up, they are production versions.

Support

Email the author, or log a bug on RT:

https://rt.cpan.org/Public/Dist/Display.html?Name=Genealogy::Gedcom::Date.

Thanx

Thanx to Eugene van der Pijll, the author of the Gedcom::Date::* modules.

Thanx also to the authors of the DateTime::* family of modules. See http://datetime.perl.org/wiki/datetime/dashboard for details.

Author

Genealogy::Gedcom::Date was written by Ron Savage <ron@savage.net.au> in 2011.

Home page: http://savage.net.au/index.html.

Copyright

Australian copyright (c) 2011, Ron Savage.

        All Programs of mine are 'OSI Certified Open Source Software';
        you can redistribute them and/or modify them under the terms of
        The Artistic License, a copy of which is available at:
        http://www.opensource.org/licenses/index.html