Genealogy::Gedcom::Date - Parse GEDCOM dates in French r/German/Gregorian/Hebrew/Julian
A script (scripts/synopsis.pl):
    #!/usr/bin/env perl

    use strict;
    use warnings;

    use Genealogy::Gedcom::Date;

    # --------------------------

    sub process
    {
        my($count, $parser, $date) = @_;

        print "$count: $date: ";

        my($result) = $parser -> parse(date => $date);

        print "Canonical date @{[$_ + 1]}: ", $parser -> canonical_date($$result[$_]), ". \n" for (0 .. $#$result);
        print 'Canonical form: ', $parser -> canonical_form($result), ". \n";
        print "\n";

    } # End of process.

    # --------------------------

    my($parser) = Genealogy::Gedcom::Date -> new(maxlevel => 'debug');

    process(1, $parser, 'Julian 1950');
    process(2, $parser, '@#dJulian@ 1951');
    process(3, $parser, 'From @#dJulian@ 1952 to Gregorian 1953/54');
    process(4, $parser, 'From @#dFrench r@ 1955 to 1956');
    process(5, $parser, 'From @#dJulian@ 1957 to German 1.Dez.1958');
One-liners:
    perl scripts/parse.pl -max debug -d 'Between Gregorian 1701/02 And Julian 1703'

Output:

    Return value from parse():
    [
      {
        canonical => "1701/02",
        flag => "BET",
        kind => "Date",
        suffix => "02",
        type => "Gregorian",
        year => 1701
      },
      {
        canonical => "\@#dJULIAN\@ 1703",
        flag => "AND",
        kind => "Date",
        type => "Julian",
        year => 1703
      }
    ]

    perl scripts/parse.pl -max debug -d 'Int 10 Nov 1200 (Approx)'

    [
      {
        canonical => "10 Nov 1200 (Approx)",
        day => 10,
        flag => "INT",
        kind => "Date",
        month => "Nov",
        phrase => "(Approx)",
        type => "Gregorian",
        year => 1200
      }
    ]

    perl scripts/parse.pl -max debug -d '(Unknown)'

    Return value from parse():
    [
      {
        canonical => "(Unknown)",
        kind => "Phrase",
        phrase => "(Unknown)",
        type => "Phrase"
      }
    ]
See the "FAQ" for the explanation of the output arrayrefs.
See also scripts/parse.pl and scripts/compare.pl for sample code.
Lastly, you are strongly encouraged to peruse t/*.t.
Genealogy::Gedcom::Date provides a Marpa-based parser for GEDCOM dates.
Calendar escapes supported are (case-insensitive): French r/German/Gregorian/Hebrew/Julian.
Gregorian is the default, and does not need to be used at all.
Comparison of 2 Genealogy::Gedcom::Date-based objects is supported by calling the "compare($other_object)" method on one object and passing the other object as the parameter.
Note: compare() can return any one of four (4) values.
See the GEDCOM Specification, p 45.
Install Genealogy::Gedcom::Date as you would for any Perl module:
Run:
cpanm Genealogy::Gedcom::Date
or run:
sudo cpan Genealogy::Gedcom::Date
or unpack the distro, and then either:
    perl Build.PL
    ./Build
    ./Build test
    sudo ./Build install
or:
    perl Makefile.PL
    make (or dmake or nmake)
    make test
    make install
new() is called as my($parser) = Genealogy::Gedcom::Date -> new(k1 => v1, k2 => v2, ...).
It returns a new object of type Genealogy::Gedcom::Date.
Key-value pairs accepted in the parameter list (see corresponding methods for details [e.g. "date([$date])"]):

o canonical

Controls what is printed. Note: Nothing is printed unless maxlevel is set to debug.

    0: Data::Dumper::Concise's Dumper() prints the output of the parse.
    1: canonical_form() is called on the output of parse() to print a string.
    2: canonical_date() is called on each element in the result from parse(), to print strings on separate lines.

Default: 0.

o date

The string to be parsed.
Each ',' is replaced by a space. See the "FAQ" for details.
Default: ''.

o logger

Specify a logger compatible with Log::Handler, for the lexer and parser to use.
Default: A logger of type Log::Handler which writes to the screen.
To disable logging, just set 'logger' to the empty string (not undef).

o maxlevel

This option affects Log::Handler.
See the Log::Handler::Levels docs.
Typical values are: 'error', 'notice', 'info' and 'debug'.
Default: 'notice'. The default produces no output.

o minlevel

Default: 'error'.
No lower levels are used.
Note: The parameters canonical and date can also be passed to "parse([%args])".
Here, the [] indicate an optional parameter.
Gets or sets the canonical option, which controls what exactly "parse([%args])" prints when "maxlevel([$string])" is set to debug.
See "canonical_date($hashref)", next, for sample code.
$hashref is either element of the arrayref returned by "parse([%args])". The hashref may be empty.
Returns a date string (or the empty string) normalized in various ways:

o A Gregorian calendar escape is omitted. This is done because Gregorian is the default.
o Any other calendar escape is kept, and it's output in all caps. As a special case, 'FRENCHR' is returned as 'FRENCH R'.
o Only the date itself is returned. This means the flag key in the hashref is ignored.
Note: This method is called by "parse([%args])" to populate the canonical key in the arrayref of hashrefs returned by parse().
Try:
    perl scripts/parse.pl -max debug -d 'From 21 Jun 1950 to @#dGerman@ 05.Mär.2015'
    perl scripts/parse.pl -max debug -d 'From 21 Jun 1950 to @#dGerman@ 05.Mär.2015' -c 0
    perl scripts/parse.pl -max debug -d 'From 21 Jun 1950 to @#dGerman@ 05.Mär.2015' -c 1
    perl scripts/parse.pl -max debug -d 'From 21 Jun 1950 to @#dGerman@ 05.Mär.2015' -c 2
Returns a date string containing zero, one or two dates.
This method calls "canonical_date($hashref)" for each element in the $arrayref. The arrayref may be empty.
Then it adds information from the flag key in each element, if present.
For sample code, see "canonical_date($hashref)" just above.
Returns an integer 0 .. 3 (sic) indicating the temporal relationship between the invoking object ($self) and $other_object.
Returns one of these values:
    0 if the dates have different date escapes.
    1 if $date_1 < $date_2.
    2 if $date_1 = $date_2.
    3 if $date_1 > $date_2.
Note: Gregorian years like 1510/02 are converted into 1510 before the dates are compared. Create a sub-class and override "normalize_date($date_hash)" if desired.
See scripts/compare.pl for sample code.
See also "normalize_date($date_hash)".
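The four return values can be turned into readable text with a small lookup table. The following is only a sketch in plain Perl: explain_comparison() is a hypothetical helper, not part of this module, and the codes are hard-coded here, whereas in real code they would come from a call such as $parser_1 -> compare($parser_2).

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Meanings of the return codes, as listed in this document for compare().

my(%meaning) =
(
    0 => 'the dates have different date escapes',
    1 => 'the first date is earlier than the second',
    2 => 'the two dates are equal',
    3 => 'the first date is later than the second',
);

# Hypothetical helper (not part of Genealogy::Gedcom::Date).

sub explain_comparison
{
    my($code) = @_;

    return $meaning{$code} // "unexpected return value: $code";

} # End of explain_comparison.

print "$_: ", explain_comparison($_), "\n" for (0 .. 3);
```

Since 0 means the dates are not comparable rather than equal, the return value cannot be handed directly to sort(), which expects a negative, zero or positive result.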
Here, [ and ] indicate an optional parameter.
Gets or sets the date to be parsed.
The date in parse(date => $date) takes precedence over both new(date => $date) and date($date).
This means if you call parse() as parse(date => $date), then the value $date is stored so that if you subsequently call date(), that value is returned.
Note: date is a parameter to new().
Gets the last error message.
Returns '' (the empty string) if there have been no errors.
If Marpa::R2 throws an exception, it is caught by a try/catch block, and the Marpa error is returned by this method.
See "parse([%args])" for more about error().
If a logger is defined, this logs the message $s at level $level.
Get or set the logger object.
To disable logging, just set 'logger' to the empty string (not undef), in the call to "new()".
This logger is passed to other modules.
'logger' is a parameter to "new()". See "Constructor and Initialization" for details.
Get or set the value used by the logger object.
This option is only used if an object of type Log::Handler is created. See Log::Handler::Levels.
Typical values are: 'notice', 'info' and 'debug'. The default, 'notice', produces no output.
The code emits a message with log level 'error' if Marpa throws an exception, and it displays the result of the parse at level 'debug' if maxlevel is set that high. The latter display uses Data::Dumper::Concise's function Dumper().
'maxlevel' is a parameter to "new()". See "Constructor and Initialization" for details.
This option is only used if an object of type Log::Handler is created. See Log::Handler::Levels.
'minlevel' is a parameter to "new()". See "Constructor and Initialization" for details.
The constructor. See "Constructor and Initialization".
Normalizes $date_hash for each date during a call to "compare($other_object)".
Override in a sub-class if you wish to change the normalization technique.
parse() returns an arrayref. See the "FAQ" for details.
If the arrayref is empty, call "error()" to retrieve the error message.
In particular, the arrayref will be empty if the input date is the empty string.
parse() takes the same parameters as new().
Warning: The array can contain 1 element when 2 are expected. This can happen if your input contains 'From ... To ...' or 'Between ... And ...', and one of the dates is invalid. That is, the return value from parse() will contain the valid date but no indicator of the invalid one.
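One way to guard against this is to compare the number of elements returned with the number the wording of the input implies. The sketch below is an assumption-laden illustration in plain Perl: element_count_ok() is a hypothetical helper, not part of this module, and it only recognises the spelled-out English forms 'From ... To ...' and 'Between ... And ...' quoted in the warning above. The sample arrayrefs are hand-written stand-ins for the return value of parse().

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical helper (not part of Genealogy::Gedcom::Date).
# Returns true if the number of elements in $result matches what the
# wording of $date leads us to expect: 2 for a range, otherwise 1.

sub element_count_ok
{
    my($date, $result) = @_;
    my($is_range)      = ($date =~ /\bFrom\b.+\bTo\b/i || $date =~ /\bBetween\b.+\bAnd\b/i) ? 1 : 0;
    my($expected)      = $is_range ? 2 : 1;

    return scalar(@$result) == $expected;

} # End of element_count_ok.

# Hand-written stand-ins for the arrayref returned by parse().

my($both) = [{kind => 'Date', year => 1952}, {kind => 'Date', year => 1953}];
my($one)  = [{kind => 'Date', year => 1952}];

print 'Both dates present: ', (element_count_ok('From 1952 to 1953', $both) ? 'ok' : 'incomplete'), "\n";
print 'One date missing:   ', (element_count_ok('From 1952 to 1953', $one) ? 'ok' : 'incomplete'), "\n";
```

When the check fails, the original input string is the only place left to look for the date that was rejected.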
This chapter lists exactly how this code differs from the Gedcom spec.
vc, v.c., v.chr., vchr, vuz, v.u.z.
jan, feb, mär, maer, mrz, apr, mai, jun, jul, aug, sep, sept, okt, nov, dez
jan, feb, mar, apr, may, jun, jul, aug, sep, oct, nov, dec
tsh, csh, ksl, tvt, shv, adr, ads, nsn, iyr, svn, tmz, aav, ell
It is always an arrayref.
If the date is like '1950' or 'Bef 1950 BCE', there will be 1 element in the arrayref.
If the date contains both 'From' and 'To', or both 'Between' and 'And', then the arrayref will contain 2 elements.
Each element is a hashref, with various combinations of the following keys. You need to check the existence of some keys before processing the date.
This means missing values (day, month, bce) are never fabricated. These keys only appear in the hashref if such a token was found in the input.
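Checking for optional keys might look like the sketch below. It is plain Perl under stated assumptions: describe_element() is a hypothetical helper, not part of this module, and the sample data is copied from the parse() output shown earlier in this document rather than produced by a live call.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical helper (not part of Genealogy::Gedcom::Date).
# Builds a display string from one element of the arrayref returned by
# parse(), testing each optional key before using it.

sub describe_element
{
    my($element) = @_;

    # Phrases carry no day, month or year, so return the phrase itself.

    return $$element{phrase} if ($$element{kind} eq 'Phrase');

    my(@part);

    push @part, $$element{day}   if (exists $$element{day});
    push @part, $$element{month} if (exists $$element{month});

    if (exists $$element{year})
    {
        my($year) = $$element{year};
        $year     .= '/' . $$element{suffix} if (exists $$element{suffix});

        push @part, $year;
    }

    push @part, $$element{bce} if (exists $$element{bce});

    return join(' ', @part);

} # End of describe_element.

# Sample data copied from the output shown in this document.

my($result) =
[
    {canonical => '1701/02', flag => 'BET', kind => 'Date', suffix => '02', type => 'Gregorian', year => 1701},
    {canonical => '@#dJULIAN@ 1703', flag => 'AND', kind => 'Date', type => 'Julian', year => 1703},
];

print describe_element($_), "\n" for (@$result);
```

With the sample data this prints 1701/02 and then 1703. The helper deliberately ignores the flag and type keys, so it is not a substitute for the canonical methods.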
Keys:
If the input contains any (case-insensitive) BCE indicator, under any calendar escape, the bce key will hold the exact indicator.
"parse([%args])" calls "canonical_date($hashref)" to populate this key.
If the input contains a day, then the day key will be present.
If the input contains any of the following (case-insensitive), then the flag key will be present:
$string will take one of these values (case-sensitive):
The kind key is always present, and always takes the value 'Date' or 'Phrase'.
If the value is 'Phrase', see the phrase and type keys.
During processing, there can be another - undocumented - element in the arrayref. It represents the calendar escape, and in that case kind takes the value 'Calendar'. This element is discarded before the final arrayref is returned to the caller.
If the input contains a month, then the month key will be present. The case of $string will be exactly whatever was in the input.
If the input contains a date phrase, then the phrase key will be present. The case of $string will be exactly whatever was in the input.
parse(date => 'Int 10 Nov 1200 (Approx)') returns:
[ { day => 10, flag => "INT", kind => "Date", month => "Nov", phrase => "(Approx)", type => "Gregorian", year => 1200 } ]
parse(date => '(Unknown)') returns:
[ { kind => "Phrase", phrase => "(Unknown)", type => "Phrase" } ]
See also the kind and type keys.
If the year contains a suffix (/00), then the suffix key will be present. The '/' is discarded.
Obviously, this key can only appear when the year is of the Gregorian form 1700/00.
See also the year key below.
The type key is always present, and takes one of these case-sensitive values:
See also the kind and phrase keys.
If the input contains a year, then the year key is present.
If the year contains a suffix (/00), see also the suffix key, above. This means the value of the year key is never "$integer/$two_digits".
In practice, if the month name is unique to a specific language, then the escape is not needed, since Marpa::R2 and this code automatically handle ambiguity.
Likewise, if you use a Gregorian year in the form 1700/01, then the calendar escape is obvious.
The escape is, of course, always inserted into the values returned by the canonical pair of methods when they process non-Gregorian dates. That makes their output compatible with other software. And no matter what case you use specifying the calendar escape, it is always output in upper-case.
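The formatting rules for the escape can be illustrated in a few lines of plain Perl. This is a sketch of the behaviour described in this document, not the module's own code: canonical_escape() is a hypothetical helper, and the '@#d...@' layout is taken from the canonical value '@#dJULIAN@ 1703' shown in the sample output.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical helper (not part of Genealogy::Gedcom::Date).
# Formats a calendar name, given in any case, as an upper-case escape.
# Gregorian is the default calendar, so it yields the empty string.

sub canonical_escape
{
    my($calendar) = @_;
    my($name)     = uc $calendar;
    $name         =~ s/\s+//g; # 'French r' becomes 'FRENCHR'.

    return '' if ($name eq 'GREGORIAN');

    # Special case described in this document.

    $name = 'FRENCH R' if ($name eq 'FRENCHR');

    return "\@#d$name\@";

} # End of canonical_escape.

print "$_ => '", canonical_escape($_), "'\n" for ('Gregorian', 'julian', 'French r', 'HEBREW');
```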
All Gregorian and Julian dates are ambiguous, unless they use the year format 1700/01.
So, to resolve the ambiguity, add the calendar escape.
That's just how that module handles '@'.
Yes.
See t/German.t for sample code.
No. It is always Gregorian.
Yes. Commas are replaced by spaces.
See "Extensions to the Gedcom specification".
The code does not reorder the dates.
The DateTime suite of modules isn't designed, IMHO, for GEDCOM-like applications. It was a mistake to use that name in the first place.
By releasing under the Genealogy::Gedcom::* namespace, I can be much more targeted in the data types I choose as method return values.
My policy is to use the lightweight Moo for all modules and applications.
Things to consider:
Consider the possibility that the parse ends without a successful parse, but the input is the prefix of some input that can lead to a successful parse.
Marpa is not reporting a problem during the read(), because you can add more to the input string, and Marpa does not know that you do not plan to do this.
Read more about this by running 'perl scripts/parse.pl -h', where it discusses '-d'.
Check: Are any of these valid?
Yes, the last 3 are accepted by this module, and the last one is accepted by other software.
Dates - such as 1900/01 - which do not fit the Gedcom definition of a Julian year, are filtered out.
File::Bom::Utils.
Genealogy::Gedcom
DateTime
DateTimeX::Lite
Time::ParseDate
Time::Piece is in Perl core. See http://perltricks.com/article/59/2014/1/10/Solve-almost-any-datetime-need-with-Time-Piece
Time::Duration is more sophisticated than Time::Elapsed
Time::Moment implements ISO 8601
http://blogs.perl.org/users/buddy_burden/2015/09/a-date-with-cpan-part-1-state-of-the-union.html
http://blogs.perl.org/users/buddy_burden/2015/10/a-date-with-cpan-part-2-target-first-aim-afterwards.html
http://blogs.perl.org/users/buddy_burden/2015/10/-a-date-with-cpan-part-3-paving-while-driving.html
The file Changes was converted into Changelog.ini by Module::Metadata::Changes.
Version numbers < 1.00 represent development versions. From 1.00 up, they are production versions.
https://github.com/ronsavage/Genealogy-Gedcom-Date.
Email the author, or log a bug on RT:
https://rt.cpan.org/Public/Dist/Display.html?Name=Genealogy::Gedcom::Date.
Thanx to Eugene van der Pijll, the author of the Gedcom::Date::* modules.
Thanx also to the authors of the DateTime::* family of modules. See http://datetime.perl.org/wiki/datetime/dashboard for details.
Thanx to Mike Elston on the perl-gedcom mailing list for providing French month abbreviations, amongst other information pertaining to the French language.
Thanx to Michael Ionescu on the perl-gedcom mailing list for providing the grammar for German dates and German month abbreviations.
Genealogy::Gedcom::Date was written by Ron Savage <ron@savage.net.au> in 2011.
Homepage: http://savage.net.au/index.html.
Australian copyright (c) 2011, Ron Savage.
All Programs of mine are 'OSI Certified Open Source Software'; you can redistribute them and/or modify them under the terms of The Perl License, a copy of which is available at: http://dev.perl.org/licenses/