Lingua::EN::GivenNames::Database::Import - An SQLite database of derivations of English given names
See "Synopsis" in Lingua::EN::GivenNames for a long synopsis.
See also "How do the scripts and modules interact to produce the data?" in Lingua::EN::GivenNames.
Documents the methods used to populate the SQLite database, lingua.en.givennames.sqlite, which ships with this distro.
See "Description" in Lingua::EN::GivenNames for a long description.
Also, it's vital you study "How do the scripts and modules interact to produce the data?" in Lingua::EN::GivenNames. See also scripts/import.sh for the order in which they must be run.
This module is available as a Unix-style distro (*.tgz).
See http://savage.net.au/Perl-modules.html for details.
See http://savage.net.au/Perl-modules/html/installing-a-module.html for help on unpacking and installing.
new(...) returns an object of type Lingua::EN::GivenNames::Database::Import.
This is the class's constructor.
Usage: Lingua::EN::GivenNames::Database::Import -> new().
This module is a sub-class of Lingua::EN::GivenNames::Database and consequently inherits its methods.
Extract the derivations from 1 page of either female or male English given names, and write them to data/derivations.raw.
Each method call opens this file in append mode ('>>'), so if you wish to start from scratch, you must delete the file before scripts/extract.derivations.pl is run. See scripts/import.sh for details.
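The effect of append mode can be seen in a small sketch. It is not taken from the module: the temp directory, the append_line() and count_lines() helpers and the sample lines are all hypothetical, and stand in for repeated calls writing to data/derivations.raw.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical stand-in for data/derivations.raw, kept in a temp directory.
my $dir       = tempdir(CLEANUP => 1);
my $file_name = File::Spec->catfile($dir, 'derivations.raw');

# Append one line per call, as a method opening the file with '>>' would.
sub append_line
{
    my($name, $line) = @_;

    open(my $fh, '>>', $name) || die "Can't open $name: $!";
    print $fh "$line\n";
    close $fh;
}

# Count the lines currently in the file.
sub count_lines
{
    my($name) = @_;

    open(my $fh, '<', $name) || die "Can't open $name: $!";
    my @line = <$fh>;
    close $fh;

    return scalar @line;
}

append_line($file_name, 'first run');
append_line($file_name, 'second run');

my $before = count_lines($file_name); # Output of both runs has accumulated.

# Starting from scratch means deleting the file before the next run.
unlink $file_name;
append_line($file_name, 'fresh run');

my $after = count_lines($file_name);

print "Lines before delete: $before. Lines after: $after\n";
```

Without the unlink, the third call would simply add a third line, which is why stale output survives between runs.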
Since the input data/*.htm files contain data in alphabetical order (usually), the output is also in order.
The output file is processed by parse_derivations().
Returns 0 to indicate success.
Takes a hashref, $item, and constructs a string which is the derivation of the given name whose components are the values of various keys in this hashref.
The string returned depends on which regexp was used to parse the input.
See "FAQ" in Lingua::EN::GivenNames for details.
Calls read_derivations() to read the file data/derivations.csv, which was created by sub parse_derivations().
It checks for duplicate records, and then writes all the data to the appropriate database tables.
See "Constructor and initialization".
Reads the file data/derivations.raw created by sub extract_derivations(), applies a set of regexps to each line, and writes data/derivations.csv.
Mismatches are written to data/mismatches.log, and a 1-line report is written to data/parse.log.
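As a rough illustration of this step, the sketch below applies a single pattern to a few lines and separates matches from mismatches. The pattern is hypothetical and is not the module's regexp 'a'. The field names are borrowed from the examples later in this document, and the arrays @parsed and @mismatch stand in for data/derivations.csv and data/mismatches.log.

```perl
use strict;
use warnings;

# Hypothetical pattern, loosely modelled on the documented fields.
my $regexp = qr/^(female|male)\.\s+([A-Z]+):\s+(\S+)\s+(form|spelling)\s+of\s+(.+)\s+(\S+),\s+((?:possibly\s+)?meaning)\s+"(.+?)\.?"/;

# Field names, in the order of the capture groups above.
my @field = qw(sex name kind form source original rating meaning);

my @line =
(
    'male. ALLISTAIR: Anglicized form of Scottish Gaelic Alastair, meaning "defender of mankind."',
    'male. ANTONY: Variant spelling of English Anthony, possibly meaning "invaluable."',
    'male. HENGIST: Old English name meaning "stallion."',
);

my(@parsed, @mismatch);

for my $line (@line)
{
    # In list context, a successful match returns the captured fields.
    my @value = $line =~ $regexp;

    if (@value)
    {
        my %item;
        @item{@field} = @value;

        push @parsed, \%item;
    }
    else
    {
        push @mismatch, $line;
    }
}

for my $item (@parsed)
{
    print join(', ', map{"$_ => '$$item{$_}'"} sort keys %$item), "\n";
}

print 'Mismatches: ', scalar @mismatch, "\n";
```

With only one pattern, the HENGIST line counts as a mismatch. The module applies a set of regexps, so a line is only logged as a mismatch when none of them matches.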
Clearly, this is where most of the work takes place.
This method is called by sub import_derivations(). It reads and validates data/derivations.csv.
Also, this method checks to ensure no data is missing, which would indicate a programming error in the handling of the output from the regexp processing phase.
Returns an arrayref.
$file_name is the file currently being processed (data/derivations.csv), and is used for error messages.
$derivation is a hashref keyed by columns in the input file, so unique entries in each column can be checked.
This method is called by sub read_derivations(). It performs a simple reasonableness check on each input line, and also logs, at level notice, all non-ASCII names.
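A check of this kind might look like the sketch below. The records are invented, and the rule that every field must be non-empty is an assumption about what the reasonableness check involves. The arrays @notice and @error stand in for the logger.

```perl
use strict;
use warnings;
use utf8;

binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical records, using two of the fields produced by the parser.
my @derivation =
(
    {name => 'ALLISTAIR', sex => 'male'},
    {name => 'ZOË', sex => 'female'},
    {name => 'PRU', sex => ''},
);

my(@notice, @error);

for my $item (@derivation)
{
    # Reasonableness check (an assumption): every field must be non-empty.
    for my $key (sort keys %$item)
    {
        push @error, "Missing $key for $$item{name}" if (length($$item{$key}) == 0);
    }

    # Any character outside the 7-bit ASCII range earns a notice.
    push @notice, "Non-ASCII name: $$item{name}" if ($$item{name} =~ /[^\x00-\x7F]/);
}

print "notice: $_\n" for @notice;
print "error: $_\n" for @error;
```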
$table is the name of the table to write, which is always names.
$derivation is an arrayref of derivations to write.
$foreign_key is a hashref of primary keys returned by "write_table($table, $item)" for each table other than the names table.
Called by sub import_derivations() and writes the names table.
$table is the name of the table to write.
$item is an arrayref of values to write.
Called by sub import_derivations() and writes all tables except the names table.
Returns a hashref of primary key ids for use as foreign keys when the names table is written.
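The relationship between the two writers can be sketched without a database. Everything below is illustrative: the %database hash stands in for SQLite, the table names kinds and sources and the *_id column names are hypothetical, and the write_table() shown here is a simplified stand-in for the method of the same name.

```perl
use strict;
use warnings;

# In-memory stand-in for the database: table name => list of rows.
my %database;

# Simplified stand-in for write_table(): store each value, and return
# a hashref mapping each value to its primary key id.
sub write_table
{
    my($table, $item) = @_;
    my %id;

    for my $value (@$item)
    {
        push @{$database{$table} }, $value;

        $id{$value} = scalar @{$database{$table} }; # Ids start from 1.
    }

    return \%id;
}

# Write the lookup tables first, keeping the ids they return.
my %foreign_key =
(
    kind   => write_table('kinds', ['Anglicized', 'Variant']),
    source => write_table('sources', ['Scottish Gaelic', 'English']),
);

my @derivation =
(
    {name => 'ALLISTAIR', kind => 'Anglicized', source => 'Scottish Gaelic'},
    {name => 'ANTONY', kind => 'Variant', source => 'English'},
);

# Then write the names, replacing text values with foreign keys.
my @names;

for my $item (@derivation)
{
    push @names,
    {
        name      => $$item{name},
        kind_id   => $foreign_key{kind}{$$item{kind} },
        source_id => $foreign_key{source}{$$item{source} },
    };
}

print "$$_{name}: kind_id $$_{kind_id}, source_id $$_{source_id}\n" for @names;
```

Writing the lookup tables before the names table is what makes the ids available when each name row is built.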
See "FAQ" in Lingua::EN::GivenNames.
The regexps in sub parse_derivations() split each line of data/derivations.raw into these fields, when using the regexp called 'a': form, kind, meaning, name, original, rating, sex and source.
These fields are described in "FAQ" in Lingua::EN::GivenNames. Other regexps have similar outputs.
1) 'male. ALLISTAIR: Anglicized form of Scottish Gaelic Alastair, meaning "defender of mankind."' becomes the hashref (with keys in alphabetical order, and text from data/derivations.raw):
{ form => 'form', kind => 'Anglicized', meaning => 'defender of mankind', name => 'ALLISTAIR', original => 'Alastair', rating => 'meaning', sex => 'male', source => 'Scottish Gaelic', }
The derivation is: Anglicized form of Scottish Gaelic Alastair, meaning "defender of mankind".
2) 'male. ANTONY: Variant spelling of English Anthony, possibly meaning "invaluable."' becomes:
{ form => 'spelling', kind => 'Variant', meaning => 'invaluable', name => 'ANTONY', original => 'Anthony', rating => 'possibly meaning', sex => 'male', source => 'English', }
The derivation is: Variant spelling of English Anthony, possibly meaning "invaluable".
In each case the derivation is built by sub generate_derivation($item) as:
qq|$$item{kind} $$item{form} of $$item{source} $$item{original}, $$item{rating} $$item{meaning}|
3) 'female. ANTONIA: Feminine form of Roman Latin Antonius, possibly meaning "invaluable." In use by the English, Italians and Spanish. Compare with another form of Antonia.' becomes:
{ form => 'form', kind => 'Feminine', meaning => 'invaluable', name => 'ANTONIA', original => 'Antonius', rating => 'possibly meaning', sex => 'female', source => 'Roman Latin', }
The derivation is: Feminine form of Roman Latin Antonius, possibly meaning "invaluable".
The derivation is built by sub generate_derivation($item) as:
qq|$$item{kind} $$item{form} of $$item{source} $$item{original}, $$item{rating} $$item{meaning}|
4) 'male. HENGIST: Old English name meaning "stallion." In English legend, this is the name of the brother of Horsa, and ruler of Kent. In Arthurian legend, he was killed by Uther Pendragon.' becomes:
{ form => 'name', kind => 'Old English', meaning => 'stallion', name => 'HENGIST', original => '-', rating => 'meaning', sex => 'male', source => '-', }
The derivation is: Old English name, meaning "stallion".
The derivation is built by sub generate_derivation($item) as:
qq|$$item{kind} $$item{form}, $$item{rating} $$item{meaning}|
5) 'female. PRU: Short form of English Prudence "cautious" and Prunella "little prune."' becomes:
{ form => 'form', kind => 'Short', meaning => '"cautious" and Prunella "little prune"', name => 'PRU', original => 'Prudence', rating => 'meaning', sex => 'female', source => 'English', }
The derivation is: Short form of English Prudence, meaning "cautious" and Prunella "little prune".
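The two templates shown in the examples above can be combined into one function. This is a sketch, not the module's code: choosing the template by testing whether source is '-' is an assumption based on example 4. The templates are used exactly as printed, so the output does not include the double quotes around the meaning which appear in the derivations listed above.

```perl
use strict;
use warnings;

# A sketch only: pick a template depending on whether a source is present.
# Treating '-' as the "no source" marker follows example 4; the test
# itself is an assumption.
sub generate_derivation
{
    my($item) = @_;

    return $$item{source} eq '-'
        ? qq|$$item{kind} $$item{form}, $$item{rating} $$item{meaning}|
        : qq|$$item{kind} $$item{form} of $$item{source} $$item{original}, $$item{rating} $$item{meaning}|;
}

my @item =
(
    {
        form => 'form', kind => 'Anglicized', meaning => 'defender of mankind', name => 'ALLISTAIR',
        original => 'Alastair', rating => 'meaning', sex => 'male', source => 'Scottish Gaelic',
    },
    {
        form => 'name', kind => 'Old English', meaning => 'stallion', name => 'HENGIST',
        original => '-', rating => 'meaning', sex => 'male', source => '-',
    },
);

print "$$_{name}: ", generate_derivation($_), "\n" for @item;
```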
See "References" in Lingua::EN::GivenNames.
Email the author, or log a bug on RT:
https://rt.cpan.org/Public/Dist/Display.html?Name=Lingua::EN::GivenNames.
Lingua::EN::GivenNames was written by Ron Savage <ron@savage.net.au> in 2012.
Home page: http://savage.net.au/index.html.
Australian copyright (c) 2012 Ron Savage.
All Programs of mine are 'OSI Certified Open Source Software'; you can redistribute them and/or modify them under the terms of The Artistic License, a copy of which is available at: http://www.opensource.org/licenses/index.html