
I18N::LangTags::List -- tags and names for human languages

use I18N::LangTags::List;
print "Parlez-vous... ", join(', ',
    I18N::LangTags::List::name('elx')   || 'unknown_language',
    I18N::LangTags::List::name('ar-Kw') || 'unknown_language',
    I18N::LangTags::List::name('en')    || 'unknown_language',
    I18N::LangTags::List::name('en-CA') || 'unknown_language',
), "?\n";
prints:
Parlez-vous... Elamite, Kuwait Arabic, English, Canadian English?

This module provides a function I18N::LangTags::List::name( langtag ) that takes a language tag (see I18N::LangTags) and returns the best attempt at an English name for it, or undef if it can't make sense of the tag.
The function I18N::LangTags::List::name(...) is not exported.
This module also provides a function I18N::LangTags::List::is_decent( langtag ) that returns true iff the language tag is syntactically valid and is for general use (like "fr" or "fr-ca", below). That is, it returns false for tags that are syntactically invalid and for tags, like "aus", that are listed in brackets below. This function is not exported.
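As a minimal sketch of how is_decent() might be used to screen input, the following checks four strings: the two general-use examples mentioned above, the bracketed tag "aus", and a locale-style string that is not a syntactically valid tag. The helper name describe_tag is hypothetical and introduced only for illustration; both module functions are called by their full names because neither is exported.

```perl
use strict;
use warnings;
use I18N::LangTags::List;

# describe_tag is a hypothetical helper, not part of the module.
# It pairs the is_decent() verdict with the English name, if any.
sub describe_tag {
    my ($tag) = @_;
    my $verdict = I18N::LangTags::List::is_decent($tag) ? 'decent' : 'not decent';
    my $name    = I18N::LangTags::List::name($tag) || 'unknown_language';
    return "$tag: $verdict ($name)";
}

print describe_tag($_), "\n" for ('fr', 'fr-ca', 'aus', 'en_US');
```

Only the truth value of is_decent() is used here; the sketch makes no assumption about which particular true value comes back.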
The map of tags to names that it uses is accessible as %I18N::LangTags::List::Name. It is the same as the list that follows in this documentation, which should be useful to you even if you don't use this module.
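Because the map is an ordinary package hash, it can be inspected directly. The sketch below lists every entry whose tag is a given primary tag or a subform of it. The helper name tags_with_prefix is hypothetical, and the sketch assumes the hash keys are stored in lowercase for these tags.

```perl
use strict;
use warnings;
use I18N::LangTags::List;

# tags_with_prefix is a hypothetical helper, not part of the module.
# It returns, in sorted order, the keys of %I18N::LangTags::List::Name
# that equal $prefix or begin with "$prefix-".
sub tags_with_prefix {
    my ($prefix) = @_;
    my @tags = grep { $_ eq $prefix or index($_, "$prefix-") == 0 }
        keys %I18N::LangTags::List::Name;
    return sort @tags;
}

for my $tag (tags_with_prefix('fr')) {
    printf "{%s} %s\n", $tag, $I18N::LangTags::List::Name{$tag};
}
```

Treat the hash as read-only in this kind of use, since name() consults the same data.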

Internet language tags, as defined in RFC 3066, are a formalism for denoting human languages. The two-letter ISO 639-1 language codes are well known (such as "en" for English), as are their forms when qualified by a country code ("en-US"). Less well-known are the arbitrary-length non-ISO codes (like "i-mingo"), and the recently (in 2001) introduced three-letter ISO 639-2 codes.
Remember these important facts:
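Language tags are not locale IDs. A locale ID is written with a "_" instead of a "-", almost always matches m/^\w\w_\w\w\b/, and means something different than a language tag. A language tag denotes a language. A locale ID denotes a language as used in a particular place, in combination with non-linguistic location-specific information such as what currency is used there. Locales also often denote character set information, as in "en_US.ISO8859-1".
Language tags are not country codes. The two are often distinct, as with the language tag "ja" for Japanese and the country code ".jp" for Japan.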
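To make the difference between a locale ID and a language tag concrete, here is a rough sketch that derives a language tag from a locale ID by discarding the character set part and swapping "_" for "-". The helper locale_to_langtag is hypothetical and deliberately simple. The syntax check it applies is an assumption for illustration, not the full RFC 3066 grammar.

```perl
use strict;
use warnings;
use I18N::LangTags::List;

# locale_to_langtag is a hypothetical helper, not part of the module.
# It drops any trailing ".charset" or "@modifier" part of a locale ID
# and swaps "_" for "-". It returns undef if the result does not look
# like a simple language tag.
sub locale_to_langtag {
    my ($locale) = @_;
    (my $tag = $locale) =~ s/[.\@].*\z//s;   # "en_US.ISO8859-1" -> "en_US"
    $tag =~ tr/_/-/;                         # "en_US" -> "en-US"
    return $tag =~ /\A[A-Za-z]{2,3}(?:-[A-Za-z0-9]{1,8})*\z/ ? $tag : undef;
}

my $tag = locale_to_langtag('en_US.ISO8859-1');
print "$tag: ", I18N::LangTags::List::name($tag) || 'unknown_language', "\n";
```

The conversion loses information by design: the currency, character set, and other location-specific details carried by the locale ID have no place in a language tag.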
The first part of each item is the language tag, between {...}. It is followed by an English name for the language or language-group. Language tags that I judge to be not for general use are bracketed.
This list is in alphabetical order by English name of the language.
eq Abkhaz
eq Adygei
(Artificial)
(Formerly "aka".)
(Historical)
NOT Algonquin!
NOT Aramaic!
eq Amis. eq 'Amis. eq Pangca.
Many forms are mutually unintelligible in spoken media. Notable forms: {ar-ae} UAE Arabic; {ar-bh} Bahrain Arabic; {ar-dz} Algerian Arabic; {ar-eg} Egyptian Arabic; {ar-iq} Iraqi Arabic; {ar-jo} Jordanian Arabic; {ar-kw} Kuwait Arabic; {ar-lb} Lebanese Arabic; {ar-ly} Libyan Arabic; {ar-ma} Moroccan Arabic; {ar-om} Omani Arabic; {ar-qa} Qatari Arabic; {ar-sa} Saudi Arabic; {ar-sy} Syrian Arabic; {ar-tn} Tunisian Arabic; {ar-ye} Yemen Arabic.
NOT Amharic! NOT Samaritan Aramaic!
eq Bable.
eq Athabaskan. eq Athapaskan. eq Athabascan.
(Formerly "ava".)
eq Zend
eq Azeri
Notable forms: {az-Arab} Azerbaijani in Arabic script; {az-Cyrl} Azerbaijani in Cyrillic script; {az-Latn} Azerbaijani in Latin script.
(Formerly "bam".)
eq Belarussian. eq Byelarussian. eq Belorussian. eq Byelorussian. eq White Russian. eq White Ruthenian. NOT Ruthenian!
eq Bangla.
eq Bichelamar.
eq Catalán. eq Catalonian.
Notable forms: {cel-gaulish} Gaulish (Historical)
(Historical?)
eq Tsalagi
(Historical) NOT Chibchan (which is a language family).
eq Nyanja. eq Chinyanja.
Many forms are mutually unintelligible in spoken media. Notable forms: {zh-Hans} Chinese, in simplified script; {zh-Hant} Chinese, in traditional script; {zh-tw} Taiwan Chinese; {zh-cn} PRC Chinese; {zh-sg} Singapore Chinese; {zh-mo} Macau Chinese; {zh-hk} Hong Kong Chinese; {zh-guoyu} Mandarin [Putonghua/Guoyu]; {zh-hakka} Hakka [formerly "i-hakka"]; {zh-min} Hokkien; {zh-min-nan} Southern Hokkien; {zh-wuu} Shanghaiese; {zh-xiang} Hunanese; {zh-gan} Gan; {zh-yue} Cantonese.
eq Chinook Wawa.
eq Old Church Slavonic.
eq Trukese. eq Chuuk. eq Truk. eq Ruk.
eq Corse.
NOT Creek! (Formerly "cre".)
NOT Cree!
eq Croat.
eq Nakota. eq Lakota.
Defined in RFC 2277, this is for tagging text (which must include English text, and might/should include text in other appropriate languages) that is emitted in a context where language-negotiation wasn't possible -- in SMTP mail failure messages, for example.
eq Maldivian. (Formerly "div".)
NOT Dogrib!
NOT Dogri!
eq Netherlander. Notable forms: {nl-nl} Netherlands Dutch; {nl-be} Belgian Dutch.
(Historical)
(Historical)
(Historical)
Notable forms: {en-au} Australian English; {en-bz} Belize English; {en-ca} Canadian English; {en-gb} UK English; {en-ie} Irish English; {en-jm} Jamaican English; {en-nz} New Zealand English; {en-ph} Philippine English; {en-tt} Trinidad English; {en-us} US English; {en-za} South African English; {en-zw} Zimbabwe English.
(Historical)
eq Anglo-Saxon. (Historical)
(Artificial)
(Formerly "ewe".)
eq Finno-Ugric. NOT Ugaritic!
Notable forms: {fr-fr} France French; {fr-be} Belgian French; {fr-ca} Canadian French; {fr-ch} Swiss French; {fr-lu} Luxembourg French; {fr-mc} Monaco French.
(Historical)
(Historical)
(Formerly "ful".)
NOT Scots!
eq Galician
(Formerly "lug".)
eq Ge'ez
Notable forms: {de-at} Austrian German; {de-be} Belgian German; {de-ch} Swiss German; {de-de} Germany German; {de-li} Liechtenstein German; {de-lu} Luxembourg German.
(Historical)
(Historical)
(Historical)
(Historical) (Until 15th century or so.)
(Since 15th century or so.)
Guaraní
eq Gwichin
eq Haitian Creole
Hawai'ian
(Formerly "iw".)
(Historical)
(Artificial)
(Formerly "ibo".)
(Formerly "in".)
(Artificial) NOT Interlingue!
(Artificial) NOT Interlingua!
A subform of "Eskimo".
A subform of "Eskimo".
(Historical)
(Historical)
Notable forms: {it-it} Italy Italian; {it-ch} Swiss Italian.
(NOT "jp"!)
(Formerly "jw" because of a typo.)
eq Greenlandic "Eskimo"