
LaTeX::Authors - Perl extension to extract authors and laboratories in a LaTeX file

Extraction from a string with latex commands:
use LaTeX::Authors;
use strict;
my $tex_string = '\documentclass...';
my @article = router($tex_string);
my $string_xml = string_byauthors_xml(@article);
print $string_xml;

Extraction from a LaTeX file:
use LaTeX::Authors;
use strict;
my $file = shift;
my $tex_string = load_file_string($file);
my @article = router($tex_string);
my $string_xml = string_byauthors_xml(@article);
print $string_xml;

Extraction from a directory with LaTeX files:
use LaTeX::Authors;
use strict;
my $directory = shift;
#my $error= un_archive($directory);
my $file = find_tex_file($directory);
my $tex_string = load_file_string($file);
my @article = router($tex_string);
my $string_xml = string_byauthors_xml(@article);
print $string_xml;

LaTeX::Authors tries to find the authors and laboratories in a LaTeX file. The output is an XML or HTML string. This is an example of the XML output:
<article>
<item>
<author>author1</author>
<labo>lab1</labo>
<labo>lab2</labo>
</item>
<item>
...
</item>
</article>
The module looks for commands such as \author and \affiliation in the file. For physics articles it also tries to find a collaboration name, which helps it handle the more exotic ways of presenting an author list. It is designed especially for physics articles, which can have hundreds of authors.
It can work on the following inputs:
- an archive file (tar, zip...), which is useful for arXiv files (function un_archive)
- a directory with LaTeX files (function find_tex_file)
- a LaTeX file (function load_file_string)
- a string (function router)
It can produce the following outputs:
- an XML string
  - by author: author1 lab1 lab2 (string_byauthors_xml)
  - by laboratory: lab1 author1 author2 (string_bylabs_xml)
- an HTML string
  - by author (string_byauthors_html)
  - by lab (string_bylabs_html)

un_archive - Uncompress, untar or unzip a file in a directory
Take the archive file and uncompress it (useful for arXiv files).
my $error = un_archive($directory);
find_tex_file - Try to find the main tex file in a directory with multiple files
my $texfile = find_tex_file($directory);
load_file_string - Load a file and put its content into a string
my $string = load_file_string($file);
The LaTeX comments (%...) are also deleted.
router - Try to select the right function to extract the authors and laboratories, and return an array with the authors and laboratories found in the LaTeX file
@article = router($string);
found_collaboration - Try to find a collaboration name
Useful for physics articles, where there is often a collaboration name. The format of the author list can be determined from the collaboration name. Used by the router function.
delete_comment - Delete tex comments (%) in a string
my $string_out = delete_comment($string_in);
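The idea behind comment deletion can be sketched with a single substitution. This is only an illustration, not the module's own code: the helper name strip_tex_comments is hypothetical, and the sketch assumes that a percent sign preceded by a backslash (as in 50\%) must be kept.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's delete_comment:
# remove everything from an unescaped % to the end of the line.
sub strip_tex_comments {
    my ($text) = @_;
    # (?<!\\) leaves escaped percent signs such as "50\%" untouched.
    $text =~ s/(?<!\\)%[^\n]*//g;
    return $text;
}

my $in = "\\author{A. Author} % first author\nGrowth of 50\\% % note\n";
print strip_tex_comments($in);
```

The sketch keeps the newline at the end of each line and does not handle a double backslash immediately followed by a comment (\\%), so it should be read as a starting point only.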
bichop - Double-ended chop
After
my $string_in = bichop("{aaa}");
$string_in contains:
"aaa"
greplatexcom - Get all the occurrences of a latex command
@l_section = greplatexcom("section",["title"],$string);
for my $s (@l_section) { print $s->{title} };
Optional arguments can be described with "[name]". See this example:
@class = greplatexcom("documentclass",[["args"],"class"],$string);
print $class[0]->{class} ;
With \documentclass[xyz]{abc}
$class[0]->{args} = xyz
$class[0]->{class} = abc
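The mapping from \documentclass[xyz]{abc} to a pair of fields can be illustrated with a regular expression. The helper grep_command and its field names opt and arg are hypothetical and are not part of LaTeX::Authors. The sketch assumes a single optional argument and a mandatory argument without nested braces.

```perl
use strict;
use warnings;

# Hypothetical helper (not greplatexcom itself): find every
# \NAME[optional]{mandatory} and return one hash ref per match.
sub grep_command {
    my ($name, $text) = @_;
    my $cmd = quotemeta($name);
    my @found;
    # (?![A-Za-z]) stops \section from matching \sectionmark.
    while ($text =~ /\\$cmd(?![A-Za-z])\s*(?:\[([^\]]*)\])?\s*\{([^{}]*)\}/g) {
        # opt is undef when the optional argument is absent.
        push @found, { opt => $1, arg => $2 };
    }
    return @found;
}

my @class = grep_command('documentclass', '\documentclass[xyz]{abc}');
print "$class[0]{opt} $class[0]{arg}\n";   # prints "xyz abc"
```

A real parser has to cope with nested braces and several arguments, which a single regular expression like this one does not.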
theenv - Get the contents of a latex environment
$abstract_string = theenv("abstract",$string);
theenv returns the contents of the environment "abstract".
For example, if:
$string = '\begin{abstract} xyz... \end{abstract}';
then after theenv, $abstract_string contains the string:
xyz...
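Extracting the contents of an environment can be sketched as a non-greedy match between \begin{...} and \end{...}. The helper env_contents is hypothetical and only illustrates the idea. Trimming the surrounding whitespace is an assumption made here to reproduce the result shown above.

```perl
use strict;
use warnings;

# Hypothetical helper (not theenv itself): return the contents of the
# first \begin{ENV} ... \end{ENV} block, or nothing if there is none.
sub env_contents {
    my ($env, $text) = @_;
    my $name = quotemeta($env);
    # /s lets "." match newlines, so multi-line environments work.
    if ($text =~ /\\begin\{$name\}(.*?)\\end\{$name\}/s) {
        my $body = $1;
        $body =~ s/^\s+|\s+$//g;   # assumption: trim surrounding spaces
        return $body;
    }
    return;
}

# Single quotes keep the backslashes literal.
my $string = '\begin{abstract} xyz... \end{abstract}';
print env_contents('abstract', $string), "\n";   # prints "xyz..."
```

The single quotes matter: in a double-quoted Perl string, \b and \e are escape sequences, so "\begin" and "\end" would not contain the expected text.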
theenvs - Get the contents of all the latex environments with a given name
@array = theenvs("sloppypar",$string);
theenvs returns the contents of all the "sloppypar" environments.
greplatexenv - Get all occurrences of a latex environment
@a = greplatexenv("letter",["to"],$string);
greplatexenv returns a list of all the occurrences of the environment "letter", reading its first argument into the "to" field and saving its contents in the "env" field.
newcommand - Return a hash with all the "newcommand" occurrences
%listnewcom = newcommand($string);
If you have
$string = '\newcommand[xyz]{abc}';
then after newcommand:
$listnewcom{xyz} = "abc";
list_index - Return a hash with all the occurrences of a command
For example, with:
my $command_name = "command";
%list = list_index($command_name,$string);
\command[index]{xyz...} -> $list{index} = "xyz...";
This generalizes the function newcommand to any command.
accent - Transform accented latex characters into standard characters
my $string_out = accent($string_in);
string_byauthors_xml - Return a string with xml tags containing all the authors and labs found in an article
my $string = string_byauthors_xml(@article);
<article>
<item>
<author>author1</author>
<labo>lab1</labo>
<labo>lab2</labo>
</item>
<item>
...
</item>
</article>
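The shape of this output can be reproduced with a short serializer. The layout of the array is not documented here, so the sketch assumes one array reference per author, of the form [ author, lab, lab, ... ]. The names authors_to_xml and xml_escape are hypothetical, and escaping the text is a choice of this sketch, not a documented behaviour of the module.

```perl
use strict;
use warnings;

# Escape the characters that are special in XML text.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Hypothetical serializer, assuming entries of the form
# [ author, lab, lab, ... ]; the module's real layout may differ.
sub authors_to_xml {
    my @by_author = @_;
    my $xml = "<article>\n";
    for my $entry (@by_author) {
        my ($author, @labs) = @$entry;
        $xml .= "<item>\n";
        $xml .= '<author>' . xml_escape($author) . "</author>\n";
        $xml .= '<labo>' . xml_escape($_) . "</labo>\n" for @labs;
        $xml .= "</item>\n";
    }
    return $xml . "</article>\n";
}

print authors_to_xml([ 'author1', 'lab1', 'lab2' ]);
```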
string_onlyauthors_xml - Return a string with xml tags containing all the authors found in an article
my $string = string_onlyauthors_xml(@article);
<article>
<author>author1</author>
<author>author2</author>
...
</article>
author_to_lab - Convert the author array to a lab array
my @array_lab = author_to_lab(@array_author);
(author1, lab1, lab2)(author2, lab1, lab3) -> (lab1,author1,author2)(lab2,author1)(lab3,author2)
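The conversion shown above is an inversion of a list of (author, labs) groups into a list of (lab, authors) groups. A minimal sketch follows, assuming each group is an array reference. The helper invert_author_list is hypothetical, and the real data layout used by the module may differ.

```perl
use strict;
use warnings;

# Assumed layout (not confirmed by the module): each entry is
# [ author, lab, lab, ... ]. Returns entries [ lab, author, author, ... ]
# in order of first appearance of each lab.
sub invert_author_list {
    my @by_author = @_;
    my (@order, %members);
    for my $entry (@by_author) {
        my ($author, @labs) = @$entry;
        for my $lab (@labs) {
            push @order, $lab unless exists $members{$lab};
            push @{ $members{$lab} }, $author;
        }
    }
    return map { [ $_, @{ $members{$_} } ] } @order;
}

my @by_lab = invert_author_list(
    [ 'author1', 'lab1', 'lab2' ],
    [ 'author2', 'lab1', 'lab3' ],
);
print join(',', @$_), "\n" for @by_lab;
# prints:
# lab1,author1,author2
# lab2,author1
# lab3,author2
```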
string_bylabs_xml - Return a string with xml tags containing all the labs and authors found in an article
my $xml_string = string_bylabs_xml(@article);
<article>
<item>
<labo>lab1</labo>
<author>author1</author>
<author>author2</author>
</item>
<item>
...
</item>
</article>
string_onlylabs_xml - Return a string with xml tags containing all the labs found in an article
my $string = string_onlylabs_xml(@article);
<article>
<labo>lab1</labo>
<labo>lab2</labo>
...
</article>
string_byauthors_html - Return a string with all the authors and labs using html tags
my $string_out = string_byauthors_html(@article);
<hr>
author1
<p>
<ul>
<li> lab1
<li> lab2
</ul>
<p>
string_bylabs_html - Return a string with all the laboratories and their authors using html tags
<hr>
lab1
<p>
<ul>
<li> author1
<li> author2
</ul>
<p>

Christian Rossi (<rossi@in2p3.fr> and <rossi@loria.fr>)
