NAME

LaTeX::Authors - Perl extension to extract authors and laboratories in a LaTeX file

SYNOPSIS

Extraction from a string with latex commands:

        use LaTeX::Authors;
        use strict;
        # Single quotes keep the backslash of the LaTeX command literal.
        my $tex_string = '\documentclass...';
        my @article = router($tex_string);
        my $string_xml = string_byauthors_xml(@article);
        print $string_xml;

Extraction from a latex file:

        use LaTeX::Authors;
        use strict;
        my $file = shift;
        my $tex_string = load_file_string($file);
        my @article = router($tex_string);
        my $string_xml = string_byauthors_xml(@article);
        print $string_xml;

Extraction from a directory with latex files:

        use LaTeX::Authors;
        use strict;        
        my $directory = shift;
        #my $error= un_archive($directory);
        my $file = find_tex_file($directory);
        my $tex_string = load_file_string($file);
        my @article = router($tex_string);
        my $string_xml =  string_byauthors_xml(@article);
        print $string_xml;

                

DESCRIPTION

LaTeX::Authors tries to find the authors and laboratories in a LaTeX file. The output is an XML or HTML string. This is an example of the XML output:

 <article>
   <item>
      <author>author1</author>
      <labo>lab1</labo>
      <labo>lab2</labo>
   </item>
   <item>
     ...
   </item>
 </article>

The module tries to find commands such as \author and \affiliation in the file. For physics articles, it also tries to find a collaboration name, which helps it handle more unusual ways of presenting the author list. It is designed especially for physics articles, which can have hundreds of authors.

It can work on the following inputs:

- an archive file (tar, zip...), which is useful for arXiv files (function un_archive)
- a directory with LaTeX files (function find_tex_file)
- a LaTeX file (function load_file_string)
- a string (function router)

For the output it can produce:

- an XML string
  - by author: author1 lab1 lab2 (string_byauthors_xml)
  - by laboratory: lab1 author1 author2 (string_bylabs_xml)
- an HTML string
  - by author (string_byauthors_html)
  - by lab (string_bylabs_html)

FUNCTION

un_archive - Uncompress, untar or unzip a file in a directory

Take the archive file and uncompress it (useful for arXiv files).

   my $error = un_archive($directory);

find_tex_file - Try to find the main TeX file in a directory with multiple files

   my $texfile = find_tex_file($directory);

load_file_string - Load a file and put its content into a string

   my $string = load_file_string($file);

It also deletes the LaTeX comments (%...).

router - Try to select the appropriate function to extract the authors and laboratories, and return an array with the authors and laboratories found in the LaTeX file.

   @article = router($string);

found_collaboration - Try to find a collaboration name

Useful for physics articles, where there is often a collaboration name. The format of the author list can be determined from the collaboration name. Used by the router function.

delete_comment - Delete TeX comments (%) from a string

   my $string_out = delete_comment($string_in);
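To make the idea of comment removal concrete, here is a minimal standalone sketch. It is not the module's implementation: the name strip_tex_comments is hypothetical, and the sketch assumes that a comment starts at any % not preceded by a backslash and runs to the end of the line. It does not handle a doubled backslash before % or verbatim environments.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only (not part of LaTeX::Authors).
sub strip_tex_comments {
    my ($text) = @_;
    # Delete an unescaped % and the rest of its line; the newline is kept.
    $text =~ s/(?<!\\)%.*$//mg;
    return $text;
}

my $in  = "50\\% done % a note\nnext line % another note\n";
my $out = strip_tex_comments($in);
print $out;
```

The escaped \% in the first line survives, while both trailing comments are dropped.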

bichop - Double end chop: remove the first and the last character of a string

After

   my $string_in = bichop("{aaa}");

$string_in contains:

   "aaa"
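A double-end chop can be sketched in a few lines of plain Perl. The function name bichop_sketch is hypothetical and the body is only an assumption about the behaviour described above, not the module's code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for bichop: drop the first and the last character.
sub bichop_sketch {
    my ($s) = @_;
    # With fewer than two characters nothing remains after chopping both ends.
    return '' if length($s) < 2;
    return substr($s, 1, -1);
}

print bichop_sketch('{aaa}'), "\n";   # prints aaa
```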

greplatexcom - Get all the occurrences of a LaTeX command

   @l_section = greplatexcom("section",["title"],$string);
   for $s (@l_section) {print $s->{title} };

Optional arguments can be described with "[name]". See this example:

   @class = greplatexcom("documentclass",[["args"],"class"],$string);
   print $class[0]->{class} ;

With \documentclass[xyz]{abc}

  $class[0]->{args} = xyz
  $class[0]->{class} = abc
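The mapping from an optional [..] argument and a mandatory {..} argument to named fields can be illustrated with a plain regular expression. This is only a sketch under stated assumptions: the helper name parse_documentclass_sketch is hypothetical, it handles a single \documentclass command, and it does not support nested braces. The module itself may work differently (Text::Balanced is listed under SEE ALSO).

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only (not part of LaTeX::Authors).
sub parse_documentclass_sketch {
    my ($text) = @_;
    # Optional [args] followed by mandatory {class}; no nested braces.
    return unless $text =~ /\\documentclass(?:\[([^\]]*)\])?\{([^}]*)\}/;
    return { args => $1, class => $2 };
}

my $info = parse_documentclass_sketch('\documentclass[xyz]{abc}');
print "$info->{args} $info->{class}\n";   # prints xyz abc
```

When the optional argument is absent, the args field is undefined.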

theenv - Get the contents of a LaTeX environment

   $abstract_string = theenv("abstract",$string);

theenv returns the contents of the environment "abstract".

For example, if:

   $string = '\begin{abstract} xyz... \end{abstract}';

then after theenv, $abstract_string contains the string:

   xyz...
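Extracting the body of an environment can be sketched with a non-greedy match between \begin{name} and \end{name}. The helper env_contents_sketch is hypothetical, and it assumes that environments with the same name are not nested. Unlike the description above, it leaves the surrounding whitespace in place.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only (not part of LaTeX::Authors).
sub env_contents_sketch {
    my ($name, $text) = @_;
    # /s lets "." match newlines, so multi-line environments are captured.
    my ($body) = $text =~ /\\begin\{\Q$name\E\}(.*?)\\end\{\Q$name\E\}/s;
    return $body;   # undef when the environment is not found
}

my $body = env_contents_sketch('abstract', '\begin{abstract} xyz... \end{abstract}');
print "[$body]\n";
```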

theenvs - Get the contents of all the occurrences of a LaTeX environment

   @array = theenvs("sloppypar",$string);

theenvs returns the contents of all the "sloppypar" environments.

greplatexenv - Get all the occurrences of a LaTeX environment

   @a = greplatexenv("letter",["to"],$string);

greplatexenv returns a list of all the occurrences of the "letter" environment, storing its first argument in the "to" field and its contents in the "env" field.

newcommand - Return a hash with all the "newcommand" occurrences

   %listnewcom = newcommand($string);

If you have

   $string = '\newcommand[xyz]{abc}';

then after newcommand:

   $listnewcom{xyz} = "abc";

list_index - Return a hash with all the occurrences of a command

For example with:

   my $command_name = "command";
   %list = list_index($command_name,$string);

   \command[index]{xyz...} -> $list{index} = "xyz...";

This generalizes the newcommand function to any command.

accent - Transform accented LaTeX characters into standard characters

   my $string_out = accent($string_in);

string_byauthors_xml - Return a string with XML tags containing all the authors and labs found in an article

 my $string = string_byauthors_xml(@article);

 <article>
   <item>
      <author>author1</author>
      <labo>lab1</labo>
      <labo>lab2</labo>
   </item>
   <item> 
     ...  
   </item>   
 </article>

string_onlyauthors_xml - Return a string with XML tags containing all the authors found in an article

 my $string = string_onlyauthors_xml(@article);
 
 <article>
     <author>author1</author>
     <author>author2</author>
     ...   
 </article>

author_to_lab - Convert the author array to a lab array

   my @array_lab = author_to_lab(@array_author);

   (author1, lab1, lab2)(author2, lab1, lab3) -> (lab1,author1,author2)(lab2,author1)(lab3,author2)
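The inversion shown above can be sketched as follows. This is an illustration only: the function name invert_author_rows is hypothetical, and the data layout (one array reference per author, holding the author name followed by its labs) is an assumption, since the internal structure of @article is not documented here.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only (not part of LaTeX::Authors).
# Input:  list of [author, lab, lab, ...]
# Output: list of [lab, author, author, ...], labs in order of first appearance.
sub invert_author_rows {
    my (@by_author) = @_;
    my (@lab_order, %authors_of);
    for my $row (@by_author) {
        my ($author, @labs) = @$row;
        for my $lab (@labs) {
            push @lab_order, $lab unless exists $authors_of{$lab};
            push @{ $authors_of{$lab} }, $author;
        }
    }
    return map { [ $_, @{ $authors_of{$_} } ] } @lab_order;
}

my @by_lab = invert_author_rows(
    [ 'author1', 'lab1', 'lab2' ],
    [ 'author2', 'lab1', 'lab3' ],
);
print join(',', @$_), "\n" for @by_lab;
```

For the two rows above, this prints lab1,author1,author2 then lab2,author1 then lab3,author2, matching the mapping in the description.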

string_bylabs_xml - Return a string with XML tags containing all the labs and authors found in an article

   my $xml_string = string_bylabs_xml(@article);

 <article>
   <item>
      <labo>lab1</labo>
      <author>author1</author>
      <author>author2</author>
   </item>  
   <item>   
     ...  
   </item>  
 </article>  

string_onlylabs_xml - Return a string with XML tags containing all the labs found in an article

   my $string = string_onlylabs_xml(@article);

 <article>
     <labo>lab1</labo>
     <labo>lab2</labo> 
     ...  
 </article> 

string_byauthors_html - Return a string with all the authors and labs using HTML tags

   my $string_out = string_byauthors_html(@article);

                <hr>
                author1 
                <p>
                <ul>
                  <li> lab1
                  <li> lab2
                </ul>
                <p>
                 

string_bylabs_html - Return a string with all the laboratories and their authors using HTML tags

                <hr>
                lab1 
                <p>
                <ul>
                  <li> author1
                  <li> author2
                </ul>
                <p>
                

AUTHOR

Christian Rossi (<rossi@in2p3.fr> and <rossi@loria.fr>)

SEE ALSO

perl, latex, Text::Balanced.