Wray Buntine > Alvis-Convert-0.4 > wikipedia2alvis.pl


NAME

    wikipedia2alvis.pl - Wikipedia XML dump to Alvis XML converter

SYNOPSIS

    wikipedia2alvis.pl [options] [Wikipedia XML dump file]

  Options:

    --out-dir                      output directory
    --namespaces                   list of namespaces to extract
    --N-per-out-dir                # of records per output directory
    --[no-]original                include original document?
    --[no-]expand-templates-fully  do we try to expand templates fully?
    --[no-]dump-templates          do we dump the templates?
    --template-dump-file           the file to dump the templates to
    --[no-]convert-via-html        do we convert via HTML or directly to Alvis? 
    --date                         the date of the Wikipedia dump
    --[no-]dump-category-graph     do we dump the category graph?
    --category-graph-dump-file     the file to dump the category graph to
    --category-word                category namespace identifier
    --root-category                root category identifier
    --template-word                template namespace identifier
    --language                     the language of the Wikipedia dump
    --help                         brief help message
    --man                          full documentation
    --[no]warnings                 warnings output flag

OPTIONS

--out-dir
    Sets the output directory. Default value: '.'.
--namespaces
    Sets the namespaces whose records are extracted, given as a
    ','-separated list. Namespace names must match the identifiers exactly.
    Articles are always extracted. Default value: '', i.e. articles only.
--N-per-out-dir
    Sets the number of records per output directory. Default value: 1000.
--[no-]original
    Shall the original document be included in the output? Default
    value: no.
--[no-]expand-templates-fully
    Do we try to expand templates fully or do we simply insert a list of
    the template parameter values given in the call? Default value: no.
--[no-]dump-templates
    Do we dump the templates onto disk in a loadable format? 
    Default value: no.
--template-dump-file
    The name of the (possible) template dump file. Default value:
    'Templates.storable'.
--[no-]convert-via-html
    Do we sacrifice speed for (possibly better) quality by converting from
    Wikitext to Alvis XML via an intermediate HTML version?
    Default value: yes.
--language
    The language of the Wikipedia dump. Affects category and template
    extraction. Possible values: 'en' (English), 'fr' (French), 'sl'
    (Slovenian). Default value: 'en'.
--category-word
    The identifier for the category namespace. Overridden by '--language'.
    Default value: 'Category'.
--root-category
    The identifier for the root category of the category graph. 
    Overridden by '--language'. Default value: 'fundamental'.
--template-word
    The identifier for the template namespace. Overridden by '--language'.
    Default value: 'Template'.
--date
    The date of the Wikipedia dump as YYYYMMDD. Default value: undefined 
    (means: use current date).
--[no-]dump-category-graph
    Do we dump the category graph onto disk in a loadable format?
    Default value: yes.
--category-graph-dump-file
    The name of the (possible) category graph dump file. Default value: 
    'CategoryGraph.storable'.
--help
    Prints a brief help message and exits.
--man
    Prints the manual page and exits.
--[no]warnings
    Output (or suppress) warnings. Default value: yes.
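
    The options above can be combined as in the following sketch. It is an
    illustration only: the dump file name, output directory, dump date and
    namespace list are hypothetical values, and it assumes that options
    taking a value accept the usual '--option value' form. The script is run
    only if it is on the PATH and the dump file exists; otherwise the
    command line is just printed.

```shell
#!/bin/sh
# Sketch: assemble a wikipedia2alvis.pl command line from the documented options.
# All values below are hypothetical examples, not defaults of the script.
DUMP_FILE="enwiki-pages-articles.xml"   # hypothetical Wikipedia XML dump file
OUT_DIR="alvis-records"                 # hypothetical output directory
DUMP_DATE="20070101"                    # hypothetical dump date, as YYYYMMDD
NAMESPACES="Help,Portal"                # example namespace identifiers, ','-separated

# Put the options and the dump file into the positional parameters.
# --no-original restates the documented default explicitly.
set -- \
    --out-dir "$OUT_DIR" \
    --namespaces "$NAMESPACES" \
    --N-per-out-dir 500 \
    --language en \
    --date "$DUMP_DATE" \
    --no-original \
    "$DUMP_FILE"

CMD="wikipedia2alvis.pl $*"

if command -v wikipedia2alvis.pl >/dev/null 2>&1 && [ -f "$DUMP_FILE" ]; then
    # Script is installed and the dump exists: run the conversion.
    wikipedia2alvis.pl "$@"
else
    # Otherwise only show the command that would be run.
    echo "$CMD"
fi
```

    Articles are extracted regardless of '--namespaces', so the list only
    needs to name the additional namespaces.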

DESCRIPTION

    Converts the articles in the Wikipedia XML dump to Alvis records.
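
    Because records are spread over output directories according to
    '--N-per-out-dir', the number of directories to expect can be estimated
    with a ceiling division. The sketch below assumes each directory is
    filled to the limit before the next one is started, which the option
    description suggests but does not state; the record count is a
    hypothetical figure.

```shell
#!/bin/sh
# Sketch: estimate how many output directories a conversion produces.
RECORDS=2500      # hypothetical number of extracted records
N_PER_DIR=1000    # documented default of --N-per-out-dir

# Ceiling division using integer arithmetic.
DIRS=$(( (RECORDS + N_PER_DIR - 1) / N_PER_DIR ))
echo "$DIRS"      # prints 3
```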