NAME

update_Text_Corpus_CNN.pl - Script to update CNN news article corpus.

SYNOPSIS

  update_Text_Corpus_CNN.pl [-d corpusDirectory -c -t -v -h]

DESCRIPTION

The script update_Text_Corpus_CNN.pl creates or updates a temporary corpus of CNN news articles for personal research and for testing information processing techniques. Before using the script, read the CNN Interactive Service Agreement and make sure your use complies with it.

All errors and warnings are logged using Log::Log4perl to the file corpusDirectory/log.txt.

OPTIONS

`-d corpusDirectory`

The option -d sets the cache directory for the corpus of documents. If the directory does not exist, it will be created. The default is a directory named 'corpus_cnn' in the current working directory.
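The directory handling described for -d can be sketched as follows. This is a minimal illustration under stated assumptions, not the script's actual code: the helper name `resolve_corpus_directory` is hypothetical, and only the core modules Cwd, File::Path, and File::Spec are used.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Path qw(make_path);
use File::Spec;

# Hypothetical helper (not taken from the script): returns the corpus
# directory, falling back to 'corpus_cnn' in the current working
# directory when no -d value was supplied, and creates it if missing.
sub resolve_corpus_directory {
    my ($requested) = @_;

    my $directory = defined $requested
        ? $requested
        : File::Spec->catdir(getcwd(), 'corpus_cnn');

    # make_path also creates any missing parent directories.
    make_path($directory) unless -d $directory;

    return $directory;
}
```

Calling `resolve_corpus_directory()` with no argument mirrors running the script without -d, while passing a path mirrors `-d corpusDirectory`.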

`-t`

If the option -t is present, parsing tests will be performed on all the documents in the cache.

`-v`

If the option -v is present, then after each new document is fetched a message is logged stating the number of documents remaining to fetch and the approximate time to completion.
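One simple way to produce the kind of estimate that -v reports is to multiply the average fetch time so far by the number of documents remaining. The sketch below assumes that approach; the documentation does not say how the script actually computes its estimate, and the function names and message wording are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: estimates the seconds left by multiplying the
# average fetch time so far by the number of documents remaining.
sub estimate_seconds_remaining {
    my ($elapsed_seconds, $fetched, $remaining) = @_;
    return if $fetched <= 0;    # nothing fetched yet, so no estimate
    return ($elapsed_seconds / $fetched) * $remaining;
}

# Hypothetical message format; the script's real log text is not documented.
sub progress_message {
    my ($elapsed_seconds, $fetched, $remaining) = @_;
    my $eta = estimate_seconds_remaining($elapsed_seconds, $fetched, $remaining);
    return "$remaining documents remaining" unless defined $eta;
    return sprintf('%d documents remaining, about %.0f seconds to completion',
        $remaining, $eta);
}
```

A running average like this is cheap to maintain, but it reacts slowly if fetch times change partway through a run.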

`-h`

If the option -h is present, the documentation is printed.

AUTHOR

 Jeff Kubina <jeff.kubina@gmail.com>

COPYRIGHT

The full text of the license can be found in the LICENSE file included with this module.

KEYWORDS

cnn, cable news network, english corpus, information processing
