Lingua::EN::StopWordList - A sorted list of English stop words
    use Lingua::EN::StopWordList;

    my($ara_ref) = Lingua::EN::StopWordList -> new -> words;
Here's a complete program:
    use strict;
    use warnings;

    use Lingua::EN::StopWordList;

    my($count) = 0;

    print map{"@{[++$count]}: $_\n"} @{Lingua::EN::StopWordList -> new -> words};
Lingua::EN::StopWordList is a pure Perl module.
Its words() method returns a sorted arrayref of 659 English stop words.
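A common use of such a list is dropping stop words from a piece of text. The following is a minimal sketch, not part of this module's API: the helper remove_stop_words() is hypothetical, and the tokenisation (lowercase, split on anything that is not a letter or apostrophe) is an assumption you will probably want to adapt. It copies the arrayref into a hash once so each lookup is constant-time. If Lingua::EN::StopWordList is not installed, it falls back to a tiny hypothetical sample list so the sketch still runs.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lingua::EN::StopWordList: returns the
# tokens of $text that do not appear in the stop word arrayref $stop_ref.
sub remove_stop_words {
    my ($stop_ref, $text) = @_;

    # Copy the list into a hash once, for constant-time lookups.
    # Keys are lowercased in case the list contains mixed case.
    my %is_stop = map { lc($_) => 1 } @$stop_ref;

    # Naive tokenisation (an assumption): lowercase the text, then split
    # on anything that is not a letter or an apostrophe.
    return grep { length($_) && !$is_stop{$_} } split /[^a-z']+/, lc $text;
}

# Use the module's list when it is installed; otherwise fall back to a
# tiny hypothetical sample so this sketch still runs.
my $stop_ref = eval {
    require Lingua::EN::StopWordList;
    Lingua::EN::StopWordList->new->words;
} || [qw(a an and in is of on the to)];

print join(' ', remove_stop_words($stop_ref, 'The cat sat on the mat.')), "\n";
```

Building the hash on every call keeps the helper self-contained; if you filter many texts, build %is_stop once and reuse it.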
new(...) returns an object of type Lingua::EN::StopWordList.
This is the class's constructor.
Usage: Lingua::EN::StopWordList -> new.
This module is available as a Unix-style distro (*.tgz).
Install Lingua::EN::StopWordList as you would for any Perl module:
Perl
Run:
cpanm Lingua::EN::StopWordList
or run:
sudo cpan Lingua::EN::StopWordList
or unpack the distro, and then run one of:
    perl Build.PL
    ./Build
    ./Build test
    ./Build install
or
    perl Makefile.PL
    make (or dmake)
    make test
    make install
See http://savage.net.au/Perl-modules.html for details.
See http://savage.net.au/Perl-modules/html/installing-a-module.html for help on unpacking and installing.
new(): see "Constructor and initialization".
words(): returns the sorted arrayref of English stop words.
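A sorted list supports binary search, so membership can be tested without building a hash. The sketch below is an illustration, not part of the module: is_stop_word() is a hypothetical helper, and because this text does not say which ordering words() uses, the sketch re-sorts a copy once with Perl's default cmp ordering, which the search relies on. If you know the module's list is already in cmp order, you can skip that step. It falls back to a hypothetical sample list if the module is not installed.

```perl
use strict;
use warnings;

# Hypothetical helper: binary search for $word in an arrayref that is
# sorted in Perl's default string (cmp) order. Returns 1 if found, else 0.
sub is_stop_word {
    my ($sorted_ref, $word) = @_;
    my ($lo, $hi) = (0, $#$sorted_ref);

    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        my $cmp = $word cmp $sorted_ref->[$mid];
        return 1 if $cmp == 0;
        if ($cmp < 0) { $hi = $mid - 1 } else { $lo = $mid + 1 }
    }
    return 0;
}

# Use the module's list if installed, else a hypothetical sample.
my $words = eval {
    require Lingua::EN::StopWordList;
    Lingua::EN::StopWordList->new->words;
} || [qw(a about the which)];

# Re-sort a copy in cmp order, since binary search is only correct when
# the array is ordered the same way the comparison is done.
my @sorted = sort @$words;

for my $w (qw(the cat)) {
    printf "%s: %s\n", $w, is_stop_word(\@sorted, $w) ? 'stop word' : 'not a stop word';
}
```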
There is no such thing as a definitive list of stop words. For an important discussion, including 'phrase search', see the Wikipedia discussion of word lists.
I downloaded this module's list from the bottom of this page: http://www.translatum.gr/forum/index.php?topic=2476.0. It contains 659 words.
Other lists are available. For example, try http://jmlr.csail.mit.edu/papers/volume5/lewis04a/a11-smart-stop-list/english.stop, which contains 570 words.
Another good place to look is http://www.ranks.nl/resources/stopwords.html, but its English list contains only 174 words. Since Lingua::StopWords (below) also has 174 words in its English list, perhaps this is where that module got its words from. Lastly, ranks.nl has stop word lists for a whole range of languages.
Alternatively, just Google for references to various lists. Note, however, that these lists are usually very short.
Lingua::StopWords has only a short list of words (174). Also, its bug list goes back 3 years.
Lingua::EN::StopWords has only a short list of words (227). Also, this module is part of Lingua::EN::Segmenter, whose documentation is poor; even how it splits text is not documented. Lastly, its bug list goes back 6 years.
I could have offered to take over maintenance of either or both of those modules, but there are problems:
Lingua::StopWords ships with a set of sub-modules, with names like Lingua::StopWords::EN, but I'm not in a position to support its other languages if I put my module's English list into it.
Nevertheless, the fact that it supports 13 languages is definitely something in favour of Lingua::StopWords.
Lingua::EN::StopWords is part of a text-processing package I don't want to get involved with. Also, it has a long list of prerequisites (not listed on MetaCPAN until you view the makefile), which may well suit the purposes of Lingua::EN::Segmenter, but is overkill for just a stop word list.
Several other Perl modules, written for various purposes, either use one of the above, or have their own very short (as always) lists.
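List sizes alone say little about how two lists differ; a set difference shows which words one list has that another lacks. The sketch below is illustrative only: only_in_first() is a hypothetical helper, the comparison is case-insensitive by assumption, and it calls getStopWords('en') from Lingua::StopWords, which that module documents as returning a hashref keyed by word. If either module is not installed, the sketch falls back to small hypothetical sample lists so it still runs.

```perl
use strict;
use warnings;

# Hypothetical helper: sorted list of words in @$first_ref that are not in
# @$second_ref, comparing case-insensitively.
sub only_in_first {
    my ($first_ref, $second_ref) = @_;
    my %in_second = map { lc($_) => 1 } @$second_ref;
    my @only = grep { !$in_second{ lc $_ } } @$first_ref;
    return sort @only;
}

# This module's list, or a hypothetical sample if it is not installed.
my $ours = eval {
    require Lingua::EN::StopWordList;
    Lingua::EN::StopWordList->new->words;
} || [qw(a about above the)];

# Lingua::StopWords::getStopWords('en') returns a hashref keyed by word;
# fall back to a hypothetical sample if that module is not installed.
my $theirs = eval {
    require Lingua::StopWords;
    [ keys %{ Lingua::StopWords::getStopWords('en') } ];
} || [qw(a the)];

my @extra = only_in_first($ours, $theirs);
printf "%d of %d words are only in Lingua::EN::StopWordList\n",
    scalar @extra, scalar @$ours;

# Show at most the first 10 differing words.
my $n = @extra < 10 ? scalar @extra : 10;
print "$_\n" for @extra[0 .. $n - 1];
```

Swapping the two arguments to only_in_first() gives the words the other list has that this one lacks.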
If you translate the list of stop words in this module into your favourite language and email it to me, I will include your words in the next release.
Whether to use this module depends on whether you think its list is somehow 'better' than the lists in pre-existing modules. I cannot make that decision on your behalf.
Benchmark::Featureset::StopwordLists.
This module includes a comparison of various stopword list modules.
See http://savage.net.au/Perl-modules/html/stopwordlists.report.html.
Lingua::EN::StopWords.
Lingua::StopWords.
Email the author, or log a bug on RT:
https://rt.cpan.org/Public/Dist/Display.html?Name=Lingua::EN::StopWordList.
https://github.com/ronsavage/Lingua-EN-StopWordList.git.
Lingua::EN::StopWordList was written by Ron Savage <ron@savage.net.au> in 2012.
Homepage: http://savage.net.au/index.html.
Australian copyright (c) 2012 Ron Savage.
All Programs of mine are 'OSI Certified Open Source Software'; you can redistribute them and/or modify them under the terms of The Artistic License, a copy of which is available at: http://www.opensource.org/licenses/index.html