blacklist_classifier [OPTIONS] lang1 lang2 ... < file
blacklist_classifier -n [OPTIONS] text1 text2 > blacklist.txt
blacklist_classifier [OPTIONS] -t "t1.txt t2.txt ..." lang1 lang2 ...
blacklist_classifier -t "t1.txt t2.txt ..." \
                     -e "e1.txt e2.txt ..." \
                     lang1 lang2 ...
lang1 lang2 ... are language IDs
blacklists are expected in <BlackListDir>/<lang1-lang2>.txt
t1.txt t2.txt ... are training data files (in UTF-8)
e1.txt e2.txt ... are evaluation data files (in UTF-8)
the order of languages needs to be the same for training data and eval data
as given by the command line arguments (lang1 lang2 ...)

-a <freq> ...... min freq for common words
-b <freq> ...... max freq for uncommon words
-c <score> ..... min difference score to be relevant
-d <dir> ....... directory of black lists
-i ............. classify each line separately
-m <number> .... use approximately <number> tokens to train/classify
-n ............. train a new black list
-v ............. verbose mode
-U ............. don't lowercase
-S ............. don't tokenize (use the string as it is)
-A ............. don't discard tokens with non-alphabetic characters
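
To illustrate how the -a, -b and -c thresholds could interact, here is a minimal Python sketch of a blacklist-style classifier for two closely related languages. It is not the tool's actual implementation: the function names, the default values, and the score formula (c1 - c2) / (c1 + c2) are all assumptions made for illustration only.

```python
from collections import Counter


def tokenize(text):
    # Lowercase, split on whitespace, and drop tokens containing
    # non-alphabetic characters (roughly the behaviour without -U, -S, -A).
    return [tok for tok in text.lower().split() if tok.isalpha()]


def train_blacklist(text1, text2, min_common=10, max_uncommon=3, min_diff=0.8):
    """Hypothetical trainer: returns {word: score}.

    A positive score marks a word as typical for language 1,
    a negative score marks it as typical for language 2.
    min_common, max_uncommon and min_diff stand in for -a, -b and -c.
    """
    freq1 = Counter(tokenize(text1))
    freq2 = Counter(tokenize(text2))
    blacklist = {}
    for word in set(freq1) | set(freq2):
        c1, c2 = freq1[word], freq2[word]
        # Keep only words that are common in one language
        # and uncommon in the other.
        typical_for_1 = c1 >= min_common and c2 <= max_uncommon
        typical_for_2 = c2 >= min_common and c1 <= max_uncommon
        if not (typical_for_1 or typical_for_2):
            continue
        # Assumed difference score in [-1, 1]; the real tool may differ.
        score = (c1 - c2) / (c1 + c2)
        if abs(score) >= min_diff:
            blacklist[word] = score
    return blacklist


def classify(text, blacklist, lang1, lang2):
    # Sum the scores of all blacklisted tokens; ties go to lang1.
    total = sum(blacklist.get(tok, 0.0) for tok in tokenize(text))
    return lang1 if total >= 0 else lang2
```

With two toy training texts, calling train_blacklist(text1, text2, min_common=3, max_uncommon=0, min_diff=0.5) keeps only the words seen at least three times in one text and never in the other. classify() then picks the language whose typical words dominate the input.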
Jörg Tiedemann, https://bitbucket.org/tiedemann
Please report any bugs or feature requests to https://bitbucket.org/tiedemann/blacklist-classifier. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
You can find documentation for this module with the perldoc command.
Copyright 2012 Jörg Tiedemann.
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with this program. If not, see http://www.gnu.org/licenses/.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.