exclude all bigrams made up of two words from stop.txt
count.pl -stop stop.txt h-stop.cnt h.txt
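the effect of the stop list can be sketched as follows. this is a minimal illustration in Python, not count.pl's own code: the function name count_bigrams_with_stoplist is hypothetical, tokens are split on whitespace, and the stop list is treated as a plain set of words rather than whatever format stop.txt actually uses.

```python
from collections import Counter

def count_bigrams_with_stoplist(tokens, stop_words):
    """Count adjacent bigrams, skipping any bigram whose two words
    are BOTH in the stop list (hypothetical helper, for illustration)."""
    stop = set(stop_words)
    counts = Counter()
    for first, second in zip(tokens, tokens[1:]):
        if first in stop and second in stop:
            continue  # bigram made up of two stop words: excluded
        counts[(first, second)] += 1
    return counts

tokens = "the cat sat on the mat and of the cat".split()
counts = count_bigrams_with_stoplist(tokens, ["the", "on", "and", "of"])
for bigram, freq in counts.most_common():
    print(" ".join(bigram), freq)
```

note that a bigram such as 'the cat' is kept because only one of its words is a stop word.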
count all bigrams that occur within a 4 word window AND use a stop list. (this is especially useful to prevent bigrams caused by multiple occurrences of frequent words within the given window size, like 'and and', 'of of', etc.)
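why a stop list matters with a window can be seen in a small sketch. this is Python for illustration only and not count.pl's own code: count_window_bigrams is a hypothetical name, and the sketch assumes that a window of size 4 pairs each word with each of the 3 words that follow it, keeping the original word order.

```python
from collections import Counter

def count_window_bigrams(tokens, window=4, stop_words=()):
    """Pair each token with the (window - 1) tokens that follow it.
    Bigrams made up of two stop words are skipped (assumed behaviour)."""
    stop = set(stop_words)
    counts = Counter()
    for i, first in enumerate(tokens):
        for second in tokens[i + 1 : i + window]:
            if first in stop and second in stop:
                continue
            counts[(first, second)] += 1
    return counts

tokens = "salt and pepper and vinegar".split()
plain = count_window_bigrams(tokens, window=4)
filtered = count_window_bigrams(tokens, window=4, stop_words=["and"])
print(("and", "and") in plain)     # window creates the bigram 'and and'
print(("and", "and") in filtered)  # the stop list removes it
```

the two occurrences of 'and' are not adjacent in the text, yet they fall inside the same window, which is how 'and and' arises without a stop list.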
create a list of bigrams ranked by log-likelihood ratios. only allow scores of 6.00 or better among bigrams that occur 3 or more times. (if you had used count.pl to exclude certain frequencies you could simply use that file as input.) (the .pm after the test name is optional.)
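the scoring and the two cutoffs can be sketched as follows. this is an illustrative Python version of the standard log-likelihood ratio (G^2) for a 2x2 contingency table, not the package's own module: the names log_likelihood and rank_bigrams, the layout of the input dictionary, and the sample counts are all assumptions made for the example.

```python
import math

def log_likelihood(n11, n1p, np1, npp):
    """G^2 = 2 * sum(observed * ln(observed / expected)) over the 2x2 table.
    n11: bigram count, n1p: count of word1 in first position,
    np1: count of word2 in second position, npp: total bigrams."""
    observed = [n11, n1p - n11, np1 - n11, npp - n1p - np1 + n11]
    rows = [n1p, n1p, npp - n1p, npp - n1p]
    cols = [np1, npp - np1, np1, npp - np1]
    score = 0.0
    for obs, row, col in zip(observed, rows, cols):
        if obs > 0:  # a zero cell contributes nothing to the sum
            score += obs * math.log(obs / (row * col / npp))
    return 2.0 * score

def rank_bigrams(table, npp, min_score=6.00, min_freq=3):
    """Keep bigrams with frequency >= min_freq and score >= min_score,
    then sort by descending score."""
    kept = []
    for bigram, (n11, n1p, np1) in table.items():
        if n11 < min_freq:
            continue
        score = log_likelihood(n11, n1p, np1, npp)
        if score >= min_score:
            kept.append((bigram, score))
    return sorted(kept, key=lambda item: -item[1])

# Made-up counts: bigram -> (n11, n1p, np1), with 1000 bigrams in total.
table = {
    ("new", "york"): (10, 20, 30),
    ("of", "the"): (10, 100, 100),   # occurs exactly as often as chance predicts
    ("rare", "pair"): (2, 2, 2),     # strongly associated but too infrequent
}
ranked = rank_bigrams(table, npp=1000)
for bigram, score in ranked:
    print(" ".join(bigram), round(score, 2))
```

in this sketch 'of the' is dropped by the score cutoff and 'rare pair' by the frequency cutoff, so only 'new york' survives.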
compare the ranked lists of bigrams created by pointwise mutual information and the log-likelihood ratio. make comparisons based on 3 digits of precision.
rank.pl -precision 3 mi ll h.mi-ll-rank h.cnt
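one common way to compare two ranked lists is a rank correlation coefficient such as Spearman's. the sketch below is illustrative Python, not rank.pl's own code: the names average_ranks and spearman are hypothetical, scores are rounded to the requested precision before ranking, and tied scores are assumed to share their average rank.

```python
import math

def average_ranks(scores, precision):
    """Rank keys by descending rounded score; ties share the average rank."""
    rounded = {k: round(v, precision) for k, v in scores.items()}
    ordered = sorted(rounded, key=lambda k: -rounded[k])
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and rounded[ordered[j + 1]] == rounded[ordered[i]]:
            j += 1
        for key in ordered[i : j + 1]:
            ranks[key] = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        i = j + 1
    return ranks

def spearman(scores_a, scores_b, precision=3):
    """Pearson correlation of the two rank vectors over shared bigrams."""
    common = sorted(set(scores_a) & set(scores_b))
    ra = average_ranks({k: scores_a[k] for k in common}, precision)
    rb = average_ranks({k: scores_b[k] for k in common}, precision)
    mean = (len(common) + 1) / 2
    cov = sum((ra[k] - mean) * (rb[k] - mean) for k in common)
    var_a = sum((ra[k] - mean) ** 2 for k in common)
    var_b = sum((rb[k] - mean) ** 2 for k in common)
    if var_a == 0 or var_b == 0:
        raise ValueError("all scores tied: correlation is undefined")
    return cov / math.sqrt(var_a * var_b)

# Made-up scores from two hypothetical measures over the same bigrams.
measure_a = {"new york": 3.0, "of the": 2.0, "salt and": 1.0}
same_order = {"new york": 0.9, "of the": 0.5, "salt and": 0.1}
reverse_order = {"new york": 0.1, "of the": 0.5, "salt and": 0.9}
print(spearman(measure_a, same_order))
print(spearman(measure_a, reverse_order))
```

a lower precision makes more scores round to the same value, which increases the number of ties and can change the resulting correlation.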
compare the ranked lists of bigrams created by pointwise mutual information and the dice coefficient. make comparisons based on 5 digits of precision and compare only the top 10 bigrams selected by mutual information.
rank.pl -precision 5 -rank 10 mi dice h.mi-dice-rank h.cnt
compare the ranked lists of bigrams found by fisher's exact test and the dice coefficient. make comparisons based on 2 digits of precision and compare only those bigrams that score greater than 0.90 on fisher's exact test.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.