Search for "dist:Text-NSP TPEDERSE"

TODO

The following list describes some of the features that we'd like to include in NSP in future. No particular priority is assigned to these items - they are all things we've discussed amongst ourselves or with users and agree would be good to add. If y...

TPEDERSE /Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC

USAGE

These are some sample usages of NSP. While this is not intended to be an exhaustive treatment of the various features, it should give you some idea of some of the ways you can use NSP. GETTING HELP count.pl -help rank.pl -help statistic.pl -help COUN...

TPEDERSE /Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC

INSTALL - Installation instructions for Text-NSP

Dependencies Getopt::Long : to support command line options, very likely already installed on your system with Perl Detailed Installation Instructions If you have superuser (root) access, you should be able to install Text::NSP by following these sim...

TPEDERSE /Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC

README

1. Introduction The Ngram Statistics Package (NSP) is a suite of programs that aids in analyzing Ngrams in text files. We define an Ngram as a sequence of 'n' tokens that occur within a window of at least 'n' tokens in the text; what constitutes a "t...

TPEDERSE /Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC
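
The Ngram definition in the README entry above (a sequence of 'n' tokens that occur within a window of at least 'n' tokens) can be illustrated with a small sketch. This is not NSP's implementation: the function name `count_ngrams`, its parameters, and whitespace tokenization are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def count_ngrams(tokens, n=2, window=None):
    """Count ordered n-token sequences whose tokens fall within `window` tokens."""
    window = n if window is None else window
    if window < n:
        raise ValueError("window must be at least n")
    counts = Counter()
    for start in range(len(tokens) - n + 1):
        first = tokens[start]
        # Tokens that may follow `first` while staying inside the window.
        rest = tokens[start + 1 : start + window]
        # combinations() keeps the original left-to-right token order.
        for tail in combinations(rest, n - 1):
            counts[(first, *tail)] += 1
    return counts


# Hypothetical input, tokenized on whitespace (an assumption).
tokens = "the quick brown fox".split()
adjacent = count_ngrams(tokens, n=2)            # window defaults to n
skipping = count_ngrams(tokens, n=2, window=3)  # allows one intervening token
```

With a window equal to n, only adjacent tokens form an Ngram; a larger window also admits sequences with intervening tokens, such as ("the", "brown") here.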

CHANGES

1.31 Released October 4, 2015 all changes by BTM * generalized the rank.pl program format to accept any input in the w1<>w2<>rank format -- it ignores any additional information that follows the rank * modified the Testing programs for ra...

TPEDERSE /Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC

Text::NSP - Extract collocations and Ngrams from text

The Ngram Statistics Package (NSP) is a collection of perl modules that aid in analyzing Ngrams in text files. We define an Ngram as a sequence of 'n' tokens that occur within a window of at least 'n' tokens in the text; what constitutes a "token" ca...

TPEDERSE /Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC

rank.pl - Calculate Spearman's Correlation on two ranked lists output by count.pl or statistic.pl

1. Introduction This is a program that is meant to be used to compare two different statistical measures of association. Given the same set of n-grams ranked in two different ways by two different statistical measures, this program computes Spearman'...

TPEDERSE /Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC
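
The comparison described in the rank.pl entry above can be sketched with the textbook Spearman formula, rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference between the two ranks of one n-gram. This sketch assumes there are no tied ranks; how rank.pl itself treats ties is not shown in the snippet. The function name and the dictionary inputs are hypothetical.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman's rank correlation for two rankings of the same items (no ties)."""
    if ranks_a.keys() != ranks_b.keys():
        raise ValueError("both rankings must cover the same n-grams")
    n = len(ranks_a)
    if n < 2:
        raise ValueError("need at least two ranked items")
    # Sum of squared rank differences over all shared n-grams.
    d_squared = sum((ranks_a[k] - ranks_b[k]) ** 2 for k in ranks_a)
    return 1 - (6 * d_squared) / (n * (n * n - 1))


# Hypothetical rankings of three bigrams by two different measures.
measure_1 = {"new<>york": 1, "of<>the": 2, "in<>a": 3}
measure_2 = {"new<>york": 3, "of<>the": 2, "in<>a": 1}
rho_same = spearman_rho(measure_1, measure_1)      # identical rankings
rho_reversed = spearman_rho(measure_1, measure_2)  # fully reversed rankings
```

Identical rankings give a coefficient of 1 and fully reversed rankings give -1, so values near 0 suggest the two measures order the n-grams in largely unrelated ways.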

count.pl - Count the frequency of Ngrams in text

See perldoc README.pod...

TPEDERSE /Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC

kocos.pl - Find the Kth order co-occurrences of a word

1. What are Kth order co-occurrences? Co-occurrences are the words which occur together in the same context. All words which co-occur with a given target word are called its co-occurrences. The concept of 2nd order co-occurrences is explained in the ...

TPEDERSE /Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC

combig.pl - Combine frequency counts to determine co-occurrence

USAGE combig.pl [OPTIONS] BIGRAM INPUT PARAMETERS * BIGRAM Specify a file of bigram counts created by NSP programs count.pl. The entries in BIGRAM will be formatted as follows: word1<>word2<>n11 n1p np1 Here, word1 is followed by word2 n11 times. wor...

TPEDERSE /Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC
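
The `word1<>word2<>n11 n1p np1` line format from the combig.pl entry above can be parsed as in the sketch below. The second function is a simplified assumption about what combining counts could look like: it merges only the joint count n11 for the two orders of a word pair and ignores the marginal counts, whose handling the snippet does not show. Both function names are hypothetical.

```python
from collections import defaultdict


def parse_bigram_line(line):
    """Split 'word1<>word2<>n11 n1p np1' into its two words and three counts."""
    word1, word2, counts = line.strip().split("<>", 2)
    n11, n1p, np1 = (int(x) for x in counts.split())
    return word1, word2, n11, n1p, np1


def combine_joint_counts(lines):
    """Sum n11 over both orders of each word pair (marginals are ignored)."""
    totals = defaultdict(int)
    for line in lines:
        word1, word2, n11, _n1p, _np1 = parse_bigram_line(line)
        # Sorting the pair makes the key independent of word order.
        totals[tuple(sorted((word1, word2)))] += n11
    return dict(totals)


# Hypothetical bigram count lines, not taken from real NSP output.
sample = ["fine<>wine<>3 5 4", "wine<>fine<>2 6 7"]
combined = combine_joint_counts(sample)
```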

Text::NSP::Measures - Perl modules for computing association scores of Ngrams. This module provides the basic framework for these measures.

Introduction These modules provide perl implementations of mathematical functions (association measures) that can be used to interpret the co-occurrence frequency data for Ngrams. We define an Ngram as a sequence of 'n' tokens that occur within a win...

TPEDERSE /Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC

statistic.pl - Measure the association of Ngrams in text

See perldoc README.pod...

TPEDERSE /Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC

huge-sort.pl - Sort a --tokenlist of bigrams from huge-count.pl in alphabetical order.

huge-sort.pl takes as input a duplicate bigram file generated by count.pl with the --tokenlist option, counts the frequency of each bigram, and sorts them in alphabetical order. The output file will be found in input-file.sorted. This program is used inter...

TPEDERSE /Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC

huge-split.pl - Split bigram files from huge-count.pl into pieces.

See perldoc huge-split.pl...

TPEDERSE /Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC

count2huge.pl - Convert the output of count.pl to huge-count.pl.

count2huge.pl converts the output of count.pl to the output of huge-count.pl for the same input text and options. We do this because the vector relatedness measure of UMLS-Similarity requires that the bigrams which start with the same term are...

TPEDERSE /Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC

huge-merge.pl - Merge the results of multiple huge-sort generated files into a single sorted file.

Combine the sorted bigram files generated by huge-sort.pl efficiently. This program is used internally by huge-count.pl....

TPEDERSE /Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC

split-data.pl - Divide a text file in N approximately equal parts

TPEDERSE /Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC

huge-count.pl - Count all the bigrams in a huge text without using huge amounts of memory.

Runs count.pl efficiently on large amounts of data by splitting the data into separate files, counting each file separately, and then merging the counts to get overall results. Two output files are created. destination-dir/huge-count.output contains...

TPEDERSE /Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC
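
The split, count, and merge strategy described in the huge-count.pl entry above can be sketched as follows. This is only an in-memory illustration of the idea, not how huge-count.pl works internally: the function names and the `piece_size` parameter are hypothetical, and bigrams that straddle a piece boundary are simply dropped here.

```python
from collections import Counter


def count_bigrams(tokens):
    """Count adjacent token pairs in one piece of text."""
    return Counter(zip(tokens, tokens[1:]))


def count_in_pieces(tokens, piece_size):
    """Count bigrams piece by piece, then merge the per-piece counts."""
    total = Counter()
    for start in range(0, len(tokens), piece_size):
        piece = tokens[start : start + piece_size]
        # Each piece is counted on its own; update() adds its counts to the total.
        total.update(count_bigrams(piece))
    return total


# Hypothetical input split into pieces of two tokens each.
tokens = "a b a b a b".split()
merged = count_in_pieces(tokens, piece_size=2)
```

Because each piece is counted independently, only one piece's counts need to be held at a time before merging, which is the property that keeps memory use bounded when the pieces are separate files.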

huge-count.pl - Divide huge text into pieces and run count.pl separately on each (and then combine)

TPEDERSE /Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC

huge-delete.pl - Delete bigrams found by huge-count.pl based on low/high frequency.

See perldoc huge-delete.pl...

TPEDERSE /Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC
