
Search results for "dist:Text-NSP TPEDERSE"

TODO River stage one • 2 direct dependents • 3 total dependents

The following list describes some of the features that we'd like to include in NSP in future. No particular priority is assigned to these items - they are all things we've discussed amongst ourselves or with users and agree would be good to add. If y...

TPEDERSE/Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC

USAGE

These are some sample usages of NSP. While this is not intended to be an exhaustive treatment of the various features, it should give you some idea of some of the ways you can use NSP. GETTING HELP count.pl -help rank.pl -help statistic.pl -help COUN...

TPEDERSE/Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC

INSTALL - Installation instructions for Text-NSP

Dependencies Getopt::Long : to support command line options, very likely already installed on your system with Perl Detailed Installation Instructions If you have superuser (root) access, you should be able to install Text::NSP by following these sim...

TPEDERSE/Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC

README

1. Introduction The Ngram Statistics Package (NSP) is a suite of programs that aids in analyzing Ngrams in text files. We define an Ngram as a sequence of 'n' tokens that occur within a window of at least 'n' tokens in the text; what constitutes a "t...

TPEDERSE/Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC
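
The Ngram definition in the README snippet above (n tokens occurring within a window of at least n tokens) can be illustrated with a small Python sketch. It is not NSP's implementation: whitespace tokenization, the function name `ngrams_in_window`, and the rule that each Ngram starts at the window's first token and keeps text order are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def ngrams_in_window(tokens, n=2, window=None):
    """Yield n-token tuples whose tokens all fall inside a window of `window` tokens.

    Assumption (not taken from the NSP docs shown here): every Ngram starts at
    the first token of the window and keeps the tokens in their text order.
    """
    if window is None:
        window = n  # smallest legal window: only contiguous Ngrams
    if window < n:
        raise ValueError("window must be at least n")
    for i, first in enumerate(tokens):
        # The remaining n-1 tokens are picked from the next window-1 tokens.
        following = tokens[i + 1 : i + window]
        for rest in combinations(following, n - 1):
            yield (first, *rest)


tokens = "the cat sat on the mat".split()  # hypothetical whitespace tokenizer
counts = Counter(ngrams_in_window(tokens, n=2, window=3))
```

With `window` equal to `n` this reduces to ordinary contiguous Ngrams; a larger window also admits pairs separated by intervening tokens.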

CHANGES

1.31 Released October 4, 2015 all changes by BTM * generalized the rank.pl program format to accept any input in the w1<>w2<>rank format -- it does not look at any additional information if exists after the rank * modified the Testing programs for ra...

TPEDERSE/Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC

Text::NSP - Extract collocations and Ngrams from text

The Ngram Statistics Package (NSP) is a collection of perl modules that aid in analyzing Ngrams in text files. We define an Ngram as a sequence of 'n' tokens that occur within a window of at least 'n' tokens in the text; what constitutes a "token" ca...

TPEDERSE/Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC

rank.pl - Calculate Spearman's Correlation on two ranked lists output by count.pl or statistic.pl

1. Introduction This is a program that is meant to be used to compare two different statistical measures of association. Given the same set of n-grams ranked in two different ways by two different statistical measures, this program computes Spearman'...

TPEDERSE/Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC
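
As a rough illustration of the comparison rank.pl is described as performing, the following Python sketch computes Spearman's rank correlation for two rankings of the same n-grams. It uses the textbook formula for rankings without ties; how rank.pl itself handles ties or n-grams missing from one list is not shown in the snippet above, so the function name `spearman_rho` and the strict input checks are assumptions.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman's rank correlation for two rankings of the same items.

    Textbook no-ties formula: rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
    Assumption: both dicts map the same n-grams to ranks 1..n without ties.
    """
    if set(ranks_a) != set(ranks_b):
        raise ValueError("both rankings must cover the same n-grams")
    n = len(ranks_a)
    if n < 2:
        raise ValueError("need at least two n-grams")
    # d is the difference between the two ranks given to the same n-gram.
    d_squared = sum((ranks_a[g] - ranks_b[g]) ** 2 for g in ranks_a)
    return 1 - (6 * d_squared) / (n * (n * n - 1))


# Hypothetical ranks assigned by two different association measures.
measure_1 = {"new<>york": 1, "of<>the": 2, "in<>a": 3}
measure_2 = {"new<>york": 3, "of<>the": 2, "in<>a": 1}
rho = spearman_rho(measure_1, measure_2)
```

A value near 1 means the two measures order the n-grams almost identically; a value near -1 means they order them in nearly opposite ways.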

count.pl - Count the frequency of Ngrams in text

See perldoc README.pod...

TPEDERSE/Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC

kocos.pl - Find the Kth order co-occurrences of a word

1. What are Kth order co-occurrences? Co-occurrences are the words which occur together in the same context. All words which co-occur with a given target word are called its co-occurrences. The concept of 2nd order co-occurrences is explained in the ...

TPEDERSE/Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC

combig.pl - Combine frequency counts to determine co-occurrence

USAGE combig.pl [OPTIONS] BIGRAM INPUT PARAMETERS * BIGRAM Specify a file of bigram counts created by NSP programs count.pl. The entries in BIGRAM will be formatted as follows: word1<>word2<>n11 n1p np1 Here, word1 is followed by word2 n11 times. wor...

TPEDERSE/Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC
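
The `word1<>word2<>n11 n1p np1` line format quoted in the combig.pl snippet above can be read with a few lines of Python. This sketch only splits a line into its fields; the names `BigramCount` and `parse_bigram_line` are hypothetical, and because the snippet is truncated before it explains `n1p` and `np1`, the code stores them without interpreting them.

```python
from dataclasses import dataclass


@dataclass
class BigramCount:
    word1: str
    word2: str
    n11: int  # per the snippet: word1 is followed by word2 n11 times
    n1p: int  # kept as given; its meaning is cut off in the snippet
    np1: int  # kept as given; its meaning is cut off in the snippet


def parse_bigram_line(line):
    """Parse one 'word1<>word2<>n11 n1p np1' line into a BigramCount."""
    word1, word2, counts = line.strip().split("<>")
    n11, n1p, np1 = (int(field) for field in counts.split())
    return BigramCount(word1, word2, n11, n1p, np1)


record = parse_bigram_line("of<>the<>10 50 70")  # made-up example counts
```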

Text::NSP::Measures - Perl modules for computing association scores of Ngrams. This module provides the basic framework for these measures.

Introduction These modules provide perl implementations of mathematical functions (association measures) that can be used to interpret the co-occurrence frequency data for Ngrams. We define an Ngram as a sequence of 'n' tokens that occur within a win...

TPEDERSE/Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC

statistic.pl - Measure the association of Ngrams in text

See perldoc README.pod...

TPEDERSE/Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC

huge-sort.pl - Sort a --tokenlist of bigrams from huge-count.pl in alphabetical order.

huge-sort.pl takes as input a duplicate bigram file generated by count.pl with the --tokenlist option, counts the frequency of each bigram, and sorts the bigrams in alphabetical order. The output file will be found in input-file.sorted. This program is used inter...

TPEDERSE/Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC

huge-split.pl - Split bigram files from huge-count.pl into pieces.

See perldoc huge-split.pl...

TPEDERSE/Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC

count2huge.pl - Convert the output of count.pl to huge-count.pl.

count2huge.pl converts the output of count.pl to that of huge-count.pl for the same input text and options. We do this because the vector relatedness measure of UMLS-Similarity requires that the bigrams which start with the same term are...

TPEDERSE/Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC

huge-merge.pl - Merge the results of multiple huge-sort generated files into a single sorted file.

Combine the sorted bigram files generated by huge-sort.pl efficiently. This program is used internally by huge-count.pl....

TPEDERSE/Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC

split-data.pl - Divide a text file in N approximately equal parts

TPEDERSE/Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC

huge-count.pl - Count all the bigrams in a huge text without using huge amounts of memory.

Runs count.pl efficiently on large amounts of data by splitting the data into separate files, counting each file separately, and then merging the counts to get overall results. Two output files are created. destination-dir/huge-count.output contains...

TPEDERSE/Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC
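
The split, count, and merge strategy described for huge-count.pl can be sketched in Python as follows. This is an illustration of the general idea, not the script's actual logic: the function names are hypothetical, chunks are held in memory rather than written to separate files, and bigrams are assumed never to span a line break, so that splitting on line boundaries loses nothing.

```python
from collections import Counter


def count_bigrams(lines):
    """Count adjacent token pairs within each line (assumed not to cross lines)."""
    counts = Counter()
    for line in lines:
        tokens = line.split()
        counts.update(zip(tokens, tokens[1:]))
    return counts


def chunked(lines, size):
    """Yield consecutive slices of at most `size` lines."""
    for start in range(0, len(lines), size):
        yield lines[start : start + size]


def huge_count_sketch(lines, chunk_size):
    """Count each chunk separately, then merge the per-chunk counts."""
    total = Counter()
    for chunk in chunked(lines, chunk_size):
        total.update(count_bigrams(chunk))  # Counter.update adds the counts
    return total


lines = ["a b c", "a b", "b c a"]  # made-up input text
merged = huge_count_sketch(lines, chunk_size=1)
```

Only one chunk's counts plus the running total are alive at any moment, which is what keeps peak memory below that of counting the whole text in one pass.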

huge-count.pl - Divide huge text into pieces and run count.pl separately on each (and then combine)

TPEDERSE/Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC

huge-delete.pl - Delete bigrams found by huge-count.pl based on low/high frequency.

See perldoc huge-delete.pl...

TPEDERSE/Text-NSP-1.31 - 04 Oct 2015 16:42:20 UTC