NAME

Algorithm::AM::DataSet - Manage data used by Algorithm::AM

VERSION

version 3.12

SYNOPSIS

 use Algorithm::AM::DataSet 'dataset_from_file';
 use Algorithm::AM::DataSet::Item 'new_item';
 my $dataset = Algorithm::AM::DataSet->new(cardinality => 10);
 # or
 $dataset = dataset_from_file(path => 'finnverb', format => 'nocommas');
 $dataset->add_item(
   new_item(features => [qw(a b c d e f g h i j)]));
 my $item = $dataset->get_item(2);

DESCRIPTION

A DataSet holds a list of items that Algorithm::AM or Algorithm::AM::Batch can use for classification. You can build one item by item with the "add_item" method, or read one from a file with the "dataset_from_file" function.

`new`

Creates a new DataSet object. You must provide a cardinality argument giving the number of features in each item's feature vector. You can then add items via the add_item method. Each item contains a feature vector and, optionally, a class label and a comment (also called a "spec").

`cardinality`

Returns the number of features contained in the feature vector of a single item.

`size`

Returns the number of items in the data set.

`classes`

Returns the list of all unique class labels in the data set.

`add_item`

Adds a new item to the data set. The input may be either an Algorithm::AM::DataSet::Item object, or the arguments to create one via its constructor (features, class, comment). This method will croak if the cardinality of the item does not match "cardinality".
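The two input forms and the cardinality check can be sketched as follows. This is a minimal illustration that assumes Algorithm::AM is installed; the cardinality of 3, the feature values, and the class labels are made up for the example.

```perl
use strict;
use warnings;
use Algorithm::AM::DataSet;
use Algorithm::AM::DataSet::Item 'new_item';

# Illustrative data set with three features per item.
my $dataset = Algorithm::AM::DataSet->new(cardinality => 3);

# Form 1: pass a ready-made Item object.
$dataset->add_item(new_item(features => [qw(a b c)], class => 'x'));

# Form 2: pass the Item constructor arguments directly.
$dataset->add_item(
    features => [qw(d e f)],
    class    => 'y',
    comment  => 'second item',
);

# An item with two features does not match the cardinality of 3,
# so add_item croaks; eval traps the exception.
my $ok = eval { $dataset->add_item(features => [qw(a b)], class => 'x'); 1 };
print "rejected: $@" unless $ok;

printf "%d items, %d classes\n", $dataset->size, $dataset->num_classes;
```

Wrapping the call in eval is one way to keep a single malformed item from aborting a longer loading loop.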

`get_item`

Returns the item at the given index. This will be an Algorithm::AM::DataSet::Item object.

`num_classes`

Returns the number of different classification labels contained in the data set.

`dataset_from_file`

This function may be exported. Given 'path' and 'format' arguments, it reads the dataset file at 'path' and returns a new DataSet object containing its data. The 'format' argument should be 'commas' or 'nocommas', selecting one of the two formats described below. You may also specify 'unknown' and 'null' arguments to set the strings that represent an unknown class value and a null feature value; by default these are 'UNK' and '=', respectively.

The 'commas' file format is shown below:

 class , f eat u re s , your comment here

The commas separate the class label, the feature values, and the comment; whitespace around the commas is optional. Feature values are separated from one another by whitespace, so here the features are f, eat, u, re, and s.
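To make the 'commas' layout concrete, here is a small pure-Perl sketch that splits one such line into its three parts. The helper parse_commas_line is hypothetical and is not part of Algorithm::AM; how the module itself parses lines may differ. Limiting the split to three fields, so that any commas inside the comment are preserved, is an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Algorithm::AM: split one
# 'commas'-format line into class label, feature list, and comment.
sub parse_commas_line {
    my ($line) = @_;
    chomp $line;
    $line =~ s/^\s+//;    # drop leading whitespace before the class label
    # Split on commas with optional surrounding whitespace; the limit of 3
    # keeps any later commas inside the comment (an assumption).
    my ($class, $features, $comment) = split /\s*,\s*/, $line, 3;
    # Feature values are whitespace-separated.
    my @features = split ' ', $features;
    return { class => $class, features => \@features, comment => $comment };
}

my $commas_item = parse_commas_line(' class , f eat u re s , your comment here');
print "class:    $commas_item->{class}\n";
print "features: @{ $commas_item->{features} }\n";
print "comment:  $commas_item->{comment}\n";
```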

The 'nocommas' file format is shown below:

 class   features  your comment here

Here the class, feature values, and comments are separated by whitespace. Each feature value must be a single character with no separating characters, so here the features are f, e, a, t, u, r, e, and s.
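The 'nocommas' layout can be illustrated the same way. The helper parse_nocommas_line below is hypothetical and only approximates the described format; it is not the module's own parser.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Algorithm::AM: split one
# 'nocommas'-format line into class label, feature list, and comment.
sub parse_nocommas_line {
    my ($line) = @_;
    chomp $line;
    $line =~ s/^\s+//;    # drop leading whitespace before the class label
    # Whitespace separates the three fields; the limit of 3 keeps the
    # spaces inside the comment intact.
    my ($class, $features, $comment) = split /\s+/, $line, 3;
    # Each feature value is exactly one character.
    my @features = split //, $features;
    return { class => $class, features => \@features, comment => $comment };
}

my $nocommas_item = parse_nocommas_line(' class   features  your comment here');
print "class:    $nocommas_item->{class}\n";
print "features: @{ $nocommas_item->{features} }\n";
print "comment:  $nocommas_item->{comment}\n";
```

Because every character of the second field is its own feature, this format cannot express multi-character feature values; the 'commas' format is the one to use for those.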

Lines beginning with a pound character (#) are ignored.
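The comment-skipping rule and the 'unknown' and 'null' markers can be sketched together in pure Perl. The helper normalize_item and the sample lines are hypothetical. The documentation above does not say how the module represents an unknown class or a null feature internally, so this sketch assumes undef and the empty string as stand-ins.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the 'unknown' and 'null' marker strings.
# The defaults mirror the documented 'UNK' and '='.
sub normalize_item {
    my ($class, $features, %opts) = @_;
    my $unknown = defined $opts{unknown} ? $opts{unknown} : 'UNK';
    my $null    = defined $opts{null}    ? $opts{null}    : '=';
    # Assumption: undef stands for an unknown class, '' for a null feature.
    my $norm_class    = $class eq $unknown ? undef : $class;
    my @norm_features = map { $_ eq $null ? '' : $_ } @$features;
    return ($norm_class, \@norm_features);
}

# Made-up sample input in the 'commas' format.
my @lines = (
    '# a comment line, skipped entirely',
    'UNK , a = c , first item',
    'x , a b = , second item',
);

my @items;
for my $line (@lines) {
    next if $line =~ /^#/;    # lines beginning with '#' are ignored
    my ($class, $features, $comment) = split /\s*,\s*/, $line, 3;
    my ($norm_class, $norm_features) =
        normalize_item($class, [ split ' ', $features ]);
    push @items, {
        class    => $norm_class,
        features => $norm_features,
        comment  => $comment,
    };
}

printf "%d items read\n", scalar @items;
```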

AUTHOR

Theron Stanford <shixilun@yahoo.com>, Nathan Glenn <garfieldnate@gmail.com>

COPYRIGHT AND LICENSE

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
