
In order to become familiar with the Algorithm::DecisionTree module:

  (1)    Run the 

               generate_training_data.pl

          script to create your training data.  First run the
          script as supplied; then make a copy of the
          param.txt file, modify the copy as you wish, and
          run the script again with your version of
          param.txt.


  (2)    Next run the 

                construct_dt_and_classify_one_sample.pl

         script as it is.  

          Now modify the test sample in this script and see
          what classification results you get for the new
          sample.  Next, run this script on the new
          training datafile that you created yourself.  Your
          test samples must, of course, use the feature and
          value names declared in your own parameter file.
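
          The heart of that script looks roughly like the sketch
          below.  The constructor options and the feature/value names
          shown are only illustrative; your test sample must use the
          names from your own parameter file, and the exact call
          signatures may vary slightly between module versions.

```perl
#!/usr/bin/perl -w
use strict;
use Algorithm::DecisionTree;

# Construct a decision tree from the training data made in step (1):
my $dt = Algorithm::DecisionTree->new(
             training_datafile => "training.dat",   # your datafile here
         );
$dt->get_training_data();
my $root_node = $dt->construct_decision_tree_classifier();

# Classify one sample.  Each entry is a "feature=>value" string; the
# names below are placeholders for the names in your parameter file:
my @test_sample = ('exercising=>never', 'smoking=>heavy',
                   'fatIntake=>heavy',  'videoAddiction=>heavy');
my $classification = $dt->classify($root_node, @test_sample);
print "$_ => $classification->{$_}\n" for sort keys %$classification;
```

          The classification comes back as a hash that maps each class
          name to its estimated probability for the test sample.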


  (3)    If you are using a large number of features, or if
         the features can take a large number of possible
         values, the tree can easily grow far too large and
         take far too long to construct.  To limit the size
         of the tree, you may need to change the values of
         the following constructor parameters in the
         previous step:

                    max_depth_desired

                    entropy_threshold

         The first parameter, max_depth_desired, controls
         the depth of the tree from the root node, and the
         second parameter, entropy_threshold, controls the
         resolution in the entropy space.  The smaller the
         value for the first parameter and the larger the
         value for the second parameter, the smaller the
         decision tree.  The largest possible value for
         max_depth_desired is the number of features.  Take
         it down from there to make the tree smaller.  The
         smallest possible value for entropy_threshold is 0.
         Take it up from there to make the tree smaller.
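
          In code, these two parameters are handed to the constructor
          along with the training datafile.  The values below are
          merely a starting point for experimentation, not
          recommendations:

```perl
use strict;
use Algorithm::DecisionTree;

# A smaller max_depth_desired and a larger entropy_threshold both
# shrink the tree.  Illustrative values only:
my $dt = Algorithm::DecisionTree->new(
             training_datafile => "training.dat",
             max_depth_desired => 2,     # upper limit: number of features
             entropy_threshold => 0.1,   # lower limit: 0
         );
```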


  (4)    Now run the test data generator script by invoking 

                generate_test_data.pl

          As supplied, the script writes out 20 samples for testing,
          but you can set that number to anything you wish.

         The test data is dumped into a file without the class labels
         for obvious reasons.  The class labels are dumped into a
         separate file whose name you can specify in the above 
         script.  As currently programmed, the name of this file is

                test_data_class_labels.dat

         By comparing the class labels returned by the classifier 
         with the class labels in this file, you can assess the 
         accuracy of the classifier.


  (5)    Finally, run the classifier on the test datafile by

         classify_test_data_in_a_file.pl  training.dat  testdata2.dat  out.txt

          Note carefully the three arguments you must supply to the
          script: the first names the training datafile, the second
          the test datafile, and the third the file in which the
          classification results will be deposited.


=======================================================================

FOR USING A DECISION TREE CLASSIFIER INTERACTIVELY:

Starting with Version 1.6 of the module, you can use the
DecisionTree classifier in an interactive mode.  In this
mode, after the decision tree has been constructed, you are
prompted for answers to the questions regarding the feature
tests at the nodes of the tree.  The answer you supply at a
node determines which branch the classifier descends to
reach the next node, and so on down to a leaf.

To get a feel for using a decision tree in this mode,
examine the script

    classify_by_asking_questions.pl

Execute the script as it is and see what happens.
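
Internally, the interactive script differs from the one in step (2)
mainly in the call made after the tree is constructed.  A sketch (the
method name shown is an assumption based on recent 1.x versions; check
the script itself if your version differs):

```perl
use strict;
use Algorithm::DecisionTree;

my $dt = Algorithm::DecisionTree->new(
             training_datafile => "training.dat",
         );
$dt->get_training_data();
my $root_node = $dt->construct_decision_tree_classifier();

# No test sample is supplied; instead the classifier interrogates the
# user about the feature test at each node as it descends the tree:
my $classification = $dt->classify_by_asking_questions($root_node);
print "$_ => $classification->{$_}\n" for sort keys %$classification;
```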



=======================================================================


FOR THE CASE OF VERY LARGE DECISION TREES:


  Large decision trees can take a very long time to create.
If that is the case with your application, having to
construct the tree afresh every time you want to classify
something can quickly become tiresome.  Instead, consider
storing your decision tree in a diskfile and using the
disk-stored tree for your subsequent classification work.
The following scripts in this directory:

      store_dt_on_disk.pl

      classify_from_disk_stored_dt.pl

show you how you can do that.
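
The sketch below assumes the two scripts rely on Perl's standard
Storable module for serialization (an assumption worth verifying
against the scripts themselves).  The point of the pattern is that
re-reading the training data is cheap; it is tree *construction* that
is slow, and that step is replaced by a cheap retrieve():

```perl
use strict;
use Storable qw(store retrieve);
use Algorithm::DecisionTree;

# store_dt_on_disk.pl (sketch): build the tree once, save it.
my $dt = Algorithm::DecisionTree->new(
             training_datafile => "training.dat",
         );
$dt->get_training_data();
my $root_node = $dt->construct_decision_tree_classifier();
store($root_node, 'dt.db');

# classify_from_disk_stored_dt.pl (sketch): reconstruct only the
# lightweight classifier object, then reuse the disk-stored tree.
my $dt2 = Algorithm::DecisionTree->new(
              training_datafile => "training.dat",
          );
$dt2->get_training_data();
my $stored_root = retrieve('dt.db');
my @test_sample = ('exercising=>never', 'smoking=>heavy',
                   'fatIntake=>heavy',  'videoAddiction=>heavy');
my $classification = $dt2->classify($stored_root, @test_sample);
```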