Text::SenseClusters::LabelEvaluation::ConfusionMatrixTotalCalc - Module responsible for processing the decision matrix.
This module provides two functions. The first function calculates the probability decision matrix from the scores of the original decision matrix. The second function then uses the new decision matrix to decide whether the labels are appropriately assigned.
The following function is responsible for printing the calculated score matrix from the decision matrix.

@argument1 : outputFileHandle: DataType(File Handler)
             File handle that defines where this module prints its output messages/statements. Its default value is STDERR.
@argument2 : clusterNameArrayRef: DataType(Reference_Of_Array)
             Reference to the array containing the cluster names.
@argument3 : standardTermsArrayRef: DataType(Reference_Of_Array)
             Reference to the array containing the standard terms.
@argument4 : hashForClusterTopicScoreRef: DataType(Reference_Of_Hash)
             Reference to the hash containing each cluster name, the corresponding standard topic and its score.
@argument5 : topicTotalSumHashRef: DataType(Reference_Of_Hash)
             Hash that will contain the total score for a topic across all clusters.
@argument6 : clusterTotalSumHashRef: DataType(Reference_Of_Hash)
             Hash that will contain the total score for a cluster across all topics.
@argument7 : $isDecisionMatrixDebugOn: DataType(number 0 or 1)
             Verbose flag: decides whether detailed output is printed.

@return    : SimilarityScore
             The similarity score between the labels and the actual topics that were correctly identified by SenseClusters or a similar application.

@description :
    This function works on the decision matrix, which is displayed as:

    Calculated Decision MATRIX:
    =========================================================
                    |    Cluster0    |    Cluster1    |
    ---------------------------------------------------------
    Bill Clinton:   |     0.478      |     0.522      |
    ---------------------------------------------------------
    Tony Blair:     |     0.625      |     0.375      |
    ---------------------------------------------------------
    =========================================================

    Where,
    1) Cluster0, Cluster1 are cluster names (column headers).
    2) Bill Clinton, Tony Blair are standard topics (row headers).
    3) Each cell holds the probability measure that indicates the likelihood of a cluster's label against a topic.

    Steps:
    1. First, it iterates through the hash '%hashForClusterTopicScore'.
    2. It divides each cluster-topic overlap score by the total count value of the decision matrix.
    3. This gives the normalized score.
    4. Depending on the user's Verbose setting, it displays the normalized decision matrix.
    5. It then calls the function 'concludingFromDecisionMatrix', which uses the normalized decision matrix to conclude
       a) which cluster's labels match which gold-standard topic's data.
       b) which gold-standard topic's data matches which cluster's labels.
    6. Finally, it compares the cluster-wise results with the topic-wise results to conclude the final cluster-topic matches along with their matching scores.
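The steps above can be sketched roughly as follows. This is an illustration in Python, not the module's Perl code: the function names, the raw scores, and the rule used in the last step (keep a pair only when the cluster-wise and topic-wise choices agree) are all assumptions made for the example.

```python
def normalize_by_grand_total(scores):
    """Steps 1-3: divide every cluster-topic score by the sum of all scores."""
    total = sum(v for row in scores.values() for v in row.values())
    if total == 0:
        raise ValueError("decision matrix has no non-zero scores")
    return {c: {t: v / total for t, v in row.items()} for c, row in scores.items()}


def best_topic_per_cluster(matrix):
    """Cluster-wise view: the highest-scoring topic for each cluster."""
    return {c: max(row, key=row.get) for c, row in matrix.items()}


def best_cluster_per_topic(matrix):
    """Topic-wise view: the highest-scoring cluster for each topic."""
    best = {}
    for cluster, row in matrix.items():
        for topic, value in row.items():
            if topic not in best or value > matrix[best[topic]][topic]:
                best[topic] = cluster
    return best


def mutual_matches(matrix):
    """Assumed comparison rule: keep pairs on which both views agree."""
    by_cluster = best_topic_per_cluster(matrix)
    by_topic = best_cluster_per_topic(matrix)
    return {
        c: (t, matrix[c][t])
        for c, t in by_cluster.items()
        if by_topic.get(t) == c
    }


# Made-up overlap scores, keyed cluster -> topic (hypothetical data).
raw_scores = {
    "Cluster0": {"Bill Clinton": 6, "Tony Blair": 2},
    "Cluster1": {"Bill Clinton": 1, "Tony Blair": 7},
}
normalized = normalize_by_grand_total(raw_scores)
matches = mutual_matches(normalized)
print(matches)
```

With these made-up scores the grand total is 16, so every normalized cell is the raw score divided by 16 and all cells together sum to 1.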
The following function is responsible for printing the calculated score matrix from the decision matrix.

@argument1 : hashForClusterTopicScoreRef: DataType(Reference_Of_Hash)
             Reference to the hash containing each cluster name, the corresponding standard topic and its score.
@argument2 : topicTotalSumHashRef: DataType(Reference_Of_Hash)
             Hash that will contain the total score for a topic across all clusters.
@argument3 : clusterTotalSumHashRef: DataType(Reference_Of_Hash)
             Hash that will contain the total score for a cluster across all topics.
@argument4 : directClusterTopicHashRef: DataType(Reference_Of_Hash)
             HashOfHash to store the conclusion of the direct calculation, row-wise, i.e. a topic's (OuterKey) score against each cluster (InnerKey).
@argument5 : directTopicClusterHashRef: DataType(Reference_Of_Hash)
             HashOfHash to store the conclusion of the direct calculation, column-wise, i.e. a cluster's (OuterKey) scores against each topic (InnerKey).

@return1   : directClusterTopicHashRef: DataType(Reference_Of_Hash)
             HashOfHash that stores the conclusion of the calculation, row-wise, i.e. a topic's (OuterKey) score against each cluster (InnerKey).
@return2   : directTopicClusterHashRef: DataType(Reference_Of_Hash)
             HashOfHash that stores the conclusion of the calculation, column-wise, i.e. a cluster's (OuterKey) scores against each topic (InnerKey).

@description :
    The following block of code is responsible for
    1. Calculating the probabilities (normalized values) of all the topics against a cluster.
    2. Choosing the topic that has the maximum probability (normalized value) for the given cluster.
    3. In the current approach, the probability (normalized value) is calculated by dividing the similarity score of a topic against a cluster by the total similarity score of all the topics against all the clusters.

    Future enhancement::
    4. The above approach can be done in two ways, i.e. the direct way as well as the inverse way.
    5. In the direct approach, the probability is calculated by dividing the similarity score of a topic against a cluster by the total similarity score of all the topics against that cluster.
    6. In the inverse approach, the probability is calculated by dividing the similarity score of a topic against a cluster by the total similarity score of all the clusters against that topic.
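The difference between the direct and inverse approaches listed under "Future enhancement" can be illustrated with a small Python sketch. It is only an illustration under assumptions: the function names and raw scores are invented, and the per-cluster and per-topic totals computed here are assumed to play the role of the values held in clusterTotalSumHashRef and topicTotalSumHashRef.

```python
def direct_probabilities(scores):
    """Direct: divide each score by its cluster's total over all topics."""
    result = {}
    for cluster, row in scores.items():
        cluster_total = sum(row.values())
        result[cluster] = {
            t: (v / cluster_total if cluster_total else 0.0)
            for t, v in row.items()
        }
    return result


def inverse_probabilities(scores):
    """Inverse: divide each score by its topic's total over all clusters."""
    topic_totals = {}
    for row in scores.values():
        for topic, value in row.items():
            topic_totals[topic] = topic_totals.get(topic, 0) + value
    return {
        c: {
            t: (v / topic_totals[t] if topic_totals[t] else 0.0)
            for t, v in row.items()
        }
        for c, row in scores.items()
    }


# Made-up similarity scores, keyed cluster -> topic (hypothetical data).
raw_scores = {
    "Cluster0": {"Bill Clinton": 3, "Tony Blair": 1},
    "Cluster1": {"Bill Clinton": 1, "Tony Blair": 7},
}
direct = direct_probabilities(raw_scores)
inverse = inverse_probabilities(raw_scores)
print(direct["Cluster0"]["Tony Blair"], inverse["Cluster0"]["Tony Blair"])
```

In the direct result the values for each cluster sum to 1; in the inverse result the values for each topic sum to 1. The same cell can therefore get different probabilities: here Cluster0 against Tony Blair is 1/4 in the direct view and 1/8 in the inverse view.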
http://senseclusters.cvs.sourceforge.net/viewvc/senseclusters/LabelEvaluation/
@Last modified by   : Anand Jha
@Last_Modified_Date : 24th Dec. 2012
@Modified Version   : 1.6
Ted Pedersen, University of Minnesota, Duluth
tpederse at d.umn.edu

Anand Jha, University of Minnesota, Duluth
jhaxx030 at d.umn.edu
Copyright (C) 2012 Ted Pedersen, Anand Jha
See http://dev.perl.org/licenses/ for more information.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to:
The Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA