<html>
<title>Word Clustering (native SenseClusters)</title>
<body>
<h1>Word Clustering (native SenseClusters) </h1>
</body>
</html>

Word clustering relies on the creation of a word-by-word matrix from 
bigram or co-occurrence features. The first word in each pair serves as 
the row, the second word serves as the column, and the cell contains the 
association score, frequency count, or binary value indicating the 
relationship between the two words. 
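<br><br>
As a rough illustration of the matrix layout described above, the following Python sketch builds a small word-by-word matrix from a list of bigrams, using raw frequency counts as cell values. The function name <code>build_word_matrix</code> and the toy bigrams are assumptions made for this example; this is not the code SenseClusters itself runs.

```python
from collections import Counter

def build_word_matrix(bigrams):
    """Build a word-by-word count matrix from (first, second) word pairs.

    Rows are the first words of the pairs, columns are the second words,
    and each cell holds the frequency of that pair.
    """
    counts = Counter(bigrams)
    rows = sorted({first for first, _ in counts})
    cols = sorted({second for _, second in counts})
    row_index = {word: i for i, word in enumerate(rows)}
    col_index = {word: j for j, word in enumerate(cols)}
    matrix = [[0] * len(cols) for _ in rows]
    for (first, second), n in counts.items():
        matrix[row_index[first]][col_index[second]] = n
    return rows, cols, matrix

# Hypothetical toy data: each tuple is one observed bigram.
toy_bigrams = [
    ("strong", "tea"), ("strong", "coffee"), ("hot", "tea"),
    ("hot", "coffee"), ("strong", "tea"), ("fast", "car"),
]
rows, cols, matrix = build_word_matrix(toy_bigrams)
for word, vector in zip(rows, matrix):
    print(word, vector)
```

An association score or a binary value could be stored in place of the raw count without changing the row and column layout.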
<br><br>
This matrix can be reduced via SVD, and is reconstructed such that the 
original rows are preserved and the columns are reduced. The rows are then 
clustered, so that words that occur with similar words in bigrams or 
co-occurrences are grouped together. 
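<br><br>
To make the idea of grouping similar rows concrete, here is a minimal Python sketch that compares row vectors by cosine similarity and greedily groups rows that are close to an existing group's first member. It skips the optional SVD step, and the greedy grouping, the <code>threshold</code> parameter, and the toy vectors are all assumptions chosen for illustration; this is not the clustering method SenseClusters itself applies.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def group_rows(words, vectors, threshold=0.8):
    """Greedy single-pass grouping of row words.

    Each word joins the first group whose seed vector is at least
    `threshold` similar to its own row; otherwise it starts a new group.
    """
    clusters = []  # list of (seed_vector, member_words)
    for word, vec in zip(words, vectors):
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(word)
                break
        else:
            clusters.append((vec, [word]))
    return [members for _, members in clusters]

# Hypothetical rows: columns might stand for "car", "coffee", "tea".
words = ["fast", "hot", "strong"]
vectors = [[1, 0, 0], [0, 1, 1], [0, 1, 2]]
groups = group_rows(words, vectors)
print(groups)
```

In this toy data "hot" and "strong" share the same column words and so fall into one group, while "fast" stays on its own.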
<br><br>
Note that the word matrix used is identical to that which is used in 
creating the second order representation for context discrimination. 
<br><br>
The input must be a Senseval-2 formatted test file. It can be either
headed or headless. Even if the data has target words (marked with head 
tags), the test_scope option and target co-occurrence features are not 
available. Only bigram or co-occurrence features may be used, and it 
should be understood that the first word in each bigram or co-occurrence 
pair is what will be clustered. A separate set of feature selection data 
(i.e., training data) may not be used with word clustering.