NAME

mat2harbo.pl - Convert a matrix in SenseClusters sparse format to Harwell-Boeing (HB) format and set input parameters (lap2) for input to SVDPACKC.

SYNOPSIS

 mat2harbo.pl [OPTIONS] MATRIX

The file input is a SenseClusters sparse matrix

 cat input

Output =>

 5 4 12
 1 1.5 3 2.5 4 1.0
 2 2.5 3 2.5
 1 1.5 3 2.5 4 1.0
 2 2.5 3 2.5
 2 2.5 3 2.5

Convert that to Harwell-Boeing form.

 mat2harbo.pl input --title "matrix format conversion" --id "sample" --numform 10f8.4

Output =>

 matrix format conversion                                                sample
 #
 rra                        5             4            12             0
           (10i8)          (10i8)            (10f8.4)            (10f8.4)
        1       3       6      11      13
        1       3       2       4       5       1       2       3       4       5
        1       3
   1.5000  1.5000  2.5000  2.5000  2.5000  2.5000  2.5000  2.5000  2.5000  2.5000
   1.0000  1.0000

The Harwell-Boeing format stores data in lines of 80 columns. The numform 10f8.4 says that there should be 10 numbers per line, each occupying 8 characters, of which 4 digits are to the right of the decimal point.

See http://math.nist.gov/MatrixMarket/formats.html#hb for a detailed explanation of Harwell Boeing format.

Type mat2harbo.pl --help for a quick summary of options.

DESCRIPTION

Converts a sparse matrix in SenseClusters format to Harwell-Boeing (HB) sparse format, which is the format required by SVDPACKC. This program also creates (optionally) the lap2 file which provides parameter settings for SVDPACKC.

INPUT

Required Arguments:

MATRIX

A sparse MATRIX in SenseClusters' format that is to be converted into Harwell Boeing format.

The first line should contain exactly 3 numbers separated by blanks:

 #nrows #ncols #nnz

where

 #nrows = Number of rows 
 #ncols = Number of columns 
 #nnz = Total number of non-zero values

in the MATRIX.

Each line thereafter should show one row of the MATRIX in sparse format. A sparse row is a space-separated list of pairs of numbers, where the first number of each pair is the column index of a non-zero value and the second number is the non-zero value that appears at that column index.

Column index counting starts from 1.

Sample MATRIX examples =>

  1.  5 5 15
     2 9 4 9
     1 6 2 5 3 7 4 8 5 6
     1 4 2 5
     1 7 2 6 3 7
     1 9 2 8 3 9

    Shows a 5 x 5 integer matrix containing a total of 15 non-zero elements. The ith line after the first line shows the non-zero elements in the ith row, e.g. the 2nd line (1st row) has 2 non-zero values (both 9) at column indices 2 and 4. The 6th line (5th row) has 3 non-zero values: 9 at index 1, 8 at index 2 and 9 at index 3.

  2.  7 8 34
     1 0.160 2 -0.059 3 1.864 5 0.724 6 -0.472 7 -0.467
     2 -0.209 4 1.487 5 6.728 7 -3.085 8 1.396
     1 14.594 3 -2.858 4 -0.618 6 16.510 8 -2.314
     3 -0.384 5 -1.189 7 -0.155 8 0.006
     1 -0.128 3 0.020 4 -0.125 8 0.039
     2 0.062 3 0.058 4 0.016 5 0.057 7 0.407 8 0.015
     4 0.033 6 1.377 7 0.074 8 0.994

    Shows a 7 x 8 real matrix =>

       7 8
       0.160 -0.059  1.864  0.000  0.724 -0.472 -0.467  0.000
       0.000 -0.209  0.000  1.487  6.728  0.000 -3.085  1.396
      14.594  0.000 -2.858 -0.618  0.000 16.510  0.000 -2.314
       0.000  0.000 -0.384  0.000 -1.189  0.000 -0.155  0.006
      -0.128  0.000  0.020 -0.125  0.000  0.000  0.000  0.039
       0.000  0.062  0.058  0.016  0.057  0.000  0.407  0.015
       0.000  0.000  0.000  0.033  0.000  1.377  0.074  0.994
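
As an illustration of the sparse format described above, the following Python sketch expands a SenseClusters sparse matrix into dense rows. It is not part of mat2harbo.pl: the function name parse_sparse_matrix is hypothetical, and checking the body against the header line is an assumption of this sketch rather than documented behaviour.

```python
def parse_sparse_matrix(text):
    """Expand SenseClusters sparse text into a list of dense rows."""
    lines = text.strip("\n").split("\n")
    # First line: number of rows, number of columns, number of non-zeros.
    nrows, ncols, nnz = (int(tok) for tok in lines[0].split())
    dense, seen = [], 0
    for line in lines[1:]:
        tokens = line.split()
        row = [0.0] * ncols
        # Tokens come in (column index, value) pairs; indices start at 1.
        for col, val in zip(tokens[0::2], tokens[1::2]):
            row[int(col) - 1] = float(val)
            seen += 1
        dense.append(row)
    if len(dense) != nrows or seen != nnz:
        raise ValueError("header line does not match the matrix body")
    return dense


# Sample MATRIX example 1 from above.
sample = """5 5 15
2 9 4 9
1 6 2 5 3 7 4 8 5 6
1 4 2 5
1 7 2 6 3 7
1 9 2 8 3 9
"""
dense = parse_sparse_matrix(sample)
print(dense[0])  # [0.0, 9.0, 0.0, 9.0, 0.0]
```
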

Optional Arguments:

--title TITLE

Allows the user to specify the title of the MATRIX, which is displayed on line 1 (columns 1-72) of the output HB matrix. If --title is not specified, mat2harbo uses the MATRIX file name as the default title.

--id ID

Programs processing the HB formatted matrix can identify the matrix by the ID, which is written on line 1 (columns 73-80). The default ID is "harbomat". This identifier is limited to 8 characters.
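
The layout of the first header line can be sketched in Python as follows. The helper header_line1 is hypothetical, and truncating over-long titles or IDs is an assumption of this sketch; how mat2harbo.pl itself handles values that are too long is not stated here.

```python
def header_line1(title, ident="harbomat"):
    """Build an 80-column line: title in columns 1-72, ID in columns 73-80."""
    # Truncation of over-long values is an assumption of this sketch.
    return title[:72].ljust(72) + ident[:8].ljust(8)


line1 = header_line1("matrix.dat")
print(len(line1))    # 80
print(line1[72:80])  # harbomat
```
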

--cpform CPFORM

Specifies the Column Pointer Format. The column pointer should have the format of type MiN which indicates that each line in Block1 contains M integer pointers each occupying N character spaces. Default format is 10i8.

Note: M x N must be 80.

--rpform RPFORM

Specifies the Row Pointer Format for the row pointers in Block2. This has the same MiN type of format as --cpform.

--numform NUMFORM

Specifies the Numeric Format to represent the matrix values in Block3.

mat2harbo allows 2 numeric formats:

1. MiN - which means that there are a total of M integers on each line of Block3, each occupying N characters.
2. MfD.F - which means that there are a total of M real numbers on each line of Block3, each occupying D characters, of which the last F digits show the fractional portion.

Thus, matrix values can be integer or real, selected by specifying the corresponding format.

Default NUMFORM is (5f16.10) which uses 16 digits for each MATRIX value of which last 10 digits stand for the fractional part and each line contains 5 such real numbers.
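
To make the format strings concrete, here is a small Python sketch that lays out a list of values according to an MiN or MfD.F specification. It only illustrates the layout; the names parse_format and format_block are hypothetical and do not come from mat2harbo.pl.

```python
import re


def parse_format(spec):
    """Split '10i8' or '10f8.4' into (per_line, kind, width, fraction)."""
    m = re.fullmatch(r"\(?(\d+)([if])(\d+)(?:\.(\d+))?\)?", spec.strip().lower())
    if not m:
        raise ValueError("unsupported format: " + spec)
    per_line, kind, width = int(m.group(1)), m.group(2), int(m.group(3))
    fraction = int(m.group(4)) if m.group(4) else 0
    return per_line, kind, width, fraction


def format_block(values, spec):
    """Return the lines of one data block, right-aligned in fixed-width cells."""
    per_line, kind, width, fraction = parse_format(spec)
    lines = []
    for start in range(0, len(values), per_line):
        chunk = values[start:start + per_line]
        if kind == "i":
            cells = [f"{int(v):>{width}d}" for v in chunk]
        else:
            cells = [f"{v:>{width}.{fraction}f}" for v in chunk]
        lines.append("".join(cells))
    return lines


# Values block of the SYNOPSIS example, formatted with 10f8.4.
block3 = format_block([1.5, 1.5] + [2.5] * 8 + [1.0, 1.0], "10f8.4")
for line in block3:
    print(line)
```

The first printed line holds 10 values in 80 characters and the second holds the remaining 2, matching the last two lines of the SYNOPSIS output (apart from its leading indent).
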

Parameter Setting Options :

The options listed in this section create the parameter file (lap2) for las2.c automatically.

--param

Creates the parameter file lap2 that can be directly used while running las2.

--k K

Sets the value of the maxprs option in the lap2 file to K, i.e. requests K singular triplets from las2. The value of K should not exceed the number of columns of MATRIX. Default K = 300.

--rf RF

Reduces the dimensions of the column space of the MATRIX by the scaling factor RF, i.e. if the MATRIX has N columns, maxprs is set to N/RF, where RF >= 1.

In other words, N/RF singular triplets are requested from las2. The default RF = 10 reduces the column space to 10% of the original dimensions.

If both --k and --rf are specified, maxprs = min(K,N/RF). Thus, the default maxprs = min(300,N/10).

--iter I

Specifies the number of iterations for las2. I, if specified, should not exceed the number of columns in the MATRIX and I should be at least as high as maxprs. Default I = min((3 * maxprs),#cols) where maxprs = min(K,N/RF).
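
The rules for maxprs and iter can be written out as a short Python sketch. The function lap2_parameters is hypothetical. Two points are assumptions of this sketch and not statements about mat2harbo.pl: N/RF is computed with integer division, and an option that is not given simply falls back to its documented default.

```python
def lap2_parameters(ncols, k=300, rf=10, iterations=None):
    """Return (maxprs, iter) for a matrix with ncols columns."""
    if rf < 1:
        raise ValueError("RF must be >= 1")
    # maxprs = min(K, N/RF); integer division is an assumption here.
    maxprs = min(k, ncols // rf)
    if iterations is None:
        # Default I = min(3 * maxprs, #cols).
        iterations = min(3 * maxprs, ncols)
    if not maxprs <= iterations <= ncols:
        raise ValueError("iter must satisfy maxprs <= iter <= #cols")
    return maxprs, iterations


print(lap2_parameters(5000))             # (300, 900)
print(lap2_parameters(1000))             # (100, 300)
print(lap2_parameters(200, k=50, rf=2))  # (50, 150)
```
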

Help on setting parameters in file las2.h

The header file las2.h in SVDPACKC specifies values of various constants for las2. This section provides some guidelines on setting these constants for using SenseClusters. Please note that the version of SVDPACKC found in /External has been modified with the settings as described below.

If las2 fails because the constants in las2.h are too small, an error message in the output file lao2 will suggest that the matrix is too large. In that case, check the 3rd line of the Harwell-Boeing matrix (as produced by this program) that is given to las2:

1. If NCOLS, the 3rd field on line 3 of the HB matrix, exceeds NMAX, increase NMAX to a value higher than NCOLS.
2. Otherwise, if NNZ, the 4th field on line 3 of the HB matrix, exceeds NZMAX in las2.h, increase NZMAX.
3. Otherwise, increase LMTNW to a value higher than (6*NMAX + 4*NMAX + 1 + NMAX*NMAX), or simply keep increasing it until las2 succeeds.
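
The same sequence of checks can be expressed as a Python sketch. The function check_las2_limits is hypothetical, and the NMAX, NZMAX and LMTNW values passed to it below are made-up numbers for illustration; the real values have to be read from your copy of las2.h.

```python
def check_las2_limits(hb_line3, nmax, nzmax, lmtnw):
    """Suggest which las2.h constant to raise, following the checks above."""
    fields = hb_line3.split()
    # Line 3 of the HB file: matrix type, #rows, #cols, #nnz, 0
    ncols, nnz = int(fields[2]), int(fields[3])
    if ncols > nmax:
        return f"increase NMAX above {ncols}"
    if nnz > nzmax:
        return f"increase NZMAX above {nnz}"
    needed = 6 * nmax + 4 * nmax + 1 + nmax * nmax
    if lmtnw <= needed:
        return f"increase LMTNW above {needed}"
    return "keep increasing LMTNW until las2 succeeds"


# Line 3 of the SYNOPSIS output; the limits are illustrative only.
line3 = " rra                        5             4            12             0"
print(check_las2_limits(line3, nmax=3, nzmax=100, lmtnw=0))
print(check_las2_limits(line3, nmax=10, nzmax=5, lmtnw=0))
print(check_las2_limits(line3, nmax=10, nzmax=100, lmtnw=50))
```
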

Another problem a user might notice is that las2 sometimes runs for a very long time, for example more than a few days. In that case, the user is advised to restart las2 after reducing the values of the parameters 'maxprs' and 'iter' in the parameter file lap2. Specifically, the 2nd parameter in lap2 is iter and the 3rd one is maxprs. Remember that iter has to be >= maxprs.

Other Options :

--help

Displays this message.

--version

Displays the version information.

Harwell Boeing Format

Header Section

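The first 4 lines of the output form the header of the HB sparse matrix. Line 1 holds the title (columns 1-72) and the ID (columns 73-80). Line 3 holds the matrix type followed by the number of rows, the number of columns and the number of non-zero values. Line 4 holds the column pointer, row index and numeric formats.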

Data Section

This section contains 3 blocks which contain the non-zero values in the matrix along with their row and column index information.

 *************************************************************************
             NON-ZERO ENTRIES ARE STORED IN COLUMN ORDER !!!
 *************************************************************************

The data section consists of 3 blocks:

BLOCK1 POINTERS

The first block is an array whose entries show the indices (in block3) of the leading non-zero value of every column.

e.g. If a given matrix is

 4 6
 2 3 0 0 0 1
 0 2 0 1 2 0
 0 0 2 4 1 0 
 1 1 0 0 5 0

Then the first block will contain the pointers

[1 3 6 7 9 12 13]

This shows that

The first column begins at the 1st non-zero entry (2).
The second column begins at the 3rd non-zero entry (3) [in COLUMN ORDER].
The third column begins at the 6th non-zero entry (2).
The fourth column begins at the 7th non-zero entry (1).
And so on ...

 *************************************************************************
        NULL columns (having no non-zero elements) are not allowed. 
 *************************************************************************

Note: The column pointers start at 1.

The last entry in @pointers is an extra pointer to the location one past the last non-zero entry. So the last value in @pointers will always be #nnz + 1 (where #nnz = total number of non-zero entries).

BLOCK2 ROW_INDICES

This block stores the row indices of the non-zero matrix entries in column order.

For above matrix, this block will look like

[1 4 1 2 4 3 2 3 2 3 4 1]

which shows that

The 1st non-zero entry is at 1st row.
The 2nd non-zero entry is at 4th row.
The 3rd non-zero entry is at 1st row.
And so on ...

Note: Row indices start from 1.

BLOCK3 VALUES

This block contains the actual non-zero values from the matrix in column order.

Thus, the block3 for the above shown matrix will look like

[2 1 3 2 1 2 1 4 2 1 5 1]

General Observations:

1. length(block2) = length(block3), and each is equal to the number of non-zero entries in the matrix.
2. length(block1) = #columns of matrix + 1. Each column has an entry in block1 that shows the position of its leading non-zero element (no NULL columns are allowed), and the +1 accounts for the extra pointer to the location after the last non-zero entry.
3. The column pointers in block1 are also the pointers to the block3 entries where the leading (first) non-zero entry of each column is located.
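
The three blocks and the observations above can be checked with a short Python sketch that walks a dense matrix in column order. It is an illustration only: the function hb_blocks is hypothetical and is not the routine used by mat2harbo.pl, which works from the sparse input.

```python
def hb_blocks(dense):
    """Return (pointers, row_indices, values) for a dense matrix, column by column."""
    nrows, ncols = len(dense), len(dense[0])
    pointers, row_indices, values = [], [], []
    for c in range(ncols):
        # 1-based position in block3 of this column's first non-zero value.
        pointers.append(len(values) + 1)
        for r in range(nrows):
            if dense[r][c] != 0:
                row_indices.append(r + 1)  # row indices start from 1
                values.append(dense[r][c])
        if pointers[-1] == len(values) + 1:
            raise ValueError(f"column {c + 1} has no non-zero entries")
    # Extra pointer to the location after the last non-zero entry.
    pointers.append(len(values) + 1)
    return pointers, row_indices, values


# The 4 x 6 matrix used in the BLOCK1 example above.
matrix = [
    [2, 3, 0, 0, 0, 1],
    [0, 2, 0, 1, 2, 0],
    [0, 0, 2, 4, 1, 0],
    [1, 1, 0, 0, 5, 0],
]
block1, block2, block3 = hb_blocks(matrix)
print(block1)  # [1, 3, 6, 7, 9, 12, 13]
print(block2)  # [1, 4, 1, 2, 4, 3, 2, 3, 2, 3, 4, 1]
print(block3)  # [2, 1, 3, 2, 1, 2, 1, 4, 2, 1, 5, 1]
```
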

Sample Output

 matrix.dat                                                              harbomat
 #
 rra                  4             6            11             0
    (10i8)          (10i8)            (8f10.3)            (8f10.3)
       1       3       5       6       7      10      12
       2       3       2       3       4       1       1       2
       3       3       4
     1.000     1.000     2.000     4.000     1.000     2.000     1.000
     2.000     2.000     3.000     1.000

Shows the HB format for a 4 x 6 matrix:

 4 6
 0 0 0 2 1 0
 1 2 0 0 2 0
 1 4 0 0 2 3
 0 0 1 0 0 1
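
Going the other way, the three data blocks of the sample output can be expanded back into the dense matrix shown above. The following Python sketch does this under the assumption that the blocks have already been read into lists of numbers; the function hb_to_dense is hypothetical.

```python
def hb_to_dense(nrows, pointers, row_indices, values):
    """Rebuild a dense matrix from the three HB data blocks."""
    ncols = len(pointers) - 1
    dense = [[0.0] * ncols for _ in range(nrows)]
    for c in range(ncols):
        # Pointers are 1-based positions into row_indices and values.
        for pos in range(pointers[c] - 1, pointers[c + 1] - 1):
            dense[row_indices[pos] - 1][c] = values[pos]
    return dense


# Blocks copied from the sample output above.
pointers = [1, 3, 5, 6, 7, 10, 12]
row_indices = [2, 3, 2, 3, 4, 1, 1, 2, 3, 3, 4]
values = [1.0, 1.0, 2.0, 4.0, 1.0, 2.0, 1.0, 2.0, 2.0, 3.0, 1.0]
rebuilt = hb_to_dense(4, pointers, row_indices, values)
for row in rebuilt:
    print(row)
```
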

AUTHORS

Amruta Purandare, University of Pittsburgh

Ted Pedersen, University of Minnesota, Duluth tpederse at d.umn.edu

COPYRIGHT

Copyright (c) 2003-2008, Amruta Purandare and Ted Pedersen

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to

 The Free Software Foundation, Inc.,
 59 Temple Place - Suite 330,
 Boston, MA  02111-1307, USA.